Running Gemini Nano locally in the browser

by Jose Moreno, Developer

[Image: built-in AI models]

Gemini Nano is here, and with it, AI is no longer something that lives only in the cloud: starting with Chrome 138, it's built into the browser itself. This changes not only how new web applications are built, but also how we think about privacy and product design for our own projects.

This article is a technical write-up of a real-world project: a web application with AI features that run entirely in the browser.

This is not a step-by-step tutorial on building a lean canvas with web technologies. Instead, it covers my first experience with Gemini Nano, how the system works, and what I learned from using AI at the browser level instead of remote models.

During development, I explored how in-browser AI affects user experience and development constraints. While the benefits are attractive (especially around privacy), this approach also comes with limitations you need to consider if you plan to build an app on the built-in AI that ships with Chromium-based browsers.

What is Gemini Nano?

[Image: model UI]

We all love Gemini, Google's highly capable AI model. Early last year, Google shipped a built-in, on-device model called Gemini Nano, with several interesting features exposed as APIs, such as:

  • Summarizer and Writer APIs
  • Prompt API
  • Translator API
  • Proofreader API
  • and more (see Chrome's full list of built-in APIs)

The interesting part of this new model is that it's a local-first provider that runs entirely in the browser: no API tokens needed, and a privacy-first focus. Chrome also offers an optional cloud provider behind the same interface, which we are not going to discuss this time.

How to Enable Gemini Nano in Chrome (Built-in AI APIs)

To use Gemini Nano and Chrome's built-in AI APIs locally, you need to enable a few flags and meet the hardware requirements.

Requirements (Summary)

  • Chrome 140 or later (as of this writing)
  • OS: Windows 10/11, macOS 13+, Linux, or ChromeOS
  • Storage: ~22 GB free disk space
  • Memory:
    • GPU with >4 GB VRAM or
    • CPU with ≥16 GB RAM and ≥4 cores
  • Unmetered internet connection

Enable Built-in AI on localhost

  1. Open Chrome and go to: chrome://flags/#optimization-guide-on-device-model
  2. Set it to Enabled.
  3. Enable Gemini Nano support by going to: chrome://flags/#prompt-api-for-gemini-nano
  4. Set it to Enabled.
  5. You can optionally check the model status by going to: chrome://on-device-internals

Notes
  • The model is downloaded automatically on first use.
  • If disk space drops below ~10 GB, the model is removed and re-downloaded later.
  • All built-in AI APIs work on localhost.
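
Before you rely on any of these APIs, it's worth feature-detecting them and checking the model's status from code rather than assuming the flags worked. A minimal sketch using the availability() method the built-in AI APIs expose:

// Check whether the Prompt API exists and whether the local model is usable.
if ('LanguageModel' in self) {
  const availability = await LanguageModel.availability();
  // Reported states: 'unavailable', 'downloadable', 'downloading', 'available'
  if (availability === 'available') {
    console.log('Gemini Nano is ready to use.');
  } else if (availability === 'downloadable' || availability === 'downloading') {
    console.log(`Model supported, but not ready yet: ${availability}`);
  } else {
    console.log('Gemini Nano is unavailable on this device.');
  }
} else {
  console.log('The Prompt API is not supported in this browser.');
}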

What did I build?

[Screenshot: initial app]

I frequently work with sensitive information, which means that sending this data to external AI services is often not an option due to trust concerns.

The project for this article is a web application built to learn from this new technology and share my experience with it. In this case, I used it to create a Lean Canvas.

From a user perspective, the goal is to:

  • Show how built-in AI works
  • Explore how far browser-level AI goes and where its limitations are
  • Show a real implementation of this technology

Rather than treating Gemini Nano as a replacement for a remote LLM, I approached it as a constrained but powerful tool that would (or would not) influence future projects.

At this point, you might be wondering: how am I using Gemini Nano?

I am mainly using the Prompt API and the Summarizer API.

The Prompt API is used to review each block's details and give the user AI suggestions, and to run an overall consistency check that evaluates inconsistencies and risks between blocks.

The Summarizer API is used to generate concise summaries of individual canvas blocks after a user requests an AI review of a block, giving users a quick summary version of their content.

You can try the app here.

How the System Works

[Screenshot: selecting a block to review]

Technical Flow

  1. User edits a canvas block, and the content gets stored in browser memory
  2. User requests review, and the Prompt API and Summarizer API get called with the data running locally
  3. Results are displayed to the user

When Prompt API is Called

  • Block review: the user clicks "Review" on a block, then the Prompt API analyzes the content and returns structured feedback

  • Consistency check: the user requests a full canvas review, then the Prompt API evaluates relationships between blocks

Both run entirely in the browser using the local Gemini Nano model.
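
Here's a simplified sketch of what a block review can look like with the Prompt API. The system prompt and block content are illustrative placeholders, not the exact ones the app uses:

// Create a session whose system prompt frames the review task.
const session = await LanguageModel.create({
  initialPrompts: [{
    role: 'system',
    content: 'You are a lean canvas reviewer. Point out gaps, risks, and vague claims.',
  }],
});

// Review a single block; the content comes straight from browser memory.
const blockContent = 'Problem: small teams lose track of their validation experiments.';
const feedback = await session.prompt(`Review this canvas block:\n${blockContent}`);
console.log(feedback);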

When Summarizer is Used

  • Automatic: after a block review completes, if available

  • Manual: user explicitly requests a summary for a block

The summarizer model downloads once, then runs locally.
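
A sketch of the manual summary path. The options shown are reasonable defaults, not the app's exact configuration:

// Create a summarizer tuned for short, scannable output.
const summarizer = await Summarizer.create({
  type: 'key-points',   // other documented types: 'tl;dr', 'teaser', 'headline'
  format: 'plain-text', // or 'markdown'
  length: 'short',
});

// Summarize the block the user selected.
const blockContent = 'Customer segments: indie developers and small product teams.';
const summary = await summarizer.summarize(blockContent);
console.log(summary);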

[Screenshot: overall review]

Performance and Technical Specifications

While working with the model I discovered something interesting: it's fun to work with, and with a bit of instrumentation you can measure both the response time and the context window size yourself.

Context Window

When it comes to context window size, the numbers vary depending on which model you're using; in this case, we're using the default model that ships with Chrome.

According to the model itself, Gemini Nano has a context window of 8,192 tokens. In practice, however, Chrome's built-in AI APIs impose different limits:

  • Prompt API: per-prompt limit of 1,024 tokens, with a session input quota of 9,216 tokens
  • Summarizer API: input quota of 6,000 tokens

Monitoring Token Usage

The Prompt API provides simple ways to keep track of your token usage. You can check your current usage and available quota directly from the session object:

const session = await LanguageModel.create();
console.log(
  `${session.inputUsage} tokens used, out of ${session.inputQuota} tokens available.`,
);
// Output: 0 tokens used, out of 9216 tokens available.

Note: When creating a session, Chrome will warn you if no output language is specified. It's recommended to specify a supported output language code (en, es, or ja) to ensure optimal output quality and proper safety attestation.
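
As a sketch of that, and assuming the expectedOutputs option described in the current Prompt API documentation, the output language can be declared when the session is created:

// Declare the expected output language up front to avoid the warning
// (assumes the expectedOutputs option from the current Prompt API docs).
const session = await LanguageModel.create({
  expectedOutputs: [{ type: 'text', languages: ['en'] }],
});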

The Summarizer API also provides similar monitoring capabilities:

const summarizerSession = await Summarizer.create();
console.log(`${summarizerSession.inputQuota} tokens available.`);
// Output: 6000 tokens available.

Before sending a prompt, you can measure how many tokens it will consume using measureInputUsage():

const tokenCount = await session.measureInputUsage('Your prompt text here');
console.log(`This prompt will use ${tokenCount} tokens`);
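
Combining measureInputUsage() with the session properties above, you can guard against oversized prompts before sending them. A small illustrative check, not taken from the app:

// Only send the prompt if it fits in the session's remaining quota.
const prompt = 'Your prompt text here';
const promptTokens = await session.measureInputUsage(prompt);
const remaining = session.inputQuota - session.inputUsage;

if (promptTokens > remaining) {
  console.warn(`Prompt needs ${promptTokens} tokens but only ${remaining} remain.`);
} else {
  console.log(await session.prompt(prompt));
}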

If you exceed the allowed context window, the API automatically removes older messages (except system prompts) to make space for new ones. You can listen for this with the quotaoverflow event:

session.addEventListener('quotaoverflow', () => {
  console.log('Context window overflowed, some messages were removed');
});

Processing Time

Processing time is highly dependent on the user's hardware. The model runs entirely on the host machine, which means your device directly impacts response time.

On my Mac Mini with 16GB of RAM and an M1 chip, I tested processing times for various token counts:

Token Count    Processing Time
105            5.44s
1,005          6.55s
9,125          34.46s

During my tests, prompts with token counts above the API's 9,216-token quota were automatically limited to the maximum possible value. Processing times can also vary between runs: with the same token count (9,125 tokens), I saw times ranging from 4.49s to 34.46s.
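
These timings are straightforward to reproduce. A minimal sketch that wraps a prompt call with performance.now(), using a placeholder prompt:

// Time a single prompt end to end on the local model.
const session = await LanguageModel.create();
const prompt = 'Your long test prompt here';

const tokenCount = await session.measureInputUsage(prompt);
const start = performance.now();
await session.prompt(prompt);
const elapsed = ((performance.now() - start) / 1000).toFixed(2);

console.log(`${tokenCount} tokens processed in ${elapsed}s`);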

One big advantage of using a built-in AI model is that there's no network latency, but the trade-off is that the speed depends entirely on your hardware.

What did I learn?

Using Gemini Nano confirmed what I suspected beforehand: in-browser AI is not a groundbreaking technology in the way the first LLMs were. That said, even though Gemini Nano is just a small local model, it can be genuinely useful for simple to low-complexity tasks.

For things like translations, text review, summaries, and basic consistency checks, it works fine, and it is really simple to implement thanks to the available API.

My main interest was the privacy features it brings to the table, and I was highly impressed: no data leaves the browser. However, it comes with limited reasoning and context size.

In real-world projects it won't replace cloud-based models; however, I think it's great for small tasks that don't require complex thinking or processing.

Limitations

While Gemini Nano is useful for small tasks, using AI at the browser level introduces several limitations. For now, these limitations can shape a project's entire design, so I recommend taking them into consideration if you plan to use built-in browser AI as of January 2026.

  • Limited availability: Gemini Nano only works on Chrome 138 and above, on specific operating systems, and on hardware that meets the minimum requirements, which immediately reduces the potential user base.
  • Hardware requirements: The need for sufficient disk space, memory, and CPU/GPU resources makes it unsuitable for low-end devices.
  • Limited capabilities compared to other LLMs: compared to cloud-based LLMs, Gemini Nano is weaker, and depending on the task it can also be slower.
  • Small context window: A small context window forces you to keep prompts small and simple.

As a result, its use cases are best limited to projects that don't require deep analysis or broad context.
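
Given these constraints, a progressive-enhancement pattern works well: treat the built-in model as an optional layer and keep the app fully usable without it. A sketch of that idea (the helper function is hypothetical):

// Enable AI features only when the local model is actually usable;
// otherwise the app simply hides its AI affordances.
async function setupAIFeatures() {
  if (!('LanguageModel' in self)) return null;
  const availability = await LanguageModel.availability();
  if (availability === 'unavailable') return null;
  return LanguageModel.create(); // may trigger the one-time model download
}

const aiSession = await setupAIFeatures();
if (!aiSession) {
  console.log('AI review disabled: built-in model not available on this device.');
}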

Final thoughts

Over the last few months, I've seen several new AI browsers (such as Atlas and Comet) skyrocket, so Gemini Nano didn't feel like a breath of fresh air, just another AI integration. After looking at it closely, however, I realized it wasn't just another step toward making Chrome a trend-following AI browser, but an actual model running locally on the user's machine.

I care deeply about privacy, and for me, this was the most important green flag from Gemini Nano. Being able to provide AI features without sending any user data outside the browser changes how I think about my projects.

With a few iterations and updates, I believe Gemini Nano can become a solid option for some web projects, especially those with a defined scope or basic, lightweight tasks: for example, projects that need internationalization support, grammar checkers in text editors, or summarizing selected text inside an app. These are just a few ideas where this new feature seems most useful.

That said, in-browser AI works best when treated as a tool rather than a core dependency. Used intentionally and with its limitations in mind, Gemini Nano can meaningfully improve user experience without introducing any cost to the user or privacy concerns around sensitive data, which often come with cloud-based AI models, where you don't always know where your data goes.

Additional Resources

If you're interested in exploring other projects built with Chrome's built-in AI APIs, the official built-in AI documentation is a good place to start.

Have you built something interesting with Chrome's built-in AI? Feel free to try it out and share your projects and experiences with the rest of the world!
