Browser AI Needs Shared Model Caches, Not More Duplicate Downloads - Ken Ashe

The hidden tax on in-browser models

Hugging Face is experimenting with the proposed Cross-Origin Storage API in Transformers.js. That sounds like plumbing, because it is. It is also one of the more practical things happening in browser AI.

Transformers.js lets developers run transformer models in the browser, often through WebGPU or WebAssembly. The nice part is obvious: local inference, lower server cost, better latency after setup, and user data that does not have to leave the device. The annoying part is also obvious to anyone who has shipped it: model files are big.

A small embedding model may be manageable. A speech model, vision model, or capable text model can quickly become a serious download. Today, if several sites use the same model through similar client-side stacks, each site may still end up storing its own copy because browser storage is scoped and partitioned by origin. That design is not a bug. It is a privacy boundary.

But for AI workloads, it creates a weird outcome. The user’s machine may hold multiple copies of the same model artifacts because the browser is correctly preventing one site from casually touching another site’s data. Good for privacy. Bad for local AI ergonomics.

Shared storage is simple until tracking enters the room

The proposed Cross-Origin Storage API is aimed at this kind of problem: allow controlled sharing of storage across origins. In the Transformers.js case, the target is not user secrets. It is reusable public model assets.

That distinction matters. A model weight file is not the same as a user’s browsing history or session data. If a browser can safely let multiple sites reuse the same public model files, local AI becomes less wasteful. Fewer repeat downloads. Less disk bloat. Faster first useful interaction on the second, third, or tenth site that uses the same model.
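
To make the difference concrete, here is a small sketch of the bookkeeping, not a real browser API. It replays the same requests against two cache-key policies: one that includes the origin, as partitioned storage effectively does today, and one that keys on a content hash alone. Every name in it (ModelArtifact, replayRequests, the example origins, the 100 MB size) is made up for illustration.

```typescript
// Hypothetical model artifact: `hash` stands in for a content hash such as SHA-256.
interface ModelArtifact {
  hash: string;
  bytes: number;
}

// One site asking for one artifact.
interface ArtifactRequest {
  siteOrigin: string;
  artifact: ModelArtifact;
}

interface TransferTotals {
  downloads: number;
  bytesStored: number;
}

// Replays requests against a cache whose key is produced by `keyFor`.
// Every cache miss costs one download and one stored copy.
function replayRequests(
  requests: ArtifactRequest[],
  keyFor: (request: ArtifactRequest) => string,
): TransferTotals {
  const storedKeys = new Set<string>();
  const totals: TransferTotals = { downloads: 0, bytesStored: 0 };
  for (const request of requests) {
    const key = keyFor(request);
    if (storedKeys.has(key)) continue; // cache hit: nothing to fetch
    storedKeys.add(key);
    totals.downloads += 1;
    totals.bytesStored += request.artifact.bytes;
  }
  return totals;
}

// Partitioned storage: the origin is part of the key.
const partitionedKey = (r: ArtifactRequest) => `${r.siteOrigin}|${r.artifact.hash}`;
// Shared lane: the content hash alone identifies the artifact.
const sharedKey = (r: ArtifactRequest) => r.artifact.hash;

// Three sites use the same model; the first site is visited twice.
const demoModel: ModelArtifact = { hash: "sha256-demo", bytes: 100 * 1000 * 1000 };
const demoRequests: ArtifactRequest[] = [
  "https://a.example",
  "https://b.example",
  "https://c.example",
  "https://a.example",
].map((siteOrigin) => ({ siteOrigin, artifact: demoModel }));

const partitionedTotals = replayRequests(demoRequests, partitionedKey);
const sharedTotals = replayRequests(demoRequests, sharedKey);
console.log(partitionedTotals.downloads, sharedTotals.downloads); // 3 1
```

The only thing that changes between the two runs is the key. That is the whole argument in miniature: once the origin is in the key, identical bytes are treated as different objects.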

The hard part is that cross-origin storage has always been a tracking danger zone. Any shared state across sites can become a fingerprinting or re-identification tool if the rules are loose enough. Modern browsers have spent years partitioning caches, limiting third-party storage, and closing side channels for exactly this reason.

So the question is not “should sites share storage?” The better question is narrower: can browsers create a safe lane for shared, public, content-addressed artifacts like model files?

That suggests constraints. Storage should probably be tied to immutable assets, strong hashes, explicit providers, quota limits, and browser-mediated access rather than arbitrary read-write buckets. The closer this looks like “install this known model artifact once,” the more credible it gets. The closer it looks like “let origins coordinate state,” the more it smells like tracking infrastructure with an AI sticker on it.
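
A minimal sketch of what those constraints could look like in code, assuming nothing about how the actual proposal is specified. The store below is write-once, verifies the declared hash before accepting bytes, enforces a quota, and offers no update or listing call. SharedArtifactStore and toyContentHash are hypothetical names, and the hash is a toy 32-bit FNV-1a used only to keep the example synchronous and dependency-free. It is not collision-resistant, so a real design would use something like SHA-256.

```typescript
// Toy 32-bit FNV-1a hash. An assumption for brevity only: it is NOT
// collision-resistant, and a real design would use SHA-256 or similar.
function toyContentHash(data: Uint8Array): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < data.length; i++) {
    h ^= data[i];
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

type StoreResult = "stored" | "already-present" | "hash-mismatch" | "over-quota";

// Hypothetical store for immutable, content-addressed artifacts.
class SharedArtifactStore {
  private blobs = new Map<string, Uint8Array>();
  private usedBytes = 0;
  private quotaBytes: number;

  constructor(quotaBytes: number) {
    this.quotaBytes = quotaBytes;
  }

  // Write-once: the caller declares a hash and the store verifies it.
  put(declaredHash: string, data: Uint8Array): StoreResult {
    if (toyContentHash(data) !== declaredHash) return "hash-mismatch";
    if (this.blobs.has(declaredHash)) return "already-present";
    if (this.usedBytes + data.byteLength > this.quotaBytes) return "over-quota";
    this.blobs.set(declaredHash, data.slice()); // copy so the caller cannot mutate it later
    this.usedBytes += data.byteLength;
    return "stored";
  }

  // Lookup is by hash only. There is no update and no way to list contents.
  get(hash: string): Uint8Array | undefined {
    const blob = this.blobs.get(hash);
    return blob ? blob.slice() : undefined;
  }
}

const demoWeights = new Uint8Array([1, 2, 3, 4]);
const demoHash = toyContentHash(demoWeights);
const demoStore = new SharedArtifactStore(6); // 6-byte quota, just for the demo

console.log(demoStore.put(demoHash, demoWeights)); // "stored"
console.log(demoStore.put(demoHash, demoWeights)); // "already-present"
console.log(demoStore.put("not-a-real-hash", demoWeights)); // "hash-mismatch"

const otherWeights = new Uint8Array([1, 2, 3, 5]);
console.log(demoStore.put(toyContentHash(otherWeights), otherWeights)); // "over-quota"
```

Notice what is missing: no mutable keys, no per-origin notes, no enumeration. A site can only ask "do you have the bytes for this hash?" Even that single yes-or-no answer leaks a little about where the user has been, which is why the browser-mediated part of the list matters.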

This is what local AI adoption actually depends on

A lot of browser AI discourse gets stuck on model quality. Can a local model answer well enough? Can WebGPU run it fast enough? Can quantization keep it usable?

Those are real questions. But distribution is just as important. If the first run is slow, bandwidth-heavy, and repeated across every app, most users will bounce before they ever judge the model. Developers will quietly move inference back to servers because the product experience is easier to control there.

Hugging Face’s experiment is useful because it treats the browser as an operating environment, not just a demo surface. Transformers.js is not only competing with hosted APIs. It is competing with the expectations people have from installed apps, where shared runtimes, package caches, and system libraries are normal.

The catch is that the web is not an app store. Its security model is stricter because links are cheap and origins are untrusted by default. Any storage proposal that helps AI workloads has to survive that reality.

If I were building with Transformers.js today, I would still assume duplicated downloads and design around them: pick smaller models, lazy-load by task, show progress clearly, cache aggressively within my own origin, and measure cold-start pain on real consumer networks. But I would watch this API closely. The builders who benefit first will not be the ones chasing the largest browser model. They will be the ones packaging common, reusable models so the second app feels instant.

The easy-to-miss caveat: shared storage only helps if many apps actually converge on the same artifacts. If every app ships its own custom fine-tune, the duplication problem comes right back.
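
For the "lazy-load by task" habit, one possible shape is a small registry that starts a load only when a task is first requested and hands the same promise to every later caller. This is a generic sketch, not Transformers.js code. createLazyModelRegistry and the loader signature are hypothetical, and the demo loader just returns a string where a real app would fetch and initialize a model.

```typescript
// Hypothetical loader signature: in a real app this would wrap whatever
// call actually downloads and initializes the model for a task.
type ModelLoader<M> = (task: string) => Promise<M>;

function createLazyModelRegistry<M>(loadModel: ModelLoader<M>) {
  const loadsByTask = new Map<string, Promise<M>>();

  return function getModelForTask(task: string): Promise<M> {
    const existing = loadsByTask.get(task);
    if (existing) return existing; // reuse the load already started for this task

    const pending = loadModel(task);
    loadsByTask.set(task, pending);
    // Forget failed loads so a later call can retry.
    pending.catch(() => loadsByTask.delete(task));
    return pending;
  };
}

let demoLoadCount = 0;
const getDemoModel = createLazyModelRegistry(async (task: string) => {
  demoLoadCount += 1; // stands in for an expensive download
  return `model-for-${task}`;
});

// Two requests for the same task share one load; nothing loads until asked.
const firstEmbedRequest = getDemoModel("embedding");
const secondEmbedRequest = getDemoModel("embedding");
console.log(firstEmbedRequest === secondEmbedRequest, demoLoadCount); // true 1
```

Caching the promise rather than the finished model means two components that ask for the same task at the same moment do not trigger two downloads.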