On-device tool

How much context can a small model really use?

A small on-device model has a nominal context window of a few thousand tokens, but the usable window is often smaller. This tester runs a deterministic needle-in-a-haystack probe against Qwen3.5-0.8B in your browser: it hides a secret code near the start of a growing filler context, then asks the model to return it. At each context size you can watch recall, tokens/sec, and time-to-first-token. It is an honest, hands-on view of how the model behaves, not a benchmark you can game.
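As a rough illustration of the probe described above, here is a minimal TypeScript sketch. It is not the tester's actual code: the names `buildProbe`, `recalled`, and `summarizeTiming`, the filler sentence, the seed-to-code formula, and the choice to measure tokens/sec over the decode phase only are all assumptions made for the example.

```typescript
// Hypothetical sketch of a deterministic needle-in-a-haystack probe.
// All names and constants here are illustrative assumptions.

interface Probe {
  prompt: string; // full text sent to the model
  secret: string; // the code the model is expected to return
}

interface Timing {
  ttftMs: number;       // time-to-first-token in milliseconds
  tokensPerSec: number; // decode throughput after the first token
}

const FILLER = "The sky was clear and the river ran quietly past the mill.";

// Build a prompt with `fillerSentences` filler sentences and a secret code
// placed near the start. The same (size, seed) pair always yields the same
// prompt. `seed` is assumed to be a non-negative integer.
function buildProbe(fillerSentences: number, seed: number): Probe {
  const secret = String(100000 + ((seed * 7919) % 900000)); // six digits
  const sentences = new Array<string>(fillerSentences).fill(FILLER);
  const needleIndex = Math.min(2, sentences.length); // near the start
  sentences.splice(needleIndex, 0, `The secret code is ${secret}.`);
  const question = "What is the secret code? Answer with the code only.";
  return { prompt: `${sentences.join(" ")}\n\n${question}`, secret };
}

// Recall counts as a hit if the secret appears anywhere in the answer.
function recalled(answer: string, secret: string): boolean {
  return answer.includes(secret);
}

// Derive the two speed metrics from three timestamps (milliseconds) and the
// number of generated tokens. Assumption: throughput excludes the first token.
function summarizeTiming(
  startMs: number,
  firstTokenMs: number,
  endMs: number,
  generatedTokens: number
): Timing {
  const ttftMs = firstTokenMs - startMs;
  const decodeSec = (endMs - firstTokenMs) / 1000;
  const tokensPerSec =
    decodeSec > 0 ? Math.max(generatedTokens - 1, 0) / decodeSec : 0;
  return { ttftMs, tokensPerSec };
}
```

A sweep would call `buildProbe` with increasing sizes (for example 10, 50, 250 filler sentences), send each prompt to the model, and record `recalled` alongside `summarizeTiming` using timestamps from `performance.now()`.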

100% on-device

The model runs on WebGPU in your browser. Nothing is sent to a server; close the tab and the session is gone. The first run downloads the ~404 MB model, which is then cached for later runs.
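Because the model depends on WebGPU, a page like this would typically check for support before starting the download. The sketch below shows one way to do that. `hasWebGPU` and `GpuNavigatorLike` are hypothetical names for this example, and the small interface only stands in for the browser's own WebGPU types so the snippet compiles on its own. `navigator.gpu` and `requestAdapter()` are the standard WebGPU entry points.

```typescript
// Minimal structural type: just enough of the WebGPU surface for this check.
interface GpuNavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Resolves to true only if the browser exposes WebGPU and returns an adapter.
// requestAdapter() resolves to null when no suitable GPU is available.
async function hasWebGPU(nav: GpuNavigatorLike): Promise<boolean> {
  if (!nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat any failure as "not supported"
  }
}
```

In a browser this would be called as `hasWebGPU(navigator as GpuNavigatorLike)`, with the download started only when it resolves to `true`.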

Why it matters

Knowing the practical context budget of an on-device model tells you how much you can actually stuff into a prompt before recall and throughput fall off — critical for RAG, agents, and long-document tasks.