Browser Usage
Run LLMs directly in the browser with WebGPU acceleration. No server required.
100-150 tok/s with WebGPU · Models cached in IndexedDB · Fully private, runs locally
React Hooks
The easiest way to use Gerbil in React. Follows the same patterns as Vercel AI SDK.
useChat
Full chat with message history, thinking mode, and streaming:
```tsx
import { useChat } from "@tryhamster/gerbil/browser";

function Chat() {
  const {
    messages,        // Message[] with id, role, content, thinking?
    input,           // Current input value
    setInput,        // Update input
    handleSubmit,    // Form submit handler
    isLoading,       // Model loading
    loadingProgress, // { status, file?, progress? }
    isGenerating,    // Currently generating
    thinking,        // Current thinking content (streaming)
    stop,            // Stop generation
    clear,           // Clear messages
    tps,             // Tokens per second
    error,           // Error message
  } = useChat({
    model: "qwen3-0.6b",
    thinking: true,
    system: "You are a helpful assistant.",
    maxTokens: 512,
  });

  if (isLoading) {
    return <div>Loading model: {loadingProgress?.progress}%</div>;
  }

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.thinking && (
            <details>
              <summary>Thinking...</summary>
              <pre>{m.thinking}</pre>
            </details>
          )}
          <p><strong>{m.role}:</strong> {m.content}</p>
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={e => setInput(e.target.value)}
          disabled={isGenerating}
          placeholder="Ask anything..."
        />
        <button type="submit" disabled={isGenerating}>
          {isGenerating ? `${tps.toFixed(0)} tok/s` : "Send"}
        </button>
        {isGenerating && <button type="button" onClick={stop}>Stop</button>}
      </form>
    </div>
  );
}
```

useCompletion
One-off text generation without message history:
```tsx
import { useCompletion } from "@tryhamster/gerbil/browser";

function Generator() {
  const {
    complete,     // Function to generate text
    completion,   // Generated text (streaming)
    thinking,     // Thinking content (if enabled)
    isLoading,    // Model loading
    isGenerating, // Currently generating
    tps,          // Tokens per second
    stop,         // Stop generation
    error,        // Error message
  } = useCompletion({
    model: "qwen3-0.6b",
    thinking: true,
    maxTokens: 256,
  });

  if (isLoading) return <div>Loading...</div>;

  return (
    <div>
      <button
        onClick={() => complete("Write a haiku about coding")}
        disabled={isGenerating}
      >
        Generate
      </button>

      {thinking && (
        <details open>
          <summary>Thinking...</summary>
          <pre>{thinking}</pre>
        </details>
      )}

      <p>{completion}</p>

      {isGenerating && (
        <>
          <span>{tps.toFixed(0)} tok/s</span>
          <button onClick={stop}>Stop</button>
        </>
      )}
    </div>
  );
}
```

Hook Options
```ts
interface UseChatOptions {
  model?: string;       // Model ID (default: "qwen3-0.6b")
  autoLoad?: boolean;   // Load model on mount (default: false)
  thinking?: boolean;   // Enable thinking mode (Qwen3)
  system?: string;      // System prompt
  maxTokens?: number;   // Max tokens (default: 256)
  temperature?: number; // Temperature (default: 0.7)
  topP?: number;        // Top-p sampling
  topK?: number;        // Top-k sampling
}

// useCompletion uses the same options
```

Lazy Loading (Default)
By default, models load on first generation - not on page load. This prevents surprise downloads:
```tsx
// Default: model loads when user first submits
const { handleSubmit, isLoading } = useChat();

// Preload on mount (downloads immediately)
const { handleSubmit, isLoading } = useChat({ autoLoad: true });

// Manual control with load()
const { load, isLoading, isReady } = useChat();

return (
  <button onClick={load} disabled={isLoading || isReady}>
    {isLoading ? "Loading..." : isReady ? "Ready" : "Load Model"}
  </button>
);
```

Loading Progress States
The loadingProgress object tells you exactly what's happening during model load:
```tsx
// loadingProgress.status values:

"downloading" // Fetching from network (first time)
              // Has: file, progress (0-100)

"loading"     // Loading from IndexedDB cache (fast)
              // No additional properties

"ready"       // Model ready for inference
              // No additional properties

"error"       // Load failed
              // Has: error message

// Example usage:
if (loadingProgress?.status === "downloading") {
  return <div>Downloading {loadingProgress.file}: {loadingProgress.progress}%</div>;
}
if (loadingProgress?.status === "loading") {
  return <div>Loading from cache...</div>;
}
```

Message Type
```ts
interface Message {
  id: string;                 // Unique ID (e.g., "msg-1")
  role: "user" | "assistant"; // Message role
  content: string;            // The message content
  thinking?: string;          // Thinking content (if enabled)
}
```

Low-Level API
For non-React apps or custom implementations, use createGerbilWorker directly:
```ts
import { createGerbilWorker, isWebGPUSupported } from "@tryhamster/gerbil/browser";

// Check WebGPU support
if (!isWebGPUSupported()) {
  console.log("WebGPU not supported - use Chrome/Edge 113+");
  return;
}

// Create worker (loads model automatically)
const gerbil = await createGerbilWorker({
  modelId: "qwen3-0.6b",
  onProgress: (p) => {
    if (p.status === "downloading") {
      console.log(`Downloading ${p.file}: ${p.progress}%`);
    } else if (p.status === "loading") {
      console.log("Loading from cache...");
    }
  },
  onToken: (token) => {
    // token.text - the token text
    // token.state - "thinking" or "answering"
    // token.tps - tokens per second
    process.stdout.write(token.text);
  },
  onComplete: (result) => {
    console.log(`Done: ${result.tps.toFixed(1)} tok/s`);
  },
});

// Generate
await gerbil.generate("Write a haiku", { thinking: true });

// Interrupt
gerbil.interrupt();

// Reset conversation
gerbil.reset();

// Clean up
gerbil.terminate();
```

Utilities
Helper functions to check compatibility, optimize for device capabilities, and debug issues.
```ts
import {
  // Basic checks
  isWebGPUSupported,       // Quick boolean check
  getWebGPUInfo,           // GPU adapter info

  // Production-ready checks
  checkWebGPUReady,        // Full WebGPU verification
  getRecommendedModels,    // Memory-aware model selection
  checkStorageQuota,       // Verify disk space
  checkWebGPUCapabilities, // GPU buffer limits

  // Debugging
  getBrowserDiagnostics,   // Full diagnostic report
} from "@tryhamster/gerbil/browser";
```

isWebGPUSupported()
Check if the browser supports WebGPU:
```ts
import { isWebGPUSupported } from "@tryhamster/gerbil/browser";

if (!isWebGPUSupported()) {
  // Show fallback UI or error message
  alert("Please use Chrome or Edge 113+ for WebGPU support");
}
```

getWebGPUInfo()
Get GPU adapter information for debugging:
```ts
import { getWebGPUInfo } from "@tryhamster/gerbil/browser";

const info = await getWebGPUInfo();
console.log(info);
// { supported: true, adapter: "Apple", device: "Apple M4 Max" }
```

checkWebGPUReady()
Full WebGPU verification — checks not just if the API exists, but if it actually works:
```ts
import { checkWebGPUReady } from "@tryhamster/gerbil/browser";

const result = await checkWebGPUReady();
// {
//   ok: true,
//   webgpu: true,
//   adapter: { vendor: "apple", architecture: "common-3", device: "", description: "" },
//   reason: "WebGPU is ready"
// }

if (!result.ok) {
  console.warn(result.reason); // Human-readable explanation
  // Falls back to WASM automatically
}
```

getRecommendedModels()
Memory-aware model selection based on navigator.deviceMemory:
```ts
import { getRecommendedModels } from "@tryhamster/gerbil/browser";

const models = getRecommendedModels();
// {
//   chat: "qwen3-0.6b",     // or "smollm2-360m" on low-memory devices
//   vision: "ministral-3b", // or null if not enough memory
//   tts: "kokoro-82m",
//   stt: "whisper-base.en", // or "whisper-tiny.en" on low-memory
//   embedding: "Xenova/all-MiniLM-L6-v2",
//   reason: "8GB+ detected, using full models"
// }

// Use for smart defaults
const { messages } = useChat({ model: models.chat });
```

Mobile: On mobile devices, Gerbil automatically uses q4 quantization (CPU-optimized) instead of q4f16 (GPU-optimized) for better compatibility and performance.
checkStorageQuota()
Verify available storage before downloading a large model:
```ts
import { checkStorageQuota } from "@tryhamster/gerbil/browser";

// Check if we have 700MB available (for qwen3-0.6b)
const storage = await checkStorageQuota(700);
// {
//   ok: true,
//   available: 4500, // MB available
//   required: 700,   // MB requested
//   message: "4.5GB available, 700MB required"
// }

if (!storage.ok) {
  alert(storage.message); // "Only 200MB available, need 700MB"
  return;
}
```

checkWebGPUCapabilities()
Check if the GPU can run a specific model (buffer size limits, etc.):
```ts
import { checkWebGPUCapabilities } from "@tryhamster/gerbil/browser";

const caps = await checkWebGPUCapabilities("qwen3-0.6b");
// {
//   canRunModel: true,
//   maxBufferSize: 2147483648,     // 2GB
//   requiredBufferSize: 500000000, // ~500MB for qwen3
//   reason: "GPU buffer size sufficient"
// }

if (!caps.canRunModel) {
  console.warn(caps.reason);
  // Use smaller model or fall back to WASM
}
```

getBrowserDiagnostics()
Comprehensive diagnostic info for debugging compatibility issues:
```ts
import { getBrowserDiagnostics } from "@tryhamster/gerbil/browser";

const diag = await getBrowserDiagnostics();
// {
//   browser: "Chrome",
//   version: "120.0.0",
//   platform: "macOS",
//   mobile: false,
//   webgpu: {
//     supported: true,
//     adapter: { vendor: "apple", ... }
//   },
//   memory: {
//     deviceMemory: 8,   // GB (from navigator.deviceMemory)
//     jsHeapLimit: 4096  // MB
//   },
//   storage: {
//     available: 4500,   // MB
//     persistent: true
//   }
// }

// Useful for error reporting
console.log("Diagnostics:", JSON.stringify(diag, null, 2));
```

Model Preloading
Download models ahead of time during app initialization, so users don't wait when they first use AI. These functions work outside React hooks — perfect for app startup.
```ts
import {
  preloadChatModel,
  preloadEmbeddingModel,
  preloadTTSModel,
  preloadSTTModel
} from "@tryhamster/gerbil/browser";

// During app initialization (before React mounts)
async function initApp() {
  // Preload LLM with progress tracking
  await preloadChatModel("qwen3-0.6b", {
    onProgress: (p) => {
      if (p.status === "downloading") {
        console.log(`Downloading ${p.file}: ${p.progress}%`);
      }
    },
  });

  // Preload other models (all run in parallel)
  await Promise.all([
    preloadEmbeddingModel(),            // default: Xenova/all-MiniLM-L6-v2
    preloadTTSModel("kokoro-82m"),      // or "supertonic-66m"
    preloadSTTModel("whisper-tiny.en"),
  ]);

  console.log("All models ready!");
}

// Call during app startup
initApp();
```

Preload Functions
| Function | Default Model | Description |
|---|---|---|
| preloadChatModel(modelId, opts?) | — | Preload LLM to IndexedDB |
| preloadEmbeddingModel(modelId?, opts?) | Xenova/all-MiniLM-L6-v2 | Preload embedding model |
| preloadTTSModel(modelId?, opts?) | kokoro-82m | Preload text-to-speech model |
| preloadSTTModel(modelId?, opts?) | whisper-tiny.en | Preload speech-to-text model |
Preload Options
```ts
interface PreloadOptions {
  // Track download progress
  onProgress?: (p: PreloadProgress) => void;

  // Keep model loaded in memory after preload (default: false)
  // false = download, then dispose to free RAM
  // true  = download and keep in memory for instant use
  keepLoaded?: boolean;
}

type PreloadProgress = {
  status: "downloading" | "loading" | "ready" | "error";
  file?: string;     // Current file being downloaded
  progress?: number; // 0-100 percentage
  message?: string;  // Status message
};
```

keepLoaded Option
Control whether the model stays in memory after preloading:
| Value | Behavior | Use Case |
|---|---|---|
| false | Download → Dispose → Free memory | Preload for later, save RAM |
| true | Download → Keep in memory | Instant use, no disk I/O delay |
```ts
// Download only - frees RAM after preload (~400MB saved)
await preloadChatModel("qwen3-0.6b");
// Later: loads from IndexedDB cache (~1-2s)

// Keep in memory - uses RAM but instant inference
await preloadChatModel("qwen3-0.6b", { keepLoaded: true });
// Later: model already loaded, no wait
```

Browser Models
Models optimized for browser use. Automatically cached in IndexedDB after first download.
| Model | Size | Speed | Best For |
|---|---|---|---|
| qwen3-0.6b | ~400MB | 100-150 tok/s | General use, thinking mode, reasoning |
| smollm2-360m | ~250MB | 150-200 tok/s | Faster responses, good quality |
| smollm2-135m | ~100MB | 200-300 tok/s | Fastest, basic tasks |
Browser Support
| Browser | Version | Status |
|---|---|---|
| Chrome / Edge | 113+ | ✓ Full support |
| Safari | 18+ | ⚠ May have quirks |
| Firefox | — | ✗ Behind flag, not recommended |
iOS Memory Guards
iOS Safari and iOS Chrome have strict memory limits (~300-400MB effective for WKWebView). All React hooks automatically protect against iOS crashes — no code changes required.
Automatic Protection: All hooks block large models on iOS, detect crashes, and use chunked resumable downloads.
What Happens Automatically
| Hook | iOS Guard | Crash Detect | Chunked DL |
|---|---|---|---|
| useChat | ✓ Blocks large | ✓ | ✓ Resumable |
| useCompletion | ✓ Blocks large | ✓ | ✓ Resumable |
| useSpeech | — | ✓ | ✓ Resumable |
| useVoiceInput | — | ✓ | ✓ Resumable |
| useEmbedding | — | ✓ | ✓ Resumable |
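If a blocked model surfaces in your UI, the simplest recovery is to fall back to an iOS-safe default. Below is a minimal sketch combining useChat with getRecommendedModels; the assumption that the block shows up through the hook's error field is illustrative, and the exact message may differ:

```tsx
import { useChat, getRecommendedModels } from "@tryhamster/gerbil/browser";

function SafeChat() {
  // Memory-aware defaults already pick a smaller chat model on low-memory devices
  const models = getRecommendedModels();
  const { messages, error } = useChat({ model: models.chat });

  // If the iOS guard still refused the model, surface the reason instead of crashing
  if (error) return <div>Could not load model: {error}</div>;

  return <div>{messages.length} messages so far</div>;
}
```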
iOS Compatibility Matrix
| Model | Size | iOS Safe | Notes |
|---|---|---|---|
| smollm2-135m | ~150MB | ✓ Yes | Best for iOS |
| smollm2-360m | ~400MB | ✓ Yes | Recommended for iOS |
| qwen3-0.6b | ~700MB | ⚠ Risky | Only on iPhone 14+/iPad Pro |
| qwen3-1.7b | ~1.8GB | ✗ Blocked | Desktop only |
| kokoro-82m | ~350MB | ✓ Yes | TTS |
| whisper-tiny | ~150MB | ✓ Yes | STT |
Manual Utilities (Advanced)
For custom implementations or advanced control, these utilities are available:
```ts
import {
  isModelSafeForDevice,    // Check if model is safe for current device
  detectMemoryCrash,       // Check if previous session crashed
  setDownloadPhase,        // Track download phase for crash detection
  clearDownloadPhase,      // Clear phase on success
  downloadModelChunked,    // Resumable chunked downloads
  hasIncompleteDownload,   // Check for interrupted downloads
  clearIncompleteDownload, // Clear partial download
} from "@tryhamster/gerbil/browser";

// Check model safety before loading
const check = isModelSafeForDevice("qwen3-1.7b");
if (!check.safe) {
  console.log(check.reason);         // "Model is too large for iOS..."
  console.log(check.recommendation); // "Use smollm2-360m or qwen3-0.6b"
  console.log(check.maxSafeModel);   // "qwen3-0.6b"
}

// Detect if page crashed during previous model load
const crash = detectMemoryCrash();
if (crash.crashed) {
  console.log(crash.recommendation); // "The model was too large..."
  console.log(crash.phase);          // "downloading" | "initializing"
  console.log(crash.modelId);        // which model caused it
}
```

Chunked Resumable Downloads
Model downloads automatically use chunked downloading with resume support. If a download is interrupted (page refresh, crash, network error), it resumes from where it left off. For manual control:
```ts
import {
  downloadModelChunked,
  hasIncompleteDownload,
  clearIncompleteDownload
} from "@tryhamster/gerbil/browser";

// Check for interrupted downloads
const incomplete = await hasIncompleteDownload("qwen3-0.6b");
if (incomplete.incomplete) {
  console.log(`Resuming: ${incomplete.percent}% complete`);
}

// Download with progress and resume support
const abortController = new AbortController();
const buffer = await downloadModelChunked(
  "https://huggingface.co/...",
  "qwen3-0.6b",
  {
    onProgress: (info) => {
      console.log(`${info.phase}: ${info.percent}%`);
    },
    signal: abortController.signal,
  }
);
```

| Feature | Description |
|---|---|
| HTTP Range requests | Downloads in 1.5MB chunks using Range: bytes=start-end |
| IndexedDB storage | Each chunk stored separately to avoid large transaction spikes |
| Automatic resume | Tracks completed chunks in manifest, resumes from last position |
| ETag validation | Clears cached chunks if model version changes |
| Abort support | Cancel downloads gracefully with AbortController |
| Fallback | Falls back to regular download if server doesn't support Range |
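To cancel an in-flight download and discard the partial chunks rather than resuming later, abort the controller and clear the incomplete entry. A minimal sketch built on the imports above; passing the model ID to clearIncompleteDownload mirrors hasIncompleteDownload and is an assumption:

```ts
import { downloadModelChunked, clearIncompleteDownload } from "@tryhamster/gerbil/browser";

const controller = new AbortController();

// Hypothetical cancel button in the page
document.querySelector("#cancel")?.addEventListener("click", () => controller.abort());

try {
  await downloadModelChunked("https://huggingface.co/...", "qwen3-0.6b", {
    signal: controller.signal,
  });
} catch (err) {
  // On cancel (or failure), drop the partial chunks instead of keeping them for resume
  await clearIncompleteDownload("qwen3-0.6b");
}
```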
Troubleshooting
"WebGPU not supported"
- Update to Chrome/Edge 113+
- Check chrome://gpu for WebGPU status
- Try enabling chrome://flags/#enable-unsafe-webgpu
Slow first load
First load downloads the model (~400MB for qwen3-0.6b) and compiles WebGPU shaders. Subsequent loads use IndexedDB cache and are much faster (~2-5s).
Out of memory
Smaller models like smollm2-135m use less GPU memory. Close other GPU-intensive tabs.
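One way to apply this automatically is to check the GPU's buffer limits before choosing a model. A short sketch using checkWebGPUCapabilities from the utilities above; the specific fallback choice is illustrative:

```ts
import { checkWebGPUCapabilities } from "@tryhamster/gerbil/browser";

// Prefer qwen3-0.6b, but fall back to the smallest model if the GPU can't hold it
const caps = await checkWebGPUCapabilities("qwen3-0.6b");
const model = caps.canRunModel ? "qwen3-0.6b" : "smollm2-135m";
```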
CORS / Header issues
Your server needs these headers for SharedArrayBuffer (required for threading):
```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
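For servers other than Next.js, the same two headers can be set in middleware. A minimal sketch for Express (Express is an assumption here, not part of Gerbil):

```ts
import express from "express";

const app = express();

// Required for SharedArrayBuffer (cross-origin isolation)
app.use((req, res, next) => {
  res.setHeader("Cross-Origin-Opener-Policy", "same-origin");
  res.setHeader("Cross-Origin-Embedder-Policy", "require-corp");
  next();
});

app.use(express.static("dist")); // serve your built frontend
app.listen(3000);
```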
Next.js Configuration

Add the required headers and webpack config for Next.js:
```js
// next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  async headers() {
    return [
      {
        source: "/(.*)",
        headers: [
          { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
          { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
        ],
      },
    ];
  },
  webpack: (config, { isServer }) => {
    config.experiments = {
      ...config.experiments,
      asyncWebAssembly: true,
    };

    if (isServer) {
      config.externals.push("@huggingface/transformers");
    } else {
      // Exclude Node.js polyfills from browser bundle
      config.resolve.alias = {
        ...config.resolve.alias,
        webgpu: false,
      };
      config.resolve.fallback = {
        ...config.resolve.fallback,
        path: false,
        fs: false,
        os: false,
      };
    }

    return config;
  },
};

module.exports = nextConfig;
```

Next Steps
- React Hooks Reference → — useSpeech, useVoiceInput, useVoiceChat
- Text-to-Speech → — generate natural speech in the browser
- Speech-to-Text → — transcribe audio with Whisper
- Vision AI → — analyze images in the browser