Browser Usage
Run LLMs directly in the browser with WebGPU acceleration. No server required.
100-150 tok/s with WebGPU · Models cached in IndexedDB · Fully private, runs locally
React Hooks
The easiest way to use Gerbil in React. Follows the same patterns as the Vercel AI SDK.
useChat
Full chat with message history, thinking mode, and streaming:
```tsx
import { useChat } from "@tryhamster/gerbil/browser";

function Chat() {
  const {
    messages,        // Message[] with id, role, content, thinking?
    input,           // Current input value
    setInput,        // Update input
    handleSubmit,    // Form submit handler
    isLoading,       // Model loading
    loadingProgress, // { status, file?, progress? }
    isGenerating,    // Currently generating
    thinking,        // Current thinking content (streaming)
    stop,            // Stop generation
    clear,           // Clear messages
    tps,             // Tokens per second
    error,           // Error message
  } = useChat({
    model: "qwen3-0.6b",
    thinking: true,
    system: "You are a helpful assistant.",
    maxTokens: 512,
  });

  if (isLoading) {
    return <div>Loading model: {loadingProgress?.progress}%</div>;
  }

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.thinking && (
            <details>
              <summary>Thinking...</summary>
              <pre>{m.thinking}</pre>
            </details>
          )}
          <p><strong>{m.role}:</strong> {m.content}</p>
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={e => setInput(e.target.value)}
          disabled={isGenerating}
          placeholder="Ask anything..."
        />
        <button type="submit" disabled={isGenerating}>
          {isGenerating ? `${tps.toFixed(0)} tok/s` : "Send"}
        </button>
        {isGenerating && <button type="button" onClick={stop}>Stop</button>}
      </form>
    </div>
  );
}
```
useCompletion
One-off text generation without message history:
```tsx
import { useCompletion } from "@tryhamster/gerbil/browser";

function Generator() {
  const {
    complete,     // Function to generate text
    completion,   // Generated text (streaming)
    thinking,     // Thinking content (if enabled)
    isLoading,    // Model loading
    isGenerating, // Currently generating
    tps,          // Tokens per second
    stop,         // Stop generation
    error,        // Error message
  } = useCompletion({
    model: "qwen3-0.6b",
    thinking: true,
    maxTokens: 256,
  });

  if (isLoading) return <div>Loading...</div>;

  return (
    <div>
      <button
        onClick={() => complete("Write a haiku about coding")}
        disabled={isGenerating}
      >
        Generate
      </button>

      {thinking && (
        <details open>
          <summary>Thinking...</summary>
          <pre>{thinking}</pre>
        </details>
      )}

      <p>{completion}</p>

      {isGenerating && (
        <>
          <span>{tps.toFixed(0)} tok/s</span>
          <button onClick={stop}>Stop</button>
        </>
      )}
    </div>
  );
}
```
Hook Options
```ts
interface UseChatOptions {
  model?: string;       // Model ID (default: "qwen3-0.6b")
  autoLoad?: boolean;   // Load model on mount (default: false)
  thinking?: boolean;   // Enable thinking mode (Qwen3)
  system?: string;      // System prompt
  maxTokens?: number;   // Max tokens (default: 256)
  temperature?: number; // Temperature (default: 0.7)
  topP?: number;        // Top-p sampling
  topK?: number;        // Top-k sampling
}

// useCompletion uses the same options
```
Lazy Loading (Default)
By default, models load on first generation - not on page load. This prevents surprise downloads:
```tsx
// Default: model loads when user first submits
const { handleSubmit, isLoading } = useChat();

// Preload on mount (downloads immediately)
const { handleSubmit, isLoading } = useChat({ autoLoad: true });

// Manual control with load()
const { load, isLoading, isReady } = useChat();

return (
  <button onClick={load} disabled={isLoading || isReady}>
    {isLoading ? "Loading..." : isReady ? "Ready" : "Load Model"}
  </button>
);
```
Loading Progress States
The loadingProgress object tells you exactly what's happening during model load:
```tsx
// loadingProgress.status values:
"downloading" // Fetching from network (first time)
              // Has: file, progress (0-100)
"loading"     // Loading from IndexedDB cache (fast)
              // No additional properties
"ready"       // Model ready for inference
              // No additional properties
"error"       // Load failed
              // Has: error message

// Example usage:
if (loadingProgress?.status === "downloading") {
  return <div>Downloading {loadingProgress.file}: {loadingProgress.progress}%</div>;
}
if (loadingProgress?.status === "loading") {
  return <div>Loading from cache...</div>;
}
```
Message Type
```ts
interface Message {
  id: string;                 // Unique ID (e.g., "msg-1")
  role: "user" | "assistant"; // Message role
  content: string;            // The message content
  thinking?: string;          // Thinking content (if enabled)
}
```
Low-Level API
For non-React apps or custom implementations, use createGerbilWorker directly:
```ts
import { createGerbilWorker, isWebGPUSupported } from "@tryhamster/gerbil/browser";

// Check WebGPU support
if (!isWebGPUSupported()) {
  console.log("WebGPU not supported - use Chrome/Edge 113+");
  return;
}

// Create worker (loads model automatically)
const gerbil = await createGerbilWorker({
  modelId: "qwen3-0.6b",
  onProgress: (p) => {
    if (p.status === "downloading") {
      console.log(`Downloading ${p.file}: ${p.progress}%`);
    } else if (p.status === "loading") {
      console.log("Loading from cache...");
    }
  },
  onToken: (token) => {
    // token.text - the token text
    // token.state - "thinking" or "answering"
    // token.tps - tokens per second
    process.stdout.write(token.text);
  },
  onComplete: (result) => {
    console.log(`Done: ${result.tps.toFixed(1)} tok/s`);
  },
});

// Generate
await gerbil.generate("Write a haiku", { thinking: true });

// Interrupt
gerbil.interrupt();

// Reset conversation
gerbil.reset();

// Clean up
gerbil.terminate();
```
Utilities
isWebGPUSupported()
Check if the browser supports WebGPU:
```ts
import { isWebGPUSupported } from "@tryhamster/gerbil/browser";

if (!isWebGPUSupported()) {
  // Show fallback UI or error message
  alert("Please use Chrome or Edge 113+ for WebGPU support");
}
```
getWebGPUInfo()
Get GPU adapter information for debugging:
```ts
import { getWebGPUInfo } from "@tryhamster/gerbil/browser";

const info = await getWebGPUInfo();
console.log(info);
// { supported: true, adapter: "Apple", device: "Apple M4 Max" }
```
Model Preloading
Download models ahead of time during app initialization, so users don't wait when they first use AI. These functions work outside React hooks — perfect for app startup.
```ts
import {
  preloadChatModel,
  preloadEmbeddingModel,
  preloadTTSModel,
  preloadSTTModel
} from "@tryhamster/gerbil/browser";

// During app initialization (before React mounts)
async function initApp() {
  // Preload LLM with progress tracking
  await preloadChatModel("qwen3-0.6b", {
    onProgress: (p) => {
      if (p.status === "downloading") {
        console.log(`Downloading ${p.file}: ${p.progress}%`);
      }
    },
  });

  // Preload other models (all run in parallel)
  await Promise.all([
    preloadEmbeddingModel(),            // default: Xenova/all-MiniLM-L6-v2
    preloadTTSModel("kokoro-82m"),      // or "supertonic-66m"
    preloadSTTModel("whisper-tiny.en"),
  ]);

  console.log("All models ready!");
}

// Call during app startup
initApp();
```
Preload Functions
| Function | Default Model | Description |
|---|---|---|
| preloadChatModel(modelId, opts?) | — | Preload LLM to IndexedDB |
| preloadEmbeddingModel(modelId?, opts?) | Xenova/all-MiniLM-L6-v2 | Preload embedding model |
| preloadTTSModel(modelId?, opts?) | kokoro-82m | Preload text-to-speech model |
| preloadSTTModel(modelId?, opts?) | whisper-tiny.en | Preload speech-to-text model |
Preload Options
```ts
interface PreloadOptions {
  // Track download progress
  onProgress?: (p: PreloadProgress) => void;

  // Keep model loaded in memory after preload (default: false)
  // false = download, then dispose to free RAM
  // true  = download and keep in memory for instant use
  keepLoaded?: boolean;
}

type PreloadProgress = {
  status: "downloading" | "loading" | "ready" | "error";
  file?: string;     // Current file being downloaded
  progress?: number; // 0-100 percentage
  message?: string;  // Status message
};
```
keepLoaded Option
Control whether the model stays in memory after preloading:
| Value | Behavior | Use Case |
|---|---|---|
| false | Download → Dispose → Free memory | Preload for later, save RAM |
| true | Download → Keep in memory | Instant use, no disk I/O delay |
```ts
// Download only - frees RAM after preload (~400MB saved)
await preloadChatModel("qwen3-0.6b");
// Later: loads from IndexedDB cache (~1-2s)

// Keep in memory - uses RAM but instant inference
await preloadChatModel("qwen3-0.6b", { keepLoaded: true });
// Later: model already loaded, no wait
```
Browser Models
Models optimized for browser use. Automatically cached in IndexedDB after first download.
| Model | Size | Speed | Best For |
|---|---|---|---|
| qwen3-0.6b | ~400MB | 100-150 tok/s | General use, thinking mode, reasoning |
| smollm2-360m | ~250MB | 150-200 tok/s | Faster responses, good quality |
| smollm2-135m | ~100MB | 200-300 tok/s | Fastest, basic tasks |
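To trade quality for speed, pass one of these IDs as the model option on the hooks (or modelId on the low-level worker). A minimal sketch using the smallest model from the table; the FastChat component and its placeholder render are illustrative:

```tsx
import { useChat } from "@tryhamster/gerbil/browser";

function FastChat() {
  // smollm2-135m: smallest download and lowest GPU memory use
  const chat = useChat({ model: "smollm2-135m", maxTokens: 256 });

  // Render chat.messages and wire chat.handleSubmit exactly as in the
  // useChat example above; only the model choice differs.
  return null; // placeholder render
}
```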
Browser Support
| Browser | Version | Status |
|---|---|---|
| Chrome / Edge | 113+ | ✓ Full support |
| Safari | 18+ | ⚠ May have quirks |
| Firefox | — | ✗ Behind flag, not recommended |
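Because support still varies, it is safer to gate AI features on a runtime check than on user-agent sniffing. A minimal sketch using the utilities documented above (the helper name is illustrative):

```ts
import { isWebGPUSupported, getWebGPUInfo } from "@tryhamster/gerbil/browser";

async function canRunLocalModels(): Promise<boolean> {
  // Firefox and older browsers land here: show a fallback UI instead
  if (!isWebGPUSupported()) return false;

  // Optional: log the adapter, useful when debugging Safari quirks
  const info = await getWebGPUInfo();
  console.log(`WebGPU adapter: ${info.adapter} (${info.device})`);
  return true;
}
```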
Troubleshooting
"WebGPU not supported"
- Update to Chrome/Edge 113+
- Check chrome://gpu for WebGPU status
- Try enabling chrome://flags/#enable-unsafe-webgpu
Slow first load
First load downloads the model (~400MB for qwen3-0.6b) and compiles WebGPU shaders. Subsequent loads use IndexedDB cache and are much faster (~2-5s).
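To hide that first download, one option is to warm the cache with preloadChatModel (see Model Preloading above) while the user is still on a landing or settings screen. A short sketch:

```ts
import { preloadChatModel } from "@tryhamster/gerbil/browser";

// Kick off the download early, e.g. when the app shell mounts.
// The chat hook will later load from the IndexedDB cache in a few seconds.
preloadChatModel("qwen3-0.6b", {
  onProgress: (p) => {
    if (p.status === "downloading") {
      console.log(`Warming model cache: ${p.progress}%`);
    }
  },
});
```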
Out of memory
Smaller models like smollm2-135m use less GPU memory. Close other GPU-intensive tabs.
CORS / Header issues
Your server needs these headers for SharedArrayBuffer (required for threading):
```
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
```
Next.js Configuration
Add the required headers and webpack config for Next.js:
```js
// next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  async headers() {
    return [
      {
        source: "/(.*)",
        headers: [
          { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
          { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
        ],
      },
    ];
  },
  webpack: (config, { isServer }) => {
    config.experiments = {
      ...config.experiments,
      asyncWebAssembly: true,
    };

    if (isServer) {
      config.externals.push("@huggingface/transformers");
    } else {
      // Exclude Node.js polyfills from browser bundle
      config.resolve.alias = {
        ...config.resolve.alias,
        webgpu: false,
      };
      config.resolve.fallback = {
        ...config.resolve.fallback,
        path: false,
        fs: false,
        os: false,
      };
    }

    return config;
  },
};

module.exports = nextConfig;
```
Next Steps
- React Hooks Reference → useSpeech, useVoiceInput, useVoiceChat
- Text-to-Speech → generate natural speech in the browser
- Speech-to-Text → transcribe audio with Whisper
- Vision AI → analyze images in the browser