Browser Usage

Run LLMs directly in the browser with WebGPU acceleration. No server required.

100-150 tok/s with WebGPU · Models cached in IndexedDB · Fully private, runs locally

React Hooks

The easiest way to use Gerbil in React. Follows the same patterns as the Vercel AI SDK.

useChat

Full chat with message history, thinking mode, and streaming:

Chat.tsx
import { useChat } from "@tryhamster/gerbil/browser";

function Chat() {
  const {
    messages,        // Message[] with id, role, content, thinking?
    input,           // Current input value
    setInput,        // Update input
    handleSubmit,    // Form submit handler
    isLoading,       // Model loading
    loadingProgress, // { status, file?, progress? }
    isGenerating,    // Currently generating
    thinking,        // Current thinking content (streaming)
    stop,            // Stop generation
    clear,           // Clear messages
    tps,             // Tokens per second
    error,           // Error message
  } = useChat({
    model: "qwen3-0.6b",
    thinking: true,
    system: "You are a helpful assistant.",
    maxTokens: 512,
  });

  if (isLoading) {
    return <div>Loading model: {loadingProgress?.progress}%</div>;
  }

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.thinking && (
            <details>
              <summary>Thinking...</summary>
              <pre>{m.thinking}</pre>
            </details>
          )}
          <p><strong>{m.role}:</strong> {m.content}</p>
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={e => setInput(e.target.value)}
          disabled={isGenerating}
          placeholder="Ask anything..."
        />
        <button type="submit" disabled={isGenerating}>
          {isGenerating ? `${tps.toFixed(0)} tok/s` : "Send"}
        </button>
        {isGenerating && <button type="button" onClick={stop}>Stop</button>}
      </form>
    </div>
  );
}

useCompletion

One-off text generation without message history:

Generator.tsx
import { useCompletion } from "@tryhamster/gerbil/browser";

function Generator() {
  const {
    complete,     // Function to generate text
    completion,   // Generated text (streaming)
    thinking,     // Thinking content (if enabled)
    isLoading,    // Model loading
    isGenerating, // Currently generating
    tps,          // Tokens per second
    stop,         // Stop generation
    error,        // Error message
  } = useCompletion({
    model: "qwen3-0.6b",
    thinking: true,
    maxTokens: 256,
  });

  if (isLoading) return <div>Loading...</div>;

  return (
    <div>
      <button
        onClick={() => complete("Write a haiku about coding")}
        disabled={isGenerating}
      >
        Generate
      </button>

      {thinking && (
        <details open>
          <summary>Thinking...</summary>
          <pre>{thinking}</pre>
        </details>
      )}

      <p>{completion}</p>

      {isGenerating && (
        <>
          <span>{tps.toFixed(0)} tok/s</span>
          <button onClick={stop}>Stop</button>
        </>
      )}
    </div>
  );
}

Hook Options

types.ts
interface UseChatOptions {
  model?: string;       // Model ID (default: "qwen3-0.6b")
  autoLoad?: boolean;   // Load model on mount (default: false)
  thinking?: boolean;   // Enable thinking mode (Qwen3)
  system?: string;      // System prompt
  maxTokens?: number;   // Max tokens (default: 256)
  temperature?: number; // Temperature (default: 0.7)
  topP?: number;        // Top-p sampling
  topK?: number;        // Top-k sampling
}

// useCompletion accepts the same options
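
For example, you might tune a completion hook for shorter, more deterministic output by passing the sampling options explicitly. A minimal sketch (the values below are illustrative, not recommended defaults):

options-example.ts
// Sketch: tuning generation via hook options (values are illustrative, not defaults)
import { useCompletion } from "@tryhamster/gerbil/browser";

const { complete, completion } = useCompletion({
  model: "qwen3-0.6b",
  maxTokens: 128,   // cap output length
  temperature: 0.2, // lower temperature = more deterministic output
  topP: 0.9,
});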

Lazy Loading (Default)

By default, models load on first generation, not on page load. This prevents surprise downloads:

loading.tsx
// Default: model loads when the user first submits
const { handleSubmit, isLoading } = useChat();

// Preload on mount (downloads immediately)
const { handleSubmit, isLoading } = useChat({ autoLoad: true });

// Manual control with load()
function LoadButton() {
  const { load, isLoading, isReady } = useChat();
  return (
    <button onClick={load} disabled={isLoading || isReady}>
      {isLoading ? "Loading..." : isReady ? "Ready" : "Load Model"}
    </button>
  );
}

Loading Progress States

The loadingProgress object tells you exactly what's happening during model load:

loading-states.ts
// loadingProgress.status values:
//   "downloading" - fetching from network (first time); has file, progress (0-100)
//   "loading"     - loading from IndexedDB cache (fast); no additional properties
//   "ready"       - model ready for inference; no additional properties
//   "error"       - load failed; has error message

// Example usage:
if (loadingProgress?.status === "downloading") {
  return <div>Downloading {loadingProgress.file}: {loadingProgress.progress}%</div>;
}
if (loadingProgress?.status === "loading") {
  return <div>Loading from cache...</div>;
}

Message Type

types.ts
interface Message {
  id: string;                 // Unique ID (e.g., "msg-1")
  role: "user" | "assistant"; // Message role
  content: string;            // The message content
  thinking?: string;          // Thinking content (if enabled)
}
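
Because messages carry plain strings, exporting or logging a conversation needs nothing beyond this shape. A minimal sketch (toTranscript is a hypothetical helper, not a Gerbil API):

transcript.ts
// Sketch: serializing a chat using only the Message shape above
// (toTranscript is a hypothetical helper, not part of the library)
function toTranscript(messages: Message[]): string {
  return messages.map((m) => `${m.role}: ${m.content}`).join("\n");
}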

Low-Level API

For non-React apps or custom implementations, use createGerbilWorker directly:

vanilla.ts
import { createGerbilWorker, isWebGPUSupported } from "@tryhamster/gerbil/browser";

// Check WebGPU support
if (!isWebGPUSupported()) {
  throw new Error("WebGPU not supported - use Chrome/Edge 113+");
}

// Create worker (loads model automatically)
const gerbil = await createGerbilWorker({
  modelId: "qwen3-0.6b",
  onProgress: (p) => {
    if (p.status === "downloading") {
      console.log(`Downloading ${p.file}: ${p.progress}%`);
    } else if (p.status === "loading") {
      console.log("Loading from cache...");
    }
  },
  onToken: (token) => {
    // token.text - the token text
    // token.state - "thinking" or "answering"
    // token.tps - tokens per second
    document.body.append(token.text); // stream tokens into the page
  },
  onComplete: (result) => {
    console.log(`Done: ${result.tps.toFixed(1)} tok/s`);
  },
});

// Generate
await gerbil.generate("Write a haiku", { thinking: true });

// Interrupt
gerbil.interrupt();

// Reset conversation
gerbil.reset();

// Clean up
gerbil.terminate();

Utilities

isWebGPUSupported()

Check if the browser supports WebGPU:

check.ts
import { isWebGPUSupported } from "@tryhamster/gerbil/browser";

if (!isWebGPUSupported()) {
  // Show fallback UI or error message
  alert("Please use Chrome or Edge 113+ for WebGPU support");
}
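
In a React app, the same check can gate the chat UI behind a fallback. A minimal sketch (the component names and wording are illustrative):

fallback.tsx
// Sketch: only render the chat when WebGPU is available (names are illustrative)
import { isWebGPUSupported } from "@tryhamster/gerbil/browser";

function App() {
  if (!isWebGPUSupported()) {
    return <p>This feature needs WebGPU. Please use Chrome or Edge 113+.</p>;
  }
  return <Chat />; // the Chat component from the useChat example above
}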

getWebGPUInfo()

Get GPU adapter information for debugging:

info.ts
import { getWebGPUInfo } from "@tryhamster/gerbil/browser";
const info = await getWebGPUInfo();
console.log(info);
// { supported: true, adapter: "Apple", device: "Apple M4 Max" }
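
One common use is attaching the adapter info to bug reports so GPU-specific issues are easier to reproduce. A minimal sketch (the report shape is illustrative):

report.ts
// Sketch: include GPU info in a bug-report payload (shape is illustrative)
import { getWebGPUInfo } from "@tryhamster/gerbil/browser";

const info = await getWebGPUInfo();
const report = {
  message: "Generation failed",
  gpu: info, // e.g. { supported: true, adapter: "Apple", device: "Apple M4 Max" }
  userAgent: navigator.userAgent,
};
console.error("bug report", report); // or forward to your error-tracking service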

Model Preloading

Download models ahead of time during app initialization, so users don't wait when they first use AI. These functions work outside React hooks — perfect for app startup.

preload.ts
import {
  preloadChatModel,
  preloadEmbeddingModel,
  preloadTTSModel,
  preloadSTTModel
} from "@tryhamster/gerbil/browser";

// During app initialization (before React mounts)
async function initApp() {
  // Preload LLM with progress tracking
  await preloadChatModel("qwen3-0.6b", {
    onProgress: (p) => {
      if (p.status === "downloading") {
        console.log(`Downloading ${p.file}: ${p.progress}%`);
      }
    },
  });

  // Preload other models (all run in parallel)
  await Promise.all([
    preloadEmbeddingModel(),            // default: Xenova/all-MiniLM-L6-v2
    preloadTTSModel("kokoro-82m"),      // or "supertonic-66m"
    preloadSTTModel("whisper-tiny.en"),
  ]);

  console.log("All models ready!");
}

// Call during app startup
initApp();

Preload Functions

Function | Default Model | Description
preloadChatModel(modelId, opts?) | — | Preload LLM to IndexedDB
preloadEmbeddingModel(modelId?, opts?) | Xenova/all-MiniLM-L6-v2 | Preload embedding model
preloadTTSModel(modelId?, opts?) | kokoro-82m | Preload text-to-speech model
preloadSTTModel(modelId?, opts?) | whisper-tiny.en | Preload speech-to-text model

Preload Options

types.ts
interface PreloadOptions {
  // Track download progress
  onProgress?: (p: PreloadProgress) => void;

  // Keep model loaded in memory after preload (default: false)
  // false = download, then dispose to free RAM
  // true  = download and keep in memory for instant use
  keepLoaded?: boolean;
}

type PreloadProgress = {
  status: "downloading" | "loading" | "ready" | "error";
  file?: string;     // Current file being downloaded
  progress?: number; // 0-100 percentage
  message?: string;  // Status message
};

keepLoaded Option

Control whether the model stays in memory after preloading:

Value | Behavior | Use Case
false | Download → Dispose → Free memory | Preload for later, save RAM
true | Download → Keep in memory | Instant use, no disk I/O delay
keep-loaded.ts
// Download only - frees RAM after preload (~400MB saved)
await preloadChatModel("qwen3-0.6b");
// Later: loads from IndexedDB cache (~1-2s)

// Keep in memory - uses RAM but instant inference
await preloadChatModel("qwen3-0.6b", { keepLoaded: true });
// Later: model already loaded, no wait

Browser Models

Models optimized for browser use. Automatically cached in IndexedDB after first download.

Model | Size | Speed | Best For
qwen3-0.6b | ~400MB | 100-150 tok/s | General use, thinking mode, reasoning
smollm2-360m | ~250MB | 150-200 tok/s | Faster responses, good quality
smollm2-135m | ~100MB | 200-300 tok/s | Fastest, basic tasks
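
If you want to pick a model at runtime, one simple heuristic is to fall back to a smaller model on low-memory devices. A sketch (navigator.deviceMemory is a Chromium-only hint, and the thresholds are assumptions, not library recommendations):

pick-model.ts
// Sketch: choose a browser model by available device memory (illustrative thresholds)
import { useChat } from "@tryhamster/gerbil/browser";

function pickModel(): string {
  const gb = (navigator as any).deviceMemory ?? 8; // Chromium-only hint; assume 8GB if missing
  if (gb <= 2) return "smollm2-135m"; // smallest footprint
  if (gb <= 4) return "smollm2-360m";
  return "qwen3-0.6b";
}

const { handleSubmit } = useChat({ model: pickModel() });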

Browser Support

Browser | Version | Status
Chrome / Edge | 113+ | ✓ Full support
Safari | 18+ | ⚠ May have quirks
Firefox | — | ✗ Behind flag, not recommended

Troubleshooting

"WebGPU not supported"

  • Update to Chrome/Edge 113+
  • Check chrome://gpu for WebGPU status
  • Try enabling chrome://flags/#enable-unsafe-webgpu

Slow first load

First load downloads the model (~400MB for qwen3-0.6b) and compiles WebGPU shaders. Subsequent loads use IndexedDB cache and are much faster (~2-5s).
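
If that first-use wait matters, one option is to warm the cache while the user is still on an earlier screen. A sketch (requestIdleCallback is not available in every browser, hence the timer fallback):

warm-cache.ts
// Sketch: download the model while the page is idle so first use hits the IndexedDB cache
import { preloadChatModel } from "@tryhamster/gerbil/browser";

const warmUp = () => preloadChatModel("qwen3-0.6b").catch(console.error);

if ("requestIdleCallback" in window) {
  requestIdleCallback(warmUp);
} else {
  setTimeout(warmUp, 2000); // fallback for browsers without requestIdleCallback (e.g. Safari)
}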

Out of memory

Smaller models like smollm2-135m use less GPU memory. Close other GPU-intensive tabs.

CORS / Header issues

Your server needs these headers for SharedArrayBuffer (required for threading):

Terminal
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
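
For example, on a Vite dev server the same headers can be set in the config (assuming you use Vite; other servers have their own equivalent, and Next.js is covered below):

vite.config.ts
// Sketch: sending the cross-origin isolation headers from a Vite dev server (Vite assumed)
import { defineConfig } from "vite";

export default defineConfig({
  server: {
    headers: {
      "Cross-Origin-Opener-Policy": "same-origin",
      "Cross-Origin-Embedder-Policy": "require-corp",
    },
  },
});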

Next.js Configuration

Add the required headers and webpack config for Next.js:

next.config.js
// next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  async headers() {
    return [
      {
        source: "/(.*)",
        headers: [
          { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
          { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
        ],
      },
    ];
  },
  webpack: (config, { isServer }) => {
    config.experiments = {
      ...config.experiments,
      asyncWebAssembly: true,
    };

    if (isServer) {
      config.externals.push("@huggingface/transformers");
    } else {
      // Exclude Node.js polyfills from browser bundle
      config.resolve.alias = {
        ...config.resolve.alias,
        webgpu: false,
      };
      config.resolve.fallback = {
        ...config.resolve.fallback,
        path: false,
        fs: false,
        os: false,
      };
    }

    return config;
  },
};

module.exports = nextConfig;

Next Steps