Browser Usage

Name: Gerbil
Author: Gerbil

Run LLMs directly in the browser with WebGPU acceleration. No server required.

100-150 tok/s with WebGPU · Models cached in IndexedDB · Fully private, runs locally

React Hooks

The easiest way to use Gerbil in React. The hooks follow the same patterns as the Vercel AI SDK.

useChat

Full chat with message history, thinking mode, and streaming:

Chat.tsx

import { useChat } from "@tryhamster/gerbil/browser";

function Chat() {
  const {
    messages,        // Message[] with id, role, content, thinking?
    input,           // Current input value
    setInput,        // Update input
    handleSubmit,    // Form submit handler
    isLoading,       // Model loading
    loadingProgress, // { status, file?, progress? }
    isGenerating,    // Currently generating
    thinking,        // Current thinking content (streaming)
    stop,            // Stop generation
    clear,           // Clear messages
    tps,             // Tokens per second
    error,           // Error message
  } = useChat({
    model: "qwen3-0.6b",
    thinking: true,
    system: "You are a helpful assistant.",
    maxTokens: 512,
  });

  if (isLoading) {
    return <div>Loading model: {loadingProgress?.progress}%</div>;
  }

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.thinking && (
            <details>
              <summary>Thinking...</summary>
              <pre>{m.thinking}</pre>
            </details>
          )}
          <p><strong>{m.role}:</strong> {m.content}</p>
        </div>
      ))}

      <form onSubmit={handleSubmit}>
        <input
          value={input}
          onChange={e => setInput(e.target.value)}
          disabled={isGenerating}
          placeholder="Ask anything..."
        />
        <button type="submit" disabled={isGenerating}>
          {isGenerating ? `${tps.toFixed(0)} tok/s` : "Send"}
        </button>
        {isGenerating && <button type="button" onClick={stop}>Stop</button>}
      </form>
    </div>
  );
}
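`useChat` manages the message list for you. You might build a similar chat outside React, or you might want to reason about how the list changes during a turn. In either case, the following is a minimal sketch of immutable message-list updates. The helper names and the `msg-N` ID scheme are hypothetical illustrations and are not Gerbil's internal implementation. Only the `Message` shape comes from this page.

```typescript
// Mirrors the Message type documented on this page.
interface Message {
  id: string;
  role: "user" | "assistant";
  content: string;
  thinking?: string;
}

// Hypothetical ID scheme ("msg-1", "msg-2", ...). Length-based IDs assume
// messages are only ever appended or cleared, never removed individually.
function nextId(messages: readonly Message[]): string {
  return `msg-${messages.length + 1}`;
}

// Add the user's message plus an empty assistant message to stream into.
function appendUserMessage(messages: readonly Message[], text: string): Message[] {
  const user: Message = { id: nextId(messages), role: "user", content: text };
  const withUser = [...messages, user];
  const assistant: Message = { id: nextId(withUser), role: "assistant", content: "" };
  return [...withUser, assistant];
}

// Return a new list with the last assistant message patched; the input is not mutated.
function updateLastAssistant(
  messages: readonly Message[],
  patch: Partial<Pick<Message, "content" | "thinking">>,
): Message[] {
  const last = messages[messages.length - 1];
  if (last === undefined || last.role !== "assistant") return [...messages];
  return [...messages.slice(0, -1), { ...last, ...patch }];
}
```

These helpers return new arrays instead of mutating the input. That keeps them compatible with a React `useState` setter, for example `setMessages(prev => updateLastAssistant(prev, { content }))`.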

useCompletion

One-off text generation without message history:

Generator.tsx

import { useCompletion } from "@tryhamster/gerbil/browser";

function Generator() {
  const {
    complete,        // Function to generate text
    completion,      // Generated text (streaming)
    thinking,        // Thinking content (if enabled)
    isLoading,       // Model loading
    isGenerating,    // Currently generating
    tps,             // Tokens per second
    stop,            // Stop generation
    error,           // Error message
  } = useCompletion({
    model: "qwen3-0.6b",
    thinking: true,
    maxTokens: 256,
  });

  if (isLoading) return <div>Loading...</div>;

  return (
    <div>
      <button
        onClick={() => complete("Write a haiku about coding")}
        disabled={isGenerating}
      >
        Generate
      </button>

      {thinking && (
        <details open>
          <summary>Thinking...</summary>
          <pre>{thinking}</pre>
        </details>
      )}

      <p>{completion}</p>

      {isGenerating && (
        <>
          <span>{tps.toFixed(0)} tok/s</span>
          <button onClick={stop}>Stop</button>
        </>
      )}
    </div>
  );
}
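Both hooks report `tps`. A tokens-per-second figure is usually the number of tokens generated divided by the time elapsed since generation started. The sketch below shows one way to compute it under that assumption. Gerbil may measure `tps` differently, and `createTpsMeter` is a hypothetical helper. The clock is injectable so the logic can be tested without real timing.

```typescript
type Clock = () => number; // current time in milliseconds

// Hypothetical helper: measures the throughput of a token stream.
function createTpsMeter(now: Clock = () => performance.now()) {
  let start: number | null = null;
  let tokens = 0;

  return {
    // Call once per token as it arrives; the first call starts the timer.
    record(): void {
      if (start === null) start = now();
      tokens += 1;
    },
    // Tokens counted since the first one arrived, divided by elapsed seconds.
    // Returns 0 before any token or before any time has passed.
    tps(): number {
      if (start === null) return 0;
      const seconds = (now() - start) / 1000;
      return seconds > 0 ? tokens / seconds : 0;
    },
    // Forget the previous run, e.g. before a new generation.
    reset(): void {
      start = null;
      tokens = 0;
    },
  };
}
```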

Hook Options

types.ts

interface UseChatOptions {
  model?: string;        // Model ID (default: "qwen3-0.6b")
  autoLoad?: boolean;    // Load model on mount (default: false)
  thinking?: boolean;    // Enable thinking mode (Qwen3)
  system?: string;       // System prompt
  maxTokens?: number;    // Max tokens (default: 256)
  temperature?: number;  // Temperature (default: 0.7)
  topP?: number;         // Top-p sampling
  topK?: number;         // Top-k sampling
}

// useCompletion uses the same options
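The documented defaults can be read as a merge over whatever the caller passes. For example, the `useChat` example earlier passes `maxTokens: 512`, which overrides the 256 default. The sketch below only helps you reason about which values apply. `resolveOptions` is a hypothetical helper, not part of Gerbil. Options with no stated default (`thinking`, `system`, `topP`, `topK`) are passed through unchanged.

```typescript
// Mirrors the documented UseChatOptions.
interface UseChatOptions {
  model?: string;
  autoLoad?: boolean;
  thinking?: boolean;
  system?: string;
  maxTokens?: number;
  temperature?: number;
  topP?: number;
  topK?: number;
}

// Defaults as listed in the option comments.
const DEFAULTS = {
  model: "qwen3-0.6b",
  autoLoad: false,
  maxTokens: 256,
  temperature: 0.7,
} as const;

type ResolvedOptions = UseChatOptions & {
  model: string;
  autoLoad: boolean;
  maxTokens: number;
  temperature: number;
};

// Hypothetical helper: caller values win; missing or undefined values fall back.
function resolveOptions(opts: UseChatOptions = {}): ResolvedOptions {
  return {
    ...opts,
    model: opts.model ?? DEFAULTS.model,
    autoLoad: opts.autoLoad ?? DEFAULTS.autoLoad,
    maxTokens: opts.maxTokens ?? DEFAULTS.maxTokens,
    temperature: opts.temperature ?? DEFAULTS.temperature,
  };
}
```

The sketch uses `??` rather than `||`, which keeps deliberate falsy values such as `temperature: 0`.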

Lazy Loading (Default)

By default, models load on the first generation, not on page load. This prevents surprise downloads:

loading.tsx

import { useChat } from "@tryhamster/gerbil/browser";

// Default: the model loads when the user first submits
function LazyChat() {
  const { handleSubmit, isLoading } = useChat();
  // ...
}

// Preload on mount (the download starts immediately)
function EagerChat() {
  const { handleSubmit, isLoading } = useChat({ autoLoad: true });
  // ...
}

// Manual control with load()
function LoadButton() {
  const { load, isLoading, isReady } = useChat();

  return (
    <button onClick={() => load()} disabled={isLoading || isReady}>
      {isLoading ? "Loading..." : isReady ? "Ready" : "Load Model"}
    </button>
  );
}
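Lazy loading comes down to two rules: start the download on first use, and never start it twice. Here is a minimal sketch of that pattern outside React. `createLazyLoader` is a hypothetical helper, and the `load` callback stands in for whatever actually fetches the model.

```typescript
// Hypothetical helper: defers an expensive async load until first use and
// shares one in-flight promise between concurrent callers.
function createLazyLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | null = null;

  return () => {
    if (pending === null) {
      pending = load().catch((err: unknown) => {
        // Forget the failed attempt so a later call can retry.
        pending = null;
        throw err;
      });
    }
    return pending;
  };
}
```

A button handler and a submit handler can then both call the same loader without starting a second download.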

Loading Progress States

The loadingProgress object tells you exactly what's happening during model load:

loading-states.tsx

// loadingProgress.status values:
//
// "downloading"  Fetching from the network (first load)
//                Has: file, progress (0-100)
//
// "loading"      Loading from the IndexedDB cache (fast)
//                No additional properties
//
// "ready"        Model ready for inference
//                No additional properties
//
// "error"        Load failed
//                Has: error message

// Example usage inside a component:
if (loadingProgress?.status === "downloading") {
  return <div>Downloading {loadingProgress.file}: {loadingProgress.progress}%</div>;
}
if (loadingProgress?.status === "loading") {
  return <div>Loading from cache...</div>;
}
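Because `status` selects which other fields are present, a TypeScript discriminated union models the object well. The sketch below turns each state into a UI label. The union mirrors the documented values. The `message` field name for errors is an assumption, since the page only says "error message". `progressLabel` is a hypothetical helper.

```typescript
// Mirrors the documented status values; `message` on errors is an assumed name.
type LoadingProgress =
  | { status: "downloading"; file?: string; progress?: number }
  | { status: "loading" }
  | { status: "ready" }
  | { status: "error"; message?: string };

// Hypothetical helper: turn a progress update into a short UI label.
function progressLabel(p: LoadingProgress | null | undefined): string {
  if (!p) return "Not loaded";
  switch (p.status) {
    case "downloading": {
      const pct = Math.round(p.progress ?? 0);
      return p.file ? `Downloading ${p.file}: ${pct}%` : `Downloading: ${pct}%`;
    }
    case "loading":
      return "Loading from cache...";
    case "ready":
      return "Ready";
    case "error":
      return `Error: ${p.message ?? "model failed to load"}`;
  }
}
```

Because the `switch` is exhaustive over the union, adding a new status to the type makes the compiler flag the missing case.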

Message Type

types.ts

interface Message {
  id: string;                      // Unique ID (e.g., "msg-1")
  role: "user" | "assistant";      // Message role
  content: string;                 // The message content
  thinking?: string;               // Thinking content (if enabled)
}
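`Message` is plain data, so it is easy to post-process. For example, a hypothetical `toTranscript` helper (not part of Gerbil) could flatten a conversation into text for copying or logging. It leaves thinking content out unless asked:

```typescript
// Mirrors the documented Message type.
interface Message {
  id: string;
  role: "user" | "assistant";
  content: string;
  thinking?: string;
}

// Hypothetical helper: one block per message, blocks separated by a blank line.
// Thinking content is included only when includeThinking is true.
function toTranscript(
  messages: readonly Message[],
  { includeThinking = false }: { includeThinking?: boolean } = {},
): string {
  return messages
    .map((m) => {
      const lines: string[] = [];
      if (includeThinking && m.thinking) lines.push(`[${m.role} thinking] ${m.thinking}`);
      lines.push(`${m.role}: ${m.content}`);
      return lines.join("\n");
    })
    .join("\n\n");
}
```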

Low-Level API

For non-React apps or custom implementations, use createGerbilWorker directly:

vanilla.ts

import { createGerbilWorker, isWebGPUSupported } from "@tryhamster/gerbil/browser";

async function main() {
  // Check WebGPU support before downloading anything
  if (!isWebGPUSupported()) {
    console.log("WebGPU not supported - use Chrome/Edge 113+");
    return;
  }

  // Create worker (loads model automatically)
  const gerbil = await createGerbilWorker({
    modelId: "qwen3-0.6b",
    onProgress: (p) => {
      if (p.status === "downloading") {
        console.log(`Downloading ${p.file}: ${p.progress}%`);
      } else if (p.status === "loading") {
        console.log("Loading from cache...");
      }
    },
    onToken: (token) => {
      // token.text - the token text
      // token.state - "thinking" or "answering"
      // token.tps - tokens per second
      // Append each token to the page as it streams in
      document.body.append(token.text);
    },
    onComplete: (result) => {
      console.log(`Done: ${result.tps.toFixed(1)} tok/s`);
    },
  });

  // Generate
  await gerbil.generate("Write a haiku", { thinking: true });

  // Interrupt an in-progress generation (e.g. from a Stop button)
  gerbil.interrupt();

  // Reset conversation
  gerbil.reset();

  // Clean up
  gerbil.terminate();
}

main();
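The `token.state` field lets you separate reasoning from the final answer while streaming. The sketch below collects both into strings. The `GerbilToken` type mirrors the fields named in the `onToken` comments. `createTokenCollector` is a hypothetical helper, not a Gerbil export.

```typescript
// Mirrors the token fields described in the onToken comments.
interface GerbilToken {
  text: string;
  state: "thinking" | "answering";
  tps: number;
}

// Hypothetical helper: split a token stream into thinking and answer text.
// State lives in closure variables, so onToken can be passed around detached.
function createTokenCollector() {
  let thinking = "";
  let answer = "";
  let lastTps = 0;

  return {
    onToken(token: GerbilToken): void {
      if (token.state === "thinking") thinking += token.text;
      else answer += token.text;
      lastTps = token.tps;
    },
    snapshot(): { thinking: string; answer: string; tps: number } {
      return { thinking, answer, tps: lastTps };
    },
    reset(): void {
      thinking = "";
      answer = "";
      lastTps = 0;
    },
  };
}
```

You would pass `onToken: collector.onToken` to `createGerbilWorker` and call `collector.reset()` alongside `gerbil.reset()`.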

Utilities

isWebGPUSupported()

Check if the browser supports WebGPU:

check.ts

import { isWebGPUSupported } from "@tryhamster/gerbil/browser";

if (!isWebGPUSupported()) {
  // Show fallback UI or error message
  alert("Please use Chrome or Edge 113+ for WebGPU support");
}
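Two situations call for knowing what such a check involves: understanding the utility, or running a check before the Gerbil bundle loads. For either, here is a minimal feature-detection sketch based on `navigator.gpu`, the standard WebGPU entry point. It is an assumption that `isWebGPUSupported()` works this way, and it may check more. `hasWebGPUApi` is hypothetical.

```typescript
// Minimal structural type so this compiles without the WebGPU type definitions.
type NavigatorLike = { gpu?: unknown };

// navigator.gpu is the standard WebGPU entry point. Its presence means the API
// is exposed; it does not guarantee a usable adapter, since requestAdapter()
// can still resolve to null when no suitable GPU is available.
function hasWebGPUApi(
  nav: NavigatorLike | undefined = (globalThis as unknown as { navigator?: NavigatorLike }).navigator,
): boolean {
  return nav?.gpu != null;
}
```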

getWebGPUInfo()

Get GPU adapter information for debugging:

info.ts

import { getWebGPUInfo } from "@tryhamster/gerbil/browser";

const info = await getWebGPUInfo();
console.log(info);
// { supported: true, adapter: "Apple", device: "Apple M4 Max" }
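In the standard WebGPU API, adapter details come from `navigator.gpu.requestAdapter()` and the adapter's `info` property. That property is a `GPUAdapterInfo` with `vendor`, `architecture`, `device` and `description` fields. The sketch below assumes that API. `describeGpu` and its result shape are hypothetical and are not necessarily how `getWebGPUInfo()` builds its output. The GPU object is passed in so the function can be exercised with a stub.

```typescript
// Minimal structural types for the parts of the WebGPU API used here.
interface AdapterInfoLike {
  vendor: string;
  architecture: string;
  device: string;
  description: string;
}
interface GPULike {
  requestAdapter(): Promise<{ info?: AdapterInfoLike } | null>;
}

// Hypothetical result shape.
interface GpuSummary {
  supported: boolean;
  vendor?: string;
  description?: string;
}

async function describeGpu(gpu: GPULike | undefined): Promise<GpuSummary> {
  if (!gpu) return { supported: false };
  const adapter = await gpu.requestAdapter();
  // requestAdapter() resolves to null when no suitable adapter exists.
  if (!adapter) return { supported: false };
  // Older browser versions may not expose adapter.info, and browsers may
  // return empty strings to limit fingerprinting.
  return {
    supported: true,
    vendor: adapter.info?.vendor,
    description: adapter.info?.description,
  };
}
```

In a browser you would call it as `describeGpu((navigator as unknown as { gpu?: GPULike }).gpu)`.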

Model Preloading

Download models ahead of time during app initialization, so users don't wait the first time they use an AI feature. These functions work outside React hooks, so you can call them at app startup.

preload.ts

import {
  preloadChatModel,
  preloadEmbeddingModel,
  preloadTTSModel,
  preloadSTTModel
} from "@tryhamster/gerbil/browser";

// During app initialization (before React mounts)
async function initApp() {
  // Preload LLM with progress tracking
  await preloadChatModel("qwen3-0.6b", {
    onProgress: (p) => {
      if (p.status === "downloading") {
        console.log(`Downloading ${p.file}: ${p.progress}%`);
      }
    },
  });

  // Preload other models (all run in parallel)
  await Promise.all([
    preloadEmbeddingModel(),           // default: Xenova/all-MiniLM-L6-v2
    preloadTTSModel("kokoro-82m"),     // or "supertonic-66m"
    preloadSTTModel("whisper-tiny.en"),
  ]);

  console.log("All models ready!");
}

// Call during app startup; log failures instead of leaving the promise unhandled
initApp().catch(console.error);
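When several preloads run at once, you may want one overall progress figure instead of one per model. The sketch below averages per-task percentages from `PreloadProgress` updates. `createProgressAggregator` is a hypothetical helper. A plain average is an assumption that ignores differences in model size. Also, `progress` describes the current file, so the total can step backwards when a model moves on to its next file.

```typescript
// Mirrors the documented PreloadProgress type.
type PreloadProgress = {
  status: "downloading" | "loading" | "ready" | "error";
  file?: string;
  progress?: number;
  message?: string;
};

// Hypothetical helper: combine several progress streams into one percentage.
// Returns a factory that builds an onProgress callback per named task.
function createProgressAggregator(
  taskNames: readonly string[],
  onOverall: (percent: number) => void,
) {
  const percents = new Map<string, number>(taskNames.map((n) => [n, 0]));

  const report = (): void => {
    let sum = 0;
    percents.forEach((v) => {
      sum += v;
    });
    onOverall(taskNames.length === 0 ? 100 : sum / taskNames.length);
  };

  return (name: string) => (p: PreloadProgress): void => {
    if (!percents.has(name)) return; // ignore unknown tasks
    if (p.status === "ready") percents.set(name, 100);
    else if (p.status === "downloading" && p.progress !== undefined) percents.set(name, p.progress);
    report();
  };
}
```

To use it, pass `track("chat")`, `track("embedding")` and so on as the `onProgress` option of each preload call. Here `track` is the function returned by `createProgressAggregator`.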

Preload Functions

Function	Default Model	Description
preloadChatModel(modelId, opts?)	—	Preload LLM to IndexedDB
preloadEmbeddingModel(modelId?, opts?)	Xenova/all-MiniLM-L6-v2	Preload embedding model
preloadTTSModel(modelId?, opts?)	kokoro-82m	Preload text-to-speech model
preloadSTTModel(modelId?, opts?)	whisper-tiny.en	Preload speech-to-text model

Preload Options

types.ts

interface PreloadOptions {
  // Track download progress
  onProgress?: (p: PreloadProgress) => void;
  
  // Keep model loaded in memory after preload (default: false)
  // false = download, then dispose to free RAM
  // true = download and keep in memory for instant use
  keepLoaded?: boolean;
}

type PreloadProgress = {
  status: "downloading" | "loading" | "ready" | "error";
  file?: string;      // Current file being downloaded
  progress?: number;  // 0-100 percentage
  message?: string;   // Status message
};

keepLoaded Option

Control whether the model stays in memory after preloading:

Value	Behavior	Use Case
false	Download → Dispose → Free memory	Preload for later, save RAM
true	Download → Keep in memory	Instant use, no disk I/O delay

keep-loaded.ts

// Download only - frees RAM after preload (~400MB saved)
await preloadChatModel("qwen3-0.6b");
// Later: loads from IndexedDB cache (~1-2s)

// Keep in memory - uses RAM but instant inference
await preloadChatModel("qwen3-0.6b", { keepLoaded: true });
// Later: model already loaded, no wait
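Choosing between the two values is a memory trade-off. One way to decide at runtime is sketched below, assuming the Chromium-only `navigator.deviceMemory` hint. That hint reports approximate device RAM in GB, rounded and capped, and is `undefined` in other browsers. `shouldKeepLoaded` and its 4 GB headroom threshold are hypothetical choices, not Gerbil recommendations.

```typescript
// Hypothetical policy: keep the model resident only if the device reports
// enough memory left over after the model. Unknown devices (no deviceMemory
// hint) fall back to download-only, which is the documented default.
function shouldKeepLoaded(
  deviceMemoryGb: number | undefined,
  modelSizeMb: number,
  minFreeGb = 4, // assumed headroom for the rest of the app
): boolean {
  if (deviceMemoryGb === undefined) return false;
  return deviceMemoryGb - modelSizeMb / 1024 >= minFreeGb;
}
```

You would then pass `keepLoaded: shouldKeepLoaded((navigator as unknown as { deviceMemory?: number }).deviceMemory, 400)` to `preloadChatModel`.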

Browser Models

Models optimized for browser use. They are cached in IndexedDB automatically after the first download.

Model	Size	Speed	Best For
qwen3-0.6b	~400MB	100-150 tok/s	General use, thinking mode, reasoning
smollm2-360m	~250MB	150-200 tok/s	Faster responses, good quality
smollm2-135m	~100MB	200-300 tok/s	Fastest, basic tasks
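To choose a model at runtime, for example based on how much a user is willing to download, the sizes in the table translate into a simple lookup. `pickModel` and its budget parameter are hypothetical. The sizes are the approximate figures from the table, and actual memory use at inference time may differ from the download size.

```typescript
// Approximate download sizes from the Browser Models table, ordered largest first.
const BROWSER_MODELS = [
  { id: "qwen3-0.6b", sizeMb: 400 },
  { id: "smollm2-360m", sizeMb: 250 },
  { id: "smollm2-135m", sizeMb: 100 },
] as const;

type BrowserModelId = (typeof BROWSER_MODELS)[number]["id"];

// Hypothetical helper: the largest model whose download fits the budget.
// Relies on BROWSER_MODELS being sorted by size, largest first.
function pickModel(budgetMb: number): BrowserModelId | undefined {
  return BROWSER_MODELS.find((m) => m.sizeMb <= budgetMb)?.id;
}
```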

Browser Support

Browser	Version	Status
Chrome / Edge	113+	✓ Full support
Safari	18+	⚠ May have quirks
Firefox	—	✗ Behind flag, not recommended

Troubleshooting

"WebGPU not supported"

Update to Chrome/Edge 113+
Check chrome://gpu for WebGPU status
Try enabling chrome://flags/#enable-unsafe-webgpu

Slow first load

The first load downloads the model (~400MB for qwen3-0.6b) and compiles WebGPU shaders. Subsequent loads use the IndexedDB cache and are much faster (~2-5s).

Out of memory

Switch to a smaller model such as smollm2-135m, which uses less GPU memory, and close other GPU-intensive tabs.

CORS / Header issues

Threading requires SharedArrayBuffer, which browsers only expose on cross-origin isolated pages. Your server must send these headers:

HTTP headers

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
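Browsers expose the standard `crossOriginIsolated` global, which is `true` only when the page is cross-origin isolated. These two headers are meant to produce that state. A minimal runtime check, assuming that global, might look like this. `canUseSharedArrayBuffer` is a hypothetical helper, not a Gerbil export.

```typescript
type IsolationScope = { crossOriginIsolated?: boolean };

// Hypothetical helper: true only on a cross-origin isolated page where
// SharedArrayBuffer is actually defined. The scope is injectable for testing.
function canUseSharedArrayBuffer(
  scope: IsolationScope = globalThis as unknown as IsolationScope,
): boolean {
  return scope.crossOriginIsolated === true && typeof SharedArrayBuffer !== "undefined";
}
```

If this returns `false` in the browser, the headers are probably missing from the HTML response or from a subresource.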

Next.js Configuration

Add the required headers and webpack config for Next.js:

next.config.js

// next.config.js
/** @type {import('next').NextConfig} */
const nextConfig = {
  async headers() {
    return [
      {
        source: "/(.*)",
        headers: [
          { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
          { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
        ],
      },
    ];
  },
  webpack: (config, { isServer }) => {
    config.experiments = {
      ...config.experiments,
      asyncWebAssembly: true,
    };

    if (isServer) {
      config.externals.push("@huggingface/transformers");
    } else {
      // Exclude Node.js polyfills from browser bundle
      config.resolve.alias = {
        ...config.resolve.alias,
        webgpu: false,
      };
      config.resolve.fallback = {
        ...config.resolve.fallback,
        path: false,
        fs: false,
        os: false,
      };
    }

    return config;
  },
};

module.exports = nextConfig;
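To confirm that the deployed app actually sends these headers, you can inspect a response from your own origin. The sketch below checks a standard `Headers` object. `hasIsolationHeaders` is a hypothetical helper. The exact-match comparison assumes the values are set exactly as in the config above.

```typescript
// Hypothetical helper: checks for the two header values configured above.
// Headers.get() is case-insensitive on names; values are compared exactly
// (after trimming), so values with extra parameters would need parsing.
function hasIsolationHeaders(headers: Headers): boolean {
  const coop = headers.get("Cross-Origin-Opener-Policy")?.trim().toLowerCase();
  const coep = headers.get("Cross-Origin-Embedder-Policy")?.trim().toLowerCase();
  return coop === "same-origin" && coep === "require-corp";
}
```

In the browser, `hasIsolationHeaders((await fetch(location.href, { method: "HEAD" })).headers)` should return `true` once the config is deployed.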

Next Steps

React Hooks Reference: useSpeech, useVoiceInput, useVoiceChat
Text-to-Speech: generate natural speech in the browser
Speech-to-Text: transcribe audio with Whisper
Vision AI: analyze images in the browser