On-device LLMs &
AI SDK Provider
Run in the browser, on Node.js, or anywhere JavaScript runs. WebGPU acceleration. CPU fallback. Zero API keys.
Text, vision, TTS, transcription, tools & skills. Works with generateText, streamText, and structured output.
Run LLMs on the User's GPU
40-200+ tok/s via WebGPU, with a CPU fallback that runs anywhere JavaScript runs. Text, vision, TTS, and transcription, all WebGPU accelerated. Models are cached in IndexedDB after the first download.
Works with AI SDK
Drop-in provider for generateText, streamText, and structured output. Also works with ai-sdk-tools for agents and state management.
- Streaming responses
- Zod schema validation
- Tool calling
- Thinking mode (CoT)
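The streaming bullet above boils down to plain async iteration. A minimal sketch of consuming a token stream into a final string; the generator here is a stand-in for a real model stream (such as `gerbil.stream(prompt)` or streamText's text stream), not a Gerbil API:

```typescript
// Accumulate a token stream into the final response text.
async function collect(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of stream) text += chunk;
  return text;
}

// Stand-in stream for illustration only.
async function* fakeStream(): AsyncGenerator<string> {
  yield "Hello, ";
  yield "world";
}
```

In a chat UI you would update the DOM inside the loop instead of concatenating.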
What You Can Build
Micro AI interactions that run locally. No API calls, no latency, no cost per request.
Smart Autocomplete
Context-aware suggestions that understand what users actually want, not just pattern matching
`await gerbil.complete(input, { context })`
Type 'meeting' → suggests 'Schedule meeting with Sarah about Q4 planning'
Instant Classification
Route tickets, tag content, detect spam — all in real-time without server calls
`await gerbil.classify(text, categories)`
Support ticket → 'billing' (98% confidence)
One-Click Summaries
TL;DR any content on demand. Long emails, docs, articles — instantly digestible
`await gerbil.summarize(content)`
10-page report → 3 key takeaways in 200ms
Smart Search
Understand queries semantically, not just keywords. Find what users mean
`await gerbil.search(query, documents)`
'stuff from last week' → finds relevant items
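Semantic search of this kind is typically implemented by ranking documents by embedding similarity. A hedged sketch of that idea with stand-in vectors (how `gerbil.search` embeds and scores internally is not documented here):

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents by similarity to the query vector, best first.
function rank(query: number[], docs: { id: string; vec: number[] }[]) {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score);
}
```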
Writing Assistance
Grammar, tone, clarity — help users write better without leaving the input
`await gerbil.improve(text, { style })`
Suggests clearer phrasing as you type
Smart Defaults
Pre-fill forms intelligently based on context and user patterns
`await gerbil.suggest(field, context)`
Auto-suggests project name from description
Content Extraction
Pull structured data from unstructured text. Names, dates, entities
`await gerbil.extract(text, schema)`
'Call John at 3pm' → { person: 'John', time: '3pm' }
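Whatever schema you pass, it pays to validate the model's output before trusting it. A minimal sketch, assuming the extraction arrives as a JSON string; the `Extraction` shape is illustrative, matching the example above:

```typescript
// Illustrative shape matching the example extraction.
interface Extraction {
  person: string;
  time: string;
}

// Parse and validate model output; return null on anything malformed.
function parseExtraction(raw: string): Extraction | null {
  try {
    const obj = JSON.parse(raw);
    if (obj && typeof obj.person === "string" && typeof obj.time === "string") {
      return { person: obj.person, time: obj.time };
    }
  } catch {
    // Model emitted something that isn't JSON.
  }
  return null;
}
```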
Sentiment Analysis
Understand tone in real-time. Flag angry customers, celebrate happy ones
`await gerbil.sentiment(text)`
Customer message → 'frustrated' (prioritize)
Explain Anything
Let users highlight any text and get instant explanations, definitions, or context
`await gerbil.explain(selection)`
Highlight 'WebGPU' → explains in plain English
Image Understanding
Describe photos, analyze screenshots, extract text from images — all locally
`await gerbil.generate(prompt, { images })`
Upload receipt → extracts items, totals, dates
Visual QA
Let users ask questions about images in your app
`await gerbil.generate("What is this?", { images })`
'What color is the car?' → 'The car is blue'
Alt Text Generation
Auto-generate accessible image descriptions for your content
`await captionImage({ image, style })`
Photo → 'A sunset over the ocean with orange clouds'
Voice Narration
Read content aloud with natural-sounding voices. 28 voices, on-device TTS
`await gerbil.speak(text, { voice: "af_heart" })`
Blog post → Natural audio narration at 8x realtime
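The TTS result is raw PCM: a Float32Array of samples plus a sample rate, as the Text-to-Speech example later shows. To save or serve it as a file you can wrap it in a WAV header yourself; a self-contained sketch (16-bit mono, clamping samples to [-1, 1]):

```typescript
// Wrap Float32Array PCM samples in a minimal 16-bit mono WAV container.
function pcmToWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const dataSize = samples.length * 2; // 2 bytes per 16-bit sample
  const buf = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buf);
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);   // RIFF chunk size
  writeStr(8, "WAVE");
  writeStr(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // PCM format
  view.setUint16(22, 1, true);              // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeStr(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buf);
}
```

In the browser you could hand the result to `new Blob([wav], { type: "audio/wav" })` for playback or download.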
Voice Input
Let users speak instead of type. Transcribe audio locally with Whisper
`await gerbil.transcribe(audioData)`
🎤 'Schedule meeting tomorrow' → typed text
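Timestamped transcripts can be post-processed into standard subtitle formats. A sketch that renders `{ start, end, text }` segments (field names assumed, based on the Speech-to-Text example with `timestamps: true`) as SRT:

```typescript
// Assumed segment shape: start/end in seconds, as in Whisper-style output.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toTimestamp(sec: number): string {
  const ms = Math.round(sec * 1000);
  const pad = (n: number, w = 2) => String(n).padStart(w, "0");
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor(ms / 60_000) % 60;
  const s = Math.floor(ms / 1000) % 60;
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Render segments as numbered SRT cues.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toTimestamp(seg.start)} --> ${toTimestamp(seg.end)}\n${seg.text.trim()}`)
    .join("\n\n");
}
```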
Voice Chat
Full voice-to-voice conversations. STT → LLM → TTS, all on-device
`useVoiceChat({ llmModel, voice })`
Speak question → AI responds with voice
// All client-side. No server. No API costs.
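The voice-chat loop is three awaits chained together: transcribe, generate, synthesize. A sketch with the stages injected as plain async functions, so it compiles without the Gerbil APIs it stands in for:

```typescript
// Stage signatures modeled on the STT → LLM → TTS pipeline described above.
type Stt = (audio: Uint8Array) => Promise<string>;
type Llm = (prompt: string) => Promise<string>;
type Tts = (text: string) => Promise<Float32Array>;

// One conversational turn: speech in, speech out.
async function voiceTurn(audio: Uint8Array, stt: Stt, llm: Llm, tts: Tts) {
  const question = await stt(audio);  // transcribe the user's speech
  const answer = await llm(question); // generate a reply
  const speech = await tts(answer);   // synthesize the reply as audio
  return { question, answer, speech };
}
```

A real implementation would stream the LLM output into TTS chunk by chunk to cut latency; this sketch keeps the stages sequential for clarity.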
Explore all skills
Why Gerbil?
AI that runs where your code runs.
Runs Anywhere
Browser. Server. Edge. Same API everywhere JavaScript runs.
Feels Instant
40-200 tok/s on WebGPU. Fast enough to feel like magic.
Nothing to Manage
No API keys. No model servers. No billing dashboards. No ops.
Private by Default
Data never leaves the device. Ship AI in healthcare, finance, anywhere.
Downloads Once
100MB-2.5GB models. Cached in IndexedDB. Instant after first load.
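Download-once behavior is, at its core, memoization of an async loader. Gerbil persists models in IndexedDB across sessions; this in-memory sketch only illustrates the pattern (the loader function is a stand-in, not a Gerbil API):

```typescript
// Memoize an async loader so each key is fetched at most once,
// even when two calls race before the first resolves.
function once<T>(load: (key: string) => Promise<T>) {
  const cache = new Map<string, Promise<T>>();
  return (key: string): Promise<T> => {
    let p = cache.get(key);
    if (!p) {
      p = load(key);      // start the download exactly once
      cache.set(key, p);  // later callers share the same promise
    }
    return p;
  };
}
```

Caching the promise rather than the resolved value is what prevents duplicate downloads during a race.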
Production Ready
Vision, tool calling, thinking mode, skills. With one line of code.
Works Everywhere
Native integrations for your favorite frameworks and tools.
Browser & Server
Same API, different environments. Run in the browser via WebGPU or on Node.js with GPU/CPU.
Browser
```typescript
import gerbil from "@tryhamster/gerbil/browser";

// Load model (cached after first download)
await gerbil.loadModel("smollm2-360m");

// Power your UI with AI
const suggestions = await gerbil.complete(userInput);
const summary = await gerbil.summarize(longText);
const category = await gerbil.classify(content, labels);

// Streaming for chat UIs
for await (const chunk of gerbil.stream(prompt)) {
  updateUI(chunk);
}
```
Node.js
```typescript
import gerbil from "@tryhamster/gerbil";

// Load larger model on server
await gerbil.loadModel("qwen3-0.6b");

// Generate with thinking mode
const result = await gerbil.generate("Write a haiku", {
  thinking: true,
  maxTokens: 100,
});

console.log(result.thinking); // reasoning steps
console.log(result.text);     // final response
```
Vision AI
```typescript
import { Gerbil } from "@tryhamster/gerbil";

const g = new Gerbil();
await g.loadModel("ministral-3b"); // Vision + reasoning

// Describe any image
const { text } = await g.generate("What's in this photo?", {
  images: [{ source: "https://example.com/sunset.jpg" }]
});

// Compare images
const diff = await g.generate("What changed?", {
  images: [
    { source: beforeScreenshot },
    { source: afterScreenshot }
  ]
});
```
Tool Calling
```typescript
import { defineTool } from "@tryhamster/gerbil";
import { z } from "zod";

const weather = defineTool({
  name: "get_weather",
  description: "Get weather for a city",
  parameters: z.object({
    city: z.string(),
  }),
  execute: async ({ city }) => {
    return `Weather in ${city}: 72°F, sunny`;
  },
});

// LLM can now call this tool during generation
```
Skills
```typescript
import { commit, summarize, review } from "@tryhamster/gerbil/skills";
import { describeImage, captionImage } from "@tryhamster/gerbil/skills";

// Generate commit message from staged changes
const msg = await commit({ type: "conventional" });

// Summarize any content
const tldr = await summarize({ content: longDoc });

// Vision skills
const alt = await captionImage({ image: photoUrl });
const analysis = await describeImage({
  image: screenshot,
  focus: "text"
});
```
Text-to-Speech
```typescript
import { Gerbil } from "@tryhamster/gerbil";

const g = new Gerbil();

// Generate speech with Kokoro-82M
const result = await g.speak("Hello, I'm Gerbil!", {
  voice: "af_heart", // 28 voices available
  speed: 1.0,
});

// result.audio = Float32Array (PCM samples)
// result.sampleRate = 24000
// result.duration = seconds

// Or use the AI SDK
import { experimental_generateSpeech } from "ai";
const audio = await experimental_generateSpeech({
  model: gerbil.speech(),
  text: "Hello from Gerbil!",
});
```
Speech-to-Text
```typescript
import { Gerbil } from "@tryhamster/gerbil";
import { readFileSync } from "fs";

const g = new Gerbil();

// Transcribe audio with Whisper
const audioData = new Uint8Array(readFileSync("audio.wav"));
const result = await g.transcribe(audioData, {
  timestamps: true, // Get word-level timing
});

console.log(result.text);
// "Hello world, this is a test"

// With timestamps
for (const seg of result.segments) {
  console.log(`[${seg.start}s] ${seg.text}`);
}
```
CLI
```shell
$ gerbil "Write a haiku about coding"
🤖 Loading smollm2-360m...
✓ Model loaded (2.3s)

Silent keystrokes fall
Bugs emerge from tangled code
Coffee saves the day

⚡ 47.2 tok/s | 0.8s

$ gerbil speak "Hello world" --voice bf_emma
$ gerbil transcribe audio.wav --timestamps
$ gerbil voice question.wav  # STT → LLM → TTS
```
Built-in Models
Optimized for browser and Node.js. Small enough to download, powerful enough to impress.
| Model | Type | Size | Best For |
|---|---|---|---|
| ministral-3b | LLM | ~2.5GB | Vision + reasoning |
| qwen3-0.6b | LLM | ~400MB | General use, reasoning |
| qwen2.5-0.5b | LLM | ~350MB | General use |
| qwen2.5-coder-0.5b | LLM | ~400MB | Code generation |
| smollm2-360m | LLM | ~250MB | Fast completions |
| smollm2-135m | LLM | ~100MB | Ultra-fast, tiny |
| smollm2-1.7b | LLM | ~1.2GB | Higher quality |
| phi-3-mini | LLM | ~2.1GB | High quality |
| llama-3.2-1b | LLM | ~800MB | General use |
| gemma-2b | LLM | ~1.4GB | Balanced |
| tinyllama-1.1b | LLM | ~700MB | Lightweight |
| **Text-to-Speech** | | | |
| kokoro-82m | TTS | ~330MB | 28 voices, 24kHz, US/UK English |
| supertonic-66m | TTS | ~250MB | 4 voices, 44.1kHz, fastest |
| **Speech-to-Text** | | | |
| whisper-tiny.en | STT | ~39MB | Fastest transcription |
| whisper-base.en | STT | ~74MB | Balanced speed/accuracy |
| whisper-small.en | STT | ~244MB | High quality |
| whisper-large-v3-turbo | STT | ~809MB | Best quality, 80+ langs |
Use any Hugging Face model: `await gerbil.loadModel("hf:org/model")`