On-device LLMs &
AI SDK Provider
Run in the browser, on Node.js, or anywhere JavaScript runs. WebGPU acceleration. CPU fallback. Zero API keys.
Text, vision, TTS, transcription, tools & skills. Works with generateText, streamText, and structured output.
Run LLMs on the User's GPU
40-200+ tok/s via WebGPU, with a CPU fallback that runs anywhere JavaScript runs. Text, vision, TTS, and transcription, all WebGPU accelerated. Models are cached in IndexedDB after the first download.
Works with AI SDK
Drop-in provider for generateText, streamText, and structured output. Also works with ai-sdk-tools for agents and state management.
- Streaming responses
- Zod schema validation
- Tool calling
- Thinking mode (CoT)
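The streaming bullet above boils down to plain async iteration. A minimal sketch of consuming a token stream into a final string; the generator here is a stand-in for a real model stream (such as `gerbil.stream(prompt)` or streamText's text stream), not a Gerbil API:

```typescript
// Accumulate a token stream into the final response text.
async function collect(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of stream) text += chunk;
  return text;
}

// Stand-in stream for illustration only.
async function* fakeStream(): AsyncGenerator<string> {
  yield "Hello, ";
  yield "world";
}
```

In a chat UI you would update the DOM inside the loop instead of concatenating.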
What You Can Build
Micro AI interactions that run locally. No API calls, no latency, no cost per request.
Smart Autocomplete
Context-aware suggestions that understand what users actually want, not just pattern matching
`await gerbil.complete(input, { context })`
Type 'meeting' → suggests 'Schedule meeting with Sarah about Q4 planning'
Instant Classification
Route tickets, tag content, detect spam — all in real-time without server calls
`await gerbil.classify(text, categories)`
Support ticket → 'billing' (98% confidence)
One-Click Summaries
TL;DR any content on demand. Long emails, docs, articles — instantly digestible
`await gerbil.summarize(content)`
10-page report → 3 key takeaways in 200ms
Smart Search
Understand queries semantically, not just keywords. Find what users mean
`await gerbil.search(query, documents)`
'stuff from last week' → finds relevant items
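Semantic search of this kind is typically implemented by ranking documents by embedding similarity. A hedged sketch of that idea with stand-in vectors (how `gerbil.search` embeds and scores internally is not documented here):

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank documents by similarity to the query vector, best first.
function rank(query: number[], docs: { id: string; vec: number[] }[]) {
  return docs
    .map((d) => ({ id: d.id, score: cosine(query, d.vec) }))
    .sort((x, y) => y.score - x.score);
}
```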
Writing Assistance
Grammar, tone, clarity — help users write better without leaving the input
`await gerbil.improve(text, { style })`
Suggests clearer phrasing as you type
Smart Defaults
Pre-fill forms intelligently based on context and user patterns
`await gerbil.suggest(field, context)`
Auto-suggests project name from description
Content Extraction
Pull structured data from unstructured text. Names, dates, entities
`await gerbil.extract(text, schema)`
'Call John at 3pm' → { person: 'John', time: '3pm' }
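Whatever schema you pass, it pays to validate the model's output before trusting it. A minimal sketch, assuming the extraction arrives as a JSON string; the `Extraction` shape is illustrative, matching the example above:

```typescript
// Illustrative shape matching the example extraction.
interface Extraction {
  person: string;
  time: string;
}

// Parse and validate model output; return null on anything malformed.
function parseExtraction(raw: string): Extraction | null {
  try {
    const obj = JSON.parse(raw);
    if (obj && typeof obj.person === "string" && typeof obj.time === "string") {
      return { person: obj.person, time: obj.time };
    }
  } catch {
    // Model emitted something that isn't JSON.
  }
  return null;
}
```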
Sentiment Analysis
Understand tone in real-time. Flag angry customers, celebrate happy ones
`await gerbil.sentiment(text)`
Customer message → 'frustrated' (prioritize)
Explain Anything
Let users highlight any text and get instant explanations, definitions, or context
`await gerbil.explain(selection)`
Highlight 'WebGPU' → explains in plain English
Image Understanding
Describe photos, analyze screenshots, extract text from images — all locally
`await gerbil.generate(prompt, { images })`
Upload receipt → extracts items, totals, dates
Visual QA
Let users ask questions about images in your app
`await gerbil.generate("What is this?", { images })`
'What color is the car?' → 'The car is blue'
Alt Text Generation
Auto-generate accessible image descriptions for your content
`await captionImage({ image, style })`
Photo → 'A sunset over the ocean with orange clouds'
Voice Narration
Read content aloud with natural-sounding voices. 28 voices, on-device TTS
`await gerbil.speak(text, { voice: "af_heart" })`
Blog post → Natural audio narration at 8x realtime
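The TTS result is raw PCM: a Float32Array of samples plus a sample rate, as the Text-to-Speech example later shows. To save or serve it as a file you can wrap it in a WAV header yourself; a self-contained sketch (16-bit mono, clamping samples to [-1, 1]):

```typescript
// Wrap Float32Array PCM samples in a minimal 16-bit mono WAV container.
function pcmToWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const dataSize = samples.length * 2; // 2 bytes per 16-bit sample
  const buf = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buf);
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);   // RIFF chunk size
  writeStr(8, "WAVE");
  writeStr(12, "fmt ");
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // PCM format
  view.setUint16(22, 1, true);              // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeStr(36, "data");
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buf);
}
```

In the browser you could hand the result to `new Blob([wav], { type: "audio/wav" })` for playback or download.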
Voice Input
Let users speak instead of type. Transcribe audio locally with Whisper
`await gerbil.transcribe(audioData)`
🎤 'Schedule meeting tomorrow' → typed text
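Timestamped transcripts can be post-processed into standard subtitle formats. A sketch that renders `{ start, end, text }` segments (field names assumed, based on the Speech-to-Text example with `timestamps: true`) as SRT:

```typescript
// Assumed segment shape: start/end in seconds, as in Whisper-style output.
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toTimestamp(sec: number): string {
  const ms = Math.round(sec * 1000);
  const pad = (n: number, w = 2) => String(n).padStart(w, "0");
  const h = Math.floor(ms / 3_600_000);
  const m = Math.floor(ms / 60_000) % 60;
  const s = Math.floor(ms / 1000) % 60;
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Render segments as numbered SRT cues.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toTimestamp(seg.start)} --> ${toTimestamp(seg.end)}\n${seg.text.trim()}`)
    .join("\n\n");
}
```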
Voice Chat
Full voice-to-voice conversations. STT → LLM → TTS, all on-device
`useVoiceChat({ llmModel, voice })`
Speak question → AI responds with voice
// All client-side. No server. No API costs.
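The voice-chat loop is three awaits chained together: transcribe, generate, synthesize. A sketch with the stages injected as plain async functions, so it compiles without the Gerbil APIs it stands in for:

```typescript
// Stage signatures modeled on the STT → LLM → TTS pipeline described above.
type Stt = (audio: Uint8Array) => Promise<string>;
type Llm = (prompt: string) => Promise<string>;
type Tts = (text: string) => Promise<Float32Array>;

// One conversational turn: speech in, speech out.
async function voiceTurn(audio: Uint8Array, stt: Stt, llm: Llm, tts: Tts) {
  const question = await stt(audio);  // transcribe the user's speech
  const answer = await llm(question); // generate a reply
  const speech = await tts(answer);   // synthesize the reply as audio
  return { question, answer, speech };
}
```

A real implementation would stream the LLM output into TTS chunk by chunk to cut latency; this sketch keeps the stages sequential for clarity.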
Explore all skills
Why Gerbil?
AI that runs where your code runs.
Runs Anywhere
Browser. Server. Edge. Same API everywhere JavaScript runs.
Feels Instant
40-200 tok/s on WebGPU. Fast enough to feel like magic.
Nothing to Manage
No API keys. No model servers. No billing dashboards. No ops.
Private by Default
Data never leaves the device. Ship AI in healthcare, finance, anywhere.
Downloads Once
100MB-2.5GB models. Cached in IndexedDB. Instant after first load.
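Download-once behavior is, at its core, memoization of an async loader. Gerbil persists models in IndexedDB across sessions; this in-memory sketch only illustrates the pattern (the loader function is a stand-in, not a Gerbil API):

```typescript
// Memoize an async loader so each key is fetched at most once,
// even when two calls race before the first resolves.
function once<T>(load: (key: string) => Promise<T>) {
  const cache = new Map<string, Promise<T>>();
  return (key: string): Promise<T> => {
    let p = cache.get(key);
    if (!p) {
      p = load(key);      // start the download exactly once
      cache.set(key, p);  // later callers share the same promise
    }
    return p;
  };
}
```

Caching the promise rather than the resolved value is what prevents duplicate downloads during a race.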
Production Ready
Vision, tool calling, thinking mode, skills. With one line of code.
Works Everywhere
Native integrations for your favorite frameworks and tools.
Browser & Server
Same API, different environments. Run in the browser via WebGPU or on Node.js with GPU/CPU.
Browser
```typescript
import gerbil from "@tryhamster/gerbil/browser";

// Load model (cached after first download)
await gerbil.loadModel("smollm2-360m");

// Power your UI with AI
const suggestions = await gerbil.complete(userInput);
const summary = await gerbil.summarize(longText);
const category = await gerbil.classify(content, labels);

// Streaming for chat UIs
for await (const chunk of gerbil.stream(prompt)) {
  updateUI(chunk);
}
```
Node.js
```typescript
import gerbil from "@tryhamster/gerbil";

// Load larger model on server
await gerbil.loadModel("qwen3-0.6b");

// Generate with thinking mode
const result = await gerbil.generate("Write a haiku", {
  thinking: true,
  maxTokens: 100,
});

console.log(result.thinking); // reasoning steps
console.log(result.text);     // final response
```
Vision AI
```typescript
import { Gerbil } from "@tryhamster/gerbil";

const g = new Gerbil();
await g.loadModel("ministral-3b"); // Vision + reasoning

// Describe any image
const { text } = await g.generate("What's in this photo?", {
  images: [{ source: "https://example.com/sunset.jpg" }]
});

// Compare images
const diff = await g.generate("What changed?", {
  images: [
    { source: beforeScreenshot },
    { source: afterScreenshot }
  ]
});
```
Tool Calling
```typescript
import { defineTool } from "@tryhamster/gerbil";
import { z } from "zod";

const weather = defineTool({
  name: "get_weather",
  description: "Get weather for a city",
  parameters: z.object({
    city: z.string(),
  }),
  execute: async ({ city }) => {
    return `Weather in ${city}: 72°F, sunny`;
  },
});

// LLM can now call this tool during generation
```
Skills
```typescript
import { commit, summarize, review } from "@tryhamster/gerbil/skills";
import { describeImage, captionImage } from "@tryhamster/gerbil/skills";

// Generate commit message from staged changes
const msg = await commit({ type: "conventional" });

// Summarize any content
const tldr = await summarize({ content: longDoc });

// Vision skills
const alt = await captionImage({ image: photoUrl });
const analysis = await describeImage({
  image: screenshot,
  focus: "text"
});
```
Text-to-Speech
```typescript
import { Gerbil } from "@tryhamster/gerbil";

const g = new Gerbil();

// Generate speech with Kokoro-82M
const result = await g.speak("Hello, I'm Gerbil!", {
  voice: "af_heart", // 28 voices available
  speed: 1.0,
});

// result.audio = Float32Array (PCM samples)
// result.sampleRate = 24000
// result.duration = seconds

// Or use the AI SDK
import { experimental_generateSpeech } from "ai";
const audio = await experimental_generateSpeech({
  model: gerbil.speech(),
  text: "Hello from Gerbil!",
});
```
Speech-to-Text
```typescript
import { Gerbil } from "@tryhamster/gerbil";
import { readFileSync } from "fs";

const g = new Gerbil();

// Transcribe audio with Whisper
const audioData = new Uint8Array(readFileSync("audio.wav"));
const result = await g.transcribe(audioData, {
  timestamps: true, // Get word-level timing
});

console.log(result.text);
// "Hello world, this is a test"

// With timestamps
for (const seg of result.segments) {
  console.log(`[${seg.start}s] ${seg.text}`);
}
```
CLI
```shell
$ gerbil "Write a haiku about coding"
🤖 Loading smollm2-360m...
✓ Model loaded (2.3s)

Silent keystrokes fall
Bugs emerge from tangled code
Coffee saves the day

⚡ 47.2 tok/s | 0.8s

$ gerbil speak "Hello world" --voice bf_emma
$ gerbil transcribe audio.wav --timestamps
$ gerbil voice question.wav  # STT → LLM → TTS
```
Built-in Models
Optimized for browser and Node.js. Small enough to download, powerful enough to impress.
| Model | Type | Size | Best For |
|---|---|---|---|
| ministral-3b | LLM | ~2.5GB | Vision + reasoning |
| qwen3-0.6b | LLM | ~400MB | General use, reasoning |
| qwen2.5-0.5b | LLM | ~350MB | General use |
| qwen2.5-coder-0.5b | LLM | ~400MB | Code generation |
| smollm2-360m | LLM | ~250MB | Fast completions |
| smollm2-135m | LLM | ~100MB | Ultra-fast, tiny |
| smollm2-1.7b | LLM | ~1.2GB | Higher quality |
| phi-3-mini | LLM | ~2.1GB | High quality |
| llama-3.2-1b | LLM | ~800MB | General use |
| gemma-2b | LLM | ~1.4GB | Balanced |
| tinyllama-1.1b | LLM | ~700MB | Lightweight |
| **Text-to-Speech** | | | |
| kokoro-82m | TTS | ~330MB | 28 voices, 24kHz, US/UK English |
| supertonic-66m | TTS | ~250MB | 4 voices, 44.1kHz, fastest |
| **Speech-to-Text** | | | |
| whisper-tiny.en | STT | ~39MB | Fastest transcription |
| whisper-base.en | STT | ~74MB | Balanced speed/accuracy |
| whisper-small.en | STT | ~244MB | High quality |
| whisper-large-v3-turbo | STT | ~809MB | Best quality, 80+ langs |
Use any Hugging Face model: `await gerbil.loadModel("hf:org/model")`