React Hooks
Gerbil ships a hook for every modality you'd build a UI around — chat, completion, autocomplete, vision, embeddings, speech, memory, and a full on-device voice assistant. Import them from @tryhamster/gerbil/hooks, drop one into a component, and you have local, GPU-accelerated AI with no server, no API keys, and no provider to wrap your app in.
Zero-config by default
Every hook works with no arguments. Each one picks a sensible default model for its capability and loads it on first use, so the smallest possible app needs only a single hook call:
```tsx
"use client";

import { useChat } from "@tryhamster/gerbil/hooks";

function Chat() {
  const { messages, send, isGenerating } = useChat(); // no model needed

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}>
          <strong>{m.role}:</strong> {m.content}
        </p>
      ))}
      <button onClick={() => send("Hello!")} disabled={isGenerating}>
        Send
      </button>
    </div>
  );
}
```

When you want a specific model, pass one: useChat({ model: "mlx-community/Qwen3.5-0.8B-4bit", system: "You are concise." }). Otherwise the defaults just work, and dtype: "auto" adapts precision to the device.
The two you'll reach for first
useChat manages a conversation and streams replies in. Multi-turn context is handled for you — the full history is sent each turn — and the lifecycle is reported through a single status value:
```tsx
"use client";

import { useChat } from "@tryhamster/gerbil/hooks";

function Assistant() {
  const { messages, send, status, isGenerating, tps, stop } = useChat({
    system: "You are a helpful assistant.",
  });

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}><strong>{m.role}:</strong> {m.content}</p>
      ))}

      <button onClick={() => send("Explain WebGPU in one line.")} disabled={isGenerating}>
        {status === "streaming" ? "Streaming…" : "Ask"}
      </button>
      {isGenerating && <button onClick={stop}>Stop</button>}
      {/* Ternary rather than &&, so a tps of 0 never renders as a stray "0" */}
      {tps ? <span>{tps.toFixed(0)} tok/s</span> : null}
    </div>
  );
}
```

useVoiceChat is the one you won't find anywhere else: a complete spoken assistant that runs end to end on the device. It composes speech-to-text, chat, and text-to-speech into a single mic → LLM → spoken-reply loop, with no cloud round-trip at any stage:
```tsx
"use client";

import { useVoiceChat } from "@tryhamster/gerbil/hooks";

function VoiceAssistant() {
  const {
    messages,
    start,
    stop,
    isListening,
    isTranscribing,
    isThinking,
    isSpeaking,
    transcript,
  } = useVoiceChat({ system: "You are a friendly voice assistant.", voice: "en_us" });

  const status = isListening
    ? "Listening…"
    : isTranscribing
      ? "Transcribing…"
      : isThinking
        ? "Thinking…"
        : isSpeaking
          ? "Speaking…"
          : "Tap to talk";

  return (
    <div>
      <button onClick={() => (isListening ? stop() : start())}>{status}</button>
      {transcript && <p>You said: {transcript}</p>}
      {messages.map((m, i) => (
        <p key={i}><strong>{m.role}:</strong> {m.content}</p>
      ))}
    </div>
  );
}
```

Pass speak: false for a text-only voice-input loop, or point sttModel / ttsModel at your own checkpoints.
The full hook set
Every hook is zero-config and is imported from @tryhamster/gerbil/hooks.
| Hook | Purpose | Key return fields |
|---|---|---|
| useChat | Multi-turn conversation with streaming replies. | messages, send, regenerate, status, isGenerating, tps, stop |
| useVoiceChat ⭐ | Full on-device voice assistant (mic → LLM → spoken reply). | messages, start, stop, isListening, isThinking, isSpeaking, transcript |
| useCompletion | Single-prompt streaming with built-in input helpers. | completion, complete, input, handleInputChange, handleSubmit, stop |
| useText | One-shot text generation. | complete, completion, isGenerating, tps |
| useObject | Structured output — generate, parse JSON, validate, and retry until valid. | object, generate, attempts, isGenerating, isLoading |
| useAutocomplete | Inline autocomplete (ghost text) with built-in debounce and stale-response guards. | suggestion, onInput, accept, dismiss, isFetching, isReady |
| useVision | Image understanding (image in → text out). | describeImage, completion, isGenerating |
| useEmbedding | Text embeddings and similarity scoring. | embed, similarity, isReady |
| useTTS | Text-to-speech with playback and replay. | speak, replay, stop, isSynthesizing, isPlaying, hasAudio, rtf |
| useSTT | Speech-to-text from the microphone. | startRecording, stopRecording, transcript, isRecording, isTranscribing |
| useMemory | On-device RAG — store, recall, and search text. | add, recall, search, remove, clear, size, isReady |
| useEngine | The advanced base hook — escape hatch for full engine control. | complete, embed, describeImage, speak, load, dispose |
Reach for useEngine only when you need something the modality hooks don't expose — mixing capabilities in one component, or driving the engine imperatively. For everything else, the dedicated hook is shorter and clearer.
For inline autocomplete and ghost text, useAutocomplete owns the debounce, in-flight, and stale-response guards so your component just renders the suggestion and handles Tab to accept and Esc to dismiss. See Autocomplete & Rewrite for the full guide, including the lower-level autocomplete and rewrite methods on useEngine.
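To make those guards concrete, here is a minimal sketch of a debounce combined with a stale-response check, written in plain TypeScript. It is not Gerbil's implementation: createLatestOnly, debounce, createSuggester, and the 150 ms delay are hypothetical names and values chosen purely for illustration.

```typescript
// Illustrative sketch only: hypothetical helpers, not part of Gerbil's API.

/** Hands out increasing tickets; only the most recent ticket counts as current. */
function createLatestOnly() {
  let latest = 0;
  return {
    begin(): number {
      latest += 1;
      return latest;
    },
    isCurrent(ticket: number): boolean {
      return ticket === latest;
    },
  };
}

/** Delays fn until `ms` milliseconds have passed without another call. */
function debounce<A extends unknown[]>(fn: (...args: A) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

/** Wires both guards around any async suggestion source. */
function createSuggester(
  fetchSuggestion: (text: string) => Promise<string>,
  onSuggestion: (suggestion: string) => void,
  delayMs = 150, // assumed delay, not a documented Gerbil default
) {
  const guard = createLatestOnly();
  return debounce((text: string) => {
    const ticket = guard.begin();
    fetchSuggestion(text)
      .then((suggestion) => {
        // Drop the result if a newer request started while this one was in flight.
        if (guard.isCurrent(ticket)) onSuggestion(suggestion);
      })
      .catch(() => {
        // A failed suggestion is simply not shown.
      });
  }, delayMs);
}
```

The ticket check is what keeps an old, slow response from overwriting the suggestion for newer input; the debounce only reduces how many requests are started in the first place.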
Live tokens per second
useEngine().tps updates on every token during generation, so a tok/s readout ticks live as text streams in rather than only resolving once the run finishes. The hooks built on it — including useChat and useText — surface the same live value through their own tps field.
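As a rough illustration of how a rate can be recomputed on every token, the sketch below derives tokens per second from a running token count and elapsed time. It is not Gerbil's code: createTpsMeter and its injectable clock are hypothetical, and starting the clock at the first token (so time spent before any output is not counted) is an assumption of this sketch.

```typescript
// Illustrative sketch only: a hypothetical meter, not Gerbil's implementation.

/** Tracks a live tokens/sec figure. `now` returns milliseconds and can be swapped out in tests. */
function createTpsMeter(now: () => number = () => Date.now()) {
  let firstTokenAt: number | null = null;
  let tokensSinceFirst = 0;

  return {
    /** Call once per generated token; returns the current rate, or null until it is measurable. */
    tick(): number | null {
      const t = now();
      if (firstTokenAt === null) {
        firstTokenAt = t; // assumption: the clock starts at the first token
        return null;
      }
      tokensSinceFirst += 1;
      const elapsedMs = t - firstTokenAt;
      return elapsedMs > 0 ? (tokensSinceFirst * 1000) / elapsedMs : null;
    },
  };
}
```

Because the rate is recalculated inside tick, a UI that stores the returned value in state gets a readout that moves with the stream instead of a single number at the end.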
When you drive the engine imperatively, the live number comes from the onToken callback's second argument, a meta object with { tokenIndex, tps, elapsedMs }. The tps here is decode-only — it measures token generation and excludes the one-time prefill, so it reflects steady-state throughput rather than being dragged down by prompt processing:
```ts
const result = await engine.generate(prompt, {
  onToken: (token, meta) => {
    process.stdout.write(token);
    if (meta) {
      // meta.tokenIndex — index of this token in the decode loop
      // meta.tps — live decode-only tokens/sec (excludes prefill)
      // meta.elapsedMs — ms elapsed since generation started
      updateSpeedReadout(meta.tps);
    }
  },
});
```

Forms with useCompletion
useCompletion is a single-prompt hook with controlled-input helpers, so a prompt box is a few lines:
```tsx
"use client";

import { useCompletion } from "@tryhamster/gerbil/hooks";

function Prompt() {
  const { completion, input, handleInputChange, handleSubmit, isLoading } =
    useCompletion();

  return (
    <form onSubmit={handleSubmit}>
      <input value={input} onChange={handleInputChange} placeholder="Ask anything…" />
      <button type="submit" disabled={isLoading}>
        {isLoading ? "…" : "Run"}
      </button>
      <p>{completion}</p>
    </form>
  );
}
```

Structured output with useObject
useObject generates JSON, parses it, validates it against a schema, and retries until it's valid. On-device tokens are free, so re-rolling malformed JSON costs nothing but a moment — you get a typed object back instead of a string you have to defensively parse:
```tsx
"use client";

import { useObject } from "@tryhamster/gerbil/hooks";

type Person = { name: string; age: number };

function Extractor() {
  const { object, generate, attempts, isGenerating } = useObject<Person>();

  return (
    <div>
      <button
        onClick={() =>
          generate('Extract {name, age} from: "I am Sarah, 28"', {
            schema: { required: ["name", "age"] },
          })
        }
        disabled={isGenerating}
      >
        {isGenerating ? "Extracting…" : "Extract"}
      </button>
      {object && (
        <p>
          {object.name} is {object.age} (took {attempts} attempt{attempts === 1 ? "" : "s"})
        </p>
      )}
    </div>
  );
}
```

The schema is either a minimal JSON-schema-ish object with required keys, or a predicate (o) => boolean for arbitrary validation. Omit it to require valid JSON only. Tune the retry budget with maxRetries (default 4). Need it imperatively instead of as a hook? Call engine.generateObject(prompt, { schema }), which returns { object, attempts }.
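The generate, parse, validate, and retry cycle can be sketched in plain TypeScript. This is not Gerbil's implementation: generateUntilValid, matchesSchema, and the produce callback are hypothetical, and treating maxRetries as the number of re-rolls after the first attempt is an assumption of this sketch.

```typescript
// Illustrative sketch only: hypothetical helpers, not Gerbil's implementation.

type Schema = { required: string[] } | ((o: unknown) => boolean);

/** True when `value` passes the schema; with no schema, any parsed JSON passes. */
function matchesSchema(value: unknown, schema?: Schema): boolean {
  if (schema === undefined) return true;
  if (typeof schema === "function") return schema(value);
  if (typeof value !== "object" || value === null) return false;
  return schema.required.every((key) => key in value);
}

/** Calls `produce` until its output parses as JSON and passes the schema. */
async function generateUntilValid<T>(
  produce: () => Promise<string>,
  options: { schema?: Schema; maxRetries?: number } = {},
): Promise<{ object: T; attempts: number }> {
  const maxRetries = options.maxRetries ?? 4;
  // Assumption: maxRetries counts re-rolls after the first attempt.
  for (let attempts = 1; attempts <= maxRetries + 1; attempts++) {
    const raw = await produce();
    try {
      const parsed: unknown = JSON.parse(raw);
      if (matchesSchema(parsed, options.schema)) {
        return { object: parsed as T, attempts };
      }
    } catch {
      // Malformed JSON: fall through and try again.
    }
  }
  throw new Error(`No valid object after ${maxRetries + 1} attempts`);
}
```

In this sketch a parse failure and a schema failure are handled the same way: the output is discarded and the producer is asked again until the budget runs out.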
They share one engine
Two components asking for the same model receive the same underlying engine, so the weights upload to the GPU once no matter how many hooks use them — and distinct models run side by side. You don't wire any of this up. See Concurrency & Memory for how the shared-engine lifecycle and GPU memory budgeting work.
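One way to picture this sharing is a cache keyed by model name, where the first request creates the engine and later requests reuse it. The sketch below shows that general pattern only; it is not Gerbil's internals, and createEngineCache, acquire, release, and the reference counting are all hypothetical.

```typescript
// Illustrative sketch only: a generic keyed cache, not Gerbil's internals.

type EngineLike = { dispose(): void };

function createEngineCache<E extends EngineLike>(create: (model: string) => E) {
  const entries = new Map<string, { engine: E; refs: number }>();

  return {
    /** Returns the engine for `model`, creating it only on the first request. */
    acquire(model: string): E {
      let entry = entries.get(model);
      if (entry === undefined) {
        entry = { engine: create(model), refs: 0 };
        entries.set(model, entry);
      }
      entry.refs += 1;
      return entry.engine;
    },

    /** Drops one reference and disposes the engine once nothing uses it (an assumed policy). */
    release(model: string): void {
      const entry = entries.get(model);
      if (entry === undefined) return;
      entry.refs -= 1;
      if (entry.refs <= 0) {
        entry.engine.dispose();
        entries.delete(model);
      }
    },

    /** Number of distinct models currently held. */
    size(): number {
      return entries.size;
    },
  };
}
```

With a cache like this, two callers asking for the same model name get the identical object back, while a different model name produces a second, independent entry.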
Coming from the Vercel AI SDK?
If you've used the AI SDK's useChat and useCompletion, these map closely — same status lifecycle (ready → submitted → streaming), the same stop(), and a sendMessage alias on useChat. A few things differ because everything runs on the user's device:
- No backend to wire. There's no api route or transport to configure. The model runs in the browser, so you pass a model (or nothing) instead of an endpoint.
- Messages use string content. Each message is { role, content } with content as a plain string. The newer parts[] message shape isn't adopted, so read m.content directly.
- Loading is part of the API. Because the first call downloads the weights and uploads them to the GPU, hooks surface isLoading and loadingProgress so you can show a one-time model-download bar.
If you're using the AI SDK on the server, the Gerbil AI SDK provider still works there — these hooks are the in-browser counterpart.