React Hooks

Gerbil ships a hook for every modality you'd build a UI around — chat, completion, autocomplete, vision, embeddings, speech, memory, and a full on-device voice assistant. Import them from @tryhamster/gerbil/hooks, drop one into a component, and you have local, GPU-accelerated AI with no server, no API keys, and no provider to wrap your app in.

Zero-config by default

Every hook works with no arguments. Each one picks a sensible default model for its capability and loads it on first use, so the smallest possible integration is a single hook call:

Chat.tsx
"use client";

import { useChat } from "@tryhamster/gerbil/hooks";

function Chat() {
  const { messages, send, isGenerating } = useChat(); // no model needed

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}>
          <strong>{m.role}:</strong> {m.content}
        </p>
      ))}
      <button onClick={() => send("Hello!")} disabled={isGenerating}>
        Send
      </button>
    </div>
  );
}

When you want a specific model, pass one: useChat({ model: "mlx-community/Qwen3.5-0.8B-4bit", system: "You are concise." }). Otherwise the defaults just work, and dtype: "auto" adapts precision to the device.

The two you'll reach for first

useChat manages a conversation and streams replies in. Multi-turn context is handled for you — the full history is sent each turn — and the lifecycle is reported through a single status value:

Assistant.tsx
"use client";

import { useChat } from "@tryhamster/gerbil/hooks";

function Assistant() {
  const { messages, send, status, isGenerating, tps, stop } = useChat({
    system: "You are a helpful assistant.",
  });

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}><strong>{m.role}:</strong> {m.content}</p>
      ))}

      <button onClick={() => send("Explain WebGPU in one line.")} disabled={isGenerating}>
        {status === "streaming" ? "Streaming…" : "Ask"}
      </button>
      {isGenerating && <button onClick={stop}>Stop</button>}
      {/* Ternary instead of &&, so a tps of 0 doesn't render a stray "0" */}
      {tps ? <span>{tps.toFixed(0)} tok/s</span> : null}
    </div>
  );
}
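
To make "the full history is sent each turn" concrete, here is a minimal sketch of how a turn might be assembled. The names ChatMessage and buildTurn are hypothetical and exist only for illustration; this is not Gerbil's internal code, just the general shape of multi-turn prompting.

```typescript
// Hypothetical illustration of multi-turn context, not Gerbil's internals.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Build the message list for the next turn: optional system prompt,
// every prior message, then the new user message. The input is not mutated.
function buildTurn(
  history: ChatMessage[],
  userText: string,
  system?: string,
): ChatMessage[] {
  const prefix: ChatMessage[] = system ? [{ role: "system", content: system }] : [];
  return [...prefix, ...history, { role: "user", content: userText }];
}

const history: ChatMessage[] = [
  { role: "user", content: "Hi" },
  { role: "assistant", content: "Hello!" },
];

const turn = buildTurn(
  history,
  "Explain WebGPU in one line.",
  "You are a helpful assistant.",
);
console.log(turn.length); // 4: system + two prior messages + the new one
```

With useChat you never write this yourself; the sketch only shows why the model can refer back to earlier turns.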

useVoiceChat is the one you won't find anywhere else: a complete spoken assistant that runs end to end on the device. It composes speech-to-text, chat, and text-to-speech into a single mic → LLM → spoken-reply loop — no cloud round-trip at any stage:

VoiceAssistant.tsx
"use client";

import { useVoiceChat } from "@tryhamster/gerbil/hooks";

function VoiceAssistant() {
  const {
    messages,
    start,
    stop,
    isListening,
    isTranscribing,
    isThinking,
    isSpeaking,
    transcript,
  } = useVoiceChat({ system: "You are a friendly voice assistant.", voice: "en_us" });

  const status = isListening
    ? "Listening…"
    : isTranscribing
      ? "Transcribing…"
      : isThinking
        ? "Thinking…"
        : isSpeaking
          ? "Speaking…"
          : "Tap to talk";

  return (
    <div>
      <button onClick={() => (isListening ? stop() : start())}>{status}</button>
      {transcript && <p>You said: {transcript}</p>}
      {messages.map((m, i) => (
        <p key={i}><strong>{m.role}:</strong> {m.content}</p>
      ))}
    </div>
  );
}

Pass speak: false for a text-only voice-input loop, or point sttModel / ttsModel at your own checkpoints.
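
If the nested ternary in VoiceAssistant.tsx gets unwieldy, one option is to pull the label logic into a small pure function that you can unit-test without a microphone. The sketch below assumes only the four boolean flags shown in the example; VoiceFlags and voiceStatusLabel are hypothetical names, not part of Gerbil.

```typescript
// Hypothetical helper: maps the useVoiceChat phase flags to a button label.
interface VoiceFlags {
  isListening: boolean;
  isTranscribing: boolean;
  isThinking: boolean;
  isSpeaking: boolean;
}

// Checks the flags in pipeline order (mic, STT, LLM, TTS) and
// falls back to the idle label when none is set.
function voiceStatusLabel(flags: VoiceFlags): string {
  if (flags.isListening) return "Listening…";
  if (flags.isTranscribing) return "Transcribing…";
  if (flags.isThinking) return "Thinking…";
  if (flags.isSpeaking) return "Speaking…";
  return "Tap to talk";
}

const idle: VoiceFlags = {
  isListening: false,
  isTranscribing: false,
  isThinking: false,
  isSpeaking: false,
};

console.log(voiceStatusLabel(idle)); // Tap to talk
console.log(voiceStatusLabel({ ...idle, isThinking: true })); // Thinking…
```

In the component you would then write const status = voiceStatusLabel({ isListening, isTranscribing, isThinking, isSpeaking }).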

The full hook set

Every hook is zero-config and is imported from @tryhamster/gerbil/hooks.

| Hook | Purpose | Key return fields |
| --- | --- | --- |
| useChat | Multi-turn conversation with streaming replies. | messages, send, regenerate, status, isGenerating, tps, stop |
| useVoiceChat ⭐ | Full on-device voice assistant (mic → LLM → spoken reply). | messages, start, stop, isListening, isThinking, isSpeaking, transcript |
| useCompletion | Single-prompt streaming with built-in input helpers. | completion, complete, input, handleInputChange, handleSubmit, stop |
| useText | One-shot text generation. | complete, completion, isGenerating, tps |
| useObject | Structured output: generate, parse JSON, validate, and retry until valid. | object, generate, attempts, isGenerating, isLoading |
| useAutocomplete | Inline autocomplete (ghost text) with built-in debounce and stale-response guards. | suggestion, onInput, accept, dismiss, isFetching, isReady |
| useVision | Image understanding (image in → text out). | describeImage, completion, isGenerating |
| useEmbedding | Text embeddings and similarity scoring. | embed, similarity, isReady |
| useTTS | Text-to-speech with playback and replay. | speak, replay, stop, isSynthesizing, isPlaying, hasAudio, rtf |
| useSTT | Speech-to-text from the microphone. | startRecording, stopRecording, transcript, isRecording, isTranscribing |
| useMemory | On-device RAG: store, recall, and search text. | add, recall, search, remove, clear, size, isReady |
| useEngine | The advanced base hook, an escape hatch for full engine control. | complete, embed, describeImage, speak, load, dispose |

Reach for useEngine only when you need something the modality hooks don't expose — mixing capabilities in one component, or driving the engine imperatively. For everything else, the dedicated hook is shorter and clearer.

For inline autocomplete and ghost text, useAutocomplete owns the debounce, in-flight, and stale-response guards so your component just renders the suggestion and handles Tab to accept and Esc to dismiss. See Autocomplete & Rewrite for the full guide, including the lower-level autocomplete and rewrite methods on useEngine.
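
For a sense of what a stale-response guard does, here is a minimal sketch of the general technique: tag each request with an increasing id and drop any result that is no longer the latest. createStaleGuard and latestOnly are hypothetical names; how useAutocomplete implements its guards internally is not shown here, and debouncing is left out to keep the example short.

```typescript
// Hypothetical sketch of a stale-response guard (not Gerbil's implementation).
interface StaleGuard {
  begin(): number; // start a new request and get its id
  isCurrent(id: number): boolean; // true only for the most recent request
}

function createStaleGuard(): StaleGuard {
  let latest = 0;
  return {
    begin() {
      latest += 1;
      return latest;
    },
    isCurrent(id) {
      return id === latest;
    },
  };
}

// Run an async task, but resolve to undefined if a newer task
// started before this one finished.
async function latestOnly<T>(
  guard: StaleGuard,
  task: () => Promise<T>,
): Promise<T | undefined> {
  const id = guard.begin();
  const result = await task();
  return guard.isCurrent(id) ? result : undefined;
}
```

Without a guard like this, a slow suggestion for "hel" could arrive after the user has already typed "hello wor" and overwrite the fresher ghost text.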

Live tokens per second

useEngine().tps updates on every token during generation, so a tok/s readout ticks live as text streams in rather than only resolving once the run finishes. The hooks built on it — including useChat and useText — surface the same live value through their own tps field.

When you drive the engine imperatively, the live number comes from the onToken callback's second argument, a meta object with { tokenIndex, tps, elapsedMs }. The tps here is decode-only — it measures token generation and excludes the one-time prefill, so it reflects steady-state throughput rather than being dragged down by prompt processing:

onToken.ts
const result = await engine.generate(prompt, {
  onToken: (token, meta) => {
    process.stdout.write(token);
    if (meta) {
      // meta.tokenIndex: index of this token in the decode loop
      // meta.tps: live decode-only tokens/sec (excludes prefill)
      // meta.elapsedMs: ms elapsed since generation started
      updateSpeedReadout(meta.tps);
    }
  },
});
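
To see why excluding prefill matters, the sketch below computes both a decode-only rate and an end-to-end rate from the same token timestamps. The helper names and the exact formula are assumptions for illustration; Gerbil's own tps calculation may differ in detail.

```typescript
// Hypothetical helpers contrasting decode-only and end-to-end throughput.

// Decode-only: count the intervals between tokens, starting the clock
// at the first token so prompt processing (prefill) is excluded.
function decodeTps(tokenTimestampsMs: number[]): number {
  if (tokenTimestampsMs.length < 2) return 0;
  const first = tokenTimestampsMs[0];
  const last = tokenTimestampsMs[tokenTimestampsMs.length - 1];
  const intervals = tokenTimestampsMs.length - 1;
  return last > first ? (intervals * 1000) / (last - first) : 0;
}

// End-to-end: count every token, starting the clock when the request began,
// so the prefill time drags the number down.
function endToEndTps(tokenTimestampsMs: number[], startMs: number): number {
  if (tokenTimestampsMs.length === 0) return 0;
  const last = tokenTimestampsMs[tokenTimestampsMs.length - 1];
  return last > startMs ? (tokenTimestampsMs.length * 1000) / (last - startMs) : 0;
}

// 500 ms of prefill, then one token every 20 ms.
const stamps = [500, 520, 540, 560, 580, 600];
console.log(decodeTps(stamps)); // 50
console.log(endToEndTps(stamps, 0)); // 10
```

In this example the model decodes at a steady 50 tok/s, but a readout that included the 500 ms prefill would show 10 tok/s, which understates what the user sees once text starts streaming.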

Forms with useCompletion

useCompletion is a single-prompt hook with controlled-input helpers, so a prompt box is a few lines:

Prompt.tsx
"use client";

import { useCompletion } from "@tryhamster/gerbil/hooks";

function Prompt() {
  const { completion, input, handleInputChange, handleSubmit, isLoading } =
    useCompletion();

  return (
    <form onSubmit={handleSubmit}>
      <input value={input} onChange={handleInputChange} placeholder="Ask anything…" />
      <button type="submit" disabled={isLoading}>
        {isLoading ? "…" : "Run"}
      </button>
      <p>{completion}</p>
    </form>
  );
}

Structured output with useObject

useObject generates JSON, parses it, validates it against a schema, and retries until it's valid. On-device tokens are free, so re-rolling malformed JSON costs nothing but a moment — you get a typed object back instead of a string you have to defensively parse:

Extractor.tsx
"use client";

import { useObject } from "@tryhamster/gerbil/hooks";

type Person = { name: string; age: number };

function Extractor() {
  const { object, generate, attempts, isGenerating } = useObject<Person>();

  return (
    <div>
      <button
        onClick={() =>
          generate('Extract {name, age} from: "I am Sarah, 28"', {
            schema: { required: ["name", "age"] },
          })
        }
        disabled={isGenerating}
      >
        {isGenerating ? "Extracting…" : "Extract"}
      </button>
      {object && (
        <p>
          {object.name} is {object.age} (took {attempts} attempt{attempts === 1 ? "" : "s"})
        </p>
      )}
    </div>
  );
}

The schema is either a minimal JSON-schema-ish object with required keys, or a predicate (o) => boolean for arbitrary validation. Omit it to require valid JSON only. Tune the retry budget with maxRetries (default 4). Need it imperatively instead of as a hook? Call engine.generateObject(prompt, { schema }) — it returns { object, attempts }.
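
The generate, parse, validate, retry loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not Gerbil's source: generateObjectSketch, isValid, and the stubbed generate callback are hypothetical, and treating maxRetries as "retries after the first attempt" is a guess about the semantics.

```typescript
// Hypothetical sketch of a parse-validate-retry loop (not Gerbil's code).
type Schema = { required: string[] } | ((o: unknown) => boolean);

// No schema: any valid JSON passes. Predicate: call it.
// Object schema: every required key must be present on a non-null object.
function isValid(value: unknown, schema?: Schema): boolean {
  if (!schema) return true;
  if (typeof schema === "function") return schema(value);
  if (typeof value !== "object" || value === null) return false;
  return schema.required.every((key) => key in value);
}

async function generateObjectSketch<T>(
  generate: (attempt: number) => Promise<string>, // stands in for the model call
  options: { schema?: Schema; maxRetries?: number } = {},
): Promise<{ object: T; attempts: number }> {
  const maxRetries = options.maxRetries ?? 4; // assumed: retries after the first try
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    const raw = await generate(attempt);
    try {
      const parsed: unknown = JSON.parse(raw);
      if (isValid(parsed, options.schema)) {
        return { object: parsed as T, attempts: attempt };
      }
    } catch {
      // Malformed JSON: fall through and try again.
    }
  }
  throw new Error(`No valid object after ${maxRetries + 1} attempts`);
}

// Canned outputs simulate a model that gets it right on the third try.
const canned = ["oops", '{"name":"Sarah"}', '{"name":"Sarah","age":28}'];
generateObjectSketch<{ name: string; age: number }>(
  async (attempt) => canned[attempt - 1],
  { schema: { required: ["name", "age"] } },
).then(({ object, attempts }) => console.log(object.name, attempts)); // Sarah 3
```

Note that the cast to T is only as trustworthy as the schema: a required-keys check confirms the keys exist, not that age is a number, so use a predicate when the types matter.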

They share one engine

Two components asking for the same model receive the same underlying engine, so the weights upload to the GPU once no matter how many hooks use them — and distinct models run side by side. You don't wire any of this up. See Concurrency & Memory for how the shared-engine lifecycle and GPU memory budgeting work.
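
As a mental model for "same model, same engine", here is a minimal sketch of a reference-counted registry keyed by model id. Everything in it is an assumption for illustration: EngineRegistry, EngineLike, and dispose-at-zero-references are hypothetical, and Gerbil's actual lifecycle is described in Concurrency & Memory.

```typescript
// Hypothetical sketch of a shared-engine registry (not Gerbil's implementation).
interface EngineLike {
  model: string;
  dispose(): void;
}

class EngineRegistry {
  private entries = new Map<string, { engine: EngineLike; refs: number }>();
  private create: (model: string) => EngineLike;

  constructor(create: (model: string) => EngineLike) {
    this.create = create;
  }

  // Return the existing engine for this model, or create it on first request.
  acquire(model: string): EngineLike {
    let entry = this.entries.get(model);
    if (!entry) {
      entry = { engine: this.create(model), refs: 0 };
      this.entries.set(model, entry);
    }
    entry.refs += 1;
    return entry.engine;
  }

  // Drop one reference; dispose the engine when nobody uses it anymore.
  release(model: string): void {
    const entry = this.entries.get(model);
    if (!entry) return;
    entry.refs -= 1;
    if (entry.refs === 0) {
      entry.engine.dispose();
      this.entries.delete(model);
    }
  }
}
```

In a React setting, a hook would typically acquire in an effect and release in that effect's cleanup, so two mounted components using the same model would hold two references to one engine.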

Coming from the Vercel AI SDK?

If you've used the AI SDK's useChat and useCompletion, these map closely: the same status lifecycle (ready → submitted → streaming), the same stop(), and a sendMessage alias on useChat. A few things differ because everything runs on the user's device:

  • No backend to wire. There's no api route or transport to configure — the model runs in the browser, so you pass a model (or nothing) instead of an endpoint.
  • Messages use string content. Each message is { role, content } with content as a plain string. The newer parts[] message shape isn't adopted — read m.content directly.
  • Loading is part of the API. Because the first call downloads and uploads weights, hooks surface isLoading and loadingProgress so you can show a one-time model-download bar.
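
If you have existing messages in a parts-based shape, flattening them to string content is a small mapping. The sketch below assumes a parts array whose text entries look like { type: "text", text }; the type names and toStringContent are hypothetical, and non-text parts are simply dropped.

```typescript
// Hypothetical migration helper: parts-based messages to { role, content }.
interface MessagePart {
  type: string;
  text?: string;
}

interface PartsMessage {
  role: "system" | "user" | "assistant";
  parts: MessagePart[];
}

interface StringMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Concatenate the text parts in order; ignore everything else.
function toStringContent(message: PartsMessage): StringMessage {
  const content = message.parts
    .filter((part) => part.type === "text" && typeof part.text === "string")
    .map((part) => part.text as string)
    .join("");
  return { role: message.role, content };
}

const converted = toStringContent({
  role: "assistant",
  parts: [
    { type: "text", text: "WebGPU is " },
    { type: "step-start" },
    { type: "text", text: "a browser GPU API." },
  ],
});
console.log(converted.content); // WebGPU is a browser GPU API.
```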

If you're using the AI SDK on the server, the Gerbil AI SDK provider still works there — these hooks are the in-browser counterpart.