React Hooks

Name: Gerbil
Author: Gerbil

Gerbil ships a hook for every modality you'd build a UI around — chat, completion, autocomplete, vision, embeddings, speech, memory, and a full on-device voice assistant. Import them from @tryhamster/gerbil/hooks, drop one into a component, and you have local, GPU-accelerated AI with no server, no API keys, and no provider to wrap your app in.

Zero-config by default

Every hook works with no arguments. Each one picks a sensible default model for its capability and loads it on first use, so the smallest possible app is one line:

Chat.tsx

"use client";

import { useChat } from "@tryhamster/gerbil/hooks";

function Chat() {
  const { messages, send, isGenerating } = useChat(); // no model needed

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}>
          <strong>{m.role}:</strong> {m.content}
        </p>
      ))}
      <button onClick={() => send("Hello!")} disabled={isGenerating}>
        Send
      </button>
    </div>
  );
}

When you want a specific model, pass one: useChat({ model: "mlx-community/Qwen3.5-0.8B-4bit", system: "You are concise." }). Otherwise the defaults just work, and dtype: "auto" adapts precision to the device.

The two you'll reach for first

useChat manages a conversation and streams replies in. Multi-turn context is handled for you — the full history is sent each turn — and the lifecycle is reported through a single status value:

Assistant.tsx

"use client";

import { useChat } from "@tryhamster/gerbil/hooks";

function Assistant() {
  const { messages, send, status, isGenerating, tps, stop } = useChat({
    system: "You are a helpful assistant.",
  });

  return (
    <div>
      {messages.map((m, i) => (
        <p key={i}><strong>{m.role}:</strong> {m.content}</p>
      ))}

      <button onClick={() => send("Explain WebGPU in one line.")} disabled={isGenerating}>
        {status === "streaming" ? "Streaming…" : "Ask"}
      </button>
      {isGenerating && <button onClick={stop}>Stop</button>}
      {/* A ternary avoids rendering a stray "0" when tps is 0 */}
      {tps ? <span>{tps.toFixed(0)} tok/s</span> : null}
    </div>
  );
}
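To make "the full history is sent each turn" concrete, here is a minimal sketch of how one turn's input could be assembled. It is an illustration only, not Gerbil's implementation: the `system` role on the message type and the `buildTurn` helper are hypothetical names introduced for this example.

```typescript
// Hypothetical message shape; the docs only state { role, content } with string content.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Assemble what the model would see for the next turn: an optional system
// prompt, every prior message, then the new user message.
function buildTurn(
  pastMessages: ChatMessage[],
  userText: string,
  system?: string,
): ChatMessage[] {
  const prefix: ChatMessage[] = system ? [{ role: "system", content: system }] : [];
  // Spread into a new array so the caller's history is never mutated.
  return [...prefix, ...pastMessages, { role: "user", content: userText }];
}

const pastMessages: ChatMessage[] = [
  { role: "user", content: "What is WebGPU?" },
  { role: "assistant", content: "A browser API for GPU compute and graphics." },
];
const turn = buildTurn(pastMessages, "Is it faster than WebGL?", "You are concise.");
console.log(turn.length); // 4
```

Because every turn carries the whole conversation, input grows with each exchange; that is the trade-off for not having to manage context yourself.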

useVoiceChat is the one you won't find anywhere else: a complete spoken assistant that runs end to end on the device. It composes speech-to-text, chat, and text-to-speech into a single mic → LLM → spoken-reply loop — no cloud round-trip at any stage:

VoiceAssistant.tsx

"use client";

import { useVoiceChat } from "@tryhamster/gerbil/hooks";

function VoiceAssistant() {
  const {
    messages,
    start,
    stop,
    isListening,
    isTranscribing,
    isThinking,
    isSpeaking,
    transcript,
  } = useVoiceChat({ system: "You are a friendly voice assistant.", voice: "en_us" });

  const status = isListening
    ? "Listening…"
    : isTranscribing
      ? "Transcribing…"
      : isThinking
        ? "Thinking…"
        : isSpeaking
          ? "Speaking…"
          : "Tap to talk";

  return (
    <div>
      <button onClick={() => (isListening ? stop() : start())}>{status}</button>
      {transcript && <p>You said: {transcript}</p>}
      {messages.map((m, i) => (
        <p key={i}><strong>{m.role}:</strong> {m.content}</p>
      ))}
    </div>
  );
}

Pass speak: false for a text-only voice-input loop, or point sttModel / ttsModel at your own checkpoints.
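Conceptually, one pass of the voice loop chains three stages. The sketch below shows that composition under stated assumptions: `VoiceStages`, `VoiceTurn`, and `runVoiceTurn` are hypothetical names, the stage signatures are invented for illustration, and skipping the reply on an empty transcript is a design choice of this example rather than documented behavior.

```typescript
// Hypothetical stage signatures; the real hook wires these to on-device models.
type VoiceStages = {
  transcribe: (audio: Float32Array) => Promise<string>; // speech-to-text
  reply: (text: string) => Promise<string>;             // chat model
  synthesize: (text: string) => Promise<void>;          // text-to-speech and playback
};

type VoiceTurn = { transcript: string; reply: string; spoken: boolean };

// One pass of the mic -> LLM -> spoken-reply loop.
async function runVoiceTurn(
  audio: Float32Array,
  stages: VoiceStages,
  speak = true,
): Promise<VoiceTurn> {
  const transcript = (await stages.transcribe(audio)).trim();
  if (transcript === "") {
    // Nothing intelligible was heard, so skip the chat and speech stages.
    return { transcript, reply: "", spoken: false };
  }
  const reply = await stages.reply(transcript);
  // With speak set to false the loop stays text-only, as with `speak: false` above.
  if (speak) await stages.synthesize(reply);
  return { transcript, reply, spoken: speak };
}
```

Each stage awaits the previous one, which is why the hook can report a distinct flag (listening, transcribing, thinking, speaking) for whichever stage is currently active.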

The full hook set

Every hook is zero-config and is imported from @tryhamster/gerbil/hooks.

Hook	Purpose	Key return fields
useChat	Multi-turn conversation with streaming replies.	messages, send, regenerate, status, isGenerating, tps, stop
useVoiceChat ⭐	Full on-device voice assistant (mic → LLM → spoken reply).	messages, start, stop, isListening, isThinking, isSpeaking, transcript
useCompletion	Single-prompt streaming with built-in input helpers.	completion, complete, input, handleInputChange, handleSubmit, stop
useText	One-shot text generation.	complete, completion, isGenerating, tps
useObject	Structured output — generate, parse JSON, validate, and retry until valid.	object, generate, attempts, isGenerating, isLoading
useAutocomplete	Inline autocomplete (ghost text) with built-in debounce and stale-response guards.	suggestion, onInput, accept, dismiss, isFetching, isReady
useVision	Image understanding (image in → text out).	describeImage, completion, isGenerating
useEmbedding	Text embeddings and similarity scoring.	embed, similarity, isReady
useTTS	Text-to-speech with playback and replay.	speak, replay, stop, isSynthesizing, isPlaying, hasAudio, rtf
useSTT	Speech-to-text from the microphone.	startRecording, stopRecording, transcript, isRecording, isTranscribing
useMemory	On-device RAG — store, recall, and search text.	add, recall, search, remove, clear, size, isReady
useEngine	The advanced base hook — escape hatch for full engine control.	complete, embed, describeImage, speak, load, dispose

Reach for useEngine only when you need something the modality hooks don't expose — mixing capabilities in one component, or driving the engine imperatively. For everything else, the dedicated hook is shorter and clearer.

For inline autocomplete and ghost text, useAutocomplete owns the debounce, in-flight, and stale-response guards so your component just renders the suggestion and handles Tab to accept and Esc to dismiss. See Autocomplete & Rewrite for the full guide, including the lower-level autocomplete and rewrite methods on useEngine.
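To show what a stale-response guard does, here is a minimal sketch of the general technique. It is not the hook's actual code; `LatestOnly` and its methods are hypothetical names. Each request takes a ticket, and a response is rendered only if its ticket is still the newest one.

```typescript
// Hands out increasing tickets; only the most recent ticket counts as current.
class LatestOnly {
  private latest = 0;

  // Call when a new request starts (for example, after a debounce timer fires).
  begin(): number {
    this.latest += 1;
    return this.latest;
  }

  // Call when a response arrives; false means a newer request superseded it.
  isCurrent(ticket: number): boolean {
    return ticket === this.latest;
  }

  // Invalidate everything in flight (for example, when the user dismisses).
  cancelAll(): void {
    this.latest += 1;
  }
}

const guard = new LatestOnly();
const first = guard.begin();  // user typed "hel"
const second = guard.begin(); // user typed "hello" before the first reply landed
console.log(guard.isCurrent(first));  // false: drop this suggestion
console.log(guard.isCurrent(second)); // true: safe to render
```

Without a guard like this, a slow reply for an earlier keystroke could overwrite the suggestion for what the user has typed since.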

Live tokens per second

useEngine().tps updates on every token during generation, so a tok/s readout ticks live as text streams in rather than only resolving once the run finishes. The hooks built on it — including useChat and useText — surface the same live value through their own tps field.

When you drive the engine imperatively, the live number comes from the onToken callback's second argument, a meta object with { tokenIndex, tps, elapsedMs }. The tps here is decode-only — it measures token generation and excludes the one-time prefill, so it reflects steady-state throughput rather than being dragged down by prompt processing:

onToken.ts

const result = await engine.generate(prompt, {
  onToken: (token, meta) => {
    process.stdout.write(token);
    if (meta) {
      // meta.tokenIndex: index of this token in the decode loop
      // meta.tps:        live decode-only tokens/sec (excludes prefill)
      // meta.elapsedMs:  ms elapsed since generation started
      updateSpeedReadout(meta.tps);
    }
  },
});
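The difference between decode-only and end-to-end throughput is easiest to see with numbers. The sketch below is one plausible way to compute each, not a description of Gerbil's internals: `DecodeSample`, `decodeTps`, and `overallTps` are hypothetical names, and treating the first token's arrival as the end of prefill is an assumption of this example.

```typescript
// Timestamps in milliseconds; all names here are hypothetical.
type DecodeSample = {
  startedMs: number;    // generation requested
  firstTokenMs: number; // prefill finished and the first token was emitted
  nowMs: number;        // time of the latest token
  tokens: number;       // tokens emitted so far, including the first
};

// Decode-only throughput: tokens after the first, over time since the first token.
function decodeTps(s: DecodeSample): number {
  const decodeMs = s.nowMs - s.firstTokenMs;
  if (s.tokens < 2 || decodeMs <= 0) return 0;
  return ((s.tokens - 1) * 1000) / decodeMs;
}

// End-to-end throughput, for contrast: includes the prefill wait.
function overallTps(s: DecodeSample): number {
  const totalMs = s.nowMs - s.startedMs;
  return totalMs > 0 ? (s.tokens * 1000) / totalMs : 0;
}

const sample: DecodeSample = { startedMs: 0, firstTokenMs: 800, nowMs: 1800, tokens: 51 };
console.log(decodeTps(sample));             // 50
console.log(overallTps(sample).toFixed(1)); // "28.3"
```

With an 800 ms prefill, the end-to-end figure sits well below the steady-state rate, which is why a decode-only number is the more stable one to show in a live readout.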

Forms with useCompletion

useCompletion is a single-prompt hook with controlled-input helpers, so a prompt box is a few lines:

Prompt.tsx

"use client";

import { useCompletion } from "@tryhamster/gerbil/hooks";

function Prompt() {
  const { completion, input, handleInputChange, handleSubmit, isLoading } =
    useCompletion();

  return (
    <form onSubmit={handleSubmit}>
      <input value={input} onChange={handleInputChange} placeholder="Ask anything…" />
      <button type="submit" disabled={isLoading}>
        {isLoading ? "…" : "Run"}
      </button>
      <p>{completion}</p>
    </form>
  );
}

Structured output with useObject

useObject generates JSON, parses it, validates it against a schema, and retries until it's valid. On-device tokens are free, so re-rolling malformed JSON costs nothing but a moment — you get a typed object back instead of a string you have to defensively parse:

Extractor.tsx

"use client";

import { useObject } from "@tryhamster/gerbil/hooks";

type Person = { name: string; age: number };

function Extractor() {
  const { object, generate, attempts, isGenerating } = useObject<Person>();

  return (
    <div>
      <button
        onClick={() =>
          generate('Extract {name, age} from: "I am Sarah, 28"', {
            schema: { required: ["name", "age"] },
          })
        }
        disabled={isGenerating}
      >
        {isGenerating ? "Extracting…" : "Extract"}
      </button>
      {object && (
        <p>
          {object.name} is {object.age} (took {attempts} attempt{attempts === 1 ? "" : "s"})
        </p>
      )}
    </div>
  );
}

The schema is either a minimal JSON-schema-ish object with required keys, or a predicate (o) => boolean for arbitrary validation. Omit it to require valid JSON only. Tune the retry budget with maxRetries (default 4). Need it imperatively instead of as a hook? Call engine.generateObject(prompt, { schema }) — it returns { object, attempts }.
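The generate, parse, validate, retry loop can be sketched in a few lines. This is an illustration of the pattern, not the library's code: `Schema`, `isValid`, and `generateObjectSketch` are hypothetical names, `produce` is a synchronous stand-in for a model call, and reading maxRetries as "re-rolls after the first try" is an assumption.

```typescript
// Schema as the text describes it: required keys, or an arbitrary predicate.
type Schema = { required: string[] } | ((o: unknown) => boolean);

function isValid(value: unknown, schema?: Schema): boolean {
  if (schema === undefined) return true; // valid JSON alone is enough
  if (typeof schema === "function") return schema(value);
  if (typeof value !== "object" || value === null) return false;
  return schema.required.every((key) => key in value);
}

// `produce` stands in for a model call; it is synchronous only to keep the sketch small.
function generateObjectSketch<T>(
  produce: (attempt: number) => string,
  schema?: Schema,
  maxRetries = 4,
): { object: T; attempts: number } {
  // Assumption: maxRetries counts re-rolls after the first try.
  for (let attempt = 1; attempt <= maxRetries + 1; attempt++) {
    try {
      const parsed: unknown = JSON.parse(produce(attempt));
      if (isValid(parsed, schema)) return { object: parsed as T, attempts: attempt };
    } catch {
      // Malformed JSON (or a throwing predicate): fall through and re-roll.
    }
  }
  throw new Error(`No valid object after ${maxRetries + 1} attempts`);
}

// Simulated model outputs: truncated JSON, a missing key, then a valid object.
const outputs = ['{"name": "Sarah"', '{"name":"Sarah"}', '{"name":"Sarah","age":28}'];
const result = generateObjectSketch<{ name: string; age: number }>(
  (attempt) => outputs[attempt - 1] ?? "",
  { required: ["name", "age"] },
);
console.log(result.attempts); // 3
```

The first output fails to parse and the second parses but lacks `age`, so the loop succeeds on the third attempt.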

They share one engine

Two components asking for the same model receive the same underlying engine, so the weights upload to the GPU once no matter how many hooks use them — and distinct models run side by side. You don't wire any of this up. See Concurrency & Memory for how the shared-engine lifecycle and GPU memory budgeting work.
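One common way to get this behavior is a registry keyed by model name. The sketch below shows the idea under stated assumptions: `FakeEngine` and `EngineRegistry` are hypothetical, the model names are placeholders, and reference-counted release is a guess at the lifecycle rather than something this page documents.

```typescript
// Hypothetical stand-in for a loaded model; a real engine would hold GPU weights.
type FakeEngine = { model: string; id: number };

class EngineRegistry {
  private engines = new Map<string, { engine: FakeEngine; refs: number }>();
  private loads = 0;

  // Return the cached engine for this model, loading it only on the first request.
  acquire(model: string): FakeEngine {
    let entry = this.engines.get(model);
    if (!entry) {
      this.loads += 1; // this is where the weights would be uploaded once
      entry = { engine: { model, id: this.loads }, refs: 0 };
      this.engines.set(model, entry);
    }
    entry.refs += 1;
    return entry.engine;
  }

  // Drop one reference; forget the engine when nothing uses it any more.
  release(model: string): void {
    const entry = this.engines.get(model);
    if (!entry) return;
    entry.refs -= 1;
    if (entry.refs <= 0) this.engines.delete(model);
  }

  get loadCount(): number {
    return this.loads;
  }
}

const registry = new EngineRegistry();
const a = registry.acquire("model-a");
const b = registry.acquire("model-a"); // second component, same model
const c = registry.acquire("model-b"); // a distinct model loads separately
console.log(a === b, a === c, registry.loadCount); // true false 2
```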

Coming from the Vercel AI SDK?

If you've used the AI SDK's useChat and useCompletion, these map closely — same status lifecycle (ready → submitted → streaming), the same stop(), and a sendMessage alias on useChat. A few things differ because everything runs on the user's device:

- No backend to wire. There's no api route or transport to configure: the model runs in the browser, so you pass a model (or nothing) instead of an endpoint.
- Messages use string content. Each message is { role, content } with content as a plain string. The newer parts[] message shape isn't adopted, so read m.content directly.
- Loading is part of the API. Because the first call downloads and uploads weights, hooks surface isLoading and loadingProgress so you can show a one-time model-download bar.
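The status lifecycle mentioned above (ready → submitted → streaming) can be pictured as a small state machine. This is a sketch for intuition only: the event names and `nextStatus` are hypothetical, and the return to ready after a run finishes or is stopped is an assumption based on how the AI SDK's lifecycle behaves.

```typescript
type ChatStatus = "ready" | "submitted" | "streaming";
type ChatEvent = "send" | "firstToken" | "finish" | "stop"; // hypothetical event names

// Advance the lifecycle; events that do not apply in the current state are ignored.
function nextStatus(current: ChatStatus, event: ChatEvent): ChatStatus {
  switch (current) {
    case "ready":
      return event === "send" ? "submitted" : current;
    case "submitted":
      if (event === "firstToken") return "streaming";
      return event === "stop" ? "ready" : current;
    case "streaming":
      return event === "finish" || event === "stop" ? "ready" : current;
  }
}

const events: ChatEvent[] = ["send", "firstToken", "finish"];
let current: ChatStatus = "ready";
const trace: ChatStatus[] = [current];
for (const e of events) {
  current = nextStatus(current, e);
  trace.push(current);
}
console.log(trace.join(" -> ")); // ready -> submitted -> streaming -> ready
```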

If you're using the AI SDK on the server, the Gerbil AI SDK provider still works there — these hooks are the in-browser counterpart.