API Reference

The native WebGPUEngine is the primary surface. It runs text, vision, embeddings, and speech on WebGPU, in the browser and in Node.

WebGPUEngine

Exported from @tryhamster/gerbil/gpu. Instances are created through the static create() factory; the class is not instantiated with new.

create()

create.ts
import { WebGPUEngine } from "@tryhamster/gerbil/gpu";

// Signature:
//   WebGPUEngine.create(options?: WebGPUEngineOptions): Promise<WebGPUEngine>

// Example: dtype "auto" picks int4 on mobile, native precision on desktop.
const engine = await WebGPUEngine.create({
  repo: "mlx-community/Qwen3.5-0.8B-4bit",
  dtype: "auto",
  enableVision: false, // build the ViT tower (vision checkpoints only)
  embedding: false, // build the embedding head
  maxSeqLen: 4096,
});

generate()

generate.ts
// Signature:
//   engine.generate(prompt: string, options?: GenerateOptions): Promise<GenerateResult>

// Example
const result = await engine.generate("Write a haiku", {
  maxTokens: 256,
  sampling: { temperature: 0.7, topP: 0.9, topK: 50 },
  systemPrompt: "You are helpful.",
  stopSequences: ["\n\n"],
  onToken: (t) => process.stdout.write(t),
});
console.log(result.text, result.tokensPerSecond);

stream()

stream.ts
// Signature:
//   engine.stream(prompt: string, options?: GenerateOptions): AsyncGenerator<string, GenerateResult, unknown>

// Example
for await (const token of engine.stream("Tell me a story")) {
  process.stdout.write(token);
}
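The signature above types the final GenerateResult as the generator's return value. A for await loop discards a generator's return value, so reading it takes manual iteration with next(). The sketch below shows that pattern against a stand-in generator; fakeStream, collect, and the trimmed Result type are hypothetical names used only for illustration, not part of the library.

```typescript
// Hypothetical stand-in for the documented GenerateResult (trimmed to two fields).
interface Result {
  text: string;
  tokensGenerated: number;
}

// Hypothetical generator with the same shape as engine.stream():
// it yields tokens, then returns a final result object.
async function* fakeStream(tokens: string[]): AsyncGenerator<string, Result, unknown> {
  let text = "";
  for (const token of tokens) {
    text += token;
    yield token;
  }
  return { text, tokensGenerated: tokens.length };
}

// Drain a generator by hand so the return value is not lost.
async function collect<T, R>(
  gen: AsyncGenerator<T, R, unknown>,
  onToken: (token: T) => void,
): Promise<R> {
  while (true) {
    const step = await gen.next();
    if (step.done) return step.value; // the generator's return value
    onToken(step.value);
  }
}
```

Assuming engine.stream() behaves as its signature says, collect(engine.stream(prompt), onToken) would resolve to the final result once the last token has been delivered.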

generateObject()

generateObject.ts
// Signature:
//   engine.generateObject<T>(prompt: string, options?: GenerateObjectOptions): Promise<GenerateObjectResult<T>>

// Validates JSON output; retries on parse/schema failure.
const { object, attempts } = await engine.generateObject(
  "Extract: John is 32 and lives in NYC",
  {
    schema: { required: ["name", "age", "city"] },
    maxRetries: 4,
  },
);
console.log(object); // { name: "John", age: 32, city: "NYC" }
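To make the validate-and-retry behavior concrete, here is a minimal sketch of the idea: parse each candidate output, check it against either a predicate or a list of required keys, and stop at the first success. It is not the library's implementation. Validator, isValid, and firstValid are hypothetical names, and the array of strings stands in for successive model responses.

```typescript
// Hypothetical validator type modeled on the documented schema option:
// either a predicate or an object listing required keys.
type Validator = ((o: unknown) => boolean) | { required?: string[] };

function isValid(value: unknown, schema?: Validator): boolean {
  if (schema === undefined) return true; // syntactically valid JSON is enough
  if (typeof schema === "function") return schema(value);
  if (typeof value !== "object" || value === null) return false;
  return (schema.required ?? []).every((key) => key in value);
}

// Try candidate outputs in order; `outputs` stands in for repeated model calls.
function firstValid<T>(
  outputs: string[],
  schema?: Validator,
): { object: T; attempts: number } | null {
  let attempts = 0;
  for (const raw of outputs) {
    attempts++;
    try {
      const parsed: unknown = JSON.parse(raw);
      if (isValid(parsed, schema)) return { object: parsed as T, attempts };
    } catch {
      // Not valid JSON: move on to the next attempt.
    }
  }
  return null; // every attempt failed
}
```

In this sketch a parse error and a schema failure are treated the same way: both simply consume one attempt.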

describeImage()

describeImage.ts
// Signature:
//   engine.describeImage(
//     image: { pixels: Uint8Array; width: number; height: number },
//     prompt?: string,
//     options?: GenerateOptions
//   ): Promise<GenerateResult>

// Requires create({ enableVision: true }) on a vision checkpoint.
// In Node, decode the image to RGB pixels (HWC, 0..255) yourself; the React
// hook's describeImage() takes a URL / data-URL directly.
const { text } = await engine.describeImage(
  { pixels, width, height },
  "What's in this image?",
);
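Many decoders, and canvas ImageData in the browser, produce RGBA rather than RGB. The sketch below drops the alpha channel to get the tightly packed RGB, HWC, 0..255 layout described above. rgbaToRgb is a hypothetical helper, and it assumes the engine expects exactly three bytes per pixel with no row padding.

```typescript
// Convert tightly packed RGBA bytes to tightly packed RGB bytes in HWC order.
// Assumption: the consumer wants 3 bytes per pixel and no alpha channel.
function rgbaToRgb(
  rgba: Uint8Array | Uint8ClampedArray,
  width: number,
  height: number,
): { pixels: Uint8Array; width: number; height: number } {
  const count = width * height;
  if (rgba.length !== count * 4) {
    throw new Error(`expected ${count * 4} RGBA bytes, got ${rgba.length}`);
  }
  const pixels = new Uint8Array(count * 3);
  for (let i = 0; i < count; i++) {
    pixels[i * 3] = rgba[i * 4]; // R
    pixels[i * 3 + 1] = rgba[i * 4 + 1]; // G
    pixels[i * 3 + 2] = rgba[i * 4 + 2]; // B (alpha at i * 4 + 3 is skipped)
  }
  return { pixels, width, height };
}
```

The returned object has the same shape as the image argument in the signature above, so it could be passed straight through.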

embed()

embed.ts
// Signature:
//   engine.embed(text: string, options?: EmbedOptions): Promise<Float32Array>

// Requires create({ embedding: true }). EmbeddingGemma is asymmetric: pass
// taskType so queries and documents get the right prefix.
const query = await engine.embed("capital of France", { taskType: "query" });
const doc = await engine.embed("Paris is the capital of France", {
  taskType: "document",
});
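A query vector and a document vector are usually compared with cosine similarity. This reference does not say whether embed() returns unit-length vectors, so the sketch below normalizes explicitly. cosineSimilarity is a hypothetical helper written for this example, not a library export.

```typescript
// Cosine similarity between two embeddings of equal dimension.
// Returns 0 when either vector has zero length, to avoid dividing by zero.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
```

With the query and doc vectors from the example above, cosineSimilarity(query, doc) would give a score in [-1, 1], where higher means more similar.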

speak()

speak.ts
// Signature:
//   engine.speak(
//     text: string,
//     options?: { languageTag?: string; temperature?: number; topP?: number; repetitionPenalty?: number }
//   ): Promise<{ pcm: Float32Array; sampleRate: number }>

// Native Kani-TTS-2 on WebGPU. Returns 22.05 kHz mono PCM.
const engine = await WebGPUEngine.create({ repo: "nineninesix/kani-tts-450m-0.2-ft" });
const { pcm, sampleRate } = await engine.speak("Hello, I'm Gerbil!");
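Raw float PCM usually has to be wrapped in a container before it can be saved or played. The sketch below packs mono samples into a 16-bit WAV file in memory. encodeWav is a hypothetical helper, and it assumes the samples are floats in [-1, 1], which this reference does not state explicitly.

```typescript
// Encode mono float PCM as a 16-bit little-endian WAV file.
// Assumption: samples are in [-1, 1]; values outside that range are clamped.
function encodeWav(pcm: Float32Array, sampleRate: number): Uint8Array {
  const dataBytes = pcm.length * 2;
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, s: string): void => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataBytes, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // format tag 1 = integer PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate = sampleRate * blockAlign
  view.setUint16(32, 2, true); // block align = channels * bytes per sample
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataBytes, true);
  for (let i = 0; i < pcm.length; i++) {
    const s = Math.max(-1, Math.min(1, pcm[i]));
    view.setInt16(44 + i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return new Uint8Array(buffer);
}
```

The resulting bytes from encodeWav(pcm, sampleRate) can be written to disk in Node or wrapped in a Blob in the browser.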

Other Members

members.ts
engine.hasVision: boolean // vision encoder was built
engine.isEmbedding: boolean // embedding head was built
engine.capabilities // { text, vision, moe }
engine.config // model architecture config
engine.destroy(): void // free the GPU device + weights

MoonshineSTT

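MoonshineSTT provides native speech-to-text in its own class, separate from WebGPUEngine. The model is an encoder/decoder that consumes the raw waveform directly, with no FFT or log-mel front end.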

MoonshineSTT.ts
import { MoonshineSTT } from "@tryhamster/gerbil/gpu";
const stt = await MoonshineSTT.create({ repo: "UsefulSensors/moonshine-base" });
// pcm16kMono: Float32Array @ 16 kHz
const { text } = await stt.transcribe(pcm16kMono);
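transcribe() expects 16 kHz mono samples, while microphones and audio files are often 44.1 or 48 kHz stereo. The sketch below downmixes and resamples with plain linear interpolation. toMono16k is a hypothetical helper; it applies no low-pass filter, so downsampling can alias, and a proper resampler is preferable when quality matters.

```typescript
// Downmix equal-length channels to mono, then resample to 16 kHz
// using linear interpolation (no anti-aliasing filter).
function toMono16k(channels: Float32Array[], sampleRate: number): Float32Array {
  const target = 16000;
  if (channels.length === 0) return new Float32Array(0);
  const frames = channels[0].length;

  // Average all channels into one.
  const mono = new Float32Array(frames);
  for (const ch of channels) {
    for (let i = 0; i < frames; i++) mono[i] += ch[i] / channels.length;
  }
  if (sampleRate === target) return mono;

  // Walk the output grid and interpolate between neighboring input samples.
  const outLength = Math.floor((frames * target) / sampleRate);
  const out = new Float32Array(outLength);
  const step = sampleRate / target;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, frames - 1);
    const frac = pos - i0;
    out[i] = mono[i0] * (1 - frac) + mono[i1] * frac;
  }
  return out;
}
```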

React Hooks

From @tryhamster/gerbil/hooks. useEngine owns the full engine lifecycle — load, unload, hot-swap on config change, and reference-counted sharing so multiple components never upload the same weights twice. See the Hooks reference for the full surface.

useEngine.tsx
import { useEngine } from "@tryhamster/gerbil/hooks";

function Chat() {
  const { complete, completion, isLoading, isGenerating, tps } = useEngine({
    model: "mlx-community/Qwen3.5-0.8B-4bit",
    autoLoad: true, // dtype defaults to "auto": int4 on mobile, native on desktop
  });

  if (isLoading) return <div>Loading model…</div>;

  return (
    <button onClick={() => complete("What is 2+2?")} disabled={isGenerating}>
      {completion || "Generate"}
    </button>
  );
}

// The same hook exposes describeImage (vision), embed / similarity
// (embeddings), stream, stop, and dispose. Pass enableVision: true or
// embedding: true to load those modalities.
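Reference-counted sharing, as mentioned for useEngine, can be pictured as a registry that loads a resource on first acquire and frees it on last release. The sketch below illustrates that general technique only; it is not the hook's implementation. SharedRegistry and its methods are hypothetical, and loading is synchronous here to keep the example short, whereas a real engine load is asynchronous.

```typescript
// Hypothetical reference-counted registry: one shared value per key.
class SharedRegistry<T> {
  private entries = new Map<string, { value: T; refs: number }>();
  private load: (key: string) => T;
  private unload: (value: T) => void;

  constructor(load: (key: string) => T, unload: (value: T) => void) {
    this.load = load;
    this.unload = unload;
  }

  // First caller triggers the load; later callers share the same value.
  acquire(key: string): T {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { value: this.load(key), refs: 0 };
      this.entries.set(key, entry);
    }
    entry.refs++;
    return entry.value;
  }

  // The value is freed only when the last holder releases it.
  release(key: string): void {
    const entry = this.entries.get(key);
    if (!entry) return;
    entry.refs--;
    if (entry.refs === 0) {
      this.entries.delete(key);
      this.unload(entry.value);
    }
  }
}
```

Two components that acquire the same key receive the same object, which is the property that keeps identical weights from being uploaded twice.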

Types

WebGPUEngineOptions

WebGPUEngineOptions.ts
interface WebGPUEngineOptions {
  repo?: string; // HF repo or URL; omit for a capability default
  dtype?: "auto" | "f32" | "q4"; // "auto": int4 on mobile, native on desktop
  enableVision?: boolean; // build the ViT tower (~192MB, vision checkpoints)
  embedding?: boolean; // build the embedding head
  maxSeqLen?: number; // default: model config, capped at 4096
}

GenerateOptions

GenerateOptions.ts
interface GenerateOptions {
  maxTokens?: number; // default: 512
  stopSequences?: string[];
  sampling?: SamplingParams; // { temperature, topP, topK, ... }
  systemPrompt?: string;
  onToken?: (token: string) => void;
}

GenerateResult

GenerateResult.ts
interface GenerateResult {
  text: string;
  tokensGenerated: number;
  tokensPerSecond: number;
  totalTime: number;
  finishReason: "eos" | "max_tokens" | "stop_sequence";
  thinking?: string;
}

GenerateObjectOptions

GenerateObjectOptions.ts
interface GenerateObjectOptions extends GenerateOptions {
  // A predicate (o) => boolean, or a minimal JSON-schema-ish object with
  // required / properties (required keys must exist). Omit to only require
  // syntactically valid JSON.
  schema?: ObjectValidator;
  maxRetries?: number; // retries after the first attempt (default: 4)
}

EmbedOptions

EmbedOptions.ts
interface EmbedOptions {
  taskType?: "query" | "document"; // EmbeddingGemma asymmetric prefix
  taskPrompt?: string; // raw prefix override
  instruction?: string; // Qwen3-Embedding instruction prefix
  maxTokens?: number;
}

Skills API

skills-api.ts
import { z } from "zod"; // used by the custom-skill example at the end
import {
  // Skill system
  defineSkill,
  useSkill,
  listSkills,
  loadSkills,
  // Built-in skills
  commit,
  summarize,
  explain,
  review,
  test,
  translate,
  extract,
  title,
  // Vision skills (require a vision model)
  describeImage,
  analyzeScreenshot,
  extractFromImage,
  compareImages,
  captionImage,
} from "@tryhamster/gerbil/skills";

// All skills accept an input object and return Promise<string | T>
const msg = await commit({ type: "conventional" });
const summary = await summarize({ content, length: "short" });
const explanation = await explain({ content, level: "beginner" });
const feedback = await review({ code, focus: ["security"] });
const tests = await test({ code, framework: "vitest" });
const translated = await translate({ text, to: "es" });
const headline = await title({ content, style: "professional" });

// Vision skills
const description = await describeImage({ image: url, focus: "details" });
const analysis = await analyzeScreenshot({ image: dataUri, type: "qa" });
// Named extractedText so it does not shadow the `text` input passed to translate() above.
const extractedText = await extractFromImage({ image, extract: "text" });
const diff = await compareImages({ image1, image2, focus: "differences" });
const alt = await captionImage({ image, style: "descriptive" });

// extract() returns Promise<T>
const data = await extract({ content, schema: myZodSchema });

// Custom skills
const mySkill = defineSkill({
  name: "my-skill",
  input: z.object({ text: z.string() }),
  run: async ({ input, gerbil }) => gerbil.generate(input.text),
});

Gerbil Class

The higher-level Gerbil class wraps the engine with response caching and convenience helpers. For most new code, prefer WebGPUEngine / useEngine above.

Constructor + loadModel()

gerbil-legacy.ts
import { Gerbil } from "@tryhamster/gerbil";

// Signatures:
//   new Gerbil(config?: GerbilConfig)
//   g.loadModel(modelId: string, options?: LoadOptions): Promise<void>
const g = new Gerbil();

Methods

gerbil-methods.ts
await g.generate(prompt, options?): Promise<GenerateResult>
g.stream(prompt, options?): AsyncGenerator<string, GenerateResult>
await g.json<T>(prompt, options): Promise<T>
await g.embed(text, options?): Promise<EmbedResult>
// Response cache (Gerbil-class only)
g.getResponseCacheStats(): { hits, misses, size, hitRate }
g.clearResponseCache(): void
// Lifecycle / introspection
g.isLoaded(): boolean
g.getModelInfo(): ModelConfig | null
g.clearCache(): void // KV cache
await g.dispose(): Promise<void>