API Reference
The native WebGPUEngine is the primary surface. It runs text, vision, embeddings, and speech on WebGPU, in the browser and in Node.
WebGPUEngine
From @tryhamster/gerbil/gpu. Create instances with the static create() factory; the class is not constructed with new.
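A static factory is the usual pattern when construction needs asynchronous work, because a constructor cannot await. The sketch below illustrates the pattern with a hypothetical TinyEngine class. It is not Gerbil's source, and loadWeights is a placeholder for whatever setup the real engine performs.

```typescript
// Hypothetical stand-in class, not Gerbil's source: it only illustrates why an
// async static factory can replace a public constructor.
class TinyEngine {
  // A private constructor blocks `new TinyEngine()` from outside the class.
  private constructor(readonly weights: Float32Array) {}

  // Constructors cannot await, so asynchronous setup lives in the factory.
  static async create(options: { size?: number } = {}): Promise<TinyEngine> {
    const weights = await TinyEngine.loadWeights(options.size ?? 4);
    return new TinyEngine(weights);
  }

  // Placeholder for real async work such as fetching and uploading weights.
  private static async loadWeights(size: number): Promise<Float32Array> {
    return new Float32Array(size).fill(1);
  }
}
```

Callers of such a class always receive a fully initialized instance, since the only way to obtain one is to await create().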
create()
create.ts
import { WebGPUEngine } from "@tryhamster/gerbil/gpu";
const engine = await WebGPUEngine.create(
  options?: WebGPUEngineOptions
): Promise<WebGPUEngine>

// Example: dtype "auto" picks int4 on mobile, native precision on desktop.
const engine = await WebGPUEngine.create({
  repo: "mlx-community/Qwen3.5-0.8B-4bit",
  dtype: "auto",
  enableVision: false, // build the ViT tower (vision checkpoints only)
  embedding: false, // build the embedding head
  maxSeqLen: 4096,
});
generate()
generate.ts
await engine.generate(
  prompt: string,
  options?: GenerateOptions
): Promise<GenerateResult>

// Example
const result = await engine.generate("Write a haiku", {
  maxTokens: 256,
  sampling: { temperature: 0.7, topP: 0.9, topK: 50 },
  systemPrompt: "You are helpful.",
  stopSequences: ["\n\n"],
  onToken: (t) => process.stdout.write(t),
});
console.log(result.text, result.tokensPerSecond);
stream()
stream.ts
engine.stream(
  prompt: string,
  options?: GenerateOptions
): AsyncGenerator<string, GenerateResult, unknown>
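The return type above carries a final GenerateResult as the generator's return value, which a for await loop discards. One way to capture it is to drive the iterator by hand. The sketch below uses a hypothetical fakeStream generator and a simplified Result shape in place of engine.stream(), and collect is an illustrative helper, not part of the library.

```typescript
// Hypothetical result shape and generator standing in for engine.stream();
// only the AsyncGenerator<string, Result, unknown> shape mirrors the signature.
interface Result {
  text: string;
  tokensGenerated: number;
}

async function* fakeStream(
  tokens: string[],
): AsyncGenerator<string, Result, unknown> {
  let text = "";
  for (const t of tokens) {
    text += t;
    yield t; // each yielded value is one token
  }
  return { text, tokensGenerated: tokens.length }; // the generator's return value
}

// Drive the iterator manually: when done is true, value holds the return value.
async function collect<R>(
  gen: AsyncGenerator<string, R, unknown>,
  onToken: (t: string) => void,
): Promise<R> {
  while (true) {
    const step = await gen.next();
    if (step.done) return step.value;
    onToken(step.value);
  }
}
```

Assuming engine.stream() behaves like any standard async generator, the same collect helper could be passed its result to obtain both the streamed tokens and the final summary.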
// Example
for await (const token of engine.stream("Tell me a story")) {
  process.stdout.write(token);
}
generateObject()
generateObject.ts
await engine.generateObject<T>(
  prompt: string,
  options?: GenerateObjectOptions
): Promise<GenerateObjectResult<T>>

// Validates JSON output; retries on parse/schema failure.
const { object, attempts } = await engine.generateObject(
  "Extract: John is 32 and lives in NYC",
  {
    schema: { required: ["name", "age", "city"] },
    maxRetries: 4,
  },
);
console.log(object); // { name: "John", age: 32, city: "NYC" }
describeImage()
describeImage.ts
await engine.describeImage(
  image: { pixels: Uint8Array; width: number; height: number },
  prompt?: string,
  options?: GenerateOptions
): Promise<GenerateResult>

// Requires create({ enableVision: true }) on a vision checkpoint.
// In Node, decode the image to RGB pixels (HWC, 0..255) yourself; the React
// hook's describeImage() takes a URL / data-URL directly.
const { text } = await engine.describeImage(
  { pixels, width, height },
  "What's in this image?",
);
embed()
embed.ts
await engine.embed(
  text: string,
  options?: EmbedOptions
): Promise<Float32Array>

// Requires create({ embedding: true }). EmbeddingGemma is asymmetric: pass
// taskType so queries and documents get the right prefix.
const query = await engine.embed("capital of France", { taskType: "query" });
const doc = await engine.embed("Paris is the capital of France", {
  taskType: "document",
});
speak()
speak.ts
await engine.speak(
  text: string,
  options?: {
    languageTag?: string;
    temperature?: number;
    topP?: number;
    repetitionPenalty?: number;
  }
): Promise<{ pcm: Float32Array; sampleRate: number }>

// Native Kani-TTS-2 on WebGPU. Returns 22.05 kHz mono PCM.
const engine = await WebGPUEngine.create({ repo: "nineninesix/kani-tts-450m-0.2-ft" });
const { pcm, sampleRate } = await engine.speak("Hello, I'm Gerbil!");
Other Members
members.ts
engine.hasVision: boolean   // vision encoder was built
engine.isEmbedding: boolean // embedding head was built
engine.capabilities         // { text, vision, moe }
engine.config               // model architecture config

engine.destroy(): void      // free the GPU device + weights
MoonshineSTT
Native speech-to-text — a raw-waveform encoder/decoder (no FFT / log-mel) in its own class.
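Because the model consumes the raw waveform, the caller supplies the samples directly: transcribe() takes a Float32Array of 16 kHz mono audio. Recorded audio often arrives as interleaved 16-bit PCM instead. The sketch below shows one way to convert it, using a hypothetical int16ToMonoFloat32 helper. It assumes the input is already sampled at 16 kHz and that samples should be scaled to [-1, 1], a common convention that this reference does not state explicitly.

```typescript
// Hypothetical helper, not part of the library: converts interleaved 16-bit
// PCM to a mono Float32Array. Assumes the audio is already at 16 kHz and that
// the target range is [-1, 1] (an assumption; the reference does not say).
function int16ToMonoFloat32(samples: Int16Array, channels: number): Float32Array {
  const frames = Math.floor(samples.length / channels);
  const out = new Float32Array(frames);
  for (let i = 0; i < frames; i++) {
    let sum = 0;
    for (let c = 0; c < channels; c++) {
      sum += samples[i * channels + c];
    }
    // Average the channels into one, then scale the 16-bit range to [-1, 1).
    out[i] = sum / channels / 32768;
  }
  return out;
}
```

Audio recorded at a different sample rate would also need resampling to 16 kHz, which this sketch does not attempt.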
MoonshineSTT.ts
import { MoonshineSTT } from "@tryhamster/gerbil/gpu";
const stt = await MoonshineSTT.create({ repo: "UsefulSensors/moonshine-base" });
// pcm16kMono: Float32Array @ 16 kHz
const { text } = await stt.transcribe(pcm16kMono);
React Hooks
From @tryhamster/gerbil/hooks. useEngine owns the full engine lifecycle — load, unload, hot-swap on config change, and reference-counted sharing so multiple components never upload the same weights twice. See the Hooks reference for the full surface.
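Reference-counted sharing can be pictured as a small registry keyed by model: the first consumer triggers the load, later consumers reuse the same instance, and the resource is freed when the last one releases it. The sketch below is a hypothetical RefCountedCache, not Gerbil's internals. It is synchronous for brevity, whereas a real engine load is asynchronous and would more likely store a promise per key.

```typescript
// Hypothetical registry, not Gerbil's internals: one loaded resource per key,
// shared by reference count and freed when the last holder releases it.
class RefCountedCache<T> {
  private entries = new Map<string, { value: T; refs: number }>();

  constructor(
    private load: (key: string) => T,
    private unload: (value: T) => void,
  ) {}

  acquire(key: string): T {
    let entry = this.entries.get(key);
    if (!entry) {
      // The first holder triggers the load; later holders reuse the entry.
      entry = { value: this.load(key), refs: 0 };
      this.entries.set(key, entry);
    }
    entry.refs++;
    return entry.value;
  }

  release(key: string): void {
    const entry = this.entries.get(key);
    if (!entry) return;
    entry.refs--;
    if (entry.refs === 0) {
      // The last holder is gone, so drop the entry and free the resource.
      this.entries.delete(key);
      this.unload(entry.value);
    }
  }
}
```

In a React setting, acquire would typically run when a component mounts and release when it unmounts, so two components that name the same model share one set of weights.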
useEngine.tsx
import { useEngine } from "@tryhamster/gerbil/hooks";
function Chat() {
  const { complete, completion, isLoading, isGenerating, tps } = useEngine({
    model: "mlx-community/Qwen3.5-0.8B-4bit",
    autoLoad: true,
    // dtype defaults to "auto": int4 on mobile, native on desktop
  });

  if (isLoading) return <div>Loading model…</div>;
  return (
    <button onClick={() => complete("What is 2+2?")} disabled={isGenerating}>
      {completion || "Generate"}
    </button>
  );
}

// The same hook exposes describeImage (vision), embed / similarity
// (embeddings), stream, stop, and dispose. Pass enableVision: true or
// embedding: true to load those modalities.
Types
WebGPUEngineOptions
WebGPUEngineOptions.ts
interface WebGPUEngineOptions {
  repo?: string; // HF repo or URL; omit for a capability default
  dtype?: "auto" | "f32" | "q4"; // "auto": int4 on mobile, native on desktop
  enableVision?: boolean; // build the ViT tower (~192MB, vision checkpoints)
  embedding?: boolean; // build the embedding head
  maxSeqLen?: number; // default: model config, capped at 4096
}
GenerateOptions
GenerateOptions.ts
interface GenerateOptions {
  maxTokens?: number; // default: 512
  stopSequences?: string[];
  sampling?: SamplingParams; // { temperature, topP, topK, ... }
  systemPrompt?: string;
  onToken?: (token: string) => void;
}
GenerateResult
GenerateResult.ts
interface GenerateResult {
  text: string;
  tokensGenerated: number;
  tokensPerSecond: number;
  totalTime: number;
  finishReason: "eos" | "max_tokens" | "stop_sequence";
  thinking?: string;
}
GenerateObjectOptions
GenerateObjectOptions.ts
interface GenerateObjectOptions extends GenerateOptions {
  // A predicate (o) => boolean, or a minimal JSON-schema-ish object with
  // required / properties (required keys must exist). Omit to only require
  // syntactically valid JSON.
  schema?: ObjectValidator;
  maxRetries?: number; // retries after the first attempt (default: 4)
}
EmbedOptions
EmbedOptions.ts
interface EmbedOptions {
  taskType?: "query" | "document"; // EmbeddingGemma asymmetric prefix
  taskPrompt?: string; // raw prefix override
  instruction?: string; // Qwen3-Embedding instruction prefix
  maxTokens?: number;
}
Skills API
skills-api.ts
import { z } from "zod"; // used by defineSkill's input schema below
import {
  // Skill system
  defineSkill,
  useSkill,
  listSkills,
  loadSkills,

  // Built-in skills
  commit,
  summarize,
  explain,
  review,
  test,
  translate,
  extract,
  title,

  // Vision skills (require a vision model)
  describeImage,
  analyzeScreenshot,
  extractFromImage,
  compareImages,
  captionImage,
} from "@tryhamster/gerbil/skills";

// All skills accept an input object and return Promise<string | T>
const msg = await commit({ type: "conventional" });
const summary = await summarize({ content, length: "short" });
const explanation = await explain({ content, level: "beginner" });
const feedback = await review({ code, focus: ["security"] });
const tests = await test({ code, framework: "vitest" });
const translated = await translate({ text, to: "es" });
const headline = await title({ content, style: "professional" });

// Vision skills
const description = await describeImage({ image: url, focus: "details" });
const analysis = await analyzeScreenshot({ image: dataUri, type: "qa" });
// Named extractedText so it does not shadow the text input passed to translate() above
const extractedText = await extractFromImage({ image, extract: "text" });
const diff = await compareImages({ image1, image2, focus: "differences" });
const alt = await captionImage({ image, style: "descriptive" });

// extract() returns Promise<T>
const data = await extract({ content, schema: myZodSchema });

// Custom skills
const mySkill = defineSkill({
  name: "my-skill",
  input: z.object({ text: z.string() }),
  run: async ({ input, gerbil }) => gerbil.generate(input.text),
});
Gerbil Class
The higher-level Gerbil class wraps the engine with response caching and convenience helpers. For most new code, prefer WebGPUEngine / useEngine above.
Constructor + loadModel()
gerbil-legacy.ts
import { Gerbil } from "@tryhamster/gerbil";
const g = new Gerbil(config?: GerbilConfig);
await g.loadModel(modelId: string, options?: LoadOptions): Promise<void>
Methods
gerbil-methods.ts
await g.generate(prompt, options?): Promise<GenerateResult>
g.stream(prompt, options?): AsyncGenerator<string, GenerateResult>
await g.json<T>(prompt, options): Promise<T>
await g.embed(text, options?): Promise<EmbedResult>

// Response cache (Gerbil-class only)
g.getResponseCacheStats(): { hits, misses, size, hitRate }
g.clearResponseCache(): void

// Lifecycle / introspection
g.isLoaded(): boolean
g.getModelInfo(): ModelConfig | null
g.clearCache(): void // KV cache
await g.dispose(): Promise<void>
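To make the response-cache statistics concrete, the sketch below shows one way a cache could produce the { hits, misses, size, hitRate } shape listed above. ResponseCache is a hypothetical class, not Gerbil's implementation. Keying on the prompt alone and defining hitRate as hits divided by total lookups are both assumptions, and a real cache would likely include the generation options in the key.

```typescript
// Hypothetical cache, not Gerbil's implementation. Its stats object mirrors
// the { hits, misses, size, hitRate } shape from the method list.
class ResponseCache {
  private store = new Map<string, string>();
  private hits = 0;
  private misses = 0;

  // Return the cached response for a prompt, or compute and remember it.
  getOrCompute(prompt: string, compute: (prompt: string) => string): string {
    const cached = this.store.get(prompt);
    if (cached !== undefined) {
      this.hits++;
      return cached;
    }
    this.misses++;
    const fresh = compute(prompt);
    this.store.set(prompt, fresh);
    return fresh;
  }

  // Assumption: hitRate is hits over total lookups, and 0 before any lookup.
  stats(): { hits: number; misses: number; size: number; hitRate: number } {
    const total = this.hits + this.misses;
    return {
      hits: this.hits,
      misses: this.misses,
      size: this.store.size,
      hitRate: total === 0 ? 0 : this.hits / total,
    };
  }

  // Assumption: clearing drops the entries but keeps the counters.
  clear(): void {
    this.store.clear();
  }
}
```

A cache like this only returns a meaningful answer when the same prompt is expected to yield the same output, so it fits deterministic or low-temperature generation best.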