Models

Built-in models and how to use any HuggingFace model.

Built-in Models

These models are curated and tested to work great with Gerbil:

| Model | Type | Size | Best For |
|---|---|---|---|
| ministral-3b | LLM | ~2.5GB | Vision + reasoning |
| qwen3-0.6b | LLM | ~400MB | General use, reasoning |
| qwen2.5-0.5b | LLM | ~350MB | General use |
| qwen2.5-coder-0.5b | LLM | ~400MB | Code generation |
| smollm2-360m | LLM | ~250MB | Fast completions |
| smollm2-135m | LLM | ~100MB | Ultra-fast, tiny |
| smollm2-1.7b | LLM | ~1.2GB | Higher quality |
| phi-3-mini | LLM | ~2.1GB | High quality |
| llama-3.2-1b | LLM | ~800MB | General use |
| gemma-2b | LLM | ~1.4GB | Balanced |
| tinyllama-1.1b | LLM | ~700MB | Lightweight |

Text-to-Speech

| Model | Type | Size | Best For |
|---|---|---|---|
| kokoro-82m | TTS | ~330MB | 28 voices, 24kHz, US/UK English |
| supertonic-66m | TTS | ~250MB | 4 voices, 44.1kHz, fastest |

Speech-to-Text

| Model | Type | Size | Best For |
|---|---|---|---|
| whisper-tiny.en | STT | ~39MB | Fastest transcription |
| whisper-base.en | STT | ~74MB | Balanced speed/accuracy |
| whisper-small.en | STT | ~244MB | High quality |
| whisper-large-v3-turbo | STT | ~809MB | Best quality, 80+ langs |
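
Speech models load the same way as LLMs: pass the ID from the table to g.loadModel. A minimal sketch (the filename label is illustrative):

speech.ts
// Text-to-speech model
await g.loadModel("kokoro-82m");
// Speech-to-text model
await g.loadModel("whisper-tiny.en");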

Choosing a Model

For general use

qwen3-0.6b is the best all-rounder: good quality, reasonable speed, and support for thinking mode.

await g.loadModel("qwen3-0.6b");

For speed

smollm2-135m is the fastest option. Great for simple tasks where speed matters.

await g.loadModel("smollm2-135m");

For code

qwen2.5-coder-0.5b is optimized for code generation and understanding.

await g.loadModel("qwen2.5-coder-0.5b");

For reasoning

qwen3-0.6b with thinking mode shows step-by-step reasoning:

await g.loadModel("qwen3-0.6b");
const result = await g.generate("What is 17 * 23?", { thinking: true });
console.log(result.thinking); // Shows reasoning steps
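console.log(result.text); // Final answer (assumed: text holds the answer, thinking holds the steps)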

For vision

ministral-3b understands images, supports reasoning, and has a 256K context window.

await g.loadModel("ministral-3b");
const result = await g.generate("Describe this image", {
images: [{ source: "https://example.com/photo.jpg" }]
});
console.log(result.text); // Image description

See the Vision documentation for supported image formats and more examples.

Using HuggingFace Models

Load any compatible model from HuggingFace using the hf: prefix:

huggingface.ts
// Short syntax
await g.loadModel("hf:microsoft/Phi-3-mini-4k-instruct-onnx");
await g.loadModel("hf:Qwen/Qwen2.5-0.5B-Instruct");
// Full URL also works
await g.loadModel("https://huggingface.co/microsoft/Phi-3-mini");

Note: Not all HuggingFace models are compatible. Look for models published in ONNX format or that work with transformers.js.
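
HuggingFace downloads can be large, and the same load options documented below apply here. A sketch reusing the onProgress callback from Load Options (the filename label is illustrative):

hf-progress.ts
await g.loadModel("hf:Qwen/Qwen2.5-0.5B-Instruct", {
  // Log download status as the model is fetched
  onProgress: (info) => console.log(info.status, info.progress),
});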

Local Models

Load models from your local filesystem:

local.ts
// Relative path
await g.loadModel("file:./models/my-fine-tune");
// Absolute path
await g.loadModel("file:/home/user/models/custom");

Load Options

Customize how models are loaded:

options.ts
await g.loadModel("qwen3-0.6b", {
// Device selection
device: "auto", // "auto" | "gpu" | "cpu" | "webgpu"
// Quantization level
dtype: "q4", // "q4" | "q8" | "fp16" | "fp32"
// Progress callback
onProgress: (info) => {
console.log(info.status, info.progress);
},
});

Quantization

| Level | Size | Quality | Speed |
|---|---|---|---|
| q4 | Smallest | Good | Fastest |
| q8 | Medium | Better | Fast |
| fp16 | Large | Best | Slower |
| fp32 | Largest | Best | Slowest |
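
Quantization is selected per load via the dtype option shown under Load Options. A minimal sketch of the trade-off (the filename label is illustrative):

quantization.ts
// Smallest download, fastest inference
await g.loadModel("qwen3-0.6b", { dtype: "q4" });
// Larger download, highest fidelity
await g.loadModel("qwen3-0.6b", { dtype: "fp16" });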

Model Caching

Models are cached locally after the first download. Default location:

Terminal
~/.gerbil/models/

Manage cached models with the CLI:

Terminal
# List cached models
gerbil models --installed
# Remove a model
gerbil rm smollm2-135m
# Clear all cached models
gerbil cache clear