Embeddings

Name: Gerbil
Author: Gerbil

Gerbil turns text into vectors entirely on-device with EmbeddingGemma-300M — a bidirectional Gemma3 encoder that produces 768-dimensional, L2-normalized embeddings on WebGPU. Because the vectors are unit length, cosine similarity is just a dot product, and because everything runs locally, there's no server, no API keys, and nothing leaves the device. It's the same engine that powers the semantic search over these very docs.
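To see why unit length matters, here is a minimal sketch using hand-made 3-dimensional vectors as stand-ins for real 768-dimensional embeddings. The helper names (dot, l2Normalize, cosine) are illustrative and are not part of Gerbil's API.

```typescript
// Illustrative helpers, not part of Gerbil's API.
function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Scale a vector so its L2 norm is 1.
function l2Normalize(v: Float32Array): Float32Array {
  const norm = Math.sqrt(dot(v, v));
  return v.map((x) => x / norm);
}

// Full cosine similarity: dot product divided by both norms.
function cosine(a: Float32Array, b: Float32Array): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

const a = new Float32Array([3, 4, 0]);
const b = new Float32Array([4, 3, 0]);

// Once both vectors are unit length, both norms are 1,
// so the plain dot product already equals the cosine.
const ua = l2Normalize(a);
const ub = l2Normalize(b);

console.log(cosine(a, b)); // 0.96
console.log(dot(ua, ub)); // approximately 0.96, up to float32 rounding
```

Since Gerbil's vectors are described as already normalized, the normalization step is only needed here because the toy vectors are not.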

Quick start with useEmbedding

In the browser, the useEmbedding hook is the shortest path. It downloads EmbeddingGemma on first use and exposes embed and similarity:

Similarity.tsx

"use client";

import { useEmbedding } from "@tryhamster/gerbil/hooks";

function Similarity() {
  const { embed, similarity, isReady, isLoading } = useEmbedding();

  const compare = async () => {
    // similarity() returns a cosine score in [-1, 1].
    const score = await similarity("Hello world", "Hi there");
    console.log(score); // ~0.7

    // Or embed text yourself for indexing / storage.
    const vec = await embed("capital of France", { taskType: "query" });
    console.log(vec.length); // 768 (Float32Array, unit L2 norm)
  };

  if (isLoading) return <div>Loading embedding model…</div>;

  return (
    <button onClick={compare} disabled={!isReady}>
      Compare
    </button>
  );
}

embed(text, options?) returns a Float32Array of length 768. Pass { model } to the hook to point at a different embedder; omit it and you get the built-in EmbeddingGemma.
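If you store vectors rather than recomputing them, a Float32Array needs to be serialized first. The sketch below shows one simple option, a JSON round trip; serializeVector and deserializeVector are hypothetical helpers, not part of Gerbil, and the short vector stands in for a real result of embed().

```typescript
// Illustrative helpers, not part of Gerbil's API.
function serializeVector(vec: Float32Array): string {
  // Float32Array does not serialize as a plain JSON array, so copy it first.
  return JSON.stringify(Array.from(vec));
}

function deserializeVector(json: string): Float32Array {
  return Float32Array.from(JSON.parse(json) as number[]);
}

// Stand-in for the result of embed(); a real vector has 768 entries.
const vec = new Float32Array([0.6, 0.8, 0]);
const restored = deserializeVector(serializeVector(vec));

console.log(restored.length); // 3
console.log(restored.every((x, i) => x === vec[i])); // true
```

JSON is convenient but verbose. For large indexes, a binary format would be more compact.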

Query vs document

EmbeddingGemma is asymmetric: a search query and the documents you're searching are embedded with different task prefixes. For retrieval, embed the user's question as a query and everything in your index as a document. Getting this right meaningfully improves relevance:

task-types.ts

const q = await embed("capital of France", { taskType: "query" });
const doc = await embed("Paris is the capital of France.", {
  taskType: "document",
});

// Unit-norm vectors → cosine similarity is a plain dot product.
const score = q.reduce((sum, v, i) => sum + v * doc[i], 0);
console.log(score); // ~0.7+

taskType defaults to "query". For comparing two free-standing texts, similarity(a, b) does the embedding and the dot product for you.
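To make the asymmetry concrete, the sketch below shows what a task prefix could look like. The strings follow the prompt templates described in the EmbeddingGemma model card. That Gerbil applies exactly these strings internally is an assumption, and withTaskPrefix is a hypothetical helper rather than a Gerbil function.

```typescript
// Hypothetical helper for illustration only; not part of Gerbil's API.
type TaskType = "query" | "document";

// Assumed prompt templates, following the EmbeddingGemma model card.
const TASK_PREFIX: Record<TaskType, string> = {
  query: "task: search result | query: ",
  document: "title: none | text: ",
};

function withTaskPrefix(text: string, taskType: TaskType = "query"): string {
  return TASK_PREFIX[taskType] + text;
}

console.log(withTaskPrefix("capital of France"));
// task: search result | query: capital of France
console.log(withTaskPrefix("Paris is the capital of France.", "document"));
// title: none | text: Paris is the capital of France.
```

You would not call anything like this yourself: passing taskType to embed is what selects the prefix, and adding one by hand would presumably prefix the text twice.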

Semantic search & RAG

A retrieval pipeline is just “embed everything once, then rank by dot product per query.” Embed each document with taskType: "document", keep the vectors in memory, then embed the question as a query and sort:

useRetriever.ts

import { useEmbedding } from "@tryhamster/gerbil/hooks";

function useRetriever(documents: string[]) {
  const { embed } = useEmbedding();

  // Index once: embed every document with the "document" task type.
  async function index() {
    return Promise.all(
      documents.map(async (text) => ({
        text,
        vector: await embed(text, { taskType: "document" }),
      }))
    );
  }

  // Rank indexed entries against a question and keep the top k.
  // The parameter is named `entries` so it does not shadow index() above.
  async function retrieve(
    entries: { text: string; vector: Float32Array }[],
    question: string,
    k = 3
  ) {
    const q = await embed(question, { taskType: "query" });
    return entries
      .map((d) => ({
        text: d.text,
        // Unit-norm vectors, so the dot product is the cosine similarity.
        score: d.vector.reduce((s, v, i) => s + v * q[i], 0),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }

  return { index, retrieve };
}
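The ranking step does not depend on the hook, so it can be pulled out and checked with toy data. A minimal sketch, assuming 2-dimensional unit vectors in place of real embeddings; rankByDot and the Entry type are hypothetical names introduced here for illustration.

```typescript
// Hypothetical standalone helper; mirrors the ranking step without the hook.
type Entry = { text: string; vector: Float32Array };

function rankByDot(entries: Entry[], query: Float32Array, k = 3) {
  return entries
    .map((d) => ({
      text: d.text,
      // Dot product of unit vectors equals cosine similarity.
      score: d.vector.reduce((s, v, i) => s + v * query[i], 0),
    }))
    .sort((x, y) => y.score - x.score) // highest score first
    .slice(0, k);
}

// Toy 2-dimensional unit vectors standing in for 768-dimensional embeddings.
const entries: Entry[] = [
  { text: "north", vector: new Float32Array([0, 1]) },
  { text: "east", vector: new Float32Array([1, 0]) },
  { text: "south", vector: new Float32Array([0, -1]) },
];

const top = rankByDot(entries, new Float32Array([0, 1]), 2);
console.log(top.map((t) => t.text).join(", ")); // north, east
```

A linear scan like this is fine for small in-memory indexes, since each query costs one dot product per document.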

For a persistent, token-budgeted store with chunking and on-device recall built in, see Memory & RAG, which wraps the same embedder behind useMemory.

In Node & the core engine

The same embeddings run in Node through the core WebGPUEngine. Create it with embedding: true to enable embed() and similarity():

engine-embed.ts

import { WebGPUEngine } from "@tryhamster/gerbil/gpu";

const engine = await WebGPUEngine.create({
  repo: "mlx-community/embeddinggemma-300m-4bit",
  embedding: true,
});

const query = await engine.embed("capital of France", { taskType: "query" });
const doc = await engine.embed("Paris is the capital of France.", {
  taskType: "document",
});

const score = query.reduce((s, v, i) => s + v * doc[i], 0);
console.log(score); // ~0.7+

engine.destroy();

If you're already driving one engine for several capabilities, you can request embeddings from the base useEngine({ embedding: true }) hook and call embed / similarity directly — the dedicated useEmbedding hook is just a thin wrapper over it.

The model

Model	Repo	Dim	Notes
EmbeddingGemma-300M	mlx-community/embeddinggemma-300m-4bit	768	Default. Asymmetric query/document task types; light enough to run on iPad Safari.

EmbeddingGemma is the recommended default for on-device retrieval. To swap in another embedder, pass its repo id as model to the hook (or repo to the engine).