LangChain

Name: Gerbil
Author: Gerbil

LangChain integration for text generation and embeddings, with TTS and STT available through the native WebGPU engine. Build chains, agents, and voice-enabled pipelines with local models.

Class	Capability
GerbilLLM	Text generation + Vision
GerbilEmbeddings	Vector embeddings

Note: The LangChain integration runs in Node. GerbilLLM and GerbilEmbeddings use native models (Qwen3.5-0.8B, EmbeddingGemma-300M) running on the WebGPU engine. For speech (Kani-TTS-2 for TTS, Moonshine for STT) and browser inference, see the WebGPUEngine.

Installation

Terminal

npm install @tryhamster/gerbil langchain

Quick Start

quick-start.ts

import {
  GerbilLLM,
  GerbilEmbeddings,
} from "@tryhamster/gerbil/langchain";

// Text generation
const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });
const result = await llm.invoke("Write a haiku about coding");

// Embeddings
const embeddings = new GerbilEmbeddings();
const vector = await embeddings.embedQuery("Hello world");

GerbilLLM

Text generation with optional vision support:

llm-config.ts

import { GerbilLLM } from "@tryhamster/gerbil/langchain";

const llm = new GerbilLLM({
  // Model configuration
  model: "qwen3.5-0.8b",
  device: "auto",        // "auto" | "gpu" | "cpu"
  dtype: "q4",           // "q4" | "q8" | "fp16" | "fp32"

  // Generation options
  maxTokens: 500,
  temperature: 0.7,
  topP: 0.9,
  topK: 50,

  // Thinking mode (Qwen3)
  thinking: false,

  // Callbacks
  callbacks: [
    {
      handleLLMStart: async (llm, prompts) => {
        console.log("Starting generation...");
      },
      handleLLMEnd: async (output) => {
        console.log("Generation complete");
      },
    },
  ],
});

invoke()

invoke.ts

// Simple invocation
const explanation = await llm.invoke("Explain recursion");

// With options
const poem = await llm.invoke("Write a poem", {
  maxTokens: 200,
  temperature: 0.9,
});

// With stop sequences: generation ends before a fourth item starts
const list = await llm.invoke("List 3 items:\n1.", {
  stop: ["\n4."],
});
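
To make the effect of stop concrete, here is a minimal sketch of how a stop sequence is conventionally applied to generated text. truncateAtStop is a hypothetical helper, not part of Gerbil or LangChain, and it assumes the stop sequence itself is dropped from the output. That is the usual convention, but the source does not confirm it.

```typescript
// Hypothetical helper: cut text at the earliest occurrence of any stop sequence.
// Assumption: the stop sequence itself is excluded from the result.
function truncateAtStop(text: string, stop: string[]): string {
  let cut = text.length;
  for (const s of stop) {
    if (s.length === 0) continue; // ignore empty stop strings
    const i = text.indexOf(s);
    if (i !== -1 && i < cut) cut = i;
  }
  return text.slice(0, cut);
}

// Simulated raw continuation of the prompt "List 3 items:\n1."
const raw = " apples\n2. pears\n3. plums\n4. figs\n5. kiwis";
console.log(truncateAtStop(raw, ["\n4."]));
```

With the "\n4." stop sequence, everything from the fourth list item onward is discarded.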

Streaming

streaming.ts

// Stream tokens
const stream = await llm.stream("Tell me a story");

for await (const chunk of stream) {
  process.stdout.write(chunk);
}

// With callbacks
const callbackStream = await llm.stream("Explain hooks", {
  callbacks: [{
    handleLLMNewToken: async (token) => {
      console.log("Token:", token);
    },
  }],
});

Vision

Use vision-capable models to analyze images:

vision.ts

import { readFileSync } from "fs";
import { GerbilLLM } from "@tryhamster/gerbil/langchain";

// Use a vision-capable model
const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });

// Check if model supports vision
const hasVision = await llm.supportsVision(); // true

// Analyze an image
const description = await llm.invokeWithImages(
  "Describe this image in detail",
  [{ source: "https://example.com/photo.jpg" }]
);

// Hypothetical placeholder sources; substitute your own URLs or data URIs
const beforeScreenshot = "https://example.com/before.png";
const afterScreenshot = "https://example.com/after.png";

// Compare multiple images
const diff = await llm.invokeWithImages(
  "What changed between these two screenshots?",
  [
    { source: beforeScreenshot },
    { source: afterScreenshot },
  ]
);

// Use with local files (base64)
const imageData = readFileSync("photo.jpg").toString("base64");
const result = await llm.invokeWithImages(
  "What's in this photo?",
  [{ source: `data:image/jpeg;base64,${imageData}` }]
);

GerbilEmbeddings

embeddings.ts

import { GerbilEmbeddings } from "@tryhamster/gerbil/langchain";

const embeddings = new GerbilEmbeddings({
  // Optional: specify embedding model (defaults to EmbeddingGemma-300M)
  model: "embeddinggemma-300m",
});

// Single query
const vector = await embeddings.embedQuery("What is the meaning of life?");
// Returns: number[] (768 dimensions, L2-normalized)

// Multiple documents
const vectors = await embeddings.embedDocuments([
  "First document",
  "Second document",
  "Third document",
]);
// Returns: number[][] (array of vectors)
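
Because the returned vectors are described as L2-normalized, the cosine similarity between two of them reduces to a plain dot product. The sketch below illustrates this with small hand-written vectors standing in for real 768-dimensional embeddings. The helper names (dot, l2Normalize, cosineSimilarity) are hypothetical and not part of Gerbil.

```typescript
// Hypothetical helpers, not part of Gerbil.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Scale a vector to unit length (zero vectors are returned unchanged).
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// General cosine similarity, valid for vectors of any non-zero length.
function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Toy 3-dimensional vectors standing in for 768-dimensional embeddings.
const a = l2Normalize([3, 4, 0]);
const b = l2Normalize([4, 3, 0]);

console.log(dot(a, b));              // for unit vectors this matches the line below
console.log(cosineSimilarity(a, b));
```

In practice this means a similarity score over normalized embeddings can skip the two square roots and the division.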

Speech & Audio

Speech runs on the native WebGPU engine rather than a LangChain wrapper. Text-to-speech uses Kani-TTS-2 via engine.speak(), and speech-to-text uses Moonshine via MoonshineSTT — both running on-device on WebGPU. See the Text-to-Speech and Speech-to-Text docs.

Chains

Use Gerbil with LangChain chains:

chains.ts

import { GerbilLLM } from "@tryhamster/gerbil/langchain";
import { PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });

// Create a simple chain
const prompt = PromptTemplate.fromTemplate(
  "You are a helpful assistant. Answer this question: {question}"
);

const chain = prompt.pipe(llm).pipe(new StringOutputParser());

const result = await chain.invoke({
  question: "What is the capital of France?",
});

console.log(result); // e.g. "The capital of France is Paris."

Structured Output

structured.ts

import { GerbilLLM } from "@tryhamster/gerbil/langchain";
import { z } from "zod";

const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });

// Define schema
const personSchema = z.object({
  name: z.string(),
  age: z.number(),
  city: z.string(),
});

// Create structured LLM
const structuredLlm = llm.withStructuredOutput(personSchema);

const result = await structuredLlm.invoke(
  "Extract: John is 32 years old and lives in New York"
);

console.log(result);
// e.g. { name: "John", age: 32, city: "New York" }

Vector Stores

vector-stores.ts

import { GerbilEmbeddings } from "@tryhamster/gerbil/langchain";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { Document } from "@langchain/core/documents";

const embeddings = new GerbilEmbeddings();

// Create documents
const docs = [
  new Document({ pageContent: "Gerbil is a local LLM library" }),
  new Document({ pageContent: "It supports WebGPU acceleration" }),
  new Document({ pageContent: "Works with the Vercel AI SDK" }),
];

// Create vector store
const vectorStore = await MemoryVectorStore.fromDocuments(docs, embeddings);

// Similarity search: return the 2 documents closest to the query
const results = await vectorStore.similaritySearch("What is Gerbil?", 2);
console.log(results);
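
Conceptually, a similarity search scores every stored vector against the query vector and returns the k best matches. The following is a minimal sketch of that idea, not MemoryVectorStore's actual implementation. The Entry type and the topK and dotProduct helpers are hypothetical, and the 2-dimensional vectors are toy stand-ins for real embeddings.

```typescript
// Hypothetical in-memory index entry: a text plus its (unit-length) vector.
interface Entry {
  text: string;
  vector: number[];
}

function dotProduct(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Score every entry against the query and keep the k highest-scoring texts.
function topK(query: number[], entries: Entry[], k: number): string[] {
  return entries
    .map((e) => ({ text: e.text, score: dotProduct(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((e) => e.text);
}

// Toy 2-dimensional unit vectors standing in for real embeddings.
const index: Entry[] = [
  { text: "doc A", vector: [1, 0] },
  { text: "doc B", vector: [0, 1] },
  { text: "doc C", vector: [0.6, 0.8] },
];

console.log(topK([0.8, 0.6], index, 2));
```

This brute-force scan is linear in the number of stored vectors, which is usually acceptable for small in-memory collections.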

RAG Pipeline

Build a complete Retrieval-Augmented Generation pipeline:

rag.ts

import { GerbilLLM, GerbilEmbeddings } from "@tryhamster/gerbil/langchain";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { ChatPromptTemplate } from "@langchain/core/prompts";

// Initialize
const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });
const embeddings = new GerbilEmbeddings();

// Create vector store from documents
const vectorStore = await MemoryVectorStore.fromTexts(
  [
    "Gerbil runs LLMs locally in Node.js",
    "It supports GPU acceleration via WebGPU",
    "Models are cached in IndexedDB",
    "Works offline after first download",
  ],
  [{}, {}, {}, {}],
  embeddings
);

// Create retriever
const retriever = vectorStore.asRetriever({ k: 2 });

// Create prompt
const prompt = ChatPromptTemplate.fromTemplate(`
Answer the question based on the context below.

Context: {context}

Question: {input}

Answer:
`);

// Create chains
const documentChain = await createStuffDocumentsChain({
  llm,
  prompt,
});

const retrievalChain = await createRetrievalChain({
  combineDocsChain: documentChain,
  retriever,
});

// Query
const result = await retrievalChain.invoke({
  input: "Does Gerbil work offline?",
});

console.log(result.answer);
// e.g. "Yes, Gerbil works offline after the first download..."
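
The "stuff" step in the pipeline above amounts to concatenating the retrieved document text into the {context} slot of the prompt. Here is a minimal sketch of that substitution, written without LangChain. The fillTemplate and buildRagPrompt helpers are hypothetical, and joining documents with a blank line is an assumption rather than a statement about LangChain's internals.

```typescript
// Hypothetical stand-in for a retrieved document.
interface Doc {
  pageContent: string;
}

// Replace each {name} placeholder with the matching value; unknown names are left as-is.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match
  );
}

// Concatenate the retrieved documents into the context slot of the prompt.
function buildRagPrompt(template: string, docs: Doc[], input: string): string {
  const context = docs.map((d) => d.pageContent).join("\n\n");
  return fillTemplate(template, { context, input });
}

const template = "Context: {context}\n\nQuestion: {input}\n\nAnswer:";
const retrieved: Doc[] = [
  { pageContent: "Works offline after first download" },
  { pageContent: "Gerbil runs LLMs locally in Node.js" },
];

console.log(buildRagPrompt(template, retrieved, "Does Gerbil work offline?"));
```

Since every retrieved document is placed in the prompt verbatim, the retriever's k value directly controls how much of the model's context window is consumed.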

Agents

agents.ts

import { GerbilLLM } from "@tryhamster/gerbil/langchain";
import { initializeAgentExecutorWithOptions } from "langchain/agents";
import { Calculator } from "@langchain/community/tools/calculator";

const llm = new GerbilLLM({
  model: "qwen3.5-0.8b",
  thinking: true, // Enable for better reasoning
});

// Create tools
const tools = [
  new Calculator(),
  // Add more tools as needed
];

// Create agent
const executor = await initializeAgentExecutorWithOptions(tools, llm, {
  agentType: "zero-shot-react-description",
  verbose: true,
});

// Run agent
const result = await executor.invoke({
  input: "What is 25 * 4 + 10?",
});

console.log(result.output);

Conversation Memory

conversation.ts

import { GerbilLLM } from "@tryhamster/gerbil/langchain";
import { ConversationChain } from "langchain/chains";
import { BufferMemory } from "langchain/memory";

const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });

const memory = new BufferMemory();

const chain = new ConversationChain({
  llm,
  memory,
});

// First message
await chain.call({ input: "My name is Alice" });

// Second message - remembers context
const result = await chain.call({ input: "What's my name?" });
console.log(result.response); // e.g. "Your name is Alice!"
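
Buffer-style memory works by replaying earlier turns into each new prompt, which is why the second call can answer a question about the first. The sketch below shows the idea in isolation. SimpleBufferMemory is a hypothetical class, not LangChain's BufferMemory, and the "Human:" and "AI:" labels are assumptions chosen for illustration.

```typescript
// Hypothetical minimal buffer memory: stores turns and replays them into the next prompt.
class SimpleBufferMemory {
  private turns: { human: string; ai: string }[] = [];

  // Record one completed exchange.
  save(human: string, ai: string): void {
    this.turns.push({ human, ai });
  }

  // Build the prompt for the next input, prefixed with the full history.
  buildPrompt(input: string): string {
    const history = this.turns
      .map((t) => `Human: ${t.human}\nAI: ${t.ai}`)
      .join("\n");
    const prefix = history.length > 0 ? history + "\n" : "";
    return `${prefix}Human: ${input}\nAI:`;
  }
}

const memory = new SimpleBufferMemory();
memory.save("My name is Alice", "Nice to meet you, Alice!");
console.log(memory.buildPrompt("What's my name?"));
```

Because the whole history is replayed, the prompt grows with every turn, so long conversations may need trimming to fit a small model's context window.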

Document Loaders

document-loaders.ts

import { GerbilEmbeddings } from "@tryhamster/gerbil/langchain";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

// Load documents
const textLoader = new TextLoader("./docs/readme.txt");
const pdfLoader = new PDFLoader("./docs/manual.pdf");

const textDocs = await textLoader.load();
const pdfDocs = await pdfLoader.load();

// Split into chunks
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
});

const splitDocs = await splitter.splitDocuments([...textDocs, ...pdfDocs]);

// Create vector store
const embeddings = new GerbilEmbeddings();
const vectorStore = await MemoryVectorStore.fromDocuments(splitDocs, embeddings);

// Query
const results = await vectorStore.similaritySearch("How do I install?", 3);
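
To show how chunkSize and chunkOverlap interact, here is a minimal sketch of a plain fixed-size character chunker. It is a simpler technique than RecursiveCharacterTextSplitter, which also tries to split on separators such as paragraph and line breaks. The chunkText helper is hypothetical.

```typescript
// Hypothetical fixed-size chunker: each chunk starts (chunkSize - chunkOverlap) characters
// after the previous one, so consecutive chunks share chunkOverlap characters.
function chunkText(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Small values make the overlap visible: "d" and "g" each appear in two chunks.
console.log(chunkText("abcdefghij", 4, 1));
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of embedding some text twice.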

Voice-Enabled Pipeline

Build a complete voice-to-voice agent with STT → LLM → TTS. The LangChain LLM handles text; speech runs on the native WebGPU engine (Moonshine for STT, Kani-TTS-2 for TTS):

voice-pipeline.ts

import { GerbilLLM, GerbilEmbeddings } from "@tryhamster/gerbil/langchain";
import { MoonshineSTT, WebGPUEngine } from "@tryhamster/gerbil/gpu";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });
const stt = await MoonshineSTT.create({ repo: "UsefulSensors/moonshine-base" });
const tts = await WebGPUEngine.create({ repo: "nineninesix/kani-tts-450m-0.2-ft" });

// Voice input → LLM → Voice output
async function voiceChat(pcm16kMono: Float32Array) {
  // 1. Transcribe user speech (raw 16 kHz mono PCM)
  const { text: userMessage } = await stt.transcribe(pcm16kMono);
  console.log("User said:", userMessage);

  // 2. Generate response
  const response = await llm.invoke(userMessage);
  console.log("AI response:", response);

  // 3. Speak response
  const { pcm, sampleRate } = await tts.speak(response, { languageTag: "en_us" });

  return { pcm, sampleRate, text: response };
}

// Combine with RAG for voice-enabled knowledge base
// Hypothetical knowledge-base texts and per-text metadata; replace with your own
const docs = [
  "Gerbil runs LLMs locally in Node.js",
  "Works offline after first download",
];
const metadata = docs.map(() => ({}));

const embeddings = new GerbilEmbeddings();
const vectorStore = await MemoryVectorStore.fromTexts(docs, metadata, embeddings);

async function voiceRAG(pcm16kMono: Float32Array) {
  // Transcribe question
  const { text: question } = await stt.transcribe(pcm16kMono);

  // Retrieve relevant documents
  const relevantDocs = await vectorStore.similaritySearch(question, 3);
  const context = relevantDocs.map(d => d.pageContent).join("\n");

  // Generate answer with context
  const answer = await llm.invoke(
    `Context: ${context}\n\nQuestion: ${question}\n\nAnswer:`
  );

  // Speak the answer
  const { pcm } = await tts.speak(answer, { languageTag: "en_us" });
  return { pcm, answer };
}
47}