LangChain

LangChain integration for Gerbil's local models. GerbilLLM and GerbilEmbeddings plug into chains, agents, and RAG pipelines, and they combine with the native TTS and STT engine to build voice-enabled pipelines.

Class             Capability
GerbilLLM         Text generation + Vision
GerbilEmbeddings  Vector embeddings
Note: The LangChain integration runs in Node. GerbilLLM and GerbilEmbeddings use native models (Qwen3.5-0.8B, EmbeddingGemma-300M) running on the WebGPU engine. For speech (Kani-TTS-2 TTS, Moonshine STT) and browser inference, see the WebGPUEngine.

Installation

Terminal
npm install @tryhamster/gerbil langchain

Quick Start

quick-start.ts
import {
  GerbilLLM,
  GerbilEmbeddings,
} from "@tryhamster/gerbil/langchain";

// Text generation
const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });
const result = await llm.invoke("Write a haiku about coding");

// Embeddings
const embeddings = new GerbilEmbeddings();
const vector = await embeddings.embedQuery("Hello world");

GerbilLLM

Text generation with optional vision support:

llm-config.ts
import { GerbilLLM } from "@tryhamster/gerbil/langchain";

const llm = new GerbilLLM({
  // Model configuration
  model: "qwen3.5-0.8b",
  device: "auto", // "auto" | "gpu" | "cpu"
  dtype: "q4", // "q4" | "q8" | "fp16" | "fp32"

  // Generation options
  maxTokens: 500,
  temperature: 0.7,
  topP: 0.9,
  topK: 50,

  // Thinking mode (Qwen3)
  thinking: false,

  // Callbacks
  callbacks: [
    {
      handleLLMStart: async (llm, prompts) => {
        console.log("Starting generation...");
      },
      handleLLMEnd: async (output) => {
        console.log("Generation complete");
      },
    },
  ],
});

invoke()

invoke.ts
// Simple invocation
const explanation = await llm.invoke("Explain recursion");

// With options
const poem = await llm.invoke("Write a poem", {
  maxTokens: 200,
  temperature: 0.9,
});

// With stop sequences
const list = await llm.invoke("List 3 items:\n1.", {
  stop: ["\n4."],
});
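
A stop sequence ends generation at its first occurrence. The sketch below illustrates that general behaviour on a plain string. `truncateAtStop` is a hypothetical helper, not part of Gerbil or LangChain, and it assumes the stop text itself is excluded from the result, which may differ between implementations.

```typescript
// Hypothetical helper: cut `text` at the earliest occurrence of any stop sequence.
// Assumption: the stop sequence itself is not part of the returned text.
function truncateAtStop(text: string, stopSequences: string[]): string {
  let cut = text.length;
  for (const seq of stopSequences) {
    if (seq.length === 0) continue; // an empty stop string would match everywhere
    const idx = text.indexOf(seq);
    if (idx !== -1 && idx < cut) cut = idx;
  }
  return text.slice(0, cut);
}

// Stand-in for raw model output continuing the prompt "List 3 items:\n1."
const rawCompletion = " Apples\n2. Pears\n3. Plums\n4. Grapes";
console.log(truncateAtStop(rawCompletion, ["\n4."]));
```

With the stop sequence "\n4." the fourth item never reaches the caller, which is why the invoke example above can ask for exactly three items.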

Streaming

streaming.ts
// Stream tokens
const stream = await llm.stream("Tell me a story");

for await (const chunk of stream) {
  process.stdout.write(chunk);
}

// With callbacks
const hooksStream = await llm.stream("Explain hooks", {
  callbacks: [{
    handleLLMNewToken: async (token) => {
      console.log("Token:", token);
    },
  }],
});

// Drain the stream; tokens are reported to the callback as they are produced
for await (const _chunk of hooksStream) {
  // nothing to do here
}

Vision

Use vision-capable models to analyze images:

vision.ts
import { readFileSync } from "fs";
import { GerbilLLM } from "@tryhamster/gerbil/langchain";

// Use a vision-capable model
const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });

// Check if model supports vision
const hasVision = await llm.supportsVision(); // true

// Analyze an image
const description = await llm.invokeWithImages(
  "Describe this image in detail",
  [{ source: "https://example.com/photo.jpg" }]
);

// Compare multiple images
// beforeScreenshot and afterScreenshot are placeholders for image sources
// you supply, in the same form as the other examples here
const diff = await llm.invokeWithImages(
  "What changed between these two screenshots?",
  [
    { source: beforeScreenshot },
    { source: afterScreenshot },
  ]
);

// Use with local files (base64)
const imageData = readFileSync("photo.jpg").toString("base64");
const result = await llm.invokeWithImages(
  "What's in this photo?",
  [{ source: `data:image/jpeg;base64,${imageData}` }]
);

GerbilEmbeddings

embeddings.ts
import { GerbilEmbeddings } from "@tryhamster/gerbil/langchain";

const embeddings = new GerbilEmbeddings({
  // Optional: specify embedding model (defaults to EmbeddingGemma-300M)
  model: "embeddinggemma-300m",
});

// Single query
const vector = await embeddings.embedQuery("What is the meaning of life?");
// Returns: number[] (768 dimensions, L2-normalized)

// Multiple documents
const vectors = await embeddings.embedDocuments([
  "First document",
  "Second document",
  "Third document",
]);
// Returns: number[][] (array of vectors)
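
Because the returned vectors are L2-normalized, the cosine similarity of two embeddings reduces to their dot product. The sketch below shows this on small stand-in vectors; `dot` and `l2Normalize` are illustrative helpers, not part of the Gerbil API.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Scale a vector to unit length (L2 norm of 1); a zero vector is returned unchanged.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Stand-in vectors; real ones would come from embedQuery / embedDocuments.
const unitA = l2Normalize([3, 4]); // [0.6, 0.8]
const unitB = l2Normalize([4, 3]); // [0.8, 0.6]

// For unit vectors the dot product is the cosine similarity.
console.log(dot(unitA, unitB).toFixed(2)); // "0.96"
```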

Speech & Audio

Speech runs on the native WebGPU engine rather than a LangChain wrapper. Text-to-speech uses Kani-TTS-2 via engine.speak(), and speech-to-text uses Moonshine via MoonshineSTT — both running on-device on WebGPU. See the Text-to-Speech and Speech-to-Text docs.

Chains

Use Gerbil with LangChain chains:

chains.ts
import { GerbilLLM } from "@tryhamster/gerbil/langchain";
import { PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";

const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });

// Create a simple chain
const prompt = PromptTemplate.fromTemplate(
  "You are a helpful assistant. Answer this question: {question}"
);

const chain = prompt.pipe(llm).pipe(new StringOutputParser());

const result = await chain.invoke({
  question: "What is the capital of France?",
});

console.log(result); // "The capital of France is Paris."

Structured Output

structured.ts
import { GerbilLLM } from "@tryhamster/gerbil/langchain";
import { z } from "zod";

const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });

// Define schema
const personSchema = z.object({
  name: z.string(),
  age: z.number(),
  city: z.string(),
});

// Create structured LLM
const structuredLlm = llm.withStructuredOutput(personSchema);

const result = await structuredLlm.invoke(
  "Extract: John is 32 years old and lives in New York"
);

console.log(result);
// { name: "John", age: 32, city: "New York" }
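
Small local models can return malformed or incomplete JSON, so it may be worth validating output before using it. The sketch below does this with a hand-written type guard so that it runs without dependencies. `isPerson` and `parsePerson` are hypothetical helpers; with zod installed, `personSchema.safeParse` would be the more natural choice.

```typescript
interface Person {
  name: string;
  age: number;
  city: string;
}

// Type guard mirroring the schema above: three required, correctly typed fields.
function isPerson(value: unknown): value is Person {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["name"] === "string" &&
    typeof v["age"] === "number" &&
    Number.isFinite(v["age"]) &&
    typeof v["city"] === "string"
  );
}

// Hypothetical helper: parse raw model text, returning null instead of throwing.
function parsePerson(raw: string): Person | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    return isPerson(parsed) ? parsed : null;
  } catch {
    return null; // not valid JSON at all
  }
}

console.log(parsePerson('{"name":"John","age":32,"city":"New York"}'));
console.log(parsePerson('{"name":"John","age":"32"}')); // null: wrong type, missing city
```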

Vector Stores

vector-stores.ts
import { GerbilEmbeddings } from "@tryhamster/gerbil/langchain";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { Document } from "@langchain/core/documents";

const embeddings = new GerbilEmbeddings();

// Create documents
const docs = [
  new Document({ pageContent: "Gerbil is a local LLM library" }),
  new Document({ pageContent: "It supports WebGPU acceleration" }),
  new Document({ pageContent: "Works with the Vercel AI SDK" }),
];

// Create vector store
const vectorStore = await MemoryVectorStore.fromDocuments(docs, embeddings);

// Similarity search
const results = await vectorStore.similaritySearch("What is Gerbil?", 2);
console.log(results);
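
Conceptually, an in-memory similarity search scores every stored vector against the query vector and keeps the best k. The sketch below is a simplified illustration of that idea, not MemoryVectorStore's actual implementation. `IndexedText`, `cosine`, and `topK` are hypothetical names, and the 2-D vectors are toy values standing in for real embeddings.

```typescript
interface IndexedText {
  text: string;
  vector: number[];
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: number[], b: number[]): number {
  let dotProd = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotProd += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dotProd / denom;
}

// Rank all entries by similarity to the query and keep the best k.
function topK(query: number[], entries: IndexedText[], k: number): IndexedText[] {
  return entries
    .map((entry) => ({ entry, score: cosine(query, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.entry);
}

const toyIndex: IndexedText[] = [
  { text: "Gerbil is a local LLM library", vector: [1, 0] },
  { text: "It supports WebGPU acceleration", vector: [0.6, 0.8] },
  { text: "Works with the Vercel AI SDK", vector: [0, 1] },
];
console.log(topK([0.9, 0.1], toyIndex, 2).map((e) => e.text));
```

A linear scan like this is fine for a few thousand documents; larger collections usually call for an approximate nearest-neighbour index.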

RAG Pipeline

Build a complete Retrieval-Augmented Generation pipeline:

rag.ts
import { GerbilLLM, GerbilEmbeddings } from "@tryhamster/gerbil/langchain";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { ChatPromptTemplate } from "@langchain/core/prompts";

// Initialize
const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });
const embeddings = new GerbilEmbeddings();

// Create vector store from documents
const vectorStore = await MemoryVectorStore.fromTexts(
  [
    "Gerbil runs LLMs locally in Node.js",
    "It supports GPU acceleration via WebGPU",
    "Models are cached in IndexedDB",
    "Works offline after first download",
  ],
  [{}, {}, {}, {}],
  embeddings
);

// Create retriever
const retriever = vectorStore.asRetriever({ k: 2 });

// Create prompt
const prompt = ChatPromptTemplate.fromTemplate(`
Answer the question based on the context below.

Context: {context}

Question: {input}

Answer:
`);

// Create chains
const documentChain = await createStuffDocumentsChain({
  llm,
  prompt,
});

const retrievalChain = await createRetrievalChain({
  combineDocsChain: documentChain,
  retriever,
});

// Query
const result = await retrievalChain.invoke({
  input: "Does Gerbil work offline?",
});

console.log(result.answer);
// "Yes, Gerbil works offline after the first download..."
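
The "stuff" strategy places every retrieved document into a single prompt. The sketch below shows roughly what the model ends up seeing. `RetrievedDoc` and `buildRagPrompt` are hypothetical names, and joining documents with a blank line is an assumption made for illustration.

```typescript
interface RetrievedDoc {
  pageContent: string;
}

// Hypothetical helper: "stuff" every retrieved document into one prompt string.
function buildRagPrompt(docs: RetrievedDoc[], question: string): string {
  // Assumption: documents are separated by a blank line.
  const context = docs.map((d) => d.pageContent).join("\n\n");
  return [
    "Answer the question based on the context below.",
    "",
    `Context: ${context}`,
    "",
    `Question: ${question}`,
    "",
    "Answer:",
  ].join("\n");
}

const ragPrompt = buildRagPrompt(
  [
    { pageContent: "Works offline after first download" },
    { pageContent: "Gerbil runs LLMs locally in Node.js" },
  ],
  "Does Gerbil work offline?"
);
console.log(ragPrompt);
```

Since all retrieved text shares the context window with the question, a small k (as with `k: 2` above) helps keep the prompt within the limits of a small model.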

Agents

agents.ts
import { GerbilLLM } from "@tryhamster/gerbil/langchain";
import { initializeAgentExecutorWithOptions } from "langchain/agents";
import { Calculator } from "@langchain/community/tools/calculator";

const llm = new GerbilLLM({
  model: "qwen3.5-0.8b",
  thinking: true, // Enable for better reasoning
});

// Create tools
const tools = [
  new Calculator(),
  // Add more tools as needed
];

// Create agent
const executor = await initializeAgentExecutorWithOptions(tools, llm, {
  agentType: "zero-shot-react-description",
  verbose: true,
});

// Run agent
const result = await executor.invoke({
  input: "What is 25 * 4 + 10?",
});

console.log(result.output);

Conversation Memory

conversation.ts
import { GerbilLLM } from "@tryhamster/gerbil/langchain";
import { ConversationChain } from "langchain/chains";
import { BufferMemory } from "langchain/memory";

const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });

const memory = new BufferMemory();

const chain = new ConversationChain({
  llm,
  memory,
});

// First message
await chain.call({ input: "My name is Alice" });

// Second message - remembers context
const result = await chain.call({ input: "What's my name?" });
console.log(result.response); // "Your name is Alice!"
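
Conceptually, a buffer memory keeps the full transcript and prepends it to each new prompt, which is how the second call can answer from the first. The sketch below is a simplified stand-in for that idea, not LangChain's implementation. `SimpleBuffer` and the "Human:" / "AI:" prefixes are assumptions made for illustration.

```typescript
interface Turn {
  role: "Human" | "AI";
  content: string;
}

// Minimal stand-in for a buffer memory: stores every turn and replays it as text.
class SimpleBuffer {
  private turns: Turn[] = [];

  add(role: Turn["role"], content: string): void {
    this.turns.push({ role, content });
  }

  // Build the next prompt: stored transcript first, then the new input.
  render(input: string): string {
    const past = this.turns.map((t) => `${t.role}: ${t.content}`);
    return [...past, `Human: ${input}`, "AI:"].join("\n");
  }
}

const chatBuffer = new SimpleBuffer();
chatBuffer.add("Human", "My name is Alice");
chatBuffer.add("AI", "Nice to meet you, Alice!");
console.log(chatBuffer.render("What's my name?"));
```

The transcript grows with every turn, so long conversations eventually exceed a small model's context window; trimming or summarizing older turns is the usual remedy.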

Document Loaders

document-loaders.ts
import { GerbilLLM, GerbilEmbeddings } from "@tryhamster/gerbil/langchain";
import { TextLoader } from "langchain/document_loaders/fs/text";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

// Load documents
const textLoader = new TextLoader("./docs/readme.txt");
const pdfLoader = new PDFLoader("./docs/manual.pdf");

const textDocs = await textLoader.load();
const pdfDocs = await pdfLoader.load();

// Split into chunks
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
});

const splitDocs = await splitter.splitDocuments([...textDocs, ...pdfDocs]);

// Create vector store
const embeddings = new GerbilEmbeddings();
const vectorStore = await MemoryVectorStore.fromDocuments(splitDocs, embeddings);

// Query
const results = await vectorStore.similaritySearch("How do I install?", 3);
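
To see what `chunkSize` and `chunkOverlap` control, the sketch below uses a plain fixed-size chunker in which each chunk repeats the last few characters of the previous one. `chunkText` is a hypothetical helper; RecursiveCharacterTextSplitter additionally prefers to break on separators such as paragraph and line boundaries, which this sketch ignores.

```typescript
// Hypothetical fixed-size chunker: each chunk starts (chunkSize - overlap)
// characters after the previous one, so neighbours share `overlap` characters.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.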

Voice-Enabled Pipeline

Build a complete voice-to-voice agent with STT → LLM → TTS. The LangChain LLM handles text; speech runs on the native WebGPU engine (Moonshine for STT, Kani-TTS-2 for TTS):

voice-pipeline.ts
import { GerbilLLM, GerbilEmbeddings } from "@tryhamster/gerbil/langchain";
import { MoonshineSTT, WebGPUEngine } from "@tryhamster/gerbil/gpu";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const llm = new GerbilLLM({ model: "qwen3.5-0.8b" });
const stt = await MoonshineSTT.create({ repo: "UsefulSensors/moonshine-base" });
const tts = await WebGPUEngine.create({ repo: "nineninesix/kani-tts-450m-0.2-ft" });

// Voice input → LLM → Voice output
async function voiceChat(pcm16kMono: Float32Array) {
  // 1. Transcribe user speech (raw 16 kHz mono PCM)
  const { text: userMessage } = await stt.transcribe(pcm16kMono);
  console.log("User said:", userMessage);

  // 2. Generate response
  const response = await llm.invoke(userMessage);
  console.log("AI response:", response);

  // 3. Speak response
  const { pcm, sampleRate } = await tts.speak(response, { languageTag: "en_us" });

  return { pcm, sampleRate, text: response };
}

// Combine with RAG for voice-enabled knowledge base
// Placeholder knowledge base; replace with your own texts and per-text metadata
const docs = ["Gerbil runs LLMs locally in Node.js"];
const metadata = docs.map(() => ({}));

const embeddings = new GerbilEmbeddings();
const vectorStore = await MemoryVectorStore.fromTexts(docs, metadata, embeddings);

async function voiceRAG(pcm16kMono: Float32Array) {
  // Transcribe question
  const { text: question } = await stt.transcribe(pcm16kMono);

  // Retrieve relevant documents
  const relevantDocs = await vectorStore.similaritySearch(question, 3);
  const context = relevantDocs.map(d => d.pageContent).join("\n");

  // Generate answer with context
  const answer = await llm.invoke(
    `Context: ${context}\n\nQuestion: ${question}\n\nAnswer:`
  );

  // Speak the answer
  const { pcm } = await tts.speak(answer, { languageTag: "en_us" });
  return { pcm, answer };
}
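
The functions above take a Float32Array of 16 kHz mono PCM, whereas microphone captures and WAV files often store samples as 16-bit signed integers. The sketch below converts between the two. `int16ToFloat32` is a hypothetical helper, and it assumes the audio is already 16 kHz mono and that the engine expects floats in the range -1 to 1, which the source does not state.

```typescript
// Convert 16-bit signed PCM samples to floats in [-1, 1).
// Assumption: the input is already 16 kHz mono; no resampling happens here.
function int16ToFloat32(samples: Int16Array): Float32Array {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = samples[i] / 32768; // 32768 = 2^15, magnitude of the most negative sample
  }
  return out;
}

const pcmFloats = int16ToFloat32(new Int16Array([0, 16384, -32768, 32767]));
console.log(Array.from(pcmFloats));
```

The result could then be passed to `voiceChat` or `voiceRAG` in place of `pcm16kMono`.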