T21: Ollama and Chat

Large Language Models (LLMs) can now run locally on your machine. Ollama makes it easy to download and serve open-source models. Connecting your web app to a local LLM gives you AI-powered features without sending data to external services, like having a smart assistant living on your own computer.

Setting Up Ollama

Install Ollama, pull a model, and start the server; it exposes an HTTP API on localhost:11434.

# Install and run
# ollama pull llama3
# ollama serve

# The API is now available at http://localhost:11434
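Before wiring up chat, it helps to confirm the server is reachable. Ollama's GET /api/tags endpoint lists locally installed models; a minimal sketch of a connectivity check (the listModels name is our own):

```javascript
// Quick connectivity check: list locally installed models.
// GET /api/tags returns { models: [{ name, ... }, ...] }.
async function listModels() {
    const response = await fetch("http://localhost:11434/api/tags");
    if (!response.ok) {
        throw new Error(`Ollama not reachable: ${response.status}`);
    }
    const data = await response.json();
    return data.models.map((m) => m.name);
}
```

If this throws, the server is not running or a firewall is blocking port 11434.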

Chat API Integration

async function chat(messages) {
    const response = await fetch("http://localhost:11434/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
            model: "llama3",
            messages: messages,
            stream: false          // return one complete response instead of chunks
        })
    });
    if (!response.ok) {
        throw new Error(`Ollama request failed: ${response.status}`);
    }
    const data = await response.json();
    return data.message.content;   // the assistant's reply text
}

// Usage
const reply = await chat([
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Explain HTML in one sentence." }
]);
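With stream: true instead, Ollama sends newline-delimited JSON: each line is a chunk carrying a fragment of the reply in message.content, and the final chunk has done: true. A minimal sketch of reassembling the reply from such a buffer (collectStreamText is our own helper name):

```javascript
// Reassemble assistant text from newline-delimited JSON chunks,
// as returned by /api/chat when stream: true.
function collectStreamText(ndjson) {
    let text = "";
    for (const line of ndjson.split("\n")) {
        if (!line.trim()) continue;        // skip blank lines
        const chunk = JSON.parse(line);
        if (chunk.message && chunk.message.content) {
            text += chunk.message.content; // append this fragment
        }
        if (chunk.done) break;             // final chunk signals completion
    }
    return text;
}
```

Streaming lets the UI show the reply as it is generated rather than waiting for the whole response.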

Building a Chat Interface

Store the conversation history as an array of message objects. Append each new message and send the full history to maintain context.

sequenceDiagram
    participant U as User
    participant W as Web App
    participant O as Ollama Server
    U->>W: Type message
    W->>W: Append to history
    W->>O: POST /api/chat (full history)
    O->>O: LLM generates response
    O-->>W: Response JSON
    W->>W: Append assistant reply
    W-->>U: Display response
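The flow above can be sketched as a thin state layer around the chat() function from the Chat API Integration section (addTurn and sendUserMessage are our own names, not part of Ollama's API):

```javascript
// Conversation state: append each turn and resend the full history.
const history = [
    { role: "system", content: "You are a helpful assistant." }
];

// Append a message object to a history array and return it.
function addTurn(messages, role, content) {
    messages.push({ role, content });
    return messages;
}

// Record the user's message, get a reply, and record that too.
// Assumes chat(messages) is defined as in the section above.
async function sendUserMessage(text) {
    addTurn(history, "user", text);
    const reply = await chat(history);
    addTurn(history, "assistant", reply);
    return reply;
}
```

Because the entire history array is sent on every request, the model sees all earlier turns and can answer follow-up questions in context.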

Key Takeaways

  • Ollama runs open-source LLMs locally with a simple API
  • The chat API takes an array of messages with role and content fields
  • Send the full conversation history for context-aware responses
  • Local LLMs keep your data private; no external API calls are needed