Sometimes you want AI, but you don’t want:

  • your data leaving your machine,

  • another vendor key to manage,

  • usage-based surprise bills.

That’s where local LLMs shine.

Ollama makes this easy: it runs a local server, and you talk to it like any HTTP service, which makes it a perfect fit for Java.

This post shows the simplest possible integration: Java HttpClient → Ollama → response printed.

Step 1: Install Ollama and pull a model

Install Ollama (their site has the OS-specific installer). Once it’s installed, pull a model.

If you’re unsure, start here:

ollama pull llama3.2:3b

Other good options:

  • Smaller/faster: llama3.2:1b

  • Coding-focused: qwen2.5-coder:7b

  • Small + efficient: phi3:mini

  • Vision (optional): llava

You can also see what you have locally:

ollama list

Step 2: Start Ollama

Usually Ollama runs as a background service once installed. If you need to start it manually:

ollama serve

Ollama listens on http://localhost:11434.

That’s it. Now it’s just HTTP.

Step 3: Call the local LLM from Java


We’ll use the endpoint:
POST http://localhost:11434/api/generate

Important detail for beginners: set "stream": false so you get one clean JSON response (instead of token-by-token streaming).

Main.java

import java.net.URI;
import java.net.http.*;
import java.time.Duration;

public class Main {
  public static void main(String[] args) throws Exception {
    String model = "llama3.2:3b";
    String prompt = "Write a friendly explanation of Java virtual threads in 5 lines.";

    String body = """
      {
        "model": "%s",
        "prompt": "%s",
        "stream": false
      }
      """.formatted(model, escapeJson(prompt));

    HttpClient client = HttpClient.newHttpClient();

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:11434/api/generate"))
        .timeout(Duration.ofSeconds(30)) // plenty for small models; raise it for bigger ones
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    try {
      HttpResponse<String> response =
          client.send(request, HttpResponse.BodyHandlers.ofString());

      if (response.statusCode() != 200) {
        System.out.println("Request failed: " + response.statusCode());
        System.out.println(response.body());
        return;
      }

      System.out.println(response.body());
    } catch (java.net.ConnectException e) {
      System.out.println("Cannot connect to Ollama at localhost:11434");
      System.out.println("Is Ollama running? Try: ollama serve");
    }
  }

  // Minimal escaping so the prompt can live inside a JSON string literal.
  // Covers backslashes, quotes, and newlines; use a real JSON library for anything fancier.
  private static String escapeJson(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
  }
}

Run it, and you’ll see JSON printed back.
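For orientation, the non-streaming response is a single JSON object shaped roughly like this (trimmed; values are illustrative, and the real reply also carries timing/token stats):

```json
{
  "model": "llama3.2:3b",
  "created_at": "2025-01-01T12:00:00Z",
  "response": "the generated text lives here",
  "done": true
}
```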

“Okay but I only want the text”

Totally fair.

Ollama returns JSON that includes a response field (the actual generated text). You can extract it properly with Jackson (recommended) instead of brittle string parsing.

This is the exact moment where Java feels good: you define a tiny record for the response, parse it, and keep your code clean.
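If you want to see what that extraction boils down to before pulling in Jackson, here is a zero-dependency sketch using the JDK's regex support. To be clear: this is an illustration, not a full JSON parser (it handles escaped quotes and newlines, nothing more), and Jackson remains the right choice for real code. The class and method names are just for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractResponse {
  // Matches the "response" string value, allowing escaped characters inside it.
  private static final Pattern RESPONSE =
      Pattern.compile("\"response\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

  static String extractResponse(String json) {
    Matcher m = RESPONSE.matcher(json);
    if (!m.find()) return "";
    // Undo the common JSON escapes; not a complete unescaper.
    return m.group(1)
        .replace("\\n", "\n")
        .replace("\\\"", "\"")
        .replace("\\\\", "\\");
  }

  public static void main(String[] args) {
    String json =
        "{\"model\":\"llama3.2:3b\",\"response\":\"Hello from a local model!\",\"done\":true}";
    System.out.println(extractResponse(json)); // prints: Hello from a local model!
  }
}
```

With Jackson you would instead define a small record with a `response` field and let `ObjectMapper` do the work, which also handles every escape case correctly.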

Quick troubleshooting (the stuff that trips people up)

If you see “connection refused”:
- Ollama isn’t running. Start it with ollama serve or open the Ollama app.

If you get a model error:
- You didn’t pull the model yet. Run ollama pull llama3.2:3b (or whatever model name you used).

If responses are slow:
- Try a smaller model like llama3.2:1b or phi3:mini.
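One related gotcha: the example above sets a 30-second request timeout, and a large model (or a cold first load) can blow past that, surfacing as an HttpTimeoutException rather than a slow answer. A sketch of building the same request with a more generous ceiling (the helper name and the 5-minute figure are my assumptions, not anything Ollama requires):

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

public class TimeoutDemo {
  // Same endpoint and body shape as the main example, with a configurable timeout.
  static HttpRequest buildRequest(Duration timeout) {
    return HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:11434/api/generate"))
        .timeout(timeout)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(
            "{\"model\":\"llama3.2:1b\",\"prompt\":\"hi\",\"stream\":false}"))
        .build();
  }

  public static void main(String[] args) {
    // Five minutes is a generous ceiling for cold model loads on modest hardware.
    HttpRequest request = buildRequest(Duration.ofMinutes(5));
    System.out.println("Timeout: " + request.timeout().orElseThrow()); // Timeout: PT5M
  }
}
```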

Why local LLM + Java is a great combo

Local models are not about “replacing the cloud.” They’re about control:

  • privacy for internal docs and dev workflows,

  • fast iteration while building features,

  • no key management,

  • predictable cost (because it’s your machine doing the work).


Java fits because it’s already the home for real systems, not just experiments. Once your “hello world” call works, it’s a straight road to turning it into a service, adding caching, metrics, timeouts, and all the stuff production requires.

What I’ll write next

If you want to go one step beyond this:

  • streaming responses (token-by-token)

  • structured output (JSON-only answers)

  • tool calling (Java functions as “tools”)

  • evals (so changes don’t silently break behavior)

- Suren
