llama.cpp

Run a local GGUF model through a llama.cpp HTTP server using LlamaCppModel. No cloud API key is required; the model runs on your own machine.

For standalone invoke / stream usage, see LLM Models — llama.cpp.

Install

bash
pip install --extra-index-url https://core-packages.elsai.ai/root/elsai-model/ elsai-model==2.0.0
pip install --extra-index-url https://elsai-agents.elsai.ai/root/ elsai-agents==0.2.0

Setup

  1. Download a GGUF model (e.g. from Hugging Face).
  2. Start the llama.cpp server with the model loaded locally:
bash
llama-server -m /path/to/model.gguf --host 0.0.0.0 --port 8080
  3. Point the client at your server (these values match the defaults used in the example below):
bash
export LLAMACPP_BASE_URL=http://localhost:8080
export LLAMACPP_MODEL_ID=default
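Before building the model, it can help to confirm that the server is actually up. The sketch below reads the same two environment variables, with the same defaults, as the agent example, and probes llama-server's GET /health endpoint. Per the llama.cpp server documentation, that endpoint returns HTTP 200 once the model is loaded (and 503 while it is still loading). The helper names resolve_llamacpp_config and server_is_ready are hypothetical and are not part of elsai_model; only the Python standard library is used.

```python
import os
import urllib.request


def resolve_llamacpp_config() -> tuple[str, str]:
    """Read the env vars used by the agent example, with the same defaults."""
    base_url = os.getenv("LLAMACPP_BASE_URL", "http://localhost:8080")
    model_id = os.getenv("LLAMACPP_MODEL_ID", "default")
    # Strip a trailing slash so "/health" can be appended safely.
    return base_url.rstrip("/"), model_id


def server_is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (e.g. 503 while loading),
        # refused connections and timeouts; ValueError covers malformed URLs.
        return False


if __name__ == "__main__":
    url, model_id = resolve_llamacpp_config()
    if server_is_ready(url):
        print(f"llama-server is ready at {url} (model_id={model_id!r})")
    else:
        print(f"No healthy llama-server at {url}; is it running?")
```

Running this after starting llama-server should print the "ready" line; if it reports no healthy server, check the --host/--port flags against LLAMACPP_BASE_URL.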

Agent — basic

python
import os
from elsai import Agent
from elsai_model.llama_cpp import LlamaCppModel

model = LlamaCppModel(
    base_url=os.getenv("LLAMACPP_BASE_URL", "http://localhost:8080"),
    model_id=os.getenv("LLAMACPP_MODEL_ID", "default"),
    params={"temperature": 0.2},
)
agent = Agent(
    model=model,
    system_prompt="You are a concise assistant.",
)
result = agent("What is 17 + 28? Reply with just the number.")
print(result.message["content"][0]["text"])
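The [0]["text"] lookup above reads only the first content block of the reply. If a reply arrives as several blocks, a small helper can collect all of them. This is a hypothetical sketch (message_text is not an elsai API) that assumes, as the example's indexing implies, that result.message["content"] is a list of dicts in which text blocks carry a "text" key; any other block is skipped.

```python
from typing import Any, Mapping


def message_text(message: Mapping[str, Any], sep: str = "\n") -> str:
    """Join the text of every text block in an agent message.

    Assumed shape (mirrors result.message["content"][0]["text"]):
    {"content": [{"text": "..."}, ...]}. Blocks without a str "text" are ignored.
    """
    parts = [
        block["text"]
        for block in message.get("content", [])
        if isinstance(block, Mapping) and isinstance(block.get("text"), str)
    ]
    return sep.join(parts)


# Usage with the agent above (assumed structure):
# print(message_text(result.message))
```

Keeping the helper tolerant of non-text blocks means the same call works whether the reply has one block or several.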

Copyright © 2026 Elsai Foundry.