llama.cpp
Run a local GGUF model via a llama.cpp HTTP server with LlamaCppModel. No cloud API key required — the model runs on your machine.
For standalone invoke / stream usage, see LLM Models — llama.cpp.
Install
bash
pip install --extra-index-url https://core-packages.elsai.ai/root/elsai-model/ elsai-model==2.0.0
pip install --extra-index-url https://elsai-agents.elsai.ai/root/ elsai-agents==0.2.0
Setup
- Download a GGUF model (e.g. from Hugging Face).
- Start the llama.cpp server with the model loaded locally:
bash
llama-server -m /path/to/model.gguf --host 0.0.0.0 --port 8080
- Point the client at your server:
bash
export LLAMACPP_BASE_URL=http://localhost:8080
export LLAMACPP_MODEL_ID=default
Agent — basic
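Before constructing the agent, it can be worth confirming that the server started in Setup is reachable. The sketch below is an optional pre-flight check that uses only the Python standard library. It assumes the server exposes llama.cpp's `GET /health` endpoint, which answers with HTTP 200 once the model is loaded. The helper names `health_url` and `server_is_ready` are illustrative and not part of elsai.

```python
import os
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL, tolerating a trailing slash on base_url."""
    return base_url.rstrip("/") + "/health"


def server_is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True only if the server answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers connection refused, timeouts, and error statuses
        # (for example 503 while the model is still loading).
        return False


if __name__ == "__main__":
    # Same environment variable and default as the agent example.
    base_url = os.getenv("LLAMACPP_BASE_URL", "http://localhost:8080")
    if server_is_ready(base_url):
        print("ready")
    else:
        print(f"no healthy server at {base_url}")
```

If the check reports no healthy server, the usual suspects are a `llama-server` process that is not running, a model that is still loading, or a `LLAMACPP_BASE_URL` that points at a different port.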
python
import os

from elsai import Agent
from elsai_model.llama_cpp import LlamaCppModel

# Connect to the local llama.cpp server, falling back to the values from Setup.
model = LlamaCppModel(
    base_url=os.getenv("LLAMACPP_BASE_URL", "http://localhost:8080"),
    model_id=os.getenv("LLAMACPP_MODEL_ID", "default"),
    params={"temperature": 0.2},
)

agent = Agent(
    model=model,
    system_prompt="You are a concise assistant.",
)

# Calling the agent runs one turn; the reply text is read from the first content block.
result = agent("What is 17 + 28? Reply with just the number.")
print(result.message["content"][0]["text"])
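The last line above reads the reply from the first content block only. If a reply can span several blocks, or include blocks without a `text` field, a small helper can collect all the text instead. The sketch below assumes only what the example shows, namely that `result.message` is a dict whose `"content"` is a list of dict blocks. The helper name `extract_text` and the hand-built sample message are illustrative, not part of elsai.

```python
from collections.abc import Mapping
from typing import Any


def extract_text(message: Mapping[str, Any]) -> str:
    """Concatenate the "text" field of every content block that has one."""
    blocks = message.get("content") or []
    return "".join(
        block["text"]
        for block in blocks
        if isinstance(block, Mapping) and "text" in block
    )


if __name__ == "__main__":
    # Hand-built stand-in for result.message (assumed shape, not real output).
    sample = {"content": [{"text": "45"}]}
    print(extract_text(sample))
```

With a real agent this would be called as `extract_text(result.message)`. Blocks that carry no `text` key are skipped rather than raising a `KeyError`.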