Elsai Retrievers
Package: elsai-retrievers v0.1.0
Hybrid document retrieval that combines dense (semantic) and sparse (BM25) search for more accurate results than either approach alone.
Installation
```bash
pip install --extra-index-url https://core-packages.elsai.ai/root/elsai-retrievers/ elsai-retrievers==0.1.0
```
Requirements: Python >= 3.9
HybridRetriever
Merges results from multiple retrievers into a single ranked list using Reciprocal Rank Fusion (RRF). Supports combining semantic (dense vector) retrievers with BM25 sparse search over raw document chunks.
```python
from elsai_retrievers.hybrid_retriever import HybridRetriever
from elsai_vectordb.chromadb import ChromaVectorDb
from elsai_embeddings.azure_embeddings import AzureOpenAIEmbeddingModel

# Set up the vector store retriever (dense / semantic)
embedding_model = AzureOpenAIEmbeddingModel(...)
chroma = ChromaVectorDb(persist_directory="./db")
dense_retriever = chroma.as_retriever(
    collection_name="my_docs",
    embedding_model=embedding_model,
)

# Initialise (no constructor arguments)
hybrid = HybridRetriever()

# Retrieve using semantic retrievers only
results = hybrid.hybrid_retrieve(
    retrievers=[dense_retriever],
    question="What are the key findings?",
)

for doc in results:
    print(doc.page_content)
    print(doc.metadata)
```
Constructor: no parameters; call HybridRetriever().
hybrid_retrieve() parameters:
| Parameter | Description |
|---|---|
| question | The search query string |
| retrievers | List of semantic retrievers: any object exposing a .retrieve() method (e.g. from as_retriever()) |
| chunks | Optional list of raw document chunks for BM25 sparse search. Pass this to enable keyword-based matching alongside semantic search |
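To make the sparse side more concrete, the sketch below scores a few chunks with the standard Okapi BM25 formula in plain Python. It is an illustration only, not the library's implementation: the function name `bm25_rank`, the regex tokeniser, the sample chunks, and the parameter values `k1=1.5` and `b=0.75` are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenise(text: str) -> list[str]:
    # Assumed tokeniser: lowercase words, keeping hyphens so IDs stay whole.
    return re.findall(r"[\w-]+", text.lower())


def bm25_rank(query: str, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return chunk indices ordered from best to worst Okapi BM25 score."""
    if not chunks:
        return []
    tokenised = [tokenise(chunk) for chunk in chunks]
    n_chunks = len(tokenised)
    avg_len = sum(len(tokens) for tokens in tokenised) / n_chunks

    # Document frequency: number of chunks that contain each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))

    scores = []
    for tokens in tokenised:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenise(query):
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_chunks - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)

    return sorted(range(n_chunks), key=lambda i: scores[i], reverse=True)


# Hypothetical chunks; the exact ID only matches by keyword.
chunks = [
    "Refunds are issued within 14 days of purchase.",
    "Order SKU-4471 ships from the Rotterdam warehouse.",
    "Contact support for help with your account.",
]
ranking = bm25_rank("SKU-4471 warehouse", chunks)
print(ranking[0])
```

Only the chunk containing the literal ID and the word "warehouse" receives a non-zero score, which is the kind of exact-term match that a purely semantic retriever can miss.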
Combining dense and sparse (BM25)
Pass chunks to activate BM25 alongside the semantic retrievers. Both result sets are merged with RRF before being returned.
```python
from elsai_retrievers.hybrid_retriever import HybridRetriever

# chunks = list of documents/text chunks loaded from your corpus
hybrid = HybridRetriever()
results = hybrid.hybrid_retrieve(
    chunks=chunks,
    retrievers=[dense_retriever],
    question="What does the refund policy say?",
)
```
How hybrid retrieval works
Dense search finds semantically similar documents even when exact keywords don't match. Sparse search ensures keyword-critical documents (product names, IDs, codes) are not missed. RRF combines both ranked lists without requiring score normalisation.
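The rank-only fusion described above can be sketched in a few lines. This is the generic Reciprocal Rank Fusion formula, where each document scores the sum of 1 / (k + rank) over the lists it appears in. It is not taken from the elsai-retrievers source: the function name, the document ids, and the constant `k = 60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Sequence


def reciprocal_rank_fusion(
    ranked_lists: Iterable[Sequence[Hashable]],
    k: int = 60,
) -> list[tuple[Hashable, float]]:
    """Fuse several ranked lists of document ids into one ranking.

    Only positions are used, never raw scores, so dense similarities and
    BM25 scores do not have to be put on a common scale first.
    """
    scores: dict[Hashable, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense_ranking = ["doc_a", "doc_b", "doc_c"]   # hypothetical semantic results
sparse_ranking = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 results

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

Documents that appear near the top of both lists (here `doc_a` and `doc_c`) end up ahead of documents that appear in only one, regardless of how the original scores were scaled.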
Multiple semantic retrievers
You can pass retrievers from different vector stores — results from all of them are merged through RRF:
```python
from elsai_retrievers.hybrid_retriever import HybridRetriever

hybrid = HybridRetriever()
results = hybrid.hybrid_retrieve(
    retrievers=[chroma_retriever, pinecone_retriever],
    question="Summarise the Q3 financial results.",
)
```
Integration with Elsai Agents
```python
from elsai_retrievers.hybrid_retriever import HybridRetriever
from strands import Agent, tool

hybrid = HybridRetriever()

# dense_retriever is assumed to be built as in the HybridRetriever example


@tool
def search_docs(query: str) -> str:
    """Search the knowledge base for relevant documents."""
    docs = hybrid.hybrid_retrieve(
        retrievers=[dense_retriever],
        question=query,
    )
    return "\n\n".join(d.page_content for d in docs)


agent = Agent(tools=[search_docs])
agent("What does our refund policy say about digital purchases?")
```