Elsai Retrievers

Package: elsai-retrievers v0.1.0

Hybrid document retrieval that combines dense (semantic) and sparse (BM25) search, which typically gives more accurate results than either approach alone.

Installation

bash
pip install --extra-index-url https://core-packages.elsai.ai/root/elsai-retrievers/ elsai-retrievers==0.1.0

Requirements: Python >= 3.9


HybridRetriever

Merges results from multiple retrievers into a single ranked list using Reciprocal Rank Fusion (RRF). Supports combining semantic (dense vector) retrievers with BM25 sparse search over raw document chunks.

python
from elsai_retrievers.hybrid_retriever import HybridRetriever
from elsai_vectordb.chromadb import ChromaVectorDb
from elsai_embeddings.azure_embeddings import AzureOpenAIEmbeddingModel

# Set up the vector store retriever (dense / semantic)
embedding_model = AzureOpenAIEmbeddingModel(...)
chroma = ChromaVectorDb(persist_directory="./db")
dense_retriever = chroma.as_retriever(
    collection_name="my_docs",
    embedding_model=embedding_model,
)

# Initialise — no constructor arguments
hybrid = HybridRetriever()

# Retrieve using semantic retrievers only
results = hybrid.hybrid_retrieve(
    retrievers=[dense_retriever],
    question="What are the key findings?",
)

for doc in results:
    print(doc.page_content)
    print(doc.metadata)

Constructor: No parameters — HybridRetriever().

hybrid_retrieve() parameters:

Parameter     Description
question      The search query string
retrievers    List of semantic retrievers: any object exposing a .retrieve() method (e.g. from as_retriever())
chunks        Optional list of raw document chunks for BM25 sparse search. Pass this to enable keyword-based matching alongside semantic search
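
Because retrievers only needs objects that expose a .retrieve() method, a custom retriever can in principle be passed alongside the built-in ones. The sketch below shows one possible shape. The exact signature HybridRetriever expects is not documented here, so it assumes .retrieve() takes the question string and returns a list of objects with page_content and metadata attributes. Document and KeywordRetriever are hypothetical names used for illustration only, not part of elsai-retrievers.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for the document objects a retriever returns."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever (assumption, not an Elsai class): ranks documents by
    how many distinct query words they contain."""

    def __init__(self, documents, top_k=3):
        self.documents = documents
        self.top_k = top_k

    def retrieve(self, question):
        # Assumed signature: one query string in, ranked documents out.
        words = set(question.lower().split())
        scored = []
        for doc in self.documents:
            overlap = len(words & set(doc.page_content.lower().split()))
            if overlap:
                scored.append((overlap, doc))
        # Highest overlap first; documents with no shared words are dropped.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.top_k]]


docs = [
    Document("Refunds are processed within 14 days", {"source": "policy.md"}),
    Document("Our office is closed on public holidays", {"source": "hr.md"}),
]
retriever = KeywordRetriever(docs)
hits = retriever.retrieve("how are refunds processed")
for doc in hits:
    print(doc.page_content, doc.metadata)
```

If the assumed interface matches, such an object could be placed in the retrievers list next to dense_retriever. Check the return type against the package before relying on it.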

Combining dense and sparse (BM25)

Pass chunks to activate BM25 alongside the semantic retrievers. Both result sets are merged with RRF before being returned.

python
from elsai_retrievers.hybrid_retriever import HybridRetriever

# chunks = list of documents/text chunks loaded from your corpus
hybrid = HybridRetriever()

results = hybrid.hybrid_retrieve(
    chunks=chunks,
    retrievers=[dense_retriever],
    question="What does the refund policy say?",
)
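
For intuition about what the sparse side contributes, here is a minimal BM25 scoring sketch in plain Python. It is not the package's implementation: the whitespace tokenisation, the k1 and b values, and the bm25_scores helper name are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Score each chunk against the query with the standard BM25 formula.
    k1 and b are commonly used defaults, assumed here."""
    tokenized = [chunk.lower().split() for chunk in chunks]
    n = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n
    # Document frequency: in how many chunks does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation penalises long chunks.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


chunks = [
    "Refunds are issued within 14 days of purchase",
    "Shipping takes three to five business days",
    "Order SKU-4471 is eligible for a refund",
]
scores = bm25_scores("SKU-4471 refund", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(chunks[best])
```

Only the third chunk contains the exact tokens "sku-4471" and "refund", so it is the only one with a non-zero score. The first chunk scores zero because "refunds" is a different token from "refund", which is the kind of literal matching BM25 provides and dense search does not.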

How hybrid retrieval works

Dense search finds semantically similar documents even when exact keywords don't match. Sparse search ensures keyword-critical documents (product names, IDs, codes) are not missed. RRF combines both ranked lists without requiring score normalisation, because it uses only each document's rank position in a list, not its raw score.
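
The fusion step can be sketched in a few lines. This is a generic illustration of Reciprocal Rank Fusion, not the code inside HybridRetriever: the reciprocal_rank_fusion helper is hypothetical, and the constant k=60 is a widely used default that the package may or may not share.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one.
    Each document earns 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_ranking = ["doc_a", "doc_b", "doc_c"]   # e.g. from a vector store
sparse_ranking = ["doc_c", "doc_a", "doc_d"]  # e.g. from BM25
fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

doc_a and doc_c appear in both lists, so they outrank doc_b and doc_d, which each appear in only one. No similarity or BM25 score enters the calculation, which is why the two retrievers' score scales never need to be reconciled.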


Multiple semantic retrievers

You can pass retrievers from different vector stores — results from all of them are merged through RRF:

python
from elsai_retrievers.hybrid_retriever import HybridRetriever

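# chroma_retriever and pinecone_retriever are previously built retrievers,
# each exposing a .retrieve() method
hybrid = HybridRetriever()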

results = hybrid.hybrid_retrieve(
    retrievers=[chroma_retriever, pinecone_retriever],
    question="Summarise the Q3 financial results.",
)

Integration with Elsai Agents

python
from elsai_retrievers.hybrid_retriever import HybridRetriever
from strands import Agent, tool

hybrid = HybridRetriever()

@tool
def search_docs(query: str) -> str:
    """Search the knowledge base for relevant documents."""
    docs = hybrid.hybrid_retrieve(
        retrievers=[dense_retriever],
        question=query,
    )
    return "\n\n".join(d.page_content for d in docs)

agent = Agent(tools=[search_docs])
agent("What does our refund policy say about digital purchases?")

Copyright © 2026 Elsai Foundry.