elsai Retrievers

Package: elsai-retrievers v0.1.0

Hybrid document retrieval that combines dense (semantic) and sparse (BM25) search, which typically returns more accurate results than either approach alone.

Installation

```bash
pip install --extra-index-url https://core-packages.elsai.ai/root/elsai-retrievers/ elsai-retrievers==0.1.0
```

Requirements: Python >= 3.9

HybridRetriever

Merges results from multiple retrievers into a single ranked list using Reciprocal Rank Fusion (RRF). Supports combining semantic (dense vector) retrievers with BM25 sparse search over raw document chunks.

```python
from elsai_retrievers.hybrid_retriever import HybridRetriever
from elsai_vectordb.chromadb import ChromaVectorDb
from elsai_embeddings.azure_embeddings import AzureOpenAIEmbeddingModel

# Set up the vector store retriever (dense / semantic)
embedding_model = AzureOpenAIEmbeddingModel(...)
chroma = ChromaVectorDb(persist_directory="./db")
dense_retriever = chroma.as_retriever(
    collection_name="my_docs",
    embedding_model=embedding_model,
)

# Initialise (no constructor arguments)
hybrid = HybridRetriever()

# Retrieve using semantic retrievers only
results = hybrid.hybrid_retrieve(
    retrievers=[dense_retriever],
    question="What are the key findings?",
)

for doc in results:
    print(doc.page_content)
    print(doc.metadata)
```

Constructor: No parameters — HybridRetriever().

hybrid_retrieve() parameters:

| Parameter | Description |
|---|---|
| `question` | The search query string |
| `retrievers` | List of semantic retrievers: any object exposing a `.retrieve()` method (e.g. from `as_retriever()`) |
| `chunks` | Optional list of raw document chunks for BM25 sparse search. Pass this to enable keyword-based matching alongside semantic search |
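The table only requires that a retriever expose a `.retrieve()` method. The sketch below shows that duck-typed shape for a custom source. The argument name, argument type and return type of `.retrieve()` are assumptions here (a query string in, a list of objects with `page_content` and `metadata` out, mirroring how results are read in the first example), and `Doc`, `Retriever` and `KeywordRetriever` are hypothetical names, not part of elsai-retrievers. Check the library source before passing a custom object to `hybrid_retrieve()`.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class Doc:
    # Hypothetical document type exposing the two attributes the examples read
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class Retriever(Protocol):
    # Assumed shape (not confirmed by the docs): retrieve(question) -> ranked list of docs
    def retrieve(self, question: str) -> list[Doc]:
        ...


class KeywordRetriever:
    """Hypothetical toy retriever: returns docs containing every query term as a substring."""

    def __init__(self, docs: list[Doc], top_k: int = 5) -> None:
        self.docs = docs
        self.top_k = top_k

    def retrieve(self, question: str) -> list[Doc]:
        terms = question.lower().split()
        hits = [d for d in self.docs if all(t in d.page_content.lower() for t in terms)]
        return hits[: self.top_k]


docs = [
    Doc("Refunds for digital purchases are issued within 14 days.", {"id": "r1"}),
    Doc("Shipping takes 3-5 business days.", {"id": "s1"}),
]
retriever: Retriever = KeywordRetriever(docs)
for doc in retriever.retrieve("refunds digital"):
    print(doc.metadata["id"], doc.page_content)
```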

Combining dense and sparse (BM25)

Pass `chunks` to enable BM25 keyword search alongside the semantic retrievers. The BM25 and semantic result lists are merged with RRF before being returned.

```python
from elsai_retrievers.hybrid_retriever import HybridRetriever

# dense_retriever: the vector-store retriever created in the previous example
# chunks: list of documents/text chunks loaded from your corpus (searched with BM25)
hybrid = HybridRetriever()

results = hybrid.hybrid_retrieve(
    chunks=chunks,
    retrievers=[dense_retriever],
    question="What does the refund policy say?",
)
```
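For intuition about what the `chunks` side contributes, here is a minimal pure-Python Okapi BM25 scorer over plain-string chunks. It is an illustrative sketch, not elsai's implementation: the regex tokenizer (no stemming or stop words) and the typical parameter values `k1=1.5`, `b=0.75` are assumptions, and the library may tokenize or tune BM25 differently.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric tokens, no stemming or stop-word removal
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(
    chunks: list[str], query: str, k1: float = 1.5, b: float = 0.75
) -> list[tuple[int, float]]:
    """Okapi BM25: return (chunk_index, score) for chunks scoring above zero, best first."""
    docs = [tokenize(c) for c in chunks]
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs or 1.0  # guard against all-empty chunks
    # Document frequency: number of chunks containing each term at least once
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))  # each distinct query term counted once
    ranked = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (as used by Lucene): rarer terms weigh more
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalisation (b)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            ranked.append((i, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


chunks = [
    "Refunds for digital purchases are available within 14 days.",
    "Order SKU-4471 ships from the Berlin warehouse.",
    "Our refund policy excludes gift cards.",
]
print(bm25_rank(chunks, "SKU-4471"))       # only index 1 contains the exact ID tokens
print(bm25_rank(chunks, "refund policy"))  # only index 2: without stemming, "refunds" != "refund"
```

Exact-token matching is what lets BM25 surface identifiers such as `SKU-4471` that an embedding model may not place close to the query.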

How hybrid retrieval works

Dense search finds semantically similar documents even when exact keywords don't match. Sparse search ensures keyword-critical documents (product names, IDs, codes) are not missed. RRF merges the ranked lists using only each document's rank position in each list, so the incompatible score scales of vector similarity and BM25 never need to be normalised.
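A minimal sketch of Reciprocal Rank Fusion, assuming each retriever's output has already been reduced to an ordered list of document IDs: every document scores `1 / (k + rank)` in each list it appears in, and the scores are summed. The constant `k = 60` is the value proposed in the original RRF paper and is an assumption here; `HybridRetriever` may use a different value, a different way of identifying the same document across lists, or different tie-breaking.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Only ranks are used, so raw similarity and BM25 scores never need rescaling
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


dense = ["doc_a", "doc_b", "doc_c"]   # e.g. order from a vector-store retriever
sparse = ["doc_c", "doc_a", "doc_d"]  # e.g. order from BM25 over chunks
for doc_id, score in reciprocal_rank_fusion([dense, sparse]):
    print(f"{doc_id}: {score:.5f}")
```

Here `doc_a` and `doc_c`, which both lists returned, finish ahead of `doc_b` and `doc_d`, which only one list found. The same function accepts any number of lists, which is how results from several retrievers can be merged.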

Multiple semantic retrievers

You can pass retrievers backed by different vector stores; RRF merges the results from all of them:

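```python
from elsai_retrievers.hybrid_retriever import HybridRetriever

# chroma_retriever and pinecone_retriever: retrievers built on two different
# vector stores (e.g. via each store's as_retriever())
hybrid = HybridRetriever()

results = hybrid.hybrid_retrieve(
    retrievers=[chroma_retriever, pinecone_retriever],
    question="Summarise the Q3 financial results.",
)
```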

Integration with elsai Agents

```python
from elsai_retrievers.hybrid_retriever import HybridRetriever
from elsai import Agent, tool  # assumption: Agent is exported by elsai alongside tool

# dense_retriever: the vector-store retriever created in the first example
hybrid = HybridRetriever()

@tool
def search_docs(query: str) -> str:
    """Search the knowledge base for relevant documents."""
    docs = hybrid.hybrid_retrieve(
        retrievers=[dense_retriever],
        question=query,
    )
    # Join the retrieved chunk texts into one string for the agent to read
    return "\n\n".join(d.page_content for d in docs)

agent = Agent(tools=[search_docs])
agent("What does our refund policy say about digital purchases?")
```
