Elsai VectorDB#
The Elsai VectorDB package provides interfaces to work with vector databases like ChromaDB and Pinecone, enabling efficient storage and retrieval of document embeddings.
Prerequisites#
Python >= 3.9
.env file with appropriate API keys and configuration variables
Installation#
To install the elsai-vectordb package:
pip install --index-url https://elsai-core-package.optisolbusiness.com/root/elsai-vectordb/ elsai-vectordb==0.1.0
Components#
1. ChromaVectorDb#
ChromaVectorDb is a wrapper around ChromaDB to manage local document embeddings with persistent storage.
from elsai_vectordb.chromadb import ChromaVectorDb
chroma_client = ChromaVectorDb(persist_directory="your_persist_directory") # Or set in environment variable CHROMA_PERSIST_DIRECTORY
chroma_client.create_if_not_exists(collection_name="your_collection_name")
document = {
"id": "001",
"embeddings": [0.1, 0.2, 0.7], # Example embedding vector
"page_content": "This is a sample document.",
"metadatas": {"source": "example_source", "file_id": "doc1"}
}
chroma_client.add_document(document=document, collection_name="your_collection_name")
documents = chroma_client.retrieve_document(
collection_name="your_collection_name",
embeddings=[0.1, 0.2, 0.7],
files_id=["doc1"],
k=5
)
collection = chroma_client.get_collection(collection_name="your_collection_name")
chunks = chroma_client.fetch_chunks(collection_name="your_collection_name", files_id=["doc1"])
chroma_client.delete_collection(collection_name="your_collection_name")
Required Environment Variables:
CHROMA_PERSIST_DIRECTORY
– Path to the directory where ChromaDB will persist data locally
2. PineconeVectorDb#
PineconeVectorDb integrates with Pinecone to manage vector search using cloud-hosted infrastructure.
from elsai_vectordb.pinecone import PineconeVectorDb
pinecone_client = PineconeVectorDb(
index_name="testingindex",
pinecone_api_key="pinecone_api_key", # Or set in environment variable PINECONE_API_KEY
dimension=1536 # Example dimension size
)
pinecone_client.add_document(
document={
"id": "001",
"embeddings": [0.1, 0.2, 0.7], # Replace with a 1536-dimension vector
"page_content": "This is a sample document.",
"metadatas": {"source": "example_source", "file_id": "doc1"}
},
namespace="namespacename"
)
results = pinecone_client.retrieve_document(
namespace="namespacename",
question_embedding=[0.1, 0.2, 0.7],
files_id=["doc1"],
k=5
)
Required Environment Variables:
PINECONE_API_KEY
– API key to authenticate with Pinecone vector DB