Elsai Parsers#
The Elsai Parsers module parses and queries Excel files using LLMs. It supports natural language queries and structured responses, and handles both small and large files.
Prerequisites#
Python >= 3.9
.env file with appropriate API keys and configuration variables
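A quick preflight check can catch a missing prerequisite before any parser is constructed. The sketch below uses only the standard library and assumes the values from the .env file have already been loaded into the process environment. The variable names `AZURE_OPENAI_API_KEY` and `AZURE_OPENAI_ENDPOINT` are placeholders chosen for illustration, not names confirmed by this documentation; substitute whatever your connectors actually read.

```python
import os
import sys

MIN_PYTHON = (3, 9)  # documented minimum version


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter meets the documented minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_env_vars(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Placeholder names: replace with the variables your .env file defines.
required = ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"]
print("Python version supported:", python_is_supported())
print("Missing environment variables:", missing_env_vars(required))
```

Reporting the missing names, rather than failing on the first one, makes it easier to fix an incomplete .env file in one pass.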
Installation#
To install the elsai-parsers package:
pip install --extra-index-url https://elsai-core-package.optisolbusiness.com/root/elsai-parsers/ elsai-parsers==0.1.0
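To confirm that the install succeeded, the installed version can be read back with the standard library's `importlib.metadata`. This is a minimal sketch; the helper name `installed_version` is illustrative and not part of elsai-parsers.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("elsai-parsers")
if version is None:
    print("elsai-parsers is not installed in this environment")
elif version != "0.1.0":
    print(f"elsai-parsers {version} is installed; this page targets 0.1.0")
else:
    print("elsai-parsers 0.1.0 is installed")
```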
Components#
1. Excel Parser#
ExcelParser is a class that enables natural language querying of Excel files using LLMs. It works with any model supported by elsai-model, together with an embedding model and a vector database, all of which are passed to the parser through a config dictionary.
from elsai_embeddings.azure_embeddings import AzureOpenAIEmbeddingModel
from elsai_model.azure_openai import AzureOpenAIConnector
from elsai_parsers.excel_parser import ExcelParser
from elsai_vectordb.chromadb import ChromaVectorDb

print("=== Initializing Components ===")
llm = AzureOpenAIConnector()
embedding_model = AzureOpenAIEmbeddingModel()
chroma_client = ChromaVectorDb(persist_directory="dir")

# Create config dictionary
config = {
    "vector_database": chroma_client,
    "embedding_function": embedding_model,
    "llm": llm,
    "vector_store_type": "chroma",
}

print("=== Initializing ExcelParser ===")
# Initialize parser with file_path (required); point it at your own workbook
parser = ExcelParser(
    file_path="/home/laptop-obs-318/Documents/Azure-devops/elsai-core/elsai-parsers/elsai_parsers/sample2.xlsx",
    config=config,
)

print("=== Testing ExcelParser.parse() ===")

# Test 1: Simple query without JSON template
print("\n--- Test 1: Simple Query (no JSON template) ---")
user_prompt = "What are the names of the creators?"
result = parser.parse(user_prompt=user_prompt)
print(f"Query: {user_prompt}")
print(f"Result: {result}")

# Test 2: Query with JSON template describing the desired response shape
print("\n--- Test 2: Query with JSON Template ---")
json_template = """{
    "names": ["string"],
    "summary": "string"
}"""
user_prompt_json = "What are the names of the creators? Provide a summary."
result_json = parser.parse(user_prompt=user_prompt_json, json_template=json_template)
print(f"Query: {user_prompt_json}")
print(f"JSON Template: {json_template}")
print(f"Result: {result_json}")

print("\n=== All Tests Completed ===")
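This page does not state what type `parse()` returns when a JSON template is supplied. If it turns out to be text, a small helper like the one below could turn it into a dictionary and report which template keys are absent. This is a sketch under that assumption: `coerce_to_template` is a hypothetical helper, not part of elsai-parsers, and the handling of a Markdown code fence around the JSON is a guess at common LLM output rather than documented behaviour.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that LLM output is sometimes wrapped in


def coerce_to_template(result, json_template):
    """Return (result_as_dict, missing_keys) for a parse() result.

    Assumption: `result` is either a dict already or a string holding a
    JSON object, optionally wrapped in a Markdown code fence.
    """
    expected_keys = set(json.loads(json_template))
    if isinstance(result, str):
        text = result.strip()
        if text.startswith(FENCE):
            # Drop the opening fence line, then everything from the closing fence on.
            text = text.split("\n", 1)[1] if "\n" in text else ""
            text = text.rsplit(FENCE, 1)[0]
        result = json.loads(text)
    if not isinstance(result, dict):
        raise TypeError(f"expected a JSON object, got {type(result).__name__}")
    missing = sorted(expected_keys - set(result))
    return result, missing


template = '{"names": ["string"], "summary": "string"}'
data, missing = coerce_to_template('{"names": ["Ada"]}', template)
print(data, missing)
```

Returning the missing keys instead of raising leaves it to the caller to decide whether a partial answer is acceptable or the query should be retried.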
Key Features:
Natural language querying of Excel files
Support for structured JSON responses via templates
Works with any LLM supported by elsai-model
Handles both small and large Excel files efficiently
Integrates with vector databases for enhanced querying capabilities
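Since a single parser instance can answer several natural language questions about the same workbook, it can be convenient to batch them. The sketch below relies only on the `parse(user_prompt=..., json_template=...)` call shown in the example above; `run_queries` and the stand-in `EchoParser` class are hypothetical names introduced here so the snippet runs without API keys. The exception types ExcelParser may raise are not documented on this page, so the broad `except` is an assumption.

```python
def run_queries(parser, prompts, json_template=None):
    """Run each prompt through parser.parse() and collect outcomes by prompt.

    Failures are recorded rather than raised, so one bad query does not
    abort the rest of the batch.
    """
    results = {}
    for prompt in prompts:
        kwargs = {"user_prompt": prompt}
        if json_template is not None:
            kwargs["json_template"] = json_template
        try:
            results[prompt] = {"ok": True, "value": parser.parse(**kwargs)}
        except Exception as exc:  # broad on purpose: error types are not documented
            results[prompt] = {"ok": False, "error": str(exc)}
    return results


class EchoParser:
    """Stand-in for ExcelParser, used only to demonstrate run_queries."""

    def parse(self, user_prompt, json_template=None):
        if not user_prompt:
            raise ValueError("empty prompt")
        return f"answer to: {user_prompt}"


outcomes = run_queries(EchoParser(), ["What are the names of the creators?", ""])
for prompt, outcome in outcomes.items():
    print(repr(prompt), outcome)
```

With a real parser, pass the `ExcelParser` instance in place of `EchoParser()`.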