Elsai Parsers

Elsai Parsers is a Python package for parsing and querying Excel files with LLMs. It accepts natural-language queries, can return structured responses, and handles both small and large files.

Prerequisites

  • Python >= 3.9

  • .env file with appropriate API keys and configuration variables
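
A quick pre-flight check can catch a missing variable before any connector is constructed. The sketch below uses only the standard library and assumes the .env file has already been loaded into the process environment. The names in REQUIRED_VARS are placeholders chosen for illustration, not taken from the elsai documentation, so substitute whatever your connectors actually read.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical names for illustration only; replace them with the
# variables your chosen connectors actually require.
REQUIRED_VARS = ("AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT")


def missing_env_vars(
    required: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing configuration:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

Passing the environment as an argument keeps the check easy to test with a plain dictionary instead of the real process environment.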

Installation

To install the elsai-parsers package:

pip install --extra-index-url https://elsai-core-package.optisolbusiness.com/root/elsai-parsers/ elsai-parsers==0.1.0
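
To confirm that the install succeeded, the installed version can be read back with the standard library. This is a minimal sketch; the helper name installed_version is hypothetical, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of `distribution`, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("elsai-parsers")
    print(found if found is not None else "elsai-parsers is not installed")
```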

Components

1. Excel Parser

ExcelParser is a class that enables natural language querying of Excel files using LLMs. It works with any model supported by elsai-model, along with embedding models and vector databases.

from elsai_embeddings.azure_embeddings import AzureOpenAIEmbeddingModel
from elsai_model.azure_openai import AzureOpenAIConnector
from elsai_parsers.excel_parser import ExcelParser
from elsai_vectordb.chromadb import ChromaVectorDb

print("=== Initializing Components ===")
# LLM connector
llm = AzureOpenAIConnector()

# Embedding model
embedding_model = AzureOpenAIEmbeddingModel()

# Chroma vector database, persisted in the relative directory "dir"
chroma_client = ChromaVectorDb(persist_directory="dir")

# Create config dictionary
config = {
    "vector_database": chroma_client,
    "embedding_function": embedding_model,
    "llm": llm,
    "vector_store_type": "chroma"
}

print("=== Initializing ExcelParser ===")
# Initialize parser with file_path (required).
# The path below is machine-specific; point it at your own Excel file.
parser = ExcelParser(
    file_path="/home/laptop-obs-318/Documents/Azure-devops/elsai-core/elsai-parsers/elsai_parsers/sample2.xlsx",
    config=config
)

print("=== Testing ExcelParser.parse() ===")
# Test 1: Simple query without JSON template
print("\n--- Test 1: Simple Query (no JSON template) ---")
user_prompt = "What are the names of the creators?"
result = parser.parse(user_prompt=user_prompt)
print(f"Query: {user_prompt}")
print(f"Result: {result}")

# Test 2: Query with JSON template
print("\n--- Test 2: Query with JSON Template ---")
json_template = """{
    "names": ["string"],
    "summary": "string"
}"""
user_prompt_json = "What are the names of the creators? Provide a summary."
result_json = parser.parse(user_prompt=user_prompt_json, json_template=json_template)
print(f"Query: {user_prompt_json}")
print(f"JSON Template: {json_template}")
print(f"Result: {result_json}")

print("\n=== All Tests Completed ===")
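
The return type of parse() is not specified here, so downstream code may want to normalize the result before reading fields from it. The sketch below is a hypothetical helper, not part of elsai-parsers. It assumes the result is either a dict or a JSON string, possibly wrapped in a Markdown code fence as LLM output often is, and it reports any top-level template keys that are absent from the response.

```python
import json
from typing import Any, Dict, List, Union

# Three backticks, built programmatically so this snippet stays fence-safe.
FENCE = "`" * 3


def to_dict(result: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Coerce a dict or JSON string (optionally fenced) into a dict."""
    if isinstance(result, dict):
        return result
    text = result.strip()
    lines = text.splitlines()
    # Drop an opening fence line (with optional language tag) and the closing fence.
    if len(lines) >= 3 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        text = "\n".join(lines[1:-1])
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


def missing_keys(template: str, data: Dict[str, Any]) -> List[str]:
    """List top-level keys of the JSON template that `data` lacks."""
    expected = json.loads(template)
    return [key for key in expected if key not in data]
```

With the names used in the example above, this could be called as missing_keys(json_template, to_dict(result_json)); an empty list means every top-level key in the template came back.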

Key Features:

  • Natural language querying of Excel files

  • Support for structured JSON responses via templates

  • Works with any LLM supported by elsai-model

  • Handles both small and large Excel files efficiently

  • Integrates with vector databases for enhanced querying capabilities