Elsai Utilities
The Elsai Utilities package provides helper classes for chunking and converting documents in retrieval-augmented generation (RAG) and vector database ingestion pipelines, as well as LLM-based analysis of conversations.
Prerequisites
Python >= 3.9
Installation
To install the elsai-utilities package:
```bash
pip install --extra-index-url https://elsai-core-package.optisolbusiness.com/root/elsai-utilities/ elsai-utilities==0.2.0
```
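To check which version of the package was actually installed, you can read its distribution metadata with the standard library's `importlib.metadata`. The helper name `installed_version` is only for illustration; the distribution name `elsai-utilities` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if elsai-utilities is not installed
print(installed_version("elsai-utilities"))
```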
DocumentChunker
The DocumentChunker class splits text into chunks in three ways: page by page, by Markdown headers, and recursively by character count.
```python
from elsai_utilities.splitters import DocumentChunker

chunker = DocumentChunker()

contents = "# This is the first page.\n\n## This is the second page.\n\n### This is the third page."

# Page-wise chunking
chunks = chunker.chunk_page_wise(contents=contents, file_name="example.md")

# Markdown header-wise chunking
# Each tuple pairs a Markdown header prefix with a label for that header level
markdown_wise_chunks = chunker.chunk_markdown_header_wise(
    text=contents,
    file_name="example.md",
    headers_to_split_on=[("#", "Header 1"), ("##", "Header 2")],
    strip_headers=True,
)

# Recursive character-wise chunking (chunk_size and chunk_overlap are character counts)
text = "This is a long piece of text that should be chunked recursively..."
recursive_chunks = chunker.chunk_recursive(
    contents=text,
    file_name="example.md",
    chunk_size=50,
    chunk_overlap=10,
)
```
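`chunk_size` and `chunk_overlap` control how long each chunk is and how much text consecutive chunks share. The sketch below illustrates those two parameters with a plain fixed-size character window; it is not the library's implementation. A recursive splitter, as the name `chunk_recursive` suggests, would additionally prefer to break at paragraph, line, and word boundaries, so its chunks can be shorter than `chunk_size`. The function name `window_chunks` is hypothetical.

```python
from typing import List


def window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of at most chunk_size characters; each window
    starts chunk_size - chunk_overlap characters after the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one ("d", then "g"), which is what an overlap of 1 means. Overlap helps keep a sentence that straddles a boundary retrievable from at least one chunk.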
DocumentConverter
The DocumentConverter class converts documents between formats: LlamaIndex document dictionaries into LangChain documents, and CSV and Excel files into Markdown tables.
```python
from elsai_utilities.converters import DocumentConverter

converter = DocumentConverter()

# LlamaIndex to LangChain conversion
llama_index_document = {
    "text_resource": {
        "text": "This is a sample text extracted from LlamaIndex."
    }
}
langchain_document = converter.llama_index_to_langchain_document(
    llama_index_document=llama_index_document,
    file_name="example.md",
)

# CSV to Markdown conversion
markdown_from_csv = converter.convert_csv_to_markdown(csv_path="data.csv")

# Excel to Markdown conversion
markdown_from_excel = converter.convert_excel_to_markdown(excel_path="data.xlsx")
```
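For intuition about the output format, here is a minimal standard-library sketch of CSV-to-Markdown conversion. It assumes the first CSV row is the header and emits the same `| --- |` separator style used in the TableChunker example later on this page; `convert_csv_to_markdown` may differ in details such as quoting, empty cells, or column alignment. The helper name `csv_to_markdown_table` is hypothetical.

```python
import csv
import io
from typing import List


def csv_to_markdown_table(csv_text: str) -> str:
    """Render CSV text as a Markdown table: header row, separator row, data rows."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]  # skip blank lines
    if not rows:
        return ""

    def format_row(cells: List[str]) -> str:
        # Escape literal pipes so cell values cannot break the column layout
        return "| " + " | ".join(cell.replace("|", "\\|") for cell in cells) + " |"

    header, body = rows[0], rows[1:]
    lines = [format_row(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(format_row(row) for row in body)
    return "\n".join(lines)


print(csv_to_markdown_table("Employee,Dept,Salary\nAlice,Eng,90000\nBob,Sales,80000\n"))
```

To convert a file on disk, read it with `open(path, newline="")` and pass its contents to the function.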
ConversationalIntelligence
The ConversationalIntelligence class uses an LLM to analyze conversations: it generates follow-up questions, detects action items, classifies topics and intents, and summarizes the conversation.
```python
from elsai_utilities.conversational_intelligence import ConversationalIntelligence

# Initialize with an LLM instance you have already created (e.g., a ChatOpenAI
# or Claude chat model); `your_llm_instance` is a placeholder for that object.
ci = ConversationalIntelligence(llm=your_llm_instance)

# Generate follow-up questions
followup_questions = ci.generate_followup_questions(
    user_question="What is machine learning?",
    answer="Machine learning is a subset of AI that enables computers to learn from data.",
    context=["Previous discussion about AI", "User is a beginner"],
    num_questions=3,
)

# Safe version with fallback questions
safe_questions = ci.generate_followup_questions_safe(
    user_question="What is machine learning?",
    answer="Machine learning is a subset of AI...",
    num_questions=3,
    fallback_questions=["Can you tell me more?", "What are some examples?"],
)

# Detect action items from a conversation (one string per message)
messages = [
    "John, can you prepare the quarterly report by Friday?",
    "Sure, I'll have it ready. Should I include the budget analysis?",
    "Yes, and make sure to highlight the key metrics.",
]
action_items = ci.detect_action_items(
    messages=messages,
    include_context=True,
    extract_priority=True,
    extract_assignee=True,
    extract_due_date=True,
    min_confidence=0.7,
)

# Detect topics and intents
topic_intent_result = ci.detect_topics_and_intents(
    messages=messages,
    detect_topics=True,
    detect_intents=True,
    min_confidence=0.6,
    max_topics=5,
    max_intents=3,
)

# Detect only topics
topics = ci.detect_topics_only(
    messages=messages,
    min_confidence=0.6,
    max_topics=5,
)

# Detect only intents
intents = ci.detect_intents_only(
    messages=messages,
    min_confidence=0.6,
    max_intents=3,
)

# Comprehensive conversation analysis
analysis = ci.analyze_conversation(
    messages=messages,
    include_followup=True,
    include_actions=True,
    include_topics=True,
    include_intents=True,
)

# Get a conversation summary
summary = ci.get_conversation_summary(messages=messages)
```
ConversationalIntelligence Features

Follow-up Question Generation
- Generates contextually relevant follow-up questions
- Supports conversation context and history
- Includes safe fallback mechanisms
- Validates question format and content

Action Item Detection
- Extracts actionable tasks from conversations
- Identifies assignees, due dates, and priorities
- Provides confidence scoring
- Supports context extraction

Topic and Intent Detection
- Identifies main conversation topics
- Classifies user intents and purposes
- Supports confidence thresholds
- Provides keyword extraction and categorization

Comprehensive Analysis
- Combines all intelligence features
- Provides conversation summaries
- Offers flexible configuration options
- Returns structured, validated results
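One plausible shape for the "safe fallback" and "validates question format" behavior is sketched below: call the generator, keep only well-formed questions, and top up from `fallback_questions` when the LLM fails or returns too few. This illustrates the general pattern under stated assumptions; it is not the library's code. The validation rule (non-empty, ends with `?`, no duplicates) is an assumption, and `followups_with_fallback` and `flaky_llm` are hypothetical names.

```python
from typing import Callable, List


def followups_with_fallback(
    generate: Callable[[], List[str]],
    fallback_questions: List[str],
    num_questions: int = 3,
) -> List[str]:
    """Return up to num_questions follow-up questions, topping up from a fallback list."""
    try:
        candidates = generate()
    except Exception:
        # Treat any generation error (timeout, parse failure, ...) as "no questions"
        candidates = []

    # Assumed validation rule: non-empty string ending in "?", no duplicates
    valid: List[str] = []
    for item in candidates:
        question = item.strip() if isinstance(item, str) else ""
        if question.endswith("?") and question not in valid:
            valid.append(question)

    # Fill any remaining slots from the fallback questions
    for question in fallback_questions:
        if len(valid) >= num_questions:
            break
        if question not in valid:
            valid.append(question)

    return valid[:num_questions]


def flaky_llm() -> List[str]:
    raise TimeoutError("LLM did not respond")  # stands in for a failed LLM call


print(followups_with_fallback(flaky_llm, ["Can you tell me more?", "What are some examples?"]))
# ['Can you tell me more?', 'What are some examples?']
```

Because the fallback list only fills empty slots, a partially successful LLM response is kept and supplemented rather than discarded.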
Return Types
The ConversationalIntelligence component returns structured objects:
- ActionItem: contains task, assignee, due_date, priority, context, and source_message.
- DetectedTopic: contains name, confidence, keywords, category, and context.
- DetectedIntent: contains intent_type, confidence, entities, intent_classification, context, and source_message.
- TopicIntentResponse: contains lists of detected topics and intents.
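To make the shape of these results concrete, the sketch below mirrors `DetectedTopic` as a Python dataclass and shows one plausible reading of the `min_confidence` and `max_topics` parameters used earlier: discard low-confidence results, then cap the count. The field types, the sort-by-confidence step, and the helper `filter_topics` are assumptions for illustration, not the library's definitions; in real code, use the objects the library returns.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DetectedTopic:
    # Field names follow the Return Types list above; the types are assumptions
    name: str
    confidence: float  # assumed to lie in [0, 1], like min_confidence
    keywords: List[str] = field(default_factory=list)
    category: str = ""
    context: str = ""


def filter_topics(
    topics: List[DetectedTopic], min_confidence: float = 0.6, max_topics: int = 5
) -> List[DetectedTopic]:
    """Keep topics at or above min_confidence, most confident first, at most max_topics."""
    kept = [t for t in topics if t.confidence >= min_confidence]
    kept.sort(key=lambda t: t.confidence, reverse=True)
    return kept[:max_topics]


candidates = [
    DetectedTopic("budget analysis", 0.74, ["budget"]),
    DetectedTopic("quarterly report", 0.92, ["report", "Friday"]),
    DetectedTopic("weather", 0.31),
]
print([t.name for t in filter_topics(candidates, min_confidence=0.6, max_topics=5)])
# ['quarterly report', 'budget analysis']
```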
TableChunker
The TableChunker class splits tabular data into row-wise chunks. It operates on Markdown tables; CSV and Excel files are first converted to Markdown with DocumentConverter, as in the example below, and then chunked row by row.
```python
from elsai_utilities.splitters import TableChunker
from elsai_utilities.converters import DocumentConverter

table_chunker = TableChunker()
converter = DocumentConverter()

# 1. Chunking a Markdown table directly
# chunk_size controls how many table rows go into each chunk
markdown_table = """| Employee | Dept | Salary |
| --- | --- | --- |
| Alice | Eng | 90000 |
| Bob | Sales | 80000 |
| Charlie | HR | 75000 |
| David | Eng | 95000 |
| Eve | Sales | 82000 |
"""
chunks = table_chunker.table_chunker(markdown_table, chunk_size=2)

# 2. Converting a CSV file to Markdown, then chunking it
markdown_from_csv = converter.convert_csv_to_markdown("data.csv")
csv_chunks = table_chunker.table_chunker(markdown_from_csv, chunk_size=3)

# 3. Converting an Excel file to Markdown, then chunking it
markdown_from_excel = converter.convert_excel_to_markdown("data.xlsx")
excel_chunks = table_chunker.table_chunker(markdown_from_excel, chunk_size=2)
```
TableChunker Features

Tabular Data Support
- Handles Markdown tables directly, and CSV and Excel files after conversion with DocumentConverter.
- Normalizes all inputs to Markdown format before chunking.

Row-wise Chunking
- Splits tables row by row, with a configurable chunk_size setting the number of rows per chunk.
- Preserves table headers in every generated chunk.
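The row-wise behavior described above can be approximated in a few lines of plain Python. This sketch assumes the input is a well-formed Markdown table whose first two lines are the header and the `| --- |` separator, and that `chunk_size` counts data rows rather than characters. It is an illustration, not TableChunker's implementation, and `chunk_markdown_table` is a hypothetical name.

```python
from typing import List


def chunk_markdown_table(table: str, chunk_size: int) -> List[str]:
    """Split a Markdown table into chunks of at most chunk_size data rows,
    repeating the header and separator rows at the top of every chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive number of rows")
    lines = [line for line in table.strip().splitlines() if line.strip()]
    if len(lines) < 3:
        return []  # no data rows (or not a table): nothing to chunk
    header, separator, rows = lines[0], lines[1], lines[2:]
    return [
        "\n".join([header, separator, *rows[start:start + chunk_size]])
        for start in range(0, len(rows), chunk_size)
    ]


markdown_table = """| Employee | Dept | Salary |
| --- | --- | --- |
| Alice | Eng | 90000 |
| Bob | Sales | 80000 |
| Charlie | HR | 75000 |
| David | Eng | 95000 |
| Eve | Sales | 82000 |
"""
for chunk in chunk_markdown_table(markdown_table, chunk_size=2):
    print(chunk, end="\n\n")
```

With the five-row employee table and `chunk_size=2`, this yields three chunks holding two, two, and one data rows, and each one starts with the `| Employee | Dept | Salary |` header, so every chunk remains a self-describing table when embedded on its own.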