Elsai Utilities

The Elsai Utilities package provides helper classes for chunking and converting documents for retrieval-augmented generation (RAG) and vector database ingestion pipelines, along with a class for conversational analysis.

Prerequisites

  • Python >= 3.9

Installation

To install the elsai-utilities package:

pip install --extra-index-url https://elsai-core-package.optisolbusiness.com/root/elsai-utilities/ elsai-utilities==0.2.0

DocumentChunker

The DocumentChunker class splits text into chunks in three ways: page-wise, by Markdown header, and recursively by character.

from elsai_utilities.splitters import DocumentChunker

chunker = DocumentChunker()

contents = "# This is the first page.\n\n## This is the second page.\n\n### This is the third page."

# Page-wise chunking
chunks = chunker.chunk_page_wise(contents=contents, file_name="example.md")

# Markdown header-wise chunking
markdown_wise_chunks = chunker.chunk_markdown_header_wise(
    text=contents,
    file_name="example.md",
    headers_to_split_on=[("#", "Header 1"), ("##", "Header 2")],
    strip_headers=True
)

# Recursive character-wise chunking
text = "This is a long piece of text that should be chunked recursively..."
recursive_chunks = chunker.chunk_recursive(
    contents=text,
    file_name="example.md",
    chunk_size=50,
    chunk_overlap=10
)
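
To make the chunk_size and chunk_overlap parameters concrete, here is a minimal sketch of fixed-size character chunking with overlap. It is a plain sliding window, which is simpler than the recursive splitting that chunk_recursive performs, and it is not taken from the library. The function name sliding_window_chunks is hypothetical, and DocumentChunker may measure size and overlap differently.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Hypothetical helper for illustration; not part of elsai_utilities.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


example_chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(example_chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk begins with the last character of the previous one, which is the role overlap plays in keeping context across chunk boundaries.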

DocumentConverter

The DocumentConverter class provides utilities for converting documents between formats, such as LlamaIndex documents to LangChain documents and CSV or Excel files to Markdown.

from elsai_utilities.converters import DocumentConverter

converter = DocumentConverter()

# LlamaIndex to LangChain conversion
llama_index_document = {
    "text_resource": {
        "text": "This is a sample text extracted from LlamaIndex."
    }
}

langchain_document = converter.llama_index_to_langchain_document(
    llama_index_document=llama_index_document,
    file_name="example.md"
)

# CSV to Markdown conversion
markdown_from_csv = converter.convert_csv_to_markdown(csv_path="data.csv")

# Excel to Markdown conversion
markdown_from_excel = converter.convert_excel_to_markdown(excel_path="data.xlsx")
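
For readers unfamiliar with what a CSV-to-Markdown conversion produces, the following sketch builds a Markdown table from CSV text using only the standard library. It illustrates the general idea and is not the implementation behind convert_csv_to_markdown. The function name csv_text_to_markdown is hypothetical, and the sketch assumes that the first row is the header and that every row has the same number of columns.

```python
import csv
import io


def csv_text_to_markdown(csv_text: str) -> str:
    """Render CSV text as a Markdown table, treating the first row as the header.

    Hypothetical helper for illustration; not part of elsai_utilities.
    """
    # Skip blank lines, which csv.reader yields as empty lists.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return ""

    def format_row(cells):
        # Escape pipes so cell content cannot break the table layout.
        return "| " + " | ".join(cell.replace("|", "\\|") for cell in cells) + " |"

    header, *body = rows
    lines = [format_row(header), format_row(["---"] * len(header))]
    lines.extend(format_row(row) for row in body)
    return "\n".join(lines)


sample_markdown = csv_text_to_markdown("Employee,Dept\nAlice,Eng\nBob,Sales\n")
print(sample_markdown)
```

The output has the same shape as the Markdown table used in the TableChunker example: a header row, a separator row, and one line per record.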

ConversationalIntelligence

The ConversationalIntelligence class provides conversational analysis: follow-up question generation, action item detection, and topic and intent classification.

from elsai_utilities.conversational_intelligence import ConversationalIntelligence

# Initialize with your LLM instance (e.g., ChatOpenAI or Claude).
# your_llm_instance is a placeholder for an object you create beforehand.
ci = ConversationalIntelligence(llm=your_llm_instance)

# Generate follow-up questions
followup_questions = ci.generate_followup_questions(
    user_question="What is machine learning?",
    answer="Machine learning is a subset of AI that enables computers to learn from data.",
    context=["Previous discussion about AI", "User is a beginner"],
    num_questions=3
)

# Safe version with fallback questions
safe_questions = ci.generate_followup_questions_safe(
    user_question="What is machine learning?",
    answer="Machine learning is a subset of AI...",
    num_questions=3,
    fallback_questions=["Can you tell me more?", "What are some examples?"]
)

# Detect action items from conversation
messages = [
    "John, can you prepare the quarterly report by Friday?",
    "Sure, I'll have it ready. Should I include the budget analysis?",
    "Yes, and make sure to highlight the key metrics."
]

action_items = ci.detect_action_items(
    messages=messages,
    include_context=True,
    extract_priority=True,
    extract_assignee=True,
    extract_due_date=True,
    min_confidence=0.7
)

# Detect topics and intents
topic_intent_result = ci.detect_topics_and_intents(
    messages=messages,
    detect_topics=True,
    detect_intents=True,
    min_confidence=0.6,
    max_topics=5,
    max_intents=3
)

# Detect only topics
topics = ci.detect_topics_only(
    messages=messages,
    min_confidence=0.6,
    max_topics=5
)

# Detect only intents
intents = ci.detect_intents_only(
    messages=messages,
    min_confidence=0.6,
    max_intents=3
)

# Comprehensive conversation analysis
analysis = ci.analyze_conversation(
    messages=messages,
    include_followup=True,
    include_actions=True,
    include_topics=True,
    include_intents=True
)

# Get conversation summary
summary = ci.get_conversation_summary(messages=messages)

ConversationalIntelligence Features

Follow-up Question Generation

  • Generates contextually relevant follow-up questions

  • Supports conversation context and history

  • Includes safe fallback mechanisms

  • Validates question format and content

Action Item Detection

  • Extracts actionable tasks from conversations

  • Identifies assignees, due dates, and priorities

  • Provides confidence scoring

  • Supports context extraction

Topic and Intent Detection

  • Identifies main conversation topics

  • Classifies user intents and purposes

  • Supports confidence thresholds

  • Provides keyword extraction and categorization

Comprehensive Analysis

  • Combines all intelligence features

  • Provides conversation summaries

  • Offers flexible configuration options

  • Returns structured, validated results
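
The safe fallback mechanism mentioned under follow-up question generation can be pictured as a thin wrapper around a generator call. The sketch below shows that general pattern only; it is an assumption about the approach, not the code behind generate_followup_questions_safe. The names with_fallback and failing_generator are hypothetical.

```python
from typing import Callable, List


def with_fallback(
    generate: Callable[[], List[str]],
    fallback: List[str],
    num_questions: int,
) -> List[str]:
    """Return generated questions, or the fallback list if generation fails
    or yields nothing usable. Hypothetical helper for illustration."""
    try:
        # Keep only non-empty strings, trimmed of surrounding whitespace.
        questions = [q.strip() for q in generate() if isinstance(q, str) and q.strip()]
    except Exception:
        questions = []
    if not questions:
        questions = list(fallback)
    return questions[:num_questions]


def failing_generator() -> List[str]:
    # Stand-in for an LLM call that raises.
    raise RuntimeError("LLM call failed")


result = with_fallback(
    failing_generator,
    fallback=["Can you tell me more?", "What are some examples?"],
    num_questions=3,
)
print(result)  # ['Can you tell me more?', 'What are some examples?']
```

Catching the error inside the wrapper means the caller always receives a list of questions, which is convenient when the result feeds directly into a user interface.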

Return Types

The ConversationalIntelligence component returns structured objects:

  • ActionItem: contains task, assignee, due_date, priority, context, and source_message.

  • DetectedTopic: contains name, confidence, keywords, category, and context.

  • DetectedIntent: contains intent_type, confidence, entities, intent_classification, context, and source_message.

  • TopicIntentResponse: contains lists of detected topics and intents.
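
For orientation, a result such as DetectedTopic can be pictured as a small record. The dataclass below reuses the field names listed above, but the field types, the defaults, and the filter_topics helper are assumptions made for illustration; the library's actual classes may be defined differently. The helper shows one plausible reading of how min_confidence and max_topics could interact.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TopicSketch:
    """Illustrative stand-in for DetectedTopic; field types are assumed."""
    name: str
    confidence: float
    keywords: List[str] = field(default_factory=list)
    category: Optional[str] = None
    context: Optional[str] = None


def filter_topics(
    topics: List[TopicSketch], min_confidence: float, max_topics: int
) -> List[TopicSketch]:
    """Keep topics at or above min_confidence, highest confidence first,
    capped at max_topics. Hypothetical helper, not a library function."""
    kept = [t for t in topics if t.confidence >= min_confidence]
    kept.sort(key=lambda t: t.confidence, reverse=True)
    return kept[:max_topics]


sample_topics = [
    TopicSketch("budget analysis", 0.75, ["budget"]),
    TopicSketch("small talk", 0.30),
    TopicSketch("quarterly report", 0.92, ["report", "Friday"]),
]
top_topics = filter_topics(sample_topics, min_confidence=0.6, max_topics=5)
print([t.name for t in top_topics])  # ['quarterly report', 'budget analysis']
```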

TableChunker

The TableChunker class splits tabular data into row-wise chunks. It operates on Markdown tables; CSV and Excel files are first converted to Markdown with DocumentConverter, and the resulting table is then chunked.

from elsai_utilities.splitters import TableChunker
from elsai_utilities.converters import DocumentConverter

table_chunker = TableChunker()
converter = DocumentConverter()

# 1. Chunking a direct markdown table
markdown_table = """| Employee | Dept | Salary |
| --- | --- | --- |
| Alice | Eng | 90000 |
| Bob | Sales | 80000 |
| Charlie | HR | 75000 |
| David | Eng | 95000 |
| Eve | Sales | 82000 |
"""

chunks = table_chunker.table_chunker(markdown_table, chunk_size=2)

# 2. Converting and chunking a CSV file
markdown_from_csv = converter.convert_csv_to_markdown("data.csv")
csv_chunks = table_chunker.table_chunker(markdown_from_csv, chunk_size=3)

# 3. Converting and chunking an Excel file
markdown_from_excel = converter.convert_excel_to_markdown("data.xlsx")
excel_chunks = table_chunker.table_chunker(markdown_from_excel, chunk_size=2)

TableChunker Features

Tabular Data Support

  • Handles Markdown tables, CSV, and Excel files.

  • Normalizes all inputs to Markdown format before chunking.

Row-wise Chunking

  • Performs row-wise splitting based on a configurable chunk_size.

  • Preserves table headers in every generated chunk.
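
The row-wise behaviour described above can be sketched in a few lines of plain Python. This is an illustration of the technique under simple assumptions (the first line is the header and the second is the separator row), not the library's implementation. The function name chunk_markdown_table is hypothetical.

```python
def chunk_markdown_table(table: str, chunk_size: int) -> list[str]:
    """Split a Markdown table into chunks of chunk_size data rows,
    repeating the header and separator in every chunk.

    Hypothetical helper for illustration; not part of elsai_utilities.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    lines = [line for line in table.strip().splitlines() if line.strip()]
    if len(lines) < 2:
        return []  # no header and separator pair, so nothing to chunk

    header, separator, rows = lines[0], lines[1], lines[2:]
    chunks = []
    for start in range(0, len(rows), chunk_size):
        group = rows[start:start + chunk_size]
        chunks.append("\n".join([header, separator, *group]))
    return chunks


sample_table = """| Employee | Dept | Salary |
| --- | --- | --- |
| Alice | Eng | 90000 |
| Bob | Sales | 80000 |
| Charlie | HR | 75000 |
| David | Eng | 95000 |
| Eve | Sales | 82000 |
"""

table_chunks = chunk_markdown_table(sample_table, chunk_size=2)
print(len(table_chunks))  # 3
print(table_chunks[-1])
```

With five data rows and a chunk size of 2, the sketch yields three chunks holding two, two, and one rows. Repeating the header keeps each chunk interpretable on its own once it is embedded and retrieved in isolation.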