# efficient-context Documentation

## Overview

`efficient-context` is a Python library designed to optimize the handling of context for Large Language Models (LLMs) in CPU-constrained environments. It addresses the challenges of using LLMs with limited computational resources by providing efficient context management strategies.
## Key Features

- **Context Compression**: Reduce memory requirements while preserving information quality
- **Semantic Chunking**: Go beyond token-based approaches for more effective context management
- **Retrieval Optimization**: Minimize context size through intelligent retrieval strategies
- **Memory Management**: Handle large contexts on limited hardware resources
## Installation

```bash
pip install efficient-context
```
## Core Components

### ContextManager

The central class that orchestrates all components of the library.

```python
from efficient_context import ContextManager

# Initialize with default settings
context_manager = ContextManager()

# Add documents
context_manager.add_document("This is a sample document about renewable energy...")
context_manager.add_documents([doc1, doc2, doc3])  # Add multiple documents

# Generate context for a query
optimized_context = context_manager.generate_context(query="Tell me about renewable energy")
```
### Context Compression

The compression module reduces the size of content while preserving key information.

```python
from efficient_context.compression import SemanticDeduplicator

# Initialize with custom settings
compressor = SemanticDeduplicator(
    threshold=0.85,                 # Similarity threshold for deduplication
    embedding_model="lightweight",  # Use a lightweight embedding model
    min_sentence_length=10,         # Minimum length of sentences to consider
    importance_weight=0.3           # Weight given to sentence importance vs. deduplication
)

# Compress content
compressed_content = compressor.compress(
    content="Your large text content here...",
    target_size=1000  # Optional target size in tokens
)
```
### Semantic Chunking

The chunking module divides content into semantically coherent chunks.

```python
from efficient_context.chunking import SemanticChunker

# Initialize with custom settings
chunker = SemanticChunker(
    chunk_size=512,           # Target size for chunks in tokens
    chunk_overlap=50,         # Number of tokens to overlap between chunks
    respect_paragraphs=True,  # Avoid breaking paragraphs across chunks
    min_chunk_size=100,       # Minimum chunk size in tokens
    max_chunk_size=1024       # Maximum chunk size in tokens
)

# Chunk content
chunks = chunker.chunk(
    content="Your large text content here...",
    document_id="doc-1",  # Optional document ID
    metadata={"source": "example", "author": "John Doe"}  # Optional metadata
)
```
### Retrieval Optimization

The retrieval module finds the most relevant chunks for a query.

```python
from efficient_context.retrieval import CPUOptimizedRetriever

# Initialize with custom settings
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight",  # Use a lightweight embedding model
    similarity_metric="cosine",     # Metric for comparing embeddings
    use_batching=True,              # Batch embedding operations
    batch_size=32,                  # Size of batches for embedding
    max_index_size=5000             # Maximum number of chunks to keep in the index
)

# Index chunks
retriever.index_chunks(chunks)

# Retrieve relevant chunks
relevant_chunks = retriever.retrieve(
    query="Your query here...",
    top_k=5  # Number of chunks to retrieve
)
```
### Memory Management

The memory module helps optimize memory usage during operations.

```python
from efficient_context.memory import MemoryManager

# Initialize with custom settings
memory_manager = MemoryManager(
    target_usage_percent=80.0,    # Target memory usage percentage
    aggressive_cleanup=False,     # Whether to perform aggressive garbage collection
    memory_monitor_interval=None  # Interval for memory monitoring in seconds
)

# Use the context manager for memory-intensive operations
with memory_manager.optimize_memory():
    # Run memory-intensive operations here
    results = process_large_documents(documents)

# Get memory usage statistics
memory_stats = memory_manager.get_memory_usage()
print(f"Process memory: {memory_stats['process_rss_bytes'] / (1024 * 1024):.2f} MB")
```
## Advanced Usage

### Customizing the Context Manager

```python
from efficient_context import ContextManager
from efficient_context.compression import SemanticDeduplicator
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever
from efficient_context.memory import MemoryManager

# Initialize a fully customized context manager
context_manager = ContextManager(
    compressor=SemanticDeduplicator(threshold=0.85),
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=50),
    retriever=CPUOptimizedRetriever(embedding_model="lightweight"),
    memory_manager=MemoryManager(target_usage_percent=80.0),
    max_context_size=4096
)
```
### Integration with LLMs

```python
from efficient_context import ContextManager
from your_llm_library import LLM  # Replace with your actual LLM library

# Initialize components
context_manager = ContextManager()
llm = LLM(model="lightweight-model")

# Process documents
context_manager.add_documents(documents)

# For each query
query = "Tell me about renewable energy"
optimized_context = context_manager.generate_context(query=query)

# Use the optimized context with the LLM
response = llm.generate(
    prompt=query,
    context=optimized_context,
    max_tokens=512
)
```
## Performance Considerations

- **Memory Usage**: The library is designed to be memory-efficient, but be aware that embedding models may still require significant memory.
- **CPU Performance**: Choose the appropriate embedding model based on your CPU capabilities. The `lightweight` option is recommended for constrained environments.
- **Batch Size**: Adjust the `batch_size` parameter in retrieval to balance memory usage against processing speed.
- **Context Size**: Setting an appropriate `max_context_size` can significantly impact performance, especially when working with limited resources.
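Putting these considerations together, the sketch below combines only parameters documented above into a configuration aimed at constrained hardware; the specific values are illustrative, not tuned recommendations:

```python
from efficient_context import ContextManager
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever

# Illustrative low-resource setup: small batches and a modest context
# budget trade some throughput for a smaller memory footprint.
context_manager = ContextManager(
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=25),
    retriever=CPUOptimizedRetriever(
        embedding_model="lightweight",  # lightest model for constrained CPUs
        batch_size=8,                   # small batches cap peak memory
        max_index_size=1000             # bound the size of the in-memory index
    ),
    max_context_size=2048               # keep generated contexts small
)
```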
## Extending the Library

You can create custom implementations of the base classes to adapt the library to your specific needs:

```python
from efficient_context.compression.base import BaseCompressor

class MyCustomCompressor(BaseCompressor):
    def __init__(self, custom_param=None):
        self.custom_param = custom_param

    def compress(self, content, target_size=None):
        # Your custom compression logic here; this placeholder just
        # truncates naively when a target size is given.
        compressed_content = content if target_size is None else content[:target_size]
        return compressed_content
```
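A custom component can then be supplied to the `ContextManager` just like the built-in ones, following the constructor pattern shown under Advanced Usage (the `custom_param` value here is purely illustrative):

```python
from efficient_context import ContextManager

# Swap the custom compressor in for the default one
context_manager = ContextManager(compressor=MyCustomCompressor(custom_param=42))
```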
## Troubleshooting

### High Memory Usage

- Reduce `batch_size` in the retriever
- Use a more lightweight embedding model
- Decrease `max_index_size` to limit the number of chunks stored in memory

### Slow Processing

- Increase `batch_size` (balancing it against memory constraints)
- Increase `threshold` in the `SemanticDeduplicator` to be more aggressive with deduplication
- Reduce `chunk_overlap` to minimize redundant processing
## Example Applications

- **Chatbots on Edge Devices**: Enable context-aware conversations on devices with limited resources
- **Document QA Systems**: Create efficient question-answering systems for large document collections
- **Embedded AI Applications**: Incorporate context-aware LLM capabilities in embedded systems
- **Mobile Applications**: Provide sophisticated LLM features in mobile apps with limited resources