# efficient-context Documentation
## Overview
`efficient-context` is a Python library designed to optimize the handling of context for Large Language Models (LLMs) in CPU-constrained environments. It addresses the challenges of using LLMs with limited computational resources by providing efficient context management strategies.
## Key Features
1. **Context Compression**: Reduce memory requirements while preserving information quality
2. **Semantic Chunking**: Go beyond token-based approaches for more effective context management
3. **Retrieval Optimization**: Minimize context size through intelligent retrieval strategies
4. **Memory Management**: Handle large contexts on limited hardware resources
## Installation
```bash
pip install efficient-context
```
## Core Components
### ContextManager
The central class that orchestrates all components of the library.
```python
from efficient_context import ContextManager

# Initialize with default settings
context_manager = ContextManager()

# Add documents
context_manager.add_document("This is a sample document about renewable energy...")
context_manager.add_documents([doc1, doc2, doc3])  # Add multiple documents

# Generate context for a query
optimized_context = context_manager.generate_context(query="Tell me about renewable energy")
```
### Context Compression
The compression module reduces the size of content while preserving key information.
```python
from efficient_context.compression import SemanticDeduplicator

# Initialize with custom settings
compressor = SemanticDeduplicator(
    threshold=0.85,                 # Similarity threshold for deduplication
    embedding_model="lightweight",  # Use a lightweight embedding model
    min_sentence_length=10,         # Minimum length of sentences to consider
    importance_weight=0.3           # Weight given to sentence importance vs. deduplication
)

# Compress content
compressed_content = compressor.compress(
    content="Your large text content here...",
    target_size=1000  # Optional target size in tokens
)
```
### Semantic Chunking
The chunking module divides content into semantically coherent chunks.
```python
from efficient_context.chunking import SemanticChunker

# Initialize with custom settings
chunker = SemanticChunker(
    chunk_size=512,           # Target size for chunks in tokens
    chunk_overlap=50,         # Number of tokens to overlap between chunks
    respect_paragraphs=True,  # Avoid breaking paragraphs across chunks
    min_chunk_size=100,       # Minimum chunk size in tokens
    max_chunk_size=1024       # Maximum chunk size in tokens
)

# Chunk content
chunks = chunker.chunk(
    content="Your large text content here...",
    document_id="doc-1",  # Optional document ID
    metadata={"source": "example", "author": "John Doe"}  # Optional metadata
)
```
### Retrieval Optimization
The retrieval module finds the most relevant chunks for a query.
```python
from efficient_context.retrieval import CPUOptimizedRetriever

# Initialize with custom settings
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight",  # Use a lightweight embedding model
    similarity_metric="cosine",     # Metric for comparing embeddings
    use_batching=True,              # Batch embedding operations
    batch_size=32,                  # Size of batches for embedding
    max_index_size=5000             # Maximum number of chunks to keep in the index
)

# Index chunks
retriever.index_chunks(chunks)

# Retrieve relevant chunks
relevant_chunks = retriever.retrieve(
    query="Your query here...",
    top_k=5  # Number of chunks to retrieve
)
```
### Memory Management
The memory module helps optimize memory usage during operations.
```python
from efficient_context.memory import MemoryManager

# Initialize with custom settings
memory_manager = MemoryManager(
    target_usage_percent=80.0,    # Target memory usage percentage
    aggressive_cleanup=False,     # Whether to perform aggressive garbage collection
    memory_monitor_interval=None  # Interval for memory monitoring in seconds
)

# Use the context manager for memory-intensive operations
with memory_manager.optimize_memory():
    # Run memory-intensive operations here
    results = process_large_documents(documents)

# Get memory usage statistics
memory_stats = memory_manager.get_memory_usage()
print(f"Process memory: {memory_stats['process_rss_bytes'] / (1024*1024):.2f} MB")
```
## Advanced Usage
### Customizing the Context Manager
```python
from efficient_context import ContextManager
from efficient_context.compression import SemanticDeduplicator
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever
from efficient_context.memory import MemoryManager

# Initialize a fully customized context manager
context_manager = ContextManager(
    compressor=SemanticDeduplicator(threshold=0.85),
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=50),
    retriever=CPUOptimizedRetriever(embedding_model="lightweight"),
    memory_manager=MemoryManager(target_usage_percent=80.0),
    max_context_size=4096
)
```
### Integration with LLMs
```python
from efficient_context import ContextManager
from your_llm_library import LLM  # Replace with your actual LLM library

# Initialize components
context_manager = ContextManager()
llm = LLM(model="lightweight-model")

# Process documents
context_manager.add_documents(documents)

# For each query
query = "Tell me about renewable energy"
optimized_context = context_manager.generate_context(query=query)

# Use the context with the LLM
response = llm.generate(
    prompt=query,
    context=optimized_context,
    max_tokens=512
)
```
## Performance Considerations
- **Memory Usage**: The library is designed to be memory-efficient, but be aware that embedding models may still require significant memory.
- **CPU Performance**: Choose the appropriate embedding model based on your CPU capabilities. The `lightweight` option is recommended for constrained environments.
- **Batch Size**: Adjust the `batch_size` parameter in retrieval to balance between memory usage and processing speed.
- **Context Size**: Setting appropriate `max_context_size` can significantly impact performance, especially when working with limited resources.
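As a rough illustration of how these settings interact, the memory held by a retrieval index grows with `max_index_size` and the embedding dimension. The helper below is a back-of-the-envelope sketch, not part of the library's API, and the 384-dimension figure is an assumption typical of lightweight sentence-embedding models:

```python
def index_memory_mb(max_index_size, embedding_dim, bytes_per_value=4):
    """Approximate memory for a flat float32 embedding index, in MB."""
    return max_index_size * embedding_dim * bytes_per_value / (1024 * 1024)

# An index of 5,000 chunks (the value used in the retriever example above)
# with assumed 384-dimensional embeddings
print(f"{index_memory_mb(5000, 384):.1f} MB")  # ~7.3 MB for the embeddings alone
```

Halving `max_index_size` halves this figure, which is often the easiest lever to pull on small devices.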
## Extending the Library
You can create custom implementations of the base classes to adapt the library to your specific needs:
```python
from efficient_context.compression.base import BaseCompressor

class MyCustomCompressor(BaseCompressor):
    def __init__(self, custom_param=None):
        self.custom_param = custom_param

    def compress(self, content, target_size=None):
        # Your custom compression logic here
        compressed_content = content  # Placeholder: pass content through unchanged
        return compressed_content
```
## Troubleshooting
**High Memory Usage**
- Reduce `batch_size` in the retriever
- Use a more lightweight embedding model
- Decrease `max_index_size` to limit the number of chunks stored in memory
**Slow Processing**
- Increase `batch_size` (balancing with memory constraints)
- Lower `threshold` in the `SemanticDeduplicator` to deduplicate more aggressively (sentences are treated as duplicates when their similarity exceeds the threshold)
- Reduce `chunk_overlap` to minimize redundant processing
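To see why the similarity threshold controls how aggressive deduplication is, here is a toy sketch of threshold-based deduplication over cosine similarities. This is illustrative only: the library's actual implementation works on sentence embeddings, and the vectors below are made up:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def dedupe(vectors, threshold):
    """Keep a vector only if its similarity to every kept vector
    stays below the threshold."""
    kept = []
    for v in vectors:
        if all(cosine(v, k) < threshold for k in kept):
            kept.append(v)
    return kept

vecs = [(1.0, 0.0), (0.95, 0.2), (0.0, 1.0)]  # first two are near-duplicates
print(len(dedupe(vecs, 0.99)))  # 3 kept: the near-duplicate survives a high threshold
print(len(dedupe(vecs, 0.85)))  # 2 kept: a lower threshold removes it
```

A lower threshold removes more near-duplicate content, shrinking the context and speeding up every downstream step.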
## Example Applications
- **Chatbots on Edge Devices**: Enable context-aware conversations on devices with limited resources
- **Document QA Systems**: Create efficient question-answering systems for large document collections
- **Embedded AI Applications**: Incorporate context-aware LLM capabilities in embedded systems
- **Mobile Applications**: Provide sophisticated LLM features in mobile apps with limited resources