# efficient-context Documentation

## Overview

`efficient-context` is a Python library designed to optimize the handling of context for Large Language Models (LLMs) in CPU-constrained environments. It addresses the challenges of using LLMs with limited computational resources by providing efficient context management strategies.

## Key Features

1. **Context Compression**: Reduce memory requirements while preserving information quality
2. **Semantic Chunking**: Go beyond token-based approaches for more effective context management
3. **Retrieval Optimization**: Minimize context size through intelligent retrieval strategies
4. **Memory Management**: Handle large contexts on limited hardware resources

## Installation

```bash
pip install efficient-context
```

## Core Components

### ContextManager

The central class that orchestrates all components of the library.

```python
from efficient_context import ContextManager

# Initialize with default settings
context_manager = ContextManager()

# Add documents
context_manager.add_document("This is a sample document about renewable energy...")
context_manager.add_documents([doc1, doc2, doc3])  # Add multiple documents

# Generate context for a query
optimized_context = context_manager.generate_context(query="Tell me about renewable energy")
```

### Context Compression

The compression module reduces the size of content while preserving key information.

```python
from efficient_context.compression import SemanticDeduplicator

# Initialize with custom settings
compressor = SemanticDeduplicator(
    threshold=0.85,                 # Similarity threshold for deduplication
    embedding_model="lightweight",  # Use a lightweight embedding model
    min_sentence_length=10,         # Minimum length of sentences to consider
    importance_weight=0.3           # Weight given to sentence importance vs. deduplication
)

# Compress content
compressed_content = compressor.compress(
    content="Your large text content here...",
    target_size=1000  # Optional target size in tokens
)
```

### Semantic Chunking

The chunking module divides content into semantically coherent chunks.

```python
from efficient_context.chunking import SemanticChunker

# Initialize with custom settings
chunker = SemanticChunker(
    chunk_size=512,           # Target size for chunks in tokens
    chunk_overlap=50,         # Number of tokens to overlap between chunks
    respect_paragraphs=True,  # Avoid breaking paragraphs across chunks
    min_chunk_size=100,       # Minimum chunk size in tokens
    max_chunk_size=1024       # Maximum chunk size in tokens
)

# Chunk content
chunks = chunker.chunk(
    content="Your large text content here...",
    document_id="doc-1",  # Optional document ID
    metadata={"source": "example", "author": "John Doe"}  # Optional metadata
)
```

### Retrieval Optimization

The retrieval module finds the most relevant chunks for a query.

```python
from efficient_context.retrieval import CPUOptimizedRetriever

# Initialize with custom settings
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight",  # Use a lightweight embedding model
    similarity_metric="cosine",     # Metric for comparing embeddings
    use_batching=True,              # Batch embedding operations
    batch_size=32,                  # Size of batches for embedding
    max_index_size=5000             # Maximum number of chunks to keep in the index
)

# Index chunks
retriever.index_chunks(chunks)

# Retrieve relevant chunks
relevant_chunks = retriever.retrieve(
    query="Your query here...",
    top_k=5  # Number of chunks to retrieve
)
```

### Memory Management

The memory module helps optimize memory usage during operations.

```python
from efficient_context.memory import MemoryManager

# Initialize with custom settings
memory_manager = MemoryManager(
    target_usage_percent=80.0,    # Target memory usage percentage
    aggressive_cleanup=False,     # Whether to perform aggressive garbage collection
    memory_monitor_interval=None  # Interval for memory monitoring in seconds
)

# Use the context manager for memory-intensive operations
with memory_manager.optimize_memory():
    # Run memory-intensive operations here
    results = process_large_documents(documents)

# Get memory usage statistics
memory_stats = memory_manager.get_memory_usage()
print(f"Process memory: {memory_stats['process_rss_bytes'] / (1024 * 1024):.2f} MB")
```

## Advanced Usage

### Customizing the Context Manager

```python
from efficient_context import ContextManager
from efficient_context.compression import SemanticDeduplicator
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever
from efficient_context.memory import MemoryManager

# Initialize a fully customized context manager
context_manager = ContextManager(
    compressor=SemanticDeduplicator(threshold=0.85),
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=50),
    retriever=CPUOptimizedRetriever(embedding_model="lightweight"),
    memory_manager=MemoryManager(target_usage_percent=80.0),
    max_context_size=4096
)
```

### Integration with LLMs

```python
from efficient_context import ContextManager
from your_llm_library import LLM  # Replace with your actual LLM library

# Initialize components
context_manager = ContextManager()
llm = LLM(model="lightweight-model")

# Process documents
context_manager.add_documents(documents)

# For each query
query = "Tell me about renewable energy"
optimized_context = context_manager.generate_context(query=query)

# Use the context with the LLM
response = llm.generate(
    prompt=query,
    context=optimized_context,
    max_tokens=512
)
```

## Performance Considerations

- **Memory Usage**: The library is designed to be memory-efficient, but be aware that embedding models may still require significant memory.
- **CPU Performance**: Choose an embedding model appropriate for your CPU capabilities. The `lightweight` option is recommended for constrained environments.
- **Batch Size**: Adjust the `batch_size` parameter in retrieval to balance memory usage against processing speed.
- **Context Size**: Setting an appropriate `max_context_size` can significantly improve performance, especially when working with limited resources. A combined configuration sketch follows this list.
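
These knobs interact, so as a rough starting point, here is a sketch of a low-resource configuration assembled from the parameters documented above. The specific values are illustrative assumptions to tune for your hardware, not recommendations from the library.

```python
from efficient_context import ContextManager
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever

# Illustrative low-resource configuration; all values are assumptions to tune.
context_manager = ContextManager(
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=25),  # smaller chunks, less overlap
    retriever=CPUOptimizedRetriever(
        embedding_model="lightweight",  # lightest model for constrained CPUs
        batch_size=16,                  # smaller batches keep peak memory down
        max_index_size=2000             # cap the number of chunks held in memory
    ),
    max_context_size=2048  # keep the final context small on limited hardware
)
```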

## Extending the Library

You can create custom implementations of the base classes to adapt the library to your specific needs:

```python
from efficient_context.compression.base import BaseCompressor

class MyCustomCompressor(BaseCompressor):
    def __init__(self, custom_param=None):
        self.custom_param = custom_param

    def compress(self, content, target_size=None):
        # Your custom compression logic here; as a trivial placeholder,
        # truncate to target_size whitespace-delimited tokens.
        tokens = content.split()
        if target_size is not None:
            tokens = tokens[:target_size]
        return " ".join(tokens)
```
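
Assuming `ContextManager` accepts any `BaseCompressor` implementation through the `compressor` argument shown in the Advanced Usage example, the custom class plugs straight into the pipeline:

```python
from efficient_context import ContextManager

# MyCustomCompressor is the class defined above
context_manager = ContextManager(compressor=MyCustomCompressor(custom_param="example"))
```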

## Troubleshooting

**High Memory Usage**
- Reduce `batch_size` in the retriever
- Use a more lightweight embedding model
- Decrease `max_index_size` to limit the number of chunks kept in memory (see the sketch after these lists)

**Slow Processing**
- Increase `batch_size` (balancing it against memory constraints)
- Lower `threshold` in the `SemanticDeduplicator` so deduplication removes more near-duplicate content
- Reduce `chunk_overlap` to minimize redundant processing
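
For instance, the memory-saving adjustments above might be combined in a single retriever configuration. The values below are illustrative assumptions, not tested recommendations:

```python
from efficient_context.retrieval import CPUOptimizedRetriever

# Memory-saving retriever settings (illustrative values)
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight",  # lighter model lowers the memory floor
    batch_size=8,                   # smaller batches reduce peak memory
    max_index_size=1000             # fewer chunks held in the index
)
```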

## Example Applications

- **Chatbots on Edge Devices**: Enable context-aware conversations on devices with limited resources
- **Document QA Systems**: Create efficient question-answering systems for large document collections
- **Embedded AI Applications**: Incorporate context-aware LLM capabilities in embedded systems
- **Mobile Applications**: Provide sophisticated LLM features in mobile apps with limited resources
|