# efficient-context Documentation

## Overview

`efficient-context` is a Python library designed to optimize the handling of context for Large Language Models (LLMs) in CPU-constrained environments. It addresses the challenges of using LLMs with limited computational resources by providing efficient context management strategies.

## Key Features

1. **Context Compression**: Reduce memory requirements while preserving information quality
2. **Semantic Chunking**: Split content at semantic boundaries rather than fixed token windows
3. **Retrieval Optimization**: Keep contexts small by retrieving only the chunks most relevant to each query
4. **Memory Management**: Handle large contexts on limited hardware resources

## Installation

```bash
pip install efficient-context
```

## Core Components

### ContextManager

The central class, which orchestrates the compressor, chunker, retriever, and memory manager.

```python
from efficient_context import ContextManager

# Initialize with default settings
context_manager = ContextManager()

# Add documents
context_manager.add_document("This is a sample document about renewable energy...")
context_manager.add_documents([doc1, doc2, doc3])  # Add multiple documents

# Generate context for a query
optimized_context = context_manager.generate_context(query="Tell me about renewable energy")
```

### Context Compression

The compression module reduces the size of content while preserving key information.

```python
from efficient_context.compression import SemanticDeduplicator

# Initialize with custom settings
compressor = SemanticDeduplicator(
    threshold=0.85,  # Similarity threshold for deduplication
    embedding_model="lightweight",  # Use a lightweight embedding model
    min_sentence_length=10,  # Minimum length of sentences to consider
    importance_weight=0.3  # Weight given to sentence importance vs. deduplication
)

# Compress content
compressed_content = compressor.compress(
    content="Your large text content here...",
    target_size=1000  # Optional target size in tokens
)
```

### Semantic Chunking

The chunking module divides content into semantically coherent chunks.

```python
from efficient_context.chunking import SemanticChunker

# Initialize with custom settings
chunker = SemanticChunker(
    chunk_size=512,  # Target size for chunks in tokens
    chunk_overlap=50,  # Number of tokens to overlap between chunks
    respect_paragraphs=True,  # Avoid breaking paragraphs across chunks
    min_chunk_size=100,  # Minimum chunk size in tokens
    max_chunk_size=1024  # Maximum chunk size in tokens
)

# Chunk content
chunks = chunker.chunk(
    content="Your large text content here...",
    document_id="doc-1",  # Optional document ID
    metadata={"source": "example", "author": "John Doe"}  # Optional metadata
)
```

### Retrieval Optimization

The retrieval module finds the most relevant chunks for a query.

```python
from efficient_context.retrieval import CPUOptimizedRetriever

# Initialize with custom settings
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight",  # Use a lightweight embedding model
    similarity_metric="cosine",  # Metric for comparing embeddings
    use_batching=True,  # Batch embedding operations
    batch_size=32,  # Size of batches for embedding
    max_index_size=5000  # Maximum number of chunks to keep in the index
)

# Index chunks
retriever.index_chunks(chunks)

# Retrieve relevant chunks
relevant_chunks = retriever.retrieve(
    query="Your query here...",
    top_k=5  # Number of chunks to retrieve
)
```
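
The retrieved chunks can then be assembled into a context string for a model prompt. The `content` attribute below is an assumption for illustration; check the actual chunk type your retriever returns:

```python
# Join retrieved chunks into a single context block.
# NOTE: `chunk.content` is a hypothetical attribute used for illustration;
# substitute whatever field the library's chunk objects actually expose.
context = "\n\n".join(chunk.content for chunk in relevant_chunks)
```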

### Memory Management

The memory module helps optimize memory usage during operations.

```python
from efficient_context.memory import MemoryManager

# Initialize with custom settings
memory_manager = MemoryManager(
    target_usage_percent=80.0,  # Target memory usage percentage
    aggressive_cleanup=False,  # Whether to perform aggressive garbage collection
    memory_monitor_interval=None  # Interval for memory monitoring in seconds
)

# Use context manager for memory-intensive operations
with memory_manager.optimize_memory():
    # Run memory-intensive operations here
    results = process_large_documents(documents)

# Get memory usage statistics
memory_stats = memory_manager.get_memory_usage()
print(f"Process memory: {memory_stats['process_rss_bytes'] / (1024*1024):.2f} MB")
```

## Advanced Usage

### Customizing the Context Manager

```python
from efficient_context import ContextManager
from efficient_context.compression import SemanticDeduplicator
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever
from efficient_context.memory import MemoryManager

# Initialize a fully customized context manager
context_manager = ContextManager(
    compressor=SemanticDeduplicator(threshold=0.85),
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=50),
    retriever=CPUOptimizedRetriever(embedding_model="lightweight"),
    memory_manager=MemoryManager(target_usage_percent=80.0),
    max_context_size=4096
)
```

### Integration with LLMs

```python
from efficient_context import ContextManager
from your_llm_library import LLM  # Replace with your actual LLM library

# Initialize components
context_manager = ContextManager()
llm = LLM(model="lightweight-model")

# Process documents
context_manager.add_documents(documents)

# For each query
query = "Tell me about renewable energy"
optimized_context = context_manager.generate_context(query=query)

# Use context with the LLM
response = llm.generate(
    prompt=query,
    context=optimized_context,
    max_tokens=512
)
```

## Performance Considerations

- **Memory Usage**: The library is designed to be memory-efficient, but be aware that embedding models may still require significant memory.
- **CPU Performance**: Choose the appropriate embedding model based on your CPU capabilities. The `lightweight` option is recommended for constrained environments.
- **Batch Size**: Adjust the `batch_size` parameter in retrieval to balance between memory usage and processing speed.
- **Context Size**: Setting an appropriate `max_context_size` can significantly improve performance, especially when working with limited resources. A sample low-resource configuration is sketched below.
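
Taken together, these considerations might translate into a starting configuration like the following. The specific values are illustrative, not benchmarked defaults:

```python
from efficient_context import ContextManager
from efficient_context.retrieval import CPUOptimizedRetriever

# A conservative starting point for a CPU-constrained machine
context_manager = ContextManager(
    retriever=CPUOptimizedRetriever(
        embedding_model="lightweight",  # recommended for constrained CPUs
        use_batching=True,
        batch_size=16,                  # smaller batches trade speed for lower peak memory
    ),
    max_context_size=2048,              # a smaller context keeps generation fast
)
```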

## Extending the Library

You can create custom implementations of the base classes to adapt the library to your specific needs:

```python
from efficient_context.compression.base import BaseCompressor

class MyCustomCompressor(BaseCompressor):
    def __init__(self, custom_param=None):
        self.custom_param = custom_param

    def compress(self, content, target_size=None):
        # Custom compression logic goes here; as a trivial placeholder,
        # keep leading sentences until target_size (in words) is reached.
        if target_size is None:
            return content
        kept, used = [], 0
        for sentence in content.split(". "):
            used += len(sentence.split())
            if used > target_size:
                break
            kept.append(sentence)
        return ". ".join(kept)
```
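
The custom component can then be passed to `ContextManager` in place of the built-in one, exactly as in the Advanced Usage section above:

```python
from efficient_context import ContextManager

# Swap the custom compressor in for the default SemanticDeduplicator
context_manager = ContextManager(compressor=MyCustomCompressor(custom_param=42))
```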

## Troubleshooting

**High Memory Usage**
- Reduce `batch_size` in the retriever
- Use a more lightweight embedding model
- Decrease `max_index_size` to limit the number of chunks stored in memory

**Slow Processing**
- Increase `batch_size` (balancing with memory constraints)
- Lower `threshold` in the SemanticDeduplicator so that more near-duplicate content is removed
- Reduce `chunk_overlap` to minimize redundant processing

Both sets of adjustments are illustrated in the sketch below.
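
A minimal sketch of both kinds of adjustment, using only the parameters documented above (the values are illustrative starting points, not tuned defaults):

```python
from efficient_context.compression import SemanticDeduplicator
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever

# Memory-saving retriever: smaller batches, smaller index
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight",
    batch_size=8,         # reduced to lower peak memory
    max_index_size=1000,  # keep fewer chunks resident in memory
)

# Speed-oriented compression and chunking
compressor = SemanticDeduplicator(threshold=0.75)  # lower threshold removes more near-duplicates
chunker = SemanticChunker(chunk_size=512, chunk_overlap=10)  # less overlap means less repeated work
```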

## Example Applications

- **Chatbots on Edge Devices**: Enable context-aware conversations on devices with limited resources
- **Document QA Systems**: Create efficient question-answering systems for large document collections
- **Embedded AI Applications**: Incorporate context-aware LLM capabilities in embedded systems
- **Mobile Applications**: Provide sophisticated LLM features in mobile apps with limited resources