# efficient-context Documentation
## Overview
`efficient-context` is a Python library designed to optimize the handling of context for Large Language Models (LLMs) in CPU-constrained environments. It addresses the challenges of using LLMs with limited computational resources by providing efficient context management strategies.
## Key Features
1. **Context Compression**: Reduce memory requirements while preserving information quality
2. **Semantic Chunking**: Go beyond token-based approaches for more effective context management
3. **Retrieval Optimization**: Minimize context size through intelligent retrieval strategies
4. **Memory Management**: Handle large contexts on limited hardware resources
## Installation
```bash
pip install efficient-context
```
## Core Components
### ContextManager
The central class that orchestrates all components of the library.
```python
from efficient_context import ContextManager

# Initialize with default settings
context_manager = ContextManager()

# Add documents
context_manager.add_document("This is a sample document about renewable energy...")
context_manager.add_documents([doc1, doc2, doc3])  # Add multiple documents

# Generate context for a query
optimized_context = context_manager.generate_context(query="Tell me about renewable energy")
```
### Context Compression
The compression module reduces the size of content while preserving key information.
```python
from efficient_context.compression import SemanticDeduplicator

# Initialize with custom settings
compressor = SemanticDeduplicator(
    threshold=0.85,                # Similarity threshold for deduplication
    embedding_model="lightweight", # Use a lightweight embedding model
    min_sentence_length=10,        # Minimum length of sentences to consider
    importance_weight=0.3          # Weight given to sentence importance vs. deduplication
)

# Compress content
compressed_content = compressor.compress(
    content="Your large text content here...",
    target_size=1000  # Optional target size in tokens
)
```
### Semantic Chunking
The chunking module divides content into semantically coherent chunks.
```python
from efficient_context.chunking import SemanticChunker

# Initialize with custom settings
chunker = SemanticChunker(
    chunk_size=512,          # Target size for chunks in tokens
    chunk_overlap=50,        # Number of tokens to overlap between chunks
    respect_paragraphs=True, # Avoid breaking paragraphs across chunks
    min_chunk_size=100,      # Minimum chunk size in tokens
    max_chunk_size=1024      # Maximum chunk size in tokens
)

# Chunk content
chunks = chunker.chunk(
    content="Your large text content here...",
    document_id="doc-1",  # Optional document ID
    metadata={"source": "example", "author": "John Doe"}  # Optional metadata
)
```
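A quick way to sanity-check the chunker's output is to inspect what it returns. The snippet below assumes each returned chunk exposes a `content` attribute; the actual chunk structure may differ, so adjust accordingly.
```python
# Inspect the chunker's output; `chunk.content` is an assumed
# attribute name, not confirmed by the API above.
print(f"Produced {len(chunks)} chunks")
for chunk in chunks:
    print(len(chunk.content.split()), "approx. words")
```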
### Retrieval Optimization
The retrieval module finds the most relevant chunks for a query.
```python
from efficient_context.retrieval import CPUOptimizedRetriever

# Initialize with custom settings
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight", # Use a lightweight embedding model
    similarity_metric="cosine",    # Metric for comparing embeddings
    use_batching=True,             # Batch embedding operations
    batch_size=32,                 # Size of batches for embedding
    max_index_size=5000            # Maximum number of chunks to keep in the index
)

# Index chunks
retriever.index_chunks(chunks)

# Retrieve relevant chunks
relevant_chunks = retriever.retrieve(
    query="Your query here...",
    top_k=5  # Number of chunks to retrieve
)
```
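The retrieved chunks can then be joined into a single context string for a prompt. As above, this assumes a `content` attribute on each chunk, which is an assumption rather than a documented part of the API.
```python
# Concatenate the top-k chunks into one context string;
# `chunk.content` is an assumed attribute name.
context = "\n\n".join(chunk.content for chunk in relevant_chunks)
```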
### Memory Management
The memory module helps optimize memory usage during operations.
```python
from efficient_context.memory import MemoryManager

# Initialize with custom settings
memory_manager = MemoryManager(
    target_usage_percent=80.0,    # Target memory usage percentage
    aggressive_cleanup=False,     # Whether to perform aggressive garbage collection
    memory_monitor_interval=None  # Interval for memory monitoring in seconds
)

# Use context manager for memory-intensive operations
with memory_manager.optimize_memory():
    # Run memory-intensive operations here
    results = process_large_documents(documents)

# Get memory usage statistics
memory_stats = memory_manager.get_memory_usage()
print(f"Process memory: {memory_stats['process_rss_bytes'] / (1024*1024):.2f} MB")
```
## Advanced Usage
### Customizing the Context Manager
```python
from efficient_context import ContextManager
from efficient_context.compression import SemanticDeduplicator
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever
from efficient_context.memory import MemoryManager

# Initialize a fully customized context manager
context_manager = ContextManager(
    compressor=SemanticDeduplicator(threshold=0.85),
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=50),
    retriever=CPUOptimizedRetriever(embedding_model="lightweight"),
    memory_manager=MemoryManager(target_usage_percent=80.0),
    max_context_size=4096
)
```
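Note that component settings interact: smaller chunks give the retriever finer-grained units to rank, but they also increase the number of embeddings to compute and store, so it is worth tuning `chunk_size` together with `batch_size` and `max_index_size`.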
### Integration with LLMs
```python
from efficient_context import ContextManager
from your_llm_library import LLM  # Replace with your actual LLM library

# Initialize components
context_manager = ContextManager()
llm = LLM(model="lightweight-model")

# Process documents
context_manager.add_documents(documents)

# For each query
query = "Tell me about renewable energy"
optimized_context = context_manager.generate_context(query=query)

# Use context with the LLM
response = llm.generate(
    prompt=query,
    context=optimized_context,
    max_tokens=512
)
```
## Performance Considerations
- **Memory Usage**: The library is designed to be memory-efficient, but be aware that embedding models may still require significant memory.
- **CPU Performance**: Choose the appropriate embedding model based on your CPU capabilities. The `lightweight` option is recommended for constrained environments.
- **Batch Size**: Adjust the `batch_size` parameter in retrieval to balance memory usage against processing speed.
- **Context Size**: Setting an appropriate `max_context_size` can significantly impact performance, especially when working with limited resources; a configuration sketch follows below.
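As a concrete starting point, the sketch below combines these parameters for a CPU-constrained setup. The values are illustrative defaults for constrained hardware, not benchmarked recommendations.
```python
from efficient_context import ContextManager
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever

# Illustrative low-resource configuration; tune for your hardware.
context_manager = ContextManager(
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=25),
    retriever=CPUOptimizedRetriever(
        embedding_model="lightweight",  # recommended for constrained CPUs
        batch_size=16                   # smaller batches reduce peak memory
    ),
    max_context_size=2048               # cap context to bound per-query cost
)
```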
## Extending the Library
You can create custom implementations of the base classes to adapt the library to your specific needs:
```python
from efficient_context.compression.base import BaseCompressor

class MyCustomCompressor(BaseCompressor):
    def __init__(self, custom_param=None):
        self.custom_param = custom_param

    def compress(self, content, target_size=None):
        # Your custom compression logic here; this placeholder simply
        # truncates to roughly `target_size` whitespace-delimited tokens.
        if target_size is None:
            return content
        return " ".join(content.split()[:target_size])
```
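The custom component can then be plugged into `ContextManager` in the same way as the built-in implementations:
```python
from efficient_context import ContextManager

# Use the custom compressor in place of SemanticDeduplicator.
context_manager = ContextManager(compressor=MyCustomCompressor(custom_param=42))
```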
## Troubleshooting
**High Memory Usage**
- Reduce `batch_size` in the retriever
- Use a more lightweight embedding model
- Decrease `max_index_size` to limit the number of chunks stored in memory
**Slow Processing**
- Increase `batch_size` (balancing with memory constraints)
- Lower the `threshold` in `SemanticDeduplicator` so that deduplication removes more near-duplicate content
- Reduce `chunk_overlap` to minimize redundant processing (a combined configuration sketch follows below)
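As an example, the sketch below combines several of these adjustments into a low-memory, deduplication-heavy configuration; the specific values are illustrative and should be tuned empirically.
```python
from efficient_context.compression import SemanticDeduplicator
from efficient_context.retrieval import CPUOptimizedRetriever

# Low-memory retriever: small batches, small index, lightweight embeddings.
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight",
    batch_size=8,         # lower this further if memory pressure persists
    max_index_size=1000   # keep fewer chunks resident in memory
)

# More aggressive deduplication: a lower similarity threshold treats
# more sentence pairs as near-duplicates, shrinking downstream work.
compressor = SemanticDeduplicator(threshold=0.75)
```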
## Example Applications
- **Chatbots on Edge Devices**: Enable context-aware conversations on devices with limited resources
- **Document QA Systems**: Create efficient question-answering systems for large document collections
- **Embedded AI Applications**: Incorporate context-aware LLM capabilities in embedded systems
- **Mobile Applications**: Provide sophisticated LLM features in mobile apps with limited resources