# efficient-context Documentation

## Overview

`efficient-context` is a Python library designed to optimize the handling of context for Large Language Models (LLMs) in CPU-constrained environments. It addresses the challenges of using LLMs with limited computational resources by providing efficient context management strategies.

## Key Features

1. **Context Compression**: Reduce memory requirements while preserving information quality
2. **Semantic Chunking**: Split content at semantic boundaries rather than fixed token windows
3. **Retrieval Optimization**: Keep contexts small by retrieving only the chunks most relevant to each query
4. **Memory Management**: Handle large contexts on limited hardware resources

## Installation

```bash
pip install efficient-context
```

## Core Components

### ContextManager

The central class, which orchestrates the compressor, chunker, retriever, and memory manager.

```python
from efficient_context import ContextManager

# Initialize with default settings
context_manager = ContextManager()

# Add documents
context_manager.add_document("This is a sample document about renewable energy...")
context_manager.add_documents([doc1, doc2, doc3])  # Add multiple documents

# Generate context for a query
optimized_context = context_manager.generate_context(query="Tell me about renewable energy")
```

### Context Compression

The compression module reduces the size of content while preserving key information.

```python
from efficient_context.compression import SemanticDeduplicator

# Initialize with custom settings
compressor = SemanticDeduplicator(
    threshold=0.85,  # Similarity threshold for deduplication
    embedding_model="lightweight",  # Use a lightweight embedding model
    min_sentence_length=10,  # Minimum length of sentences to consider
    importance_weight=0.3  # Weight given to sentence importance vs. deduplication
)

# Compress content
compressed_content = compressor.compress(
    content="Your large text content here...",
    target_size=1000  # Optional target size in tokens
)
```

### Semantic Chunking

The chunking module divides content into semantically coherent chunks.

```python
from efficient_context.chunking import SemanticChunker

# Initialize with custom settings
chunker = SemanticChunker(
    chunk_size=512,  # Target size for chunks in tokens
    chunk_overlap=50,  # Number of tokens to overlap between chunks
    respect_paragraphs=True,  # Avoid breaking paragraphs across chunks
    min_chunk_size=100,  # Minimum chunk size in tokens
    max_chunk_size=1024  # Maximum chunk size in tokens
)

# Chunk content
chunks = chunker.chunk(
    content="Your large text content here...",
    document_id="doc-1",  # Optional document ID
    metadata={"source": "example", "author": "John Doe"}  # Optional metadata
)
```

### Retrieval Optimization

The retrieval module finds the most relevant chunks for a query.

```python
from efficient_context.retrieval import CPUOptimizedRetriever

# Initialize with custom settings
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight",  # Use a lightweight embedding model
    similarity_metric="cosine",  # Metric for comparing embeddings
    use_batching=True,  # Batch embedding operations
    batch_size=32,  # Size of batches for embedding
    max_index_size=5000  # Maximum number of chunks to keep in the index
)

# Index chunks
retriever.index_chunks(chunks)

# Retrieve relevant chunks
relevant_chunks = retriever.retrieve(
    query="Your query here...",
    top_k=5  # Number of chunks to retrieve
)
```
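
The retrieved chunks can then be assembled into a context string for a model prompt. The `content` attribute below is an assumption for illustration; check the actual chunk type your retriever returns:

```python
# Join retrieved chunks into a single context block.
# NOTE: `chunk.content` is a hypothetical attribute used for illustration;
# substitute whatever field the library's chunk objects actually expose.
context = "\n\n".join(chunk.content for chunk in relevant_chunks)
```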

### Memory Management

The memory module helps optimize memory usage during operations.

```python
from efficient_context.memory import MemoryManager

# Initialize with custom settings
memory_manager = MemoryManager(
    target_usage_percent=80.0,  # Target memory usage percentage
    aggressive_cleanup=False,  # Whether to perform aggressive garbage collection
    memory_monitor_interval=None  # Interval for memory monitoring in seconds
)

# Use context manager for memory-intensive operations
with memory_manager.optimize_memory():
    # Run memory-intensive operations here
    results = process_large_documents(documents)

# Get memory usage statistics
memory_stats = memory_manager.get_memory_usage()
print(f"Process memory: {memory_stats['process_rss_bytes'] / (1024*1024):.2f} MB")
```

## Advanced Usage

### Customizing the Context Manager

```python
from efficient_context import ContextManager
from efficient_context.compression import SemanticDeduplicator
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever
from efficient_context.memory import MemoryManager

# Initialize a fully customized context manager
context_manager = ContextManager(
    compressor=SemanticDeduplicator(threshold=0.85),
    chunker=SemanticChunker(chunk_size=256, chunk_overlap=50),
    retriever=CPUOptimizedRetriever(embedding_model="lightweight"),
    memory_manager=MemoryManager(target_usage_percent=80.0),
    max_context_size=4096
)
```

### Integration with LLMs

```python
from efficient_context import ContextManager
from your_llm_library import LLM  # Replace with your actual LLM library

# Initialize components
context_manager = ContextManager()
llm = LLM(model="lightweight-model")

# Process documents
context_manager.add_documents(documents)

# For each query
query = "Tell me about renewable energy"
optimized_context = context_manager.generate_context(query=query)

# Use context with the LLM
response = llm.generate(
    prompt=query,
    context=optimized_context,
    max_tokens=512
)
```

## Performance Considerations

- **Memory Usage**: The library is designed to be memory-efficient, but be aware that embedding models may still require significant memory.
- **CPU Performance**: Choose the appropriate embedding model based on your CPU capabilities. The `lightweight` option is recommended for constrained environments.
- **Batch Size**: Adjust the `batch_size` parameter in retrieval to balance between memory usage and processing speed.
- **Context Size**: Setting an appropriate `max_context_size` can significantly improve performance, especially when working with limited resources. A sample low-resource configuration is sketched below.
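
Taken together, these considerations might translate into a starting configuration like the following. The specific values are illustrative, not benchmarked defaults:

```python
from efficient_context import ContextManager
from efficient_context.retrieval import CPUOptimizedRetriever

# A conservative starting point for a CPU-constrained machine
context_manager = ContextManager(
    retriever=CPUOptimizedRetriever(
        embedding_model="lightweight",  # recommended for constrained CPUs
        use_batching=True,
        batch_size=16,                  # smaller batches trade speed for lower peak memory
    ),
    max_context_size=2048,              # a smaller context keeps generation fast
)
```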

## Extending the Library

You can create custom implementations of the base classes to adapt the library to your specific needs:

```python
from efficient_context.compression.base import BaseCompressor

class MyCustomCompressor(BaseCompressor):
    def __init__(self, custom_param=None):
        self.custom_param = custom_param

    def compress(self, content, target_size=None):
        # Custom compression logic goes here; as a trivial placeholder,
        # keep leading sentences until target_size (in words) is reached.
        if target_size is None:
            return content
        kept, used = [], 0
        for sentence in content.split(". "):
            used += len(sentence.split())
            if used > target_size:
                break
            kept.append(sentence)
        return ". ".join(kept)
```
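
The custom component can then be passed to `ContextManager` in place of the built-in one, exactly as in the Advanced Usage section above:

```python
from efficient_context import ContextManager

# Swap the custom compressor in for the default SemanticDeduplicator
context_manager = ContextManager(compressor=MyCustomCompressor(custom_param=42))
```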

## Troubleshooting

**High Memory Usage**
- Reduce `batch_size` in the retriever
- Use a more lightweight embedding model
- Decrease `max_index_size` to limit the number of chunks stored in memory

**Slow Processing**
- Increase `batch_size` (balancing with memory constraints)
- Lower `threshold` in the SemanticDeduplicator so that more near-duplicate content is removed
- Reduce `chunk_overlap` to minimize redundant processing

Both sets of adjustments are illustrated in the sketch below.
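
A minimal sketch of both kinds of adjustment, using only the parameters documented above (the values are illustrative starting points, not tuned defaults):

```python
from efficient_context.compression import SemanticDeduplicator
from efficient_context.chunking import SemanticChunker
from efficient_context.retrieval import CPUOptimizedRetriever

# Memory-saving retriever: smaller batches, smaller index
retriever = CPUOptimizedRetriever(
    embedding_model="lightweight",
    batch_size=8,         # reduced to lower peak memory
    max_index_size=1000,  # keep fewer chunks resident in memory
)

# Speed-oriented compression and chunking
compressor = SemanticDeduplicator(threshold=0.75)  # lower threshold removes more near-duplicates
chunker = SemanticChunker(chunk_size=512, chunk_overlap=10)  # less overlap means less repeated work
```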

## Example Applications

- **Chatbots on Edge Devices**: Enable context-aware conversations on devices with limited resources
- **Document QA Systems**: Create efficient question-answering systems for large document collections
- **Embedded AI Applications**: Incorporate context-aware LLM capabilities in embedded systems
- **Mobile Applications**: Provide sophisticated LLM features in mobile apps with limited resources