---
title: AI Realizability Index
emoji: 📚
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: "latest"
app_file: app.py
pinned: false
---

# AI Realizability Index - AI Paper Evaluation System

A system that evaluates AI research papers with Claude Sonnet, using asynchronous processing so that multiple papers can be evaluated concurrently.

## Features

- **Daily Paper Crawling**: Automatically fetches papers from Hugging Face daily
- **AI Evaluation**: Uses Claude Sonnet to evaluate papers across multiple dimensions
- **Concurrent Processing**: True asynchronous evaluation with multiple papers processed simultaneously
- **Re-evaluation**: Re-run the evaluation of an already evaluated paper to update its stored results
- **Batch Evaluation**: "Evaluate All" feature to process multiple papers at once
- **Interactive Dashboard**: Beautiful web interface for browsing and evaluating papers
- **Asynchronous Database**: High-performance SQLite with WAL mode for concurrent operations
- **Smart Navigation**: Intelligent date navigation with fallback mechanisms
- **Real-time Status Updates**: Live progress tracking and notifications

## Recent Updates

### v0.1.0 - Asynchronous & Concurrent Features
- **Asynchronous Database**: Migrated from `sqlite3` to `aiosqlite` for better performance
- **Concurrent Evaluation**: Multiple papers can be evaluated simultaneously
- **Re-evaluation**: Added "Re-evaluate" button for papers to update evaluation results
- **Batch Processing**: "Evaluate All" button to process all un-evaluated papers
- **Enhanced UI**: Improved progress indicators and real-time notifications
- **Database Optimization**: WAL mode and performance pragmas for better concurrency

## Hugging Face Spaces Deployment

This application is configured for deployment on Hugging Face Spaces.

### Configuration

- **Port**: 7860 (Hugging Face Spaces standard)
- **Health Check**: `/api/health` endpoint
- **Docker**: Optimized Dockerfile for containerized deployment

### Deployment Steps

1. **Fork/Clone** this repository to your Hugging Face account
2. **Create a new Space** on Hugging Face
3. **Select Docker** as the SDK
4. **Set Environment Variables**:
   - `ANTHROPIC_API_KEY`: Your Anthropic API key for Claude access
5. **Deploy**: The Space will automatically build and deploy

### Environment Variables

```bash
ANTHROPIC_API_KEY=your_api_key_here
PORT=7860  # Optional, defaults to 7860
```
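
As a rough sketch of how these variables might be read at startup, the hypothetical helper below validates the API key and applies the documented port default. The function name `load_settings` and the returned dictionary shape are assumptions for illustration, not code taken from `app.py`.

```python
import os


def load_settings(env=None):
    """Read runtime settings from an environment mapping.

    `env` defaults to os.environ; passing a plain dict makes testing easier.
    This helper is hypothetical and only mirrors the variables listed above.
    """
    env = os.environ if env is None else env

    api_key = env.get("ANTHROPIC_API_KEY")
    if not api_key:
        # Fail early instead of failing on the first evaluation request.
        raise RuntimeError("ANTHROPIC_API_KEY is not set")

    # PORT is optional and falls back to the Hugging Face Spaces default.
    port = int(env.get("PORT", "7860"))
    return {"api_key": api_key, "port": port}
```

Calling `load_settings()` with no arguments reads the real process environment and raises `RuntimeError` if the key is missing.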

## Local Development

### Prerequisites

- Python 3.9+
- Anthropic API key

### Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd paperindex
   ```

2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Set environment variables**:
   ```bash
   export ANTHROPIC_API_KEY=your_api_key_here
   ```

4. **Run the application**:
   ```bash
   python app.py
   ```

5. **Access the application**:
   - Main interface: http://localhost:7860
   - API documentation: http://localhost:7860/docs

## API Endpoints

### Core Endpoints

- `GET /api/daily` - Get daily papers with smart navigation
- `GET /api/paper/{paper_id}` - Get paper details
- `GET /api/eval/{paper_id}` - Get paper evaluation
- `GET /api/health` - Health check endpoint

### Evaluation Endpoints

- `POST /api/papers/evaluate/{arxiv_id}` - Start paper evaluation
- `POST /api/papers/reevaluate/{arxiv_id}` - Re-evaluate a paper
- `GET /api/papers/evaluate/{arxiv_id}/status` - Get evaluation status
- `GET /api/papers/evaluate/active-tasks` - Get currently running evaluations

### Cache Management

- `GET /api/cache/status` - Get cache statistics
- `POST /api/cache/clear` - Clear all cached data
- `POST /api/cache/refresh/{date}` - Refresh cache for specific date
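
A minimal client sketch for the endpoints listed above, using only the standard library. It assumes a server running locally on port 7860 and assumes that responses are JSON; the helper names (`endpoint`, `call`, `evaluate_and_check`) are hypothetical, and the response fields are not specified in this README.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumption: local development server


def endpoint(path, base_url=BASE_URL):
    """Join the base URL and an API path without doubling slashes."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def call(method, path, base_url=BASE_URL, timeout=30):
    """Send a request with no body and decode the response as JSON.

    Assumption: every endpoint returns a JSON document.
    """
    request = urllib.request.Request(endpoint(path, base_url), method=method)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def evaluate_and_check(arxiv_id):
    """Start an evaluation, then fetch its status once."""
    call("POST", f"/api/papers/evaluate/{arxiv_id}")
    return call("GET", f"/api/papers/evaluate/{arxiv_id}/status")
```

With the server running, `call("GET", "/api/health")` is a quick way to confirm the service is reachable before starting evaluations.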

## Architecture

### Frontend
- **HTML/CSS/JavaScript**: Modern, responsive interface
- **Real-time Updates**: Dynamic content loading with polling
- **Theme Support**: Light/dark mode toggle
- **Progress Indicators**: Visual feedback for evaluation status
- **Batch Operations**: "Evaluate All" functionality with sequential processing

### Backend
- **FastAPI**: High-performance web framework
- **Async SQLite**: `aiosqlite` with WAL mode for concurrent operations
- **Async Processing**: Background evaluation tasks with task tracking
- **Concurrent Evaluation**: Multiple papers evaluated simultaneously
- **Caching**: Intelligent caching system for performance

### AI Integration
- **Async Anthropic**: Non-blocking API calls with `AsyncAnthropic`
- **Multi-dimensional Analysis**: Comprehensive evaluation criteria
- **Structured Output**: JSON-based evaluation results
- **Error Handling**: Robust error handling and retry mechanisms
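
The retry policy itself is not described in this README. One common approach, shown here purely as an illustrative sketch, is exponential backoff with jitter around an awaitable call. The helper name `with_retries` and its parameters are assumptions, and the wrapped call stands in for the real model request.

```python
import asyncio
import random


async def with_retries(call, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Await `call()` and retry on failure with exponential backoff.

    `call` is a zero-argument coroutine function. The last failure is
    re-raised once `attempts` is exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except retry_on:
            if attempt == attempts:
                raise
            # Delay doubles on each attempt: base, 2*base, 4*base, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Jitter spreads out retries from concurrent evaluations.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

In practice `retry_on` would be narrowed to transient errors such as timeouts or rate limits, so that permanent failures surface immediately.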

## Database Schema

### Papers Table
```sql
CREATE TABLE papers (
    arxiv_id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    authors TEXT NOT NULL,
    abstract TEXT,
    categories TEXT,
    published_date TEXT,
    evaluation_content TEXT,
    evaluation_score REAL,
    overall_score REAL,
    evaluation_tags TEXT,
    evaluation_status TEXT DEFAULT 'not_started',
    is_evaluated BOOLEAN DEFAULT FALSE,
    evaluation_date TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

### Database Optimizations
- **WAL Mode**: `PRAGMA journal_mode=WAL` for better concurrency
- **Performance Pragmas**: Optimized settings for concurrent access
- **Asynchronous Operations**: All database calls are async/await
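
The sketch below shows how WAL mode can be enabled on a connection. It uses the standard-library `sqlite3` module so that it runs without extra dependencies; the application itself uses `aiosqlite`, where the same `PRAGMA` statements would be awaited. The `synchronous` and `busy_timeout` settings are assumed examples of performance pragmas, since the README does not list the actual ones.

```python
import os
import sqlite3
import tempfile


def open_wal_connection(db_path):
    """Open a SQLite database file and switch it to write-ahead logging.

    Returns the connection and the journal mode reported by SQLite.
    WAL requires a file-backed database; it does not apply to ":memory:".
    """
    conn = sqlite3.connect(db_path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Assumed example pragmas, not necessarily the project's settings.
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute("PRAGMA busy_timeout=5000")  # wait up to 5 s on a locked db
    return conn, mode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn, mode = open_wal_connection(os.path.join(tmp, "papers.db"))
        print("journal mode:", mode)
        conn.close()
```

WAL lets readers proceed while a single writer appends to the log, which suits a workload where several evaluations read paper rows while one writes a result.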

## Evaluation Dimensions

The system evaluates papers across 12 key dimensions:

1. **Task Formalization** - Clarity of problem definition
2. **Data & Resource Availability** - Access to required data
3. **Input-Output Complexity** - Complexity of inputs/outputs
4. **Real-World Interaction** - Practical applicability
5. **Existing AI Coverage** - Current AI capabilities
6. **Automation Barriers** - Technical challenges
7. **Human Originality** - Creative contribution
8. **Safety & Ethics** - Responsible AI considerations
9. **Societal/Economic Impact** - Broader implications
10. **Technical Maturity Needed** - Development requirements
11. **3-Year Feasibility** - Short-term potential
12. **Overall Automatability** - Comprehensive assessment

## Key Features

### Concurrent Evaluation
- Multiple papers can be evaluated simultaneously
- Global task tracking prevents duplicate evaluations
- Real-time status updates via polling
- Automatic error handling and recovery
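
One way to implement global task tracking is a module-level registry keyed by arXiv ID, sketched below with `asyncio`. The names `active_tasks`, `start_evaluation`, and `evaluate_paper` are hypothetical, the paper IDs are made up, and the evaluation body is a placeholder for the real model call.

```python
import asyncio

# Registry of running evaluations, keyed by arXiv ID (hypothetical name).
active_tasks = {}


async def evaluate_paper(arxiv_id):
    """Placeholder for the real evaluation; sleeps instead of calling a model."""
    await asyncio.sleep(0.01)
    return {"arxiv_id": arxiv_id, "status": "completed"}


def start_evaluation(arxiv_id):
    """Start an evaluation unless one is already running for this paper.

    Must be called from within a running event loop.
    Returns (task, started), where started is False for a duplicate request.
    """
    existing = active_tasks.get(arxiv_id)
    if existing is not None and not existing.done():
        return existing, False

    task = asyncio.create_task(evaluate_paper(arxiv_id))
    active_tasks[arxiv_id] = task

    def _cleanup(finished, key=arxiv_id):
        # Only remove the entry if it still points at this task.
        if active_tasks.get(key) is finished:
            del active_tasks[key]

    task.add_done_callback(_cleanup)
    return task, True


async def main():
    first, started_first = start_evaluation("2401.00001")
    second, started_second = start_evaluation("2401.00001")  # duplicate
    other, _ = start_evaluation("2401.00002")  # runs concurrently
    results = await asyncio.gather(first, other)
    return started_first, started_second, first is second, results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Returning the existing task for a duplicate request lets the caller report the in-progress status rather than starting a second model call for the same paper.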

### Re-evaluation System
- "Re-evaluate" button appears after initial evaluation
- Updates existing evaluation results in database
- Maintains evaluation history and timestamps
- Same comprehensive evaluation criteria
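
As an illustration of updating an existing row, the sketch below overwrites the stored evaluation and refreshes the timestamps. It uses the standard-library `sqlite3` module and a trimmed subset of the Papers table columns; the helper name `save_evaluation` and the `'completed'` status value are assumptions, since the README only documents the `'not_started'` default.

```python
import json
import sqlite3

# Trimmed subset of the Papers table, enough for this example.
SCHEMA = """
CREATE TABLE papers (
    arxiv_id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    authors TEXT NOT NULL,
    evaluation_content TEXT,
    overall_score REAL,
    evaluation_status TEXT DEFAULT 'not_started',
    is_evaluated BOOLEAN DEFAULT FALSE,
    evaluation_date TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def save_evaluation(conn, arxiv_id, evaluation, overall_score):
    """Store or overwrite the evaluation for an existing paper row.

    Returns True if a row was updated, False if the paper is unknown.
    """
    cursor = conn.execute(
        """
        UPDATE papers
           SET evaluation_content = ?,
               overall_score = ?,
               evaluation_status = 'completed',  -- assumed status value
               is_evaluated = 1,
               evaluation_date = CURRENT_TIMESTAMP,
               updated_at = CURRENT_TIMESTAMP
         WHERE arxiv_id = ?
        """,
        (json.dumps(evaluation), overall_score, arxiv_id),
    )
    conn.commit()
    return cursor.rowcount == 1
```

Because the first evaluation and a re-evaluation go through the same statement, the "Re-evaluate" path needs no separate write logic in this sketch.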

### Batch Processing
- "Evaluate All" button processes all un-evaluated papers
- Sequential processing with delays to prevent API overload
- Progress tracking and real-time notifications
- Automatic button state management
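
The control flow of sequential processing with delays can be sketched as follows. The README places "Evaluate All" in the frontend, so this Python version only illustrates the same loop; the function name `evaluate_all`, the one-second default delay, and the progress callback signature are all assumptions.

```python
import asyncio


async def evaluate_all(arxiv_ids, evaluate, delay_seconds=1.0, on_progress=None):
    """Evaluate papers one at a time, pausing between requests.

    `evaluate` is a coroutine function taking an arXiv ID. A failure on one
    paper is recorded and does not stop the rest of the batch.
    """
    results = {}
    total = len(arxiv_ids)
    for index, arxiv_id in enumerate(arxiv_ids, start=1):
        try:
            results[arxiv_id] = await evaluate(arxiv_id)
        except Exception as exc:  # keep going; store the error for reporting
            results[arxiv_id] = exc
        if on_progress is not None:
            on_progress(index, total, arxiv_id)
        if index < total:
            # The pause between papers keeps the request rate low.
            await asyncio.sleep(delay_seconds)
    return results
```

Storing exceptions in the result mapping keeps one failed paper from aborting the batch, and the caller can decide afterwards which papers to retry.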

### Enhanced UI/UX
- Progress circles with proper layering
- Bottom-right notification system
- Dynamic button states and text updates
- Responsive design with modern styling

## Performance Optimizations

### Database
- Asynchronous operations with `aiosqlite`
- WAL mode for better concurrency
- Optimized SQLite pragmas
- Connection pooling and management

### API Calls
- Non-blocking Anthropic API calls
- Concurrent evaluation processing
- Task tracking and management
- Error handling and retry logic

### Frontend
- Efficient DOM manipulation
- Polling with appropriate intervals
- Memory management for log entries
- Optimized event handling

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

This project is licensed under the MIT License - see the LICENSE file for details.