Spaces:

zwt963
/

paperindex

Running

File size: 7,766 Bytes

---
title: AI Realizability Index
emoji: 📚
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: "latest"
app_file: app.py
pinned: false
---

# AI Realizability Index - AI Paper Evaluation System

A comprehensive system for evaluating AI research papers using advanced language models with asynchronous processing and concurrent evaluation capabilities.

## Features

- **Daily Paper Crawling**: Automatically fetches papers from Hugging Face daily
- **AI Evaluation**: Uses Claude Sonnet to evaluate papers across multiple dimensions
- **Concurrent Processing**: True asynchronous evaluation with multiple papers processed simultaneously
- **Re-evaluation**: Ability to re-run evaluations for papers with updated results
- **Batch Evaluation**: "Evaluate All" feature to process multiple papers at once
- **Interactive Dashboard**: Beautiful web interface for browsing and evaluating papers
- **Asynchronous Database**: High-performance SQLite with WAL mode for concurrent operations
- **Smart Navigation**: Intelligent date navigation with fallback mechanisms
- **Real-time Status Updates**: Live progress tracking and notifications

## Recent Updates

### v0.1.0 - Asynchronous & Concurrent Features
- **Asynchronous Database**: Migrated from `sqlite3` to `aiosqlite` for better performance
- **Concurrent Evaluation**: Multiple papers can be evaluated simultaneously
- **Re-evaluation**: Added "Re-evaluate" button for papers to update evaluation results
- **Batch Processing**: "Evaluate All" button to process all un-evaluated papers
- **Enhanced UI**: Improved progress indicators and real-time notifications
- **Database Optimization**: WAL mode and performance pragmas for better concurrency

## Hugging Face Spaces Deployment

This application is configured for deployment on Hugging Face Spaces.

### Configuration

- **Port**: 7860 (Hugging Face Spaces standard)
- **Health Check**: `/api/health` endpoint
- **Docker**: Optimized Dockerfile for containerized deployment

### Deployment Steps

1. **Fork/Clone** this repository to your Hugging Face account
2. **Create a new Space** on Hugging Face
3. **Select Docker** as the SDK
4. **Set Environment Variables**:
   - `ANTHROPIC_API_KEY`: Your Anthropic API key for Claude access
5. **Deploy**: The Space will automatically build and deploy

### Environment Variables

```bash
ANTHROPIC_API_KEY=your_api_key_here
PORT=7860  # Optional, defaults to 7860
```

## Local Development

### Prerequisites

- Python 3.9+
- Anthropic API key

### Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd paperindex
   ```

2. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

3. **Set environment variables**:
   ```bash
   export ANTHROPIC_API_KEY=your_api_key_here
   ```

4. **Run the application**:
   ```bash
   python app.py
   ```

5. **Access the application**:
   - Main interface: http://localhost:7860
   - API documentation: http://localhost:7860/docs

## API Endpoints

### Core Endpoints

- `GET /api/daily` - Get daily papers with smart navigation
- `GET /api/paper/{paper_id}` - Get paper details
- `GET /api/eval/{paper_id}` - Get paper evaluation
- `GET /api/health` - Health check endpoint

### Evaluation Endpoints

- `POST /api/papers/evaluate/{arxiv_id}` - Start paper evaluation
- `POST /api/papers/reevaluate/{arxiv_id}` - Re-evaluate a paper
- `GET /api/papers/evaluate/{arxiv_id}/status` - Get evaluation status
- `GET /api/papers/evaluate/active-tasks` - Get currently running evaluations

### Cache Management

- `GET /api/cache/status` - Get cache statistics
- `POST /api/cache/clear` - Clear all cached data
- `POST /api/cache/refresh/{date}` - Refresh cache for specific date

## Architecture

### Frontend
- **HTML/CSS/JavaScript**: Modern, responsive interface
- **Real-time Updates**: Dynamic content loading with polling
- **Theme Support**: Light/dark mode toggle
- **Progress Indicators**: Visual feedback for evaluation status
- **Batch Operations**: "Evaluate All" functionality with sequential processing

### Backend
- **FastAPI**: High-performance web framework
- **Async SQLite**: `aiosqlite` with WAL mode for concurrent operations
- **Async Processing**: Background evaluation tasks with task tracking
- **Concurrent Evaluation**: Multiple papers evaluated simultaneously
- **Caching**: Intelligent caching system for performance

### AI Integration
- **Async Anthropic**: Non-blocking API calls with `AsyncAnthropic`
- **Multi-dimensional Analysis**: Comprehensive evaluation criteria
- **Structured Output**: JSON-based evaluation results
- **Error Handling**: Robust error handling and retry mechanisms

## Database Schema

### Papers Table
```sql
CREATE TABLE papers (
    arxiv_id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    authors TEXT NOT NULL,
    abstract TEXT,
    categories TEXT,
    published_date TEXT,
    evaluation_content TEXT,
    evaluation_score REAL,
    overall_score REAL,
    evaluation_tags TEXT,
    evaluation_status TEXT DEFAULT 'not_started',
    is_evaluated BOOLEAN DEFAULT FALSE,
    evaluation_date TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```

### Database Optimizations
- **WAL Mode**: `PRAGMA journal_mode=WAL` for better concurrency
- **Performance Pragmas**: Optimized settings for concurrent access
- **Asynchronous Operations**: All database calls are async/await

## Evaluation Dimensions

The system evaluates papers across 12 key dimensions:

1. **Task Formalization** - Clarity of problem definition
2. **Data & Resource Availability** - Access to required data
3. **Input-Output Complexity** - Complexity of inputs/outputs
4. **Real-World Interaction** - Practical applicability
5. **Existing AI Coverage** - Current AI capabilities
6. **Automation Barriers** - Technical challenges
7. **Human Originality** - Creative contribution
8. **Safety & Ethics** - Responsible AI considerations
9. **Societal/Economic Impact** - Broader implications
10. **Technical Maturity Needed** - Development requirements
11. **3-Year Feasibility** - Short-term potential
12. **Overall Automatability** - Comprehensive assessment

## Key Features

### Concurrent Evaluation
- Multiple papers can be evaluated simultaneously
- Global task tracking prevents duplicate evaluations
- Real-time status updates via polling
- Automatic error handling and recovery

### Re-evaluation System
- "Re-evaluate" button appears after initial evaluation
- Updates existing evaluation results in database
- Maintains evaluation history and timestamps
- Same comprehensive evaluation criteria

### Batch Processing
- "Evaluate All" button processes all un-evaluated papers
- Sequential processing with delays to prevent API overload
- Progress tracking and real-time notifications
- Automatic button state management

### Enhanced UI/UX
- Progress circles with proper layering
- Bottom-right notification system
- Dynamic button states and text updates
- Responsive design with modern styling

## Performance Optimizations

### Database
- Asynchronous operations with `aiosqlite`
- WAL mode for better concurrency
- Optimized SQLite pragmas
- Connection pooling and management

### API Calls
- Non-blocking Anthropic API calls
- Concurrent evaluation processing
- Task tracking and management
- Error handling and retry logic

### Frontend
- Efficient DOM manipulation
- Polling with appropriate intervals
- Memory management for log entries
- Optimized event handling

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## License

This project is licensed under the MIT License - see the LICENSE file for details.