zwt963/paperindex

DVampire committed on Aug 15

Commit 78f6650

1 Parent(s): 6c5cf21

update website

Files changed (15)

DATABASE_MIGRATION_SUMMARY.md +0 -147
DATABASE_USAGE.md +0 -182
PROJECT_STRUCTURE.md +0 -87
app.py +173 -75
frontend/index.html +10 -4
frontend/main.js +250 -15
frontend/paper.js +22 -2
frontend/styles.css +150 -2
requirements.txt +1 -0
src/agents/evaluator.py +5 -5
src/database/db.py +101 -92
debug_comparison.py → test/debug_comparison.py +0 -0
test/test_async_db.py +138 -0
test/test_concurrent_eval.py +97 -0
test_evaluation.py → test/test_evaluation.py +7 -7

DATABASE_MIGRATION_SUMMARY.md DELETED

@@ -1,147 +0,0 @@
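-# Database Migration Summary
-## Overview
-The system has been migrated from JSON file storage to a SQLite database. The evaluation of each arXiv paper is now stored in the database, which makes the data easier to manage and query.
-## Main Changes
-### 1. Database schema (`src/database/db.py`)
-**New papers table:**
-- `arxiv_id`: unique paper identifier
-- `title`, `authors`, `abstract`: basic paper information
-- `evaluation_content`: evaluation content (JSON format)
-- `evaluation_score`: overall automatability score
-- `evaluation_tags`: evaluation tags
-- `is_evaluated`: evaluation status flag
-- `evaluation_date`: time of evaluation
-- `created_at`, `updated_at`: timestamps
-**New database methods:**
-- `insert_paper()`: insert a new paper
-- `get_paper()`: fetch a single paper
-- `update_paper_evaluation()`: update the evaluation content
-- `get_evaluated_papers()`: fetch evaluated papers
-- `get_unevaluated_papers()`: fetch unevaluated papers
-- `search_papers()`: search papers
-- `get_papers_count()`: fetch paper counts
-### 2. Evaluator changes (`src/agents/evaluator.py`)
-**ConversationState class:**
-- Added an `arxiv_id` field
-**save_node function:**
-- Now saves to the database instead of a JSON file
-- Automatically extracts score and tag information
-- Supports structured data storage
-**run_evaluation function:**
-- Added support for an `arxiv_id` parameter
-### 3. API changes (`app.py`)
-**Modified endpoints:**
-- `/api/evals`: fetches the evaluation list from the database
-- `/api/has-eval/{paper_id}`: checks the evaluation status in the database
-- `/api/eval/{paper_id}`: fetches the evaluation content from the database
-**New endpoints:**
-- `/api/papers/status`: fetches paper statistics
-- `/api/papers/insert`: inserts a new paper
-- `/api/papers/evaluate/{arxiv_id}`: evaluates a paper
-### 4. CLI changes (`src/cli/cli.py`)
-**New argument:**
-- `--arxiv-id`: specifies the paper's arXiv ID
-**Enhancements:**
-- Evaluation results can be saved to the database
-- Backward compatible (results can still be saved to a file)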
-## Usage Examples
-### 1. Evaluate a paper with the CLI and save it to the database
-```bash
-# Evaluate a paper and save the result to the database
-python cli.py https://arxiv.org/pdf/2508.05629 --arxiv-id 2508.05629
-# Save to both a file and the database
-python cli.py https://arxiv.org/pdf/2508.05629 --arxiv-id 2508.05629 -o /path/to/output
-```
-### 2. Insert a paper via the API
-```bash
-curl -X POST "http://localhost:8000/api/papers/insert" \
-  -H "Content-Type: application/json" \
-  -d '{
-    "arxiv_id": "2508.05629",
-    "title": "Your Paper Title",
-    "authors": "Author 1, Author 2",
-    "abstract": "Paper abstract...",
-    "categories": "cs.AI, cs.LG",
-    "published_date": "2024-08-01"
-  }'
-```
-### 3. Get evaluation statistics
-```bash
-curl "http://localhost:8000/api/papers/status"
-```
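-## Database Advantages
-1. **Structured storage**: paper information and evaluation content are stored separately, which simplifies management
-2. **Status tracking**: the `is_evaluated` field tracks evaluation status
-3. **Tagging**: evaluations can be tagged for categorization and filtering
-4. **Search**: papers can be searched by title, author, and abstract
-5. **Statistics**: paper statistics are easy to retrieve
-6. **API support**: a complete RESTful API
-7. **Data integrity**: SQLite provides ACID guarantees
-## Migration Notes
-1. **Existing JSON files**: a script can be written to import existing JSON files into the database
-2. **Database backups**: back up the database file regularly
-3. **Backward compatibility**: the CLI still supports saving to a file
-4. **Path configuration**: the database file path is configured in `configs/paper_agent.py`
-## Testing
-A test script was created and run to verify all database functionality:
-- ✅ Paper insertion
-- ✅ Paper lookup
-- ✅ Evaluation update
-- ✅ Status check
-- ✅ Statistics
-- ✅ Search
-## Suggested Next Steps
-1. **Data migration**: write a script to import existing JSON files into the database
-2. **Frontend update**: update the frontend to support the new database features
-3. **Batch operations**: add batch paper insertion and evaluation
-4. **Data export**: add a data export feature
-5. **Performance**: add indexes to optimize for large data volumes
-## File List
-**Modified files:**
-- `src/database/db.py` - database schema and operations
-- `src/agents/evaluator.py` - evaluator changes
-- `app.py` - API changes
-- `src/cli/cli.py` - CLI changes
-**New files:**
-- `DATABASE_USAGE.md` - usage documentation
-- `DATABASE_MIGRATION_SUMMARY.md` - this summary
-**Configuration file:**
-- `configs/paper_agent.py` - database path configuration
-The system now fully supports database storage, which makes paper evaluation data easier to manage!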
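
The deleted summary states that `save_node` extracts score and tag information automatically, but it does not show how. The following is a minimal sketch of one way this could work. It assumes the evaluation JSON carries the `overall_automatability` and `three_year_feasibility` keys and that tags use the `3yr_feasibility:75%,overall_automatability:3/4` format, both taken from the example in the deleted `DATABASE_USAGE.md`. The helper name `extract_score_and_tags` is hypothetical and does not come from the repository.

```python
import json
from typing import Optional, Tuple


def extract_score_and_tags(evaluation_content: str) -> Tuple[Optional[float], Optional[str]]:
    """Derive (evaluation_score, evaluation_tags) from an evaluation JSON string.

    Hypothetical helper: the key names and the tag format are assumptions
    based on the usage example, not on the actual save_node code.
    """
    try:
        data = json.loads(evaluation_content)
    except (TypeError, ValueError):
        # Not valid JSON (or not a string at all): nothing to extract.
        return None, None
    if not isinstance(data, dict):
        return None, None

    tags = []
    feasibility = data.get("three_year_feasibility")
    if feasibility is not None:
        tags.append(f"3yr_feasibility:{feasibility}%")

    score = None
    overall = data.get("overall_automatability")
    if overall is not None:
        score = float(overall)
        tags.append(f"overall_automatability:{overall}/4")

    # Return None instead of an empty string when no tag could be built.
    return score, (",".join(tags) or None)


score, tags = extract_score_and_tags(
    '{"overall_automatability": 3, "three_year_feasibility": 75}'
)
print(score, tags)
```

Returning `None` for unparseable input keeps the two columns nullable, which matches the `evaluation_score=None` and `evaluation_tags=None` placeholders used in the migration script of the deleted usage guide.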

DATABASE_USAGE.md DELETED

@@ -1,182 +0,0 @@
-# Papers Database Usage Guide
-## Overview
-The system now stores arXiv papers and their evaluations in a SQLite database instead of JSON files. This makes paper data easier to manage and supports querying, statistics, and tag management.
-## Database Schema
-### papers table
-| Column | Type | Description |
-|------|------|------|
-| arxiv_id | TEXT PRIMARY KEY | arXiv paper ID |
-| title | TEXT NOT NULL | Paper title |
-| authors | TEXT NOT NULL | Author list |
-| abstract | TEXT | Paper abstract |
-| categories | TEXT | Paper categories |
-| published_date | TEXT | Publication date |
-| evaluation_content | TEXT | Evaluation content (JSON format) |
-| evaluation_score | REAL | Overall automatability score |
-| evaluation_tags | TEXT | Evaluation tags |
-| is_evaluated | BOOLEAN | Whether the paper has been evaluated |
-| evaluation_date | TIMESTAMP | Evaluation date |
-| created_at | TIMESTAMP | Creation time |
-| updated_at | TIMESTAMP | Last update time |
-## Usage
-### 1. Insert a paper
-```python
-from src.database.db import db
-# Insert a new paper
-db.insert_paper(
-    arxiv_id="2508.05629",
-    title="Your Paper Title",
-    authors="Author 1, Author 2",
-    abstract="Paper abstract...",
-    categories="cs.AI, cs.LG",
-    published_date="2024-08-01"
-)
-```
-### 2. Update an evaluation
-```python
-# Update the evaluation of a paper
-db.update_paper_evaluation(
-    arxiv_id="2508.05629",
-    evaluation_content='{"overall_automatability": 3, "three_year_feasibility": 75}',
-    evaluation_score=3.0,
-    evaluation_tags="3yr_feasibility:75%,overall_automatability:3/4"
-)
-```
-### 3. Query papers
-```python
-# Fetch a single paper
-paper = db.get_paper("2508.05629")
-# Fetch all evaluated papers
-evaluated_papers = db.get_evaluated_papers()
-# Fetch all unevaluated papers
-unevaluated_papers = db.get_unevaluated_papers()
-# Search papers
-search_results = db.search_papers("AI")
-```
-### 4. Statistics
-```python
-# Fetch paper counts
-count = db.get_papers_count()
-print(f"Total papers: {count['total']}")
-print(f"Evaluated: {count['evaluated']}")
-print(f"Unevaluated: {count['unevaluated']}")
-```
-## API Endpoints
-### List evaluations
-```
-GET /api/evals
-```
-### Check whether a paper has been evaluated
-```
-GET /api/has-eval/{paper_id}
-```
-### Get a paper's evaluation
-```
-GET /api/eval/{paper_id}
-```
-### Get paper statistics
-```
-GET /api/papers/status
-```
-### Insert a new paper
-```
-POST /api/papers/insert
-Content-Type: application/json
-{
-    "arxiv_id": "2508.05629",
-    "title": "Paper Title",
-    "authors": "Author 1, Author 2",
-    "abstract": "Abstract...",
-    "categories": "cs.AI",
-    "published_date": "2024-08-01"
-}
-```
-### Evaluate a paper
-```
-POST /api/papers/evaluate/{arxiv_id}
-```
-## CLI Usage
-### Evaluate a paper and save it to the database
-```bash
-# Pass --arxiv-id to save the evaluation to the database
-python cli.py https://arxiv.org/pdf/2508.05629 --arxiv-id 2508.05629
-# Save to both a file and the database
-python cli.py https://arxiv.org/pdf/2508.05629 --arxiv-id 2508.05629 -o /path/to/output
-```
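-## Migrating Existing Data
-If you have existing JSON evaluation files, you can write a script to import them into the database:
-```python
-import json
-import os
-from src.database.db import db
-def migrate_json_to_db(json_dir="workdir"):
-    """Migrate JSON files into the database."""
-    for filename in os.listdir(json_dir):
-        if filename.endswith('.json'):
-            filepath = os.path.join(json_dir, filename)
-            with open(filepath, 'r') as f:
-                data = json.load(f)
-            # Extract the arxiv_id (assumes the file name contains it)
-            arxiv_id = filename.split('_')[0]  # Adjust to the actual file name format
-            # Update the evaluation in the database
-            if 'response' in data:
-                db.update_paper_evaluation(
-                    arxiv_id=arxiv_id,
-                    evaluation_content=data['response'],
-                    evaluation_score=None,  # Must be parsed from the content
-                    evaluation_tags=None
-                )
-                print(f"Migrated {filename} for paper {arxiv_id}")
-```
-## Advantages
-1. **Structured storage**: paper information and evaluation content are stored separately, which simplifies querying
-2. **Tagging**: evaluations can be tagged for categorization and filtering
-3. **Statistics**: paper statistics are easy to retrieve
-4. **Search**: papers can be searched by title, author, and abstract
-5. **Status management**: the `is_evaluated` field tracks evaluation status
-6. **API support**: a complete RESTful API
-## Notes
-1. Insert a paper's basic information before evaluating it
-2. Store evaluation content as JSON so that it is easy to parse and display
-3. Back up the database file regularly
-4. The `evaluation_tags` field can hold key scores for quick filtering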
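
The schema table in the deleted `DATABASE_USAGE.md` maps directly onto a `CREATE TABLE` statement. The sketch below uses the standard-library `sqlite3` module with an in-memory database to illustrate it. Column names and types come from that table. The `DEFAULT` clauses, the SQL statements, and the variable names are assumptions made for illustration and may differ from what `src/database/db.py` actually does.

```python
import sqlite3

# Column names and types follow the documented papers table.
# The DEFAULT clauses are assumptions added for this sketch.
PAPERS_SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    arxiv_id           TEXT PRIMARY KEY,
    title              TEXT NOT NULL,
    authors            TEXT NOT NULL,
    abstract           TEXT,
    categories         TEXT,
    published_date     TEXT,
    evaluation_content TEXT,
    evaluation_score   REAL,
    evaluation_tags    TEXT,
    is_evaluated       BOOLEAN   DEFAULT 0,
    evaluation_date    TIMESTAMP,
    created_at         TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at         TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""

UPDATE_EVALUATION = """
UPDATE papers
SET evaluation_content = ?, evaluation_score = ?, evaluation_tags = ?,
    is_evaluated = 1, evaluation_date = CURRENT_TIMESTAMP,
    updated_at = CURRENT_TIMESTAMP
WHERE arxiv_id = ?
"""

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows row["column"] access
conn.execute(PAPERS_SCHEMA)

# Insert the basic paper record first.
conn.execute(
    "INSERT INTO papers (arxiv_id, title, authors) VALUES (?, ?, ?)",
    ("2508.05629", "Your Paper Title", "Author 1, Author 2"),
)

# Attach an evaluation to the existing row.
evaluation = ('{"overall_automatability": 3}', 3.0, "overall_automatability:3/4")
updated = conn.execute(UPDATE_EVALUATION, evaluation + ("2508.05629",)).rowcount

# Updating a paper that was never inserted touches no rows.
missing = conn.execute(UPDATE_EVALUATION, evaluation + ("0000.00000",)).rowcount
conn.commit()

counts = conn.execute(
    "SELECT COUNT(*) AS total, COALESCE(SUM(is_evaluated), 0) AS evaluated FROM papers"
).fetchone()
print(counts["total"], counts["evaluated"], updated, missing)
```

The second `UPDATE` shows why the first note says to insert a paper before evaluating it: an update keyed on an unknown `arxiv_id` affects zero rows and does not raise an error, so the evaluation would be lost unless the caller checks `rowcount`.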

PROJECT_STRUCTURE.md DELETED

@@ -1,87 +0,0 @@
-# PaperIndex Project Structure
-## Directory Layout
-```
-paperindex/
-├── app.py                 # Main application entry point
-├── cli.py                 # Command-line tool entry point
-├── src/                   # Source code directory
-│   ├── __init__.py
-│   ├── app.py            # Internal application entry point (deprecated)
-│   ├── agents/           # AI agent module
-│   │   ├── __init__.py
-│   │   ├── evaluator.py  # Paper evaluator
-│   │   └── prompt.py     # Evaluation prompts
-│   ├── database/         # Database module
-│   │   ├── __init__.py
-│   │   ├── models.py     # Database models and classes
-│   │   └── papers_cache.db
-│   ├── server/           # Server module
-│   │   ├── __init__.py
-│   │   └── server.py     # FastAPI server
-│   └── cli/              # Command-line tool module
-│       ├── __init__.py
-│       └── cli.py        # CLI implementation
-├── frontend/             # Frontend files
-│   ├── index.html
-│   ├── paper.html
-│   ├── main.js
-│   ├── paper.js
-│   └── styles.css
-├── data/                 # Data directory
-│   └── pdfs/
-├── workdir/              # Working directory
-├── requirements.txt      # Python dependencies
-├── Dockerfile           # Docker configuration
-└── README.md            # Project description
-```
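-## Modules
-### `src/agents/`
-AI agent module, responsible for paper evaluation:
-- `evaluator.py`: evaluates papers using LangGraph and the Claude API
-- `prompt.py`: contains the evaluation prompts and tool definitions
-### `src/database/`
-Database management module:
-- `models.py`: contains the PapersDatabase class and database operations
-- Contains the SQLite database file
-- Handles paper caching and status management
-### `src/server/`
-FastAPI server module:
-- `server.py`: main web server implementation
-- Provides the RESTful API
-- Handles frontend requests
-### `src/cli/`
-Command-line tool module:
-- `cli.py`: standalone command-line tool for paper evaluation
-- Supports evaluating local PDFs and online URLs
-## Usage
-### Start the web application
-```bash
-python app.py
-```
-### Use the command-line tool
-```bash
-python cli.py <pdf_path_or_url> [options]
-```
-### Development mode
-```bash
-# Run from the src directory
-cd src
-python -m uvicorn server.server:app --reload --host 0.0.0.0 --port 8000
-```
-## Import Paths
-- From the repository root: `from src.agents.evaluator import Evaluator`
-- From inside the src directory: `from agents.evaluator import Evaluator`
-- Imports between modules use either relative or absolute paths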

app.py CHANGED

@@ -25,7 +25,6 @@ from src.database import db
 from src.logger import logger
 from src.config import config
 from src.crawl import HuggingFaceDailyPapers
-from src.utils import assemble_project_path
 from src.agents.evaluator import run_evaluation
 app = FastAPI(title="PaperAgent")
@@ -67,8 +66,8 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
     hf_daily = HuggingFaceDailyPapers()
     # First, check if we have fresh cache for the requested date
-    cached_data = db.get_cached_papers(target_date)
-    if cached_data and db.is_cache_fresh(target_date):
         print(f"Using cached data for {target_date}")
         return {
             "date": target_date,
@@ -91,8 +90,8 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
                 print(f"Redirected from {target_date} to {actual_date}")
                 # Check if the redirected date has fresh cache
-                cached_data = db.get_cached_papers(actual_date)
-                if cached_data and db.is_cache_fresh(actual_date):
                     print(f"Using cached data for redirected date {actual_date}")
                     return {
                         "date": actual_date,
@@ -108,7 +107,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
                 enriched_cards = await enrich_cards(cards)
                 # Cache the results for the redirected date
-                db.cache_papers(actual_date, html, enriched_cards)
                 return {
                     "date": actual_date,
@@ -121,7 +120,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
             # If we got the exact date we requested, process normally
             cards = hf_daily.parse_daily_cards(html)
             enriched_cards = await enrich_cards(cards)
-            db.cache_papers(actual_date, html, enriched_cards)
             return {
                 "date": actual_date,
@@ -134,7 +133,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
         except Exception as e:
             print(f"Failed to fetch {target_date} for previous navigation: {e}")
             # Fallback to cached data if available
-            cached_data = db.get_cached_papers(target_date)
             if cached_data:
                 return {
                     "date": target_date,
@@ -157,7 +156,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
             if actual_date == target_date:
                 cards = hf_daily.parse_daily_cards(html)
                 enriched_cards = await enrich_cards(cards)
-                db.cache_papers(actual_date, html, enriched_cards)
                 return {
                     "date": actual_date,
@@ -174,8 +173,8 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
             # Try to find the next available date by incrementing
             next_date = await find_next_available_date_forward(target_date)
             if next_date:
-                cached_data = db.get_cached_papers(next_date)
-                if cached_data and db.is_cache_fresh(next_date):
                     print(f"Using cached data for next available date {next_date}")
                     return {
                         "date": next_date,
@@ -190,7 +189,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
                 actual_date, html = await hf_daily.fetch_daily_html(next_date)
                 cards = hf_daily.parse_daily_cards(html)
                 enriched_cards = await enrich_cards(cards)
-                db.cache_papers(actual_date, html, enriched_cards)
                 return {
                     "date": actual_date,
@@ -214,7 +213,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
             # Try to find next available date
             next_date = await find_next_available_date_forward(target_date)
             if next_date:
-                cached_data = db.get_cached_papers(next_date)
                 if cached_data:
                     return {
                         "date": next_date,
@@ -239,8 +238,8 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
                 print(f"Redirected from {target_date} to {actual_date}")
                 # Check if the redirected date has fresh cache
-                cached_data = db.get_cached_papers(actual_date)
-                if cached_data and db.is_cache_fresh(actual_date):
                     print(f"Using cached data for redirected date {actual_date}")
                     return {
                         "date": actual_date,
@@ -256,7 +255,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
                 enriched_cards = await enrich_cards(cards)
                 # Cache the results for the redirected date
-                db.cache_papers(actual_date, html, enriched_cards)
                 return {
                     "date": actual_date,
@@ -269,7 +268,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
             # If we got the exact date we requested, process normally
             cards = hf_daily.parse_daily_cards(html)
             enriched_cards = await enrich_cards(cards)
-            db.cache_papers(actual_date, html, enriched_cards)
             return {
                 "date": actual_date,
@@ -283,7 +282,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
             print(f"Failed to fetch {target_date}: {e}")
             # If everything fails, return cached data if available
-            cached_data = db.get_cached_papers(target_date)
             if cached_data:
                 return {
                     "date": target_date,
@@ -309,7 +308,7 @@ async def find_next_available_date_forward(start_date: str, max_attempts: int =
         date_str = current_date.strftime("%Y-%m-%d")
         # Check if we have cache for this date
-        cached_data = db.get_cached_papers(date_str)
         if cached_data:
             return date_str
@@ -338,7 +337,7 @@ async def enrich_cards(cards):
     for c in cards:
         arxiv_id = c.get("arxiv_id")
         if arxiv_id:
-            paper = db.get_paper(arxiv_id)
             if paper:
                 # Add evaluation status
                 c["has_eval"] = paper.get('is_evaluated', False)
@@ -369,9 +368,9 @@ async def enrich_cards(cards):
 @app.get("/api/evals")
-def list_evals() -> Dict[str, Any]:
     # Get evaluated papers from database
-    evaluated_papers = db.get_evaluated_papers()
     items: List[Dict[str, Any]] = []
     for paper in evaluated_papers:
@@ -388,16 +387,16 @@ def list_evals() -> Dict[str, Any]:
 @app.get("/api/has-eval/{paper_id}")
-def has_eval(paper_id: str) -> Dict[str, bool]:
-    paper = db.get_paper(paper_id)
     exists = paper is not None and paper.get('is_evaluated', False)
     return {"exists": exists}
 @app.get("/api/paper/{paper_id}")
-def get_paper_details(paper_id: str) -> Dict[str, Any]:
     """Get detailed paper information from database"""
-    paper = db.get_paper(paper_id)
     if not paper:
         raise HTTPException(status_code=404, detail="Paper not found")
@@ -416,8 +415,8 @@ def get_paper_details(paper_id: str) -> Dict[str, Any]:
 @app.get("/api/paper-score/{paper_id}")
-def get_paper_score(paper_id: str) -> Dict[str, Any]:
-    paper = db.get_paper(paper_id)
     print(f"Paper data for {paper_id}:", paper)
     if not paper or not paper.get('is_evaluated', False):
@@ -468,8 +467,8 @@ def get_paper_score(paper_id: str) -> Dict[str, Any]:
 @app.get("/api/eval/{paper_id}")
-def get_eval(paper_id: str) -> Any:
-    paper = db.get_paper(paper_id)
     if not paper or not paper.get('is_evaluated', False):
         raise HTTPException(status_code=404, detail="Evaluation not found")
@@ -491,12 +490,13 @@ def get_eval(paper_id: str) -> Any:
 @app.get("/api/available-dates")
-def get_available_dates() -> Dict[str, Any]:
     """Get list of available dates in the cache"""
-    with db.get_connection() as conn:
-        cursor = conn.cursor()
-        cursor.execute('SELECT date_str FROM papers_cache ORDER BY date_str DESC LIMIT 30')
-        dates = [row['date_str'] for row in cursor.fetchall()]
         return {
             "available_dates": dates,
@@ -505,21 +505,21 @@ def get_available_dates() -> Dict[str, Any]:
 @app.get("/api/cache/status")
-def get_cache_status() -> Dict[str, Any]:
     """Get cache status and statistics"""
-    with db.get_connection() as conn:
-        cursor = conn.cursor()
         # Get total cached dates
-        cursor.execute('SELECT COUNT(*) as count FROM papers_cache')
-        total_cached = cursor.fetchone()['count']
         # Get latest cached date
-        cursor.execute('SELECT date_str, updated_at FROM latest_date WHERE id = 1')
-        latest_info = cursor.fetchone()
         # Get cache age distribution
-        cursor.execute('''
             SELECT
                 CASE
                     WHEN updated_at > datetime('now', '-1 hour') THEN '1 hour'
@@ -531,7 +531,8 @@ def get_cache_status() -> Dict[str, Any]:
             FROM papers_cache
             GROUP BY age_group
         ''')
-        age_distribution = {row['age_group']: row['count'] for row in cursor.fetchall()}
         return {
             "total_cached_dates": total_cached,
@@ -542,12 +543,12 @@ def get_cache_status() -> Dict[str, Any]:
 @app.get("/api/papers/status")
-def get_papers_status() -> Dict[str, Any]:
     """Get papers database status and statistics"""
-    papers_count = db.get_papers_count()
     # Get recent evaluations
-    recent_papers = db.get_evaluated_papers()
     recent_evaluations = []
     for paper in recent_papers[:10]:  # Get last 10 evaluations
         recent_evaluations.append({
@@ -564,7 +565,7 @@ def get_papers_status() -> Dict[str, Any]:
 @app.post("/api/papers/insert")
-def insert_paper(paper_data: Dict[str, Any]) -> Dict[str, Any]:
     """Insert a new paper into the database"""
     try:
         required_fields = ['arxiv_id', 'title', 'authors']
@@ -572,7 +573,7 @@ def insert_paper(paper_data: Dict[str, Any]) -> Dict[str, Any]:
             if field not in paper_data:
                 raise HTTPException(status_code=400, detail=f"Missing required field: {field}")
-        db.insert_paper(
             arxiv_id=paper_data['arxiv_id'],
             title=paper_data['title'],
             authors=paper_data['authors'],
@@ -586,19 +587,26 @@ def insert_paper(paper_data: Dict[str, Any]) -> Dict[str, Any]:
         raise HTTPException(status_code=500, detail=f"Failed to insert paper: {str(e)}")
 @app.post("/api/papers/evaluate/{arxiv_id}")
-async def evaluate_paper(arxiv_id: str) -> Dict[str, Any]:
     """Evaluate a paper by its arxiv_id"""
     try:
         # Check if paper exists in database
-        paper = db.get_paper(arxiv_id)
         if not paper:
             raise HTTPException(status_code=404, detail="Paper not found in database")
-        # Check if already evaluated
-        if paper.get('is_evaluated', False):
             return {"message": f"Paper {arxiv_id} already evaluated", "status": "already_evaluated"}
         # Create PDF URL from arxiv_id
         pdf_url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"
@@ -606,8 +614,8 @@ async def evaluate_paper(arxiv_id: str) -> Dict[str, Any]:
         async def run_eval():
             try:
                 # Update paper status to "evaluating"
-                db.update_paper_status(arxiv_id, "evaluating")
-                logger.info(f"Started evaluation for {arxiv_id}")
                 result = await run_evaluation(
                     pdf_path=pdf_url,
@@ -616,40 +624,51 @@ async def evaluate_paper(arxiv_id: str) -> Dict[str, Any]:
                 )
                 # Update paper status to "completed"
-                db.update_paper_status(arxiv_id, "completed")
-                logger.info(f"Evaluation completed for {arxiv_id}")
             except Exception as e:
                 # Update paper status to "failed"
-                db.update_paper_status(arxiv_id, "failed")
-                logger.error(f"Evaluation failed for {arxiv_id}: {str(e)}")
-        # Start evaluation in background
-        asyncio.create_task(run_eval())
         return {
-            "message": f"Evaluation started for paper {arxiv_id}",
             "status": "started",
-            "pdf_url": pdf_url
         }
     except Exception as e:
         raise HTTPException(status_code=500, detail=f"Failed to evaluate paper: {str(e)}")
 @app.get("/api/papers/evaluate/{arxiv_id}/status")
-def get_evaluation_status(arxiv_id: str) -> Dict[str, Any]:
     """Get evaluation status for a paper"""
     try:
-        paper = db.get_paper(arxiv_id)
         if not paper:
             raise HTTPException(status_code=404, detail="Paper not found")
         status = paper.get('evaluation_status', 'not_started')
         is_evaluated = paper.get('is_evaluated', False)
         return {
             "arxiv_id": arxiv_id,
             "status": status,
             "is_evaluated": is_evaluated,
             "evaluation_date": paper.get('evaluation_date'),
             "evaluation_score": paper.get('evaluation_score')
         }
@@ -657,13 +676,88 @@ def get_evaluation_status(arxiv_id: str) -> Dict[str, Any]:
         raise HTTPException(status_code=500, detail=f"Failed to get evaluation status: {str(e)}")
 @app.post("/api/cache/clear")
-def clear_cache() -> Dict[str, str]:
     """Clear all cached data"""
-    with db.get_connection() as conn:
-        cursor = conn.cursor()
-        cursor.execute('DELETE FROM papers_cache')
-        conn.commit()
     return {"message": "Cache cleared successfully"}
@@ -679,7 +773,7 @@ async def refresh_cache(date_str: str) -> Dict[str, Any]:
         cards = hf_daily.parse_daily_cards(html)
         # Cache the results
-        db.cache_papers(actual_date, html, cards)
         return {
             "message": f"Cache refreshed for {actual_date}",
@@ -711,7 +805,7 @@ async def get_styles():
     response.headers["Expires"] = "0"
     return response
-if __name__ == "__main__":
     # Parse command line arguments
     args = parse_args()
@@ -724,7 +818,7 @@ if __name__ == "__main__":
     logger.info(f"| Config:\n{config.pretty_text}")
     # Initialize the database
-    db.init_db(config=config)
     logger.info(f"| Database initialized at: {config.db_path}")
     # Load Frontend
@@ -733,5 +827,9 @@ if __name__ == "__main__":
     logger.info(f"| Frontend initialized at: {config.frontend_path}")
     # Use the PORT environment variable when set (e.g. on Hugging Face Spaces); default to 7860 otherwise
-    port = int(os.environ.get("PORT", 7860))
-    uvicorn.run(app, host="0.0.0.0", port=port)
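
The new side of this diff adds a module-level `evaluation_tasks` dictionary and checks `evaluation_tasks[arxiv_id].done()` before starting another evaluation. The following is a minimal, self-contained sketch of that de-duplication pattern. It assumes tasks are created with `asyncio.create_task` (as in the removed code) and stored under the paper ID. `fake_evaluation` and `start_evaluation` are hypothetical stand-ins, not functions from the repository.

```python
import asyncio
from typing import Dict, List

# Tracks the most recent evaluation task per paper, keyed by arXiv ID.
evaluation_tasks: Dict[str, asyncio.Task] = {}


async def fake_evaluation(arxiv_id: str) -> str:
    """Hypothetical stand-in for the real, long-running evaluation."""
    await asyncio.sleep(0.05)
    return f"evaluated {arxiv_id}"


def start_evaluation(arxiv_id: str) -> str:
    """Start an evaluation unless one is still running for this paper."""
    existing = evaluation_tasks.get(arxiv_id)
    if existing is not None and not existing.done():
        return "already_running"
    # Keeping a reference in the dict also prevents the task from being
    # garbage-collected before it finishes.
    evaluation_tasks[arxiv_id] = asyncio.create_task(fake_evaluation(arxiv_id))
    return "started"


async def main() -> List[str]:
    statuses = [
        start_evaluation("2508.05629"),  # first request starts a task
        start_evaluation("2508.05629"),  # second request sees it running
    ]
    await evaluation_tasks["2508.05629"]
    # Once the first task is done, a new request may start another run.
    statuses.append(start_evaluation("2508.05629"))
    await evaluation_tasks["2508.05629"]
    return statuses


statuses = asyncio.run(main())
print(statuses)
```

A plain dictionary is enough here because both the check and the insertion happen without an intervening `await`, so no other coroutine on the same event loop can interleave between them. The tracker is per process, so it would not prevent duplicate runs across multiple worker processes.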

 from src.logger import logger
 from src.config import config
 from src.crawl import HuggingFaceDailyPapers
 from src.agents.evaluator import run_evaluation
 app = FastAPI(title="PaperAgent")
     hf_daily = HuggingFaceDailyPapers()
     # First, check if we have fresh cache for the requested date
+    cached_data = await db.get_cached_papers(target_date)
+    if cached_data and await db.is_cache_fresh(target_date):
         print(f"Using cached data for {target_date}")
         return {
             "date": target_date,
                 print(f"Redirected from {target_date} to {actual_date}")
                 # Check if the redirected date has fresh cache
+                cached_data = await db.get_cached_papers(actual_date)
+                if cached_data and await db.is_cache_fresh(actual_date):
                     print(f"Using cached data for redirected date {actual_date}")
                     return {
                         "date": actual_date,
                 enriched_cards = await enrich_cards(cards)
                 # Cache the results for the redirected date
+                await db.cache_papers(actual_date, html, enriched_cards)
                 return {
                     "date": actual_date,
             # If we got the exact date we requested, process normally
             cards = hf_daily.parse_daily_cards(html)
             enriched_cards = await enrich_cards(cards)
+            await db.cache_papers(actual_date, html, enriched_cards)
             return {
                 "date": actual_date,
         except Exception as e:
             print(f"Failed to fetch {target_date} for previous navigation: {e}")
             # Fallback to cached data if available
+            cached_data = await db.get_cached_papers(target_date)
             if cached_data:
                 return {
                     "date": target_date,
             if actual_date == target_date:
                 cards = hf_daily.parse_daily_cards(html)
                 enriched_cards = await enrich_cards(cards)
+                await db.cache_papers(actual_date, html, enriched_cards)
                 return {
                     "date": actual_date,
             # Try to find the next available date by incrementing
             next_date = await find_next_available_date_forward(target_date)
             if next_date:
+                cached_data = await db.get_cached_papers(next_date)
+                if cached_data and await db.is_cache_fresh(next_date):
                     print(f"Using cached data for next available date {next_date}")
                     return {
                         "date": next_date,
                 actual_date, html = await hf_daily.fetch_daily_html(next_date)
                 cards = hf_daily.parse_daily_cards(html)
                 enriched_cards = await enrich_cards(cards)
+                await db.cache_papers(actual_date, html, enriched_cards)
                 return {
                     "date": actual_date,
             # Try to find next available date
             next_date = await find_next_available_date_forward(target_date)
             if next_date:
+                cached_data = await db.get_cached_papers(next_date)
                 if cached_data:
                     return {
                         "date": next_date,
                 print(f"Redirected from {target_date} to {actual_date}")
                 # Check if the redirected date has fresh cache
+                cached_data = await db.get_cached_papers(actual_date)
+                if cached_data and await db.is_cache_fresh(actual_date):
                     print(f"Using cached data for redirected date {actual_date}")
                     return {
                         "date": actual_date,
                 enriched_cards = await enrich_cards(cards)
                 # Cache the results for the redirected date
+                await db.cache_papers(actual_date, html, enriched_cards)
                 return {
                     "date": actual_date,
             # If we got the exact date we requested, process normally
             cards = hf_daily.parse_daily_cards(html)
             enriched_cards = await enrich_cards(cards)
+            await db.cache_papers(actual_date, html, enriched_cards)
             return {
                 "date": actual_date,
             print(f"Failed to fetch {target_date}: {e}")
             # If everything fails, return cached data if available
+            cached_data = await db.get_cached_papers(target_date)
             if cached_data:
                 return {
                     "date": target_date,
         date_str = current_date.strftime("%Y-%m-%d")
         # Check if we have cache for this date
+        cached_data = await db.get_cached_papers(date_str)
         if cached_data:
             return date_str
     for c in cards:
         arxiv_id = c.get("arxiv_id")
         if arxiv_id:
+            paper = await db.get_paper(arxiv_id)
             if paper:
                 # Add evaluation status
                 c["has_eval"] = paper.get('is_evaluated', False)
 @app.get("/api/evals")
+async def list_evals() -> Dict[str, Any]:
     # Get evaluated papers from database
+    evaluated_papers = await db.get_evaluated_papers()
     items: List[Dict[str, Any]] = []
     for paper in evaluated_papers:
 @app.get("/api/has-eval/{paper_id}")
+async def has_eval(paper_id: str) -> Dict[str, bool]:
+    paper = await db.get_paper(paper_id)
     exists = paper is not None and paper.get('is_evaluated', False)
     return {"exists": exists}
 @app.get("/api/paper/{paper_id}")
+async def get_paper_details(paper_id: str) -> Dict[str, Any]:
     """Get detailed paper information from database"""
+    paper = await db.get_paper(paper_id)
     if not paper:
         raise HTTPException(status_code=404, detail="Paper not found")
 @app.get("/api/paper-score/{paper_id}")
+async def get_paper_score(paper_id: str) -> Dict[str, Any]:
+    paper = await db.get_paper(paper_id)
     print(f"Paper data for {paper_id}:", paper)
     if not paper or not paper.get('is_evaluated', False):
 @app.get("/api/eval/{paper_id}")
+async def get_eval(paper_id: str) -> Any:
+    paper = await db.get_paper(paper_id)
     if not paper or not paper.get('is_evaluated', False):
         raise HTTPException(status_code=404, detail="Evaluation not found")
 @app.get("/api/available-dates")
+async def get_available_dates() -> Dict[str, Any]:
     """Get list of available dates in the cache"""
+    async with db.get_connection() as conn:
+        cursor = await conn.cursor()
+        await cursor.execute('SELECT date_str FROM papers_cache ORDER BY date_str DESC LIMIT 30')
+        rows = await cursor.fetchall()
+        dates = [row['date_str'] for row in rows]
         return {
             "available_dates": dates,
 @app.get("/api/cache/status")
+async def get_cache_status() -> Dict[str, Any]:
     """Get cache status and statistics"""
+    async with db.get_connection() as conn:
+        cursor = await conn.cursor()
         # Get total cached dates
+        await cursor.execute('SELECT COUNT(*) as count FROM papers_cache')
+        total_cached = (await cursor.fetchone())['count']
         # Get latest cached date
+        await cursor.execute('SELECT date_str, updated_at FROM latest_date WHERE id = 1')
+        latest_info = await cursor.fetchone()
         # Get cache age distribution
+        await cursor.execute('''
             SELECT
                 CASE
                     WHEN updated_at > datetime('now', '-1 hour') THEN '1 hour'
             FROM papers_cache
             GROUP BY age_group
         ''')
+        rows = await cursor.fetchall()
+        age_distribution = {row['age_group']: row['count'] for row in rows}
         return {
             "total_cached_dates": total_cached,
 @app.get("/api/papers/status")
+async def get_papers_status() -> Dict[str, Any]:
     """Get papers database status and statistics"""
+    papers_count = await db.get_papers_count()
     # Get recent evaluations
+    recent_papers = await db.get_evaluated_papers()
     recent_evaluations = []
     for paper in recent_papers[:10]:  # Get last 10 evaluations
         recent_evaluations.append({
 @app.post("/api/papers/insert")
+async def insert_paper(paper_data: Dict[str, Any]) -> Dict[str, Any]:
     """Insert a new paper into the database"""
     try:
         required_fields = ['arxiv_id', 'title', 'authors']
             if field not in paper_data:
                 raise HTTPException(status_code=400, detail=f"Missing required field: {field}")
+        await db.insert_paper(
             arxiv_id=paper_data['arxiv_id'],
             title=paper_data['title'],
             authors=paper_data['authors'],
         raise HTTPException(status_code=500, detail=f"Failed to insert paper: {str(e)}")
+# Global task tracker for concurrent evaluations
+evaluation_tasks = {}
 @app.post("/api/papers/evaluate/{arxiv_id}")
+async def evaluate_paper(arxiv_id: str, force_reevaluate: bool = False) -> Dict[str, Any]:
     """Evaluate a paper by its arxiv_id"""
     try:
         # Check if paper exists in database
+        paper = await db.get_paper(arxiv_id)
         if not paper:
             raise HTTPException(status_code=404, detail="Paper not found in database")
+        # Check if already evaluated (unless force_reevaluate is True)
+        if not force_reevaluate and paper.get('is_evaluated', False):
             return {"message": f"Paper {arxiv_id} already evaluated", "status": "already_evaluated"}
+        # Check if evaluation is already running
+        if arxiv_id in evaluation_tasks and not evaluation_tasks[arxiv_id].done():
+            return {"message": f"Evaluation already running for {arxiv_id}", "status": "already_running"}
         # Create PDF URL from arxiv_id
         pdf_url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"
         async def run_eval():
             try:
                 # Update paper status to "evaluating"
+                await db.update_paper_status(arxiv_id, "evaluating")
+                logger.info(f"Started {'re-' if force_reevaluate else ''}evaluation for {arxiv_id}")
                 result = await run_evaluation(
                     pdf_path=pdf_url,
                 )
                 # Update paper status to "completed"
+                await db.update_paper_status(arxiv_id, "completed")
+                logger.info(f"{'Re-' if force_reevaluate else ''}evaluation completed for {arxiv_id}")
             except Exception as e:
                 # Update paper status to "failed"
+                await db.update_paper_status(arxiv_id, "failed")
+                logger.error(f"{'Re-' if force_reevaluate else ''}evaluation failed for {arxiv_id}: {str(e)}")
+            finally:
+                # Clean up task from tracker
+                if arxiv_id in evaluation_tasks:
+                    del evaluation_tasks[arxiv_id]
+        # Start evaluation in background and track it
+        task = asyncio.create_task(run_eval())
+        evaluation_tasks[arxiv_id] = task
         return {
+            "message": f"{'Re-' if force_reevaluate else ''}evaluation started for paper {arxiv_id}",
             "status": "started",
+            "pdf_url": pdf_url,
+            "concurrent_tasks": len(evaluation_tasks),
+            "is_reevaluate": force_reevaluate
         }
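     except HTTPException:
         # Re-raise deliberate HTTP errors (such as the 404 above) so they are not masked as 500s
         raise
     except Exception as e:
         raise HTTPException(status_code=500, detail=f"Failed to evaluate paper: {str(e)}")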
 @app.get("/api/papers/evaluate/{arxiv_id}/status")
+async def get_evaluation_status(arxiv_id: str) -> Dict[str, Any]:
     """Get evaluation status for a paper"""
     try:
+        paper = await db.get_paper(arxiv_id)
         if not paper:
             raise HTTPException(status_code=404, detail="Paper not found")
         status = paper.get('evaluation_status', 'not_started')
         is_evaluated = paper.get('is_evaluated', False)
+        # Check if task is currently running
+        is_running = arxiv_id in evaluation_tasks and not evaluation_tasks[arxiv_id].done()
         return {
             "arxiv_id": arxiv_id,
             "status": status,
             "is_evaluated": is_evaluated,
+            "is_running": is_running,
             "evaluation_date": paper.get('evaluation_date'),
             "evaluation_score": paper.get('evaluation_score')
         }
         raise HTTPException(status_code=500, detail=f"Failed to get evaluation status: {str(e)}")
+@app.post("/api/papers/reevaluate/{arxiv_id}")
+async def reevaluate_paper(arxiv_id: str) -> Dict[str, Any]:
+    """Re-evaluate a paper by its arxiv_id"""
+    try:
+        # Check if paper exists in database
+        paper = await db.get_paper(arxiv_id)
+        if not paper:
+            raise HTTPException(status_code=404, detail="Paper not found in database")
+        # Check if evaluation is already running
+        if arxiv_id in evaluation_tasks and not evaluation_tasks[arxiv_id].done():
+            return {"message": f"Evaluation already running for {arxiv_id}", "status": "already_running"}
+        # Create PDF URL from arxiv_id
+        pdf_url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"
+        # Run re-evaluation in background task
+        async def run_reeval():
+            try:
+                # Update paper status to "evaluating"
+                await db.update_paper_status(arxiv_id, "evaluating")
+                logger.info(f"Started re-evaluation for {arxiv_id}")
+                result = await run_evaluation(
+                    pdf_path=pdf_url,
+                    arxiv_id=arxiv_id,
+                    api_key=os.getenv("ANTHROPIC_API_KEY")
+                )
+                # Update paper status to "completed"
+                await db.update_paper_status(arxiv_id, "completed")
+                logger.info(f"Re-evaluation completed for {arxiv_id}")
+            except Exception as e:
+                # Update paper status to "failed"
+                await db.update_paper_status(arxiv_id, "failed")
+                logger.error(f"Re-evaluation failed for {arxiv_id}: {str(e)}")
+            finally:
+                # Clean up task from tracker
+                if arxiv_id in evaluation_tasks:
+                    del evaluation_tasks[arxiv_id]
+        # Start re-evaluation in background and track it
+        task = asyncio.create_task(run_reeval())
+        evaluation_tasks[arxiv_id] = task
+        return {
+            "message": f"Re-evaluation started for paper {arxiv_id}",
+            "status": "started",
+            "pdf_url": pdf_url,
+            "concurrent_tasks": len(evaluation_tasks),
+            "is_reevaluate": True
+        }
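+    except HTTPException:
+        # Re-raise deliberate HTTP errors (such as the 404 above) so they are not masked as 500s
+        raise
+    except Exception as e:
+        raise HTTPException(status_code=500, detail=f"Failed to re-evaluate paper: {str(e)}")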
+@app.get("/api/papers/evaluate/active-tasks")
+async def get_active_evaluation_tasks() -> Dict[str, Any]:
+    """Get list of currently running evaluation tasks"""
+    active_tasks = {}
+    for arxiv_id, task in evaluation_tasks.items():
+        if not task.done():
+            active_tasks[arxiv_id] = {
+                "status": "running",
+                "done": task.done(),
+                "cancelled": task.cancelled()
+            }
+    return {
+        "active_tasks": active_tasks,
+        "total_active": len(active_tasks),
+        "total_tracked": len(evaluation_tasks)
+    }
 @app.post("/api/cache/clear")
+async def clear_cache() -> Dict[str, str]:
     """Clear all cached data"""
+    async with db.get_connection() as conn:
+        cursor = await conn.cursor()
+        await cursor.execute('DELETE FROM papers_cache')
+        await conn.commit()
     return {"message": "Cache cleared successfully"}
         cards = hf_daily.parse_daily_cards(html)
         # Cache the results
+        await db.cache_papers(actual_date, html, cards)
         return {
             "message": f"Cache refreshed for {actual_date}",
     response.headers["Expires"] = "0"
     return response
+async def main():
     # Parse command line arguments
     args = parse_args()
     logger.info(f"| Config:\n{config.pretty_text}")
     # Initialize the database
+    await db.init_db(config=config)
     logger.info(f"| Database initialized at: {config.db_path}")
     # Load Frontend
     logger.info(f"| Frontend initialized at: {config.frontend_path}")
     # Use port 7860 (the Hugging Face Spaces default) for both Spaces and local development
+    config_uvicorn = uvicorn.Config(app, host="0.0.0.0", port=7860)
+    server = uvicorn.Server(config_uvicorn)
+    await server.serve()
+if __name__ == "__main__":
+    asyncio.run(main())
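
Note on the `evaluation_tasks` tracker added above: it keeps one `asyncio.Task` per arXiv ID so that a second request for the same paper is rejected while the first is still running. The following is a minimal standalone sketch of that pattern, using only `asyncio`. The names `start_once`, `fake_evaluation`, and `demo`, and the ID `2401.00001`, are hypothetical stand-ins and do not come from `app.py`.

```python
import asyncio
from typing import Dict, List

# Hypothetical stand-in for the module-level tracker in app.py.
evaluation_tasks: Dict[str, asyncio.Task] = {}


async def fake_evaluation(arxiv_id: str) -> None:
    """Placeholder for the real evaluation; it only yields to the event loop briefly."""
    try:
        await asyncio.sleep(0.01)
    finally:
        # As in the diff, always drop the finished task from the tracker.
        evaluation_tasks.pop(arxiv_id, None)


def start_once(arxiv_id: str) -> str:
    """Start a background evaluation unless one is already running for this ID."""
    existing = evaluation_tasks.get(arxiv_id)
    if existing is not None and not existing.done():
        return "already_running"
    evaluation_tasks[arxiv_id] = asyncio.create_task(fake_evaluation(arxiv_id))
    return "started"


async def demo() -> List[str]:
    first = start_once("2401.00001")
    second = start_once("2401.00001")  # duplicate request while the first is in flight
    await asyncio.gather(*list(evaluation_tasks.values()))
    third = start_once("2401.00001")   # tracker was cleaned up, so a new run can start
    await asyncio.gather(*list(evaluation_tasks.values()))
    return [first, second, third]
```

Calling `asyncio.run(demo())` returns `['started', 'already_running', 'started']`. One caveat of this design is that the dictionary lives in a single process, so it would not deduplicate requests across several server workers.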

frontend/index.html CHANGED

@@ -48,10 +48,16 @@
         </div>
         <div class="header-center">
-          <div class="ai-search-container">
-            <i class="fas fa-sparkles"></i>
-            <input type="text" placeholder="Search any paper with AI..." class="ai-search-input">
-            <i class="fas fa-cube"></i>
           </div>
         </div>

         </div>
         <div class="header-center">
+          <div class="search-batch-container">
+            <div class="ai-search-container">
+              <i class="fas fa-sparkles"></i>
+              <input type="text" placeholder="Search any paper with AI..." class="ai-search-input">
+              <i class="fas fa-cube"></i>
+            </div>
+            <button class="batch-evaluate-btn" id="batchEvaluateBtn">
+              <i class="fas fa-rocket"></i>
+              <span>Evaluate All</span>
+            </button>
           </div>
         </div>

frontend/main.js CHANGED

@@ -416,6 +416,9 @@ class PaperCardRenderer {
         button.onclick = () => {
           window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
         };
       } else {
         // Paper doesn't have evaluation - show evaluate button
         evalIcon.className = 'fas fa-play eval-icon';
@@ -433,6 +436,145 @@ class PaperCardRenderer {
     }
   }
   async checkPaperScore(card, arxivId) {
     try {
       // First check if the card already has score data from the API response
@@ -500,17 +642,17 @@ class PaperCardRenderer {
     }, 100);
   }
-  async evaluatePaper(button, arxivId) {
     const spinner = button.querySelector('.fa-spinner');
     const evalIcon = button.querySelector('.eval-icon');
     const evalText = button.querySelector('.eval-text');
     const paperTitle = button.getAttribute('data-paper-title');
-    // Show loading state
     spinner.style.display = 'inline-block';
     evalIcon.style.display = 'none';
-    evalText.textContent = 'Evaluating...';
-    button.className = 'eval-button evaluating-state';
     button.disabled = true;
     try {
@@ -534,23 +676,27 @@ class PaperCardRenderer {
       });
       // Start evaluation
-      const response = await fetch(`/api/papers/evaluate/${encodeURIComponent(arxivId)}`, {
         method: 'POST'
       });
       if (response.ok) {
         const result = await response.json();
-        if (result.status === 'already_evaluated') {
           // Paper was already evaluated, redirect to evaluation page
           window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
         } else {
           // Evaluation started, show progress and poll for status
-          evalText.textContent = 'Started...';
           button.className = 'eval-button started-state';
           // Start polling for status
-          this.pollEvaluationStatus(button, arxivId);
         }
       } else {
         throw new Error('Failed to start evaluation');
@@ -567,14 +713,15 @@ class PaperCardRenderer {
     }
   }
-  async pollEvaluationStatus(button, arxivId) {
     const evalIcon = button.querySelector('.eval-icon');
     const evalText = button.querySelector('.eval-text');
     let pollCount = 0;
     const maxPolls = 60; // Poll for up to 5 minutes (5s intervals)
     // Show log message
-    this.showLogMessage(`Started evaluation for paper ${arxivId}`, 'info');
     const poll = async () => {
       try {
@@ -584,24 +731,31 @@ class PaperCardRenderer {
           switch (status.status) {
             case 'evaluating':
-              evalText.textContent = `Evaluating... (${pollCount * 5}s)`;
               evalIcon.className = 'fas fa-spinner fa-spin eval-icon';
               button.className = 'eval-button evaluating-state';
-              this.showLogMessage(`Evaluating paper ${arxivId}... (${pollCount * 5}s)`, 'info');
               break;
             case 'completed':
               evalIcon.className = 'fas fa-check eval-icon';
-              evalText.textContent = 'Completed';
               button.className = 'eval-button evaluation-state';
               button.onclick = () => {
                 window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
               };
-              this.showLogMessage(`Evaluation completed for paper ${arxivId}`, 'success');
               // Add score badge after completion
               this.checkPaperScore(button.closest('.hf-paper-card'), arxivId);
               return; // Stop polling
             case 'failed':
@@ -749,6 +903,19 @@ class PaperIndexApp {
         e.target.classList.add('active');
       });
     });
   }
   async loadDaily(direction = null) {
@@ -822,7 +989,75 @@ class PaperIndexApp {
     }
   }
-  // Removed showFallbackNotification - now using unified notification system
   // Unified notification system
   showNotification(options) {

         button.onclick = () => {
           window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
         };
+        // Add re-evaluate button for already evaluated papers
+        this.addReevaluateButton(card, arxivId);
       } else {
         // Paper doesn't have evaluation - show evaluate button
         evalIcon.className = 'fas fa-play eval-icon';
     }
   }
+  addReevaluateButton(card, arxivId) {
+    // Check if re-evaluate button already exists
+    if (card.querySelector('.reevaluate-button')) {
+      return;
+    }
+    const cardActions = card.querySelector('.card-actions');
+    if (cardActions) {
+      const reevaluateButton = document.createElement('button');
+      reevaluateButton.className = 'reevaluate-button';
+      reevaluateButton.innerHTML = `
+        <i class="fas fa-redo"></i>
+        <span>Re-evaluate</span>
+      `;
+      reevaluateButton.onclick = () => {
+        this.reevaluatePaper(reevaluateButton, arxivId);
+      };
+      cardActions.appendChild(reevaluateButton);
+    }
+  }
+  async reevaluatePaper(button, arxivId) {
+    const icon = button.querySelector('i');
+    const text = button.querySelector('span');
+    const originalText = text.textContent;
+    const originalIcon = icon.className;
+    // Show loading state
+    icon.className = 'fas fa-spinner fa-spin';
+    text.textContent = 'Re-evaluating...';
+    button.disabled = true;
+    // Show log message
+    this.showLogMessage(`Started re-evaluation for paper ${arxivId}`, 'info');
+    try {
+      const response = await fetch(`/api/papers/reevaluate/${encodeURIComponent(arxivId)}`, {
+        method: 'POST'
+      });
+      if (response.ok) {
+        const result = await response.json();
+        if (result.status === 'already_running') {
+          text.textContent = 'Already running';
+          this.showLogMessage(`Re-evaluation already running for paper ${arxivId}`, 'warning');
+          setTimeout(() => {
+            icon.className = originalIcon;
+            text.textContent = originalText;
+            button.disabled = false;
+          }, 2000);
+        } else {
+          // Start polling for status
+          this.pollReevaluationStatus(button, arxivId, originalText, originalIcon);
+        }
+      } else {
+        throw new Error('Failed to start re-evaluation');
+      }
+    } catch (error) {
+      console.error('Error re-evaluating paper:', error);
+      icon.className = 'fas fa-exclamation-triangle';
+      text.textContent = 'Error';
+      this.showLogMessage(`Re-evaluation failed for paper ${arxivId}: ${error.message}`, 'error');
+      setTimeout(() => {
+        icon.className = originalIcon;
+        text.textContent = originalText;
+        button.disabled = false;
+      }, 2000);
+    }
+  }
+  async pollReevaluationStatus(button, arxivId, originalText, originalIcon) {
+    const icon = button.querySelector('i');
+    const text = button.querySelector('span');
+    let pollCount = 0;
+    const maxPolls = 60; // Poll for up to 5 minutes (5s intervals)
+    const poll = async () => {
+      try {
+        const response = await fetch(`/api/papers/evaluate/${encodeURIComponent(arxivId)}/status`);
+        if (response.ok) {
+          const status = await response.json();
+          switch (status.status) {
+            case 'evaluating':
+              text.textContent = `Re-evaluating... (${pollCount * 5}s)`;
+              icon.className = 'fas fa-spinner fa-spin';
+              this.showLogMessage(`Re-evaluating paper ${arxivId}... (${pollCount * 5}s)`, 'info');
+              break;
+            case 'completed':
+              icon.className = 'fas fa-check';
+              text.textContent = 'Re-evaluated';
+              button.disabled = false;
+              this.showLogMessage(`Re-evaluation completed for paper ${arxivId}`, 'success');
+              // Refresh the page to show updated results
+              setTimeout(() => {
+                window.location.reload();
+              }, 1000);
+              return;
+            case 'failed':
+              icon.className = 'fas fa-exclamation-triangle';
+              text.textContent = 'Failed';
+              button.disabled = false;
+              this.showLogMessage(`Re-evaluation failed for paper ${arxivId}`, 'error');
+              return;
+            default:
+              text.textContent = `Status: ${status.status}`;
+          }
+          pollCount++;
+          if (pollCount < maxPolls) {
+            setTimeout(poll, 5000);
+          } else {
+            icon.className = 'fas fa-clock';
+            text.textContent = 'Timeout';
+            button.disabled = false;
+            this.showLogMessage(`Re-evaluation timeout for paper ${arxivId}`, 'warning');
+          }
+        } else {
+          throw new Error('Failed to get status');
+        }
+      } catch (error) {
+        console.error('Error polling re-evaluation status:', error);
+        icon.className = 'fas fa-exclamation-triangle';
+        text.textContent = 'Error';
+        button.disabled = false;
+      }
+    };
+    poll();
+  }
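
Note on `pollReevaluationStatus` above: it is a bounded polling loop that stops on a terminal status or after `maxPolls` requests. The same idea is sketched here in Python, for consistency with the backend, under the assumption that the status source is an injected coroutine. `poll_until_done`, `make_fake_status`, and `TERMINAL_STATES` are hypothetical names, not part of `main.js` or `app.py`.

```python
import asyncio
from typing import Awaitable, Callable, List

TERMINAL_STATES = ("completed", "failed")


async def poll_until_done(
    get_status: Callable[[], Awaitable[str]],
    interval_s: float = 5.0,
    max_polls: int = 60,
) -> str:
    """Poll until a terminal state is reported or max_polls requests have been made."""
    for attempt in range(max_polls):
        status = await get_status()
        if status in TERMINAL_STATES:
            return status
        # Do not wait after the final attempt; report the timeout immediately.
        if attempt < max_polls - 1:
            await asyncio.sleep(interval_s)
    return "timeout"


def make_fake_status(sequence: List[str]) -> Callable[[], Awaitable[str]]:
    """Hypothetical status source that replays a sequence, repeating the last entry."""
    remaining = list(sequence)

    async def get_status() -> str:
        return remaining.pop(0) if len(remaining) > 1 else remaining[0]

    return get_status
```

With the defaults (60 polls at 5 s intervals) the loop gives up after roughly five minutes, which matches the comment on `maxPolls` in the JavaScript.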
   async checkPaperScore(card, arxivId) {
     try {
       // First check if the card already has score data from the API response
     }, 100);
   }
+  async evaluatePaper(button, arxivId, isReevaluate = false) {
     const spinner = button.querySelector('.fa-spinner');
     const evalIcon = button.querySelector('.eval-icon');
     const evalText = button.querySelector('.eval-text');
     const paperTitle = button.getAttribute('data-paper-title');
+    // Clear any existing state classes and show loading state
+    button.className = 'eval-button started-state';
     spinner.style.display = 'inline-block';
     evalIcon.style.display = 'none';
+    evalText.textContent = isReevaluate ? 'Re-starting...' : 'Starting...';
     button.disabled = true;
     try {
       });
       // Start evaluation
+      const url = isReevaluate ?
+        `/api/papers/reevaluate/${encodeURIComponent(arxivId)}` :
+        `/api/papers/evaluate/${encodeURIComponent(arxivId)}`;
+      const response = await fetch(url, {
         method: 'POST'
       });
       if (response.ok) {
         const result = await response.json();
+        if (result.status === 'already_evaluated' && !isReevaluate) {
           // Paper was already evaluated, redirect to evaluation page
           window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
         } else {
           // Evaluation started, show progress and poll for status
+          evalText.textContent = isReevaluate ? 'Re-started...' : 'Started...';
           button.className = 'eval-button started-state';
           // Start polling for status
+          this.pollEvaluationStatus(button, arxivId, isReevaluate);
         }
       } else {
         throw new Error('Failed to start evaluation');
     }
   }
+  async pollEvaluationStatus(button, arxivId, isReevaluate = false) {
     const evalIcon = button.querySelector('.eval-icon');
     const evalText = button.querySelector('.eval-text');
     let pollCount = 0;
     const maxPolls = 60; // Poll for up to 5 minutes (5s intervals)
     // Show log message
+    const action = isReevaluate ? 're-evaluation' : 'evaluation';
+    this.showLogMessage(`Started ${action} for paper ${arxivId}`, 'info');
     const poll = async () => {
       try {
           switch (status.status) {
             case 'evaluating':
+              evalText.textContent = isReevaluate ? `Re-evaluating... (${pollCount * 5}s)` : `Evaluating... (${pollCount * 5}s)`;
               evalIcon.className = 'fas fa-spinner fa-spin eval-icon';
               button.className = 'eval-button evaluating-state';
+              const evaluatingAction = isReevaluate ? 'Re-evaluating' : 'Evaluating';
+              this.showLogMessage(`${evaluatingAction} paper ${arxivId}... (${pollCount * 5}s)`, 'info');
               break;
             case 'completed':
               evalIcon.className = 'fas fa-check eval-icon';
+              evalText.textContent = isReevaluate ? 'Re-evaluated' : 'Completed';
               button.className = 'eval-button evaluation-state';
               button.onclick = () => {
                 window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
               };
+              const completedAction = isReevaluate ? 'Re-evaluation' : 'Evaluation';
+              this.showLogMessage(`${completedAction} completed for paper ${arxivId}`, 'success');
               // Add score badge after completion
               this.checkPaperScore(button.closest('.hf-paper-card'), arxivId);
+              // Add re-evaluate button if not already re-evaluating
+              if (!isReevaluate) {
+                this.addReevaluateButton(button.closest('.hf-paper-card'), arxivId);
+              }
               return; // Stop polling
             case 'failed':
         e.target.classList.add('active');
       });
     });
+    // Batch evaluate button
+    const batchEvaluateBtn = document.getElementById('batchEvaluateBtn');
+    console.log('Looking for batchEvaluateBtn:', batchEvaluateBtn);
+    if (batchEvaluateBtn) {
+      console.log('Adding click listener to batchEvaluateBtn');
+      batchEvaluateBtn.addEventListener('click', () => {
+        console.log('Batch evaluate button clicked');
+        this.startBatchEvaluation();
+      });
+    } else {
+      console.error('batchEvaluateBtn not found during initialization');
+    }
   }
   async loadDaily(direction = null) {
     }
   }
+  async startBatchEvaluation() {
+    console.log('startBatchEvaluation called');
+    const button = document.getElementById('batchEvaluateBtn');
+    if (!button) {
+      console.error('batchEvaluateBtn not found');
+      return;
+    }
+    console.log('Found batchEvaluateBtn:', button);
+    // Disable button and show loading state
+    button.disabled = true;
+    const originalContent = button.innerHTML;
+    button.innerHTML = '<i class="fas fa-spinner fa-spin"></i><span>Starting...</span>';
+    try {
+      // Collect every evaluate button; the unevaluated ones are selected by label below
+      const unevaluatedButtons = document.querySelectorAll('.eval-button');
+      console.log('Found eval buttons:', unevaluatedButtons.length);
+      const buttonsToClick = [];
+      unevaluatedButtons.forEach((evalButton, index) => {
+        const evalText = evalButton.querySelector('.eval-text');
+        console.log(`Button ${index}:`, evalText ? evalText.textContent : 'no text');
+        if (evalText && (evalText.textContent === 'Evaluate' || evalText.textContent === 'Check')) {
+          buttonsToClick.push(evalButton);
+        }
+      });
+      console.log('Buttons to click:', buttonsToClick.length);
+      if (buttonsToClick.length === 0) {
+        console.log('No buttons to click');
+        this.cardRenderer.showLogMessage('All papers have already been evaluated.', 'info');
+        return;
+      }
+      this.cardRenderer.showLogMessage(`Starting batch evaluation of ${buttonsToClick.length} papers...`, 'info');
+      // Click each evaluate button with delay
+      for (let i = 0; i < buttonsToClick.length; i++) {
+        const evalButton = buttonsToClick[i];
+        // Update button text to show progress
+        button.innerHTML = `<i class="fas fa-spinner fa-spin"></i><span>Starting ${i + 1} of ${buttonsToClick.length}</span>`;
+        console.log(`Clicking button ${i + 1}:`, evalButton);
+        // Simulate click on the evaluate button
+        evalButton.click();
+        // Add delay between clicks to avoid API overload
+        await new Promise(resolve => setTimeout(resolve, 1000));
+      }
+      this.cardRenderer.showLogMessage(`Started evaluation for ${buttonsToClick.length} papers. They will complete in the background.`, 'success');
+    } catch (error) {
+      console.error('Batch evaluation error:', error);
+      this.cardRenderer.showLogMessage(`Batch evaluation failed: ${error.message}`, 'error');
+    } finally {
+      // Restore button state
+      button.disabled = false;
+      button.innerHTML = originalContent;
+    }
+  }
   // Unified notification system
   showNotification(options) {
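
Note on `startBatchEvaluation` above: it starts evaluations one at a time with a fixed pause between them so the API is not hit by a burst of requests. Below is a rough Python sketch of that staggering idea, assuming an injected coroutine that sends a single start request. `start_staggered` and `fake_start` are hypothetical names. Unlike the JavaScript, which clicks each button without awaiting it, this sketch waits for each start request to return before pausing.

```python
import asyncio
from typing import Awaitable, Callable, List


async def start_staggered(
    paper_ids: List[str],
    start_one: Callable[[str], Awaitable[str]],
    delay_s: float = 1.0,
) -> List[str]:
    """Send one start request per paper, pausing delay_s seconds after each."""
    statuses: List[str] = []
    for paper_id in paper_ids:
        statuses.append(await start_one(paper_id))
        await asyncio.sleep(delay_s)
    return statuses


async def fake_start(paper_id: str) -> str:
    """Hypothetical stand-in for POST /api/papers/evaluate/{arxiv_id}."""
    return f"started:{paper_id}"
```

A fixed delay is the simplest throttle. A semaphore that caps the number of in-flight evaluations would be an alternative if the backend needs a hard concurrency limit.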

frontend/paper.js CHANGED

@@ -252,7 +252,24 @@ class PaperEvaluationRenderer {
       </section>
     `;
-    contentEl.innerHTML = execSummary +
       `<section class="evaluation-section">
         <div class="section-header">
           <h2><i class="fas fa-chart-bar"></i> Detailed Dimensional Analysis</h2>
@@ -524,9 +541,12 @@ class PaperEvaluationRenderer {
 class PaperEvaluationApp {
   constructor() {
     this.renderer = new PaperEvaluationRenderer();
     this.init();
   }
   async init() {
     const id = getParam('id');
     console.log('PaperEvaluationApp init with ID:', id);
@@ -592,7 +612,7 @@ class PaperEvaluationApp {
 // Initialize the application when DOM is loaded
 document.addEventListener('DOMContentLoaded', () => {
-  new PaperEvaluationApp();
 });

       </section>
     `;
+    // Add action buttons at the top
+    const actionButtons = `
+      <section class="evaluation-section">
+        <div class="section-header">
+          <div style="display: flex; justify-content: space-between; align-items: center;">
+            <h2><i class="fas fa-chart-line"></i> Evaluation Actions</h2>
+            <div class="action-buttons">
+              <a href="/" class="action-btn primary">
+                <i class="fas fa-arrow-left"></i>
+                Back to Daily Papers
+              </a>
+            </div>
+          </div>
+        </div>
+      </section>
+    `;
+    contentEl.innerHTML = actionButtons + execSummary +
       `<section class="evaluation-section">
         <div class="section-header">
           <h2><i class="fas fa-chart-bar"></i> Detailed Dimensional Analysis</h2>
 class PaperEvaluationApp {
   constructor() {
     this.renderer = new PaperEvaluationRenderer();
+    this.paperId = getParam('id');
     this.init();
   }
   async init() {
     const id = getParam('id');
     console.log('PaperEvaluationApp init with ID:', id);
 // Initialize the application when DOM is loaded
 document.addEventListener('DOMContentLoaded', () => {
+  window.paperApp = new PaperEvaluationApp();
 });

frontend/styles.css CHANGED

@@ -188,7 +188,7 @@ body {
   margin: 0 auto;
   padding: 0 24px;
   display: grid;
-  grid-template-columns: 1fr 2fr 1fr;
   gap: 32px;
   align-items: center;
 }
@@ -205,9 +205,18 @@ body {
   font-size: 16px;
 }
 .ai-search-container {
   position: relative;
-  width: 100%;
 }
 .ai-search-input {
@@ -245,6 +254,41 @@ body {
   font-size: 16px;
 }
 .header-right {
   display: flex;
   flex-direction: column;
@@ -737,6 +781,7 @@ body {
   border-radius: 50%;
   transform: translateY(-50%);
   animation: spin 1s linear infinite;
 }
 @keyframes spin {
@@ -762,6 +807,7 @@ body {
   border-radius: 50%;
   transform: translateY(-50%);
   animation: pulse 1.5s ease-in-out infinite;
 }
 @keyframes pulse {
@@ -823,11 +869,113 @@ body {
   border-color: var(--text-muted);
 }
 /* Spinner animation */
 .eval-button .fa-spinner {
   animation: spin 1s linear infinite;
 }
 @keyframes spin {
   from { transform: rotate(0deg); }
   to { transform: rotate(360deg); }

   margin: 0 auto;
   padding: 0 24px;
   display: grid;
+  grid-template-columns: 1fr 1fr 1fr;
   gap: 32px;
   align-items: center;
 }
   font-size: 16px;
 }
+.search-batch-container {
+  display: flex;
+  align-items: center;
+  gap: 16px;
+  width: 100%;
+  justify-content: center;
+}
 .ai-search-container {
   position: relative;
+  flex: 1;
+  max-width: 800px;
 }
 .ai-search-input {
   font-size: 16px;
 }
+.batch-evaluate-btn {
+  display: flex;
+  align-items: center;
+  gap: 8px;
+  padding: 12px 20px;
+  background: linear-gradient(135deg, var(--accent-primary), var(--accent-secondary));
+  color: white;
+  border: none;
+  border-radius: 12px;
+  font-size: 14px;
+  font-weight: 600;
+  cursor: pointer;
+  transition: all 0.2s ease;
+  box-shadow: 0 2px 8px rgba(59, 130, 246, 0.3);
+}
+.batch-evaluate-btn:hover {
+  transform: translateY(-1px);
+  box-shadow: 0 4px 12px rgba(59, 130, 246, 0.4);
+}
+.batch-evaluate-btn:active {
+  transform: translateY(0);
+}
+.batch-evaluate-btn:disabled {
+  opacity: 0.6;
+  cursor: not-allowed;
+  transform: none;
+}
+.batch-evaluate-btn i {
+  font-size: 16px;
+}
 .header-right {
   display: flex;
   flex-direction: column;
   border-radius: 50%;
   transform: translateY(-50%);
   animation: spin 1s linear infinite;
+  z-index: 1;
 }
 @keyframes spin {
   border-radius: 50%;
   transform: translateY(-50%);
   animation: pulse 1.5s ease-in-out infinite;
+  z-index: 1;
 }
 @keyframes pulse {
   border-color: var(--text-muted);
 }
+/* Re-evaluate button */
+.reevaluate-button {
+  display: inline-flex;
+  align-items: center;
+  gap: 6px;
+  padding: 8px 16px;
+  border: 1px solid var(--accent-secondary);
+  border-radius: 8px;
+  background-color: var(--bg-secondary);
+  color: var(--accent-secondary);
+  font-size: 12px;
+  font-weight: 500;
+  text-decoration: none;
+  cursor: pointer;
+  transition: all 0.2s ease;
+  min-width: 100px;
+  justify-content: center;
+  margin-left: 8px;
+}
+.reevaluate-button:hover {
+  background-color: var(--accent-secondary);
+  color: white;
+  border-color: var(--accent-secondary);
+}
+.reevaluate-button:disabled {
+  opacity: 0.6;
+  cursor: not-allowed;
+}
+.reevaluate-button i {
+  font-size: 12px;
+}
+/* Action buttons for paper detail page */
+.action-buttons {
+  display: flex;
+  gap: 12px;
+  align-items: center;
+}
+.action-btn {
+  display: inline-flex;
+  align-items: center;
+  gap: 8px;
+  padding: 10px 16px;
+  border: 1px solid var(--border-medium);
+  border-radius: 8px;
+  background-color: var(--bg-secondary);
+  color: var(--text-secondary);
+  font-size: 14px;
+  font-weight: 500;
+  text-decoration: none;
+  cursor: pointer;
+  transition: all 0.2s ease;
+}
+.action-btn:hover {
+  background-color: var(--bg-tertiary);
+  color: var(--text-primary);
+  border-color: var(--border-medium);
+}
+.action-btn.primary {
+  background-color: var(--accent-primary);
+  color: white;
+  border-color: var(--accent-primary);
+}
+.action-btn.primary:hover {
+  background-color: var(--accent-primary);
+  opacity: 0.9;
+}
+.action-btn.secondary {
+  background-color: var(--accent-secondary);
+  color: white;
+  border-color: var(--accent-secondary);
+}
+.action-btn.secondary:hover {
+  background-color: var(--accent-secondary);
+  opacity: 0.9;
+}
+.action-btn:disabled {
+  opacity: 0.6;
+  cursor: not-allowed;
+}
 /* Spinner animation */
 .eval-button .fa-spinner {
   animation: spin 1s linear infinite;
 }
+/* Ensure only one ::after pseudo-element is visible at a time */
+.eval-button::after {
+  content: none;
+}
+.eval-button.evaluating-state::after,
+.eval-button.started-state::after,
+.eval-button.processing-state::after {
+  content: '';
+}
 @keyframes spin {
   from { transform: rotate(0deg); }
   to { transform: rotate(360deg); }

requirements.txt CHANGED Viewed

@@ -9,4 +9,5 @@ httpx>=0.27.0
 beautifulsoup4>=4.12.3
 lxml>=5.2.2
 mmengine>=0.10.7

 beautifulsoup4>=4.12.3
 lxml>=5.2.2
 mmengine>=0.10.7
+aiosqlite>=0.20.0

src/agents/evaluator.py CHANGED Viewed

@@ -9,7 +9,7 @@ from typing import Any, Dict, List, Optional
 from pathlib import Path
 from datetime import datetime
-from anthropic import Anthropic
 from anthropic.types import ToolUseBlock
 from langgraph.graph import END, StateGraph
 from pydantic import BaseModel, Field
@@ -59,7 +59,7 @@ class Evaluator:
         api_key = api_key or os.getenv("ANTHROPIC_API_KEY")
         if not api_key:
             raise ValueError("Anthropic API key is required. Please set HF_SECRET_ANTHROPIC_API_KEY in Hugging Face Spaces secrets or ANTHROPIC_API_KEY environment variable.")
-        self.client = Anthropic(api_key=api_key)
         self.system_prompt = REVIEWER_SYSTEM_PROMPT
         self.eval_template = EVALUATION_PROMPT_TEMPLATE
@@ -91,8 +91,8 @@ class Evaluator:
         })
         try:
-            # Call Anthropic API with tools
-            response = self.client.messages.create(
                 model=config.model_id,
                 max_tokens=4000,
                 system=self.system_prompt,
@@ -210,7 +210,7 @@ async def save_node(state: ConversationState) -> ConversationState:
                 logger.warning(f"Warning: Could not parse evaluation_content as JSON: {e}")
         # Save to database
-        db.update_paper_evaluation(
             arxiv_id=state.arxiv_id,
             evaluation_content=evaluation_content,
             evaluation_score=evaluation_score,

 from pathlib import Path
 from datetime import datetime
+from anthropic import AsyncAnthropic
 from anthropic.types import ToolUseBlock
 from langgraph.graph import END, StateGraph
 from pydantic import BaseModel, Field
         api_key = api_key or os.getenv("ANTHROPIC_API_KEY")
         if not api_key:
             raise ValueError("Anthropic API key is required. Please set HF_SECRET_ANTHROPIC_API_KEY in Hugging Face Spaces secrets or ANTHROPIC_API_KEY environment variable.")
+        self.client = AsyncAnthropic(api_key=api_key)
         self.system_prompt = REVIEWER_SYSTEM_PROMPT
         self.eval_template = EVALUATION_PROMPT_TEMPLATE
         })
         try:
+            # Call Anthropic API with tools (async)
+            response = await self.client.messages.create(
                 model=config.model_id,
                 max_tokens=4000,
                 system=self.system_prompt,
                 logger.warning(f"Warning: Could not parse evaluation_content as JSON: {e}")
         # Save to database
+        await db.update_paper_evaluation(
             arxiv_id=state.arxiv_id,
             evaluation_content=evaluation_content,
             evaluation_score=evaluation_score,
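
Note on the `Anthropic` → `AsyncAnthropic` change: an awaited `messages.create` call hands control back to the event loop while it waits on the network, so several evaluations can be in flight at once. The following is a minimal sketch of that effect, not the project's code. `fake_create` is a hypothetical stand-in for the real API call, and no Anthropic SDK is involved.

```python
import asyncio

events = []  # ordered log of what happened


async def fake_create(paper_id: str) -> str:
    """Hypothetical stand-in for `await client.messages.create(...)`."""
    events.append(f"start:{paper_id}")
    await asyncio.sleep(0.01)  # simulated network wait; yields to the event loop
    events.append(f"done:{paper_id}")
    return f"evaluation for {paper_id}"


async def evaluate_all(paper_ids):
    # gather() schedules every coroutine before waiting on any result,
    # and returns the results in the same order as its inputs.
    return await asyncio.gather(*(fake_create(p) for p in paper_ids))


results = asyncio.run(evaluate_all(["a", "b", "c"]))
print(events)
```

All three `start:` entries are logged before any `done:` entry, which is the overlap a blocking client call inside an `async def` would prevent.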

src/database/db.py CHANGED Viewed

@@ -1,9 +1,9 @@
 import os
 import json
-import sqlite3
 from datetime import date, datetime, timedelta
 from typing import Any, Dict, List, Optional
-from contextlib import contextmanager
 class PapersDatabase():
@@ -11,16 +11,16 @@ class PapersDatabase():
         super().__init__(**kwargs)
         self.db_path = None
-    def init_db(self, config):
         """Initialize the database with required tables"""
         self.db_path = config.db_path
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
             # Create papers cache table
-            cursor.execute('''
                 CREATE TABLE IF NOT EXISTS papers_cache (
                     date_str TEXT PRIMARY KEY,
                     html_content TEXT NOT NULL,
@@ -31,7 +31,7 @@ class PapersDatabase():
             ''')
             # Create papers table for individual arXiv papers
-            cursor.execute('''
                 CREATE TABLE IF NOT EXISTS papers (
                     arxiv_id TEXT PRIMARY KEY,
                     title TEXT NOT NULL,
@@ -52,7 +52,7 @@ class PapersDatabase():
             ''')
             # Create latest_date table to track the most recent available date
-            cursor.execute('''
                 CREATE TABLE IF NOT EXISTS latest_date (
                     id INTEGER PRIMARY KEY CHECK (id = 1),
                     date_str TEXT NOT NULL,
@@ -61,34 +61,39 @@ class PapersDatabase():
             ''')
             # Insert default latest_date record if it doesn't exist
-            cursor.execute('''
                 INSERT OR IGNORE INTO latest_date (id, date_str)
                 VALUES (1, ?)
             ''', (date.today().isoformat(),))
-            conn.commit()
-    @contextmanager
-    def get_connection(self):
         """Context manager for database connections"""
-        conn = sqlite3.connect(self.db_path)
-        conn.row_factory = sqlite3.Row  # Enable dict-like access
         try:
             yield conn
         finally:
-            conn.close()
-    def get_cached_papers(self, date_str: str) -> Optional[Dict[str, Any]]:
         """Get cached papers for a specific date"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('''
                 SELECT parsed_cards, created_at
                 FROM papers_cache
                 WHERE date_str = ?
             ''', (date_str,))
-            row = cursor.fetchone()
             if row:
                 return {
                     'cards': json.loads(row['parsed_cards']),
@@ -96,47 +101,47 @@ class PapersDatabase():
                 }
             return None
-    def cache_papers(self, date_str: str, html_content: str, parsed_cards: List[Dict[str, Any]]):
         """Cache papers for a specific date"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('''
                 INSERT OR REPLACE INTO papers_cache
                 (date_str, html_content, parsed_cards, updated_at)
                 VALUES (?, ?, ?, CURRENT_TIMESTAMP)
             ''', (date_str, html_content, json.dumps(parsed_cards)))
-            conn.commit()
-    def get_latest_cached_date(self) -> Optional[str]:
         """Get the latest cached date"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('SELECT date_str FROM latest_date WHERE id = 1')
-            row = cursor.fetchone()
             return row['date_str'] if row else None
-    def update_latest_date(self, date_str: str):
         """Update the latest available date"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('''
                 UPDATE latest_date
                 SET date_str = ?, updated_at = CURRENT_TIMESTAMP
                 WHERE id = 1
             ''', (date_str,))
-            conn.commit()
-    def is_cache_fresh(self, date_str: str, max_age_hours: int = 24) -> bool:
         """Check if cache is fresh (within max_age_hours)"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('''
                 SELECT updated_at
                 FROM papers_cache
                 WHERE date_str = ?
             ''', (date_str,))
-            row = cursor.fetchone()
             if not row:
                 return False
@@ -144,64 +149,65 @@ class PapersDatabase():
             age = datetime.now(cached_time.tzinfo) - cached_time
             return age.total_seconds() < max_age_hours * 3600
-    def cleanup_old_cache(self, days_to_keep: int = 7):
         """Clean up old cache entries"""
         cutoff_date = (datetime.now() - timedelta(days=days_to_keep)).isoformat()
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('''
                 DELETE FROM papers_cache
                 WHERE updated_at < ?
             ''', (cutoff_date,))
-            conn.commit()
     # Papers table methods
-    def insert_paper(self, arxiv_id: str, title: str, authors: str, abstract: str = None,
                     categories: str = None, published_date: str = None):
         """Insert a new paper into the papers table"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('''
                 INSERT OR REPLACE INTO papers
                 (arxiv_id, title, authors, abstract, categories, published_date, updated_at)
                 VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
             ''', (arxiv_id, title, authors, abstract, categories, published_date))
-            conn.commit()
-    def get_paper(self, arxiv_id: str) -> Optional[Dict[str, Any]]:
         """Get a paper by arxiv_id"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('''
                 SELECT * FROM papers WHERE arxiv_id = ?
             ''', (arxiv_id,))
-            row = cursor.fetchone()
             if row:
                 return dict(row)
             return None
-    def get_papers_by_evaluation_status(self, is_evaluated: bool = None) -> List[Dict[str, Any]]:
         """Get papers by evaluation status"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
             if is_evaluated is None:
-                cursor.execute('SELECT * FROM papers ORDER BY created_at DESC')
             else:
-                cursor.execute('''
                     SELECT * FROM papers
                     WHERE is_evaluated = ?
                     ORDER BY created_at DESC
                 ''', (is_evaluated,))
-            return [dict(row) for row in cursor.fetchall()]
-    def update_paper_evaluation(self, arxiv_id: str, evaluation_content: str,
                                evaluation_score: float = None, overall_score: float = None, evaluation_tags: str = None):
         """Update paper with evaluation content"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('''
                 UPDATE papers
                 SET evaluation_content = ?,
                     evaluation_score = ?,
@@ -213,57 +219,60 @@ class PapersDatabase():
                     updated_at = CURRENT_TIMESTAMP
                 WHERE arxiv_id = ?
             ''', (evaluation_content, evaluation_score, overall_score, evaluation_tags, arxiv_id))
-            conn.commit()
-    def update_paper_status(self, arxiv_id: str, status: str):
         """Update paper evaluation status"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('''
                 UPDATE papers
                 SET evaluation_status = ?,
                     updated_at = CURRENT_TIMESTAMP
                 WHERE arxiv_id = ?
             ''', (status, arxiv_id))
-            conn.commit()
-    def get_unevaluated_papers(self) -> List[Dict[str, Any]]:
         """Get all papers that haven't been evaluated yet"""
-        return self.get_papers_by_evaluation_status(is_evaluated=False)
-    def get_evaluated_papers(self) -> List[Dict[str, Any]]:
         """Get all papers that have been evaluated"""
-        return self.get_papers_by_evaluation_status(is_evaluated=True)
-    def search_papers(self, query: str) -> List[Dict[str, Any]]:
         """Search papers by title, authors, or abstract"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
             search_pattern = f'%{query}%'
-            cursor.execute('''
                 SELECT * FROM papers
                 WHERE title LIKE ? OR authors LIKE ? OR abstract LIKE ?
                 ORDER BY created_at DESC
             ''', (search_pattern, search_pattern, search_pattern))
-            return [dict(row) for row in cursor.fetchall()]
-    def delete_paper(self, arxiv_id: str):
         """Delete a paper from the database"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('DELETE FROM papers WHERE arxiv_id = ?', (arxiv_id,))
-            conn.commit()
-    def get_papers_count(self) -> Dict[str, int]:
         """Get count of papers by evaluation status"""
-        with self.get_connection() as conn:
-            cursor = conn.cursor()
-            cursor.execute('SELECT COUNT(*) as total FROM papers')
-            total = cursor.fetchone()['total']
-            cursor.execute('SELECT COUNT(*) as evaluated FROM papers WHERE is_evaluated = TRUE')
-            evaluated = cursor.fetchone()['evaluated']
             return {
                 'total': total,

 import os
 import json
+import aiosqlite
 from datetime import date, datetime, timedelta
 from typing import Any, Dict, List, Optional
+from contextlib import asynccontextmanager
 class PapersDatabase():
         super().__init__(**kwargs)
         self.db_path = None
+    async def init_db(self, config):
         """Initialize the database with required tables"""
         self.db_path = config.db_path
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
             # Create papers cache table
+            await cursor.execute('''
                 CREATE TABLE IF NOT EXISTS papers_cache (
                     date_str TEXT PRIMARY KEY,
                     html_content TEXT NOT NULL,
             ''')
             # Create papers table for individual arXiv papers
+            await cursor.execute('''
                 CREATE TABLE IF NOT EXISTS papers (
                     arxiv_id TEXT PRIMARY KEY,
                     title TEXT NOT NULL,
             ''')
             # Create latest_date table to track the most recent available date
+            await cursor.execute('''
                 CREATE TABLE IF NOT EXISTS latest_date (
                     id INTEGER PRIMARY KEY CHECK (id = 1),
                     date_str TEXT NOT NULL,
             ''')
             # Insert default latest_date record if it doesn't exist
+            await cursor.execute('''
                 INSERT OR IGNORE INTO latest_date (id, date_str)
                 VALUES (1, ?)
             ''', (date.today().isoformat(),))
+            await conn.commit()
+    @asynccontextmanager
+    async def get_connection(self):
         """Context manager for database connections"""
+        conn = await aiosqlite.connect(self.db_path)
+        conn.row_factory = aiosqlite.Row  # Enable dict-like access
+        # Enable WAL mode for better concurrency
+        await conn.execute("PRAGMA journal_mode=WAL")
+        await conn.execute("PRAGMA synchronous=NORMAL")
+        await conn.execute("PRAGMA cache_size=10000")
+        await conn.execute("PRAGMA temp_store=MEMORY")
         try:
             yield conn
         finally:
+            await conn.close()
+    async def get_cached_papers(self, date_str: str) -> Optional[Dict[str, Any]]:
         """Get cached papers for a specific date"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('''
                 SELECT parsed_cards, created_at
                 FROM papers_cache
                 WHERE date_str = ?
             ''', (date_str,))
+            row = await cursor.fetchone()
             if row:
                 return {
                     'cards': json.loads(row['parsed_cards']),
                 }
             return None
+    async def cache_papers(self, date_str: str, html_content: str, parsed_cards: List[Dict[str, Any]]):
         """Cache papers for a specific date"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('''
                 INSERT OR REPLACE INTO papers_cache
                 (date_str, html_content, parsed_cards, updated_at)
                 VALUES (?, ?, ?, CURRENT_TIMESTAMP)
             ''', (date_str, html_content, json.dumps(parsed_cards)))
+            await conn.commit()
+    async def get_latest_cached_date(self) -> Optional[str]:
         """Get the latest cached date"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('SELECT date_str FROM latest_date WHERE id = 1')
+            row = await cursor.fetchone()
             return row['date_str'] if row else None
+    async def update_latest_date(self, date_str: str):
         """Update the latest available date"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('''
                 UPDATE latest_date
                 SET date_str = ?, updated_at = CURRENT_TIMESTAMP
                 WHERE id = 1
             ''', (date_str,))
+            await conn.commit()
+    async def is_cache_fresh(self, date_str: str, max_age_hours: int = 24) -> bool:
         """Check if cache is fresh (within max_age_hours)"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('''
                 SELECT updated_at
                 FROM papers_cache
                 WHERE date_str = ?
             ''', (date_str,))
+            row = await cursor.fetchone()
             if not row:
                 return False
             age = datetime.now(cached_time.tzinfo) - cached_time
             return age.total_seconds() < max_age_hours * 3600
+    async def cleanup_old_cache(self, days_to_keep: int = 7):
         """Clean up old cache entries"""
         cutoff_date = (datetime.now() - timedelta(days=days_to_keep)).isoformat()
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('''
                 DELETE FROM papers_cache
                 WHERE updated_at < ?
             ''', (cutoff_date,))
+            await conn.commit()
     # Papers table methods
+    async def insert_paper(self, arxiv_id: str, title: str, authors: str, abstract: str = None,
                     categories: str = None, published_date: str = None):
         """Insert a new paper into the papers table"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('''
                 INSERT OR REPLACE INTO papers
                 (arxiv_id, title, authors, abstract, categories, published_date, updated_at)
                 VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
             ''', (arxiv_id, title, authors, abstract, categories, published_date))
+            await conn.commit()
+    async def get_paper(self, arxiv_id: str) -> Optional[Dict[str, Any]]:
         """Get a paper by arxiv_id"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('''
                 SELECT * FROM papers WHERE arxiv_id = ?
             ''', (arxiv_id,))
+            row = await cursor.fetchone()
             if row:
                 return dict(row)
             return None
+    async def get_papers_by_evaluation_status(self, is_evaluated: bool = None) -> List[Dict[str, Any]]:
         """Get papers by evaluation status"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
             if is_evaluated is None:
+                await cursor.execute('SELECT * FROM papers ORDER BY created_at DESC')
             else:
+                await cursor.execute('''
                     SELECT * FROM papers
                     WHERE is_evaluated = ?
                     ORDER BY created_at DESC
                 ''', (is_evaluated,))
+            rows = await cursor.fetchall()
+            return [dict(row) for row in rows]
+    async def update_paper_evaluation(self, arxiv_id: str, evaluation_content: str,
                                evaluation_score: float = None, overall_score: float = None, evaluation_tags: str = None):
         """Update paper with evaluation content"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('''
                 UPDATE papers
                 SET evaluation_content = ?,
                     evaluation_score = ?,
                     updated_at = CURRENT_TIMESTAMP
                 WHERE arxiv_id = ?
             ''', (evaluation_content, evaluation_score, overall_score, evaluation_tags, arxiv_id))
+            await conn.commit()
+    async def update_paper_status(self, arxiv_id: str, status: str):
         """Update paper evaluation status"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('''
                 UPDATE papers
                 SET evaluation_status = ?,
                     updated_at = CURRENT_TIMESTAMP
                 WHERE arxiv_id = ?
             ''', (status, arxiv_id))
+            await conn.commit()
+    async def get_unevaluated_papers(self) -> List[Dict[str, Any]]:
         """Get all papers that haven't been evaluated yet"""
+        return await self.get_papers_by_evaluation_status(is_evaluated=False)
+    async def get_evaluated_papers(self) -> List[Dict[str, Any]]:
         """Get all papers that have been evaluated"""
+        return await self.get_papers_by_evaluation_status(is_evaluated=True)
+    async def search_papers(self, query: str) -> List[Dict[str, Any]]:
         """Search papers by title, authors, or abstract"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
             search_pattern = f'%{query}%'
+            await cursor.execute('''
                 SELECT * FROM papers
                 WHERE title LIKE ? OR authors LIKE ? OR abstract LIKE ?
                 ORDER BY created_at DESC
             ''', (search_pattern, search_pattern, search_pattern))
+            rows = await cursor.fetchall()
+            return [dict(row) for row in rows]
+    async def delete_paper(self, arxiv_id: str):
         """Delete a paper from the database"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('DELETE FROM papers WHERE arxiv_id = ?', (arxiv_id,))
+            await conn.commit()
+    async def get_papers_count(self) -> Dict[str, int]:
         """Get count of papers by evaluation status"""
+        async with self.get_connection() as conn:
+            cursor = await conn.cursor()
+            await cursor.execute('SELECT COUNT(*) as total FROM papers')
+            total_row = await cursor.fetchone()
+            total = total_row['total']
+            await cursor.execute('SELECT COUNT(*) as evaluated FROM papers WHERE is_evaluated = TRUE')
+            evaluated_row = await cursor.fetchone()
+            evaluated = evaluated_row['evaluated']
             return {
                 'total': total,
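
Note on `cleanup_old_cache`: the cutoff is built with `datetime.now().isoformat()`, which uses a `T` separator and local time, while `updated_at` is written by SQLite's `CURRENT_TIMESTAMP`, which uses a space separator and UTC. SQLite compares these values as text, so rows stamped on the cutoff's calendar date can sort the wrong way. The sketch below uses only the standard-library `sqlite3` module and fixed timestamps. The reduced two-column table and the helper `remaining_after_cleanup` are illustrative assumptions, not part of the project.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers_cache (date_str TEXT PRIMARY KEY, updated_at TEXT)")


def remaining_after_cleanup(cutoff: str) -> list:
    """Insert one row stamped at noon, run the DELETE, return surviving keys."""
    conn.execute("DELETE FROM papers_cache")
    conn.execute(
        "INSERT INTO papers_cache VALUES ('2024-01-08', '2024-01-08 12:00:00')"
    )
    conn.execute("DELETE FROM papers_cache WHERE updated_at < ?", (cutoff,))
    return [row[0] for row in conn.execute("SELECT date_str FROM papers_cache")]


midnight = datetime(2024, 1, 8, 0, 0, 0)

# ISO format uses 'T'; a space sorts before 'T', so the noon row looks older.
iso_result = remaining_after_cleanup(midnight.isoformat())

# Matching SQLite's 'YYYY-MM-DD HH:MM:SS' layout gives the intended comparison.
sql_result = remaining_after_cleanup(midnight.strftime("%Y-%m-%d %H:%M:%S"))

print(iso_result, sql_result)
```

One possible adjustment, offered as a suggestion only, is to format the cutoff as `'%Y-%m-%d %H:%M:%S'` from a UTC time so both sides of the comparison share one layout and time zone.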

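Note on the pragmas in `get_connection`: `journal_mode=WAL` is stored in the database file and stays in effect for later connections, so re-issuing it on every connection is harmless but not strictly required. The sketch below checks this with the standard-library `sqlite3` module rather than `aiosqlite`, so it runs without extra dependencies. The temporary file path and the table `t` are illustrative assumptions.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "papers.db")

# First connection: switch the database file to write-ahead logging.
first = sqlite3.connect(db_path)
mode_set = first.execute("PRAGMA journal_mode=WAL").fetchone()[0]
first.execute("CREATE TABLE t (x INTEGER)")  # make sure the file has content
first.commit()
first.close()

# Second connection: no pragma issued, only a query of the current mode.
second = sqlite3.connect(db_path)
mode_seen = second.execute("PRAGMA journal_mode").fetchone()[0]
second.close()

print(mode_set, mode_seen)
```

The `synchronous`, `cache_size`, and `temp_store` pragmas, by contrast, apply to the connection that sets them, so setting them inside `get_connection` is the place where they take effect for each new connection.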
debug_comparison.py → test/debug_comparison.py RENAMED Viewed

File without changes

test/test_async_db.py ADDED Viewed

@@ -0,0 +1,138 @@

+#!/usr/bin/env python3
+"""
+Test script for async database operations
+"""
+import asyncio
+import argparse
+import os
+import sys
+from pathlib import Path
+from mmengine.config import DictAction
+# Add the project root to the path
+root = str(Path(__file__).resolve().parents[1])
+sys.path.append(root)
+from src.database import db
+from src.config import config
+from src.logger import logger
+def parse_args():
+    parser = argparse.ArgumentParser(description='main')
+    parser.add_argument("--config", default=os.path.join(root, "configs", "paper_agent.py"), help="config file path")
+    parser.add_argument(
+        '--cfg-options',
+        nargs='+',
+        action=DictAction,
+        help='override some settings in the used config, the key-value pair '
+        'in xxx=yyy format will be merged into config file. If the value to '
+        'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
+        'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
+        'Note that the quotation marks are necessary and that no white space '
+        'is allowed.')
+    args = parser.parse_args()
+    return args
+async def test_async_database():
+    """Test async database operations"""
+    print("🧪 Testing Async Database Operations")
+    try:
+        # Initialize database
+        await db.init_db(config=config)
+        print("✅ Database initialized successfully")
+        # Test inserting a paper
+        test_arxiv_id = "2401.00001"
+        await db.insert_paper(
+            arxiv_id=test_arxiv_id,
+            title="Test Async Paper",
+            authors="Test Author",
+            abstract="This is a test paper for async database operations.",
+            categories="cs.AI",
+            published_date="2024-01-01"
+        )
+        print("✅ Paper inserted successfully")
+        # Test getting the paper
+        paper = await db.get_paper(test_arxiv_id)
+        if paper:
+            print(f"✅ Paper retrieved: {paper['title']}")
+        else:
+            print("❌ Paper not found")
+            return False
+        # Test updating paper evaluation
+        await db.update_paper_evaluation(
+            arxiv_id=test_arxiv_id,
+            evaluation_content="Test evaluation content",
+            evaluation_score=3.5,
+            overall_score=3.2,
+            evaluation_tags="test_tag"
+        )
+        print("✅ Paper evaluation updated successfully")
+        # Test getting evaluated papers
+        evaluated_papers = await db.get_evaluated_papers()
+        print(f"✅ Found {len(evaluated_papers)} evaluated papers")
+        # Test getting paper count
+        count = await db.get_papers_count()
+        print(f"✅ Paper count: {count}")
+        # Test searching papers
+        search_results = await db.search_papers("Test")
+        print(f"✅ Search results: {len(search_results)} papers found")
+        # Test cache operations
+        await db.cache_papers("2024-01-01", "<html>test</html>", [{"test": "data"}])
+        print("✅ Cache operation successful")
+        cached_data = await db.get_cached_papers("2024-01-01")
+        if cached_data:
+            print("✅ Cache retrieval successful")
+        else:
+            print("❌ Cache retrieval failed")
+        # Test cache freshness
+        is_fresh = await db.is_cache_fresh("2024-01-01")
+        print(f"✅ Cache freshness check: {is_fresh}")
+        print("\n🎉 All async database tests passed!")
+        return True
+    except Exception as e:
+        print(f"❌ Error during async database test: {str(e)}")
+        import traceback
+        traceback.print_exc()
+        return False
+async def main():
+    """Main function"""
+    print("🚀 Starting Async Database Test")
+    # Parse command line arguments
+    args = parse_args()
+    # Initialize the configuration
+    config.init_config(args.config, args)
+    # Initialize logger
+    logger.init_logger(config=config)
+    # Run the test
+    success = await test_async_database()
+    if success:
+        print("\n✅ All tests completed successfully!")
+        sys.exit(0)
+    else:
+        print("\n❌ Tests failed!")
+        sys.exit(1)
+if __name__ == "__main__":
+    asyncio.run(main())

test/test_concurrent_eval.py ADDED Viewed

@@ -0,0 +1,97 @@

+#!/usr/bin/env python3
+"""
+Test script for concurrent evaluation operations
+"""
+import asyncio
+import aiohttp
+import json
+import sys
+from pathlib import Path
+# Add the project root to the path
+root = str(Path(__file__).resolve().parents[1])
+sys.path.append(root)
+# Test papers (these should exist in your database)
+TEST_PAPERS = [
+    "2401.00001",
+    "2401.00002",
+    "2401.00003"
+]
+BASE_URL = "http://localhost:7860"
+async def test_concurrent_evaluations():
+    """Test concurrent evaluation of multiple papers"""
+    print("🧪 Testing Concurrent Evaluations")
+    async with aiohttp.ClientSession() as session:
+        # Start multiple evaluations concurrently
+        tasks = []
+        for arxiv_id in TEST_PAPERS:
+            print(f"Starting evaluation for {arxiv_id}")
+            task = asyncio.create_task(start_evaluation(session, arxiv_id))
+            tasks.append(task)
+        # Wait for all evaluations to start
+        results = await asyncio.gather(*tasks, return_exceptions=True)
+        print("\n=== Evaluation Start Results ===")
+        for i, result in enumerate(results):
+            if isinstance(result, Exception):
+                print(f"❌ Error starting evaluation for {TEST_PAPERS[i]}: {result}")
+            else:
+                print(f"✅ Started evaluation for {TEST_PAPERS[i]}: {result.get('status')}")
+        # Check active tasks
+        print("\n=== Checking Active Tasks ===")
+        async with session.get(f"{BASE_URL}/api/papers/evaluate/active-tasks") as response:
+            if response.status == 200:
+                active_tasks = await response.json()
+                print(f"Active tasks: {active_tasks['total_active']}")
+                print(f"Tracked tasks: {active_tasks['total_tracked']}")
+                for arxiv_id, task_info in active_tasks['active_tasks'].items():
+                    print(f"  - {arxiv_id}: {task_info['status']}")
+            else:
+                print(f"❌ Failed to get active tasks: {response.status}")
+        # Monitor status for a few seconds
+        print("\n=== Monitoring Status ===")
+        for _ in range(5):
+            await asyncio.sleep(2)
+            for arxiv_id in TEST_PAPERS:
+                async with session.get(f"{BASE_URL}/api/papers/evaluate/{arxiv_id}/status") as response:
+                    if response.status == 200:
+                        status = await response.json()
+                        print(f"{arxiv_id}: {status['status']} (running: {status.get('is_running', False)})")
+                    else:
+                        print(f"❌ Failed to get status for {arxiv_id}")
+async def start_evaluation(session, arxiv_id):
+    """Start evaluation for a specific paper"""
+    async with session.post(f"{BASE_URL}/api/papers/evaluate/{arxiv_id}") as response:
+        if response.status == 200:
+            return await response.json()
+        else:
+            error_text = await response.text()
+            raise Exception(f"HTTP {response.status}: {error_text}")
+async def main():
+    """Main function"""
+    print("🚀 Starting Concurrent Evaluation Test")
+    try:
+        await test_concurrent_evaluations()
+        print("\n✅ Concurrent evaluation test completed!")
+    except Exception as e:
+        print(f"\n❌ Test failed: {str(e)}")
+        import traceback
+        traceback.print_exc()
+        sys.exit(1)
+if __name__ == "__main__":
+    asyncio.run(main())
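
Note on `asyncio.gather(*tasks, return_exceptions=True)` in the test above: with that flag a failed request is returned as an exception object in the result list instead of aborting the whole batch, which is why the loop checks `isinstance(result, Exception)`. A minimal sketch of the pattern follows. This `start_evaluation` is a hypothetical stand-in that makes no HTTP call and fails for one hard-coded ID.

```python
import asyncio


async def start_evaluation(arxiv_id: str) -> dict:
    """Hypothetical stand-in for the HTTP POST; fails for one ID."""
    await asyncio.sleep(0)  # give other tasks a chance to run
    if arxiv_id == "2401.00002":
        raise RuntimeError(f"HTTP 500: evaluation failed for {arxiv_id}")
    return {"arxiv_id": arxiv_id, "status": "started"}


async def start_all(ids):
    tasks = [asyncio.create_task(start_evaluation(i)) for i in ids]
    # Exceptions are collected in place rather than raised.
    return await asyncio.gather(*tasks, return_exceptions=True)


ids = ["2401.00001", "2401.00002", "2401.00003"]
results = asyncio.run(start_all(ids))

summary = {
    i: ("error" if isinstance(r, Exception) else r["status"])
    for i, r in zip(ids, results)
}
print(summary)
```

Because results come back in input order, pairing them with the ID list via `zip` or an index, as the test does with `TEST_PAPERS[i]`, attributes each failure to the right paper.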

test_evaluation.py → test/test_evaluation.py RENAMED Viewed

@@ -15,7 +15,7 @@ from mmengine import DictAction
 load_dotenv(verbose=True)
 # Set the root directory path
-root = str(Path(__file__).parent)
 sys.path.append(root)
 from src.database import db
@@ -64,13 +64,13 @@ async def test_evaluation():
     try:
         # Check if paper exists in database
-        paper = db.get_paper(test_arxiv_id)
         if paper:
             print(f"✅ Paper found in database: {paper['title']}")
         else:
             print(f"⚠️  Paper not in database, creating new record")
             # Insert test paper
-            db.insert_paper(
                 arxiv_id=test_arxiv_id,
                 title="Test Paper for Evaluation",
                 authors="Test Author",
@@ -100,7 +100,7 @@ async def test_evaluation():
             print("⚠️  Evaluation result may be incomplete")
         # Check evaluation status in database
-        updated_paper = db.get_paper(test_arxiv_id)
         if updated_paper and updated_paper.get('is_evaluated'):
             print("✅ Evaluation saved to database")
             print(f"Evaluation score: {updated_paper.get('evaluation_score')}")
@@ -123,14 +123,14 @@ async def test_database_operations():
     try:
         # Test getting paper
-        paper = db.get_paper("2508.09889")
         if paper:
             print(f"✅ Database connection OK, found paper: {paper['title']}")
         else:
             print("⚠️  Test paper not found in database")
         # Test getting paper statistics
-        stats = db.get_papers_count()
         print(f"✅ Paper statistics: Total={stats['total']}, Evaluated={stats['evaluated']}, Unevaluated={stats['unevaluated']}")
         return True
@@ -156,7 +156,7 @@ async def main():
     logger.info(f"| Config:\n{config.pretty_text}")
     # Initialize database
-    db.init_db(config=config)
     logger.info(f"| Database initialized at: {config.db_path}")
     print(f"✅ Database initialized: {config.db_path}")

 load_dotenv(verbose=True)
 # Set the root directory path
+root = str(Path(__file__).resolve().parents[1])
 sys.path.append(root)
 from src.database import db
     try:
         # Check if paper exists in database
+        paper = await db.get_paper(test_arxiv_id)
         if paper:
             print(f"✅ Paper found in database: {paper['title']}")
         else:
             print(f"⚠️  Paper not in database, creating new record")
             # Insert test paper
+            await db.insert_paper(
                 arxiv_id=test_arxiv_id,
                 title="Test Paper for Evaluation",
                 authors="Test Author",
             print("⚠️  Evaluation result may be incomplete")
         # Check evaluation status in database
+        updated_paper = await db.get_paper(test_arxiv_id)
         if updated_paper and updated_paper.get('is_evaluated'):
             print("✅ Evaluation saved to database")
             print(f"Evaluation score: {updated_paper.get('evaluation_score')}")
     try:
         # Test getting paper
+        paper = await db.get_paper("2508.09889")
         if paper:
             print(f"✅ Database connection OK, found paper: {paper['title']}")
         else:
             print("⚠️  Test paper not found in database")
         # Test getting paper statistics
+        stats = await db.get_papers_count()
         print(f"✅ Paper statistics: Total={stats['total']}, Evaluated={stats['evaluated']}, Unevaluated={stats['unevaluated']}")
         return True
     logger.info(f"| Config:\n{config.pretty_text}")
     # Initialize database
+    await db.init_db(config=config)
     logger.info(f"| Database initialized at: {config.db_path}")
     print(f"✅ Database initialized: {config.db_path}")
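The change in this file is mechanical: every `db.*` call is now awaited, which implies the database functions became coroutines. The sketch below illustrates that calling pattern only. `FakeAsyncDB` is a hypothetical in-memory stand-in that borrows the method names seen in the diff; it is not the project's `src/database/db.py`, whose implementation is not shown here.

```python
import asyncio
import inspect


class FakeAsyncDB:
    """Hypothetical in-memory stand-in; not the project's real db module."""

    def __init__(self):
        self._papers = {}

    async def init_db(self, config=None):
        self._papers.clear()

    async def insert_paper(self, arxiv_id, title, authors, abstract=""):
        self._papers[arxiv_id] = {
            "arxiv_id": arxiv_id,
            "title": title,
            "authors": authors,
            "abstract": abstract,
            "is_evaluated": False,
        }

    async def get_paper(self, arxiv_id):
        return self._papers.get(arxiv_id)

    async def get_papers_count(self):
        total = len(self._papers)
        evaluated = sum(1 for p in self._papers.values() if p["is_evaluated"])
        return {"total": total, "evaluated": evaluated, "unevaluated": total - evaluated}


async def demo():
    db = FakeAsyncDB()
    await db.init_db()
    await db.insert_paper("2508.09889", "Test Paper for Evaluation", "Test Author")

    # Without `await`, the call yields a coroutine object rather than the row,
    # which is why each call site in the diff gained an `await`.
    pending = db.get_paper("2508.09889")
    is_coro = inspect.iscoroutine(pending)
    paper = await pending

    stats = await db.get_papers_count()
    return is_coro, paper, stats


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A missed `await` would leave `paper` bound to a coroutine object, so a lookup such as `paper['title']` would raise a `TypeError` instead of returning the title.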