DVampire committed on
Commit
78f6650
·
1 Parent(s): 6c5cf21

update website

DATABASE_MIGRATION_SUMMARY.md DELETED
@@ -1,147 +0,0 @@
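- # Database Migration Summary
-
- ## Overview
-
- The system has been migrated from JSON file storage to a SQLite database. The evaluation content for each arXiv paper is now stored in the database, which supports better data management and querying.
-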
- ## Main Changes
-
- ### 1. Database Schema (`src/database/db.py`)
-
- **New `papers` table:**
- - `arxiv_id`: unique paper identifier
- - `title`, `authors`, `abstract`: basic paper information
- - `evaluation_content`: evaluation content (JSON)
- - `evaluation_score`: overall automatability score
- - `evaluation_tags`: evaluation tags
- - `is_evaluated`: flag marking whether the paper has been evaluated
- - `evaluation_date`: time of evaluation
- - `created_at`, `updated_at`: timestamps
-
- **New database methods:**
- - `insert_paper()`: insert a new paper
- - `get_paper()`: fetch a single paper
- - `update_paper_evaluation()`: update evaluation content
- - `get_evaluated_papers()`: fetch evaluated papers
- - `get_unevaluated_papers()`: fetch unevaluated papers
- - `search_papers()`: search papers
- - `get_papers_count()`: fetch paper counts
-
- ### 2. Evaluator Changes (`src/agents/evaluator.py`)
-
- **`ConversationState` class:**
- - Added an `arxiv_id` field
-
- **`save_node` function:**
- - Now saves to the database instead of a JSON file
- - Automatically extracts score and tag information
- - Supports structured data storage
-
- **`run_evaluation` function:**
- - Added support for an `arxiv_id` parameter
-
- ### 3. API Changes (`app.py`)
-
- **Modified endpoints:**
- - `/api/evals`: lists evaluations from the database
- - `/api/has-eval/{paper_id}`: checks evaluation status in the database
- - `/api/eval/{paper_id}`: fetches evaluation content from the database
-
- **New endpoints:**
- - `/api/papers/status`: returns paper statistics
- - `/api/papers/insert`: inserts a new paper
- - `/api/papers/evaluate/{arxiv_id}`: evaluates a paper
-
- ### 4. CLI Changes (`src/cli/cli.py`)
-
- **New argument:**
- - `--arxiv-id`: the paper's arXiv ID
-
- **Enhancements:**
- - Evaluation results can be saved to the database
- - Backward compatible (results can still be saved to a file)
-
- ## Usage Examples
-
- ### 1. Evaluate a paper with the CLI and save it to the database
-
- ```bash
- # Evaluate a paper and save the result to the database
- python cli.py https://arxiv.org/pdf/2508.05629 --arxiv-id 2508.05629
-
- # Save to both a file and the database
- python cli.py https://arxiv.org/pdf/2508.05629 --arxiv-id 2508.05629 -o /path/to/output
- ```
-
- ### 2. Insert a paper through the API
-
- ```bash
- curl -X POST "http://localhost:8000/api/papers/insert" \
-   -H "Content-Type: application/json" \
-   -d '{
-     "arxiv_id": "2508.05629",
-     "title": "Your Paper Title",
-     "authors": "Author 1, Author 2",
-     "abstract": "Paper abstract...",
-     "categories": "cs.AI, cs.LG",
-     "published_date": "2024-08-01"
-   }'
- ```
-
- ### 3. Get evaluation statistics
-
- ```bash
- curl "http://localhost:8000/api/papers/status"
- ```
-
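- ## Database Advantages
-
- 1. **Structured storage**: paper metadata and evaluation content are kept separate, which simplifies management
- 2. **Status tracking**: the `is_evaluated` field tracks evaluation status
- 3. **Tag system**: evaluations can be tagged for classification and filtering
- 4. **Search**: papers can be searched by title, authors, or abstract
- 5. **Statistics**: paper statistics are easy to obtain
- 6. **API support**: a complete RESTful API
- 7. **Data integrity**: SQLite provides ACID guarantees
-
- ## Migration Notes
-
- 1. **Existing JSON files**: a script can be written to import existing JSON files into the database
- 2. **Database backups**: back up the database file regularly
- 3. **Backward compatibility**: the CLI tool still supports saving to a file
- 4. **Path configuration**: the database file path is configured in `configs/paper_agent.py`
-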
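- ## Test Verification
-
- A test script was created and run to verify all database functionality:
- - ✅ Paper insertion
- - ✅ Paper lookup
- - ✅ Evaluation updates
- - ✅ Status checks
- - ✅ Statistics
- - ✅ Search
-
- ## Suggested Next Steps
-
- 1. **Data migration**: write a script to import existing JSON files into the database
- 2. **Frontend update**: update the frontend to support the new database features
- 3. **Batch operations**: add batch paper insertion and evaluation
- 4. **Data export**: add a data export feature
- 5. **Performance**: add indexes to handle large volumes of data
-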
- ## File List
-
- **Modified files:**
- - `src/database/db.py` - database schema and operations
- - `src/agents/evaluator.py` - evaluator changes
- - `app.py` - API changes
- - `src/cli/cli.py` - CLI changes
-
- **New files:**
- - `DATABASE_USAGE.md` - usage guide
- - `DATABASE_MIGRATION_SUMMARY.md` - this summary
-
- **Configuration file:**
- - `configs/paper_agent.py` - database path configuration
-
- The system now fully supports database storage and can manage paper evaluation data more effectively.
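The search capability listed among the database advantages (matching on title, authors, or abstract) could look roughly like the sketch below. This is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module against an in-memory database, a cut-down `papers` table with just the searchable columns, and made-up sample rows. The standalone `search_papers` function here is hypothetical and is not the project's `db.search_papers()`, whose implementation is not shown in this commit.

```python
import sqlite3


def search_papers(conn, query):
    """Return arxiv_ids whose title, authors, or abstract contain `query`.

    Illustrative only: `%` and `_` inside `query` are not escaped, and
    SQLite's LIKE is case-insensitive for ASCII characters by default.
    """
    pattern = f"%{query}%"
    rows = conn.execute(
        "SELECT arxiv_id FROM papers "
        "WHERE title LIKE ? OR authors LIKE ? OR abstract LIKE ? "
        "ORDER BY arxiv_id",
        (pattern, pattern, pattern),
    )
    return [row[0] for row in rows]


# Hypothetical sample data, not taken from the project.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE papers ("
    "arxiv_id TEXT PRIMARY KEY, title TEXT, authors TEXT, abstract TEXT)"
)
conn.executemany(
    "INSERT INTO papers VALUES (?, ?, ?, ?)",
    [
        ("0001.00001", "Graph Agents", "A. Author", "Planning with tools."),
        ("0001.00002", "Vision Models", "B. Agentson", "Image classification."),
        ("0001.00003", "Speech", "C. Writer", "Audio."),
    ],
)
```

Passing the pattern as a bound parameter, rather than formatting it into the SQL string, keeps user-supplied search terms from being interpreted as SQL.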
DATABASE_USAGE.md DELETED
@@ -1,182 +0,0 @@
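- # Papers Database Usage Guide
-
- ## Overview
-
- The system now stores arXiv papers and their evaluations in a SQLite database rather than in JSON files. This makes paper data easier to manage and supports querying, statistics, and tag management.
-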
- ## Database Schema
-
- ### `papers` table
-
- | Column | Type | Description |
- |--------|------|-------------|
- | arxiv_id | TEXT PRIMARY KEY | arXiv paper ID |
- | title | TEXT NOT NULL | Paper title |
- | authors | TEXT NOT NULL | Author list |
- | abstract | TEXT | Paper abstract |
- | categories | TEXT | Paper categories |
- | published_date | TEXT | Publication date |
- | evaluation_content | TEXT | Evaluation content (JSON) |
- | evaluation_score | REAL | Overall automatability score |
- | evaluation_tags | TEXT | Evaluation tags |
- | is_evaluated | BOOLEAN | Whether the paper has been evaluated |
- | evaluation_date | TIMESTAMP | Evaluation date |
- | created_at | TIMESTAMP | Creation time |
- | updated_at | TIMESTAMP | Last update time |
-
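The column table above maps fairly directly onto a `CREATE TABLE` statement. The following is a minimal sketch of that mapping using the standard-library `sqlite3` module; the column names and types come from the table, while the `DEFAULT` clauses, the `SCHEMA` constant, and the `create_papers_table` helper are assumptions added for illustration and may differ from what `src/database/db.py` actually does.

```python
import sqlite3

# Assumed DDL reconstructed from the column table; the defaults are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    arxiv_id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    authors TEXT NOT NULL,
    abstract TEXT,
    categories TEXT,
    published_date TEXT,
    evaluation_content TEXT,
    evaluation_score REAL,
    evaluation_tags TEXT,
    is_evaluated BOOLEAN DEFAULT 0,
    evaluation_date TIMESTAMP,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""


def create_papers_table(conn):
    """Create the table and return its column names in declaration order."""
    conn.execute(SCHEMA)
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute("PRAGMA table_info(papers)")]


conn = sqlite3.connect(":memory:")
columns = create_papers_table(conn)
```

Because `title` and `authors` are declared `NOT NULL`, an insert that supplies only `arxiv_id` is rejected with `sqlite3.IntegrityError`, which matches the API's requirement that those three fields be present.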
- ## Usage
-
- ### 1. Insert a paper
-
- ```python
- from src.database.db import db
-
- # Insert a new paper
- db.insert_paper(
-     arxiv_id="2508.05629",
-     title="Your Paper Title",
-     authors="Author 1, Author 2",
-     abstract="Paper abstract...",
-     categories="cs.AI, cs.LG",
-     published_date="2024-08-01"
- )
- ```
-
- ### 2. Update an evaluation
-
- ```python
- # Update a paper's evaluation
- db.update_paper_evaluation(
-     arxiv_id="2508.05629",
-     evaluation_content='{"overall_automatability": 3, "three_year_feasibility": 75}',
-     evaluation_score=3.0,
-     evaluation_tags="3yr_feasibility:75%,overall_automatability:3/4"
- )
- ```
-
- ### 3. Query papers
-
- ```python
- # Fetch a single paper
- paper = db.get_paper("2508.05629")
-
- # Fetch all evaluated papers
- evaluated_papers = db.get_evaluated_papers()
-
- # Fetch all unevaluated papers
- unevaluated_papers = db.get_unevaluated_papers()
-
- # Search papers
- search_results = db.search_papers("AI")
- ```
-
- ### 4. Statistics
-
- ```python
- # Fetch paper counts
- count = db.get_papers_count()
- print(f"Total papers: {count['total']}")
- print(f"Evaluated: {count['evaluated']}")
- print(f"Unevaluated: {count['unevaluated']}")
- ```
-
- ## API Endpoints
-
- ### List evaluations
- ```
- GET /api/evals
- ```
-
- ### Check whether a paper has been evaluated
- ```
- GET /api/has-eval/{paper_id}
- ```
-
- ### Get a paper's evaluation
- ```
- GET /api/eval/{paper_id}
- ```
-
- ### Get paper statistics
- ```
- GET /api/papers/status
- ```
-
- ### Insert a new paper
- ```
- POST /api/papers/insert
- Content-Type: application/json
-
- {
-   "arxiv_id": "2508.05629",
-   "title": "Paper Title",
-   "authors": "Author 1, Author 2",
-   "abstract": "Abstract...",
-   "categories": "cs.AI",
-   "published_date": "2024-08-01"
- }
- ```
-
- ### Evaluate a paper
- ```
- POST /api/papers/evaluate/{arxiv_id}
- ```
-
- ## Using the CLI Tool
-
- ### Evaluate a paper and save it to the database
-
- ```bash
- # Pass --arxiv-id to save the evaluation to the database
- python cli.py https://arxiv.org/pdf/2508.05629 --arxiv-id 2508.05629
-
- # Save to both a file and the database
- python cli.py https://arxiv.org/pdf/2508.05629 --arxiv-id 2508.05629 -o /path/to/output
- ```
-
- ## Migrating Existing Data
-
- If you have existing JSON evaluation files, you can write a script to import them into the database:
-
- ```python
- import json
- import os
- from src.database.db import db
-
- def migrate_json_to_db(json_dir="workdir"):
-     """Import JSON evaluation files into the database."""
-     for filename in os.listdir(json_dir):
-         if filename.endswith('.json'):
-             filepath = os.path.join(json_dir, filename)
-             with open(filepath, 'r', encoding='utf-8') as f:
-                 data = json.load(f)
-
-             # Extract the arxiv_id (assumes the file name starts with it)
-             arxiv_id = filename.split('_')[0]  # Adjust to the actual file name format
-
-             # Update the evaluation stored in the database
-             if 'response' in data:
-                 db.update_paper_evaluation(
-                     arxiv_id=arxiv_id,
-                     evaluation_content=data['response'],
-                     evaluation_score=None,  # Needs to be parsed from the content
-                     evaluation_tags=None
-                 )
-                 print(f"Migrated {filename} for paper {arxiv_id}")
- ```
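The migration script above leaves `evaluation_score` and `evaluation_tags` as `None` because they have to be parsed out of the evaluation content. One possible way to derive them is sketched below. It assumes the content is a JSON object with the `overall_automatability` and `three_year_feasibility` keys shown in the earlier `update_paper_evaluation` example, and that the tag string follows the `3yr_feasibility:75%,overall_automatability:3/4` pattern from that same example (including the `/4` scale). The `summarize_evaluation` helper is hypothetical and not part of the project.

```python
import json
from typing import Optional, Tuple


def summarize_evaluation(content: str) -> Tuple[Optional[float], Optional[str]]:
    """Derive (score, tags) from an evaluation JSON string.

    Returns (None, None) when the content is not a JSON object.
    """
    try:
        data = json.loads(content)
    except (TypeError, ValueError):
        return None, None
    if not isinstance(data, dict):
        return None, None

    overall = data.get("overall_automatability")
    feasibility = data.get("three_year_feasibility")

    score = float(overall) if isinstance(overall, (int, float)) else None
    tags = []
    if isinstance(feasibility, (int, float)):
        tags.append(f"3yr_feasibility:{feasibility:g}%")
    if score is not None:
        # The "/4" denominator is an assumption taken from the example tag.
        tags.append(f"overall_automatability:{score:g}/4")
    return score, (",".join(tags) or None)
```

In a migration loop, the two returned values could be passed as `evaluation_score` and `evaluation_tags` in place of `None`, leaving both empty for files whose content does not parse.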
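-
- ## Advantages
-
- 1. **Structured storage**: paper metadata and evaluation content are stored separately, which simplifies querying
- 2. **Tag system**: evaluations can be tagged for classification and filtering
- 3. **Statistics**: paper statistics are easy to obtain
- 4. **Search**: papers can be searched by title, authors, or abstract
- 5. **Status management**: the `is_evaluated` field tracks evaluation status
- 6. **API support**: a complete RESTful API is provided
-
- ## Notes
-
- 1. Insert a paper's basic information before evaluating it
- 2. Store evaluation content as JSON so that it is easy to parse and display
- 3. Back up the database file regularly
- 4. The `evaluation_tags` field can hold key scoring information for quick filtering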
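Quick filtering on `evaluation_tags`, as suggested in the last note, requires splitting the tag string back into key/value pairs. The sketch below shows one way to do that, assuming the comma-separated `key:value` format from the `update_paper_evaluation` example. Both `parse_tags` and `feasibility_at_least` are hypothetical helpers introduced here for illustration.

```python
def parse_tags(tags):
    """Split 'key:value,key:value' into a dict; entries without ':' are skipped."""
    result = {}
    for item in (tags or "").split(","):
        key, sep, value = item.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


def feasibility_at_least(tags, threshold):
    """Return True when the 3yr_feasibility tag is a percentage >= threshold."""
    raw = parse_tags(tags).get("3yr_feasibility", "")
    try:
        return float(raw.rstrip("%")) >= threshold
    except ValueError:
        # Missing or malformed tag: treat the paper as not matching.
        return False
```

Filtering in Python like this is fine for small result sets; for larger tables, storing the numeric values in their own columns would let SQLite do the comparison.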
PROJECT_STRUCTURE.md DELETED
@@ -1,87 +0,0 @@
- # PaperIndex Project Structure
-
- ## Directory Layout
-
- ```
- paperindex/
- ├── app.py                 # Main application entry point
- ├── cli.py                 # Command-line tool entry point
- ├── src/                   # Source code directory
- │   ├── __init__.py
- │   ├── app.py             # Internal application entry point (deprecated)
- │   ├── agents/            # AI agent module
- │   │   ├── __init__.py
- │   │   ├── evaluator.py   # Paper evaluator
- │   │   └── prompt.py      # Evaluation prompts
- │   ├── database/          # Database module
- │   │   ├── __init__.py
- │   │   ├── models.py      # Database models and classes
- │   │   └── papers_cache.db
- │   ├── server/            # Server module
- │   │   ├── __init__.py
- │   │   └── server.py      # FastAPI server
- │   └── cli/               # Command-line tool module
- │       ├── __init__.py
- │       └── cli.py         # CLI implementation
- ├── frontend/              # Frontend files
- │   ├── index.html
- │   ├── paper.html
- │   ├── main.js
- │   ├── paper.js
- │   └── styles.css
- ├── data/                  # Data directory
- │   └── pdfs/
- ├── workdir/               # Working directory
- ├── requirements.txt       # Python dependencies
- ├── Dockerfile             # Docker configuration
- └── README.md              # Project overview
- ```
-
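- ## Module Descriptions
-
- ### `src/agents/`
- AI agent module, responsible for paper evaluation:
- - `evaluator.py`: evaluates papers using LangGraph and the Claude API
- - `prompt.py`: contains the evaluation prompts and tool definitions
-
- ### `src/database/`
- Database management module:
- - `models.py`: contains the `PapersDatabase` class and database operations
- - Contains the SQLite database file
- - Handles paper caching and state management
-
- ### `src/server/`
- FastAPI server module:
- - `server.py`: the main web server implementation
- - Provides the RESTful API
- - Handles frontend requests
-
- ### `src/cli/`
- Command-line tool module:
- - `cli.py`: standalone command-line tool for paper evaluation
- - Supports evaluating local PDFs and online URLs
-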
- ## Usage
-
- ### Start the Web Application
- ```bash
- python app.py
- ```
-
- ### Use the Command-Line Tool
- ```bash
- python cli.py <pdf_path_or_url> [options]
- ```
-
- ### Development Mode
- ```bash
- # Run from the src directory
- cd src
- python -m uvicorn server.server:app --reload --host 0.0.0.0 --port 8000
- ```
-
- ## Import Paths
-
- - From the repository root: `from src.agents.evaluator import Evaluator`
- - From inside the `src` directory: `from agents.evaluator import Evaluator`
- - Imports between modules may use either relative or absolute paths
app.py CHANGED
@@ -25,7 +25,6 @@ from src.database import db
25
  from src.logger import logger
26
  from src.config import config
27
  from src.crawl import HuggingFaceDailyPapers
28
- from src.utils import assemble_project_path
29
  from src.agents.evaluator import run_evaluation
30
 
31
  app = FastAPI(title="PaperAgent")
@@ -67,8 +66,8 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
67
  hf_daily = HuggingFaceDailyPapers()
68
 
69
  # First, check if we have fresh cache for the requested date
70
- cached_data = db.get_cached_papers(target_date)
71
- if cached_data and db.is_cache_fresh(target_date):
72
  print(f"Using cached data for {target_date}")
73
  return {
74
  "date": target_date,
@@ -91,8 +90,8 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
91
  print(f"Redirected from {target_date} to {actual_date}")
92
 
93
  # Check if the redirected date has fresh cache
94
- cached_data = db.get_cached_papers(actual_date)
95
- if cached_data and db.is_cache_fresh(actual_date):
96
  print(f"Using cached data for redirected date {actual_date}")
97
  return {
98
  "date": actual_date,
@@ -108,7 +107,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
108
  enriched_cards = await enrich_cards(cards)
109
 
110
  # Cache the results for the redirected date
111
- db.cache_papers(actual_date, html, enriched_cards)
112
 
113
  return {
114
  "date": actual_date,
@@ -121,7 +120,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
121
  # If we got the exact date we requested, process normally
122
  cards = hf_daily.parse_daily_cards(html)
123
  enriched_cards = await enrich_cards(cards)
124
- db.cache_papers(actual_date, html, enriched_cards)
125
 
126
  return {
127
  "date": actual_date,
@@ -134,7 +133,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
134
  except Exception as e:
135
  print(f"Failed to fetch {target_date} for previous navigation: {e}")
136
  # Fallback to cached data if available
137
- cached_data = db.get_cached_papers(target_date)
138
  if cached_data:
139
  return {
140
  "date": target_date,
@@ -157,7 +156,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
157
  if actual_date == target_date:
158
  cards = hf_daily.parse_daily_cards(html)
159
  enriched_cards = await enrich_cards(cards)
160
- db.cache_papers(actual_date, html, enriched_cards)
161
 
162
  return {
163
  "date": actual_date,
@@ -174,8 +173,8 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
174
  # Try to find the next available date by incrementing
175
  next_date = await find_next_available_date_forward(target_date)
176
  if next_date:
177
- cached_data = db.get_cached_papers(next_date)
178
- if cached_data and db.is_cache_fresh(next_date):
179
  print(f"Using cached data for next available date {next_date}")
180
  return {
181
  "date": next_date,
@@ -190,7 +189,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
190
  actual_date, html = await hf_daily.fetch_daily_html(next_date)
191
  cards = hf_daily.parse_daily_cards(html)
192
  enriched_cards = await enrich_cards(cards)
193
- db.cache_papers(actual_date, html, enriched_cards)
194
 
195
  return {
196
  "date": actual_date,
@@ -214,7 +213,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
214
  # Try to find next available date
215
  next_date = await find_next_available_date_forward(target_date)
216
  if next_date:
217
- cached_data = db.get_cached_papers(next_date)
218
  if cached_data:
219
  return {
220
  "date": next_date,
@@ -239,8 +238,8 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
239
  print(f"Redirected from {target_date} to {actual_date}")
240
 
241
  # Check if the redirected date has fresh cache
242
- cached_data = db.get_cached_papers(actual_date)
243
- if cached_data and db.is_cache_fresh(actual_date):
244
  print(f"Using cached data for redirected date {actual_date}")
245
  return {
246
  "date": actual_date,
@@ -256,7 +255,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
256
  enriched_cards = await enrich_cards(cards)
257
 
258
  # Cache the results for the redirected date
259
- db.cache_papers(actual_date, html, enriched_cards)
260
 
261
  return {
262
  "date": actual_date,
@@ -269,7 +268,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
269
  # If we got the exact date we requested, process normally
270
  cards = hf_daily.parse_daily_cards(html)
271
  enriched_cards = await enrich_cards(cards)
272
- db.cache_papers(actual_date, html, enriched_cards)
273
 
274
  return {
275
  "date": actual_date,
@@ -283,7 +282,7 @@ async def get_daily(date_str: Optional[str] = None, direction: Optional[str] = N
283
  print(f"Failed to fetch {target_date}: {e}")
284
 
285
  # If everything fails, return cached data if available
286
- cached_data = db.get_cached_papers(target_date)
287
  if cached_data:
288
  return {
289
  "date": target_date,
@@ -309,7 +308,7 @@ async def find_next_available_date_forward(start_date: str, max_attempts: int =
309
  date_str = current_date.strftime("%Y-%m-%d")
310
 
311
  # Check if we have cache for this date
312
- cached_data = db.get_cached_papers(date_str)
313
  if cached_data:
314
  return date_str
315
 
@@ -338,7 +337,7 @@ async def enrich_cards(cards):
338
  for c in cards:
339
  arxiv_id = c.get("arxiv_id")
340
  if arxiv_id:
341
- paper = db.get_paper(arxiv_id)
342
  if paper:
343
  # Add evaluation status
344
  c["has_eval"] = paper.get('is_evaluated', False)
@@ -369,9 +368,9 @@ async def enrich_cards(cards):
369
 
370
 
371
  @app.get("/api/evals")
372
- def list_evals() -> Dict[str, Any]:
373
  # Get evaluated papers from database
374
- evaluated_papers = db.get_evaluated_papers()
375
  items: List[Dict[str, Any]] = []
376
 
377
  for paper in evaluated_papers:
@@ -388,16 +387,16 @@ def list_evals() -> Dict[str, Any]:
388
 
389
 
390
  @app.get("/api/has-eval/{paper_id}")
391
- def has_eval(paper_id: str) -> Dict[str, bool]:
392
- paper = db.get_paper(paper_id)
393
  exists = paper is not None and paper.get('is_evaluated', False)
394
  return {"exists": exists}
395
 
396
 
397
  @app.get("/api/paper/{paper_id}")
398
- def get_paper_details(paper_id: str) -> Dict[str, Any]:
399
  """Get detailed paper information from database"""
400
- paper = db.get_paper(paper_id)
401
  if not paper:
402
  raise HTTPException(status_code=404, detail="Paper not found")
403
 
@@ -416,8 +415,8 @@ def get_paper_details(paper_id: str) -> Dict[str, Any]:
416
 
417
 
418
  @app.get("/api/paper-score/{paper_id}")
419
- def get_paper_score(paper_id: str) -> Dict[str, Any]:
420
- paper = db.get_paper(paper_id)
421
  print(f"Paper data for {paper_id}:", paper)
422
 
423
  if not paper or not paper.get('is_evaluated', False):
@@ -468,8 +467,8 @@ def get_paper_score(paper_id: str) -> Dict[str, Any]:
468
 
469
 
470
  @app.get("/api/eval/{paper_id}")
471
- def get_eval(paper_id: str) -> Any:
472
- paper = db.get_paper(paper_id)
473
  if not paper or not paper.get('is_evaluated', False):
474
  raise HTTPException(status_code=404, detail="Evaluation not found")
475
 
@@ -491,12 +490,13 @@ def get_eval(paper_id: str) -> Any:
491
 
492
 
493
  @app.get("/api/available-dates")
494
- def get_available_dates() -> Dict[str, Any]:
495
  """Get list of available dates in the cache"""
496
- with db.get_connection() as conn:
497
- cursor = conn.cursor()
498
- cursor.execute('SELECT date_str FROM papers_cache ORDER BY date_str DESC LIMIT 30')
499
- dates = [row['date_str'] for row in cursor.fetchall()]
 
500
 
501
  return {
502
  "available_dates": dates,
@@ -505,21 +505,21 @@ def get_available_dates() -> Dict[str, Any]:
505
 
506
 
507
  @app.get("/api/cache/status")
508
- def get_cache_status() -> Dict[str, Any]:
509
  """Get cache status and statistics"""
510
- with db.get_connection() as conn:
511
- cursor = conn.cursor()
512
 
513
  # Get total cached dates
514
- cursor.execute('SELECT COUNT(*) as count FROM papers_cache')
515
- total_cached = cursor.fetchone()['count']
516
 
517
  # Get latest cached date
518
- cursor.execute('SELECT date_str, updated_at FROM latest_date WHERE id = 1')
519
- latest_info = cursor.fetchone()
520
 
521
  # Get cache age distribution
522
- cursor.execute('''
523
  SELECT
524
  CASE
525
  WHEN updated_at > datetime('now', '-1 hour') THEN '1 hour'
@@ -531,7 +531,8 @@ def get_cache_status() -> Dict[str, Any]:
531
  FROM papers_cache
532
  GROUP BY age_group
533
  ''')
534
- age_distribution = {row['age_group']: row['count'] for row in cursor.fetchall()}
 
535
 
536
  return {
537
  "total_cached_dates": total_cached,
@@ -542,12 +543,12 @@ def get_cache_status() -> Dict[str, Any]:
542
 
543
 
544
  @app.get("/api/papers/status")
545
- def get_papers_status() -> Dict[str, Any]:
546
  """Get papers database status and statistics"""
547
- papers_count = db.get_papers_count()
548
 
549
  # Get recent evaluations
550
- recent_papers = db.get_evaluated_papers()
551
  recent_evaluations = []
552
  for paper in recent_papers[:10]: # Get last 10 evaluations
553
  recent_evaluations.append({
@@ -564,7 +565,7 @@ def get_papers_status() -> Dict[str, Any]:
564
 
565
 
566
  @app.post("/api/papers/insert")
567
- def insert_paper(paper_data: Dict[str, Any]) -> Dict[str, Any]:
568
  """Insert a new paper into the database"""
569
  try:
570
  required_fields = ['arxiv_id', 'title', 'authors']
@@ -572,7 +573,7 @@ def insert_paper(paper_data: Dict[str, Any]) -> Dict[str, Any]:
572
  if field not in paper_data:
573
  raise HTTPException(status_code=400, detail=f"Missing required field: {field}")
574
 
575
- db.insert_paper(
576
  arxiv_id=paper_data['arxiv_id'],
577
  title=paper_data['title'],
578
  authors=paper_data['authors'],
@@ -586,19 +587,26 @@ def insert_paper(paper_data: Dict[str, Any]) -> Dict[str, Any]:
586
  raise HTTPException(status_code=500, detail=f"Failed to insert paper: {str(e)}")
587
 
588
 
 
 
 
589
  @app.post("/api/papers/evaluate/{arxiv_id}")
590
- async def evaluate_paper(arxiv_id: str) -> Dict[str, Any]:
591
  """Evaluate a paper by its arxiv_id"""
592
  try:
593
  # Check if paper exists in database
594
- paper = db.get_paper(arxiv_id)
595
  if not paper:
596
  raise HTTPException(status_code=404, detail="Paper not found in database")
597
 
598
- # Check if already evaluated
599
- if paper.get('is_evaluated', False):
600
  return {"message": f"Paper {arxiv_id} already evaluated", "status": "already_evaluated"}
601
 
 
 
 
 
602
  # Create PDF URL from arxiv_id
603
  pdf_url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"
604
 
@@ -606,8 +614,8 @@ async def evaluate_paper(arxiv_id: str) -> Dict[str, Any]:
606
  async def run_eval():
607
  try:
608
  # Update paper status to "evaluating"
609
- db.update_paper_status(arxiv_id, "evaluating")
610
- logger.info(f"Started evaluation for {arxiv_id}")
611
 
612
  result = await run_evaluation(
613
  pdf_path=pdf_url,
@@ -616,40 +624,51 @@ async def evaluate_paper(arxiv_id: str) -> Dict[str, Any]:
616
  )
617
 
618
  # Update paper status to "completed"
619
- db.update_paper_status(arxiv_id, "completed")
620
- logger.info(f"Evaluation completed for {arxiv_id}")
621
  except Exception as e:
622
  # Update paper status to "failed"
623
- db.update_paper_status(arxiv_id, "failed")
624
- logger.error(f"Evaluation failed for {arxiv_id}: {str(e)}")
 
 
 
 
625
 
626
- # Start evaluation in background
627
- asyncio.create_task(run_eval())
 
628
 
629
  return {
630
- "message": f"Evaluation started for paper {arxiv_id}",
631
  "status": "started",
632
- "pdf_url": pdf_url
 
 
633
  }
634
  except Exception as e:
635
  raise HTTPException(status_code=500, detail=f"Failed to evaluate paper: {str(e)}")
636
 
637
 
638
  @app.get("/api/papers/evaluate/{arxiv_id}/status")
639
- def get_evaluation_status(arxiv_id: str) -> Dict[str, Any]:
640
  """Get evaluation status for a paper"""
641
  try:
642
- paper = db.get_paper(arxiv_id)
643
  if not paper:
644
  raise HTTPException(status_code=404, detail="Paper not found")
645
 
646
  status = paper.get('evaluation_status', 'not_started')
647
  is_evaluated = paper.get('is_evaluated', False)
648
 
 
 
 
649
  return {
650
  "arxiv_id": arxiv_id,
651
  "status": status,
652
  "is_evaluated": is_evaluated,
 
653
  "evaluation_date": paper.get('evaluation_date'),
654
  "evaluation_score": paper.get('evaluation_score')
655
  }
@@ -657,13 +676,88 @@ def get_evaluation_status(arxiv_id: str) -> Dict[str, Any]:
657
  raise HTTPException(status_code=500, detail=f"Failed to get evaluation status: {str(e)}")
658
 
659
 
660
  @app.post("/api/cache/clear")
661
- def clear_cache() -> Dict[str, str]:
662
  """Clear all cached data"""
663
- with db.get_connection() as conn:
664
- cursor = conn.cursor()
665
- cursor.execute('DELETE FROM papers_cache')
666
- conn.commit()
667
  return {"message": "Cache cleared successfully"}
668
 
669
 
@@ -679,7 +773,7 @@ async def refresh_cache(date_str: str) -> Dict[str, Any]:
679
  cards = hf_daily.parse_daily_cards(html)
680
 
681
  # Cache the results
682
- db.cache_papers(actual_date, html, cards)
683
 
684
  return {
685
  "message": f"Cache refreshed for {actual_date}",
@@ -711,7 +805,7 @@ async def get_styles():
711
  response.headers["Expires"] = "0"
712
  return response
713
 
714
- if __name__ == "__main__":
715
  # Parse command line arguments
716
  args = parse_args()
717
 
@@ -724,7 +818,7 @@ if __name__ == "__main__":
724
  logger.info(f"| Config:\n{config.pretty_text}")
725
 
726
  # Initialize the database
727
- db.init_db(config=config)
728
  logger.info(f"| Database initialized at: {config.db_path}")
729
 
730
  # Load Frontend
@@ -733,5 +827,9 @@ if __name__ == "__main__":
733
  logger.info(f"| Frontend initialized at: {config.frontend_path}")
734
 
735
  # Use port 7860 for Hugging Face Spaces, fallback to 7860 for local development
736
- port = int(os.environ.get("PORT", 7860))
737
- uvicorn.run(app, host="0.0.0.0", port=port)
 
 
 
 
 
25
  from src.logger import logger
26
  from src.config import config
27
  from src.crawl import HuggingFaceDailyPapers
 
28
  from src.agents.evaluator import run_evaluation
29
 
30
  app = FastAPI(title="PaperAgent")
 
66
  hf_daily = HuggingFaceDailyPapers()
67
 
68
  # First, check if we have fresh cache for the requested date
69
+ cached_data = await db.get_cached_papers(target_date)
70
+ if cached_data and await db.is_cache_fresh(target_date):
71
  print(f"Using cached data for {target_date}")
72
  return {
73
  "date": target_date,
 
90
  print(f"Redirected from {target_date} to {actual_date}")
91
 
92
  # Check if the redirected date has fresh cache
93
+ cached_data = await db.get_cached_papers(actual_date)
94
+ if cached_data and await db.is_cache_fresh(actual_date):
95
  print(f"Using cached data for redirected date {actual_date}")
96
  return {
97
  "date": actual_date,
 
107
  enriched_cards = await enrich_cards(cards)
108
 
109
  # Cache the results for the redirected date
110
+ await db.cache_papers(actual_date, html, enriched_cards)
111
 
112
  return {
113
  "date": actual_date,
 
120
  # If we got the exact date we requested, process normally
121
  cards = hf_daily.parse_daily_cards(html)
122
  enriched_cards = await enrich_cards(cards)
123
+ await db.cache_papers(actual_date, html, enriched_cards)
124
 
125
  return {
126
  "date": actual_date,
 
133
  except Exception as e:
134
  print(f"Failed to fetch {target_date} for previous navigation: {e}")
135
  # Fallback to cached data if available
136
+ cached_data = await db.get_cached_papers(target_date)
137
  if cached_data:
138
  return {
139
  "date": target_date,
 
156
  if actual_date == target_date:
157
  cards = hf_daily.parse_daily_cards(html)
158
  enriched_cards = await enrich_cards(cards)
159
+ await db.cache_papers(actual_date, html, enriched_cards)
160
 
161
  return {
162
  "date": actual_date,
 
173
  # Try to find the next available date by incrementing
174
  next_date = await find_next_available_date_forward(target_date)
175
  if next_date:
176
+ cached_data = await db.get_cached_papers(next_date)
177
+ if cached_data and await db.is_cache_fresh(next_date):
178
  print(f"Using cached data for next available date {next_date}")
179
  return {
180
  "date": next_date,
 
189
  actual_date, html = await hf_daily.fetch_daily_html(next_date)
190
  cards = hf_daily.parse_daily_cards(html)
191
  enriched_cards = await enrich_cards(cards)
192
+ await db.cache_papers(actual_date, html, enriched_cards)
193
 
194
  return {
195
  "date": actual_date,
 
213
  # Try to find next available date
214
  next_date = await find_next_available_date_forward(target_date)
215
  if next_date:
216
+ cached_data = await db.get_cached_papers(next_date)
217
  if cached_data:
218
  return {
219
  "date": next_date,
 
238
  print(f"Redirected from {target_date} to {actual_date}")
239
 
240
  # Check if the redirected date has fresh cache
241
+ cached_data = await db.get_cached_papers(actual_date)
242
+ if cached_data and await db.is_cache_fresh(actual_date):
243
  print(f"Using cached data for redirected date {actual_date}")
244
  return {
245
  "date": actual_date,
 
255
  enriched_cards = await enrich_cards(cards)
256
 
257
  # Cache the results for the redirected date
258
+ await db.cache_papers(actual_date, html, enriched_cards)
259
 
260
  return {
261
  "date": actual_date,
 
268
  # If we got the exact date we requested, process normally
269
  cards = hf_daily.parse_daily_cards(html)
270
  enriched_cards = await enrich_cards(cards)
271
+ await db.cache_papers(actual_date, html, enriched_cards)
272
 
273
  return {
274
  "date": actual_date,
 
282
  print(f"Failed to fetch {target_date}: {e}")
283
 
284
  # If everything fails, return cached data if available
285
+ cached_data = await db.get_cached_papers(target_date)
286
  if cached_data:
287
  return {
288
  "date": target_date,
 
308
  date_str = current_date.strftime("%Y-%m-%d")
309
 
310
  # Check if we have cache for this date
311
+ cached_data = await db.get_cached_papers(date_str)
312
  if cached_data:
313
  return date_str
314
 
 
337
  for c in cards:
338
  arxiv_id = c.get("arxiv_id")
339
  if arxiv_id:
340
+ paper = await db.get_paper(arxiv_id)
341
  if paper:
342
  # Add evaluation status
343
  c["has_eval"] = paper.get('is_evaluated', False)
 
368
 
369
 
370
  @app.get("/api/evals")
371
+ async def list_evals() -> Dict[str, Any]:
372
  # Get evaluated papers from database
373
+ evaluated_papers = await db.get_evaluated_papers()
374
  items: List[Dict[str, Any]] = []
375
 
376
  for paper in evaluated_papers:
 
387
 
388
 
389
  @app.get("/api/has-eval/{paper_id}")
390
+ async def has_eval(paper_id: str) -> Dict[str, bool]:
391
+ paper = await db.get_paper(paper_id)
392
  exists = paper is not None and paper.get('is_evaluated', False)
393
  return {"exists": exists}
394
 
395
 
396
  @app.get("/api/paper/{paper_id}")
397
+ async def get_paper_details(paper_id: str) -> Dict[str, Any]:
398
  """Get detailed paper information from database"""
399
+ paper = await db.get_paper(paper_id)
400
  if not paper:
401
  raise HTTPException(status_code=404, detail="Paper not found")
402
 
 
415
 
416
 
417
  @app.get("/api/paper-score/{paper_id}")
418
+ async def get_paper_score(paper_id: str) -> Dict[str, Any]:
419
+ paper = await db.get_paper(paper_id)
420
  print(f"Paper data for {paper_id}:", paper)
421
 
422
  if not paper or not paper.get('is_evaluated', False):
 
467
 
468
 
469
  @app.get("/api/eval/{paper_id}")
470
+ async def get_eval(paper_id: str) -> Any:
471
+ paper = await db.get_paper(paper_id)
472
  if not paper or not paper.get('is_evaluated', False):
473
  raise HTTPException(status_code=404, detail="Evaluation not found")
474
 
 
490
 
491
 
492
  @app.get("/api/available-dates")
493
+ async def get_available_dates() -> Dict[str, Any]:
494
  """Get list of available dates in the cache"""
495
+ async with db.get_connection() as conn:
496
+ cursor = await conn.cursor()
497
+ await cursor.execute('SELECT date_str FROM papers_cache ORDER BY date_str DESC LIMIT 30')
498
+ rows = await cursor.fetchall()
499
+ dates = [row['date_str'] for row in rows]
500
 
501
  return {
502
  "available_dates": dates,
 
505
 
506
 
507
  @app.get("/api/cache/status")
508
+ async def get_cache_status() -> Dict[str, Any]:
509
  """Get cache status and statistics"""
510
+ async with db.get_connection() as conn:
511
+ cursor = await conn.cursor()
512
 
513
  # Get total cached dates
514
+ await cursor.execute('SELECT COUNT(*) as count FROM papers_cache')
515
+ total_cached = (await cursor.fetchone())['count']
516
 
517
  # Get latest cached date
518
+ await cursor.execute('SELECT date_str, updated_at FROM latest_date WHERE id = 1')
519
+ latest_info = await cursor.fetchone()
520
 
521
  # Get cache age distribution
522
+ await cursor.execute('''
523
  SELECT
524
  CASE
525
  WHEN updated_at > datetime('now', '-1 hour') THEN '1 hour'
 
531
  FROM papers_cache
532
  GROUP BY age_group
533
  ''')
534
+ rows = await cursor.fetchall()
535
+ age_distribution = {row['age_group']: row['count'] for row in rows}
536
 
537
  return {
538
  "total_cached_dates": total_cached,
 
543
 
544
 
545
  @app.get("/api/papers/status")
546
+ async def get_papers_status() -> Dict[str, Any]:
547
  """Get papers database status and statistics"""
548
+ papers_count = await db.get_papers_count()
549
 
550
  # Get recent evaluations
551
+ recent_papers = await db.get_evaluated_papers()
552
  recent_evaluations = []
553
  for paper in recent_papers[:10]: # Get last 10 evaluations
554
  recent_evaluations.append({
 
565
 
566
 
567
  @app.post("/api/papers/insert")
568
+ async def insert_paper(paper_data: Dict[str, Any]) -> Dict[str, Any]:
569
  """Insert a new paper into the database"""
570
  try:
571
  required_fields = ['arxiv_id', 'title', 'authors']
 
573
  if field not in paper_data:
574
  raise HTTPException(status_code=400, detail=f"Missing required field: {field}")
575
 
576
+ await db.insert_paper(
577
  arxiv_id=paper_data['arxiv_id'],
578
  title=paper_data['title'],
579
  authors=paper_data['authors'],
 
587
  raise HTTPException(status_code=500, detail=f"Failed to insert paper: {str(e)}")
588
 
589
 
590
+ # Global task tracker for concurrent evaluations
591
+ evaluation_tasks = {}
592
+
593
  @app.post("/api/papers/evaluate/{arxiv_id}")
594
+ async def evaluate_paper(arxiv_id: str, force_reevaluate: bool = False) -> Dict[str, Any]:
595
  """Evaluate a paper by its arxiv_id"""
596
  try:
597
  # Check if paper exists in database
598
+ paper = await db.get_paper(arxiv_id)
599
  if not paper:
600
  raise HTTPException(status_code=404, detail="Paper not found in database")
601
 
602
+ # Check if already evaluated (unless force_reevaluate is True)
603
+ if not force_reevaluate and paper.get('is_evaluated', False):
604
  return {"message": f"Paper {arxiv_id} already evaluated", "status": "already_evaluated"}
605
 
606
+ # Check if evaluation is already running
607
+ if arxiv_id in evaluation_tasks and not evaluation_tasks[arxiv_id].done():
608
+ return {"message": f"Evaluation already running for {arxiv_id}", "status": "already_running"}
609
+
610
  # Create PDF URL from arxiv_id
611
  pdf_url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"
612
 
 
614
  async def run_eval():
615
  try:
616
  # Update paper status to "evaluating"
617
+ await db.update_paper_status(arxiv_id, "evaluating")
618
+ logger.info(f"Started {'re-' if force_reevaluate else ''}evaluation for {arxiv_id}")
619
 
620
  result = await run_evaluation(
621
  pdf_path=pdf_url,
 
624
  )
625
 
626
  # Update paper status to "completed"
627
+ await db.update_paper_status(arxiv_id, "completed")
628
+ logger.info(f"{'Re-' if force_reevaluate else ''}evaluation completed for {arxiv_id}")
629
  except Exception as e:
630
  # Update paper status to "failed"
631
+ await db.update_paper_status(arxiv_id, "failed")
632
+ logger.error(f"{'Re-' if force_reevaluate else ''}evaluation failed for {arxiv_id}: {str(e)}")
633
+ finally:
634
+ # Clean up task from tracker
635
+ if arxiv_id in evaluation_tasks:
636
+ del evaluation_tasks[arxiv_id]
637
 
638
+ # Start evaluation in background and track it
639
+ task = asyncio.create_task(run_eval())
640
+ evaluation_tasks[arxiv_id] = task
641
 
642
  return {
643
+ "message": f"{'Re-' if force_reevaluate else ''}evaluation started for paper {arxiv_id}",
644
  "status": "started",
645
+ "pdf_url": pdf_url,
646
+ "concurrent_tasks": len(evaluation_tasks),
647
+ "is_reevaluate": force_reevaluate
648
  }
649
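  except HTTPException:
  # Re-raise deliberate HTTP errors (such as the 404 above) instead of masking them as 500
  raise
  except Exception as e: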
650
  raise HTTPException(status_code=500, detail=f"Failed to evaluate paper: {str(e)}")
651
 
652
 
653
  @app.get("/api/papers/evaluate/{arxiv_id}/status")
654
+ async def get_evaluation_status(arxiv_id: str) -> Dict[str, Any]:
655
  """Get evaluation status for a paper"""
656
  try:
657
+ paper = await db.get_paper(arxiv_id)
658
  if not paper:
659
  raise HTTPException(status_code=404, detail="Paper not found")
660
 
661
  status = paper.get('evaluation_status', 'not_started')
662
  is_evaluated = paper.get('is_evaluated', False)
663
 
664
+ # Check if task is currently running
665
+ is_running = arxiv_id in evaluation_tasks and not evaluation_tasks[arxiv_id].done()
666
+
667
  return {
668
  "arxiv_id": arxiv_id,
669
  "status": status,
670
  "is_evaluated": is_evaluated,
671
+ "is_running": is_running,
672
  "evaluation_date": paper.get('evaluation_date'),
673
  "evaluation_score": paper.get('evaluation_score')
674
  }
 
676
  raise HTTPException(status_code=500, detail=f"Failed to get evaluation status: {str(e)}")
677
 
678
 
679
+ @app.post("/api/papers/reevaluate/{arxiv_id}")
680
+ async def reevaluate_paper(arxiv_id: str) -> Dict[str, Any]:
681
+ """Re-evaluate a paper by its arxiv_id"""
682
+ try:
683
+ # Check if paper exists in database
684
+ paper = await db.get_paper(arxiv_id)
685
+ if not paper:
686
+ raise HTTPException(status_code=404, detail="Paper not found in database")
687
+
688
+ # Check if evaluation is already running
689
+ if arxiv_id in evaluation_tasks and not evaluation_tasks[arxiv_id].done():
690
+ return {"message": f"Evaluation already running for {arxiv_id}", "status": "already_running"}
691
+
692
+ # Create PDF URL from arxiv_id
693
+ pdf_url = f"https://arxiv.org/pdf/{arxiv_id}.pdf"
694
+
695
+ # Run re-evaluation in background task
696
+ async def run_reeval():
697
+ try:
698
+ # Update paper status to "evaluating"
699
+ await db.update_paper_status(arxiv_id, "evaluating")
700
+ logger.info(f"Started re-evaluation for {arxiv_id}")
701
+
702
+ result = await run_evaluation(
703
+ pdf_path=pdf_url,
704
+ arxiv_id=arxiv_id,
705
+ api_key=os.getenv("ANTHROPIC_API_KEY")
706
+ )
707
+
708
+ # Update paper status to "completed"
709
+ await db.update_paper_status(arxiv_id, "completed")
710
+ logger.info(f"Re-evaluation completed for {arxiv_id}")
711
+ except Exception as e:
712
+ # Update paper status to "failed"
713
+ await db.update_paper_status(arxiv_id, "failed")
714
+ logger.error(f"Re-evaluation failed for {arxiv_id}: {str(e)}")
715
+ finally:
716
+ # Clean up task from tracker
717
+ if arxiv_id in evaluation_tasks:
718
+ del evaluation_tasks[arxiv_id]
719
+
720
+ # Start re-evaluation in background and track it
721
+ task = asyncio.create_task(run_reeval())
722
+ evaluation_tasks[arxiv_id] = task
723
+
724
+ return {
725
+ "message": f"Re-evaluation started for paper {arxiv_id}",
726
+ "status": "started",
727
+ "pdf_url": pdf_url,
728
+ "concurrent_tasks": len(evaluation_tasks),
729
+ "is_reevaluate": True
730
+ }
731
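+ except HTTPException:
+ # Re-raise deliberate HTTP errors (such as the 404 above) instead of masking them as 500
+ raise
+ except Exception as e: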
732
+ raise HTTPException(status_code=500, detail=f"Failed to re-evaluate paper: {str(e)}")
733
+
734
+
735
+ @app.get("/api/papers/evaluate/active-tasks")
736
+ async def get_active_evaluation_tasks() -> Dict[str, Any]:
737
+ """Get list of currently running evaluation tasks"""
738
+ active_tasks = {}
739
+ for arxiv_id, task in evaluation_tasks.items():
740
+ if not task.done():
741
+ active_tasks[arxiv_id] = {
742
+ "status": "running",
743
+ "done": task.done(),
744
+ "cancelled": task.cancelled()
745
+ }
746
+
747
+ return {
748
+ "active_tasks": active_tasks,
749
+ "total_active": len(active_tasks),
750
+ "total_tracked": len(evaluation_tasks)
751
+ }
752
+
753
+
754
  @app.post("/api/cache/clear")
755
+ async def clear_cache() -> Dict[str, str]:
756
  """Clear all cached data"""
757
+ async with db.get_connection() as conn:
758
+ cursor = await conn.cursor()
759
+ await cursor.execute('DELETE FROM papers_cache')
760
+ await conn.commit()
761
  return {"message": "Cache cleared successfully"}
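The cache age query in `get_cache_status` earlier in this diff groups rows with a SQL `CASE` expression. Below is a minimal, synchronous sketch of that idea using the standard-library `sqlite3` module rather than the async `db` wrapper from the commit. The table layout is assumed, and because the original bucket list is truncated in the diff, the single `'older'` fallback bucket here is a hypothetical stand-in, not the commit's real buckets.

```python
import sqlite3
from typing import Dict


def age_distribution(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count cached rows per age bucket (buckets other than '1 hour' are assumptions)."""
    conn.row_factory = sqlite3.Row  # allows row['age_group'] access, as in the diff
    rows = conn.execute(
        """
        SELECT
            CASE
                WHEN updated_at > datetime('now', '-1 hour') THEN '1 hour'
                ELSE 'older'
            END AS age_group,
            COUNT(*) AS count
        FROM papers_cache
        GROUP BY age_group
        """
    ).fetchall()
    return {row["age_group"]: row["count"] for row in rows}


def demo() -> Dict[str, int]:
    # In-memory database with an assumed two-column papers_cache table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers_cache (date_str TEXT PRIMARY KEY, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO papers_cache VALUES (?, datetime('now', ?))",
        [
            ("2025-01-03", "-30 minutes"),
            ("2025-01-02", "-2 days"),
            ("2025-01-01", "-3 days"),
        ],
    )
    return age_distribution(conn)
```

Calling `demo()` yields one row in the `'1 hour'` bucket and two in the fallback bucket.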
762
 
763
 
 
773
  cards = hf_daily.parse_daily_cards(html)
774
 
775
  # Cache the results
776
+ await db.cache_papers(actual_date, html, cards)
777
 
778
  return {
779
  "message": f"Cache refreshed for {actual_date}",
 
805
  response.headers["Expires"] = "0"
806
  return response
807
 
808
+ async def main():
809
  # Parse command line arguments
810
  args = parse_args()
811
 
 
818
  logger.info(f"| Config:\n{config.pretty_text}")
819
 
820
  # Initialize the database
821
+ await db.init_db(config=config)
822
  logger.info(f"| Database initialized at: {config.db_path}")
823
 
824
  # Load Frontend
 
827
  logger.info(f"| Frontend initialized at: {config.frontend_path}")
828
 
829
  # Serve on port 7860, the Hugging Face Spaces default; local development uses the same port
830
+ config_uvicorn = uvicorn.Config(app, host="0.0.0.0", port=7860)
831
+ server = uvicorn.Server(config_uvicorn)
832
+ await server.serve()
833
+
834
+ if __name__ == "__main__":
835
+ asyncio.run(main())
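The `evaluation_tasks` tracker added to `app.py` above can be illustrated on its own. The sketch below uses only `asyncio`; `fake_evaluation` and `start_evaluation` are hypothetical names that stand in for the real `run_evaluation` call and the endpoint body, so it shows the pattern rather than the commit's exact code.

```python
import asyncio
from typing import Dict, List

# Hypothetical stand-in for the module-level tracker in app.py.
evaluation_tasks: Dict[str, asyncio.Task] = {}


async def fake_evaluation(arxiv_id: str, delay: float) -> None:
    """Placeholder for the real evaluation work (assumption)."""
    try:
        await asyncio.sleep(delay)
    finally:
        # Mirror the diff: drop the entry once the task finishes.
        evaluation_tasks.pop(arxiv_id, None)


def start_evaluation(arxiv_id: str, delay: float = 0.01) -> str:
    """Start a tracked background task unless one is already running."""
    existing = evaluation_tasks.get(arxiv_id)
    if existing is not None and not existing.done():
        return "already_running"
    evaluation_tasks[arxiv_id] = asyncio.create_task(fake_evaluation(arxiv_id, delay))
    return "started"


async def demo() -> List[object]:
    first = start_evaluation("2508.05629")
    second = start_evaluation("2508.05629")  # duplicate request while running
    await asyncio.gather(*list(evaluation_tasks.values()))
    third = start_evaluation("2508.05629")  # tracker was cleaned up, so this starts again
    await asyncio.gather(*list(evaluation_tasks.values()))
    return [first, second, third, len(evaluation_tasks)]
```

Because the tracker is a plain in-process dictionary, this kind of deduplication only holds within a single server process; multiple workers would each keep their own copy.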
frontend/index.html CHANGED
@@ -48,10 +48,16 @@
48
  </div>
49
 
50
  <div class="header-center">
51
- <div class="ai-search-container">
52
- <i class="fas fa-sparkles"></i>
53
- <input type="text" placeholder="Search any paper with AI..." class="ai-search-input">
54
- <i class="fas fa-cube"></i>
55
  </div>
56
  </div>
57
 
 
48
  </div>
49
 
50
  <div class="header-center">
51
+ <div class="search-batch-container">
52
+ <div class="ai-search-container">
53
+ <i class="fas fa-sparkles"></i>
54
+ <input type="text" placeholder="Search any paper with AI..." class="ai-search-input">
55
+ <i class="fas fa-cube"></i>
56
+ </div>
57
+ <button class="batch-evaluate-btn" id="batchEvaluateBtn">
58
+ <i class="fas fa-rocket"></i>
59
+ <span>Evaluate All</span>
60
+ </button>
61
  </div>
62
  </div>
63
 
frontend/main.js CHANGED
@@ -416,6 +416,9 @@ class PaperCardRenderer {
416
  button.onclick = () => {
417
  window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
418
  };
419
  } else {
420
  // Paper doesn't have evaluation - show evaluate button
421
  evalIcon.className = 'fas fa-play eval-icon';
@@ -433,6 +436,145 @@ class PaperCardRenderer {
433
  }
434
  }
435
 
436
  async checkPaperScore(card, arxivId) {
437
  try {
438
  // First check if the card already has score data from the API response
@@ -500,17 +642,17 @@ class PaperCardRenderer {
500
  }, 100);
501
  }
502
 
503
- async evaluatePaper(button, arxivId) {
504
  const spinner = button.querySelector('.fa-spinner');
505
  const evalIcon = button.querySelector('.eval-icon');
506
  const evalText = button.querySelector('.eval-text');
507
  const paperTitle = button.getAttribute('data-paper-title');
508
 
509
- // Show loading state
 
510
  spinner.style.display = 'inline-block';
511
  evalIcon.style.display = 'none';
512
- evalText.textContent = 'Evaluating...';
513
- button.className = 'eval-button evaluating-state';
514
  button.disabled = true;
515
 
516
  try {
@@ -534,23 +676,27 @@ class PaperCardRenderer {
534
  });
535
 
536
  // Start evaluation
537
- const response = await fetch(`/api/papers/evaluate/${encodeURIComponent(arxivId)}`, {
538
  method: 'POST'
539
  });
540
 
541
  if (response.ok) {
542
  const result = await response.json();
543
 
544
- if (result.status === 'already_evaluated') {
545
  // Paper was already evaluated, redirect to evaluation page
546
  window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
547
  } else {
548
  // Evaluation started, show progress and poll for status
549
- evalText.textContent = 'Started...';
550
  button.className = 'eval-button started-state';
551
 
552
  // Start polling for status
553
- this.pollEvaluationStatus(button, arxivId);
554
  }
555
  } else {
556
  throw new Error('Failed to start evaluation');
@@ -567,14 +713,15 @@ class PaperCardRenderer {
567
  }
568
  }
569
 
570
- async pollEvaluationStatus(button, arxivId) {
571
  const evalIcon = button.querySelector('.eval-icon');
572
  const evalText = button.querySelector('.eval-text');
573
  let pollCount = 0;
574
  const maxPolls = 60; // Poll for up to 5 minutes (5s intervals)
575
 
576
  // Show log message
577
- this.showLogMessage(`Started evaluation for paper ${arxivId}`, 'info');
 
578
 
579
  const poll = async () => {
580
  try {
@@ -584,24 +731,31 @@ class PaperCardRenderer {
584
 
585
  switch (status.status) {
586
  case 'evaluating':
587
- evalText.textContent = `Evaluating... (${pollCount * 5}s)`;
588
  evalIcon.className = 'fas fa-spinner fa-spin eval-icon';
589
  button.className = 'eval-button evaluating-state';
590
- this.showLogMessage(`Evaluating paper ${arxivId}... (${pollCount * 5}s)`, 'info');
 
591
  break;
592
 
593
  case 'completed':
594
  evalIcon.className = 'fas fa-check eval-icon';
595
- evalText.textContent = 'Completed';
596
  button.className = 'eval-button evaluation-state';
597
  button.onclick = () => {
598
  window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
599
  };
600
- this.showLogMessage(`Evaluation completed for paper ${arxivId}`, 'success');
 
601
 
602
  // Add score badge after completion
603
  this.checkPaperScore(button.closest('.hf-paper-card'), arxivId);
604
 
605
  return; // Stop polling
606
 
607
  case 'failed':
@@ -749,6 +903,19 @@ class PaperIndexApp {
749
  e.target.classList.add('active');
750
  });
751
  });
752
  }
753
 
754
  async loadDaily(direction = null) {
@@ -822,7 +989,75 @@ class PaperIndexApp {
822
  }
823
  }
824
 
825
- // Removed showFallbackNotification - now using unified notification system
826
 
827
  // Unified notification system
828
  showNotification(options) {
 
416
  button.onclick = () => {
417
  window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
418
  };
419
+
420
+ // Add re-evaluate button for already evaluated papers
421
+ this.addReevaluateButton(card, arxivId);
422
  } else {
423
  // Paper doesn't have evaluation - show evaluate button
424
  evalIcon.className = 'fas fa-play eval-icon';
 
436
  }
437
  }
438
 
439
+ addReevaluateButton(card, arxivId) {
440
+ // Check if re-evaluate button already exists
441
+ if (card.querySelector('.reevaluate-button')) {
442
+ return;
443
+ }
444
+
445
+ const cardActions = card.querySelector('.card-actions');
446
+ if (cardActions) {
447
+ const reevaluateButton = document.createElement('button');
448
+ reevaluateButton.className = 'reevaluate-button';
449
+ reevaluateButton.innerHTML = `
450
+ <i class="fas fa-redo"></i>
451
+ <span>Re-evaluate</span>
452
+ `;
453
+ reevaluateButton.onclick = () => {
454
+ this.reevaluatePaper(reevaluateButton, arxivId);
455
+ };
456
+
457
+ cardActions.appendChild(reevaluateButton);
458
+ }
459
+ }
460
+
461
+ async reevaluatePaper(button, arxivId) {
462
+ const icon = button.querySelector('i');
463
+ const text = button.querySelector('span');
464
+ const originalText = text.textContent;
465
+ const originalIcon = icon.className;
466
+
467
+ // Show loading state
468
+ icon.className = 'fas fa-spinner fa-spin';
469
+ text.textContent = 'Re-evaluating...';
470
+ button.disabled = true;
471
+
472
+ // Show log message
473
+ this.showLogMessage(`Started re-evaluation for paper ${arxivId}`, 'info');
474
+
475
+ try {
476
+ const response = await fetch(`/api/papers/reevaluate/${encodeURIComponent(arxivId)}`, {
477
+ method: 'POST'
478
+ });
479
+
480
+ if (response.ok) {
481
+ const result = await response.json();
482
+
483
+ if (result.status === 'already_running') {
484
+ text.textContent = 'Already running';
485
+ this.showLogMessage(`Re-evaluation already running for paper ${arxivId}`, 'warning');
486
+ setTimeout(() => {
487
+ icon.className = originalIcon;
488
+ text.textContent = originalText;
489
+ button.disabled = false;
490
+ }, 2000);
491
+ } else {
492
+ // Start polling for status
493
+ this.pollReevaluationStatus(button, arxivId, originalText, originalIcon);
494
+ }
495
+ } else {
496
+ throw new Error('Failed to start re-evaluation');
497
+ }
498
+ } catch (error) {
499
+ console.error('Error re-evaluating paper:', error);
500
+ icon.className = 'fas fa-exclamation-triangle';
501
+ text.textContent = 'Error';
502
+ this.showLogMessage(`Re-evaluation failed for paper ${arxivId}: ${error.message}`, 'error');
503
+ setTimeout(() => {
504
+ icon.className = originalIcon;
505
+ text.textContent = originalText;
506
+ button.disabled = false;
507
+ }, 2000);
508
+ }
509
+ }
510
+
511
+ async pollReevaluationStatus(button, arxivId, originalText, originalIcon) {
512
+ const icon = button.querySelector('i');
513
+ const text = button.querySelector('span');
514
+ let pollCount = 0;
515
+ const maxPolls = 60; // Poll for up to 5 minutes (5s intervals)
516
+
517
+ const poll = async () => {
518
+ try {
519
+ const response = await fetch(`/api/papers/evaluate/${encodeURIComponent(arxivId)}/status`);
520
+ if (response.ok) {
521
+ const status = await response.json();
522
+
523
+ switch (status.status) {
524
+ case 'evaluating':
525
+ text.textContent = `Re-evaluating... (${pollCount * 5}s)`;
526
+ icon.className = 'fas fa-spinner fa-spin';
527
+ this.showLogMessage(`Re-evaluating paper ${arxivId}... (${pollCount * 5}s)`, 'info');
528
+ break;
529
+
530
+ case 'completed':
531
+ icon.className = 'fas fa-check';
532
+ text.textContent = 'Re-evaluated';
533
+ button.disabled = false;
534
+ this.showLogMessage(`Re-evaluation completed for paper ${arxivId}`, 'success');
535
+
536
+ // Refresh the page to show updated results
537
+ setTimeout(() => {
538
+ window.location.reload();
539
+ }, 1000);
540
+ return;
541
+
542
+ case 'failed':
543
+ icon.className = 'fas fa-exclamation-triangle';
544
+ text.textContent = 'Failed';
545
+ button.disabled = false;
546
+ this.showLogMessage(`Re-evaluation failed for paper ${arxivId}`, 'error');
547
+ return;
548
+
549
+ default:
550
+ text.textContent = `Status: ${status.status}`;
551
+ }
552
+
553
+ pollCount++;
554
+ if (pollCount < maxPolls) {
555
+ setTimeout(poll, 5000);
556
+ } else {
557
+ icon.className = 'fas fa-clock';
558
+ text.textContent = 'Timeout';
559
+ button.disabled = false;
560
+ this.showLogMessage(`Re-evaluation timeout for paper ${arxivId}`, 'warning');
561
+ }
562
+ } else {
563
+ throw new Error('Failed to get status');
564
+ }
565
+ } catch (error) {
566
+ console.error('Error polling re-evaluation status:', error);
567
+ icon.className = 'fas fa-exclamation-triangle';
568
+ text.textContent = 'Error';
569
+ button.disabled = false;
570
+ }
571
+ };
572
+
573
+ poll();
574
+ }
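The polling logic in `pollReevaluationStatus` above (check status, stop on a terminal state, give up after `maxPolls` attempts) can be sketched in Python as follows. `poll_status` and the injected `fetch_status` callable are hypothetical names for illustration; the real frontend calls `/api/papers/evaluate/{arxiv_id}/status` with `fetch`.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def poll_status(
    fetch_status: Callable[[], Awaitable[str]],
    interval: float = 5.0,
    max_polls: int = 60,
) -> str:
    """Poll until a terminal status arrives or the poll budget is spent."""
    for _ in range(max_polls):
        status = await fetch_status()
        if status in ("completed", "failed"):
            return status
        await asyncio.sleep(interval)
    return "timeout"


async def demo() -> Tuple[str, str]:
    replies = iter(["evaluating", "evaluating", "completed"])

    async def fake_fetch() -> str:
        # Simulated status endpoint that completes on the third call.
        return next(replies)

    async def never_done() -> str:
        # Simulated status endpoint that never reaches a terminal state.
        return "evaluating"

    done = await poll_status(fake_fetch, interval=0)
    timed_out = await poll_status(never_done, interval=0, max_polls=3)
    return done, timed_out
```

With the defaults of 60 polls at 5-second intervals, the budget works out to roughly 300 seconds, which matches the "up to 5 minutes" comment in the JavaScript.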
575
+
576
+
577
+
578
  async checkPaperScore(card, arxivId) {
579
  try {
580
  // First check if the card already has score data from the API response
 
642
  }, 100);
643
  }
644
 
645
+ async evaluatePaper(button, arxivId, isReevaluate = false) {
646
  const spinner = button.querySelector('.fa-spinner');
647
  const evalIcon = button.querySelector('.eval-icon');
648
  const evalText = button.querySelector('.eval-text');
649
  const paperTitle = button.getAttribute('data-paper-title');
650
 
651
+ // Clear any existing state classes and show loading state
652
+ button.className = 'eval-button started-state';
653
  spinner.style.display = 'inline-block';
654
  evalIcon.style.display = 'none';
655
+ evalText.textContent = isReevaluate ? 'Re-starting...' : 'Starting...';
 
656
  button.disabled = true;
657
 
658
  try {
 
676
  });
677
 
678
  // Start evaluation
679
+ const url = isReevaluate ?
680
+ `/api/papers/reevaluate/${encodeURIComponent(arxivId)}` :
681
+ `/api/papers/evaluate/${encodeURIComponent(arxivId)}`;
682
+
683
+ const response = await fetch(url, {
684
  method: 'POST'
685
  });
686
 
687
  if (response.ok) {
688
  const result = await response.json();
689
 
690
+ if (result.status === 'already_evaluated' && !isReevaluate) {
691
  // Paper was already evaluated, redirect to evaluation page
692
  window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
693
  } else {
694
  // Evaluation started, show progress and poll for status
695
+ evalText.textContent = isReevaluate ? 'Re-started...' : 'Started...';
696
  button.className = 'eval-button started-state';
697
 
698
  // Start polling for status
699
+ this.pollEvaluationStatus(button, arxivId, isReevaluate);
700
  }
701
  } else {
702
  throw new Error('Failed to start evaluation');
 
713
  }
714
  }
715
 
716
+ async pollEvaluationStatus(button, arxivId, isReevaluate = false) {
717
  const evalIcon = button.querySelector('.eval-icon');
718
  const evalText = button.querySelector('.eval-text');
719
  let pollCount = 0;
720
  const maxPolls = 60; // Poll for up to 5 minutes (5s intervals)
721
 
722
  // Show log message
723
+ const action = isReevaluate ? 're-evaluation' : 'evaluation';
724
+ this.showLogMessage(`Started ${action} for paper ${arxivId}`, 'info');
725
 
726
  const poll = async () => {
727
  try {
 
731
 
732
  switch (status.status) {
733
  case 'evaluating':
734
+ evalText.textContent = isReevaluate ? `Re-evaluating... (${pollCount * 5}s)` : `Evaluating... (${pollCount * 5}s)`;
735
  evalIcon.className = 'fas fa-spinner fa-spin eval-icon';
736
  button.className = 'eval-button evaluating-state';
737
+ const evaluatingAction = isReevaluate ? 'Re-evaluating' : 'Evaluating';
738
+ this.showLogMessage(`${evaluatingAction} paper ${arxivId}... (${pollCount * 5}s)`, 'info');
739
  break;
740
 
741
  case 'completed':
742
  evalIcon.className = 'fas fa-check eval-icon';
743
+ evalText.textContent = isReevaluate ? 'Re-evaluated' : 'Completed';
744
  button.className = 'eval-button evaluation-state';
745
  button.onclick = () => {
746
  window.location.href = `/paper.html?id=${encodeURIComponent(arxivId)}`;
747
  };
748
+ const completedAction = isReevaluate ? 'Re-evaluation' : 'Evaluation';
749
+ this.showLogMessage(`${completedAction} completed for paper ${arxivId}`, 'success');
750
 
751
  // Add score badge after completion
752
  this.checkPaperScore(button.closest('.hf-paper-card'), arxivId);
753
 
754
+ // Add re-evaluate button if not already re-evaluating
755
+ if (!isReevaluate) {
756
+ this.addReevaluateButton(button.closest('.hf-paper-card'), arxivId);
757
+ }
758
+
759
  return; // Stop polling
760
 
761
  case 'failed':
 
903
  e.target.classList.add('active');
904
  });
905
  });
906
+
907
+ // Batch evaluate button
908
+ const batchEvaluateBtn = document.getElementById('batchEvaluateBtn');
909
+ console.log('Looking for batchEvaluateBtn:', batchEvaluateBtn);
910
+ if (batchEvaluateBtn) {
911
+ console.log('Adding click listener to batchEvaluateBtn');
912
+ batchEvaluateBtn.addEventListener('click', () => {
913
+ console.log('Batch evaluate button clicked');
914
+ this.startBatchEvaluation();
915
+ });
916
+ } else {
917
+ console.error('batchEvaluateBtn not found during initialization');
918
+ }
919
  }
920
 
921
  async loadDaily(direction = null) {
 
989
  }
990
  }
991
 
992
+ async startBatchEvaluation() {
993
+ console.log('startBatchEvaluation called');
994
+
995
+ const button = document.getElementById('batchEvaluateBtn');
996
+ if (!button) {
997
+ console.error('batchEvaluateBtn not found');
998
+ return;
999
+ }
1000
+
1001
+ console.log('Found batchEvaluateBtn:', button);
1002
+
1003
+ // Disable button and show loading state
1004
+ button.disabled = true;
1005
+ const originalContent = button.innerHTML;
1006
+ button.innerHTML = '<i class="fas fa-spinner fa-spin"></i><span>Starting...</span>';
1007
+
1008
+ try {
1009
+ // Collect every evaluate button; unevaluated ones are filtered by label below
1010
+ const unevaluatedButtons = document.querySelectorAll('.eval-button');
1011
+ console.log('Found eval buttons:', unevaluatedButtons.length);
1012
+
1013
+ const buttonsToClick = [];
1014
+
1015
+ unevaluatedButtons.forEach((evalButton, index) => {
1016
+ const evalText = evalButton.querySelector('.eval-text');
1017
+ console.log(`Button ${index}:`, evalText ? evalText.textContent : 'no text');
1018
+ if (evalText && (evalText.textContent === 'Evaluate' || evalText.textContent === 'Check')) {
1019
+ buttonsToClick.push(evalButton);
1020
+ }
1021
+ });
1022
+
1023
+ console.log('Buttons to click:', buttonsToClick.length);
1024
+
1025
+ if (buttonsToClick.length === 0) {
1026
+ console.log('No buttons to click');
1027
+ this.cardRenderer.showLogMessage('All papers have already been evaluated.', 'info');
1028
+ return;
1029
+ }
1030
+
1031
+ this.cardRenderer.showLogMessage(`Starting batch evaluation of ${buttonsToClick.length} papers...`, 'info');
1032
+
1033
+ // Click each evaluate button with delay
1034
+ for (let i = 0; i < buttonsToClick.length; i++) {
1035
+ const evalButton = buttonsToClick[i];
1036
+
1037
+ // Update button text to show progress
1038
+ button.innerHTML = `<i class="fas fa-spinner fa-spin"></i><span>Starting ${i + 1} of ${buttonsToClick.length}</span>`;
1039
+
1040
+ console.log(`Clicking button ${i + 1}:`, evalButton);
1041
+ // Simulate click on the evaluate button
1042
+ evalButton.click();
1043
+
1044
+ // Add delay between clicks to avoid API overload
1045
+ await new Promise(resolve => setTimeout(resolve, 1000));
1046
+ }
1047
+
1048
+ this.cardRenderer.showLogMessage(`Started evaluation for ${buttonsToClick.length} papers. They will complete in the background.`, 'success');
1049
+
1050
+ } catch (error) {
1051
+ console.error('Batch evaluation error:', error);
1052
+ this.cardRenderer.showLogMessage(`Batch evaluation failed: ${error.message}`, 'error');
1053
+ } finally {
1054
+ // Restore button state
1055
+ button.disabled = false;
1056
+ button.innerHTML = originalContent;
1057
+ }
1058
+ }
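`startBatchEvaluation` above filters buttons by their label and then triggers them one at a time with a fixed pause. A rough Python equivalent of that select-then-stagger flow is sketched below. `select_pending`, `start_staggered`, and the paper IDs are hypothetical, and unlike the JavaScript (which fires `click()` without waiting) this version awaits each start call before pausing.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List


def select_pending(labels: Dict[str, str]) -> List[str]:
    """Keep only papers whose button still reads 'Evaluate' or 'Check'."""
    return [pid for pid, label in labels.items() if label in ("Evaluate", "Check")]


async def start_staggered(
    ids: Iterable[str],
    start_one: Callable[[str], Awaitable[str]],
    delay: float = 1.0,
) -> List[str]:
    """Start evaluations sequentially, pausing between requests to limit load."""
    started = []
    for pid in ids:
        started.append(await start_one(pid))
        await asyncio.sleep(delay)
    return started


async def demo() -> List[str]:
    # Hypothetical button labels keyed by made-up paper IDs.
    labels = {
        "2508.00001": "Evaluate",
        "2508.00002": "View",
        "2508.00003": "Check",
    }

    async def fake_start(pid: str) -> str:
        # Stand-in for the POST that starts an evaluation.
        return f"started:{pid}"

    return await start_staggered(select_pending(labels), fake_start, delay=0)
```

Matching on button text is fragile if labels are ever localized or reworded; a data attribute carrying the evaluation state would be a sturdier filter, though that is a design suggestion rather than something this commit does.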
1059
+
1060
+
1061
 
1062
  // Unified notification system
1063
  showNotification(options) {
frontend/paper.js CHANGED
@@ -252,7 +252,24 @@ class PaperEvaluationRenderer {
252
  </section>
253
  `;
254
 
255
- contentEl.innerHTML = execSummary +
256
  `<section class="evaluation-section">
257
  <div class="section-header">
258
  <h2><i class="fas fa-chart-bar"></i> Detailed Dimensional Analysis</h2>
@@ -524,9 +541,12 @@ class PaperEvaluationRenderer {
524
  class PaperEvaluationApp {
525
  constructor() {
526
  this.renderer = new PaperEvaluationRenderer();
 
527
  this.init();
528
  }
529
 
 
 
530
  async init() {
531
  const id = getParam('id');
532
  console.log('PaperEvaluationApp init with ID:', id);
@@ -592,7 +612,7 @@ class PaperEvaluationApp {
592
 
593
  // Initialize the application when DOM is loaded
594
  document.addEventListener('DOMContentLoaded', () => {
595
- new PaperEvaluationApp();
596
  });
597
 
598
 
 
252
  </section>
253
  `;
254
 
255
+ // Add action buttons at the top
256
+ const actionButtons = `
257
+ <section class="evaluation-section">
258
+ <div class="section-header">
259
+ <div style="display: flex; justify-content: space-between; align-items: center;">
260
+ <h2><i class="fas fa-chart-line"></i> Evaluation Actions</h2>
261
+ <div class="action-buttons">
262
+ <a href="/" class="action-btn primary">
263
+ <i class="fas fa-arrow-left"></i>
264
+ Back to Daily Papers
265
+ </a>
266
+ </div>
267
+ </div>
268
+ </div>
269
+ </section>
270
+ `;
271
+
272
+ contentEl.innerHTML = actionButtons + execSummary +
273
  `<section class="evaluation-section">
274
  <div class="section-header">
275
  <h2><i class="fas fa-chart-bar"></i> Detailed Dimensional Analysis</h2>
 
541
  class PaperEvaluationApp {
542
  constructor() {
543
  this.renderer = new PaperEvaluationRenderer();
544
+ this.paperId = getParam('id');
545
  this.init();
546
  }
547
 
548
+
549
+
550
  async init() {
551
  const id = getParam('id');
552
  console.log('PaperEvaluationApp init with ID:', id);
 
612
 
613
  // Initialize the application when DOM is loaded
614
  document.addEventListener('DOMContentLoaded', () => {
615
+ window.paperApp = new PaperEvaluationApp();
616
  });
617
 
618
 
frontend/styles.css CHANGED
@@ -188,7 +188,7 @@ body {
188
  margin: 0 auto;
189
  padding: 0 24px;
190
  display: grid;
191
- grid-template-columns: 1fr 2fr 1fr;
192
  gap: 32px;
193
  align-items: center;
194
  }
@@ -205,9 +205,18 @@ body {
205
  font-size: 16px;
206
  }
207
 
208
  .ai-search-container {
209
  position: relative;
210
- width: 100%;
 
211
  }
212
 
213
  .ai-search-input {
@@ -245,6 +254,41 @@ body {
245
  font-size: 16px;
246
  }
247
 
248
  .header-right {
249
  display: flex;
250
  flex-direction: column;
@@ -737,6 +781,7 @@ body {
737
  border-radius: 50%;
738
  transform: translateY(-50%);
739
  animation: spin 1s linear infinite;
 
740
  }
741
 
742
  @keyframes spin {
@@ -762,6 +807,7 @@ body {
762
  border-radius: 50%;
763
  transform: translateY(-50%);
764
  animation: pulse 1.5s ease-in-out infinite;
 
765
  }
766
 
767
  @keyframes pulse {
@@ -823,11 +869,113 @@ body {
823
  border-color: var(--text-muted);
824
  }
825
 
826
  /* Spinner animation */
827
  .eval-button .fa-spinner {
828
  animation: spin 1s linear infinite;
829
  }
830
 
831
  @keyframes spin {
832
  from { transform: rotate(0deg); }
833
  to { transform: rotate(360deg); }
 
188
  margin: 0 auto;
189
  padding: 0 24px;
190
  display: grid;
191
+ grid-template-columns: 1fr 1fr 1fr;
192
  gap: 32px;
193
  align-items: center;
194
  }
 
205
  font-size: 16px;
206
  }
207
 
208
+ .search-batch-container {
209
+ display: flex;
210
+ align-items: center;
211
+ gap: 16px;
212
+ width: 100%;
213
+ justify-content: center;
214
+ }
215
+
216
  .ai-search-container {
217
  position: relative;
218
+ flex: 1;
219
+ max-width: 800px;
220
  }
221
 
222
  .ai-search-input {
 
254
  font-size: 16px;
255
  }
256
 
257
+ .batch-evaluate-btn {
258
+ display: flex;
259
+ align-items: center;
260
+ gap: 8px;
261
+ padding: 12px 20px;
262
+ background: linear-gradient(135deg, var(--accent-primary), var(--accent-secondary));
263
+ color: white;
264
+ border: none;
265
+ border-radius: 12px;
266
+ font-size: 14px;
267
+ font-weight: 600;
268
+ cursor: pointer;
269
+ transition: all 0.2s ease;
270
+ box-shadow: 0 2px 8px rgba(59, 130, 246, 0.3);
271
+ }
272
+
273
+ .batch-evaluate-btn:hover {
274
+ transform: translateY(-1px);
275
+ box-shadow: 0 4px 12px rgba(59, 130, 246, 0.4);
276
+ }
277
+
278
+ .batch-evaluate-btn:active {
279
+ transform: translateY(0);
280
+ }
281
+
282
+ .batch-evaluate-btn:disabled {
283
+ opacity: 0.6;
284
+ cursor: not-allowed;
285
+ transform: none;
286
+ }
287
+
288
+ .batch-evaluate-btn i {
289
+ font-size: 16px;
290
+ }
291
+
292
  .header-right {
293
  display: flex;
294
  flex-direction: column;
 
781
  border-radius: 50%;
782
  transform: translateY(-50%);
783
  animation: spin 1s linear infinite;
784
+ z-index: 1;
785
  }
786
 
787
  @keyframes spin {
 
807
  border-radius: 50%;
808
  transform: translateY(-50%);
809
  animation: pulse 1.5s ease-in-out infinite;
810
+ z-index: 1;
811
  }
812
 
813
  @keyframes pulse {
 
869
  border-color: var(--text-muted);
870
  }
871
 
872
+ /* Re-evaluate button */
873
+ .reevaluate-button {
874
+ display: inline-flex;
875
+ align-items: center;
876
+ gap: 6px;
877
+ padding: 8px 16px;
878
+ border: 1px solid var(--accent-secondary);
879
+ border-radius: 8px;
880
+ background-color: var(--bg-secondary);
881
+ color: var(--accent-secondary);
882
+ font-size: 12px;
883
+ font-weight: 500;
884
+ text-decoration: none;
885
+ cursor: pointer;
886
+ transition: all 0.2s ease;
887
+ min-width: 100px;
888
+ justify-content: center;
889
+ margin-left: 8px;
890
+ }
891
+
892
+ .reevaluate-button:hover {
893
+ background-color: var(--accent-secondary);
894
+ color: white;
895
+ border-color: var(--accent-secondary);
896
+ }
897
+
898
+ .reevaluate-button:disabled {
899
+ opacity: 0.6;
900
+ cursor: not-allowed;
901
+ }
902
+
903
+ .reevaluate-button i {
904
+ font-size: 12px;
905
+ }
906
+
907
+ /* Action buttons for paper detail page */
908
+ .action-buttons {
909
+ display: flex;
910
+ gap: 12px;
911
+ align-items: center;
912
+ }
913
+
914
+ .action-btn {
915
+ display: inline-flex;
916
+ align-items: center;
917
+ gap: 8px;
918
+ padding: 10px 16px;
919
+ border: 1px solid var(--border-medium);
920
+ border-radius: 8px;
921
+ background-color: var(--bg-secondary);
922
+ color: var(--text-secondary);
923
+ font-size: 14px;
924
+ font-weight: 500;
925
+ text-decoration: none;
926
+ cursor: pointer;
927
+ transition: all 0.2s ease;
928
+ }
929
+
930
+ .action-btn:hover {
931
+ background-color: var(--bg-tertiary);
932
+ color: var(--text-primary);
933
+ border-color: var(--border-medium);
934
+ }
935
+
936
+ .action-btn.primary {
937
+ background-color: var(--accent-primary);
938
+ color: white;
939
+ border-color: var(--accent-primary);
940
+ }
941
+
942
+ .action-btn.primary:hover {
943
+ background-color: var(--accent-primary);
944
+ opacity: 0.9;
945
+ }
946
+
947
+ .action-btn.secondary {
948
+ background-color: var(--accent-secondary);
949
+ color: white;
950
+ border-color: var(--accent-secondary);
951
+ }
952
+
953
+ .action-btn.secondary:hover {
954
+ background-color: var(--accent-secondary);
955
+ opacity: 0.9;
956
+ }
957
+
958
+ .action-btn:disabled {
959
+ opacity: 0.6;
960
+ cursor: not-allowed;
961
+ }
962
+
963
  /* Spinner animation */
964
  .eval-button .fa-spinner {
965
  animation: spin 1s linear infinite;
966
  }
967
 
968
+ /* Ensure only one ::after pseudo-element is visible at a time */
969
+ .eval-button::after {
970
+ content: none;
971
+ }
972
+
973
+ .eval-button.evaluating-state::after,
974
+ .eval-button.started-state::after,
975
+ .eval-button.processing-state::after {
976
+ content: '';
977
+ }
978
+
979
  @keyframes spin {
980
  from { transform: rotate(0deg); }
981
  to { transform: rotate(360deg); }
requirements.txt CHANGED
@@ -9,4 +9,5 @@ httpx>=0.27.0
9
  beautifulsoup4>=4.12.3
10
  lxml>=5.2.2
11
  mmengine>=0.10.7
 
12
 
 
9
  beautifulsoup4>=4.12.3
10
  lxml>=5.2.2
11
  mmengine>=0.10.7
12
+ aiosqlite>=0.20.0
13
 
src/agents/evaluator.py CHANGED
@@ -9,7 +9,7 @@ from typing import Any, Dict, List, Optional
9
  from pathlib import Path
10
  from datetime import datetime
11
 
12
- from anthropic import Anthropic
13
  from anthropic.types import ToolUseBlock
14
  from langgraph.graph import END, StateGraph
15
  from pydantic import BaseModel, Field
@@ -59,7 +59,7 @@ class Evaluator:
59
  api_key = api_key or os.getenv("ANTHROPIC_API_KEY")
60
  if not api_key:
61
  raise ValueError("Anthropic API key is required. Please set HF_SECRET_ANTHROPIC_API_KEY in Hugging Face Spaces secrets or ANTHROPIC_API_KEY environment variable.")
62
- self.client = Anthropic(api_key=api_key)
63
  self.system_prompt = REVIEWER_SYSTEM_PROMPT
64
  self.eval_template = EVALUATION_PROMPT_TEMPLATE
65
 
@@ -91,8 +91,8 @@ class Evaluator:
91
  })
92
 
93
  try:
94
- # Call Anthropic API with tools
95
- response = self.client.messages.create(
96
  model=config.model_id,
97
  max_tokens=4000,
98
  system=self.system_prompt,
@@ -210,7 +210,7 @@ async def save_node(state: ConversationState) -> ConversationState:
210
  logger.warning(f"Warning: Could not parse evaluation_content as JSON: {e}")
211
 
212
  # Save to database
213
- db.update_paper_evaluation(
214
  arxiv_id=state.arxiv_id,
215
  evaluation_content=evaluation_content,
216
  evaluation_score=evaluation_score,
 
9
  from pathlib import Path
10
  from datetime import datetime
11
 
12
+ from anthropic import AsyncAnthropic
13
  from anthropic.types import ToolUseBlock
14
  from langgraph.graph import END, StateGraph
15
  from pydantic import BaseModel, Field
 
59
  api_key = api_key or os.getenv("ANTHROPIC_API_KEY")
60
  if not api_key:
61
  raise ValueError("Anthropic API key is required. Please set HF_SECRET_ANTHROPIC_API_KEY in Hugging Face Spaces secrets or ANTHROPIC_API_KEY environment variable.")
62
+ self.client = AsyncAnthropic(api_key=api_key)
63
  self.system_prompt = REVIEWER_SYSTEM_PROMPT
64
  self.eval_template = EVALUATION_PROMPT_TEMPLATE
65
 
 
91
  })
92
 
93
  try:
94
+ # Call Anthropic API with tools (async)
95
+ response = await self.client.messages.create(
96
  model=config.model_id,
97
  max_tokens=4000,
98
  system=self.system_prompt,
 
210
  logger.warning(f"Warning: Could not parse evaluation_content as JSON: {e}")
211
 
212
  # Save to database
213
+ await db.update_paper_evaluation(
214
  arxiv_id=state.arxiv_id,
215
  evaluation_content=evaluation_content,
216
  evaluation_score=evaluation_score,
src/database/db.py CHANGED
@@ -1,9 +1,9 @@
1
  import os
2
  import json
3
- import sqlite3
4
  from datetime import date, datetime, timedelta
5
  from typing import Any, Dict, List, Optional
6
- from contextlib import contextmanager
7
 
8
 
9
  class PapersDatabase():
@@ -11,16 +11,16 @@ class PapersDatabase():
11
  super().__init__(**kwargs)
12
  self.db_path = None
13
 
14
- def init_db(self, config):
15
  """Initialize the database with required tables"""
16
 
17
  self.db_path = config.db_path
18
 
19
- with self.get_connection() as conn:
20
- cursor = conn.cursor()
21
 
22
  # Create papers cache table
23
- cursor.execute('''
24
  CREATE TABLE IF NOT EXISTS papers_cache (
25
  date_str TEXT PRIMARY KEY,
26
  html_content TEXT NOT NULL,
@@ -31,7 +31,7 @@ class PapersDatabase():
31
  ''')
32
 
33
  # Create papers table for individual arXiv papers
34
- cursor.execute('''
35
  CREATE TABLE IF NOT EXISTS papers (
36
  arxiv_id TEXT PRIMARY KEY,
37
  title TEXT NOT NULL,
@@ -52,7 +52,7 @@ class PapersDatabase():
52
  ''')
53
 
54
  # Create latest_date table to track the most recent available date
55
- cursor.execute('''
56
  CREATE TABLE IF NOT EXISTS latest_date (
57
  id INTEGER PRIMARY KEY CHECK (id = 1),
58
  date_str TEXT NOT NULL,
@@ -61,34 +61,39 @@ class PapersDatabase():
61
  ''')
62
 
63
  # Insert default latest_date record if it doesn't exist
64
- cursor.execute('''
65
  INSERT OR IGNORE INTO latest_date (id, date_str)
66
  VALUES (1, ?)
67
  ''', (date.today().isoformat(),))
68
 
69
- conn.commit()
70
 
71
- @contextmanager
72
- def get_connection(self):
73
  """Context manager for database connections"""
74
- conn = sqlite3.connect(self.db_path)
75
- conn.row_factory = sqlite3.Row # Enable dict-like access
 
 
 
 
 
76
  try:
77
  yield conn
78
  finally:
79
- conn.close()
80
 
81
- def get_cached_papers(self, date_str: str) -> Optional[Dict[str, Any]]:
82
  """Get cached papers for a specific date"""
83
- with self.get_connection() as conn:
84
- cursor = conn.cursor()
85
- cursor.execute('''
86
  SELECT parsed_cards, created_at
87
  FROM papers_cache
88
  WHERE date_str = ?
89
  ''', (date_str,))
90
 
91
- row = cursor.fetchone()
92
  if row:
93
  return {
94
  'cards': json.loads(row['parsed_cards']),
@@ -96,47 +101,47 @@ class PapersDatabase():
96
  }
97
  return None
98
 
99
- def cache_papers(self, date_str: str, html_content: str, parsed_cards: List[Dict[str, Any]]):
100
  """Cache papers for a specific date"""
101
- with self.get_connection() as conn:
102
- cursor = conn.cursor()
103
- cursor.execute('''
104
  INSERT OR REPLACE INTO papers_cache
105
  (date_str, html_content, parsed_cards, updated_at)
106
  VALUES (?, ?, ?, CURRENT_TIMESTAMP)
107
  ''', (date_str, html_content, json.dumps(parsed_cards)))
108
- conn.commit()
109
 
110
- def get_latest_cached_date(self) -> Optional[str]:
111
  """Get the latest cached date"""
112
- with self.get_connection() as conn:
113
- cursor = conn.cursor()
114
- cursor.execute('SELECT date_str FROM latest_date WHERE id = 1')
115
- row = cursor.fetchone()
116
  return row['date_str'] if row else None
117
 
118
- def update_latest_date(self, date_str: str):
119
  """Update the latest available date"""
120
- with self.get_connection() as conn:
121
- cursor = conn.cursor()
122
- cursor.execute('''
123
  UPDATE latest_date
124
  SET date_str = ?, updated_at = CURRENT_TIMESTAMP
125
  WHERE id = 1
126
  ''', (date_str,))
127
- conn.commit()
128
 
129
- def is_cache_fresh(self, date_str: str, max_age_hours: int = 24) -> bool:
130
  """Check if cache is fresh (within max_age_hours)"""
131
- with self.get_connection() as conn:
132
- cursor = conn.cursor()
133
- cursor.execute('''
134
  SELECT updated_at
135
  FROM papers_cache
136
  WHERE date_str = ?
137
  ''', (date_str,))
138
 
139
- row = cursor.fetchone()
140
  if not row:
141
  return False
142
 
@@ -144,64 +149,65 @@ class PapersDatabase():
144
  age = datetime.now(cached_time.tzinfo) - cached_time
145
  return age.total_seconds() < max_age_hours * 3600
146
 
147
- def cleanup_old_cache(self, days_to_keep: int = 7):
148
  """Clean up old cache entries"""
149
  cutoff_date = (datetime.now() - timedelta(days=days_to_keep)).isoformat()
150
- with self.get_connection() as conn:
151
- cursor = conn.cursor()
152
- cursor.execute('''
153
  DELETE FROM papers_cache
154
  WHERE updated_at < ?
155
  ''', (cutoff_date,))
156
- conn.commit()
157
 
158
  # Papers table methods
159
- def insert_paper(self, arxiv_id: str, title: str, authors: str, abstract: str = None,
160
  categories: str = None, published_date: str = None):
161
  """Insert a new paper into the papers table"""
162
- with self.get_connection() as conn:
163
- cursor = conn.cursor()
164
- cursor.execute('''
165
  INSERT OR REPLACE INTO papers
166
  (arxiv_id, title, authors, abstract, categories, published_date, updated_at)
167
  VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
168
  ''', (arxiv_id, title, authors, abstract, categories, published_date))
169
- conn.commit()
170
 
171
- def get_paper(self, arxiv_id: str) -> Optional[Dict[str, Any]]:
172
  """Get a paper by arxiv_id"""
173
- with self.get_connection() as conn:
174
- cursor = conn.cursor()
175
- cursor.execute('''
176
  SELECT * FROM papers WHERE arxiv_id = ?
177
  ''', (arxiv_id,))
178
 
179
- row = cursor.fetchone()
180
  if row:
181
  return dict(row)
182
  return None
183
 
184
- def get_papers_by_evaluation_status(self, is_evaluated: bool = None) -> List[Dict[str, Any]]:
185
  """Get papers by evaluation status"""
186
- with self.get_connection() as conn:
187
- cursor = conn.cursor()
188
  if is_evaluated is None:
189
- cursor.execute('SELECT * FROM papers ORDER BY created_at DESC')
190
  else:
191
- cursor.execute('''
192
  SELECT * FROM papers
193
  WHERE is_evaluated = ?
194
  ORDER BY created_at DESC
195
  ''', (is_evaluated,))
196
 
197
- return [dict(row) for row in cursor.fetchall()]
 
198
 
199
- def update_paper_evaluation(self, arxiv_id: str, evaluation_content: str,
200
  evaluation_score: float = None, overall_score: float = None, evaluation_tags: str = None):
201
  """Update paper with evaluation content"""
202
- with self.get_connection() as conn:
203
- cursor = conn.cursor()
204
- cursor.execute('''
205
  UPDATE papers
206
  SET evaluation_content = ?,
207
  evaluation_score = ?,
@@ -213,57 +219,60 @@ class PapersDatabase():
213
  updated_at = CURRENT_TIMESTAMP
214
  WHERE arxiv_id = ?
215
  ''', (evaluation_content, evaluation_score, overall_score, evaluation_tags, arxiv_id))
216
- conn.commit()
217
 
218
- def update_paper_status(self, arxiv_id: str, status: str):
219
  """Update paper evaluation status"""
220
- with self.get_connection() as conn:
221
- cursor = conn.cursor()
222
- cursor.execute('''
223
  UPDATE papers
224
  SET evaluation_status = ?,
225
  updated_at = CURRENT_TIMESTAMP
226
  WHERE arxiv_id = ?
227
  ''', (status, arxiv_id))
228
- conn.commit()
229
 
230
- def get_unevaluated_papers(self) -> List[Dict[str, Any]]:
231
  """Get all papers that haven't been evaluated yet"""
232
- return self.get_papers_by_evaluation_status(is_evaluated=False)
233
 
234
- def get_evaluated_papers(self) -> List[Dict[str, Any]]:
235
  """Get all papers that have been evaluated"""
236
- return self.get_papers_by_evaluation_status(is_evaluated=True)
237
 
238
- def search_papers(self, query: str) -> List[Dict[str, Any]]:
239
  """Search papers by title, authors, or abstract"""
240
- with self.get_connection() as conn:
241
- cursor = conn.cursor()
242
  search_pattern = f'%{query}%'
243
- cursor.execute('''
244
  SELECT * FROM papers
245
  WHERE title LIKE ? OR authors LIKE ? OR abstract LIKE ?
246
  ORDER BY created_at DESC
247
  ''', (search_pattern, search_pattern, search_pattern))
248
 
249
- return [dict(row) for row in cursor.fetchall()]
 
250
 
251
- def delete_paper(self, arxiv_id: str):
252
  """Delete a paper from the database"""
253
- with self.get_connection() as conn:
254
- cursor = conn.cursor()
255
- cursor.execute('DELETE FROM papers WHERE arxiv_id = ?', (arxiv_id,))
256
- conn.commit()
257
 
258
- def get_papers_count(self) -> Dict[str, int]:
259
  """Get count of papers by evaluation status"""
260
- with self.get_connection() as conn:
261
- cursor = conn.cursor()
262
- cursor.execute('SELECT COUNT(*) as total FROM papers')
263
- total = cursor.fetchone()['total']
 
264
 
265
- cursor.execute('SELECT COUNT(*) as evaluated FROM papers WHERE is_evaluated = TRUE')
266
- evaluated = cursor.fetchone()['evaluated']
 
267
 
268
  return {
269
  'total': total,
 
1
  import os
2
  import json
3
+ import aiosqlite
4
  from datetime import date, datetime, timedelta
5
  from typing import Any, Dict, List, Optional
6
+ from contextlib import asynccontextmanager
7
 
8
 
9
  class PapersDatabase():
 
11
  super().__init__(**kwargs)
12
  self.db_path = None
13
 
14
+ async def init_db(self, config):
15
  """Initialize the database with required tables"""
16
 
17
  self.db_path = config.db_path
18
 
19
+ async with self.get_connection() as conn:
20
+ cursor = await conn.cursor()
21
 
22
  # Create papers cache table
23
+ await cursor.execute('''
24
  CREATE TABLE IF NOT EXISTS papers_cache (
25
  date_str TEXT PRIMARY KEY,
26
  html_content TEXT NOT NULL,
 
31
  ''')
32
 
33
  # Create papers table for individual arXiv papers
34
+ await cursor.execute('''
35
  CREATE TABLE IF NOT EXISTS papers (
36
  arxiv_id TEXT PRIMARY KEY,
37
  title TEXT NOT NULL,
 
52
  ''')
53
 
54
  # Create latest_date table to track the most recent available date
55
+ await cursor.execute('''
56
  CREATE TABLE IF NOT EXISTS latest_date (
57
  id INTEGER PRIMARY KEY CHECK (id = 1),
58
  date_str TEXT NOT NULL,
 
61
  ''')
62
 
63
  # Insert default latest_date record if it doesn't exist
64
+ await cursor.execute('''
65
  INSERT OR IGNORE INTO latest_date (id, date_str)
66
  VALUES (1, ?)
67
  ''', (date.today().isoformat(),))
68
 
69
+ await conn.commit()
70
 
71
+ @asynccontextmanager
72
+ async def get_connection(self):
73
  """Context manager for database connections"""
74
+ conn = await aiosqlite.connect(self.db_path)
75
+ conn.row_factory = aiosqlite.Row # Enable dict-like access
76
+ # Enable WAL mode for better concurrency
77
+ await conn.execute("PRAGMA journal_mode=WAL")
78
+ await conn.execute("PRAGMA synchronous=NORMAL")
79
+ await conn.execute("PRAGMA cache_size=10000")
80
+ await conn.execute("PRAGMA temp_store=MEMORY")
81
  try:
82
  yield conn
83
  finally:
84
+ await conn.close()
85
 
86
+ async def get_cached_papers(self, date_str: str) -> Optional[Dict[str, Any]]:
87
  """Get cached papers for a specific date"""
88
+ async with self.get_connection() as conn:
89
+ cursor = await conn.cursor()
90
+ await cursor.execute('''
91
  SELECT parsed_cards, created_at
92
  FROM papers_cache
93
  WHERE date_str = ?
94
  ''', (date_str,))
95
 
96
+ row = await cursor.fetchone()
97
  if row:
98
  return {
99
  'cards': json.loads(row['parsed_cards']),
 
101
  }
102
  return None
103
 
104
+ async def cache_papers(self, date_str: str, html_content: str, parsed_cards: List[Dict[str, Any]]):
105
  """Cache papers for a specific date"""
106
+ async with self.get_connection() as conn:
107
+ cursor = await conn.cursor()
108
+ await cursor.execute('''
109
  INSERT OR REPLACE INTO papers_cache
110
  (date_str, html_content, parsed_cards, updated_at)
111
  VALUES (?, ?, ?, CURRENT_TIMESTAMP)
112
  ''', (date_str, html_content, json.dumps(parsed_cards)))
113
+ await conn.commit()
114
 
115
+ async def get_latest_cached_date(self) -> Optional[str]:
116
  """Get the latest cached date"""
117
+ async with self.get_connection() as conn:
118
+ cursor = await conn.cursor()
119
+ await cursor.execute('SELECT date_str FROM latest_date WHERE id = 1')
120
+ row = await cursor.fetchone()
121
  return row['date_str'] if row else None
122
 
123
+ async def update_latest_date(self, date_str: str):
124
  """Update the latest available date"""
125
+ async with self.get_connection() as conn:
126
+ cursor = await conn.cursor()
127
+ await cursor.execute('''
128
  UPDATE latest_date
129
  SET date_str = ?, updated_at = CURRENT_TIMESTAMP
130
  WHERE id = 1
131
  ''', (date_str,))
132
+ await conn.commit()
133
 
134
+ async def is_cache_fresh(self, date_str: str, max_age_hours: int = 24) -> bool:
135
  """Check if cache is fresh (within max_age_hours)"""
136
+ async with self.get_connection() as conn:
137
+ cursor = await conn.cursor()
138
+ await cursor.execute('''
139
  SELECT updated_at
140
  FROM papers_cache
141
  WHERE date_str = ?
142
  ''', (date_str,))
143
 
144
+ row = await cursor.fetchone()
145
  if not row:
146
  return False
147
 
 
149
  age = datetime.now(cached_time.tzinfo) - cached_time
150
  return age.total_seconds() < max_age_hours * 3600
151
 
152
+ async def cleanup_old_cache(self, days_to_keep: int = 7):
153
  """Clean up old cache entries"""
154
  cutoff_date = (datetime.now() - timedelta(days=days_to_keep)).isoformat()
155
+ async with self.get_connection() as conn:
156
+ cursor = await conn.cursor()
157
+ await cursor.execute('''
158
  DELETE FROM papers_cache
159
  WHERE updated_at < ?
160
  ''', (cutoff_date,))
161
+ await conn.commit()
162
 
163
  # Papers table methods
164
+ async def insert_paper(self, arxiv_id: str, title: str, authors: str, abstract: str = None,
165
  categories: str = None, published_date: str = None):
166
  """Insert a new paper into the papers table"""
167
+ async with self.get_connection() as conn:
168
+ cursor = await conn.cursor()
169
+ await cursor.execute('''
170
  INSERT OR REPLACE INTO papers
171
  (arxiv_id, title, authors, abstract, categories, published_date, updated_at)
172
  VALUES (?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
173
  ''', (arxiv_id, title, authors, abstract, categories, published_date))
174
+ await conn.commit()
175
 
176
+ async def get_paper(self, arxiv_id: str) -> Optional[Dict[str, Any]]:
177
  """Get a paper by arxiv_id"""
178
+ async with self.get_connection() as conn:
179
+ cursor = await conn.cursor()
180
+ await cursor.execute('''
181
  SELECT * FROM papers WHERE arxiv_id = ?
182
  ''', (arxiv_id,))
183
 
184
+ row = await cursor.fetchone()
185
  if row:
186
  return dict(row)
187
  return None
188
 
189
+ async def get_papers_by_evaluation_status(self, is_evaluated: bool = None) -> List[Dict[str, Any]]:
190
  """Get papers by evaluation status"""
191
+ async with self.get_connection() as conn:
192
+ cursor = await conn.cursor()
193
  if is_evaluated is None:
194
+ await cursor.execute('SELECT * FROM papers ORDER BY created_at DESC')
195
  else:
196
+ await cursor.execute('''
197
  SELECT * FROM papers
198
  WHERE is_evaluated = ?
199
  ORDER BY created_at DESC
200
  ''', (is_evaluated,))
201
 
202
+ rows = await cursor.fetchall()
203
+ return [dict(row) for row in rows]
204
 
205
+ async def update_paper_evaluation(self, arxiv_id: str, evaluation_content: str,
206
  evaluation_score: float = None, overall_score: float = None, evaluation_tags: str = None):
207
  """Update paper with evaluation content"""
208
+ async with self.get_connection() as conn:
209
+ cursor = await conn.cursor()
210
+ await cursor.execute('''
211
  UPDATE papers
212
  SET evaluation_content = ?,
213
  evaluation_score = ?,
 
219
  updated_at = CURRENT_TIMESTAMP
220
  WHERE arxiv_id = ?
221
  ''', (evaluation_content, evaluation_score, overall_score, evaluation_tags, arxiv_id))
222
+ await conn.commit()
223
 
224
+ async def update_paper_status(self, arxiv_id: str, status: str):
225
  """Update paper evaluation status"""
226
+ async with self.get_connection() as conn:
227
+ cursor = await conn.cursor()
228
+ await cursor.execute('''
229
  UPDATE papers
230
  SET evaluation_status = ?,
231
  updated_at = CURRENT_TIMESTAMP
232
  WHERE arxiv_id = ?
233
  ''', (status, arxiv_id))
234
+ await conn.commit()
235
 
236
+ async def get_unevaluated_papers(self) -> List[Dict[str, Any]]:
237
  """Get all papers that haven't been evaluated yet"""
238
+ return await self.get_papers_by_evaluation_status(is_evaluated=False)
239
 
240
+ async def get_evaluated_papers(self) -> List[Dict[str, Any]]:
241
  """Get all papers that have been evaluated"""
242
+ return await self.get_papers_by_evaluation_status(is_evaluated=True)
243
 
244
+ async def search_papers(self, query: str) -> List[Dict[str, Any]]:
245
  """Search papers by title, authors, or abstract"""
246
+ async with self.get_connection() as conn:
247
+ cursor = await conn.cursor()
248
  search_pattern = f'%{query}%'
249
+ await cursor.execute('''
250
  SELECT * FROM papers
251
  WHERE title LIKE ? OR authors LIKE ? OR abstract LIKE ?
252
  ORDER BY created_at DESC
253
  ''', (search_pattern, search_pattern, search_pattern))
254
 
255
+ rows = await cursor.fetchall()
256
+ return [dict(row) for row in rows]
257
 
258
+ async def delete_paper(self, arxiv_id: str):
259
  """Delete a paper from the database"""
260
+ async with self.get_connection() as conn:
261
+ cursor = await conn.cursor()
262
+ await cursor.execute('DELETE FROM papers WHERE arxiv_id = ?', (arxiv_id,))
263
+ await conn.commit()
264
 
265
+ async def get_papers_count(self) -> Dict[str, int]:
266
  """Get count of papers by evaluation status"""
267
+ async with self.get_connection() as conn:
268
+ cursor = await conn.cursor()
269
+ await cursor.execute('SELECT COUNT(*) as total FROM papers')
270
+ total_row = await cursor.fetchone()
271
+ total = total_row['total']
272
 
273
+ await cursor.execute('SELECT COUNT(*) as evaluated FROM papers WHERE is_evaluated = TRUE')
274
+ evaluated_row = await cursor.fetchone()
275
+ evaluated = evaluated_row['evaluated']
276
 
277
  return {
278
  'total': total,
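Review note on the new `get_connection`: it reissues `PRAGMA journal_mode=WAL` on every connection. SQLite stores the WAL journal mode in the database file, so it persists across connections, whereas `synchronous`, `cache_size` and `temp_store` apply only to the connection that sets them. The sketch below is not the project's code. It uses the standard-library `sqlite3` module instead of `aiosqlite`, and the names `init_db`, `open_connection` and `current_journal_mode` plus the one-column `papers` table are assumptions made for illustration. It shows the mode being set once at initialization and still being in effect on a later connection.

```python
import os
import sqlite3
import tempfile


def init_db(db_path: str) -> str:
    """Create the schema and switch the file to WAL once; return the journal mode."""
    conn = sqlite3.connect(db_path)
    try:
        # journal_mode=WAL is recorded in the database file itself.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        # Hypothetical, simplified table for the demo only.
        conn.execute("CREATE TABLE IF NOT EXISTS papers (arxiv_id TEXT PRIMARY KEY)")
        conn.commit()
    finally:
        conn.close()
    return mode


def open_connection(db_path: str) -> sqlite3.Connection:
    """Open a connection and apply only the per-connection settings."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    # synchronous is not persistent, so it is set on each new connection.
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


def current_journal_mode(db_path: str) -> str:
    """Report the journal mode seen by a fresh connection."""
    conn = open_connection(db_path)
    try:
        return conn.execute("PRAGMA journal_mode").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "papers.db")
        print(init_db(path))               # mode right after initialization
        print(current_journal_mode(path))  # mode seen by a later connection
```

If the same holds for the `aiosqlite` version, the WAL pragma could move into `init_db`, leaving only the per-connection pragmas in `get_connection`. That would be a small optimization, not a correctness fix.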
debug_comparison.py → test/debug_comparison.py RENAMED
File without changes
test/test_async_db.py ADDED
@@ -0,0 +1,138 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script for async database operations
4
+ """
5
+
6
+ import asyncio
7
+ import argparse
8
+ import os
9
+ import sys
10
+ from pathlib import Path
11
+ from mmengine.config import DictAction
12
+
13
+ # Add the project root to the path
14
+ root = str(Path(__file__).resolve().parents[1])
15
+ sys.path.append(root)
16
+
17
+ from src.database import db
18
+ from src.config import config
19
+ from src.logger import logger
20
+
21
+ def parse_args():
22
+ parser = argparse.ArgumentParser(description='main')
23
+ parser.add_argument("--config", default=os.path.join(root, "configs", "paper_agent.py"), help="config file path")
24
+
25
+ parser.add_argument(
26
+ '--cfg-options',
27
+ nargs='+',
28
+ action=DictAction,
29
+ help='override some settings in the used config, the key-value pair '
30
+ 'in xxx=yyy format will be merged into config file. If the value to '
31
+ 'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
32
+ 'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
33
+ 'Note that the quotation marks are necessary and that no white space '
34
+ 'is allowed.')
35
+ args = parser.parse_args()
36
+ return args
37
+
38
+
39
+ async def test_async_database():
40
+ """Test async database operations"""
41
+ print("🧪 Testing Async Database Operations")
42
+
43
+ try:
44
+ # Initialize database
45
+ await db.init_db(config=config)
46
+ print("✅ Database initialized successfully")
47
+
48
+ # Test inserting a paper
49
+ test_arxiv_id = "2401.00001"
50
+ await db.insert_paper(
51
+ arxiv_id=test_arxiv_id,
52
+ title="Test Async Paper",
53
+ authors="Test Author",
54
+ abstract="This is a test paper for async database operations.",
55
+ categories="cs.AI",
56
+ published_date="2024-01-01"
57
+ )
58
+ print("✅ Paper inserted successfully")
59
+
60
+ # Test getting the paper
61
+ paper = await db.get_paper(test_arxiv_id)
62
+ if paper:
63
+ print(f"✅ Paper retrieved: {paper['title']}")
64
+ else:
65
+ print("❌ Paper not found")
66
+ return False
67
+
68
+ # Test updating paper evaluation
69
+ await db.update_paper_evaluation(
70
+ arxiv_id=test_arxiv_id,
71
+ evaluation_content="Test evaluation content",
72
+ evaluation_score=3.5,
73
+ overall_score=3.2,
74
+ evaluation_tags="test_tag"
75
+ )
76
+ print("✅ Paper evaluation updated successfully")
77
+
78
+ # Test getting evaluated papers
79
+ evaluated_papers = await db.get_evaluated_papers()
80
+ print(f"✅ Found {len(evaluated_papers)} evaluated papers")
81
+
82
+ # Test getting paper count
83
+ count = await db.get_papers_count()
84
+ print(f"✅ Paper count: {count}")
85
+
86
+ # Test searching papers
87
+ search_results = await db.search_papers("Test")
88
+ print(f"✅ Search results: {len(search_results)} papers found")
89
+
90
+ # Test cache operations
91
+ await db.cache_papers("2024-01-01", "<html>test</html>", [{"test": "data"}])
92
+ print("✅ Cache operation successful")
93
+
94
+ cached_data = await db.get_cached_papers("2024-01-01")
95
+ if cached_data:
96
+ print("✅ Cache retrieval successful")
97
+ else:
98
+ print("❌ Cache retrieval failed")
99
+
100
+ # Test cache freshness
101
+ is_fresh = await db.is_cache_fresh("2024-01-01")
102
+ print(f"✅ Cache freshness check: {is_fresh}")
103
+
104
+ print("\n🎉 All async database tests passed!")
105
+ return True
106
+
107
+ except Exception as e:
108
+ print(f"❌ Error during async database test: {str(e)}")
109
+ import traceback
110
+ traceback.print_exc()
111
+ return False
112
+
113
+
114
+ async def main():
115
+ """Main function"""
116
+ print("🚀 Starting Async Database Test")
117
+ # Parse command line arguments
118
+ args = parse_args()
119
+
120
+ # Initialize the configuration
121
+ config.init_config(args.config, args)
122
+
123
+ # Initialize logger
124
+ logger.init_logger(config=config)
125
+
126
+ # Run the test
127
+ success = await test_async_database()
128
+
129
+ if success:
130
+ print("\n✅ All tests completed successfully!")
131
+ sys.exit(0)
132
+ else:
133
+ print("\n❌ Tests failed!")
134
+ sys.exit(1)
135
+
136
+
137
+ if __name__ == "__main__":
138
+ asyncio.run(main())
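Review note on the `is_cache_fresh` check exercised above: SQLite's `CURRENT_TIMESTAMP` yields a UTC string of the form `YYYY-MM-DD HH:MM:SS`, so the age calculation is only correct if the stored value is parsed as UTC and compared against a UTC "now". The diff does not show how `cached_time` is parsed, so the sketch below is an assumption about one safe way to do it, not the project's implementation. The two-column `papers_cache` table and the names `is_fresh` and `demo` are hypothetical, and it uses the standard-library `sqlite3` module to stay self-contained.

```python
import sqlite3
from datetime import datetime, timezone


def is_fresh(updated_at: str, max_age_hours: int = 24) -> bool:
    """Return True if a CURRENT_TIMESTAMP string is younger than max_age_hours."""
    # CURRENT_TIMESTAMP is stored as 'YYYY-MM-DD HH:MM:SS' in UTC.
    cached_time = datetime.strptime(updated_at, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    age = datetime.now(timezone.utc) - cached_time
    return age.total_seconds() < max_age_hours * 3600


def demo() -> tuple:
    """Insert one fresh and one stale row, then check both."""
    conn = sqlite3.connect(":memory:")
    # Simplified, hypothetical schema: only the columns the check needs.
    conn.execute(
        "CREATE TABLE papers_cache ("
        "date_str TEXT PRIMARY KEY, "
        "updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
    )
    conn.execute("INSERT INTO papers_cache (date_str) VALUES ('2024-01-01')")
    conn.execute(
        "INSERT INTO papers_cache (date_str, updated_at) "
        "VALUES ('2000-01-01', '2000-01-01 00:00:00')"
    )
    rows = dict(conn.execute("SELECT date_str, updated_at FROM papers_cache"))
    conn.close()
    return is_fresh(rows["2024-01-01"]), is_fresh(rows["2000-01-01"])


if __name__ == "__main__":
    print(demo())
```

The same format difference matters for `cleanup_old_cache`, which compares `updated_at` against `datetime.now().isoformat()`: that string is in local time and uses a `T` separator, so a plain string comparison against `CURRENT_TIMESTAMP` values may not order rows as intended.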
test/test_concurrent_eval.py ADDED
@@ -0,0 +1,97 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Test script for concurrent evaluation operations
4
+ """
5
+
6
+ import asyncio
7
+ import aiohttp
8
+ import json
9
+ import sys
10
+ from pathlib import Path
11
+
12
+ # Add the project root to the path
13
+ root = str(Path(__file__).resolve().parents[1])
14
+ sys.path.append(root)
15
+
16
+ # Test papers (these should exist in your database)
17
+ TEST_PAPERS = [
18
+ "2401.00001",
19
+ "2401.00002",
20
+ "2401.00003"
21
+ ]
22
+
23
+ BASE_URL = "http://localhost:7860"
24
+
25
+ async def test_concurrent_evaluations():
26
+ """Test concurrent evaluation of multiple papers"""
27
+ print("🧪 Testing Concurrent Evaluations")
28
+
29
+ async with aiohttp.ClientSession() as session:
30
+ # Start multiple evaluations concurrently
31
+ tasks = []
32
+ for arxiv_id in TEST_PAPERS:
33
+ print(f"Starting evaluation for {arxiv_id}")
34
+ task = asyncio.create_task(start_evaluation(session, arxiv_id))
35
+ tasks.append(task)
36
+
37
+ # Wait for all evaluations to start
38
+ results = await asyncio.gather(*tasks, return_exceptions=True)
39
+
40
+ print("\n=== Evaluation Start Results ===")
41
+ for i, result in enumerate(results):
42
+ if isinstance(result, Exception):
43
+ print(f"❌ Error starting evaluation for {TEST_PAPERS[i]}: {result}")
44
+ else:
45
+ print(f"✅ Started evaluation for {TEST_PAPERS[i]}: {result.get('status')}")
46
+
47
+ # Check active tasks
48
+ print("\n=== Checking Active Tasks ===")
49
+ async with session.get(f"{BASE_URL}/api/papers/evaluate/active-tasks") as response:
50
+ if response.status == 200:
51
+ active_tasks = await response.json()
52
+ print(f"Active tasks: {active_tasks['total_active']}")
53
+ print(f"Tracked tasks: {active_tasks['total_tracked']}")
54
+ for arxiv_id, task_info in active_tasks['active_tasks'].items():
55
+ print(f" - {arxiv_id}: {task_info['status']}")
56
+ else:
57
+ print(f"❌ Failed to get active tasks: {response.status}")
58
+
59
+ # Monitor status for a few seconds
60
+ print("\n=== Monitoring Status ===")
61
+ for _ in range(5):
62
+ await asyncio.sleep(2)
63
+ for arxiv_id in TEST_PAPERS:
64
+ async with session.get(f"{BASE_URL}/api/papers/evaluate/{arxiv_id}/status") as response:
65
+ if response.status == 200:
66
+ status = await response.json()
67
+ print(f"{arxiv_id}: {status['status']} (running: {status.get('is_running', False)})")
68
+ else:
69
+ print(f"❌ Failed to get status for {arxiv_id}")
70
+
71
+
72
+ async def start_evaluation(session, arxiv_id):
73
+ """Start evaluation for a specific paper"""
74
+ async with session.post(f"{BASE_URL}/api/papers/evaluate/{arxiv_id}") as response:
75
+ if response.status == 200:
76
+ return await response.json()
77
+ else:
78
+ error_text = await response.text()
79
+ raise Exception(f"HTTP {response.status}: {error_text}")
80
+
81
+
82
+ async def main():
83
+ """Main function"""
84
+ print("🚀 Starting Concurrent Evaluation Test")
85
+
86
+ try:
87
+ await test_concurrent_evaluations()
88
+ print("\n✅ Concurrent evaluation test completed!")
89
+ except Exception as e:
90
+ print(f"\n❌ Test failed: {str(e)}")
91
+ import traceback
92
+ traceback.print_exc()
93
+ sys.exit(1)
94
+
95
+
96
+ if __name__ == "__main__":
97
+ asyncio.run(main())
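Review note on the result handling in `test_concurrent_evaluations`: `asyncio.gather(..., return_exceptions=True)` returns results in the same order as the awaitables passed in, and places a raised exception in that slot instead of propagating it. This is why indexing `TEST_PAPERS[i]` alongside `results[i]` is sound. The sketch below isolates that pattern without a running server. `start_evaluation` here is a hypothetical stand-in that fails for one ID on purpose, and `start_all` is an illustrative name, not part of the commit.

```python
import asyncio
from typing import Dict, List


async def start_evaluation(arxiv_id: str) -> dict:
    """Hypothetical stand-in for the HTTP POST; one ID fails deliberately."""
    await asyncio.sleep(0)
    if arxiv_id == "2401.00002":
        raise RuntimeError("HTTP 500: simulated failure")
    return {"arxiv_id": arxiv_id, "status": "started"}


async def start_all(arxiv_ids: List[str]) -> Dict[str, str]:
    """Start all evaluations concurrently and summarize each outcome."""
    results = await asyncio.gather(
        *(start_evaluation(arxiv_id) for arxiv_id in arxiv_ids),
        return_exceptions=True,
    )
    summary = {}
    # Results come back in input order, so zip pairs each ID with its outcome.
    for arxiv_id, result in zip(arxiv_ids, results):
        if isinstance(result, Exception):
            summary[arxiv_id] = f"error: {result}"
        else:
            summary[arxiv_id] = result["status"]
    return summary


if __name__ == "__main__":
    print(asyncio.run(start_all(["2401.00001", "2401.00002", "2401.00003"])))
```

Pairing with `zip` avoids the index lookup and keeps the failure for one paper from hiding the results of the others.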
test_evaluation.py → test/test_evaluation.py RENAMED
@@ -15,7 +15,7 @@ from mmengine import DictAction
15
  load_dotenv(verbose=True)
16
 
17
  # Set the root directory path
18
- root = str(Path(__file__).parent)
19
  sys.path.append(root)
20
 
21
  from src.database import db
@@ -64,13 +64,13 @@ async def test_evaluation():
64
 
65
  try:
66
  # Check if paper exists in database
67
- paper = db.get_paper(test_arxiv_id)
68
  if paper:
69
  print(f"✅ Paper found in database: {paper['title']}")
70
  else:
71
  print(f"⚠️ Paper not in database, creating new record")
72
  # Insert test paper
73
- db.insert_paper(
74
  arxiv_id=test_arxiv_id,
75
  title="Test Paper for Evaluation",
76
  authors="Test Author",
@@ -100,7 +100,7 @@ async def test_evaluation():
100
  print("⚠️ Evaluation result may be incomplete")
101
 
102
  # Check evaluation status in database
103
- updated_paper = db.get_paper(test_arxiv_id)
104
  if updated_paper and updated_paper.get('is_evaluated'):
105
  print("✅ Evaluation saved to database")
106
  print(f"Evaluation score: {updated_paper.get('evaluation_score')}")
@@ -123,14 +123,14 @@ async def test_database_operations():
123
 
124
  try:
125
  # Test getting paper
126
- paper = db.get_paper("2508.09889")
127
  if paper:
128
  print(f"✅ Database connection OK, found paper: {paper['title']}")
129
  else:
130
  print("⚠️ Test paper not found in database")
131
 
132
  # Test getting paper statistics
133
- stats = db.get_papers_count()
134
  print(f"✅ Paper statistics: Total={stats['total']}, Evaluated={stats['evaluated']}, Unevaluated={stats['unevaluated']}")
135
 
136
  return True
@@ -156,7 +156,7 @@ async def main():
156
  logger.info(f"| Config:\n{config.pretty_text}")
157
 
158
  # Initialize database
159
- db.init_db(config=config)
160
  logger.info(f"| Database initialized at: {config.db_path}")
161
 
162
  print(f"✅ Database initialized: {config.db_path}")
 
15
  load_dotenv(verbose=True)
16
 
17
  # Set the root directory path
18
+ root = str(Path(__file__).resolve().parents[1])
19
  sys.path.append(root)
20
 
21
  from src.database import db
 
64
 
65
  try:
66
  # Check if paper exists in database
67
+ paper = await db.get_paper(test_arxiv_id)
68
  if paper:
69
  print(f"✅ Paper found in database: {paper['title']}")
70
  else:
71
  print(f"⚠️ Paper not in database, creating new record")
72
  # Insert test paper
73
+ await db.insert_paper(
74
  arxiv_id=test_arxiv_id,
75
  title="Test Paper for Evaluation",
76
  authors="Test Author",
 
100
  print("⚠️ Evaluation result may be incomplete")
101
 
102
  # Check evaluation status in database
103
+ updated_paper = await db.get_paper(test_arxiv_id)
104
  if updated_paper and updated_paper.get('is_evaluated'):
105
  print("✅ Evaluation saved to database")
106
  print(f"Evaluation score: {updated_paper.get('evaluation_score')}")
 
123
 
124
  try:
125
  # Test getting paper
126
+ paper = await db.get_paper("2508.09889")
127
  if paper:
128
  print(f"✅ Database connection OK, found paper: {paper['title']}")
129
  else:
130
  print("⚠️ Test paper not found in database")
131
 
132
  # Test getting paper statistics
133
+ stats = await db.get_papers_count()
134
  print(f"✅ Paper statistics: Total={stats['total']}, Evaluated={stats['evaluated']}, Unevaluated={stats['unevaluated']}")
135
 
136
  return True
 
156
  logger.info(f"| Config:\n{config.pretty_text}")
157
 
158
  # Initialize database
159
+ await db.init_db(config=config)
160
  logger.info(f"| Database initialized at: {config.db_path}")
161
 
162
  print(f"✅ Database initialized: {config.db_path}")