auto conf discovery
.github/scripts/AI_DISCOVERY_README.md
ADDED
@@ -0,0 +1,236 @@
# AI-Powered Conference Discovery System

This system automatically discovers new AI conferences by combining web scraping with AI analysis to find, categorize, and validate conference information.

## Overview

The AI discovery system works in four stages:

1. **Web Scraping**: Searches multiple sources for conference information
2. **AI Analysis**: Uses LLMs to categorize and extract structured data
3. **Validation**: Filters results based on confidence scores and criteria
4. **Integration**: Adds validated conferences to your `conferences.yml` file

## Configuration

### Environment Variables

Set these environment variables in your GitHub repository secrets:

- `OPENAI_API_KEY`: Your OpenAI API key for AI analysis (optional but recommended)

### Configuration File

Edit `.github/scripts/ai_config.yml` to customize:

#### Target Categories
```yaml
target_categories:
  machine-learning:
    - "machine learning"
    - "ML"
    - "artificial intelligence"
    # Add more keywords...
```

#### Discovery Sources
```yaml
sources:
  wikicfp:
    enabled: true
    max_results_per_keyword: 10
  deadline_trackers:
    enabled: true
    urls:
      - "https://aideadlin.es/"
```

#### AI Enhancement
```yaml
ai_enhancement:
  enabled: true
  model: "gpt-3.5-turbo"
  confidence_threshold: 0.6
```

## How It Works

### 1. Web Scraping Sources

#### WikiCFP (Call for Papers)
- Searches for conferences using your target keywords
- Extracts conference titles, deadlines, and locations
- Follows links to gather detailed information

#### Deadline Tracking Sites
- Scrapes popular AI deadline aggregators
- Extracts conference information from structured lists
- Identifies conferences with relevant keywords

#### University Pages (Optional)
- Monitors AI department news pages
- Looks for conference announcements
- Can be resource-intensive, so it is disabled by default

### 2. AI Analysis

When an OpenAI API key is provided, the system:

- **Categorizes** conferences into your target categories
- **Extracts** structured data (full names, locations, etc.)
- **Validates** that conferences are legitimate academic events
- **Assigns confidence scores** based on relevance and quality

### 3. Filtering & Validation

Conferences must meet all of these criteria to be added:

- **Confidence score** ≥ 0.6 (configurable)
- **Title length** ≥ 3 characters
- **Has relevant tags** from your target categories
- **Future dates** (current or next year)
- **Not a duplicate** of an existing conference
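Taken together, these criteria amount to a single predicate per candidate. A minimal sketch, assuming dict-shaped candidates (the field names here are illustrative, not the exact ones used in `ai_conference_discovery.py`):

```python
from datetime import datetime

def passes_filters(candidate: dict, existing_titles: set,
                   confidence_threshold: float = 0.6) -> bool:
    """Sketch of the validation criteria above; field names are illustrative."""
    current_year = datetime.now().year
    return (
        candidate.get("confidence_score", 0.0) >= confidence_threshold
        and len(candidate.get("title", "")) >= 3            # reasonable title
        and bool(candidate.get("tags"))                     # has relevant tags
        and candidate.get("year") in (current_year, current_year + 1)
        and candidate.get("title") not in existing_titles   # not a duplicate
    )
```

Raising `confidence_threshold` is the quickest single knob for trading recall for precision.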
### 4. Output

Valid conferences are:

- Added to `src/data/conferences.yml`
- Formatted consistently with existing entries
- Marked with their discovery source for verification
- Sorted by deadline

## Usage

### Automatic (Recommended)

The system runs automatically:
- **Weekly**: Every Monday at 6 AM UTC via GitHub Actions
- **Manual**: Trigger via the GitHub Actions "Run workflow" button
- **On changes**: When someone modifies `conferences.yml`

### Manual Execution

```bash
# Install dependencies
pip install -r .github/scripts/requirements.txt

# Set API key (optional)
export OPENAI_API_KEY="your-key-here"

# Run discovery
python .github/scripts/ai_conference_discovery.py
```

## Sample Output

The system will add conferences like:

```yaml
- title: NEURIPS
  year: 2026
  id: neurips26
  full_name: Conference on Neural Information Processing Systems
  link: https://neurips.cc/Conferences/2026
  deadline: '2026-05-20 23:59:59'
  timezone: AoE
  tags:
    - machine-learning
    - deep-learning
  city: Vancouver
  country: Canada
  note: 'Auto-discovered from WikiCFP. Please verify details.'
```

## Monitoring & Debugging

### Logs

The system provides detailed logging of:
- Conference discovery progress
- AI analysis results
- Filtering decisions
- Errors and warnings

### Manual Review

All auto-discovered conferences include:
- Source attribution in the `note` field
- A GitHub PR for review before merging
- Confidence scores for quality assessment

### Troubleshooting

Common issues:

1. **No conferences found**: Check whether the keywords in `ai_config.yml` are relevant
2. **Low confidence scores**: Adjust `confidence_threshold` in the config
3. **API rate limits**: Increase the delays in the rate-limiting settings
4. **Duplicates**: The system automatically deduplicates based on title + year
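The title + year deduplication rule can be sketched as follows (a hypothetical helper for illustration, not the exact code in `ai_conference_discovery.py`; it also normalises case and whitespace so near-identical titles collide):

```python
from typing import List, Tuple

def dedup_key(title: str, year: int) -> Tuple[str, int]:
    """Normalise title + year for duplicate detection (illustrative helper)."""
    return (title.strip().lower(), year)

def deduplicate(conferences: List[dict]) -> List[dict]:
    """Keep the first entry seen for each (title, year) pair."""
    seen, unique = set(), []
    for conf in conferences:
        key = dedup_key(conf["title"], conf["year"])
        if key not in seen:
            seen.add(key)
            unique.append(conf)
    return unique
```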
## Customization

### Adding New Sources

To add new conference sources:

1. Add the URL to `sources` in `ai_config.yml`
2. Implement parsing logic in `ai_conference_discovery.py`
3. Test with a small keyword set first
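A new parser typically just maps scraped rows onto candidate objects. A minimal sketch, assuming pre-extracted `(title, deadline)` pairs (the `Candidate` class here is a trimmed stand-in for the script's `ConferenceCandidate`, and `parse_my_source` is a hypothetical name):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Candidate:
    """Trimmed stand-in for ConferenceCandidate in ai_conference_discovery.py."""
    title: str
    deadline: str = ""
    source: str = ""
    tags: List[str] = field(default_factory=list)

def parse_my_source(rows: List[Tuple[str, str]], category: str) -> List[Candidate]:
    """Hypothetical parser: turn pre-extracted (title, deadline) pairs into candidates."""
    return [
        Candidate(title=t.strip(), deadline=d, source="my-source", tags=[category])
        for t, d in rows
        if len(t.strip()) >= 3  # mirror the minimum-title-length filter
    ]
```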
### Modifying Categories

To change target categories:

1. Edit `target_categories` in `ai_config.yml`
2. Add relevant keywords for each category
3. Update the category mapping in your filtering logic

### Adjusting Quality Filters

Fine-tune discovery by modifying:

- `confidence_threshold`: Higher = fewer but higher-quality conferences
- `years_ahead`: How far into the future to look
- `exclude_patterns`: Patterns to filter out (workshops, etc.)
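As a sketch, a stricter setup might look like this (the `validation` block name and the exact key placement are illustrative — check `ai_config.yml` for the actual schema):

```yaml
ai_enhancement:
  confidence_threshold: 0.8   # stricter: fewer, higher-quality matches

validation:
  years_ahead: 2              # only look two years ahead
  exclude_patterns:
    - "workshop"
    - "student symposium"
```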
## Cost Considerations

### OpenAI API Usage

Typical costs per discovery run:
- ~$0.10–0.50 for analyzing 50 conferences
- Depends on description length and model choice
- Can be eliminated entirely by setting `ai_enhancement.enabled: false`

### Rate Limiting

The system respects rate limits:
- 1-second delay between WikiCFP requests
- 0.5-second delay between OpenAI API calls
- Configurable timeouts and retries

## Security & Privacy

- API keys are stored as GitHub secrets
- No sensitive data is logged
- robots.txt is respected where possible
- The user agent identifies the tool appropriately

## Contributing

To improve the discovery system:

1. Add new conference sources in the scraping modules
2. Improve the AI prompts for better categorization
3. Enhance the parsing logic for different website formats
4. Add new target categories or keywords

## Support

For issues or improvements:

1. Check the GitHub Actions logs for error details
2. Test manually with `python ai_conference_discovery.py`
3. Verify the configuration in `ai_config.yml`
4. Submit issues with example conference URLs that should be discovered
.github/scripts/ai_conference_discovery.py
ADDED
@@ -0,0 +1,496 @@
#!/usr/bin/env python3
"""
AI-Powered Conference Discovery System

This script automatically discovers new AI conferences by:
1. Scraping multiple reliable sources (WikiCFP, conference websites, etc.)
2. Using AI models to categorize and extract conference details
3. Validating and deduplicating against existing conferences
4. Adding new conferences to conferences.yml
"""

import os
import json
import re
import time
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional
from urllib.parse import urljoin, urlparse

import requests
import yaml
from bs4 import BeautifulSoup

# Configuration for target categories
TARGET_CATEGORIES = {
    "machine-learning": ["machine learning", "ML", "artificial intelligence", "AI"],
    "lifelong-learning": ["lifelong learning", "continual learning", "incremental learning"],
    "robotics": ["robotics", "autonomous systems", "robot"],
    "computer-vision": ["computer vision", "CV", "image processing", "visual recognition"],
    "web-search": ["web search", "information retrieval", "search engines"],
    "data-mining": ["data mining", "knowledge discovery", "big data analytics"],
    "natural-language-processing": ["natural language processing", "NLP", "computational linguistics", "text mining"],
    "signal-processing": ["signal processing", "DSP", "audio processing", "speech"],
    "human-computer-interaction": ["HCI", "human computer interaction", "user interface", "UX"],
    "computer-graphics": ["computer graphics", "visualization", "rendering", "3D"],
    "mathematics": ["mathematics", "mathematical optimization", "numerical methods"],
    "reinforcement-learning": ["reinforcement learning", "RL", "deep RL", "multi-agent"]
}


@dataclass
class ConferenceCandidate:
    """Data class for discovered conference candidates"""
    title: str
    full_name: str = ""
    url: str = ""
    deadline: str = ""
    abstract_deadline: str = ""
    conference_date: str = ""
    location: str = ""
    city: str = ""
    country: str = ""
    description: str = ""
    tags: List[str] = field(default_factory=list)
    year: int = 0
    confidence_score: float = 0.0
    source: str = ""


class ConferenceDiscoveryEngine:
    """Main engine for discovering conferences using AI and web scraping"""

    def __init__(self, openai_api_key: Optional[str] = None):
        self.openai_api_key = openai_api_key or os.getenv('OPENAI_API_KEY')
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })

    def discover_conferences(self) -> List[ConferenceCandidate]:
        """Main method to discover conferences from multiple sources"""
        candidates = []

        print("🔍 Starting AI-powered conference discovery...")

        # Source 1: WikiCFP
        print("📊 Scraping WikiCFP...")
        candidates.extend(self._scrape_wikicfp())

        # Source 2: AI conference deadline websites
        print("🌐 Scraping popular AI deadline trackers...")
        candidates.extend(self._scrape_deadline_sites())

        # Source 3: University AI department pages
        print("🎓 Checking university AI department pages...")
        candidates.extend(self._scrape_university_pages())

        # Use AI to enhance and categorize candidates
        print("🤖 Using AI to analyze and categorize conferences...")
        enhanced_candidates = self._ai_enhance_candidates(candidates)

        # Filter and validate
        print("✅ Filtering and validating candidates...")
        valid_candidates = self._filter_candidates(enhanced_candidates)

        print(f"🎉 Discovered {len(valid_candidates)} potential new conferences")
        return valid_candidates

    def _scrape_wikicfp(self) -> List[ConferenceCandidate]:
        """Scrape WikiCFP for conference information"""
        candidates = []
        base_url = "http://www.wikicfp.com/cfp/"

        # Search for conferences in our target categories
        for category, keywords in TARGET_CATEGORIES.items():
            for keyword in keywords[:2]:  # Limit to avoid overwhelming the site
                try:
                    search_url = f"{base_url}servlet/tool.search?q={keyword.replace(' ', '+')}&year=f"
                    response = self._safe_request(search_url)
                    if not response:
                        continue

                    soup = BeautifulSoup(response.text, 'html.parser')
                    conferences = self._parse_wikicfp_results(soup, category)
                    candidates.extend(conferences)

                    time.sleep(1)  # Be respectful
                except Exception as e:
                    print(f"Error scraping WikiCFP for {keyword}: {e}")

        return candidates

    def _parse_wikicfp_results(self, soup: BeautifulSoup, category: str) -> List[ConferenceCandidate]:
        """Parse WikiCFP search results"""
        candidates = []

        # WikiCFP results are typically in tables
        for row in soup.find_all('tr')[1:10]:  # Skip header, limit results
            cells = row.find_all('td')
            if len(cells) >= 4:
                try:
                    title_cell = cells[0]
                    deadline_cell = cells[1]
                    location_cell = cells[2]

                    title_link = title_cell.find('a')
                    if title_link:
                        title = title_link.get_text(strip=True)
                        url = urljoin("http://www.wikicfp.com/cfp/", title_link.get('href', ''))

                        candidate = ConferenceCandidate(
                            title=title,
                            url=url,
                            deadline=deadline_cell.get_text(strip=True),
                            location=location_cell.get_text(strip=True),
                            tags=[category],
                            source="WikiCFP"
                        )

                        # Extract more details from the conference page
                        self._enhance_from_wikicfp_page(candidate)
                        candidates.append(candidate)

                except Exception as e:
                    print(f"Error parsing WikiCFP row: {e}")
                    continue

        return candidates

    def _enhance_from_wikicfp_page(self, candidate: ConferenceCandidate):
        """Extract additional details from individual WikiCFP conference pages"""
        try:
            response = self._safe_request(candidate.url)
            if not response:
                return

            soup = BeautifulSoup(response.text, 'html.parser')

            # Extract conference details
            content = soup.find('div', class_='cfp') or soup.find('table')
            if content:
                text = content.get_text()

                # Extract conference dates
                date_pattern = r'Conference[:\s]*([A-Za-z]+ \d{1,2}[-–—]\d{1,2}, \d{4})'
                date_match = re.search(date_pattern, text)
                if date_match:
                    candidate.conference_date = date_match.group(1)

                # Extract abstract deadline
                abstract_pattern = r'Abstract[:\s]*([A-Za-z]+ \d{1,2}, \d{4})'
                abstract_match = re.search(abstract_pattern, text)
                if abstract_match:
                    candidate.abstract_deadline = abstract_match.group(1)

                # Extract location details
                location_pattern = r'Location[:\s]*([^.\n]+)'
                location_match = re.search(location_pattern, text)
                if location_match:
                    candidate.location = location_match.group(1).strip()

        except Exception as e:
            print(f"Error enhancing WikiCFP page {candidate.url}: {e}")

    def _scrape_deadline_sites(self) -> List[ConferenceCandidate]:
        """Scrape popular AI deadline tracking websites"""
        candidates = []

        # Popular deadline tracking sites
        sites = [
            "https://aideadlin.es/",
            "https://jackietseng.github.io/conference_call_for_paper/conferences.html"
        ]

        for site_url in sites:
            try:
                response = self._safe_request(site_url)
                if response:
                    soup = BeautifulSoup(response.text, 'html.parser')
                    site_candidates = self._parse_deadline_site(soup, site_url)
                    candidates.extend(site_candidates)
            except Exception as e:
                print(f"Error scraping {site_url}: {e}")

        return candidates

    def _parse_deadline_site(self, soup: BeautifulSoup, source_url: str) -> List[ConferenceCandidate]:
        """Parse deadline tracking websites for conference info"""
        candidates = []

        # Look for conference entries (this will vary by site structure)
        conf_elements = (soup.find_all('div', class_='conf') +
                         soup.find_all('tr') +
                         soup.find_all('li'))

        for element in conf_elements[:20]:  # Limit results
            try:
                text = element.get_text(strip=True)
                if len(text) > 10 and any(keyword in text.lower()
                                          for keywords in TARGET_CATEGORIES.values()
                                          for keyword in keywords):
                    # Extract conference name and deadline
                    title_match = re.search(r'([A-Z]{2,}[\w\s]*\d{4})', text)
                    deadline_match = re.search(r'(\w+ \d{1,2}, \d{4})', text)

                    if title_match:
                        candidate = ConferenceCandidate(
                            title=title_match.group(1),
                            deadline=deadline_match.group(1) if deadline_match else "",
                            source=f"DeadlineTracker-{urlparse(source_url).netloc}",
                            description=text[:200]
                        )
                        candidates.append(candidate)

            except Exception:
                continue

        return candidates

    def _scrape_university_pages(self) -> List[ConferenceCandidate]:
        """Scrape university AI department pages for conference announcements"""
        candidates = []

        # Major AI research institutions
        university_urls = [
            "https://www.cs.stanford.edu/news/",
            "https://www.csail.mit.edu/news",
            "https://ai.berkeley.edu/news/",
            "https://www.cs.cmu.edu/news"
        ]

        for url in university_urls:
            try:
                response = self._safe_request(url)
                if response:
                    soup = BeautifulSoup(response.text, 'html.parser')
                    # Look for conference-related announcements
                    links = soup.find_all('a', href=True)
                    for link in links[:10]:
                        link_text = link.get_text(strip=True).lower()
                        if ('conference' in link_text or 'cfp' in link_text or
                                'call for papers' in link_text):
                            # This is a potential conference announcement.
                            # Detail extraction for these pages is not yet implemented.
                            pass
            except Exception as e:
                print(f"Error scraping {url}: {e}")

        return candidates

    def _ai_enhance_candidates(self, candidates: List[ConferenceCandidate]) -> List[ConferenceCandidate]:
        """Use AI to enhance and categorize conference candidates"""
        if not self.openai_api_key:
            print("⚠️ No OpenAI API key found. Skipping AI enhancement.")
            return candidates

        enhanced = []

        try:
            import openai  # legacy (pre-1.0) SDK interface
            openai.api_key = self.openai_api_key

            for candidate in candidates:
                try:
                    # Create a prompt for the AI to analyze the conference
                    prompt = f"""
Analyze this conference information and provide structured data:

Title: {candidate.title}
Description: {candidate.description}
Location: {candidate.location}
Current Tags: {candidate.tags}

Please provide:
1. Most appropriate categories from: {list(TARGET_CATEGORIES.keys())}
2. Confidence score (0-1) that this is a legitimate AI/CS conference
3. Standardized full conference name
4. Extracted city and country from location
5. Year (if determinable)

Respond in JSON format only.
"""

                    response = openai.ChatCompletion.create(
                        model="gpt-3.5-turbo",
                        messages=[{"role": "user", "content": prompt}],
                        max_tokens=300,
                        temperature=0.1
                    )

                    ai_analysis = json.loads(response.choices[0].message.content)

                    # Update candidate with AI insights
                    candidate.tags = ai_analysis.get('categories', candidate.tags)
                    candidate.confidence_score = ai_analysis.get('confidence_score', 0.5)
                    candidate.full_name = ai_analysis.get('full_name', candidate.title)
                    candidate.city = ai_analysis.get('city', candidate.city)
                    candidate.country = ai_analysis.get('country', candidate.country)
                    candidate.year = ai_analysis.get('year', candidate.year)

                    enhanced.append(candidate)

                    time.sleep(0.5)  # Rate limiting

                except Exception as e:
                    print(f"Error in AI analysis for {candidate.title}: {e}")
                    enhanced.append(candidate)  # Add without enhancement

        except ImportError:
            print("OpenAI package not available. Install with: pip install openai")
            return candidates

        return enhanced

    def _filter_candidates(self, candidates: List[ConferenceCandidate]) -> List[ConferenceCandidate]:
        """Filter and validate conference candidates"""
        current_year = datetime.now().year
        next_year = current_year + 1

        # Load existing conferences once; deduplicate on (title, year)
        existing = {(c.get('title'), c.get('year'))
                    for c in self._load_existing_conferences()}

        valid_candidates = []

        for candidate in candidates:
            # Basic validation criteria
            if (candidate.confidence_score >= 0.6 and                 # AI confidence threshold
                    len(candidate.title) >= 3 and                     # Reasonable title length
                    candidate.tags and                                # Has categories
                    candidate.year in (current_year, next_year) and   # Current/next year
                    (candidate.title, candidate.year) not in existing):  # Not a duplicate
                valid_candidates.append(candidate)

        return valid_candidates

    def _load_existing_conferences(self) -> List[Dict]:
        """Load existing conferences to avoid duplicates"""
        try:
            with open('src/data/conferences.yml', 'r') as f:
                return yaml.safe_load(f) or []
        except FileNotFoundError:
            return []

    def _safe_request(self, url: str, timeout: int = 10) -> Optional[requests.Response]:
        """Make a safe HTTP request with error handling"""
        try:
            response = self.session.get(url, timeout=timeout)
            response.raise_for_status()
            return response
        except Exception as e:
            print(f"Request failed for {url}: {e}")
            return None

    def add_to_conferences_yml(self, candidates: List[ConferenceCandidate]) -> int:
        """Add validated candidates to conferences.yml"""
        if not candidates:
            return 0

        # Load existing conferences
        existing_conferences = self._load_existing_conferences()

        added_count = 0
        for candidate in candidates:
            # Convert to conference format
            year = candidate.year or datetime.now().year + 1
            conference_entry = {
                'title': candidate.title,
                'year': year,
                'id': self._generate_conference_id(candidate.title, year),
                'full_name': candidate.full_name or candidate.title,
                'link': candidate.url,
                'deadline': self._parse_deadline(candidate.deadline),
                'timezone': 'AoE',  # Default timezone
                'date': candidate.conference_date,
                'tags': candidate.tags,
                'city': candidate.city,
                'country': candidate.country,
                'note': f'Auto-discovered from {candidate.source}. Please verify details.'
            }

            # Add abstract deadline if available
            if candidate.abstract_deadline:
                conference_entry['abstract_deadline'] = self._parse_deadline(candidate.abstract_deadline)

            existing_conferences.append(conference_entry)
            added_count += 1

        # Sort conferences by deadline (entries without one sort last)
        existing_conferences.sort(key=lambda x: x.get('deadline') or '9999')

        # Write back to file
        with open('src/data/conferences.yml', 'w') as f:
            yaml.dump(existing_conferences, f, default_flow_style=False, sort_keys=False)

        return added_count

    def _generate_conference_id(self, title: str, year: int) -> str:
        """Generate a unique conference ID"""
        # Extract acronym or use first few letters
        words = title.split()
        if len(words) > 1:
            acronym = ''.join([word[0].lower() for word in words if word[0].isupper()])
            if len(acronym) >= 2:
                return f"{acronym}{str(year)[-2:]}"

        # Fallback to first few letters + year
        clean_title = re.sub(r'[^a-zA-Z0-9]', '', title.lower())
        return f"{clean_title[:6]}{str(year)[-2:]}"

    def _parse_deadline(self, deadline_str: str) -> str:
        """Parse deadline string into standardized format"""
        if not deadline_str:
            return ""

        try:
            # Try to extract a date in one of several known formats
            deadline_patterns = [
                r'(\w+ \d{1,2}, \d{4})',
                r'(\d{4}-\d{2}-\d{2})',
                r'(\d{1,2}/\d{1,2}/\d{4})'
            ]

            for pattern in deadline_patterns:
                match = re.search(pattern, deadline_str)
                if match:
                    date_str = match.group(1)
                    # Convert to standardized format (YYYY-MM-DD HH:MM:SS)
                    for fmt in ("%B %d, %Y", "%Y-%m-%d", "%m/%d/%Y"):
                        try:
                            parsed_date = datetime.strptime(date_str, fmt)
                            return parsed_date.strftime("%Y-%m-%d 23:59:59")
                        except ValueError:
                            continue

            return deadline_str  # Return as-is if parsing fails

        except Exception:
            return deadline_str


def main():
    """Main function to run conference discovery"""
    print("🚀 Starting AI-Powered Conference Discovery System")

    # Initialize the discovery engine
    engine = ConferenceDiscoveryEngine()

    # Discover conferences
    candidates = engine.discover_conferences()

    if candidates:
        print(f"\n📋 Found {len(candidates)} potential conferences:")
        for candidate in candidates:
            print(f"  • {candidate.title} ({candidate.confidence_score:.2f} confidence) - {candidate.tags}")

        # Add to conferences.yml
        added_count = engine.add_to_conferences_yml(candidates)
        print(f"\n✅ Added {added_count} new conferences to conferences.yml")
    else:
        print("❌ No new conferences discovered")


if __name__ == "__main__":
    main()
.github/scripts/ai_config.yml
ADDED
@@ -0,0 +1,149 @@
+# AI Conference Discovery Configuration
+
+# Target categories and their associated keywords for discovery
+target_categories:
+  machine-learning:
+    - "machine learning"
+    - "ML"
+    - "artificial intelligence"
+    - "AI"
+    - "deep learning"
+    - "neural networks"
+
+  lifelong-learning:
+    - "lifelong learning"
+    - "continual learning"
+    - "incremental learning"
+    - "online learning"
+
+  robotics:
+    - "robotics"
+    - "autonomous systems"
+    - "robot"
+    - "automation"
+    - "mechatronics"
+
+  computer-vision:
+    - "computer vision"
+    - "CV"
+    - "image processing"
+    - "visual recognition"
+    - "pattern recognition"
+
+  web-search:
+    - "web search"
+    - "information retrieval"
+    - "search engines"
+    - "IR"
+
+  data-mining:
+    - "data mining"
+    - "knowledge discovery"
+    - "big data analytics"
+    - "data science"
+
+  natural-language-processing:
+    - "natural language processing"
+    - "NLP"
+    - "computational linguistics"
+    - "text mining"
+    - "language models"
+
+  signal-processing:
+    - "signal processing"
+    - "DSP"
+    - "audio processing"
+    - "speech"
+    - "multimedia"
+
+  human-computer-interaction:
+    - "HCI"
+    - "human computer interaction"
+    - "user interface"
+    - "UX"
+    - "usability"
+
+  computer-graphics:
+    - "computer graphics"
+    - "visualization"
+    - "rendering"
+    - "3D"
+    - "virtual reality"
+
+  mathematics:
+    - "mathematics"
+    - "mathematical optimization"
+    - "numerical methods"
+    - "statistics"
+
+  reinforcement-learning:
+    - "reinforcement learning"
+    - "RL"
+    - "deep RL"
+    - "multi-agent"
+    - "Q-learning"
+
+# Discovery sources configuration
+sources:
+  wikicfp:
+    enabled: true
+    base_url: "http://www.wikicfp.com/cfp/"
+    max_results_per_keyword: 10
+    request_delay: 1 # seconds between requests
+
+  deadline_trackers:
+    enabled: true
+    urls:
+      - "https://aideadlin.es/"
+      - "https://jackietseng.github.io/conference_call_for_paper/conferences.html"
+    max_results_per_site: 20
+
+  university_pages:
+    enabled: false # Can be resource intensive
+    urls:
+      - "https://www.cs.stanford.edu/news/"
+      - "https://www.csail.mit.edu/news"
+      - "https://ai.berkeley.edu/news/"
+      - "https://www.cs.cmu.edu/news"
+
+# AI enhancement configuration
+ai_enhancement:
+  enabled: true
+  model: "gpt-3.5-turbo"
+  confidence_threshold: 0.6 # Minimum confidence to include a conference
+  max_tokens: 300
+  temperature: 0.1
+  rate_limit_delay: 0.5 # seconds between API calls
+
+# Filtering criteria
+filtering:
+  min_title_length: 3
+  max_description_length: 500
+  years_ahead: 2 # How many years in the future to consider
+  exclude_patterns:
+    - "workshop"
+    - "symposium on XYZ" # Add patterns to exclude
+
+  required_fields:
+    - "title"
+    - "tags"
+
+# Output configuration
+output:
+  auto_add_to_yml: true
+  backup_before_changes: true
+  sort_by_deadline: true
+  add_discovery_note: true
+
+# Rate limiting and safety
+rate_limiting:
+  max_requests_per_minute: 30
+  request_timeout: 10 # seconds
+  max_retries: 3
+  backoff_factor: 2
+
+# Logging
+logging:
+  level: "INFO"
+  include_timestamps: true
+  log_api_calls: false # Set to true for debugging
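The `filtering` and `ai_enhancement.confidence_threshold` settings above gate which discovered candidates make it into `conferences.yml`. A sketch of that gating logic; the `passes_filters` function and the inline `config` dict are illustrative (the real script would load the YAML file, e.g. via `yaml.safe_load`), not the discovery engine's actual API:

```python
# Illustrative subset of the parsed ai_config.yml shown above.
config = {
    "ai_enhancement": {"confidence_threshold": 0.6},
    "filtering": {"min_title_length": 3, "exclude_patterns": ["workshop"]},
}

def passes_filters(candidate: dict) -> bool:
    """Return True if a candidate clears the title and confidence gates.

    Hypothetical helper mirroring the config's filtering semantics:
    minimum title length, excluded substrings, minimum AI confidence.
    """
    f = config["filtering"]
    title = candidate.get("title", "")
    if len(title) < f["min_title_length"]:
        return False
    if any(p.lower() in title.lower() for p in f["exclude_patterns"]):
        return False
    threshold = config["ai_enhancement"]["confidence_threshold"]
    return candidate.get("confidence", 0.0) >= threshold

print(passes_filters({"title": "NeurIPS 2025", "confidence": 0.9}))       # True
print(passes_filters({"title": "ICML Workshop on X", "confidence": 0.9})) # False (excluded pattern)
```

Case-insensitive pattern matching matters here: scraped titles capitalize "Workshop" inconsistently.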
.github/scripts/requirements.txt
ADDED
@@ -0,0 +1,17 @@
+# Core dependencies for conference automation
+pyyaml>=6.0
+requests>=2.25.0
+
+# Web scraping dependencies
+beautifulsoup4>=4.9.0
+lxml>=4.6.0
+
+# AI/ML dependencies
+openai>=0.27.0
+
+# Data processing
+python-dateutil>=2.8.0
+
+# Optional: For more advanced NLP tasks
+# nltk>=3.6.0
+# spacy>=3.4.0
.github/workflows/update-conferences.yml
CHANGED
@@ -5,6 +5,8 @@ permissions:
 on:
   workflow_dispatch: # Allow manual trigger
+  schedule:
+    - cron: '0 6 * * 1' # Run every Monday at 6 AM UTC
   pull_request:
     paths:
       - 'src/data/conferences.yml'
@@ -24,11 +26,17 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip
-          pip install
+          pip install -r .github/scripts/requirements.txt
 
-      - name: Update conferences
+      - name: Update conferences from ccfddl
         run: python .github/scripts/update_conferences.py
 
+      - name: AI-powered conference discovery
+        env:
+          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+        run: python .github/scripts/ai_conference_discovery.py
+        continue-on-error: true # Don't fail the workflow if AI discovery fails
+
       - name: Check for changes
         id: git-check
         run: |
@@ -38,10 +46,12 @@
         if: steps.git-check.outputs.changes == 'true'
         uses: peter-evans/create-pull-request@v5
         with:
-          commit-message: 'chore: update conference data from ccfddl'
-          title: 'Update conference data
+          commit-message: 'chore: update conference data from ccfddl and AI discovery'
+          title: 'Update conference data (ccfddl + AI discovery)'
           body: |
-            This PR updates the conference data from
+            This PR updates the conference data from multiple sources:
+            - Updates from ccfddl repository
+            - New conferences discovered via AI-powered web scraping
 
             Auto-generated by GitHub Actions.
           branch: update-conferences
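The new `schedule` trigger uses the cron expression `0 6 * * 1`, i.e. Mondays at 06:00 UTC. A stdlib-only sketch of when that next fires, with a hypothetical `next_weekly_run` helper (GitHub Actions evaluates cron itself; this is only to make the schedule concrete):

```python
from datetime import datetime, timedelta, timezone

def next_weekly_run(now: datetime, weekday: int = 0, hour: int = 6) -> datetime:
    """Next fire time for a weekly cron like '0 6 * * 1'.

    weekday follows Python's datetime convention (0 = Monday), matching
    cron's '1' for Monday here. Illustrative helper, not part of the repo.
    """
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:  # already passed this week: schedule next week
        candidate += timedelta(days=7)
    return candidate

# From a Wednesday noon, the next run is the following Monday 06:00 UTC.
print(next_weekly_run(datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)))
# 2025-01-06 06:00:00+00:00
```

Because the discovery step has `continue-on-error: true`, a failed scheduled run still lets the ccfddl update and the PR-creation steps complete.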