scdrand23 commited on
Commit
1beefeb
·
1 Parent(s): 8e72353

auto conf discovery

Browse files
.github/scripts/AI_DISCOVERY_README.md ADDED
@@ -0,0 +1,236 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # AI-Powered Conference Discovery System
2
+
3
+ This system automatically discovers new AI conferences by combining web scraping with AI analysis to find, categorize, and validate conference information.
4
+
5
+ ## Overview
6
+
7
+ The AI discovery system works in multiple stages:
8
+
9
+ 1. **Web Scraping**: Searches multiple sources for conference information
10
+ 2. **AI Analysis**: Uses LLMs to categorize and extract structured data
11
+ 3. **Validation**: Filters results based on confidence scores and criteria
12
+ 4. **Integration**: Adds validated conferences to your `conferences.yml` file
13
+
14
+ ## Configuration
15
+
16
+ ### Environment Variables
17
+
18
+ Set up these environment variables in your GitHub repository secrets:
19
+
20
+ - `OPENAI_API_KEY`: Your OpenAI API key for AI analysis (optional but recommended)
21
+
22
+ ### Configuration File
23
+
24
+ Edit `.github/scripts/ai_config.yml` to customize:
25
+
26
+ #### Target Categories
27
+ ```yaml
28
+ target_categories:
29
+ machine-learning:
30
+ - "machine learning"
31
+ - "ML"
32
+ - "artificial intelligence"
33
+ # Add more keywords...
34
+ ```
35
+
36
+ #### Discovery Sources
37
+ ```yaml
38
+ sources:
39
+ wikicfp:
40
+ enabled: true
41
+ max_results_per_keyword: 10
42
+ deadline_trackers:
43
+ enabled: true
44
+ urls:
45
+ - "https://aideadlin.es/"
46
+ ```
47
+
48
+ #### AI Enhancement
49
+ ```yaml
50
+ ai_enhancement:
51
+ enabled: true
52
+ model: "gpt-3.5-turbo"
53
+ confidence_threshold: 0.6
54
+ ```
55
+
56
+ ## How It Works
57
+
58
+ ### 1. Web Scraping Sources
59
+
60
+ #### WikiCFP (Call for Papers)
61
+ - Searches for conferences using your target keywords
62
+ - Extracts conference titles, deadlines, and locations
63
+ - Follows links to get detailed information
64
+
65
+ #### Deadline Tracking Sites
66
+ - Scrapes popular AI deadline aggregators
67
+ - Extracts conference information from structured lists
68
+ - Identifies conferences with relevant keywords
69
+
70
+ #### University Pages (Optional)
71
+ - Monitors AI department news pages
72
+ - Looks for conference announcements
73
+ - Can be resource-intensive, disabled by default
74
+
75
+ ### 2. AI Analysis
76
+
77
+ When OpenAI API key is provided, the system:
78
+
79
+ - **Categorizes** conferences into your target categories
80
+ - **Extracts** structured data (full names, locations, etc.)
81
+ - **Validates** that conferences are legitimate academic events
82
+ - **Assigns confidence scores** based on relevance and quality
83
+
84
+ ### 3. Filtering & Validation
85
+
86
+ Conferences must meet criteria to be added:
87
+
88
+ - **Confidence score** ≥ 0.6 (configurable)
89
+ - **Title length** ≥ 3 characters
90
+ - **Has relevant tags** from your target categories
91
+ - **Future dates** (current or next year)
92
+ - **Not duplicates** of existing conferences
93
+
94
+ ### 4. Output
95
+
96
+ Valid conferences are:
97
+
98
+ - Added to `src/data/conferences.yml`
99
+ - Formatted consistently with existing entries
100
+ - Marked with discovery source for verification
101
+ - Sorted by deadline
102
+
103
+ ## Usage
104
+
105
+ ### Automatic (Recommended)
106
+
107
+ The system runs automatically:
108
+ - **Weekly**: Every Monday at 6 AM UTC via GitHub Actions
109
+ - **Manual**: Trigger via GitHub Actions "Run workflow" button
110
+ - **On changes**: When someone modifies `conferences.yml`
111
+
112
+ ### Manual Execution
113
+
114
+ ```bash
115
+ # Install dependencies
116
+ pip install -r .github/scripts/requirements.txt
117
+
118
+ # Set API key (optional)
119
+ export OPENAI_API_KEY="your-key-here"
120
+
121
+ # Run discovery
122
+ python .github/scripts/ai_conference_discovery.py
123
+ ```
124
+
125
+ ## Sample Output
126
+
127
+ The system will add conferences like:
128
+
129
+ ```yaml
130
+ - title: NEURIPS
131
+ year: 2026
132
+ id: neurips26
133
+ full_name: Conference on Neural Information Processing Systems
134
+ link: https://neurips.cc/Conferences/2026
135
+ deadline: '2026-05-20 23:59:59'
136
+ timezone: AoE
137
+ tags:
138
+ - machine-learning
139
+ - deep-learning
140
+ city: Vancouver
141
+ country: Canada
142
+ note: 'Auto-discovered from WikiCFP. Please verify details.'
143
+ ```
144
+
145
+ ## Monitoring & Debugging
146
+
147
+ ### Logs
148
+
149
+ The system provides detailed logging:
150
+ - Conference discovery progress
151
+ - AI analysis results
152
+ - Filtering decisions
153
+ - Errors and warnings
154
+
155
+ ### Manual Review
156
+
157
+ All auto-discovered conferences include:
158
+ - Source attribution in the `note` field
159
+ - GitHub PR for review before merging
160
+ - Confidence scores for quality assessment
161
+
162
+ ### Troubleshooting
163
+
164
+ Common issues:
165
+
166
+ 1. **No conferences found**: Check if keywords in `ai_config.yml` are relevant
167
+ 2. **Low confidence scores**: Adjust `confidence_threshold` in config
168
+ 3. **API rate limits**: Increase delays in rate limiting settings
169
+ 4. **Duplicates**: System automatically deduplicates based on title+year
170
+
171
+ ## Customization
172
+
173
+ ### Adding New Sources
174
+
175
+ To add new conference sources:
176
+
177
+ 1. Add URL to `sources` in `ai_config.yml`
178
+ 2. Implement parsing logic in `ai_conference_discovery.py`
179
+ 3. Test with a small keyword set first
180
+
181
+ ### Modifying Categories
182
+
183
+ To change target categories:
184
+
185
+ 1. Edit `target_categories` in `ai_config.yml`
186
+ 2. Add relevant keywords for each category
187
+ 3. Update the category mapping in your filtering logic
188
+
189
+ ### Adjusting Quality Filters
190
+
191
+ Fine-tune discovery by modifying:
192
+
193
+ - `confidence_threshold`: Higher = fewer but higher quality conferences
194
+ - `years_ahead`: How far into the future to look
195
+ - `exclude_patterns`: Patterns to filter out (workshops, etc.)
196
+
197
+ ## Cost Considerations
198
+
199
+ ### OpenAI API Usage
200
+
201
+ Typical costs per discovery run:
202
+ - ~$0.10-0.50 for analyzing 50 conferences
203
+ - Depends on description length and model choice
204
+ - Can be disabled by setting `ai_enhancement.enabled: false`
205
+
206
+ ### Rate Limiting
207
+
208
+ The system respects rate limits:
209
+ - 1 second delay between WikiCFP requests
210
+ - 0.5 second delay between OpenAI API calls
211
+ - Configurable timeouts and retries
212
+
213
+ ## Security & Privacy
214
+
215
+ - API keys stored as GitHub secrets
216
+ - No sensitive data logged
217
+ - Respects robots.txt where possible
218
+ - User-agent identifies the tool appropriately
219
+
220
+ ## Contributing
221
+
222
+ To improve the discovery system:
223
+
224
+ 1. Add new conference sources in the scraping modules
225
+ 2. Improve AI prompts for better categorization
226
+ 3. Enhance parsing logic for different website formats
227
+ 4. Add new target categories or keywords
228
+
229
+ ## Support
230
+
231
+ For issues or improvements:
232
+
233
+ 1. Check the GitHub Actions logs for error details
234
+ 2. Test manually with `python ai_conference_discovery.py`
235
+ 3. Verify configuration in `ai_config.yml`
236
+ 4. Submit issues with example conference URLs that should be discovered
.github/scripts/ai_conference_discovery.py ADDED
@@ -0,0 +1,496 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ AI-Powered Conference Discovery System
4
+
5
+ This script automatically discovers new AI conferences by:
6
+ 1. Scraping multiple reliable sources (WikiCFP, conference websites, etc.)
7
+ 2. Using AI models to categorize and extract conference details
8
+ 3. Validating and deduplicating against existing conferences
9
+ 4. Adding new conferences to conferences.yml
10
+ """
11
+
12
+ import os
13
+ import json
14
+ import yaml
15
+ import requests
16
+ import time
17
+ from datetime import datetime, timedelta
18
+ from typing import Dict, List, Any, Optional, Tuple
19
+ from dataclasses import dataclass
20
+ from urllib.parse import urljoin, urlparse
21
+ import re
22
+ from bs4 import BeautifulSoup
23
+
24
+ # Configuration for target categories
25
+ TARGET_CATEGORIES = {
26
+ "machine-learning": ["machine learning", "ML", "artificial intelligence", "AI"],
27
+ "lifelong-learning": ["lifelong learning", "continual learning", "incremental learning"],
28
+ "robotics": ["robotics", "autonomous systems", "robot"],
29
+ "computer-vision": ["computer vision", "CV", "image processing", "visual recognition"],
30
+ "web-search": ["web search", "information retrieval", "search engines"],
31
+ "data-mining": ["data mining", "knowledge discovery", "big data analytics"],
32
+ "natural-language-processing": ["natural language processing", "NLP", "computational linguistics", "text mining"],
33
+ "signal-processing": ["signal processing", "DSP", "audio processing", "speech"],
34
+ "human-computer-interaction": ["HCI", "human computer interaction", "user interface", "UX"],
35
+ "computer-graphics": ["computer graphics", "visualization", "rendering", "3D"],
36
+ "mathematics": ["mathematics", "mathematical optimization", "numerical methods"],
37
+ "reinforcement-learning": ["reinforcement learning", "RL", "deep RL", "multi-agent"]
38
+ }
39
+
40
+ @dataclass
41
+ class ConferenceCandidate:
42
+ """Data class for discovered conference candidates"""
43
+ title: str
44
+ full_name: str = ""
45
+ url: str = ""
46
+ deadline: str = ""
47
+ abstract_deadline: str = ""
48
+ conference_date: str = ""
49
+ location: str = ""
50
+ city: str = ""
51
+ country: str = ""
52
+ description: str = ""
53
+ tags: List[str] = None
54
+ year: int = 0
55
+ confidence_score: float = 0.0
56
+ source: str = ""
57
+
58
+ def __post_init__(self):
59
+ if self.tags is None:
60
+ self.tags = []
61
+
62
+ class ConferenceDiscoveryEngine:
63
+ """Main engine for discovering conferences using AI and web scraping"""
64
+
65
+ def __init__(self, openai_api_key: Optional[str] = None):
66
+ self.openai_api_key = openai_api_key or os.getenv('OPENAI_API_KEY')
67
+ self.session = requests.Session()
68
+ self.session.headers.update({
69
+ 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
70
+ })
71
+
72
+ def discover_conferences(self) -> List[ConferenceCandidate]:
73
+ """Main method to discover conferences from multiple sources"""
74
+ candidates = []
75
+
76
+ print("🔍 Starting AI-powered conference discovery...")
77
+
78
+ # Source 1: WikiCFP
79
+ print("📊 Scraping WikiCFP...")
80
+ wikicfp_candidates = self._scrape_wikicfp()
81
+ candidates.extend(wikicfp_candidates)
82
+
83
+ # Source 2: AI Conference deadlines websites
84
+ print("🌐 Scraping popular AI deadline trackers...")
85
+ deadline_sites_candidates = self._scrape_deadline_sites()
86
+ candidates.extend(deadline_sites_candidates)
87
+
88
+ # Source 3: University AI department pages
89
+ print("🎓 Checking university AI department pages...")
90
+ university_candidates = self._scrape_university_pages()
91
+ candidates.extend(university_candidates)
92
+
93
+ # Use AI to enhance and categorize candidates
94
+ print("🤖 Using AI to analyze and categorize conferences...")
95
+ enhanced_candidates = self._ai_enhance_candidates(candidates)
96
+
97
+ # Filter and validate
98
+ print("✅ Filtering and validating candidates...")
99
+ valid_candidates = self._filter_candidates(enhanced_candidates)
100
+
101
+ print(f"🎉 Discovered {len(valid_candidates)} potential new conferences")
102
+ return valid_candidates
103
+
104
+ def _scrape_wikicfp(self) -> List[ConferenceCandidate]:
105
+ """Scrape WikiCFP for conference information"""
106
+ candidates = []
107
+ base_url = "http://www.wikicfp.com/cfp/"
108
+
109
+ # Search for conferences in our target categories
110
+ for category, keywords in TARGET_CATEGORIES.items():
111
+ for keyword in keywords[:2]: # Limit to avoid overwhelming
112
+ try:
113
+ search_url = f"{base_url}servlet/tool.search?q={keyword.replace(' ', '+')}&year=f"
114
+ response = self._safe_request(search_url)
115
+ if not response:
116
+ continue
117
+
118
+ soup = BeautifulSoup(response.text, 'html.parser')
119
+ conferences = self._parse_wikicfp_results(soup, category)
120
+ candidates.extend(conferences)
121
+
122
+ time.sleep(1) # Be respectful
123
+ except Exception as e:
124
+ print(f"Error scraping WikiCFP for {keyword}: {e}")
125
+
126
+ return candidates
127
+
128
+ def _parse_wikicfp_results(self, soup: BeautifulSoup, category: str) -> List[ConferenceCandidate]:
129
+ """Parse WikiCFP search results"""
130
+ candidates = []
131
+
132
+ # WikiCFP results are typically in tables
133
+ for row in soup.find_all('tr')[1:10]: # Skip header, limit results
134
+ cells = row.find_all('td')
135
+ if len(cells) >= 4:
136
+ try:
137
+ title_cell = cells[0]
138
+ deadline_cell = cells[1]
139
+ location_cell = cells[2]
140
+
141
+ title_link = title_cell.find('a')
142
+ if title_link:
143
+ title = title_link.get_text(strip=True)
144
+ url = urljoin("http://www.wikicfp.com/cfp/", title_link.get('href', ''))
145
+
146
+ candidate = ConferenceCandidate(
147
+ title=title,
148
+ url=url,
149
+ deadline=deadline_cell.get_text(strip=True),
150
+ location=location_cell.get_text(strip=True),
151
+ tags=[category],
152
+ source="WikiCFP"
153
+ )
154
+
155
+ # Extract more details from the conference page
156
+ self._enhance_from_wikicfp_page(candidate)
157
+ candidates.append(candidate)
158
+
159
+ except Exception as e:
160
+ print(f"Error parsing WikiCFP row: {e}")
161
+ continue
162
+
163
+ return candidates
164
+
165
+ def _enhance_from_wikicfp_page(self, candidate: ConferenceCandidate):
166
+ """Extract additional details from individual WikiCFP conference pages"""
167
+ try:
168
+ response = self._safe_request(candidate.url)
169
+ if not response:
170
+ return
171
+
172
+ soup = BeautifulSoup(response.text, 'html.parser')
173
+
174
+ # Extract conference details
175
+ content = soup.find('div', class_='cfp') or soup.find('table')
176
+ if content:
177
+ text = content.get_text()
178
+
179
+ # Extract conference dates
180
+ date_pattern = r'Conference[:\s]*([A-Za-z]+ \d{1,2}[-–—]\d{1,2}, \d{4})'
181
+ date_match = re.search(date_pattern, text)
182
+ if date_match:
183
+ candidate.conference_date = date_match.group(1)
184
+
185
+ # Extract abstract deadline
186
+ abstract_pattern = r'Abstract[:\s]*([A-Za-z]+ \d{1,2}, \d{4})'
187
+ abstract_match = re.search(abstract_pattern, text)
188
+ if abstract_match:
189
+ candidate.abstract_deadline = abstract_match.group(1)
190
+
191
+ # Extract location details
192
+ location_pattern = r'Location[:\s]*([^.\n]+)'
193
+ location_match = re.search(location_pattern, text)
194
+ if location_match:
195
+ candidate.location = location_match.group(1).strip()
196
+
197
+ except Exception as e:
198
+ print(f"Error enhancing WikiCFP page {candidate.url}: {e}")
199
+
200
+ def _scrape_deadline_sites(self) -> List[ConferenceCandidate]:
201
+ """Scrape popular AI deadline tracking websites"""
202
+ candidates = []
203
+
204
+ # Popular deadline tracking sites
205
+ sites = [
206
+ "https://aideadlin.es/",
207
+ "https://jackietseng.github.io/conference_call_for_paper/conferences.html"
208
+ ]
209
+
210
+ for site_url in sites:
211
+ try:
212
+ response = self._safe_request(site_url)
213
+ if response:
214
+ soup = BeautifulSoup(response.text, 'html.parser')
215
+ site_candidates = self._parse_deadline_site(soup, site_url)
216
+ candidates.extend(site_candidates)
217
+ except Exception as e:
218
+ print(f"Error scraping {site_url}: {e}")
219
+
220
+ return candidates
221
+
222
+ def _parse_deadline_site(self, soup: BeautifulSoup, source_url: str) -> List[ConferenceCandidate]:
223
+ """Parse deadline tracking websites for conference info"""
224
+ candidates = []
225
+
226
+ # Look for conference entries (this will vary by site structure)
227
+ conf_elements = (soup.find_all('div', class_='conf') +
228
+ soup.find_all('tr') +
229
+ soup.find_all('li'))
230
+
231
+ for element in conf_elements[:20]: # Limit results
232
+ try:
233
+ text = element.get_text(strip=True)
234
+ if len(text) > 10 and any(keyword in text.lower() for keywords in TARGET_CATEGORIES.values() for keyword in keywords):
235
+
236
+ # Extract conference name and deadline
237
+ title_match = re.search(r'([A-Z]{2,}[\w\s]*\d{4})', text)
238
+ deadline_match = re.search(r'(\w+ \d{1,2}, \d{4})', text)
239
+
240
+ if title_match:
241
+ candidate = ConferenceCandidate(
242
+ title=title_match.group(1),
243
+ deadline=deadline_match.group(1) if deadline_match else "",
244
+ source=f"DeadlineTracker-{urlparse(source_url).netloc}",
245
+ description=text[:200]
246
+ )
247
+ candidates.append(candidate)
248
+
249
+ except Exception as e:
250
+ continue
251
+
252
+ return candidates
253
+
254
+ def _scrape_university_pages(self) -> List[ConferenceCandidate]:
255
+ """Scrape university AI department pages for conference announcements"""
256
+ candidates = []
257
+
258
+ # Major AI research institutions
259
+ university_urls = [
260
+ "https://www.cs.stanford.edu/news/",
261
+ "https://www.csail.mit.edu/news",
262
+ "https://ai.berkeley.edu/news/",
263
+ "https://www.cs.cmu.edu/news"
264
+ ]
265
+
266
+ for url in university_urls:
267
+ try:
268
+ response = self._safe_request(url)
269
+ if response:
270
+ soup = BeautifulSoup(response.text, 'html.parser')
271
+ # Look for conference-related announcements
272
+ links = soup.find_all('a', href=True)
273
+ for link in links[:10]:
274
+ link_text = link.get_text(strip=True).lower()
275
+ if ('conference' in link_text or 'cfp' in link_text or
276
+ 'call for papers' in link_text):
277
+ # This is a potential conference announcement
278
+ # You would extract more details here
279
+ pass
280
+ except Exception as e:
281
+ print(f"Error scraping {url}: {e}")
282
+
283
+ return candidates
284
+
285
+ def _ai_enhance_candidates(self, candidates: List[ConferenceCandidate]) -> List[ConferenceCandidate]:
286
+ """Use AI to enhance and categorize conference candidates"""
287
+ if not self.openai_api_key:
288
+ print("⚠️ No OpenAI API key found. Skipping AI enhancement.")
289
+ return candidates
290
+
291
+ enhanced = []
292
+
293
+ try:
294
+ import openai
295
+ openai.api_key = self.openai_api_key
296
+
297
+ for candidate in candidates:
298
+ try:
299
+ # Create a prompt for the AI to analyze the conference
300
+ prompt = f"""
301
+ Analyze this conference information and provide structured data:
302
+
303
+ Title: {candidate.title}
304
+ Description: {candidate.description}
305
+ Location: {candidate.location}
306
+ Current Tags: {candidate.tags}
307
+
308
+ Please provide:
309
+ 1. Most appropriate categories from: {list(TARGET_CATEGORIES.keys())}
310
+ 2. Confidence score (0-1) that this is a legitimate AI/CS conference
311
+ 3. Standardized full conference name
312
+ 4. Extracted city and country from location
313
+ 5. Year (if determinable)
314
+
315
+ Respond in JSON format only.
316
+ """
317
+
318
+ response = openai.ChatCompletion.create(
319
+ model="gpt-3.5-turbo",
320
+ messages=[{"role": "user", "content": prompt}],
321
+ max_tokens=300,
322
+ temperature=0.1
323
+ )
324
+
325
+ ai_analysis = json.loads(response.choices[0].message.content)
326
+
327
+ # Update candidate with AI insights
328
+ candidate.tags = ai_analysis.get('categories', candidate.tags)
329
+ candidate.confidence_score = ai_analysis.get('confidence_score', 0.5)
330
+ candidate.full_name = ai_analysis.get('full_name', candidate.title)
331
+ candidate.city = ai_analysis.get('city', candidate.city)
332
+ candidate.country = ai_analysis.get('country', candidate.country)
333
+ candidate.year = ai_analysis.get('year', candidate.year)
334
+
335
+ enhanced.append(candidate)
336
+
337
+ time.sleep(0.5) # Rate limiting
338
+
339
+ except Exception as e:
340
+ print(f"Error in AI analysis for {candidate.title}: {e}")
341
+ enhanced.append(candidate) # Add without enhancement
342
+
343
+ except ImportError:
344
+ print("OpenAI package not available. Install with: pip install openai")
345
+ return candidates
346
+
347
+ return enhanced
348
+
349
+ def _filter_candidates(self, candidates: List[ConferenceCandidate]) -> List[ConferenceCandidate]:
350
+ """Filter and validate conference candidates"""
351
+ current_year = datetime.now().year
352
+ next_year = current_year + 1
353
+
354
+ valid_candidates = []
355
+
356
+ for candidate in candidates:
357
+ # Basic validation criteria
358
+ if (candidate.confidence_score >= 0.6 and # AI confidence threshold
359
+ len(candidate.title) >= 3 and # Reasonable title length
360
+ candidate.tags and # Has categories
361
+ any(year in [current_year, next_year] for year in [candidate.year]) and # Current/next year
362
+ candidate.title not in [existing['title'] for existing in self._load_existing_conferences()]): # Not duplicate
363
+
364
+ valid_candidates.append(candidate)
365
+
366
+ return valid_candidates
367
+
368
+ def _load_existing_conferences(self) -> List[Dict]:
369
+ """Load existing conferences to avoid duplicates"""
370
+ try:
371
+ with open('src/data/conferences.yml', 'r') as f:
372
+ return yaml.safe_load(f) or []
373
+ except FileNotFoundError:
374
+ return []
375
+
376
+ def _safe_request(self, url: str, timeout: int = 10) -> Optional[requests.Response]:
377
+ """Make a safe HTTP request with error handling"""
378
+ try:
379
+ response = self.session.get(url, timeout=timeout)
380
+ response.raise_for_status()
381
+ return response
382
+ except Exception as e:
383
+ print(f"Request failed for {url}: {e}")
384
+ return None
385
+
386
+ def add_to_conferences_yml(self, candidates: List[ConferenceCandidate]) -> int:
387
+ """Add validated candidates to conferences.yml"""
388
+ if not candidates:
389
+ return 0
390
+
391
+ # Load existing conferences
392
+ existing_conferences = self._load_existing_conferences()
393
+
394
+ added_count = 0
395
+ for candidate in candidates:
396
+ # Convert to conference format
397
+ conference_entry = {
398
+ 'title': candidate.title,
399
+ 'year': candidate.year or datetime.now().year + 1,
400
+ 'id': self._generate_conference_id(candidate.title, candidate.year),
401
+ 'full_name': candidate.full_name or candidate.title,
402
+ 'link': candidate.url,
403
+ 'deadline': self._parse_deadline(candidate.deadline),
404
+ 'timezone': 'AoE', # Default timezone
405
+ 'date': candidate.conference_date,
406
+ 'tags': candidate.tags,
407
+ 'city': candidate.city,
408
+ 'country': candidate.country,
409
+ 'note': f'Auto-discovered from {candidate.source}. Please verify details.'
410
+ }
411
+
412
+ # Add abstract deadline if available
413
+ if candidate.abstract_deadline:
414
+ conference_entry['abstract_deadline'] = self._parse_deadline(candidate.abstract_deadline)
415
+
416
+ existing_conferences.append(conference_entry)
417
+ added_count += 1
418
+
419
+ # Sort conferences by deadline
420
+ existing_conferences.sort(key=lambda x: x.get('deadline', '9999'))
421
+
422
+ # Write back to file
423
+ with open('src/data/conferences.yml', 'w') as f:
424
+ yaml.dump(existing_conferences, f, default_flow_style=False, sort_keys=False)
425
+
426
+ return added_count
427
+
428
+ def _generate_conference_id(self, title: str, year: int) -> str:
429
+ """Generate a unique conference ID"""
430
+ # Extract acronym or use first few letters
431
+ words = title.split()
432
+ if len(words) > 1:
433
+ acronym = ''.join([word[0].lower() for word in words if word[0].isupper()])
434
+ if len(acronym) >= 2:
435
+ return f"{acronym}{str(year)[-2:]}"
436
+
437
+ # Fallback to first few letters + year
438
+ clean_title = re.sub(r'[^a-zA-Z0-9]', '', title.lower())
439
+ return f"{clean_title[:6]}{str(year)[-2:]}"
440
+
441
+ def _parse_deadline(self, deadline_str: str) -> str:
442
+ """Parse deadline string into standardized format"""
443
+ if not deadline_str:
444
+ return ""
445
+
446
+ try:
447
+ # Try to parse various deadline formats
448
+ deadline_patterns = [
449
+ r'(\w+ \d{1,2}, \d{4})',
450
+ r'(\d{4}-\d{2}-\d{2})',
451
+ r'(\d{1,2}/\d{1,2}/\d{4})'
452
+ ]
453
+
454
+ for pattern in deadline_patterns:
455
+ match = re.search(pattern, deadline_str)
456
+ if match:
457
+ date_str = match.group(1)
458
+ # Convert to standardized format (YYYY-MM-DD HH:MM:SS)
459
+ try:
460
+ parsed_date = datetime.strptime(date_str, "%B %d, %Y")
461
+ return parsed_date.strftime("%Y-%m-%d 23:59:59")
462
+ except ValueError:
463
+ try:
464
+ parsed_date = datetime.strptime(date_str, "%Y-%m-%d")
465
+ return parsed_date.strftime("%Y-%m-%d 23:59:59")
466
+ except ValueError:
467
+ continue
468
+
469
+ return deadline_str # Return as-is if parsing fails
470
+
471
+ except Exception:
472
+ return deadline_str
473
+
474
+ def main():
475
+ """Main function to run conference discovery"""
476
+ print("🚀 Starting AI-Powered Conference Discovery System")
477
+
478
+ # Initialize the discovery engine
479
+ engine = ConferenceDiscoveryEngine()
480
+
481
+ # Discover conferences
482
+ candidates = engine.discover_conferences()
483
+
484
+ if candidates:
485
+ print(f"\n📋 Found {len(candidates)} potential conferences:")
486
+ for candidate in candidates:
487
+ print(f" • {candidate.title} ({candidate.confidence_score:.2f} confidence) - {candidate.tags}")
488
+
489
+ # Add to conferences.yml
490
+ added_count = engine.add_to_conferences_yml(candidates)
491
+ print(f"\n✅ Added {added_count} new conferences to conferences.yml")
492
+ else:
493
+ print("❌ No new conferences discovered")
494
+
495
+ if __name__ == "__main__":
496
+ main()
.github/scripts/ai_config.yml ADDED
@@ -0,0 +1,149 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # AI Conference Discovery Configuration
2
+
3
+ # Target categories and their associated keywords for discovery
4
+ target_categories:
5
+ machine-learning:
6
+ - "machine learning"
7
+ - "ML"
8
+ - "artificial intelligence"
9
+ - "AI"
10
+ - "deep learning"
11
+ - "neural networks"
12
+
13
+ lifelong-learning:
14
+ - "lifelong learning"
15
+ - "continual learning"
16
+ - "incremental learning"
17
+ - "online learning"
18
+
19
+ robotics:
20
+ - "robotics"
21
+ - "autonomous systems"
22
+ - "robot"
23
+ - "automation"
24
+ - "mechatronics"
25
+
26
+ computer-vision:
27
+ - "computer vision"
28
+ - "CV"
29
+ - "image processing"
30
+ - "visual recognition"
31
+ - "pattern recognition"
32
+
33
+ web-search:
34
+ - "web search"
35
+ - "information retrieval"
36
+ - "search engines"
37
+ - "IR"
38
+
39
+ data-mining:
40
+ - "data mining"
41
+ - "knowledge discovery"
42
+ - "big data analytics"
43
+ - "data science"
44
+
45
+ natural-language-processing:
46
+ - "natural language processing"
47
+ - "NLP"
48
+ - "computational linguistics"
49
+ - "text mining"
50
+ - "language models"
51
+
52
+ signal-processing:
53
+ - "signal processing"
54
+ - "DSP"
55
+ - "audio processing"
56
+ - "speech"
57
+ - "multimedia"
58
+
59
+ human-computer-interaction:
60
+ - "HCI"
61
+ - "human computer interaction"
62
+ - "user interface"
63
+ - "UX"
64
+ - "usability"
65
+
66
+ computer-graphics:
67
+ - "computer graphics"
68
+ - "visualization"
69
+ - "rendering"
70
+ - "3D"
71
+ - "virtual reality"
72
+
73
+ mathematics:
74
+ - "mathematics"
75
+ - "mathematical optimization"
76
+ - "numerical methods"
77
+ - "statistics"
78
+
79
+ reinforcement-learning:
80
+ - "reinforcement learning"
81
+ - "RL"
82
+ - "deep RL"
83
+ - "multi-agent"
84
+ - "Q-learning"
85
+
86
+ # Discovery sources configuration
87
+ sources:
88
+ wikicfp:
89
+ enabled: true
90
+ base_url: "http://www.wikicfp.com/cfp/"
91
+ max_results_per_keyword: 10
92
+ request_delay: 1 # seconds between requests
93
+
94
+ deadline_trackers:
95
+ enabled: true
96
+ urls:
97
+ - "https://aideadlin.es/"
98
+ - "https://jackietseng.github.io/conference_call_for_paper/conferences.html"
99
+ max_results_per_site: 20
100
+
101
+ university_pages:
102
+ enabled: false # Can be resource intensive
103
+ urls:
104
+ - "https://www.cs.stanford.edu/news/"
105
+ - "https://www.csail.mit.edu/news"
106
+ - "https://ai.berkeley.edu/news/"
107
+ - "https://www.cs.cmu.edu/news"
108
+
109
+ # AI enhancement configuration
110
+ ai_enhancement:
111
+ enabled: true
112
+ model: "gpt-3.5-turbo"
113
+ confidence_threshold: 0.6 # Minimum confidence to include conference
114
+ max_tokens: 300
115
+ temperature: 0.1
116
+ rate_limit_delay: 0.5 # seconds between API calls
117
+
118
+ # Filtering criteria
119
+ filtering:
120
+ min_title_length: 3
121
+ max_description_length: 500
122
+ years_ahead: 2 # How many years in the future to consider
123
+ exclude_patterns:
124
+ - "workshop"
125
+ - "symposium on XYZ" # Add patterns to exclude
126
+
127
+ required_fields:
128
+ - "title"
129
+ - "tags"
130
+
131
+ # Output configuration
132
+ output:
133
+ auto_add_to_yml: true
134
+ backup_before_changes: true
135
+ sort_by_deadline: true
136
+ add_discovery_note: true
137
+
138
+ # Rate limiting and safety
139
+ rate_limiting:
140
+ max_requests_per_minute: 30
141
+ request_timeout: 10 # seconds
142
+ max_retries: 3
143
+ backoff_factor: 2
144
+
145
+ # Logging
146
+ logging:
147
+ level: "INFO"
148
+ include_timestamps: true
149
+ log_api_calls: false # Set to true for debugging
.github/scripts/requirements.txt ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Core dependencies for conference automation
2
+ pyyaml>=6.0
3
+ requests>=2.25.0
4
+
5
+ # Web scraping dependencies
6
+ beautifulsoup4>=4.9.0
7
+ lxml>=4.6.0
8
+
9
+ # AI/ML dependencies
10
+ openai>=0.27.0
11
+
12
+ # Data processing
13
+ python-dateutil>=2.8.0
14
+
15
+ # Optional: For more advanced NLP tasks
16
+ # nltk>=3.6.0
17
+ # spacy>=3.4.0
.github/workflows/update-conferences.yml CHANGED
@@ -5,6 +5,8 @@ permissions:
5
 
6
  on:
7
  workflow_dispatch: # Allow manual trigger
 
 
8
  pull_request:
9
  paths:
10
  - 'src/data/conferences.yml'
@@ -24,11 +26,17 @@ jobs:
24
  - name: Install dependencies
25
  run: |
26
  python -m pip install --upgrade pip
27
- pip install pyyaml requests
28
 
29
- - name: Update conferences
30
  run: python .github/scripts/update_conferences.py
31
 
 
 
 
 
 
 
32
  - name: Check for changes
33
  id: git-check
34
  run: |
@@ -38,10 +46,12 @@ jobs:
38
  if: steps.git-check.outputs.changes == 'true'
39
  uses: peter-evans/create-pull-request@v5
40
  with:
41
- commit-message: 'chore: update conference data from ccfddl'
42
- title: 'Update conference data from ccfddl'
43
  body: |
44
- This PR updates the conference data from the ccfddl repository.
 
 
45
 
46
  Auto-generated by GitHub Actions.
47
  branch: update-conferences
 
5
 
6
  on:
7
  workflow_dispatch: # Allow manual trigger
8
+ schedule:
9
+ - cron: '0 6 * * 1' # Run every Monday at 6 AM UTC
10
  pull_request:
11
  paths:
12
  - 'src/data/conferences.yml'
 
26
  - name: Install dependencies
27
  run: |
28
  python -m pip install --upgrade pip
29
+ pip install -r .github/scripts/requirements.txt
30
 
31
+ - name: Update conferences from ccfddl
32
  run: python .github/scripts/update_conferences.py
33
 
34
+ - name: AI-powered conference discovery
35
+ env:
36
+ OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
37
+ run: python .github/scripts/ai_conference_discovery.py
38
+ continue-on-error: true # Don't fail the workflow if AI discovery fails
39
+
40
  - name: Check for changes
41
  id: git-check
42
  run: |
 
46
  if: steps.git-check.outputs.changes == 'true'
47
  uses: peter-evans/create-pull-request@v5
48
  with:
49
+ commit-message: 'chore: update conference data from ccfddl and AI discovery'
50
+ title: 'Update conference data (ccfddl + AI discovery)'
51
  body: |
52
+ This PR updates the conference data from multiple sources:
53
+ - Updates from ccfddl repository
54
+ - New conferences discovered via AI-powered web scraping
55
 
56
  Auto-generated by GitHub Actions.
57
  branch: update-conferences