milwright committed on
Commit 7b3ba65 · 1 Parent(s): eeb2654

Add filtering for excessive dashes in passage extraction

- Detect sequences of 3+ consecutive dashes/hyphens
- Calculate dash ratio relative to total words
- Add quality scoring penalties for excessive dash usage
- Reject passages with dash sequences or >2% dash ratio
- Prevents selection of passages with formatting separators
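
The dash checks described above can be tried in isolation with a minimal sketch like the one below. It copies the two regular expressions and both penalty weights from the `src/clozeGameEngine.js` hunk in this commit. The rest is assumed: `dashPenalty` is a hypothetical helper name, the whitespace-based word count stands in for the engine's own `totalWords` (not shown in the diff), and the `Math.max(1, ...)` guard is an addition.

```javascript
// Hypothetical standalone helper; the commit inlines this logic in ClozeGame.
function dashPenalty(passage) {
  // Assumption: words are whitespace-separated tokens (the engine's own
  // totalWords computation is not visible in this diff).
  const totalWords = passage.split(/\s+/).filter(Boolean).length;

  // Same patterns as the commit: runs of 3+ hyphens/em dashes/en dashes,
  // and every individual dash character.
  const dashSequences = (passage.match(/[-—–]{3,}/g) || []).length;
  const totalDashes = (passage.match(/[-—–]/g) || []).length;

  // Guard against empty input (an addition, not part of the commit).
  const dashRatio = totalDashes / Math.max(1, totalWords);

  let penalty = 0;
  const issues = [];
  if (dashSequences > 0) {
    penalty += dashSequences * 3;
    issues.push(`dash-sequences: ${dashSequences}`);
  }
  if (dashRatio > 0.02) {
    penalty += dashRatio * 25;
    issues.push(`dashes: ${Math.round(dashRatio * 100)}%`);
  }
  return { dashSequences, totalDashes, dashRatio, penalty, issues };
}
```

Judging only by the lines visible in the diff, one dash sequence adds exactly 3, and the rejection test is `qualityScore > 3`. A long passage containing a single `---` and no other issue would therefore sit at the threshold rather than above it, and would be rejected only if the ratio term or another penalty also applies.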

Files changed (2)
  1. LEADERBOARD_ROADMAP.md +171 -0
  2. src/clozeGameEngine.js +7 -0
LEADERBOARD_ROADMAP.md ADDED
@@ -0,0 +1,171 @@
+ # Cloze Reader Leaderboard Implementation Roadmap
+
+ ## Overview
+ This document outlines the implementation plan for adding a competitive leaderboard system to the Cloze Reader game, where players can submit their scores using 3-letter acronyms.
+
+ ## Phase 1: Core Infrastructure (Week 1-2)
+
+ ### 1.1 Database Schema
+ - Create leaderboard table structure:
+ ```sql
+ leaderboard {
+ id: UUID
+ acronym: VARCHAR(3)
+ score: INTEGER
+ level_reached: INTEGER
+ total_time: INTEGER (seconds)
+ created_at: TIMESTAMP
+ ip_hash: VARCHAR(64) // For rate limiting
+ }
+ ```
+
+ ### 1.2 API Endpoints
+ - `POST /api/leaderboard/submit` - Submit new score
+ - `GET /api/leaderboard/top/{period}` - Get top scores (daily/weekly/all-time)
+ - `GET /api/leaderboard/check-acronym/{acronym}` - Validate acronym availability
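
The selection behind `GET /api/leaderboard/top/{period}` might look like the following sketch. It assumes rolling 24-hour and 7-day windows (the roadmap leaves reset frequency as an open question) and uses an in-memory array in place of the database. `topScores`, `PERIOD_MS`, and the millisecond `createdAt` field are hypothetical names, not part of the roadmap.

```javascript
// Assumed rolling windows; "all-time" never expires.
const PERIOD_MS = {
  daily: 24 * 60 * 60 * 1000,
  weekly: 7 * 24 * 60 * 60 * 1000,
  "all-time": Infinity,
};

// Return the highest-scoring entries created within the requested period.
// `entries` is an array of { acronym, score, createdAt } objects.
function topScores(entries, period, limit = 10, now = Date.now()) {
  const windowMs = PERIOD_MS[period];
  if (windowMs === undefined) {
    throw new RangeError(`unknown period: ${period}`);
  }
  return entries
    .filter((entry) => now - entry.createdAt <= windowMs) // filter() copies, so the input is not reordered
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Passing `now` explicitly keeps the function deterministic in tests. A real endpoint would more likely push the filter, sort, and limit into the database query.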
+
+ ### 1.3 Score Calculation
+ - Base score = (correct_answers * 100) * level_multiplier
+ - Time bonus = max(0, 1000 - seconds_per_round)
+ - Streak bonus = consecutive_correct * 50
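
The three formulas above translate directly into a small function. This is a sketch: the roadmap does not say how the components combine or how `level_multiplier` is derived, so the version below assumes the total is a plain sum and takes the multiplier as an input. `calculateScore` and the camelCase parameter names are hypothetical.

```javascript
// Sketch of the roadmap's scoring rules.
function calculateScore({ correctAnswers, levelMultiplier, secondsPerRound, consecutiveCorrect }) {
  const base = correctAnswers * 100 * levelMultiplier;
  const timeBonus = Math.max(0, 1000 - secondsPerRound); // never negative
  const streakBonus = consecutiveCorrect * 50;
  // Assumption: the roadmap does not state how the parts combine; summing them here.
  return { base, timeBonus, streakBonus, total: base + timeBonus + streakBonus };
}
```

With 8 correct answers, a multiplier of 2, 40 seconds per round, and a streak of 5, the components are 1600, 960, and 250, for a total of 2810.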
+
+ ## Phase 2: Frontend Integration (Week 2-3)
+
+ ### 2.1 UI Components
+ - **Leaderboard Modal** (`leaderboardModal.js`)
+ - Top 10 display with rank, acronym, score, level
+ - Period toggle (Today/Week/All-Time)
+ - Personal best highlight
+
+ ### 2.2 Score Submission Flow
+ - End-of-game prompt for acronym entry
+ - 3-letter validation (A-Z only)
+ - Profanity filter implementation
+ - Success/error feedback
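
A possible shape for the acronym check in this flow is sketched below. It assumes lowercase input is upper-cased before validation and that the profanity filter is a simple blocklist supplied by the caller. Neither choice is specified in the roadmap, and `validateAcronym` is a hypothetical name.

```javascript
// Validate a 3-letter, A-Z-only acronym and check it against a blocklist.
function validateAcronym(input, blocklist = new Set()) {
  if (typeof input !== "string") {
    return { ok: false, reason: "not-a-string" };
  }
  // Assumption: normalise case and surrounding whitespace before checking.
  const acronym = input.trim().toUpperCase();
  if (!/^[A-Z]{3}$/.test(acronym)) {
    return { ok: false, reason: "format" };
  }
  if (blocklist.has(acronym)) {
    return { ok: false, reason: "blocked" };
  }
  return { ok: true, acronym };
}
```

Returning a reason code rather than a boolean lets the UI map each failure to its own error message.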
+
+ ### 2.3 Visual Elements
+ - Trophy icons for top 3 positions
+ - Animated score counter
+ - Level badges display
+
+ ## Phase 3: Security & Performance (Week 3-4)
+
+ ### 3.1 Anti-Cheat Measures
+ - Server-side score validation
+ - Rate limiting (1 submission per 5 minutes per IP)
+ - Score feasibility checks (max possible score per level)
+ - Request signing with session tokens
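
The "1 submission per 5 minutes per IP" rule could be prototyped as below. This is an in-memory sketch only: a deployed version would presumably keep this state in Redis or enforce it at the proxy. The key is treated as an opaque string, such as the `ip_hash` value from the schema. `createRateLimiter` and `allow` are hypothetical names.

```javascript
// Allow at most one accepted submission per key within each window.
function createRateLimiter(windowMs = 5 * 60 * 1000) {
  const lastAccepted = new Map(); // key -> timestamp (ms) of last accepted submission

  return function allow(key, now = Date.now()) {
    const previous = lastAccepted.get(key);
    if (previous !== undefined && now - previous < windowMs) {
      return false; // still inside the window; rejected attempts do not extend it
    }
    lastAccepted.set(key, now);
    return true;
  };
}
```

Only accepted submissions update the timestamp, so a client that keeps retrying is not locked out for longer than the original window.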
+
+ ### 3.2 Caching Strategy
+ - Redis cache for top 100 scores
+ - 5-minute TTL for leaderboard queries
+ - Real-time updates for top 10 changes
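
The 5-minute TTL behaviour can be illustrated without Redis using a small in-memory stand-in. The sketch below shows only the expiry semantics. It makes no claim about how the Redis integration would be written, and `createTtlCache` is a hypothetical name.

```javascript
// Minimal TTL cache: values expire ttlMs after they are stored.
function createTtlCache(ttlMs = 5 * 60 * 1000) {
  const entries = new Map(); // key -> { value, expiresAt }

  return {
    get(key, now = Date.now()) {
      const hit = entries.get(key);
      if (!hit) return undefined;
      if (now >= hit.expiresAt) {
        entries.delete(key); // drop stale entries lazily on read
        return undefined;
      }
      return hit.value;
    },
    set(key, value, now = Date.now()) {
      entries.set(key, { value, expiresAt: now + ttlMs });
    },
  };
}
```

A leaderboard handler would call `get` first and fall back to the database on a miss, storing the fresh result with `set`.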
+
+ ### 3.3 Data Persistence
+ - PostgreSQL for primary storage
+ - Daily backups of leaderboard data
+ - Archived monthly snapshots
+
+ ## Phase 4: Advanced Features (Week 4-5)
+
+ ### 4.1 Achievement System
+ - "First Timer" - First leaderboard entry
+ - "Vocabulary Master" - 10+ correct in a row
+ - "Speed Reader" - Complete round < 30 seconds
+ - "Persistent Scholar" - Play 7 days straight
+
+ ### 4.2 Social Features
+ - Share score to social media
+ - Challenge link generation
+ - Friend acronym tracking
+
+ ### 4.3 Analytics Dashboard
+ - Player retention metrics
+ - Popular acronym analysis
+ - Score distribution graphs
+
+ ## Technical Implementation Details
+
+ ### Backend Changes Required
+
+ 1. **FastAPI Endpoints** (`app.py`):
+ ```python
+ @app.post("/api/leaderboard/submit")
+ async def submit_score(score_data: ScoreSubmission)
+
+ @app.get("/api/leaderboard/top/{period}")
+ async def get_leaderboard(period: str, limit: int = 10)
+ ```
+
+ 2. **Database Models** (`models.py` - new file):
+ ```python
+ class LeaderboardEntry(Base):
+ __tablename__ = "leaderboard"
+ # Schema implementation
+ ```
+
+ 3. **Validation Service** (`validation.py` - new file):
+ - Acronym format validation
+ - Profanity checking
+ - Score feasibility verification
+
+ ### Frontend Changes Required
+
+ 1. **Game Engine Integration** (`clozeGameEngine.js`):
+ - Track game metrics for scoring
+ - Call submission API on game end
+ - Store session data for validation
+
+ 2. **UI Updates** (`app.js`):
+ - Add leaderboard button to main menu
+ - Integrate submission modal
+ - Handle API responses
+
+ 3. **New Modules**:
+ - `leaderboardService.js` - API communication
+ - `scoreCalculator.js` - Client-side scoring logic
+ - `leaderboardUI.js` - UI component management
+
+ ## Deployment Considerations
+
+ ### Infrastructure Requirements
+ - Database: PostgreSQL 14+
+ - Cache: Redis 6+
+ - API rate limiting: nginx or API Gateway
+ - SSL certificate for secure submissions
+
+ ### Environment Variables
+ ```
+ DATABASE_URL=postgresql://...
+ REDIS_URL=redis://...
+ LEADERBOARD_SECRET=... # For request signing
+ PROFANITY_API_KEY=... # Optional external service
+ ```
+
+ ### Migration Strategy
+ 1. Deploy database schema
+ 2. Enable API endpoints (feature flagged)
+ 3. Gradual UI rollout (A/B testing)
+ 4. Full launch with announcement
+
+ ## Success Metrics
+
+ - **Engagement**: 30% of players submit scores
+ - **Retention**: 15% return to beat their score
+ - **Performance**: <100ms leaderboard load time
+ - **Security**: Zero validated cheating incidents
+
+ ## Timeline Summary
+
+ - **Week 1-2**: Backend infrastructure
+ - **Week 2-3**: Frontend integration
+ - **Week 3-4**: Security hardening
+ - **Week 4-5**: Advanced features
+ - **Week 6**: Testing & deployment
+
+ ## Open Questions
+
+ 1. Should we allow Unicode characters in acronyms?
+ 2. Reset frequency for periodic leaderboards?
+ 3. Maximum entries per player per day?
+ 4. Prize/reward system for top performers?
src/clozeGameEngine.js CHANGED
@@ -164,6 +164,10 @@ class ClozeGame {
  const sentenceList = passage.split(/[.!?]+/).filter(s => s.trim().length > 10);
  const lines = passage.split('\n').filter(l => l.trim());
 
+ // Count excessive dashes (n-dashes, m-dashes, hyphens in sequence)
+ const dashSequences = (passage.match(/[-—–]{3,}/g) || []).length;
+ const totalDashes = (passage.match(/[-—–]/g) || []).length;
+
  // Check for repetitive patterns (common in indexes/TOCs)
  const repeatedPhrases = ['CONTENTS', 'CHAPTER', 'Volume', 'Vol.', 'Part', 'Book'];
  const repetitionCount = repeatedPhrases.reduce((count, phrase) =>
@@ -182,6 +186,7 @@ class ClozeGame {
  const avgWordsPerSentence = totalWords / Math.max(1, sentenceList.length);
  const repetitionRatio = repetitionCount / totalWords;
  const titleLineRatio = titleLines / Math.max(1, lines.length);
+ const dashRatio = totalDashes / totalWords;
 
  // Stricter thresholds for higher levels
  const capsThreshold = this.currentLevel >= 3 ? 0.03 : 0.05;
@@ -198,6 +203,8 @@ class ClozeGame {
  if (shortWordRatio < 0.3) { qualityScore += 2; issues.push(`short-words: ${Math.round(shortWordRatio * 100)}%`); }
  if (repetitionRatio > 0.02) { qualityScore += repetitionRatio * 50; issues.push(`repetitive: ${Math.round(repetitionRatio * 100)}%`); }
  if (titleLineRatio > 0.2) { qualityScore += 5; issues.push(`title-lines: ${Math.round(titleLineRatio * 100)}%`); }
+ if (dashSequences > 0) { qualityScore += dashSequences * 3; issues.push(`dash-sequences: ${dashSequences}`); }
+ if (dashRatio > 0.02) { qualityScore += dashRatio * 25; issues.push(`dashes: ${Math.round(dashRatio * 100)}%`); }
 
  // Reject if quality score indicates technical/non-narrative content
  if (qualityScore > 3) {