---
title: Gambling Comment Filter
emoji: 🎲
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
short_description: A PoC filter for detecting gambling-related comments
---

# Gambling Comment Filter
A proof-of-concept filter that detects comments promoting online gambling. It is built with FastAPI and designed for deployment on Hugging Face Spaces. To catch obfuscated gambling content, it normalizes Unicode text (via `unidecode` plus a custom visual-character mapping) and supports dynamic rule management.

## Features
- **Robust Text Normalization:** Converts fancy or obfuscated Unicode characters (bold, italic, fullwidth, Cyrillic/Greek lookalikes) into plain ASCII.
- **Dynamic Rule Management:** Add or update filtering rules (platform names, gambling terms, safe indicators, gambling contexts, ambiguous terms) on the fly using a web interface.
- **File Upload Support:** Process comments in bulk by uploading CSV or Excel files.
- **Score-Based Classification:** Uses a scoring algorithm to determine if a comment is gambling-related based on multiple signals.
- **Hugging Face Spaces Ready:** Includes a Dockerfile so the project can be deployed as a Docker-based Hugging Face Space.
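
The normalization step listed above can be illustrated with a minimal sketch. It is not the project's `_robust_normalize`: it uses the standard library's `unicodedata` (NFKC folding) instead of `unidecode`, and the `LOOKALIKES` table and the `normalize_sketch` name are hypothetical stand-ins for the real `VISUAL_MAP`, whose contents are not shown in this README.

```python
import unicodedata

# Hypothetical lookalike table (assumption): maps a few Cyrillic/Greek
# letters to the Latin letters they visually resemble.
LOOKALIKES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u03bf": "o",  # Greek small omicron
    "\u03b1": "a",  # Greek small alpha
}


def normalize_sketch(text: str) -> str:
    """Fold styled and lookalike characters toward plain lowercase ASCII."""
    # NFKC replaces compatibility forms (fullwidth, mathematical bold/italic)
    # with their base characters.
    folded = unicodedata.normalize("NFKC", text)
    # Lowercase first so a single table covers both cases of each lookalike.
    lowered = folded.lower()
    # Swap any remaining lookalike letters for their Latin counterparts.
    return "".join(LOOKALIKES.get(ch, ch) for ch in lowered)
```

For example, `normalize_sketch("ＳＬＯＴ")` (fullwidth letters) and `normalize_sketch("sl\u043et")` (Cyrillic "о" in the middle) both return `"slot"`, so a single plain-ASCII rule can match either variant.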

## Project Structure
```
gambling-comment-filter/
├── app.py                # Main FastAPI application with filtering logic and endpoints
├── requirements.txt      # Python dependencies
├── Dockerfile            # Docker configuration for deployment on Hugging Face Spaces
└── templates/
    └── index.html        # HTML template for the web interface
```

## Requirements
- Python 3.9+
- [FastAPI](https://fastapi.tiangolo.com/)
- [Uvicorn](https://www.uvicorn.org/)
- [Jinja2](https://palletsprojects.com/p/jinja/)
- [Pandas](https://pandas.pydata.org/)
- [openpyxl](https://openpyxl.readthedocs.io/en/stable/)
- [unidecode](https://pypi.org/project/Unidecode/)

## Setup and Local Testing
1. **Clone the Repository**
   ```bash
   git clone https://huggingface.co/spaces/ariansyahdedy/gambling-comment-filter
   cd gambling-comment-filter
   ```

2. **Create a Virtual Environment and Install Dependencies**
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows use: venv\Scripts\activate
   pip install -r requirements.txt
   ```

3. **Run the Application**
   ```bash
   uvicorn app:app --reload
   ```

4. **Access the Web Interface**
   Open your browser and visit http://localhost:8000

## Deployment on Hugging Face Spaces
1. **Create a New Space**
   Go to Hugging Face Spaces and create a new Space using the Docker runtime.

2. **Push Your Local Project to the Space**
   ```bash
   cd path/to/gambling-comment-filter
   git init  # if not already a git repo
   git add .
   git commit -m "Initial commit for Gambling Comment Filter"
   git remote add hf https://huggingface.co/spaces/ariansyahdedy/gambling-comment-filter  # replace with your own Space URL
   git push hf main
   ```
   The Space will automatically build and deploy your project.

## Customization
* **Updating Rules:** Use the web interface to add new rules via the `/add_rule` endpoint.
* **Visual Mapping:** The `_robust_normalize` function uses a `VISUAL_MAP` dictionary to convert fancy characters into plain ASCII. You can update this mapping directly in `app.py` or add new entries through the `/add_visual_char` endpoint.
* **Scoring:** Adjust the scoring logic in `is_gambling_comment` if you want to tweak the sensitivity.
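
To give a feel for what tuning the scoring might involve, here is a minimal sketch of score-based classification. It is an assumption-laden illustration, not the logic in `is_gambling_comment`: the term sets, the weights, the threshold, and the `score_comment` name are all hypothetical.

```python
# Hypothetical rule sets (assumption); the real rules are managed at runtime.
PLATFORM_NAMES = {"examplebet"}
GAMBLING_TERMS = {"slot", "jackpot", "deposit"}
SAFE_INDICATORS = {"tutorial", "review"}


def score_comment(text: str, threshold: float = 2.0) -> tuple[float, bool]:
    """Return (score, is_gambling) for an already-normalized comment."""
    words = set(text.lower().split())
    score = 0.0
    # Platform names are treated as the strongest signal (assumed weight).
    score += 2.0 * len(words & PLATFORM_NAMES)
    # Generic gambling terms add a smaller amount each.
    score += 1.0 * len(words & GAMBLING_TERMS)
    # Safe indicators pull the score down to reduce false positives.
    score -= 1.5 * len(words & SAFE_INDICATORS)
    return score, score >= threshold
```

In this sketch, raising `threshold` or the safe-indicator weight makes the filter less sensitive, while raising the platform or term weights makes it more sensitive.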

## License
This project is licensed under the MIT License. See the LICENSE file for details.

## Contributing
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.