---
title: Phishing Detector
emoji: 🔍
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 4.39.0
app_file: app.py
pinned: false
---
# Phishing Detector
A multi-model phishing detection system that analyzes URLs, webpage content, and free text with the models listed below.
## 🤖 Models
- **DeBERTa + LSTM**: Advanced transformer with attention mechanism (`khoa-done/phishing-detector`)
- **BERT**: Fine-tuned BERT model (`th1enq/bert_checkpoint`)
- **XGBoost**: Traditional ML with feature engineering (`th1enq/xgboost_checkpoint`)
## ✨ Features
- **URL Structure Analysis**: Extract 30+ features from URL patterns
- **HTML Content Analysis**: Extract 43+ features from webpage content
- **Combined Predictions**: Weighted ensemble of all models
- **Visual Attention Weights**: See which tokens influence decisions
- **Real-time Web Scraping**: Fetch and analyze live websites
- **Multi-tab Interface**: Compare results across different models
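As a rough illustration of what URL structure analysis can look like, the sketch below derives a few numeric features from a URL using only the standard library. The feature names and the `SUSPICIOUS_WORDS` list are assumptions made for this example; they are not the 30+ features this project actually extracts.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list, chosen only for illustration.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account", "update")


def extract_url_features(url: str) -> dict:
    """Return a handful of illustrative URL-structure features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "uses_https": int(parsed.scheme == "https"),
        # Hosts written as a raw IPv4 address are a common phishing signal.
        "has_ip_host": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "has_at_symbol": int("@" in url),
        "path_depth": len([part for part in parsed.path.split("/") if part]),
        "suspicious_words": sum(word in url.lower() for word in SUSPICIOUS_WORDS),
    }


if __name__ == "__main__":
    print(extract_url_features("https://github.com/user/repo"))
```

A dictionary like this could be turned into a fixed-order feature vector before being passed to a tree-based model such as XGBoost.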
## 🚀 Usage
1. **Enter a URL**: The system fetches the webpage and analyzes both the URL structure and the page content
2. **Enter text**: Paste suspicious text to analyze it directly, without fetching a page
3. **Compare Models**: Use different tabs to see how each model performs
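One way to tell the two input modes apart is to check whether the input parses as an HTTP(S) URL. The helper below is a minimal sketch of that idea; `is_url` is a hypothetical name, and the app may distinguish URLs from text differently.

```python
from urllib.parse import urlparse


def is_url(user_input: str) -> bool:
    """Treat the input as a URL only if it has an http(s) scheme and a host."""
    parsed = urlparse(user_input.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Inputs for which `is_url` returns `False` would then be handled as plain text rather than fetched.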
## 📊 Model Performance
- **DeBERTa + LSTM**: Best for context understanding with attention visualization
- **BERT**: Reliable baseline with robust predictions
- **XGBoost**: Fast traditional ML approach with feature interpretability
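Since the three models have different strengths, the combined prediction can be pictured as a weighted average of their phishing probabilities. The sketch below assumes each model returns a probability in [0, 1]; the model keys and the values in `WEIGHTS` are hypothetical, as the actual ensemble weights are not documented here.

```python
from typing import Dict, Optional

# Hypothetical weights; the real values used by the app are not documented here.
WEIGHTS = {"deberta_lstm": 0.4, "bert": 0.3, "xgboost": 0.3}


def combine_predictions(
    probs: Dict[str, Optional[float]], weights: Dict[str, float]
) -> float:
    """Weighted average of per-model phishing probabilities.

    Models whose probability is None (for example, a model that failed
    to load) are skipped and the remaining weights are renormalised.
    """
    available = {
        name: p for name, p in probs.items() if p is not None and name in weights
    }
    total = sum(weights[name] for name in available)
    if total == 0:
        raise ValueError("no model predictions available")
    return sum(weights[name] * p for name, p in available.items()) / total
```

Renormalising over the available models keeps the output on the same 0 to 1 scale even when one model is missing.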
## 🔧 Technical Details
- All models are loaded from the Hugging Face Hub for easy deployment
- Feature extraction modules are included for XGBoost functionality
- Interface optimized for a dark theme, with visual analytics
- Graceful fallbacks if models fail to load
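The fallback behaviour can be sketched as a loop that tries each model loader and keeps whichever ones succeed. This is an assumption about how such fallbacks might be structured, not the code in `app.py`; `load_models` and the loader callables are hypothetical names.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("phishing_detector")


def load_models(loaders: Dict[str, Callable[[], object]]) -> Dict[str, object]:
    """Call each loader; keep the models that load and log the ones that fail."""
    models: Dict[str, object] = {}
    for name, loader in loaders.items():
        try:
            models[name] = loader()
        except Exception as exc:  # broad on purpose: one bad model should not crash the app
            logger.warning("Could not load %s: %s", name, exc)
    return models
```

The interface can then show only the tabs whose model name appears in the returned dictionary.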
## 📝 Examples
Try these URLs to see the system in action:
- `https://github.com/user/repo` (should be benign)
- `http://suspicious-phishing-site.example` (simulated phishing)
- Or paste any suspicious email content for analysis