Spaces:
Sleeping
Sleeping
Upload 4 files
Browse files- README.md +96 -1
- app.py +338 -0
- model_wrapper.py +227 -0
- requirements.txt +9 -0
README.md
CHANGED
@@ -11,4 +11,99 @@ license: apache-2.0
|
|
11 |
short_description: Financial transactions fraud detection.
|
12 |
---
|
13 |
|
14 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
11 |
short_description: Financial transactions fraud detection.
|
12 |
---
|
13 |
|
14 |
+
# 🔒 Credit Card Fraud Detection System
|
15 |
+
|
16 |
+
**Instantly detect fraudulent transactions with AI-powered risk assessment**
|
17 |
+
|
18 |
+
This system uses an **XGBoost machine learning model** to analyse credit card transactions and predict fraud risk in real-time. Simply enter transaction details and get an immediate risk assessment.
|
19 |
+
|
20 |
+
## 🚀 Quick Start
|
21 |
+
|
22 |
+
1. **Single Transaction**: Enter transaction details → Get instant fraud probability
|
23 |
+
2. **Batch Processing**: Upload CSV file → Process multiple transactions at once
|
24 |
+
3. **Risk Assessment**: Receive colour-coded risk levels with clear recommendations
|
25 |
+
|
26 |
+
## 🎯 How It Works
|
27 |
+
|
28 |
+
The AI model analyses **40+ transaction features** including:
|
29 |
+
- Transaction amount and timing
|
30 |
+
- Card details and type
|
31 |
+
- Email domain patterns
|
32 |
+
- Geographic information
|
33 |
+
- User behaviour history
|
34 |
+
|
35 |
+
## 📊 Risk Levels Explained
|
36 |
+
|
37 |
+
| Risk Level | Probability | What It Means | Action Required |
|
38 |
+
|------------|-------------|---------------|-----------------|
|
39 |
+
| 🔴 **High Risk** | ≥80% | Very likely fraud | Block transaction immediately |
|
40 |
+
| 🟡 **Medium Risk** | 50-79% | Suspicious activity | Manual review needed |
|
41 |
+
| 🟠 **Low Risk** | 20-49% | Some concerns | Monitor closely |
|
42 |
+
| 🟢 **Very Low Risk** | <20% | Normal transaction | Process as usual |
|
43 |
+
|
44 |
+
## 💡 Example Use Cases
|
45 |
+
|
46 |
+
- **Banks**: Screen transactions before processing
|
47 |
+
- **E-commerce**: Protect against fraudulent purchases
|
48 |
+
- **Fintech**: Real-time fraud monitoring
|
49 |
+
- **Research**: Analyse transaction patterns
|
50 |
+
|
51 |
+
## 🛠️ Features
|
52 |
+
|
53 |
+
✅ **Real-time predictions** - Results in under 1 second
|
54 |
+
✅ **High accuracy** - Trained on large transaction dataset
|
55 |
+
✅ **Easy to use** - Simple web interface, no coding required
|
56 |
+
✅ **Batch processing** - Handle multiple transactions at once
|
57 |
+
✅ **Professional insights** - Clear risk levels and recommendations
|
58 |
+
|
59 |
+
## 📈 Model Performance
|
60 |
+
|
61 |
+
- **Algorithm**: XGBoost (Extreme Gradient Boosting)
|
62 |
+
- **Training Data**: Thousands of real transaction records
|
63 |
+
- **Accuracy**: High precision with low false positives
|
64 |
+
- **Speed**: Real-time inference (<100ms per prediction)
|
65 |
+
|
66 |
+
## 🔧 How to Use
|
67 |
+
|
68 |
+
### For Single Transactions:
|
69 |
+
1. Fill in the transaction form
|
70 |
+
2. Click "Analyse Transaction"
|
71 |
+
3. View risk assessment and follow recommendations
|
72 |
+
|
73 |
+
### For Multiple Transactions:
|
74 |
+
1. Prepare CSV file with transaction data
|
75 |
+
2. Upload file in "Batch Processing" tab
|
76 |
+
3. Download results with fraud probabilities
|
77 |
+
|
78 |
+
## 📝 CSV Format for Batch Processing
|
79 |
+
|
80 |
+
Your CSV should include columns like:
|
81 |
+
```
|
82 |
+
TransactionAmt, card4, P_emaildomain, addr1, addr2, card1, card2, etc.
|
83 |
+
```
|
84 |
+
|
85 |
+
## ⚡ Try It Now
|
86 |
+
|
87 |
+
No setup required - just enter your transaction details and get instant results!
|
88 |
+
|
89 |
+
## 🛡️ Important Notes
|
90 |
+
|
91 |
+
- This is a **demonstration system** for educational purposes
|
92 |
+
- For production use, implement proper security measures
|
93 |
+
- Always combine AI predictions with human expertise
|
94 |
+
- Follow your organisation's fraud prevention policies
|
95 |
+
|
96 |
+
## 🔬 Technical Details
|
97 |
+
|
98 |
+
The model uses advanced feature engineering including:
|
99 |
+
- Logarithmic transformations
|
100 |
+
- Time-based features
|
101 |
+
- Interaction variables
|
102 |
+
- Categorical encoding
|
103 |
+
- Missing value handling
|
104 |
+
|
105 |
+
Built with Python, scikit-learn, XGBoost, and Gradio.
|
106 |
+
|
107 |
+
---
|
108 |
+
|
109 |
+
**Ready to detect fraud?** Start by entering a transaction above! 👆
|
app.py
ADDED
@@ -0,0 +1,338 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import gradio as gr
|
2 |
+
import pandas as pd
|
3 |
+
import numpy as np
|
4 |
+
import joblib
|
5 |
+
from model_wrapper import FraudDetectionModel
|
6 |
+
import os
|
7 |
+
|
8 |
+
# Initialize the fraud detection model
|
9 |
+
fraud_model = FraudDetectionModel()
|
10 |
+
|
11 |
+
# Load model if files exist
|
12 |
+
try:
|
13 |
+
# Load the specific XGBoost model files from your training
|
14 |
+
model_path = "fraud_detection_model_xgboost_20250727_145448.joblib"
|
15 |
+
preprocessor_path = "preprocessor_20250727_145448.joblib"
|
16 |
+
metadata_path = "model_metadata_20250727_145448.joblib"
|
17 |
+
|
18 |
+
if os.path.exists(model_path) and os.path.exists(preprocessor_path):
|
19 |
+
if os.path.exists(metadata_path):
|
20 |
+
fraud_model.load_model(model_path, preprocessor_path, metadata_path)
|
21 |
+
else:
|
22 |
+
fraud_model.load_model(model_path, preprocessor_path)
|
23 |
+
model_loaded = True
|
24 |
+
else:
|
25 |
+
model_loaded = False
|
26 |
+
print("Model files not found. Please upload the following files:")
|
27 |
+
print("- fraud_detection_model_xgboost_20250727_145448.joblib")
|
28 |
+
print("- preprocessor_20250727_145448.joblib")
|
29 |
+
print("- model_metadata_20250727_145448.joblib")
|
30 |
+
except Exception as e:
|
31 |
+
model_loaded = False
|
32 |
+
print(f"Error loading model: {e}")
|
33 |
+
|
34 |
+
def predict_fraud_risk(
|
35 |
+
transaction_amount,
|
36 |
+
card_type,
|
37 |
+
email_domain,
|
38 |
+
transaction_hour,
|
39 |
+
addr1,
|
40 |
+
addr2,
|
41 |
+
card1,
|
42 |
+
card2,
|
43 |
+
dist1,
|
44 |
+
c1, c2, c3, c4, c5, c6,
|
45 |
+
d1, d2, d3, d4, d5,
|
46 |
+
m1, m2, m3, m4, m5, m6
|
47 |
+
):
|
48 |
+
"""Predict fraud risk for a transaction"""
|
49 |
+
|
50 |
+
if not model_loaded:
|
51 |
+
return "❌ Model not loaded. Please contact administrator.", "", "", ""
|
52 |
+
|
53 |
+
try:
|
54 |
+
# Prepare transaction data
|
55 |
+
transaction_data = {
|
56 |
+
'TransactionAmt': float(transaction_amount),
|
57 |
+
'card4': card_type,
|
58 |
+
'P_emaildomain': email_domain,
|
59 |
+
'R_emaildomain': email_domain,
|
60 |
+
'addr1': float(addr1) if addr1 else None,
|
61 |
+
'addr2': float(addr2) if addr2 else None,
|
62 |
+
'card1': float(card1) if card1 else None,
|
63 |
+
'card2': float(card2) if card2 else None,
|
64 |
+
'card3': float(transaction_amount), # Often similar to transaction amount
|
65 |
+
'card5': 142.0, # Default value
|
66 |
+
'card6': 'credit', # Default value
|
67 |
+
'dist1': float(dist1) if dist1 else None,
|
68 |
+
'dist2': float(dist1) if dist1 else None, # Often similar to dist1
|
69 |
+
'C1': float(c1),
|
70 |
+
'C2': float(c2),
|
71 |
+
'C3': float(c3),
|
72 |
+
'C4': float(c4),
|
73 |
+
'C5': float(c5),
|
74 |
+
'C6': float(c6),
|
75 |
+
'C7': 0.0,
|
76 |
+
'C8': 0.0,
|
77 |
+
'C9': 1.0,
|
78 |
+
'C10': 0.0,
|
79 |
+
'C11': 1.0,
|
80 |
+
'C12': 1.0,
|
81 |
+
'C13': 1.0,
|
82 |
+
'C14': 1.0,
|
83 |
+
'D1': float(d1),
|
84 |
+
'D2': float(d2),
|
85 |
+
'D3': float(d3),
|
86 |
+
'D4': float(d4),
|
87 |
+
'D5': float(d5),
|
88 |
+
'D10': 0.0,
|
89 |
+
'D15': 0.0,
|
90 |
+
'M1': m1,
|
91 |
+
'M2': m2,
|
92 |
+
'M3': m3,
|
93 |
+
'M4': m4,
|
94 |
+
'M5': m5,
|
95 |
+
'M6': m6,
|
96 |
+
'TransactionDT': transaction_hour * 3600 # Convert hour to seconds
|
97 |
+
}
|
98 |
+
|
99 |
+
# Make prediction
|
100 |
+
result = fraud_model.predict_single_transaction(transaction_data)
|
101 |
+
|
102 |
+
if 'error' in result:
|
103 |
+
return f"❌ {result['error']}", "", "", ""
|
104 |
+
|
105 |
+
# Format results
|
106 |
+
probability = result['fraud_probability']
|
107 |
+
risk_level = result['risk_level']
|
108 |
+
recommendation = result['recommendation']
|
109 |
+
|
110 |
+
# Create risk indicator
|
111 |
+
if probability >= 0.8:
|
112 |
+
risk_indicator = f"🔴 HIGH RISK ({probability:.1%})"
|
113 |
+
elif probability >= 0.5:
|
114 |
+
risk_indicator = f"🟡 MEDIUM RISK ({probability:.1%})"
|
115 |
+
elif probability >= 0.2:
|
116 |
+
risk_indicator = f"🟠 LOW RISK ({probability:.1%})"
|
117 |
+
else:
|
118 |
+
risk_indicator = f"🟢 VERY LOW RISK ({probability:.1%})"
|
119 |
+
|
120 |
+
return risk_indicator, f"{probability:.4f}", risk_level, recommendation
|
121 |
+
|
122 |
+
except Exception as e:
|
123 |
+
return f"❌ Error: {str(e)}", "", "", ""
|
124 |
+
|
125 |
+
def predict_from_csv(file):
|
126 |
+
"""Predict fraud risk for multiple transactions from CSV"""
|
127 |
+
|
128 |
+
if not model_loaded:
|
129 |
+
return "❌ Model not loaded. Please contact administrator."
|
130 |
+
|
131 |
+
if file is None:
|
132 |
+
return "❌ Please upload a CSV file."
|
133 |
+
|
134 |
+
try:
|
135 |
+
# Read CSV file
|
136 |
+
df = pd.read_csv(file.name)
|
137 |
+
|
138 |
+
# Make batch predictions
|
139 |
+
results_df = fraud_model.predict_batch(df)
|
140 |
+
|
141 |
+
# Save results
|
142 |
+
output_path = "fraud_predictions.csv"
|
143 |
+
results_df.to_csv(output_path, index=False)
|
144 |
+
|
145 |
+
# Create summary
|
146 |
+
total_transactions = len(results_df)
|
147 |
+
high_risk = len(results_df[results_df['fraud_probability'] >= 0.8])
|
148 |
+
medium_risk = len(results_df[(results_df['fraud_probability'] >= 0.5) & (results_df['fraud_probability'] < 0.8)])
|
149 |
+
low_risk = len(results_df[(results_df['fraud_probability'] >= 0.2) & (results_df['fraud_probability'] < 0.5)])
|
150 |
+
very_low_risk = len(results_df[results_df['fraud_probability'] < 0.2])
|
151 |
+
|
152 |
+
summary = f"""
|
153 |
+
📊 **Batch Prediction Summary**
|
154 |
+
|
155 |
+
Total Transactions: {total_transactions}
|
156 |
+
🔴 High Risk: {high_risk} ({high_risk/total_transactions:.1%})
|
157 |
+
🟡 Medium Risk: {medium_risk} ({medium_risk/total_transactions:.1%})
|
158 |
+
🟠 Low Risk: {low_risk} ({low_risk/total_transactions:.1%})
|
159 |
+
🟢 Very Low Risk: {very_low_risk} ({very_low_risk/total_transactions:.1%})
|
160 |
+
|
161 |
+
Results saved to: {output_path}
|
162 |
+
"""
|
163 |
+
|
164 |
+
return summary, output_path
|
165 |
+
|
166 |
+
except Exception as e:
|
167 |
+
return f"❌ Error processing CSV: {str(e)}", None
|
168 |
+
|
169 |
+
# Create Gradio interface
|
170 |
+
with gr.Blocks(title="Fraud Detection System", theme=gr.themes.Soft()) as app:
|
171 |
+
|
172 |
+
gr.Markdown("""
|
173 |
+
# 🔒 Credit Card Fraud Detection System
|
174 |
+
|
175 |
+
This system uses machine learning to assess the risk of credit card transactions being fraudulent.
|
176 |
+
Enter transaction details below to get a risk assessment.
|
177 |
+
|
178 |
+
**Risk Levels:**
|
179 |
+
- 🔴 High Risk (≥80%): Block transaction immediately
|
180 |
+
- 🟡 Medium Risk (50-79%): Manual review required
|
181 |
+
- 🟠 Low Risk (20-49%): Monitor transaction
|
182 |
+
- 🟢 Very Low Risk (<20%): Process normally
|
183 |
+
""")
|
184 |
+
|
185 |
+
with gr.Tabs():
|
186 |
+
|
187 |
+
# Single Transaction Tab
|
188 |
+
with gr.TabItem("Single Transaction"):
|
189 |
+
with gr.Row():
|
190 |
+
with gr.Column():
|
191 |
+
gr.Markdown("### Transaction Details")
|
192 |
+
transaction_amount = gr.Number(label="Transaction Amount ($)", value=100.0)
|
193 |
+
card_type = gr.Dropdown(
|
194 |
+
choices=["visa", "mastercard", "american express", "discover"],
|
195 |
+
label="Card Type",
|
196 |
+
value="visa"
|
197 |
+
)
|
198 |
+
email_domain = gr.Textbox(label="Email Domain", value="gmail.com")
|
199 |
+
transaction_hour = gr.Slider(0, 23, label="Transaction Hour", value=12)
|
200 |
+
|
201 |
+
gr.Markdown("### Address & Card Info")
|
202 |
+
addr1 = gr.Number(label="Address 1", value=325.0)
|
203 |
+
addr2 = gr.Number(label="Address 2", value=87.0)
|
204 |
+
card1 = gr.Number(label="Card 1", value=13553)
|
205 |
+
card2 = gr.Number(label="Card 2", value=150.0)
|
206 |
+
dist1 = gr.Number(label="Distance 1", value=19.0)
|
207 |
+
|
208 |
+
with gr.Column():
|
209 |
+
gr.Markdown("### Transaction Counts")
|
210 |
+
c1 = gr.Number(label="C1", value=1.0)
|
211 |
+
c2 = gr.Number(label="C2", value=1.0)
|
212 |
+
c3 = gr.Number(label="C3", value=0.0)
|
213 |
+
c4 = gr.Number(label="C4", value=0.0)
|
214 |
+
c5 = gr.Number(label="C5", value=0.0)
|
215 |
+
c6 = gr.Number(label="C6", value=1.0)
|
216 |
+
|
217 |
+
gr.Markdown("### Time Deltas")
|
218 |
+
d1 = gr.Number(label="D1", value=0.0)
|
219 |
+
d2 = gr.Number(label="D2", value=0.0)
|
220 |
+
d3 = gr.Number(label="D3", value=0.0)
|
221 |
+
d4 = gr.Number(label="D4", value=0.0)
|
222 |
+
d5 = gr.Number(label="D5", value=20.0)
|
223 |
+
|
224 |
+
gr.Markdown("### Match Features")
|
225 |
+
m1 = gr.Dropdown(choices=["T", "F"], label="M1", value="T")
|
226 |
+
m2 = gr.Dropdown(choices=["T", "F"], label="M2", value="T")
|
227 |
+
m3 = gr.Dropdown(choices=["T", "F"], label="M3", value="T")
|
228 |
+
m4 = gr.Dropdown(choices=["M0", "M1", "M2"], label="M4", value="M0")
|
229 |
+
m5 = gr.Dropdown(choices=["T", "F"], label="M5", value="F")
|
230 |
+
m6 = gr.Dropdown(choices=["T", "F"], label="M6", value="F")
|
231 |
+
|
232 |
+
predict_btn = gr.Button("🔍 Analyze Transaction", variant="primary", size="lg")
|
233 |
+
|
234 |
+
with gr.Row():
|
235 |
+
risk_output = gr.Textbox(label="Risk Assessment", lines=1)
|
236 |
+
probability_output = gr.Textbox(label="Fraud Probability", lines=1)
|
237 |
+
|
238 |
+
with gr.Row():
|
239 |
+
risk_level_output = gr.Textbox(label="Risk Level", lines=1)
|
240 |
+
recommendation_output = gr.Textbox(label="Recommendation", lines=2)
|
241 |
+
|
242 |
+
predict_btn.click(
|
243 |
+
predict_fraud_risk,
|
244 |
+
inputs=[
|
245 |
+
transaction_amount, card_type, email_domain, transaction_hour,
|
246 |
+
addr1, addr2, card1, card2, dist1,
|
247 |
+
c1, c2, c3, c4, c5, c6,
|
248 |
+
d1, d2, d3, d4, d5,
|
249 |
+
m1, m2, m3, m4, m5, m6
|
250 |
+
],
|
251 |
+
outputs=[risk_output, probability_output, risk_level_output, recommendation_output]
|
252 |
+
)
|
253 |
+
|
254 |
+
# Batch Processing Tab
|
255 |
+
with gr.TabItem("Batch Processing"):
|
256 |
+
gr.Markdown("""
|
257 |
+
### Upload CSV File for Batch Processing
|
258 |
+
|
259 |
+
Upload a CSV file containing multiple transactions. The file should include the same columns
|
260 |
+
as used in single transaction prediction.
|
261 |
+
""")
|
262 |
+
|
263 |
+
file_upload = gr.File(label="Upload CSV File", file_types=[".csv"])
|
264 |
+
batch_btn = gr.Button("🔍 Process Batch", variant="primary")
|
265 |
+
|
266 |
+
batch_output = gr.Textbox(label="Batch Results", lines=10)
|
267 |
+
download_file = gr.File(label="Download Results")
|
268 |
+
|
269 |
+
batch_btn.click(
|
270 |
+
predict_from_csv,
|
271 |
+
inputs=[file_upload],
|
272 |
+
outputs=[batch_output, download_file]
|
273 |
+
)
|
274 |
+
|
275 |
+
# Model Info Tab
|
276 |
+
with gr.TabItem("Model Information"):
|
277 |
+
if model_loaded and fraud_model.metadata:
|
278 |
+
model_info = fraud_model.get_model_info()
|
279 |
+
gr.Markdown(f"""
|
280 |
+
### Model Status
|
281 |
+
**Status:** ✅ {model_info.get('model_name', 'XGBoost')} Model Loaded
|
282 |
+
**AUC Score:** {model_info.get('auc_score', 'N/A')}
|
283 |
+
**Training Date:** {model_info.get('training_timestamp', 'N/A')}
|
284 |
+
**Features:** {model_info.get('feature_count', 'N/A')}
|
285 |
+
|
286 |
+
### About This Model
|
287 |
+
This fraud detection system uses an **XGBoost classifier** trained on a comprehensive dataset
|
288 |
+
of credit card transactions. The model achieved high performance with advanced feature engineering
|
289 |
+
and ensemble learning techniques.
|
290 |
+
|
291 |
+
### Model Performance
|
292 |
+
- **Algorithm**: XGBoost (Extreme Gradient Boosting)
|
293 |
+
- **AUC Score**: {model_info.get('auc_score', 'N/A')}
|
294 |
+
- **Features Used**: {model_info.get('feature_count', 'N/A')} engineered features
|
295 |
+
- **Training Method**: Cross-validation with stratified sampling
|
296 |
+
- **Speed**: Real-time predictions (<100ms)
|
297 |
+
|
298 |
+
### Features Used
|
299 |
+
The model processes over 40 features including:
|
300 |
+
- **Transaction Details**: Amount, timing, frequency patterns
|
301 |
+
- **Card Information**: Type, issuer details, security features
|
302 |
+
- **User Behaviour**: Email domains, address patterns, historical counts
|
303 |
+
- **Device & Session**: Geographic data, device fingerprinting
|
304 |
+
- **Engineered Features**: Ratios, transformations, temporal patterns
|
305 |
+
|
306 |
+
### XGBoost Advantages
|
307 |
+
- **High Accuracy**: Excellent performance on tabular data
|
308 |
+
- **Feature Importance**: Clear understanding of decision factors
|
309 |
+
- **Robustness**: Handles missing values and outliers well
|
310 |
+
- **Scalability**: Efficient training and inference
|
311 |
+
""")
|
312 |
+
else:
|
313 |
+
gr.Markdown(f"""
|
314 |
+
### Model Status
|
315 |
+
**Status:** {'✅ Basic Model Loaded' if model_loaded else '❌ Not Loaded'}
|
316 |
+
|
317 |
+
### About This Model
|
318 |
+
This fraud detection system uses advanced machine learning algorithms to assess transaction risk.
|
319 |
+
The model was trained on a large dataset of credit card transactions and uses multiple features
|
320 |
+
including transaction amount, card details, user behaviour patterns, and timing information.
|
321 |
+
|
322 |
+
### Features Used
|
323 |
+
- Transaction amount and timing
|
324 |
+
- Card information (type, numbers)
|
325 |
+
- Email domain patterns
|
326 |
+
- Address information
|
327 |
+
- User behaviour counts
|
328 |
+
- Device and session data
|
329 |
+
|
330 |
+
### Model Performance
|
331 |
+
- **Algorithm**: Ensemble methods (Random Forest, XGBoost, LightGBM)
|
332 |
+
- **Accuracy**: High precision in detecting fraudulent transactions
|
333 |
+
- **Speed**: Real-time predictions
|
334 |
+
""")
|
335 |
+
|
336 |
+
# Launch the app
|
337 |
+
if __name__ == "__main__":
|
338 |
+
app.launch()
|
model_wrapper.py
ADDED
@@ -0,0 +1,227 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
import joblib
|
2 |
+
import pandas as pd
|
3 |
+
import numpy as np
|
4 |
+
from sklearn.preprocessing import StandardScaler, LabelEncoder
|
5 |
+
from sklearn.impute import SimpleImputer
|
6 |
+
|
7 |
+
class FraudDetectionModel:
|
8 |
+
"""Wrapper class for the fraud detection model"""
|
9 |
+
|
10 |
+
def __init__(self, model_path=None, preprocessor_path=None, metadata_path=None):
|
11 |
+
self.model = None
|
12 |
+
self.preprocessor = None
|
13 |
+
self.metadata = None
|
14 |
+
|
15 |
+
if model_path and preprocessor_path:
|
16 |
+
self.load_model(model_path, preprocessor_path, metadata_path)
|
17 |
+
|
18 |
+
def load_model(self, model_path, preprocessor_path, metadata_path=None):
|
19 |
+
"""Load the trained model, preprocessor, and metadata"""
|
20 |
+
self.model = joblib.load(model_path)
|
21 |
+
self.preprocessor = joblib.load(preprocessor_path)
|
22 |
+
|
23 |
+
if metadata_path:
|
24 |
+
self.metadata = joblib.load(metadata_path)
|
25 |
+
print(f"Loaded {self.metadata['model_name']} model with AUC: {self.metadata['auc_score']:.4f}")
|
26 |
+
else:
|
27 |
+
print("Model and preprocessor loaded successfully!")
|
28 |
+
|
29 |
+
def predict_single_transaction(self, transaction_data):
|
30 |
+
"""
|
31 |
+
Predict fraud probability for a single transaction
|
32 |
+
|
33 |
+
Args:
|
34 |
+
transaction_data (dict): Dictionary containing transaction features
|
35 |
+
|
36 |
+
Returns:
|
37 |
+
dict: Prediction results with probability and risk level
|
38 |
+
"""
|
39 |
+
if self.model is None or self.preprocessor is None:
|
40 |
+
raise ValueError("Model not loaded. Please load model first.")
|
41 |
+
|
42 |
+
# Convert to DataFrame
|
43 |
+
df = pd.DataFrame([transaction_data])
|
44 |
+
|
45 |
+
# Add TransactionID if not present (required for preprocessing)
|
46 |
+
if 'TransactionID' not in df.columns:
|
47 |
+
df['TransactionID'] = 'temp_id'
|
48 |
+
|
49 |
+
try:
|
50 |
+
# Preprocess the data
|
51 |
+
X_processed, _ = self.preprocessor.preprocess(df, fit=False)
|
52 |
+
|
53 |
+
# Make prediction
|
54 |
+
fraud_probability = self.model.predict_proba(X_processed)[0, 1]
|
55 |
+
|
56 |
+
# Determine risk level
|
57 |
+
if fraud_probability >= 0.8:
|
58 |
+
risk_level = "High Risk"
|
59 |
+
recommendation = "Block transaction and investigate immediately"
|
60 |
+
elif fraud_probability >= 0.5:
|
61 |
+
risk_level = "Medium Risk"
|
62 |
+
recommendation = "Review transaction manually"
|
63 |
+
elif fraud_probability >= 0.2:
|
64 |
+
risk_level = "Low Risk"
|
65 |
+
recommendation = "Monitor transaction"
|
66 |
+
else:
|
67 |
+
risk_level = "Very Low Risk"
|
68 |
+
recommendation = "Process normally"
|
69 |
+
|
70 |
+
return {
|
71 |
+
"fraud_probability": float(fraud_probability),
|
72 |
+
"risk_level": risk_level,
|
73 |
+
"recommendation": recommendation,
|
74 |
+
"is_suspicious": fraud_probability >= 0.5
|
75 |
+
}
|
76 |
+
|
77 |
+
except Exception as e:
|
78 |
+
return {
|
79 |
+
"error": f"Prediction failed: {str(e)}",
|
80 |
+
"fraud_probability": None,
|
81 |
+
"risk_level": "Unknown",
|
82 |
+
"recommendation": "Manual review required"
|
83 |
+
}
|
84 |
+
|
85 |
+
def predict_batch(self, transactions_df):
|
86 |
+
"""
|
87 |
+
Predict fraud probabilities for multiple transactions
|
88 |
+
|
89 |
+
Args:
|
90 |
+
transactions_df (pd.DataFrame): DataFrame containing transaction data
|
91 |
+
|
92 |
+
Returns:
|
93 |
+
pd.DataFrame: DataFrame with predictions added
|
94 |
+
"""
|
95 |
+
if self.model is None or self.preprocessor is None:
|
96 |
+
raise ValueError("Model not loaded. Please load model first.")
|
97 |
+
|
98 |
+
# Preprocess the data
|
99 |
+
X_processed, _ = self.preprocessor.preprocess(transactions_df, fit=False)
|
100 |
+
|
101 |
+
# Make predictions
|
102 |
+
fraud_probabilities = self.model.predict_proba(X_processed)[:, 1]
|
103 |
+
|
104 |
+
# Add predictions to original DataFrame
|
105 |
+
result_df = transactions_df.copy()
|
106 |
+
result_df['fraud_probability'] = fraud_probabilities
|
107 |
+
result_df['is_suspicious'] = fraud_probabilities >= 0.5
|
108 |
+
|
109 |
+
# Add risk levels
|
110 |
+
risk_levels = []
|
111 |
+
for prob in fraud_probabilities:
|
112 |
+
if prob >= 0.8:
|
113 |
+
risk_levels.append("High Risk")
|
114 |
+
elif prob >= 0.5:
|
115 |
+
risk_levels.append("Medium Risk")
|
116 |
+
elif prob >= 0.2:
|
117 |
+
risk_levels.append("Low Risk")
|
118 |
+
else:
|
119 |
+
risk_levels.append("Very Low Risk")
|
120 |
+
|
121 |
+
result_df['risk_level'] = risk_levels
|
122 |
+
|
123 |
+
return result_df
|
124 |
+
|
125 |
+
def get_feature_importance(self, top_n=20):
|
126 |
+
"""Get feature importance if available"""
|
127 |
+
if self.model is None:
|
128 |
+
raise ValueError("Model not loaded.")
|
129 |
+
|
130 |
+
if hasattr(self.model, 'feature_importances_'):
|
131 |
+
feature_names = self.preprocessor.feature_names
|
132 |
+
importance_df = pd.DataFrame({
|
133 |
+
'feature': feature_names,
|
134 |
+
'importance': self.model.feature_importances_
|
135 |
+
}).sort_values('importance', ascending=False).head(top_n)
|
136 |
+
|
137 |
+
return importance_df
|
138 |
+
else:
|
139 |
+
return "Feature importance not available for this model type."
|
140 |
+
|
141 |
+
def get_model_info(self):
|
142 |
+
"""Get information about the loaded model"""
|
143 |
+
if self.model is None:
|
144 |
+
return "No model loaded."
|
145 |
+
|
146 |
+
info = {
|
147 |
+
"model_type": type(self.model).__name__,
|
148 |
+
"feature_count": len(self.preprocessor.feature_names) if self.preprocessor else "Unknown",
|
149 |
+
"preprocessing_steps": [
|
150 |
+
"Categorical encoding",
|
151 |
+
"Feature engineering",
|
152 |
+
"Missing value imputation",
|
153 |
+
"Feature scaling"
|
154 |
+
]
|
155 |
+
}
|
156 |
+
|
157 |
+
# Add metadata information if available
|
158 |
+
if self.metadata:
|
159 |
+
info.update({
|
160 |
+
"model_name": self.metadata.get('model_name', 'Unknown'),
|
161 |
+
"auc_score": self.metadata.get('auc_score', 'Unknown'),
|
162 |
+
"training_timestamp": self.metadata.get('timestamp', 'Unknown'),
|
163 |
+
"model_file": self.metadata.get('model_file', 'Unknown'),
|
164 |
+
"preprocessor_file": self.metadata.get('preprocessor_file', 'Unknown')
|
165 |
+
})
|
166 |
+
|
167 |
+
return info
|
168 |
+
|
169 |
+
# Example usage and testing
|
170 |
+
if __name__ == "__main__":
|
171 |
+
# Initialize model wrapper with specific files
|
172 |
+
fraud_model = FraudDetectionModel(
|
173 |
+
model_path="fraud_detection_model_xgboost_20250727_145448.joblib",
|
174 |
+
preprocessor_path="preprocessor_20250727_145448.joblib",
|
175 |
+
metadata_path="model_metadata_20250727_145448.joblib"
|
176 |
+
)
|
177 |
+
|
178 |
+
# Example transaction data for testing
|
179 |
+
sample_transaction = {
|
180 |
+
'TransactionAmt': 150.0,
|
181 |
+
'card1': 13553,
|
182 |
+
'card2': 150.0,
|
183 |
+
'card3': 150.0,
|
184 |
+
'card4': 'discover',
|
185 |
+
'card5': 142.0,
|
186 |
+
'card6': 'credit',
|
187 |
+
'addr1': 325.0,
|
188 |
+
'addr2': 87.0,
|
189 |
+
'dist1': 19.0,
|
190 |
+
'dist2': 19.0,
|
191 |
+
'P_emaildomain': 'gmail.com',
|
192 |
+
'R_emaildomain': 'gmail.com',
|
193 |
+
'C1': 1.0,
|
194 |
+
'C2': 1.0,
|
195 |
+
'C3': 0.0,
|
196 |
+
'C4': 0.0,
|
197 |
+
'C5': 0.0,
|
198 |
+
'C6': 1.0,
|
199 |
+
'C7': 0.0,
|
200 |
+
'C8': 0.0,
|
201 |
+
'C9': 1.0,
|
202 |
+
'C10': 0.0,
|
203 |
+
'C11': 1.0,
|
204 |
+
'C12': 1.0,
|
205 |
+
'C13': 1.0,
|
206 |
+
'C14': 1.0,
|
207 |
+
'D1': 0.0,
|
208 |
+
'D2': 0.0,
|
209 |
+
'D3': 0.0,
|
210 |
+
'D4': 0.0,
|
211 |
+
'D5': 20.0,
|
212 |
+
'D10': 0.0,
|
213 |
+
'D15': 0.0,
|
214 |
+
'M1': 'T',
|
215 |
+
'M2': 'T',
|
216 |
+
'M3': 'T',
|
217 |
+
'M4': 'M0',
|
218 |
+
'M5': 'F',
|
219 |
+
'M6': 'F',
|
220 |
+
'TransactionDT': 86400
|
221 |
+
}
|
222 |
+
|
223 |
+
print("Sample transaction for testing:")
|
224 |
+
print(sample_transaction)
|
225 |
+
print("\n" + "="*50)
|
226 |
+
print("Model wrapper created successfully!")
|
227 |
+
print("To use: load your model files and call predict_single_transaction()")
|
requirements.txt
ADDED
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
gradio
|
2 |
+
pandas
|
3 |
+
numpy
|
4 |
+
scikit-learn
|
5 |
+
xgboost
|
6 |
+
lightgbm
|
7 |
+
joblib
|
8 |
+
matplotlib
|
9 |
+
seaborn
|