Commit 513edc0 (verified), committed by 0xnu
1 Parent(s): 9721d79

Upload 4 files

Files changed (4)
  1. README.md +96 -1
  2. app.py +338 -0
  3. model_wrapper.py +227 -0
  4. requirements.txt +9 -0
README.md CHANGED
@@ -11,4 +11,99 @@ license: apache-2.0
 short_description: Financial transactions fraud detection.
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# 🔒 Credit Card Fraud Detection System
+
+**Instantly detect fraudulent transactions with AI-powered risk assessment**
+
+This system uses an **XGBoost machine learning model** to analyse credit card transactions and predict fraud risk in real time. Enter the transaction details to get an immediate risk assessment.
+
+## 🚀 Quick Start
+
+1. **Single Transaction**: Enter transaction details → Get instant fraud probability
+2. **Batch Processing**: Upload a CSV file → Process multiple transactions at once
+3. **Risk Assessment**: Receive colour-coded risk levels with clear recommendations
+
+## 🎯 How It Works
+
+The AI model analyses **40+ transaction features** including:
+- Transaction amount and timing
+- Card details and type
+- Email domain patterns
+- Geographic information
+- User behaviour history
+
+## 📊 Risk Levels Explained
+
+| Risk Level | Probability | What It Means | Action Required |
+|------------|-------------|---------------|-----------------|
+| 🔴 **High Risk** | ≥80% | Very likely fraud | Block transaction immediately |
+| 🟡 **Medium Risk** | 50-79% | Suspicious activity | Manual review needed |
+| 🟠 **Low Risk** | 20-49% | Some concerns | Monitor closely |
+| 🟢 **Very Low Risk** | <20% | Normal transaction | Process as usual |
+
+## 💡 Example Use Cases
+
+- **Banks**: Screen transactions before processing
+- **E-commerce**: Protect against fraudulent purchases
+- **Fintech**: Real-time fraud monitoring
+- **Research**: Analyse transaction patterns
+
+## 🛠️ Features
+
+✅ **Real-time predictions** - Results in under 1 second
+✅ **High accuracy** - Trained on a large transaction dataset
+✅ **Easy to use** - Simple web interface, no coding required
+✅ **Batch processing** - Handle multiple transactions at once
+✅ **Professional insights** - Clear risk levels and recommendations
+
+## 📈 Model Performance
+
+- **Algorithm**: XGBoost (Extreme Gradient Boosting)
+- **Training Data**: Thousands of real transaction records
+- **Accuracy**: High precision with low false positives
+- **Speed**: Real-time inference (<100ms per prediction)
+
+## 🔧 How to Use
+
+### For Single Transactions:
+1. Fill in the transaction form
+2. Click "Analyze Transaction"
+3. View the risk assessment and follow the recommendation
+
+### For Multiple Transactions:
+1. Prepare a CSV file with transaction data
+2. Upload the file in the "Batch Processing" tab
+3. Download the results with fraud probabilities
+
+## 📝 CSV Format for Batch Processing
+
+Your CSV should include columns like:
+```
+TransactionAmt, card4, P_emaildomain, addr1, addr2, card1, card2, etc.
+```
+
+## ⚡ Try It Now
+
+No setup required - just enter your transaction details and get instant results!
+
+## 🛡️ Important Notes
+
+- This is a **demonstration system** for educational purposes
+- For production use, implement proper security measures
+- Always combine AI predictions with human expertise
+- Follow your organisation's fraud prevention policies
+
+## 🔬 Technical Details
+
+The model uses advanced feature engineering including:
+- Logarithmic transformations
+- Time-based features
+- Interaction variables
+- Categorical encoding
+- Missing value handling
+
+Built with Python, scikit-learn, XGBoost, and Gradio.
+
+---
+
+**Ready to detect fraud?** Start by entering a transaction above! 👆
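Note on the CSV format: the README lists only a handful of column names. A minimal sketch of a batch file built with pandas might look like the following. The column names come from the README and `app.py`; the values are illustrative, the file name `sample_transactions.csv` is hypothetical, and the full set of columns the pickled preprocessor actually requires is not visible in this commit.

```python
import os
import tempfile

import pandas as pd

# Column names come from the README and app.py; the values are illustrative only.
rows = [
    {"TransactionAmt": 100.0, "card4": "visa", "P_emaildomain": "gmail.com",
     "addr1": 325.0, "addr2": 87.0, "card1": 13553, "card2": 150.0},
    {"TransactionAmt": 2500.0, "card4": "discover", "P_emaildomain": "example.com",
     "addr1": 325.0, "addr2": 87.0, "card1": 13553, "card2": 150.0},
]
batch = pd.DataFrame(rows)

# Hypothetical file name, written to a temp directory to avoid touching the repo.
csv_path = os.path.join(tempfile.mkdtemp(), "sample_transactions.csv")
batch.to_csv(csv_path, index=False)

# Reading it back mirrors what predict_from_csv does with pd.read_csv.
round_trip = pd.read_csv(csv_path)
print(round_trip.columns.tolist())
```

A file like this could then be uploaded in the "Batch Processing" tab, provided the preprocessor tolerates the columns that are absent.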
app.py ADDED
@@ -0,0 +1,338 @@
+import gradio as gr
+import pandas as pd
+import numpy as np
+import joblib
+from model_wrapper import FraudDetectionModel
+import os
+
+# Initialize the fraud detection model
+fraud_model = FraudDetectionModel()
+
+# Load model if files exist
+try:
+    # Load the specific XGBoost model files from your training
+    model_path = "fraud_detection_model_xgboost_20250727_145448.joblib"
+    preprocessor_path = "preprocessor_20250727_145448.joblib"
+    metadata_path = "model_metadata_20250727_145448.joblib"
+
+    if os.path.exists(model_path) and os.path.exists(preprocessor_path):
+        if os.path.exists(metadata_path):
+            fraud_model.load_model(model_path, preprocessor_path, metadata_path)
+        else:
+            fraud_model.load_model(model_path, preprocessor_path)
+        model_loaded = True
+    else:
+        model_loaded = False
+        print("Model files not found. Please upload the following files:")
+        print("- fraud_detection_model_xgboost_20250727_145448.joblib")
+        print("- preprocessor_20250727_145448.joblib")
+        print("- model_metadata_20250727_145448.joblib")
+except Exception as e:
+    model_loaded = False
+    print(f"Error loading model: {e}")
+
+def predict_fraud_risk(
+    transaction_amount,
+    card_type,
+    email_domain,
+    transaction_hour,
+    addr1,
+    addr2,
+    card1,
+    card2,
+    dist1,
+    c1, c2, c3, c4, c5, c6,
+    d1, d2, d3, d4, d5,
+    m1, m2, m3, m4, m5, m6
+):
+    """Predict fraud risk for a transaction"""
+
+    if not model_loaded:
+        return "❌ Model not loaded. Please contact administrator.", "", "", ""
+
+    try:
+        # Prepare transaction data (only an empty field is missing; 0 is a valid value)
+        transaction_data = {
+            'TransactionAmt': float(transaction_amount),
+            'card4': card_type,
+            'P_emaildomain': email_domain,
+            'R_emaildomain': email_domain,
+            'addr1': float(addr1) if addr1 is not None else None,
+            'addr2': float(addr2) if addr2 is not None else None,
+            'card1': float(card1) if card1 is not None else None,
+            'card2': float(card2) if card2 is not None else None,
+            'card3': float(transaction_amount),  # Often similar to transaction amount
+            'card5': 142.0,  # Default value
+            'card6': 'credit',  # Default value
+            'dist1': float(dist1) if dist1 is not None else None,
+            'dist2': float(dist1) if dist1 is not None else None,  # Often similar to dist1
+            'C1': float(c1),
+            'C2': float(c2),
+            'C3': float(c3),
+            'C4': float(c4),
+            'C5': float(c5),
+            'C6': float(c6),
+            'C7': 0.0,
+            'C8': 0.0,
+            'C9': 1.0,
+            'C10': 0.0,
+            'C11': 1.0,
+            'C12': 1.0,
+            'C13': 1.0,
+            'C14': 1.0,
+            'D1': float(d1),
+            'D2': float(d2),
+            'D3': float(d3),
+            'D4': float(d4),
+            'D5': float(d5),
+            'D10': 0.0,
+            'D15': 0.0,
+            'M1': m1,
+            'M2': m2,
+            'M3': m3,
+            'M4': m4,
+            'M5': m5,
+            'M6': m6,
+            'TransactionDT': transaction_hour * 3600  # Convert hour to seconds
+        }
+
+        # Make prediction
+        result = fraud_model.predict_single_transaction(transaction_data)
+
+        if 'error' in result:
+            return f"❌ {result['error']}", "", "", ""
+
+        # Format results
+        probability = result['fraud_probability']
+        risk_level = result['risk_level']
+        recommendation = result['recommendation']
+
+        # Create risk indicator
+        if probability >= 0.8:
+            risk_indicator = f"🔴 HIGH RISK ({probability:.1%})"
+        elif probability >= 0.5:
+            risk_indicator = f"🟡 MEDIUM RISK ({probability:.1%})"
+        elif probability >= 0.2:
+            risk_indicator = f"🟠 LOW RISK ({probability:.1%})"
+        else:
+            risk_indicator = f"🟢 VERY LOW RISK ({probability:.1%})"
+
+        return risk_indicator, f"{probability:.4f}", risk_level, recommendation
+
+    except Exception as e:
+        return f"❌ Error: {str(e)}", "", "", ""
+
+def predict_from_csv(file):
+    """Predict fraud risk for multiple transactions from CSV"""
+    # Every return path yields (summary, download path) to match the two Gradio outputs
+    if not model_loaded:
+        return "❌ Model not loaded. Please contact administrator.", None
+    if file is None:
+        return "❌ Please upload a CSV file.", None
+
+    try:
+        # Read the CSV; Gradio may pass a path string or an object with a .name attribute
+        df = pd.read_csv(getattr(file, "name", file))
+        if df.empty:
+            return "❌ The uploaded CSV contains no transactions.", None
+
+        # Make batch predictions
+        results_df = fraud_model.predict_batch(df)
+
+        # Save results
+        output_path = "fraud_predictions.csv"
+        results_df.to_csv(output_path, index=False)
+
+        # Create summary
+        total_transactions = len(results_df)
+        high_risk = len(results_df[results_df['fraud_probability'] >= 0.8])
+        medium_risk = len(results_df[(results_df['fraud_probability'] >= 0.5) & (results_df['fraud_probability'] < 0.8)])
+        low_risk = len(results_df[(results_df['fraud_probability'] >= 0.2) & (results_df['fraud_probability'] < 0.5)])
+        very_low_risk = len(results_df[results_df['fraud_probability'] < 0.2])
+
+        summary = f"""
+        📊 **Batch Prediction Summary**
+
+        Total Transactions: {total_transactions}
+        🔴 High Risk: {high_risk} ({high_risk/total_transactions:.1%})
+        🟡 Medium Risk: {medium_risk} ({medium_risk/total_transactions:.1%})
+        🟠 Low Risk: {low_risk} ({low_risk/total_transactions:.1%})
+        🟢 Very Low Risk: {very_low_risk} ({very_low_risk/total_transactions:.1%})
+
+        Results saved to: {output_path}
+        """
+
+        return summary, output_path
+    except Exception as e:
+        return f"❌ Error processing CSV: {str(e)}", None
+
+# Create Gradio interface
+with gr.Blocks(title="Fraud Detection System", theme=gr.themes.Soft()) as app:
+
+    gr.Markdown("""
+    # 🔒 Credit Card Fraud Detection System
+
+    This system uses machine learning to assess the risk of credit card transactions being fraudulent.
+    Enter transaction details below to get a risk assessment.
+
+    **Risk Levels:**
+    - 🔴 High Risk (≥80%): Block transaction immediately
+    - 🟡 Medium Risk (50-79%): Manual review required
+    - 🟠 Low Risk (20-49%): Monitor transaction
+    - 🟢 Very Low Risk (<20%): Process normally
+    """)
+
+    with gr.Tabs():
+
+        # Single Transaction Tab
+        with gr.TabItem("Single Transaction"):
+            with gr.Row():
+                with gr.Column():
+                    gr.Markdown("### Transaction Details")
+                    transaction_amount = gr.Number(label="Transaction Amount ($)", value=100.0)
+                    card_type = gr.Dropdown(
+                        choices=["visa", "mastercard", "american express", "discover"],
+                        label="Card Type",
+                        value="visa"
+                    )
+                    email_domain = gr.Textbox(label="Email Domain", value="gmail.com")
+                    transaction_hour = gr.Slider(0, 23, label="Transaction Hour", value=12)
+
+                    gr.Markdown("### Address & Card Info")
+                    addr1 = gr.Number(label="Address 1", value=325.0)
+                    addr2 = gr.Number(label="Address 2", value=87.0)
+                    card1 = gr.Number(label="Card 1", value=13553)
+                    card2 = gr.Number(label="Card 2", value=150.0)
+                    dist1 = gr.Number(label="Distance 1", value=19.0)
+
+                with gr.Column():
+                    gr.Markdown("### Transaction Counts")
+                    c1 = gr.Number(label="C1", value=1.0)
+                    c2 = gr.Number(label="C2", value=1.0)
+                    c3 = gr.Number(label="C3", value=0.0)
+                    c4 = gr.Number(label="C4", value=0.0)
+                    c5 = gr.Number(label="C5", value=0.0)
+                    c6 = gr.Number(label="C6", value=1.0)
+
+                    gr.Markdown("### Time Deltas")
+                    d1 = gr.Number(label="D1", value=0.0)
+                    d2 = gr.Number(label="D2", value=0.0)
+                    d3 = gr.Number(label="D3", value=0.0)
+                    d4 = gr.Number(label="D4", value=0.0)
+                    d5 = gr.Number(label="D5", value=20.0)
+
+                    gr.Markdown("### Match Features")
+                    m1 = gr.Dropdown(choices=["T", "F"], label="M1", value="T")
+                    m2 = gr.Dropdown(choices=["T", "F"], label="M2", value="T")
+                    m3 = gr.Dropdown(choices=["T", "F"], label="M3", value="T")
+                    m4 = gr.Dropdown(choices=["M0", "M1", "M2"], label="M4", value="M0")
+                    m5 = gr.Dropdown(choices=["T", "F"], label="M5", value="F")
+                    m6 = gr.Dropdown(choices=["T", "F"], label="M6", value="F")
+
+            predict_btn = gr.Button("🔍 Analyze Transaction", variant="primary", size="lg")
+
+            with gr.Row():
+                risk_output = gr.Textbox(label="Risk Assessment", lines=1)
+                probability_output = gr.Textbox(label="Fraud Probability", lines=1)
+
+            with gr.Row():
+                risk_level_output = gr.Textbox(label="Risk Level", lines=1)
+                recommendation_output = gr.Textbox(label="Recommendation", lines=2)
+
+            predict_btn.click(
+                predict_fraud_risk,
+                inputs=[
+                    transaction_amount, card_type, email_domain, transaction_hour,
+                    addr1, addr2, card1, card2, dist1,
+                    c1, c2, c3, c4, c5, c6,
+                    d1, d2, d3, d4, d5,
+                    m1, m2, m3, m4, m5, m6
+                ],
+                outputs=[risk_output, probability_output, risk_level_output, recommendation_output]
+            )
+
+        # Batch Processing Tab
+        with gr.TabItem("Batch Processing"):
+            gr.Markdown("""
+            ### Upload CSV File for Batch Processing
+
+            Upload a CSV file containing multiple transactions. The file should include the same columns
+            as used in single transaction prediction.
+            """)
+
+            file_upload = gr.File(label="Upload CSV File", file_types=[".csv"])
+            batch_btn = gr.Button("🔍 Process Batch", variant="primary")
+
+            batch_output = gr.Textbox(label="Batch Results", lines=10)
+            download_file = gr.File(label="Download Results")
+
+            batch_btn.click(
+                predict_from_csv,
+                inputs=[file_upload],
+                outputs=[batch_output, download_file]
+            )
+
+        # Model Info Tab
+        with gr.TabItem("Model Information"):
+            if model_loaded and fraud_model.metadata:
+                model_info = fraud_model.get_model_info()
+                gr.Markdown(f"""
+                ### Model Status
+                **Status:** ✅ {model_info.get('model_name', 'XGBoost')} Model Loaded
+                **AUC Score:** {model_info.get('auc_score', 'N/A')}
+                **Training Date:** {model_info.get('training_timestamp', 'N/A')}
+                **Features:** {model_info.get('feature_count', 'N/A')}
+
+                ### About This Model
+                This fraud detection system uses an **XGBoost classifier** trained on a comprehensive dataset
+                of credit card transactions. The model achieved high performance with advanced feature engineering
+                and ensemble learning techniques.
+
+                ### Model Performance
+                - **Algorithm**: XGBoost (Extreme Gradient Boosting)
+                - **AUC Score**: {model_info.get('auc_score', 'N/A')}
+                - **Features Used**: {model_info.get('feature_count', 'N/A')} engineered features
+                - **Training Method**: Cross-validation with stratified sampling
+                - **Speed**: Real-time predictions (<100ms)
+
+                ### Features Used
+                The model processes over 40 features including:
+                - **Transaction Details**: Amount, timing, frequency patterns
+                - **Card Information**: Type, issuer details, security features
+                - **User Behaviour**: Email domains, address patterns, historical counts
+                - **Device & Session**: Geographic data, device fingerprinting
+                - **Engineered Features**: Ratios, transformations, temporal patterns
+
+                ### XGBoost Advantages
+                - **High Accuracy**: Excellent performance on tabular data
+                - **Feature Importance**: Clear understanding of decision factors
+                - **Robustness**: Handles missing values and outliers well
+                - **Scalability**: Efficient training and inference
+                """)
+            else:
+                gr.Markdown(f"""
+                ### Model Status
+                **Status:** {'✅ Basic Model Loaded' if model_loaded else '❌ Not Loaded'}
+
+                ### About This Model
+                This fraud detection system uses advanced machine learning algorithms to assess transaction risk.
+                The model was trained on a large dataset of credit card transactions and uses multiple features
+                including transaction amount, card details, user behaviour patterns, and timing information.
+
+                ### Features Used
+                - Transaction amount and timing
+                - Card information (type, numbers)
+                - Email domain patterns
+                - Address information
+                - User behaviour counts
+                - Device and session data
+
+                ### Model Performance
+                - **Algorithm**: Ensemble methods (Random Forest, XGBoost, LightGBM)
+                - **Accuracy**: High precision in detecting fraudulent transactions
+                - **Speed**: Real-time predictions
+                """)
+
+# Launch the app
+if __name__ == "__main__":
+    app.launch()
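Note on the risk thresholds: the same 0.8 / 0.5 / 0.2 cut-offs are written out separately in `predict_fraud_risk`, in `predict_from_csv`, and twice more in `model_wrapper.py`. A minimal sketch of keeping them in one place is shown below. The names `RISK_BANDS`, `risk_level`, and `summarise` are hypothetical and do not exist in this commit; only the thresholds and labels are taken from the diff.

```python
import pandas as pd

# Lower bounds and labels as used in app.py and model_wrapper.py, highest first.
RISK_BANDS = [
    (0.8, "High Risk"),
    (0.5, "Medium Risk"),
    (0.2, "Low Risk"),
]
ALL_LABELS = [label for _, label in RISK_BANDS] + ["Very Low Risk"]


def risk_level(probability: float) -> str:
    """Return the label of the first band whose lower bound the probability meets."""
    for threshold, label in RISK_BANDS:
        if probability >= threshold:
            return label
    return "Very Low Risk"


def summarise(probabilities: pd.Series) -> dict:
    """Count how many probabilities fall into each risk band."""
    counts = probabilities.map(risk_level).value_counts()
    return {label: int(counts.get(label, 0)) for label in ALL_LABELS}


print(summarise(pd.Series([0.9, 0.6, 0.1, 0.1])))
```

With a helper like this, the single-transaction path, the batch summary, and the wrapper would all change together if a threshold were ever adjusted.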
model_wrapper.py ADDED
@@ -0,0 +1,227 @@
+import joblib
+import pandas as pd
+import numpy as np
+from sklearn.preprocessing import StandardScaler, LabelEncoder
+from sklearn.impute import SimpleImputer
+
+class FraudDetectionModel:
+    """Wrapper class for the fraud detection model"""
+
+    def __init__(self, model_path=None, preprocessor_path=None, metadata_path=None):
+        self.model = None
+        self.preprocessor = None
+        self.metadata = None
+
+        if model_path and preprocessor_path:
+            self.load_model(model_path, preprocessor_path, metadata_path)
+
+    def load_model(self, model_path, preprocessor_path, metadata_path=None):
+        """Load the trained model, preprocessor, and metadata"""
+        self.model = joblib.load(model_path)
+        self.preprocessor = joblib.load(preprocessor_path)
+
+        if metadata_path:
+            self.metadata = joblib.load(metadata_path)
+            print(f"Loaded {self.metadata['model_name']} model with AUC: {self.metadata['auc_score']:.4f}")
+        else:
+            print("Model and preprocessor loaded successfully!")
+
+    def predict_single_transaction(self, transaction_data):
+        """
+        Predict fraud probability for a single transaction
+
+        Args:
+            transaction_data (dict): Dictionary containing transaction features
+
+        Returns:
+            dict: Prediction results with probability and risk level
+        """
+        if self.model is None or self.preprocessor is None:
+            raise ValueError("Model not loaded. Please load model first.")
+
+        # Convert to DataFrame
+        df = pd.DataFrame([transaction_data])
+
+        # Add TransactionID if not present (required for preprocessing)
+        if 'TransactionID' not in df.columns:
+            df['TransactionID'] = 'temp_id'
+
+        try:
+            # Preprocess the data
+            X_processed, _ = self.preprocessor.preprocess(df, fit=False)
+
+            # Make prediction
+            fraud_probability = self.model.predict_proba(X_processed)[0, 1]
+
+            # Determine risk level
+            if fraud_probability >= 0.8:
+                risk_level = "High Risk"
+                recommendation = "Block transaction and investigate immediately"
+            elif fraud_probability >= 0.5:
+                risk_level = "Medium Risk"
+                recommendation = "Review transaction manually"
+            elif fraud_probability >= 0.2:
+                risk_level = "Low Risk"
+                recommendation = "Monitor transaction"
+            else:
+                risk_level = "Very Low Risk"
+                recommendation = "Process normally"
+
+            return {
+                "fraud_probability": float(fraud_probability),
+                "risk_level": risk_level,
+                "recommendation": recommendation,
+                "is_suspicious": fraud_probability >= 0.5
+            }
+
+        except Exception as e:
+            return {
+                "error": f"Prediction failed: {str(e)}",
+                "fraud_probability": None,
+                "risk_level": "Unknown",
+                "recommendation": "Manual review required"
+            }
+
+    def predict_batch(self, transactions_df):
+        """
+        Predict fraud probabilities for multiple transactions
+
+        Args:
+            transactions_df (pd.DataFrame): DataFrame containing transaction data
+
+        Returns:
+            pd.DataFrame: DataFrame with predictions added
+        """
+        if self.model is None or self.preprocessor is None:
+            raise ValueError("Model not loaded. Please load model first.")
+
+        # Preprocess the data
+        X_processed, _ = self.preprocessor.preprocess(transactions_df, fit=False)
+
+        # Make predictions
+        fraud_probabilities = self.model.predict_proba(X_processed)[:, 1]
+
+        # Add predictions to original DataFrame
+        result_df = transactions_df.copy()
+        result_df['fraud_probability'] = fraud_probabilities
+        result_df['is_suspicious'] = fraud_probabilities >= 0.5
+
+        # Add risk levels
+        risk_levels = []
+        for prob in fraud_probabilities:
+            if prob >= 0.8:
+                risk_levels.append("High Risk")
+            elif prob >= 0.5:
+                risk_levels.append("Medium Risk")
+            elif prob >= 0.2:
+                risk_levels.append("Low Risk")
+            else:
+                risk_levels.append("Very Low Risk")
+
+        result_df['risk_level'] = risk_levels
+
+        return result_df
+
+    def get_feature_importance(self, top_n=20):
+        """Get feature importance if available"""
+        if self.model is None:
+            raise ValueError("Model not loaded.")
+
+        if hasattr(self.model, 'feature_importances_'):
+            feature_names = self.preprocessor.feature_names
+            importance_df = pd.DataFrame({
+                'feature': feature_names,
+                'importance': self.model.feature_importances_
+            }).sort_values('importance', ascending=False).head(top_n)
+
+            return importance_df
+        else:
+            return "Feature importance not available for this model type."
+
+    def get_model_info(self):
+        """Get information about the loaded model"""
+        if self.model is None:
+            return "No model loaded."
+
+        info = {
+            "model_type": type(self.model).__name__,
+            "feature_count": len(self.preprocessor.feature_names) if self.preprocessor else "Unknown",
+            "preprocessing_steps": [
+                "Categorical encoding",
+                "Feature engineering",
+                "Missing value imputation",
+                "Feature scaling"
+            ]
+        }
+
+        # Add metadata information if available
+        if self.metadata:
+            info.update({
+                "model_name": self.metadata.get('model_name', 'Unknown'),
+                "auc_score": self.metadata.get('auc_score', 'Unknown'),
+                "training_timestamp": self.metadata.get('timestamp', 'Unknown'),
+                "model_file": self.metadata.get('model_file', 'Unknown'),
+                "preprocessor_file": self.metadata.get('preprocessor_file', 'Unknown')
+            })
+
+        return info
+
+# Example usage and testing
+if __name__ == "__main__":
+    # Initialize model wrapper with specific files
+    fraud_model = FraudDetectionModel(
+        model_path="fraud_detection_model_xgboost_20250727_145448.joblib",
+        preprocessor_path="preprocessor_20250727_145448.joblib",
+        metadata_path="model_metadata_20250727_145448.joblib"
+    )
+
+    # Example transaction data for testing
+    sample_transaction = {
+        'TransactionAmt': 150.0,
+        'card1': 13553,
+        'card2': 150.0,
+        'card3': 150.0,
+        'card4': 'discover',
+        'card5': 142.0,
+        'card6': 'credit',
+        'addr1': 325.0,
+        'addr2': 87.0,
+        'dist1': 19.0,
+        'dist2': 19.0,
+        'P_emaildomain': 'gmail.com',
+        'R_emaildomain': 'gmail.com',
+        'C1': 1.0,
+        'C2': 1.0,
+        'C3': 0.0,
+        'C4': 0.0,
+        'C5': 0.0,
+        'C6': 1.0,
+        'C7': 0.0,
+        'C8': 0.0,
+        'C9': 1.0,
+        'C10': 0.0,
+        'C11': 1.0,
+        'C12': 1.0,
+        'C13': 1.0,
+        'C14': 1.0,
+        'D1': 0.0,
+        'D2': 0.0,
+        'D3': 0.0,
+        'D4': 0.0,
+        'D5': 20.0,
+        'D10': 0.0,
+        'D15': 0.0,
+        'M1': 'T',
+        'M2': 'T',
+        'M3': 'T',
+        'M4': 'M0',
+        'M5': 'F',
+        'M6': 'F',
+        'TransactionDT': 86400
+    }
+
+    print("Sample transaction for testing:")
+    print(sample_transaction)
+    print("\n" + "="*50)
+    print("Model wrapper created successfully!")
+    print("To use: load your model files and call predict_single_transaction()")
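Note on the wrapper's dependencies: `FraudDetectionModel` calls `self.preprocessor.preprocess(df, fit=False)`, expects a two-element return value, and reads `self.preprocessor.feature_names`. The class behind the pickled preprocessor is not among the four uploaded files, and unpickling a custom class requires its definition to be importable. The sketch below shows only the interface the wrapper appears to rely on. `StubPreprocessor` and `StubModel` are hypothetical stand-ins, and the two features and the toy score are invented for illustration; they are not the author's feature engineering or model.

```python
import numpy as np
import pandas as pd


class StubPreprocessor:
    """Hypothetical stand-in exposing the interface model_wrapper.py calls."""

    feature_names = ["TransactionAmt_log", "hour"]

    def preprocess(self, df, fit=False):
        # Returns (features, second value); the wrapper discards the second value.
        features = pd.DataFrame({
            "TransactionAmt_log": np.log1p(df["TransactionAmt"].astype(float)),
            "hour": (df["TransactionDT"] // 3600) % 24,
        })
        return features, None


class StubModel:
    """Hypothetical classifier with a scikit-learn style predict_proba."""

    def predict_proba(self, X):
        # Toy score, not a trained model: scaled log-amount clipped to [0, 1].
        p = np.clip(X["TransactionAmt_log"].to_numpy() / 10.0, 0.0, 1.0)
        return np.column_stack([1.0 - p, p])


df = pd.DataFrame([{"TransactionAmt": 150.0, "TransactionDT": 86400}])
X, _ = StubPreprocessor().preprocess(df, fit=False)
fraud_probability = StubModel().predict_proba(X)[0, 1]
print(list(X.columns), float(fraud_probability))
```

Assigning objects shaped like these to `model` and `preprocessor` on a `FraudDetectionModel` instance would let the wrapper's control flow be exercised without the `.joblib` artifacts.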
requirements.txt ADDED
@@ -0,0 +1,9 @@
+gradio
+pandas
+numpy
+scikit-learn
+xgboost
+lightgbm
+joblib
+matplotlib
+seaborn
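Note on the unpinned requirements: the app loads `.joblib` artifacts, and pickled scikit-learn objects are not guaranteed to load under a different library version from the one that saved them. One way to record the versions in a working environment, so they can be pinned, is sketched below. The helper names `installed_versions` and `pin_lines` are hypothetical; the package list is the one from `requirements.txt`.

```python
from importlib import metadata

# Package names as listed in requirements.txt.
PACKAGES = [
    "gradio", "pandas", "numpy", "scikit-learn", "xgboost",
    "lightgbm", "joblib", "matplotlib", "seaborn",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def pin_lines(versions):
    """Render requirements lines, pinning only packages that are installed."""
    return [f"{name}=={version}" if version else name
            for name, version in versions.items()]


if __name__ == "__main__":
    print("\n".join(pin_lines(installed_versions(PACKAGES))))
```

Running this in the environment where the model was trained would give the versions most likely to load the artifacts cleanly.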