leynessa committed on
Commit 33e4db8 · verified · 1 Parent(s): 1f06e89

Upload 4 files

Files changed (4)
  1. ReadMe.md +170 -0
  2. app.py +378 -0
  3. gitignore.txt +55 -0
  4. requirements.txt +8 -3
ReadMe.md ADDED
@@ -0,0 +1,170 @@
+ # Transaction Purpose Classification System
+
+ A machine learning system that classifies financial transactions based on their purpose text, using multiple algorithms including Naive Bayes, Logistic Regression, and Support Vector Machines.
+
+ ## 🌟 Features
+
+ - **Multiple ML Models**: Compare performance across different algorithms
+ - **Text Preprocessing**: Advanced text cleaning with NLTK
+ - **Interactive Web Interface**: Built with Streamlit
+ - **Real-time Classification**: Classify new transactions instantly
+ - **Model Comparison**: Detailed analysis of model performance
+ - **LLM Integration Guide**: Conceptual approach for transformer-based models
+
+ ## 🚀 Live Demo
+
+ Visit the live demo on Hugging Face Spaces: [Your Space URL]
+
+ ## 📊 Model Performance
+
+ The system trains and compares three different models:
+
+ - **Naive Bayes**: Fast and effective for text classification
+ - **Logistic Regression**: Good baseline with interpretable results
+ - **Support Vector Machine**: Often achieves high accuracy
+
+ ## 🛠️ Local Development
+
+ ### Prerequisites
+
+ - Python 3.8+
+ - pip
+
+ ### Installation
+
+ 1. Clone the repository:
+
+ ```bash
+ git clone https://github.com/yourusername/transaction-classification.git
+ cd transaction-classification
+ ```
+
+ 2. Install dependencies:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. Run the application:
+
+ ```bash
+ streamlit run app.py
+ ```
+
+ 4. Open your browser and go to http://localhost:8501
+
+ ## 📁 Project Structure
+
+ ```
+ transaction-classification/
+ ├── app.py              # Main Streamlit application
+ ├── requirements.txt    # Python dependencies
+ ├── README.md           # Project documentation
+ └── .gitignore          # Git ignore file
+ ```
+
+ 🔧 How It Works
51
+ 1. Data Preprocessing
52
+ The system preprocesses transaction text by:
53
+
54
+ Converting to lowercase
55
+ Removing punctuation and digits
56
+ Removing stopwords
57
+ Lemmatizing words
58
+ Filtering short words
59
+
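+ These steps mirror the `preprocess_text` function defined in `app.py`, condensed here for reference:
+
+ ```python
+ import re
+ from nltk.corpus import stopwords
+ from nltk.stem import WordNetLemmatizer
+
+ stop_words = set(stopwords.words('english'))
+ lemmatizer = WordNetLemmatizer()
+
+ def preprocess_text(text):
+     text = str(text).lower()
+     text = re.sub(r'[^\w\s]', '', text)  # remove punctuation
+     text = re.sub(r'\d+', '', text)      # remove digits
+     # drop stopwords and short tokens, lemmatize the rest
+     words = [lemmatizer.lemmatize(w) for w in text.split()
+              if w not in stop_words and len(w) > 2]
+     return ' '.join(words)
+ ```
+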
+ ### 2. Feature Extraction
+
+ Uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert text into numerical features suitable for machine learning.
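+
+ In `train_models()` the vectorizer uses unigrams and bigrams, capped at 1,000 features; given the training and test texts `X_train` and `X_test` from the split:
+
+ ```python
+ from sklearn.feature_extraction.text import TfidfVectorizer
+
+ vectorizer = TfidfVectorizer(max_features=1000, ngram_range=(1, 2))
+ X_train_vec = vectorizer.fit_transform(X_train)  # learn vocabulary and IDF weights
+ X_test_vec = vectorizer.transform(X_test)        # reuse the fitted vocabulary
+ ```
+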
+ ### 3. Model Training
+
+ Three models are trained and compared (a condensed training loop follows this list):
+
+ - **Naive Bayes**: Probabilistic classifier based on Bayes' theorem
+ - **Logistic Regression**: Linear model for classification
+ - **SVM**: Support Vector Machine, effective for high-dimensional data
+
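+ The loop below is condensed from `train_models()` in `app.py`; `X_train_vec`, `X_test_vec`, `y_train`, and `y_test` come from the TF-IDF step above:
+
+ ```python
+ from sklearn.naive_bayes import MultinomialNB
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.svm import LinearSVC
+ from sklearn.metrics import accuracy_score
+
+ models = {
+     "Naive Bayes": MultinomialNB(),
+     "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
+     "SVM (LinearSVC)": LinearSVC(random_state=42),
+ }
+
+ accuracies = {}
+ for name, model in models.items():
+     model.fit(X_train_vec, y_train)  # fit on TF-IDF features
+     accuracies[name] = accuracy_score(y_test, model.predict(X_test_vec))
+
+ best_model_name = max(accuracies, key=accuracies.get)  # keep the top scorer
+ ```
+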
+ ### 4. Classification
+
+ The best-performing model is used to classify new transactions into categories like:
+
+ - Rent
+ - Groceries
+ - Utilities
+ - Subscriptions
+ - Transportation
+ - Dining
+ - Shopping
+ - Healthcare
+ - Fitness
+
+ ## 🤖 Large Language Model Approach
+
+ ### Conceptual Implementation
+
+ For improved performance, the system could be enhanced using transformer models:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ # Load pre-trained BERT model
+ tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
+ model = AutoModelForSequenceClassification.from_pretrained(
+     'bert-base-uncased',
+     num_labels=num_classes
+ )
+
+ # Fine-tune on transaction data
+ # (See full implementation in the app)
+ ```
+
+ ### Benefits of the LLM Approach
+
+ - **Better Context Understanding**: Captures semantic meaning
+ - **Higher Accuracy**: State-of-the-art performance
+ - **Transfer Learning**: Leverages pre-trained knowledge
+ - **Robust to Variations**: Handles different phrasings better
+
+ ### Implementation Considerations
+
+ - **Computational Requirements**: Needs a GPU for training
+ - **Training Time**: Longer than traditional ML
+ - **Model Size**: Larger deployment footprint
+ - **Complexity**: More complex pipeline
+
+ ## 📈 Model Evaluation
+
+ Models are evaluated using the following metrics (see the sketch after this list):
+
+ - **Accuracy**: Overall correctness
+ - **Precision**: Fraction of positive predictions that are correct
+ - **Recall**: Ability to find all positive cases
+ - **F1-Score**: Harmonic mean of precision and recall
+
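+ All four metrics are available from scikit-learn, matching the report shown on the Model Training page; `y_test` and `y_pred` are the held-out labels and predictions:
+
+ ```python
+ from sklearn.metrics import accuracy_score, classification_report
+
+ print("Accuracy:", accuracy_score(y_test, y_pred))
+ # per-class precision, recall, F1, plus macro and weighted averages
+ print(classification_report(y_test, y_pred))
+ ```
+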
+ ## 🔄 API Usage
+
+ While the main interface is web-based, the core functionality can be adapted for API usage:
+
+ ```python
+ # Example classification
+ def classify_transaction(purpose_text):
+     cleaned_text = preprocess_text(purpose_text)
+     vectorized = vectorizer.transform([cleaned_text])
+     prediction = model.predict(vectorized)[0]
+     return prediction
+
+ # Usage
+ result = classify_transaction("Monthly apartment rent payment")
+ print(f"Predicted category: {result}")
+ ```
+
+ ## 🚀 Deployment to Hugging Face Spaces
+
+ ### Step 1: Create a Space
+
+ 1. Go to Hugging Face Spaces
+ 2. Click "Create new Space"
+ 3. Choose "Streamlit" as the SDK
+ 4. Set the Space name and visibility
+
+ ### Step 2: Upload Files
+
+ Upload these files to your Space:
+
+ - app.py
+ - requirements.txt
+ - README.md
+
+ ### Step 3: Configuration
+
+ The Space will automatically detect the Streamlit app and deploy it.
+
+ ### Step 4: Access
+
+ Your app will be available at: https://huggingface.co/spaces/yourusername/yourspacename
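+
+ Note: Spaces reads its configuration from YAML front matter at the top of README.md; a minimal example (the values below are placeholders to adapt):
+
+ ```yaml
+ ---
+ title: Transaction Classification
+ emoji: 💳
+ sdk: streamlit
+ sdk_version: "1.28.1"
+ app_file: app.py
+ pinned: false
+ ---
+ ```
+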
+ ## 📝 Sample Data
+
+ The system includes sample transaction data covering common categories:
+
+ | Purpose Text | Transaction Type |
+ | --- | --- |
+ | "Monthly apartment rent payment" | rent |
+ | "Grocery shopping at walmart" | groceries |
+ | "Electric bill payment" | utilities |
+ | "Netflix monthly subscription" | subscription |
+ | "Gas station fuel" | transportation |
+
+ ## 🤝 Contributing
+
+ 1. Fork the repository
+ 2. Create a feature branch
+ 3. Make your changes
+ 4. Add tests if applicable
+ 5. Submit a pull request
+
+ ## 📄 License
+
+ This project is licensed under the MIT License - see the LICENSE file for details.
+
+ ## 🔮 Future Enhancements
+
+ - Add more transaction categories
+ - Implement ensemble methods
+ - Add confidence scoring
+ - Include data upload functionality
+ - Add model retraining capability
+ - Implement an A/B testing framework
+ - Add logging and monitoring
app.py ADDED
@@ -0,0 +1,378 @@
+ import streamlit as st
+ import pandas as pd
+ import numpy as np
+ import re
+ from sklearn.feature_extraction.text import TfidfVectorizer
+ from sklearn.naive_bayes import MultinomialNB
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.svm import LinearSVC
+ from sklearn.metrics import classification_report, accuracy_score
+ from sklearn.model_selection import train_test_split
+ import nltk
+ from nltk.corpus import stopwords
+ from nltk.stem import WordNetLemmatizer
+ import plotly.express as px
+
+ # Download required NLTK data (cached so it runs once per session)
+ @st.cache_resource
+ def download_nltk_data():
+     try:
+         nltk.data.find('tokenizers/punkt')
+         nltk.data.find('corpora/stopwords')
+         nltk.data.find('corpora/wordnet')
+     except LookupError:
+         nltk.download('punkt', quiet=True)
+         nltk.download('stopwords', quiet=True)
+         nltk.download('wordnet', quiet=True)
+         nltk.download('omw-1.4', quiet=True)
+
+ download_nltk_data()
+
+ # Initialize preprocessing tools
+ stop_words = set(stopwords.words('english'))
+ lemmatizer = WordNetLemmatizer()
+
+ def preprocess_text(text):
+     """Clean and preprocess text for classification"""
+     if pd.isna(text):
+         return ""
+
+     text = str(text).lower()
+     text = re.sub(r'[^\w\s]', '', text)  # remove punctuation
+     text = re.sub(r'\d+', '', text)      # remove digits
+
+     words = text.split()
+     words = [lemmatizer.lemmatize(word) for word in words
+              if word not in stop_words and len(word) > 2]
+
+     return ' '.join(words)
+
+ # Sample data for demonstration
+ @st.cache_data
+ def create_sample_data():
+     """Create sample transaction data"""
+     sample_data = [
+         ("Monthly apartment rent payment", "rent"),
+         ("Grocery shopping at walmart", "groceries"),
+         ("Electric bill payment", "utilities"),
+         ("Netflix monthly subscription", "subscription"),
+         ("Gas station fuel", "transportation"),
+         ("Restaurant dinner", "dining"),
+         ("Apartment rent for december", "rent"),
+         ("Weekly grocery shopping", "groceries"),
+         ("Water bill payment", "utilities"),
+         ("Spotify premium subscription", "subscription"),
+         ("Bus fare to work", "transportation"),
+         ("Coffee shop breakfast", "dining"),
+         ("Monthly rent payment", "rent"),
+         ("Food shopping at target", "groceries"),
+         ("Internet bill", "utilities"),
+         ("Amazon Prime membership", "subscription"),
+         ("Uber ride home", "transportation"),
+         ("Pizza delivery", "dining"),
+         ("Rent for apartment", "rent"),
+         ("Supermarket groceries", "groceries"),
+         ("Phone bill payment", "utilities"),
+         ("YouTube premium", "subscription"),
+         ("Train ticket", "transportation"),
+         ("Fast food lunch", "dining"),
+         ("Office supplies", "shopping"),
+         ("Medical appointment", "healthcare"),
+         ("Gym membership", "fitness"),
+         ("Book purchase", "shopping"),
+         ("Doctor visit", "healthcare"),
+         ("Fitness class", "fitness"),
+         ("Clothing purchase", "shopping"),
+         ("Pharmacy prescription", "healthcare"),
+         ("Personal trainer", "fitness"),
+         ("Electronics store", "shopping"),
+         ("Dentist appointment", "healthcare"),
+         ("Yoga class", "fitness"),
+         ("Gift for friend", "shopping"),
+         ("Eye exam", "healthcare"),
+         ("Swimming pool fee", "fitness"),
+         ("Home improvement", "shopping")
+     ]
+
+     df = pd.DataFrame(sample_data, columns=['purpose_text', 'transaction_type'])
+     return df
+
+ @st.cache_resource
+ def train_models(df):
+     """Train multiple models and return the best one"""
+     # Preprocess data
+     df['cleaned_purpose'] = df['purpose_text'].apply(preprocess_text)
+
+     X = df["cleaned_purpose"]
+     y = df["transaction_type"]
+
+     # Train-test split
+     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
+
+     # TF-IDF Vectorization
+     vectorizer = TfidfVectorizer(max_features=1000, ngram_range=(1, 2))
+     X_train_vec = vectorizer.fit_transform(X_train)
+     X_test_vec = vectorizer.transform(X_test)
+
+     # Train models
+     models = {
+         "Naive Bayes": MultinomialNB(),
+         "Logistic Regression": LogisticRegression(max_iter=1000, random_state=42),
+         "SVM (LinearSVC)": LinearSVC(random_state=42)
+     }
+
+     results = {}
+     trained_models = {}
+
+     for name, model in models.items():
+         model.fit(X_train_vec, y_train)
+         y_pred = model.predict(X_test_vec)
+         acc = accuracy_score(y_test, y_pred)
+         results[name] = {
+             'accuracy': acc,
+             'predictions': y_pred,
+             'actual': y_test
+         }
+         trained_models[name] = model
+
+     # Find best model
+     best_model_name = max(results, key=lambda x: results[x]['accuracy'])
+     best_model = trained_models[best_model_name]
+
+     return best_model, vectorizer, results, trained_models
+
+ def main():
+     st.set_page_config(
+         page_title="Transaction Classification System",
+         page_icon="💳",
+         layout="wide"
+     )
+
+     st.title("💳 Transaction Purpose Classification")
+     st.markdown("---")
+
+     # Sidebar
+     st.sidebar.title("Navigation")
+     page = st.sidebar.radio("Choose a page:", ["🏠 Home", "📊 Model Training", "🔍 Classification", "📈 Model Comparison"])
+
+     # Load data
+     df = create_sample_data()
+
+     if page == "🏠 Home":
+         st.header("Welcome to the Transaction Classification System")
+
+         col1, col2 = st.columns(2)
+
+         with col1:
+             st.subheader("📖 Project Overview")
+             st.write("""
+             This system classifies financial transactions based on their purpose text using machine learning.
+
+             **Features:**
+             - Multiple ML models (Naive Bayes, Logistic Regression, SVM)
+             - Text preprocessing with NLTK
+             - Interactive model comparison
+             - Real-time transaction classification
+             """)
+
+         with col2:
+             st.subheader("📊 Sample Data")
+             st.dataframe(df.head(10))
+
+         st.subheader("🏷️ Transaction Types")
+         type_counts = df['transaction_type'].value_counts()
+         fig = px.pie(values=type_counts.values, names=type_counts.index, title="Distribution of Transaction Types")
+         st.plotly_chart(fig, use_container_width=True)
+
+     elif page == "📊 Model Training":
+         st.header("Model Training & Evaluation")
+
+         # Train models
+         with st.spinner("Training models..."):
+             best_model, vectorizer, results, trained_models = train_models(df)
+
+         col1, col2 = st.columns(2)
+
+         with col1:
+             st.subheader("📈 Model Performance")
+
+             # Create results dataframe
+             results_df = pd.DataFrame({
+                 'Model': list(results.keys()),
+                 'Accuracy': [results[model]['accuracy'] for model in results.keys()]
+             })
+
+             fig = px.bar(results_df, x='Model', y='Accuracy', title="Model Accuracy Comparison")
+             fig.update_layout(yaxis_range=[0, 1])
+             st.plotly_chart(fig, use_container_width=True)
+
+             st.dataframe(results_df)
+
+         with col2:
+             st.subheader("🎯 Best Model Details")
+             best_model_name = max(results, key=lambda x: results[x]['accuracy'])
+             st.success(f"**Best Model:** {best_model_name}")
+             st.metric("Accuracy", f"{results[best_model_name]['accuracy']:.3f}")
+
+             # Classification report
+             st.subheader("📋 Classification Report")
+             y_test = results[best_model_name]['actual']
+             y_pred = results[best_model_name]['predictions']
+
+             report = classification_report(y_test, y_pred, output_dict=True)
+             report_df = pd.DataFrame(report).transpose()
+             st.dataframe(report_df.round(3))
+
+         # Store models in session state
+         st.session_state.best_model = best_model
+         st.session_state.vectorizer = vectorizer
+         st.session_state.trained_models = trained_models
+
+     elif page == "🔍 Classification":
+         st.header("Classify New Transaction")
+
+         # Check if models are trained
+         if 'best_model' not in st.session_state:
+             st.warning("Please train the models first by visiting the 'Model Training' page.")
+             return
+
+         # Input form
+         with st.form("classification_form"):
+             purpose_text = st.text_area("Enter transaction purpose:",
+                                         placeholder="e.g., Monthly apartment rent payment",
+                                         height=100)
+
+             submitted = st.form_submit_button("Classify Transaction")
+
+         if submitted and purpose_text:
+             # Preprocess input
+             cleaned_text = preprocess_text(purpose_text)
+
+             # Make prediction
+             best_model = st.session_state.best_model
+             vectorized_text = st.session_state.vectorizer.transform([cleaned_text])
+             prediction = best_model.predict(vectorized_text)[0]
+
+             # LinearSVC has no predict_proba, so fall back to a softmax over
+             # its decision scores to get pseudo-probabilities for the chart
+             if hasattr(best_model, "predict_proba"):
+                 prediction_proba = best_model.predict_proba(vectorized_text)[0]
+             else:
+                 scores = np.atleast_1d(best_model.decision_function(vectorized_text)[0])
+                 exp_scores = np.exp(scores - scores.max())
+                 prediction_proba = exp_scores / exp_scores.sum()
+
+             # Get class labels
+             classes = best_model.classes_
+
+             # Display results
+             col1, col2 = st.columns(2)
+
+             with col1:
+                 st.subheader("🎯 Classification Result")
+                 st.success(f"**Predicted Type:** {prediction}")
+                 st.info(f"**Original Text:** {purpose_text}")
+                 st.info(f"**Processed Text:** {cleaned_text}")
+
+             with col2:
+                 st.subheader("📊 Prediction Confidence")
+                 proba_df = pd.DataFrame({
+                     'Transaction Type': classes,
+                     'Probability': prediction_proba
+                 }).sort_values('Probability', ascending=False)
+
+                 fig = px.bar(proba_df, x='Probability', y='Transaction Type',
+                              orientation='h', title="Prediction Probabilities")
+                 st.plotly_chart(fig, use_container_width=True)
+
+     elif page == "📈 Model Comparison":
+         st.header("Detailed Model Comparison")
+
+         # Check if models are trained
+         if 'trained_models' not in st.session_state:
+             st.warning("Please train the models first by visiting the 'Model Training' page.")
+             return
+
+         # Model comparison
+         st.subheader("🔍 Model Analysis")
+
+         # Get sample predictions for comparison
+         sample_texts = [
+             "Monthly rent payment",
+             "Grocery shopping",
+             "Netflix subscription",
+             "Gas station",
+             "Restaurant dinner"
+         ]
+
+         comparison_data = []
+         for text in sample_texts:
+             cleaned = preprocess_text(text)
+             vectorized = st.session_state.vectorizer.transform([cleaned])
+
+             row = {'Text': text, 'Cleaned': cleaned}
+             for model_name, model in st.session_state.trained_models.items():
+                 row[model_name] = model.predict(vectorized)[0]
+
+             comparison_data.append(row)
+
+         comparison_df = pd.DataFrame(comparison_data)
+         st.dataframe(comparison_df, use_container_width=True)
+
+         # LLM/Transformer approach explanation
+         st.subheader("🤖 Large Language Model Approach")
+
+         with st.expander("Click to see LLM implementation strategy"):
+             st.markdown("""
+ ### Using Transformer Models for Transaction Classification
+
+ **Approach:**
+ 1. **Pre-trained Model Selection**: Use `bert-base-uncased` or `distilbert-base-uncased`
+ 2. **Tokenization**: Use HuggingFace's tokenizer for the selected model
+ 3. **Model Architecture**: Add a classification head on top of the transformer
+ 4. **Fine-tuning**: Train on labeled transaction data
+
+ **Code Example:**
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ from transformers import Trainer, TrainingArguments
+
+ # Load pre-trained model
+ tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
+ model = AutoModelForSequenceClassification.from_pretrained(
+     'bert-base-uncased',
+     num_labels=len(unique_labels)
+ )
+
+ # Tokenize data
+ def tokenize_function(examples):
+     return tokenizer(examples['purpose_text'], truncation=True, padding=True)
+
+ # Fine-tune model
+ training_args = TrainingArguments(
+     output_dir='./results',
+     num_train_epochs=3,
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=64,
+     warmup_steps=500,
+     weight_decay=0.01,
+ )
+
+ trainer = Trainer(
+     model=model,
+     args=training_args,
+     train_dataset=train_dataset,
+     eval_dataset=eval_dataset,
+ )
+
+ trainer.train()
+ ```
+
+ **Benefits:**
+ - Better semantic understanding
+ - Handles context better than TF-IDF
+ - Can capture complex patterns
+ - State-of-the-art performance
+
+ **Drawbacks:**
+ - Requires more computational resources
+ - Longer training time
+ - More complex deployment
+ """)
+
+ if __name__ == "__main__":
+     main()
gitignore.txt ADDED
@@ -0,0 +1,55 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ venv/
+ env/
+ ENV/
+
+ # IDE
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+
+ # Model files
+ *.pkl
+ *.joblib
+
+ # Data files
+ *.csv
+ *.json
+ data/
+
+ # Logs
+ *.log
+
+ # Streamlit
+ .streamlit/
requirements.txt CHANGED
@@ -1,3 +1,8 @@
- altair
- pandas
- streamlit
+
+ streamlit==1.28.1
+ pandas==2.0.3
+ numpy==1.24.3
+ scikit-learn==1.3.0
+ nltk==3.8.1
+ plotly==5.17.0
+ joblib==1.3.2