Transaction Purpose Classification System
A machine learning system that classifies financial transactions based on their purpose text using multiple algorithms including Naive Bayes, Logistic Regression, and Support Vector Machines.
🌟 Features

Multiple ML Models: Compare performance across different algorithms
Text Preprocessing: Text cleaning with NLTK (stopword removal, lemmatization)
Interactive Web Interface: Built with Streamlit
Real-time Classification: Classify new transactions instantly
Model Comparison: Accuracy, precision, recall, and F1-score for each model
LLM Integration Guide: Conceptual approach for transformer-based models

🚀 Live Demo
Visit the live demo on Hugging Face Spaces: [Your Space URL]
📊 Model Performance
The system trains and compares three different models:

Naive Bayes: Fast and effective for text classification
Logistic Regression: Good baseline with interpretable results
Support Vector Machine: Often achieves high accuracy on sparse, high-dimensional text features

🛠️ Local Development
Prerequisites

Python 3.8+
pip

Installation

1. Clone the repository:

```bash
git clone https://github.com/yourusername/transaction-classification.git
cd transaction-classification
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

3. Run the application:

```bash
streamlit run app.py
```

4. Open your browser and go to http://localhost:8501

📁 Project Structure

```
transaction-classification/
├── app.py              # Main Streamlit application
├── requirements.txt    # Python dependencies
├── README.md           # Project documentation
└── .gitignore          # Git ignore file
```

🔧 How It Works
1. Data Preprocessing
The system preprocesses transaction text by:

Converting to lowercase
Removing punctuation and digits
Removing stopwords
Lemmatizing words
Filtering short words
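A minimal sketch of these five steps in plain Python follows. It is not the app's code: the stopword set and the lemmatizer are passed in as arguments so the function runs without NLTK data, and the three-character minimum for "short words" is an assumption, since the cutoff is not stated here.

```python
import re
import string
from typing import Callable, Iterable

# Matches any ASCII punctuation character or digit.
_PUNCT_OR_DIGIT = re.compile(f"[{re.escape(string.punctuation)}0-9]")


def preprocess_text(
    text: str,
    stop_words: Iterable[str] = (),
    lemmatize: Callable[[str], str] = lambda word: word,
    min_len: int = 3,  # assumed cutoff for "short words"
) -> str:
    """Clean a transaction purpose string following the steps listed above."""
    text = text.lower()                        # 1. lowercase
    text = _PUNCT_OR_DIGIT.sub(" ", text)      # 2. strip punctuation and digits
    stop = set(stop_words)
    kept = []
    for word in text.split():
        if word in stop:                       # 3. drop stopwords
            continue
        word = lemmatize(word)                 # 4. lemmatize
        if len(word) >= min_len:               # 5. drop short words
            kept.append(word)
    return " ".join(kept)
```

To approximate the NLTK-based pipeline, pass `set(stopwords.words("english"))` from `nltk.corpus` and `WordNetLemmatizer().lemmatize` from `nltk.stem`, after running `nltk.download("stopwords")` and `nltk.download("wordnet")` once.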

2. Feature Extraction
Uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert text into numerical features suitable for machine learning.
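To make the weighting concrete, here is a from-scratch sketch of TF-IDF on already-preprocessed text. It assumes scikit-learn's `TfidfVectorizer` defaults (raw term counts, smoothed IDF `ln((1 + n) / (1 + df(t))) + 1`, rows scaled to unit L2 norm); which implementation and settings the app actually uses is not stated here.

```python
import math
from collections import Counter
from typing import Dict, List


def fit_idf(docs: List[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every term seen in `docs`."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))  # count each term once per document
    return {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}


def tfidf_vector(doc: str, idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector (term -> weight) with unit L2 norm; unseen terms are ignored."""
    counts = Counter(t for t in doc.split() if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}
```

A term such as "payment" that appears in every training description gets the minimum IDF of 1, so rarer, more distinctive words such as "rent" dominate the vector.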
3. Model Training
Three models are trained and compared:

Naive Bayes: Probabilistic classifier based on Bayes' theorem
Logistic Regression: Linear model for classification
SVM: Support Vector Machine for high-dimensional data
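To illustrate what "probabilistic classifier based on Bayes' theorem" means in practice, here is a minimal multinomial Naive Bayes written from scratch with add-one (Laplace) smoothing. It scores whitespace tokens from preprocessed text rather than TF-IDF features and is not the app's implementation; the app presumably uses a library model such as scikit-learn's `MultinomialNB`, which is an assumption.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class NaiveBayesSketch:
    """Multinomial Naive Bayes over whitespace tokens with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayesSketch":
        self.class_counts = Counter(labels)  # documents per class, for the prior
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def predict(self, doc: str) -> str:
        """Return the class maximising log P(class) + sum of log P(word | class)."""
        scores = {}
        for label, n_label in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_label / self.n_docs)
            for word in doc.split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + self.alpha) / denom)
            scores[label] = score
        return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.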

4. Classification
The best-performing model is used to classify new transactions into categories like:

Rent
Groceries
Utilities
Subscriptions
Transportation
Dining
Shopping
Healthcare
Fitness
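How "best-performing" is measured is not specified. A simple sketch, assuming the criterion is accuracy on a held-out validation split, treats each trained model as a function from text to category and keeps the highest scorer (the function name `pick_best_model` is hypothetical):

```python
from typing import Callable, Dict, List, Tuple


def pick_best_model(
    models: Dict[str, Callable[[str], str]],
    val_texts: List[str],
    val_labels: List[str],
) -> Tuple[str, float]:
    """Return (name, accuracy) of the model with the highest validation accuracy."""
    scores = {}
    for name, predict in models.items():
        correct = sum(predict(t) == y for t, y in zip(val_texts, val_labels))
        scores[name] = correct / len(val_labels)
    best = max(scores, key=scores.get)  # ties go to the model listed first
    return best, scores[best]
```

Selecting on a validation split rather than the training data avoids favouring a model that has simply memorised its training examples.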

🤖 Large Language Model Approach
Conceptual Implementation
For improved performance, the system could be enhanced using transformer models:
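```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Category names assumed to match the list under "4. Classification"
labels = ["rent", "groceries", "utilities", "subscriptions", "transportation",
          "dining", "shopping", "healthcare", "fitness"]

# Load the pre-trained BERT tokenizer and model, with a new classification
# head that has one output per category
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)

# Fine-tune on transaction data
# (See full implementation in the app)
```
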
Benefits of LLM Approach

Better Context Understanding: Captures semantic meaning
Higher Accuracy: Fine-tuned transformers typically outperform TF-IDF models on text classification
Transfer Learning: Leverages pre-trained knowledge
Robust to Variations: Handles different phrasings better

Implementation Considerations

Computational Requirements: A GPU is strongly recommended for fine-tuning
Training Time: Longer than traditional ML
Model Size: Larger deployment footprint
Complexity: More complex pipeline

📈 Model Evaluation
Models are evaluated using:

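Accuracy: Share of all transactions classified correctly
Precision: For each category, the share of predictions of that category that are correct
Recall: For each category, the share of its actual transactions that the model finds
F1-Score: Harmonic mean of precision and recall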
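For a multi-class problem like this one, precision, recall, and F1 are computed per category by treating that category as "positive" and every other category as "negative". The sketch below shows the arithmetic; returning 0.0 when a denominator is zero is a convention chosen here, not necessarily what the app does.

```python
from typing import Dict, List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Share of transactions whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 for each category, one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    results = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted as label
        fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
        fn = sum(t == label and p != label for t, p in pairs)  # instances of label missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[label] = {"precision": precision, "recall": recall, "f1": f1}
    return results
```

For example, if two rent transactions are predicted as `rent` and `groceries`, rent gets precision 1.0 (every `rent` prediction was right) but recall 0.5 (one rent transaction was missed).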

🔄 API Usage
While the main interface is web-based, the core functionality can be adapted for API usage:
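```python
# Example classification. Assumes `preprocess_text`, a fitted `vectorizer`,
# and a trained `model` from the app are already in scope.
def classify_transaction(purpose_text):
    """Return the predicted category for one transaction description."""
    cleaned_text = preprocess_text(purpose_text)       # same cleaning as in training
    vectorized = vectorizer.transform([cleaned_text])  # one-row TF-IDF matrix
    prediction = model.predict(vectorized)[0]          # first (only) prediction
    return prediction


# Usage
result = classify_transaction("Monthly apartment rent payment")
print(f"Predicted category: {result}")
```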

🚀 Deployment to Hugging Face Spaces

Step 1: Create Space

Go to Hugging Face Spaces
Click "Create new Space"
Choose "Streamlit" as the SDK
Set space name and visibility

Step 2: Upload Files

Upload these files to your space:

app.py
requirements.txt
README.md

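Step 3: Configuration

Because Streamlit was selected as the SDK, the Space installs the packages in requirements.txt and launches app.py automatically. The Space's settings are stored in a YAML header at the top of its README.md, so keep that header when uploading this project's README.md.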

Step 4: Access

Your app will be available at: https://huggingface.co/spaces/yourusername/yourspacename

📝 Sample Data

The system includes sample transaction data covering common categories:

| Purpose Text | Transaction Type |
| --- | --- |
| "Monthly apartment rent payment" | rent |
| "Grocery shopping at walmart" | groceries |
| "Electric bill payment" | utilities |
| "Netflix monthly subscription" | subscription |
| "Gas station fuel" | transportation |

🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔮 Future Enhancements

Add more transaction categories
Implement ensemble methods
Add confidence scoring
Include data upload functionality
Add model retraining capability
Implement A/B testing framework
Add logging and monitoring