Transaction Purpose Classification System
A machine learning system that classifies financial transactions by their purpose text. It trains and compares three algorithms: Naive Bayes, Logistic Regression, and Support Vector Machines.
Features
Multiple ML Models: Compare performance across different algorithms
Text Preprocessing: Text cleaning with NLTK (stopword removal, lemmatization)
Interactive Web Interface: Built with Streamlit
Real-time Classification: Classify new transactions instantly
Model Comparison: Detailed analysis of model performance
LLM Integration Guide: Conceptual approach for transformer-based models
Live Demo
Visit the live demo on Hugging Face Spaces: [Your Space URL]
Model Performance
The system trains and compares three different models:
Naive Bayes: Fast and effective for text classification
Logistic Regression: Good baseline with interpretable results
Support Vector Machine: Often achieves high accuracy on sparse, high-dimensional text features
Local Development
Prerequisites
Python 3.8+
pip
Installation
Clone the repository:
    git clone https://github.com/yourusername/transaction-classification.git
    cd transaction-classification
Install dependencies:
    pip install -r requirements.txt
Run the application:
    streamlit run app.py
Open your browser and go to http://localhost:8501
Project Structure
transaction-classification/
├── app.py              # Main Streamlit application
├── requirements.txt    # Python dependencies
├── README.md           # Project documentation
└── .gitignore          # Git ignore file
How It Works
1. Data Preprocessing
The system preprocesses transaction text by:
Converting to lowercase
Removing punctuation and digits
Removing stopwords
Lemmatizing words
Filtering short words
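The five steps above can be sketched in plain Python. This is a minimal illustration, not the app's code: the stopword set is a tiny hypothetical stand-in for NLTK's English list, the `lemmatize` argument defaults to an identity function where the app would plug in an NLTK lemmatizer, and `min_length` is an assumed parameter name.

```python
import re

# Hypothetical, deliberately tiny stopword set; NLTK's English list is much larger.
STOPWORDS = {"a", "an", "the", "at", "for", "of", "to", "in", "on", "and", "my"}


def preprocess_text(text, lemmatize=lambda word: word, min_length=3):
    """Apply the five cleaning steps in order and return a space-joined string."""
    text = text.lower()                                        # 1. lowercase
    text = re.sub(r"[^a-z\s]", " ", text)                      # 2. drop punctuation and digits
    tokens = [t for t in text.split() if t not in STOPWORDS]   # 3. remove stopwords
    tokens = [lemmatize(t) for t in tokens]                    # 4. lemmatize (identity placeholder)
    tokens = [t for t in tokens if len(t) >= min_length]       # 5. filter short words
    return " ".join(tokens)


print(preprocess_text("Paid $45.99 for the Netflix subscription at 10am!"))
# paid netflix subscription
```

Note that the regular expression keeps only ASCII letters, which is a simplification that suits English purpose texts but would strip accented characters.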
2. Feature Extraction
Uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert text into numerical features suitable for machine learning.
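To make the weighting concrete, here is a small sketch of the textbook formula, tf × log(N / df), in plain Python. It is an illustration only: vectorizer libraries typically apply smoothing and normalization on top of this, so the app's actual feature values will differ. The `tfidf` function name and the sample documents are assumptions.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = ["monthly rent payment", "electric bill payment", "netflix monthly subscription"]
weights = tfidf(docs)
print(weights[0])
```

In the first document, "rent" receives a higher weight than "payment" because "payment" also appears in another document and is therefore less distinctive.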
3. Model Training
Three models are trained and compared:
Naive Bayes: Probabilistic classifier based on Bayes' theorem
Logistic Regression: Linear model for classification
SVM: Support Vector Machine for high-dimensional data
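A minimal sketch of training and comparing the three models is shown below. It assumes scikit-learn (the README does not name the library), uses `LinearSVC` as one possible SVM variant, and runs on a tiny made-up dataset rather than the app's training data, so the scores it prints say nothing about real performance.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data, for illustration only.
train_texts = [
    "monthly apartment rent payment", "rent for downtown flat",
    "grocery shopping at walmart", "weekly grocery store run",
    "electric bill payment", "water utility bill",
]
train_labels = ["rent", "rent", "groceries", "groceries", "utilities", "utilities"]
test_texts = ["rent payment for apartment", "grocery run at walmart", "utility bill for water"]
test_labels = ["rent", "groceries", "utilities"]

candidates = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
}

scores = {}
for name, classifier in candidates.items():
    # Each pipeline owns its own TF-IDF vectorizer, so raw text goes in directly.
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    pipeline.fit(train_texts, train_labels)
    scores[name] = pipeline.score(test_texts, test_labels)  # accuracy on held-out texts

# Pick the model with the highest held-out accuracy (ties go to the first one).
best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on training data only, which avoids leaking test texts into the features.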
4. Classification
The best-performing model is used to classify new transactions into categories like:
Rent
Groceries
Utilities
Subscriptions
Transportation
Dining
Shopping
Healthcare
Fitness
Large Language Model Approach
Conceptual Implementation
For improved performance, the system could be enhanced using transformer models:
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Assumption: one label per category listed under Classification (nine); adjust to your label set
    num_classes = 9

    # Load the pre-trained BERT tokenizer and a model with a new classification head
    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    model = AutoModelForSequenceClassification.from_pretrained(
        'bert-base-uncased',
        num_labels=num_classes
    )

    # Fine-tune on transaction data
    # (See full implementation in the app)
Benefits of LLM Approach
Better Context Understanding: Captures semantic meaning
Higher Accuracy: Fine-tuned transformers often outperform TF-IDF baselines when enough labeled data is available
Transfer Learning: Leverages pre-trained knowledge
Robust to Variations: Handles different phrasings better
Implementation Considerations
Computational Requirements: Fine-tuning is practical mainly on a GPU
Training Time: Longer than traditional ML
Model Size: Larger deployment footprint
Complexity: More complex pipeline
Model Evaluation
Models are evaluated using:
Accuracy: Share of all predictions that are correct
Precision: Share of predicted positives that are actually positive
Recall: Share of actual positives that the model finds
F1-Score: Harmonic mean of precision and recall
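The four metrics can be computed by hand as a sanity check. The sketch below treats one category as the positive class; in a multi-class setting like this one, the per-class values are usually averaged across categories. The `evaluate` function and the sample labels are hypothetical.

```python
def evaluate(y_true, y_pred, positive):
    """Compute accuracy plus precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = ["rent", "rent", "groceries", "utilities", "groceries"]
y_pred = ["rent", "groceries", "groceries", "utilities", "rent"]
result = evaluate(y_true, y_pred, positive="rent")
print(result)
# {'accuracy': 0.6, 'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```

Here three of five predictions are correct, while "rent" has one true positive, one false positive, and one false negative, which gives 0.5 for precision, recall, and F1.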
API Usage
While the main interface is web-based, the core functionality can be adapted for API usage:
    # Example classification
    # Assumes preprocess_text, a fitted vectorizer, and a trained model are already defined in the app
    def classify_transaction(purpose_text):
        cleaned_text = preprocess_text(purpose_text)
        vectorized = vectorizer.transform([cleaned_text])
        prediction = model.predict(vectorized)[0]
        return prediction

    # Usage
    result = classify_transaction("Monthly apartment rent payment")
    print(f"Predicted category: {result}")
Deployment to Hugging Face Spaces
Step 1: Create Space
Go to Hugging Face Spaces
Click "Create new Space"
Choose "Streamlit" as the SDK
Set space name and visibility
Step 2: Upload Files
Upload these files to your space:
app.py
requirements.txt
README.md
Step 3: Configuration
The space will automatically detect the Streamlit app and deploy it.
Step 4: Access
Your app will be available at: https://huggingface.co/spaces/yourusername/yourspacename
Sample Data
The system includes sample transaction data covering common categories:
| Purpose Text | Transaction Type |
|---|---|
| "Monthly apartment rent payment" | rent |
| "Grocery shopping at walmart" | groceries |
| "Electric bill payment" | utilities |
| "Netflix monthly subscription" | subscription |
| "Gas station fuel" | transportation |
Contributing
Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Future Enhancements
Add more transaction categories
Implement ensemble methods
Add confidence scoring
Include data upload functionality
Add model retraining capability
Implement A/B testing framework
Add logging and monitoring