Transaction Purpose Classification System
A machine learning system that classifies financial transactions based on their purpose text using multiple algorithms including Naive Bayes, Logistic Regression, and Support Vector Machines.
🌟 Features
Multiple ML Models: Compare performance across different algorithms
Text Preprocessing: Advanced text cleaning with NLTK
Interactive Web Interface: Built with Streamlit
Real-time Classification: Classify new transactions instantly
Model Comparison: Detailed analysis of model performance
LLM Integration Guide: Conceptual approach for transformer-based models
🚀 Live Demo
Visit the live demo on Hugging Face Spaces: [Your Space URL]
📊 Model Performance
The system trains and compares three different models:
Naive Bayes: Fast and effective for text classification
Logistic Regression: Good baseline with interpretable results
Support Vector Machine: Often accurate on sparse, high-dimensional text features
🛠️ Local Development
Prerequisites
Python 3.8+
pip
Installation
Clone the repository:
    git clone https://github.com/yourusername/transaction-classification.git
    cd transaction-classification
Install dependencies:
    pip install -r requirements.txt
Run the application:
    streamlit run app.py
Open your browser and go to http://localhost:8501
📁 Project Structure
    transaction-classification/
    ├── app.py              # Main Streamlit application
    ├── requirements.txt    # Python dependencies
    ├── README.md           # Project documentation
    └── .gitignore          # Git ignore file
🔧 How It Works
1. Data Preprocessing
The system preprocesses transaction text by:
Converting to lowercase
Removing punctuation and digits
Removing stopwords
Lemmatizing words
Filtering short words
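The five steps above can be sketched in a few lines of Python. The app itself relies on NLTK; to keep this sketch runnable without downloading NLTK corpora, it substitutes a tiny hard-coded stopword set and a naive plural-stripping rule for NLTK's stopword list and lemmatizer. `STOPWORDS`, `naive_lemma`, and the `min_len` threshold are illustrative assumptions, not the app's actual values.

```python
import re

# Hypothetical, deliberately tiny stopword set (the app uses NLTK's English list).
STOPWORDS = {"a", "an", "the", "at", "for", "of", "to", "in", "on", "and", "my"}


def naive_lemma(word: str) -> str:
    """Crude stand-in for a lemmatizer: strips a trailing plural 's' only."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def preprocess_text(text: str, min_len: int = 3) -> str:
    text = text.lower()                                       # 1. lowercase
    text = re.sub(r"[^a-z\s]", " ", text)                     # 2. drop punctuation and digits
    tokens = [t for t in text.split() if t not in STOPWORDS]  # 3. remove stopwords
    tokens = [naive_lemma(t) for t in tokens]                 # 4. lemmatize (approximation)
    tokens = [t for t in tokens if len(t) >= min_len]         # 5. filter short words
    return " ".join(tokens)


print(preprocess_text("Payment for 2 Netflix subscriptions at $15.99!"))
# prints: payment netflix subscription
```

With NLTK installed and its corpora downloaded, the stopword set and `naive_lemma` would typically be replaced by `nltk.corpus.stopwords.words("english")` and `WordNetLemmatizer().lemmatize`.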
2. Feature Extraction
Uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert text into numerical features suitable for machine learning.
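To make the weighting concrete, here is a minimal pure-Python sketch of the textbook formula, where term frequency is the count divided by document length and inverse document frequency is ln(N / df). This is an illustration only: scikit-learn's `TfidfVectorizer` uses a smoothed IDF and L2-normalizes each row by default, so its numbers differ. The three documents are made-up examples.

```python
import math
from collections import Counter


def tfidf(docs):
    """Textbook TF-IDF: tf = count / doc length, idf = ln(N / df)."""
    tokenized = [d.split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, already-preprocessed transaction texts.
docs = ["rent payment", "electric bill payment", "netflix subscription"]
vectors = tfidf(docs)
print(vectors[0])
```

In the first document, "rent" receives a higher weight than "payment" because "payment" also appears in the second document and is therefore less distinctive.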
3. Model Training
Three models are trained and compared:
Naive Bayes: Probabilistic classifier based on Bayes' theorem
Logistic Regression: Linear model for classification
SVM: Support Vector Machine for high-dimensional data
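The comparison above can be sketched with scikit-learn pipelines. The texts and labels below are hypothetical toy data, `LinearSVC` is an assumed choice for the SVM, and the reported score is training accuracy, so the sketch only demonstrates the mechanics; a real comparison would score each model on a held-out test split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training set: two examples per category.
texts = [
    "monthly apartment rent payment", "rent for downtown flat",
    "grocery shopping at walmart", "supermarket grocery run",
    "electric bill payment", "water utility bill",
    "netflix monthly subscription", "spotify premium subscription",
]
labels = [
    "rent", "rent",
    "groceries", "groceries",
    "utilities", "utilities",
    "subscription", "subscription",
]

candidates = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
}

scores, fitted = {}, {}
for name, clf in candidates.items():
    # Each pipeline vectorizes the raw text with TF-IDF, then fits the classifier.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    fitted[name] = pipe
    scores[name] = pipe.score(texts, labels)  # training accuracy only

best_name = max(scores, key=scores.get)
prediction = fitted[best_name].predict(["rent for the apartment"])[0]
print(scores)
print(f"Best model: {best_name}, prediction: {prediction}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary tied to the model it was fitted with, which avoids mismatched features at prediction time.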
4. Classification
The best-performing model is used to classify new transactions into categories like:
Rent
Groceries
Utilities
Subscriptions
Transportation
Dining
Shopping
Healthcare
Fitness
🤖 Large Language Model Approach
Conceptual Implementation
For improved performance, the system could be enhanced using transformer models:
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Assumption: one label per example category listed above (nine in total)
    num_classes = 9

    # Load the pre-trained BERT tokenizer and a classification head sized to the label set
    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    model = AutoModelForSequenceClassification.from_pretrained(
        'bert-base-uncased',
        num_labels=num_classes
    )

    # Fine-tune on transaction data
    # (See full implementation in the app)
Benefits of LLM Approach
Better Context Understanding: Captures semantic meaning
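Potentially Higher Accuracy: Fine-tuned transformers often outperform TF-IDF baselines on text classification, though gains on short transaction strings are not guaranteed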
Transfer Learning: Leverages pre-trained knowledge
Robust to Variations: Handles different phrasings better
Implementation Considerations
Computational Requirements: A GPU is strongly recommended for fine-tuning
Training Time: Longer than traditional ML
Model Size: Larger deployment footprint
Complexity: More complex pipeline
📈 Model Evaluation
Models are evaluated using:
Accuracy: Overall correctness
Precision: Share of predicted positives that are actually correct
Recall: Ability to find all positive cases
F1-Score: Harmonic mean of precision and recall
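As a worked example of these four metrics, the sketch below computes them by hand for a small multi-class case. The label lists are made up for illustration, and `per_class_metrics` is a hypothetical helper, not a function from the app; in practice `sklearn.metrics.classification_report` produces the same per-class figures.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # true positives
    fp = sum(t != label and p == label for t, p in pairs)  # false positives
    fn = sum(t == label and p != label for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical true and predicted categories for five transactions.
y_true = ["rent", "rent", "groceries", "utilities", "rent"]
y_pred = ["rent", "groceries", "groceries", "utilities", "rent"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")  # 4 of 5 correct: 0.80

for label in ("rent", "groceries"):
    p, r, f1 = per_class_metrics(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# rent: precision=1.00 recall=0.67 f1=0.80
# groceries: precision=0.50 recall=1.00 f1=0.67
```

The one misclassified rent payment lowers recall for "rent" and precision for "groceries", which is why per-class metrics are worth reporting alongside overall accuracy.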
🔄 API Usage
While the main interface is web-based, the core functionality can be adapted for API usage:
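    # Example classification.
    # Assumes preprocess_text, vectorizer (a fitted TF-IDF vectorizer), and model
    # (a trained classifier) are already defined elsewhere, for example in app.py.
    def classify_transaction(purpose_text):
        cleaned_text = preprocess_text(purpose_text)       # same cleaning as in training
        vectorized = vectorizer.transform([cleaned_text])  # one-row feature matrix
        prediction = model.predict(vectorized)[0]          # predict returns an array
        return prediction

    # Usage
    result = classify_transaction("Monthly apartment rent payment")
    print(f"Predicted category: {result}")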
🚀 Deployment to Hugging Face Spaces
Step 1: Create Space
Go to Hugging Face Spaces
Click "Create new Space"
Choose "Streamlit" as the SDK
Set space name and visibility
Step 2: Upload Files
Upload these files to your space:
app.py
requirements.txt
README.md
Step 3: Configuration
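The Space detects the Streamlit app and deploys it automatically, as long as README.md keeps the YAML metadata block (including sdk: streamlit and app_file: app.py) that Spaces generates when the Space is created.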
Step 4: Access
Your app will be available at: https://huggingface.co/spaces/yourusername/yourspacename
📝 Sample Data
The system includes sample transaction data covering common categories:
| Purpose Text | Transaction Type |
| --- | --- |
| "Monthly apartment rent payment" | rent |
| "Grocery shopping at walmart" | groceries |
| "Electric bill payment" | utilities |
| "Netflix monthly subscription" | subscription |
| "Gas station fuel" | transportation |
🤝 Contributing
Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🔮 Future Enhancements
Add more transaction categories
Implement ensemble methods
Add confidence scoring
Include data upload functionality
Add model retraining capability
Implement A/B testing framework
Add logging and monitoring