Transaction Purpose Classification System

A machine learning system that classifies financial transactions by their purpose text, using multiple algorithms including Naive Bayes, Logistic Regression, and Support Vector Machines.

Features
- Multiple ML Models: Compare performance across different algorithms
- Text Preprocessing: Advanced text cleaning with NLTK
- Interactive Web Interface: Built with Streamlit
- Real-time Classification: Classify new transactions instantly
- Model Comparison: Detailed analysis of model performance
- LLM Integration Guide: Conceptual approach for transformer-based models
Live Demo

Visit the live demo on Hugging Face Spaces: [Your Space URL]

Model Performance

The system trains and compares three different models:
- Naive Bayes: Fast and effective for text classification
- Logistic Regression: Good baseline with interpretable results
- Support Vector Machine: Often achieves high accuracy
Local Development

Prerequisites

- Python 3.8+
- pip
Installation
1. Clone the repository:

       git clone https://github.com/yourusername/transaction-classification.git
       cd transaction-classification

2. Install dependencies:

       pip install -r requirements.txt

3. Run the application:

       streamlit run app.py

4. Open your browser and go to http://localhost:8501
Project Structure

    transaction-classification/
    ├── app.py              # Main Streamlit application
    ├── requirements.txt    # Python dependencies
    ├── README.md           # Project documentation
    └── .gitignore          # Git ignore file

How It Works
Data Preprocessing

The system preprocesses transaction text by:

- Converting to lowercase
- Removing punctuation and digits
- Removing stopwords
- Lemmatizing words
- Filtering short words
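The preprocessing steps listed above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not necessarily the code in app.py: the small `STOPWORDS` set, the `min_length` threshold, and the optional `lemmatize` hook are assumptions made for the example. In the real application the stopword list and lemmatizer would presumably come from NLTK.

```python
import re

# Tiny illustrative stopword list (assumption); a full list would come from NLTK.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to"}


def preprocess_text(text, stopwords=STOPWORDS, lemmatize=None, min_length=3):
    """Clean one transaction purpose string and return it as space-joined tokens."""
    # 1. Convert to lowercase.
    text = text.lower()
    # 2. Replace everything except ASCII letters and whitespace with a space,
    #    which removes punctuation and digits (and non-ASCII letters too).
    text = re.sub(r"[^a-z\s]", " ", text)
    # 3. Remove stopwords.
    tokens = [tok for tok in text.split() if tok not in stopwords]
    # 4. Lemmatize if a callable was supplied (hypothetical hook).
    if lemmatize is not None:
        tokens = [lemmatize(tok) for tok in tokens]
    # 5. Filter out short words (threshold is an assumption).
    tokens = [tok for tok in tokens if len(tok) >= min_length]
    return " ".join(tokens)


print(preprocess_text("Monthly apartment RENT payment #123 for the flat!"))
```

Passing the lemmatizer in as a callable keeps the sketch runnable without any downloads; a bound method such as NLTK's `WordNetLemmatizer().lemmatize` could be supplied in its place.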
Feature Extraction

The system uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert text into numerical features suitable for machine learning.
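To make the idea behind TF-IDF concrete, here is a small pure-Python sketch of the textbook weighting, tf × ln(N / df). It is an illustration only: the function name `tf_idf` and the toy documents are assumptions, and library vectorizers such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and L2-normalize each row by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    n_docs = len(tokenized_docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append({
            # Term frequency is the term's share of the tokens in this document.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    ["monthly", "rent", "payment"],
    ["electric", "bill", "payment"],
]
weights = tf_idf(docs)
print(weights[0])
```

A term that appears in every document, like "payment" here, gets a weight of zero, while a term unique to one document, like "rent", gets a positive weight. This is why TF-IDF helps separate categories that share generic wording.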
Model Training

Three models are trained and compared:

- Naive Bayes: Probabilistic classifier based on Bayes' theorem
- Logistic Regression: Linear model for classification
- SVM: Support Vector Machine for high-dimensional data
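One way to train and compare the three models is sketched below with scikit-learn. Treat it as an assumption-laden example, not the application's actual training code: the toy training and test sets are invented for illustration, `LinearSVC` is assumed as the SVM variant, and plain accuracy on a small held-out list stands in for whatever comparison app.py performs.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data, invented for this example only.
train_texts = [
    "monthly apartment rent payment", "rent for downtown flat",
    "grocery shopping at walmart", "weekly grocery store purchase",
    "electric bill payment", "water utility bill",
]
train_labels = ["rent", "rent", "groceries", "groceries", "utilities", "utilities"]
test_texts = ["apartment rent", "grocery purchase", "electric utility bill"]
test_labels = ["rent", "groceries", "utilities"]

candidates = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),  # assumed variant; the source only says "SVM"
}

scores = {}
for name, classifier in candidates.items():
    # Each candidate gets its own TF-IDF vectorizer inside a pipeline,
    # so vectorizing and classifying are fitted together.
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    pipeline.fit(train_texts, train_labels)
    scores[name] = accuracy_score(test_labels, pipeline.predict(test_texts))

# Pick the model with the highest held-out accuracy.
best_name = max(scores, key=scores.get)
print(scores, "best:", best_name)
```

With a dataset this small the scores mean little; on real data, cross-validation would give a more trustworthy comparison than a single split.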
Classification

The best-performing model is used to classify new transactions into categories like:

- Rent
- Groceries
- Utilities
- Subscriptions
- Transportation
- Dining
- Shopping
- Healthcare
- Fitness
Large Language Model Approach

Conceptual Implementation

For improved performance, the system could be enhanced using transformer models:

    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Number of target categories (assumed here: the nine example categories listed above)
    num_classes = 9

    # Load pre-trained BERT model
    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    model = AutoModelForSequenceClassification.from_pretrained(
        'bert-base-uncased',
        num_labels=num_classes
    )

    # Fine-tune on transaction data
    # (See full implementation in the app)
Benefits of LLM Approach
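- Better Context Understanding: Captures semantic meaning beyond individual keywords
- Higher Accuracy: Fine-tuned transformers often outperform bag-of-words baselines on text classification
- Transfer Learning: Leverages knowledge from large-scale pre-training
- Robust to Variations: Handles different phrasings of the same purpose better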
Implementation Considerations
- Computational Requirements: A GPU is effectively required for practical fine-tuning
- Training Time: Longer than traditional ML
- Model Size: Larger deployment footprint
- Complexity: More complex pipeline
Model Evaluation

Models are evaluated using:

- Accuracy: Share of all predictions that are correct
- Precision: Share of predictions for a category that are correct
- Recall: Share of a category's true cases that the model finds
- F1-Score: Harmonic mean of precision and recall
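The metrics above can be computed by hand for a single category, which makes the definitions concrete. The sketch below is illustrative only: the function names and the toy labels are assumptions, and a library routine such as scikit-learn's `classification_report` would normally replace it.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one category, treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented for this example.
y_true = ["rent", "rent", "groceries", "utilities", "groceries"]
y_pred = ["rent", "groceries", "groceries", "utilities", "groceries"]

print(accuracy(y_true, y_pred))
print(per_class_metrics(y_true, y_pred, "groceries"))
```

In this toy case one rent transaction is misclassified as groceries, so groceries has perfect recall but imperfect precision, while rent has the opposite pattern.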
API Usage

While the main interface is web-based, the core functionality can be adapted for API usage:

    # Example classification
    # Assumes preprocess_text, a fitted vectorizer, and a trained model are defined elsewhere
    def classify_transaction(purpose_text):
        cleaned_text = preprocess_text(purpose_text)        # clean the raw purpose text
        vectorized = vectorizer.transform([cleaned_text])   # convert to TF-IDF features
        prediction = model.predict(vectorized)[0]           # predicted category label
        return prediction

    # Usage
    result = classify_transaction("Monthly apartment rent payment")
    print(f"Predicted category: {result}")

Deployment to Hugging Face Spaces

Step 1: Create Space
1. Go to Hugging Face Spaces
2. Click "Create new Space"
3. Choose "Streamlit" as the SDK
4. Set the Space name and visibility
Step 2: Upload Files

Upload these files to your Space:

- app.py
- requirements.txt
- README.md
Step 3: Configuration

The Space will automatically detect the Streamlit app and deploy it.

Step 4: Access

Your app will be available at: https://huggingface.co/spaces/yourusername/yourspacename

Sample Data

The system includes sample transaction data covering common categories:

| Purpose Text                     | Transaction Type |
|----------------------------------|------------------|
| "Monthly apartment rent payment" | rent             |
| "Grocery shopping at walmart"    | groceries        |
| "Electric bill payment"          | utilities        |
| "Netflix monthly subscription"   | subscription     |
| "Gas station fuel"               | transportation   |

Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request
License

This project is licensed under the MIT License - see the LICENSE file for details.

Future Enhancements
- Add more transaction categories
- Implement ensemble methods
- Add confidence scoring
- Include data upload functionality
- Add model retraining capability
- Implement A/B testing framework
- Add logging and monitoring