# Transaction Purpose Classification System

A machine learning system that classifies financial transactions by their purpose text using several algorithms: Naive Bayes, Logistic Regression, and Support Vector Machines.

## 🌟 Features

- **Multiple ML Models**: Compare performance across different algorithms
- **Text Preprocessing**: Advanced text cleaning with NLTK
- **Interactive Web Interface**: Built with Streamlit
- **Real-time Classification**: Classify new transactions instantly
- **Model Comparison**: Detailed analysis of model performance
- **LLM Integration Guide**: Conceptual approach for transformer-based models

## 🚀 Live Demo

Visit the live demo on Hugging Face Spaces: [Your Space URL]

## 📊 Model Performance

The system trains and compares three models:

- **Naive Bayes**: Fast and effective for text classification
- **Logistic Regression**: Good baseline with interpretable results
- **Support Vector Machine**: Often achieves high accuracy

## 🛠️ Local Development

### Prerequisites

- Python 3.8+
- pip

### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/transaction-classification.git
   cd transaction-classification
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   streamlit run app.py
   ```

4. Open your browser and go to http://localhost:8501

## 📁 Project Structure

```
transaction-classification/
├── app.py              # Main Streamlit application
├── requirements.txt    # Python dependencies
├── README.md           # Project documentation
└── .gitignore          # Git ignore file
```

## 🔧 How It Works

### Data Preprocessing

The system preprocesses transaction text by:

- Converting to lowercase
- Removing punctuation and digits
- Removing stopwords
- Lemmatizing words
- Filtering short words
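The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the app's actual code: the stopword set is a tiny hand-picked stand-in for a full list such as NLTK's, the `min_length` threshold is an assumed value, and lemmatization is left as an optional hook (`lemmatize`) where a function like NLTK's `WordNetLemmatizer().lemmatize` could be passed in.

```python
import re

# Tiny illustrative stopword set (assumption); a real list such as NLTK's is much larger.
STOPWORDS = {"a", "an", "the", "at", "for", "of", "to", "in", "on", "and"}


def preprocess_text(text, lemmatize=None, min_length=3):
    """Clean a transaction purpose string following the steps listed above."""
    text = text.lower()
    # Replace punctuation, digits, and any other non-letter characters with spaces
    text = re.sub(r"[^a-z\s]", " ", text)
    # Remove stopwords
    tokens = [t for t in text.split() if t not in STOPWORDS]
    # Optional lemmatization hook (hypothetical parameter, not from the app)
    if lemmatize is not None:
        tokens = [lemmatize(t) for t in tokens]
    # Filter short words; the threshold of 3 characters is an assumption
    tokens = [t for t in tokens if len(t) >= min_length]
    return " ".join(tokens)


print(preprocess_text("Grocery shopping at Walmart #4521!"))
# -> grocery shopping walmart
```

Keeping the lemmatizer as a parameter means the cleaning logic can be exercised without downloading any NLTK corpora.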

### Feature Extraction

The system uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert text into numerical features suitable for machine learning.
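To make the idea concrete, here is a hand-rolled sketch of TF-IDF weighting. It assumes whitespace tokenization, a smoothed inverse document frequency of `ln((1 + n) / (1 + df)) + 1`, and L2 normalization; the app's actual vectorizer settings are not shown in this README, and in practice a library class such as scikit-learn's `TfidfVectorizer` would do this work.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term count times smoothed IDF: ln((1 + n) / (1 + df)) + 1
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Scale each document vector to unit length
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


docs = [
    "monthly rent payment",
    "electric bill payment",
    "netflix monthly subscription",
]
vectors = tfidf(docs)
print(vectors[0])
```

In the first document, "rent" appears in no other document and so receives a higher weight than "payment" or "monthly", which each appear in two. Terms that distinguish a category are emphasized; terms shared across categories are down-weighted.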
### Model Training

Three models are trained and compared:

- **Naive Bayes**: Probabilistic classifier based on Bayes' theorem
- **Logistic Regression**: Linear model for classification
- **SVM**: Support Vector Machine for high-dimensional data
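A comparison like the one described above might look roughly as follows. This sketch assumes scikit-learn is installed (this README does not name the library), picks `MultinomialNB`, `LogisticRegression`, and `LinearSVC` as the three estimators, and uses a handful of made-up examples; with so little data the scores are not meaningful and only the workflow is the point.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up training and held-out examples, for illustration only
train_texts = [
    "monthly apartment rent payment", "rent for downtown flat",
    "grocery shopping at walmart", "supermarket grocery order",
    "electric bill payment", "water utility bill",
]
train_labels = ["rent", "rent", "groceries", "groceries", "utilities", "utilities"]
test_texts = ["apartment rent", "grocery store", "electric utility bill"]
test_labels = ["rent", "groceries", "utilities"]

# Estimator choices are assumptions, not confirmed by the app
candidates = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
}

scores = {}
for name, classifier in candidates.items():
    # Each pipeline vectorizes the text with TF-IDF, then fits the classifier
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    pipeline.fit(train_texts, train_labels)
    # score() returns accuracy on the held-out examples
    scores[name] = pipeline.score(test_texts, test_labels)

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: accuracy = {score:.2f}")
print(f"Best model: {best_name}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the TF-IDF vocabulary fitted on training data only, which avoids leaking held-out text into the features.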

### Classification

The best-performing model is used to classify new transactions into categories such as:

- Rent
- Groceries
- Utilities
- Subscriptions
- Transportation
- Dining
- Shopping
- Healthcare
- Fitness

## 🤖 Large Language Model Approach

### Conceptual Implementation

For improved performance, the system could be enhanced using transformer models:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumption: one label per category listed in this README; adjust to your data
num_classes = 9

# Load pre-trained BERT model
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=num_classes
)

# Fine-tune on transaction data
# (See full implementation in the app)
```

### Benefits of LLM Approach

- **Better Context Understanding**: Captures semantic meaning beyond individual keywords
- **Higher Accuracy**: Fine-tuned transformers are state of the art on many text classification tasks
- **Transfer Learning**: Leverages pre-trained knowledge
- **Robust to Variations**: Handles different phrasings better

### Implementation Considerations

- **Computational Requirements**: Fine-tuning typically needs a GPU
- **Training Time**: Longer than traditional ML
- **Model Size**: Larger deployment footprint
- **Complexity**: More complex pipeline

## 📈 Model Evaluation

Models are evaluated using:

- **Accuracy**: Share of all predictions that are correct
- **Precision**: Share of predicted positives that are actually positive
- **Recall**: Share of actual positives that the model finds
- **F1-Score**: Harmonic mean of precision and recall
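The metrics above can be computed by hand for a single category, which makes their definitions concrete. The sketch below uses made-up labels and treats one class at a time as the "positive" class; the function names are illustrative and not taken from the app.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Made-up labels for illustration
y_true = ["rent", "rent", "groceries", "utilities", "groceries"]
y_pred = ["rent", "groceries", "groceries", "utilities", "groceries"]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
precision, recall, f1 = per_class_metrics(y_true, y_pred, positive="groceries")
print(f"groceries: precision = {precision:.2f}, recall = {recall:.2f}, f1 = {f1:.2f}")
# -> accuracy = 0.80
# -> groceries: precision = 0.67, recall = 1.00, f1 = 0.80
```

Here one rent payment was mislabeled as groceries, so groceries has perfect recall but reduced precision, while rent would show the opposite pattern.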

## 🔄 API Usage

While the main interface is web-based, the core functionality can be adapted for API usage:

```python
# Example classification
# Assumes preprocess_text, a fitted vectorizer, and a trained model are already in scope
def classify_transaction(purpose_text):
    cleaned_text = preprocess_text(purpose_text)
    vectorized = vectorizer.transform([cleaned_text])
    prediction = model.predict(vectorized)[0]
    return prediction

# Usage
result = classify_transaction("Monthly apartment rent payment")
print(f"Predicted category: {result}")
```

## 🚀 Deployment to Hugging Face Spaces

### Step 1: Create Space

1. Go to Hugging Face Spaces
2. Click "Create new Space"
3. Choose "Streamlit" as the SDK
4. Set the space name and visibility

### Step 2: Upload Files

Upload these files to your space:

- app.py
- requirements.txt
- README.md

### Step 3: Configuration

The space will automatically detect the Streamlit app and deploy it.

### Step 4: Access

Your app will be available at: https://huggingface.co/spaces/yourusername/yourspacename

## 📝 Sample Data

The system includes sample transaction data covering common categories:

| Purpose Text | Transaction Type |
|---|---|
| "Monthly apartment rent payment" | rent |
| "Grocery shopping at walmart" | groceries |
| "Electric bill payment" | utilities |
| "Netflix monthly subscription" | subscription |
| "Gas station fuel" | transportation |

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests if applicable
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

## 🔮 Future Enhancements

- Add more transaction categories
- Implement ensemble methods
- Add confidence scoring
- Include data upload functionality
- Add model retraining capability
- Implement A/B testing framework
- Add logging and monitoring