Transaction Purpose Classification System
A machine learning system that classifies financial transactions based on their purpose text using multiple algorithms including Naive Bayes, Logistic Regression, and Support Vector Machines.
🌟 Features

Multiple ML Models: Compare performance across different algorithms
Text Preprocessing: Text cleaning, stopword removal, and lemmatization with NLTK
Interactive Web Interface: Built with Streamlit
Real-time Classification: Classify new transactions instantly
Model Comparison: Detailed analysis of model performance
LLM Integration Guide: Conceptual approach for transformer-based models

🚀 Live Demo
Visit the live demo on Hugging Face Spaces: [Your Space URL]
📊 Model Performance
The system trains and compares three different models:

Naive Bayes: Fast and effective for text classification
Logistic Regression: Good baseline with interpretable results
Support Vector Machine: Often achieves high accuracy on sparse, high-dimensional text features

🛠️ Local Development
Prerequisites

Python 3.8+
pip

Installation

Clone the repository:

git clone https://github.com/yourusername/transaction-classification.git
cd transaction-classification

Install dependencies:

pip install -r requirements.txt

Run the application:

streamlit run app.py

Open your browser and go to http://localhost:8501

📁 Project Structure
transaction-classification/
├── app.py              # Main Streamlit application
├── requirements.txt    # Python dependencies
├── README.md          # Project documentation
└── .gitignore         # Git ignore file
🔧 How It Works
1. Data Preprocessing
The system preprocesses transaction text by:

Converting to lowercase
Removing punctuation and digits
Removing stopwords
Lemmatizing words
Filtering short words
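
The five steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the code in app.py: the tiny `STOPWORDS` set, the `min_length` threshold of 3, and the `lemmatize` hook are all assumptions made for the example. The app uses NLTK for stopwords and lemmatization, so the hook defaults to a no-op here and a real lemmatizer would be passed in its place.

```python
import re

# Tiny illustrative stopword list (assumption); NLTK's English list is much larger.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "the", "to"}


def preprocess_text(text, lemmatize=lambda word: word, min_length=3):
    """Clean one transaction purpose string and return space-joined tokens."""
    text = text.lower()                                    # 1. lowercase
    text = re.sub(r"[^a-z\s]", " ", text)                  # 2. drop punctuation and digits
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOPWORDS]     # 3. remove stopwords
    tokens = [lemmatize(t) for t in tokens]                # 4. lemmatize (no-op by default)
    tokens = [t for t in tokens if len(t) >= min_length]   # 5. filter short words
    return " ".join(tokens)


print(preprocess_text("Payment for the Electric Bill #4521, at 09:30!"))
# payment electric bill
```

Note that the regular expression keeps only ASCII letters, so accented characters would be dropped as well; a production version would need to decide how to handle them.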

2. Feature Extraction
Uses TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert text into numerical features suitable for machine learning.
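
To make the weighting concrete, here is a from-scratch sketch of TF-IDF on three cleaned purpose strings. It assumes raw term counts, the smoothed inverse document frequency `ln((1 + N) / (1 + df)) + 1`, and L2 normalisation of each row; whether the app's vectorizer uses exactly these settings is not stated in this README. The function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, rows); each row is an L2-normalised TF-IDF vector."""
    tokenized = [doc.split() for doc in documents]
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        raw = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0  # avoid dividing by zero
        rows.append([w / norm for w in raw])
    return vocabulary, rows


docs = ["rent payment", "electric bill payment", "netflix subscription payment"]
vocab, rows = tfidf(docs)
weights = dict(zip(vocab, rows[0]))
# "payment" occurs in every document, so it is weighted below "rent".
print(weights["rent"] > weights["payment"])  # True
```

The word that appears in every document carries the least information about the category, which is why it ends up with the smallest non-zero weight.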
3. Model Training
Three models are trained and compared:

Naive Bayes: Probabilistic classifier based on Bayes' theorem
Logistic Regression: Linear model for classification
SVM: Support Vector Machine for high-dimensional data
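
As an illustration of the first of these, the sketch below implements a multinomial Naive Bayes classifier from scratch with add-one smoothing. It is not the app's code: the class name `TinyNaiveBayes` is hypothetical, it works on raw token counts rather than TF-IDF features, and the training texts are the README's sample rows after cleaning. A real project would normally use a library implementation for all three models.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(text.split())
        self.vocabulary = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            total_tokens = sum(counts.values())
            # Bayes' theorem in log space: log P(class) + sum of log P(token | class)
            score = math.log(n_docs / total_docs)
            for token in text.split():
                if token in self.vocabulary:  # skip words never seen in training
                    score += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


texts = [
    "monthly apartment rent payment",
    "grocery shopping walmart",
    "electric bill payment",
    "netflix monthly subscription",
    "gas station fuel",
]
labels = ["rent", "groceries", "utilities", "subscription", "transportation"]
clf = TinyNaiveBayes().fit(texts, labels)
print(clf.predict("apartment rent"))  # rent
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the add-one smoothing keeps a single unseen word from zeroing out a class.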

4. Classification
The best-performing model is used to classify new transactions into categories like:

Rent
Groceries
Utilities
Subscriptions
Transportation
Dining
Shopping
Healthcare
Fitness

🤖 Large Language Model Approach
Conceptual Implementation
For improved performance, the system could be enhanced using transformer models:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Number of output labels; assumed here to be the nine categories listed above
num_classes = 9

# Load the pre-trained BERT tokenizer and a model with a new classification head
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModelForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=num_classes
)

# Fine-tune on transaction data
# (See full implementation in the app)
Benefits of LLM Approach

Better Context Understanding: Captures semantic meaning
Higher Accuracy: Fine-tuned transformers often outperform TF-IDF baselines on text classification
Transfer Learning: Leverages pre-trained knowledge
Robust to Variations: Handles different phrasings better

Implementation Considerations

Computational Requirements: Fine-tuning is slow without a GPU
Training Time: Longer than traditional ML
Model Size: Larger deployment footprint
Complexity: More complex pipeline

📈 Model Evaluation
Models are evaluated using:

Accuracy: Share of all predictions that are correct
Precision: Share of predictions for a category that are actually in that category
Recall: Share of transactions in a category that the model finds
F1-Score: Harmonic mean of precision and recall
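
The four metrics can be computed by hand for a multi-class problem by treating each category in turn as the positive class. The sketch below is a minimal version of that calculation; the function name `per_class_metrics` is hypothetical and the labels in the usage example are made up for illustration, not results from this system.

```python
def per_class_metrics(y_true, y_pred):
    """Return (accuracy, {label: (precision, recall, f1)})."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1)
    return accuracy, report


# Illustrative labels only (not measured results).
y_true = ["rent", "rent", "groceries", "utilities"]
y_pred = ["rent", "groceries", "groceries", "utilities"]
accuracy, report = per_class_metrics(y_true, y_pred)
print(accuracy)  # 0.75
```

In this toy example one rent transaction is misfiled as groceries, so rent keeps perfect precision but loses recall, while groceries keeps perfect recall but loses precision.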

🔄 API Usage
While the main interface is web-based, the core functionality can be adapted for API usage:
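# Example classification
# Assumes preprocess_text, vectorizer (the fitted TF-IDF vectorizer) and
# model (the trained classifier) already exist, e.g. from the training step.
def classify_transaction(purpose_text):
    """Return the predicted category for one purpose string."""
    cleaned_text = preprocess_text(purpose_text)
    vectorized = vectorizer.transform([cleaned_text])
    prediction = model.predict(vectorized)[0]
    return prediction


# Usage
result = classify_transaction("Monthly apartment rent payment")
print(f"Predicted category: {result}")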

🚀 Deployment to Hugging Face Spaces

Step 1: Create Space



Go to Hugging Face Spaces
Click "Create new Space"
Choose "Streamlit" as the SDK
Set space name and visibility



Step 2: Upload Files

Upload these files to your space:



app.py
requirements.txt
README.md



Step 3: Configuration

Because the Space was created with the Streamlit SDK, it installs the packages in requirements.txt and launches app.py automatically.

Step 4: Access

Your app will be available at: https://huggingface.co/spaces/yourusername/yourspacename

📝 Sample Data

The system includes sample transaction data covering common categories:

| Purpose Text | Transaction Type |
|---|---|
| "Monthly apartment rent payment" | rent |
| "Grocery shopping at walmart" | groceries |
| "Electric bill payment" | utilities |
| "Netflix monthly subscription" | subscription |
| "Gas station fuel" | transportation |

🤝 Contributing



Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request



📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔮 Future Enhancements



Add more transaction categories
Implement ensemble methods
Add confidence scoring
Include data upload functionality
Add model retraining capability
Implement A/B testing framework
Add logging and monitoring