---
language:
- fa
metrics:
- f1
- accuracy
- precision
- recall
base_model:
- sbunlp/fabert
pipeline_tag: text-classification
tags:
- code
---
|
# **Fine-Tuned FaBERT Model for Formality Classification** |
|
|
|
This repository contains a fine-tuned version of **FaBERT**, a pre-trained Persian language model, adapted for **formality classification**. The model classifies Persian text as **formal** or **informal**, making it suitable for applications such as content moderation, social media monitoring, and customer support automation.
|
|
|
## **Model Overview** |
|
- **Architecture:** Built on **FaBERT**, a BERT-based encoder pre-trained on Persian text.
- **Task:** **Formality classification** – labeling Persian text as formal or informal (the label mapping can be inspected as shown below).
- **Fine-Tuning:** Fine-tuned on a custom dataset containing a variety of formal and informal text.
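
The mapping between class indices and label names lives in the model's configuration. A minimal check, assuming the checkpoint ships the standard `id2label` mapping (the exact label names depend on how the model was saved):

```python
from transformers import AutoConfig

# Assumption: the exported config carries the standard id2label mapping;
# the names (e.g. "formal"/"informal" vs. "LABEL_0"/"LABEL_1") depend on
# how the checkpoint was saved.
config = AutoConfig.from_pretrained("faimlab/fabert_formality_classifier")
print(config.num_labels)  # expected: 2
print(config.id2label)
```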
|
|
|
## **Key Features** |
|
- **Persian Language Focus:** Built on FaBERT, which is pre-trained on Persian text, so the classifier is intended for Persian-language input.
- **High Performance:** Fine-tuned for accurate formal-vs-informal predictions, evaluated with F1, accuracy, precision, and recall.
- **Efficient for Deployment:** Suitable for real-time use in environments like social media platforms, content moderation tools, and communication systems.
|
|
|
## **How to Use the Model** |
|
|
|
You can use this model in Python with the Hugging Face `transformers` library and PyTorch. The snippet below loads the model, tokenizes an input sentence, runs inference, and maps the prediction back to a formality label.
|
|
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("faimlab/fabert_formality_classifier")
model = AutoModelForSequenceClassification.from_pretrained("faimlab/fabert_formality_classifier")

# Run on GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Example input: a formal Persian sentence ("Please review the attached report.")
input_text = "لطفاً گزارش پیوست را بررسی بفرمایید."

# Tokenize the input
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True, max_length=512)

# Move the tokenized inputs to the same device as the model
inputs = {key: value.to(device) for key, value in inputs.items()}

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Map the highest-scoring class index to its label name
predicted_class = logits.argmax(dim=1).item()
print(f"Predicted label: {model.config.id2label[predicted_class]}")
```