---
license: apache-2.0
language:
- it
metrics:
- accuracy
base_model:
- DeepMount00/ModernBERT-base-ita
pipeline_tag: text-classification
---
# Text Quality Classifier (Binary)
This model aims to classify the general quality and educational content of a given text. It has two labels: `LABEL_0` means **bad quality** and `LABEL_1` means **good quality**.
It can be used to efficiently filter large quantities of raw text by quality, which is useful for building Italian pretraining datasets.
The model tends to classify as "good quality" Wikipedia-like texts: educational, well structured, and clearly explained.
## How to get access
This is a private model. To request access, tell us how you plan to use it by writing to <a href="mailto:[email protected]">[email protected]</a>.
## Eval
During evaluation, the model obtained the following metrics:
* **Eval Loss:** 0.3422
* **Accuracy:** 0.8607
* **F1-Score:** 0.8597
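As a reminder of how these figures relate to raw predictions, the sketch below computes accuracy and F1 from the four cells of a binary confusion matrix. The function name `binary_metrics` and the counts in the usage lines are hypothetical, chosen for illustration only; they are not this model's evaluation counts. The sketch also assumes F1 is computed for the positive class (`LABEL_1`), while this card does not state which averaging was used.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return accuracy, precision, recall and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, NOT taken from this model's evaluation
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```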
## How to use
```python
from transformers import pipeline

MODEL = "ReDiX/text-quality-classifier-ita"

# Load the classifier and its tokenizer from the same repository
pipe = pipeline("text-classification", model=MODEL, tokenizer=MODEL)

example_text = "Questo è un testo di esempio in italiano per la classificazione."
# The pipeline returns a list with one dict holding the predicted label and its score
result = pipe(example_text)
print(f"TEXT: '{example_text}'")
print(f"RESULT: {result}")
```
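For the corpus-filtering use case described at the top of this card, a minimal sketch might look like the following. The helper `filter_good_quality`, the `min_score` threshold, and the batch size are assumptions made for illustration, not part of the model's API. `classify` stands for any callable that, given a list of strings, returns one `{"label": ..., "score": ...}` dict per text, as a `text-classification` pipeline does.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


def filter_good_quality(
    texts: Iterable[str],
    classify: Callable[[List[str]], List[Dict]],
    min_score: float = 0.5,   # hypothetical threshold, tune on your own data
    batch_size: int = 32,     # hypothetical batch size
) -> Iterator[str]:
    """Yield only the texts that `classify` labels LABEL_1 with enough confidence."""
    iterator = iter(texts)
    while True:
        # Pull the next batch lazily so the whole corpus never sits in memory
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        predictions = classify(batch)
        for text, pred in zip(batch, predictions):
            if pred["label"] == "LABEL_1" and pred["score"] >= min_score:
                yield text
```

With a pipeline loaded as `pipe`, a call such as `list(filter_good_quality(corpus, lambda batch: pipe(batch, truncation=True)))` would keep only the texts labelled `LABEL_1`. Passing `truncation=True` is a suggestion so that long documents are cut to the model's maximum input length; a stricter `min_score` trades recall for precision.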
## Confusion matrix
![Confusion matrix](confusion_matrix.png)