Text Quality Classifier (Binary)
This model aims to classify the general quality and educational content of a given text. The available labels are 'LABEL_0', meaning bad quality, and 'LABEL_1', meaning good quality. It can be used to efficiently filter large quantities of raw text by quality, which makes it useful for building Italian pretraining datasets. The model tends to classify Wikipedia-like texts as "good quality", that is, texts that are educational, well structured, and clearly explained.
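By default, the text-classification pipeline returns only the top label and its score. For filtering, a single continuous "good quality" probability can be easier to threshold. The sketch below derives one from a top-1 prediction. It assumes the classification head is a softmax over the two labels (the usual setup for sequence classification), so the other label's probability is `1 - score`. `good_quality_probability` is a hypothetical helper, not part of the model or of `transformers`.

```python
def good_quality_probability(prediction: dict) -> float:
    """Map a top-1 prediction {"label": ..., "score": ...} to P(good quality).

    Assumes a two-class softmax head (hypothetical helper, not a library API).
    """
    label = prediction["label"]
    score = float(prediction["score"])
    if label == "LABEL_1":  # good quality
        return score
    if label == "LABEL_0":  # bad quality: the remaining mass belongs to LABEL_1
        return 1.0 - score
    raise ValueError(f"Unexpected label: {label!r}")


print(good_quality_probability({"label": "LABEL_1", "score": 0.9}))   # 0.9
print(good_quality_probability({"label": "LABEL_0", "score": 0.75}))  # 0.25
```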
How to get access
This is a private model. To request access, please explain how you intend to use it by writing to [email protected]
Eval
During evaluation, the model achieved the following metrics:
- Eval Loss: 0.3422
- Accuracy: 0.8607
- F1-Score: 0.8597
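To compare your own labeled Italian data against these figures, accuracy and F1 can be computed from gold and predicted labels as sketched below. The card does not say how F1 was averaged. This sketch assumes binary F1 with 'LABEL_1' (good quality) as the positive class, so the numbers may not match the reported ones exactly. `accuracy_and_f1` and the sample labels are illustrative only.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "LABEL_1"
) -> Tuple[float, float]:
    """Accuracy and binary F1, treating `positive` as the positive class (assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Illustrative labels, not real evaluation data.
gold = ["LABEL_1", "LABEL_0", "LABEL_1", "LABEL_0"]
pred = ["LABEL_1", "LABEL_1", "LABEL_1", "LABEL_0"]
acc, f1 = accuracy_and_f1(gold, pred)
print(f"Accuracy: {acc:.4f}  F1: {f1:.4f}")  # Accuracy: 0.7500  F1: 0.8000
```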
How to use
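```python
from transformers import pipeline

# The model is private: you need granted access and to be logged in to the Hugging Face Hub.
MODEL = "ReDiX/text-quality-classifier-ita"
pipe = pipeline("text-classification", model=MODEL, tokenizer=MODEL)

# "This is an example text in Italian for classification."
example_text = "Questo è un testo di esempio in italiano per la classificazione."
result = pipe(example_text)  # a list with one dict: {"label": "LABEL_0" or "LABEL_1", "score": float}

print(f"TEXT: '{example_text}'")
print(f"RESULT: {result}")
```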
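To filter a large corpus, the texts can be classified in batches and only those predicted as 'LABEL_1' above a confidence threshold are kept. The sketch below takes the classifier as any callable that maps a list of texts to a list of `{"label", "score"}` dicts, which is how the pipeline behaves when called on a list. This lets you try the logic without model access. `filter_good_quality`, the 0.8 threshold, the batch size, and `fake_classifier` are all illustrative assumptions.

```python
from typing import Callable, Iterable, Iterator, List

Classifier = Callable[[List[str]], List[dict]]


def _keep_good(batch: List[str], classifier: Classifier, threshold: float) -> Iterator[str]:
    # Classify one batch and yield texts predicted as good quality with enough confidence.
    for text, pred in zip(batch, classifier(batch)):
        if pred["label"] == "LABEL_1" and pred["score"] >= threshold:
            yield text


def filter_good_quality(
    texts: Iterable[str], classifier: Classifier, threshold: float = 0.8, batch_size: int = 32
) -> Iterator[str]:
    """Lazily yield texts classified as 'LABEL_1' (hypothetical helper)."""
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield from _keep_good(batch, classifier, threshold)
            batch = []
    if batch:  # leftover texts that did not fill a full batch
        yield from _keep_good(batch, classifier, threshold)


def fake_classifier(batch: List[str]) -> List[dict]:
    # Stand-in for the real pipeline: calls any text longer than 20 characters "good".
    return [
        {"label": "LABEL_1" if len(t) > 20 else "LABEL_0", "score": 0.9} for t in batch
    ]


docs = [
    "Breve.",
    "Un testo lungo, ben strutturato e informativo.",
    "Ok",
    "Un altro paragrafo abbastanza lungo da passare.",
]
kept = list(filter_good_quality(docs, fake_classifier, threshold=0.8, batch_size=3))
print(kept)
```

With access to the model, the real pipeline could be plugged in, for example as `lambda batch: pipe(batch, truncation=True)`, so that texts longer than the model's maximum input length are truncated instead of raising an error.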
Base model
DeepMount00/ModernBERT-base-ita