metadata

license: apache-2.0
language:
  - it
metrics:
  - accuracy
base_model:
  - DeepMount00/ModernBERT-base-ita
pipeline_tag: text-classification

Text Quality Classifier (Binary)

This model aims to classify the general quality and educational content of a given text. It outputs one of two labels: 'LABEL_0' (bad quality) or 'LABEL_1' (good quality). It can be used to efficiently filter large quantities of raw text by quality, which is useful for building Italian pretraining datasets. The model tends to classify as "good quality" Wikipedia-like texts, meaning texts that are educational, well structured, and clearly explained.

How to get access

This is a private model. To request access, tell us how you plan to use it by writing to [email protected]

Eval

During evaluation, the model achieved the following metrics:

Eval Loss: 0.3422
Accuracy: 0.8607
F1-Score: 0.8597

How to use

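from transformers import pipeline

# The repository is private: loading it requires an account that has been granted access.
MODEL = "ReDiX/text-quality-classifier-ita"
pipe = pipeline("text-classification", model=MODEL, tokenizer=MODEL)

# Classify a single Italian text; the result is a list with one dict
# holding the predicted label (LABEL_0 or LABEL_1) and its score.
example_text = "Questo è un testo di esempio in italiano per la classificazione."
result = pipe(example_text)
print(f"TEXT: '{example_text}'")
print(f"RESULT: {result}")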
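For the dataset-filtering use case described above, a minimal sketch of a filter might look like the following. The helper name `keep_good_quality`, the `min_score` threshold, and the stand-in classifier `fake_classify` are all assumptions made for illustration and are not part of the model or of `transformers`. The stand-in only mimics the list-of-dicts output shape of a text-classification pipeline, so the sketch runs without access to the private model.

```python
from typing import Callable, Dict, Iterable, List


def keep_good_quality(
    texts: Iterable[str],
    classify: Callable[[List[str]], List[Dict]],
    min_score: float = 0.5,  # hypothetical threshold, tune on your own data
) -> List[str]:
    """Return only the texts labelled LABEL_1 with a score of at least min_score."""
    batch = list(texts)
    predictions = classify(batch)  # expected: one {"label", "score"} dict per text
    return [
        text
        for text, pred in zip(batch, predictions)
        if pred["label"] == "LABEL_1" and pred["score"] >= min_score
    ]


def fake_classify(batch: List[str]) -> List[Dict]:
    """Hypothetical stand-in for the real pipeline, used only to make the sketch runnable."""
    return [
        {"label": "LABEL_1" if "good" in text else "LABEL_0", "score": 0.9}
        for text in batch
    ]


sample = ["good text", "bad text"]
kept = keep_good_quality(sample, fake_classify, min_score=0.8)
print(kept)
```

With access to the model, the `pipe` object from the snippet above could be passed in place of `fake_classify`, since calling a text-classification pipeline on a list of strings returns one prediction dict per input. Raising `min_score` trades recall for precision, and a suitable value would need to be checked on a held-out sample.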