Text Quality Classifier (Binary)

This model aims to classify the general quality and educational value of a given text. It outputs one of two labels: 'LABEL_0' (bad quality) or 'LABEL_1' (good quality). It can be used to efficiently filter large quantities of raw text by quality, which is useful for building Italian pretraining datasets. The model tends to classify as "good quality" Wikipedia-like texts that are educational, well structured, and clearly explained.

How to get access

This is a private model. To request access, explain how you plan to use the model by writing to [email protected]

Eval

During the evaluation phase, the model achieved the following metrics:

Eval Loss: 0.3422
Accuracy: 0.8607
F1-Score: 0.8597
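
To compare these figures against your own labelled sample, the sketch below shows how accuracy and F1 can be computed for a binary task. It assumes that F1 is measured with LABEL_1 (good quality) as the positive class; the card does not state which averaging was used for the reported score, and the function name `binary_accuracy_f1` is hypothetical.

```python
from typing import Sequence, Tuple


def binary_accuracy_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, treating 1 as the positive class.

    Assumption: 1 stands for LABEL_1 (good quality) and 0 for LABEL_0 (bad quality).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count the confusion-matrix cells needed for both metrics.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all.
    denominator = 2 * tp + fp + fn
    f1 = (2 * tp / denominator) if denominator else 0.0
    return accuracy, f1
```

Both values fall between 0 and 1, on the same scale as the Accuracy and F1-Score reported above.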

How to use

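from transformers import pipeline

# The model is private: log in to the Hugging Face Hub with an account
# that has been granted access before running this script.
MODEL = "ReDiX/text-quality-classifier-ita"

# Load the model and its tokenizer into a text-classification pipeline.
pipe = pipeline("text-classification", model=MODEL, tokenizer=MODEL)

# Classify a single Italian sentence.
example_text = "Questo è un testo di esempio in italiano per la classificazione."
result = pipe(example_text)

# result is a list with one dict holding the predicted 'label' and its 'score'.
print(f"TEXT: '{example_text}'")
print(f"RESULT: {result}")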
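
For the dataset-filtering use case described at the top, a minimal sketch might look like the following. The helper `keep_good_quality`, its `min_score` threshold, and the stand-in classifier in the demo are all hypothetical and not part of this model's API; the only assumption about the real pipeline is that it returns one dict with 'label' and 'score' keys per input text.

```python
from typing import Callable, Dict, Iterable, List, Union

# One prediction as produced by a text-classification pipeline.
Prediction = Dict[str, Union[str, float]]


def keep_good_quality(
    texts: Iterable[str],
    classify: Callable[[List[str]], List[Prediction]],
    min_score: float = 0.5,
) -> List[str]:
    """Keep only texts labelled LABEL_1 with a score of at least min_score.

    min_score is an illustrative threshold, not a recommended value.
    """
    batch = list(texts)
    if not batch:
        return []
    predictions = classify(batch)
    return [
        text
        for text, pred in zip(batch, predictions)
        if pred["label"] == "LABEL_1" and float(pred["score"]) >= min_score
    ]


if __name__ == "__main__":
    # Stand-in classifier so the sketch runs without access to the private model:
    # it marks texts longer than 40 characters as good quality.
    def fake_classify(batch: List[str]) -> List[Prediction]:
        return [
            {"label": "LABEL_1" if len(t) > 40 else "LABEL_0", "score": 0.9}
            for t in batch
        ]

    docs = [
        "clicca qui!!!",
        "La fotosintesi è il processo con cui le piante producono zuccheri dalla luce.",
    ]
    print(keep_good_quality(docs, fake_classify, min_score=0.8))
```

With access to the model, the `pipe` object created above can be passed as `classify`, for example `keep_good_quality(docs, pipe, min_score=0.8)`.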
