Text Quality Classifier (Binary)

This model aims to classify the general quality and educational content of a given text. The available labels are 'LABEL_0' (bad quality) and 'LABEL_1' (good quality). It can be used to efficiently filter large quantities of raw text by quality, which is useful for building Italian pretraining datasets. The model tends to classify as "good quality" Wikipedia-like texts: educational, well structured, and clearly explained.

How to get access

This is a private model. To request access, tell us how you plan to use it by writing to [email protected]

Eval

During the evaluation phase, the model achieved the following metrics:

  • Eval Loss: 0.3422
  • Accuracy: 0.8607
  • F1-Score: 0.8597

How to use

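from transformers import pipeline

# Load the classifier and its tokenizer from the Hub (access must be granted first)
MODEL = "ReDiX/text-quality-classifier-ita"
pipe = pipeline("text-classification", model=MODEL, tokenizer=MODEL)

# Classify one Italian sentence; the result is a list holding a label/score dict
example_text = "Questo è un testo di esempio in italiano per la classificazione."
result = pipe(example_text)
print(f"TEXT: '{example_text}'")
print(f"RESULT: {result}")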
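For the dataset-filtering use case described above, a minimal sketch of a batch filter could look like the following. It is an illustration, not the authors' pipeline: the names `filter_good_quality` and `_keep_good`, the `min_score` threshold, and the `batch_size` value are all hypothetical, and it assumes the classifier returns one dict with `label` and `score` keys per input text, as the `transformers` text-classification pipeline does.

```python
from typing import Callable, Dict, Iterable, Iterator, List

GOOD_LABEL = "LABEL_1"  # per the model card, LABEL_1 means good quality

# Hypothetical type alias: any callable mapping a list of texts to a list of
# {"label": ..., "score": ...} dicts, one per text.
Classifier = Callable[[List[str]], List[Dict]]


def _keep_good(batch: List[str], classify: Classifier, min_score: float) -> Iterator[str]:
    """Yield the texts in one batch that are labelled good with enough confidence."""
    predictions = classify(batch)
    for text, pred in zip(batch, predictions):
        if pred["label"] == GOOD_LABEL and pred["score"] >= min_score:
            yield text


def filter_good_quality(
    texts: Iterable[str],
    classify: Classifier,
    min_score: float = 0.5,  # assumed threshold, tune on your own data
    batch_size: int = 32,    # assumed batch size
) -> Iterator[str]:
    """Stream over texts in batches and yield only those classified as good quality."""
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield from _keep_good(batch, classify, min_score)
            batch = []
    if batch:  # classify the final, possibly shorter batch
        yield from _keep_good(batch, classify, min_score)
```

With the `pipe` object created above, `list(filter_good_quality(corpus, pipe))` would return the texts in `corpus` labelled `LABEL_1`. Because the function is a generator, it can stream over a corpus that does not fit in memory.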
