
	---
	license: apache-2.0
	language:
	- it
	metrics:
	- accuracy
	base_model:
	- DeepMount00/ModernBERT-base-ita
	pipeline_tag: text-classification
	---

	# Text Quality Classifier (Binary)

	This model aims to classify the general quality and educational content of a given text. It outputs one of two labels: 'LABEL_0' (bad quality) and 'LABEL_1' (good quality).
	It can be used to efficiently filter large quantities of raw text by quality, which is useful for building Italian pretraining datasets.
	The model tends to classify as "good quality" Wikipedia-like texts: educational, well structured, and clearly explained.
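
The filtering use case could be sketched roughly as below. This is only an illustration, not part of the model's tooling: the helper name `filter_good_quality` and the `min_score` cutoff are hypothetical, and the classifier is passed in as a callable that returns predictions in the usual `transformers` text-classification shape (`{"label": ..., "score": ...}`).

```python
from typing import Callable, Iterable

# One prediction in the shape returned by a transformers
# text-classification pipeline, e.g. {"label": "LABEL_1", "score": 0.97}.
Prediction = dict


def filter_good_quality(
    texts: Iterable[str],
    classify: Callable[[list[str]], list[Prediction]],
    min_score: float = 0.5,  # hypothetical cutoff; tune it on your own data
) -> list[str]:
    """Keep only the texts labelled LABEL_1 (good quality)."""
    texts = list(texts)
    if not texts:
        return []
    # Classify the whole batch in one call, one prediction per text.
    predictions = classify(texts)
    return [
        text
        for text, pred in zip(texts, predictions)
        if pred["label"] == "LABEL_1" and pred["score"] >= min_score
    ]
```

With a loaded pipeline `pipe`, the call would be `filter_good_quality(docs, pipe)`, since a text-classification pipeline called on a list of strings returns one prediction per input. Raising `min_score` trades recall for a cleaner dataset.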

	## How to get access
	This is a private model. To request access, tell us how you plan to use it at <a href="mailto:[email protected]">[email protected]</a>


	## Eval

	During evaluation, the model achieved the following metrics:

	* Eval Loss: 0.3422
	* Accuracy: 0.8607
	* F1-Score: 0.8597

	## How to use

	```python
	from transformers import pipeline

	MODEL = "ReDiX/text-quality-classifier-ita"
	pipe = pipeline("text-classification", model=MODEL, tokenizer=MODEL)

	example_text = "Questo è un testo di esempio in italiano per la classificazione."
	result = pipe(example_text)
	print(f"TEXT: '{example_text}'")
	print(f"RESULT: {result}")
	```

	## Confusion matrix

	![](confusion_matrix.png)
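
For readers who want to relate a binary confusion matrix to accuracy and F1, here is a minimal sketch. The function name `binary_metrics` and the counts in the example are hypothetical; they are not the values behind the figure. The card does not state which F1 averaging was used for the reported score, so the sketch returns per-class and macro F1.

```python
def binary_metrics(tn: int, fp: int, fn: int, tp: int) -> dict:
    """Accuracy and F1 scores from the four cells of a 2x2 confusion matrix.

    LABEL_1 (good quality) is the positive class:
    tn = bad predicted bad,   fp = bad predicted good,
    fn = good predicted bad,  tp = good predicted good.
    """
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0

    def f1(true_pos: int, false_pos: int, false_neg: int) -> float:
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    f1_good = f1(tp, fp, fn)  # LABEL_1 as the positive class
    f1_bad = f1(tn, fn, fp)   # LABEL_0 as the positive class (roles swapped)
    return {
        "accuracy": accuracy,
        "f1_good": f1_good,
        "f1_bad": f1_bad,
        "f1_macro": (f1_good + f1_bad) / 2,
    }


# Hypothetical counts, for illustration only.
print(binary_metrics(tn=40, fp=10, fn=10, tp=40))
```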