DeepakKumarMSL committed on
Commit
4cd7090
·
verified ·
1 Parent(s): 1aac351

Create README.md

Files changed (1)
  1. README.md +75 -0
README.md ADDED
@@ -0,0 +1,75 @@
+ # Zero-Shot Text Classification using `facebook/bart-large-mnli`
+
+ This repository demonstrates how to use the [`facebook/bart-large-mnli`](https://huggingface.co/facebook/bart-large-mnli) model for **zero-shot text classification** based on **natural language inference (NLI)**.
+
+ We extend the base usage by:
+ - Using a labeled dataset for benchmarking
+ - Performing optional fine-tuning
+ - Converting the model to FP16 (half precision)
+ - Scoring model performance
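
The FP16 step in the list above halves the storage per weight from 4 bytes to 2. The sketch below is only back-of-the-envelope arithmetic using the standard library: the parameter count (roughly 407 million) is an assumed round figure, and the helper names are hypothetical. In PyTorch, the conversion itself is typically a call to `model.half()`.

```python
import struct

# Assumed, approximate parameter count for facebook/bart-large-mnli.
APPROX_NUM_PARAMS = 407_000_000


def weights_size_gb(num_params: int, bytes_per_param: int) -> float:
    """Storage needed for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def round_trip_fp16(value: float) -> float:
    """Store a Python float as IEEE 754 half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


fp32_gb = weights_size_gb(APPROX_NUM_PARAMS, 4)
fp16_gb = weights_size_gb(APPROX_NUM_PARAMS, 2)
print(f"FP32 weights: ~{fp32_gb:.2f} GB, FP16 weights: ~{fp16_gb:.2f} GB")

# Half precision keeps only about three significant decimal digits,
# so a stored weight comes back slightly rounded.
weight = 0.123456789
print(f"{weight} stored as FP16 becomes {round_trip_fp16(weight)}")
```

The estimate covers weights only; activations and any optimizer state add to the real memory footprint.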
+
+ ---
+
+ ## 📌 Model Description
+
+ - **Model:** `facebook/bart-large-mnli`
+ - **Type:** NLI-based zero-shot classifier
+ - **Architecture:** BART (Bidirectional and Auto-Regressive Transformers)
+ - **Usage:** Classifies text by scoring label hypotheses as NLI entailment
+
+ ---
+
+ ## 📂 Dataset
+
+ We use the [`yahoo_answers_topics`](https://huggingface.co/datasets/yahoo_answers_topics) dataset from Hugging Face for evaluation. It contains questions categorized into 10 topics.
+
+ ```python
+ from datasets import load_dataset
+
+ dataset = load_dataset("yahoo_answers_topics")
+ ```
+
+ ## 🧠 Zero-Shot Classification Logic
+ The model checks whether a text entails a hypothesis such as:
+
+ "This text is about sports."
+
+ Each candidate label (e.g., "sports", "education", "health") is converted into a hypothesis of this form, and the model's entailment score for each hypothesis is used to rank the labels.
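
To make the scoring step concrete, here is a minimal sketch of how per-label entailment logits could be turned into a ranking. It is not the library's implementation: the helper names and the logit values are hypothetical, and the template string simply reuses the example hypothesis above (the Transformers pipeline defaults to the template "This example is {}."). When exactly one label is expected per text, a softmax over the entailment logits of all candidate labels gives scores that sum to 1.

```python
import math


def build_hypotheses(labels, template="This text is about {}."):
    """Turn each candidate label into an NLI hypothesis sentence."""
    return [template.format(label) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax the entailment logits across labels and sort by score."""
    peak = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in entailment_logits]
    total = sum(exps)
    scored = [(label, e / total) for label, e in zip(labels, exps)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


labels = ["sports", "education", "health"]
hypotheses = build_hypotheses(labels)
print(hypotheses)

# Hypothetical entailment logits, one per (text, hypothesis) pair.
# A real run would read these from the NLI model's output.
example_logits = [3.2, -1.0, 0.4]

for label, score in rank_labels(labels, example_logits):
    print(f"{label}: {score:.3f}")
```

The label with the largest entailment logit always ranks first; the softmax only rescales the logits into comparable scores.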
+
+ ## ✅ Example: Inference with Zero-Shot Pipeline
+ ```python
+ from transformers import pipeline
+
+ classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
+
+ sequence = "The team played well and won the championship."
+ labels = ["sports", "politics", "education", "technology"]
+
+ result = classifier(sequence, candidate_labels=labels)
+ print(result)
+ ```
+
+ ## 📊 Scoring / Evaluation
+ Evaluate zero-shot classification with top-1 accuracy (the same loop extends to top-k accuracy by checking the first k entries of `result["labels"]`):
+
+ ```python
+ # Reuses `classifier` and `dataset` from the snippets above.
+ def evaluate_zero_shot(dataset, labels):
+     """Return the top-1 accuracy of the zero-shot classifier on `dataset`."""
+     correct = 0
+     total = 0
+     for example in dataset:
+         result = classifier(example["question_content"], candidate_labels=labels)
+         predicted = result["labels"][0]  # labels are sorted by descending score
+         true = labels[example["topic"]]  # "topic" is an integer class index
+         correct += int(predicted == true)
+         total += 1
+     return correct / total
+
+ # The order must match the dataset's topic indices (0-9).
+ labels = ["Society & Culture", "Science & Mathematics", "Health", "Education",
+           "Computers & Internet", "Sports", "Business & Finance", "Entertainment & Music",
+           "Family & Relationships", "Politics & Government"]
+
+ # Evaluate on the first 100 test examples to keep the run short.
+ acc = evaluate_zero_shot(dataset["test"].select(range(100)), labels)
+ print(f"Accuracy: {acc:.2%}")
+ ```
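
The evaluation loop above counts only the single best-scored label. Top-k accuracy can be sketched in the same spirit, relying on the pipeline returning `result["labels"]` sorted from highest to lowest score. The helper below is a hypothetical illustration that works on rankings already collected; the sample data is made up for demonstration and is not a measured result.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=3):
    """Fraction of examples whose true label is among the k best-scored labels."""
    if not true_labels:
        return 0.0
    hits = sum(
        true in ranked[:k]
        for ranked, true in zip(ranked_predictions, true_labels)
    )
    return hits / len(true_labels)


# Made-up rankings, each ordered like result["labels"] (best first).
ranked_predictions = [
    ["Sports", "Health", "Education"],
    ["Health", "Sports", "Education"],
    ["Education", "Sports", "Health"],
]
true_labels = ["Sports", "Sports", "Health"]

print(f"Top-1 accuracy: {top_k_accuracy(ranked_predictions, true_labels, k=1):.2%}")
print(f"Top-2 accuracy: {top_k_accuracy(ranked_predictions, true_labels, k=2):.2%}")
```

With `k=1` this reduces to the plain accuracy computed above, so the two metrics can share one set of collected rankings.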