gautamnancy committed
Commit e4b7e3f · verified · 1 Parent(s): 6b0c80b

Upload 7 files

README.md ADDED
@@ -0,0 +1,189 @@
# RoBERTa-Base Model for Emotion Classification

This repository hosts a fine-tuned version of the RoBERTa model for emotion classification tasks. The model has been trained to accurately classify text into six emotion categories, making it suitable for sentiment analysis and emotional content understanding.

---

## Model Details

- **Model Name:** RoBERTa-Base for Emotion Classification
- **Model Architecture:** RoBERTa Base
- **Task:** Emotion Classification
- **Dataset:** Hugging Face Emotion Dataset
- **Quantization:** Float16 version available
- **Fine-tuning Framework:** Hugging Face Transformers

---

## Usage

### Installation

```
pip install transformers torch
```

### Loading the Model

```
from transformers import RobertaTokenizer, RobertaForSequenceClassification
import torch
import re

# Load model and tokenizer
model_path = "emotion-model"  # or "quantized-emotion-model" for the quantized version
model = RobertaForSequenceClassification.from_pretrained(model_path)
tokenizer = RobertaTokenizer.from_pretrained(model_path)

# Set device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device)
```
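
By default this checkpoint reports the generic `LABEL_0`–`LABEL_5` names from its config rather than the emotion names used below. If readable names are wanted in the outputs, they can be attached at load time; the mapping here follows this README's label map and is an assumption, not part of the shipped config:

```
# Sketch: override the generic id2label names at load time.
# The emotion names follow this README's label map (an assumption).
id2label = {0: "sadness", 1: "joy", 2: "love", 3: "anger", 4: "fear", 5: "surprise"}
model = RobertaForSequenceClassification.from_pretrained(
    model_path,
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
)
```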

### Prediction Function

```
def predict_emotions(texts, model, tokenizer, device='cpu'):
    """
    Predicts emotion labels for input text(s) using a fine-tuned transformer model.

    Args:
        texts (str or List[str]): A single string or list of strings to classify.
        model: Trained transformer model.
        tokenizer: Corresponding tokenizer.
        device (str): 'cpu' or 'cuda'. Default is 'cpu'.

    Returns:
        List[str]: List of predicted emotion labels.
    """
    # Ensure model is on the correct device
    model.to(device)

    # If a single string is passed, convert it to a list
    if isinstance(texts, str):
        texts = [texts]

    # Preprocess: simple text cleaning
    def preprocess(text):
        text = text.lower()
        text = re.sub(r"http\S+|www\S+|https\S+", '', text)
        text = re.sub(r'\@\w+|\#', '', text)
        text = re.sub(r"[^a-zA-Z0-9\s.,!?']", '', text)
        text = re.sub(r'\s+', ' ', text).strip()
        return text

    cleaned_texts = [preprocess(t) for t in texts]

    # Tokenize
    inputs = tokenizer(cleaned_texts, padding=True, truncation=True, return_tensors="pt").to(device)

    # Inference
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
        preds = torch.argmax(outputs.logits, dim=1).tolist()

    # Emotion dataset label map
    label_map = {
        0: "sadness",
        1: "joy",
        2: "love",
        3: "anger",
        4: "fear",
        5: "surprise"
    }

    return [label_map[p] for p in preds]
```

### Example Usage

```
# Example texts
sample_texts = [
    "I'm so happy about the new job opportunity!",
    "I can't believe they cancelled my favorite show. This is terrible.",
    "The sunset over the mountains took my breath away. It was magnificent!"
]

# Run predictions
results = predict_emotions(sample_texts, model, tokenizer, device)

# Show results
for text, emotion in zip(sample_texts, results):
    print(f"Text: {text}\nPredicted Emotion: {emotion}\n")
```
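
The float16 checkpoint can be used the same way. A minimal sketch, assuming the `quantized-emotion-model` directory from the repository layout below and a CUDA device (half-precision inference on CPU is slow and unsupported for some ops):

```
# Sketch: run the float16 checkpoint on GPU.
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

fp16_model = RobertaForSequenceClassification.from_pretrained(
    "quantized-emotion-model", torch_dtype=torch.float16
)
fp16_tokenizer = RobertaTokenizer.from_pretrained("quantized-emotion-model")

if torch.cuda.is_available():
    fp16_model = fp16_model.to("cuda")
    print(predict_emotions(sample_texts, fp16_model, fp16_tokenizer, device="cuda"))
```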

---

## Performance Metrics

- **Accuracy:** 0.94
- **F1 Score:** 0.939736
- **Precision:** 0.941654
- **Recall:** 0.94

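The evaluation script itself is not part of this upload; since the reported recall matches accuracy, the numbers are consistent with weighted-average metrics computed along these lines (scikit-learn assumed):

```
# Sketch: weighted-average metrics of the kind reported above (assumes scikit-learn).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(y_true, y_pred):
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted"
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```
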
---

## Fine-Tuning Details

### Dataset

The model was fine-tuned on the Hugging Face Emotion dataset, which contains text labeled with six emotion categories (a loading sketch follows the list):

- sadness
- joy
- love
- anger
- fear
- surprise

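A loading sketch with the `datasets` library (the `dair-ai/emotion` Hub id is an assumption based on the dataset's current listing):

```
# Sketch: pull the emotion dataset from the Hub (Hub id is an assumption).
from datasets import load_dataset

dataset = load_dataset("dair-ai/emotion")
print(dataset["train"][0])                        # {'text': ..., 'label': 0-5}
print(dataset["train"].features["label"].names)   # ['sadness', 'joy', 'love', ...]
```
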
### Training Configuration

The run used the following settings; a sketch of the corresponding `TrainingArguments` follows the list.

- **Epochs:** 3
- **Batch Size:** 16
- **Learning Rate:** 2e-5
- **Max Length:** 128 tokens
- **Evaluation Strategy:** epoch
- **Weight Decay:** 0.01
- **Optimizer:** AdamW

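One way these settings map onto Hugging Face `TrainingArguments` (a sketch, not the original training script):

```
# Sketch: the listed hyperparameters expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="emotion-model",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    weight_decay=0.01,
    eval_strategy="epoch",  # "evaluation_strategy" on older transformers releases
)
# The 128-token max length is applied at tokenization time, e.g.:
#   tokenizer(batch["text"], truncation=True, max_length=128)
# AdamW is the Trainer's default optimizer.
```
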
### Quantization

A quantized version of the model is available in PyTorch's float16 format, reducing model size and improving inference efficiency.

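A conversion along these lines produces such a checkpoint (a sketch; the exact export script is not included in this upload):

```
# Sketch: cast the fine-tuned model to float16 and save it alongside its tokenizer.
from transformers import RobertaForSequenceClassification, RobertaTokenizer

model = RobertaForSequenceClassification.from_pretrained("emotion-model")
model = model.half()  # cast all weights to torch.float16
model.save_pretrained("quantized-emotion-model")
RobertaTokenizer.from_pretrained("emotion-model").save_pretrained("quantized-emotion-model")
```
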
---

## Repository Structure

```
.
├── emotion-model/              # Full-precision model
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer_config.json
│   ├── special_tokens_map.json
│   ├── vocab.json
│   └── merges.txt
├── quantized-emotion-model/    # Quantized model (float16)
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer_config.json
│   ├── special_tokens_map.json
│   ├── vocab.json
│   └── merges.txt
└── README.md                   # Model documentation
```

---

## Limitations

- The model may not generalize well to domains outside the fine-tuning dataset.
- Emotion detection can be subjective and context-dependent.
- The quantized version may show minor accuracy degradation compared to the full-precision model.

---

## Contributing

Contributions are welcome! Feel free to open an issue or PR for improvements, fixes, or feature extensions.
config (1).json ADDED
@@ -0,0 +1,43 @@
{
  "architectures": [
    "RobertaForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5
  },
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float16",
  "transformers_version": "4.51.3",
  "type_vocab_size": 1,
  "use_cache": true,
  "vocab_size": 50265
}
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model (2).safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4ee4e14c130fd472b333ea524b515cb05798fbc277bce27c0c1a2106b5405174
size 249324580
special_tokens_map (1).json ADDED
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer_config (1).json ADDED
@@ -0,0 +1,57 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50264": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "extra_special_tokens": {},
  "mask_token": "<mask>",
  "model_max_length": 512,
  "pad_token": "<pad>",
  "sep_token": "</s>",
  "tokenizer_class": "RobertaTokenizer",
  "unk_token": "<unk>"
}
vocab.json ADDED
The diff for this file is too large to render. See raw diff