代码旅行 committed on
Commit a1ea779 · verified · 1 parent: 1d7ef86

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,119 @@
---
language: en
license: apache-2.0
library_name: transformers
tags:
- distilbert
- text-classification
- emotion-analysis
- pytorch
- mac-m4-test
pipeline_tag: text-classification
widget:
- text: "I'm so excited to try out the new Mac Mini M4 for machine learning!"
  example_title: "Excitement Example"
- text: "I'm a bit worried about the performance on complex tasks."
  example_title: "Worry Example"
- text: "I am so grateful for all the help you have provided."
  example_title: "Gratitude Example"
---
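
# `distilbert-base-uncased` Finetuned for Emotion Analysis

This is an emotion-analysis model fine-tuned from `distilbert-base-uncased` to recognize **28 fine-grained emotions**.

**Note: this model was created mainly as a technical experiment to test the workflow and performance of local fine-tuning on the new Mac Mini M4. It has not been thoroughly evaluated and is intended for demonstration and experimentation only.**

## Model Description

The model assigns one of 28 emotion labels to a piece of text. This gives a richer, more fine-grained view than traditional sentiment analysis (positive/negative/neutral).

The full label list:
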
| Label | Chinese | Label | Chinese |
| :--- | :--- | :--- | :--- |
| `admiration` | 钦佩 | `fear` | 害怕 |
| `amusement` | 被逗乐 | `gratitude` | 感谢 |
| `anger` | 愤怒 | `grief` | 悲痛 |
| `annoyance` | 烦躁 | `joy` | 开心 |
| `approval` | 认同 | `love` | 爱 |
| `caring` | 关心 | `nervousness` | 紧张 |
| `confusion` | 困惑 | `optimism` | 乐观 |
| `curiosity` | 好奇 | `pride` | 自豪 |
| `desire` | 渴望 | `realization` | 顿悟 |
| `disappointment` | 失望 | `relief` | 如释重负 |
| `disapproval` | 不认同 | `remorse` | 懊悔 |
| `disgust` | 厌恶 | `sadness` | 悲伤 |
| `embarrassment` | 尴尬 | `surprise` | 惊讶 |
| `excitement` | 激动 | `neutral` | 中性 |
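
## How to Use

You can use this model through the `pipeline` API of the `transformers` library.

The uploaded `config.json` stores only generic names (`LABEL_0` to `LABEL_27`) in `id2label`, so predictions come back as `LABEL_<n>` rather than emotion names. The mapping from these indices to the emotions in the table above is not recorded in this upload.

```python
from transformers import pipeline

# Load the pipeline by model ID
model_id = "tourcoder/distilbert-base-uncased-finetuned-emotion-analysis"
emotion_classifier = pipeline("text-classification", model=model_id)

# Run a prediction
text = "I can't believe I finished the project, I am so relieved!"
results = emotion_classifier(text)

print(results)
# Output has the form [{'label': 'LABEL_<n>', 'score': ...}],
# because config.json only stores generic label names
```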
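`config.json` in this upload maps the 28 class indices only to generic names (`LABEL_0` to `LABEL_27`). One workaround is to translate those names after the fact. The stdlib-only sketch below does this under a **hypothetical** assumption: class index *n* is the *n*-th emotion in alphabetical order, with `neutral` last. Nothing in this upload confirms that order, so verify it against the training data before trusting the output. `ASSUMED_LABELS` and `rename_labels` are illustrative names and are not part of the model repo.

```python
import re

# HYPOTHETICAL index order: alphabetical, with "neutral" last.
# config.json does not record the real order; verify before use.
ASSUMED_LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval",
    "caring", "confusion", "curiosity", "desire", "disappointment",
    "disapproval", "disgust", "embarrassment", "excitement", "fear",
    "gratitude", "grief", "joy", "love", "nervousness",
    "optimism", "pride", "realization", "relief", "remorse",
    "sadness", "surprise", "neutral",
]

_GENERIC = re.compile(r"LABEL_(\d+)")


def rename_labels(results, names=ASSUMED_LABELS):
    """Replace generic 'LABEL_<n>' names in pipeline output with names[n].

    Labels that are not of the form LABEL_<n> are left unchanged.
    """
    renamed = []
    for item in results:
        match = _GENERIC.fullmatch(item["label"])
        label = names[int(match.group(1))] if match else item["label"]
        renamed.append({**item, "label": label})
    return renamed


# Hand-written stand-in shaped like the pipeline's output (not a real prediction)
sample = [{"label": "LABEL_23", "score": 0.91}]
print(rename_labels(sample))  # [{'label': 'relief', 'score': 0.91}]
```

Once the true order is known, there is a cleaner option. Pass matching `id2label` and `label2id` dicts to `AutoModelForSequenceClassification.from_pretrained(...)`, which override those config fields. The pipeline and `model.config.id2label` will then report emotion names directly.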

### Running on Apple Silicon (M1/M2/M3/M4)

On a Mac, you can set the device to `"mps"` to use GPU acceleration on Apple Silicon.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Model ID
model_id = "tourcoder/distilbert-base-uncased-finetuned-emotion-analysis"

# Use MPS if available, otherwise fall back to the CPU
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).to(device)

# Prepare the input
text = "This experiment on the Mac Mini M4 was a great success!"
inputs = tokenizer(text, return_tensors="pt").to(device)

# Inference
with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring class for the single input
predicted_class_id = logits.argmax(dim=-1).item()
predicted_label = model.config.id2label[predicted_class_id]

print(f"Text: '{text}'")
print(f"Predicted emotion: {predicted_label}")
# With the uploaded config.json this prints a generic name such as LABEL_<n>
```
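The Apple Silicon example keeps only the arg-max class. `config.json` declares `problem_type` as `single_label_classification`, so the logits are meant to form one probability distribution via softmax, not independent sigmoids. Below is a stdlib-only sketch of softmax plus top-k ranking. It could be applied to a row such as `logits[0].tolist()`. `softmax` and `top_k` are hypothetical helper names, and the logits in the example are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up logits for a 4-class toy example (the real model has 28 classes)
toy_logits = [2.0, 0.5, 1.0, -1.0]
toy_id2label = {0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2", 3: "LABEL_3"}
for label, prob in top_k(toy_logits, toy_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

On the `pipeline` route, calling the classifier with `top_k=None` returns scores for every class instead of only the best one.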
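
## Training and Experiment Notes

* **Purpose**: To test the end-to-end workflow of fine-tuning a model locally on a **Mac Mini (M4 chip)** with PyTorch and the `transformers` library.
* **Hardware**: Apple Mac Mini (M4 chip)
* **Framework**: PyTorch (accelerated with the MPS backend)
* **Base model**: `distilbert-base-uncased`
* **Dataset**: A self-built dataset with 28 emotion labels.
* **Disclaimer**: This is a proof-of-concept model. Its performance and robustness have not been rigorously tested, and it is not recommended for direct use in production.

## Limitations

* `distilbert` is a lightweight model: it is fast, but it may be worse than larger models (such as `RoBERTa` or `DeBERTa`) at understanding complex and subtle emotions.
* The model's performance depends heavily on its training data. Predictions may be inaccurate for text styles or domains not covered by the training set.
* The model may reflect biases present in the training data.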
config.json ADDED
@@ -0,0 +1,84 @@
{
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5",
    "6": "LABEL_6",
    "7": "LABEL_7",
    "8": "LABEL_8",
    "9": "LABEL_9",
    "10": "LABEL_10",
    "11": "LABEL_11",
    "12": "LABEL_12",
    "13": "LABEL_13",
    "14": "LABEL_14",
    "15": "LABEL_15",
    "16": "LABEL_16",
    "17": "LABEL_17",
    "18": "LABEL_18",
    "19": "LABEL_19",
    "20": "LABEL_20",
    "21": "LABEL_21",
    "22": "LABEL_22",
    "23": "LABEL_23",
    "24": "LABEL_24",
    "25": "LABEL_25",
    "26": "LABEL_26",
    "27": "LABEL_27"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_10": 10,
    "LABEL_11": 11,
    "LABEL_12": 12,
    "LABEL_13": 13,
    "LABEL_14": 14,
    "LABEL_15": 15,
    "LABEL_16": 16,
    "LABEL_17": 17,
    "LABEL_18": 18,
    "LABEL_19": 19,
    "LABEL_2": 2,
    "LABEL_20": 20,
    "LABEL_21": 21,
    "LABEL_22": 22,
    "LABEL_23": 23,
    "LABEL_24": 24,
    "LABEL_25": 25,
    "LABEL_26": 26,
    "LABEL_27": 27,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5,
    "LABEL_6": 6,
    "LABEL_7": 7,
    "LABEL_8": 8,
    "LABEL_9": 9
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "problem_type": "single_label_classification",
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "torch_dtype": "float32",
  "transformers_version": "4.53.2",
  "vocab_size": 30522
}
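The `id2label` and `label2id` maps in `config.json` are mutual inverses, but they hold only placeholder names, which is why predictions surface as `LABEL_<n>`. The stdlib-only sketch below checks a `config.json`-style dict for both properties. `check_label_maps` is a hypothetical helper. The example rebuilds maps shaped like this file instead of reading the file from disk.

```python
import json
import re

GENERIC = re.compile(r"LABEL_\d+")


def check_label_maps(config):
    """Check that id2label and label2id are inverses and report generic names."""
    # JSON object keys are strings, so convert id2label keys back to ints
    id2label = {int(k): v for k, v in config["id2label"].items()}
    label2id = config["label2id"]
    consistent = all(label2id.get(name) == idx for idx, name in id2label.items())
    consistent = consistent and len(id2label) == len(label2id)
    generic = all(GENERIC.fullmatch(name) for name in id2label.values())
    return {"num_labels": len(id2label), "consistent": consistent, "generic_names": generic}


# Rebuild the label maps of this config.json (28 generic labels) as a JSON string
n = 28
text = json.dumps({
    "id2label": {str(i): f"LABEL_{i}" for i in range(n)},
    "label2id": {f"LABEL_{i}": i for i in range(n)},
})
print(check_label_maps(json.loads(text)))
# {'num_labels': 28, 'consistent': True, 'generic_names': True}
```

`generic_names: True` is the signal that real emotion names still need to be written into the config.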
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:27147274e75dcfc6b354b51509db56323576d10e8f57e2492e8692687616c326
size 267912544
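`model.safetensors` is stored through Git LFS, so the commit holds only this three-line pointer; the weights themselves are downloaded separately. The sketch below parses such a pointer and makes a rough parameter estimate from `size`. It assumes every weight is `float32` (4 bytes), which matches `torch_dtype` in `config.json`. The safetensors file also contains a small JSON header, so the estimate is a slight upper bound. `parse_lfs_pointer` and `sha256_of` are hypothetical helper names.

```python
import hashlib

# Contents of the Git LFS pointer committed for model.safetensors
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:27147274e75dcfc6b354b51509db56323576d10e8f57e2492e8692687616c326
size 267912544
"""


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def sha256_of(path, chunk_size=1 << 20):
    """Hash a (possibly large) file in chunks, e.g. a downloaded model.safetensors."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


fields = parse_lfs_pointer(POINTER)
size_bytes = int(fields["size"])
algo, _, expected_digest = fields["oid"].partition(":")

# Assumption: all tensors are float32 (4 bytes each); the safetensors JSON
# header is also counted in size_bytes, so this slightly overestimates
approx_params = size_bytes // 4
print(f"{algo}: {expected_digest[:12]}...")
print(f"size: {size_bytes / 1e6:.1f} MB, roughly {approx_params / 1e6:.1f}M float32 parameters")
```

After downloading the weights, checking `sha256_of(path) == expected_digest` confirms that the local file matches this pointer.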
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "DistilBertTokenizer",
  "unk_token": "[UNK]"
}
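`tokenizer_config.json` fixes the special-token IDs ([PAD]=0, [UNK]=100, [CLS]=101, [SEP]=102, [MASK]=103) and sets `model_max_length` to 512. `config.json` sets `pad_token_id` to 0. The stdlib-only sketch below shows how a single sequence of word-piece IDs is framed for DistilBERT when truncation and padding are requested. The steps are: wrap it as `[CLS] ... [SEP]`, truncate to the maximum length, pad with `[PAD]`, and build the attention mask. This only mimics the real tokenizer's output layout for one sequence. `frame_single` is a hypothetical helper, and the word-piece IDs in the example are placeholders, not output of the actual vocabulary.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # IDs from tokenizer_config.json
MODEL_MAX_LENGTH = 512


def frame_single(token_ids, max_length=MODEL_MAX_LENGTH, pad_to=None):
    """Frame one sequence as [CLS] tokens [SEP], truncating and optionally padding.

    Returns (input_ids, attention_mask); DistilBERT uses no token_type_ids.
    """
    # Leave room for the two special tokens
    body = list(token_ids)[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None:
        pad = max(0, min(pad_to, max_length) - len(input_ids))
        input_ids += [PAD_ID] * pad
        attention_mask += [0] * pad  # padded positions are masked out
    return input_ids, attention_mask


# Placeholder word-piece IDs standing in for a short tokenized sentence
ids, mask = frame_single([2023, 2003, 2307], pad_to=8)
print(ids)   # [101, 2023, 2003, 2307, 102, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```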
vocab.txt ADDED
The diff for this file is too large to render.