Upload folder using huggingface_hub

Files changed (7)

README.md +119 -0
config.json +84 -0
model.safetensors +3 -0
special_tokens_map.json +7 -0
tokenizer.json +0 -0
tokenizer_config.json +56 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,119 @@

+---
+language: en
+license: apache-2.0
+library_name: transformers
+tags:
+- distilbert
+- text-classification
+- emotion-analysis
+- pytorch
+- mac-m4-test
+pipeline_tag: text-classification
+widget:
+- text: "I'm so excited to try out the new Mac Mini M4 for machine learning!"
+  example_title: "Excitement Example"
+- text: "I'm a bit worried about the performance on complex tasks."
+  example_title: "Worry Example"
+- text: "I am so grateful for all the help you have provided."
+  example_title: "Gratitude Example"
+---
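+# `distilbert-base-uncased` Fine-tuned for Emotion Analysis
+This is an emotion analysis model fine-tuned from `distilbert-base-uncased` that recognizes **28 fine-grained emotions**.
+**Note: this model is a technical experiment, created mainly to test the workflow and performance of local fine-tuning on the new Mac Mini M4. It has not been thoroughly evaluated and is intended primarily for demonstration and experimentation.**
+## Model Description
+The model recognizes 28 distinct emotions in text, which gives a richer and finer-grained view than traditional sentiment analysis (for example positive/negative/neutral).
+The full list of labels: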
+| Label | Chinese | Label | Chinese |
+| :--- | :--- | :--- | :--- |
+| `admiration` | 钦佩 | `gratitude` | 感谢 |
+| `amusement` | 娱乐 | `grief` | 悲痛 |
+| `anger` | 愤怒 | `joy` | 开心 |
+| `annoyance` | 烦躁 | `love` | 爱 |
+| `approval` | 认同 | `nervousness` | 紧张 |
+| `caring` | 关心 | `optimism` | 乐观 |
+| `confusion` | 困惑 | `pride` | 自豪 |
+| `curiosity` | 好奇 | `realization` | 顿悟 |
+| `desire` | 渴望 | `relief` | 如释重负 |
+| `disappointment` | 失望 | `remorse` | 懊悔 |
+| `disapproval` | 不认同 | `sadness` | 悲伤 |
+| `disgust` | 厌恶 | `surprise` | 惊讶 |
+| `embarrassment` | 尴尬 | `neutral` | 中性 |
+| `excitement` | 激动 | `fear` | 害怕 |
+## How to Use
+The model can be used through the `pipeline` API of the `transformers` library.
+```python
+from transformers import pipeline
+# Load the pipeline by model ID
+model_id = "tourcoder/distilbert-base-uncased-finetuned-emotion-analysis"
+emotion_classifier = pipeline("text-classification", model=model_id)
+# Run a prediction
+text = "I can't believe I finished the project, I am so relieved!"
+results = emotion_classifier(text)
+print(results)
+# Output has the form [{'label': ..., 'score': ...}]. The intended label is 'relief', but the config.json in this commit maps ids to generic names, so the label is reported as 'LABEL_<id>'.
+```
+### Running on Apple Silicon (M1/M2/M3/M4)
+On a Mac, set the device to `"mps"` to use the Apple Silicon GPU for acceleration.
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+# Model ID
+model_id = "tourcoder/distilbert-base-uncased-finetuned-emotion-analysis"
+# Use MPS when available, otherwise fall back to CPU
+device = "mps" if torch.backends.mps.is_available() else "cpu"
+print(f"Using device: {device}")
+# Load the model and tokenizer
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id).to(device)
+# Prepare the input
+text = "This experiment on the Mac Mini M4 was a great success!"
+inputs = tokenizer(text, return_tensors="pt").to(device)
+# Inference
+with torch.no_grad():
+    logits = model(**inputs).logits
+# Take the highest-scoring class along the label dimension (batch of one)
+predicted_class_id = logits.argmax(dim=-1).item()
+predicted_label = model.config.id2label[predicted_class_id]
+print(f"Text: '{text}'")
+print(f"Predicted emotion: {predicted_label}")
+# Intended output: Predicted emotion: joy (or pride / admiration); with the config.json in this commit, id2label yields 'LABEL_<id>' instead.
+```
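+## Training and Experiment Notes
+* **Purpose**: to verify and try out the complete workflow of local fine-tuning with PyTorch and the `transformers` library on a **Mac Mini (M4 chip)**.
+* **Hardware**: Apple Mac Mini (M4 chip)
+* **Framework**: PyTorch (accelerated with the MPS backend)
+* **Base model**: `distilbert-base-uncased`
+* **Dataset**: a self-built dataset with 28 emotion labels.
+* **Disclaimer**: this is a proof-of-concept model. Its performance and robustness have not been rigorously tested, and direct production use is not recommended.
+## Limitations
+* `distilbert` is a lightweight model: it is fast, but it may understand complex or subtle emotions less well than larger models such as `RoBERTa` or `DeBERTa`.
+* Performance depends heavily on the training data. Predictions may be inaccurate for text styles or domains that the training set does not cover.
+* The model may reflect biases present in the training data.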
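
The Apple Silicon example in the README keeps only the argmax of the logits. With 28 closely related classes it can help to look at the top few probabilities instead. Below is a minimal pure-Python sketch, assuming the logits have been converted to a flat list of floats; `softmax`, `top_k_emotions`, and the toy values are hypothetical and are not model output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_emotions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical 4-class toy case, chosen only to illustrate the ranking.
toy_id2label = {0: "joy", 1: "pride", 2: "admiration", 3: "neutral"}
toy_logits = [2.0, 1.0, 0.5, -1.0]
for label, prob in top_k_emotions(toy_logits, toy_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

Ranking by probability rather than taking a single argmax makes near-ties such as joy versus pride visible, which matters for a model that has not been formally evaluated.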

config.json ADDED

	@@ -0,0 +1,84 @@

+{
+  "activation": "gelu",
+  "architectures": [
+    "DistilBertForSequenceClassification"
+  ],
+  "attention_dropout": 0.1,
+  "dim": 768,
+  "dropout": 0.1,
+  "hidden_dim": 3072,
+  "id2label": {
+    "0": "LABEL_0",
+    "1": "LABEL_1",
+    "2": "LABEL_2",
+    "3": "LABEL_3",
+    "4": "LABEL_4",
+    "5": "LABEL_5",
+    "6": "LABEL_6",
+    "7": "LABEL_7",
+    "8": "LABEL_8",
+    "9": "LABEL_9",
+    "10": "LABEL_10",
+    "11": "LABEL_11",
+    "12": "LABEL_12",
+    "13": "LABEL_13",
+    "14": "LABEL_14",
+    "15": "LABEL_15",
+    "16": "LABEL_16",
+    "17": "LABEL_17",
+    "18": "LABEL_18",
+    "19": "LABEL_19",
+    "20": "LABEL_20",
+    "21": "LABEL_21",
+    "22": "LABEL_22",
+    "23": "LABEL_23",
+    "24": "LABEL_24",
+    "25": "LABEL_25",
+    "26": "LABEL_26",
+    "27": "LABEL_27"
+  },
+  "initializer_range": 0.02,
+  "label2id": {
+    "LABEL_0": 0,
+    "LABEL_1": 1,
+    "LABEL_10": 10,
+    "LABEL_11": 11,
+    "LABEL_12": 12,
+    "LABEL_13": 13,
+    "LABEL_14": 14,
+    "LABEL_15": 15,
+    "LABEL_16": 16,
+    "LABEL_17": 17,
+    "LABEL_18": 18,
+    "LABEL_19": 19,
+    "LABEL_2": 2,
+    "LABEL_20": 20,
+    "LABEL_21": 21,
+    "LABEL_22": 22,
+    "LABEL_23": 23,
+    "LABEL_24": 24,
+    "LABEL_25": 25,
+    "LABEL_26": 26,
+    "LABEL_27": 27,
+    "LABEL_3": 3,
+    "LABEL_4": 4,
+    "LABEL_5": 5,
+    "LABEL_6": 6,
+    "LABEL_7": 7,
+    "LABEL_8": 8,
+    "LABEL_9": 9
+  },
+  "max_position_embeddings": 512,
+  "model_type": "distilbert",
+  "n_heads": 12,
+  "n_layers": 6,
+  "pad_token_id": 0,
+  "problem_type": "single_label_classification",
+  "qa_dropout": 0.1,
+  "seq_classif_dropout": 0.2,
+  "sinusoidal_pos_embds": false,
+  "tie_weights_": true,
+  "torch_dtype": "float32",
+  "transformers_version": "4.53.2",
+  "vocab_size": 30522
+}
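
The `id2label` and `label2id` maps above contain only the generic names `LABEL_0` to `LABEL_27`, so inference reports those names rather than the emotion names listed in README.md. Here is a minimal sketch of how the two maps could be rebuilt from a label list. The order of `EMOTION_LABELS` is an assumption (alphabetical, with `neutral` last); the real order must match the integer encoding used during fine-tuning, which this commit does not show. `build_label_maps` is a hypothetical helper.

```python
import json

# ASSUMPTION: alphabetical order with "neutral" last. The real order must
# match the integer encoding used when the model was fine-tuned.
EMOTION_LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
    "joy", "love", "nervousness", "optimism", "pride", "realization",
    "relief", "remorse", "sadness", "surprise", "neutral",
]


def build_label_maps(labels):
    """Return (id2label, label2id) dicts in the shape config.json uses."""
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be unique")
    id2label = {str(i): name for i, name in enumerate(labels)}
    label2id = {name: i for i, name in enumerate(labels)}
    return id2label, label2id


id2label, label2id = build_label_maps(EMOTION_LABELS)
config_patch = {"id2label": id2label, "label2id": label2id}
# Show the start of the JSON fragment that would replace the generic maps.
print(json.dumps(config_patch, indent=2)[:200])
```

When the model is created for fine-tuning, the same two dicts can usually be passed as `id2label=` and `label2id=` to `AutoModelForSequenceClassification.from_pretrained`, so that the saved config.json carries the real names from the start.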

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:27147274e75dcfc6b354b51509db56323576d10e8f57e2492e8692687616c326
+size 267912544
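
The three lines above are a Git LFS pointer, not the weights themselves. As a rough sanity check, the `size` field combined with `"torch_dtype": "float32"` from config.json bounds the parameter count. This is a sketch assuming 4 bytes per parameter and ignoring the safetensors header, so the result slightly overestimates; `parse_lfs_pointer` is a hypothetical helper.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:27147274e75dcfc6b354b51509db56323576d10e8f57e2492e8692687616c326
size 267912544
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = parse_lfs_pointer(POINTER)
size_bytes = int(pointer["size"])
BYTES_PER_FLOAT32 = 4
# Upper bound only: the file also holds a small JSON header.
max_params = size_bytes // BYTES_PER_FLOAT32
print(f"{size_bytes / 1e6:.1f} MB, at most about {max_params / 1e6:.1f}M parameters")
```

The bound of roughly 67M parameters is in line with a 6-layer DistilBERT encoder plus a small classification head.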

special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,56 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "DistilBertTokenizer",
+  "unk_token": "[UNK]"
+}
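
The settings above imply how a single sentence is framed before it reaches the model: wrapped in `[CLS]` (id 101) and `[SEP]` (id 102), cut to a `model_max_length` of 512, and padded with `[PAD]` (id 0). The following is a simplified illustration of that framing, not the library's implementation; `frame_sequence` and the word-piece ids in the example are hypothetical.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # ids from added_tokens_decoder above
MODEL_MAX_LENGTH = 512                # from tokenizer_config.json


def frame_sequence(token_ids, max_length=MODEL_MAX_LENGTH, pad_to=None):
    """Wrap word-piece ids in [CLS]/[SEP], truncating and optionally padding."""
    body = token_ids[: max_length - 2]  # leave room for the two special tokens
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None and pad_to > len(input_ids):
        padding = pad_to - len(input_ids)
        input_ids += [PAD_ID] * padding
        attention_mask += [0] * padding  # padded positions are masked out
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Hypothetical word-piece ids, chosen only for illustration.
example = frame_sequence([2023, 2003, 2307], pad_to=8)
print(example)
```

In practice the tokenizer loaded with `AutoTokenizer.from_pretrained` handles this framing; the sketch only makes the role of the special-token ids and the 512-token limit concrete.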

vocab.txt ADDED

The diff for this file is too large to render. See raw diff