---
language: en
license: apache-2.0
library_name: transformers
tags:
- distilbert
- text-classification
- emotion-analysis
- pytorch
- mac-m4-test
pipeline_tag: text-classification
widget:
- text: "I'm so excited to try out the new Mac Mini M4 for machine learning!"
  example_title: "Excitement Example"
- text: "I'm a bit worried about the performance on complex tasks."
  example_title: "Worry Example"
- text: "I am so grateful for all the help you have provided."
  example_title: "Gratitude Example"
---

# `distilbert-base-uncased` Finetuned for Emotion Analysis

This is an emotion analysis model fine-tuned from `distilbert-base-uncased` that recognizes **28 fine-grained emotions**.

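**Note: this model was created mainly as a technical experiment to test the workflow and performance of local fine-tuning on the new Mac Mini M4. It has not been thoroughly evaluated and is intended for demonstration and experimentation.**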

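## Model Description

This model identifies 28 distinct emotions in text, which gives a richer and more nuanced view than traditional sentiment analysis (positive/negative/neutral).

The full list of labels: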

| Label | Chinese | Label | Chinese |
| :--- | :--- | :--- | :--- |
| `admiration` | 钦佩 | `gratitude` | 感谢 |
| `amusement` | 娱乐 | `grief` | 悲痛 |
| `anger` | 愤怒 | `joy` | 开心 |
| `annoyance` | 烦躁 | `love` | 爱 |
| `approval` | 认同 | `nervousness` | 紧张 |
| `caring` | 关心 | `optimism` | 乐观 |
| `confusion` | 困惑 | `pride` | 自豪 |
| `curiosity` | 好奇 | `realization` | 顿悟 |
| `desire` | 渴望 | `relief` | 如释重负 |
| `disappointment` | 失望 | `remorse` | 懊悔 |
| `disapproval` | 不认同 | `sadness` | 悲伤 |
| `disgust` | 厌恶 | `surprise` | 惊讶 |
| `embarrassment` | 尴尬 | `neutral` | 中性 |
| `excitement` | 激动 | `fear` | 害怕 |


## How to Use

You can run this model through the `pipeline` API of the `transformers` library.

```python
from transformers import pipeline

# Load the pipeline by model ID
model_id = "tourcoder/distilbert-base-uncased-finetuned-emotion-analysis"
emotion_classifier = pipeline("text-classification", model=model_id)

# Run a prediction
text = "I can't believe I finished the project, I am so relieved!"
results = emotion_classifier(text)

print(results)
# Expected output: [{'label': 'relief', 'score': 0.9...}]
```
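
With 28 labels, the single top prediction can hide close runners-up. Calling the pipeline as `emotion_classifier(text, top_k=None)` returns a score for every label, and the sketch below shows one way to keep only the labels above a cutoff. The helper name `labels_above`, the 0.2 threshold, and the sample scores are illustrative assumptions, not real outputs of this model.

```python
def labels_above(scores, threshold=0.2):
    """Return (label, score) pairs at or above threshold, highest score first.

    `scores` is a list of {"label": str, "score": float} dicts, the shape
    returned by emotion_classifier(text, top_k=None) for a single string.
    """
    kept = [(s["label"], s["score"]) for s in scores if s["score"] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for illustration only (not real model output).
sample_scores = [
    {"label": "relief", "score": 0.71},
    {"label": "joy", "score": 0.22},
    {"label": "surprise", "score": 0.05},
]

print(labels_above(sample_scores))
# [('relief', 0.71), ('joy', 0.22)]
```

A suitable threshold depends on the application; since this model has not been evaluated in depth, any cutoff should be checked against your own examples.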

### Running on Apple Silicon (M1/M2/M3/M4)

On a Mac, you can set the device to `"mps"` to use Apple Silicon GPU acceleration.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Model ID
model_id = "tourcoder/distilbert-base-uncased-finetuned-emotion-analysis"

# Use MPS when available, otherwise fall back to the CPU
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).to(device)

# Prepare the input
text = "This experiment on the Mac Mini M4 was a great success!"
inputs = tokenizer(text, return_tensors="pt").to(device)

# Run inference
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class for the single input
predicted_class_id = logits.argmax(dim=-1).item()
predicted_label = model.config.id2label[predicted_class_id]

print(f"Text: '{text}'")
print(f"Predicted emotion: {predicted_label}")
# Expected output: Predicted emotion: joy (or pride / admiration)
```
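
`argmax` yields only the winning label. To see how confident the model is across labels, the logits can be converted to probabilities with a softmax and then ranked. The following is a minimal pure-Python sketch; `top_k_emotions`, the three-label mapping, and the logit values are hypothetical stand-ins for `model.config.id2label` and `logits[0].tolist()`.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_emotions(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked[:k]]


# Hypothetical values for illustration; with the real model, pass
# logits[0].tolist() and model.config.id2label instead.
example_id2label = {0: "joy", 1: "pride", 2: "neutral"}
example_logits = [2.0, 1.0, 0.1]

for label, prob in top_k_emotions(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

Ranking several labels is useful here because, as the expected output above suggests, closely related emotions such as `joy`, `pride`, and `admiration` may receive similar scores.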

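## Training and Experiment Notes

* **Goal**: Validate and try out the full workflow of local model fine-tuning on a **Mac Mini (M4 chip)** with PyTorch and the `transformers` library.
* **Hardware**: Apple Mac Mini (M4 chip)
* **Framework**: PyTorch (accelerated through the MPS backend)
* **Base model**: `distilbert-base-uncased`
* **Dataset**: Fine-tuned on a self-made dataset with 28 emotion labels.
* **Disclaimer**: This is a proof-of-concept model. Its performance and robustness have not been rigorously tested, and it is not recommended for direct use in production.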

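## Limitations

* `distilbert` is a lightweight model. It is fast, but it may understand complex or subtle emotions less well than larger models such as `RoBERTa` or `DeBERTa`.
* The model's behavior depends heavily on its training data. Predictions may be inaccurate for text styles or domains not covered by the training set.
* The model may reflect biases present in the training data.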