代码旅行

---
language: en
license: apache-2.0
library_name: transformers
tags:
- distilbert
- text-classification
- emotion-analysis
- pytorch
- mac-m4-test
pipeline_tag: text-classification
widget:
- text: "I'm so excited to try out the new Mac Mini M4 for machine learning!"
  example_title: "Excitement Example"
- text: "I'm a bit worried about the performance on complex tasks."
  example_title: "Worry Example"
- text: "I am so grateful for all the help you have provided."
  example_title: "Gratitude Example"
---

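# `distilbert-base-uncased` Finetuned for Emotion Analysis

This is an emotion analysis model fine-tuned from `distilbert-base-uncased` that recognizes 28 fine-grained emotion labels.

Note: this model was created mainly as a technical experiment to test the workflow and performance of local fine-tuning on the new Mac Mini M4. It has not been thoroughly evaluated and is intended for demonstration and experimentation.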

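## Model Description

The model recognizes 28 different emotion labels in text. This gives a richer, more detailed view than traditional sentiment analysis (positive/negative/neutral).

The complete list of labels is as follows: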

| Label | Chinese | Label | Chinese |
| :--- | :--- | :--- | :--- |
| `admiration` | 钦佩 | `gratitude` | 感谢 |
| `amusement` | 娱乐 | `grief` | 悲痛 |
| `anger` | 愤怒 | `joy` | 开心 |
| `annoyance` | 烦躁 | `love` | 爱 |
| `approval` | 认同 | `nervousness` | 紧张 |
| `caring` | 关心 | `optimism` | 乐观 |
| `confusion` | 困惑 | `pride` | 自豪 |
| `curiosity` | 好奇 | `realization` | 顿悟 |
| `desire` | 渴望 | `relief` | 如释重负 |
| `disappointment` | 失望 | `remorse` | 懊悔 |
| `disapproval` | 不认同 | `sadness` | 悲伤 |
| `disgust` | 厌恶 | `surprise` | 惊讶 |
| `embarrassment` | 尴尬 | `neutral` | 中性 |
| `excitement` | 激动 | `fear` | 害怕 |
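
To check this table against the published checkpoint, the label mapping can be read from the model configuration. The following is a minimal sketch, assuming the hosted config exposes the usual `id2label` dictionary. The helper names `labels_in_id_order` and `load_labels` are illustrative and not part of the original card.

```python
def labels_in_id_order(id2label):
    """Return label names sorted by their integer class id."""
    # Keys may be ints (loaded config) or strings (raw JSON), so normalise them.
    pairs = sorted((int(class_id), name) for class_id, name in id2label.items())
    return [name for _, name in pairs]


def load_labels(model_id="tourcoder/distilbert-base-uncased-finetuned-emotion-analysis"):
    """Fetch only the config (no weights) and list its labels."""
    # Imported lazily: this call needs `transformers` and network access.
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained(model_id)
    return labels_in_id_order(config.id2label)
```

If the checkpoint matches the table, `load_labels()` should return 28 names.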


## How to Use

You can use this model through the `pipeline` API of the `transformers` library.

```python
from transformers import pipeline

# Load the pipeline by model ID
model_id = "tourcoder/distilbert-base-uncased-finetuned-emotion-analysis"
emotion_classifier = pipeline("text-classification", model=model_id)

# Run a prediction
text = "I can't believe I finished the project, I am so relieved!"
results = emotion_classifier(text)

print(results)
# Expected output: [{'label': 'relief', 'score': 0.9...}]
```
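
Because several emotions can apply to the same sentence, it is often useful to look beyond the single best label. The sketch below assumes the `top_k` argument of the text-classification pipeline, where `top_k=None` returns a score for every label. The helper names and the 0.1 cutoff are illustrative choices, not part of the original card.

```python
def top_emotions(scores, k=3, min_score=0.1):
    """Keep the k highest-scoring labels whose score is at least min_score.

    `scores` is a list of {"label": str, "score": float} dicts, the shape
    the pipeline returns for one input.
    """
    ranked = sorted(scores, key=lambda item: item["score"], reverse=True)
    return [
        (item["label"], round(item["score"], 3))
        for item in ranked[:k]
        if item["score"] >= min_score
    ]


def classify_all_labels(text, model_id="tourcoder/distilbert-base-uncased-finetuned-emotion-analysis"):
    """Score every label for one text (downloads the model on first use)."""
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # Passing a list yields one result list per input; take the first.
    all_scores = classifier([text], top_k=None)[0]
    return top_emotions(all_scores)
```

`top_emotions` works on any list in that shape, so the cutoff can be tuned without re-running the model.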

### Running on Apple Silicon (M1/M2/M3/M4)

On a Mac, you can set the device to `"mps"` to use the Apple Silicon GPU for acceleration.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Model ID
model_id = "tourcoder/distilbert-base-uncased-finetuned-emotion-analysis"

# Use MPS if it is available, otherwise fall back to the CPU
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).to(device)

# Prepare the input
text = "This experiment on the Mac Mini M4 was a great success!"
inputs = tokenizer(text, return_tensors="pt").to(device)

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring class and map it to its label
predicted_class_id = logits.argmax(dim=-1).item()
predicted_label = model.config.id2label[predicted_class_id]

print(f"Text: '{text}'")
print(f"Predicted emotion: {predicted_label}")
# Expected output: Predicted emotion: joy (or pride / admiration)
```
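
`argmax` yields only the winning class. To see how confident the model is, the logits can first be turned into probabilities. The following is a minimal sketch in plain Python. It assumes the model was trained as a single-label classifier, so softmax is the appropriate normalisation. The card does not say, and a multi-label model would need a sigmoid instead. `rank_labels` is a hypothetical helper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def rank_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in order[:k]]


# With the variables from the example above (not executed here):
# rank_labels(logits[0].tolist(), model.config.id2label)
```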

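## Training and Experiment Notes

* Purpose: to try out and validate the full workflow of local fine-tuning with PyTorch and the `transformers` library on a Mac Mini (M4 chip).
* Hardware: Apple Mac Mini (M4 chip)
* Framework: PyTorch (accelerated with the MPS backend)
* Base model: `distilbert-base-uncased`
* Dataset: the model was fine-tuned on a self-made dataset with 28 emotion labels.
* Disclaimer: this is a proof-of-concept model. Its performance and robustness have not been rigorously tested, and it is not recommended for direct use in production.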
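
The card does not include the training script. As a rough sketch of how such a run is typically set up, the snippet below builds label mappings for the 28 labels listed in this card and attaches them to `distilbert-base-uncased`. The label order, the helper names, and this way of initialising the model are assumptions. The ids used by the published checkpoint may differ.

```python
# Labels copied from the table in this card; the id order here is an assumption.
LABELS = [
    "admiration", "amusement", "anger", "annoyance", "approval", "caring",
    "confusion", "curiosity", "desire", "disappointment", "disapproval",
    "disgust", "embarrassment", "excitement", "gratitude", "grief", "joy",
    "love", "nervousness", "optimism", "pride", "realization", "relief",
    "remorse", "sadness", "surprise", "neutral", "fear",
]


def build_label_maps(labels):
    """Create the id2label / label2id dictionaries expected by the config."""
    id2label = dict(enumerate(labels))
    label2id = {name: class_id for class_id, name in id2label.items()}
    return id2label, label2id


def init_model(labels=LABELS, base_model="distilbert-base-uncased"):
    """Load the base model with a fresh 28-way classification head."""
    # Imported lazily: needs `torch`, `transformers`, and network access.
    import torch
    from transformers import AutoModelForSequenceClassification

    id2label, label2id = build_label_maps(labels)
    model = AutoModelForSequenceClassification.from_pretrained(
        base_model,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    # Same device selection as in the inference example.
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    return model.to(device)
```

Storing the mappings in the config is what lets `model.config.id2label` return readable names at inference time.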

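## Limitations

* `distilbert` is a lightweight model. It is fast, but it may not capture complex or subtle emotions as well as larger models such as `RoBERTa` or `DeBERTa`.
* The model's behavior depends heavily on its training data. Predictions may be inaccurate for text styles or domains that the training set does not cover.
* The model may reflect biases present in the training data.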