---
library_name: transformers
tags:
- topic
- multi-class
license: mit
datasets:
- valurank/Topic_Classification
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model:
- distilbert/distilbert-base-uncased
---
# Model Card for Topic Classification Model
A fine-tuned DistilBERT model for multi-class topic classification. This model predicts the most relevant topic label from a predefined set based on input text. It was trained using 🤗 Transformers and PyTorch on a custom dataset derived from academic and news-style corpora.
## Model Details
### Model Description
This model was developed by Daniel (@AfroLogicInsect) to classify text into one of several predefined topics. It builds on the `distilbert-base-uncased` architecture and was fine-tuned for multi-class classification using a softmax output layer.
- **Developed by:** Daniel 🇳🇬 (@AfroLogicInsect)
- **Model type:** DistilBERT-based multi-class sequence classifier
- **Language(s):** English
- **License:** MIT
- **Finetuned from:** distilbert-base-uncased
### Model Sources
- **Repository:** [AfroLogicInsect/topic-model-analysis-model](https://huggingface.co/AfroLogicInsect/topic-model-analysis-model)
- **Paper:** arXiv:1910.01108 (DistilBERT)
- **Demo:** [Coming soon]
## Uses
### Direct Use
- Classify academic or news-style text into topics such as AI, finance, sports, climate, etc.
- Embed in dashboards or content moderation tools for automatic tagging
### Downstream Use
- Can be extended to hierarchical topic classification
- Useful for building recommendation engines or content filters
### Out-of-Scope Use
- Not suitable for sentiment or emotion classification
- May not generalize well to informal or slang-heavy text
## Bias, Risks, and Limitations
- Trained on curated corpora — may reflect biases in source material
- Topics are predefined and static — emerging topics may be misclassified
- Confidence scores are probabilistic, not definitive
### Recommendations
- Pass `top_k=None` (or a small integer such as `top_k=5`) when creating the pipeline to retrieve multiple topic predictions; `return_all_scores` is deprecated in recent 🤗 Transformers releases
- Consider fine-tuning on domain-specific data for improved accuracy
## How to Get Started
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="AfroLogicInsect/topic-model-analysis-model",
    tokenizer="AfroLogicInsect/topic-model-analysis-model",
    top_k=None,  # return a score for every label (replaces the deprecated return_all_scores=True)
)

text = "New AI breakthrough in natural language processing"
results = classifier(text)

# Depending on the transformers version, a single input yields either a flat
# list of {label, score} dicts or a list nested one level deep
scores = results[0] if isinstance(results[0], list) else results

top_5 = sorted(scores, key=lambda x: x["score"], reverse=True)[:5]
for i, res in enumerate(top_5):
    print(f"Top {i+1}: {res['label']} ({res['score']:.3f})")
```
## Training Details
### Dataset
- Custom multi-class topic dataset based on arXiv abstracts and news articles
- Labels include domains like AI, finance, sports, climate, etc.
### Hyperparameters
- Epochs: 3
- Batch size: 16
- Learning rate: 2e-5
- Evaluation every 200 steps
- Metric: F1 score
### Trainer Setup
Training used the Hugging Face `Trainer` API with `TrainingArguments` configured for early stopping and best-model selection.
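The original training script is not published; the sketch below is a hypothetical reconstruction of a `TrainingArguments` configuration consistent with the hyperparameters listed above. Everything beyond those hyperparameters (the output directory, save strategy, and the early-stopping patience) is an assumption.

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Hypothetical reconstruction -- the original training script is not published.
args = TrainingArguments(
    output_dir="topic-model",      # assumed path
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    eval_strategy="steps",         # named `evaluation_strategy` on older releases
    eval_steps=200,                # evaluate every 200 steps
    save_strategy="steps",
    save_steps=200,                # checkpoints must align with evaluation steps
    load_best_model_at_end=True,   # best-model selection
    metric_for_best_model="f1",
)

# Early stopping is then attached as a Trainer callback, e.g.:
# Trainer(..., args=args, callbacks=[EarlyStoppingCallback(early_stopping_patience=2)])
```

With `load_best_model_at_end=True`, save and evaluation steps must coincide so the checkpoint with the best F1 can be restored after training.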
## Evaluation
Model achieved strong performance across multiple topic categories. Evaluation metrics include:
- **Accuracy:** ~90.8%
- **F1 Score:** ~0.91
- **Precision:** ~0.89
- **Recall:** ~0.93
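To make the reported numbers concrete, here is a small self-contained illustration of how accuracy and macro-averaged precision, recall, and F1 are computed from predictions. The labels and predictions are toy values invented for the example, not the model's actual outputs.

```python
# Toy illustration (hypothetical labels, not the model's real outputs):
# how accuracy and macro-averaged precision/recall/F1 relate to per-class counts.
y_true = ["ai", "finance", "sports", "ai", "climate", "finance"]
y_pred = ["ai", "finance", "ai",     "ai", "climate", "sports"]

labels = sorted(set(y_true))
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def prf(label):
    """Precision, recall, and F1 for one class (one-vs-rest)."""
    tp = sum(t == p == label for t, p in zip(y_true, y_pred))
    fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Macro-average: unweighted mean of the per-class scores
per_class = [prf(label) for label in labels]
macro_p = sum(p for p, _, _ in per_class) / len(labels)
macro_r = sum(r for _, r, _ in per_class) / len(labels)
macro_f1 = sum(f for _, _, f in per_class) / len(labels)

print(f"accuracy={accuracy:.3f} precision={macro_p:.3f} "
      f"recall={macro_r:.3f} f1={macro_f1:.3f}")
```

Libraries such as scikit-learn compute the same quantities via `precision_recall_fscore_support(..., average="macro")`.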
## Environmental Impact
- **Hardware:** Google Colab (NVIDIA T4 GPU)
- **Training Time:** ~2.5 hours
- **Carbon Emitted:** ~0.3 kg CO₂eq (estimated via [ML Impact Calculator](https://mlco2.github.io/impact#compute))
## Citation
```bibtex
@misc{afrologicinsect2025topicmodel,
title = {AfroLogicInsect Topic Classification Model},
author = {Akan Daniel},
year = {2025},
howpublished = {\url{https://huggingface.co/AfroLogicInsect/topic-model-analysis-model}},
}
```
## Contact
- Name: Daniel (@AfroLogicInsect)
- Location: Lagos, Nigeria
- Contact: GitHub / Hugging Face / email ([email protected])