Model Card for Topic Classification Model

A fine-tuned DistilBERT model for multi-class topic classification. This model predicts the most relevant topic label from a predefined set based on input text. It was trained using 🤗 Transformers and PyTorch on a custom dataset derived from academic and news-style corpora.

Model Details

Model Description

This model was developed by Daniel (@AfroLogicInsect) to classify text into one of several predefined topics. It builds on the distilbert-base-uncased architecture and was fine-tuned for multi-class classification using a softmax output layer.

Developed by: Daniel 🇳🇬 (@AfroLogicInsect)
Model type: DistilBERT-based multi-class sequence classifier
Language(s): English
License: MIT
Finetuned from: distilbert-base-uncased

Model Sources

Repository: AfroLogicInsect/topic-model-analysis-model
Paper: arXiv:1910.01108 (DistilBERT)
Demo: [Coming soon]

Uses

Direct Use

Classify academic or news-style text into topics such as AI, finance, sports, climate, etc.
Embed in dashboards or content moderation tools for automatic tagging

Downstream Use

Can be extended to hierarchical topic classification
Useful for building recommendation engines or content filters

Out-of-Scope Use

Not suitable for sentiment or emotion classification
May not generalize well to informal or slang-heavy text

Bias, Risks, and Limitations

Trained on curated corpora — may reflect biases in source material
Topics are predefined and static — emerging topics may be misclassified
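Confidence scores are softmax probabilities over the predefined labels; a high score indicates relative preference among those labels, not a guarantee of correctness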
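To illustrate why scores should be read as relative rather than definitive, here is a minimal sketch of the softmax step in plain Python. The label names and logit values are hypothetical and chosen only for illustration; they are not taken from the model.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four illustrative topic labels.
labels = ["AI", "finance", "sports", "climate"]
logits = [2.0, 1.5, -1.0, 0.3]

probs = softmax(logits)
ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
for label, prob in ranked:
    print(f"{label}: {prob:.3f}")
```

Because the probabilities always sum to 1 over the fixed label set, text from an unseen topic still receives a "best" label, which is one reason to inspect several top predictions rather than only the first.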

Recommendations

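Pass top_k=5 to retrieve the five highest-scoring topics; return_all_scores=True is deprecated in recent Transformers releases in favor of top_k=None, which returns every label's score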
Consider fine-tuning on domain-specific data for improved accuracy

How to Get Started

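from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the Hub.
classifier = pipeline(
    "text-classification",
    model="AfroLogicInsect/topic-model-analysis-model",
    tokenizer="AfroLogicInsect/topic-model-analysis-model",
    top_k=None,  # return scores for every label (replaces the deprecated return_all_scores=True)
)

text = "New AI breakthrough in natural language processing"
results = classifier(text)

# Depending on the Transformers version, a single input may come back as a
# flat list of {"label", "score"} dicts or wrapped in an outer list.
scores = results[0] if isinstance(results[0], list) else results

# Keep the five highest-scoring labels.
top_5 = sorted(scores, key=lambda x: x["score"], reverse=True)[:5]
for i, res in enumerate(top_5):
    print(f"Top {i+1}: {res['label']} ({res['score']:.3f})")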

Training Details

Dataset

Custom multi-class topic dataset based on arXiv abstracts and news articles
Labels include domains like AI, finance, sports, climate, etc.

Hyperparameters

Epochs: 3
Batch size: 16
Learning rate: 2e-5
Evaluation every 200 steps
Metric: F1 score

Trainer Setup

Training used the Hugging Face Trainer API, with TrainingArguments configured for early stopping and best-model selection.
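The setup described above might look roughly like the following sketch. It is not the author's actual training script: the output directory, the save settings, the early-stopping patience, and the "f1" metric key are assumptions, and the listed batch size is assumed to be per device. In the Trainer API, early stopping is typically added through EarlyStoppingCallback alongside load_best_model_at_end.

```python
# Keyword arguments mirroring the hyperparameters listed in this card.
# Names follow transformers.TrainingArguments; "eval_strategy" is named
# "evaluation_strategy" in older Transformers releases.
training_kwargs = {
    "output_dir": "topic-model-checkpoints",  # hypothetical path
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,        # assumes the listed batch size is per device
    "learning_rate": 2e-5,
    "eval_strategy": "steps",
    "eval_steps": 200,
    "save_strategy": "steps",                 # assumed; should line up with evaluation
    "save_steps": 200,                        # assumed
    "load_best_model_at_end": True,
    "metric_for_best_model": "f1",            # assumes compute_metrics returns an "f1" key
}

def build_trainer(model, train_ds, eval_ds, compute_metrics, patience=2):
    """Assemble a Trainer; `patience` is an assumed value, not from this card."""
    # Imported lazily so the configuration above can be inspected without Transformers installed.
    from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

    args = TrainingArguments(**training_kwargs)
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        compute_metrics=compute_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=patience)],
    )
```

Calling build_trainer(...).train() would then run up to three epochs, evaluating every 200 steps and stopping early if the F1 score fails to improve for the assumed number of evaluations.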

Evaluation

The model achieved the following approximate scores across the topic categories:

Accuracy: ~90.8%
F1 Score: ~0.91
Precision: ~0.89
Recall: ~0.93
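As a rough consistency check on the figures above, the sketch below computes the harmonic mean of the reported precision and recall. The card does not state which averaging scheme (macro, weighted, etc.) was used, so this relationship is only approximate for averaged multi-class metrics.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Approximate values reported in this card.
reported_precision = 0.89
reported_recall = 0.93

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(f"F1 from reported precision and recall: {f1:.2f}")
```

The result rounds to the reported F1 of about 0.91, so the three figures are mutually consistent under this assumption.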

Environmental Impact

Hardware: Google Colab (NVIDIA T4 GPU)
Training Time: ~2.5 hours
Carbon Emitted: ~0.3 kg CO₂eq (estimated via ML Impact Calculator)

Citation

@misc{afrologicinsect2025topicmodel,
  title = {AfroLogicInsect Topic Classification Model},
  author = {Akan Daniel},
  year = {2025},
  howpublished = {\url{https://huggingface.co/AfroLogicInsect/topic-model-analysis-model}},
}

Contact

Name: Daniel (@AfroLogicInsect)
Location: Lagos, Nigeria
Contact: GitHub / Hugging Face / email ([email protected])
