Model Card for Topic Classification Model

A fine-tuned DistilBERT model for multi-class topic classification. This model predicts the most relevant topic label from a predefined set based on input text. It was trained using 🤗 Transformers and PyTorch on a custom dataset derived from academic and news-style corpora.

Model Details

Model Description

This model was developed by Daniel (@AfroLogicInsect) to classify text into one of several predefined topics. It builds on the distilbert-base-uncased checkpoint and was fine-tuned for multi-class classification, with a softmax over the output logits producing one probability per topic.

  • Developed by: Daniel 🇳🇬 (@AfroLogicInsect)
  • Model type: DistilBERT-based multi-class sequence classifier
  • Language(s): English
  • License: MIT
  • Finetuned from: distilbert-base-uncased
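
As a minimal sketch of what the softmax output step does, the snippet below turns raw per-topic logits into probabilities that sum to 1 and picks the highest-scoring label. The label names and logit values are made up for illustration; the model's actual label set comes from its own configuration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical labels and logits, for illustration only
labels = ["AI", "finance", "sports", "climate"]
logits = [3.2, 0.4, -1.1, 0.9]

probs = softmax(logits)
predicted = labels[probs.index(max(probs))]
print(predicted, round(max(probs), 3))
```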

Uses

Direct Use

  • Classify academic or news-style text into topics such as AI, finance, sports, and climate
  • Embed in dashboards or content moderation tools for automatic tagging

Downstream Use

  • Can be extended to hierarchical topic classification
  • Useful for building recommendation engines or content filters

Out-of-Scope Use

  • Not suitable for sentiment or emotion classification
  • May not generalize well to informal or slang-heavy text

Bias, Risks, and Limitations

  • Trained on curated corpora, so predictions may reflect biases in the source material
  • Topics are predefined and static, so text about emerging topics may be misclassified
  • Confidence scores are softmax probabilities, not guarantees of correctness
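
One way to act on these limitations is to treat low-confidence predictions as undecided rather than forcing a label. The sketch below assumes scores arrive as a list of label/score dicts, the shape returned by the 🤗 Transformers text-classification pipeline. The helper name `pick_topic`, the 0.5 threshold, and the "uncertain" fallback are hypothetical choices, not part of this model.

```python
def pick_topic(scores, threshold=0.5, fallback="uncertain"):
    """Return (label, score) for the best topic, or the fallback label
    when the best score is below the threshold.

    scores: list of {"label": str, "score": float} dicts.
    threshold: hypothetical cut-off; tune it on validation data.
    """
    best = max(scores, key=lambda s: s["score"])
    if best["score"] < threshold:
        return fallback, best["score"]
    return best["label"], best["score"]

# Made-up scores, for illustration only
confident = [{"label": "AI", "score": 0.86}, {"label": "finance", "score": 0.14}]
ambiguous = [{"label": "AI", "score": 0.41}, {"label": "climate", "score": 0.38},
             {"label": "finance", "score": 0.21}]

print(pick_topic(confident))
print(pick_topic(ambiguous))
```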

Recommendations

  • Pass top_k=5 when calling the pipeline to retrieve the five highest-scoring topics (return_all_scores is deprecated in recent 🤗 Transformers releases in favor of top_k)
  • Consider fine-tuning on domain-specific data for improved accuracy

How to Get Started

from transformers import pipeline

# Load the fine-tuned classifier and its tokenizer from the same repository
classifier = pipeline(
    "text-classification",
    model="AfroLogicInsect/topic-model-analysis-model",
    tokenizer="AfroLogicInsect/topic-model-analysis-model",
)

text = "New AI breakthrough in natural language processing"
# top_k=5 returns up to five label/score dicts, sorted by descending score
# (this replaces the deprecated return_all_scores=True)
top_5 = classifier(text, top_k=5)
for i, res in enumerate(top_5, start=1):
    print(f"Top {i}: {res['label']} ({res['score']:.3f})")

Training Details

Dataset

  • Custom multi-class topic dataset based on arXiv abstracts and news articles
  • Labels include domains such as AI, finance, sports, and climate

Hyperparameters

  • Epochs: 3
  • Batch size: 16
  • Learning rate: 2e-5
  • Evaluation every 200 steps
  • Metric: F1 score

Trainer Setup

Training used the Hugging Face Trainer API, with TrainingArguments configured for best-model selection and early stopping.
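
The card does not publish the training script, so the following is only a sketch of how the listed hyperparameters could map onto TrainingArguments keyword arguments. The output path, the save interval, the early-stopping patience, and the reading of "batch size" as a per-device value are all assumptions. The helper names are hypothetical, and transformers is imported lazily so the configuration can be inspected without it installed.

```python
import inspect

def build_trainer_config(output_dir="./topic-model"):
    """Collect the documented hyperparameters as TrainingArguments kwargs."""
    return {
        "output_dir": output_dir,             # hypothetical path
        "num_train_epochs": 3,
        "per_device_train_batch_size": 16,    # assumption: 16 is per device
        "learning_rate": 2e-5,
        "eval_steps": 200,
        "save_strategy": "steps",
        "save_steps": 200,                    # assumption: save at each evaluation
        "load_best_model_at_end": True,
        "metric_for_best_model": "f1",        # compute_metrics must return "f1"
    }

def build_training_arguments(config):
    """Create TrainingArguments, coping with the eval-strategy rename."""
    from transformers import TrainingArguments
    params = inspect.signature(TrainingArguments.__init__).parameters
    # Newer releases use eval_strategy; older ones use evaluation_strategy
    key = "eval_strategy" if "eval_strategy" in params else "evaluation_strategy"
    return TrainingArguments(**config, **{key: "steps"})

def build_early_stopping(patience=2):
    """Early stopping is a Trainer callback; patience=2 is a made-up value."""
    from transformers import EarlyStoppingCallback
    return EarlyStoppingCallback(early_stopping_patience=patience)

config = build_trainer_config()
print(config)
```

The resulting arguments and callback would then be passed to Trainer through its args and callbacks parameters, together with the model, the datasets, and a compute_metrics function.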

Evaluation

The model performed well across topic categories, with the following approximate evaluation metrics:

  • Accuracy: ~90.8%
  • F1 Score: ~0.91
  • Precision: ~0.89
  • Recall: ~0.93
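
The reported F1 is consistent with the harmonic mean of the reported precision and recall (about 0.91). The sketch below shows that check, plus one way these metrics could be computed from true and predicted labels. The card does not state which averaging was used, so macro averaging is an assumption here, and the toy labels are made up.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1_from(precision, recall) if precision + recall else 0.0)
    n = len(f1s)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Consistency check on the reported precision and recall
reported_f1 = f1_from(0.89, 0.93)

# Toy labels, for illustration only
y_true = ["AI", "AI", "finance", "sports"]
y_pred = ["AI", "finance", "finance", "sports"]
metrics = macro_metrics(y_true, y_pred)
print(round(reported_f1, 2), metrics)
```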

Environmental Impact

  • Hardware: Google Colab (NVIDIA T4 GPU)
  • Training Time: ~2.5 hours
  • Carbon Emitted: ~0.3 kg CO₂eq (estimated via ML Impact Calculator)

Citation

@misc{afrologicinsect2025topicmodel,
  title = {AfroLogicInsect Topic Classification Model},
  author = {Akan Daniel},
  year = {2025},
  howpublished = {\url{https://huggingface.co/AfroLogicInsect/topic-model-analysis-model}},
}

Contact

  • Name: Daniel (@AfroLogicInsect)
  • Location: Lagos, Nigeria
  • Contact: GitHub / Hugging Face / email