metadata

datasets:
  - custom
language:
  - en
license: apache-2.0
pipeline_tag: text-classification
library_name: transformers
tags:
  - LLM
  - classification
  - instruction-tuned
  - multi-label
  - qwen

BenchHub-Cat-7b

Project page: https://huggingface.co/BenchHub. Code: https://github.com/rladmstn1714/BenchHub

BenchHub-Cat-7b is a category classification model based on Qwen2.5-7B-Instruct, fine-tuned to map natural language queries to structured category triplets: (subject, skill, target).

🔧 Model Details

Base Model: Qwen2.5-7B-Instruct
Task: Structured multi-label classification (triplet: subject, skill, target)
Prompting Style: Instruction-style with expected format output
Training Framework: Axolotl + DeepSpeed ZeRO-3

🧪 Training Configuration

Hyperparameter	Value
Sequence Length	8192
Learning Rate	2 × 10⁻⁵
Batch Size (Effective)	256
Epochs	3
Scheduler	Cosine Decay
Warmup Ratio	0.05
Optimizer	Method from [19]
Trainer	DeepSpeed ZeRO-3
Hardware	4× A6000 48GB GPUs
Training Time	~5 hours per run
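
The schedule and batch-size rows above can be sketched in plain Python. This is an illustration under stated assumptions, not the training code: it assumes warmup is linear from zero and that the cosine decay ends at zero, and the per-GPU micro batch size and the total step count are not given in this card, so both are left as parameters. The function names are hypothetical.

```python
import math

PEAK_LR = 2e-5         # "Learning Rate" row
WARMUP_RATIO = 0.05    # "Warmup Ratio" row
EFFECTIVE_BATCH = 256  # "Batch Size (Effective)" row
NUM_GPUS = 4           # "Hardware" row


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at an optimizer step.

    Assumption: linear warmup from 0 to PEAK_LR, then cosine decay to 0.
    """
    warmup_steps = int(WARMUP_RATIO * total_steps)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


def grad_accum_steps(micro_batch_per_gpu: int) -> int:
    """Gradient accumulation steps needed to reach the effective batch size.

    effective batch = micro batch per GPU * number of GPUs * accumulation steps
    """
    samples_per_pass = micro_batch_per_gpu * NUM_GPUS
    if EFFECTIVE_BATCH % samples_per_pass != 0:
        raise ValueError("micro batch does not divide the effective batch evenly")
    return EFFECTIVE_BATCH // samples_per_pass
```

For example, a hypothetical micro batch of 4 sequences per GPU would need `grad_accum_steps(4)` accumulation steps, since each forward/backward pass across the 4 GPUs covers 16 of the 256 samples.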

🧠 Intended Use

Input: Natural language question or instruction
Output: Triplet (subject, skill, target), such as:

{
  "subject_type": "history",
  "task_type": "reasoning",
  "target_type": "korea"
}
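
Since the output is a JSON object with three fixed keys, downstream code will usually want to validate it before use. The following is a minimal sketch of such a parser; `CategoryTriplet` and `parse_triplet` are hypothetical names introduced here for illustration, and the tolerance for extra text around the JSON object is an assumption about how the generated text may look.

```python
import json
from dataclasses import dataclass

# Keys taken from the example output in this card.
EXPECTED_KEYS = ("subject_type", "task_type", "target_type")


@dataclass(frozen=True)
class CategoryTriplet:
    subject_type: str
    task_type: str
    target_type: str


def parse_triplet(generated_text: str) -> CategoryTriplet:
    """Parse the text between the first '{' and the last '}' as JSON
    and check that all three expected keys are present."""
    start = generated_text.find("{")
    end = generated_text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(generated_text[start : end + 1])
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return CategoryTriplet(*(str(data[key]).strip() for key in EXPECTED_KEYS))
```

Raising on malformed output, rather than returning a partial triplet, keeps badly formatted generations from silently entering a categorized dataset.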

✨ Prompt Example

### Instruction:
Classify the following query into subject, skill, and target.

### Query:
How did Confucianism shape education in East Asia?

### Output:
{
  "subject_type": "history",
  "task_type": "reasoning",
  "target_type": "korea"
}
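
The prompt layout above can be assembled and sent to the model roughly as follows. This is a sketch under several assumptions that this card does not confirm: the repository id `BenchHub/BenchHub-Cat-7b` is a guess based on the project page, the model is assumed to load as a causal language model that generates the JSON as text, and the prompt is passed as raw text without a chat template. `build_prompt` and `classify` are hypothetical helper names, and `device_map="auto"` additionally requires the `accelerate` package.

```python
# Prompt layout copied from the example in this card.
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Classify the following query into subject, skill, and target.\n\n"
    "### Query:\n"
    "{query}\n\n"
    "### Output:\n"
)

# Assumption: the repository id is not stated in this card.
MODEL_ID = "BenchHub/BenchHub-Cat-7b"


def build_prompt(query: str) -> str:
    """Fill the query into the instruction-style template."""
    return PROMPT_TEMPLATE.format(query=query.strip())


def classify(query: str, model_id: str = MODEL_ID, max_new_tokens: int = 64) -> str:
    """Generate the raw category text for one query (greedy decoding)."""
    # Imported lazily so build_prompt works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(query), return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Keep only the tokens generated after the prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


if __name__ == "__main__":
    print(classify("How did Confucianism shape education in East Asia?"))
```

Greedy decoding is used here because the task is classification, where a deterministic answer is usually preferable to sampled variety; the card itself does not specify decoding settings.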

📜 License

Apache 2.0