# 🧠 Image Classification AI Model (CIFAR-100)
This repository contains a Vision Transformer (ViT) model fine-tuned for **image classification** on the CIFAR-100 dataset. The model starts from `google/vit-base-patch16-224`, is converted to **FP16** (half precision) for efficient inference, and reaches roughly 70–80% accuracy on the 100-class task, depending on configuration.
---
## 🚀 Features
- 🖼️ **Task**: Image Classification
- 🧠 **Base Model**: `google/vit-base-patch16-224` (Vision Transformer)
- 🧪 **Quantized**: FP16 for faster and memory-efficient inference
- 🎯 **Dataset**: CIFAR-100 (100 fine-grained object categories)
- **CUDA Enabled**: Optimized for GPU acceleration
- 📈 **High Accuracy**: Fine-tuned and evaluated on validation split
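
To make the memory claim for FP16 concrete, here is a back-of-the-envelope sketch. It assumes ViT-Base/16 has roughly 86 million parameters and counts only the weights (no activations or optimizer state); the names `NUM_PARAMS` and `weight_size_mb` are illustrative and not part of this repository.

```python
# Approximate parameter count of ViT-Base/16 (an assumption: about 86 million).
NUM_PARAMS = 86_000_000
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_size_mb(num_params, dtype):
    """Return the size of the weights in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


fp32_mb = weight_size_mb(NUM_PARAMS, "fp32")
fp16_mb = weight_size_mb(NUM_PARAMS, "fp16")
# Halving the bytes per parameter halves the weight footprint.
print(f"FP32: {fp32_mb:.0f} MB, FP16: {fp16_mb:.0f} MB")
```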
---
## 📊 Dataset Used
**Hugging Face Dataset**: [`tanganke/cifar100`](https://huggingface.co/datasets/tanganke/cifar100)
- **Description**: CIFAR-100 is a dataset of 60,000 32×32 color images in 100 classes (600 images per class)
- **Split**: 50,000 training images and 10,000 test images
- **Categories**: Animals, Vehicles, Food, Household items, etc.
- **License**: MIT License (from source)
```python
from datasets import load_dataset
dataset = load_dataset("tanganke/cifar100")
```
## 🛠️ Model & Training Configuration
- Model: google/vit-base-patch16-224
- Image Size: 224x224 (resized from 32x32)
- Framework: Hugging Face Transformers & Datasets
- Training Environment: Kaggle Notebook with CUDA
- Epochs: 5–10 (with early stopping)
- Batch Size: 32
- Optimizer: AdamW
- Loss Function: CrossEntropyLoss
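
The "5–10 epochs with early stopping" setting can be sketched as a patience-based rule: stop once validation accuracy has failed to improve for a set number of consecutive epochs. This is a minimal illustration, not the repository's actual training loop; the `EarlyStopping` class, the patience of 2, and the accuracy values are all hypothetical.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_accuracy):
        """Record one epoch's validation accuracy; return True to stop training."""
        if val_accuracy > self.best + self.min_delta:
            self.best = val_accuracy
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_accuracies, max_epochs=10, patience=2):
    """Return how many epochs run before early stopping or the epoch cap."""
    stopper = EarlyStopping(patience=patience)
    for epoch, acc in enumerate(val_accuracies[:max_epochs], start=1):
        if stopper.step(acc):
            return epoch
    return min(len(val_accuracies), max_epochs)


# Made-up validation accuracies, for illustration only (not measured results).
history = [0.61, 0.68, 0.72, 0.71, 0.72, 0.74]
stopped_at = epochs_run(history)
print(f"Training stopped after epoch {stopped_at}")
```

With a patience of 2, the run above stops at the second consecutive epoch that fails to beat the best accuracy so far, even though a later epoch would have improved. A larger patience trades extra training time for a lower risk of stopping too early.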
## ✅ Evaluation & Scoring
- Accuracy: ~70–80% (varies by configuration)
- Validation Tool: `evaluate` or `sklearn.metrics`
- Metrics: Top-1 and Top-5 accuracy
- Inference Speed: Significantly faster after FP16 quantization
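
For reference, Top-1 and Top-5 accuracy can both be computed with one top-k function: a prediction counts as correct when the true label is among the k highest-scoring classes. The sketch below is dependency-free and uses toy scores; `top_k_accuracy` is an illustrative helper, not a function from `evaluate` or `sklearn.metrics`.

```python
def top_k_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    hits = 0
    for scores, label in zip(logits, labels):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy scores over 4 classes (illustrative values, not model output).
logits = [
    [0.1, 2.0, 0.3, 0.5],  # true label 1 has the highest score
    [1.5, 0.2, 1.0, 0.1],  # true label 2 has the second-highest score
    [0.0, 0.1, 0.2, 3.0],  # true label 0 has the lowest score
]
labels = [1, 2, 0]

top1 = top_k_accuracy(logits, labels, k=1)
top2 = top_k_accuracy(logits, labels, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

For CIFAR-100 the same function would be called with `k=1` and `k=5` on the model's 100-way logits.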
## 🔍 Inference Example
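```python
from PIL import Image
import torch
from datasets import load_dataset
from transformers import ViTForImageClassification, ViTImageProcessor

# ViTImageProcessor replaces the deprecated ViTFeatureExtractor
feature_extractor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
# Assumption: the fine-tuned FP16 checkpoint is stored in ./vit-cifar100-fp16
model = ViTForImageClassification.from_pretrained(
    "vit-cifar100-fp16", torch_dtype=torch.float16
).to("cuda").eval()
dataset = load_dataset("tanganke/cifar100")  # only needed for the label names

def predict(image_path):
    image = Image.open(image_path).convert("RGB")
    inputs = feature_extractor(images=image, return_tensors="pt")
    # Match the model's device and FP16 dtype
    pixel_values = inputs["pixel_values"].to(model.device, dtype=model.dtype)
    with torch.no_grad():
        logits = model(pixel_values=pixel_values).logits
    predicted_class = logits.argmax(-1).item()
    # Assumes the label column is named "fine_label"; adjust if it differs
    return dataset["train"].features["fine_label"].int2str(predicted_class)

print(predict("sample_image.jpg"))
```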
## 📁 Folder Structure
📦image-classification-vit
┣ 📂vit-cifar100-fp16
┣ 📜train.py
┣ 📜inference.py
┣ 📜README.md
┗ 📜requirements.txt