
🧠 Image Classification AI Model (CIFAR-100)

This repository contains a Vision Transformer (ViT)-based AI model fine-tuned for image classification on the CIFAR-100 dataset. The model is built using google/vit-base-patch16-224, quantized to FP16 for efficient inference, and delivers high accuracy in multi-class image classification tasks.


🚀 Features

  • 🖼️ Task: Image Classification
  • 🧠 Base Model: google/vit-base-patch16-224 (Vision Transformer)
  • 🧪 Quantized: FP16 for faster, memory-efficient inference (see the conversion sketch below)
  • 🎯 Dataset: CIFAR-100, 100 fine-grained object categories
  • ⚡ CUDA Enabled: Optimized for GPU acceleration
  • 📈 High Accuracy: Fine-tuned and evaluated on the validation split
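
The FP16 copy referenced above amounts to casting the fine-tuned weights to half precision and saving them. The sketch below illustrates the idea; the input path vit-cifar100 stands in for a hypothetical full-precision checkpoint, and the output path matches the vit-cifar100-fp16 folder listed at the end of this README:

from transformers import ViTForImageClassification

# Load the fine-tuned full-precision checkpoint (hypothetical path),
# cast the weights to FP16, and save the smaller copy for inference
model = ViTForImageClassification.from_pretrained("vit-cifar100")
model = model.half()
model.save_pretrained("vit-cifar100-fp16")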

📊 Dataset Used

Hugging Face Dataset: tanganke/cifar100

  • Description: CIFAR-100 is a dataset of 60,000 32×32 color images in 100 classes (600 images per class)
  • Split: 50,000 training images and 10,000 test images
  • Categories: Animals, Vehicles, Food, Household items, etc.
  • License: MIT License (from source)

from datasets import load_dataset

dataset = load_dataset("tanganke/cifar100")
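
A quick way to sanity-check the splits and the 100 fine-grained class names, assuming the dataset keeps the standard CIFAR-100 columns (img, fine_label, coarse_label), as the inference example further below does:

# Inspect the splits, columns, and the 100 fine-grained class names
print(dataset)
fine_labels = dataset["train"].features["fine_label"].names
print(len(fine_labels), fine_labels[:5])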

๐Ÿ› ๏ธ Model & Training Configuration

  • Model: google/vit-base-patch16-224

  • Image Size: 224×224 (resized from 32×32)

  • Framework: Hugging Face Transformers & Datasets

  • Training Environment: Kaggle Notebook with CUDA

  • Epochs: 5–10 (with early stopping)

  • Batch Size: 32

  • Optimizer: AdamW

  • Loss Function: CrossEntropyLoss (applied internally when labels are passed; see the training sketch below)
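
A rough sketch of how this configuration maps onto the Hugging Face Trainer API follows. It is illustrative rather than the exact training script: the split and column names (train/test, img, fine_label), output path, and early-stopping patience are assumptions, and Trainer's default AdamW optimizer covers the optimizer listed above.

import torch
from transformers import (ViTForImageClassification, ViTFeatureExtractor,
                          TrainingArguments, Trainer, EarlyStoppingCallback)

feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=100,
    ignore_mismatched_sizes=True,  # swap the 1000-class ImageNet head for a 100-class head
)

def transform(batch):
    # Resize the 32x32 CIFAR images to 224x224 and normalize them for ViT
    inputs = feature_extractor([img.convert("RGB") for img in batch["img"]], return_tensors="pt")
    inputs["labels"] = batch["fine_label"]
    return inputs

def collate_fn(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["labels"] for ex in examples]),
    }

prepared = dataset.with_transform(transform)  # `dataset` is loaded in the snippet above

args = TrainingArguments(
    output_dir="vit-cifar100",
    per_device_train_batch_size=32,
    num_train_epochs=10,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,  # required for early stopping
    fp16=True,                    # mixed precision on CUDA
)

trainer = Trainer(
    model=model,                  # CrossEntropyLoss is applied internally when labels are passed
    args=args,
    train_dataset=prepared["train"],
    eval_dataset=prepared["test"],
    data_collator=collate_fn,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
)
trainer.train()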

✅ Evaluation & Scoring

  • Accuracy: ~70–80% (varies by configuration)

  • Validation Tool: evaluate or sklearn.metrics

  • Metrics: Top-1 and Top-5 accuracy (see the sketch below)

  • Inference Speed: Significantly faster after FP16 quantization
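
A minimal sketch for computing Top-1 and Top-5 accuracy over the test split, using the evaluate library for Top-1 and the raw logits for Top-5. It assumes the prepared dataset, collate_fn, and full-precision model from the training sketch above; for the FP16 checkpoint, cast pixel_values to half precision first.

import torch
import evaluate
from torch.utils.data import DataLoader

top1 = evaluate.load("accuracy")  # Top-1 accuracy metric
top5_correct, total = 0, 0

model.to("cuda").eval()
for batch in DataLoader(prepared["test"], batch_size=32, collate_fn=collate_fn):
    labels = batch["labels"].to("cuda")
    with torch.no_grad():
        logits = model(pixel_values=batch["pixel_values"].to("cuda")).logits
    top5 = logits.topk(5, dim=-1).indices  # five highest-scoring classes per image
    top1.add_batch(predictions=top5[:, 0].cpu().tolist(), references=labels.cpu().tolist())
    top5_correct += (top5 == labels.unsqueeze(-1)).any(dim=-1).sum().item()
    total += labels.size(0)

print("Top-1 accuracy:", top1.compute()["accuracy"])
print("Top-5 accuracy:", top5_correct / total)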

๐Ÿ” Inference Example

from PIL import Image
import torch
from transformers import ViTForImageClassification, ViTFeatureExtractor

# Load the fine-tuned FP16 checkpoint (the vit-cifar100-fp16 folder below) and the matching feature extractor
model = ViTForImageClassification.from_pretrained("vit-cifar100-fp16").half().to("cuda").eval()
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")

def predict(image_path):
    image = Image.open(image_path).convert("RGB")
    inputs = feature_extractor(images=image, return_tensors="pt").to("cuda")
    inputs["pixel_values"] = inputs["pixel_values"].half()  # match the FP16 model weights
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_class = outputs.logits.argmax(-1).item()
    # Map the class index back to its CIFAR-100 name via the dataset loaded earlier
    return dataset["train"].features["fine_label"].int2str(predicted_class)

print(predict("sample_image.jpg"))

๐Ÿ“ Folder Structure

📦 image-classification-vit
 ┣ 📂 vit-cifar100-fp16
 ┣ 📜 train.py
 ┣ 📜 inference.py
 ┣ 📜 README.md
 ┗ 📜 requirements.txt