Action Detection with CNN-GRU on MobileNetV2

Overview

This model performs human action classification on videos using a CNN-GRU architecture built on top of MobileNetV2 (1.0, 224) features and trained on the UCF101 dataset.
It is well-suited for recognizing actions from short trimmed video clips.


Model Details


Usage

Requirements

pip install torch torchvision opencv-python

Example Code

import cv2

from action_model import load_action_model, preprocess_frames, predict_action

SEQ_LEN = 16  # frames per clip, matching the training sequence length

# Load the trained model
model = load_action_model(model_path="best_model.pt", device="cpu", num_classes=5)

# Read all frames from the video (OpenCV returns frames as BGR arrays)
cap = cv2.VideoCapture("path_to_video.mp4")
if not cap.isOpened():
    raise IOError("Could not open video file")
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

# Preprocess the first SEQ_LEN frames for model input
clip_tensor = preprocess_frames(frames[:SEQ_LEN], seq_len=SEQ_LEN, resize=(112, 112))

# Predict action
result = predict_action(model, clip_tensor, device="cpu")
print(result)
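
The example above passes only the first 16 frames to the model, which covers well under a second of video at typical frame rates. A common alternative is to pick 16 frames spread evenly across the whole clip. The helper below is a minimal sketch of that idea. `sample_frame_indices` is a hypothetical name and is not part of `action_model`, and this card does not say whether the model was trained with this kind of sampling, so treat it as an option to experiment with rather than the intended preprocessing.

```python
def sample_frame_indices(num_frames, seq_len=16):
    """Return seq_len frame indices spread evenly over num_frames frames.

    Hypothetical helper, not part of action_model. If the clip has fewer
    than seq_len frames, indices repeat so the result still has seq_len
    entries.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    # Split the clip into seq_len equal-width segments and take the
    # frame at the centre of each segment.
    return [
        min(num_frames - 1, int((i + 0.5) * num_frames / seq_len))
        for i in range(seq_len)
    ]
```

With the frames already in a list, the selection would be `[frames[i] for i in sample_frame_indices(len(frames))]`, and the result can be passed on for preprocessing in place of the first 16 frames.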

Training & Evaluation

  • Trained on UCF101 split 1 with a MobileNetV2 backbone.
  • Sequence length: 16 frames per clip.
  • Metric: Top-1 classification accuracy.
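
Top-1 accuracy is the fraction of clips for which the highest-scoring class matches the ground-truth label. The sketch below shows the computation on plain Python lists. `top1_accuracy` is a hypothetical helper and the scores in the example are made-up illustration values, not results from this model.

```python
def top1_accuracy(scores, labels):
    """Fraction of clips whose highest-scoring class equals the true label.

    scores: list of per-class score lists, one inner list per clip.
    labels: list of integer class indices, one per clip.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one clip")
    correct = 0
    for clip_scores, label in zip(scores, labels):
        # Index of the largest score is the top-1 prediction.
        predicted = max(range(len(clip_scores)), key=clip_scores.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


# Illustration only: 4 clips, 3 classes, 3 of 4 predictions correct.
example_scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
example_labels = [1, 0, 2, 1]
example_accuracy = top1_accuracy(example_scores, example_labels)  # 0.75
```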

Intended Use & Limitations

Intended for:

  • Video analytics
  • Educational research
  • Baseline for video action recognition tasks

Limitations:

  • Predicts only a subset of the UCF101 classes
  • Expects short, trimmed video clips
  • Not robust to out-of-domain videos or very low-resolution input

Tags

action, cnn-gru, video-classification, ucf101, mobilenetv2, deep-learning, torch
