Action Detection with CNN-GRU on MobileNetV2
Overview
This model performs human action classification on videos using a CNN-GRU architecture built on top of MobileNetV2 (1.0, 224) features and trained on the UCF101 dataset.
It is well-suited for recognizing actions from short trimmed video clips.
Model Details
Base model: google/mobilenet_v2_1.0_224
Architecture: CNN-GRU
Dataset: UCF101 - Action Recognition Dataset (https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos)
Task: Video Classification (Action Recognition)
Metrics: Accuracy
License: MIT
Usage
Requirements
pip install torch torchvision opencv-python
Example Code
from action_model import load_action_model, preprocess_frames, predict_action
import cv2
# Load model
model = load_action_model(model_path="best_model.pt", device="cpu", num_classes=5)
# Read frames from video
cap = cv2.VideoCapture("path_to_video.mp4")
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()
# Preprocess frames for model input
clip_tensor = preprocess_frames(frames[:16], seq_len=16, resize=(112,112))
# Predict action
result = predict_action(model, clip_tensor, device="cpu")
print(result)
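The example above passes the first 16 frames of the video, which covers only the opening fraction of a second of a longer clip. One common alternative is to spread the 16 frames evenly over the whole video. The helper below is a hypothetical sketch of that idea; `sample_frame_indices` is not part of `action_model`, and whether the model was trained on consecutive or evenly spaced frames is not stated here.

```python
from typing import List


def sample_frame_indices(num_frames: int, seq_len: int = 16) -> List[int]:
    """Return seq_len indices spread evenly over a video of num_frames frames.

    Videos shorter than seq_len repeat some frames instead of failing.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    if seq_len == 1:
        return [0]
    # Evenly spaced positions from the first frame to the last one.
    step = (num_frames - 1) / (seq_len - 1)
    return [round(i * step) for i in range(seq_len)]


# Stand-in for the list of decoded frames from cv2.VideoCapture.
frames = [f"frame_{i}" for i in range(31)]
selected = [frames[i] for i in sample_frame_indices(len(frames))]
print(selected[0], selected[-1], len(selected))  # frame_0 frame_30 16
```

The selected frames can then be passed to `preprocess_frames` in place of `frames[:16]`.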
Training & Evaluation
- Trained on UCF101 split 1 with a MobileNetV2 backbone.
- Sequence length: 16 frames per clip.
- Metric: Top-1 classification accuracy.
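Top-1 accuracy is the fraction of clips for which the highest-scoring class matches the ground-truth label. The function below is a small, dependency-free sketch of that computation. The name `top1_accuracy` and the toy scores are illustrative only and are not results from this model.

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and the same length")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(predicted == label)
    return correct / len(labels)


# Toy example: four clips, three classes (made-up numbers).
scores = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]
labels = [1, 0, 2, 2]
print(top1_accuracy(scores, labels))  # 0.75
```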
Intended Use & Limitations
Intended for:
- Video analytics
- Educational research
- Baseline for video action recognition tasks
Limitations:
- Predicts only the subset of UCF101 classes it was trained on
- Needs short, trimmed video clips
- Not robust to out-of-domain videos or very low-resolution input
Tags
action · cnn-gru · video-classification · ucf101 · mobilenetv2 · deep-learning · torch
Model tree for NanG01/Action_detection: base model google/mobilenet_v2_1.0_224