---
license: mit
datasets:
- abdallahwagih/ucf101-videos
metrics:
- accuracy
base_model:
- google/mobilenet_v2_1.0_224
pipeline_tag: video-classification
tags:
- action-recognition
- cnn-gru
- video-classification
- ucf101
- action
- mobilenetv2
- deep-learning
- pytorch
---
# Action Detection with CNN-GRU on MobileNetV2
## Overview
This model performs human action classification on videos using a CNN-GRU architecture built on top of **MobileNetV2 (1.0, 224)** features and trained on the [UCF101](https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos) dataset.
It is well-suited for recognizing actions from short trimmed video clips.
***
## Model Details
- **Base model:** `google/mobilenet_v2_1.0_224`
- **Architecture:** CNN-GRU
![CNN-GRU Architecture](./cnn_architecture.png)
- **Dataset:** [UCF101 - Action Recognition Dataset](https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos)
- **Task:** Video Classification (Action Recognition)
- **Metrics:** Accuracy
- **License:** MIT
***
## Usage
### Requirements
```bash
pip install torch torchvision opencv-python
```
### Example Code
```python
import cv2

from action_model import load_action_model, preprocess_frames, predict_action

# Load the trained model (this checkpoint is loaded with 5 output classes)
model = load_action_model(model_path="best_model.pt", device="cpu", num_classes=5)

# Read all frames from the video
cap = cv2.VideoCapture("path_to_video.mp4")
frames = []
while True:
    ret, frame = cap.read()
    if not ret:  # end of stream or read error
        break
    frames.append(frame)
cap.release()

# Preprocess the first 16 frames for model input
clip_tensor = preprocess_frames(frames[:16], seq_len=16, resize=(112, 112))

# Predict the action
result = predict_action(model, clip_tensor, device="cpu")
print(result)
```
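The example above feeds the first 16 frames of the video to the model, which covers only a fraction of a second at typical frame rates. A common alternative is to spread the 16 frames evenly across the whole clip. The helper below is a sketch of that idea; `uniform_frame_indices` is a hypothetical name and is not part of `action_model`, and whether the model was trained with this sampling scheme is not stated here.

```python
def uniform_frame_indices(num_frames: int, seq_len: int = 16) -> list:
    """Pick seq_len frame indices spread evenly across num_frames frames."""
    if num_frames <= 0:
        raise ValueError("video has no frames")
    if num_frames <= seq_len:
        # Short video: keep every frame and repeat the last one as padding.
        return list(range(num_frames)) + [num_frames - 1] * (seq_len - num_frames)
    step = num_frames / seq_len
    # Take the frame at the centre of each of seq_len equal-length segments.
    return [int(step * i + step / 2) for i in range(seq_len)]


if __name__ == "__main__":
    print(uniform_frame_indices(160))
```

With the `frames` list from the example, the sampled clip would be `[frames[i] for i in uniform_frame_indices(len(frames))]`, passed to `preprocess_frames` in place of `frames[:16]`.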
***
## Training & Evaluation
- Trained on UCF101 split 1 with a MobileNetV2 backbone.
- Sequence length: 16 frames per clip.
- Metric: Top-1 classification accuracy.
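Top-1 accuracy is the fraction of clips for which the highest-scoring class matches the ground-truth label. The sketch below shows the computation in plain Python; `top1_accuracy` is a hypothetical helper, and the scores and labels are made-up values for illustration, not results from this model.

```python
def top1_accuracy(scores, labels):
    """Fraction of clips whose highest-scoring class equals the true label."""
    if len(scores) != len(labels) or len(labels) == 0:
        raise ValueError("scores and labels must be non-empty and the same length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score in this row (argmax).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


if __name__ == "__main__":
    # Illustrative per-class scores for four clips and three classes.
    scores = [
        [0.1, 0.7, 0.2],
        [0.5, 0.3, 0.2],
        [0.2, 0.2, 0.6],
        [0.6, 0.3, 0.1],
    ]
    labels = [1, 0, 2, 1]
    print(top1_accuracy(scores, labels))  # 0.75
```

Model outputs held in a PyTorch tensor can be passed in after calling `.tolist()` on them.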
***
## Intended Use & Limitations
**Intended for:**
- Video analytics
- Educational research
- Baseline for video action recognition tasks
**Limitations:**
- Predicts only the subset of UCF101 classes it was trained on
- Expects short, trimmed video clips
- Not robust to out-of-domain videos or very low-resolution input
***
## Tags
`action` · `cnn-gru` · `video-classification` · `ucf101` · `mobilenetv2` · `deep-learning` · `torch`