NanG01 committed on
Commit 2d4f1b0 (verified)
1 Parent(s): a775f28

Update README.md


![diagram-export-27-08-2025-19_58_02.png](https://cdn-uploads.huggingface.co/production/uploads/6883d9803cf41741e3a9f69a/XqEZybue6-StX1ayUt2kl.png)

Files changed (1): README.md (+103 −3)
---
license: mit
datasets:
- abdallahwagih/ucf101-videos
metrics:
- accuracy
base_model:
- google/mobilenet_v2_1.0_224
pipeline_tag: video-classification
tags:
- action-recognition
- cnn-gru
- video-classification
- ucf101
- action
- mobilenetv2
- deep-learning
- pytorch
---

# Action Detection with CNN-GRU on MobileNetV2

## Overview

This model performs human action classification on videos using a [CNN-GRU architecture](https://arxiv.org/abs/1412.7753) built on top of **MobileNetV2 (1.0, 224)** features and trained on the [UCF101](https://www.crcv.ucf.edu/data/UCF101.php) dataset.
It is well-suited for recognizing actions in short, trimmed video clips.

***

## Model Details

- **Base model:** `google/mobilenet_v2_1.0_224`
- **Architecture:** CNN-GRU (a minimal sketch follows this list)
- **Dataset:** [UCF101 - Action Recognition Dataset](https://www.kaggle.com/datasets/abdallahwagih/ucf101-videos)
- **Task:** Video Classification (Action Recognition)
- **Metrics:** Accuracy
- **License:** MIT
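
The training code is not part of this card, so the exact layer configuration is not published here. As a rough orientation, the description above (per-frame MobileNetV2 features, a GRU over the frame sequence, and a linear classification head) corresponds to a model along the lines of the following minimal PyTorch sketch; the hidden size, the single GRU layer, and the use of torchvision's `mobilenet_v2` backbone are assumptions, not the repository's actual implementation:

```python
import torch
import torch.nn as nn
from torchvision.models import MobileNet_V2_Weights, mobilenet_v2


class CNNGRUClassifier(nn.Module):
    """Per-frame MobileNetV2 features -> GRU over time -> linear classifier (illustrative only)."""

    def __init__(self, num_classes: int = 5, hidden_size: int = 256):
        super().__init__()
        backbone = mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT)  # ImageNet-pretrained backbone
        self.cnn = backbone.features                  # spatial feature extractor, 1280 output channels
        self.pool = nn.AdaptiveAvgPool2d(1)           # global average pool each frame's feature map
        self.gru = nn.GRU(input_size=1280, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        frames = clips.reshape(b * t, c, h, w)                # fold time into the batch dimension
        feats = self.pool(self.cnn(frames)).flatten(1)        # (B*T, 1280) per-frame features
        _, hidden = self.gru(feats.reshape(b, t, -1))         # hidden: (1, B, hidden_size)
        return self.fc(hidden[-1])                            # logits: (B, num_classes)
```

A batch of clips shaped `(B, 16, 3, 112, 112)`, as produced in the usage example below, yields logits of shape `(B, num_classes)`.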

***

## Usage

### Requirements

```bash
pip install torch torchvision opencv-python
```

### Example Code

```python
from action_model import load_action_model, preprocess_frames, predict_action
import cv2

# Load model
model = load_action_model(model_path="best_model.pt", device="cpu", num_classes=5)

# Read frames from video
cap = cv2.VideoCapture("path_to_video.mp4")
frames = []
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frames.append(frame)
cap.release()

# Preprocess frames for model input
clip_tensor = preprocess_frames(frames[:16], seq_len=16, resize=(112, 112))

# Predict action
result = predict_action(model, clip_tensor, device="cpu")
print(result)
```
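
The `action_model` module referenced above is not included in this commit, so `load_action_model`, `preprocess_frames`, and `predict_action` are opaque here. For orientation only, a preprocessing helper compatible with the call above might look roughly like this; the uniform frame sampling, BGR-to-RGB conversion, and ImageNet normalization are assumptions about a typical MobileNetV2 pipeline, not the actual implementation:

```python
import cv2
import numpy as np
import torch

# ImageNet normalization constants commonly used with MobileNetV2 backbones (assumed here).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_frames_sketch(frames, seq_len=16, resize=(112, 112)):
    """Turn a list of BGR frames into a (1, seq_len, 3, H, W) float tensor (illustrative only)."""
    # Sample (or repeat) indices so the clip has exactly seq_len frames, spaced uniformly.
    idx = np.linspace(0, len(frames) - 1, seq_len).astype(int)
    clip = []
    for i in idx:
        frame = cv2.cvtColor(frames[i], cv2.COLOR_BGR2RGB)      # OpenCV decodes frames as BGR
        frame = cv2.resize(frame, resize).astype(np.float32) / 255.0
        frame = (frame - IMAGENET_MEAN) / IMAGENET_STD           # channel-wise normalization
        clip.append(torch.from_numpy(frame).permute(2, 0, 1))    # HWC -> CHW
    return torch.stack(clip).unsqueeze(0)                        # (1, T, C, H, W)
```

The resulting `(1, 16, 3, 112, 112)` tensor matches the clip layout that the CNN-GRU sketch above expects.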

***

## Training & Evaluation

- Trained on UCF101 split 1 with a MobileNetV2 backbone.
- Sequence length: 16 frames per clip.
- Metric: Top-1 classification accuracy (see the sketch below).
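
The evaluation script is likewise not included in the card. Purely to illustrate the reported metric, top-1 accuracy over a labelled clip loader can be computed along these lines, assuming the model returns one logits vector per clip:

```python
import torch


@torch.no_grad()
def top1_accuracy(model, loader, device="cpu"):
    """Top-1 accuracy over a DataLoader yielding (clip_tensor, label) batches."""
    model.eval()
    correct = total = 0
    for clips, labels in loader:
        logits = model(clips.to(device))          # (B, num_classes)
        preds = logits.argmax(dim=1).cpu()        # predicted class per clip
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / max(total, 1)
```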

***

## Intended Use & Limitations

**Intended for:**
- Video analytics
- Educational research
- A baseline for video action recognition tasks

**Limitations:**
- Predicts only the subset of UCF101 classes it was trained on
- Requires short, trimmed video clips
- Not robust to out-of-domain videos or very low-resolution input

***

## Tags

`action` · `cnn-gru` · `video-classification` · `ucf101` · `mobilenetv2` · `deep-learning` · `torch`