LPX55 committed
Commit a786828 · verified · 1 Parent(s): 572a990

Delete sav_dataset
sav_dataset/LICENSE DELETED
@@ -1,30 +0,0 @@
- BSD License
-
- For SAM 2 Eval software
-
- Copyright (c) Meta Platforms, Inc. and affiliates.
-
- Redistribution and use in source and binary forms, with or without modification,
- are permitted provided that the following conditions are met:
-
- * Redistributions of source code must retain the above copyright notice, this
-   list of conditions and the following disclaimer.
-
- * Redistributions in binary form must reproduce the above copyright notice,
-   this list of conditions and the following disclaimer in the documentation
-   and/or other materials provided with the distribution.
-
- * Neither the name Meta nor the names of its contributors may be used to
-   endorse or promote products derived from this software without specific
-   prior written permission.
-
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
- ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
- WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
- DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
- ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
- (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
- LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
- ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
- SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

sav_dataset/LICENSE_DAVIS DELETED
@@ -1,29 +0,0 @@
- BSD 3-Clause License
-
- Copyright (c) 2020, DAVIS: Densely Annotated VIdeo Segmentation
- All rights reserved.
-
- Redistribution and use in source and binary forms, with or without
- modification, are permitted provided that the following conditions are met:
-
- 1. Redistributions of source code must retain the above copyright notice, this
-    list of conditions and the following disclaimer.
-
- 2. Redistributions in binary form must reproduce the above copyright notice,
-    this list of conditions and the following disclaimer in the documentation
-    and/or other materials provided with the distribution.
-
- 3. Neither the name of the copyright holder nor the names of its
-    contributors may be used to endorse or promote products derived from
-    this software without specific prior written permission.
-
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
- AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
- DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
- FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
- SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
- CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
- OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

sav_dataset/LICENSE_VOS_BENCHMARK DELETED
@@ -1,7 +0,0 @@
- Copyright 2023 Rex Cheng
-
- Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

sav_dataset/README.md DELETED
@@ -1,164 +0,0 @@
- # Segment Anything Video (SA-V) Dataset
-
- ## Overview
-
- The [Segment Anything Video (SA-V)](https://ai.meta.com/datasets/segment-anything-video/) dataset consists of 51K diverse videos and 643K high-quality spatio-temporal segmentation masks (i.e., masklets). The dataset is released under the CC BY 4.0 license. Browse the dataset [here](https://sam2.metademolab.com/dataset).
-
- ![SA-V dataset](../assets/sa_v_dataset.jpg?raw=true)
-
- ## Getting Started
-
- ### Download the dataset
-
- Visit [here](https://ai.meta.com/datasets/segment-anything-video-downloads/) to download SA-V, including the training, val, and test sets.
-
- ### Dataset Stats
-
- |            | Num Videos | Num Masklets                              |
- | ---------- | ---------- | ----------------------------------------- |
- | SA-V train | 50,583     | 642,036 (auto 451,720 and manual 190,316) |
- | SA-V val   | 155        | 293                                       |
- | SA-V test  | 150        | 278                                       |
-
- ### Notebooks
-
- To load and visualize the SA-V training set annotations, refer to the example [sav_visualization_example.ipynb](./sav_visualization_example.ipynb) notebook.
-
- ### SA-V train
-
- For the SA-V training set, we release the mp4 videos and store the masklet annotations per video as JSON files. Automatic masklets and manual masklets are stored separately in two JSON files: `{video_id}_auto.json` and `{video_id}_manual.json`. They can be loaded as dictionaries in Python in the format below.
-
- ```
- {
-     "video_id"                     : str; video id
-     "video_duration"               : float64; the duration in seconds of this video
-     "video_frame_count"            : float64; the number of frames in the video
-     "video_height"                 : float64; the height of the video
-     "video_width"                  : float64; the width of the video
-     "video_resolution"             : float64; video_height $\times$ video_width
-     "video_environment"            : List[str]; "Indoor" or "Outdoor"
-     "video_split"                  : str; "train" for training set
-     "masklet"                      : List[List[Dict]]; masklet annotations in list of list of RLEs.
-                                      The outer list is over frames in the video and the inner list
-                                      is over objects in the video.
-     "masklet_id"                   : List[int]; the masklet ids
-     "masklet_size_rel"             : List[float]; the average mask area normalized by resolution
-                                      across all the frames where the object is visible
-     "masklet_size_abs"             : List[float]; the average mask area (in pixels)
-                                      across all the frames where the object is visible
-     "masklet_size_bucket"          : List[str]; "small": $1$ <= masklet_size_abs < $32^2$,
-                                      "medium": $32^2$ <= masklet_size_abs < $96^2$,
-                                      and "large": masklet_size_abs >= $96^2$
-     "masklet_visibility_changes"   : List[int]; the number of times the visibility changes
-                                      after the first appearance (e.g., invisible -> visible
-                                      or visible -> invisible)
-     "masklet_first_appeared_frame" : List[int]; the index of the frame where the object appears
-                                      the first time in the video. Always 0 for auto masklets.
-     "masklet_frame_count"          : List[int]; the number of frames being annotated. Note that
-                                      videos are annotated at 6 fps (annotated every 4 frames)
-                                      while the videos are at 24 fps.
-     "masklet_edited_frame_count"   : List[int]; the number of frames edited by human annotators.
-                                      Always 0 for auto masklets.
-     "masklet_type"                 : List[str]; "auto" or "manual"
-     "masklet_stability_score"      : Optional[List[List[float]]]; per-mask stability scores. Auto annotation only.
-     "masklet_num"                  : int; the number of manual/auto masklets in the video
- }
- ```
-
- Note that SA-V train contains 50,583 videos in total, all of which have manual annotations. Among these, 48,436 videos also have automatic annotations.
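
To make the format concrete, here is a minimal loading sketch (the video id and path are placeholders); the RLE decoding with pycocotools mirrors `utils/sav_utils.py` further below:

```
import json

import pycocotools.mask as mask_util

# Placeholder id; the annotation files are named {video_id}_manual.json / {video_id}_auto.json.
video_id = "sav_000001"
with open(f"{video_id}_manual.json") as f:
    manual_annot = json.load(f)

# "masklet" is indexed as [frame][object]; decode all object masks in frame 0.
rles = manual_annot["masklet"][0]
masks = [mask_util.decode(rle) > 0 for rle in rles]  # boolean HxW arrays
print(f"{manual_annot['masklet_num']} manual masklets, {len(masks)} masks in frame 0")
```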
-
- ### SA-V val and test
-
- For the SA-V val and test sets, we release the extracted frames as JPEG files and the masks as PNG files with the following directory structure:
-
- ```
- sav_val(sav_test)
- ├── sav_val.txt (sav_test.txt): a list of video ids in the split
- ├── JPEGImages_24fps  # videos are extracted at 24 fps
- │   ├── {video_id}
- │   │   ├── 00000.jpg  # video frame
- │   │   ├── 00001.jpg  # video frame
- │   │   ├── 00002.jpg  # video frame
- │   │   ├── 00003.jpg  # video frame
- │   │   └── ...
- │   ├── {video_id}
- │   ├── {video_id}
- │   └── ...
- └── Annotations_6fps  # videos are annotated at 6 fps
-     ├── {video_id}
-     │   ├── 000  # obj 000
-     │   │   ├── 00000.png  # mask for object 000 in 00000.jpg
-     │   │   ├── 00004.png  # mask for object 000 in 00004.jpg
-     │   │   ├── 00008.png  # mask for object 000 in 00008.jpg
-     │   │   ├── 00012.png  # mask for object 000 in 00012.jpg
-     │   │   └── ...
-     │   ├── 001  # obj 001
-     │   ├── 002  # obj 002
-     │   └── ...
-     ├── {video_id}
-     ├── {video_id}
-     └── ...
- ```
-
- All masklets in the val and test sets are manually annotated in every frame by annotators. For each annotated object in a video, we store its masks in separate single-object PNG files (one folder per object, one PNG per annotated frame). This is because annotated objects may overlap; e.g., in our SA-V dataset there can be a mask for a whole person as well as a separate mask for their hands.
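
A minimal sketch of reading these per-object ground-truth masks back as boolean arrays, assuming the layout above (the video id in the path is a placeholder):

```
import os

import numpy as np
from PIL import Image

video_dir = "sav_val/Annotations_6fps/{video_id}"  # placeholder video id
per_object_masks = {}
for obj_id in sorted(os.listdir(video_dir)):  # object folders: "000", "001", ...
    obj_dir = os.path.join(video_dir, obj_id)
    per_object_masks[obj_id] = {
        # each PNG holds a single object's binary mask for one annotated frame
        frame: np.array(Image.open(os.path.join(obj_dir, frame))) > 0
        for frame in sorted(os.listdir(obj_dir))
    }
```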
-
- ## SA-V Val and Test Evaluation
-
- We provide an evaluator to compute the common J and F metrics on the SA-V val and test sets. To run the evaluation, first install a few dependencies:
-
- ```
- pip install -r requirements.txt
- ```
-
- Then we can evaluate the predictions as follows:
-
- ```
- python sav_evaluator.py --gt_root {GT_ROOT} --pred_root {PRED_ROOT}
- ```
-
- or run
-
- ```
- python sav_evaluator.py --help
- ```
-
- to print a complete help message.
-
- The evaluator expects `GT_ROOT` to be one of the following folder structures, and `GT_ROOT` and `PRED_ROOT` to have the same structure.
-
- - Same as the SA-V val and test directory structure
-
-   ```
-   {GT_ROOT}  # gt root folder
-   ├── {video_id}
-   │   ├── 000  # all masks associated with obj 000
-   │   │   ├── 00000.png  # mask for object 000 in frame 00000 (binary mask)
-   │   │   └── ...
-   │   ├── 001  # all masks associated with obj 001
-   │   ├── 002  # all masks associated with obj 002
-   │   └── ...
-   ├── {video_id}
-   ├── {video_id}
-   └── ...
-   ```
-
-   In the paper's experiments on SA-V val and test, we run inference on the 24 fps videos and evaluate on the subset of frames where we have ground-truth annotations (with the first and last annotated frames dropped). The evaluator will ignore masks in frames where we don't have ground-truth annotations.
-
- - Same as the [DAVIS](https://github.com/davisvideochallenge/davis2017-evaluation) directory structure
-
-   ```
-   {GT_ROOT}  # gt root folder
-   ├── {video_id}
-   │   ├── 00000.png  # annotations in frame 00000 (may contain multiple objects)
-   │   └── ...
-   ├── {video_id}
-   ├── {video_id}
-   └── ...
-   ```
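
`sav_evaluator.py` is a thin wrapper around the `benchmark` function in `utils/sav_benchmark.py` (included further below), so the same evaluation can also be scripted from Python. A minimal sketch with placeholder paths, run from the `sav_dataset` directory like the CLI:

```
from utils.sav_benchmark import benchmark

# Same defaults as sav_evaluator.py: 16 workers, strict video matching,
# and skipping the first/last annotated frames (SA-V val/test convention).
all_jf, all_j, all_f, per_object = benchmark(
    ["sav_val/Annotations_6fps"],  # gt_roots (placeholder path)
    ["predictions/sav_val"],       # mask_roots (placeholder path)
    strict=True,
    num_processes=16,
    verbose=True,
    skip_first_and_last=True,
)
print(f"J&F: {all_jf[0]:.1f}  J: {all_j[0]:.1f}  F: {all_f[0]:.1f}")
```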
-
- ## License
-
- The evaluation code is licensed under the [BSD 3 license](./LICENSE). Please refer to the paper for more details on the models. The videos and annotations in the SA-V Dataset are released under CC BY 4.0.
-
- Third-party code: the evaluation software is heavily adapted from [`VOS-Benchmark`](https://github.com/hkchengrex/vos-benchmark) and [`DAVIS`](https://github.com/davisvideochallenge/davis2017-evaluation) (with their licenses in [`LICENSE_DAVIS`](./LICENSE_DAVIS) and [`LICENSE_VOS_BENCHMARK`](./LICENSE_VOS_BENCHMARK)).

sav_dataset/example/sav_000001.mp4 DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:1f2083abd4f7af38e85de2a91b7859f0cf8695e21128a08f9af5abbdf43e5bd0
- size 5725431

sav_dataset/example/sav_000001_auto.json DELETED
The diff for this file is too large to render. See raw diff
 
sav_dataset/example/sav_000001_manual.json DELETED
The diff for this file is too large to render. See raw diff
 
sav_dataset/requirements.txt DELETED
@@ -1,7 +0,0 @@
- pycocoevalcap
- scikit-image
- opencv-python
- tqdm
- pillow
- numpy
- matplotlib

sav_dataset/sav_evaluator.py DELETED
@@ -1,89 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # All rights reserved.
-
- # This source code is licensed under the license found in the
- # LICENSE file in the sav_dataset directory of this source tree.
-
- # adapted from https://github.com/hkchengrex/vos-benchmark
- # and https://github.com/davisvideochallenge/davis2017-evaluation
- # with their licenses found in the LICENSE_VOS_BENCHMARK and LICENSE_DAVIS files
- # in the sav_dataset directory.
- from argparse import ArgumentParser
-
- from utils.sav_benchmark import benchmark
-
- """
- The structure of the {GT_ROOT} can be either of the following two structures.
- {GT_ROOT} and {PRED_ROOT} should be of the same format.
-
- 1. SA-V val/test structure
-     {GT_ROOT}  # gt root folder
-     ├── {video_id}
-     │   ├── 000  # all masks associated with obj 000
-     │   │   ├── {frame_id}.png  # mask for object 000 in {frame_id} (binary mask)
-     │   │   └── ...
-     │   ├── 001  # all masks associated with obj 001
-     │   ├── 002  # all masks associated with obj 002
-     │   └── ...
-     ├── {video_id}
-     ├── {video_id}
-     └── ...
-
- 2. Similar to DAVIS structure:
-
-     {GT_ROOT}  # gt root folder
-     ├── {video_id}
-     │   ├── {frame_id}.png  # annotation in {frame_id} (may contain multiple objects)
-     │   └── ...
-     ├── {video_id}
-     ├── {video_id}
-     └── ...
- """
-
-
- parser = ArgumentParser()
- parser.add_argument(
-     "--gt_root",
-     required=True,
-     help="Path to the GT folder. For SA-V, it's sav_val/Annotations_6fps or sav_test/Annotations_6fps",
- )
- parser.add_argument(
-     "--pred_root",
-     required=True,
-     help="Path to a folder containing folders of masks to be evaluated, with exactly the same structure as gt_root",
- )
- parser.add_argument(
-     "-n", "--num_processes", default=16, type=int, help="Number of concurrent processes"
- )
- parser.add_argument(
-     "-s",
-     "--strict",
-     help="Make sure every video in the gt_root folder has a corresponding video in the prediction",
-     action="store_true",
- )
- parser.add_argument(
-     "-q",
-     "--quiet",
-     help="Quietly run evaluation without printing the information out",
-     action="store_true",
- )
-
- # https://github.com/davisvideochallenge/davis2017-evaluation/blob/d34fdef71ce3cb24c1a167d860b707e575b3034c/davis2017/evaluation.py#L85
- parser.add_argument(
-     "--do_not_skip_first_and_last_frame",
-     help="In SA-V val and test, we skip the first and the last annotated frames in evaluation. "
-     "Set this to true for evaluation on settings that don't skip first and last frames",
-     action="store_true",
- )
-
-
- if __name__ == "__main__":
-     args = parser.parse_args()
-     benchmark(
-         [args.gt_root],
-         [args.pred_root],
-         args.strict,
-         args.num_processes,
-         verbose=not args.quiet,
-         skip_first_and_last=not args.do_not_skip_first_and_last_frame,
-     )

sav_dataset/sav_visualization_example.ipynb DELETED
The diff for this file is too large to render. See raw diff
 
sav_dataset/utils/sav_benchmark.py DELETED
@@ -1,488 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # All rights reserved.
-
- # This source code is licensed under the license found in the
- # LICENSE file in the sav_dataset directory of this source tree.
-
- # adapted from https://github.com/hkchengrex/vos-benchmark
- # and https://github.com/davisvideochallenge/davis2017-evaluation
- # with their licenses found in the LICENSE_VOS_BENCHMARK and LICENSE_DAVIS files
- # in the sav_dataset directory.
- import math
- import os
- import time
- from collections import defaultdict
- from multiprocessing import Pool
- from os import path
- from typing import Dict, List, Tuple
-
- import cv2
- import numpy as np
- import tqdm
- from PIL import Image
- from skimage.morphology import disk
-
-
- class VideoEvaluator:
-     def __init__(self, gt_root, pred_root, skip_first_and_last=True) -> None:
-         """
-         gt_root: path to the folder storing the gt masks
-         pred_root: path to the folder storing the predicted masks
-         skip_first_and_last: whether we should skip the evaluation of the first and the last frame.
-                              True for SA-V val and test, same as in DAVIS semi-supervised evaluation.
-         """
-         self.gt_root = gt_root
-         self.pred_root = pred_root
-         self.skip_first_and_last = skip_first_and_last
-
-     def __call__(self, vid_name: str) -> Tuple[str, Dict[str, float], Dict[str, float]]:
-         """
-         vid_name: name of the video to evaluate
-         """
-
-         # scan the folder to find subfolders for evaluation and
-         # check if the folder structure is SA-V
-         to_evaluate, is_sav_format = self.scan_vid_folder(vid_name)
-
-         # evaluate each (gt_path, pred_path) pair
-         eval_results = []
-         for all_frames, obj_id, gt_path, pred_path in to_evaluate:
-             if self.skip_first_and_last:
-                 # skip the first and the last frames
-                 all_frames = all_frames[1:-1]
-
-             evaluator = Evaluator(name=vid_name, obj_id=obj_id)
-             for frame in all_frames:
-                 gt_array, pred_array = self.get_gt_and_pred(
-                     gt_path, pred_path, frame, is_sav_format
-                 )
-                 evaluator.feed_frame(mask=pred_array, gt=gt_array)
-
-             iou, boundary_f = evaluator.conclude()
-             eval_results.append((obj_id, iou, boundary_f))
-
-         if is_sav_format:
-             iou_output, boundary_f_output = self.consolidate(eval_results)
-         else:
-             assert len(eval_results) == 1
-             iou_output = eval_results[0][1]
-             boundary_f_output = eval_results[0][2]
-
-         return vid_name, iou_output, boundary_f_output
-
-     def get_gt_and_pred(
-         self,
-         gt_path: str,
-         pred_path: str,
-         f_name: str,
-         is_sav_format: bool,
-     ) -> Tuple[np.ndarray, np.ndarray]:
-         """
-         Get the ground-truth and predicted masks for a single frame.
-         """
-         gt_mask_path = path.join(gt_path, f_name)
-         pred_mask_path = path.join(pred_path, f_name)
-         assert os.path.exists(pred_mask_path), f"{pred_mask_path} not found"
-
-         gt_array = np.array(Image.open(gt_mask_path))
-         pred_array = np.array(Image.open(pred_mask_path))
-         assert (
-             gt_array.shape[-2:] == pred_array.shape[-2:]
-         ), f"shape mismatch: {gt_mask_path}, {pred_mask_path}"
-
-         if is_sav_format:
-             assert len(np.unique(gt_array)) <= 2, (
-                 f"found more than 1 object in {gt_mask_path}. "
-                 "SA-V format assumes one object mask per png file."
-             )
-             assert len(np.unique(pred_array)) <= 2, (
-                 f"found more than 1 object in {pred_mask_path}. "
-                 "SA-V format assumes one object mask per png file."
-             )
-             gt_array = gt_array > 0
-             pred_array = pred_array > 0
-
-         return gt_array, pred_array
-
-     def scan_vid_folder(self, vid_name) -> Tuple[List, bool]:
-         """
-         Scan the folder structure of the video and return a list of folders for evaluation.
-         """
-
-         vid_gt_path = path.join(self.gt_root, vid_name)
-         vid_pred_path = path.join(self.pred_root, vid_name)
-         all_files_and_dirs = sorted(os.listdir(vid_gt_path))
-         to_evaluate = []
-         if all(name.endswith(".png") for name in all_files_and_dirs):
-             # All files are png files, dataset structure similar to DAVIS
-             is_sav_format = False
-             frames = all_files_and_dirs
-             obj_dir = None
-             to_evaluate.append((frames, obj_dir, vid_gt_path, vid_pred_path))
-         else:
-             # SA-V dataset structure, going one layer down into each subdirectory
-             is_sav_format = True
-             for obj_dir in all_files_and_dirs:
-                 obj_gt_path = path.join(vid_gt_path, obj_dir)
-                 obj_pred_path = path.join(vid_pred_path, obj_dir)
-                 frames = sorted(os.listdir(obj_gt_path))
-                 to_evaluate.append((frames, obj_dir, obj_gt_path, obj_pred_path))
-         return to_evaluate, is_sav_format
-
-     def consolidate(
-         self, eval_results
-     ) -> Tuple[str, Dict[str, float], Dict[str, float]]:
-         """
-         Consolidate the results of all the objects from the video into one dictionary.
-         """
-         iou_output = {}
-         boundary_f_output = {}
-         for obj_id, iou, boundary_f in eval_results:
-             assert len(iou) == 1
-             key = list(iou.keys())[0]
-             iou_output[obj_id] = iou[key]
-             boundary_f_output[obj_id] = boundary_f[key]
-         return iou_output, boundary_f_output
-
-
- #################################################################################################################
- # Functions below are from https://github.com/hkchengrex/vos-benchmark with minor modifications
- # _seg2bmap from https://github.com/hkchengrex/vos-benchmark/blob/main/vos_benchmark/utils.py
- # get_iou and Evaluator from https://github.com/hkchengrex/vos-benchmark/blob/main/vos_benchmark/evaluator.py
- # benchmark from https://github.com/hkchengrex/vos-benchmark/blob/main/vos_benchmark/benchmark.py with slight mod
- #################################################################################################################
-
-
- def _seg2bmap(seg, width=None, height=None):
-     """
-     From a segmentation, compute a binary boundary map with 1 pixel wide
-     boundaries. The boundary pixels are offset by 1/2 pixel towards the
-     origin from the actual segment boundary.
-     Arguments:
-         seg: Segments labeled from 1..k.
-         width: Width of desired bmap <= seg.shape[1]
-         height: Height of desired bmap <= seg.shape[0]
-     Returns:
-         bmap (ndarray): Binary boundary map.
-     David Martin <[email protected]>
-     January 2003
-     """
-
-     seg = seg.astype(bool)
-     seg[seg > 0] = 1
-
-     assert np.atleast_3d(seg).shape[2] == 1
-
-     width = seg.shape[1] if width is None else width
-     height = seg.shape[0] if height is None else height
-
-     h, w = seg.shape[:2]
-
-     ar1 = float(width) / float(height)
-     ar2 = float(w) / float(h)
-
-     assert not (
-         width > w or height > h or abs(ar1 - ar2) > 0.01
-     ), "Can't convert %dx%d seg to %dx%d bmap." % (w, h, width, height)
-
-     e = np.zeros_like(seg)
-     s = np.zeros_like(seg)
-     se = np.zeros_like(seg)
-
-     e[:, :-1] = seg[:, 1:]
-     s[:-1, :] = seg[1:, :]
-     se[:-1, :-1] = seg[1:, 1:]
-
-     b = seg ^ e | seg ^ s | seg ^ se
-     b[-1, :] = seg[-1, :] ^ e[-1, :]
-     b[:, -1] = seg[:, -1] ^ s[:, -1]
-     b[-1, -1] = 0
-
-     if w == width and h == height:
-         bmap = b
-     else:
-         bmap = np.zeros((height, width))
-         for x in range(w):
-             for y in range(h):
-                 if b[y, x]:
-                     j = 1 + math.floor((y - 1) + height / h)
-                     i = 1 + math.floor((x - 1) + width / w)
-                     bmap[j, i] = 1
-
-     return bmap
-
-
- def get_iou(intersection, pixel_sum):
-     # handle edge cases without resorting to epsilon
-     if intersection == pixel_sum:
-         # both mask and gt have zero pixels in them
-         assert intersection == 0
-         return 1
-
-     return intersection / (pixel_sum - intersection)
-
-
- class Evaluator:
-     def __init__(self, boundary=0.008, name=None, obj_id=None):
-         # boundary: used in computing boundary F-score
-         self.boundary = boundary
-         self.name = name
-         self.obj_id = obj_id
-         self.objects_in_gt = set()
-         self.objects_in_masks = set()
-
-         self.object_iou = defaultdict(list)
-         self.boundary_f = defaultdict(list)
-
-     def feed_frame(self, mask: np.ndarray, gt: np.ndarray):
-         """
-         Compute and accumulate metrics for a single frame (mask/gt pair)
-         """
-
-         # get all objects in the ground-truth
-         gt_objects = np.unique(gt)
-         gt_objects = gt_objects[gt_objects != 0].tolist()
-
-         # get all objects in the predicted mask
-         mask_objects = np.unique(mask)
-         mask_objects = mask_objects[mask_objects != 0].tolist()
-
-         self.objects_in_gt.update(set(gt_objects))
-         self.objects_in_masks.update(set(mask_objects))
-
-         all_objects = self.objects_in_gt.union(self.objects_in_masks)
-
-         # boundary disk for boundary F-score. It is the same for all objects.
-         bound_pix = np.ceil(self.boundary * np.linalg.norm(mask.shape))
-         boundary_disk = disk(bound_pix)
-
-         for obj_idx in all_objects:
-             obj_mask = mask == obj_idx
-             obj_gt = gt == obj_idx
-
-             # object iou
-             self.object_iou[obj_idx].append(
-                 get_iou((obj_mask * obj_gt).sum(), obj_mask.sum() + obj_gt.sum())
-             )
-
-             # boundary f-score: this part is copied from davis2017-evaluation
-             mask_boundary = _seg2bmap(obj_mask)
-             gt_boundary = _seg2bmap(obj_gt)
-             mask_dilated = cv2.dilate(mask_boundary.astype(np.uint8), boundary_disk)
-             gt_dilated = cv2.dilate(gt_boundary.astype(np.uint8), boundary_disk)
-
-             # Get the intersection
-             gt_match = gt_boundary * mask_dilated
-             fg_match = mask_boundary * gt_dilated
-
-             # Area of the intersection
-             n_fg = np.sum(mask_boundary)
-             n_gt = np.sum(gt_boundary)
-
-             # Compute precision and recall
-             if n_fg == 0 and n_gt > 0:
-                 precision = 1
-                 recall = 0
-             elif n_fg > 0 and n_gt == 0:
-                 precision = 0
-                 recall = 1
-             elif n_fg == 0 and n_gt == 0:
-                 precision = 1
-                 recall = 1
-             else:
-                 precision = np.sum(fg_match) / float(n_fg)
-                 recall = np.sum(gt_match) / float(n_gt)
-
-             # Compute F measure
-             if precision + recall == 0:
-                 F = 0
-             else:
-                 F = 2 * precision * recall / (precision + recall)
-             self.boundary_f[obj_idx].append(F)
-
-     def conclude(self):
-         all_iou = {}
-         all_boundary_f = {}
-
-         for object_id in self.objects_in_gt:
-             all_iou[object_id] = np.mean(self.object_iou[object_id]) * 100
-             all_boundary_f[object_id] = np.mean(self.boundary_f[object_id]) * 100
-
-         return all_iou, all_boundary_f
-
-
- def benchmark(
-     gt_roots,
-     mask_roots,
-     strict=True,
-     num_processes=None,
-     *,
-     verbose=True,
-     skip_first_and_last=True,
- ):
-     """
-     gt_roots: a list of paths to datasets, i.e., [path_to_DatasetA, path_to_DatasetB, ...]
-     mask_roots: same as above, but the .png are masks predicted by the model
-     strict: when True, all videos in the dataset must have corresponding predictions.
-             Setting it to False is useful in cases where the ground-truth contains both train/val
-             sets, but the model only predicts the val subset.
-             Either way, if a video is predicted (i.e., the corresponding folder exists),
-             then it must at least contain all the masks in the ground truth annotations.
-             Masks that are in the prediction but not in the ground-truth
-             (i.e., sparse annotations) are ignored.
-     skip_first_and_last: whether we should skip the first and the last frame in evaluation.
-             This is used by DAVIS 2017 in their semi-supervised evaluation.
-             It should be disabled for unsupervised evaluation.
-     """
-
-     assert len(gt_roots) == len(mask_roots)
-     single_dataset = len(gt_roots) == 1
-
-     if verbose:
-         if skip_first_and_last:
-             print(
-                 "We are *SKIPPING* the evaluation of the first and the last frame (standard for semi-supervised video object segmentation)."
-             )
-         else:
-             print(
-                 "We are *NOT SKIPPING* the evaluation of the first and the last frame (*NOT STANDARD* for semi-supervised video object segmentation)."
-             )
-
-     pool = Pool(num_processes)
-     start = time.time()
-     to_wait = []
-     for gt_root, mask_root in zip(gt_roots, mask_roots):
-         # Validate folders
-         validated = True
-         gt_videos = os.listdir(gt_root)
-         mask_videos = os.listdir(mask_root)
-
-         # if the user passed the root directory instead of Annotations
-         if len(gt_videos) != len(mask_videos):
-             if "Annotations" in gt_videos:
-                 if ".png" not in os.listdir(path.join(gt_root, "Annotations"))[0]:
-                     gt_root = path.join(gt_root, "Annotations")
-                     gt_videos = os.listdir(gt_root)
-
-         # remove non-folder items
-         gt_videos = list(filter(lambda x: path.isdir(path.join(gt_root, x)), gt_videos))
-         mask_videos = list(
-             filter(lambda x: path.isdir(path.join(mask_root, x)), mask_videos)
-         )
-
-         if not strict:
-             videos = sorted(list(set(gt_videos) & set(mask_videos)))
-         else:
-             gt_extras = set(gt_videos) - set(mask_videos)
-             mask_extras = set(mask_videos) - set(gt_videos)
-
-             if len(gt_extras) > 0:
-                 print(
-                     f"Videos that are in {gt_root} but not in {mask_root}: {gt_extras}"
-                 )
-                 validated = False
-             if len(mask_extras) > 0:
-                 print(
-                     f"Videos that are in {mask_root} but not in {gt_root}: {mask_extras}"
-                 )
-                 validated = False
-             if not validated:
-                 print("Validation failed. Exiting.")
-                 exit(1)
-
-             videos = sorted(gt_videos)
-
-         if verbose:
-             print(
-                 f"In dataset {gt_root}, we are evaluating on {len(videos)} videos: {videos}"
-             )
-
-         if single_dataset:
-             if verbose:
-                 results = tqdm.tqdm(
-                     pool.imap(
-                         VideoEvaluator(
-                             gt_root, mask_root, skip_first_and_last=skip_first_and_last
-                         ),
-                         videos,
-                     ),
-                     total=len(videos),
-                 )
-             else:
-                 results = pool.map(
-                     VideoEvaluator(
-                         gt_root, mask_root, skip_first_and_last=skip_first_and_last
-                     ),
-                     videos,
-                 )
-         else:
-             to_wait.append(
-                 pool.map_async(
-                     VideoEvaluator(
-                         gt_root, mask_root, skip_first_and_last=skip_first_and_last
-                     ),
-                     videos,
-                 )
-             )
-
-     pool.close()
-
-     all_global_jf, all_global_j, all_global_f = [], [], []
-     all_object_metrics = []
-     for i, mask_root in enumerate(mask_roots):
-         if not single_dataset:
-             results = to_wait[i].get()
-
-         all_iou = []
-         all_boundary_f = []
-         object_metrics = {}
-         for name, iou, boundary_f in results:
-             all_iou.extend(list(iou.values()))
-             all_boundary_f.extend(list(boundary_f.values()))
-             object_metrics[name] = (iou, boundary_f)
-
-         global_j = np.array(all_iou).mean()
-         global_f = np.array(all_boundary_f).mean()
-         global_jf = (global_j + global_f) / 2
-
-         time_taken = time.time() - start
-
-         # build string for reporting results
-         # find max length for padding
-         ml = max(*[len(n) for n in object_metrics.keys()], len("Global score"))
-         # build header
-         out_string = f'{"sequence":<{ml}},{"obj":>3}, {"J&F":>4}, {"J":>4}, {"F":>4}\n'
-         out_string += f'{"Global score":<{ml}},{"":>3}, {global_jf:.1f}, {global_j:.1f}, {global_f:.1f}\n'
-         # append one line for each object
-         for name, (iou, boundary_f) in object_metrics.items():
-             for object_idx in iou.keys():
-                 j, f = iou[object_idx], boundary_f[object_idx]
-                 jf = (j + f) / 2
-                 out_string += (
-                     f"{name:<{ml}},{object_idx:03}, {jf:>4.1f}, {j:>4.1f}, {f:>4.1f}\n"
-                 )
-
-         # print to console
-         if verbose:
-             print(out_string.replace(",", " "), end="")
-             print("\nSummary:")
-             print(
-                 f"Global score: J&F: {global_jf:.1f} J: {global_j:.1f} F: {global_f:.1f}"
-             )
-             print(f"Time taken: {time_taken:.2f}s")
-
-         # print to file
-         result_path = path.join(mask_root, "results.csv")
-         print(f"Saving the results to {result_path}")
-         with open(result_path, "w") as f:
-             f.write(out_string)
-
-         all_global_jf.append(global_jf)
-         all_global_j.append(global_j)
-         all_global_f.append(global_f)
-         all_object_metrics.append(object_metrics)
-
-     return all_global_jf, all_global_j, all_global_f, all_object_metrics

sav_dataset/utils/sav_utils.py DELETED
@@ -1,175 +0,0 @@
- # Copyright (c) Meta Platforms, Inc. and affiliates.
- # All rights reserved.
-
- # This source code is licensed under the license found in the
- # LICENSE file in the sav_dataset directory of this source tree.
- import json
- import os
- from typing import Dict, List, Optional, Tuple
-
- import cv2
- import matplotlib.pyplot as plt
- import numpy as np
- import pycocotools.mask as mask_util
-
-
- def decode_video(video_path: str) -> List[np.ndarray]:
-     """
-     Decode the video and return the RGB frames
-     """
-     video = cv2.VideoCapture(video_path)
-     video_frames = []
-     while video.isOpened():
-         ret, frame = video.read()
-         if ret:
-             frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
-             video_frames.append(frame)
-         else:
-             break
-     return video_frames
-
-
- def show_anns(masks, colors: List, borders=True) -> None:
-     """
-     show the annotations
-     """
-     # return if no masks
-     if len(masks) == 0:
-         return
-
-     # sort masks by size
-     sorted_annot_and_color = sorted(
-         zip(masks, colors), key=(lambda x: x[0].sum()), reverse=True
-     )
-     H, W = sorted_annot_and_color[0][0].shape[0], sorted_annot_and_color[0][0].shape[1]
-
-     canvas = np.ones((H, W, 4))
-     canvas[:, :, 3] = 0  # set the alpha channel
-     contour_thickness = max(1, int(min(5, 0.01 * min(H, W))))
-     for mask, color in sorted_annot_and_color:
-         canvas[mask] = np.concatenate([color, [0.55]])
-         if borders:
-             contours, _ = cv2.findContours(
-                 np.array(mask, dtype=np.uint8), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE
-             )
-             cv2.drawContours(
-                 canvas, contours, -1, (0.05, 0.05, 0.05, 1), thickness=contour_thickness
-             )
-
-     ax = plt.gca()
-     ax.imshow(canvas)
-
-
- class SAVDataset:
-     """
-     SAVDataset is a class to load the SAV dataset and visualize the annotations.
-     """
-
-     def __init__(self, sav_dir, annot_sample_rate=4):
-         """
-         Args:
-             sav_dir: the directory of the SAV dataset
-             annot_sample_rate: the sampling rate of the annotations.
-                 The annotations are aligned with the videos at 6 fps.
-         """
-         self.sav_dir = sav_dir
-         self.annot_sample_rate = annot_sample_rate
-         self.manual_mask_colors = np.random.random((256, 3))
-         self.auto_mask_colors = np.random.random((256, 3))
-
-     def read_frames(self, mp4_path: str) -> Optional[List[np.ndarray]]:
-         """
-         Read the frames and downsample them to align with the annotations.
-         """
-         if not os.path.exists(mp4_path):
-             print(f"{mp4_path} doesn't exist.")
-             return None
-         else:
-             # decode the video
-             frames = decode_video(mp4_path)
-             print(f"There are {len(frames)} frames decoded from {mp4_path} (24fps).")
-
-             # downsample the frames to align with the annotations
-             frames = frames[:: self.annot_sample_rate]
-             print(
-                 f"Videos are annotated every {self.annot_sample_rate} frames. "
-                 "To align with the annotations, "
-                 f"downsample the video to {len(frames)} frames."
-             )
-             return frames
-
-     def get_frames_and_annotations(
-         self, video_id: str
-     ) -> Tuple[List | None, Dict | None, Dict | None]:
-         """
-         Get the frames and annotations for a video.
-         """
-         # load the video
-         mp4_path = os.path.join(self.sav_dir, video_id + ".mp4")
-         frames = self.read_frames(mp4_path)
-         if frames is None:
-             return None, None, None
-
-         # load the manual annotations
-         manual_annot_path = os.path.join(self.sav_dir, video_id + "_manual.json")
-         if not os.path.exists(manual_annot_path):
-             print(f"{manual_annot_path} doesn't exist. Something might be wrong.")
-             manual_annot = None
-         else:
-             manual_annot = json.load(open(manual_annot_path))
-
-         # load the automatic annotations
-         auto_annot_path = os.path.join(self.sav_dir, video_id + "_auto.json")
-         if not os.path.exists(auto_annot_path):
-             print(f"{auto_annot_path} doesn't exist.")
-             auto_annot = None
-         else:
-             auto_annot = json.load(open(auto_annot_path))
-
-         return frames, manual_annot, auto_annot
-
-     def visualize_annotation(
-         self,
-         frames: List[np.ndarray],
-         auto_annot: Optional[Dict],
-         manual_annot: Optional[Dict],
-         annotated_frame_id: int,
-         show_auto=True,
-         show_manual=True,
-     ) -> None:
-         """
-         Visualize the annotations on the annotated_frame_id.
-         If show_manual is True, show the manual annotations.
-         If show_auto is True, show the auto annotations.
-         By default, show both auto and manual annotations.
-         """
-
-         if annotated_frame_id >= len(frames):
-             print("invalid annotated_frame_id")
-             return
-
-         rles = []
-         colors = []
-         if show_manual and manual_annot is not None:
-             rles.extend(manual_annot["masklet"][annotated_frame_id])
-             colors.extend(
-                 self.manual_mask_colors[
-                     : len(manual_annot["masklet"][annotated_frame_id])
-                 ]
-             )
-         if show_auto and auto_annot is not None:
-             rles.extend(auto_annot["masklet"][annotated_frame_id])
-             colors.extend(
-                 self.auto_mask_colors[: len(auto_annot["masklet"][annotated_frame_id])]
-             )
-
-         plt.imshow(frames[annotated_frame_id])
-
-         if len(rles) > 0:
-             masks = [mask_util.decode(rle) > 0 for rle in rles]
-             show_anns(masks, colors)
-         else:
-             print("No annotation will be shown")
-
-         plt.axis("off")
-         plt.show()
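
For reference, a typical use of `SAVDataset` might look like the following sketch (the SA-V directory path is a placeholder; the notebook referenced in the README demonstrates the full workflow):

```
from utils.sav_utils import SAVDataset

sav_dataset = SAVDataset(sav_dir="path/to/sav_train")  # placeholder directory
frames, manual_annot, auto_annot = sav_dataset.get_frames_and_annotations("sav_000001")
sav_dataset.visualize_annotation(frames, auto_annot, manual_annot, annotated_frame_id=0)
```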