# Bootstrapping Pipeline

The bootstrapping pipeline for DensePose was proposed in
[Sanakoyeu et al., 2020](https://arxiv.org/pdf/2003.00080.pdf)
to extend DensePose from humans to proximal animal classes
(chimpanzees). Currently, the pipeline is only implemented for
[chart-based models](DENSEPOSE_IUV.md).
Bootstrapping proceeds in two steps.

## Master Model Training

The master model is trained on data from the source domain (humans)
and the supporting domain (animals). Instances from the source domain
contain full DensePose annotations (`S`, `I`, `U` and `V`), while
instances from the supporting domain have segmentation annotations only.
To ensure segmentation quality in the target domain, only a subset of
supporting domain classes is included in the training. This is achieved
through category filters, e.g.
(see [configs/evolution/Base-RCNN-FPN-Atop10P_CA.yaml](../configs/evolution/Base-RCNN-FPN-Atop10P_CA.yaml)):
```
WHITELISTED_CATEGORIES:
  "base_coco_2017_train":
    - 1  # person
    - 16 # bird
    - 17 # cat
    - 18 # dog
    - 19 # horse
    - 20 # sheep
    - 21 # cow
    - 22 # elephant
    - 23 # bear
    - 24 # zebra
    - 25 # giraffe
```
The acronym `Atop10P` in config file names indicates that categories are filtered to
only contain the top 10 animal categories and person.
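
For illustration, the effect of such a whitelist on COCO-style instance annotations can be sketched with a hypothetical helper in Python (illustrative only, not part of the DensePose code base):

```python
# Hypothetical sketch of category whitelisting (not the actual DensePose code):
# keep only instances whose category id belongs to the whitelist above.
WHITELIST = {1, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25}  # person + top 10 animals

def filter_instances(annotations, whitelist=WHITELIST):
    """Drop COCO-style instance annotations with non-whitelisted categories."""
    return [ann for ann in annotations if ann["category_id"] in whitelist]

annotations = [
    {"category_id": 1,  "bbox": [10, 10, 50, 100]},  # person  -> kept
    {"category_id": 3,  "bbox": [5, 5, 30, 30]},     # car     -> dropped
    {"category_id": 25, "bbox": [0, 0, 80, 120]},    # giraffe -> kept
]
print(len(filter_instances(annotations)))  # prints 2
```
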
The training is performed in a *class-agnostic* manner: all instances
are mapped into the same class (person), e.g.
(see [configs/evolution/Base-RCNN-FPN-Atop10P_CA.yaml](../configs/evolution/Base-RCNN-FPN-Atop10P_CA.yaml)):
```
CATEGORY_MAPS:
  "base_coco_2017_train":
    "16": 1 # bird -> person
    "17": 1 # cat -> person
    "18": 1 # dog -> person
    "19": 1 # horse -> person
    "20": 1 # sheep -> person
    "21": 1 # cow -> person
    "22": 1 # elephant -> person
    "23": 1 # bear -> person
    "24": 1 # zebra -> person
    "25": 1 # giraffe -> person
```
The acronym `CA` in config file names indicates that the training is class-agnostic.
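
The effect of such a category map can likewise be sketched with a hypothetical helper (illustrative only, not the library's API):

```python
# Hypothetical sketch of class-agnostic category remapping
# (not the actual DensePose code):
CATEGORY_MAP = {str(cat_id): 1 for cat_id in range(16, 26)}  # bird..giraffe -> person

def remap_instances(annotations, category_map=CATEGORY_MAP):
    """Return copies of the annotations with mapped categories replaced by person (1)."""
    remapped = []
    for ann in annotations:
        ann = dict(ann)  # avoid mutating the input
        ann["category_id"] = category_map.get(str(ann["category_id"]), ann["category_id"])
        remapped.append(ann)
    return remapped

print(remap_instances([{"category_id": 17}, {"category_id": 1}]))
# [{'category_id': 1}, {'category_id': 1}] -- every instance is now "person"
```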

## Student Model Training

The student model is trained on data from the source domain (humans),
the supporting domain (animals) and the target domain (chimpanzees).
Annotations in the source and supporting domains are similar to the ones
used for master model training.
Annotations in the target domain are obtained by applying the master model
to images that contain instances from the target category and by sampling
sparse annotations from the dense results. This process is called *bootstrapping*.
Below we give details on how the bootstrapping pipeline is implemented.
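
As a rough orientation before those details, a single bootstrapping step can be sketched as follows; all names here are hypothetical stand-ins, not the actual API:

```python
# Conceptual sketch of one bootstrapping step (hypothetical stand-ins,
# not the actual InferenceBasedLoader implementation described below):
def bootstrap(images, master_model, min_score=0.8, count_per_instance=8):
    """Apply the master model to unlabeled target-domain images, keep confident
    detections and turn their dense outputs into sparse pseudo-annotations."""
    pseudo_annotations = []
    for image in images:
        for det in master_model(image):       # dense DensePose predictions
            if det["score"] < min_score:      # detection score filter
                continue
            # keep only a few points from the dense result (here: the first k)
            points = det["dense_points"][:count_per_instance]
            pseudo_annotations.append({"image_id": image["id"], "points": points})
    return pseudo_annotations

# Toy stand-ins so the sketch runs end to end
fake_model = lambda image: [{"score": 0.9, "dense_points": list(range(20))}]
print(bootstrap([{"id": 0}], fake_model))
```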

### Data Loaders

The central components that enable bootstrapping are
[`InferenceBasedLoader`](../densepose/data/inference_based_loader.py) and
[`CombinedDataLoader`](../densepose/data/combined_loader.py).
`InferenceBasedLoader` takes images from a data loader, applies a model
to the images, filters the model outputs based on the selected criteria and
samples the filtered outputs to produce annotations.
`CombinedDataLoader` combines data obtained from the loaders based on specified
ratios. The standard data loader has the default ratio of 1.0;
ratios for bootstrap datasets are specified in the configuration file.
The higher the ratio, the higher the probability that samples from that
particular data loader are included in a batch.
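
One simple way to realize such ratio-based mixing is sketched below; this illustrates the idea only (hypothetical helpers), not the actual `CombinedDataLoader` code:

```python
# Illustrative sketch of ratio-based batch combination (hypothetical helpers,
# not the actual CombinedDataLoader implementation):
import random

def combined_batches(named_loaders, batch_size=8, num_batches=3, seed=0):
    """Fill each batch by drawing every element from one of the loaders,
    with probability proportional to its ratio."""
    rng = random.Random(seed)
    names = [name for name, _, _ in named_loaders]
    ratios = [ratio for _, ratio, _ in named_loaders]
    iterators = {name: iter(loader) for name, _, loader in named_loaders}
    for _ in range(num_batches):
        picks = rng.choices(names, weights=ratios, k=batch_size)
        yield [next(iterators[name]) for name in picks]

def fake_loader(tag):
    """Hypothetical infinite stream of samples standing in for a data loader."""
    i = 0
    while True:
        yield f"{tag}:{i}"
        i += 1

named_loaders = [
    ("densepose_coco_2014_train", 1.0, fake_loader("dp")),
    ("chimpnsee",                 1.0, fake_loader("chimp")),
]
for batch in combined_batches(named_loaders, batch_size=4):
    print(batch)
```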
Here is an example of the bootstrapping configuration taken from
[`configs/evolution/densepose_R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_uniform.yaml`](../configs/evolution/densepose_R_50_FPN_DL_WC1M_3x_Atop10P_CA_B_uniform.yaml):
```
BOOTSTRAP_DATASETS:
  - DATASET: "chimpnsee"
    RATIO: 1.0
    IMAGE_LOADER:
      TYPE: "video_keyframe"
      SELECT:
        STRATEGY: "random_k"
        NUM_IMAGES: 4
      TRANSFORM:
        TYPE: "resize"
        MIN_SIZE: 800
        MAX_SIZE: 1333
      BATCH_SIZE: 8
      NUM_WORKERS: 1
    INFERENCE:
      INPUT_BATCH_SIZE: 1
      OUTPUT_BATCH_SIZE: 1
    DATA_SAMPLER:
      # supported types:
      #   densepose_uniform
      #   densepose_UV_confidence
      #   densepose_fine_segm_confidence
      #   densepose_coarse_segm_confidence
      TYPE: "densepose_uniform"
      COUNT_PER_CLASS: 8
    FILTER:
      TYPE: "detection_score"
      MIN_VALUE: 0.8
BOOTSTRAP_MODEL:
  WEIGHTS: https://dl.fbaipublicfiles.com/densepose/evolution/densepose_R_50_FPN_DL_WC1M_3x_Atop10P_CA/217578784/model_final_9fe1cc.pkl
```
The above example has one bootstrap dataset (`chimpnsee`). This dataset is registered as
a [VIDEO_LIST](../densepose/data/datasets/chimpnsee.py) dataset, which means that
it consists of a number of videos specified in a text file. There can be different
strategies for sampling individual images from a video. Here we use the `video_keyframe`
strategy, which considers only keyframes; this ensures a temporal offset between sampled
images and faster seek operations. We select at most 4 random keyframes from each video:
```
SELECT:
  STRATEGY: "random_k"
  NUM_IMAGES: 4
```
The frames are then resized
```
TRANSFORM:
  TYPE: "resize"
  MIN_SIZE: 800
  MAX_SIZE: 1333
```
and batched using the standard
[PyTorch DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader):
```
BATCH_SIZE: 8
NUM_WORKERS: 1
```
`InferenceBasedLoader` decomposes those batches into batches of size `INPUT_BATCH_SIZE`
and applies the master model specified by `BOOTSTRAP_MODEL`. Model outputs are filtered
by detection score:
```
FILTER:
  TYPE: "detection_score"
  MIN_VALUE: 0.8
```
and sampled using the specified sampling strategy:
```
DATA_SAMPLER:
  # supported types:
  #   densepose_uniform
  #   densepose_UV_confidence
  #   densepose_fine_segm_confidence
  #   densepose_coarse_segm_confidence
  TYPE: "densepose_uniform"
  COUNT_PER_CLASS: 8
```
The current implementation supports
[uniform sampling](../densepose/data/samplers/densepose_uniform.py) and
[confidence-based sampling](../densepose/data/samplers/densepose_confidence_based.py)
to obtain sparse annotations from dense results. For confidence-based
sampling one needs to use a master model that produces confidence estimates.
The `WC1M` master model used in the example above produces all three types of confidence
estimates.
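
As a simplified illustration of the uniform strategy, the sketch below samples a few foreground pixels from a dense result. It assumes the fine segmentation and the U/V values are given as plain single-channel arrays, which is a simplification of the actual model outputs (see the sampler linked above for the real implementation):

```python
# Simplified, hypothetical sketch of uniform point sampling from a dense result
# (not the actual densepose_uniform sampler code):
import numpy as np

def sample_uniform_points(fine_segm, u_map, v_map, count_per_instance=8, seed=0):
    """Pick up to `count_per_instance` foreground pixels uniformly at random
    and record their part label and (U, V) coordinates."""
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(fine_segm > 0)  # foreground pixels
    if len(ys) == 0:
        return []
    picked = rng.choice(len(ys), size=min(count_per_instance, len(ys)), replace=False)
    return [
        {"x": int(xs[i]), "y": int(ys[i]),
         "part": int(fine_segm[ys[i], xs[i]]),
         "u": float(u_map[ys[i], xs[i]]),
         "v": float(v_map[ys[i], xs[i]])}
        for i in picked
    ]

# Toy dense result: a 4x4 patch whose lower-right quadrant belongs to part 1
fine_segm = np.zeros((4, 4), dtype=int)
fine_segm[2:, 2:] = 1
u_map = np.full((4, 4), 0.25)
v_map = np.full((4, 4), 0.75)
print(sample_uniform_points(fine_segm, u_map, v_map, count_per_instance=3))
```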
Finally, sampled data is grouped into batches of size `OUTPUT_BATCH_SIZE`:
```
INFERENCE:
  INPUT_BATCH_SIZE: 1
  OUTPUT_BATCH_SIZE: 1
```
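
The interplay between the image loader batch size and the two inference batch sizes can be illustrated with a trivial chunking helper (hypothetical, for illustration only):

```python
# Illustrative re-batching sketch (hypothetical helper, not the actual code):
def chunks(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

image_batch = ["img0", "img1", "img2", "img3"]        # a batch from the image loader
for inference_batch in chunks(image_batch, size=1):   # INPUT_BATCH_SIZE = 1
    print("run master model on", inference_batch)
# sampled pseudo-annotations are then regrouped into chunks of OUTPUT_BATCH_SIZE
```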
The proportion of data from the annotated datasets and the bootstrapped dataset can be tracked
in the logs, e.g.:
```
[... densepose.engine.trainer]: batch/ 1.8, batch/base_coco_2017_train 6.4, batch/densepose_coco_2014_train 3.85
```
which means that, over the last 20 iterations, for every 1.8 bootstrapped data samples there were on average 6.4 samples from `base_coco_2017_train` and 3.85 samples from `densepose_coco_2014_train`.