| # Use Custom Datasets | |
| This document explains how the dataset APIs | |
| ([DatasetCatalog](../modules/data.html#detectron2.data.DatasetCatalog), [MetadataCatalog](../modules/data.html#detectron2.data.MetadataCatalog)) | |
| work, and how to use them to add custom datasets. | |
| Datasets that have builtin support in detectron2 are listed in [builtin datasets](builtin_datasets.md). | |
| If you want to use a custom dataset while also reusing detectron2's data loaders, | |
| you will need to: | |
| 1. __Register__ your dataset (i.e., tell detectron2 how to obtain your dataset). | |
| 2. Optionally, __register metadata__ for your dataset. | |
| Next, we explain the above two concepts in detail. | |
| The [Colab tutorial](https://colab.research.google.com/drive/16jcaJoc6bCFAQ96jDe2HwtXj7BMD_-m5) | |
has a live example of how to register and train on a dataset in a custom format.
| ### Register a Dataset | |
To let detectron2 know how to obtain a dataset named "my_dataset", you need to implement
a function that returns the items in the dataset, and then tell detectron2 about this
function:
```python
from typing import Dict, List

from detectron2.data import DatasetCatalog

def my_dataset_function() -> List[Dict]:
    # Load your data here, and return it as a list[dict]
    # in one of the formats described below.
    ...

DatasetCatalog.register("my_dataset", my_dataset_function)
# later, to access the data:
data: List[Dict] = DatasetCatalog.get("my_dataset")
```
| Here, the snippet associates a dataset named "my_dataset" with a function that returns the data. | |
The function must return the same data (in the same order) every time it is called.
| The registration stays effective until the process exits. | |
| The function can do arbitrary things and should return the data in `list[dict]`, each dict in either | |
| of the following formats: | |
| 1. Detectron2's standard dataset dict, described below. This will make it work with many other builtin | |
| features in detectron2, so it's recommended to use it when it's sufficient. | |
2. Any custom format. You can also return arbitrary dicts in your own format,
   such as dicts with extra keys for new tasks.
   You will then need to handle them properly downstream as well.
   See below for more details.
| #### Standard Dataset Dicts | |
| For standard tasks | |
| (instance detection, instance/semantic/panoptic segmentation, keypoint detection), | |
| we load the original dataset into `list[dict]` with a specification similar to COCO's annotations. | |
| This is our standard representation for a dataset. | |
| Each dict contains information about one image. | |
The dict may have the following fields,
and the required fields vary based on what the dataloader or the task needs (see more below);
a concrete sketch follows the field descriptions.
```eval_rst
.. list-table::
   :header-rows: 1

   * - Task
     - Fields
   * - Common
     - file_name, height, width, image_id
   * - Instance detection/segmentation
     - annotations
   * - Semantic segmentation
     - sem_seg_file_name
   * - Panoptic segmentation
     - pan_seg_file_name, segments_info
```
| + `file_name`: the full path to the image file. | |
| + `height`, `width`: integer. The shape of the image. | |
| + `image_id` (str or int): a unique id that identifies this image. Required by many | |
| evaluators to identify the images, but a dataset may use it for different purposes. | |
| + `annotations` (list[dict]): Required by __instance detection/segmentation or keypoint detection__ tasks. | |
| Each dict corresponds to annotations of one instance in this image, and | |
| may contain the following keys: | |
| + `bbox` (list[float], required): list of 4 numbers representing the bounding box of the instance. | |
| + `bbox_mode` (int, required): the format of bbox. It must be a member of | |
| [structures.BoxMode](../modules/structures.html#detectron2.structures.BoxMode). | |
| Currently supports: `BoxMode.XYXY_ABS`, `BoxMode.XYWH_ABS`. | |
| + `category_id` (int, required): an integer in the range [0, num_categories-1] representing the category label. | |
| The value num_categories is reserved to represent the "background" category, if applicable. | |
| + `segmentation` (list[list[float]] or dict): the segmentation mask of the instance. | |
| + If `list[list[float]]`, it represents a list of polygons, one for each connected component | |
| of the object. Each `list[float]` is one simple polygon in the format of `[x1, y1, ..., xn, yn]` (n≥3). | |
The Xs and Ys are absolute coordinates in units of pixels.
| + If `dict`, it represents the per-pixel segmentation mask in COCO's compressed RLE format. | |
| The dict should have keys "size" and "counts". You can convert a uint8 segmentation mask of 0s and | |
| 1s into such dict by `pycocotools.mask.encode(np.asarray(mask, order="F"))`. | |
| `cfg.INPUT.MASK_FORMAT` must be set to `bitmask` if using the default data loader with such format. | |
| + `keypoints` (list[float]): in the format of [x1, y1, v1,..., xn, yn, vn]. | |
| v[i] means the [visibility](http://cocodataset.org/#format-data) of this keypoint. | |
| `n` must be equal to the number of keypoint categories. | |
| The Xs and Ys are absolute real-value coordinates in range [0, W or H]. | |
| (Note that the keypoint coordinates in COCO format are integers in range [0, W-1 or H-1], which is different | |
| from our standard format. Detectron2 adds 0.5 to COCO keypoint coordinates to convert them from discrete | |
| pixel indices to floating point coordinates.) | |
| + `iscrowd`: 0 (default) or 1. Whether this instance is labeled as COCO's "crowd | |
| region". Don't include this field if you don't know what it means. | |
| If `annotations` is an empty list, it means the image is labeled to have no objects. | |
Such images will by default be removed from training,
but can be included by setting `DATALOADER.FILTER_EMPTY_ANNOTATIONS` to False.
| + `sem_seg_file_name` (str): | |
| The full path to the semantic segmentation ground truth file. | |
| It should be a grayscale image whose pixel values are integer labels. | |
| + `pan_seg_file_name` (str): | |
The full path to the panoptic segmentation ground truth file.
| It should be an RGB image whose pixel values are integer ids encoded using the | |
| [panopticapi.utils.id2rgb](https://github.com/cocodataset/panopticapi/) function. | |
| The ids are defined by `segments_info`. | |
| If an id does not appear in `segments_info`, the pixel is considered unlabeled | |
| and is usually ignored in training & evaluation. | |
| + `segments_info` (list[dict]): defines the meaning of each id in panoptic segmentation ground truth. | |
| Each dict has the following keys: | |
| + `id` (int): integer that appears in the ground truth image. | |
| + `category_id` (int): an integer in the range [0, num_categories-1] representing the category label. | |
| + `iscrowd`: 0 (default) or 1. Whether this instance is labeled as COCO's "crowd region". | |
| ```eval_rst | |
| .. note:: | |
   The PanopticFPN model does not use the panoptic segmentation
   format defined here, but a combination of both the instance segmentation and semantic
   segmentation data formats. See :doc:`builtin_datasets` for instructions on COCO.
| ``` | |
| Fast R-CNN (with pre-computed proposals) models are rarely used today. | |
To train a Fast R-CNN model, the following extra keys are needed (see the sketch after this list):
| + `proposal_boxes` (array): 2D numpy array with shape (K, 4) representing K precomputed proposal boxes for this image. | |
+ `proposal_objectness_logits` (array): numpy array with shape (K, ), containing the objectness
  logits of the proposals in `proposal_boxes`.
| + `proposal_bbox_mode` (int): the format of the precomputed proposal bbox. | |
| It must be a member of | |
| [structures.BoxMode](../modules/structures.html#detectron2.structures.BoxMode). | |
| Default is `BoxMode.XYXY_ABS`. | |
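For illustration, a dict with precomputed proposals might look like this sketch (the boxes
and logits are made up):
```python
import numpy as np
from detectron2.structures import BoxMode

dataset_dict = {
    # ... the standard fields described above ...
    "proposal_boxes": np.array([[10.0, 10.0, 200.0, 240.0],
                                [50.0, 30.0, 300.0, 400.0]]),  # shape (K, 4)
    "proposal_objectness_logits": np.array([0.9, 0.4]),        # shape (K,)
    "proposal_bbox_mode": BoxMode.XYXY_ABS,
}
```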
| #### Custom Dataset Dicts for New Tasks | |
| In the `list[dict]` that your dataset function returns, the dictionary can also have __arbitrary custom data__. | |
| This will be useful for a new task that needs extra information not covered | |
| by the standard dataset dicts. In this case, you need to make sure the downstream code can handle your data | |
| correctly. Usually this requires writing a new `mapper` for the dataloader (see [Use Custom Dataloaders](./data_loading.md)). | |
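For example, a dataset function for a hypothetical depth estimation task could return dicts
like the sketch below; the key "depth_file_name" is a made-up custom key, and a custom mapper
must know how to load and use it:
```python
def my_depth_dataset_function():
    # "depth_file_name" is a hypothetical custom key for a new task.
    return [
        {
            "file_name": "/path/to/images/0001.jpg",
            "height": 480,
            "width": 640,
            "image_id": 0,
            "depth_file_name": "/path/to/depth/0001.png",
        }
    ]
```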
| When designing a custom format, note that all dicts are stored in memory | |
| (sometimes serialized and with multiple copies). | |
| To save memory, each dict is meant to contain __small__ but sufficient information | |
| about each sample, such as file names and annotations. | |
| Loading full samples typically happens in the data loader. | |
For attributes shared among the entire dataset, use `Metadata` (see below);
to avoid extra memory usage, do not save such information inside each sample.
| ### "Metadata" for Datasets | |
| Each dataset is associated with some metadata, accessible through | |
| `MetadataCatalog.get(dataset_name).some_metadata`. | |
Metadata is a key-value mapping that contains information shared among
the entire dataset, and is usually used to interpret what's in the dataset, e.g.,
names of classes, colors of classes, root of files, etc.
| This information will be useful for augmentation, evaluation, visualization, logging, etc. | |
| The structure of metadata depends on what is needed from the corresponding downstream code. | |
| If you register a new dataset through `DatasetCatalog.register`, | |
| you may also want to add its corresponding metadata through | |
| `MetadataCatalog.get(dataset_name).some_key = some_value`, to enable any features that need the metadata. | |
| You can do it like this (using the metadata key "thing_classes" as an example): | |
| ```python | |
| from detectron2.data import MetadataCatalog | |
| MetadataCatalog.get("my_dataset").thing_classes = ["person", "dog"] | |
| ``` | |
| Here is a list of metadata keys that are used by builtin features in detectron2. | |
| If you add your own dataset without these metadata, some features may be | |
unavailable to you (a sketch example follows the list):
| * `thing_classes` (list[str]): Used by all instance detection/segmentation tasks. | |
| A list of names for each instance/thing category. | |
| If you load a COCO format dataset, it will be automatically set by the function `load_coco_json`. | |
| * `thing_colors` (list[tuple(r, g, b)]): Pre-defined color (in [0, 255]) for each thing category. | |
| Used for visualization. If not given, random colors will be used. | |
| * `stuff_classes` (list[str]): Used by semantic and panoptic segmentation tasks. | |
| A list of names for each stuff category. | |
| * `stuff_colors` (list[tuple(r, g, b)]): Pre-defined color (in [0, 255]) for each stuff category. | |
| Used for visualization. If not given, random colors are used. | |
| * `ignore_label` (int): Used by semantic and panoptic segmentation tasks. Pixels in ground-truth | |
| annotations with this category label should be ignored in evaluation. Typically these are "unlabeled" | |
| pixels. | |
| * `keypoint_names` (list[str]): Used by keypoint detection. A list of names for each keypoint. | |
* `keypoint_flip_map` (list[tuple[str, str]]): Used by keypoint detection. A list of pairs of names,
  where each pair names the two keypoints that should be swapped if the image is
  flipped horizontally during augmentation.
| * `keypoint_connection_rules`: list[tuple(str, str, (r, g, b))]. Each tuple specifies a pair of keypoints | |
| that are connected and the color (in [0, 255]) to use for the line between them when visualized. | |
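For example, a keypoint dataset might register metadata like this sketch (the class and
keypoint names are made up):
```python
from detectron2.data import MetadataCatalog

meta = MetadataCatalog.get("my_keypoint_dataset")
meta.thing_classes = ["person"]
meta.thing_colors = [(0, 255, 0)]
meta.keypoint_names = ["left_eye", "right_eye", "nose"]
meta.keypoint_flip_map = [("left_eye", "right_eye")]
meta.keypoint_connection_rules = [
    ("left_eye", "nose", (255, 0, 0)),
    ("right_eye", "nose", (255, 0, 0)),
]
```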
Some additional metadata keys are specific to the evaluation of certain datasets (e.g. COCO):
| * `thing_dataset_id_to_contiguous_id` (dict[int->int]): Used by all instance detection/segmentation tasks in the COCO format. | |
A mapping from instance class ids in the dataset to contiguous ids in the range [0, num_categories).
| Will be automatically set by the function `load_coco_json`. | |
| * `stuff_dataset_id_to_contiguous_id` (dict[int->int]): Used when generating prediction json files for | |
| semantic/panoptic segmentation. | |
| A mapping from semantic segmentation class ids in the dataset | |
| to contiguous ids in [0, num_categories). It is useful for evaluation only. | |
| * `json_file`: The COCO annotation json file. Used by COCO evaluation for COCO-format datasets. | |
| * `panoptic_root`, `panoptic_json`: Used by COCO-format panoptic evaluation. | |
| * `evaluator_type`: Used by the builtin main training script to select | |
| evaluator. Don't use it in a new training script. | |
You can just provide the [DatasetEvaluator](../modules/evaluation.html#detectron2.evaluation.DatasetEvaluator)
for your dataset directly in your main script, as in the sketch below.
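For example, instead of relying on `evaluator_type`, a script can construct the evaluator
explicitly. This sketch assumes "my_dataset" is a COCO-format dataset and that `cfg` and
`model` have already been built:
```python
from detectron2.data import build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

evaluator = COCOEvaluator("my_dataset", output_dir="./output")
val_loader = build_detection_test_loader(cfg, "my_dataset")
print(inference_on_dataset(model, val_loader, evaluator))
```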
| ```eval_rst | |
| .. note:: | |
| In recognition, sometimes we use the term "thing" for instance-level tasks, | |
| and "stuff" for semantic segmentation tasks. | |
| Both are used in panoptic segmentation tasks. | |
| For background on the concept of "thing" and "stuff", see | |
| `On Seeing Stuff: The Perception of Materials by Humans and Machines | |
| <http://persci.mit.edu/pub_pdfs/adelson_spie_01.pdf>`_. | |
| ``` | |
| ### Register a COCO Format Dataset | |
| If your instance-level (detection, segmentation, keypoint) dataset is already a json file in the COCO format, | |
| the dataset and its associated metadata can be registered easily with: | |
| ```python | |
| from detectron2.data.datasets import register_coco_instances | |
| register_coco_instances("my_dataset", {}, "json_annotation.json", "path/to/image/dir") | |
| ``` | |
If your dataset is in COCO format but needs to be further processed, or has extra custom per-instance annotations,
| the [load_coco_json](../modules/data.html#detectron2.data.datasets.load_coco_json) | |
| function might be useful. | |
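For example, a sketch of loading the COCO-format dicts, post-processing them, and
registering the result:
```python
from detectron2.data import DatasetCatalog
from detectron2.data.datasets import load_coco_json

def my_processed_dataset():
    dicts = load_coco_json("json_annotation.json", "path/to/image/dir", "my_dataset")
    for d in dicts:
        ...  # e.g., attach extra custom keys to each dict here
    return dicts

DatasetCatalog.register("my_dataset", my_processed_dataset)
```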
| ### Update the Config for New Datasets | |
Once you've registered the dataset, you can use the name of the dataset (e.g., "my_dataset" in the
example above) in `cfg.DATASETS.{TRAIN,TEST}`.
There are other configs you might want to change to train or evaluate on new datasets (a combined sketch follows the list):
| * `MODEL.ROI_HEADS.NUM_CLASSES` and `MODEL.RETINANET.NUM_CLASSES` are the number of thing classes | |
| for R-CNN and RetinaNet models, respectively. | |
| * `MODEL.ROI_KEYPOINT_HEAD.NUM_KEYPOINTS` sets the number of keypoints for Keypoint R-CNN. | |
| You'll also need to set [Keypoint OKS](http://cocodataset.org/#keypoints-eval) | |
| with `TEST.KEYPOINT_OKS_SIGMAS` for evaluation. | |
| * `MODEL.SEM_SEG_HEAD.NUM_CLASSES` sets the number of stuff classes for Semantic FPN & Panoptic FPN. | |
| * `TEST.DETECTIONS_PER_IMAGE` controls the maximum number of objects to be detected. | |
| Set it to a larger number if test images may contain >100 objects. | |
| * If you're training Fast R-CNN (with precomputed proposals), `DATASETS.PROPOSAL_FILES_{TRAIN,TEST}` | |
need to match the datasets. The format of proposal files is documented
[here](../modules/data.html#detectron2.data.load_proposals_into_dataset).
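Putting some of these together, a config update for the example dataset above might look like
this sketch (the numbers are placeholders for your dataset):
```python
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.DATASETS.TRAIN = ("my_dataset",)
cfg.DATASETS.TEST = ("my_dataset",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2   # e.g., ["person", "dog"]
cfg.TEST.DETECTIONS_PER_IMAGE = 200   # if test images may contain >100 objects
```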
| New models | |
| (e.g. [TensorMask](../../projects/TensorMask), | |
| [PointRend](../../projects/PointRend)) | |
| often have similar configs of their own that need to be changed as well. | |
| ```eval_rst | |
| .. tip:: | |
   After changing the number of classes, certain layers in a pre-trained model will become incompatible
   and therefore cannot be loaded into the new model.
   This is expected; loading such pre-trained models will produce warnings about these layers.
| ``` | |