# Use Models

## Build Models from Yacs Config
From a yacs config object,
models (and their sub-models) can be built by
functions such as `build_model`, `build_backbone`, `build_roi_heads`:
```python
from detectron2.modeling import build_model
model = build_model(cfg)  # returns a torch.nn.Module
```

`build_model` only builds the model structure and fills it with random parameters.
See below for how to load an existing checkpoint to the model and how to use the `model` object.
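For example, a config object can be obtained from the model zoo; a minimal sketch (the config file name here is just an example):

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.modeling import build_model

cfg = get_cfg()  # obtain detectron2's default config
# Merge settings from a model zoo config file (example file name):
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
model = build_model(cfg)  # weights are randomly initialized at this point
```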
### Load/Save a Checkpoint
```python
from detectron2.checkpoint import DetectionCheckpointer
DetectionCheckpointer(model).load(file_path_or_url)  # load a file, usually from cfg.MODEL.WEIGHTS

checkpointer = DetectionCheckpointer(model, save_dir="output")
checkpointer.save("model_999")  # save to output/model_999.pth
```

Detectron2's checkpointer recognizes models in PyTorch's `.pth` format, as well as the `.pkl` files
in our model zoo.
See [API doc](../modules/checkpoint.html#detectron2.checkpoint.DetectionCheckpointer)
for more details about its usage.

The model files can be arbitrarily manipulated using `torch.{load,save}` for `.pth` files or
`pickle.{dump,load}` for `.pkl` files.
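For instance, a `.pth` checkpoint saved by the checkpointer stores the weights under a "model" key, so it can be inspected with plain `torch.load`; a minimal sketch, assuming the file saved above:

```python
import torch

# Load the checkpoint dict onto CPU and list the saved parameter names:
ckpt = torch.load("output/model_999.pth", map_location="cpu")
print(ckpt["model"].keys())
```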
### Use a Model
A model can be called by `outputs = model(inputs)`, where `inputs` is a `list[dict]`.
Each dict corresponds to one image and the required keys
depend on the type of model, and whether the model is in training or evaluation mode.
For example, in order to do inference,
all existing models expect the "image" key, and optionally "height" and "width".
The detailed format of inputs and outputs of existing models is explained below.

__Training__: When in training mode, all models are required to be used under an `EventStorage`.
The training statistics will be put into the storage:
```python
from detectron2.utils.events import EventStorage
with EventStorage() as storage:
    losses = model(inputs)
```
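To actually optimize the model, the returned loss dict can be summed and backpropagated as usual; a minimal sketch, assuming an `optimizer` and training `inputs` (with ground-truth annotations) have already been created:

```python
model.train()  # losses are only computed in training mode
with EventStorage(start_iter=0) as storage:
    loss_dict = model(inputs)           # dict of scalar loss tensors
    losses = sum(loss_dict.values())    # total loss
    optimizer.zero_grad()
    losses.backward()
    optimizer.step()
```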
__Inference__: If you only want to do simple inference using an existing model,
[DefaultPredictor](../modules/engine.html#detectron2.engine.defaults.DefaultPredictor)
is a wrapper around the model that provides such basic functionality.
It includes default behavior such as model loading and preprocessing,
and operates on a single image rather than on batches. See its documentation for usage.
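A minimal usage sketch (the image path is just an example):

```python
import cv2
from detectron2.engine import DefaultPredictor

predictor = DefaultPredictor(cfg)  # loads weights from cfg.MODEL.WEIGHTS
image = cv2.imread("input.jpg")    # BGR image as an (H, W, C) numpy array
outputs = predictor(image)         # dict in the output format described below
```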
You can also run inference directly like this:
```python
import torch

model.eval()
with torch.no_grad():
    outputs = model(inputs)
```
### Model Input Format

Users can implement custom models that support any arbitrary input format.
Here we describe the standard input format that all builtin models support in detectron2.
They all take a `list[dict]` as the inputs. Each dict
corresponds to information about one image.

The dict may contain the following keys:

* "image": `Tensor` in (C, H, W) format. The meaning of channels is defined by `cfg.INPUT.FORMAT`.
  Image normalization, if any, will be performed inside the model using
  `cfg.MODEL.PIXEL_{MEAN,STD}`.
* "height", "width": the **desired** output height and width **in inference**, which is not necessarily the same
  as the height or width of the `image` field.
  For example, the `image` field contains the resized image, if resize is used as a preprocessing step.
  But you may want the outputs to be in **original** resolution.
  If provided, the model will produce output in this resolution,
  rather than in the resolution of the `image` as input into the model. This is more efficient and accurate.
* "instances": an [Instances](../modules/structures.html#detectron2.structures.Instances)
  object for training, with the following fields:
  + "gt_boxes": a [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing N boxes, one for each instance.
  + "gt_classes": `Tensor` of long type, a vector of N labels, in range [0, num_categories).
  + "gt_masks": a [PolygonMasks](../modules/structures.html#detectron2.structures.PolygonMasks)
    or [BitMasks](../modules/structures.html#detectron2.structures.BitMasks) object storing N masks, one for each instance.
  + "gt_keypoints": a [Keypoints](../modules/structures.html#detectron2.structures.Keypoints)
    object storing N keypoint sets, one for each instance.
* "sem_seg": `Tensor[int]` in (H, W) format. The semantic segmentation ground truth for training.
  Values represent category labels starting from 0.
* "proposals": an [Instances](../modules/structures.html#detectron2.structures.Instances)
  object used only in Fast R-CNN style models, with the following fields:
  + "proposal_boxes": a [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing P proposal boxes.
  + "objectness_logits": `Tensor`, a vector of P scores, one for each proposal.

For inference of builtin models, only the "image" key is required, and "height"/"width" are optional.

We currently don't define standard input format for panoptic segmentation training,
because models now use custom formats produced by custom data loaders.

#### How it connects to data loader:

The output of the default [DatasetMapper](../modules/data.html#detectron2.data.DatasetMapper) is a dict
that follows the above format.
After the data loader performs batching, it becomes `list[dict]` which the builtin models support.
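Putting this together, inputs for inference can also be constructed by hand; a minimal sketch, assuming `cfg.INPUT.FORMAT` is "BGR" (the default), the model is in eval mode as above, and ignoring the resizing that the default data loader would normally apply (the image path is just an example):

```python
import cv2
import torch

img = cv2.imread("input.jpg")  # BGR, (H, W, C) uint8
height, width = img.shape[:2]
# Convert to a float (C, H, W) tensor; normalization happens inside the model.
image = torch.as_tensor(img.astype("float32").transpose(2, 0, 1))
inputs = [{"image": image, "height": height, "width": width}]
with torch.no_grad():
    outputs = model(inputs)
```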
### Model Output Format

When in training mode, the builtin models output a `dict[str->ScalarTensor]` with all the losses.

When in inference mode, the builtin models output a `list[dict]`, one dict for each image.
Based on the tasks the model is doing, each dict may contain the following fields:

* "instances": [Instances](../modules/structures.html#detectron2.structures.Instances)
  object with the following fields:
  * "pred_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes) object storing N boxes, one for each detected instance.
  * "scores": `Tensor`, a vector of N confidence scores.
  * "pred_classes": `Tensor`, a vector of N labels in range [0, num_categories).
  * "pred_masks": a `Tensor` of shape (N, H, W), masks for each detected instance.
  * "pred_keypoints": a `Tensor` of shape (N, num_keypoint, 3).
    Each row in the last dimension is (x, y, score). Confidence scores are larger than 0.
* "sem_seg": `Tensor` of (num_categories, H, W), the semantic segmentation prediction.
* "proposals": [Instances](../modules/structures.html#detectron2.structures.Instances)
  object with the following fields:
  * "proposal_boxes": [Boxes](../modules/structures.html#detectron2.structures.Boxes)
    object storing N boxes.
  * "objectness_logits": a torch vector of N confidence scores.
* "panoptic_seg": A tuple of `(pred: Tensor, segments_info: Optional[list[dict]])`.
  The `pred` tensor has shape (H, W), containing the segment id of each pixel.

  * If `segments_info` exists, each dict describes one segment id in `pred` and has the following fields:

    * "id": the segment id
    * "isthing": whether the segment is a thing or stuff
    * "category_id": the category id of this segment.

    If a pixel's id does not exist in `segments_info`, it is considered to be void label
    defined in [Panoptic Segmentation](https://arxiv.org/abs/1801.00868).
  * If `segments_info` is None, all pixel values in `pred` must be ≥ -1.
    Pixels with value -1 are assigned void labels.
    Otherwise, the category id of each pixel is obtained by
    `category_id = pixel // metadata.label_divisor`.
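As an example of consuming these outputs, the fields of a predicted `Instances` object can be read as plain tensors; a minimal sketch for a detection model, assuming `outputs` comes from a model call as above:

```python
# Take the predictions for the first input image and move them to CPU:
instances = outputs[0]["instances"].to("cpu")
boxes = instances.pred_boxes.tensor  # (N, 4) boxes in XYXY_ABS format
scores = instances.scores            # (N,) confidence scores
classes = instances.pred_classes     # (N,) predicted class labels
```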
### Partially execute a model:

Sometimes you may want to obtain an intermediate tensor inside a model,
such as the input of a certain layer or the output before post-processing.
Since there are typically hundreds of intermediate tensors, there isn't an API that provides
the intermediate result you need.
You have the following options:

1. Write a (sub)model. Following the [tutorial](./write-models.md), you can
   rewrite a model component (e.g. a head of a model), such that it
   does the same thing as the existing component, but returns the output
   you need.
2. Partially execute a model. You can create the model as usual,
   but use custom code to execute it instead of its `forward()`. For example,
   the following code obtains mask features before the mask head.

   ```python
   from detectron2.modeling import build_model
   from detectron2.structures import ImageList

   images = ImageList.from_tensors(...)  # preprocessed input tensor
   model = build_model(cfg)
   model.eval()
   features = model.backbone(images.tensor)
   proposals, _ = model.proposal_generator(images, features)
   instances, _ = model.roi_heads(images, features, proposals)
   mask_features = [features[f] for f in model.roi_heads.in_features]
   mask_features = model.roi_heads.mask_pooler(mask_features, [x.pred_boxes for x in instances])
   ```

3. Use [forward hooks](https://pytorch.org/tutorials/beginner/former_torchies/nnft_tutorial.html#forward-and-backward-function-hooks).
   Forward hooks can help you obtain inputs or outputs of a certain module.
   If they are not exactly what you want, they can at least be used together with partial execution
   to obtain other tensors; see the sketch below.
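For example, a forward hook on the backbone captures its output features without modifying the model; a minimal sketch, assuming the model is in eval mode and `inputs` follows the format above:

```python
import torch

captured = {}

def hook(module, inputs, output):
    # Save the backbone's output (a dict of feature maps for FPN backbones).
    captured["features"] = output

handle = model.backbone.register_forward_hook(hook)
with torch.no_grad():
    model(inputs)
handle.remove()  # detach the hook when done
```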
All options require you to read documentation and sometimes code
of the existing models to understand the internal logic,
in order to write code to obtain the internal tensors.