	# Conventions

	Please check the following conventions if you would like to adapt MMDetection for your own project.

	## About the order of image shape

	In OpenMMLab 2.0, to be consistent with the input arguments of OpenCV, arguments that specify an image shape in the data transformation pipeline are always in `(width, height)` order. In contrast, for computational convenience, the fields that pass through the data pipeline and the model are in `(height, width)` order. Specifically, in the results produced by each data transform, the following fields have these meanings:

	- img_shape: (height, width)
	- ori_shape: (height, width)
	- pad_shape: (height, width)
	- batch_input_shape: (height, width)

	As an example, the initialization arguments of `Mosaic` are as below:

	```python
	@TRANSFORMS.register_module()
	class Mosaic(BaseTransform):
	    def __init__(self,
	                 img_scale: Tuple[int, int] = (640, 640),
	                 center_ratio_range: Tuple[float, float] = (0.5, 1.5),
	                 bbox_clip_border: bool = True,
	                 pad_val: float = 114.0,
	                 prob: float = 1.0) -> None:
	        ...

	        # img_scale order should be (width, height)
	        self.img_scale = img_scale

	    def transform(self, results: dict) -> dict:
	        ...

	        results['img'] = mosaic_img
	        # (height, width)
	        results['img_shape'] = mosaic_img.shape[:2]
	```
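The two orderings can be illustrated without any MMDetection dependency. The following is a minimal sketch that uses nested lists in place of an image array; `make_blank_image` and `shape_of` are hypothetical helpers introduced only for this example and are not part of MMDetection.

```python
from typing import List, Tuple


def make_blank_image(img_scale: Tuple[int, int],
                     fill: float = 114.0) -> List[List[float]]:
    """Build a row-major image from a (width, height) argument."""
    width, height = img_scale
    # One inner list per row, so the outer length is the height.
    return [[fill] * width for _ in range(height)]


def shape_of(img: List[List[float]]) -> Tuple[int, int]:
    """Return (height, width), mirroring `img.shape[:2]` for an array."""
    return (len(img), len(img[0]) if img else 0)


results = {}
# The argument follows the (width, height) convention.
results['img'] = make_blank_image((640, 480))
# The stored field follows the (height, width) convention.
results['img_shape'] = shape_of(results['img'])
print(results['img_shape'])  # (480, 640)
```

Note that the same image is described as `(640, 480)` on the argument side and `(480, 640)` on the field side.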

	## Loss

	In MMDetection, a `dict` containing losses and metrics will be returned by `model(**data)`.

	For example, in bbox head,

	```python
	class BBoxHead(nn.Module):
	    ...
	    def loss(self, ...):
	        losses = dict()
	        # classification loss
	        losses['loss_cls'] = self.loss_cls(...)
	        # classification accuracy
	        losses['acc'] = accuracy(...)
	        # bbox regression loss
	        losses['loss_bbox'] = self.loss_bbox(...)
	        return losses
	```

	`bbox_head.loss()` is called during the model's forward pass.
	The returned dict contains `'loss_bbox'`, `'loss_cls'`, and `'acc'`.
	Only `'loss_bbox'` and `'loss_cls'` are used during backpropagation;
	`'acc'` is used only as a metric to monitor the training process.

	By default, only values whose keys contain `'loss'` are backpropagated.
	This behavior can be changed by modifying `BaseDetector.train_step()`.
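The key-based selection described above can be sketched in plain Python. This is an illustration under simplifying assumptions, not the library's implementation: values are plain floats (or lists of floats) rather than tensors, and `sum_loss_terms` is a hypothetical name.

```python
from typing import Dict, List, Tuple, Union

LossValue = Union[float, List[float]]


def sum_loss_terms(
        losses: Dict[str, LossValue]) -> Tuple[float, Dict[str, float]]:
    """Reduce each entry to a scalar, then sum only the 'loss' entries."""
    log_vars: Dict[str, float] = {}
    for name, value in losses.items():
        if isinstance(value, (list, tuple)):
            # A list of partial losses is reduced by summation.
            log_vars[name] = float(sum(value))
        else:
            log_vars[name] = float(value)
    # Only keys containing 'loss' contribute to the optimized total.
    total = sum(v for k, v in log_vars.items() if 'loss' in k)
    return total, log_vars


total, log_vars = sum_loss_terms({
    'loss_cls': 0.5,
    'acc': 87.5,
    'loss_bbox': [0.25, 0.25],
})
print(total)  # 1.0
```

Here `'acc'` is still available in `log_vars` for logging, but it does not affect `total`.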

	## Empty Proposals

	In MMDetection, we have added special handling and unit tests for empty proposals in two-stage detectors. Two cases must be handled at the same time: no proposals in the entire batch, and no proposals for a single image. For example, in `CascadeRoIHead`,

	```python
	# simple_test method
	...
	# There is no proposal in the whole batch
	if rois.shape[0] == 0:
	    bbox_results = [[
	        np.zeros((0, 5), dtype=np.float32)
	        for _ in range(self.bbox_head[-1].num_classes)
	    ]] * num_imgs
	    if self.with_mask:
	        mask_classes = self.mask_head[-1].num_classes
	        segm_results = [[[] for _ in range(mask_classes)]
	                        for _ in range(num_imgs)]
	        results = list(zip(bbox_results, segm_results))
	    else:
	        results = bbox_results
	    return results
	...

	# There is no proposal in the single image
	for i in range(self.num_stages):
	    ...
	    if i < self.num_stages - 1:
	        for j in range(num_imgs):
	            # Handle empty proposal
	            if rois[j].shape[0] > 0:
	                bbox_label = cls_score[j][:, :-1].argmax(dim=1)
	                refine_roi = self.bbox_head[i].regress_by_class(
	                    rois[j], bbox_label, bbox_pred[j], img_metas[j])
	                refine_roi_list.append(refine_roi)
	```

	If you have a customized `RoIHead`, you can refer to the approach above to handle empty proposals.
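For a custom head, the same two checks might look like the sketch below. It is a simplified stand-in that uses plain Python lists instead of tensors or arrays; `empty_batch_results`, `refine_non_empty`, and the `refine` callback are hypothetical names chosen for illustration, not MMDetection APIs.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, ...]


def empty_batch_results(num_imgs: int,
                        num_classes: int) -> List[List[List[Box]]]:
    """Whole-batch case: one empty per-class result list for every image."""
    # A fresh list is built per image and per class, so results are not shared.
    return [[[] for _ in range(num_classes)] for _ in range(num_imgs)]


def refine_non_empty(rois_per_img: List[List[Box]],
                     refine: Callable[[Box], Box]) -> List[List[Box]]:
    """Single-image case: refine only the images that have proposals."""
    refined = []
    for rois in rois_per_img:
        # Skip images with no proposals instead of calling `refine` on nothing.
        if len(rois) > 0:
            refined.append([refine(box) for box in rois])
    return refined


batch = [[(0.0, 0.0, 1.0, 1.0)], []]  # the second image has no proposals
if sum(len(rois) for rois in batch) == 0:
    output = empty_batch_results(num_imgs=len(batch), num_classes=3)
else:
    output = refine_non_empty(batch, lambda b: tuple(v * 2 for v in b))
print(output)  # [[(0.0, 0.0, 2.0, 2.0)]]
```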

	## Coco Panoptic Dataset

	MMDetection supports the COCO Panoptic dataset. A few conventions about the implementation of `CocoPanopticDataset` are clarified here.

	1. For mmdet\<=2.16.0, the range of foreground and background labels in semantic segmentation differs from the default setting of MMDetection: label `0` stands for `VOID`, and the category labels start from `1`.
	Since mmdet 2.17.0, the category labels of semantic segmentation start from `0`, and label `255` stands for `VOID`, for consistency with the labels of bounding boxes.
	To achieve that, the `Pad` pipeline supports setting the padding value for `seg`.
	2. In the evaluation, the panoptic result is a map with the same shape as the original image. Each value in the result map has the format of `instance_id * INSTANCE_OFFSET + category_id`.
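The encoding in item 2 can be inverted with integer division and remainder. The sketch below assumes `INSTANCE_OFFSET = 1000` purely for illustration (check the value defined in your MMDetection version). The helper names `encode_panoptic_id` and `decode_panoptic_id` are hypothetical and are not MMDetection APIs.

```python
from typing import Tuple

# Assumed value for illustration; use the constant from your installation.
INSTANCE_OFFSET = 1000


def encode_panoptic_id(instance_id: int, category_id: int) -> int:
    """Pack an instance id and a category id into one map value."""
    if not 0 <= category_id < INSTANCE_OFFSET:
        raise ValueError('category_id must be in [0, INSTANCE_OFFSET)')
    return instance_id * INSTANCE_OFFSET + category_id


def decode_panoptic_id(panoptic_id: int) -> Tuple[int, int]:
    """Recover (instance_id, category_id) from a map value."""
    return divmod(panoptic_id, INSTANCE_OFFSET)


value = encode_panoptic_id(instance_id=3, category_id=17)
print(value)                      # 3017
print(decode_panoptic_id(value))  # (3, 17)
```

The decoding is unambiguous only while every category id is smaller than the offset, which is why the encoder checks that range.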