We propose a novel deep neural network architecture for semi-supervised semantic segmentation using heterogeneous annotations. In contrast to existing approaches that pose semantic segmentation as region-based classification, our algorithm decouples classification and segmentation and learns a separate network for each task. In this architecture, the labels associated with an image are identified by the classification network, and binary segmentation is subsequently performed for each identified label by the segmentation network. The decoupled architecture enables us to learn the classification and segmentation networks separately, from training data with image-level and pixel-wise class labels, respectively. It also effectively reduces the search space for segmentation by exploiting class-specific activation maps obtained from bridging layers. Our algorithm shows outstanding performance compared to other semi-supervised approaches on the PASCAL VOC dataset, even with far fewer training images with strong annotations.
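The control flow described above (classify once, then run one binary segmentation per identified label) can be sketched as follows. This is only an illustrative sketch, not the released implementation: the function names, the `classify` and `segment` callables, and the 0.5 score threshold are all assumptions made for the example, and the toy stand-ins at the bottom are not the real networks.

```python
import numpy as np


def decoupled_inference(image, classify, segment, score_threshold=0.5):
    """Classify the image once, then segment each identified label separately.

    classify(image) -> 1-D array of per-class scores (hypothetical interface)
    segment(image, label) -> 2-D foreground map for that label (hypothetical interface)
    """
    scores = np.asarray(classify(image))
    # Keep only the labels the classifier considers present in the image.
    labels = np.flatnonzero(scores > score_threshold)
    # One binary (figure-ground) segmentation per identified label.
    return {int(label): segment(image, int(label)) for label in labels}


# Toy stand-ins so the sketch runs end to end; they are NOT the real networks.
def toy_classify(image):
    return np.array([0.9, 0.1, 0.7])


def toy_segment(image, label):
    return np.full(image.shape[:2], 0.1 * (label + 1))


maps = decoupled_inference(np.zeros((4, 4, 3)), toy_classify, toy_segment)
print(sorted(maps))
```

Because segmentation is only run for labels that pass the classification step, classes judged absent from the image never reach the segmentation stage.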
Table 1 summarizes quantitative results on the PASCAL VOC 2012 validation set. Given the same amount of supervision, DecoupledNet performs substantially better than other methods, without any post-processing. Table 2 presents comprehensive results of our algorithm on the PASCAL VOC 2012 test set.
Table 1. Evaluation results on PASCAL VOC 2012 validation set.
Table 2. Evaluation results on PASCAL VOC 2012 test set.
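For readers who want to score their own predictions, the sketch below computes mean intersection-over-union, assuming the standard PASCAL VOC segmentation metric is the one reported in the tables. The function name and arguments are illustrative. Note that it scores a single prediction and ground-truth pair and skips classes absent from both, whereas a benchmark evaluation would normally accumulate the confusion matrix over the whole dataset before averaging.

```python
import numpy as np


def mean_iou(prediction, ground_truth, num_classes, ignore_label=255):
    """Mean IoU for one label map; pixels marked ignore_label are excluded."""
    prediction = np.asarray(prediction).astype(np.int64)
    ground_truth = np.asarray(ground_truth).astype(np.int64)
    valid = ground_truth != ignore_label
    pred, gt = prediction[valid], ground_truth[valid]
    # Rows index the ground-truth class, columns the predicted class.
    confusion = np.bincount(
        gt * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    present = union > 0  # skip classes absent from both maps
    return float((intersection[present] / union[present]).mean())


gt = np.array([[0, 0], [1, 255]])
pred = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, gt, num_classes=2)
print(score)
```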
The DecoupledNet model is now available.
To run DecoupledNet, you need a modified version of Caffe.
If you want to reproduce our reported results, check our GitHub repository.
Below we present more comprehensive results of the algorithm described in the paper.
Figure A.2 presents additional examples to Figure 2 in the main paper, visualizing class-specific activation maps (outputs of the bridging layers) obtained from PASCAL validation images. Despite significant appearance variations in the input images, the class-specific activation maps obtained from the same class share similar properties. This property makes it possible to obtain figure-ground segmentation maps for the individual relevant classes in the segmentation network, and allows good generalization performance in segmentation even with a small number of training examples with strong annotations. More examples on various categories and channels of the activation map can be found at the link below.
Figure A.3 presents additional examples to Figure 3 in the main paper, illustrating segmentation maps of individual classes. Given the class-specific activation maps presented in Figure A.2, we observe that the segmentation network effectively produces a segmentation map for each specific class. More examples of segmentation maps can be found at the link below.
|Input image||Class 1||Class 2||Class 3|
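Per-class figure-ground maps such as those in Figure A.3 still have to be merged into a single label map. One simple way to do this, shown here purely as an assumption and not necessarily the rule used in the paper, is to assign each pixel to the class with the highest foreground probability and fall back to background when no class is confident. The function name, the 0.5 threshold, and the use of label 0 for background are illustrative choices.

```python
import numpy as np


def merge_class_maps(class_maps, background_threshold=0.5):
    """Merge per-class foreground maps into one label map (0 = background).

    class_maps: non-empty dict {class_id (>= 1): HxW foreground probability}.
    """
    class_ids = sorted(class_maps)
    stacked = np.stack([class_maps[c] for c in class_ids])  # shape (K, H, W)
    best_index = stacked.argmax(axis=0)   # most confident class per pixel
    best_prob = stacked.max(axis=0)
    labels = np.asarray(class_ids)[best_index]
    # Pixels where no class is confident enough become background.
    labels[best_prob < background_threshold] = 0
    return labels


example_maps = {
    3: np.array([[0.9, 0.2], [0.6, 0.1]]),
    7: np.array([[0.3, 0.8], [0.7, 0.2]]),
}
label_map = merge_class_maps(example_maps)
print(label_map)
```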
Figure A.4 presents additional examples to Figure 4 in the main paper, illustrating qualitative results of the proposed algorithm. Our model trained with only a small number of strong annotations already shows good generalization performance, and more training examples with strong annotations improve segmentation accuracy and substantially reduce label confusion. More qualitative results can be found at the link below.
|Input image||Ground-truth||5 examples||10 examples||25 examples||Full annotations|