Figure A.1. The architecture of the proposed network. While the classification and segmentation networks are decoupled, bridging layers deliver critical information from the classification network to the segmentation network.

Abstract

We propose a novel deep neural network architecture for semi-supervised semantic segmentation using heterogeneous annotations. Contrary to existing approaches that pose semantic segmentation as region-based classification, our algorithm decouples classification and segmentation, and learns a separate network for each task. In this architecture, the labels associated with an image are identified by the classification network, and binary segmentation is subsequently performed for each identified label by the segmentation network. The decoupled architecture enables us to learn the classification and segmentation networks separately, using training data with image-level and pixel-wise class labels, respectively. It also effectively reduces the search space for segmentation by exploiting class-specific activation maps obtained from the bridging layers. Our algorithm shows outstanding performance compared to other semi-supervised approaches, even with far fewer training images with strong annotations, on the PASCAL VOC dataset.
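To make the decoupled pipeline concrete, below is a minimal PyTorch sketch of the inference flow: the classification network identifies labels, and binary segmentation is run once per identified label, conditioned on a class-specific activation map from a bridging layer. The module DecoupledNetSketch, its layer sizes, and the score threshold are illustrative stand-ins of ours, not the authors' Caffe implementation; in particular, the paper's bridging layers are more elaborate than the single 1x1 convolution used here.

import torch
import torch.nn as nn

class DecoupledNetSketch(nn.Module):
    """Illustrative stand-in for the decoupled architecture (not the paper's model)."""

    def __init__(self, num_classes=20):
        super().__init__()
        # Shared convolutional trunk (the paper builds on VGG-16; this is a toy stand-in).
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        # Classification network: predicts which labels are present in the image.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, num_classes),
        )
        # Bridging layer: one activation map per class; channel c conditions
        # the segmentation network on class c.
        self.bridge = nn.Conv2d(128, num_classes, 1)
        # Segmentation network: binary figure-ground prediction per chosen class.
        self.segment = nn.Sequential(
            nn.Conv2d(128 + 1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 1),  # two channels: background / foreground
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False),
        )

    def forward(self, x, score_threshold=0.5):
        feats = self.features(x)
        class_probs = torch.sigmoid(self.classifier(feats))  # multi-label scores
        masks = {}
        # Binary segmentation is run once per identified label.
        for c in (class_probs[0] > score_threshold).nonzero().flatten().tolist():
            g = self.bridge(feats)[:, c:c + 1]  # class-specific activation map
            masks[c] = self.segment(torch.cat([feats, g], dim=1))
        return class_probs, masks

For example, class_probs, masks = DecoupledNetSketch()(torch.randn(1, 3, 128, 128)) returns multi-label scores and one binary figure-ground mask per identified class.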

Performance

Table 1 summarizes quantitative results on the PASCAL VOC 2012 validation set. Given the same amount of supervision, DecoupledNet achieves substantially better performance than other methods, without any post-processing. Table 2 presents comprehensive results of our algorithm on the PASCAL VOC 2012 test set.

Table 1. Evaluation results on PASCAL VOC 2012 validation set.

Table 2. Evaluation results on PASCAL VOC 2012 test set.

Paper

Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation
Seunghoon Hong*, Hyeonwoo Noh*, Bohyung Han (*: equal contribution)
In Advances in Neural Information Processing Systems (NIPS), 2015.
                                      
@inproceedings{hong2015decoupled,
  title={Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation},
  author={Hong, Seunghoon and Noh, Hyeonwoo and Han, Bohyung},
  booktitle={Advances in Neural Information Processing Systems 28},
  year={2015}
}
                                      
                                    
[arXiv preprint]

Code

The DecoupledNet model is now available.

To run DecoupledNet, you need a modified version of Caffe.

If you want to reproduce our reported results, check our GitHub repository.

Supplementary Examples

The following sections present more comprehensive results of the algorithm described in the paper.

1. Filter Visualization

Figure A.2 presents additional examples to Figure 2 in the main paper, visualizing class-specific activation maps (the outputs of the bridging layers) obtained from PASCAL validation images. Despite significant appearance variations in the input images, the class-specific activation maps obtained from the same class share similar properties. This consistency makes it possible to obtain figure-ground segmentation maps for individual relevant classes in the segmentation network, and to achieve good generalization performance in segmentation even with a small number of training examples with strong annotations. More examples of various categories and channels of the activation maps can be found at the link below.

[Figure grid: paired input images and class-specific activation maps for aeroplane, bicycle, bird, and boat.]

Figure A.2. Examples of class-specific activation maps obtained from several PASCAL validation images.
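As a companion to Figure A.2, the snippet below sketches how one might render an (image, activation) pair for a chosen class, assuming a model structured like the hypothetical DecoupledNetSketch above; the normalization and colormap choices are ours, not the paper's exact visualization recipe.

import torch
import matplotlib.pyplot as plt

def show_activation(model, image, class_idx):
    """Plot an input image next to its activation map for one class.

    model: a DecoupledNetSketch-like module; image: a (1, 3, H, W) tensor in [0, 1].
    """
    model.eval()
    with torch.no_grad():
        feats = model.features(image)            # shared features
        act = model.bridge(feats)[0, class_idx]  # class-specific activation map
    # Normalize to [0, 1] for display.
    act = (act - act.min()) / (act.max() - act.min() + 1e-8)

    fig, (ax_img, ax_act) = plt.subplots(1, 2, figsize=(8, 4))
    ax_img.imshow(image[0].permute(1, 2, 0).numpy())
    ax_img.set_title('image')
    ax_act.imshow(act.numpy(), cmap='jet')
    ax_act.set_title('activation')
    for ax in (ax_img, ax_act):
        ax.axis('off')
    plt.show()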



2. Class-specific Segmentation Maps

Figure A.3 presents additional examples to Figure 3 in the main paper, which illustrates segmentation maps of individual classes. Given the class-specific activation maps presented in Figure A.2, we observe that the segmentation network effectively produces a segmentation map for each identified class. More examples of segmentation maps can be found at the link below.

Input image | Class 1 | Class 2 | Class 3

Figure A.3. Examples of segmentation maps: (column 1) input image; (columns 2-4) segmentation maps for each class identified in the input image.
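Although Figure A.3 presents these per-class figure-ground maps separately, a full labeling requires combining them. The sketch below shows one plausible aggregation, pixel-wise argmax over per-class foreground probabilities with a background fallback, using the mask dictionary produced by the hypothetical DecoupledNetSketch above; the exact aggregation rule used in the paper may differ.

import torch

def merge_class_maps(masks, threshold=0.5, background_label=0):
    """Merge per-class binary maps into one label map.

    masks: dict {class_id: (1, 2, H, W) logits}, e.g. from DecoupledNetSketch.
    Returns an (H, W) tensor of labels, with class c stored as c + 1 so that
    0 can denote background, following the PASCAL VOC convention.
    """
    if not masks:
        raise ValueError('no classes were identified for this image')
    class_ids = sorted(masks)
    # Foreground probability for each identified class, stacked to (C, H, W).
    fg = torch.stack([torch.softmax(masks[c], dim=1)[0, 1] for c in class_ids])
    best_score, best_idx = fg.max(dim=0)
    labels = torch.tensor(class_ids)[best_idx] + 1
    labels[best_score < threshold] = background_label  # no confident class
    return labels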



3. Qualitative Results

Figure A.4 presents additional examples to Figure 4 in the main paper, illustrating qualitative results of the proposed algorithm. Our model trained with only a small number of strong annotations already shows good generalization performance, and more training examples with strong annotations substantially improve segmentation accuracy and reduce label confusion. More qualitative results can be found at the link below.

Input image | Ground-truth | 5 examples | 10 examples | 25 examples | Full annotations

Figure A.4. Semantic segmentation results on several PASCAL VOC 2012 validation images, based on models trained with different numbers of pixel-wise segmentation annotations.