We propose a novel weakly-supervised semantic segmentation algorithm based on a Deep Convolutional Neural Network (DCNN). Contrary to existing weakly-supervised approaches, our algorithm exploits auxiliary segmentation annotations available for different categories to guide segmentation on images with only image-level class labels. To make the segmentation knowledge transferable across categories, we design a decoupled encoder-decoder architecture with an attention model. In this architecture, the model generates spatial highlights of each category present in an image using the attention model, and subsequently generates a foreground segmentation for each highlighted region using the decoder. Combined with the attention model, we show that the decoder trained with segmentation annotations from different categories can boost the performance of weakly-supervised semantic segmentation. The proposed algorithm demonstrates substantially improved performance compared to state-of-the-art weakly-supervised techniques on the challenging PASCAL VOC 2012 dataset when our model is trained with annotations from 60 exclusive categories in the Microsoft COCO dataset.
Figure 1 illustrates the overall architecture of the proposed algorithm. Our model learns semantic segmentation for images with weak annotations (target domain) by leveraging strong annotations from different categories (source domain).
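The decoupled design above can be sketched as two stages: an attention step that scores each spatial location of the encoder output against a class representation, and a class-agnostic decoder that turns the attended features into a foreground mask. The sketch below is a minimal, hypothetical NumPy illustration of this idea; the function names, the dot-product attention, and the 1x1-projection decoder are our simplifying assumptions, not the paper's exact layers.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over all elements.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(features, class_embedding):
    # features: (C, H, W) encoder output; class_embedding: (C,) vector
    # for one category (hypothetical stand-in for the attention model).
    # Score each spatial location by its similarity to the class,
    # then normalize into a spatial attention map.
    scores = np.tensordot(class_embedding, features, axes=([0], [0]))  # (H, W)
    return softmax(scores.ravel()).reshape(scores.shape)

def decode(features, attention, decoder_weights):
    # Weight each location by the attention map, then apply a
    # (hypothetical) 1x1 projection decoder to get a per-pixel
    # foreground probability. The decoder has no class-specific
    # parameters, so it can be trained on source-domain categories
    # and reused for target-domain ones.
    attended = features * attention[None]                              # (C, H, W)
    logits = np.tensordot(decoder_weights, attended, axes=([0], [0])) # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))                               # sigmoid

# Toy usage with random tensors standing in for real activations.
rng = np.random.RandomState(0)
features = rng.randn(8, 4, 4)          # encoder output
class_embedding = rng.randn(8)         # one category's representation
decoder_weights = rng.randn(8)         # shared, class-agnostic decoder
att = attend(features, class_embedding)
mask = decode(features, att, decoder_weights)
```

Because only the attention step depends on the category, the decoder's segmentation knowledge is what transfers from the strongly-annotated source categories to the weakly-annotated target ones.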
The proposed algorithm outperforms all weakly-supervised semantic segmentation techniques by substantial margins, and is even comparable to semi-supervised semantic segmentation methods, which exploit a small number of ground-truth segmentations in addition to weakly-annotated images for training. We refer readers to the paper for more results.
Table 1. Evaluation results on PASCAL VOC 2012 validation set.
Github repository: https://github.com/maga33/TransferNet