Figure 1. Overall architecture of the proposed network. On top of the convolution network based on the VGG 16-layer net, we put a multilayer deconvolution network to generate an accurate segmentation map of an input proposal. Given the feature representation obtained from the convolution network, a dense pixel-wise class prediction map is constructed through multiple series of unpooling, deconvolution, and rectification operations.
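The three decoder operations named in the caption can be sketched in plain NumPy. This is an illustrative sketch, not the paper's code: the function names and the 2x2 window size are assumptions. Unpooling places each pooled activation back at the "switch" (argmax location) recorded during max pooling, deconvolution is a stride-1 transposed convolution, and rectification is an elementwise `max(x, 0)`.

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """k x k max pooling that records the argmax location (switch) per window."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    switches = np.zeros((h // k, w // k), dtype=int)  # flat index inside window
    for i in range(h // k):
        for j in range(w // k):
            win = x[i*k:(i+1)*k, j*k:(j+1)*k]
            switches[i, j] = win.argmax()
            pooled[i, j] = win.max()
    return pooled, switches

def unpool(pooled, switches, k=2):
    """Place each pooled value back at its recorded switch; zeros elsewhere."""
    h, w = pooled.shape
    out = np.zeros((h * k, w * k))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(switches[i, j], k)
            out[i*k + di, j*k + dj] = pooled[i, j]
    return out

def deconv(x, kernel):
    """Stride-1 transposed convolution: each input pixel stamps the kernel
    onto the (enlarged) output, and overlapping stamps are summed."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros((h + kh - 1, w + kw - 1))
    for i in range(h):
        for j in range(w):
            out[i:i+kh, j:j+kw] += x[i, j] * kernel
    return out  # rectification afterwards is simply np.maximum(out, 0)
```

A decoder stage then chains these: `np.maximum(deconv(unpool(pooled, switches), kernel), 0)`, mirroring one pooling/convolution stage of the encoder.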

Abstract

We propose a novel semantic segmentation algorithm by learning a deconvolution network. We learn the network on top of the convolutional layers adopted from the VGG 16-layer net. The deconvolution network is composed of deconvolution and unpooling layers, which identify pixel-wise class labels and predict segmentation masks. We apply the trained network to each proposal in an input image, and construct the final semantic segmentation map by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of existing methods based on fully convolutional networks by integrating a deep deconvolution network and proposal-wise prediction; our segmentation method identifies detailed structures and handles objects at multiple scales naturally. Our network demonstrates outstanding performance on the PASCAL VOC 2012 dataset, and we achieve the best accuracy (72.5%) among the methods trained with no external data through an ensemble with the fully convolutional network.

Performance

Evaluation results on the PASCAL VOC 2012 test set (algorithms trained without additional data).

Paper

Learning Deconvolution Network for Semantic Segmentation
Hyeonwoo Noh, Seunghoon Hong, Bohyung Han
In Proceedings of ICCV, 2015.
@inproceedings{noh2015learning,
  title={Learning Deconvolution Network for Semantic Segmentation},
  author={Noh, Hyeonwoo and Hong, Seunghoon and Han, Bohyung},
  booktitle={Computer Vision (ICCV), 2015 IEEE International Conference on},
  year={2015}
} 
[arxiv preprint]

Code

The DeconvNet model is now available (trained on the VOC 2012 trainval set).

To run DeconvNet, you need a modified version of Caffe.

If you want to reproduce our reported results, check our GitHub repository.

Acknowledgments

This work was funded by Samsung Electronics Co. (DMC R&D Center).