Abstract

Few-shot semantic segmentation aims at learning to segment a target object from a query image using only a few annotated support images of the target class. This challenging task requires understanding diverse levels of visual cues and analyzing fine-grained correspondence relations between the query and the support images. To address the problem, we propose Hypercorrelation Squeeze Networks (HSNet), which leverage multi-level feature correlation and efficient 4D convolutions. HSNet extracts diverse features from different levels of intermediate convolutional layers and constructs a collection of 4D correlation tensors, i.e., hypercorrelations. Using efficient center-pivot 4D convolutions in a pyramidal architecture, the method gradually squeezes high-level semantic and low-level geometric cues of the hypercorrelation into precise segmentation masks in a coarse-to-fine manner. The significant performance improvements on the standard few-shot segmentation benchmarks of PASCAL-5i, COCO-20i, and FSS-1000 verify the efficacy of the proposed method.
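The hypercorrelation construction described above can be sketched as follows. This is an illustrative simplification, not the authors' code: for one backbone layer, it computes clamped cosine similarities between every query location and every (masked) support location, yielding a 4D correlation tensor; all names are hypothetical.

```python
import numpy as np

def hypercorrelation(query_feat, support_feat, support_mask, eps=1e-5):
    """Build a 4D correlation tensor from one pair of feature maps.
    query_feat:   (C, Hq, Wq) query feature map from one backbone layer
    support_feat: (C, Hs, Ws) support feature map from the same layer
    support_mask: (Hs, Ws) binary mask of the target object
    returns: (Hq, Wq, Hs, Ws) tensor of ReLU-clamped cosine similarities
    """
    # support feature masking: zero out background support features
    support_feat = support_feat * support_mask[None]

    C, Hq, Wq = query_feat.shape
    _, Hs, Ws = support_feat.shape
    q = query_feat.reshape(C, -1)    # (C, Hq*Wq)
    s = support_feat.reshape(C, -1)  # (C, Hs*Ws)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    corr = q.T @ s                   # cosine similarity, (Hq*Wq, Hs*Ws)
    corr = np.maximum(corr, 0)       # ReLU: keep only positive matches
    return corr.reshape(Hq, Wq, Hs, Ws)
```

Repeating this over the intermediate layers of the backbone and stacking tensors of matching spatial resolution along a channel axis would give the multi-level hypercorrelation pyramid.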

Overall architecture

Figure 1. Overall architecture of the proposed network, which consists of three main parts: hypercorrelation construction, 4D-convolutional pyramid encoder, and 2D-convolutional context decoder.
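To convey the "squeeze" in the encoder at a glance, here is a deliberately simplified sketch (hypothetical names, no learned weights): each pyramidal 4D tensor has its support dimensions collapsed to a 2D query-space map, and coarse maps are upsampled and merged with finer ones before reaching a 2D decoder. The real encoder performs this squeeze with learned 4D convolutions rather than mean pooling.

```python
import numpy as np

def squeeze_pyramid(hypercorr_pyramid):
    """Collapse the support dims (Hs, Ws) of each 4D tensor to a 2D map.
    hypercorr_pyramid: list of (Hq, Wq, Hs, Ws) tensors, coarse to fine.
    returns: list of (Hq, Wq) query-space maps
    """
    return [t.mean(axis=(2, 3)) for t in hypercorr_pyramid]

def coarse_to_fine_merge(maps):
    """Upsample each coarser map (nearest-neighbor) and add it to the
    next finer map, propagating coarse semantics into fine geometry."""
    merged = maps[0]
    for finer in maps[1:]:
        rh = finer.shape[0] // merged.shape[0]
        rw = finer.shape[1] // merged.shape[1]
        merged = np.repeat(np.repeat(merged, rh, 0), rw, 1) + finer
    return merged
```

The merged query-space map would then be handed to the 2D-convolutional context decoder to predict the final segmentation mask.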

Proposed center-pivot 4D convolution

Figure 2. 4D convolution (left), weights of a conventional 4D kernel [55, 77] (middle), and the center-pivot 4D kernel (right). Each black wire connecting two different pixel locations represents a single weight of the 4D kernel. The kernel size used in this example is (3, 3, 3, 3).
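The idea behind the center-pivot kernel can be sketched in code. As a simplification (single channel, unit stride, illustrative names only): discarding every 4D weight whose query-side or support-side offset is off-center factorizes the 4D convolution into two 2D convolutions, one over the query dimensions and one over the support dimensions, whose outputs are summed.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D cross-correlation (odd kernel, stride 1)."""
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(x.shape)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

def center_pivot_conv4d(corr, k_query, k_support):
    """Center-pivot 4D convolution sketch on a (Hq, Wq, Hs, Ws) tensor.
    Only weights pivoted at the kernel center on one side survive, so a
    (3, 3, 3, 3) kernel reduces to two (3, 3) kernels."""
    Hq, Wq, Hs, Ws = corr.shape
    out = np.zeros(corr.shape)
    # branch 1: convolve query dims; support position fixed at the pivot
    for hs in range(Hs):
        for ws in range(Ws):
            out[:, :, hs, ws] += conv2d_same(corr[:, :, hs, ws], k_query)
    # branch 2: convolve support dims; query position fixed at the pivot
    for hq in range(Hq):
        for wq in range(Wq):
            out[hq, wq] += conv2d_same(corr[hq, wq], k_support)
    return out
```

With 3x3 2D kernels, this keeps 2 x 9 = 18 weights per 4D kernel instead of 81, which is what makes stacking many such layers in the pyramid encoder affordable.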

Experimental results

1. Results on PASCAL-5i dataset.

Table 1. Performance on PASCAL-5i in mIoU and FB-IoU. Some results are from [4, 35, 67, 71, 76]. Superscript † denotes our model without support feature masking (Eqn. 1). Numbers in bold indicate the best performance and underlined ones are the second best.

2. Results on COCO-20i and FSS-1000.

Table 2. Performance on COCO-20i (left) and FSS-1000 (right) in mIoU and FB-IoU. Some results are from [2, 4, 35, 67, 71, 76].

3. Qualitative results

Figure 4. Qualitative (1-shot) results on the datasets in the presence of large differences in object scales and extremely small objects.

Acknowledgements

This work was supported by Samsung Advanced Institute of Technology (SAIT), the NRF grants (NRF-2017R1E1A1A01077999, NRF-2021R1A2C3012728), and the IITP grant (No.2019-0-01906, AI Graduate School Program - POSTECH) funded by Ministry of Science and ICT, Korea.

Papers

Hypercorrelation Squeeze for Few-Shot Segmentation
Juhong Min, Dahyun Kang, and Minsu Cho
arXiv preprint, 2021
[arXiv] [Bibtex]

Code

Check our GitHub repository: [github]