Dahyun Kang, Heeseung Kwon, Juhong Min, Minsu Cho
Pohang University of Science and Technology (POSTECH), South Korea
We propose to address the problem of few-shot classification by meta-learning "what to observe" and "where to attend" from a relational perspective. Our method leverages relational patterns within and between images via self-correlational representation (SCR) and cross-correlational attention (CCA). Within each image, the SCR module transforms a base feature map into a self-correlation tensor and learns to extract structural patterns from it. Between images, the CCA module computes the cross-correlation between two image representations and learns to produce co-attention between them. Our Relational Embedding Networks (RENet) combine the two relational modules to learn relational embeddings in an end-to-end manner. In experimental evaluation, RENet achieves consistent improvements over state-of-the-art methods on four widely used few-shot classification benchmarks: miniImageNet, tieredImageNet, CUB-200-2011, and CIFAR-FS.
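The two operations described above can be sketched in a few lines. Below is a minimal NumPy illustration, not the paper's implementation: it assumes cosine similarity between channel-normalized feature vectors, a hypothetical `self_correlation` that correlates each spatial position with its local k×k neighborhood (yielding the self-correlation tensor the SCR module would process), and a simplified `cross_attention` that softmaxes the query–support cross-correlation into co-attention maps. Function names, the temperature `tau`, and the averaging scheme are our assumptions.

```python
import numpy as np

def self_correlation(feat, k=3):
    """Correlate each position with its k x k neighborhood (zero padding).
    feat: (C, H, W) feature map. Returns a (H, W, k, k) tensor of cosine
    similarities (a simplified self-correlation tensor)."""
    C, H, W = feat.shape
    f = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-8)
    pad = k // 2
    fp = np.pad(f, ((0, 0), (pad, pad), (pad, pad)))
    out = np.empty((H, W, k, k))
    for i in range(H):
        for j in range(W):
            neigh = fp[:, i:i + k, j:j + k]          # (C, k, k) local window
            out[i, j] = np.einsum('c,ckl->kl', f[:, i, j], neigh)
    return out

def _softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(fq, fs, tau=5.0):
    """Cosine cross-correlation between query and support feature maps,
    turned into co-attention maps by softmax over the other image's
    positions and averaging (a simplification, not the paper's CCA)."""
    C, H, W = fq.shape
    q = fq.reshape(C, -1)
    s = fs.reshape(C, -1)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + 1e-8)
    corr = q.T @ s                                   # (HW_q, HW_s)
    attn_q = _softmax(corr / tau, axis=0).mean(axis=1).reshape(H, W)
    attn_s = _softmax(corr / tau, axis=1).mean(axis=0).reshape(H, W)
    return attn_q, attn_s
```

Note that each co-attention map sums to 1 over spatial positions, and the center entry of every k×k self-correlation patch is 1 (a normalized vector correlated with itself).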
[Figure: Architecture of SCR and CCA.]
[Figure: Effects of SCR and CCA.]
[Figure: Results on (a) miniImageNet, (b) tieredImageNet, (c) CUB-200-2011, and (d) CIFAR-FS.]
This work was supported by Samsung Electronics Co., Ltd. (IO201208-07822-01) and by IITP grants (No. 2019-0-01906, AI Graduate School Program, POSTECH; No. 2021-0-00537, Visual common sense through self-supervised learning for restoration of invisible parts in images) funded by the Ministry of Science and ICT, Korea.
Check our GitHub repository: [GitHub]