Taeyoung Son¹ | Juwon Kang¹ | Namyup Kim¹ | Sunghyun Cho¹,² | Suha Kwak¹,²
¹POSTECH | ²GSAI
Despite the great advances in visual recognition, it has been observed that recognition models trained on clean images of common datasets are not robust against distorted images in the real world. To tackle this issue, we present a Universal and Recognition-friendly Image Enhancement network, dubbed URIE, which is attached in front of existing recognition models and enhances distorted input to improve their performance without retraining them. URIE is universal in that it aims to handle various factors of image degradation and to be incorporated with any arbitrary recognition model. It is also recognition-friendly since it is optimized to improve the robustness of the following recognition models rather than the perceptual quality of its output images. Our experiments demonstrate that URIE can handle various and latent image distortions and improve the performance of existing models on five diverse recognition tasks where input images are degraded.
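To make the training setup concrete, below is a minimal PyTorch sketch of this idea. The toy residual enhancer stands in for the full URIE architecture of Fig. 1, and the frozen torchvision ResNet-50 stands in for an arbitrary downstream recognition model; both are our assumptions for illustration. The key point matches the abstract: the enhancer is trained with the recognition loss (cross-entropy) rather than a pixel-wise reconstruction loss, and the recognition model itself is never retrained.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Hypothetical stand-in for URIE; the real model is the SEM-based
# architecture shown in Fig. 1.
class Enhancer(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, x):
        # Predict a residual so the identity mapping is easy to learn.
        return x + self.body(x)

enhancer = Enhancer()
classifier = models.resnet50(pretrained=True)
classifier.eval()
for p in classifier.parameters():   # the recognition model stays frozen
    p.requires_grad_(False)

optimizer = torch.optim.Adam(enhancer.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()   # recognition loss, not MSE

def train_step(distorted, labels):
    logits = classifier(enhancer(distorted))
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()                 # gradients flow through the frozen classifier
    optimizer.step()
    return loss.item()
```

At test time the same `enhancer` is simply prepended to any recognition model, which is what makes the approach universal across tasks.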
Figure 1. Overall architecture of URIE. The Selective Enhancement Modules (SEMs) are indicated by gray rectangles. Details of these modules are illustrated in Fig. 2.
Figure 2. Details of SEM. ⊕ and ⊗ indicate element-wise summation and multiplication between feature maps, respectively.
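As a rough illustration of how the ⊕ and ⊗ operations above can realize a selective module, here is a hedged PyTorch sketch that fuses parallel branches with softmax attention. The branch design, reduction ratio, and attention layout are our assumptions, not the exact SEM of the paper; see Fig. 2 for the actual module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveEnhancementModule(nn.Module):
    """Sketch of a selective module: parallel branches, attention-weighted fusion."""

    def __init__(self, channels, num_branches=2, reduction=4):
        super().__init__()
        # Parallel branches with growing receptive fields (an assumption).
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3 + 2 * i, padding=1 + i)
            for i in range(num_branches)
        )
        hidden = max(channels // reduction, 8)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels * num_branches),
        )
        self.num_branches = num_branches
        self.channels = channels

    def forward(self, x):
        feats = torch.stack([b(x) for b in self.branches], dim=1)  # (B, K, C, H, W)
        # ⊕: element-wise summation of branch outputs, then global pooling.
        pooled = feats.sum(dim=1).mean(dim=(2, 3))                 # (B, C)
        attn = self.fc(pooled).view(-1, self.num_branches, self.channels)
        attn = F.softmax(attn, dim=1).unsqueeze(-1).unsqueeze(-1)  # per-branch weights
        # ⊗: element-wise multiplication of each branch by its attention weights.
        return (feats * attn).sum(dim=1)
```

The softmax over branches lets the module "select" which enhancement path to emphasize per channel, which is the intuition behind a selective enhancement module.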
Figure 3. Example outputs of URIE. (a) Distorted input images. (b) Outputs of URIE. (c) Ground-truth images. (d) Magnitudes of per-pixel intensity change by URIE.
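A map like the one in Fig. 3 (d) can be computed in a few lines, under the assumption that it is the per-pixel L2 norm of the RGB difference between input and output, normalized to [0, 1] for display (the paper may scale it differently):

```python
import torch

def change_magnitude(inp, out):
    """Per-pixel magnitude of intensity change, as in Fig. 3 (d).

    inp, out: (3, H, W) tensors; returns an (H, W) map in [0, 1].
    """
    diff = (out - inp).norm(p=2, dim=0)          # L2 norm over RGB channels
    return diff / diff.max().clamp(min=1e-8)     # normalize for visualization
```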
Table 1. Classification accuracy on the ImageNet dataset. The numbers in parentheses indicate the differences from the baseline. V16, R50, and R101 denote VGG-16, ResNet-50, and ResNet-101, respectively.
Table 2. Classification accuracy on the CUB dataset. The numbers in parentheses indicate the differences from the baseline. V16, R50, and R101 denote VGG-16, ResNet-50, and ResNet-101, respectively.
Table 3. Object detection performance of SSD 300 in mAP (%) on the VOC 2007 dataset. The numbers in parentheses indicate the differences from the baseline.
Table 4. Semantic segmentation performance of DeepLab v3 in mIoU (%) on the VOC 2012 dataset. The numbers in parentheses indicate the differences from the baseline.
Table 5. Accuracy of the ResNet-50 classifier on the Haze-20 and HazeClear-20 datasets. The numbers in parentheses indicate the differences from the baseline.
Figure 4. Qualitative results on the CUB dataset. (a) Distorted input images. (b) OWAN. (c) URIE-MSE. (d) URIE. (e) Ground-truth images. For each image, the Grad-CAM drawn by the ResNet-50 classifier is presented alongside. Examples in the first row are degraded by seen corruptions and the others by unseen corruptions.
Figure 5. Qualitative results of SSD 300 on the VOC 2007 dataset. (a) Corrupted input. (b) OWAN. (c) URIE-MSE. (d) URIE. (e) Ground-truth. Examples in the top three rows are degraded by seen corruptions and the others are by unseen corruptions.
Figure 6. Qualitative results of DeepLab v3 on the VOC 2012 dataset. (a) Corrupted input. (b) OWAN. (c) URIE-MSE. (d) URIE. (e) Ground-truth. Examples in the top two rows are degraded by seen corruptions and the others are by unseen corruptions.
Figure 7. Qualitative results on the Haze-20 dataset. (a) Corrupted input. (b) OWAN. (c) URIE-MSE. (d) URIE. For each example, the top-1 prediction of the ResNet-50 classifier, its confidence score, and its Grad-CAM are presented alongside.
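The Grad-CAM visualizations in Figs. 4 and 7 follow the standard recipe of Selvaraju et al.; below is a minimal sketch for a torchvision ResNet-50. The choice of `layer4` as the target layer and the max-normalization are our assumptions, not necessarily the paper's exact setup.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

model = models.resnet50(pretrained=True).eval()

# Hooks capture the last convolutional feature map and its gradient.
feats, grads = {}, {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))
model.layer4.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

def grad_cam(image, class_idx=None):
    """image: (1, 3, H, W) normalized tensor; returns an (H, W) heat map."""
    logits = model(image)
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))    # top-1 prediction
    model.zero_grad()
    logits[0, class_idx].backward()
    weights = grads['a'].mean(dim=(2, 3), keepdim=True)   # GAP over gradients
    cam = F.relu((weights * feats['a']).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode='bilinear',
                        align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze()
```

Running this on a distorted image and on its URIE-enhanced counterpart shows how enhancement shifts the classifier's attention, which is what the side-by-side Grad-CAMs in the figures illustrate.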
Check out our GitHub repository: [github]