Abstract

Domain generalization for semantic segmentation is in high demand in real-world applications, where a trained model is expected to work well in previously unseen domains. One key challenge lies in the lack of training data that covers the diverse distributions of possible unseen domains. In this paper, we propose a WEb-image assisted Domain GEneralization (WEDGE) scheme, which is the first to exploit the diversity of web-crawled images for generalizable semantic segmentation. To explore and exploit real-world data distributions, we collect web-crawled images that present large diversity in terms of weather conditions, sites, lighting, camera styles, etc. We also present a method that injects the styles of web-crawled images into source training images on the fly during training, which enables the network to experience images of diverse styles with reliable labels for effective training. Moreover, we train the network on the web-crawled images with their predicted pseudo labels to further enhance its capability. Extensive experiments demonstrate that our method clearly outperforms existing domain generalization techniques.
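To make the style-injection idea concrete, below is a minimal sketch of how feature statistics of a web image could be transferred to source features on the fly, assuming an AdaIN-like channel-wise mean/std swap at an intermediate layer of the segmentation backbone. The function name, the mixing weight alpha, and the injection location are illustrative assumptions, not the exact WEDGE module.

```python
# Minimal sketch of on-the-fly style injection via feature-statistic transfer.
# This is an AdaIN-like illustration, not the exact WEDGE module; `alpha` and
# the layer at which injection happens are assumptions.
import torch


def inject_style(src_feat: torch.Tensor,
                 web_feat: torch.Tensor,
                 alpha: float = 1.0,
                 eps: float = 1e-5) -> torch.Tensor:
    """Replace channel-wise mean/std of source features with those of web features.

    src_feat, web_feat: feature maps of shape (N, C, H, W).
    alpha: interpolation weight between original and stylized features (assumed).
    """
    src_mu = src_feat.mean(dim=(2, 3), keepdim=True)
    src_std = src_feat.std(dim=(2, 3), keepdim=True) + eps
    web_mu = web_feat.mean(dim=(2, 3), keepdim=True)
    web_std = web_feat.std(dim=(2, 3), keepdim=True) + eps

    normalized = (src_feat - src_mu) / src_std          # strip source style
    stylized = normalized * web_std + web_mu            # apply web-image style
    return alpha * stylized + (1.0 - alpha) * src_feat  # optional soft mixing
```

During training, a web image sampled for the batch would be passed through the shared encoder to obtain web_feat, the source features would be stylized with inject_style, and the segmentation loss would be computed against the original source labels, which remain valid because only feature statistics, not spatial content, are changed.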

Video 1. Representative results of WEDGE.

Overall framework of WEDGE

Figure 1. Overall framework of WEDGE. (1) Crawling real and task-relevant images from the Web automatically. (2) Learning semantic segmentation while transferring feature statistics of web images to features of synthetic training images in the source domain. (3) Further training the model using both source images and web-crawled images with predicted pseudo labels.
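For the third step, a hedged sketch of pseudo-label generation for web-crawled images is shown below, assuming a standard confidence-thresholding scheme; the threshold tau and the ignore index are illustrative assumptions rather than the exact values used in WEDGE.

```python
# Minimal sketch of pseudo-label generation for web-crawled images using the
# model trained in the previous step. `tau` and `ignore_index` are assumed
# values for illustration, not the exact WEDGE settings.
import torch
import torch.nn.functional as F


@torch.no_grad()
def make_pseudo_label(model, web_image: torch.Tensor,
                      tau: float = 0.9, ignore_index: int = 255) -> torch.Tensor:
    """web_image: (N, 3, H, W) batch; returns an (N, H, W) pseudo-label map."""
    logits = model(web_image)            # assumed to return (N, num_classes, H, W)
    probs = F.softmax(logits, dim=1)
    conf, label = probs.max(dim=1)       # per-pixel confidence and class index
    label[conf < tau] = ignore_index     # discard unreliable pixels
    return label
```

The model would then be further trained on mini-batches drawn from both the labeled source images and the web images paired with these pseudo labels, e.g. with a cross-entropy loss that skips the ignore_index pixels.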

Experimental results

1. Performance comparison with other methods

Table 1. (Left) Quantitative results in mIoU of domain generalization from (G)TA5 to (C)ityscapes, (B)DDS, and (M)apillary. (Right) Quantitative results in mIoU of domain generalization from (S)YNTHIA to (C)ityscapes, (B)DDS, and (M)apillary. WEDGE clearly outperforms all previous methods in all 12 experiments.

2. In-depth analysis of WEDGE

Table 2. Performance of WEDGE for domain generalization from (G)TA5 and (S)YNTHIA to (C)ityscapes, (B)DDS, and (M)apillary. The results show that the first stage, style injection (SI), contributes most to the performance in most experiments, demonstrating the effectiveness of web-crawled images and of our style injection module for domain generalization. The second stage, pseudo labeling (PL), also leads to a non-trivial performance improvement.

Table 3. (Left) Domain generalization performance of WEDGE with each variant of style injection methods and ours. While AdaIN and MAST also improve performance, our method achieves the best results in both the SI and PL stages, except for the GTA5-to-BDDS case in the SI stage. (Right) Domain generalization performance of WEDGE with different types of style reference. Using real yet task-irrelevant images also improves performance, suggesting the robustness of our method, but the results remain inferior to those obtained with our web-crawled images, indicating that our crawling strategy is useful and that using task-relevant images matters.

Figure 2. Domain generalization performance of WEDGE versus the number of web images. In all experiments, ResNet101 is used as the backbone. The generalization capability of the model is substantially improved by using only 1,000 web images, while using the whole web dataset further improves performance.

3. Qualitative results

Figure 3. Qualitative results of WEDGE and its baseline, both using the ResNet101 backbone and trained on the GTA5 dataset.

Acknowledgements

This work was supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-IT1801-52. This work was done while Namyup Kim was working as an intern at Microsoft Research Asia.

Paper

WEDGE: Web-Image Assisted Domain Generalization for Semantic Segmentation
Namyup Kim, Taeyoung Son, Jaehyun Pahk, Cuiling Lan, Wenjun Zeng, and Suha Kwak
ICRA, 2023
[arXiv] [Bibtex]