Abstract

We consider test-time adaptation (TTA), the task of adapting a trained model to an arbitrary test domain on the fly using unlabeled input data during testing. Common practice in TTA disregards the data used in training due to their large memory demand and the risk of privacy leakage. However, the training data are the only source of supervision. This motivates us to investigate a proper way of using them while minimizing the side effects. To this end, we propose two lightweight yet informative proxies of the training data and a TTA method that fully exploits them. The first proxy is a small number of images synthesized (hence, less privacy-sensitive) by data condensation, which minimizes their domain-specificity to capture a general underlying structure over a wide spectrum of domains. During TTA, these condensed images are translated into labeled test data by stylizing them to match the styles of unlabeled test samples, which enables virtually supervised test-time training. The second proxy is the inter-class relations of the training data, which are transferred to the target model during TTA. On four public benchmarks, our method outperforms the state of the art with remarkably less computation and memory.
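A minimal sketch of the test-stylization idea, assuming an AdaIN-style encoder/decoder pair; `enc` and `dec` are hypothetical pretrained modules (not the paper's released code), and only the statistics-matching step below is concrete. Condensed images keep their content and labels while their channel-wise feature statistics are replaced with those of the current test batch.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Replace channel-wise statistics of `content_feat` with those of
    `style_feat` (pooled over the whole style batch, so the two batches
    may have different sizes)."""
    b, c = content_feat.shape[:2]
    flat = content_feat.view(b, c, -1)
    c_mean = flat.mean(dim=2).view(b, c, 1, 1)
    c_std = flat.std(dim=2).view(b, c, 1, 1) + eps
    s = style_feat.transpose(0, 1).reshape(style_feat.size(1), -1)
    s_mean = s.mean(dim=1).view(1, -1, 1, 1)
    s_std = s.std(dim=1).view(1, -1, 1, 1) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

def stylize(condensed, test_batch, enc, dec):
    """Stylize labeled condensed images with the style of unlabeled test
    images, yielding labeled data that resembles the test domain."""
    with torch.no_grad():
        stylized_feat = adain(enc(condensed), enc(test_batch))
        return dec(stylized_feat)
```

Because labels travel with the content, the stylized images can drive a standard supervised loss in the test domain.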

Proposed Method

Figure 1. The overall architecture and training objectives of the proposed model. Condensed examples of the training data are stylized using input test examples and, together with their labels, used for supervised test-time training (Lsup). They are also used for contrastive learning that reduces the discrepancy between test-stylized condensed data and test data in the feature space (Lcontra). Meanwhile, inter-class relations of the training data regularize predictions for test examples through class-relation knowledge distillation, so that the inter-class relations of the predictions closely approximate those of the training data (LCRKD). Finally, we apply consistency regularization with augmented test examples to further boost performance by directly exploiting unlabeled test examples for TTA (LFixMatch). A simplified sketch of one adaptation step combining these objectives is given below.
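The following is a simplified sketch of one TTA step under assumptions, not the released implementation: `model` is assumed to return (features, logits), `stylize` is the sketch above, `weak_aug`/`strong_aug` are stand-ins for FixMatch-style augmentations, and the Lcontra term is reduced to batch-mean feature alignment in place of the paper's contrastive objective. `rel_src` is assumed to be a C x C row-stochastic matrix whose i-th row is the softened average prediction of class-i training samples.

```python
import torch
import torch.nn.functional as F

def weak_aug(x):    # stand-in for weak augmentation (random horizontal flip)
    return torch.flip(x, dims=[3]) if torch.rand(()) < 0.5 else x

def strong_aug(x):  # stand-in for RandAugment-style strong augmentation
    return x + 0.1 * torch.randn_like(x)

def tta_step(model, opt, condensed_x, condensed_y, rel_src, test_x,
             enc, dec, weights=(1.0, 1.0, 1.0, 1.0), tau=0.95, temp=4.0):
    styled_x = stylize(condensed_x, test_x, enc, dec)  # from the sketch above
    feat_sty, logit_sty = model(styled_x)
    feat_tst, logit_tst = model(test_x)

    # Lsup: supervised loss on test-stylized condensed data (labels known).
    l_sup = F.cross_entropy(logit_sty, condensed_y)

    # Lcontra (simplified): align stylized-condensed and test features;
    # the paper uses a contrastive objective rather than this mean alignment.
    l_contra = 1 - F.cosine_similarity(feat_sty.mean(0), feat_tst.mean(0), dim=0)

    # LCRKD (simplified): distill precomputed inter-class relations of the
    # training data into softened test predictions via KL divergence.
    log_p = F.log_softmax(logit_tst / temp, dim=1)
    target = rel_src[logit_tst.argmax(dim=1)]  # soft target per test sample
    l_crkd = F.kl_div(log_p, target, reduction="batchmean")

    # LFixMatch: consistency between weak/strong views of test data,
    # keeping only confident pseudo-labels (threshold `tau`).
    with torch.no_grad():
        p_weak = F.softmax(model(weak_aug(test_x))[1], dim=1)
        conf, pseudo = p_weak.max(dim=1)
    logit_strong = model(strong_aug(test_x))[1]
    mask = (conf >= tau).float()
    l_fix = (F.cross_entropy(logit_strong, pseudo, reduction="none") * mask).mean()

    loss = (weights[0] * l_sup + weights[1] * l_contra
            + weights[2] * l_crkd + weights[3] * l_fix)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

Setting weights[3] = 0 disables the LFixMatch term, matching the variant reported in Table 1 where it is dropped for fair comparison.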

Experimental Results

1. Robustness to common image corruption

Table 1. Results of TTA with ResNet backbones on CIFAR10-C, CIFAR100-C, and TinyImageNet-C, averaged across all 15 corruptions and 5 severity levels. We report average accuracy (%) and mark the best performance in bold. RN denotes ResNet. Note that LFixMatch is not used in Ours for a fair comparison.

Table 2. Classification accuracy (%) on the CIFAR10-to-CIFAR10-C online continual TTA task. Results are evaluated on WideResNet-28 with the highest corruption severity level. We mark the best and second-best performance in bold and underline, respectively. * denotes methods that require additional domain information about the input data to reset the model.

2. Adaptation from synthetic to real

Table 3. Classification accuracy (%) on VisDA-C train → val. We mark the best and second-best performance in bold and underline, respectively. † denotes offline unsupervised domain adaptation methods where the number of source images is equal to the number of condensed images in our method.

3. Ablation study

Table 4. Ablation study on combinations of the losses and test stylization (ST), evaluated on online continual TTA on CIFAR10-C and synthetic-to-real adaptation on VisDA-C.

4. Visualization

Figure 2. Examples of condensed data, where each column presents 10 images of a class from CIFAR10 and VisDA-C.

Acknowledgement

This work was supported by the Institute of Information & communications Technology Planning & Evaluation grants funded by the Ministry of Science and ICT, Korea (IITP-2020-0-00842, IITP-2021-0-00739), Samsung Electronics Co., Ltd. (IO201210-07948-01), and the Samsung Research Funding & Incubation Center of Samsung Electronics (SRFC-IT1801-05).

Paper

Leveraging Proxy of Training Data for Test-Time Adaptation
Juwon Kang, Nayeong Kim, Donghyeon Kwon, Jungseul Ok, and Suha Kwak
ICML, 2023
[Proc] [Bibtex] [Code]