Abstract

Test-time adaptation (TTA) has emerged as a promising approach to dealing with latent distribution shifts between training and testing data. However, most existing TTA methods struggle with small input batches, as they rely heavily on batch statistics that become unreliable as the batch size decreases. In this paper, we introduce memory-based batch normalization (MemBN) to enhance the robustness of TTA across a wide range of batch sizes. MemBN equips each batch normalization layer with a statistics memory queue that accumulates the latest test batch statistics. Through dedicated memory management and aggregation algorithms, MemBN estimates reliable statistics that faithfully represent the data distribution of the test domain at hand, leading to improved performance and robust test-time adaptation. Extensive experiments under a large variety of TTA scenarios demonstrate MemBN's superiority in terms of both accuracy and robustness.

Figure 1. The core concept of MemBN. A MemBN layer retains in-batch statistics of test data in a memory queue and aggregates the stored statistics to derive more reliable ones. As more statistics are progressively accumulated, the average memory statistics closely approximate the feature distribution of the test data, similar to statistics derived from large input batches in standard batch normalization (as illustrated on the right).
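
The effect illustrated in Figure 1 can be reproduced with a few lines of PyTorch on toy data: averaging the in-batch means of many small test batches approaches the mean computed from one large batch. The tensor sizes and batch size below are arbitrary choices for illustration, not settings from the paper.

```python
import torch

torch.manual_seed(0)
# Toy "test-domain" features: 4096 samples with 64 channels (arbitrary sizes).
feats = torch.randn(4096, 64) * 2.0 + 1.0

large_mean = feats.mean(dim=0)                            # statistics of a large batch
small_means = [b.mean(dim=0) for b in feats.split(16)]    # in-batch means at batch size 16
avg_memory_mean = torch.stack(small_means).mean(dim=0)    # average memory statistics

print((small_means[0] - large_mean).norm())   # a single small batch: noisy estimate
print((avg_memory_mean - large_mean).norm())  # averaged over the memory: much closer
```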

Overall pipeline of MemBN

Figure 2. (a) Updating the memory queues with in-batch statistics of the current inputs (triangles) and calculating the average memory statistics (red star). This update occurs sequentially as each new input batch is processed. (b) Calculating a layer-wise adaptive weight $\alpha$ and deriving normalization statistics (yellow star) by mixing the average memory statistics and the source statistics. (c) Normalizing the input features using the derived normalization statistics. (d) Resetting the memory queues when a shift to a new domain is detected, so that outdated statistics do not affect normalization within the new domain.
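
As a concrete reference for the four steps above, the following is a minimal PyTorch sketch of a MemBN-style layer. The memory capacity, the handling of domain-shift detection, and in particular the form of the adaptive weight $\alpha$ (taken here as the fill ratio of the queue) are illustrative assumptions rather than the paper's exact formulation; `MemBN2d`, `capacity`, and `reset_memory` are hypothetical names.

```python
from collections import deque

import torch
import torch.nn as nn


class MemBN2d(nn.BatchNorm2d):
    """Sketch of a MemBN-style layer following the steps of Figure 2 (illustrative)."""

    def __init__(self, num_features, capacity=16, eps=1e-5):
        super().__init__(num_features, eps=eps)
        self.capacity = capacity
        self.mem_means = deque(maxlen=capacity)   # queue of per-channel means
        self.mem_vars = deque(maxlen=capacity)    # queue of per-channel variances

    def forward(self, x):
        # (a) push in-batch statistics of the current test batch, then average the memory
        self.mem_means.append(x.mean(dim=(0, 2, 3)).detach())
        self.mem_vars.append(x.var(dim=(0, 2, 3), unbiased=False).detach())
        mem_mean = torch.stack(list(self.mem_means)).mean(dim=0)
        mem_var = torch.stack(list(self.mem_vars)).mean(dim=0)

        # (b) layer-wise adaptive weight; here simply the fill ratio of the queue (an
        # assumption), so the layer leans on source statistics until the memory fills up
        alpha = len(self.mem_means) / self.capacity
        mean = alpha * mem_mean + (1 - alpha) * self.running_mean
        var = alpha * mem_var + (1 - alpha) * self.running_var

        # (c) normalize with the derived statistics, then apply the affine parameters
        x_hat = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + self.eps)
        return self.weight[None, :, None, None] * x_hat + self.bias[None, :, None, None]

    def reset_memory(self):
        # (d) clear outdated statistics when a shift to a new domain is detected
        self.mem_means.clear()
        self.mem_vars.clear()
```

In use, each BatchNorm2d layer of the pre-trained source model would be replaced by such a layer while keeping its running statistics and affine parameters, so the source statistics remain available for mixing.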

Experimental results

1. Single domain TTA for image classification

Table 1. Single domain adaptation on CIFAR10-C and CIFAR100-C. Error rate (↓) averaged over 15 corruptions with severity level 5 using WideResNet-40-2 for each test batch size. We mark the best and second-best performance in bold and underline, respectively.

Table 2. Single domain adaptation on ImageNet-C. Error rate (↓) averaged over 15 corruptions with severity level 5 using ResNet-50 for each test batch size. We mark the best performance in bold. Results obtained with a pre-trained model using GN instead of BN are marked with †.

2. Continual TTA for image classification

Table 3. Continual domain adaptation on CIFAR10-C and CIFAR100-C. Error rate (↓) averaged over 15 corruptions with severity level 5 using WideResNet-40-2 for each batch size. We mark the best and second-best performance in bold and underline, respectively.

3. Temporally-correlated TTA for image classification

Table 4. Temporally correlated (non-i.i.d.) domain adaptation on ImageNet-C. Error rate (↓) averaged over 15 corruption types with severity level 5 using ResNet-18 for each test batch size. We mark the best performance in bold.

4. Experiment on semantic segmentation

Table 5. Adaptation on DG benchmarks in semantic segmentation. mIoU (↑) on four unseen domains using ResNet-50-based DeepLabV3+. We mark the best performance in bold.

5. Ablation study

Table 6. Ablation study on each component of our method in continual domain adaptation on CIFAR10-C and CIFAR100-C. Error rate (↓) averaged over 15 corruptions with severity level 5 using WideResNet-40-2 for each batch size. We mark the best and second-best performance in bold and underline, respectively.

6. In-depth analysis: Normalization statistics of MemBN

Figure 3. Histograms of (i) distances between in-batch statistics computed with a small batch size and those computed with a large batch size, and (ii) distances between the normalization statistics of MemBN and the in-batch statistics of the large batch size.

Acknowledgements

This work was supported by Samsung Research Funding & Incubation Center of Samsung Electronics (SRFC-IT1801-52), Samsung Electronics Co., Ltd (IO201210-07948-01), the NRF grant (NRF-2021R1A2C3012728) and the IITP grants funded by Ministry of Science and ICT, Korea (No.RS-2021-II210739).

Paper

MemBN: Robust Test-Time Adaptation via Batch Norm with Statistics Memory
Juwon Kang, Nayeong Kim, Jungseul Ok, and Suha Kwak
ECCV, 2024
[Paper] [Bibtex]