TENT: FULLY TEST-TIME ADAPTATION BY ENTROPY MINIMIZATION

둔비의 공부공간

Doonby 2023. 9. 6. 12:47

https://arxiv.org/abs/2006.10726

Abstract

A model should be able to generalize to new or different data, even while it is being tested.

To this end, the authors propose test entropy minimization (tent).

The method estimates normalization statistics on each batch and optimizes channel-wise affine transformations.

It reduces generalization error on corrupted versions of ImageNet, CIFAR-10, and CIFAR-100.

These results are reached in one epoch of test-time optimization, without altering training.

Introduction

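Deep learning models perform poorly on new or different data (dataset shift).

A model therefore needs to adapt itself to test data it never saw during training.

The setting is "fully test-time adaptation": at test time the model may use only its learned parameters and the target data.

It must not use the source data or the supervision that were used for training.

Tent minimizes the entropy of the model's predictions, which means optimizing the model toward 'confident' predictions.

One can see that both entropy and loss grow as the corruption gets more severe.

In other words, heavier corruption makes classification harder, so loss and error rise, and entropy rises with them.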
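
To make the link between entropy and confidence concrete, here is a minimal sketch in plain Python. The function names `softmax` and `entropy` and the example logits are illustrative assumptions, not taken from the paper or its code.

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy H(p) = -sum_c p_c * log(p_c), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Made-up logits: one confident prediction and one uncertain prediction.
confident = softmax([8.0, 0.5, 0.1])
uncertain = softmax([1.1, 1.0, 0.9])

print(f"confident: {entropy(confident):.3f} nats")
print(f"uncertain: {entropy(uncertain):.3f} nats")
print(f"upper bound log(3): {math.log(3):.3f} nats")
```

The confident prediction has entropy close to zero, while the uncertain one sits close to the upper bound of log(3) for three classes. Minimizing entropy therefore pushes predictions toward the confident end.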

To minimize entropy, tent optimizes the affine parameters batch by batch, and estimates the statistics of the target data to normalize and transform the features.

Choosing a low-dimensional, channel-wise feature modulation keeps adaptation efficient during testing.

The contributions are as follows.

It enables adaptation at test time using only target data, with no source data.
It reports results for both online and offline updates.
It examines entropy as an adaptation objective and proposes test-time entropy minimization (tent).
It achieves lower error rates on robustness to corruptions.
For digit classification and semantic segmentation, it supports online, source-free adaptation, and it is competitive with methods that use source data and more optimization.

Setting

The goal is simple.

A model $f_{\theta}(x)$ with parameters $\theta$, trained on source data and labels $x^{s}$, $y^{s}$, is adapted to shifted target data $x^{t}$.

Domain adaptation uses both source and target data.

Test-time training needs both a supervised loss $L(x^{s}, y^{s})$ and a self-supervised loss $L(x^{s})$ during the initial training.

Fully test-time adaptation needs neither: it uses only the target data $x^{t}$ at test time.

Method

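The test-time loss $L(x^{t})$ is the entropy $H(\hat{y}) = -\sum_{c} p(\hat{y}_{c}) \log p(\hat{y}_{c})$ of the model prediction $\hat{y} = f_{\theta}(x^{t})$, and tent minimizes it.

$\theta$ is the only representation of the training/source data in this setting. Altering $\theta$ could therefore make the model diverge from what it learned.

Moreover, $f$ is nonlinear and $\theta$ is high-dimensional, so optimizing all of $\theta$ is too sensitive and too inefficient for test-time usage.

So, for both stability and efficiency, tent updates only feature modulations that are linear (scale and shift) and low-dimensional (channel-wise).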

Two steps of modulations

normalization by statistics
transformation by parameters

Normalization by statistics normalizes the input $x$ to $\bar{x} = (x - \mu) / \sigma$.

Transformation by parameters then maps $\bar{x}$ to $x' = \gamma \bar{x} + \beta$.

$\mu$ and $\sigma$ are statistics of the data (its mean and standard deviation), while $\gamma$ and $\beta$ are parameters optimized by the loss.
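
The two modulation steps can be sketched in plain Python as follows. This is only an illustration under assumptions: the names `normalize` and `transform`, the small constant `eps` (the usual batch-norm safeguard against division by zero), and the example batch are not from the paper.

```python
import math

def normalize(batch, eps=1e-5):
    """Standardize each channel with the mean and variance of this batch.

    `batch` is a list of samples; each sample is a list of channel values.
    `eps` is an assumed stabilizer, as is common in batch normalization.
    """
    n = len(batch)
    num_channels = len(batch[0])
    mu = [sum(x[k] for x in batch) / n for k in range(num_channels)]
    var = [sum((x[k] - mu[k]) ** 2 for x in batch) / n
           for k in range(num_channels)]
    return [[(x[k] - mu[k]) / math.sqrt(var[k] + eps)
             for k in range(num_channels)]
            for x in batch]

def transform(x_bar, gamma, beta):
    """Channel-wise affine map x' = gamma * x_bar + beta."""
    return [[g * v + b for v, g, b in zip(row, gamma, beta)] for row in x_bar]

# Made-up target batch: 4 samples, 2 channels, shifted away from zero mean.
target_batch = [[10.0, -3.0], [12.0, -1.0], [14.0, -5.0], [16.0, -3.0]]
x_bar = normalize(target_batch)
x_prime = transform(x_bar, gamma=[1.0, 2.0], beta=[0.0, 0.5])
```

After `normalize`, every channel has approximately zero mean and unit variance no matter how the target batch was shifted. `transform` then rescales and shifts each channel with one `gamma` and one `beta`, so there are only two numbers per channel to adapt.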

The affine transformation parameters $\{\gamma_{l,k}, \beta_{l,k}\}$ of every normalization layer $l$ and channel $k$ are collected for optimization,

and all remaining parameters are kept fixed.

When test data arrives, its mean and variance are used to compute $\bar{x}$, and the entropy loss is backpropagated to update $\gamma$ and $\beta$.
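
The update described above can be sketched end to end on a toy model. This is an assumption-heavy illustration, not the authors' implementation: it uses a single normalization layer followed by a frozen linear classifier, writes the gradient by hand instead of using a deep learning framework, and the names `tent_step`, `weights`, `lr`, and `eps` as well as all numbers are made up.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def tent_step(batch, gamma, beta, weights, lr=0.1, eps=1e-5):
    """One entropy-minimization step on gamma/beta; `weights` stay frozen.

    Returns the mean prediction entropy of the batch before the update.
    """
    n, k_dim = len(batch), len(gamma)
    # Step 1: normalize with the statistics of the current target batch.
    mu = [sum(x[k] for x in batch) / n for k in range(k_dim)]
    var = [sum((x[k] - mu[k]) ** 2 for x in batch) / n for k in range(k_dim)]
    x_bar = [[(x[k] - mu[k]) / math.sqrt(var[k] + eps) for k in range(k_dim)]
             for x in batch]
    grad_gamma, grad_beta, mean_entropy = [0.0] * k_dim, [0.0] * k_dim, 0.0
    for xb in x_bar:
        # Step 2: channel-wise affine transform, then the frozen classifier.
        x_prime = [gamma[k] * xb[k] + beta[k] for k in range(k_dim)]
        logits = [sum(w * v for w, v in zip(row, x_prime)) for row in weights]
        p = softmax(logits)
        h = -sum(pc * math.log(pc) for pc in p if pc > 0.0)
        mean_entropy += h / n
        # Gradient of the entropy w.r.t. each logit: -p_c * (log p_c + H).
        g_logit = [-pc * (math.log(pc) + h) if pc > 0.0 else 0.0 for pc in p]
        for k in range(k_dim):
            g_feat = sum(g_logit[c] * weights[c][k]
                         for c in range(len(weights)))
            grad_gamma[k] += g_feat * xb[k] / n
            grad_beta[k] += g_feat / n
    # Plain gradient descent on the affine parameters only.
    for k in range(k_dim):
        gamma[k] -= lr * grad_gamma[k]
        beta[k] -= lr * grad_beta[k]
    return mean_entropy

# Made-up example: 2 channels, 2 classes, a frozen linear classifier.
weights = [[1.0, -1.0], [-1.0, 1.0]]
batch = [[10.0, -3.0], [12.0, -1.0], [14.0, -5.0], [16.0, -3.0]]
gamma, beta = [1.0, 1.0], [0.0, 0.0]
history = [tent_step(batch, gamma, beta, weights) for _ in range(20)]
```

Here `history` holds the mean entropy measured before each update, so comparing its first and last entries shows whether the predictions became more confident. Only `gamma` and `beta` are modified; `weights` plays the role of the fixed remaining parameters.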

Experiments
