Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Doonby 2023. 3. 8. 19:32

(ICML 2022; University of Washington, Google Research, Meta AI, and others)

(github)

Abstract

Starting from one pre-trained model, averaging the weights of several models fine-tuned with different hyperparameters improves both accuracy and robustness.

(https://arxiv.org/pdf/2110.12899.pdf, https://proceedings.neurips.cc/paper/2020/file/0607f4c705595b911a4f3e7a127b44e0-Paper.pdf)

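One of these two papers reports that models fine-tuned from a single pre-trained model with different hyperparameters tend to end up in a similar region (basin) of the loss landscape.
Model soups build on that finding: because the fine-tuned solutions lie close together, their weights can be averaged directly.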

- Unlike a conventional ensemble, a model soup adds no inference or memory cost.

- With a ViT-G model, the soup reaches 90.94% top-1 accuracy on ImageNet, which was state of the art at the time.
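
The cost difference in the first bullet can be made explicit. The notation here is assumed for illustration: f(x; θ) is the network, and θ_1, ..., θ_k are the weights of the k fine-tuned models. An output ensemble and a uniform soup then differ only in where the average is taken:

```latex
\begin{align*}
\text{ensemble:} \quad & \hat{y}(x) = \frac{1}{k} \sum_{i=1}^{k} f(x;\, \theta_i)
  && \text{($k$ forward passes, $k$ sets of weights in memory)} \\
\text{uniform soup:} \quad & \hat{y}(x) = f\!\left(x;\, \frac{1}{k} \sum_{i=1}^{k} \theta_i\right)
  && \text{(one forward pass, one set of weights)}
\end{align*}
```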

Methods

Conventional fine-tuning recipe

Take a pre-trained model and fine-tune it several times with different hyperparameters.
Select the model with the best score on the validation set and discard the rest.

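Recipe proposed in this paper

Take a pre-trained model and fine-tune it several times with different hyperparameters.
Instead of selecting a single model, build the final model by averaging the weights of the fine-tuned models.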
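
The contrast between the two recipes can be sketched in a few lines. This is only an illustration under stated assumptions: `pick_best` and `uniform_average` are hypothetical helper names, the validation accuracies are made up, and each "model" is a plain dict of floats standing in for a PyTorch `state_dict`.

```python
from typing import Dict, List

StateDict = Dict[str, float]  # stand-in for a PyTorch state_dict


def pick_best(models: List[StateDict], val_accs: List[float]) -> StateDict:
    """Conventional recipe: keep the model with the best validation accuracy."""
    best_index = max(range(len(models)), key=lambda i: val_accs[i])
    return models[best_index]


def uniform_average(models: List[StateDict]) -> StateDict:
    """Soup recipe: one model whose weights are the mean of all fine-tuned models."""
    n = len(models)
    return {name: sum(m[name] for m in models) / n for name in models[0]}


models = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]
val_accs = [0.71, 0.74]  # made-up numbers for illustration

selected = pick_best(models, val_accs)  # second model is kept, first is discarded
souped = uniform_average(models)        # every model contributes to the result
```

Either way the result is a single set of weights, so inference cost is the same. The difference is whether the other fine-tuning runs are thrown away or folded into the final model.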

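How the weights are averaged

The paper considers three ways to average the weights (uniform soup, greedy soup, and learned soup). Its comparison also includes two baselines: the single model with the highest validation accuracy, as in the conventional recipe, and an ensemble.
The variant used in practice is the greedy soup.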

How to build the greedy soup

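import os
import torch

# Note: this snippet builds the *uniform* soup (equal-weight average of all
# checkpoints); the greedy soup procedure is described in the text after it.
# model_paths (list of checkpoint paths) and NUM_MODELS = len(model_paths)
# are assumed to be defined beforehand.
for j, model_path in enumerate(model_paths):
    print(f'Adding model {j} of {NUM_MODELS - 1} to uniform soup.')
    assert os.path.exists(model_path)
    state_dict = torch.load(model_path)
    if j == 0:
        # First checkpoint: initialize the running average.
        uniform_soup = {k: v * (1. / NUM_MODELS) for k, v in state_dict.items()}
    else:
        # Later checkpoints: add their 1/NUM_MODELS share to the running average.
        uniform_soup = {k: v * (1. / NUM_MODELS) + uniform_soup[k] for k, v in state_dict.items()}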

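Sort the models in decreasing order of validation accuracy (ValAcc).
Tentatively add the current model to the ingredients, and compare the ValAcc of the resulting averaged model with the ValAcc of the current ingredients.
If it is higher than the ValAcc of the current ingredients, keep the current model in the ingredients.
After the for loop finishes, average the weights of the models in the ingredients and use the result as the final model.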
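
The steps above can be sketched as follows. This is a minimal sketch, not the authors' code: `average`, `greedy_soup`, and the `val_acc` callback are names assumed here, and the toy accuracy function is invented. State dicts are plain dicts of floats so the example runs without any dependencies; the same arithmetic should carry over to floating-point torch tensors.

```python
from typing import Callable, Dict, List

StateDict = Dict[str, float]  # in practice the values would be torch tensors


def average(state_dicts: List[StateDict]) -> StateDict:
    """Element-wise mean of several state dicts that share the same keys."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}


def greedy_soup(
    candidates: List[StateDict],
    val_acc: Callable[[StateDict], float],
) -> StateDict:
    """Build a greedy soup from fine-tuned candidates (hypothetical helper)."""
    # Sort by individual validation accuracy, best first.
    ranked = sorted(candidates, key=val_acc, reverse=True)

    # Start with the single best model as the only ingredient.
    ingredients = [ranked[0]]
    best_acc = val_acc(average(ingredients))

    for candidate in ranked[1:]:
        # Tentatively add the candidate and evaluate the averaged model.
        trial_acc = val_acc(average(ingredients + [candidate]))
        # Keep the candidate only if the averaged model improves.
        if trial_acc > best_acc:
            ingredients.append(candidate)
            best_acc = trial_acc

    # The final model is the average of the kept ingredients.
    return average(ingredients)


def toy_val_acc(sd: StateDict) -> float:
    """Made-up metric: a single scalar weight, accuracy peaks at w == 1.0."""
    return 1.0 - abs(sd["w"] - 1.0)


soup = greedy_soup([{"w": 0.2}, {"w": 0.8}, {"w": 1.3}], toy_val_acc)
# 0.8 and 1.3 are kept (their average is about 1.05); 0.2 is rejected
# because adding it would lower the toy accuracy.
```

Because a candidate is kept only when it improves held-out accuracy, the greedy soup should never score worse on the validation set than the best single model it starts from.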

Experiments
