THE LOTTERY TICKET HYPOTHESIS: FINDING SPARSE, TRAINABLE NEURAL NETWORKS


Doonby 2023. 3. 9. 17:58

(ICLR 2019. MIT)

https://arxiv.org/abs/1803.03635


Abstract

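Sparse architectures produced by pruning are difficult to train from scratch.

The authors find that a standard pruning technique naturally uncovers sub-networks whose initializations make them capable of training effectively.

Based on these results, they propose the 'lottery ticket hypothesis'.

A dense, randomly initialized feed-forward network contains sub-networks ("winning tickets") that, when trained in isolation, reach performance comparable to the original network.

On MNIST and CIFAR10, they consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures.

Above this size, the winning tickets they find learn faster than the original network and reach higher test accuracy.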

The Lottery Ticket Hypothesis.

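Put simply: "there exist a pruning mask $m$ and weights that, with far fewer parameters and no more training iterations, reach accuracy at least as high as the original network."

If $f(x;m\odot\theta_{0})$ is a winning ticket, then reinitializing its weights to new random values, giving $f(x;m\odot\theta_{0}')$, no longer reproduces the performance obtained with the original initialization.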
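The statement above can be written more formally. The following is a sketch that follows the formulation in the paper; the symbols $a$, $a'$ and $j'$ do not appear elsewhere in this post. Let the dense network $f(x;\theta_{0})$ reach its minimum validation loss at iteration $j$ with test accuracy $a$, and let the masked network $f(x;m\odot\theta_{0})$, trained with $m$ fixed, reach its minimum validation loss at iteration $j'$ with test accuracy $a'$. The hypothesis then predicts:

$$\exists\, m \in \{0,1\}^{|\theta|} \quad \text{such that} \quad j' \le j, \qquad a' \ge a, \qquad \|m\|_{0} \ll |\theta|$$

The three conditions correspond to commensurate training time, commensurate accuracy, and far fewer parameters, respectively.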

Identifying winning tickets

Randomly initialize a neural network $f(x;\theta_{0})$.
Train the network for $j$ iterations, arriving at parameters $\theta_{j}$.
Prune $p\%$ of the parameters in $\theta_{j}$, creating a mask $m$.
Reset the remaining parameters to their values in $\theta_{0}$, creating the winning ticket $f(x;m\odot\theta_{0})$.
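The four steps above can be sketched in a few lines of plain Python. This is only a toy illustration, not the authors' implementation: it assumes magnitude-based pruning (the smallest-magnitude trained weights are removed), and `toy_train`, `magnitude_mask`, `find_winning_ticket`, and the `target` vector are hypothetical names introduced here, with "training" replaced by gradient descent on a simple quadratic loss.

```python
import random

def magnitude_mask(weights, prune_fraction):
    """Return a 0/1 mask that drops the smallest-magnitude weights (assumed criterion)."""
    n_prune = int(len(weights) * prune_fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]

def toy_train(theta, mask, target, steps=100, lr=0.1):
    """Hypothetical stand-in for training: gradient descent on sum((w - t)^2)."""
    w = [wi * mi for wi, mi in zip(theta, mask)]
    for _ in range(steps):
        # Masked weights receive no update, so they stay at zero.
        w = [wi - lr * 2.0 * (wi - ti) * mi for wi, ti, mi in zip(w, target, mask)]
    return w

def find_winning_ticket(theta_0, target, prune_fraction):
    dense_mask = [1] * len(theta_0)
    theta_j = toy_train(theta_0, dense_mask, target)   # step 2: train to theta_j
    mask = magnitude_mask(theta_j, prune_fraction)     # step 3: prune, creating m
    ticket = [w * m for w, m in zip(theta_0, mask)]    # step 4: reset to theta_0
    return mask, ticket

rng = random.Random(0)
target = [0.0, 2.0, 0.0, -3.0, 0.1, 1.5]               # toy optimum, illustrative only
theta_0 = [rng.uniform(-1.0, 1.0) for _ in target]     # step 1: random initialization
mask, ticket = find_winning_ticket(theta_0, target, prune_fraction=0.5)
print(mask)  # -> [0, 1, 0, 1, 0, 1]
```

Note that the mask is computed from the trained weights $\theta_{j}$, while the surviving entries of the ticket carry their values from $\theta_{0}$, which is the distinction the procedure relies on.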

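In summary: randomly initialize the network, train it for $j$ iterations, prune $p\%$ of the trained parameters to obtain a mask $m$, and apply $m$ to the original initial weights to form the winning ticket. This is the one-shot version. In practice the procedure is repeated over $n$ rounds, with each round pruning $p^{\frac{1}{n}}\%$ of the weights that survived the previous round.

Compared with one-shot pruning, iterating over $n$ rounds is reported to find winning tickets that match the original network's performance at smaller sizes.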
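To get a feel for how iterative pruning compounds, the short sketch below computes how many weights remain after several rounds. It assumes a fixed fraction of the surviving weights is pruned in every round; the rate of 0.2 and the function names `remaining_fraction` and `rounds_to_reach` are illustrative assumptions, not values taken from this post.

```python
def remaining_fraction(rate_per_round, rounds):
    """Fraction of weights left after pruning `rate_per_round` of the survivors each round."""
    return (1.0 - rate_per_round) ** rounds

def rounds_to_reach(target_fraction, rate_per_round):
    """Smallest number of rounds after which at most `target_fraction` of weights remain."""
    rounds = 0
    remaining = 1.0
    while remaining > target_fraction:
        remaining *= 1.0 - rate_per_round
        rounds += 1
    return rounds

for n in (1, 4, 7, 10):
    print(n, round(remaining_fraction(0.2, n), 4))
```

Under this assumption, `rounds_to_reach(0.2, 0.2)` is 8 and `rounds_to_reach(0.1, 0.2)` is 11, so reaching the 10-20% range takes on the order of ten train-prune-reset rounds, each of which requires a full training run.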

Result

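The existence of winning tickets was checked with fully-connected networks on MNIST and convolutional networks on CIFAR10.

Several optimization strategies were tried, along with dropout, weight decay, batchnorm, and residual connections.

In deeper networks, finding winning tickets turned out to be sensitive to the learning rate.

Warmup was needed to find winning tickets at higher learning rates.

Winning tickets with 10-20% of the size of the original network were found, and at the same number of iterations they matched or exceeded the original network's test accuracy.

However, when these sub-networks were randomly reinitialized, they no longer reached that performance.

This suggests that what makes a winning ticket is not its structure alone but the combination of that structure with its original initial weights.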
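As a rough illustration of what warmup means here, the sketch below ramps the learning rate linearly from 0 to its base value over a fixed number of iterations and then holds it. The function name `warmup_lr` and the numbers in the usage loop are hypothetical, and a real training setup would typically combine this with its own learning rate schedule.

```python
def warmup_lr(step, base_lr, warmup_steps):
    """Linearly ramp the learning rate from 0 to base_lr over warmup_steps, then hold."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps

# Illustrative values only: base learning rate 0.1, 10000 warmup iterations.
for step in (0, 2500, 5000, 10000, 20000):
    print(step, warmup_lr(step, base_lr=0.1, warmup_steps=10000))
```

The early iterations therefore take much smaller steps than the base learning rate would allow, which is the regime in which winning tickets could be found for the deeper networks.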
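The random reinitialization control can be sketched as follows: the mask (the structure) is kept, but the surviving weights are sampled afresh instead of being reset to their original values. This is a minimal sketch; `apply_mask`, `random_reinit`, the Gaussian sampling, and the example values are assumptions made for illustration.

```python
import random

def apply_mask(weights, mask):
    """Element-wise product of a weight vector and a 0/1 mask."""
    return [w * m for w, m in zip(weights, mask)]

def random_reinit(mask, rng, scale=1.0):
    """Same sparse structure as the ticket, but with freshly sampled weights."""
    fresh = [rng.gauss(0.0, scale) for _ in mask]
    return apply_mask(fresh, mask)

rng = random.Random(0)
theta_0 = [rng.gauss(0.0, 1.0) for _ in range(6)]   # original initialization
mask = [0, 1, 0, 1, 0, 1]                           # example mask, illustrative only
ticket = apply_mask(theta_0, mask)                  # winning ticket: mask with theta_0
control = random_reinit(mask, rng)                  # control: same mask, new weights
print(ticket)
print(control)
```

Both vectors are zero in exactly the same positions, so any difference in how well they train has to come from the values of the surviving weights rather than from the sparsity pattern.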
