DENSE-SPARSE-DENSE TRAINING FOR DEEP NEURAL NETWORKS

Doonby 2023. 4. 11. 13:44

ICLR 2017

Recent deep learning models have a very large number of parameters, which makes them difficult to train.

To get better regularization and optimization performance, the paper proposes a dense-sparse-dense (DSD) training flow.

The DSD flow consists of three steps.

1. D step - Train a dense network to learn the connection weights and their importance.

2. S step - Prune the unimportant connections and retrain the sparse network.

3. D step - Increase the model capacity by removing the sparsity constraint, re-initialize the pruned parameters, and retrain the whole dense network.
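The three steps above can be sketched on a toy linear model. This is only a minimal illustration, not the authors' code. It assumes that "importance" means weight magnitude, that pruned weights restart from zero in the last step, and that plain SGD on squared error is used. The names `magnitude_mask`, `train`, and `dsd` are hypothetical.

```python
import random


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of smallest-|w| weights.

    Assumption: importance is measured by absolute weight value.
    """
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1.0] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0.0
    return mask


def train(weights, data, mask, lr, epochs):
    """SGD on squared error for y = w . x; masked weights are held at zero."""
    w = [wi * mi for wi, mi in zip(weights, mask)]
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # The mask blocks gradient updates to pruned connections.
            w = [wi - lr * err * xi * mi for wi, xi, mi in zip(w, x, mask)]
    return w


def dsd(data, n_features, sparsity=0.5, lr=0.01, epochs=50, seed=0):
    """Run dense -> sparse -> dense training and return each phase's result."""
    rng = random.Random(seed)
    dense_mask = [1.0] * n_features
    w_init = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]

    # 1. D step: train the dense network.
    w_dense = train(w_init, data, dense_mask, lr, epochs)

    # 2. S step: prune low-magnitude weights, then retrain under the mask.
    mask = magnitude_mask(w_dense, sparsity)
    w_sparse = train(w_dense, data, mask, lr, epochs)

    # 3. D step: remove the mask. Pruned weights are already zero in
    #    w_sparse, so they restart from zero and train with the rest.
    w_final = train(w_sparse, data, dense_mask, lr, epochs)
    return w_dense, mask, w_sparse, w_final
```

`sparsity` is the only knob this sketch adds on top of ordinary training. `w_final` has the same length as `w_dense`, so the model that comes out is the same size as the one that went in.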

The DSD flow improved the performance of CNNs, RNNs, and LSTMs on tasks such as image classification, caption generation, and speech recognition.

On ImageNet, Top-1 accuracy improved as follows.
- GoogLeNet 1.1%
- VGG-16 4.3%
- ResNet-18 1.2%
- ResNet-50 1.1%
Performance also improved on a variety of other datasets and models.

These results show that the training methods in current use do not reliably find the best local optimum.

During training, DSD introduces only one additional hyper-parameter (the sparsity ratio), and it adds no overhead at inference time.
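To make the point about inference cost concrete, here is a small arithmetic sketch of how many weights are free to train in each phase. It assumes the same sparsity ratio is applied to every layer. The layer names and sizes are hypothetical, as is the function name `active_weights`.

```python
def active_weights(layer_sizes, sparsity):
    """Count trainable weights per layer in each DSD phase for one sparsity ratio."""
    schedule = {}
    for name, n in layer_sizes.items():
        pruned = int(n * sparsity)
        schedule[name] = {
            "dense": n,             # first D step: every weight trains
            "sparse": n - pruned,   # S step: only unpruned weights train
            "re-dense": n,          # final D step: full capacity restored
        }
    return schedule


# Hypothetical layer sizes, for illustration only.
layers = {"fc1": 4096, "fc2": 1000}
plan = active_weights(layers, sparsity=0.25)
```

The "re-dense" count equals the "dense" count for every layer. The final model therefore has the same architecture and parameter count as a conventionally trained one, which is why inference cost does not change.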

The code is available at the link below.

DSD Model Zoo

DSD model zoo. Better accuracy models from DSD training on Imagenet with same model architecture.

songhan.github.io
