Papers/Compression (39)

NeurIPS 2021 https://arxiv.org/abs/2106.12379 AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks. The increasing computational requirements of deep neural networks (DNNs) have led to significant interest in obtaining DNN models that are sparse, yet accurate. Recent work has investigated the even harder case of sparse training, where the DNN weights are .. The network's ..
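The title already describes the training schedule: phases in which the network is compressed (a sparsity mask is enforced) alternate with phases in which it is trained densely. A minimal PyTorch sketch of that alternation follows; the phase length, sparsity level, and helper names (`magnitude_mask`, `train_one_epoch`, `acdc_train`) are my own assumptions, not the authors' exact recipe.

```python
# Hedged sketch of alternating compressed/decompressed training.
import torch

def magnitude_mask(model, sparsity=0.9):
    """Keep the largest-magnitude weights of each weight tensor (assumed criterion)."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # prune only weight matrices / conv kernels
            k = int(p.numel() * (1 - sparsity))        # number of weights to keep
            thresh = p.abs().flatten().kthvalue(p.numel() - k).values
            masks[name] = (p.abs() > thresh).float()
    return masks

def train_one_epoch(model, loader, opt, loss_fn, masks=None):
    for x, y in loader:
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        if masks is not None:  # compressed phase: keep pruned weights at zero
            with torch.no_grad():
                for name, p in model.named_parameters():
                    if name in masks:
                        p.mul_(masks[name])

def acdc_train(model, loader, opt, loss_fn, epochs=100, phase_len=5):
    masks = None
    for epoch in range(epochs):
        if epoch % (2 * phase_len) == phase_len:   # enter a compressed phase
            masks = magnitude_mask(model)
        elif epoch % (2 * phase_len) == 0:         # enter a decompressed phase
            masks = None
        train_one_epoch(model, loader, opt, loss_fn, masks)
```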

ICLR 2017 https://arxiv.org/abs/1607.04381 Recent deep learning models have so many parameters that they are difficult to train. The paper proposes a dense-sparse-dense (DSD) training flow for regularization and better optimization performance. The DSD flow has several steps. 1. D step - train the dense network (learning connection weights and their importance). 2. S step - prune the unimportant connections and retrain the sparse network. 3. D step - increase the model capacity, re-i..
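A rough PyTorch sketch of the three phases as described in the snippet; the magnitude criterion, sparsity level, and the `train_fn` interface are assumptions for illustration, not the paper's exact hyperparameters.

```python
# Hedged sketch of the D-S-D flow.
import torch

def prune_by_magnitude(model, sparsity=0.5):
    """S step: build masks that drop the smallest-magnitude weights per layer."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:
            k = max(1, int(p.numel() * sparsity))      # number of weights to drop
            thresh = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > thresh).float()
    return masks

def apply_masks(model, masks):
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

def dsd(model, train_fn, sparsity=0.5):
    # 1. D step: train the dense network.
    train_fn(model, masks=None)
    # 2. S step: prune unimportant connections and retrain the sparse network.
    masks = prune_by_magnitude(model, sparsity)
    apply_masks(model, masks)
    train_fn(model, masks=masks)   # assumed contract: train_fn re-applies masks after each update
    # 3. D step: restore full capacity (pruned weights restart from zero) and retrain.
    train_fn(model, masks=None)
```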

ICLR 2020 https://arxiv.org/abs/2006.07253 Dynamic Model Pruning with Feedback. Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression method that gener.. I kept putting this off and three months have gone by. I was sure I had read and reviewed it, but I can't remember a thing! Abs..
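Going only by the title, "pruning with feedback" can be sketched as: apply a freshly computed mask for the forward/backward pass, but let the gradient update a dense copy of the weights so that pruned connections can come back later. This is a hedged guess at the mechanism, with an assumed mask schedule and helper names, not the paper's code.

```python
# Hedged sketch: forward with temporarily pruned weights, update the dense copy.
import torch

def dpf_step(model, x, y, opt, loss_fn, sparsity=0.9):
    masks, dense = {}, {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.dim() > 1:
                k = max(1, int(p.numel() * sparsity))
                thresh = p.abs().flatten().kthvalue(k).values
                masks[name] = (p.abs() > thresh).float()
                dense[name] = p.detach().clone()   # keep dense weights for feedback
                p.mul_(masks[name])                # forward/backward with pruned weights

    opt.zero_grad()
    loss_fn(model(x), y).backward()

    # Feedback: restore dense weights, then apply the (pruned-model) gradient to them.
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in dense:
                p.copy_(dense[name])
    opt.step()
```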

(ICLR 2019, MIT) https://arxiv.org/abs/1803.03635 The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that .. Abstract ..
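For context, the lottery-ticket procedure is usually summarized as: train the dense network, prune by magnitude, rewind the surviving weights to their original initialization, and retrain the sparse subnetwork. A one-shot sketch under those assumptions (helper names such as `train_fn` are mine):

```python
# Hedged, one-shot sketch of finding a "winning ticket".
import copy
import torch

def find_ticket(model, train_fn, sparsity=0.8):
    init_state = copy.deepcopy(model.state_dict())    # theta_0

    train_fn(model)                                   # train the dense network

    # Prune the smallest-magnitude weights of the trained network.
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:
            k = max(1, int(p.numel() * sparsity))
            thresh = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > thresh).float()

    # Rewind surviving weights to their initial values and retrain the sparse net.
    model.load_state_dict(init_state)
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
    train_fn(model, masks=masks)   # assumed contract: train_fn keeps pruned weights at zero
    return model, masks
```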

ICLR 2019, Berkeley https://arxiv.org/abs/1810.05270 Rethinking the Value of Network Pruning. Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning, according to a cer.. Abstract: Network pruning ... inference c..
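The three-stage pipeline mentioned in the abstract can be written down almost literally; here the pruning criterion is left as a pluggable function, and all helper names (`train_fn`, `prune_fn`, `finetune_fn`) are assumptions for illustration.

```python
# The standard train -> prune -> fine-tune pipeline, sketched with assumed helpers.
def three_stage_pipeline(model, train_fn, prune_fn, finetune_fn):
    # Stage 1: train a large (over-parameterized) model.
    train_fn(model)
    # Stage 2: prune according to a chosen criterion; assumed to return per-layer masks.
    masks = prune_fn(model)
    # Stage 3: fine-tune the pruned model to recover accuracy.
    finetune_fn(model, masks=masks)
    return model, masks
```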

Defang Chen et al. (CVPR 2022) Abstract: Previous work generally crafted elaborate knowledge representations, which made the models hard to develop and understand. This paper shows that simply reusing the teacher's classifier and aligning the student's encoder with an $L_2$ loss reduces the performance gap. Method: the alignment is trained with a feature $L_2$ loss. I wondered whether reusing the fc layer as-is would hurt the compression ratio, but for ResNet the fc layer is small enough that this works. Still, unlike other KD methods, the parameter count is larger (since the FC is identical to the teacher's ..
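A minimal sketch of the idea as summarized above: freeze the teacher, align the student's penultimate features to the teacher's with an $L_2$ (MSE) loss, and reuse the teacher's classifier on top of the student encoder. The projector used to match feature dimensions and all class/argument names are assumptions, not the paper's implementation.

```python
# Hedged sketch: feature alignment with a reused, frozen teacher classifier.
import torch
import torch.nn as nn

class ReusedClassifierKD(nn.Module):
    def __init__(self, student_encoder, teacher_encoder, teacher_fc, s_dim, t_dim):
        super().__init__()
        self.student = student_encoder
        self.teacher = teacher_encoder
        self.teacher_fc = teacher_fc
        for p in list(self.teacher.parameters()) + list(self.teacher_fc.parameters()):
            p.requires_grad_(False)             # teacher encoder and classifier are frozen
        # Small projector (assumption) so student features match the teacher's feature dim.
        self.proj = nn.Linear(s_dim, t_dim)

    def forward(self, x):
        with torch.no_grad():
            t_feat = self.teacher(x)            # teacher penultimate features
        s_feat = self.proj(self.student(x))     # projected student features
        align_loss = nn.functional.mse_loss(s_feat, t_feat)   # L2 alignment
        logits = self.teacher_fc(s_feat)        # reuse the teacher's classifier
        return logits, align_loss
```

At training time one would minimize `align_loss` (optionally together with a cross-entropy term on `logits`); at inference, the projected student features are simply fed through the reused teacher classifier.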