List: Papers/Compression (38)
둔비의 공부공간

https://openreview.net/forum?id=Y9t7MqZtCR
Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning
(I haven't verified that it can be reproduced, but the code is also included in the zip file at the link above.) Abstract: IMP (Iterative Magni..
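The note above cuts off right after naming IMP, so as background here is a minimal sketch of plain Iterative Magnitude Pruning (train, prune a fraction of the remaining weights by global magnitude, optionally rewind, repeat). It is not the paper's multi-particle weight-averaging method, and the helper names and the `train_one_round` stub are placeholders.

```python
import torch
import torch.nn as nn

def train_one_round(model):
    pass  # stand-in for the usual training loop

def global_magnitude_masks(model, sparsity):
    """Return {param name: 0/1 mask} keeping the globally largest-|w| weights."""
    weights = {n: p.detach().abs() for n, p in model.named_parameters() if p.dim() > 1}
    scores = torch.cat([w.flatten() for w in weights.values()])
    k = int(sparsity * scores.numel())
    threshold = torch.kthvalue(scores, k).values if k > 0 else torch.tensor(-1.0)
    return {n: (w > threshold).float() for n, w in weights.items()}

def apply_masks(model, masks):
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in masks:
                p.mul_(masks[n])

def imp(model, rounds=3, rate=0.2, rewind_state=None):
    """Vanilla IMP: each round trains, then prunes `rate` of the remaining weights."""
    masks = {}
    for r in range(1, rounds + 1):
        train_one_round(model)
        sparsity = 1.0 - (1.0 - rate) ** r          # cumulative sparsity after round r
        masks = global_magnitude_masks(model, sparsity)
        if rewind_state is not None:                # lottery-ticket style weight rewinding
            model.load_state_dict(rewind_state)
        apply_masks(model, masks)
    return model, masks

if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
    net, masks = imp(net, rounds=3, rate=0.2)
    kept = sum(m.sum().item() for m in masks.values())
    total = sum(m.numel() for m in masks.values())
    print(f"remaining weights: {kept / total:.2%}")
```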

https://github.com/ZIB-IOL/SMS
GitHub - ZIB-IOL/SMS: code to reproduce the experiments of the ICLR24 paper "Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging"
https://arxiv.org/abs/2306.16788
Sparse Model Soups: A Recipe for Improved Pruning via Mod..
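Going only by the title, the recipe is to average several sparse models retrained from the same pruned checkpoint into one "soup". Below is a minimal sketch of just that averaging step, assuming all ingredient models share the same sparsity mask; the function and variable names are illustrative, not from the repo.

```python
import torch
import torch.nn as nn

def average_sparse_models(state_dicts, masks=None):
    """Uniformly average a list of state_dicts; if {param name: 0/1 mask} is given
    (a shared sparsity pattern), reapply it so the averaged model stays sparse."""
    soup = {}
    for name in state_dicts[0]:
        soup[name] = torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
        if masks is not None and name in masks:
            soup[name] = soup[name] * masks[name]
    return soup

if __name__ == "__main__":
    def make_model():
        return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

    # pretend these are three sparse models retrained with different seeds / hyperparameters
    shared_masks = {"0.weight": (torch.rand(16, 8) > 0.5).float(),
                    "2.weight": (torch.rand(4, 16) > 0.5).float()}
    ingredients = []
    for _ in range(3):
        m = make_model()
        with torch.no_grad():
            m[0].weight.mul_(shared_masks["0.weight"])
            m[2].weight.mul_(shared_masks["2.weight"])
        ingredients.append(m.state_dict())

    soup_model = make_model()
    soup_model.load_state_dict(average_sparse_models(ingredients, shared_masks))
    print((soup_model[0].weight == 0).float().mean().item())  # sparsity is preserved
```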

https://github.com/alooow/fantastic_weights_paper
GitHub - alooow/fantastic_weights_paper: repository for the paper "Fantastic Weights and How to Find Them: Where to Prune in Dynamic Sparse Training"
Accepted at NeurIPS 2023. Abstract: Dynamic Sparse Training, during the training process, adapti..
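The preview is cut off mid-sentence; as generic context for what a DST update looks like, here is a rough SET-style prune-and-regrow step for a single layer (drop the smallest-magnitude active weights, regrow the same number of previously inactive ones at random). This is only a common baseline criterion shown for illustration, not the criterion the paper studies or proposes.

```python
import torch

def prune_and_regrow(weight, mask, drop_fraction=0.3):
    """One SET-style dynamic-sparse-training update for a single layer.

    weight: dense parameter tensor (zero where mask == 0)
    mask:   0/1 tensor of the same shape marking active connections
    Keeps sparsity constant: drop the smallest-|w| active weights, then
    regrow the same number of previously inactive connections at random.
    """
    active = mask.bool()                                   # snapshot before editing
    n_drop = int(drop_fraction * active.sum().item())
    if n_drop == 0:
        return weight, mask

    # drop: smallest-magnitude connections among the active ones
    scores = weight.abs().masked_fill(~active, float("inf"))
    drop_idx = torch.topk(scores.view(-1), n_drop, largest=False).indices
    mask.view(-1)[drop_idx] = 0
    weight.view(-1)[drop_idx] = 0.0

    # regrow: random connections that were inactive before this update
    inactive_idx = (~active).view(-1).nonzero(as_tuple=True)[0]
    grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_drop]]
    mask.view(-1)[grow_idx] = 1
    weight.view(-1)[grow_idx] = 0.0                        # regrown weights start at zero

    return weight, mask

if __name__ == "__main__":
    w = torch.randn(16, 8)
    m = (torch.rand_like(w) > 0.8).float()                 # ~20% density
    w = w * m
    w, m = prune_and_regrow(w, m, drop_fraction=0.3)
    print("density after update:", m.mean().item())
```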

https://arxiv.org/abs/2403.13512
Scale Decoupled Distillation
Abstract: The paper argues that existing logit distillation methods suffer from a sub-optimality problem.
- various sem..
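For reference, the logit-distillation baseline that such papers compare against is the classic Hinton-style KD loss below (KL divergence between temperature-softened teacher and student probabilities); this sketch is just that baseline, not the paper's scale-decoupled method.

```python
import torch
import torch.nn.functional as F

def logit_kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Vanilla logit distillation: KL(teacher || student) on temperature-softened
    class probabilities, scaled by T^2 to keep gradient magnitudes comparable."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

if __name__ == "__main__":
    student_logits = torch.randn(8, 100, requires_grad=True)  # batch of 8, 100 classes
    teacher_logits = torch.randn(8, 100)                      # teacher output, no grad needed
    loss = logit_kd_loss(student_logits, teacher_logits)
    loss.backward()
    print(float(loss))
```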

https://arxiv.org/abs/2307.08500
Cumulative Spatial Knowledge Distillation for Vision Transformers
By the same authors as DKD; the code does not seem to have been uploaded yet ..

https://arxiv.org/abs/2310.19444
One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation
https://..