둔비의 공부공간

Class Attention Transfer Based Knowledge Distillation (CVPR 2023)
https://arxiv.org/abs/2304.12777
"Previous knowledge distillation methods have shown impressive performance on model compression tasks; however, it is hard to explain how the knowledge they transfer helps to improve the performance of the student network. In this work, we focus on.."
I have been busy writing an ACL 2024 paper for a while, so reading papers and..

Curriculum Temperature for Knowledge Distillation, Zheng Li et al. (AAAI 2023)
https://arxiv.org/abs/2211.16231
"Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions.."
Code: https://github.com/z..
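Since the preview only hints at the role of the temperature, here is a minimal sketch of how the temperature enters the usual KD loss, with a simple linear schedule standing in for a curriculum. The schedule, `t_start`/`t_end`, and the helper names are my own illustrative assumptions; CTKD itself learns the temperature rather than fixing a schedule, so treat this only as a baseline-level sketch.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature):
    """Temperature-scaled KL distillation loss (the standard formulation)."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

def scheduled_temperature(epoch, total_epochs, t_start=1.0, t_end=4.0):
    """Hypothetical linear curriculum over epochs (illustrative only, not CTKD's learned temperature)."""
    alpha = min(epoch / max(total_epochs - 1, 1), 1.0)
    return t_start + alpha * (t_end - t_start)

# Toy usage: the temperature follows the training epoch instead of a fixed grid-searched value.
student_logits = torch.randn(8, 100)
teacher_logits = torch.randn(8, 100)
T = scheduled_temperature(epoch=10, total_epochs=240)
loss = kd_loss(student_logits, teacher_logits, T)
```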

Multi-Level Logit Distillation (CVPR 2023)
Paper: https://openaccess.thecvf.com/content/CVPR2023/papers/Jin_Multi-Level_Logit_Distillation_CVPR_2023_paper.pdf
Code: https://github.com/Jin-Ying/Multi-Level-Logit-Distillation

DOT: A Distillation-Oriented Trainer (ICCV 2023)
https://arxiv.org/abs/2307.08436
"Knowledge distillation transfers knowledge from a large model to a small one via task and distillation losses. In this paper, we observe a trade-off between task and distillation losses, i.e., introducing distillation loss limits the convergence of task loss.."
Along with Decoupled Knowledge Distillation and Cur..
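For context, the trade-off discussed here is between the two terms of the usual combined objective: a task loss (cross-entropy against the labels) and a distillation loss (KL against the teacher). Below is a minimal sketch of that baseline objective; the weight `alpha` and the temperature are illustrative defaults of mine, and this is only the objective DOT starts from, not the trainer itself (as I understand it, DOT changes how the optimizer handles the gradients of the two terms rather than the loss).

```python
import torch
import torch.nn.functional as F

def combined_objective(student_logits, teacher_logits, labels, alpha=1.0, temperature=4.0):
    """Task loss + distillation loss: the baseline objective whose trade-off DOT analyzes.

    alpha and temperature are illustrative defaults, not values from the paper.
    """
    # Task loss: standard cross-entropy against the ground-truth labels.
    task_loss = F.cross_entropy(student_logits, labels)

    # Distillation loss: temperature-scaled KL divergence to the teacher's distribution.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    distill_loss = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

    return task_loss + alpha * distill_loss

# Toy usage
logits_s = torch.randn(8, 100)
logits_t = torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
loss = combined_objective(logits_s, logits_t, labels)
```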