(1) Wang, Y. Structured Compression of Large Language Models With Sensitivity-Aware Pruning Mechanisms. JCTS 2024, 3.