Wang, Y. (2024). Structured Compression of Large Language Models with Sensitivity-aware Pruning Mechanisms. Journal of Computer Technology and Software, 3(9). https://doi.org/10.5281/zenodo.15851638