Published 2025-05-30

This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This paper addresses the practical challenges of deploying large language models, particularly inference efficiency and resource consumption, and proposes an improved knowledge distillation framework. Building on the TinyBERT architecture, the method introduces a multi-layer semantic alignment mechanism that strengthens the student model's ability to learn deep semantic and structural information from the teacher. The approach jointly transfers output distributions, hidden-layer representations, and attention matrices, and a combined loss function is designed to optimize these distillation objectives together. During training, the student model retains a lightweight structure while effectively inheriting the teacher's expressive power, which improves its generalization and stability in multi-task scenarios. Experiments are conducted on the GLUE benchmark, with evaluation covering training dynamics, output-distribution learning, task stability, and inference speed on low-resource devices. The results show that the proposed method outperforms mainstream distillation models on several metrics and demonstrates strong compatibility, efficiency, and deployment adaptability. These findings further validate the effectiveness of multi-layer alignment strategies for improving compact language models and provide a technical foundation for building high-performance, low-cost natural language processing systems.
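
To make the combined objective concrete, the following is a minimal sketch, not the authors' released code, of a three-term distillation loss of the kind described above: a softened output-distribution term, a TinyBERT-style hidden-state alignment term with a linear projection, and an attention-matrix alignment term. The function and argument names (distillation_loss, proj, alpha, beta, gamma) and the temperature value are illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      student_attn, teacher_attn,
                      proj, temperature=2.0,
                      alpha=1.0, beta=1.0, gamma=1.0):
    """Combine output, hidden-state, and attention distillation terms (sketch)."""
    t = temperature

    # 1) Output-distribution transfer: KL divergence between temperature-softened
    #    student and teacher distributions, scaled by t^2 as is conventional.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    # 2) Hidden-layer alignment: MSE after projecting the student's hidden states
    #    to the teacher's dimensionality (proj is a caller-supplied nn.Linear).
    hidden_loss = F.mse_loss(proj(student_hidden), teacher_hidden)

    # 3) Attention-matrix alignment: MSE between selected attention maps.
    attn_loss = F.mse_loss(student_attn, teacher_attn)

    # Weighted sum of the three distillation objectives.
    return alpha * soft_loss + beta * hidden_loss + gamma * attn_loss

In practice the hidden-state and attention terms would be summed over a chosen mapping of student layers to teacher layers; the single-layer form above is kept only to show how the three objectives combine into one scalar loss.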