Published 2025-05-30

This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This paper addresses the practical challenges of deploying large language models, particularly inference efficiency and resource consumption, and proposes an improved knowledge distillation framework. Building on the TinyBERT architecture, the method introduces a multi-layer semantic alignment mechanism that strengthens the student model's ability to learn deep semantic and structural information from the teacher. The approach jointly transfers output distributions, hidden-layer representations, and attention matrices, and a combined loss function is designed to optimize these distillation objectives together. During training, the student model retains a lightweight structure while effectively inheriting the teacher's expressive power, which improves its generalization and stability in multi-task scenarios. Experiments are conducted on the GLUE benchmark, with evaluation covering training dynamics, output-distribution learning, task stability, and inference speed on low-resource devices. The results show that the proposed method outperforms mainstream distillation models on several metrics and demonstrates strong compatibility, efficiency, and deployment adaptability. These findings further validate the effectiveness of multi-layer alignment strategies for improving compact language models and provide a technical foundation for building high-performance, low-cost natural language processing systems.
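
To make the combined objective concrete, the following is a minimal sketch, not the authors' released code, of a three-term distillation loss of the kind described above: a softened output-distribution term, a TinyBERT-style hidden-state alignment term with a linear projection, and an attention-matrix alignment term. The function and argument names (distillation_loss, proj, alpha, beta, gamma) and the temperature value are illustrative assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden,
                      student_attn, teacher_attn,
                      proj, temperature=2.0,
                      alpha=1.0, beta=1.0, gamma=1.0):
    """Combine output, hidden-state, and attention distillation terms (sketch)."""
    t = temperature

    # 1) Output-distribution transfer: KL divergence between temperature-softened
    #    student and teacher distributions, scaled by t^2 as is conventional.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

    # 2) Hidden-layer alignment: MSE after projecting the student's hidden states
    #    to the teacher's dimensionality (proj is a caller-supplied nn.Linear).
    hidden_loss = F.mse_loss(proj(student_hidden), teacher_hidden)

    # 3) Attention-matrix alignment: MSE between selected attention maps.
    attn_loss = F.mse_loss(student_attn, teacher_attn)

    # Weighted sum of the three distillation objectives.
    return alpha * soft_loss + beta * hidden_loss + gamma * attn_loss

In practice the hidden-state and attention terms would be summed over a chosen mapping of student layers to teacher layers; the single-layer form above is kept only to show how the three objectives combine into one scalar loss.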