A Deep Learning-Based Predictive Framework for Backend Latency Using AI-Augmented Structured Modeling
Published 2024-10-30
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This paper addresses the high variability and low predictability of API response times in backend systems. It proposes a deep learning method that combines structured modeling with latency-sensitive optimization. The model builds on Deep & Cross Network v2 and incorporates a Load-Aware Feature Fusion (LAFF) module that dynamically models interactions between system states and request features. A Latency-Sensitive Loss Adjustment (LSLA) mechanism adds a delay-weighted loss function to improve prediction accuracy on high-latency samples. Extensive experiments are conducted on the structured Alibaba Cluster Trace dataset. The proposed method outperforms mainstream time series models and structured modeling approaches across multiple regression metrics, achieving lower MSE and MAE while maintaining a high R² value. Ablation studies further validate the effectiveness of each module. The model remains stable and robust under various interference conditions, including increased proportions of delayed samples, injected feature noise, and varying time window settings. The study demonstrates that the proposed method provides a more accurate foundation for response time prediction in backend systems and supports scheduling and resource optimization decisions. This work offers a practical path for applying struct.
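The abstract does not give the form of the delay-weighted loss used by LSLA. To make the idea concrete, the following is a minimal sketch of one possible delay-weighted squared-error loss, not the paper's formulation. The function name `latency_weighted_mse`, the parameter `alpha`, and the linear weighting rule (weight grows with the true latency relative to the largest latency in the batch) are all assumptions introduced here for illustration.

```python
from typing import Sequence


def latency_weighted_mse(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    alpha: float = 1.0,
) -> float:
    """Weighted mean squared error that emphasizes high-latency samples.

    Hypothetical weighting rule (an assumption, not taken from the paper):
        w_i = 1 + alpha * (y_true_i / max(y_true))
    Latencies are assumed to be non-negative. With alpha = 0 the result
    reduces to the ordinary MSE.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    max_latency = max(y_true)
    if max_latency <= 0:
        # No positive latency to scale by: fall back to uniform weights.
        weights = [1.0 for _ in y_true]
    else:
        weights = [1.0 + alpha * (y / max_latency) for y in y_true]

    # Normalize by the sum of weights so the loss stays on the MSE scale.
    weighted_sq_err = sum(
        w * (t - p) ** 2 for w, t, p in zip(weights, y_true, y_pred)
    )
    return weighted_sq_err / sum(weights)


# Two requests: a fast one (10 ms) and a slow one (100 ms).
latencies = [10.0, 100.0]
miss_on_slow = latency_weighted_mse(latencies, [10.0, 90.0])   # error on the slow request
miss_on_fast = latency_weighted_mse(latencies, [20.0, 100.0])  # same-size error on the fast one
```

Under this sketch, an error of the same magnitude costs more when it falls on the slow request than on the fast one, which is the behavior a delay-weighted loss is meant to produce. Setting `alpha` to 0 removes the emphasis and recovers plain MSE.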