Vol. 4 No. 11 (2025)
Articles

Deep Image Classification via Spatiotemporal Feature Fusion Networks

Published 2025-11-30

How to Cite

Ackerly, R. (2025). Deep Image Classification via Spatiotemporal Feature Fusion Networks. Journal of Computer Technology and Software, 4(11). Retrieved from https://www.ashpress.org/index.php/jcts/article/view/233

Abstract

This study proposes a deep image classification network that fuses spatiotemporal features to address the limitations of traditional methods in local feature extraction and global dependency modeling. At the input stage, multi-scale convolution is applied to extract spatial structures and fine-grained textures under different receptive fields. An attention mechanism is then used to model temporal dynamics, effectively capturing correlations between time steps and enhancing the ability to represent dynamic changes. In the feature fusion stage, spatial and temporal features are weighted and integrated into a unified high-level representation, which is then fed into the classification layer for prediction. To ensure stability and generalization, regularization is introduced to suppress overfitting, and systematic comparative and sensitivity experiments are conducted for validation. Results show that the proposed method outperforms baseline models on multiple metrics, including AUC, accuracy, precision, and recall, demonstrating the advantages of spatiotemporal feature fusion. Data reduction experiments further reveal the impact of dataset scale on performance, underscoring the importance of sufficient data for spatiotemporal modeling. Overall, the method shows superior performance, robustness, and adaptability, providing a new solution for classification tasks in complex image scenarios.
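The pipeline the abstract describes (multi-scale spatial convolution, attention over time steps, and weighted fusion of the two feature streams) can be sketched in miniature. The sketch below is illustrative only, not the paper's implementation: every function name, kernel size, and the mean-query attention scheme are assumptions chosen to make the stages concrete in plain NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducible illustration


def conv2d_valid(x, k):
    # naive valid 2-D cross-correlation, for illustration only
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out


def multi_scale_spatial(x, kernel_sizes=(3, 5)):
    # spatial features under different receptive fields:
    # convolve at each scale, global-average-pool, concatenate
    feats = []
    for ks in kernel_sizes:
        k = rng.standard_normal((ks, ks)) / ks  # random kernel (assumption)
        feats.append(conv2d_valid(x, k).mean())
    return np.array(feats)  # (n_scales,)


def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()


def temporal_attention(frames):
    # frames: (T, D) per-time-step features; score each step against a
    # mean query, then aggregate with softmax attention weights so that
    # correlations between time steps shape the pooled representation
    q = frames.mean(axis=0)
    scores = frames @ q / np.sqrt(frames.shape[1])
    w = softmax(scores)
    return w @ frames  # (D,) attended temporal feature


def fuse(spatial, temporal, alpha=0.5):
    # weighted integration of the two streams into one unified vector;
    # a classifier head would consume this representation
    return np.concatenate([alpha * spatial, (1.0 - alpha) * temporal])


# toy forward pass over a 5-frame sequence of 8x8 "images"
clip = rng.standard_normal((5, 8, 8))
spatial_feat = multi_scale_spatial(clip[0])                      # (2,)
temporal_feat = temporal_attention(
    np.stack([multi_scale_spatial(f) for f in clip]))            # (2,)
fused = fuse(spatial_feat, temporal_feat)                        # (4,)
```

In a real network the random kernels would be learned, the attention query would be a trainable projection rather than the frame mean, and the fusion weight would typically be learned jointly with the classification layer; the sketch only fixes the data flow between the stages the abstract names.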