Published 2025-04-30
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This paper addresses the information loss and weakened structural representation that arise in high-dimensional sparse data, particularly in feature expression and semantic modeling. A deep data mining method is proposed that integrates a diffusion model with a Transformer architecture. Building on the diffusion process, the method performs structural completion and noise suppression through forward perturbation and reverse reconstruction, and it incorporates multi-layer Transformer modules to strengthen global dependency modeling and multi-scale semantic extraction. Together, these components form a unified "generation-enhancement" architecture that extracts discriminative latent representations even under severe feature sparsity. The method is systematically evaluated on the Reuters-21578 text dataset, where it outperforms mainstream deep mining models in accuracy, precision, and recall, and demonstrates strong robustness and stability. Further experiments on sparsity sensitivity and training-process visualization confirm the model's feature-learning capability and convergence behavior in high-dimensional sparse settings.
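The abstract does not spell out the architecture in detail, but the "generation-enhancement" idea (a diffusion forward perturbation whose reverse reconstruction is driven by a Transformer denoiser) can be illustrated with a minimal PyTorch sketch. Everything concrete below is an assumption for illustration, not the authors' configuration: the feature dimension, the linear noise schedule, the choice to split each feature vector into fixed-size tokens so self-attention can model global dependencies, and the standard epsilon-prediction training objective.

```python
import torch
import torch.nn as nn

class DiffusionTransformer(nn.Module):
    """Illustrative sketch (not the paper's exact model): a Transformer
    denoiser inside a DDPM-style diffusion loop. Each high-dimensional
    sparse feature vector is split into tokens so self-attention can
    capture global dependencies across feature groups."""

    def __init__(self, feat_dim=2000, token_dim=100, d_model=256,
                 n_layers=4, n_heads=8, n_steps=1000):
        super().__init__()
        assert feat_dim % token_dim == 0
        self.n_tokens = feat_dim // token_dim
        self.n_steps = n_steps
        # Assumed linear noise schedule; alpha_bar is its cumulative product.
        betas = torch.linspace(1e-4, 0.02, n_steps)
        self.register_buffer("alpha_bar", torch.cumprod(1.0 - betas, dim=0))
        self.in_proj = nn.Linear(token_dim, d_model)
        self.t_embed = nn.Embedding(n_steps, d_model)      # diffusion-step embedding
        self.pos_embed = nn.Parameter(torch.zeros(1, self.n_tokens, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.out_proj = nn.Linear(d_model, token_dim)

    def forward(self, x0, t):
        # Forward perturbation: x_t = sqrt(a_bar)*x_0 + sqrt(1 - a_bar)*eps.
        eps = torch.randn_like(x0)
        a_bar = self.alpha_bar[t].unsqueeze(-1)
        x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
        # Tokenize x_t; the Transformer predicts the injected noise, which
        # drives the reverse reconstruction/completion step.
        tok = x_t.view(-1, self.n_tokens, x_t.shape[-1] // self.n_tokens)
        h = self.in_proj(tok) + self.pos_embed + self.t_embed(t).unsqueeze(1)
        h = self.encoder(h)
        pred_eps = self.out_proj(h).flatten(1)
        return pred_eps, eps

# One training step on a batch of (densified) sparse feature vectors.
model = DiffusionTransformer()
x0 = torch.randn(32, 2000)
t = torch.randint(0, model.n_steps, (32,))
pred_eps, eps = model(x0, t)
loss = nn.functional.mse_loss(pred_eps, eps)   # standard epsilon-prediction loss
```

In this sketch the discriminative latent representation would be read off the encoder's hidden states `h` rather than the noise prediction; how the paper extracts and uses that representation for downstream mining is not specified in the abstract.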