Published 2024-07-30

This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This paper addresses semantic inconsistency, difficult cross-modal alignment, and inefficient standardized mapping in the ETL (Extract-Transform-Load) process for multi-source heterogeneous medical data, and proposes a deep learning-based method for multimodal fusion and unified semantic embedding modeling. The method extracts features from each modality through a structured feature encoder, a text encoder, and a categorical encoder, and uses a cross-modal attention mechanism to construct a shared semantic embedding space, enabling efficient alignment and semantic consistency across structured data, unstructured text, and coded information. In the mapping prediction stage, the model combines attention-enhanced semantic matching with a confidence calibration mechanism, improving both the recognition of complex field relationships and the accuracy of predicted mappings. The experiments cover multi-dimensional evaluations, including hyperparameter, environmental, and data sensitivity, verifying the stability and robustness of the method across settings. Comparisons with representative baseline models show that the method achieves the best performance on key metrics such as ACC, AUC, and F1-Score, with clear advantages on medical data with high missing rates and on data spanning multiple coding systems. The findings confirm that the proposed method reduces reliance on manual rules and lowers mapping maintenance costs while improving medical data integration and interoperability, providing a solid technical foundation for high-quality medical data analysis and applications.
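For illustration only, the sketch below shows one way an architecture of the kind described in the abstract could be organized in PyTorch: three modality-specific encoders project structured, text, and categorical inputs into a shared space, a cross-modal attention layer fuses them, and a mapping scorer applies temperature scaling as a simple stand-in for confidence calibration. All module names, dimensions, and layer choices here are assumptions made for the sketch; the paper's actual implementation may differ.

```python
# Illustrative sketch, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultimodalFieldEncoder(nn.Module):
    """Encodes a field's structured, text, and categorical features into one shared embedding."""

    def __init__(self, num_dim=32, text_dim=768, num_codes=500, d_model=256, n_heads=4):
        super().__init__()
        # Structured (numeric) feature encoder
        self.struct_enc = nn.Sequential(
            nn.Linear(num_dim, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        # Text encoder input: precomputed sentence embeddings (e.g. from a BERT-style model)
        self.text_enc = nn.Linear(text_dim, d_model)
        # Categorical / code encoder
        self.code_enc = nn.Embedding(num_codes, d_model)
        # Cross-modal attention: each modality token attends to the others
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x_struct, x_text, x_code):
        # One token per modality: (batch, 3, d_model)
        tokens = torch.stack(
            [self.struct_enc(x_struct), self.text_enc(x_text), self.code_enc(x_code)], dim=1
        )
        fused, _ = self.cross_attn(tokens, tokens, tokens)
        fused = self.norm(tokens + fused)
        # Mean-pool the modality tokens into a single shared-space embedding
        return fused.mean(dim=1)


class MappingScorer(nn.Module):
    """Scores source/target field pairs; a learned temperature calibrates the confidence."""

    def __init__(self):
        super().__init__()
        self.temperature = nn.Parameter(torch.ones(1))

    def forward(self, src_emb, tgt_emb):
        # Cosine similarity between all source/target embedding pairs
        logits = F.cosine_similarity(src_emb.unsqueeze(1), tgt_emb.unsqueeze(0), dim=-1)
        # Temperature scaling before softmax over candidate target fields
        return (logits / self.temperature).softmax(dim=-1)


if __name__ == "__main__":
    enc, scorer = MultimodalFieldEncoder(), MappingScorer()
    # 8 source fields and 12 candidate target fields, with random placeholder features
    src = enc(torch.randn(8, 32), torch.randn(8, 768), torch.randint(0, 500, (8,)))
    tgt = enc(torch.randn(12, 32), torch.randn(12, 768), torch.randint(0, 500, (12,)))
    probs = scorer(src, tgt)  # (8, 12): calibrated probabilities over target fields
    print(probs.shape)
```

In this sketch, each source field is mapped to the target field with the highest calibrated probability; low maximum probability can be used to flag uncertain mappings for manual review, consistent with the goal of reducing manual rule maintenance.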