Contrastive Learning Framework for Multimodal Knowledge Graph Construction and Data-Analytical Reasoning
Published 2024-07-30
This work is licensed under a Creative Commons Attribution 4.0 International License.
Abstract
This study addresses key challenges in multimodal knowledge graph construction, including difficult semantic alignment, insufficient modality fusion, and limited entity-relation extraction capability. It proposes a multimodal graph construction and relation extraction algorithm that incorporates a contrastive learning mechanism. In the feature encoding stage, the method learns independent representations for each modality, such as text and images, and then maps them into a shared semantic space through linear projection. In the semantic alignment stage, the model introduces a contrastive learning objective that constructs positive and negative sample pairs to enforce consistency between cross-modal representations, improving both the aggregation and the discriminability of entity semantics. For structural modeling, the algorithm integrates a graph-structure-aware mechanism that leverages contextual information from adjacent entities to strengthen the structural completeness of entity representations, and a relation classification module over entity pairs completes high-quality triple extraction. To validate the method, a series of sensitivity experiments is conducted, covering variations in hyperparameters, data scale, and input noise, with evaluation focused on entity recognition accuracy, relation prediction performance, and the stability of semantic alignment. Experimental results show that the proposed method achieves strong performance across multiple evaluation metrics, demonstrates good robustness and generalization, and effectively improves the construction quality and structural expressiveness of multimodal knowledge graphs.
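
To make the alignment step concrete, the sketch below illustrates one plausible realization of the projection-plus-contrastive-objective idea described above: per-modality features are linearly projected into a shared space and trained with an InfoNCE-style loss using matched pairs as positives and in-batch mismatches as negatives. All names, dimensions, and the temperature value are assumptions for illustration, not the paper's exact implementation.

```python
# Hypothetical sketch of cross-modal contrastive alignment (assumed details).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAligner(nn.Module):
    def __init__(self, text_dim=768, image_dim=2048, shared_dim=512, temperature=0.07):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)    # text -> shared semantic space
        self.image_proj = nn.Linear(image_dim, shared_dim)  # image -> shared semantic space
        self.temperature = temperature

    def forward(self, text_feats, image_feats):
        # Project both modalities and L2-normalize so similarity is cosine-based.
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.image_proj(image_feats), dim=-1)

        # Similarity matrix: diagonal entries are the positive (aligned) pairs,
        # off-diagonal entries act as in-batch negatives.
        logits = t @ v.t() / self.temperature
        targets = torch.arange(logits.size(0), device=logits.device)

        # Symmetric InfoNCE loss over text->image and image->text directions.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

# Usage with random stand-in features for a batch of 32 aligned text/image entities.
aligner = CrossModalAligner()
loss = aligner(torch.randn(32, 768), torch.randn(32, 2048))
loss.backward()
```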