Article Abstract
TONG Dawei, FENG Kaiyue, YU Jia, WANG Xiaoling. Deep learning model for identifying construction machinery activities in underground caverns based on real-time multimodal data[J]. Journal of Hydraulic Engineering (SHUILI XUEBAO), 2025, 56(9): 1143-1154
Deep learning model for identifying construction machinery activities in underground caverns based on real-time multimodal data
Received: 2024-06-07  Revised: 2025-09-18
DOI:10.13243/j.cnki.slxb.20240350
Keywords: underground caverns; construction machinery activity recognition; multi-modal data; attention mechanism; feature fusion
Funding: National Natural Science Foundation of China (U24B20111, 52279137, 52379132)
Author         Affiliation                                                                                              E-mail
TONG Dawei     National Key Laboratory of Hydraulic Engineering Intelligent Construction and Operation, Tianjin University, Tianjin 300072
FENG Kaiyue    National Key Laboratory of Hydraulic Engineering Intelligent Construction and Operation, Tianjin University, Tianjin 300072
YU Jia         National Key Laboratory of Hydraulic Engineering Intelligent Construction and Operation, Tianjin University, Tianjin 300072    yujia@tju.edu.cn
WANG Xiaoling  National Key Laboratory of Hydraulic Engineering Intelligent Construction and Operation, Tianjin University, Tianjin 300072
Abstract:
      Identification of construction machinery activities is an effective approach for analyzing production efficiency and ensuring operational safety. Current recognition methods focus primarily on the characteristics of individual modalities such as kinematics, vision, and acoustics, without adequately considering the intrinsic correlations among multimodal data, which limits their effectiveness in the dimly lit, confined, and noisy environment of underground caverns. To address this limitation, this study proposes a Transformer-based deep learning model that recognizes construction machinery activities in underground caverns from real-time multimodal data, leveraging the ability of attention mechanisms to capture long-range dependencies across modalities. First, video, audio, and kinematic data are collected in real time during machinery operation, and preliminary features of the three modalities are extracted with the S3D, VGGish, and Conformer models, respectively. Cross-modal attention and self-attention mechanisms are then applied to integrate these preliminary features into multimodal hybrid features. Finally, a multi-head attention mechanism fuses the preliminary and hybrid features, and activity classification is performed on the fused representation. A case study shows that the proposed model achieves a recognition accuracy of 98.14% and an F1 score of 96.47%, improvements of 6.38% and 9.13%, respectively, over the best-performing single-modality model, providing a new approach for recognizing construction machinery activities in underground caverns.
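To make the fusion stage concrete, the sketch below shows, in PyTorch, one way the attention-based fusion described above could be wired together. It is a minimal sketch under stated assumptions: the S3D, VGGish, and Conformer backbones are assumed to have already produced per-modality feature sequences of a common dimension (stubbed here with random tensors), and the class names, modality pairings, layer sizes, and pooling strategy are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumptions): per-modality features from S3D (video),
# VGGish (audio), and Conformer (kinematics) are assumed to be already
# extracted and projected to sequences of shape (batch, seq_len, d_model).
import torch
import torch.nn as nn


class CrossModalBlock(nn.Module):
    """Query one modality with another (cross-modal attention), then refine
    the mixed sequence with self-attention, roughly following the abstract's
    'cross-modal attention + self-attention' integration step."""
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, query_feats, context_feats):
        # Cross-modal attention: query_feats attends to context_feats.
        x, _ = self.cross_attn(query_feats, context_feats, context_feats)
        x = self.norm1(query_feats + x)
        # Self-attention over the mixed sequence.
        y, _ = self.self_attn(x, x, x)
        return self.norm2(x + y)


class MultimodalFusionClassifier(nn.Module):
    """Hypothetical fusion head: builds pairwise hybrid features with
    cross-modal blocks, then fuses preliminary + hybrid features with a
    final multi-head attention layer before classification."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, n_classes: int = 6):
        super().__init__()
        self.video_from_audio = CrossModalBlock(d_model, n_heads)
        self.audio_from_kinematics = CrossModalBlock(d_model, n_heads)
        self.kinematics_from_video = CrossModalBlock(d_model, n_heads)
        self.fusion_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, video, audio, kin):
        # Hybrid features: each modality enriched by another modality.
        hybrid = torch.cat([
            self.video_from_audio(video, audio),
            self.audio_from_kinematics(audio, kin),
            self.kinematics_from_video(kin, video),
        ], dim=1)
        # Fuse preliminary and hybrid features with multi-head attention.
        prelim = torch.cat([video, audio, kin], dim=1)
        fused, _ = self.fusion_attn(prelim, hybrid, hybrid)
        # Pool over time and classify the machinery activity.
        return self.classifier(fused.mean(dim=1))


if __name__ == "__main__":
    B, T, D = 2, 16, 256  # batch, sequence length, feature dimension (assumed)
    model = MultimodalFusionClassifier(d_model=D, n_classes=6)
    logits = model(torch.randn(B, T, D), torch.randn(B, T, D), torch.randn(B, T, D))
    print(logits.shape)  # torch.Size([2, 6])
```

The number and direction of the cross-modal blocks, the temporal pooling, and the classification head would in practice follow the paper's full architecture; this snippet only illustrates how standard multi-head attention layers can realize the cross-modal, self-, and fusion attention steps named in the abstract.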