Journal of Northeastern University (Natural Science) ›› 2025, Vol. 46 ›› Issue (1): 1-8. DOI: 10.12068/j.issn.1005-3026.2025.20230204

• Information & Control •

Segmentation Method for Glass-like Object Based on Cross-Modal Fusion

Ying-cai WAN, Li-jin FANG, Qian-kun ZHAO

  1. School of Robot Science & Engineering, Northeastern University, Shenyang 110169, China. Corresponding author: FANG Li-jin, E-mail: ljfang@mail.neu.edu.cn
  • Received: 2023-07-17  Online: 2025-01-15  Published: 2025-03-25
  • About the authors: WAN Ying-cai (b. 1990), male, from Jingyuan, Gansu Province, Ph.D. candidate at Northeastern University;
    FANG Li-jin (b. 1965), male, from Shenyang, Liaoning Province, professor and doctoral supervisor at Northeastern University.
  • Funding: National Natural Science Foundation of China (62273081); Basic Research Program of Liaoning Province (2022JH2/101300202)



Abstract:

Objects such as glass and mirrors lack distinct textures and shapes, which makes them difficult for traditional semantic segmentation methods to identify and compromises the accuracy of downstream visual tasks. To address this problem, a Transformer-based RGBD cross-modal fusion method is proposed for segmenting glass-like objects. The method employs a Transformer network in which a cross-modal fusion module extracts self-attention features from the RGB and depth inputs, and a multi-layer perceptron (MLP) integrates the RGBD features, fusing the three types of attention features. The RGB and depth features are fed back to their respective branches to strengthen the network's feature extraction capability. Finally, a semantic segmentation decoder combines the fused features from four stages to output the segmentation results for glass-like objects. Compared with the EBLNet method, the proposed method improves the intersection over union (IoU) on the GDD, Trans10k, and MSD datasets by 1.64%, 2.26%, and 7.38%, respectively; compared with the PDNet method on the RGBD-Mirror dataset, it improves IoU by 9.49%, verifying its effectiveness.
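The abstract describes the architecture only at a high level. The minimal PyTorch sketch below illustrates one plausible reading of a single cross-modal fusion stage: per-modality self-attention, an MLP that integrates the concatenated RGBD features, and feedback of the fused features to both branches. All module names, dimensions, and the residual form of the feedback are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Sketch of one cross-modal fusion stage (assumed design, not the paper's)."""
    def __init__(self, dim: int = 64, num_heads: int = 4):
        super().__init__()
        # Per-modality self-attention over token sequences (hypothetical config).
        self.attn_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_depth = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_depth = nn.LayerNorm(dim)
        # MLP that integrates the concatenated RGBD attention features.
        self.fuse_mlp = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        # rgb, depth: (batch, tokens, dim) features from one encoder stage.
        r = self.norm_rgb(rgb)
        d = self.norm_depth(depth)
        r_attn, _ = self.attn_rgb(r, r, r)    # RGB self-attention
        d_attn, _ = self.attn_depth(d, d, d)  # depth self-attention
        fused = self.fuse_mlp(torch.cat([r_attn, d_attn], dim=-1))
        # Feed the fused features back to both branches (residual form assumed).
        return rgb + fused, depth + fused, fused

# Usage: one fusion block per encoder stage; a decoder would then combine the
# four per-stage `fused` outputs to predict the glass-like-object mask.
rgb_feat = torch.randn(1, 196, 64)
depth_feat = torch.randn(1, 196, 64)
rgb_out, depth_out, fused = CrossModalFusion()(rgb_feat, depth_feat)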

Key words: attention, semantic segmentation, glass-like object (GLO), cross-modal, depth estimation
