Journal of Northeastern University(Natural Science) ›› 2025, Vol. 46 ›› Issue (1): 1-8.DOI: 10.12068/j.issn.1005-3026.2025.20230204

• Information & Control •    

Segmentation Method for Glass-like Object Based on Cross-Modal Fusion

Ying-cai WAN, Li-jin FANG, Qian-kun ZHAO   

  1. School of Robot Science & Engineering, Northeastern University, Shenyang 110169, China. Corresponding author: FANG Li-jin, E-mail: ljfang@mail.neu.edu.cn
  • Received:2023-07-17 Online:2025-01-15 Published:2025-03-25

Abstract:

Due to their lack of distinct texture and shape cues, objects such as glass and mirrors pose challenges to traditional semantic segmentation algorithms and compromise the accuracy of downstream visual tasks. A Transformer-based RGBD cross-modal fusion method is proposed for segmenting glass-like objects. The method uses a Transformer network that extracts self-attention features from the RGB and depth modalities through a cross-modal fusion module and integrates the RGBD features with a multi-layer perceptron (MLP), fusing three types of attention features. The fused RGB and depth features are fed back to their respective branches to strengthen the network's feature extraction. Finally, a semantic segmentation decoder combines the features from the four stages to output the segmentation results for glass-like objects. Compared with the EBLNet method, the intersection over union (IoU) of the proposed method on the GDD, Trans10K, and MSD datasets is improved by 1.64%, 2.26%, and 7.38%, respectively. Compared with the PDNet method on the RGBD-Mirror dataset, the IoU is improved by 9.49%, verifying the method's effectiveness.
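The abstract describes fusing three kinds of attention features (RGB self-attention, depth self-attention, and a cross-modal term) with an MLP and feeding the result back to each branch. The paper's exact module is not reproduced here; the following is a minimal NumPy sketch under simplifying assumptions (single-head attention, identity Q/K/V projections, a one-layer MLP, residual feedback), with all function and variable names being illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over token sequences of shape (N, d).
    return softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1) @ v

def fusion_block(rgb, depth, w_mlp):
    # Three attention features: RGB self-attention, depth self-attention,
    # and an RGB-query/depth-key cross-attention term (one possible choice
    # of "third" feature; the paper may define it differently).
    a_rgb = attention(rgb, rgb, rgb)
    a_depth = attention(depth, depth, depth)
    a_cross = attention(rgb, depth, depth)
    # One-layer MLP fuses the concatenated attention features.
    fused = np.concatenate([a_rgb, a_depth, a_cross], axis=-1) @ w_mlp
    # Feed the fused features back to each branch as a residual update.
    return rgb + fused, depth + fused, fused

N, d = 16, 32                       # tokens per stage, channel dimension
rgb = rng.normal(size=(N, d))       # stand-ins for one stage's RGB tokens
depth = rng.normal(size=(N, d))     # and depth tokens
w_mlp = rng.normal(size=(3 * d, d)) / np.sqrt(3 * d)

rgb2, depth2, fused = fusion_block(rgb, depth, w_mlp)
print(rgb2.shape, fused.shape)  # (16, 32) (16, 32)
```

In the full network this block would run at each of the four encoder stages, and the decoder would combine the four fused outputs to predict the segmentation mask.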

Key words: attention, semantic segmentation, glass-like object (GLO), cross-modal, depth estimation
