Objects such as glass and mirrors lack distinct textures and shapes, which makes them difficult for traditional semantic segmentation algorithms and compromises the accuracy of downstream visual tasks. A Transformer-based RGBD cross-modal fusion method is proposed for segmenting glass-like objects. The method uses a Transformer network to extract self-attention features from the RGB and depth inputs, and a cross-modal fusion module then fuses the three types of attention features into a joint RGBD representation with a multi-layer perceptron (MLP). The fused RGB and depth features are fed back to their respective branches to strengthen the network's feature extraction. Finally, a semantic segmentation decoder combines the features from all four stages to output the segmentation results for glass-like objects. Compared with EBLNet, the proposed method improves intersection over union (IoU) on the GDD, Trans10K, and MSD datasets by 1.64%, 2.26%, and 7.38%, respectively; compared with PDNet on the RGBD-Mirror dataset, IoU improves by 9.49%, verifying the method's effectiveness.
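To make the fusion step concrete, the sketch below shows one plausible reading of the described block in PyTorch: self-attention is computed within each modality, a cross-modal attention relates RGB queries to depth keys and values, and an MLP fuses the three attention features before the result is fed back to both branches. The class name, the choice of RGB-to-depth cross-attention, and the residual feedback are illustrative assumptions based only on the abstract, not the paper's released implementation.

```python
# Minimal sketch of a cross-modal fusion block, assuming PyTorch and
# flattened (batch, tokens, channels) features from one backbone stage.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse RGB and depth tokens via self-, cross-attention, and an MLP."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Self-attention within each modality.
        self.rgb_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.depth_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Cross-modal attention: RGB queries attend to depth keys/values
        # (an assumption; the paper does not specify the direction here).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # MLP that fuses the three attention features into one RGBD feature.
        self.fuse_mlp = nn.Sequential(
            nn.Linear(3 * dim, dim),
            nn.GELU(),
            nn.Linear(dim, dim),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
        # Self-attention features of each modality.
        rgb_sa, _ = self.rgb_attn(rgb, rgb, rgb)
        depth_sa, _ = self.depth_attn(depth, depth, depth)
        # Cross-modal attention feature.
        cross, _ = self.cross_attn(rgb, depth, depth)
        # MLP fusion of the three types of attention features.
        fused = self.norm(self.fuse_mlp(torch.cat([rgb_sa, depth_sa, cross], dim=-1)))
        # Feed the fused feature back to both branches (residual update).
        return rgb + fused, depth + fused, fused


# Toy usage: one stage with 196 tokens of dimension 64.
if __name__ == "__main__":
    block = CrossModalFusion(dim=64)
    rgb = torch.randn(2, 196, 64)
    depth = torch.randn(2, 196, 64)
    rgb_out, depth_out, fused = block(rgb, depth)
    print(rgb_out.shape, depth_out.shape, fused.shape)  # (2, 196, 64) each
```

In a full model, one such block would sit at each of the four backbone stages, and the decoder would aggregate the per-stage fused features into the final segmentation map.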