Joint Optimization Approach for Medical Image Compression and Vision Tasks

doi:10.12068/j.issn.1005-3026.2026.20259020

Abstract

Medical image processing pipelines that rely on independent encoding components cannot jointly optimize data compression and machine vision tasks. To address this problem, an end-to-end machine vision task-driven medical image compression network (MVMICNet) is proposed, which unifies data compression and medical image analysis in a single end-to-end model. To preserve machine vision task performance after compression, a task-aware improved rate-accuracy loss function is designed; by introducing task-related loss terms, it dynamically balances bit rate, reconstructed-image distortion, and machine vision task accuracy during optimization. In addition, MVMICNet is trained in stages, with the optimization tailored to the characteristics of each machine vision task so that the model captures the feature information that is critical for diagnosis. This improves compression efficiency and task performance together and yields greater robustness in complex medical application scenarios. Finally, the effectiveness of the framework is verified on semantic segmentation and object detection tasks.

Key words: medical image compression, semantic segmentation, object detection, convolutional neural network (CNN), task-driven optimization
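
The abstract describes a loss that balances bit rate, reconstruction distortion, and task accuracy, but does not state its formula. The following is a minimal sketch of one common way to combine three such terms, assuming a plain weighted sum; the weight names `w_dist` and `w_task`, the use of mean squared error as the distortion, and cross-entropy as the task loss are all illustrative assumptions and not the authors' definition.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equal-length pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def cross_entropy(class_probs, label):
    """Negative log-likelihood of the true class (stand-in task loss)."""
    return -math.log(class_probs[label])


def rate_accuracy_loss(bpp, original, reconstructed, class_probs, label,
                       w_dist=1.0, w_task=1.0):
    """Assumed form: rate + w_dist * distortion + w_task * task loss."""
    distortion = mse(original, reconstructed)
    task = cross_entropy(class_probs, label)
    return bpp + w_dist * distortion + w_task * task


# Toy values for illustration only; they are not taken from the paper.
pixels = [0.0, 1.0]
decoded = [0.0, 0.5]
probs = [0.25, 0.75]

rate_distortion_only = rate_accuracy_loss(0.1, pixels, decoded, probs, 1,
                                          w_dist=2.0, w_task=0.0)
with_task_term = rate_accuracy_loss(0.1, pixels, decoded, probs, 1,
                                    w_dist=2.0, w_task=1.0)
print(rate_distortion_only, with_task_term)
```

Setting `w_task` to zero reduces the sketch to an ordinary rate-distortion objective, which makes the contribution of the task term easy to isolate.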

CLC number: TP 391

Chao YAO, Zi-xuan GAO, Jun-ru CHEN, Yi-peng LU. Joint Optimization Approach for Medical Image Compression and Vision Tasks[J]. Journal of Northeastern University (Natural Science), 2026, 47(1): 11-19.

Figures and tables (11)

Fig.1 Schematic diagram of MVMICNet framework

Fig.2 Schematic diagram of MVMICNet framework at task-specific optimization stage

Table 1 Comparison results of semantic segmentation accuracy on CVC-ColonDB dataset

Algorithm	Bpp	PSNR/dB	MS-SSIM	mIoU
				Stage 1	Stage 2
BPG	0.090	30.02	0.908	0.498 7
	0.100	31.40	0.929	0.559 1
	0.114	33.01	0.944	0.618 3
	0.132	34.53	0.956	0.668 3
MBT2018-Mean	0.080	32.55	0.907	0.613 5
	0.094	33.84	0.938	0.674 5
	0.111	35.15	0.952	0.722 0
	0.130	35.98	0.969	0.754 8
Cheng2020-Anchor	0.068	36.45	0.934	0.697 1
	0.089	37.98	0.952	0.743 5
	0.112	39.04	0.972	0.789 8
	0.137	39.82	0.978	0.819 2
MVMICNet	0.065	40.38	0.982	0.780 4	0.822 6
	0.083	41.25	0.984	0.825 2	0.831 7
	0.105	42.08	0.987	0.832 6	0.849 0
	0.131	42.83	0.989	0.833 7	0.864 5
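
As a worked reading of Table 1, the short sketch below computes how much the second training stage adds to the mIoU of MVMICNet at each bit rate. The numbers are copied from the MVMICNet rows of the table; the variable names are illustrative.

```python
# (Bpp, mIoU after stage 1, mIoU after stage 2) for MVMICNet, from Table 1.
mvmicnet_rows = [
    (0.065, 0.7804, 0.8226),
    (0.083, 0.8252, 0.8317),
    (0.105, 0.8326, 0.8490),
    (0.131, 0.8337, 0.8645),
]

# Absolute mIoU gain contributed by the second stage at each rate point.
stage2_gains = [round(stage2 - stage1, 4) for _, stage1, stage2 in mvmicnet_rows]
mean_gain = sum(stage2_gains) / len(stage2_gains)

for (bpp, _, _), gain in zip(mvmicnet_rows, stage2_gains):
    print(f"Bpp={bpp:.3f}  stage-2 gain in mIoU={gain:.4f}")
print(f"mean gain={mean_gain:.4f}")
```

The gain is positive at every listed rate, largest at the lowest rate (0.065 Bpp) and smallest at 0.083 Bpp.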

Fig.3 Comparison of code rate-accuracy curves for different algorithms on CVC-ColonDB and ChestX-Det datasets: (a) CVC-ColonDB; (b) ChestX-Det; (c) ChestX-Det; (d) ChestX-Det.

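Fig.4 Visual result comparison of semantic segmentation by different algorithms on CVC-ColonDB dataset: (a) original image; (b) semantic segmentation result of the original image; (c) semantic segmentation result of the BPG-compressed image; (d) semantic segmentation result of the Cheng2020-Anchor-compressed image; (e) semantic segmentation result of the MVMICNet first-stage compressed image; (f) semantic segmentation result of the MVMICNet second stage.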

Fig.5 Reconstruction results of the same image from CVC-ColonDB dataset by different algorithms

Table 2 Comparison results 1 of object detection accuracy on ChestX-Det dataset

Metric			MVMICNet				BPG
			$\lambda_2 = 0.000\,01$	$\lambda_2 = 0.000\,1$	$\lambda_2 = 0.001$	$\lambda_2 = 0.1$	q=34	q=31	q=28	q=25
Bpp			0.047	0.060	0.074	0.089	0.046	0.060	0.079	0.091
PSNR/dB			41.14	41.93	42.57	43.10	35.98	37.50	38.76	39.52
MS-SSIM			0.984 7	0.987 7	0.989 8	0.991 3	0.959 4	0.969 5	0.976 7	0.983 4
mAP	IoU=0.50:0.95	Stage 1	0.079 7	0.089 9	0.101 0	0.115 4	0.030 0	0.049 1	0.030 0	0.093 4
	IoU=0.50:0.95	Stage 2	0.083 8	0.094 9	0.105 6	0.120 1	0.030 0	0.049 1	0.030 0	0.093 4
	IoU=0.50	Stage 1	0.164 9	0.190 1	0.218 4	0.241 0	0.064 0	0.106 9	0.064 0	0.203 3
	IoU=0.50	Stage 2	0.175 9	0.203 6	0.230 4	0.253 6	0.064 0	0.106 9	0.064 0	0.203 3
	IoU=0.75	Stage 1	0.069 3	0.082 0	0.092 0	0.100 6	0.023 5	0.036 3	0.023 5	0.084 7
	IoU=0.75	Stage 2	0.071 9	0.084 7	0.093 6	0.103 1	0.023 5	0.036 3	0.023 5	0.084 7

Table 3 Comparison results 2 of object detection accuracy on ChestX-Det dataset

Metric		MBT2018-Mean				Cheng2020-Anchor
		q=5	q=4	q=3	q=2	q=5	q=4	q=3	q=2
Bpp		0.041	0.054	0.072	0.097	0.043	0.061	0.079	0.093
PSNR/dB		37.36	38.60	39.98	40.81	39.41	40.63	41.63	42.36
MS-SSIM		0.969 1	0.974 0	0.978 8	0.984 5	0.979 6	0.983 2	0.985 4	0.986 6
mAP	IoU=0.50:0.95	0.048 4	0.069 9	0.086 4	0.096 7	0.064 9	0.084 0	0.096 7	0.106 2
	IoU=0.50	0.102 4	0.142 7	0.183 1	0.214 1	0.128 2	0.173 4	0.208 2	0.223 7
	IoU=0.75	0.036 1	0.054 2	0.072 3	0.087 1	0.055 4	0.074 7	0.087 6	0.092 4
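
The methods in Tables 2 and 3 were measured at slightly different bit rates, so their mAP values are not directly comparable column by column. One simple way to line them up, sketched here under the assumption that linear interpolation between neighbouring rate points is acceptable, is to evaluate each rate-accuracy curve at a common Bpp. The helper `interpolate` and the choice of 0.074 Bpp are illustrative; BPG is left out only to keep the sketch short.

```python
def interpolate(xs, ys, x):
    """Piecewise-linear interpolation; xs must be increasing and bracket x."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the measured bit-rate range")


# (Bpp list, mAP at IoU=0.50:0.95 list) copied from Tables 2 and 3.
curves = {
    "MVMICNet (stage 2)": ([0.047, 0.060, 0.074, 0.089],
                           [0.0838, 0.0949, 0.1056, 0.1201]),
    "Cheng2020-Anchor": ([0.043, 0.061, 0.079, 0.093],
                         [0.0649, 0.0840, 0.0967, 0.1062]),
    "MBT2018-Mean": ([0.041, 0.054, 0.072, 0.097],
                     [0.0484, 0.0699, 0.0864, 0.0967]),
}

target_bpp = 0.074  # a rate that lies inside every curve's measured range
map_at_target = {
    name: interpolate(bpps, maps, target_bpp)
    for name, (bpps, maps) in curves.items()
}

for name, value in map_at_target.items():
    print(f"{name}: mAP at {target_bpp} Bpp is about {value:.4f}")
```

At this rate the interpolated ordering is MVMICNet, then Cheng2020-Anchor, then MBT2018-Mean, which matches the trend in the tabulated points.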

Fig.6 Visual result comparison of object detection by different algorithms on ChestX-Det dataset

Fig.7 Reconstruction results of the same image from ChestX-Det dataset by different algorithms

Table 4 Impact of parameter $\lambda_2$ on semantic segmentation performance

$\lambda_2$	Bpp	PSNR/dB	mIoU/%	Accuracy/%
0.1	0.252 30	24.250	33.040	45.5
0.001	0.245 60	31.880	45.780	63.0
0.000 1	0.243 20	35.080	58.030	79.9
0.000 01	0.243 10	34.570	52.710	72.6
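
Table 4 can be read as a small hyperparameter sweep. The sketch below, with values copied from the table and illustrative variable names, picks the $\lambda_2$ setting with the highest mIoU and checks how much the bit rate moves across the sweep.

```python
# (lambda_2, Bpp, PSNR in dB, mIoU in %, accuracy in %) copied from Table 4.
sweep = [
    (0.1, 0.25230, 24.250, 33.040, 45.5),
    (0.001, 0.24560, 31.880, 45.780, 63.0),
    (0.0001, 0.24320, 35.080, 58.030, 79.9),
    (0.00001, 0.24310, 34.570, 52.710, 72.6),
]

best_by_miou = max(sweep, key=lambda row: row[3])
best_by_psnr = max(sweep, key=lambda row: row[2])
best_by_accuracy = max(sweep, key=lambda row: row[4])

# Range of bit rates covered by the four settings.
bpp_spread = max(row[1] for row in sweep) - min(row[1] for row in sweep)

print(f"best lambda_2 by mIoU: {best_by_miou[0]}")
print(f"Bpp spread across the sweep: {bpp_spread:.4f}")
```

In these four rows the same setting, $\lambda_2 = 0.000\,1$, is best on PSNR, mIoU, and accuracy, while Bpp changes by less than 0.01 across the sweep.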
