Joint Optimization Approach for Medical Image Compression and Vision Tasks

doi:10.12068/j.issn.1005-3026.2026.20259020

Abstract

In medical image processing, conventional pipelines rely on independent encoding components, which makes it impossible to optimize data compression and machine vision tasks jointly. To address this issue, an end-to-end machine vision task-driven medical image compression network (MVMICNet) is proposed that unifies data compression and medical image analysis in a single end-to-end framework. To preserve machine vision task performance after compression, a task-aware code rate-accuracy loss function is designed: by introducing task-related loss terms, it dynamically balances code rate, reconstructed-image distortion, and machine vision task accuracy during optimization. Furthermore, MVMICNet adopts a stage-wise training approach that optimizes specifically for the different characteristics of each machine vision task, so that the model captures the feature information crucial for diagnosis. As a result, compression efficiency and task performance improve together, and the model shows stronger robustness in complex medical application scenarios. Finally, the effectiveness of the framework is verified on semantic segmentation and object detection tasks.
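
The abstract does not state the loss formula, so the following is only a minimal sketch of how a task-aware rate-accuracy objective of this kind is commonly written: a weighted sum of code rate, reconstruction distortion, and a task loss. The function names, the weight `lambda_1`, the use of MSE as the distortion term, and the use of cross-entropy as the task term are all assumptions made for illustration; only the symbol λ2 appears in the tables of this article, and its exact role is not defined in this excerpt.

```python
import math


def rate_bpp(likelihoods, num_pixels):
    """Estimated code rate in bits per pixel.

    `likelihoods` are the probabilities an entropy model assigns to the
    quantized latent symbols (an assumed interface, not the paper's).
    """
    total_bits = -sum(math.log2(p) for p in likelihoods)
    return total_bits / num_pixels


def mse(original, reconstructed):
    """Mean squared error between two equally sized pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def cross_entropy(class_probs, label):
    """Task loss for a single prediction (stand-in for the real task loss)."""
    return -math.log(class_probs[label])


def task_aware_loss(likelihoods, original, reconstructed,
                    class_probs, label, lambda_1, lambda_2):
    """Hypothetical objective: rate + lambda_1 * distortion + lambda_2 * task loss."""
    r = rate_bpp(likelihoods, len(original))
    d = mse(original, reconstructed)
    t = cross_entropy(class_probs, label)
    return r + lambda_1 * d + lambda_2 * t, (r, d, t)


if __name__ == "__main__":
    loss, (r, d, t) = task_aware_loss(
        likelihoods=[0.5, 0.25],
        original=[0.0, 0.0, 0.0, 0.0],
        reconstructed=[0.0, 0.0, 0.0, 2.0],
        class_probs=[0.5, 0.5],
        label=0,
        lambda_1=0.01,
        lambda_2=0.1,
    )
    print(f"rate={r:.3f} bpp, distortion={d:.3f}, task={t:.3f}, loss={loss:.3f}")
```

In a sketch like this, raising `lambda_2` pushes the optimizer toward task accuracy at the expense of rate and distortion, which is the kind of trade-off the abstract describes.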

Key words: medical image compression, semantic segmentation, object detection, convolutional neural network (CNN), task-driven optimization

CLC Number: TP 391

Chao YAO, Zi-xuan GAO, Jun-ru CHEN, Yi-peng LU. Joint Optimization Approach for Medical Image Compression and Vision Tasks[J]. Journal of Northeastern University(Natural Science), 2026, 47(1): 11-19.

Figures/Tables 11

Fig.1 Schematic diagram of MVMICNet framework

Fig.2 Schematic diagram of MVMICNet framework at task-specific optimization stage

Table 1 Comparison results of semantic segmentation accuracy on CVC-ColonDB dataset

Algorithm	Bpp	PSNR/dB	MS-SSIM	mIoU (Stage 1)	mIoU (Stage 2)
BPG	0.090	30.02	0.908	0.498 7
	0.100	31.40	0.929	0.559 1
	0.114	33.01	0.944	0.618 3
	0.132	34.53	0.956	0.668 3
MBT2018-Mean	0.080	32.55	0.907	0.613 5
	0.094	33.84	0.938	0.674 5
	0.111	35.15	0.952	0.722 0
	0.130	35.98	0.969	0.754 8
Cheng2020-Anchor	0.068	36.45	0.934	0.697 1
	0.089	37.98	0.952	0.743 5
	0.112	39.04	0.972	0.789 8
	0.137	39.82	0.978	0.819 2
MVMICNet	0.065	40.38	0.982	0.780 4	0.822 6
	0.083	41.25	0.984	0.825 2	0.831 7
	0.105	42.08	0.987	0.832 6	0.849 0
	0.131	42.83	0.989	0.833 7	0.864 5
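
For readers unfamiliar with the column headings, the sketch below shows the standard definitions of bits per pixel (Bpp), PSNR, and mean intersection over union (mIoU). It is an illustration on toy data, not the evaluation code behind the table; the function names and the choice of 255 as the peak value are assumptions.

```python
import math


def bits_per_pixel(num_bytes, width, height):
    """Compressed size in bits divided by the number of pixels."""
    return num_bytes * 8 / (width * height)


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB for flat pixel sequences."""
    err = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if err == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / err)


def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    print(bits_per_pixel(1024, 256, 256))          # 1 KiB for a 256x256 image
    print(psnr([255, 0], [250, 5]))                # small reconstruction error
    print(mean_iou([0, 1, 1, 0], [0, 1, 0, 0], 2)) # toy 4-pixel label map
```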

Fig.3 Comparison of code rate-accuracy curves for different algorithms on CVC-ColonDB and ChestX-Det datasets

Fig.4 Visual result comparison of semantic segmentation by different algorithms on CVC-ColonDB dataset

Fig.5 Reconstruction results of the same image from CVC-ColonDB dataset by different algorithms

Table 2 Comparison results of object detection accuracy on ChestX-Det dataset (part 1)

Metric	IoU threshold	Stage	MVMICNet, $\lambda_2 = 0.000\,01$	MVMICNet, $\lambda_2 = 0.000\,1$	MVMICNet, $\lambda_2 = 0.001$	MVMICNet, $\lambda_2 = 0.1$	BPG, q=34	BPG, q=31	BPG, q=28	BPG, q=25
Bpp			0.047	0.060	0.074	0.089	0.046	0.060	0.079	0.091
PSNR/dB			41.14	41.93	42.57	43.10	35.98	37.50	38.76	39.52
MS-SSIM			0.984 7	0.987 7	0.989 8	0.991 3	0.959 4	0.969 5	0.976 7	0.983 4
mAP	IoU=0.50:0.95	Stage 1	0.079 7	0.089 9	0.101 0	0.115 4	0.030 0	0.049 1	0.030 0	0.093 4
	IoU=0.50:0.95	Stage 2	0.083 8	0.094 9	0.105 6	0.120 1	0.030 0	0.049 1	0.030 0	0.093 4
	IoU=0.50	Stage 1	0.164 9	0.190 1	0.218 4	0.241 0	0.064 0	0.106 9	0.064 0	0.203 3
	IoU=0.50	Stage 2	0.175 9	0.203 6	0.230 4	0.253 6	0.064 0	0.106 9	0.064 0	0.203 3
	IoU=0.75	Stage 1	0.069 3	0.082 0	0.092 0	0.100 6	0.023 5	0.036 3	0.023 5	0.084 7
	IoU=0.75	Stage 2	0.071 9	0.084 7	0.093 6	0.103 1	0.023 5	0.036 3	0.023 5	0.084 7
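
The IoU thresholds in the mAP rows determine when a predicted box counts as a correct detection. The sketch below illustrates that matching step only, not a full mAP computation. It assumes the COCO-style convention in which "IoU=0.50:0.95" averages over thresholds from 0.50 to 0.95 in steps of 0.05; the article excerpt does not spell this out, and the box format `(x1, y1, x2, y2)` and function names are likewise assumptions.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold):
    """A prediction matches a ground-truth box if their IoU reaches the threshold."""
    return box_iou(pred_box, gt_box) >= threshold


# Assumed COCO-style threshold grid: 0.50, 0.55, ..., 0.95.
COCO_THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


if __name__ == "__main__":
    gt = (0, 0, 100, 100)
    pred = (0, 0, 100, 62)  # covers 62% of the ground-truth box
    print(box_iou(pred, gt))
    print(is_true_positive(pred, gt, 0.50))  # loose threshold
    print(is_true_positive(pred, gt, 0.75))  # strict threshold
    print(sum(is_true_positive(pred, gt, t) for t in COCO_THRESHOLDS))
```

Because the stricter thresholds reject loosely localized boxes, mAP at IoU=0.75 is typically lower than at IoU=0.50, which matches the ordering of the values in the table.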

Table 3 Comparison results of object detection accuracy on ChestX-Det dataset (part 2)

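Metric	IoU threshold	MBT2018-Mean, q=5	MBT2018-Mean, q=4	MBT2018-Mean, q=3	MBT2018-Mean, q=2	Cheng2020-Anchor, q=5	Cheng2020-Anchor, q=4	Cheng2020-Anchor, q=3	Cheng2020-Anchor, q=2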
Bpp		0.041	0.054	0.072	0.097	0.043	0.061	0.079	0.093
PSNR/dB		37.36	38.60	39.98	40.81	39.41	40.63	41.63	42.36
MS-SSIM		0.969 1	0.974 0	0.978 8	0.984 5	0.979 6	0.983 2	0.985 4	0.986 6
mAP	IoU=0.50:0.95	0.048 4	0.069 9	0.086 4	0.096 7	0.064 9	0.084 0	0.096 7	0.106 2
	IoU=0.50	0.102 4	0.142 7	0.183 1	0.214 1	0.128 2	0.173 4	0.208 2	0.223 7
	IoU=0.75	0.036 1	0.054 2	0.072 3	0.087 1	0.055 4	0.074 7	0.087 6	0.092 4

Fig.6 Visual result comparison of object detection by different algorithms on ChestX-Det dataset

Fig.7 Reconstruction results of the same image from ChestX-Det dataset by different algorithms

Table 4 Impact of parameter λ2 on semantic

$\lambda_2$	Bpp	PSNR/dB	mIoU/%	Accuracy/%
0.1	0.252 30	24.250	33.040	45.5
0.001	0.245 60	31.880	45.780	63.0
0.000 1	0.243 20	35.080	58.030	79.9
0.000 01	0.243 10	34.570	52.710	72.6
