Journal of Northeastern University (Natural Science) ›› 2026, Vol. 47 ›› Issue (1): 42-51. DOI: 10.12068/j.issn.1005-3026.2026.20250067

• Smart Healthcare Column •

  • About the author: QI Lin (b. 1981), male, from Changchun, Jilin; associate professor at Northeastern University.
  • Funding: Key Research and Development Program of Liaoning Province (2024JH2/102500076)

Non-contact Estimation Method of Blood Oxygen Saturation Based on Facial Videos

Lin QI1,2,3, Qi-he GAO1, Shu-yue GUAN1, Yong-chun LI4()   

  1. College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
    2. Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang 110169, China
    3. Engineering Research Center of Medical Imaging and Intelligent Analysis, Ministry of Education, Northeastern University, Shenyang 110169, China
    4. Shenyang Contain Electronic Technology Co., Ltd., Shenyang 110167, China
  • Received: 2025-06-12; Online: 2026-01-15; Published: 2026-03-17
  • Contact: Yong-chun LI


Abstract:

To address the challenges of inadequate spatio-temporal feature modeling and poor robustness in complex scenarios for non-contact blood oxygen saturation (SpO2) measurement using remote photoplethysmography (rPPG), a trend-aware spatio-temporal fusion network (TAST-Net) was proposed. The network adopts an innovative dual-branch fusion architecture that synergistically fuses local physiological features extracted by a 3D convolutional neural network (3D CNN) branch with global spatio-temporal dependencies captured by a video vision transformer (ViViT) branch. To enhance the model's sensitivity to signal dynamics, a weighted composite loss function combining mean squared error (MSE) and Pearson correlation loss was designed. Experimental results on two public datasets demonstrate the superior performance of TAST-Net. On the pulse rate estimation (PURE) dataset, it achieves a root mean squared error (eRMS) of 0.53%, a mean absolute error (eMA) of 0.37%, and a Pearson correlation coefficient (R) of 0.96. On the more challenging visual information processing and learning-heart rate (VIPL-HR) dataset, the eRMS, eMA, and R reach 0.84%, 0.57%, and 0.82, respectively, outperforming other compared methods. These findings indicate that TAST-Net provides an effective solution for accurate and robust SpO2 estimation from facial videos and validates the advantage of integrating local and global features in rPPG signal processing.

Key words: remote photoplethysmography, deep learning, non-contact, blood oxygen saturation estimation, facial video
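The abstract names the composite training objective only at a high level: a weighted combination of mean squared error and a Pearson correlation loss, where the MSE term penalizes absolute SpO2 error and the correlation term penalizes mismatch in the predicted signal's trend. A minimal NumPy sketch of such an objective is shown below; the convex weight `alpha`, the epsilon stabilizer, and the exact form `1 − r` for the correlation term are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def pearson_corr(pred, target):
    """Pearson correlation coefficient between two 1-D signals."""
    pred = pred - pred.mean()
    target = target - target.mean()
    denom = np.sqrt((pred ** 2).sum() * (target ** 2).sum())
    # Small epsilon guards against division by zero for constant signals.
    return (pred * target).sum() / (denom + 1e-8)

def composite_loss(pred, target, alpha=0.5):
    """Weighted sum of MSE and (1 - Pearson r).

    MSE penalizes absolute SpO2 error; the (1 - r) term penalizes
    disagreement in the signal's temporal trend. `alpha` is a
    hypothetical weighting hyperparameter.
    """
    mse = ((pred - target) ** 2).mean()
    corr_loss = 1.0 - pearson_corr(pred, target)
    return alpha * mse + (1.0 - alpha) * corr_loss
```

With this form, a prediction that tracks the trend perfectly but carries a constant offset is penalized only through the MSE term, which is the kind of trend sensitivity the loss is meant to encourage.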

CLC number: