Journal of Northeastern University (Natural Science), 2026, Vol. 47, Issue (1): 42-51. DOI: 10.12068/j.issn.1005-3026.2026.20250067

• Smart Healthcare Column

Non-contact Estimation Method of Blood Oxygen Saturation Based on Facial Videos

Lin QI1,2,3, Qi-he GAO1, Shu-yue GUAN1, Yong-chun LI4

  1. College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
  2. Key Laboratory of Medical Image Computing, Ministry of Education, Northeastern University, Shenyang 110169, China
  3. Engineering Research Center of Medical Imaging and Intelligent Analysis, Ministry of Education, Northeastern University, Shenyang 110169, China
  4. Shenyang Contain Electronic Technology Co., Ltd., Shenyang 110167, China
  • Received: 2025-06-12; Online: 2026-01-15; Published: 2026-03-17
  • Contact: Yong-chun LI

Abstract:

To address inadequate spatio-temporal feature modeling and poor robustness in complex scenarios in non-contact blood oxygen saturation (SpO2) measurement based on remote photoplethysmography (rPPG), a trend-aware spatio-temporal fusion network (TAST-Net) was proposed. The network adopted a dual-branch architecture that fused local physiological features extracted by a 3D convolutional neural network (3D CNN) branch with global spatio-temporal dependencies captured by a video vision transformer (ViViT) branch. To make the model more sensitive to signal dynamics, a weighted composite loss function combining mean squared error (MSE) and Pearson correlation loss was designed. Experimental results on two public datasets demonstrate the superior performance of TAST-Net. On the pulse rate estimation (PURE) dataset, it achieves a root mean squared error (eRMS) of 0.53%, a mean absolute error (eMA) of 0.37%, and a Pearson correlation coefficient (R) of 0.96. On the more challenging visual information processing and learning-heart rate (VIPL-HR) dataset, eRMS, eMA, and R reach 0.84%, 0.57%, and 0.82, respectively, outperforming the compared methods. These results indicate that TAST-Net provides an effective solution for accurate and robust SpO2 estimation from facial videos and support the benefit of integrating local and global features in rPPG signal processing.
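The abstract names a weighted composite loss built from MSE and a Pearson correlation loss, and reports eRMS, eMA, and R. The stdlib-only Python sketch below illustrates these quantities on a per-clip sequence of SpO2 values (in %). It is not the authors' implementation: the convex weighting alpha * MSE + (1 - alpha) * (1 - R), the default alpha = 0.5, the use of 1 - R as the Pearson loss (a common choice in rPPG work), the convention of returning R = 0 for constant sequences, and all function names (`mse`, `rmse`, `mae`, `pearson_r`, `composite_loss`) are assumptions for illustration, as are the toy numbers.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], target: Sequence[float]) -> int:
    """Validate inputs and return their common length."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return len(pred)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error (units: %^2 when SpO2 is given in %)."""
    n = _check(pred, target)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / n


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error, the eRMS metric (units: %)."""
    return math.sqrt(mse(pred, target))


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error, the eMA metric (units: %)."""
    n = _check(pred, target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / n


def pearson_r(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson correlation coefficient R between two sequences.

    R is undefined when either sequence is constant; this sketch returns 0.0
    in that case (a hypothetical convention so the loss stays finite).
    """
    n = _check(pred, target)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0.0 or var_t == 0.0:
        return 0.0
    return cov / math.sqrt(var_p * var_t)


def composite_loss(pred: Sequence[float], target: Sequence[float],
                   alpha: float = 0.5) -> float:
    """Hypothetical weighted loss: alpha * MSE + (1 - alpha) * (1 - R)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * mse(pred, target) + (1.0 - alpha) * (1.0 - pearson_r(pred, target))


if __name__ == "__main__":
    # Toy SpO2 sequences (%), invented for illustration only.
    pred = [97.0, 98.0, 96.0, 99.0]
    target = [97.5, 98.0, 96.5, 98.5]
    print(f"eRMS={rmse(pred, target):.3f}%  eMA={mae(pred, target):.3f}%  "
          f"R={pearson_r(pred, target):.3f}  loss={composite_loss(pred, target):.3f}")
```

The MSE term penalizes absolute deviation from the reference SpO2 level, while the 1 - R term rewards only matching the shape of the trend, so combining them plausibly pushes predictions toward both the correct level and the correct dynamics. In an actual training loop the same formulas would be written with tensor operations so that gradients can flow through them.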

Key words: remote photoplethysmography, deep learning, non-contact, blood oxygen saturation estimation, facial video
