Journal of Northeastern University (Natural Science) ›› 2025, Vol. 46 ›› Issue (5): 37-45. DOI: 10.12068/j.issn.1005-3026.2025.20240015

• Information & Control •

An Efficient Distributed False Positive Control Algorithm for FDR

Xu-ze LIU1, Hui-ying WANG2, Liang-yu CHU3, Yu-hai ZHAO1

  1. School of Computer Science & Engineering, Northeastern University, Shenyang 110819, China
  2. Information and Communication Branch of Liaoning Electric Power Company, State Grid, Shenyang 110065, China
  3. School of Medicine & Bioinformatics Engineering, Northeastern University, Shenyang 110819, China
  • Received: 2024-01-17 Online: 2025-05-15 Published: 2025-08-07
  • Contact: Yu-hai ZHAO

Abstract:

Multiple hypothesis testing in big data mining produces false positives, and computing the theoretical results needed to control the false discovery rate (FDR) is extremely time-consuming. To improve the computational efficiency of theoretical FDR values, a distributed false-positive control algorithm, DPFDR (distributed permutation testing-based false discovery rate), is proposed. The algorithm first mines representative patterns using the conditional frequent pattern tree (CFP) method and uses them to compress the pattern space. Then, the workload of each task is estimated from the representative patterns, the data is partitioned according to the workload, and the tasks are allocated to the compute nodes through a load-balancing policy. Finally, the effective FDR false-positive control threshold is obtained by merging and sorting the results computed by each node. Experimental results on real data sets show that the proposed DPFDR algorithm greatly improves the computational efficiency of computing the FDR false-positive control threshold.
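
The allocation and merging steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the load-balancing policy or the form of the per-node results, so everything here is an assumption made for illustration: the greedy "heaviest task first, lightest node first" policy, the function names `assign_tasks` and `merge_node_results`, and the toy workloads and p-values are hypothetical and are not taken from the paper.

```python
import heapq


def assign_tasks(workloads, n_nodes):
    """Assign tasks to nodes with a greedy policy (assumed, not from the paper).

    workloads maps a task name to its estimated cost. Tasks are placed
    heaviest first, each onto the node with the smallest current load.
    """
    # Min-heap of (current load, node index); the lightest node pops first.
    heap = [(0.0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_nodes)]
    for task, cost in sorted(workloads.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        assignment[node].append(task)
        heapq.heappush(heap, (load + cost, node))
    return assignment


def merge_node_results(per_node_sorted):
    """Merge ascending per-node value lists into one ascending list."""
    return list(heapq.merge(*per_node_sorted))


# Hypothetical estimated workloads for five tasks, spread over two nodes.
workloads = {"p1": 8, "p2": 7, "p3": 6, "p4": 5, "p5": 4}
assignment = assign_tasks(workloads, 2)
loads = [sum(workloads[t] for t in tasks) for tasks in assignment]

# Hypothetical per-node results, each already sorted in ascending order.
merged = merge_node_results([[0.01, 0.20], [0.03, 0.04]])
```

The greedy policy is a simple heuristic that does not guarantee a perfectly even split; it is used here only to show how estimated workloads could drive task placement before the per-node results are merged into a single sorted sequence.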

Key words: false positive, data mining, distributed computing, false discovery rate, significance threshold

CLC Number: