sPLINK: A Federated, Privacy-Preserving Tool as a Robust Alternative to Meta-Analysis in Genome-Wide Association Studies

Abstract

Genome-wide association studies (GWAS) have been widely used to unravel connections between genetic variants and diseases. Larger sample sizes in GWAS can lead to discovering more associations and more accurate genetic predictors. However, sharing and combining distributed genomic data to increase the sample size is often challenging or even impossible due to privacy concerns and privacy protection laws such as the GDPR. While meta-analysis has been established as an effective approach to combine the summary statistics of several GWAS, its accuracy can be attenuated in the presence of cross-study heterogeneity. Here, we present sPLINK (safe PLINK), a user-friendly software tool set that performs federated, privacy-preserving GWAS on distributed datasets while preserving accuracy. sPLINK neither exchanges raw data nor relies on summary statistics. Instead, it performs model training in a federated manner, communicating only model parameters between cohorts and a central server. We verify that the federated results from sPLINK are the same as those from aggregated analysis conducted with PLINK. Moreover, we demonstrate that sPLINK is robust against imbalanced phenotype distributions across cohorts, whereas existing meta-analysis tools lose considerable accuracy in such scenarios. Federated and user-friendly analysis with sPLINK thus has the potential to replace meta-analysis as the gold standard for collaborative GWAS.

Competing Interest Statement

The authors have declared no competing interest.
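
The abstract's claim that a federated analysis can reproduce the aggregated result exactly can be illustrated with a minimal sketch for a single-variant linear regression. The abstract does not describe sPLINK's actual protocol, so this is an assumption about how such an analysis might be organized, not its implementation. Each cohort is assumed to share only the aggregates X^T X and X^T y, and a server sums them and solves the normal equations. All function names, cohort sizes, and simulated values are hypothetical.

```python
import numpy as np


def local_statistics(X, y):
    """Run inside one cohort: only these aggregates leave the site (assumed protocol)."""
    return X.T @ X, X.T @ y


def federated_linear_regression(cohorts):
    """Server side: sum the per-cohort aggregates and solve the normal equations."""
    stats = [local_statistics(X, y) for X, y in cohorts]
    xtx = sum(s[0] for s in stats)
    xty = sum(s[1] for s in stats)
    return np.linalg.solve(xtx, xty)


# Simulated data (hypothetical): three cohorts of unequal size whose
# phenotype means differ, to mimic cross-cohort imbalance.
rng = np.random.default_rng(0)
cohorts = []
for n, offset in [(200, 0.0), (500, 1.5), (80, -1.0)]:
    genotype = rng.binomial(2, 0.3, size=n)         # minor-allele counts 0/1/2
    X = np.column_stack([np.ones(n), genotype])     # intercept + variant
    y = 0.5 * genotype + offset + rng.normal(size=n)
    cohorts.append((X, y))

# Federated estimate: raw genotypes and phenotypes never leave a cohort.
beta_federated = federated_linear_regression(cohorts)

# Reference: the same model fitted on the pooled (aggregated) data.
X_all = np.vstack([X for X, _ in cohorts])
y_all = np.concatenate([y for _, y in cohorts])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

print(np.allclose(beta_federated, beta_pooled))
```

Because the pooled X^T X and X^T y are exactly the sums of the per-cohort terms, the two estimates agree up to floating-point error regardless of how samples are split across cohorts. A meta-analysis, by contrast, combines per-cohort effect estimates and therefore does not in general reproduce the pooled fit.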

Publication
bioRxiv