Download LD Reference
Linkage-disequilibrium (LD) matrices, which record pairwise correlations between
genetic variants, are required as input to the VIPRS model. To facilitate running
the model on GWAS data from diverse ancestries, we computed LD matrices for six
continental populations represented in the UK Biobank. The six ancestry groups and
their corresponding download links are listed below:
Code | Ancestry group | Sample size | Download |
---|---|---|---|
EUR | European | 362446 | GitHub or Zenodo |
CSA | Central/South Asian | 8284 | GitHub or Zenodo |
AFR | African | 6255 | GitHub or Zenodo |
EAS | East Asian | 2700 | GitHub or Zenodo |
MID | Middle Eastern | 1567 | GitHub or Zenodo |
AMR | Admixed American | 987 | GitHub or Zenodo |
The sample sizes here are restricted to unrelated individuals in the UK Biobank.
The matrices were computed using the block LD estimator, which records pairwise
correlations only between variants in the same LD block. The LD blocks are defined
by LDetect. The matrices were computed using the sister package magenpy and were
then quantized to the int8 data type for enhanced compressibility.
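To make the int8 quantization step concrete, here is a minimal sketch of one common scheme: a correlation in [-1, 1] is scaled by 127 and rounded to the nearest integer, so that it fits in a single signed byte. The scaling factor and the helper names `quantize_int8` and `dequantize_int8` are assumptions made for illustration; the scheme actually used by magenpy may differ.

```shell
#!/bin/bash
# Illustrative only: symmetric int8 quantization of a correlation value.
# ASSUMPTION: scale factor of 127; this is not taken from magenpy's code.

# Map a correlation r in [-1, 1] to an integer in [-127, 127],
# rounding half away from zero.
quantize_int8() {
    awk -v r="$1" 'BEGIN {
        x = r * 127
        q = (x >= 0) ? int(x + 0.5) : -int(-x + 0.5)
        print q
    }'
}

# Map a quantized integer back to an approximate correlation.
dequantize_int8() {
    awk -v q="$1" 'BEGIN { printf "%.4f\n", q / 127 }'
}
```

Under this scheme the round trip is lossy: `quantize_int8 0.5` gives 64, and `dequantize_int8 64` gives roughly 0.5039, so each stored value is accurate to within about 1/254 of the original correlation.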
For European samples, we also provide LD matrices that record pairwise correlations for up to 18 million variants. These matrices are available for download via Zenodo.
For more details on QC criteria, data preparation, etc., please consult our manuscript:
Zabad et al. (2025). Towards whole-genome inference of polygenic scores with fast and memory-efficient algorithms. BioRxiv.
To access and use these matrices for downstream tasks, consult the codebase of magenpy, our
sister Python package that implements specialized data structures for computing and processing large-scale LD matrices.
Bash Script for downloading/extracting LD matrices
Here is a bash script that can be used to download and extract the LD matrices for all six populations. The script uses
the GitHub links provided above. Feel free to modify the script to suit your needs.
#!/bin/bash
# Directory where the archives are saved and extracted
output_dir="LD_matrices"
# Ancestry groups to download (codes from the table above)
populations=("EUR" "CSA" "AFR" "EAS" "MID" "AMR")
# Set to false to download the archives without extracting them
extract=true

mkdir -p "$output_dir"

for pop in "${populations[@]}"
do
    echo "Downloading LD matrix for $pop"
    wget -O "$output_dir/$pop.tar.gz" "https://github.com/shz9/viprs/releases/download/v0.1.2/$pop.tar.gz"
    if [ "$extract" = true ]; then
        mkdir -p "$output_dir/$pop"
        tar -xf "$output_dir/$pop.tar.gz" -C "$output_dir/$pop"
    fi
done
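If you only need some of the populations, the script above can be broken into small functions. The sketch below is one possible way to do that: it builds the same GitHub release URL, rejects unknown ancestry codes, and extracts an archive only if `tar` can list its contents. The function names (`is_known_population`, `ld_archive_url`, `download_ld`) are hypothetical helpers introduced here, not part of viprs or magenpy.

```shell
#!/bin/bash
# Sketch: download selected populations and check each archive before extracting.
# All function names are hypothetical helpers for illustration.

base_url="https://github.com/shz9/viprs/releases/download/v0.1.2"

# Succeeds (exit status 0) only if $1 is one of the six ancestry codes.
is_known_population() {
    case "$1" in
        EUR|CSA|AFR|EAS|MID|AMR) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints the GitHub release URL of the archive for population $1.
ld_archive_url() {
    printf '%s/%s.tar.gz\n' "$base_url" "$1"
}

# Downloads the archive for population $1 into directory $2, then extracts it
# only if tar can list its contents (a basic integrity check).
download_ld() {
    local pop="$1" out_dir="$2"
    is_known_population "$pop" || { echo "Unknown population: $pop" >&2; return 1; }
    mkdir -p "$out_dir/$pop"
    wget -O "$out_dir/$pop.tar.gz" "$(ld_archive_url "$pop")" || return 1
    tar -tf "$out_dir/$pop.tar.gz" > /dev/null || { echo "Corrupt archive: $pop" >&2; return 1; }
    tar -xf "$out_dir/$pop.tar.gz" -C "$out_dir/$pop"
}
```

After sourcing this file, `download_ld EAS LD_matrices` would fetch and extract only the East Asian matrices. Listing the archive first means a truncated download is reported instead of leaving a partially extracted directory.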