

Simulate complex traits with varying genetic architectures (magenpy_simulate)

The magenpy_simulate script simulates complex traits with a variety of genetic architectures from a set of genotypes stored in plink's BED file format. It takes as input the path to the genotype data, the type of trait to simulate (quantitative or case-control), the parameters of the genetic architecture (e.g. polygenicity, heritability, effect sizes), and the output path where the simulated phenotypes will be stored.
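For intuition, here is a minimal sketch of the kind of additive model a trait simulator like this typically uses. Each individual's genetic value is the sum of genotype times effect size, rescaled so that the genetic component has variance equal to the target heritability (h2), and Gaussian noise with variance 1 - h2 is added on top. This is a generic textbook model written in plain Python, not magenpy's actual implementation; the function `simulate_quantitative_trait` and its arguments are hypothetical.

```python
import random
import statistics


def simulate_quantitative_trait(genotypes, h2, prop_causal, seed=None):
    """Hypothetical sketch of an additive model: y = G @ beta + e.

    genotypes: list of individuals, each a list of allele counts (0, 1, 2).
    Assumption: generic textbook model, not magenpy's exact code.
    """
    rng = random.Random(seed)
    n_snps = len(genotypes[0])

    # Draw effect sizes: each SNP is causal with probability prop_causal.
    beta = [rng.gauss(0.0, 1.0) if rng.random() < prop_causal else 0.0
            for _ in range(n_snps)]

    # Genetic value of each individual = sum over SNPs of genotype * effect.
    g = [sum(x * b for x, b in zip(row, beta)) for row in genotypes]

    # Center and rescale so the genetic component has variance exactly h2
    # (if it is constant, e.g. no causal SNPs, it is set to zero instead).
    g_var = statistics.pvariance(g)
    if g_var > 0:
        g_mean = statistics.fmean(g)
        scale = (h2 / g_var) ** 0.5
        g = [(gi - g_mean) * scale for gi in g]
    else:
        g = [0.0] * len(g)

    # Environmental noise with variance 1 - h2, so total variance is ~1.
    noise_sd = (1.0 - h2) ** 0.5
    y = [gi + rng.gauss(0.0, noise_sd) for gi in g]
    return y, beta
```

Rescaling the genetic values after drawing the effects is one simple way to hit the requested heritability exactly in the simulated sample; other simulators instead scale the effect-size variance by h2 divided by the number of causal variants.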

To list all options available for the magenpy_simulate script, run the following command in your terminal:

magenpy_simulate -h

This prints the following help message:

        ********************************************************                            
             _ __ ___   __ _  __ _  ___ _ __  _ __  _   _ 
            | '_ ` _ \ / _` |/ _` |/ _ \ '_ \| '_ \| | | |
            | | | | | | (_| | (_| |  __/ | | | |_) | |_| |
            |_| |_| |_|\__,_|\__, |\___|_| |_| .__/ \__, |
                             |___/           |_|    |___/
            Modeling and Analysis of Genetics data in python
            Version: 0.1.4 | Release date: June 2024
            Author: Shadi Zabad, McGill University
        ********************************************************
        < Simulate complex quantitative or case-control traits >

usage: magenpy_simulate [-h] --bfile BED_FILE [--keep KEEP_FILE] [--extract EXTRACT_FILE] [--backend {plink,xarray}] [--temp-dir TEMP_DIR]
                        --output-file OUTPUT_FILE [--output-simulated-beta] [--min-maf MIN_MAF] [--min-mac MIN_MAC] --h2 H2
                        [--mix-prop MIX_PROP] [--prop-causal PROP_CAUSAL] [--var-mult VAR_MULT]
                        [--phenotype-likelihood {gaussian,binomial}] [--prevalence PREVALENCE] [--seed SEED]

Commandline arguments for the complex trait simulator

options:
  -h, --help            show this help message and exit
  --bfile BED_FILE      The BED files containing the genotype data. You may use a wildcard here (e.g. "data/chr_*.bed")
  --keep KEEP_FILE      A plink-style keep file to select a subset of individuals for simulation.
  --extract EXTRACT_FILE
                        A plink-style extract file to select a subset of SNPs for simulation.
  --backend {plink,xarray}
                        The backend software used for the computation.
  --temp-dir TEMP_DIR   The temporary directory where we store intermediate files.
  --output-file OUTPUT_FILE
                        The path where the simulated phenotype will be stored (no extension needed).
  --output-simulated-beta
                        Output a table with the true simulated effect size for each variant.
  --min-maf MIN_MAF     The minimum minor allele frequency for variants included in the simulation.
  --min-mac MIN_MAC     The minimum minor allele count for variants included in the simulation.
  --h2 H2               Trait heritability. Ranges between 0. and 1., inclusive.
  --mix-prop MIX_PROP   Mixing proportions for the mixture density (comma separated). For example, for the spike-and-slab mixture density,
                        with the proportion of causal variants set to 0.1, you can specify: "--mix-prop 0.9,0.1 --var-mult 0,1".
  --prop-causal PROP_CAUSAL, -p PROP_CAUSAL
                        The proportion of causal variants in the simulation. See --mix-prop for more complex architectures specification.
  --var-mult VAR_MULT, -d VAR_MULT
                        Multipliers on the variance for each mixture component.
  --phenotype-likelihood {gaussian,binomial}
                        The likelihood for the simulated trait: gaussian (e.g. quantitative) or binomial (e.g. case-control).
  --prevalence PREVALENCE
                        The prevalence of cases (or proportion of positives) for binary traits. Ranges between 0. and 1.
  --seed SEED           The random seed to use for the random number generator.
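The `--keep` and `--extract` options expect plink-style files: a keep file lists one individual per line as a family ID and an individual ID (matching the FID/IID columns of the `.fam` file), and an extract file lists one variant ID per line (matching the `.bim` file). The helpers below are a hypothetical sketch for writing such files from Python; the function names are illustrative and not part of magenpy.

```python
from pathlib import Path


def write_keep_file(path, samples):
    """Write a plink-style keep file: one tab-separated 'FID IID' pair per line."""
    text = "".join(f"{fid}\t{iid}\n" for fid, iid in samples)
    Path(path).write_text(text)
    return text


def write_extract_file(path, snp_ids):
    """Write a plink-style extract file: one variant ID per line."""
    text = "".join(f"{snp}\n" for snp in snp_ids)
    Path(path).write_text(text)
    return text
```

The resulting paths can then be passed to `--keep` and `--extract` to restrict the simulation to a subset of individuals and SNPs.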
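The `--min-maf` and `--min-mac` thresholds filter variants by their minor allele frequency and minor allele count. As a worked illustration, the sketch below computes both quantities for a single SNP from per-individual allele counts. It assumes missing calls are encoded as `None` and that variants meeting a threshold exactly are kept; how magenpy handles missingness and the boundary case is not stated here, and the function names are hypothetical.

```python
def allele_frequency_stats(genotype_column):
    """Return (maf, mac) for one SNP given allele counts (0, 1, 2) per person.

    Assumption: missing calls are encoded as None and skipped.
    """
    calls = [g for g in genotype_column if g is not None]
    n_alleles = 2 * len(calls)           # each diploid individual carries 2 alleles
    allele_count = sum(calls)            # copies of the counted allele
    mac = min(allele_count, n_alleles - allele_count)  # count of the rarer allele
    maf = mac / n_alleles if n_alleles else 0.0
    return maf, mac


def passes_filters(genotype_column, min_maf=0.0, min_mac=0):
    """Keep a SNP only if it meets both the MAF and the MAC thresholds."""
    maf, mac = allele_frequency_stats(genotype_column)
    return maf >= min_maf and mac >= min_mac
```

For example, genotypes `[0, 0, 1, 2, 0]` give 3 copies of the counted allele out of 10, so MAC = 3 and MAF = 0.3; a monomorphic SNP such as `[2, 2, 2, 2]` has MAC = 0 and is removed by any positive `min_mac`.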
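The `--mix-prop` and `--var-mult` options describe effect sizes drawn from a zero-mean Gaussian mixture: each variant is assigned to component k with probability equal to the k-th mixing proportion, and its effect variance is the k-th multiplier times a base variance. A multiplier of 0 gives the "spike" of a spike-and-slab prior. The sketch below illustrates that reading of the help text in plain Python; the function names and the `base_var` parameter are hypothetical, and magenpy's internal sampling and scaling may differ.

```python
import random


def parse_floats(arg):
    """Parse a comma-separated CLI value such as "0.9,0.1" into floats."""
    return [float(v) for v in arg.split(",")]


def sample_mixture_effects(n_snps, mix_prop, var_mult, base_var, seed=None):
    """Draw one effect size per SNP from a zero-mean Gaussian mixture.

    Component k is chosen with probability mix_prop[k]; its variance is
    var_mult[k] * base_var. A multiplier of 0 yields beta = 0 (the spike).
    """
    if len(mix_prop) != len(var_mult):
        raise ValueError("mix_prop and var_mult must have the same length")
    if abs(sum(mix_prop) - 1.0) > 1e-8:
        raise ValueError("mixing proportions must sum to 1")

    rng = random.Random(seed)
    components = rng.choices(range(len(mix_prop)), weights=mix_prop, k=n_snps)
    betas = [rng.gauss(0.0, (var_mult[k] * base_var) ** 0.5) for k in components]
    return betas, components


def prop_causal_to_mixture(p):
    """Spike-and-slab equivalent of a single causal proportion p."""
    return [1.0 - p, p], [0.0, 1.0]
```

Under this reading, `-p 0.1` and `--mix-prop 0.9,0.1 --var-mult 0,1` describe the same architecture, which matches the example given in the `--mix-prop` help entry. Adding more components (e.g. several non-zero multipliers) yields architectures with a mix of small and large effects.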
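For `--phenotype-likelihood binomial`, `--prevalence` sets the expected proportion of cases. One common way to produce case-control labels at a given prevalence is the liability-threshold model: treat a standard-normal quantitative trait as an underlying liability and call everyone above its (1 - prevalence) quantile a case. The sketch below shows that technique; it is an assumption for illustration, and magenpy may generate binary traits differently. The function name is hypothetical.

```python
from statistics import NormalDist


def liability_to_case_control(liability, prevalence):
    """Threshold a standard-normal liability so ~prevalence of samples are cases.

    Assumption: liability-threshold model, not necessarily magenpy's method.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    # Individuals above the (1 - prevalence) quantile of N(0, 1) become cases (1).
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    return [1 if x > threshold else 0 for x in liability]
```

Because the threshold comes from the theoretical N(0, 1) quantile, the observed case fraction matches the prevalence only approximately and only if the liability has mean 0 and variance 1; thresholding at the empirical quantile instead would fix the case count exactly.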