Binary metrics

avg_precision(true_val, pred_val)

Compute the average precision between the PRS predictions and a binary phenotype.


Parameters:

Name | Description | Default
true_val | The response value or phenotype (a binary numpy vector with 0s and 1s) | required
pred_val | The predicted value or PRS (a numpy vector) | required

Source code in viprs/eval/
def avg_precision(true_val, pred_val):
    """
    Compute the average precision between the PRS predictions and a binary phenotype.

    :param true_val: The response value or phenotype (a binary numpy vector with 0s and 1s)
    :param pred_val: The predicted value or PRS (a numpy vector)
    """
    from sklearn.metrics import average_precision_score
    return average_precision_score(true_val, pred_val)
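
As a usage illustration, the sketch below calls scikit-learn's `average_precision_score` directly, which is the call the function above wraps. The toy arrays are made up for illustration. When no scores are tied, average precision equals the mean of the precision values observed at the rank of each case, which the manual check reproduces.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical toy data: two controls (0) and two cases (1) with their PRS.
true_val = np.array([0, 0, 1, 1])
pred_val = np.array([0.1, 0.4, 0.35, 0.8])

ap = average_precision_score(true_val, pred_val)

# Manual check: rank samples by decreasing PRS, then average the precision
# observed at the position of each case.
order = np.argsort(-pred_val)
ranked_labels = true_val[order]
precision_at_rank = np.cumsum(ranked_labels) / np.arange(1, len(ranked_labels) + 1)
manual_ap = precision_at_rank[ranked_labels == 1].mean()

print(ap, manual_ap)
```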

cox_snell_r2(true_val, pred_val, covariates=None)

Compute the Cox-Snell pseudo-R^2 between the PRS predictions and a binary phenotype. If covariates are provided, we compute the incremental pseudo-R^2 by conditioning on the covariates.


Parameters:

Name | Description | Default
true_val | The response value or phenotype (a binary numpy vector with 0s and 1s) | required
pred_val | The predicted value or PRS (a numpy vector) | required
covariates | A pandas table of covariates where the rows are ordered the same way as the predictions and response. | None

Source code in viprs/eval/
def cox_snell_r2(true_val, pred_val, covariates=None):
    """
    Compute the Cox-Snell pseudo-R^2 between the PRS predictions and a binary phenotype.
    If covariates are provided, we compute the incremental pseudo-R^2 by conditioning
    on the covariates.

    :param true_val: The response value or phenotype (a binary numpy vector with 0s and 1s)
    :param pred_val: The predicted value or PRS (a numpy vector)
    :param covariates: A pandas table of covariates where the rows are ordered
    the same way as the predictions and response.
    """

    if covariates is None:
        # No covariates: the null model is an explicit constant column.
        add_intercept = False
        covariates = pd.DataFrame(np.ones((true_val.shape[0], 1)), columns=['const'])
    else:
        add_intercept = True

    # Null model: covariates only. Full model: covariates plus the PRS.
    null_result = fit_linear_model(true_val, covariates,
                                   family='binomial', add_intercept=add_intercept)
    full_result = fit_linear_model(true_val, covariates.assign(pred_val=pred_val),
                                   family='binomial', add_intercept=add_intercept)
    n = true_val.shape[0]

    return 1. - np.exp(-2 * (full_result.llf - null_result.llf) / n)
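
The formula in the return statement can be reproduced on simulated data without the library. The sketch below is only an illustration under stated assumptions: `logistic_loglik` is a hypothetical helper that fits a logistic regression by plain Newton-Raphson and stands in for `fit_linear_model`, whose implementation is not shown here. The simulated effect sizes are arbitrary.

```python
import numpy as np

def logistic_loglik(X, y, n_iter=50, tol=1e-10):
    """Hypothetical helper: fit a logistic regression by Newton-Raphson
    and return the maximised log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Simulated binary phenotype that depends on a PRS (arbitrary effect sizes).
rng = np.random.default_rng(0)
n = 2000
prs = rng.normal(size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * prs)))).astype(float)

intercept = np.ones((n, 1))
llf_null = logistic_loglik(intercept, y)                          # intercept only
llf_full = logistic_loglik(np.column_stack([intercept, prs]), y)  # intercept + PRS

cox_snell = 1.0 - np.exp(-2.0 * (llf_full - llf_null) / n)
print(cox_snell)
```

Because the full model nests the null model, `llf_full` cannot be smaller than `llf_null`, so the statistic is non-negative.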

f1(true_val, pred_val)

Compute the F1 score between the PRS predictions and a binary phenotype.


Parameters:

Name | Description | Default
true_val | The response value or phenotype (a binary numpy vector with 0s and 1s) | required
pred_val | The predicted value or PRS (a numpy vector) | required

Source code in viprs/eval/
def f1(true_val, pred_val):
    """
    Compute the F1 score between the PRS predictions and a binary phenotype.

    :param true_val: The response value or phenotype (a binary numpy vector with 0s and 1s)
    :param pred_val: The predicted value or PRS (a numpy vector)
    """
    from sklearn.metrics import f1_score
    # Note: f1_score compares class labels, so pred_val should hold
    # binary predictions rather than a continuous PRS.
    return f1_score(true_val, pred_val)
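
scikit-learn's `f1_score` is documented as a classification metric that compares class labels, so a continuous PRS generally needs to be converted to 0/1 predictions first. The sketch below shows one way to do that with a fixed cut-off. The toy data and the `threshold` value are assumptions made for illustration, not part of the library.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical toy data: binary phenotype and continuous PRS.
true_val = np.array([0, 0, 1, 1])
prs = np.array([0.1, 0.4, 0.35, 0.8])

# Assumed cut-off; in practice it would be chosen on separate data.
threshold = 0.375
pred_labels = (prs > threshold).astype(int)

# One true positive, one false positive and one false negative.
score = f1_score(true_val, pred_labels)
print(pred_labels, score)
```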

liability_logit_r2(true_val, pred_val, covariates=None, return_all_r2=False)

Compute the R^2 between the PRS predictions and a binary phenotype on the liability scale using the logit likelihood as outlined in Lee et al. (2012) Gene. Epi.

The R^2 is defined as: R^2_{logit} = Var(pred) / (Var(pred) + pi^2 / 3)

where Var(pred) is the variance of the predicted liability and pi^2 / 3 is the variance of the standard logistic distribution.

If covariates are provided, we compute the incremental pseudo-R^2 by conditioning on the covariates.


Parameters:

Name | Description | Default
true_val | The response value or phenotype (a binary numpy vector with 0s and 1s) | required
pred_val | The predicted value or PRS (a numpy vector) | required
covariates | A pandas table of covariates where the rows are ordered the same way as the predictions and response. | None
return_all_r2 | If True, return the null, full and incremental R2 values. | False

Source code in viprs/eval/
def liability_logit_r2(true_val, pred_val, covariates=None, return_all_r2=False):
    """
    Compute the R^2 between the PRS predictions and a binary phenotype on the liability
    scale using the logit likelihood as outlined in Lee et al. (2012) Gene. Epi.

    The R^2 is defined as:
    R^2_{logit} = Var(pred) / (Var(pred) + pi^2 / 3)

    Where Var(pred) is the variance of the predicted liability.

    If covariates are provided, we compute the incremental pseudo-R^2 by conditioning
    on the covariates.

    :param true_val: The response value or phenotype (a binary numpy vector with 0s and 1s)
    :param pred_val: The predicted value or PRS (a numpy vector)
    :param covariates: A pandas table of covariates where the rows are ordered
    the same way as the predictions and response.
    :param return_all_r2: If True, return the null, full and incremental R2 values.
    """

    if covariates is None:
        # No covariates: the null model is an explicit constant column.
        add_intercept = False
        covariates = pd.DataFrame(np.ones((true_val.shape[0], 1)), columns=['const'])
    else:
        add_intercept = True

    # Null model: covariates only. Full model: covariates plus the PRS.
    null_result = fit_linear_model(true_val, covariates,
                                   family='binomial', add_intercept=add_intercept)
    full_result = fit_linear_model(true_val, covariates.assign(pred_val=pred_val),
                                   family='binomial', add_intercept=add_intercept)

    # pi^2 / 3 is the variance of the standard logistic distribution.
    null_var = np.var(null_result.predict())
    null_r2 = null_var / (null_var + (np.pi**2 / 3))

    full_var = np.var(full_result.predict())
    full_r2 = full_var / (full_var + (np.pi**2 / 3))

    if return_all_r2:
        return {
            'Null_R2': null_r2,
            'Full_R2': full_r2,
            'Incremental_R2': full_r2 - null_r2
        }

    return full_r2 - null_r2
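
The ratio used above can be tried in isolation. The sketch below is a minimal illustration, assuming the input is the fitted predictor on the log-odds (linear predictor) scale, which is the scale on which the pi^2 / 3 residual variance applies. Whether `predict()` in the source returns that scale depends on `fit_linear_model`, which is not shown here. The function name `logit_scale_r2` is hypothetical.

```python
import numpy as np

def logit_scale_r2(linear_predictor):
    """Hypothetical helper: Var(pred) / (Var(pred) + pi^2 / 3)."""
    var_pred = np.var(linear_predictor)
    return var_pred / (var_pred + np.pi ** 2 / 3)

# A predictor whose variance equals the logistic residual variance
# explains exactly half of the liability variance.
scale = np.sqrt(np.pi ** 2 / 3)
example_r2 = logit_scale_r2(np.array([-1.0, 1.0]) * scale)
print(example_r2)
```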

liability_probit_r2(true_val, pred_val, covariates=None, return_all_r2=False)

Compute the R^2 between the PRS predictions and a binary phenotype on the liability scale using the probit likelihood as outlined in Lee et al. (2012) Gene. Epi.

The R^2 is defined as: R^2_{probit} = Var(pred) / (Var(pred) + 1)

where Var(pred) is the variance of the predicted liability and 1 is the variance of the standard normal distribution.

If covariates are provided, we compute the incremental pseudo-R^2 by conditioning on the covariates.


Parameters:

Name | Description | Default
true_val | The response value or phenotype (a binary numpy vector with 0s and 1s) | required
pred_val | The predicted value or PRS (a numpy vector) | required
covariates | A pandas table of covariates where the rows are ordered the same way as the predictions and response. | None
return_all_r2 | If True, return the null, full and incremental R2 values. | False

Source code in viprs/eval/
def liability_probit_r2(true_val, pred_val, covariates=None, return_all_r2=False):
    """
    Compute the R^2 between the PRS predictions and a binary phenotype on the liability
    scale using the probit likelihood as outlined in Lee et al. (2012) Gene. Epi.

    The R^2 is defined as:
    R^2_{probit} = Var(pred) / (Var(pred) + 1)

    Where Var(pred) is the variance of the predicted liability.

    If covariates are provided, we compute the incremental pseudo-R^2 by conditioning
    on the covariates.

    :param true_val: The response value or phenotype (a binary numpy vector with 0s and 1s)
    :param pred_val: The predicted value or PRS (a numpy vector)
    :param covariates: A pandas table of covariates where the rows are ordered
    the same way as the predictions and response.
    :param return_all_r2: If True, return the null, full and incremental R2 values.
    """

    if covariates is None:
        # No covariates: the null model is an explicit constant column.
        add_intercept = False
        covariates = pd.DataFrame(np.ones((true_val.shape[0], 1)), columns=['const'])
    else:
        add_intercept = True

    # Null model: covariates only. Full model: covariates plus the PRS.
    null_result = fit_linear_model(true_val, covariates,
                                   family='binomial', link='probit', add_intercept=add_intercept)
    full_result = fit_linear_model(true_val, covariates.assign(pred_val=pred_val),
                                   family='binomial', link='probit', add_intercept=add_intercept)

    # 1 is the variance of the standard normal distribution.
    null_var = np.var(null_result.predict())
    null_r2 = null_var / (null_var + 1.)

    full_var = np.var(full_result.predict())
    full_r2 = full_var / (full_var + 1.)

    if return_all_r2:
        return {
            'Null_R2': null_r2,
            'Full_R2': full_r2,
            'Incremental_R2': full_r2 - null_r2
        }

    return full_r2 - null_r2
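
The "+ 1" in the denominator follows from the usual latent-variable reading of the probit model, written out here as a short sketch. The symbols y^*, x, beta and epsilon are notation introduced for this illustration and do not appear in the source.

```latex
y^{*} = x^{\top}\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, 1), \qquad y = \mathbb{1}[\,y^{*} > 0\,]

R^2_{probit} = \frac{\mathrm{Var}(x^{\top}\beta)}{\mathrm{Var}(x^{\top}\beta) + \mathrm{Var}(\varepsilon)}
             = \frac{\mathrm{Var}(pred)}{\mathrm{Var}(pred) + 1}
```

The logit version has the same form, with the residual variance of the standard logistic distribution, pi^2 / 3, in place of 1.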

liability_r2(true_val, pred_val, covariates=None, return_all_r2=False)

Compute the coefficient of determination (R^2) on the liability scale according to Lee et al. (2012) Gene. Epi.

The R^2 on the liability scale is defined as: R^2_{liability} = R^2_{observed} * K * (1 - K) / z^2

where R^2_{observed} is the R^2 on the observed scale, K is the sample prevalence, and z is the "height of the normal density at the quantile for K".

If covariates are provided, we compute the incremental pseudo-R^2 by conditioning on the covariates.


Parameters:

Name | Description | Default
true_val | The response value or phenotype (a binary numpy vector with 0s and 1s) | required
pred_val | The predicted value or PRS (a numpy vector) | required
covariates | A pandas table of covariates where the rows are ordered the same way as the predictions and response. | None
return_all_r2 | If True, return the null, full and incremental R2 values. | False

Source code in viprs/eval/
def liability_r2(true_val, pred_val, covariates=None, return_all_r2=False):
    """
    Compute the coefficient of determination (R^2) on the liability scale
    according to Lee et al. (2012) Gene. Epi.

    The R^2 liability is defined as:
    R^2_{liability} = R^2_{observed} * K * (1 - K) / z^2

    where R^2_{observed} is the R^2 on the observed scale, K is the sample prevalence,
    and z is the "height of the normal density at the quantile for K".

    If covariates are provided, we compute the incremental pseudo-R^2 by conditioning
    on the covariates.

    :param true_val: The response value or phenotype (a binary numpy vector with 0s and 1s)
    :param pred_val: The predicted value or PRS (a numpy vector)
    :param covariates: A pandas table of covariates where the rows are ordered
    the same way as the predictions and response.
    :param return_all_r2: If True, return the null, full and incremental R2 values.
    """

    # First, obtain the incremental R2 on the observed scale:
    r2_obs = incremental_r2(true_val, pred_val, covariates, return_all_r2=return_all_r2)

    # Second, compute the sample prevalence (k) and the squared height of the
    # standard normal density at the quantile for the prevalence (z^2):

    from scipy.stats import norm

    k = np.mean(true_val)
    z2 = norm.pdf(norm.ppf(1.-k))**2
    mult_factor = k*(1. - k) / z2

    if return_all_r2:
        return {
            'Null_R2': r2_obs['Null_R2']*mult_factor,
            'Full_R2': r2_obs['Full_R2']*mult_factor,
            'Incremental_R2': r2_obs['Incremental_R2']*mult_factor
        }

    return r2_obs * mult_factor
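
To get a feel for the size of the multiplier K(1 - K) / z^2, the sketch below evaluates it for two prevalences. It uses the standard library's `statistics.NormalDist` instead of `scipy.stats.norm` purely to keep the example dependency-free. The function name `liability_scale_factor` is hypothetical.

```python
from statistics import NormalDist

def liability_scale_factor(prevalence):
    """Hypothetical helper: K * (1 - K) / z^2, where z is the standard
    normal density at the quantile that cuts off the upper tail of mass K."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)
    z = std_normal.pdf(threshold)
    return prevalence * (1.0 - prevalence) / z ** 2

factor_balanced = liability_scale_factor(0.5)   # equal numbers of cases and controls
factor_rare = liability_scale_factor(0.01)      # 1% of the sample are cases
print(factor_balanced, factor_rare)
```

At K = 0.5 the threshold is 0 and the factor equals pi / 2. The factor grows as the prevalence moves away from 0.5, so the same observed-scale R^2 maps to a larger liability-scale value for rarer phenotypes.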

mcfadden_r2(true_val, pred_val, covariates=None)

Compute the McFadden pseudo-R^2 between the PRS predictions and a binary phenotype. If covariates are provided, we compute the incremental pseudo-R^2 by conditioning on the covariates.


Parameters:

Name | Description | Default
true_val | The response value or phenotype (a binary numpy vector with 0s and 1s) | required
pred_val | The predicted value or PRS (a numpy vector) | required
covariates | A pandas table of covariates where the rows are ordered the same way as the predictions and response. | None

Source code in viprs/eval/
def mcfadden_r2(true_val, pred_val, covariates=None):
    """
    Compute the McFadden pseudo-R^2 between the PRS predictions and a binary phenotype.
    If covariates are provided, we compute the incremental pseudo-R^2 by conditioning
    on the covariates.

    :param true_val: The response value or phenotype (a binary numpy vector with 0s and 1s)
    :param pred_val: The predicted value or PRS (a numpy vector)
    :param covariates: A pandas table of covariates where the rows are ordered
    the same way as the predictions and response.
    """

    if covariates is None:
        # No covariates: the null model is an explicit constant column.
        add_intercept = False
        covariates = pd.DataFrame(np.ones((true_val.shape[0], 1)), columns=['const'])
    else:
        add_intercept = True

    # Null model: covariates only. Full model: covariates plus the PRS.
    null_result = fit_linear_model(true_val, covariates,
                                   family='binomial', add_intercept=add_intercept)
    full_result = fit_linear_model(true_val, covariates.assign(pred_val=pred_val),
                                   family='binomial', add_intercept=add_intercept)

    return 1. - (full_result.llf / null_result.llf)
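
Written as a formula, the return value above is the following, assuming `llf` holds the maximised log-likelihood of each fitted model. The return type of `fit_linear_model` is not shown here, so that reading is an assumption.

```latex
R^2_{McFadden} = 1 - \frac{\ln \hat{L}_{full}}{\ln \hat{L}_{null}}
```

Here \hat{L}_{null} is the maximised likelihood of the model with covariates only (or an intercept only) and \hat{L}_{full} is that of the model that also includes the PRS. Both log-likelihoods are negative for a binary outcome, and the full model fits at least as well as the null, so the ratio lies between 0 and 1.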

nagelkerke_r2(true_val, pred_val, covariates=None)

Compute the Nagelkerke pseudo-R^2 between the PRS predictions and a binary phenotype. If covariates are provided, we compute the incremental pseudo-R^2 by conditioning on the covariates.


Parameters:

Name | Description | Default
true_val | The response value or phenotype (a binary numpy vector with 0s and 1s) | required
pred_val | The predicted value or PRS (a numpy vector) | required
covariates | A pandas table of covariates where the rows are ordered the same way as the predictions and response. | None

Source code in viprs/eval/
def nagelkerke_r2(true_val, pred_val, covariates=None):
    """
    Compute the Nagelkerke pseudo-R^2 between the PRS predictions and a binary phenotype.
    If covariates are provided, we compute the incremental pseudo-R^2 by conditioning
    on the covariates.

    :param true_val: The response value or phenotype (a binary numpy vector with 0s and 1s)
    :param pred_val: The predicted value or PRS (a numpy vector)
    :param covariates: A pandas table of covariates where the rows are ordered
    the same way as the predictions and response.
    """

    if covariates is None:
        # No covariates: the null model is an explicit constant column.
        add_intercept = False
        covariates = pd.DataFrame(np.ones((true_val.shape[0], 1)), columns=['const'])
    else:
        add_intercept = True

    # Null model: covariates only. Full model: covariates plus the PRS.
    null_result = fit_linear_model(true_val, covariates,
                                   family='binomial', add_intercept=add_intercept)
    full_result = fit_linear_model(true_val, covariates.assign(pred_val=pred_val),
                                   family='binomial', add_intercept=add_intercept)
    n = true_val.shape[0]

    # First compute the Cox & Snell R2:
    cox_snell = 1. - np.exp(-2 * (full_result.llf - null_result.llf) / n)

    # Then scale it by the maximum possible R2:
    return cox_snell / (1. - np.exp(2 * null_result.llf / n))
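
The denominator in the last line is the largest value the Cox-Snell statistic can take for a given null model. When the null model is intercept-only (the `covariates=None` case), its log-likelihood depends only on the sample prevalence, so the bound can be computed in closed form. The sketch below illustrates this. The function name `max_cox_snell` is hypothetical.

```python
import numpy as np

def max_cox_snell(prevalence):
    """Hypothetical helper: upper bound 1 - exp(2 * llf_null / n) of the
    Cox-Snell R^2 when the null model contains only an intercept."""
    k = prevalence
    # Per-sample log-likelihood of the intercept-only logistic model.
    llf_null_per_sample = k * np.log(k) + (1.0 - k) * np.log(1.0 - k)
    return 1.0 - np.exp(2.0 * llf_null_per_sample)

bound_balanced = max_cox_snell(0.5)   # equal numbers of cases and controls
bound_rare = max_cox_snell(0.05)      # 5% of the sample are cases
print(bound_balanced, bound_rare)
```

With a balanced sample the bound is 0.75, and it shrinks as the phenotype becomes rarer. Dividing by the bound rescales the statistic so that a perfectly fitting model reaches 1.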

pr_auc(true_val, pred_val)

Compute the area under the Precision-Recall curve for a model that maps from the PRS predictions to the binary phenotype.


Parameters:

Name | Description | Default
true_val | The response value or phenotype (a binary numpy vector with 0s and 1s) | required
pred_val | The predicted value or PRS (a numpy vector) | required

Source code in viprs/eval/
def pr_auc(true_val, pred_val):
    """
    Compute the area under the Precision-Recall curve for a model
    that maps from the PRS predictions to the binary phenotype.

    :param true_val: The response value or phenotype (a binary numpy vector with 0s and 1s)
    :param pred_val: The predicted value or PRS (a numpy vector)
    """
    from sklearn.metrics import precision_recall_curve, auc
    precision, recall, thresholds = precision_recall_curve(true_val, pred_val)
    # auc integrates precision over recall with the trapezoidal rule.
    return auc(recall, precision)
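
This trapezoidal area is related to, but not the same as, the average precision computed by `avg_precision`, which sums precision over recall steps without interpolating between points. The sketch below compares the two on made-up toy data using the same scikit-learn calls.

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

# Hypothetical toy data: two controls (0) and two cases (1) with their PRS.
true_val = np.array([0, 0, 1, 1])
pred_val = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, _ = precision_recall_curve(true_val, pred_val)

# Trapezoidal area under the precision-recall points (what pr_auc returns).
trapezoid_area = auc(recall, precision)

# Step-wise summary of the same curve (what avg_precision returns).
step_area = average_precision_score(true_val, pred_val)

print(trapezoid_area, step_area)
```

On this toy input the two summaries differ, which is why the values reported by `pr_auc` and `avg_precision` should not be expected to match exactly.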

roc_auc(true_val, pred_val)

Compute the area under the ROC curve (AUROC) for a model that maps from the PRS predictions to the binary phenotype.


Parameters:

Name | Description | Default
true_val | The response value or phenotype (a binary numpy vector with 0s and 1s) | required
pred_val | The predicted value or PRS (a numpy vector) | required

Source code in viprs/eval/
def roc_auc(true_val, pred_val):
    """
    Compute the area under the ROC curve (AUROC) for a model
    that maps from the PRS predictions to the binary phenotype.

    :param true_val: The response value or phenotype (a binary numpy vector with 0s and 1s)
    :param pred_val: The predicted value or PRS (a numpy vector)
    """
    from sklearn.metrics import roc_auc_score
    return roc_auc_score(true_val, pred_val)
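
The AUROC has a direct interpretation: it is the probability that a randomly chosen case has a higher PRS than a randomly chosen control, with ties counted as one half. The sketch below computes that quantity by brute force over all case-control pairs. It is an illustration of the definition, not how scikit-learn computes the score, and the function name `pairwise_auroc` is hypothetical.

```python
import numpy as np

def pairwise_auroc(true_val, pred_val):
    """Hypothetical helper: fraction of (case, control) pairs in which the
    case has the higher score; tied pairs count as one half."""
    true_val = np.asarray(true_val)
    pred_val = np.asarray(pred_val, dtype=float)
    cases = pred_val[true_val == 1]
    controls = pred_val[true_val == 0]
    # Score difference for every case-control pair.
    diff = cases[:, None] - controls[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

# Toy data: three of the four case-control pairs are ranked correctly.
auroc = pairwise_auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auroc)
```

The pairwise loop costs memory proportional to the number of cases times the number of controls, so it is only suitable for small samples.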