AnnotationMatrix
Bases: object
A wrapper class for handling annotation matrices, which are essentially tables of features for each variant in the genome. These features include information such as whether the variant is in a coding region, an enhancer, etc. They can also include continuous features derived from experimental assays or other sources.
The purpose of this class is to present a unified and consistent interface for handling annotations across different tools and applications. It should be able to read and write annotation matrices in different formats, filter annotations, and perform basic operations on the annotation matrix. It should also allow users to define new custom annotations that can be used for downstream statistical genetics applications.
Attributes:
Name | Type | Description |
---|---|---|
table | | A pandas dataframe containing the annotation information. |
_annotations | | A list or array of column names to consider as annotations. If not provided, will be inferred heuristically, though we recommend that the user specify this information. |
Source code in magenpy/AnnotationMatrix.py
annotations
property
Returns:
Type | Description |
---|---|
The list of annotation names or IDs in the annotation matrix. |
binary_annotations
property
Returns:
Type | Description |
---|---|
A list of binary (0/1) annotations in the annotation matrix. |
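What counts as a binary annotation can be illustrated with a small sketch. The helper `find_binary_columns` and the toy table below are hypothetical and use a plain dict of columns rather than the pandas table; this is one way to express the 0/1 check, not necessarily how magenpy implements it.

```python
# Hypothetical toy table: column name -> per-variant values.
toy_table = {
    "coding": [0, 1, 0, 1],                  # binary annotation
    "enhancer": [1, 1, 0, 0],                # binary annotation
    "conservation": [0.1, 0.9, 0.4, 0.7],    # continuous annotation
}


def find_binary_columns(columns):
    """Return the names of columns whose values are all 0 or 1."""
    return [
        name
        for name, values in columns.items()
        if all(v in (0, 1) for v in values)
    ]


binary_cols = find_binary_columns(toy_table)
print(binary_cols)
```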
chromosome
property
A convenience property that returns the chromosome when the annotation matrix contains only one chromosome.
Returns:
Type | Description |
---|---|
The chromosome number if there is only one chromosome in the annotation matrix. Otherwise, None. |
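The "one chromosome or None" behavior described above can be sketched as follows. The function name `single_chromosome` is hypothetical and the input is a plain list of per-variant chromosome labels; it only illustrates the documented contract.

```python
def single_chromosome(chromosomes):
    """Return the chromosome if exactly one unique value is present, else None."""
    unique = set(chromosomes)
    if len(unique) == 1:
        return next(iter(unique))
    return None


print(single_chromosome([22, 22, 22]))
print(single_chromosome([1, 2, 2]))
```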
chromosomes
property
Returns:
Type | Description |
---|---|
The list of unique chromosomes in the annotation matrix. |
n_annotations
property
Returns:
Type | Description |
---|---|
The number of annotations in the annotation matrix. |
n_snps
property
Returns:
Type | Description |
---|---|
The number of variants in the annotation matrix. |
shape
property
Returns:
Type | Description |
---|---|
The dimensions of the annotation matrix (number of variants x number of annotations). |
snps
property
Returns:
Type | Description |
---|---|
The list of SNP rsIDs in the annotation matrix. |
__init__(annotation_table=None, annotations=None)
Initialize an AnnotationMatrix object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
annotation_table | | A pandas dataframe containing the annotation information. | None |
annotations | | A list or array of columns to consider as annotations. If not provided, will be inferred heuristically, though we recommend that the user specify this information. | None |
add_annotation(annot_vec, annotation_name)
Add an annotation vector or list to the AnnotationMatrix object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
annot_vec | | A vector, list, or Series containing the annotation information for each SNP in the AnnotationMatrix. For now, it is the user's responsibility to make sure that the annotation vector is ordered to match the SNPs in the matrix. | required |
annotation_name | | The name of the annotation to create. Make sure the name is not already in the matrix! | required |
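Because the ordering of `annot_vec` is left to the caller, it can help to build the vector explicitly in the matrix's SNP order. The sketch below assumes the values are available in a dict keyed by SNP ID; the names `matrix_snps` and `scores_by_snp` are hypothetical stand-ins (in practice the SNP order would come from the `snps` property).

```python
# Hypothetical SNP order of the annotation matrix.
matrix_snps = ["rs1", "rs2", "rs3"]

# Hypothetical annotation values, in arbitrary order.
scores_by_snp = {"rs3": 0.2, "rs1": 0.7, "rs2": 0.5}

# Reorder the values so that position i corresponds to matrix_snps[i].
# A KeyError here signals a SNP with no annotation value.
annot_vec = [scores_by_snp[snp] for snp in matrix_snps]
print(annot_vec)
```

A vector built this way could then be passed as `annot_vec` together with a new `annotation_name`.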
add_annotation_from_bed(bed_file, annotation_name)
Add an annotation to the AnnotationMatrix from a BED file that lists the ranges of coordinates associated with that annotation (e.g. coding regions, enhancers, etc.). The BED file must adhere to the format specified at https://uswest.ensembl.org/info/website/upload/bed.html, with the first three columns being:
CHR StartCoordinate EndCoordinate ...
Note
This implementation is quite slow at the moment. A more efficient way to do the merge over the list of ranges may be needed.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
bed_file | | The path to the BED file containing the annotation coordinates. | required |
annotation_name | | The name of the annotation to create. Make sure the name is not already in the matrix! | required |
Raises:
Type | Description |
---|---|
AssertionError | If the annotation name is already in the matrix. |
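The range lookup behind a BED-derived annotation can be sketched with a sorted-interval search. This is not magenpy's implementation; it is one possible approach, assuming a single chromosome, 1-based variant positions, and BED intervals that are 0-based and half-open (start included, end excluded). The function name `annotate_positions` is hypothetical.

```python
from bisect import bisect_left


def annotate_positions(positions, intervals):
    """Return a 0/1 list marking the positions covered by any interval.

    positions: 1-based variant coordinates (assumption).
    intervals: (start, end) pairs in BED convention, 0-based half-open.
    """
    # Sort the intervals and merge overlapping or adjacent ones.
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    starts = [start for start, _ in merged]
    flags = []
    for pos in positions:
        # A 1-based pos lies in a 0-based [start, end) iff start < pos <= end.
        i = bisect_left(starts, pos) - 1  # last interval with start < pos
        flags.append(1 if i >= 0 and pos <= merged[i][1] else 0)
    return flags


bed_intervals = [(100, 200), (150, 300), (1000, 1100)]
snp_positions = [100, 101, 200, 300, 301, 1050]
flags = annotate_positions(snp_positions, bed_intervals)
print(flags)
```

A full version would repeat this per chromosome, using the first BED column to group the intervals.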
filter_annotations(keep_annotations)
Filter the list of annotations in the matrix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
keep_annotations | | A list or array of annotations to keep. | required |
filter_snps(extract_snps=None, extract_file=None)
Filter variants from the annotation matrix. User must specify either a list of variants to extract or the path to a file with the list of variants to extract.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
extract_snps | | A list or array of SNP IDs to keep in the annotation matrix. | None |
extract_file | | The path to a file with the list of variants to extract. | None |
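The "either a list or a file" contract can be illustrated with a plain-Python sketch. The function `filter_rows`, the `"SNP"` key, and the one-ID-per-line file layout are all assumptions made for illustration; they are not taken from magenpy.

```python
def filter_rows(rows, extract_snps=None, extract_file=None):
    """Keep only the rows whose SNP ID is in the extraction list."""
    if extract_snps is None and extract_file is None:
        raise ValueError("Specify either extract_snps or extract_file.")

    if extract_snps is None:
        # Assumption: the file lists one SNP ID per line.
        with open(extract_file) as handle:
            extract_snps = [line.strip() for line in handle if line.strip()]

    keep = set(extract_snps)  # set membership keeps the lookup fast
    return [row for row in rows if row["SNP"] in keep]


rows = [
    {"SNP": "rs1", "coding": 1},
    {"SNP": "rs2", "coding": 0},
    {"SNP": "rs3", "coding": 1},
]
kept = filter_rows(rows, extract_snps=["rs3", "rs1", "rs9"])
print([row["SNP"] for row in kept])
```

Note that the original row order is preserved and IDs absent from the table (here `rs9`) are ignored.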
from_file(annot_file, annot_format='magenpy', annot_parser=None, **parse_kwargs)
classmethod
Initialize an AnnotationMatrix object from a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
annot_file | | The path to the annotation file. | required |
annot_format | | The format of the annotation file. For now, we mainly support annotation files in the | 'magenpy' |
annot_parser | | An | None |
parse_kwargs | | arguments for the pandas | {} |
Returns:
Type | Description |
---|---|
An instance of the |
get_binary_annotation_index(bin_annot)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
bin_annot | | The name of the binary annotation for which to fetch the relevant variants. | required |
Returns:
Type | Description |
---|---|
The indices of all variants that belong to binary annotation |
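As a rough illustration of what such an index looks like, the sketch below collects the positions of the variants flagged with 1 in a binary column. The column `coding` is a hypothetical example and the list comprehension is only one way to compute the index.

```python
# Hypothetical binary annotation column, one entry per variant.
coding = [0, 1, 0, 1, 1]

# Positions of the variants that belong to the annotation.
coding_idx = [i for i, value in enumerate(coding) if value == 1]
print(coding_idx)
```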
split_by_chromosome()
Split the annotation matrix by chromosome.
Returns:
Type | Description |
---|---|
A dictionary of |
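The splitting step can be sketched as a simple group-by. The sketch assumes the resulting dictionary is keyed by chromosome, and it works on plain row dicts with a hypothetical `"CHR"` key instead of `AnnotationMatrix` objects.

```python
from collections import defaultdict


def split_rows_by_chromosome(rows):
    """Group row dicts into a dictionary keyed by chromosome."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["CHR"]].append(row)
    return dict(groups)


rows = [
    {"CHR": 1, "SNP": "rs1"},
    {"CHR": 2, "SNP": "rs2"},
    {"CHR": 1, "SNP": "rs3"},
]
by_chrom = split_rows_by_chromosome(rows)
print(sorted(by_chrom))
```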
to_file(output_path, col_subset=None, compress=True, **to_csv_kwargs)
A convenience method to write the annotation matrix to a file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
output_path | | The path and file prefix to which the annotation matrix is written. | required |
col_subset | | A subset of the columns to write to file. | None |
compress | | Whether to compress the output file (default: True). | True |
to_csv_kwargs | | Keyword arguments passed to the pandas CSV writer. | {} |
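The interplay of `col_subset` and `compress` can be sketched with the standard library alone. The function `write_table`, the tab delimiter, and the `.gz` suffix are assumptions made for illustration; the actual method delegates to the pandas CSV writer and may name its output differently.

```python
import csv
import gzip
import os
import tempfile


def write_table(rows, output_path, col_subset=None, compress=True):
    """Write a list of row dicts as a tab-separated file, optionally gzipped."""
    columns = list(col_subset) if col_subset else list(rows[0].keys())
    # Assumption: a '.gz' suffix is appended when compressing.
    path = output_path + ".gz" if compress else output_path
    opener = gzip.open if compress else open
    with opener(path, "wt", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns,
                                delimiter="\t", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return path


rows = [
    {"CHR": 1, "SNP": "rs1", "coding": 1},
    {"CHR": 1, "SNP": "rs2", "coding": 0},
]
out_path = write_table(rows, os.path.join(tempfile.mkdtemp(), "annot.tsv"),
                       col_subset=["SNP", "coding"])

# Read the compressed file back to check its contents.
with gzip.open(out_path, "rt") as handle:
    written_lines = handle.read().splitlines()
print(written_lines)
```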
values(add_intercept=False)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
add_intercept | | If True, adds a base annotation corresponding to the intercept. | False |
Returns:
Type | Description |
---|---|
The annotation matrix as a numpy matrix. |
Raises:
Type | Description |
---|---|
KeyError | If no annotations are defined in the table. |
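The effect of `add_intercept` can be sketched as adding a column of ones to the matrix. The function `to_matrix` is hypothetical and returns nested lists rather than a numpy matrix; placing the intercept in the first column is an assumption, not something the documentation states.

```python
def to_matrix(columns, annotations, add_intercept=False):
    """Build a row-major matrix (list of lists) from the annotation columns."""
    if not annotations:
        raise KeyError("No annotations are defined in the table.")

    n_variants = len(columns[annotations[0]])
    matrix = [
        [float(columns[name][i]) for name in annotations]
        for i in range(n_variants)
    ]
    if add_intercept:
        # Assumption: the intercept is placed in the first column.
        matrix = [[1.0] + row for row in matrix]
    return matrix


columns = {"coding": [0, 1, 1], "conservation": [0.1, 0.9, 0.4]}
matrix = to_matrix(columns, ["coding", "conservation"], add_intercept=True)
print(matrix)
```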