The ANNOVAR® Perl script “convert2annovar.pl” converts the .tsv files into the ANNOVAR® input format for further analysis.
CODE: convert2annovar.pl -format cg -out vart1 vart1.tsv
CODE: convert2annovar.pl -format cg -out varn1 varn1.tsv
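To isolate tumor-specific variants, the germline variants in varn1 are removed from vart1 by treating the varn1 file as a generic database file in the ANNOVAR® filter operation (Figure 1). The varn1 file must be placed in the database location given on the command line.
CODE: annotate_variation.pl -filter -dbtype generic -genericdbfile varn1 -buildver hg19 vart1 <database location>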
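Conceptually, this step is a set difference on the variant key. The sketch below is not ANNOVAR®'s implementation; it only illustrates the idea with awk on toy files, assuming the first five whitespace-separated columns are chromosome, start, end, reference allele and observed allele. The function name `subtract_variants`, the toy file names and the coordinates are hypothetical.

```shell
#!/bin/sh
# Illustration only: keep tumor variants whose (chr, start, end, ref, alt)
# key does not occur in the matched normal file.
subtract_variants() {
    # $1 = normal (germline) file, $2 = tumor file
    awk 'NR == FNR { seen[$1, $2, $3, $4, $5] = 1; next }
         !(($1, $2, $3, $4, $5) in seen)' "$1" "$2"
}

# Toy data (made-up coordinates, not real variants)
printf '1\t100\t100\tA\tG\n2\t200\t200\tC\tT\n' > toy_normal.avinput
printf '1\t100\t100\tA\tG\n3\t300\t300\tG\tA\n' > toy_tumor.avinput

# The variant shared with the normal sample is dropped; the other is kept.
subtract_variants toy_normal.avinput toy_tumor.avinput > toy_tumor_specific.avinput
```

In this toy case only the variant on chromosome 3 remains, because the chromosome 1 variant is also present in the normal file.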
The file containing the filtered results, “vart1.hg19_generic_filtered”, is used as input for gene-based annotation, which classifies variants as exonic, intronic, intergenic, or located in other regions.
CODE: annotate_variation.pl -geneanno -buildver hg19 vart1.hg19_generic_filtered <location of refseq database>
The result file “vart1.hg19_generic_filtered.exonic_variant_function” contains all exonic variants, and “vart1.hg19_generic_filtered.variant_function” contains the variants in intronic and other regions. Using the “grep” command, specific genes can be searched for within the variant function file. For example, high-risk breast cancer genes such as TP53, BRCA1 and BRCA2 are searched (Figure 2).
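A gene search of this kind can be sketched as follows. This is a minimal example on a toy file, assuming the variant function file is tab-delimited with the region in column 1 and the gene symbol in column 2; the file names and coordinates are hypothetical. The `-w` option restricts matches to whole words, so a symbol such as TP53BP1 is not reported when searching for TP53.

```shell
#!/bin/sh
# Toy variant_function-style file (made-up coordinates, not real variants)
printf 'intronic\tTP53\t17\t1000\t1000\tC\tT\n'   >  toy.variant_function
printf 'exonic\tTP53BP1\t15\t2000\t2000\tG\tA\n'  >> toy.variant_function
printf 'UTR3\tBRCA2\t13\t3000\t3000\tA\tG\n'      >> toy.variant_function
printf 'exonic\tEGFR\t7\t4000\t4000\tT\tC\n'      >> toy.variant_function

# -E enables alternation with "|"; -w matches whole words only
grep -w -E 'TP53|BRCA1|BRCA2' toy.variant_function > toy.breast_genes.txt
```

The same command can be pointed at the exonic variant function file, where the gene symbol is followed by a colon and transcript details and therefore still matches as a whole word.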
To remove commonly observed exonic variants, the file is filtered against the Complete Genomics 69 genomes (cg69) database using a minor allele frequency (MAF) threshold of 0.05.
Note: Download cg69 from ANNOVAR® as:
CODE: annotate_variation.pl -downdb -buildver hg19 -webfrom annovar cg69 <database location>
Note: For the cg69 filtering step, extract and use only columns 5 through 13 from the exonic variant file.
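The column extraction can be sketched with `cut`. The protocol does not state whether columns are counted on tab or whitespace boundaries; this example assumes tab-delimited fields, and the function name and toy file names are hypothetical. Before filtering, it is worth checking that the first extracted column is the chromosome, since ANNOVAR® expects chromosome, start, end, reference allele and observed allele as the first five columns of its input.

```shell
#!/bin/sh
# Keep only tab-delimited columns 5 through 13 of each line.
extract_columns_5_to_13() {
    # $1 = input file
    cut -f 5-13 "$1"
}

# Toy line with 14 placeholder fields (c1 ... c14)
printf 'c1\tc2\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tc10\tc11\tc12\tc13\tc14\n' > toy.exonic_variant_function

extract_columns_5_to_13 toy.exonic_variant_function > toy.columns_5_13.txt
```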
CODE: annotate_variation.pl -filter -dbtype cg69 vart1.hg19_generic_filtered.exonic_variant_function -buildver hg19 /<database location> -score_threshold 0.05
Additional common variants from dbSNP are removed directly in ANNOVAR® by filtering against the “NonFlagged” dbSNP database, which can be downloaded using the -webfrom procedure in ANNOVAR® with the command below, or by using the “common snp.gff” file from Genome Trax™.
CODE: annotate_variation.pl -downdb -buildver hg19 -webfrom annovar snp137NonFlagged <db location>
Beyond the simple removal of common variants, Genome Trax™ data tracks can be downloaded in .gff format and used as database files for the region-based annotation procedure in ANNOVAR® to further narrow the results to variants of particular interest.
CODE: annotate_variation.pl -regionanno -dbtype gff3 -gff3dbfile /genometrax_hgmd_hg19.gff vart1.hg19_generic_filtered.exonic_variant_function_analysis -gff3attr -buildver hg19 <database location>
As an example, we used Genome Trax™ data tracks for the COSMIC and HGMD® databases to further filter the set of significant variants, in combination with more general tracks.