The resulting sequence data are analyzed with the GS Reference Mapper software. For sequence capture, the software requires two input reference files: the complete human genome reference (hg18 from the UCSC Genome Browser, University of California Santa Cruz) and the .gff file that describes the targeted portion of the genome enriched by the array.
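To illustrate what the .gff file contributes, the following minimal sketch reads target intervals from GFF-formatted lines. It relies only on the standard GFF column layout (tab-separated, sequence name in column 1, 1-based inclusive start and end in columns 4 and 5). The function name `parse_gff_targets`, the example records, and their source and feature labels are hypothetical; this is not how GS Reference Mapper reads the file internally.

```python
from collections import defaultdict


def parse_gff_targets(lines):
    """Return {seqid: sorted [(start, end), ...]} from GFF lines.

    Coordinates are kept as 1-based inclusive, as in the GFF format.
    """
    targets = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a complete feature line
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        targets[seqid].append((start, end))
    return {seqid: sorted(ivals) for seqid, ivals in targets.items()}


# Hypothetical example records (not taken from a real capture design).
example_gff = [
    "##gff-version 3",
    "chr1\tcapture_design\ttarget_region\t5000\t5150\t.\t.\t.\tID=t2",
    "chr1\tcapture_design\ttarget_region\t1000\t1200\t.\t.\t.\tID=t1",
    "chr2\tcapture_design\ttarget_region\t300\t450\t.\t.\t.\tID=t3",
]

targets = parse_gff_targets(example_gff)
```

The resulting dictionary can then be used to ask whether a mapped read falls inside the enriched portion of the genome.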
The software maps all of the sequencing reads against the full human genome reference. Mapping against the full genome helps to eliminate false positives by determining whether a read maps uniquely to the target region or also maps elsewhere in the genome.
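The uniqueness check described above can be sketched as a simple classification. This is an illustration under stated assumptions, not the GS Reference Mapper algorithm: each read is assumed to arrive with a list of its equally good genome-wide alignments as `(seqid, start, end)` tuples, and `targets` is assumed to be a dictionary of 1-based inclusive target intervals per sequence. All names are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def classify_read(alignments, targets):
    """Classify one read from its genome-wide alignments.

    alignments: list of (seqid, start, end) for the read's best hits.
    targets: {seqid: [(start, end), ...]} target regions.
    """
    if not alignments:
        return "unmapped"
    if len(alignments) > 1:
        # The read also maps elsewhere, so it cannot be trusted
        # as evidence for the target region.
        return "multi_mapped"
    seqid, start, end = alignments[0]
    on_target = any(
        overlaps(start, end, t_start, t_end)
        for t_start, t_end in targets.get(seqid, [])
    )
    return "unique_on_target" if on_target else "unique_off_target"


# Hypothetical target set and reads.
demo_targets = {"chr1": [(1000, 1200)]}
unique_hit = classify_read([("chr1", 1100, 1350)], demo_targets)
repeat_hit = classify_read(
    [("chr1", 1100, 1350), ("chr7", 88000, 88250)], demo_targets
)
```

Mapping only against the target region would hide the second alignment of `repeat_hit`, which is why the full genome reference is used.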
To assess the reproducibility of the sequence capture and sequencing process, the entire process was repeated six times using the publicly available human HapMap NA11881 sample (Table). Over 99% of the reads mapped to the human genome, indicating a high degree of fidelity in the capture and sequencing processes. Of those mapped reads, ~70% mapped to the target region.
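The two percentages use different denominators: the mapping rate is taken over all reads, while the on-target rate is taken over the mapped reads. A minimal sketch of the arithmetic follows; the function name and the read counts are placeholders chosen for illustration and are not values from the Table.

```python
def mapping_rates(total_reads, mapped_reads, on_target_reads):
    """Return (fraction of reads mapped, fraction of mapped reads on target)."""
    if total_reads <= 0 or mapped_reads <= 0:
        raise ValueError("read counts must be positive")
    if not (on_target_reads <= mapped_reads <= total_reads):
        raise ValueError("expected on_target <= mapped <= total")
    mapped_fraction = mapped_reads / total_reads
    on_target_fraction = on_target_reads / mapped_reads
    return mapped_fraction, on_target_fraction


# Placeholder counts for one hypothetical replicate.
mapped_fraction, on_target_fraction = mapping_rates(
    total_reads=100_000, mapped_reads=99_500, on_target_reads=69_650
)
print(f"mapped: {mapped_fraction:.1%}, on target: {on_target_fraction:.1%}")
```

Computing both rates per replicate and comparing them across the six runs is one simple way to summarize reproducibility.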
The vast majority of the reads that did not map to the target region fell within the introns bordering the targeted exons and were most likely captured by probes complementary to the ends of a given exon. This offers additional value to researchers interested in querying variation within the exon/intron boundary regions. For some variants, uniform coverage and adequate sequencing depth for detection were achieved with just 4.5x sequencing coverage.
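Reads in the bordering introns can be separated from truly off-target reads by padding each target interval with a flanking window. The sketch below assumes inclusive coordinates on a single chromosome; the function name and the 100-base default for `flank` are hypothetical choices for illustration, not values from this study.

```python
def classify_position(start, end, targets, flank=100):
    """Label an aligned read as 'target', 'flank', or 'off_target'.

    targets: list of (start, end) inclusive intervals on one chromosome.
    flank: number of bases on each side of a target counted as flanking
           (an assumed value; adjust to the design in question).
    """
    # A read that touches a target counts as on target.
    for t_start, t_end in targets:
        if start <= t_end and t_start <= end:
            return "target"
    # Otherwise check the padded interval around each target.
    for t_start, t_end in targets:
        if start <= t_end + flank and t_start - flank <= end:
            return "flank"
    return "off_target"


# Hypothetical exon target and three reads.
exon_targets = [(1000, 1200)]
labels = [
    classify_position(1150, 1400, exon_targets),  # overlaps the exon end
    classify_position(1230, 1480, exon_targets),  # starts in the intron
    classify_position(9000, 9250, exon_targets),  # far from any target
]
```

Counting reads per label shows how much of the off-target fraction lies at exon/intron boundaries.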