Bridging a Gap
Dr. Abecasis is one of several scientists from a handful of institutions working collaboratively to design and evaluate exome arrays. They’re basing this work on published variants derived from past sequencing studies involving more than 12,000 individuals.
“I see exome arrays as bridging a gap between genotyping-based association studies of common variants and sequencing-based studies of rare variants,” Dr. Abecasis says. “Studies that aim to survey protein-coding variation in very large numbers of individuals are very well-suited for exome arrays,” he adds.
The University of North Carolina at Chapel Hill School of Medicine’s Karen Mohlke is using these chips for such a study. Along with her colleagues, Mohlke, Ph.D., associate professor of genetics, is examining exome array data from thousands of individuals in search of low-frequency coding variants associated with metabolic traits.
In a February Nature Genetics paper, Dr. Mohlke and her colleagues identified low-frequency coding variants associated with fasting proinsulin concentrations at the SGSM2 and MADD GWAS loci, as well as three new genes (TBC1D30, KANK1, and PAM) associated with fasting proinsulin or insulinogenic index in a cohort of 8,229 nondiabetic Finnish men. The researchers also drew a broader conclusion from their results: “This study demonstrates that exome array genotyping is a valuable approach to identify low-frequency variants that contribute to complex traits,” Dr. Mohlke and her colleagues wrote.
“We did expect a large sample size would be needed to see any significant evidence of association with low-frequency variants,” Dr. Mohlke tells GEN. “The low-frequency variants had not been looked at previously in genome-wide association studies, and some of those could not be imputed well, even from 1000 Genomes-based reference panels, so there was certainly an opportunity to see new associations.”
Analyzing genotyping array data is fairly straightforward, which for certain studies gives arrays an edge over working with raw sequence data.
Further, the researchers chose to use exome arrays in part because they cost less. The chips, Dr. Mohlke says, proved “cost-effective, compared to sequencing, at the time we were doing this study.”
And today? “I think still it is cost-effective for analyzing low-frequency variants, at least the ones that are known,” she says. “When variants are known and on the array one wants to test, then it is much more cost-effective than sequencing.”
But for unknown variants or those associated with an ancestry group not well-represented on exome arrays, Dr. Mohlke suggests taking another approach. While powerful, exome arrays offer “an incomplete assessment of coding variants,” she says. “In the long term, sequencing still has the opportunity to identify many more variants.”
Michigan’s Dr. Abecasis echoes these sentiments. “There are certainly lots of settings where these arrays don’t make sense,” he says. “As one obvious example, these arrays are no match for exome sequencing when it comes to studying the contribution of de novo mutations and other very rare variants to Mendelian disorders.”
Because particularly rare and population-specific variants often do not make the design cut, exome arrays “only represent 70 to 80% of the variation in any one exome,” Dr. Abecasis adds.
As a result, Dr. Mohlke at UNC suggests weighing cost vs. array content from the get-go. “If I were studying a population group for which the lower frequency and rare variants were not included on this array I would either add them to the existing array or design a new one, if that were technically [feasible] and cost-effective to do,” she says.