Feb 15, 2009 (Vol. 29, No. 4)

Honing In on Targeted Resequencing

Researchers Revisit the Human Genome to Identify Disease-Associated Variants

  • Population Variations

    The SOLiD 3 System was designed to tackle large-scale resequencing, according to Life Technologies.

    Next-generation sequencing technologies are also being applied to study evolutionary changes. “There has never been a better time to analyze molecular variation data from natural populations,” notes Paul Marjoram, Ph.D., assistant professor, preventive medicine, Keck School of Medicine, University of Southern California (Los Angeles).

    “We are examining mutation and recombination rates between individuals. It is a rite of passage for a computational biologist to develop methods to determine the number of mutations in a data set and to then calculate the rate at which mutations happen. The ultimate goal is to design association studies.”
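    As an illustration of what counting mutations and turning the count into a rate can look like, the sketch below uses Watterson's estimator, a classical population-genetics summary. It is an assumption chosen for illustration, not necessarily the method Dr. Marjoram's group uses. The function names and the toy alignment are hypothetical.

```python
from typing import Sequence


def count_segregating_sites(sequences: Sequence[str]) -> int:
    """Count alignment columns in which the sampled sequences are not all identical."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def watterson_theta(sequences: Sequence[str]) -> float:
    """Watterson's estimator: S / a_n, where a_n = sum_{i=1}^{n-1} 1/i."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    s = count_segregating_sites(sequences)
    a_n = sum(1.0 / i for i in range(1, n))
    return s / a_n


# Hypothetical toy alignment (made-up data): four sampled chromosomes, eight sites.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
segregating = count_segregating_sites(toy_alignment)
theta = watterson_theta(toy_alignment)
theta_per_site = theta / len(toy_alignment[0])
print(f"S = {segregating}, theta = {theta:.4f}, per site = {theta_per_site:.4f}")
```

    The result is a population-scaled mutation rate (for diploids, four times the effective population size times the per-generation mutation rate), so converting it to an absolute rate requires an independent estimate of population size.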

    “Genome-wide association studies interrogate the genome in a set of individuals and look for polymorphisms that differentiate two populations, for example, those that have a disease and those that don’t,” explains Dr. Marjoram. “The problem is that often you can only derive partial information. There are holes and gaps in the coverage of the genome. These gaps can be filled by inferring sequences and imputing the missing data. This can be made easier by referring to an external library of data for related individuals in which you already know what falls in the missing regions.
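    The idea of filling gaps from an external library can be sketched with a deliberately simple nearest-match rule: find the reference haplotype that agrees best with the observed positions and copy its alleles into the gaps. Practical imputation software typically relies on probabilistic models rather than this shortcut, and the `MISSING` marker, function name, and toy panel below are all hypothetical.

```python
from typing import Sequence

MISSING = "N"  # hypothetical marker for an uncovered position


def impute_from_panel(sample: str, panel: Sequence[str]) -> str:
    """Fill MISSING positions in `sample` using the closest reference haplotype."""
    if not panel:
        raise ValueError("reference panel is empty")
    if any(len(hap) != len(sample) for hap in panel):
        raise ValueError("panel haplotypes must match the sample length")

    def mismatches(hap: str) -> int:
        # Compare only the positions that were actually observed in the sample.
        return sum(1 for s, h in zip(sample, hap) if s != MISSING and s != h)

    best = min(panel, key=mismatches)  # ties resolve to the first panel entry
    return "".join(h if s == MISSING else s for s, h in zip(sample, best))


# Made-up reference panel and a sample with a two-site gap in coverage.
reference_panel = ["ACGTACGT", "ACCTACGA", "TCGTTCGT"]
observed = "ACCNNCGA"
imputed = impute_from_panel(observed, reference_panel)
print(imputed)
```

    Observed positions are never overwritten; only the gaps take alleles from the best-matching reference haplotype.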

    “The key question to ask when starting association studies is ‘how big a sample do I need?’ So, the first step is to do a power calculation. There are a number of ways to do this, but the bottom line is that you have to divide the coverage across samples. We have found that it is better to divide coverage equally across individuals. But, even given that knowledge, you still need to decide whether to use, for example, 100 individuals and 20-fold coverage, or 500 individuals with fourfold coverage.”
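    The two designs in the quote spend the same sequencing budget (100 × 20 = 500 × 4 = 2,000 individual-fold units). A rough sketch of one piece of the trade-off follows. It assumes read depth at a site is Poisson-distributed with mean equal to the per-individual coverage, and that a site counts as usable in an individual when it has at least three reads. Both are illustrative assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def prob_depth_at_least(mean_coverage: float, min_depth: int) -> float:
    """P(depth >= min_depth) when depth ~ Poisson(mean_coverage)."""
    below = sum(
        math.exp(-mean_coverage) * mean_coverage ** j / math.factorial(j)
        for j in range(min_depth)
    )
    return 1.0 - below


def compare_designs(
    total_budget: float, sample_sizes: Sequence[int], min_depth: int
) -> List[Tuple[int, float, float, float]]:
    """Split a fixed budget equally across individuals for each candidate sample size."""
    rows = []
    for n in sample_sizes:
        per_individual = total_budget / n  # equal division of coverage
        p_usable = prob_depth_at_least(per_individual, min_depth)
        rows.append((n, per_individual, p_usable, n * p_usable))
    return rows


# Budget and sample sizes from the quoted example; min_depth=3 is an assumed threshold.
designs = compare_designs(2000, [100, 500], min_depth=3)
for n, coverage, p_usable, expected_usable in designs:
    print(f"{n} individuals at {coverage:.0f}x: "
          f"P(usable site) = {p_usable:.3f}, "
          f"expected usable individuals per site = {expected_usable:.1f}")
```

    This covers only read depth. A full power calculation would also have to account for effect size, allele frequency, and genotype-calling error, none of which are modeled here.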

    Ultimately, it is like the race of the tortoise and the hare. The hare, in this case, will use inexact methods to more quickly produce a best guess, while the tortoise will perform slow, steady, and difficult annotation of genomic sequences. “Both approaches will produce results in their own time, but faster and more useful methods are available right now by using simplified models or summaries of the data,” concludes Dr. Marjoram.
