Feb 15, 2009 (Vol. 29, No. 4)

Honing In on Targeted Resequencing

Researchers Revisit the Human Genome to Identify Disease-Associated Variants

  • Population Variations

    The SOLiD 3 System was designed to tackle large-scale resequencing, according to Life Technologies.

    Next-generation sequencing technologies are also being applied to study evolutionary changes. “There has never been a better time to analyze molecular variation data from natural populations,” notes Paul Marjoram, Ph.D., assistant professor, preventative medicine, Keck School of Medicine, University of Southern California (Los Angeles).

    “We are examining mutation and recombination rates between individuals. It is a rite of passage for a computational biologist to develop methods to determine the number of mutations in a data set and to then calculate the rate at which mutations happen. The ultimate goal is to design association studies.”
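As a concrete illustration of counting mutations in a data set and turning that count into a rate, the sketch below uses Watterson's estimator, a standard population-genetics summary. It is not necessarily the method Dr. Marjoram's group uses. The function names and the toy aligned sequences are hypothetical and chosen only for illustration.

```python
def segregating_sites(sequences):
    """Count alignment columns where the sequences are not all identical."""
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


def watterson_theta(sequences):
    """Watterson's estimate of the population mutation rate (theta).

    theta_W = S / a_n, where S is the number of segregating sites and
    a_n = 1 + 1/2 + ... + 1/(n - 1) for n sampled sequences.
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(sequences) / a_n


# Hypothetical toy alignment: four sequences of equal length.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGT", "ACGTACGT"]
print(segregating_sites(toy_alignment), watterson_theta(toy_alignment))
```

Dividing the estimate by the number of aligned sites would give a per-site value. Real analyses must also account for sequencing error and missing data, which this sketch ignores.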

    “Genome-wide association studies interrogate the genome in a set of individuals and look for polymorphisms that differentiate two populations (for example, those that have a disease and those that don’t),” explains Dr. Marjoram. “The problem is that often you can only derive partial information. There are holes and gaps in the coverage of the genome. These gaps can be filled by inferring sequences and imputing the missing data. This can be made easier by referring to an external library of data for related individuals in which you already know what falls in the missing regions.
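The idea of filling gaps from an external library can be sketched very naively as follows: pick the reference sequence that best matches the sample at its observed positions, then copy that sequence into the gaps. This is a deliberately simplified illustration, not how production imputation software works. The function name, the `N` missing-data marker, and the toy sequences are all assumptions made for this example.

```python
def impute_from_panel(sample, panel, missing="N"):
    """Fill missing positions in `sample` from the closest panel sequence.

    "Closest" means fewest mismatches at positions where the sample is
    observed. All sequences are assumed to be aligned and equal in length.
    """
    def mismatches(ref):
        return sum(1 for s, r in zip(sample, ref) if s != missing and s != r)

    best = min(panel, key=mismatches)
    return "".join(r if s == missing else s for s, r in zip(sample, best))


# Hypothetical reference panel and a sample with two gaps.
panel = ["ACGTACGT", "ACCTACGA", "TCGTACGT"]
sample = "ACCNACNA"
print(impute_from_panel(sample, panel))
```

Observed positions in the sample are never overwritten; only the gaps take values from the panel.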

    “The key question to ask when starting association studies is ‘how big a sample do I need?’ So, the first step is to do a power calculation. There are a number of ways to do this, but the bottom line is that you have to divide the coverage across samples. We have found that it is better to divide coverage equally across individuals. But, even given that knowledge, you still need to decide whether to use, for example, 100 individuals and 20-fold coverage, or 500 individuals with fourfold coverage.”
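The two designs in the quote, 100 individuals at 20-fold and 500 individuals at fourfold, use the same total sequencing effort (2,000 individual-fold units). The sketch below makes that trade-off explicit under a simple assumption that is not from the article: reads carrying the alternate allele at a heterozygous site follow a Poisson distribution with mean equal to half the coverage. The function names and the two-read detection threshold are hypothetical.

```python
import math


def prob_detect_het(coverage, min_alt_reads=2):
    """P(at least `min_alt_reads` reads show the alternate allele).

    Assumes the alternate-allele read count at a heterozygous site is
    Poisson with mean coverage / 2 (an illustrative simplification).
    """
    lam = coverage / 2.0
    p_fewer = sum(
        math.exp(-lam) * lam ** i / math.factorial(i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_fewer


def designs_for_budget(total_fold, sample_sizes):
    """Split a fixed total coverage equally across each candidate sample size."""
    return [(n, total_fold / n) for n in sample_sizes]


budget = 100 * 20  # same total as 500 individuals at fourfold
for n, cov in designs_for_budget(budget, [100, 500]):
    print(n, cov, round(prob_detect_het(cov), 3))
```

Under these assumptions, per-individual detection is close to certain at 20-fold and falls to roughly 0.59 at fourfold, while the larger design samples five times as many individuals. A real power calculation would also weigh allele frequency, effect size, and error rates.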

    Ultimately, it is like the race of the tortoise and the hare. The hare, in this case, will use inexact methods to more quickly produce a best guess, while the tortoise will perform slow, steady, and difficult annotation of genomic sequences. “Both approaches will produce results in their own time, but faster and more useful methods are available right now by using simplified models or summaries of the data,” concludes Dr. Marjoram.
