Meghaan M. Ferreira, Ph.D., Contributor, GEN and Clinical OMICs

Rewriting the Story of Drug Development with PheWAS

Drug development needs a new storyline. Although pharmaceutical companies spend billions of dollars to develop new therapeutics, only one out of every ten candidates survives clinical trials, and the majority fail due to a lack of efficacy and/or safety. Genetics may rewrite that story. Preclinical studies model diseases using cell culture or animal models to predict drug efficacy and safety, and, as a result, most drug candidates enter clinical trials without clear evidence of the targeted pathway’s relevance to the disease in humans. Human genetics, on the other hand, uses real-world data to decipher the riddles of human pathology.

“When we’ve looked at using genetics broadly to recapitulate drug activity it’s actually been a pretty good story,” said Joshua Denny, M.D., professor of biomedical informatics and medicine at Vanderbilt University Medical Center. “[The data] really suggests that genetics are helpful in determining what drugs work in a disease.” For example, in 2013 Okada et al. recapitulated 27 genes targeted by drugs currently approved for rheumatoid arthritis using a genome-wide association study (GWAS) meta-analysis.

Down the Rabbit Hole

However, over the past decade, scientists have followed GWAS down the rabbit hole only to discover that they need a new paradigm to explain the most prevalent diseases. While GWAS works well for diseases caused by single, rare mutations with high penetrance, the black-and-white days of “simple” monogenic traits have given way to a more complex framework where multiple genes, and even environmental factors, contribute to disease.

According to Marylyn Ritchie, Ph.D., professor and director of biomedical and translational informatics at Geisinger Health System, the past decade of genetics research “has actually shown us that even traits we would have previously said are simple, like Mendelian traits, turn out to be complex.” The genetics of disease becomes “curiouser and curiouser” as variations in penetrance and expressivity complicate the phenotypes related to monogenic diseases, while pleiotropic genes influence multiple, seemingly unrelated, phenotypes.

In 2010, Denny and Ritchie introduced a new method, published in Bioinformatics, to answer the complex riddles genetics poses. Referred to as a “reverse GWAS,” phenome-wide association studies (PheWAS) created a new paradigm for studying associations between genetics and disease. In contrast to GWAS, which selects a disease phenotype and then compares genetic variants in affected and unaffected individuals, PheWAS selects a genetic variant and then searches for phenotypes common among individuals with that mutation.
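The "reverse GWAS" logic described above can be illustrated with a toy sketch. This is not the method published by Denny and Ritchie: real PheWAS pipelines typically fit covariate-adjusted regression models, whereas this sketch substitutes a plain Fisher's exact test and a Bonferroni threshold. The function names, the phenotype labels, and the 20-person dataset are all hypothetical and invented for illustration.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carrier-cases given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

def phewas(carriers, everyone, phenotypes, alpha=0.05):
    """Start from ONE variant and scan ALL phenotypes (the reverse of GWAS).

    carriers   -- set of person ids carrying the variant (hypothetical input)
    everyone   -- set of all person ids in the cohort
    phenotypes -- dict: phenotype label -> set of person ids with that diagnosis
    """
    threshold = alpha / len(phenotypes)  # Bonferroni correction across phenotypes
    non_carriers = everyone - carriers
    results = []
    for label, cases in phenotypes.items():
        a = len(carriers & cases)        # carriers with the phenotype
        b = len(carriers - cases)        # carriers without it
        c = len(non_carriers & cases)    # non-carriers with the phenotype
        d = len(non_carriers - cases)    # non-carriers without it
        p = fisher_two_sided(a, b, c, d)
        results.append((label, p, p < threshold))
    return sorted(results, key=lambda r: r[1])

# Toy cohort: 20 people, 8 of whom carry the variant of interest.
everyone = set(range(20))
carriers = set(range(8))
phenotypes = {
    "phenotype_A": {0, 1, 2, 3, 4, 5, 6},  # concentrated in carriers
    "phenotype_B": {0, 1, 8, 9, 10},       # spread across both groups
}
results = phewas(carriers, everyone, phenotypes)
for label, p, significant in results:
    print(f"{label}: p={p:.2g} significant={significant}")
```

The key design point is the direction of the loop: the variant is fixed and the phenotypes are iterated, which is why the multiple-testing correction scales with the number of phenotypes rather than the number of variants.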

While GWAS laid the foundation by building the statistical methods required to test for multiple gene–disease associations, “the electronic health record was the modality that made it all possible,” said Denny. Electronic health records have assembled comprehensive, unbiased repositories of information on disease phenotypes, and when these records are linked to genotypes, researchers can use PheWAS to mine them for causal variants, pleiotropic effects, and other relationships between phenotypes.

According to David Carey, Ph.D., associate chief research officer at Geisinger Health System, the approach could also shed light on dark portions of the “druggable genome.” Approximately 3,000 genes constitute the druggable genome, which encodes proteins with pockets capable of binding small-molecule drugs. However, the biological function of many of these proteins remains a mystery, and fewer than 10% of them are targeted by FDA-approved drugs. “If you interfere with the function of those genes, what are the clinical consequences?” Carey posed. “That’s an area where PheWAS has an advantage over GWAS approaches.”

Geisinger, which now has a biorepository linked to electronic healthcare data for more than 100,000 individuals as part of its MyCode program, is using PheWAS to investigate one of four protein families known to harbor most of the proteins in the druggable genome. By looking for individuals with mutations in orphan G protein-coupled receptor genes and then asking what clinical traits are associated with those mutations, Geisinger hopes to bring to light the receptors’ function and potential as therapeutic targets.

The Secret Sauce

Phenome-wide association studies have a distinct advantage over GWAS for investigating potential drug target genes with an unknown function. However, researchers need to first identify genes of interest, and many have turned to GWAS. In a preprint posted on bioRxiv, 23andMe collaborated with multiple institutions to validate potential drug targets for common immune-mediated, cardiometabolic, or neurodegenerative diseases. The study’s authors, Diogo et al., used PheWAS with a cohort of 800,000 individuals to successfully replicate 70% of known disease-gene associations among 25 single nucleotide polymorphisms (SNPs) previously identified by GWAS.

The study also identified 10 novel associations that suggest potential adverse effects for therapies targeting pathways controlled by those genes. For example, while the study confirmed a gain-of-function mutation in the PNPLA3 gene that increases the risk for liver disease, it also identified previously unknown associations that suggested inhibiting the pathway could cause severe acne and/or high cholesterol. Although the two approaches complement one another, there’s a reason why the Director of Drug Discovery at 23andMe, Erik Karrer, called PheWAS the company’s “secret sauce” in a recent Nature News & Comment article.

The ability not only to validate drug targets but also to predict potential adverse reactions early on could save pharmaceutical companies from following futile leads. “If it’s not going to work, don’t spend millions or billions [of dollars] getting there,” remarked David Mosedale, Ph.D., chief scientific officer at Total Scientific, a small contract research organization in the U.K. “You kill it early, and you move on to something else.”

Total Scientific began offering PheWAS as a contract service to current clients after developing its own cohort. Despite its small size, the 1,300-individual cohort has already helped detect potential toxicology issues that could threaten a drug program. For example, one study identified an unexpected association between a candidate drug targeting an unrelated disease and thrombosis. “In some sense, PheWAS is like performing a knockout study in man without the impossible ethical hurdle,” wrote Total Scientific Executive Director and Founder, David Grainger, Ph.D., in

By bringing potential safety concerns to the forefront, PheWAS gives drug companies an opportunity to add additional parameters to animal studies, design clinical trials that exclude patients genetically predisposed to adverse reactions, or if the concern is serious enough, kill a drug program and move on—avoiding any surprises in Phase III. Of course, drug developers will need to perform additional studies to confirm adverse associations, since the hypothesis-generating nature of PheWAS functions a bit like the Cheshire Cat—answering questions with more questions. However, “[PheWAS] tells you something about your [drug] target and where you’re going,” said Mosedale.

Pharmaceutical companies may not have to go far to find new treatments for disease either. Systematically examining the broad range of phenotypes associated with a particular gene can not only signpost adverse drug effects, but can also lead to new indications for approved drugs. Repurposing drugs can significantly accelerate the process of finding a drug that might work in a particular disease—especially for rare diseases, where the challenges associated with clinical studies are magnified as a consequence of the limited number of patients available for recruitment.

David Mosedale, CSO, at work with staff member at Total Scientific.

Cashing In on Biobanks

Despite the promising story unfolding around genetics and drug development, the protagonists still have many challenges left to overcome. Extracting phenotypic information from what can be hundreds of thousands of individual records is one such challenge to successfully implementing PheWAS. While the proof-of-concept study published by Denny et al. addressed the need for a high-throughput method to extract clinical phenotypes from medical records by developing algorithms that map ICD-9 billing codes to diseases of interest, the field will need to move beyond billing codes to gain greater granularity from

The main conflict, however, arises from the astronomical upfront costs and legwork required to collect phenotypic information from thousands of individuals and link that data to genotypes. Only a handful of large biobanks with linked phenotype data currently exist, and, according to Daniel Rader, M.D., professor of molecular medicine at the Perelman School of Medicine, University of Pennsylvania, “It is not a cheap endeavor to actually create this kind of

“It’s hard to put concrete numbers on it,” said Rader, “but I can tell you that to assemble a biobank of 50,000 people, all of whom are genotyped and all of whom have electronic health record data in a way that one could easily use for a sophisticated PheWAS analysis, is well north of 15 to 20 million dollars.” While large pharmaceutical companies, including Regeneron, Merck, and GSK, have already incorporated PheWAS into their drug development programs by partnering with institutions that had the foresight to develop these biobanks, small drug companies may have to wait until it becomes more of a commodity before they can add PheWAS to their toolboxes.

However, it’s not just about accessing the trove of information deposited in these biobanks. Investigators also need to take the size and composition of the cohort into account when evaluating what questions they can ask. Small cohorts may not have the statistical power necessary to give researchers confidence in the results, especially for diseases that occur in the population at a relatively low frequency. And while large cohorts increase statistical power, they may still suffer from a lack of diversity.
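The point about cohort size can be made concrete with a rough power calculation. The sketch below uses a normal approximation for comparing a phenotype's frequency in carriers and non-carriers. The two cohort sizes (1,300 and 100,000) come from the cohorts mentioned in this article, but the carrier frequency, the phenotype prevalences, and the number of phenotypes tested are assumptions chosen purely for illustration, and `approx_power` is a hypothetical helper, not part of any PheWAS package.

```python
from statistics import NormalDist

def approx_power(n_carriers, n_noncarriers, p_carriers, p_noncarriers, alpha):
    """Approximate power of a two-sided two-proportion z-test (normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in phenotype frequency between the groups.
    se = (p_carriers * (1 - p_carriers) / n_carriers
          + p_noncarriers * (1 - p_noncarriers) / n_noncarriers) ** 0.5
    z_effect = abs(p_carriers - p_noncarriers) / se
    return nd.cdf(z_effect - z_crit) + nd.cdf(-z_effect - z_crit)

# Assumed scenario (not from the article): 5% of people carry the variant,
# a rare phenotype affects 1% of non-carriers and 2% of carriers, and
# 1,000 phenotypes are tested with a Bonferroni-corrected threshold.
CARRIER_FREQ = 0.05
ALPHA = 0.05 / 1000

power_by_cohort = {}
for cohort_size in (1_300, 100_000):
    n_carriers = int(cohort_size * CARRIER_FREQ)
    n_noncarriers = cohort_size - n_carriers
    power_by_cohort[cohort_size] = approx_power(
        n_carriers, n_noncarriers, 0.02, 0.01, ALPHA
    )
    print(f"cohort={cohort_size:>7,}  power={power_by_cohort[cohort_size]:.3f}")
```

Under these assumptions the smaller cohort has almost no chance of detecting the association while the larger one detects it most of the time, which is the trade-off investigators weigh when deciding what questions a given biobank can answer.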

“The vast majority of genetic information we have linked to phenotype data is in people of European ancestry,” admitted Rader. “Moving the field to more actively study people of African ancestry, of South Indian ancestry, of Asian ancestry can only enhance our ability to make new discoveries that are going to ultimately impact on new drug targets and on drug development.”

Rader’s other insight into how the field will move, as genomics in drug development becomes less of a wonderland and more of a reality, concerns the types of variants studied: common versus rare, and harmful versus protective. While researchers agree that analyzing common and rare genetic variants together provides a more complete picture of health and disease, the majority of both GWASs and PheWASs focus on common variants. To analyze rare variants, the field will need to move beyond SNPs to whole-genome and whole-exome sequencing. Similarly, while studies focus predominantly on variants that increase disease risk, the concept that genetics could also unearth variants that protect people from developing a disease is another powerful paradigm shift that could point drug developers toward new targets.

As scientists search to answer the riddles of human pathology by staring into the looking-glass of genetics, one theme will run throughout their adventures: sometimes it takes a new perspective to find the answers.

Source: Si-Gal / Getty Images

This article was originally published in the January/February 2018 issue of Clinical OMICs.
