Scientists at the University of Chicago have developed a new statistical tool that combines data from genome-wide association studies (GWAS) with gene expression predictions to more accurately identify disease-causing genes and variants. Their work was published this week in Nature Genetics in a paper titled “Adjusting for genetic confounders in transcriptome-wide association studies leads to reliable detection of causal genes.”
The new tool, dubbed causal-transcriptome-wide association studies, or cTWAS, uses advanced statistical techniques to reduce false positives and weed out confounding genes and variants. Instead of testing a single gene at a time, which is likely to produce false positives, the tool also considers the surrounding genes and variants, increasing the likelihood of finding the actual causal gene.
Many human diseases result from a complex interaction of multiple genes, environmental factors, and other variables. GWAS cast a wide net, identifying disease-linked variants throughout the genome. But as the name implies, these studies identify association rather than causality.
Typically, many variants in a given genomic region are highly correlated with each other because DNA is passed from one generation to the next in entire blocks rather than individual genes. As a result, “you may have many genetic variants in a block that are all correlated with disease risk, but you don’t know which one is actually the causal variant,” said Xin He, PhD, an associate professor of human genetics and senior author of the new study. “That’s the fundamental challenge of GWAS, that is, how we go from association to causality.” An additional challenge is that many genetic variants are located in non-coding regions of the genome, making it difficult to interpret their effects.
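The effect He describes can be seen in a toy simulation. The sketch below is not cTWAS and is not taken from the paper; it is a minimal illustration under stated assumptions: two standardized, continuous "variants" stand in for real genotypes, only one of them affects the trait, and the function names (simulate_ld_block, marginal_slopes, joint_slopes) are hypothetical. Testing each variant on its own makes both look associated with the trait, while fitting them together pulls the estimate for the non-causal neighbor toward zero.

```python
import numpy as np


def simulate_ld_block(n=20000, r=0.9, effect=0.5, seed=0):
    """Simulate two correlated variants, only one of which affects the trait.

    Assumption: variants are modeled as standardized continuous values,
    a simplification of real genotype data.
    """
    rng = np.random.default_rng(seed)
    causal = rng.standard_normal(n)
    # The neighbor is correlated with the causal variant (correlation about r)
    # but has no effect of its own on the trait.
    neighbor = r * causal + np.sqrt(1 - r**2) * rng.standard_normal(n)
    trait = effect * causal + rng.standard_normal(n)
    return causal, neighbor, trait


def marginal_slopes(causal, neighbor, trait):
    """Regress the trait on each variant separately (one at a time)."""
    y = trait - trait.mean()
    slopes = []
    for x in (causal, neighbor):
        xc = x - x.mean()
        slopes.append(float(np.dot(xc, y) / np.dot(xc, xc)))
    return tuple(slopes)


def joint_slopes(causal, neighbor, trait):
    """Regress the trait on both variants at once (ordinary least squares)."""
    X = np.column_stack([causal - causal.mean(), neighbor - neighbor.mean()])
    y = trait - trait.mean()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0]), float(coef[1])


if __name__ == "__main__":
    causal, neighbor, trait = simulate_ld_block()
    print("one at a time (causal, neighbor):", marginal_slopes(causal, neighbor, trait))
    print("fitted jointly (causal, neighbor):", joint_slopes(causal, neighbor, trait))
```

In this simplified setting the joint fit separates the two variants because the causal one is included in the model. Real data are harder: the causal variant is not known in advance, and many correlated candidates compete at once.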
One strategy for addressing these challenges is to use expression quantitative trait loci (eQTL) data. However, existing methods that use eQTL data to nominate risk genes are often confounded by nearby associations. In fact, these methods can generate false positive genes more than 50% of the time. “The software will allow people to do analyses that connect genetic variations to phenotypes. That’s really the key challenge facing the entire field,” according to He. “We now have a much better tool to make those connections.”
In the paper, the researchers demonstrated cTWAS’ utility by studying the genetics of LDL cholesterol levels. In one example of the improvement cTWAS offers, existing eQTL methods nominated a gene involved in DNA repair, while the new method identified a different variant in the gene targeted by statins. In total, cTWAS identified 35 putative causal genes for LDL cholesterol, more than half of which have not been previously reported.
The cTWAS software is now available to download from He’s lab website. The next steps are to extend its capabilities to incorporate other types of omics data, such as splicing and epigenetics, and to use eQTLs from multiple tissue types.