The genomes of children affected by autism spectrum disorders (ASD) harbor significantly more damaging tandem repeat mutations that are not present in their parents’ genomes, a new study reports. Tandem repeats (TRs) are sequences of two or more DNA base pairs repeated end to end on a chromosome.
The new study, titled “Genome-wide patterns of de novo tandem repeat mutations and their contribution to autism spectrum disorders” and published in Nature, highlights the contribution of these understudied mutations to autism.
“Few researchers really study these repetitive regions because they’re generally non-coding—they do not make proteins, their function is unclear, and they can be difficult to analyze,” says Melissa Gymrek, assistant professor in the University of California (UC) San Diego Department of Computer Science and Engineering and School of Medicine. “However, my lab has found these tandem repeats can influence gene expression, as well as the likelihood of developing certain conditions such as ASD.”
The study is based on 1,600 families, each of which includes a mother, a father, a neurotypical child and a child with ASD. The authors focus on de novo mutations, which appear in the children but not the parents. Led by Ileena Mitra, a graduate student at UC San Diego, the study identifies an average of 50 de novo mutations at tandem repeats in each child, regardless of whether the child was affected by autism.
Children with ASD showed a slight but statistically significant increase in the number of de novo TR mutations compared with their neurotypical siblings. Using a novel algorithm developed by UC San Diego bioengineering undergraduate and second author Bonnie Huang, the researchers showed that the TR mutations predicted to be the most evolutionarily deleterious are found at higher rates in children with ASD.
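To make the definition of a de novo mutation concrete, here is a minimal sketch of the basic trio check, assuming each genotype is simply a pair of repeat copy numbers. The function names and example genotypes are hypothetical, and the check ignores genotyping error, so it illustrates the idea rather than the study's actual pipeline.

```python
from typing import Tuple

# Hypothetical representation: repeat copy number on each of the two chromosomes.
Genotype = Tuple[int, int]


def is_mendelian(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if one child allele can come from the mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def is_de_novo_candidate(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """Flag a trio whose child genotype cannot be explained by inheritance alone."""
    return not is_mendelian(child, mother, father)


# Example trio (made-up copy numbers): the child's 13-copy allele is in neither parent.
print(is_de_novo_candidate((10, 13), (10, 11), (12, 12)))
print(is_de_novo_candidate((10, 12), (10, 11), (12, 12)))
```

In practice a mismatch like this can also come from a genotyping error in any of the three samples, which is why a candidate flagged this way would still need statistical support.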
“In our initial analysis, the ratio between the number of mutations in ASD children and neurotypical children was around 1.03, so barely above one,” said Gymrek. “However, after we applied Bonnie’s tool, we found relative risk increased about two-and-a-half fold. The kids with autism had more severe mutations compared to the controls.”
The authors note that de novo TR mutations in children with ASD tend to be enriched in fetal brain regulatory regions, that is, genomic regions predicted by epigenomics data to be promoters or enhancers in fetal brain samples.
The study also underscores factors that can influence the frequency of de novo TR mutations. Children with older fathers had more de novo TR mutations, quite possibly because sperm-producing cells continue to divide, and accumulate mutations, throughout a man’s lifetime. The changes in repeat length originating in the ova were often larger than those from the sperm, although the reasons for this are unclear.
“The mutations from dad tended to be plus or minus one copy,” said Gymrek. “However, mutations from mom were usually plus or minus two or more copies, so we’d see more dramatic events when they came from the mother.”
Studying TRs has presented multiple technical challenges. “Overcoming these challenges is a major focus of our lab,” says Gymrek.
It can be difficult to count the number of repeats an individual has when the TR is long relative to the sequence read length in the next-generation sequencing data. The process of sample preparation or sequencing can introduce repeat artifacts or “stutter” errors that confound the counting of TRs. Also, sequences are often poorly aligned at repetitive regions.
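The read-length problem can be sketched in a few lines. The example below assumes that a read can only give a direct repeat count if it covers the whole repeat plus a few flanking bases on each side. The function names, the 0-based half-open coordinates, the 5-base flank and the 150-base read length are all illustrative assumptions, not settings taken from the study.

```python
def read_spans_repeat(read_start: int, read_end: int,
                      repeat_start: int, repeat_end: int,
                      min_flank: int = 5) -> bool:
    """True if the read covers the whole repeat plus `min_flank` bases on both sides.

    Coordinates are 0-based and half-open: [start, end).
    """
    return (read_start <= repeat_start - min_flank
            and read_end >= repeat_end + min_flank)


def max_countable_copies(read_length: int, motif_length: int,
                         min_flank: int = 5) -> int:
    """Largest repeat count a single read could fully span under this rule."""
    usable = read_length - 2 * min_flank
    return max(usable // motif_length, 0)


# A hypothetical 150-base read and a 3-base motif:
print(max_countable_copies(read_length=150, motif_length=3))
# A read covering positions 100-250 spans a repeat at 120-180 but not one at 120-248.
print(read_spans_repeat(100, 250, 120, 180))
print(read_spans_repeat(100, 250, 120, 248))
```

Repeats longer than this limit cannot be counted directly from any single read under this simple rule, which is one reason long TRs are harder to genotype.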
“We and others have developed algorithms to overcome these challenges and analyze tandem repeat regions from NGS,” says Gymrek.
“Because analyzing these regions is difficult and error-prone in the first place, it is even more challenging to accurately identify regions with mutations,” says Gymrek. Eliminating false positives using a bioinformatic tool created by the team, called MonSTR, helps identify cases where children have true variations in TR copy number compared to their parents’ DNA.
“MonSTR, which builds on a method called HipSTR developed by the Erlich Lab, treats individual genotypes probabilistically, which allows us to quantify uncertainty when inferring mutations. It also allows us to apply detailed filters to remove mutation calls that are unreliable,” says Gymrek.
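One way to picture what treating genotypes probabilistically means is the sketch below. It is not MonSTR's actual model. It assumes each family member comes with a hypothetical table of genotype probabilities, treats the three tables as independent, and adds up the probability of every combination that inheritance alone cannot explain. All names and numbers are illustrative.

```python
from itertools import product
from typing import Dict, Tuple

Genotype = Tuple[int, int]               # repeat copy numbers on the two chromosomes
GenotypeProbs = Dict[Genotype, float]    # hypothetical per-sample genotype probabilities


def is_mendelian(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if one child allele can come from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def prob_not_inherited(child: GenotypeProbs, mother: GenotypeProbs,
                       father: GenotypeProbs) -> float:
    """Probability that the trio's genotypes are inconsistent with plain inheritance.

    Simplifying assumption: the three samples' genotype probabilities are independent.
    """
    p_consistent = 0.0
    for (c, pc), (m, pm), (f, pf) in product(child.items(), mother.items(), father.items()):
        if is_mendelian(c, m, f):
            p_consistent += pc * pm * pf
    return 1.0 - p_consistent


# Made-up example: the child is probably (10, 13), but (10, 12) cannot be ruled out.
child = {(10, 13): 0.9, (10, 12): 0.1}
mother = {(10, 11): 1.0}
father = {(12, 12): 1.0}
print(round(prob_not_inherited(child, mother, father), 3))
```

A score like this makes the uncertainty explicit: a call backed by a confident child genotype scores near one, while an ambiguous genotype pulls the score down and can be filtered out.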
Interpreting the significance of detected TR mutations poses a critical challenge as well. “We found that most of these tandem repeat mutations fell into regions of the DNA that do not encode for proteins. Even for ones falling in protein-coding regions, they typically result in ‘in-frame’ changes for which the interpretation is also unclear,” says Gymrek.
A method called “SISTR,” developed in collaboration with Kirk Lohmueller, PhD, at UC Los Angeles, a co-senior author on the study, and led by Huang, offered a major breakthrough for the study. SISTR analyzes variation at a particular short tandem repeat (STR) in a healthy population and compares it to expectations based on how STRs mutate over time.
“If we expect an STR to have a lot of variation, but actually see very little variation, that is a clue that there is some evolutionary force keeping the STR at a particular length. Bonnie was able to build models that use this information to measure a ‘selection’ score for any particular mutation. Even though these estimates are noisy, they give us some handle to pick out mutations likely to have the most biological impact,” says Gymrek.
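The intuition in that quote, less variation than expected hints at constraint, can be illustrated with a crude stand-in. The sketch below is not SISTR. It measures observed variation as heterozygosity (one minus the sum of squared allele frequencies) and compares it with an expected value that is assumed to come from some neutral mutation model. The function names, the allele counts and the expected value of 0.5 are hypothetical.

```python
from typing import Dict


def heterozygosity(allele_counts: Dict[int, float]) -> float:
    """Chance that two randomly drawn alleles differ: 1 - sum of squared frequencies."""
    total = sum(allele_counts.values())
    return 1.0 - sum((count / total) ** 2 for count in allele_counts.values())


def variation_deficit(observed: float, expected: float) -> float:
    """Fraction of the expected variation that is missing (near 0: none, near 1: most)."""
    if expected <= 0:
        raise ValueError("expected variation must be positive")
    return 1.0 - observed / expected


# Made-up population: 98 alleles with 10 copies of the repeat, 2 alleles with 11 copies.
observed = heterozygosity({10: 98, 11: 2})
# Hypothetical expectation under a neutral mutation model for this repeat.
expected = 0.5
print(round(observed, 4), round(variation_deficit(observed, expected), 4))
```

A large deficit at a repeat would mark it as a place where length changes may matter biologically, which matches the noisy but useful signal Gymrek describes.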
This bioinformatic, genome-wide approach highlights several genes that have been previously linked to ASD, as well as new candidates, which the lab is now exploring.
“We want to learn more about what these novel ASD genes are doing,” said Gymrek. “It’s exciting because repeats have so much more variation compared to point mutations. We can learn quite a bit from a single location on the genome.”