Results provide new insights into how genomic variation affects phenotypes and gene regulation.
Scientists report on the genome sequencing, analysis, and comparison of 17 strains of laboratory mouse. Their work covered nine classical laboratory strains, four wild-derived inbred strains, three related 129-strains, and C57BL/6NJ, the strain used by three genome-wide knockout programs.
Reported in two papers published side by side in Nature, the results represent the largest catalogue for any vertebrate model, claims the international team that carried out the project. Studies were headed by scientists at the Wellcome Trust Sanger Institute and the University of Oxford.
The massive sequencing and analysis program generated data that allowed the researchers to use the genomes to compare variations between mouse strains, identify functional variants, explore the phylogenetic history of the laboratory mouse, and examine the functional consequences of allele-specific variation on transcript abundance.
The authors claim the resulting data will significantly reduce the amount of mouse breeding and testing required to identify genes and mutations, as the initial discovery can now be made computationally. The project identified 56.7 million unique SNP sites in the 17 strains, and, importantly, highlighted other types of sequence polymorphism that have previously been difficult to assess on a genome-wide scale: Indels were identified at 8.8 million unique sites, and 0.28 million structural variations were found.
Of the total number of SNP loci identified, 0.12 million were sited in protein-coding sequences and led to amino acid changes (nonsynonymous substitutions), and 0.26 million were associated with synonymous substitutions. Some functional variants previously reported in one strain were also found, for the first time, in other strains.
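To put the reported counts in proportion, the arithmetic can be sketched in a few lines of Python. This is only an illustration built on the rounded figures quoted above; the variable names are hypothetical, and treating the nonsynonymous and synonymous counts together as "coding SNPs" is an assumption made for the example, not a definition taken from the papers.

```python
# Rounded counts as quoted in the article (unique sites across the 17 strains).
REPORTED_SITES = {
    "snp": 56_700_000,
    "indel": 8_800_000,
    "structural_variant": 280_000,
}

# SNPs in protein-coding sequence, as quoted in the article.
NONSYNONYMOUS_SNPS = 120_000
SYNONYMOUS_SNPS = 260_000

# Total variant sites across the three reported classes.
total_sites = sum(REPORTED_SITES.values())

# Share of each variant class among all reported variant sites.
variant_shares = {
    name: count / total_sites for name, count in REPORTED_SITES.items()
}

# Assumption for this sketch: "coding SNPs" = nonsynonymous + synonymous.
coding_snps = NONSYNONYMOUS_SNPS + SYNONYMOUS_SNPS
coding_percent_of_snps = 100 * coding_snps / REPORTED_SITES["snp"]
nonsynonymous_fraction = NONSYNONYMOUS_SNPS / coding_snps

if __name__ == "__main__":
    for name, fraction in variant_shares.items():
        print(f"{name}: {fraction:.1%} of reported variant sites")
    print(f"coding SNPs: {coding_percent_of_snps:.2f}% of all SNP sites")
    print(f"nonsynonymous share of coding SNPs: {nonsynonymous_fraction:.1%}")
```

On these rounded figures, SNPs make up the large majority of the catalogued sites, while well under 1% of SNP sites fall into the two coding categories quoted.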
Overall, the work suggested that at least 12% of transcripts show a significant tissue-specific expression bias, the researchers add. The scale of the data meant the researchers could also demonstrate that the molecular nature of sequence variants and their position relative to genes can provide clues as to whether they are likely to be functional or not.
Using a statistical method to predict whether the allelic pattern of a variant is consistent with its action as the molecular cause of a quantitative trait locus (QTL), the team found that functional variants at small-effect QTLs are significantly more likely to be intergenic and less likely to be structural variants. On the other hand, functional variants at large-effect QTLs are significantly less likely to be intergenic and more likely to be intronic.
“We now know where the variants are, so the questions today are what do they do, and can we explain the phenotypic differences between different strains of mice?” concludes the Sanger Institute’s Thomas Keane, Ph.D., lead author of the paper titled “Mouse genomic variation and its effect on phenotypes and gene regulation.”
“In some cases it has taken 40 years—an entire working life—to pin down a gene in a mouse model that is associated with human disease. Now with our catalogue of variants, the analysis of these mice is breathtakingly fast and can be completed in the time it takes to make a cup of coffee.”
The second Nature paper, titled “Sequence-based characterization of structural variation in the mouse genome,” describes the results of analyses focused on identifying structural variants (SVs) in genes and their impact on phenotype. Overall, the work identified 711,920 SVs at 281,243 sites in the genomes of the 13 classical and four wild-derived inbred mouse strains.
The majority of these were less than 1 kb in size, and 98% were deletions or insertions. Surprisingly, however, SVs were found to be less likely than other sequence variants to cause gene expression or quantitative phenotypic variation, the researchers claim.
This conclusion was based on a number of observations: SVs overlapping a gene accounted for less than 10% of variation in gene expression (three to four times less than that found by studies using expression arrays); SVs overlapping exons are rare; and estimates extrapolated from the identified SVs that do delete exons suggested that there are only about 50 SVs that directly overlap exons, or about 0.2% of the total burden of SVs in the genome.
Nevertheless, the researchers add, even though SVs appear to make a relatively small contribution to the total amount of quantitative phenotypic variation, at a small number of QTLs they are the cause of variation, and larger-effect QTLs are more likely to arise from SVs. Of 24 SVs that were found to affect coding exons (including six that encompassed a gene in its entirety), five were already known, but the other 19 were completely novel. A third of the affected genes are involved in immunity and infection. “Despite their relative rarity in the mouse genome, SVs that cause phenotype change are likely to provide biological insights out of proportion to their relative small contribution to phenotypic variance,” the authors claim. “We expect that the alleles we have described will provide a starting point for investigating the relationship between phenotype and genotype in mice.”
Jonathan Flint, Ph.D., at the Wellcome Trust Centre for Human Genetics, who co-led the SV study, concluded, “This study is the first step in a long path that moves from understanding what the gene is to what it does.”