Twist the Lion's Tail
In many (if not most) cases, codon choice seems to be the most reliable way of optimizing a gene, though how to choose the best codons for each amino acid remains ambiguous. Besides tRNA usage, other factors can affect expression as well, such as unwanted termination, splice, and polyadenylation sites, and certain mRNA structural elements. The role that each of these plays in different proteins and in different circumstances, how their contributions should be weighted, and how they affect each other have not yet been definitively determined.
And considering that there can be more than 10^100 different ways to spell a 300 amino acid sequence, examining every possibility is a practical impossibility.
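The scale of that search space can be checked with a short calculation. The sketch below multiplies, residue by residue, the number of synonymous codons each amino acid has in the standard genetic code. The function name and the 300-residue test sequence are hypothetical, chosen only for illustration; a real protein with a different composition would give a different count.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code
# (61 sense codons in total).
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def count_encodings(protein: str) -> int:
    """Return how many distinct DNA sequences encode the given protein."""
    return math.prod(DEGENERACY[aa] for aa in protein)

# Hypothetical 300-residue sequence: each of the 20 amino acids 15 times.
example_protein = "".join(DEGENERACY) * 15
n_encodings = count_encodings(example_protein)

print(len(example_protein), "residues")
print("exceeds 10^100:", n_encodings > 10**100)
```

Because the count is a product over residues, it grows exponentially with length, which is why exhaustive testing is out of reach even for modest proteins.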
Mark Welch, Ph.D., director of gene design at DNA2.0 (www.dna20.com), believes that many of the algorithms used to determine optimum codon usage are based on assumptions derived from very limited datasets and uncorroborated evidence. Dr. Welch and his colleagues set out to determine what factors have the most effect on protein expression. They systematically varied design parameters in constructing genes for a number of diverse proteins and determined their impacts on expression in various host organisms.
Among their findings was that overall codon usage frequencies could explain most of the variation in all systems tested, he says. tRNA concentrations were informative as well. “The assumption was that you want to use only those with the highest concentration, yet it never made sense that that would necessarily be the case, particularly for heterologous overexpression,” Dr. Welch explains. At least in E. coli, it turns out, a distributed use of tRNAs with a bias toward those that remain most highly charged during amino acid starvation is preferred.
While these findings go a long way toward predicting the most efficacious string of As, Ts, Gs, and Cs, algorithms are only as good as the data used to train them, and accounting for all protein, growth-condition, and host-specific nuances is a challenge. DNA2.0 is continually refining the algorithms it works with and broadening their applicability to new expression systems. According to Dr. Welch, “the most important thing is that we determine what is optimal experimentally.”
DNA2.0’s strategy is to diversify genes around a particular codon usage bias (e.g., host genomic bias) and then experimentally determine their performance (e.g., expression level, total activity, etc.). The company then uses the data to “evolve the gene population” for further testing, he explains. “It’s not yet predictable from rationalizations people used in the past. If you don’t approach it this way, you won’t know that you’re near optimal.”
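To make the idea of diversifying genes around a codon usage bias concrete, here is a minimal sketch that samples synonymous variants of a short peptide, choosing each codon at random with probability proportional to a usage weight. It is not DNA2.0's algorithm. The weights in `USAGE` are made-up placeholders rather than measured host frequencies, and the function names and the peptide `MKLEF` are hypothetical.

```python
import random

# Illustrative placeholder weights only (NOT real host codon frequencies).
# Codon-to-amino-acid assignments follow the standard genetic code.
USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "E": {"GAA": 0.7, "GAG": 0.3},
    "F": {"TTT": 0.5, "TTC": 0.5},
    "L": {"CTG": 0.5, "TTA": 0.1, "TTG": 0.1,
          "CTT": 0.1, "CTC": 0.1, "CTA": 0.1},
}

# Reverse lookup used to confirm that every variant encodes the same protein.
CODON_TO_AA = {codon: aa for aa, codons in USAGE.items() for codon in codons}

def sample_variant(protein, usage, rng):
    """Pick one codon per residue, weighted by the usage table."""
    picked = []
    for aa in protein:
        table = usage[aa]
        codon = rng.choices(list(table), weights=list(table.values()))[0]
        picked.append(codon)
    return "".join(picked)

def sample_variants(protein, usage=USAGE, n=5, seed=0):
    """Return n weighted-random synonymous encodings of the protein."""
    rng = random.Random(seed)  # seeded so the sampled set is reproducible
    return [sample_variant(protein, usage, rng) for _ in range(n)]

def translate(dna):
    """Translate a DNA string back to protein using the table above."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))

variants = sample_variants("MKLEF", n=5)
for v in variants:
    print(v, translate(v))
```

In the workflow described above, each sampled variant would then be synthesized and its expression measured, and those measurements, rather than the starting weights, would guide the next round of designs.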