
Feature Articles : Jan 1, 2011 (Vol. 31, No. 1)

Adapting Protein Expression for HT

The Best Genes, Well-Tailored Systems, and Ideal Growth Conditions Essential to Process
  • Josh P. Roberts

The three most important things when expressing protein for high-throughput applications are: optimize, optimize, optimize. Put the best genes you can into your expression system, use the most fitting expression systems, and make sure the growth conditions are suited to achieve the most from those systems.

Whether it’s drug screening, crystallography, NMR, mass spectrometry, or binding and toxicity studies, the earliest decisions about choice of template and how to express it can have ripple effects throughout a project, points out Frank Schäfer, Ph.D., associate director of R&D, head of DNA and protein sciences at Qiagen.

One problem researchers frequently encounter when trying to express human proteins in bacterial cells is that the organisms use the genetic code differently. True, the genetic code is essentially universal in that the same three-nucleotide codon always signifies the same amino acid. But there is ambiguity in the other direction: most of the 20 naturally occurring amino acids can be encoded by more than one of the 61 amino acid-specifying codons, and different organisms prefer different spellings of those amino acids.
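The degeneracy described above can be checked directly. The following is a minimal sketch that builds the standard genetic code and counts how many codons spell each amino acid; the variable names are illustrative and do not come from any tool mentioned in this article.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Drop the stop codons to leave only the codons that specify an amino acid.
sense_codons = {codon: aa for codon, aa in CODON_TABLE.items() if aa != "*"}

# Number of synonymous codons available for each amino acid.
degeneracy = Counter(sense_codons.values())
multi = sorted(aa for aa, n in degeneracy.items() if n > 1)

print(len(sense_codons), "sense codons for", len(degeneracy), "amino acids")
print("amino acids with more than one codon:", len(multi))
```

Only methionine and tryptophan have a single spelling; every other amino acid leaves the gene designer a choice.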

There are even some sequences used by one organism that may be misinterpreted by another. Take, for instance, mammalian intragenic sequences that mimic bacterial ribosome-binding sites.

Organisms display intraspecific codon usage biases as well. Highly expressed proteins are often spelled with a skewed set of otherwise synonymous codons relative to proteins expressed at lower levels. Certain tRNAs are more abundant than others, and some tRNAs may be recharged more rapidly than others, but the two sets are not always the same.

Thus there is plenty of room for improvement even when expressing a native protein in its endogenous environment, let alone in an exogenous system. “It seems that the genes as they have been made by evolution are not optimized for maximum expression,” says Dr. Schäfer, who, along with the others included in this article, will be speaking at CHI’s “PepTalk” to be held in San Diego later this month.

“By optimizing the codon usage and several other parameters, we were very successful in dramatically increasing expression of human genes in human cells,” Dr. Schäfer explains.

An algorithm designed by Geneart (www.geneart.com) allows Qiagen to offer a complete set of plasmids (termed QIAgenes) covering the entire annotated human genome, optimized for expression in either E. coli or human/insect cells. QIAgenes are designed to take into account the host cell’s preferred codon usage as well as mRNA secondary structure and stability considerations, and they contain a 6xHis tag to facilitate purification. QIAgenes plasmids can be used as templates for cell-free protein expression as well.

Rational gene design can increase the chances of successful expression and can yield as much as a 50-fold increase in protein expression. This, in turn, can make a huge contribution to the whole project, notes Dr. Schäfer. “The more protein you have, the easier it is to perform all of the downstream experiments.”

Twist the Lion's Tail

In many (if not most) cases, codon choice seems to be the most trustworthy way of optimizing a gene, though how to choose the best codons for each amino acid is far from clear-cut. Besides tRNA usage, other factors can affect expression as well, such as unwanted termination, splice, and polyadenylation sites, and certain mRNA structural elements. The role that each of these plays in different proteins and in different circumstances, how their contributions should be weighted, and how they affect each other have not yet been definitively determined.

And considering that there can be more than 10^100 different ways to spell a 300 amino acid sequence, examining every possibility is a practical impossibility.
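The size of that search space is easy to estimate: the number of DNA spellings of a protein is the product of the number of synonymous codons at each position. The sketch below assumes the standard genetic code and uses a made-up 300-residue test sequence (the 20 amino acids repeated 15 times), not a real protein.

```python
import math
from collections import Counter

# Standard genetic code, one letter per codon, with codons ordered T, C, A, G
# at each position. "*" marks a stop codon. Counting the letters gives the
# number of synonymous codons for each amino acid.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
DEGENERACY = Counter(aa for aa in AMINO_ACIDS if aa != "*")


def count_encodings(peptide: str) -> int:
    """Return how many distinct coding sequences spell this peptide."""
    return math.prod(DEGENERACY[aa] for aa in peptide)


# Hypothetical 300-residue sequence: each amino acid appears 15 times.
peptide = "ACDEFGHIKLMNPQRSTVWY" * 15
n = count_encodings(peptide)
exponent = len(str(n)) - 1  # order of magnitude of the exact integer

print(f"{len(peptide)} residues, roughly 10^{exponent} possible spellings")
```

For this evenly mixed sequence the count lands well above 10^100; a real protein's figure depends on its composition, since leucine, serine, and arginine each contribute six options while methionine and tryptophan contribute one.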

Mark Welch, Ph.D., director of gene design at DNA2.0 (www.dna20.com), believes that many of the algorithms used to determine optimum codon usage are based on assumptions derived from very limited datasets and uncorroborated evidence. Dr. Welch and his colleagues set out to determine what factors have the most effect on protein expression. They systematically varied design parameters in constructing genes for a number of diverse proteins and determined their impacts on expression in various host organisms.

Among their findings were that overall codon usage frequencies could explain most of the variation in all systems tested, he says. tRNA concentrations were pretty telling as well. “The assumption was that you want to use only those with the highest concentration, yet it never made sense that that would necessarily be the case, particularly for heterologous overexpression,” Dr. Welch explains. At least in E. coli, it turns out, a distributed use of tRNAs with a bias toward those that remain most highly charged during amino acid starvation is preferred.

While these findings go a long way toward predicting the most efficacious string of As, Ts, Gs, and Cs, algorithms are only as good as the data used to train them, and accounting for all protein, growth-condition, and host-specific nuances is a challenge. DNA2.0 is continually refining the algorithms it works with and broadening their applicability to new expression systems. According to Dr. Welch, “the most important thing is that we determine what is optimal experimentally.”

DNA2.0’s strategy is to diversify genes around a particular codon usage bias (e.g., host genomic bias) and then experimentally determine their performance (e.g., expression level, total activity, etc.). The company then uses the data to “evolve the gene population” for further testing, he explains. “It’s not yet predictable from rationalizations people used in the past. If you don’t approach it this way, you won’t know that you’re near optimal.”
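The first step of such a strategy, generating a population of synonymous genes scattered around a codon usage bias, can be sketched as follows. This is not DNA2.0's algorithm, which the article does not describe in detail; it is a simple weighted random sampler, and the weights in `HYPOTHETICAL_BIAS` are made-up placeholders rather than measured usage frequencies for any host.

```python
import random
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Made-up weights for illustration only. Codons not listed get weight 1.0.
HYPOTHETICAL_BIAS = {"CTG": 5.0, "GCG": 3.0, "AAA": 3.0}


def synonymous_codons(aa: str) -> list[str]:
    """All codons that encode the given amino acid."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def sample_variant(peptide: str, bias: dict[str, float], rng: random.Random) -> str:
    """Pick one codon per residue, weighted by the supplied bias."""
    chosen = []
    for aa in peptide:
        options = synonymous_codons(aa)
        weights = [bias.get(codon, 1.0) for codon in options]
        chosen.append(rng.choices(options, weights=weights, k=1)[0])
    return "".join(chosen)


def translate(dna: str) -> str:
    """Translate a coding sequence back to its peptide."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


rng = random.Random(0)  # fixed seed so the sampled population is reproducible
peptide = "MKLAVLK"      # short hypothetical peptide
variants = [sample_variant(peptide, HYPOTHETICAL_BIAS, rng) for _ in range(5)]

for dna in variants:
    print(dna, translate(dna))
```

Every variant encodes the same peptide, so differences in measured expression can be attributed to the DNA spelling. In the approach described above, those measurements would then feed back into the design of the next round of variants.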

Methylotrophic Yeast

Another empirical strategy starts with a contract research organization like VTU Technology taking a gene that a company wants expressed in high quantities as a secreted protein. The firm then designs a synthetic gene that fits the Pichia pastoris codon usage table, along with other parameters. The gene is then pool-cloned into the methylotrophic yeast, and transformants are screened in 96-well deep-well plates.

VTU has a library of variants of the P. pastoris alcohol oxidase 1 (AOX-1) promoter, which the company’s head of R&D, Roland Weis, Ph.D., thinks is probably the strongest native promoter known in the microbial world. The library was created by Professor Anton Glieder from Graz University of Technology and his colleagues by transcription factor binding site mining, producing variants that allow for a variety of expression patterns. These can be transformed into a set of hosts that, for example, co-express various auxiliary proteins.

The key is that different proteins have different ideal expression requirements. Some proteins give the best yield when they begin to be expressed earlier than they would with the wild-type AOX-1 promoter, while other genes need to be kept silent longer, explains Dr. Weis.

“There is often no real rationale behind it, and that means you have to screen different promoter types to see which type best fits the needs of the gene.”

Using this combinatorial approach, VTU can produce and screen thousands of clones in microscale. The best two or three clones are put directly into a small fermentor. “The one-liter fermentation nicely reflects what will come out later in higher-volume fermentations,” Dr. Weis says.

A typical yield may be in the range of 5–15 g/L of folded, post-translationally modified protein. “Pichia hardly secretes any other proteins, so there is hardly anything else in the culture supernatant of Pichia fermentation except your protein (if it works well),” he says.

By cutting out intermediary steps like shaker flask cultures, the entire process can reportedly be completed in about six weeks.

For customers who don’t want to deal with the toxicity and safety concerns of using methanol, VTU has created an analogous AOX-1 promoter system driven by glycerol, Dr. Weis notes. These yeast strains are capable of yielding 30–70% of what the best methanol-responsive strains produce, “but always far beyond the titers that you can reach with competitors’ systems.”

Look Mom, No Cells

Sometimes getting buckets of protein isn’t as important as getting the right proteins, fast. When working on an analytical scale, researchers can often bypass cloning, transfecting, and cell culture by using in vitro translation (IVT) systems based on bacterial, insect, or mammalian cell lysates.

With IVT you can pull an aliquot out of the freezer, mix it with template mRNA, and in 90 minutes have a protein that normally takes many days or weeks to produce, explains Peter Bell, Ph.D., director of proteomics R&D at Thermo Fisher Scientific (www.thermo.com). “E. coli, insect, and rabbit systems have been getting progressively closer to making human-like proteins, but they have never been perfect, and the yield and functionality of protein has always been variable.”

The Thermo Scientific Pierce In Vitro Glycoprotein Expression Kit and In Vitro Protein Expression Kit, introduced about a year ago, are the first to be based on human lysates, Dr. Bell says. “The systems provide human post-translational modifications with a higher yield than some of the competing options,” he adds. To date these have been used to produce many important classes of protein including membrane proteins and kinases.

The benefits of IVT systems include more than just speed. Proteins that would be toxic to a host cell can be produced in vitro, for example, as can those that might otherwise end up in inclusion bodies and be difficult to purify. IVT may also be used to easily incorporate radioactive or otherwise “unnatural” amino acids, which Dr. Bell says Thermo is currently looking into. As a result of ongoing development efforts, he also expects to launch updated products this calendar year that provide a 10-fold improvement in protein yield.