September 15, 2005 (Vol. 25, No. 16)

Kevin Ahern

Key Trend Focuses on Analyzing Microarray Data

It's time once again for my annual survey of molecular biology software. Since the last report, a few new packages have appeared and are peppered in the list of products below. A few more have disappeared from sight and still others are badly in need of updating, but the same could be said in almost any given year.

The tendency toward consolidating functions into integrated programs has slowed, though not stopped entirely. If there is a trend in the field, it is the design of software for analyzing microarray data.

Microarrays have provided very fertile programming problems, with software products that assist in almost every aspect of array-related research, from design to visualization to statistical analysis to integration of information to system-wide understanding.

Integrated Products

The category of integrated software products has grown in number with a couple of new additions (DNAssist and Sequence Manipulation Suite). Despite inroads being made by the open source movement, commercial vendors appear to be holding on, at least as measured by their numbers.

The past couple of years have witnessed one commercial product resurrected from the dead (DNASIS), and one collection suffering from neglect (ChromaTool, GeneTool, and PepTool), as well as expansion (LaserGene), and new ownership (Vector NTI Suite, acquired by Invitrogen).

MacVector users rejoiced in the release of a new version (8.0) designed to take advantage of OS X, breathing life into Bill Kraus' original standard that keeps plugging along.

Sequence Display/Manipulation/Editing

The category of sequence display/editing/manipulation has expanded considerably in recent years as demands peripheral to simple sequence editing have arisen from genome projects. The first is the desire of users to format sequences with color, annotations, and other information. A few products meet those needs (biOpen, CINEMA, ElDorado, STING).

The second is the development of software to read automated sequence information (4Peaks, ABI View, PHRED, TraceTuner). The third is for assembly of contigs generated by sequencing projects. Offerings here include CAP3, iCE, MERGER, MIRA2, Paracel Genome Assembler, Paracel Transcript Assembler, and Sequencher.

A few tools for sequence manipulation, such as translating, reverse complementing, and inverting are still out there (Reverse Complement), but this subcategory serves as little more than a reminder of computational problems of long ago.
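The manipulations named above are simple enough to sketch in a few lines. The following is a minimal illustration in Python, not the code of any listed product; the function names are hypothetical, and only the four unambiguous DNA bases are handled.

```python
# Complement table for the four unambiguous DNA bases (upper and lower case).
# Ambiguity codes such as N or R are left unchanged by str.translate.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Return the base-by-base complement, in the same orientation."""
    return seq.translate(_COMPLEMENT)


def invert(seq: str) -> str:
    """Return the sequence written backward, without complementing."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return invert(complement(seq))
```

For example, `reverse_complement("ATGCCG")` gives `CGGCAT`.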

Restriction Enzyme Related

Restriction enzyme related software is another category whose original usefulness has paled in comparison to bigger project demands and has expanded to meet new needs. Though identification of restriction sites from sequences is still important for vector construction, online tools (In Silico Restriction Cutting, NEB Cutter, WatCut), freebies (EnzymeX), and integrated products mentioned above satisfy these needs well.
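Identifying restriction sites comes down to scanning a sequence for short recognition motifs. The sketch below assumes a linear sequence and a small, hand-picked table of three well-known enzymes; the helper names are hypothetical, and the positions reported are where each recognition site begins, not where the enzyme cuts.

```python
# Recognition sequences for three common enzymes. All three are palindromic,
# so scanning one strand is enough to find every site.
SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def find_sites(seq: str, site: str) -> list[int]:
    """Return 0-based start positions of every occurrence of site in seq."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # Step forward by one base so overlapping occurrences are not missed.
        start = seq.find(site, start + 1)
    return positions


def map_sites(seq: str, enzymes: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return a mapping of enzyme name to the site positions found in seq."""
    return {name: find_sites(seq, site) for name, site in enzymes.items()}
```

Calling `map_sites("AAGAATTCTTGGATCCAA")` reports one EcoRI site starting at position 2, one BamHI site at position 10, and no HindIII sites. A circular plasmid would also need a check for sites spanning the origin, which this sketch omits.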

Researchers' demands for publication-quality graphic products have fueled development of plasmid mapping software (Gene Construction Kit, NetPlasmid, PDRAW32, REMAP, SimVector, and Visual Cloning) and it is this function that has seen considerable development.

Translation

Numerous tools are available for simple identification and translation of protein coding sequences in DNA. Notably, all of the products listed in this category are noncommercial, reflecting the fact that commercial software developers have looked to more complicated translation problems, such as gene identification in eukaryotes (genomics), as posing more interesting problems and having more potential for development.
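Simple translation of this kind is essentially a table lookup. The following sketch assumes the standard genetic code and a hypothetical pair of helpers; it reads a single forward frame and stops at the first stop codon, which is far less than the gene-finding tools mentioned above attempt.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
# "*" marks a stop codon.
_BASES = "TCAG"
_AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate dna from the given frame offset until the first stop codon.

    Codons containing anything other than A, C, G, or T become "X".
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def first_orf(dna: str) -> str:
    """Translate from the first ATG found; return "" if there is none."""
    start = dna.upper().find("ATG")
    return translate(dna[start:]) if start != -1 else ""
```

For example, `translate("ATGGCCTGA")` gives `MA`, and `first_orf("CCATGGGTTAAGG")` gives `MG`.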

Sequence Searching

Sequence searching, as a category, has a considerable hodge-podge of offerings. In some ways, this category overlaps with sequence editors. At least one integrated software sequence editor (MacVector) has stood out, providing users with the simplest interface for accessing Entrez. It is hardly alone, however, in providing database access.

Baylor College of Medicine's SearchLauncher (listed as BLAST) is one of the best, but there are zillions of others not listed here for space reasons.

Other automated products include ACT, DARWIN, DNASIS GeneIndex, Entrez Cross Database Searcher, PatternHunter, Seqware DataCenter, and StarBlast.

Multiple sequence alignment programs are important tools for making sense of retrieved information. Notable products include ClustalW Format Conversion, COGS, JOY, Multiple Sequence Alignments, MVIEW, Pairwise Sequence Alignments, and Tmap.

Specialized search programs round out this category with offerings targeted to answering narrowly defined questions.

Phylogenetics

DNA sequence information combined with computer analysis revolutionized phylogenetics. Products in this category provide general phylogenetic information (Tree of Life), genome alignments (MGA), and phylogenetic trees (ATV, fastDNAml, FootPrinter, PAUP, Phydbac, Phylip, Phylodendron, QuickTree, TNT, Treefinder, TREE-Puzzle, Treeview).

In many cases, phylogenetic tree information is no longer the end point of an analysis, but is instead a tool for further insights into problems of DNA, RNA, and protein sequence.

PCR/Amplification/Analysis

Yesteryear's darling of developers, PCR-related software has expanded far from predicting ideal primers for amplification (iOligo, NetPrimer, OLIGO, Oligo 2002, Oligo Calculator, Primer3, Primer Premier, Primer Design Assistant, PrimerQuest, PRIMO, Probemer, WebPrimer) to designing primers for specialized needs, such as exon amplification (ExonPrimer), real time PCR (AutoPrime), quantitative PCR (Beacon Designer, Gene-Quantification), arrays (GenomePRIDE, OligoArray, OligoDesign, Promide, Xpression Primer), multiplex PCR (In Silico Multiplex PCR), RFLP analysis (In Silico RFLPs), and methylation PCR (MethPrimer).

Miscellaneous tools assist researchers in PCR simulation (Amplify 3, e-PCR), calculations (Biopolymer Calculator, Oligo Calculator, PCR Box Titration), mutagenesis (PrimerGenerator, PrimerX), and general organization (OligoMaster).
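As an illustration of the kind of calculation such tools perform, the sketch below estimates GC content and a rough melting temperature for a short primer. It uses the Wallace rule (2 degrees C per A or T, 4 degrees C per G or C), a textbook approximation for short oligos; this is an assumption chosen for simplicity and not necessarily the method used by any product named here. The function names are hypothetical.

```python
def gc_content(oligo: str) -> float:
    """Return the fraction of bases in oligo that are G or C."""
    oligo = oligo.upper()
    if not oligo:
        raise ValueError("oligo must not be empty")
    return (oligo.count("G") + oligo.count("C")) / len(oligo)


def wallace_tm(oligo: str) -> int:
    """Estimate melting temperature in degrees C with the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). Reasonable only as a rough guide
    for short oligos; bases other than A, C, G, T are ignored.
    """
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc
```

For the 8-mer `ATGCATGC`, `gc_content` returns 0.5 and `wallace_tm` returns 24. Dedicated primer design programs typically account for salt concentration and neighboring-base effects, which this estimate leaves out.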

Genomics

Genomics is a broad software category with several useful utilities. Software in this category provides at the genome level some of the same types of functionality that early software products provided for plasmid researchers.

Witness to this are tools for basic browsing (Argo Genome Browser, ENSEMBL, GenePalette, Human Genome Browser), as well as comparison/analysis (Cross Genome Analysis, Evolution Highway, GALA, GANESH, Genalysis, GeneMachine, GeneMark, GeneSplicer, GenomeScan).

Specialized products provide analysis and/or information related to gene expression (DecisionSite DoTS, euGenes, MBGD) or drug discovery (GeneScape Portal).

RNA Analysis

One of the most underrated and underappreciated sets of analyses in molecular biology is that of RNA secondary structure. Replacement of T by U in RNA complicates the base-pairing rules for secondary structure because U can form reasonably stable structures with G.
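The effect of G-U pairing on the rules can be shown with a small sketch. The code below is a toy illustration with hypothetical names, not a folding algorithm: it only checks whether two RNA segments could form a fully paired antiparallel stem, with or without G-U wobble pairs allowed.

```python
# Canonical Watson-Crick pairs in RNA, plus the G-U wobble pair.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(a: str, b: str, allow_wobble: bool = True) -> bool:
    """Return True if RNA bases a and b can pair under the chosen rules."""
    pair = (a.upper(), b.upper())
    return pair in WATSON_CRICK or (allow_wobble and pair in WOBBLE)


def forms_stem(strand5: str, strand3: str, allow_wobble: bool = True) -> bool:
    """Check whether two segments, both written 5' to 3', pair end to end.

    The segments pair antiparallel, so the first base of strand5 is matched
    with the last base of strand3, and so on inward.
    """
    if len(strand5) != len(strand3):
        return False
    return all(
        can_pair(a, b, allow_wobble) for a, b in zip(strand5, reversed(strand3))
    )
```

The segments `GGGA` and `UCUC` form a stem only when wobble pairs are allowed, because the second pairing is G with U. Each U therefore has two possible partners, which enlarges the set of structures a folding program has to consider.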

The recent characterization of interfering RNAs (iRNAs) and their self-complementary structures has focused considerable attention on this field.

Products specific to RNA interference include BlockIt, iRNAChek, iRNAi, RNAhybrid, RNAi design, and SIRNA. SCOR provides a general structural classification of RNA. Other tools are important for identifying RNA motifs (Motif) and folds (CombFold, Mfold, PairFold, Pfold, RNAalifold, RNAfold).

Protein Structure/Imaging/Classification

Programs for analysis, imaging, and classification of proteins are far more abundant than those for nucleic acids, reflecting the diversity of structure and function of these molecules.

Included in this category are packages for searching databases (Chou-Fasman, COMBSearch, Molecules To Go), structure visualization (Chimera, HelixWheel, HyperChem, MDTOOLS, MolScript, Motif3D, Protein Explorer, PROView, Visual Molecular Dynamics, WebMol), and predicting the sites of proteolysis (CUTTER, PeptideCutter, SignalP).

Tools for predicting structural or functional domains of proteins are most abundant. These include general structure predictors/collections (ExPASy Proteomics Tools, META II, Mini-Pedant, Pfam, PRINTS, PROSPECT Pro, RAPTOR, SCOP, SOSUI, SYSTERS, TIGRFAMS, UNIPROT), fold predictors (3Matrix, FFAS03, GeneSilico), topology predictors (BPROMPT, TopPred 2, TOPS), transmembrane predictors (DAS), post-translational modification prediction (FindMod), subcellular localization prediction (LOC3d, Predotar, Psort, TargetP), phosphorylation site prediction (NetPhos, SCANSITE), mutation effects (MutaProt, Protein Mutant Database), secondary structure prediction (PsiCSI, Secondary Structure Prediction), protein side chain interactions prediction (SCWRL), and transmembrane predictions (Tbbpred, Tmpred, Transmembrane Helix Benchmark).

Other categories include motif analysis (3Motif, Gibbs Motif Sampler, Motif Analysis Workbench, Motif Scan, ParSeq), domain analysis (Biozon, ProDom, PROSITE), functional site analysis (ELM), epitope binding affinity (MHCPred), myristoylation site prediction (Myristoylator), stabilization center prediction (Stabilization Centers in Proteins), tyrosine sulfation sites (Sulfinator), and protein-protein interaction (Protein Networker).

Mass Spectral Analysis

Software for mass spectral analysis of proteins has been another rich area of development in recent years. Products include MASCOT, PEAKS Batch, PEAKS Studio, PEAKS Viewer, PepMapper, PeptideSearch, PeptIdent, Phenyx, Protein Prospector, and PROWL.

Applications for using structure as a tool for searching/alignment (K2SA, MATRAS) round out this very rich set of software products.

Besides mass spectrometry, another important technique for high-throughput analysis in molecular biology labs is 2-D gel electrophoresis. Numerous products have been developed to help researchers find, identify, and compare gel patterns.

Interestingly, most of the products in this category are commercial. They include Delta2D, Dymension, GELLAB II, ImageMaster 2D, Phoretix 2D, Phoretix 2D Evolution, and Phoretix 2D Expression.

Noncommercial products are a bit more varied in nature, offering virtual 2-D gels (JvirGel), 2-D gel comparisons (NCIFlicker), and Web analyses (WebGel). 1-D gel products (1Dscan EX, Phoretix 1D Range) complete this coverage.

Microarray Tools

As noted above, microarrays have provided the most robust area of recent molecular biology software development. Tools for designing arrays (Array Designer, ArrayMaker, OligoPicker) provide assistance with planning and managing this task. Visualization/imaging tools include ArrayStar, ArrayVision, BRB Array Tools, ImaGene, and Phoretix Array.

Interpreting the underlying meaning in the colors and intensities of spots on an array is the domain of analysis tools. These include basic analysis products (Amiada, Array Pro Analyzer, Array Miner, GeneSight, GeneSpring, MicroArray Explorer, QuantArray, Rosetta Resolver, Vector Xpression) as well as products that focus more on statistical analyses (ArrayStat, BRB Array Tools, ExpressionSieve, SAM).

Integrating/synthesizing information is another important step in interpreting array data and software provides help in this area as well (Interaction Explorer, OligoDB, SilicoCyte, Systems Biology Workbench, Vector PathBlazer).

Miscellany and Compilations

The last two categories of information here include a potpourri of products (Miscellaneous) that were too important to ignore but didn't fit neatly into any other listing, and a set of websites (Compilations) with links to software products too vast to be adequately covered here.

The latter illustrates that molecular biological software is now such a vast field that even a lengthy article like this one can't cover it all.
