January 1, 2005 (Vol. 25, No. 1)


Strong Points: Nice design
Weak Points: None

Consider the number of organisms on earth, the number of genes they have in common, and the number of databases holding sequence information for each. Each of these numbers is large. A search for the DNA polymerase of Bacillus subtilis might turn up numerous identical sequences scattered across different databases. How does one cut through the noise and find the desired protein sequence information? As you might imagine, there is no single answer to the question, but one very good approach is taken by PIR-NREF (a twisted acronym for PIR Non-Redundant Protein REFerence Databases), which groups identical sequences in the PIR-PSD, SwissProt, TrEMBL, RefSeq, GenPept, and PDB databases into individual records. Searches by species and by FASTA, BLAST, and other algorithms are supported, with results neatly and nonredundantly organized.
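
To make the idea of grouping identical sequences concrete, here is a minimal Python sketch. It is not PIR-NREF's actual method, which the site does not spell out here. The function name group_identical, the accessions, and the sequences are all made up for illustration, and the sketch assumes that "identical" means an exact match after normalizing case and whitespace.

```python
from collections import defaultdict


def group_identical(entries):
    """Group (database, accession, sequence) entries by exact sequence.

    Hypothetical helper for illustration; not part of any PIR tool.
    """
    groups = defaultdict(list)
    for database, accession, sequence in entries:
        # Normalize case and whitespace so trivially different spellings match.
        key = "".join(sequence.split()).upper()
        groups[key].append((database, accession))
    return dict(groups)


# Made-up entries: the accessions and sequences are not real records.
entries = [
    ("SwissProt", "ACC001", "MKTAYIAKQR"),
    ("GenPept", "ACC002", "mktayiakqr"),
    ("PDB", "ACC003", "MSLLTEVETP"),
]

records = group_identical(entries)
for sequence, sources in records.items():
    print(sequence, sources)
```

Each key in records stands for one nonredundant record, and its value lists every database entry that carries that sequence. A real resource would likely also take the source organism into account, which this sketch leaves out.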