Mar 15, 2006 (Vol. 26, No. 6)

Cheminformatics for Finding New Drugs

Host of Computational Methods Are Poised to Break the Bottleneck in Drug Discovery

  • Everyone laments that there are only so many hours in a day, therefore only so much that one can accomplish. Scientists understand the need to generate as much data as possible about the project on which they are working, and then race into the next project, the next data set, the next batch of compounds to screen, and the next big drug candidate to send through clinical trials and ultimately, hopefully, to market.

    To get a handle on what technologies are available to break the bottlenecks in the drug discovery field, a few of the presenters at the upcoming CHI Cheminformatics Conference discuss their successes in drug development with respect to cheminformatics and some of the technical challenges they face.

    "Cheminformatics can be identified as the use of computers in chemistry, but that's actually a bit broad and better broken down in two ways," says Stephen Dixon, product manager, ligand-based drug design at Schrodinger. "Chemoinformatics, the predecessor of cheminformatics, can be defined as a general discipline that's concerned with how to best utilize and manage data generated from high-throughput screening (HTS). Cheminformatics is more about the battery of tools required to find relationships in the data. How you define it also depends largely on what technology you use to tackle it."

    Dixon points out that the biggest bottlenecks come from the constraints of space and time. "You are dealing with massive numbers of compounds to screen using computational methods, and you have to answer the question of how you are going to deal with the data. Either you are going to store gigabytes or even terabytes' worth of data, and analyses are going to take that much longer, or, if you decide you don't want to store the data, you will be doing the same computations and screening the same compounds over and over again."

    For example, the active site geometry of a protein complex depends on conformational changes induced by the bound ligand. However, resolving the crystallographic structure of a protein-ligand complex requires a substantial investment of time and is frequently infeasible or impossible.

    Schrodinger's Induced Fit protocol solves this problem by using the company's Glide and Prime technologies to consider all possible binding modes and the associated conformational changes within receptor active sites. This allows chemists to quickly predict active site geometries with minimal expense, even for systems as challenging as hERG homology models.

    Dixon notes that working on large molecules brings its own set of challenges. "If you bring 3-D analysis into the picture and you are dealing with large structures that are highly flexible, you just hope that you're able to get its general shape correct and that the difference between the actual bioactive structure and the 3-D-generated shape doesn't matter that much."

  • Computational Analysis

    "Strictly speaking, we don't do cheminformatics," says Jeff Wiseman, vp and officer, technology and informatics at Locus Pharmaceuticals, "but our goal is to raise cheminformatics to understand which proteins cells bind to. Today, cheminformatics is mining data on millions of compounds, but we still have not reached the diversity in chemical structures that we need for developing new drugs. Our goal is to expand this diversity by orders of magnitude by generating binding data on tens of billions of compounds. Since only 10 million or so drug-like molecules exist in the world today, we will have to reach this goal computationally in order to transcend the limitations of the physical world."

    Locus' de novo fragment-based design technology is the first computational approach to identify high-affinity ligands rapidly and accurately enough for practical use. The algorithms are based on theories of statistical thermodynamics.

    Series of high-affinity compounds generated from the de novo design process are further refined to identify a small set of drug leads. This filtering process includes the computational analysis of each molecule's ADME properties using several standard models and models developed by Locus. Target-specific criteria are then applied to select the compounds that are most drug-like, providing the lead series for synthesis and biological testing. "The trick is to identify the 100 best molecules from a pool of 10 billion," explains Wiseman. "In order to do this, computational methods must rival high-throughput screening in accuracy, and we are finding that this level of accuracy is achievable."
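The filter-then-rank step described above can be sketched in a few lines of Python. This is only an illustration, not Locus' method: the property names (`mol_weight`, `logp`, `score`) and the cutoff values are assumptions chosen for the example, and the scores would come from whatever predictive model is in use. The point of the sketch is that picking the best k items from a very large pool does not require sorting or even storing the pool, because `heapq.nlargest` keeps only k items in memory while it consumes a stream.

```python
import heapq


def top_candidates(scored_molecules, k=100, max_weight=500.0, max_logp=5.0):
    """Return the k highest-scoring molecules that pass simple property filters.

    scored_molecules: any iterable (including a generator) of dicts with the
    hypothetical keys "id", "score", "mol_weight" and "logp".
    The cutoffs are illustrative assumptions, not values from the article.
    """
    # Lazily drop molecules that fail the drug-likeness filters.
    drug_like = (
        m for m in scored_molecules
        if m["mol_weight"] <= max_weight and m["logp"] <= max_logp
    )
    # nlargest holds at most k items at a time, so the pool can be streamed.
    return heapq.nlargest(k, drug_like, key=lambda m: m["score"])


molecules = [
    {"id": "a", "score": 9.1, "mol_weight": 350.0, "logp": 2.1},
    {"id": "b", "score": 9.9, "mol_weight": 720.0, "logp": 3.0},  # too heavy
    {"id": "c", "score": 7.4, "mol_weight": 410.0, "logp": 4.2},
    {"id": "d", "score": 8.0, "mol_weight": 300.0, "logp": 6.5},  # logP too high
]
best = top_candidates(molecules, k=2)
print([m["id"] for m in best])  # ['a', 'c']
```

Because the input can be a generator, the same function could be fed by a loop that scores compounds one at a time instead of a list held in memory.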

  • Fingerprints

    "Fingerprints are useful cheminformatics tools because they can be designed to focus on chemical features or interactions of interest," says Chris Williams, Ph.D., principal scientist, Chemical Computing Group. "Analysis tools developed for one fingerprint, such as similarity searching and clustering, are usually applicable to other fingerprints. Fingerprints also allow the use of convenient mechanisms for combining multiple results into consensus predictions that leverage the strengths of the individual methods."
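To make the idea of fingerprint-independent analysis concrete, here is a minimal Python sketch. It assumes each fingerprint is represented as a set of "on" bit positions and uses the Tanimoto coefficient, a common similarity measure for bit fingerprints. The consensus step shown is a plain average across fingerprint types, which is just one simple way to combine results; the fingerprint names in the example are placeholders.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def consensus_similarity(query, candidate):
    """Average Tanimoto similarity over the fingerprint types both molecules have.

    query, candidate: dicts mapping a fingerprint name to a set of on bits.
    Averaging is an illustrative choice of consensus rule.
    """
    shared_types = sorted(set(query) & set(candidate))
    if not shared_types:
        raise ValueError("no fingerprint type in common")
    total = sum(tanimoto(query[name], candidate[name]) for name in shared_types)
    return total / len(shared_types)


query = {"keys": {1, 2, 3}, "pharmacophore": {10, 11}}
candidate = {"keys": {2, 3, 4}, "pharmacophore": {10, 11}}
print(tanimoto(query["keys"], candidate["keys"]))  # 0.5
print(consensus_similarity(query, candidate))      # 0.75
```

The same `tanimoto` function works unchanged for any fingerprint that can be reduced to a set of bits, which is the reuse the quote describes.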

    At the center of this Montreal-based company's technology is MOE, a fully integrated suite of computational chemistry software that includes molecular modeling, QSAR, protein-ligand docking, and protein bioinformatics applications. A number of fingerprint systems, including 2-, 3- and 4-point pharmacophore fingerprints in 2-D or 3-D and MACCS keys, are also supported. MOE applications are written in Scientific Vector Language (SVL), a high-level, chemistry-aware programming language designed for computational chemistry. SVL source code, provided with the MOE distribution, is easy to modify and can be used to create new applications, according to the company.

    "Since the SVL source code provided in the distribution can be easily customized, it was used to create custom fingerprints and novel fingerprint analysis approaches," states Dr. Williams. "Information from different fingerprints can be combined by mapping bit importance values back onto the fragments used to construct the bits. The resulting fragment scores can reflect either one or multiple fingerprinting systems and can be used to visualize important pharmacophore features and to score molecules in virtual screening."
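The mapping from bit importance to fragment scores might look roughly like the following Python sketch. It is not the SVL code the article refers to. The data layout is hypothetical: each bit is identified by a (fingerprint name, bit index) pair, `bit_fragments` records which fragments were used to construct each bit, and a bit's importance is split evenly among its fragments, which is an assumption made for the example.

```python
from collections import defaultdict


def fragment_scores(bit_importance, bit_fragments):
    """Map importance values of fingerprint bits back onto fragments.

    bit_importance: {(fingerprint_name, bit_index): importance}
    bit_fragments:  {(fingerprint_name, bit_index): [fragment names]}
    Splitting a bit's importance evenly across its fragments is an
    illustrative assumption.
    """
    scores = defaultdict(float)
    for bit, importance in bit_importance.items():
        fragments = bit_fragments.get(bit, [])
        for fragment in fragments:
            scores[fragment] += importance / len(fragments)
    return dict(scores)


def score_molecule(molecule_fragments, scores):
    """Score a molecule as the sum of the scores of its distinct fragments."""
    return sum(scores.get(f, 0.0) for f in set(molecule_fragments))


# Bits from two different fingerprint systems contribute to the same fragments.
importance = {("keys", 12): 0.5, ("pharmacophore", 7): 1.0}
built_from = {("keys", 12): ["amide"], ("pharmacophore", 7): ["amide", "phenyl"]}

scores = fragment_scores(importance, built_from)
print(scores)  # {'amide': 1.0, 'phenyl': 0.5}
print(score_molecule(["amide", "phenyl", "amide"], scores))  # 1.5
```

Because the scores live on fragments and not on bits, contributions from several fingerprint systems add up naturally, and the same scores can be used to highlight fragments in a structure or to rank molecules.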

    "Cheminformatics is part of the lead discovery process," says Jeremy Jenkins, research investigator, lead discovery center at the Novartis Institutes for Biomedical Research. "Mining HTS data is one application, and another is in silico lead discovery using 2-D and 3-D methods."

    Novartis' in silico chemogenetics effort, based in Cambridge, MA, is concentrated on exploiting chemical approaches to target identification. "Our main focus is that we are building statistical models for large numbers of targets. This is a new area, using in silico chemogenomics as a predictive tool for biologists," said Jenkins. "We can link targets with chemical structures, and this data can be mined to predict ligand-target pairings."

    Jenkins' group de-orphans targets and phenotypically interesting molecules by using chemical fingerprinting. "We work with more than 1,000 different targets, then prioritize which targets are likely to bind with what molecules," he explains.
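One simple way to picture target prioritization with fingerprints is a nearest-neighbor ranking, sketched below in Python. This is a stand-in chosen for brevity and is not the statistical modeling the Novartis group describes: each target is scored by the highest Tanimoto similarity between the query molecule and any known ligand of that target. The target names and bit sets are made up for the example.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def rank_targets(query, known_ligands):
    """Rank targets by the best similarity between the query and their known ligands.

    query: set of on bits for the molecule of interest.
    known_ligands: {target name: [fingerprint of each known ligand]}.
    Returns (target, score) pairs, most likely target first.
    """
    scores = {
        target: max((tanimoto(query, fp) for fp in ligands), default=0.0)
        for target, ligands in known_ligands.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


known = {
    "kinase_A": [{1, 2, 3, 4}, {1, 2}],
    "gpcr_B": [{7, 8, 9}],
    "protease_C": [{2, 3, 9}],
}
print(rank_targets({1, 2, 3}, known))
# [('kinase_A', 0.75), ('protease_C', 0.5), ('gpcr_B', 0.0)]
```

A trained statistical model per target would replace the `max` over similarities, but the output has the same shape: an ordered list of targets to test first.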

    "Chemical genetics is not just an academic tool; it's a really good way to discover lead targets when combined with cheminformatics," Jenkins notes.

    Other companies are developing ways to screen small molecules using genomics-based technologies to better understand their action on biological systems. "We are not specifically developing cheminformatics tools," says Larry Mertz, vp of R&D and product management at Gene Logic, "but we are developing a genomics-based platform so pharmaceutical and biotech customers can screen panels of small molecules to predict human toxicity. It will better enable them, at the stage of lead optimization, to rank and prioritize small molecules for further development."

    Toxicogenomic profiling can accelerate the pace of uncovering critical information before significant investments are made in relatively more expensive and lengthy nonclinical studies. The use of microarrays to examine gene-expression profiles derived from mammalian cells and tissues treated with small molecules, when benchmarked to large reference data sets, helps establish accurate predictive methods that identify potential toxic liabilities, even in the absence of overt injury.

    In addition, Gene Logic's ToxExpress Program reportedly helps to streamline, focus, and augment subsequent classical studies through specialized applications that support key research areas of predictive, investigative, and mechanistic toxicology, as well as safety biomarker discovery.

    Gene Logic has seven years of experience in creating predictive algorithms to determine toxicity of compounds in both human and rat models. Its latest project is the creation of a genomics-based, 96-well hepatocyte screening platform that will provide a lower-cost toxicogenomic screening assay.

  • In Silico Modeling

    Coalesix focuses on the development and commercialization of technology to improve the efficiency of drug discovery through the use of its candidate design environment (CDE), Mobius. "We don't do hardcore data mining," says Jim Wikel, CTO at Coalesix. "We are all about using in silico models combined with evolutionary structure-generation methods to look for structures that satisfy as many criteria as possible."

    Mobius CDE fosters interactions between computational and medicinal chemistry, offering a novel approach to the challenges of lead optimization and enabling faster identification of more potential drug candidates. Mobius exploits the chemical information represented in the in silico models and the experience and intuition of the discovery scientists to drive multicriteria optimization.

    The insight and experience of medicinal chemists provide direction for the application of existing computational methods and algorithms through an initial set of design priorities. These priorities balance biological potency against properties pertinent to a successful drug candidate. Mobius provides the foundation for the generation and evaluation of relevant ideas, taking into account the priorities of the medicinal chemist. "We're not about finding the most potent structure but finding populations of structures that satisfy the most criteria," Wikel asserts.
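The difference between chasing potency and satisfying the most criteria can be shown with a small Python sketch. It is not how Mobius works internally. The criteria, thresholds, property names, and priority weights below are all hypothetical; the sketch only shows that when candidates are ranked by the weighted number of design criteria they meet, the most potent structure is not necessarily the one that comes out on top.

```python
def priority_score(properties, criteria):
    """Sum the priority weights of all criteria a candidate satisfies.

    criteria: {name: (test function, weight)}; the weights stand in for the
    chemist's design priorities and are illustrative.
    """
    return sum(weight for test, weight in criteria.values() if test(properties))


def select_population(candidates, criteria, size):
    """Keep the `size` candidates that satisfy the most (weighted) criteria."""
    ranked = sorted(
        candidates, key=lambda c: priority_score(c, criteria), reverse=True
    )
    return ranked[:size]


# Hypothetical design priorities: potency counts double.
criteria = {
    "potent": (lambda p: p["pIC50"] >= 7.0, 2.0),
    "soluble": (lambda p: p["logS"] >= -4.0, 1.0),
    "synthesizable": (lambda p: p["synth_ok"], 1.0),
}
candidates = [
    {"id": "x", "pIC50": 8.5, "logS": -6.0, "synth_ok": False},  # most potent
    {"id": "y", "pIC50": 7.2, "logS": -3.0, "synth_ok": True},
    {"id": "z", "pIC50": 6.0, "logS": -2.0, "synth_ok": True},
]
population = select_population(candidates, criteria, size=2)
print(population[0]["id"])  # y
```

Changing a weight or adding a criterion and re-running the selection mimics, in a very reduced form, a chemist adjusting the design priorities between rounds.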

    Structures generated by Mobius are presented to the scientist for human input on objectives that are not readily quantifiable, such as synthetic tractability. The scientist's feedback may reinforce or change the direction in which Mobius is moving or change the design priorities. Mobius continues to navigate the vast space of potential chemical structures, incorporating the expert medicinal chemistry input and generating significant data that is available for detailed analysis. The process continues until structures of sufficient potential interest for synthesis are identified.

    "Computational and medicinal scientists have never seen eye to eye," notes Wikel. "But by using our approach, the medicinal chemist feels more comfortable with computational chemistry, and computational chemists realize the value of the direct input they can bring to the drug development process. As statistician George Box said, 'all models are wrong, but some are useful.'"

