July 1, 2005 (Vol. 25, No. 13)

Computational Chemistry Generates Diverse and Focused Compound Libraries

New hits in drug discovery have traditionally been identified by screening a biological target with large chemical libraries that often contain more than a million compounds. This random screening is both costly and time-consuming. Additionally, these libraries are notorious for their lack of diversity in terms of chemical and biological properties.

In recent years, a more rational and streamlined approach to drug discovery has developed that involves computational design of drug structures and evolutionary techniques such as genetic algorithms. These methods are replacing or complementing traditional approaches.

Computational chemistry can be used as a predictive tool to visualize and model the chemical, physical, and biological properties of chemical molecules before they are synthesized, enabling chemists to focus on the synthesis of only a few structures (Figure 1).

A number of methods currently exist for designing chemical libraries. General libraries use a measurement of chemical diversity in their design and aim at covering as much chemical space as possible to maximize the likelihood of discovering a novel, patentable, lead class of active compounds.

Focused or targeted chemical libraries are then synthesized to expand this class and thoroughly explore the chemical space around it. These libraries are typically designed for distinct targets or protein families and have the potential to identify better hits and leads as well as to improve efficiency.

GPCR-targeted Libraries and Ligand-based Design

GPCRs are a family of membrane-bound proteins that facilitate many complex cellular responses to extracellular stimuli. They are major targets in drug discovery: it is estimated that over 50% of marketed drugs, which are involved in a wide range of therapeutic applications, act on GPCRs.

Currently, structural data from x-ray crystallography exist for only one GPCR. This reflects the difficulty in crystallizing these proteins: they will not fold into their native conformation in an artificial membrane. The absence of detailed structural information has forced researchers to seek alternative approaches for designing GPCR inhibitors.

There are estimated to be 2,000 GPCR sequences in the human genome, 350 of which are likely to be druggable. This allows structural biologists to look for common sequences within the GPCR protein family to identify parts of the protein structure that are responsible for binding.

Key binding motifs can be identified and a particular GPCR of interest can be classified based on the presence or absence of these motifs. This information is then combined with in silico modeling of compounds that possess the correct binding motifs, and thereby molecules with varying degrees of selectivity toward the GPCR target protein can be generated.

An alternative computational approach is to study ligands that interact with the target GPCR. This collection of interacting ligands can provide insight into the shape and characteristics of the GPCR binding pocket by constructing a pharmacophore model.

A pharmacophore is an ensemble of structural features (steric and electronic) in a molecule that are responsible for the molecule’s biological activity (i.e., these are the features whose interaction with the binding site can trigger or block the protein’s biological response).

Optimizing the Pharmacophore

Using a ligand-based approach, Peakdale Molecular (Chapel-en-le-Frith, U.K.) and De Novo Pharmaceuticals (Cambridge, U.K.) collaborated to develop targeted GPCR Peakexplorer libraries. These were derived from a variety of family-based extended pharmacophores for subfamilies of the GPCR class.

At the start of the project, a sophisticated algorithm sampled the chemical space for each class of aminergic GPCR target in turn. The dopamine targets, which came first, serve as the example here.

Using standard diversity analysis and clustering tools, the companies were able to identify ten discrete ligands to represent the different types of compounds that were known to be active against dopamine targets.
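The article does not say which diversity tools were used. As a rough illustration of how a handful of representative ligands can be picked from a larger set, the sketch below applies a greedy MaxMin selection over Tanimoto distances. The fingerprints are toy sets of integers and the function names are hypothetical; this is not a reconstruction of the companies' workflow.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fingerprints, k, seed_index=0):
    """Greedy MaxMin: repeatedly add the compound farthest from everything picked so far."""
    picked = [seed_index]
    while len(picked) < min(k, len(fingerprints)):
        best_idx, best_dist = None, -1.0
        for i, fp in enumerate(fingerprints):
            if i in picked:
                continue
            # Distance from this candidate to its nearest already-picked neighbour.
            nearest = min(1.0 - tanimoto(fp, fingerprints[j]) for j in picked)
            if nearest > best_dist:
                best_idx, best_dist = i, nearest
        picked.append(best_idx)
    return picked


# Toy fingerprints (assumed data): two close pairs and one outlier.
ligand_fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {10, 11, 12},
    {10, 11, 13},
    {20, 21},
]

representatives = maxmin_pick(ligand_fps, k=3)
print(representatives)
```

With these toy fingerprints the selection takes one member from each close pair plus the outlier, which is the behaviour one would want from a representative set.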

These ligands were then used to rapidly build up a set of over 3,000 pharmacophore models, which could be tested against a database of druglike compounds seeded with dopamine ligands.

The resulting pharmacophores were found to be feature-rich, identifying a significant number of known ligands from other druglike compounds. This gave Peakdale’s chemists the confidence to use these models to aid in the design of new compounds targeted against aminergic GPCRs.
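One common way to quantify how well a model retrieves seeded actives from a background of druglike compounds is the enrichment factor: the hit rate in the top-ranked fraction divided by the hit rate in the whole database. The article does not state which metric was used, so the sketch below is only an assumed illustration, and the compound identifiers and ranking are invented toy data.

```python
def enrichment_factor(ranked_ids, active_ids, top_fraction):
    """Hit rate among the top-ranked fraction divided by the hit rate overall.

    Assumes every id in active_ids also appears in ranked_ids.
    """
    n_total = len(ranked_ids)
    n_top = max(1, int(round(n_total * top_fraction)))
    actives_in_top = sum(1 for cid in ranked_ids[:n_top] if cid in active_ids)
    # Algebraically equal to (actives_in_top / n_top) / (len(active_ids) / n_total).
    return (actives_in_top * n_total) / (n_top * len(active_ids))


# Toy screen (assumed data): 20 compounds ranked best-first, 4 seeded actives.
ranked = ["c%02d" % i for i in range(20)]
actives = {"c00", "c02", "c03", "c11"}

ef_25 = enrichment_factor(ranked, actives, top_fraction=0.25)
print(ef_25)
```

A value above 1 means the model ranks known ligands ahead of the background compounds more often than random selection would.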

The pharmacophore models are also used by medicinal chemists to help understand how ligands bind to the proteins. To design new compounds, chemists look at small fragments of the compounds to see whether they can be replaced by something else that still meets the requirements of binding.

Substituting the fragments allows the generation of novel scaffolds and extends the chemical space. This process can be facilitated by computers, which can rapidly scan thousands of available fragments and highlight those likely to bind well.

De Novo’s SkelGen, an advanced in silico structure-generating platform, was used to design novel molecules. SkelGen operates with a database of more than 1,700 molecular fragments, each of which is coded with information about the kinds of fragment it can join to and the bond that would result. This allows the program to select fragments randomly and build them into druglike molecules.
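The idea of fragments that carry their own joining rules can be sketched in a few lines. Everything below is a hypothetical toy: the five fragments, their attachment types, and the join table are invented for illustration and do not reflect SkelGen's actual fragment database or data model.

```python
import random

# Hypothetical fragment table: fragment name -> attachment-atom type.
FRAGMENTS = {
    "phenyl": "aryl_C",
    "piperidine": "amine_N",
    "morpholine": "amine_N",
    "carbonyl": "acyl_C",
    "methylene": "alkyl_C",
}

# Hypothetical join rules: unordered pair of attachment types -> resulting bond.
JOIN_RULES = {
    frozenset(["acyl_C", "amine_N"]): "amide C-N",
    frozenset(["alkyl_C", "amine_N"]): "single C-N",
    frozenset(["aryl_C", "alkyl_C"]): "single C-C",
    frozenset(["aryl_C", "acyl_C"]): "single C-C",
    frozenset(["alkyl_C", "acyl_C"]): "single C-C",
    frozenset(["alkyl_C"]): "single C-C",  # alkyl joined to alkyl
}


def bond_between(frag_a, frag_b):
    """Return the bond formed by two fragments, or None if they cannot be joined."""
    key = frozenset((FRAGMENTS[frag_a], FRAGMENTS[frag_b]))
    return JOIN_RULES.get(key)


def grow_chain(n_fragments, rng):
    """Randomly grow a linear chain, only ever adding a compatible fragment."""
    names = sorted(FRAGMENTS)
    chain, bonds = [rng.choice(names)], []
    while len(chain) < n_fragments:
        candidates = [n for n in names if bond_between(chain[-1], n) is not None]
        if not candidates:
            break  # dead end: no fragment can legally join the chain tail
        nxt = rng.choice(candidates)
        bonds.append(bond_between(chain[-1], nxt))
        chain.append(nxt)
    return chain, bonds


rng = random.Random(0)  # fixed seed so the run is repeatable
chain, bonds = grow_chain(4, rng)
print(chain, bonds)
```

Because the selection is random but constrained by the join table, every generated chain is chemically permitted under the toy rules, which mirrors the article's point that random assembly can still yield feasible molecules.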

Thousands of chemically feasible, virtual molecules were generated using SkelGen. These molecules were further filtered for suitable properties and ranked. This output was then optimized by Peakdale’s medicinal chemists to define chemotypes suitable for generating new compounds for the GPCR-targeted libraries.
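The article does not specify which properties were used for filtering or how the molecules were ranked. As one plausible illustration, the sketch below applies a strict form of Lipinski's rule of five (all four criteria must hold) and then sorts the survivors by a hypothetical fit score. The molecules, property values, and scores are invented.

```python
from dataclasses import dataclass


@dataclass
class VirtualMolecule:
    name: str
    mol_weight: float   # daltons
    logp: float         # calculated octanol-water partition coefficient
    h_donors: int
    h_acceptors: int
    score: float        # hypothetical pharmacophore-fit score, higher is better


def passes_rule_of_five(m):
    """Strict variant (an assumption here): every Lipinski criterion must be met."""
    return (
        m.mol_weight <= 500
        and m.logp <= 5
        and m.h_donors <= 5
        and m.h_acceptors <= 10
    )


def filter_and_rank(molecules):
    """Drop molecules that fail the property filter, then sort best score first."""
    kept = [m for m in molecules if passes_rule_of_five(m)]
    return sorted(kept, key=lambda m: m.score, reverse=True)


# Invented candidates for illustration only.
candidates = [
    VirtualMolecule("vm1", 320.4, 2.1, 1, 4, 0.82),
    VirtualMolecule("vm2", 612.7, 4.0, 2, 8, 0.95),  # too heavy
    VirtualMolecule("vm3", 410.5, 5.8, 1, 5, 0.90),  # too lipophilic
    VirtualMolecule("vm4", 288.3, 3.3, 2, 3, 0.67),
    VirtualMolecule("vm5", 455.0, 4.9, 0, 6, 0.88),
]

ranked_names = [m.name for m in filter_and_rank(candidates)]
print(ranked_names)
```

Note that the two highest-scoring candidates are removed by the property filter, which is why filtering precedes ranking in this sketch.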

Chemical Libraries in Cancer Drug Discovery

A fragment-based approach has also been used to generate chemical libraries targeted against the c-Abl tyrosine kinase. c-Abl is ubiquitously expressed and regulates cell cycle progression, DNA damage responses, and apoptosis. The protein is a proto-oncogene that can enhance the development and progression of cancer. This makes c-Abl an interesting target for the development of new drugs in cancer treatment.

Peakdale collaborated with BioLeap (New Hope, PA) to produce new druglike compounds targeted at c-Abl. Unlike GPCRs, c-Abl has detailed structural information available from x-ray crystallography. However, crystallography cannot precisely locate and identify water molecules. This information is valuable because under physiological conditions proteins are surrounded by water, which can strongly influence ligand binding.

BioLeap’s computational approach can overcome these drawbacks (Figure 2). Starting from a three-dimensional protein structure, BioLeap’s algorithm simulates the immersion of the protein in water and computes the distribution and free energy of each water molecule.

Additionally, this algorithm identifies water that is tightly bound or embedded within the protein structure. Since water molecules are polar, they can modify the electrostatic field in their vicinity.

In effect, these molecules become an integral part of the protein and influence the binding of ligands. In most cases ligands are not able to displace tightly bound water molecules. The outcome is a new crystallographic structure of the protein that now also contains bound water molecules.

The water molecules can also be replaced with small chemical fragments, which allows medicinal chemists to locate small binding pockets that recognize key building blocks (Figure 3).

From the enormous range of possibilities available, this approach rapidly identifies chemical modifications to drug candidates that retain potency while solving problems in protein dynamics and toxicity. Peakdale’s chemists are currently designing novel compounds targeted to the proto-oncogene c-Abl with better druglike properties than existing compounds.


Many advances in computational chemistry have been made in recent years. Optimizing potential new molecules through in silico design expedites drug discovery, yielding target-focused compounds and chemically diverse novel hits that are forming the basis for new leads (Figure 4).

The next generation of computational algorithms could enhance the design of new compounds and libraries further by taking into account protein flexibility and the movement and rotation of amino acid residues.
