Drug discovery is an arduous process with an extremely high attrition rate. Every day the industry processes millions of small organic molecules in search of the next blockbuster drug. Typically, this process involves high-throughput screening, which yields thousands of hits that could potentially be optimized and developed into new drugs. Despite the vast numbers of compounds screened and hits discovered, successes have been rare, and the number of investigational new drug (IND) filings for new chemical entities (NCEs) has dwindled over the past decade.
The most common reason that hits or leads fail to become drug candidates is not a lack of potency against the intended target but side effects and toxic events. Several schools of thought and approaches aim to avoid such unintended events, and they share one common theme: eliminating potential pitfalls early.
Conventionally, screening hits are profiled against wide panels of biological targets (e.g., cells, receptors, and enzymes), and compounds that show unintended biological activities are eliminated. This approach does not really reduce the overall attrition rate but rather shifts the failure to an earlier phase, where little investment has been made. For large pharmaceutical companies with extensive in-house infrastructure and resources, such an approach has helped to increase productivity. However, for mid-size biopharmaceutical companies with limited resources, this type of approach is cost-prohibitive.
Caliper Discovery Alliances and Services (CDAS; www.caliper.com) has compiled a content database called Side-Effect Database™, or SED™, which consists of the screening results of 3,000 compounds tested against 70 biological targets. The compounds are mostly marketed and withdrawn pharmaceutical drugs, agricultural and cosmetic compounds, reference agents, and known receptor ligands and modulators. The biological targets within the database are GPCRs, enzymes, ion channels, and transporters. Each compound in this library has been comprehensively screened against the biological targets, and all of the screening results (both positive and negative) have been captured. Since this collection contains previously known “active” molecules, the observed hit rate (10%) is significantly higher than that of a random chemical library of similar size.
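To make the idea of a complete results table concrete, here is a minimal sketch of a compound-by-target matrix in which every cell is filled, negatives included. The compound names, the percent-inhibition values, and the 50% hit threshold are all invented for illustration; the article does not describe how SED is stored or how a hit is defined.

```python
# Toy stand-in for a compound x target results table. All names and values
# are invented; the 50% threshold is an assumption, not taken from SED.
COMPOUNDS = ["cmpd_A", "cmpd_B", "cmpd_C", "cmpd_D"]
TARGETS = ["D1", "D2", "NET"]

# Percent inhibition for every (compound, target) pair, negatives included.
RESULTS = {
    ("cmpd_A", "D1"): 92.0, ("cmpd_A", "D2"): 12.0, ("cmpd_A", "NET"): 3.0,
    ("cmpd_B", "D1"): 8.0,  ("cmpd_B", "D2"): 71.0, ("cmpd_B", "NET"): 5.0,
    ("cmpd_C", "D1"): 15.0, ("cmpd_C", "D2"): 9.0,  ("cmpd_C", "NET"): 64.0,
    ("cmpd_D", "D1"): 4.0,  ("cmpd_D", "D2"): 2.0,  ("cmpd_D", "NET"): 1.0,
}


def is_complete(results, compounds, targets):
    """True when every compound has a recorded result for every target."""
    return all((c, t) in results for c in compounds for t in targets)


def hit_rate(results, threshold=50.0):
    """Fraction of all measurements at or above the (assumed) hit threshold."""
    return sum(v >= threshold for v in results.values()) / len(results)


print(is_complete(RESULTS, COMPOUNDS, TARGETS))
print(hit_rate(RESULTS))
# A complete table of 3,000 compounds by 70 targets holds this many results:
print(3000 * 70)
```

Keeping the negative results is what makes the table usable for training: a model needs inactive examples as much as active ones.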
Using the database as a computational training tool, CDAS carried out a proof-of-principle study. It demonstrated that, with a full-rank, high-density dataset such as SED and innovative data-handling techniques, ideal novel lead compounds could be identified without screening a substantial chemical library or suffering a high attrition rate due to interaction of screening hits with unintended targets.
More specifically, CDAS sought compounds that bind specifically to dopamine D1 receptors but show little or no affinity for seven other similar proteins with at least 30% sequence homology.
The first step was to establish a structure-activity relationship (SAR) using the dopamine D1 dataset from the database. The dataset was interrogated with a statistical data-handling algorithm called recursive partitioning, using a commercial software package (ChemTree; GoldenHelix). As shown in Figure 1, chemical descriptors were clustered into two general categories based on statistical significance (p-values): one set of descriptor clusters was statistically associated with the observed dopamine D1 binding affinity, and the other set showed no such association. The set of active chemical descriptors was used to cherry-pick 11,169 compounds from a virtual library of 112,539 entries provided by SPEC Chemicals (www.spec.net). By conventional reasoning, this focused library of roughly 11,000 compounds (about 10% of the virtual library) was expected to exhibit a higher dopamine D1 hit rate than would have been observed with the original collection.
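The idea behind recursive partitioning can be sketched in a few lines. This is not ChemTree's implementation: it is a minimal illustration that assumes binary descriptors (present or absent), uses a one-sided hypergeometric enrichment test as the split criterion, and runs on an invented 12-compound dataset. The function names, the 0.05 cutoff, and the minimum node size are all hypothetical choices.

```python
from math import comb


def enrichment_p(hits_with, n_with, hits_total, n_total):
    """One-sided hypergeometric p-value: probability of seeing at least
    `hits_with` actives among `n_with` compounds drawn at random from a
    pool of `n_total` compounds that contains `hits_total` actives."""
    upper = min(n_with, hits_total)
    favourable = sum(
        comb(hits_total, k) * comb(n_total - hits_total, n_with - k)
        for k in range(hits_with, upper + 1)
    )
    return favourable / comb(n_total, n_with)


def partition(rows, descriptors, alpha=0.05, min_size=2):
    """rows: list of (set_of_descriptors, is_active). Returns a nested dict.
    At each node, split on the descriptor most enriched in actives, and
    stop when no descriptor reaches the significance cutoff `alpha`."""
    n = len(rows)
    hits = sum(active for _, active in rows)
    node = {"n": n, "hits": hits}
    best = None
    for d in descriptors:
        with_d = [r for r in rows if d in r[0]]
        if len(with_d) < min_size or n - len(with_d) < min_size:
            continue  # skip splits that leave a branch too small
        p = enrichment_p(sum(a for _, a in with_d), len(with_d), hits, n)
        if best is None or p < best[0]:
            best = (p, d)
    if best is not None and best[0] < alpha:
        p, d = best
        rest = [x for x in descriptors if x != d]
        present = [r for r in rows if d in r[0]]
        absent = [r for r in rows if d not in r[0]]
        node.update(
            split=d,
            p=p,
            present=partition(present, rest, alpha, min_size),
            absent=partition(absent, rest, alpha, min_size),
        )
    return node


# Invented toy data: descriptor "A" tracks activity, descriptor "B" does not.
ROWS = [
    ({"A", "B"}, 1), ({"A"}, 1), ({"A", "B"}, 1), ({"A"}, 1), ({"A"}, 1),
    ({"A", "B"}, 0), ({"B"}, 0), (set(), 0), ({"B"}, 0), (set(), 0),
    ({"B"}, 0), (set(), 0),
]

tree = partition(ROWS, ["A", "B"])
print(tree["split"], round(tree["p"], 4))
print(tree["present"]["hits"], "of", tree["present"]["n"], "actives in the 'A present' leaf")
```

In this toy run the tree splits once, on descriptor "A", and then stops because "B" adds no significant separation. Descriptors that define hit-rich leaves are the kind of "active" descriptors that could then be used to pick compounds from a virtual library.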
The second step was to optimize the D1-focused virtual chemical library for D1-receptor specificity. Such a collection would also be likely to show affinity for proteins structurally similar to dopamine D1. SAR clustering analyses of dopamine D2, adrenergic α2A, α2B, β1, and β2, 5-HT2A, and the norepinephrine transporter were therefore constructed with the same clustering tools, using datasets derived from the same database. The nodes (leaves) of these clustering “trees” that did not exhibit any affinity for the dopaminergic (D2), adrenergic (α and β), and serotonergic (5-HT2A) receptors were used to sequentially “triage” (or trim) the virtual library of compounds to a virtual subset, shown in Figure 2. Statistically, this subset of compounds is likely to exhibit affinity for the dopamine D1 receptor and unlikely to show affinity for the other receptors in the designated panel. A subset of the virtual library was purchased and screened for binding activity in all eight biological assays.
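The sequential triage described above amounts to a chain of set filters. The sketch below assumes each target's tree has already been reduced to a set of compound IDs predicted to land in "active" leaves; the compound IDs, the target labels, and the `triage` function are hypothetical. Dropping compounds that fall in an off-target's active leaves is used here as a stand-in for keeping those that fall in its inactive leaves.

```python
# Off-target panel named in the text; labels are illustrative spellings.
OFF_TARGETS = ["D2", "alpha2A", "alpha2B", "beta1", "beta2", "5-HT2A", "NET"]


def triage(library, predicted_active):
    """library: ordered list of compound IDs.
    predicted_active: dict mapping target -> set of IDs predicted active.
    Keeps compounds predicted active at D1, then removes, one off-target at
    a time, any compound predicted active there. Returns the surviving
    compounds and a log of how many remained after each step."""
    remaining = [c for c in library if c in predicted_active["D1"]]
    log = [("D1", len(remaining))]
    for target in OFF_TARGETS:
        flagged = predicted_active.get(target, set())
        remaining = [c for c in remaining if c not in flagged]
        log.append((target, len(remaining)))
    return remaining, log


# Invented predictions for an eight-compound toy library.
LIBRARY = [f"c{i}" for i in range(1, 9)]
PREDICTED_ACTIVE = {
    "D1": {"c1", "c2", "c3", "c4", "c5"},
    "D2": {"c2"},
    "alpha2A": {"c3", "c7"},
    "beta1": {"c3"},
    "NET": {"c5"},
}

subset, log = triage(LIBRARY, PREDICTED_ACTIVE)
print(subset)
for target, count in log:
    print(f"after {target}: {count} compounds remain")
```

Logging the count after each filter shows which off-target costs the most compounds, which is useful when deciding how strict each filter needs to be.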
The overall screening results confirmed the statistical biases: the collection of compounds provided more than 10% dopamine D1 hits, but significantly fewer hits at the other receptors. For instance, as shown in Figure 3, in a pairwise comparison of binding affinities between dopamine D1 and D2 receptors, the screening hits gravitated toward activity in the dopamine D1 (x-axis) binding assay, whereas very few hits showed concurrent binding affinity for the D2 receptor. In fact, the same propensity was seen in all pairwise activity comparisons. Figure 4 illustrates the activity profiles of a few compounds that demonstrated specific affinity for the dopamine D1 receptor and little affinity for the others in the designated panel of proteins.
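A pairwise comparison like the D1-versus-D2 plot can be summarized by counting which quadrant each compound falls into. The sketch below uses invented percent-inhibition values and an assumed 50% hit threshold; neither comes from the study, and `pairwise_counts` is a hypothetical helper.

```python
def pairwise_counts(profiles, primary, other, threshold=50.0):
    """Classify each compound by whether it hits the primary target,
    the other target, both, or neither, at an assumed threshold."""
    counts = {"primary_only": 0, "both": 0, "other_only": 0, "neither": 0}
    for activity in profiles.values():
        hit_primary = activity[primary] >= threshold
        hit_other = activity[other] >= threshold
        if hit_primary and hit_other:
            counts["both"] += 1
        elif hit_primary:
            counts["primary_only"] += 1
        elif hit_other:
            counts["other_only"] += 1
        else:
            counts["neither"] += 1
    return counts


# Invented screening results (percent inhibition) for five compounds.
PROFILES = {
    "s1": {"D1": 88, "D2": 10},
    "s2": {"D1": 76, "D2": 22},
    "s3": {"D1": 61, "D2": 58},
    "s4": {"D1": 12, "D2": 5},
    "s5": {"D1": 30, "D2": 66},
}

counts = pairwise_counts(PROFILES, primary="D1", other="D2")
print(counts)
```

A library biased as intended would show most of its hits in the `primary_only` bucket and few in `both`, which is the pattern the text describes for Figure 3.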
In short, with innovative use of existing datasets and computational tools, it is possible for mid-size biopharmaceutical companies with limited resources to further reduce attrition rates and to cut the costs associated with early-phase drug discovery. This example has demonstrated that with a full-rank, high-quality dataset like SED, together with statistical clustering algorithms, it is possible to find novel lead candidates with target specificity without incurring the excessive costs associated with library collection, high-throughput screening, and follow-up profiling. Databases such as SED are a useful computational tool for the design and collection of specific project libraries in which undesired side effects, toxicological events, and target specificity may be predefined.