Using the database as a computational training tool, CDAS carried out a proof-of-principle study. It demonstrated that, with a full-rank, high-density dataset such as SED and innovative data-handling techniques, ideal novel lead compounds could be identified without screening a substantial chemical library or suffering a high attrition rate caused by screening hits interacting with unintended targets.
More specifically, CDAS sought compounds that bind specifically to dopamine D1 receptors but show little or no affinity for seven other similar proteins with at least 30% sequence homology.
The first step was to establish a structure-activity relationship (SAR) using the dopamine D1 dataset from the database. The dataset was interrogated with a statistical data-handling algorithm called recursive partitioning, using a commercial software package (ChemTree; GoldenHelix). As shown in Figure 1, chemical descriptors were clustered into two general categories based on the p-test: one set of descriptor clusters was statistically associated with the observed dopamine D1 binding affinity, and the other set showed no such association. The set of active chemical descriptors was used to cherry-pick 11,169 compounds from a virtual library of 112,539 entries provided by SPEC Chemicals (www.spec.net), or about 10% of the collection. By conventional means, this focused library of roughly 11,000 compounds was expected to exhibit a higher dopamine D1 “hit rate” than would have been observed with the original collection.
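The idea behind this first step can be illustrated with a minimal sketch. It is not ChemTree's algorithm: the split criterion here is a plain reduction in squared error rather than a p-value-based test, and the descriptor names, activity values, compound IDs, and the 50-unit cutoff are all hypothetical. A tree is grown on compounds with measured activity, and virtual compounds are then kept if they fall into a leaf with high mean activity.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, descriptors, min_size):
    """Return the descriptor whose presence/absence split most reduces SSE."""
    parent = sse([r["activity"] for r in rows])
    best, best_gain = None, 0.0
    for d in sorted(descriptors):
        present = [r["activity"] for r in rows if d in r["features"]]
        absent = [r["activity"] for r in rows if d not in r["features"]]
        if len(present) < min_size or len(absent) < min_size:
            continue  # skip splits that leave a branch too small
        gain = parent - sse(present) - sse(absent)
        if gain > best_gain:
            best, best_gain = d, gain
    return best

def grow(rows, descriptors, min_size=2, depth=0, max_depth=3):
    """Recursively partition the training compounds into a binary tree."""
    node = {"n": len(rows), "mean": mean(r["activity"] for r in rows)}
    if depth >= max_depth:
        return node
    d = best_split(rows, descriptors, min_size)
    if d is None:
        return node  # no admissible split: this node is a leaf
    node["split"] = d
    node["present"] = grow([r for r in rows if d in r["features"]],
                           descriptors, min_size, depth + 1, max_depth)
    node["absent"] = grow([r for r in rows if d not in r["features"]],
                          descriptors, min_size, depth + 1, max_depth)
    return node

def predict(tree, features):
    """Walk the tree and return the mean activity of the leaf reached."""
    while "split" in tree:
        tree = tree["present"] if tree["split"] in features else tree["absent"]
    return tree["mean"]

# Hypothetical training set: descriptor sets and measured activities.
training = [
    {"features": {"catechol", "aryl_amine"}, "activity": 90},
    {"features": {"catechol"}, "activity": 85},
    {"features": {"catechol", "sulfonamide"}, "activity": 80},
    {"features": {"catechol", "aryl_amine"}, "activity": 88},
    {"features": {"aryl_amine"}, "activity": 10},
    {"features": {"sulfonamide"}, "activity": 5},
    {"features": set(), "activity": 15},
    {"features": {"aryl_amine", "sulfonamide"}, "activity": 12},
]
descriptors = {"catechol", "aryl_amine", "sulfonamide"}
tree = grow(training, descriptors)

# Hypothetical virtual library: keep compounds that land in "active" leaves.
virtual_library = {
    "V1": {"catechol"},
    "V2": {"sulfonamide"},
    "V3": {"catechol", "aryl_amine"},
}
focused = sorted(cid for cid, feats in virtual_library.items()
                 if predict(tree, feats) >= 50)
```

In this toy data the root split falls on the descriptor that best separates high from low activity, and only the virtual compounds carrying it survive the selection, which mirrors how a focused library is cut from a larger collection.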
The second step was to optimize the D1-focused virtual chemical library for D1-receptor specificity, because compounds in the virtual collection would also tend to show affinity for proteins structurally similar to dopamine D1. SAR clustering analyses of dopamine D2, adrenergic α2A, α2B, β1, and β2, 5HT2a, and the norepinephrine transporter were constructed with the same clustering tools, using datasets derived from the same database. The nodes (leaves) of these clustering “trees” that did not exhibit any affinity for these dopaminergic (D2), adrenergic (α and β), and serotonergic (5HT2a) targets enabled the sequential “triaging” (or trimming) of the virtual library to afford a virtual subset, shown in Figure 2. Statistically, the compounds in this subset are likely to exhibit affinity for the dopamine D1 receptor and unlikely to show affinity for the other receptors in the designated panel. A subset of the virtual library was purchased and screened for binding activity in all eight biological assays.
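The sequential triage described above amounts to repeated set subtraction, sketched below under stated assumptions. The compound IDs and the per-target sets of predicted binders are hypothetical, and the sets are assumed to have been computed beforehand from each off-target tree. Keeping only compounds that fall into inactive leaves is equivalent to removing those that fall into active ones.

```python
def triage(focused_library, off_target_binders):
    """Sequentially remove compounds predicted to bind each off-target.

    focused_library: iterable of compound IDs predicted active at D1.
    off_target_binders: dict mapping target name to the set of compound IDs
        that land in "active" leaves of that target's tree (assumed given).
    Returns the surviving set and a log of (target, survivors) per step.
    """
    remaining = set(focused_library)
    log = []
    for target, binders in off_target_binders.items():
        remaining -= set(binders)  # drop predicted off-target binders
        log.append((target, len(remaining)))
    return remaining, log

# Hypothetical D1-focused library and off-target predictions.
d1_focused = {"C1", "C2", "C3", "C4", "C5", "C6"}
predicted_binders = {
    "D2": {"C1"},
    "alpha2A": {"C1", "C2"},
    "alpha2B": set(),
    "beta1": {"C3"},
    "beta2": set(),
    "5HT2a": {"C4", "C9"},  # C9 is not in the focused library; ignored
    "NET": set(),
}
selective, steps = triage(d1_focused, predicted_binders)
for target, survivors in steps:
    print(f"after {target}: {survivors} compounds remain")
```

The order of the targets does not change the final subset, only the intermediate counts, so the per-step log is mainly useful for seeing which off-target removes the most candidates.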