Doug Auld, Ph.D., Novartis Institutes for BioMedical Research

Researchers demonstrate the value of a tool that compares compounds solely on the basis of their bioactivity for virtual screening.

ASSAY & Drug Development Technologies offers a unique combination of original research and reports on the techniques and tools being used in cutting-edge drug development. The journal includes a “Literature Search and Review” column that identifies published papers of note and discusses their importance. GEN presents one article that was analyzed in the “Literature Search and Review” column, a paper published in ACS Chemical Biology titled “Rethinking molecular similarity: comparing compounds on the basis of biological activity.” Authors of the paper are Petrone PM, Simms B, Nigsch F, Lounkine E, Kutchukian P, Cornett A, Deng Z, Davies JW, Jenkins JL, and Glick M.

Abstract from ACS Chemical Biology

Since the advent of high-throughput screening (HTS), there has been an urgent need for methods that facilitate the interrogation of large-scale chemical biology data to build a mode of action (MoA) hypothesis. This can be done either prior to the HTS by subset design of compounds with known MoA or post HTS by data annotation and mining. To enable this process, we developed a tool that compares compounds solely on the basis of their bioactivity: the chemical biological descriptor “high-throughput screening fingerprint” (HTS-FP).

In the current embodiment, data are aggregated from 195 biochemical and cell-based assays developed at Novartis and can be used to identify bioactivity relationships among the in-house collection comprising ~1.5 million compounds. We demonstrate the value of the HTS-FP for virtual screening and in particular scaffold hopping. HTS-FP outperforms state of the art methods in several aspects, retrieving bioactive compounds with remarkable chemical dissimilarity to a probe structure. We also apply HTS-FP for the design of screening subsets in HTS. Using retrospective data, we show that a biodiverse selection of plates performs significantly better than a chemically diverse selection of plates, both in terms of number of hits and diversity of chemotypes retrieved. This is also true in the case of hit expansion predictions using HTS-FP similarity.

Sets of compounds clustered with HTS-FP are biologically meaningful, in the sense that these clusters enrich for genes and gene ontology (GO) terms, showing that compounds that are bioactively similar also tend to target proteins that operate together in the cell. HTS-FP are valuable not only because of their predictive power but mainly because they relate compounds solely on the basis of bioactivity, harnessing the accumulated knowledge of a high-throughput screening facility toward the understanding of how compounds interact with the proteome.


Understanding the structure–activity relationships (SAR) of compounds is a central focus in chemical biology. Measuring a compound’s activity across many assays produces an activity signature, and these signatures can be used to group compounds that have similar chemical–biological interactions. This approach does not require any consideration of chemical similarity, but it does depend on large, well-annotated chemical biology datasets.

In this article, HTS data from 195 assays (both biochemical and cell-based) developed at Novartis over a period of 10 years were used to derive a “high-throughput screening fingerprint” (HTS-FP) for each compound. HTS-FPs provide biological descriptors that can be analyzed by computational methods. The authors use HTS-FPs to develop target hypotheses and to construct diverse subsets of bioactive compounds, and they demonstrate that HTS-FPs are biologically relevant by examining gene ontology (GO) category enrichment within HTS-FP clusters.
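The HTS-FP idea can be made concrete with a small sketch. It assumes, purely for illustration, that each fingerprint slot holds a per-assay z-score (None where the compound was not tested) and that two fingerprints are compared by Pearson correlation over the assays both compounds were tested in. The data layout, the normalization, the `min_overlap` threshold and all function names are hypothetical, not the published Novartis pipeline.

```python
import math
from typing import Dict, List, Optional

# Hypothetical layout: assays[assay_id][compound_id] = raw activity readout.
Assays = Dict[str, Dict[str, float]]
Fingerprint = List[Optional[float]]  # one slot per assay; None = not tested


def zscore_assay(readouts: Dict[str, float]) -> Dict[str, float]:
    """Convert one assay's raw readouts into z-scores across the tested compounds."""
    values = list(readouts.values())
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sd == 0:
        # No spread in this assay: treat every compound as inactive (z = 0).
        return {cid: 0.0 for cid in readouts}
    return {cid: (v - mean) / sd for cid, v in readouts.items()}


def build_fingerprints(assays: Assays) -> Dict[str, Fingerprint]:
    """Assemble one activity vector per compound, with assays in a fixed (sorted) order."""
    assay_ids = sorted(assays)
    z = {a: zscore_assay(assays[a]) for a in assay_ids}
    compounds = sorted({cid for a in assay_ids for cid in assays[a]})
    return {cid: [z[a].get(cid) for a in assay_ids] for cid in compounds}


def pearson_similarity(
    fp1: Fingerprint, fp2: Fingerprint, min_overlap: int = 3
) -> Optional[float]:
    """Pearson correlation over the assays in which both compounds were tested."""
    pairs = [(x, y) for x, y in zip(fp1, fp2) if x is not None and y is not None]
    if len(pairs) < min_overlap:
        return None  # too few shared assays to compare
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # a flat profile has no defined correlation
    return cov / math.sqrt(vx * vy)
```

Because the comparison uses only activity values, two compounds with no atoms in common can still score close to 1.0 if they behave alike across the assay panel.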

To test the utility of HTS-FP in virtual screening, HTS-FPs were compared with a two-dimensional (2D) chemical similarity searching method, ECFP4 (extended-connectivity fingerprints with a diameter of 4, i.e., atom environments of radius 2). Overall, for well-explored target classes such as protein kinases, which have many distinct chemotypes each represented by many close analogs, ECFP4 performed better at retrieving active compounds.
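For contrast, structure-based searching with a fingerprint such as ECFP4 typically scores compound pairs with the Tanimoto coefficient. The sketch below assumes the ECFP4 features have already been generated by a cheminformatics toolkit and are available as sets of integer identifiers; the identifiers shown are invented. It illustrates why a close analog scores highly while a structurally unrelated compound scores zero, whatever its biological activity.

```python
from typing import AbstractSet


def tanimoto(fp_a: AbstractSet[int], fp_b: AbstractSet[int]) -> float:
    """Tanimoto (Jaccard) coefficient: shared features / features present in either."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


# Made-up feature identifiers standing in for hashed ECFP4 atom environments.
probe = frozenset({101, 202, 303, 404})
analog = frozenset({101, 202, 303, 505})
scaffold_hop = frozenset({606, 707, 808})

print(tanimoto(probe, analog))        # 3 shared / 5 total = 0.6
print(tanimoto(probe, scaffold_hop))  # 0.0: no features in common
```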

However, because HTS-FP does not depend on chemical structure, it was far better at retrieving chemically diverse structures with similar biological activity. For example, in similarity searches using the phosphodiesterase type 4C (PDE4C) inhibitor rolipram as the probe, every compound selected by chemical similarity shared at least one moiety with rolipram, whereas the top 1% of compounds retrieved by HTS-FP included many structurally unrelated compounds, such as xanthine derivatives. Public data confirm PDE activity for these xanthine compounds, and one of them is reported to be specific for PDE4C.
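The retrieval step behind a comparison such as the rolipram example reduces to ranking the library by similarity to a probe and keeping the top 1%. The helper below is a hypothetical sketch that accepts any fingerprint type and any similarity function (for example, a correlation of activity profiles or a Tanimoto comparison of structural features), so the same code could produce both an HTS-FP hit list and an ECFP4 hit list for side-by-side inspection.

```python
import math
from typing import Callable, Dict, List, Tuple, TypeVar

FP = TypeVar("FP")  # any fingerprint type: activity vector, feature set, ...


def top_fraction_hits(
    probe: FP,
    library: Dict[str, FP],
    similarity: Callable[[FP, FP], float],
    fraction: float = 0.01,
) -> List[Tuple[str, float]]:
    """Rank every library compound by similarity to the probe and keep the top `fraction`."""
    scored = [(cid, similarity(probe, fp)) for cid, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    n_keep = max(1, math.ceil(fraction * len(scored)))  # always return at least one hit
    return scored[:n_keep]


# Toy usage: 300 compounds whose "fingerprint" is a single number; closer = more similar.
library = {f"cpd{i}": float(i) for i in range(300)}
hits = top_fraction_hits(0.0, library, lambda a, b: -abs(a - b))
print([cid for cid, _ in hits])  # ['cpd0', 'cpd1', 'cpd2']
```

Keeping the similarity function as a parameter means the two methods differ only in the descriptor, which is the comparison the study makes.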

Next, HTS-FP was used to build a biodiverse focused library. The entire ~1.5-million-compound Novartis collection was clustered using HTS-FPs, and a total of 710 384-well plates representing many HTS-FP clusters were chosen, maximizing biodiversity in a plate-based fashion. This selection was used for virtual screening to expand the collection. The biodiverse set was then compared with a chemically diverse set selected using ECFP4, and both libraries were compared with a random selection of plates. Finally, the percentage of actives retrieved by each library was assessed across a set of 13 Novartis assays.
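The summary does not spell out how the 710 plates were chosen. One plausible approach, shown here purely as an assumption, is a greedy maximum-coverage heuristic: each plate is represented by the set of HTS-FP cluster IDs of its compounds, and plates are added one at a time according to how many new clusters they contribute. The second function computes the evaluation metric described in the paragraph above, the percentage of an assay's actives that fall on the selected plates. Plate IDs, cluster IDs and function names are hypothetical.

```python
from typing import Dict, List, Set


def greedy_biodiverse_plates(plate_clusters: Dict[str, Set[int]], n_plates: int) -> List[str]:
    """Greedy maximum coverage: repeatedly take the plate adding the most unseen HTS-FP clusters."""
    covered: Set[int] = set()
    chosen: List[str] = []
    remaining = dict(plate_clusters)
    for _ in range(min(n_plates, len(remaining))):
        # Iterating in sorted order makes ties resolve to the alphabetically first plate.
        best = max(sorted(remaining), key=lambda p: len(remaining[p] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def percent_actives_retrieved(
    selected: List[str], plate_compounds: Dict[str, Set[str]], actives: Set[str]
) -> float:
    """Share (in %) of an assay's actives that sit on the selected plates."""
    if not actives:
        return 0.0
    on_selected = set().union(*(plate_compounds[p] for p in selected))
    return 100.0 * len(actives & on_selected) / len(actives)


plates = {"P1": {1, 2, 3}, "P2": {3, 4}, "P3": {4, 5, 6}, "P4": {1, 2}}
picked = greedy_biodiverse_plates(plates, n_plates=2)
print(picked)  # ['P1', 'P3']: together they cover all six clusters
compounds = {"P1": {"a", "b"}, "P2": {"c"}, "P3": {"d", "e"}, "P4": {"f"}}
print(percent_actives_retrieved(picked, compounds, actives={"a", "c", "d", "f"}))  # 50.0
```

Greedy selection is a standard approximation for maximum-coverage problems. Feeding in ECFP4-based cluster sets instead would give a chemically diverse comparison set under the same procedure.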

Again, HTS-FP outperformed both random selection and ECFP4, except in certain protein kinase assays. The HTS-FP–based biodiverse collection represents 15% of the entire HTS library but covered 37% of the actives across the 13 assays analyzed, and it also gave better coverage of diverse scaffolds than the structure-based method. The method also avoids selecting chemically similar structures that lack biological activity.

Finally, the biological relevance of the HTS-FP clusters was examined. Because the HTS data included cell-based assays, the clusters could be enriched for particular targets, pathways, or cellular functions. Enriched GO terms were identified by testing for GO category enrichment within each HTS-FP cluster, and HTS-FP clusters were more enriched for GO terms than random clusters (see Figure). Moreover, although compounds within an HTS-FP GO-group did not share the same target genes or structural features (see Figure), they had similar biological functions, suggesting that such an analysis could be used to construct focused libraries that target a phenotype.
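GO enrichment within a cluster is commonly scored with a one-sided hypergeometric test. The summary does not state which statistic the authors used, so the function below is an assumed, generic version: it asks how surprising it is that k of a cluster's n annotated items carry a given GO term when K of the N items in the background carry it. In practice, p-values computed for many clusters and terms would also need a multiple-testing correction, which this sketch omits.

```python
from math import comb


def go_enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P(X >= k).

    N: annotated items in the background (e.g. all targets linked to screened compounds)
    K: background items carrying the GO term
    n: items associated with one HTS-FP cluster
    k: cluster items carrying the GO term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    # Sum the probabilities of drawing i = k, k+1, ... term-carrying items.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Toy numbers: all 5 cluster items carry a term held by 5 of 10 background items.
print(go_enrichment_pvalue(k=5, n=5, K=5, N=10))  # 1/252, about 0.00397
```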

Figure. (A, B, D) Groups of compounds that belong to the same HTS-FP cluster and share the same GO term (biological processes [red], cellular components [green], and molecular functions [purple]), defined as GO-groups. (C) GO-group size distribution for each GO term enriched in HTS-FP clusters; the GO-group size distribution for random clusters is shown in black. The corresponding random-cluster distributions for molecular functions and cellular components are comparable and smaller, respectively, and are omitted here (Supplementary Figure S10 in the article).

Doug Auld, Ph.D., is affiliated with the Novartis Institutes for BioMedical Research.
