High-Quality Hits from High-Throughput Screens


October 1, 2018 (Vol. 38, No. 17)

Tamsin E. Mansley, Ph.D., Head of North American Operations, Optibrium
Peter A. Hunt, Ph.D., Director of Research, Optibrium
Edmund J. Champness, CSO, Optibrium
Matthew D. Segall, Ph.D., CEO, Optibrium

Optibrium Created a Multiparameter Approach to Identify Good SAR, Potent Compounds

High-throughput screening (HTS) campaigns are frequently carried out early in a drug discovery project for target validation and identification of validated hits. In HTS analysis, it is important to:

  • Quickly identify one or more hit series with high activity. Diversity between series is beneficial to provide backup strategies.
  • Ensure lead series exhibit structure-activity relationships (SARs) that indicate opportunities for further optimization.
  • Find compounds with good lead-like properties that provide high-quality starting points for hit-to-lead exploration. These include:

    1. Appropriate physicochemical properties.
    2. Good absorption, distribution, metabolism, and excretion (ADME) properties.
    3. An absence of frequent hitters (false positives) and high-risk functionalities.

To identify such compounds and series, a common practice is to apply filters to the typically large HTS datasets, for example, by setting thresholds for activity or for simple properties such as molecular weight and lipophilicity, or by flagging substructures that may indicate nonspecific binding. This practice, however, draws artificially harsh distinctions between compounds, given the inherent variability in HTS data and the low correlation between simple properties and the ultimate in vivo disposition of a compound.

Consequently, the common practice can lead to false positives (that is, selected active compounds that are not good starting points for further optimization) and false negatives (that is, potentially good compounds that have been inappropriately rejected). By contrast, a rigorous multiparameter approach gives appropriate weight to these data, making it possible to confidently identify high-quality, potent hits while avoiding missed opportunities.

Mapping the Chemical Space of Activity

The structural diversity of the compounds screened during a campaign can be explored by clustering or visualization of a chemical space, in which each point represents a single compound, and similar compounds are clustered together. Mapping compound activities onto the visualization using color makes it easy to see hotspots with multiple highly active compounds that might be interesting for further investigation.

One of the first steps in an HTS analysis is to determine, “What is a significant result?” One approach is to assess the distribution of the activity across the dataset and select those compounds with significantly higher activity than average. For example, a compound might reasonably be classified as a “hit” (Figure 1) if its activity is more than two standard deviations above the mean.

Figure 1. Chemical space of a 1000-member screening library. Some areas of chemistry possess no active compounds, whereas in other regions, clusters or hot spots of activity can be observed. A “hit” for this data set is defined by inhibition values >80%, where the mean is 31% and the standard deviation is 24%.
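
As a rough illustration of this hit definition, the Python sketch below computes a threshold of the mean plus two standard deviations and flags compounds above it. The function name `call_hits`, the compound identifiers, and the inhibition values are hypothetical; only the mean (31%) and standard deviation (24%) are taken from the Figure 1 caption.

```python
from statistics import mean, stdev

def call_hits(inhibition, n_sd=2.0):
    """Return (threshold, hit IDs) for compounds above mean + n_sd * SD.

    inhibition maps a compound ID to its % inhibition (hypothetical data).
    """
    values = list(inhibition.values())
    threshold = mean(values) + n_sd * stdev(values)
    hits = sorted(cid for cid, value in inhibition.items() if value > threshold)
    return threshold, hits

# Threshold implied by the summary statistics in the Figure 1 caption:
# 31% + 2 * 24% = 79%, in line with the >80% cut-off quoted there.
figure1_threshold = 31 + 2 * 24

# Hypothetical ten-compound example, not data from the article.
example = {
    "cpd-01": 20.0, "cpd-02": 25.0, "cpd-03": 30.0, "cpd-04": 35.0,
    "cpd-05": 20.0, "cpd-06": 25.0, "cpd-07": 30.0, "cpd-08": 35.0,
    "cpd-09": 30.0, "cpd-10": 95.0,
}
threshold, hits = call_hits(example)
print(f"threshold = {threshold:.1f}%, hits = {hits}")
```

A threshold defined this way depends on the distribution of the whole dataset, so it shifts if the library or the assay conditions change.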

Understanding the Activity Landscape

Another goal in the analysis of HTS data is to identify a series containing potent compounds with good SARs. This provides confidence that hits are genuine and not the result of assay interference or impurities. Consistent SARs may also indicate opportunities for further optimization.

Analyses, such as the activity landscapes displayed by StarDrop™ software solutions, quickly highlight interesting regions of SARs (Figure 2). This method highlights the difference in potency between every pair of similar compounds in a set:

  • Variable regions highlight large changes in activity resulting from small changes in structure and indicate interesting SARs.
  • “Flat spots” indicate limited opportunity for optimization of activity. However, they highlight opportunities to optimize other properties without a negative impact on activity.

Figure 2. Extract from an activity landscape visualization. Layout (A) is an example of a “flat spot”; layout (B) illustrates a variable region.
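
The pairwise idea behind an activity landscape can be sketched in a few lines of Python. This is a generic illustration, not the StarDrop algorithm, which the article does not describe. It assumes that fingerprints are represented as sets of feature identifiers, that similarity is measured with the Tanimoto coefficient, and that the cut-offs `sim_cutoff` and `cliff_delta` are chosen arbitrarily; all names and data are hypothetical.

```python
from itertools import combinations

def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints stored as sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def landscape_pairs(compounds, sim_cutoff=0.6, cliff_delta=1.0):
    """Label each pair of similar compounds as 'variable' or 'flat'.

    compounds maps an ID to (fingerprint set, potency such as pIC50).
    Pairs below sim_cutoff are ignored; the rest are labeled by the
    absolute potency difference relative to cliff_delta.
    """
    items = sorted(compounds.items(), key=lambda kv: kv[0])
    pairs = []
    for (id_a, (fp_a, pot_a)), (id_b, (fp_b, pot_b)) in combinations(items, 2):
        similarity = tanimoto(fp_a, fp_b)
        if similarity < sim_cutoff:
            continue
        delta = abs(pot_a - pot_b)
        label = "variable" if delta >= cliff_delta else "flat"
        pairs.append((id_a, id_b, similarity, delta, label))
    return pairs

# Hypothetical compounds: (feature set, pIC50)
compounds = {
    "A": ({1, 2, 3, 4}, 6.0),
    "B": ({1, 2, 3, 5}, 6.1),
    "C": ({1, 2, 3, 4, 6}, 7.5),
    "D": ({7, 8, 9}, 5.0),
}
pairs = landscape_pairs(compounds)
for id_a, id_b, similarity, delta, label in pairs:
    print(f"{id_a}-{id_b}: similarity={similarity:.2f}, delta={delta:.2f}, {label}")
```

In this toy set, the A-C pair is similar in structure but differs markedly in potency, which is the kind of region that suggests an interesting SAR, whereas A-B behaves like a flat spot.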

Targeting High-Quality Hit Series

While drug candidates should have a SAR consistent with potency, they are more likely to be successful if they also balance appropriate selectivity, ADME, and physicochemical properties. Considering these properties as early as possible, using multiparameter optimization (MPO), is important when assessing a potential hit or lead series. A high-quality lead compound may demonstrate a combination of:

  • Low molecular weight, offering more flexibility for structural optimization.
  • Low lipophilicity, reducing the risk of off-target effects and providing a better chance of good solubility/permeability.
  • Appropriate ADME properties, tailored to the project’s therapeutic objectives.
  • An absence of undesirable structural features, for example, those found in pan-assay interference compounds (PAINS) [1], as these may be promiscuous binders, resulting in false positives.

A common approach to selecting compounds based on multiple properties is to apply a series of filters to the data, a simple example of which is shown in Figure 3A. In this example, although numerous compounds in the 1000-member screening library might be considered hits based on the inhibition data, only 4 of them also pass filters for 3 additional simple lead-like properties.

Any filtering approach risks inappropriately excluding potentially good compounds by ignoring the relative importance of each criterion and uncertainty in the data. Uncertainty can come from a variety of sources:

  • Experimental variability in the assay, assessed by the variability observed for reference compounds or the standard error in the mean of replicate measurements.
  • Statistical uncertainty for in silico predictions.
  • The relevance of the property to the outcome of the project. For example, many compounds with PAINS alerts are not frequent hitters, and several successful drugs contain PAINS alerts [2].
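
For the first source of uncertainty, the standard error in the mean of replicate measurements is the sample standard deviation divided by the square root of the number of replicates. A minimal Python sketch, with hypothetical replicate values and a hypothetical function name, might look like this:

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(replicates):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicate measurements")
    return stdev(replicates) / sqrt(len(replicates))

# Hypothetical % inhibition replicates for one compound.
replicates = [78.0, 82.0, 85.0]
average = mean(replicates)
sem = standard_error(replicates)
print(f"{average:.1f} +/- {sem:.1f} % inhibition")
```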

A more appropriate method for MPO will take into consideration the uncertainty in the data and weight the input data based upon their relevance. An example of this is Probabilistic Scoring [3] in StarDrop, where compounds are scored based upon the likelihood of their meeting the criteria in a scoring profile, as defined by the project’s objectives. In this approach, desirability functions can be specified for each property, rather than employing hard cut-offs.

In addition, this method explicitly takes into consideration any uncertainty in the data, to avoid rejecting compounds inappropriately where the data do not confidently determine their outcome against the criteria. The resulting MPO scores have known uncertainties, which makes it clear when compounds can be selected or rejected with confidence.
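
The contrast between a hard filter and an uncertainty-aware score can be sketched as follows. This is a simplified illustration and not the Probabilistic Scoring implementation in StarDrop. It assumes Gaussian measurement error, simple threshold criteria, and a desirability of 1 for meeting a criterion and (1 - importance) for failing it; the profile, property values, and uncertainties are all hypothetical.

```python
from math import erf, sqrt

def prob_below(value, sd, limit):
    """P(true value <= limit), assuming Gaussian error with std dev sd."""
    if sd == 0:
        return 1.0 if value <= limit else 0.0
    return 0.5 * (1.0 + erf((limit - value) / (sd * sqrt(2.0))))

def passes_hard_filters(compound, profile):
    """Classic filtering: every measured value must meet its cut-off."""
    for prop, (direction, limit, _importance) in profile.items():
        value, _sd = compound[prop]
        ok = value <= limit if direction == "below" else value >= limit
        if not ok:
            return False
    return True

def probabilistic_score(compound, profile):
    """Product over properties of the expected desirability.

    Assumption: desirability is 1 when a criterion is met and
    (1 - importance) when it is not, so less important criteria
    penalize the score less.
    """
    score = 1.0
    for prop, (direction, limit, importance) in profile.items():
        value, sd = compound[prop]
        p_below = prob_below(value, sd, limit)
        p_meet = p_below if direction == "below" else 1.0 - p_below
        score *= p_meet + (1.0 - p_meet) * (1.0 - importance)
    return score

# Hypothetical profile: property -> (direction, limit, importance in [0, 1])
profile = {
    "inhibition": ("above", 80.0, 1.0),
    "logP": ("below", 3.0, 0.5),
}
# Hypothetical compound: property -> (measured or predicted value, std dev)
compound_x = {"inhibition": (90.0, 5.0), "logP": (3.2, 0.5)}

filtered_in = passes_hard_filters(compound_x, profile)
score_x = probabilistic_score(compound_x, profile)
print(f"passes hard filters: {filtered_in}, score: {score_x:.2f}")
```

Here the hard filter rejects the compound because its logP is marginally above the cut-off, whereas the score remains well above zero because the uncertainty in logP means the compound may still meet the criterion.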

Figures 3B & 3C illustrate the Probabilistic Scoring approach. The Simple Profile strategy (Figure 3B) considers the same properties as the filtering strategy (Figure 3A) but applies desirability functions weighted by the property’s importance. The selection (yellow) includes all the high-scoring compounds that cannot be statistically differentiated (compounds where the error bars overlap with those of the top-scoring compound, as shown by the inset “snake” plot).
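
The selection rule described above, keeping every compound whose error bar overlaps that of the top-scoring compound, can be sketched in Python as follows. Treating each error bar as the score plus or minus a single symmetric error value is an assumption, and the scores and names are hypothetical.

```python
def select_indistinguishable(scores):
    """Return IDs whose error bars overlap the top-scoring compound's.

    scores maps a compound ID to (score, error), where the error bar is
    assumed to span score - error to score + error.
    """
    top_id = max(scores, key=lambda cid: scores[cid][0])
    top_score, top_error = scores[top_id]
    lower_bound = top_score - top_error
    return sorted(
        cid for cid, (score, error) in scores.items()
        if score + error >= lower_bound
    )

# Hypothetical MPO scores with uncertainties.
example_scores = {
    "A": (0.70, 0.10),
    "B": (0.62, 0.08),
    "C": (0.55, 0.04),
    "D": (0.30, 0.05),
}
selection = select_indistinguishable(example_scores)
print(selection)
```

Compound B is retained even though its score is lower than that of A, because the two cannot be separated given their uncertainties.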

Sampling on this basis highlights a much more diverse sample across chemical space, identifying several interesting compounds that would otherwise have been missed by filtering. The balanced profile in Figure 3C includes ADME properties desirable for a high-quality lead, in addition to potency and physicochemical properties. While this selection also samples broadly across the chemical space, some additional series are excluded, and we can have even greater confidence that the remaining compounds have a good balance of properties.

Figure 3. Comparison of compound selection strategies using (A) filtering, (B) Simple Profile including desirability functions and relative importance, and (C) MPO scoring using a balanced project profile. Compound selections (yellow) include those compounds with scores that cannot be differentiated from the highest-scoring compound with statistical confidence.


Several factors need consideration when prioritizing compounds and series from an HTS campaign. High-quality compounds will exhibit potency combined with a balance of lead-like physicochemical and ADME properties, making them amenable to hit-to-lead optimization. Demonstration of SARs within a series provides opportunities and directions for further optimization.

Assessing the quality of hit compounds using filters risks excluding potentially interesting compounds due to the inherent uncertainty in the data available during drug discovery. Instead, employing a rigorous approach to MPO can ensure that any uncertainties in the data are taken into consideration, enabling the identification of high-quality, potent compounds quickly. From this, one can confidently select structurally diverse series for further exploration and avoid missed opportunities from any HTS campaign.

Tamsin E. Mansley, Ph.D. (tamsin.mansley@optibrium.com), is head of North American operations; Peter A. Hunt, Ph.D. (peter.hunt@optibrium.com), is director of research; Edmund J. Champness (edmund.champness@optibrium.com) is CSO; and Matthew D. Segall, Ph.D. (matthew.segall@optibrium.com), is CEO at Optibrium.

1. Baell, J.B., Holloway, G.A. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 2010; 53(7): 2719–2740. DOI: 10.1021/jm901137j
2. Capuzzi, S.J., Muratov, E.N., Tropsha, A. Phantom PAINS: Problems with the Utility of Alerts for PAn-Assay INterference CompoundS. J. Chem. Inf. Model. 2017; 57(3): 417–427. DOI: 10.1021/acs.jcim.6b00465
3. Segall, M.D. Multi-Parameter Optimization: Identifying High-Quality Compounds with a Balance of Properties. Curr. Pharm. Des. 2012; 18(9): 1292–1310. DOI: 10.2174/138161212799436430

