Kevin Teburi, Director of Informatics and IT, AstraZeneca
Oliver Leven, Ph.D., Head of Professional Services, Genedata

Consistent Analysis Workflows Enhance Experiment Interpretation

Improved automation and miniaturization have enabled scientists in pharmaceutical R&D to run more experiments, and more complex experiments, than ever before, but this comes at a price. Simple data handling and processing (not to be confused with scientific data analysis and interpretation) consume a large proportion of researcher time. The majority of the time nominally spent on data analysis is devoted to tedious and repetitive data handling tasks: file export and import, cut-and-paste operations, sorting, and other spreadsheet manipulations. The result: scientists have very little time left for interpreting results and planning experiments.

This article illustrates prevalent antipatterns in drug discovery processes that impede a scientist's efficiency (e.g., juggling different software packages, Excel, or pipelining tools) and highlights effective operational patterns that overcome these inefficiencies (e.g., a consistent data analysis setup from instrument to data warehouse). AstraZeneca's modern infrastructure shows how a pharmaceutical R&D organization can implement a controlled process that empowers scientists to understand and interpret research better, ultimately enabling cost-effective and innovative drug discovery.

Situation

The backbone of most pharmaceutical R&D organizations comprises various information technology (IT) systems that support research. Logistics systems, such as assay and compound registration systems, support experiment setup. Data analysis systems (e.g., for screening data) and electronic lab notebooks support the actual data analysis and results capture. However, many of these systems were developed to cope with a limited number of simple experiments [e.g., enzyme-linked immunosorbent assays (ELISAs)] and do not support today's projects, which involve a large number of complex assays, such as experiments using surface plasmon resonance instruments. Complex experiments often require elaborate data preprocessing combined with sophisticated statistical analysis. Preprocessing is frequently done in the instrument software, yielding intermediate results; these results are then uploaded into a separate statistical program for further analysis. All of this consumes a scientist's data analysis time, resulting in:

  • Manual workflows: Manually transferring data between different systems (from plate registration to the instrument software, and from instrument software to the statistical analysis program) takes effort and introduces errors. To view the underlying raw data at a later stage, items such as sensorgrams or images must be manually copied and pasted one by one (see the sketch after this list).
  • Lack of comparability: When the same experiment is run by multiple scientists, on similar yet different instruments, or at different sites, the scientists usually cannot apply identical calculations, algorithms, or thresholds, which brings reproducibility into question. And if data transfer includes manual copy-and-paste steps, data-handling errors may occur that can have a dramatic effect on decisions and follow-on investments.
  • Nonscalable result sharing: Sharing data and results via spreadsheets and PDFs works between two scientists or within a small group, but it does not scale. Properly analyzing and interpreting results in the data warehouse requires full access to the underlying raw data.
  • Hurdles to outsourcing: Often, data analysis conducted at a partner site is not transparent when spreadsheets are exchanged between the parties, requiring the receiving scientists to rerun the analysis. This results in duplicated effort, reduced efficiency, and questionable data integrity.
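
To make the first of these points concrete, the sketch below shows the kind of export/reshape/import glue that scientists end up performing by hand or scripting ad hoc. It is a minimal illustration; the file names and column layout are hypothetical, and real instrument exports differ.

```python
# Minimal sketch of the manual export/import "glue" described above.
# File names and column layout are hypothetical; real instrument exports differ.
import csv

# Step 1: an export from the instrument software lands as a CSV on a share.
with open("instrument_export.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Step 2: reshape by hand-coded rules so the statistics package accepts it.
reshaped = [
    {"well": r["Well"], "signal": float(r["Raw Signal"])}
    for r in rows
    if r.get("Raw Signal")  # silently drops wells with missing readings
]

# Step 3: write yet another file for upload into the statistics program.
with open("stats_input.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["well", "signal"])
    writer.writeheader()
    writer.writerows(reshaped)
```

Every such step is another chance to lose, duplicate, or mislabel data, and none of it is traceable after the fact.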

Enter the Antipattern

An antipattern is a common response to a recurring problem that usually proves ineffective and risks being highly counterproductive.1 Two key elements formally distinguish an antipattern from a simple bad habit, practice, or idea:

  1. A commonly used process, structure, or pattern of action that, despite initially appearing to be an appropriate and effective response to a problem, typically has more bad consequences than good ones.
  2. Another solution exists that is documented, repeatable, and proven to be effective.2

In R&D, spreadsheets and other flexible systems are examples of antipatterns. Thanks to its flexibility, the ubiquitous spreadsheet has become the glue that connects the different steps of the data analysis workflow. Although spreadsheets appear to streamline the process, they entail manual labor and increase the risk of significant and costly errors.

Enter the Antidote

An integrated and automated software platform enables scientists to eliminate manual data handling from most of their data analysis and to adopt consistent workflows, methods, and practices. Platform maintenance is not their concern; it is handled by research IT. Platform changes due to evolving methodology or practices (e.g., adding new input sources) are driven by the scientists, compiled by the system owner, and realized by research IT. To maintain good structures and clean metadata, IT has authority over result definitions, methods, and vocabularies across the enterprise. Although this approach may seem to reduce a scientist's flexibility, it actually increases it: the platform eliminates mundane data-handling steps and provides proven analysis options, enabling proper evaluation of new approaches and easy adoption.
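
As a rough illustration of such central ownership, the following sketch shows how an approved-methods registry might be enforced in code. All names and structures here are hypothetical illustrations, not any vendor's actual API.

```python
# Hypothetical sketch of centrally owned analysis-method definitions.
# Research IT owns the registry; scientists pick from approved methods.
APPROVED_METHODS = {
    # method name -> (version, callable computing the result)
    "percent_inhibition_v2": (
        2,
        lambda signal, low, high: 100.0 * (high - signal) / (high - low),
    ),
}

def run_method(name, **kwargs):
    """Run an approved method; unapproved methods must go via the system owner."""
    if name not in APPROVED_METHODS:
        raise KeyError(f"'{name}' is not an approved method; request it via the system owner")
    version, fn = APPROVED_METHODS[name]
    # Every result carries the method name and version that produced it.
    return {"method": name, "version": version, "value": fn(**kwargs)}

print(run_method("percent_inhibition_v2", signal=420.0, low=100.0, high=900.0))
```

Because the registry is the only path to a result, every reported value is traceable to the exact method and version that produced it.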

AstraZeneca’s Antidote to the Antipattern

Prior to 2012, AstraZeneca used a collection of homegrown and proprietary software for the analysis of screening data. Five different software packages were used across seven global sites, all feeding into AstraZeneca's federated results database, IBIS.

While each software package generally worked well for its specific user group, this heterogeneous, complex infrastructure (Figure 1) led to issues in four key areas:

  1. Nomenclature misalignments occurred between user groups; data could not be compared without (expensive) translation.
  2. Differences in the data analysis methods used to generate results sometimes required recalculation before onward analysis could commence.
  3. The potential for local data silos increased: whenever generated data was not transferred to the corporate database, the full breadth of the data could not be considered.
  4. Difficulties arose with result sharing and comparability between sites and with outsourcing partners.

These issues had knock-on effects in several areas. For example, collaboration opportunities were delayed because simply collecting the data required more effort than anticipated. The screening processes suffered efficiency lags due to the lack of harmonization across the organization. Furthermore, sharing results internally and externally was difficult because each system used a different analysis paradigm.


Figure 1. AstraZeneca screening landscapes before and after standardization on a single platform, which simplifies IT, harmonizes infrastructure, and connects all screening technologies with a common data pipeline.

In 2013, AstraZeneca globally implemented Genedata Screener® as its single platform for screening data capture and analysis. The move achieved the triple effect of rationalizing, standardizing, and harmonizing the plate-based screening data analysis process.

More results were captured with the single platform than previously recorded (Figure 2). This was attributed to several factors: the same global process created new efficiencies; consistent and fast analysis was realized globally; a single analysis methodology reduced ambiguity in results interpretation; and results from complex screening technologies could be captured more easily, which increased the willingness to report them centrally.

These factors were catalysts for standardizing analysis methods so that result sets could be generated with minimal user interaction. This step required the scientific community to trust the process methodology and to understand that freeing more time for the value-adding parts of the screening process, such as biological interpretation rather than results generation, was a positive. For example, an algorithm now handles curve fitting automatically, providing consistency and full traceability within the analysis session, in contrast to the previous scenario of manually adjusting curve-fitting equations (a generic sketch of such a fit appears below). This infrastructure enables AstraZeneca to collaborate quickly and easily with the research community. It also helps the organization adopt new tools, such as a collaboration server environment that serves as a knowledge-sharing platform, eliminating the need to recreate the data processing path from scratch every time AstraZeneca scientists want to reanalyze external data.
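
For illustration, automated curve fitting of dose-response screening data is commonly implemented as a four-parameter logistic (4PL) fit. The sketch below uses SciPy's curve_fit on synthetic data; it is a generic example of the technique, not the platform's actual algorithm, and all concentrations and responses are hypothetical.

```python
# Minimal sketch of automated 4PL dose-response fitting (synthetic data;
# not the actual algorithm of any specific screening platform).
import numpy as np
from scipy.optimize import curve_fit

def four_pl(x, bottom, top, ic50, hill):
    """Four-parameter logistic: response as a function of concentration x."""
    return bottom + (top - bottom) / (1.0 + (x / ic50) ** hill)

conc = np.array([1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4])  # molar, hypothetical
resp = np.array([98.0, 95.0, 80.0, 45.0, 12.0, 5.0])   # % activity, hypothetical

# The same starting guesses are applied to every curve: consistency by design,
# instead of each scientist hand-tuning the fit.
p0 = [0.0, 100.0, 1e-7, 1.0]
params, _ = curve_fit(four_pl, conc, resp, p0=p0, maxfev=10000)
bottom, top, ic50, hill = params
print(f"IC50 = {ic50:.2e} M, Hill slope = {hill:.2f}")
```

Running the identical fitting procedure on every plate is what makes results comparable across scientists, instruments, and sites.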


Figure 2. AstraZeneca weekly result-set capture rate before and after screening standardization. (A result set is the set of all results from a single experiment batch.)

Design Principles Improve Scientific Insights

AstraZeneca’s screening analysis platform is rooted in:

  1. Harmonized data analysis workflows: All analysis methods are implemented and approved centrally, avoiding duplication and equivalent methods.
  2. Technology-specific support: There is no need to run specific portions of the analysis (for a complex screening technology) outside of the platform.
  3. Automation: Raw data import is automated and avoids file handling. For example, in a high-content system, features are imported directly from the instrument without any file handling.
  4. Single method, single formula: Every method is implemented only once, and method life cycle management is in place.
  5. Application of the "rule and exception" principle: All entities in a screening experiment are treated the same; any exception from the rule must be fully traceable (see the sketch below).
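
As an illustration of the fifth principle, the minimal sketch below (a hypothetical structure, not the platform's implementation) forces every exception, such as masking an outlier well, to carry a documented reason and an audit trail.

```python
# Hypothetical sketch of the "rule and exception" principle: every well is
# treated the same unless an exception is recorded with a traceable reason.
from dataclasses import dataclass, field

@dataclass
class Well:
    well_id: str
    signal: float
    masked: bool = False
    audit: list = field(default_factory=list)

    def mask(self, user: str, reason: str):
        """Exceptions are allowed, but never silent: user and reason are logged."""
        if not reason:
            raise ValueError("an exception from the rule requires a documented reason")
        self.masked = True
        self.audit.append(f"masked by {user}: {reason}")

plate = [Well("A1", 1250.0), Well("A2", 43.0), Well("A3", 1310.0)]
plate[1].mask(user="analyst1", reason="dispensing error flagged by liquid handler log")

# The rule: all unmasked wells enter the calculation identically.
mean_signal = sum(w.signal for w in plate if not w.masked) / sum(not w.masked for w in plate)
print(mean_signal, plate[1].audit)
```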

A comprehensive screening platform based on these design principles provides an effective antidote to pervasive antipatterns in R&D, allowing scientists to execute analytical workflows more efficiently while maintaining the flexibility needed for innovative experimental research. This gives researchers more time for interpreting biological results and planning experiments, the areas where their expert insight adds the greatest value to the R&D process.

References
1. Budgen, D. Software Design. Harlow, England: Addison-Wesley, 2003, p. 225. ISBN 0-201-72219-4. "As described in Long (2001), design anti-patterns are 'obvious, but wrong, solutions to recurring problems'."
2. Anti-pattern. Wikipedia. https://en.wikipedia.org/wiki/Anti-pattern

Kevin Teburi ([email protected]) is director of informatics and IT, iMed team leader, R&D information at AstraZeneca. Oliver Leven, Ph.D. ([email protected]), is head of professional services for the Genedata Screener business unit at Genedata. 

 
