September 15, 2010 (Vol. 30, No. 16)
Pharmaceutical Firms and Biotech Companies Begin to Take Close Look at Evolving Technology
When the National Center for Biotechnology Information at the NIH began its 1000 Genomes Project, it turned to cloud computing to handle the massive file transfers and store the data. The FDA is using cloud computing, too, as it updates and virtualizes its data centers.
Watching this, biopharmaceutical companies are beginning to explore cloud options, looking initially at business applications and noncritical data before considering cloud computing for regulated or patentable data.
Cloud computing is a potentially disruptive technology that makes inexpensive, pay-on-demand computing practical. It blends the concepts of grid computing, virtualization, software as a service (SaaS), and service-oriented architecture to produce a computing environment that resembles the Internet.
Unlike a traditional computing environment in which resources reside on specific servers, cloud-based resources—including servers and applications—reside in a distributed, globally accessible environment.
Pfizer’s research and development organization is taking advantage of clouds’ flexibility to analyze large datasets, develop models in antibody docking runs, optimize drug candidates, and select targets. “We have about 500 applications running in the cloud,” according to Mike Miller, Ph.D., senior director for R&D high-performance computing.
The use of clouds has shortened some of those runs from two to three days to two to three hours. Those gains were possible because the powerful applications running in a cloud could access as much simultaneous computing power as necessary, rather than relying on the finite number of servers in Pfizer’s data center.
Likewise, when Eli Lilly needed to analyze a drug it was developing, it paid $89 to access Amazon’s EC2 web service rather than add 25 servers to its data center, along with their software licenses, maintenance fees, data center space, electricity, and IT labor, according to Philip Sheibley, life sciences industry director for Accenture’s Life Sciences Practice.
With benefits like that, “there’s a lot of interest in clouds among our clients,” he says, but they are concerned about the challenges and regulatory environment.
The current trend, Sheibley says, is to test cloud computing and cloud-based services in safe environments, such as back-office applications for human resources, IT, and finance, and with nonpatient data. “There’s not much R&D work under way in the cloud,” he says, but that will change as more applications are developed.
The next trend will incorporate clinical data. That is where regulatory concerns become a factor. That shift is already under way, according to Ronald Ranauro, president and CEO of GenomeQuest. His company provides SaaS and Platform as a Service as a suite of offerings for bioinformatics developers and end users.
GenomeQuest and others like it are leveling the playing field by making high-performance computing and computationally intense applications available on an as-needed basis. Consequently, computationally strapped companies can run the same robust applications as their more cash-flush competitors.
Compliance
“Regulators are coming to the conclusion that it’s not in the public interest to put up barriers to the use of cloud technology,” Sheibley says.
As FDA spokesman Crystal Rice explains, “We are currently discussing cloud computing. We are attempting to develop a road map and will be exploring various areas where cloud computing may be effectively used. Unfortunately, we have not narrowed down our discussions to answer questions specific to drug development.”
Consequently, Mike Eaton, CEO at Cloudworks, advises companies operating in the cloud to expect lots of questions designed to assure regulators of data integrity.
“The cloud provider should be willing to work with you to address compliance requirements. If they are not, they are not the best fit for your company,” Eaton stresses.
“When moving to a cloud environment, some of the compliance burden will shift from your internal staff to the cloud provider, relieving some of the internal resources.” But, from a regulatory perspective, ultimate responsibility for compliance remains with the biopharma company even when the day-to-day responsibility is outsourced.
One of the benefits offered by cloud-based services is robust security. “There are strong reasons why security could and should be better in a cloud,” Sheibley stresses, but not all providers have those security features in place yet.
Working with a well-established cloud provider like IBM, Google, or Amazon generally provides more robust security than a small to mid-sized company could afford, Eaton says, so cloud computing may actually improve security.
To further enhance cloud security, Pfizer took advantage of Amazon’s Virtual Private Cloud (VPC). “It uses resources in a ring-fenced way,” Dr. Miller explains. Basically, the VPC physically partitions machines and creates a network subnet to isolate virtual machines so they are not available to anyone else in the cloud. “That was one of the key innovations that helped us move forward.” Then his team could do the compliance testing.
Another powerful driver, Ranauro says, is that widely dispersed colleagues can share data without actually moving the data, thus ensuring access to the most current datasets. In a grid-computing environment, in contrast, users would transfer a dataset with its application to another powerful computer, or would break the project into modules that would be handled by many different mainframes. (CERN’s Large Hadron Collider computing grid, for example, has more than 170 computing centers in 34 countries, involving more than 100,000 CPUs.)
The ability to share data via a cloud has been enhanced by a data-transfer protocol developed by Aspera. The problem, according to Michelle Munson, Aspera president and co-founder, is that common file-transfer protocols such as FTP and HTTP, which run over TCP, intentionally slow data transmission, having been developed for the capabilities of the Internet many years ago.
The file-transfer protocol her company has developed, the “fast and secure protocol,” dubbed fasp, eliminates the bottleneck of earlier transfer methods. It allows files to be transferred thousands of times faster than standard TCP-based transfer protocols over high-bandwidth, long-distance networks by providing full utilization of the available bandwidth and precise control over the bandwidth that is used for each transfer. As a result, high-priority traffic can be allocated more bandwidth than low-priority traffic.
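To see why a TCP-based transfer can fall so far short of the available bandwidth on a long-distance link, consider a rough calculation. A single TCP stream can have at most one window of data in flight per round trip, so its throughput is capped at the window size divided by the round-trip time. The sketch below applies that bound. The window size, round-trip time, link speed, and file size are illustrative assumptions, not figures from Aspera, and the code does not model packet loss or describe how fasp itself works.

```python
def tcp_throughput_ceiling_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-stream TCP throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


def transfer_hours(file_gigabytes: float, rate_mbps: float) -> float:
    """Hours needed to move a file of the given size at a sustained rate."""
    bits = file_gigabytes * 8 * 1_000_000_000
    return bits / (rate_mbps * 1_000_000) / 3600


# All four values below are assumptions chosen for illustration.
WINDOW_BYTES = 65_535   # classic TCP window without window scaling
RTT_SECONDS = 0.2       # intercontinental round trip
LINK_MBPS = 1_000.0     # a 1 Gbps link
FILE_GB = 100.0         # a large sequencing dataset

tcp_rate = tcp_throughput_ceiling_mbps(WINDOW_BYTES, RTT_SECONDS)
print(f"TCP ceiling: {tcp_rate:.2f} Mbps")
print(f"At TCP ceiling: {transfer_hours(FILE_GB, tcp_rate):.1f} hours")
print(f"At full link rate: {transfer_hours(FILE_GB, LINK_MBPS):.2f} hours")
```

Under these assumptions the single stream tops out at under 3 Mbps on a 1 Gbps link, which is the kind of gap a protocol designed to fill the available bandwidth is meant to close.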
Cloud computing is relatively new, and many enterprise-level applications that can run in a cloud are not licensed to do so. Therefore, moving legacy applications into a cloud may be difficult or even impossible. Before migrating to a cloud, Dr. Miller worked with software vendors to ensure that the software licenses were set up to accommodate off-site instances and virtual environments.
Operating in a virtual environment in-house creates a similar licensing conundrum. Because software typically is licensed to run on a given number of servers, there is confusion among some vendors as to whether eight virtual servers running on one physical server count as one or eight machines for the license agreement.
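The two readings of such a license can be made concrete with a small sketch. The `Host` type and both counting functions are hypothetical names introduced here for illustration; they do not represent any vendor's actual licensing terms.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    """A physical server and the number of virtual servers it runs (hypothetical model)."""
    name: str
    virtual_servers: int


def licenses_by_physical_host(hosts: List[Host]) -> int:
    """Interpretation 1: one license per physical machine."""
    return len(hosts)


def licenses_by_virtual_server(hosts: List[Host]) -> int:
    """Interpretation 2: one license per virtual server."""
    return sum(h.virtual_servers for h in hosts)


# The case described above: eight virtual servers on one physical server.
hosts = [Host(name="host-a", virtual_servers=8)]
print(licenses_by_physical_host(hosts))   # 1
print(licenses_by_virtual_server(hosts))  # 8
```

The eightfold difference between the two counts is why it pays to settle the interpretation with the vendor before virtualizing.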
Users underestimate what is still required, Sheibley says. “You still need a robust IT organization to manage the cloud environment, including security and access.”
“There’s definitely a learning curve,” Dr. Miller agrees. “We learned early on that it’s easier to adapt to the way Amazon does things than to force its services to fit our working model. Amazon sells components, and you need to assemble them into something useful. They’re very robust, but very limited,” he cautions, advising users not to try to force them to do things for which they were not intended.
Although cloud computing is an on-demand, pay-for-use resource, accessing that resource is not instantaneous, Dr. Miller points out. In Amazon’s EC2 environment, he says there typically is a 15- to 20-minute window between the time IT makes a request to bring resources online and the time that happens.
Also, he says that the number of virtual instances needed isn’t always available. “For example, if you require 50, you may get 48 within 15 minutes, but if you can wait one hour, you may get all 50,” he says. In contrast, provisioning physical machines in one’s own data center may take three weeks.
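The trade-off Dr. Miller describes, starting with the capacity on hand or waiting longer for the full request, can be sketched as a simple polling loop. Everything here is hypothetical: `simulated_running_count` merely mimics the 48-then-50 pattern from his example, and no real cloud provider API is called.

```python
from typing import Callable, Tuple


def wait_for_capacity(
    requested: int,
    running_count: Callable[[int], int],
    max_wait_minutes: int,
    poll_every_minutes: int = 5,
) -> Tuple[int, int]:
    """Poll until the request is filled or the deadline passes.

    Returns (instances_available, minutes_waited). Time is simulated,
    so the function never actually sleeps.
    """
    elapsed = 0
    available = running_count(elapsed)
    while available < requested and elapsed < max_wait_minutes:
        elapsed = min(elapsed + poll_every_minutes, max_wait_minutes)
        available = running_count(elapsed)
    return min(available, requested), elapsed


def simulated_running_count(minutes: int) -> int:
    """Hypothetical provider behavior modeled on the example in the text."""
    if minutes < 15:
        return 0
    if minutes < 60:
        return 48
    return 50


print(wait_for_capacity(50, simulated_running_count, max_wait_minutes=15))  # (48, 15)
print(wait_for_capacity(50, simulated_running_count, max_wait_minutes=60))  # (50, 60)
```

Whether to proceed with 48 instances or wait for all 50 then becomes a deadline the research team sets rather than a limit imposed by the data center.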
“You don’t know where the science will take you,” Dr. Miller stresses, so “the ability to spin up resources within a few minutes rather than within a few weeks lets researchers pursue avenues of inquiry at the pace of decisions rather than the pace of computing. Clouds allow us to be a lot more nimble, to deliver a result in a timely manner.”
Management
“The ability to manage the data effectively is where a lot of biotech companies are struggling,” Dr. Miller says. “We extended our internal network, but users have to tell me whether they want to run on Amazon’s cloud or on internal resources.”
Pfizer makes that choice available only to its expert users: computational scientists in biology, chemistry, pharmaceutical sciences, or statistics. The behavior of the internal and cloud-based instances is identical, so their decisions may be governed by the need for computing power or the need to maintain physical control of the data, among other potential reasons.
Important Questions to Ask
When entering a cloud environment, biopharmaceutical companies should ask:
1. How long has the vendor provided cloud services? Cloud computing is still relatively new, and few have used it for more than five years.
2. How is data security ensured? All data moving to and from the cloud must be encrypted, but additional security features must be in place; for example, virtual machines need to be segregated from other users.
3. How do they handle redundancy? Virtual machines do fail, but several methods are available beyond physical redundancy, including load balancing, automatic failover, and send/receive monitoring.