Cloud computing sounds like the perfect solution: outsource the difficulty of running a data center to someone with more experience and equipment. There is no need for a large capital investment, and you pay only for the capacity you need at the time, with minimal delays in scaling up or down.
In the case of infrastructure-as-a-service (IaaS), you still have to set up the software to run on the instances you're allocated. With software-as-a-service (SaaS), even this hurdle is removed. However, cloud computing is still vulnerable to many of the same problems that in-house data centers face.
Did You Remember the Umbrella?
Human error is the biggest threat. Two months ago, VMware's Cloud Foundry service suffered some unplanned downtime (the entire network infrastructure was disabled), an incident more severe than an earlier outage due to electrical problems.
Cloud Foundry's official blog described the situation: “Unfortunately, at 10:15am PDT, one of the operations engineers developing the playbook touched the keyboard. This resulted in a full outage of the network infrastructure sitting in front of Cloud Foundry. This took out all load balancers, routers, and firewalls; caused a partial outage of portions of our internal DNS infrastructure; and resulted in a complete external loss of connectivity to Cloud Foundry.”
Of course, one would expect other cloud services to be more robust, but internal data centers are exposed to the same threat: problems hiding between the keyboard and the chair.
So what can one do? When it comes to human error, the best description is “when it rains, it pours.”
Programs such as SETI@home and Folding@home made their way into history as some of the most powerful distributed computing projects of their time, while providing the public with an easy way to lend a hand to scientific research.
A fascinating concept is Bitcoin mining, a decentralized, distributed computing project that sprang up practically overnight. Users with the appropriate hardware (namely, Radeon gaming graphics cards) set their systems to compute hashes over and over until one meets a required difficulty target, either alone or as part of a pool (where the pool assigns blocks of work to different systems and shares the results). When a solution to the given parameters is found, the user (or pool) is rewarded with Bitcoins, currency that exists in an entirely decentralized system.
This virtual reward was sufficient to motivate the founding of companies selling mining rigs (computers specialized for Bitcoin mining) and the growth (over two years) of a mining network, which now boasts more computational power than Folding@home.
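The pooled search described above can be illustrated with a toy sketch. It is not Bitcoin's actual protocol: the single SHA-256 pass, the 8-byte nonce encoding, and the names meets_target, search_range, and pool_mine are all assumptions made for illustration. The idea is only that a pool hands out ranges of candidate values and any member may be the one to find a hash that meets the target.

```python
import hashlib


def meets_target(data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """True if SHA-256(data + nonce) starts with `difficulty_bits` zero bits."""
    digest = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0


def search_range(data: bytes, start: int, stop: int, difficulty_bits: int):
    """One worker's job: try every nonce in [start, stop)."""
    for nonce in range(start, stop):
        if meets_target(data, nonce, difficulty_bits):
            return nonce
    return None


def pool_mine(data: bytes, workers: int, chunk: int,
              difficulty_bits: int, max_rounds: int = 1000):
    """Simulate a pool handing out nonce ranges to workers in turn.

    Returns (worker_index, nonce) for the first solution, or None if
    the search budget runs out. A real pool would run workers in parallel.
    """
    next_start = 0
    for _ in range(max_rounds):
        for worker in range(workers):
            nonce = search_range(data, next_start, next_start + chunk,
                                 difficulty_bits)
            next_start += chunk
            if nonce is not None:
                return worker, nonce
    return None


if __name__ == "__main__":
    # Hypothetical work unit; 12 zero bits takes about 4,096 tries on average.
    print(pool_mine(b"example work unit", workers=4, chunk=1000,
                    difficulty_bits=12))
```

Raising difficulty_bits by one doubles the expected number of attempts, which is why adding hardware (or joining a pool) shortens the wait for a reward.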
Make It Rain
If virtual pennies (actually, one Bitcoin is worth nearly $20 because of recent press interest) can motivate users to buy more hardware and share their computational power for an arbitrary computational job, imagine what such an approach could do for clouds. Amazon has a program (Mechanical Turk) for “crowdsourcing” human-interactive tasks (e.g., writing captions) with rewards.
A reward-based system with a decentralized network would be more resilient than many cloud providers, as work could continue as long as internet backbones stay afloat. Since the cloud would be made of unsecured systems, it would be necessary to make the work inherently secure.
This could be done by applying some reversible transformation to the data before it is sent to the cloud for computation or by splitting the work so that a large fraction of the systems would have to be monitored to discover the nature of the work.
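One way to picture the splitting approach is additive secret sharing, sketched below. This is a swapped-in illustration under stated assumptions, not a method the article specifies: it only works for sums, the modulus is an arbitrary choice that must exceed the true total, and split_value, worker_sum, and masked_total are hypothetical names. Each untrusted system sees only random-looking shares, so an observer would have to monitor every system to recover an input.

```python
import secrets

# Assumed public modulus; inputs are non-negative and their total stays below it.
MODULUS = 2**61 - 1


def split_value(value: int, n_workers: int) -> list[int]:
    """Split `value` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_workers - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def worker_sum(shares_seen: list[int]) -> int:
    """The work done on an untrusted system: add up the shares it was sent."""
    return sum(shares_seen) % MODULUS


def masked_total(values: list[int], n_workers: int) -> int:
    """Compute sum(values) without any single worker seeing a real value."""
    per_worker = [[] for _ in range(n_workers)]
    for value in values:
        for index, share in enumerate(split_value(value, n_workers)):
            per_worker[index].append(share)
    partials = [worker_sum(shares) for shares in per_worker]
    # Only the party holding all partial sums can recover the total.
    return sum(partials) % MODULUS


if __name__ == "__main__":
    print(masked_total([10, 20, 30], n_workers=3))
```

The trade-off is that simple schemes like this cover only narrow classes of computation; more general work would need heavier machinery or a different way of partitioning the job.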
Redundancy and robustness would also be important, as individual systems drop in and drop out of the distributed network and suffer problems.
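A minimal sketch of that kind of redundancy, assuming each work unit can simply be sent to several systems and the answers compared: the function name run_redundantly, the default of three replicas, and the strict-majority rule are all illustrative choices, and workers are modeled as plain Python callables.

```python
from collections import Counter


def run_redundantly(task, workers, replicas=3):
    """Send `task` to workers until `replicas` answers arrive, then vote.

    Workers that raise are treated as having dropped out of the network.
    Results must be hashable so they can be counted.
    """
    results = []
    for worker in workers:
        if len(results) == replicas:
            break
        try:
            results.append(worker(task))
        except Exception:
            continue  # this system dropped out; move on to the next one
    if not results:
        raise RuntimeError("no worker returned a result")
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(results):
        raise RuntimeError("no majority among returned results")
    return value


if __name__ == "__main__":
    def good(x):
        return x * x

    def faulty(x):
        return -1  # a system that returns a wrong answer

    def dropped(x):
        raise ConnectionError("worker went offline")

    print(run_redundantly(7, [good, dropped, faulty, good]))
```

Replication multiplies the cost of every work unit, so the number of replicas is a balance between how unreliable the systems are and how much spare capacity the network has.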
Such considerations in designing software and handling sensitive data, though, would be a wise investment regardless of whether the software would be running on Amazon's servers or gaming rigs. Whether it's due to an operations engineer or a smooth operator, or maybe an electrical storm, you don't want to get stuck because of a tangled network.