Virtualization
Yesterday, I attended a talk at CSUCI by Dmitrii Zagorodnov of UCSB on cloud computing. Their in-house system, Eucalyptus, is essentially an open-source, API-compatible implementation of Amazon EC2. Virtualization has interested me for quite some time, and I've even put VMware and OpenVZ into production at work, so this is right up my alley.
It turns out that on-demand virtual computing isn't as complex as I had assumed. I had pictured "cloud computing" as virtual machines running across a network of tightly integrated machines that shared the work of hosting each VM while remaining fault tolerant. At least in the case of Eucalyptus, that's not how it works. Instead, a set of controller servers manages the cluster and stores the VM images, and the controller picks a node on which to boot each VM. There's a little more to it, such as injecting SSH keys into the root filesystem at boot and setting up virtual private networks (not VPNs), but that's the gist.
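To make the controller/node split concrete, here is a toy Python model of a controller choosing a node and booting a VM image on it. Everything in it (the `Node`, `Instance` and `Controller` classes, the `run_instance` method, and the "most free cores wins" placement rule) is a hypothetical illustration and not Eucalyptus's actual API or scheduling policy. The SSH key is only recorded on the instance, standing in for the real step of injecting it into the root filesystem at boot.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    image: str    # name of the stored VM image this instance boots from
    cores: int
    mem_mb: int
    ssh_key: str  # stand-in for the key injected into the root filesystem at boot
    node: str = ""


@dataclass
class Node:
    # A physical host in the cluster (hypothetical model).
    name: str
    free_cores: int
    free_mem_mb: int
    instances: list = field(default_factory=list)


class Controller:
    """Hypothetical cluster controller: holds the node list and places VMs."""

    def __init__(self, nodes):
        self.nodes = nodes

    def pick_node(self, cores, mem_mb):
        # Assumed policy: among nodes that can fit the request,
        # pick the one with the most free cores.
        fits = [n for n in self.nodes
                if n.free_cores >= cores and n.free_mem_mb >= mem_mb]
        if not fits:
            raise RuntimeError("no node has enough free capacity")
        return max(fits, key=lambda n: n.free_cores)

    def run_instance(self, image, cores, mem_mb, ssh_key):
        node = self.pick_node(cores, mem_mb)
        inst = Instance(image, cores, mem_mb, ssh_key, node.name)
        # Reserve the node's resources for the new VM.
        node.free_cores -= cores
        node.free_mem_mb -= mem_mb
        node.instances.append(inst)
        return inst


c = Controller([Node("node-a", 2, 4096), Node("node-b", 8, 16384)])
vm = c.run_instance("base.img", 2, 2048, "ssh-rsa EXAMPLEKEY user@host")
print(vm.node)  # node-b
```

Both nodes can fit this request, so the assumed policy sends it to `node-b`, which has more free cores. A real scheduler could just as well use round-robin or pack VMs onto as few nodes as possible. The point is only that one central component decides where each VM runs.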
The VM image is fault tolerant because it's stored on a RAID array. If the node running it goes down, the controller can simply boot it on another machine. This works because the cluster never guarantees the integrity of the running instance's filesystem. In fact, it guarantees that the filesystem will be destroyed, because every boot starts from a fresh copy of the image. For providing computing power on demand, as Amazon advertises, this is acceptable anyway: each machine could configure itself automatically at boot, and within minutes of starting the instance you'd have an additional GHz of processing power and the associated resources.
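The failover behaviour described above can be sketched in a few lines of Python. In this sketch, an instance whose host node has died is simply rebooted from the pristine stored image on a surviving node, so anything written to its disk during the previous run disappears. The names (`Node`, `Instance`, `boot_fresh`, `recover`) and the "first live node" choice are hypothetical and used only for illustration. A dict stands in for the root filesystem.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    alive: bool = True


@dataclass
class Instance:
    image: str  # pristine image kept on the controller's RAID-backed storage
    node: str
    disk: dict  # stand-in for the instance's root filesystem contents


def boot_fresh(image, node):
    # Every boot starts from an unmodified copy of the stored image,
    # so nothing written during a previous run survives.
    return Instance(image=image, node=node.name, disk={"image": image})


def recover(instances, nodes):
    """Reboot instances whose host is down (assumed policy: first live node)."""
    by_name = {n.name: n for n in nodes}
    live = [n for n in nodes if n.alive]
    result = []
    for inst in instances:
        if by_name[inst.node].alive:
            result.append(inst)  # host is fine, leave the instance alone
            continue
        if not live:
            raise RuntimeError("no live nodes to restart on")
        result.append(boot_fresh(inst.image, live[0]))
    return result


nodes = [Node("node-a"), Node("node-b")]
vm = boot_fresh("web.img", nodes[0])
vm.disk["/var/log/app.log"] = "written at runtime"
nodes[0].alive = False  # simulate the host failing
[vm] = recover([vm], nodes)
print(vm.node, vm.disk)  # node-b {'image': 'web.img'}
```

Recovery is trivial precisely because no per-instance state is promised: the log file written at runtime is gone after the restart. A persistent-filesystem design would have to keep that disk on shared storage instead.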
However, I see a gap between the EC2/Eucalyptus model and the VPS model used by providers such as my own, Slicehost. EC2 lets you create a VM instantly and charges for each hour it runs, while Slicehost charges me monthly for a single always-on, persistent server. I've been pondering a system somewhere in the middle, with persistent filesystems, dynamic host-node selection, and a central, highly fault-tolerant storage system.
"But wait," you say, "that's VMware ESX!" Correct, it is. And everything it can do, like virtualization with live migration and a distributed file system, are all doable with open source software.
And with that, I think I have an idea for my Master's Thesis.
This entry has 2 comments:
Tyler says:
{woooosh} (And that was the sound of most of this post going over my head.)
Dan says:
Sounds like an interesting thesis. Which open source hypervisors were you considering? Xen? If each VM instance will run an identical kernel, then maybe an OS-level virtualization system (like OpenVZ) could work. On another side note, have you given ESXi a try? Having a "free" alternative to ESX certainly sounds enticing.