Turning down the LAMP: Software specialization for the cloud

Several years ago, a reading group I was in read about the Flux OSKit Project, which aimed to provide a modular basis for operating systems. One of the topics of discussion was the possibility of, and possible benefits of, an application-specific OS. (For example, the fearful spectre of EmacsOS was raised.)

Today, I ran across "Turning down the LAMP: Software specialization for the cloud", which actually makes a pretty strong case for the idea on a virtual machine infrastructure:

...We instead view the cloud as a stable hardware platform, and present a programming framework which permits applications to be constructed to run directly on top of it without intervening software layers. Our prototype (dubbed Mirage) is unashamedly academic; it extends the Objective Caml language with storage extensions and a custom run-time to emit binaries that execute as a guest operating system under Xen. Mirage applications exhibit significant performance speedups for I/O and memory handling versus the same code running under Linux/Xen.

As one example,

Frameworks which currently use (for example) fork(2) on a host to spawn processes would benefit from using cloud management APIs to request resources and eliminate the distinction between cores and hosts.

On the other hand, I suspect that this "unashamedly academic" idea may already be advancing into the commercial arena, if I am correctly reading between the lines of the VMware vFabric tc Server™ marketing material.

Language-driven OSes

Jonathan Edwards wrote a story on his blog not too long ago about how his distributed, fault-tolerant transaction management platform for money transfers baked the object persistence into the kernel, effectively making the DBMS the OS's file system.

I once heard that Meditech, Inc. bought a copy of a Windows OS kernel from Microsoft, and that many point-of-service apps in hospitals across the country are probably running on it. Yep, a whole OS for customized display on 8-bit graphics terminals. (To illustrate how the idea can become overengineering.)

My favorite phrase is Dan Ingalls's, though: "An operating system is a collection of things that don't fit into a language. There shouldn't be one."

But this idea traces back to researchers at Burroughs - in the 1960s.

Benchmark?

Where can I find this custom benchmark?

We evaluated the performance of our database storage by running a benchmark that inserts, updates and deletes records of varying sizes over 5000 iterations.

Details here are incredibly light. What is the distribution of the data? Does it contain data skews like the AdventureWorks DB example provided by Microsoft for SQL Server? How long is the data in relation to the underlying storage? Can you produce a graph showing the relation among the system events that SQLite causes to fire? These are ideas for more rigorous benchmarks. What is an "iteration"? How are they proving that their benchmarks are or aren't I/O-bound, network-bound, or processor-bound?
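To make the questions above concrete, here is a minimal sketch of what a fully specified version of such a benchmark could look like. It is not the authors' benchmark: the table layout, the uniform size distribution, the size bounds, the seed, and the definition of an "iteration" (one insert, one update, one delete of the same row) are all assumptions chosen for illustration, and it uses Python's standard `sqlite3` module against an in-memory database rather than Mirage's OCaml VFS.

```python
import random
import sqlite3
import time


def run_benchmark(iterations=5000, min_size=16, max_size=4096, seed=0):
    """Insert, update, then delete one record per iteration.

    Returns (per-operation wall-clock totals in seconds, rows left over).
    """
    rng = random.Random(seed)  # fixed seed: the size sequence is reproducible
    db = sqlite3.connect(":memory:")  # in-memory, so no real disk I/O is measured
    db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload BLOB)")
    timings = {"insert": 0.0, "update": 0.0, "delete": 0.0}

    def payload():
        # Uniformly distributed record sizes: an assumption, stated explicitly.
        size = rng.randint(min_size, max_size)
        return rng.getrandbits(8 * size).to_bytes(size, "little")

    def timed(op, sql, params):
        # Payloads are generated by the caller, outside the timed region.
        start = time.perf_counter()
        db.execute(sql, params)
        db.commit()
        timings[op] += time.perf_counter() - start

    for i in range(iterations):
        timed("insert", "INSERT INTO records (id, payload) VALUES (?, ?)", (i, payload()))
        timed("update", "UPDATE records SET payload = ? WHERE id = ?", (payload(), i))
        timed("delete", "DELETE FROM records WHERE id = ?", (i,))

    remaining = db.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    db.close()
    return timings, remaining


if __name__ == "__main__":
    print(run_benchmark())
```

Even a sketch this small forces the choices the paper leaves unstated: the distribution, the meaning of an iteration, and what is inside the timed region.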

Also, the "with orm" processing tag seems rather monolithic and perhaps indicates a misunderstanding of "orthogonal persistence" versus what an "object/relational mapper" is, but this is a minor complaint and probably only done to facilitate a rapid prototype of Mirage's much bigger contribution.

The only shameful part of this academic adventure is that it needs to be more rigorous if the authors want to take great pride in their ideas. Performance evaluation is seriously complicated stuff. It is all well and good to say:

One of the main benefits of implementing the SQLite VFS operations in OCaml is to let us experiment with different heuristics and data structures more easily; one of the main benefits of Mirage is exactly that it makes this kind of specialisation straightforward.

But it is another to bear it out in practice. I would suspect that newer storage devices without mechanical read/write arms significantly change both filesystem organization and design and DBMS internal data structures, such as choices for representing indexes (e.g., ISAM vs. B+tree vs. log-structured merge tree) and the general organization of the heap (is it physically designed for persistent objects so that they have a strong locality of reference, while still allowing a logical design decoupled from physical storage?). There are also higher-level issues here not defined in the paper, such as atomicity guarantees concerning the consistency of the data set.

Where can I find this

Where can I find this custom benchmark?
...
The only shameful part of this academic adventure is that it needs to be more rigorous if the authors want to take great pride in their ideas.

This paper is a "position paper" at HotCloud, which means that it identifies an interesting area of research, does some probing evaluations to determine if it's worth pursuing, and writes it up for community feedback. It is not intended to be, nor presented as, a completed system, as 6 pages is rather too limited for that. But fear not, we're working hard on a complete paper full of graphs and benchmarks! :-)

The SQL benchmark used is still available in the github repository, although it won't work on the current HEAD branch as I'm reunifying the I/O APIs at present to work on Xen, UNIX and Javascript via a single ML signature.

With respect to your other points, Mirage builds up atomicity guarantees from what is provided by the underlying hardware. The Xen blkfront provides optional write barriers to prevent reordering, and acks writes reliably via a response system. All higher-level data structures are built up from that, and are written in pure OCaml and so can use the type system to enforce safety statically where possible, and dynamically otherwise.
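As a rough illustration of how an atomicity guarantee can be layered on nothing more than write barriers, here is a toy model in Python. It is not the Xen blkfront interface and not Mirage code: `ToyBlockDevice`, `new_device`, `atomic_update`, and `read_current` are hypothetical names, the device is simulated in memory, and the sketch assumes that a single-block write is itself atomic.

```python
import random

POINTER = 0  # block 0 holds the index of the block containing the live value


class ToyBlockDevice:
    """Simulated device: writes issued since the last barrier may land in any order."""

    def __init__(self, nblocks, seed=0):
        self.durable = [None] * nblocks  # contents that would survive a crash
        self.pending = []                # issued but not yet ordered by a barrier
        self.rng = random.Random(seed)

    def write(self, block, data):
        self.pending.append((block, data))

    def barrier(self):
        # Everything issued before the barrier is durable before anything after it.
        for block, data in self.pending:
            self.durable[block] = data
        self.pending.clear()

    def crash(self):
        # An arbitrary subset of un-barriered writes survives, in arbitrary order.
        survivors = [w for w in self.pending if self.rng.random() < 0.5]
        self.rng.shuffle(survivors)
        for block, data in survivors:
            self.durable[block] = data
        self.pending.clear()


def new_device(initial, seed=0):
    dev = ToyBlockDevice(3, seed)
    dev.write(1, initial)
    dev.write(POINTER, 1)
    dev.barrier()
    return dev


def atomic_update(dev, value):
    spare = 2 if dev.durable[POINTER] == 1 else 1
    dev.write(spare, value)    # 1. write the new value to the unused block
    dev.barrier()              # 2. make it durable before the pointer moves
    dev.write(POINTER, spare)  # 3. flip the pointer with a single-block write
    dev.barrier()


def read_current(dev):
    return dev.durable[dev.durable[POINTER]]
```

In this model, a crash at any point during `atomic_update` leaves `read_current` returning either the complete old value or the complete new one, never a pointer to unwritten data. Dropping the first barrier removes that property, since the pointer write could land before the data it refers to.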

A fairly full-fledged protocol implementation example is the ML DNS server. I have some out-of-tree code to implement BTRFS in Mirage, but we're still thinking through the best way to expose filesystem functionality. The SQLite integration was a quick way to get something high-level working, and is still useful for the browser/Javascript backend (where you have LocalStorage via SQL). For a number of applications where you just need to serialise something to/from disk, a full filesystem is overkill though and writing to a raw block device is just fine.

There is no POSIX API at all in Mirage, and dietlibc will disappear entirely as soon as the memory allocator is rewritten.

W/ regards to benchmarks

It is not intended to be, nor presented as, a completed system, as 6 pages is rather too limited for that

Sure, but a position paper never contains the tests in the 6 pages, only the justification that the design of experiments was sound. It doesn't bloat your position paper if your design of experiments is rigorous.

But fear not, we're working hard on a complete paper full of graphs and benchmarks! :-)

I am not fearing. I just think benchmarking needs to be taken very seriously, even though it almost never is: when benchmarking is done, it usually does not (a) simulate real workloads based on real-world scenarios or (b) explain why the benchmarks were chosen. I recommend reading A Nine Year Study of File System and Storage Benchmarking, if you haven't already.

we're still thinking through the best way to expose filesystem functionality

It depends on the scenario. Do you want to support quotas? If so, make them discoverable and self-describing. For example, I once wrote a blog post on my initial thoughts about S3 and how poorly designed the API was for the various sample languages (including code and documentation that was Just Plain Wrong(tm)), as well as the lack of consideration for how developers could probe service-level agreements and other service metadata. S3 embeds the quota for bucket size in a PDF, when it should be exposed through a web service itself. Most cloud storage engines get a lot of hype, but most people don't realize how bad the docs and APIs are for effectively all of them.

dietlibc will disappear entirely as soon as the memory allocator is rewritten

Will your memory system be zeroed out before new systems are spun up, to prevent security leaks? Or are you going to depend on the language to enforce memory protection?

Do you want to support

Do you want to support quotas? If so, make them discoverable and self-describing.

One of the designs works directly on ML datastructures, and uses a dynamic typing extension to marshal ML values directly to a block device. The block layout structures themselves use this system, so you essentially stream in values directly into the safe OCaml heap from disk, with the layout algorithms implemented as ML functions.
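The general shape of "marshal a value straight onto a block device" can be sketched as follows. This is only an analogy in Python, not the OCaml design described above: `pickle` stands in for the dynamic typing extension (and, unlike it, offers no type safety on the way back in), a `bytearray` stands in for the raw device, and the 512-byte block size and 4-byte length header are assumptions.

```python
import pickle
import struct

BLOCK_SIZE = 512              # assumed sector size
HEADER = struct.Struct("<I")  # 4-byte little-endian payload length


def write_value(device: bytearray, first_block: int, value) -> int:
    """Serialise value starting at first_block; return the number of blocks used."""
    payload = pickle.dumps(value)
    record = HEADER.pack(len(payload)) + payload
    blocks = -(-len(record) // BLOCK_SIZE)  # ceiling division
    padded = record.ljust(blocks * BLOCK_SIZE, b"\0")  # pad to a whole block
    start = first_block * BLOCK_SIZE
    if start + len(padded) > len(device):
        raise ValueError("value does not fit on the device")
    device[start:start + len(padded)] = padded
    return blocks


def read_value(device: bytearray, first_block: int):
    """Read back the value stored at first_block."""
    start = first_block * BLOCK_SIZE
    (length,) = HEADER.unpack_from(device, start)
    body = start + HEADER.size
    return pickle.loads(bytes(device[body:body + length]))
```

For example, `write_value(device, 2, {"name": "a", "xs": [1, 2, 3]})` followed by `read_value(device, 2)` round-trips the dictionary with no filesystem in between, which is the "a full filesystem is overkill" case mentioned earlier in the thread.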

Policies like quotas sit at a higher level, so we're not considering them for now. I wonder how far we can get with just snapshot-capable block devices. Time will tell!

Will your memory system be zeroed out before new systems are spun up, to prevent security leaks

Memory is zeroed before adding it to the OCaml heap, yes. The I/O grant pages come from a separate pool of MFNs.

Or, see Mesos, gaining in

Or, see Mesos, gaining in traction in several of your favorite cloud-using websites, which similarly strips away abstraction layers to enable programs to reason at the physical resource level in a (potentially, should your framework of choice want it) demand-driven and first-class way. Unlike Tommy's vision, as far as I understand it, it argues against VM-per-app: a VM is one abstraction layer you might want to plug in on top.

Yeah, Mesos is a really

Yeah, Mesos is a really interesting system too, but quite complementary to Mirage. Mesos addresses the issue of "scheduler multiplexing", where too many competing schedulers do a bad job globally. Mirage collapses application layers instead, but doesn't mandate a particular scheduling system.

So you could, for example, write a fast map/reduce in Mirage using Ethernet communication directly, and use Mesos to provision physical resources to it running alongside Hadoop on the same cluster (which could be used for running existing map/reduce jobs written in Java).

Here's a recent talk about

Here's a recent talk about Mirage. Interesting stuff! The language truly becomes the OS.