OPIS -- Distributed Arrows In O'Caml

The short version is that they let you deploy "Arrows" over the network. The long version makes reference to Lamport and touches on the use of theorem provers in their system.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

And questions/comments are welcomes ;-)

First, you can't imagine my reaction (of satisfaction (and fear)) when I saw the word "Opis" in the "forum topics". I'm a daily lurker of lambda-the-ultimate for 2 years now: thanks all for these 2 years of continuous, great, and contagious ideas.

I hope that the mail is understandable and I believe that it highlights most of the technicalities of Opis. Btw, your "short version" is a lossless compressed version of the 12 pages technical report ;-)

The goal of Opis is to provide a formalism as well as a set of tools that allows the developer to build *reliable*, large-scale distributed systems. The advantages of Arrows are twofold: first, Arrow combinators defines a reactive, dataflow-oriented EDSL in OCaml. Second, Arrows abstract over computations, allowing to build and manipulate a Mealy automaton without going into the machinery.

The first point offers a lot of benefits to the developer. I'll not develop them here, the reputation of reactive systems in mission critical systems is, I think, well established (Lustre, Signal,...). The second point is a goody for the poor guy (me and anyone who want to improve Opis) who has to develop tools manipulating Opis-based code (network interface, simulator, debugger, model-checker, performance profiler, formal proofs, ...). You are manipulating a Mealy automaton and this "event function" is a value in OCaml: you can duplicate it, process an event on it, compute its signature, ...

Concerning the verification part, there are two perspectives: the lazy one, that says "let's model-check it". Although usually model-checking is a lost battle (state space explosion), our use of dynamic partial-order reduction scales, most of the time, almost linearly with the number of hosts. The other approach, my preferred one, consists in working hand-in-hand with a theorem prover. At first, we, and it's not new, had defined some critical functions of a distributed system in a theorem prover, certified them and then used the exported code in our system. In the field of distributed systems, it is, to my knowledge, a premiere.

My current work is to formalize the Arrow/Mealy instance in Coq. Then, we should be able to build the event functions in Coq. Finally, based on the Coq-event function, we would 1/ export it in OCaml and plug it to any "launcher" 2/ prove some properties about the distributed system it builds.

In fact, my biggest concern is in convincing people to try Opis: large-scale distributed systems (such as peer-to-peer systems) are coded by C/C++/Java people who, apparently, are not really concerned about reliability. When a network is composed of 100.000.000, they don't really care about the 10 or 100 nodes which can't join, are disconnected, dead-lock, or seg-fault... But I'm confident that things will change as more and more systems are distributed. "One man's trash is another man's treasure": see http://status.aws.amazon.com/s3-20080720.html recently.

Wow, this is a quite long post. I hope its signal/noise ratio remains as high as usual lambda-the-ultimate posts are.

I am so surprised this came

I am so surprised this came out of O'Caml, but maybe all that means is that I haven't been paying attention. I guess it lets the language jump right over the SMP hump.

Why OCaml

Well, I did it in OCaml because I'm more fluent in this language. But there already exists an Haskell implementation of a small subset of Opis, for example. Translating the whole set of tools in Haskell should not be difficult, most of the code being purely functional.

Also, at the beginning, I had troubles with loss of polymorphism in Arrows and asked Martin Odersky for some advices: the solution was to implement Opis in Scala :-)

So, definitely, Opis is not tied to OCaml. It just takes benefit of OCaml specifities, as it would take benefit of Haskell, Scala, F# or $your_prefered_typed_functional_language.

Btw, in ./theory, you'll find a not-so-simplified implementation of the Cyclon protocol in Isabelle: the beauty of the Arrow EDSL is that, as soon as you are provided the Arrow combinators, the event functions are almost the same, whatever language you use. For instance, here you could compare the Isabelle code and the OCaml code in ./examples/cyclon/.

I'm curious about a

I'm curious about a comparison to the work in lucid-synchrone...

Lucid-synchrone and Opis

[disclaimer: I'm not a lucid-synchrone expert. Please, correct me if I'm mistaken]

A first, technical difference is that lucid-synchrone is a standalone language: it provides an homebrew compiler, type-checker,... The use of OCaml comes from the fact that the compiler "source-to-source compile" lucid-synchrone code to OCaml. On the other hand, Opis is an EDSL, hence is provided as any library: I did not have to design (and ensure the correctness of) a compiler, a type-checker,...

The second point is that lucid-synchrone is "synchronous" and the compiler does a lot of work to ensure synchrony. Opis targets networked systems, which use TCP and UDP that are asynchronous. This relaxes a lot our constraints compared to lucid-synchrone.

A last point is that lucid-synchrone works with streams, hence enabling the sampling of the input streams, which might introduces "time inconsistencies". Therefore, they use a "clock calculus" and check the consistency of a code at compile time, using their modified type-checker. In Opis, a reactive function is given an input event, processes it and returns a list (possibly empty) of output commands. Therefore, there is no possibility for/use of sampling and, in fact, no notion of time.

To conclude, lucid-synchrone and Opis share some general design rules but they target different domains. Lucid-synchrone targets synchronous, real-time, mostly embedded systems whereas Opis want to be run on (billions of) John Doe's computer(s), using asynchronous channels in an event-oriented fashion.

Given your brief description

Given your brief description here and in this thread, Opis sounds similar to Orc. I've saved the paper for reading when I have more time.

Orc vs. Erlang?

i have not delved into either much, but the abstracts for Orc and Erlang always sounded really similar to me. maybe Erlang systems tend to escape out of Erlang less often than Orc ones, for the local node worker code?

Indeed, Orc is a wonderful

Indeed, Orc is a wonderful language, IMHO. Opis shares with Orc this data-flow oriented approach, as well as the focus on a small set of combinators.

However, Orc is a language for distributed *computing*: you have some computations and you would like to distribute them over a set of computers (e.g., Web service mashups). Acute, JOCaml, Erlang, etc. also address this problem.

On the other hand, Opis is an EDSL for building reliable distributed systems: you have a set of networked computers and you want them to provide a service. In the paper, we have shown examples of peer-to-peer systems (e.g., file-sharing systems) as they illustrate the challenge of constructing a large-scale distributed service from unreliable peers that are communicating over an asynchronous network (the Internet). You could also think of Paxos that solves consensus. Other choices for implementing these systems are Ensemble, WiDS, Mace, P2, etc.

However, Orc is a language

However, Orc is a language for distributed *computing* [...] On the other hand, Opis is an EDSL for building reliable distributed systems

I'm not sure I see the distinction you're trying to draw, or rather, I don't see how Orc falls into distributed computing. It certainly can be used in that scenario, but it seems to me that each Orc node is orchestrating its own computations independently from the others, which would seem to make it a distributed system. I don't see why a p2p network wouldn't be expressible using Orc, for instance. I will read the paper to understand the distinction you're trying to make.

I think that Orc and Opis

I think that Orc and Opis are aimed at programming at different levels. You could build Orc on top of Opis for instance. In principle you could also use Orc to do what Opis does. But I think that it would be more reasonable to use Opis to implement, e.g., an efficient Paxos, and use this Opis-Paxos within Orc (as a Channel) to get e.g. a form of replication.

Also, I forgot to mention

Also, I forgot to mention that, unlike Lucid-Synchrone, Opis is not "just" an (E)DSL. It comes with an extensive set of tools:

  • replay debugger
  • simulator (that might not be relevant for Lucid-Synchrone)
  • model-checker
  • performance profiler (with complexity inference)

I don't know the state of the art for other reactive systems and I suspect that other reactive systems, such as those used at Esterel, have such tools too.

The same point could be made about Orc. But, as I wrote, that would be an Orange and Apple comparison.