hobbes, Morgan Stanley OSS

Over the last few years, I have been developing hobbes -- a programming language, JIT compiler, and database system -- as part of my work for Morgan Stanley. It has become a critical piece of infrastructure in our low-latency, high-volume trading applications, and we have decided to release the source code to the public on github (currently can be built for recent Linux and macOS platforms):

github.com/Morgan-Stanley/hobbes

The database system is a lightweight (self-contained, header-only) library used by applications to efficiently log structured (binary) data over shared memory to minimize application latency and reflect application type structure as accurately as possible in log files. We use this to record a very detailed image of application state over time, which we analyze/query out of band.

The JIT compiler can be used embedded in another application (e.g. to "hot patch" an application with very efficient, precisely typed intercepts) or as a standalone interactive interpreter (e.g.: to monitor and query application log data).

The programming language is a variant of Haskell (HM type inference, algebraic data types, qualified types / type classes) with some adjustments to help reduce boilerplate and derive very efficient machine code. For example, we use "structural" record, variant, and (iso-)recursive types to represent data as it's naturally represented in applications and can be deconstructed generically at compile time (e.g. a record can be printed if its first field can be printed and its rest can be printed, a variant can be printed if its first case can be printed and its rest cases can be printed, a recursive type can be printed if its one-step unrolling can be printed, etc).

We are actively using this on major projects, we will continue to develop this github project as we need new features, and we are interested in engaging others outside of the firm for their thoughts, feedback, and hopefully pull requests. :)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Some questions

This looks fairly intersting, thanks!

Three questions came to mind:

1. Would you be able to tell which aspects of the language are informed by its specific information domain? The database system looks domain-specific, but should we think of it as a "language feature"? You present structural record/variants/types as a domain-influenced feature, but it looks like a rather general-purpose language aspect to me -- although it may have been initially motivated by your solutions to domain problems. You mention efficiency constraints, how did efficient really influence your design choices? (I see that there is a restriction on type classes, but I don't really understand its description.)

2. Is it actually more efficient? In my experience when people start building their own languages in-house, it often turns out to be the case that picking an existing publicly-available language would have been "good enough" (and simplifies a lot of tooling/training/communication hurdles). Maybe the performance gap (for example) with general-purpose language was high at the beginning of the project, and then you add more stuff, or said languages improve, and the gap is not that big anymore. If I wanted to write the same code today as libraries on top of GHC Haskell, would it still be unfeasible? Would it really be noticeably slower?

3. Why the JIT? You mention that the implementation uses a JIT, but JIT makes most sense when you can do speculative optimization to reduce the overhead of "dynamic" language features. What are the "dynamic" features that justify the use a JIT for your language? (Do you use this to avoid dictionary-passing by aggressively specializing qualified-types-polymorphic functions?). Have you compared with the performances of an ahead-of-time compiler?

Great questions!

Hi gasche, you've really hit at the heart of much of what we're doing here. Let me see if I can address your questions one-by-one:

1. Would you be able to tell which aspects of the language are informed by its specific information domain? [...] You mention efficiency constraints, how did efficient really influence your design choices?

Although this is a very generic project, "efficiency" and especially low-latency has guided every major design choice. We have used this in particular in trading systems that process a very large number of trades and have very small latency budgets.

The bit about type classes is one example of that. The traditional interpretation (going back to the Wadler/Blott '88 paper) is to rewrite overloaded functions to take a "dictionary" as the constraint argument but instead we treat constraints as predicates decided at compile-time (and "decided" means rewriting to an equivalent expression that doesn't have the constraint). This has a technical limitation in the case of polymorphic recursion, but it lets us easily produce efficient code (or maybe you'd say it guarantees partial evaluation for dictionary arguments).

We have common types arranged in memory to be identical to the equivalent types in C/C++ so that movement between C and hobbes has no cost. This is basically an efficiency decision, although it also has the benefit of making such integration very easy.

I guess there are a number of decisions that I could describe this way. As you say, this can serve as a general-purpose PL and probably do plenty of things outside of the domain of finance, but it's at least representative of some of the PL decisions relevant in this domain (e.g.: we don't put MIT Scheme in the critical path in our trading systems).

2. Is it actually more efficient? In my experience when people start building their own languages in-house, it often turns out to be the case that picking an existing publicly-available language would have been "good enough" (and simplifies a lot of tooling/training/communication hurdles). Maybe the performance gap (for example) with general-purpose language was high at the beginning of the project, and then you add more stuff, or said languages improve, and the gap is not that big anymore. If I wanted to write the same code today as libraries on top of GHC Haskell, would it still be unfeasible? Would it really be noticeably slower?

This is an interesting question and there are a lot of ways to look at it. I mean, in principle we could have done this in any number of ways. There were some arguments that we should evaluate queries as JIT-compiled C++ for example, which certainly could work also. But IMHO the logical structure of a Haskell-like language has worked very well in the places where we've used this (hot-patching very specific functions, live querying running systems, scripts for real-time/out-of-band analysis of application data).

It definitely produces _very_ efficient code and although I'm somewhat biased, I think that the internal consensus is that it delivers on that promise very well. Could GHC be made to work in the same role and be just as effective? To be honest, I don't know (but I'd support anyone who wanted to try). I developed this project along with my other responsibilities, working in an application group focused on building specific trading systems. I didn't have the ability to take a few weeks out to investigate GHC garbage collector behavior, for example, so having the ability to control everything very precisely was important in that role.

I'll say one thing about GHC, and this is not intended as a knock in any way (one of the first things I did after we published this project was to let folks like SPJ know about it and to thank them for all that they've done to inspire folks like me in the work that we do), but early on one of the use-cases we looked at for this project was to replace some internal "rules engines" that basically were being used for big pattern match expressions. I translated some of these "rules engine configurations", some crazy 40 column by 5000 row tables, the compiler worked away at it for hours before I killed it. Maybe this has been addressed or maybe there's an option that would have produced an executable more quickly but I was not aware of it.

In any case, I can't answer a question about how a hypothetical translation to vanilla Haskell would work here because it hasn't been done. However I can say that at least in our critical-path use-cases the question can't be whether it's "noticeably slower" but whether it's "at all slower". That 40x5000 match has to run in less than a microsecond. Certainly that's doable in many ways, but we do have latency budgets like this.

3. Why the JIT? You mention that the implementation uses a JIT, but JIT makes most sense when you can do speculative optimization to reduce the overhead of "dynamic" language features. What are the "dynamic" features that justify the use a JIT for your language?

Well there are some features that we use pretty heavily that require staged compilation. So there we aren't using it for speculative optimization, but rather at a certain point we know "this is the data file you will be analyzing" and then we look into the file for its type structure -- only at this point do we know what types we'll be able to read out of the file so we can compile the rest of the program. There's a similar story for networking where we can write programs that will quote other programs to be evaluated in a remote process. We connect to that process _at compile-time_ and negotiate a protocol along with the normal type inference process to decide how to compile the rest of the program.

Does that kind of use-case make sense? It's a little unusual in my experience, but it has been very useful for us.

Thanks!

Thanks for your answers! Further questions come to mind:

I translated some of these "rules engine configurations", some crazy 40 column by 5000 row tables, the compiler worked away at it for hours before I killed it.

This is very interesting to me because I have worked a bit on pattern-matching compilation (for OCaml rather than Haskell), and I know of some of the efficiency constraints there (for example, when checking exhaustivity of the pattern). Two questions:

- Would you be able to publicly release a representative example of those rulesets that pattern-matching compilers do not handle well? Whenever discussing algorithmic costs in a pattern-matching compiler (and warnings, etc.), people typically use the programs that they have around (usually their own compiler), having an example of *realistic* extreme program could be very useful.

- You mention both efficiency of *compiling* the pattern-matching (too slow using GHC), and the efficiency of the generated code (less than a microsecond). Of course, having a JIT means that compilation time matters more than usual, but in any case there is a tradeoff, as most "efficient pattern matching" techniques in the functional programming literature do take a bit of time to run, while there are very naive compilation techniques (for example, generating one "else if" branch per clause) that give not-very-good code very fast. What is your implementation doing? Did you find that a fairly naive compilation approach is good enough for the generated code, or did you make use of state-of-the-art pattern compilation techniques (being careful about not blowing up compilation time)? Do good pattern-matching heuristics (of the sort discussed in Luc Maranget's ompiling Pattern Matching to Good Decision Trees, 2008) make a difference for your problem domain?

So there we aren't using [JITs] for speculative optimization, but rather at a certain point we know "this is the data file you will be analyzing" and then we look into the file for its type structure -- only at this point do we know what types we'll be able to read out of the file so we can compile the rest of the program.

If I understand correctly, your compilation scheme relies on aggressive specialization, and in particular being able to monomorphize code (this is how you avoid compiling type-class-polymorphic code into dictionary-passing style), and you use JIT to monomorphize lazily as the code is about to be run. I believe that the .NET machine handles generics in a similar way -- as opposed to, say, MLton which does ahead-of-time whole-program monomorphization.

Remark: In theory this should let you handle polymorphic recursion: while polymorphic recursion means that an unbounded number of distinct types may occur at a given instantiation point over the runtime of a program, specializing just-in-time makes monomorphization possible in this context -- especially if the type-dependent features do not rely on full type information (as in type-classes) but rather on aspects of them that lose some information, such as "unboxed or not", and typically range over a finite/bounded set of different values even in presence of infinitely many different type instances.

More

Would you be able to publicly release a representative example of those rulesets that pattern-matching compilers do not handle well?

I will look into that question and see what I can do. I will try to get some more of the pattern matching details (representative tables, compile times, match times) in a document on the website since it's been an important use-case.

[...] (for example, generating one "else if" branch per clause) [...] Did you find that a fairly naive compilation approach is good enough for the generated code, or did you make use of state-of-the-art pattern compilation techniques [...]?

No a series of whole-row if/else-if tests would not have performed well for us (though yes, very simple to compile that!).

I like the paper that you cite there, I may have come across that at some point. In hobbes we translate match tables to a DFA and then the DFA to a more primitive expression that shares tests, merges shared states to new functions, etc. We do have some heuristics for column selection and some cases where we can rewrite column patterns along with the input to reduce branching (e.g. we can turn a match on string literals into a series of matches on 64-bit int values unpacked from string contents). When we reduce match tables to tests on primitive values we make very efficient low-level functions. There are a variety of things like that. Since the code is there I'd be happy to go over it if you're interested -- it's all in 'hobbes/lang/pat/pattern.C' and 'hobbes/lang/pat/dfa.C'. Also regex matching is mixed with general pattern matching and the main bit that makes that work is in 'hobbes/lang/pat/regex.C'.

If I understand correctly, your compilation scheme relies on aggressive specialization, and in particular being able to monomorphize code (this is how you avoid compiling type-class-polymorphic code into dictionary-passing style), and you use JIT to monomorphize lazily as the code is about to be run. I believe that the .NET machine handles generics in a similar way -- as opposed to, say, MLton which does ahead-of-time whole-program monomorphization.

Yes that's basically it. We just produce machine code for monomorphic code. I am not very familiar with .NET and I've read a few things about MLton (sounds like a really cool project) but don't have a lot of experience with that either.

We have several cases where type structure isn't determined until a late stage (structured log files, messaging exchanges, protocol negotiation) and so the JIT feature is very useful just to delay compilation until that information is known.

Remark: In theory this should let you handle polymorphic recursion: while polymorphic recursion means that an unbounded number of distinct types may occur at a given instantiation point over the runtime of a program, specializing just-in-time makes monomorphization possible in this context -- especially if the type-dependent features do not rely on full type information (as in type-classes) but rather on aspects of them that lose some information, such as "unboxed or not", and typically range over a finite/bounded set of different values even in presence of infinitely many different type instances.

Well if I follow you correctly, we don't actually specialize in that way because it would require a monomorphic function to dynamically decide that it needs an unbounded number of distinct monomorphic functions. Is that right?

We don't exactly compile functions as they're invoked (we're not that lazy) but just as definitions are introduced we decide how to translate them dependent on whether they're polymorphic, qualified, or monomorphic. From the embedded compiler perspective, at some point a user could ask for a C/C++ function with a certain type given a certain expression (in the whole context it's built up) and that might drive some type class selection and all of the rest, but at the end of it he gets a function pointer that no longer needs further compilation and won't ever invoke the compiler again.

That's not to say that we couldn't put together something like what you're describing. I guess I haven't been really concerned about polymorphic recursion while doing this work (though I know that there are some cool use-cases for it).

Polymorphic recursion

Felix eliminates type classes by monomorphisation, it uses whole program analysis rather than JIT. It guarantees zero overhead from use of type classes (at the cost of whole program analysis). It can do polymorphic recursion through type classes (but not directly), however the compiler will go into an infinite loop if the recursion is unbounded: my type system isn't up to detecting this correctly so I just use a fixed recursion limit instead. The recursion comes automatically there's no special code required: when you monomorphise a virtual function of a type class with an instance function, you then have to monomorphise the instance function code to eliminate any virtual functions it uses, which indirects through the type class abstraction again to another instance. So polymorphic recursion requires zero implementation effort, the problem is to detect if it will terminate or not. If the recursion terminates, the compiler terminates, by turning the polymorphic recursion into a monomorphic one. It can do this because it keeps track of already instantiated virtuals (including any instantiation in progress). (A virtual function is a function in a type class which has to be defined in an instance).

I might add there are more problems in GHC then just laziness. Felix currently does a sample Ackermann's function the same speed as C, twice as fast as Ocaml (which was already Leroy's design spec), but 5 times faster than GHC. Experts tell me that GHC knows the function is strict so the poor performance hasn't go anything to do with laziness. I'm told it unboxes the data, so that's not the problem either. I can't believe GHC can't see tail recursions. It's disappointing when a language with such a strong focus on the utility of recursion can't actually compile a simple recursive function efficiently (GHC). Its disappointing when the compilation model used by C is so much better than Ocaml at handling recursive definitions.

Felix design spec is similar to Hobbes in a different context: it was originally required to track huge numbers of connections in a telephony environment, and to integrate seamlessly with a mega-LOC C++ infrastructure. When you put performance first and try to lift the result to have good theoretical properties (soundness of type system, local reasoning, etc) you get a different result than if you try to implement the theoretical system first then optimise it afterwards. Code is intrinsically conservative: its very hard to change the overall plan.

I'll say one thing about

I'll say one thing about GHC, and this is not intended as a knock in any way (one of the first things I did after we published this project was to let folks like SPJ know about it and to thank them for all that they've done to inspire folks like me in the work that we do), but early on one of the use-cases we looked at for this project was to replace some internal "rules engines" that basically were being used for big pattern match expressions. I translated some of these "rules engine configurations", some crazy 40 column by 5000 row tables, the compiler worked away at it for hours before I killed it. Maybe this has been addressed or maybe there's an option that would have produced an executable more quickly but I was not aware of it.

Dude, just use Rete and Lmax Disruptor.

Heh

Dude, just use Rete and Lmax Disruptor.

Dude we just used pattern matching. Even simpler.

RE: Disruptor, it's not needed for the use-case you quoted but we do use something like that for structured logging.

Happy to compare notes privately if you'd rather not have the discussion in public. I'm kthielen at gmail.

How many FLOPS is that?

You said pattern matching failed to compile.

Intel can do over 500 GFLOPS. "crazy 40 column by 5000" is 40x5000, or 200k cells. How saturated are your NUMA nodes relative to peak GFLOPS on your hardware? Look at it in 1 mike slices

Wut?

Wut?

Talking about GHC

You can reproduce the experiment with GHC if you want. Compile time wasn't that important but we did want it to terminate eventually. The residual output (the important part) ran in under a mic, a significant improvement.

Evaluation strategy?

This looks like cool work, thanks for posting.

I couldn't find any information on the evaluation strategy used in Hobbes. Since it's a variant of Haskell, I would naturally expect it to be lazy ; but there are also well-known drawbacks associated to lazy evaluation, in particular when consistent performance is important. So, is Hobbes strict or lazy?

Thanks

It's strict. As you suggest, there are some drawbacks to lazy evaluation and in the situations where this gets used (on the critical path in low-latency trading systems or sitting on large volumes of trading data to run real-time queries) it helps to eliminate some costly overhead.

Having said that, it's possible to make lazy data (e.g. "[0..]" will give you the nats or "[{n=i,s=i*i}|i<-[0..]]" will give you an infinite table of numbers and their squares) but such data has a different type.

Awesome project

Thanks for sharing this gem. A couple of questions from my side, if I may:
1. there was already a question on "why do this, and not use X programming language instead?" - could you elaborate a bit more about your initial motivation, beyond strict constraints in performance/latency? In my career I've seen a lot of such in-house programming languages (mostly in the finance world). Does the effort really pays off in the end? Problems with hiring and training new people, maintenance, lack of libraries? I am currently re-writing my hardware development/synthesis/verification library for the fifth time, each time running into some arbitrary limitations of the chosen programming language and start to tinker with the idea of going in the DIY-PL direction too :)
2. How long it took before you had a production-ready version running? How often do you find bugs in production that are related to the PL environment, and not the actual software written in it?
3. How you've managed to convince your management about benefits of open-sourcing such a thing?
Thank you.

Thank you!

Well you can always take the "isn't it possible to do the same thing with X?" path. Maybe similar things could have been built with Agda or Pascal or Java or Haskell or Python or whatever. Many interesting propositions have more than one proof. Many interesting problems can be solved with more than one program. :)

There were several problems that motivated the development of this project. I hope to write more about those in detail in the coming months.

The first version of this JIT compiler was in use in production in a major trading system within three weeks of starting. Of course it didn't have every feature that's in it now; I've been adding to it off and on for the last three years.

Management here have been very interested in open source for a while actually; convincing them was not a major task. It makes a lot of sense to open source this project.

Rust

This is very close to the kind of languages that I am interested in, strict languages with concrete memory layout and type-classes.

It seems similar to Rust. Certainly you have type-classes that monomorphise at compile time, and data and function pointers compatible with C/C++.

In addition Rust has memory safety (affine types) and thread safety (exclusive mutable references).

Are you aware of language features you have that Rust does not? Does it have a garbage collector, if not what are common allocation strategies (RAII, Arena, etc)?

Interesting topics

I've thought about substructural types and dependent types -- actually there's a limited form of dependent types already in now with a couple of interesting uses but not yet general sigma/pi type intro/elim or things like equality types (I keep HoTT in the back of my mind though).

I haven't spent a lot of time with Rust and I haven't done a feature-by-feature comparison with hobbes, but there's definitely bound to be some overlap (and with other PLs too).

The README goes over a few distinct features that have made this tool particularly useful in our environment. I hope to write more about it in the coming weeks, but perhaps from the storage and networking sections of the readme you can get an idea of some of the things we're doing with it.

In terms of GC, hobbes has a region allocation method and this fits very well with low-latency trading applications. We also process large volumes of trading data in real time and this allocation approach has also worked well here. Currently it's a manual process though, so we tend to have pretty coarse regions. Perhaps we might try out some region inference methods, I understand that others have had some success with this.

Region allocation

Good stuff, region allocation is the same as the arena pattern. I have found that stack + region allocation handles most of my use cases, and is good for high performance real time stuff. I look to allocate all resources before an algorithm runs, so that there should be no runtime out of memory exceptions, of course this is not always possible, but I am looking to manage this with a monadic effect system.

I like HoTT too, but I don't think dependent types as normally implemented are right for a programming language. For me the value of a type system is in what it can statically prove about the program, and you have disjoint-union (sum) types for runtime purposes. I see no point in having runtime features in the type system, as your value level language is already there for that. If you merge the type and value levels you are basically adding Prolog like features to the value level, which seem contrary to the need for real time, high performance computing.

How far have you gone with things like kind-inference and higher kinded types? What instance selection method are you using for type classes? (global coherence? Committed choice?). I personally favour global coherence with backtracking and disequality (as opposed to the more usual closest-match specialisation).

Edit: Another question is, why is it so reliant on C++ for IO? Why not implement native IO libraries in Hobbes itself? Also what happened to Calvin :-)

Calvin ...

... actually was the internal name for the compiler originally (the PL has always been hobbes). I had some nice comic-themed documentation and I liked the C&H overloading ("Curry and Howard" might make a good comic?). Obviously that's not kosher for a public project though. :)

Instance selection is basically first-match (like in pattern matching). I think that it's interesting to consider alternative approaches and the constraint resolution process can be "plugged into" (there are several built-in constraints that don't represent type classes for example).

For a while, the focus has been on structural types and calculations involving structural types. This has complicated the question of type constructor overloading where kinds are usually very useful. So for example, in Haskell it's very easy to say that 'List' has kind '* -> *' because that's how list types are already formed at the type-level, but in hobbes a list of 'a' looks like '^x.(()+(a*x))' (ie: a recursive variant, equivalent to List). There are also type constructors, but I'm not completely sure that I want to follow the same path as Haskell here (not totally against it either).

I'm not sure what you have in mind for the normal implementation of dependent types, but I agree that they shouldn't be used if they introduce unnecessary runtime overhead. I kind of see it as the reverse though, you need to waste resources at runtime without dependent types, doing redundant checks or storing redundant data that could have been resolved earlier. Robert Harper's comments on boolean blindness are a good example. The bool type looks like '()+()' and those two sides basically hold no information (other than which side they are -- a secret protocol you keep in your mind about what each side really means). But perhaps the type could be '(x<y)+(x>=y)' and at runtime it's the same thing because both branches of this variant are static facts about static program variables (the variables may have different values at runtime, but "this 'x' variable right here at this place in this program" is a static thing), and such facts can be representationally-equivalent to unit but not equivalent in terms of static information content. A more boring example are the file reference types in hobbes like '[int]@f' which are represented by a 64-bit int at runtime and are interpreted as offsets in the file "f" (an expression that yields a structured data file). This is a minimal data representation, lets you keep an '[int]@f' from being mixed up with an '[int]@g', and it works well with generic definitions / type classes that just peel off the reference (e.g. '(Show a) => (Show a@f)').

I hope to write a follow-up article some time in the next few weeks to get into some more detail about some of these things we're doing with hobbes in production applications.

Refinement Types

I think refinement types are sufficient, but we came across some of these issues in the HList paper. For example do you need types to represent all the integer values? How far do you take partial evaluation? Taking the example of a Mandelbrot set calculated from static constants, even though this could be calculated at compile time, this is probably not what we want.

In the lambda cube, dependent types are types that depend on values, it seems fairly clear.

It may simply be that my understanding of what a dependent type is, is too rigid, but the dependent type languages I have seen seem to collapse everything to one level, whereas what I want is to selectively lift values to the type level when it makes sense.

You also want some degree of subtyping, so having 'int' a separate type from 'int_is_zero" could be problematic, although we could define a type class for type level equality with subtyping.

In the end the inequalities attached to types can become complex like Int{x . (x > 0 & x < 3) | (x > 4 & x < 5) | ...} In which case representing the constraint as a refinement in the integer domain seems sensible. These are not dependent types in the sense of Pi and Sigma types, but normal types with value constraints inferred from the code. We can solve the value constraints using a logic-constraint solver or a SAT approach.

Good points

I find myself in agreement with much of what you say. You have a good grasp of the same issues we're dealing with.