## The Rust Language

Rust is systems programming languages being developed by Mozilla. It's very preliminary work, but the list of features suggests an interesting intersection of features. I'm particularly excited by the control over memory layout that Rust gives to the programmer, but it also has massive concurrency, structural typing, and the “ability to define complex invariants that hold over data structures” in its bag of tricks.

The main developer is Graydon Hoare, with contributions from heroes-to-the-Internet Dave Herman and Brendan Eich amongst others.

## Comment viewing options

### it also has massive

it also has massive concurrency, structural typing, and the â€œability to define complex invariants that hold over data structuresâ€ in its bag of tricks.

On the linked page, this actually refers to Typestate. The original Typestate paper is one of Jonathan Aldrich's "classic programming langauge" papers.

My immediate feedback is WHO CARES if your language supports typestate. Now you need a member on your core language design team who can design libraries that effectively use this feature. That should be the most exciting thing to look forward to. Thus far, I am unaware of any open source industrial libraries that use typestate.

### Thus far, I am unaware of

Thus far, I am unaware of any open source industrial libraries that use typestate.

Chicken and egg -- there's no 'industrial language' with typestate constructs so no 'industrial programs' take advantage of them. The interesting thing about this group of guys is that they're longterm Mozilla folks (well, hopefully to be for Dave ;-)) -- they know about the feature, they write code, and think they've been missing it.

Edit: What would be exciting, but does not seem to be the case, would be if the plan was to incorporate this into the Mozilla pipeline. That would, by far, set it apart from other proposals for new systems languages and make it more compelling.

### Oh, you!

[Responding to your edit:] Just you wait. We aren't going to ship Rust code in Firefox N for any N I can yet foresee, but we are not just hacking on Rust for its own sake. I spoke about this at the #moz10 summit; I'll blog about it shortly.

/be

### We care.

We would still be in caves with that attitude. WHO CARES about Ugg's new fire thing, we need cave-people who can use it.

Somehow, we got past that problem.

True, Rust needs library code. This is obvious and we acknowledge it; we're working on it. But the claim that "library" programmers do not effectively use poor substitutes for typestate (assertions) today is unreal.

Mozilla code is full of assertions. They need to be path-killers, but we can't afford their overhead in C++, or cope with assertion (check in Rust) failure without Rust's task model. I blogged about this years ago.

/be

### Whoa, der

I didn't claim there was substitutes, poor or otherwise.

I just noticed Rust doesn't have a standard library.

All I am saying is that if your library isn't designed well then you will have people either (a) rolling their own library; (b) sticking with whatever language and libraries they use now; (c) rolling their own language, whether it be a DSL for the task or a 3GL.

Most of my productivity problems as a developer seem to be overcoming really stupid APIs. If the language sucks, I can always use DSLs instead. But at some point I have to interface with something, and thats the API. If that sucks, I am screwed and my life is a nightmare.

It's actually a good time to be you, considering the work laid out before you by folks like Jonathan Aldrich and also JSR-308.

### Standard library coming, with typestate FTW

I thought your point was not merely that we lack a standard library at the moment (anyone can see this), but that Rust library developers might not be able to use typestate well, or at all.

Self-hosting the Rust compiler and building the standard library loom before us as next steps. These should prove typestate usable, or force changes to make it so.

Mozilla's use of assertions is well-established and part of our development process (we fuzz-test and automatically test debug builds, and treat assertbotches seriously). With Rust we can write check statements instead. This looks likely to be a pure win for our C++ hackers who try out Rust, whatever else happens. It's not a hard shift in programming culture (compared, say, to lack of nominal types and C++-ish OOP support), and it yields compile-time errors instead of possibly-missed runtime assertbotches.

Every C or C++ program I've ever worked on, including Unix kernels in my youth, had assertions and even runtime checks attempting to enforce higher level invariants than could be expressed in the host language's type system.

Typestate (along with better failure handling of course) looks exactly like what those of us frustrated by the debug-only and unrecoverable nature of assertions have always wanted. Even now, in Mozilla code, we optimize performance-critical paths by hand according to contracts and higher-level invariants or protocols that are implicit and unchecked, ignoring comments and assertions.

Of course, there are many ways to beef up one's programming language, or get a better one, to automatically check (statically, dynamically, or both) such high-level contracts. Many of you LtU members know more about this general area than I ever will.

But with Rust's typestate system we do not have to make a big bet on a researchy static-only type system, or a still-researchy dynamically checked contract system, or any such thing.

Instead, we choose a well-proven (but under-appreciated) approach that resembles our current use of assertions. We can't graft typestate onto C++ easily, although we have invested heavily in C++ static analysis. Anyway C++ would need other major changes to scale up to manycore targets safely. So we're exploring typestate along with other well-researched ideas in Rust.

Does this help? I still don't see why your first comment made an issue of typestate usability for standard library (or other) Rust programmers.

/be

### Oleg says:

Lightweight static capabilities. Oleg Kiselyov with Chung-chieh Shan. Electr. Notes Theor. Comput. Sci, 174(7), pp. 79-104, 2007.

No real point here except that you may want to see if it is not just smarter to invest in a type system which supports the above instead of rolling out an older approach.

### You write "older approach" as if it's a bad thing

Nice research, love Oleg, but no thanks. Rust (please read the FAQ) is about older ideas that have been well-researched and implemented, and which do not carry the risk inherent in newer, less implemented, less used, research.

/be

### Hell no

I agree with you, well-researched and tested is good. Just wanted to point out that stuff like typesafe handling of constraints is expressible in some typing systems by default. I have no idea how it compares.

Cheers, M

### Hmm, lightweight static

Hmm, lightweight static capabilities are based on an idiomatic use of ML modules and Haskell type classes. I'm not sure I would call those new!

### How does this differ from using phantom types?

You guys are obviously familiar with OCaml (since Rust is implemented in it), so how does typestate differ from the kind of invariants that one can encode in OCaml using phantom types? Is there a feature here in Rust that I should be jealous of as an OCaml programmer?

An aside: I do philosophically rather like the approach of using tried-and-true ideas, but I'm a little surprised by the notion that the Hindley-Milner type system is anything but tried and true. H-M is about 40 years old at this point, and reasonably high-performance implementations like OCaml have been around for a long time --- OCaml itself is 15 years old.

### Re: How does this differ from using phantom types?

so how does typestate differ from the kind of invariants that one can encode in OCaml using phantom types?

It's a good question. In the small scale, typestate allows for an imperative style that ensures memory safety; in a functional setting, this mostly doesn't come up. As for tracking user-specified predicates, you're right that phantom types are a lightweight mechanism for similar purposes. We're still iterating the language, and we'll have to do comparisons to see. Too early to tell.

Is there a feature here in Rust that I should be jealous of as an OCaml programmer?

In theory: concurrency and parallelism, greater predictability of performance, particularly with respect to memory layout, much higher levels of stack allocation.

In practice: let's wait and see!

Dave

### H-M

Oh, and I forgot to answer: no one said H-M isn't tried and true! But a) it stands on a very delicate knife edge -- you can't put exactly ML's type system on a language that isn't ML! and b) we can be much more confident of our ability to produce high-quality error messages with only local type inference.

Dave

### The way I see it..

HM is just an abstraction of general inference. It just boils down to a few rules on unification, generalization and instantiation. So, in that manner, everyone is doing HM.

### Is it HM if you don't generalise?

Except they're not, because one of the common design decisions out there is not to generalise automatically for the user. It's notable that GHC is abandoning local generalisation as well.

### Mutation / Linearity

I think phantom types without linear types are not enough for a faithful emulation of typestate.

I'm basing my reply on the typestate system as describe in this paper. The paper says their system is implemented in a language "NIL", which I haven't otherwise heard of.

Procedures are allowed to mutate their parameters, and in the typestate system a procedure declares a relation between input and output typestate. When typechecking following statements, the environment is updated with the resulting typestates.

This could be simulated with linear types, by making the argument the only reference to the value, and returning a new value with new phantom type parameters (linearity even supports true update-in-place, which seems to be necessary for Rust's systems programming aspirations).

In some cases like closing a file it is important that old references vanish. In other cases it is probably enough to switch to immutable data so using existing references is not fatal, and just return a new value with new phantom type parameters. This works fine in O'Caml (I imagine you already use some phantom types), but isn't enough for Rust's goal of working without assuming garbage collection.

Clean has been around for a while, but languages with linear types do seem to be a bit less established than plain Hindley-Milner.

The paper also describes a convenience which seems like it would be hard to emulate. For each type their space of typestates is a lattice equipped with lower bounds and lowering coercions (e.g. closing open files) which are automatically applied at join points - implicitly closing open files when an exception occurs, for example. I don't know whether this is a good idea, or if Rust plans to include it, but it does seem like it would be hard to emulate.

### I think what you are saying

I think what you are saying is my comments came off rude. I apologize. Can you forgive me for being a jerktard?

On a separate note, have you ever designed a standard library? How big was its scope? I expect this issue to be settled in the typical open source manner, where the one who writes the code decides what it looks like initially, and others augment the API based on experience with actual usage. Yet this can cause a lot of inconsistency.

It's not easy, even with experience. I spent two weeks recently just researching and arguing with myself what the interface for a Uri type should be -- such as looking at implementations in .NET, Java, Python, Ruby, Go, and Haskell. Uri was one of the more obscure problem domains to model, even after reading Tim Berners-Lee's RFCs. I spent the time on this mainly because I was not convinced (a) current Uri model implementations were good [.NET, for example, has a slew of obsolete or deprecated methods on the Uri class, and the class itself is the largest one in Silverlight]. (b) sticking with the status quo was a good idea, especially since I believe problem domains and experience should be the motivating factors for language design. Another example is Time & Date. (Some have given me feedback on this and have told me it is silly to sweat these details when there are much larger issues to tackle; I disagree.)

I think from this conversation we both agree understanding problem domains and relying on experience and proven techniques while carefully discarding cruft is the best way to plan for piecemeal growth of large systems.

### You ask an honest question

Have I ever designed a standard library? I'd like to lie and say "no", but the answer is "yes": I designed JavaScript's standard library (Clause 15 of ECMA-262, Edition 1, plus or minus) along with the "DOM Level 0" (onload and onclick and document, window, forms, etc. -- a great deal of this is only now standardized by HTML5).

I did this in about 30 days in May and June 1995 (10 days in May for the core language), and spent the rest of the year debugging and repenting. I was under orders to make JS look like Java, so some of the standard library is cloned from the JDK1.0 -- in particular, Ken Smith of Netscape did a pretty straight port of java.util.Date, Y2K bugs and all, to JS.

(There, got it all out. Throwing up now...)

Yes, it's hard. Much worse if rushing, and done solo, or nearly solo with another language infecting your (different) language's library design.

So with Rust we launched on github precisely to avoid both of these pitfalls. But we have some savvy hackers too.

Hey, so did Unix. Mike Lesk et al. gave us a bunch of C standard library code that took years to standardize. Some of it was, like my own poor efforts, a botch (gets(3S) -> Morris worm). But as dmr wrote in his C history, C is a success in spite of the quirks and missteps. So too with JS.

With Rust so far, Graydon (who is smarter than I am) has had other advantages: years of time to study and think as well as design and prototype, mad OCaml/Valgrind/GNU-toolchain skills, and best of all a stronger programming languages patrimony than JS's Self/Scheme/Java-or-really-C-and-mostly-syntax/Perl-regex/Awk(!).

Now that the Rust design is prototype-implemented (mostly) and seems to hold together, we are ready for select "lead users" and super-strong compiler/runtime hackers.

Between all of us now working on Rust, plus those about to start working on it via github, I am confident we'll evolve good library code. But enough talk, let's do it.

/be

### Two kinds of fools...

Brendan: I'm sure you know the old saw about there being two kinds of fools. You are certainly no fool, and in consequence I find some of your statements here very puzzling.

Your statements here read (to me) as if you are saying typestate as "old therefore good" and modern type systems are "new therefore untested". This is flatly mistaken, and you should reconsider your view - particularly if it reflects your actual motivation for design decisions in Rust.

Rob Strom and Shaula Yemini (who co-invented typestate) would be the first to acknowledge that typestate has never been adequately explored as a core element of a programming language. On the other, the "new and presumably untested" type systems that you are apparently reluctant to adopt have been in wide-spread use in multiple languages for nearly 30 years. Yes, there are new and bleeding edges, and they should be viewed with judicious caution; that's not the same statement.

You probably know that my work on BitC shares something of your view: we tried consciously not to invent. Because of your statements, I'm concerned that you may be getting led off the mark in Rust. Since I would very much like to see Rust succeed, that seems unfortunate.

Typestate is a promising idea. I've looked at it as well, and I think it holds a lot of interest for static checkers and for programming languages. I'm happy to see Rust exploring it, both because it's an interesting idea and because BitC has gotten a lot of mileage out of borrowing from other successes.

If it would be helpful, I'ld be happy to stop in and spend some time with you and your team trading thoughts and experience. I think you know how to reach me.

### Hi Jon, long time since SGI.

Hi Jon, long time since SGI. My argument with Z-Bo was compromised in a couple of ways, so let's start over.

We're implementing Rust in OCaml, so we have some experience with more modern type systems than the one in the implementation language used by every competitive browser implementor (C++). We've used SML-NJ for another project, so you should consider us to be singing with the preacher in that choir.

My "older is better" remarks were aimed not at the ML family, which is a definite influence on Rust, rather at Haskell, in particular Oleg's paper brought up in this sub-thread.

I don't dismiss either Haskell or research built on it in the least. Even the Galois Connection folks have shown production wins using it for some programs.

But Mozilla cannot bank on anything of the kind at this juncture. We are taking enough risk building a parallel browser engine that won't speed up without more cores than currently ship on common machines. (The Actor-language aspects of Rust, along with the memory controls, are at least as interesting as typestate from this light.)

What's more, we have a large community of C++ hackers to migrate assuming this comes off (not a sure thing). They will have enough ML-ish type system goodness in Rust to swallow (and some will choke, no surprise) without us trying to push anything resembling Haskell.

Finally, Rust is not my baby, but I'm supporting it as a Research (Mozilla Labs) experiment and we are not messing around idly with it. It is therefore not likely to be fruitful to meet with only me or the people in Mountain View -- we'd need Graydon Hoare, who is in Vancouver.

We have others wanting to meet too, so I'd better get back to you after calendaring a bit. Appreciate the offer.

/be

### Been There, Done That

Having went through the hoops of implementing a compiler in Ocaml, I advise against it.

### Quick response

1. Haskell definitely isn't appropriate for what you are doing. You need eager eval and first-class state.
2. That said, the HM-style type system and Haskell-style constraints are both useful - even for your target audience. We've successfully pulled both into BitC, and we're getting close to a first release.
3. I'm in the Seattle area, so dropping in on Graydon up in Vancouver wouldn't be a particularly big deal. It's very drivable from here if that makes matters any easier.

Our objectives are similar to yours, I suspect, so it would be interesting to have a chance to compare notes. See also my comment below about anticipating convergence.

### Just when I thought I had an

Just when I thought I had an acceptable handle on the types of research in dataflow analysis, a whole new field opens up. Typestate looks pretty interesting, and there are plenty of novel papers on the subject. Should have lots of overlap with linear/uniqueness types, refinement types, contracts and other lightweight verification techniques.

My main typestate concerns would be concurrent settings (LtU post) and abstracting over type state. For instance, consider an abstraction that has state X, Y and Z, and a valid transition from X->Z and Y->Z, but X and Z are not ordered with respect to each other. Can I specify a function that can make the transition to state Z regardless of whether the object is in state X or Y? Much reading to do...

### Rust will eat your laundry

Rust language

This is a very preliminary work in progress. No supported releases yet nor defined release schedule / plans. Caveat emptor. It will crash. It will change syntax and semantics. It will eat your laundry. Use at your own risk. Etc.

I've recently done much reading of work by Adam Chlipala, who has been studying certifiable compilers for a while now - albeit for a much simplified language and to 'idealized' machine language targets compared to Rust. Comments like the above, regarding Rust, are gradually scaring me into submission to the Coq.

### Ur/Web

Have you given Ur/Web a look yet? I would be very interested in your opinion of it. :-)

### Re: Ur/Web

I read up on Ur/Web when it was mentioned a while back. I think it interesting and well motivated, but I've not explored it thoroughly enough to form any justifiable opinions about the Ur/Web language. (I do love its icon.)

### 2 cents

Rust is not a particularly original language.

I haven't looked at Rust in detail yet, but it's my dearest hope that more designers of new languages would take a similar stance to language design.

### Nobody asked this, so I'll

Nobody asked this, so I'll try:

How does it compare to other systems programming languages such as BitC and ATS in terms of "practicality" and what-have-you?

### BitC and ATS

I'm only barely qualified to comment on this, but I had the good fortune to attend Graydon's Mozilla Summit presentation and chat with him afterwards. I've read the Rust code and documentation, and contributed some small patches to the bootstrap compiler. I've read about BitC in the past, but not in depth. I am not familiar with ATS at all, so my comments below are based on a quick glance through its documentation.

Rust is much less mature right now. The implementation is not fully working and the design is in considerable flux.

The Rust compiler has x86 and LLVM backends, while BitC and ATS compile to C. Rust code can make direct calls to C library functions. Rust supports RAII using deterministic destructors. Rust uses actors for concurrency, while ATS appears to use threads and BitC has no (?) concurrency. Rust has task-local GC, growable task stacks, and Erlang-like task trees. Rust and BitC have disjoint union types, while ATS does not (?).

BitC and ATS include theorem provers. Rust uses typestates and dataflow analysis to accomplish some similar goals, but not complete formal verification.

In many other ways the three languages have very similar designs: Static, safe, compiled, multi-paradigm, modular. Types are inferred; programmers have control over memory layout of structures; side effects are allowed but are explicitly marked and statically checked.

### sounds like ten years from now

(or maybe 20) would be a nifty time to be programming. guess i should consider taking care of myself better, to last that long.

### Minor corrections re: BitC

Back end: BitC compiles to C today. That was a bring-up expedience. We need to get v1 out there for a sanity check (not long now), so bringing up real code generation using wither LLVM or the Plan 9 back end is a v2 objective.

Tasks: BitC doesn't have a notion of tasks at the language level. One can argue about the right place to put it. Multicore is on our horizon, but we haven't spent a lot of thought on it in the BitC context yet. We do now have an answer for low-level concurrency management that we thing we like. Also a thing for v2.

Theorem prover: BitC has dropped its theorem prover. We decided that (a) most user wouldn't be able to use it, (b) having used it, most users can read the theorems to determine whether they were useful and relevant. Where appropriate we'll fall back to Coq or Isabelle/HOL.

As you say, our objectives and choices are similar, which is part of why it would be really interesting to get in a room with Graydon and get a deeper understanding of where we made different choices and why. I suspect the differences are probably motivated by either differences in objectives or differences in beliefs about adoption, and that's interesting to me. I think that multiple choices in the ecosystem is important, but that one is likely to come out ahead of the others. Preparing for the inevitable convergence now seems like a good thing.

### Refcounting

Rust requires that concurrent shared data structures are immutable, but its GC semantics for immutable data are based on reference counting. This completely negates the point of making the data immutable! Am I missing something?

### Type system enforces immutability, no MMU needed

Ref-counting in the current implementation is single-threaded, no locking at all. The GC is also single-threaded and only needed for cycle collection. Tasks in the same domain thus cheaply share data and finalize eagerly where possible.

Currently, immutable data crossing a channel between tasks in different domains is copied. We use lock-free queues between OS thread domains. We're going to build on this implementation and measure real Rust code (non-trivial code) before doing anything fancier.

We have lots of ideas for dealing efficiently with large shared immutable data to avoid copying, and others have researched the problem, but it's not important yet.

I do not think in any scenario that we would mmap immutable data read-only using the virtual memory system of the OS, in order to enforce immutability at the MMU level. The type system is enough.

So nothing is "negated" even in the same-domain setting where mutable and immutable data is ref-counted lock- and barrier-free.

/be

### If you're referring to cache

If you're referring to cache and concurrency issues, there are ways to implement reference counting without incrementing a shared counter on every access.

If I'm reading the draft spec correctly, there's no *actual* concurrency between tasks in a domain, is that right? As in, two tasks in the same domain being executed concurrently on two different cores.

I'm fully behind the notion of having logical domains, that would also manage some resources etc., but I don't see why we should squander the opportunity to make use of spare CPU resources if we have "thousands to milions" of tasks available...

What I mean is, what if all of my domains are totally idle, except one which has a thousand runnable tasks. Is my CPU going to be mostly idle except for one thread? Will I have to split my domains up manually to improve CPU utlization? Will I have to put one or more each of these manual "slices" of each logical domain on each CPU to load balance between the logical domains if they have uneven work loads over time? Or do I have to manually migrate tasks between domains to keep the CPU busy (effectively manual, and less efficient since it would have to happen via messages, work-stealing?).

This is exactly the kind of manual book-keeping that "concurrency and efficiency oriented" languages are supposed to handle so that we can write apps with tons of parallel slackness that can scale to dozens of threads automatically. I sincerely hope that I've just misread the draft spec because this seems like a fatal flaw. Which would be a shame because I'm pretty impressed by everything else (well, closures is an obvious contentious point that I do think you've got wrong as well).

### Thousands of Domains

The idea is to have thousands of domains with relatively few tasks inside of them. These domains could all run in parallel on multiple threads or processes. This would effectively keep your CPUs plenty busy.

### To say a bit more

Rust will be used to saturate many-core CPUs, but our experience is that the best way to do that is to dedicate tasks to domains so you know what's not hardware parallel, and separate by domain the tasks that must be truly concurrent (let's say parallel).

Examples include things like parallel video decoders (once the entropy-coded bitstream is parsed, you can farm out macroblocks to different domains' tasks), parallel CSS layout (watch out for floats and other feedback), parallel HTML+JS lexing and parsing. Within each domain you may want tasks for good Actor-model reasons, even if they do not run in parallel on different CPUs.

If you want the system to schedule tasks to CPUs, put each in its own domain and away you go.

/be

### Hardware/OS thread is the wrong primitive for parallelism

But you say in the spec that a domain is an abstraction for an OS thread, that means we can't run tasks in their own domain. The OS overhead alone is orders of magnitude too high, plus the extra overhead that Rust itself adds for domains.

So assuming that domains won't be literally an abstraction of a single OS thread each, but will actually do an N:M mapping on OS threads (with roughly one OS thread per core). That leaves the cost incurred for communication between domains, as well as the per-domain overhead in terms of memory and what not (which must be comparatively significant if it has its own task scheduler). The appeal of tasks is that they're lightweight and cheap. Wrapping them up individually in a relatively heavyweight wrapper - which is also unnecessary since the type system already provides all the guarantees you need for safety - makes them kind of pointless, if you're interested in parallelism (i.e. performance).

This is exactly why Everyone (tm) is working on task parallelism with light weight tasks, which can have fairly complicated and dynamic communication patterns, that get executed on a per-core worker OS-thread, with work stealing to load balance the tasks and keep all the cores saturated (see e.g. cilk). If only perfectly isolated tasks can run efficiently in parallel (due to the communication overhead, and the per-domain overhead), then that's a serious limitation, IMO.

I don't see how anything about domains is compromised by scheduling its tasks to be run on any available resources. The downsides seem obvious, so what's the *benefit* of this restriction?

I don't mind having "bound" domains (like bound threads in Haskell) that will only run on one thread (for things like bindings to libraries that aren't thread safe) when you really do care that something stays on the same OS thread, but I do think that's a tiny minority of the cases (and if your code needs to run in such a domain it should be associated with some sort of stigma). The more common scenario is where you want hundreds or thousands of "jobs" that need efficient communication between each other, but also want them to run on all available cores - the right abstraction for such jobs seem to be tasks, not domains.

We designed domains to be single-threaded to reduce the overhead of having many small tasks talking to each other. Everything from scheduler data structures to reference counts is done thread-local, which is much cheaper than synchronized access.

At the same time the system supports thousands of independent domains. Tasks in separate domains communicate by sending message between domains. Messages are delivered using lock free queues.

We expect this system to work very well in architectures with many cores that are more NUMA-like that todays SMP systems, and I think its a good bet that this is very hardware development is headed.

### Sure, but why couldn't tasks

Sure, but why couldn't tasks in a domain be scheduled among any available cores in the current NUMA "node"?

It just strikes me that domains are way too heavy weight as your lowest-level concurrency primitive. Can I really kick off one domain per game entity in my game? I can see how I would just about live with the overhead of a lightweight task, but a domain with its own scheduler data structures etc.? I doubt that it would be feasible. That means I can't easily run my game-entity logic in parallel with your system, which I could in C# TPL, or Cilk, or any number of other systems, unless I'm prepared to write a significant scheduler on my own, with my own "task" abstraction that I spread out among a small number of domains. That's exactly the kind of tedium I'd want the language/libraries to deal with.

Almdahl's law implies that the attainable degree of parallelism should be a major priority for any language that bills itself as "concurrency and efficiency oriented".

EDIT: also consider that the cost of communicating between domains would make it effectively impossible to write a TPL/Cilk style library on top of Rust. Actually, I'm not convinced that even tasks are lightweight enough. Perhaps you need something even simpler (e.g. "jobs" that don't have a stack of their own, i.e. can't yield, but can block) that can then be shuffled between multiple domains running worker threads. But yeah, that shuffling would need some "special" support to avoid the overhead.

### I don't think domains are

I don't think domains are inherently so expensive nor preclude ideas like work stealing. In particular, esp. for a language investing in type state, you can consider taking advantage of techniques like Zach's lightweight sharing annotations for guaranteeing safe and cheap crossings (where, without such a safe sharing mode, you require message passing). This is also one of the few areas that get me excited about type systems. Singularity seemed to have been moving in this direction originally as well but their theory people seemed to have abandoned that part of the project. As a final consideration in this direction, in a multi/manycore OS, what we currently perceive as an expensive system interaction may no longer be.

Btw, I think work stealing abstractions have quite a ways to go. E.g., Cilk++/TBB support for locality is horrible. When you actually know the 'right' parallel algorithm for a task, such as for the kernels within a browser that Rust might want to target, programming to the the typical work stealing interface is quite hard. This is a shame because (modulo the lack of tools like Zach's) work stealing is my favorite way to (initially) code non-SIMD parallel programs. OTOH, it's a great area if you do research :)

A task in a low-overhead task parallel system is cheap because each task runs to completion (though may block) so you can run it on the worker thread's stack. So it really only needs a few words of book-keeping data and a function pointer.

Tasks in Rust have a stack, and other things, which already make them more heavyweight (and *possibly* already too heavy to assign one per game entity, although I know games that use Lua co-routines per object).

And then that brings us to domains, which are even more heavy weight because they need at minimum one task, as well as a scheduler, and presumably a bunch of extra book-keeping to keep track of per-domain memory constraints etc.

It just strikes me that Rust in its current form has no support for lightweight task parallelism. Which means it's limited in the degree of parallelism it can expose, which means it's limited in how much it can scale. This is a shame because it looks pretty awesome overall, but it just seems like this is not the area you want to make compromises in for a concurrency oriented language.

### Sounds like I need to reread

Sounds like I need to reread their proposal!

### Could you be more specific?

Could you be more specific? That comment isn't very helpful at all.

### sylvan,

I really appreciate your thoroughness. I've had similar complaints about the ultimate limitations of languages like Scala and F#. While these are both well-designed languages, they are ultimately at the mercy of many physical layout characteristics and programmers must write code using the static production rules of the language while assuming the impact of those physical layouts; jgit is a good example of some nasty code written to performance tune Java code for DVCS. I also don't think most people even appreciate how much most OS kernels, like Linux, have changed in the past 10 years.

It seems to me that what is really needed is co-design. But there appear to be a couple of ways to achieve this. How would you personally suggest it be done? The only other alternative seems to be DSLs external to the language. (By the way, I don't know who you are? Are you an academic who has written papers on this?)

### What do you mean by

What do you mean by co-design? Between languages and hardware? If so yes I agree that would be helpful, but no I have no clue how to go about achieving it.

And no, I'm not an academic. I'm a game developer (mainly graphics, C++). Languages are a hobby.

### Machine Learning and tooling

There was a recently completed Ph.D. thesis that argued for machine learning to achieve hardware-compiler co-design. Further co-design would be compiler/library co-design through active libraries.

Awhile back there was a Google SoC project focused on primarily providing Boost with actors-style concurrency (Surge), with transparent assembly optimizations. But it was a lot of manual coding and showed the weaknesses of Boost and C++ in codesigning software. Dave Abrahams actually hired the student to work at Boost Consulting, because he dropped out of community college after dropping out of a game programming major at I think DigiPen. Dave said he really didn't believe a college student could master Boost, let alone design a Boost library with the student having full control of the API decisions and responsible for maintaining consistent Boost-style, but was really impressed by the proposal, so much so that it opened the door for future students in future years to do Google SoC Boost projects.

Matt has argued contrary to my opinion that C++ isn't well-fit for such codesign. See this old thread and his comments here. But results, IMHO, say otherwise: Boehm's C++ reports just show how big a concern it is.

### Zach's lightweight sharing annotations [PLDI2009] & Singularity

In particular, esp. for a language investing in type state, you can consider taking advantage of techniques like Zach's lightweight sharing annotations for guaranteeing safe and cheap crossings (where, without such a safe sharing mode, you require message passing). This is also one of the few areas that get me excited about type systems.

I am confused. I finally read Zach's PLDI 2009 paper. The use case for these lightweight sharing annotations seems pretty narrow to me - the advantage appears to be that multithreaded systems can currently make better use of cache locality than nested data parallel systems (which have more predictable addressing). See Section 2.1 of the paper where Zach does a good job defining just how narrow this approach is. Its main value seems to be legacy code. Like Cyclone and CCured, they attempt to put a champagne gown and lipstick on C. To put this in context, ca. early 00's, here is an example Cyclone program, with Java-style syntactic annotations:

// See http://www.cs.umd.edu/~mwh/papers/cyclone-cuj.pdf
#include
int main(int argc, char *@fat *@fat argv) {
argc--; argv++; /* skip command name */
while (argc > 0) {
printf(" %s",*argv);
argc--; argv++;
}
printf("\n");
return 0;
}


First, is this the cleanest way to process command line arguments? How do we measure maintenance? The problem domain for Cyclone's toy example is not well-defined, so we don't know, but there are many command line options processors.

Second, @fat annotations (seen in this example) in Cyclone were similar to Java's dynamic bounds checks, except for the fact that the Java compiler could statically infer repeated bounds checks are unnecessary.

Third, the Cyclone authors had some benchmarks about Java programs vs. C programs vs. Cyclone programs, using the Debian benchmark. But the Debian benchmark is pure garbage, because the speediest programs in these benchmarks have been excessively hand-tuned by performance experts and don't reflect real world code, since the number of programmers who know the idioms for abusing the static production rules of these languages is remarkably small. They do not represent what it is like to actually write efficient code and still be productive and not end up with a product with a "lame ship" (as Doug Church used to call it). Look at how badly the authors explain their benchmark: "If we assume these were poorly coded and remove them from the averages, then we still see a factor of 3 slowdown for the Java code, compared to 0.4 for Cyclone." Yes, let's benchmark and not bother to explain at all why the benchmark is what it is - just toss in the phrase "outlier" and we're done! They're also always going to be dependent on an OS kernel, which isn't necessarily a bad thing, but when you look at the changing hardware landscape, the big thing that made Linux so successful (Linus wisely focusing on x86 rather than abstraction messes for portability reasons) might no longer apply.

Fourth, features like "Tagged unions" really appear to be missing the point entirely about how to write clean code.

I could go on. In short, it appears not much has changed in 6 years, other than the news that the Linux kernel should be getting safer thanks to industry-sponsored research into automatically eliminating kernel defects. In other words, I don't exactly see how lightweight sharing annotations will be of long-term interest to systems programming. The fact SharC is more gradual in how it can incorporate this tooling than Cyclone is another small win, but has more to do with legacy software than the future.

Singularity seemed to have been moving in this direction originally as well but their theory people seemed to have abandoned that part of the project.

When I think of this, I think of neelk's comments in the Pluggable Types LtU thread. From neelk's comments and what I've read about singularity, this approach is very different from CCured, Cyclone and SharC. Those other techniques can be thought of as trying to make C a safer assembly language (in SharC, via dependent types), while Singularity was (partly) aimed at making assembly language safer through the use of linear types to help resource management. I think the comparison is mostly superficial in nature. For example, linux-sparse contains annotations such as __user that indicate what address space a pointer points into. Do those same annotations make sense for Singularity? No. (Pragmatically, this also suggests linux-sparse and tools like SharC need more integration.)

So that brings me to my big question:

[SharC had a cool idea.] This is also one of the few areas that get me excited about type systems. [Singularity had a different cool idea.]

Which one are you more interested in? Do you want to make [C] a safer assembly language, or make assembly language safer? I would say these are pretty divergent approaches. What about BitC? Or are you just excited about improving legacy software that is already highly tuned for out-of-date architectures? It would be interesting if there was a study to see how much C code in the Linux kernel could survive a CPU architecture shift. I don't think there is.

### sharing invariants on shared memory ftw

The use case for these lightweight sharing annotations seems pretty narrow to me - the advantage appears to be that multithreaded systems can currently make better use of cache locality than nested data parallel systems (which have more predictable addressing)

If, by nested data parallel, you mean SIMD/vector instructions, with some sort of flattening or threaded process revealing it (NESL, CUDA, ..,), unfortunately, a lot of important problems don't match this.

Parallel code working on shared data structures comes up a lot, even in the nested data parallel case (now relaxing the term to include systems like TBB). I often hit cases where, while my basic formulation fits a nested task parallelism model, I still want to use say a concurrent hashtable for a little part or talk to some other library: there are both algorithmic and code reuse challenges to having full independence. While the basic algorithmic problem meriting parallelization can be decomposed by the data (and typically should), unfortunately, there'll often be a little bit that isn't, and it's not worth going into contortions to fix that, or even unclear if you even can. If this wasn't the case, I'd be writing an entirely SIMD/vector/CUDA browser for massive performance and power/energy benefits. One of the coolest things about Cilk++, to me, was the hyperobjects / reducers library for easing the pain here -- even they acknowledged there's a problem!

See Section 2.1 of the paper where Zach does a good job defining just how narrow this approach is.

2.1 describes a technical problem not solved in the previous paper that is to be addressed in this one. Not sure what you meant by 'this approach' (overall direction, this paper, use of shared memory, ..?)

Its main value seems to be legacy code. Like Cyclone and CCured, they attempt to put a champagne gown and lipstick on C.

A good principle for language research is to focus on your innovation: in this case, they are looking at controlled sharing for systems languages. Whether it's C, D, C++, etc. doesn't matter so much. I think they made a great choice: I was able to come in and say, for a wholecloth approach like Rust, that there's experience on running real apps with a shared memory abstraction that matches their default non-share desire. If you're concerned that they're presenting a systems language that's an extension of C... while that's a feature of their approach, it's not what the work was about.

Finally, again, shared memory abstractions -- with the additional challenge of a low-level language -- is not just about legacy code: I am writing new systems code from scratch and for future hardware that (needlessly?) hurts in this area. The Rust crew seems to as well. This problem area is sticking around for awhile further.

First, is this the cleanest way to process command line arguments?...

At a minimum, given the line count deltas in Zach's evaluation, annotations for examples at this level are unnecessary. This example seems out of context and irrelevant to the topic of annotations on shared data structures.

Second, @fat annotations (seen in this example) in Cyclone were similar to Java's dynamic bounds checks, except for the fact that the Java compiler could statically infer repeated bounds checks are unnecessary.

I don't see your point. The requirement for performance similarly steers SharC towards static analysis when possible -- is that good, bad, not novel, ...?

Or perhaps you think these checks can be fully inferred? That would be great for code evolution (even for future code). However, when writing a parallel kernel, expressing intended invariants and automating their checks is still important, such as for code reuse or maintainence by others.

Third, the Cyclone authors had some benchmarks about Java programs vs. C programs vs. Cyclone programs, using the Debian benchmark

Dan Grossman is not Zach Anderson: Zach never compared C to Java, etc., so that seems irrelevant. Zach compared, among other things, GIMP extended with Shoal annotations and runtime to GIMP without Shoal annotations and runtime, etc. Zach -- and his group -- have a great track record in this area (George Necula, Eric Brewer, David Gay, etc.). I don't know the context of the Cyclone evaluation, such as the intended conclusion or the scope of the evaluation, so I can't really respond to that.

From neelk's comments and what I've read about singularity, this approach is very different from CCured, Cyclone and SharC.

Which approach is 'this approach'? Singularity, for better or worse, has a wildly branching history. I'm referring to software isolated processes using an exchange heap. Hopefully the comparison is clearer: default non-share with explicit annotations and guidance to handle sharing. Interestingly, exchange heaps seem to have died down as a topic of conversation.

I don't have experience writing code in any of these systems. I do have experience writing parallel and nested parallel algorithms at a systemsy level using pthreads, cilk, etc., and I agree with the observation that sharing should not be the default but still should be supported. Sylvan's point about being unable to implement lightweight task parallel frameworks within Rust is an example where that would have helped. (I still need to think about his statement that even sequential tasks would be too expensive).

Which one are you more interested in? ... Or are you just excited about improving legacy software that is already highly tuned for out-of-date architectures?

Typestate etc. to get a handle on shared memory for parallel kernels. Types for low-level sequential code etc. is important, but I think there's a practical difference (which would be interesting to quantify). We have an amazing churn and acceptance rate for sequential systems code (through testing, frameworks, best practices, code review, etc.), but not so much for (shared memory) parallel code (despite being, in a sense, the least restricted form of algorithmic parallelization). I don't know the solution, but find the line of research here appealing (and, as noted, complementary to other language-level techniques such as those in Cilk).

Sylvan still deserves his response so I guess that'll be in another couple days.

1. assuming you were excited about SharC, specifically
2. dragging Cyclone in too much into the discussion

For 1, I think it IS important to divide this into legacy vs. future r&d. Zach needs to decide which path he wants to explore for a career. I explained my thoughts on how his ideas might be better integrated with the kernel social processes.

For 2, I was mainly using it as a point of reference and trying to draw out many factors in what I think will amount to the next big system language, such as readability.

I don't see your point. The requirement for performance similarly steers SharC towards static analysis when possible -- is that good, bad, not novel, ...?

Or perhaps you think these checks can be fully inferred?

There are a couple of good papers on this - your thesis advisor wrote one of them.

Finally, again, shared memory abstractions -- with the additional challenge of a low-level language -- is not just about legacy code: I am writing new systems code from scratch and for future hardware that (needlessly?) hurts in this area. The Rust crew seems to as well.

But Singularity emphasizes resource protection, not sharing. You first describe communicate, and from that communicate you infer sharing. This is a radically different approach. So I didn't understand what you found exciting -- but it is clear now that "(type-safe) shared memory abstractions" are what get you hot and bothered, not necessarily typestate, substructural types, dependent types, tracked types, guarded types, linear types, separation logic, universes, whatever.

### With regards to your edit,

With regards to your edit, it seems you want to implement work stealing at the library level. While TBB does that (with a cost), most implementations don't. My above comment is about extending Rust to support these ideas at a language level, which has benefits atypical of library-level approaches (such as static checking of sharing between tasks). In particular, shared state work stealing would be a verified extension under what I described, presenting a possible evolutionary path if you do want Cilk-style work stealing.

### Not really. I was pointing

Not really. I was pointing out that unlike other languages you probably *couldn't* implement it as a library in Rust, because of the built-in overheads for domains.

### An "auto" domain might suffice

To let the runtime decide. We should consider this for Rust.

Again we are low-level systems programmers thinking about Amdahl's law and multicore applied to browser engine workloads (lexing, parsing, script, layout, rendering including image and video decoding).

We're definitely going to wire some tasks to CPUs. But this application shouldn't exclude other use-cases.

/be

### Yes, that would certainly

Yes, that would certainly help, though it still leaves you with the problem that Rust tasks are fairly heavy-weight compared to task parallel systems such as Cilk or TPL, which limits how much parallelism you could exploit, which directly hurts performance in accordance with Almdahl's law.

As I see it the parallel potential of a program is inversely proportional to the overhead of spawning parallel work. Less overhead means you can push the threshold for when it's worth spawning something in parallel further.

The problem is that constructs that have low overhead and are therefore good for *parallelism* are not flexible enough to use for *concurrency* (where the former has to do with speeding up something you'd prefer to write sequentially if you could, and the latter has to do with running things at the same time and you'd do it even if you only had one core). Maybe we need different constructs for these two different cases?

Or maybe tasks could start off as light weight tasks that don't have their own stack and run on a worker thread a la cilk, but if and when they encounter their first "yield" they are *converted* into much heavier-weight co-routine style lightweight threads with their own stack... That way short-lived jobs and long-lived threads could use the same "task" semantics, and let the runtime sort out how to represent it dynamically.

Although even if you got that working you'd still need data parallelism for even less overhead, when possible. I'm fairly convinced that there's no one paradigm that will solve this issue. We need different systems for different scenarios (at least on the runtime side, it could be that the "user interface" to this is using the same language construct).

### Small nit

Yes, that would certainly help, though it still leaves you with the problem that Rust tasks are fairly heavy-weight compared to task parallel systems such as Cilk or TPL, which limits how much parallelism you could exploit, which directly hurts performance in accordance with Almdahl's law.

You've mentioned Amdahl's law several times now. A more richer model is the Work and Span model. Just making sure you are aware of this, since Work and Span reformulates the formula given by Gene Amdahl into a model that actually corresponds to a model for concurrency.

### Abstract Intepretation

Some abstract interpretation frameworks which enable the same types of analyses as typestate. For instance, Near-Concrete Program Interpretation, which supports path-sensitive and context-sensitive analyses, which can be used to enforce temporal safety properties.