A glimpse into a new general purpose programming language under development at Microsoft

Microsoft's Joe Duffy and team have been (quietly) working on a new programming language, based on C# (for productivity, safety), but leveraging C++ features (for performance). I think it's fair to say - and agree with Joe - that a nirvana for a modern general purpose language would be one that satisfies high productivity (ease of use, intuitive, high level) AND guaranteed (type)safety AND high execution performance. As Joe outlines in his blog post (not video!):

At a high level, I classify the language features into six primary categories:

1) Lifetime understanding. C++ has RAII, deterministic destruction, and efficient allocation of objects. C# and Java both coax developers into relying too heavily on the GC heap, and offers only “loose” support for deterministic destruction via IDisposable. Part of what my team does is regularly convert C# programs to this new language, and it’s not uncommon for us to encounter 30-50% time spent in GC. For servers, this kills throughput; for clients, it degrades the experience, by injecting latency into the interaction. We’ve stolen a page from C++ — in areas like rvalue references, move semantics, destruction, references / borrowing — and yet retained the necessary elements of safety, and merged them with ideas from functional languages. This allows us to aggressively stack allocate objects, deterministically destruct, and more.


2) Side-effects understanding. This is the evolution of what we published in OOPSLA 2012, giving you elements of C++ const (but again with safety), along with first class immutability and isolation.


3) Async programming at scale. The community has been ’round and ’round on this one, namely whether to use continuation-passing or lightweight blocking coroutines. This includes C# but also pretty much every other language on the planet. The key innovation here is a composable type-system that is agnostic to the execution model, and can map efficiently to either one. It would be arrogant to claim we’ve got the one right way to expose this stuff, but having experience with many other approaches, I love where we landed.


4) Type-safe systems programming. It’s commonly claimed that with type-safety comes an inherent loss of performance. It is true that bounds checking is non-negotiable, and that we prefer overflow checking by default. It’s surprising what a good optimizing compiler can do here, versus JIT compiling. (And one only needs to casually audit some recent security bulletins to see why these features have merit.) Other areas include allowing you to do more without allocating. Like having lambda-based APIs that can be called with zero allocations (rather than the usual two: one for the delegate, one for the display). And being able to easily carve out sub-arrays and sub-strings without allocating.


5) Modern error model. This is another one that the community disagrees about. We have picked what I believe to be the sweet spot: contracts everywhere (preconditions, postconditions, invariants, assertions, etc), fail-fast as the default policy, exceptions for the rare dynamic failure (parsing, I/O, etc), and typed exceptions only when you absolutely need rich exceptions. All integrated into the type system in a 1st class way, so that you get all the proper subtyping behavior necessary to make it safe and sound.


6) Modern frameworks. This is a catch-all bucket that covers things like async LINQ, improved enumerator support that competes with C++ iterators in performance and doesn’t demand double-interface dispatch to extract elements, etc. To be entirely honest, this is the area we have the biggest list of “designed but not yet implemented features”, spanning things like void-as-a-1st-class-type, non-null types, traits, 1st class effect typing, and more. I expect us to have a few of these in our mid-2014 checkpoint, but not all of them.


What do you think?

Print release of a textbook on the Coq proof assistant

For a few years now, I've been working on a textbook introducing the Coq proof assistant. It's been available freely online, and I'd like to announce now the availability of a print version from MIT Press. The site I've linked to includes links to order the book online.

Quick context on why LtUers might be interested in Coq: it supports machine checking of mathematical proofs, including in program verification and PL metatheory, some of the most popular applications of proof assistant technology.

Quick context on what distinguishes this book from other Coq resources: it focuses on the engineering techniques to develop large formal developments effectively. It turns out that there are some reusable lessons on how to write formal proofs so that they tend to continue to work even when theorem statements change over the courses of projects.

I'm grateful to MIT Press for agreeing to this experiment where I may continue distributing free versions of the book online.

Call for Participation: Programming Languages Mentoring Workshop

Alan Schmitt just posted an invitation to participate in this event which will take place at POPL. I think anyone who can attend should.

R7RS-small draft ratified by Steering Committee

R7RS-small draft was ratified by Steering Committee a few days ago at the Scheme 2013 workshop.

The announcement is here.

The final draft is here (PDF).

The origin of zero-based array indexing

An amusing historical analysis of the origin of zero based array indexing (hint: C wasn't the first). There's a twist to the story which I won't reveal, so as not to spoil the story for you. All in all, it's a nice anecdote, but it seems to me that many of the objections raised in the comments are valid.

MOOC: Paradigms of Computer Programming

The Université catholique de Louvain has joined the edX consortium this year, and as part of edX Peter Van Roy is preparing a MOOC (Massive Open Online Course) called Paradigms of Computer Programming starting next February.

As you'd expect the course uses the CTM book and is based on the course Peter has been teaching, it will thus present a multi-paradigm approach to programming and include non-traditional computational models such as the deterministic dataflow model for concurrent programming.

I wonder who will end up signing up for this course. I think the option of auditing might appeal to folks who found CTM interesting but are way beyond the category of beginning programmers for whom the course is officially designed.

Python and Scientific Computing

This interesting blog post argues that in recent years Python has gained libraries making it the choice language for scientific computing (over MATLAB and R primarily).

I find the details discussed in the post interesting. Two small points that occur to me are that in several domains Mathematica is still the tool of choice. From what I could see nothing free, let alone open source, is even in the same ballpark in these cases. Second, I find it interesting that several of the people commenting mentioned IPython. It seems to be gaining ground as the primary environment many people use.

Pure Subtype Systems

Pure Subtype Systems, by DeLesley S. Hutchins:

This paper introduces a new approach to type theory called pure subtype systems. Pure subtype systems differ from traditional approaches to type theory (such as pure type systems) because the theory is based on subtyping, rather than typing. Proper types and typing are completely absent from the theory; the subtype relation is defined directly over objects. The traditional typing relation is shown to be a special case of subtyping, so the loss of types comes without any loss of generality.

Pure subtype systems provide a uniform framework which seamlessly integrates subtyping with dependent and singleton types. The framework was designed as a theoretical foundation for several problems of practical interest, including mixin modules, virtual classes, and feature-oriented programming.

The cost of using pure subtype systems is the complexity of the meta-theory. We formulate the subtype relation as an abstract reduction system, and show that the theory is sound if the underlying reductions commute. We are able to show that the reductions commute locally, but have thus far been unable to show that they commute globally. Although the proof is incomplete, it is "close enough" to rule out obvious counter-examples. We present it as an open problem in type theory.

A thought-provoking take on type theory using subtyping as the foundation for all relations. He collapses the type hierarchy and unifies types and terms via the subtyping relation. This also has the side-effect of combining type checking and partial evaluation. Functions can accept "types" and can also return "types".

Of course, it's not all sunshine and roses. As the abstract explains, the metatheory is quite complicated and soundness is still an open question. Not too surprising considering type checking Type:Type is undecidable.

Hutchins' thesis is also available for a more thorough treatment. This work is all in pursuit of Hitchens' goal of feature-oriented programming.

John C. Reynolds Doctoral Dissertation Award (SIGPLAN)

Presented annually to the author of the outstanding doctoral dissertation in the area of Programming Languages. The award includes a prize of $1,000. The winner can choose to receive the award at ICFP, OOPSLA, POPL, or PLDI.

Nominations must be submitted to the secretary of SIGPLAN by January 5th 2014 to be considered for this year's award. The nominated dissertation must be available in an English language version to facilitate evaluation by the selection committee and have been awarded in the year 2013.

I know of some outstanding dissertations. I am sure you do to. So why not nominate them for this honor (for further details see here)?

Taking Off the Gloves with Reference Counting Immix

Taking Off the Gloves with Reference Counting Immix, by Rifat Shahriyar, Stephen M. Blackburn, and Kathryn S. McKinley:

Despite some clear advantages and recent advances, reference counting remains a poor cousin to high-performance tracing garbage collectors. The advantages of reference counting include a) immediacy of reclamation, b) incrementality, and c) local scope of its operations. After decades of languishing with hopelessly bad performance, recent work narrowed the gap between reference counting and the fastest tracing collectors to within 10%. Though a major advance, this gap remains a substantial barrier to adoption in performance-conscious application domains. Our work identifies heap organization as the principal source of the remaining performance gap. We present the design, implementation, and analysis of a new collector, RCImmix, that replaces reference counting’s traditional free-list heap organization with the line and block heap structure introduced by the Immix collector. The key innovations of RCImmix are 1) to combine traditional reference counts with per-line live object counts to identify reusable memory and 2) to eliminate fragmentation by integrating copying with reference counting of new objects and with backup tracing cycle collection. In RCImmix, reference counting offers efficient collection and the line and block heap organization delivers excellent mutator locality and efficient allocation. With these advances, RCImmix closes the 10% performance gap, outperforming a highly tuned production generational collector. By removing the performance barrier, this work transforms reference counting into a serious alternative for meeting high performance objectives for garbage collected languages.

A new reference counting GC based on the Immix heap layout, which purports to close the remaining performance gap with tracing collectors. It builds on last year's work, Down for the count? Getting reference counting back in the ring, which describes various optimizations to raw reference counting that make it competitive with basic tracing. There's a remaining 10% performance gap with generational tracing that RCImmix closes by using the Immix heap layout with bump pointer allocation (as opposed to free lists typically used in RC). The improved cache locality of allocation makes RCImmix even faster than the generational tracing Immix collector.

However, the bump pointer allocation reduces the incrementality of reference counting and would impact latency. One glaring omission of this paper is the absence of latency/pause time measurements, which is typical of reference counting papers since ref counting is inherently incremental. Since RCImmix trades off some incrementality for throughput by using bump pointer allocation and copy collection, I'm curious how this impacts the pause times.

Reference counting has been discussed a few times here before, and some papers on past ref-counting GC's have been posted in comments, but this seems to be the first top-level post on competitive reference counting GC.