An operational and axiomatic semantics for non-determinism and sequence points in C

In a recent LtU discussion, naasking comments that "I always thought languages that don't specify evaluation order should classify possibly effectful expressions that assume an evaluation order to be errors". Recent work on the C language has provided reasonable formal tools to reason about evaluation order for C, which has very complex evaluation-order rules.

An operational and axiomatic semantics for non-determinism and sequence points in C
Robbert Krebbers

The C11 standard of the C programming language does not specify the execution order of expressions. Besides, to make more effective optimizations possible (e.g. delaying of side-effects and interleav- ing), it gives compilers in certain cases the freedom to use even more behaviors than just those of all execution orders.

Widely used C compilers actually exploit this freedom given by the C standard for optimizations, so it should be taken seriously in formal verification. This paper presents an operational and ax- iomatic semantics (based on separation logic) for non-determinism and sequence points in C. We prove soundness of our axiomatic se- mantics with respect to our operational semantics. This proof has been fully formalized using the Coq proof assistant.

One aspect of this work that I find particularly interesting is that it provides a program (separation) logic: there is a set of inference rules for a judgment of the form \(\Delta; J; R \vdash \{P\} s \{Q\}\), where \(s\) is a C statement and \(P, Q\) are logical pre,post-conditions such that if it holds, then the statement \(s\) has no undefined behavior related to expression evaluation order. This opens the door to practical verification that existing C program are safe in a very strong way (this is all validated in the Coq theorem prover).

Luca Cardelli Festschrift

Earlier this week Microsoft Research Cambridge organised a Festschrift for Luca Cardelli. The preface from the book:

Luca Cardelli has made exceptional contributions to the world of programming
languages and beyond. Throughout his career, he has re-invented himself every
decade or so, while continuing to make true innovations. His achievements span
many areas: software; language design, including experimental languages;
programming language foundations; and the interaction of programming languages
and biology. These achievements form the basis of his lasting scientific leadership
and his wide impact.
Luca is always asking "what is new", and is always looking to
the future. Therefore, we have asked authors to produce short pieces that would
indicate where they are today and where they are going. Some of the resulting
pieces are short scientific papers, or abridged versions of longer papers; others are
less technical, with thoughts on the past and ideas for the future. We hope that
they will all interest Luca.

Hopefully the videos will be posted soon.

Re-thinking Prolog

A recent paper by Oleg Kiselyov and Yukiyoshi Kameyama at the university of Tsukuba discusses weaknesses and areas for improvement to Prolog.

Quite many computations and models are mostly deterministic. Implementing them in Prolog with any acceptable performance requires the extensive use of problematic features such as cut. Purity is also compromised when interfacing with mainstream language libraries, which are deterministic and cannot run backwards. Divergence is the constant threat, forcing the Prolog programmers to forsake the declarative specification and program directly against the search strategy. All in all, Classical Prolog is the exquisite square peg in the world with mostly round holes

The strong points of Prolog can be brought into an ordinary functional programming language. Using OCaml as a representative, we implement lazy guessing as a library, with which we reproduce classical Prolog examples. Furthermore, we demonstrate parser combinators that use committed choice (maximal munch) and can still be run forwards and backwards. They cannot be written in Classical Prolog. Logic variables, unification, and its WAM compilation strategy naturally emerge as a "mere optimization" of the Herbrand universe enumeration.

The paper mentions the strength of the approach used by miniKanren (which embeds logic programming with fairer search strategy than normal Prolog into Scheme) and Hansei (which embeds probability based nondeterminism into Ocaml using delimited continuations to allow direct-style expression of monadic code).

After motivating some choices by studying the prototypical example of running append backwards they cover running parsers with "maximal munch" rule backwards - something that cannot be (declaratively) expressed in prolog.

A very interesting paper on logic programming! It also thanks Tom Schrijvers of CHR fame at the end.

Scratch jr

Scratch jr is an iPad version of the Scratch environment, designed with young kids in mind. It is the best kid-oriented programming tool I tried so far, and my five year old has great fun making "movies" with it. As I noted on twitter an hour after installing, the ability to record your own voice and use it for your sprites is a killer feature. Check it out!

Scala woes?

A fork in the back? See discussion over at HN. People in the know are encouraged to shed light on the situation.

Howard on Curry-Howard

Philip Wadler posts his exchange with William Howard on history of the Curry-Howard correspondence. Howard on Curry-Howard.

Cost semantics for functional languages

There is an ongoing discussion in LtU (there, and there) on whether RAM and other machine models are inherently a better basis to reason about (time and) memory usage than lambda-calculus and functional languages. Guy Blelloch and his colleagues have been doing very important work on this question that seems to have escaped LtU's notice so far.

A portion of the functional programming community has long been of the opinion that we do not need to refer to machines of the Turing tradition to reason about execution of functional programs. Dynamic semantics (which are often perceived as more abstract and elegant) are adequate, self-contained descriptions of computational behavior, which we can elevate to the status of (functional) machine model -- just like "abstract machines" can be seen as just machines.

This opinion has been made scientifically precise by various brands of work, including for example implicit (computational) complexity, resource analysis and cost semantics for functional languages. Guy Blelloch developed a family of cost semantics, which correspond to annotations of operational semantics of functional languages with new information that captures more intentional behavior of the computation: not only the result, but also running time, memory usage, degree of parallelism and, more recently, interaction with a memory cache. Cost semantics are self-contained way to think of the efficiency of functional programs; they can of course be put in correspondence with existing machine models, and Blelloch and his colleagues have proved a vast amount of two-way correspondences, with the occasional extra logarithmic overhead -- or, from another point of view, provided probably cost-effective implementations of functional languages in imperative languages and conversely.

This topic has been discussed by Robert Harper in two blog posts, Language and Machines which develops the general argument, and a second post on recent joint work by Guy and him on integrating cache-efficiency into the model. Harper also presents various cost semantics (called "cost dynamics") in his book "Practical Foundations for Programming Languages".

In chronological order, three papers that are representative of the evolution of this work are the following.

Parallelism In Sequential Functional Languages
Guy E. Blelloch and John Greiner, 1995.
This paper is focused on parallelism, but is also one of the earliest work carefully relating a lambda-calculus cost semantics with several machine models.

This paper formally studies the question of how much parallelism is available in call-by-value functional languages with no parallel extensions (i.e., the functional subsets of ML or Scheme). In particular we are interested in placing bounds on how much parallelism is available for various problems. To do this we introduce a complexity model, the PAL, based on the call-by-value lambda-calculus. The model is defined in terms of a profiling semantics and measures complexity in terms of the total work and the parallel depth of a computation. We describe a simulation of the A-PAL (the PAL extended with arithmetic operations) on various parallel machine models, including the butterfly, hypercube, and PRAM models and prove simulation bounds. In particular the simulations are work-efficient (the processor-time product on the machines is within a constant factor of the work on the A-PAL), and for P processors the slowdown (time on the machines divided by depth on the A-PAL) is proportional to at most O(log P). We also prove bounds for simulating the PRAM on the A-PAL.

Space Profiling for Functional Programs
Daniel Spoonhower, Guy E. Blelloch, Robert Harper, and Phillip B. Gibbons, 2011 (conference version 2008)

This paper clearly defines a notion of ideal memory usage (the set of store locations that are referenced by a value or an ongoing computation) that is highly reminiscent of garbage collection specifications, but without making any reference to an actual garbage collection implementation.

We present a semantic space profiler for parallel functional programs. Building on previous work in sequential profiling, our tools help programmers to relate runtime resource use back to program source code. Unlike many profiling tools, our profiler is based on a cost semantics. This provides a means to reason about performance without requiring a detailed understanding of the compiler or runtime system. It also provides a specification for language implementers. This is critical in that it enables us to separate cleanly the performance of the application from that of the language implementation. Some aspects of the implementation can have significant effects on performance. Our cost semantics enables programmers to understand the impact of different scheduling policies while hiding many of the details of their implementations. We show applications where the choice of scheduling policy has asymptotic effects on space use. We explain these use patterns through a demonstration of our tools. We also validate our methodology by observing similar performance in our implementation of a parallel extension of Standard ML

Cache and I/O efficient functional algorithms
Guy E. Blelloch, Robert Harper, 2013 (see also the shorter CACM version)

The cost semantics in this last work incorporates more notions from garbage collection, to reason about cache-efficient allocation of values -- in that it relies on work on formalizing garbage collection that has been mentioned on LtU before.

The widely studied I/O and ideal-cache models were developed to account for the large difference in costs to access memory at different levels of the memory hierarchy. Both models are based on a two level memory hierarchy with a fixed size primary memory (cache) of size \(M\), an unbounded secondary memory, and assume unit cost for transferring blocks of size \(B\) between the two. Many algorithms have been analyzed in these models and indeed these models predict the relative performance of algorithms much more accurately than the standard RAM model. The models, however, require specifying algorithms at a very low level requiring the user to carefully lay out their data in arrays in memory and manage their own memory allocation.

In this paper we present a cost model for analyzing the memory efficiency of algorithms expressed in a simple functional language. We show how many algorithms written in standard forms using just lists and trees (no arrays) and requiring no explicit memory layout or memory management are efficient in the model. We then describe an implementation of the language and show provable bounds for mapping the cost in our model to the cost in the ideal-cache model. These bound imply that purely functional programs based on lists and trees with no special attention to any details of memory layout can be as asymptotically as efficient as the carefully designed imperative I/O efficient algorithms. For example we describe an \(O(\frac{n}{B} \log_{M/B} \frac{n}{B})\) cost sorting algorithm, which is optimal in the ideal cache and I/O models.

Stream Processing with a Spreadsheet

ECOOP 2014 paper (distinguished) by Vaziri et. al, abstract:

Continuous data streams are ubiquitous and represent such a high volume of data that they cannot be stored to disk, yet it is often crucial for them to be analyzed in real-time. Stream processing is a programming paradigm that processes these immediately, and enables continuous analytics. Our objective is to make it easier for analysts, with little programming experience, to develop continuous analytics applications directly. We propose enhancing a spreadsheet, a pervasive tool, to obtain a programming platform for stream processing. We present the design and implementation of an enhanced spreadsheet that enables visualizing live streams, live programming to compute new streams, and exporting computations to be run on a server where they can be shared with other users, and persisted beyond the life of the spreadsheet. We formalize our core language, and present case studies that cover a range of stream processing applications.

Safely Composable Type-Specific Languages

Cyrus Omar, Darya Kurilova, Ligia Nistor, Benjamin Chung, Alex Potanin, and Jonathan Aldrich, "Safely Composable Type-Specific Languages", ECOOP14.

Programming languages often include specialized syntax for common datatypes (e.g. lists) and some also build in support for specific specialized datatypes (e.g. regular expressions), but user-defined types must use general-purpose syntax. Frustration with this causes developers to use strings, rather than structured data, with alarming frequency, leading to correctness, performance, security, and usability issues. Allowing library providers to modularly extend a language with new syntax could help address these issues. Unfortunately, prior mechanisms either limit expressiveness or are not safely composable: individually unambiguous extensions can still cause ambiguities when used together. We introduce type-specific languages (TSLs): logic associated with a type that determines how the bodies of generic literals, able to contain arbitrary syntax, are parsed and elaborated, hygienically. The TSL for a type is invoked only when a literal appears where a term of that type is expected, guaranteeing non-interference. We give evidence supporting the applicability of this approach and formally specify it with a bidirectionally typed elaboration semantics for the Wyvern programming language.

A Next Generation Smart Contract and Decentralized Application Platform

A Next Generation Smart Contract and Decentralized Application Platform, Vitalik Buterin.

When Satoshi Nakamoto first set the Bitcoin blockchain into motion in January 2009, he was simultaneously introducing two radical and untested concepts. The first is the "bitcoin", a decentralized peer-to-peer online currency that maintains a value without any backing, intrinsic value or central issuer. So far, the "bitcoin" as a currency unit has taken up the bulk of the public attention, both in terms of the political aspects of a currency without a central bank and its extreme upward and downward volatility in price. However, there is also another, equally important, part to Satoshi's grand experiment: the concept of a proof of work-based blockchain to allow for public agreement on the order of transactions. Bitcoin as an application can be described as a first-to-file system: if one entity has 50 BTC, and simultaneously sends the same 50 BTC to A and to B, only the transaction that gets confirmed first will process. There is no intrinsic way of determining from two transactions which came earlier, and for decades this stymied the development of decentralized digital currency. Satoshi's blockchain was the first credible decentralized solution. And now, attention is rapidly starting to shift toward this second part of Bitcoin's technology, and how the blockchain concept can be used for more than just money.

Commonly cited applications include using on-blockchain digital assets to represent custom currencies and financial instruments ("colored coins"), the ownership of an underlying physical device ("smart property"), non-fungible assets such as domain names ("Namecoin") as well as more advanced applications such as decentralized exchange, financial derivatives, peer-to-peer gambling and on-blockchain identity and reputation systems. Another important area of inquiry is "smart contracts" - systems which automatically move digital assets according to arbitrary pre-specified rules. For example, one might have a treasury contract of the form "A can withdraw up to X currency units per day, B can withdraw up to Y per day, A and B together can withdraw anything, and A can shut off B's ability to withdraw". The logical extension of this is decentralized autonomous organizations (DAOs) - long-term smart contracts that contain the assets and encode the bylaws of an entire organization. What Ethereum intends to provide is a blockchain with a built-in fully fledged Turing-complete programming language that can be used to create "contracts" that can be used to encode arbitrary state transition functions, allowing users to create any of the systems described above, as well as many others that we have not yet imagined, simply by writing up the logic in a few lines of code.

Includes code samples.