Simon Peyton Jones elected into the Royal Society Fellowship

Simon Peyton Jones has been elected as a Fellow of the Royal Society. The Royal Society biography reads:


Simon's main research interest is in functional programming languages, their implementation, and their application. He was a key contributor to the design of the now-standard functional language Haskell, and is the lead designer of the widely-used Glasgow Haskell Compiler (GHC). He has written two textbooks about the implementation of functional languages.

More generally, Simon is interested in language design, rich type systems, compiler technology, code generation, runtime systems, virtual machines, and garbage collection. He is particularly motivated by direct use of principled theory to practical language design and implementation -- that is one reason he loves functional programming so much.

Simon is also chair of Computing at School, the grass-roots organisation that was at the epicentre of the 2014 reform of the English computing curriculum.

Congratulations SPJ!

Chez Scheme now open-source

Kent Dybvig (Cadence Research, Cisco Systems) has released the commercial scheme compiler Chez Scheme (scheme.com) as open source on GitHub. Chez Scheme is a native code generating optimizing compiler for R6RS focusing on performance and productivity. It supports cross-compilation, threading, and many other extensions. Current version is 9.4.

I'm excited to see what the community will build with this great tool.

https://github.com/cisco/chezscheme

Remora: An Array-Oriented Language with Static Rank Polymorphism

The paper: An Array-Oriented Language with Static Rank Polymorphism, Justin Slepak, Olin Shivers, and Panagiotis Manolios, Northeastern University, 2014.
Abstract. The array-computational model pioneered by Iverson’s lan- guages APL and J offers a simple and expressive solution to the “von Neumann bottleneck.” It includes a form of rank, or dimensional, poly- morphism, which renders much of a program’s control structure im- plicit by lifting base operators to higher-dimensional array structures. We present the first formal semantics for this model, along with the first static type system that captures the full power of the core language.

The formal dynamic semantics of our core language, Remora, illu- minates several of the murkier corners of the model. This allows us to resolve some of the model’s ad hoc elements in more general, regular ways. Among these, we can generalise the model from SIMD to MIMD computations, by extending the semantics to permit functions to be lifted to higher-dimensional arrays in the same way as their arguments.

Our static semantics, a dependent type system of carefully restricted power, is capable of describing array computations whose dimensions cannot be determined statically. The type-checking problem is decidable and the type system is accompanied by the usual soundness theorems. Our type system’s principal contribution is that it serves to extract the implicit control structure that provides so much of the language’s expres- sive power, making this structure explicitly apparent at compile time.

Type Checking Modular Multiple Dispatch with Parametric Polymorphism and Multiple Inheritance

Type Checking Modular Multiple Dispatch with Parametric Polymorphism and Multiple Inheritance by Eric Allen, Justin Hilburn, Scott Kilpatrick, Victor Luchangco, Sukyoung Ryu, David Chase, Guy L. Steele Jr.:

In previous work, we presented rules for defining overloaded functions that ensure type safety under symmetric multiple dispatch in an object-oriented language with multiple inheritance, and we showed how to check these rules without requiring the entire type hierarchy to be known, thus supporting modularity and extensibility. In this work, we extend these rules to a language that supports parametric polymorphism on both classes and functions.

In a multiple-inheritance language in which any type may be extended by types in other modules, some overloaded functions that might seem valid are correctly rejected by our rules. We explain how these functions can be permitted in a language that additionally supports an exclusion relation among types, allowing programmers to declare “nominal exclusions” and also implicitly imposing exclusion among different instances of each polymorphic type. We give rules for computing the exclusion relation, deriving many type exclusions from declared and implicit ones.

We also show how to check our rules for ensuring the safety of overloaded functions. In particular, we reduce the problem of handling parametric polymorphism to one of determining subtyping relationships among universal and existential types. Our system has been implemented as part of the open-source Fortress compiler.

Fortress was briefly covered here a couple of times, as were multimethods and multiple dispatch, but this paper really generalizes and nicely summarizes previous work on statically typed modular multimethods, and does a good job explaining the typing rules in an accessible way. The integration with parametric polymorphism I think is key to applying multimethods in other domains which may want modular multimethods, but not multiple inheritance.

The Formalization in COQ might also be of interest to some.

Also, another interesting point is Fortress' use of second-class intersection and union types to simplify type checking.

Usability of Programming Languages SIG at CHI'2016

A special-interest group meeting during the ACM CHI 2016 conference in San Jose, CA on the topic of the usability of programming languages. People are invited to attend!

To attend you must be registered for the CHI'2016 conference, and early registration ends March 14:

https://chi2016.acm.org/wp/registration/

For more information about the SIG, see:

http://www.programminglanguageusability.org

Abstract:

Programming languages form the interface between programmers (the users) and the computation that they desire the computer to
execute. Although studies exist for some aspects of programming language design (such as conditionals), other aspects have received little or no human factors evaluations. Designers thus have little they can rely on if they want to make new languages highly usable, and users cannot easily chose a language based on usability criteria. This SIG will bring together researchers and practitioners interested in increasing the depth and breadth of studies on the usability of programming languages, and ultimately in improving the usability of future languages.

C is Manly, Python is for “n00bs”: How False Stereotypes Turn Into Technical “Truths”

Jean Yang & Ari Rabkin C is Manly, Python is for “n00bs”: How False Stereotypes Turn Into Technical “Truths”, Model-View-Culture, January 2015.

This is a bit of a change of pace from the usual technically-focused content on LtU, but it seemed like something that might be of interest to LtUers nonetheless. Yang and Rabkin discuss the cultural baggage that comes along with a variety of languages, and the impact it has on how those languages are perceived and used.

"These preconceived biases arise because programming languages are as much social constructs as they are technical ones. A programming language, like a spoken language, is defined not just by syntax and semantics, but also by the people who use it and what they have written. Research shows that the community and libraries, rather than the technical features, are most important in determining the languages people choose. Scientists, for instance, use Python for the good libraries for scientific computing."

There are probably some interesting clues to how and why some languages are adopted while others fall into obscurity (a question that has come up here before). Also, the article includes references to a study conducted by Rabkin and LtU's own Leo Meyerovich.

Temporal Higher Order Contracts

Temporal Higher Order Contracts
Tim Disney, Cormac Flanagan, Jay McCarthy
2011

Behavioral contracts are embraced by software engineers because they document module interfaces, detect interface violations, and help identify faulty modules (packages, classes, functions, etc). This paper extends prior higher-order contract systems to also express and enforce temporal properties, which are common in software systems with imperative state, but which are mostly left implicit or are at best informally specified. The paper presents both a programmatic contract API as well as a temporal contract language, and reports on experience and performance results from implementing these contracts in Racket.

Our development formalizes module behavior as a trace of events such as function calls and returns. Our contract system provides both non-interference (where contracts cannot influence correct executions) and also a notion of completeness (where contracts can enforce any decidable, prefix-closed predicate on event traces).

This paper appears to be about a way to define (and enforce through dynamic monitoring) correctness properties of APIs by enforcing or ruling out certain orderings of function calls, such as calling a "read" method on a file descriptor after having called "close". I am personally not convinced that this specification language is a good way to solve these problems. However, the bulk of the paper is actually about giving a denotational semantics to contracts, as specifying a set of traces that the external interface of a component may expose (in a way strongly reminding of game semantics), and this feels like an important technique to reason about contracts. The exposition of this contribution is practical (based on a simple abstract machine) and accessible.

Programmatic and Direct Manipulation, Together at Last

A technical report by Ravi Chugh et al. Abstract:

We present the SKETCH-N-SKETCH editor for Scalable Vector Graphics (SVG) that integrates programmatic and direct manipulation, two modes of interaction with complementary strengths. In SKETCH-N-SKETCH, the user writes a program to generate an output SVG canvas. Then the user may directly manipulate the canvas while the system infers realtime updates to the program in order to match the changes to the output. To achieve this, we propose (i) a technique called trace-based program synthesis that takes program execution history into account in order to constrain the search space and (ii) heuristics for dealing with ambiguities. Based on our experience writing more than 40 examples and from the results of a study with 25 participants, we conclude that SKETCH-N-SKETCH provides a novel and effective work- flow between the boundaries of existing programmatic and direct manipulation systems.

This was demoed at PLDI to a lot of fanfare. Also see some videos. And a demo that you can actually play with, sweet!

Coordinated concurrent programming in Syndicate

Coordinated concurrent programming in Syndicate
Tony Garnock-Jones and Matthias Felleisen.
2016

Most programs interact with the world: via graphical user interfaces, networks, etc. This form of interactivity entails concurrency, and concurrent program components must coordinate their computations. This paper presents Syndicate, a novel design for a coordinated, concurrent programming language. Each concurrent component in Syndicate is a functional actor that participates in scoped conversations. The medium of conversation arranges for message exchanges and coordinates access to common knowledge. As such, Syndicate occupies a novel point in this design space, halfway between actors and threads.
If you want to understand the language, I would recommend looking at sections 1 to 2.2 (motivation and introducory examples) and then jumping at section 5, which presents fairly interesting designs for larger programs.

Concurrent program components must coordinate their computations to realize the overall goals of the program. This coordination takes two forms: the exchange of knowledge and the establishment of frame conditions. In addition, coordination must take into account that reactions to events may call for the creation of new concurrent components or that existing components may disappear due to exceptions or partial failures. In short, coordination poses a major problem to the proper design of effective communicating, concurrent components.

This paper presents Syndicate, a novel language for coordinated concurrent programming. A Syndicate program consists of functional actors that participate in precisely scoped conversations. So-called networks coordinate these conversations. When needed, they apply a functional actor to an event and its current state; in turn, they receive a new state plus descriptions of actions. These actions may represent messages for other participants in the conversations or assertions for a common space of knowledge.

Precise scoping implies a separation of distinct conversations, and hence existence of multiple networks. At the same time, an actor in one network may have to communicate with an actor in a different network. To accommodate such situations, Syndicate allows the embedding of one network into another as if the first were just an actor within the second. In other words, networks simultaneously scope and compose conversations. The resulting tree-structured shape of networked conversations corresponds both to tree-like arrangements of containers and processes in modern operating systems and to the nesting of layers in network protocols [1]. Syndicate thus unifies the programming techniques of distributed programming with those of coordinated concurrent programming.

By construction, Syndicate networks also manage resources. When a new actor appears in a conversation, a network allocates the necessary resources. When an actor fails, it deallocates the associated resources. In particular, it retracts all shared state associated with the actor, thereby making the failure visible to interested participants. Syndicate thus solves notorious problems of service discovery and resource management in the coordination of communicating components.

In sum, Syndicate occupies a novel point in the design space of coordinated concurrent (functional) components (sec. 2), sitting firmly between a thread- based world with sharing and local-state-only, message-passing actors. Our de- sign for Syndicate includes two additional contributions: an efficient protocol for incrementally maintaining the common knowledge base and a trie-based data structure for efficiently indexing into it (sec. 3). Finally, our paper presents eval- uations concerning the fundamental performance characteristics of Syndicate as well as its pragmatics (secs. 4 and 5).


Our examples illustrate the key properties of Syndicate and their unique combination. Firstly, the box and demand-matcher examples show that Syndicate conversations may involve many parties, generalizing the Actor model’s point-to-point conversations. At the same time, the file server example shows that Syndicate conversations are more precisely bounded than those of traditional Actors. Each of its networks crisply delimits its contained conversations, each of which may therefore use a task-appropriate language of discourse.

Secondly, all three examples demonstrate the shared-dataspace aspect of Syndicate. Assertions made by one actor can influence other actors, but cannot directly alter or remove assertions made by others. The box’s content is made visible through an assertion in the dataspace, and any actor that knows id can retrieve the assertion. The demand-matcher responds to changes in the dataspace that denote the existence of new conversations. The file server makes file contents available through assertions in the (outer) dataspace, in response to clients placing subscriptions in that dataspace.

Finally, Syndicate places an upper bound on the lifetimes of entries in the shared space. Items may be asserted and retracted by actors at will in response to incoming events, but when an actor crashes, all of its assertions are automatically retracted. If the box actor were to crash during a computation, the assertion describing its content would be visibly withdrawn, and peers could take some compensating action. The demand matcher can be enhanced to monitor supply as well as demand and to take corrective action if some worker instance exits unexpectedly. The combination of this temporal bound on assertions with Syndicate’s state change notifications gives good failure-signalling and fault-tolerance properties, improving on those seen in Erlang [7].


Syndicate draws directly on Network Calculus [2], which, in turn, has borrowed elements from Actor models [16,17,18], process calculi [19,20,21,22,23], and actor languages such as Erlang [7], Scala [24], E [25] and AmbientTalk [26].

This work makes a new connection to shared-dataspace coordination models [27], including languages such as Linda [28] and Concurrent ML (CML) [29]. Linda’s tuplespaces correspond to Syndicate’s dataspaces, but Linda is “generative,” meaning that its tuples take on independent existence once created. Syndicate’s assertions instead exist only as long as some actor continues to assert them, which provides a natural mechanism for managing resources and dealing with partial failures (sec. 2). Linda research on failure-handling focuses mostly on atomicity and transactions [30], though Rowstron introduces agent wills [31] and uses them to build a fault-tolerance mechanism. Turning to multiple tuplespaces, the Linda variants Klaim [32] and Lime [33] offer multiple spaces and different forms of mobility. Papadopoulos [34] surveys the many other variations; Syndicate’s non-mobile, hierarchical, nameless actors and networks occupy a hitherto unexplored point in this design space.

Some of the proposed designs were surprising to me. There is a reversal of perspective, from the usual application-centric view of applications being first, with lower-level services hidden under abstraction layers, to a more medium-directed perspective that puts the common communication layer first -- in the example of the TCP/IP stack, this is the OS kernel.

Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities

Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities
Luca Della Toffola, Michael Pradel, Thomas Gross
2015

Performance bugs are a prevalent problem and recent research proposes various techniques to identify such bugs. This paper addresses a kind of performance problem that often is easy to address but difficult to identify: redundant computations that may be avoided by reusing already computed results for particular inputs, a technique called memoization. To help developers find and use memoization opportunities, we present MemoizeIt, a dynamic analysis that identifies methods that repeatedly perform the same computation. The key idea is to compare inputs and outputs of method calls in a scalable yet precise way. To avoid the overhead of comparing objects at all method invocations in detail, MemoizeIt first compares objects without following any references and iteratively increases the depth of exploration while shrinking the set of considered methods. After each iteration, the approach ignores methods that cannot benefit from memoization, allowing it to analyze calls to the remaining methods in more detail. For every memoization opportunity that MemoizeIt detects, it provides hints on how to implement memoization, making it easy for the developer to fix the performance issue. Applying MemoizeIt to eleven real-world Java programs reveals nine profitable memoization opportunities, most of which are missed by traditional CPU time profilers, conservative compiler optimizations, and other existing approaches for finding performance bugs. Adding memoization as proposed by MemoizeIt leads to statistically significant speedups by factors between 1.04x and 12.93x.

This paper was recommended by Asumu Takikawa. It is a nice idea, which seems surprisingly effective. The examples they analysed (sadly they don't really explain how they picked the program to study) have a mix of memoization opportunities in fairly different parts of the codebase. There are several examples of what we could call "peripheral communication logic", eg. date formatting stuff, which we could assume is not studied really closely by the programmers focusing more on the problem-domain logic. But there is also an interesting of subtle domain-specific memoization opportunity: an existing cache was over-pessimistic and would reset itself at times where it was in fact not necessary, and this observation corresponds to a non-trivial program invariant.

The authors apparently had some difficulties finding program inputs to exercise profiling. Programs should more often be distributed with "performance-representative inputs", not just functionality-testing inputs. In one example of a linting tool, the provided "standard test" was to feed the code of the linting tool to itself. But this was under a default configuration for which the tools' developers had already fixed all alarms, so the alarm-creating code (which actually had an optimization opportunity) was never exercised by this proposed input.

Note that the caching performed is very lightweight, usually not a full tabulation of the function. Most examples just add a static variable to cache the last (input, output) pair, which is only useful when the same input is typically called several times in a row, but costs very little space.