Theory
Objects to Unify Type Classes and GADTs, by Bruno C. d. S. Oliveira and Martin Sulzmann:
We propose an Haskell-like language with the goal of unifying type classes and generalized algebraic datatypes (GADTs) into a single class construct. We treat classes as first-class types and we use objects (instead of type class instances and data constructors) to define the values of those classes. We recover the ability to define functions by pattern matching by using sealed classes. The resulting language is simple and intuitive and it can be used to define, with similar convenience, the same programs that we would define in Haskell. Furthermore, unlike Haskell, dictionaries (or
objects) can be explicitly (as well as implicitly) passed to functions and we can program in a simple object-oriented style directly.
A very interesting paper on generalizing and unifying type classes and GADTs. Classes are now first-class values, resulting in a language that resembles a traditional, albeit more elegant, object-oriented language, and which supports a form of first-class "lightweight modules".
The language supports the traditional use of implicit type class dispatch, but also supports explicit dispatch, unlike Haskell. The authors suggest this could eliminate much of the duplication in the Haskell standard library of functions that take a type class or an explicit function, eg. insert/insertBy and sort/sortBy, although some syntactic sugar is needed to make this more concise.
Classes are open to extension by default, although a class can also be explicitly specified as "sealed", in which case extension is forbidden and you can pattern match on the cases. Furthermore, GADTs can now also carry methods, thus introducing dispatch to algebraic types. This fusion does not itself solve the expression problem, though it does ease the burden through the first-class support of both types of extension. You can see the Scala influences here.
I found this paper via the Haskell sub-reddit, which also links to a set of slides. The authors acknowledge Scala as an influence, and as future work, suggest extensions like type families and to support more module-like features, such as nesting and opaque types.
Resolving and Exploiting the k-CFA Paradox, Matthew Might, Yannis Smaragdakis, and David Van Horn. To appear in PLDI 2010.
Low-level program analysis is a fundamental problem, taking the shape of "flow analysis" in functional languages and "points-to" analysis in imperative and object-oriented (OO) languages. Despite the similarities, the vocabulary and results in the two communities remain largely distinct, with limited cross-understanding. One of the few links is Shivers's k-CFA work, which has advanced the concept of "context-sensitive analysis" and is widely known in both communities. Recent results, however, indicate that the relationship between the different incarnations of the analysis is not understood.
Van Horn and Mairson proved k-CFA for k ≥ 1 to be EXPTIME-complete, hence no polynomial algorithm exists. Yet there have been multiple polynomial formulations of context-sensitive points-to analyses in OO languages. Is functional k-CFA a profoundly different analysis from OO k-CFA? We resolve this paradox by showing that OO features conspire to make the exact same specification of k-CFA be polynomial-time: objects and closures are subtly different, in a way that interacts crucially with context-sensitivity. This leads to a significant practical result: by emulating the OO approximation, we derive a polynomial hierarchy of context-sensitive CFAs for functional programs, simultaneously achieving high precision and efficiency.
I learned that performance bounds on flow analysis were fascinating from earlier work by David van Horn and Harry Mairson, so it's good to see that this line of work is still being continued, and even better to see new algorithms come out of it.
Continuity Analysis of Programs, Swarat Chaudhuri, Sumit Galwani, and Roberto Lublinerman. POPL 2010.
We present an analysis to automatically determine if a program represents a continuous function, or equivalently, if infinitesimal changes to its inputs can only cause infinitesimal changes to its outputs. The analysis can be used to verify the robustness of programs whose inputs can have small amounts of error and uncertainty -- e.g., embedded controllers processing slightly unreliable sensor data, or handheld devices using slightly stale satellite data.
Continuity is a fundamental notion in mathematics. However, it is difficult to apply continuity proofs from real analysis to functions that are coded as imperative programs, especially when they use diverse data types and features such as assignments, branches, and loops. We associate data types with metric spaces as opposed to just sets of values, and continuity of typed programs is phrased in terms of these spaces. Our analysis reduces questions about continuity to verification conditions that do not refer to infinitesimal changes and can be discharged using off-the-shelf SMT solvers. Challenges arise in proving continuity of programs with branches and loops, as a small perturbation in the value of a variable often leads to divergent control-flow that can lead to large changes in values of variables. Our proof rules identify appropriate “synchronization points” between executions and their perturbed counterparts, and establish that values of certain variables converge back to the original results in spite of temporary divergence.
We prove our analysis sound with respect to the traditional epsilon-delta definition of continuity. We demonstrate the precision of our analysis by applying it to a range of classic algorithms, including algorithms for array sorting, shortest paths in graphs, minimum spanning trees, and combinatorial optimization. A prototype implementation based on the Z3 SMT-solver is also presented.
Another fun paper from POPL this year. I've seen metric spaces used to solve domain equations before, but the idea of actually considering a metric on the outputs was a new one to me.
Leo Meyerovich recently started a thread on LtU asking about Historical or sociological studies of programming language evolution?. I've been meaning to post a paper on this topic to LtU for awhile now, but simply cherrypicking for the opportune time to fit it into forum discussion. With Leo's question at hand, I give you an interesting paper that models language evolution, by artificial intelligence researcher Luc Steels. Steels has spent over 10 years researching this area, and his recent paper, The Recruitment Theory of Language Origins, summarizes one of his models for dealing with language evolution:
The recruitment theory of language origins argues that language users recruit and try out different strategies for solving the task of communication and retain those that maximise communicative success and cognitive economy. Each strategy requires specific cognitive neural mechanisms, which in themselves serve a wide range of purposes and therefore may have evolved or could be learned independently of language. The application of a strategy has an impact on the properties of the emergent language and this fixates the use of the strategy in the population. Although neurological evidence can be used to show that certain cognitive neural mechanisms are common to linguistic and non-linguistic tasks, this only shows that recruitment has happened, not why. To show the latter, we need models demonstrating that the recruitment of a particular strategy and hence the mechanisms to carry out this strategy lead to a better communication system. This paper gives concrete examples how such models can be built and shows the kinds of results that can be expected from them.
Simplicial Databases, David I. Spivak.
In this paper, we define a category DB, called the category of simplicial databases, whose objects are databases and whose morphisms are data-preserving maps. Along the way we give a precise formulation of the category of relational databases, and prove that it is a full subcategory of DB. We also prove that limits and colimits always exist in DB and that they correspond to queries such as select, join, union, etc. One feature of our construction is that the schema of a simplicial database has a natural geometric structure: an underlying simplicial set. The geometry of a schema is a way of keeping track of relationships between distinct tables, and can be thought of as a system of foreign keys. The shape of a schema is generally intuitive (e.g. the schema for round-trip flights is a circle consisting of an edge from $A$ to $B$ and an edge from $B$ to $A$), and as such, may be useful for analyzing data. We give several applications of our approach, as well as possible advantages it has over the relational model. We also indicate some directions for further research.
This is what happens when you try to take the existence of ORDER BY and COUNT in SQL seriously. :-)
If you're puzzled by how a geometric idea like simplexes could show up here, remember that the algebraic view of simplicial sets is as presheaves on the category of finite total orders and order-preserving maps. Every finite sequence gives rise to a total order on its set of positions, and tables have rows and columns as sequences!
Physics, Topology, Logic and Computation: A Rosetta Stone by John C. Baez and Mike Stay, 2009.
In physics, Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear
that underlying these diagrams is a powerful analogy between quantum physics and topology. Namely, a linear
operator behaves very much like a `cobordism': a manifold representing spacetime, going between two
manifolds representing space. This led to a burst of work on topological quantum field theory and `quantum
topology'. But this was just the beginning: similar diagrams can be used to reason about logic, where
they represent proofs, and computation, where they represent programs. With the rise of interest in quantum
cryptography and quantum computation, it became clear that there is extensive network of analogies between
physics, topology, logic and computation. In this expository paper, we make some of these analogies precise
using the concept of `closed symmetric monoidal category'. We assume no prior knowledge of category
theory, proof theory or computer science.
I am not sure whether this should be categorized as "Fun" instead of "Theory", given that "We assume no prior knowledge of category theory, proof theory or computer science".
At least one pair from the title (logic and computation) should ring some bells...
Semantic types: a fresh look at the ideal model for types, Jerome Vouillon and Paul-André Melliès. POPL 2004.
We present a generalization of the ideal model for recursive polymorphic types. Types are defined as sets of terms instead of sets of elements of a semantic domain. Our proof of the existence of types (computed by fixpoint of a typing operator) does not rely on metric properties, but on the fact that the identity is the limit of a sequence of projection terms. This establishes a connection with the work of Pitts on relational properties of domains. This also suggests that ideals are better understood as closed sets of terms defined by orthogonality with respect to a set of contexts.
A couple of days ago, we had a post about game models of linear logic, which led to a discussion of Girard's ludics. One of the key ingredients of ludics is the idea of biorthogonality, which is the idea that the semantics of types should be stable under biorthogonality. Now, the idea of biorthogonality is that for every term in a language has an orthogonal -- the set of client programs with which our term will run safely together. A set of terms is a biorthogonal when it is equal to the orthogonal of its orthogonal. The reason this idea is interesting is that it offers a way to sensibly pass back and forth between a local, proof-theoretic notion of "type as intro and elim rule" and a global, semantic notion of type as "predicate on terms".
This paper offers a relatively accessible entry point to those ideas, assuming you're familiar with the basic ideas behind either solving domain equations or doing logical relations proofs.
Abdulaziz Ghuloum and R. Kent Dybvig, Implicit Phasing for R6RS Libraries, Proc. ICFP 2007.
The forthcoming Revised Report on Scheme differs from previous reports in that the language it describes is structured as a set of libraries. It also provides a syntax for defining new portable libraries. The same library may export both procedure and hygienic macro definitions, which allows procedures and syntax to be freely intermixed, hidden, and exported.
This paper describes the design and implementation of a portable version of R6RS libraries that expands libraries into a core language compatible with existing R5RS implementations. Our implementation is characterized by its use of inference to determine when the bindings of an imported library are needed, e.g., run time or compile time, relieving programmers of the burden of declaring usage requirements explicitly.
R6RS leaves it up to implementations whether import statements need to explicitly state the meta-level into which bindings should be placed. The authors argue for letting the implementation automatically figure out the required meta-levels, and present an implementation-oriented description of how to do so, that they implemented portably. The paper also includes a well-written and detailed introduction to the issues involved, and since they want the community to adopt their solution, they seem to have worked extra hard to produce a convincing paper ;).
(via vieiro)
One of the themes of Barbara Liskov's Turing Award lectue ("CS History 101") was that nobody has invented a better programming concept than abstract data types. William Cook wrote a paper for OOPSLA '09 that looks at how well PLT'ers understand their own vocabulary, in particular abstract data types and concepts that on the syntactical surface blend to all seem like ADTs. The paper is On Understanding Data Abstraction, Revisited.
In 1985 Luca Cardelli and Peter Wegner, my advisor, published an ACM Computing Surveys paper called “On understanding types, data abstraction, and polymorphism”. Their work kicked off a flood of research on semantics and type theory for object-oriented programming, which continues to this day. Despite 25 years of research, there is still widespread confusion about the two forms of data abstraction, abstract data types and objects. This essay attempts to explain the differences and also why the differences matter.
The Introduction goes on to say:
What is the relationship between objects and abstract data types (ADTs)? I have asked this question to many groups of computer scientists over the last 20 years. I usually ask it at dinner, or over drinks. The typical response is a variant of “objects are a kind of abstract data type”. This response is consistent with most programming language textbooks.
[...]
So what is the point of asking this question? Everyone knows the answer. It’s in the textbooks.
[...]
My point is that the textbooks mentioned above are wrong! Objects and abstract data types are not the same thing, and neither one is a variation of the other.
Ergo, if the textbooks are wrong, then your Dinner Answer to (the) Cook is wrong! The rest of the paper explains how Cook makes computer scientists sing for their supper ;-)
When I’m inciting discussion of this topic over drinks, I don’t tell the the full story up front. It is more fun to keep asking questions as the group explores the topic. It is a lively discussion, because most of these ideas are documented in the literature and all the basic facts are known. What is interesting is that the conclusions to be drawn from the facts are not as widely known.
A Type-theoretic Foundation for Programming with Higher-order Abstract Syntax and First-class Substitutions by Brigitte Pientka, appeared in POPL 08.
Higher-order abstract syntax (HOAS) is a simple, powerful technique
for implementing object languages, since it directly supports
common and tricky routines dealing with variables, such as
capture-avoiding substitution and renaming. This is achieved by
representing binders in the object-language via binders in the meta-language.
However, enriching functional programming languages
with direct support for HOAS has been a major challenge, because
recursion over HOAS encodings requires one to traverse -
abstractions and necessitates programming with open objects.
We present a novel type-theoretic foundation based on contextual
modal types which allows us to recursively analyze open terms
via higher-order pattern matching. By design, variables occurring
in open terms can never escape their scope. Using several examples,
we demonstrate that our framework provides a name-safe foundation
to operations typically found in nominal systems. In contrast
to nominal systems however, we also support capture-avoiding
substitution operations and even provide first-class substitutions to
the programmer. The main contribution of this paper is a syntax directed
bi-directional type system where we distinguish between
the data language and the computation language together with the
progress and preservation proof for our language.
Its been a while since I posted, but this paper appears that it may be of interest to some members of this community. It looks interesting to me, but I just wish I understood all the terminology. I don't know what "open objects" are, and why they are a problem. I don't understand what HOAS is. I don't even know what binders are. The list goes on. I surely can't be the only person who is interested, but feels that this is just out of their grasp. I bet that I probably could understand these things with a minimum of explanation, given I have experience implementing languages. If anyone is interested in rephrasing the abstract in more basic terms, I would be very appreciative.
[Edit: corrected spelling of Brigitte Pientka. My apologies.]

|