Theory

The Structure of Authority: Why security is not a separable concern

The Structure of Authority: Why security is not a separable concern, by Mark S. Miller, Bill Tulloh, and Jonathan Shapiro:

Common programming practice grants excess authority for the sake of functionality; programming principles require least authority for the sake of security. If we practice our principles, we could have both security and functionality. Treating security as a separate concern has not succeeded in bridging the gap between principle and practice, because it operates without knowledge of what constitutes least authority. Only when requests are made -- whether by humans acting through a user interface, or by one object invoking another -- can we determine how much authority is adequate. Without this knowledge, we must provide programs with enough authority to do anything they might be requested to do.

We examine the practice of least authority at four major layers of abstraction -- from humans in an organization down to individual objects within a programming language. We explain the special role of object-capability languages -- such as E or the proposed Oz-E -- in supporting practical least authority.

An important overview of why security properties cannot be an afterthought for any platform, languages and operating systems included. To this end, the paper examines security at granularities ranging from the desktop down to individual objects, and explains how object-capabilities provide security properties that are compositional, permitting mutually suspicious programs to be safely composed.

A recent LtU discussion on achieving security via built-in object-capabilities vs. building security frameworks as libraries reminded me of this paper. Ultimately, the library approach can work, assuming side effects are properly controlled via some mechanism, e.g. effect types or monads, but any solution should conform to object-capability principles in order to preserve safe composition.

An example of a capability-secure legacy/library approach is Plash (the Principle of Least Authority SHell), which provides object-specific file system namespaces. Any library interface to the file system should mimic this kind of file system virtualization, which effectively pushes side-effect control down to OS-level objects and is essential for safely composing mutually suspicious programs that access the file system.
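To make the shape of such an interface concrete, here is a minimal Haskell sketch (hypothetical module and function names, not Plash's actual API, and ignoring escape hatches such as symlinks that a real sandbox must handle): a directory is an unforgeable value, every file operation demands one, and a program's authority over the file system is therefore exactly the set of Dir values it has been handed.

    module CapFS (Dir, rootFor, subdir, readUnder, writeUnder) where

    import System.FilePath ((</>), isRelative, splitDirectories)
    import qualified Data.ByteString as BS

    -- A Dir is an unforgeable capability: the constructor is not exported,
    -- so holding a Dir is the only way to reach the files beneath it.
    newtype Dir = Dir FilePath

    -- Trusted setup code (e.g. main) mints the initial capabilities.
    rootFor :: FilePath -> IO Dir
    rootFor = return . Dir

    -- Attenuation: a capability to a subdirectory grants strictly less
    -- authority (relative paths only, no "..").
    subdir :: Dir -> FilePath -> Maybe Dir
    subdir (Dir d) p
      | isRelative p && ".." `notElem` splitDirectories p = Just (Dir (d </> p))
      | otherwise = Nothing

    -- A real implementation would validate these paths the same way.
    readUnder :: Dir -> FilePath -> IO BS.ByteString
    readUnder (Dir d) p = BS.readFile (d </> p)

    writeUnder :: Dir -> FilePath -> BS.ByteString -> IO ()
    writeUnder (Dir d) p = BS.writeFile (d </> p)

A client of such a library can only read or write beneath directories it was explicitly given, so handing an attenuated Dir to untrusted code replaces the ambient authority a raw file API would grant.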

A Formal System For Euclid's Elements

A Formal System For Euclid's Elements, Jeremy Avigad, Edward Dean, and John Mumma. Review of Symbolic Logic, Vol. 2, No. 4, 2009.

Abstract. We present a formal system, E, which provides a faithful model of the proofs in Euclid’s Elements, including the use of diagrammatic reasoning.

Diagrammatic languages are a perennial favorite discussion topic here, and Euclid's proofs constitute one of the oldest diagrammatic languages around. And yet for hundreds of years (at least since Leibniz) people have argued about whether or not the diagrams are really part of a formal system of reasoning, or whether they are simply visual aids hanging on the side of the true proof. The latter position is the one that Hilbert and Tarski took as well when they gave formal axiomatic systems for geometry.

But was this necessary, or just a contingent fact of the logical machinery available to them? Avigad and his coauthors show that the former point of view works too, and that you can do it with very basic proof theory (there's little here unfamiliar to anyone who has read Pierce's book). Yet it sheds a lot of light on how the diagrams in the Elements work, in part because of the authors' very careful analysis of how to read the diagrams -- that is, which conclusions a diagram really licenses you to draw (for example, that two circles intersect, or that a point lies inside a triangle) and which are accidents of the specific figure on the page (such as particular lengths or angles). How they handle these issues is a good model for anyone designing their own visual programming language.

Objects to Unify Type Classes and GADTs

Objects to Unify Type Classes and GADTs, by Bruno C. d. S. Oliveira and Martin Sulzmann:

We propose a Haskell-like language with the goal of unifying type classes and generalized algebraic datatypes (GADTs) into a single class construct. We treat classes as first-class types and we use objects (instead of type class instances and data constructors) to define the values of those classes. We recover the ability to define functions by pattern matching by using sealed classes. The resulting language is simple and intuitive and it can be used to define, with similar convenience, the same programs that we would define in Haskell. Furthermore, unlike Haskell, dictionaries (or objects) can be explicitly (as well as implicitly) passed to functions and we can program in a simple object-oriented style directly.

A very interesting paper on generalizing and unifying type classes and GADTs. Classes are now first-class values, resulting in a language that resembles a traditional, albeit more elegant, object-oriented language, and which supports a form of first-class "lightweight modules".

The language supports the traditional implicit dispatch of type classes, but also supports explicit dispatch, unlike Haskell. The authors suggest this could eliminate much of the duplication in the Haskell standard library between functions that take a type class constraint and those that take an explicit function, e.g. insert/insertBy and sort/sortBy, although some syntactic sugar is needed to make this concise.
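To make the duplication concrete: Haskell today ships both sort :: Ord a => [a] -> [a] and sortBy :: (a -> a -> Ordering) -> [a] -> [a]. The sketch below approximates the idea in plain Haskell, using an ordinary record as a stand-in for a first-class class whose values are objects; the names are mine, not the paper's.

    import Data.List (sortBy)

    -- A dictionary reified as an ordinary value; in the paper this role is
    -- played by a first-class class and its objects.
    data OrdD a = OrdD { cmp :: a -> a -> Ordering }

    -- One sort, written once against the explicit dictionary...
    sortWith :: OrdD a -> [a] -> [a]
    sortWith d = sortBy (cmp d)

    -- ...from which the implicit version is recovered by choosing the
    -- canonical dictionary for the type, instead of duplicating the code.
    defaultOrd :: Ord a => OrdD a
    defaultOrd = OrdD compare

    sort' :: Ord a => [a] -> [a]
    sort' = sortWith defaultOrd

    main :: IO ()
    main = do
      print (sort' [3, 1, 2 :: Int])                           -- implicit dispatch
      print (sortWith (OrdD (flip compare)) [3, 1, 2 :: Int])  -- explicit, descending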

Classes are open to extension by default, although a class can also be explicitly specified as "sealed", in which case extension is forbidden and you can pattern match on the cases. Furthermore, GADTs can now also carry methods, thus introducing dispatch to algebraic types. This fusion does not itself solve the expression problem, though it does ease the burden through the first-class support of both types of extension. You can see the Scala influences here.
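The two extension modes can be approximated with features Haskell already has, which may help fix intuitions even though the paper's language is not Haskell: a sealed class behaves like a closed GADT (pattern matching is exhaustive), an open class behaves like a type class (new cases welcome, but no matching over all of them), and a constructor that carries methods can be faked by packaging a dictionary inside it. A rough sketch:

    {-# LANGUAGE GADTs #-}

    -- "Sealed" extension: a closed set of cases, so pattern matching is allowed.
    data Shape where
      Circle :: Double -> Shape
      Rect   :: Double -> Double -> Shape

    area :: Shape -> Double
    area (Circle r) = pi * r * r
    area (Rect w h) = w * h

    -- "Open" extension: anyone can add a case by writing an instance,
    -- but nobody can pattern match over all cases.
    class HasArea a where
      areaOf :: a -> Double

    instance HasArea Shape where
      areaOf = area

    -- A constructor that carries a method with it, approximating the
    -- paper's methods-on-constructors by packaging a dictionary.
    data Showable where
      MkShowable :: Show a => a -> Showable

    describe :: Showable -> String
    describe (MkShowable x) = show x

    main :: IO ()
    main = do
      print (area (Circle 1))
      print (areaOf (Rect 2 3))
      putStrLn (describe (MkShowable (42 :: Int)))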

I found this paper via the Haskell sub-reddit, which also links to a set of slides. The authors acknowledge Scala as an influence and, as future work, suggest extensions such as type families and more module-like features, such as nesting and opaque types.

Resolving and Exploiting the k-CFA Paradox

Resolving and Exploiting the k-CFA Paradox, Matthew Might, Yannis Smaragdakis, and David Van Horn. To appear in PLDI 2010.

Low-level program analysis is a fundamental problem, taking the shape of "flow analysis" in functional languages and "points-to" analysis in imperative and object-oriented (OO) languages. Despite the similarities, the vocabulary and results in the two communities remain largely distinct, with limited cross-understanding. One of the few links is Shivers's k-CFA work, which has advanced the concept of "context-sensitive analysis" and is widely known in both communities. Recent results, however, indicate that the relationship between the different incarnations of the analysis is not understood.

Van Horn and Mairson proved k-CFA for k ≥ 1 to be EXPTIME-complete, hence no polynomial algorithm exists. Yet there have been multiple polynomial formulations of context-sensitive points-to analyses in OO languages. Is functional k-CFA a profoundly different analysis from OO k-CFA? We resolve this paradox by showing that OO features conspire to make the exact same specification of k-CFA be polynomial-time: objects and closures are subtly different, in a way that interacts crucially with context-sensitivity. This leads to a significant practical result: by emulating the OO approximation, we derive a polynomial hierarchy of context-sensitive CFAs for functional programs, simultaneously achieving high precision and efficiency.

I learned from earlier work by David Van Horn and Harry Mairson that performance bounds on flow analysis are fascinating, so it's good to see that this line of work is being continued, and even better to see new algorithms come out of it.
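A back-of-the-envelope counting sketch of where the gap comes from, as I understand the paper's argument (notation and bounds are mine): with n call sites and call-strings of length k there are at most n^k contexts. An abstract closure in functional k-CFA pairs a lambda with an environment that chooses a context for each free variable independently, while an abstract object in the OO formulation is just an allocation site paired with a single context, its fields living in a global heap:

    $|\widehat{Clo}| \le |Lam| \cdot (n^k)^{|fv|}$    -- exponential, since $|fv|$ grows with the program
    $|\widehat{Obj}| \le |Alloc| \cdot n^k$           -- polynomial for any fixed $k$

Emulating the second, flattened shape for closures is what yields the polynomial hierarchy of CFAs mentioned in the abstract.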

Continuity Analysis of Programs

Continuity Analysis of Programs, Swarat Chaudhuri, Sumit Gulwani, and Roberto Lublinerman. POPL 2010.

We present an analysis to automatically determine if a program represents a continuous function, or equivalently, if infinitesimal changes to its inputs can only cause infinitesimal changes to its outputs. The analysis can be used to verify the robustness of programs whose inputs can have small amounts of error and uncertainty -- e.g., embedded controllers processing slightly unreliable sensor data, or handheld devices using slightly stale satellite data.

Continuity is a fundamental notion in mathematics. However, it is difficult to apply continuity proofs from real analysis to functions that are coded as imperative programs, especially when they use diverse data types and features such as assignments, branches, and loops. We associate data types with metric spaces as opposed to just sets of values, and continuity of typed programs is phrased in terms of these spaces. Our analysis reduces questions about continuity to verification conditions that do not refer to infinitesimal changes and can be discharged using off-the-shelf SMT solvers. Challenges arise in proving continuity of programs with branches and loops, as a small perturbation in the value of a variable often leads to divergent control-flow that can lead to large changes in values of variables. Our proof rules identify appropriate “synchronization points” between executions and their perturbed counterparts, and establish that values of certain variables converge back to the original results in spite of temporary divergence.

We prove our analysis sound with respect to the traditional epsilon-delta definition of continuity. We demonstrate the precision of our analysis by applying it to a range of classic algorithms, including algorithms for array sorting, shortest paths in graphs, minimum spanning trees, and combinatorial optimization. A prototype implementation based on the Z3 SMT-solver is also presented.

Another fun paper from POPL this year. I've seen metric spaces used to solve domain equations before, but the idea of actually considering a metric on the outputs was a new one to me.
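A toy instance of the branching difficulty, in Haskell for concreteness (my example, not one from the paper): relu's control flow diverges on either side of zero, yet the function is continuous because the two branches agree in the limit, while step has the same shape of code and is genuinely discontinuous. Distinguishing the two is exactly what the verification conditions have to do.

    -- Continuous despite the branch: near x = 0 the branches disagree on
    -- control flow but their values converge, so for any eps > 0 choosing
    -- delta = eps gives |x - y| < delta  ==>  |relu x - relu y| < eps.
    relu :: Double -> Double
    relu x = if x > 0 then x else 0

    -- Same syntactic shape, but discontinuous at 0: the branches never
    -- "converge back", so no delta works for eps < 1.
    step :: Double -> Double
    step x = if x > 0 then 1 else 0

    main :: IO ()
    main = mapM_ (print . relu) [-0.001, 0, 0.001]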

The Recruitment Theory of Language Origins

Leo Meyerovich recently started a thread on LtU asking about Historical or sociological studies of programming language evolution?. I've been meaning to post a paper on this topic to LtU for a while now, but was waiting for an opportune moment to fit it into a forum discussion. With Leo's question at hand, I give you an interesting paper that models language evolution, by artificial intelligence researcher Luc Steels. Steels has spent over ten years researching this area, and his recent paper, The Recruitment Theory of Language Origins, summarizes one of his models of language evolution:

The recruitment theory of language origins argues that language users recruit and try out different strategies for solving the task of communication and retain those that maximise communicative success and cognitive economy. Each strategy requires specific cognitive neural mechanisms, which in themselves serve a wide range of purposes and therefore may have evolved or could be learned independently of language. The application of a strategy has an impact on the properties of the emergent language and this fixates the use of the strategy in the population. Although neurological evidence can be used to show that certain cognitive neural mechanisms are common to linguistic and non-linguistic tasks, this only shows that recruitment has happened, not why. To show the latter, we need models demonstrating that the recruitment of a particular strategy and hence the mechanisms to carry out this strategy lead to a better communication system. This paper gives concrete examples how such models can be built and shows the kinds of results that can be expected from them.

Simplicial Databases

Simplicial Databases, David I. Spivak.

In this paper, we define a category DB, called the category of simplicial databases, whose objects are databases and whose morphisms are data-preserving maps. Along the way we give a precise formulation of the category of relational databases, and prove that it is a full subcategory of DB. We also prove that limits and colimits always exist in DB and that they correspond to queries such as select, join, union, etc. One feature of our construction is that the schema of a simplicial database has a natural geometric structure: an underlying simplicial set. The geometry of a schema is a way of keeping track of relationships between distinct tables, and can be thought of as a system of foreign keys. The shape of a schema is generally intuitive (e.g. the schema for round-trip flights is a circle consisting of an edge from $A$ to $B$ and an edge from $B$ to $A$), and as such, may be useful for analyzing data. We give several applications of our approach, as well as possible advantages it has over the relational model. We also indicate some directions for further research.

This is what happens when you try to take the existence of ORDER BY and COUNT in SQL seriously. :-)

If you're puzzled by how a geometric idea like simplexes could show up here, remember that the algebraic view of simplicial sets is as presheaves on the category of finite total orders and order-preserving maps. Every finite sequence gives rise to a total order on its set of positions, and tables have rows and columns as sequences!
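The abstract's claim that limits compute queries can also be seen in miniature: a join over a shared column is a pullback, i.e. the pairs of rows whose projections onto the shared attribute agree. A toy Haskell version over association-list "tables" (schema and data invented for illustration):

    -- Two toy tables sharing a "city" attribute.
    people :: [(String, String)]   -- (name, city)
    people = [("ada", "paris"), ("bob", "oslo"), ("eve", "paris")]

    hotels :: [(String, String)]   -- (hotel, city)
    hotels = [("ritz", "paris"), ("nobel", "oslo")]

    -- The pullback of the two projections onto the shared column: exactly
    -- the pairs of rows whose cities coincide, which is SQL's equijoin.
    joinOnCity :: [((String, String), (String, String))]
    joinOnCity = [ (p, h) | p <- people, h <- hotels, snd p == snd h ]

    main :: IO ()
    main = mapM_ print joinOnCity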

Physics, Topology, Logic and Computation: A Rosetta Stone

Physics, Topology, Logic and Computation: A Rosetta Stone by John C. Baez and Mike Stay, 2009.

In physics, Feynman diagrams are used to reason about quantum processes. In the 1980s, it became clear that underlying these diagrams is a powerful analogy between quantum physics and topology. Namely, a linear operator behaves very much like a 'cobordism': a manifold representing spacetime, going between two manifolds representing space. This led to a burst of work on topological quantum field theory and 'quantum topology'. But this was just the beginning: similar diagrams can be used to reason about logic, where they represent proofs, and computation, where they represent programs. With the rise of interest in quantum cryptography and quantum computation, it became clear that there is an extensive network of analogies between physics, topology, logic and computation. In this expository paper, we make some of these analogies precise using the concept of 'closed symmetric monoidal category'. We assume no prior knowledge of category theory, proof theory or computer science.

I am not sure whether this should be categorized as "Fun" instead of "Theory", given that "We assume no prior knowledge of category theory, proof theory or computer science".

At least one pair from the title (logic and computation) should ring some bells...
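For the computation column of the Rosetta stone, the structure in question is easy to write down as (law-free) Haskell: ordinary functions are the morphisms and pairing is the tensor. This is only a sketch of the signatures; the paper develops the laws and the string-diagram notation.

    import Prelude hiding (id, (.))
    import Control.Category (Category (..))

    -- A bare-bones symmetric monoidal category: composition from Category,
    -- plus a tensor and a symmetry (coherence laws omitted).
    class Category k => SymmetricMonoidal k where
      tensor :: k a b -> k c d -> k (a, c) (b, d)
      swap   :: k (a, c) (c, a)

    -- Programs as morphisms: plain Haskell functions form an instance.
    instance SymmetricMonoidal (->) where
      tensor f g (a, c) = (f a, g c)
      swap (a, c)       = (c, a)

    -- A tiny "string diagram": run two wires in parallel, then cross them.
    example :: (Int, String) -> (String, Int)
    example = swap . tensor (+ 1) (++ "!")

    main :: IO ()
    main = print (example (41, "hi"))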

Semantic types: a fresh look at the ideal model for types

Semantic types: a fresh look at the ideal model for types, Jerome Vouillon and Paul-André Melliès. POPL 2004.

We present a generalization of the ideal model for recursive polymorphic types. Types are defined as sets of terms instead of sets of elements of a semantic domain. Our proof of the existence of types (computed by fixpoint of a typing operator) does not rely on metric properties, but on the fact that the identity is the limit of a sequence of projection terms. This establishes a connection with the work of Pitts on relational properties of domains. This also suggests that ideals are better understood as closed sets of terms defined by orthogonality with respect to a set of contexts.

A couple of days ago, we had a post about game models of linear logic, which led to a discussion of Girard's ludics. One of the key ingredients of ludics is biorthogonality: the idea that the semantics of a type should be stable under double orthogonality. Every term in a language has an orthogonal -- the set of client programs with which that term will run safely -- and a set of terms is a biorthogonal when it is equal to the orthogonal of its orthogonal. The reason this idea is interesting is that it offers a way to sensibly pass back and forth between a local, proof-theoretic notion of "type as intro and elim rules" and a global, semantic notion of type as "predicate on terms".
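Concretely, with notation that is mine rather than the paper's: write $t \perp c$ when plugging term $t$ into context $c$ yields a safe interaction. Then

    $A^{\perp} = \{ c \mid \forall t \in A.\ t \perp c \}$                -- the contexts that run safely with every term of $A$
    $A^{\perp\perp} = \{ t \mid \forall c \in A^{\perp}.\ t \perp c \}$   -- the terms safe for all of those contexts

and a set of terms $A$ is a biorthogonal -- a candidate semantic type -- exactly when $A = A^{\perp\perp}$. This is the sense in which the abstract above describes ideals as closed sets of terms defined by orthogonality with respect to a set of contexts.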

This paper offers a relatively accessible entry point to those ideas, assuming you're familiar with the basic ideas behind either solving domain equations or doing logical relations proofs.

Implicit Phasing for R6RS Libraries

Abdulaziz Ghuloum and R. Kent Dybvig, Implicit Phasing for R6RS Libraries, Proc. ICFP 2007.

The forthcoming Revised Report on Scheme differs from previous reports in that the language it describes is structured as a set of libraries. It also provides a syntax for defining new portable libraries. The same library may export both procedure and hygienic macro definitions, which allows procedures and syntax to be freely intermixed, hidden, and exported.

This paper describes the design and implementation of a portable version of R6RS libraries that expands libraries into a core language compatible with existing R5RS implementations. Our implementation is characterized by its use of inference to determine when the bindings of an imported library are needed, e.g., run time or compile time, relieving programmers of the burden of declaring usage requirements explicitly.

R6RS leaves it up to implementations whether import statements need to explicitly state the meta-level into which bindings should be placed. The authors argue for letting the implementation figure out the required meta-levels automatically, and present an implementation-oriented description of how to do so, which they have implemented portably. The paper also includes a well-written and detailed introduction to the issues involved, and since they want the community to adopt their solution, they seem to have worked extra hard to produce a convincing paper ;).

(via vieiro)
