DSL

Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams

Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams, John Whaley and Monica S. Lam. PLDI 2004.

This paper presents the first scalable context-sensitive, inclusion-based pointer alias analysis for Java programs. Our approach to context sensitivity is to create a clone of a method for every context of interest, and run a context-insensitive algorithm over the ex-panded call graph to get context-sensitive results. For precision, we generate a clone for every acyclic path through a program's call graph, treating methods in a strongly connected component as a single node. Normally, this formulation is hopelessly intractable as a call graph often has 10^14 acyclic paths or more. We show that these exponential relations can be computed efficiently using binary decision diagrams (BDDs). Key to the scalability of the technique is a context numbering scheme that exposes the commonalities across contexts. We applied our algorithm to the most popular applications available on Sourceforge, and found that the largest programs, with hundreds of thousands of Java bytecodes, can be analyzed in under 20 minutes.

This paper shows that pointer analysis, and many other queries and algorithms, can be described succinctly and declaratively using Datalog, a logic programming language. We have developed a system called bddbddb that automatically translates Datalog programs into highly efficient BDD implementations. We used this approach to develop a variety of context-sensitive algorithms including side effect analysis, type analysis, and escape analysis.

Binary decision diagrams are one of the coolest data structures of the last 20 years, and are also one of the basic enabling data structures underlying modern SAT solvers. Using them to implement Datalog is a really clever idea.

EDIT: I relied on memory instead of Google, and got it wrong about BDDs and SAT solving. Modern DPLL-based SAT solvers generally do not use BDDs, because when your solutions are far apart in the solution space the representing BDD would get quite large. BDDs are widely used in verification, though, and are also still one my favorite data structures. :)

Apache Camel routing rules: a DSL?

Apache Camel is a powerful rule based routing and mediation engine which provides a POJO based implementation of the Enterprise Integration Patterns using an extremely powerful fluent API (or declarative Java Domain Specific Language) to configure routing and mediation rules.

The authors probably meant embedded DSL. The examples of its use feel remarkably similar to "criteria queries" DSLs, as popularized in Java by Hibernate (and yes, many others as well).

To start a discussion - where is the boundary between an API and an embedded DSL? Is it in the eye of beholder or are there some objective differences? Some APIs have a strong feel of a DSL - remarkably various parser libraries in Haskell. Is it the quality of the host language (Haskell) or the domain (parsers)?

Realization of natural language interfaces using lazy functional programming

Realization of natural language interfaces using lazy functional programming, Richard Frost, ACM Computing Surveys, Volume 38, Issue 4 (2006).

The construction of natural language interfaces to computers continues to be a major challenge. The need for such interfaces is growing now that speech recognition technology is becoming more readily available, and people cannot speak those computer-oriented formal languages that are frequently used to interact with computer applications. Much of the research related to the design and implementation of natural language interfaces has involved the use of high-level declarative programming languages. This is to be expected as the task is extremely difficult, involving syntactic and semantic analysis of potentially ambiguous input. The use of LISP and Prolog in this area is well documented. However, research involving the relatively new lazy functional programming paradigm is less well known. This paper provides a comprehensive survey of that research.

Amazon Flexible Payments Service

When I heard of Amazon FPS (an overview of which can be found here), I knew a DSL must be hidden somewhere.

And indeed, the GateKeeper language is a special-purpose language for specifying (declarative) payment instructions.

Once again we see the importance of language design skills in today's marketplace...

So what do LtU readers think of the design of the GateKeeper language?

Application-specific foreign-interface generation

Application-specific foreign-interface generation, John Reppy and Chunyan Song, October 2006.

A foreign interface (FI) mechanism to support interoperability with libraries written in other languages (especially C) is an important feature in most high-level language implementations. Such FI mechanisms provide a Foreign Function Interface (FFI) for the high-level language to call C functions and marshaling and unmarshaling mechanisms to support conversion between the high-level and C data representations. Often, systems provide tools to automate the generation of FIs, but these tools typically lock the user into a specific model of interoperability. It is our belief that the policy used to craft the mapping between the high-level language and C should be distinct from the underlying mechanism used to implement the mapping.

In this paper, we describe a FI generation tool, called FIG (for Foreign Interface Generator) that embodies a new approach to the problem of generating foreign interfaces for high-level languages. FIG takes as input raw C header files plus a declarative script that specifies the generation of the foreign interface from the header file. The script sets the policy for the translation, which allows the user to tailor the resulting FI to his or her application. We call this approach application-specific foreign-interface generation. The scripting language uses rewriting strategies as its execution model. The other major feature of the scripting language is a novel notion of composable typemaps that describe the mapping between high-level and low-level types.

FFIs are a perennial engineering problem, and it's very nice to see progress being made on automating what's automatable about building interfaces. Their interface specification language is built from two little DSLs. The first one is a language that for specifying how to map low level types to high level types, and the second one is a rewriting-based language for translating API functions, which makes use of the type mapping programs you defined earlier. The whole thing is quite pretty, and seems to read very well.

An interesting gimme for you stack-language fans: the DSL that Reppy and Song use to specify type mappings from low-level to high-level types is a combinator-based language that reads a bit like Forth or Postscript.

Derivatives of Regular Expressions

Derivatives of Regular Expressions, Janusz Brzozowski, Journal of the ACM 1964.

Kleene's regular expressions, which can be used for describing sequential circuits, were defined using three operators (union, concatenation and iterate) on sets of sequences. Word descriptions of problems can be more easily put in the regular expression language if the language is enriched by the inclusion of other logical operations. However, in the problem of converting the regular expression description to a state diagram, the existing methods either cannot handle expressions with additional operators, or are made quite complicated by the presence of such operators. In this paper the notion of a derivative of a regular expression is introduced and the properties of derivatives are discussed. This leads, in a very natural way, to the construction of a state diagram from a regular expression containing any number of logical operators.

This is one of my favorite papers. It describes a very cute algorithm for building deterministic finite automata directly from a regular expression. The key trick is the idea of a derivative of a regular expression with respect to a string, which is a non-obvious but fun idea.

Note: This is an ACM DL link; I couldn't find the paper freely available online. :(

A Functional Description of TeX's Formula Layout

A Functional Description of TeX's Formula Layout, Reinhold Heckmann and Reinhard Wilhelm, Journal of Functional Programming 1997.

While the quality of the results of TeX's mathematical formula layout algorithm is convincing, its original description is hard to understand since it is presented as an imperative program with complex control flow and destructive manipulations of the data structures representing formulae. In this paper, we present a re-implementation of TeX's formula layout algorithm in the functional language SML, thereby providing a more readable description of the algorithm, extracted from the monolithical TeX system.

I've wanted a simple description of what TeX is doing for a long time, and it's nice to see that laid out in a clear and readable way.

AutoBayes -- A DSL For Bayesian Networks

Bayesian networks are one of the most important formalisms in contemporary machine learning. Given the structure of a network there are a number of well established inference algorithms, but the algorithms are quite involved to implement and applications are typically very performance sensitive. Hence the programmer faces a dilemma: quickly implement a solution in a high level language and suffer extended run time or grub around in C or Fortran for weeks but achieve good performance. This is a perfect application area for a DSL to neatly solve this dilemma, and this is exactly what AutoBayes does. Specify a high level description of the network and it generates high performance C++ code to solve it.

If you find this interesting you will also be interested in the links between probability theory and monads. Eric Kidd has had a great series of blog posts on this theme, and sigfpe has also posted on this theme. Those crazy academics have also been getting in on the action!

RZ: a tool for bringing constructive and computable mathematics closer to programming practice

RZ: a tool for bringing constructive and computable mathematics closer to programming practice, Chris Stone and Andrej Bauer.

Realizability theory is not only a fundamental tool in logic and computability, but also has direct application to the design and implementation of programs: it can produce interfaces for the data structure corresponding to a mathematical theory. Our tool, called RZ, serves as a bridge between the worlds of constructive mathematics and programming. By using the realizability interpretation of constructive mathematics, RZ translates specifications in constructive logic into annotated interface code in Objective Caml. The system supports a rich input language allowing descriptions of complex mathematical structures. RZ does not extract code from proofs, but allows any implementation method, from handwritten code to code extracted from proofs by other tools.

Realizability is a way of formalizing the constructive interpretation of logical formulas (eg, the BHK interpretation). Constructively, a proof of an implication A implies B means that if you are given evidence for A, you must be able to compute evidence for B. Since this constructive interpretation does not specify any particular programming language (theorists often use the lambda calculus, Turing machines, and the like) so Andrej Bauer and Chris Stone observed that they could take the realizers of mathematical proofs to be Objective Caml programs! This meant that they could take mathematical theories and generate ML module signatures describing the interface to the implementation of a mathematical theory, plus comments about what invariants must hold.

Putting functional database theory into practice: NixOS

NixOS is a Linux distribution based on Nix, a purely functional package management system. NixOS is an experiment to see if we can build an operating system in which software packages, configuration files, boot scripts and the like are all managaed in a purely functional way, that is, they are all built by deterministic functions and they never change after they have been built. Such an operating system should have all the nice characteristics that the Nix package manager has.

Here are links to:

I found this an extremely readable thesis, light on math but high on insight. I now have an entirely new way of thinking about components and the filesystem, and that's really cool. I'd be very interested in hearing what people with serious deployment/sysadmin experience think about this approach.

(Thanks to Gavin Mendel-Gleason and Martin Bravenboer for posting the original links in the discussions page!)

XML feed