Darcs: an open source version control system implemented in Haskell

Slashdot today links to an interview with David Roundy, the author of darcs, one of a number of proposed replacements for CVS. Darcs is, rather thrillingly, based on a theory of patches with roots in quantum mechanics, a notion which Roundy brings down to earth a little in the interview:

At its most basic level, the theory of patches is about the commutation, or reordering, of changes in such a way that their meaning doesn't change. The rules of commutation tell us when, for example, one patch requires another, since dependent patches cannot be commuted. Once the commutation primitives have been worked out, one can do all sorts of interesting (and useful) operations, such as merging. And such operations can be shown to be independent of order, i.e. it doesn't matter whether you merge patch A or patch B first, you'll get the same result.
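To make the idea of commutation concrete, here is a toy sketch in Python. It is not darcs's actual patch representation or algorithm: the `Patch` type, `apply_patch`, and `commute` are hypothetical names invented for illustration, and the only patches modelled are single-line inserts and deletes. Given a patch `first` followed by `second`, `commute` returns an adjusted pair that can be applied in the opposite order with the same result, or `None` when the second patch depends on the first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Patch:
    """Hypothetical single-line patch; not darcs's real data structure."""
    kind: str   # "insert" or "delete"
    index: int  # 0-based line position the patch acts on
    text: str   # the line that is inserted or deleted


def apply_patch(patch: Patch, lines: List[str]) -> List[str]:
    """Return a new list of lines with the patch applied."""
    out = list(lines)
    if patch.kind == "insert":
        out.insert(patch.index, patch.text)
    else:
        assert out[patch.index] == patch.text, "delete must match the line"
        del out[patch.index]
    return out


def commute(first: Patch, second: Patch) -> Optional[Tuple[Patch, Patch]]:
    """Given `first` then `second`, return (second', first') with the same
    combined effect, or None when `second` depends on `first`."""
    if first.kind != "insert":
        raise NotImplementedError("this sketch only commutes past an insert")
    i, j = first.index, second.index
    if second.kind == "delete" and j == i:
        return None  # second deletes the very line that first added
    if second.kind == "insert":
        if j <= i:
            # second lands at or before first's line, so first shifts down
            return second, Patch("insert", i + 1, first.text)
        # second lands after first's line, so it moves up by one without it
        return Patch("insert", j - 1, second.text), first
    # second deletes some other line
    if j < i:
        return second, Patch("insert", i - 1, first.text)
    return Patch("delete", j - 1, second.text), first


base = ["x", "y", "z"]
a = Patch("insert", 1, "a")
b = Patch("insert", 3, "b")
b2, a2 = commute(a, b)
# Either order produces the same file.
assert apply_patch(b, apply_patch(a, base)) == apply_patch(a2, apply_patch(b2, base))
# A patch that deletes the line `a` added cannot be moved in front of `a`.
assert commute(a, Patch("delete", 1, "a")) is None
```

The dependent case is the interesting one: the failure to commute is exactly what tells the system that one patch requires another.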

Also interesting is his choice of Haskell for an implementation language, and the reasons he gives for this:

Haskell is just a great language in which to program. It is purely functional, and lazy, both of which allow you to do really cool tricks. For example, by using lazy IO I can cleanly separate the file and directory reading, from the patch-applying (which is pure functional code), from the file or directory writing. Haskell also is a really good match for implementing the primitive patch operations, with its pattern-matching syntax and higher order functions.

Roundy also notes that he has had no difficulty finding Haskell coders to help with the project:

There seem to be quite a few people out there just looking for somewhere to use Haskell! And in fact, there have also been developers who learned Haskell expressly for the purpose of contributing to darcs.

Jon Udell: interview with Ward Cunningham and Jack Greenfield

Jon Udell's interview with Ward Cunningham and Jack Greenfield might help you understand Microsoft's methodology of software factories and DSLs.

The interview is available as a 54-minute MP3 file. The notion of language as an abstraction mechanism, and an explanation of the part played by DSLs, come up in the second half of the conversation.

Python, metaprogramming, and macros

A nice blog post from Ian Bicking.

LL4 Program and Abstracts

The program and abstracts for LL4 are available, including a presentation by LtU's own Anton van Straaten on reconciling REST and continuations. (Here's hoping LL4 will be webcast as in previous years.)

RDF and Databases

Some RDF research led me to a nice paper (PDF) from IBM discussing RDF with relational databases. This combination can replace the half-baked application data mechanisms that crop up regularly in my consulting work. Think nested directories of Windows INI files and brittle binary files that break on minor design iterations. The pain, the pain.

Someone should describe RDF in 500 words or less as a generalization of INI. That note would spread understanding of RDF, which is simple but often described so abstractly that it seems complicated. It's better to start from the known and move to the unknown.

Here is a short attempt, just to spark interest. Experts may call me all wet. Windows INI format uses "key-equals-value," with keys grouped into sections. Think of "key-equals-value" as a special case of RDF's "subject-predicate-object." RDF generalizes to any verb, not just "equals," along with superior grouping. While INI nests just one level down (via sections), RDF URIs handle arbitrary nesting (via slashes), and URIs also permit remote data. That is not to say RDF data must be tree-structured. Most RDF papers focus too much on XML. XML is merely one expression syntax. There are several others and a relational database will store RDF data in its own way, completely independent of XML.
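The analogy can be sketched in a few lines of Python using only the standard library's `configparser`. This is an illustration of the mapping described above, not a real RDF toolkit: the `http://example.org/...` URIs, the `EQUALS` predicate, and the `ini_to_triples` helper are all made up for the example. Each INI entry becomes one (subject, predicate, object) triple, with the section and key folded into a slash-separated subject URI.

```python
import configparser

# Hypothetical base URI and predicate, chosen only for this illustration.
BASE = "http://example.org/config/"
EQUALS = "http://example.org/terms/equals"

INI_TEXT = """
[display]
width = 800
height = 600

[network]
proxy = none
"""


def ini_to_triples(text, base=BASE):
    """Turn INI 'key = value' entries into (subject, predicate, object) triples."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    triples = []
    for section in parser.sections():
        for key, value in parser.items(section):
            # Section and key nest via slashes in the subject URI.
            subject = base + section + "/" + key
            triples.append((subject, EQUALS, value))
    return triples


for triple in ini_to_triples(INI_TEXT):
    print(triple)
```

Once the data is in triple form, nothing forces the predicate to be "equals" or the subject to sit only one level deep, which is the generalization RDF offers.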

There are several projects in this domain. My favorite so far is OpenRDF Sesame. It supports querying at the semantic level. It seems more mature than others, having derived from previous efforts, and works with both PostgreSQL and MySQL as well as Oracle. An abstraction layer called SAIL makes Sesame database-agnostic. Sesame even sports a stand-alone b-tree system, or in-memory operation, if you don't want an external database. I like PostgreSQL much better than MySQL for its loose BSD license and technical merits. Apropos of that, another bit of news is that PostgreSQL now works natively on Windows. (The PostgreSQL client has always worked natively as a DLL.) PostgreSQL speed issues mentioned in Sesame papers have improved. As for Sesame, the only drawback is Java. But since Sesame interfaces over TCP through Java servlets, that's a don't-care.

On a related note, I looked into Python-based Chandler. The story there is that it's a custom job because, says Andi Vajda:

When I started working on this project in May, the repository was late, very late, and the project was stalled because of that. I felt I could get something usable for the project to resume much faster if I started a data model implementation from scratch and persisted it using Sleepycat's Berkeley dbxml and Berkeley db. Today, the Chandler repository is not really so much an object database as an item XML database combined with large collections of references directly stored in Berkeley DB.

Hmm...project behind, so build from scratch? I'm not clear why Chandler didn't go with RDF, but it sounds like project management problems. It seems as though RDF would support all that Chandler wants to do without the constrictions of XML. Note that Sesame has Python bindings.

Release of Python 2.4, release candidate 1

What's New in Python 2.4 details the changes and additions in this release. Download the release here.

Most of the additions were discussed here in the past. Notable among them are generator expressions and function and method decorators.
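For anyone who missed those discussions, here is a small sketch of both features together. The `counted` decorator and `sum_of_squares` function are invented for this example; the `@` syntax and the generator expression passed to `sum` are the 2.4 additions.

```python
def counted(func):
    """Decorator that counts how many times the wrapped function is called."""
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@counted  # decorator syntax: same as sum_of_squares = counted(sum_of_squares)
def sum_of_squares(numbers):
    # Generator expression: squares are produced one at a time,
    # without building an intermediate list.
    return sum(n * n for n in numbers)


total = sum_of_squares(range(4))  # 0 + 1 + 4 + 9
```

Before 2.4 the same effect needed an explicit rebinding after the function body and a list comprehension (or a hand-written generator) inside `sum`.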

Extending Ruby with C

I've generally found that the APIs for bridging the gap between Perl and C are either cryptic (XS) or fragile (Inline::C). While Python is better in some ways, I still find its C API rather difficult to read. Tools such as SWIG can help alleviate this problem, but you still need to write a bunch of glue code to bridge the gap between the high-level agile languages and the low-level C code.

When I first looked at doing the same kind of thing for Ruby, a whole new world opened up. The APIs are simple, to the point where I was up and running in minutes rather than hours.

A nice tutorial article on extending Ruby by providing access to a C library (GenX).

It has been a while since we discussed language extension mechanisms and multi-language programming, yet these techniques are quite important when building real-life systems.

Skribe 1.2b released

(via comp.lang.scheme)

Erick Gallesio and Manuel Serrano have announced the release of version 1.2b of Skribe, a document processing language based on Scheme. From the home page:

Skribe is a text processor. Even [though] it is a general purpose tool, it best suits the writing of technical documents such as web pages or technical reports, API documentations, etc. At first glance, Skribe looks like a mark-up language à la HTML. So, there is no need to have developed computer programming skills to use Skribe.

A second look reveals that Skribe is actually a true programming language, provided with high level features (such as objects, higher order functions, regular and syntactic parsing, etc.). Skribe is based on the Scheme programming language.

From Skribe source files it is possible to produce various targets:

  • HTML pages that can be used to implement a web site (such as the Skribe Home Page).
  • XML files.
  • LaTeX files that can be used to produce high quality Postscript or PDF files.

What language enthusiast/researcher hasn't chafed at the language design of TeX? You should especially check out some of their cool examples.

Language Oriented Programming

Sergey Dmitriev of JetBrains has written a whitepaper on domain-specific languages, titled "Language Oriented Programming: The Next Programming Paradigm."

Calculemus 2005

via OCaml

The Calculemus 2005 Symposium, to be held July 18-19, 2005, will explore the mission of the Calculemus project:

The scientific and technological goal...is the design of a new generation of mathematical software systems and computer-aided verification tools based on the integration of the deduction and the computational power of Deduction Systems and Computer Algebra Systems respectively.

Both Deduction Systems and Computer Algebra Systems are receiving growing attention from industry and academia. On the one hand, Mathematical Software Systems have been commercially very successful in recent years....On the other hand, the use of formal methods in hardware and software development has made Deduction Systems indispensable not least because of the complexity and sheer size of the reasoning tasks involved.

In spite of these successes there is still need for improvement as many application domains still fall outside the scope of existing Deduction Systems and Computer Algebra Systems. For instance, the scope of Computer Algebra Systems (CASs) could be significantly enhanced by adding deductive reasoning power. In fact this lack of expressivity together with the unsolved problem of correctness prohibit large classes of applications. Deduction systems (DSs), which - on the other hand - provide such an expressivity, as well as the guarantee of correctness, still lack computational power as they are not suited to directly carry out algebraic or numerical calculations. This severely restricts their scope of application in mathematics and - more importantly - in engineering applications.

Earlier we discussed the Axiom CAS.