archives

How best to add a record type to my typed Scheme variant?

I'm working on a compiler for a Scheme-like language, 'Irken'. The purpose of Irken is to generate a VM for a (yet to be designed) python-like language. I intend for Irken to be the implementation language for the VM, and also to be the language by which the language is extended - similar to the role of pre-scheme in scheme48.

http://www.nightmare.com/rushing/irken/

One goal here is to eliminate the need for end users to program in C.

Another reason for the roundabout approach: I want massively scalable cooperative threads (think 100,000+ threads, think Erlang)... so using the C stack is out (as is BitC, which I've recently learned about here).

Irken is a whole-program compiler that generates one large C function - 'vm()'. Irken uses the gcc address-of-label extension and 'goto' to implement closures. The current continuation (i.e., 'stack') is heap-allocated.

A couple of months ago my desire for a speedy VM got the best of me and I decided to dabble with type inference - mostly so I could get rid of runtime type checks. It's also nice to get the type safety. 8^)

I now have ML-style let-polymorphism - the algebraic data types. I can implement nice polymorphic data structures like red-black trees, parsers, etc. I've been pleasantly surprised at how little trouble the type inference has been. I've been able to write well-typed generators using call/cc, for example.

I put off thinking about records for a while, assuming that it would be a sugary detail. Now I'm seeing that it's more of a tar pit.

So I'm asking for advice - given the purpose of Irken, what approach should I take in adding a record type? Depending on the amount of work and complexity involved, I could accept anything in the range of 'all records require type annotation' to 'everything is figured out magically by the compiler and has no run-time cost'. Leaning toward the latter direction, though. Also, it'd be nice if I could sugar it up to support some simple OO features.

I've read Wand's "Type Inference for Record Concatenation..." and a bunch of stuff by Remy related to OCaml's row polymorphism. I've also been reading Ohori's "A Polymorphic Record Calculus...".

The impression I have right now is:

Wand: simple monomorphic records + recursive types.
Remy: less simple polymorphic records + recursive types.
Ohori: really cool polymorphic records but no recursive types?

The Ohori approach also looks like I'd have to throw my existing code away - it seems to replace products and sums with labeled records?

Thanks for any and all advice!

Open Recursion

I've been contemplating an embedded DSL. It strikes me how incredibly useful open recursion would be for what I want to achieve, to the point that I might want to give the technique some kind of explicit support and encouragement. Curious to see what had already been done, a quick google search turned up Closed and Open Recursion by Ralf Hinze.

Open recursion Another handy feature offered by most languages with objects and classes is the ability for one method body to invoke another method of the same object via a special variable called self or, in some langauges, this. The special behavior of self is that it is late-bound, allowing a method defined in one class to invoke another method that is defined later, in some subclass of the first.

I don't particularly like the examples being offered, but it's an interesting set of slides nonetheless. Of all the many features of objects, somehow I had failed to fully appreciate this aspect. I don't necessarily want to keep the recursion open until the last possible minute either; I would prefer to close up the recursion as I see fit, quite possibly at compile time. Is there any sensible and attractive way to re-open a recursion once it's been closed?

I was quite impressed with CTL Model Checking in Haskell: A Classic Algorithm Explained as Memoization on Kenn Knowles' blog. Seeing another good example of open recursion would require solving Project Euler Problem 220.

What is the dual of { } ?

I have a question for the logicians and category/set theorists among LtU readers.

Background:

I'm currently dealing with a simple matching algorithm for strings. One can create a search pattern either by building a set of admissible characters {a, b, c, ... } from the empty set {} or a search pattern from matching basically any character ( like using a . operator in regexps ) and withdraws a set of characters that shall not be matched:

ANY - {a, b, c, ... }

For disambiguation purposes I'm interested in making all those sets disjoint. It shall not be really significant in the discussion that characters are matched. This is just an implementation detail.

The question is about the status of ANY? As it seems the full ZF set theory would be far too much specification. It entirely suffices that each of those sets S can be finally constructed or finally de-constructed as for ANY - S. In one case one starts with a "set of no entities" and in the other case with a "set of all entities". The latter clearly violates the principle that the set is constructed after the elements. I know ANY isn't entirely insane because I have a working implementation with the usual relations like union, difference, subset and intersection.

What I'd like to know is about a more in depth treatment of ANY in the literature. I expect more interest from computing scientists given the above motivation than from classical mathematics.