LtU Forum

Tom 2.3 Released

Tom is a pattern matching compiler developed at INRIA. It is
particularly well-suited for programming various transformations on
trees/terms and XML based documents. Its design follows our research
on the efficient compilation of rule based languages (e.g. ELAN,
developed at INRIA-Loria).

This release continues our work on the integration of pattern matching
and rule based programming facilities into C and Java.

Many applications have been developed in Tom. Among them, lets mention:

  • the Tom compiler itself
  • languages semantics, interpreters and program transformation tools
  • a prover for the Calculus of Structures
  • an interpreter for the Rho Calculus
  • a disunification algorithm
  • Tom is a complex compiler which adds powerful constructs to C and
    Java: non linear syntactic matching, associative matching with neutral
    element (a.k.a. list-matching), XML based pattern matching, string
    matching, and equational rewriting.
    This offers the possibility to analyze and transform any kind of
    data-structure. Tom can be used for large scale developments and
    applications. It comes with documentation, programming, and debugging support.

    This new release contains many improvements and new features:
    - a new generator of abstract data types implementations (Gom) which supports hooks. In practice, this corresponds to private data types of Caml, which ensures that terms are maintained in canonical form

    - a new %strategy construct which allows to easilly define strategies that can be combined using strategy primitives a la Stratego (All, One, Repeat, Choice, Innermost, Mu, etc.)

    - a new %[...]% construct which helps to write cide generators (it is no longer necessary to encode special characters of strings)

    Tom is available, in open source (GPL/BSD License), from the Tom web page

    Implementation of Hecl

    The Hecl scripting language as a programming language isn't something that's pushing programming language design in interesting new directions. However, as a simple, dynamic language that's implemented in Java, it's pretty easy to figure out how it works, so perhaps this article on its design and implementation is of interest to those who aren't quite as advanced or passionate as most of the LtU readership, or who are interested in something to sink their teeth into before wading into something more difficult.

    Additionally, because of the very fact that it is small and simple, it has a practical application: the runtime runs on even older, slower J2ME-enabled cell phones.

    I originally wrote the article for a European Linux magazine, but didn't like their terms, so I decided to just put it up on my web site:

    http://www.dedasys.com/articles/hecl_implementation.html

    Chuck - Concurrent audio programming language

    Not sure if this is relavent to LtU but it is a programming language so...

    http://chuck.cs.princeton.edu/

    ChucK is a new audio programming language for real-time synthesis, composition, and performance - fully supported on MacOS X, Windows, and Linux. ChucK presents a new time-based concurrent programming model, which supports a more precise and fundamental level of expressiveness, as well as multiple, simultaneous, dynamic control rates, a precise and straightforward concurrency, and the ability to add, remove, and modify code, on-the-fly, while the program is running, without stopping or restarting.

    How do Java generics correspond to System F-(omega)?

    In his recent guest lecture in my undergrad PL course, Markus Mottl mentioned that ML-style module systems help his company develop and maintain software in the large. This guest lecture made me wonder how an ML-style module system could be expressed using Java generics. Because I've already translated an ML-style module system into System F-omega, could someone please point me at a comparison or translation between Java (generics) and System F(-omega)?

    J2ME

    A nice article of J2ME programming made me wonder about alternatives to Java for J2ME development.

    I know there are other apporaches to mobile development, including native Python on Nokia phones, but that's not what I am asking about. I mean languages that target the JVM. There are many alternative languages for the JVM. Do none of them provide J2ME support?

    From what I could see Jython isn't an option since the interpreter alone needs more memory than found on normal phones.

    The fundamental difference between Sets and Lists?

    In my pursuit to implement persistent functional databases, I'm struggling with the difference between Sets and Lists.

    My question is to Lambda the Ultimate is: what's more fundamental? Set membership or the relative order between List elements?

    I tend to prefer Sets above Lists - because Sets are more succinct and because ordering can be expressed on top of Sets (although somewhat artificially).
    Consider the following Set A:

    A = (5, 1, 3)

    Imposing order over Set A can easily be achieved if you compare and sort all its elements (lexically), giving:

    a = order_in_list(A) = {1, 3, 5}

    Conversely, mapping Lists to Sets cannot be achieved, because double elements could be possibly contained by Lists, hence:

    b = {1, 2, 2, 3, 5}
    B = make_set(b) = (1, 2, 3, 5)
    b' = order_in_list(B) = {1, 2, 3, 5}

    which gives the unsatisfactory : b' != b

    But things get even worse if we try to map the following (unsorted) List to a Set and vice versa:

    c = {9, 3, 3, 1, 5, 4}
    C = make_set(c) = {9, 3, 1, 5, 4}
    c' = order_in_list(C) = {1, 3, 4, 5, 9}

    Which gives us the even worse : c' !=!= c

    Yet if Sets are more general then Lists how then can we uniquely map c to C?
    One possible mapping would be the following:

    C' = make_ordered_set(c) = {(0,9), (1,3), (2,3), (3,1), (4,5), (5,4)}

    and vice versa:

    c'' = ordered_in_list(C') = (9, 3, 3, 1, 5, 4)

    Giving us the c'' = c --- the equality we want:

    But with this approach we are still stuck with the (position, value) List pairs. But not too surprising this can be rid of easily with the following generic mapping:

    P = ($position, $value)
    p = make_orderder_pair(P) = {{~, $position}, {!, $value}}

    where '~' and '!' are special and reserved symbols that are unique and cannot be re-used except for the creation of List pairs.

    Now that we established a perfect mapping from Lists to Sets and vice versa, we can conclude that Sets are more fundamental - not a big surprise if you think of Sets to be the basic building blocks of all mathematical theorems.

    But one interesting operation remains: the concatenation of Lists. Let's see if the previous mapping is capable in expressing the concatenation of Sets. First, let us consider the simple case:

    A = {4, 1, 3}
    B = {6, 2, 1}

    One algorithm of mapping the concatenation of A and B would be the following:

    C = concat(A, B) = {{4, {6, 2, 1}}, 1, 3} ie. nesting B into the greatest element of A.

    However, if we concatenate C a million times with itself, this algorithm would create a nested structure of the same depth: a million - not an ideal solution. Especially if we want map such enormous Sets to Lists in at least (O(log n)).

    An alternative algorithm would be the concatenation of the greatest element of A to *all* elements of B. This would give:

    C' = concat2(A, B) = {4, 1, 3, (4,6), (4,2), (4,1)}

    But again, by applying this scheme we alternatively end up with very big elements (length > million).
    So the choice is to allow either very deep recursion of nested sets, or the growth of enormous elements (by repeated concatenation).
    My choice would be the growth of enormous elements - because concatenating elements can be relatively cheap.
    But how can this be cheap? How can we ever expect to concatenate humongous elements cheaply?
    Consider the concatenation of simple Strings, in Java or in any other industry strength language - such Strings are immutable. But the naive Java implementation of immutability causes a lot of inefficiency. Concatenating two Strings will always involve the copy of both Strings to produce the new one, so:

    A = {This is a text of a millions characters....}
    B = {A trillion characters text will follow....}
    concat(A, B) -> in Java this will cost you O(1.000.000 + 1.000.000.000)

    But what if you can concatenate A and B in min(log(A), log(B)) operations? This can be done in reality, albeith not easy.
    One route to fast concatenation would be the recent discussion and implementation of:
    Finger Trees: A Simple General-purpose Data Structure by Ralf Hinze and Ross Paterson.

    Another potential solution is:
    Ropes: an Alternative to Strings by Hans-j. Boehm, Russ Atkinson and Michael Plass.

    Regardless of this discussion, both articles are very much worth reading - especially because they encompass the thru functional style we all love.

    TinyScheme Revived

    Just a brief note to let people know that the TinyScheme project has been revived. Dimitrios has graciously allowed me to adopt the project.

    My sense is that TinyScheme needs to be a good scripting language first, and a conformant Scheme implementation second. Regrettably, there are features of Scheme that are very hard to support in an environment where Scheme calls out to C and C must be able to call back into Scheme. In particular, "call/cc" is a true nightmare in this context, and I am (reluctantly) considering removing it (more precisely: leaving call/cc and dynamic-wind out when TinyScheme is compiled for scripting use).

    I would be very interested in discussion and advice on this point, since I don't want to walk any further away from conformance than is necessary.

    Best regards,

    Jonathan Shapiro

    Dynamic Software Updating for C

    Practical Dynamic Software Updating for C

    "We compile programs specially so that they can be dynamically patched, and generate most of a dynamic patch automatically. Ginseng performs a series of analyses that when combined with some simple runtime support ensure that an update will not violate type-safety while guaranteeing that data is kept up-to-date. We have used Ginseng to construct and dynamically apply patches to three substantial open-source server programs—Very Secure FTP daemon, OpenSSH sshd daemon, and GNU Zebra. In total, we dynamically patched each program with three years’ worth of releases."

    Functional single argument style object oriented programming

    So I had this thought as regards a programming language. Basically it would a lot like forth, only instead of a stack there would be a default initial object. You get new objects by sending messages to this initial object. Things like numbers would actually be messages that would result in the receiver producing a new object to act like a number. The only thing you could write would be messages. Messages would be represented in the source file as whitespace separated strings, much like forth. Other than that there would be no syntax. Basically, I want you guys who all know a lot more than me to point out the flaws in this idea. Heres some examples of what I envision:

    1 + 2
    

    This would break down into:

    1
    

    Send the message "1" to the default object, this will result in a new object being returned that results in the integer 1.

    1 +
    

    This sends the "+" message to the 1 object, which returns a new object an "adder".

    1 + 2
    

    This sends the "2" message to the "1 +" adder object. this results in a new object, a three.

    As you can see, messages are evaluated from left to right, and there are no arguments. A program in this language could almost be viewed as one giant single argument function.

    Unlike FORTH, you don't have to use post-fix notation (although there is no operator precedence, just as in Smalltalk).

    Due to the syntactic baggage of a class based object system I would suggest this be implemented as a prototyped system.

    A clone message could be used to well clone an object. You could then add methods to it by sending it another message:

    clone define-response hello string hello display end-define-response hello
    

    I have named my method definition method "define-response" since messages are pretty much the only thing you got, and your more responding to messages than invoking methods. Of course if the above is confusing it can of course be written as (I will borrow FORTHs commenting convention here):

    clone ( Create a new object )
    define-response hello ( define the hello method )
        string hello ( Create a string with the contents "hello" )
            display    ( Display that string and return it )
    end-define-response ( Finish the method definition and return the object )
    
    hello ( Call the hello method of the newly created object )
    

    Note that define-response creates an object that sets up some internal state. It receives its first message, it responds by naming the method for the object it is creating after the message it just receives. It then returns a new object that simply consumes and stores the messages it receives until it receives a message of end-define-response When it receives this response it creates the method in the object it was defining it for and returns that same object. You can now call additional methods on it, including defining more methods. You can see how any control structure can be implemented thru a combination of this kind of object and/or recursion. Nesting can be handling by having these objects maintain an internal stack, they need only recognize an invocation of themselves, any other nesting problems can be detected when the message is responded to.
    A major problem with this system is variables. Of course I can just add some methods and change them as I need to change the variable e.g.:

    define-response  x 1 end-response
    x display ( Prints 1 )
    

    But that's verbose and creates the equivalent of global variables. It also has performance issues since x has to be evaluated every time it is called, this creates problems especially if the object created is complex. So I propose a slight modification to the define-response method. The creation of a define-response-immediate method. The only difference between the two would be that the body of a define immediate response would only be evaluated once, when the method was defined. Every other time the message was sent it would return the cached value. The rest of the issues with the verboseness of variables could be handled with additional methods, like copy_to e.g.

    1 + 2 copy_to x
    

    It uses this "backwards assignment, since I'm trying to avoid parentheses, although they could be added.

    I look forward to any criticism.

    "Down with Lambda-Lifting"

    With regard to the absence of new posts (and Eric Meijer), I don't think this paper has been much discussed.

    Simplifications of the Spineless Tagles G-Machine and TIM are presented, which
    like the classic SECD machine or the Krivine machine reduce lambda-expressions to weak
    head normal form -- no prior lambda-lifting is necessary. The machines are at least as
    efficient as their combinatorforebears but more importantly they are simpler due to the
    elimination of a translation step that obfuscates programs without improving their efficiency.

    Note that this marked (on his home-page) as an unfinished draft. Not sure when it was last revised, but it looks post-1992.

    Actually I'm hopeing someone might be able to translate the gist of the paper to something a Schemer might understand :), since I think at least Chicken uses lambda-lifting.

    XML feed