LtU Forum

Content Addressable Type Systems

Hello,

I have an interesting problem that this site's audience may have an opinion on.

I'm assuming that people know what content addressable storage is. Let's say it uses SHA1; we put some binary blob in it, get its hash back, which we can later use to retrieve the blob again.

Now imagine I have a primitive type system that looks something like this (pseudo-code):

    struct TypeInfo
    {
        string name;
        MemberInfo [] members;
    }

    struct MemberInfo
    {
        string name;
        Hash type;
    }

I can use the TypeInfo and MemberInfo structures to describe types, and insert them into the content addressable store. The type field in the MemberInfo structure contains the hash of the field type.

This all works fine and dandy, until a type refers to itself. The trivial case is a direct reference, for example:

    struct BinTreeNode
    {
        BinTreeNode left;
        BinTreeNode right;
    }

But one can imagine more complicated type relations where a type only refers to itself indirectly (i.e., Foo refers to Bar refers to Pep refers to Foo).

Suddenly, I can no longer build a type description using the content addressable store. After all, it means generating a SHA1 hash value for a binary blob that is supposed to contain that very hash itself.

Next thing I know, I'm looking at fixed point combinators and I'm trying to crack SHA1. Clearly that isn't going to work - not given the machine power and time available to me... :-)

Does anybody have some creative suggestions on how to solve this problem. So far, my work arounds include various combinations of losing type safety and/or introducing an extra level of indirection.

The former would involve dropping from a typed reference to an untyped one (a void* if you will). The latter would result in breaking the invariant of my content addressable store, namely that the SHA1 of each blob is the hash used to refer back to it.

Another interesting question is how to decide where in a type relation cycle it makes the most sense to insert such a work around. Theoretically, interpreting and inserting types can begin on any point on a cycle, so it seems arbitrary to pick the first time my parser detects a cycle; it would lead to non-deterministic type descriptions.

If the problem description is too vague, I can provide more detail if necessary. It's language agnostic.

I'm suspecting that there won't really be a clean solution; I'm looking for the least ugly work around.

Note that I'm aware we could generate hashes from just the type names instead of the full type description. But the whole point of the exercise is to capture a type's complete content in the system; this has numerous interesting benefits once types start evolving (in the area of database and schema upgrades).

Thanks,

Jaap Suter - http://jaapsuter.com

Another multimedia dataflow programming system

Didn't find it mentioned previously on LtU, although Pd is apparently wizened so I'd figure some folks here would already have checked it out? Considered to be in the "Max" family, I guess.

Even though several computer scientists followed Sutherland's pioneering work, somehow graphical programming didn't really take off in the mainstream computer world. Besides the LABView[3], a tool popular among industrial engineers, there is only one two-dimensional programming language that has found massive use - and that is the paradigm used in the "Max"-family of software which Pd is a member of.

another multi language learning/overview site

possibly interesting, PLACE is something I stumbled across while googling to read up on Oz. not canonical in any way, but advertising polyglotism seems like a nice public service to me.

A Java-like formalism for control flow analysis.

I'm looking into defining a flow-sensitive effects system for Java (to check things like the definite assignment of local variables). Does anybody know of a good existing Java-like formalism I could use?

I don't think Featherweight Java is appropriate here because it mostly ignores Java's control-flow mechanisms. Much of what's interesting about a flow-sensitive effects system has to do with loops, conditionals, exceptions. On the other hand, I care less about things like classes and inheritance.

Maybe the simply typed lambda calculus (plus some Java-like exception mechanism) would work better?

"Very linear" lambda calculus

What portion of linear lambda calculus corresponds to using only the B and I combinators (composition and identity)? I have a hunch that it's all lambda terms that use their input variables exactly once and without permuting them, but I'm finding it hard to prove that.

The standard abstraction elimination algorithm doesn't work to give a combinator:

\abcd.a(b(cd))
=\abc.B a (B b c)
=\ab.B (B a) (B b)
=\a.B (B (B a)) B

and to eliminate the variable "a" here, I need to use the C combinator.

But there is a combinator that is extensionally equivalent that doesn't use C:

\abcd.a(b(cd))
=\abc.B (B a b) c
=\ab.B (B a b)
=\a.B B (B a)
=B (B B) B

It depends on using the associativity of composition.

How can I tell which linear terms are in the span of B and I?

Is There a Standard Formalism for Describing Abstract Syntax Trees?

I'm in a position where I need to describe an algorithm that operates over an abstract syntax tree. In this sense, the language would be that of a simple 'while' language and I'm only interested in statement level detail. An example of what I'm looking for would be a good, mathematically precise description of implementing a control flow graph. Does anyone know of any articles, papers, books with such a discussion? An example:

Program:

A = 0
while A != 10
  if A == 5 then
    print "hello, 5!"
  else
    print "waiting for a 5..."
  endif
  A = A + 1
endwhile

The obvious AST:

[stamentlist]
 +[assignment]
 +[while]     
   +[statementlist]
     +[if]
       +[statementlist]
         +[print]
       +[statementlist]
         +[print]
     +[assignment]


Publishing negative results: single-assignment Lisp

For the past few months I've been working on a language that I thought of as a hybrid of Scheme and Erlang. Lisp-1, non-hygienic macros, message-passing concurrency, implicit parallelism on function args, single assignment, no mutable values. Mostly I'm pleased with what I've learned from it, but today I realized that I couldn't see it as a usable language.

The problem is that single assignment prevents the user from redefining functions at the REPL. Or, rather, you can do it (it creates a new environment containing the new binding), but it doesn't affect the functions that use the old binding. I don't have a problem with that in ML; but I just couldn't see myself using a Lisp that didn't let me redefine functions.

I just wanted to post about it, on the theory that we don't see enough people talking about their failures. Maybe somebody else will see this and avoid the trap I painted myself into; or maybe they'll be inspired to find a way I didn't think of.

Oh, and why single assignment? Because it means that environments are immutable, too—the only time you get to bind is when you create a new environment, with (let)—which means that environments can safely be shared across processes, which makes processes cheaper. It also means that it's relatively safe to evaluate function arguments in parallel, since there are no assignments to get out of order. (It's only relatively safe, though, since message send/receive could get misordered.)

One decision that I'm still pleased with: I disallowed arbitrary cons cells; all lists are proper lists, as in, say, ML. I'm of the opinion that Lisp constructs such as (1 2 3 . 5) are more trouble than they're worth; every time you iterate over a list, you have to decide whether to check for improper lists or just have (car) throw an error. Also, when all lists are proper, and immutable, more optimizations become possible, such as, under the hood, using an array instead of a linked list.

Cilk++ (alpha) docs made public

http://www.cilk.com/resources-for-multicoders/for-developers-only

Cilk Arts is commercializing the MIT Cilk project, to help transition C/C++ apps to multicore architectures.

Shipping towards the end of 2008, supporting both GCC and Visual Studio compilers.

The alpha docs and code samples provide a preview of the technology, including language syntax, dealing with data races and non-local variables.

Irresistible programs

Here's the deal: Let's design a short booklet of irresistible programs, sure to intrigue and delight programmers and geeks. The programs should be short (one page), real (not made up for this occasion) and preferably of historical value, and must look interesting (bonus points to those suggesting programs written in amusing glyphs).

Here are a couple examples: McCarthy's Lisp (1960), Quicksort in Haskell (and for comparison: Hoare's definition of Partition), first winners of the Obfuscated C Code Contest.

Now is your turn. Which programs do you find simply irresistible?

Logic programming and finance

I'm curious what sorts of papers and articles are out there regarding logic programming (Prolog and family especially) in finance applications. I know at least a little work has been done in this, but I'm not sure where to start looking in terms of resources. Are there any significant systems? Any particularly standard ways of the right (normalized?) ways to represent information regarding financial products in logic languages? Any sorts of pointers on this would be very useful.

XML feed