Implementation of Cardelli and Daan Leijen Style Record Systems?

I've been reading through "Extensible records with scoped labels", and I'll soon be rereading Cardelli's "Operations on Records." (should be on Citeseer)

My question is simple: are there language implementations, papers, web pages, lore (anything?) on analyzing, optimizing, and thus efficiently implementing the underlying machine data structures and record operations behind these elegant record systems?

And while I'm asking, any other LTU'ers interested in these record systems and have any insight to share?

Mucho thanks.

Scott


On Citeseer

Isn't it this one?

Yup, that's the one

Yup, that's the one. The two type systems share certain qualities, but certainly are not identical.

My inquiry, in part, is whether there is anything regarding compiler/RTS support for record systems in this style beyond the potentially "naive" implementations.

First would be (1) decomposing the shorthand into the primitives. The shorthand actually seems to help a (simple) compiler generate more efficient code, while decomposition into the primitives would appear, at first glance, to yield less efficient code from a (simple) compiler - *unless* decomposition into the primitives allows for some clever analysis by a not-so-simple compiler.

Here are examples: "terse" form on left; primitives form on right

// Creation is really extension
R = {a = 1, b = 2} vs. R = {a = 1 | {b = 2 | {}}}
// "Update" (functional) is extension of a record with old field removed.
{a := 88 | R} vs. {a = 88 | R - a}
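To make the primitives concrete, here is a minimal sketch in OCaml of the "naive" implementation in question, assuming an association-list representation (chosen for clarity, not efficiency; the first occurrence of a label shadows later ones, as in the scoped-labels semantics):

type 'v record = (string * 'v) list

let empty : 'v record = []

(* {l = v | r}: extension just conses; the new field shadows any older l *)
let extend l v r = (l, v) :: r

(* r - l: restriction drops the first (most recent) occurrence of l *)
let rec restrict l = function
  | [] -> []
  | (l', _) :: rest when l' = l -> rest
  | f :: rest -> f :: restrict l rest

(* r.l: selection finds the first occurrence *)
let select l r = List.assoc l r

(* {l := v | r}: update desugars to extension after restriction *)
let update l v r = extend l v (restrict l r)

(* R = {a = 1, b = 2} desugars to extension of the empty record: *)
let r = extend "a" 1 (extend "b" 2 empty)
let r' = update "a" 88 r    (* {a := 88 | R} *)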

Second is also a question rather than a strong assertion. Among other possible optimizations, might it be "naive" not to take advantage of potential sharing opportunities among functional records as they are (perhaps frequently) altered via extension/restriction/renaming/update?

Again, thanks much.

Scott

Implementation...

...the main trick to understanding how to implement these systems is to interpret records as arrays, and the row operations as arithmetic operations.

So a typical system will have types like

   A ::= base | record{r} | forall a:row. A 
   row ::= a | nil | row :: (l:A) | row @ row

So we've got row variables, the empty row nil, consing a label onto a row, and row concatenation at the row level, and at the type level we have records and quantification over rows.

The idea is that if we have a concrete record record{l0:A, l1:B, l2:C}, then we'll represent this as a 3-element array, whose first element is an A, whose second is a B, and whose third element is a C. So to project from a record, we need to compile labels to offsets into a record.

However, a label can occur at different positions in different records -- for example, consider record{l0:A, l1:B, l2:C} versus record{l1:B, l2:C} -- in this example, l2 occurs at the third position in the first record and at the second position in the second record type. For records without type variables, there's no problem -- but what to do about records whose static type has row variables, such as record{a, l2:C}?

The idea is to compile a row variable to an integer representing its length, and to compile row quantification as a function which gets passed an integer (representing the length of the row argument) at runtime. So if you try to index into a record of type record{a, l2:C}, you'll generate code which takes the integer representing the length of a and adds 1 to it to get the offset for l2.
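Here is a small sketch of the compiled code, assuming for illustration a homogeneous int-array representation and 0-based offsets (so the offset of l2 is exactly the prefix length; the "add 1" above counts 1-based positions):

(* Source:   fun (r : record{a, l2:C}) -> r.l2
   Compiled: the function takes the length of row a as an extra
   integer argument and uses it to compute l2's offset. *)
let project_l2 (row_len : int) (r : int array) : int =
  r.(row_len)

(* Instantiating a := (l0:A, l1:B) passes row_len = 2, so l2 is
   fetched from the third slot, matching the record{l0:A, l1:B, l2:C}
   layout described above. *)
let example = project_l2 2 [| 10; 20; 30 |]   (* = 30 *)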

This scheme does mean that things like functional record update have cost O(k), where k is the size of the record you're updating (since you have to copy the whole array), but that's basically what people expect. You lose the sharing benefits, but you avoid chasing pointers, and since records are usually small this is a favorable tradeoff.

I first learned about this idea in Ohori's paper "A Polymorphic Record Calculus and its Compilation".

An "Ouch" Case

For records without type variables, there's no problem -- but what to do about records whose static type has row variables, such as record{a, l2:C}?

Hmm, so now we're talking about record types with polymorphism over actual row variables - yes? And not just type variables over the types of some fields in a record. I admit that this at first appears a bit of an "ouch" case regarding type analysis and compilation activities.

But at first glance, I do have happy, if fuzzy, thoughts of how this might present many "natural" expressive possibilities, removal of formerly special cases in a language definition, and so on and so forth. Removal of the module system as a special case from a language? Reduction of anonymous "structurally typed" structs, tuples, multiple return values, etc. to single-constructor degenerate cases of algebraic "data" type definitions, and so on. A man can dream :-)

Again, first thoughts. The level of capability for new sound abstraction using the row polymorphism to which you refer seems to me somewhat analogous to "Section 3.2 Non-Uniform Variable-Arity Functions" in Typed Scheme: Scheme with Static Types (as well as in one of the Typed Scheme-derived papers I can't locate right now).

This latter Typed Scheme type system allows for typing "natural" expressions such as variable-arity functions - such as the so very common (map add1 [1 3 2]) and (map char-ref ["hello" "world"] [2 3]) - and so very many more similar cases in a highly expressive typed language.

I need to read the paper you cite.

Thanks much!

Scott

p.s. Again, not yet having read the paper, my mind turns to Go's recent description of mapping value types to interface types, in every case creating an [I,T]-specific (static, shared) method table ("vtable") in the language to achieve rapid matching of interface method declarations to data type method definitions. Hmmm?

Record polymorphism

Hmm, so now we're talking about record types with polymorphism over actual row variables - yes?

Well, it may not be clear from the paper, but that's actually what "record polymorphism" and "extensible records" usually refer to. Without it, the record features in the paper can be regarded as just syntactic sugar, because you can easily expand them out at compile time.

Roger that

I was viewing the row variable in the Leijen paper as a further "foundational contribution" because, up to the point I've reached in the paper, he brings it up, lays out the syntax and how it functions in the "axioms", and then sets it aside.

Motivation

I looked into these documents some time ago, but was unable to find a powerful motivating argument (use case) for extensible records. The ability to mask a field of a record (as opposed to only replacing it) seems especially pointless, since any code that could possibly take advantage of unmasking the field must know in advance the context in which it was called. For generic programming, one would more often wrap one record around another than use extension.

I suppose one might somehow use this for meta-programming, but I'm not sure how or where it would be of real benefit.

Has anyone else managed to find a motivating case?

In OO terms

In OO terms you are saying that:

1. You don't see how inheritance is useful.
2. Especially, inheritance allowing shadowing of method names (a la C++).
3. Delegation is preferable.

There might be some truth to that, but it won't be hard to find strong disagreement. :-)

This is not just a joke. The OCaml object system, for example, is based on extensible records in almost exactly this way: instead of subtyping it uses record polymorphism.

IIRC, OCaml does not allow shadowing. However, Leijen's main argument for shadowing is not so much that it is useful (I also doubt that), but rather that it significantly simplifies the semantics and the type system. And he has a point there.

Records and Objects

Well, I'm sure you can find vehement disagreement with anything negative said about OO or <pick any popular language>...

To be clear, I see records as being stateless things, such that a projection of a record is referentially transparent (same value regardless of when it is performed). You'll need to remind me whether OCaml has mutable records or not.

This makes for a major difference between objects and records, since you generally cannot 'project' a method from an object, nor can you easily mix and match methods to produce new objects, and you especially cannot do these things with interfaces that aren't even instantiated into objects yet.

The sort of 'is-a' inheritance given first-class support in C++ and such isn't something I use often (except for polymorphism from an abstract base class). I prefer to create a new object or facet to serve whichever fine-grained role is necessary, and keep the set of exposed method names as narrow as possible for the job... i.e. allowing the principles of capability security to drive design.

significantly simplifies the semantics and the type system

I wasn't convinced even of that much. Clearly, any record extension operation will introduce a new type each time it is used, but it shouldn't be difficult to simply assert that 'shadowed' fields are entirely unavailable.

But if we're working with mutable records, it might be different...

Objects are records

State is an orthogonal issue. In mainstream languages, objects are typically stateful, but that's mainly because mainstream languages are typically imperative. Functional objects make perfect sense, and have been advocated by different people. Semantically, state and objects are mostly unrelated.

As for OCaml, records can be mutable, and objects immutable.

you generally cannot 'project' a method from an object

Actually, that's precisely what you do in more foundational treatments of OO, where objects are usually viewed as (recursive) records of functions/methods.

it shouldn't be difficult to simply assert that 'shadowed' fields are entirely unavailable

Unfortunately, you then need to track which fields are legal where for which row variable. There are several ways to do that, e.g. by decorating row kinds with sets of illegal labels (a la Ohori), with some sort of "negative" (absent) fields (a la Remy), or by using qualified types with absence predicates (Gaster/Jones). And probably others. They all significantly complicate types, typing rules, type inference, and also operational semantics.

Records as Objects

State is an orthogonal issue.

Ugh. You've got to be careful with that word. Whether one concept is "orthogonal" to another is very often relative to specific concerns, which might not be the concerns under discussion. In this particular discussion - where the 'concerns' are composition and decomposition of records - statefulness of the record itself is not an orthogonal issue.

State can easily be a separate concern from records (i.e. so that records can hold references to stateful 'cells', but the records themselves are pure). But to allow stateful records and make that statefulness orthogonal to composition and decomposition (including projection) would require some sort of reactive programming and binding vs. assignment interface. That is, if you say 'Y=R.x', you had better be able to decide whether Y will change whenever R.x changes and vice versa. Otherwise the state ain't orthogonal to the projection operation.

Functional objects make perfect sense, and have been advocated by different people. Semantically, state and objects are mostly unrelated.

Sure, you could use a corecursive record of functions to represent something like objects. But, IMO, it is extrinsic identity (support for names), and the ability to represent communications elements of the domain as 'objects', that most clearly distinguishes OO from functional methodologies. That is, functional objects by themselves do not allow for OOP.

All objects with extrinsic identity are stateful, if only in having stateful lifespan, accessibility (potential for disruption), and activity, even if they don't manipulate cells to keep information across activities.

you then need to track which fields are legal where for which row variable

It remains unclear to me how one does any better when using extensible records with shadowing.

If you're dealing with generic programming on record types, you still need to express "negative" fields (as outputs from generic functions) regardless of shadowing. And you'll still need to express required fields (which may or may not need to be a superset of the negative fields, depending on the semantics of field elimination).

If you're not doing generic programming on record types, then you can track the set of legal fields more precisely.

I have to agree....

While I'm sure there are interesting functional OOP systems sitting on lab hard drives and in journal articles out there, whether via mind share, market share, or historically (Simula, CLU, Smalltalk, Eiffel ...), OOP has been predominantly an effort to *manage* imperative, stateful programming (via predominantly nominal-typing-based efforts as well).

Scott

"Generic programming on record types"?

If you're dealing with generic programming on record types, you still need to express "negative" fields (as outputs from generic functions) regardless of shadowing.

I don't think I understand what you are saying. What kind of "generic programming on record types" do you mean?

Generic Programming on Record

Generic Programming on Record Types could include function prototypes like the following:

(a:A b:B ... R) -> (b:B c:C ... {R - d})

This function takes a record with fields a, b, and more, and it returns a record with fields b, c, and without d. The return value will also lack 'a' unless fields can be shadowed.

That is, one doesn't know the whole type for the record, but does know some requirements for it, and does know that it IS a record type. The function, thus, is not only generic... but specific to records.

The need to track these 'negative' fields (like 'R - d' above) is not resolved by Leijen's design. Well, it is, but the resolution is simply to forbid that class of generic programming on records.

Misunderstanding?

I don't see the problem. The right type for the function you have in mind is

{a:A b:B d:D | R} -> {b:B c:C | R}

That is valid in Leijen's system as well as others. The only difference is that in Leijen's system you might instantiate the function with a type where R contains additional occurrences of a and d, whereas in the other approaches this would simply be ill-typed.

That is, by not tracking negative label information, Leijen's system allows more programs, not less (but arguably ones that are not very useful).

If we're talking tracking requirements....

At least one of Leijen's examples of shadowing extension (my terminology, I think) uses an expression such as {R - q}.q to access a shadowed "super class" field, presuming that we're somehow at least tracking the shadowed fields throughout our code if we want this kind of capability.

In other words, whether tracking multiple shadowed fields or tracking absent fields, in neither case are we allowed perfectly freewheeling use of operations/operands, somehow liberated from bookkeeping. By itself that doesn't seem so horrid - am I missing a case or certain cases where this bookkeeping *does* seem an undue burden?

Thanks

Somehow, it made sense to me this time. I think I understood it before, too, but it has been a while since I read the papers.

I believe those 'extra programs' - in particular, the ones that subtract a field in order to access a shadowed field - are not going to be useful. If I never subtract a field from a record in order to get at a shadowed field, then the implementation (for a closed system) should GC the shadowed fields anyway. And this results in the same practical semantics as the other systems: update is back to being the same as extension.

Problem is, that approach leaves a massive information dump for any open system... not exactly a performance or security booster. And this leaves me thinking the greater limitations of the other designs are probably better.

Of course, under capability security principles (which guide many of my program design decisions), you shouldn't ever 'subtract' fields. You should always whitelist the fields you want the other module to see, never blacklist the fields you don't want them to see, as the former avoids any risk of leakage (or accidental coupling) when a record is extended with a new, possibly sensitive field.

Handling illegal fields

I believe most record systems allow restriction on a field not present in a record, so I'd imagine that illegal fields can be managed easily during extension via: { m = value | R - m}.
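In the same association-list style as the sketch earlier in the thread, this loose derived form might look like the following (List.remove_assoc drops the first binding of the label and leaves the list unchanged when the label is absent):

(* { m = value | R - m }, with restriction permitted on absent labels *)
let extend_or_update l v r = (l, v) :: List.remove_assoc l r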

Of course this derived form gives us the exact, goofy "extension is extend-or-functional-update" semantics that appears to be the predominant record extension semantics (where?) and is one of Leijen's concerns in his article.

I suppose that adherents of the extend-or-update semantics can take solace in that this style extension operator is definable and typable as a derived form given row variables to work with.

Re: OOP, I'd say the big difference between most (all?) production OOP systems I've worked with and structure/record systems is that OOP systems are nearly all nominally typed, whereas record systems are structurally typed.

It's the combination of structural (sub)typing, parametric polymorphism, and row variables all put together that makes these record systems an intriguing computational playground for me, and makes me wonder why they don't figure more prominently in more programming languages.

Scott

Ill-typed update

Of course this derived form gives us the exact, goofy "extension is extend or functional update" semantics that appears to be the predominate record extension semantics (where?) and is one of Leijen's concerns in his article.

If I understand you correctly then no, that is not the case. In Leijen's system you can only remove a field when you know it's there. That is, {l=x | r-l} is ill-typed if r does not have l. In other words, you have to know what you are doing and which case you want to express (extension vs update).

I was confusing two articles/systems

From Cardelli's "Operations on Records": "Restriction r\x ; removes the field of label x, if any, from the record r. We write r\xy for r\x\y." Emphasis mine.

I was slipping in this "if any" into Leijen's system without looking closely enough. If we were to include these looser restriction semantics, then I think my further conclusions would hold, but the way my mind's been working lately, I'm never sure :-)

Scott

Right

If we were to include these looser restriction semantics, then I think my further conclusions would hold

Indeed. But then of course, not loosening it like that is the very point of his approach.

Motivating cases

I think there are two motivations across the two papers. First, Leijen particularly seems concerned that the prevailing semantics of record extension is a big fat hack combining extension and update: {F = 1 | R} means if R already has an F then update, {F := 1 | R}, else add new field F to R (all functionally, of course). Better is to make {F = 1 | R} illegal in this case, or to allow the new field to coexist with and "shadow" R's existing field binding.

The second motivation seems to be a goal of laying out some cleaner (statically typed) semantics for OOP-style polymorphism, with structural subtyping and, indeed, with the shadowing style of record extension mirroring a subtype's "overriding" of a parent class's method(s). So presuming a shadowing field F holding a "method" value M, we can access the "super class's" method value M' with {R - F}.F. Yada yada yada.

For my particular purposes, this shadowing extension aspect isn't (necessarily) a super important feature - disallowing extension of R with F if R already has a field F would be just as good. I am particularly interested in efficient implementation of the five key operations (the last two derived) - extension, selection, restriction, update, rename - and perhaps other likely very common (derivable) operations, such as concatenation of two records or, more generally, a join of N records; a more general n-ary field-selection "projection" operator over the fields of one or more records, yielding a tuple of the values or a new record preserving the field names; etc.

Perhaps of interest - in my Googling on this topic, there seems to be a reasonable amount of message-group chatter, blog entries, and whatnot expressing a desire to replace Haskell's record system with a "better" record system akin to the ones described in these papers. That said, I'm *NOT* a Haskell person, so I have no opinion on whether or not this is desirable, the quality of Haskell's current record system, etc. I don't really care about Haskell at all, except that this does seem to indicate that some folks believe the types of record systems I am interested in might better support software programming expressiveness or style or idioms, at least for some set of problems with which they are concerned.

BTW - I think I've located one ML-related paper on just the efficient implementation of the selection operation in good old record "structural subtype polymorphism," but I want to give it a read to see if it's indeed relevant before posting it.

Scott

Haskell records

I'd be interested in seeing the blog entries on the subject, see if any have a motivating example. I suspect you'll have far better Google search cues than I.

My own interest in 'better support' for records would be to support structural subtyping. That is, (a:X b:Y c:Z) should be a subtype of both (b:Y c:Z) and (a:X b:Y). Same for sum types, albeit inverted: a(X) | b(Y) and b(Y) | c(Z) should be subtypes of a(X) | b(Y) | c(Z).

functionally, of course [...] Better is to make {F = 1 | R} illegal in this case, or to allow for the new field to coexist and "shadow" R's existing field binding.

This is the concept with which I'm having trouble agreeing.

If we were dealing with mutable records, where the record must remain as whole cloth to deal with multiple users each accessing different parts of it, I could perhaps agree with the notion of shadowing fields in the manner described above, and that extension must be distinct from update.

But, with pure records, I cannot see much benefit in doing so. If we write a function to operate on a record, and we ever use {R-F}.F in that function, that '-F.F' simply becomes an automatically generated name, expressed in unary, but otherwise sharing properties with any other name you might use to access a record: you need to know it exists in advance, etc. Frankly, we wouldn't want to write much code in this format, but if we did need it we could still have it without extensible records - we'd get a similar effect using (a:X d:(a:Y d:(a:Z))).d.d.a (and then writing up some 'cdar', 'cadr', 'cddar', etc. functions), each producing a different type.

Suppose that {F=1|R} produces a record equal to R except replacing F in R if it exists. This would have exactly the same semantics as Leijen's approach, except that the '-' operator removes all instances of field F rather than just the first one. Where would this design hurt?

Anyhow, it seems your interest is focused more upon efficient implementation than interface - an implementation that ideally guarantees little or no per-record overhead to identify fields, that code need not search a record to find the right field, AND that one can reuse space in the existing record rather than too often allocating new records.

An example challenge is working with collections of records with structural subtyping - i.e. passing a collection of (a:X b:Y c:Z) records into a function that requires a collection of (a:X c:Z) records - without re-allocating each record as it is passed to the function, and without requiring the function to 'search' for the correct field, and ideally without consuming space for 'b'.

The option I'd probably choose in that case is to compile a specialized variation of the function to deal with (a,b,c) records even though it only accesses a and c. Alternatively, if I knew that the list was destined linearly for consumption by a function that only needs a and c, I'd favor dropping b right away.

A common implementation of the selection operator seems more a feature for interpreted code than for compiled code.

Poor man's row polymorphism

My own interest for 'better support' for records would be to support structural subtyping. That is (a:X b:Y c:Z) should be a subtype of both (b:Y c:Z) and (a:X b:Y). Same for sum-types, albeit inverted: a(X) | b(Y) and b(Y)|c(Z) should be subtypes of a(X)|b(Y)|c(Z).

Well, that's exactly what row polymorphism is supposed to give you: e.g., if you have a function of type (b:Y, c:Z, A) -> t, where A is a row variable, then you can supply any object/record whose type is an extension of that row. This is how OCaml subsumes (most interesting uses of) subtyping, but without the pain induced by it. (OCaml's polymorphic variants do the same thing for sums, btw.)
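For instance, here is a small OCaml sketch (the types in comments are what the compiler infers; the '..' is the row variable):

let f obj = obj#b + obj#c
(* val f : < b : int; c : int; .. > -> int *)

let r1 = object method a = "x" method b = 1 method c = 2 end
let r2 = object method b = 10 method c = 20 end
let _ = (f r1, f r2)   (* both accepted: their rows extend {b; c} *)

(* The same idea for sums, via polymorphic variants: *)
let g = function `B y -> y | `C z -> z
(* val g : [< `B of 'a | `C of 'a ] -> 'a *)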

My stance is that subtyping in fact is poor man's row polymorphism. :-)

Why "Polymorphism"

That is such an "overloaded" word.

I just consider this proper structural subtyping (as opposed to nominative subtyping). I usually associate 'polymorphism' with automatically selecting different behaviors for different types, and 'subtyping' with simply being in accordance with the Liskov substitution principle (that is, performing the same expected service with different resources).

It is unclear to me how OCaml is offering what I'd consider "polymorphism" here. But I guess it's far too late to start complaining. I'll try to remember 'row polymorphism' in the future.

Row polymorphism is a kind of parametric polymorphism

Parametric polymorphism is about selecting the *same* parameterised behaviour for a collection of types (often all types). Row polymorphism is a form of parametric polymorphism in which the types ranged over by a row variable are all possible extensions of a row (typically a record or a variant).

It might take 20 years....

It's arguable that something like structural subtyping + parametric polymorphism + row polymorphism + some more or less subtle mixing of inference and interfaces - aka a more powerful, future variant of what's now called "typesafe duck typing", etc. in Scala, Go, and a few other languages - will turn out to be the more flexible, dominating successor paradigm to the still-prevailing OOP paradigm we have today (founded on nominal (sub)typing, inheritance, imperative programming, etc.).

I won't claim that the former, non-OOP formula is any kind of unified field theory, but it sort of smells like one if one is in a good mood :-)

row polymorphism =/= subtyping

Despite superficial similarities, subtyping and row polymorphism are not really the same thing. Subtyping can express programs that cannot be expressed with only row polymorphism and row polymorphism can express programs that cannot be expressed with only subtyping. The clearest explanation I've found of this relationship is in Chapter 14 of Pottier's thesis.

Pottier and Remy advocate combining row polymorphism with subtyping.

OCaml hits a sweetspot in supporting row polymorphism as part of Hindley-Milner type inference, along with subtyping via explicit upcasts. Row polymorphism integrates cleanly with type inference, and handles a lot of useful cases. The explicit upcasts are needed rarely, but when they are needed they're essential.

(Unfortunately, OCaml doesn't actually implement vanilla polymorphic records, but significantly more complicated objects instead, and the polymorphic variants are also a bit quirky because they use constraints rather than true row variables, but that's beside the point.)

Encoding

Yes, I know. That is, if you only consider universal quantification. However, you can express all negative occurrences of record subtyping by universally quantified row variables and all positive occurrences by existentially quantified row variables. Of course, in general this encoding can get tedious, especially if you have types with nested occurrences of records. But I tend to believe that most practical uses of subtyping actually are not very complicated. The benefit is a cleaner and more expressive typing discipline (than subtyping).

Nice

[replying to Andreas despite the misleading indentation]
Is this observation discussed in the literature anywhere?

Not sure

Not sure. I always assumed this is more or less folklore. There are a couple of papers that describe ways of translating away subtyping, but I'm not aware of one that actually describes it in terms of row polymorphism.

Efficient Record Restriction and Extension

I want to throw out just an idea I've been toying with, and then ask a simple question. Presuming functional records, let's say we devise a "slicing"-style mechanism to implement restriction (removal of fields). This slice structure might be as small as a tiny bit vector, compared to reallocating large parts of potentially large records.

We've seen some array systems that use a slicing mechanism such as this on top of raw, allocated arrays.

Similarly, let's say we add a one-word "tail" field to records, where extension might then share "tails", kind of like cons cells except with N "head" or "car" record fields in addition to the "tail" or "cdr" field. Obviously, we've seen this with cons cells, and indeed in the degenerate case we have records as lists (not a good thing! IMHO). But let's presume extension is relatively rare and oriented toward adding many fields simultaneously.

Another potential saving grace: we might well be able to implement a consolidation policy of some kind, perhaps during garbage collection cycles.
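To make the idea concrete, here is a rough OCaml sketch of my own (assuming whole segments of fields are added per extension; this is just one way to realize the slicing and tail-sharing described above):

type 'v record = {
  fields : (string * 'v) array;  (* this segment's labeled slots *)
  dead   : bool array;           (* slice mask: true = restricted away *)
  tail   : 'v record option;     (* shared suffix from prior extensions *)
}

(* {l1=v1, ..., ln=vn | r}: extension allocates only the new segment
   and shares r as the tail *)
let extend new_fields r =
  { fields = new_fields;
    dead = Array.make (Array.length new_fields) false;
    tail = Some r }

(* r.l: the first live occurrence wins, chasing the tail if needed *)
let rec select l r =
  let n = Array.length r.fields in
  let rec scan i =
    if i >= n then
      (match r.tail with Some t -> select l t | None -> raise Not_found)
    else if fst r.fields.(i) = l && not r.dead.(i) then snd r.fields.(i)
    else scan (i + 1)
  in
  scan 0

(* r - l: restriction copies only the small mask, never the field array.
   A label living in the tail would need the mask pushed down - or the
   consolidation pass suggested above, perhaps during GC. *)
let restrict l r =
  let n = Array.length r.fields in
  let rec mark i =
    if i >= n then r             (* label not in this segment *)
    else if fst r.fields.(i) = l && not r.dead.(i) then begin
      let dead = Array.copy r.dead in
      dead.(i) <- true;
      { r with dead }
    end
    else mark (i + 1)
  in
  mark 0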

So now my question: should one prefer "pointer chasing", roughly defined, or more/larger memory allocation? More precisely, what usage patterns might favor one over the other?

Mucho thanks in advance!

Scott

Morrow

I think these ideas were implemented in Daan's toy Morrow language, but I'm not entirely sure.