Syntax of Literal Tables (Assocative Collections) and Auto-generated fields

In Heron I'm adding literal tables which like literal dictionaries/maps/hashes/tables in many languages (e.g. Python, Ruby, etc.) are intended to provide a convenient syntax for associative collection literals.

The syntax looks like this:

var animals = table(animal:String, sound:String, legs:Int)
  {
    "Dog", "woof", 4;
    "Cat", "meow", 4;
    "Human", "hello", 2;
  };

The first column is used as a key, and the "value" is an anonymous record containing the other columns.

So you you can say: animals["Dog"].legs.

Question one: what other languages that have built-in support associative collections with more than two columns?

Now obviously this looks a lot like a simple array of objects: but the identity is chosen as the first field. So an interesting application is to construct an object-oriented programming system built from tables alone. It would then be helpful if the first field could be an auto-generated immutable integer id (e.g. the object address, or the database auto-generated key field).

So then the question becomes what syntax should I use? I'm thinking of something like:

var animals = table(self:Id, animal:String, sound:String, legs:Int)
  {
    $, "Dog", "woof", 4;
    $, "Cat", "meow", 4;
    $, "Human", "hello", 2;
  };

Question two: I don't want to invent brand new syntax, what syntax do other languages use for similar features (auto-generated ids)?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Looks good.

I don't know of any other language off hand that has similar syntax. That syntax looks pretty good at first glance. The auto-ids are maybe a little awkward repeating the placeholder, perhaps a modifier keyword during the column definition?

var animals = table(auto self:Id, animal:String, sound:String, legs:Int)
  {
    "Dog", "woof", 4;
    "Cat", "meow", 4;
    "Human", "hello", 2;
  };

Thanks

This is not a bad idea. My only concern though now that I see it, is that it may appear to a programmer skimming code quickly that the first column are the keys, where in reality they are hidden.

So this makes me think that maybe I should be more explicit about the special status of the first column. Borrowing from Ruby:

var animals = table(animal:String, sound:String, legs:Int)
  {
    "Dog" => "woof", 4;
    "Cat" => "meow", 4;
    "Human" => "hello", 2;
  };

So maybe something like:

var animals = table(self:Id, animal:String, sound:String, legs:Int)
  {
    auto => "Dog", "woof", 4;
    auto => "Cat", "meow", 4;
    auto => "Human", "hello", 2;
  };

Would be a bit more obvious to a casual reader when auto-generated ids are concerend. Here "Id" values are generated by the "auto" keyword which is effectively a nullary operator.

Perl, Javascript

Perl:

$animals = {
    Dog => {
        legs => 4
    }
};

print $animals->{Dog}->{legs}; # -> 4

Javascript:

animals = {
    Dog : {
        legs : 4
    }
};

alert( animals['Dog']['legs'] ); // -> 4

Identity Operator

you are missing the $ (identity operator).

This can be done in Javascript using prototypes to seal a first-class environment that then generates the correct identity... assuming my understanding of the evaluation order of JSON is correct and follows the "as if" principle.

Thanks for sharing these

Thanks for sharing these examples.

policy vs. mechanism

It might be a good idea for you to allow folks to control the auto-generation of ids w/ an algorithm. The sigil $ is fine, but in my experience what I want the most is the ability to, when quickly scripting things, say, "start an identity insert with this seed and this increment value". The same task in TSQL requires constructing a temporary table with an identity column, and there is no analagous solution in C# w/o hacking the compiler and AST to provide a special compiler service.

In C++, I could do this using macros and template metaprogramming... but the resulting code is hard to read and difficult to debug.

If people want precise

If people want precise control over ids, perhaps then they could just write:

var seed = 0;
table(id:Int, s:String)
{
  seed++ => "zero";
  seed++ => "one";
  seed++ => "two";
}

See this comment for why I want to provide compiler support.

Lua

I would take a look at how Lua handles this. It's fairly flexible, although it's not exactly what you want.

Neat

Neat

Almost anything as a key

Just to demonstrate that we can use nearly all Lua objects here is a function as a key:

    > t = { [function(x) print(x) end] = "foo" }
    > for key,value in pairs(t) do key(value) end
    foo

The anonymous function takes a single argument, which it prints out, and has the value "foo". We iterate over the table executing the key (i.e. function) and passing it the value of the function, which it print out: "foo".

Good point. Lua was actually

Good point. Lua was actually a big inspiration for tables in Heron. What I didn't want though was to allow jagged tables. In Lua tables with multiple columns would have to be tables of tables. I wanted the number and type of fields in each row to be constant in Heron. So tables are made up of rows, which are more limited. This is for safety, ease of maintenance, and performance.

"In Scala, this would be a library"

val animals = Table('animal -> classOf[String], 'sound->classOf[String], 'legs->classOf[Int])
{
("Dog", "woof", 4),
(Cat", "meow", 4),
("Human", "hello", 2)
};

Identity Operator

you are missing the $ (identity operator).

Easy enough

val animals = Table('self->classOf[Id], 'animal -> classOf[String], 'sound->classOf[String], 'legs->classOf[Int])
{
($, "Dog", "woof", 4),
($, "Cat", "meow", 4),
($, "Human", "hello", 2)
};

In which case the $ is a no-arg method call returning a unique ID each time. You'll need to define one or more of your own, and plug in the right one via an import.

Thanks for sharing this

Thanks for sharing this example. This is a good example of how awesome Scala is for building DSLs.

The reason though that I wanted auto-generated IDs in the language and not just do it via library support, is because the compiler can then use the memory address of the records for the auto-field. This can lead to some extremely efficient machine code.

Memory Addressed IDs

If you're planning to use memory locations as implicit labels, then expect to be bitten by later demands for serialization, persistence, and a variety of GC issues. (Speaking from experience...)

...and passing data across processes, such as parent/child

...and passing data across processes, such as parent/child.

Bottom line: you won't be able to guarantee how virtual address schemes will work.

Opaque Identity and Referential Transparency

Identifiers are useful for describing relationships in tables and graphs, but if you're going to support them you'll need to handle a few use-cases:

  • How do you introduce relationships between entities within the literal?
  • Can you introduce relationships on identity between two literals produced together? I.e. suppose you broke the above table into three tables: animal(ID -> String), sound(ID -> String), legs(ID -> Nat). How, in this case, do you ensure corresponding identifiers upon construction of the literals? Is this relationship somehow maintained as the literals are distributed?
  • How do you combine literals that were produced independently without collisions in identity? i.e. given two literal tables with identity, produce their union, perhaps with a few relationships or edges between them.

Things get a bit more complicated if you want referential transparency with these identifiers, and yet more so if you want isomorphic identity, such that a graph described by points arbitrarily labeled A B C may be semantically identical to a graph described by points arbitrarily labeled X Y Z. Isomorphic identity helps divorce "how was this graph created?" from "what does this graph mean?".

I've been pondering this off and on for a while. The tentative solution I've been favoring involves introduction of a space/point concept. Points exist only in a given space, and spaces are compared isomorphically. There is a special primitive for mapping a function into a space, at which point it can compare points for equality or wrap points into a new space, but cannot export them. A 'space' literal produces a named function that takes arbitrary labels and produces points. I.e. {space S: (vertices:{set [{S a} {S b} {S c}]} edges:{set [({S a} {S c}) ({S b} {S c})]})} is semantically identical to {space X: (vertices:{set [{X 1} {X 2} {X 3}]} edges:{set [({X 1} {X 3}) ({X 2} {X 3})]})}.

define AnimalDB = {set 
{map@list \(X Y Z)=>(animal:X sound:Y legs:Z) [
  ("Dog" "woof" 4)
  ("Cat" "meow" 4)
  ("Human" "hello" 2)] 
}}

define AnimalDB_wID = {space S: {set 
{map@list \(X Y Z)=>(self:{S X} animal:X sound:Y legs:Z) [
  ("Dog" "woof" 4)
  ("Cat" "meow" 4)
  ("Human" "hello" 2)] 
}}} 

;; it's a tad pointless to use ID at all 
;; unless you use it in a relationship somewhere.
define AnimalDB_Alt = {space S:
{set [
   animal({S 0} "Dog")
   animal({S 1} "Cat")
   animal({S 2} "Human")
   sound({S 0} "Woof!")
   sound({S 1} "meow")
   sound({S 2} "hello")
   legs({S 0} 4)
   legs({S 1} 4)
   legs({S 2} 2)        ]}}

I've recently been favoring 'sets' over 'relations'. A set of labeled records could contain many relations (one per label) and thus serve as a whole database. Compared to a record of relations, sets of labeled records have been more convenient for structural typing and syntactically datalog-style relational queries (where the output may involve as many relationships as the input). YMMV.

This design is aimed at referential transparency, isomorphic identity, and determinism. Given that these are functional - rather than containers - making 'Key' special isn't a worthwhile effort.

One might alternatively utilize ADTs to this end, treating each ID resource as a unique ADT (via sealers/unsealers if in a structurally typed language) which allows some distribution of names without violating transparency. Unfortunately, that design makes it difficult for distinct databases to reference or inherit from one another (you cannot generally compare ADTs, not even for equality). Global (sharable) names supporting equality comparison and avoiding collisions cannot be produced with referential transparency, and demand many of the same security properties as capabilities. A generator of such names could be provided to a function if linearized.

Given that these are functional

Given that these are functional - rather than containers - making 'Key' special isn't a worthwhile effort.

In fact, your design goes further than what the OP discusses here. I suppose you could query your table by animal, sound or legs equally. And there may be more than one result.

The syntax you ended with is nice, except for the rather cryptic map@list \(X Y Z)-- seems out of place when entering data.

Is the third AnimalDB the one which is split into three tables?

cryptic map@list \(X Y Z)--

cryptic map@list \(X Y Z)-- seems out of place when entering data

It is a rename operation. (X Y Z) implicitly uses labels (a:X b:Y c:Z), but we want labels (animal:X sound:Y legs:Z). A macro could probably make it look cleaner. map@list is simply an inline import of the function 'map' defined at location 'list'.

If it is too cryptic, I'll be happy to remove it.

Is the third AnimalDB the one which is split into three tables?

Essentially, yes. Though for reasons I mentioned above, I represent these three relations as all existing within a common set as opposed to a record of relations.

I suppose you could query your table by animal, sound or legs equally. And there may be more than one result.

Indeed. One of my basic set operators is a datalog-style relational query:

;; Which animals have 4 legs?
;; note: 'probe Space Fn' applies Fn within Space.
define Q4legs = {fn: DBS => {probe DBS {fn: DB =>
  {query: 
     AnimalName <= {exists X: DB.animal(X AnimalName), DB.legs(X 4)}
}}}}
define AnimalsWithFourLegs = {Q4legs AnimalDB_Alt}

Recursive queries are also possible... i.e. if you had a database of edges in a graph, one might produce a database of connectivity:

define Connectivity = {fn: Graph =>
  {query Answer:
     connected(A B) <= Graph.edge(A B)
     connected(A B) <= {exists C: Answer.connected(A C), Graph.edge(C B)}
  }}

Good support for databases and simple logic programming inside the functional layer combines well with functional reactive programming. Basically, you can have realtime views over multiple databases, and the optimizer gets a lot of room to play.

The first column is used as a key, and the "value" is an anonymo

var animals = table(animal:String, sound:String, legs:Int)
  {
    "Dog", "woof", 4;
    "Cat", "meow", 4;
    "Human", "hello", 2;
  };

The first column is used as a key, and the "value" is an anonymous record containing the other columns.

So you you can say: animals["Dog"].legs.

Well as you said, the first column is a key and the other columns part of an anonymous record so that is how you would ideally define it in other languages.

For example the dodo syntax would be:

String[(String sound, int legs) features] animals = [
   "Dog": ("woof", 4),
   "Cat": ("meow", 4),
   "Human": ("hello", 2)
]

As for the auto-generated ID... Can't you assume all objects have one and use that? Like the memory address of the object. If you need the IDs to be successive numbers, it may be preferable to specify it explicitely. At which point you may want to build your table using a map operation or a loop.

Chicken and egg

So in dodo (String sound, int legs) features is the syntax for defining the structure of an anonymous record? This looks quite reasonable.

In Heron, there are no anonymous records, so tables are in fact used for that purpose. AFAICT there is no ideal solution, just a chicken and egg problem: do you define tables from anonymous objects, or vice versa.

I can't follow the chicken

I can't follow the chicken egg problem here. It sounds like I could build tables from dictionaries and records or I could build records and dictionaries from tables.

Using records I can for example write a function taking a single animal (something that has an animal, a sound and a legs field), using tables, how do I ensure that I only get a single animal?

Good challenges!

You have obviously given the problem a lot of thought!

How do you introduce relationships between entities within the literal?

Don't know if it is a "good idea", but here is what I am thinking:

var prev = $;
var t = table(self:Id, prev:Id, s:String) 
{
  prev = $ => prev, "a";
  prev = $ => prev, "b";
  prev = $ => prev, "c";
}
Can you introduce relationships on identity between two literals produced together? I.e. suppose you broke the above table into three tables: animal(ID -> String), sound(ID -> String), legs(ID -> Nat).
How, in this case, do you ensure corresponding identifiers upon construction of the literals? Is this relationship somehow maintained as the literals are distributed?

Here is how it works:

var names = Table(self:Id, name:String) { };
foreach (var r in animals)
  names.Add(r.self, r.animal); 

var sounds = Table(self:Id, sound:String) { };
foreach (var r in animals)
  sounds.Add(r.self, r.sound); 

var legs = Table(self:Id, legs:Int) { };
foreach (var r in animals)
  legs.Add(r.self, r.legs); 
How do you combine literals that were produced independently without collisions in identity? i.e. given two literal tables with identity, produce their union, perhaps with a few relationships or edges between them.

Nice. Have to think about that one. It may just require brute force programming effort on behalf of the programmer.

Thanks for your insights. Your proposed approach sounds very well thought out, but I fear may not be very accessible to run-of-the-mill programmers if that is your goal.

run-of-the-mill programmers

Thanks for your insights. Your proposed approach sounds very well thought out, but I fear may not be very accessible to run-of-the-mill programmers if that is your goal.

I'm assuming run-of-mill programmers will be leveraging EDSLs (macros and higher level library functions) to keep things simple... doubly so when dealing with spaces.

I suspect a functional approach will be more accessible long-term, when dealing with reactivity, distribution, concurrency, etc.

var names = ... foreach (var r in animals) ...

It doesn't look like you actually managed to produce multiple 'literal tables', but rather produced a universal relation then algorithmically broke it into smaller chunks. If you need to create a Universal Relation to ensure shared identities, then this approach will scale poorly (i.e. violating once-and-only-once) for 1:N or N:M relationships.

Perhaps you should look into representing digraphs as a good initial effort for stressing your table literals.

OTOH, given that your 'tables' all have a special 'key', perhaps you're not so interested in 1:N and N:M relationships.