For a relational language, how treat KeyValues, Streams in relations with operators that add or remove "columns"

I'm building a relational language where I wish to treat everything as a relation. The main point is to provide a common core of relational operators (aka: Queries) that work across all. A "relation" here is a table with a header of fields, columns and rows.

I have a container that match directly this, that internally is an array:



id name age
1 Jhon 18

Now, I have other containers that are Key/Value like a hashmap and BTreeMap, and single value like Generators, ranges and vectors. Only generators, as being read-only forward-only can be say to output a fixed schema.

You can say a hashmap, for example, is a relation that look like:



Key Value
1 Jhon

And a vector one like



it
1
2
3

With all of them, I wanna provide the same operations like:

table ?where #age = 18
kv    ?where #key =1 
array ?where #it = 2

Now my dilema is what to do for some operations that add or remove columns, like joins and projection:

table ?select *, #age + 10 as older
kv    ?select *, #value |> String.Upper as name
array ?select *, it * 10 as calc

What could be the best option here?

1- Say that kv/array have fixed schemas and can't grow or shrink their headers

2- Say a KV only have fixed the KEY and the value header is flattened, so it become:

[key, value, name]

The problem is not what OUTPUT. I could just output tuples. Is how REPRESENT the relation if it be more than the KV.

So, If I have 10 columns, what is the best option to put inside a KV? Consider that the relational model allow to rename columns, and add and remove them.

Another problem is that KV like Btree allow to model... trees. If I think instead of Btree as a index behind a table, the language lost the ability of "naturally" model trees. BTW, I wanna to have AGDTs.

P.D: Other potential question is: What kind of container could work alike "table" but also alike hashmaps/trees? How model a type that allow several "backends"?

P.D: Originally here: https://www.reddit.com/r/ProgrammingLanguages/comments/awkdrb/for_a_relational_language_how_treat_kv_and/ but not good answers... so I ask here with a bit of more details.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Isn't there already enough prior art?

You seem to be making this really hard, by trying to re-invent the wheel. Most of your answers are here. (You'll have to hold your nose as you go past the ugly 1980's syntax.)

A "relation" here is a table with a header of fields, columns and rows.

Suggest you follow the definition of "relation" from relational theory. That is, not SQL. A relation is a set of tuples, all with the same 'Heading'. 'Heading' is a technical term; it doesn't mean "header". A tuple is a set of name-value pairs. The technical term for name-value pairs is 'attribute'. All these are sets, and you can explain the semantics in terms of set operations. My first link gives the semantics using setbuilder notation.

Talking about tables/columns/rows will lead you down blind alleys of physical representation. You'll end up at the mess that is SQL.

If you want a Key-Value store then you don't want a relation. Relations are not restricted to having a single key. If you want a Key-Value store, there are plenty of NoSQL-type products to draw from, with all their attendant difficulties. They are not relational.

what to do for some operations that add or remove columns

Operators that remove columns (aka project and project-away) need some way to specify a column-set. Use a relation for that. See the work of Vadim Tropashko.

Adding columns (aka extend) is Join -- I suppose I'd better call it 'Natural Join', since your head seems to be full of SQL. Rename is a combo of Join and project-away (aka compose).

If you want to extend with a formula such as `#age + 10` then you're Joining with an algorithmic relation -- that term is from a 1975 paper by Hall, Hitchcock, Todd.

Don't use that text

That 'grussell' text that you link to is terrible. No wonder you're asking confused questions. It doesn't even define 'relation', and clearly it thinks SQL is relational.

There are any number of better texts. If you want a free one, use The Alice Book. It has a nice balance of theory and practice.

How to ask questions

P.D: Originally here: [reddit] but not good answers... so I ask here with a bit of more details.

The answers you got showed more effort to understand than you showed in posing the question. Just pasting the question in here shows you are not learning.

Go away and research the topic properly. Then ask questions using terminology that appropriately shows you've done that. I am being a lot more polite to you than the treatment you'd get on StackOverflow.

It's a journey

While you are certainly treading over ground that has been well-traveled (as the other commenter suggested), you should not worry about re-inventing the wheel, if you like inventing (or re-inventing) things. You'll learn a lot by doing it. After all, it is your life, and your time, and sometimes the best way to learn is to do.

I'm not an expert in the area that you're exploring, but there's lots out there to read on set theory, type theory, relational theory, and so on. The other option is to look at other projects (e.g. open source) in the same space as you are working, and see how they approached these problems.

Just remember that many times, things that _look_ easy are actually the hardest, because people have invested years finding ways to simplify (reduce the complexity of) the topic. When you start to re-invent in those areas, you should not be surprised by unexpected combinatorial explosions of complexity, because your experiments will be probing the boundaries of what appears to be simple, but is actually a very carefully orchestrated assembly of carefully-honed concepts. Complex architectures fit together not by accident, but by great purpose, and you are just beginning the journey of understanding those purposes.

Just remember that many

Just remember that many times, things that _look_ easy are actually the hardest

That so true. I start this journey a few years ago, mostly reading. Now this year I start to get serious (how hard it can be?) and have lost the count on the many refactoring, and full rewrites of the core.

I think I'm getting close to a decent first version of it. I think the problem I show here is my last main showstopper in the logical area. At the same time, looking how make a speedy and not memory-hungry engine and many other stuff.

I wish to have a partner on this, to mainly have a better perspective. So far the help of the rust community have been great, but mostly about implementation details...

Alternative styles of language for relational

Have a look at Dataphor's D4 or its inspiration Tutorial D to get some ideas about language structure, etc.

There is enough discussion of the subject on The Third Manifesto Forum https://forum.thethirdmanifesto.com/ from all sort of viewpoints that should give you various inspiration and leads to further work done by others in this area.

miniKanren

Also, miniKanren and microKanren offer interesting takes on relational programming.

And Datalog, Dedalus, and Bloom.

And Answer Set Programming.

Thanks for the replies.I'm

Thanks for the replies.

I'm well aware about the relational theory. But this is not about how represent relations in the abstract. Is about the physical storage of relations and how deal with it.

Using a 2d structure is "easy". But I can't remember about how about extending the relational model for more than that.

For example:

* If you want a Key-Value store then you don't want a relation

Why not? Is the relational model not powerful enough to be on top of a KV? I remember that the relational model not prescribe the internal storage, so can be made on anything. Well, now I'm at that point: I wanna see how, actually, do that.

* Dataphor's D4, Tutorial D, miniKanren

Can deal with trees or hash maps? I don't remember it can, or at least, not obviously to me.

Key-Value store is OK

There's absolutely nothing wrong with a well-executed key-value store as your underlying store for a relational system. As one example, SQLite 4 has been headed in that direction.

That said, an underlying K-V store turns into a somewhat specialized beast in this problem space. Just as an example, there are strong reasons to inline (unbox) certain kinds of column values into their corresponding index trees, which isn't something you typically think of when building a K-V store.

Taken overall, you could do a heck of a lot worse than adopting the SQLite implementation as your underlying store. It's mature, exceptionally well tested, and you could easily put years of effort into building something that isn't half as well done. Adopting it lets you focus your attention where you are actually adding value.

Even if you don't want SQL, the SQLite4 code base has a fairly clean separation between the K-V part and the SQL layer. You might find that you can use the K-V support without the higher layer.

SQLite performance

I agree that SQLite4's codebase might be an interesting place to start; of course it does not have the battle-tested maturity of SQLite3.

The SQLite team wound up the #4 experiment, saying they would integrate the lessons learned into SQLite3. So on the one hand, it seems that they regarded using an underlying K-V store to be an interesting failure.

On the other hand, they have been pushing SQLite3 as being good for the kind of things people use K-V stores for. E.g., sqlar.

There was a discussion of the use of a K-V store in an HN subthread that touched performance issues associated with putting a DB engine on top of various kinds of K-V stores.

In the worst event...

Just use SQLite (or something else thy exists) as your store. You don’t have to experiment at all points at the same time.

Power of KV compared to relation

> * If you want a Key-Value store then you don't want a relation

> Why not? Is the relational model not powerful enough to be on top of a KV?

I mean: a KV is not powerful enough to implement a relation. Yes I get you that you're talking about physical representation. That is, if you mean the Key to be one/some of the attributes, and therefore mentioned in queries.

If OTOH you want the Key to be not some attribute(s) but a surrogate for a tuple (as a 'Record number', for example, or a hash of the whole tuple) then you add a level of indirection to every query. Then as an implementation issue you get bad performance. You might consider a vertical store: that makes every query equally bad performance.

> I'm well aware about the relational theory.

I'm not seeing evidence of that. I suspect you think SQL is within relational theory.

That is, if you mean the Key

That is, if you mean the Key to be one/some of the attributes, and therefore mentioned in queries.

Yes, I want this. I have find is better to specialize some of the relational operations by each data structure (for example, with a Btree I can rely in the fact PK are not duplicated).

But then hit a logical wall in what to do with resizing attributes (columns), and that is where I wonder if and how, this have been solved before.

I'm not seeing evidence of that. I suspect you think SQL is within relational theory.

I'm self-taught, and not formally educated in the PL areas, but read a lot about this issue. Not using the proper terminology when posted in this site have cost me a little :) I use the more popular terms because that is what most people know.

However, certainly I'm influenced by what SQL is and worry about the more mundane stuff in how implement the model. I still not know how "pure" can be implemented without cost a lot in terms of actual performance.

For example, making the storage work on sets where order don't matter make stuff like implementing equality too costly, and this expand to many other areas where when I have tried to be more pure is not very practical. In fact, that is part of the reason to introduce Btree/Hashmaps/BitArrays in the equation.

I start to see why RDBMS not fully follow the relational model to the letter. Performance is one, but complications on how implement some of the logic is other.

Abstraction

Relational algebra is an abstraction, and there is no concrete data structure, just a model. The model is effectively that a relation is a table - a collection of tuples where every tuple has the same semantic meaning for each member in the same order. An example relation might be (name-of-employee, age, salary). Think of this like a set, but every member of the set must have this pattern.

It is important to realise that things like sets and relations impose a minimum criteria. An ordered set is still a set, so you can impose more restrictions on the data structures and it will still be relational algebra. For example we might store the members of a relation (table) in a certain order, but this is not a problem.

How can you build data structures that allow arbitrary relational operations to be performed quickly? The usual way is to have a separate block for each different relation, and then simply append new tuples to the end (an array). You delete by blanking out a tuple, this leaves spaces that will need compacting eventually, but doing it every time is too slow, so the DB will do it once every so often. This works nicely for things like projection etc where you construct a new relation copying the relevant values from each tuple.

This is obviously slow for things that only access a few tuples from the relation, so you construct ordered ‘indexes’ for each column that is often used in restrictions. These indexes reference the location of the tuple in the relation block, and also need to be rebuilt when compacting.

Going back to a key-value store, it only indexes on one element of the tuple, which is why it does not make a good candidate for storing the tuples, and it also does not easily support range queries like 20 < age < 40.