In my pursuit to implement persistent functional databases, I'm struggling with the difference between Sets and Lists.
My question is to Lambda the Ultimate is: what's more fundamental? Set membership or the relative order between List elements?
I tend to prefer Sets above Lists - because Sets are more succinct and because ordering can be expressed on top of Sets (although somewhat artificially).
Consider the following Set A:
A = (5, 1, 3)
Imposing order over Set A can easily be achieved if you compare and sort all its elements (lexically), giving:
a = order_in_list(A) = {1, 3, 5}
Conversely, mapping Lists to Sets cannot be achieved, because double elements could be possibly contained by Lists, hence:
b = {1, 2, 2, 3, 5}
B = make_set(b) = (1, 2, 3, 5)
b' = order_in_list(B) = {1, 2, 3, 5}
which gives the unsatisfactory : b' != b
But things get even worse if we try to map the following (unsorted) List to a Set and vice versa:
c = {9, 3, 3, 1, 5, 4}
C = make_set(c) = {9, 3, 1, 5, 4}
c' = order_in_list(C) = {1, 3, 4, 5, 9}
Which gives us the even worse : c' !=!= c
Yet if Sets are more general then Lists how then can we uniquely map c to C?
One possible mapping would be the following:
C' = make_ordered_set(c) = {(0,9), (1,3), (2,3), (3,1), (4,5), (5,4)}
and vice versa:
c'' = ordered_in_list(C') = (9, 3, 3, 1, 5, 4)
Giving us the c'' = c --- the equality we want:
But with this approach we are still stuck with the (position, value) List pairs. But not too surprising this can be rid of easily with the following generic mapping:
P = ($position, $value)
p = make_orderder_pair(P) = {{~, $position}, {!, $value}}
where '~' and '!' are special and reserved symbols that are unique and cannot be re-used except for the creation of List pairs.
Now that we established a perfect mapping from Lists to Sets and vice versa, we can conclude that Sets are more fundamental - not a big surprise if you think of Sets to be the basic building blocks of all mathematical theorems.
But one interesting operation remains: the concatenation of Lists. Let's see if the previous mapping is capable in expressing the concatenation of Sets. First, let us consider the simple case:
A = {4, 1, 3}
B = {6, 2, 1}
One algorithm of mapping the concatenation of A and B would be the following:
C = concat(A, B) = {{4, {6, 2, 1}}, 1, 3} ie. nesting B into the greatest element of A.
However, if we concatenate C a million times with itself, this algorithm would create a nested structure of the same depth: a million - not an ideal solution. Especially if we want map such enormous Sets to Lists in at least (O(log n)).
An alternative algorithm would be the concatenation of the greatest element of A to *all* elements of B. This would give:
C' = concat2(A, B) = {4, 1, 3, (4,6), (4,2), (4,1)}
But again, by applying this scheme we alternatively end up with very big elements (length > million).
So the choice is to allow either very deep recursion of nested sets, or the growth of enormous elements (by repeated concatenation).
My choice would be the growth of enormous elements - because concatenating elements can be relatively cheap.
But how can this be cheap? How can we ever expect to concatenate humongous elements cheaply?
Consider the concatenation of simple Strings, in Java or in any other industry strength language - such Strings are immutable. But the naive Java implementation of immutability causes a lot of inefficiency. Concatenating two Strings will always involve the copy of both Strings to produce the new one, so:
A = {This is a text of a millions characters....}
B = {A trillion characters text will follow....}
concat(A, B) -> in Java this will cost you O(1.000.000 + 1.000.000.000)
But what if you can concatenate A and B in min(log(A), log(B)) operations? This can be done in reality, albeith not easy.
One route to fast concatenation would be the recent discussion and implementation of:
Finger Trees: A Simple General-purpose Data Structure by Ralf Hinze and Ross Paterson.
Another potential solution is:
Ropes: an Alternative to Strings by Hans-j. Boehm, Russ Atkinson and Michael Plass.
Regardless of this discussion, both articles are very much worth reading - especially because they encompass the thru functional style we all love.
Recent comments
27 weeks 1 day ago
27 weeks 1 day ago
27 weeks 1 day ago
49 weeks 3 days ago
1 year 1 week ago
1 year 3 weeks ago
1 year 3 weeks ago
1 year 5 weeks ago
1 year 10 weeks ago
1 year 10 weeks ago