Abstractly typed languages

This is an admittedly rather vague idea that has occurred to me a few times - I'm wondering if anyone has had the same idea, or been able to formalize it in an interesting way.

I'm wondering if it would be possible to have a purely functional language, like Haskell, in which 'everything is an abstract data type'. There are no concrete data types in the language itself, everything is dealt with in an abstract / categorical / 'up to isomorphism' kind of way.

Data types would be chosen just to reflect what the data logically is, without concern for the physical representation or the performance characteristics of the algorithms involved. So a list of pairs, a hash-table and a function would all be of type Map. (or (Map a) => a, to borrow Haskell's typeclass notation)

The compiler, then, could chose which concrete implementation to use, say for a list or a map, based on any kind of compile-time or runtime optimizations it feels like. It can also apply lots of highly abstract categorically-based rewrite rules before it even gets to that stage.

Is this idea fatally flawed (I fear it might be, but I can't quite figure out how), or has anyone done anything similar?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Interesting Stuff

I have seen similar ideas mentioned here before, some of which I have bookmarked:

Frank Atanassow's work has been focused on similar ideas, and doing a search for his name will turn up interesting posts.

I dream of a specifications language...

I dream of a specifications language that would do what you propose; the only data type would be the 'set', as in 'set of information'. Each set would have a function which implements the 'set'.

For example, a function of type Integer -> T denotes a mapping of integers to type T. If Integer is all the set of integers, then the implementation can be an array; if Integer is a variant set of integers, then the implementation is a map.

Sets with functions of type T -> T (i.e. a mapping of an element to itself) can be implemented as set classes (because the key is the same as the value).

Single values are sets of a single value: V -> V where V is a specific instance of a value.

The language should also have, as part of the typing system, operations on sets:

  1. insertion
  2. removal

The compiler, by reading the program, can infer from the types of insertions and removals the underlying implementation:

  • if insertion and removal is at the edges of a set, then a linked list or array can be used.
  • if insertion and removal is at the middle of a set, then a linked list or a map or a set implementation could be chosen.

Where the program alone can not be used for inferring the proper implementation, the programmer would have to supply annotations.

As for persistence, I do not think it is a matter of specifications, but a matter of implementation: the underlying mechanism should be programmed to persist the data when needed, independently of what the type of the set should be.

Sets have edges and middles?

Sets have edges and middles?

Of course

Here's my proof. :-)

(everything you need to know about maths).

Nice!

...but ordered sets have edges and middle values...and in a programming language, what affects the implementation of a collection is where data are inserted to and removed from.

But set != ordered set, and

But set != ordered set, and forcing an ordering has consequences, which is why you're being called on it.

In a strict mathematical sense, yes.

...for the average Joe like me, set is similar to ordered set, more or less :-)... Sorry for not using the appropriate mathematical terms, but I prefer doing that when I formulate a vague idea into something more concrete.

The whole idea behind using sets is that a greater degree of abstraction could be achieved, freeing the programmer from being responsible to declare implementation details.

I think it could be achieved, if the compiler knows some of the properties of the set, either declared by the programmer or inferred by the compiler.

Let's pretend the word 'set' equals 'collection of data' in the context of programming languages; then we can categorize collections according to the following properties on algorithmic complexity:

1) lookup: arrays and hash sets have O(1), linked lists have O(n), maps have O(log(n)).

2) insert: arrays have O(n), hash sets have O(1), linked lists have O(1), maps have O(log(n)).

We also have some special cases:

3) prepend: arrays have O(n), linked lists have O(1).

4) append: arrays have O(1), linked lists have O(1).

A programmer could then declare a set of data by declaring the desired properties on algorithmic complexity.

Another approach would be for the compiler to select the appropriate implementation at run-time, based on statistics collected per set. For example, if a set has lots of insertions at the end of it, then the implementation could be an array; but then the next batch of insertions are in the middle of the array, and therefore the implementation is converted at run-time to a linked list.

And perhaps the above is possible at compile-time, based on the code and help from annotations.

In either case, the level of abstraction is raised; and if coupled with comprehensions, lots of boilerplate code is turned into a one line function...

Combination

Another approach would be for the compiler to select the appropriate implementation at run-time, based on statistics collected per set. For example, if a set has lots of insertions at the end of it, then the implementation could be an array; but then the next batch of insertions are in the middle of the array, and therefore the implementation is converted at run-time to a linked list.
Or a combination of both...
I would qualify this of "list" instead of set though. Indexes up to n are in an array, then middle elements from n to p in a linked list, then the remaining in another array.

BTW what is your definition of a map, and how is it different from a hash map? I suspect you mean a tree-based ordered list but please do tell.

Either.

By map, I mean the general concept of a map, i.e. a mapping of values to values. The actual implementation should depend on details, i.e. it could be a hash map or a tree map depending on what the user selects at compile-time or what the program finds out at run-time or both (as you suggest).

The whole idea behind using

The whole idea behind using sets is that a greater degree of abstraction could be achieved, freeing the programmer from being responsible to declare implementation details.

It is interesting that you say this because just above you required sets to be ordered.

Premature optimization?

Optimization will likely be a concern at some point. But the main reason sfor pushing the envelope on this front has more to do with expressiveness (via abstraction), reuse (aka generics) and reliability (via proofs). Seems to me that the point of the operation is to abstract away the problem of physical representation.

The kinds of optimizations that will be necessary to make the runtime efficient go beyond the efficiency of inserts and reads into an internal data representation. That will probably be a component of the implementation - both compile and runtime. Also, new forms of optimization will probably become available since many of the details of representation are left out.

Yes, optimization wasn't really the main point

The mechanics of how the compiler would choose which data structures to use, wasn't so much what I was interested in, although I'm sure it would interest some.

I was more interested in the formal properties of such a language, the higher level of abstraction it would allow programming to take place at, and the opportunities it would afford for higher-level reasoning about code, both machine and human.

It feels to me like the kind of thing which could be formulated very elegantly using a lot of category theory, but I'm afraid I don't know enough of that yet to be sure. Was hoping one of the resident 'type theory gods' might have something to say on the matter :)

It's not exactly what you're

It's not exactly what you're asking for (to be honest I can't properly make sense of what you're asking - sets with middles and edges?) but you could alwasy try CASL or other similar algebraic specification languages. It has many of the sorts of properties you seem to want.

Existing specification languages to the rescue

Don't either Z or
VDM-SL suit to your needs? Subsets of them seem to fit your description...

SETL

Jack Schwartz's SETL is based on two datatypes (sets and tuples.) The compiler did a lot of analysis to decide, on a case by case basis, how to map those types onto appropriate implementations.

(a,b)={{a},{a,b}}

Just curious. What would an ordered triple look like? I cannot find this in any of my math texts.

(a,b,c) = (a,(b,c)) (Or

(Or ((a,b),c), or ((c,a),b) or ...)

(Weird bug on this web site. I can't delete the word 'Or' from the subject line in the preview.)

[OT] line break

There's a line break after "...(b,c))" in the parent's subject line, and "(Or" is on the next line. That can happen when the subject line is originally left blank, so automatically copied from the comment's text. Deleting the line break fixes the problem.

Ah, you are right! I just

Ah, you are right! I just found it.
And with one additional step to get from an ordered triple to a set:

(a,b,c) = ((a,b),c) = ({{a,b},{a}},c) = {{{{a,b},{a}},{c}},{c}}

Argh! This tangle is worse than Lisp (dodging the tossing of many tomatoes)! I meant my original post to go under Achilleas comment (specification of everything in terms of sets). If it is this bad with a triple, imagine an n tuple.

How about relying on depth for ordering

Can't you just use:
{{{a},b}, c}

That is, a subclass of sets that contain two elements, one of which is a set. The set comes first in the order. There is a single-element set as terminator (head).

A list by any other name

There is a single-element set as terminator (head).

If you flip that over (head is the initial element), you've just described at set-theoretic implementation of a list.

It is an elementary fact from set theory that there are numerous different ways to induce an ordered tupling on a set.

It is an elementary fact from the category-theoretic description of set theory that they are all isomorphic.

From the point of view of the original topic of this thread, this means that if you have a precise, abstract description of the properties you want, it doesn't matter which implementation you choose, you will be able to simulate the same specified behaviour.

Thus, a compiler could choose the representation it deems best and still fulfill the specification defined by the program.

Data types would be chosen

Data types would be chosen just to reflect what the data logically is, without concern for the physical representation or the performance characteristics of the algorithms involved. So a list of pairs, a hash-table and a function would all be of type Map. (or (Map a) => a, to borrow Haskell's typeclass notation)

I guess many people have actually thought about solving the word problem. However it is intractable in the general case.

Otherwise a specification language would permit to state morphisms on different representations of ADTs explicitely and use pattern matching for detecting correspondences. Furthermore one can put some of them into libraries for heuristic use and automatical detection. This is not very much different from writing component wrappers or even implicits in Scala. Saying this I believe your suggestion makes perfect sense.

Interesting

Although I'm not sure I see the connection with the word problem - do you think you could spell out what you mean there?

Hutchins' Sym and Ohmu

In his Ohmu and Sym languages, DeLesley Hutchins has been working on this idea. I can't find a good online paper on Ohmu.

http://delesley.blogspot.com/
http://prog.vub.ac.be/~thomas/ERLS/DeLesley.pdf

Wouldn't there have to be some?

I might be being too literal, but wouldn't there have to be some concrete types to allow literal data to be entered into the system?