Code Completion for Generic Programming

I want generics, in particular polymorphic functions as well as function overloading based on type-classes. For the moment, think Haskell type-classes and C++ Concepts. I also want code completion. Code completion works for OO languages because they are polymorphic on the object type, which prefixes the function call, so the type is known when the completion lookup occurs. Generic functions can be polymorphic on any argument. This suggests that all the arguments need to go before the operator, in postfix format. This way code completion can operate on the argument types and offer only valid generic functions, listing all generic functions that take those arguments in alphabetical order. First I want to discuss whether this idea has any merit, and second I want to discuss syntax:

So, a reverse-Lisp style (likely to confuse people, as there is no clear difference from Lisp's function-first format):

(list comparator sort)

or perhaps reverse C:

(list, comparator) sort

Or perhaps a template C++ like syntax:

<list,comparator>.sort

Or:

list,comparator.sort


Unconvinced

I think it's useful to consider interactive tooling when thinking of the syntax, but that should not be the sole concern. Deciding function/argument order for tooling reasons feels like overdoing it to me. There are more important factors, such as what your userbase will be familiar with, and what is best for readability of the code once it is produced.

It looks like you are very rigid about the interaction model of the tooling (it must be called to complete something *to the right* of already-written code), and very flexible about what the syntax can be. I'd rather change the former and be more conservative on the latter. What's stopping me from writing "list", hitting the auto-completion button, and having the IDE write the selected function name *before* the argument?

Finally, type-directed completion seems to work fine without changing the usual functional-programming syntax, assuming careful API design. Just define "(.) : forall a b, a -> (a -> b) -> b" and "sort : forall a, cmp a -> list a -> list a", and writing "list . " then hitting the completion button should propose "sort" (with an argument to be filled).
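For what it's worth, Haskell already ships a reverse-application operator of exactly this shape, (&) in Data.Function; a minimal sketch:

import Data.Function ((&))
import Data.List (sortBy)

-- The argument comes first, so completion after '&' can be
-- filtered by the argument's type.
example :: [Int]
example = [3, 1, 2] & sortBy compare   -- [1, 2, 3]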

I think I'd rather think of a more flexible mechanism where I indicate holes to be filled later, and the tooling helps me fill them. For example I may write "sort ? ?" if I know I want to sort (and let the scoring heuristics propose "list", which I just defined, as a likely candidate, and later fill the comparison accordingly), or maybe "? list" if I'm wondering about what I'll do to list next. In the absence of a hole (and when not writing an identifier), hitting "complete" would just mean "assume there is a hole at the cursor position and help me fill it".
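GHC's typed holes already approximate this workflow: write _ wherever an expression is missing and the compiler reports the hole's type along with "valid hole fit" suggestions. A tiny example:

import Data.List (sortBy)

-- Compiling this reports the type of each hole (roughly
-- _ :: a -> a -> Ordering and _ :: [a]) plus candidate fills.
example = sortBy _ _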

I agree.

All good points. Postfix just seems an interesting way of doing it, might find a niche with the HP calculator users accustomed to reverse polish :-)

Is there any real advantage? I guess I am still thinking of a linear editor, where you type and it offers completions, and these are always 'ahead' of the cursor. Is this worth anything? I don't know, but it might be interesting enough to try a small experiment.

Your goal is to hide as many

Your goal is to hide as many completions as possible from programmers who get the types wrong, or who don't know what the arguments are yet? That is horrible usability.

Think from the point of view: "I want to do something with list" or "I want to do something with comparator". Which arguments arise will follow from what you decide to do, even if some of those arguments are invalid for the method. It would be nice if list.sort(comparator) and comparator.sort(list) were both valid ways to input (they could be normalized into one form after the expression is complete).

But even that doesn't seem very right. What if list is not a type that supports sort, but should be?

Functions not methods

I think sort would not be a method of 'list' in a generic language. Rather generic functions are defined in terms of 'concepts'. For example you could have some sort algorithms defined on iterator and comparison concepts:

sort(forward_iterator, comparable_value)
sort(bidirectional_iterator, comparable_value)
sort(indexed_iterator, comparable_value)
sort(random_iterator, comparable_value)
etc...

Where the different sort algorithms are classed according to their requirements on the container (iterator) and value (comparator). The containers provide the appropriate efficient iterators like:

singly-linked-list provides forward-iterator
doubly-linked-list provides bidirectional-iterator
vector provides random-iterator
etc...

So each container (list etc...) doesn't define a member 'sort'; it's defined using 'Concept' overloading. Code completion could filter the top-level database of all functions for whether the type provides the concept required for the overload, and it will pick the most specific overload if there are several versions of the same function that match.

I guess it nearly always makes sense to have one argument first, like "I have a list, what can I do with it". There are some functions that don't have an obvious 'subject', like non-directional linking of two nodes.

link(a, b) == link(b, a)

In some regards postfix makes sense here, as they are both subjects (I have nodes 'a' and 'b'; what can I do with them?).

So maybe you can use an OO style syntax with tuples for the multi-subject cases? Auto-completed parameters can be named so that you get an argument template to complete:

list.sort(order:descending)
<a,b>.link()

There's just something weirdly appealing about the postfix notation. The more arguments you give it before completion, the fewer possible functions would match. For example, given a container and a comparator maybe only sort matches?

Turn the representation question on its head.

Why not have the language itself provide a sort of barely-defined "collection" thing on which all collection operations are defined, and then make decisions about representation based on the way particular modules use it?

IOW, someone can use a "collection" in a program where they add!() items to it, call contains?() to see if items are in it, and delete!() items from it, and under the hood your typechecker/analyzer/profiler/whatever decides that instance of a collection is best implemented as a hash table.

Having made exactly the same declarations, though, a different part of the program can define a collection as a member variable in an object that operates on it using push!() and pop!() and under the hood the typechecker/analyzer/profiler/whatever will decide that this instance of a collection is best represented as a singly-linked list (or maybe a buffer with a 'head' pointer).

I guess my point is why should the programmer be worrying about representation? Why should representation questions show up at the level of abstraction that human beings work with? The programmer has enough cognitive overhead worrying about what she wants to actually DO; worrying about representation should be part of what the language system does for her based on her decisions about what she wants to do and on the basis of what representation profiles best or passes typechecks for actually doing it.

So "sort!()" should be defined on collections, full stop. The programmer should not be worrying about whether a particular collection is a "list" or not, and neither should any autocomplete system. She shouldn't even be thinking about whether it has an order or not unless she particularly wants to get items out of it in a certain order.

If the programmer is allowed to make representation decisions independently of use decisions, she can get them disastrously wrong. Or even if she gets them perfectly right, they can become incompatible as the program changes over years; if the collection is fully generic and backed up by good analysis, then at some point the compiler should just be noticing, "oh, this linked-list implementation profiles badly now that we're using sort!() a lot; from now on I'll compile this as an array implementation."

This might work better in a

This might work better in a system where the compiler/evaluator are accessible rather than mysterious black boxes.

Concepts

My description was obviously bad. I was trying to explain that sort depends on the iterator concept, not the type of the container. So a merge-sort that can be implemented with only forward iterators, and a quick-sort implemented using random iterators, will be picked according to the type of iterator supported by the container.

In this way the most efficient sort algorithm, overloaded on the iterator concept, will be selected for the container. A totally generic sort that could be used on any container would be defined in terms of a forward-iterator (as it is a sub-type of all iterators).

So the definition of sort knows nothing about collections at all, and especially not which particular collection it is being used on.

To bring this back to your point, an iterator is the barely defined collection thing, except it has a precise definition. A forward-iterator simply defines a 'successor' function such that given i it returns i+1. If an iterator also supports the 'readable' concept, then you can dereference it for reading; 'writable' is the same for writing.

So sort implementation might be in-place (given a "readable writeable iterator") or might sort non-destructively (given only a "readable iterator").

The full definition of an iterator looks like this (from Stepanov's "Elements of Programming"):

Iterator(T) =
  Regular(T)
^ DistanceType : Iterator -> Integer
^ successor : T -> T
^ successor is not necessarily regular

Readable(T) =
  Regular(T)
^ ValueType : Readable -> Regular
^ source : T -> ValueType(T)
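These transcribe fairly directly into Haskell type classes with associated types. A rough sketch, using Eq as a stand-in for the full Regular concept and eliding the constraints on the associated types:

{-# LANGUAGE TypeFamilies #-}

class Eq t => Iterator t where
  type DistanceType t
  successor :: t -> t

class Eq t => Readable t where
  type ValueType t
  source :: t -> ValueType t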

Sorry. Missed it because I don't care about iterators.

I don't consider sorting to be dependent on iterators. Sorting is a very general concept of math and logic; iterators are a set of implementations of very mechanical, low-level operations on representations which don't even make any sense outside a fairly narrow set of OO frameworks.

Even in the languages that use those OO frameworks there are plenty of ways to sort collections without using iterators; the consequence is merely that that particular sort then will be defined in a way that is valid only for one representation of collection.

So, no, I probably wasn't going to see that point. I'm entirely focused on trying to see the ur-idea of SORTING and produce it where the code wants it to happen, rather than focusing on the details of how it's to be done or how certain OO systems encourage people to express it.

Hmmm

An argument could be made that your notion of 'fundamental' is exactly backward, and that "something like iterators" is much closer to the ur-idea of sorting than "something like collections." In fact, I might even make such an argument myself.

I have never seen a

I have never seen a presentation of iterators that isn't tied to some particular OO system and informed more by that OO system's peculiarities than by any mathematically fundamental idea.

I can conceive of them being "fundamental" only if I take a definition of them so broad that nothing, including the idea of indexing into arrays, is excluded.

Moreover, even if they are "fundamental", so are continuations but it is ridiculous - even harmful - to ask programmers to think of or use them directly.

Iterators nothing to do with OO

(Forward) iterators simply capture the idea of a sequence. There is no OO here: "successor i" is a pure function. The associated concept of readability gives access to containers, as in "source i" is the value at that location, which is also a pure function. To sum all the values in any (iterable) container:

sum f l | f == l    = 0
        | otherwise = source f + sum (successor f) l

sum (begin collection) (end collection)
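To make this concrete, a minimal sketch where the iterator over a Haskell list is just the remaining suffix (Iter, begin and end are names invented here):

import Prelude hiding (sum)

newtype Iter a = Iter [a] deriving Eq

successor :: Iter a -> Iter a
successor (Iter (_:xs)) = Iter xs
successor it            = it

source :: Iter a -> a
source (Iter (x:_)) = x
source _            = error "source: past the end"

begin, end :: [a] -> Iter a
begin xs = Iter xs
end _    = Iter []

sum :: (Eq a, Num a) => Iter a -> Iter a -> a
sum f l | f == l    = 0
        | otherwise = source f + sum (successor f) l

-- sum (begin [1,2,3]) (end [1,2,3])  ==  6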

Fundamental

The sense in which iterators are fundamental is deeper, I think. The idea of the iterator is that it describes the interface to the data, no matter what the representation or even the semantics of the underlying collection (e.g., list vs set).

The point is that regardless of what kind of collection you have, or even a collection-like thing like a lazy stream, etc., you have some interface that allows you access to the elements. And there are important differences between different interfaces: for example, do I have constant-time random access? Can I go backward? Can I rewind to the beginning and see all the elements again?

(Of course, in a lazy language like Haskell the definition of the List type is itself an iterator, since other things can be lazily transformed to conform to that interface. But the distinction matters in general.)

I agree that iterators as usually presented in the context of C++ seem hopelessly OO-specific, but I think the notion that, for example, sorting algorithms depend on the interface through which they access the data is much more general.

In fact, based on your comments above, I feel like you'd agree that interface is more fundamental than representation, but it seems that you have some misunderstanding or allergy to the word 'iterator'.

This completely does not

This completely does not sound like any of the ideas that have been presented to me as "iterators." You're right, it's a much more important fundamental idea than some crappy sets of OO methods that have been clothed with that name.

You're talking about the relationship between different modes of organization of data and the properties of accessing/traversing that data. These are the relationships we used, learned and calculated in Algorithm Analysis, where we were asked as students to calculate the big-Oh notation, memory requirements, access counts, etc, of various algorithms on given kinds of collections. Referring to particular instances of those equations and results using the name "iterators" doesn't change them, nor make them any more nor less fundamental.

We're in the territory of "definition so broad that it excludes nothing" here. This is no longer talking about iterators as being an idea distinct from the idea of algorithms itself. So why talk about iterators rather than talking about algorithms?

vs map, foreach, do, etc

Some OO approaches, like Smalltalk's, favor member-wise visit without iterators, using methods like map, foreach, do, etc, which pass an operation to perform along with a context (where under Smalltalk, a block satisfies both the operation code and the context state).

It's also common in Lisp and in C. Every C-based collection at work uses FOREACH macros, for example. When I implement new collection classes, I add a foreach() method to every one, taking a pointer-to-function and a context pointer. The pattern is so familiar that no one even asks for a code comment or documentation to explain usage, because it's obvious. No one asks for an iterator; if I provide one for a hashmap, I get a funny look until I add the foreach() method, after which they stop holding their breath. (Little experiments like that help test against one's own confirmation bias.)

The runtimes for some languages will be measurably slower with code folded the wrong way, against the grain, so benchmarks using iterators would cause some languages to suck compared to others. (I don't really have a point except that iterators are not necessarily fundamental compared to alternatives.)

That's also my experience.

Foreach(), map(), etc, are the expected form of traversals where I work, too.

Iterator methods as such are generally only written to satisfy demands made by particular OO frameworks, and never otherwise used.

map and iterators

Map can be defined in terms of iterators:

map fn f l | f == l    = []
           | otherwise = fn (source f) : map fn (successor f) l

So map is an algorithm, and it can be defined generically in terms of a forward-iterator.

Yes but that's too low-level for humans to be productive.

Of course it can. But iterators are not the level of abstraction at which human beings usefully think. Map is closer to it.

It's rather like continuations; there are a lot of things that can be expressed by transforming code into continuation-passing style but continuations are not how people think, and we should not normally require programmers to do those transformations themselves if we want our programmers to spend their time being productive instead.

claims of productivity

People familiar with iterators will think in terms of iterators, and will be sufficiently productive that it's difficult to verify a claim that they're less productive. Granted, it may be that collections-oriented verbs are in truth more productive. But until you prove it with empirical studies carefully designed to correct for known forms of bias, it's just BS rather than CS.

In my own experience, the little details for expressing local algorithmic computation aren't really the areas that bite me with respect to productivity. Typing something out isn't really the time sink, unless I'm asked to type it out over and over again (boilerplate). Though even with boilerplate I can be productive (albeit more in the fungible-assembly-line-worker sense than the commander-of-robot-army sense; Java is a fine example of a language that can be productive without being powerful). Rather, what hinders me is stuff like race conditions, deadlocks, partial failure and recovery, security concerns, dependency management, preserving consistency of data structures in-program vs. in-database or between client and server, and ad-hoc glue code specific to each composition of objects.

Partitioning and Rotation

Can you define partitioning and rotation in terms of map, or even fold? If learning one abstraction lets you write a whole class of algorithms (those requiring only forward stepping/iteration through the data), it seems a better abstraction, where 'better' is simplicity times generality. Simpler is better should be obvious; general is better because you have to learn fewer special cases. If you only have to learn one concept to do many things you can be very productive once you understand it. On the other hand something else may be simpler, but if it only does one thing, how productive can you be with it?

Also you can provide all the algorithms implemented in terms of iterators, so using map would be exactly the same. The only time you need concern yourself with iterators is implementing a new collection (then you get all the algorithms for free, saving a lot of work) or implementing a new algorithm (which works for free on all collections that support the required access pattern or a subtype of it).

I define partitioning in

I define partitioning in terms of filter() or split(), depending on whether I'm selecting on a predicate that has to be called on each member or on a position.

Rotation is usually in terms of pop() and append().

defining X in terms of fold

It is difficult to find a computation which can't be expressed origamatically (consider the parallel between folding* and automata). Map is more restrictive: it can express less, but when it does, you know more, because it indicates that it doesn't matter whether one iterates forwards or backwards, nor even whether one iterates at all (cf Blelloch).

Composition of maps is trivial; it's the map of the composition. Composition of folds is more involved, with an observable difference being that COBOL programmers used to manually collapse multiple traversals into a single one ("control break") whereas functional programmers apparently leave this process up to the machine ("deforestation").

* what are the three basic things a function can do? It can introduce information, it can forget information, or it can swizzle information: change the representation while leaving the logical content unaltered. Analogously, introducing or forgetting (or merely ignoring) structure uses a fold, but a map indicates that structure remains invariant (in fact, inviolate).

Equality

There are many algorithms that can't be expressed as a fold. Fold only works over a single data-structure; as soon as you want any algorithm involving traversing two or more structures at the same time it is no longer sufficient. For example, equality of two data structures.

eq_it :: (ForwardIterator s, ForwardIterator t)
  => (s, s) -> (t, t) -> Bool
eq_it (f1, l1) (f2, l2)
  | f1 == l1 && f2 == l2 = True
  | f1 == l1 || f2 == l2 = False
  | otherwise = eq (source f1) (source f2)
      && eq_it (successor f1, l1) (successor f2, l2)

eq :: (Linearizable a, Linearizable b) => a -> b -> Bool
eq x y = eq_it (begin x, end x) (begin y, end y)

And of course this works on all linearizable datatypes like maps, arrays, vectors, lists, deques, ring-buffers etc, not just lists.

multum in parvo

What is a 'single' data structure? 'Zip' (in array languages, transpose) is the traditional way to view two or more structures traversed at the same time* as a single one. 'All' is a reduction, and hence a fold, and we know we can compose single folds producing single composite folds, so the sole question remaining is: are there any folds which are zips?

Exercise A: implement zip by folding forward, iteratively
Exercise B: implement zip by folding iteratively from the other end, backwards
Exercise C: implement zip as a non-iterative (bottom-up) fold
Exercise D: implement zip as a streaming bottom-up (paritywise) fold
(these are just some possible variations; consider a logic programming language, where the commutativity of unification means that any particular equality fold over substructures may be computed at runtime spread out over many different times for the different parts)

* if you don't want to traverse them at the same time, but one traversal will depend upon the other, (or, as in the folded version of Dijkstra's 'gcd', they are mutually dependent and potentially nondeterministic) then things get uglier. At the bottom of the stack, however, what does any automaton do other than fold its inputs over its state?

Personally, I'm happy to suggest "write your algorithms using iterators, but your functions using folds", as the point of an algorithm is to make efficient transitions (minimal traversals of specific representations) and the point of a function is to behave correctly (even if one should happen to swap out different algorithms underneath it).

[Edit: in fact, upon reflection, I've halfway convinced myself that iterators could be equivalent to folds. To me, an "iterator" is something that goes through a composite, element by element, in a given order, as in your iterative version of 'map'. Is it possible that your meaning of "iterator" is more general, and includes something that can also process a composite range by range? Are there nondeterministic iterators? (eg. equality written as a fold behaves the same way even if the pairs to be compared arrive out of order and potentially duplicated)]

Iterator Families

A Forward iterator can be copied and incremented independently. Bidirectional iterators can be incremented or decremented at any step. Indexed iterators can have integers added and subtracted from the position efficiently (so are like bidirectional iterators that support larger step sizes). A Random access iterator is a combination of bidirectional iterator and indexed iterator. BifurcateCoordinates have two successors (representing trees) etc...
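A hedged sketch of that family as Haskell classes (the class and method names here are invented, following the descriptions above):

class Eq i => ForwardIterator i where
  successor :: i -> i

class ForwardIterator i => BidirectionalIterator i where
  predecessor :: i -> i

class ForwardIterator i => IndexedIterator i where
  advance :: Int -> i -> i   -- O(1) step by any offset

class (BidirectionalIterator i, IndexedIterator i)
  => RandomAccessIterator i  -- the combination of the two

class Eq c => BifurcateCoordinate c where
  leftSuccessor, rightSuccessor :: c -> c   -- two successors: trees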

Iterators are control-inversed folds

When the user passes a function to a fold, the fold (that is, the author of the data-structure that provides the fold implementation) has control of the control flow, and the user must accept being called and being relatively helpless (limited in terms of backtracking, incrementality, shortcutting, etc. to what the fold communication protocol provides, or to using (costly?) control-capture effects to turn the tables). It is very convenient (and often easy to do efficiently) for the data-structure author to write a fold.

When the user is given an iterator, the user is in control of the traversal of the data-structure value. It is much more convenient, often resulting in easier code and simpler adaptation. On the other hand, iterator providers have to contort themselves to work in a world where they have lost control and have to explicitly reify their whole traversal state to be robust to the whims of the iterator user. It is often hard to write, and harder to make as efficient as the fold (for the subset of computations that can be expressed in both).

There is no universally better solution, because the situation is symmetric. If you are the person providing data-structures, it's easier to provide folds. If you are the person consuming them, it's better to be given iterators. You usually are both. I think the ultimate library should strive to require folds of its dependencies and provide iterators to its users, but with limited workforce you should focus (for each data-structure) on the specific equilibrium that exists in your project.

(Generic solutions to convert one into the other, eg. delimited continuations or "yield", are not available (or efficient) on most platforms. They are thus not necessarily usable as a silver-bullet to replace well-designed explicit indirect code.)
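The asymmetry is easy to see on a binary tree: the fold is a few lines because its author keeps control, while the iterator must reify its control state (here, a stack of pending values and right subtrees) so the user can stop and resume at will. A sketch with invented names:

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- Fold: the data-structure author drives the traversal.
foldTree :: (b -> a -> b) -> b -> Tree a -> b
foldTree _ z Leaf         = z
foldTree f z (Node l x r) = foldTree f (f (foldTree f z l) x) r

-- Iterator: the user drives, so the in-order traversal state
-- must be made explicit.
newtype TreeIter a = TreeIter [(a, Tree a)]

descend :: Tree a -> [(a, Tree a)] -> [(a, Tree a)]
descend Leaf s         = s
descend (Node l x r) s = descend l ((x, r) : s)

treeBegin :: Tree a -> TreeIter a
treeBegin t = TreeIter (descend t [])

treeNext :: TreeIter a -> Maybe (a, TreeIter a)
treeNext (TreeIter [])           = Nothing
treeNext (TreeIter ((x, r) : s)) = Just (x, TreeIter (descend r s))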

Iterators must be efficient.

The iterator should not be difficult to implement on a data structure; in fact you should only provide those iterators that map naturally onto a data structure. For example you should not provide a bidirectional or indexed iterator for a singly linked list. In effect the iterator is a guarantee that the data structure implements the access pattern efficiently, for example that 'successor' is O(1). A forward iterator is trivial to implement on a singly linked list, a bidirectional iterator trivial to implement on a doubly linked list, an indexed iterator trivial to implement on an array, a bifurcate-coordinate trivial to implement on a tree, etc.

Iterators support synthesis at the cost of analysis.

Although iterators are very general, defining things in terms of iterators leaves the definitions overspecified. Using an iterator that has a particular direction, when direction doesn't actually matter (as in map or filter) makes it hard for the system to recover the fact that the direction didn't matter -- which it needs to recover if it's to discover that two algorithms do the same thing and can be merged, or that something can be done in parallel.

Using map, filter, and similar abstractions doesn't encode information about directionality, leaving it perfectly obvious that that information doesn't matter to the solution. IMO this is much better in an algorithmic language because it supports analysis better.

An algorithm expressed with fold or filter can be identified as equal to an algorithm expressed with a directional iterator, because the directionless expression couldn't have been used in the first place if the direction mattered - therefore the added information in the directional expression can easily be identified as irrelevant. But if you have the same algorithm expressed in terms of two different directional iterators, it is damned hard work (assuming you don't also have a directionless expression of the same algorithm) to analyze them and identify them as equal.

Efficient Algorithms

Nothing is overspecified. If direction does not matter then you can use a bidirectional iterator. It is clear direction matters for a singly-linked list, and this would correctly supply a forward iterator. A doubly linked list can provide a bidirectional iterator.

In our generic library there will be sort algorithms implemented on forward iterators and bidirectional iterators, each the most efficient sort for that family of data access pattern. So when you say 'sort collection' the language can select the most efficient sort algorithm for the 'type' of iterator supported by the collection.

I think his point was more

I think his point was more about imperative specifications imposing an arbitrary order on the function calls of e.g. map rather than anything peculiar to iterators. Noticing that a loop doesn't modify state isn't that heroic of an optimization, though.

Arbitrary Order?

But when is order arbitrary? Say data is stored in a singly linked list and you want to shard a 'map' operation over several compute nodes - except you can't efficiently split the singly linked list because you can only access it via a forward iterator.

Algorithms require access patterns/shapes, and collections make certain access patterns/shapes efficient. I agree with Stepanov that this is a fundamental relationship.

Any order imposed by map is

Any order imposed by map is arbitrary. Even in your example, it may well make sense to separate the traversal from function call. i.e. you have worker threads that apply the function to a block of N elements and then return and you have a master thread that starts a worker thread at an index and then quickly skips ahead N elements, blocking when there are too many active worker threads.

As a drive-by remark, I think the ways zip and map are used in common functional programming practice are generally anti-patterns.

re: anti-patterns

please expand upon that; inquiring minds want to know.

I think I may have written

I think I may have written about this on LtU before, but any time map and zip are used to manage indexing, I think it's a mistake. I have to leave now but in a nutshell, I want more code that looks like this:

z[i] = sqrt(x[i]**2 + y[i]**2)

And less that looks like this:

zs = map((x,y) -> sqrt(x**2 + y**2), zip(xs,ys)) 

Huh?

I am not sure what the point is here. Seeking and skipping ahead, and copying the 'iterator', require iterators.

map cannot express the required semantic refinement. Consider that 'parallel_map' is an algorithm implemented in terms of iterators on collections. Now different data structures can use different algorithms, so a parallel split expressed only in terms of forward iterators would be used on a singly linked list, but on an array a parallel map that uses indexed iterators is preferable.

So the generic library provides a family of 'map_parallel' functions implemented in terms of the iterators that could be provided by various collections.

I think zip and map are just algorithms that would be provided by the generic library, with implementations on all the standard iterators. You would use them when you actually want a 'map', i.e. to add one to all elements in a collection, etc. You would not build new algorithms on top of them; instead you would define sort, rotate, partition, etc. on the iterators directly.

Edit: you could build 'higher' algorithms on top of map. Perhaps what I mean is that you use the operations natural to the algorithm, which may be a map if the algorithm is defined in terms of map.

There's the traversal and

There's the traversal and the function calling. The order of the function calling is arbitrarily specified by an imperative implementation of map. This is a rather shallow point so I think we're probably talking past each other.

I think I get it

I think I get it; for example 'fold' has an order in which the accumulator is composed, but it may not actually access the data structure in that order.

I think this is captured where I stated you can implement fold in terms of iterators (providing different implementations for different iterators), but not the other way around. Fold seems to be an abstract concept (like sorting) that can be implemented with different algorithms, where the algorithm used would depend on the efficient access patterns for a data structure.
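A self-contained sketch of that one-way direction, with the range's access functions passed in explicitly (foldIt is a name invented here):

-- Fold over a half-open range [it, end), given that range's own
-- successor and source functions. Different iterator families could
-- supply different, more efficient implementations of the same fold.
foldIt :: Eq i => (i -> i) -> (i -> a) -> (b -> a -> b) -> b -> i -> i -> b
foldIt next src f z it end
  | it == end = z
  | otherwise = foldIt next src f (f z (src it)) (next it) end

-- e.g. treating list suffixes as iterators:
-- foldIt tail head (+) 0 [1,2,3] []  ==  6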

It's all about leaving tools the latitude to optimize.

It's hard to respond because now I don't even know whether you think we're talking about the same thing.

If we want better tools, we have to get out of their way and let them work. Choosing low-level details like particular data structures or particular directions for iteration, and embedding those details in the code, does not allow our tools to choose them. Lacking that choice, our tools cannot make good optimization decisions responsive to our basic needs as opposed to being responsive only to our implementation strategies.

You seem to think I'm talking about 'algorithms' as operations on particular kinds of collections like lists or arrays or bidirectional lists, which must be implemented in terms of different iterators because those collection types support different kinds of efficient iteration.

I seem to think I'm talking about algorithms as operations on collections, period, full stop. Caring whether a particular collection happens to be implemented as a list or an array or a bidirectional list or a tree or balanced tree or whatever is not something that should ever intrude on a programmer's mind, because it is not part of, nor relevant to, the absolute essentials of what the programmer is trying to do. Nor should it be expressed in a programmer's code, because expressing it denies our tools the ability to make these choices for the sake of efficiency. At best they are low-level implementation/optimization decision and ought to be left to the compiler or to a profiler based on what operations the programmer actually needs.

Accordingly the programmer should specify the absolute LEAST possible about the operations that are needed, in order to give the compiler maximum latitude to pick an efficient substrate for the operations and hardware, substituting in such details as direction and parallelism, which the program leaves unspecified, in whatever way supports highly efficient collections.

If a programmer asks for 'forward' in one place and 'backward' in another, then a highly-efficient singly-linked list is off the table; the compiler is forbidden from using a singly-linked list because that would support only a single direction of iteration. But if the iteration direction didn't matter, because either of these places was just a 'map' implementation, then the inefficiency of a doubly-linked list has been needlessly introduced. And this is a very easy mistake to make, because the arbitrarily-chosen directional iterations may not even be specified by the same programmer. If two programmers are working on different parts of the code, they may both think they are looking at a highly-efficient singly-linked list as they specify iterators that only go in a single direction; but if the directions don't agree, because either or both of them felt compelled (or even allowed) to mention a particular direction when it wasn't essential to the problem that coder was solving, you get a more inefficient doubly-linked list for no good reason.

Whenever it does not matter to the program's correctness which direction the iteration goes, or whether the iteration has a direction, or whether the collection is handled one element at a time, the program should not specify which direction the iteration goes or whether the iteration is handled in serial or parallel. That is not essential to the operation and specifying it denies the compiler/optimizer opportunities to pick an appropriate, efficient data structure for the collection.

Cluttering up the program with nonessentials like that is also a distraction that obscures what is essential to the algorithm. It expresses a nonessential artifact of a particular implementation - one which may be expected to be missing, or different, or have the same algorithm implemented in unrecognizably different terms (as with reified continuations as iterators or some such crap) elsewhere.

Specifying any property of the solution which is not an absolutely essential part of the solution implied by the PROBLEM (as opposed to implied by a particular strategy, or by a premature-optimization decision about what data structure to use for the SOLUTION) first needlessly presents an opportunity to get it wrong, and second hinders the ability of people or tools to recognize, compare, or optimize implementations of the same or related operations and to see which differences between them are essential (or whether in fact they have differences at all).

Of course, foreach(), map(),

Of course, foreach(), map(), etc. are also often described as "internal iterators"...

Right. They're the

Right. They're the prepackaged tools we use instead of puzzling out how to build them with iterators each time.

No problem with pre-packaging

There is no problem with pre-packaging 'map' etc. But consider implementing a new collection: would you rather have to define map, fold, append, etc., or just successor and source (and be able to use the existing generic definitions of map, filter, append, etc.)?

I imagine the language will come with a set of common collection types, and a whole bunch of common algorithms (that makes up its standard library). You only need to bother with iterators directly if you want to implement a new collection or a new generic algorithm.

My point was that, in my

My point was that, in my view, the notion of 'iterator' is broad enough to include an approach that prefers map(), foreach(), fold(), etc. Iterators are a more general idea than the particular OO patterns that you may have in mind.

map and fold and foreach: the same access pattern

Map, fold and foreach all capture the same access pattern, the forward-iterator, so in a way they are not fully 'factorised': they still include redundant information. There is no need to define all of them on a particular collection. Then again, isn't fold more general than the other two in any case (although still not as general as a forward iterator)? The real difference comes when you have an algorithm that needs two or more iterators.

No they don't.

Map and foreach, at least, are usually explicitly defined as visiting in no particular order. They could be perfectly well defined based on (using your C++ terminology) a ForwardIterator, a BidirectionalIterator, a RandomAccessIterator, or even without iteration of any kind.

I'm not even 100% certain that fold, reduce and friends guarantee any particular order either, despite the apparent obviousness that they do.

folds

There is foldr (start at the right end) and foldl (start at the left end). APL's reduce operator / is a foldr. k's reduce operator / is a foldl. So in APL -/1 2 4 is 3, in k -/1 2 4 is -5. So in APL -/ has a quadratic behavior in vector size. k's -/ is linear (but I think less "interesting").

Also note that f/x y z can be seen as equivalent to "inserting" the binary function f as in "x f y f z" and since APL has right to left evaluation, this means x f (y f z) etc. In k this simple equivalence doesn't hold (a slight bit of imperfection!).
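The same contrast can be checked directly in Haskell, where the grouping is explicit in foldr1 and foldl1:

rightFold, leftFold :: Int
rightFold = foldr1 (-) [1, 2, 4]   -- 1 - (2 - 4)  ==  3, like APL's -/
leftFold  = foldl1 (-) [1, 2, 4]   -- (1 - 2) - 4  == -5, like k's -/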

You are right

Yes, you are right: map does not impose an access order, and neither does fold (although there is some relationship between the data source and the accumulator, I guess the specification does not care what the actual access order is, as long as the accumulator is constructed as if the accesses were left to right or right to left).

In any case this shows why map, fold etc. cannot be the fundamental relationship between algorithms and containers, because access-order efficiency is a fundamental property of both algorithms and collections. So this supports the idea that iterators are a more appropriate abstraction, allowing the most efficient option to be chosen.

I still don't buy it.

You can't produce map in a collection-agnostic way using iterators; map : (collection, mapping-function) -> collection must somehow know the most efficient way of iterating the collection.

The solution, you'll say, is to implement meta-map : (collection, mapping-function, iterator) -> collection and then make map generic, with specialised cases for forward-iterating-collection, random-access-iterating-collection and so on. That might make sense, but what to do when a collection provides multiple iterator types? One might, for example, have an array which is all of forward-iterating-collection, bidirectional-iterating-collection, random-access-iterating-collection and any number of other types of iteration you might want to come up with. So you probably end up having to specialise map on a per-collection-type basis anyway. And even that might not be enough: in some cases, platform constraints may mean that forward iteration is more efficient than random access above a specific threshold size of collection.

I think what I'm trying to say is that iteration is orthogonal to higher-order functions and that it makes no sense to say

iterators are a more appropriate abstraction, allowing the most efficient option to be chosen.

This is clearly wrong - going back to our friend map, the most efficient option is, in many cases, not to iterate at all, rather to perform mapping-function in parallel across the whole collection; map in such a case wouldn't necessarily care about iterators at all, but perhaps about the in-memory layout of the collection itself.

Iterators must be efficient

The array provides a random-access iterator, which is a subtype of the bidirectional and indexed iterators.

In other words collections do not provide more than one iterator, they provide the least subtype they can implement efficiently. A collection only implements the iterator where it can perform all the actions in O(1) time.

map-parallel is a different algorithm from map-sequential. You may provide many different mapping algorithms based on the same fundamental set of iterators.
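In Haskell the two can literally be different implementations behind the same interface; a sketch using parMap from the parallel package's Control.Parallel.Strategies:

import Control.DeepSeq (NFData)
import Control.Parallel.Strategies (parMap, rdeepseq)

-- Same observable result, different algorithms; a library could
-- dispatch between them on the collection's iterator family.
mapSequential :: (a -> b) -> [a] -> [b]
mapSequential = map

mapParallel :: NFData b => (a -> b) -> [a] -> [b]
mapParallel = parMap rdeepseq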

map-parallel is a different

map-parallel is a different algorithm from map-sequential

No it's not. Mapping, no matter how it's carried out, is transforming a collection by applying a function to each of its elements. The only difference between mapping sequentially and any other formulation is *internal* to the map function itself. What I want to do is map my dataset, not map-sequentially it.

It's like the difference between bubble sort, merge sort, and parallel radix sort. They all carry out the same operation, sort, and, as a user of hypothetical-programming-language, that's all I should need to know. Syntax aside, all it should come down to is collection.sort. If I'm having to choose between sort algorithms, especially at the time I'm writing the code, then your language has it all sorts of wrong. Yes, I should probably be able to hint that data at this point is likely to be worth the effort of using a parallel radix sort on, but mostly it should be down to the compiler and / or runtime to bother with the piffling little details for me.

Not only is code-writing time the wrong time to be choosing specific implementations of map, sort, reduce, etc (it's premature optimisation, cf. Hoare, Knuth, Jackson et al, although you shouldn't really have to), but it also adds significantly to the cognitive load on the programmer. Instead of remembering map, they have to remember parallel-map, sequential-map, backwards-map, alphabetic-map and so on, as well as the use cases for each.

What is an algorithm?

You seem to be using a different definition of algorithm. Radix sort is clearly a different algorithm from quick sort. Similarly map-parallel is a different algorithm from map-sequential. The end user would just use 'map'.

The generic language automatically chooses the most efficient sort algorithm based on the container type. The programmer does not need to specify this. However the language must be capable of expressing the different semantics if you hope to ever add new collection types or new algorithms. For example, I want to add a super new efficient parallel map, and have all applications using suitable collections automatically take advantage of this new definition.


Stepanov says it best

Quote from Stepanov: "I believe that iterator theories are as central to Computer Science as theories of rings or Banach spaces are central to Mathematics."

I like to think about it

I like to think about it this way. Let's say you start with some object:

object collection

Later on, you put things into it

collection.insert(Item)

You put a lot of things into it, and then you want to sort it:

collection.sort()

The collection you want is defined by the operations you need, so why not just improve collection's type in response to the methods called on it (nothing, collection, list)? Right now we are stuck in the mindset that "the methods you get are the methods provided by the types you chose", but this ignores the problem of choosing a type in the first place.

The problem is fundamentally one of flow and what order you want to permit the programmer to think in. In this case, it is useful to use types as hints for prioritizing completions, but completions shouldn't be ruled by possibly erroneous or incomplete types.

But C++ is its own bag of crazy and I'm not sure what would work for the style of programming you are dealing with.

Completion with subjective arguments

I'm not really thinking of C++, it just happens to be the only language that has both generics and iterators that I am familiar with. What I am thinking of would have a type system with parametric polymorphism and type inference.

As I am planning on a type-class like mechanism, the comparator can be inferred as a type constraint:

sort :: Iterator i, Comparable ValueType(i) => i -> i -> (i, i)
sort first last = ...

However, as each type can only have one comparator, and really changing the comparator does not make sense (less than is always less than), you would probably need:

sort_ascending :: ...
sort_descending :: ...

Of course this is on the iterator, so you would want to write

sort_ascending begin(list) end(list)

Or as 'begin' and 'end' would be like member functions of list:

sort_ascending list.begin list.end

But if we want to start with the list, we need to do something like:

list.sort_ascending(.begin .end)

Where list is the 'subject' and '.begin' and '.end' apply to the subject to generate the arguments to sort_ascending. I think this includes the 'interfaces' better than all the previous ideas because it includes the iterator concepts, and starts with the list. The other thought is that with named arguments you can complete whichever arguments relate to the subject, whatever the argument order, leaving the remaining arguments for the programmer to complete; and the amount of typing matters less if it's completed for you, so the improvement in readability does not cost much.

list.sort_ascending(first=.begin, last=.end)

So you type "list." and you would be offered the completion "sort_ascending(first=.begin, last=.end)" as one of the options, along with all algorithms that take an iterator as any argument. You could have some heuristic rules about whether arguments get 'begin' or 'end' by default.

Which flow?

I think the example you chose does not demonstrate the need to disregard the type system. If you just write, say, let sorted_insert coll item = coll.insert(item); coll., there is no erroneous type information to start with, there is just no typing information. This means that any type (which has an insert method) is correct, and that completion should be provided for any such type. This is not an example where you need to escape the hypothesis that only well-typed completions should be suggested; but it demonstrates that (when there is not enough type information) you need other scoring methods to structure the completion suggestions.

Sure, you can also use these alternate scoring heuristics as an "escape the typed local minima" strategy, sprinkling them into the suggestions to enhance the proposal with ill-typed (but possibly correct) suggestions; and that is a good idea in the long run. But need it be the central talking point of our conversations about completion?

Maybe the issue is with language design forcing users to give a type annotation when they don't know yet what the type is.

I think your point, that whatever information *is* given may be wrong, and that you shouldn't restrict the suggestion-set assuming it is always right, is of course correct. But that's not specific to the typing information (it also applies to any sort of static deduction from the code), and my intuition suggests that you can tackle this issue in a rather general way, turning a clear specification of a given static analysis that assumes the already-filled parts are correct into another static analysis that is fuzzier and more forgiving in behavior. That would probably be done by adapting, for interactive use, Bayesian techniques already suggested for (batch-mode) explanation of static errors, such as Danfeng Zhang and Andrew Myers's work on "general diagnosis of static errors" presented at POPL'14.

You start with zero type

You start with zero type information and add type information as methods are added. Rather than deciding what coll should be, you decide what coll should do! A lot of stuff is incompatible, so coll can't support insertions and sorting as well as window operations, for example (e.g. via exclusive inheritance, which Scala traits should have supported).

So the first method called on a symbol says a lot about it, just like the type you would have selected in more old-fashioned PLs. Really, we are no worse off than before (since we always had to select the type of a symbol from a global list anyways) and are probably better off.

The reason why many developers hate code completion (and worse, structured type-correct editing) is that it lies: it doesn't show you what you really want because it would cause a type error somewhere else. Let the type errors work: having the error to guide program construction is better than not having the erroneous code at all. So use the type information, ya, but don't be limited by it.

Hm

I think we agree on how this mechanism works in your example: you don't give type information, and information flows from the first method you've selected. My point is that this can be explained *inside* the framework of type-correct suggestions: in the absence of type information, all suggestions are type-correct (and they thus need to be ordered in another way for usability). If you want to demonstrate the need for type-incorrect suggestions, you need another example.

Erm, just because you don't

Erm, just because you don't see the types doesn't mean they aren't there.

Confused

I am a little confused by this sub thread. If we have an object 'obj' that we don't know the type of and we try completing, we can only be offered the complete list of all methods applicable to all objects, or the list of methods applicable to any object (this is much shorter).

Really if we don't know what 'obj' is we don't know what we can do with it.

I thought part of the reason for code completion was that people cannot remember the method names and precise arguments (as the definitions are likely in other files), is it "sort", "reorder", "permute-ordered" in this language?

Objects on the other hand will be passed as arguments or defined in local(ish) scope, and at least in the same file, so we should be able to look easily to see what there is to manipulate.

There seems to be some conflict between type inference and completion, in that type inference uses the functions to infer the types of the objects, whereas code completion uses the objects to infer the available functions.

Global namespace; e.g. If

Global namespace; e.g. If

object obj 
obj.quack()

Then we know obj must be a duck because there is only one quack method anywhere. You then would get a type error in the next line of code

obj.bark()

If duck and dog are not compatible (given a rule that dog-duck hybrids cannot exist).

Code completion is useful for both recall and discovery.

The problem is figuring out where the mistake is.

If the programmer calls object.quack() in one line, and then calls object.bark() in a different line, and there is not yet an object type that has valid calls for both, then, there's no type solution for the subset of the program you've already parsed.

But you don't know what's wrong. We can assume the case where the error is "obviously" the call to object.bark() because that's the earliest point at which we would need more information to resolve types. But because the nature of type errors is heavily context dependent, the earliest place where the error is detected may not be the place that needs to be fixed. The actual mistake may be due to errors or omissions earlier or later in the program text.

Was it a mistake when the original call to object.quack() was written, and now your system refusing to autocomplete to object.bark() is just obstructing what the programmer wants to do - bearing in mind that if that *is* what he wants to do he'll have to go back and do something about the object.quack() call?

Was it a mistake when the original definition of duck lacked an object.bark() method, and fixing the error means going back and adding that method?

Or does this only seem to be a mistake because you haven't yet seen a definition that comes later, where an object class with methods for both bark() and quack() will be defined?

When you decide what your system will refuse to autocomplete, or for that matter at what point in the program to report a type-checking error, you're stating that different parts of the program don't match up consistently. But you have no information about which of those parts needs to change in order to bring the whole into consistency, so reporting an error at one part, or refusing to autocomplete at that part, is making an assumption, at best a shot in the dark, about which of the various inconsistent parts are correct and incorrect with respect to the programmer's intent.

You do know that the

You do know that the programmer wants obj to be both a duck and a dog, and that these requirements are not compatible, so you give an error on one or both lines.

The system must never refuse to autocomplete. It can, however, bury invalid choices in a menu system, using what it thinks the user's intent is. So choosing quack will bury bark, but it is still there and if the user types "bar" it will probably show up in a top-level menu (with an error listed in the menu, for example).

This requires some fancy menu design as laid out in the maze of twisty classes paper. Basically, each menu has the same alphabetically ordered options (anything else is cruel), but many are elided and require expansion to see, where elision occurs via what you think the programmer's intent is.

confused

I'm a bit confused by the entire topic, as I had thought the point of generic programming was that one didn't need to remember method names: eg. after learning the name for 'sort', I can sort ducks and I can sort dogs (and presumably if I sort 'ducks and dogs' the result will be lexically ordered by duck then dog, whereas if I sort a mix of 'ducks or dogs' I at least get back a mix in which the ducks and the dogs are nondecreasing), and I can even sort, say, 'green eggs and hams' without learning any new names for 'sort'.

Are there so many functions with the genericity of a 'sort' that code completion would be as useful as it is in the kinds of languages where one must continually discover which name for 'sort' to use given the specific arguments at the call site?

Algorithm directory

Maybe for generic programming you don't need completion as such, what you need is a dictionary of generic algorithms, perhaps hierarchically categorised?

I still quite like the subjective arguments idea above though as it allows you to go direct from container to sort, even if you didn't know you needed "begin" and "end" iterators from the container to call sort. It seems easier for some reason to think of the 'subject' first.

The problem of generic

The problem of generic programming is one of parametric names. I have the name "sort", but it isn't really just "sort": the comparator that parameterizes the sort method isn't really an argument but a part of the name itself. We really should just start off with list.sort() and then let the system complain about missing context that can then be added by the programmer... list.sort() using backwards.

The sort of sort

I think this is an interesting point, and I am still thinking about it. You don't really ever sort objects, you map objects onto a semi-order and sort that. For example 'sort these trees into ascending height order'. Where 'height' is a property of a tree. Specifying the comparator is a way of expressing a semi-order on a 'virtual' property. This suggests that the comparator should not be a 'generic' argument at all, but a plain function. Maybe something like:

sort :: Collection x => x -> (ValueType(x) -> ValueType(x) -> Bool) -> x

The function parameter for the comparator is not an independent parametric type, its type totally depends on the type of 'x'.
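That signature transcribes into Haskell with an associated type family; a sketch (Collection, toList and fromList are invented here for illustration):

{-# LANGUAGE TypeFamilies #-}
import Data.List (sortBy)

class Collection x where
  type ValueType x
  toList   :: x -> [ValueType x]
  fromList :: [ValueType x] -> x

-- The comparator's type is fully determined by x, as described above.
sort :: Collection x => x -> (ValueType x -> ValueType x -> Bool) -> x
sort c lt = fromList (sortBy cmp (toList c))
  where cmp a b | lt a b    = LT
                | lt b a    = GT
                | otherwise = EQ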

This would suggest that 'free' types in the function signature should go before the function name, and arguments whose type is entirely dependent on the types of the 'free' types go on the right. I am not sure how to write a type signature that reflects this in its syntax, but perhaps something like:

Collection x => x :: sort :: (ValueType(x) -> ValueType(x) -> Bool) -> x

Or maybe the signature should be written as normal, and the 'free' types automatically shifted to the left of the function name in the syntax?

Free type variables to the left.

I am reasonably happy with this definition. I am finding it hard to think of easy examples that would not fit this general model, when allowing for associated-types/type-families.