## How can an interpreter be faster than C? (aka: kdb+)

The linked page claims the interpreter is faster than C:

Kdb+ is a testament to the rewards available from finding the right abstractions. K programs routinely outperform hand-coded C. This is of course, impossible, as The Hitchhiker’s Guide to the Galaxy likes to say. K programs are interpreted into C. For every K program there is a C program with exactly the same performance. So how do K programs beat hand-coded C? As Whitney explained at the Royal Society in 2004, “It is a lot easier to find your errors in four lines of code than in four hundred.”

Now, I'm a noob about compilers, but I understand that a) interpreters are easier to write than compilers, and b) they are slow, probably very, very slow.

So how can it be faster? How does it do it?

I don't see how "It is a lot easier to find your errors in four lines of code than in four hundred" can be the explanation.

## Comment viewing options

### fast interpreters

Collections-oriented programming languages such as J, K, and APL have a very high tooth-to-tail ratio even when interpreted. That is, they chomp and process so much data for every operation performed that the interpretation overhead is relatively small. This is improved further by keeping the code compact, such that there simply isn't much to interpret.

Going from there to 'faster than hand crafted C' is achievable by avoiding unnecessary conversions between data structures, building the database into the language to minimize IO, expressing programs in terms of operations that are easy to optimize at a high level, tightening up the interpreter code so it stays in cache at all times, focusing on microbenchmarks that are a good fit for collections-oriented idioms, and so on.

This isn't to say a C program couldn't beat K; it's just that even excellent hand-crafted C is rarely polished to the same degree as K's hand-crafted interpreter.

I find K quite inspiring in some ways, and I hope to leverage collections-oriented programming for similar benefits in Awelon project (eventually). OTOH, I don't care for the K language itself, with its heavily context-dependent verbs and ad-hoc mutable state.

Reading about it at http://c2.com/cgi/wiki?CollectionOrientedProgramming, I see it is closer to the dBase family too!

By coincidence, I wish to build something in that spirit, where the language would be relational in nature. I think the lineage of dBase is lost today, and I haven't found a more productive environment than Visual FoxPro. I love Python and Delphi, but neither is as approachable, IMHO. So I'm thinking about how to build something like that, but more modern.

So the trick is to have a collections-oriented interpreter. But where can I learn how to build something like that? I have found some simplified examples of building a functional language, but this is new territory for me.

BTW, I'm not an expert in the area, and I have more procedural than functional skills. *If* a sample exists, even a limited or small one in imperative code, I could follow it better...

### J

The primary J language interpreter is available under GPL, and is written mostly in C.

J's implementation isn't quite so polished or tight as K's, but J is still a collections-oriented programming language with an enthusiastic community. And, people have argued, one of the best development environments.

### The source is very

The source is very compressed and hard to read for someone not experienced...

There isn't any significant difference between implementing a collections-oriented language vs. implementing a scalar one. Try a few collections oriented programming languages for a while. Decide which operators you need. Select a syntax for literals (usually, this is kept very lightweight... no commas or other overhead). Write a parser, an interpreter, and a REPL. Most importantly, optimize later.

Besides J and family, you might also look into Kaya or Bloom, especially since you seem to favor the relational algebras.

### So, the "trick" is just that

So, the "trick" is just that kdb is built around one data structure and is optimized like hell for it? Like in Lua, where the table is the main/almost-only structure?

And it is a fast interpreter because it operates in batches?

So, for example:

a = 1+2

whereas a collections language would instead write the above as:

a = [1] + [2]?

In both cases the operation performs "equally" to scalar languages, but:

a = [1,2,3,4] + [1,2,3,4]

is "faster" because it doesn't process elements one by one? And when it is combined with other collections, it also processes them in batches (is this what is called vectorization?).

And the nice side effect of processing batches of data is that the CPU/memory is happier with it?

So, this is related to how a database is happier when it can process data in batches rather than one row at a time?

P.S.: I looked at Kaya and Bloom and don't see the collection-oriented or relational angle...

### The 'trick'

built around one data structure and is optimized like hell for it? ... a = [1,2,3,4] + [1,2,3,4] is "faster" because it doesn't process elements one by one?

Yes, that's a big part of it. I suspect they also optimize common compositions of functions, such as +/ (sum over), and perhaps even reorder matrix multiplications and similar.

I looked at Kaya and Bloom and don't see the collection-oriented or relational angle...

I don't understand how this is possible. You may as well have just told me: "I look at the ocean and do not see how it's wet."

### How to read kona source code

I can't tell you about J (I haven't looked at its sources in a long time), but kona follows a similar style; both are based on Whitney's original one-pager (you can find it on jsoftware's website). It is dense code but not too difficult to tease apart. But first you must understand the language -- at least a few core operations, data types, etc.

There are only a few basic types in K, and their names are abbreviated to a single char: I for int, C for char, S for string, and so on. Common keywords are replaced with macros -- R for return, for example. Since this is a vector language, a few macros are used for vector work. The most common such macro is DO. So, for example,

DO(n,x[i]=i);


is short for

for (i = 0; i <n; i++) x[i] = i;


Also, DO2 is used within DO, and DO3 within DO2. The central "object" is called K; it can represent any scalar or vector type object and is ref-counted. Two-char macros are used to access a specific type: thus Ki(x) is an int, KI(x) is an int vector, etc. The following creates an nxn matrix, filling it with numbers from 0..n*n-1.

K mat=gtn(0,n);
DO(n,K r=KK(mat)[i]=gtn(-1,n); DO2(n,KI(r)[j]=i*n+j));


Here gtn(t,n) generates a vector of type t of length n. If t denotes some scalar type (1 for int), -t denotes its corresponding vector type (-1 for int vector). A type of 0 denotes a vector of K objects, so KK(x)[i] references a specific K element.

### An implementation of J in C

There's a document on the J interpreter which may help you decode the C source: http://sblom.github.io/openj-core/ioj.htm.

### Kona

kona is a BSD licensed (almost) clone of k (k3 to be precise). But kona is not quite as fast as k3.

### Thanks for the ref. I wasn't

Thanks for the ref. I wasn't aware of kona.

### I'm surprised ...

... that those guys weren't threatened with a lawsuit by kx as I was (and my employer and our client).

### seems an expected result

There's large variance in C performance, so you can plot a C executable on a bell curve covering orders of magnitude in measured execution latency. Some optimizations are obvious, but others can be hard to see. Easy ones include "don't copy stuff" and "keep touching data while it's hot", while hard ones include "change design so that need not happen" and "non-aligned loop branch targets can be slow."

(I have seen executables vary in performance, by 10%, after just rebuilding so branch targets in sensitive tight loops move to less aligned new addresses. Finding this was caused by testing a null hypothesis: maybe performance changes when the code is the same, and you just rebuilt it. This upset folks pursuing improvements of less than 10%, because it meant past results were inside the noise of error bars. Oops.)

Profilers generally don't show when code is slow because it's waiting for a condition before more work is done. You can see where cycles went, but not where you added latency by refusing to start doing something until you got the nod from another party. Instrumenting this is pretty hard. When fibers (green threads) are used, you can measure the latency added by not being scheduled between the start of a task and when it's done. But that still doesn't show whether not being scheduled was optimal given conditions, or forced.

It's perfectly reasonable for average results in a programming language to fall on a different point of the bell curve than the mean result in C programs. Faster is likely to occur when a clever way of organizing things automatically causes some optimizations by default, like touching fewer cache lines fewer times on data organized in a better way.

The way C is typically written fails to be fast in certain standard ways, so easy to find that you can correct them by rote, walking through a mental checklist of classic mistakes. The only problem is that to make changes you have to understand what the old code does, including subtle dependencies -- those an original dev knew about, and those they didn't because there was no reason to notice. Changing code easily takes ten times as long as writing it the first time, largely because folks don't write comments or docs.

### Faster than C++?

So C++ has generic collections in the STL; how does this compare?

### No

It's easy to make straightforward C++ programs that are much more efficient than equivalent straightforward q programs. We made a fun game of this at a place where I used to work. I won this game a few times over, ultimately replacing many internal q programs with more efficient programs and leading up to a replacement for the whole q interpreter until kx shut down the fun.

What is the key difference in C++ over C, then? Are you talking about vector/parallel programs? When you say "straightforward C++", I hope/assume you mean it just looks like regular C++, not tortured C++ that is trying to be K/J/APL/etc.? Thanks.

### No difference really

You can also write straightforward C programs that can outperform similar q programs.

I used to work with some guys from First Derivatives (they recently bought kx), and we had a little competition to write a program that would parse a tree of integers and find the path with the minimum sum. I was able to find the most efficient q solution first (take that FD!), but it was like 10 times slower (though shorter, granted) than a simple idiomatic C++ program that did the same thing.

Basically, where you have "scalar" values, q performs terribly. It only winds up performing well in circumstances where you've arranged to call a primitive function with an array of primitive values.

### KDB+ is not quite K

Looking at the site, KDB+ is a column-store database with a query language called "Q" which is based on K4.

I would like to see the C/Q source code used for the comparison, but I am pretty sure we can say it is not a like-for-like comparison.

KDB+ is effectively an in-memory cached database, where a common data format is used in memory and on disk. So 64-bit pointers mean no address conversion, and reading and writing is just dumping blocks of memory. The primitive operator is a 'list' (presumably lazy), with monadic functions (presumably in the list monad, though they could have a 'singleton' monad also). So writing a Q program is a lot like writing an SQL query where an operator is applied over every value.

It may well be that for the kind of operations performed, IO is the performance limit. In any case, it is easy to see how compact programs can have high performance compared with C programs, perhaps using ADO over an SQL database, and maybe even compared with a NoSQL database. But it's not really the 'language'; it's the built-in storage model making the performance difference, and the monadic functions making the code simple and reducing the interpreter overhead.

I don't see why a compiled language could not have these kinds of features. I could imagine a 'disk-backed' version of a C++ vector written in a similar way. Of course, a parametrically polymorphic type system would make writing monadic code over such a vector much easier. I doubt that much mutation is really needed, so I could imagine a Haskell library could do quite well too.

One clear advantage KDB+ does have is simplicity, but a DSL in a compiled language would be as fast or faster given the same architecture. The interesting point is that if it is only as fast, then why make things more complex with a compiler? My guess is that only certain problems can be tackled efficiently in KDB+. For example, I doubt my Monte Carlo Go engine would be fast in KDB+, but it has made me think about whether a monadic version could be written that was fast. I think the problem would be the branching logic that has to be applied every move.

### You're on the right track

I don't see why a compiled language could not have these kinds of features. I could imagine a 'disk-backed' version of a C++ vector written in a similar way. Of course, a parametrically polymorphic type system would make writing monadic code over such a vector much easier. I doubt that much mutation is really needed, so I could imagine a Haskell library could do quite well too.

Yes, I made this very argument to Charlie and Simon (two of the main guys behind kdb) and they laughed at me. Their (nonsensical, IMHO) response was that their customers will not want to be bothered with types.

I did build something similar to what you're describing, and it's part of my day job now actually. I'm not sure how much I can say on a public forum but if you're interested in this problem, send me an email and I can share some interesting stories. :)

I assume this is you :-)

### Yes I wrote that ...

... but it was intended as a joke. I used to work for Lab49 (they do a lot of great work for financial companies in NYC and London), and in that capacity had plenty of occasions to work with kdb.

### Some misconceptions

AFAIK Q is essentially a thin layer atop K4. In the Q interpreter you can escape to K4 by typing \ and hitting return. What K calls a list is what we'd call a vector. What they call monadic are unary functions; dyadic are binary. A verb is a function. An adverb is a higher-order function. A name is a symbol. A noun is a value. Their "cached database" is mmapped files or disk partitions. K is a general-purpose array language, so you can write scalar code with branches and whatnot, but its strength is in working over large collections of regular structures.

It provides polymorphic functions over its built in types: scalars, vectors, dictionaries and tables. IMHO, adding the machinery required for "parametrically polymorphic type system" would overwhelm K for not much gain.

Fair comment about Q and K4, but if they are the same, why call them something different?

I would think K vectors are lazy lists -- or is it limited to only columns that fit entirely in memory?

I think there's some confusion about monads: K uses "monadic" just to mean a function of one argument, which, unless there is something I am unaware of, is a misuse of the term. Nevertheless, the functions that operate over lists could probably be mapped to functions in the list monad in Haskell.

I understand writing scalar code in K would not be very fast, hence my comments about porting a monte-carlo engine. It looks more suited to a particular kind of problem.

My comments about parametrically polymorphic types were in reference to C++, and that they would be better than the current template mechanism for implementing something like a disk-backed vector.

I still think Haskell could do this well with a monadic DSL for disk-backed lazy streams, although I think getting high performance would require 'strict' buffers.

### AFAIK, K vectors are nothing

I believe a lot of J/K terminology comes from Iverson. See his "Notation as a tool for thought" paper.

AFAIK, K vectors are nothing special. Streaming is achieved by outsourcing fetching/storing to the OS via mmap. The 64-bit version will operate on arrays large enough that you don't have to do caching/splitting in your own code. I suspect the remaining "fat" is only in the OS, which may be why Arthur Whitney is writing his own OS.

As for scalar code, if you can transform it to extract regularity, K can be a win. See the explanation of how a file is split up into lines in the referenced HN thread. This sort of transformation is not easy or obvious, but (with lots of practice) you can often arrive at the solution interactively once you have the idea, building it up a few expressions at a time.

### Is mmap optimal?

I wonder if mmap'ing is optimal, as it has to page fault to load new data, and that uses the CPU exception mechanism, which is (relatively) slow. I would have thought a lazy buffering approach would do better, where the vector is read a chunk at a time (one contiguous cylinder for a hard disk, for example) into memory and processed block by block. The hard disk can be reading the next block by DMA at the same time as the CPU is working on the previous block. It would use a lot less memory too.

### Try it!

What you're suggesting would introduce a lot of complexity in the main path. With mmapping I don't need to worry about buffer boundaries, what is in core and what is not, etc. And a decent OS also does readahead (that is, if you asked for block N, chances are you'll ask for block N+1, so it might as well read it in -- if this happens fast enough, extra faults are avoided). And often you'll go over the same data multiple times. Consider something simple like

x[&x[;2]=0]

which selects all rows of table x where column#2 of x has a zero. Here x is accessed a number of times. Not having to worry about buffering issues makes the implementation of this code very compact.

### C++ file_vector Column Store Implementation

I have implemented a C++ generic container called "file_vector" that provides a file-backed, dynamically sized array using mmap. It supports nearly all the std::vector methods, including iterators, and should call constructors and destructors on data if necessary, but also provides specialised methods for primitive types and POD objects. I am interested in comments regarding C++ style, performance optimisations, and any bugs found. Here's the GitHub repo:

https://github.com/keean/File-Vector.git

### objective?

What's your objective? Every plan can have a problem, so if you tell me which problem you're solving, I can narrow my comments. Persistence? Of what value is a partially updated file when a crash happens?

I like comments: preferably a one-line summary of each class and function's intent. Naming conventions implying where to look for a symbol are better still. Your member names are naked symbols (dictionary words) without prefix or suffix; a prefix like "fv_" on file_vector member variable names would improve them.

(I don't use C++ much any more, though it was my main language from 1990 to 2010. I liked C-with-classes style, but most "modern" changes in the direction of higher level -- removing as much responsibility from a programmer as possible -- are as appealing as vomit. I still need to tweak daemons someone wrote using Boost statechart FSMs, but I'd rather not.)

The objective is more or less summarised in the readme: to allow very fast processing of column data (for example, time series data), inspired by the discussion about KDB+ above. So the idea is that a file_vector implements the same IO style as KDB+ and would allow the same kinds of applications people write in Q to be implemented in C++.

As for partial updates, you might read data from one vector and write to a different one, so you don't have to lose input data. I think this is a sound strategy to avoid the need for transactions, but not something that needs to be enforced at the vector level. In other words responsibility for data-integrity lies with the user of the class.

As all the methods are the same as the standard vector class, I didn't think they needed any more explanation than is given in the C++ stdlib documentation. I generally think it's only worth commenting non-obvious stuff; for example, there is no point in commenting "mmap maps a file into memory". I prefer meaningful function names to short function names that need comments to explain what they do, but there are some internal functions that could be explained better.

Personally I don't like prefixes on member names. It seems to be a common practice for people who tend towards the "Hungarian" notation style.

I find modern C++ better than early C++. I would prefer a parametric polymorphic language with type classes, but templates in C++ get closer to that than any other mainstream imperative language I have come across.

### so, file vector for fast databases, gotcha

The eighty word readme (which I did read) is a bit short:

C++ file backed vector for very fast column databases, useful for time series data. The API is identical to the STL vector class, except a filename is provided to the constructor, which is created if necessary and mmaped to allow fast random access. Because the destructor can fail, one additional method (close) is provided which enables exceptions during closure to be caught. There is full write support including push_back, and file-space is reserved using the usual vector doubling algorithm.

So your objective is "file backed vector for very fast column databases", and issues like data integrity are up to the class user. It's a mechanism -- STL vector workalike mapped to a file via mmap -- as opposed to part of solving a class of problems.

I didn't ask to have it explained. You asked for style feedback. Comments have a specific purpose: to explain why an original dev was wrong when a bug is found, by expressing intent, because often code does not match intent, in some specific detail. I interpret your reaction as criticism, so I'm sorry I replied. Not polite to go silent though.

A prefix is namespace, not hungarian notation. I take it you haven't needed to go find everywhere a member variable name appears, in a body of code millions of lines in size, and didn't enjoy looking at thousands of hits to find three that mattered. Best of luck.

### Feedback Appreciated

I appreciate the comments, and was attempting to explain my thinking. I did not take it as criticism, and I don't want to discourage you from giving feedback. I am not sure what else I should do, though. I disagree with some of your points, but I respect your opinion. It's good to express them, because otherwise one might think there is only one good way to do things. I am not going to rush to change my style because one person doesn't like something, but compare my style now to the past and it has slowly changed.

As for namespace prefixes: a class variable is lexically scoped like all variables, no different from any other curly-bracketed scope, so should I start prefixing all outer scopes, perhaps loop variables? It's just a style I don't like. Not prefixing variables has never caused me any debugging problems: as I don't use global variables, any variable not defined in the local function is a member variable. I don't write 1000-line functions; functions should be kept short, with meaningful names. I have worked places where the house style is to prefix member variables with m_ (but not member functions?), which normally seems to be associated with Java. I have programmed that way and found no advantage; it just made all the names longer, and caused problems when I wanted to change a local to a class variable while refactoring.

The problem with comments is that what is obvious to me might not be obvious to other people. This is one of the reasons to have code reviews. If you find part of the code non-obvious, I would be happy to add explanatory comments. If you had said "I don't understand what the template specialisation for the constructor/destructor is doing", then I could add a comment -- but then again, this technique is pretty standard for overloading class definitions in modern C++; does it need commenting? There are probably two functions that might need some explanation, "open" and "reserve", but you might find them obvious enough.

Having been in the position of taking over maintenance of other people's code, I have learned the hard way not to trust comments. People write what they think the code does, not what it actually does. Then other parts of the program are modified to work with the actual, not the intended, behaviour. Code gets refactored and comments are not changed. Some of my longest bug fixes have been due to being misled by comments. Normally I comment assumptions and usage of functions, but as in this case the API is identical to the standard library docs, there didn't seem much need.

I have changed the name of open to "map_file_into_memory" which I hope removes the need for any additional comment here, and I have added a comment to "reserve" which I think needed one as it is doing something which might not be obvious, and added some short comments where they might be useful.

In array languages, read-only access to arrays is more common than write-only access (which is more common than updates). open() will fail on a read-only file (and besides, you don't want to open potentially precious data files in RW mode). You probably need separate constructors for all three cases.

I haven't looked at this carefully enough, but the feeling I get is that it will do lots of unnecessary syscalls in resize_and_remap_file(). Ideally you want to leave all the cleanup till the end and allocate plenty of space during normal ops. Have you profiled it or run strace(2) on it?

You should point out that any pointers into a file_vector will stop working after a resize. This may have implications for file_vector

Can you do scalar + vector, vector + scalar, etc.? In general, scalar ops should apply to objects with "congruent" shapes.

Not a comment on your code, but another way k achieves its performance is by copy-on-write. So, for example, given

a: 1000 1000#!1000000
b:a
b[3;3]:4 

in line 2, b simply points to the same object as a. In line 3, only the top-level vector is cloned, so a and b point to different vectors, but for all n except 3, b[n] points to the same vector as a[n]. You may wish to do something similar for better performance!

Good point about open; I'll have a think about it, but I could just mmap read-only if the file can't be written. You could still modify the vector in memory, but it would not write back.

It uses a similar preallocation method to the standard C++ vector, which preallocates a multiplicative factor of the current space. This means the number of allocations is logarithmic, so it's not normally a problem. You can always call "reserve" yourself if you know how much space you will need.

Standard C++ vectors have the same relocation behaviour so I didn't think it needed a comment initially, but I have added something at the top.

This does not address what operations you can do on a vector, but you should be able to use the standard library algorithms including "map". I will have a think about this too.

I am not sure how copy-on-write should work here. Perhaps you can give me some pointers on how K does it? Currently a vector is associated with a file, so setting a=b copies the contents of file b to file a.

### interface and usage context details affect how CoW looks

I don't know K, but I've done a lot of copy-on-write file i/o, and can easily make suggestions if you expand your objective beyond "file backed vector for very fast column databases".

Your last reply continued to assume I didn't already know what you're saying, which is annoying enough to end conversation with folks not paying my salary. I was going to tell you what I saw happen with mmap under high-volume network i/o caching, and the way we replaced that with custom paging, and parameters involved there. But your manner suggests you deal with a lot of dumb people, and can't suppress a habit of talking down. What's up with that?

### Sorry.

Sorry, I meant no offence. It was my fault for not making myself clear in the initial request. What I should have asked for are specific places that need commenting because people find them non-obvious, rather than general principles of commenting and style, which in hindsight was not what I wanted. I would be interested to hear your suggestions regarding copy-on-write. We can certainly mmap the same file (or an anonymous shared-memory region) with MAP_PRIVATE, which turns it into a copy-on-write copy of the file or region; what I am not so sure about is whether this can be turned from an anonymous, private mapping into a shared, file-backed one.

Regarding network filesystems, I would probably want to copy the file to local SSD before processing, with many nodes processing in parallel, rather than all loading a file server.

### apologies not much PL content appears in this

All this is written for someone else, if not you; what you want to hear isn't clear to me. Token peace-making suggests you want to hear about what can go wrong with mmap, if you stress it. Using SSD may blunt the worst effect. (We didn't switch to SSD until after giving up mmap.) Skip the next four paragraphs to go straight to mmap.

This sub-thread isn't about programming languages, which seems a slight problem. But most folks like stuff about performance in small doses, and I like explaining. (Yesterday I realized the world view I had, when I learned to like explaining, is wrong. A younger me believed intelligent people in the world share common goals, and can exchange views and facts to mutual advantage, instead of pursuing self-interest and status alone. Little supports that belief, so I can't afford to hold it, but I don't feel like me unless I act like I do.) I fantasize this is useful to someone.

Dave's reply below about mapping is good. You write a util, and define an api whose contract specifies usage rules, permitting a logical-to-physical mapping where you replace physical parts with new bits as needed that logically look like the old medium. Lots of things fit this scheme, so it should sound familiar to the point of obviousness. A good explanation presents a sequence of obvious details in such a way that larger patterns become obvious too. Getting the balance between old-and-boring and new-and-interesting right, in obvious parts, is hard unless you know what a listener already considers too familiar to go over again. Ideally, they ask questions; and ideally, you can see their face and change speed depending on cues. In writing you pick a safe compromise. This paragraph is slightly on the too-explicit side, to put a stake in the ground.

Words like "define", "specify", "contract", and "rule" (among others) are all important. People can't RTFM unless there is a manual. Nearly every interface has a rule that, if broken, causes results to go haywire, causing a crash or merely nonsense in output. Crashing is easy in low level languages; nonsense is easy in high level languages. Part of design is explaining the rules. A story about usage can be effective, because people like stories, and remember them. I was going to include some fictional dialog as an example, but I won't unless this sub-thread goes on. Conversations are much more interesting than plain vanilla docs.

Context also matters. Design depends on tradeoffs. You can't write a tool optimal for everything. Only having no features and no semantics makes it possible to avoid all possible conflicts with a later usage scenario. A typical tool makes assumptions about when it would be useful, and leaves them implicit, or calls them out explicitly if an author is feeling helpful. If you provide a C++ interface, probably you implicitly expect more C++ will be used at the same time, etc. Other languages can be expected, or required. A code shop can say "we use language X" and reject most options in disagreement. Programming language is a variable folks usually bother to express. Others vary in how implicit they remain: number of processes, number of threads, whether async or blocking api is preferred or forbidden, size of RAM, size of secondary storage, network i/o bandwidth, lifetime and size of data structures, contention when resources are exhausted, frequency and cost of restarts, impact of latency on which consumers and when it matters, whether content is persistent or ephemeral, and whether components are local-and-tightly-coupled or distributed-and-loosely-coupled. Sometimes merely cost of goods sold imposes a design limit: yes, N disk platters would be nice, but this config only has one, so have fun with the iops limit and managing i/o queue latency.

We used mmap on Linux in 2006, and kept using it two or three years before a change to get better performance in terms of raw i/o throughput and consistent latency. In a typical scenario, we wanted as many connections as possible filling network bandwidth completely, while writing nearly everything to ephemeral cache for near-term deduplication. This means there was a lot of writing, before a later optimization to avoid writing content seen so far only once. This was in C++ and involved several processes and lots of threads, so blocking on virtual memory was okay. (Later versions were quite different, with blocking on virtual memory forbidden.) Every process needed read and write access, but each part written had a single writer and N readers. Latency was important, because failure to service network connections in a timely fashion causes escalating load problems: hiccups cause congestion. Long-term steady state behavior mattered.

Measurement showed huge latency spikes coinciding with plunging throughput dips, ultimately traced to an over-worked Linux flush daemon, causing all user space processes to remain unscheduled for several consecutive seconds. This is very bad for network traffic, when everything backs off. (I didn't do this analysis, so cannot give blow-by-blow detail.) It also showed up as mutex locks apparently hitting six-second latency, when really the process simply wasn't scheduled for that long. We learned any app trying to do large amounts of writing, like databases, could expect this result if i/o scheduling was left to system daemons.

One cool thing about mmap is how simple it is; another is how much testing it has undergone, so your odds of hitting a bug are low. But in some contexts, control over latency means scheduling your own i/o with a custom user-space pager, which is quite complex compared to just using mmap. I told them I had written several pagers for storage systems, and whipped up another as a prototype to replace mmap (because, like any sensible developer who uses an abstraction, I had arranged for very few places to actually reveal that mmap was under the covers). As proof-of-concept, that drove the next design. Under C++ with threads it could not show best results without control over when async i/o occurs, because writes must be scheduled when slack exists, or else you double latency on seeks to clean buffers for re-use. To the extent your interface directly reveals mmap is below, changing it to another option becomes hard.

### COW wrangling

A vector is basically a finite map from N to each element, and maps have keys as well as values. If you want to change that mapping, you could move all the elements around, or you could just precompose with another map, M->N. Can you see how reversing (or even rotating) an arbitrarily large vector can be done in O(1) time? Take a look at the APL "encode" and "decode" operators, and consider their relation to "dope vectors". (this should also explain the original question about the kinds of problems where array languages get speed: structure-preserving operations can traverse long swathes of mostly contiguous memory without cycling through the interpreter dispatch) After thinking about that, is copy on write still unclear?
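
The precomposition trick can be sketched in C++ (illustrative names, not K's or any APL implementation): a view stores a base and a stride over the underlying storage, and reversing just flips that index map, so no elements ever move.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A read-only view that precomposes an index map i -> base + stride*i
// with the underlying storage. Reversing the view only changes the map,
// so it is O(1) regardless of the vector's length.
struct IntView {
    const std::vector<int>* data;
    std::ptrdiff_t base;    // index of element 0 of the view
    std::ptrdiff_t stride;  // +1 for forward, -1 for reversed
    std::size_t len;

    int operator[](std::size_t i) const {
        return (*data)[static_cast<std::size_t>(
            base + stride * static_cast<std::ptrdiff_t>(i))];
    }

    // O(1): no elements move, only the index map changes.
    IntView reversed() const {
        return IntView{data,
                       base + stride * static_cast<std::ptrdiff_t>(len - 1),
                       -stride, len};
    }
};

IntView view_of(const std::vector<int>& v) {
    return IntView{&v, 0, 1, v.size()};
}
```

This is the same idea behind dope vectors: the descriptor, not the data, is what gets rewritten.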

### OJ Dahl

Short historical note: Norm Hardy points out that even the first Fortran was a little more sophisticated in its approach to multiple indices than many current languages. (I will note that back when I did this kind of thing professionally, we were not trying to be faster than C —which suffered from potential aliasing—, we were trying to match Fortran)

On this topic, students of history may enjoy comparing OJ Dahl's 1957 Multiple Index Countings on the Ferranti Mercury Computer with the ca. 2000 STL discussion Segmented Iterators and Hierarchical Algorithms.

Or, compare Dahl's autocoding work as explained on pp. 250-251 of the 1958 Teddington Conference with the line that would later develop into Prolog.

The two threads come together when looking at the proposed pseudo-code:
(having explicitly been influenced by mathematical logicians, the dots are a syntax which indicates what we would call 'parenthesization'.)

FA = 5 x 3
FB = 3 x 7
FC = 7 x 5
(i)(k) .. O → Cik
(s) . Cik + AksBsi → Cik..


in which the demonstration application for logic (here only 'for-all', but the text also discusses 'exists' in connection with 'solution of') programming is not databases, but vector calculation.

### Rotation

I don't see how rotation relates to CoW. You can do an optimal in-place rotate in O(n) time with O(1) extra space, without CoW.

### Rotation

True; all I was trying to say is that once one has managed to make the physically non-contiguous former suffix and former prefix logically contiguous, it shouldn't be too difficult to make (eg) the physically non-contiguous former suffix and updated prefix logically contiguous.

### You can't successfully open

You can't successfully open a readonly file with open(file,O_RDWR|...). What you do with mmap after that is a secondary issue.

I forgot to mention an important point: you need not use mmap (to files) at all for temporary variables. CoW is most relevant for in-core data structures. Given its syntax, in k you wouldn't need to worry about CoW for writing back to file. For example:

a:0:"file-a"
b:0:"file-b"
c:b
c[0]:"xyz"
"file-c" 0:a,c


Here a is mapped to file-a, b to file-b. c initially points to the same vector of strings as b (: is the assignment operator; b[x] is the string for the xth line, minus the terminating newline char). Overwriting c[0] means a new top level vector is cloned that shares all but the zeroth string with b's strings. Now a,c creates a vector that has a's strings followed by c's strings. This is then written out to file-c (and this can be done by mapping file-c: you precompute the size of a,c and mmap file-c as needed, though I do not know if this is what is actually done in K or kona or Q). The operator "f 0:" does text writes, so each string is terminated with a newline.

FWIW, in my view std C++ vector may not be the right abstraction for emulating k's vectors.

### Abstraction seems okay

The abstraction seems okay:

file_vector<int> a("file-a");
file_vector<int> b("file-b");
vector<int> c = b;
c[0] = 123;
a.write("file-c");
c.write("file-c");


Would appear to be the same thing, and offer exactly the same opportunities for optimising the implementation. I have used integers, as strings in a vector don't make sense (unless they are fixed length? We could implement that with a helper class that implements copy and assignment).

The problem that I can see is that there seems to be no way to create a CoW map of a memory region. We can create a CoW region of a file (with MAP_PRIVATE) which would work for "c" above, but you would not be able to make a CoW copy of "c". How does the K implementation deal with this while using mmap and keeping the code simple? Does K implement CoW manually, with a 'difference' tree or something similar, for all vectors?

### refcounts

Answer to the last question: A K node is refcounted and can represent any K object. If var a is assigned to var b, they both point to the same node, which now has a refcount of 2. A 2D array is represented as a vector of vectors. So if a is a 2D array, an expression like "a[i;j]:k" is interpreted as something like "(assign! (quote a) (vector i j) k)" so the assign! routine knows if a is shared and can do the necessary cloning and refcount adjusting before assigning.
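
The refcounted-node scheme described above might be sketched like this in C++ (illustrative, not K's actual code): assignment shares the node, and the assign! step clones the top-level vector only when the node is shared, so the element strings stay shared with the original.

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <string>
#include <vector>

// Sketch of refcounted copy-on-write: shared_ptr's use_count plays the
// role of the node refcount. set() clones the top-level vector before
// writing if anyone else shares it; the element strings remain shared.
class CowVec {
    using Node = std::vector<std::shared_ptr<const std::string>>;
    std::shared_ptr<Node> node;
public:
    explicit CowVec(std::initializer_list<const char*> init)
        : node(std::make_shared<Node>()) {
        for (const char* s : init)
            node->push_back(std::make_shared<const std::string>(s));
    }

    long refcount() const { return node.use_count(); }

    const std::string& operator[](std::size_t i) const { return *(*node)[i]; }

    // "assign!": clone the top-level node first if it is shared.
    void set(std::size_t i, std::string v) {
        if (node.use_count() > 1)
            node = std::make_shared<Node>(*node);  // top-level clone only
        (*node)[i] = std::make_shared<const std::string>(std::move(v));
    }
};
```

Copying a CowVec just bumps the refcount, matching the "b and c point to the same node" behaviour in the K example above.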

K has explicit operations for input and output. Any mmaping is done under the hood. I think something like

"file-c"0:(0:"file-a"),'(0:"file-b")


which reads from file-a and file-b, does line wise concatenation and writes out the result to file-c, *can* use mmap for output & actual flushing of the data to file-c can happen asynchronously by the OS (this can make your interactive UI more responsive).

BTW, if you are interested, search for kreflite.pdf as it documents k2 semantics very well (and k3 is very close to k2, unlike k4).

### C++ mmap vs streambuf

I have modified my parser-combinators to use the File-Vector instead of the streambuf interface. A simple parser appears no faster using the mmap version than the streambuf version.

### C++ mmap vs streambuf redux.

I have finally got round to refactoring my parser combinators https://github.com/keean/Parser-Combinators to use iterators rather than directly operating on the stream object. I have then repeated the mmap vs streambuf comparison which previously showed no improvement. Now using iterators the 'mmaped' vector shows significant improvement in performance (>100%) over the streambuf version, this is on top of about 40% improvement just from refactoring the parsers and the streambuf to use assignable stream iterators.

edit: I have now added an example expression parser using the file_vector:
https://github.com/keean/Parser-Combinators/blob/master/vector_expression.cpp

### Really?

I am surprised, actually. I think of iterators as being a level of abstraction, and expect to pay some performance price for it. In fact, I have always observed better performance when, eg, iterating over arrays using a for loop than when iterating over an STL container using an iterator. With something that has complex, and especially undocumented, internal structure, of course you use iterators and of course they're slower than a simple operation on a simple structure. Price of abstraction. But I had not thought of streams as having a particularly complex or undocumented structure.

So hearing that using iterators rather than direct operations has actually improved performance in this case is something of a surprise. I guess it must mean that the underlying structure (streams in this case) is more complex than I had supposed, or has badly implemented direct operations, or that there is some reason why iteration in particular is more efficient when done via some unexposed (undocumented) interface than via the direct operations.

But I could be wrong. To what do you attribute the improved performance, in terms of operations on the stream?

### OS and runtime

When you measure the performance of streams, you measure the buffered-read performance as implemented by the language library. When you measure the performance of mmap-ed file access, you measure the buffered-read performance as implemented by the operating system. It's hard to say which is more "abstract" or "direct" than the other, or to predict which one will win performance-wise (it depends on the system and library in use); you're just comparing two very different implementation layers.
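
To make the two layers concrete, here is a small POSIX-specific sketch (error handling elided; path names are illustrative) reading the same file once through the language library's buffering and once through the OS pager. Both yield identical bytes; which is faster depends on the libc, the kernel, and the access pattern.

```cpp
#include <cassert>
#include <fcntl.h>
#include <fstream>
#include <iterator>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Buffered read via the language library (ifstream / streambuf).
std::string read_via_stream(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Buffered read via the operating system's pager (mmap).
std::string read_via_mmap(const char* path) {
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    std::size_t n = static_cast<std::size_t>(st.st_size);
    void* m = mmap(nullptr, n, PROT_READ, MAP_PRIVATE, fd, 0);
    std::string out(static_cast<const char*>(m), n);  // touch the pages
    munmap(m, n);
    close(fd);
    return out;
}
```

The point is not that either function is "more direct": each is a thin wrapper over a different buffering implementation.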

It's a sad fact of the current state of affairs that the systems and language communities duplicate a lot of work because they do the same things, phrased in different terms, and have historically valued not relying on one another. A lot of work on language runtime would make perfect sense at the OS level, and conversely a lot of system calls can be understood as trying to propose better language interfaces. This whole thread is about the fact that everything-is-an-array can be seen either as a programming paradigm or a complex system call, depending on how you look at it.

### STL iterators are as fast as pointers.

STL iterators over arrays are normally implemented as plain pointers in an opaque struct (which compiles away), all the iterator 'stuff' comes from generic template operations, so they are no slower in my experience (if you use gcc or clang and enable optimisations).

The way to think of STL iterators is as having all the performance of plain pointers, but with added static type-checking and rules. This is why STL does not declare an iterator base-class or use any kind of inheritance or virtual functions for iterators.

This abstraction from compile time program transformations (like C++ templates) gives abstraction with no performance penalty when done right.

The change to iterators so far does not lose any performance, but it opens the door to other optimisations in the way the backtracking and file handling are done. One advantage is that we compare iterators to find the end of the file; we don't have to dereference the value.

The improved stream performance comes from caching the current symbol, and from the assign operator on the stream. The iterators in this case wrap the virtual streambuf interface and cache the current symbol. This means symbol dereferencing is now a statically dispatched function, because the parser combinators are templated on the iterator. Secondly, they get rid of the backtracking stack by having all the iterators internally hold a reference to a single shared 'range' object created on the streambuf. The range object tracks the real file pointer. We can then freely assign one file iterator to another at no cost (just copy the position and the symbol), and we can dereference at no cost (read the cached symbol). We fix up the file access in the increment and decrement operators, which check whether the iterator's position matches the underlying range position and seek in the stream if necessary.
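
A stripped-down sketch of that caching scheme (not the actual Parser-Combinators code) might look like this: each iterator carries its own position and cached symbol, a single shared range object knows where the real stream pointer is, copying an iterator is cheap, dereference never touches the stream, and increment seeks only when positions have diverged.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <sstream>

// Shared bookkeeping for all iterators over one stream.
struct Range {
    std::istream& in;
    std::streampos pos;  // where the underlying stream actually is
};

class StreamIter {
    std::shared_ptr<Range> range;
    std::streampos pos;  // this iterator's logical position
    int sym;             // cached current symbol

    void fetch() {
        if (range->in.tellg() != pos)
            range->in.seekg(pos);        // seek only on divergence
        sym = range->in.peek();
        range->pos = pos;
    }
public:
    explicit StreamIter(std::shared_ptr<Range> r)
        : range(std::move(r)), pos(range->in.tellg()) { fetch(); }

    int operator*() const { return sym; }  // no stream access at all
    StreamIter& operator++() { pos += 1; fetch(); return *this; }
    bool at_end() const { return sym == std::char_traits<char>::eof(); }
};
```

Saving an iterator for backtracking is then just a copy of two small fields, rather than pushing stream state onto an explicit stack.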

### Right, but not the point.

Yes, iterators over arrays are as fast as for loops. But if you're using an array, why the heck would you use an iterator rather than a for loop?

The performance penalty comes from the more complicated structures whose use iterators encourage and provide simpler syntax for, not from the iterators themselves.

### Iterators should always be efficient.

The choice of iterator (random, indexed, bidirectional, forward, input etc) matches the natural memory access patterns of the underlying container. The point is not simpler syntax, it is to abstract the data access pattern of a container.

The rule of only implementing efficient iterators means a linked list should never provide a random-access iterator. So all iterators should map to the natural access pattern of the container, and hence each step should normally be O(1).

The iterators provided for any container should never add an access penalty no matter what the data structure is.

You would not use a for loop if you want to generically operate on any container (array, linked list, etc.) with no performance penalty, no matter what the data structure is.

edit: but that does mean you cannot use all algorithms on all containers. The assumption is it is better to use an algorithm that is optimal for forward iterators on a singly linked list, than it is to provide a random-access iterator for it, and use an algorithm that requires random access.
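
The categories can be checked directly, and `std::advance` then costs O(1) or O(n) steps depending on what the container can naturally support; a minimal illustration:

```cpp
#include <cassert>
#include <forward_list>
#include <iterator>
#include <type_traits>
#include <vector>

// vector's iterator is random-access (O(1) jumps); forward_list's is
// forward-only, so std::advance degrades to O(n) stepping rather than
// pretending the list supports O(1) indexing.
static_assert(std::is_same<
    std::iterator_traits<std::vector<int>::iterator>::iterator_category,
    std::random_access_iterator_tag>::value, "vector is random access");
static_assert(std::is_same<
    std::iterator_traits<std::forward_list<int>::iterator>::iterator_category,
    std::forward_iterator_tag>::value, "forward_list is forward only");

// Generic over any container's iterator; the cost follows the category.
template <typename It>
It nth(It it, int n) {
    std::advance(it, n);  // O(1) or O(n) depending on the category
    return it;
}
```

The same generic code runs on both containers; only the cost model differs, which is exactly the "algorithm matched to the natural access pattern" point.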

### Still right. Still not on point.

The fastest access is gained by using the simplest and most direct iteration over the simplest adequate container.

Just because you can come up with an O(1) iterator and a container that uses it, does not mean that container and that iterator are the simplest and most direct container and iteration available or appropriate for that problem.

I can use a forward iterator with a linked list or a vector, for example, and even though they are O(1) iterators I will not get performance as good as I get with a for loop over a simple array, because the linked list and the vector are not as simple or as direct to access as the flat array.

Using an iterator (that is, iterator the language facility, not iterator the idea) usually means using a template-derived container, and there's no point in patterning a container with a template if it's really as simple and direct to access as a flat array.

### interface as view

I've been thinking a little about view-as-xyz patterns, because it helped me recall I can always use things directly, instead of only via higher level abstraction, which unifies disparate things for a generic consumer. For example, a local in-memory tree can be exposed as files in a vfs, but you can keep using the concrete tree directly. It depends on the level of abstraction you intend as means of approach: we view things the way we approach them. There's a bit of MVC (aka model-view-controller) to this, since we interact with views and the lowest level model may be hidden.

there's no point in patterning a container with a template if it's really as simple and direct to access as a flat array.

Schupke wants view-as-iterator, vai, to plug into generic algorithms that consume iterators as arguments, but is not saying that here in a focus on performance. You're right that direct array use is simpler, when you just intend to iterate without plugging into an iter-device. A template specialized to an array is concrete just like the array, and the abstraction doesn't help unless there's code elsewhere that wants to specialize on the iterator.

The context most often left unstated seems to be "what I intend to do with this", perhaps from assuming everyone must have the same objective. But it's the usage context that imposes the view-as-x perspective, and is thus essential to full explanation.

### I've been thinking about it too....

I've been thinking about the view-as-xyz patterns too, and I've come to something of a mad-science conclusion. And that conclusion is... Why should the programmer be worrying about details like representation?

A language should provide collections, full stop. Picking which kind of organization/representation a collection should have ought to be handled IMO by the compiler based on what the program does with it.

The property of uniform-expression of the iterator abstraction is kind of pointless, IMO, unless there's a good reason why we expect the underlying representation to change. And the good reason would be that the representation ought to be selected based on what we do with that collection, and subject to change if the balance of use shifts such that a different organization would give better overall performance or meet our performance constraints better.

The corollary of this idea is that the programmer oughtn't be directly selecting the container; the programmer ought to be specifying the operations and constraints and leaving the details to the software whose business is details.

### okay, sounds good so far

I don't mean to irritate, if I am. I dislike templates; they give off a complex baroque design vibe I think is nasty. The only reason I don't advocate against using C++ is because I expect new usage to dry up all on its own, eventually, but not before a few things get really complex. Having all the major C family compilers use it heavily now ought to give it enormous inertia, helping longevity, or else I'd predict it dies in a few years. A lot of folks seek complexity as it complements several related outlooks: love of tech priesthood, prestige from exclusion, pride in trivia mastery, and faith in elaboration as progress. Fixing just one won't work. The last alone is a real bear. Simplicity looks like regression to some folks. But it's not my problem to address.

Your alternative sounds good to me. Programmers tend to over-specify some things and under-specify others. Representation is often too concrete (creating too many constraints) while other details affecting behavior patterns get no attention at all. When you say "based on what the program does with it", do you mean a compiler is supposed to infer it rather than being told? Some things are hard to infer, even with runtime data capture feedback. Do you envision new sorts of notation a dev would use to guide expected semantics? If so, would you ignore or forbid specifying things not already on your todo list? (If I can't specify async semantics, new notation is useless to me. Other folks have their own hot buttons.)

In general I like the idea of specifying abstractly what you need, then having tools match to available concrete tools. Did you have a scheme in mind? I expect a particular problem when asking people what they want to be able to specify abstractly: people are crazy, so suggestions will be ludicrously involved, self contradictory, and filled with wishful thinking like "and then the strong AI will realize what I meant and fix it." The crap folks come up with is always surprising when you open the table to reasonable suggestion. Rationality doesn't seem a common trait. As much as I dislike brow-beating to get people in line, I see how it evolved, just as way to say no to madness lurking under the surface. But I'd rather tell a clear story and let others sink or swim. Have a good idea, then leave behind whoever wants to keep making mud pies.

Anyway, I recommend articulating specifics of good ideas you come up with, so they can be adopted because they work better than messes. Trying to reason with others about what is a mess and what isn't seems like a waste of time. Absent trial by fire, folks defending awful ideas have little to lose, and your life is short.

### No irritation at all, I'm enjoying the discussion.

Offhand, you're probably right about C++. It's starting to have several serious points against it.

First, if there's a well-targeted domain specific language suited for a project, and it has a well-tuned runtime, much simpler code in the DSL will often perform as well or better than you can easily do in C++.

Second, people who want simple development and flatout performance usually go all the way down to C.

Third, people who want ease of development in standard OO design usually go to Java or some later C-family language.

Fourth, people who need OO features not provided by C++ (such as multiple-argument dispatch in my case, darn it) usually have to either go to a language with a very different OO model or a metaobject protocol such as Common Lisp, or drop to C and roll their own.

Fifth, if a project grows too large, one eventually faces a horrifying revelation. The time spent parsing header files in C++ grows exponentially rather than linearly in relation to the number of files in the project. You don't start really seeing the knee of the curve until your project goes north of 2 million lines or so, but it's there.

C++ has a valuable niche because if you invest the needed effort, you can use it to do a whole lot of different things very well, and no other language allows you to do all of those things very well. But each of them is awkward, difficult, and error prone, and usually at its best only when very complex rules are followed. IOW, you can do many things well, but you can't do them well easily.

I've got a C++ project that I develop and maintain. And I have a hundred little irritations with the language. It's doing several things that are just irritating, such as template collections and iterators making the compiler emit reams of complaints about template-generated code when you make any small simple error, and the reams of complaints are about things that aren't where you need to fix the error and don't help you pinpoint the simple error. The compiler could be made to do this better, but the template library as an abstraction has made it IMO unnecessarily hard for the system to produce on-point error messages. Anyway, I could live with the obtuse error messages. But for other reasons I'm going to be taking the project out of C++, so I'll have to get rid of the STL containers as part of the transition.

Part of preparing for the next major phase of that project will mean "de-objectifying" it and taking it down to C. The C++ model of object-oriented programming is only single-dispatch, and therefore isn't really a good fit for what I need to do in the next phase of the project. If I leave it in C++, it will mean coding redundant O(1) dispatchers for each of thousands of simple functions, or making more template code (which is notoriously tricky and hard to maintain) to create those dispatchers for me.

It would be much simpler in a language with two-argument dispatch - and in C I can do it just as simply with a 2-dimensional array of function pointers, and not require anyone to ever understand template code. But leaving it in C++ with half of it decided by the native dispatch mechanism and half decided by other means would be ugly, confusing, and hard to maintain.
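
The two-dimensional function-pointer table described above might look like this (hypothetical Shape/Kind names; written in C-style C++, since the point is that no templates are involved):

```cpp
#include <cassert>

// Two-argument dispatch over two "types" using a 2-D array of function
// pointers. Adding a new pair means filling in one table slot.
enum Kind { CIRCLE = 0, SQUARE = 1, NKINDS = 2 };

struct Shape { Kind kind; };

static int circle_circle(Shape*, Shape*) { return 1; }
static int circle_square(Shape*, Shape*) { return 2; }
static int square_circle(Shape*, Shape*) { return 3; }
static int square_square(Shape*, Shape*) { return 4; }

typedef int (*CollideFn)(Shape*, Shape*);

static CollideFn table[NKINDS][NKINDS] = {
    { circle_circle, circle_square },
    { square_circle, square_square },
};

// Dispatch on the dynamic kinds of BOTH arguments: one table lookup.
int collide(Shape* a, Shape* b) { return table[a->kind][b->kind](a, b); }
```

Dispatch is a single indexed lookup on both argument kinds, which is what single-dispatch virtual functions cannot express directly.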

I need to implement collections code anyway for the runtime I'm building, and when I take the compiler/interpreter system down to C, I'll have to implement collections to replace what I'm now using the STL for. There's no reason these two purposes can't be served by the same code, so I've been thinking a lot about collections and how to do them better. And specifically about how to avoid the things that have been annoying the heck out of me with the required overspecification of the STL containers.

### STL is the best part of C++

I agree with these criticisms of C++, but I don't write much C any more due to all the pointless reimplementation. You cannot write generic containers or generic functions in C. I do not want to write a 'linked list of type X' every time; each one is an opportunity for mistakes and typos, not to mention the volume of code. I do not want to have to write a cast to "void*"; I want type safety.

So I am developing a new language to fix all the things I dislike about C++ (it is imperative, type-safe, and parametrically polymorphic; it has type-classes, type-families, and zero runtime). C++ seems the best language to implement it in, as the compiler requires a lot of graph operations, which I never got on well with in Haskell or other functional languages (lists of graph node ids and edges are cumbersome compared to structs and pointers for this); I can use templates to keep some order and elegance in the implementation while also making it very quick.

For example take a look at this recursive descent parser:
https://github.com/keean/Parser-Combinators/blob/master/parser_simple.hpp
used in this demo csv parser:
https://github.com/keean/Parser-Combinators/blob/master/test_simple.cpp

It is a lot of work to change the parser, and writing complex parsers rapidly becomes hard. Compare this with:
https://github.com/keean/Parser-Combinators/blob/master/parser_combinators.hpp
Now although the library implementation is more complex, using it is much simpler:
https://github.com/keean/Parser-Combinators/blob/master/test_combinators.cpp

Not only is it simpler to use, it also scales better to more complex parsers, and is more than twice as fast as the simple recursive descent parser.

### Containers are good: Templates are a baroquification.

A container is properly speaking a closure, not a collection of code that can be filled in from a template. The "right answer" IMO when you want an implementation of a specialized container is a second-order function. You pass it parameters, it passes you back a closure and a set of routines referring to that closure's captured environment. That's sort-of what templates are doing, but they're doing it in a way that seems very clumsy; going all the way back to source code, running the parser and compiler again, etc. Why isn't this just a function call? Why are these "iterators" some kind of special thing of their own with their own syntax, rather than simple closures like any other closure in the language? Why for that matter isn't the container itself a closure? Why are there so many damn side effects on these containers when doing things that have the same syntax as side-effect free operations performed on inert data? And other grumbly irritations.

They are good at what they do; but it annoys me that every damn thing about them is an exception to the syntax and duplicates the semantics of the rest of the language - and it annoys me that C++'s closures -- it calls them "objects" -- are so hamstrung as to have made the STL seem like the most straightforward way to achieve its containers.

More than that, though, the "parameters you pass" to C++'s STL constructors aren't on point; they require overspecification, because they define a data structure rather than a set of operations to support.
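
The "second-order function returning a closure plus routines" idea can be sketched even in C++, at the cost of `std::function` indirection that a template would avoid (names here are illustrative):

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Container-as-closure: make_stack() is a second-order function that
// returns a bundle of closures sharing one captured environment, rather
// than a template-stamped class.
struct Stack {
    std::function<void(int)> push;
    std::function<int()> pop;
    std::function<bool()> empty;
};

Stack make_stack() {
    auto env = std::make_shared<std::vector<int>>();  // captured environment
    return Stack{
        [env](int x) { env->push_back(x); },
        [env]() { int x = env->back(); env->pop_back(); return x; },
        [env]() { return env->empty(); },
    };
}
```

Here the "constructor" really is just a function call, and the returned operations are ordinary closures over shared state, which is roughly the shape being argued for above.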

### Template Parameters Define a Type

I agree completely that templates are not a great mechanism, they are however equivalent to type-classes, that is the parameter is a type, the functionality depends on the type of the parameter, and there is overloading (specialisation) on the type parameters. They also fully support multiple (static) dispatch as the overload depends on the type of all parameters. It is important however that these are not normal function parameters, they are types not values. There is a specific need for something that is not a function (for example Haskell type-classes, ML functors etc). The reason they have to be types is to allow compile time elaboration, which is what enables the implementations to have high-performance. If values are used, elaboration has to be deferred until runtime, and that has big performance implications.

To see it done right, look at Haskell type-classes, if you look past the C++ syntax, that is exactly what you are doing with templates. The important thing is the parameter to a function or object is a type, not a module or functor or any other such thing. Have a look at the mess you get into with Ada generics to see what happens with modules.

I don't care if a container is a closure or an object. To say it is 'properly' speaking a closure presupposes much about the system (garbage collection, first class functions etc). Suffice it to say that collections like that might be nice, but it is by no means necessary. A collection is just an Abstract Data Type, hence it is something with an interface.

There is no special syntax for an iterator, so I don't understand your point. An iterator is just an ADT itself.

If you are concerned about side-effects, I take it you only program in pure functional languages like Haskell (not ML or Lisp, which let side effects get everywhere)? I intend to use an effect system based on Monads to control side effects in the language I am writing. However, as a general principle when I write C++ code I expect objects to behave like values, so I am not sure what side effects you are referring to?

As for template parameters being overspecified, quite the opposite, templates are duck-typed, you can pass any type in providing it implements all the operators and methods required by the templated function. C++ requires an as yet unapproved extension (concepts) to bring type safety and the ability to name the 'interfaces' to templates.

However, you will be glad to hear I do not plan on having templates in my language, rather type-classes, type-families and kind-polymorphism will provide the generic programming facilities.

### Not type classes

I agree with most of what you are saying, but templates are quite different from type classes, insofar as there is nothing analogous to instance resolution. If you are sufficiently determined, template metaprogramming can be used to get some degree of type-driven implementation variation, but in the end you're still dealing with whatever is lexically visible in the context of expansion.

So I'm inclined to think that templates are much more in the flavor of parametric procedures or parametric type definitions. Except that the template metaprogramming option introduces some wrinkles in that view.

### More general than type classes

Perhaps that is a misstatement, templates are >= type-classes, rather than == to type classes. I can show line for line translations of Haskell type classes into templates if you need convincing. The difference is templates are duck-typed, rather than requiring instances. In this case templates are less constrained and more flexible than type-classes, and completely subsume their functionality, in the same way that dynamically duck-typed languages can type more programs than statically typed ones. For clarity, I am not saying templates are better than type classes.

Perhaps it is more accurate to say type-classes are equivalent to C++ Concepts, templates are equivalent to the parametrically polymorphic function, any function/object can be an instance of a concept, but I like to include 'validators' that check the function/object conforms to the concept, which also serve as documentation of the instance.

See my Go board for an example of this style:
https://github.com/keean/Go-Board/blob/master/go.cpp

Note:
- BOOST_CONCEPT_USAGE defines the type-class
- BOOST_CONCEPT_ASSERT asserts that something is an instance of a class
- BOOST_CONCEPT_REQUIRES indicates the requirements for the type parameters

### I would like to see it.

The thing that I don't see how to emulate in C++ is instance resolution to instances that are not lexically visible.

### Lexically Visible

I don't really understand your objection. In Haskell the instances must be imported into the current compilation unit, just like in C++. Can you give an example of what you mean?

### Differing assumptions

I realized just this morning that we are (or may be) operating on different assumptions, and that this is probably worth a separate discussion thread.

The Haskell rule that an instance must either be defined in the compilation unit of the type or the compilation unit of the type class is an ad hoc restriction rather than a feature of the language design. It defeats much of the utility of type classes. First, it tacitly assumes that most instances are instances over a single type; multivariable type class instances are heavily restricted. Second, it eliminates the possibility that a third party might define a type class relationship that was not imagined by either the type class author or the author of the type we are instantiating over. It eliminates this because the third party is not able to change either compilation unit.

Separately, it's notable that instances mostly aren't imported into the compilation unit where they are used. Rather, they are imported into the compilation unit whose parametric resolution (i.e. at the call site) drives the generation of code (i.e. at the implementation site). The implementation site generally doesn't have lexical visibility on the instances. This is why, at compile time, the instances become hidden parameters.

So backing up: if we assume the presence of the ad hoc Haskell rule about where instances can appear, then I agree with you. We can do all of that in C++.

### Agree, Interesting Discussion

I think there is an interesting discussion about other language features and how they interact with type-classes. I am not sure I have encountered the limitation you refer to, and so it makes me question the statement that it severely limits their utility. If I have never had to break the restriction, and I already find type-classes the most useful model of an interface, what might I discover were the restriction to be lifted? Can you give a (practical) example of something which cannot be done with Haskell-style type classes?

### Practical (single variable) example

Obvious example: somebody introduces a new type T but forgets to instantiate Ord T. As a client of that type and also a client of the Haskell core type classes, there is no place that I can correct this oversight while satisfying the ad hoc rule about defining compilation units.

Incidentally, this gets much more painful when unboxed types get into things, because you desperately want to inline all of the instance method invocations. The dictionary approach at types close to ground is just unthinkably horrible. We think we may see an escape hatch for this that we can use in BitC, but it definitely falls under the heading of "dirty, sleazy, nasty, and mildly clever abuses of kinding".

### Type-Classes and Unboxed Types

For the missing Ord T, I just define it in the file where I call the type-class function, or in a separate file that I import at the call site.

I am not sure I see the problem with unboxed types, after all C++ templates work with unboxed types, and we could just define a translation from type-classes to templates. The restriction is of course that all types must be resolved at compile time - but of course they always are, unless you have types that depend on values?

Edit: I like to think of values that are known at compile time as actually being types. For example, you can define the integers as types of kind 'integer'. I might need better names, because in my thinking dynamically typed languages don't actually have types; they have tag-values, but I know some people call them types.

For dynamic polymorphism, I would translate the type-class into a real interface (abstract base class), and each instance into a sub-class in C++, but this would be restricted to places where we don't statically know the type (existentials?).

### Correction

Not only are you a pretty decent sort of person, you're darned useful! :-)

The Haskell rule that an instance must either be defined in the compilation unit of the type or the compilation unit of the type class is [...]

Are you sure you're not thinking of the rule that implicitly brings instances into scope? I don't think it limits new instances, but rather defines where it will go looking for instances if it doesn't find them. (I don't do large scale Haskell development, so I'm not an expert on this, but I don't remember reading about this limitation).

### That's exactly the issue.

The Haskell instance resolution rules are just plain broken for programming at any sort of scale.

### I've got it wrong

I'm pulling stuff back into my head as I resume work on BitC. Sometimes I get pieces that aren't yet connected properly. Keean is completely right about the class environment and instance scoping, though the Haskell Report is strangely silent on how and when the class environment is consulted.

As I'm thinking about it, I believe the ad hoc rule came up in the context of ensuring instance coherence.

### Side effects are okay, but misleading syntax for them is bad.

I wasn't complaining about the existence of operations with side effects. My complaint is that syntax for operations that don't produce side effects has been overloaded for use by operations that do produce side effects. This is the same kind of complaint I have when the '+' operator is overloaded with something that doesn't have the commutative and associative properties of addition.

The result, in either case, is that code that looks like it ought to be correct and safe, isn't.

### Concept Checking

I agree with this, although the answer in C++ is simple: don't do it. I want operator overloading with axioms and effects, so that + has to be commutative and associative. C++ (full) concepts are supposed to provide axiom checking, although an external tool may have to be used to check them. On balance I would rather have the ability to overload operators than not.

I'm starting to like the idea that you can have limits on operators, functions or method selectors and still allow even run-time polymorphism.

That is, you could specify that + has to be at least associative and commutative, and if used with * has to be distributive, and that it has to be a pure function. In that case, even if you didn't know the types involved, you could still do a lot of optimizations. You could reliably use mathematical algorithms that depend on qualities of the operators but work over EVERY domain where those qualities apply.

You could specify that a certain method selector (ie method name) or function name is a fexpr without specifying what it is and that way get rid of part of the original objections to fexprs.

Another idea that I'd like to see is dialects within one program, so that (for instance) you could have a region where arithmetic operations are low level ones that can compile to efficient code, other places where the same operations invoke a wider numeric tower, for instance allowing big ints or very high precision floats, and other places where there are no restrictions and the same operations could work on, say, polynomials or congruences.

### Interfaces and Concepts.

Yes, this is the kind of thing I want too. But it just looks like an application of Concepts/Interfaces to me (C++ concepts or type-classes) for the standard mathematical hierarchy:

MultiplicativeSemigroup, MultiplicativeMonoid, MultiplicativeGroup
Semiring, CommutativeSemiring, Ring, CommutativeRing, Semimodule, Module

Algorithms should be written using the minimal requirements, for example GCD can be defined using a EuclideanSemimodule, which would allow things like polynomials over reals.

So there is no real need for 'dialects': algorithms can be expressed generically, and datatypes can implement the required interfaces.

On the subject of fexprs, they seem to be a problem only because a nullary function is indistinguishable from a value. If we define a nullary function to take the empty tuple, the problem disappears:

while :: (() -> Bool) -> (() -> ()) -> (() -> ())
while cond expr () = case cond () of
    False -> ()
    True  -> let () = expr () in while cond expr ()


### typeclasses or objects are a marker for context

You're saying that you will insert the context into every object operated on and make sure that they're all tagged with the same context and that the compiler knows it beforehand...

And by "context" here I mean "This line is doing arithmetic over domain X".

I'm saying that there are other, possibly simpler, possibly more flexible ways to communicate context.

Remember, the goals are that your program is readable by a human and compiles correctly.

I'm not impressed by ever more complicated type systems, because advanced typing is just an extremely constrained programming language based on a super-simplified unification algorithm. When we try to express anything complicated in types, I keep feeling that there are easier ways to do it.

### Type Inference

All this can be inferred providing the type-class definitions are in scope. So for example using Haskell syntax:

f a = a + 1


we can infer:

f :: AdditiveSemigroup t => t -> t


Because of the following definition:

class Regular t => AdditiveSemigroup t where
    (+) :: (t, t) -> t


Ideally we should be able to require (+) is associative and commutative but we can't do that in Haskell.

### The dialects still aren't bad

You can't tell by looking at "1" (or "0") what domain it's in.

You can treat integers as something that wrap around at register width in some calculations and as something that increases size in others.

There are other possible uses of dialects -
functions could use a stack discipline in some parts of a program and garbage collected continuations in another, they could use a compacting gc in some regions and an in-place one in others.

### Sure you can

Because (e.g.) BitC's approach works:

0;;
forall bits :: Nat
0 < bits =>
it: int('bits)

Though I concede that we probably want floating 0 to be 0.0. I'm borrowing this notion of a literal Nat kind from BitC and the notion of kind classes expressing Peano relations from Habit. We'd decided (independently) to do the Nat kind in BitC, but Mark Jones' work on kind classes was eye opening to us.

The above is what we would do now. What we actually did in BitC was simpler and more conventional. The BitC preamble has a type class for all of the natural integer sizes (AtLeast8, AtLeast16, and so forth - though the originals were called IntLit8... you'll see in a moment why that's confusing). The instances of these classes are all integer sizes whose bit size is >= the one in the class. So AtLeast64 only instantiates at 64 bit types where AtLeast32 instantiates at both 32-bit and 64-bit types. It works just fine, but it has some interesting consequences:

let addRealSmall x = x + 8;;
addRealSmall: AtLeast8('a) => 'a -> 'a

let addBigger x = x + 256;;
addBigger: AtLeast16('a) => 'a -> 'a


For terms, all of this works out pretty well. In practice it becomes more awkward in two cases:

1. At top level, where a simple definition runs into a monomorphism restriction.
2. In imperative iteration constructs, where it is possible to end up with an underspecification (e.g. what type for `for i = 0; i < 16; i := i+1`?)

We made most of the "interior" cases go away by introducing an integer type `word` that aliases the natural register size and inferring that at imperative constructs. The top-level monomorphism restriction is trickier. Given:

let mutable m_int = 0;;  // must choose a type

The reasonable choices here are (a) complain that monomorphism failed, (b) choose the signed integer type corresponding to the machine's natural register size, or (c) choose the largest signed integer type on the theory that this holds the most general set of possible values. Arguments can be made for any of these outcomes. In BitC we chose to complain.

In hindsight, we perhaps should have chosen (a) and (b) together. That is: resolve to register size so that compilation can continue, but issue an error as well so that the bug has to be fixed. We didn't do that because concretizing incorrectly has a way of causing a cascade of concretizations through the program that may induce other complaints, which in turn may cause the developer to "repair" their program unnecessarily or even incorrectly (that is: changing code that would have been correct if we had guessed their intent correctly here).

I suppose that, strictly speaking, we haven't selected a domain. What we've really done is constrain the domain sufficiently so that compilation can continue.

### That is, you could specify

That is, you could specify that + has to at least be associative and commutative and if used with * has to be distributive

Well that rules out floating point arithmetic.

### ... and that's a good thing

In conventional PL use, the +, -, *, / operators are underspecified. We often use them as if commutativity, associativity, and distributivity hold, but as you note, this isn't really true. Yet for a lot of purposes, things are "commutative enough" that we don't care.

But when the distinction actually matters, it's not a bad thing to be able to state our requirement.

### I agree

When using the standard operators, the compiler should assign idealized semantics (as if the values being manipulated were real numbers) such that it's allowed to rearrange and simplify expressions with algebraic manipulations. If you intend a specific floating point sequence of operations, different operators can be available for that.

### Sounds like we agree

I agree that the programmer should specify the operations and constraints. When you write an algorithm you specify the kind of iterator you need (random, indexed, bidirectional, forward etc). The algorithm does not need to know the kind of container it will be passed. The point is not that you expect the container to change, but that you don't want to have to write algorithms for each container. The important part is that algorithms are naturally classified by the access patterns they require, so it makes sense to create interfaces classified by access patterns. There are multiple containers per access pattern. So you might see something like this:

container:  supported iterators
-------------------------------
map(hash):  indexed-iterator
map(tree):  forward-iterator, bidirectional-iterator, bifurcating-iterator, indexed-iterator
vector:     forward-iterator, bidirectional-iterator, indexed-iterator, random-iterator

So an algorithm that requires only a forward-iterator can operate on most collections, and would not need different implementations for each collection type, but this does not stop more efficient implementations of algorithms using random-iterators being used on the same or different collections, as you can have 'concept overloading' which overloads the algorithms based on the supported concept.

Where we disagree is that I don't think the type of collections should be limited by the language. Obviously most people should stick to using the standard library collections unless they really know what they are doing, and I would much rather see a vector of a map of sets than some custom 'spaghetti class'.

However, when I want a datatype for a disjoint-set or a trie, I should be able to match them to existing iterator concepts, etc. I should also be able to define new iterator concepts for new collections and composites of collections, like multi-dimensional iterators (coordinate objects) or iterators for binary trees (bifurcating-iterators). I wouldn't want to limit these concepts to those that are supplied with the language.

### Look at it a different way.

A container is to be evaluated in terms of a complete set of operations including but not limited to its iterators. Rather than talking about what iterators a particular data organization have, we should talk about what operations affect the suitability of a container. Maybe you never iterate at all and access only in a predictable pattern, and your container should just be a stack or a queue or something. Maybe it's more important to have iterators that aren't invalidated by insertion or deletion than it is to have iterators that are O(1). Maybe you do so much more insertion and deletion that the efficiency of those operations has a greater effect on your program than the efficiency of the iterators.

I say you just get containers, full stop, use whatever operations you want, and then let the optimizer pick the container implementation by profiling your code through its test suite to see which operations are most important.

### Lack of Organisation

So each collection implements a bunch of operations; isn't it natural to group the operations so you can talk about which containers have which subsets of operations in common? The invalidation of iterators is something that is determined by the container, and there should be a way of indicating the invalidation in the type of the insert or delete method, maybe using linear types, or axioms in a proof-relevant logic (type-system = theory, function-values = proofs).

When you have implemented a lot of algorithms on these collections, you will start to notice patterns in the algorithms that use common subsets of operations on the collections. Isn't it sensible to give names to these common subsets of operations so we can discuss them?

Mathematicians do exactly the same thing for number theory, creating semi-rings, rings, monoids, etc... They have found this useful and have a much longer history (3500+ years back to the Rhind papyrus for an early example of an algorithm) of thinking about these things.

Naming an iterator is observing a common pattern in _algorithms_ nothing to do with collections :-)

### Again, that's going the wrong direction.

The semantics of an iterator that's not invalidated by insertion/deletion are different from the semantics of an iterator that is invalidated by insertion/deletion. The programmer is concerned with the semantics of iterators, so that distinction is exposed to the programmer. This still doesn't require exposing the choice of implementation.

So the container abstraction provides iterators that are defined to NOT be invalidated by insertion/deletion, which is a more precise requirement than just the direction of the iterator. If the programmer uses those, as opposed to just using standard iterators, then the underlying implementation chosen must be one that provides non-invalidating iterators.

### Same Thing?

Reading that, it seems you are saying the same thing: the algorithm determines the properties required of the container. So we agree on that point, and I don't understand your "wrong direction" heading. To borrow from Stepanov (where he talks of classes; read 'collections' in this case):

"It starts with classes. It is as if mathematicians would start with axioms. You do not start with axioms - you start with proofs. Only when you have found a bunch of related proofs, can you come up with axioms. You end with axioms. The same thing is true in programming: you have to start with interesting algorithms. Only when you understand them well, can you come up with an interface that will let them work."

edit: okay, I think I can see the difference. I think you are talking about naming the insert/delete-safe iterators differently, say as "safe_forward_iterator" and "unsafe_forward_iterator", and letting the programmer choose which one they want to use. I am suggesting that you just say "forward_iterator" and let the type system determine whether any deletes or inserts occur during the lifetime of the iterator, selecting either the safe or unsafe version accordingly.

### Sure, if that works.

If I don't have to make the programmer specify a safe iterator because I can always have the typechecker figure it out instead, then that's a better approach. I'm not confident that I can, but it's sure worth a shot.

### Array access is modelled as random access iterator

There is no cost to using a random-access iterator to access an array; in fact a pointer into an array is a fully qualified random-access iterator, so your argument does not make sense: they are the same thing. An iterator abstracts the access pattern. It is abstracted as a random-access iterator because that is the access pattern; the container may have other properties (such as read-only dereferencing, mutability, etc.) that are orthogonal to the access pattern. Just because an array models a random-access iterator does not mean it is the only component that does or can do so, but it is of course useful not to have to rewrite algorithms unnecessarily. When I have a quick-sort on a random-access iterator, I can use it on any component that models that concept.

I think there's some confusion about monads: K uses 'monadic' to just mean a function of one argument, which, unless there is something I am unaware of, is a misuse of the term.

Using "monadic" to mean a one-argument function or predicate is a standard piece of terminology in logic. (E.g., "monadic second-order logic" refers to second order logic restricted to quantification over single-argument predicates.)

### One thing

Excellent, I shall now proceed to call all functions with an arity of one "Monadic" and enjoy the confusion this brings to functional programming threads.

### Mathematical linguistics

I've noticed that even many mathematicians don't seem to appreciate that a serious literature search in their field must be an exercise in translation. They assume everyone defines technical terms in the same way as they and the small circle of mathematicians they frequent do. When you read a paper on graph theory by an unfamiliar author, the first question it should answer is: what is a graph? Directed or undirected? With or without multi-edges and/or loops? I've struggled through graph theory articles that ought to have been easy to read, because I had to reconstruct, by inference from their reasoning, what their definition of a graph was. When researching abstract algebra in a variety of sources a few years ago, I discovered that not all authors use the same definition for the term ring.

From a stackoverflow answer by Jon Purdy:

The various meanings of monad in philosophy, religion, biology, category theory, and functional programming are all derived separately, from its literal denotation of a “unit”. The Haskell term is probably derived from monoid, an algebraic structure equivalent to an additive monad.

So it is Haskell that introduced the confusion!

Correction: Iverson did use these terms in APL (perhaps because he used Greek letters as operator names?) but they are not mentioned in his "Notation as a tool of thought" paper.

### Category theory

Maybe you were joking, but the meaning Haskell uses comes from Category Theory. Not sure about the date. 50s or 60s probably.

### 1960s, afaict

If I'm understanding the notes from Categories for the Working Mathematician correctly, monadic categories first appeared in 1963 under the name "categories with multiplication", and were renamed, Mac Lane says, by Eilenberg.

### according to Mac Lane

p 138, 2nd edition CftWM:

These objects ... have been variously called "dual standard construction", "triple", "monoid", and "triad". The frequent but unfortunate use of the word "triple" in this sense has achieved a maximum of needless confusion, what with the conflict with ordered triple, plus the use of associated terms such as "triple derived functors" for functors which are not three times derived from anything in the world. Hence the word monad.

Leibniz' use should offer someone at MSR a cheap pun, but otherwise the greek origin leads us to expect there to be a wide polysemy; cf 'union'.

(For what it's worth, Iverson 1962 speaks of "binary operations", and "dyadic" doesn't occur in the index. Maybe, reduction and positional numbering already having been present, he later introduced 'dyadic' to avoid any possible confusion between function arity and base-2 encodings of data?)

### Hum. I recognize it but try to use "unary" instead.

I do remember "monadic" meaning "of one argument" back before "monads" became accepted as a word for the quasi-functional style of deferring side effects by returning a composed function.

But I've tried not to use it that way in the last few years. I use instead another word that used to be a synonym: "unary". Of course, there's another word in that set that means having two arguments, which will be heard as meaning something else by a lot of people. So for that one you want to use "dyadic", which is the second term in the series that used to include "monadic".

There is no logical reason why one of these words should have been chosen for that particular purpose over the other. We could be talking about "unary programming" in exactly the same way as we now talk about "monadic programming", if one word in some important paper a few years ago had been chosen differently. For all we know it might have actually come down to a coin toss.

A lot of technical words drift toward specialization because we don't really have enough words for all our concepts.

The word "object" has had a particularly rough ride because it is deliberately nonspecific. So people who want to delineate something to which they can apply a novel technical definition have often come up with it when they reached for a word for something they could arbitrarily define in context. As a result, in any technical application that word is now a beartrap because it has acquired dozens of very specific conflicting technical definitions. You avoid the heck out of it because the minute your readers see it they'll have any of dozens of possible wrong ideas and you'll have to spend time explaining what you didn't mean by it.

I'm increasingly of the opinion that when you have a new technical idea, it's better to make up a new word than to try to hijack one of the same set of words everyone else is using.