Typed Data

While things are quiet, it seems like a good time for a question with a tenuous link to programming languages. Unix has been designed around the idea of plaintext as a universal interface between programs. It is not the only possible universal interface; there are many possible binary formats with strictly defined syntax and a flexible definition of semantics.

What previous attempts have been made to make a unix (posix) core that operates on a different interface?
- Replacements for the standard set of tools (awk, sed, grep, cut, paste etc)
- A replacement editor for serialised streams (e.g. the equivalent of a text editor)
- Changes to the runtime interface in libc to replace stdin/stdout and read/write etc

I am aware of Microsoft's work on powershell, but replacing the textual interface with objects introduces a complex set of semantics. What I am asking is if anybody has tried something simpler? In particular - a typed representation for data, that may look something like an Algebraic Data Type. A fixed binary encoding for:
- integers
- floats
- strings
- a list type structure
- a form of nesting to allow trees

The rough idea here would be to take something as expressive as s-exprs, but combine it with a fixed binary encoding, probably a chunk-based format like the old IFF format from the Amiga. This idea keeps floating back up every year when I teach the posix core to students. Unfortunately it does not suggest a project that I can give to a student (too much programming, very weak / undefined comparison with previous work). Some day I may sit down and start hacking, but before then it seems interesting to point out that this is quite an obvious departure point from the unix design philosophy - has anybody been down this road before?
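
To make the idea concrete, here is a minimal sketch (in Python; the tag names and layout are invented, loosely in the spirit of IFF) of such a chunk-based encoding: every value is a 4-byte tag, a big-endian payload length, and a payload, with lists nesting by concatenating the chunks of their elements.

```python
import struct

# Hypothetical IFF-style chunk encoding for the fixed set of types
# the question proposes: integers, floats, strings, and lists (which
# give nesting/trees). Each value is tag (4 bytes) + length (4 bytes,
# big-endian) + payload.

def encode(value):
    if isinstance(value, int):
        return _chunk(b'INT ', struct.pack('>q', value))
    if isinstance(value, float):
        return _chunk(b'FLT ', struct.pack('>d', value))
    if isinstance(value, str):
        return _chunk(b'STR ', value.encode('utf-8'))
    if isinstance(value, list):
        return _chunk(b'LIST', b''.join(encode(v) for v in value))
    raise TypeError(type(value))

def _chunk(tag, payload):
    return tag + struct.pack('>I', len(payload)) + payload

def decode(buf, pos=0):
    """Decode one chunk at pos; return (value, end_position)."""
    tag = buf[pos:pos+4]
    size = struct.unpack('>I', buf[pos+4:pos+8])[0]
    payload, end = buf[pos+8:pos+8+size], pos + 8 + size
    if tag == b'INT ':
        return struct.unpack('>q', payload)[0], end
    if tag == b'FLT ':
        return struct.unpack('>d', payload)[0], end
    if tag == b'STR ':
        return payload.decode('utf-8'), end
    if tag == b'LIST':
        items, p = [], 0
        while p < size:
            item, p = decode(payload, p)
            items.append(item)
        return items, end
    raise ValueError(tag)
```

This is roughly as expressive as s-exprs (atoms plus nesting) but with out-of-band framing, so a filter can skip a chunk it does not understand without parsing it.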

Two linked topics: structured data and richer interaction

The idea of commands that manipulate richly structured data (instead of text) is naturally related to the question of the interaction capabilities we have on the input and output data (not just text editing). There have been many attempts at these two topics, often in lockstep, that are fairly interesting, ranging from the modest incremental step forward to the grand vision in which you replace both the shell and the terminal. Some random references below:

  • TermKit, see design document On Termkit detailing some design choices. Interesting attempt, improbable bunch of 2.0 technologies (Webkit + node.js), stopped development as soon as it was launched. Data is typed by MIME types, with JSON data (s-exps with a Javascript syntax) playing a prominent role.
  • Terminology, a modest attempt by some Linux desktop project to call the proper viewer when a rich file format is piped to the terminal. A very tiny part of the OPENSTEP ideas coming on scene again, with a slightly larger chance of eventual success than Étoilé.
  • IPython, a cool notebook environment for Python programming, now generalized into the language-agnostic Jupyter. Impressive, reasonable chances to see serious adoption.
  • jq, a command-line tool to perform transformations between JSON data representations. I don't see new rich-data formats taking over without such tools to allow a gradual transition from dumb pipes to something else.
  • The deliciously retro xmlterm that did everything in XML with a XUL interface back in 2000.

Missing from this crowd are a PowerShell reference (you already mentioned that), and some cool Wolfblah thing that I'm sure the Mathematica people are busy inventing.

Interesting collection

Interesting collection of links. I'm familiar with Terminology (I use it now as a light-weight alternative to gnome-shell, which amuses me given the history of "bloat" in e17 over the years). I remember the inline image capabilities, although I've never used them. If I recall there is a built-in alias that knows certain file formats and inlines certain kinds of data.

What I had in mind is closer to IPython (in observed effect rather than implementation). It would be an interesting environment if GraphViz or Gnuplot simply output a stream that was a rich encoding that could be operated on by filters (something like a vector-oriented version of ImageMagick).

The interdependence between the terminal, and the kind of data flowing in and out of programs executed by the shell is indeed the main stumbling block. It is a problem without a simple starting point - lots of engineering just to get to a basic functioning system. This explains the relative inertia in the current arrangement - the design chosen 30 years ago has proven to be good enough to occupy its niche and make it difficult for an incompatible successor to arise.

You're absolutely right that this implies the interface between the shell and the terminal would change quite radically. The terminal would become a default viewer for some kind of structured stream of data, possibly using a list of filters for known types (e.g. rendering vectors to a bitmap, or displaying tabular data, trees etc). There is an issue here that the terminal would then become much more complex, and it is unclear how much of that could be split off into well-formed modules that exist as simple filters on pipes.
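
The dispatch involved need not be complicated. A toy sketch (the type labels and renderer functions are invented for illustration) of a terminal picking a display filter by the stream's declared type, with a fallback for unknown types:

```python
# Hypothetical terminal-side dispatch: each structured stream declares
# a type, and the terminal looks up a renderer for it. Unknown types
# fall back to a raw dump, preserving the "everything still displays"
# property of a plain terminal.

def render_table(rows):
    # Column-align a list of rows (lists of cells).
    widths = [max(len(str(r[i])) for r in rows) for i in range(len(rows[0]))]
    return '\n'.join('  '.join(str(c).ljust(w) for c, w in zip(r, widths))
                     for r in rows)

def render_tree(node, depth=0):
    # A tree node is (name, children); indent by depth.
    name, children = node
    lines = ['  ' * depth + name]
    for child in children:
        lines.append(render_tree(child, depth + 1))
    return '\n'.join(lines)

RENDERERS = {'table': render_table, 'tree': render_tree}

def display(stream_type, payload):
    return RENDERERS.get(stream_type, repr)(payload)
```

The interesting design question is whether `render_table` and friends live inside the terminal or are themselves external filters the terminal pipes through; the lookup table stays the same either way.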

One of the luxuries that we have now that was unavailable 30 years ago is that memory, storage and processing are all cheap relative to the complexity of this problem. Prototyping a new terminal is relatively easy now (e.g. using a library in Python to handle the rendering and forget entirely about efficiency) in comparison to the resource constraints on this kind of work in the past.

The link between structured data and richer interaction is very inviting. Outside of the unix command-line most programs essentially ignore their argv/stdin/stdout interface and use a system library to build their interface upon (whether sockets, or opengl or a GUI). Perhaps many of these types of interaction would have a simpler expression as a network operating over richer pipes?

Not plain text: ad hoc and polyvalent

Unix has been designed around the idea of plaintext as a universal interface between programs.


The unix user environment treats the byte-stream as universal.

Many tools interpret a bytestream as encoding a stream of lines, each line a list of bytes.

Some tools (e.g., troff) assume higher-level structure atop the stream of bytes and/or lines.

One of the defining characteristics of this architecture is pervasive, intentional imprecision. For example, grep(1) applied to source code examines lines of byte arrays and is useful for imprecise examination of text the compiler will see as line-spanning symbolic expressions following a complicated syntax.

"Typed data" never quite works out like classical unix did because classical unix data is not merely "untyped" but is the type theorist's nightmare: usefully, inconsistently typed data.

Classical lisp style, using just s-expressions over atoms, can sometimes be similar to unix: inconsistently typed data depending on what code is looking at that data.

One could say that the lisp variation is "structured data", even though the "structure" is no more usefully a "type" than streams of lines of arrays of bytes.

One of the key characteristics of any system of structured data (such as lines of bytes or files of s-expressions) is that the type system of the "structure" is trivial, and typically not extensible.

(You can understand XML in this context as s-exp-like structured data but, via complicated namespaces, in some sense extensible and less trivial than classic lisp s-exps.)

Convention vs Specification

There is no formal requirement that stdin/stdout should be restricted to textual data, but there is a body of work suggesting that as a design choice it should be treated as the default assumption. It is a recurring theme in The Art of Unix Programming, lifting a few quotes from there:

Doug McIlroy:

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.


Henry Spencer:

A bad example of binary formats in Unix history was the way device-independent troff read a binary file containing device information, supposedly for speed. The initial implementation generated that binary file from a text description in a somewhat unportable way. Faced with a need to port ditroff quickly to a new machine, rather than reinvent the binary goo, I ripped it out and just had ditroff read the text file. With carefully crafted file-reading code, the speed penalty was negligible.

It is clear that text (whether "plaintext", ascii or utf-8) is a universal format, under the description of "intentional imprecision" that you use, but it is not the only such universal format. Each of the discussions of text vs binary formats that I've read (including the AOUP) attributes this characteristic to text, but not to binary formats. That is: the data semantics are an ad-hoc decision at the time of receiving the stream.

S-expressions are a weak (general) form of structure to apply to data; as you say, the types are trivial because the structure is encoded in the data. Line-structured records and s-exprs are then two possible forms of in-band signalling about the structure of data. Binary formats, on the other hand, are assumed to be a strong (specific) form of structure using out-of-band signalling. These are three points in a fairly large design space. I wonder what other decisions on structure would produce something useful?

If we use out-of-band signalling to encode simple structures (lists, nesting) then the resulting format would look like a binary version of sexprs (or could be encoded as json, xml etc). These all imply quite weak type systems. Is there any work on encoding something richer - e.g. allowing declarations of structures as types, and then encoding a stream of data according to those types?

To take a concrete (but very trivial) example, consider the output of a program like netstat or ps. Line-structured text is used to describe a simple table of untyped cells. There is a mixture of data and presentation that is generally stripped out for further processing by various filters. Instead, this stream could be represented as a declaration of a record type, containing a mixture of numbers, strings and enumerations. Encoding this as an out-of-band binary structure allows a simple collection of processing routines in a runtime to replace the ad-hoc parsing code that tends to exist in filters.
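
As a rough illustration of that example (the field names, type strings and wire layout below are all invented), the stream could open with a record-type declaration and then pack each row against it, so filters share one decoder instead of each re-parsing whitespace-separated columns:

```python
import struct

# A hypothetical schema for one row of ps-like output: a declared
# record type (numbers, strings and an enumeration), followed by
# rows packed against that declaration.

SCHEMA = [('pid', 'u32'), ('state', 'enum:R,S,Z'), ('cmd', 'str')]

def pack_row(schema, row):
    out = b''
    for (_name, ftype), value in zip(schema, row):
        if ftype == 'u32':
            out += struct.pack('>I', value)
        elif ftype.startswith('enum:'):
            # Enumerations travel as a single index byte.
            out += struct.pack('>B', ftype[5:].split(',').index(value))
        elif ftype == 'str':
            data = value.encode('utf-8')
            out += struct.pack('>H', len(data)) + data
    return out

def unpack_row(schema, buf):
    row, pos = [], 0
    for _name, ftype in schema:
        if ftype == 'u32':
            row.append(struct.unpack_from('>I', buf, pos)[0]); pos += 4
        elif ftype.startswith('enum:'):
            row.append(ftype[5:].split(',')[buf[pos]]); pos += 1
        elif ftype == 'str':
            n = struct.unpack_from('>H', buf, pos)[0]
            row.append(buf[pos+2:pos+2+n].decode('utf-8')); pos += 2 + n
    return row
```

Note that presentation (column widths, headers) disappears entirely: it becomes the job of whatever viewer sits at the end of the pipe.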

re Convention vs Specification

That's very well expressed from my perspective.

There was a small flurry of interest and activity in the unix world around the concept of self-describing data. I am not all that familiar with that source material, or whether it went anywhere.

As an anchor point to start exploring, one thing I do vaguely remember is Tom Duff doing some self-describing structured data stuff in a unix tradition as part of trying to build toolkits for various forms of graphics rendering. (I also vaguely recall that this work was commercially successful.)

Not sure how much was published and how much not.

Is there any work on encoding something richer - e.g. allowing declarations of structures as types, and then encoding a stream of data according to those types?

Source control

One of the motivating examples (from my perspective at least) was something I read a long time ago about reinventing the interface for source-control. I know that you've done a lot in this area, and I wonder if it was something that you wrote that I cannot recall right now? The overall gist was that storing deltas between program sources works up to a certain point, but that storing deltas between ASTs allows a lot of functionality that is difficult to achieve with current tooling.

A (possibly idle day-)dream that often takes my fancy is what would happen if we could turn parsers into simple filters? Working in an environment of textual streams this doesn't really make sense as the output would be no easier to process than the input. But using a typed representation that allows the AST to be output directly in a form suitable for further processing would allow parsers to become independent programs, useful separately from compilers.
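
Python's standard `ast` module already gets close to this dream for one language: a few lines turn the parser into a filter that emits an s-expression rendering of the AST (the printer below is our own invention; a richer typed encoding could replace it).

```python
import ast

# Parser-as-filter sketch: parse source text with the stdlib parser
# and emit the tree in an s-expression form another process could
# consume, independent of any compiler.

def to_sexpr(node):
    if isinstance(node, ast.AST):
        fields = ' '.join(to_sexpr(getattr(node, f)) for f in node._fields)
        return '(%s%s)' % (type(node).__name__, ' ' + fields if fields else '')
    if isinstance(node, list):
        return '(%s)' % ' '.join(to_sexpr(n) for n in node)
    return repr(node)  # leaf values: identifiers, constants, None

def parse_filter(source):
    return to_sexpr(ast.parse(source))
```

Hooked up to stdin/stdout this is exactly a parser running as an independent program; the open question in the comment above is what typed stream format should replace the textual s-expression on the output side.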

There has been a fair amount of interest in "nano-pass" compilers in the past five years. A lot of complexity could be removed from this area if there was a reasonable way to serialise/de-serialise the intermediate forms of programs that did not create a large overhead in each transformation stage. The overhead is noticeable in terms of runtime and resource use, but currently there does not seem to be a way to estimate how simply a particular transformation could be expressed independently of the large software framework wrapped around it.

re source control

The overall gist was that storing deltas between program sources works up to a certain point, but that storing deltas between ASTs allows a lot of functionality that is difficult to achieve with current tooling.

I'm sure I have publicly made observations along those lines but also that none of it was especially original.

But using a typed representation that allows the AST to be output directly in a form suitable for further processing would allow parsers to become independent programs, useful separately from compilers.

As far as I know, this is an open and rich area for practical R&D. Computers these days have gotten fast enough to open up a lot of possibilities compared to 30 years ago. (I wish you lived nearby... I could show what I'm working on. :-)

I have observed that if you envision tool chains along the lines you are hinting at, it will change your view of what a concrete syntax should look like. Eventually, in my experience, you wind up wanting to make the surface syntax usefully, incrementally parsable even in the presence of syntax errors.

There has been a fair amount of interest in "nano-pass" compilers in the past five years. A lot of complexity could be removed from this area if there was a reasonable way to serialise/de-serialise the intermediate forms of programs that did not create a large overhead in each transformation stage. The overhead is noticeable in terms of runtime and resource use, but currently there does not seem to be a way to estimate how simply a particular transformation could be expressed independently of the large software framework wrapped around it.

Don't have much more to say in a comment here other than, yup.


Is there any work on encoding something richer - e.g. allowing declarations of structures as types, and then encoding a stream of data according to those types?

I think this is exactly what MTProto has: A rich declaration of structure types and then a transmission of the data.

The general concept seems commonly reinvented: MIME type headers that tell you what kind of document you're getting, fexpr operatives that tell you how to process their arguments, dynamic type tags that tell... but I personally hadn't seen rich, type-theory-inspired encoding tags until MTProto.

At one point I was trying to explain a possibility available to an extensible parser on the Arc Forum, and it started looking like a type system much like MTProto's. So it looks like this kind of encoding-declaration header blurs the distinction between type constructors and parser combinators.

Richer Type Systems

Part of the problem I see with richer type systems, if looking at object oriented languages, is that the binary data is the private contents of the object, accessed by the methods. This leads to needing to send code to the remote end (either in-band or out of band). This shows up as serialisable objects in JavaScript when generating or parsing JSON.

This seems a fundamental and unavoidable problem, so to me there seems no alternative to structured data like JSON, where there are a fixed number of primitive types and some container structures such as dictionaries (each entry has a label and a type tag) and lists (entries accessed by position, single type tag for the whole list).

I don't think this need be plain text, but it should be a transmittable protocol, so in-band tags to differentiate the primitive types seem necessary. In JSON / XML this is achieved with special character sequences; in a binary protocol you would need tag bits/bytes.

You could allow extension of the tag bits, but all that does is create an opaque package, obscuring the contents, which ultimately have to be of primitive types. I would say this is a failed attempt at security (as you can still inspect the byte stream), or compression by leaving out individual tags. I would suggest for these purposes using zip and/or encryption on the data and sending it as a normal byte array.

So in summary I would be against out-of-band communication of types as it offers no real security or compression, and just serves to make the data harder to work with.

Edit: you might not need dictionary types; if you had tuples instead then a dictionary would be a tuple of pairs, where each pair is a name and a value. This would be beneficial because the format would not have to deal with type 'label' encoding for the dictionary. You would only need float, unsigned and signed integers of various lengths, and the array and tuple types as your primitives. The labels for a dictionary could be utf-8 encoded strings sent as byte-arrays.
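
A sketch of that primitive set (the tag values are invented): arrays carry a single element tag hoisted out front, tuples carry a tag per element, and a "dictionary" is just a tuple of (name, value) pairs with byte-array names.

```python
import struct

# Hypothetical in-band tag-byte encoding with the primitives from the
# edit above: floats, integers, byte-arrays, arrays (homogeneous, one
# element tag for the whole list) and tuples (heterogeneous, tag per
# element).

TAG_F64, TAG_I64, TAG_BYTES, TAG_ARR, TAG_TUP = 1, 2, 3, 4, 5

def enc(v):
    if isinstance(v, float):
        return bytes([TAG_F64]) + struct.pack('>d', v)
    if isinstance(v, int):
        return bytes([TAG_I64]) + struct.pack('>q', v)
    if isinstance(v, bytes):
        return bytes([TAG_BYTES]) + struct.pack('>I', len(v)) + v
    if isinstance(v, list):   # element tag written once, then bare payloads
        elem_tag = enc(v[0])[0] if v else 0
        body = b''.join(enc(x)[1:] for x in v)
        return bytes([TAG_ARR, elem_tag]) + struct.pack('>I', len(v)) + body
    if isinstance(v, tuple):  # each element keeps its own tag
        return (bytes([TAG_TUP]) + struct.pack('>I', len(v))
                + b''.join(enc(x) for x in v))
    raise TypeError(type(v))

def dec(buf, pos=0):
    return _payload(buf[pos], buf, pos + 1)

def _payload(tag, buf, pos):
    if tag == TAG_F64:
        return struct.unpack_from('>d', buf, pos)[0], pos + 8
    if tag == TAG_I64:
        return struct.unpack_from('>q', buf, pos)[0], pos + 8
    if tag == TAG_BYTES:
        n = struct.unpack_from('>I', buf, pos)[0]
        return buf[pos+4:pos+4+n], pos + 4 + n
    if tag == TAG_ARR:
        elem_tag, n = buf[pos], struct.unpack_from('>I', buf, pos+1)[0]
        items, pos = [], pos + 5
        for _ in range(n):
            item, pos = _payload(elem_tag, buf, pos)
            items.append(item)
        return items, pos
    if tag == TAG_TUP:
        n = struct.unpack_from('>I', buf, pos)[0]
        items, pos = [], pos + 4
        for _ in range(n):
            item, pos = dec(buf, pos)
            items.append(item)
        return tuple(items), pos
    raise ValueError(tag)
```

A dictionary then travels as `((b'pid', 42), (b'load', 0.5))` with no dictionary-specific machinery in the format at all.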

Streams? How quaint

IMHO a stream is not a good interprocess interface. It doesn't address all the things a modern UI/OS should handle, e.g. a diagram editor.

I think (re)active values, akin to Barbour's RDP are more interesting.

In my hobby Unix successor project, processes exchange values (which can be huge) using a REST-like protocol with two messages:

  • Give me the whole value and its current version V (the value can be split into parts if it's huge, similar to an Atom paged feed, or in a tree shape for parallelism)
  • Give me the changes since version V (as a diff/patch) and the new version V' (the server may wait a bit before answering if V is still the current version, effectively turning this into a long-poll/push notification)
  • There's a special response/error code with which a server can tell the client that it can't deliver a diff against a version (might happen if the version is too old and the server doesn't keep complete history), and the client has to refetch the whole value
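
Those messages can be sketched as a toy server (all names and the dict-based diff format are invented; a real implementation would long-poll in `get_since` rather than answer immediately, and would keep more history):

```python
# Sketch of the REST-like value exchange: a server holds a value with
# a version counter and a bounded history of diffs. Clients fetch the
# whole value, or the merged changes since a version they hold.

REFETCH = 'refetch'   # error code: cannot diff against that version

class ValueServer:
    def __init__(self, value, history=4):
        self.value, self.version = dict(value), 0
        self.diffs, self.history = [], history

    def get_full(self):
        """Message 1: the whole value and its current version."""
        return dict(self.value), self.version

    def update(self, changes):
        self.value.update(changes)
        self.diffs.append(changes)
        self.diffs = self.diffs[-self.history:]  # bounded history
        self.version += 1

    def get_since(self, version):
        """Message 2: changes since `version`, or the REFETCH error."""
        behind = self.version - version
        if behind > len(self.diffs):
            return REFETCH, self.version
        merged = {}
        for d in self.diffs[len(self.diffs) - behind:]:
            merged.update(d)
        return merged, self.version
```

The split into full-fetch plus diff-since-version is what lets the same protocol serve both a cold client and a client watching for push-style updates.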


I never used it, but saw a video about it. It is a typed shell.

Ammonite enables shell-like scripting in the Scala programming language. It is made of a few subprojects:

    Ammonite-REPL: A Modernized Scala REPL, with tons of bug fixes and features
    Ammonite-Ops: A Scala Library for convenient, rock-solid Filesystem Operations
    Ammonite-Shell: A modern replacement for the Bash system shell

Typed shell has been successfully implemented, and then forgotten

A Functional Shell that Dynamically Combines Compiled Code by Arjen van Weelden and Rinus Plasmeijer (2003).
Abstract. We present a new shell that provides the full basic functionality of a strongly typed lazy functional language, including overloading. The shell can be used for manipulating files, applications, data and processes at the command line. The shell does type checking and only executes well-typed expressions. Files are typed, and applications are simply files with a function type. The shell executes a command line by combining existing code of functions on disk. We use the hybrid static/dynamic type system of Clean to do type checking/inference. Its dynamic linker is used to store and retrieve any expression (both data and code) with its type on disk. Our shell combines the advantages of interpreters (direct response) and compilers (statically typed, fast code). Applications (compiled functions) can be used, in a type safe way, in the shell, and functions defined in the shell can be used by any compiled application.
A follow-up paper A Functional Shell That Operates on Typed and Compiled Applications

Selling point?

There have been many "typed shell programming" libraries designed in typed functional languages. One example would be ShCaml, an OCaml library for shell programming, Alec Heller and Jesse Tov, 2008 -- which, relevantly to the current discussion, uses row types to type data pipelines.

If you point out this work in particular, I guess it has a distinctive aspect and went further than these other libraries in some respect. Would you mind elaborating a bit on what you like in this 2003 work in particular? Looking at the abstract only, it looks like they have put more thought into the compilation chain; it's not just "convenient scripting inside your usual language toplevel" but something closer to a dynamic experience. Is that important? Did they do it in a really novel way? Should we reuse this approach?

Note: the "embedded DSL inside a host typed, functional language" approach, which is less ambitious than redefining a shell from scratch, has one nice property. It composes nicely with the notebook approach of giving a nicer-than-terminal interactive experience to an existing language. For example one may want to combine ShCaml on top of IOCaml, the OCaml notebook built on top of IPython/Jupyter, to get both a nice interactive experience and convenient programming, with an appreciable economy of means.


This paper is an absolute goldmine - thanks for the pointer.

I have to disagree.

This is interesting work, and certainly what they are describing can be seen as a step towards the executive system for a fully typed OS, but I would not describe Esther as a shell. Certainly it is an interpreter for a language that includes functions to operate on files and processes. But then again, so is pretty much any interpreted general purpose language. I would hesitate to label them all shells.

But at this point there is no strict definition of what is or is not a shell, so I can only offer my own internal model as an opinion. A shell is a simplified language that initiates control functions on a machine, and interrogates the file-system using an application of Fitts's law (that is, using a prefix-code for interactions). Obviously there is no mouse or target-region in the CLI, but all shells seem to attempt to minimise the number of characters to achieve a given function - with priority given to more common functions. Esther does not look specialised to that purpose given the examples in the paper - it does look like most functional languages in that the syntax/semantics are tuned to minimise the work required to express a function.

Some of the aspects of Esther look very much like what I had in mind - the decision to type every file follows naturally from a design choice to force a non-trivial typing over data. But the need to express Esther wrappers for commands written in another language looks awkward - I would claim (not too strongly) that language-neutral systems tend to scale up better than those that require one language to be the primary interface. I'm sure that smalltalk fans (and many lispers) would disagree strongly, so it's not an argument that I would press very hard. But all successful OS' have been language-neutral :)

I would also claim that their type system is too large to be the most useful for that problem domain. This is something that I will probably trip over trying to explain. A type-system that includes functions is certainly more useful for modelling programs, but is it the most useful choice for modelling non-program data? A type system that only focuses on modelling the structural properties of data, preferably in an extensible manner, but without any computational types would be more tractable for many forms of analysis. Given that many filters are simply transformations of one kind or another, restricting the domain/codomain to be simpler than functions should improve their composability.

The kind of "usefulness" is obviously context-sensitive - if I were writing the core functions of an OS then I would definitely want function types so that I could use them in expressing the APIs. In this case the larger type system is more useful. But if I am providing a shell to someone then I want them to work in a more restrictive model; maximising the number of ways that filters can be successfully composed should increase their productivity. So "usefulness" in this context means ensuring that operations are more uniform over the types that may be fed into them. In this case the distinction between data-types and function-types would only increase complexity.

Thanks for the link to the article - that was exactly the kind of previous work that I was hoping to uncover.

Higher-order functions are not language-agnostic

One problem with having function arrows in your data description domain is that higher-order functions are not language-agnostic. What "function from A to B" means in OCaml is completely different from what it means in Haskell, which is again completely different from what it means in Coq. Even in terms of computability, the golden Church-Turing thesis does not extend to higher-order types.

I would argue that supporting higher-order function is incompatible in practice with the idea of a language-agnostic approach. Two possible approaches would be to take the least common denominator (total (pure) functions), but this criterion is impossible to enforce in most languages, or to assume as a common semantics the union of all function types of all cohabiting languages, and this means that none of the languages really has the semantics we think it has (their observational equivalence for example), they become polluted by the interaction with all others.

(Designing safe multi-language semantics with higher-order functions is of course possible and an interesting research topic. But it requires careful design of all languages involved and/or very strong static analyses, not a "we accept everyone" approach.)

One problem with having

One problem with having function arrows in your data description domain is that higher-order functions are not language-agnostic.

Do you have a simple example (where the data is not source-form of a programming language)? I've been thinking so far of problem domains where data is just data - no attached code. What kind of domain would necessitate arrows to describe its data?

That shouldn't be too hard?

A plugin for a text processor, a (composite) transform for an audio tool, a listener for updates to be applied after any transform on raw data, a semantics checker...



Would they not all be pieces of code? Obviously some data is code, and all code is data, but I was wondering specifically about data that is not code.

Excel Sheet?

Typed excel sheet?

But at this point there is

But at this point there is no strict definition of what is or is not a shell

What qualifies as a shell seems to consist of any programming language with:

  1. concise function names (mostly for common commands)
  2. optional and named function parameters (for command arguments)
  3. special operators for directing data flow to/from stream values (typically < and >) and composing stream pipelines (typically |); these operators should only accept data in a particular format, so general language types may not apply by default

Most shells have sufficiently simple types that they can intermix textual and native forms of the same type. The same could be done for more sophisticated types if your language has a way to reify a value as an expression that can reproduce that value, though I wouldn't recommend it.
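
Point 3 can be sketched as a pipeline operator that refuses to compose stages whose stream formats do not line up (the `Stage` class and the type labels are invented for illustration):

```python
# Toy typed pipeline: each stage declares the format it consumes and
# produces, and `|` only composes stages when those formats match --
# the "operators should only accept data in a particular format" idea.

class Stage:
    def __init__(self, fn, in_type, out_type):
        self.fn, self.in_type, self.out_type = fn, in_type, out_type

    def __or__(self, other):
        if self.out_type != other.in_type:
            raise TypeError('%s does not flow into %s'
                            % (self.out_type, other.in_type))
        return Stage(lambda x: other.fn(self.fn(x)),
                     self.in_type, other.out_type)

    def __call__(self, data):
        return self.fn(data)

# Example stages: split text into words, then count them.
split = Stage(str.split, 'text', 'words')
count = Stage(len, 'words', 'int')
```

Here the check happens when the pipeline is built rather than when data flows, which is the essential difference from untyped byte pipes.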

language neutral? no such thing

all successful OS' have been language-neutral :)

Modern OS's have a strong bias towards C family languages. Some OS functionality exists to work around C foibles, e.g. memory barriers between processes. Languages outside the C family re-implement functionality provided to C by the OS, e.g. thread models and schedulers, memory management. In some cases, they must fight with the OS to efficiently work around features designed for C, e.g. GC and orthogonal persistence tend to interact poorly with OS-level virtual memory paging. OS-provided filesystems heavily favor mutability which isn't optimal for caching or persistence of purely functional applications.

If we instead had an OS biased towards memory-safe, garbage collected languages, I think C would have the harder time of it. The C compiler would be forced to inject memory safety checks all over the place.

Even hardware isn't very language neutral. Modern hardware is certainly developed to improve performance for the word-at-a-time programming of C-like languages (abstract register machines, really) rather than, say, operating on huge streams of data or massive neural networks or incremental computing with efficient, consistent cache invalidations.

Memory Safety

I think this looks at memory safety backwards. If I can write a machine-code program by hand (so modifying the 'C' compiler will not work) that can look at your program and extract information from its memory, then the security is broken.

Re: Memory Safety

You seem to be making an assumption: That an OS must directly support conventional executable formats like ELF or PE that are more or less naked machine code. There is no fundamental reason to make this assumption. We could have an OS that requires all executables be provided as a typed assembly or bytecode, or a proof-carrying format that packages machine code with a proof that it is a refinement of a bytecode.

And that doesn't even touch on your assumptions about the machine code. Consider capability-based addressing or content addressing. A lot of existing hardware was designed to make C programs on C-family operating systems run ever faster. But I find it interesting to imagine what we'd have if hardware development was driven by a different set of programs, with differently weighted priorities. Hardware oriented around running neural networks very quickly, for example, would look very little like what we have today.

C and Java

'C' was designed for the hardware, not the other way around (a PDP-11?). Various attempts have been made to design functional hardware, and they have all failed. There are many reasons for this. To succeed you would need to design something that has the features you want, yet is fundamentally simpler than current architectures (i.e. achieves the same performance with a lower transistor count), or targets a niche market where price, performance and power consumption are not key business drivers.

Neural networks are just pattern matchers, and I have designed a chip for this in the past. It looks like a 2D array of synapses (which are multipliers that compute the weight times the neuron output), with the 'neurons' along the diagonal (which are multi-input adders). It's not a general purpose computing tool, and is not a replacement for a computer, and does not have an OS or run programs.

You could have a kernel that could only run validated byte-code, but the performance cost is too high. It was tried with Java machines and failed. I don't see why the same idea would succeed now.

Edit: Apparently the failure of Java on hardware could be due to it being a stack machine. It is faster to JIT compile the byte-code to run on a register machine, than it is to natively run it on a stack machine. So a validated bytecode for a register machine seems a possibility.

Edit2: It would seem a viable plan would be to have typed machine code for x86-64, and then write a type checker that could operate as a 'pico' kernel consisting of just your type checker. You would want to keep this to the minimum possible amount of code, so you would use it to type check the rest of the kernel, and then every executable the kernel loads. This type-checker core would both have to be proved correct (in terms of the type system) and bug-free, and would have to encode all operations the machine can do (side effects and IO) as you could have no 'trapdoors'. Then of course it would only work on one CPU, and they keep adding instructions and changing the architecture, so it's a moving target requiring continuous updates and changes, and hence revalidation.

cycles and forces

'C' was designed for the hardware, not the other way around

Do you know about vicious and virtuous cycles? C was designed for hardware, true. But this doesn't imply the "not the other way around". Because C family operating systems and programs are widely used, hardware is designed to make those run faster. And because hardware makes C programs run fast, people write more of those.

There's a big challenge for anyone trying to break into this cycle, e.g. economies of scale for manufacturing, or the many years and massive funding it takes for hardware to mature and develop as many performance tricks as are used in modern CPUs. There are also political issues, e.g. the threat you present to an established system that has orders of magnitude more money, lawyers, and lobbyists. Breaking into some niche market, as you mentioned, is sometimes a good way to get started and remain under the radar of established competition.

The idea that technologically superior solutions win by virtue of said superiority is laughably naive. There are many other forces determining success.

Various attempts have been made to design functional hardware, and they have all failed.

GPUs with shader pipelines are relatively functional. You feed in a stream of vertices and a few textures, transform it functionally, render it functionally, and further transform it functionally.

Neural networks are just pattern matchers

So is a normal CPU: it just happens to 'match' opcodes and whether registers are zero or not for control flow. Neural networks can be designed to 'run' programs, too. Indeed, such designs have recently been mentioned on LtU (Neural Programmer Interpreters).

You could have a kernel that could only run validated byte-code, but the performance cost is too high.

Bytecodes can have very high performance. Indeed, we can understand x86 as an ad-hoc machine-oriented, unsafe bytecode that is understood by our CPU.

Making Java Bytecode run fast on hardware is feasible, but I expect it would take many years for the resulting hardware to mature to match hardware designed for the popular machine codes today. JIT offered some similar benefits with less risk and effort. When I mentioned passing bytecode to the OS, I wasn't implying that the bytecode would run directly on the hardware in any case. The OS could do JIT.

There are other techniques I mentioned - typed assembly, or packaging machine code with a proof. Requiring validated code is hardly a problem. Modern GPU and GPGPU drivers actually receive *source code* and compile it internally. We could do that at the OS layer, if we wished, using a safe subset of ATS or something.

have typed-machine-code for x86-64, and then write a type checker that could operate as a 'pico' kernel that consists of your type checker. You would want to keep this to the minimum possible amount of code, so you would use it to type check the rest of the kernel, and then every executable the kernel loads.

Having a minimal TCB is nice, but not essential. Having any TCB would be a good start.

An alternative to checking executables on every run is to check once then sign off cryptographically and cache the signatures for future loads of that executable. This allows much more expensive checkers. It also generalizes easily to trusted compilers. This technique would be biased to AOT compilations or persistent runtime models instead of repeating a JIT each run.
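The check-once-then-cache scheme above can be sketched as follows. This is a hedged illustration, not a real OS interface: the validator, key handling, and cache structure here are all stand-in assumptions, with an HMAC over the executable's bytes standing in for the cryptographic sign-off.

```python
# Sketch of "check once, sign off, cache": run an expensive validator once
# per executable, record an HMAC over its bytes, and on later loads only
# verify the cached tag. All names here are illustrative.
import hashlib
import hmac

KEY = b"kernel-secret"   # stand-in: in reality, kernel-held key material
cache = {}               # path -> tag recorded at a prior validation

def expensive_check(code: bytes) -> bool:
    # Stand-in for a real type/proof checker: here we just reject code
    # containing a raw x86-64 'syscall' opcode pair (0F 05).
    return b"\x0f\x05" not in code

def tag(code: bytes) -> bytes:
    return hmac.new(KEY, code, hashlib.sha256).digest()

def load(path: str, code: bytes) -> bool:
    t = cache.get(path)
    if t is not None and hmac.compare_digest(t, tag(code)):
        return True                   # fast path: cached signature matches
    if expensive_check(code):         # slow path: full validation
        cache[path] = tag(code)
        return True
    return False

print(load("/bin/ok", b"\x90\x90"))   # first load: slow path, validates
print(load("/bin/ok", b"\x90\x90"))   # second load: fast path, cache hit
```

The point of the cache is that the expensive check amortises to once per executable version; if the file's bytes change, the tag no longer matches and the full check runs again.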

We can also target a simpler bytecode that is trivially compiled in a high performance way for modern processors. (Or whatever kind of processor you're targeting.)

Architecture vs Optimisation

Although what you say about virtuous cycles seems correct, it overlooks the architectural issues in hardware. Register machines are faster than stack machines because they work well with pipelining. Whilst you could overcome this with extra hardware, that hardware would itself increase the time taken to perform operations and lower the maximum clock speed. The hardware architecture is the biggest influence on speed, and it has nothing to do with 'C'. C was designed to fit the best performing HW architecture.

Since then optimisations have favoured 'C', but I would contend that all the optimisations put together do not overcome the architectural advantage.

So what we have is, we design the best performing hardware architectures (which is currently super-scalar out-of-order single-instruction-multiple-dispatch register machine). This has nothing to do with the languages it will run.

We then need a language that will run efficiently on this optimal hardware architecture, so 'C' does nicely. Then we optimise the operations as much as possible based on real world (application) usage of the language. So I would argue that the optimisations are more targeted at the applications written in the language rather than the language itself.

I think people who would like their favourite language to get special hardware treatment do not understand the hardware, and think that they can push any operation down to the hardware and it will be efficient. I don't think this approach can ever work. You have to look at what the most efficient hardware structures are (where I define efficient as providing the highest performance with the fewest transistors) and work from there. Mutable register CPUs are not going anywhere, and experience has shown that we are at a sweet spot: neither stack machines nor VLIW are faster for general purpose computing.

re: architecture

I agree that architecture is important. To get optimal performance, we need a computing model that is nicely aligned with physical constraints, or at least those we know how to take advantage of through technology. Register machines do have an advantage here, because they require a constant amount of space which corresponds to a fixed allocation of hardware.

OTOH, most C programs already model stack machines. They, after all, perform much computation on a stack. We already know many hardware tricks to make this efficient. Caching gives us a limited illusion of locality of the stack in hardware. The more active elements on a stack can be represented in registers, which we can understand as a much closer cache than even L0. These performance tricks can be implemented by hardware. I don't see how they would hinder CPU pipelining.

I don't believe I was "overlooking" anything with regards to architecture. If I said something like "logic programming can be hardware accelerated" without qualification, that would be much more dubious because of the ad-hoc fan-ins and fan-outs and the unclear representation.

In any case, the architecture we use today is far from the only model that is a good fit for physical and technological constraints. Where you say "C was designed to fit the best performing HW architecture", you really should qualify that with an "of its time". The best performing HW architecture today is the GPGPU. Centuries from now it might be something entirely different, perhaps specialized pluggable neural networks for sound and holographic video and language, etc..

Also, performance isn't the only property we could weight heavily. Power consumption, secure distributed computing, survivable networking, ease of physical and logical composition, precise timing and latency, etc.. are other considerations that could be supported with hardware.

The market values performance, of course. People have no interest in paying for features they don't understand, or that seem to do nothing at all when successful (like real security, as opposed to security theater). Just a foible of humanity.

I would argue that the optimisations are more targeted at the applications written in the language rather than the language itself.

I agree with this. I even wrote it initially, when I spoke of "C programs" and "C-family operating systems".

But languages affect how programs are expressed and which compile-time optimizations are feasible. So the net effect is that hardware optimizes for a particular 'way' of expressing applications, and hence for a particular family of languages.

people who would like their favourite language to get special hardware treatment do not understand the hardware

I expect this is true of many people. Yosef Kreinin had a nice blog post to debunk the myth that we can just shove any computation down to hardware and make it run faster.

But blanket categorizations are unwise.

Despite chaff from widespread misunderstandings, there are many things that can be hardware accelerated, and can be effectively leveraged by alternative programming models or languages, or infrastructure for the same (GC techniques, etc.).

Alternative efficient hardware architectures will not succeed without enough support from the rest of the ecosystem: algorithms, programs, and operating systems well situated to leverage them, applications to massage data into the right format, economies of scale in manufacturing. And trying to alter the ecosystem is generally expensive and risky and scares a lot of people who might be well positioned to oppose the change. Underestimating systemic forces is no less naive than assuming dedicated hardware can usefully accelerate anything.

Aside: I'm interested in statically allocated functional language for efficiency on register machines or fixed allocations on an FPGA. It would be very nice to make robust assertions about which functional subprograms (possibly folds over a stream) could be implemented in a fixed space. FP or FRP with affine or linear values is actually a pretty good fit for most computer architectures, even down at the bit level.
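The fixed-space fold mentioned in the aside can be illustrated concretely. This is a sketch of the property being claimed, not of any particular language design: the fold below keeps only a constant-size accumulator, so in principle it could compile to a fixed register or FPGA allocation.

```python
# A fold over a stream that retains no history: the only state is the
# accumulator, so the space needed is fixed regardless of stream length.
def fold_stream(step, acc, stream):
    for x in stream:       # O(1) state: just 'acc'
        acc = step(acc, x)
    return acc

# Running mean in fixed space: the accumulator is just (count, total).
mean = fold_stream(lambda a, x: (a[0] + 1, a[1] + x), (0, 0), range(1, 101))
print(mean[1] / mean[0])   # -> 50.5
```

The interesting static-analysis question is exactly which subprograms admit such a constant-size accumulator; a fold that accumulates a list, by contrast, does not.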


GPGPU is only the best-performing HW architecture for applications that are similar to its original design space. Specifically, GPU hardware cheaps out on both memory coherence and decision logic -- so, it is very efficient for tasks that need neither, but inefficient at tasks that benefit from both.

I agree that there is a lot of specialization in the design of commercially successful general-purpose processors, but I would argue that memory coherence and decision logic are so generally and fundamentally useful that their absence should be counted as a specialization, not the reverse.

GPGPU mostly depends on a

GPGPU mostly depends on a bunch of threads doing basically the same thing only on different parts of relatively adjacent memory. Adding the "missing" hardware mostly doesn't even make sense under that specialized processing model (relevant missing hardware has been added already in the higher end GPGPU solutions).

Not general purpose

Which goes back to my point about super-scalar out-of-order single-instruction-multiple-dispatch register machines being the most efficient general purpose CPU architecture we have, as GPGPU is not general purpose, but only more efficient for some tasks. Many tree algorithms, like the upper-confidence-bound Monte-Carlo technique, do not parallelise, and if they did, each thread would take a different branching path.

Right, the name is a bit of

Right, the name is a bit of a misnomer. My only point is that putting those features back wouldn't fix the problem since they wouldn't make sense. We are still far away from a general efficient parallel processor architecture.

Control Flow Does Not Parallelise

I think the situation is worse than that. I don't think control flow really parallelises at all. Highly parallel machines are great for specific calculations, but you still need to get the result of a calculation before you can make a decision based on it. Fundamentally, programs are a sequence of decision points interspersed with calculation. Yes, running background tasks and servers requires many of these processes in parallel, but each one needs to be able to take an independent branching path.

Approaches like (hardware) actors translate the temporal nature of a sequence of decisions into a pipeline, which is great if you need to make a large number of the same decisions, like dataflow architectures, and are great for things like mpeg compression/decompression, encryption etc.

Hardware/Software co-design has been trying to address this problem for at least 20 years with no real progress towards practical applications, so that you could configure an FPGA as GPU-like parallel workers, or as a pipeline-like dataflow for compression, as needed. You could augment the CPU instruction set with custom instructions loaded into the FPGAs for parallel and pipelined operations. It all sounds great until you consider sharing that resource between processes, and dynamically loading different combinations of instructions to optimise your usage of the FPGA. The problems are that FPGA configurations are large, and hence unacceptably slow down context switches; that fitting different blocks together dynamically requires placement and routing, which is a hard optimisation problem you don't want to be doing at runtime; and that some part of the FPGA compiler would need to be present in the operating system at runtime.

The more I think about it, the more I think the best option is to keep going with the current architecture. Increasing the size of the re-order buffer and increasing registers and ALUs allows longer loops to be pipelined in the CPU core itself. More SIMD units and wider data words allow GPU-like functionality. More cores allow more parallel processes. I think the dedicated GPU should disappear when the CPU can have dozens of cores, and each core can run SIMD on wide data words (equivalent to an entire warp in the Nvidia architecture). SIMT on the CPU seems a logical next step. This will happen for the same reasons the dedicated FPU disappeared. I think attaching the GPU warps directly to each CPU core would lead to a more efficient architecture, and avoid problems sharing GPU resources between processes. You would also save on having separate control-flow logic in the GPU.

Edit: This seems interesting along these lines: https://www.irisa.fr/alf/downloads/collange/talks/ufmg_singleisa_scollange.pdf. I like the OO-SIMT point in their continuum, but also the idea of providing a range of cores at different points, optimised for low-power, sequential performance, parallel performance sharing the same instruction set seems really sensible. You can migrate daemon processes that don't do much to low power cores, enabling the idle machine to really cut power consumption, but provide fast single thread, and parallel performance when necessary. Writing custom code for the GPU would disappear, but we would have to all get more used to SIMT, and perhaps this is where a language specifically for SIMT would be interesting. This is the only paper I have been able to find on compiling for SIMT https://www.researchgate.net/publication/261313978_CuNesl_Compiling_Nested_Data-Parallel_Languages_for_SIMT_Architectures which appears to be based on the parallel language NESL.


Control-flow has never really been an issue in parallel architecture designs and it scales just fine - the issue is the difference between independent decisions and data-dependent decisions. In particular, when data-dependent decisions depend on the results calculated by previous instructions. Latency is a much harder beast to slay than throughput. Control-flow that can use latency-hiding tricks works just fine. The original motivation for vector-style arrays in GPUs was to reduce the expense of the fetch-decode logic by amortising the transistor cost across multiple execution units. This makes sense for some problem sets in some cost models for the hardware design.

I would speculate that has gone out of the window over the 10+(?) generations of hardware that we've seen on GPU boards. The transistor count for a core that can do double-precision floating points operations probably makes the decoding of the in-order instruction stream an insignificant cost. MIMD machines have been around forever - whenever the cost of decode is high in comparison to execution we see SIMD designs dominate, and later in a given architecture cycle when the cost of execution dominates we see MIMD machines providing more flexibility.

There are several sweetspots - hence the various architectures that dominate their own particular niches right now - but in order to see those sweetspots you cannot separate power-efficiency from absolute execution speed. The main advantage of a GPGPU right now is that large vectors of similar executions allow a huge amount of latency-hiding when coupled with an expensive cache. These are generally the types of computations that gain a massive speedup on a GPGPU architecture. The wideness of the problem (in terms of independent sub-problems that are only data-dependent on the results from a previous phase) needs to be several times larger than the number of cores on the design - not least because the warm-up cost of the cache and the fairly deep pipelines now in use inside the cores impose a huge program-switching cost in terms of wasted cycles getting back to a steady state.

If your problem is narrow and deep - if you convert it into an acyclic DAG, e.g. using the SSA transform, and the DAG-width is low - then there is little that parallelism will buy you. It only makes sense to push diminishing returns (in terms of absolute performance at the expense of efficiency) as far as you can, and then you get something like a modern x86 design with a huge ROB, large register file and very low-latency arithmetic/logic operations. It's not elegant, it's not pretty (other than to an architecture geek), but it will maximise absolute performance for problems in that domain.

If a problem is narrow and shallow then you have plenty of independent sub-problems, so scaling is relatively easy regardless of the architecture that you choose - but a GPGPU-like design will probably give you the best power-efficiency.

If a problem is wide then you have a huge opportunity for intra-problem parallelisation, but you will require a lot of scratch memory to exploit that opportunity. A register machine is a good win here because the transformation onto the register-addressed version is simple and well understood. Where this becomes complex is that if the wide problem is also quite shallow, then you may maximise both performance and efficiency by ignoring the potential for intra-problem parallelisation and simply scheduling it on a vector machine, where the small job size allows exploitation of parallelism between the jobs. This also avoids the need for a complex architecture with lots of scratch memory running at high speed.

For these reasons it is likely that we will continue to see architectures that are narrow and in-order to maximise power-efficiency on latency-tolerant problems, out-of-order designs to minimise latency on data-dependent chains, and vectors to hide latency on problem types where that is possible. Unless all of the computing jobs that we do collapse into one of these buckets, I would find it very unlikely that one architecture would be "better" than the others. If you are interested in the many-core-with-large-vectors style of architecture then check out the Loongson family, as they are doing interesting things in that space.

Control Flow (qualification).

Perhaps I should qualify that with "control flow for most common tasks does not parallelise". I think the demand for a fast single threaded processor is not going away, except in specialist niches like scientific computation.

You don't mention SIMT which I think is an interesting architecture between SIMD and SMT. If x86 gets a SIMT instruction set, to complement the SISD and SIMD instruction sets, it could do a lot of what current GPUs do.

I think the idea of a range of cores accepting different workloads is a good one: in-order low-power cores, out-of-order cores for fast pipelining, and highly parallel GPU-like cores. I think it's important that they all use the same instruction set, and behave like different cores in the same CPU, rather than the GPU being like a peripheral.

GPGPU design space

Is your first sentence any less true if you replace GPGPU by CPU? No matter which HW architecture we use, we end up squeezing into it apps (or fragments thereof) that are a poor fit. GPGPUs are nonetheless a fine target for many domains: machine learning, probabilistic models, physics sims, graphics and sound processing, etc..

Almost any "decision logic" can be enhanced by fuzzy, weighted, or probabilistic models. Frequently, doing so improves the results in relevant domains (face detection, business decisions, natural language processing, soft constraint systems, etc.). Even if you feel these aren't an enhancement, they are strictly more general, because binary logic is simply a weight of zero or one. The binary decisions of CPUs are arguably the specialized case. The default behavior of GPUs to "take both branches" and combine weighted results is far more general than most programmers initially credit.
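The "binary logic is a weight of zero or one" point can be shown in a few lines. This is just an illustrative sketch of weighted selection, not how any particular GPU implements divergence:

```python
# Branchless weighted select: both 'branches' are evaluated and blended.
# A hard boolean decision is just the special case of weight 1.0 or 0.0.
def select(weight, if_true, if_false):
    return weight * if_true + (1 - weight) * if_false

a = select(1.0, 10, 20)   # ordinary binary decision: picks 10
b = select(0.0, 10, 20)   # ordinary binary decision: picks 20
c = select(0.7, 10, 20)   # fuzzy decision: a 70/30 blend of both branches
```

GPU lanes that "take both branches" under a predicate mask work on the same principle: every lane computes both sides, and the mask (a 0/1 weight per lane) decides which result survives.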


I am not convinced by the fuzzy logic argument. Computers, like the mechanised loom from which they descend, are so useful because they reliably repeat the same instructions over and over. Probabilistic decision models have a limited application to pattern recognition, but not to general computing.

A general purpose CPU can generate 3D graphics, and perform all the operations GPU can, although more slowly. A GPU cannot perform all the operations a CPU can due to lack of hardware security rings, MMU etc. I definitely think adding SIMT instructions to a CPU would be a good idea, and would lead to a genuine general purpose hybrid CPU/GPU.

Human intelligence just

Human intelligence just emerges from a bunch of fuzzy recognizers, and it is considered pretty "general."

The Intel Phi adds SIMT, though I don't think it (or its descendants) has been at all successful yet.

Human intelligence just

Human intelligence just emerges from a bunch of fuzzy recognizers

Perhaps somewhat aside, but I suspect it doesn't "just" emerge; I suspect there's a "trick" to it. Keep piling together fuzzy recognizers, without knowing the trick, and you won't get a sapience, just a really big pile of fuzzy recognizers.

I'm sure it just emerged,

I'm sure it just emerged, unless you believe in intelligent design or something (or rather, the trick "emerged" in the way convolutional nets did). It is quite fascinating when intrinsically fuzzy computational units (e.g. neural nets) can start doing unfuzzy things like math and logical reasoning. I wouldn't be surprised if future computing hardware follows a similar path. That we are using CPUs and GPUs for the fuzzy stuff (and even non-fuzzy stuff) right now is probably not very optimal.

If it were that easy,

If it were that easy, it would have happened a lot, and I don't think it has. (This is exactly the subject of the blog post I linked a little while back in another LtU topic; sorry to link it again, but, this.)

We are getting awfully close

We are getting awfully close with deep learning. Frankly, the only thing left is to figure out arbitrary abstraction usage in addition to the ability to form abstractions that ML already provides. The rest is just scale.

It will be interesting when we know enough to create custom hardware for this kind of stuff. I bet what we think of as general purpose computation (with ALUs and coherent memory) is actually the special case.

Deep Learning

Deep learning is not awfully close. DL can categorise things hierarchically, but shows no sign of consciousness. It's still just a sophisticated associative memory. There's nothing really new about it: multi-layer networks have been around since Kohonen, and there has really been nothing new in the field since then. All DL really does is fix the training algorithm. Basically, we could not get multi-layer training to work how we thought it should just using back propagation. The DL breakthrough was to train only the first layer until it became organised before starting to train the next layer, and then to proceed one layer at a time up the hierarchy. There is no difference in the network architecture, and the theory of what they recognise has not changed.

Personally I think a lot of human potential is pre-coded in the brain architecture. Natural selection over time has evolved the brain architecture, and there are subtle differences between people (obvious ones emerge as tendencies towards Asperger's, neurosis, psychosis, etc.). Neural networks are just a basic component, like a transistor. To make a brain, neurons with different connectivities and transfer functions need to be arranged into different topologies, and topologies of topologies. The simple neural-network neuron (a weighted sum) is also probably not the right model, as it has no local or temporal inhibition. Networks also have a separate training phase, not continuous learning. We probably also have to understand sleep.

Edit: sparse coding, which is used in some DL, does deal with a form of local inhibition, but sparse coding is itself a kind of deep-learning algorithm, so it's not a requirement to sparse code for something to be considered DL.

Intel Phi

Does the Phi use the x86 instruction set? Does it add SIMT to the x86 architecture without slowing down single threaded execution? Basic SIMT added to x86 is what is needed, not an add on card that is more like a GPU. I don't think the Phi is the right approach.

Phi uses x86, with wide AVX

Phi uses x86, with wide AVX (512 bits) and 8-way SMT.

Statically allocated functional languages

I am interested in this too, perhaps where resources are handed out by the OS or something else. However, in-place mutation (for example the in-place string reverse by swapping bytes) is simple and efficient, so I am not convinced functional is really what is needed. Regular procedures (defined by Stepanov as procedures that always generate the same output given the same input) seem a more useful concept. They give localised control of side effects, so a procedure passed a string can return that string reversed. When languages like ML provide mutable references, I am not sure there is really much difference. You might as well add looping primitives to improve readability, and non-closure-forming blocks to improve performance. There is definitely something interesting between a pure functional and an imperative language, without garbage collection, and with some kind of side-effect control.
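The in-place string reverse mentioned above, sketched on a mutable byte buffer. The point is that the procedure mutates its argument, yet is 'regular' in Stepanov's sense: the same input bytes always produce the same output bytes, so the side effect is fully localised.

```python
# In-place reverse by swapping bytes from the ends inward.
# Mutates its argument, but is deterministic: same input, same output.
def reverse_in_place(buf: bytearray) -> bytearray:
    i, j = 0, len(buf) - 1
    while i < j:
        buf[i], buf[j] = buf[j], buf[i]   # swap the two ends
        i += 1
        j -= 1
    return buf

s = bytearray(b"hello")
print(reverse_in_place(s))   # -> bytearray(b'olleh')
```

A caller holding the only reference to `s` observes nothing a pure function wouldn't give it, which is exactly the observation behind the affine/owned-array reply below.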

In place mutation in pure fp

I agree that in place mutation is useful for performance. But it's available with affine (single reference or 'owned') arrays or vectors. No need for a procedural semantics.

FP on top.

I don't see procedural semantics as necessarily a bad thing, and I also think that functional semantics are more complex than procedural semantics. Naturally, things are mutable, have identity, and are multiply referenceable, like objects. I think more restrictive semantics are built on top of this by adding restrictions. I realise this is contrary to a mathematical notion of simplicity, but mathematics is already "unreal" and abstract. It is in effect the wrong notion of simplicity for electronic computers. The correct notion of simplicity is based on registers and combinatorial logic, the mutable fabric on which computers are built. The finite-state-machine is the simplest model of an electronic computer, and is where mathematical modeling should start. A function is a more complex abstraction built on top of this, and requires certain proofs; for example, we must prove immutability, termination, or any other property we want the FSM to have.

Natural and simple rarely coincide

I disagree with your opinions about simplicity. And I think "natural" things are rarely simple (evidence: brains, jungle ecosystems, English language, 3-body gravitational systems, DNA, weather systems). Indeed, I tend to assume "natural" as better described by "so complex that we barely comprehend it even with our best theories".

Regardless of your other beliefs favoring procedural, the specific issue of in place mutation with pure semantics has been handled in several purely functional languages: clean, mercury, ddc haskell, safl, and my own.

Natural functionality is

Natural functionality is emergent through lots of random trial and error; it makes sense that we couldn't reverse-engineer it, just as the machine-learned models we can produce today are pretty much opaque. It is definitely very stateful.

Different goals on simplicity

First, I'm happy to see you here again :-)

I've argued along the same lines as you, but I recognize that Sean has different goals.
He wants to enable people to program more intuitively.

I'm not sure about you, but for instance Dijkstra cared about correctness, and morphing the mind of programmers to teach them the ideas that best supported correctness. When people objected "that's too hard", he pointed out that surgery doesn't get such criticism.

Hopefully, once we'll figure what people should learn to program well, we'll also learn to teach it effectively. But I fear this will take decades, and I'm not sure what will happen of programming by then.

Surgery is nothing like

Surgery is nothing like programming. Heck, surgery doesn't have universe control (and responsibility) like programmers do. I guess if you asked me to repair a program, while it is executing, with a patch job that wouldn't kill the space probe it was running on...well, that would be surgery! And correctness would involve practicing and simulating the repair multiple times before it actually happened. Thank god for simulators, because space probes are expensive!

Generally, correctness is more a matter of the domain than technique. Does your problem have notions of correctness? What are they? Can they be represented somehow? In the waterfall world, we'd then figure everything out up front, write a spec, implement that, and ship.

So maybe programmers should only write programs with meaningful correctness properties, everything else be damned? Dijkstra was wrong because the world actually needed programs written where people didn't even really understand the problems very well. Programs generally suck not because not enough rigor went into their construction, but because requirements kept cropping up after work began, or necessary requirements were never discovered before work ended.

But the crappy program we have is better than the perfect one we don't have.

Anyways, we know where this will end. Eventually programs won't be written anymore, of course. We might have only a few generations of programming left, enjoy it while it lasts!

Trained correct intuitions

Sean and I share a lot of simplicity goals, including both an ability for computers to help program and the ability for people to program more intuitively. But we do have some divergent hypotheses.

I'm more interested in trained intuition, where our intuitions are something we (or another generation) learn from a young age by interacting with applications and programs designed in a certain way - one designed for extensibility and composition - because invariants and inductive properties, when composing larger programs from smaller ones, are an excellent foundation for trained intuition. Sort of like conservation laws in physical systems.

I agree with Sean that 'correctness' of a program is very domain or problem specific, and not every problem has a well defined notion of correctness. But we can at least aim for correct intuitions about program behavior, how a change in a subprogram will affect things.

IMO, humans have a lot of bad natural intuitions. So, appealing to untrained intuition strikes me as a mistake. And not just in programming.


What is an untrained intuition?

Design is the difference

Design is the difference between training and other forms of learning or experience.


I think I get what you're aiming at, but that last sentence is far too cryptic for me.

You can develop intuitions about programming through training. I see that. And design is probably going along with your intuitions, so they better be well developed.

But an untrained intuition? All intuitions are trained. Maybe something along the line of: badly applied intuition? Or intuition out of context.

Needs more work.

Most intuitions are

Most intuitions are untrained. You seem to be conflating "trained" with "learned". But training connotes something closer to "learned by design". One might compare weight training vs. gaining muscle through strenuous work. Training can be subtle, and doesn't need to be thought of as such by all participants - e.g. a dog we train might think of it as play. But it is not the case that all play is training.

An intuition learned via arbitrary experience is not trained. But unlike our muscles, our brains are full of heuristic biases that easily lead us astray. Confirmation bias is especially bad when forming intuitions. The main problem with untrained intuitions is that they tend to be inconsistent or incomplete. Intuitions that lead to bad conclusions aren't necessarily 'badly applied' or 'out of context' - they can be flat out wrong, like bigotry or superstition. I am not aware of any way to reliably instill correct intuitions without designing for them.

Don't see that

Most intuitions are untrained.

I imagine some stuff is hard-wired into one's brain. Though I wouldn't know concrete examples where I'd be absolutely sure something is hard-wired and not trained. Love for kittens? Wouldn't know.

But looks to me that almost all intuitions are developed from training, or learning. I guess I conflate the two. But you train responses right? At some other level of abstraction that might involve learning. Looks like a matter of getting our definitions to align.

I don't think correct intuitions exist. You might have an intuition leading to a solution, should that be labeled "correct"? It worked out, maybe it doesn't work out in a different scenario.

I don't think correct

I don't think correct intuitions exist

Now that you mention it, one might plausibly reckon that the absolute notion of correctness has fundamental limitations described by Gödel's Theorems, whereas intuition is closer to the non-discrete realm to which Gödel doesn't apply.

Some intuitions are

Some intuitions are basically hardwired via evolution. Like our fear of snakes, and some in-built knowledge about nutrition that arises only in starvation contexts.

Many biases are simply generalizations on experience or instilled in us by our biased environment. I would guess cognitive biases are a combination of the two, though if the former, I wonder what evolutionary advantage they gave us? E.g. perhaps confirmation bias allowed us to harmonize reality with our beliefs, leading to more confidence (with the idea that confidence conveys an advantage in competition). Of course, not all adaptations have to be beneficial, as long as they aren't too detrimental. Otherwise, perhaps cognitive biases are mostly learned, and have worked out to good results enough that we trust them.

Biases are quite dangerous when applied to situations that we aren't evolved for...e.g. when flying airplanes (we never evolved flight capabilities) you should really ignore your instincts and trust your instruments. ML also allows us to create generalizations over data that are completely free of human bias.

As for programming, I'm not really sure how David is applying them here. Why OO is so successful is because it uses our evolved linguistic hardware, and why FP can be hard to learn is because our skills in math are mostly learned on top of that linguistic hardware (which gave us the ability to form abstractions in the first place, but not the ability to free them from our other biases, and our ability to abstract with names isn't very useful in math).

OO and language

OO appeals to our linguistic intuitions, and this contributes to its popularity. But it's an incorrect and misleading intuition. OO locks us into a particular view or model, whereas language allows us to speak and think of an object or system in multiple ways.

Appealing to intuition is only a good thing if those intuitions are valid, lead us truly. Otherwise, it's just something to sucker us in then trip us up until unlearned. Better no initial intuitions at all than what OO offers.

Arguably, relational programming does a better job of correctly leveraging our linguistic intuitions and hardware. We can manipulate relations and take multiple views like we do language. But the connection isn't obvious, so the intuitive appeal is weak.

In comparison, functional programming doesn't strongly appeal to any natural intuitions. Intuitions about functional programs are learned with minimal bias, and tend to be oriented around invariants like purity and determinism, or inductive properties like termination and type. Consequently, such intuitions tend to be correct.

Designing for correct intuitions requires guarding against misleading intuitions in favor of valid natural intuitions or initially neutral models.

You can speak about things

You can speak about things in many different ways, but we place importance on nouns; anti-noun programming paradigms are at a distinct disadvantage. Programming is hard in general, and flinging around constructs with nominal bindings can help us deal with complexity, or at least defer or not think about it for a while. Truth is defined by value in functional programming, which doesn't allow for much vagueness (for better or worse, of course).

Perhaps if humans were more like vulcans, things would be different! But you can't just reject or deny human nature in language design. Our minds took millions of years to evolve language, and it isn't economical to just ignore that hardware to eliminate all bias, especially for most people who have other things to worry about.

But in ML, by all means we should do this. There is no reasons for machines to have the same biases that we do.

Even with nouns, we can have

Even with nouns, we can have many competing noun models for describing the same system. OO still locks us into just one of them, while natural language allows us to freely shift views. AFAICT, the issue of nouns vs verbs is irrelevant to my argument.

Programming is hard in general, sure. But misleading intuitions make it worse, not better. Misleading intuitions can cost entire codebases. If you believe it isn't economical to avoid bias and aim for neutrality, then you should design to leverage natural intuitions without misleading.

I disagree with the premise behind your Vulcan argument. The fact that we are weak at logic makes it important that our languages help with it. If humans were more like vulcans, a language that makes it easier to indulge our fallacies wouldn't hurt.

Values do allow vagueness, btw. You just need to model it. Some paradigms do this as a matter of course. And there are plenty of techniques to work with placeholder values for deferred top-down programming in FP.

Addendum: it seems we've derailed into OO vs FP. I'm abandoning this train now.

Words have flexible,

Words have flexible, ambiguous, vague meanings, which is the whole point! What you see as a weakness, I see as a strength (and also a weakness). The words are there to enable communication. I don't understand what you mean by this. What do you really mean? They start a conversation between the programmer, other programmers, and the computer.

The problem is that most problems are fuzzy to begin with. You don't have enough information, you might not ever have enough information, you have to fill in those gaps somehow...biased intuition it is. Programming paradigms that assume 100% perfect information totally fail. All hail worse is better.

The problem with OO's appeal

The problem with OO's appeal to linguistic intuitions isn't that words are flexible, ambiguous, or vague. It's that OO and natural language use words in distinct ways, and consequently our natural language intuitions mislead us when applied to OO. Natural language words are *descriptive* of some observed system, and this is why we can use many different, overlapping words and views. OO uses words *prescriptively* to define or construct a system. Together with mutation or information hiding, this results in a dominant model, which hinders alternative views (schemas, object or noun models).

While irrelevant to my argument, I have not advocated paradigms that require 100% information. If you practiced FP, you would quickly learn how to program top down with todos, incomplete definitions, open types, etc..

Have you ever seen a

Have you ever seen a declarative description of a dish that you could figure out how to cook? No, this is why most recipes are prescriptive. I think I've said many times before that imperative programming and OOP are joined at the hip: they both fit pretty well into how things were done before for boring business tasks that benefit from automation.

Functional programming eschews vagueness and hiding for pure truth. Beyond LISPy REPLs, the thinking process gets the shaft, hence the continued success of the "worse" imperative programming/OOP paradigms. You can express partial solutions, but can you make them complete? Does the language help with that at all? If you just focus on the resulting end product, you would never even think about these problems.

A dominant object model

A dominant object model isn't much of a problem until programs grow to larger scales. I've never seen a 500 line recipe I could figure out how to cook, much less one with 100k lines written by a time-varying team of people over multiple years. Things don't remain the same as recipes at the scales of typical applications, regardless of what our fallacious human intuitions have to say on that subject.

I disagree with your allegations about FP. The thinking process is perhaps different than you'd use for OO, but it's still there. The ability to reason *locally* about code and behavior - i.e. reasoning about code even while some references are unknown or incomplete - is one of the major draws to FP. Developing spike solutions and rapid prototyping and elaborating pseudo code and refining typed holes - all those "thinking through coding" mechanisms - aren't any more difficult in FP than in OO. The ability to confine effects makes many gradual development techniques (e.g. continuous feedback via tests or simulations) much more generally applicable.

And anyone who asserts FP is about "pure truth" obviously hasn't tried it outside of toy examples like quicksort or Fibonacci functions. FP is a complete programming paradigm, and so ultimately deals with application models, user input, business logic, heuristic decision making, etc.. But this is only obvious to someone that actually practices FP. For everyone else, it's a form of FUD.

Again, the focus is on

Again, the focus is on "thinking locally about code and behavior", not developing it in the first place. Code writing, not code reading.

So let me elaborate what I mean about pure truth: equational reasoning. Yes, you don't have to do that in, say, Scheme, where plenty of OO-like imperative abstractions are available, but in Haskell you seem to get stuck there; e.g. you can never hide an effect without an impure operation, and the only way to make what effects occur configurable is to go parametric early (more abstraction). It's not FUD at all, but by design of the actual language. And I don't think we are talking about the same kind of truth; you keep bringing up things like user input, heuristic decision making, etc., that have nothing to do with writing code.

OO Semantics vs. OO Methodology

I think it may be useful to distinguish between language semantics and design methodology. The main problem I have with OO is that the language semantics bring unneeded complexity to most problem domains (doubly so if everything is imperative). I don't think that all OO ways of thinking about objects and relationships are bad, but usually the relationships could be expressed in a more semantically clean way in a language with a cleaner functional semantics.

I have similar complaints as you do about effects in Haskell. Also, I think pervasive map and zip are pretty bad for the same reason that I don't like concatenative languages - names have better stability than incantations that directly reference elements of a value structure. It's a separation of concerns problem. But functional languages tend to be able to much more crisply model real world relationships than do OO languages, because OO languages bring along a bunch of semantic baggage.

the only way to make what effects occur configurable is to go parametric early (more abstraction)

Parameters have got to be the simplest semantic way to achieve configurability, so you must be complaining about a UX / presentation issue here.

I agree with most of what

I agree with most of what you are saying here. My problem with going parametric early is just that you might not be sure about what you want, but you have to build a nice clean abstraction anyways, and then throw it away when you realize you really needed something else.

Dynamically typed languages have a real advantage in this case; they just don't require so many pre-commitments and allow one to debug their ideas. They aren't really great for end products, and are fragile to changes after they've settled down a bit. Gradual typing makes a lot of sense in that case, and I guess even Haskell is getting in on that.

OO languages do have a lot of baggage, and are not that great, especially given the way that OO is realized today. But we can't ignore why they remain popular, it isn't just momentum and good marketing. Supporting more of the problem solving process in our languages will lead to better languages.

Static typing

There are legitimate complaints to be lodged against many static type systems, but I don't think dynamically typed languages offer any serious advantage over a well designed type system. Some form of gradual typing is a good idea, but even when typing is dialed down to zero, it should always infer coarse Hindley-Milner style types -- but with support for unsafe casts. The cost of having to add a few casts in the rare cases when you want to do something tricky is far exceeded by the benefit of having inferred types around in all of the other cases.
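The "inferred types plus a handful of unsafe casts" workflow can be loosely illustrated with Python's gradual typing, where `typing.cast` plays the role of the escape hatch: the checker trusts the annotation and nothing is verified at runtime. This is only an analogy to the HM-style proposal, not an implementation of it, and `parse_config` is an invented example:

```python
from typing import Any, cast

def parse_config(raw: dict[str, Any]) -> int:
    # Most code stays checked with inferred/declared types; the one
    # tricky spot gets an explicit, local, unsafe cast that we assert
    # and the checker simply trusts.
    port = cast(int, raw["port"])
    return port + 1

print(parse_config({"port": 8080}))  # -> 8081
```

The cost is a single visible annotation at the point of trickiness, rather than ever more advanced type features to convince the checker.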

But I didn't even know we were talking about typing. Was the configurability you mentioned earlier only needed as a work around for static typing of effects? If so, I think that's probably a limitation of whatever typing system you have in mind. I don't think there's anything particularly difficult about typing effects. Something coarse and reasonable can always be inferred that's much better than dynamic typing.

Type inference does help, of

Type inference does help, of course. Quantum types (the form of type inference that typeless provides) are also an option (compared to Hindley Milner) if you are into sub-typing. I'm not sure how far you can take it, though, it seems like there will always be a limit especially when you need more advanced type features to correctly type your code (type systems are conservative).

My point was that there are features that seem dangerous and stupid when looked at from one dimension. If we just look at the end solution, and maintenance or reuse of that solution, implicit effects looks bad, OO looks bad, dynamic typing looks bad, lack of correctness proofs look bad. The world is crazy for relying on these technologies! But if we consider the process of actually getting the code written in the first place, the features begin to make more sense, and the world isn't that crazy after all.

I mostly agree with that

It's probably true, though, that many of the properties that make software easy to maintain & modify also make it easy to develop in the first place. Explicit typing makes software less flexible and thus less easily developed (unless you happen to get it right on the first try). Same thing for most separable concerns - separating them is important. When I sit down to type in what I know about the problem I'm solving, if the language makes me commit to assumptions about aspects I'm not sure about, that's setting up potential rework down the road. Probably more importantly, languages that allow you to avoid such assumptions in a natural way will allow you to produce better libraries that, using natural idioms, can be reused in more situations. (For these reasons, untangling concerns is IMO the most important job of a language designer.)

Regarding type systems, the important caveat I added was that the type system should be weak, with support for explicit casts. Then you don't require ever more advanced type features. You can opt to just insert a handful of casts. Probably many around here consider casts to be ugly, but I think they're crucial for a good software engineering platform.

Do you have reference for Quantum types / Typeless? I think I may have looked at as it passed through here but Google isn't helping me much at the moment.

Local reasoning

Local reasoning is useful for both reading and writing code. When writing code, it allows you to refactor, abstract, and reason about incomplete programs. It enables you to more quickly trace bugs when they occur. It makes it easier to predict the consequences of editing code or replacing a subprogram. And you don't get "stuck" with equational reasoning: all the normal informal reasoning (about what a function 'does', etc.) remains available. Even hidden effects are easy if you model them (I've modeled object capability message passing systems in FP as a proof of concept), but hiding effects is much less useful for reasoning about and writing code than you seem to be assuming. An FP developer quickly gains the intuition to be aware of where external advice, inputs, or outputs may be necessary. The rest is easily deferred; I frequently add new effects or state to a pure monad. Seriously, you seem to be arguing from a position of extreme ignorance on the subject of FP.
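The "model the effects" point might be sketched like this: effect descriptions are plain data, the program itself stays pure, and a separate interpreter decides what the effects mean (here, canned test inputs). The `Print`/`Read`/`run` names are invented for illustration, not taken from any library:

```python
from dataclasses import dataclass

# Effects as plain data: values that *describe* I/O without performing it.
@dataclass
class Print:
    text: str

@dataclass
class Read:
    prompt: str

def greet():
    """A 'pure' program: yields effect descriptions, receives results."""
    name = yield Read("name? ")
    yield Print(f"hello, {name}")

def run(program, inputs):
    """One possible interpreter: feeds canned inputs, collects outputs."""
    out, it = [], iter(inputs)
    gen = program()
    try:
        effect = next(gen)
        while True:
            if isinstance(effect, Read):
                effect = gen.send(next(it))
            else:
                out.append(effect.text)
                effect = gen.send(None)
    except StopIteration:
        pass
    return out

print(run(greet, ["world"]))  # -> ['hello, world']
```

Swapping the interpreter (real console I/O, logging, a test harness) changes what the effects mean without touching `greet`.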

Addendum: Those heuristic techniques, weights, etc. are applicable to writing fuzzy code in a more literal sense. You can use them to guide construction of a function. I have experimented with this in both OO and FP, usually in context of declarative configurations.

Look, if you want to

Look, if you want to concurrently accuse me of speaking from extreme ignorance while also claiming OO has no benefits despite being extremely successful, please go ahead. But don't fool yourself into thinking that you aren't being hypocritical.

Success depends on your goal

Success depends on your goal. As an attempt to tangibly improve programming or productivity (relative to procedural) OO was largely a failure. As a religion, OO was very successful.

Hypocrisy in this case would require I speak from ignorance of OO. I used OO. It was my primary paradigm for fifteen years. I know the design patterns. I believed the hype. And I know how OO misleads people.

Further, I remember having similar doubts and ignorant opinions about pure FP. As an OO developer, even one who had recognized the counterintuitive nature of OO models, it was difficult to imagine programming without pervasive state and effects. With the benefit of hindsight, pervasive state and effects were a security blanket dragging behind me - unnecessary, a tripping hazard, and ultimately unsanitary.

declaring an imperative in cooking

Over at http://www.cookingforengineers.com/ they have a very interesting representation of procedures. Look for the little "tables" at the bottom of each recipe.

Rather than a list of steps, there's a list of ingredients along the left column, then they are combined over time moving towards the right column. Ultimately, you're left with a single row that represents the completed dish. This is a declaration of a parallel procedure in a way that is much clearer and more flexible than a traditional/singular/linear procedure for cooking.

My point here is that, as always, the ideal lies somewhere in the middle.

That is quite beautiful and

That is quite beautiful and elegant, thanks! I can't help but notice that many of the tables are linearized (meaning there isn't much parallelism in the recipe). Also, these are just summaries of the more complete linear instructions above. Finally, it seems like the tables are distilled from knowledge in the imperative specification...could you start by writing these tables instead of writing down the recipe step-by-step?

I totally agree that there are paradigms that can provide the best of both worlds (e.g. managed time), but new programmers (and maybe less experienced ones) are going to grok line-by-line first (it will take them a while to grasp the tables, but contrasting them with the line-by-line approach can help a lot).

Also, see Jonathan Edwards' paper No Ifs, Ands, or Buts: Uncovering the Simplicity of Conditionals.

parallelism and chicken vs egg

Regarding the inherent parallelism or lack thereof: Sure, many recipes, and indeed many programs, lack parallel algorithms. However, the parallelism in a sequential procedure must be recovered from the data and control dependencies. Whereas in the tabular cooking procedure, the parallelism is implicit in the representation. Knowing that there's no opportunity for parallelism saves me from trying to reschedule operations.

If you assume a single cook, then having pre-scheduled operations is convenient, but have you ever tried to do simultaneous prep/cooking with a spouse or friend? You need to recover the parallelism and reschedule anyway. It's also much easier to go from the parallel table to a naive sequential schedule: Just go left-to-right, then top-to-bottom. No need to jump around if you have a single core^H^Hok.

Regarding your question "could you start by writing these tables instead of writing down the recipe step-by-step?". Yes, absolutely. But it's only for processes that have parallelism. In the simplest case, you can take two sequential procedures and compose them in parallel. Consider making a main protein plus a side dish. Or more interestingly, you can join them by some combining operation, such as making a cake and the frosting, then applying the frosting to the cake. Sure, it's sequential at the bottom, but it's far easier to compose parallel recipes than to reschedule an interleaving of sequential ones.
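One way to sketch the tabular idea: represent the recipe as a dependency graph and let a topological sort recover both a naive sequential schedule and the batches that could run in parallel with extra cooks. The step names and dependencies below are made up for illustration:

```python
from graphlib import TopologicalSorter

# Each step maps to the steps it consumes (its predecessors).
deps = {
    "mix batter": {"sift dry", "cream butter"},
    "bake cake": {"mix batter"},
    "frost cake": {"bake cake", "make frosting"},
    "sift dry": set(),
    "cream butter": set(),
    "make frosting": set(),
}

ts = TopologicalSorter(deps)
ts.prepare()
while ts.is_active():
    batch = ts.get_ready()   # all steps runnable right now, in parallel
    print(sorted(batch))
    ts.done(*batch)
```

Flattening the batches in order gives a valid single-cook schedule; printing them as batches shows exactly where a second cook could help.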

Prep at my house involves

Prep at my house involves making the rice, pre-cooking the meat, and dicing the vegetables. We don't have enough space in our Chinese kitchen to really do anything concurrently, so it's my job to do it all before my wife gets home.

I am more interested in the design aspect of the process. In general, recipes are just concrete examples, declarative abstraction happens progressively as those examples are generalized (so you can create new dishes). But if you were just focusing on one recipe with little variation, a procedural encoding is probably the easiest way to do that. A parallel representation would come later as you realized what could be done in parallel, but you aren't starting there...it is more like a refactoring.

inherently unordered inputs

Even without abstraction, there are inherently unordered inputs to many operations.

If you're making cookies, you need to add to your mixing bowl flour, sugar, salt, butter, etc all at once in no particular order. Even in this individual and completely un-abstracted step, there's inherent parallelism. Compare with this random line from a cookie recipe I just found: "Sift together in a large bowl flour, baking soda and salt. To this add sugars, egg, vanilla extract and butter." Is it really that critical that I add the flour before the egg? No, probably not...

Traditional recipes tend to encode this sort of parallelism informally via the same mechanism they use to make the recipe appear less complex than it actually is: sub-steps. The aforementioned cookie recipe has 3 numbered "steps", but has at least 8 verbs of interest to perform: Preheat, sift, add, beat, stir, drop, bake, and cool. This is no different than procedural abstraction in an imperative language. Each subroutine has its own local schedule and some flexibility is possible within that subroutine.

To me, if I were writing this recipe, I'd probably just write:

batter = {flour, baking soda, salt, sugar, egg, vanilla, butter}

Here the curlies are set notation.

And yes, I'd do that without first considering the following intermediate representation:

batter = {}
batter += flour
batter += baking soda
batter += salt
batter += sugar
batter += egg
batter += vanilla
batter += butter
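A minimal sketch of the point, modeled with Python sets: the two "programs" below add the ingredients in different orders but denote the same batter.

```python
# Two different addition orders for the same unordered mixing step.
a = set()
for item in ["flour", "baking soda", "salt", "sugar", "egg", "vanilla", "butter"]:
    a.add(item)

b = set()
for item in ["egg", "butter", "flour", "sugar", "salt", "vanilla", "baking soda"]:
    b.add(item)

print(a == b)  # -> True: the order of addition is irrelevant
```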

Again, when I'm designing

Again, when I'm designing new food, I'm not so interested in what can be done in parallel yet. I'm doing it in "an order", other orders might be possible, but it isn't something I want to think about right now. My own head is single threaded, after all.

If I know what I'm cooking, or following a pre-existing recipe, then it is very different. I'm not thinking so much as doing...playing the part of an interpreter I guess. Or I could optimize a recipe I've already come up with, figuring out what parts can be done in parallel where ordering dependencies don't really exist.

abstraction by construction

I understand what you're saying, so let me paraphrase: You execute some procedure, you capture a trace, and then you generalize from that trace. Makes perfect sense, and I'm all on board with that plan. However, you can get some automatic abstraction by construction.

Consider this: Rather than record each individual operation in your trace as you execute them, you may modify and replace the most recent step as you execute new operations. Think of it like "Undo typing..." in a word processor, where you aggregate multiple typed characters, instead of one undo per individual character. Running with the ingredient mixing example: You executed batter+=flour and batter+=sugar, etc, but when you're done, you've got batter={flour, sugar, ...}.

You only need to introduce a new procedural "step" when you execute an operation that cannot be merged with the most recent step. Or generalizing beyond a single step, you can update recent steps up until some sort of barrier, such as a "sequence point", such as "put batter in pan" or "put pan in oven", or some generalization thereof.

I think that this sort of abstracting/compressing/optimizing/coalescing in informal reasoning is pretty natural, automatic, and convenient.
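The coalescing idea might be sketched like this: walk the trace, merge each `add` into the most recent step when that step is also an `add`, and start a new step at a barrier such as pouring or baking. The operation names are invented for illustration:

```python
def coalesce(trace):
    """Compress a linear trace into steps; 'add' ops merge into a set,
    any other op acts as a sequence point that starts a new step."""
    steps = []
    for op, arg in trace:
        if op == "add" and steps and steps[-1][0] == "add":
            steps[-1][1].add(arg)       # merge into the most recent step
        elif op == "add":
            steps.append(("add", {arg}))
        else:
            steps.append((op, arg))     # barrier: keep as its own step
    return steps

trace = [("add", "flour"), ("add", "sugar"), ("add", "egg"),
         ("pour", "pan"), ("add", "sprinkles")]
print(coalesce(trace))
```

The five recorded operations collapse into three steps, with the unordered set recovered automatically from the sequential trace.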

That is related to my

That is related to my current strategy, and I think what Bret was getting to at the end of his learnable programming essay (create by abstracting). You can do some of it automatically (bind the left edge of a rectangle sketched below another in a diagram), but others require human decisions (for X = 50, and f(X) = 100, is f(n) n + 50 or n * 2?).

But the generalization going on above is an optimization (allow steps to run in parallel) rather than a parameterization (derive different foods from an abstracted recipe). In that case, we can look at the semantics of multiple statements and apply or suggest refactorings automatically. Even cooler, perhaps we could execute it for one context, notice parallelization opportunities by executing them, and propose refactorings based on the trace. I'm sure work has been done like that before.

But I think the first go at the code should be as easy as possible. Besides premature optimization, we have premature abstraction, and if there was a way to just ignore those details not related to getting one instance written down, and then generalize afterwards, that would be a huge win.

Not disagreeing

Just want to emphasize: Premature abstraction and premature optimization are both bad, but so are extraneous details and insufficient performance. Aim for striking the right balance.

I think the right way to

I think the right way to think about it is "later" rather than "never". Given refactoring and whatever else we can think of in the future, code no longer has to be immutable.

Some code is practically immutable

When developing large software projects, some decisions would require an extensive rewrite to change. So the idea that all code can easily be refactored is simplistic. Some code can be easily rewritten, some code cannot. A good software architect can make the right choices about which parts cannot easily be changed, and what technologies and architectures to use for them at the beginning of a project. This enables agile development, as the remaining decisions can be deferred until you have better information, or be continuously refactored throughout the project.

The experience should try to

The experience should try to avoid depending on good decisions being made as early as possible, but that won't be achievable in general. There is simply the fact that "good software architects" are a scarce resource, and anyway, the ability to make decisions early on is related to problem understanding as well as ability.

This might all be a fairy dream, however. To be able to defer almost all decisions until late in the process, and have things work out.

Lean Architecture

Some examples from databases that come to mind are that it is easy to add a column to a table providing existing data can be assigned a null or default value. It is easy to remove a column, providing all references to the column are removed. It is easy to add a new table, and the same for removing a table (providing all references are removed).

Refactoring the table normalisation is much harder.

What is likely to be 'immutable' is that the project is using an SQL database, moving to a NoSQL solution like CouchDB or MongoDB is likely to require a complete rewrite, although this does depend on the software architecture. Sometimes MVC is enough to enable the 'M' layer to be easily changed, however a lot of business software is quite a thin UI layer over the database, such that there is little that can be retained when changing the database.
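The easy cases above can be sketched with an in-memory SQLite database (the table and column names are made up for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO customers (name) VALUES ('Ada')")

# Easy: add a column; existing rows pick up the default value.
db.execute("ALTER TABLE customers ADD COLUMN region TEXT DEFAULT 'unknown'")
print(db.execute("SELECT name, region FROM customers").fetchall())
# -> [('Ada', 'unknown')]
```

Adding or removing whole tables is similarly local, provided references are cleaned up; renormalising across tables, by contrast, means migrating data and rewriting every query that touches the old shape.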

I recommend the book "Lean Architecture" by James O. Coplien & Gertrud Bjørnvig, which explains that LEAN principles require decisions to be made early (in the design phase) whereas Agile postpones decisions to be as late as possible in the development phase. LEAN clearly works for some kinds of projects (The Toyota Way etc.) and Agile clearly works for some kinds of projects. Most real world projects are somewhere between the extremes, and the trick is to know which decisions need a LEAN approach and which can be dealt with in an Agile way. By making the correct design decisions up front you create a framework that facilitates the successful agile development of the rest of the project.

ease of training

A nice concrete goal to aim for might be to write programs in such a way that training intuition was as easy as possible. It would be nice to have a rough model of difficulty of training like big O gives a rough model of performance.


I don't see any reasons to persuade me my view of simplicity is wrong, only that you disagree. That's fine of course, we can agree to differ, but we don't learn anything from that.

Re: Reasons

I'm tired of arguments on the Internet. The fraction of arguments that participants 'learn anything from' is depressingly small. Since you're asking, I'll provide some counter-points for you to contemplate. But I'll not argue them further.

Naturally things are mutable, have identity, and are multiply referenceable, like objects.

This is a very dubious assertion. Humans are able to assign identity to large, stable systems like rivers or humans, but we have much more difficulty with fine-grained or unstable systems, like the specific path (down to the footstep) you last walked to a mailbox, or the specific composition of atoms that make up your body. Identity of an object isn't clearly separate from the observer. Philosophers who actually think deeply on these concepts have much difficulty with identity and objecthood.

Time-varying physical systems are also rather distinct from the notion of "mutability" in procedural languages. For example, there is no way for you to 'set' variables in physics (nor even to 'get' them). OOP advocates frequently conflate mutability with time-varying structure to make claims about how natural OOP supposedly is. I have my doubts.

The correct notion of simplicity is based on registers and combinatorial logic, the mutable fabric on which computers are built.

This seems incredibly arbitrary. Why stop at registers or combinatorial logic? Why not something lower level, like transistors, or physics?

Anyhow, you have implied a claim about "the correct notion of simplicity", that it should be related to modern technology, how computers are built today. You have faith that "mutable register CPUs are not going anywhere" (*). I do not.

I'm more inclined to favor the definition of 'simplicity' from Out of the Tarpit, i.e. the elimination or avoidance of "accidental complexity" that gradually calcifies large programs and makes them unnecessarily difficult to comprehend. State as modeled in imperative languages is a source of much accidental complexity, but this doesn't imply that every model of state will share the same problems.

The finite-state-machine is the simplest model of an electronic computer, and is where mathematical modeling should start.

You made this assertion without providing any reasons. I disagree for various reasons. Cellular automata arguably offer a simpler model for the electronic computer, for example (you can model every transistor and the flow of electricity, and concurrent hardware systems). More importantly, I don't believe it's necessarily the goal of 'mathematical modeling' to model the electronic computer. Modeling the desired program behavior is frequently sufficient.
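To make the cellular-automaton point concrete, a one-dimensional binary automaton can be simulated in a few lines. This is a generic sketch; the rule number (Wolfram-style, here 110) and the wrap-around boundary are illustrative choices, not tied to any particular hardware model:

```python
def ca_step(cells, rule=110):
    """One synchronous update of a 1-D binary cellular automaton.

    Each cell's next state is looked up from the 'rule' byte using the
    3-bit neighbourhood (left, centre, right); boundaries wrap around.
    """
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        idx = (left << 2) | (center << 1) | right
        out.append((rule >> idx) & 1)
    return out

next_gen = ca_step([0, 0, 1, 0, 0])  # a single live cell spreads leftward under rule 110
```

The uniformity is the point: the same local update rule, applied everywhere, is the entire model, which is why CAs are sometimes proposed as a minimal substrate for computation.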

Organisation, Simplicity, and Hardware Design

When referring to how things behave naturally I am not talking about physics (quantum or otherwise) but simple concepts of organisation. If I have a chest of drawers and I put my socks in the first one, I expect to find them there later (stability), but I can also change the contents of the drawer, adding and removing individual socks (mutability).

When referring to concepts of simplicity I am referring to total system complexity, factoring the complexity of the hardware necessary to execute the software into the calculation. Given the difficulty of correcting mistakes in hardware, the relative value of hardware simplicity in a system is much higher than that of software simplicity. Designers correctly try to push any necessary complexity into the software.

Cellular automata may (or may not) offer a simpler model of hardware under some definition of complexity. However, based on design experience, correctness and human understandability are much better with an FSM. There is a reason there is a dominant design architecture in VLSI logic, and that is that other approaches, like cellular automata or asynchronous logic, get much harder to work with at scale. I once thought as you do; I even designed a processor based on cellular automata, but the more I looked into writing software to run on it, the more I realised that placement and routing is a hard problem. You see this problem today in FPGA programming. The best way to understand this is to try to write an operating system for an FPGA to run in a multi-processing environment.

Edit: on "Out of the Tarpit", I agree with the initial analysis of what causes programming to be difficult. My problem with it comes from "necessary" state. The classic example is the airline seat-booking problem, where multiple people simultaneously access an aircraft layout to choose their seats on a flight. The simplest solutions are stateful and transactional. Languages incapable of expressing the semantics tend to externalise the functionality into a "database", which seems a poor solution to me. I agree with the comments above that relational modeling is a good fit, better than OO (and tables can be mutable). However, there seems little difference between a class and a relation (if we ignore storage layout), if we allow methods to be rewritten as type-classes.
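The stateful, transactional flavour of the seat-booking problem can be sketched in a few lines of Python. The class and method names here are invented for illustration; the point is only that a booking either succeeds atomically or fails, so two clients can never end up holding the same seat:

```python
import threading

class SeatMap:
    """Minimal sketch of transactional seat booking (illustrative only)."""

    def __init__(self, seats):
        self._lock = threading.Lock()           # serialises all transactions
        self._owner = {seat: None for seat in seats}

    def book(self, seat, passenger):
        """Atomically claim a seat; True on success, False if already taken."""
        with self._lock:
            if self._owner.get(seat) is None:
                self._owner[seat] = passenger
                return True
            return False

plane = SeatMap(["1A", "1B"])
first = plane.book("1A", "alice")    # succeeds: seat was free
second = plane.book("1A", "bob")     # fails: seat already booked
```

A real system would replace the in-process lock with database transactions or an STM, but the check-then-claim step must remain a single atomic unit either way.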

Streamable Bytecode

I'm developing a purely functional, streamable bytecode that has the properties you describe, as part of Awelon project. (Aside: A bytecode is 'streamable' if it has no backward jumps. Loopy code can still be expressed, e.g. using a Z-combinator.)
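The Z-combinator remark can be illustrated directly: in a strict language it gives recursion without any named self-reference, so "loopy" code needs no backward jump. A quick Python sketch:

```python
# Z combinator: a strict fixed-point combinator (eta-expanded Y combinator).
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Factorial defined without referring to itself by name: 'rec' is supplied
# by the combinator rather than by a recursive binding.
factorial = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
```

Since the definition never mentions its own name, the same trick works in a bytecode that only ever flows forward.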

Awelon Bytecode (ABC) has only five basic types: integer, product, sum, unit, and function. This is sufficient to model lists, trees, deques (e.g. via finger-trees), purely functional objects, and many other ad-hoc structures. Further, discretionary value sealing serves as a variant on newtype wrappers, e.g. so you can apply `{:stats}` to a small set of numbers. After doing so, one must use `{.stats}` to unwrap and access the content. This gives us ad-hoc types. ABC has no implicit type conversions and no internal support for reflection. Static type inference is feasible.
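To see how lists fall out of just sums, products, and unit, here is the standard encoding sketched in Python. The tag strings and helper names are mine for illustration, not ABC's: a list is a sum whose left branch is unit (nil) and whose right branch is a product of head and tail (cons):

```python
# Sum encoded as a tagged pair; product as a tuple; unit as ().
NIL = ("inl", ())                 # empty list: left injection of unit

def cons(head, tail):
    """Non-empty list: right injection of the (head, tail) product."""
    return ("inr", (head, tail))

def to_pylist(xs):
    """Unfold the sum-of-products encoding into a native Python list."""
    out = []
    tag, val = xs
    while tag == "inr":
        head, tail = val
        out.append(head)
        tag, val = tail
    return out

xs = cons(1, cons(2, cons(3, NIL)))
```

Trees follow the same pattern with a product of subtrees in the cons position; nothing beyond the five primitives is required.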

The ability to represent data with code allows for transparent procedural generation, templating, spreadsheet-like properties and live coding, etc.. With large bodies of code that mostly represents data, partial evaluations and caching become hugely useful ('constant propagation' becomes the most common form of computation). Conversely, the ability to easily and uniformly represent code within data is useful for expressing objects or existential types. Effectful code is easy to model using free monads or similar techniques.

Reading and writing raw bytecode would be a pain, of course. But it's possible to present editable views over bytecode. I've already developed a Forth-like text view called Claw (Command Language for Awelon). Further, I have a straightforward path to a Visual Claw dialect where I can represent a volume of code with embedded forms and widgets (sliders, radio buttons, text areas, editable diagrams, etc.).

In addition to editable views of bytecode, we can use higher level views of the codebase - the 'dictionary' in my parlance, but at large enough scale it's more like a filesystem. We can evaluate and view structured data. Between caching and a command pattern (adding a new definition for each command to update an object) we can model entire applications and databases directly in the dictionary: forums, blogs, wikis, game worlds, etc..

Making all of this efficient is challenging due to my choice of a minimalist bytecode. I'll need accelerators, for example to recognize common list functions like 'reverse' and 'access at index' and translate them to a single opcode under the hood. I'll need to use representation tricks, e.g. annotating some lists to be represented as arrays or chunked lists. I need JIT compilation, preferably with static type analysis. Etc.. I plan to eventually develop `ABC Deflated` which is just ABC with a standardized set of accelerators.
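The accelerator idea is essentially peephole pattern recognition over the opcode stream. The sketch below is generic: the opcode names and the matched sequence are hypothetical placeholders, not real ABC sequences, but the single-pass rewrite is the shape an accelerator pass would take:

```python
# Hypothetical: map a known multi-op sequence to one accelerated opcode.
PATTERNS = {
    ("swap", "rot", "apply"): "reverse_accel",   # invented sequence, for illustration
}

def accelerate(ops, patterns=PATTERNS):
    """Single left-to-right pass replacing matched opcode sequences."""
    out, i = [], 0
    while i < len(ops):
        for seq, accel in patterns.items():
            if tuple(ops[i:i + len(seq)]) == seq:
                out.append(accel)
                i += len(seq)
                break
        else:
            out.append(ops[i])
            i += 1
    return out
```

Because the rewrite only replaces sequences that are semantically equivalent to the accelerated opcode, the program's meaning is unchanged; only the interpreter's dispatch cost drops.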