Cognition and Coding

Fascinating research from MIT on human cognition and coding. The brain uses spatial navigation as well as math/logic areas to understand programs. Our apparent need for at least some spatial processing when programming, may lend additional support for an explicitly spatial syntax and semantics for programming languages.

The full article is published here.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: spatial reasoning in

Re: spatial reasoning in programming, I think it makes sense. From birth, we develop an intuition for locality, Einstein's "continuity of action". That's likely connected to why lexically scoped variables are easier to reason about than dynamically scoped variables, why equational reasoning and referential transparency are easier to reason about than ambient side-effects, and so on.

Drilling down into code is applied reductionism, but reductionism is harder to apply if it's hard or impossible to isolate a subsystem due to non-local effects (GOTO, mutable state, particularly static mutable state, thread preemptions, etc).

The particular form in which these are expressed in current languages aren't necessarily the ideal form, but the principles are probably sound.

Locality and FP

hi naasking, I agree physics has a role to play in computer science, but was surprised by your jump from spatial thinking in programming to locality and then to FP-type precepts. I imagined that spatial considerations would incline one to consider environments which are not abstracted from state and space. (In my view GOTO and static mutable state have advantages and do not have to hinder the isolation of subsystems, providing their use is properly contained, eg lexical scoping. I also note there are neural connections in the brain that are much longer than the majority of ‘local’ links, make of that what you will.)
I refer in my reply to John Shutt to the absence of a functional language in the above study, it would be fascinating to consider a range of them in terms of FP purity. A guess is that they might have profiles closer to the math/logic side, than to the spatial side of the brain.

In my view GOTO and static

In my view GOTO and static mutable state have advantages and do not have to hinder the isolation of subsystems

I agree they have advantages and do not have to hinder isolation, but they absolutely can and often do because they're not suitably restricted. The non-locality of QM is instructive: it's just enough to permit interesting behaviour, but not so much that it permits superluminal signalling which would lead to paradoxes (integrity bugs in a program in this analogy).

That's the kind of outcome you'd want for taming mutable state and control flow.

A guess is that [FP languages] might have profiles closer to the math/logic side, than to the spatial side of the brain.

I'm not sure. Qualitatively, I think I just use different spatial intuitions in FP because everything is local, but I still think in terms of pipelines and how this value here will influence an output "over there", further down the pipeline (or conversely, how this incorrect value came from "over there" further up the pipeline).

Miscellaneous thoughts

I agree that programming involves geometric thinking. Admittedly, I have something of a geometric/visualization turn of mind, which I gather isn't so for some people. (To date, I don't venture much into the PL-design implications of differences between cognitive styles of programmers, because I find I can't reason about that without my mind blowing a fuse.) By my observation, it's not just me, though. Flowcharts were a thing, back in the day, and I've heard of programmers who were never as effective after flowcharts went away. People keep trying to make visual programming work.

Which said, I don't trust the feasibility of learning about thought processes by studying which regions of the brain are activated. Sure, the brain gives rise to the mind, but I don't think the causal connection is remotely straightforward. Reductionism calls for the lower level to give rise to the higher, but that doesn't mean you can seriously expect to understand the higher level by studying the lower. Attempts to reason from brain impulses up to semantics put me in mind of a (partial) line from The Princess Bride: "I do not think it means what you think it means."

I also perceive that programming is a linguistic activity. All software that truly lasts is text-based (witness TeX). Despite those persistent and revealing efforts to make visual programming languages work, we still use text-based PLs; I submit there is a reason for that.

Re naasking's point about non-local effects, though, I would suggest that programming requires non-locality; like quantum mechanics, you can't get away from it, and if you try it might seem okay for a while but sooner or later something will go horribly wrong.

An explicitly spatial syntax for programming.

Given the blanks in our knowledge of how cognition actually functions, I agree evidence from low level brain activity is at best circumstantial and cannot be used to demonstrate the cognitive reality of higher level processes. I am reminded of the futile attempts to prove the psychological reality of transformational grammar (some discussion here from 1978). But such studies can offer clues, and I wish that this one in particular had included a standard functional language abstracted from hardware and a notion of space — it may have given a different result and be of relevance to naasking’s intuitions.
I agree programming is essentially a linguistic activity, but I submit there is a reason why visual programming has such a draw — it is better at representing many to many relationships than the standard tree syntax found in programming languages, which is inherited from natural language. A conventional tree has non-leaf nodes which belong to syntactic categories known as parts of speech, and such nodes can individually only be a child of at most one more complex part of speech — hence the syntax cannot directly share sub-expressions. I call this the single parent restriction (SPR).
Incorporating SPR into the most primitive syntactic category above symbol level, results in difficulty in representing many to many relationships on the semantic level, giving rise to things like pronouns, repetition of terms, explosion in expression size, labelling of complex expressions, and un-necessary complexity.
One solution is to ditch natural language grammar as a template (why assume it is a suitable template for formal languages?) and to devise an explicitly spatial environment incorporating restricted tree forms and an abstract memory, which allows parts of speech to be children of unbounded numbers of parents, called interlanguage (no relation to Selinker’s linguistics concept of second language acquisition). A brief summary can be found in section 2.4 of a an 8 page paper. A shorter summary of synchronic computation can be found here.

Yea, verily

I'm substantially in agreement with most or all of that, yes. It is a fascinating subject, and even circumstantial clues are of interest. Small sample size, which I see is mentioned in the full article (as one should expect of a properly cautious scientific paper), would likely present an obstacle to studying more detailed distinctions between programming paradigms, especially as one would really want programmers fluent in various paradigms. I find myself wondering, come to think, about people communicating fluently in Lojban (speaking of fluency and sample size).

My own view (btw) on the universal grammar hypothesis, based on my wrangling with the origins of sapience, is to substantially reject the metaphor of a coding/decoding device interceding between the sapient mind and its communications interface with other sapient minds. As I put it in a recent blog post (six months ago seems awfully recent, in the skewed subjective time of this year), "[...] given that there has to be some sort of thought engine there at all, suppose it's a powerful engine and I see no call for anything more than that; don't bother with a fancy interface unit to code and decode, just hook up a transmitter and receiver and let the things resonate with each other directly" (link).

Re naasking's point about

Re naasking's point about non-local effects, though, I would suggest that programming requires non-locality; like quantum mechanics, you can't get away from it

The tide may be turning! I think the analogy to QM works well though: once you start down the path of non-local effects, you give up your ability to assume the systems you're analyzing are statistically independent. Seems to check out, and we probably want to avoid that whenever possible.


Hm, the trick seems to be getting just the right kind of non-local correlations; a very deep thing, as aspect-orientated modularity demonstrates. Yet another facet to the programming-QM analogy: just as we're still baffled by how to do modularity right, we've been struggling with QM (especially its thorny integration with relativity) for a century or so and still haven't got it figured out.

Most people need a little syntax

I had (still have, to some degree) a long-lasting love of lispy languages. You know the general type; everything's fully parenthesized prefix notation and the language also manipulates tokens expressed as fully parenthesized prefix notation. But most people can't really look at Lisp code and tell what's going on. Some say it's because there are too many nesting levels for proper cognition, but I am pretty sure it's because there's not enough syntax for them. Because below some threshold every statement looks alike to them no matter what it does.

I made a specialized lisp a while ago as a translation-target for other computer languages - like an intermediate code in a compiler, but with programs isomorphic to the syntax tree expressed in the original language, and readable and editable by humans. At least that was my thought. But by the time I got done with FEXPRs and access to caller environments (necessary to be parse-tree compatible with some languages) I had reached a peculiar point: It was now possible to make a program with the same parse tree as original source code in dozens of languages and have the program mean the same thing, BUT: any non-primitive expression could mean anything.

And at that point I discovered that it became very hard to use for anything complicated, because what little syntax I had left - the parens, mostly - had lost its value of telling me what the hell was going on when I looked at a given expression. I started to feel about my translation-target dialect the way most people feel about lispy languages in general: not enough syntactic clues what's going on.

We want things to have a shape. We want characteristic symbols in consistent spatial relationships to each other so we can home in on the bit we care about. We want some redundancy that shows important things more than once, so we get it sometime before we've read the whole program source. We want to know what kind of expression we're looking at before we've gone somewhere to read its source code. And parens don't give us any clues.

So, yeah. People use spatial memory. That's as expected. No program language is really written all on one line: we use indentation as organization to tell ourselves relationships between things. And as we do so we give things recognizable shapes. And, like syntax, the shapes help us keep track of what's going on.

Lisp syntax

There's a new Unix shell called Oh, which is essentially a Lisp using Kernel-style fexprs, that manages to eliminate almost all the parentheses with a few simple conventions.

As for syntax extension, I've long had in mind to provide a gently-extended syntax for Lisp. I got over my early enthusiasm for free-form syntax extension once I'd systematically explored its consequences (techreport (pdf)): you should always be able to tell, looking at a coherent piece of source-code, how to parse it, without having to know what syntax extensions might have been applied before processing that code. This doesn't preclude mild syntax extension, but it does mean you have to start with some sort of straightforwardly unambiguous syntax and then further constrain things from there; likely one would do this with an "environment" defining certain "operators" which are the only ones allowed within its scope, generalizing Lisp environments in which the operators are symbols.

(I've some notions how one might do this. A single line would automatically be a statement. Within a line, full parenthesization would be required (except that the line as a whole wouldn't need parentheses, of course), but rather than prefix notation, elements of an expression would alternate between subexpresions and keywords; some expressions start with a keyword (so the odd sub-elements are keywords), some start with a subexpression (so the even sub-elements are keywords). There'd be some sort of lexical rule for which lexemes are keywords and which are, or can be, identifiers. Somehow one would have to translate these "operators" —each operator would specify the sequence of sub-expressions and keywords— into a syntactic representation that can be looked up in an "environment" (not sure whether or not that would be a conventional symbol-value mapping). I've speculated on rules for multi-line constructs, likely involving some coordinating (separating/delimiting) lines such as begin/end or curly-braces, but haven't worked that out in as much detail.)

Whether all this is "spatial" or "linguistic" (in the sense of involving, say, Broca's area), idk. Although, in attempting to apply concepts from lambda-like calculi to basic physics, I've taken to describing syntactic tree structure as "geometry", and symbol bindings (thus, environments) as non-locality.

yeah: syntax matters a *lot*

If it weren't already the case that Java (vs. C++) and Golang (vs Python) hadn't taught us the critical importance of getting syntax right (and the extent to which it's really more important in many programmer's minds than semantics), the example of Perl5 should be dispositive.

Perl5 is a modern LISP: it's got packages, functions-as-first-class-objects, dynamic binding, a primitive object-system, GC, and on and on. All in a package that has a killer app: "UNIX scripting". And yet, so many people can't get past the syntax. I program in a bunch of languages, and *also* in Perl5. To me, its syntax is just a facade behind which lurks LISP, and that's how I program it. But the number of people I know who can't read Perl5, and who won't even attempt to learn how, is pretty compelling proof that syntax, and minor syntactic matters, are far, far more impoortant than we give them credit for.

P.S. I'm not arguing that anybody should write Perl5. Far, far from it. I'm just noting that the biggest reason people don't is -syntax-; I remember well first learning Perl5 in 1995 after having been deep in the Python VM for 6 months, and thinking "this is just a different syntax for the Python VM". Not completely accurate, but still, not wrong, either.

syntax and semantics

Back in 1967, Christopher Strachey's "Fundamental Concepts in Programming" proclaimed the "basic irrelevance of syntax and primacy of semantics". Some here might recall I wrote a blog past a few years ago challenging this premise, on the grounds that abstraction really straddles the boundary between syntax and semantics. (link)

I don't actually disagree with your observation that you can't get anywhere without good syntax. I would add that (despite my challenge to Strachey's extreme statement) in the long run you can't get anywhere without good semantics, either. I'm reminded of the classic observation that the most important leg of a stool is the leg that's missing.

Syntax blurs into semantics.

People claim - frequently - that the semantics of a language matters and the syntax doesn't, or vice versa.

But here's a bit of perspective from studying human languages - there is no clear dividing line.

You take whatever you're doing to parse the syntax of a language, and do "more" of it (for some value of "more" - and you will start discovering that more and more higher-level ideas are "only" syntax... until eventually you realize that you're dealing with semantics, and that the syntax was really like the recursive base case of it. Or, you take what you're doing to get the semantics out of an analyzed string, and you eventually realize you could have done the same thing, with some configuration, to break the string into syntactic tokens in the first place.

The one fades into the other, or defines the other, the way pointillist paintings are defined by the shapes of both the point color and the background color.

It may be the case that something similar applies to computer languages as well. At the very least I wouldn't presume to rule it out.


I'd agree about the blur. Your detailed description seems to me applicable to programming languages but not human languages; the difference being that with programming languages the "semantics" is generally taken to be computational, thus formal just as the syntax is; while with human languages the semantics is in the province of sapient thought which I do not find ultimately formalizable at all. Thus, I would agree that with a programming language, doing "more" of what you do for parsing will gradually move you into the computational-semantic realm, but with a human language doing "more" will gradually slide into a quagmire caused by trying to formalize something that will never fully submit to formalization (though it may keep leading you on so you can spend a lifetime in the unsuccessful attempt).

Formal Methods are no longer fashionable for human languages.

That "squishiness" - in both syntax and semantics - is one reason why language models are now statistical - or neural networks, which is "more levels of" statistics.

I had experience dealing with human languages using purely formal methods though - with formal grammars that grew, and grew, and grew, until you reached the point where you couldn't make them any better until you upped the minimum memory requirements for machines to use your software. In practice they were about 95% accurate on 4G machines, 97% accurate on 8G machines, and we got up to 98.5% on 16G machines but never finished that project. I got a couple of patents for design work there on resolving pronoun referents, but they became irrelevant at about the same time they expired, because the world has moved on.

Anyway, that system started with plain old pattern matching - pick up a few keywords, hang them on a sentence template, consult the context established by the foregoing sentence, and decide what general context to parse in. Then parsed using a regular-old recursive-descent production grammar with a token stack and checked structural invariants using a link grammar. And, modulo the grammars growing infinite hair the more you tried to cover edge cases and exceptions, it actually worked as well as any neural network trained on a corpus of less than ten million words. And my experience was that the link-grammar phase eventually built the same kind of parse tree out of the "syntax" that a compiler builds extracting the "semantics" of a program. We could traverse the parse tree extracting meaning (or at least the top four or five candidate meanings) the same way a compiler traverses the parse tree of a program extracting instructions.

Computer languages are much more tractable. The grammars for them are unambiguous, rarely require anything more than Chomsky type 2 parsing, (natural-language is context-sensitive, meaning it requires Chomsky Type 1, at least) and usually fit in just ten kilobytes or so of memory. I'm a little bit surprised to hear that you experienced the same thing there; for me it was something that eventually emerged from the howling complexity of natural language.

lies, damn lies, and...

I would characterize statistics, and (artificial) neural networks, as abandoning sapience altogether, forcing us to view the world through a filter that removes everything that can't be handled technologically ("formally") so that one can then claim formal/technological methods can handle "everything" — since one has excluded anything they can't handle).

You are reminding me of

You are reminding me of so many people who may have claimed in the past that something (even something simple like correctly identifying a picture) requires "real" intelligence, but when confronted with software that actually does it, immediately dismiss the task as something that can be left to mere technology.

Artificial Neural networks as they are usually designed are hardly more "sapient" than the muscle reflex when someone's knee gets a tap. But that's a symptom of our intent in building those systems. We want something that reliably has this particular kind of 'reflex' as appropriate to our chosen classes of inputs. We design for that, and we get it. In the rare cases where it's not the symptom of our intent, it's a symptom of the limitations of our design skill.

Neural networks can be considered a particular type of domain-specific language. It even has some abstractions, like "edge recognizers" at one level that are set when the input hits them and can then be used by reference (via a connection and multiplier of course) by nodes at later levels. The formalism is as Turing-Complete as any programming language in that whatever arbitrary task you like can be performed by some network of nodes of summed input and nonlinear output connected by edges having multiplicative factors. In fact I could write a compiler that takes formally expressed tasks and spits out networks having, provably, EXACTLY that behavior. You could too; it's a simple implementation of boolean logic.

So if using an ANN is "abandoning sapience all together" then sapience is a mystical thing that can't be done by any machine. I think I have a squishy machine between my ears that does "sapience", so I'm pretty sure there exists some machine that can do it.

The uses of ANN's that you are objecting to are strictly beyond that. We can usually guess at a network design for a simple task, then use statistical feedback to find a set of weights for the connections in that network that *approximate* the task, with no available proof that the task is being done with exact formal precision. Even if it's a task we can't specify a formal method for. That capability is strictly IN ADDITION TO a complete formalism capable of general computation.

errors of reasoning

(This is going to be quite meta. Sorry about that; nature of the beast.)

Errors of reasoning are rampant on this subject. Like, I suspect, most people, I first became aware of various errors of reasoning commonly committed by people trying to claim the human mind can do things technology cannot. Gradually I became aware that people on the other side of the question —who claim technology can do anything the human mind can— commonly commit different errors of reasoning.

People in the "humans are smarter" camp may attempt to justify their belief by claiming some particular standard task by the human mind cannot be done by technology. I suggest that for any straightforward standardized test (yes, "straightforward" here is a weasel word), one can reasonably expect to construct technology that will outperform the human mind on that test; making this sort of claim a common error by the "humans are smarter" camp.

People in the "technology can do it all" camp are, I think, apt to agree with my suggestion, but then try to infer from it their own favored conclusion. That is, accepting that technology can perform all such standard tasks that the human mind can, they conclude that technology can do everything the human mind can. Which would follow iff everything that can be done by the human mind is a standard task of this sort. This however is also an error of reasoning, unless one also provides evidence that everything that can be done by the human mind is of this sort. Notice, there is no burden of proof here on the "humans are smarter" camp: the inference isn't valid in the absence of a counter-example (something humans can do that isn't of this sort), rather it's invalid in the absence of evidence that there are no such counter-examples. Trying to put the burden on the "smart humans" side would be what Wikipedia calls argument from incredulity. Assuming all tasks the human mind can do are of the standard sort, and inferring from this that technology can do everything the human mind can, would be circular reasoning.

Another possible error here would be to suppose that because some of the "smarter humans" camp commit a strategic error (by pinning their hopes on some standard task that they expect technology to be incapable of), their conclusion is necessarily wrong; that one would be argument from fallacy. Generally, it seems, someone in the "smarter humans" camp would make this strategic error because they've reached their position without having solid reasoning to justify it, and they're looking for a justification. Their reasoning could be "mystical", for which I'd have no particular sympathy (that too should qualify as a fallacy of some sort); but they could also have an intuition that they just can't readily pin down.

(Do I have a specific reason to suspect the human mind actually is capable of something technology isn't? Well, yes; I have in mind some evidence —not conclusive, but perhaps suggestive— and I have some ideas about the possible nature of such capabilities, though all of it is clearly quite preliminary. I'm wary of getting too far into that when I suspect my audience may be hung up on argument-from-incredulity.)

(Oh, and, just for the record: yes, at heart I'm a reductionist. That doesn't put me in the "technology can do it all" camp.)

The Church-Turing thesis does apply to the problem.

I'm not really arguing that all the things a brain can do are "straightforward tasks". I'm arguing that the brain can't do anything that's beyond the capabilities of something made of atoms acting according to the laws of physics.

And because ANNs are provably general computation devices, they can, at least theoretically, do anything that a device made of atoms acting according to the laws of physics can do. As can any general computation device capable of high-fidelity simulations of atoms acting according to physics.

This is not an argument from incredulity, nor an argument from fallacy; this is an argument about the equivalence of the capabilities implied by the fundamental laws governing the two different types of device. I argue that this is true because the Church-Turing thesis is true. I argue that the Church-Turing thesis applies because simulating the action of a device made of atoms acting according to the laws of physics is within the set of things that software can do.

The question of whether and how such a network is designed and/or trained is at this point beyond us. But a general AI constructed using GOFAI techniques in a classical programming language is also beyond us, even though we've gone from "Hello world" to analyzing the orbits of billions of stars looking for evidence of dark matter.

The Church-Turing thesis also doesn't apply.

I'll comment a bit, and then I'll see if I can't bend it all back toward the point I was (if I can remember that far back) trying to make re programming languages.

I agree that the argument you're putting forward there is neither from incredulity nor from fallacy. Though afaics you are starting with a declared belief in reductionism and reasoning from there to what can be done by technology versus by the human mind, and there's a logical gap there. As mentioned, I am inclined to reductionism; but I note (and approve) a qualification in your remarks: "at least theoretically". Here's a related quote (mentioned in my relatively-recent blog post on the philosophical implications of Gödel's Theorem(s), yonder): "In theory, there is no difference between theory and practice. But, in practice, there is." The question here seems to me to be whether these two levels of description —"formal" and "sapient"— can be unified by understanding one in terms of the other; and I suggest the answer should be: in theory yes, in practice no. I also suggested in the blog post that formal reasoning is bottom-up, while sapient reasoning is more top-down (though I wouldn't say that's an entirely adequate way of putting it; but it's a start). Sapience is characterized by considering the larger context, which is where building things up formally keeps its uncovered gaps.

I alluded, btw, to having a reason to suspect sapiences are in some sense fundamentally more powerful that formal systems. Fwiw: I'd take a cue from memetic evolution. As I read it, genetic evolution on Earth has followed a track about four billion years long, with a major jump in rate about four hundred million years ago; while memetic evolution has followed a track no more than about three million years long, with a major jump in rate about fifty thousand years ago. This suggests to me that memes are evolving more than three orders of magnitude faster than genes did. Yes that could be simply due to a faster copying mechanism; but I don't think so. The copying process is profoundly different, and I think the key difference is that the host minds are sapient. (I could go on with the memetics theme, but that mostly gets further afield from PLs.)

So we've got these two levels, call them technology and sapience, or formalism and sapience, or even syntax and meaning; I reckon reductionism is supported "in theory" by the higher level being "emergent" from the lower, while the divide between the two is likely to be, at the same time, unbridgeable in practice. This would seem to go both ways: describing concepts in terms of cerebral neurons is likely at about the same level of hopelessness as describing cerebral neurons in terms of concepts.

The point I started out to make was that the description of a programming language seeks to connect syntax to computational semantics, which is basically connecting one kind of formalism to another and ought to be doable; whereas the description of a natural language seeks to connect syntax to concepts, which is attempting to go from the formal level to the sapient level. That syntax-to-concept thing might start out with a plausible-seeming approximation, but blows up as you try to improve the approximation. The same should happen if you try to bridge the gap from a programming language (syntax or formal semantics) to sapient concepts, which are always hovering nearby since programming languages are meant to be used by sapient beings.

I alluded, btw, to having a

I alluded, btw, to having a reason to suspect sapiences are in some sense fundamentally more powerful that formal systems.

I'd maybe buy that if "sapiences are complete but inconsistent" is a lemma of any such argument. For instance, if they are truly top-down where formalisms are bottom-up, then sapiences would be context-sensitive logics, ie. I can associate rules X,Y with context C1, and rules Z,W with context C2, but I'll just be really confused and probably get stuck if I ever find myself unable to figure out the context.

Existence proofs of sapience exist. Explanations not so much.

A lot of AI people are looking for systems that do something which is not what cognition does. They want guarantees of correctness. They want explanations. They want consistency. They want a symbolic understanding of cognition and subjective experience.

While there are existence proofs that sapience is possible, there are no proofs of any kind that a sapient system capable of that kind of guarantee, or even an accurate symbolic *explanation* of its own actions, can exist.

The "extraordinary claim" therefore is the claim unsupported by evidence. I'm not inclined to believe those who claim that a system meeting these additional requirements can exist, until or unless there is strong evidence that it can.

I can give an explanation of why an ANN does what it did. But it won't be the kind of symbolic explanation that would satisfy humans; it would be a mechanistic trace of the signal-propagation that led to the output. That's the true explanation. It would take longer to even read than the reader or author has time to live, and comprehending it would require retaining millions of times more context in short-term memory than humans can. That's the only explanation that's "true", but that doesn't make it a good enough explanation.

People want explanations in shorthand, symbolic terms that they can understand. Even when that fact means that those explanations are necessarily false. I fear that when dealing with anything that requires a "general" intelligence, they are doomed to disappointment because humans are unable to fully comprehend the grinding monotony and vast, asymbolic, inelegant bulk of the correct answer.

The best explanations we can come up with would likely be produced by the same method as the actions themselves. Obviously if we can produce correct explanations in symbolic terms then we should be using the explanation-generating system to produce the action in the first place because that would be the only system of which such an answer could be true.

Otherwise the explanations, like the actions themselves, will always come with no satisfying guarantees of correctness. The explanations we give when someone asks us why we did something, even when we believe that they are true, are post hoc. They do not seem to be substantially more correct than the reasons an outside observer of our actions could give.

Godel showed that any system of mathematics which is complete is also inconsistent.

So if our lemma is that sapiences are complete but inconsistent, we can still work with them; we just need to understand that certain properties will always be true of the results.

No guarantees of correct answers, no guarantees of a correct explanation of those answers that we can understand, no guarantees of getting the same answer every time.

But I'm inclined to think that doesn't mean it rules out "genuine" cognition, because the same problems arise in biological neural networks. According to generally accepted standards biological neural networks *DO* perform "genuine" cognition.

So if we are interested in genuine cognition, we focus on that. That, we can be sure by an existence proof, is possible. The kind of intellectual satisfaction that Descartes was after ... we've never seen any proof that it can exist.


I don't quite feel on solid ground as to what sense of symbolic you're using here, since we're at the intersection of PLs/computation where "symbol" means one thing, and sapient semiotics where "symbolic thought" means something else.

"symbol" in the informal sense.

I mean "symbol" in the way non-programmer, non-mathematician human beings use the term.

I mean words, of the sort one could speak to an admiral to explain why the ship's brand-new computer, in the midst of combat, suddenly decided to jettison all the lifeboats. The admiral is not interested in formalisms. Nor in program traces or signal propagation records. The admiral wants someone to explain, using words that are part of a shared vocabulary, why the action was taken and who should be blamed.

And that is the sort of thing we can never fully provide. Any answer we give will either be false, or it will be wholly unsatisfying and useless to the admiral because it will not be expressed in words from any kind of shared vocabulary and will not address the concerns and implications of the sudden "decision."

Symbolic explanations

I figure, from the point where we started telling stories (onset of the upper paleolithic), it took us about fifty thousand years to devise a credible way of telling ourselves a story about ourselves — that being the self. It's a flawed technique; the self is, by my reckoning, a fictional character we inject into our story about ourselves, and we pretend that the things actually done by our sapient mind were done by this fictional character. Like a fictional character in an historical novel, though, the self wasn't really there at the time and cannot actually affect the outcome. Sometimes the fiction breaks down; a classic example is the illusion that when faced with an unexpected stimulus we experience a paralyzing hesitation before reacting. (I think Daniel Dennett may have written about this one.) It takes time for nerve impulses to propagate from our senses to where they'll be reacted to, and for the reaction impulses to propagate back to our muscles; but those internal delays aren't part of our self-portrait, so when we construct our fiction about how the self reacted to stilumuli, we have to invent a model-consistent explanation for the time delay; thus the fictional "hesitation" is born.

But we've had less than a century, so far, to devise a credible way of telling a story about computers. If it took us fifty thousand years to come up with a somewhat-flawed story mode for people, we can perhaps be forgiven for having not done as much for computers in less than a century. Granted, this time around we've got much more advanced existing storytelling technology to build on in addressing this new problem. We've made some progress already, with the invention of the concept of a "bug", which in its technological sense is apparently about a century and a half old. The way you explain that lifeboats-ejection to the admiral is likely to involve the term "bug", which the admiral will probably have heard before even if the admiral has greying hair and entered officers' training long before development of the Hypertext Transfer Protocol.

We should probably try to bring this back on topic.

I eventually decided that human beings need some syntax for programming language understanding, and lisps (among a few other languages like FORTH) doesn't have enough of it. These languages tend to have marvelously pure basic paradigms of evaluation and computation, but that purity isn't enough by itself to give most people real clarity as to what their symbols mean.

Conversely, APL with its backwards-martian notation actually requires the user to think about the symbols as opposed to the ideas and that's too much syntax. And Perl has a syntax with so many shortcuts, pronouns, and options that it can look like line noise at first glance. Unless you already know the language it's hard to even identify the boundaries between significant lexemes, and that's also too much syntax.

So how much is the right amount? Python? Python with its indentation-sensitive syntax mandates the indentation level, which is a convention that a lot of us use to keep track of structure; It helps make Python code clear to read. On the other hand it requires the programmer to manually set the indentation level, so there are shortcuts I rely on when writing code (uses of autoindentation in my editor) that won't work.

I remember novelties and inventions that tried to make programs more visual - everybody remembers flowcharts for example. Flowcharts kind of suck at expressing abstraction. Control flow diagrams can express abstraction, but don't do so better than text. There were dataflow languages that tried to express programs as pipes and filters via which data "flowed" from input to output. They were good for some programs and bad for others.

Bavel's power automata were basically a fine-grained control flow diagram where the signals trigger gates for data flow, and I sort of thought they were more important than people seemed to give them credit for. But like most of these things, the actual positioning of various parts of the program in the diagram isn't correlated with the job that part is doing. So all these 'diagram' languages based on nodes-and-edges graphs fail the first test of syntax in that you don't know where to look for the different parts of the program that you're working on - or at least, not without following a bunch of lines that lead off in a bunch of directions only one of which will turn out to be correct. When it's the indentation-level that's the unit of structure, it's much easier to navigate "up" and "down" the parse tree of the program.

There was another language that tried to make the parse tree apparent using a "contains" relation instead of an "indented further" relation. Your program started as a rectangle and then it subdivided it into given shapes for serial statements, control structures, definitions, etc. Your entire program was a rectangle with four straight sides (if well-formed) and tesselated into parts that subdivided into other parts, giving them geometric relations that were supposed to be clearer than text. It was a cute idea, but really they weren't clearer than text. The size of the blocks wound up with little relationship to their conceptual importance in the program, so the structure you cared about was lost in the noise. And they would "stretch" the block making the things you wanted to find tiny. The block diagrams were so useless that the version of it I used had labeled all the boxes with Pascal syntax so programmers could tell what was happening, and a utility that read or printed out the program text as pretty-printed pascal was the development tool of choice.

Funges are *very* syntax-heavy. Originally an example of a nonlinear language (one that doesn't build a "syntax tree" to traverse) in Aho,Hopcroft,Ullman's compiler book, funges tried to make the flowchart idea work for code, but turned out to be a usability disaster.

In fact the funge paradigm is the basis of some of the trippiest of the usable-but-bizarre "esoteric" languages. They were an honest effort to do something good, and maybe there's an idea out there that could render them good. Conversely many of the esoterics are *deliberately* awful.

Stack languages like FORTH also don't really have a syntax tree as such; they just have a linear list of tokens and all the semantic subsetting is done by the stack. I always felt that they were just an abstraction or two away from being a good idea, but I was never sure what the abstraction ought to be. I had an idea to make the stack into a tree, but never developed it. That would give a purpose to at least one kind of syntax and make the programs tree-structured which might help people "get" it. But I'm doubtful.

Anyway, here's what we're up against. We agree that people need some syntax, but what kind? What's it for? 'Cause there are radically different ways to use it. Concurrent programs can be modeled as Petri Nets, but should we use syntax to express the network? Or, more likely, more conventional parse-tree code to say how the "action tokens" are allowed to traverse the network?

Lisp syntax

I was told (or, well, at least I encountered the saying somewhere) early on that Lisp has no syntax. I didn't get it, at the time; how can a language have no syntax? It took me something like three decades to figure that out (can't say exactly since I don't know exactly when I first encountered the saying, but I only figured this out in the past few years). Lisp has no syntax for doing anything. None whatsoever. There is no way in Lisp to tell the computer to do anything at all. The only syntax is for expressing passive data structures; it happens that if you send that passive data structure to an evaluator, the evaluator will interpret it and thereby do something. This is also (btw) where the 1998 paper "The Theory of Fexprs is Trivial" goes wrong: that paper is quite up-front that it only considers the equational theory of source-expressions —syntax— but, in fact, all Lisp source expressions are passive data. All of them. Once you've factored out superficial stuff like insensitivity to how much whitespace you use, the only way any two Lisp source expressions can be equivalent is if they're identical. In order to cleanly model the semantics of Lisp using small-step semantics, you have to introduce some term constructs that do not correspond to any source expression. In essence, these constructs model the action of an evaluator, which is a creature logically separate from the passive data structure of the source code. Vau-calculus has essentially three parts: There's the source-code syntax, which is just a passive data syntax and has no rewrite rules at all associated with it. There's the part that, when externally called upon to do so, renders down S-expresions into computational elements, with rewrite rules that amount to the dispatch routines of a Lisp evaluator. And there's the part that actually does the raw computation on computational elements once they become available from the rendering process. I'd been so successfully brainwashed into believing that fexprs are dreadfully badly behaved and make it impossible to achieve anything remotely like the elegance and power of lambda-calculus, that I'd been working with this stuff for the better part of a decade when it finally hit me that, other than some trivialities, that third part of the calculus to do with computation essentially is lambda-calculus. (Fwiw I did write a related blog post.)

I think I agree that Lisp syntax is a bit too sparse on clues to what's supposed to be going on. At the same time, I approve of both Lisp's elegant evaluator and its clean separation between data structures (which are S-expressions) and control structures (which are basically lambda-calculus). It seems to me what's wanted is a slightly more articulate representation of the data structures. I imagine something that actually looks rather like vintage BASIC; not surprising, perhaps, since that was my first programming language.

It does seems to me, increasingly in the past few years, that what we want is going to be off in some direction that isn't even perceived as a degree of flexibility for conventional programming languages.

Speaking of more articulate

Speaking of more articulate representation of data structures.
What do you think of data visualization? Where shows what the data means in human terms.

Also the beginning Brian Cantwell Smith's "on the orgin of objects" had a really interesting section on how when someone using a computer paint program lays shapes on top of one another they see the intersection of those shapes as a shape to be manipulated in it's own right even though it doesn't have an explicit data structure in the program. I think he suggests that the program could parse the output in order to fluidly move between data structures that correspond the the human understanding of the situation.

Re data visualization

It's been my impression that, bluntly, data visualization is an art. Studying data visualization means developing a repertory of techniques, and then getting creative when actually trying to use visualization on a real data set, drawing freely on one's repertory at convenience. This fits interestingly with known limitations of particular visual programming languages, which work better for some programs than for others. For a given programming problem one might dream up a visualization technique that would work for that problem but wouldn't necessarily work for other problems.

Incidentally I hate the nomenclature.

Calling the nodes and connections in an ANN "neurons" and "synapses" makes me grind my teeth. Biological neurons and synapses are not what they model. Or if they are, then the model hasn't been updated since the 1940s. When I'm writing, they are "nodes" and "connections."

I think I would prefer "signal propagation network" to "artificial neural network" but if I used that terminology there's a real chance I'd cause confusion and be misunderstood.

I'm more than a little frustrated by a complaint about "filtering out whatever technological methods can't handle" without giving any concrete examples of what kind of thing you mean to say can't be handled. Because I guess I'm one of those reductionistic, materialistic people who really has yet to see any such example and I actually doubt that they exist.

Our brains are made of atoms acting according to the laws of physics. There's nothing "magical" about them that is beyond any capability of emulation or modeling. Anything that's categorically out of reach of technological methods is also out of reach of our brains.

That said, it's going to be a long time before we see a computer play a credible game of "seven minutes in heaven."

ok Dillinger, you just

ok Dillinger, you just identified one thing that this technological method can't handle what others things that this approach can't handle can you think of?

Oh, I'm sure it'll happen.

When we do get around to simulating brains, the inclination to participate in such games, and the ability given appropriate peripherals do do so credibly, is likely to be emergent in the simulation just as it's emergent in biological brains. All I said was that we're not going to get there any time soon.

Although what counts as "soon" in this field is its own puzzle. It would be "astonishing", at least to me, if the time were less than 24 months or more than 24 decades. It would be "surprising" if it were less than 5 years or more than 5 decades. Which is a ridiculous spread, but it's a very fast-moving field right now and we don't really know yet when the rate of making significant discoveries about ANN's will start to slow.


Just like syntax is fossilized semantics, semantics is fossilized pragmatics.

However I've never seen any programming language with any significant pragmatics to it.

Re: Dillinger's 'Most people need a little syntax'.

As we don’t have access to a reductionist description of cognition (am philosophically inclined to admit such a description is possible), we can only speculate as to where spatial processing might occur in the spectrum from surface features in syntax, to the deep semantic representation of programs. One thing the paper did suggest was that mathematical comprehension uses spatial processing less so than program comprehension. This seems significant, because mathematicians have to spatially structure their proofs to some extent on the page as well, to aid comprehension.

Separating parts of speech and expressing data flow layering in formal languages based on natural language tree grammar, are facilatated by the use of parens, but there are alternatives. Broadly speaking interlanguages (I mentioned in another comment here) involve widening the spectrum between syntax and semantics, to include an additional syntactic layer to handle the syntactic sharing of subexpressions, before semantics are introduced. Having a tree language system based on interstrings, that allows parts of speech to have multiple parents, obviates the need for parens for the expression of data flow layering. In the interlanguage Space, two levels of parentheses are used to separate storage, submodule and code declarations, another for arrays, and in code a further two or three (not involved in DAG layering per se), are the maximum required to express DAG dataflows of arbitrary depth and complexity. A sea of parens is not a requirement for program readability.

side note

I suspect that relating surface syntax to deep semantics may only be addressable in conceptual terms; having access to a reduction of cognition (even supposing such is possible in practice as well as in theory) might simply be unhelpful. Recall Douglas Adams's description of pan-dimensional beings' vast research project to discover the answer to the ultimate question of life, the universe, and everything, which after some millions of years determined unhelpfully that the answer is 42.


Until someone presents such a reduction, you are of course entitled to question its possibility. But if one were developed/found, would it not be useful in natural language processing, and contribute to the construction of a general theory of human-like artificial consciousness? Lots of practical and useful applications would ensue. Not sure if you are alluding to the unreadability of compiled machine code needed to run a machine, versus a high level program.


(I almost didn't make my above remark at all, for fear of sidetracking things again; but, having started this, I s'pose I oughtn't simply disown it...)

I've mentioned I consider myself a reductionist. In theory. In practice, I'm skeptical. But my small point was meant to be independent of how likely it is.

would it not be useful in natural language processing, and contribute to the construction of a general theory of human-like artificial consciousness?

Maybe, though not necessarily. But I suspect understanding how lower-level wetware gives rise to sapient thought wouldn't help us understand the thoughts it gives rise to, even if it gave us the ability to create devices that actually have such thoughts.


Thanks John, I have an interest in encouraging discussion on the idea of introducing a spatial element to syntax. Although this is a programming languages weblog, abstraction from spatial considerations is in the DNA of Lambda. So that kind of discussion may face an uphill struggle here, because spatiality is somewhat orthogonal to the idea of Lambda being the ultimate programming paradigm.
The creation of devices that have sapient thoughts would be a landmark advance, and things like the Qualia problem in philosophy would have less significance (or none in my view). Although I cannot grasp every detail on every level of abstraction of the way in which a high level program is compiled to machine code and runs, I can broadly understand the mechanisms underlying the translation between differing levels of abstraction. This gives a good intuition for understanding high level programs, and potentially thoughts in my opinion. Not sure if further elaboration would be productive given the blank spaces in the current state of the art.


I'm suggesting construction of a sapient machine might have no useful bearing on understanding stuff like qualia. (In passing: I've observed philosophers are often especially guilty of making distinctions that obscure rather than clarify; when a philosopher starts talking about "qualia" it may be time to consider giving up on them, just as I tend to do when an economist uses the word "equity". But, anyway.) I also think successfully building a sapient machine would turn out to be an evil, both because we'd be creating a class of owned people, and because we'd try to cut corners on the things that —sometimes— cause humans to become decent people rather than monsters; though I don't think we even know what to try to do, let alone have any chance atm of succeeding. Anyway: programming languages.

Abstraction from spatial considerations is in the DNA of the lambda paradigm

Maybe, but we're not limited to that on LtU, despite the name :).

Something it took me years to pick up on: Back in the 1930s there were three main models of general computation: general recursive functions (Jacques Herbrand and Kurt Gödel), λ-calculus (Alonzo Church), and Turing machines (Alan somebody). General recursive functions take an equational approach, λ-calculus an applicative approach, and Turing-machines a mechanical approach. And once general recursive functions and λ-calculus were proven equi-powerful, Church was convinced they were both capturing the general notion of calculation, but Gödel remained skeptical — until both were proven equi-powerful with the nuts-and-bolts Turing-machine model. Then Gödel was convinced. Not to put to fine a point on it, Turing-machines use (one-dimensional) geometry. Bridging that gap between abstract theoretical elegance and concrete mechanics also came up again in Gordon Plotkin's 1975 solution to the problem of small-step rewriting semantics: you have a λ-like calculus that's mathematically elegant, and a syntactically similar but deterministic-rather-than-well-behaved operational semantics (that you can be certain really is describing the language you want to describe), and you prove correspondence theorems connecting them so you can use the well-behaved calculus to reason about the nuts-and-bolts operational semantics.

So it does seem there's precedent for geometry having a place in the larger scheme of things.

Theoretical elegance and brutalism.

Theoretical elegance is a bit like happiness, it is an emergent property and we should be concerned by its absence, but wary of pursuing it as the most important initial goal, especially if practical utility is considered important. My focus on incorporating a mechanical model, and a compiler oriented semantics for a general framework, is desire to bootstrap not just novel languages but also machine models, to address programmability and performance issues in parallelism and concurrency. The problem with the purely interpretive approaches you mention (and the TM) is that they although they may a geometrical aspect to varying degree, are not IMO parallel friendly despite the referential transparency, offer no insights into new machine models, and are based on conventional tree syntax exhibiting the Single Parent Restriction (SPR) I mentioned earlier.

My work suggests new language systems are needed for data structures and languages, because SPR is irreparably defective. They need a machine environment for parallel execution semantics, but do not if avoidance of the SPR is all that is required for data structures. New machine models are needed because the Von Neumann processor network was an ad hoc parallelisation of the Von Neumann model, is neither scalable nor suitable for the executive control of parallelism, whereas alternative models (GPUs, FPGAs) are effective but application specific.
Complete abstraction from hardware in programming languages is overkill, because important opportunities for orchestrating parallelism, and for the implicit avoidance of data hazards and deadlocks are lost. It is claimed that an abstract machine environment can maintain abstraction, whilst avoiding side effects.

I share your concern about the ethical implications of artificial sentients, they should be accorded legal rights of personhood. Having said that, a blueprint for human cognition would reveal the structure of non-sentient subsystems which could be used to perform even more of the tedious repetitive tasks that computerisation has already replaced. I agree that encouraging decency over brutalism, is more important now than ever. If artificial sentience is possible (and I intuit it is), then it almost certainly will happen, given our compulsive curiosity and the commercial opportunities. Bad actors will be able to create new forms of evil, and there should be a great focus on trying to block them. Unfortunately this kind of phenomenon has happened throughout the history of technological development.

If you're looking for a parallelizable geometric model...

If you want an alternative to Von Neumann architecture that's completely parallelizable and reasonably well-developed in terms of its theory but is mechanistically geometric in definition and semantics, you should investigate Petri Networks.

A Petri network is a graph whose edges are pathways along which 'tokens' may travel. Tokens have different 'colors' and each edge is designated as a pathway for one color of token. Each node contains room for a limited number of tokens and may under some circumstances create, delete, or re-color tokens. Tokens operate sometimes as semaphores controlling access to parts of the network, and sometimes as 'bits' carrying information (a base-10 register for example has nodes that can contain 0 to 9 tokens).

A computation works by starting with tokens in a set of nodes, and proceeding to move tokens as allowed along various pathways until a state is reached in which they cannot be moved. If there is only one such state, then the program is considered deterministic - even if there were many ways to reach that state. There can be many ways, and nondeterminism can be represented, because there is no order imposed on the fundamental operations. Any token eligible to move, may move at any time. Any token that may be deleted, changed or created at any node, may be deleted, changed, or created at any time.

"At any time" means local determinations of actions - parallelism is implicit, requiring no central coordination or "flow of control" regardless of the size of the network. And unlike Chomsky Grammars it doesn't require a global state that must be considered by each node.

It is a peculiar model, but there is theoretical work on it that is in a... I'm going to call it a middling state of development. Isn't mainstream and intensely researched like Vonn Neumann architecture, Turing Machines, or Lambda Calculus, but it is well developed. Its state of development is in the tier below Graph Theory or Chomsky Grammars, about even with Automata theory, and considerably better developed than Power Automata.

You can call that raw quality of not having been researched all that much "low hanging fruit" or "technological risk" depending on your perspective, but it is an alternate model to think about and scales (much) better than any of the single-flow-of-control paradigms.

Graph based models

Petri’s great invention of the idea of a marking is a fundamental aspect of parallelism, and found it’s way into the control mechanism of the Synchronic A-Ram described in five paragraphs in section 3 of this paper. Petri Nets are mid level models and were introduced to describe and establish safety conditions for non-deterministic systems composed of nodes/modules with many to many relationships, e.g. concurrent programs, and workflow management. The basic structure used is that of a graph, so a Petri Net description is a set of edge tuples, rather than a textual program. They are usually implemented on networks of Turing Machines/processors, are not usually thought of as a primitive formal model of computation, and have not challenged Von Neumann. They could be repurposed for modelling asynchronous and synchronous digital circuits. However, the implementation of an n-input logic gate in one transition requires 2 to the n places. (Similarily Winskel’s Event Structures require 2 to the n events to model an n-input logic gate in one causation.)

Kahn’s later Dataflow Models are also mid-level, and are also usually implemented on networks of processors. He seems to have taken inspiration from Petri Net tokens but he did not cite Petri — it’s possible the idea was reinvented. Johnston et al in an influential survey [1] describes software and some hardware related issues with dataflow models: the token system of program control cannot easily support iteration and data structures for dataflow programming languages, without adding significant complexity and potential deadlocks to programs. Another contender is graph-like synchronous programming (eg the Ptolemy II environment), which are Dataflow Models with feedback loops introduced to implement parallel executive control for embedded systems. For understandable reasons, some designers did not fancy using process algebras to describe Real Time Operating Systems. Systolic computation on grids of coarse grained functional units lacks an abstract model beyond the coarse grained, systolic grid itself. All these approaches are somewhat domain restricted; the programmer is obliged to cast every program in terms of tokens and streams, which are problematic for data structures and not suited for control intensive operations.

Alpha-Rams are put forward as fundamental ground models. All of the above models of computation can be compiled as higher level programs, and given true parallel semantics. I have suggested elsewhere that nondeterministic concurrency can readily be simulated on the A-Ram as well.

[1] Johnston, J. R. Paul Hanna, and Richard J. Millar, “Advances in dataflow programming languages,” ACM Comput. Surv. 36, no. 1 (2004): 1-34

sapient programming

(While building sapient beings without giving them personhood rights would be, well, vile, building them and giving them rights would be a different kind of disaster. A large part of our societal structure is geared toward raising our young well and limiting some of their rights until they've reached an age where, hopefully, they're wise enough to wield them responsibly; one shudders to think how badly we would bollix all that up when dealing with sapients that can simply be constructed, one sort of mess if we could control how they were going to turn out and another sort of mess if we couldn't.)

I've been trying to imagine a better way to program since I first dealt with general-purpose computers almost fifty years ago. Choosing primitive elements of computation is a game that led me to fexprs. While spatial programming would be thinking outside the box, I'm now thinking maybe it's not far enough outside; like, perhaps the basic primitives in programming should not be exclusively on the formal side of the formal/sapient gap.

Nightmare fuel.

Well, if the conversation comes round to AI and philosophy ... there are a few bits of nightmare fuel to point out.

We're going to figure out Artificial Intelligence capable of consciousness sooner or later. I hope we do it right, because sooner or later it's also going to figure out us.

Sanity is unlikely to be the default condition of the first kind of mind we discover a way to create. I'm afraid we may not be able to make a conscious AI that is also sane, until we understand how and why the first generation are crazy.

And I don't think you're going to get intelligence "shaped" like ours until you evolve it in situations that teach it the value of cooperation. Such situations also have the potential to teach it strategies of cheating, because there is a niche for defectors of various kinds in any environment where cooperation (and therefore trust) are implicit for others.

The ability to simply create copies of a conscious being is a difficult power to determine a proper moral framework around. It's a "gray area" when an employee comes to work and gets copied a thousand times, then at the end of the day a thousand are deleted and one goes home none the wiser. The company leverages one stream of time-in-service to get a more experienced employee, but creating and deleting a thousand of that employee every day. I don't think that's quite okay if we're talking about a sapient being.

Then you get to the fact that humans aren't all sane either and have to consider that some kind of sadist is going to create, and then torture to death, a new copy of his or her favorite victim every week. And yes, that's horrifyingly wrong, but how will we ever find out that it's happening, in order to stop it?

I think our only real hope of developing AI is via some kind of undirected evolutionary algorithm with a simulation of a 'relevant' environment where your desired traits are also so I've already tossed most kinds of "provability" and "explainability" out the window.

And then there's the development cycle. We program by creating and destroying a thousand intermediate versions of the program - most fatally flawed in important ways we don't discover until we interact with them. What's version 3.0 of your AI going to think when it sees you start work on version 4, and it has no memory of ever having been version 2?

Superseding Lambda

Both Shutt and Dillinger raise troubling scenarios which I had not bothered to consider, you are right it is very scary. Artificial sentience is some ways off thankfully (I now think). Let’s hope legislators have enough time to deal with this.

Re sapient programming, I do not suggest that spatial-style programming is the ultimate answer, but do claim it supersedes Lambda. There may turn out to be sapient styles of programming which supersede everything around today.

sapience, space, hardcopy

Re how to deal with artificial sapience, I think we first need to figure out how to deal with sapience at all. We literally don't know what to do with ourselves; our civilization is still on the leading edge of a possible economic disaster because people are losing their economic value as we concentrate on finding ways to do things with machines instead of people (people have high long-term operation-and-maintenance cost). As usual, the economic impact is likely overestimated in the short term and underestimated in the long term. Bottom line, I'm interested in how to make the most of sapient people rather than how to replace them with machines, and, in connection with that, how to alter our economic system so that it incentivizes the former rather than the latter.

Re spatial-style programming. I agree there may be some interesting things to be found. Speaking of which, here's a point I'm just not sure where to put, that ought to hook in with the spatial thing and yet also seems to be bringing in another element of some sort. I've just been contemplating some significant changes to code I've been adding to, somewhat haphazardly, for about ten years, and it's an awful mess because even though I feel there is an overall rhyme-and-reason of it, I can't actually wrap my head around it — and then it occurs to me that what I really want to do is something I used to do back in the 1980s: print out the whole thing on fan-fold (or other continuous form) paper, and unfold/unroll it all across the floor of our living room (the biggest floor space we have). This technique will degrade when dealing with more than a few thousand lines of code, but, still; nowadays we try to view things one screen-full at a time, like studying a world we can only look at through a keyhole. I've never quite been able to articulate the advantage of printed books over electronic ones, but when I'm trying to extract wisdom from an old scientific journal I expect to get almost nothing out of an electronic archive whereas I could spend days on end browsing a physical archive.


Sorry John, I found that you had posted a comment at the same time as an edit to my last post. Apologies.


(Heh. It happens.) I'm not sure what you mean by 'superseding' lambda. The challenge with PLs is to forge a link between thoughts and computations. Are we still talking about computation, but expressing it in a more spatially-oriented way?

Why spatial supersedes lambda for parallelism and concurrency.

My take on the spatial approach argues that the study of parallel and concurrent programming languages should not be abstracted from hardware, if we wish to facilitate the orchestration of parallelism, and the implicit avoidance of data hazards and deadlocks. It proposes that program modules cannot simply be called as an abstract software entity divorced from hardware, each mention of a submodule has an instance similar to the way component instances are numbered in digital circuits on a chip. An instance is the first link in a chain, that reaches all the way down through layers of abstraction to machine resources. Modules have a schematic or circuit character, and if machine size is large enough, instructions are executed in situ without fetch.
I acknowledge that specifications for translating functional languages into the λ-calculus have been devised, but no pure or typed λ-calculus simulation environment for a high level language has ever been devised, that can feasibly simulate non-trivial programs. A lot of effort has gone into making hardware abstracted parallel re-writing of lambda graphs the basis of a declarative parallelism, which has really not borne fruit or entered the mainstream. Synchronic computation feasibly provides an executable compiler oriented low level parallel semantics for any model of computation, and a roadmap for developing safer and higher performance environments and architectures. It is for this reason that I claim the spatial approach supersedes lambda.
I agree with your broader point about people losing their social and economic value because of robots and computerisation. The current socio-economic system is tragedist and brutal, we could do a lot better for humanity and the environment. There are correspondences between programming models and political ideologies, in that they are both concerned with the question of which methodologies and levels of abstraction might be employed, to effect change in large complex systems. A good theory of parallelism might have the potential to be a conceptual antidote to the political form of tragedism, which characterises the idea of effective collective action for the common interest, as always being in some sense incoherent, or impossible in practice.
I take your point about how easier it is review code when it is all printed out, and how browsing a physical archive is more satisfying. For the Space compiler I have an A0 poster printed of the call graph for about 40,000 lines of C code, which helps. It would be cool to have a very large wall mounted screen, that allowed you to display that and bring up module text when desired.


Some thoughts from my recent studies on the evolution of human thought (most recent blog post, fwiw, yonder). Per Eric Havelock, about 2500 years ago there was a massive change of cultural mindset, from orality to literacy. In orality, culture is passed along through a poetic tradition; think of the Iliad (or, for that matter, Beowulf); with severe constraints on how things can be expressed. In literacy, culture is passed down in words that are stable because they're written down (rather than requiring the regularities of poetic form). Literacy makes possible a whole new kind of thought: abstract thought. To my understanding, the conscious self more-or-less-requires a literate culture, because if you want to model society as a bunch of individual people and relationships between them, those relationships are abstract connections, which cannot be conceived via orality. The oral model of society partitions it into, basically, primal motivations, a.k.a. gods, which serve as actors in the poetic story lines. (Note that the Iliad starts something like "Chant, O Goddess, the wrath of Achilles" — the two actors are Goddess and wrath, with Achilles not even an actor but merely a qualifier of which wrath we're talking about.)

A single phenomenon may be analyzed in different and mutually incompatible ways; rather like breaking down a two-dimensional array of numbers into either rows or columns. Analyzing it one way may produce a view that's wildly different from how it would look when analyzed a different way. (Thus, analysis of society into conscious selves versus Olympian gods.) It sounds to me as if you're suggesting that the way we're analyzing computation doesn't support clear understanding of parallelism/concurrency, which naturally leads one to wonder what alternative analysis might be more suitable. I'm doubtful that parallelism/concurrency is incompatible with abstraction as such, but I wouldn't be surprised if it clashes with the abstractions we've been using. For the record, I do not favor decomposition of parallel programs into gods, colorful though the notion is.

Re large displays for grasping the Big Picture of things (figuratively as well as literally), seems to me that when science fiction depicts holographic interfaces not restricted to some terminal, it tends to depict them as huge. Some of the UIs I saw in my glimpses of The Expanse (though alas I only got to see a few episodes) made me seriously envious. On a less radically futuristic note, if I could have one large wall of a room made into one huge computer display, I'd make it a wall of our living room —which is where we have our good music sound system— and I'd play this.

Linguistic schizophrenia and conventional tree syntax.

Yes, I am suggesting that the use of syntax with SPR as a template for formal languages (and even natural language) has had a negative impact on our capacity to describe and reason about complex objects and situations with many to many relationships. The inability to directly share subexpressions contributes to code bloat in software, disconnected representations of environments, and a kind of linguistic schizophrenia.

The cognitive significance of the shift from oral to literate sounds plausible, and the Expanse certainly had some moments, though later series lacked the early impact. Would be surprised if an app for exhibiting live call graphs on large screens with an option to view internal module code is not available, might anyone know of one?


Although I do as mentioned perceive that a well-chosen picture can be worth quite a few words (link), I skeptically reserve judgment on the single parent restriction being too significant a limitation on the expressiveness of language generally. Conventional structural linguistics treats grammar as a self-contained formal structure, but one way or another there's always semantics. In my master's thesis (related link) I was quite enthusiastic about the clarity advantage of trees in expressing grammatical structure. λ-calculus has computational behavior that, to a degree, follows the hierarchical structure of the syntax tree (I tend to call its tree structure the "geometry" of λ-calculus, having gotten to thinking of it that way from my exploration of its possible analogy to basic physics (in case of morbid curiosity: link)); but even λ-calculus has also remote connections expressed through variable bindings. Abstract thought should belong to sapient semantics rather than formal syntax, so the very tricky choice of how to think about parallelism/concurrency, which I mentioned earlier, oughtn't be significantly impeded by tree structure. That said, would spatial representation make a big difference in our ability to express parallel/concurrent algorithms? This is honestly unclear to me. I feel spatial representation is under-exploited generally, and parallelism/concurrency is an especially complicated sort of pattern therefore ought to especially benefit; but I have doubts that the benefit to parallel/concurrent programming would be disproportionately larger. The benefit might be a constant factor that's the same regardless of whether the programming involved is parallel/concurrent.

Why SPR is bad for parallelism and concurrency

When once asked what his thoughts were on a technical issue in addition to the marks on a page he made for a chunk of mathematics, Richard Feynman made the point those marks on a page were his thoughts. I do not usually consider sapience in formal considerations, because the detail is opaque to us. The transformation of marks on a page from one form into another is all we have formally, in being able to understand the relationship between syntax and semantics.

A λ-calculus term can have either it’s own operational semantics based on beta reduction and normal forms, or be given one in domain theory. Sequentiality is inherent in beta reduction, parallel rewriting of lambda graphs is a different model of computation and has its own issues. Domain theory is math and abstracted from computation, so is orthogonal to the issue here. The remote connections expressed through variable bindings in λ may occur anywhere in non-trivial terms, and a simulator/machine must perform work searching or through indexing for bindings, thus presenting a barrier to efficient execution. λ-calculus has a kind of zero time semantics, because beta reduction occurs in one step, no matter how lengthy the term, or how randomly distributed the relevant variable is in the term.

In general, DAGs are logarithmically compressed with respect to their expressions in tree syntax with SPR. Without being able to syntactically express sharing of subexpressions, any branch in a SPR tree may be arbitrarily long, and identical parts of speech may be randomly repeated anywhere in the tree (like λ-calculus). This presents complexity and overhead to a parallel semantic processing of the expression. The spatial character of interlanguage expressions is exhibited by allowing multiple parents for a part of speech, allowing many-to many relationships to be represented directly. They have a highly regular tree structure in which only the rightmost and rightmost but one branch may be indefinitely long, and are designed for efficient parallel semantic processing.

Feynman diagrams

If our paradigm for the sort of thing we're talking about is Feynman diagrams, the key would seem to be finding a really good visual idiom (the advantages of Feynman diagrams don't accrue to just any visual form, after all, but specifically to that sort of diagram). And one would also still have to figure out good abstractions for moving up to a range of scales.

Why SPR is orthogonal to parallelism and concurrency

You defined SPR as a property of syntax ("syntax cannot directly share sub-expressions. I call this the single parent restriction (SPR)"). This definition does not constrain semantics, i.e. we aren't limited by substitution or local reduction.

Graphics processing pipelines, Kahn Process Networks, and many other models more suitable for concurrency and parallelism can be expressed an using syntax with the SPR property. Really, SPR imposes no restrictions in *what* you can model, only in how directly you express the model.

There are good arguments for a more spatial notation for many problems. But I think parallelism and concurrency isn't one.

Re: Why SPR is orthogonal to parallelism and concurrency

Synchronous Digital Circuits can be for example be represented in SPR based RTL without subexpression repetition, because semantic notions are available in the framework. The same applies to Dataflow Models. I do make reference to resulting implications for parallel semantics. I agree SPR imposes no restriction on what you can model, it’s a bit like saying sequential computation can compute anything that can be computed in parallel. But it does obscure insights and opportunities for new language and machine concepts.

vertical and horizontal

It's occurring to me that, in considering λ-calculus for this discussion, I've been confusing in my mind two different kinds of syntactic sequentiality. In terms of the way parse trees are usually represented, these would be horizontal and vertical. Vertical sequentiality —which isn't strictly sequential as such, though it is at least some sort of restriction— is due to the "SPR". Horizontal sequentially, which really is sequentiality in the strictest sense, is due to the fact that what we're writing is a string. So, summarizing a significant part of that, the horizontal constraint is due to what we're rewriting, and the vertical constraint is due to how we're rewriting it. Put another way, the horizontal constraint is syntactic while the vertical is semantic.

I'm not convinced the vertical/semantic/SPR constraint can be reasonably dispensed with. It seems rather related to our ability to conceptualize what's going on, so that eliminating it is, in a sense, eliminating our ability to understand; this is, broadly, consistent with the observation that the vertical constraint is more semantic than syntactic (even though this was originally formal semantics, i.e. rewriting rules, rather than sapient semantics). I also note that, in my analogy between rewriting calculi and basic physics, although the tree structure of a term is what I analogized to geometry, this hierarchical structure wasn't preserved on the physics side of the analogy. The horizontal constraint is clearly syntactic, and could be dropped without compromise to the vertical constraint.

Something's felt a bit skew to me, here, about the discussion of diagrams on one hand and the SPR on the other. I'm now thinking it's this: the SPR is clearly about the vertical constraint, but the introduction of diagrams is horizontal. I'm amenable to the introduction of diagrams, but that's syntactic, whereas the idea of dropping the SPR seems to me to be a much more fraught question that's likely to be confused if one allowed it to get mixed up with the syntactic diagram-versus-string question.


For the purposes of λ-calculus and the reduction rules, SPR cannot be dispensed with. Although the lambda paradigm has been invaluable in generating PL concepts, it is from the point of view of execution semantics obsolete IMO. Outside the λ-calculus, the distinction between syntax and semantics varies to context. The verticality you mention constitutes a layering of parts of speech, which for conceptualisation is reflected in interlanguage and diagram-like DAGs as successive horizontal layers without subexpression repetition. Not sure how that might cause the confusion you refer to, are things clearer?


My point was merely that I hadn't noticed this distinction before; I see the failure to notice as a source of confusion. I do agree that the computational ordering imposed vertically on λ-calculus is inherent to λ-calculus. It's worth noting in that regard, I think, that Church didn't invent λ as a computational device at all, but rather as a precise definition of what a variable means; the SPR is thus, by intent, about variable scope, with the imposed computational order as an unintended consequence. If there's a lesson there for parallelism, likely it's to beware of inadvertent entanglement between your computational device and your scoping rules.

A small note regarding syntax trees for natlangs. There's a distinction between constituency grammars and dependency grammars. Chomsky grammars are, as I understand it, constituency grammars, meaning the emphasis is on the pieces, whereas with dependency grammars the emphasis is on the connections between the pieces. A bunch of the grammatical concepts most bandied about these days (e.g., head marking versus dependent marking) are from the dependency tradition rather than the constituency tradition; I mention this merely because 'layering of parts of speech' seems to favor constituency rather than dependency.


Thanks for the mention of dependency grammar. Will contextualise it for a conference paper in preparation.

stable, spatial mental maps

Term-rewrite systems and graph-rewrite systems have a clear spatial aspect, but the shape of space is unstable. Cellular automata have a stable spatial structure, but the scope of activity and communication (e.g. floater patterns) is unstable.

Only to the extent a program's spatial model is also stable can we simplify mental maps. Having a stable spatial structure is also convenient for presenting and debugging program behavior, e.g. live documents with state rendered in-situ. Kahn Process Networks, Functional Reactive Programs, Flow Based Programming, and Blackboard Systems all have both properties to a significant extent. I'm sure there are many other suitable models.

Yet, the most popular, most conventional programming models rely heavily on first-class functions or object references, which are frequently created, communicated, and destroyed during runtime. The resulting structure is unstable except for whatever ad-hoc structure is imposed by the software architect. And I think this hasn't been good for comprehension of programmers.

Simplifying mental maps.

Comprehension is of course hard for chaotic and dynamic programs, especially non-deterministic ones. The mid-level, spatial/dataflow programming models you mentioned are better behaved and easier to understand, but either rely on non-textual diagrams which makes manipulation complicated, or descriptions in conventional tree syntax which obscures the network of relationships. The architectural structure of the Field Programmable Object Array-type machine architecture I am considering is fixed like an FPGA, but whose functional nodes and links are reconfigurable. The system will be able to support abstract datatypes and virtual modules, and I intuit space shape and communication patterns can be dynamic, without impacting on mental mapping too much. But I am promising jam tomorrow, so at best I can only ask you to reserve judgement.

SPR vs. Syntactic Entanglement

the single parent restriction (SPR). Incorporating SPR into the most primitive syntactic category above symbol level, results in difficulty in representing many to many relationships on the semantic level

Although SPR does have some disadvantages, it also has a few advantages. Relevantly, it limits syntactic entanglement between program expressions. Reduced entanglement at the syntactic layer simplifies metaprogramming, modularity, composition, and code reuse across contexts. One reason I favor combinatory logics over lambda calculus is that lexically bound 'variables' are also a form of syntactic entanglement.

A compiler for SPR languages can add a deduplication pass to deal with undesirable "repetition of terms, explosion in expression size". This is especially valuable in context of metaprogramming and inline optimizations. Compression is easy to automate, whereas barriers introduced by syntactic entanglement don't have any known automated solutions.

Kahn Process Networks are semantically spatial. However, syntactic expression of KPNs may vary wildly from spatial box-and-wire diagrams to conventional procedural code that 'constructs' an abstract KPN object. I prefer a hierarchical process model: A program represents a process, and may contain subprograms representing embedded processes. Every process has an interface of external data ports, and the program can wire between ports. This design can combine benefits of hierarchical structure with benefits of stable, spatial structures. The hierarchy essentially becomes part of the spatial model.

I could render a generated KPN as a box and wire diagram, but I can also support abstraction over many potential diagrams aligned with the hierarchical structure and single-parent restriction.

Re: SPR vs. Syntactic Entanglement

A DAG (as opposed to an SPR) based program with a lot of subexpression sharing (syntactic entanglement) might create the problems you suggest, if one then dispensed with any attempt to employ modularisation and encapsulation in program description. But the latter two are essential for Space, and I would have thought for any high level programming environment. I can think of one or two situations where you may not want subexpression sharing, but provision could be made in the framework for optional repetition. The combinatory logics I have seen have inherently sequential reduction, and would require some kind of parallelisation, like λ graph processing instead of λ term reduction.

Compilers for SPR languages do have an optimisation phase that roots out subexpression repetition, but this is not the only issue arising out of SPR. The strategy that is being proposed calls for software not to be studied in splendid isolation, but for hardware to be closely aligned from the outset, in order to address programmability and performance issues in parallelism and concurrency. If one leaves SPR to be sorted out later under the hood as it can be, then insights and opportunities are lost for new language and machine concepts. For example Space dispenses with the need for Static Single Assignments, and interlanguage gave rise to a new class of formal machine models with unique characteristics (see slides and video (in two parts) .

Kahn process Networks and Dataflow Models generally (and there are many attempts at enhancements) acknowledged the primacy of the DAG, and the need to consider new takes on the Von Neumann building block, and had the benefit of describing dataflows as graphs rather than trees. But DM is not a fundamental low level model of computation, being understood to be a network of Turing Machines, Finite State Machines, or Von Neumann machines. Program control is oriented to tokens, which tends to restrict their use to DSP like applications. I am suggesting the Von Neumann network model and its derivatives, and the former's associated multi-threading have in general not served us well, and alternatives need to be considered.

Module names are expressions, too.

Using an 'import modulename' expression from multiple modules is technically subexpression repetition. There is no DAG in the syntax. Just another pronoun.

How a PL interprets an import expression is a question of semantics. It might not form a DAG even in the semantics. My Glas language simply provides access to a module's compiled value, for example. A conventional language might treat modules as static objects, imports as references and namespace manipulation.

Most PLs get encapsulation backwards. They protect modules from users when the real need is to protect users from modules. The conventional coupling of encapsulation with module systems does vastly more harm than good.

In any case, SPR seems unrelated to modularity except insofar as your syntax for referencing and composing modules is either SPR or not.

Re: Strategy

There is more than one hardware description language with the SPR syntactic property, and I'm sure you can imagine non-SPR languages well divorced from hardware. I don't believe that poor alignment with hardware is an "issue arising out of SPR". I see no causal connection.

For a PL design, there is a huge list of system qualities and a list of cognitive qualities that might receive different levels of attention, priority, conflict resolution, and compromise. Close alignment with hardware is related to several qualities, especially efficiency. But it isn't wrong for a PL designer to place higher priority on other qualities.

I do agree that we should consider alternative hardware models and architectures. I just also believe this is a totally separate issue from SPR syntax.

Re: Module names are expressions & strategy

Let’s see if I understand you properly. In Space a module’s submodules can declare more than one numbered instance, where multiple instances are each compiled, and are required in order to implement parallelism. It seems both Space and your Glas both provide access to a module's compiled value. Text of program code is only composed of DAGs and module replication and program control constructs.

Compilation of a module first involves constructing a semantic DAG of submodules, where each successive layer of the DAG represents a higher level of submodule abstraction. Space tries to protect modules from users, and to protect users from submodules, whose internal parallelism remains protected from meddling at higher levels of abstraction.

I mentioned earlier in this too long thread, of a non-SPR language for data structures which does not need to reference a machine environment. I argue they do need an abstract machine environment for parallel code, which is usually the preceding layer of abstraction. I offered an interlanguage treatment of functional 2-ary terms in predicate logic, incorporating an abstract machine environment, which preserves the Tarskian distinction between syntax and semantics. So although aspects looks semantic, interlanguage is syntax in the Tarskian sense.

Having initially looked at the list of system and cognitive qualities, the potential for having to consider trade-offs does not immediately seem clear within the context of Space, but to check that properly would take time. Generally my strategy is led by performance and efficiency.

I say poor alignment with hardware is an issue arising out of SPR, because SPR has obscured the study a lot of things, including formal machine models, and hence more physical ones. I would not suggest an immediate causal connection, but an indirect one. So in my view this is not a “totally separate issue”.

Ambient calculus & Bigraphical Reactive Systems

Both Luca Cardelli’s Ambient calculus and Robin Milner's Bigraphical Reactive Systems seem to offer promising and intuitive spatial formalisms. Sadly, it seems little development has occurred in the last decade. Most sad to me is the abandonment of Bi-graph systems with the death of Robin Milner. Luca Cardelli, however, seems to have abandoned his Ambient calculus in favor of something that more resembles the pi-calculus. There is some irony here as Milner (one of the creators of the pi-calculus) was moving towards something heavily influenced by Luca’s Mobile Ambients.
I have wondered, is there something problematic about these formalisms in particular, or did they just never escape the computer science ivory tower they were born into.

Problems with Process Algebras and their offshoots.

In the beginning of the Process Calculi paradigm, the choice of underlying machine model was considered to have been resolved, and taken to be the problematic Von Neumann processor network, (viz the Communicating Sequential Processes). Elaborate structures have been devised to incorporate, or subsequently add, a true concurrency and resource semantics to Process Algebras. I question the original premise of using an SPR based algebra, abstracted from space and hardware, to address parallelism and concurrency. In addition I question the decision to put to one side the hardware issue, and to take the notion of a communication channel as fundamental. These generate deadlock and other hazard potentialities, which much of the effort in Process Algebras is designed to address. They never needed to be there in the first place. The later enhancements you mention do seem to take more note of the spatial aspect, but seem complex and very elaborate, not helpful for escaping the ivory tower.

Graph rewriting

I believe that bigraphs were not very successful in themselves, but motivated further study of graph rewriting from an abstract point of view. This was very influential and lead to an algebraization of graph rewriting with new mathematical tools like adhesive categories, double push-out rewriting, and the like. See e.g. the work of Lack, Sobocinski, Bonchi and so on for more on such topics.

Curious parallel with art appreciation.

Some kinds of cognition involve less motor or spatial awareness (eg mathematical thinking). A neuroaesthetic study of patients with Parkinsons disease, throws light on how parts of the brain involved with movement (and presumably spatial awareness) are involved with art appreciation. The complexity leaves me wondering if we will ever get a handle on the intricacies of how we function.