Cognition and Coding

Fascinating research from MIT on human cognition and coding. The brain uses spatial navigation as well as math/logic areas to understand programs. Our apparent need for at least some spatial processing when programming, may lend additional support for an explicitly spatial syntax and semantics for programming languages.

The full article is published here.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: spatial reasoning in

Re: spatial reasoning in programming, I think it makes sense. From birth, we develop an intuition for locality, Einstein's "continuity of action". That's likely connected to why lexically scoped variables are easier to reason about than dynamically scoped variables, why equational reasoning and referential transparency are easier to reason about than ambient side-effects, and so on.

Drilling down into code is applied reductionism, but reductionism is harder to apply if it's hard or impossible to isolate a subsystem due to non-local effects (GOTO, mutable state, particularly static mutable state, thread preemptions, etc).

The particular form in which these are expressed in current languages aren't necessarily the ideal form, but the principles are probably sound.

Locality and FP

hi naasking, I agree physics has a role to play in computer science, but was surprised by your jump from spatial thinking in programming to locality and then to FP-type precepts. I imagined that spatial considerations would incline one to consider environments which are not abstracted from state and space. (In my view GOTO and static mutable state have advantages and do not have to hinder the isolation of subsystems, providing their use is properly contained, eg lexical scoping. I also note there are neural connections in the brain that are much longer than the majority of ‘local’ links, make of that what you will.)
I refer in my reply to John Shutt to the absence of a functional language in the above study, it would be fascinating to consider a range of them in terms of FP purity. A guess is that they might have profiles closer to the math/logic side, than to the spatial side of the brain.

In my view GOTO and static

In my view GOTO and static mutable state have advantages and do not have to hinder the isolation of subsystems

I agree they have advantages and do not have to hinder isolation, but they absolutely can and often do because they're not suitably restricted. The non-locality of QM is instructive: it's just enough to permit interesting behaviour, but not so much that it permits superluminal signalling which would lead to paradoxes (integrity bugs in a program in this analogy).

That's the kind of outcome you'd want for taming mutable state and control flow.

A guess is that [FP languages] might have profiles closer to the math/logic side, than to the spatial side of the brain.

I'm not sure. Qualitatively, I think I just use different spatial intuitions in FP because everything is local, but I still think in terms of pipelines and how this value here will influence an output "over there", further down the pipeline (or conversely, how this incorrect value came from "over there" further up the pipeline).

Miscellaneous thoughts

I agree that programming involves geometric thinking. Admittedly, I have something of a geometric/visualization turn of mind, which I gather isn't so for some people. (To date, I don't venture much into the PL-design implications of differences between cognitive styles of programmers, because I find I can't reason about that without my mind blowing a fuse.) By my observation, it's not just me, though. Flowcharts were a thing, back in the day, and I've heard of programmers who were never as effective after flowcharts went away. People keep trying to make visual programming work.

Which said, I don't trust the feasibility of learning about thought processes by studying which regions of the brain are activated. Sure, the brain gives rise to the mind, but I don't think the causal connection is remotely straightforward. Reductionism calls for the lower level to give rise to the higher, but that doesn't mean you can seriously expect to understand the higher level by studying the lower. Attempts to reason from brain impulses up to semantics put me in mind of a (partial) line from The Princess Bride: "I do not think it means what you think it means."

I also perceive that programming is a linguistic activity. All software that truly lasts is text-based (witness TeX). Despite those persistent and revealing efforts to make visual programming languages work, we still use text-based PLs; I submit there is a reason for that.

Re naasking's point about non-local effects, though, I would suggest that programming requires non-locality; like quantum mechanics, you can't get away from it, and if you try it might seem okay for a while but sooner or later something will go horribly wrong.

An explicitly spatial syntax for programming.

Given the blanks in our knowledge of how cognition actually functions, I agree evidence from low level brain activity is at best circumstantial and cannot be used to demonstrate the cognitive reality of higher level processes. I am reminded of the futile attempts to prove the psychological reality of transformational grammar (some discussion here from 1978). But such studies can offer clues, and I wish that this one in particular had included a standard functional language abstracted from hardware and a notion of space — it may have given a different result and be of relevance to naasking’s intuitions.
I agree programming is essentially a linguistic activity, but I submit there is a reason why visual programming has such a draw — it is better at representing many to many relationships than the standard tree syntax found in programming languages, which is inherited from natural language. A conventional tree has non-leaf nodes which belong to syntactic categories known as parts of speech, and such nodes can individually only be a child of at most one more complex part of speech — hence the syntax cannot directly share sub-expressions. I call this the single parent restriction (SPR).
Incorporating SPR into the most primitive syntactic category above symbol level, results in difficulty in representing many to many relationships on the semantic level, giving rise to things like pronouns, repetition of terms, explosion in expression size, labelling of complex expressions, and un-necessary complexity.
One solution is to ditch natural language grammar as a template (why assume it is a suitable template for formal languages?) and to devise an explicitly spatial environment incorporating restricted tree forms and an abstract memory, which allows parts of speech to be children of unbounded numbers of parents, called interlanguage (no relation to Selinker’s linguistics concept of second language acquisition). A brief summary can be found in section 2.4 of a an 8 page paper. A shorter summary of synchronic computation can be found here.

Yea, verily

I'm substantially in agreement with most or all of that, yes. It is a fascinating subject, and even circumstantial clues are of interest. Small sample size, which I see is mentioned in the full article (as one should expect of a properly cautious scientific paper), would likely present an obstacle to studying more detailed distinctions between programming paradigms, especially as one would really want programmers fluent in various paradigms. I find myself wondering, come to think, about people communicating fluently in Lojban (speaking of fluency and sample size).

My own view (btw) on the universal grammar hypothesis, based on my wrangling with the origins of sapience, is to substantially reject the metaphor of a coding/decoding device interceding between the sapient mind and its communications interface with other sapient minds. As I put it in a recent blog post (six months ago seems awfully recent, in the skewed subjective time of this year), "[...] given that there has to be some sort of thought engine there at all, suppose it's a powerful engine and I see no call for anything more than that; don't bother with a fancy interface unit to code and decode, just hook up a transmitter and receiver and let the things resonate with each other directly" (link).

Re naasking's point about

Re naasking's point about non-local effects, though, I would suggest that programming requires non-locality; like quantum mechanics, you can't get away from it

The tide may be turning! I think the analogy to QM works well though: once you start down the path of non-local effects, you give up your ability to assume the systems you're analyzing are statistically independent. Seems to check out, and we probably want to avoid that whenever possible.


Hm, the trick seems to be getting just the right kind of non-local correlations; a very deep thing, as aspect-orientated modularity demonstrates. Yet another facet to the programming-QM analogy: just as we're still baffled by how to do modularity right, we've been struggling with QM (especially its thorny integration with relativity) for a century or so and still haven't got it figured out.

Most people need a little syntax

I had (still have, to some degree) a long-lasting love of lispy languages. You know the general type; everything's fully parenthesized prefix notation and the language also manipulates tokens expressed as fully parenthesized prefix notation. But most people can't really look at Lisp code and tell what's going on. Some say it's because there are too many nesting levels for proper cognition, but I am pretty sure it's because there's not enough syntax for them. Because below some threshold every statement looks alike to them no matter what it does.

I made a specialized lisp a while ago as a translation-target for other computer languages - like an intermediate code in a compiler, but with programs isomorphic to the syntax tree expressed in the original language, and readable and editable by humans. At least that was my thought. But by the time I got done with FEXPRs and access to caller environments (necessary to be parse-tree compatible with some languages) I had reached a peculiar point: It was now possible to make a program with the same parse tree as original source code in dozens of languages and have the program mean the same thing, BUT: any non-primitive expression could mean anything.

And at that point I discovered that it became very hard to use for anything complicated, because what little syntax I had left - the parens, mostly - had lost its value of telling me what the hell was going on when I looked at a given expression. I started to feel about my translation-target dialect the way most people feel about lispy languages in general: not enough syntactic clues what's going on.

We want things to have a shape. We want characteristic symbols in consistent spatial relationships to each other so we can home in on the bit we care about. We want some redundancy that shows important things more than once, so we get it sometime before we've read the whole program source. We want to know what kind of expression we're looking at before we've gone somewhere to read its source code. And parens don't give us any clues.

So, yeah. People use spatial memory. That's as expected. No program language is really written all on one line: we use indentation as organization to tell ourselves relationships between things. And as we do so we give things recognizable shapes. And, like syntax, the shapes help us keep track of what's going on.

Lisp syntax

There's a new Unix shell called Oh, which is essentially a Lisp using Kernel-style fexprs, that manages to eliminate almost all the parentheses with a few simple conventions.

As for syntax extension, I've long had in mind to provide a gently-extended syntax for Lisp. I got over my early enthusiasm for free-form syntax extension once I'd systematically explored its consequences (techreport (pdf)): you should always be able to tell, looking at a coherent piece of source-code, how to parse it, without having to know what syntax extensions might have been applied before processing that code. This doesn't preclude mild syntax extension, but it does mean you have to start with some sort of straightforwardly unambiguous syntax and then further constrain things from there; likely one would do this with an "environment" defining certain "operators" which are the only ones allowed within its scope, generalizing Lisp environments in which the operators are symbols.

(I've some notions how one might do this. A single line would automatically be a statement. Within a line, full parenthesization would be required (except that the line as a whole wouldn't need parentheses, of course), but rather than prefix notation, elements of an expression would alternate between subexpresions and keywords; some expressions start with a keyword (so the odd sub-elements are keywords), some start with a subexpression (so the even sub-elements are keywords). There'd be some sort of lexical rule for which lexemes are keywords and which are, or can be, identifiers. Somehow one would have to translate these "operators" —each operator would specify the sequence of sub-expressions and keywords— into a syntactic representation that can be looked up in an "environment" (not sure whether or not that would be a conventional symbol-value mapping). I've speculated on rules for multi-line constructs, likely involving some coordinating (separating/delimiting) lines such as begin/end or curly-braces, but haven't worked that out in as much detail.)

Whether all this is "spatial" or "linguistic" (in the sense of involving, say, Broca's area), idk. Although, in attempting to apply concepts from lambda-like calculi to basic physics, I've taken to describing syntactic tree structure as "geometry", and symbol bindings (thus, environments) as non-locality.

yeah: syntax matters a *lot*

If it weren't already the case that Java (vs. C++) and Golang (vs Python) hadn't taught us the critical importance of getting syntax right (and the extent to which it's really more important in many programmer's minds than semantics), the example of Perl5 should be dispositive.

Perl5 is a modern LISP: it's got packages, functions-as-first-class-objects, dynamic binding, a primitive object-system, GC, and on and on. All in a package that has a killer app: "UNIX scripting". And yet, so many people can't get past the syntax. I program in a bunch of languages, and *also* in Perl5. To me, its syntax is just a facade behind which lurks LISP, and that's how I program it. But the number of people I know who can't read Perl5, and who won't even attempt to learn how, is pretty compelling proof that syntax, and minor syntactic matters, are far, far more impoortant than we give them credit for.

P.S. I'm not arguing that anybody should write Perl5. Far, far from it. I'm just noting that the biggest reason people don't is -syntax-; I remember well first learning Perl5 in 1995 after having been deep in the Python VM for 6 months, and thinking "this is just a different syntax for the Python VM". Not completely accurate, but still, not wrong, either.

syntax and semantics

Back in 1967, Christopher Strachey's "Fundamental Concepts in Programming" proclaimed the "basic irrelevance of syntax and primacy of semantics". Some here might recall I wrote a blog past a few years ago challenging this premise, on the grounds that abstraction really straddles the boundary between syntax and semantics. (link)

I don't actually disagree with your observation that you can't get anywhere without good syntax. I would add that (despite my challenge to Strachey's extreme statement) in the long run you can't get anywhere without good semantics, either. I'm reminded of the classic observation that the most important leg of a stool is the leg that's missing.

Syntax blurs into semantics.

People claim - frequently - that the semantics of a language matters and the syntax doesn't, or vice versa.

But here's a bit of perspective from studying human languages - there is no clear dividing line.

You take whatever you're doing to parse the syntax of a language, and do "more" of it (for some value of "more" - and you will start discovering that more and more higher-level ideas are "only" syntax... until eventually you realize that you're dealing with semantics, and that the syntax was really like the recursive base case of it. Or, you take what you're doing to get the semantics out of an analyzed string, and you eventually realize you could have done the same thing, with some configuration, to break the string into syntactic tokens in the first place.

The one fades into the other, or defines the other, the way pointillist paintings are defined by the shapes of both the point color and the background color.

It may be the case that something similar applies to computer languages as well. At the very least I wouldn't presume to rule it out.


I'd agree about the blur. Your detailed description seems to me applicable to programming languages but not human languages; the difference being that with programming languages the "semantics" is generally taken to be computational, thus formal just as the syntax is; while with human languages the semantics is in the province of sapient thought which I do not find ultimately formalizable at all. Thus, I would agree that with a programming language, doing "more" of what you do for parsing will gradually move you into the computational-semantic realm, but with a human language doing "more" will gradually slide into a quagmire caused by trying to formalize something that will never fully submit to formalization (though it may keep leading you on so you can spend a lifetime in the unsuccessful attempt).

Formal Methods are no longer fashionable for human languages.

That "squishiness" - in both syntax and semantics - is one reason why language models are now statistical - or neural networks, which is "more levels of" statistics.

I had experience dealing with human languages using purely formal methods though - with formal grammars that grew, and grew, and grew, until you reached the point where you couldn't make them any better until you upped the minimum memory requirements for machines to use your software. In practice they were about 95% accurate on 4G machines, 97% accurate on 8G machines, and we got up to 98.5% on 16G machines but never finished that project. I got a couple of patents for design work there on resolving pronoun referents, but they became irrelevant at about the same time they expired, because the world has moved on.

Anyway, that system started with plain old pattern matching - pick up a few keywords, hang them on a sentence template, consult the context established by the foregoing sentence, and decide what general context to parse in. Then parsed using a regular-old recursive-descent production grammar with a token stack and checked structural invariants using a link grammar. And, modulo the grammars growing infinite hair the more you tried to cover edge cases and exceptions, it actually worked as well as any neural network trained on a corpus of less than ten million words. And my experience was that the link-grammar phase eventually built the same kind of parse tree out of the "syntax" that a compiler builds extracting the "semantics" of a program. We could traverse the parse tree extracting meaning (or at least the top four or five candidate meanings) the same way a compiler traverses the parse tree of a program extracting instructions.

Computer languages are much more tractable. The grammars for them are unambiguous, rarely require anything more than Chomsky type 2 parsing, (natural-language is context-sensitive, meaning it requires Chomsky Type 1, at least) and usually fit in just ten kilobytes or so of memory. I'm a little bit surprised to hear that you experienced the same thing there; for me it was something that eventually emerged from the howling complexity of natural language.

lies, damn lies, and...

I would characterize statistics, and (artificial) neural networks, as abandoning sapience altogether, forcing us to view the world through a filter that removes everything that can't be handled technologically ("formally") so that one can then claim formal/technological methods can handle "everything" — since one has excluded anything they can't handle).

You are reminding me of

You are reminding me of so many people who may have claimed in the past that something (even something simple like correctly identifying a picture) requires "real" intelligence, but when confronted with software that actually does it, immediately dismiss the task as something that can be left to mere technology.

Artificial Neural networks as they are usually designed are hardly more "sapient" than the muscle reflex when someone's knee gets a tap. But that's a symptom of our intent in building those systems. We want something that reliably has this particular kind of 'reflex' as appropriate to our chosen classes of inputs. We design for that, and we get it. In the rare cases where it's not the symptom of our intent, it's a symptom of the limitations of our design skill.

Neural networks can be considered a particular type of domain-specific language. It even has some abstractions, like "edge recognizers" at one level that are set when the input hits them and can then be used by reference (via a connection and multiplier of course) by nodes at later levels. The formalism is as Turing-Complete as any programming language in that whatever arbitrary task you like can be performed by some network of nodes of summed input and nonlinear output connected by edges having multiplicative factors. In fact I could write a compiler that takes formally expressed tasks and spits out networks having, provably, EXACTLY that behavior. You could too; it's a simple implementation of boolean logic.

So if using an ANN is "abandoning sapience all together" then sapience is a mystical thing that can't be done by any machine. I think I have a squishy machine between my ears that does "sapience", so I'm pretty sure there exists some machine that can do it.

The uses of ANN's that you are objecting to are strictly beyond that. We can usually guess at a network design for a simple task, then use statistical feedback to find a set of weights for the connections in that network that *approximate* the task, with no available proof that the task is being done with exact formal precision. Even if it's a task we can't specify a formal method for. That capability is strictly IN ADDITION TO a complete formalism capable of general computation.

errors of reasoning

(This is going to be quite meta. Sorry about that; nature of the beast.)

Errors of reasoning are rampant on this subject. Like, I suspect, most people, I first became aware of various errors of reasoning commonly committed by people trying to claim the human mind can do things technology cannot. Gradually I became aware that people on the other side of the question —who claim technology can do anything the human mind can— commonly commit different errors of reasoning.

People in the "humans are smarter" camp may attempt to justify their belief by claiming some particular standard task by the human mind cannot be done by technology. I suggest that for any straightforward standardized test (yes, "straightforward" here is a weasel word), one can reasonably expect to construct technology that will outperform the human mind on that test; making this sort of claim a common error by the "humans are smarter" camp.

People in the "technology can do it all" camp are, I think, apt to agree with my suggestion, but then try to infer from it their own favored conclusion. That is, accepting that technology can perform all such standard tasks that the human mind can, they conclude that technology can do everything the human mind can. Which would follow iff everything that can be done by the human mind is a standard task of this sort. This however is also an error of reasoning, unless one also provides evidence that everything that can be done by the human mind is of this sort. Notice, there is no burden of proof here on the "humans are smarter" camp: the inference isn't valid in the absence of a counter-example (something humans can do that isn't of this sort), rather it's invalid in the absence of evidence that there are no such counter-examples. Trying to put the burden on the "smart humans" side would be what Wikipedia calls argument from incredulity. Assuming all tasks the human mind can do are of the standard sort, and inferring from this that technology can do everything the human mind can, would be circular reasoning.

Another possible error here would be to suppose that because some of the "smarter humans" camp commit a strategic error (by pinning their hopes on some standard task that they expect technology to be incapable of), their conclusion is necessarily wrong; that one would be argument from fallacy. Generally, it seems, someone in the "smarter humans" camp would make this strategic error because they've reached their position without having solid reasoning to justify it, and they're looking for a justification. Their reasoning could be "mystical", for which I'd have no particular sympathy (that too should qualify as a fallacy of some sort); but they could also have an intuition that they just can't readily pin down.

(Do I have a specific reason to suspect the human mind actually is capable of something technology isn't? Well, yes; I have in mind some evidence —not conclusive, but perhaps suggestive— and I have some ideas about the possible nature of such capabilities, though all of it is clearly quite preliminary. I'm wary of getting too far into that when I suspect my audience may be hung up on argument-from-incredulity.)

(Oh, and, just for the record: yes, at heart I'm a reductionist. That doesn't put me in the "technology can do it all" camp.)

The Church-Turing thesis does apply to the problem.

I'm not really arguing that all the things a brain can do are "straightforward tasks". I'm arguing that the brain can't do anything that's beyond the capabilities of something made of atoms acting according to the laws of physics.

And because ANNs are provably general computation devices, they can, at least theoretically, do anything that a device made of atoms acting according to the laws of physics can do. As can any general computation device capable of high-fidelity simulations of atoms acting according to physics.

This is not an argument from incredulity, nor an argument from fallacy; this is an argument about the equivalence of the capabilities implied by the fundamental laws governing the two different types of device. I argue that this is true because the Church-Turing thesis is true. I argue that the Church-Turing thesis applies because simulating the action of a device made of atoms acting according to the laws of physics is within the set of things that software can do.

The question of whether and how such a network is designed and/or trained is at this point beyond us. But a general AI constructed using GOFAI techniques in a classical programming language is also beyond us, even though we've gone from "Hello world" to analyzing the orbits of billions of stars looking for evidence of dark matter.

The Church-Turing thesis also doesn't apply.

I'll comment a bit, and then I'll see if I can't bend it all back toward the point I was (if I can remember that far back) trying to make re programming languages.

I agree that the argument you're putting forward there is neither from incredulity nor from fallacy. Though afaics you are starting with a declared belief in reductionism and reasoning from there to what can be done by technology versus by the human mind, and there's a logical gap there. As mentioned, I am inclined to reductionism; but I note (and approve) a qualification in your remarks: "at least theoretically". Here's a related quote (mentioned in my relatively-recent blog post on the philosophical implications of Gödel's Theorem(s), yonder): "In theory, there is no difference between theory and practice. But, in practice, there is." The question here seems to me to be whether these two levels of description —"formal" and "sapient"— can be unified by understanding one in terms of the other; and I suggest the answer should be: in theory yes, in practice no. I also suggested in the blog post that formal reasoning is bottom-up, while sapient reasoning is more top-down (though I wouldn't say that's an entirely adequate way of putting it; but it's a start). Sapience is characterized by considering the larger context, which is where building things up formally keeps its uncovered gaps.

I alluded, btw, to having a reason to suspect sapiences are in some sense fundamentally more powerful that formal systems. Fwiw: I'd take a cue from memetic evolution. As I read it, genetic evolution on Earth has followed a track about four billion years long, with a major jump in rate about four hundred million years ago; while memetic evolution has followed a track no more than about three million years long, with a major jump in rate about fifty thousand years ago. This suggests to me that memes are evolving more than three orders of magnitude faster than genes did. Yes that could be simply due to a faster copying mechanism; but I don't think so. The copying process is profoundly different, and I think the key difference is that the host minds are sapient. (I could go on with the memetics theme, but that mostly gets further afield from PLs.)

So we've got these two levels, call them technology and sapience, or formalism and sapience, or even syntax and meaning; I reckon reductionism is supported "in theory" by the higher level being "emergent" from the lower, while the divide between the two is likely to be, at the same time, unbridgeable in practice. This would seem to go both ways: describing concepts in terms of cerebral neurons is likely at about the same level of hopelessness as describing cerebral neurons in terms of concepts.

The point I started out to make was that the description of a programming language seeks to connect syntax to computational semantics, which is basically connecting one kind of formalism to another and ought to be doable; whereas the description of a natural language seeks to connect syntax to concepts, which is attempting to go from the formal level to the sapient level. That syntax-to-concept thing might start out with a plausible-seeming approximation, but blows up as you try to improve the approximation. The same should happen if you try to bridge the gap from a programming language (syntax or formal semantics) to sapient concepts, which are always hovering nearby since programming languages are meant to be used by sapient beings.

I alluded, btw, to having a

I alluded, btw, to having a reason to suspect sapiences are in some sense fundamentally more powerful that formal systems.

I'd maybe buy that if "sapiences are complete but inconsistent" is a lemma of any such argument. For instance, if they are truly top-down where formalisms are bottom-up, then sapiences would be context-sensitive logics, ie. I can associate rules X,Y with context C1, and rules Z,W with context C2, but I'll just be really confused and probably get stuck if I ever find myself unable to figure out the context.

Existence proofs of sapience exist. Explanations not so much.

A lot of AI people are looking for systems that do something which is not what cognition does. They want guarantees of correctness. They want explanations. They want consistency. They want a symbolic understanding of cognition and subjective experience.

While there are existence proofs that sapience is possible, there are no proofs of any kind that a sapient system capable of that kind of guarantee, or even an accurate symbolic *explanation* of its own actions, can exist.

The "extraordinary claim" therefore is the claim unsupported by evidence. I'm not inclined to believe those who claim that a system meeting these additional requirements can exist, until or unless there is strong evidence that it can.

I can give an explanation of why an ANN does what it did. But it won't be the kind of symbolic explanation that would satisfy humans; it would be a mechanistic trace of the signal-propagation that led to the output. That's the true explanation. It would take longer to even read than the reader or author has time to live, and comprehending it would require retaining millions of times more context in short-term memory than humans can. That's the only explanation that's "true", but that doesn't make it a good enough explanation.

People want explanations in shorthand, symbolic terms that they can understand. Even when that fact means that those explanations are necessarily false. I fear that when dealing with anything that requires a "general" intelligence, they are doomed to disappointment because humans are unable to fully comprehend the grinding monotony and vast, asymbolic, inelegant bulk of the correct answer.

The best explanations we can come up with would likely be produced by the same method as the actions themselves. Obviously if we can produce correct explanations in symbolic terms then we should be using the explanation-generating system to produce the action in the first place because that would be the only system of which such an answer could be true.

Otherwise the explanations, like the actions themselves, will always come with no satisfying guarantees of correctness. The explanations we give when someone asks us why we did something, even when we believe that they are true, are post hoc. They do not seem to be substantially more correct than the reasons an outside observer of our actions could give.

Godel showed that any system of mathematics which is complete is also inconsistent.

So if our lemma is that sapiences are complete but inconsistent, we can still work with them; we just need to understand that certain properties will always be true of the results.

No guarantees of correct answers, no guarantees of a correct explanation of those answers that we can understand, no guarantees of getting the same answer every time.

But I'm inclined to think that doesn't mean it rules out "genuine" cognition, because the same problems arise in biological neural networks. According to generally accepted standards biological neural networks *DO* perform "genuine" cognition.

So if we are interested in genuine cognition, we focus on that. That, we can be sure by an existence proof, is possible. The kind of intellectual satisfaction that Descartes was after ... we've never seen any proof that it can exist.


I don't quite feel on solid ground as to what sense of symbolic you're using here, since we're at the intersection of PLs/computation where "symbol" means one thing, and sapient semiotics where "symbolic thought" means something else.

"symbol" in the informal sense.

I mean "symbol" in the way non-programmer, non-mathematician human beings use the term.

I mean words, of the sort one could speak to an admiral to explain why the ship's brand-new computer, in the midst of combat, suddenly decided to jettison all the lifeboats. The admiral is not interested in formalisms. Nor in program traces or signal propagation records. The admiral wants someone to explain, using words that are part of a shared vocabulary, why the action was taken and who should be blamed.

And that is the sort of thing we can never fully provide. Any answer we give will either be false, or it will be wholly unsatisfying and useless to the admiral because it will not be expressed in words from any kind of shared vocabulary and will not address the concerns and implications of the sudden "decision."

Symbolic explanations

I figure, from the point where we started telling stories (onset of the upper paleolithic), it took us about fifty thousand years to devise a credible way of telling ourselves a story about ourselves — that being the self. It's a flawed technique; the self is, by my reckoning, a fictional character we inject into our story about ourselves, and we pretend that the things actually done by our sapient mind were done by this fictional character. Like a fictional character in an historical novel, though, the self wasn't really there at the time and cannot actually affect the outcome. Sometimes the fiction breaks down; a classic example is the illusion that when faced with an unexpected stimulus we experience a paralyzing hesitation before reacting. (I think Daniel Dennett may have written about this one.) It takes time for nerve impulses to propagate from our senses to where they'll be reacted to, and for the reaction impulses to propagate back to our muscles; but those internal delays aren't part of our self-portrait, so when we construct our fiction about how the self reacted to stilumuli, we have to invent a model-consistent explanation for the time delay; thus the fictional "hesitation" is born.

But we've had less than a century, so far, to devise a credible way of telling a story about computers. If it took us fifty thousand years to come up with a somewhat-flawed story mode for people, we can perhaps be forgiven for having not done as much for computers in less than a century. Granted, this time around we've got much more advanced existing storytelling technology to build on in addressing this new problem. We've made some progress already, with the invention of the concept of a "bug", which in its technological sense is apparently about a century and a half old. The way you explain that lifeboats-ejection to the admiral is likely to involve the term "bug", which the admiral will probably have heard before even if the admiral has greying hair and entered officers' training long before development of the Hypertext Transfer Protocol.

We should probably try to bring this back on topic.

I eventually decided that human beings need some syntax for programming language understanding, and lisps (among a few other languages like FORTH) doesn't have enough of it. These languages tend to have marvelously pure basic paradigms of evaluation and computation, but that purity isn't enough by itself to give most people real clarity as to what their symbols mean.

Conversely, APL with its backwards-martian notation actually requires the user to think about the symbols as opposed to the ideas and that's too much syntax. And Perl has a syntax with so many shortcuts, pronouns, and options that it can look like line noise at first glance. Unless you already know the language it's hard to even identify the boundaries between significant lexemes, and that's also too much syntax.

So how much is the right amount? Python? Python with its indentation-sensitive syntax mandates the indentation level, which is a convention that a lot of us use to keep track of structure; It helps make Python code clear to read. On the other hand it requires the programmer to manually set the indentation level, so there are shortcuts I rely on when writing code (uses of autoindentation in my editor) that won't work.

I remember novelties and inventions that tried to make programs more visual - everybody remembers flowcharts for example. Flowcharts kind of suck at expressing abstraction. Control flow diagrams can express abstraction, but don't do so better than text. There were dataflow languages that tried to express programs as pipes and filters via which data "flowed" from input to output. They were good for some programs and bad for others.

Bavel's power automata were basically a fine-grained control flow diagram where the signals trigger gates for data flow, and I sort of thought they were more important than people seemed to give them credit for. But like most of these things, the actual positioning of various parts of the program in the diagram isn't correlated with the job that part is doing. So all these 'diagram' languages based on nodes-and-edges graphs fail the first test of syntax in that you don't know where to look for the different parts of the program that you're working on - or at least, not without following a bunch of lines that lead off in a bunch of directions only one of which will turn out to be correct. When it's the indentation-level that's the unit of structure, it's much easier to navigate "up" and "down" the parse tree of the program.

There was another language that tried to make the parse tree apparent using a "contains" relation instead of an "indented further" relation. Your program started as a rectangle and then it subdivided it into given shapes for serial statements, control structures, definitions, etc. Your entire program was a rectangle with four straight sides (if well-formed) and tesselated into parts that subdivided into other parts, giving them geometric relations that were supposed to be clearer than text. It was a cute idea, but really they weren't clearer than text. The size of the blocks wound up with little relationship to their conceptual importance in the program, so the structure you cared about was lost in the noise. And they would "stretch" the block making the things you wanted to find tiny. The block diagrams were so useless that the version of it I used had labeled all the boxes with Pascal syntax so programmers could tell what was happening, and a utility that read or printed out the program text as pretty-printed pascal was the development tool of choice.

Funges are *very* syntax-heavy. Originally an example of a nonlinear language (one that doesn't build a "syntax tree" to traverse) in Aho,Hopcroft,Ullman's compiler book, funges tried to make the flowchart idea work for code, but turned out to be a usability disaster.

In fact the funge paradigm is the basis of some of the trippiest of the usable-but-bizarre "esoteric" languages. They were an honest effort to do something good, and maybe there's an idea out there that could render them good. Conversely many of the esoterics are *deliberately* awful.

Stack languages like FORTH also don't really have a syntax tree as such; they just have a linear list of tokens and all the semantic subsetting is done by the stack. I always felt that they were just an abstraction or two away from being a good idea, but I was never sure what the abstraction ought to be. I had an idea to make the stack into a tree, but never developed it. That would give a purpose to at least one kind of syntax and make the programs tree-structured which might help people "get" it. But I'm doubtful.

Anyway, here's what we're up against. We agree that people need some syntax, but what kind? What's it for? 'Cause there are radically different ways to use it. Concurrent programs can be modeled as Petri Nets, but should we use syntax to express the network? Or, more likely, more conventional parse-tree code to say how the "action tokens" are allowed to traverse the network?

Lisp syntax

I was told (or, well, at least I encountered the saying somewhere) early on that Lisp has no syntax. I didn't get it, at the time; how can a language have no syntax? It took me something like three decades to figure that out (can't say exactly since I don't know exactly when I first encountered the saying, but I only figured this out in the past few years). Lisp has no syntax for doing anything. None whatsoever. There is no way in Lisp to tell the computer to do anything at all. The only syntax is for expressing passive data structures; it happens that if you send that passive data structure to an evaluator, the evaluator will interpret it and thereby do something. This is also (btw) where the 1998 paper "The Theory of Fexprs is Trivial" goes wrong: that paper is quite up-front that it only considers the equational theory of source-expressions —syntax— but, in fact, all Lisp source expressions are passive data. All of them. Once you've factored out superficial stuff like insensitivity to how much whitespace you use, the only way any two Lisp source expressions can be equivalent is if they're identical. In order to cleanly model the semantics of Lisp using small-step semantics, you have to introduce some term constructs that do not correspond to any source expression. In essence, these constructs model the action of an evaluator, which is a creature logically separate from the passive data structure of the source code. Vau-calculus has essentially three parts: There's the source-code syntax, which is just a passive data syntax and has no rewrite rules at all associated with it. There's the part that, when externally called upon to do so, renders down S-expresions into computational elements, with rewrite rules that amount to the dispatch routines of a Lisp evaluator. And there's the part that actually does the raw computation on computational elements once they become available from the rendering process. I'd been so successfully brainwashed into believing that fexprs are dreadfully badly behaved and make it impossible to achieve anything remotely like the elegance and power of lambda-calculus, that I'd been working with this stuff for the better part of a decade when it finally hit me that, other than some trivialities, that third part of the calculus to do with computation essentially is lambda-calculus. (Fwiw I did write a related blog post.)

I think I agree that Lisp syntax is a bit too sparse on clues to what's supposed to be going on. At the same time, I approve of both Lisp's elegant evaluator and its clean separation between data structures (which are S-expressions) and control structures (which are basically lambda-calculus). It seems to me what's wanted is a slightly more articulate representation of the data structures. I imagine something that actually looks rather like vintage BASIC; not surprising, perhaps, since that was my first programming language.

It does seems to me, increasingly in the past few years, that what we want is going to be off in some direction that isn't even perceived as a degree of flexibility for conventional programming languages.

Speaking of more articulate

Speaking of more articulate representation of data structures.
What do you think of data visualization? Where shows what the data means in human terms.

Also the beginning Brian Cantwell Smith's "on the orgin of objects" had a really interesting section on how when someone using a computer paint program lays shapes on top of one another they see the intersection of those shapes as a shape to be manipulated in it's own right even though it doesn't have an explicit data structure in the program. I think he suggests that the program could parse the output in order to fluidly move between data structures that correspond the the human understanding of the situation.

Incidentally I hate the nomenclature.

Calling the nodes and connections in an ANN "neurons" and "synapses" makes me grind my teeth. Biological neurons and synapses are not what they model. Or if they are, then the model hasn't been updated since the 1940s. When I'm writing, they are "nodes" and "connections."

I think I would prefer "signal propagation network" to "artificial neural network" but if I used that terminology there's a real chance I'd cause confusion and be misunderstood.

I'm more than a little frustrated by a complaint about "filtering out whatever technological methods can't handle" without giving any concrete examples of what kind of thing you mean to say can't be handled. Because I guess I'm one of those reductionistic, materialistic people who really has yet to see any such example and I actually doubt that they exist.

Our brains are made of atoms acting according to the laws of physics. There's nothing "magical" about them that is beyond any capability of emulation or modeling. Anything that's categorically out of reach of technological methods is also out of reach of our brains.

That said, it's going to be a long time before we see a computer play a credible game of "seven minutes in heaven."

ok Dillinger, you just

ok Dillinger, you just identified one thing that this technological method can't handle what others things that this approach can't handle can you think of?

Oh, I'm sure it'll happen.

When we do get around to simulating brains, the inclination to participate in such games, and the ability given appropriate peripherals do do so credibly, is likely to be emergent in the simulation just as it's emergent in biological brains. All I said was that we're not going to get there any time soon.

Although what counts as "soon" in this field is its own puzzle. It would be "astonishing", at least to me, if the time were less than 24 months or more than 24 decades. It would be "surprising" if it were less than 5 years or more than 5 decades. Which is a ridiculous spread, but it's a very fast-moving field right now and we don't really know yet when the rate of making significant discoveries about ANN's will start to slow.


Just like syntax is fossilized semantics, semantics is fossilized pragmatics.

However I've never seen any programming language with any significant pragmatics to it.

Re: Dillinger's 'Most people need a little syntax'.

As we don’t have access to a reductionist description of cognition (am philosophically inclined to admit such a description is possible), we can only speculate as to where spatial processing might occur in the spectrum from surface features in syntax, to the deep semantic representation of programs. One thing the paper did suggest was that mathematical comprehension uses spatial processing less so than program comprehension. This seems significant, because mathematicians have to spatially structure their proofs to some extent on the page as well, to aid comprehension.

Separating parts of speech and expressing data flow layering in formal languages based on natural language tree grammar, are facilatated by the use of parens, but there are alternatives. Broadly speaking interlanguages (I mentioned in another comment here) involve widening the spectrum between syntax and semantics, to include an additional syntactic layer to handle the syntactic sharing of subexpressions, before semantics are introduced. Having a tree language system based on interstrings, that allows parts of speech to have multiple parents, obviates the need for parens for the expression of data flow layering. In the interlanguage Space, two levels of parentheses are used to separate storage, submodule and code declarations, another for arrays, and in code a further two or three (not involved in DAG layering per se), are the maximum required to express DAG dataflows of arbitrary depth and complexity. A sea of parens is not a requirement for program readability.

side note

I suspect that relating surface syntax to deep semantics may only be addressable in conceptual terms; having access to a reduction of cognition (even supposing such is possible in practice as well as in theory) might simply be unhelpful. Recall Douglas Adams's description of pan-dimensional beings' vast research project to discover the answer to the ultimate question of life, the universe, and everything, which after some millions of years determined unhelpfully that the answer is 42.


Until someone presents such a reduction, you are of course entitled to question its possibility. But if one were developed/found, would it not be useful in natural language processing, and contribute to the construction of a general theory of human-like artificial consciousness? Lots of practical and useful applications would ensue. Not sure if you are alluding to the unreadability of compiled machine code needed to run a machine, versus a high level program.


(I almost didn't make my above remark at all, for fear of sidetracking things again; but, having started this, I s'pose I oughtn't simply disown it...)

I've mentioned I consider myself a reductionist. In theory. In practice, I'm skeptical. But my small point was meant to be independent of how likely it is.

would it not be useful in natural language processing, and contribute to the construction of a general theory of human-like artificial consciousness?

Maybe, though not necessarily. But I suspect understanding how lower-level wetware gives rise to sapient thought wouldn't help us understand the thoughts it gives rise to, even if it gave us the ability to create devices that actually have such thoughts.


Thanks John, I have an interest in encouraging discussion on the idea of introducing a spatial element to syntax. Although this is a programming languages weblog, abstraction from spatial considerations is in the DNA of Lambda. So that kind of discussion may face an uphill struggle here, because spatiality is somewhat orthogonal to the idea of Lambda being the ultimate programming paradigm.
The creation of devices that have sapient thoughts would be a landmark advance, and things like the Qualia problem in philosophy would have less significance (or none in my view). Although I cannot grasp every detail on every level of abstraction of the way in which a high level program is compiled to machine code and runs, I can broadly understand the mechanisms underlying the translation between differing levels of abstraction. This gives a good intuition for understanding high level programs, and potentially thoughts in my opinion. Not sure if further elaboration would be productive given the blank spaces in the current state of the art.


I'm suggesting construction of a sapient machine might have no useful bearing on understanding stuff like qualia. (In passing: I've observed philosophers are often especially guilty of making distinctions that obscure rather than clarify; when a philosopher starts talking about "qualia" it may be time to consider giving up on them, just as I tend to do when an economist uses the word "equity". But, anyway.) I also think successfully building a sapient machine would turn out to be an evil, both because we'd be creating a class of owned people, and because we'd try to cut corners on the things that —sometimes— cause humans to become decent people rather than monsters; though I don't think we even know what to try to do, let alone have any chance atm of succeeding. Anyway: programming languages.

Abstraction from spatial considerations is in the DNA of the lambda paradigm

Maybe, but we're not limited to that on LtU, despite the name :).

Something it took me years to pick up on: Back in the 1930s there were three main models of general computation: general recursive functions (Jacques Herbrand and Kurt Gödel), λ-calculus (Alonzo Church), and Turing machines (Alan somebody). General recursive functions take an equational approach, λ-calculus an applicative approach, and Turing-machines a mechanical approach. And once general recursive functions and λ-calculus were proven equi-powerful, Church was convinced they were both capturing the general notion of calculation, but Gödel remained skeptical — until both were proven equi-powerful with the nuts-and-bolts Turing-machine model. Then Gödel was convinced. Not to put to fine a point on it, Turing-machines use (one-dimensional) geometry. Bridging that gap between abstract theoretical elegance and concrete mechanics also came up again in Gordon Plotkin's 1975 solution to the problem of small-step rewriting semantics: you have a λ-like calculus that's mathematically elegant, and a syntactically similar but deterministic-rather-than-well-behaved operational semantics (that you can be certain really is describing the language you want to describe), and you prove correspondence theorems connecting them so you can use the well-behaved calculus to reason about the nuts-and-bolts operational semantics.

So it does seem there's precedent for geometry having a place in the larger scheme of things.