The Economist: Language and Computers: Why language isn't computer code

The Economist Language blog just posted an essay on the differences between computer code and natural language, taking a negative stance on their relationship. For all the talk of grammar, perhaps the title should have been "Why computer code isn't language".

Some of the qualities said not to exist in PLs could be argued the other way ("dialects, idiolects, variation, natural change over time"). Additionally, though PLs do not typically have some of the qualities mentioned, they could be, and have been, constructed to have them. Am I in the minority for thinking differently than the article does?

No offense intended to the

No offense intended to the author of the article, but...

it doesn't bring as much to this sort of linguistic consideration as I had hoped for after reading the title.

(I found it a bit misleading, in the sense that the article doesn't lay out enough of the broader problem space to even hope to speculate about an answer to the title. But OK, I understand it's not meant for LtU readers only ;-)

Indeed, it is very tempting to try to take a closer look at the analogies or disanalogies between natural languages and computer languages (especially PLs, which seem to be what the article focuses on).

If only out of "boredom", it's the kind of recurring reflection I've had for myself many times. Not being a very well-informed linguist, nor very well informed about PLT either, I now tend to refrain from spending any more time on it.

Here's what I ended up (temporarily?) concluding for myself, at least if one intends to consider things beyond the purely syntactic, semantic (or even semiotic) issues already documented:

natural languages have, in essence, those unmatched qualities of ambiguity and malleability that make them suitable enough for their only capable processor: our brains.

I write "in essence" because they were born, and they develop, evolve, die, and are reborn, because of and for our brains, individually and/or as human cultural collectives, in a usage (processing) context of unmatched complexity: human life.

No need to invoke the entire universe there, by the way, as far as NLs go: after all, I suppose it's unlikely that a rock, or a supernova, or Lake Michigan needs anything close to what we refer to as "natural languages".

Through their construction process, hop by hop or continuous, over centuries and with thousands of variables, and through their ever-evolving syntactic and semantic properties (sometimes even at the grain of the individual), they accommodate very well a frightening number of purely linguistic as well as (time- and space-dependent) extra-linguistic factors, while remaining tightly bound to the same brain processor that created them:

among those factors: our health, our mood, the number of calories we have left at the end of the work day, our past traumas, our garbage-collecting dreams, our conscious emotional storms or hard-thinking efforts, etc.

Contrast this with PLs: not exclusively, but mostly, a tiny little family of artifacts (born of another one of our brains, i.e. Turing's) of very limited, utterly constrained rational syntactic forms that fit into what we call the model of "Turing Machines". That said, this nature of inception/birth takes nothing away from their Turing-complete power, by the way. Of course Turing was a genius; no debate to be had there.

Especially since much, if not all, of our current and most abstract mathematical knowledge is precisely reducible, by means of machines' work, to "dumb simple" encodings over... natural numbers, which machines can already add, multiply, map, etc., much faster than we do (probably because our brain is also busy with so many other things of which we have no reflective ideas yet...).

Put more simply:

Is the outcome/result of an operational TM processing an input "phrase" dependent on time-varying amounts of calories?

Same question about the location in time and space of the ongoing computation?

Same question about "the past experience" of the said TM? (Which, as we know, is basically reducible to "something" close to zero/the void, since you can describe/simulate any TM with the UTM in just a few bits you could have written once and for all right after the Big Bang.)

Same question about the simultaneous physical presence or absence of other TMs? Or whether those TMs are or are not doing another computation at the same time?

Etc, etc.

I do not say it is uninteresting or useless to try to make parallels or identify essential tensions between PLs and natural languages, but I think our time is better spent, at the very least, in not trying to define one "too much" in terms of the other(*).

That sort of "mistake", I think, is likely to bring us promptly to more and more metaphysical considerations... rarely productive, in the end, for either domain (PLs or NLP).

(*) I'm alluding to the direction the article's title "bait" is geared toward, in that form of questioning: "why isn't X like Y?" My guess is that trying to answer this sort of thing lands in a ballpark akin to the halting problem for TMs, unless an oracle comes around and decides for everybody with "they aren't the same because... etc."

A matter of domain

As programming languages expand into the domains of 'programming by example' and 'programming by analogy' and so on, we should expect them to become more similar in function and form to natural languages.

If I understand correctly

If I understand correctly what you're alluding to, I tend to agree with the remark about form. Especially as the rise in the level of abstraction keeps being confirmed in the "vocabularies" borrowed from various ontologies, whether for text-based syntax or even graphical forms, e.g. EBNF, chosen decades ago to give a standard layout to various types of grammars.

I'm much more skeptical as to whether (few or many?) more major breakthroughs are left to discover regarding their functioning proper (computational models, if that's what you meant?), past parsing the input phrases, i.e., beyond rewriting-based logics or the Turing tarpit. Or as far as the von Neumann computer is concerned, anyway.

By that I don't mean there isn't a lot of room left for improvement on "today's computer", just that such improvements are likely to be more about better implementations, tooling interop, usability, composition and reuse, and so on, rather than purely theoretical results of an impact comparable to those of the past century (Gödel's, Church's, Turing's, etc.).

And as far as the "quantum computer" idea goes, I have no clue how it could or will possibly relate, "alone, on its own", to today's way of designing and implementing PLs, say, without any TM around.

I think that we'll see a

I think that we'll see a similar transition point from rule-based to statistically-based programming, as has happened with most of AI.

And then, for example, it will be possible to build programs by providing examples and having the computer generalize.

Which is how we basically program humans.

Statistics / learning by example

I agree that this will be a powerful feature of future programming environments, but think that the proper role for such technology will be as a programming assistant that helps build more traditional programs. There will always be advantages to mathematically simple rules (e.g. facilitating theorem proving), so I don't see functions and logic ever being replaced by a giant ball of statistical mud.

Statistical Mud

Statistics or other soft constraint variations (probability, weights, heuristic cost-benefit analysis) should see deeper and more dynamic application than static (development-time) metaprogramming. Reasons include development of robust programs that can adapt to a wide range of scenarios, and supporting rich HCI that can deal with ambiguities in human input.

There is, of course, a significant role for mathematically simple rules. Rather than one big ball of statistical mud, soft constraints can provide flexible joints, declarative intelligence, and learning models to arrange relatively rigid mathematical/logical components (which may, internally, have more such joints).

Your wording, above, seems to imply a dichotomy between programming assistant and giant statistical ball of mud - a false dichotomy. Statistics, weights, probability, etc. have all been applied directly within logic and constraint models.

Fuzzy wuzzy was a bear to debug

IMO the application of fuzzy rules at programming / edit time when the programmer can review and agree to resolutions is fine and good. Once you start trying to apply those rules at run-time their cost increases greatly because to establish the correctness of the system the programmer or theorem-prover must reason about those fuzzy rules, which tend to bring with them significant complexity (certainly I consider big statistical balls of mud to be complex). For some problem domains this may be essential complexity (the problem really is fuzzy in nature), but in my estimation it's usually accidental complexity and to be avoided.

Note that I'm fine with constraints when they're logically simple. The problem I have with soft constraints is when the correctness of your system depends on the behavior of the opaque constraint solver. If your problem domain really is fuzzy, I'd like to see that encoded with explicit dependence on the big ball of statistical mud. That way such dependence is at least contained to the few places that need it.

We've had variations on this discussion before, though.

Fuzzy waza problem domain

For some problem domains this may be essential complexity

That is an understatement. Most problem domains (and their associated sensor, control, business rules, etc.) are fuzzy by nature. The exceptions are rare. If it seems otherwise, it is only because of a severely biased sample - we prefer to automate those rare exceptions. They are the only domains we know how to automate effectively with the imperative abstractions most readily accessible today.

To shoehorn solutions into simple mathematics, e.g. in order to reason more easily about "correctness" as a binary quality, requires you to either make simplifying assumptions about fuzzy problem domains, or abandon automation of said fuzzy problem domains. Either decision results in huge opportunity and productivity costs.

There are ways to reason about correctness of a system as a soft (non-binary) quality - i.e. to reason about:

  • which parts of the system contribute to its incorrectness
  • the degree of contribution; asymptotic or stochastic bounds on incorrectness
  • spatial and temporal bounds; controlling how incorrect behaviors and partial failures may propagate through space and time and across modules or services
  • resilience and self-healing, ease of repairs, after a cause of incorrectness is removed

In general, this is a better way to reason about correctness than as a binary quantity. By better, I mean: more broadly applicable, realistic, and adaptable. It is very closely associated with reasoning about system security, which should not come as a surprise - our ability to reason about security is ultimately our ability to reason about behavior in an open system. In my opinion, the right approach to robust, correct systems is securable programming models, no more and no less. (Well, I'll also accept well localized white-box analyses for staged programming.)

Introduction of fuzziness does increase complexity - i.e. there are more possible behaviors for a system. Rather than avoiding such complexity, we can control it. Your "big statistical balls of mud" scenario only occurs to the extent we don't provide effective programming models for controlling complexity.

It is difficult to isolate this essential complexity - i.e. if you have some probability of invoking some remote service, it would be ideal to query that service and ask: "hey, if I were to invoke you with command foo, what are your likely responses?". If you can invoke services with such requests, you can avoid monolithic (ball of mud) simulations to model probable responses from every service you might need.
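
For the sake of illustration, here is a minimal, hypothetical sketch (invented names; not RDP or any real protocol) of a service that can be asked about its likely responses before being invoked:

    # Hypothetical sketch (invented names): a service exposes probe() alongside
    # invoke(), answering "if I sent you this command, what would you likely do?",
    # so callers can reason probabilistically without simulating the whole service.

    import random

    class FlakyStorageService:
        def probe(self, command):
            # Return a {response: probability} estimate for a would-be invocation.
            if command == "foo":
                return {"ok": 0.9, "timeout": 0.1}
            return {"unknown-command": 1.0}

        def invoke(self, command):
            # Actually perform the command; this toy just samples its own estimate.
            dist = self.probe(command)
            return random.choices(list(dist), weights=list(dist.values()))[0]

    service = FlakyStorageService()
    print(service.probe("foo"))   # the caller plans against {"ok": 0.9, "timeout": 0.1}
    print(service.invoke("foo"))  # then commits to a single, sampled outcome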

I have been toying with probabilistic models for RDP. I haven't had difficulty with controlling complexity and supporting modularity in these models. Rather, the difficulties I've faced involve controlling performance - preventing combinatorial-cost explosions, supporting probabilistic models for real-time systems. (One conclusion I've reached is that I must have a dual symmetry: exactly as much support for collapsing possibilities as for expanding them.)

Most problem domains (and

Most problem domains (and their associated sensor, control, business rules, etc.) are fuzzy by nature.

Perhaps in some sense, but I doubt they're fuzzy in the same way, such that you can build one big fuzzy constraint solver that handles all of them nicely. Within a problem domain, you can certainly define a quality metric and try to optimize it. My reaction to your comments about baking a protocol for services to exchange information about probabilities into RDP is the same as I've had in the past: too monolithic and special purpose. I understand why you want to have a well integrated set of features, but I'm still skeptical.

I doubt they're fuzzy in the

I doubt they're fuzzy in the same way, such that you can build one big fuzzy constraint solver that handles all of them nicely.

My experience is the opposite. Any model of fuzzy computation (whether based on probability, weights, confidence, cost, etc.) can serve effectively for expressing fuzzy abstractions, patterns, strategies, and policies. Relative to state of the art, it almost doesn't matter how you model the fuzziness, so much as that you model it. All fuzzy computations reduce to search with multiple solutions and optima - so what matters, ultimately, is formalizing, constraining, stabilizing, modularizing, controlling, and integrating this search.

Some fuzzy models might not be optimal for some problem domains. But general purpose programming is about being sufficient and effective and general (not specialized), not about being optimal.

Edit: Also, I've not been suggesting the use of global constraint solvers. When I speak of "modularizing" these things, part of that is modularizing (and composing, integrating) local solvers. (Naturally, they must be "partial" or delimited solvers if they are to compose.) Also, 'constraint solvers' aren't essential to fuzziness - they're only necessary to the extent a constraint logic is involved. But soft constraint logics are where I learned about most of these models, so they are prominent in my ontology.

My reaction to your comments about baking a protocol for services to exchange information about probabilities into RDP is the same as I've had in the past: too monolithic and special purpose.

As usual, it seems to me that your reactions and intuitions are contrary in-the-large to your goals and reality:

  1. fuzzy models are general purpose by definition because they effectively (even if not optimally) support the vast majority of real problem domains
  2. supporting a common or standard fuzzy model discourages developers from inventing specialized, incompatible models for each application; i.e. they'll fit the application to the model that is most readily available
  3. to the extent said fuzzy model supports open composition and extension, each application doesn't need to reinvent a monolithic ecosystem of fuzzy services, thus avoiding monolithic code in practice

Feel free to be skeptical, but do apply some of that skepticism to your own intuitions and reactions. Have you analyzed whether your preferred approach would lead somehow to less monolithic or specialized code?

Even systems that are well

Even systems that are well defined in the small become ill defined in the large; take anything that requires iterative processing! I think better support for statistical reasoning in the language is useful in its own right, and it's a simple leap to start thinking about soft constraints and input that is not unambiguous in the pure sense but unambiguous in the statistical sense.

For logic that isn't very fuzzy, I think good feedback can close the loop there. Like Wolfram Alpha: you ask something and it gives you, along with a result, a canonical representation of the query so you understand what Alpha understood. So the conversation goes like "computer, do this" and the computer answers "here is the answer for what I understood as 'this'". But this is kind of slow, so in a program you might consider confirmation of understanding more a part of the debugging process, something you perform in bulk (not edit-for-edit).

This is all speculation, though; we haven't seen such systems yet, so we can't really understand how they should work. We should do lots of experiments at both extremes.

take anything that requires

take anything that requires iterative processing!

I don't follow. Why would iterative processing necessarily make anything ill defined?

Like Wolfram Alpha: you ask something and it gives you, along with a result, a canonical representation of the query

This is more or less exactly what I was proposing: the front end UI can help you with fuzzy / unstructured inputs, but then there is a more structured program being built that is visible to the programmer.

But this is kind of slow, so in a program, you might consider confirmation of understanding more a part of the debugging process that you perform in bulk (not edit-for-edit).

This is exactly what my system does for ambiguity resolution, as I've described on LtU recently. Basically you just type in text and it uses the whole context (before and after) to help resolve overloads, select "type class" instances, etc. So this is something I'm actively playing with as well.
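
For readers who haven't seen that discussion, here is a rough, hypothetical sketch (not the system described above) of the general idea: use whatever the surrounding context tells you about argument and result types to narrow an overloaded name down to a single candidate.

    # Rough, hypothetical sketch (not the system described above): resolve an
    # ambiguous identifier by filtering its overloads against the types already
    # known from the surrounding context, before and after the call site.

    overloads = {
        "show": [
            {"arg": "int",   "result": "str"},
            {"arg": "float", "result": "str"},
            {"arg": "str",   "result": "str"},
        ]
    }

    def resolve(name, arg_type_hint=None, result_type_hint=None):
        # Keep only the overloads compatible with the hints the context provides;
        # fewer survivors means less residual ambiguity.
        candidates = overloads[name]
        if arg_type_hint is not None:
            candidates = [c for c in candidates if c["arg"] == arg_type_hint]
        if result_type_hint is not None:
            candidates = [c for c in candidates if c["result"] == result_type_hint]
        return candidates

    # Context before the call says the argument is an int; context after says
    # the result feeds something expecting a str. One overload survives.
    print(resolve("show", arg_type_hint="int", result_type_hint="str"))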

Load of Crap

I know dyslexic people who are very good coders, yet they are unable to spell certain words correctly. Writing code is mostly about the mental model you have before your mind's eye; natural language has almost nothing to do with it.

There's a difference between

There's a difference between dyslexia and being able to communicate in a very precise manner. The former doesn't really apply to programming languages, but the latter seems essential.

I was commenting on the Article

Writing skills are important, and writing skills are becoming more important. But both the original author, and the critique in the article, make a bad attempt at comparing writing skills and coding.

Of course, you can choose not to employ people with lousy writing skills. But to make the connection between sloppy writing and sloppy programming in the manner the original author did is nonsense. Moreover, they end with a number of abject platitudes about how 'good programmers' and 'good writers' think and write. I know people who are excellent at both, and people who will never achieve anything in either. Moreover, grosso modo, as a teacher who has taught both, I noted that your average programmer writes lousily and codes averagely.

Writing is a skill most programmers can learn easily. If they didn't, blame the education they got.

(To put it more concisely, I think there's hardly a relation, except that you might weed out programmers who are 'dumb as hoots.' And even if the author of the critique makes a number of good points, he also falls into the trap of viewing code as prose himself.)

You're more harsh with the

You're more harsh with the article than I was, but I get your point and I agree.

In the same vein, here's another tension between the two natures of languages that the article failed to point out/vastly overlooked:

even a NON-dyslexic, but lousy, naive, or lazy copy-paste programmer still has enough writing "discipline" to have the machine eventually execute his/her program and perform a computation yielding the intended output/result. But of course, at the cost of a spaghetti, hard-to-decipher, unmaintainable source code "structure" (or lack thereof). And there, "No man is an island". That also applies to coders, I suppose. :)

Contrast this with a most skilled and best-educated literary writer (who has no clue about computer code and is still absolutely happy that way) but who can nevertheless fail at any time to convince his reader, if the latter just isn't ready to accept the conveyed ideas/opinions beyond the form and rhetoric.

That, for one, IMO, is another and BIG essential difference between the two linguistic natures and processing contexts:

human communication and life vs. ... well, LtU's topic realm.

It's a big failure of the article, I think, not to see that it's not just about languages and writers, but also about the natures of the recipients/processors of the phrases.

Doesn't one write to be read/interpreted, to begin with... and eventually? Then, by what, how, and what for, anyway?

"QED" if I dare write ;)

You're more harsh with the

You're more harsh with the article than I was, but I get your point and I agree.

Yeah, this paper was in defense of the 'average' programmer, and I think I was a bit put off by the fact that the author reacted too mildly. I needed to read the article a number of times to see where the author actually disagreed.

The problem, with so many sweeping general arguments, is that in order to counter such an argument, people often react with a gut feeling which really half-heartedly confirms some of the original prejudices. (It's a rhetorical trick: Start off with a hyperbole and people will meet you half way.)

To give an example: 'Women can't do math, and have no role in academia.' You might disagree with arguments which show that 'A woman's intuition is sometimes better than a man's,' or 'Groups work better with women present,' or whatever. Problem is: all those refutations are almost as sexist as the original remark.

I therefore normally default to just shouting 'Baloney!' whenever I see overly general statements, or bad refutations, like these.

Why Language Could Be Computer Code

Our PLs could be a lot less precise and exacting than they are:

  1. Use probabilistic or weighted grammars, to allow for programs that contain possible errors.
  2. Leverage constraint systems and searches for code that achieves goals. This allows the user to underspecify the code, and the compiler to fill the gaps with something that is at least moderately sane. When we don't like the compiler's results, we refine our constraints, much as we refine searches in any search engine. This allows us to sketch our code and still achieve something useful (a toy illustration of points 2 and 3 follows this list).
  3. Support rough fitness heuristics - soft constraints and weights, cost-benefit analysis, a user model of known preferences. This allows us to push towards "better" models that fit the user's intention.
  4. Leverage paraconsistent logics, which allow us to constrain the propagation of inconsistency. This can allow us to program by analogy and metaphor, without following those analogies "all the way down" to the extremes where they fail in silly ways. This could allow a much richer model for mixins, traits, compositions.
  5. We can develop semantics that reduce commitment to action, i.e. allowing users and developers to understand the consequences of their programs - not just locally, but on a real system - without committing to those actions, allowing opportunity for refinement. I.e. allow takebacks.
  6. Our programming models, IDEs, and UIs can better provide explanations of how they are interpreting the code. This allows users to know when the computer knows what the users mean, with less guesswork and greater confidence. In return, this refines a communication skill in users, who will learn quickly what to clarify up front and what can be left to search.
  7. We can extend to live and "real-time" programming with a real dialog in both directions, where the computer itself can ask for clarification in the case of ambiguity. Live, interactive programming is also a very viable UI approach in the future age of ubiquitous computing.
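
To make points 2 and 3 a bit more concrete, here is a toy sketch (invented names and weights; not a feature of any real compiler) of filling an underspecified choice by scoring candidates against hard constraints and soft, weighted preferences, then refining the preferences when the result isn't what we wanted:

    # Toy sketch (invented): the "program" leaves a choice open; a search fills
    # the gap by rejecting candidates that violate hard constraints and ranking
    # the rest by weighted soft preferences. Refining the weights re-runs it.

    from dataclasses import dataclass, field

    @dataclass
    class Candidate:
        name: str
        traits: dict = field(default_factory=dict)  # e.g. {"ordered": True, "iter_speed": 0.7}

    def score(candidate, hard, soft):
        # Hard constraints are all-or-nothing; soft preferences accumulate.
        for key, required in hard.items():
            if candidate.traits.get(key) != required:
                return float("-inf")
        return sum(weight * candidate.traits.get(key, 0.0) for key, weight in soft.items())

    def choose(candidates, hard, soft):
        return max(candidates, key=lambda c: score(c, hard, soft))

    candidates = [
        Candidate("red-black tree", {"ordered": True,  "iter_speed": 0.7, "lookup_speed": 0.6}),
        Candidate("hash table",     {"ordered": False, "iter_speed": 0.5, "lookup_speed": 0.9}),
        Candidate("sorted array",   {"ordered": True,  "iter_speed": 0.9, "lookup_speed": 0.5}),
    ]

    # The user's sketch of intent: must be ordered; prefers cheap iteration over cheap lookup.
    print(choose(candidates, {"ordered": True}, {"iter_speed": 2.0, "lookup_speed": 1.0}).name)
    # -> "sorted array"; if we dislike the result, we refine the weights and search again.
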

Prediction: some day, even our written words won't be static things for humans to read. The average documents we write will be live documents, capable of gathering information resources, providing explanations, including training exercises, etc. But this won't happen unless it is easy to write such documents - i.e. just as easily and imprecisely as we hack English today.

Working on something interesting in this direction.

A few months ago I instrumented the C++ standard template library, and a few other libraries dealing with sorts and searches and collections, with code to (roughly, not precisely) measure time spent in calls.

This is to support a programming language/library that chooses among possible implementations of particular data structures (mainly collections) to minimize overall computation costs given feedback from profiling.

So for example the programmer specifies a 'map' of keys to values, uses a bunch of functions that are defined on 'maps', and the system decides, by profiling those functions on live data, which implementation of maps to use for data structures created by that line of code. If a lot of necessarily-ordered traversals are done, the profiling will cause the system to gravitate toward a red-black tree as the underlying implementation, because it supports ordered traversals very efficiently, whereas if a lot of random key lookups are done instead, the system will decide to represent the map as a hash table.
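
As a toy sketch of the same idea (in Python, with invented names; not the actual C++/STL instrumentation described here), the collection can simply count how it is used and report which representation the profile suggests:

    # Toy sketch (invented names): a map that profiles its own usage and reports
    # the representation that profile suggests. A real system would also switch
    # the backing store and persist the profile for the compiler to read.

    class ProfiledMap:
        def __init__(self):
            self._data = {}
            self._lookups = 0
            self._ordered_traversals = 0

        def put(self, key, value):
            self._data[key] = value

        def get(self, key):
            self._lookups += 1
            return self._data[key]

        def traverse_ordered(self):
            # Ordering is an explicit, profiled choice, not the default.
            self._ordered_traversals += 1
            for key in sorted(self._data):
                yield key, self._data[key]

        def suggested_representation(self):
            # Crude heuristic standing in for real timing data: traversal-heavy
            # usage suggests a red-black tree, lookup-heavy a hash table.
            if self._ordered_traversals > self._lookups:
                return "red-black tree"
            return "hash table"

    m = ProfiledMap()
    m.put("b", 2)
    m.put("a", 1)
    for _ in range(10):
        m.get("a")
    list(m.traverse_ordered())
    print(m.suggested_representation())   # "hash table" under this usage profile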

The idea is that the programmer is supposed to write code that describes the problem and desired solution, but not necessarily all the details of the exact implementation of that solution. Because some aspects of the problem change over time as the code is reused in contexts undreamt-of when it was written, and as the nature and sheer size of data sets that people throw at it changes as new uses are discovered for the code, etc, I'm thinking that it should be left to automatic profiling to decide much of the low-level implementation.

Profiling works when the code is run directly in the interpreter, and may dynamically switch representations from time to time. Profiling information is left in a file which the compiler will (hopefully eventually) read and use to find good representations when actually preparing a binary.

So maybe this is an example of the language humans use to communicate to computers getting "less precise." But the fact is that picking low-level details of the implementations gives programmers a chance to get it wrong - and later programmers who are merely reusing the code for a different purpose are almost certainly doomed to get it wrong.

If the profiler can identify a container representation that provides everything the current instance of a program needs, does it efficiently, and wastes no time or space keeping track of information it would need to do anything else, then that's the right implementation to use.

This is an effort to 'have your cake and eat it too' in that you can have language-level types that support particularly rich sets of operations - but if the program doesn't actually use them, then allow the actual implementation to be a more efficient choice. With the performance optimization choices taken care of by automatic profiling, the number of 'types' that the programmer has to worry about drops dramatically.

Why should a human being worry about whether a 'sequence' is implemented as an array or a list? Give him all the operations for both, and pick the representation that best fits the use he makes of it. Likewise why should a human being worry about whether a 'map' is implemented as a tree or an alist or a hash table? A correct description of the problem calls for a 'map', and the rest is just an optimization question that shouldn't be decided without profiling anyway.

There are language design implications, of course. For example, I don't want the situation where the programmer uses ordered traversals out of mere habit where unordered traversals will do, so I make sure that the syntax makes ordering necessarily into a conscious choice, an option that must be deliberately invoked when calling the 'traverse' function. And there are many other such things that need to be left unspecified unless the programmer explicitly calls for them as an option.

This is consistent with a philosophy that code constrains the implementation. So to constrain the implementation choices less, you should have to write less code.

Ray

Re: performance adaptive collections

With the performance optimization choices taken care of by automatic profiling, the number of 'types' that the programmer has to worry about drops dramatically.

Why should a human being worry about whether a 'sequence' is implemented as an array or a list? Give him all the operations for both, and pick the representation that best fits the use he makes of it.

I've sought in the past to develop adaptive collections, though it hasn't been a priority for me in over five years. A programming and profiling tool called Chameleon seems to have attempted it with reasonable success.

My interest in imperative/event-updated collections has waned (they're a poor fit for reactive and declarative programming). However, there are a number of adaptive collection models that still interest me - e.g. regarding automated indexing or clustering. Something like Dynamic Decentralized Any Time Hierarchical Clustering is a close match to my needs in developing declarative reactive state models, especially to the extent that clustering can be learned based on queries (i.e. the assumption being that two records are related if they're combined in a query result).

No important theoretical difference

"I reject the contention that an important theoretical difference exists between formal and natural languages. ... In the present paper I shall accordingly present a precise treatment, culminating in a theory of truth, of a formal language that I believe may reasonably be regarded as a fragment of ordinary English. ... The treatment given here will be found to resemble the usual syntax and model theory (or semantics) of the predicate calculus, but leans rather heavily on the intuitive aspects of certain recent developments in intensional logic."
Richard Montague, "English as a Formal Language", 1970.

"There is in my opinion no important theoretical difference between natural languages and the artificial languages of logicians; indeed, I consider it possible to comprehend the syntax and semantics of both kinds of languages within a single natural and mathematically precise theory." (Ibid.)

Interesting and relevant

Interesting and relevant refresher about Montague's thesis.

I'm certainly close to nothing next to him as far as linguistics goes, but his opinion isn't so difficult to challenge when one looks closely. I'm not sure if it has been done before, but a possible counterargument that I can think of would go like this:

assuming his claim is valid, one cannot fail to notice that this supposed absence of fundamental theoretical differences doesn't help to explain the rather overwhelming practical differences in the respective usages of the two kinds of languages. These differences are obvious and objectively observable by anybody, since building such a bridge between the two kinds, in effective, practical, usable applications, has long been one of the milestones sought by AI researchers.

More concretely, for instance: where, in formal logics/languages, is the peer of the shared pragmatic (semiotic) properties that we "find" in natural languages as distant from one another as English and Chinese (on the surface at least), properties now studied almost as much as the other two aspects, syntax and semantics?

"Can I have the salt?": how can this sort of mere-politeness pragmatic property be mapped into the formal systems used by computers, as opposed to a direct, and would-be irrelevant, semantic interpretation about the recipient's physical capability? Can we even find any useful analogous peer for computers and PLs that isn't already just a specific (miscategorized) semantic case?

(Btw: just a classic example, but note that you probably read and interpreted this "can I have the salt" phrase correctly, and before seeing the colon sign ':', even though 1) it wasn't even used in its typical pragmatic context of a vocal utterance between persons sitting at the same table... and 2) I didn't use the version which ends with "... please?")

Indeed, I believe that is natural language still at work on an LtU page, embedded in a completely different ontology of use... ;-)

With a bit of an initial stretch of assumptions, one could always propose that the purpose of pragmatics for natural languages would, e.g., more or less directly correspond to what is called "metadata" for the systems studied in CS/PLT, heck, why not, but then that would still need to be formally exhibited (i.e. proven) as such, no?

Or maybe I missed that previous work and reference somewhere. (I did attend a few linguistics classes in addition to CS.)

Also, if natural languages and the formal ones used in the various logic systems are so close to each other (regarding their formal functioning), how come the former's semantic properties are still not practically and easily reducible, in indisputable ways (read: objectively, via a single formal process), to any arbitrarily chosen representative of the latter?

My point is that even if Montague is "right" on the theoretical aspect of the forms and transformations in his "all languages, unified", that still doesn't help us much, if at all, in practice, to answer the sort of question posed in the article's title (granted, IMO, not a very good one anyway).

Or does it ?

No important theoretical difference

Indeed.

One of my takeaways from "Curry/Howard" is that much of the absurd jargon / silly notation found in various branches of math, logic, and computer science is revealed to be inconsequential, and likely often dictated by its sociological role (as currency for academic / corporate politics) rather than by its fitness as a formal language for use by human beings.

John Baez's "Rosetta Stone" paper suggests that Category Theory can provide a common language as an alternative, but ironically, he also says that "Category Theory is to most mathematicians, as mathematics is to most people".

Steven Pinker, Baez, and others work diligently to avoid jargon and find ordinary language expression and everyday metaphors to communicate intricate formal systems, in order to reach a wide audience.

Supposedly Erik Meijer said "The total world's population of Haskell programmers fits in a 747. And if that goes down, nobody would even notice."

Well, one feature of language is that it can serve as a membership badge. If Erik is right, it seems that Haskell has produced a rather exclusive club. Some might see that as a sign of superiority on the part of the members, but I sadly see it as a failure of the language - not in terms of its functionality (I think it is fantastic) but rather in terms of its applicability to human communication / interpretation. It seems it's just not a good encoding for "most people".

In today's world, where there's a requirement to communicate with and educate millions of people and not just a handful of PhD students, I suspect formal languages can be made to reach a wider audience by leveraging the corresponding capabilities in natural language (or other aspects of our built-in "mentalese").

Obviously this wouldn't include the use of proper names, Greek and Latinate (i.e. dead-language) word roots, overuse of symbols, over-uniformity of notation, etc., that characterize existing formal languages.

Supposedly Erik Meijer said

Supposedly Erik Meijer said "The total world's population of Haskell programmers fits in a 747. And if that goes down, nobody would even notice." [...] but I sadly see it as a failure of the language - not in terms of its functionality (I think it is fantastic) but rather in terms of its applicability to human communication / interpretation. It seems it's just not a good encoding for "most people".

Well, I hear you.

But Haskell's case isn't so bad, IMO; it's not very difficult to find languages that are probably "worse", so to speak (i.e., as far as readability "for most people" goes, anyway).

I don't know much about Iverson

I don't know much about Iverson, but the problem afaict isn't isolated to a few nutty people.

Among the obvious examples, it's hard to know whether to laugh or cry when seeing otherwise credible people offer, with a straight face, "mnemonics" for their notation or jargon. E.g.:

The isomorphism ⌊−⌋ is called the left adjunct with ⌈−⌉ being the right adjunct. The notation ⌊−⌋ for the left adjunct is chosen as the opening bracket resembles an ‘L’. Likewise—but this is admittedly a bit laboured—the opening bracket of ⌈−⌉ can be seen as an angular ‘r’.

Surely, it should make one pause and reconsider the utility of the "language" if resorting to such mnemonics is required.

our language

Wittgenstein says:

Do not be troubled by the fact that languages (2) and (8) consist only of orders. If you want to say that this shews them to be incomplete, ask yourself whether our language is complete;---whether it was so before the symbolism of chemistry and the notation of the infinitesimal calculus were incorporated in it; for these are, so to speak, suburbs of our language. (And how many houses or streets does it take before a town begins to be a town?) Our language can be seen as an ancient city: a maze of little streets and squares, of old and new houses, and of houses with additions from various periods; and this surrounded by a multitude of new boroughs with straight regular streets and uniform houses.

Programming languages are these suburbs with flashy skyscrapers.

The core of our language is about life and death; programming languages have nothing to do with them.

Ambiguity and Redundancy

The article presents ambiguity and redundancy as two separate properties of natural language, claiming that the latter may be intentional, to make up for the deficiencies of the former. I would speculate that the former is a direct consequence of the latter (in present natural languages). I would also guess that languages could be constructed that were redundant but not ambiguous: languages in which multiple words share the same meaning, but no single word has more than one meaning.
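
As a toy illustration of that guess (an invented mini-vocabulary, nothing from the article): many surface words can map onto one meaning without any single word mapping onto two, so the language is redundant yet every sentence still has exactly one interpretation.

    # Toy illustration (invented): a redundant but unambiguous vocabulary.
    # Several words share a meaning (synonyms), but no word carries two
    # meanings, so interpretation never needs disambiguating context.

    import operator

    MEANING = {
        "add": operator.add, "plus": operator.add, "sum": operator.add,   # redundancy
        "times": operator.mul, "multiply": operator.mul,                  # more redundancy
    }

    def interpret(sentence):
        # Evaluate e.g. "2 plus 3"; the lookup is a plain function of the word.
        left, word, right = sentence.split()
        return MEANING[word](int(left), int(right))

    print(interpret("2 plus 3"))    # 5
    print(interpret("2 add 3"))     # 5 -- different word, same meaning
    print(interpret("2 times 3"))   # 6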

The claim that programmers cannot be essayists because they do not express styles inherent to natural language is actually annoying. Knuth is clearly referring to the communication of an idea: choosing between the many possible compositions of its parts. A good programmer does this because a non-trivial program can be created in multiple ways. Each of those ways is a selection of implementation choices that affects the communication of both real data and implied constraints and contracts throughout the code. Choosing which of those ways creates the clearest communication of an idea, using styles of coding that are suitable to the problem, is the work of an essayist. Unfortunately the author begs the question somewhat by assuming that only styles of natural-language expression are valid choices for an essay, and then uses the lack of those styles in a program to reach his conclusion.

The more interesting observation about Knuth's quote is that in natural language style flows upwards. Writing style is observable at the fine grain, even when fragments are separated from context. I am not so sure the same property holds of programming style as it seems to be more observable if you start at the top and look at the overall design of a program.

The claim that programmers

The claim that programmers cannot be essayists because they do not express styles inherent to natural language is actually annoying. Knuth is clearly referring to the communication of an idea; choosing between the many possible compositions of its parts.

Exactly.

Also, let's not forget that we're likely barely past the dawn of serious scientific consideration of the relations between the sets of forms and of meanings/intensional definitions that languages, of natural origin or not, can usefully have reincarnated and automated in machines for, say, large-scale human consumption.

Turing's and Gödel's major papers are only eight decades old. Chomsky's, on syntactic structures, is even younger, at just six decades. And the idea of the importance of proof-carrying code (for reliability and scalability, precisely) for better-quality software platforms is, in the mainstream, fresher yet. Please be kind enough to allow a couple more years for Wall Street to catch up on that sort of clue, btw... ;)

Yet, somewhat paradoxically, it seems we've only just gotten the essential light turned on about one facet/fraction of a tool that we had been using for millennia to do most of the mathematics and arts we were already constantly busy with... before realizing (circa the same 1930s/40s period) that we can also have at least that fraction of this tool functioning in non-living tangible things we call machines.

I suspect, and I'm afraid, that, whether we like it or not, we still have a long way to go before 1) we know exactly which fraction we have actually figured out to be automatable, and 2) we know what else of interest we could get out of it, past the "boring" (to most humans, but not to logicians or proud programmers :) coding thing, which is still fairly limited to fancy I/O/UIs or clever persistence schemes (for whatever "app", but very rarely anything resembling closely enough our own "intelligence")...

It's hard to say a baby isn't much like its grandparent....

It's hard to say that programming languages necessarily are or aren't this or that yet; as a subject of discourse, they are too young compared to natural languages and we haven't yet seen all of what they are or will become.

I reject the author's points that programming languages are fixed in syntax and semantics and do not give rise to dialects and regional variations. Even within a single implementation, rigid as its lexer/parser may be regarding syntax, what is considered to be "good style" in programming evolves in much the same essential way as good style in natural languages. There is frequently a major "translation" job involved in taking code that conforms to one shop's style (sometimes expressed in a "style guideline" document) and making it conform to a different shop's style -- even when the two styles exist for ostensibly the same reasons!

The major variations are more by discipline than regional, but socially, it's similar; people tend to write, and most easily understand, code in the style of the code they most often read and work with. And when "good style" in programming has drifted too far to be easily expressible within the syntactic constraints of a particular language implementation, new implementations of old languages are created, and new languages are created.

We give new languages different names. But really, is there any way to understand Java except as an evolution of C++? Or C++, except as an evolution of C? And so on. New languages which are actually new are rare. Most "new" programming languages present a dialect of a previously known programming language, borrowing notation and syntax and even most of their semantics wholesale from an extant favorite or two, and inventing only a few, if any, new features to facilitate a particular idea of "good style" in programming, or restricting a few features of a previous language whose use has come to be regarded as "bad style" or which are seen as adding too much syntactic or semantic complexity.