What are your thoughts on the idea that "succinctness is power"?*

I've been thinking about it a lot lately, especially with regards to stack-based languages, but I can't quite solidify my thoughts on the subject.

* Presumably 'power' means 'better' in this context. While this isn't always true, it seems valid enough for general purpose languages.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Probably Not

Succintness is a correlate of power, a lot of things are a correlate to power. It's hard/impossible to make quantitative comparisons to something that is qualitative (ie. language X is more powerful than language Y), though in some cases it seems clear, but to what degree? Succintness can be in the small scale or the large scale.. PG seems to be more concerned with the small scale.

Reductio ad absurdum

It seems to me that succinctness is what programming languages are for.

For most users of computer languages (software developers) programming languages are for specifying the implementation of software systems that are correct and easily maintainable. Succinctness is at the bottom of a long list of desirable qualities.

To test the truthfulness of the "succinctness is power" hypothesis we simply need to consider the case of any maximally succinct language. That would be a language with maximum information content (i.e. no more compression is possible). Such a language would clearly be utterly unusable.

While I agree that some degree of succinctness is important (clearly we don't want the expression of basic algorithms to be an order of magnitude longer than it should be) the challenge with developing software isn't writing the code, it is finding bugs and making changes. It is more important to the vast majority of software developers that languages make those tasks easier than to allow us to type as fast as possible.

Absurdumitude goes both ways

To test the truthfulness of the "succinctness is power" hypothesis we simply need to consider the case of any maximally succinct language. That would be a language with maximum information content (i.e. no more compression is possible). Such a language would clearly be utterly unusable.

I suspect that the definition of "succinct" that Graham has in mind is along the lines of that given by the American Heritage Dictionary: "Characterized by clear, precise expression in few words". "Maximally succinct" is therefore not some sort of equivalent to "maximally compressed" or "cryptic", both of which are almost the opposite of "succinct", due to not being clear.


That's an interesting point. I had always assumed PG thought that less tokens = better. He basically says that in his essay, but now I realize that there's an invisible clause attached to his statements along the lines of "don't take this to the extreme."

Of course, that brings up the question: at what point does this clause kick in? Where do you draw the line?

Reading between the lines.

but now I realize that there's an invisible clause attached to his statements along the lines of "don't take this to the extreme."

Your assumption that Paul is advocating less tokens, I think is a sound one. His essay is clearly emphasizing brevity as the most important quality of code.

Where do you draw the line you ask? For a general purpose programming language we could do the following thought experiment: imagine writing a million line program in a language. Now imagine giving it to someone else who is not an expert in your language and ask them to find and correct the bugs.

I think the line is precisely where compacting code any further would start to have a negative impact on debugging. Of course, this only applies to languages intended to be used on medium to large software projects.


Yes, that seems logical. But would it not be wise to simply invest time into learning the less verbose language? If shorter really is better, then presumably the small start-up cost would quickly be overshadowed by the benefits, would it not?

Brevity costs clarity

Th cst f smllr cd s tht t tks mr tm t prccss rgrdlss f yr lvl f prfcncy.


I don't understand. How does removing vowels from English words make it impossible to be fully proficient in reading them?

I think the suggestion's

I think the suggestion's more that t's stll slwr hwvr prfcnt y r[1]. This isn't true in general though - increased proficiency can allow someone to quickly recognise and work with a conceptual framework that enables much faster communication overall.

[1] it's still slower however proficient you are


But how does his example show that? Why would removing vowels prevent somebody from ever being 100% proficient at understanding it?

It was never said that it

It was never said that it does:

"The cost of smaller code is that it takes more time to proccess regardless of your level of proficiency."


That was my fault. By "100% proficient," I meant that it doesn't take any more time to process than the base case.

There are sometimes

There are sometimes ambiguities that can only be resolved with context, which I think's likely to slow anyone down at least somewhat.


Ohhhh! Wow. I completely missed that. Good point. So once a language becomes concise to the point where ambiguities arise, comprehensibility goes down? That sounds true, although I can't think of any programming languages that are ambiguous in that manner. I suppose the definition of "ambiguous" could be fudged a bit to mean "not initially apparent," although even then I'm having a hard time coming up with examples.

Pretty much anything with

Pretty much anything with sufficient overloading, especially where multiple (conceptual) types have the same representation but you're working in an untyped language. If you get the conceptual hints early and in convenient places you do less work inferring the relevant framework in which to understand what's going on.

changePalette or launchMissiles is far more informative than OUT portNumber!


Those seem specific to a set of circumstances, rather than universal to the language. After thinking about it for a while, I've come to the conclusion that program length should be minimized along with the amount of work your brain has to do. (ignoring the time it takes to actually learn the language, of course.)

Does that make sense, or am I delusional again?


I've come to the conclusion that program length should be minimized along with the amount of work your brain has to do.

This seems odd. The amount of work necessary to understand a program has a direct impact on maintenance costs, with a high constant multiplier and superlinear order of growth. The program length has comparatively small costs (disk space, build time). Why wouldn't any rational developer optimize cost of understanding at the expense of brevity?


Whoops, sorry, that's what I meant. I forgot to point out the relationship between the two. Actually, I suppose I could just drop off the "program length" clause completely: "power is inversely proportional to the amount of cognitive load."

How does that sound?

Brevity and Clarity

Paul said:

If smaller source code is the purpose of high-level languages ...

This is very clear: he is talking about compact code. As a result I was interpreting the definition of succinct as more along the lines of the Random House Unabridged Dictionary definition:

1. expressed in few words; concise; terse.
2. characterized by conciseness or verbal brevity.
3. compressed into a small area, scope, or compass

It was my impression that the article was focusing on brevity as opposed to clarity. I believe that both are important chracteristics in a programming language. However I think it is safe to say that clarity trumps brevity when we look at how effective a programming language is for the purposes of developing software.


I've just been reading about the K language which can only be described as 'terse'. It has a small number of heavily overloaded tokens. For example, the '!' symbol has three different, unrelated meanings. K programs are undoubtedly brief but I think the ability to read them must be hard won.


Interestingly, we had a discussion on that topic in my (grad) PL class last week. The prime examples of extremes are APL and COBOL. APL is close to maximally succinct, but sacrifices a huge amount of readability for that. COBOL is probably not the most verbose language around, but it is definitely a contender for that title. In fact, it is so verbose that it loses clarity because of all the syntactic noise!
From a more modern perspective, one can compare Java and Haskell. Neither are at the extremes of COBOL and APL (respectively). But Java is still very noisy, while sometimes Haskell can be a bit too terse. (To me, too terse == prevents comprehension by a competent reader.)
Paul Graham clearly knows about the pitfalls of APL:

Reminder: What I'm looking for are programs that are very dense according to the metric of "elements" sketched above, not merely programs that are short because delimiters can be omitted and everything has a one-character name.

In any case, this issue is backed by sound theory, namely Kolmogorov Complexity and Minimum Description Length. This has been applied to software before - see for example T. Veldhuizen's paper Parsimony Principles for Software Components and Metalanguages (and notice a paper of mine in the references where I use this idea in the context of computer algebra systems.)
There are some fascinating results from logic which shows that this is a non-trivial issue. My favourite is probably in the paper The succinctness of first-order logic on linear orders where various logical fragments (of first-order logic) are proved to be exponentially more succinct that others. I have often bored my colleagues with repeated cries of

First-order logic == assembler

As they know that I am a dedicated programmer of high-expressivity languages (mainly metaocaml, Haskell and Maple), they understand my disdain for writing specifications in such an impoverished and verbose language. If you want to see a logic I like, read up on Chiron: A set theory with types, undefinedness, quotation, and evaluation by my colleague William M. Farmer.

The eye of the beholder

The prime examples of extremes are APL and COBOL. APL is close to maximally succinct, but sacrifices a huge amount of readability for that.

Thanks very much for the interesting-looking references.

Regarding APL, I certainly understand that many people find it hard to read. But it's possible to get comfortable with it, even good at it. I used it mainly 25-30 years ago, but was surprised how well I could read from the FinnAPL Idiom Book when I pulled it down from a (dusty) shelf the other day. Perhaps APL takes the single-character operator idea too far, but in smaller doses it works quite well, for example in traditional written mathematics or (I think) in Haskell.

Of course APL will look pretty foreign to those who are more used to Pascal/C/Java, etc., but then Mandarin will do so to those who know, say, English, Dutch and German. I assume that my history with APL would help me learn J or K, but then perhaps even my exposure to SML and Haskell would help, relative to someone with only an imperative language background. (How much of this is syntactic, how much semantic? Does Mandarin differ semantically from European languages as much as it does syntactically? I think Chung-chieh Shan would be the person to ask re the latter ... .)

I must admit I was surprised to see someone in a blog post elsewhere refer to Haskell as "looking like line noise": it looks very much like math notation to me anymore. It is more concise than ML, which seems a bit cluttered with keywords to me now (fun, orelse, andalso, etc.). But I must admit that I pronounce the of from ML's datatype declarations when reading data declarations in Haskell: the direct juxtaposition of capitalized value and type constructor names in Haskell's version can be a bit confusing, especially for beginners. Perhaps this is one place where Haskell went a bit too far in the "succinct" direction.

Anyway, I hope you will indulge me in coming to the defense of APL, as an old friend of mine (I received my first paycheck using it back in '76). And I hope that everyone here would admit admit that APL was a bold and inventive stroke by Iverson, especially considering the time.


I basically agree with your comments about learning APL (which I used just a little about 20 years ago). With effort, it is possible to get comfortable with it. But the effort is considerably larger than for more verbose languages.

About the value of the APL experiment: I emphatically agree. Iverson deserved that Turing award. What is sad, to me, is that other languages have yet to 'learn' enough from APL's successes. APL is way more polymorphic than Scheme for example [here I mean the language plus the standard library]. The only work that I know why has attempted to give types to that kind of polymorphism is Barry Jay's FISh. APL's polymorphism made sense (unlike the ad hoc polymorphism in Matlab); why can't it be expressed in Haskell?


Well one of the problems I have with this, particularly coming from PG, is what's allowable as succint code. If you look at "the arc challange" (same page) he attempts to show off arc by demonstrating some web task with 23 parse tree nodes. The problem being of course that it's just a call to a library, later some one posted a smalltalk ( a language often called verbose) example that was about half the size, but again, used the seaside framework. is worth looking at with regards to this topic.

that said I'm mostly in agreement with

Power = Work / Time. Let's

Power = Work / Time.

Let's pretend that Time could be thought of as length of source code, and Work is the result of executing the code. Therefore if we have two programs that have the same results when executed, one written in a verbose language and one written in a succinct one, the succinct programming language is clearly more powerful. :-)

if only...

If only physics was that universal. We can always dream :-)

Power = Mass * Distance / Time

What are the programming equivalents to mass and acceleration? ;D


Isn't work force * distance?


I meant.. Mass * Acceleration * Distance / Time


Perhaps the folowing analogy would work:

Mass ~ Scalability
Distance ~ Implementation Progress
Time ~ Effort

I think succinctness has to do with:

  • Avoding architectural artifacts in abstractions (e.g. syntactic, semantic, computational, hardware, and algorithmic artifacts).
  • Constructing and leveraging context over expressions (e.g. the Forth stack as implicit context; the APL array as implicit context; DSLs leveraging the implicit domain context).

It's really more about succinct abstraction and domain modeling a.k.a. bottom-up programming.

Paul Graham is coming from his philosophical logic background: the most succinct kinds of logical expressions are the most powerful (IOW, the ones that can compose many predicates into a single one).

He did forget to mention the other important quality of a programming language: applicative universality.

According to Paul Graham's definition, I think the most powerful type of programming modulo universality is constraint programming.

A few other thoughts....

One other thing that is important is elimination of unnecessary redundancy. Some redundancy is a good thing (i.e. requiring declarations of things), if and when it can increase the syntactic distance between two valid (but different) programs and thus make it more likely that a typo will result in an invalid program, rather than a valid program that does the wrong thing. Excessive redundancy, on the other hand, makes maintenance painful and increases the chance of error due to inconsistent specification.

Possibly stupid question...

I ran searches on Google and Wikipedia for "applicative universality", and wasn't able to find a definition of the term. What does it mean? Do you know of a paper/article/blog post that describes it?


It's not a technology or a paradigm. It just means "the quality of being universally applicable", meaning that there are no limitations to what one can program. I could have said "universal applicability", but I wanted to emphasize "universality" more. Put in another way, it means "the quality of not being limited in the number of domains which one can program." LISP features compile-time metaprogramming, which is another way of saying that one can model/program/implement the compiler's domain, so it has more domain applicability than languages without CTMP.

An example

So a piece of software that allowed the user to enter equations and solve them (for example) could be seen as having a lot of power, but not much applicability, because it can only be used to solve one type of problem. Although, perhaps it would be possible to encode other types of problems that the software wasn't designed for as equations and solve them anyway, but that would be annoying. It would probably take a lot of code to hack something together, thus losing all of the software's expressive power.

Think of trying to do numerical processing...

in the Bourne shell. Or trying to write a business application in Mathematica.

I think that I understand.

So, some language features may give some expressiveness, but only for very specific problem domains. Other features, like first-class closures, for example, are more general. People who talk about one language being more "expressive" than another are suggesting that one language has more of the widely-applicable, expressiveness-enhancing features than another. Going back to first-class closures, some people might argue that a language with first-class closures is more expressive than a language with function pointers, or a language with nothing of that nature at all.

Expressiveness is a double-edged banana

Separate from if something is more or less expressive - there's the impact the expressiveness can have on the users of the language. As in, scaring them away.

With art, sometimes having constraints lets you actually make something (a) more quickly and even (b) better. I think there is even some truth in that when it comes to programming languages. It is a trade off of knowing who is going to use the language vs. how much 'freedom' there ought to be. Like, if Java were gBeta, or even Scala, would it have gotten the traction it did? Personally I think not.

I agree

Different language features can have a negative impact on the spread of a programming language, and popularity is important. I suppose that I think a lot about expressiveness because... it just interests me, independent of whatever effects it might have on the choices made by the programming community.

The right verbosity is subjective

How about this: the core of the language is anything, I don't care what the ASCII is, as long as it has the semantic things desired (for me that would e.g. include as much type inference as reasonable). But then on top are different visualizations of it. So if I have a hard time with the typing of things, then I want a representation that surfaces all of the types, like Java. On the other hand, if I don't need that I can surface it like something with type inference.

Like, I thought reading AsmL was much easier to read than C# even though / precisely because it was more verbose in places (unfortunately I still can't find the paper where they showed them side by side).

directness, comfort, completeness

"Succinctness" seems the wrong concern to me.

As a prelude to the following I must first assert that the "power" of a language is always relative to particular problems to be solved using the language. For example, if the problems are to produce certain specific sequences of machine instructions, then assembly languages win hands down. Conversely, those same languages are a burden if the aim is to implement, say, a web browser.

"Power" is also relative to who will be solving the problem. For example, an accountant may find a spreadsheet language to be almost unbeatably powerful even if a lisp programmer might replace 100 of the accountant's equations with but a single line of lisp.

"Directness" is one element of a language's power: does the notation of the language admit expressions over terms that directly correspond to the problem domain? For example, if the problem is to write a binary tree algorithm, can I form expressions over Nodes, LeftChild(), RightChild(), Parent(), etc? or am I constrained in a less powerful environment in which I must constantly transliterate and speak of, say, indexes into various arrays in which a tree structure is encoded in some ad hoc way? (Of course, is my problem is to express how to efficiently encode the trees in arrays I want the indexes and so forth. If my problem is generally express, say, splay operations then I'm better off with the abstract data types.)

"Comfort" captures the role of the programmer in determining what is and what is not powerful. Steele's Fortress language, for example, promises comfort for scientists who write most naturally in "math textbook" notation. On the other hand, lisp notation is more natural for programmers who want to express transformations on syntax trees. The amount of conscious "translation" required between the most humanly natural notation and the programming language notation is a factor of power.

"Completeness" is the most difficult and subtle aspect of programming language power. It elaborates on "directness" and "comfort" by demanding equal rights for any constructible, higher-order concept. Support for higher order functions is a familiar example, of course. Support for closures. Other higher order constructs matter too. The idea is that if I have a language L in which I can, by hand, following some mechanical construction, create an arbitrarily precise approximation of some higher order construct by iterating the mechanical construction, then the language L' is "more powerful" if it lets me express the entire induction in a single expression. So, a macro-assembler is more powerful than an assembler. Lisp macros and closures are more powerful than C macros and C function function pointers.

To the original question:

The "stack basis" of a language would seem to me an obstacle to power for most ordinary problems and ordinary programmers. It is an obstacle to directness since values of interest must always be referred to by their position on the stack rather than by their logical role. It is an obstacle to comfort because to think about snippets of code a programmer has the burden of maintaining a conceptual model of the order of values on the stack. (Some comparison could be made here to Subtext's observation that linear boolean expressions are not especially comfortable, compared to tables.)

On the other hand, the "concatenative basis" of a language could well be a boon to both directness and comfort. We can understand concatenation as an avoiding of renaming (for parameter passing). Concatenation is simply the propagation along a thread of control of "context" -- of a domain of discourse. Many human languages (natural and artificial) rely on an implicit state (for example, we have pronouns).

The difference between more and less powerful concatenative languages would be, in this view, a difference in how well they support moving away from indirect concerns like the layout of a stack and moving towards a malleable model of context in which natural pronouns, referring to problem-domain entities can be referred to. The ease with which a concatenative language can address those problems is, I think, closely tied to its completeness.


shell script

I like the succinctness of shell script, especially on the metaprogramming end (and, with the 'rune' thing I mentioned b4, that's a style still under development after a bit of a very unfortunate stall...) . It does take some some to wrap one's head around good old /bin/sh, but once you get the hang of the variable substitution and expansion operators, you never (very often) have to have the overhead of 'sed' again, plus lots of other cool stuff that i hope someday to actually write down (that is if a crazy ex doesn't delete all my CODE *again*).


Unix programming exposed

It's maybe time to (re)read chapter 8 of The UNIX haters handbook...

read it many times

and i'm still



a hypothesis

from th' handbook:

At the risk of sounding like a hopeless dream keeper of the intergalactic
space, we submit that the correct model is procedure call (either local or
remote) ...

So, with the benefit of hindsight — although without the benefit of LispMs — would we in the 21st century consider this a correct model or hopeless dream?

Counter hypothesis

The counter seems to have been laid out well in Sun's A Note On Distributed Computing.

The gist: distributed failure modes means that remote calls are fundamentally different from in-process procedure/function calls and hiding that is ultimately both impossible and undesirable.

It may look the same on the outside...

...but if it takes an order or two mangnitude longer to complete, you'd better be writing different times of functions to take that lag into account, or else, you're just wasting your time and bandwidth.

Remote calls should take *at least* (I'm thinking) double the round trip time for a nullcall to be in any case worthwile. Applications like raytracing, compilation and optimzation/compression are some obvious application (with proper data chunking), so maybe it's time to start

thinking different

about how we use RPC and GRIDS to solve our problems.

Or at least come up with some interesting problems...I don't think GOOG has the market on that even though they have a pretty big PARALLELL SUPERCOMPUTER at their disposal...