Do names and symbols really imply semantics? If so what to do about it?

Some languages, like APL, are written in Martian script. Their proponents insist that a set of characters humans have never seen before in their lives have intrinsic, unambiguous semantic meanings which can be inferred directly from their shapes.

I don't think that is true. Not to humans, anyway. Martians probably see differently.

Some languages have strong requirements about the semantics of operators, based on the familiar semantics and relationships of those operators. For example, there was a beta release of Pascal 7 (from Borland, around 1995 I think) that allowed people to overload operators, but warned that the compiler would simplify expressions according to the precedence rules the language set for them, and the distributive, associative, etc. properties implied by those operators, before any overload definitions were looked up. If an operation was commutative, the compiler was allowed to reorder its arguments arbitrarily. If it was distributive, expressions like (a*b)+(a*c) would be reduced to a*(b+c) before the operator overloads were looked up. If you defined any two relational operators, the compiler would automatically define the rest using the identity axioms. You were not allowed to define three or more relational operators. Etc.
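For illustration, here is a minimal sketch (in Python, and emphatically not Borland's actual implementation) of what "simplify by the algebraic laws first, look up overloads second" means: the rewriter trusts the usual properties of '+' and '*' regardless of how those operators are later overloaded, so the overloads never see the original shape of the expression.

```python
# Minimal sketch: expressions are tuples (op, left, right); algebraic rewrites
# are applied *before* dispatching to user-defined overloads, assuming the
# usual laws of '+' and '*' no matter what the overloads actually do.

def simplify(expr):
    if isinstance(expr, tuple):
        op, a, b = expr[0], simplify(expr[1]), simplify(expr[2])
        # distributivity: (a*b) + (a*c)  ->  a*(b+c)
        if (op == '+' and isinstance(a, tuple) and isinstance(b, tuple)
                and a[0] == b[0] == '*' and a[1] == b[1]):
            return ('*', a[1], simplify(('+', a[2], b[2])))
        # commutativity: canonicalize the argument order of '+' and '*'
        if op in '+*' and isinstance(a, str) and isinstance(b, str) and b < a:
            a, b = b, a
        return (op, a, b)
    return expr

# Stand-in "overloads" that merely show what they were called with.
OVERLOADS = {'+': lambda x, y: f"plus({x},{y})",
             '*': lambda x, y: f"times({x},{y})"}

def emit(expr):
    if isinstance(expr, tuple):
        return OVERLOADS[expr[0]](emit(expr[1]), emit(expr[2]))
    return expr

tree = ('+', ('*', 'a', 'b'), ('*', 'a', 'c'))
print(emit(simplify(tree)))   # times(a,plus(b,c)) -- the overloads never see a*b + a*c
```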

This strong assumption that the semantics of the overloads must follow, in most respects, the semantics of the operators whose names they were using really upset a lot of people who wanted to use '+' to concatenate strings, or who wanted to use '<=' and '>=' (Pascal's spelling of 'greater-or-equal') as some kind of redirection operators. A lot of it (but not all of it) got changed between beta and release.

I was never really convinced that changing it was the right thing. It seemed perfectly reasonable to me that these familiar symbols should be constrained to have the familiar semantic properties we infer when we look at an expression built from them. I thought people who wanted a string-concatenation operator or redirection operators should have been using different names for their operators, like '$+' or '|>'. Sadly this was impossible, as there was no way to define new operator symbols. That deficiency remains in almost all languages that allow operator overloading.

In Scheme there is a strong assumption/tradition that any variable whose name ends in the character '?' will be bound to a procedure that returns a boolean value. The same idea in Common Lisp is associated with the trailing character 'p'.

The tradition in Scheme goes on: any variable whose name ends in '!' is bound to a procedure that has a side effect. Many Schemes will produce a warning if any such variable is ever bound to anything else. And some consider it an error if a side effect is produced by any procedure not so named, because the name is considered an important warning to the programmer that a side effect is possible.
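As a toy illustration of how much of that convention a tool could enforce from the name alone, here is a hedged sketch of a checker over Scheme code already parsed into nested Python lists. The rule it checks (names ending in '?' or '!' must be bound to lambda forms) is a simplification of what real Schemes actually warn about.

```python
# Minimal sketch of a naming-convention checker over parsed Scheme forms
# (represented here as nested Python lists). Not a real Scheme implementation.

def check_define(form):
    if not (isinstance(form, list) and form and form[0] == 'define'):
        return []
    name, value = form[1], form[2]
    warnings = []
    if isinstance(name, str) and name[-1] in '?!':
        # Convention: '?' and '!' names should denote procedures.
        is_procedure = isinstance(value, list) and value and value[0] == 'lambda'
        if not is_procedure:
            warnings.append(f"{name}: names ending in '{name[-1]}' "
                            f"conventionally denote procedures")
    return warnings

program = [
    ['define', 'empty?', ['lambda', ['x'], ['null?', 'x']]],   # fine
    ['define', 'done!', 42],                                    # draws a warning
]
for form in program:
    for w in check_define(form):
        print('warning:', w)
```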

Scheme used to have 'indeterminate digits' in numbers, written '#', so for example 123# denoted 'one thousand two hundred and thirty-something,' an inexact integer. This got scrapped once it became clear that implementors had no interest in keeping track of how many significant figures of decimal accuracy a calculation represented. They were only using 'inexact' to mean 'IEEE 754 floating-point hardware representation' and were openly hostile to the notion that anyone might hope for it to mean anything more. Many regarded it as a contradiction in terms that something could be both 'integer?' and 'inexact?', and I think in the current Scheme standard it may in fact be a contradiction in terms. But getting rid of it meant abandoning the possibility of tracking significant figures of accuracy. So that character used to have a semantic meaning and purpose, but nobody valued that purpose.
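To make the lost idea concrete, here is a rough sketch, assuming a simple interval model rather than anything any Scheme actually implemented, of what a literal like 123# could denote and how the uncertainty would propagate through arithmetic:

```python
# Minimal sketch: model an 'indeterminate digit' literal such as 123# as the
# interval of integers it could denote, and propagate uncertainty with
# interval bounds. Illustrative only; not R7RS (or any RnRS) semantics.

def parse_indeterminate(literal: str) -> tuple[int, int]:
    """Parse e.g. '123#' into the inclusive interval (1230, 1239)."""
    digits = literal.rstrip('#')
    n = len(literal) - len(digits)          # number of '#' placeholders
    lo = int(digits) * 10 ** n              # every '#' taken as 0
    hi = lo + 10 ** n - 1                   # every '#' taken as 9
    return lo, hi

def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def mul(a, b):
    products = [x * y for x in a for y in b]
    return (min(products), max(products))

x = parse_indeterminate('123#')   # (1230, 1239): 'twelve hundred thirty-something'
y = parse_indeterminate('4#')     # (40, 49)
print(add(x, y))                  # (1270, 1288): the tens digit is no longer certain
print(mul(x, y))                  # (49200, 60711): only the leading figure survives
```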

Early BASICs had 'type sigils' so that the programmer (and the interpreter) could know the type of a variable just by looking at the name: foo$ always referred to a string, foo! always referred to a floating-point number, foo% always referred to an integer, etc.

People made an awful lot of fun of those type sigils in old BASICs. Until the same people turned around and started using Hungarian notation to keep track of the types of their variables in C++. Because keeping track of the type was HARD, and they wanted to be able to see what type something was just by looking at it. So they defined their pszName and their hwndInteractionWindow and their dwIDnumber and so on, and didn't think about BASIC's type sigils at all, because this, after all, was something different.

And after all these examples of naming semantics: how much semantics is it reasonable to expect to be able to infer from syntax? How much is it reasonable for the compiler to enforce, based on the name alone? And what relationship does programmers liking it have to helping them produce good code?


Pragmatics to consider

Viewed purely from a mathematical perspective, a case can be made for handling commutativity in the way that Borland apparently did (I had not known about this - thanks). But from a pragmatics perspective, the behavior of overloading is now fairly well established and would create confusion if changed.

Though I can definitely see an argument for stating such things as part of mixfix definitions.

Programming languages considered harmful?!?

I think you (Ray) are pointing towards this old (but true) observation...

"Programming language" is a historically accidental category in the following sense:

In practice, we have compiler stages that transform some abstract syntax into machine readable instructions. (In this context, "compiler" is meant to include programs which are "interpreters".)

We have various compiler stages that do transformations from one abstract syntax to another. Optimization is one possible motive. Systematic semantic changes to a program can be another (e.g. macro expansion in scheme).

Finally, we have parsers that translate hand-built representations of abstract syntax into the data structures to feed to later stages. (Here, "parser" includes not just grammars over text streams but any structured objects that can be interactively created -- such as, but not limited to, dataflow diagrams.)

So the activity of programming can be seen as a production process that breaks down into maintaining a surface representation -- human-made documents, diagrams, and so forth -- and a stack of translators that eventually ends with compilation/interpretation. There is no principled reason why these stages of translation aren't richly modular and re-composable, though interestingly, in practice, there are no real widespread realizations of such modularity other than internally to a "programming language" (e.g. Lisp-family languages are hackable and notorious for the ease with which they allow the invention of new abstract syntaxes, mixing and matching them, composing compilation stages in application-specific ways, etc.).
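A sketch of that view, with invented stage names purely for illustration: a "compiler" is nothing more than a parser, some tree-to-tree stages, and a code generator, composed in whatever order a particular project chooses.

```python
# Minimal sketch of the 'stack of translators' view. All stage names and
# behaviors here are made up for illustration.

from functools import reduce

def parse(text):                  # surface notation -> abstract syntax
    return text.split()

def expand_macros(tree):          # one tree-to-tree stage
    return ['begin'] + tree[1:] if tree and tree[0] == 'do-all' else tree

def optimize(tree):               # another tree-to-tree stage
    return [node for node in tree if node != 'nop']

def generate(tree):               # abstract syntax -> "machine readable" output
    return ';'.join(tree)

def compiler(*stages):
    # Compose any chosen stages into a single translator.
    return lambda source: reduce(lambda value, stage: stage(value), stages, source)

pipeline = compiler(parse, expand_macros, optimize, generate)
print(pipeline('do-all step1 nop step2'))   # begin;step1;step2
```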

In that mindset, where programs all use specific notations but none is necessarily regarded as the overarching "programming language", and where notations are application-specific and translation-technique-specific but multiple notations happily co-exist and cooperate, something like a purely syntactic transform that uses algebra laws to reduce the number of multiplications (as in Turbo Pascal) is applicable when it is -- when that stage is chosen during program construction as part of how the program will be translated -- and is irrelevant otherwise (as when a notation really, really wants to use "+" for both arithmetic and non-commutative string concatenation).

If we're in that mindset - that there is no fixed language, just application-specific translations from chosen notations through various abstract syntaxes to compilation/interpretation - then your question "Do names and symbols really imply semantics? If so what to do about it?" is really one about "rules of thumb" for the design of domain-specific notations. In some contexts, it turns out, people use "+" to mean a commutative operator, in others a non-commutative, sequential operator -- be aware of this when designing and documenting notations -- here or there is how one might decide whether or not to reserve "+" for one kind of meaning or another.

But we can't have that discussion because history gave us the "Programming Language as Oeuvre" -- a Programming Language™ is a static, "general purpose" notation with fixed stages of translation, attributable to a single author (an individual or group), and often a specific type of commodity (tying it to "the economy").

Programming Languages™ are, perhaps, fetish objects -- delusions that come from product standardization taking precedence over modular, flexibly composable translators and surface syntaxes.

This was pretty much the main point of lisp by the time it got to my ears, at least as far as I'm concerned.

language systems

The notion of modular, flexibly composable translators and surface syntaxes has certainly been explored. Wyvern has its type-specific languages. Scala has its DSLs. Lisp has its reader macros. However, such mechanisms haven't really entered the common consciousness of programmers. It's never a go-to solution. I wonder why.

Is it that people don't want to learn to read and write a dozen problem-specific languages? Or economics and opportunity (e.g. by the time the benefits of a DSL for a problem are recognized, the program is half-written and developing a DSL would be significant scope creep)? Or integration with tooling? Alternatively, are we losing the opportunity because the existing mechanisms are deficient in some subtle manner, forming a barrier to practical use?

The language system I'm currently developing, Glas, attempts to resolve some deficiencies with existing mechanisms. Syntax is per-file, i.e. we compile `filename.ext` using a function from module `language-ext` (with a special case to bootstrap module language-g0). Per-file syntax should simplify integration with external tooling. There is no implementation hiding by modules, just a standard representation for language module functions as tree-structured data. This simplifies reuse of behavior models, e.g. a web application might transpile useful subprograms to JavaScript to run client-side. Implementation hiding can be supported by having a program check whether a subprogram directly observes certain data.
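A rough sketch of the per-file-extension idea, using invented compiler functions and a plain dictionary rather than the actual Glas module machinery: the file extension selects a language function, and every language produces the same kind of tree-structured data.

```python
# Minimal sketch: dispatch compilation on the file extension, with each
# 'language module' producing plain tree-structured data. Illustrative only;
# the function names and dict stand in for real 'language-ext' modules.

import pathlib

def compile_txt(source: str):
    return {'lines': source.splitlines()}

def compile_csv(source: str):
    return {'rows': [line.split(',') for line in source.splitlines()]}

LANGUAGE_MODULES = {
    'txt': compile_txt,
    'csv': compile_csv,
}

def compile_file(path: str, source: str):
    ext = pathlib.Path(path).suffix.lstrip('.')
    try:
        compiler = LANGUAGE_MODULES[ext]
    except KeyError:
        raise ValueError(f"no language module registered for '.{ext}'")
    return compiler(source)

print(compile_file('notes.txt', 'hello\nworld'))
print(compile_file('table.csv', 'a,b\n1,2'))
```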

But this is an experiment. I'm not convinced that people will actually want to develop language modules, except perhaps to create or extend a general-purpose surface syntax that better fits their preferences. (The g0 language used in bootstrap is essentially a Forth, so it probably doesn't match most programmer preferences.) I do hope to develop at least one general-purpose surface syntax above MySQL or another database-in-a-file format to support graphical programming. A smooth transition and integration between textual and graphical programming was another motive for the design of the Glas language system.

"The notion of modular,

"The notion of modular, flexibly composable translators surface syntaxes has certainly been explored."

Not much at all in popular practice, which makes efforts like Bespoke interesting.

Popular practice requires

Popular practice requires popular interest in such features, or enough utility to gather it. Does this interest or utility exist?

Perhaps graphical programming is necessary to really achieve multiple syntaxes without becoming a complete mess of incomprehensible text. Each 'syntax' becomes an applet, and could have its own help function and mini-tutorial. But it would still become difficult to grok or maintain after you have thousands of user-defined libraries each adding a unique bundle of applets that a maintainer must learn.

If control of applets were centralized, I think it'd be closer to a conventional Programming Language™ with one keyword per applet. But it might be easier to learn and maintain.

Enso language is another interesting direction to take graphical programming. There is a graphical view that is quite flexible, but each statement has a consistent textual syntax that is readily accessible.

popular practice

A couple of notes:

I very much like the idea of thinking in terms of an abstract syntax that, sure, might really shine in a graphical display but is also accessible and tractable in other forms. For one thing, it shows that somebody gave a damn about accessibility, and that's all too rare a thing in this weary world. For another, it takes a certain clarity of thought and attention to detail to have an abstract syntax that's that good.

I'm trying (not very successfully, judging by replies) to problematize and dissolve (or at least question) the idea of "programming language" with all the baggage we've attached to it. That's why I suggested "notations", a single example of which is merely a "notation" whose interpretation is a toolbox of stages of transformation.

There is a kind of math-heavy heritage to the concept of a programming language that sees a "program" not as a situated computing system but as a set-theory-style map from input conditions to output conditions, under some notion of time or other, and above all -- standardization as a commodity (at least in the extension). Cobol exists independently of any implementation. One can buy (or once could) instances of Cobol translators. Cobol translators could be compared for efficiency, accuracy, price, and so on.

I don't see that commodification of the invented category of "standardized programming language" as a necessary state of affairs. I see it as a hindrance to building systems. Worse, it results in accumulations of legacy systems with very tall stacks of dependencies -- flaky infrastructure is the rule rather than the exception because of this.

notations at scale

For systems at sufficient scale, we'll still need to consider modularity, extensibility, integration, architecture, security, etc.. This is a natural consequence of the scale, cf Conway's Law.

Notations for programmable systems deal with scale - human organizations collaborating, concurrent development, many moving parts, etc.. I believe that much "baggage" associated with "programming languages" is a natural outcome of solving problems of scale - a problem that does not appear for most other notations, to the best of my knowledge.

And due to the nature of humans attempting to collaborate at scale, if an interface isn't "standard" it will be "convention" or "contract" that eventually calcifies to the extent you might as well document it and call it a de-facto standard. Where we don't need to collaborate, we don't need standards. But software and the programmable systems it controls and the interests involved are too big, too complex, too political to fit within the small hands and minds of individual companies, much less individual humans. Collaboration is necessary, and standards are inevitable.

I'm not convinced we can avoid most baggage of programming languages. But I do believe we could organize that baggage much more effectively. Some things I think we've done poorly:

  • Modules shouldn't hide implementation. I.e. modules should provide recipes for behavior in an easily processed notation, which can be rewritten and adapted for use in different contexts. Not an opaque application of behavior. This requires reconsidering whether IP protection should be technologically enforced.
  • Internal DSLs are a bad idea because they require external tools to adapt to the language system, instead of simultaneously adapting the language system to the external tools. Per-file syntax, e.g. where file extensions guide interpretation, is a much cleaner solution that allows individual components to be expressed in various notations such as Cobol or Python or music notation or MySQL databases or whatever.
  • Ambient effects and their relative, foreign function interfaces, hinder analysis and adaptation of code to new contexts. Essentially, they require too much ad-hoc, decentralized contextual knowledge, raising a barrier to analyzing or rewriting code for use in a new context. Algebraic effects are a promising alternative (a rough sketch of the idea follows this list).
  • In many cases, we use effects for performance, such as manual caching of data, or manual loading of subprograms onto a GPGPU. This becomes another source of entanglement between code and context, which contributes to bit rot and flaky infrastructure. Alternatives include annotating programs for caching or use of software acceleration of abstract CPUs or GPGPUs (cf. inversion of the language tower).
  • The most common application model today is the procedural loop, i.e. we have `void main() { loop { do stuff } }` or some variation thereof. This has a problem of being difficult to inspect, compose, or update at runtime because the loop captures a lot of implicit state and the external interface is essentially closed (we can only kill the loop or let it run). There are many other possible application models - apps as objects, apps as blackboard system agents, apps as materialized interactive views, apps as system patches or overlays, etc..
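As a very rough approximation of the algebraic-effects alternative mentioned in the third bullet (handler-passing only; real algebraic effects also involve resumable handlers), with all names invented for illustration:

```python
# Minimal sketch: instead of reaching for ambient effects (print, open, FFI),
# a subprogram asks an explicitly supplied handler to perform effects, so a
# new context can intercept or reinterpret them without rewriting the code.

def count_report(items, handler):
    # The subprogram only *requests* effects; it never performs them directly.
    handler('log', f'{len(items)} items')
    return len(items)

def console_handler(effect, payload):
    if effect == 'log':
        print('[log]', payload)

def recording_handler(log):
    def handle(effect, payload):
        if effect == 'log':
            log.append(payload)     # e.g. a test context captures instead of printing
    return handle

count_report(['a', 'b'], console_handler)                 # prints: [log] 2 items
captured = []
count_report(['a', 'b', 'c'], recording_handler(captured))
print(captured)                                           # ['3 items']
```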

The standards and conventions of modern programming languages should be questioned, especially those with troublesome systemic implications.

But I disagree with your position regarding commodification or standardization as unnecessary or a hindrance on building systems. I think fault lies with several systemically problematic design choices that have become so conventional that they are rarely questioned.

I kinda went there and I don't believe it any more.

DSL's are very nice for program development, but they have failed to gain traction because program maintenance is not done by people who have already learned the DSL they would need to know in order to do that maintenance.

I took the idea of the DSL-capable language to what I consider its logical conclusion with an experimental Lisp using fexpr semantics, optional hygienic renaming, and two namespaces - one for dynamic and one for static scope. In principle, any s-expression could mean almost anything.

It was intended to be a 'translation target' or first stage of a 'universal interpreter.' The idea was that the syntax tree of a program in nearly any conventional language, expressed as an s-list in this translation-target dialect and linked to a "translation model" that defines how those s-expressions in context are interpreted, should have the same semantics as the original program. And this would enable, in principle, programs where every module and routine were written in different, unrelated languages, to be linked together and play nice.
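A minimal sketch of that "translation model" idea, reconstructed in Python rather than the fexpr Lisp described above: the same tree is given meaning only by the model that accompanies it, so two models can interpret one expression quite differently.

```python
# Minimal sketch: an s-expression (here a nested Python list) has no fixed
# meaning; a 'translation model' maps each head symbol to an interpretation.

def interpret(expr, model, env):
    if isinstance(expr, list):
        head, *args = expr
        return model[head]([interpret(a, model, env) for a in args], env)
    return env.get(expr, expr)

arithmetic_model = {
    'combine': lambda args, env: args[0] + args[1],
}
string_model = {
    'combine': lambda args, env: f'{args[0]}{args[1]}',
}

tree = ['combine', 'x', 'y']
print(interpret(tree, arithmetic_model, {'x': 2, 'y': 3}))        # 5
print(interpret(tree, string_model, {'x': 'foo', 'y': 'bar'}))    # foobar
```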

Although that (sort of) worked, it was no damn good for what I had really hoped it might enable. Predictably-in-hindsight, it turned out to be largely unusable as a language of its own. Any expression, just as I had designed, could mean anything. So there was almost zero semantic information available when looking at the code at the bottom level that actually did things in terms of the domain, because none of the expressions you were looking at actually had a known meaning unless you already knew a significant fraction of the rest of the program. IOW, everything had become a DSL.

The good part about defining a new Domain-Specific Language for each new program is that development can run on an accelerating curve, because you are making it easier and easier to express the solution in semantics relevant to the problem, and becoming familiar with the language of that domain makes it easier to think in terms of solutions in that domain.

The bad part is that what you leave behind isn't easily maintainable by people starting from zero - and, contrary to your best hopes and intentions, the people who matter most to the lifetime of this code will always be starting from zero, again and again and again.

So my question was less about the virtue of a domain-specific language than about what that maintainer, "starting from zero," ought to be able to infer just from looking at one small part of the code. The semantics of naming conventions are a powerful part of that, but...

Someone who is assigned to fix a bug in a program is, in the 'usual' case, someone who was hired on to the company seven years after the program was written and two years after the last engineer who actually worked on implementing it is gone. Because the company no longer has any of the original implementation team around to talk to and nobody knows where any documentation of that program's internal DSL might be (if, indeed, there ever was any), they have a book on the implementation language in hand but are otherwise flying blind, and have never seen the source code of this particular program or this particular DSL before being assigned to fix the bug. Further, they know for a fact that this DSL will never be any good to them in maintaining or bug-fixing any other program besides this one. Or, even if it is, it will never be any good to them in maintaining or bug-fixing anything developed at a different company, and this knowledge limits their motivation to learn the DSL.

If the DSL was documented at all, the documentation is typically lost in one reorganization or another within five years, leaving only source-code comments as hints and clues of what's going on. Some engineer leaves, some new one is hired, and a bunch of nondescript "stuff" gets dumped off those bookshelves into the trash, by people who don't even know what it is, to make room for new "stuff." No matter how you tried to impress upon management that keeping track of it was important, if they're not making a profit on it this quarter it falls below their notice.

This may be an overly cynical view, but from experience in Silly Valley, it's a fairly pragmatic cynical view. Every line of code you write is going to have to be maintained by some poor schmuck who has no idea what it's for or what's going on in the rest of the program. If that line is in a DSL that the poor schmuck doesn't know, then you're writing code that will come to be known as harder and more expensive to maintain, and people are going to eventually agree that it must be reimplemented in whatever language is popular at the time.

So... I'm less enamored of languages that allow new DSL's to be created for every program than once I was. I think the only way you get DSL's that matter, that can be maintained, is when you build them into the vastly-extended set of "standard libraries" for your language. That way the poor schmuck who has to maintain your code can be expected to start with a working knowledge of the DSL for your subject matter domain that's standard in your language. Or at least, if they have to learn it they can learn it from a living community that maintains documentation external to the company, and it can be a transferable skill that they can use somewhere else.

Who do "we" work for?

Someone who is assigned to fix a bug in a program is, in the 'usual' case, someone who was hired on to the company seven years after the program was written and two years after the last engineer who actually worked on implementing it is gone. Because the company no longer has any of the original implementation team around to talk to and nobody knows where any documentation of that program's internal DSL might be

So, it isn't ethical, where I came from, to design systems on the assumption that that situation is normal, never mind designing languages and programming environments that encourage it.

Sadly designing for job security is also unethical.

I take your point, but if you want to extol the virtues of designing so that nobody except the original implementors can work on something, I think that's going to run into ethical problems rather quicker.

Like you I hate the notion that engineers can or should be treated like interchangeable parts. But in the long run, either we are interchangeable parts or the things we work on die when we move on.

Some contractors have put logic bombs in their code for job security. That's pretty squarely unethical. If we have code that's unmaintainable by anyone else, we're not doing much better.

Don't write overly complex systems

We're in a feedback cycle where programming language "theorists" invent ever more abstract ways to build up reams and reams and reams and then some more reams of intricate code nobody understands -- and this boosts commodity output, giving more incentive for more of this direction in programming languages.

People should resist and even sabotage that kind of work.

Capitalist society has similarly blown non-software infrastructure very broadly as well, and in similar ways. Our big systems in general are tottering and beginning to collapse as a result.

"Oops."

TBF, reams of code that

TBF, reams of code that nobody understands is the default state for any mature software system. Doesn't matter whether it's abstract or not. Software becomes impenetrable and incomprehensible by sheer bulk and connectivity.

cf. big ball of mud architecture. The most popular architecture.

A lot of PL theorists have attempted, without much success, to avoid the problem of all architectures eventually becoming mud. At best, they have enabled pre-mudballs to be two or three sizes larger. AFAICT, the underlying issue seems to be some variation of Parkinson's Law.

"the default state for any mature software system."

That's quite a pronouncement.


We have quite a history to

We have quite a history to support it.

Blaming new PLs or paradigms is a false attribution of cause for the fragile tower of dependencies. You'll have a barely maintainable heap of Cobol if you use that notation. Indeed, there were many such systems in the era when Cobol was dominant, and they have been dwindling gradually for decades since.

no more nihilism, please

Thomas - I think you have some cause and effect mixed up here.

The fundamental issue is one of scale (excellent comment from dmbarbour separately in this thread, I think), and the challenge of somehow continuing to manage the growth of complexity as scale continues to increase relentlessly.

Humans have a complexity budget. When systems cannot contain complexity, that budget gets shot very quickly, and the systems die. There is never any other result. This is always the same, exact outcome: A system dies when the cost of evolving the system exceeds the cost of replacing the system. (And, the corollary: Maintenance mode is the result of the cost of system replacement exceeding the perceived value of that system.)

"People should resist and even sabotage that kind of work. Capitalist society has similarly blown non-software infrastructure very broadly as well, and in similar ways. Our big systems in general are tottering and beginning to collapse as a result."

Resist? Yes. Managing complexity (or rather, managing entropy) is the job of IT. And IT, generally, does this job rather poorly.

Sabotage? Capitalism? etc.? No, that's just nihilism speaking, and we should avoid the adrenaline high of dabbling in such a logical void.

"nihilism"

Critiques of what the automatic logic of capital creates in the built environment are not nihilism.

I'd be more interested in

I'd be more interested in seeing you address the assertion that your argument mixes cause and effect.

I don't believe new abstractions from PL theorists would become popular without enough people having experienced pain while lacking those abstractions. Further, programmers almost never try new languages while they can make do in their preferred languages. Thus, those new abstractions must be difficult to adopt into the existing languages. The complexity that drives adoption of new PLs precedes those new PLs. But your argument seems to posit that the complexity is caused by those new PLs. Cause and effect, seemingly reversed.

I don't believe you're a nihilist, but spreading blame too broadly does look a lot like nihilism. Stovepipe systems, walled gardens, lock-in, and other forms of entanglement with context are certainly troublesome, and this includes lock-in to notation or a runtime. Standards have potential to be good or bad, much like code. Regarding commodification of software artifacts which might represent blueprints, recipes, information, knowledge, or creative works (music, games, etc.) - it takes work to create these artifacts, so should they be free?

If through a few carefully chosen standards we can reduce entanglement and make it easy to trade a banana without also sharing the gorilla holding it and the entire jungle, and the language runtime or virtual machine that simulates the jungle, that's a good thing.

I don't understand how "automagic logic of capital" or a 'critique' thereof is especially relevant to the development trajectory or complexity and fragility of software systems. I do understand that capitalism is all about using leverage to extract as much as possible, that poorly regulated capitalism is essentially sociopathic, and that capitalists have historically used a lot of propaganda and dirty tactics to break unions, weaken regulations, and make voters accept or overlook their abuses. I suspect it's the propaganda-driven non-reasoning that you refer to with 'automagic logic'.

But most PL designers are not motivated as capitalists; most just say "my favorite language EXCEPT with these few tweaks WILL BE GREAT!" and then implicitly, unthinkingly inherit any lock-in problems of their predecessor. From my perspective, it is mostly a few bad standards/conventions inherited in such a lineage that are the real problem, not the existence of creative people who are willing to work hard to improve their own situation.

re: I'd be more interested in...

There's a lot here that seems silly.

For example, on the one hand you're describing PL success as uptake by programmers who are almost exclusively working as wage labor for capital, while on the other hand you say PL designers aren't "motivated as capitalists". Those are contradictory.

And you say that PL innovation doesn't deserve blame for complexity since it retrospectively tames complexity. But that ignores the question of "and what happens next"? The towers of successive subsumption are the growing fragility, the growing unrecognized, unmanaged life-critical system entanglements, and so on.

Of course, the economic pressures on PL designers, in and out of academia, tenure or no tenure, highlight the way that even if the subjective experience of PL designers is for some that of playful exploration, nevertheless their estimation of relevance is shaped almost entirely by the imperatives passively received from the requirements of capital accumulation.

An alternative approach might be multi-disciplinary, developing a critique of the social impacts of computing systems (including but not limited to complexity-driven fragility of life-critical systems), a critique of the aims that drive construction of those systems, and the development of an imaginary of how a saner society might use computing -- including what PL would look like for that alternative. Who knows, such researchers might even come up with something that gains some traction. But, of course, engagement in such activity at any scale and duration would for most be a career ender.

re: contradictory

You say there is a contradiction regarding uptake and motivation, but there really isn't. Most PL designers are hobbyists who don't expect any market success.

There had been over 7000 PLs designed and developed by the time I read surveys in 2010, and that is just the fraction developed publicly enough to find. There are at least a few hundred failed PLs per year. Almost none see the light of day or make a dime. Rather than market success, the primary motive behind these efforts is hobbyist passion and interest. Makers making things.

There is perhaps a hope for market success, but no valid expectation of it. Like winning a lottery. But even then there is almost no leverage to capitalize on having designed the PL. A PL designer cannot write all the libraries needed to succeed. Maybe they can write a book.

Of course, there are exceptions. Capitalized languages like LabVIEW. JavaScript was produced by Netscape in a hurry to beat other browsers. Etc.. But I think if you study the history, you'll find that the language capitalists mostly borrow designs, and are essentially a separate group from those who innovate PL abstractions.

re fragile towers

Our software systems are fragile towers.

But this will be true even without PL innovation. There is plenty of evidence of this just looking at existing systems. Instead of new syntactic abstraction, it becomes towers of frameworks and libraries. Many PL designers have observed that framework API design is essentially PL design, usually performed by people who barely know the many potential pitfalls of PLs.

If there is not an easy option for concise expression and automated integration, programmers will use verbose expression and manual integration. Same story, over and over.

The fragile tower shouldn't be blamed on PL innovation. But it is a problem that deserves attention.

Producing a more robust, systemic, community oriented design is among the interests of many PL designers I follow on Twitter. Of course, it's a very self-selected group. But where you blame innovation, I think the solution is innovation with attention to the whole-community experience. Regarding how we share, distribute, trust, modify, and integrate code. How to avoid the walled-garden separation of most applications, at least by default. How a newcomer explores and develops an effective mental map of the system. Etc. There are a lot of projects in this vein if you look for them.

And as you might infer, we *mostly* do such work as a hobby. Not as a career.

The tower of dependencies must be cut somehow at basement level

I've thought hard about the "tower of dependencies" and "tower of interpreters" and how they generally make efficiency in time, memory, or more usually both, go straight to hell.

The enormous fragility they bring is another issue.

The only thought I've had about it was about subsuming the dependencies and then optimizing the resulting application until all that extra baggage is trimmed away and all the simulated virtual-machine code reduced to actual machine code that runs native. And that's half the answer.

That's not a simple optimizer to build. It's not clear that it *can* be built. But if we allow the optimizer to "chew" for extended periods of trying every possible thing we can think of, and probably apply AI/ML techniques to determine which transformations to try next or which sequences of transformations to plan ... such a beast could, mostly, be built. It would be an absolutely enormous amount of work though.

But as I said, that's only half the problem. A major issue with the tower of dependencies is that undesired behavior can emerge at any level of the tower. In fact, the way some applications built on, say, the Boost framework are forced to operate is, according to what the designers and users of a particular application want, a bug. If we subsume all that complexity and then optimize the snot out of it, we wind up keeping that undesirable behavior.

So the second half of the problem is that we have to be able to render that code, after a tremendous amount of optimization work has been done and all the semantics attached to unused parts of the original source code ruthlessly boiled away, in a language that a human programmer can actually read, understand, and correct. Without being bothered by a need to master all the code for semantics that the particular program does not in fact use.

Which is, more or less, a requirement for 'roundtrip' optimizing - where the optimizer spits out not only shorter and faster machine code, but also maps it all out in a source code that preserves all the symbols and abstractions that still have semantic meaning in the context of the individual program.

I don't think anyone has ever even attempted that. It's hard to even imagine what it ought to mean and what should be considered 'success.' And while I think the first half would be very hard - could be done but would be an astonishing amount of work - I'm not sure the second half can be done at all.

cutting the tower

I believe the tower of dependencies is problematic mostly due to conflation of concerns in conventional module systems (notably abstraction, decomposition, and identity).

Ideally, modules shouldn't be viewed as existing "below" their client program. Instead, each program should itself be the robust foundation upon which we install, integrate, and compile objects using algorithms found in libraries. Like building a circuit board, or a blueprint for one. A program should also be able to adapt a module for the local context, e.g. metaprogramming, patching, etc..

But that perspective simply isn't viable while we insist that modules hide implementation, have object identity via module-level variables, or can implicitly communicate without opportunity for access or intervention by the parent program.

Improving upon separation of concerns has often been a successful path for language design. But the module system doesn't get nearly as much attention as it should for its impact on the community experience of a PL.

So long as we follow the same old conventions for module system design or prioritize FFI compatibility, we'll be stuck with software systems that are fragile, wobbly towers.

Yup, this must hold for everything in the whole universe

"A system dies when the cost of evolving the system exceeds the cost of replacing the system."

Yup, this must hold for everything in the whole universe, AFAICT.

Except for one thing: when the topic of raising the debt ceiling is on the table of our dear, do-gooding, ever-well-meaning, disinterested politicians.

But of course, that peculiar sort of unsustainability can only work when / because the public itself doesn't really care. Or, more precisely, doesn't feel enough pain yet.

An invariant condition which, in itself, is quite the thing to behold, we gotta admit.

... especially, after centuries of, say, "past data points", to stay polite.

Human techniques for managing complexity - and their limits

The point that humans have a complexity budget is a good one.

Our "seven plus-or-minus-two" short term memory is one of our fundamental limitations. Much of language design is managing things so people don't have to remember more than "seven plus-or-minus-two" to get things done.

The most fundamental paradigm for it is written language. It does in code what it does in everything else we use it for - it holds all the complexity we can't immediately remember in a fixed form we can understand, so that we can get it *back* into our heads when we need to work on it, and so we are allowed to *forget* it when we need to work on something else.

And language has a special place for us as humans neurologically. Our brains have special circuitry for dealing with complex relationships and structured semantics in language that have no parallel for any other representation.

Another strategy is abstraction. If we can make the language we're programming in capable of expressing more with less code, the theory goes, or make the code expression organized more closely along the paradigms and operations that people actually think of when dealing with that subject matter - then people will have to remember less.

The actual results are mixed, though. A DSL that follows more closely the paradigms and operations that people actually think of when dealing with that subject matter provides tremendous leverage to subject-matter experts. But not so much for someone who isn't already trained to think in those paradigms and operations. In fact it can be counterproductive for them, because then they have to remember *MORE*: all the stuff that's abstracted away, where the subject-matter experts don't have to think of it, has to be hunted down and understood before they can follow the more-abstract code, and they'll be going back to check the details of what each operation does or what the process requires to be done every time their seven-plus-or-minus-two stack limit blows.

We have additional strategies that teach programmers to do things to help others deal with that memory limit - the reason why we try to impress on people that it's a good idea to name a variable 'newlineCount' instead of 'N' is because it saves the reader from having to remember what 'N' means. Instead of remembering it, they can infer it from the name. The reason why we make it a member variable in an object named editingBuffer is because we are first telling the reader that they don't have to remember it in other contexts and second allowing 'chunking' in that at some higher level of operations they can just keep track of the editingBuffers instead of remembering which instance of every 'N' applies to which document and in which context.

Naming conventions, unlike abstraction capabilities, don't seem to have a cognitive downside for those not already "in the know." And don't, in principle, interfere with abstraction capabilities. So if they can help, why aren't they a language feature? Why can't I tell by looking at the name of a function whether it changes the value of one or more of its arguments? Why can't I tell by looking at the name of a variable that holds a value of one of the language's fundamental (non-derived) types which type it holds? Why can't I tell from looking at the name of a function returning such a type which type it returns?

In short, how much of the cognitive burden of remembering everything can we relieve the programmer of, just by having a solid set of naming conventions in our language?

AI complete

if they can help, why aren't they a language feature?

Until we have an AI for a linter, we'll never be able to suggest the 'N' should be 'newlineCount'. Thus, such naming is doomed to be convention, not a language feature.

Setting that aside, in many programs the programmer will mostly work with non-fundamental, derived or composite types. This might even be encouraged. Building in a bunch of conventions to identify a few fundamental types is awkward in this context.

We could perhaps manually associate naming conventions (a prefix or suffix) with user-defined types, and have a linter check that.
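A hedged sketch of that suggestion, with invented type and suffix names: a manually declared registry maps user-defined types to expected suffixes, and a tiny checker flags annotated variables that break the convention.

```python
# Minimal sketch of a naming-convention linter: a registry associates suffixes
# with user-defined types; annotated assignments that violate the convention
# are reported. The types and suffixes below are illustrative, not standard.

import ast

SUFFIX_FOR_TYPE = {          # manually declared conventions
    'EditingBuffer': '_buf',
    'WindowHandle': '_hwnd',
}

source = """
main_buf: EditingBuffer = open_buffer()
window: WindowHandle = create_window()
"""

for node in ast.walk(ast.parse(source)):
    if isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
        name = node.target.id
        type_name = ast.unparse(node.annotation)   # requires Python 3.9+
        suffix = SUFFIX_FOR_TYPE.get(type_name)
        if suffix and not name.endswith(suffix):
            print(f"line {node.lineno}: '{name}' holds a {type_name}, "
                  f"expected a name ending in '{suffix}'")
```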

I've seen a development environment that associated types with colors, including user-defined types. Function calls would have colors transition from inputs to output. A little legend was visible. This provides a similar ability to recognize things at a glance, though it was awkward for infix operators.

On humans' complexity

On humans' complexity budget...

Striving to remain pragmatic, I'd tend to be of the opinion that that one can vary much from one individual to another, and is likely very context-dependent, too. But that's not saying much. After many years as a user of "programming languages" of all sorts, my sense now is that complexity is neither friend nor foe.

Certainly, accidental complexity in notations or tools is inevitable, and essential complexity is... well, the essence of whatever the designer(s) and/or implementor(s) have introduced:

unless one is into the hobby of self-inflicted pain, one will probably wish to maximize the ratio of essential to accidental complexity, or at least whatever can be perceived as such, if it isn't easily measurable.

But that ratio itself might be very context-dependent too, which doesn't help us much either in any constructive introspection of what we have done / are doing / hope we'll be able to do.

Case in point:

mere mortals around me are often baffled by the news, when they ask me, that for most of the few decades this field has existed (only a few in total, at that), software engineers and language designers have been fighting quasi-religious battles over the "purity" or "expressive power", etc., of the languages favored by their respective churches (I won't use "sects", so as not to sound too cynical or pessimistic, of course).

So what I'm trying to say is that maybe our very, very, very young domain still has a long way to go before maturity in terms of its own introspection capabilities. IOW, we may not even have actually lost our innocence yet about what we're grappling with.

Think Copernicus. But then, Newton. But then, Einstein. But then, the postmodernists. j/k

Computer Science does exist, I think, and has as much utility as Physics does for us to bargain better with nature, but there is still very little of CS running on our laptops, server blades, or smartphones - my feeling is that it's still 90% "or so" craftsmanship, or "Art" (in Knuth's vernacular), or folklore being deployed in binary form at an exponential rate, while provable or proof-carrying code remains very much the exception.

It all seems as if this is a world where "we", collectively, recently unleashed that Turing Machine animal, enthusiastically decided to let it run loose in the wild, and then, only after the fact...

... are now trying to tame it so that it doesn't come back too often or too unexpectedly to bite us (flawed humans that we are, riding it for applications across our broad spectrum of ethics - i.e., for better or worse, as with anything that pertains to human affairs, though the computers have no clue about any of that).

IOW:

Let's be happy and carry on with the coding: the dawn of this domain of ours is still exciting, IMHO.

With or without centuries of math and logic behind it, our Towers of Interpreters and such are promising precisely because, while nobody has had their Einstein moment to remove all confusion about them, we've at least already noticed (and probably accepted) that the confusion isn't going to be resolved any time soon (if ever) - so the risk that the Art suddenly disappears and is replaced with a boring, fully automatic assembly of code - by an Evil Matrix (*) hellbent on enslaving us all, of course - is seemingly, still, very close to zero for the foreseeable future.

(* On the other hand, there are some really sick people with the levers of power, out there - and those need no supercomputers to do much damage. Pens and ink largely suffice. As history shows quite clearly.)

Which by no means implies we should blindly tolerate the ugliest forms of that Art either - but our own personal pain thresholds (as readers or writers of computer code) are a pretty good insurance against that tendency in the long run anyway. Let's keep the faith : )

Your post really resonates

Your post really resonates: I also think the task of creating a DSL shouldn't be taken too lightly.
A DSL is a mini language, constrained to be applied in your particular domain: but it is a language none the less.

And with this 'medium' power that a DSL provides comes great responsibility.
You need an active team to support, document and maintain a DSL, like any other programming language (i.e. parsing, the compiler, useful error messages, etc).

I share your point on standard libraries. Maybe a library can be considered a poor man's DSL?
My latest approach is to create a library and, next to that, a minimal boilerplate-free embedded DSL to express various library constructs.
Indeed, the burden of maintaining such a library is similar to maintaining a DSL (team-wise), but at least you don't need to maintain a parser or a compiler. The downside of libraries is that compile-time errors are probably not domain-specific enough (because of the host language).

But lousy error messages are a trade-off I'm willing to accept, and probably so are the consumers of your library.

Bottom line: consumers of your library can more easily swap out your library for something else. Swapping out your DSL is much harder in my opinion.
