subtext: Uncovering the simplicity of programming

Some of you might be interested in subtext, a project that explores the idea of example centric programming and non-textual programs.

The basic idea is that the representation of a program is the same as its execution (possibly related to the discussion currently going on about edit-time, reasoning-time etc.)

Programs are constructed by copying and executed by copy flow: the projection of changes through copies.

If all this sounds intriguing, hop over and take a look. The site hosts a couple of demos and papers that provide more details.

discussed on ll-discuss

There was an interesting thread (and numerous subsequent forked threads) about subtext on ll-discuss recently.

Some points

Some of the ideas behind subtext seem to be common among many programmers, but I don't know if they are grounded in reality.

Quoting from the manifesto:

Compared to every other field of design and engineering, programming is an embarrassment and a failure. The “software crisis” has dogged us from almost the beginning.

First there's this repeated notion that software construction, in its current form, is somehow "inferior" to other engineering disciplines. I am not saying that it isn't, but the point is arguable. Don't other types of engineering projects fail? I seem to recall another failed NASA probe recently. How many failed attempts were necessary to reach orbital flight, or to land on the moon?

The lack of meaningful progress in the last 20 years means that there is an enormous pent-up potential for change [...]

No progress in 20 years? I think it's more the case that the progress made in this time did not reach the mainstream, not that no progress was made.

Programming is so hard that only highly talented, trained, and dedicated individuals can do it passably well. The inescapable conclusion is that programming as we know it is just unnatural for humans. The solution is to reinvent programming to suit human cognitive skills – to program the way we think.

Now exchange 'mathematics' for 'programming' in the text. Indeed, mathematics as we know it is just unnatural for humans; does this mean we throw it away, or restrict it to 'the way we think'? A trained professional is a trained professional; very few things are 'natural' for humans. The whole discipline can't be blamed if market pressures put huge masses of poorly trained programmers into jobs.

That said, I have nothing against subtext, and I welcome new approaches based on structuring and AST manipulation. But I see these kinds of arguments repeated over and over, and I'm not convinced they are true.

some of those hardware failures

are software failures
http://www.ksc.nasa.gov/facts/faq08.html

Abstractionless programming

I listened to the demo, and thought it was interesting, but my evaluative sense was that it was not a promising direction for programming.

A first observation is that there is nothing really new about this kind of programming. Anyone who has used Excel for non-trivial calculations will recognize this kind of "programming by reference graph".

I think this analogy also points the way to certain failings of this approach: some things that are relatively easy to do with "normal" name-based programming become quite tricky if you are limited to fixed absolute or relative locations for all values.

What this amounts to is the elimination (or at least hobbling) of abstraction: the only mechanism to factor out commonality of computational behaviour is by copying and reference.

As I have nearly finished Milner's pi-calculus book, I'm currently impressed that naming is a pretty powerful foundation for various kinds of abstraction, and I see the subtext approach as moving AWAY from the increases in modularity, abstraction, reusability, etc. that most of us actually want.

Not new

A first observation is that there is nothing really new about this kind of programming. Anyone who has used Excel for non-trivial calculations will recognize this kind of "programming by reference graph".

Not only Excel, but other systems based on graphs have been tried before too. My impression, after seeing the demo, is that it's a reinvention rather than an innovation.

It seems people are in desperate need of a 'silver bullet', a magical new approach that will make programming easy. Does such a thing even exist? I have my doubts.

However, such a tool for "end-user programming" might be quite interesting, depending on context. I'm still not convinced it is useful for heavy-duty programming, though.

Collapsing under weight

I'm still not convinced it is useful for heavy-duty programming, though

An interesting "feature" of the demo was that it ended just as the example was getting complex enough to be hard to understand. I felt that things would have become plain confusing if he had gone just one step further in adding logic to his program.

Given how simple it was, this did not fill me with confidence that it would do well even with the ordinary level of complexity of most non-toy programs, let alone that of a heavy-duty one.

Naming

What this amounts to is the elimination (or at least hobbling) of abstraction: the only mechanism to factor out commonality of computational behaviour is by copying and reference.

What is naming/abstraction if not reference? Conversely, what is reference if not naming? In the docs for subtext and in other discussions I've read recently, people occasionally seem to talk about "real references" or replacing names with a "direct" link to the "actual thing". For instance, from the manifesto:

Structure needs to be made explicit and directly manipulable, not implicit and encoded.

This all seems like nonsense to me: Is the suggestion that we should advance, soldering-iron in hand, to "directly manipulate" the (real) "structure"s that we are working on? :-) There are no "real things" inside a computer, except for the transistors and electrons in the processor etc.[1] There are only representations which refer to things under some interpretation. When you create a "direct link" between two items in subtext, you are indeed using names. Whatever pointers etc are used to create the "graph structure", they are just names. I see essentially no difference between subtext and a conventional programming language. The discussion of copying and persistent references between the copies and the parent reminded me a lot of Self, which also had this emphasis on "direct manipulation". The same sort of idea crops up again and again [2].

Given that we have to encode our structures and algorithms somehow, the other half of the question then is whether textual representations are sufficient. The author of subtext seems to be implying that text is not good enough. However, I don't think this is the case. Clearly, as a representational medium, text is perfectly adequate. What seems to be the real deficiency (and what subtext really seems to be trying to address) is the tools we use to manipulate these representations. Here, I think a stronger argument can be made. Tools for automatically identifying and tracking references and dependencies between different parts of a program, and allowing us to manipulate them and update all occurrences, are certainly going to be a big help (e.g., refactoring browsers etc). However, we should be clear that we are creating better tools for manipulating representations and dealing with names, not eliminating them. Manipulating representations is the primary function of compiler/interpreter technology, so this once again points to developing more bonds between compilers and editors.
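As a toy illustration of the kind of tool being described (a hypothetical Python sketch using the standard `ast` module; the identifier names are invented), a rename refactoring is just a mechanical manipulation of the program's representation, with the tool tracking every occurrence of the name for us:

```python
import ast

# A toy "rename refactoring": the tool walks the representation (the AST),
# updates every occurrence of a name, and regenerates the source text.
class Rename(ast.NodeTransformer):
    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

src = "total = price + price * tax"
tree = Rename("price", "net_price").visit(ast.parse(src))
print(ast.unparse(tree))  # total = net_price + net_price * tax
```

Note that the tool never eliminates the name; it just guarantees that all its occurrences stay consistent, which is exactly the "better tools for dealing with names" point.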

[1] This is similar to the use of terminology of mental "images" used in literature surrounding cognitive science/psychology/neuroscience. Sometimes people seem to get dangerously near to expressing the view that there are real pictures appearing in their head. But that's a whole other argument... :-)

[2] I'm perhaps particularly aware of this as a Tcl fan. The amount of times I've been told that Tcl can't do linked-lists because it doesn't have "real" references is depressing.

Manipulating text with structure

What seems to be the real deficiency (and what subtext really seems to be trying to address) is the tools we use to manipulate these representations. Here, I think a stronger argument can be made.

I agree. Eclipse includes some nifty tools for working with Java code (automatic refactorings, etc) that are based on AST manipulation. Almost two years ago James Gosling gave this interview on Artima.com about Jackpot, a tool for direct AST manipulation. Besides this interview there is an article on java.sun.com about it; but the Jackpot website has gone almost 3 years without news.

I believe similar techniques were used in Lisp editors as far back as the 80s, possibly earlier. So the idea is hardly new, although every time it reappears it seems to be sold as 'revolutionary'.

A name by any other name...

What is naming/abstraction if not reference? Conversely, what is reference if not naming?

And thus were launched hundreds of volumes in the philosophy of language. ;-)

But let's keep our can of worms much smaller. I get the feeling that we are muddling the idea of reference as "pointer", which is a kind of naming, with what subtext does, which is link to a particular value.

The power of a name is that it is just a formal entity to which a value can be bound, but that can enter into logical relations and constraints before it is bound.

In the subtext demo (I didn't explore beyond that) you had to create a specific value first, and then you could link it elsewhere. The fundamental concept of abstraction (defining relationships or properties that exist independent of particular values) is missing, except in the weakened form of copying a pre-existing relationship and plugging different values into it.

If a name only exists when it is already bound to a particular value, it isn't really a name, and you don't have real abstraction in your language.
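The point about unbound names can be seen in any language with functions; a minimal Python sketch (hypothetical names):

```python
# The name "x" enters into a relationship (2 * x + 1) before any value
# is bound to it -- the relationship exists independent of particular values.
def double_plus_one(x):
    return 2 * x + 1

# Binding happens only at application time, and works for any value:
assert double_plus_one(3) == 7
assert double_plus_one(10) == 21
```

The abstraction `double_plus_one` is exactly a relationship defined before binding; a system where `x` must hold a concrete value from the start can only copy a worked example and swap the value afterwards.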

If a name only exists when it

If a name only exists when it is already bound to a particular value, it isn't really a name, and you don't have real abstraction in your language.

Given that subtext appears to link to values directly, it would seem then that there is not true abstraction in this sense. However, this would also seem to imply that subtext is entirely static (if all links are bound at creation time). But this isn't the case. From what I can tell, you can replace the value that is linked to at runtime, and then it will cause appropriate updates. So, it can't be true that the links are between values, but rather they must be links between some sort of reference/variable. Therefore, it would appear that there is some naming going on (for the link end-point to have identity independent of its current value), but somewhat out-of-sight. Ironically, without explicit names for these variables it would seem that you can only refer to them rather indirectly (by pointing at an area of the screen and saying "this one", effectively), which seems to go against the whole "direct manipulation" philosophy. [Edit: The positive side of this, of course, is that the editor/IDE can make sure that you never misspell a name, or get a syntax error.]
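A minimal sketch of the distinction being drawn here (hypothetical Python, not subtext's actual implementation): for updates to propagate, a link must point at a cell with identity independent of its current value, not at the bare value itself.

```python
# A link to a cell (a variable with identity) observes later updates;
# a link to the bare value does not.
class Cell:
    def __init__(self, value):
        self.value = value

a = Cell(1)
link = a            # the "direct link" is really a reference to the cell
a.value = 42        # replacing the cell's value propagates through the link
assert link.value == 42

snapshot = a.value  # a link to the *value* is just a copy of 42
a.value = 7
assert snapshot == 42  # the copy does not see the update
```

The cell here plays precisely the role of the hidden variable the comment describes: a nameless name.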

Analogy to the WWW

This argument for plain-text programming sounds to me kind of like someone in 1991 arguing against the Web: http and html do the same things as anonymous ftp and README files, so why bother? Go build a smarter ftp client instead.

But the technically simple differences produced a huge jump in usability. In programming, the fact that we're always dealing with representations at some level doesn't exclude similar advances. Liveness and first-class links are even along vaguely the same lines as the web's improvements over ftp -- interactivity and hyperlinks.

Better representation or better interpretation?

I'm not sure I understand. HTML is a textual format. Hyperlinks are based on names (URIs). What is "liveness"? What is a first-class link (over and above a name)? I was arguing against a casual trend to ascribe certain properties (such as "realness") to concepts (such as links) as if they were intrinsic, when they are really only there under some interpretation. Do you believe there is some fundamental difference between a link in an HTML document, and a piece of text in a README file which names another file, and which a sufficiently "smart" ftp client could interpret/render as a link? The "realness" of the link is in the client that interprets it, not in the link itself (beyond its ability to represent sufficient information). In that respect, a web browser is just a smarter ftp client (one which knows how to interpret HTML).

The real point I was trying to make was not to argue for plain-text (whatever that is), but rather to point out that it is the interpretation (the tools) rather than the representation where most improvements can be made. There may well be good reasons for moving to different representational formats, but I don't think they are fundamental issues. With subtext there is a confusion (in my mind, at least) about whether it is a better tool/interpreter (it has some interesting features, certainly), or whether there is a fundamentally improved representation -- the argument for the former seems much stronger than for the latter.

Point of value

I believe the valuable point to take away from the SubText experiment is that programming should not necessarily be (a) about general purpose languages, or (b) languages necessarily designed around ASCII.

SQL is not general purpose, but is good enough for its problem domain that it dominates its niche. People just work around its limitations by interfacing it with a GP language.

Musical notation is not ASCII either, but is fit for purpose.

WS-BPEL is another DSL that a general purpose language doesn't handle very well. Being a flow-oriented language, its programs are best described visually rather than in text.

Guy Steele is onto something when he said Fortress programs should look like equations. With the benefit of cheaper memory, why shouldn't programs be stored in binary? It would do away with parsing time altogether, and that gets rid of the entire class of syntax-related issues in one fell swoop.

What is so refreshing about Lisp even nowadays is its disavowal of syntax-oriented programming. Instead, it focuses on the business end of handling and manipulating lists.

I used to think that programs ought to be in ASCII for the purposes of completeness of documentation, so that we can type it all back in if we lose the disks, but serialization takes care of that nowadays.

The only downside with non-ASCII languages is that they tend to be hard to print on paper, to take away and read. However, this will be a moot point in about 10 years' time, since complex programs are better studied with a computer (since it's easier to jump between procedures), and it'll be pretty cheap then to carry around portable devices. I mention the 10-year period because if we start on a new language design now, it'll take that long to bake in, and it'll be in a position to take over once hardware becomes cheaper.

If you ASCII silly question, you get a silly ANSI...

(a) about general purpose languages, or (b) languages necessarily designed around ASCII.

On point (a), the demo makes general claims about programming without qualification, which in my mind suggests general purpose. An example: "Cutting-and-pasting is a natural way to program."

So "it's a DSL" isn't really a defense. If it was I would say, "Oh, you mean you want to find a new interface for Excel?".

On point (b), I don't think ASCII is the issue: it is the semantics of the language.

If he wanted to program by manipulating coloured shapes and linking them together, that could still work fine IF there were some coloured shapes that could be bound to arbitrary structures as part of computation.

In other words, "naming" is not dependent on the nature of names, but on the ability to use a meaningless-on-its-own symbol as a stand in for another semantic value via some defined process of binding.

The whole notion of "using semantics directly" without the medium of a syntax (whether ASCII, images or something else) is somewhat confused, and skips over some of the real power and value of language.

Programming in the concrete vs programming in the abstract

The SubText demo has some interesting concepts, but its implementation can still evolve a little bit. What is conveyed is quite difficult for programmers to grasp, as we are all used to talking in the abstract, with variables and all.

I remember the jump I had to make, a very long time ago, before variables made sense to me. Programming with variables is as difficult for the uninitiated man-in-the-street as metaclasses are for the ordinary developer. It makes the brain explode <wink!>. Looking at SubText is very much an unlearning process, as it represents the notion of manipulating objects directly, rather than through an abstract pointer to an object.

The question SubText posed is: can we make programming as easy as making pottery, where you mold your final product out of a series of intermediate steps? Can we shape streams of data and messages the way we shape clay? Forth made a good attempt at it. What next?

The fact that it resembles a spreadsheet is a tribute to the widespread adoption of spreadsheets by non-programmers, which made the power of computation accessible to millions of people worldwide.

Syntax

What I find scary in subtext's notation is that it's not static: you have to hover the mouse over boxes to discover where they are linked to. "Compasses" provide only a rough clue, and drawing all links as lines would make the lines too crowded to be readable.

I'm also not sure whether the amount of sharing is visible, and it's not clear which subtrees need to be expanded in order to have a complete view of a program.

This implies that it's impossible to print out a non-trivial subtext program and examine it on paper, without having access to an implementation.

It's not an improvement over ASCII but the opposite.

ASCII and beyond

From the manifesto I gather that the author's primary aim is to retire flat text as an acceptable representation form. So it's not surprising to me that examining programs on paper is not supported.

I have nothing against more advanced representation formats. But statements like these confuse me:
Programming is so hard that only highly talented, trained, and dedicated individuals can do it passably well. The inescapable conclusion is that programming as we know it is just unnatural for humans.

In my definition, highly talented individuals program (or dance or sing) very well. Kasparov may have lost to the computer, but I am not going to say that he played passably well and choose the computer's level of chess as my new definition of "good". Just where are these people coming up with these artificial expectations ?

I suppose what he meant could be that the productivity/quality gap between the highly skilled and Random J Coder is very large, in fact, too large. However, this is again an unsubstantiated claim (how large should it be?)

The reality is the skill gap between Van Gogh and Random J Painter or between Michael Jordan and the guy next door is very large. But I don't want to claim that basketball is unnatural for humans and change the rules to suit more human levels of jumping.

The "people" do deserve better tools to do their jobs. However, by downplaying the values of abstraction, mathematics, monads, etc. the author is denying "the people" the opportunity to program as well as the highly skilled.

In the beginning the author says: programming is an embarrassment and a failure. Again and again I hear this and comparisons to other disciplines. Manufacturing microchips and building bridges have obvious material costs that justify strict quality control. Software management typically pushes on complexity and time to market, and they control the damage caused by resulting bugs via patches (viewed inexpensive as compared to recalls).

I think that many people that visit this site (or even think about software) are programming very well, not passably well. And we can do better. I understand that some people are stuck in their debuggers and would like to see the state of everything and would not want to know anything about the ugly stuff. For some reason I just don't feel that they are being truly empowered, though.

Re: ASCII and beyond

From the manifesto I gather that the author's primary aim is to retire flat text as an acceptable representation form. So it's not surprising to me that examining programs on paper is not supported.

It is surprising: that it's not text doesn't imply that it can't be a static image. I can imagine a graphical notation which is not plain text yet is printable. Being printable is a significant advantage - how else can you discuss it in papers?

Anyway, I haven't seen any non-text notation for general purpose programs which would look better than text. At most it can be a pretty-printed text with nice symbols instead of operators, i.e. something isomorphic to plain text.

I don't understand all those claims that "textual form for programs is obsolete", since they aren't backed by anything else which is readable, without even mentioning the technical issues of having to use a specialized editor. Using names to refer to things is the only sane choice when there are thousands of them.

Hope I'm not too late to join

Hope I'm not too late to join the discussion. The point of this experiment was to investigate how an alternative representation of programs could lead to improved programmer usability. If you don't feel that programming is much harder than it needs to be, then this is irrelevant.

The major potential usability benefit is seeing your code executing live while you edit it. This is like a spreadsheet, but in a much more general setting. It is still just a toy, but I believe the complete "static" visibility of execution is novel. Even spreadsheets hide the internal execution of formulas, whereas everything is transparent in Subtext. Debugging and animation tools show everything, but only in a run-time that is distinct from edit-time.

Counterbalancing this benefit are some daunting challenges. Scaling to large programs may be difficult. Replacing the well-understood abstraction mechanisms based on delayed binding of names by using links and copying is risky. Losing the ability to program with only paper and pencil (or keypunch, or Emacs) is disruptive. You have to really really want to radically improve programming to even consider such challenges.

Easier for whom, easier for what

The point of this experiment was to investigate how an alternative representation of programs could lead to improved programmer usability. If you don't feel that programming is much harder than it needs to be, then this is irrelevant.

In some sense or another, all of us here are interested in improving PLs to make programming better/easier, for some value of better/easier.

I think when we disagree, it is more often than not because we have different notions of what would make programming easier, what makes programming hard now, and who and what our ideal programmer and purpose are.

In a mailing list thread discussing subtext that Dave mentioned earlier, the subject header read "Joe Six-Pack programming".

I think this is actually misnamed, since I don't think "Joe Six-Pack" is the intended audience for subtext. Programming will NEVER be easy or natural for someone with no technical ability.

My impression is that this project is aimed at the programmer whose natural aptitude is as a mechanic: focused on the concrete case, preferring to get results by manipulating the parts directly. For such a person, abstraction is just extra baggage, since he is not interested in designing engines in general, but in fixing the one at hand in particular.

So for certain people for certain types of tasks (the Excel example being an excellent one) that kind of "direct manipulation" approach may be the "easiest", but I don't think that addresses the "problems of programming" in the general case (at least not what I consider the general case. ;-) )

So I'm glad that this approach is being explored, since it probably does have some natural niches of application, but I don't believe it is going to "revolutionize" programming as a whole field.

I think what programming needs is MORE abstraction, not less. ;-)

Abstraction considered limiting

"I think what programming needs is MORE abstraction, not less." Well that is where we disagree.

I used to be supremely confident in my intellectual abilities and think that the limiting factor was the power of my tools/languages. Experience has taught me that human abilities are usually the limiting factor in programming. We are all way over our heads.

A major difference between great programmers and average ones is their capacity to handle intricate abstractions. But they are both in exactly the same situation: running at the limits of their abstraction abilities. Making abstraction easier ought to benefit all programmers, if I am right.

Abstraction simplifies

A major difference between great programmers and average ones is their capacity to handle intricate abstractions. But they are both in exactly the same situation: running at the limits of their abstraction abilities.

You seem to be saying that abstraction is more complex than the concrete, and that abstraction makes it harder to think about something.

Perhaps we have different notions of abstraction, because for me abstraction is about SIMPLIFYING complexity by identifying common patterns of phenomena.

Let's take a simple example: the idea of a "dog". Not an abstraction you say? To casual observation, what do a chihuahua and a great dane have in common? Imagine if each time you saw a new breed of dog you had to freeze and decide if it was some new dangerous animal or not.

The convenient abstraction of a "dog" allows us to walk through the streets confident that all those different-looking beasts we see actually have shared behaviours and can be trusted (mostly) not to attack us.

In a language like subtext, you can have an individual great dane, you can make a copy and change it into, say, a mastiff pretty easily, but you can never capture the abstract notion of "dog" that applies to all instances of dogs.

For non-trivial arrangements of a program (or any other problem), this makes it HARDER, not easier, to understand.
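To make the example concrete (a hypothetical Python sketch): a named abstraction states once what all dogs share, and every instance, including breeds defined later, gets it for free; with copy-only sharing, that commonality exists only implicitly, copy by copy.

```python
# The abstraction "Dog" captures shared behaviour once, for all instances.
class Dog:
    def greet(self):
        return "wags tail"

class Chihuahua(Dog):
    pass

class GreatDane(Dog):
    pass

# A breed added later still satisfies the abstraction automatically:
class Mastiff(Dog):
    pass

assert all(d.greet() == "wags tail" for d in [Chihuahua(), GreatDane(), Mastiff()])
```

The useful information is the single statement "all dogs greet alike"; three diverging copies would force the reader to rediscover that fact by comparison.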

Abstraction is a given necessity

The question is how abstractions build on each other to create higher and higher levels - which is something I don't see addressed by a clone-and-mutate scheme.

Of related interest might be a similar debate within the embedded software world of getting away from text - A picture is worth a thousand lines of code.

"You seem to be saying that a

"You seem to be saying that abstraction is more complex than the concrete, and that abstraction makes it harder to think about something."

Yes, exactly. We universally explain abstractions with examples. All I am saying is that when you are working with abstractions it helps to stay in touch with examples, and our languages and tools can be structured to facilitate that. That is also the reason unit testing is so effective.

The only time abstractions stand alone are when they become so entrenched that they are second nature. That may happen for the eternal abstractions of math and physics, but not for the myriad disposable abstractions in programs.

"In a language like subtext, you can have an individual great dane, you can make a copy and change it into, say, a mastiff pretty easily, but you can never capture the abstract notion of "dog" that applies to all instances of dogs."

This is the old argument about whether prototypical languages have "true" classes. You refactor the copy relationships so that "dog" is the parent of all dog instances. Apart from metaphysics, prototypes can do everything classes can in practice, yet are more concrete.
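The prototype claim can be sketched in a few lines (a hypothetical model, not subtext's actual mechanism): each object delegates to its parent "copy" for any slot it doesn't override, so making "dog" the parent of all dog instances recovers class-like sharing.

```python
# A minimal prototype scheme: slot lookup falls back to the parent "copy".
class Proto:
    def __init__(self, parent=None, **slots):
        self.parent, self.slots = parent, slots

    def get(self, name):
        if name in self.slots:
            return self.slots[name]
        if self.parent is not None:
            return self.parent.get(name)
        raise KeyError(name)

dog = Proto(sound="woof", legs=4)
mastiff = Proto(parent=dog, size="large")  # a "copy" that diverges

assert mastiff.get("sound") == "woof"      # inherited through the copy link
assert mastiff.get("size") == "large"      # its own divergence
```

Note that `dog` here is playing the role of an abstraction whether or not we call it one: a shared definition that all instances refer back to.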

Concrete Blonde

when you are working with abstractions it helps to stay in touch with examples, and our languages and tools can be structured to facilitate that. That is also the reason unit testing is so effective.

I think there is a subtlety that is being lost here. I'm a TDDer, so I'm 100% in favour of testing abstractions against specific instances (how else do you know if you have the right abstraction?)

However, I still want to be able to explicitly express the abstraction in my program, because the information that, say, three different instances share the same behaviour or characteristics is useful and meaningful.

In a pure "copy and paste" environment, the onus is on the programmer to figure out that the three instances have something in common, and even small divergences can make it hard to even spot that commonality.

Also, as the demo showed, it is not always obvious what the starting entity for a particular instantiation should be (think of using "identity" or whatever as the basis for a more elaborate function).

This is the old argument about whether prototypical languages have "true" classes...
Apart from metaphysics, prototypes can do everything classes can in practice, yet are more concrete.

Hmmm. This sounds like a "what is OO?" debate, which most of us here have given up as a penance. (Most of the time, anyway. ;-) )

But from my point of view, if the something called a "prototype" actually does what abstractions do (specify general properties or behaviours that apply to potentially as-yet-unidentified instances), then it would BE an abstraction, regardless of its name. (Another example of abstraction by naming at work. ;-))

You refactor the copy relationships so that "dog" is the parent of all dog instances.

This makes me think that subtext does have some kind of hidden abstraction mechanism that I just missed.

Based on my current understanding, you would have to have an "is-a-copy-of" link of some kind, that conferred some common functionality, or something like that.

Can you elaborate how that would work?

BTW, shame on you for not identifying yourself as subtext's creator. ;-)

Copy links

"Based on my current understanding, you would have to have an "is-a-copy-of" link of some kind, that conferred some common functionality, or something like that."

Exactly. It's not in the video, which was done on an earlier version. The latest version with copy links is shown in the OOPSLA paper. I am in the process of unifying copying with linking, so that you can dynamically re-parent a structure based on a linked reference, which results in higher-order functions. I hope that will earn me some respect with the functional programming crowd.

There is more detail in the paper (see higher-order copying), but frankly there are still a lot of issues to resolve. Subtext is still very much a partial sketch of an evolving idea.

Abstraction

I think this discussion of abstraction is a little bit too er... abstract.

It might be helpful to consider concrete examples of types of software abstraction. Software abstractions are quite specific and well-defined language constructs, and it's best to discuss them rather than argue about the philosophical notion of abstraction.

Consider template-based programming (aka "generic programming"), which allows you to abstract over types or classes (depending on the specific language) to define abstract containers and algorithms. Consider module languages, which allow you to explicitly and declaratively specify the relationships between modules in your system using, for example, documented and checked interfaces (e.g., signatures). Etc.

All these depend in one way or another on naming.

Abstraction by any other name

All these depend in one way or another on naming.

Not to mention inheritance and polymorphism. All I have done is show how the grandaddy of all of them - functional abstraction - can be done without names, and more importantly, without dividing program construction from program execution.

Some object that my links are just pre-bound names, but that is exactly the point. I have removed name binding from the language. Binding is completely up to the programmer (or metaprogram) to do in whatever way they like when the program is constructed, and is outside the language semantics. Attaching ASCII identifiers is an optional convenience. By eliminating name binding, you don't have to wait till compile-time or run-time to see what the links mean. This is crucial to seeing abstractions in code as living examples that adapt to their context.

Two birds with one abstraction

I get two for the price of one here: I can satisfy Ehud's request for specificity and make my point at the same time. ;-)

All I have done is show how the grandaddy of all of them - functional abstraction - can be done without names

I think this is precisely what you have NOT done. To the extent that you still have the functionality of functional abstraction (being able to reuse a pre-specified relationship or computation with different bound name values) you actually HAVE naming; you have simply obscured it from your source.

It is this obscuring that I think makes this approach harder for programming.

We agree that your source doesn't seem to have naming. However, this is only sleight of hand: you have made manual the process that normally happens at runtime of "copying" the abstraction before evaluating it through beta-expansion (parameter binding). The fact that you can change the bindings and preserve the relationship between the "names" gives away that you still have naming going on, just using pointers or memory addresses instead of ASCII strings.
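The copying-as-beta-expansion point can be made concrete with a toy term representation (the representation and helper below are my own illustration, not anything from Subtext):

```python
# Toy illustration: beta-reduction IS "copy the body, then bind the
# parameter". Subtext performs this copy explicitly at edit time; a
# conventional evaluator performs it implicitly at run time.

def beta_reduce(body, param, arg):
    """Return a copy of `body` with occurrences of `param` bound to `arg`."""
    if body == ('var', param):
        return ('const', arg)
    if body[0] in ('add', 'mul'):
        return (body[0],
                beta_reduce(body[1], param, arg),
                beta_reduce(body[2], param, arg))
    return body

# Applying (lambda x. x * x) to 5: the body is copied, the placeholder bound.
square_body = ('mul', ('var', 'x'), ('var', 'x'))
assert beta_reduce(square_body, 'x', 5) == ('mul', ('const', 5), ('const', 5))
```

Note that the "name" `'x'` never disappears; it is just a placeholder the copy operation keys on, which is exactly the point being argued above.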

Which leads to a second place that you have hidden naming (and abstraction): your primitives. Clearly, a notion like difference is necessarily an abstraction. You have simply given default values to the name binding.

So, you say, "I have the power of abstraction, but I have banished explicit abstraction from my source; I win".

Not so fast. There are two catches with this.

First, as I've argued before, shunting information out of the source into the programmer's head makes programming harder, not easier. (Notation of ideas is your friend.)

But second and more important, you have BROKEN some of the functionality of abstraction: the copy of a "function" can be changed so drastically that it no longer represents the same relationship.

If I start with something like "Identity (0) = 0" and then, through your "variant" copying, end up with "Factorial (0) = 1", how can I say that there is any abstract relationship between them at all? (Ironically, all you have done is copied the implicit naming scheme of the first to the second; the actual abstraction, the logical/computational relationship over the names, is erased.)

So to sum it all together, you have removed naming and functional abstraction from the "source" by shunting it under the hood, and then forced the programmer to implicitly keep track of it in their head.

I can't see how that would make programming easier.

See your code executing live...

Two that I know of are Vital and HOPS.

--Shae Erisson - ScannedInAvian.com

Is it live, or is it Memorex?

Vital shows the result of a top-level function application, not its internal execution. It does make data declarations effectively live, though. Thanks for the ref.

HOPS animates evaluation, which must be manually invoked with manually supplied arguments, and then played out. Just a fancy stepper.

Subtext shows the code and its execution together at all times and in complete detail.

The dance is a poem of which each movement is a world

Another example of a visual interface by the language/game guru also known as Aardappel: the Aardappel language.

Surprised you didn't directly link to this example (from HOPS):

static visibility

I believe the complete "static" visibility of execution is novel. Even spreadsheets hide the internal execution of formulas

I made a toy language that did that in 2003. (Not in a very interesting way; Subtext is much cooler.) I've thought about how to get that property into a non-toy language, and came up with an approach based on a mapping between hypertext documents and Hewitt's Actors scheme, but it's not fleshed out. (The documents-as-actors approach seems more conventional, which has its good and bad sides.)

Documents-as-actors

Could you share that with us? I am very interested in learning different ways to achieve static visibility in graphical languages.

More on naming

Without commenting on subtext as a whole, a few notes about naming.

I'm going to offer up a rather rough taxonomy here, so please take with a grain of salt. In many programs, the names given to variables, functions, and such can be divided into two types:

* Local names. These names are assigned to terms whose uses (both production and consumption) are all known, and easily determinable. Often, these correspond to "local" variables in programming languages and such. In most cases (if not all), the name is not necessary (and can be elided from an abstract syntax graph); such names can also be eliminated through numerous other means (use of stack-based machines/languages like Forth or Joy, de Bruijn indices, etc.) or renamed via alpha-conversion. Sometimes, the names have external benefits to the user as a form of documentation (letting me know that the term in question represents a customerBalance, for instance); but the primary reason to have such names is to allow an arbitrary directed acyclic graph to be represented by a tree, or by a linear sequence of tokens. Many graphical programming environments find it useful to exclude these names (or make them optional). In the electrical engineering world, one can compare schematic capture tools with Verilog; in Verilog every wire must be named. In graphical tools, the engineer can easily let trivial wires be anonymous.

* Exported names. These are names corresponding to terms which are published to the world; it is assumed that anybody might need to reference the term; and one does so through its name. Obviously, something which is published must be "reachable" somehow; and a name is a common way to do it. Many module systems have the feature that published names are primary keys; two different terms/features must not have the same name. To get around the namespace pollution problem, a hierarchical naming system is often employed (such as in Java), along with rules on how to form names (start with the Internet domain name of your organization with the fields inverted--i.e. com.sun.java). Other environments provide alternate query schemes--making UUIDs or other unique-but-meaningless tokens the primary key; and providing more advanced query mechanisms (which may include names, descriptions, versioning information, vendor/author information, meta keywords describing functionality, etc.)
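The first bullet's claim that local names can be eliminated outright can be sketched with a toy stack machine in the Forth/Joy style (the machine and its opcodes are my invention for illustration):

```python
# Sketch of name elision via a stack machine: `dup *` squares its input
# by referring to values through stack position, never through a name.

def run(program, stack=None):
    stack = list(stack or [])
    for op in program:
        if op == 'dup':
            stack.append(stack[-1])        # duplicate top of stack
        elif op == '*':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == '+':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:                              # anything else is a literal to push
            stack.append(op)
    return stack

assert run([7, 'dup', '*']) == [49]        # square, with no variable in sight
```

Where a conventional language would write `x * x` and bind `x`, the stack program identifies its operand purely by position, which is the sense in which such local names are incidental.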

Given that lengthy diatribe, one might conclude that I'm against names. I'm not--frequently they are the simplest thing that works. In many contexts (such as accessing the fields in a simple record), they are the easiest thing to reason about; they provide order independence, they often provide documentation to humans, etc. OTOH, in many cases they are a hindrance rather than a help. Many names are incidental rather than essential in nature. And as Brooks points out, the more incidental complexity we can eliminate, the better.

A name is not a string

such names can also be eliminated through numerous other means (use of stack-based machines/languages like Forth or Joy, de Bruijn indices, etc.)

What you seem to be talking about, Scott, is the implementation of a name, not the name, or the idea of naming, itself.

The whole idea of a de Bruijn index is that it allows naming by unique position rather than by non-unique strings of letters.

Either way, you have a name, that is a "place holder" or symbol that can be used in logical relationships independent of what value it will be bound to in application.
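The point that a de Bruijn index is still a name, just one given by position rather than by an arbitrary string, can be made concrete with a toy converter (the term representation is mine, for illustration only):

```python
# Converting named lambda terms to de Bruijn form: each variable becomes
# the 0-based distance to its binder. The "name" survives as a unique
# position rather than a non-unique string of letters.

def to_debruijn(term, env=()):
    """term is ('var', name) | ('lam', name, body) | ('app', f, x)."""
    kind = term[0]
    if kind == 'var':
        return ('var', env.index(term[1]))      # position replaces the string
    if kind == 'lam':
        return ('lam', to_debruijn(term[2], (term[1],) + env))
    return ('app', to_debruijn(term[1], env), to_debruijn(term[2], env))

# lambda x. lambda y. x  and its alpha-variant  lambda a. lambda b. a
# map to the same positional term -- the strings were never essential:
k1 = ('lam', 'x', ('lam', 'y', ('var', 'x')))
k2 = ('lam', 'a', ('lam', 'b', ('var', 'a')))
assert to_debruijn(k1) == to_debruijn(k2) == ('lam', ('lam', ('var', 1)))
```

That alpha-variants collapse to one term also bears on the later remark in this thread that alpha-conversion is an artifact of a particular textual representation.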

I'm really harping on this notion of naming in this thread, because I think it is an important concept for computation and PLs (it is the fundamental idea of the lambda calculus, for example), and that muddying it by focusing on its various forms of implementation as though they are fundamentally different things actually obscures what we are talking about.

(If I may be overly thorough for a moment ;-), this is another example of abstraction simplifying ideas by finding the underlying principle behind them.)

Semantics unaffected

I'm really harping on this notion of naming in this thread, because I think it is an important concept for computation and PLs (it is the fundamental idea of the lambda calculus, for example), and that muddying it by focusing on its various forms of implementation as though they are fundamentally different things actually obscures what we are talking about.

I agree, 100%. It seems obvious to me that Subtext, whether in its current form or in a future more complete form, is and will be easily mappable to some textual representation, which is very likely to be essentially equivalent to some existing language(s).

I argued a related point on the on the ll-discuss list, in a discussion which occurred before Subtext came up. The whole "visual" thing is a question of user interface — we're really talking about visual syntaxes. Semantics are semantics are semantics, they're usually trees (although other structures are possible), and whether you represent them visually, textually, or whatever makes no difference to the semantics.

That's not to say there aren't useful things that can be done with visual syntaxes -- I don't disagree with the point which Darius made about the WWW. There can be benefits just to thinking in terms of a different interface — for example, looking at lambda calculus from a visual perspective makes it clear that alpha conversion is an artifact of a certain kind of textual representation.

However, we're much more likely to make useful advances if we're clear about what we are and aren't dealing with — and one thing we're emphatically not dealing with when it comes to visual syntaxes is anything new in the realm of semantics.

Syntax and user interface matter

It seems obvious to me that Subtext ... will be easily mappable to some textual representation, which is very likely to be essentially equivalent to some existing language(s).

The textual representation would have to be some XML-like encoding full of generated unique identifiers, hardly human-readable or human-editable.

The whole "visual" thing is a question of user interface — we're really talking about visual syntaxes. Semantics are semantics are semantics, ... whether you represent them visually, textually, or whatever makes no difference to the semantics.

You are right that so far I have only duplicated traditional semantics. I speculate in the paper on some novel semantics, for example first-class program changes.

I disagree that syntax is irrelevant. The syntactic innovations of Fortran and Algol were intertwined with their semantic inventions. User interface issues are likewise crucial: contrast Eclipse with a keypunch. I will admit that looking at only the last 20 years of programming languages could lead one to fatalism.

Sure they matter, but...

The textual representation would have to be some XML-like encoding full of generated unique identifiers, hardly human-readable or human-editable.

That doesn't appear to be true as long as the nodes are labelled or commented. Entirely anonymous links might need generated names, depending on the circumstances, but how important are these? Too many anonymous links would become difficult for a human to reason about.

The third use of names mentioned in your paper, "comments and mnemonic aids", is the only use for human-readable names which is actually necessary. The need for those doesn't go away, even if there are some cases where names can be avoided. Those names can be exploited to make useful textual representations of the program. In fact, the Subtext demo depends on this, even providing a more expression-oriented view, which would become pretty hard to understand if the names were removed.

You are right that so far I have only duplicated traditional semantics. I speculate in the paper on some novel semantics, for example first-class program changes.

Wouldn't that be equivalent to a Lisp-style macro feature?

I disagree that syntax is irrelevant. The syntactic innovations of Fortran and Algol were intertwined with their semantic inventions. User interface issues are likewise crucial: contrast Eclipse with a keypunch.

I said the representation makes no difference to the semantics, and that's accurate. I also acknowledged that thinking in terms of different interfaces can have benefits, which I consider addresses the above point. But I think the keypunch/Eclipse comparison supports my perspective at least as well: there's no fundamental difference in what programs you can write with a keypunch vs. Eclipse — the difference is essentially in convenience. (I've been an Eclipse user since the R1 release, but for most purposes I'm just as productive in a decent text editor. That's not to say I wouldn't like a high-tech structural editor if I could get it, but Eclipse doesn't reach that level.)

I will admit that looking at only the last 20 years of programming languages could lead one to fatalism.

I'm not looking so much at extant programming languages, but rather at PL theory. The points about the nature of abstraction can't be handwaved away. I think many of the techniques which underlie Subtext are very promising: visual interfaces to ASTs, programming by example, etc. I'm also all in favor of more clearly separating the mnemonic role of names (which is really one of the only ways we have of relating a program to its application domain) from the use of names as a reference mechanism within a program. But I don't see any of this having a significant effect on language semantics.

Mnemonic vs. referential

I'm also all in favor of more clearly separating the mnemonic role of names (which is really one of the only ways we have of relating a program to its application domain) from the use of names as a reference mechanism within a program.

I'm curious about this, Anton. Do you have specific ideas/examples of how you want to do this? And what benefits do you think accrue to the programmer in such a scheme?

Do we not have this power already in most languages with scope, name aliasing, and similar mechanisms?

More on mnemonic vs referential

Anton wrote:

I'm also all in favor of more clearly separating the mnemonic role of names (which is really one of the only ways we have of relating a program to its application domain) from the use of names as a reference mechanism within a program.

To further Marc's response: we do this already to some extent.

Just as many languages perform type erasure, many languages also perform name erasure--it's considered an essential performance optimization in many cases. When you access an element in a C/C++ struct, the compiler doesn't emit code to look it up by name in some dictionary--the name is erased and replaced with a numeric offset from the top of the structure. Local variables are replaced with offsets into stack frames (or don't ever touch memory at all in some cases). In most C/C++ programs, the only symbolic names which are guaranteed to survive are externally-exported symbols; and even those may be mangled by the implementation.
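This name-to-offset erasure can be observed directly from Python's ctypes, which lays structures out the way a C compiler does:

```python
import ctypes

# The field names `x` and `y` exist only for the programmer; at the
# machine level, each access compiles down to a fixed byte offset from
# the start of the structure.
class Point(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

assert Point.x.offset == 0
assert Point.y.offset == ctypes.sizeof(ctypes.c_int)   # 4 on common platforms
```

The field descriptors keep the offsets around precisely because ctypes needs the reflective information that a plain C compiler would have erased.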

Needless to say, this has its drawbacks. For one thing, name erasure can lead to the syntactic fragile base class problem, and similar issues that occur when different code modules are separately compiled and deployed, with inconsistent source versions, and aggregated at runtime. For another, keeping the names around is necessary if you want runtime reflection or introspection.

A useful general facility is the ability to assign "attributes" to terms. This could be viewed as an extension to the type system, and could be useful for assisting with translation, metaprogramming, and documentation.