Memory, Actions and Extensible Syntax

My view of actions and action syntax is based on computer hardware and machine language. A computer has memory and a processor, which is analogous to a Turing machine, or to a person (processor) using a library (memory). The analogue world has counterparts too: a slide rule is moved to calculate, electronic devices act as analogs of slide rules, and quantum particles have states (memory) that are changed by the forces acting on them (actions). Memory and action are universal.

Human languages help us communicate about the universe. An efficient language allows us to communicate accurately and succinctly, and our efficiency in using a language depends partly on our understanding of it. Thus, I propose a syntax to describe actions. There are four variations: infix, prefix, suffix, and nullary. A subroutine acts on parameters, constants, and/or variables. The simplest form is nullary, which takes no arguments; such subroutines may compute pi, get the time, or retrieve data from memory (i.e., act as a variable). Prefix and suffix forms are more complex, and infix is the most complex. These four forms describe all possible arrangements of subroutine name and arguments in a statement. However, there are other facets of subroutine call syntax to consider.
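
To make the four variations concrete, here is a minimal sketch of a toy expression reader in Python (the mini-language and its operator tables are invented for illustration, and there is no precedence handling): pi is nullary, neg is prefix, ! is suffix, and + and * are infix.

import math, re

NULLARY = {"pi": lambda: math.pi}                    # no arguments
PREFIX  = {"neg": lambda x: -x}                      # name before its argument
SUFFIX  = {"!": lambda x: math.factorial(int(x))}    # name after its argument
INFIX   = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def evaluate(src):
    toks = re.findall(r"[A-Za-z_]\w*|\d+|\S", src)
    def operand(i):
        t = toks[i]
        if t in NULLARY:
            return NULLARY[t](), i + 1
        if t in PREFIX:
            arg, j = operand(i + 1)
            return PREFIX[t](arg), j
        return float(t), i + 1
    val, i = operand(0)
    while i < len(toks):
        t = toks[i]
        if t in SUFFIX:
            val, i = SUFFIX[t](val), i + 1
        elif t in INFIX:
            rhs, i = operand(i + 1)
            val = INFIX[t](val, rhs)
        else:
            raise SyntaxError("unknown token: " + t)
    return val

print(evaluate("neg 5 + pi"))   # prefix, infix, nullary
print(evaluate("5 ! * 2"))      # suffix, infix: 240.0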

The other syntax facets include, but are not limited to, punctuation, keywords, and argument order. To be completely extensible requires that there be no immutable punctuation, no reserved keywords, and no inflexible order. And, of course, arguments may either precede or follow the subroutine name. Finally, arguments may be characters, character strings (names), or strings of character strings (expressions). These are the requirements for a subroutine call syntax in an extensible language that does not need metaphrase layering to express all variations of subroutine call. There is at least one solution that avoids metaphrase layering; I am curious to know whether there are any others.

Your comments are always

Your comments are always terribly syntax-centered. Parsing technology is interesting, and research into languages that are more comfortable to use because of more appropriate syntax is certainly useful, but I wanted to ask: do you realize that syntax is only a (subjectively) minor part of what a programming language is about?

Syntax expresses the concrete form in which programs should be written. Semantics (of various forms) express what the different constructions of the language mean. Most programming language research devotes more attention to semantics than to syntax: it operates on a view where all the concrete parsing has already been done, leaving an abstract, highly structured representation of the program to work on.

Your work is not only syntax; it mingles in some semantics -- to have an extensible syntax you describe dynamic changes to the "evaluation rules", in some sense, of the syntax -- but by using "string" as the abstraction for programs, instead of the nearly arbitrarily structured expressions we use in abstract syntax trees, you lose a lot of abstraction and come back to a very pedestrian level.

I suppose you seek to improve on the expressivity of existing languages. But where exactly would those improvements be? You seem to assume that you would obtain them by allowing a richer, extensible syntax, but couldn't you describe them at a semantic level, using a simple, fixed syntax? Lambda-calculus -- with its fixed and most basic syntax -- is already able to describe most of the semantic effects of existing languages: self-reference, implicit quantification, context dependency, etc.; the kind of things most related to the "accurate, succinct and natural communication of actions" in our natural languages. See Montague grammars and related topics.

So that would be my question: imagine I am most unimaginative regarding syntax changes; could you convey your ideas for new programming constructs to me solely by means of semantic descriptions in a fixed-syntax language?

Syntax-centered

A college project of mine was to write a library to manipulate sparse arrays in circa-1970 Fortran. It annoyed me that the library calls could not be expressed in the same syntax as standard Fortran arrays. Similar scenarios annoyed me throughout my career: the syntax for libraries could not be altered to match their semantics.
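
Not Fortran, but the same itch in a small Python sketch (all names invented): a sparse matrix kept in a dictionary would normally force call syntax such as sget(m, i, j), while dense arrays use subscripts; a language hook that lets the library define the indexing syntax (here __getitem__/__setitem__) removes the mismatch.

class Sparse:
    # dictionary-backed sparse matrix; unset cells read as zero
    def __init__(self):
        self.cells = {}
    def __setitem__(self, key, value):
        self.cells[key] = value
    def __getitem__(self, key):
        return self.cells.get(key, 0)

m = Sparse()
m[2, 7] = 3.5
print(m[2, 7], m[0, 0])   # 3.5 0 -- the same subscript syntax a dense array uses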

Over the years, extensions, including macro front-ends, have helped programmers make syntax match semantics. However, Thomas A. Standish pointed out that metaphrase extension -- using macros to extend syntax -- is not a panacea for language extension. See: Thomas A. Standish, "Extensibility in Programming Language Design," Proceedings of the 1975 National Computer Conference (AFIPS), p. 287, 1975.

An extensible syntax that avoids metaphrase layering may simplify the process of matching syntax and semantics, which IMO would be a good thing.

AST

The Ameba symbol table uses folded strings for symbols, but that is not a replacement for an AST. An AST can still be developed and used effectively. The folded symbol table and an AST both hold data that some processes could use to do the same thing, but the symbol table cannot replace an AST.

Ameba is a very simple language. Like a calculator or machine language, it reads an op-code and evaluates it. When processing source it acts similarly: as soon as a symbol is recognized, it is converted into an op-code and evaluated. Thus, an AST is not necessary to process basic Ameba syntax. On the other hand, an AST would be very handy for metaprocessing such as optimization and translation.
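
As a rough analogy only (this is not Ameba's code), an RPN calculator in Python has the same shape: each symbol maps straight to a routine that runs the moment the symbol is recognized, and nothing resembling an AST is ever built.

def rpn(src):
    ops = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}
    stack = []
    for tok in src.split():
        if tok in ops:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[tok](a, b))   # evaluated as soon as recognized
        else:
            stack.append(float(tok))
    return stack.pop()

print(rpn("3 4 + 2 *"))   # 14.0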

To be completely extensible

To be completely extensible requires that there be no immutable punctuation, no reserved keywords, and no inflexible order.

I'm not sure if this is really possible. You'll still need some syntax to define things which will require keywords/symbols. You could allow overloading on those keywords, but I'm not sure that is wise.

All your discussions have been about making something uberflexible, as though that alone is enough to make it better. While I generally agree that programming languages would benefit from more natural language features, I have to question the benefit of going all the way in that regard. At some point, you're going to be gaining little practical value in exchange for making the language more difficult to implement and grok.

Extensible syntax is

Extensible syntax is possible, even under the above definition. Christiansen grammars (attribute grammars where one attribute is the grammar) would support the level of flexibility described here. More precisely: your 'language' would be described by the abstract syntax tree rather than by the syntax, and your initial grammar would describe a default syntax that will generate a tree of the target format - including a feature to tweak the grammar-attribute. Any extensions to or mutations of the syntax would still reduce to a common AST-type, but you could control syntax (eliminate keywords, for example) in order to support programming disciplines and protocols.
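
A toy illustration in Python, far cruder than a real Christiansen grammar and with every name invented: the "grammar" is ordinary data threaded through the parse, and one built-in form (defrule) mutates it, so text later in the same input is read under the extended rules.

import re

def tokenize(src):
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)

def parse(tokens, grammar):
    out, i = [], 0
    while i < len(tokens):
        head = tokens[i]
        if head == "defrule":
            # defrule NAME body... ;  -- extend the grammar in mid-parse
            end = tokens.index(";", i)
            grammar[tokens[i + 1]] = tokens[i + 2:end]
            i = end + 1
        elif head in grammar:
            out.append(("expand", head, grammar[head]))
            i += 1
        else:
            out.append(("word", head))
            i += 1
    return out

grammar = {}   # the grammar attribute: starts empty, with no reserved words
print(parse(tokenize("defrule swap dup rot ; swap x"), grammar))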

The real difficulty is making the extensible syntax modular, such that syntax extensions can be provided by libraries and syntaxes from multiple libraries combine in a sane manner within a single program. Namespaces are problematic enough for multiple libraries, and dealing with overlapping names is a relatively simple problem compared to coordinating overlapping grammar extensions.

I had my heart set on fully extensible syntax for several years, but I'm less convinced today. I am concerned about complexity, difficulty of implementation, clarity of error reporting, and potential for confusion among developers. The final argument that changed my mind was Gilad Bracha's 'ban on imports', which discourages named dependencies between libraries.

I've decided to favor a much weaker variation of extensible syntax: the ability to name a basic parser (really, a stage-0 interpreter) at the top of a 'page' of code. This is sufficient for versioning the language syntax and providing alternative syntaxes for data description, DSLs, and so on. These stage-0 interpreters must produce an AST of a common type, and strictly keeping the dependencies shallow (a module depends on multiple interpreters, and a project composes multiple modules) also satisfies Gilad Bracha's arguments for reusable code and avoiding dependency hell. This is more a plug-in extensible language than true syntax extension.
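
A minimal Python sketch of what I mean (all names invented; the header line is reminiscent of Racket's #lang): the first line of a page names a stage-0 parser, and every parser must hand back the same AST type, so the rest of the toolchain never sees concrete syntax.

def parse_words(body):
    # one hypothetical stage-0 interpreter: whitespace-separated words
    return ("ast", [("word", w) for w in body.split()])

def parse_table(body):
    # another: a comma-separated data-description syntax, same AST type out
    rows = [line.split(",") for line in body.splitlines() if line.strip()]
    return ("ast", [("row", r) for r in rows])

STAGE0 = {"words": parse_words, "table": parse_table}

def load_page(text):
    header, _, body = text.partition("\n")   # e.g. "#lang table"
    return STAGE0[header.split()[-1]](body)

print(load_page("#lang table\n1,2\n3,4"))
print(load_page("#lang words\nfoo bar baz"))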

Extensibility Morass

Yes, the computer industry is locked into doing business without syntax extensibility. If techniques to extend languages had been discovered in the '50s, computing tools and artifacts would probably be very different.

The extensibility solution I developed does not need an adaptive grammar. The default syntax allows all possible variations of subroutine call, which alleviates the need to modify the default syntax. In a program, every character and token is processed by a routine; Ameba calls a routine for every character and token, including, but not limited to, do, while, if, then, else, and fi.

Machine language also works this way: each op-code calls a routine. Ameba extends the idea of an op-code to include characters, tokens, and expressions, which are the language. A program is a file of characters; characters are collected into tokens, and tokens are collected into expressions. Ameba is expression-based, so it has no statements.

A subroutine that processes its own syntax may need to read from a file to get the data necessary to process its arguments. Such a subroutine must be called before its arguments and their syntax can be known. Ameba assumes this scenario.

Based on this assumption, a library of extensible code cannot be distributed as we currently distribute libraries. On the other hand, libraries written in existing languages can be used by an extensible language interpreter, because the interpreter for the extensible language must be written in an existing language, such as C.

I hope you do not give up on an extensible language. IMO the difficulty is not the technology; rather, it is thinking outside the box to understand it and find simpler solutions.

Many years ago, during the punch card era, I noticed the evolution of card readers. They began as rivals to Rube Goldberg machines and became simpler and simpler. As the era closed, a card reader was trivially simple: a tilted table to feed the cards, a read station, and a bin to catch the read cards. New technology is unnecessarily complex and evolves to become simpler; it is a recurring pattern.

If my body allows, you will be able to test drive Ameba in the not too distant future. Version 0.1 will run on Linux.

I had my heart set on fully

I had my heart set on fully extensible syntax for several years, but I'm less convinced today.

Agree 100% for all the reasons you listed. Once you support prefix, infix and postfix operators, you've covered probably 99.9% of what anyone would want to express in a language.

People aren't generally interested in the syntax, IMO; as long as the language is sufficiently expressive, they are more interested in altering the semantics of embedded constructs via safe metaprogramming of some sort. Safe stage-0 interpreters, macros, and so on are what's needed. We should be able to transform a program statically, a la Active Libraries, in a safe and modular way.

Not generally interested in syntax

It does seem that programmers are not really interested in syntax. On the other hand, mathematicians and authors write very precise syntax and occasionally invent a syntax nuance.

Thanks for your compliment.

On the other hand,

On the other hand, mathematicians and authors write very precise syntax and occasionally invent a syntax nuance.

I think the convention is to use a symbol only to make an important semantic distinction. They wouldn't have adopted the integral symbol, or gradients, etc. if they were not semantically distinct from every other mathematical operation at the time.

In programming, our convention is to invent names for semantic distinctions. IMO, we need better ways to make semantics extensible and composable, not syntax. Syntax is just window dressing for understanding a program; any syntax that makes the semantics clear would suffice.

Certainly some constructs are clearer with a new syntax, but that seems the exception, not the rule. I'd like to see examples of programs where prefix/infix/postfix operators were insufficient before accepting that arbitrarily extensible syntax a la Katahdin is truly necessary.

Not that sure..

> Once you support prefix, infix and postfix operators, you've covered probably 99.9% of what anyone would want to express in a language.

I'm not that sure. Most programmers who know several programming languages regret that construct 'foo' is not available in the language they have to use, and those constructs are not necessarily doable with infix, prefix, and postfix operators.

My own language constructs that I tend to miss in 'normal' programming languages are:
- embedded evaluation/variables, i.e., print("this is a text ${foo}\n");
- enums with string representations:
enum type_enum { foo, bar, baz };
type_enum v = foo;
print(v); // prints foo, not 0

Sure, you can do print("this is a text %s\n", foo) or use the C++ style and create classes for the enum, but these are workarounds; from a syntactic point of view they don't look very pretty.
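
In languages that do have these constructs they need no user-level syntax extension at all; Python's f-strings and Enum, for example, give exactly the two behaviors above:

from enum import Enum

class TypeEnum(Enum):
    foo = 0
    bar = 1
    baz = 2

v = TypeEnum.foo
print(f"this is a text {v.name}")   # embedded evaluation; prints "... foo", not 0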

- embedded

- embedded evaluation/variables, i.e., print("this is a text ${foo}\n");

Right, but casting this as a syntax issue is questionable. It's more like a dependent typing issue, or you can address it by adding more structure to the control string.

You can probably solve it with syntax too, since string delimiters cause the quotation issues in the first place. If you could come up with some unambiguous way to specify a string without delimiters, or with some set of delimiters that allow splicing operations, that could work, but it's starting to look like Danvy's solution.

In any case, solutions to printf certainly don't require syntax changes, or even syntax extensibility.
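
As a sketch of what a library-only solution can look like, here is a rough Python transcription of Danvy-style functional unparsing (the combinator names lit, num, cat, and sprintf are my own): the "format string" is built from ordinary functions, so there is no control string to parse and no syntax change involved.

def lit(s):
    return lambda k, acc: k(acc + s)

def num():
    return lambda k, acc: lambda x: k(acc + str(x))

def cat(*fs):
    # thread a continuation through the directives, right to left
    def run(k, acc):
        for f in reversed(fs):
            k = (lambda f, k: lambda a: f(k, a))(f, k)
        return k(acc)
    return run

def sprintf(fmt):
    return fmt(lambda s: s, "")

greet = cat(lit("this is a text "), num(), lit("\n"))
print(sprintf(greet)(42), end="")   # this is a text 42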

- enums with string representations:

This doesn't seem like a syntax issue either; it's just an overloaded "show" as in Haskell (ToString() in .NET/Java).

Your first example is a

Your first example is a perfect example of the general "quotation" mechanism: a closed program fragment written in a completely different syntax -- translated after the fact into the syntax of the host language -- that may itself include "antiquotations", that is, fragments of the original host language. A preprocessing phase can expand the quotations away into the plain language syntax.

Quotations account for most uses of domain-specific syntax in a given language (think specific XML syntax, SQL-like embedded syntax, and possibly even printf-like format strings), and they are easy to delimit and naturally generic (just add an annotation to specify which specific parser is to be used for the quotation). It is therefore easy to include a general quotation syntax in your language from the start, and doing so does not require full syntax extensibility.
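
A crude Python sketch of such a preprocessing phase (the sql{...} quotation form, the ${...} antiquotation marker, and run_sql are all invented, and the toy regex assumes one quotation per statement): the quoted fragment goes to a fragment-specific translator, and the antiquotations are re-spliced as host-language expressions before the host compiler ever runs.

import re

def expand_quotations(src, translators):
    def repl(match):
        lang, body = match.group(1), match.group(2)
        return translators[lang](body)
    return re.sub(r"(\w+)\{(.*)\}", repl, src)

def sql_translator(body):
    # split out ${name} antiquotations and re-splice them as host expressions
    parts = re.split(r"\$\{(\w+)\}", body)
    pieces = [repr(p) if i % 2 == 0 else "str(" + p + ")"
              for i, p in enumerate(parts)]
    return "run_sql(" + " + ".join(pieces) + ")"

line = "rows = sql{SELECT name FROM users WHERE id = ${uid}}"
print(expand_quotations(line, {"sql": sql_translator}))
# rows = run_sql('SELECT name FROM users WHERE id = ' + str(uid) + '')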

Your second example can also be accounted for by a common case of metaprogramming: annotating language constructs with code producers. In your example, you overload a supposedly existing language feature (enum types) with a specific behavior (a direct string representation). You could formulate it as a generator that, given the AST of an enum type definition, would implicitly produce additional toplevel definitions such as a printing function (say, string_of_type_enum), so that you could use `print(string_of_type_enum v)` in your code.

Such generators are also quite easy to plan for in the initial syntax of your language: devise a generic generator syntax for some of the syntactic constructs (a generator for type declarations and a generator for value definitions may have slightly different specifications of what kind of additional code they may produce), and let other people plug in their generators as a preprocessing phase.
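
In Python a decorator can stand in for such a generator (derive_show and the emitted function name are invented; a real system would run this as a preprocessing phase rather than at runtime): given the enum definition, it emits a module-level printing function playing the role of string_of_type_enum.

import sys
from enum import Enum

def derive_show(cls):
    # emit a module-level printing function for the decorated enum
    name = "string_of_" + cls.__name__.lower()
    setattr(sys.modules[cls.__module__], name, lambda v: v.name)
    return cls

@derive_show
class TypeEnum(Enum):
    foo = 0
    bar = 1
    baz = 2

print(string_of_typeenum(TypeEnum.foo))   # prints "foo", not 0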

In general, I think that it would be fruitful to isolate such uses of syntax flexibility, and try to add for them a fixed, but flexible and plugin-ready syntax. Those use cases are generally much easier to handle than general syntax extensibility (changing syntax rules of an already existing syntactic class, etc.), and much more well-behaved. Comments might ultimately be one instance of this global idea as well.

Extensible

It certainly is possible. It's done in Felix. The whole grammar is dynamically loaded from a bootstrap. There are no keywords. A few symbols are fixed as punctuation marks, such as parens, asterisks, and commas, but these have no semantics. Felix also fixes the lexicology of some core literals (although that's not necessary except for some integers).

The parser is Dypgen, an extensible GLR parser with many extra features. The action codes for the reductions are arbitrary Scheme code evaluating to S-expressions (which are converted into Felix's native terms, which are OCaml variants).

See Katahdin.

uberflexible

To eliminate keywords is rather easy. Consider the case of machine language and its character-string form, assembler language. Machine language is defined by op-codes and addresses. Before ROM, I often entered short programs numerically to load a single punch card that bootstrapped a better loader, which in turn loaded an operating system.

Assembler language is much easier to use than machine language because of its mnemonics for op-codes and addresses. Assembling a program produces machine code. A disassembler may output an alias for an op-code or address, for example changing an LDA (load A from memory) into an MMA (move memory to A). As long as both mnemonics are valid in the assembler, the disassembled code can be reassembled.

If we tie the semantics of a language to op-codes, byte-codes, or 4-byte-wide codes instead of to mnemonics, we can vary the mnemonics with no ill effects. Moreover, people handle similar scenarios in their communications with ease -- synonyms, nicknames, and aliases are common. In fact, lambda expressions routinely rename arguments, and I doubt you are complaining about that practice.
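
A toy assembler and disassembler in Python (the op-codes and mnemonics are made up) show the aliasing: LDA and MMA denote the same op-code, so code disassembled under one spelling reassembles from the other with identical meaning.

OPCODES = {"LDA": 0x10, "MMA": 0x10, "STA": 0x20, "HLT": 0xFF}
PREFERRED = {0x10: "MMA", 0x20: "STA", 0xFF: "HLT"}   # the disassembler's spelling

def assemble(mnemonics):
    return [OPCODES[m] for m in mnemonics]

def disassemble(codes):
    return [PREFERRED[c] for c in codes]

codes = assemble(["LDA", "STA", "HLT"])
print(disassemble(codes))                      # ['MMA', 'STA', 'HLT']
print(assemble(disassemble(codes)) == codes)   # True: the alias round-trips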

Does this perspective help assuage some of your concerns?

Does this perspective help assuage some of your concerns?

Not at all.

Those mnemonics are the language. Even if you're making some sort of meta-language definition, you need some syntax to use; otherwise you end up with people writing something like cil by hand alongside BNF.

Katahdin

Nice work by chrisseaton.

Using a PEG to build a packrat parser is a job for a compiler writer. IMO a simpler, more natural method of extending a language would be better, so that a larger percentage of programmers can participate.

mnemonics are the language

I agree that mnemonics are the language. On the other hand, the semantics associated with natural language are inherent in each person using any language. To some extent, thoughts are independent of language. Although some people think verbally, others, like myself, translate thoughts into words, because their primary mode of thinking is not verbal.

Try it

Unfortunately, Ameba 0.1 is about a year from being usable. I believe it will allow you to do the things you want and need to do... time will tell.

Ameba does not allow programmers to use memory addresses. It hides addresses for two reasons. First, Ameba manages memory allocation, deallocation, and garbage collection. Second, fast pointers are difficult and dangerous, and smart pointers are not fast; smart pointers defeat the main reason for using pointers, high-speed performance.

There are many ways to dig a hole,

There are many ways to dig a hole, depending on what you are digging, the size of the hole, and other factors. If you want to dig a tunnel through a granite mountain to shorten your trip to work from 40 miles to 4, and the only tool you have is a shovel, you are out of luck. You will have to drive 40 miles. Diggers need a variety of tools, and programmers do too. Extensible syntax is a tool.