Looking for a language like...

Prior art:

A few years ago I programmed on an HP Non-Stop that had a scripting metalanguage named TACL. It was interesting because its ROUTINES can get their arguments, rather than being passed arguments. Using #ARGUMENT, a ROUTINE can fetch, character by character, all of the characters following the call, up to the end of the ROUTINE call scope, i.e.:

    [<RoutineName> <argument characters>]

For example:

    [MyPseudoCommentRoutine takes a comment as argument characters, counts consonants and vowels, but returns nothing.]

Getting arguments is an unusual language feature in my experience. A ROUTINE can parse its character-string argument any way the programmer wants. Technically that string is passed as an argument, and the TACL interpreter probably does exactly that, but I am not privy to the TACL source code. This idea sparked my imagination: I thought about how TACL ROUTINES actually work, and I came up with an alternative that seems more extensible.

*Hope I can explain this well enough. I'm not good with words, and I'm dyslexic. I wouldn't last a nanosecond as a proofreader.*

My idea:

*Like most bright ideas, 99.999% of the time it has been documented.*

Another language might implement a similar Routine without requiring brackets that surround the call and its arguments. The Routine is passed one thing from its caller: a return address, and nothing else. Using the return address, the Routine can scan through a buffer containing the source program to find its arguments. Of course, arguments are typically near the location of a Routine call.

The reason for eliminating syntax that brackets a Routine call is to allow Routine definitions to be the universal method of defining operators, methods, functions, etc.

Moreover, since defining a Routine does not need a prototype to describe its argument list, only one define syntax is required, such as the following:

Define "symbol-name" = "symbol-definition".

The reason this language interests me is that its parser is distributed among all its Routines. Once a Routine is called, it can parse its arguments however the programmer desires. Of course, a parsing free-for-all is probably a bad idea, and I'd expect a library parser that most people would use. The library could provide call prototypes. The exception is a Routine whose argument syntax is not context free, which may be too complicated to describe with a call prototype, for example if keystrokes affect the argument syntax.
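
To make this concrete, here is a minimal sketch in Python of how such an interpreter loop and a self-parsing Routine might interact. No such language exists, so the names (ROUTINES, run, square) are invented purely for illustration; the point is only that a Routine receives the position just past its call, the analog of a return address, and scans the source buffer for its own arguments.

    # Hypothetical sketch: each routine gets only the source text and the
    # position just past its own name, and consumes its own arguments.
    ROUTINES = {}

    def routine(name):
        """Register a routine under a symbol name."""
        def wrap(fn):
            ROUTINES[name] = fn
            return fn
        return wrap

    @routine("square")
    def square(src, pos):
        # This routine's private "parser": skip blanks, then read one integer.
        while pos < len(src) and src[pos].isspace():
            pos += 1
        start = pos
        while pos < len(src) and src[pos].isdigit():
            pos += 1
        value = int(src[start:pos])
        return value * value, pos          # result, plus where scanning stopped

    def run(src):
        """Toy interpreter loop: read a routine name, hand it the current position."""
        pos, results = 0, []
        while pos < len(src):
            if src[pos].isspace():
                pos += 1
                continue
            start = pos
            while pos < len(src) and not src[pos].isspace():
                pos += 1
            name = src[start:pos]
            value, pos = ROUTINES[name](src, pos)   # routine consumes its own arguments
            results.append(value)
        return results

    print(run("square 7 square 12"))   # -> [49, 144]

Since no Routine needs a prototype, defining a new one is just another entry in the table, which is why a single define syntax is enough.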

Conclusion:

I've thought a lot about the features of a language based on this kind of Routine, and could write more. But, I'm not a good writer, and I think "My idea" is the kernel from which many other features of the language flow logically.

Of course, my logic is not infallible. However, I have spent years doing mind experiments about this language. I am convinced this language can exist, and that it will have features other languages I have used do not have. On the other hand, I am not a language pro, and haven't designed a complete language in my head, maybe half.

I have searched the Internet for a language like the one described, and read a lot of reference manuals, but cannot find one example of it. I'd like to download it and try it out. If such a language does not exist, then why?

I am not an expert, and fear that I am doing a very poor job of presenting my idea. Thus, I fear no one understands what I am trying to say, especially if I, an average joe, have hit the 0.001%-probability jackpot and had a novel idea. Sooner or later some joe will do it.

My Bio:

I am not a good communicator. Mom said I didn't learn to talk until I was 3. Learning to speak late in childhood is characteristic of autism. I've not been diagnosed as autistic, but in 1947 I don't know if medicine knew about autism.

My thought processes are clear. I've made a living for 35+ years and retired. I am happily married. I don't owe anything to anybody, and have not been in jail. I graduated from the University of Texas at Austin with a Bachelor's in Electrical Engineering, specializing in computer engineering. There was no computer science undergraduate program at the school at that time.

I worked as a programmer using almost two dozen languages at the University and professionally, about half of them assembler. I worked on mainframes, a supercomputer, minis, and micros. I think Linux, Lisp and Emacs are awesome.

Please ask questions if my writing is unclear.


Macros

I think Linux, Lisp and Emacs are awesome.

If you're familiar with Lisp, you might want to consider how the feature you're asking about differs from Lisp (or Scheme) macros. There are some strong similarities, although Lisp macros are a bit more controlled.

Macros basically get access to the symbolic representation of the source code of the call to a macro. So, they don't get unfettered access to all the source surrounding the call, which doesn't give them quite the degree of flexibility you're suggesting. However, the tradeoff is an improved ability for both humans and programs to reason about the source of a program.

As you say, "a parsing free-for-all is probably a bad idea". There tend to be various consequences of such a free-for-all - one is that automated analysis of code tends to become more difficult. Such analysis includes the ability to compile code efficiently.

One reason you don't see many languages that do such things is that many people have experimented with such approaches and given them up as impractical. One idea that comes up often for people designing or implementing languages is the Fexpr, which was used in early Lisps to achieve macro functionality. Most people who try these approaches abandon them soon enough, although there are a few exceptions. One language that might be worth looking at is Kernel.

Lisp macros, and their cousins Scheme macros, provide a way to achieve something along these lines, offering syntax customization (in an S-expression context) in a way that has proved reasonably manageable and, in the Scheme case at least, quite amenable to static analysis of code.

There are some papers on this subject in the Readscheme.org list of macro papers.

Thanks Anton

I will read the things you have suggested, and comment about them later.

tyvm

Reply to Anton

I was already familiar with Lisp macros and fexprs, but I reread their descriptions. Kernel is new to me, and I have not read the entire document. It appears to be a Lisp-based language, similar to Scheme. Lisp macros and fexprs are similar in some ways to TACL ROUTINES, and are part of what makes Lisp awesome. Kernel, being Lisp-based, has a lot going for it even before its specific nuances. But they don't answer my question.

I am not quite sure what you mean by, "... many people have experimented with such approaches and given them up as impractical." Perhaps you mean what is described in http://en.wikipedia.org/wiki/Extensible_language in the paragraph titled "Death of the historical movement." Your recommendation to look at Lisp macros is aligned with the description in that article of the "Modern movement."

The article does not seem to describe what I'm looking for, which I'd characterize as related to writing a new language or changing an existing one. Writing a new routine with its own argument parsing is similar to writing a new language, and changing a Routine's syntax is similar to changing a language. The difference is the amount of effort involved. Writing or modifying a language is a big project. Writing or changing a routine is not.

The parser in each routine may be recursive descent. All one needs are calculations, conditionals, and loops to make a parser for some arguments. "Multiple macro shells" as described in the Wikipedia article are not necessary. And the "Death of the historical movement" has not stopped people from making new languages or changing ones that already exist.

Maybe it's just late but...

How do you propose that the language determine that source code input is even well formed? I mean, since your routines essentially determine the grammar for the language, you're going to be stuck just trying to run the thing to see if it even parses correctly. And even then you're not going to be certain of it, since you might not have traversed the path that could lead to the error...

Continuing

Perhaps I should add, that even the lexer in my hypothetical language is distributed. It is distributed among the definitions of characters in the character set used by that language.

In other words, each character is a symbolic name whose definition is a Routine that scans lexemes. For example, the main interpreter loop reads an input character, let's say the digit 5, and calls the Routine named 5. The 5 Routine knows to return a number, so it scans forward in the call line looking for additional digits, a decimal point, an E or e, etc. It calculates the value of either an integer or a floating point number, and creates an entry in the symbol table for the lexeme.
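
For illustration, here is a rough Python sketch of that distributed lexer. Nothing like this exists yet, so every name here is made up; it only shows the shape of the idea: the main loop reads one character, and that character's Routine scans the rest of the lexeme.

    # Hypothetical sketch of a distributed lexer: each character maps to a
    # Routine; the digit Routine scans the whole numeric literal itself.
    CHAR_ROUTINES = {}
    SYMBOL_TABLE = []

    def digit_routine(src, pos):
        """Called when the loop reads a digit: scan the full numeric literal."""
        start = pos
        pos += 1
        while pos < len(src) and src[pos].isdigit():
            pos += 1
        if pos < len(src) and src[pos] == '.':           # fractional part
            pos += 1
            while pos < len(src) and src[pos].isdigit():
                pos += 1
        if pos < len(src) and src[pos] in 'Ee':           # exponent part
            pos += 1
            if pos < len(src) and src[pos] in '+-':
                pos += 1
            while pos < len(src) and src[pos].isdigit():
                pos += 1
        text = src[start:pos]
        value = float(text) if any(c in text for c in '.Ee') else int(text)
        SYMBOL_TABLE.append(('number', value))
        return pos

    for d in '0123456789':
        CHAR_ROUTINES[d] = digit_routine

    def lex(src):
        pos = 0
        while pos < len(src):
            if src[pos].isspace():
                pos += 1
            else:
                pos = CHAR_ROUTINES[src[pos]](src, pos)   # the character lexes itself
        return SYMBOL_TABLE

    print(lex("5 512 3.14 6.02E23"))
    # -> [('number', 5), ('number', 512), ('number', 3.14), ('number', 6.02e+23)]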

Both the lexer and parser are distributed among routines.

Io

The Io programming language allows exactly this. Any method/block can access the raw, unevaluated message passed to it at runtime, manipulate it, and evaluate it as desired. Io goes a step further than TACL, though, in that the message is an object with a rich structure, not simply a flat string.

Early flavors of Smalltalk did what you describe

... though I don't know where you'll find documentation on them. It's Smalltalk-72 you want, and maybe Smalltalk-76 as well. You should also look at Forth.

Reply John Nowak

Io looks great. I worked on an HP Non-Stop fault-tolerant multiprocessor system for several years. Io for it could have made using multiple processors easier.

However, maybe my quest is more about how a language is implemented than what syntax and semantics it has.

The Io guide doesn't say anything about how literals are processed. It has a context-free grammar to describe the language syntax, but does not give any information about adding another kind of literal, for example an arbitrarily long number to represent pi to 500 decimal places. Maybe the add-on manual will help.

Each character in my language is a first-class object, so overloading the built-in definitions for the digits to add an arbitrarily long number literal could be done without an add-on. Of course, one would need to overload other things as well, including the operators, or they could not do anything with an arbitrarily long number.
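
As a follow-on to the lexer sketch earlier in the thread (names still invented, and Python's Decimal standing in for a hypothetical big-number type), overloading the digit definitions amounts to replacing their table entries:

    # Hypothetical sketch: overload the digit Routines so numeric literals
    # become arbitrary-precision Decimals instead of machine floats.
    from decimal import Decimal, getcontext

    getcontext().prec = 500          # later arithmetic keeps 500 significant digits

    CHAR_ROUTINES = {}               # character -> lexing Routine
    SYMBOL_TABLE = []

    def long_digit_routine(src, pos):
        """Overloaded digit Routine: scan digits and one decimal point, build a Decimal."""
        start = pos
        while pos < len(src) and (src[pos].isdigit() or src[pos] == '.'):
            pos += 1
        SYMBOL_TABLE.append(('bignum', Decimal(src[start:pos])))
        return pos

    # "Overloading" the built-in digit definitions is just replacing table entries.
    for d in '0123456789':
        CHAR_ROUTINES[d] = long_digit_routine

    src = "1" * 60 + ".25"           # a literal far too long for a machine word
    pos = 0
    while pos < len(src):
        pos = CHAR_ROUTINES[src[pos]](src, pos)

    print(SYMBOL_TABLE[0][1])        # the full literal survives, no precision lost

As said above, the arithmetic operators would have to be overloaded in the same way before such a value could be used for anything.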

I am looking for something closer to a Turing machine. Maybe a calculator that accesses memory one character at a time, with recursion and conditionals. It really doesn't matter whether it uses infix, Polish, or reverse Polish syntax. I think reverse Polish is supposed to be simplest, although I'm not fond of it. Features of a high-level language can be achieved through extension.

I know that extension is supposed to be impractical, but using this method, I'm sure it can be done.

Forth and Factor

I second John Cowan above. You should definitely take a look at Forth, and I would add Factor as well. I think you're likely to find some variant of Forth very sympathetic.

S3 2008 - Piumarta & Warth - "Open, Extensible Object Models"

They show, among other things, how to add "multiple inheritance" to prototype-based languages.

See also papers on vpri.org website.

Warth's Ph.D. thesis also investigates syntax-directed compilation, as it is based on Meta-II [Schorre, 1964]. The system he creates is called OMeta. A similar system, called Katahdin, was created by Chris Seaton.

A side note about Meta-II: there were many variants of this idea. Val Schorre originally pitched it with a couple of selling points, one of which was the complete elimination of the goto construct. Schorre wanted to show, with good examples of hard problems, how eliminating goto improved modularity and made programs shorter.

Eventually, syntax-directed compilers in the Meta-II lineage got killed off by the U.S. military when they marked the projects as classified IP.

Postscript: I am not sure why you are making each character in your language a first-class object, since I think what you want to do is what object-oriented PEGs like OMeta do: reify the PEG as the core control flow construct in the language. [Edit: Actually, I think I get it now. You are trying to build an imperative adaptive grammar [at least, according to your LtU profile, you are]. To make the grammar fully adaptive you have to start with the smallest indivisible unit possible for your parser framework, which is the character. Even so, I don't see why you are performing the overload at the DIG(IT) level. I would separate the type a non-sigiled number falls into from its parsing. e.g. the cases are the combinatoric possibilities between something like 1 and 11111111111111111111111111111111111111111111111111111111111 and 1L and 11111111111111111111111111111111111111111111111111111111111L where L denotes "Long" data type constructor. If the second Long literal doesn't fit in the target language's native definition of Long, then there is a specification error.

In Visual Studio 2008, the IDE will do the following...
for:

long l = 11111111111111111111111111111111111111111111111111111111111L;

result:

    Error 1: Cannot implicitly convert type 'ulong' to 'long'.
             An explicit conversion exists (are you missing a cast?)
             File: C:\Users\jzabroski\AppClass\Program.cs   Line: 25   Column: 22   Project: AppClass

    Error 2: Integral constant is too large
             File: C:\Users\jzabroski\AppClass\Program.cs   Line: 25   Column: 22   Project: AppClass

]

Reply Z-Bo

I took a quick look at Katahdin, and I think it's definitely in line with what I expected. I will closely examine the documentation.

Yes, "To make the grammar fully adaptive you have to start with the smallest indivisible unit possible for your parser framework, which is the character." I do not fully understanding your counterpoint.

Overloading a character is one way to change the lexer, whether to make a new comment syntax, add a literal type, or change the characters allowed in names. It is consistent with the overloading of any Routine, and for that matter with overloading any symbol definition in my hypothetical language.

Read macros

This looks a little like Common Lisp reader macros. In reader macros, you can set up a character (or pair of characters) to make the reader call a custom function. That function must read chars one at a time and construct an arbitrary object. This happens in a phase prior to macro expansion so it could in principle be used to implement something similar to the TACL routines.

That being said, by getting rid of the brackets and not having the operator in prefix position, you lose the little structure you had. You need a more complicated mechanism (like a context-free grammar) to let the operator control the lexing and parsing of the operands. If that grammar is distributed among the operators, then you have to start combining and composing them. When there are only a couple of operators of the same type this isn't so hard. For the infix arithmetic operators in most programming languages, you only need a precedence order and parentheses to disambiguate. In the general case this could be a lot more difficult...
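
For what it's worth, here is a small Python sketch (names invented) of the precedence-order approach for the infix case: each operator contributes only an entry in a shared table, and one generic precedence-climbing routine combines them.

    # Sketch: distributed operator definitions combined by a shared precedence table.
    OPERATORS = {                     # symbol -> (precedence, implementation)
        '+': (1, lambda a, b: a + b),
        '-': (1, lambda a, b: a - b),
        '*': (2, lambda a, b: a * b),
        '/': (2, lambda a, b: a / b),
    }

    def parse_expr(tokens, pos=0, min_prec=1):
        """Precedence climbing over a flat token list; returns (value, next position)."""
        value, pos = int(tokens[pos]), pos + 1        # operands are plain integers here
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            prec, fn = OPERATORS[tokens[pos]]
            if prec < min_prec:
                break
            rhs, pos = parse_expr(tokens, pos + 1, prec + 1)
            value = fn(value, rhs)
        return value, pos

    print(parse_expr("3 + 4 * 5 - 6".split()))   # -> (17, 7)

Parentheses and anything beyond binary left-associative operators would need more machinery, which is exactly where the general case gets difficult.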

Reply Johnwcowan

Having trouble finding the Smalltalk-72 manual. Thanks.

Link to Smalltalk-72 Manual

You can find a copy of the "Smalltalk-72 Instruction Manual" here:

http://212.219.56.146/sites/www.bitsavers.org/pdf/xerox/parc/techReports/Smalltalk-72_Instruction_Manual_Mar76.pdf

While I've not found a Smalltalk-76 manual online, there is a paper by Dan Ingalls that may be of interest:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.106.2641&rep=rep1&type=pdf

Reply Matt

Will check out Factor, have already looked at Forth. Thanks

Reply Andres

Will look at reader macros, most of my Lisp has been for Emacs. Thanks

Scheme Macros

Well, in that case you should definitely play with Scheme's syntax-rules and syntax-case.

Thanks

I'll look at it.

Reply Z-Bo Continued

Perhaps the word overload has too loaded a definition for my use of it as "overloading a character Routine." Polymorphic overloading is controlled by argument types, but in the context of my use no argument type is known before calling one or the other. In fact, the argument about to be scanned determines whether the overload or the original definition will work. There is no option but to try one, and if an error occurs, try the other.
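
A small Python sketch (hypothetical names) of that try-one-then-the-other dispatch: the overloaded character Routine is tried first, and on a scan error the original definition is retried from the same position.

    # Sketch: ordered fallback between an overloaded and an original definition.
    def dispatch(definitions, src, pos):
        """Try each definition for a character in order; fall back on failure."""
        for routine in definitions:               # e.g. [overload, original]
            try:
                return routine(src, pos)          # success: (value, new position)
            except SyntaxError:
                continue                          # pos is unchanged, so the retry is a rewind
        raise SyntaxError("no definition of this character accepts the argument text")

    def overload(src, pos):
        """Overloaded digit Routine: only accepts hex literals like 0x1f."""
        if not src.startswith("0x", pos):
            raise SyntaxError("not a hex literal")
        return int(src[pos:pos + 4], 16), pos + 4

    def original(src, pos):
        """Original digit Routine: a single decimal digit."""
        return int(src[pos]), pos + 1

    print(dispatch([overload, original], "0x1f", 0))   # -> (31, 4)
    print(dispatch([overload, original], "7", 0))      # -> (7, 1)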

Reply with 'reply'?

I don't mean to be rude—I'm really interested in this discussion, and (despite the good arguments against it) am always most excited by languages offering the kind of flexibility you describe—but I wonder why you are starting lots of new threads titled “Reply to …”, rather than using the ‘reply’ button? It makes the conversation hard to read.

Reply to L Spice

Thanks, I didn't realize. Had not posted here before.

TeX, Racket, Perl 6 and graphical programming languages

In other words, each character is a symbolic name whose definition is a Routine that scans lexemes. For example, the main interpreter loop reads an input character, let's say the digit 5, and calls the Routine named 5. The 5 Routine knows to return a number, so it scans forward in the call line looking for additional digits, a decimal point, an E or e, etc. It calculates the value of either an integer or a floating point number, and creates an entry in the symbol table for the lexeme.

The TeX language (on which LaTeX is based) has similar features. TeX macros can use \catcode to change the lexer category of individual characters, and can take as many arguments as they wish, depending upon internal logic. This means that a macro can "overload" a character's meaning, and analyze the rest of the file, or part of it, if it wishes.

Indeed, the pgf/tikz extension for TeX defines a domain-specific language for drawing figures, and switching from "pure TeX" to "Tikz-enhanced TeX" is simply a matter of:

\begin{tikzpicture}
[pgf / tikz code here.]
\end{tikzpicture}

However, the fact that any macro can "eat" any number of characters makes the language extremely difficult to debug. When you incorporate some advanced TeX code within Tikz-style code, sometimes a Tikz macro will eat part of your code and evaluate it — but since it's only part, your code is broken and crashes. This means that code is not portable within the same program (moving a working piece of code results in it being broken).

I have tried to write a Scheme interpreter in TeX, which would allow you to embed pieces of Scheme code whose results would be included in the document, and I found it really difficult to code something of such complexity, due to the debugging problems explained above.

The problem is that you can't put a hard limit on what a macro will take as arguments (thus preventing it from eating the whole source), and you don't have meta-levels (actually you can use \expandafter for this purpose, but its usage is awkward), which would allow you to do:

\weirdmacro{___Here we're in meta-level one,
execute some stuff and insert the result just
in front of \weirdmacro before executing it___}

Racket, which was discussed here previously, offers something similar (Lisp reader macros, which Andres Navarro talked about earlier in this thread), and hopefully avoids the above problems, although I have not played with it enough to tell. See Creating Languages. They explain here (3rd paragraph) and here (code of dollar.rkt) how to extend the base language (Scheme) with infix operations between a pair of dollars. It is pushed further with Scribble, which defines a language for writing documentation, but which can still include pieces written in Scheme.

Also, I have heard that in Perl 6 the parser is accessible to the programmer, effectively enabling the programmer to redefine the language from within the program, bringing it close to what Katahdin does (thanks for the link, Z-Bo, by the way!).

Another solution to the lack of structure that derives from macros interpreting a variable amount of source is to use graphical programming languages: I'm currently working on a language where "functions" (graphical blocks) can control their look, and thus each function provides an ad-hoc user interface. The programmer then uses that interface to specify the function's parameters. This means that the look of the code is controlled by the code itself, but its structure remains. If you can read French, here is my report (PDF & LaTeX) describing the intent of that language (the code is an utter failure, but the report is OK).

P.S.: I hope my English is good enough; it's not my native language.

tyvm Georges

Your English is very good; unfortunately, I cannot read French.

The issues you describe are interesting consequences of a mutable programming language. I am optimistic they can be resolved adequately, because people are very adept at speaking a variety of mutable languages, including French and English.

Since we use a mutable metalanguage for everyday communication, a simpler mutable programming metalanguage seems to me a realizable goal. Using reader macros to extend a language is a method used by several languages, with advantages and disadvantages. Katahdin has blended languages using another method.

The distributed lexer and syntax analyzer technique seems less studied and documented, which piques my interest.

I have been trying to find a mutable metalanguage based on routines with parsers that get their own arguments. With such a language, it would be possible to study the consequences of parsing distributed among routines. This study could identify advantages and disadvantages of distributed parsing. But such a language does not seem to exist.

I don't know if I am able to do this work for two reasons: first, my innate capabilities, and second, time to do the work. I am not a young man.

At this time, I am trying to find a simple way to demonstrate (maybe prove) that parsing distributed among routines that get their own arguments can, in fact, make a mutable programming metalanguage.

REXX

REXX is a scripting and command language from about 1980, when the term scripting was not yet used. It is still actively used today on all major operating systems. What you are describing strongly reminds me of REXX, if one limits oneself to an entirely textual, as opposed to symbolic, domain. In REXX, both code and data are text, and a program can freely generate and change itself. A distinguishing property similar to what you say of TACL is that a procedure is passed a single argument, a string, from which it, by parsing that string, obtains as many pieces of information as it needs.
Here is a small and silly example: the program reads and computes simple postfix expressions. Each line is passed to a procedure where the expression is parsed and transformed to infix. Arg is a simple form of the much more general parse command. The data is ‘hidden’ as comment lines at an arbitrary place within the program.

    do i=2 by 1 while sourceline(i-1)\='/*'; end
    do i=i by 1 while sourceline(i)\='*/'; say op(sourceline(i)); end
    exit
    /*
    3 5 +
    3 5 *
    314 217 /
    */
    op: procedure
    arg x y o                 /* think of x,y,o as three arguments */
    interpret 'return x' o 'y'

Note that some REXX implementations interpret the program line by line, so the difference between ‘normal code’ and ‘data’ is blurred: one can mix them as he pleases, provided control flow is so organized that ‘data’ can only be executed indirectly.

tyvm Boyko

I played with REXX, maybe in the '80s, and it is an excellent scripting language, especially for its time. The HP Non-Stop(tm) TACL and Datapoint's Chainplus are in the same category.