Theory of syntax extensions: does it exist?

I've become interested in the question of syntax extensions: not the general theory of extending syntax, nor the mechanisms for extending syntax (there are many, from the C preprocessor to Racket's extended syntax-case), but some basis for deciding what syntaxes should be provided and how they ought to be organized. Most languages have fixed syntax, either because their creators took it for granted or as a matter of principle. Those that do allow syntax extension usually have cultural constraints against overusing it, and the extended syntaxes provided by the standard library are a rag-bag whose members depend on history or the creator's whims or both. We seem to be at the same state where syntax libraries are concerned that we were in in the 1960s for procedure libraries.

Is there state of the art about this that I haven't heard about yet? Please inform and/or enlighten me.


tuppence

In one of the two big extensible-language symposia, around 1970, someone made the point that extensible syntax means you can't make any sense out of a block of source code until and unless you know what syntax extensions have been done before it. My take on this was a paper I started in 1988/9 and finally finished nineteen or twenty years later: Well-behaved parsing of extensible-syntax languages. I found the results reached theoretically interesting, but my practical conclusion was that syntax should not be structurally extensible: I recommend an unambiguous syntax with extension limited to a choice of which syntax rules (a.k.a. "operators") are available in a given environment.

The problem persists

Even if your language has fixed syntax, which syntactic elements should it have? The answers remain AFAIK purely empirical.

Do you mean, what should the

Do you mean, what should the fixed syntax be, or which "operators" should you have?

For the first question, I know of no solution except to try something. Lisp did in fact try something, which has on one hand the advantage of not introducing arbitrary inflexible conventions, and on the other hand the disadvantage of not leaving much flexibility. Attempts to "improve" basic S-expression syntax have been imho not particularly successful. The ones that try to eliminate some of the parentheses typically go about it wrong: either they try to eliminate the parentheses on small expressions where, frankly, parentheses are most desirable, while leaving the parentheses in place on large expressions where parentheses are least appropriate; or else they introduce multiple parenthesis-like delimiters that make the syntax more complicated without offering any fundamental improvement. Then there are named keywords, which I disapprove of because they screw up the underlying structural conception of the language. My own first approximation that I'd like to try calls for multiline expressions using coordinating lines and indentation, and within-a-line expressions that require alternation between subexpressions and keywords. I can't really prove that it'll work except by implementing it, which, well, hasn't happened yet.

For the second question — which operators to have — that I see as just a souped-up version of the question of choosing names for identifiers.

I mean "syntax" in the Scheme sense

Not lexical syntax, but the syntax of constructions. Ordinary operators don't count, because they are just sugar for function calls. We know that if and lambda are pretty much requirements, but what should be additionally provided for ease of use, and how do we decide?

This is analogous to the question "What math functions should be provided?" Plus, minus, times, divide...

Sequence, alternation and repetition.

I thought the classic elements were well known as sequence, alternation, and repetition. In the modern context you might want to include parallelism as well, but it's not necessary to write programs, and some would say it's the compiler's job to extract parallelism from repetition.

unsure

I'll try to rephrase (alas, I often don't put things concretely enough).

  • To me, "syntax extension" means the programmer can choose to specify what syntax is used to call their device; for example, they might favor something of the style
    for x = y to z step w
      <block>
    end for
    

    or perhaps

    (for-loop x y z w <block>)
    

    (There's no accounting for taste.)

  • I would ordinarily take "syntax in the Scheme sense" to refer to choice of special forms. Of course I favor first-class operatives ("fexprs"), so that in Kernel you can't actually ask what special forms are needed because there are literally no special forms; however, there are primitive operatives in the ground environment. Some of those are merely the underlying operatives of primitive applicatives; for example, applicative +, which evaluates its operands in the current environment and adds the results, has an underlying operative that simply adds its operands without evaluating them first, throwing a dynamic type error if they aren't suitable for adding together. If you discount those underlying operatives, and just count the ones that are explicitly presented in the Kernel report as primitive operatives, there are indeed very few of them; only three in the language core ($if, $define!, and $vau). Of course, $lambda isn't one of them; it's a library feature, which can be constructed using other stuff (most centrally, $vau).

Minimal set of operators

needed for syntax constructions is:

  • choice operator
  • sequence operator
  • function definition operator
  • parameter application operator

Besides these, if we want a complete language, simple math, boolean, and string operators are also needed. "If" can be described in terms of functions.
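As a small illustration of that last claim, here is a sketch in Haskell (the names ifFn and factorial are mine, not part of any proposal here): under non-strict evaluation, "if" really can be an ordinary function, because the branch that is not selected is never evaluated.

ifFn :: Bool -> a -> a -> a
ifFn True  t _ = t
ifFn False _ e = e

-- Recursion through ifFn still terminates, because the unused
-- branch is left as an unevaluated thunk.
factorial :: Integer -> Integer
factorial n = ifFn (n == 0) 1 (n * factorial (n - 1))

main :: IO ()
main = print (factorial 5)  -- prints 120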

Depends

I don't remember everything, but AFAIK this is only true in a classical interpretation of math. I.e., if you take computational strategies into account, "if" needs to be a primitive when rewriting eagerly. (It also likely becomes more troublesome when an "if" checks preconditions to a function application.)
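A sketch of that caveat in Haskell (names are mine; seq is used only to simulate call-by-value forcing): once both branches are forced before the choice is made, the same trick diverges, which is why strict languages keep "if" as a special form.

ifStrict :: Bool -> a -> a -> a
ifStrict c t e = t `seq` e `seq` (if c then t else e)  -- force both branches

-- badFactorial never returns for any input: the recursive branch is
-- forced even when the condition is True, so the recursion never stops.
badFactorial :: Integer -> Integer
badFactorial n = ifStrict (n == 0) 1 (n * badFactorial (n - 1))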

"if" example

Here is what you can do with pattern matching:

// The following example shows how to construct Case function
// and If function in terms of Case function. Rudimentary
// Boolean set is created to assist the If function.
//
// no prefix denotes new symbol
// '*' prefix denotes type variable
// '@' prefix denotes existing symbol
// '|' denotes a choice
// ',' denotes a sequence
// 'a <= b' symbol definition; reads "define 'a' specifies from 'b'"
// 'a => b' symbol definition; reads "define 'a' specifies to 'b'"
// 'a <- b' parameters application; reads "'a' specifies from 'b'"
// 'a -> b' parameters application; reads "'a' specifies to 'b'"
//
// Note that there are no hardcoded symbols. All
// calculations are done with pattern matching. (^_^)

Test <= (
    Case <= (*condition => *result) |

    Boolean <= (True | False) |

    If <= (
        (
            Condition   <= @Boolean, 
            TrueResult  <= *var1,
            FalseResult <= *var2
        ) => (
            @Case <- (
                @Boolean.True  -> @var1 |
                @Boolean.False -> @var2
            )
        ) <- @If.Condition
    ) |

    @If <- (True, "yes", "no") // returns "yes"
)

Just to stay on the subject, the same set of operators can be used for constructing parsers and code translators.
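To make that last remark concrete, here is a minimal sketch in Haskell (not the notation above; all the names are mine) in which choice and sequence are the only two composition operators, and a small parser falls out of them:

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- sequence: run p, then feed the remaining input to q
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q = Parser $ \s -> do
  (x, s')  <- runParser p s
  (y, s'') <- runParser q s'
  Just ((x, y), s'')

-- choice: try p; if it fails, try q on the same input
orElse :: Parser a -> Parser a -> Parser a
orElse p q = Parser $ \s -> case runParser p s of
  Nothing -> runParser q s
  result  -> result

-- a primitive that matches one expected character
char :: Char -> Parser Char
char c = Parser $ \s -> case s of
  (x:rest) | x == c -> Just (x, rest)
  _                 -> Nothing

-- "ab" | "ac", written only with the two combinators
abOrAc :: Parser (Char, Char)
abOrAc = (char 'a' `andThen` char 'b') `orElse` (char 'a' `andThen` char 'c')

main :: IO ()
main = print (runParser abOrAc "ac")  -- Just (('a','c'),"")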

Not sure

about the state of the art, but here is a way of extending syntax. It's about functions. Suppose we have a simple math parser defined in the following way:

Sum <= (
    AddSub <= (Left <= (@Sum | @Null), In <= ('+' | '-'), Right <= @Fact) |
    Fact <= (
        MulDiv <= (Left <= @Fact, In <= ('*' | '/'), Right <= @Exp) |
        Exp <= (
            Group <= (Left <= '(', In <= @Sum, Right <= ')') |
            @Integer
        )
    )
)

If you want to extend it with, say, syntax for factorial, without changing the initial "Sum" library, you write the following:

NewSum <= (
    Factorial <= (('!', @Integer) => @Integer) |
    @Sum
).Sum


Because the function Factorial has @Integer as its result type, it's easy to make an algorithm that automatically accepts the factorial syntax wherever an integer is needed. "NewSum" defines "Factorial" at the same level where "@Sum" is included and exposes ".Sum" to the user. The parsing algorithm checks the first-depth levels of all parents to pick up insertable functions. To reach other functions, an explicit call has to be made.
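Here is a hedged sketch of that mechanism in Haskell rather than in the notation above (the names sumOver, factorialP, etc. are mine): the original grammar leaves its integer-valued "atom" production open as a parameter, so an extension can plug in the prefix-! factorial form without touching the original definition, and the new form is then accepted wherever an integer is expected.

import Data.Char (isDigit)

type P a = String -> Maybe (a, String)

number :: P Integer
number s = case span isDigit s of
  ("", _)    -> Nothing
  (ds, rest) -> Just (read ds, rest)

-- the original grammar: left-associated sums over an open atom production
sumOver :: P Integer -> P Integer
sumOver atom = go
  where
    go s = do (x, s') <- atom s; more x s'
    more x ('+':s) = do (y, s') <- atom s; more (x + y) s'
    more x s       = Just (x, s)

-- the extension: a new integer-valued form, '!' n, meaning n!
factorialP :: P Integer -> P Integer
factorialP atom ('!':s) = do (n, s') <- atom s; Just (product [1 .. n], s')
factorialP _    _       = Nothing

-- choice between two productions
orP :: P a -> P a -> P a
orP p q s = maybe (q s) Just (p s)

oldSum :: P Integer
oldSum = sumOver number                     -- grammar closed as before

newSum :: P Integer
newSum = sumOver atom                       -- grammar closed with the extension
  where atom = factorialP atom `orP` number -- '!' usable wherever an integer is

main :: IO ()
main = do
  print (oldSum "2+3")   -- Just (5,"")
  print (newSum "!3+1")  -- Just (7,"")

This is only one way to realize "accepted wherever an integer is needed": the recursion through atom is tied when the grammar is closed, so the original sumOver never changes.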

It is possible to define all the math calculations in the same code too, but I omitted them for the sake of simplicity.

I assume this was the initial thread question. How do you like the answer? I'm interested in criticism.

Trying to understand code

Is it specified somewhere how this sort of syntax-extension code is executed?

Yes

Here is the (somewhat simplified) fully defined first part:

Sum <= (
    AddSub <= (
        (
            Left <= @Sum, In <= ('+' | '-'), Right <= @Fact
        ) => @Integer -> (
            @Case <- (
                @In == "+" -> @Left + @Right |
                @In == "-" -> @Left - @Right
            )
        )
    ) |
    Fact <= (
        MulDiv <= (
            (
                Left <= @Fact, In <= ('*' | '/'), Right <= @Exp
            ) => @Integer -> (
                @Case <- (
                    @In == '*' -> @Left * @Right |
                    @In == '/' -> @Left / @Right
                )
            )
        ) |
        Exp <= (
            Group <= (
                (
                    Left <= '(', In <= @Sum, Right <= ')'
                ) => @Integer -> @In
            ) |
            @Integer
        )
    )
)


And the extension part would be:

NewSum <= (
    Factorial <= (
        ('!', Param <= @Integer) => @Integer -> (
            @Case <- (
                @Param == 0 -> 1 |
                @Param > 0  -> @Param * !<<@Param - 1>>
            )
        )
    ) |
    @Sum
).Sum


where '<<x>>' is used to pass 'x' to the parser. The code shown in this post can calculate the supported math expressions while preserving operator precedence. This is about extensible language syntax. I'm not sure if I got it right. Does anyone have any suggestions? Or would that be it?

You say that like it's a bad thing

"someone made the point that extensible syntax means you can't make any sense out of a block of source code until and unless you know what syntax extensions have been done before it."

Uhm so?

No one makes syntax extensions at random. People make them when they can make code more readable.

If you have a bad syntax extension that's like any other mistake or bug. The answer is "don't" and "it's your fault".

Dangerous things should be

difficult to do by accident. If an unambiguous syntactic framework provides plenty of notational flexibility, so that more flexibility than that doesn't actually accomplish anything useful but opens you up to problems, then, as a language designer, don't provide the extra flexibility. If there is no such unambiguous syntactic framework, then there's another set of questions to explore. So, attempting to devise that sort of syntactic framework is of interest.

Figuring out semantics is

Figuring out semantics is difficult enough while reading code, I shouldn't have to also figure out the syntax just to get to the point where I can start figuring out the semantics.

Whither semantics

Why aren't we (I daydream that maybe we could be) writing in semantics, rather than syntax?

my practical conclusion was

my practical conclusion was that syntax should not be structurally extensible: I recommend an unambiguous syntax with extension limited to a choice of which syntax rules (a.k.a. "operators") are available in a given environment.

I had a similar conclusion after studying extensible grammars for a couple years. Some observations:

  1. I don't want syntax extensions themselves to become a form of boiler-plate in my source code.
  2. I am not so good at predicting or understanding syntax from a composite set of independent syntax extensions. I am better at learning from examples.
  3. Modularity remains important. The client of a module or user of a function should not care about the syntax in which that module or function was written.

From the first two points, it seems preferable that I can identify syntax for a volume of code with a single word or symbol. This minimizes boiler-plate and makes it easy to search a codebase for examples of a syntax. Together with the third point, it seems useful to treat syntax extensions or embedded DSLs as something close to external DSLs, albeit with easier integration.
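One existing mechanism that roughly fits this (mentioned only as a point of comparison, not as what the comment proposes): Haskell switches syntax extensions on per module by naming them once at the top, so searching a codebase for the extension's name finds example uses. The module name below is made up.

{-# LANGUAGE LambdaCase #-}   -- one word identifies the syntax for the module
module Example where

describe :: Maybe Int -> String
describe = \case              -- syntax enabled by the pragma above
  Nothing -> "nothing"
  Just n  -> "got " ++ show n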

Reusable language components, composition of syntax extensions, and even versioning and deprecation of syntax is still possible. It's just shifted to the development of libraries for defining new languages.

Whereas I took the completely opposite approach...

In my defense, I was attempting to create a language that was a natural translation target for code written in many other computer languages. The basic idea was that, with appropriate libraries in place, you could take code written in any "normal" programming language and make only a very superficial syntactic (not structural) transformation to have it run in the target language.

Essentially I took the position that all "normal" programming languages can be parsed by recursive descent, and that delimiting recursive descent parsing by parens yielded something that could, if you took a few liberties, be seen as code in a lispy language.

But in order to do that, I had to define executable routines so broadly that every last one of them could function as a syntax transformation.

And, bizarrely enough, John and I ended up in a very similar place, re-exploring the semantics of fexprs.

Yes, it does seem curious

Yes, it does seem curious we'd go our separate ways and the paths would both lead to the same neighborhood. Perhaps because we were both looking for something deeper than superficial syntax but not abandoning it for semantics.

Syntax transformation? That's all we need.

There should be a bunch of syntax definition libraries arranged in a tree-like package fashion. They should be uniformly reachable whenever we want to use them. We only have to distinguish between core languages and language extensions (which are core-language dependent).

What's interesting is that if a core language is Turing complete, extensions can be written very easily, as syntactic translators from the extension syntax to the core language syntax. Further, since all core languages ultimately compile to assembler (which is Turing complete), these core languages can themselves be written as syntactic translators to assembler.

So in the end, we don't even need the distinction between core languages and extensions. The only core language would be assembler, and all other languages (whether "core languages" or "extensions") would be extensions built either directly on top of assembler or on top of higher-level languages that are themselves extensions of assembler.

My opinion is: everything can be done with syntax transformation, as long as we have a Turing complete layer underneath.
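A minimal sketch of that claim in Haskell (all the types and names are mine): the "extension" is just extra surface syntax, and a purely syntactic translator eliminates it by expansion into the core before anything else ever sees it.

data Core
  = Num Integer
  | Var String
  | Add Core Core
  | Let String Core Core        -- let x = e1 in e2
  deriving Show

-- the surface language: the core plus one extension, a summation form
data Surface
  = CoreS Core
  | SumRange String Integer Integer Surface  -- sum over x = lo..hi of body
  deriving Show

-- the translator: rewrite the extension away, leaving only core terms
desugar :: Surface -> Core
desugar (CoreS c) = c
desugar (SumRange x lo hi body)
  | lo > hi   = Num 0
  | otherwise = Add (Let x (Num lo) (desugar body))
                    (desugar (SumRange x (lo + 1) hi body))

main :: IO ()
main = print (desugar (SumRange "i" 1 2 (CoreS (Var "i"))))
-- Add (Let "i" (Num 1) (Var "i")) (Add (Let "i" (Num 2) (Var "i")) (Num 0))

Whether the core is assembler or something higher level only changes the target of the translator, not the technique.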

Turing complete assembly code...

Is sure enough Turing complete, but has so much hair growing on it - in terms of the semantics of, e.g., condition registers, execution sequence, modular arithmetic, floating-point representation peculiarities, etc. - that it's really hard to reason about.

In many ways a higher-level language isn't so much an extension of assembly code as an attempt to restrict all that hair so you get something smoother to reason about.

Hmm. Perhaps programming models have a hierarchy that can be expressed in terms of hair, like:

Machine code: Hair.
Assembly language: individually labeled hairs.
C: Some of the hairs have been woven into twine.
C++: Dreadlocks.
Ada: Most of the hair has been woven into ropes and bondage gear. The rest is tucked up under a military uniform cap.
Java: A popular hairstyle.

All the way down to ....

Coq: A billiard ball. Has good aerodynamics but most programmers have no idea how to get hold of it.

lol

Makes me wonder what language would be a Mohawk. Better yet, a bright purple Mohawk.

the Age of Aquarius

Algol, the musical.

ETL

I have developed the syntax extensibility framework ETL. It is mostly based on Prolog's expression-extension ideas, with the added possibility of defining statements (mostly a development of ideas from LISP and Dylan). The language definition and usage workflow is similar to that of SGML/XML: there is a grammar that can be used to parse source defined in it into a common document model.

The core ideas about extensibility and the language definition workflow still stand. But the new version 0.3.0 of the framework in Git has some changed rules at the lexical layer (it is much more Java-like with respect to numbers), and it has more restrictive rules about keywords. ETL 0.3.* is also a major refactoring to simplify framework usage in an IDE context, but I'm still working on the plugin for IDEA. The work has stalled for some time due to other activities in my life.

The core idea is that syntax is defined separately from semantics and builds an Object/Property AST. Downstream processors work on the objects.

There are some mechanisms for extending the grammars (grammar/context extension).

Note that I approached language extensibility by letting go of the notion of supporting an arbitrary lexical layer or phrase layer. The lexical layer and the way block statements are organized are hardcoded, but there are escape hatches in the form of prefixed strings that allow adding custom tokens in case of need. Extension happens at the operator/statement level. Basically, the grammar developer works with statements, operators, contexts, and grammars. In a typical language spec, it is possible to see that languages are defined in these terms, but later these terms are translated to LL(*) or other types of grammars. In ETL, there is no need for such a translation; the grammar compiler does it for you.

As a teaser, here is a parsed source of the language grammar defined in itself. The left side is the document model to be used by normal downstream processors; the right is the grammar source. The file is produced by a generic tool that supports any language definable in ETL.

Natural language

Natural language has keywords which, unless you're e. e. cummings, you can't use in general contexts. Such words perform a syntactic function but may also carry semantics, e.g. while. Surely it's not just due to the whims of their designers that these have been carried over to programming languages.

From extensibility point of

From an extensibility point of view, it is better to compare programming languages not to generic natural language but to a sublanguage of it, namely mathematics. In mathematics, new syntactic constructs are invented constantly, and that is actually a major part of how it evolves. I see no reason why programming languages should not strive for the same flexibility at the syntax level.

Currently, in mainstream programming, the evolution of a language is locked in the hands of a few language designers. On the other hand, the evolution of libraries is open to all, and there is much more progress there. I see it as a problem that users of the Java language had to wait years for the blessing of Java's owner to get lambdas, enums, or even the foreach loop. We are still waiting for some simple extensions that are available in Groovy and Scala.

Haxe macros

I don't know if this is sufficiently related, but! I worry that people ignore valid reasons why some things are in the core language vs. the libraries. To wit, e.g. Haxe macros, where I wanted 2 missing features and I got macros for them... but then they would not work together, because the order in which they should be applied was not obtainable. And probably they should have been interleaved. So fundamentally, without a better gordian-knot solution to the macro approach (e.g. reapplying until reaching a fixpoint or something like that), it is a flat out bad evil mean abdicating weasel LIE to say that Haxe macros are the answer to all missing features, that all missing features can be done outside the core.

macros are limited

There are problems with macros. They are good for syntactic-sugar-style extensions, where one does not have to be aware of context and semantics. However, not all extensibility scenarios are like that. Some require really complex processing and awareness of types and of other context.

I think that lightweight extensibility mechanisms should be supported, but the compiler should allow heavy extensions too. For example, asynchronous programming requires some heavyweight language extensions, as most reasonable implementations of it will involve some form of implicit CPS rewrite at points of asynchronous invocation.
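A tiny sketch of the kind of rewrite meant here, in Haskell (names are mine, and getLine merely stands in for an asynchronous operation): direct-style code is mechanically turned into continuation-passing style, so the "rest of the function" becomes an explicit callback that a runtime could park and resume later.

-- direct style: perform the call, then use the result
directSum :: IO Int
directSum = do
  line <- getLine                 -- the would-be asynchronous point
  return (length line + 1)        -- everything that follows it

-- the same code after a CPS rewrite: the continuation is reified
type Cont r a = (a -> r) -> r

getLineCPS :: Cont (IO ()) String
getLineCPS k = getLine >>= k      -- an async runtime would suspend here instead

cpsSum :: Cont (IO ()) Int
cpsSum k = getLineCPS (\line -> k (length line + 1))

main :: IO ()
main = cpsSum print               -- reads a line, prints its length + 1

The sketch makes the continuation explicit by hand; the point of the comment is that a compiler extension would have to do this rewrite implicitly, at exactly the marked points.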

Corporate politics

Currently, in mainstream programming, evolution of the language is locked in hand of few language designers.

Mainstream languages are embroiled in corporate politics - not really in the hands of individual language designers.

How would you feel about

How would you feel about a crowdsourced language whose core would be put online on a dedicated site? The site would be a crossover between a dedicated IDE and a dedicated social network (like GitHub + IDE + Twitter, dedicated to this language). This site would be a perfect place for collaborative development of language extensions, libraries, and whole applications. Programmers would have an opportunity to share their work with others over this network by simply making their results available to the public with one mouse click.

I'm talking about no corporate steering towards profit-based solutions, no waiting for the big heads to implement what programmers want, just pure freedom to make extensions, with usability feedback from other programmers in a collaborative social machine.

What if this language I'm talking about had a Turing complete core that could host any syntax extensions and even whole programming languages, as well as DSLs (like a math set-operation cruncher, or anything else) to be used inline alongside general-purpose languages?

What if all of this could be run by a crowd of independent programmers?

Would this site with this language make any difference in the programming world?

And what name for this grand language?

I propose Titanic.

I was thinking of

I was thinking of something more lightweight.

Maybe Utopia. I wonder if it is possible :)

And I don't understand the negative attitude. Isn't this what we really want?

I'm skeptical

I think Turing completeness doesn't solve the real extensibility problem, which is interoperability of extensions. Some kind of environment that makes free exchange easy sounds good and useful for producing libraries, but when it comes to designing language fundamentals I don't think number of collaborators is as important as quality of collaborators. Also, free software still generally has the problem of programmers needing to put food on their families.

Tx

Thank you for the insightful opinion. I'd like to share some of mine too.

About quality of designing language fundamentals:
When there is enough quantity, some number of entities will reach the quality. Filtered by the tweets, likes or +1s, it should be easy to get the cream from all of it. And dealing with fundamentals is not really that much messed up. Those fundamentals are merely libraries with some syntax sugar. They are just another functions, but with specific syntax, if we like calling them through specific syntax lens, instead of through standard language interface.

About free software:
Take a look at Linux. As free as we can imagine. Yet it takes a part of the pie from big corporations. Not quite the same quality, but it weights. And it is free. I guess there is something really romantic in all those programmers :) Imagine what would happen if all those programmers had a centralized quality tool for collaboration. But I think that direction of the future changed with the appearance of HTML5. I think that web apps are the right direction.

Suppose you had a cool job designing some cool app and earned enough to feed a family. Wouldn't some of you, just because you're romantic, share some reusable code fragments from your job with others? Especially if you already have everything you want? I think that most of us don't work because we have to earn a dime, but because we like what we are doing. And if we are doing it well, I think we'd like to share our work with others.

crowdsourcing

Fwiw, I see two challenges for serious heavyweight crowdsourcing of this sort of thing. One is how to bring true crowdsourcing to bear on things that depend on overall coherence; it looks to me as if naive voting on specific content is fundamentally ineffective for such things. The other is (as alluded to in the preceding two comments) how to fit crowdsourcing into a functional economic system (a fundamental problem with Wikipedia, actually: it's suffered from having no real economic model, just "please donate", which works well enough if you have a project vastly smaller than the global economic system it's embedded in but starts to decohere when the project becomes itself a major player on the global stage).

Voting

could be complemented by statistical reports as well. The more a library or syntax extension is used, the higher rank it gets. But this shouldn't be the only criterion for ranking, as newly created libraries also deserve a chance in the sun.

I had some thoughts on distributing donations over all the engaged programmers, but I'm still not completely sure how all of that would work. I still don't know what a fair distribution formula would look like.

My distrust of voting of

My distrust of voting of this sort is not about how to make the choice. It's about what choice is being made, and about how that relates to why you want to crowdsource.

There's an approach to things, sometimes called "big data", that is in some hard-to-pin-down sense the antithesis of sapience. (Apologies that I'm about to use a lot of words; I'd use far fewer if I knew how.) Instead of looking at things and grokking them and taking guidance from that insight, you mechanically process lots and lots of cases without understanding any of them and try to produce an answer by sheer "force of numbers". Artificial Intelligence is, I think, essentially the same: no insight, instead using main force to come up with a workable solution without having to understand anything. These sorts of techniques can be used to complement sapience, but it's not hard for them to instead interfere with the functioning of sapience, in much the same way that people for centuries (or perhaps millennia) have been aware that bureaucracy can amplify stupidity; I like the term "anti-sapience" for this effect.

To my thinking, sapience is a spectacularly inefficient process that produces tiny drops of insight that cannot (by all I've ever... grokked) be obtained by any other means, and almost all of which gets wasted, lost. Anti-sapience effects can make that worse, and the great challenge is how to gather up more of those precious drops of insight. Crowdsourcing — when you start thinking about it on a global scale, rather than some infinitesimally small project — is about collecting a bigger share of the total insight of the vast numbers of the population. It's not at all like "big data". And when a crowdsourcing community votes on specific content, that presupposes that specific content is what you should be selecting. But selecting specific content is a big-data approach.

Is there an alternative? Well, yes, I've seen one, but don't imagine I'm saying all the kinks are worked out of it either. While Wikipedia is sort of in the vote-on-specific-content family, its sister project Wikinews is... very different, so much so that it inspires some Wikipedians to thoughts of sororicide. The whole structure of the Wikinews project is oriented toward individual people in the community gaining reputations within the community, and the community selects some of them to promote to reviewer status; each submitted news article has to be rigorously reviewed by an authorized reviewer (who is uninvolved in the writing of that article) for compliance to community policy. Each article typically gets looked at extensively by only two people: a writer and a reviewer; the whole community is involved with the process, and the people, but not the specifics. Yes, this particular arrangement requires some properties of news that don't apply to an encyclopedia; I mean simply to point out that it's possible to imagine different approaches to crowdsourcing that seek to realize the potential of individual people as well as individual artifacts.

What is an alternative?

Well, we are living in one right now, and I just don't like it. We are living in a programming world where only rich men can succeed, because they have the power to gather up a hundred programmers under their boots. Ugh!!! Sorry, I get emotional when I think of bosses and all those employee stomach aches because some problem has to be solved the way the boss says. Ok, I have to admit, this is not a world of kingdoms anymore, but why stop here?

Anyway, I think that humanity owes it to itself to try to reorganize "badly" structured communities. I'm not sure I have the right answers, but we (humanity) try here and there, and someone occasionally succeeds. I don't know if trying to achieve something better is more about being somewhat crazy or about being of real-world use. But those two come together, and in some small measure the real world somehow gets better and better.

I just think we shouldn't give up and stop at where we are.

stasis

Oh, I agree we shouldn't stop improving things. One's first thought isn't necessarily a good one, though; one wants a balance between too little planning and too much. It's also worth being aware of problems bigger than one has time to tackle directly atm; for example, we've got no economic system that doesn't have severe problems. Unregulated capitalism is a disaster when it's on a global scale for an increasingly information-oriented civilization. Government regulation is another kind of disaster waiting to happen (because government isn't separate from the system).

Wikipedia falls far short of the potential of crowdsourcing, partly because it dehumanizes the users. If you say "treat contributors equally" it sounds good, but not if you say "treat contributors as interchangeable parts". Somehow quality control is needed, and Wikipedia is philosophically incapable of applying meritocracy to the problem, so the two means it uses are, essentially, mass voting on content, and bureaucracy. In other words, endless squabbling and red tape. If you figure you're trying to tap into the potential of each individual contributor for coherent thought, evidently voting on specific content is too oblivious to individuals and too oblivious to big-picture coherence; and bureaucracy is dehumanizing too. There probably isn't just one solution, either; the sort of solution one wants likely depends on the specific problem domain. We should have a vast range of techniques we can pick and choose from and adapt for a given domain; instead we have, mostly, a few rather narrow techniques that all lean toward dehumanizing users.

So. In this case, what are you trying to accomplish? And maybe we can work from there to how to achieve it.

I'm here to hear your opinions

So. In this case, what are you trying to accomplish? And maybe we can work from there to how to achieve it.

I'm just checking the ground for a project I'm currently developing in JavaScript. Here on LtU you are smart guys, so I wanted to hear your thoughts on some themes. For now, the plan is first to program a metalanguage, and then the mentioned online collaborative IDE. The IDE would be free for open-source projects; that's how I plan to gather up the libraries that I don't have the resources to develop myself. If everything goes as planned, it would ignite a crowdsourcing phenomenon.

The metalanguage with its complete runtime would, of course, be free to redistribute with commercial and other apps (like the .NET Framework is).

Besides the free online IDE version I would host for public open-source projects, I plan to sell IDE copies to companies that want to have private collaborative IDE sessions on their own servers.

So roughly, for now, these are my intentions. It might work out, but I'll believe it when I see it.

Seems like a worthy goal

I suspect the challenge with your plan is going to be convincing free software developers to use your system and that will be largely dependent on the perceived strengths of your environment. I doubt that an offering of Turing complete extensibility will even move the needle. A nice collaborative IDE? That seems a little more substantial, but I'm still skeptical. I wish you luck, though.

extensibility / IDE

Interesting juxtaposition of extensibility with IDE. It seems to matter a lot how the two interact with each other. I mentioned somewhere above a fatal flaw of extensibility, that when you look at a piece of code you literally can't read it unless you know what extensibility has already occurred; so... does the IDE help you track that?

More than that

This IDE I've been pondering is in the form of an operating system. Each code fragment can have its own syntax (either a DSL or a complete programming language). Those fragments are accompanied by IDE UI links to their syntax definitions (I like those autogenerated docs from Javadoc, so I'd like to implement a version of that for my language too).

Moreover, if a code fragment is built against a certain GUI interface (parallel to Java interfaces), then instead of code you will see that interactive GUI in place of the code fragment. Why should everything be interfaced through streams of code text? Although I plan to keep that under the hood, I'd like to implement GUIs wherever possible, and I'd like to give other programmers that ability too. And GUI means applications. Thus, it could be a little operating system. But I don't know, maybe this would be overkill.

tooling associated with extensibility

If a language or tool transforms code (from one syntax, presentation, or organization to another), you would like an IDE to provide a two-way mapping, so browsing one has fast access to the other. So part of reading either of them involves spot checking the other too. When debugging you want to see both: one closer to what happens and the other closer to original intention expressed.

Syntax extensibility is similar in nature to code rewriting, and I had been thinking about browsing before-and-after versions of rewritten code. The ideal form I had in mind was capturing a tarball that relates both, with markup for browser display either done statically as another file in the tarball (or else dynamically on demand if you write a lightweight http server viewing the tarball). It would amount to debug info corresponding to a build.

Like ivanodisek, I'm thinking of a little operating system, using lightweight processes, which is what would implement the daemon serving http access to the debug info. For extra credit, you can figure out how to integrate with gdb, so some protocol allows symbolic debugging of either the before or after version.

Stu: Can I inject JS into the daemon to alter browsing dynamically?
Wil: Sure, knock yourself out.
Ned: Or just inject new processes, some of which emit JS.
Stu: But I don't want to compile to JS. I want it to be real.
Ned: Your idea of reality is weird.

Useful

Successful open source projects need to be useful, and by that I mean they need to do something useful before people will engage with them. Linus had a complete working kernel (lacking in features) before Linux was released. Also, many believe the success of Linux is due to the direction being tightly controlled by Linus.

So the success criteria for an open source project seem completely at odds with the goals for this language.

Without competition there is no pressure on the language to be effective. Rather than collaborative, maybe what you want is a competitive environment with as many starting languages as possible to create evolutionary pressure, where only the fittest languages survive.

Project tree

Also many believe the success of Linux is due to the direction being tightly controlled by Linus.

I believe that centralized environments are more effective than decentralized ones. But I don't like all that bossing around to get things done in the perfect way.

With a good project manager, people would have the best of the centralized and decentralized worlds. Suppose you are starting a new project you want to develop in a collaborative fashion. You would expose the major project-tree task points, leaving empty name slots beside the tasks you are not currently developing. When people want to engage with a specific task, they would pin their name beside the task they want to develop, so the others know what's going on. I think this would be a better way to do things than classical boss-employee directives. People should be free to choose what they want to do. Moreover, people should be free to choose how they want to do things, so the starting project tree should be unfolded as little as possible, leaving developers the freedom to design things in their own way. If someone doesn't like some part of a project in development, she can always fork an alternative tree node.

So, instead of having a centralized, people-centric pyramid, it would be cool to have the centralized, project-centric pyramid needed for effective programming, while leaving developers the freedom to choose and design the code segments they want to participate in.

Starting

You will still need to build the core concept yourself, or persuade people to contribute their time. If it were easy, all you would have to do is create a GitHub account for the language source, and it would get built. Let's face it, without strong leadership to get people to commit to the project, it's just not going to happen.

The things are that beyond

The thing is that beyond Lisp, there is no experience with language extensibility, and it is a real challenge to make it work, particularly with type-aware extensions.

There is no clear concept yet of what a good extensibility path is.

But a long time ago, there was the same problem with functions and procedures. Extensibility along that dimension became state of the art only when the concepts of the call stack and local state were understood and implemented in languages and in programming practice. Now it looks self-evident, but one could reread "Go To Statement Considered Harmful" to understand that what we now take for granted was a major insight that reshaped the way we program.

At the syntax level, I'm trying to solve the problem with ETL. I think I have made good progress with composable and extensible grammars. But at the semantic level, there are other cans of worms to be handled. We need to invent a basis for sane composable extensions, and the concepts of open compilers and a safe compile-time environment.

I also think that the fundamentals of the language are not a big problem right now. JVM or .NET semantics as a base could support the development of languages for a long time; there is no need to invent that now. Later, it might be recognized that extensible languages would greatly benefit from runtime features beyond what these platforms offer, but IMHO we are not at the point where we could recognize that.

After all, an extensible language is extensibility at compile time; the target runtime matters little. At compile time, any reasonable platform (GC, dynamic module loading, runtime code generation, portable libraries, etc.) that does not fix the front-end language would work. Both the JVM and .NET qualify. On the other hand, Haskell would have a problem (no dynamic modules or runtime codegen).

I think you miss a point.

I think you miss a point. There is no need for a central site for language evolution, just as you do not need a central site for developing a library.

You should be able to do it locally for yourself, and share it on some central site if you want co-developers. There might be official language extensions, Apache language extensions, or private-brew language extensions.

The language extension is just a library that executes at compile time and provides both syntax and compilation for new constructs.

The core language is just infrastructure that allows extensions to live together with basic services. It has to be developed, but this is research time, so there is no problem if a number of such platforms float around. It would be even better if there were many of them now, because "the way of doing things" is not well understood yet.

Java and .Net framework

wouldn't ever have succeeded without extensive libraries that take some hundreds of programmers to develop. One man alone simply cannot do it; it's too big.

really?

I'm not sure that's true.
The libraries aren't that impressive.
What's impressive about .NET and Java are the parallel garbage collectors that scale hugely.

I agree

about most of the libraries taken alone, but all of them put together? With database access among them? It takes years for a single man to do it, right? Or am I missing something crucial?

I'm not sure what point you

I'm not sure what point you are trying to make here.

.NET and Java succeeded because they delivered a relatively solid foundation, so libraries could be delivered and used. The idea is that one man alone or a small group can evolve an entire language only up to a point. After that, language development needs to become distributed, allowing competing language extensions to co-exist (possibly in different modules of the same program).

Centralized site

I think that a centralized site for collaboration would beat a big number of different project sites scattered over the web. When you need something, you know where to look; it's all in one place.

Big corporations succeeded because of the big foundations they ship with their products; other third-party add-ons mostly go unnoticed. With a crowdsourced language there wouldn't be such a divide between first- and third-party add-ons. One could start with a relatively small number of add-ons, and the community would fill in the holes and do a lot more. Eventually, I think it would excel against corporate products.

Felix

Felix has a user-defined grammar, meaning the language grammar is defined in the library. It's not a question of whether the user can extend the base grammar: there is no base grammar, except for the grammar used to define grammars.

The syntax for defining new syntax is hard-coded into the compiled parser. An EBNF-like syntax is used for productions, with action code in R5RS Scheme. The parser is bootstrapped from Dypgen, and I am using OCS Scheme embedded in OCaml.

http://felix-lang.org/share/lib/grammar

But this is not a theory. Here is a theory.

Non-terminals are types. Grammar productions are type constructors. The way to support the Open/Closed Principle correctly for types is known, and so that should be the way for grammars too.

The technique is called Open Recursion. It can be done in Felix because Felix allows parametrised productions, that is, a production can take a non-terminal argument. I have not made much use of it at this stage, but here is an example:

http://felix-lang.org/share/lib/grammar/utility.fsyn
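For readers who don't know Felix, here is a minimal sketch of the open-recursion idea itself in Haskell (unrelated to the linked example; all names are mine): the recursive positions of a type become a parameter, so an extension can add constructors before the knot is tied, much as a parametrised production takes a non-terminal argument.

-- base "productions": the recursive non-terminal is the parameter e
data BaseExpr e = Num Integer | Add e e

-- an extension contributes more constructors over the same open slot
data MulExpr e = Base (BaseExpr e) | Mul e e

-- tie the knot once per closure of the grammar
newtype Expr1 = Expr1 (BaseExpr Expr1)   -- base-only closure, for comparison
newtype Expr2 = Expr2 (MulExpr Expr2)    -- extended closure

evalBase :: (e -> Integer) -> BaseExpr e -> Integer
evalBase _   (Num n)   = n
evalBase rec (Add a b) = rec a + rec b

evalMul :: (e -> Integer) -> MulExpr e -> Integer
evalMul rec (Base b)  = evalBase rec b
evalMul rec (Mul a b) = rec a * rec b

eval2 :: Expr2 -> Integer
eval2 (Expr2 e) = evalMul eval2 e

main :: IO ()
main = print (eval2 expr)   -- 2 * (3 + 4) = 14
  where
    sum34 = Expr2 (Base (Add (Expr2 (Base (Num 3))) (Expr2 (Base (Num 4)))))
    expr  = Expr2 (Mul (Expr2 (Base (Num 2))) sum34)

With several mutually recursive non-terminals (expressions, statements, patterns), each open type needs all of the others as parameters, which is where the explosion described below comes from.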

The system supports "higher-order macros": you can pass a macro to another macro. However, the current version is untyped (more precisely, unkinded).

For practical extensibility by multiple users without conflicts, modules packaging closures of various stages of extensions would be required.

It is not clear how far one can push open recursion. The Felix compiler itself originally used OCaml polymorphic variants with open recursion to manage phases of term translation. It soon proved impossible to manage, because of the number of types required: expressions, statements, patterns, and some other things as well all had to be parameters to the open constructions, which meant there was a combinatorial explosion of possible closures, and a huge amount of text was required to build even one complete closure. This is what happens when you have 20-30 type constructors per type, instead of the few you'd have in a school exercise or an academic paper.

So in practice, a single set of fixed constructors and dynamic checks is a lot easier to manage, even if it isn't as strongly typed.

Open recursion

I would argue that the right way to handle this isn't widely known. The problem of combinatorial explosion that you're describing is a typical problem that occurs when you try to model partial values functionally and then link them manually. Having access to first class partial values is nice, but comes at a high cost if you lose the ability to automatically link them.

FYI: I enjoy reading your posts about Felix.

Open Recursion

Perhaps I should have said that the underlying mathematical principle is known. Why it's hard to use and how to make it easier is a question of psychology; I agree that's tough.