XKCD on type theory...


Just thought people might find this amusing.

The hell of this is I can tell exactly which languages he's making fun of in almost 75% of those lines.

Edit: I would use the "embedding url" with an html tag containing


but LTU refuses to display it.



Oh ya!

This really hits home...uhm...maybe I'll have a teaser out next week.

LTU doesn't do images. Most link-aggregator sites don't do image embedding the way G+/Facebook do.

His most valid dig is the popup text here


"Functional programming combines the flexibility and power of abstract mathematics with the intuitive clarity of abstract mathematics"

Really I can't wait for all of this mathematizing of programming to end.

Yes, mathematics comes up with useful algorithms, but 99.9% of the papers read more like "in my new mathematically advanced system, programming problems that are easy and obvious to non-mathematical programmers using tools from the 60's and 70's can, with enough work, be written in 50% of the cases."

I really think the role of mathematics in computer science should be limited to "coming up with useful algorithms" and yes, coming up with limited languages is part of it, but I don't think that those need to be imposed on engineers except when they're needed. The place of mathematics shouldn't be "telling programmers how to program".

This one is less fair http://xkcd.com/1312/
"Code in Haskell is guaranteed to have no side effects."
".... Because no one will run it?"

Also this fun talk is on topic to the post.


Really I can't wait for all

Really I can't wait for all of this mathematizing of programming to end.

That's an academic thing, it doesn't happen in the real world as far as I can tell (unless the math is necessary...e.g. in machine learning).

Expecting programmers to use a lot of math is unrealistic

That said, I think people designing programming languages should be very concerned about how easy it is to reason formally about programs because ultimately that makes the kind of informal reasoning that programmers do much easier.

Software engineering must become more professional

Software engineering must become more professional; it can no longer be the domain of amateurs.

At the same time, professionals must make the principles and methods as intuitive as possible.

Automation of all aspects of software engineering will gradually increase.

Have you ever done the style of programming that you insist

all engineers should be stuck with? What have you accomplished in this style? How do you know if it's workable and productive in practice?

You're famous for being connected with Planner, which was written in Lisp after all - no rigorous type system there.

Examples of ActorScript programs on HAL

There are examples of ActorScript programs on HAL including a meta system. See ActorScript.

PS. I don't insist on this way of programming. Also, ActorScript may yet undergo further development.

Software Engineering is doing just fine

It's the scientists who are not very professional.

One of these days I'll try to post a new computer language

on LTU as actual software instead of as a paper.

I think that's against the rules, though. This may not be a site for people interested in new computer languages; it may be a site for people interested in academic publishing.

I have no idea what LtU is

Forty to fifty goofballs in some corner of the Internet discussing academic papers in a bad manner?

If this site were about programming languages, you'd expect a lot more news from industry, though, since the science approach to programming languages is interesting but more or less irrelevant.

No idea. I still think we're just a set of goofballs.

Since I am making a configurable language

I got to talking the other day about possible options for a language.

When we got into the realm of typography (things like allowing full unicode) I found myself mentioning that I actually think programs would be more readable if we came up with a syntax that allows entities to be phrases. English isn't Japanese, we expect spaces between words.

I'm not sure how you display grouping (color) or input it (context?), but I still feel like that is a good feature that you could never sell to people...
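A rough sketch of how phrase identifiers could work (Python here for neutrality; all names are made up, not from any language in this thread): a tokenizer that longest-matches the input against a set of declared multi-word identifiers, so spaces inside a known phrase don't split it.

```python
# Hypothetical sketch: tokenize source text where declared identifiers
# may be multi-word phrases. The longest declared phrase wins; anything
# else falls back to single words.

def tokenize(text, declared_phrases):
    words = text.split()
    # Sort candidate phrases longest-first so a longer phrase is
    # preferred over a shorter prefix of it.
    phrases = sorted((p.split() for p in declared_phrases),
                     key=len, reverse=True)
    tokens, i = [], 0
    while i < len(words):
        for p in phrases:
            if words[i:i + len(p)] == p:
                tokens.append(" ".join(p))
                i += len(p)
                break
        else:
            tokens.append(words[i])
            i += 1
    return tokens

print(tokenize("let total cost = unit price * quantity",
               {"total cost", "unit price"}))
# → ['let', 'total cost', '=', 'unit price', '*', 'quantity']
```

The catch, as noted above, is input and display: nothing in the source text itself shows where one identifier ends and the next begins.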

Has been tried/CDL3 has this

One of the compiler compilers at the university I went to has this: identifiers can be phrases.

It's also because, at the point its predecessors CDL/CDL2 were invented, people were still in the "Algol solves all your needs" phase, while simultaneously lots of people were researching and implementing complex parsers. Phrases as identifiers isn't even a hard problem.

But it never made it to mainstream. From that you can deduce that the idea isn't worth it.

Maybe it becomes something interesting for non-English languages. I don't understand Japanese, but from the way I use identifiers - generalizing from one example - I would say the idea isn't really worth it.

The point about Japanese is that they don't use spaces

Words are demarcated by being only one or two Chinese-origin characters in a row, with conjugation endings in a different, phonetic character set.

It's English where spaces would be useful.

Knuth's literate programming idea would have allowed code embedded in text. I always liked that idea, but when I looked for a literate programming system I couldn't find anything more complete than rotting command-line programs that weren't particularly compatible with Windows.

The difference isn't that big

In a compiler compiler it makes some sense to allow phrases as identifiers, since production rules are often well described by phrases.

I.e., with phrases you end up with concise and readable production rules such as

    universally quantified expression:
         arithmetic expression
       | universal variable, double arrow, universally quantified expression.

But most code is written in far more complex languages where whitespace separates constructs, and you'll easily get lost if identifiers may include whitespace.

Moreover, underscores can be used as an alternative to spaces, and they give a strong visual clue as to what 'phrase' constitutes an identifier.

I.e., "universally_quantified_expression" solves most of your needs better than "universally quantified expression".

I don't see much use for the idea. But as I said, I don't know a lot of languages.

keyboard problem

I agree that underscores work, and that's what I use in real life, but I don't like how hard they are to type. Maybe the solution is an editor that inserts the underscores for you.

Didn't both early FORTRAN

Didn't both early FORTRAN and early LISP allow whitespace within identifiers? I think in one or both of those cases the whitespace wasn't significant.

symbol delimiters

Some Lisp dialects support spaces in symbols -- which are identifiers in a homoiconic AST representation -- by using a distinguished quote character for this, like vertical bar, such that everything within is part of the symbol name. Presumably you can get a vertical-bar inside by doubling it, or by some other means of escape. But the general idea is that anything could be an identifier, but you lose vertical bar as an operator. (To make printing efficient, I usually reserve one bit inside symbols to mean "contains a character needing escape" so I don't process a symbol carefully on a letter by letter basis unless that bit is set.)
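The bar-quoting scheme described above can be sketched in a few lines (Python here for neutrality; the doubled-bar escape is the convention mentioned in the post, not a claim about any particular Lisp's reader):

```python
def read_symbol(src, i=0):
    """Read a |bar-quoted| symbol starting at src[i].
    A doubled bar inside the quotes stands for a literal bar.
    Returns (symbol_name, index_just_past_the_closing_bar)."""
    assert src[i] == "|"
    name, i = [], i + 1
    while i < len(src):
        if src[i] == "|":
            if i + 1 < len(src) and src[i + 1] == "|":  # escaped bar
                name.append("|")
                i += 2
                continue
            return "".join(name), i + 1  # closing bar
        name.append(src[i])
        i += 1
    raise SyntaxError("unterminated |symbol|")

print(read_symbol("|hello world|")[0])  # → hello world
print(read_symbol("|a||b| rest")[0])    # → a|b
```

As the post says, the price is that the bar character itself is no longer available as an ordinary operator.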

Typically minus is used as hyphen in Lisp dialects to join words in phrases. It's easier to type on U.S. (American) keyboards because shift is not needed, and it reads better too since English uses hyphens in compound words. Using minus this way bothers a few folks who want to reserve minus exclusively for negation and subtraction in arithmetic, especially if they cannot bear whitespace between operators and number tokens.

I recently decided I don't like forward-slash (/) reserved exclusively for division in arithmetic. I like it better as a path separator, so symbols can be paths more easily, for example in namespace qualification. It unifies things a little better if an abstract file system is part of the language semantics.

As first described, lists in

As first described, lists in LISP were comma-separated, so spaces in symbols would have been no problem. Though comma separators for all lists would only make S-expression Lisp a lot more unreadable.

programming in morse code

would only make S-expression Lisp a lot more unreadable.


The only thing that could be less readable than S-expressions is XML.

From this a cynical person could deduce that S-expressions are obsolete, never used and XML has taken over the industry. A cynical person would be right. I think I'm going to go cry.

acquired tastes, and otherwise

I know quite a few people, myself included, who have acquired a taste for sexprs (especially Clojure's slightly extended brand). However, despite a fair amount of exposure, I don't know of anyone who actually enjoys XML outside of the very narrow use cases that SGML was designed to address.

I realize that this is anecdotal, but developed parens-vision seems to be a common enough occurrence as to preclude dismissing them outright as unreadable, as opposed to unfamiliar.

Documents are not a narrow use case ...

... rather, data is a narrow use case of documents. Any system general enough to handle documents in their full generality can handle any kind of data, though it may be overkill.

I rather enjoy XML, especially the MicroXML subset. Not, however, for writing code, though I have designed XML document types for representing arbitrary graphs, finite-state machines, and HTML-like schemas that could each be compiled (using XSLT) straight to Java.

Fortran and Algol

Fortran (up to Fortran 90), Algol 60, and Algol 68 all allow arbitrary whitespace in identifiers, though no two identifiers are distinct by reason of the whitespace in them, unlike Lisp |significant white space| identifiers. Algol 68's standard library took heavy advantage of this, with standard procedures like logical file ended and char in string.

In Fortran, all whitespace is ignored except within strings and (in fixed format) before column 7. That is why in Fortran DO 100 I = 10, 20 creates a loop in which I iterates from 10 to 20, running from the DO statement to the statement labeled 100, whereas DO 100 I = 10.20 is an assignment statement setting the variable do100i to 10.2.
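The classic ambiguity can be demonstrated with a toy classifier (a sketch, not a real Fortran parser): strip all blanks, then note that only the comma distinguishes a loop header from an assignment to a variable that merely starts with "DO".

```python
# Toy illustration of Fortran's blank-insensitivity (not a real parser):
# after deleting blanks, a DO header looks like DO<label><var>=<start>,<end>.
# Without a comma after the "=", the statement is just an assignment.

def classify(stmt):
    s = stmt.replace(" ", "").upper()
    if s.startswith("DO") and "=" in s and "," in s.split("=", 1)[1]:
        return "loop"
    return "assignment"

print(classify("DO 100 I = 10, 20"))  # → loop
print(classify("DO 100 I = 10.20"))   # → assignment (sets do100i to 10.2)
```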

When we got into the realm

When we got into the realm of typography (things like allowing full unicode) I found myself mentioning that I actually think programs would be more readable if we came up with a syntax that allows entities to be phrases. English isn't Japanese, we expect spaces between words.

This is more fraught with peril than it first appears. For instance, see the homographic attacks introduced when we internationalized domain names.

I know

The very first thing I published for this project was this:

It's a translation (from C to Lua) of a library that can translate Unicode text into various canonical forms such that all text that LOOKS the same has the same encoding.

[Update] Looking at your site, I doubt my code protects against letters that look the same from different alphabets (it might, though; I'll test it). It protects against different ways of decomposing things like acute marks (they can be part of the letter or separate, and multiple marks can occur in different orders and render the same). But code isn't the same as public-facing things like URLs; it's for developers, so it won't be much of a security risk, and if you don't know whether you're typing Greek or English, that's your problem.
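The decomposition issue described above is exactly what Unicode normalization addresses, and the cross-alphabet homoglyph issue is exactly what it doesn't. Python's standard unicodedata module shows both in a few lines:

```python
import unicodedata

composed = "\u00e9"      # 'é' as a single code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

print(composed == decomposed)  # → False: render the same, differ in encoding
print(unicodedata.normalize("NFC", composed) ==
      unicodedata.normalize("NFC", decomposed))  # → True after normalizing

# Normalization does NOT unify homoglyphs from different alphabets:
latin_o, greek_omicron = "o", "\u03bf"
print(unicodedata.normalize("NFC", latin_o) ==
      unicodedata.normalize("NFC", greek_omicron))  # → False
```

So a normalization library handles the acute-mark problem but, as suspected above, not the Greek-vs-Latin one; that requires a separate confusables check.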

A lot of software engineering is project management

Software projects are particularly hard to manage as writing software is largely a design activity rather than a production activity. So most of what passed for software engineering 20 years ago was about project management. I've followed the various efforts by the IEEE to develop a "Software Engineering Body of Knowledge" (SWEBOK), which is clearly better than nothing but hasn't achieved the kind of consensus you would expect if software engineering were a mature discipline.

That's not the problem

It isn't that software engineering is often done badly, it is that scientists somehow push the idea that they'll solve it better than what has developed from within industry.

I have taught management and software engineering courses such as OOAD, RUP and AGILE, 6sigma, etc.

We already have a lot of theory on project management, requirements analysis, requirements elaboration, the design phase, etc., etc. There are bad products out there and (still) deadlines often aren't met, but the problem roughly isn't a lack of knowledge; it's often knowledge not being applied. (Also: mismanagement, personalities, or financial reasons.)

The idea that 'science' can just step in and solve this hard and overly analysed problem is foolish. The idea that you'll solve it with a functor is insane.

People from within industry will solve it. Not some 24-year-old PhD student who has never programmed a line of code in his life, supervised by an academic doing theory who him-/herself doesn't do any notable form of management.

It's just an idiotic idea that academia will have a lot to say on the subject.

Very striking

"People from within industry will solve it. Not some 24-year-old PhD student who has never programmed a line of code in his life, supervised by an academic doing theory who him-/herself doesn't do any notable form of management.

It's just an idiotic idea that academia will have a lot to say on the subject."

That's very striking. You're saying that computer science is as delusive an academic discipline as the many fake subjects in the humanities such as literary theory or "peace studies".

Nah. There simply is no silver bullet

I am somewhat exaggerating but there is no silver bullet. I've read various academic books on how to develop software right, books such as "A development method for Hard Real-Time Java". (I forgot the title. As a lecturer you get free copies of books so I have a library of 500? books stashed at my father's place.)

Academia simply often gets it wrong. That book, whose exact title I forget, presented some badly overengineered method for RT Java, solving problems nobody cares much about.

And most books from academia I read are like that. Especially when they try to develop their own methods. They don't make much sense.

You don't get much better than presenting industry's best practices combined with how to solve specific problems algorithmically.

To me, academia is bad theory, bad languages, and bad tooling.

Having said that. There is no hard border between academia and industry, so maybe once in a while some industrial person will write a good academic book on it, and that'll be the headlines for a while.

Software engineering isn't something you can solve

Even if we made programming an order of magnitude easier it would only reduce not eliminate the need for software engineering.

I agree

The language, or the programming, simply often aren't the (main) problem.


There is, as you no doubt know, a strong set of disincentives for academics to tackle industrially relevant problems. Partly because many academics have no industrial experience. But also because the research involved needs to be publishable (harder without some juicy abstract math), in small enough units that it can be tackled by inexperienced students within the timespan of a degree or by an academic within the span of a tenure review, and preferably fundable by some grant agency. Oh, and did I mention publishable?

Not to say that it can't be done. For example, Harel created Statecharts while working with some Israeli avionics designers, and now they're used (and abused) everywhere. But I agree that it's more the exception than the rule.

psychological pressure or abuse

I understand the modern trend in software management is "Agile", which seems to mean that you have one designer who makes it up as he goes along (thus the "agile" or adaptive part) and a bunch of peons who are kept under maximum pressure by requiring the impossible from them: that they plan their work, maybe a week in advance, to an accuracy of half an hour or less, and then document all of their time down to the minute. Also, documentation probably isn't done, no time is given for making code readable, etc. - nothing that would make for good engineering.

It sounds more like extreme psychological abuse than management.

Every Project has its own Life Cycle

Yah well. Agile isn't a silver bullet either. But, as I hope I taught my students, you choose a development method given a problem, not the reverse.

Agile will probably scale nicely for small and medium-sized projects (I've forgotten a lot about it), so I am not bothered by the development.

Agile vs "Agile"

"Agile" (as laid out in the original manifesto) is more about mindset than anything else. It's about realizing that no matter whatever process you claim to follow, at the end of the day what gets software delivered is people who care about getting it delivered (and will bend or circumvent the process to make it happen). And it's about realizing that part of what makes software valuable is the fact that it is "soft", so whatever process you follow needs to be ready to deal with change.

Of course, layered on top of that you get all sorts of "agile processes" that (at least as implemented some places) seem to replace the old process straitjacket with a new one. Instead of realizing that it's the process itself that's supposed to be "agile".

Concerned? Ya, but not

Concerned? Ya, but not enslaved by it. It is actually OK to start with a good solid design and work out the formalisms as the design evolves, rather than starting the other way around. Most non-academic but still successful languages (e.g. Java, much of C#, all the dynamic languages) actually take this approach, for better or worse. And even Haskell lacks a formal specification.

You also have to separately ensure that informal reasoning is easier for programmers; easy formal reasoning helps, but doesn't ensure that. You could have a language so simple (see Lisp syntax) that in practice it becomes unusable.

I am pretty amazed by Lisp sometimes

I consider Lisp to be an operational model, not a language. I personally cannot stand the syntax.

But I am pretty amazed by how you can lift Lisp to implement tools with. Somewhere, that shows it's more important to get a flexible operational model right than to dwell on overly clean, pure mathematical languages. Javascript and Python share the same qualities: somehow academically horrible languages, but with such great flexibility that they're great for small and medium-scale software projects.

Lisp is pretty fine; I personally loathe Haskell from a software engineering perspective.

Having said that. I am implementing a language I should advise everybody not to use according to my own principles. So yeah, there's that too.

I find

that I can implement more of an advanced programming tool in a day to a week in Scheme than I can in a year in C++.

And the Scheme one will do a lot more than the C++ one.

And if you want a C++ library to interface into the Boost C++ libraries, it will take a team of experts two years to do the integration.

Of course your Scheme code will be a little hard to read. But it turns out that having the semantic power matters more than syntactic readability - when you need it.

I actually started liking C++

Depends on the problem domain I guess. I imagine I wouldn't like to create a simple GUI for a simple application in scheme unless forced to.

I am now implementing an interpreter in C++ because I want to support an autotools installation of the interpreter for potential users. I.e., run configure, make install, run the interpreter.

It isn't a natural fit, and it's substantially more work, since a compiler is tree rewriting plus bookkeeping, and any language that supports abstract data types better than C++'s OO model would make that easier.

Still. I am amazed how far you can get (with RAII) in C++. Since I started off with C knowledge I first wanted to do everything with raw pointers but having switched to RAII it's almost like programming in a better Java or maybe C#.

You can do a lot in C++ these days without exploiting templates, just sticking to straightforward RAII, and never be hindered by the problems everyone associates C with, i.e., null pointers and dangling memory. I even start to like the memory model; maybe immediate reclamation of memory is a better idea than, say, Java's.

If C++ would add pattern matching syntax it could actually even become a better language; a fine functional language even. Looks like Stroustrup did some experiments in that direction but nothing definite yet. Maybe it's a bad idea anyway.

Reference counting gc built in to the libraries

works for non-advanced data structures. But you would never want to do anything advanced like a compiler without some horrible kludge like libgc (... a buggy, incomplete, undocumented grafting of a reasonable garbage collector onto C and C++). And if you use advanced features of libgc, you are likely to have to search for a C++ compiler and library combination that doesn't seem to generate random crashes - it's that bad.

But the built-in reference counting is an inefficient solution. The increments and decrements are always interlocked instructions, hidden; extra headers have to be allocated using inefficient allocators. And multithreaded access to the pointers themselves is not safe, quite apart from race concerns.

Other advanced stuff like the atomic library may also have problems. It's a good idea, but Microsoft's implementation has been so lazy as to be considered malicious.

Somewhat agree

I know I take a performance hit for using C++'s shared pointers (which are supported in all modern C++ compilers). I somewhat hope that is offset by the fact that everything which isn't term rewriting executes much faster than in the other languages I could have chosen. (Also, I am doing tree rewriting; I don't need libgc. Everything is a DAG, so reference counting suffices.)

The price I pay is that it'll likely only compile small to medium sized sources fast. But that would have been true for most other more abstract languages I could have chosen to implement in; including almost all Lisp (derivatives).

The more you want to support, the more you'll end up going down to the machine layer. Any fast compiler for large projects probably ends up written in C with specialized data structures and a hundred man-years of effort. I simply don't have the stamina or time for that.

(I am also pretty sure I am writing mostly portable C++ code. I haven't found a need to do much weird stuff yet.)

I've been playing with this lately


It's a jit for a somewhat incomplete language, but it has potential and it's relatively popular.

I find it a much nicer syntax than Scheme, the jit is surprisingly good, and its advantage over Ruby is that Ruby is way too baroque and too slow.

It's such a simple language that extending it with a custom parser is easy. Think about this: writing a complete parser for C++ would take over a month (or more), plus good tools to write and test it, and help from experts in the 3 compilers you'd want to emulate. Writing a complete parser for Lua, with position tracking and error reporting, can be done even with hand-made code and no tools in two days.

I find that having optional multiple return values per function saves a surprising amount of glue.

I just wrote a scanner layer in Lua, and instead of using C style where you pass a mutable pointer in, or C++ style where you have an object with changing state, I did it in pure functional style, passing a lot in and out of each routine. I was surprised that it was usable. It gave me a very primitive ability to backtrack by simply deciding to ignore the results of a function, and since "doing nothing/ignoring a value" is foolproof, it left no opening for bugs to creep in on backtracking.
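That pure-functional scanning style can be sketched language-neutrally (Python here, with made-up names; the original was in Lua): each step takes (text, position) and returns (token, new position), so backtracking is just reusing the old position.

```python
# Sketch of a pure-functional scanner: no mutable scanner object. Each
# call returns the token plus the position to continue from, and
# backtracking is free: keep the old position and ignore the result.

def skip_spaces(text, pos):
    while pos < len(text) and text[pos] == " ":
        pos += 1
    return pos

def scan_word(text, pos):
    pos = skip_spaces(text, pos)
    start = pos
    while pos < len(text) and text[pos].isalnum():
        pos += 1
    return text[start:pos], pos

src = "foo bar baz"
tok1, p1 = scan_word(src, 0)    # tok1 = 'foo'
tok2, p2 = scan_word(src, p1)   # tok2 = 'bar'
# "Backtrack": discard tok2/p2 and simply rescan from p1.
again, _ = scan_word(src, p1)   # again = 'bar'
print(tok1, tok2, again)        # → foo bar bar
```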

Even though every single closure has to be written "function (...) ... end" that's perfectly manageable even if it's a weird syntax for "let".

Not an option for me

I am creating a language working with the illusion that it'll be a usable tool for someone, at some place, at some time.

It's more or less an effort of getting the language I defined out of my system; a non-performing bootstrap simply wasn't good enough. But a non-performing Lua implementation isn't good enough either.

Sorry. But for implementing a minimally useful tool I am stuck to using either C or C++. There simply aren't many alternatives.

(As an example. I am doing 1MB/s Unicode parsing. How fast is your Lua implementation?)

I'm guessing

I'm not sure what you meant by "It's more or less an effort of getting the language I defined out of my system; a non-performing bootstrap simply wasn't good enough. But a non-performing Lua implementation isn't good enough either."

I'm guessing you mean that you were already working in C++ or that it has to integrate with C.

Part of the definition of Lua is a C interface, so it could be integrated. The original version of Lua is in pure, portable C - though of course that's a lot slower than the jit.

But saying "not performing" - if you mean that you think Lua is slow, you might be wrong. The jit is a very odd thing in that it implements a language that should be slow, but in practice it's not. Very clever implementation, probably coupled with the fact that computers these days are more often memory-bound than computation-bound. It's like the thing you said the other day: SAT solvers are fast not because the algorithm is so good, but because they were formulated to use mutable arrays in C - i.e., if you wrote a SAT solver in abstract Haskell it would be a disaster.

No. I bootstrapped my language

I wrote a bootstrap interpreter in OCaml and then a bootstrap compiler in Hi. Upgraded my system to 8GB, ran the interpreter for a day, bootstrapped the language.

So there is a working compiler with a somewhat unfinished type checker (I recently checked it, it's actually quite good) because I rushed it. But no performance. I do combinator rewriting in C with lots of low-level routines still in Hi; thousands of rewrites just to print an integer. It's really fast, but not fast enough to get a performant bootstrapped compiler unless I spend forty years on it. (And even then it may fail. See MLTon/Clean/Haskell.)

Horrible experience. Never going to do it again.

But I somewhat got intellectually stuck with the language. I simply want an interpreter and maybe do some small experiments with it myself.

So. C++. I gathered.

I'll talk later going to work

I find term rewriting and related things like constraint handling rules to be a mystery (and there are no books on these things I can afford - no $200 book for me!)

My problem with rewriting is that I don't see how you come up with confluent systems.

I remember some crackpot who was trying to promote his symbolic math package written in prolog decades ago. The problem with it was that even though it had lots of rules, they didn't combine into anything useful ever. No matter what equations you handed to his system, at best it would rewrite a little, deduce nothing and return nothing useful.

Naive playing with rewrite rules (that I've done, just playing around) always turns out that way. Just because you have rules that COULD do something useful with your equations doesn't mean a program ever WILL find a useful path through a state space that might be astronomically large.

How in the world do you find problems where that sort of form turns out to be useful? I mean I bet mathematicians have come up with 8 of them, right? And beyond those eight there's probably no use for term rewriting.

Term Rewriting is just an abstract manner of Implementing

Term rewriting is just an abstract manner of implementing a compiler. A compiler, at minimum, handles an AST, and an AST is trivially represented as a term (language), and a term language is trivially represented as an abstract (or algebraic) data type.

So if you want to implement a compiler, it makes a lot of sense to use a functional language, since they do term rewriting well. Not a good choice if you'd like to implement a C compiler, since that'll need to handle 100MLoc projects, and memory management and mutability become necessities, but a good choice for reasonably small languages.

And a compiler for a medium sized language can just be neatly implemented as a multipass compiler on the AST, then intermediate forms. (That is, up to the point where you'll be doing operational optimizations, such as peephole optimization.) Just rewrite the AST with some external bookkeeping.
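A minimal illustration of "rewrite the AST, pass by pass" (a generic constant-folding sketch in Python, not tied to the Hi compiler discussed here): terms are tuples, a pass rewrites bottom-up, and only termination matters, not confluence.

```python
# Terms as tuples: ('+', l, r) or ('*', l, r); leaves are ints or
# variable names. One bottom-up constant-folding pass over the AST.

def fold(term):
    if not isinstance(term, tuple):   # leaf: int literal or variable
        return term
    op, l, r = term
    l, r = fold(l), fold(r)           # rewrite subtrees first
    if isinstance(l, int) and isinstance(r, int):
        return l + r if op == "+" else l * r
    return (op, l, r)                 # can't fold; rebuild the node

# (1 + 2) * (3 + x): only the all-constant subterm folds.
ast = ("*", ("+", 1, 2), ("+", 3, "x"))
print(fold(ast))  # → ('*', 3, ('+', 3, 'x'))
```

Each compiler pass is then just such a function from term to term, possibly with some external bookkeeping threaded alongside.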

It's just an abstract manner of looking at how you should implement a compiler. For instance, a compiler written in ASF-SDF abstractly implements a term rewrite system (but I doubt that'll ever scale beyond toys, like two hundred declarations for a Lisp like language maximum).

I imagine some Lisp compilers work as term rewrite systems internally, though I imagine a Lisp compiler will use mutable structures. (It'll still be slow unless people will put in a lot of effort; expanding the language with fast mutable structures, packing data, tail-call optimization, etc.)

(I could also have called it AST rewriting, but that doesn't capture the abstract vision on compilers. Confluency is not something I am bothered with. Through transforming I implement a rewrite system on the AST and internal representation which might be confluent but only termination is necessary. It's a nice academic question whether confluency on the system could be exploited though. But I simply don't bother with it.)

Thousands of rewrites to print an integer

You'll excuse my unlearned deductions here:

Thousands of rewrites to print an integer: that means a rewrite is a unit of work not much bigger than a single machine instruction.

So unless "rewrites" can be mapped directly onto machine instructions, perhaps you'd better come up with a different model for your computation: either one with much larger chunks, or one that does map onto machine instructions.

Not going to

I have a small declarative language. It is implemented as an eager combinator rewrite system, or trampolining thunks in C if you want the operative view.
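The "trampolining thunks" style mentioned here can be sketched generically (Python, with hypothetical names; the actual system is in C): each step returns either a final value or a zero-argument thunk, and a flat driver loop keeps bouncing, so evaluation depth never grows the native stack.

```python
# Sketch of trampolining: a tail call is returned as a zero-argument
# thunk instead of being made directly; a flat loop runs thunks until
# a plain (non-callable) value comes back.

def trampoline(result):
    while callable(result):
        result = result()
    return result

def countdown(n, acc=0):
    if n == 0:
        return acc
    return lambda: countdown(n - 1, acc + n)  # thunk, not a direct call

# 100000 "recursive" steps without touching the native stack:
print(trampoline(countdown(100000)))  # → 5000050000
```

The per-step overhead of allocating and invoking a thunk is also roughly why thousands of rewrites per trivial operation add up so quickly.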

Unless I sacrifice on the declarative nature of the language and reinvent C in my own language I am not going to implement a fast bootstrapped compiler.

A) I don't want to sacrifice on the declarative nature of the language. B) I don't have the time or stamina to reinvent C.

You don't write a compiler in SQL, right? I don't have the vision that I am implementing a GPL. If anything, I want an interpreted declarative language which binds to C.

I don't mind if it isn't too fast. As long as it is robust, has a small implementation, and has marginally okay-ish performance I don't care.

oh, ok

I was glancing at your site the other day and wondering why it was that you were saying that your language was too slow.

:/ I get a feeling that LuaJIT is fast because the man who programs it thinks entirely in terms of memory caches, instruction pipelines, and the probability that a certain function will run... I.e., as low-level as possible. He's so out of touch with higher-level programming that he mocked someone who asked for his system to optimize copying a data structure - he considers copying almost unnecessary to programming. If I can figure it out I may force that change into his jit one day.

Yah well.

The bootstrap was three orders of magnitude off. Of course, I could have made printing integers a primitive of the language and saved some cycles, but that wouldn't have solved the fundamental problem that combinator rewriting is a simple, robust, but fundamentally too slow manner of implementing a bootstrapped compiler.

In the end, I guess it was just a Bad Idea^tm. Like trying to implement SQL in SQL, or maybe even Javascript in Javascript.

As a side note, it also raises the question of whether a Lisp or a Haskell should be written in Lisp or Haskell. Maybe it stands to reason for a GPL, but maybe you should just accept that if you're going to implement a declarative language, you've gone up the tower of abstraction, and it is just fine to have an implementation in a lower-level language.

As far as Luajit goes. People who can think in terms of the machine, well, gotta adore those people too, right? Without them, nothing would run.

But I have to admit I am not like that. I just want a declarative language to run so that I can start experimenting with it. I'll faithfully implement the combinator rewrite system, so as to end up with a very robust system, but I am sure people will complain about the lack of TCO, stack manipulations, mutable structures, or a for loop.

But then, Mathematica exists too, I guess.


Every method that is too inefficient in the present will be the standard way in the future, if it's more robust and easy for the programmer.

Clang (otherwise known as LLVM) started out as an academic exercise, considered too fancy and slow for real work. Mocked. By now it's faster than GCC.

I'm pushing for us to forget Chomsky and all those over-optimized parser algorithms when it comes to compilers. Computers are too fast and big now to hobble ourselves with them when the general form is fast enough and doesn't get in the programmer's way as much.

Yeah well. I only have one life

It isn't that you couldn't create a fast bootstrapped compiler in Hi (probably still with some imperative extensions), it is that I cannot do it given that I only have one life.

I am roughly willing to spend one man-year on it. Clang by now is probably a hundred-plus man-year effort.

I somewhat tend to agree with you but that was also the manner in which I ended up with a compiler which was three orders too slow in performance...

I'm an evil man


Another way to bootstrap would be to find a DIFFERENT, but already completed higher level language that's similar enough to what you need, but optimized.

So instead of bootstrapping Hi in Hi or trying to do it in C, you write the first compiler in Prolog or Mercury or Haskell or whatever strikes your fancy. Then once Hi is fast enough you can write version 2 in Hi.

I already did that

I already have an interpreter in OCaml, right? It is terrifically slow and I needed to jump through hoops to even get it to interpret the bootstrap.

For my declarative language, I need to output low-level code which runs on some abstract machine to get even near marginal acceptable performance.

Only C will give me the bindings to a fast enough abstract machine to compile to.

I meant the compiler, not an interpreter

three orders of magnitude is amazing though...

I'm almost interested enough to look at the code and figure out why, but I have my own problems to work on, and a real life that's falling apart.

There are various reasons

Three orders, right. Roughly, I guess combinator rewriting is one order (which I'll never totally get rid of), no imperative constructs is another order (which I don't want), and no fast compact representation (a list of chars instead of a char array) is another.

Most people are clueless about how spoiled they are with all the fast compilers which somewhat emerged through natural selection from thousands of failed attempts. It's like all those people who have a mini-supercomputer in their back pocket and use it to post pictures of their private parts. Clueless.

Maybe memory allocation?

Java's new is at least one order of magnitude faster than C++'s new, since it just bumps a pointer.

But I'm not sure a language like the one you described has to be slow. I bet (even though it's a logic language rather than a functional one) that the same limits apply to Mercury, and it supposedly compiles to fast C.

If memory allocation of small lists is the problem then maybe Chicken Scheme would be good - it allocates them on the stack, kind of like Java, even though it's translated to C. Cheney on the MTA.

Not likely

It has a simple but very efficient stop-and-copy GC in C which also bumps a pointer. No generations though. It's even quite smart that constants simply remain in the text section of the compiled C program.

No, that ain't it, at least it should give similar performance to a smartly compiled Lisp program.

But that's another good reason not to write an interpreter in a high-level language. The overhead of mimicking a heap in something like Haskell is so substantial that it renders such a solution useless.

Lisp is an operational model, not a language

I am not going to go Lisp with it. Even with Lisp, they usually need or add imperative extensions for self-compiles, I would end up using those.

(The three orders for a self-compile are also due to the declarative nature of the language. It's just a bad idea to self-host. You don't have the problem writing an interpreter, granted, but I'll simply inherit too many of the down-sides of scheme.)

Simply put, there's no good reason for me to choose Scheme over C++. I've got an AST/rewrite system working in C++, so there are no substantial reasons left to prefer Scheme.


Mercury has been in development since 1995 and has a 6 month release schedule.

It also doesn't have the kind of type inference I have. I probably have more semantic passes. It doesn't list how long it takes for a self-compile. Possibly it is one of those heralded academic compilers which takes three minutes to self-compile; no offense meant, but there are many of those. (And even if it has a fast self-compile, it might have taken them years to get there.)

As I said, there's no reason I couldn't implement a fast Hi-compiler (with some extensions) in Hi, except that I only have one life. Since I've got a scheme to rewrite terms in C++, there are no real benefits left for choosing another language. And, finally, I need to be able to compile to code for an abstract machine, and there are not many languages (apart from C) which allow that.

An interpreter in C++ it is.

EDIT: To clarify a bit: The effort it takes to do a self-compile gives somewhat of a predictor for the effort it would take me to implement a compiler myself in a certain language. But I want an interpreter, so I am also stuck with the burden of getting a reasonably fast operational model right in the implementation language. I don't expect Mercury can give that to me.

Well. It's been nice talking to you

Ah well. Fourth message. Just wanted to say it's been nice talking to you. A good opportunity to rethink my decisions. But they stand, it seems.

Unicode parsing

(As an example. I am doing 1MB/s Unicode parsing. How fast is your Lua implementation?)

Neither my code nor the original C library is very fast on large files, because it was written to make multiple passes over the whole file held in a single buffer. And my code is also slower because buffer size can't be specified in Lua; it has to grow by powers of 2... and other such silly things. I just ran a benchmark and the C code was a bit less than 5 times faster.

If it were rewritten into a producer/consumer style where all of the passes ran at once it would be much faster in either language. I started to do that and gave up because I have higher priorities.
[note, updated]

On the other hand scanners don't have to be that fast

So on a slow atom processor, my code takes 1 second in lua to decompose a 1 megabyte unicode file and recompose it into the shortest canonical form. The C code would take 0.2 seconds. How fast does it have to be?

Good enough

No, that's actually quite good. I am still not going to implement it in Lua though. Too big a chance that something at some point will bite back, as you have shown with the other example.

I want the autotools for users and the assurance that even if I implement a small slow algorithm, I can always implement an alternative with fast data structures imperatively. I am cutting enough corners as it is.

How many useful languages do you know not written in C/C++?

I can also turn the question around: how many useful languages do you know that are not written in C/C++?

I guess Scala, maybe. But I really don't trust they can deliver performance for non-trivial applications.

Python is in C. Q is in C. Lua (I imagine) is implemented in C. Javascript is implemented in C. Even a lot of fast small Lisp interpreters are in C. The list is endless.


Look, I'm not sure Lua is maximally useful. The main limitation is the lack of threading.

But it has a nice incremental GC, and since I'm playing with making continuations coexist with normal kinds of functions, the fact that it has guaranteed tail call optimization is important.

Lua's other main advantage is that the JIT runs on everything (including Android), is high quality, is used in industry (scripting in everything from a lot of games to Adobe products), and thus is alive with actual users.

This isn't a very

This isn't a very interesting question. I wrote my language in C#, compile its tree with the DLR, which runs on top of the CLR...which is probably at some point implemented in C++...maybe.

Via the magic of static/run-time meta-programming, you can always write something in a slow language that produces something that executes quickly...e.g. consider the Jikes RVM (Jalapeno?) as a good example of that.

Too declarative a language

See my comment elsewhere.

Misread who a comment was responding to.

comment deleted.

re refcounting gc and cost-specific variations

So you write your own refcounting in C, in more than one flavor, so what libraries do will not affect you, unless you wrote them. This follows the advice of the ancient Chinese sage Fox Mulder, who said, "Trust no one."

But using the built-in reference counting is an inefficient solution. The increment and decrement are always interlocked instructions, hidden, and extra headers have to be allocated using inefficient allocators. And multithreaded access to the pointers themselves is not safe, quite apart from race concerns.

For refcounted objects seen by only one thread, use faster non-atomic instructions. Only use atomic interlocked instructions for refcounted content passed between threads -- typically request objects for service queues. This does imply multiple refcounting mechanisms: a different flavor for each context. If you are rewriting code anyway, you can change your mind about the flavor used in a particular case.

Code which is going to operate on state hosted solely inside a fiber (and peers within one thread) won't need an expensive refcounting scheme. If the same code gets ported (say by a rewrite tool) to manage shared global state accessed by lots of threads, that new version should replace reference mechanisms with something slower but thread-safe.

You need only get the worst-common-denominator result if everyone shares an identical code base for everything, and cheap contexts end up paying the price imposed by expensive contexts too. Granted, that is the usual result of folks saying there's only one way something is done. (I sometimes think of this as There-Can-Be-Only-One thinking, which is stupid in my POV.)

If you have processes send messages to each other, there's less shared mutable state. And you can also avoid the cost of thread-safety in specialized circumstances when i/o is done with a synchronous interface, where a caller parks or blocks until a read or write is done (or fails). A bunch of non-blocking fibers can share an i/o service queue in another thread, so they park while i/o is done in the service thread, and the buffers involved are temporarily set aside for exclusive read and write access by the i/o subsystem. This is just like a system call, when you block until reply, where you show buffers to another party but they take no references (though they may create new non-shared copies). But the important idea is that, just because another thread will see state owned by one thread, you may need no extra protection beyond whatever request/reply protocol gets the communication done.


The problem with a "trust no one" approach is that you would never be able to reuse the libraries of others.

I am not sure this is a comment on my approach, the indentation got somewhat lost, but if I take the compiler I am writing as a use-case:

Yes and no. It's not worth implementing refcounting yourself until you hit a bottleneck, and it's far more worthwhile to use the standard implementation of refcounting meanwhile, while hoping the standard will be extended with multiple flavors of refcounting should the need ever arise.

But I somewhat agree, ideally, I would have implemented refcounting myself. But then, what if I wanted to make use of my multicore machine, have concurrent rewrites, and need the locking? For the moment, I am better off using C++'s standard libraries.

And how sure are you that implementing your own refcounting will solve a bottleneck? CPUs have test-and-set instructions built in. Maybe you end up shaving 4% off the running time of your application when you could have done better solving another bottleneck...

dial T for trust

The "trust no one" joke aimed to say trust is optional rather than obligatory. Setting the trust dial to zero is good default until cost/benefit tradeoffs make trusting more worthwhile. In a more complex vein, trust need not be distributed uniformly. I can trust something over there, but not here, for example. The humorous version is briefer, and more memorable, while spelling out every proviso is a bit of a grind to get precise. I replied to Josh Scholar (quoting him), and wasn't talking about your use of modern C++ as a base runtime. My comments would only apply to writing a language in C++ if you didn't use any standard libraries other than those you wrote yourself as standard for your PL.

If you use a language, like modern C++, which is highly opinionated about how memory management should work, then sailing against the wind is probably not worthwhile, and may never be safe under future versions even if you hit safety now. But if you use a laissez faire language where your memory management choices are your own business, then by all means do your own refcounting when the mood strikes. You can start out by specifying how it works abstractly, then bind it to something concrete at one of several steps as code moves toward compiled product. At any point, you can bind to a standard instead of your own hand rolled code, provided the api is pluggable enough.

If you know code will manage state that's mutable in only one pre-emptible thread, and won't be exposed to races, you can bind the refcounting operations in that code to run faster, because there won't be any need to pessimistically sync (slowly) with other cores when you increment and decrement a count.

You can take two different approaches toward preventing a mistake, where a developer shoots themselves in the foot by not using thread-safe refcounting when a counted object actually gets shared among multiple threads where races occur. First, you can say "don't do that" and let them shoot themselves in the foot if they want (it's a free country). Or second, you can add the ability to see this mistake in a tool, perhaps the one compiling one of the legs of language processing involved. Then you can generate an error when dataflow shows an object can escape to a context where races happen. I see doing the latter myself: when code compiled to run in fibers, threads, or processes identifies which namespace in a tree is used by all code in that communicating sequential process, static analysis shows whether shared data escapes out of that tree namespace.

You can model an application as composed of lots of processes, either co-hosted in one operating system process or not. Then you can color each process according to how much trust is allowed in each. One process might be "trust no one" while others vary in what libraries are safe enough to re-use in those contexts. Personally I think architecture is better when there is some structure in what and where things happen, as opposed to letting anyone call anything at any time in an undifferentiated bag of gore. If you really don't trust someone, let them run in another OS process with address space protection holding them at arm's length, and talk to them through streams and messages. Then you can re-use libraries of others even if you trust very little.

Since I'm pretty sure this is all familiar to you, I apologize for using this sub-topic as an excuse to teach new runtime tactics to others reading this, who haven't thought it all through yet.

2+2 Done

Someone in the xkcd forum says that a hidden gag is that 2+2 seems to mean something like "increment the definition of 2 by 2": it not only made 2 show up as 4 in the following lines, it also somehow turned 11+1 into 14.

:) given the mathematizing of computer science that one sees on LTU and obsessive focus on the trivial, I'm tempted to take this idea and run with it. One could come up with a dialect of Lisp where only the characters 0-9 and ( ) are allowed, numbers start out autoquoted as representing themselves, but as you bind them they take on other values. Then you write endless scholarly articles about how you can solve the binding problems of numbers being hidden by being bound to other values.

I thought it was Prolog, but I was not sure

The "done" response reminded me of Prolog. I had a vague recollection of a Prolog interpreter responding with that. But I am not sure.