
minor lexical tokenization idea via character synonyms

A few days ago I had a tokenization-stage lexical idea that seems not to suck after going over it several times. On the off chance at least one person has similar taste, I can kill a few minutes and describe it. Generally I don't care much about syntax, so this sort of thing seems irrelevant to me most of the time. So I'll try to phrase this for folks of similar disposition (not caring at all about nuances in trivial syntax differences).

The idea applies most to a language like Lisp that uses a few tokens, like parens, with very high frequency. I'm okay with lots of parens, but it drives a lot of people crazy, apparently hitting some threshold where the code reads as gibberish. Extreme uniformity of syntax loses an opportunity to imply useful semantics with varied appearance. Some of the latent utility in the human visual system is missed by making everything look the same. A few dialects of Lisp let you substitute different characters, which are still understood as meaning the same thing, but let you show a bit more organization. I was thinking about taking this further, where a lot of characters might act as substitutes to imply different things.

The sorts of things you might want to imply might include:

  • immutable data
  • critical sections
  • delayed evaluation
  • symbol binding
  • semantic or module domain

A person familiar with both language and codebase might read detail into code that isn't obvious to others, but you might want to imply such extra detail by how things look. Actually checking that those things were true by code analysis would be an added bonus.

I'm happy using ascii alone for code, and I don't care about utf8, but it would not hurt to include a broader range of characters, especially if you planned on using a browser as a principal means of viewing code in some context. When seen in an ascii-only editor, it would be good enough to use character entities when you wanted to preserve all detail. It occurred to me that a lexical scan would have little trouble consuming both character entities and utf8 without getting confused or slowing much unless used very heavily. You'd be able to say "when you see this, it's basically the same as a left paren" but with a bit of extra associated state to imply the class of this alternate appearance. (Then later you might render that class in different ways, depending on where code will be seen or stored.)
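To make that concrete, here is a minimal sketch of such a scan in Python. Everything specific in it is an assumption for illustration only: the choice of synonym characters, the class names ("binding", "delayed", "immutable"), the restriction to numeric character entities, and the names Token, OPENERS, CLOSERS, bracket_at and tokenize. The point is only that a literal character and its entity form come out as the same kind of token with the same class attached.

```python
import re
from typing import NamedTuple, Optional

class Token(NamedTuple):
    kind: str            # "open", "close", or "atom"
    text: str            # source text exactly as written
    cls: Optional[str]   # class implied by the bracket's appearance, if any

# Hypothetical synonym tables: every opener is "basically a left paren",
# every closer "basically a right paren", plus a made-up class tag.
OPENERS = {"(": None, "[": "binding", "{": "delayed", "\u27e6": "immutable"}
CLOSERS = {")": None, "]": "binding", "}": "delayed", "\u27e7": "immutable"}

# Numeric character entities only, e.g. &#x27E6; or &#10214;
ENTITY = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def bracket_at(src, i):
    """Return (bracket_char, width) if a bracket or bracket entity starts at i."""
    ch = src[i]
    if ch in OPENERS or ch in CLOSERS:
        return ch, 1
    if ch == "&":  # the extra lookahead: only paid when an ampersand shows up
        m = ENTITY.match(src, i)
        if m:
            code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
            if code <= 0x10FFFF:
                decoded = chr(code)
                if decoded in OPENERS or decoded in CLOSERS:
                    return decoded, m.end() - i
    return None

def tokenize(src):
    tokens, i = [], 0
    while i < len(src):
        if src[i].isspace():
            i += 1
            continue
        hit = bracket_at(src, i)
        if hit:
            ch, width = hit
            kind = "open" if ch in OPENERS else "close"
            cls = OPENERS[ch] if kind == "open" else CLOSERS[ch]
            tokens.append(Token(kind, src[i:i + width], cls))
            i += width
            continue
        # Anything else is an atom, running up to whitespace or a bracket.
        j = i + 1
        while j < len(src) and not src[j].isspace() and not bracket_at(src, j):
            j += 1
        tokens.append(Token("atom", src[i:j], None))
        i = j
    return tokens
```

Under these assumptions, tokenize("(let &#x27E6;x 1&#x27E7;)") and the same text written with the literal characters yield the same sequence of kinds and classes; only the preserved source text differs. A reader who ignores the cls field sees nothing but plain parens.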

A diehard fan of old school syntax would be able to see all the variants as instances of the one-size-fits-all character assignments. But newbies would see more structure implied by use of varying lexical syntax. It seems easy to do without making code complex or slow, if you approach it at the level of tokenization, at the cost of slightly more lookahead in spots. As a side benefit, if you didn't want to support Unicode, you'd have a preferred way of forcing everything into char entity encoding when you wanted to look at plain text.
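Forcing everything into entity form and back might look roughly like the sketch below. It assumes hex numeric entities as the plain-text form, and the function names to_ascii and from_ascii are made up for illustration. It also assumes the source never needs a literal entity-looking string to survive as-is, which fits the premise that an entity and its character are treated as the same thing.

```python
import re

# Hex numeric character entities, e.g. &#x27E6;
HEX_ENTITY = re.compile(r"&#x([0-9A-Fa-f]+);")

def to_ascii(text):
    """Replace every non-ASCII character with a hex character entity."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in text)

def from_ascii(text):
    """Decode hex character entities back into the characters they name."""
    def repl(m):
        code = int(m.group(1), 16)
        # Leave out-of-range values untouched rather than guessing.
        return chr(code) if code <= 0x10FFFF else m.group(0)
    return HEX_ENTITY.sub(repl, text)
```

For text that starts out with no entities in it, from_ascii(to_ascii(text)) gives back the original, so either form can serve as the stored one and the other as a view.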

Note I think this is only slightly interesting. I apologize for not wanting to discuss character set nuances in any detail. Only the lossless conversion to and from alternatives with different benefits in varying contexts is interesting to me, not the specific details. The idea of having more things to pattern match visually was the appealing part.

How Useful is Erlang Hot-Swapping of Code?

Please discuss. I am interested.

MCG: A Visual Functional Programming Language

Hi everyone,

It's been a while! Some of you might remember me posting here a few years ago about my adventures with type systems and stack-based languages. For the last couple of years I have been heads down working on a visual functional programming language at Autodesk. The language is called MCG (Max Creation Graph) and is part of the commercial 3D animation, modeling and rendering software package 3ds Max. I've written a blog post about MCG that introduces the language to a technically savvy audience.

As some of you may know, visual programming languages are quite commonplace in 3D software packages. A few examples include: Houdini, Grasshopper, Softimage ICE, Fabric Engine Canvas, Dynamo, NUKE Compositing, and more. However, I can't find much mention of these languages in the literature.

Switching from designing programming languages as a hobby to doing it professionally has been very interesting! I found myself spending a lot less time consulting the literature and more of my time responding to customer needs and feedback. Now that MCG has shipped and is being used, I'm interested in reconnecting with the academic community. I am wondering if anyone has suggestions about which aspects of MCG might be of the most interest to researchers, and what type of publication I should pursue (e.g. technical report, experience report, research paper, or simply stick to non-academic publications). Any tips or suggestions would be most welcome! I'm also interested in exploring potential collaborations with researchers, so please reach out to me if this is something that might interest you.

Thanks!