"8th" - a gentle introduction to a modern Forth

Found on the ARM community's embedded blog. It seems that Forth may be making a comeback.
"8th" - a gentle introduction to a modern Forth

8th is a secure, cross-platform programming language based on Forth which lets you concentrate on your application’s logic instead of worrying about differences between platforms. It lets you write your code once, and simultaneously produce applications running on multiple platforms. Its built-in encryption helps protect your application from hackers. Its interactive nature makes debugging and testing your code much easier.

As of this writing it supports 32-bit and 64-bit variants of:

  • Windows, macOS and Linux for desktop or server systems
  • Android (32-bit Arm only) and iOS for mobile systems
  • Raspberry Pi (Raspbian etc) for embedded Linux Arm-based systems.

...
8th differs from more traditional Forths in a number of ways. First of all, it is strongly typed and has a plethora of useful types (dynamic strings, arrays, maps, queues and more).

Other differences from traditional Forth appear to include automatic memory management and some kind of signed and encrypted application deployment.

[Edit: per gasche's comment, please note that 8th appears to be closed source. From their FAQ:

"Is 8th a GPL-Licensed product? No, it is a commercial product. None of the libraries it uses are under the GPL or LGPL. Due to the desire for security, 8th includes its required libraries in the binary, and the GPL family of licenses is therefore not appropriate."
Let the arguments about the effectiveness of security-by-obscurity begin. Source is apparently available if you buy an Enterprise license and sign an NDA.]


A language whose time has passed

As Mark Twain supposedly said of a German story published in three volumes: he was still waiting for the third, because that's where all the verbs were. Forth, with its verbs at the end, leaves me feeling much the same way.

I love revisiting old languages (in some of which I invested thousands of hours programming quite happily), but nowadays I'd rather have a compiler invest a few billion CPU cycles on my behalf, because I can wait the extra half second for the compilation result. After all, my time as a programmer is worth something, too!

What do today's languages do that's so unique?

The only real difference between Forth and infix languages (presumably what you mean by compilers that do things on your behalf) is the lack of a parser.

That is it.
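To make that concrete, here is the same computation in both styles (standard Forth; the word name scaled-sum is my own):

    \ Infix: a parser must first build a tree for (a + b) * c.
    \ Postfix: the programmer supplies the evaluation order directly.
    : scaled-sum  ( a b c -- n )   >r + r> * ;

    5 3 2 scaled-sum .   \ prints 16, i.e. (5 + 3) * 2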

Honestly, that's not enough for me to reconsider using a language. There is no technical reason why a Forth compiler cannot do the same things your favorite language's compiler does on the back-end with respect to (1) providing a package or library system (which 8th seems to offer; note its aggregate types), (2) efficient compilation (which 8th, as well as other commercial Forth packages, performs), or (3) integration with host environments (block-based environments are emulated on a host filesystem, if present at all). I've even explored (in toys, admittedly) compiling Forth to SSA form and producing optimized code output. Contemporary Forth systems rarely invest that kind of effort, granted, but rarely does not equal never.

8th seems to be a rather admirable development platform. I likely won't use it myself, but not because of any "the compiler does nothing for me" objection (as you imply). I've been a Forth programmer for going on 20 years now (still am), and having used languages like C/C++ as well as Python, honestly, I prefer Python first, Forth second, and waaaaay off in the distance lies C/C++. Like, off-in-the-next-county kind of distance.

But, to each his own. It's good you have opinions on your favorite languages. But, to suggest Forth is dead because other languages "do stuff for you" is fallacious at best, and misleading at worst. There are so many other good reasons to prefer, e.g., Python or Go over Forth, but ultimately, how to express a program to the compiler's back-end isn't one of them.

Why else?

I have to respectfully disagree. If the language/compiler/etc. is not doing substantial work for me, why would I choose to use it?

I have coded in machine code, just like I have done physical labor carrying things by hand.

I have coded in assembly, just like I have done physical labor with a simple tool, e.g. a shovel.

I have coded in C, just like I have used a pulley for physical labor.

But there's a reason why a backhoe exists, why a bulldozer exists, and why a dump truck exists, despite the ubiquity of shovels and wheelbarrows.

Efficiency and productivity do count for something. To ignore the capability of a modern computer to provide dramatic productivity gains for the developer is crazy. Who cares if the compiler uses an extra few billion CPU cycles? My workstation pushes over 50 billion per second. On the other hand, if I must carefully and laboriously lay out program structure to make the compiler's job easier, just to save it a few cycles, then I am burning money.

In summary, I am not concerned with how much work the compiler writer has to do, because he/she only has to do it once. And I am not concerned (within reason) with how much work the CPU has to do to compile, because the compute capacity available to the developer is enormous.

I am not suggesting wastefulness; purposeful inefficiency is stupid. But leveraging the tools we have (fast and wide CPUs, lots of RAM, and very fast storage and networks) is simply good workmanship.

Mechanical advantage.

work distribution

I am not concerned with how much work the compiler writer has to do, because he/she only has to do it once.

As someone developing a Forth-like language, I'd like to offer a counterpoint.

A sophisticated syntax makes tooling a lot more difficult. This isn't work done only by the compiler writer. It will be repeated when you develop syntax highlighting, linting tools, IDEs, projectional editors, metaprogramming facilities, genetic programming and other program-search models, etc. It also complicates development of a lightweight interpreter, or the use of simple, easy-to-debug semantics like confluent program rewriting.

However, I agree with your premise that the system should be doing a lot of work on behalf of the user. I would only posit that maybe the compiler is not the right part of the system to shoulder this labor.

It turns out that a simple syntax works quite nicely with projectional editing. And Forth is pretty much as simple as it gets while still supporting programmer-defined behaviors; even simpler than a Lisp. Shifting the burden to projectional editors means we only pay for presentation-layer features during presentation, rather than upon every compilation or codebase analysis. It also gives us a lot more freedom to develop and experiment with syntactic sugar and problem-specific extensions without requiring special privileges from the compiler writers. We could even develop graphical syntax, such as embedding music notation, graphs, Kripke diagrams, and decision trees in the code.
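To quantify "as simple as it gets": the entire surface grammar of a Forth-like core is whitespace-separated tokens. A complete "parser" fits in one line of standard Forth (a sketch using the Forth-2012 word parse-name; the word name tokens is mine):

    \ Echo each whitespace-delimited token from the rest of the line.
    \ parse-name returns ( c-addr u ), with u=0 at end of input.
    : tokens  begin  parse-name dup  while  type space  repeat  2drop ;

    tokens dup swap over   \ prints: dup swap over

No grammar, no precedence, no nesting: everything beyond token boundaries is left to the projectional layer.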

This would have been a bad fit for languages designed before about the year 2000. But today we have web services, web sockets, widespread editors that aren't limited to text (a.k.a. browsers), and FUSE-like filesystem adapters to integrate older editing tools, all of which can make projectional editing much more accessible. If only we had the languages to leverage such features, and the will to explore them.

Core syntax vs. abstract syntax

A sophisticated syntax makes tooling a lot more difficult. This isn't work done only by the compiler writer. It will be repeated when you develop syntax highlighting, linting tools, IDEs, projectional editors, metaprogramming facilities, genetic programming and other program-search models, etc. It also complicates development of a lightweight interpreter, or the use of simple, easy-to-debug semantics like confluent program rewriting.

I don't think the trade-offs are so cut and dried. If you start adding important things "on top" of your core syntax, you have exactly the same problem: those important things need duplicating across lots of tools.

On the other hand, if you have a sophisticated syntax, you can always write your tooling in terms of an abstract syntax and let "a library" handle building that abstraction (the same way that we usually parse XML, for example). One might even view your simple core syntax as being a reification of such an abstract syntax and conclude that these two approaches aren't so different.

As an aside, I'd also argue that even if your syntax is very simple, it's probably not so simple that meta-programming over it is a good idea.

re: core vs. abstract syntax

If a concrete syntax is both sophisticated and meaningful, then its abstract syntax will also be sophisticated when compared to the abstract syntax for a simple concrete syntax. Rather than "core vs. abstract", I suggest thinking of "concrete vs. abstract" and "simple vs. sophisticated" and "core vs. shell" as three distinct categories.

You could reasonably say that my simple concrete syntax is a reification of my simple abstract syntax. I did intentionally design it to be so, or as close as possible. But I believe the simplicity matters more than the concrete/abstract aspect. A sophisticated syntax, even when abstracted by a library, has more cases and more complicated relationships between program fragments for back-end tools to process.

Projectional editing above a simple core syntax tends to work this way: assuming we have already parsed the concrete core, we then "parse" our simple core abstract syntax into a sophisticated abstract shell syntax, then render this shell to a human for editing. (The rendering may be textual or graphical in nature.) After editing, we rewrite the edited syntax back into the abstract core, then serialize and save the concrete core. Only human agents require this treatment. There may be some redundant effort, e.g. if we want to add syntax highlighting to a textual view, but it's still isolated to the front-end. And there are significant benefits from the ability to work with multiple shells, specific to the problem, the user, or the rendering engine.

As an aside, if meta-programming is needed, it will be performed over a sophisticated syntax whenever a simple one isn't available. Ideally it should use an abstract syntax either way. But a simpler abstract syntax is generally more convenient for metaprogramming (based on my experience, at least).

Simple language = intermediate representation?

What's the relationship between your simple & sophisticated syntaxes? It sounds like the simple syntax is effectively a compilation target for sophisticated projectional editors.

One advantage of abstract syntax over concrete syntax is that you can have multiple abstractions for the same syntax. For example, at a certain level of abstraction, you might just understand the hierarchical structure. At another, the binders, too. So, even if the underlying syntax is sophisticated, the tooling can treat it simply with abstraction.

Bi-directional

You could understand it as an intermediate language to which we compile the user-facing shell syntax, but only if you also understand the shell syntax in terms of decompilation from the intermediate language: compile-decompile as a form of codec. Most intermediate languages and front-end compilers aren't designed for this round-tripping.

I agree that there are plenty of advantages to working with abstract syntax within a tool. Concrete syntax is mostly for communication and storage.

Too restrictive?

Doesn't that greatly restrict what you're able to do in the sophisticated language? For example, I understand that your simple core doesn't support names, but how could you add them in a projection? I can imagine stashing annotations for the sophisticated language into core comments, but that seems a bit of a hack.

Not very restrictive

I do need to use comments to stash non-semantic information like human-provided variable names, as you've surmised. But that isn't much of a hack. The primary utility of human-meaningful variable names is as commentary for human readers, so it makes good sense to model them as comments.

I've always thought that

I've always thought that every language's stdlib should come with a parser for the language to aid in building tooling. This would seem to solve most of the problem you're alluding to without restricting oneself to simple syntaxes.

Some might object that the compiler should be free to aggressively reorganize its parser and AST without worrying about breaking parser clients, which is a sensible goal, but I don't see a problem with marking the parser as inherently unstable and breaking backwards compatibility when needed. The tooling then only works with a certain range of language versions, which is little different from most experimental abstractions. Better to have the parser available anyway to accelerate tooling development, because tooling is almost as important as the compiler itself these days.

re: libraries, compatibility

Providing libraries to parse the language is a good idea. But based on my experiences, "simple" and "abstract" are on entirely different axes, and parsing concrete syntax to an abstract syntax is only one step in tooling. See my reply to Matt.

Backwards compatibility is valuable due to how codebases grow organically, and how important the human (psychological, social, economic) aspect is to languages. Without backwards compatibility, we essentially partition the language, abandoning the people and their learning, the tools and libraries, the businesses and books. Backwards compatibility isn't something to be abandoned lightly.

One nice psychological aspect of building upon a very simple syntax is that it's relatively easy to call it "complete", such that I'm confident I won't need to change it and potentially break compatibility in the future. If I add a few non-critical syntactic features to optimize some common use case, I find it very easy to add a few more to cover another case. One language extension is a gateway to more. Complexity can aggregate rather quickly unless we're very careful and disciplined about it.

agreed, but ...

Yes, I think that each language should have its own standardized meta-library for working with the language. That should be as much a given in 2017 as a language having a compiler.

However, it sometimes doesn't help where it's needed most, i.e. when a big chunk of the tool-chain is already built in some other language, such as an editor/IDE (Eclipse, DevStudio, IDEA, etc.).

Commercial language

There probably exist niches where a "commercial language" (closed-source, and you pay for the libraries that allow embedded usage) can succeed, but not in my community -- and I don't feel like looking at the language much further if it doesn't want to share its good ideas with the rest of us.

I was curious about the 8th type system -- the sort of generic polymorphism you need for expressive concatenative programming is not exactly the same as for non-stack-based languages, so I wanted to see what compromise had been made between genericity and simplicity. I haven't found any documentation.

I think you should clearly indicate that the language is closed-source and commercial when you communicate about it, so that people for which it is a showstopper can save time.

Re: Commercial Language

Thanks for pointing that out, and sorry for the wasted time. I hadn't actually noticed it was closed source. I've updated the story accordingly.

Struck by this as well

Forth has behaviors in which blocks can exit in such a way that the stack is conditionally unwound (that is: it may or may not pop). It seems to me that this defeats purely static type checking. I suppose one could do a scheme that typed the (stack, operator instance) pairs in the style of lightweight static checking, but I suspect that in the limit it is necessary to do dynamic type checks.
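A minimal example of what I mean (standard Forth; the word name is mine):

    \ The stack effect depends on a runtime flag:
    \ true drops x, false keeps it.
    : maybe-drop  ( x flag -- x | )   if drop then ;

    42 true  maybe-drop   \ stack: empty
    42 false maybe-drop   \ stack: 42

No single static stack effect describes this word.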

Am I missing something?

Stack types

Static typing is possible for stack languages, but it does constrain the use of subprograms to those with an obviously static type. Conditional behavior needs careful attention. And you cannot easily model the pick and roll operations (which take a number saying how deep in the stack to operate) without dependent types.
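For instance, in standard Forth (the word names are mine):

    \ Statically typable: the depth is a compile-time literal.
    : third  ( a b c -- a b c a )   2 pick ;

    \ Not statically typable without dependent types:
    \ the depth arrives on the stack at runtime.
    : nth  ( xu ... x0 u -- xu ... x0 xu )   pick ;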

For my language, I use a simple annotation that enables developers to mark functions dynamic up to a given arity. Static typing is possible after some partial evaluation. This could allow static typing for code using pick and roll, for example, but only in contexts where the numeric argument is also computable statically. It's weaker and simpler than full dependent types, yet sufficient for a lot of staged metaprogramming.
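In standard Forth terms, the idea is that partial evaluation folds a computed depth into a literal one (an illustrative sketch; the word name is mine):

    \ A partial evaluator reduces  1 1 + pick  to  2 pick,
    \ after which the word types like any fixed-arity operation.
    : third'  ( a b c -- a b c a )   1 1 + pick ;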