archives

ANN: Bipedal, a new, untyped, stack-based HLL

In the seven years since I first began tinkering on what would eventually become Babel, the Internet has exploded with things called Babel. It is, unfortunately, a very popular name for things related to language. So, I've renamed the syntax layer of the language to Bipedal, while the VM is still called "Babel" or "the Babel-core".

I spoke on Babel at StrangeLoop 2013 in St. Louis on the 18th. The video will be available sometime in the next few months.

Bipedal

Bipedal is a front-end for the Babel programming langage core. Babel is essentially the VM on which Bipedal runs.

Bipedal progamming language is a general-purpose, untyped, stack-based high-level language. Despite being stack-based, it is not concatenative. It is not pure, as many operators can have side-effects. The syntax-layer is as transparent as possible, so when you look at Bipedal, you're looking at a nearly 1:1 translation to Babel bytecode.

Blitz Syntax Overview

comment:    -- this is a coment  
immediates: 'string' "string\n"  12345  0x123aced   
identifiers: non_alpha<>chars_permitted!_but-no-spaces:quotes|or|parens  
code: { "Hello, world" << }  -- << is the print operator  
list: ( 1 2 3 )  
s-expr:   
     [code "Hello, world" <<]    -- same as above  
     [list 1 2 3]                -- same as above  
     [ptr 1 [ptr 2 [ptr 3 nil]]] -- same as (1 2 3)  
     [val 1 2 3]   -- array/vector

There is more to the syntax, but this is enough to introduce some of the key concepts of the language. You can check out more examples at RosettaCode.

Bipedal as High-Level Language

There are two strikes against Bipedal's use as a HLL: 1) it is stack-based and some people are of the view that stack-based languages are only suited to intermediate-languages, 2) it is untyped. I have attempted to address each of these problems in my design of Bipedal/Babel.

Stack-noise reduction

Babel actually maintains two stacks: up-stack (ustack) and down-stack (dstack). Let the notation

x y z|

represent a stack where x has been pushed first, z last, with '|' visually marking the TOS.

Now, let us define star_n as a combinator that will print an asterisk n times, where n will be passed on TOS. One way to write the code for this is as follows:

[star_n { { '*' << } swap times }]

The order of operands for the times operator is "{ code } N times", where N is some number. Since n is being passed on TOS, this creates stack-noise where we must swap the code to be executed with the number of times it is to be executed, already on TOS.

Another way to solve this in Babel is to use the up (->) and down (<-) operators:

[star_n { <- { '*' << } -> times }]

This has an equivalent effect. A view of the stack images:

          dstack  ustack    
           n|       |       | 
            |       |n      | t
  { '*' << }|       |n      v
{ '*' << } n|       |

In this case, it is not clear that up/down are an improvement over swap - in fact, we have used an additional operation to accomplish the same task. However, the up and down operators allow the stack to be "taken apart" and processed piece-by-piece, for any number of "arguments" passed to a particular operator. To illustrate this, consider the following function in the notation of C:

foo(a, b, c)

The arguments are pushed on the stack and then foo is called. If it is possible to perform all processing on a, then b, and then c, then we can apply the following schema using the up/down operators to process foo in a similar way in Babel:

[foo { <- <-            -- (a is now TOS)  
        operate_on_a  
        ->               -- (b is now TOS)  
        operate_on_b  
        ->               -- (c is now TOS)  
        operate_on_c }]

Of course, this pattern will not solve all stack-noise problems, as it can be the case that values on the stack need to be accessed in some arbitrary order with repetitions, such as: a c b c b a c b. The up/down operators are unlikely to reduce stack noise over against the more traditional stack-rotations in this case.

So, Babel permits the use of a 'let' construct to allocate local variables:

[foo {(a b c)   
     { a c +  
     b c /  
     b a c *  
     b - - - }   
     let }]

This construct should not be confused with the let form in Lisp because it does not perform any kind of "binding" - a, b and c in this case are just pointers and the let operator merely saves/restores the pointers (using the rstack).

Another tool for tackling stack noise is the nest operator. Nest allows the construction of a stack-discipline using the following rules:

  • Current stack is saved on entry to a nested-context, except TOS which is popped off current stack and becomes TOS of the new stack
  • Prior stack is restored on exit from a nested context, except TOS which is popped off the nested stack and pushed onto the restored stack

    [bar { 'a' 'b' { << 'c' 'd' } nest }]

Invoking bar will cause 'b' to be printed to STDOUT and the following will be left on the stack:

'a' 'd'|

Nest allows a nested-section to monopolize the stack. This is useful if you want to generate a bunch of results on the stack and then collect them together into a list (you can use the collect pseudo-operator for this).

Using nest in combination with let will give something roughly analogous to an ordinary function-call stack discipline:

[foo { { (a b c) { <code> } let } nest } ]

Bipedal has a pseudo-operator called args to perform this:

[foo {(a b c) { <code> } args}]

The args pseudo-operator also assigns the elements of the list on TOS (assumed) to a, b and c.

Typing

Babel does not have any built-in type system, beyond some rudiments to differentiate between pointers and non-pointers. To me, "type" means one of two, completely different concepts:

a) The format of data. ASCII is a type specification, XML is a type specification, JPEG image format is a type specification. When you say, "this is type string", it is difficult to say what exactly you mean without talking about the format in which the string data is kept.

b) Data-flow connectors, i.e. "you can't put a square peg in a round hole", or "you can't put an int in a char" (usually). Of course, typing systems can convey much more information, but it is my contention that there is no reason why this information must be conveyed through types - for example, the "is-a" and "has-a" relationships.

What Babel implements instead of types is a tag - a tag is a hash (typically, from hashing a string) that can be attached to an otherwise untyped data-structure. Of course, tagged things can be composed, so you can have "types all the way down", if you need it. That is, you could track the fact that a string is a string, a number is a number (and its size) and so on.

What you don't get (automatically) in Babel is any kind of inference. However, you can implement your combinators in such a way that typing is always tracked through tags.

Support for prevalent data forms, I/O

Babel has built-in support for operating on UTF-8 encoded strings, numerics, arrays, lists, hash-tables, binary and n-ary trees. Bipedal files are UTF-8 formatted. File and console I/O are supported.

Unique or Notable Features

The Babel Virtual Machine is ultra-lightweight, making launching of nested virtual machines very cheap. In Babel, the VM is intended to be the natural unit of abstraction. There is no OO support built-in. The advantages of VM abstraction over OO include that the VM can be interrupted, hibernated, forked, and has configurable resource limits (such as memory-allocation, time, file-access privileges, etc.) Other unique or notable features of Babel include:

  • It is easy to containerize and launch nested Babel programs

  • Built-in crypto makes it easy to enforce a "white-list" code execution policy, reducing the risks associated with executing remote code. The idea is to invite the community of users to write their own, distributed library and to choose their own, trusted sources.

  • Your data structures are all stored in an underlying, uniform bstruct structure, making it trivial to save and restore them to and from disk, to make deep copies of them, to delete all memory associated with them, and so on.

  • You can easily compress and encrypt your program data objects, (or whole Babel programs).

  • Visualizing your data is uniquely easy with Babel's built-in support for generating Graphviz dot files. These can be post-processed with the dot tool to generate graphical snapshots of your data. This speeds up debug and helps the programmer fully absorb the semantic significance of the syntax.

  • Babel provides both Lisp-style lists and array structures, permitting you to organize your data in a manner to maximize both flexibility and performance. Modern computer architecture is array-based and can perform array lookups in constant time rather than linear time for list- traversal. But for data that is constantly changing size or continually undergoing complex permutations, lists are a clear performance winner by permitting you to perform more of your operations in-place.

  • The Babel namespace is modeled on a Unix-style directory structure, making it possible to nest data and code in arbitrarily deep and complex structures. Manipulating namespace paths permits implementation of OO-style templates, polymorphism and more.

But We Already Have Factor

My answer to this is that, while Babel is a "practical" stack-based language, it's just a completely different flavor of programming language. It's difficult to meaningfully compare very complex things like programming languages, but perhaps it could be put most succinctly this way: Babel is the Perl5 to Factor's Python. Minus Perl's labyrinthine syntax, of course.

Another way of comparing the difference is that the Babel binary is right around 100K as of this writing; it may grow as large as 200K by the time Babel is at revision 1.0, but this will mostly be the result of linking in some support for encryption and compression operators. The Babel-core is essentially finished. Babel takes a more minimalist "swiss-army-knife" approach versus the more "kitchen-sink" approach of Factor, which has a 425K core plus an incredible 66M image file of Factor library code.

Conclusion

Your feedback is welcome. Babel isn't quite ready for prime-time, but it's getting closer every day. If you have a Windows box with Cygwin and ActivePerl, you can just clone Babel from github, run make.pl and you're ready to roll. You don't need a compiler as the repo has a copy of tcc and will compile from that. If you want to try Babel, please do not hesitate to contact me and I'll help you with any questions you may have.