Feather: A Heapless Functional Programming Language

Hi all!

I'm busy designing my third language (or, redesigning the first, depending on how you look at it). I changed Feather to be a heapless functional programming language. Primarily, this means there is no need for a garbage collector. Programs for embedded devices that have real-time constraints stand to benefit the most from this. With any luck, we could get a subset to compile to some GPUs.

Here's the design as it is. Note that I use underscores for tabs whitespace due to forum restrictions -


The Feather Programming Language is a heapless functional programming language for real-time software, especially on embedded devices. It is a compiled language with a Hindley-Milner type-inferencing algorithm and strict function argument evaluation.

Feather’s lack of heap is accomplished by extending the end frame pointer for previous stack frames whose return slot uses the data of the successive frames that the end frame pointer is extended over. This elides the need for any type of garbage collection. The cost is that a bunch of dead ‘popped’ variables also get extended over, but this shouldn’t be a big problem in practice when using the techniques described next. Feather can also return closures, but that means the end frame pointer must be extended over the closed-over values, so they must be used with caution.

Non-consable return values are placed at the beginning of the next stack frame and the previous end frame pointer is extended over it.

Consable return values like lists and maps are treated with more care. When a consable value is to be returned, 8 to 32 slots are allocated at the beginning of the new stack frame, and any conses that take place on the consable return value are placed in one of those slots. (In fact, there may need to be a way to specify what the beginning slot count should be for a given function call.) Further, any conses on those conses will also be placed in one of those slots. If the return slots are finally filled, then n * 4 new slots are allocated past the last stack variable for further use, and so on. An analyzer could be hooked in at run-time to track the average and max allocations needed at each call site to eliminate most needs to do secondary (and so on) allocations for consable return values.

There may also be a special cons that allows temporary conses to be used on consable return value that are _not_ placed in the return slots. If one of these accidentally makes it into the returned cons structure, it will be wasteful due to extending the return frame end pointer over that stack variable - but will not be terminal to the program.

With these technique, I think a minimum amount of space will be wasted on dead varables, but we'll have to see for sure. Tail-call optimizations should be possible with this set up.

There will almost certainly be some restrictions on language's expressiveness due to the heaplessness, but we'll discover and work around those as we go along. For example, I'm not sure if monads will be implementable without a heap, but the syntax is there in case we can.

Some syntax -

/* multi-line comment /* nests */*/
// line comment
/// Documentation comments are done exactly the same as in F#.

// function definition syntax uses powerful mix-fixity and repetition specification
define two as 1 + 1
define double [x] as 2 * x
define [x] times [y] as x * y
define [x] *** [y] precedence 0.5 as x * x * y * y
define if [condition] then [lazy consequent] else [lazy alternate] as condition ? consequent ?? alternate // ? and ?? are the built-in ternary operator
define let ([n] be [x] *and) in [e] with (Name n, Expr e) as…
define sqrt [x] requiring x >= 0 as… // 'requiring' specifies function contract ('ensuring' might mean post-condition).

// function signatures
asFloat :: x -> y |> (Number x, Float y)
let :: ((n, x), e) -> r |> (Name n, Expr e) // function signatures only use the major word

(map)________// partial application. Same as (x, y) ~> x map y
(xs map)_____// same as x ~> xs map x
[map]________// translation to a fully curried function. Same as x ~> y ~> x map y
(f) . (g)____// function composition

() // unit
'c' // char
"string" // string
$(1, 2, 3) // list
$[1, 2, 3] // vector
${1, 2, 3} // tuple
$(("Jim", 1), ("Bob", 2)) // association list
$[["Jim", 1], ["Bob", 2]] // dictionary
${{a, b, c}} // set

double 5
2 times 3
(x + y) * z
x = 5 // equality
x : xs // cons
x | 0xff // all characters in a hex literal must be lower-case (including the 'x' separator)
true & false // and operator (non-short-circuiting version)
true && false // and operator (short-circuiting version)
set x to 5 // x is a variable (mutable)
set x to $(1..5)
set x : xs to $(1, 2, 3)
aVector !! 5 // look up 5th element
lazy 1 + 1 // use lazy evaluation. NOTE: not sure how well this will work in a heapless language…

(x, y) ~> x * x - y
steps print 1 then print 7
if 3 let x be 10 and y be 13 in x + y
$(1, 2, 3) map (+ 3) filter (isOdd) // because of white space, (+ 3) != (+3)
for x from someList and y from $(1..5) and where y isOdd then lift x * y

// recommended indentation
let
____x be 10 and
____y be 88 and
____z be 12 in
____x + y * z

// sequential expression
steps
____print 1 then
____print 7 then
____print "Hi!"

// choice expression
choose
____when x = 5 then print 22
____when x >= 5 then print 33
____when x ____else print 55

// simple match
match m
____with x : xs then x
____with ${x, x, _} then x
____else m

// multi-match
match m and n
____with x : xs and x then m
____else n

// match chain
match m
____with x : xs where x = 5 then xs
____else $()

// restricted match (makes inferred function type more specific when using a match)
match m as Int
____with 0 then 10
____with 1 then 20
____else 100

// monad comprehension
for
____x from someList and
____y from $(1..5) and
____where y isOdd then
____lift x * y

// functional types
data...________// like Haskell data
synonym..._____// like Haskell type
derivation...__// like Haskell newtype
protocol...____// like Haskell class
instance…______// like Haskell instance

----

Error handling - Uses a system similar to scheme's Condition system.

----

Literal numbers are Int by default -
5(u)b________// (U)Byte - 8-bit
5(u)s________// (U)Short - 16-bit
5(u)_________// (U)Int - 32-bit
5(u)g________// (U)Long - 64-bit
5.0__________// Float - 32-bit
5d___________// Double - 64-bit
5(u)n________// (U)NInt – Native-size integer – varies by hardware platform.

----

Function type arrows –
-> - Describes a function of inferred, unspecified, or dependent purity
=> - Describes a pure function
:> - Describes an impure function

----

Uses of semi-colon -
fn (1, 2, 3; 'a', 'b', 'c') // ; separates repeated parameters
fn2 (1, ; 'a') // ; omits optional parameters
class Thing expose attribute Age as var UInt; print 5 // ; manually ends expr scope without a newline. Note that classes aren’t in the language anymore.

----

Miscellaneous rules -
import… // very similar to Haskell's import, but much simpler and less error prone
Module.fn // qualifies fn with module name. A.b != a . b due to whitespace (and apparently capitalization). In the expression Module.(+), the parens are required for symbolic operator qualification.
shadowed fn // calls the function that the current function is shadowing… might be a special form
rest… // the rest parameters with name rest
rule: indentation defines scope
rule: symbolic operators like + and * have specifiable precedence and associativity, textual ones do not
rule: symbolic operators take precedence over all textual operators
rule: symbolic operators can be sequenced and so can textual ones, but they cannot be sequenced together in the same function specification.
rule: 'if' is a major word since it's the beginning of a sequence and 'then' and 'else' are minor words since they're in the tail of a sequence.
rule: symbolic chars cannot be mixed with alphanumeric chars in the same word, and neither can be mixed with delimiters such as (, ), {, }, [, and ]. Sadly this means we can’t have meaningful (|banana clips|) or [|oxford brackets|].
rule: words in an ongoing sequence take precedence over a single word or words beginning a new sequence.
rule: the point where a function is injected is its application site, even if it is partially applied with no arguments, like map in the expression fn (map).
rule: purity means more than just no immediate side-effects; it means referentially transparent.
rule: expressions should generally be readable from left to right and top to bottom.
rule: there are 3 types of function purity – pure, impure, and dependent. Only higher order functions can be dependent. A dependent function resolves to pure or impure depending on how it is invoked. If one or more of its parameter functions is impure, it becomes impure. Otherwise, its purity depends on whether it calls other impure functions.
rule: a function with impure contents can be forced to be have a pure type with the pure function modifier. This allows most pure functions to be inferred, and the remaining be manually declared as such.
rule: all module, type, protocol, and type / data constructor names start with an uppercase letter. This makes pattern matching much easier to use (this is what Haskell does).
rule: the only time you need to qualify a name with the module is when it collides with another imported name.
rule: operator precedence and associativity is the same as in C, but with floating point precedence values (is that a good idea if the embedded systems might not have floating-point acceleration?)
rule: like python and its recommended 'pythonic' style philosophy, there is a similar recommended philosophy in Feather. In Feather, there is always a single best way to do any one thing, and it should be obvious. This won't always be achievable, but is a good language goal.
rule: textual operators should typically be written in an active voice.

So anyways, I'll keep designing this until I get a chance to start implementing it. I've already implemented a pure functional programming language with F# (see AmlProject at SourceForge), so I've got enough confidence to start on this one. I'm thinking I'll use LLVM to build the compiler.

Hopefully this language will finally allow us to program stuff like games for embedded devices and console in a functional style.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

"I'm not sure if monads will be implementable without a heap"

Why do you think that? You have ADTs and closures, what more do you need?

Monads can be used to

Monads can be used to express real state, so I expect the approach to somehow be limited in how ADTs and closures are expressive.

The author is targeting (or talks about targeting) GPUs, actually, the only way I know how that would work is if all closures were somehow inlined at compile-time, but its hard to tell, the details of the language are still pretty raw.

I would hope the author could come up with a better design document, and talk more about examples of how the language is used and how it is similar and dissimilar to other languages out there.

Regions

This elides the need for any type of garbage collection. The cost is that a bunch of dead ‘popped’ variables also get extended over, but this shouldn’t be a big problem in practice.

What you sketch sounds like a coarse-grained variant of region-based memory management. You may want to have a look at the respective literature. At the end of the day, it turns out that it is quite a severe problem in practice (even with more fine-grained inference), and ultimately, a supplemental GC seems necessary after all.

Pre-allocated Slots for Returned Conses

To avoid extending the end frame pointer over lots of dead variables in the face of consable structures like lists and maps, I am thinking of employing the following strategy -

Non-consable return values are placed at the beginning of the next stack frame and the previous end frame pointer is extended over it.

Consable return values like lists and maps are treated with more care. When a consable value is to be returned, 8 to 32 slots are allocated at the beginning of the new stack frame, and any conses that take place on the consable return value are placed in one of those slots. (In fact, there may need to be a way to specify what the beginning slot count should be for a given function call.) Further, any conses on those conses will also be placed in one of those slots. If the return slots are finally filled, then n * 4 new slots are allocated past the last stack variable for further use, and so on. An analyzer could be hooked in at run-time to track the average and max allocations needed at each call site to eliminate most needs to do secondary (and so on) allocations for consable return values.

There may also be a special cons operator that allows temporary conses to be used on consable return value that are _not_ placed in the return slots. If one of these accidentally makes it into the returned cons structure, it will be wasteful due to extending the return frame end pointer over that stack variable - but will not be terminal to the program.

With these technique, I think a minimum amount of space will be wasted on dead varables, but we'll have to see for sure. This will also help performance by keeping related variables together in memory.

EDIT: I'm going to place this in the spec so it's more informative.
EDIT: many further edits.

it looks like

Cheney on the MTA.

What is the current status

What is the current status of regions? I've read a couple of different things:

  • Regions work fairly well but you need to write your program while keeping in mind how the regions will be allocated, and manually do some deep copies (~= manually doing copying GC).
  • You still need a real GC, and then the advantage of regions is largely negated: regions + GC is slower than just GC.
  • The Stalin compiler seems to use region memory management plus a conservative garbage collector. Apparently almost all garbage is reclaimed by regions, so a slow GC is not such a problem.
  • In LLVM, pool allocation can give large speedups because pools improve cache behavior (and less administrative data structures are required compared to a general purpose malloc).

I'm not sure how this would

I'm not sure how this would be directly related to regions. It's merely a single, completely contiguous stack that grows in a single direction in an intelligently-automated or user-directed way, and shrinks when all values in a calculation (other than the result) are popped.

Ah well.

If you're not going to support a heap, I assume you're going to support a stack, or rather I assume your language is one of the stack based languages.

All stack based languages have the same problem: In case some large binary object, like an array or a complex tree, is present on the stack as an intermediate value, you end up copying the darned thing with every darned simple update on that value.

There's a reason we have heaps...

There are alternative to

There are alternative to stack-based and heap-based languages. MapReduce and CUDA come to mind, data flow is mostly implicit with respect to the computations.

CUDA heapless?

I do understand that the device-side part of CUDA is basically C/C++ minus any of the features that might allocate heap memory (e.g., malloc, new). That said, I would never consider it to be any less stack+heap based than C. The various address-space qualifiers in the CUDA C language make things somewhat explicit: e.g., local memory is on the stack, while global memory is in the heap.

Could you elaborate on what features in CUDA you see as an alternative to stack/heap allocation?

I found it to be a very

I found it to be a very symbiotic model. The CPU-side does stack/heap manipulation, and then the GPU side is essentially a big parallel-for (one data-parallel stack frame).

It is also a step backwards: the CPU code manually manipulates the heap, such as going from scratchpad to texture memory. This is probably a good choice in a systemsy way; it has proven itself as an assembly target for higher-level GPU languages.

Ok, as of 2.0, you can

Ok, as of 2.0, you can in-kernel allocate memory on small fixed-sized stacks or heaps (i.e., they support pointers, overall sizes must be specified statically); so malloc and free are supported. Before, all buffers had to be pre-allocated on the CPU, and you still should do that if the buffers are somewhat large.

Persistent data structures

Persistent data structures could make that problem go away with some constant overhead. Easiest read I know for grokking the concept if you're not familiar: http://blog.higher-order.net/2009/02/01/understanding-clojures-persistentvector-implementation/

(A heap would still exist under the hood, but not exposed in the language)