Feather: A Heapless Functional Programming Language

Hi all!

I'm busy designing my third language (or, redesigning the first, depending on how you look at it). I changed Feather to be a heapless functional programming language. Primarily, this means there is no need for a garbage collector. Programs for embedded devices that have real-time constraints stand to benefit the most from this. With any luck, we could get a subset to compile to some GPUs.

Here's the design as it is. Note that I use underscores for tabs whitespace due to forum restrictions -

The Feather Programming Language is a heapless functional programming language for real-time software, especially on embedded devices. It is a compiled language with a Hindley-Milner type-inferencing algorithm and strict function argument evaluation.


Featherâ€™s lack of heap is accomplished by extending the end frame pointer for previous stack frames whose return slot uses the data of the successive frames that the end frame pointer is extended over. This elides the need for any type of garbage collection. The cost is that a bunch of dead â€˜poppedâ€™ variables also get extended over, but this shouldnâ€™t be a big problem in practice when using the techniques described next. Feather can also return closures, but that means the end frame pointer must be extended over the closed-over values, so they must be used with caution.
Non-consable return values are placed at the beginning of the next stack frame and the previous end frame pointer is extended over it.
Consable return values like lists and maps are treated with more care. When a consable value is to be returned, 8 to 32 slots are allocated at the beginning of the new stack frame, and any conses that take place on the consable return value are placed in one of those slots. (In fact, there may need to be a way to specify what the beginning slot count should be for a given function call.) Further, any conses on those conses will also be placed in one of those slots. If the return slots are finally filled, then n * 4 new slots are allocated past the last stack variable for further use, and so on. An analyzer could be hooked in at run-time to track the average and max allocations needed at each call site to eliminate most needs to do secondary (and so on) allocations for consable return values.
There may also be a special cons that allows temporary conses to be used on consable return value that are _not_ placed in the return slots. If one of these accidentally makes it into the returned cons structure, it will be wasteful due to extending the return frame end pointer over that stack variable - but will not be terminal to the program.
With these technique, I think a minimum amount of space will be wasted on dead varables, but we'll have to see for sure. Tail-call optimizations should be possible with this set up.
There will almost certainly be some restrictions on language's expressiveness due to the heaplessness, but we'll discover and work around those as we go along. For example, I'm not sure if monads will be implementable without a heap, but the syntax is there in case we can.
Some syntax -
/* multi-line comment /* nests */*/

// line comment

/// Documentation comments are done exactly the same as in F#.
// function definition syntax uses powerful mix-fixity and repetition specification

define two as 1 + 1

define double [x] as 2 * x

define [x] times [y] as x * y

define [x] *** [y] precedence 0.5 as x * x * y * y

define if [condition] then [lazy consequent] else [lazy alternate] as condition ? consequent ?? alternate // ? and ?? are the built-in ternary operator

define let ([n] be [x] *and) in [e] with (Name n, Expr e) asâ€¦

define sqrt [x] requiring x >= 0 asâ€¦ // 'requiring' specifies function contract ('ensuring' might mean post-condition).
// function signatures

asFloat :: x -> y |> (Number x, Float y)

let :: ((n, x), e) -> r |> (Name n, Expr e) // function signatures only use the major word
(map)________// partial application. Same as (x, y) ~> x map y

(xs map)_____// same as x ~> xs map x

[map]________// translation to a fully curried function. Same as x ~> y ~> x map y

(f) . (g)____// function composition
() // unit

'c' // char

"string" // string

$(1, 2, 3) // list

$[1, 2, 3] // vector

${1, 2, 3} // tuple

$(("Jim", 1), ("Bob", 2)) // association list

$[["Jim", 1], ["Bob", 2]] // dictionary

${{a, b, c}} // set
double 5

2 times 3

(x + y) * z

x = 5 // equality

x : xs // cons

x | 0xff // all characters in a hex literal must be lower-case (including the 'x' separator)

true & false // and operator (non-short-circuiting version)

true && false // and operator (short-circuiting version)

set x to 5 // x is a variable (mutable)

set x to $(1..5)

set x : xs to $(1, 2, 3)

aVector !! 5 // look up 5th element

lazy 1 + 1 // use lazy evaluation. NOTE: not sure how well this will work in a heapless languageâ€¦
(x, y) ~> x * x - y

steps print 1 then print 7

if 3 
let x be 10 and y be 13 in x + y

$(1, 2, 3) map (+ 3) filter (isOdd) // because of white space, (+ 3) != (+3)

for x from someList and y from $(1..5) and where y isOdd then lift x * y
// recommended indentation

let

____x be 10 and

____y be 88 and

____z be 12 in

____x + y * z
// sequential expression

steps

____print 1 then

____print 7 then

____print "Hi!"
// choice expression

choose

____when x = 5 then print 22

____when x >= 5 then print 33

____when x 
____else print 55
// simple match

match m

____with x : xs then x

____with ${x, x, _} then x

____else m
// multi-match

match m and n

____with x : xs and x then m

____else n
// match chain

match m

____with x : xs where x = 5 then xs

____else $()
// restricted match (makes inferred function type more specific when using a match)

match m as Int

____with 0 then 10

____with 1 then 20

____else 100
// monad comprehension

for

____x from someList and

____y from $(1..5) and

____where y isOdd then

____lift x * y
// functional types

data...________// like Haskell data

synonym..._____// like Haskell type

derivation...__// like Haskell newtype

protocol...____// like Haskell class

instanceâ€¦______// like Haskell instance
----
Error handling - Uses a system similar to scheme's Condition system.
----
Literal numbers are Int by default -

5(u)b________// (U)Byte - 8-bit

5(u)s________// (U)Short - 16-bit

5(u)_________// (U)Int - 32-bit

5(u)g________// (U)Long - 64-bit

5.0__________// Float - 32-bit

5d___________// Double - 64-bit

5(u)n________// (U)NInt â€“ Native-size integer â€“ varies by hardware platform.
----
Function type arrows â€“

-> - Describes a function of inferred, unspecified, or dependent purity

=> - Describes a pure function

:> - Describes an impure function
----
Uses of semi-colon -

fn (1, 2, 3; 'a', 'b', 'c') // ; separates repeated parameters

fn2 (1, ; 'a') // ; omits optional parameters

class Thing expose attribute Age as var UInt; print 5 // ; manually ends expr scope without a newline. Note that classes arenâ€™t in the language anymore.
----

Miscellaneous rules - importâ€¦ // very similar to Haskell's import, but much simpler and less error prone Module.fn // qualifies fn with module name. A.b != a . b due to whitespace (and apparently capitalization). In the expression Module.(+), the parens are required for symbolic operator qualification. shadowed fn // calls the function that the current function is shadowingâ€¦ might be a special form restâ€¦ // the rest parameters with name rest rule: indentation defines scope rule: symbolic operators like + and * have specifiable precedence and associativity, textual ones do not rule: symbolic operators take precedence over all textual operators rule: symbolic operators can be sequenced and so can textual ones, but they cannot be sequenced together in the same function specification. rule: 'if' is a major word since it's the beginning of a sequence and 'then' and 'else' are minor words since they're in the tail of a sequence. rule: symbolic chars cannot be mixed with alphanumeric chars in the same word, and neither can be mixed with delimiters such as (, ), {, }, [, and ]. Sadly this means we canâ€™t have meaningful (|banana clips|) or [|oxford brackets|]. rule: words in an ongoing sequence take precedence over a single word or words beginning a new sequence. rule: the point where a function is injected is its application site, even if it is partially applied with no arguments, like map in the expression fn (map). rule: purity means more than just no immediate side-effects; it means referentially transparent. rule: expressions should generally be readable from left to right and top to bottom. rule: there are 3 types of function purity â€“ pure, impure, and dependent. Only higher order functions can be dependent. A dependent function resolves to pure or impure depending on how it is invoked. If one or more of its parameter functions is impure, it becomes impure. Otherwise, its purity depends on whether it calls other impure functions. rule: a function with impure contents can be forced to be have a pure type with the pure function modifier. This allows most pure functions to be inferred, and the remaining be manually declared as such. rule: all module, type, protocol, and type / data constructor names start with an uppercase letter. This makes pattern matching much easier to use (this is what Haskell does). rule: the only time you need to qualify a name with the module is when it collides with another imported name. rule: operator precedence and associativity is the same as in C, but with floating point precedence values (is that a good idea if the embedded systems might not have floating-point acceleration?) rule: like python and its recommended 'pythonic' style philosophy, there is a similar recommended philosophy in Feather. In Feather, there is always a single best way to do any one thing, and it should be obvious. This won't always be achievable, but is a good language goal. rule: textual operators should typically be written in an active voice.

So anyways, I'll keep designing this until I get a chance to start implementing it. I've already implemented a pure functional programming language with F# (see AmlProject at SourceForge), so I've got enough confidence to start on this one. I'm thinking I'll use LLVM to build the compiler.

Hopefully this language will finally allow us to program stuff like games for embedded devices and console in a functional style.

"I'm not sure if monads will be implementable without a heap"

Why do you think that? You have ADTs and closures, what more do you need?

By Sjoerd Visscher at Mon, 2012-05-07 10:05 | login or register to post comments

Monads can be used to

Monads can be used to express real state, so I expect the approach to somehow be limited in how ADTs and closures are expressive.

The author is targeting (or talks about targeting) GPUs, actually, the only way I know how that would work is if all closures were somehow inlined at compile-time, but its hard to tell, the details of the language are still pretty raw.

I would hope the author could come up with a better design document, and talk more about examples of how the language is used and how it is similar and dissimilar to other languages out there.

By Sean McDirmid at Mon, 2012-05-07 10:53 | login or register to post comments

Regions

This elides the need for any type of garbage collection. The cost is that a bunch of dead â€˜poppedâ€™ variables also get extended over, but this shouldnâ€™t be a big problem in practice.

What you sketch sounds like a coarse-grained variant of region-based memory management. You may want to have a look at the respective literature. At the end of the day, it turns out that it is quite a severe problem in practice (even with more fine-grained inference), and ultimately, a supplemental GC seems necessary after all.

By Andreas Rossberg at Mon, 2012-05-07 10:52 | login or register to post comments

Pre-allocated Slots for Returned Conses

To avoid extending the end frame pointer over lots of dead variables in the face of consable structures like lists and maps, I am thinking of employing the following strategy -

Non-consable return values are placed at the beginning of the next stack frame and the previous end frame pointer is extended over it.

Consable return values like lists and maps are treated with more care. When a consable value is to be returned, 8 to 32 slots are allocated at the beginning of the new stack frame, and any conses that take place on the consable return value are placed in one of those slots. (In fact, there may need to be a way to specify what the beginning slot count should be for a given function call.) Further, any conses on those conses will also be placed in one of those slots. If the return slots are finally filled, then n * 4 new slots are allocated past the last stack variable for further use, and so on. An analyzer could be hooked in at run-time to track the average and max allocations needed at each call site to eliminate most needs to do secondary (and so on) allocations for consable return values.

There may also be a special cons operator that allows temporary conses to be used on consable return value that are _not_ placed in the return slots. If one of these accidentally makes it into the returned cons structure, it will be wasteful due to extending the return frame end pointer over that stack variable - but will not be terminal to the program.

With these technique, I think a minimum amount of space will be wasted on dead varables, but we'll have to see for sure. This will also help performance by keeping related variables together in memory.

EDIT: I'm going to place this in the spec so it's more informative.
EDIT: many further edits.

By bryanedds at Mon, 2012-05-07 12:05 | login or register to post comments

it looks like

Cheney on the MTA.

By astrobe at Mon, 2012-05-07 12:23 | login or register to post comments

What is the current status

What is the current status of regions? I've read a couple of different things:

Regions work fairly well but you need to write your program while keeping in mind how the regions will be allocated, and manually do some deep copies (~= manually doing copying GC).
You still need a real GC, and then the advantage of regions is largely negated: regions + GC is slower than just GC.
The Stalin compiler seems to use region memory management plus a conservative garbage collector. Apparently almost all garbage is reclaimed by regions, so a slow GC is not such a problem.
In LLVM, pool allocation can give large speedups because pools improve cache behavior (and less administrative data structures are required compared to a general purpose malloc).

By Jules Jacobs at Mon, 2012-05-07 14:17 | login or register to post comments

I'm not sure how this would

I'm not sure how this would be directly related to regions. It's merely a single, completely contiguous stack that grows in a single direction in an intelligently-automated or user-directed way, and shrinks when all values in a calculation (other than the result) are popped.

By bryanedds at Mon, 2012-05-07 17:25 | login or register to post comments

Ah well.

If you're not going to support a heap, I assume you're going to support a stack, or rather I assume your language is one of the stack based languages.

All stack based languages have the same problem: In case some large binary object, like an array or a complex tree, is present on the stack as an intermediate value, you end up copying the darned thing with every darned simple update on that value.

There's a reason we have heaps...

By marco at Wed, 2012-05-09 01:59 | login or register to post comments

There are alternative to

There are alternative to stack-based and heap-based languages. MapReduce and CUDA come to mind, data flow is mostly implicit with respect to the computations.

By Sean McDirmid at Wed, 2012-05-09 02:18 | login or register to post comments

CUDA heapless?

I do understand that the device-side part of CUDA is basically C/C++ minus any of the features that might allocate heap memory (e.g., malloc, new). That said, I would never consider it to be any less stack+heap based than C. The various address-space qualifiers in the CUDA C language make things somewhat explicit: e.g., local memory is on the stack, while global memory is in the heap.

Could you elaborate on what features in CUDA you see as an alternative to stack/heap allocation?

By Tess Foley at Wed, 2012-05-09 16:59 | login or register to post comments

I found it to be a very

I found it to be a very symbiotic model. The CPU-side does stack/heap manipulation, and then the GPU side is essentially a big parallel-for (one data-parallel stack frame).

It is also a step backwards: the CPU code manually manipulates the heap, such as going from scratchpad to texture memory. This is probably a good choice in a systemsy way; it has proven itself as an assembly target for higher-level GPU languages.

By lmeyerov at Wed, 2012-05-09 21:33 | login or register to post comments

Ok, as of 2.0, you can

Ok, as of 2.0, you can in-kernel allocate memory on small fixed-sized stacks or heaps (i.e., they support pointers, overall sizes must be specified statically); so malloc and free are supported. Before, all buffers had to be pre-allocated on the CPU, and you still should do that if the buffers are somewhat large.

By Sean McDirmid at Wed, 2012-05-09 23:32 | login or register to post comments

Persistent data structures

Persistent data structures could make that problem go away with some constant overhead. Easiest read I know for grokking the concept if you're not familiar: http://blog.higher-order.net/2009/02/01/understanding-clojures-persistentvector-implementation/

(A heap would still exist under the hood, but not exposed in the language)

By dataangel at Tue, 2012-05-22 15:31 | login or register to post comments

Lambda the Ultimate

User login

Navigation