Closures for C

Chris Lattner's post last week to cfe-dev, "Blocks" in Clang (aka closures), has generated a certain amount of enthusiasm, since Clang is, or will be, a compiler for C to LLVM (cf. LLVM 1.5 has been released!).

The representation of closures as blocks resembles the technique used for Poletto's 'C language; cf. the related discussion on The MetaC Language.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Tick C

This is not the same subject, but since you mentioned it I want to say that `C seems cool! Any news of this project?

I would be interested from the perspective of a C target for a language compiler. That beats C macros.

Extract:

void cspec mk_factorial(void)  { /* Create a cspec for factorial(n) */
    unsigned int vspec n = param(unsigned int, 0); /* Argument */
    return `{ if (n > 2) return n * self(unsigned int /* Return type*/, n-1)
              else return n;
            };
}

Moribund 'C

Everything you need to know about 'C can be found at Poletto's tcc page I linked to in the story; in particular for trying the language out, that tcc doesn't target x86, isn't currently maintained, and that 'C isn't supported, so far as I know, by any other compiler.

The best paper for introducing the language is:

‘C: A Language for High-Level, Efficient, and Machine-Independent Dynamic Code Generation (1996). Dawson R. Engler, Wilson C. Hsieh, and M. Frans Kaashoek. In Proc. 23rd POPL conference.

It would be excellent if someone were to take tcc under their arm and target x86, or add 'C support to another compiler...

In particular...

...Clang and LLVM would seem to be natural building-blocks for a modern re-implementation of `C.

In particular...

...Clang and LLVM would seem to be natural building-blocks for a modern re-implementation of `C.

Maybe

I certainly thought of this, although I guess there would be more work to support 'C for Clang than to add x86 as a target for VCODE, the VM behind tcc, because the runtime for 'C has to be able to dynamically compile C code, and the tradeoffs characterising a good dynamic compiler are quite different to those for a good static compiler.. VCODE appears to do well, according to the discussion in:

Dawson Engler (1996). VCODE: a retargetable, extensible, very fast dynamic code generation system. In Proc. conf. PLDI '96.

Check the exokernel tree

There actually was an x86 implementation for VCODE at one time.

I don't think VCODE is all

I don't think VCODE is all that different from GNU Lightning. The latter supports x86 and x86-64 on svn, so it's definitely out there.

Good call

Incidentally, the GNU lightning manual says VCODE "directly inspired gnu lightning", for avoiding an intermediate representation.

A problem with VCODE -- bad

A problem with VCODE -- bad enough that I felt the PLDI committee was asleep at the switch on that paper -- was that the "11-12 instructions per target instruction" was nonsense. That number assumed no buffer overrun check. But that issue aside, VCODE was good work.

When I looked at GNU lightning, it did seem to me that the register models were a bit different, and that the GNU lightning approach might not be as amenable to fast+good code generation.

If you look at the new

If you look at the new standards for C++0x, you will get lambda and closures.

Closures in C++0x

For those interested, description of the feature here.

And the kitchen sink!

C++0x will, AFAIK, support every useful programming-language feature available to a multi-paradigm, impure language. They really need to learn when and where to stop.

GCC nested functions

GCC C supports nested functions.

These are stack-allocated "downward" closures, IOW they cannot escape their scope of definition "upwards". (But I think this is actually the right way to do it in C.)

What's very nice about them is that they are fully compatible with ordinary C function pointers. Existing APIs taking function pointers as arguments can be called with these nested functions.

On the downside, there are no anonymous functions/lambdas, so you have to define nested functions before using them.

(I find the implementation quite interesting, too: when you don't pass a nested function around, the compiler statically knows the stack offsets of closed over variables, and the "closure" is emitted as basically a normal C function. When you do pass the closure around, it can/will be called at statically unknown stack depths; thus the compiler creates a "trampoline" function (on the stack) that contains information about the definition stack depth and that dispatches to the original code (which is rewritten by the compiler to use this "stack depth" information to access closed over variables). Or something like that :P)

One minor disadvantage with

One minor disadvantage with nested functions as defined by GNU C is that they sometimes cause some common syntax errors - a missing closing curly bracket, for instance - to be reported far away from the actual source of the error and in a way that may be difficult to understand, even if nested functions are never used.

While I agree that Gcc's nested functions are nice, they do make the code less portable, and sometimes considerably less efficient. The reason is that on may platforms, expensive cache flushing instructions - sometimes even system calls! - are inserted where the trampolines are generated, to guarantee cache coherence.

On platforms where C function pointers do not point to the code directly but to a descriptor (a small structure containing a code pointer and one or more data pointers, typically to constant pools), this is not a problem because only a descriptor needs to be generated, no executable code. This is of course (more or less) how blocks in Clang work.

Closures exist in C++

Closures have existed in C++ for ages.
Why stick to C which ceased evolving a long time ago?

Closures exist in C++

That is not true sorry, C still evolving and almost everything it's made in C, as OS's kernels, Web server's, Networking Systems, Games, Office suit's, Desktop's Managers, mobile phones, Embedded Systems(wash machines, satellites, cars, GPS's), Programming languages, Compiler's, etc.

I'm not sure why people look at C++, I was a C++ programmer but I don't do it anymore, it's an abomination of programming language and too complex for mortals.

C is much more simple to understand and program, it's compact and it have the power and with closures will rock so I think C still evolving, About OOP it is not a panacea, Anyway you can do OOP with C if you really need it or take Java or Python or even Lisp with CLOS it is much better solution and more healthy.

Regards.

[Admin]

The style of the above post is unacceptable on LtU.

so close...

I've tinkered over many years with various approximations of "closures in C". The feature described here sounds pretty great -- pretty useful. That's about what you come up with, no matter which approach you take to how to implement it or with what particular syntactic sugar.

Yet, in my opinion, this is the wrong way to do it (for C) on a subtle point: the introduction of the "__block" qualifier and reference counting GC for frames moved automatically to the stack is very "un-C-like".

One of the distinguishing characteristics of C is that, aside from stack variables, it does not automate memory management. That flexibility is worth a lot. If anything, C could be "more C-like" itself if it didn't even have a built-in stack but instead synthesized one out of macros. Regardless of whether or not C should even have a stack, it shouldn't have automagic migration from stack to heap combined with reference counting -- there are alternatives and programmers should be free to pick among them and implement what they need. (I get similarly annoyed at C++ for "building in" vtables.) (These opinions aren't meant to be based on some deep principle of language theory; they are aesthetic opinions from a working C programmer.)

All that's really needed for the essence is a structure-like family of types whose pointers can be cast to function pointers and back again, combined with a way to write anonymous functions in the scope of the closure. In pseudo-code, imagine code like these fragments:

/* declare a closure type (and an instance) */
struct foo_closure int (*)(int x) {
  int y;
} a_cl;

/* Init the instance using some anonymous code */
struct foo_closure
make_adder (int my_y)
{
  struct foo_closure result;
  int x = 42;  /* this x is an unused variable (showing scoping) */
  result.y = my_y;
  result.() = { return x + y }; /* that's the struct's x, not local x */
  return result;
}

/* sample use: */
   struct foo_closure test = make_adder (1);

   printf ("%d\n", (&test)(6));   /* prints 7, not 43 */

/* casting to a function pointer: */
   int (*test_2) (int) = &test;
   printf ("%d\n", test_2(6));   /* same result */

/* casting back (dangerous if not used correctly) */
   struct foo_closure * f = test_2;
   printf ("%d\n", f->y); /* prints 1 */

With that, you'd have much greater flexibility about the memory mgt. of C closures and you could do everything described for clang using macros (and shift the "debate" to a need for something nicer than just CPP).

The dangerous "cast back from function pointer to closure" gives garbage if a function pointer is used which is not a closure of the appropriate type. This kind of danger is already present in C by the rule that says you can cast a pointer to a value of the type of the first field of a structure type to a pointer to that structure type.

-t

foo_closure

Regardless of whether or not C should even have a stack, it shouldn't have automagic migration from stack to heap combined with reference counting -- there are alternatives and programmers should be free to pick among them and implement what they need.

All that's really needed for the essence is a structure-like family of types whose pointers can be cast to function pointers and back again, combined with a way to write anonymous functions in the scope of the closure.

I agree with both these points; I'm sorry to say, though, that I can't follow your illustrative pseudocode. Could you explain what exactly is meant to happen when I make a declaration of the form "struct foo_closure ...", and what role the instance a_cl is meant to play?

ick... danger of speaking via pseudocode

I agree with both these points; I'm sorry to say, though, that I can't follow your illustrative pseudocode. Could you explain what exactly is meant to happen when I make a declaration of the form "struct foo_closure ...", and what role the instance a_cl is meant to play?

Sorry, I hoped that would be clear and I also made a mistake.

The pseudo-code fails to use a_cl. A draft did use a_cl but I edited out all but that declaration. So it is a dead variable. a_cl is just an accidental red herring and we can ignore it. Very sorry.

The other question: what does it mean to declare a structure "struct foo int (*) (int x)": that was what I was trying to show by illustration.

Essentially, the function prototype added after the structure type name indicates that the structure begins with an implicit field, dubbed "()" as in "result.() = ...". The implicit field contains a pointer to code.

Suppose we are given a pointer to such a structure. In the pseudo-code, the expression "&test" is such a pointer. We can use that pointer in two ways illustrated by "test->y" to refer to the closed over value of of y and "test(42)" to call the code pointed at by the implicit ".()" field passing the pointer "&test" as an implicit paramter.

Before I posted this reply, user "naasking" made the comment just below. He's got it right (and I've a reply to his comment that I'll post in a moment). This *does* basically represent a change in calling conventions in C but the change isn't so bad... it's an upwards-compatible extension, so to speak.

-t

I don't see how this could

I don't see how this would work. The struct arguments must be passed to the function pointer unless you're using code generation to propagate the struct arguments into the function body at the time of definition.

Otherwise, all closures require an additional parameter, the environment, where ordinary functions don't. Unless you're suggesting that C change all functions calls to be called with an extra environment parameter, and ordinary function pointer just pass an empty environment. Not sure that would fly with most C users.

No, you've got it right...

Yes, I'm suggesting a change to the calling convention and this new implicit paramter. I also suggested ( :-) that C would be *even better* if we also eliminated the implicit stack pointer parameter. So, I'll take one new implicit param in the calling convention and happily give up the one that's already there. Hmm. Darn it, you've given me an idea.... [to go work on, not burble about here, today].

I never claimed that most C users would be instantly excited by this :-)

-t

Ah, so this calling

Ah, so this calling convention is only for calls via function pointers then? I can see difficulties with repeated closure application since C allows arbitrary mutation of arguments. If taking a function pointer implicitly allocates a frame for all of its arguments, then partial application also becomes doable with only a type coercion. There are perhaps difficulties with pointer arguments to stack allocated data though. It does have some interesting possibilities.

[Edit: I meant partial application, not currying.]

C closure notion

I'm not certain I get you naasking but we can make things more concrete:

A decl like "struct foo int (*) (int)" describes a structure with a code pointer in the ".()" field, plus any additional fields described in the structure body.

Suppose I have a variable x declared "struct foo x;". Then "&x" is a pointer to the stack or heap or global variables -- wherever x is declared. That pointer can be used in ordinary ways like "x->y" to refer to the "y" field of the structure.

The pointer "&x" can be cast to a function pointer of type "int (*) int". When calling a function through that pointer, the ".()" field of the structure is used to find the function to actually transfer control to and the structure pointer itself ("&x") is added as an extra parameter. So, the code pointed to has, in some sense, the type "int (*) (struct foo *, int)".

The simplest calling convention change is to say that for all normal functions (not closures) an expression like "&fn" doesn't return the address of the code directly -- it returns a pointer to a static, constant structure associated with the function fn. That static constant struture has a type like: "struct fn_type int (*) (int) {};" In other words, it is a closure with nothing closed over -- just the code pointer.

So, yes... you can do partial application. Yes, there are perhaps difficulties with poitner arguments to stack allocated closures.

The mechanism is "just enough" to give the essence of closures (through C eyes) but not so much as to impose memory mgt. policy etc. It's "just enough" that if it were present people would be *scrambling* for something better than CPP as fast as they could. :-)

-t

Implicit context parameter

The way I imagined it (not in the context of C), the implicit context would be either a pointer to the stack or a pointer to a heap-allocated region. The allocation would occur when the function pointer escapes the current function scope.

In either case data would be laid out the same way in memory.

Looks like LLVM's closures

Looks like LLVM's closures for C is finally getting some real-world use, seeing as how it looks like it's part of Apple's "Grand Central Station" concurrency framework which will be released to the world later this year. Here's a declaration of a "block":

x = ^{ printf("hello world\n"); }

Everyone's firmly on the concurrency bandwagon now.