C runtime type info gimmick which supports scripting

I am experimenting with runtime type information in C, so that values in memory have types, rather like Lisp and Smalltalk values have types, and not just variables. This post is intended to be fun. (The idea is for you to enjoy this, and I don't want to argue. But I might respond to remarks with short dialogs in lieu of argument, since that would be fun.) The rest of this is explanation and some detail.

In an earlier post I described a kind of heap I call a vat. It associates metainfo with every block of memory allocated, stored at a different position in memory. (A constant-time arithmetic calculation turns a block pointer into a metainfo pointer, and vice versa, so the main access penalty is an extra cache line miss, which happens more often because the metainfo is not adjacent to its block.) A contiguous memory block is a rod, while the fixed-size metainfo describing it is a knob. A counted reference to a rod (or to any point within it) is a hand. On a 64-bit platform, a knob is 128 bits, a hand is also 128 bits, and rods vary in size.
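To make the vocabulary concrete, here is a minimal sketch of a constant-time mapping between a rod and its knob. It assumes a hypothetical geometry that the post does not describe: rods start on fixed 64-byte granule boundaries, and knobs live in a separate array indexed by granule number. The names `vat`, `vat_knob_of`, `vat_rod_of`, and the knob and hand field packing are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vat geometry: rods start on VAT_GRANULE boundaries, and the
   knob for the rod starting in granule i is knobs[i]. */
#define VAT_GRANULE 64u

typedef struct knob {      /* 128 bits of metainfo per rod (packing assumed) */
    uint64_t meta;         /* generation, plan id, flags                     */
    uint64_t size_and_rc;  /* rod size and refcount                          */
} knob;
_Static_assert(sizeof(knob) == 16, "a knob is 128 bits");

typedef struct hand {      /* counted reference: 128 bits on 64-bit targets */
    void    *ptr;          /* the rod, or some point inside it              */
    uint64_t copy;         /* bits copied from the knob when made           */
} hand;

typedef struct vat {
    knob          *knobs;  /* metainfo, kept apart from the rods */
    unsigned char *rods;   /* start of the rod area              */
    size_t         ngranules;
} vat;

/* Rod start -> knob: a subtraction and a divide by a power of two. */
static knob *vat_knob_of(const vat *v, const void *rod) {
    size_t off = (size_t)((const unsigned char *)rod - v->rods);
    assert(off % VAT_GRANULE == 0 && off / VAT_GRANULE < v->ngranules);
    return &v->knobs[off / VAT_GRANULE];
}

/* Knob -> rod start: the inverse calculation. */
static void *vat_rod_of(const vat *v, const knob *k) {
    size_t i = (size_t)(k - v->knobs);
    assert(i < v->ngranules);
    return v->rods + i * VAT_GRANULE;
}
```

In this layout, finding a knob never touches the rod itself. Because the knob array sits apart from the rods, an access that needs both can cost the extra cache line miss mentioned above.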

Some bits of a hand are a copy of bits in the knob, so a reference is only valid when it agrees with the current self-description of an object, particularly the current 16-bit generation number. This lets dangling refs be detected. Another 20 of the copied bits are the id of the object's plan, which describes the type, the layout, and any dynamic behavior if you also supply methods for that type in hashmap entries. You can think of a plan as morally identical to a class in Smalltalk, but initially I was aiming for perfect format description on a field-by-field basis, with recursion grounded in primitive native types. I wanted to write centralized code to debug-print anything based on its plan. But then I sort of noticed I would be able to write Smalltalk-style code that works on the C objects, if you were willing to call a dynamic API to send messages (method selectors with parameters).
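The sketch below shows how copying knob bits into a hand can catch dangling references. The packing (generation in bits 0 to 15 and plan id in bits 16 to 35 of one knob word) and every function name here are assumptions for illustration, not the vat's actual layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing of the knob word that hands copy:
   bits 0..15 generation, bits 16..35 plan id. */
#define GEN_BITS  16u
#define PLAN_BITS 20u
#define GEN_MASK  ((UINT64_C(1) << GEN_BITS) - 1)
#define PLAN_MASK ((UINT64_C(1) << PLAN_BITS) - 1)
#define COPY_MASK ((UINT64_C(1) << (GEN_BITS + PLAN_BITS)) - 1)

typedef struct knob { uint64_t meta; uint64_t extra; } knob;
typedef struct hand { void *ptr; uint64_t copy; } hand;

static uint32_t meta_gen(uint64_t m)  { return (uint32_t)(m & GEN_MASK); }
static uint32_t meta_plan(uint64_t m) { return (uint32_t)((m >> GEN_BITS) & PLAN_MASK); }

static uint64_t meta_make(uint32_t gen, uint32_t plan) {
    assert(gen <= GEN_MASK && plan <= PLAN_MASK);
    return (uint64_t)gen | ((uint64_t)plan << GEN_BITS);
}

/* Making a hand snapshots the knob's generation and plan id. */
static hand hand_make(void *p, const knob *k) {
    hand h = { p, k->meta & COPY_MASK };
    return h;
}

/* A hand is valid only while its copied bits still agree with the knob. */
static bool hand_valid(const hand *h, const knob *k) {
    return h->copy == (k->meta & COPY_MASK);
}

/* Freeing or reusing a rod bumps the generation (wrapping at 16 bits),
   so every outstanding hand to the old object now fails hand_valid. */
static void knob_retire(knob *k) {
    uint32_t g = (meta_gen(k->meta) + 1) & GEN_MASK;
    k->meta = (k->meta & ~GEN_MASK) | g;
}
```

Because the generation is only 16 bits, a hand that survives 65,536 reuses of the same rod could match again; the check catches ordinary dangling refs rather than every conceivable one.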

If you point at a subfield somewhere in the middle of a rod, you can eventually figure out exactly what it is by drilling down into the rod's plan, after fetching the plan for the id in the rod's knob. But since this lookup is offset-based, it aliases anything with the same offset. That is, a struct named foobar that contains a foo and then a bar has the same offset for foobar and foo. If you send a message there, did you mean to send it to foobar or to foo? I would start a search in the method tables of the largest containing object, then work inward through the nesting until I found a match.
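One way to implement that search order, sketched under several assumptions: plans are hypothetical tables indexed by id, a linear array stands in for the per-type method hashmaps, and a message sent to a byte offset is tried first on the outermost plan, then on each nested field plan that contains that offset. In the example, `foobar` and its first field `foo` share offset 0. The names `dispatch_at` and `find_method` are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified plan and field records; ids index plans[]. */
typedef struct field { const char *name; size_t offset, size; int plan; } field;
typedef struct plan  { const char *name; size_t size; int nfields; const field *fields; } plan;

typedef const char *(*method_fn)(void *self);
typedef struct method { int plan; const char *selector; method_fn fn; } method;

/* struct foobar { struct foo foo; struct bar bar; }: foo sits at offset 0,
   so offset 0 aliases both foobar and foo. */
enum { P_INT, P_FOO, P_BAR, P_FOOBAR };
static const field foo_fields[]    = { { "x", 0, 4, P_INT } };
static const field bar_fields[]    = { { "y", 0, 4, P_INT } };
static const field foobar_fields[] = { { "foo", 0, 4, P_FOO }, { "bar", 4, 4, P_BAR } };
static const plan plans[] = {
    [P_INT]    = { "int",    4, 0, NULL },
    [P_FOO]    = { "foo",    4, 1, foo_fields },
    [P_BAR]    = { "bar",    4, 1, bar_fields },
    [P_FOOBAR] = { "foobar", 8, 2, foobar_fields },
};

static const char *foo_describe(void *self) { (void)self; return "a foo"; }
static const char *foobar_name(void *self)  { (void)self; return "a foobar"; }

/* A linear array stands in for the per-type method hashmaps. */
static const method methods[] = {
    { P_FOO,    "describe", foo_describe },
    { P_FOOBAR, "name",     foobar_name },
};

static method_fn find_method(int plan_id, const char *sel) {
    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++)
        if (methods[i].plan == plan_id && strcmp(methods[i].selector, sel) == 0)
            return methods[i].fn;
    return NULL;
}

/* Resolve a message sent to byte `off` inside a rod whose plan is `outer`:
   try the largest containing object first, then drill into the field that
   contains `off`, until some plan in the chain answers the selector. */
static method_fn dispatch_at(int outer, size_t off, const char *sel) {
    int p = outer;
    for (;;) {
        method_fn fn = find_method(p, sel);
        if (fn) return fn;
        const plan *pl = &plans[p];
        int next = -1;
        for (int i = 0; i < pl->nfields; i++) {
            const field *f = &pl->fields[i];
            if (off >= f->offset && off < f->offset + f->size) {
                off -= f->offset;      /* now relative to the field */
                next = f->plan;
                break;
            }
        }
        if (next < 0) return NULL;     /* nothing in the chain understands it */
        p = next;
    }
}
```

Here `describe` sent to offset 0 falls through `foobar` and is answered by `foo`, `name` sent to the same offset is answered by `foobar` itself, and a message that no plan in the chain understands yields `NULL`.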

At the moment I am writing metainfo by hand, but the idea is to generate it from C declarations later. On 64-bit platforms a plan is 256 bits, and each field requires 128 bits, plus however long the string names get, plus method tables if there are any. When a field is composed of bit fields, I use a plan that says all its fields are bitfields, and then each field describes bit widths and shifts instead of byte sizes and offsets.
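The descriptor sizes and the bitfield convention can be sketched as C structs, together with the kind of centralized, plan-driven debug printer described earlier in the post. The member names, the `PLAN_BITFIELDS` flag, the `sbuf` output buffer, and treating every leaf as a `uint32_t` are assumptions made to keep the example short; real plans would describe each primitive native type and carry real method tables.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptors sized to match the post on 64-bit targets. */
typedef struct fdesc {
    const char *name;
    uint16_t    pos;   /* byte offset, or bit shift in a bitfield plan */
    uint16_t    len;   /* byte size, or bit width in a bitfield plan   */
    uint32_t    plan;  /* plan id of this field (20 bits used)         */
} fdesc;

enum { PLAN_BITFIELDS = 1 };            /* flag: every field is a bitfield */
typedef struct pdesc {
    const char  *name;
    const fdesc *fields;
    const void  *methods;               /* method table; unused here */
    uint32_t     size;
    uint16_t     nfields;
    uint16_t     flags;
} pdesc;

#if UINTPTR_MAX == UINT64_MAX
_Static_assert(sizeof(fdesc) == 16, "a field descriptor is 128 bits");
_Static_assert(sizeof(pdesc) == 32, "a plan is 256 bits");
#endif

enum { P_U32, P_HDR, P_PKT };           /* ids index plans[] */
static const fdesc hdr_f[] = { { "ver", 0, 4, P_U32 }, { "len", 4, 12, P_U32 } };
static const fdesc pkt_f[] = { { "hdr", 0, 4, P_HDR }, { "id", 4, 4, P_U32 } };
static const pdesc plans[] = {
    [P_U32] = { "u32", NULL,  NULL, 4, 0, 0 },
    [P_HDR] = { "hdr", hdr_f, NULL, 4, 2, PLAN_BITFIELDS },
    [P_PKT] = { "pkt", pkt_f, NULL, 8, 2, 0 },
};

/* Bounded text output that never writes past cap. */
typedef struct sbuf { char *s; size_t cap, len; } sbuf;
static void put(sbuf *b, const char *fmt, ...) {
    va_list ap;
    size_t room = b->len < b->cap ? b->cap - b->len : 0;
    va_start(ap, fmt);
    int n = vsnprintf(room ? b->s + b->len : NULL, room, fmt, ap);
    va_end(ap);
    if (n > 0) b->len += (size_t)n;
}

/* Centralized debug printer driven only by plans. Leaves are read as
   uint32_t; a bitfield plan keeps all its fields in one uint32_t word. */
static void print_value(sbuf *b, uint32_t id, const void *p) {
    const pdesc *pl = &plans[id];
    uint32_t word;
    if (pl->nfields == 0) {
        memcpy(&word, p, sizeof word);
        put(b, "%u", (unsigned)word);
        return;
    }
    put(b, "%s{", pl->name);
    for (uint16_t i = 0; i < pl->nfields; i++) {
        const fdesc *f = &pl->fields[i];
        put(b, "%s%s=", i ? " " : "", f->name);
        if (pl->flags & PLAN_BITFIELDS) {   /* shift and width */
            assert(f->pos < 32);
            memcpy(&word, p, sizeof word);
            uint32_t mask = f->len >= 32 ? UINT32_MAX : (UINT32_C(1) << f->len) - 1;
            put(b, "%u", (unsigned)((word >> f->pos) & mask));
        } else {                            /* recurse at the byte offset */
            print_value(b, f->plan, (const unsigned char *)p + f->pos);
        }
    }
    put(b, "}");
}
```

For a `pkt` whose first word packs `ver = 3` in the low 4 bits and `len = 100` in the next 12, and whose second word is 42, this printer produces `pkt{hdr=hdr{ver=3 len=100} id=42}` without any type-specific printing code.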

Even if most of an app is composed of static code, I will want to write dynamic and interactive scripts to explore runtime state in tests, and to inspect things at runtime in ways I did not anticipate earlier. I would normally think of glomming on a scripting language to do this, on the side. But I can use C as the scripting language instead, if memory state is universally annotated this way with type metainfo.

behavior

    "Why don't you use existing debug info to get that format meta data?", Dex asked.
    "Two reasons," Wil explained. "First, it's not debug info: it must be there all the time, without being stripped. I need it to work even if folks try to kill all debug info for a build. Second, there's no method table in existing debug info, and that's the interesting part. I would not bother posting about format debug info alone."
    "I ignored you when you said method table," Dex confessed. "It sounded like Smalltalk BS. What does that mean? You want everything to have a vtable?"
    "In effect, yes," Wil nodded. "You might only use it when something dynamic or interpreted runs, either in debugging or testing; but every C value would be a polymorphic object when it lives in space allocated by a vat. Even native types."
    "Now you sound like a lunatic," Dex warned. "What kind of vtable? Something more like C++, or more like what a dynamic language uses?"
    "More like the latter," Wil said, "which would require a dynamic runtime in some scripting language to dispatch. I guess I'd start with an AST interpreter using Lisp syntax but Smalltalk message-passing semantics. An easy lesson for my sons: here's how you parse, and here's a way to interpret. It would only be crazy if the AST used refcounted nodes."
    "I hate that crap," Dex admitted sourly. "If you like C, why not stick with it?"
    "I could always compile a script to C, if I like it enough to keep and use more heavily," Wil said. "But making stuff up as you go along is harder in C, and more dangerous, and not nearly interactive enough for what I want. I hope to demonstrate the state of internal data structures in the middle of a suspended algorithm to my sons. And when writing new things wholesale, this approach makes bottom-up construction easier, since you can see what happens when you put together components like tinker toys."