archives

C runtime type info gimmick which supports scripting

I am experimenting with runtime type information in C, so that values in memory have types, rather like Lisp and Smalltalk values have types, and not just variables. This post is intended to be fun. (The idea is for you to enjoy this, and I don't want to argue. But I might respond to remarks with short dialogs in lieu of argument, since that would be fun.) The rest of this is explanation and some detail.

In an earlier post I described a kind of heap I call a vat. It has the effect of associating metainfo with every block of memory allocated, in a different position in memory. (There is a constant time arithmetic calculation that turns a block pointer into a metainfo pointer, and vice versa; so the main penalty in access is a cache line miss if this happens more often when lacking adjacency to its block.) A continuous memory block is a rod, while the fixed size metainfo describing it is a knob. A counted reference to rod (or any point within) is a hand. On a 64-bit platform, a knob is 128 bits, a hand is also 128 bits, and rods vary in size.

Some bits of a hand are a copy of bits in the knob, so a reference is only valid when it agrees with the current self description of an object, particularly the current 16-bit generation number, so dangling refs are seen. Of the bits copied, another 20 bits are the id of the object's plan, which describes the type, layout, and any dynamic behavior if you also supply methods for that type in hashmap entries. You can think of plan as morally identical to class in Smalltalk, but initially I was aiming for perfect format description on a field by field basis, with recursion grounded in primitive native types. I wanted to write centralized code to debug print anything based on the plan. But then I sort of noticed I would be able to write Smalltalk style code that works on the C objects, if you were willing to call a dynamic api to send messages (method selectors with parameters).

If you point at a subfield somewhere inside the middle of a rod, you can eventually figure out exactly what it is by drilling down in the rod's plan, after fetching the plan for the id in the rod's metainfo knob. But since this lookup is offset-based, it aliases anything with the same offset. That is, a struct named foobar that contains a foo and then a bar will have the same offset for foobar and foo. If you send a message, did you mean to foobar or to foo? I would start a search in method tables for the largest containing object, then work through nesting until I found a match.

At the moment I am writing metainfo by hand, but generating it from C declarations later is the idea. On 64-bit platforms a plan is 256 bits, and each field requires 128 bits, plus however long the string names get, and method tables if there are any. When a field is composed of bit fields, I use a plan that says all its fields are bitfields, and then each field describes bit sizes and shifts instead of byte sizes and offsets.

Even if most of an app is comprised of static code, I will want to write dynamic and interactive scripts to explore runtime state in tests, and to inspect things at runtime in ways I did not anticipate earlier. I would normally think of glomming on a scripting language to do this, on the side. But I can use C as the scripting language instead, if memory state is universally annotated this way with type metainfo.