Content Addressable Type Systems

Hello, I have an interesting problem that this site's audience may have an opinion on. I'm assuming that people know what content addressable storage is. Let's say it uses SHA1: we put some binary blob in it and get its hash back, which we can later use to retrieve the blob again.

Now imagine I have a primitive type system that looks something like this (pseudo-code):

    struct TypeInfo
    {
        string name;
        MemberInfo[] members;
    }

    struct MemberInfo
    {
        string name;
        Hash type;
    }

I can use the TypeInfo and MemberInfo structures to describe types, and insert them into the content addressable store. The type field in the MemberInfo structure contains the hash of the field type.

This all works fine and dandy, until a type refers to itself. The trivial case is a direct reference, for example:

    struct BinTreeNode
    {
        BinTreeNode left;
        BinTreeNode right;
    }

But one can imagine more complicated type relations where a type only refers to itself indirectly (i.e., Foo refers to Bar refers to Pep refers to Foo).

Suddenly, I can no longer build a type description using the content addressable store. After all, it means generating a SHA1 hash value for a binary blob that is supposed to contain that very hash itself. Next thing I know, I'm looking at fixed point combinators and I'm trying to crack SHA1. Clearly that isn't going to work, not given the machine power and time available to me... :-)

Does anybody have some creative suggestions on how to solve this problem? So far, my workarounds include various combinations of losing type safety and/or introducing an extra level of indirection. The former would involve dropping from a typed reference to an untyped one (a void* if you will). The latter would result in breaking the invariant of my content addressable store, namely that the SHA1 of each blob is the hash used to refer back to it.
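
To make the setup above concrete, here is a minimal Python sketch of the store and the two descriptor structures. It is only an illustration under assumptions that are not in the original post: the names `ContentStore`, `insert_type` and the `known` name-to-hash table are hypothetical, the JSON serialization is an arbitrary stand-in for whatever binary encoding is really used, and `int` is modeled as a primitive type with no members.

```python
import hashlib
import json
from dataclasses import dataclass


class ContentStore:
    """Toy content addressable store: each blob is keyed by its SHA-1 hex digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, blob: bytes) -> str:
        key = hashlib.sha1(blob).hexdigest()
        self._blobs[key] = blob
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


@dataclass(frozen=True)
class MemberInfo:
    name: str
    type: str  # hash of the member's type description


@dataclass(frozen=True)
class TypeInfo:
    name: str
    members: tuple = ()

    def serialize(self) -> bytes:
        # Assumed encoding: JSON with sorted keys, so equal descriptions
        # always produce equal bytes and therefore equal hashes.
        payload = {
            "name": self.name,
            "members": [[m.name, m.type] for m in self.members],
        }
        return json.dumps(payload, sort_keys=True).encode("utf-8")


def insert_type(store, name, fields, known):
    """Describe a type and store it.

    fields is a list of (member_name, type_name) pairs; known maps the
    names of already inserted types to their hashes.
    """
    members = []
    for member_name, type_name in fields:
        if type_name not in known:
            # The member's hash must exist before this blob can be built.
            raise LookupError(
                f"{name}.{member_name}: hash of {type_name!r} is not known yet"
            )
        members.append(MemberInfo(member_name, known[type_name]))
    digest = store.put(TypeInfo(name, tuple(members)).serialize())
    known[name] = digest
    return digest


store = ContentStore()
known = {}
insert_type(store, "int", [], known)
point = insert_type(store, "Point", [("x", "int"), ("y", "int")], known)

# A self-referential type cannot be inserted: its blob would have to
# contain its own hash, which only exists once the blob is complete.
try:
    insert_type(
        store,
        "BinTreeNode",
        [("left", "BinTreeNode"), ("right", "BinTreeNode")],
        known,
    )
    cycle_blocked = False
except LookupError:
    cycle_blocked = True
```

In this sketch the acyclic `Point` description inserts cleanly, while `BinTreeNode` is rejected because its own hash would have to be known before its blob could be serialized. That is the same circularity described above, just surfaced as a lookup failure.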

Another interesting question is how to decide where in a type relation cycle it makes the most sense to insert such a workaround. Theoretically, interpreting and inserting types can begin at any point on a cycle, so it seems arbitrary to pick the point where my parser first detects a cycle; the resulting type descriptions would depend on parse order and so be non-deterministic.

If the problem description is too vague, I can provide more detail. It's language agnostic. I suspect that there won't really be a clean solution; I'm looking for the least ugly workaround.

Note that I'm aware we could generate hashes from just the type names instead of the full type description. But the whole point of the exercise is to capture a type's complete content in the system; this has numerous interesting benefits once types start evolving (in the area of database and schema upgrades).

Thanks,
Jaap Suter - http://jaapsuter.com

By JaapSuter at 2008-09-18 02:31 | LtU Forum