Implementing arrays

I got to thinking that it would be cool if, using a language's built-in tools, you could recreate its array type -- without using its array type. Most languages I've seen have some built-in concept of an array or list. And although some languages give a 'definition' of what a list is, that defining code won't actually run through the interpreter.

I came up with the following C++ implementation, which manages to implement an array type without actually using arrays. Before you cry that it depends on a class layout which isn't guaranteed: strictly speaking a class with base classes isn't a POD, so the standard doesn't promise this layout, but it's a de facto standard that compilers follow and that projects like Boost.Python depend on. And the point of the example is to show that a language can do it (even if the language is "C++ with a small tweak").

#include <iostream>
using namespace std;

// Each level of the inheritance chain contributes one element:
// Array<T, n> is an Array<T, n-1> plus one more T.
template<class T, int n>
class Array : public Array<T, n-1>
{
public:
    T& operator[](int i);
    T tail;
};

template<class T, int n>
T& Array<T, n>::operator[](int i)
{
    // Should be able to just use the line below,
    // but some compilers (GCC 3.4) choke on it:
    // return *(&(Array<T, 1>::tail) + i);

    // Element 0 is Array<T, 1>::tail; the others follow it in memory.
    int& x = Array<T, 1>::tail;
    return *(&x + i);
}

int main(int argc, char *argv[])
{
    Array<int, 5> x;

    cout << sizeof(Array<int, 5>) << endl; // 20 as expected

    x[0] = 9;
    x[1] = 8;
    x[2] = 4;
    x[3] = 2;
    x[4] = 5;

    // Prints correctly
    cout << x[0] << x[1] << x[2] << x[3] << x[4] << endl;

    return 0;
} 

How often do you see recursive inheritance? *grin*

Languages with co-inductive

Languages with co-inductive types like Charity allow you to define cons cells (and thus lists) directly. Arrays could be represented as functions in a language with dependent types.
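
For a concrete flavor of the arrays-as-functions idea, here is a minimal sketch in C++ rather than in a dependently typed language (all names here are made up for illustration): an "array" is just a function from index to value, and update wraps it in a new function.

#include <functional>

typedef std::function<int(int)> IntArray;

// Constant array: every index maps to the initial value.
IntArray make_array(int init)
{
    return [init](int) { return init; };
}

// Functional update: builds the new array in O(1), but after k updates
// a lookup chains through k wrappers, so access degrades to O(k) --
// which is exactly the question raised below.
IntArray store(IntArray a, int i, int v)
{
    return [a, i, v](int j) { return j == i ? v : a(j); };
}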

In O(1) access time?

This is something I've been wondering about. Is it possible to implement arrays in such a language with O(1) access time? Or can they only be done with cons cells and thus O(n) access time?

Do O(1) arrays really exist?

Here's a slightly different question. Do O(1) arrays really even exist? For example, take the RAM chips in your PC. When you look at the circuitry, you'll see that they're really implemented as trees, with O(log(N)) logic gates in the row and column decoders. Or, if you assume that communication speeds are bounded (by, say, the speed of light), and you assume that bits occupy a finite volume, then N bits fill a region whose radius grows like N^(1/3), so it seems like the best access time you can get for a physical data structure is O(N^(1/3)).

Memory is usually

clocked, isn't it? Doesn't that imply constant-speed access regardless of the size of the array?

For a given technology...

...you are going to have more gate delays for a bigger array, so your clock period will have to increase.

log log n goes to infinity

Well, the base of the log will be the fanout of the gates, which I think in practice is pretty large, so that this is not the limiting factor. I'm reminded of the joke that log log n goes to infinity ... although it's never been observed doing so.

Anyway, the access is still O(1) as long as n stays within the hardware's fixed bound. It's the kind of detail that is sometimes worth remembering, but usually is better forgotten.

May be true, but...

That really doesn't have anything to do with Big-O measurements. Likewise, your measurements on memory access don't really apply either. It still takes as long to flip all of the switches and access memory location 0x1 as it does 0xFFFFFFFF.

Sure, but...

It still takes as long to flip all of the switches and access memory location 0x1 as it does 0xFFFFFFFF.

Sure, that applies for any finite amount of storage, including finite lists (there's an upper bound on the access time for getting the last element of a 0xFFFFFFFF-length list). And if the universe is finite, then I guess every list is O(1)...

They do today

RAM timings you get with today's specs are still consistent with O(1) access.

However, your question is intriguing: if the underlying system is not O(1), what would happen if, instead of emulating O(1) storage, the tree structure were exposed?

Cache Hierarchy

Memory timings are not constant, if you measure random access times over datasets with sizes at various levels in the cache hierarchy. It's not exactly the same as exposing a tree structure.

However, the asymptotic

However, the asymptotic behaviour's still constant. You need a more subtle tool for playing with cache behaviour.

Pretend they do.

Ignoring memory access limitations (because here an O(1) implementation would have O(1) memory accesses whereas an O(log n) implementation would have O(log n) memory accesses), where does that leave us?

Er, I guess I did a bad job

Er, I guess I did a bad job of making my point (since people seem to be concentrating on low-level details), but I believe that if you want O(1) arrays in your language, you have to take them as axiomatic or rely on a related primitive (like pointers). I don't think this is a "memory access limitation" issue, but instead a more fundamental one. Take a run-of-the-mill language with the usual complement of lists, trees, recursion, etc., and I don't think you'll be able to construct O(1) arrays with that alone (especially for a physically realizable machine).
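
To make that concrete, here is a sketch (with hypothetical names) of roughly the best you can do with plain algebraic data structures: a balanced binary tree indexed by the bits of i, costing O(log n) pointer dereferences per access rather than O(1).

// One node of a complete binary tree; the leaves hold the elements.
struct Node {
    int value;
    Node* left;   // null at a leaf
    Node* right;
};

// Index by walking one level per bit of i: O(log n) reads, not O(1).
int get(const Node* t, int i)
{
    while (t->left != nullptr) {
        t = (i & 1) ? t->right : t->left;
        i >>= 1;
    }
    return t->value;
}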

That's sort of trivially

That's sort of trivially true, because it's impossible to construct an O(1) random access data structure from either an O(n) or an O(log n) random access data structure.

Algorithmic analysis is often done using the random access machine (RAM) model of the actual machine (that is, any memory address can be dereferenced in constant time).

Could someone with a stronger complexity theory background elaborate on what I'm saying, or clarify further?

But General Relativity says

But General Relativity says that the amount of information in a given volume is only proportional to its surface area. Put too much information in, and it collapses into a black hole, which has infinite access time. Since surface area grows like r^2, n bits need a radius proportional to n^(1/2); therefore, array access is only in O(n^(1/2)).

(SCNR. The point we should be debating is whether it is possible to implement a data structure where access is possible with O(1) memory reads. I don't think we can, unless we are given arrays or pointer arithmetic (equivalent to a single large array) as a primitive.)

Is this cheating?

let create_array init =
  let r0 = ref init
  and r1 = ref init
  and r2 = ref init
  and (* etc. *) in
  function
  | 0 -> r0
  | 1 -> r1
  | 2 -> r2
  | (* etc. *)

let get_item a i = !(a i)
let set_item a i v = a i := v

In OCaml, pattern matching on sequential integers is compiled to a jump table (as in C), giving O(1) access time. Granted, it's limited to a fixed-length array, but the length is arbitrary.
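
The same trick can be sketched in C++ (illustrative only, names made up): a switch over dense sequential cases, which compilers typically lower to a jump table.

int r0, r1, r2, r3;

// A switch over dense cases usually compiles to a jump table:
// one bounds check, one indexed load, one jump.
int& item(int i)
{
    switch (i) {
    case 0:  return r0;
    case 1:  return r1;
    case 2:  return r2;
    default: return r3;
    }
}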

yes

... IMO, because jump tables are suspiciously similar to arrays. In fact, my best description of the relationship between them is that jump tables are an array of procedures (for a bit of a loose definition of procedure).

O(1) may meet the academic

O(1) may meet the academic measure for array access, but can the OCaml compiler reduce that down to an integer addition and a memory access? That should be the case for my implementation.

Improvements

I think you left off the base case:

template<class T>
class Array<T, 0> { };

Another minor change: the int& x should be T& x. Now, the next question is, how do you make it typesafe? We could add the following member function to class Array:

    template<int i>
    T& at()
    {
        return ((Array<T, i+1> *) this)->tail;
    }

This allows access at statically resolved indices:

    x.at<4>() = 7;

Unfortunately, this isn't actually safe, because C++ allows conversion between arbitrary pointer types, and if we tried to convert values rather than pointers, temporaries may be involved.

Good catches

I forgot to include the base case, yes, and since I tested with int I missed the int& bug ;P

How is the current code not typesafe, and how is your version not typesafe? I don't see why conversion between arbitrary pointer types should matter.

Do you mean in implementation or in use? I don't see how you could fill an Array with 'bar' objects.

Re: Good catches

Do you mean in implementation or in use? I don't see how you could fill an Array with 'bar' objects.

In particular, I meant the implementations, but this also affects the safety of the uses. The unsafe operation in your implementation is &x, conversion from a reference to a single element of type T to a pointer to a sequence of locations holding elements of type T.

My implementation avoids the above situation by performing a cast to the appropriate class instead. Incidentally, that avoids assumptions on the memory layout. But since the C++ compiler allows conversion between arbitrary pointer types, it's still unsafe: we can place an element in the 5th position of a 4-element array. If C++ had an up_cast<> operation which only allows you to cast from a derived class to a base class, then using that would be safe (though only allowing static indices).
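
For what it's worth, C++'s implicit derived-to-base pointer conversion can play the role of that up_cast<> today; here is a sketch of at() rewritten to use it:

    template<int i>
    T& at()
    {
        // Implicit derived-to-base conversion: this line only compiles
        // when Array<T, i+1> really is a base class (or the class
        // itself), i.e. when i < n, so an out-of-range static index
        // is rejected at compile time.
        Array<T, i+1>* self = this;
        return self->tail;
    }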

I think your implementation

I think your implementation could lose the advantage of being the same speed as C arrays because of the member access on tail. Also, the memory layout assumption is an assumption in the general context of programming languages, and strictly speaking C++ doesn't guarantee it either (a class with base classes isn't a POD); in practice, though, every compiler lays it out this way.

nice idea.

For an application of this very technique with an industrial-grade implementation, see boost::tuple's implementation, which uses the same cons-style generic list approach to implement arbitrary-length tuples.
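
In miniature, the cons-style encoding looks something like this (a sketch, not Boost's actual code):

struct nil {};

// One cell per tuple element; the tail carries the rest of the tuple.
template<class Head, class Tail = nil>
struct cons {
    Head head;
    Tail tail;
};

// A (int, double, char) "tuple" built from three cells:
cons<int, cons<double, cons<char>>> t = {1, {2.5, {'x', {}}}};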

Arrays are homogeneous tuples.

Arrays are homogeneous tuples, i.e. tuples where all the members are of the same type.

So C++ and other languages should have had tuples instead of arrays, allowing indexing of tuple members just like array elements.

Which is what I have discussed here.
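
To illustrate, runtime indexing typechecks precisely because every element has the same type. A self-contained sketch (hypothetical names, cons cells as in the boost::tuple comment above):

struct nil {};

template<class T, class Tail = nil>
struct cons {
    T head;
    Tail tail;
};

// Works only because each head is a T; for a heterogeneous tuple the
// recursive call would have to return a different type and would not
// compile. (Linear time here; the layout trick above gets it to O(1).)
template<class T>
T& index(cons<T, nil>& c, int) { return c.head; }

template<class T, class Tail>
T& index(cons<T, Tail>& c, int i)
{
    return i == 0 ? c.head : index(c.tail, i - 1);
}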

How often do you see recursive inheritance?

By coincidence I just created some for a different application. In fact, I just put together a pair of mutually recursive families of types.