archives

Judy Stores

A Judy tree is generally faster than and uses less memory than contemporary forms of trees such as binary (AVL) trees, b-trees, and skip-lists. When used in the "Judy Scalable Hashing" configuration, Judy is generally faster then a hashing method at all populations.

Some of the reasons Judy outperforms binary trees, b-trees, and skip-lists:

  • Judy rarely compromises speed/space performance for simplicity (Judy will never be called simple except at the API).
  • Judy is designed to avoid cache-line fills wherever possible.
  • A b-tree requires a search of each node (branch), resulting in more cache-line fills.
  • A binary-tree has many more levels (about 8X), resulting in more cache-line fills.
  • A skip-list is roughly equivalent to a degree-4 (4-ary) tree, resulting in more cache-line fills.
  • An "expanse"-based digital tree (of which Judy is a variation) never needs balancing as it grows.
  • A portion (8 bits) of the key is used to subdivide an expanse into sub-trees. Only the remainder of the key need exist in the sub-trees, if at all, resulting in key compression.

The Achilles heel of a simple digital tree is very poor memory utilization, especially when the N in N-ary (the degree or fanout of each branch) increases. The Judy tree design was able to solve this problem. In fact a Judy tree is more memory-efficient than almost any other competitive structure (including a simple linked list). A highly populated linear array[] is the notable exception.

Worth a look, considering the open-source LGPL license and range of applications. HP released this library in 2004. From the PDF manual,

Judy arrays are remarkably fast, space-efficient, and simple to use. No initialization, configuration, or tuning is required or even possible, yet Judy works well over a wide dynamic range from zero to billions of indexes, over a wide variety of types of data sets -- sequential, clustered, periodic, random.

The Essence of Data Access in Cw

The Essence of Data Access in Cw, The power is in the dot! Gavin Bierman, Erik Meijer, and Wolfram Schulte.

In this paper we concentrate on the data dimension. Our aim is to describe the essence of these extentions; by which we mean we identify, exemplify and formalize their essential features. Our tool is a small core language FCw, which is a valid subset of the full Cw language... we are able to formalize the type system and operational semantics of the data access fragments of Cw.

If you have been following the discussions here of Cw, you already know about the language features discussed here, since the paper doesn't introduce new features. If you haven't seen Cw, section 2 is a short and readable introduction.

The rest of the paper is more formal, and unless you need to prove formal results regarding Cw, might not be all that interesting. It won't hurt to keep in mind that it exists, since some of us may need something like FCw at one point or another.