Decayed Memoization

This is something I have been wondering about recently..

The basic idea is memoized data that decays over time.

  • The first time the data is required, it is calculated
  • If it took substantial time to calculate, the result is memoized and the calculation time recorded
  • Subsequent requests for the data (with the same parameters) are given the memoized version
  • The memoized data decays over time based on a metric that incorporates data size, calculation time, last access time, current memory load, and so forth
  • Once the memoized data has fully decayed, it is freed from memory
  • Future requests for the data cause it to be recalculated and (re)memoized
  • Repeat

At first glance, this seems like it might give a nice dynamic time/space balance to certain aspects of a programming language.
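
As a rough illustration of the scheme (a minimal Python sketch; the names, thresholds and the decay formula are my own assumptions, not a reference design): each entry records its size, calculation time and last access time, and a periodic sweep frees entries whose decayed worth has dropped below their memory cost.

    import sys
    import time

    class DecayedMemo:
        """Hypothetical sketch: memoize only slow calls, let entries decay away."""

        def __init__(self, fn, min_calc_time=0.01, half_life=60.0):
            self.fn = fn
            self.min_calc_time = min_calc_time  # only memoize "substantial" work
            self.half_life = half_life          # seconds for an entry's worth to halve
            self.cache = {}                     # args -> (value, size, calc_time, last_access)

        def __call__(self, *args):
            entry = self.cache.get(args)
            if entry is not None:
                value, size, calc_time, _ = entry
                self.cache[args] = (value, size, calc_time, time.monotonic())
                return value
            start = time.monotonic()
            value = self.fn(*args)
            calc_time = time.monotonic() - start
            if calc_time >= self.min_calc_time:  # worth remembering?
                self.cache[args] = (value, sys.getsizeof(value), calc_time, time.monotonic())
            return value

        def sweep(self):
            """Free entries whose decayed worth no longer justifies their size."""
            now = time.monotonic()
            for args, (value, size, calc_time, last_access) in list(self.cache.items()):
                idle = now - last_access
                worth = calc_time * 0.5 ** (idle / self.half_life)
                cost = size * 1e-9              # stand-in for current memory pressure
                if worth < cost:
                    del self.cache[args]

A real runtime would also fold the current memory load into the sweep, but the shape of the scheme is the same.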

Not having come across the idea before, I was wondering if anyone might point me towards any languages or papers on the subject?

Another name for...

The basic idea is memoized data that decays over time.

What you just described is caching. You can add caching to any application, but I doubt a general-purpose programming language would benefit from it as a default language-level guarantee. For one thing, it would make it even harder to calculate computational cost for the program.

For another, good caching schemes require problem-specific knowledge to be optimized: chances are you would get worse performance in general from this language than without this "feature".

Maple has it

For good or for ill, the computer algebra (and programming) language Maple has a very sophisticated caching strategy that users can choose to associate with any function/procedure. For those functions where caching is enabled, computational cost can indeed become impossible to calculate; the best you can do is measure it empirically [since it depends on external factors!].

In practice, it seems to help for medium-sized problems, and can be a hindrance for very large problems [mostly because Maple's memory usage tends to be distributed uniformly over all allocated memory, which really messes with the processor cache].

Maple

Thanks, this is just the sort of thing I'm looking for..

By the way, out of curiosity, do you know why Maple distributes its allocations in that way?

Memory allocation in Maple

At the low level, the most fundamental data structure (after arrays) in Maple is the hash table; hash tables are absolutely pervasive in the low-level implementation. In particular, all objects are made "unique"; in other words, all objects (implemented as DAGs, Directed Acyclic Graphs, via C pointers) exist in only one copy in memory. This is done using a global hash table with keys based on objects' memory addresses. Memory allocation happens, as in most systems, from various free-lists. Since a lot of computations are "almost functional", this means creating a lot of temporary objects which quickly get collected. The main effect of this is that long-lived objects tend to get spread throughout memory pretty uniformly.
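
To illustrate the uniquing described above (a hypothetical sketch, not Maple's actual code): a global hash table interns every node, keyed on the operator plus the identities of its already-interned children, which play the role of the C pointer addresses.

    # Hypothetical hash-consing sketch: each structurally equal DAG node
    # exists in only one copy, found via a global hash table.
    _unique_table = {}

    def intern_node(op, *children):
        key = (op,) + tuple(id(c) for c in children)  # ids stand in for addresses
        node = _unique_table.get(key)
        if node is None:
            node = (op, children)
            _unique_table[key] = node
        return node

    x   = intern_node("x")              # leaf
    one = intern_node(1)                # leaf
    a   = intern_node("+", x, one)
    b   = intern_node("+", x, one)
    assert a is b                       # the same node, built only once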

I do believe that some of these ideas were already present in Macsyma. However, the Maple implementation was at least an order of magnitude more efficient, so that in the mid-80s Maple completely took over from Macsyma as the "main" computer algebra system.

Yes, AKA Caching

I use the term "decayed memoization" to emphasize its use as a (transparent) language-level feature.

For one thing, it would make it even harder to calculate computational cost for the program.

Indeed it would.. But perhaps not any more than choice of GC scheme/execution strategy/optimization technique.

chances are you would get worse performance in general from this language than without this "feature".

This is a valid concern. I picture the runtime only inserting "memoization points" at places where the computation time is significant. Memoized data is considered second-class, and in tight memory situations the allocation of "real data" causes a cull of the memoization pool based on the metrics mentioned above.
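
Very roughly, such a cull might look like the following (a sketch only; the scoring weights and the way memory pressure is detected are placeholders, not a worked-out design): entries that are cheap to recompute, large, and long idle are freed first.

    import time

    def cull_memo_pool(pool, bytes_needed):
        """Hypothetical cull: free the least valuable memoized entries until
        enough memory is reclaimed for 'real data'.
        pool maps key -> dict with 'size', 'calc_time' and 'last_access'."""
        now = time.monotonic()

        def keep_score(entry):
            # Cheap-to-recompute, large, long-idle entries score lowest.
            idle = now - entry["last_access"]
            return entry["calc_time"] / (entry["size"] * (1.0 + idle))

        freed = 0
        for key in sorted(pool, key=lambda k: keep_score(pool[k])):
            if freed >= bytes_needed:
                break
            freed += pool[key]["size"]
            del pool[key]
        return freed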

I can imagine various scenarios where this scheme would produce good speed-ups, and others where it wouldn't help at all. I guess the tricky part would be to get the runtime smart enough to distinguish between the two.

I'm toying with the idea of adding this feature to a dataflow-style language I'm currently working on, as it appears to be a natural fit. I guess the proof is in the eating..

Different cost

But perhaps not any more than choice of GC scheme/execution strategy/optimization technique.

I'm talking about algorithmic complexity cost, not runtime cost per se.

Algorithmic costs are in general WAY more significant than operational runtime costs, such as GC, on any vaguely decent platform.

Furthermore, I tend to believe that operational concerns without algorithmic impact don't really belong in a language design.

I picture the runtime only inserting "memoization points" at places where the computation time is significant.

I would think that in general, you would be better off building a simple purpose-built cache using your specific knowledge of your data structures and their required characteristics.
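
For instance, in Python the few lines below are often all the "purpose-built cache" that is needed; the size bound is chosen from knowledge of the workload rather than guessed by a language runtime (the function body is just a stand-in for some expensive, pure computation).

    from functools import lru_cache

    @lru_cache(maxsize=256)         # bound picked to fit the application's working set
    def slow_square(n):
        total = 0
        for _ in range(100_000):    # stand-in for an expensive, pure computation
            total += n * n
        return total // 100_000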

May be worth a try though. Good luck with it.

Memoisation as an operational constraint

I can see why one might want to constrain the performance characteristics of a programming language (modulo resource limits). Scheme's tail recursion optimisation is an example. The general idea is to make the computational cost of an algorithm much easier to compute; for example, in a (strict) language with memoisation one could in effect require that a naive implementation of Fibonacci (one with branching recursion) execute in linear rather than exponential time.
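
To make the Fibonacci example concrete (a Python sketch in which the memoisation is written explicitly rather than guaranteed by the language):

    from functools import cache

    def fib_naive(n):
        # Branching recursion: exponentially many calls.
        return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

    @cache
    def fib_memo(n):
        # Same definition, but each value is computed once: O(n) calls.
        return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

    print(fib_memo(200))   # returns immediately; fib_naive(200) would not finish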

There is some discussion of this in this paper by John Hughes; unfortunately there doesn't seem to be an online version. This paper by Turner also seems relevant, although I haven't read it. Neither addresses the original poster's question, however.

online version

of Hughes' paper: here, thanks to Google Books.

[on edit:] alas, they only show the first 2/16 pages, plus citations.