incremental algorithm help

I hope someone can point me to literature regarding theories of 'incremental' calculation. Many people here will know about folds or catamorphisms and how they help with computations on lists. I am trying to understand the theories of similar calculations when items are added to lists. For example, it is one thing to calculate a sum of integers in a list, but what about returning the sum each time a new integer is added? Or to make it a little more tricky, how does one effeciently (and generically) calculate an average of integers in a list...when integers are being constantly being added?

Actually I found this paper: Incremental algorithms on lists by Johan Jeuring, it has a chapter on incremental algorithms...but I can't understand one word. I didn't understand Functional Programming with Bananas, Lenses, Envelopes, and Barbed Wire until I read some introductory material on foldr (I still don't understand it completely). I'm hoping there are more basic references which will help me understand incremental algorithms (or would it be right to call them incremental folds). Thanks!

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: incremental algorithm help

Actually I found this paper: Incremental algorithms on lists by Johan Jeuring, it has a chapter on incremental algorithms...but I can't understand one word.
Section 3 of the paper is about incremental algorithms on lists, but section 2 contains the preliminaries, or things you should know before moving further. I would make sure I understand every word of section 2 first; it may be hard though.

I don't think the material in section 2 is hard by itself. I find the notation used very terse. The author introduces tens of operators (along with some theorems about these operators) in about 6 pages of text. Of course this is reasonable since he's writing a paper, not a book. But how would I, the reader, tackle this paper? Here is what would work for me.

First, don't rush into the paper. The concepts need time to sink in.

Second, try to relate the concepts in the paper to things you know; and implement every operator in your favorite language (Haskell, ML, Scheme, ...). Every equation in the paper corresponds to a program that's easy to write. Try to convince yourself that the code you just wrote works. At first reading, start with concrete examples. Generalize when you're more comfortable with the concepts.

For example, section 2.2 (page 4) talks about the theory of lists. It introduces the following notation: 1_{++}, [a], and the ++ operator. I would relate it to scheme's lists: 1_{++} is '(), [a] is (a), and ++ is the function append. The paper says that ++ is associative, so I ask myself if append is associative: (append a (append b c)) == (append (append a b) c). Come up with a function that is both associative and commutative. Come up with one that's also idempotent in order to gain concrete understanding of what idempotent means (I didn't know and looked it up in Wikipedia).

Progress further. Look at definition (2) of map (or *) and implement it. Implement reduce (from definition 3) and implement filter in terms of append, compose, p, map and reduce (definition 5). Work your way to the end of section 2.

I think if you make it past section 2 using this methodology, you will not have any problems with section 3.


PS. Is there a standard reference of the names of these operators or how to pronounce them? I find it hard to think in terms of an encircled-plus or a stroked-forward-arrow operator.

Going bananas

Second, try to relate the concepts in the paper to things you know; and implement every operator in your favorite language
I think that I have understood the first 6 pages of the Bananas, Lenses, Envelopes, and Barbed Wire paper. I shovelled some coal into my REPL and when steam was up I did
(defun catamorph (base-value f)
  (labels ((h(list)
	     (if (endp list) base-value
		 (funcall f 
			  (car list)
			  (h (cdr list))))))
    (function h)))
(defun anamorph (end-test grower)
  (labels ((h (&rest x)
	     (if (apply end-test x)
		 (destructuring-bind (a . b)
		     (apply grower x)
		   (cons a (apply (function h) b))))))
    (function h)))
(defun hylomorph (base end-test grower shrinker)
  (labels ((h (list)
	     (if (funcall end-test list)
		 (destructuring-bind (extension fundament)
		     (funcall grower list)
		   (funcall shrinker
			    (h fundament))))))
    (function h)))
(defun paramorph (base combination)
  (labels ((for-numbers(n)
	     (if (zerop n)
		 (funcall combination
			  (- n 1)
			  (for-numbers (- n 1)))))
	     (if (endp list)
		 (funcall combination
			  (car list)
			  (cons (cdr list)
				(for-lists (cdr list))))))
	     (etypecase x
	       (number (for-numbers x))
	       (list (for-lists x)))))
    (function for-all)))
A useful little function for trying things out is
(defun f (&rest args)
  (cons 'f args))
After playing with these for a while and sleeping on it I decided that the key idea was
(defun meta-copy (&key
		  ((:nil initial-value))
		  ((:atom  end-test))
		  ((:cons constructor))
		  ((:car head))
		  ((:cdr tail)))
  (labels ((f (list)
	     (if (funcall end-test list)
		 (funcall constructor
			  (funcall head list)
			  (f (funcall tail list))))))
    (function f)))
For example
         (defparameter reduce
	   (meta-copy :nil nil
		      :atom #'atom
		      :cons #'f
		      :car #'car
		      :cdr #'cdr))
CL-USER> (funcall reduce '(a b c d e))
=> (F A (F B (F C (F D (F E NIL)))))
Doing it numerically, instead of symbolically:
         (defparameter sum
	   (meta-copy :nil 0
		      :atom #'atom
		      :cons #'+
		      :car #'car
		      :cdr #'cdr))

CL-USER> (funcall sum '(1 2 3 4) => 10
And trying out factorial
(defparameter factorial
	   (meta-copy :nil 1
		      :atom #'zerop
		      :cons #'*
		      :car #'identity
		      :cdr #'1-))
CL-USER> (funcall factorial 5) => 120
What I think I've learned is
You keep the old atom, car, and cdr, but supply your own cons and nil
You keep the old cons and nil, but supply your own car, cdr, and atom
You can change all five.

The point of the paramorphism is that the hylomorphism suffers from redundancy. The appropriate definitions of atom, car, and cdr are implied by the data type.

The paper has a single function in place of car and cdr, and that function's value gets destructured to produce the car and cdr. For me, that obscured the parallels to the simple recursive definition of copy-list

Whole thesis is online

Interesting stuff. It struck me that a bag is elsewhere called a multi-set. I tried googling for confirmation and found the whole thesisThere looks to be about 70 pages filling in the relevant background, and it seems to take a more leisurely approach to presenting the material, so reading the larger document might be the faster route.

I found the definition of promotability, numbered 6 on page 6 puzzling. It is indentical to saying that f is a homomorphism. A mathematician would say that f:G->H is a homomorphism when f(id)=id and f(xy)=f(x)f(y). The mathematical tradition is to write xy for x o y and to leave the reader to work out that f(xy) must be using the operation on G, while f(x)f(y) must be using the operation on H.


Thanks Alan, Aziz,

I have the thesis as well, I actually found the shorter paper to be easier since it gets to the point more quickly. I did see a couple of other citations, I'll investigate those. Unfortunately, I didn't see many papers referencing this one in citeseer or google scholar. I'll just have to do it the hard way and read the thing :)

Incremental Algorithms

I suggest you have a look at the work by Umut Acar. He has developed methods for automatically turning ordinary offline algorithms into incremental algorithms. The work comes in various flavors, both as language extensions (to SML) and as libraries.

Also, make sure to read Magnus Carlsson's paper on implementing these techniques in Haskell.

thanks for the great reference!

The reason I'm trying to learn this stuff is the following: There have been many papers that show that the fold operator and list comprehension can be used in a manner similar to sql. I'm hoping these incremental algorithms will model 'stream databases.' I don't know if this stuff will help with actual implementation of my query optimizer...we'lll see :)