Lambda the Ultimate

inactiveTopic Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )
started 1/25/2004; 12:27:26 PM - last post 2/7/2004; 4:40:17 AM
Ehud Lamm - Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/25/2004; 12:27:26 PM (reads: 14923, responses: 57)
Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )
via Patrick I came across Brian Marick's discussion of code reading practices based on a session in the recent MFA trail run.

The question at hand was in the context of an accumulator passing tail recursive implementation of factorial. Which is better: to use an optional private parameter in a public function or to use a "named" let, and introduce the accumulator there?

I am with Patrick (I think) who seems to prefer the nested let. It seems to me that this is also what Richard Gabriel prefers.

My reasons? First, by exposing the optional parameter you are implying it is part of the interface available to outside clients. I don't think this is the correct semantic interpretation in this case, so it is misleasing. Essentially, what you want is a helper function (fact-iter), and fusing it with the public function seems like a hack. Sexy but confusing...

Second, scope is one of the most important clues when reading code. Scoping techniques make code more readable, both for humans and for language processing tools. Provided, that is, that you pay attention to them when you are writing as well as when you are reading code.

And yes, Patrick, I think that it is idiomatic Scheme.
Posted to general by Ehud Lamm on 1/25/04; 12:30:41 PM

Luke Gorrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/25/2004; 1:08:09 PM (reads: 1211, responses: 1)
(defun fact (n)
  (if (= n 0)
      1
      (* n (fact (1- n)))))

Ehud Lamm - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/25/2004; 1:12:16 PM (reads: 1206, responses: 0)
Your point being?

Luke Gorrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/25/2004; 1:15:31 PM (reads: 1208, responses: 0)
BTW Patrick and Brian, your programs are indented as Emacs Lisp, i.e. for (if test then &rest else).

This should help:

(add-hook 'lisp-mode-hook
          (defun my-lisp-mode-hook ()
            (set (make-local-variable 'lisp-indent-function)
                 'common-lisp-indent-function)))

Sincerely,
Your friendly .emacs pedant. :-)

Luke Gorrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/25/2004; 1:19:45 PM (reads: 1199, responses: 1)
My point is just to throw another style of implementation into the mix. Overlooking the simplest solution strikes me as the most interesting point of style.

Ehud Lamm - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/25/2004; 1:27:14 PM (reads: 1197, responses: 0)
Ok then.

A point well taken.

Luke Gorrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/25/2004; 2:14:16 PM (reads: 1172, responses: 0)
On rereading it's the = and <= tokens that jump out at me. Thinking back, I did semi-consciously optimize for conciseness over correctness/robustness.

It seems there are three choices for negative inputs: loop forever, return nonsense, or give an error. What are the idioms for different languages?

Patrick Logan - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/25/2004; 2:30:06 PM (reads: 1176, responses: 0)
1. I like Emacs-Lisp style. ("out"-dent the "else" clause).

2. What did I write? I prefer neither Brian's nor Richard's style. If I were using Common Lisp again I would still use a named let.

3. Non-tail recursive example - yes, simpler, but the style argument was about how to express the accumulator.

Dave Herman - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/25/2004; 3:58:29 PM (reads: 1147, responses: 0)
It seems to me a pretty straight-forward question of what you want the interface to be. If the users of the function shouldn't (need to) know about the accumulator, then hide it; otherwise, if it really is something that should be exposed (say, because they should be able to start the computation with different initial values for the accumulator), then expose it.

(How you hide the accumulator is another question. Named let? Letrec? Internal define? Local? Helper function? Helper function hidden in a module (assuming your Scheme implementation has a module system)? But this is probably getting really petty.)

A similar example: a (not whole-program) CPS transformation. In the simple case, where the transformation is completely local, you should hide the "start continuation" (usually λx.x). But if you have a bunch of combinators that are all in CPS, you keep the continuation in the interface to give the user the ability to plug in different continuations. And then... you can introduce *another* layer of information hiding, if you want the combinators to work together but without making the accumulator visible to the "top-level." For this, you wrap them in an abstract data type that allows you to combine them, so their continuations are exposed to each other, but not to the user.

Noel Welsh - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/26/2004; 9:52:59 AM (reads: 956, responses: 1)
I find this 'literary criticism' approach to understanding code very interesting. I'm tutoring a bunch of first year students in the arcane arts of Java programming. I tried getting them to critically read code last week. Results were mixed but I suppose not too bad for a first attempt. I'll write up more if I ever get the time.

Ehud Lamm - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/26/2004; 10:06:55 AM (reads: 955, responses: 0)
I'll write up more if I ever get the time. Please do!

Keep in mind that trying this with first year students is a bit like teaching 'literary criticism' to first graders just learning the alphabet...

Pascal Costanza - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/26/2004; 11:42:37 AM (reads: 928, responses: 1)
Here are some variations:

(defmethod fact1 ((n (eql 1)) &optional (acc 1))
  acc)

(defmethod fact1 (n &optional (acc 1)) (fact1 (1- n) (* n acc)))

(defun fact2 (n) (labels ((f (n acc) (if (<= n 1) acc (f (- n 1) (* n acc))))) (f n 1)))

(defun fact3 (n) (do ((i 1 (1+ i)) (r 1 (* r i))) ((> i n) r)))

(defun fact4 (n) (loop for i from 1 to n for r = 1 then (* r i) finally (return r)))

[fact2 is from the original posting. The point here is that its formatting is different.]

I like fact1 and fact4 for different reasons. fact1 shows how cool CLOS is and fact4 describes best what's happening IMHO.

I find the example strange. How often do you write functions in a value accumulating style, or transform functions to tail-recursive style? What's wrong with iteration?

If this is a course that treats programming as an art form, it shouldn't teach the "one" "correct" style. This doesn't make sense in the arts.

Luke Gorrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/26/2004; 12:19:46 PM (reads: 948, responses: 0)
I have had a go at Brian's mental exercise and enjoyed it! I tried to consider what each word suggests, as well as the overall meaning. My trains of thought while doing this are recorded below. For the level of pedantry I offer neither excuse nor apology :-).

(defun fact(n &optional (so-far 1))
   (if (<= n 1)
        so-far
      (fact (- n 1) (* n so-far)))

  1. The whitespace is unconventional:
    • Usually there would be a space between fact and the open-parenthesis on the first line. This appears to be either a typographic error or a C-ism.
    • The else part of the if is indented as for Emacs Lisp (where if takes any number of alternative forms). However, this is not an Elisp program, because having the default value for so-far in the argument list would not be legal. I read this as Common Lisp.
  2. Using <= to find the base case means returning 1 for negative values. Factorial is not defined for negative inputs. The programmer appears to be operating on the assumption that the function may return anything at all when called with an invalid input, or at least that doing so is better than looping infinitely. The Software Tools book advocates this form of "defensive programming": using <= in preference to = as to produce garbage rather than loop in programs that reach inconsistent states. In (at least) the Eiffel and Erlang communities this is considered a Bad Thing. I feel there is a missing assert or check-type expression in this program.
  3. Why is there an accumulator at all? I suspect that the programmer was aware that the program could be shorter without it, but semi-conciously chose the tail-recursive version anyway. I suspect that he did this as a reflexive optimization. This starts several trains of thought.
    • Is there any reason to optimize this function? This is a simple yes/no question that should always be considered and answered when writing any function. ("Not sure" means "no".)
    • If the answer is no: the accumulator adds no meaning to the program. It should be removed.
    • If the answer is yes:
      • The accumulator seems to be intended to save stack space. The tail-recrusive program takes O(lg N!) bits of memory (for the bignum), which looks like O(n lg n). So even the tail-recursive version uses significant memory.
      • For the optimization to be effective the stack must presumably grow very long. It's not clear to me that memory (rather than CPU) would be the limiting factor in computing factorials., but it would have to be so for the optimization to make any sense.
      • Since efficiency is important, are there not much (much much) better algorithms for computing factorials than this? I don't know any but I think they exist. If so this program should be rewritten to use a more efficient algorithm.
    • I also agree with you guys who say the accumulator should not complicated the interface. This is especially true in Lisp where argument lists are queried from the system as online help.
  4. (- n 1) could also be written (1- n). I use the second form, but I don't know if either is more "idiomatic".
  5. Later addition: There is a missing close-parenthesis at the very end.

In summary, between careful consideration and my own personal tastes, I think the function should have been written exactly like this:

(defun factorial (n)
  "Return N!"
  (check-type n (integer 0 *))
  (if (zerop n)
      1
      (* n (factorial (1- n)))))

(I added the documentation string and renamed from fact to factorial while convincing myself to put HTML-emphesis on "exactly". Open-source is good for software engineering :-))

That's what I reckon!

Too much nitpicking? Want revenge? Here's a favourite function of mine, in production use for many years:

make_set([])    -> [];
make_set([H|T]) -> [H | [X || X <- make_set(T), X /= H]]. % no comment

andrew cooke - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 4:41:35 AM (reads: 808, responses: 0)
fwiw fact1 reminded me of ernst et al's paper on predicate dispatching - http://citeseer.nj.nec.com/ernst98predicate.html

Dominic Fox - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 7:30:26 AM (reads: 773, responses: 2)
make_set([]) -> []; make_set([H|T]) -> [H | [X || X <- make_set(T), X /= H]]. % no comment

I wonder why not (switching to Haskell, as I don't know what language this is in - guessing ML?):

make_set []  = []
make_set (x:xs) = x : filter (\p -> p /= x) (make_set xs)

- in other words, why is the list comprehension better than a filter?

Daniel Yokomiso - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 7:45:30 AM (reads: 776, responses: 1)
I wonder why not (switching to Haskell, as I don't know what language this is in - guessing ML?):

It is Erlang.

- in other words, why is the list comprehension better than a filter?
I guess it's just a matter of preference. Some people say that list comprehensions are easier to read. Also comprehensions are usually shorter.
make_set []  = []
make_set (x:xs) = x : [p | p <- make_set xs, p /= x]
vs.
make_set []  = []
make_set (x:xs) = x : filter (p -> p /= x) (make_set xs)
Or even:
make_set []  = []
make_set (x:xs) = x : (do p <- make_set xs
                          guard (p /= x)
                          return p)

Ehud Lamm - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 8:00:07 AM (reads: 768, responses: 0)
For some reason I find the list comprehension in this case confusing, and the second approach quite clear.

Dominic Fox - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 8:04:52 AM (reads: 757, responses: 0)
It is Erlang.

Doh! Of course it is...

List comprehensions are preferred in Python also - filter is frowned on, if not actually deprecated. So in Python you might say:

>>> def make_set(n):
...     if n == []:
...             return []
...     else:
...             return n[:1] + [p for p in make_set(n[1:]) if p != n[0]]

Frank Atanassow - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 8:15:07 AM (reads: 759, responses: 1)
I don't know what language this is in

I think it's Erlang.

Regarding the accumulator argument: After some consideration, I don't see why one should regard this (as Dave suggests, and Ehud in the topic header) as an interface issue. In fact, besides namespace clutter, I don't see what's wrong with lifting the helper function to the top-level. It doesn't expose any implementation issues, since there is no guarantee that the factorial function uses it. (Of course, in LISP this could cause problems because you could change the behavior by rebinding the helper function; but then, LISP is LISP, not a `normal' language. ;)

Indeed, it's pretty useful to have the helper function available in cases like this. Let me give you an example where it's clearer, free monoid normalization.

data Mon a = Nil | Mon a :+: Mon a | Elem a

norm :: Mon a -> Mon a norm m = reduce m Nil

reduce :: Mon a -> Mon a -> Mon a reduce Nil = id reduce (m :+: m') = reduce m . reduce m' reduce (Elem x) = (Elem x :+:)

Now, suppose I have an arbitrary value m :: Mon a, and another value n :: Mon a which I happen to know is in normal form, and I want to know the normal form of m :+: n. Either I can do:

norm (m :+: n)
which complexity is linear in size(m)+size(n), or I can take a shortcut:
reduce m n
which complexity is linear only in size(m).

Daniel Yokomiso - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 9:11:38 AM (reads: 739, responses: 0)
Is this reduce (Elem x) = (x:) correct?

Darius Bacon - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 12:54:17 PM (reads: 693, responses: 0)
make_set = foldr (\x -> (x:) . filter (/= x)) []

(I'd write it a little less tersely if it weren't for the Haskell show-off factor here...)

Dominic Fox - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 1:33:33 PM (reads: 679, responses: 0)

(Elem x) = (x:) gives a type error ([a] where we want Mon a), as does (Elem x) = ((Elem x):) ([Mon a]).

The problem is that the type of (:) is a -> [a] -> [a]. I think we need something with type Mon a -> Mon a -> Mon a instead?

(Edited to remove nonsensical use of word "monoid").

Dominic Fox - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 1:48:08 PM (reads: 680, responses: 0)
make_set = foldr (\x -> (x:) . filter (/= x)) []

Nice - but to my eyes the least readable so far. That's probably because my eyes aren't adjusted to this particular style, however - I had to squint at Frank's example for the same reason. I end up translating it in my head into something like:

make_set :: Eq a => [a] -> [a]
make_set = foldr (\x xs -> x : (filter (\p -> p /= x) xs)) []

Darius Bacon - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 2:34:33 PM (reads: 672, responses: 2)
Yeah, in a real program I'd probably write

adjoin :: Eq a => a -> [a] -> [a]
adjoin x = (x:) . filter (/= x)

make_set :: Eq a => [a] -> [a] make_set = foldr adjoin []

which brings us back to the utility of exposing helper functions...

Dave Herman - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/27/2004; 5:16:42 PM (reads: 652, responses: 0)
Regarding the accumulator argument: After some consideration, I don't see why one should regard this ... as an interface issue. In fact, besides namespace clutter, I don't see what's wrong with lifting the helper function to the top-level. It doesn't expose any implementation issues, since there is no guarantee that the factorial function uses it.

That's a good point. Although namespace clutter is not unimportant, it's orthogonal to the issue of information hiding, and more a question of whether the helper function is of any general utility. If the helper function is useful, as in your example, I agree that it makes sense to expose it. That's what I dislike about nested helpers such as internal define. I prefer having a module system where you can hide and unhide helper functions easily without too much cutting and pasting.

But I still think this is an interface issue: even in your example, you provide a top-level function that provides an initial value for the accumulator, for the obvious reasons that a) the caller shouldn't be required to provide the same value each time, and b) the correctness of the implementation of norm shouldn't rely on the caller providing it.

(Of course, in LISP this could cause problems because you could change the behavior by rebinding the helper function; but then, LISP is LISP, not a `normal' language. ;)

Are you referring to dynamic scope or set{f,!}?

Daniel Yokomiso - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/28/2004; 4:19:47 AM (reads: 606, responses: 1)
Just curious. I'm not a native english speaker, so the reason may be obvious, but why name it adjoin?

andrew cooke - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/28/2004; 4:24:31 AM (reads: 608, responses: 0)
it seems to be the standard term for adding an element to a set: google

Frank Atanassow - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/28/2004; 4:46:19 AM (reads: 600, responses: 0)
Is this reduce (Elem x) = (x:) correct?

Sorry, it should be (Elem x :+:) on the RHS. I've fixed the code.

Are you referring to dynamic scope or set{f,!}?

I'm not familiar with the details of LISP, but I'm referring to the fact that top-level variables are bound late, so presumably setf or defun. For example, in Scheme you can change the result of f [typo fixed] by rebinding g using set! or define:

(define (f x) (g x))
(define (g x) 0)
(f 9) ; => 0
(define (g x) 1)
(f 9) ; => 1
(set! g (lambda (x) 2))
(f 9) ; => 2

So if factorial depends on a helper function, then rebinding the helper will change the behavior of factorial, unless you are careful and use an idiom like:

(define fachelp ...)
(define factorial
  (let ((fach fachelp))
    (lambda (n) ... fach ...)))

Luke Gorrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/28/2004; 7:38:51 AM (reads: 569, responses: 2)
I have to say it: I love LtU threads with code!

How about a variant that preserves the order of the input elements (retaining the first of each occurence) and runs in O(n log n) time or better?

Here's one in Common Lisp:

(defun without-duplicates (list)
  "Return a copy of LIST with repeated occurences of elements removed."
  (loop for e in list
        with table = (make-hash-table)
        unless (gethash e table) collect (setf (gethash e table) e)))

P.S. Frank: I think Schemers regard that problem the same way Haskell guys would regard someone changing a subfunction and recompiling/running the program. i.e. it's bad to change the semantics of a helper function without updating its callers, whether done dynamically or statically.

Ehud Lamm - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/28/2004; 7:42:01 AM (reads: 563, responses: 0)
I have to say it: I love LtU threads with code!

I thought that you didn't like threads...

(I agree with the sentiment, btw)

Frank Atanassow - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/28/2004; 7:44:36 AM (reads: 573, responses: 0)
Wow, seeing this (without-duplicates) I wonder where people ever got the idea that `LISP has a simple, uniform syntax'.

BTW, the function make_set is in the standard library module List, where it's called nub.

Luke Gorrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/28/2004; 8:03:44 AM (reads: 547, responses: 0)
The thought occurs that Scheme's syntactic sugar for side-effects and predicates compose quite amusingly :-)

(defun without-duplicates2 (list)
  (let ((table (make-hash-table)))
    (flet ((there!? (x)
             (prog1 (gethash x table) (setf (gethash x table) t))))
      (remove-if #'there!? list))))

Darius Bacon - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/28/2004; 11:38:00 AM (reads: 513, responses: 0)
I love LtU threads with code!

There's a mailing list I started a while back for code discussions like this, though it's gotten no traffic and almost no subscribers. I should probably have started it off with something like Richard Gabriel's annotations instead of just plopping down some code to criticize.

Guillaume - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/28/2004; 11:54:38 AM (reads: 517, responses: 0)
Wow, seeing this (without-duplicates) I wonder where people ever got the idea that `LISP has a simple, uniform syntax'.

Common Lisp's loop construct is a special case, and is frowned upon by some Lispers (while other love its expressive power).

Notably, Paul Graham (in "Ansi Common Lisp") says to avoid and ignore it, while Peter Norvig (in "Principle of A.I. Programming") give it a totally different treatment.

Pascal Costanza - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/28/2004; 6:00:58 PM (reads: 464, responses: 0)
Here are some more variants:

;; very functional
(defun without-duplicates1 (list)
  (reduce (lambda (acc elm)
            (adjoin elm acc))
          list :initial-value nil))

;; very old fashioned (defun without-duplicates2 (list) (cond ((null list) nil) ((member (car list) (cdr list)) (without-duplicates (cdr list))) (t (cons (car list) (without-duplicates (cdr list))))))

;; very modern and cool (defun without-duplicates3 (list) (loop for elem in list unless (member elem result) collect elem into result finally (return result)))

Question: Why the hash table? Isn't this premature optimization? For short lists, the overhead of creating a hash table is probably too large, and for long lists its probably better to keep them pruned from the start, right?

Luke Gorrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/29/2004; 3:42:22 AM (reads: 426, responses: 0)
The hash table was an optimization to implement the specification of "must run in O(n log n) time or better". It makes the function O(n).

The specification of running-time may be overly specific. However, since I have needed such a function in real life I thought it would add some spice. Note that without this stipulation the many make_set implementations already posted already do the job, since they incidentally preserve the order of the input list (but in O(n^2) time).

Your suggestion of keeping it pruned from the start suggests yet another variation: a data structure that allows N objects to be inserted, duplicates discarded, and then turned into a list with the elements in the same order they were inserted in O(n log n) time or better. What makes you prefer it, and where's the code? :-)

BTW, for an "old fashioned" (or at least very specifically tuned) implementation of without-duplicates, see CMUCL's definition of the fancier standard Lisp function remove-duplicates. This function appears (judging by source code comments) to have been written around 1984. (CMUCL's CVS is temporarily down so I can't date it more reliably with cvs annotate.)

Pascal Costanza - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/29/2004; 5:25:05 AM (reads: 414, responses: 0)
The hash table was an optimization to implement the specification of "must run in O(n log n) time or better". It makes the function O(n).

Sorry, I haven't payed enough attention to this.

Your suggestion of keeping it pruned from the start suggests yet another variation: a data structure that allows N objects to be inserted, duplicates discarded, and then turned into a list with the elements in the same order they were inserted in O(n log n) time or better. What makes you prefer it, and where's the code? :-)

My only reasoning was that a hash table adds an overhead for which you don't necessarily know if it pays off in the end. If I were a nitpicker I could ask where your requirement stems from and why you are sure that you really need O(n log n), but I am not. ;)

A good data structure for keeping both the O(n log n) requirement and the elements pruned from the start should be a red-black tree. For example, see http://www.aviduratas.de/lisp/lisp1.html and http://www.cs.colorado.edu/~main/manuals/scheme/scheme_113.html - however, I am not 100% sure.

Luke Gorrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/29/2004; 5:46:26 AM (reads: 415, responses: 1)
Note: Etiquette does not permit you to accuse me of premature optimization (asymptotically) while simultaneously complaining about tiny constant factors like the allocation of an empty hash table :-)

Off hand I can't think of a way to implement this well using trees, because they usually want to change the ordering for balancing. Can anyone think of a nice way to write a data structure such that one can incrementally insert n elements (m of which are unique) and then produce a list of unique elements in the same order they were inserted -- all in O(n log n) time and O(m log m) space or better? I have some ideas but nothing that I would call nice. In the back of my mind I suspect that Scheme or SML/NJ hackers could do something delightful..

(I don't know red-black trees by heart so I may be missing something obvious. Long ago Darius Bacon pointed me to a beautiful way of writing purely-functional trees and I've always done it that way so far in this life.)

andrew cooke - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/29/2004; 7:27:13 AM (reads: 408, responses: 0)
you don't alter the order of nodes to balance trees - you just jiggle around the branching structure.

Luke Gorrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/29/2004; 8:24:14 AM (reads: 400, responses: 0)
[This comment intentionally left blank.]

Luke Gorrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/29/2004; 8:30:00 AM (reads: 396, responses: 0)
I hear but I don't visualise the lovely solution. In code, please!

Pascal Costanza - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/29/2004; 10:24:43 AM (reads: 383, responses: 0)
[left blank]

Pascal Costanza - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/29/2004; 10:27:25 AM (reads: 385, responses: 0)
[left blank]

Pascal Costanza - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/29/2004; 10:29:33 AM (reads: 391, responses: 1)
One nice property of red-black trees is that they keep the key-value pairs sorted by keys and sorted by the order in which they were added at the same time. The link to the Scheme library I have given above mentions the complexities of insert/delete/lookup and conversion functions, which are all delightfully low. ;) However, the constants are much higher than for other data structures, IIRC, so this probably pays off only for a large number of elements.

The link to the Common Lisp library above provides actual source code.

andrew cooke - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/29/2004; 11:06:46 AM (reads: 390, responses: 0)
key-value pairs sorted by keys and sorted by the order in which they were added at the same time.

they do? the "left-to-right" order of the leaf nodes reflects key order (in this case you have key=insertion order, presumably). but how is insertion order stored separately (if it's not equal to the key? or is that what you meant?)

for code, see the bible

Pascal Costanza - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/30/2004; 4:50:20 AM (reads: 311, responses: 1)
I am sorry, but I don't remember all the details. The book "Introduction to Algorithms" by Cormen/Leiserson/Rivest has a whole section on red-black trees. See http://theory.lcs.mit.edu/~clr/

andrew cooke - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
1/30/2004; 5:55:34 AM (reads: 321, responses: 0)
if you view a single post there's a link that says "in response to". that link only works if people reply to individual posts rather than posting at the end of the full discussion.

people seem to be neither posting replies to individual posts nor quoting context, so i have no idea what reply is associated with what comment.

if this reference to cormen at al is about red/black trees storing insertion order then ok, i will check it out when i get back home (where my copy is). but i already know how they work - i've written code myself - http://www.acooke.org/jara/mllib/map.sml - and they rearrange the branching of nodes to maintain the colour property. the insertion order is not preserved (and it makes no sense to do so - the whole point of rebalancing is to remove the pattern that insertion ordering imposes. a balanced tree is exactly one where the insertion order is not preserved - it should be balanced whatever the order of insertion was).

you can see this in the code above, if you look at the balance function. the ordering of a, b, c, d remains unchanged (ie key ordering is preserved), but the relative nesting of the types (ie the branching structure) varies (and the insertion ordering is not stored directly (the nodes contain only keys and subtrees)).

more generally, consider a balancing alogithm that guarantees minimum branch length. three keys, A B and C will always be stored with nodes A and C below B (assuming lexical ordering) no matter what the insertion order. red-black trees have a slightly weaker guarantee (the number of one color - is it red? - down any branch is a minimum), but the same argument can be applied to sufficiently large trees.

i try to be careful about making absolute statements, because i have been wrong too many times, but i am pretty sure that insertion ordering is not preserved by red black trees (or any other balanced tree, unless it is explicitly stored, given the arguments above).

Christopher Hendrie - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
2/2/2004; 12:32:24 PM (reads: 202, responses: 0)
Just for fun (and mostly off topic)...

A counting argument shows why any binary tree data structure (including red-black trees) can't maintain both insertion and key orderings without auxiliary information.

The information content of a key-ordered binary tree comes from: 1) the set of values in the tree 2) the shape of the tree 3) extra per-node bits (e.g. the 'red' bit in red-black trees)

Since the user controls the set of values, the only place the insertion order can be maintained is in the shape of the tree and the extra bookkeeping bits. (I'm assuming here that the data structure isn't allowed to be padded with extra nodes.)

The number of binary tree shapes with n internal nodes (ie the nth Catalan number) is strictly less than 4^n. Let's assume that there are a fixed number B of extra bits allowed per node. Then there are less than (4*2^B)^n different instances of the binary tree data structure (much less given balance constraints). There are n! insertion orders, which can be made to exceed (4*2^B)^n by choosing n sufficiently large.

In conclusion, any key-ordered binary tree data structure with a fixed number of per-node extra bits cannot maintain insertion order for an arbitrary number of nodes.

I'm sure the implementation cited just adds some extra pointers to maintain insertion order (alhough neither CLR nor C++ STL do) so it's kind of a pointless argument -- but it was fun to run the numbers.

Pascal Costanza - Re: Reading Styles ( Idiomatic Lisp and Idiomatic Scheme )  blueArrow
2/7/2004; 4:40:17 AM (reads: 139, responses: 0)
New context: Here is something in Common Lisp, and I wonder how this is expressed in other languages.

(defun add-vectors (&rest vectors)
  (apply #'mapcar #'+ vectors))

? (add-vectors '(1 2 3) '(4 5 6)) (5 7 9) ? (add-vectors '(1 2 3) '(4 5 6) '(7 8 9)) (12 15 18)

Pascal