## expressivity of "idiomatic C++"

The April issue of C/C++ Users Journal has an article called A New Solution To an Old Problem by Andrew Koenig and Barbara E. Moo. In it, they revisit the Hamming numbers problem from Dijkstra's A Discipline of Programming.

They examine four different solutions:

1. A naive O(n^2) solution.
2. Dijkstra's efficient but somewhat convoluted O(n) solution (re)implemented in C++.
3. An elegant O(n) solution in Haskell.
4. Their own new O(n*log(n)) solution in "idiomatic C++".

The Haskell solution is the following

scale n (x:xs) = (n * x) : (scale n xs)
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) =
if x == y then
x : (merge xs ys)
else if x < y then
x : (merge xs (y:ys))
else
y : (merge (x:xs) ys)

seq = 1 : (merge (scale 2 seq)
(merge (scale 3 seq) (scale 5 seq)))


Their "idiomatic C++" solution uses ordered sets:

set<int> seq;
seq.insert(1);
set<int>::const_iterator it = seq.begin();

int val = *it;
seq.insert(val * 2);
seq.insert(val * 3);
seq.insert(val * 5);
it++;


In conclusion, they have this to say (emphasis mine),

We have looked at four solutions to a simple problem. The first was straightforward but slow. The second was much faster but fairly tricky. The third, in a functional language, was again straightforward -- but requires a totally different way of thinking about programming. Indeed, advocates of this way of thinking use the program's straightforwardness to argue that this way of thinking is superior.

With the fourth solution, we believe that the argument is far from obvious.

I may be reading too much into this quote, but it sounds to me like Koenig and Moo consider it a bad thing to require a "totally different way of thinking about programming".

P.S. While googling for Hamming numbers, I came across this related paper: Expressivity of Functional-Logic Languages and Their Implementation by Juan José Moreno Navarro.

## Comment viewing options

### FP as a second language

but it sounds to me like Koenig and Moo consider it a bad thing to require a "totally different way of thinking about programming".

Of course they do. Remember that one of the core design values of C++ is building upon pre-existing knowledge and expectations from C. I think there is an inherent conservativeness in the C/C++ community, partly because of this value and partly because they are the "powers that be" in many programming domains.

I should point out that, though I don't share that PL conservative outlook, I don't think it is unreasonable either. If you have spent many years mastering a particular "way of thinking about programming", you probably aren't going to want to give it up on short notice.

No one wants to go from being a doctor or engineer in his "native" PL to being a taxi driver in someone else's.

I think a lot of technological "religion" flows from exactly this kind of learning opportunity cost.

### Agreed

First, it's a fair observation that when you're faced with a problem and someone proposes a paradigm shift as a solution, you should explore all other avenues first, because paradigm shifts are high-cost events by definition. Given this, it's not clear to me that Koenig and Moo are necessarily hostile to FP, and in fact anyone who writes forcefully about taking full advantage of the STL, as Koenig and Moo have done, can't be all that hostile to it!

However, even if I'm mistaken on that last point, it doesn't mean that the entirety of the C++ community agrees, or even that people who are on the standards committee agree: witness Bjarne Stroustrup's insistence that C++ is multiparadigm including functional programming, boost::lambda, FC++, and Phoenix. Also, boost::lambda and Phoenix are being integrated, and hopefully at some future date we'll also see monads make it from FC++ into whatever the integrated boost functional programming library is called.

It pains me to say it... but C++ is my fourth-favorite functional language. That may not sound like much, but the fact that it's on my list at all just flabbergasts me. For the curious, my first three are Oz, O'Caml, and the Lisp family.

it's a fair observation that when you're faced with a problem and someone proposes a paradigm shift as a solution, you should explore all other avenues first, because paradigm shifts are high-cost events by definition.

STL was a paradigm shift at the time it was introduced, was it not?

### So was OO

This is the big advantage C++ has- even over other "Object Oriented C" languages (like Objective-C). It provides the new "paradigm" (or pseudo-paradigm- I don't think templates in C++ or modules in Ocaml quite qualify as full on paradigms by themselves) without really requiring it. You can ignore it if you don't understand it or don't understand it well enough to use it right now. I still know a number of programmers supposedly programming in C++ whose code looks awfully similiar to C.

Note that this is also a disadvantage of C++, in that code written by one programmer who "knows" the language is not necessarily understandable by another programmer who "knows" the language.

### Surely you mean that code wri

Surely you mean that code written by a programmer who knows the language may not be comprehensible to one who "knows" (i.e. doesn't know) the language?

### Known Knowns :-)

I think his point was that it's quite possible (actually, common) for one C++ programmer to know the language from one angle and another programmer to know it from another angle, and for each of these programmers to find the other's code incomprehensible, even though they're both writing perfectly valid ANSI C++.

I encounter this phenomenon a lot: I make no apologies for the fact that my C++ style is very much driven by the modern capabilities of the language and libraries coupled with my background as a Lisp and, nowadays, O'Caml programmer. Folks who come to C++ with what I think of as the MFC (or, to be fair, MacApp) perspective frequently find my code incomprehensible (I pasted some compiling, but not working, source code into a chat with other C++ programmers with over a decade's experience and had one of them seriously challenge the idea that the code would even compile). Conversely, I find code written in the rely-on-members-in-a-superclass-to-side-effect-a-data-member-that-I-rely-on, coupled with plain-ol'-public-data-members, style utterly incomprehensible as well.

C++ now supports multiple diametrically opposed styles and their associated communities. This is something that it has in common with Perl. I leave you to draw your own conclusions about that.

### Yes- that was precisely what I meant

In my experience, there are three main "styles" of C++ programmers I've seen:

First, there is C-with-classes programmer. These people are comming out of the C world, and they're not really comfortable with all these new-fangled stuff. Often times, there code will compile and run perfectly fine on a straight C compiler. Most of the rest use the occasional class or STL library, but not often. You'd have to change maybe 1% of the code to get it to compile as C. The red flag that you're dealing with a C-with-classes programmers is when you can do:
#define class struct
and everything still pretty much works.

Next up is the Java++ programmer. These are people comming out of the Java, Python, Ruby, etc. world. And they are still programming in said languages. They use objects a lot, but avoid using features not in these languages, like pointer arithmetic, templates, operator overloading, etc. The red flag that you're dealing with a Java++ programmer is when you start seeing classes with only virtual member functions all over the place (that's how you implement an interface, see...). Also, a tendancy to allocate everything on the heap and unbox almost nothing is also a common warning sign.

Lastly, there is the he-man C++ programmer. Generally, these people don't really know another language- several of them may have learned Pascal or Scheme back in college, but they're "grown-ups" using "grown-up languages" now (just ask them), but an increasing number of them learned C++ as their first language and never moved on. These programmers have memorized everything Bjarne Stroustrup ever wrote, and are out to prove it. The red flag that you're dealing with a he-man C++ programmer is that you find yourself having to look up what a private const static virtual constructor means, because that's critical to understanding why the code even works. He-man C++ programmers love template metaprogramming, which is possibly the most damning thing I can say about them. He-man C++ programmers never, ever eat quiche.

### In my experience, there is a

In my experience, there is a fourth category: Those who wish they were in the third, but don't know enough C++ to pull it off. Warning sign: writing libraries that replicate existing libraries to augment C++, or writing libraries to duplicate functionality alreay present in the standard language, because they don't know about it.

In fairness, though, plenty of C++ code uses a lot of classes, as it is written by people who have graduated beyond the first category, but have yet to discover "advanced" features like STL, templates, or operator overloading in any meaningful way, or don't really need it that much, to grind out code.

There's a natural tendancy to assume I listed the classes from least advanced to most advanced. Well, OK- the c-with-classes folks generally really don't know C++. But I tend to fall into the Java++ class, despite knowing all about templates, operator overloading, etc. Simply because a language provides a feature doesn't mean you should use it at every opportunity- as the IOCCC shows. And while the Java++ programmers may be ignorant of the more obscure corners of the languages, He-man C++ programmers are just as commonly (in my experience) ignorant of Design Patterns. I've seen He-man C++ programmers be completely baffled by the code written by Java++ programmers. Not that they didn't understand the mechanics of how the code worked, but because they didn't understand the metaphors and patterns the Java++ programmer was using.

### You do know what quotes signi

You do know what quotes signify, right?

### Yes

And I meant them.

Simply because you're up on the syntax and semantics of C++ doesn't necessarily make you a superior programmer over the guy who just got introduced to the language a week ago.

### No, But...

...like money buying you happiness, it certainly helps a lot!

### I meant in English. Simply be

I meant in English. Simply because you can think up snide asides doesn't make you more articulate.

### stereotyping is not very constructive, and...

The red flag that you're dealing with a C-with-classes programmers is when you can do:
#define class struct
and everything still pretty much works.

by language definition every c++ program works just as well with the above macro applied. have a look at the c++ standard to figure out why...

you should at least support your overgeneralizing critique by providing correct accusations.

in my personal experience i have met just as many close-minded and/or incompetent c++ programmers as i have met such persons in most other language community of your choice. attitude and skill is usually not directly related to programming language preference.

### Look at the wookie!

Disclaimer: following honored traditions pioneered at Slashdot, I haven't read the article being referred to. But I'm going to rip into it anyway. :)

If you have spent many years mastering a particular "way of thinking about programming", you probably aren't going to want to give it up on short notice.

No one wants to go from being a doctor or engineer in his "native" PL to being a taxi driver in someone else's.

I think a lot of technological "religion" flows from exactly this kind of learning opportunity cost.

True, and it results in a pretty irrational argument on the part of the authors of this article.

On the one hand, the article implicitly demonstrates the benefits of considering multiple approaches to solving a problem, in that by doing so, we can find solutions that meet a desired balance between e.g. expressivity and performance. That's how they achieved their new solution, after all.

OTOH, the implied conclusion is (apparently) that a major part of the possible solution space - i.e. functional solutions - should be excluded from consideration in general.

The reason for excluding such solutions boils down to the fact that they've previously been excluded, which is why something as simple as a basic recursive solution to a problem is seen as a "totally different way of thinking about programming". The reasoning is circular (dare I say recursive?) There's no underlying logic here, other than the need to defend a knowledge gap that exists for historical reasons. The same argument could have been made against using STL-based code in the first place.

To hide this weak argument, they've resorted to major obfuscation — conflating issues arising from different paradigms, different languages, different algorithms, differing amounts of library support, and implied issues like side-effect free programming, all without making any attempt to tease the issues apart. To top it all off, we are to believe that this single, highly confused example has some bearing on the general case.

All in all, we're dealing with the Chewbacca defense.

### another quote from the article's Conlusion

the implied conclusion is (apparently) that a major part of the possible solution space - i.e. functional solutions - should be excluded from consideration in general.

I don't believe this is a fair characterization of K & M's conclusion. Let me give you another out-of-context quote from the same piece:

It is not essential to master every detail of every tool that you might ever need to use. What is essential is knowing that the tools exist, being open to the possibility of using them, and being willing to learn to how to use them when they are likely to be useful.

Thanks for Chewbacca Defense link by the way. I've been using the term ever since I saw the South Park episode in which this notion was introduced. I didn't realize, however, that the term had gained such wide recognition.

Koenig & Moo wrote:
It is not essential to master every detail of every tool that you might ever need to use. What is essential is knowing that the tools exist, being open to the possibility of using them, and being willing to learn to how to use them when they are likely to be useful.

I agree with this point in general, but it doesn't save the authors in this case. If someone is at a point where FP seems to require learning "a totally different way of thinking about programming", they're not going to meaningfully be able to keep that as a readily available option in their toolbox. Suddenly learning all about FP right at the point when it is "likely to be useful" isn't a great strategy. In any case, the comments you quoted do seem to imply something about the authors' opinion on the merits of investing in learning that different way of programming.

A big problem I have with all of this is what, specifically, the bogeyman is here — what are the elements of the "different way of thinking" they see as being required for the FP solution to this particular problem.

Presumably the problem isn't Haskell syntax in general, since that's a specific language issue, rather than anything intrinsic to the notion of FP. You could implement the same sort of code in C++.

If the problem is recursion, then what we're really talking about is the fact that mainstream languages have done a poor job of implementing recursion, primarily because they didn't know any better at the time. So while recursive solutions may be a totally different way of thinking about programming for some people, that's only because they've been Sapir-Whorfed into partial ignorance by the languages they use. The "different way of thinking" that is required here actually involves unlearning old limitations on thinking, and the only possibly valid argument against doing so is if the language you're using still doesn't support recursion well.

I can't imagine that the pattern matching is the issue — non-FP languages have that too, Python comes to mind. The lack of mutation in Haskell might be more of a concern, but that's somewhat orthogonal to real-world FP, and it's not news to anyone that mutation can have convenience and performance advantages.

In all these cases, the real issues involve learning principles which are more general, more widely applicable, than their specific expression in any single language, particularly mainstream languages. I would hope that someone who's actually teaching students would recognize this.

### Myths and ideology in programming

Here is how my girlfriend, a literary critic, would analyze this.

According to Roland Barthes, a myth is an artificial cultural artifact which is presented as timeless, universal and natural. You can find examples in nationalism, advertising and media. Judging from the quoted text of the article, it sounds like the authors are using myths to reinforce a power structure where pedants and celebrities exert control over programmers and users.

The whole "different way of thinking" thing is part of the imperative ideology. It's a bogeyman playing on xenophobia that is used to make people uncomfortable about competing paradigms, and keep them in the fold of a community (culture) where what they read and hear can be controlled and censored.

They write that the FP approach is "straightforward -- but requires a totally different way of thinking about programming." Hm? It is both straightforward and "totally" alien? This borders on contradiction. Indeed, if a person can grasp the FP solution without having done any actual functional programming, just by reading a short article, and moreover regard it as "straightforward", one can hardly claim it is a "totally different way of thinking". Even if it is totally different from imperative thinking, whatever that is, it may not be incompatible with other sorts of thinking habits which their audience might broadly share. After all, we all get taught about equations, functions and, to some extent, recursion in high school. (At least factorial and the chain rule.) How alien can it be?

To be fair, though, there is also an FP (and LP) myth that because FP is "based on" mathematics it is natural and hence superior to procedural and OO paradigms. But mathematics is a cultural and historical artifact. I do not mean that mathematical truths are relative; I mean that the body of mathematical literature has taken a form which is a historical accident. For example, mathematics is often divided into geometry, analysis and algebra, but this could as easily have been geometry, analysis and coalgebra, in which case OOP advocates could have exploited this myth, OO being rather coalgebraic in nature.

Don't get me wrong. I think that there is a genuinely closer relationship between mathematical truth and FP, because FP has been analyzed in a thousand different ways in scientific literature, and FP languages tend to come with a semantics which is moreover effective for reasoning about programs. And I regard this as an advantage, because of the opportunity cost argument. But the "FP is more pure" (= natural) argument is bogus.

### Myths and ideology in mathematics

But mathematics is a cultural and historical artifact. I do not mean that mathematical truths are relative

This reminds me of Proofs and Refutations by Imre Lakatos, a highly influential work in philosophy of mathematics. Since I haven't reread it in more than five years, my recollection may be faulty. (You can double-check it by following the above Amazon link which gives you access to the entire scanned book.)

I believe Lakatos would strongly agree that "mathematics is a cultural and historical artifact". I think he goes a little further and argues that mathematical truths are, in fact, relative. The above Wikipedia link gives a fairly good summary of his views.

Here's a quote from the book found by searching "inside the book" for absolute truth. It's from the footnote that spans pages 54 and 55:

'Nature confutes the sceptics, reason confutes the dogmatists' (Pascal, [1659], pp. 1206-7). Few mathematicians would confess [...] that reason is too weak to justify itself. Most of them adopt some brand of dogmatism, historicism or confused pragmatism and remain curiously blind to its untenability; for example. 'Mathematical truths are in fact the prototype of the completely incontestable... But the rigor of maths is not absolute; it is in a process of continual development; the principles of maths have not congealed once and for all but have a life of their own and may even be the subject of scientific quarrels.' (A.D.Aleksandrov [1956], p.7) (This quotation may remind us that dialectic tries to account for change without using criticism: truths are 'in continual development' but always 'completely incontestable'.)

### from Lakatos back to Hamming

On Apr 7, 2005, I wrote:

This reminds me of Proofs and Refutations by Imre Lakatos

Interestingly, this brings us back to Hamming. I've been reading some of the papers mentioned in a recent new forum topic posted by Charles Stewart on Apr 17, 2005. Specifically, R.W. Hamming's essay The Unreasonable Effectiveness of Mathematics makes it clear that he was deeply influenced by the aforementioned work of Imre Lakatos.

A couple of quotes from the essay:

It is claimed that an ex-editor of Mathematical Reviews once said that over half of the new theorems published these days are essentially true though the published proofs are false. How can this be if mathematics is the rigorous deduction of theorems from assumed postulates and earlier results? Well, it is obvious to anyone who is not blinded by authority that mathematics is not what the elementary teachers said it was. It is clearly something else.

...

Indeed it seems to me: The Postulates of Mathematics Were Not on the Stone Tablets that Moses Brought Down from Mt. Sinai.

...

For over thirty years I have been making the remark that if you came into my office and showed me a proof that Cauchy's theorem was false I would be very interested, but I believe that in the final analysis we would alter the assumptions until the theorem was true. Thus there are many results in mathematics that are independent of the assumptions and the proof.

### That reminds me

I had meant to respond to your earlier post. I simply want to say, that anyone who hasn't read "Proofs and Refutations" should (even if you aren't particularly mathematically inclined). It's enjoyable and quite good.

### Whose myth?

What's missing in Your analysis is that claiming that FP employs a completely "different way of thinking" is not a myth generated by the imperative crowd originally but by the FP crowd itself. What happend is turning an optimistic revolutionary attitude towards programming ( "forget everything you know about C .." or "you will have a hard time to unlearn but it will be worthwhile in the end .." [1] ) into a conservative and defensive reaction ( "you don't need a completely different way of thinking to get the job done .. "). The authors simply play back the ideological distinction to their originators. The message always arrives it's receiver.

Kay

[1] "Think different" is/was an optimistic slogan of Apple Corp.

### And if it's recursion...

How can you be a good programmer if you think that recursion is a "new way of thinking"? Good programmers understand recursion, even if they write mostly for loops.

### Nth languages

I should point out that, though I don't share that PL conservative outlook, I don't think it is unreasonable either. If you have spent many years mastering a particular "way of thinking about programming", you probably aren't going to want to give it up on short notice.

I do find this somewhat unreasonable. It should not take years for an experienced programmer to master or at the very least become proficient in a new programming language even if there is a paradigm shift. And as one learns more languages it takes even less time (and it takes even less time without a paradigm shift). And it's not like you have to drop everything and hide away in a cave while you're learning.

Finally, this example alone is a pretty good example of why knowing multiple paradigms/languages is useful, and there are heaps of FP techniques being applied in OO languages nowadays providing another good incentive (at least for FP). Learning other paradigms/languages is not a waste of time.

The Haskell code irks me; it is more verbose than it needs to be. I assume this was done to make it more recognizable to non-Haskell (read C++) programmers.

scale n = map (n*)
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys)
| x == y    = x : merge xs ys
| x 

### Unilingual Perquisites

I do find this somewhat unreasonable. It should not take years for an experienced programmer to master or at the very least become proficient in a new programming language even if there is a paradigm shift.

How many years (decades) did it take OO to be understood by mainstream programmers? Many still don't.

Just because some of us are good at this kind of thing shouldn't lead us to assume that most otherwise skilled programmers should be too.

From a sociological point of view, I think you skipped over the pithier part of my post:

No one wants to go from being a doctor or engineer in his "native" PL to being a taxi driver in someone else's.

The psychological benefits of sticking to what you know well are a pretty strong motivator.

Like an English speaker in any part of the world where English is widely spoken as a second language, the path of least resistance is always to revert to English when the going gets rough.

### Professionalism

How many years (decades) did it take OO to be understood by mainstream programmers? Many still don't.

I'm not talking about the mainstream, I'm talking about individuals, particularly professionals (more the connotations of the word rather than the denotation).

Just because some of us are good at this kind of thing shouldn't lead us to assume that most otherwise skilled programmers should be too.

I have a feeling most people who are "good at this kind of thing" weren't born that way, have many languages under their belts, and found there first or second language took more time than the latter ones. In a nutshell, they are "good at this" because they have practiced it!

No one wants to go from being a doctor or engineer in his "native" PL to being a taxi driver in someone else's.

Part of my intent is that one isn't going to be a "taxi driver" for long. Further, if such people are really so insecure, they can keep it to themselves until they feel comfortable.

The psychological benefits of sticking to what you know well are a pretty strong motivator.

Like an English speaker in any part of the world where English is widely spoken as a second language, the path of least resistance is always to revert to English when the going gets rough.

Yep, this and the above are good reasons not to ever learn anything new at all!

I guess I should've stopped working at guitar, when I was feeling so clumsy trying to change chords smoothly or stopped drawing after circular filing some less-than-successful attempts, and there certainly were those CT papers that gave me a niced glazed look. Sure, I wouldn't be able to play guitar, draw, or know what a colimit is now, but at least I wouldn't have felt uncomfortable!

### Pragmatism

Yep, this and the above are good reasons not to ever learn anything new at all!

There is a difference between advocating a position and observing its power "on the ground".

If we want to advocate new ways of doing programming, we have to confront the reasons why they are resisted.

If we don't care if the benighted wallow in ignorance until their dying day, well, we could ignore them (and we wouldn't be discussing this article at all.)

It is easier to change people's behaviour than to change their values. If we want them to adopt our favorite paradigm, we have to address the reasons they resist it.

If we don't care about that, we shouldn't get so mad that they resist it. ;-)

### Just a thought about target audiences.

Wonders whether teaching any subject, requires the decision of whether to teach to the bottom, middle, or top of the class. Those that teach to the top are accused of elitism. Those that teach to the bottom are accused of dumbing down. And those that teach to the middle are accused of promoting mediocrity.

[edit note: And I'd prefer LtU to be accessible to all three at the same time]

[edit edit note: Ok, so we might not want the absolute dregs] :-)

### Different Class

Not only that, but some of us are sitting in the Advanced Category Theory seminar while others are attending Object Oriented Programming With Java 101.

I personally want to see some Ruby-on-Rails-style tutorials for web programming with Haskell...

### Old school

Being one who's mid-tier but hangs around smart people hoping some of that knowledge will rub off, I think it's important for the best and brightest to reach down (as well as up). It helps to touch bases with the basics periodically to help solidify the more abstract ideas (a dose of reality) and helps bring in new aspirants to mix up the field of ideas.

Of course, the alternative is to put the Oleg's and Noel's in a room by themselves, but they'd probably get bored with each other sooner or later. :-)

### My approach

Or you create a community site like LtU, where demi-gods can share their wisdom... I really suggest people use this opportunity (e.g., Oleg's paper about delimited continuations currently being discussed).

### Conciseness

The Haskell code irks me; it is more verbose than it needs to be.

Perhaps then you should also drop the matches on nil in merge.

scale n = map (n*)
merge (x:xs) (y:ys)
| x == y    = x : merge xs ys
| x <  y    = x : merge xs (y:ys)
| otherwise = y : merge (x:xs) ys

seq = 1 : merge (scale 2 seq)
(merge (scale 3 seq) (scale 5 seq))


### Ironic

The irony is that the original definition of scale will only work on infinite lists whereas my version will work on finite or infinite lists, however the original version of merge would work on finite lists, but this version won't. It's a bit of an odd inconsistency in the original code.

### IMHO the version of each func

IMHO the version of each function that works on finite lists is to be preferred, simply because that is least surprising. That said, a comment that as all lists inside the programme are infinite, termination conditions are not needed would probably remove the element of surprise.

### Apples to oranges

The Haskell implementation is complete self-contained solution. It requires no reference to libraries. The C++ is simply leveraging off behavior provided by the template library. In terms of simplicity, the haskell solution just requires you learn that languages syntax, of which this example is a pretty plain jane example. For the C++, you have to learn it's immense syntax, figure out how it's used here, and then dive into the template library and figure out what to do there. And we are talking about a math problem here.

### Default Assumptions

Chris Rathman: For the C++, you have to learn it's immense syntax, figure out how it's used here, and then dive into the template library and figure out what to do there.

I think Koenig and Moo's assumption is that a modern C++ programmer has done that, and therefore there won't be any mystery to their C++ solution, whereas even once you explain the Haskell syntax, the approach of pattern matching on algebraic data structures is still alien.

Of course, you still have a point; it's not the least bit alien to anyone conversant in any of the ML family, Haskell, Oz, Concurrent Clean...

The topic is a pertinent one to me. As I've written before, my day job involves a large-ish body of C++ written aggressively in the Microsoft MFC style, and I'm slowly attempting to untangle it and make it more idiomatic in the context of what it does. Since a lot of what it does is deal with containers, moving more to functionalism and the use of exceptions feels very natural to me, but folks who don't know the STL and boost::bind reasonably well find some of my code weird. So even within the C++ community, there's a knowledge/stylistic-preference gap that has to be addressed.

### Wrong assumptions

One of the first languages I looked at after C and Java was OCaml (I read part of SICP before). Pattern matching algebraic data structures was the most natural thing. I immediately started hating C dialects for the (needless) lack thereof.

Often ML code is clearer than the infinitely tangled data and control flow that usually ends up in any non-trivial Java program, with all kinds of objects calling each other. No wonder that often a bug ends up somewhere inbetween!

That said, I like imperative features.

### "Language" vs "Library" is often an artificial distinction.

Does it matter much that lists in ML are an "intrinsic" datatype, whereas std::set is part of the library (and is capable itself of being written in "plain" C++)? In my mind, no.

There are many good arguments to use in language flamewars; the language-vs-library argument isn't one of them. After all, if C++ had sets as an intrinsic datatype, that would just be further grist for the "it's a bloated langauge" mill. :)

Any competent C++ programmer should be familiar with the STL. That said, there are many who aren't--who still use it as little more than an OO extension to C.

### It's a question of the amount of information conveyed

The Haskell example captures the abstraction in very large part. The C++ example shows a recipe for using the library to create a solution. There is an impedance mismatch between the depth of information the Haskell code conveys and what the C++ code conveys - being at once both a higher and a lower level of abstraction for the Haskell code.

Sure, libraries level the playing field. For 99.99% of the population that uses such things, the example given will just be a recipe to be copied and pasted. Either solution could be put in a library and added to the next generation of HQ9++ as a single character N (H is already taken).

The traditional mathematician recognizes and appreciates mathematical elegance when he sees it. I propose to go one step further, and to consider elegance an essential ingredient of mathematics: if it is clumsy, it is not mathematics.
- Edsger Dijkstra

### Lists in ML; C++ templates

aren't part of the language. They are pattern-matched like everything else.

Lists in ML, just like strings in Java, have an extra built-in syntax, though, because their use is so common.

To C++: the best features of that language are that is makes some common C idioms easier, adds some typechecking and gives inheritance (which is overused, though).

OTOH, I still use C in preference, since C++'s assumptions to memory layout (variable sized structs aren't really possible as classes) are restrictive. Templates are the worst thing since unsliced bread, requiring everything the code to be compiled for EVERY freaking instantiation! In C I just put a void pointer in a structure, no need to compiled it 50 times. I don't use the STL&C++ for that reason.

### Builtin syntax vs. standard library? Doesn't matter

Regarding the simplicity of a program, it doesn't matter which part of the language it uses is built into the language and which is a part of the standard library.

Actually if more is done in a library then it's a sign of a more flexible language. Since libraries can be added but compilers cannot be easily extended, it means that a random other problem has a higher chance of admitting a clean solution: even if it wasn't considered by language designers, it can be solved by library writers later.

If you say that it's unfair because it makes languages with lots of libraries appear more powerful than those without, then I reply that they appear more powerful because they are. Libraries matter.

A ready library might be less universal than the ability to implement a solution from scratch, but it definitely allows the task to be solved much easier when it applies.

### Sign of a flexible language?

Actually if more is done in a library then it's a sign of a more flexible language.

The argument would hold more weight had the template library not required the C++ compiler to be majorly modified to get them to work.

More generally though, there are lots of smart programmers out there who can make libraries out of even the worst languages. The availability of libraries has much less to do with the quality of the
language and more to do with the shear numbers of people using the language to various ends.

Anyhow, my argument isn't about how simple it is to do this in C++ vs. Haskell. It's about the mismatch between the code samples. The C++ code has much of the detail hiding behind the library whereas the Haskell code is just fairly straight forward from that languages perspective. And just because the sample Haskell code does not use a library doesn't mean that it couldn't be implemented as a library for sequences.

There are a lot of things which are simpler in C++ but I fancy that infinite algebraic series is not among the top.

### Of course... a language should support some basic set

of features.

You mention C++ templates (more generically, generic polymorphism). You are correct; implementing the STL--or anything like it--required this feature be present in the language; Java is now undergoing a similar evolution.

No argument there.

And as others have pointed out, the advantage that Haskell has over C++ in this instance is lazy evaluation and lazy data structures. One could port the Haskell algorithm (and it's O(n) performance) to C++, but the code would be an ugly hodgepodge of roll-your-own thunks and other assorted hackery, needed to emulate lazy evaluation in an otherwise eager language. (Think of a Scheme implementation of the same algorithm, then convert the uses of delay/force one would find into equivalent C++ machinery).

However, comparisons of the C++ and Haskell code above are comparing apples to oranges; they are different algorithms. One uses lazy evaluation of lists to select elements from an ever-expanding "frontier" of possible choices; the other uses an ordered associative container (a STL "set") to essentially sort the elements. The behavior of std::set is well-known to any C++ programmer (including performance characteristics, which are documented), and the meaning of the above code fragment is clear as day. Just as the meaning of the Haskell snippet is obvious to a competent Haskell programmer. Both samples are likely to be difficult to read for someone not conversant in the language.

Regarding "hiding detail"--where I come from, that's considered a good thing. I don't want to know (or care) that std::set is implemented using a red/black tree, for instance... such details are not germaine to the problem at hand. What salient detail did you think the C++ code obscured?

### Guess I could be defensive to the bitter end

Of such intransigence are language wars made of. But I'll bow out under the premise that I glean more information from the Haskell code than the C++ code - and yes that could very well be because I've been working more with Hindley-Milner type systems of late and haven't worked much with C++ Templates.

But as you state, they are different algorithms, so I win my argument anyhow, even if I had to totally change my argument in the process. :-)

### totally different?

Where the article presents "The Functional Solution" it is quite reasonable, introducing lazy lists gently, and ending with "[the definition of seq] is no more magical than our earlier example". The comment about a "totally different way of thinking" is not supported in the article, but it's surely intended to set up the C++ solution as preferable. Of course, it's the FP advocates who continually assert the difference.

The primary difference in the functional program is that it has a name for the infinite list that is the solution. The alternatives provide a data structure where the solution is accumulated, which to my eyes appears unfinished. The "idiomatic" solution generates the numbers OK, but does not provide an idiomatic way to access them, i.e. via an iterator. The article's conclusion is flawed, since its solution amounts to a fair demonstration of a hole in C++'s expressivity.

What is the full C++ listing? The C++ listing given assembles an ordered set containing the numbers 1,2,3,5.

### for the reader to complete

The OP contains all of the code from the article, and http://www.cuj.com/code/ doesn't have it yet. (And if you guess the name of the zip file for the April code, the code from the Koenig/Moo article isn't in that file, either.)

OP?

### An attempt at a full C++ version

Below is an attempt at a full C++ version. It differs from the haskell version at the 38th number.

(edit) Initially I was confused as to why this is; I've realised that this is because of the order in which the numbers are produced. Increasing the 20 to 60, for example, causes the lists to agree in the 38th number. One might suggest that the haskell version has comprehensibility benefits.

#include <set>
#include <iostream>
using namespace std;
int main()
{
set<int> seq;
seq.insert(1);
set<int>::const_iterator it = seq.begin();

for(int i = 0; i < 20; ++i)
{
int val = *it;
seq.insert(val * 2);
seq.insert(val * 3);
seq.insert(val * 5);
it++;
}

it = seq.begin();
std::cout << *(it++);
for(; it != seq.end(); ++it)
std::cout << "," <<  *it;
std::cout << std::endl;
return 0;
}


### a deficiency in the C++ implementation

It differs from the haskell version at the 38th number. Anyone have an idea why?

Because you haven't computed the 38th number yet. You execute the body of the loop 20 times, generating the following 41 numbers:

1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36, 40, 45,
48, 50, 54, 60, 64, 72, 75, 80, 81, 90, 96, 100, 108, 120, 125, 135, 150, 160,
180


135 is (incorrectly) followed by 150.

If you iterate 30 times — instead of 20 — your program generates the following 56 numbers:

1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36, 40, 45,
48, 50, 54, 60, 64, 72, 75, 80, 81, 90, 96, 100, 108, 120, 125, 128, 135, 144,
150, 160, 162, 180, 192, 200, 216, 225, 240, 250, 270, 300, 320, 360, 375, 400


135 is now (correctly) followed by 144.

So, I think your example highlights a deficiency in the "idiomatic C++" implementation. There is no easy way to tell how many insertions one needs to perform in order to compute the Nth Hamming number correctly. Thanks for pointing this out.

P.S. I found this on the python list: Classical FP problem in python : Hamming problem (alternatively, see the entire thread)

### I can't believe I wrote this...

There is no easy way to tell how many insertions one needs to perform in order to compute the Nth Hamming number correctly.

On the other hand, by simply iterating N times, you do at most O(1) too much (amortised) work.

This suggests the following implementation, which is a little closer to the Haskell version. Plus, it's about as short as the original C++ version if you ignore includes and typedefs.

#include <iostream>
#include <set>
#include <algorithm>
#include <boost/iterator/transform_iterator.hpp>

using namespace std;
using namespace boost;

typedef binder1st<multiplies<int> > scale_type;
typedef transform_iterator<scale_type, set<int>::iterator> scale_iterator;

main()
{
set<int> s;
s.insert(1);

scale_iterator it2(s.begin(), bind1st(multiplies<int>(), 2));
scale_iterator it3(s.begin(), bind1st(multiplies<int>(), 3));
scale_iterator it5(s.begin(), bind1st(multiplies<int>(), 5));
set<int>::iterator it = s.begin();

for (int i = 0; i < 40; ++i)
{
s.insert(*it2++);
s.insert(*it3++);
s.insert(*it5++);
if (i > 0) { std::cout << ", "; }
std::cout << *it++;
}
std::cout << '\n';
}


This is almost the Haskell version. To do the Haskell version properly, and have constant time access, you need to implement merge. Here are the gory details, courtesy of Boost. Don't read if you're squeamish...

#include <iostream>
#include <list>
#include <algorithm>
#include <boost/iterator/transform_iterator.hpp>

using namespace std;
using namespace boost;

typedef binder1st<multiplies<int> > scale_type;
typedef transform_iterator<scale_type, list<int>::iterator> scale_iterator;

template<typename It1, typename It2>
class merge_iterator
merge_iterator<It1, It2>
, const int
, incrementable_traversal_tag
>
{
public:
merge_iterator(It1 p_it1, It2 p_it2)
: m_it1(p_it1), m_it2(p_it2)
{
}

private:
friend class iterator_core_access;

void increment()
{
int x1 = *m_it1;
int x2 = *m_it2;
if (x1 <= x2) { ++m_it1; }
if (x1 >= x2) { ++m_it2; }
}

const int& dereference() const
{
return std::min(*m_it1, *m_it2);
}

It1 m_it1;
It2 m_it2;
};

main()
{
list<int> s;
s.push_back(1);

std::cout << 1;

scale_iterator it2(s.begin(), bind1st(multiplies<int>(), 2));
scale_iterator it3(s.begin(), bind1st(multiplies<int>(), 3));
scale_iterator it5(s.begin(), bind1st(multiplies<int>(), 5));
merge_iterator<scale_iterator,
merge_iterator<scale_iterator, scale_iterator> >
it(it2, merge_iterator<scale_iterator, scale_iterator>(it3, it5));

for (int i = 1; i < 40; ++i, ++it)
{
int x = *it;
std::cout << ", " << x;
s.push_back(x);
}
std::cout << '\n';
}


It's a lot less ugly than I thought it would be.

Interestingly, the hardest part that I found was getting the typedefs and other type declarations right. The actual code was easy. Especially since I ignored termination conditions on the input iterators.

One thing which isn't covered here, of course, is that the Hamming number problem is a very artificial problem in the first place.

### I note that the "full haskell

I note that the "full haskell C++" version does require a lot more boilerplate and declaration than the haskell version, which does rather obscure its meaning, at least at first glance. The same is not true of the "semi-haskell" version I feel, which is probably clearer than either of my versions.

### To be fair...

It's true that the "full Haskell C++" version has more boilerplate, but in fairness, I used a pile driver to crack an egg: hamming_iterator is a fully conforming proposed-standard iterator, which is arguably overkill.

In writing it, I found that the declarations were the most tricky parts. The boilerplate (and it's not clear that it's all boilerplate; it's there because C++ iterators can do things that Haskell lazy lists can't, such as allow clients to modify what they point to if the container allows it) was fairly mechanical, didn't take long, and worked first time more or less.

I agree with Scott Turner that I'd rather do a recreational programming problem like this in Haskell. On the other hand, I learned stuff doing it this way in C++.

I think the most disturbing remark in the original article is that the Haskell solution "requires a totally different way of thinking about programming". If you're a professional software engineer on a deadline, and you can see a solution that's easy for you to understand and good enough, then you should do it. Your time is valuable.

In this case, though, we're talking about Hamming numbers. Why you would bother with such a simple, artificial, recreational programming problem if you weren't trying to learn something?

### C++ still not comparable to Haskell

That matches the efficiency of the Haskell, but not the expressiveness. The Haskell version has a name 'seq' which embodies the infinite sequence of Hamming numbers. Very straightforward. In C++ that should be represented by an iterator. In the above programs, neither s.begin() nor 'it' is the complete answer.

Below is the implementation an iterator class hamming which represents the infinite sequence of numbers.

[includes and typedefs just like "almost Haskell" version above]

set<int> set_of_one () {
set<int> s;
s.insert(1);
return s;
}

class hamming
hamming
, const int
, incrementable_traversal_tag
>
{
public:
hamming ()
: s(set_of_one()),
it2(s.begin(), bind1st(multiplies<int>(), 2)),
it3(s.begin(), bind1st(multiplies<int>(), 3)),
it5(s.begin(), bind1st(multiplies<int>(), 5))
{
it = s.begin();
}

private:
friend class iterator_core_access;

void increment()
{
s.insert(*it2); it2++; // Fix bug affecting the first increment.
s.insert(*it3++);
s.insert(*it5++);
it++;
}

const int& dereference() const
{
return *it;
}

set<int> s;
set<int>::iterator it;
scale_iterator it2;
scale_iterator it3;
scale_iterator it5;
};


Fixing the bug in the "almost Haskell" version, and figuring out a way to initialize everything was a firm reminder of why I do my recreational programming in Haskell. It's not just that the C++ version is more verbose; it's more complex and error-prone. A class like hamming can be useful, and "a whole new way of thinking" is not necessary to understand that. Agreed, the problem is not routine, but it remains a good example of the merits of FP.

### One small thing...

I carefully used list rather than set in my "full Haskell" version. The difference is O(n log n) vs O(n) time.

### Another modification

I should mention that the following version behaves slightly more intuitively:

#include <set>
#include <iostream>
using namespace std;
int main()
{
set<int> seq;
seq.insert(1);
set<int>::const_iterator it = seq.begin();

for(int i = 0; i ::const_iterator it2 = seq.begin();
std::cout 

### python version

I should mention that the following version behaves slightly more intuitively:

Right, we end up generating an ordered set of 72 numbers (1, 2, ..., 675, 720) of which only the first 40 can be trivially proven to be consecutive Hamming numbers. The rest of the sequence contains gaps.

For archival purposes, here's the Python version:

# http://mail.python.org/pipermail/python-list/2005-January/262118.html
# http://mail.python.org/pipermail/python-list/2005-January/262277.html
# http://mail.python.org/pipermail/python-list/2005-January/262292.html

# Requires Python 2.4

from itertools import tee, imap

def imerge(xs, ys):
x = xs.next()
y = ys.next()
while True:
if x == y:
yield x
x = xs.next()
y = ys.next()
elif x < y:
yield x
x = xs.next()
else:
yield y
y = ys.next()

def hamming():
def _hamming():
yield 1
for n in imerge(imap(lambda h: 2*h, hamming2),
imerge(imap(lambda h: 3*h, hamming3),
imap(lambda h: 5*h, hamming5))):
yield n
hamming2, hamming3, hamming5, result = tee(_hamming(), 4)
return result

for i in hamming():
print i,


### The version you replied to pr

The version you replied to printed out the first 40 hamming numbers correctly. Replacing 40 in the programme with n would result in the first n hamming numbers being printed. For the record, I guessed that the modification would rectify the problem, rather than reasoning about the programme.

### zerg

I may be reading too much into this quote, but it sounds to me like Koenig and Moo consider it a bad thing to require a "totally different way of thinking about programming".

In my Programming Paradigms & Foundations class w/ Dr. Koenig, so far we've covered Algol, Smalltalk, Icon, APL and Simula. We've tried as much as possible to compare language features to Java, C++, Scheme and Python. We've talked about Djikstra's structured programming and next week we will talk about John Backus' "Can Programming Be Liberated from the von Neumann Style?"

In no way at any time have I gotten the impression that he would think that "a totally different way of thinking about programming" would be a bad thing. This includes the first lecture, which was more or less the article he & his wife published in the 2005 April CUJ.

### Dodo-egg Omelette

Algol, Smalltalk, Icon, APL and Simula... Java, C++, Scheme and Python

Pardon me for being underwhelmed; only one unambiguously FP language in the bunch, and heavily weighted to PLs that predate C++ (and which thereby could have had their features co-opted or rejected during the development of C++.)

Everybody thinks they're open-minded; they just have a few things they think are too weird to consider seriously. ;-)

### Well, I believe he has used A

Well, I believe he has used Algol & APL during his work @ AT&T, but I see what you're saying. Still, I'm not convinced that he's outright rejected functional programming and I don't see his article as a warning to stay away from FP... After all, it is called C/C++ Users Journal, and not Haskell Users Journal...

w/ respect to your post's title, I believe you will only pry Squeak from its users' cold, dead hands. :P Also, I'm pretty sure Icon is where Python got its generators from (I've been wrong about this sort of thing before).

(Is there such a thing as HUJ?)

### Generators

Also, I'm pretty sure Icon is where Python got its generators from

You're certainly OK on this point. It should be noted though that while the concept of generators from Icon made it into Python (largely I believe due to Tim Peters), the way in which generators integrate into the language is entirely different. Icon's underlying expression language is massively different to Python's, and generators are a natural fit into Icon, whereas in Python they're rather restricted. Some well known Icon people feel that Python's generators don't compare to Icon's and I think they have a point.

### Koenig and FP

For what it is worth:

"In his 35+ years as a programmer, Andrew Koenig has written an online student registration system in APL, a software distribution system in a mixture of C and C++, pattern matching libraries in C++, ML..."

Perhaps of more interest:
"An anecdote about ML type inference" By Andrew Koenig
http://www.usenix.org/publications/library/proceedings/vhll/full_papers/koenig.a

### Logic paradigm solution for Hamming numbers

 stratify hamming(N) by [N,hamming]. stratify print(N) by [N,print]. stratify hamming << print. hamming(1). hamming(New) <-- hamming(Old), New is Old*2, New < 100000. hamming(New) <-- hamming(Old), New is Old*3, New < 100000. hamming(New) <-- hamming(Old), New is Old*5, New < 100000. print(Num) <-- hamming(Num). 

via Starlog.

### a Prolog solution

The Starlog solution looks much neater than, say,

/*
http://lml.ls.fi.upm.es/~jjmoreno/jjpapers/expr_imp.ps
*/

generate(dif(Twos, [V2|L2]), dif(Threes, [V3|L3]), dif(Fives, [V5|L5])) :-
select(Twos, Threes, Fives, V),
dequeue(Twos, Twos1, V),
dequeue(Threes, Threes1, V),
dequeue(Fives, Fives1, V),
V2 is 2*V,
V3 is 3*V,
V5 is 5*V,
write(V),  nl,
generate(dif(Twos1, L2), dif(Threes1, L3), dif(Fives1, L5)).

select([X1|_L1], [X2|_L2], [X3|_L3], X1) :- X1 =< X2, X1 =< X3.
select([X1|_L1], [X2|_L2], [X3|_L3], X2) :- X2 < X1, X2 =< X3.
select([X1|_L1], [X2|_L2], [X3|_L3], X3) :- X3 < X1, X3 < X2.

dequeue([X|L], L, X).
dequeue([Y|L], [Y|L], X) :- X \= Y.

hamming :- generate(dif([1|L2], L2), dif([1|L3], L3), dif([1|L5], L5)).


hamming = 1 : map (*2) hamming # map (*3) hamming # map (*5) hamming
where xxs@(x:xs) # yys@(y:ys)
| x==y = x : xs#ys
| x<y  = x : xs#yys
| x>y  = y : xxs#ys


### Number signs?

xs#ys

Are those number signs supposed to be "++"?

### Local infix operator # (merge)

No, # is the infix operator that's defined as a local function in hamming's where clause.

### Back to drawing board

No, # is the infix operator that's defined as a local function in hamming's where clause.

Ah, obvious now that you point it out.

I guess I either have to go back to the drawing board with Haskell, or that code was a bit TOO concise for my own good. ;-)

### Space Complexity

Has anyone done a space complexity analysis of the Haskell program?

### Re: Space Complexity

I think it's fair to ask the same question of the C++ implementation.

In the Haskell case, you need to "remember" (some of) the previously computed numbers. To be more precise, to compute the Nth Hamming number HN, you need to remember the previously computed numbers in the interval of [HN / 5, HN].

In the C++ case, you end up carrying around all of the numbers in the interval of [1, HN] plus "most" of the numbers in the interval of [HN, 5*HN].

Since I don't have the required number-theoretical skills to estimate the sizes of the aforementioned sequence intervals, I ran a quick numeric experiment instead. The following listing shows the first 12 Hamming numbers. The blue lines show the sequence that the Haskell implementation must "remember" at each step. The red lines show the set of numbers computed by the K & M's C++ implementation at each step. To keep the sequences visually aligned, I use underscores to show the current gaps in the C++ ordered set. These show the positions of currently missing Hamming numbers that will be filled at some later point.

In the Haskell case, I use the following conventions to indicate the current end of the "scaled" lazy lists:

• the underlined number is the end of the list scaled by 2;
• the bold number is the end of the list scaled by 3;
• the crossed-out number is the end of the list scaled by 5.
1
1 2 3 _ 5

1 2
1 2 3 4 5 6 _ _ 10

1 2 3
1 2 3 4 5 6 _ 9 10 __ 15

1 2 3 4
1 2 3 4 5 6 8 9 10 12 15 __ __ 20

2 3 4 5
1 2 3 4 5 6 8 9 10 12 15 __ __ 20 __ 25

2 3 4 5 6
1 2 3 4 5 6 8 9 10 12 15 __ 18 20 __ 25 __ 30

2 3 4 5 6 8
1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 __ 30 __ __ 40

2 3 4 5 6 8 9
1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 __ __ 40 45

3 4 5 6 8 9 10
1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 __ __ 40 45 __ 50

3 4 5 6 8 9 10 12
1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 __ 36 40 45 __ 50 __ 60

4 5 6 8 9 10 12 15
1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 __ 36 40 45 __ 50 __ 60 __ __ 75

4 5 6 8 9 10 12 15 16
1 2 3 4 5 6 8 9 10 12 15 16 18 20 24 25 27 30 32 36 40 45 48 50 __ 60 __ __ 75 80


Unfortunately, this doesn't really tell you anything about the asymptotic behavior of each respective implementation.

### Re: Space Complexity

Since I don't have the required number-theoretical skills to estimate the sizes of the aforementioned sequence intervals

(Other than observing trivially that the space complexity is no worse than O(n) in both cases.)

### Not convinced it is trivial

Thanks for taking the time to write out the sets!

While observing the "previously computed" numbers does make it appear as if the Haskell version doesn't require storage greater than N, this is not neccessarily the case. There is also a need for temporary storage as well at each iteration, which could possibly increase asymptotically greater than the rate of N. Is it easy to do the analysis of the cost of the lazy evaluation temporary storage?

### expected Haskell behavior in pseudo-code

... does make it appear as if the Haskell version doesn't require storage greater than N, this is not neccessarily the case.

The O(n) estimate is based on the assumption that given a moderately smart compiler, the Haskell implementation can work internally along the following lines (excuse my bad Scheme):

(define (make-hamming)
(let* ((tail '(1))
(s2 tail)
(s3 tail)
(s5 tail))
(lambda ()
(let* ((result (car tail))
(h2 (* 2 (car s2)))
(h3 (* 3 (car s3)))
(h5 (* 5 (car s5)))
(next (min h2 h3 h5)))
(set-cdr! tail (,next))
(set! tail (cdr tail))
(if (= h2 next)
(set! s2 (cdr s2)))
(if (= h3 next)
(set! s3 (cdr s3)))
(if (= h5 next)
(set! s5 (cdr s5)))
result))))

Storage-wise, this behaves like the blue sequence in my previous comment, provided that the above snippet really works the way I think it does.

### Re: Space Complexity -- Haskell wins

To follow up on my earlier comment

the space complexity is no worse than O(n) in both cases.

In the C++ case, the O(n) estimate cannot be improved, given an implementation like the one fleshed out in marcin's post.

The Haskell implementation seems to be doing a little better than O(n) based on the following empirical data:

Step # Tail size tail size / step #
10 7 0.70
100 44 0.44
1000 228 0.23
10,000 1,108 0.11
100,000 5,253 0.05
1,000,000 24,623 0.02

The table basically says that assuming the Haskell implementation behaves as described in this post, it needs to remember 44 previous numbers in order to compute the 100th. It needs to remember 24,623 previous numbers in order to compute the 1,000,000th. On the face of it, that seems to be more efficient than O(n).

(As a sanity check for my Python script, I should probably ask Anton to verify whether or not the 1,000,000th Hamming number is 519381797917090766274082018159448243742493816603938969600000000000000000000000000000, assuming you count 1 as the 0th number.)

### Sanity confirmed (in some narrow sense)

(As a sanity check for my Python script, I should probably ask Anton to verify whether or not the 1,000,000th Hamming number is 519381797917090766274082018159448243742493816603938969600000000000000000000000000000, assuming you count 1 as the 0th number.)

Yes, it is.

> seq !! 0
1
> seq !! 1000000
519381797917090766274082018159448243742493816603938969600000000000000000000000000000
> elemIndex 519381797917090766274082018159448243742493816603938969600000000000000000000000000000 seq
Just 1000000


That last expression used a cut & paste from the original quoted number, to make sure they matched.

### Approximation

Sloane's on-line encyclopedia of integer sequences is always a useful reference. In this case it provides the approximation
f n = (ln (n * sqrt 30)) ** 3 / (6 * ln 2 * ln 3 * ln 5) + O(lnln n)

for the number of hamming numbers smaller than n. Let's forget about the last term, and invert the formula to obtain the nth hamming number
h n = exp ((n * (6 * ln 2 * ln 3 * ln 5))**(1/3)) / sqrt 30

Test: h 1000000 = 5.183690169e83: close enough for me. Now the tail size, as you call it, is
tailsize n = n - f (h n / 5)

Test:

Step # Tail size Approx
10 7 7.66
100 44 44.52
1000 228 228.31
10,000 1,108 1,108.77
100,000 5,253 5,254.48
1,000,000 24,623 24,624.33

The approximation is not far off. The right-hand side of tailsize can be rewritten as

tailsize
=   (ln 5 * 3*c**(-1/3)*n**(2/3) - 3*(ln 5)**2 *c**(-2/3)*n**(1/3) + ln 5**3/c)
where
c =   6*(ln 2)*(ln 3)*(ln 5)


and this approximately

tailsizeapprox n
=   2.48*n**(2/3) - 2.05*n**(1/3) + 0.57)


So yes, this is sublinear.

Now, all the complexity measures discussed so far seem to assume that integer operations are constant time and integers are constant size. If we take into account that the size of the hamming numbers grows than it changes. The size in bits of the nth hamming number is

sizeh n = log (h n) / log 2

The log nicely cancels out with the exp in (h n),
sizeh n = (2*(c*n)**(1/3)-ln 2-ln 3-ln 5)/(2 * ln 2)
where
c =   6*(ln 2)*(ln 3)*(ln 5)

This function has the form
sizeh n = c1 + c2 * n**(1/3)

We can approximate the size of the first n hamming numbers sum [sizeh n | i <- [0..n]] with the integral
sizehrange n = c1 * n + 3/4 * c2 * n**(4/3)

The size of the hamming numbers in the tail is
tailsize = sizehrange n - sizehrange (f (h n / 5))

Now this gets messy, but the function is still dominated by the n**(4/3) factor, which makes it superlinear. For example tailsize 1,000,000 = 6819368.46 bit = 832KB. Does this seem about right?

### Perl implementation with lazy streams

Thanks for your analysis. Beautifully done. Thanks a million.

For example tailsize 1,000,000 = 6819368.46 bit = 832KB. Does this seem about right?

Since I don't have a Haskell compiler installed, I'm going to have to pass this question on to Anton "the sanity checker" van Straaten. (My Python script doesn't currently discard numbers that are no longer needed and I'm too lazy to fix it.)

As an aside, the AT&T Encyclopedia of Integer Sequences links to a nice article by Mark Dominus (the author of Higher-Order Perl) that shows how to solve the Hamming problem in Perl using Mark's own lazy streams module.

### Aside: time complexity

This reminds me of something I have wondered: what is the time complexity of append (as in ++) in a lazy language? I've heard that it's O(n) just like in a strict language but in my fuzzy mental picture I expect it to be O(1). Know what I mean?

### What, because strictly speaki

What, because strictly speaking, a lazy language ought to print out [1,2,3]++[4,5,6] by printing the elements of the first list, then the elements of a second, rather than running over the first list, adding it into a new list, then adding the elements of the second to that list, then run over that new list printing out its elements?

### IO lists

Right.

In Erlang we have a very useful related datatype called an IO list to represent binary data that will ultimately be written to a file or socket. While we represent plain binary data as either a string (list of 8-bit integers) or a binary (array of 8-bit integers) an IO list is a "deep list" (tree of conses) of these elements. What this means is that if you have two chunks of data to join together you can cons them into an IO list in O(1) time instead of having to concatenate them in O(n) time. In practice this makes it very easy to avoid the pitfall of spending O(n^2) time to accumulate large binary messages from the repeated concatenation of smaller parts.

Take HTML generation in the Yaws webserver for an example:

ehtml_expand({Tag, Attrs, Body}) when atom(Tag) ->
Ts = atom_to_list(Tag),
NL = ehtml_nl(Tag),
[NL, "<", Ts, ehtml_attrs(Attrs), ">", ehtml_expand(Body), "</", Ts, ">"];

If the last line was instead NL ++ "<" ++ Ts ++ ... then in a strict language like Erlang we would take O(n^2) time to generate the HTML. How about in e.g. Haskell?

Creating these IO lists is cheap but of course they will ultimately need to be written to a file descriptor. At this point either the runtime system concatenates the IO list into a byte array in memory (O(n) using relatively fast C code - the list_to_binary primitive) or a writev system call could be used and might avoid ever concatenating the data in memory at all.

This has a lot in common with Boehm's most excellent cord data structure.

### Not sure,

but if I remember correctly, there are both amortized and worst-case O(1) lazy append's given in CTM and I think I saw them in Okasaki too (Purely Functional Datastructures).

### Functional nearly-one-liner

ham1:: [Int]
ham1 = y
where y = [1:merge (map ((*) 2) y) (merge (map ((*) 3) y) (map ((*) 5) y))]


via Clean.

### Why can this not be written a

Why can this not be written as one or two lines? Why are the last two lines separate?

### I just paste them...

...as the author posts them. Call it a two-liner then.

### Top-level sharing

In Clean by default top level functions are reevaluated.
So if you write this as

ham1:: [Int]
ham1 = [1:merge (map ((*) 2) ham1) (merge (map ((*) 3) ham1) (map ((*) 5) ham1))]


then ham1 is reevaluated over and over again.

There's a special definition symbol (=:) to indicate that the "http://foldoc.hld.c64.org/foldoc.cgi?CAF">CAF should be shared.

ham1:: [Int]
ham1 =: [1:merge (map ((*) 2) ham1) (merge (map ((*) 3) ham1) (map ((*) 5) ham1))]


hamming :: Num -> [Num]
hamming n = [i | i <- [1..n], i mod 2 == 0 || i mod 3 == 0 || i mod 5 == 0 ]


via Professor Jeff Rohl using the old Haskell relative, Gofer.

### Correctness...

Pasted verbatim, not checked. Possible errors: logical operator, assumption that all numbers divisible by 2,3, and 5 are strictly Hamming numbers.

### 210?

Isn't a Hamming number. The missing bit of spec is "only", as in "only divisible by 2, 3, and 5".

### Functional one-liner in Mathematica

(Edit. See my later comments for code corrections and remarks. There's nothing wrong with the idea here - though the code is incorrect, it's easily fixed.)

Sort[Flatten[Table[2^i*3^j*5^k,{i,0,5},{j,0,5},{k,0,5}]]]


This one-liner is trivial, lucid, and fast. I'm a C++ guy too, but not provincial. C++ is wrong for Hamming numbers.

Using the same command format on a 1.0 GHz PC, Mathematica 5.1.1 computed and sorted the first 27000 Hamming numbers in 0.3 second. The 27000th number is 6863037736488300000000000000000000000000000.

In all cases memory allocation is optimal because the array is allocated once according to the Table size. The C++ code is another question.

The C++ code fails outright at the 10791st Hamming number. This is the first Hamming number greater than 2^64. If the hardware is only 32-bit, C++ fails at the 1846th Hamming number. Big-ints are the worry, not efficiency. Mathematica handles them seamlessly, as do certain other languages. C++ does not. This example makes being "close to the hardware" a curse.

And I seriously doubt that C++ is faster with "small" Hamming numbers. Mathematica's computation of the first 216 Hamming numbers was essentially instantaneous. They are shown below as output from the one-liner.

{1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36, 40, 45, 48, 50, 54, 60, 72, 75, 80, 81, 90, 96, 100, 108, 120, 125, 135, 144, 150, 160, 162, 180, 200, 216, 225, 240, 243, 250, 270, 288, 300, 324, 360, 375, 400, 405, 432, 450, 480, 486, 500, 540, 600, 625, 648, 675, 720, 750, 800, 810, 864, 900, 972, 1000, 1080, 1125, 1200, 1215, 1250, 1296, 1350, 1440, 1500, 1620, 1800, 1875, 1944, 2000, 2025, 2160, 2250, 2400, 2430, 2500, 2592, 2700, 3000, 3125, 3240, 3375, 3600, 3750, 3888, 4000, 4050, 4320, 4500, 4860, 5000, 5400, 5625, 6000, 6075, 6250, 6480, 6750, 7200, 7500, 7776, 8100, 9000, 9375, 9720, 10000, 10125, 10800, 11250, 12000, 12150, 12500, 12960, 13500, 15000, 16200, 16875, 18000, 18750, 19440, 20000, 20250, 21600, 22500, 24300, 25000, 27000, 28125, 30000, 30375, 32400, 33750, 36000, 37500, 38880, 40500, 45000, 48600, 50000, 50625, 54000, 56250, 60000, 60750, 64800, 67500, 75000, 81000, 84375, 90000, 97200, 100000, 101250, 108000, 112500, 121500, 135000, 150000, 151875, 162000, 168750, 180000, 194400, 202500, 225000, 243000, 253125, 270000, 300000, 303750, 324000, 337500, 405000, 450000, 486000, 506250, 540000, 607500, 675000, 759375, 810000, 900000, 972000, 1012500, 1215000, 1350000, 1518750, 1620000, 2025000, 2430000, 2700000, 3037500, 4050000, 4860000, 6075000, 8100000, 12150000, 24300000}

### Mathematica not comparable to Haskell

Using the same command format on a 1.0 GHz PC, Mathematica 5.1.1 computed and sorted the first 27000 Hamming numbers in 0.3 second. The 27000th number is 6863037736488300000000000000000000000000000.

Compare this to ghc on a 1.3 GHz box. Using the concise source code the program calculates the 27000th number correctly as 3983823817623774429511680, in 0.12 second.

The Mathematica code is very nice as long as you realize it demands care, because it generates many but not all of the Hamming numbers.

### some missing

Your output implies that all Hamming numbers between 7776 and 24300000 are multiples of 5.

coincidently, 7776 = 2^5 * 3^5.

I'm not even a Haskell programmer, but Haskell's spirit is channelling through me and forcing me to point out that it can do this very concisely too:

hamming = sort [2^i*3^j*5^k | i <- [0..5], j <- [0..5], k <- [0..5]]


Of course, it suffers from the same drawbacks pointed out by Scott.

### Templates to the rescue!

The C++ code fails outright at the 10791st Hamming number. This is the first Hamming number greater than 2^64. If the hardware is only 32-bit, C++ fails at the 1846th Hamming number. Big-ints are the worry, not efficiency. Mathematica handles them seamlessly, as do certain other languages. C++ does not. This example makes being "close to the hardware" a curse.

If you look at the sample C++ (iterators) versions above, you can see that by replacing the type held in the list<> or set<> you can get different storages. If you used something like GMP's big int's library, I'd imagine everything would carry on as if nothing had changed. That's where "close to the hardware" gives you choice.

### Templates to the Rescue, Indeed

genneth: If you look at the sample C++ (iterators) versions above, you can see that by replacing the type held in the list<> or set<> you can get different storages. If you used something like GMP's big int's library, I'd imagine everything would carry on as if nothing had changed. That's where "close to the hardware" gives you choice.

On the contrary, that's where parametric polymorphism gives you choice, and C++ is a relative latecomer to the parametric polymorpism game. OK, not as late as Java... :-) In fact, C++'s parametric polymorphism still isn't as expressive as, e.g. O'Caml's. This isn't legal C++:

 template <typename T> typedef foo <T> bar <T>; 

whereas this is legal O'Caml:

 type 'a bar = 'a foo 

### C++ and parametric polymorphism

C++'s parametric polymorphism still isn't as expressive as, e.g. O'Caml's. This isn't legal C++:
template <typename T>
typedef foo <T> bar <T>;

whereas this is legal O'Caml:
type 'a bar = 'a foo

That's true, but not the greatest example, since in C++ you can do
template<typename T>
struct bar {
typedef foo<T> type;
};

and
typename bar<T>::type
is now a type alias.

My first complaint about C++ parametric functions would be that C++ parametric functions can't be passed as arguments.

My first complaint about C++ parametric functions would be that C++ parametric functions can't be passed as arguments.

the lack of first class parametric functions is indeed one of the more serious issues with c++. a solution to the problem is possible through wrapping the parametric function inside a non-parametric object. ugly, but it works:

struct foo {
template<typename T>
static void bar(T t) { }
};

template<typename T>
void test(T f) { T::bar("test"); }

int main() {
test( foo() );
}


while many defiancies in c++ (like the aforementioned generic typedefs) will be addressed in near future, this and similar issues are unfortunately bound to stay because of the kludgy parametricity design.

### Isn't 64 a hamming number?

The sequence above does not contain it.

### ...that works wonders

(Edit. See my later comments for code corrections and remarks. About this notion of finding stand-alone Hamming numbers, the corrected array code uses an isosurface boundary which I expect might help. There may be valid formulas from other inspirations, too.)

One need not compute any previous Hamming numbers to explore all of them:

Sort[Flatten[Table[2^i*3^j*5^k,{i,100,105},{j,100,105},{k,100,105}]]]


works just as fast. The front and back of the resulting output is:

{5153775207320113310364611297656212727021075220010\
00000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000,
10307550414640226620729222595312425454042150440020\
00000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000,
15461325621960339931093833892968638181063225660030\
00000000000000000000000000000000000000000000000000\
0000000000000000000000000000000000000000000000000,
...etc...
12523673753787875344186005453304596926661212784624\
30000000000000000000000000000000000000000000000000\
00000000000000000000000000000000000000000000000000\
000000}


### Hamming subsequence with no gaps

One need not compute any previous Hamming numbers to explore all of them

If you want to compute a contiguous Hamming subsequence in the interval of [M, N], then you need to find all triples (i, j, k) such that

log(M) < i*log(2) + j*log(3) + k*log(5) < log(N)

If you know a way to do this without computing all previous Hamming numbers less than M, please share.

This is wonderfully exciting stuff! I am having a lot of fun, I hope we keep this thread alive for a while, because I am writing the Unimperative version.

For those who don't remember Unimperative is a functional programming C++ subset I posted about a while back, which has been reposted (at http://www.ootl.org ) after having been taken offline for an overhaul (it now works on GCC 3.4 and no longer needs the Boost library).

The following is not tested (I need a couple of more days) but I wanted to post it for fun, to see if it is comprehensible without documentation, (a tall order I realize, but I am just shooting darts here):

Def Merge =
(Switch,
(IsEmpty, _1), _2,
(IsEmpty, _2), _1,
Default,
(If,
(MakeList, (Head, _1), (Merge, (Tail, _1), _2))
(MakeList, (Head, _2), (Merge, _1, (Tail, _2)))
)
);

Def Merge3 =
(Merge,
(Merge, _1, _2),
_3
);

Def Hamming2 = (Map, _1, (Lambda, Mult, 2));
Def Hamming3 = (Map, _1, (Lambda, Mult, 3));
Def Hamming5 = (Map, _1, (Lambda, Mult, 5));

Def Hamming =
(If, GtZ, _1
(Merge3,
(Hamming2, (Hamming, (Dec, _1))),
(Hamming3, (Hamming, (Dec, _1))),
(Hamming5, (Hamming, (Dec, _1)))
),
(MakeList, 1)
);


It the syntax recognizable to functional programmers? I am not very experienced in functional languages. A little hint _1 refers to the first argument, and a function is evaluated when it follows a left paranthesis, and If is lazily evaluated.

### This looks horrible. At first

This looks horrible. At first I thought it looked like prolog.

### I guess you don't like Scheme

Unimperative syntax was inspired by Scheme. IMO they are quite closely related.

### Inspired by scheme in what wa

Inspired by scheme in what way? As a language whose syntax and style should be avoided at all costs?

### Syntax, dialogue

Inspired by scheme in what way? As a language whose syntax and style should be avoided at all costs?

Wow, great burn!

### What Style Would That Be?

marcin: Inspired by scheme in what way? As a language whose syntax and style should be avoided at all costs?

The style of the Lambda Calculus? The functional style, which at least in the form of SSA even seems to underlie modern imperative/OO language implementations? That style?

Or perhaps you mean the syntax that even John McCarthy didn't want, but later conceded was likely crucial to Lisp's survival? That syntax?

Really, there's very little in the realm of programming language discussion more tiresome than the endless rants about Lisp's "shortcomings." Although the recent ll-discuss post about overcoming C++ "brainwashing" in favor of Lisp surely is up there, too.

### You're really saying that tha

You're really saying that that code looks like an s-expr based language, Scheme, or otherwise?

### Close Enough for Government Work

marcin: You're really saying that that code looks like an s-expr based language, Scheme, or otherwise?

I see a lot of functions in prefixes, and some occasional "syntax" (if, etc.) It looks slightly more verbose than plain s-exprs, but not to the extent of, say, XML.

But that wasn't what I was responding to: I was responding to what I perceived as a typical rant against s-exprs. If I misunderstood, I apologize. I think I let my feelings about the ll-discuss post color my perceptions, albeit, ironically, from the opposite perspective!

### Unimperative s-exprs

The right hand sides of the definitions do look exactly like s-exprs, except that they have commas between their terms. Just dropping the commas and adding meaningful lexical names, which would require Lambda to take a formals list, would make it look a lot like Scheme.

### Yes - that's my point: If you

Yes - that's my point: If you change all of the syntax by dealing with left-hand sides, adding formal lists, removing commas, and not having hanging brackets (a particular bugbear of mine), then you would have something that looks like scheme. In the same way, if you put brackets around every function application in C++, and put operators into prefix positions, you would have something looks like scheme, and absolutely nothing like it looks now.

### If you change all of the sy

If you change all of the syntax by dealing with left-hand sides, adding formal lists, removing commas, and not having hanging brackets (a particular bugbear of mine), then you would have something that looks like scheme
These are all very minor syntactic details as far as a language comparison goes. However, I don't know what is a "hanging bracket".
In the same way, if you put brackets around every function application in C++, and put operators into prefix positions, you would have something looks like scheme

Unimperative is actually C++! Go check out the following links: http://www.ootl.org/uni/uni_tests.hpp.htm and http://www.ootl.org/uni/standard.uni.htm. The Unimperative implementation as C++ is freely available online at: http://www.ootl.org/uni/uni.hpp.htm

I really don't believe it is possible to make C++ much more Scheme like than what Unimperative does.

### Can you explain why the diffe

Can you explain why the differences are "minor"? How are you quantifying language differences?

Finally, whether or not it is possible to make a C++ compiler accept a programme that looks as much like Scheme as you believe unimperative does is not the issue - the issue is that one can specify a translation from C++ to scheme syntax that requires fewer rules than the translation between Unimperative syntax and scheme syntax. The fact is that there is not a trivial mapping to lisp-like syntax from the Unimperative, unlike for example, from XML to S-exprs.

### I can't believe you that it i

I can't believe you that it is a fact that there is no trivial mapping from Unimperative to Lisp syntax. I will demonstrate shortly on LtU, that there is in fact a trivial mapping.

### Death by a thousand cuts

(Edit. Final and correct array code is shown in my last reply, below. The points here remain valid.)

This is not a language shootout or speed contest, but an idiom contest. Haskell is a fine language. So is C++. Mathematica is very nice. All can be fast, and a faster CPU like Scott's will run them faster. We are discussing idioms. My only reason to offer stats was to corral the usual C++ hobby horses of speed and hardware proximity. C++ has no advantages here, which makes the test case a remarkable selection. The big-int problem even gives hardware proximity a disadvantage.

So why did CUJ pick Hamming numbers to compare programming idioms? I am suspicious. If Hamming(N) necessitates imperative stepwise algorithms, then the challenge is rigged. An objective challenge requires an objective test. (Disclosure, I haven't read the CUJ article for lack of access.)

What amuses me is that Hamming(N) is a number-theoretic sort of problem. Few workers in that field think that C++ surpasses tools like Mathematica (Haskell, Lisp, whatever) for such work.

The kind corrections are partly right and partly wrong. Here is the issue. Table generates all forms 2^i*3^j*5^k. It is wrong to say that Table misses some numbers. Set the bounds high enough and it produces all numbers of this type, i.e. "all Hamming numbers." It produces them exhaustively and without duplicates, an important advantage. It permits one to study high Ns without computing previous values, even if without a proper Hamming index. The code is therefore "complete" in a certain sense, if not the right sense according to Hamming.

The bug is a boundary effect. It may be O(N^3), I don't know, but if C++ overflows at 2^64, one must ask which scenario is worse.

The one-liner's avoidance of duplicates might pay for a simple way to fix the boundary using another type of waste. Generate a bigger Table and truncate the bad data. Some clever formula might exist, but a simple test on Hamming(27000) shows that it's indeed a boundary glitch:

In[181]:=
(* one-liner formatted only for LtU wraparound *)
hammingNums = With[{n= 82},
Take[Sort[Flatten[
Table[2^i*3^j*5^k,{i,0,n-1},{j,0,n-1},{k,0,n-1}]
]],Floor[ (n^3) / 2  ]]];

In[182]:=
Length[hammingNums]

Out[182]=
275684

In[183]:=
(* check equality with Scott's num *)
hammingNums[[27000]] == 3983823817623774429511680

Out[183]=
True


There you go. Now Hamming(27000) is 2^81.72042812373361, much larger than 2^64. So, mea culpa: my quickie one-liner has a boundary glitch, while their C++ segfaults on large N.

Incorrect or incomplete assumptions will break any combination of language and idiom. At least I did not waste time writing bad code in C++. That's the neat thing about one-liners, they are throw-away. This one-liner, with whatever fixes it needs (gasp a second line), is certainly not the best or most elegant solution. The assertion is only that a pithy and obvious functional idiom solution exists, one which is reasonable for this problem, and arguably more so than C++ code which breaks at 2^64 or 2^32.

While I could have ported a previous solution into Mathematica as Anton ported mine, porting is not the issue. I wanted to show an entirely new idiom since that was the original exercise. In an almost brain-dead condition the functional idiom still produces something workable and simple.

We can quibble over "workable" but at least it's a fair starting point. Analysis of this one-liner will not be very productive. I would only be curious to see a good boundary formula. However what would help everyone is a list of the first 100,000 Hamming numbers. That data would help those writing novel solutions. It's not my job, though, since I've done my duty here and am going on vacation from Hamming numbers.

### Helping everyone

However what would help everyone is a list of the first 100,000 Hamming numbers. That data would help those writing novel solutions.

That's easy: just take the original Haskell solution or one of its shorter derivatives and type "take 100000 seq" at the Haskell prompt. It's not even a particularly resource-heavy operation. However, the results probably shouldn't be posted in this thread... :)

It's also easy to do comparisons with other solutions, e.g. given the original definition of 'seq' as a correct and contiguous sequence of Hamming numbers, we can do this:
> let hamming n = sort [2^i*3^j*5^k | i <- [0..n], j <- [0..n], k <- [0..n]]
> let test m n = take m (hamming n) == take m seq
> test 581 20
True
> test 582 20
False


Hey, lucky guess! So the first 581 out of the 21^3=9,261 Hamming numbers generated by (hamming 20) are a contiguous Hamming sequence. (I'm guessing that's not going to be the most efficient way of finding such sequences.)

Re the rules of this contest, I was under the impression it most resembled a game of Calvinball. Ha! We're in the "infinity is cheating" zone!

### Re: Death by a thousand cuts (!/0)

Based on the title of your post, I was expecting it to be about Prolog. No such luck.

Table generates all forms 2^i*3^j*5^k. It is wrong to say that Table misses some numbers. Set the bounds high enough and it produces all numbers of this type.

...

Analysis of this one-liner will not be very productive. I would only be curious to see a good boundary formula.

Say,

H = { 2i * 3j * 5k  |  i < N  &  j < N  &  k < N }

Now for the boundary formula. The following subset of H can be trivially seen to contain gaps in the Hamming sequence

W = { h in H  |  h >= 2N }

To ground this in concrete numbers, take N = 10. Then

max(H) = 29 * 39 * 59 = 19,683,000,000,000

W is the wasted subset that consists of all elements of H that are greater than 210 = 1024. The cardinality of H is 1000. The cardinality of W is 913.

I'm too lazy to do real math, but my gut feeling is that you're right. The efficiency of the above one-liner is most likely O(n3).

### I do everything around here

El-vadimo - The array idiom can be O(N) if done right. Just follow through as proposed. The concept has no intrinsic flaw. Declarative arrays need only correct boundaries - here, tetrahedral, a corner cut from a cube.

The proof code below is identical but with a boundary fix. It checks accurately against Haskell to the limits of available memory. Mathematica 5.1.1 runs it without external dependencies.

(* Define a "two-liner" function *)
HammingList[N_] := Module[ { A, B, C },
{ A, B, C } = (N^(1/3)) *
{ 2.8054745679851933, 1.7700573778298891, 1.2082521307023026 }
- { 1, 1, 1 };
Take[Sort[Flatten[Table[ 2^x*3^y*5^z,
{ x, 0, A }, { y, 0, (-B/A)*x + B },
{ z, 0, C-(C/A)*x-(C/B)*y } ]]], N]];

(* Run once to bypass first-run issues  *)
First[Timing[thousand = HammingList[1000];]]

0. Second

(* Check performance *)
runtimes = Table[{N, (1/Second)*First[Timing[
chk =.; chk = HammingList[N];]]},
{N, 0, 1000000, 50000}];

(* Show formatted results *)
{Right}], {7, 3}, NumberPadding -> {"", " "}]

      0    0.
50000    1.
100000    2.156
150000    3.375
200000    4.656
250000    5.969
300000    7.328
350000    8.625
400000   10.125
450000   11.422
500000   12.906
550000   14.344
600000   15.719
650000   17.156
700000   18.578
750000   19.859
800000   21.313
850000   22.828
900000   24.406
950000   25.688
1000000   27.187


These numbers plot a nice straight line showing O(N). Accuracy validates against Haskell output to well over one million numbers.

(* Check 27000th number   *)
chk[[27000]]

3983823817623774429511680

(* Check millionth number *)
chk[[1000000]]

519312780448388736089589843750000000000000
000000000000000000000000000000000000000000


The "millionth" sanity number from earlier is the million-and-first. Yes, this code computes it. Mathematica arrays index from 1. If you want the millionth number, you ask for it; if you want the million-and-first, you ask for that.

ck = HammingList[1000000+1];
Last[ck] ==
5193817979170907662740820181594482437424938166039389696
00000000000000000000000000000

True


All that is why I called premature analysis unhelpful and tried shifting interest to the boundary question. Analysis will always condemn the wrong shape - cubes, spheres, prisms, or buckyballs. None of that condemns the array idiom. The worthwhile effort was to find the right shape.

Armchair analysis shows why O(N) stands to reason. Generating N numbers is O(N). In this code a proportion are O(N^(2/3)) waste, which vanishes because O(N + N^(2/3)) = O(N). The waste is typically a few percent of N (small-N cases aside). This waste compares favorably against duplication waste in Haskell and C++. Formulas called Ehrhart polynomials could eradicate it altogether. Flatten is essentially free. Sorting is O(N log N) in general, but can be O(N) here, since we know a maximum value from the isosurface.

Timing values will vary between machines. The timing experiment is the best simple one I can run. Smaller N happens so fast as to be unreliable, since OS background tasks jump in and out. One could average across many runs, I suppose. Larger N invokes virtual memory, which also breaks validity. Multiple millions of big-ints quickly tax computer resources.

This throw-away code remains terse and easy to understand. It has room for improvement, like any code. People can nitpick it or perfect it. Habitual nitpickers should try the latter option for novelty's sake.

Anton - We're on the same side, man. Quell urges to shoot down allies over small stuff. We both support functional programming.

My lucky guess was deliberately chosen to demonstrate a boundary problem. Naturally the value barely worked. That was the point. On one scale things worked, on another they didn't.

I had hoped someone would take the hint. Boundaries had already been suggested, and that simple go/no-go check validated the intuition. It did not yield formulas, but confirmed merit in a search. Somewhere you lost the plot and began Slashdotting.

After you leave college, coworkers will express concepts and intuition without proofs on a platter. Every good idea starts somewhere. Try to be more encouraging and not kill the baby.

This idiom is just one among many which I've posted. Thank me later for that Haskell find; I like it too.

Infinity police should arrest someone else. Infinitely recursive Haskell streams need it, not arrays. Everyone praises the infinite-stream (Haskell) solutions. C. Hendrie's code is very good, I agree. Let's just be clear about infinities.

A reference list quite obviously doesn't belong inline. The use of Haskell is obvious too. That sparkling brilliance for the obvious is blinding. My suggestion was a means for C++ folks to avoid Haskell. GHC is a 50 MB download/install/learn procedure with a console window, monad I/O, and non-terminating code in this case (CTRL-C and cross your fingers). LtU's site manager, already running the code and skilled in Haskell, could have done the deed in less time than it took to dismiss.

A better offering than a huge list would be a Haskell-compiled executable (not script) that takes N and a file name as arguments, then spills the numbers to disk. Someone else can work on that, maybe Anton, maybe not.

Everyone - The take-home lesson is that functional idioms can solve this problem in several ways. Some may feel more comfortable than others, but take your pick.

Ehud will get a MathReader file with graphics and derivation. LtU still has no way to upload images.

I'm not planning to revisit this page. Questions will have to be answered by the file. The numerical coefficients incorporate some consolidated arithmetic, so don't twist your brain on them. Send email through the web interface if there is something that you desperately want me to know.

### The promised file

Here's Mark's file.

### 7-zip compressed?

% file HammingNumbersDeclarative.7z
HammingNumbersDeclarative.7z: 7z archive data, version 0.2


and there's only an implementation of 7-zip for Windows.

### Ocaml Solution (purely applicative, O(N log N))

Here's the Ocaml version of the C++ code (or rather a version):

module Int = struct
type t = int
let compare (x: int) y =
if x < y then
-1
else if x == y then
0
else
1
end

module IntSet = Set.Make(Int)

let hamming n = (* generate the first n hamming numbers *)
let rec loop i accu set =
if i < n then
let v = IntSet.min_elt set in
let set = IntSet.remove v set in
let set = IntSet.add (v * 2) set in
let set = IntSet.add (v * 3) set in
let set = IntSet.add (v * 5) set in
loop (i+1) (v :: accu) set
else
List.rev accu
in
loop 0 [] (IntSet.add 1 IntSet.empty)
;;


A variation of the Haskell solution is possible, but you'd have to write your own lazy list module (Ocaml doesn't have on in it's standard library).

Hacking the above code to return both the required values and the calculated but not needed values I get:

 Num Required Num Remaining 10 13 20 21 30 26 40 32 50 36 60 41 70 46 80 49 90 54 100 57 200 88 300 117 400 140 500 161 600 183 700 202 800 221 900 238 1000 256

Unfortunately, at this point I overflow integers. Trying to calculate the 1238th Hamming Number, I overflow (on 32 bits). And I don't care enough to recode with larger numbers.

### Recoding the hardware

Trying to calculate the 1238th Hamming Number, I overflow (on 32 bits). And I don't care enough to recode with larger numbers.

On 64 bits (Debian amd64), this same OCaml code overflows when calculating the 10,856th Hamming number.

BTW, my previous Haskell calc of the first million numbers was done on a sedate old 32-bit 450MHz PIII. (Insert entirely unreasonable apples-to-kumquats comparative observation here.)

### Yeah

Yeah- I fiddle with my code a little bit to see if I could compute hamming numbers large enough that it took signifigant amounts of time on my 1.8GHz Athlon-64. No dice (yet).

Here's the thing: from my explorations it looks like to find the nth hamming number, you have to compute about n extra numbers- so the total number of hamming numbers you have to compute is O(N). Each number costs approximately O(log N) to compute (or so- I have some thoughts about that I'll post seperately). Log of 10,000 is approx. 12, so we're looking at maybe 250,000 operations. Even on an old, slow P3, this isn't that much calculation.

### A Lazy Ocaml Version

Not sure what the space complexity here is. Also, I'm not sure I'm being as lazy as I could be (pun intended)- I may be forcing some calculations unnecessarily. But here's the code (this should be the same solution as the Haskell code):

type 'a t =
| List of 'a * 'a t Lazy.t

let rec scale n = function
| List (h, t) -> List((h * n), lazy (scale n (Lazy.force t)))
;;

let rec merge x y =
match x, y with
| List(xh, xt), List(yh, yt) ->
if xh == yh then
List(xh, lazy (merge (Lazy.force xt) (Lazy.force yt)))
else if xh < yh then
List(xh, lazy (merge (Lazy.force xt) y))
else
List(yh, lazy (merge x (Lazy.force yt)))
;;

let rec seq = List(1, lazy (merge (scale 2 seq)
(merge (scale 3 seq) (scale 5 seq))));;

let hamming n =
let rec loop i accu = function
| List(h, t) ->
if i < n then
loop (i+1) (h :: accu) (Lazy.force t)
else
List.rev (h :: accu)
in
loop 1 [] seq
;;


### This is an O(N) solution

I instrumented the code above to count how many times merge and scale are called to calculate the nth Hamming number. Here's my results:

 N scale calls merge calls 2 3 2 4 5 5 8 10 11 16 24 26 32 53 56 64 121 118 128 267 243 256 577 495 512 1226 1002 1024 2573 2021 2048 4951 4051 4096 7312 7985 8192 11408 16177

This looks to me like an O(N) solution.

### Another way to approach C++

It occurred to me that the Haskell code is, to look at it imperatively, specifying each new element of the list in terms of three stateful "cursors" that move down the list. So, if the goal is to write a solution of that runtime order in C++, one can make the list a vector (a list would work too, but it would mean more typing), and sort of encapsulate the state in a struct, and...

#include <vector>
#include <iostream>

using namespace std;
typedef long long foo_t;

struct bar {
vector<foo_t>& v; mutable int i; int m;
operator foo_t() const { return v[i]*m; }
foo_t operator++(int) const { foo_t r=*this; ++i; return r; }
bar(vector<foo_t>& iv, int im):v(iv),i(0),m(im) { }
};

int main() {
vector<foo_t> s(1,1);  foo_t q;
bar x(s,2), y(s,3), z(s,5);
while(s.back()>0) {
cout << s.back() << '\n';
do q=min(x,min(y,z))++; while(q==s.back());
s.push_back(q);
}
}


Not that that's hugely idiomatic (and the mutable is a little evil; perhaps I've been reading too many IOCCC winners lately) but my point, such as there is one, is that it's basically the same algorithm as the "obvious" Haskell solution, done imperatively in C++ without too many (necessary) contortions.

And it's not particularly harder to "target" plain C that way, either. And go back to a linked list for simplicity's sake, and reclaim unused storage... and then sadistically fiddle with the flow control and formatting, because I have been reading too much IOCCC code, and:

#include <stdlib.h>
#include <stdio.h>

typedef long long foo_t;
#define PR_foo "%lld"

int main() {
struct c { foo_t a; struct c*d; } *o=malloc(sizeof*o),
*t=o; struct { foo_t m,n; struct c*b; } *w,x,y,z; w=&x;
x.m=x.n=2; y.m=y.n=3; z.m=z.n=5; (x.b=y.b=z.b=o)->a=1;
goto harmful; while(o->a>0) {
w->n=w->m*(w->b=w->b->d)->a; if (w==&z) free(t);
t=z.b; w=x.n<y.n?&x:&y; w=z.n<w->n?&z:w; harmful:
if (o->a==w->n) continue; printf(PR_foo"\n",o->a);
(o->d=malloc(sizeof*o))->a=w->n; o=o->d; } return 0; }


### back to the drawing board

It occurred to me that the Haskell code is, to look at it imperatively, specifying each new element of the list in terms of three stateful "cursors" that move down the list. So, if the goal is to write a solution of that runtime order in C++, one can make the list a vector, and sort of encapsulate the state in a struct, and...

We have now come full circle. The above is essentially the original solution proposed by Dijkstra, the one that the article describes as "much faster but fairly tricky". I didn't include the corresponding code listing in the original post to save space.

So yes, you can implement an efficient solution in C++. However, efficiency was not the article's primary consideration. The article intended to defend C++ against claims by the FP community that there was no elegant C++ solution comparable in speed and simplicity to, say, the Haskell implementation. The article attempted to share the authors' Aha-Erlebnis that there is, in fact, a C++ solution that is

• quite elegant because "it is hard to imagine a program that follows more accurately our original statement of the problem";
• only slightly less time-efficient than the Haskell solution (it's O(n*log(n)) versus O(n) in the Haskell version);

• written in "idiomatic C++" and thus does not require "a totally different way of thinking".

In light of this, the article claims, the Hamming problem can no longer be held up as a shining example of the alleged superiority of the functional paradigm.

However, the article fails to address the following points:

1. It hides the clunkiness of the fully fleshed out implementation by only listing a mere four-line outline of the main idea.

If the article were to provide a full implementation, it would have become obvious that, as Scott Turner pointed out more than once, the C++ version provides no idiomatic way to access the generated sequence.

2. The Haskell version is substantially more space-efficient.

3. While STL may no longer be considered "a totally different way of thinking", it can be argued that it used to be just that when it was first introduced.

Overall, the article was a good read despite failing to support its conclusion.

The thing I really like about the Haskell approach is that it feels much easier to experiment with different approaches in a delcarative setting.

For instance, we can observe that the given Haskell solution isn't making use of the commutativity of multiplication: the Hamming number 36 is generated five times as 2*2*3*3 and 2*3*2*3 and 2*3*3*2 and 3*2*3*2 and 3*3*2*2.

We could consider annotating the list of hamming numbers with their largest prime factor, to help induce an ordering on the multiplications:

-- using merge as above
scalef n xs = [ (n * x, n) | (x, m) <- xs, m <= n ]
hamf = (1,1) : scalef 2 hamf merge scalef 3 hamf merge scalef 5 hamf
ham2 = map fst hamf


Then we can observe that it's wasteful for 'scalef 2' to have to filter through all the hamming numbers, when it's just generating powers of two. If we stage the lists, we can omit the filtering.
After a few more iterations in ghci, I arrive at:

all_multiples n (x:xs) = a where a = x : merge xs (map (n*) a)
ham3 = foldr all_multiples [1] [2,3,5]


Quite a bit faster than the original, and requires less from 'merge', since it never generates duplicate numbers.

As a fun exercise, you can plug a 'primes' one-liner in for [2,3,5] and construct the natural numbers the hard way. :)

[ edited to correct identifiers ]

### Even errors in reasoning are faster the declarative way

I was sloppy in my reasoning about the redundancy in the original implementation: each Hamming number is generated at most three times, not an arbitrarily large number of times. I should have been suspicious when my "optimized" versions only offered a constant factor of improvement.

Also, all_multiples is too strict to work for regenerating the naturals from the primes. We need a more constrained method which takes advantage of the ordering:

-- multiply_out requires that n is strictly less than (head xs)
multiply_out n xs = a where a = n : merge xs (map (n*) a)
ham4 = 1 : foldr multiply_out [] [2,3,5]

nat = 1 : foldr multiply_out [] primes


### Minor aside

all_multiples n (x:xs) = a where a = x : merge xs (map (n*) a)

would be completely non-strict written,

all_multiples n ~(x:xs) = a where a = x : merge xs (map (n*) a)

I'm too lazy to see if that will make it work. On a side note, multiply_out looks broken (copy/paste error?)

### not strictly strict

I was perhaps, er, lazy when I used the word "strict". What I meant was that all_multiples requires the first element of the list to be passed through from the right. If it's being foldr'd over an infinite list, we never get the initial 1. multiply_out solves the problem by
1) providing the tail of the list produced by all_multiples.
2) requiring that n It boils down to the observation that no amount of laziness will allow us to do a general infinite-way merge. We need to impose some ordering on the heads of the lists to be merged to make it work.

Typo in multiply_out fixed, thanks.

### A little question.

I have a little question concerning the way Haskell evaluates infinite lists. I don't understand why the Haskell algorithm is o(n) while the C++ algorithm is o(n*log(n)). How does Haskell do the computations? doesn't it have to compute all hamming numbers from 1 to N, in order to reach the Nth number? isn't C++ doing the same?

I understand that the Haskell algorithm uses a clever sort that actually merges two already sorted lists, while the C++ one uses binary tree algorithms (the STL set). Is that what makes the difference? or is it something else?

How does Haskell unites lists? doesn't it have to allocate a single linked list node in its garbage collected heap for each value?

### Yes, it needs to compute N-1

Yes, it needs to compute N-1 Hamming numbers before read Nth, but it does not check for duplicates by using any O(n*logn) structure such as set or map templates.

Duplicates are handled in merge operation which merges two ascending lists.

### Wouldn't be possible for the

Wouldn't be possible for the C++ version to use the same merge as in Haskell?

EDIT: Isn't the merging operation slower than using trees?

### Yes, it is possible, because

Yes, it is possible, because GHC compiles to C. ;)

Yes, because you can emulate lazy evaluation.

And sound no because you wouldn't be even closer to count of lines of Haskell version. I tried that with lists with strict heads and lazy tails and I know what I am talking about.

Merging operation is faster that lookup operation for usual sets and maps.

More than that, merging of lazy lists allows you to throw away unnecessary computations on the process of computing some big hamming numbers, and you start to win in memory requirements. Or you can organize computation as memoizing, and you will start to win in speed.

### Is the following C++ solution O(n) ?

template < class R > void hamming(R &r, int n) {
r.push_back(1);
R::iterator it = r.begin(), t;
while (n) {
t = it;
t = insert_sorted(r, ++t, *it * 2);
t = insert_sorted(r, t, *it * 3);
insert_sorted(r, t, *it * 5);
++it;
--n;
}
}


If you can't find 'insert_sorted' in STL, don't worry; here it is(I don't consider it part of the solution since STL could have had it, if it doesn't already). It returns an iterator to the next element after the insertion point:

template < class R, class IT, class V > IT insert_sorted(R &r, IT it, V v) {
for(; it != r.end(); ++it) {
if (v == *it) return ++it;
if (v < *it) return ++r.insert(it, v);
}
return r.insert(r.end(), v);
}


maybe the insertion process can be optimized even more by keeping track of the last inserted element and then go down or up from that.

It's quite elegant, though.

Firstly, your function does not generate the first n Hamming numbers, it can compute as many as the 3n first numbers (in general it will be less due to duplicates, of course). It would be easy enough to fix this by just checking the size of the container after each insertion.

Secondly, it's pretty easy to see that the "gap" insert_sorted has to traverse grows as R grows, so the algorithm certainly cannot be O(n). I'm having a hard time coming up with the exact run-time complexity but in any case it's easy to rule out O(n).

A quick aside: I think you need to make a few changes to make your code work with containers like std::vector because of the insertions (which will invalidate 'it' whenever resizing is forced). In its current state it should work fine with std::list (and this might be what you tested it with).

### Firstly, your function does n

Firstly, your function does not generate the first n Hamming numbers, it can compute as many as the 3n first numbers

I thought the version using sets does that too.

Secondly, it's pretty easy to see that the "gap" insert_sorted has to traverse grows as R grows, so the algorithm certainly cannot be O(n).

Yes, at each iteration the "gap" grows, so it's certainly not O(n). But the algorithm takes advantage of 2*N < 3*N < 5*N to eliminate some searches.

How would it be if 'insert_sorted' searched the remaining list in the reverse direction, i.e. from end to current element?

I think you need to make a few changes to make your code work with containers like std::vector because of the insertions

Yeap, it works for containers that their iterators are not invalidated when an element is inserted.

### For what it's worth..

here is an O(n) C version that more or less mimics what the haskell version does (and doesn't check the return values of malloc, bad me:-)):

#include <stdio.h>
#include <malloc.h>

/* Production code would use a linked list implementation with a
* less braindead allocation strategy than a malloc for every node
*/
typedef struct node {
long val;
struct node *next;
} node;

/* We keep an evaluation context: a pointer to the end of the list of
*  hamming numbers, and three pointers to the next elements of the lists of
*  multiples, together with the next values from the lists
*/
typedef struct context {

node *curr;
node *times[3];
long val[3];
} context;

long fakt[] = {2, 3, 5};

/* The function that calculates the next hamming number for a given
* context
*/
long hamming(context **ctx) {
node *next;
int i;

next = malloc(sizeof(node));
next->next = NULL;

if (NULL == *ctx) {
/* First call, initialise the context */

*ctx = malloc(sizeof(context));
next->val = 1;

(*ctx)->curr = next;
for (i=0; i<3; ++i) {
(*ctx)->times[i] = next;
(*ctx)->val[i] = fakt[i];
}
return 1;
} else {
long min;

/* Find the minimal unseen element from the three lists of multiples */
min = (*ctx)->val[0] < (*ctx)->val[1] ? (*ctx)->val[0] : (*ctx)->val[1];
min = min < (*ctx)->val[2] ? min : (*ctx)->val[2];

next->val = min;
(*ctx)->curr->next = next;
(*ctx)->curr = next;

/* Advance all lists whose next element is equal to the minimum,
* to eliminate duplicates. The third list (times 5) gets a special
* treatment to eliminate inaccesible nodes, the Boehm GC would make
* this unnecessary.
*/
for (i=0; i< 2; ++i) {
if ((*ctx)->val[i] == min) {
(*ctx)->times[i] = (*ctx)->times[i]->next;
(*ctx)->val[i] = (*ctx)->times[i]->val * fakt[i];
}
}
if ((*ctx)->val[2] == min) {
node *tmp;
tmp = (*ctx)->times[2];
(*ctx)->times[2] = (*ctx)->times[2]->next;
(*ctx)->val[2] = (*ctx)->times[2]->val * fakt[2];
free(tmp);
}
return min;
}
}

int main(int argc, char **argv) {
long i, m;
context *ctx;

ctx = NULL;
if (argc < 2) {
m = 25;
} else {
m = atol(argv[1]);
if (m < 1) {
m = 25;
}
}
for (i = 1; i <= m; ++i) {
long h;
h = hamming(&ctx);
if (h < 0) {
fprintf(stderr, "Numerical overflow at hamming number %ld.\n", i);
exit(-1);

}
printf("%8ld: %16ld\n", i, h);
}
exit(0);
}


Oh, by the way, wouldn't it be nice to have the classes htmlize.el produces predifined in the CSS files?

### a long way

Undoubtedly, we've come a long way since C. Looking at the above code and comparing to even the original posted Haskell code and you see how much along we've come in programming language design.

Clearness, conciseness, expressivity... all are missing from the above. And no, putting short variable names in this case actually just make things even worse...

probably, a better approach would be just implement a Haskell compiler in C anyway? :)

### Here is the C++ O(n) solution.

And it's also an iterator.
template < class T, class C = vector < T > >  class hamming_number {
public:
hamming_number(int reserve) : m_v2(2), m_v3(3), m_v5(5) {
m_vals.reserve(reserve);
m_it = m_it2 = m_it3 = m_it5 = m_vals.insert(m_vals.end(), 1);
}

operator T() const {
return *m_it;
}

T operator ++ () {
T v = min3(m_v2, m_v3, m_v5);
m_it = m_vals.insert(m_vals.end(), v);
if (v == m_v2) m_v2 = *++m_it2 * 2;
if (v == m_v3) m_v3 = *++m_it3 * 3;
if (v == m_v5) m_v5 = *++m_it5 * 5;
return v;
}

private:
C m_vals;
C::iterator m_it;
C::iterator m_it2;
C::iterator m_it3;
C::iterator m_it5;
T m_v2;
T m_v3;
T m_v5;
};


I have to admit this though: although I understood the Haskell algorithm, I had not realized how to do it in C++ unless I looked at the C algorithm...and this is because the actual hamming algorithm is quite simple but Haskell did not gave me any clues about how to implement it in C++.

It's pretty simple though once you realize the fact that, at each iteration, a new hamming number is produced by calculating the minimum of 3 other numbers that are somewhere in the list of already computed hamming numbers.

The above code calculates the first 1000 hamming numbers in 0.000040 seconds (time measured using QueryPerformanceCounter under Win32) with all optimizations on and using a vector of 1000 numbers on an Athlon 64 3400+ using MSVC++ 6.0 (which it probably means that the Intel compiler could do an even better job).

### Oz translation

It seems that it is possible to translate the Haskell code to Oz almost verbatim. Just insert some lazy declarations.

Scale = fun lazy {$N X|Xs} N*X|{Scale N Xs} end Merge = fun lazy {$ X|Xs Y|Ys}
if       X == Y   then  X|{Merge Xs Ys}
elseif   X
Seq   = 1|{Merge {Scale 2 Seq}
{Merge {Scale 3 Seq} {Scale 5 Seq}}}



### Alice ML translation

For the sake of completeness, there's an Oz version of the Hamming functions in CTM (4.5.6) that I translated to Alice ML:

fun lazy scale n []      = []
|      scale n (x::xs) = (n*x)::(scale n xs)

fun lazy merge xs nil = xs
| merge nil ys = ys
| merge (xs as x::xr) (ys as y::yr) =
if x  y
then y::(merge xs yr)
else x::(merge xr yr)

val h = promise();
fulfill(h,
1 :: (merge (scale 2 (future h))
(merge (scale 3 (future h))
(scale 5 (future h)))));

### lazy O(n) solution in C++ better than Haskell(!)

Here is a C++ solution that:

• it is O(n).
• it is lazy; it can be used in loops.
• it uses STL; there are no new classes.
• it is only one function; it is very elegant.
• it can use any STL collection.
• it is sorter than the Haskell version (it can be made to be less than 3 lines by combining operations).
• it is (at least) equally concise to the Haskell version.
template < class L, class T > int hamming(L &l, T it[3]) {
if (l.empty()) it[0] = it[1] = it[2] = l.insert(l.end(), 1);
else {
l.push_back(min(*it[0] * 2, min(*it[1] * 3, *it[2] * 5)));
if (l.back() == *it[0] * 2) it[0]++;
if (l.back() == *it[1] * 3) it[1]++;
if (l.back() == *it[2] * 5) it[2]++;
}
return l.back();
}


Usage:

int main() {
list < int > l;
list < int > ::iterator it[3];
for(int i = 0; i < 40; ++i) {
int h = hamming(l, it);
std::cout << h << ", ";
}
return 0;
}



Output:

1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32,
36, 40, 45, 48, 50, 54, 60, 64, 72, 75, 80, 81, 90, 96, 100, 108, 120,
125, 128, 135, 144


I can therefore officially declare that the hamming numbers problem can no longer be used to demonstrate the superiority of functional programming languages; C++ can do it equally well.

### Well now.

The C++ solution is inherently unsafe. (As are C++ templates and typing in general.)

### The post below explained.

Templates are nothing more than string rewriting rules and do not respect typing; and without templates, the C++ type system is so weak that we need to fall back on unsafe tricks like casting to void*, etc.

### In what way...

Do templates "not respect typing"? I thought the whole point of templates was to make type-safe macros (among other things)?

### Uh, no.

Templates fully respect the C++ type system; at least that any template-expanded code must respect the type system. Templates are type-aware, as well--it just isn't a glorified text preprocessor.

Of course, if you do things like foo, the resulting code is not likely to be type-safe. And C++ template code still can take full advantage of all the legacy C holes in the type system (pointer arithmetic, unchecked typecasts, returning pointers to the stack, memory management errors producing pointers to garbage). But the template system doesn't add to the type-unsafeness of C++; if anything it improves it by providing a mechanism for building safe constructs (like vector) which are effective replacements for unsafe ones (like T[]).

### I don't see how.

The template system isn't any more "type-aware" than the old C macro system. It's the compiler that ultimately does the type checking over the expanded code; whereas I was talking more about something akin to structural typing or meta-types or whatever.

See Stroustrup C++0x where he talks about type safety, generics and so-called "concepts".

### Hmm...

The template system isn't any more "type-aware" than the old C macro system. It's the compiler that ultimately does the type checking over the expanded code; whereas I was talking more about something akin to structural typing or meta-types or whatever.

The first sentence seems to bely some ignorance on how C++ templates work. I absolutely reject the claim that the template system "isn't any more 'type-aware' than the old C macro system". Tell me how the template engine can distinguish between types and values if it is type-agnostic. Tell me how it can perform partial specialization. Are you telling me that when I write:

template <int N>
int foo() { return N; }

template <int N[3]>
void foo() { return N[1]; }

That the template engine can't tell the difference? Are you saying that this call is ambiguous?
int main()
{
foo<3>();
}

I guarantee you that the template code above is *not* equivalent to the closest matching macro code:
#define foo0(x) (x)
#define foo1(x) (x[1])

Not only is x not required to be int or int[], in the second instance, x could be something that's not even an array, as long as a suitable operator[] is defined. And saying that the compiler does the type checking makes it sound like the template engine is some disembodied ghost process that runs in some nebulous non-compiling phase, feeding its output to the "compiler proper". Ask Daveed Vandevoorde how easy it is to separate the template engine from the rest of the compiler. Templates are *not* some poor man's term rewrite engine. They are not a panacea by any means, but I think they deserve a modicum of respect for what they can accomplish.

### Equally well?

I think I see some problems with your code. Correct me if I'm wrong.

Interface problems:

1) To compute some Hamming numbers, I need to manually supply an array of 3 iterators. Why? Isn't this an implementation detail?

2) The function's signature doesn't suggest that the types L and T are related.

3) Templates are used, but the return value of "hamming" is a plain old C int :-)

Implementation problems:

4) You're doing a lot more multiplications and comparisons than required. (Certainly more than 2 times more.)

5) Some code is triplicated (2,3 and 5). The constants 2,3 and 5 are each mentioned twice.

### 1) To compute some Hamming nu

1) To compute some Hamming numbers, I need to manually supply an array of 3 iterators. Why? Isn't this an implementation detail?

It is an implementation detail. Supplying a list is also an implementation detail. But it is the nature of the language, and it does not matter, really.

The function's signature doesn't suggest that the types L and T are related.

Templates are used, but the return value of "hamming" is a plain old C int

It can be like this:

template < class L > L::value_type hamming(L &l, L::iterator it[3]);


You're doing a lot more multiplications and comparisons than required. (Certainly more than 2 times more.)

Some code is triplicated (2,3 and 5). The constants 2,3 and 5 are each mentioned twice

The number of multiplications is exactly the same as in the Haskell version. The compiler sees the same multiplications and puts them in local variables.

The number of comparisons is also exactly the same as in the Haskell version: at each iteration, Haskell has to compare the 3 values coming from the 3 multiplications and advance one of the 3 iterators.

Actually, my code is exactly the same as the C version posted above, which (the poster claims) is what Haskell does under the hood.

### Yes, this signature is defini

Yes, this signature is definitely better.

The number of multiplications (and mentions of 2, 3 and 5) in source code matters just as much as the number of multiplications in the compiler output. Don't Repeat Yourself :-)

Also, I'm not sure you're right about the number of comparisons. You're doing them twice. Once inside the min() calls and once in the equality checks. OTOH, the Haskell version has a 3-way branch on the result of each comparison. No, this isn't a "compiler smartness" nitpick on my part - this is a "language expressivity" nitpick :-)

### Ah, sorry. We're talking abou

Ah, sorry. We're talking about different "Haskell versions". You must mean the one in the original post? I mean the "remarkably concise" code posted by Mark Evans - I like it more :-)

### You almost fooled me :-) Y

You almost fooled me :-)

Your code is doing 3 multiplications on _each_ call. Too many. This is because the "next values" of iterators that fail to advance are being recomputed again and again. In Haskell, they're computed only once.

### But isn't Haskell doing 3 mul

But isn't Haskell doing 3 multiplications at each iteration to compute some new list nodes anyway?

### ?

Haskell computes each element of each of the 3 lazy lists exactly once. Or so I think :-)

Anyway you don't need 3*N multiplications to find N Hamming numbers. You should only need N :-)

### More than one for each output element

Haskell computes each element of each of the 3 lazy lists exactly once. Or so I think :-)

But not every multiplication appears in the final list,
as the function merge filters out the duplicates. You need as many multiplications for a hamming number as there are different prime numbers in its factorisation. So; 0 for 1, 1 for {2n, 3n, 5n}, 2 for {2n3m, 2n5m, 3n5m}
and 3 for the others (which is the majority). For example, you need about 2500 multiplications to compute the first 1000 hamming numbers.

Anyway you don't need 3*N multiplications to find N Hamming numbers. You should only need N :-)

Right, see Christopher Hendrie's implementatation our mine elsewhere in this thread.

### Ah, thanks. I was confused, b

Ah, thanks. I was confused, but think I understand now.

axilmar's C++ version: 3*N

original haskell: a bit less than 3*N (but asymptotically, I somehow feel this is 3*N again, because "degenerate" Hamming numbers are more and more rare)

### Back to expressivity

This brings a question - how easy it is now to update each of these programs to report exact number of multiplications it performed before getting to this specific number?

I expect this to require just a quick dirty counter variable in C++. What about Haskell - state monad, or explicit threading of the counter?

[of course, I don't mean the number of prime factors of the number, but a cumulative multiplications along all search paths]

### Count multiplications

Using a state monad or threading will serialise your computation, and you don't want that here. In this case I'd decorate the numbers with the count of multiplications that were used to compute it. You can define an instance of the Num class for these decorated numbers, so that you don't have to change that much to your program.

This requires a bit more thought than incrementing a global variable at each multiplication, but I think that's because the gap between the Haskell program and its execution behaviour is greater.

This gap may be smaller in C++, but it's still there. In the C++ example you could increment a global variable at each multiplication, but perhaps the compiler eliminates the common multiplications and your count is not accurate anymore.

### Non-invasive transformations

Thanks to type classes you don't need to change the function (e.g. ham4, just add a type declaration ham4 :: (Ord a, Num a) => [a] in this case) that generates the humming numbers to be able to count the multiples. A quick and dirty implementation of such decorated number is:


data N = N Integer Integer deriving (Eq, Show, Ord)
order (N _ o) = o

instance Num N where
(N x a) + (N y b) = N (x + y) (max a b)
(N x a) - (N y b) = N (x - y) (max a b)
(N x a) * (N y b) = N (x * y) (a + b + 1)
negate (N x a)    = N (negate x) a
abs (N x a)       = N (abs x) a
signum (N x a)    = N (signum x) a
fromInteger       = make . fromInteger
where make n = N n 0


For the millionth hamming number it gives 175 multiplications.

It's also easy to use the same technique to collect each operand:


data H = H Integer [Integer] deriving (Eq, Show, Ord)
terms (H _ ts) = ts

instance Num H where
(H x a) + (H y b) = undefined
(H x a) - (H y b) = undefined
(H x a) * (H y b) = H (x * y) (a ++ b)
negate (H x a)    = H (negate x) a
abs (H x a)       = H (abs x) a
signum (H x a)    = H (signum x) a
fromInteger       = make . fromInteger
where make n   = H n [n]


This shows that first all 2s are computed, then 3s and finally 5s. Another implementation shows how many of each:

238 3109 529 = 519312780448388736089589843750000000000000000000000000000000000000000000000000000000.

### C++ version

It's invasive, but it's smaller:
class Int {
Int(int x) : x_(x) { }
friend int operator*(int x, Int y) { ++mults_; return x * y.x_; }
static int mults_;
int x_;
};
int Int::mults_ = 0;


In the C++ version, replace the constants with Int(x) Basically 7 lines. However, I will admit that the Haskell version does more with each line. But at least part of that is because the language strips out all unnecessary syntax, whereas C++ takes a more conservative approach. Also, Haskell overloads operators even more than C++ does. And, being an FP, the intrinsic data structures are naturally richer, leading to a more concise syntax; whereas C++ elects to make anything bigger than a machine word into a library. Finally, the biggest gains Haskell makes are due to the fact that it treats functions as first-class values, which is as one should expect.

Unfortunately, I think it is too late to make C++ into an even passable FP, but I hope that Scala continues to be developed, because I think it sits in a really interesting (and I suspect very, very powerful) point in the design space.

### Here's a slightly better version...

#include <iostream>
#include <list>

template <class L>
typename L::value_type hamming(L &l)
{
static int const p[] = { 2, 3, 5 };
static typename L::iterator it[3];
if (l.empty()) it[0] = it[1] = it[2] = l.insert(l.end(), 1);
else
{
typename L::value_type const f[3] = { *it[0] * p[0], *it[1] * p[1], *it[2] * p[2] };
int const i = f[0] < f[1] ? (f[0] < f[2] ? 0 : 2) : (f[1] < f[2] ? 1 : 2);
l.push_back(f[i]);
for (int i = 0; i != 3; ++i) if (l.back() == f[i]) ++it[i];
}
return l.back();
}

int main()
{
std::list<int> l;
std::cout << hamming(l);
for (int i = 1; i != 40; ++i)
{
int h = hamming(l);
std::cout << ", " << h;
}
std::cout << std::endl;
return 0;
}


This version obviously requires that operator*(typename L::value_type, int) be defined in the sensible way, as well as requiring L::value_type to be LessThanComparable and EqualityComparable, but any reasonable numeric type will need to satisfy those constraints anyway.

It's a bit long, but mostly that has to do with the fact that C++ template code tends to be a little verbose.

### Smooth numbers - generic Hamming

I suggest making the task harder by requiring parametricity in primary multipliers - make 2, 3, and 5 just a special case, appearing only in the client code at the invocation/instantiation/etc. site.

Will Haskell version still be better than C++?

### Template parameters can be constants.

The 'hamming' function can be paramerized on the primary factors by using template constants. For example:

template < int P1, int P2, int P3, class L > L::value_type hamming(L &l, L::iterator it[3])


And used like this:

hamming<2, 3, 5>(l, it);


### Errata

Um, I didn't mean limiting the requirements to exactly three factors.

On other issue: your client code seems not using "it" in any way other than passing it into the "hamming", am I right?

### You can generate...

The code to support an arbitrary number of parameters using the Boost.Preprocessor Library, but Paul Mensonides is probably the only person who could do so in an hour or less. Of course, the arity will be limited by the limitations of the C++ preprocessor, but I'm sure you could get a fairly high number of factors.

### Phase limitation

What if I learn these factors only at runtime? That's actually my main reservation about expressive power of C++ meta-programming - the fact that it is exactly meta. I would prefer an approach that would not impose a choice of phase of evaluation on me.

### Example?

Can you cite a real-world example of an algorithm where you know the fundamental parameters at compile time sometimes, and at runtime other times?

I've seen all sorts of crazy optimizations for FPS games. I wouldn't be surprised if some people have specialized their code to know image sizes at compile time.

I like partial application/evaluation because then I don't have to choose. (Yes, I know that only partial evaluation is compile time.)

--Shae Erisson - ScannedInAvian.com

### I'd be surprised if it happen

I'd be surprised if it happened now. Of course, there was a time when compiled sprites were worth using...

### It happens

On PC-level computers it happened a few years ago in massive "real-time" strategy game engines like Cossacks. I am possitive it still happens on lower-end devices like PDAs and mobiles.

### Reframing

I don't see this property as intrinsic to any algorithm. If any algorithm is to be reusable, it should not make decisions for its clients. Or, to put it another (milder) way, the more decisions you make for clients, the less clients you have.

I agree with shapr that computer games are especially prone to pushing cycles from runtime to compile time (Paul/Tim?), but in fact any application living on the edge of feasibility given the hardware and other parts of environment would do that.

### I Was Just Going to Observe...

... that it's dirt common in games to have huge amounts of data at game build-time that you could simply pump through your renderer at runtime, but you wouldn't get acceptable performance that way, so some data-driven calculations are done at build-time. For example, games tend to do "potentially visible set" (PVS) calculations at build-time so as not to bother rendering polygons that wouldn't be visible to the user anyway at runtime. More recently, a great deal of work has been done in the application of partial evaluation to raytracing and to specializing complex shaders. It's interesting to note that Mark Leone, whose name is well-known in partial evaluation, runtime code-generation, and Standard ML circles, was ultimately hired by Pixar and wrote an important paper on the Multi-Pass Partitioning Problem. His name also appears in the credits of "The Incredibles."

Shaders are where it's at these days, and a lot of attention is being paid to functional programming and partial evaluation in the context of shader algebra and specialization. It would be interesting to hear from Tim how Unreal Technology 3's artist-controlled visual shader development tools do (or do not) represent a particular form of specializing shader compiler.

### No problem with run-time parameterization.

The function would become:

template  L::value_type hamming(L &l, vector  &it, const vector   facts);


The code will use the supplied vector of iterators and factors, and use for-loops inside the code to perform the computation.

### Paul Mensonides is probably t

Paul Mensonides is probably the only person who could do so in an hour or less.

No disrespect to Paul, but if you can afford to use a C99 preprocessor, anyone who knows FP should be able to write the generator in a matter of minutes using the Order interpreter. See here for examples. (For the record, given a choice, I wouldn't use C++. I have given up on C++.)

### Order vs. Chaos

Heh. This debate always sounds like the fate of the universe hangs in the balance of some preprocessor metaprogramming libraries (PMLs). ;> One thing that the existence of PMLs has convinced me of is that there must be a better way. ;) That better way, I think, is a language that allows you to jump levels of abstraction seamlessly, like Lisp/Scheme, but within an imperative/static typing context, which is probably about as painful and difficult to implement as it sounds. I sympathize with your sentiments re: C++. I think it's a fine language in some respects, and I think it deserves some respect for the areas in which it broke ground; but I fear that true progress can only be had by starting fresh. With no disrepect to Paul or the other Boost.PP lib authors, my comment is as much a commentary on the Boost.PP lib as it is a compliment to Paul's skills.

### Order-PP

I haven't seen Order before. I'd appreciate better links if there are any. Looks cool!

### Order-PP was my hobby project

I don't think that there are considerably better links than the examples at SF. Order was my hobby project for some time, but, like I said, I have given up on C++ (meaning: I try very hard to avoid wasting any more time with C++).

The roots of Order can be traced back to this post. After that I switched to a C99 preprocessor and discovered several improvements, which improved performance (in some important cases) by two orders of magnitude and made interpretation "practical" in some sense. I spent one summer to make an implementation of the Order language culminating in this post. Since then I have come to the conclusion that I don't want to waste more time with C++ and the Order project is (consequently) on hold. The actual Order interpreter is basically finished (just checkout the code from SF CVS). Documentation (except for the examples) is incomplete.

Thanks

### Ok, let me try the Haskell version.

If no one else does it, let me try my first Haskell program:

merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys)
| x == y    = x : merge xs ys
| x <  y    = x : merge xs (y:ys)
| otherwise = y : merge (x:xs) ys

ham factors = seq
where
seq = 1 : (foldr mergeScaled [] factors)
mergeScaled n xs = merge xs (map (n*) seq)

First timer issue: I find it odd that the arguments to mergeScaled have to be n xs, when the order in foldr is [] factors.

I also tried an imperative version in Javascript. It was telling that getting the Javascript version right was harder than getting the Haskell version right, while this was my first Haskell program, and I program in JS daily.

### Smoothies

You can generate the pn+1-smooth numbers from the pn-smooth numbers by mixing in pn+1 (pn being the nth prime):

smooth :: [Integer] -> [Integer]
smooth l@(h:t) = next
where
p = missing  l
next = h : merge (map (* p) next) t


The function missing finds the first gap in the previous smooth sequence, which is the next prime.

missing :: [Integer] -> Integer
missing l@(_:t) = head [n+1 | (n,m) <- zip l t, n+1 /= m]


Now we can generate all smooth sequences by iterating over the 2-smooth sequence (the powers of 2):

smoothies :: [[Integer]]
smoothies = iterate smooth (iterate (* 2) 1)


For example the hamming sequence (5-smooth) is the
third element from this list, so the following test function should not terminate.

test = smoothies !! 2 == ham


Interestingly enough the more general version is almost twice as fast as the direct version, but I have yet to figure out why.

We now also have a odd way to generate the odd primes:

oddPrimes :: [Integer]
oddPrimes = map missing smoothies


### An unexpected bonus

Interestingly enough the more general version is almost twice as fast as the direct version, but I have yet to figure out why.

It would be nice to know whether it is just a quirk of the compiler, or a genuine improvement of algorithm. If we only had compilers inferring algorithmic complexity for us (say as a part of type inference process)...

### No duplicate composites

The general version is faster because it doesn't generate any duplicate composites. The corresponding specialised hamming function is:

ham = c5
where
c2 = 1 : map (* 2) c2
c3 = 1 : merge (map (* 3) c3) (tail c2)
c5 = 1 : merge (map (* 5) c5) (tail c3)


Because the algorithm doesn't generate duplicates, the following merge function suffices:

merge (x:xs) (y:ys)
| x 

### Conclusion

I suspect that any lazy functional algorithm can be translated to an imperative one by using these simple rules:

1) a lazy algorithm can be encoded as an imperative function which transforms some input according to some computation.

2) the imperative implementation needs 1 collection and a number of iterators equal to the number of references of the lazy function in the right side of the equation. For example, the Haskell version mentions 'hamming' 3 times at the right hand side, so 3 iterators are needed for the imperative algorithm.

The above is from intuition, I must say.

### "Why Functional Programming Matters"

The above named paper is a rather old paper (but a still a good read), but has some more complicated examples using lazy evaluation that you could use to test your hypothesis. Particularly section 5 which implements the alpha-beta heuristic.

Why Functional Programming Matters

### Ugh.

That papers really irks me.

While it proposes a nice list of hacks, this sort of thing will not impress anybody who does industrial programming. Quite the contrary, I think.

### Tried something like that, but didn't really work (yet)

I tried an OCaml solution to the Hamming problem. It is listed below.

It uses a small trick with a continuation/coalgebraic flavor. An infinite lazy sequence can be modelled as a datatype which consists of a head, and a tail function with will give the next head and the next tail function.

It's my first ten minutes try, it doesn't work for large lists. Probably there is some reevaluation taking place (no sharing?) - will look at it again at some point.

Anyway, here's the code; please spot the bug ;-)

type num = int

type clist = H of (num * (unit -> clist))

let rec merge: clist -> clist -> clist =
function H(x, fx) -> function H(y,fy) ->
if x < y then H(x, function _ -> merge (fx()) (H(y,fy)))
else if x == y then H(x, function _ -> merge (fx()) (fy()))
else H(y, function _ -> merge (H(x,fx)) (fy()))

let rec scale: num -> clist -> clist =
function s -> function H(x, fx) -> H(s * x, function _ -> scale s (fx()))

let rec hamming: clist =
H(1, function _ -> merge (scale 2 hamming) (merge (scale 3 hamming) (scale 5 hamming)))

let rec print_clist: num -> clist -> unit =
function n -> function H(x,fx) ->
if n = 0 then print_newline () else
(print_int n; print_string " : "; print_int x; print_newline (); print_clist (n-1) (fx()))

let _ = print_clist 1000 hamming



[Edit: hmm, might be because I de/reconstruct all the H-fields]

[Edit the edit: can't be it, either I am explicitly copying -am I?- or ocaml doesn't have a true graph rewriting semantics -does it?- which would make sense I guess...]

### "some" reevaluation...

Your evaluation order is busy, rather than lazy, it reevaluates everything, all the time, try

let rec scale: num -> clist -> clist =
function s -> function H(x, fx) ->
Printf.fprintf stderr "Evaluating %d * %d\n" s x;
H(s * x, function _ -> scale s (fx()))


You should change your lazy list to

type clist = H of (num * clist Lazy.t)


and sprinkle your code with magic pixie dust the approriate lazy and Lazy.force statements.

### Great

Thanks for the heads-up. Yeah, I know you can do it like that. But that wasn't really the point of the exercise: I wanted to try and do it without the lazy keyword...

### C++ solution on par with Haskell

Here is another way to do it in C++ using lazy lists:

list < int > hamming = val(1) >> next(min(2*hamming >> 3*hamming >> 5*hamming));


The list would be used like this:

iterator < int > it = hamming;
for(int i = 0; i < 40; i++) {
cout << it() << ",";
it = tail(it);
}


Notes:

• the operator >> is used to define a cons of lazy expressions.
• the val function returns a functor that, when evaluated, it will return the given value.
• each reference to hamming at the right side of the equation creates one iterator instance which is initialized to point to the first element of the list when the object hamming is constructed.
• The operator * between a number and a lazy iterator would return a pair of {value * value of iterator, iterator} when evaluated.
• The operator >> between {value, iterator} pairs would create a temporary array of the pairs.
• The min function would scan the given array of {value, iterator} pairs and would return an array of references to iterators that contain the iterators that belong to pairs that their values are equal to the minimum of the values of the pairs.
• the next function would create a new list node that would be the next node of the current node contained in the iterator it; the value of the node would be the value of the first iterator contained in the array that is the result of min; the iterators contained in the result of min would be set to point to the next element.
• The first tail(it) would set the internal pointer of it to point to the next lazy element of hamming.
• The subsequent invocations of tail(it) would create the next hamming value by invoking the lazy expression created above.

The above can be used in any lazy list. For example, the Fibonacci sequence would be defined as:

list < int > fib = val(0) >> val(1) >> next(fibs + tail(fibs));


The above formula would create two iterators (since there are two fibs references at the right side of the formula). Initially, both iterators would point to value 0. At the third extraction, the new value would be the value of iterator 1 (0) to the value of the next element of iterator 2 (value 1). The function 'next' would be used to advance the iterators passed to it to their next element.

EDIT:

Yeap, it can be done, thanks to C++ operator overloading. So the hamming algorithm can be done better in C++ than in Haskell...in just one line of code!

### One line of code?

So the hamming algorithm can be done better in C++ than in Haskell...in just one line of code!

One line of code? What about all the lines of code to implement operator *, operator >>, val, min, etc? It seems to me that you end up implementing a crippled version of a subset of Haskell in order to achieve your "one line of code". I'm reminded of Greenspun's 10th Rule of Programming.

### Indeed, I implemented some su

Indeed, I implemented some subset of Haskell (along with some hacks), just to show that it can be done. Aside from being a cool hack, the code can be reused for doing any kind of lazy lists.

Of course since Haskell is available, there is no point in using C++ as Haskell...but it is a nice exercise anyway.

By the way, I think it is possible to do it exactly like in Haskell with complete lazy evaluation, and writing the merge function as a functor object, like this:

fun2 merge = _if(_1 == _2, merge(tail(_1), _tail(_2)))
| _if(_1 < _2, _1 >> merge(_1, _tail(_2))
| _2 >> merge(_tail(_1), _2);


The above functor object would return an operation object that would be attached to the list and evaluated when needed.

### You're kidding, right?

Yeah, right. We're pretty close to expression trees and an evaluator... the next step would be a parser and interpreter.

Function merge = new BinaryFunction(new Branch(new Case(new EqualCondition(Parameter.FIRST, Parameter.SECOND), ...et cetera, ad nauseam... and we get garbage collection for free!

Surely there's got to be a limit somewhere. For me, the limit is simple: Don't Write Bad Code. The code you supplied is so obviously Bad, I'm embarrassed even to have to explain it.

After all, C++ is the devil for LtU members. But the truth is, since C++ allows operator overloading and value type objects, a lazy library can easily be written that does exactly what Haskell does and with the same expressitivity.

And your Java example is simply wrong, because what makes the hamming algorithm be beautiful it is the way it is expressed using operators. Since Java lacks operator overloading, it can not be considered equal in expressitivity.

So there has got to be a limit somewhere. For me, the limit is simple: don't say stupid things. The example you supplied is so stupid, I am embarrassed even to have to write about it.

### I have to agree

That operator overloading was one of the smartest things Bjarne added to C++. Yes, yes, it's "just" syntactic sugar, but we are used to seeing and reading symbols, especially when they are a powerful shorthand for important concepts. Imagine if we taught schoolchildren to do math like so:
       four
equals eight


It's pretty much exactly equivalent to the usual notation, except that it's much harder for us to read, not at least partly because it's so verbose. I'm sure part of what makes Haskell so appealing to some programmers is how concise it is. And it achieves much of that syntactic parsimony through the heavy use of symbols. A more verbose Ada-like dialect of Haskell where all the symbols are replaced by word tokens would probably cause FP programmers to recoil in horror.

I would argue that choosing to *not* include operator overloading was one of the biggest failures of Java's design, and is why it would be utter folly to even attempt to implement half the libraries that exist for C++, including Spirit++, Blitz++, Boost.Iterator, etc. Sometimes I look upon libraries like Boost.Lambda with a combination of admiration and loathing, but you have to admit that for a 20 year old statically typed imperative PL, C++ does all right for itself.

### Hmmm...

A more verbose Ada-like dialect of Haskell where all the symbols are replaced by word tokens would probably cause FP programmers to recoil in horror.

You mean like Scheme...? ;)

### Beauty is in the eye of the beholder

I'm all for judicious use of operator overloading, but I think it a mistake to assume that the elegance of the Haskell code is due to operators. The use of operators in the Haskell code is pretty bare minimum and there's nothing particularly out of the ordinary for those that are used ([] : * = ==

So the question back is why is the Haskell code considered elegant? FWIW, the Oz code and the Alice ML code are fairly close to the Haskell code, so perhaps the answer is not the Haskell Type system. For sure, you have to have lazy functions. (IMHO, any solution that uses iteration and array pointers misses the whole point to the elegance of the recursion).

As an aside, APL has always been on the extreme in terms of usage of operators. For those that understand the symbols, APL provides a power for manipulation that is as expressive as any language. For those that don't understand the symbols, APL can look like chicken scratch. The biggest problem with APL is that you can't define new operators that act in a first class fashion.

The use of symbols in code can sometimes aide readibility. It can also sometimes make it harder to grok. Not particularly caring about C++ at this juncture in history, I find it weird that the use of such symbols as >> and

### Syntactic brevity

The use of operators in the Haskell code is pretty bare minimum and there's nothing particularly out of the ordinary for those that are used ([] : * = ==

In that example, although Haskell uses a lot more symbols than that. But perhaps another major source of conciseness is the type inference. That takes a *lot* of identifiers out of the code that you find in manifestly typed languages.

The use of symbols in code can sometimes aide readibility. It can also sometimes make it harder to grok.

Can we say: "awk/sed"?

Not particularly caring about C++ at this juncture in history, I find it weird that the use of such symbols as >> and

Well, I've never understood this criticism of C++. If you want to concatenate stream operations, you need to have a stream operator. Was there some de facto stream operator before ? It makes perfect sense to me, since most *nix shells use > and .

### Expressivity

I was way too harsh and vague in my comment about "bad code". Excuse me, I'll try not to do this in the future.

I'll try to explain my point in some more detail, if you're still listening...

I easily grant that, given any concrete task, C++ can be made as expressive as Haskell for this task, by writing a (quite large) supporting combinator library (operator overloading and expression templates to build/evaluate syntax trees).

But the "bad code" I blurted out isn't just an expressivity measure. The criterion of "good code" I have developed over the years is as follows: good code should be right, and it should be obviously right.

Of course, there are different grades of "obviously", so it's a very subjective metric. But the experience of many people shows that it's VERY hard to write "obviously right" code in C++.

For a quick mental test, look back at the code you supplied (_if, _1, _2 and stuff). Imagine debugging/maintaining just a couple hundred lines of such code, freely mixed with "normal" C++ and - for kicks - also using some other expression-template combinator library (e.g. Boost Spirit).

I can easily imagine working with tasks of similar complexity in Haskell - though I'm a Haskell newbie at best.

For example: you alluded to my mock Java sample as "stupid", but... dont' you think that, if you're embedding a functional language within C++, you're going to need a garbage collector at some point? Yes, value types help, but they don't solve all problems. OCaml has a GC. Haskell has a GC. And even Java has a GC... so my example wasn't so stupid after all :-)

See? This semantic issue is much more important than the syntax issue (operator overloading) you pointed out.

### No GC ;>

For example: you alluded to my mock Java sample as "stupid", but... dont' you think that, if you're embedding a functional language within C++, you're going to need a garbage collector at some point? Yes, value types help, but they don't solve all problems. OCaml has a GC. Haskell has a GC. And even Java has a GC... so my example wasn't so stupid after all :-)

Well, the most important value type in C++ is the smart pointer. ;> Not only can you use vanilla strategies like reference counting, you can use a good smart pointer framework to implement a garbage-collected pointer. Of course, if you absolutely must have a "pure" garbage collector, you can just link one in to the C++ program as well. Not only does C++ come with the kitchen sink, it comes with an optional garbage disposal too! ;>

### Just for the fun of it...

Here it is in Lua, with a fairly complete lazy infinite sequence implementation using promises:

### Python version

Okay, that is a challenge. :)

def scale(n, it):
for e in it: yield n * e

def cons(x, it):
yield x
for e in it: yield e

def merge(xs, ys):
x, y = xs.next(), ys.next()
if x == y:
for e in cons(x, merge(xs, ys)): yield e
elif x
This basically follows the Haskell version point for point except that it's done using generators and iterator algebra (by the way, the above merge function only works with infinite iterators since it doesn't catch StopIteration).

### That's very pretty, but...

>>> from itertools import izip
>>> for i, e in izip(range(100000), seq()):
...   if i % 10000 == 0:
...     print i, e
...
0 1
Traceback (most recent call last):
File "", line 1, in ?
File "", line 3, in seq
<snip />
File "", line 3, in cons
File "", line 2, in merge
RuntimeError: maximum recursion depth exceeded


To be fair to Python, a nice fast working solution is in here, from a thread previously referenced in this topic.

### Well...

The Haskell version has the same number of recursions as my Python version, up to a constant factor. None of the recursions in either version are in tail call position, so they cannot be optimized away. So as far as expressivity at the language level is concerned, I don't think the maximum recursion depth bail-out is very relevant--if you tested it on Stackless (or any other Python implementation with a priori unbounded call stack) there would be no issue aside from performance (a function call in Python is at least two heap allocations).

By the way, I wasn't trying to present my code as "the best" or whatever, I just wrote it up to see how close to the spirit of the Haskell code I could get without doing any insane stunts. I figured I might as well share it.

### Sharing vs. reevaluation

The Haskell version has the same number of recursions as my Python version, up to a constant factor.

I'm just a casual Python user, but it seems to me that your version starts new generators for each recursive call to seq(), whereas the Haskell version shares the ham list.

None of the recursions in either version are in tail call position [..]

But the recursive calls of ham are on lazy positions and the Haskell program only needs constant stack space.

### Yes

You're absolutely right, that's what I get for only being a casual Haskell user (if even that). Oh well.

Just for kicks, here's a Scala solution:

object hamming extends Application {
import java.math.BigInteger

def merge(xs: Stream[BigInteger], ys: Stream[BigInteger]): Stream[BigInteger] =
if (xs.isEmpty) ys
else if (ys.isEmpty) xs
else {
val cmp = x compareTo y

if (cmp == 0) Stream.cons(x, merge(xs.tail, ys.tail))
else if (cmp < 0) Stream.cons(x, merge(xs.tail, ys))
else Stream.cons(y, merge(xs, ys.tail))
}

def seq: Stream[BigInteger] =
Stream.cons(BigInteger.ONE,
merge(seq map new BigInteger("2").multiply,
merge(seq map new BigInteger("3").multiply,
seq map new BigInteger("5").multiply)))

Console.println("The first 20 numbers:")
Console.println (seq.take(20).mkString("", ", ", ""))

Console.print("\nThe 100th number (counting from 0): ")

Console.println("\nThe 1000th:")
}

It's a real shame it doesn't work. You can compute the first 100 numbers, but you can't compute 1000.

$scalac hamming.scala$ ls -lh
total 80K
-rw-rw-r-- 1 vadim vadim 1.1K Sep  7 19:59 hamming$$anonfun0.class -rw-rw-r-- 1 vadim vadim 1.1K Sep 7 19:59 hamming$$anonfun$1.class -rw-rw-r-- 1 vadim vadim 1.1K Sep 7 19:59 hamming$$anonfun2.class -rw-rw-r-- 1 vadim vadim 1.5K Sep 7 19:59 hamming$$anonfun$3$$anonfun4.class -rw-rw-r-- 1 vadim vadim 1.5K Sep 7 19:59 hamming$$anonfun$3$$anonfun5.class -rw-rw-r-- 1 vadim vadim 1.5K Sep 7 19:59 hamming$$anonfun$3$$anonfun6.class -rw-rw-r-- 1 vadim vadim 1.3K Sep 7 19:59 hamming$$anonfun$3.class -rw-rw-r-- 1 vadim vadim 868 Sep 7 19:59 hamming.class -rw-rw-r-- 1 vadim vadim 2.4K Sep 7 19:59 hamming$.class
$time scala hamming The first 20 numbers: 1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36 The 100th number (counting from 0): 1600 The 1000th: java.lang.OutOfMemoryError: Java heap space real 0m11.424s user 0m9.053s sys 0m0.344s ### still broken under Scala 2.6... Under Scala 2.6 / Sun JDK 1.6.0_02: $ sudo rpm -ivh http://www.scala-lang.org/downloads/distrib/files/scala-2.6.1-1.i386.rpm

$java -version java version "1.6.0_02" Java(TM) SE Runtime Environment (build 1.6.0_02-b05) Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_02-b05, mixed mode)$ cat -n hamming.scala | head -14 | tail -3
12        if (cmp == 0) Stream.cons(x, merge(xs.tail, ys.tail))
13        else if (cmp < 0) Stream.cons(x, merge(xs.tail, ys))
14        else Stream.cons(y, merge(xs, ys.tail))

$time scalac hamming.scala real 0m4.721s user 0m4.296s sys 0m0.169s$ time scala hamming
The first 20 numbers:
1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36

The 100th number (counting from 0): 1600

The 1000th:
java.lang.OutOfMemoryError: Java heap space
at scala.Stream$cons$.apply(Stream.scala:45)
at hamming$.merge(hamming.scala:14) at hamming$$anonfunmerge1.apply(hamming.scala:12) at hamming$$anonfun$merge$1.apply(hamming.scala:12) at scala.Stream$cons$$anon2.tail(Stream.scala:52) at scala.Stream$$anonfun$map$1.apply(Stream.scala:340)
at scala.Stream$$anonfunmap1.apply(Stream.scala:340) at scala.Streamcons$$anon$2.tail(Stream.scala:52) at hamming$$anonfunmerge3.apply(hamming.scala:14) at hamming$$anonfun$merge$3.apply(hamming.scala:14) at scala.Stream$cons$$anon2.tail(Stream.scala:52) at hamming$$anonfun$merge$1.apply(hamming.scala:12)
at hamming$$anonfunmerge1.apply(hamming.scala:12) at scala.Streamcons$$anon$2.tail(Stream.scala:52) at scala.Stream$$anonfunmap1.apply(Stream.scala:340) at scala.Stream$$anonfun$map$1.apply(Stream.scala:340) at scala.Stream$cons$$anon2.tail(Stream.scala:52) at hamming$$anonfun$merge$1.apply(hamming.scala:12)
at hamming$$anonfunmerge1.apply(hamming.scala:12) at scala.Streamcons$$anon$2.tail(Stream.scala:52) at scala.Stream$$anonfunmap1.apply(Stream.scala:340) at scala.Stream$$anonfun$map$1.apply(Stream.scala:340) at scala.Stream$cons$$anon2.tail(Stream.scala:52) at hamming$$anonfun$merge$1.apply(hamming.scala:12)
at hamming$$anonfunmerge1.apply(hamming.scala:12) at scala.Streamcons$$anon$2.tail(Stream.scala:52) at scala.Stream$$anonfunmap1.apply(Stream.scala:340) at scala.Stream$$anonfun$map$1.apply(Stream.scala:340) at scala.Stream$cons$$anon2.tail(Stream.scala:52) at hamming$$anonfun$merge$3.apply(hamming.scala:14)
at hamming$$anonfunmerge3.apply(hamming.scala:14) at scala.Streamcons$$anon$2.tail(Stream.scala:52) real 0m26.626s user 0m24.523s sys 0m0.520s  Will try again in a couple of years. ### Scala 2.8.1 Note to self regarding comment-38966: $ time /var/tmp/scala-2.8.1.final/bin/scala hamming
The first 20 numbers:
1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36

The 100th number (counting from 0): 1600

The 1000th:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3137)
at java.math.BigInteger.multiply(BigInteger.java:1144)
at hamming$$anonfunseq1$$anonfun$apply$1.apply(hamming.scala:19)
at hamming$$anonfunseq1$$anonfun$apply$1.apply(hamming.scala:19)
at scala.collection.immutable.Stream$$anonfunmap1.apply(Stream.scala:168) at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:168)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565) at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
at hamming$$anonfunmerge1.apply(hamming.scala:12) at hamming$$anonfun$merge$1.apply(hamming.scala:12)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565) at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
at scala.collection.immutable.Stream$$anonfunmap1.apply(Stream.scala:168) at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:168)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565) at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
at hamming$$anonfunmerge1.apply(hamming.scala:12) at hamming$$anonfun$merge$1.apply(hamming.scala:12)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565) at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
at scala.collection.immutable.Stream$$anonfunmap1.apply(Stream.scala:168) at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:168)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565) at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
at hamming$$anonfunmerge1.apply(hamming.scala:12) at hamming$$anonfun$merge$1.apply(hamming.scala:12)
at scala.collection.immutable.Stream$Cons.tail(Stream.scala:565) at scala.collection.immutable.Stream$Cons.tail(Stream.scala:557)
at scala.collection.immutable.Stream$$anonfunmap1.apply(Stream.scala:168) at scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:168)
at scala.collection.immutable.Stream\$Cons.tail(Stream.scala:565)

real    0m25.552s
user    0m25.692s
sys     0m0.919s


### Relying on libraries

This is common language bigotry, and the authors cheat on their C++ "solution". They're relying on a heavyweight library class to do the difficult work, and thus hiding both the complexity and performance implications of their solution.

If you're going to compare language expressiveness and programming paradigms using examples like the Hamming problem, YOU HAVE TO ACTUALLY SOLVE THE PROBLEM, and not call some big library function to solve it for you. Otherwise you're just comparing libraries, and degenerately, an arbitrarily crappy language with a library that includes GiveMeHammingNumbers() wins.

### Parsimony

To some extent judging parsimony is in the eye of the beholder. Different dimensions of parsimony referenced in the discussion above include: runtime complexity of the algorithm, code length, whether the code uses language intrinsics vs. standard libraries vs. user-defined functions, runtime memory usage, and how close the algorithmic solution is to the way the individual wants to think about the problem. To my mind, some of the solutions others judged as very elegant are a little less so because they either require redundant information to be stored in different lists or involve a loss of information about the "progress" of each multiple generator (i.e. the 2,3,and 5 series). It seems easier to compose a solution that doesn't have these drawbacks by using an imperative style algorithm with shared state among the generators. Here is a version in C++ that doesn't use any libraries (except for I/O messaging). It would obviously look a little cleaner if it relied on a built in list construct, GC, and a more elegant looping syntax. However, it is clearly more minimalistic in terms of run time and potential memory footprint than most (perhaps all) other solutions presented. It is incomplete in the sense that it does not incorporate a BigNum library or check for overflow. As such, if the constant "1000" in the print loop is replaced with some bigger number it may overflow and generate nonsense long before it runs into other computational limitations.  #include < iostream > typedef long long BigN; // replace with best BigNum type available to you class HammingIterator { protected: // Helper Definitions: // Roll our own singly linked list called HCell struct HCell { HCell(BigN v) : val(v), next(0) {} HCell(BigN v, HCell* n) : val(v), next(n) {} BigN val; HCell* next; }; // for N=2,3,5, Gen<N> will keep track of next generated and where we are // in working through the historical values template <int N> struct Gen { Gen(HCell* c) : trail(c), cand(N*trail->val) {} HCell* trail; BigN cand; void goNext(BigN best) { if (best == cand) { trail=trail->next; cand = N*trail->val; } } }; BigN min3(BigN v1, BigN v2, BigN v3) { return ((v1 <= v2) ? ((v1 <= v3) ? v1 : v3) : ((v2 <= v3) ? v2 : v3)); } // Data Members of the iterator: HCell* d_list; // all the stored list HCell* d_curr; // the current value to output (end's the list) // State of the 3 generators, containing a trailing position and the next value // for this generator series Gen<2> d_g2; Gen<3> d_g3; Gen<5> d_g5; void cleanTrail() { //remove no longer needed front of the list while ((d_list != d_g2.trail) && (d_list != d_g3.trail) && (d_list != d_g5.trail)) { HCell* next = d_list->next; delete d_list; d_list = next; } } public: HammingIterator() : d_list(new HCell(1)), d_curr(d_list), d_g2(d_list), d_g3(d_list), d_g5(d_list) {} ~HammingIterator() { HCell* first = d_list; while (first) { HCell* next = first->next; delete first; first = next; } } BigN current() const { return d_curr->val; } void advance() { // figure out which candidate is smallest: BigN best = min3(d_g2.cand,d_g3.cand,d_g5.cand); // append smallest to the list as the new current d_curr->next = new HCell(best); d_curr = d_curr->next; // advance state for any generator with candidate equal to best d_g2.goNext(best); d_g3.goNext(best); d_g5.goNext(best); // remove front part of the list that's no longer needed cleanTrail(); } }; int main(int argc, char** argv) { HammingIterator hi; for (int i=0; i < 1000; ++i) { std::cout << hi.current() << std::endl; hi.advance(); } } `