## A Methodology for Generating Verified Combinatorial Circuits

A Methodology for Generating Verified Combinatorial Circuits. Oleg Kiselyov, Kedar N. Swadi, Walid Taha.

This is the final version of a paper accepted for the Embedded Software Conference 2004. The paper doesn't show any circuits, but the straight-line C code it generates should be easy to implement.

There is a significant difference from FFTW: the authors do not do any intensional code analysis -- the generated code is a black box and cannot be processed or manipulated any further. Indeed, generated code fragments cannot even be compared for equality. Oleg tells me that the paper is somewhat dated in one respect: it says that they approach FFTW in the number of operations in the generated code. That is no longer true: the generated power-of-two FFT code has exactly the same number of floating-point operations as the corresponding FFTW codelets.

Abstract interpretation is used to fix several problems in the generated code. This makes some optimizations possible (e.g., avoiding code duplication).
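To make the idea concrete, here is a minimal sketch of such abstract interpretation in plain OCaml. Strings stand in for the opaque code values of MetaOCaml, and the names (`pvalue`, `mul`, `show`) are illustrative, not from the paper: the generator tracks whether a value is statically known, so algebraic identities are applied while the code is being generated rather than in a later optimization pass.

```ocaml
(* A partially-static value: either a constant known at generation time,
   or an opaque piece of generated code (here just C-like text). *)
type pvalue = Static of float | Dynamic of string

(* Smart constructor for multiplication: algebraic identities are applied
   at generation time, so 0*x and 1*x never appear in the output. *)
let mul x y = match x, y with
  | Static 0., _ | _, Static 0. -> Static 0.
  | Static 1., v | v, Static 1. -> v
  | Static a, Static b -> Static (a *. b)
  | Static a, Dynamic e | Dynamic e, Static a ->
      Dynamic (Printf.sprintf "(%g * %s)" a e)
  | Dynamic e1, Dynamic e2 -> Dynamic (Printf.sprintf "(%s * %s)" e1 e2)

(* Render a partially-static value as generated code text. *)
let show = function Static c -> Printf.sprintf "%g" c | Dynamic e -> e
```

For example, `show (mul (Static 1.) (Dynamic "x"))` yields just `"x"`, with no multiplication emitted at all.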

Multi-stage programming fans, enjoy!


### Another version

My intro to partial evaluation treats the same two examples in Scheme, of course without the typechecking, which is the whole point of the above paper. They seem to be doing about the same thing, with the abstract code and value types as a way of lifting information out of the opaque code type.

### They are not the same. Difficulties of the FFT generation

Darius Bacon wrote:
My intro to partial evaluation treats the same two examples in Scheme... They seem to be doing about the same thing...
Despite a superficial similarity of the two approaches, there is a significant difference, besides the usual difference between partial evaluation and multi-stage programming.

One approach to code generation is to make the generator simple and optimize the generated code later. Your introduction to PE follows this approach: a straightforward generator based on the FFT recurrence, followed by a _second_ pass that does algebraic simplifications (like 1*a = a) and common-subexpression elimination (CSE). We, on the other hand, generate code in one pass. The result is already optimal and does _not_ need CSE or any other optimization pass.

Why are we so uneasy about manipulating the generated code? Because doing so significantly weakens the equational theory, as Walid proved. Currently, we maintain the invariant that the generated code is _automatically_ strongly typed and does not need to be typechecked. That significantly simplifies the work of code generators (e.g., OCaml-to-C or OCaml-to-Fortran; perhaps others will follow). Post-processing of the generated code makes such invariants significantly harder to maintain.

Let's take an example. The "intro to partial evaluation" describes CSE that works by memoizing code fragments, which implies comparing code fragments for equality. How does one compare two, usually open, code fragments for equality? Extensional comparison is undecidable; intensional comparison is either unsound or insufficient. For example, are (+ x1 x2) and (+ x1 x2) the same? First of all, it depends on whether x1 and x2 refer to the same identifiers in both fragments. If the source language has binding forms with shadowing, that fact is not given. Then we have to ask whether x1 and x2 _always_ refer to the same _values_. In a language with mutation, such a property is far harder to determine. The "intro to partial evaluation" did not seem to include either binding forms or mutation in the source language.

Furthermore, are (+ x100 x101) and (+ x1002 x1003) equal pieces of code? That again depends on what '+', x100, etc. mean. Incidentally, a purely structural comparison of these code fragments would say they are not equal, even if it turns out that x1002 and x100 are the same identifier, etc. What about (* a (+ b c)) and (+ (* a b) (* a c))? These two fragments are structurally unequal -- yet they essentially are common sub-expressions. Compilers therefore apply CSE, algebraic simplification, inlining, and copy propagation several times, until a fixpoint is reached. To achieve good quality in the resulting code, we would therefore have to implement a good part of an optimizing compiler! It is also far harder to assure the correctness of all these transformations.
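The distributivity example can be checked directly. This tiny OCaml sketch (the `expr` type is illustrative, not taken from either paper) shows that the structural equality a memoizing CSE relies on fails to identify algebraically equal fragments:

```ocaml
(* A tiny expression type, just to make the point concrete. *)
type expr = Var of string | Add of expr * expr | Mul of expr * expr

(* a * (b + c) *)
let lhs = Mul (Var "a", Add (Var "b", Var "c"))

(* a*b + a*c : algebraically the same value, structurally different *)
let rhs = Add (Mul (Var "a", Var "b"), Mul (Var "a", Var "c"))

(* Memoization-based CSE compares structurally, so it treats these
   as unrelated fragments. *)
let () = assert (lhs <> rhs)
```

Identifying the two requires algebraic rewriting, not mere memoization, which is exactly where the optimization-pass approach starts to grow into a compiler.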

Our approach is more algorithmic -- it really forces the user to understand the algorithm in order to generate the optimal code. With regard to FFT: if you try to generate FFT code in a straightforward way and then apply various simplifications, such as

• noting that 1*x = x and 0*x = 0
• noting that sin(pi/4) and cos(pi/4) are the same, so complex multiplications by the 8th roots of unity can be done more simply (note that the _computed_ sin(pi/4) and cos(pi/4) are _not_ actually identical)
• making all numerical constants positive
• CSE

you still end up with code that has more FP operations than FFTW. To achieve the operation count of FFTW, one has to do a significantly harder job. It cost me two nights to figure out why my FFT code for N=16 had four more multiplications than the FFTW code.

I should specifically point out this phrase from the paper: "there are benefits to focusing on writing better generators rather than on fixing the results of simple generators."

Regarding the CPS of your evaluator: that is indeed quite a useful technique, which was first described in Anders Bondorf, "Improving Binding Times without Explicit CPS-Conversion", LFP'92.
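For readers unfamiliar with the technique: writing the generator in CPS makes it possible to insert a let-binding into the generated code at the point where a compound expression is built, so later uses share one binding instead of duplicating code. A minimal OCaml sketch, with strings again standing in for code values (`gen_let` and `square` are my names, not Bondorf's):

```ocaml
(* Fresh-name supply for generated let-bindings. *)
let counter = ref 0
let gensym () = incr counter; Printf.sprintf "t%d" !counter

(* gen_let e k: emit "let v = e in ..." and pass the fresh variable v
   to the continuation k, which builds the rest of the generated code. *)
let gen_let (e : string) (k : string -> string) : string =
  let v = gensym () in
  Printf.sprintf "let %s = %s in\n%s" v e (k v)

(* Squaring an expression: without let-insertion the argument would be
   duplicated in the output; with it, it is computed once and shared. *)
let square (e : string) : string =
  gen_let e (fun v -> Printf.sprintf "%s * %s" v v)

(* square "f x" generates:
     let t1 = f x in
     t1 * t1
   rather than the duplicated "(f x) * (f x)". *)
```

Because the continuation runs under the binding, the generator decides where sharing happens, which is also how code duplication is avoided in one pass.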

### More references

I'd like to point out a few related papers for completeness.

First of all, the paper "Relating FFTW and Split-Radix" (see the LtU discussion) shows that it is possible to exactly match FFTW in floating-point operation count using staging with abstract interpretation. Incidentally, the current paper, "A Methodology for Generating Verified Combinatorial Circuits", is not obsolete, as it provides the most detailed explanation and illustration of our abstract interpretation in the staging domain. The "staging monad" used in this paper and in "Relating FFTW..." is explained in detail in the draft "Staging Dynamic Programming Algorithms".