## what's a useful definition of equivalent functions?

Equivalence of functions is, in general, undecidable. But specific instances of it are usually decidable. Equivalence can be proven via structure analysis, or disproven via structure analysis or by example. Equality can be proven on functions whose input is limited to a finite set simply by exhaustive testing of all possible inputs. So, while it's true that there will always be an infinite set of pairs of functions for which we cannot prove equivalence or nonequivalence, it isn't really the case that we can't ever prove anything.

But what notion of 'equivalence' is useful?

What if we have two functions that are provably equivalent where their results are defined, but result in errors in different cases, or result in *different* errors in some cases? If I can prove, in the context of a particular program, that no possible call to a function will be a member of either error-causing set, then the functions are equivalent ... for purposes of that program. ie, they are observationally equivalent for the particular subset of valid inputs that we're interested in.

What if we have two functions that do equivalent things to members of slightly different types? For example, if I have two string-concatenation functions, but one of them concatenates a string class on which fifteen other operations are defined and one of them concatenates a string class on which thirteen of the _same_ other operations and three _different_ other operations are defined? These concatenation functions are not exactly mathematically equivalent, but in some very important way they are exactly the same, and may even be expressed by exactly the same source code. If a mathematician said these functions weren't equivalent, as an engineer I would roll my eyes and ignore him. If my job required me to satisfy his useless notion of equivalence, I would *make* the functions equivalent by just defining the 'missing' operations in both classes.

What about having 'equivalent' multiplication functions on, say, ints and rationals? If I posit that there is a multiplication function on a 'number' type that both of the other functions are subsets of (that is, they give equal results on equal inputs, for all inputs they can be given), is that a useful notion of equivalence?

What if we have two functions that return exactly the same results in all cases, and we can prove it ... but one of them takes N-squared memory and NlogN runtime, and the other takes N-squared runtime and NlogN memory? As an engineer I'd find this semantic equivalence and resource-requirement nonequivalence useful and swap one for the other depending on whether speed or memory use were more important. As a mathematician, I don't even know whether I'd be working with a notion of equivalence that would note the difference.

In the end, 'equivalence' is any relation that's transitive, symmetric, and reflexive. There's an infinite number of choices of such relations, and people who talk about function equivalence often do so without saying which equivalence they mean. Or, importantly, which equivalence is most useful in some relevant context.

And while you're chewing on the question of what 'equivalence' is most useful in a given context, what notion of proof is most useful to establish it?

What if we have a pair of functions which has been observed to return the same answer for the same inputs on a randomly-selected billion possible inputs ... and monitoring both has revealed that we have achieved 100% code coverage in both functions while doing so ... and other monitoring, plus a few million not-so-randomly-selected inputs, has assured that all possible combinations of taken and not-taken conditional branches in both functions have been exercised, on inputs that differ from each other by the theoretical minimum possible for that input type? As a mathematician, I couldn't claim that I know these functions are equivalent. As an engineer, I'd consider the indications of function equality to be strong enough that I wouldn't mind the absence of a proof. I'd feel comfortable swapping one for the other, especially if they are both relatively short.

Why short? Shortness limits the possible semantic complexity to some wobbly notion of a boundary, making it less likely that one or both contains subtleties not revealed by the extended fuzz-test. If one or both of the functions were twice as long, I'd want more fuzz-testing before I'd feel as comfortable swapping them. This corresponds to some intuition about reaching a particular certainty level, but that certainty level, no matter how you slice it, is less than an absolute proof.

So, here's a question. Is it useful for programmers to go with some 'heuristic' of function equality, or an engineering-grade 'indication' of function equality, if no mathematically certain proof is available and by doing so they can achieve measurably better results? Is it worthwhile for PL theorists and researchers to seek and refine such 'heuristics' and ways of estimating the strength of such 'indications' (rather than 'proofs') of function equality?

In mechanical engineering, strength of materials is always an estimate. In programming, sometimes the strength of our knowledge is also an estimate. This is a problem, but not necessarily a disaster, in terms of producing useful results. Mechanical engineers compensate by including a 'safety margin' well beyond the requirements of the particular job. How should programmers compensate?

## Comment viewing options

### Multiplication Example

The example of multiplication over integers and rationals "feels" somewhat unnatural. The subset relationship to a "number" type seems like a more natural relation between the two multiplication functions. To get the right properties for equivalence you're restricting the meaning to "equivalent over the intersection of domains", but one domain is already a strict subset of the other. In that case an order relation would seem to be more natural - and then there is a useful result which is that one can be substituted for the other as an approximation.

### True, all integers are also rational.

In every popular numeric representation format implemented on computers, all representable numbers are members of a subset of the rationals - so integers are always something of a subset in the common cases, but of course every fixed-width format has a different range, and therefore multiplication on particular fixed-width formats have different domains in the less-common cases. In integer and floating-point representations of equal bit width, there are always integers representable in each form that are not representable in the other.

The only reason I didn't say floating point is that floating point multiplication is subject to rounding in the usual case, hence does NOT give a result congruent with multiplication on the theoretical 'Number' type. On the other hand, when integer values are represented in floating point formats, multiplication produces results not subject to rounding (in the most common cases). So those two multiplication functions (integer and floating point) are still a subset relation over the common cases, ie, over the domain of integers whose product can be exactly represented in both formats.

But in other domains, of course, not so much.... and it comes down to whether that's a useful notion of 'equivalence' yet again.

### Domain

You can certainly make a case for equivalence within a domain.

Functions, in the mathematical sense, are not algorithms. Algorithms have a lot of properties that are not relevant to functional equivalence - e.g. performance characteristics, or suitability for implementation on FPGA or GPU, or modularity and extensibility. As engineers or developers, we are concerned about these properties (cf. isomorphism is not enough).

I think a 'heuristic' equality might be sufficient for some purposes (such as resource discovery and linking), but I think I wouldn't want my optimizer to use it. Which mechanism of 'proof' is effective for establishing equivalence is similarly contextual (since proof itself is a computation with various modularity and performance concerns) - e.g. a runtime optimizer should probably be operating with a very narrow view and simple rewrite rules, but a separated compile-time optimizer might accept full equality saturation.

### clarification of terms?

I might have something to say if you're talking mainly about equivalent code routines, but probably not if you primarily mean a math sense instead.

I gather you mean at least two different senses when you say function, one from math and one from programming languages, and that you also intend the math sense to inform the PL sense during analysis of equivalence. If we used namespaces in everyday language the way we do in programming, we might say math::function for the former and PL::function for the latter.

For folks who have no idea what I'm talking about, contrast Wikipedia pages for Function (mathematics) and Function (computer science). A function in math means a relation consisting of a set of pairs (d, r) for d members of domain D and r members of range R where a given d appears at most once in the set of pairs in that relation. Whereas a function in code means a procedure or subroutine with executable behavior causing side effects on bits in memory or CPU registers.

When you mean the PL sense, routine might be safely distinct in a way that cannot mean the math sense. But I don't know a word for function that means only the math sense, and not a CS or PL sense too. (Wolfram suggests map as a synonym for function, but that's easy to confuse with a collection class in a PL library.)

I'm really interested in equivalent code routines. :-) What I've been doing the last dozen years is mainly optimizing results via equivalent code with the same output but different internal organization. In fact, my whole idea of testing involves checking for empirical equivalence of one thing and another under satisfactorily thorough scope of conditions and cases, etc. I can say more about it after I figure out whether it's on topic or not (if it matches what you want).

[Edit: you mean transitive, symmetric, and reflexive.]

### I mean useful equivalence for programming purposes.

My observation here is that 'equality' is a wobbly concept, and there are many notions of equality are not the notions of equality that are meaningful in writing programs.

Specifically, I was thinking of a recent discussion here where someone made a bold claim about being able to prove 'observational equality' of functions and it turned out not to be any kind of equality that I'd find useful as a programmer.

My major project (in which, darnit, I've just discovered a show-stopping bug today that will require significant redesign) is about a general (executable) representation for the operational semantics of programs written in many languages, creating a basis for interoperation in a common runtime -- with at least one reasonably high-level language in which any such program can be rendered 'idiomatically' with roundtripping capability, creating a basis for the maintenance and debugging of such programs on an ongoing basis.

One of the things that happens when you put routines initially defined in different languages together in the same program is that these routines have dependencies on other routines, usually library routines, which exist in one version per runtime and are often 'equivalent' in some significant sense, modulo specific details of types, argument ordering, different calling disciplines of different languages, etc. And in cases where the redundancy can be detected, I would like the program to rely on only one of these 'redundant' routines if possible, eliminating redundant infrastructure.

I'm using the words 'function' and 'routine' in fairly specific ways; a 'function' returns a (tuple of) return values given a (tuple of) inputs, does no I/O, has no effect on any externally visible state, and if there is any internal state that persists between calls it doesn't affect the relation of return values to input values. I consider functions to be equivalent when they describe the same mapping, but they are often (usefully) nonequivalent in the resource requirements required to compute the mapping, and I am often interested in only a subset of the mapping.

It's a notion of 'function' that differs from the mathematical notion only by the requirement of being executable and by having some observable set of resource requirements and dependencies.

A 'routine' is any executable (callable) subprogram, including but not limited to functions. Routines can have side effects.

Anyway, I have this idea that a database of function equivalencies can be built over time, identifying, eg, commonalities in the runtime libraries (or idioms) of many languages. Such data can also be useful in other ways, as a general resource for analysis of function equivalence in other contexts and as a resource to help 'clean up' discovered code that ought to be using library calls but isn't, or to 'optimize' code that has an alternate form more efficient in the current context.

### project sounds useful

Nice writeup (excellent clarity). Ideas I had in mind are only slightly related, with a focus on taking advantage of equivalence to get some kind of extra (or different) effect while preserving old requirements. It looks like you want to discover equivalence automatically, knowing it's often there already in multiple versions of similar code, maybe with or without human guidance. That sounds not-quite-strong-AI hard, when it also involves coping with loose-ends around intended semantics not explicitly stated anywhere in code or meta-info.

(One of my coworkers left a few months back, and part of my daily experience involves trying to figure out what some of his code does, was intended to do, is required to do, etc, in the absence of docs, comments, abstraction, tests, samples, or even locality of effect in some cases. It's as humbling as it is frustrating, making me sensitive to issues of perceiving latent purpose and intended consequence of code side effects. I could never write code to discover automatically what I can't figure out manually.)

Some code transformations can be done without understanding how it works. On a new code base I usually start with that. I can say, "Whatever it did before, this does the same thing faster." Afterward folks assume I understand the code, and get surprised when I say I have no idea how it works, that all I did was simplify in a way preserving invariants. Here's an example. A few years ago I found a sub-graph of functions and object fields that had no outputs to the rest of the code graph—they formed a sink with no effect on the world, so I deleted them all as a no-op. Understanding any past purpose was unnecessary, since they had no real effect. Finding this requires assuming purpose of code has an extensional role you can deduce from looking at all locations used and seeing its effect in context, like solving for N unknowns in M equations of code. (This works poorly when N is much bigger than M; how in the world do folks write code like that? Just a few comments would change the balance of unknowns.)

Anyway, I have this idea that a database of function equivalencies can be built over time, identifying, eg, commonalities in the runtime libraries (or idioms) of many languages. Such data can also be useful in other ways, as a general resource for analysis of function equivalence in other contexts and as a resource to help 'clean up' discovered code that ought to be using library calls but isn't, or to 'optimize' code that has an alternate form more efficient in the current context.

That sounds useful, especially if it helps resolve conflicts between sub-libraries that want to work with variants of standard routines just different enough to fail for one party when they start sharing a common shared version.

If libraries use common symbols in a way incompatible with each other, it would be nice if a tool like yours could statically figure that out. Here's an example involving the perils of sharing standard symbols in C headers, which is basically an unhygienic macro problem:

I wrote a library with hundreds of asserts inside, to verify pre- and post-conditions, but a couple years ago my host build environment redefined assert() in the standard C headers to do nothing at all. (The overt aim was to prevent asserts in production code.) As a side effect, my project ran tests for months with asserts undetectably doing nothing whatsoever, with the effect of subtly invalidating all tests when invariant maintenance was checked no more strongly than "seems to work." I had to change all my asserts to a private version that did something, after getting over the shakes.

In a future library I plan to use only locally defined symbols, with those provided by a host environment effectively occurring in a module that requires explicit mapping of local to global environment symbols. This makes it possible to specify what is required of symbols provided by an environment, as an informal contract, with a violate-at-your-own-peril clause understood. But I don't know an executable way to specify expectations of the sort your tool would want to know about when analyzing code.

Your tool might have trouble finding non-equivalence without actually calling routines involved, especially if a bug is present. (Behavior just as subtle might occur for correct code, and not just a bug.) For example, suppose you write a hashmap with a subtle bug inside that causes a member removed earlier to "come back" under some conditions, because the bits involved were never cleared. To detect this, in a test you can pseudo-randomly poll for deleted collection elements, to assert they never return unless re-added. To infer such semantics with a static analysis tool seems like it would be hard.

Before I wrap up, here's one last note on use of equivalence to check invariants that ought to be true, but might fail due to incorrect code elsewhere, such as memory stompers. When an instance of data (object or struct or whatever) is larger than just a few bytes, and I can squeeze in slightly more state without materially affecting costs, there are two pieces of constant data I want inside each instance, which never change during an object's lifetime. One amounts to a magic signature dynamic type tag, typically at least one byte in size, but as large as four bytes in big or rare objects. The other is a generation number, typically eight or sixteen bits in size, chosen pseudo-randomly at construction time but constant henceforth.

In methods for an object, if you assert the tag is always correct, this will sometimes fail for several kinds of reason, all of which mean the runtime is invalid or corrupt, such as using an object after it has been destroyed. The main reason to do this is to increase sensitivity to bad software, to fail faster. It's cheap to check, because looking at a field has low cost compared to a cache line miss, and cache line misses will dominate latency using lots of objects, rather than seeing if they are corrupt. In a language like C, after you catch a few horrific causes of such failures, the value of checking gets more apparent.

In contrast, the generation number field is not a self-check mechanism; it's to permit other objects to detect stale references when they believe your lifetime must be the same as it was earlier, when they captured (for example) a pointer. The idea is to prefer pointer+gen as a reference instead of just pointer alone, because this allows you to assert the gen has not changed when you access the object to which a reference points. For example, if you use refcounting, a copy of the generation number should also be used to validate refs have been done correctly, when incorrect refcounting will typically cause gen to change before it was due.

When software is correct, checking and not-checking invariants for tag and gen are equivalent, since the same thing happens, modulo cost to look at fields already in a cache line you had to access anyway. But when software is not correct — and it usually isn't at first — checking tag and gen will catch very serious problems and save a lot of development time, because it amounts to fine-grained search for signs of garbled memory state. In theory it makes no difference (if you're an optimist), but in practice it does. Turning this off by compiler switch in production code is typically unnecessary, because it won't make any difference in performance when checks don't occur in hot spots.

As you can see, that was only tangentially related to goals you have in mind.

### Tangential, but interesting.

It is actually fairly common behavior for information used to check the integrity of the codebase to have no effect in the case of correct code. That is the basic idea behind 'assert' and also the basic property of type declarations in languages that have the 'type erasure' property.

If you can really get function equivalence testing to work, you can show that such code is equivalent to code that has no such infrastructure iff you can show that the case that the infrastructure checks for never actually happens. I don't know if it can ever get that good, but that's an ideal.

It is also more common than you'd guess for asserts to actually speed the code up. A compiler that's aware of the semantics of 'assert' can treat it as a control structure for purposes of creating proofs of certain properties which obtain in given regions of the code. Thus, if someone asserts in the 'store' routine that all values to be stored in a table are between 0 and 30000, for example, a compiler can therefore prove that the code downstream from the assert statement is never executed on values outside that range, and therefore, if that's the only place that stores anything in the table, construct a proof that it can use a table made of two-byte values directly stored, which can be a huge speedup in a language that otherwise supports arbitrary-precision integers.

Constructing proofs as such is hard. It's not reasonable to expect to be able to do much more than current optimizing compilers do during compile time.

But once the theorem proving has been done, it's not too unreasonable to expect the compiler to patternmatch against larger patterns that have already been shown to be equivalent. So I'm definitely envisioning a tool more driven by a huge distributed database of code transformations than a tool that carries out all possible proofs of code equivalence at compile time. You'd first optimize things as much as you can locally and into as uniform a structure as you could get them locally, in order to reduce the vast number of different ways something can appear. Then you'd be applying pattern match to the optimized forms. The theorem proving work done (or perhaps the very strong indications gathered from directed testing, code coverage monitoring, and fuzz-testing) would accumulate in the database, rather than being done for every program every time. So, perhaps, you would consider it as an application of 'big data' to code optimization.

### workable plan

I get it: you'll focus on curating a large collection of known patterns, organized in canonical form, so they can be quickly and easily matched against new code arranged in the same canonical way. That should work, it's a good plan. Pattern matching turns into a numbers game very nicely. You want to deduplicate common code patterns.

So, perhaps, you would consider it as an application of 'big data' to code optimization.

Intuitively it sounds well-defined. I might expect a snag at the lowest possible level, where it might be hard to rise from specific detail to useful general pattern. Another possible problem is hard for me to phrase: common parts might be mixed into larger variation, making them hard to isolate and thus identify, as opposed to having boundaries that correspond exactly to entry points with names. Finding edges to start searches in a formless soup of stuff is one of the problems in dedup generally. Refining your edge-detection scheme might be an ongoing task.

### Mathematical functions.

A function in math means a relation consisting of a set of pairs (d, r) for d members of domain D and r members of range R where a given d appears at most once in the set of pairs in that relation.

You're note the only one in this thread to say that the set theoretic function is the definition of mathematicial function. Just to balance things out, here's a quote from Barengdregt:

The lambda calculus is a type free theory about functions as rules, rather than as graphs. "Functions as rules" is the old fashioned notion of function and refers to the process of going from argument to value, a process coded by a definition. The idea, usually attributed to Dirichlet, that functions could also be considered as graphs, that is as sets of pairs of argument and value, was an important mathematical contribution. Nevertheless, the lambda calculus regards functions again as rules in order to stress their computational aspects.

Just wanted to put that in the thread somewhere.

### a riff on definitions

I like your point, that things can have multiple definitions, letting us try on more than one, when thinking, until something interesting happens. It's a good way to get mileage out of the human mind. Citing one definition as the correct one can encourage folks to think only inside the lines, which isn't helpful. Okay, maybe what I said has that effect, which is one I don't like; I favor more fluid/creative perspective when I can.

When I had discrete math in college a lot of years ago, I thought the set theoretic definition of functions was charmingly reductionist. It's often convenient in computing when things break down into discrete cases. But the narrowness of it can hamper imagination. A rule-oriented way to define functions can be more organic, and maybe ambiguous when meaning of language is ambiguous. You can err on the side of stating something in wider terms than meant; but this is good when you want to search widely enough to consider something, even if you disallow it later in tighter analysis.

In effect I offered two definitions sounding really different, to demonstrate there's a lot of ground for miscommunication. People are good at finding similarity in things that aren't identical, especially when using words with overlapping meaning, but this can accidentally cause discussion that is void when precision is absent.

In computing I usually have the perspective that it's just about bits moving around. (One of my coworkers recently teased me by countering that reality is mostly about atoms and other particles moving around.) Higher level meanings are just simulations occuring in a logical phase space simulated by a lower physical phase space. I find it hard to take higher level logical simulations as "real" in a way that cannot fail, when in fact they are known to fail; when you can establish failures have not happened (with reasonable bounds of probability) then it's safe to act as though they are real from a pragmatic view. So I discourage starting from assumptions that platonic mathematical relationships are atomic when they are constructed from simulations, unless the failure rate gets calipered in some way. When someone says, "Let's proceed as if my assumptions are always given as true," it just makes me want to ask, "Why?"

When functions are defined as rules, which lets you use natural language, you can slip in a magic-goes-here part in the middle of an algorithm that isn't supported. When using set theoretic definitions, it seems easier to address discrete cases that represent exceptions to a desired result. But I get how this amounts to being a pessimistic wet blanket.

### Co-induction and Bisimilarity

One notion of program equivalence is bi-similarity, which can often be proved using co-induction. Some interesting implementations that I've stumbled on are CIRC (circular co-induction) and PACO (parameterised co-induction).

### Coinduction is an interesting new technique to me.

I hadn't been thinking of bisimulation as restricted to state machines (which is how the wikipedia article describes it).

Bisimulation, to me, is the technique of giving *both* of two possibly-identical functions the same inputs to see if they return the same results. That said, I suppose the 'state machine' formulation does say everything that's really important about the general case, if the sequence of inputs is considered as 'argument' and the sequence of transitions is considered as 'result'.

Coinduction is ... hmmm. Something I need to wrap my brain around. In reading the things you've linked, it seems like a description of a technique for finding fixpoints in the interaction of mutually recursive routines. The implication being that if two sets of mutually recursive routines induce fixpoints having the same behavior, then these systems of routines can be considered equivalent?

### Coinduction is ... hmmm.

Coinduction is ... hmmm. Something I need to wrap my brain around. In reading the things you've linked, it seems like a description of a technique for finding fixpoints in the interaction of mutually recursive routines. The implication being that if two sets of mutually recursive routines induce fixpoints having the same behavior, then these systems of routines can be considered equivalent?

As far as I understand it, that's pretty much correct.

The inductive hypothesis lets us prove things about recursive functions where the recursive call is given a smaller argument. In other words, we can recurse as long as we peel a constructor off our argument:

foo z = True
foo (s n) = foo n


The co-inductive hypothesis lets us prove things about recursive functions where the recursive call provides a smaller return value. In other words, we're allowed to recurse as long as we wrap the result in a constructor:

bar = s (bar)

baz b = if b
then z
else s (baz b)


The advantage to co-induction is that we can reason about infinite programs (like bar), by reasoning about one constructor at a time. This also doesn't require a base case. For example, we can't use induction to compare "bar" and "baz False", since there's no base case. However, we can use co-induction to compare them:

"bar" and "baz False" return equivalent values if the outer-most constructors of their return values are the same, and if the expressions inside the constructors give equivalent return values. The outer-most constructors are both "s", and the expressions inside the constructors are "bar" and "baz False", which we've just shown return equivalent values. Hence "bar" and "baz False" are equivalent.

Note that this only works if we require the recursive call to be wrapped in a constructor. Also note the similarity to state machines and bisimilarity: the constructors act like outputs and the recursive expressions act like states.