functional equivalence?

reading up on the ADT vs. OO debate, i came across an old comment that "function equality is actually decidable" if only some programming language would actually implement that particular approach. anybody understand that or know of any languages which might support it?

Related

Observational Equality, Now!

I'd need to brush up on my jargon to grasp the page you linked.

that certainly sounds like a relevant advancement. happy to hear some smart folks are working on it :-)

Explanation?

If you (or any other readers) have dug out the cited paper and read it, could you please post a quick explanation of how functional equivalence is defined, and how that makes it decidable?

ditto

yeah, well, don't expect it from me any time soon. if somebody does have the clue, maybe post it back on that original post's page?

Does "the cited paper" refer

Does "the cited paper" refer to the original post's or David's?

RE: Quick Explanation

Functional equivalence means having the same outputs for a given set of inputs. This is, in general, undecidable: in a Turing-complete language, the Halting Problem reduces easily to it.

But neither of the papers assumes a Turing complete language: according to Frank's comment, repeated by Matt below, the language discussed in the paper cited by Frank (which I have not read) is the simply-typed lambda calculus, for which all programs halt and few programs are practical. The Observational Equality Now paper studies an 'Observational Type Theory' with the goal of achieving equational reasoning without sacrificing too much flexibility or practicality.

Observational equality for even an incomplete subset of a language, protected by effect-typing or a syntactic barrier, can be entirely useful. For example, functional equivalence allows one to compute such properties as associativity and commutativity, which in turn allow some powerful optimizations (partial-evaluations, reducing dependencies for data-parallelization). It also supports symbolic-level language proofs (high-level assertions with variables) and so on.
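To illustrate why totality and finite domains make the question tractable, here is a minimal Python sketch (all names hypothetical, not from either paper): once functions are total and the domain is finite, extensional equality and properties like commutativity reduce to exhaustive checks.

```python
# Hypothetical sketch: over a finite domain, extensional equality of total
# functions is decidable by simply checking every input.
BOOLS = (False, True)

def ext_equal(f, g, domain=BOOLS):
    """Decide f == g extensionally by checking every point of the domain."""
    return all(f(x) == g(x) for x in domain)

def commutative(op, domain=BOOLS):
    """Decide commutativity of a binary operation by exhaustive checking."""
    return all(op(x, y) == op(y, x) for x in domain for y in domain)

nand = lambda x, y: not (x and y)
implies = lambda x, y: (not x) or y

assert ext_equal(lambda x: x, lambda x: not (not x))  # double negation
assert commutative(nand)
assert not commutative(implies)
```

The same idea is what enables the optimizations mentioned above: a compiler that can decide commutativity of an operator over its (finite or normalizing) domain may reorder operands freely.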

This paper...

...doesn't have anything to do with the subject of the thread. It's about how to structure proofs of equality in dependent type theory.

What Frank was talking about was "coproduct equality", which is the βη equality for the simply typed lambda calculus with sum types. That this is decidable seems obvious at first. Then, when you try to just sit down and do it, you find everything almost works and nothing actually works.

However, it actually is doable, and IMO the natural proof requires importing some ideas from algebraic topology -- basically whenever you see an expression of the form case(e, x.e1, y.e2), you want your algorithm to split on postulating whether e is inl x or inr y, and then put the results of the two trials back together. This is somewhat reminiscent of the notion of a cover in topology, and indeed can be formalized using Grothendieck topologies.

There's also a very devious and clever trick invented by (I think) Sam Lindley, which lets you avoid the use of Grothendieck topologies by means of strategic CPS-conversion. I can understand his proof (his thesis is very clear) but I am unclear why this works. (This is especially embarrassing since I've used cousins of this trick myself....)

Abstract of the paper Frank cites

Interpreting η-conversion as an expansion rule in the simply-typed λ-calculus maintains the confluence of reduction in a richer type structure. This use of expansions is supported by categorical models of reduction, where β-contraction, as the local counit, and η-expansion, as the local unit, are linked by local triangle laws. The latter form reduction loops, but strong normalization (to the long βη-normal forms) can be recovered by 'cutting' the loops.

So the context is the simply-typed lambda calculus - not too surprising.
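A minimal sketch of a decision procedure in this setting (Python, de Bruijn indices; all names hypothetical): β-normalize both terms, η-contract the results, and compare syntactically. This assumes terms normalize (as they do in the simply-typed calculus) and omits sum types, which are exactly the hard case Neel describes above.

```python
# Terms in de Bruijn notation:
#   ('var', i), ('lam', body), ('app', f, a)
# Assumes strongly normalizing terms (e.g. simply-typed); no sum types.

def shift(t, d, cutoff=0):
    """Shift free variables >= cutoff by d."""
    tag = t[0]
    if tag == 'var':
        return ('var', t[1] + d) if t[1] >= cutoff else t
    if tag == 'lam':
        return ('lam', shift(t[1], d, cutoff + 1))
    return ('app', shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, j, s):
    """Substitute s for variable j in t."""
    tag = t[0]
    if tag == 'var':
        return s if t[1] == j else t
    if tag == 'lam':
        return ('lam', subst(t[1], j + 1, shift(s, 1)))
    return ('app', subst(t[1], j, s), subst(t[2], j, s))

def beta_normal(t):
    tag = t[0]
    if tag == 'var':
        return t
    if tag == 'lam':
        return ('lam', beta_normal(t[1]))
    f, a = beta_normal(t[1]), beta_normal(t[2])
    if f[0] == 'lam':
        # (\x. body) a  -->  body[x := a]
        return beta_normal(shift(subst(f[1], 0, shift(a, 1)), -1))
    return ('app', f, a)

def free_in(t, j):
    tag = t[0]
    if tag == 'var':
        return t[1] == j
    if tag == 'lam':
        return free_in(t[1], j + 1)
    return free_in(t[1], j) or free_in(t[2], j)

def eta_contract(t):
    tag = t[0]
    if tag == 'lam':
        body = eta_contract(t[1])
        # \x. f x  -->  f, when x is not free in f
        if body[0] == 'app' and body[2] == ('var', 0) and not free_in(body[1], 0):
            return shift(body[1], -1)
        return ('lam', body)
    if tag == 'app':
        return ('app', eta_contract(t[1]), eta_contract(t[2]))
    return t

def beta_eta_equal(s, t):
    return eta_contract(beta_normal(s)) == eta_contract(beta_normal(t))

I = ('lam', ('var', 0))
# \f. \x. f x  is eta-equal to  \f. f
assert beta_eta_equal(('lam', ('lam', ('app', ('var', 1), ('var', 0)))), I)
# (\x. x) I is beta-equal to I
assert beta_eta_equal(('app', I, I), I)
```

With sum types, syntactic comparison of normal forms no longer suffices, which is where the case-splitting/cover machinery (or Lindley's CPS trick) comes in.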

Doubts

The choice of sets of integers in W. Cook's latest paper is not very convincing. Sets are special: a characteristic function is defined on a set. What is the interpretation for other kinds of objects, e.g. free monoids (aka strings of characters), relations, etc.? Or maybe objects always assume a set structure, so that instead of monoids we study semirings (sets of strings), relations as sets of tuples, and so on?
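Cook's running example can be sketched concretely; the following Python is a hypothetical illustration, not code from the paper. The ADT fixes one representation with operations defined outside it; the procedural abstraction identifies a set with its characteristic function, so new shapes of set (even infinite ones) compose with existing code.

```python
# Hypothetical sketch of Cook's integer-set example.

# ADT style: a single concrete representation, operations outside it.
def adt_empty():
    return frozenset()

def adt_insert(s, x):
    return s | {x}

def adt_contains(s, x):
    return x in s

# Procedural/object style: a set *is* its characteristic function, so new
# kinds of set can be added without touching existing code.
def obj_empty():
    return lambda x: False

def obj_insert(s, x):
    return lambda y: y == x or s(y)

def obj_evens():
    # an infinite set -- inexpressible as a finite frozenset
    return lambda x: x % 2 == 0

a = adt_insert(adt_empty(), 3)
o = obj_insert(obj_empty(), 3)
assert adt_contains(a, 3) and not adt_contains(a, 4)
assert o(3) and not o(4)
assert obj_evens()(10)
```

The doubt raised above is visible here too: the characteristic-function reading is natural for sets, but it is less obvious what the analogous "behavioral" representation is for monoids or relations.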

Frank's comments -- Not Even Wrong

My most serious complaint about PDA's will sound hollow to you: the abstraction relies on the fact that functions are opaque and, contrary to popular opinion, function equality is actually decidable in lambda-calculi when evaluation accounts for the eta-rule. See, for example,

    C. Barry Jay and Neil Ghani. The Virtues of Eta-expansion. JFP 95.

No extant language actually implements such an evaluation scheme, but that is a defect of extant languages; PDA's seem to depend on this defect, which I think is an undesirable situation.

OTOH, I think it is not sensible to ask a language supporting the sort of subtyping you want to admit the eta-laws since they say basically that there are no `hidden constructors' in a type. For example, the eta-law for function types says that every value is constructible via lambda-abstraction. One of the ideas of OO is precisely that the set of constructors is extensible and not fixed. This is a methodological issue; I prefer systems where both beta- and eta-laws hold, or are at least admissible, because I like the simplicity of equational reasoning.

The basic issue here is that sometimes it is convenient not to assume you can inspect functions. That the abstraction relies on functions being opaque is a "feature" we call "flexibility". We can then use it for powerful mathematical models, such as authority based on trust, or implementing uniform access rights with a simple mechanism.

I'm not sure why Frank is so hell-bent on arguing against objects as a feasible data abstraction technique. It is entirely possible for an object to wrap an algebraic data type. The difference is in how knowledge is spread across the system. With an object, it is centralized, because of the argument that a data structure should know how to interpret operations on it. This centralization is a "feature" that also allows objects to be used as a uniform distribution mechanism -- Alan Kay covers this quite eloquently in his 1997 OOPSLA speech The Computer Revolution Hasn't Happened Yet, where he discusses a design for storing records on large tapes that had to be distributed to many Air Force bases.

When you combine flexibility with ease of distribution, you see why ocaps-based languages are a very promising model for secure distributed software systems.

Not Even Even

I'm not sure why Frank is so hell-bent on arguing against objects as a feasible data abstraction technique... The difference is in how knowledge is spread across the system.

Did you actually read that post? Frank very clearly and lucidly articulates exactly what his position is: he wants overt equational reasoning, and he explains in appropriate detail exactly how he thinks objects break that. (Moreover, he does it specifically in the context of the paper being discussed.)

Rereading that after all these years, I was struck by the quality of the post. The writing is clear and well organized; Frank is communicating his position not pontificating. There are a few references but they are very relevant to the subject being discussed; Frank isn't giving us a wild goose chase to papers he just happens to have read that he thinks will "give him cred". And the post sticks closely to the topic of discussion; there is no random insertion of hobby horse issues that don't really have anything to do with the topic.

One might not always agree with Frank's positions, but anyone who wants to post intelligently on LtU would do well to emulate Frank's style.

Re: Not Even Even

That's what I meant by "not even wrong". Everything he is saying is right and correct about how to write programs. Except for the fact that being opaque can be useful.

Tucked into your reply, there seems to be the suggestion that (a) I didn't read Frank's message before writing my own reply (b) I'm not posting intelligently (c) I'm not emulating Frank's style (d) I'm not agreeing with Frank (e) I'm searching for "cred" (f) I randomly insert hobby horse issues to hijack topics.

I'm not seeking an apology, but just the right to defend myself:

(f) is probably the most untrue, since before replying to somebody to clear things up, I check to see if they've provided a personal contact in their LtU profile or if I can find an e-mail address for them via Google. Most LtU users don't do this; I do provide my contact info and get e-mailed about 5 times a month from LtU users.

(d) I actually am agreeing with Frank, up to his point about objects, that "the abstraction relies on the fact that functions are opaque", and his going on to imply that this is a flaw rather than a difference you can use in modeling problems *that may call for such differences*. I am saying quite simply that flexibility comes in very handy. Frank seems to be suggesting Cook's argument for flexibility is something we will never need. I saw Frank cite Barry Jay's paper, but I am curious why he didn't mention Joseph Goguen's work on hidden algebras.

Now, please, respond to the actual argument I laid down, rather than to my statement that Frank was being acute and should see these issues as complementary. Too often, designers see things as either-or. That is rarely the case, and complementary viewpoints help a lot. There have been dozens of discussions on LtU in the past, usually between David Barbour and some type-aholic, about what sort of assumptions we should make about distributed systems.

Even Even

That's what I meant by "not even wrong".

Perhaps you were unaware, but in mathematical circles, "not even wrong" is an inflammatory (and horribly unoriginal) slur. It is much stronger than "I disagree with you on certain points", and much closer to "you are a crank, an idiot, and should go do something with someone's mother".

I just think that Frank's posting style is a good exemplar for LtU, and point out the pitfalls that he avoided. I know I aspire to equal his quality, and perhaps others do too. I leave it to each of us to decide how well we measure up.

I am curious why he didn't mention Joseph Goguen's work on hidden algebras

I know Goguen's work was discussed here at some point, but maybe he hadn't read it at that time, or perhaps he didn't agree with it, or didn't think it was relevant in context. References can be like that. An LtU post is not a PhD dissertation.

There have been dozens of discussions on LtU over in the past, usually between David Barbour and some type-aholic, about what sort of assumptions we should make about distributed systems

Though distributed systems have some interesting implications for PLs (I'm certainly interested in that topic), they aren't the only consideration. It seems pointless to attack an old post (or even a new one) that wasn't discussing distributed systems for not considering the effects for distributed systems of what was being discussed.

Dragging such issues into an unrelated topic might be a good example of a hobby-horse, though. ;-)

Joe Goguen

I'm not aware of LtU discussing much of OBJ or Maude or related work.

In the past, I've searched the archives for names of key people, and didn't turn up anything but a few posts by Joe Kiniry (maintainer of OBJ-3), Jim Apple, Paul Snively, and that's it. Paul has mentioned the CINNI calculus a few times, Joe hasn't really shilled his language and has been mostly humble and discussing his other passions, and Jim's main area of expertise doesn't appear to be term rewriting systems.

Though distributed systems

Though distributed systems have some interesting implications for PLs (I'm certainly interested in that topic), they aren't the only consideration.

True. But they're an important consideration, right up there with concurrency, modularity, and data abstraction.

Dragging such issues into an unrelated topic might be a good example of a hobby-horse, though. ;-)

I never drag such issues into unrelated topics. Any discussion whatsoever on "practical" PLs is related to distributed systems, even if that relationship isn't immediately obvious to a given participant. ;-)

I'm sure Z-Bo read the post

I'm sure Z-Bo read the post, as did I. I disagree that the post is exemplary however. On Frank's six points:

1. The comment about algebraic and coalgebraic data types is irrelevant. Quite a lot of things are more exactly complementary than PDAs and ADTs; that doesn't imply anything about the necessity of either. This seems like a random insertion to me.

2. The claim that existing languages are defective because functions are opaque strikes me as ridiculous. Frank then proceeds to take this unsubstantiated claim of defectiveness and use it to indict PDAs.

3. The entire section about "fold" here is irrelevant and does, at least to me, seem like the insertion of a hobby horse issue. He claims no one would implement an ADT as in the paper, but certainly no one would implement a production library with "fold" as the only available operation.

4. I don't agree with Frank's characterization of the paper. The author is not "resorting" to anything; he is presenting bounded existential quantification as a means of explaining what happens in languages that blur the distinction between ADTs and PDAs. An example of a PDA that does not use existential quantification is present directly above this section. Either Frank has misunderstood the paper or I've misunderstood Frank.

5. Frank claims to be able to implement every PDA as an algebraic data type. This does not seem to take into account basic things like extensibility and other practical concerns.

6. Claiming to prefer algebraic data types to PDAs is silly because algebraic data types do not suffice for all things. Comments like this make me wonder if Frank has ever written a program that uses a GUI toolkit. The following claim makes me wonder if he's even used or written libraries before as he seems to be suggesting you can easily make global source modifications "between runs" for real-world programs:

An object-based implementation of stacks allows many different stack implementations to run simultaneously in one program; an ADT-based one allows only one implementation, but still permits the implementation to be replaced globally in between program runs. I think it is hardly ever the case that one specifically needs the ability to simultaneously support multiple implementations of an interface in the same program run.

Frank then goes on to fail to answer some basic questions that were trying to lead him into admitting the utility of PDAs, claiming he has done some inconclusive research on the topic.

The main thing here is that Frank's desire for overt equational reasoning is irrelevant. I have similar desires, but real-world software engineering concerns have to take precedence. A program with nice mathematical properties that is insufficiently extensible, fragile to future modification, incomprehensible to the programmers you have available, too slow, takes too long to assemble, et cetera, is useless.
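For what it's worth, the stack point can be made concrete. A hypothetical Python sketch: client code written against the interface runs unchanged over two different implementations coexisting in the same program run.

```python
# Hypothetical sketch: two stack implementations behind one interface,
# coexisting in a single program run.

class ListStack:
    def __init__(self):
        self._items = []
    def push(self, x):
        self._items.append(x)
    def pop(self):
        return self._items.pop()
    def empty(self):
        return not self._items

class LinkedStack:
    def __init__(self):
        self._head = None           # chain of (value, rest) pairs
    def push(self, x):
        self._head = (x, self._head)
    def pop(self):
        x, self._head = self._head
        return x
    def empty(self):
        return self._head is None

def drain(stack):
    """Client code written against the interface, not an implementation."""
    out = []
    while not stack.empty():
        out.append(stack.pop())
    return out

for make in (ListStack, LinkedStack):   # both implementations, one run
    s = make()
    s.push(1); s.push(2)
    assert drain(s) == [2, 1]
```

An ADT-style version would fix one representation at module boundaries; swapping it means relinking or editing, i.e. the "between runs" replacement Frank describes.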

Modularity and abstraction

My biggest objection to overt equational reasoning is that it suggests modularity and abstraction automatically decrease auditability, because, by that argument, the only way to have auditability is to make everything completely open for inspection. That's the fallacy in Frank's argument, and probably why no language designer I know of has committed an effort to his suggestion. Joseph Goguen, on the other hand, was an example of an academic with a different way of thinking about the problem.

Put another way, capability-secure systems actually decrease the amount of code that needs auditing. Thus, modularity via procedural data abstraction actually increases auditability. The Principle of Correspondence is a powerful philosophical idea.

There are some things I don't fully like in Cook's paper. I think he could have made a stronger argument in places (e.g., given the things just discussed, how should we think about efficiency? as opposed to: if we're stuck with these two things, which do we pick for efficiency? That's not quite how a compiler writer frames optimization problems [e.g., copy elision to statically remove redundant copy constructors], nor how a system designer should frame design issues), but overall I like it.

I agree completely

If software engineers were to stop using procedural abstraction because it makes certain forms of reasoning difficult, program reliability would decrease rather than increase.

I think it may be worth separating the notions of "auditability" and "provability". A language that offered only simple constructs taken right out of category theory would no doubt have benefits when it comes to equational reasoning and constructing proofs. However, auditing code that was built solely on such components would be an absolute nightmare (if you even managed to get it written in the first place).

Languages with built-in high level notions like "objects" make auditability easier because it means required functionality doesn't have to be encoded in some ugly form (e.g. records of functions with nested type constructors to simulate objects and inheritance -- see Haskell's wxWidgets bindings) or, worse yet, done without completely. It obviously makes sense to avoid language features that are harder to reason about when possible; but, when you need them, you need them.

Mmm...

I want to go even further than that. I read this ICSE'10 draft paper (that I am not sure was accepted) co-authored by somebody who e-mailed me recently. They were doing an assessment of software architectures and looking at the notion of frameworks as software architectures. One of the frameworks they evaluated was ASP.NET. I just found their analysis to be completely handwaving, and my hope is that 10 years from now nobody will ever publish a paper like that again, because we can show a better way to analyze frameworks and systems. Joe Goguen has a great paper called "Tossing Algebraic Flowers Down the Great Divide" where he talks about, among other things, his results from general systems theory and regrets that for some reason his ideas never really caught on in the main stream.

The ICSE'10 paper was really bad, IMHO... it didn't even say what version of ASP.NET they evaluated! ASP.NET 1.0 to 2.0 had a completely different evaluation model - ASP.NET 2.0 actually has enough pragmatics to be a fully self-sustaining system (just not a particularly interesting self-sustaining system, because of its model). Then with ASP.NET MVC, well, some objects don't interact well at all with ASP.NET 1.0-2.0's stream model and that's why in ASP.NET MVC Html.RenderPartial returns void (because they are subverting the Web Forms rendering engine).

Nor was their description of ASP.NET accurate at all from a modularity standpoint. The paper describes the idea of Master Pages as a very modular approach to programming, which is a joke. Master pages tightly couple content belonging to a document body to the header, forcing all pages that use that master page to use the header as well. At best, you can define a hook to replace the header, but now this again limits the expressivity and disallows 'blending' objects in unilaterally composable ways - you're tightly coupled to HTML. And Visual Studio itself is not smart enough out of the box to realize mime-types aren't XHTML Transitional 1.0. It's as if they never programmed a serious application in ASP.NET, but were somehow qualified to give a description of its architecture. And their description was no more than a page! With no math!

I was so enraged that I posted a scathing review of the draft on my blog, but then took it down the next day when I realized I was being rude beyond belief, and my only real interest is in showing practical examples of how to apply PLT to software architecture, especially for interactive systems. I am not really aware of many examples of this today, and most of the ones I do know of have limitations that irritate me.

About a year ago, I also had a really long rant on Jonathan Edwards' blog about Model-View-Controller and various descriptions of it, because somebody made the comment that Jonathan's definition of MVC was wrong. So I pointed out how, if you take the descriptions literally from one paper and try to build a semantic model from them, things either don't make sense or they have weird encoding issues. And you can do this for any paper on a supposed "software architecture", such as Model-View-Presenter. You can show the author hasn't really sat down and thought things through.

Auditability?

I don't really understand what you mean by "auditability". If you mean that the software should be easy to review and survey, then this is a pragmatic issue which can almost always be handled by straightforward means such as 'patterns', syntactic sugaring, embedded DSLs and others. AFAICT, this has very little to do with what the language's core constructs are and how they're defined.

Not always

Auditability means that you're able to check whether the code is doing X. For some X this can be handled by 'patterns', syntactic sugaring, embedded DSLs, and the like, as you say. Other X, such as security, are much better served when the language's core constructs support them: for example, Joe-E (*) is about being able to prove security with only local code review rather than global code review, which means much better auditability.

*: http://lambda-the-ultimate.org/node/3830

To my mind, auditability

To my mind, auditability means local reasoning. Objects facilitate this, particularly in object-capability systems, because the set of available operations is very clearly defined by the set of objects available. Encapsulation is a powerful tool in this domain.
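A hypothetical sketch of that style of local reasoning: an attenuating wrapper hands out read authority without write authority, and an auditor only needs to inspect the wrapper, not every client, to confirm that holders of the attenuated object cannot write.

```python
# Hypothetical sketch of capability-style attenuation. Reviewing
# `read_only` locally suffices to conclude that its result confers
# read access and nothing else: no write method is ever exposed.

class Log:
    def __init__(self):
        self._entries = []
    def write(self, line):
        self._entries.append(line)
    def read(self):
        return list(self._entries)

def read_only(log):
    """Attenuated capability: forwards read, withholds write."""
    class Reader:
        def read(self):
            return log.read()
    return Reader()

full = Log()
full.write("boot")
cap = read_only(full)
assert cap.read() == ["boot"]
assert not hasattr(cap, "write")
```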

Provable vs. Auditable

If it's easy to reason locally but difficult to prove things formally, that's probably an indictment of the formal system you're using. Surely the goal of a formal system should be to make the audit reasoning formally provable.

It's not necessarily an

It's not necessarily an indictment of the formal system so much as an indictment of the developers and auditors, myself included. Verification incurs a heftier burden on implementors and auditors alike, though the confidence achieved is correspondingly higher as well. Lightweight formal methods, e.g. refinement types, contracts, etc., are getting closer and closer to full verification, and without much additional burden.

In any case, having everything open for inspection, functions included, would seem to violate parametricity and break equational reasoning, which makes verification harder, not easier, so perhaps we're all misunderstanding what Frank is suggesting. It's surely evident to him that something like the facade pattern is a fundamental abstraction for progressively larger systems, one which both ADTs and objects naturally express, but which does not seem to be a property of algebraic types and which breaks in the face of open inspection.

Ramblings

Verification needn't impose a higher burden on the developer, but certainly the developer can help the auditor by writing clear code. There are a number of reasons that it's difficult to formally verify code. One is having to explain obvious things to the computer, but probably a bigger one is because it forces the realization that some other things aren't as obvious as you'd thought they were in your informal reasoning.

I agree with you about the facade pattern, etc. Code should only ever require that the entities it manipulates have certain properties - never that they are intensionally equal to some particular values. Algebraic DT deconstruction should be viewed by the code doing the deconstruction as just another abstract operation - no knowledge of the actual form of the entity being deconstructed should be assumed.

I don't see PDAs and ADTs as dual. I see ADTs as the more fundamental - all code should be written against an appropriate ADT that specifies the theory of the entities manipulated. Where dynamic behavior is desired, the ADT can specify PDAs be used. "Everything is an object" is a bad idea.
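One way to read that last point, as a hypothetical sketch: expose deconstruction as an abstract operation (a Church-style `match`), so client code supplies a continuation per constructor and never inspects the representation directly.

```python
# Hypothetical sketch: an option type whose only interface is abstract
# deconstruction. Clients pass one continuation per constructor and
# never see the underlying representation.

def nothing():
    return lambda if_nothing, if_just: if_nothing()

def just(x):
    return lambda if_nothing, if_just: if_just(x)

def get_or(opt, default):
    # deconstruction as "just another abstract operation"
    return opt(lambda: default, lambda x: x)

assert get_or(just(5), 0) == 5
assert get_or(nothing(), 0) == 0
```

Because clients only depend on the `match` interface, the representation could later be swapped for classes, tuples, or dynamic behavior (a PDA) without changing any call sites.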

The Powers of Ten: Rules for

The Powers of Ten: Rules for Developing Safety-Critical Code by Gerard Holzmann

For those of you who don't know who GH is, let me introduce him as a complete badass mofo. Besides the SPIN model checker and protocol verification and validation, he also wrote Beyond Photography: The Digital Darkroom, which was written before Adobe Photoshop 1.0. He is a grandmaster hacker and great computer scientist. His advice surprised even me.

One time, at band camp

You forgot to sign this with "I'm on a horse."

Haha

I'm just agreeing with Sandro.

There was an amusing thread on captalk recently where Jonathan Shapiro, Alan Karp, Jonathan Rees and Mark Miller listed "the places where researchers from the '70s still work and are not dead yet". According to those in the thread, many oldtimers still view capability security as something that "doesn't work", and they are "waiting for funerals".

In the above article I link, Holzmann comes up with some totally different arguments that are more focused in the way that John Nowak was using the phrase auditability (behavioral logic checking).

he seems to be suggesting you can easily make global source modifications "between runs" for real-world programs:

Isn't this pretty much the definition of dynamic linking and 'software componentry'? The distinction between existential types (and variations such as Hilbert's choice operator) and support for multiple concurrent implementations seems quite useful to me.

I agree that the distinction

I agree that the distinction is very useful. However, to be clear, dynamic linking in no way eliminates the need for procedural abstraction. On the contrary, it's procedural abstraction that enables reusable libraries that can be dynamically linked in the first place.

The entire section about "fold" here is irrelevant

The entire section about "fold" here is irrelevant and does, at least to me, seem like the insertion of a hobby horse issue. He claims no one would implement an ADT as in the paper, but certainly no one would implement a production library with "fold" as the only available operation.

This represents nothing more than a pragmatic difference between mathematics and software engineering. Software engineers design for performance and practicality, while mathematics tends to focus on generality and on keeping interfaces as minimalistic as possible, even at the cost of "reinventing the wheel". To a mathematician, once you have proven that all list observers can be expressed in terms of 'fold' there's no need to add to the interface.
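To make the mathematician's point concrete, here is a hypothetical sketch deriving other list observers from a right fold alone; once `fold` is proven complete, the rest of the interface is, mathematically speaking, redundant.

```python
# Sketch: every list observer below is derived from a right fold alone,
# which is why a minimal mathematical interface can stop there.
from functools import reduce

def fold(f, z, xs):
    """Right fold: fold(f, z, [a, b]) == f(a, f(b, z))."""
    return reduce(lambda acc, x: f(x, acc), reversed(xs), z)

def length(xs):
    return fold(lambda _, n: n + 1, 0, xs)

def append(xs, ys):
    return fold(lambda x, acc: [x] + acc, ys, xs)

def mapped(f, xs):
    return fold(lambda x, acc: [f(x)] + acc, [], xs)

assert length([1, 2, 3]) == 3
assert append([1], [2, 3]) == [1, 2, 3]
assert mapped(lambda x: x * 2, [1, 2]) == [2, 4]
```

The software engineer's rejoinder is visible in the same sketch: `length` via `fold` is linear time, while a production list would cache its length.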

Equality reasoning vs. coinduction

It is my impression that objects are basically coalgebraic by nature, hence equality reasoning is inherently difficult if not entirely inapplicable. I'm not sure what the relevance of eta-expansion is.

Nevertheless it should be possible to recover most of the advantages of equality reasoning by defining appropriate invariants. This is effectively what an OOP programmer does when she provides an "equality operator" for her custom objects.
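A hypothetical sketch of that practice: an equality operator defined observationally, i.e. two objects count as equal when no available observation distinguishes them over a chosen probe set.

```python
# Hypothetical sketch: an OOP "equality operator" recovered as an
# invariant -- two sets are equal when membership observations agree
# on every probe of interest.

class IntSet:
    def __init__(self, members=()):
        self._members = frozenset(members)
    def contains(self, x):
        return x in self._members
    def __eq__(self, other):
        # observational equality, restricted to a finite probe domain
        probes = self._members | other._members
        return all(self.contains(x) == other.contains(x) for x in probes)

assert IntSet([1, 2]) == IntSet([2, 1])
assert IntSet([1]) != IntSet([1, 3])
```

Choosing the probe set is exactly where the coalgebraic difficulty lives: for objects with infinite behavior, no finite set of observations can certify equality in general.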

Did you cheat? ;-)

If you click the link to Joe Goguen's old website on hidden algebras above, your impressions will be confirmed.

Bart Jacobs also has really good stuff on this, too.

another (restricted) take on it

some haskellian seemingly impossible things of note.