Determining if a function is effectively pure

Say one is programming in a not-pure-FP language (e.g. Scheme, Clojure) but wishes to write much of one's code side-effect-free, sprinkling in side effects sparingly, as needed for development necessity or convenience.

As in many other Lisps, every value has a type, but unlike in Haskell, functions are not explicitly declared with types, nor does the compiler try to automatically infer them for you (I'm aware of type inference in some Scheme implementations, but unless they can do what I'm asking about below, ignore them for the moment).

Sometimes when optimizing code by hand for performance, it would be convenient to use mutable data structures. You'd also like to have some program statically check that you haven't accidentally introduced externally-visible side effects into the optimized function.

Another use for such static checking is when writing functions that can be called an unknown number of times during an STM transaction.

I believe that some of the work in type and effect systems might be relevant. I was hoping to learn what the current state of the art is in algorithms that can prove a function is "effectively pure", at least for some functions that use mutation or other not-pure-FP features (please let me know if there is more accepted terminology for this idea).

Example: A function kth-largest takes a list L of integers and an integer K. It returns the K-th largest value in L. We choose to implement it as follows: create a mutable vector V, copy the integers in L into V, sort V in decreasing order using mutation, and return V[K]. V is then out of scope. There is lots of mutation occurring in the implementation, but the caller can never tell (ignore the possibility of exhausting available memory).
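A minimal sketch of this example in Java (the names and 1-based indexing convention are my own; the post doesn't pin down an implementation): the method copies the list into a local array, sorts it by mutation, and returns one element. All mutation is confined to a value that never escapes, so the caller cannot observe it.

```java
import java.util.Arrays;
import java.util.List;

public class KthLargest {
    // Returns the k-th largest element of l (k = 1 means the maximum).
    // Internally this copies the list into an array and sorts it in place,
    // but none of that mutation is visible outside this method.
    public static int kthLargest(List<Integer> l, int k) {
        int[] v = new int[l.size()];
        for (int i = 0; i < v.length; i++) v[i] = l.get(i); // copy L into V
        Arrays.sort(v);                // ascending order, mutates V in place
        return v[v.length - k];        // k-th largest, counting from the top
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(3, 1, 4, 1, 5, 9, 2, 6);
        System.out.println(kthLargest(xs, 2)); // prints 6 (second largest)
        System.out.println(xs);                // input list is unchanged
    }
}
```

A checker that could verify this would have to see that `v` is freshly allocated, mutated only locally, and never aliased or returned.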

Have people implemented systems that could determine that kth-largest is effectively pure? For which programming languages?

Are there examples of the kinds of functions these systems can prove are effectively pure, or informal characterizations of such functions that don't require a significant knowledge of semantics or category theory to understand?

Is there any significant difference in the answers if we change from implemented algorithms to published algorithms?


ST Monad

In Haskell, you can implement stateful algorithms using the ST monad, then use runST to expose the result as a pure function. This effectively serves the purpose you're asking about.

ST monad with static checking in non-Haskell language?

Thanks, that is good to know. I'm reading some examples of using the Haskell ST monad now.

I know that there are a plethora of "Introducing monads in language X" tutorials on the web. However, is anyone familiar with work along the lines of "Implementing the ST monad in language X including static checking that the code follows the rules that lead to a pure functional result" (where X is not Haskell, and most interesting to me would be X is a Lisp variant, Java, or JVM byte code)? That would be interesting to learn about.

Would doing so in a way similar to how the Haskell compiler does it require a nontrivial subset of the machinery of a Haskell compiler somehow added onto the language X compiler?

IIRC, Bit-C includes a

IIRC, Bit-C includes a 'pure' annotation to check code. Clojure has support for 'transient' values, which serve a similar purpose. Both of these are Lisp-like languages.

BitC and Clojure transients

Thanks again for the BitC reference. I had heard of it, but didn't realize how much work on formal verification techniques was involved in that project. Hopefully I can learn something from their papers.

Clojure and its implementation of transients I am familiar with; in fact they were a motivation for my post. I suspect that a lot of mileage could be gotten in Clojure from a simple code walker that checks for functions/macros/special forms known to cause mutation, and for ones known to be purely functional (or at least assumed to be, without formal verification, e.g. the implementations of +, -, *, /, which are currently written in Java). It would also look for uses of free variables. I asked about Java and JVM purity-checkers above specifically because I wonder about the feasibility of extending such static checks down to the JVM layer beneath Clojure.

As hinted at above, I'm no semantic theorist, so there are likely all kinds of intricacies that I'm unaware of that many undergrads know about. I don't mind rediscovering some of these -- some problems become more real to you when you stumble across them yourself :-)

One place where I have a hazy awareness of potential complexities is functions passed as arguments. I suspect the best result one can give for a function like map is "it is pure functional if the function passed to it is pure functional, otherwise all bets are off."
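A quick Java illustration of that caveat (my own example, not from the thread): Stream.map is only as pure as the function handed to it. A lambda closing over mutable state makes the "same" map call yield different results on repeated evaluation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

public class MapPurity {
    public static void main(String[] args) {
        List<Integer> xs = List.of(10, 20, 30);

        // Pure argument: same input always gives the same output.
        List<Integer> doubled =
            xs.stream().map(x -> x * 2).collect(Collectors.toList());
        System.out.println(doubled); // [20, 40, 60]

        // Impure argument: the lambda closes over mutable state, so the
        // result depends on how many times it has been called before.
        AtomicInteger counter = new AtomicInteger(0);
        List<Integer> tagged = xs.stream()
                .map(x -> x + counter.incrementAndGet())
                .collect(Collectors.toList());
        System.out.println(tagged);  // [11, 22, 33]

        // Evaluating the very same expression again gives a different answer.
        List<Integer> tagged2 = xs.stream()
                .map(x -> x + counter.incrementAndGet())
                .collect(Collectors.toList());
        System.out.println(tagged2); // [14, 25, 36]
    }
}
```

So the best a checker can report for map is a conditional judgment: pure in its arguments, i.e. pure whenever the supplied function is.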

Indeed, there may be some

Indeed, there may be some subtleties when passing a stateful function to a supposedly pure function. This problem was studied by Martin Hofmann in this paper:
Alas, his work is not focused on yielding an automatic method for checking purity.

Bit-C, as far as I recall,

Bit-C, as far as I recall, is moving away from S-Expression syntax, for better or worse.

Verifiable Functional Purity

Right on target

Very cool, and exactly the kind of thing I was looking for.

Is there perhaps some deep connection here, in that object capability models underlie both the techniques in this paper for determining a subset of pure Java methods (based on the Joe-E subset of Java) and the EROS and Coyotos operating systems that motivated Shapiro et al. to create BitC, referenced above? At first it seemed like a coincidence when I came across this connection, but now I wonder.

Capabilities and

There is a deep connection between capabilities and side-effects. Side-effects by their very nature are non-local changes to the environment, which means otherwise isolated computations can communicate with each other if side-effects are not controlled. This is essentially an unauthorized communication channel.

By encapsulating the authority to cause specific side-effects into explicit handles, i.e. capabilities, this communication channel becomes explicit: you can reason about the flow of information and the changes to the environment, and you can determine when subsystems are properly isolated by examining the capabilities they are provided. This happens to provide many of the desirable properties of referential transparency, composition, and modularity in programming languages, and I personally think there is a deeper connection here, even beyond side-effects, that has yet to be fully explored.
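A small Java sketch of that idea (names are mine; this is a pattern illustration, not any particular capability system): instead of reaching for an ambient global, a computation receives an explicit handle for the one effect it is permitted, so a caller that withholds the handle knows the computation cannot cause that effect.

```java
import java.util.ArrayList;
import java.util.List;

public class Capabilities {
    // A capability: an explicit handle granting the authority to perform
    // one specific side effect (appending to this particular log).
    interface LogCap { void write(String msg); }

    static class Log implements LogCap {
        final List<String> entries = new ArrayList<>();
        public void write(String msg) { entries.add(msg); }
    }

    // This computation can only cause effects reachable from its arguments.
    // There is no ambient global logger for it to reach; without a LogCap
    // it could not log at all, and a caller can see that from the signature.
    static int sumAndLog(List<Integer> xs, LogCap log) {
        int sum = 0;
        for (int x : xs) sum += x;
        log.write("sum = " + sum);
        return sum;
    }

    public static void main(String[] args) {
        Log log = new Log();
        int s = sumAndLog(List.of(1, 2, 3), log);
        System.out.println(s);           // 6
        System.out.println(log.entries); // [sum = 6]
    }
}
```

The purity analysis then reduces to an authority analysis: a method whose parameters carry no effect-granting capabilities has nothing it can mutate but its own fresh locals.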

Re: BitC, Joe-E, E, and capability OSes, a lot of these folks are capability researchers and they discuss many of these issues on the cap-talk mailing list.

You might be interested in the other capability work that analyzes the flow of authority or information, such as SCOLL/SCOLLAR, and Toby Murray's analysis of capability patterns using CSP.

"Witnessing side effects"

Pure kth-largest

My first post here. I have implemented what the OP asks for in D v2.

(As the good David Eppstein notes, the algorithms I have used are not especially efficient; I am aware of this.)