archives

Determining if a function is effectively pure

Say one is programming in a not-pure-FP language (e.g. Scheme, Clojure) but one wishes to write much of their code side-effect-free, sprinkling in side effects sparingly, as needed for development necessity or convenience.

Similar to many other Lisps, every value has a type, but unlike Haskell, functions are not explicitly declared with types, nor does the compiler try to automatically infer them for you (I'm aware of type inference in some Scheme implementations, but unless they can do what I'm asking about below, ignore them for the moment).

Sometimes when optimizing code by hand for performance, it would be convenient to use mutable data structures. You'd also like to have some program statically check that you haven't accidentally introduced externally-visible side effects into the optimized function.

Another use for such static checking is when writing functions that can be called an unknown number of times during an STM transaction.

I believe that some of the work in type and effect systems might be relevant. I was hoping to learn what the current state of the art is in algorithms that can prove a function is "effectively pure", at least for some functions that use mutation or other not-pure-FP features (please let me know if there is more accepted terminology for this idea).

Example: A function kth-largest takes a list L of integers and an integer K. It returns the K-th largest value in L. We choose to implement it as follows: create a mutable vector V, copy the integers in L into V, sort V in decreasing order using mutation, and return V[K]. V is then out of scope. There is lots of mutation occurring in the implementation, but the caller can never tell (ignore the possibility of exhausting available memory).

Have people implemented systems that could determine that kth-largest is effectively pure? For which programming languages?

Are there examples of the kinds of functions these systems can prove are effectively pure, or informal characterizations of such functions that don't require a significant knowledge of semantics or category theory to understand?

Is there any significant difference in the answers if we change from implemented algorithms to published algorithms?