Advantages of Purity

What advantages does purity offer in a programming language? I often hear (read) about these advantages, but never with concrete examples. I figured here is the best place in the whole www to ask this question :)

Pure in BitC

I predict that this will generate a wide range of answers. :-)

The importance of pure computation in the eyes of the programming community is closely tied to reasoning about program behavior. In most cases, impure expressions are not functionally isolated - their outputs depend on things other than (i.e. in addition to) their arguments. A pure expression is a "term" in the proof sense. This has implications for proof embedding strategies, the "program as proof" equivalence, and a bunch of other issues related to proofs. It is not clear (to me) how much of this preference exists because LCF has become the predominant proof foundation today while proof strategies involving state went out of favor.

From the compiler and optimizer perspective, pure subprograms are useful for optimization: lots of rewritings, rearrangements, and compositions are legal that are not generally legal with impure computations (though this is also tied up with evaluation order).

From a practical standpoint, the best answer I can give may be to explain why we were careful to maintain pure sub-computations in BitC. Basically, we knew that we were crafting a stateful programming language, and that we wanted to prove things about programs written in that language. We knew that reasoning about stateful systems is hard, and that it hasn't been much in fashion to deal with state in recent program proof work. We knew that perhaps 60% to 70% of operating system code is non-mutating, and that this subset could be reasoned about (with care) in term style. We hoped that being able to tell which sub-computations were and were not stateful would give us a significant reduction in the "complexity surface" of the proof analysis.

Whether that turns out in hindsight to have been useful remains to be seen.

[PURE] FUNCTION NOT(CONSTANT X:IN std_logic) RETURN std_logic;

As a weird comparison, hardware description languages like VHDL expect functions to be pure by default.

A pure function can always be translated into a combinational circuit, with no side effects and no flip-flops / memory cells.

It is very important for testing the behavior, composing elementary functions...

Purity offers its most

Purity offers its most significant advantages at a large scale, which is why you won't see many 'concrete examples' of these advantages. However, the introduction to Fudgets (discussed at this LtU node on Oracles) does offer a small-scale case and a reasonable explanation:

Side effects can interact with the programmer's activity of forming new subprograms, or naming subexpressions. For example, it is no longer clear that we could write

  let average = (a + b) / 2 
  in f(average,average)

instead of

  f((a + b) / 2,(a + b) / 2)

because a potential side effect of the subprogram a would be carried out once in the first case, but twice in the second.

Purity offers advantages for abstraction and code-reuse in this sense, but I would say that more significant advantages come from removing concern for ordering within the code. The programmer that doesn't need to worry about order of evaluation can write programs with one fewer worry. The compiler that doesn't need to worry about order of evaluation is free to make several optimization decisions, including parallelization, laziness, and partial-evaluations.
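
To make the Fudgets example above concrete, here is a minimal Haskell sketch; 'f' is just an illustrative consumer of two values, not anything from the paper:

-- With pure expressions, naming a subexpression and duplicating it are
-- interchangeable, so both definitions below always agree.
f :: Int -> Int -> Int
f x y = x * y

g1, g2 :: Int -> Int -> Int
g1 a b = let average = (a + b) `div` 2 in f average average
g2 a b = f ((a + b) `div` 2) ((a + b) `div` 2)
-- A compiler may evaluate the two arguments of f in either order, in
-- parallel, or lazily, without changing the result.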

Due to modularity issues, it is often necessary to enforce purity in sub-programs to get many advantages from it.

But a language need not be 100% pure to offer the advantages of purity. Haskell effectively breaks the advantages of purity via its concurrency, state, and IO monads. However, the advantages of purity still apply to the pure sub-programs. Other languages, such as Mercury, use effect typing for purity, and Tim Sweeney's "Next Mainstream Programming Language" makes a case for purity for performance in a world with ever greater parallelism.

A clean separation between evaluation and execution also allows a program to 'wrap' a side-effecting sub-program and weave features into it. This isn't an advantage of purity alone; it requires some additional design regarding how communications are expressed.
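
As a hedged sketch of that last point (a toy command type, not any particular library): when a sub-program's communications are represented as ordinary values, a caller can interpret them and weave extra behaviour into the interpretation.

-- a toy command language; the side-effecting sub-program is just data
data Cmd = Print String | Pause
  deriving Show

-- one interpreter executes the commands directly...
run :: [Cmd] -> IO ()
run = mapM_ step
  where
    step (Print s) = putStrLn s
    step Pause     = return ()

-- ...and a wrapper weaves a tracing feature into the same sub-program
runTraced :: [Cmd] -> IO ()
runTraced = mapM_ (\c -> putStrLn ("[trace] " ++ show c) >> run [c])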

Stateless state

Haskell effectively breaks the advantages of purity via its concurrency, state, and IO monads.

The ordinary State monad doesn't belong in that list - it's completely pure. Rather than breaking purity, it manages the threading of state through a computation in a pure way.
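
For reference, here is a from-scratch sketch (equivalent in spirit to, though not literally, the library version) showing that the State monad is built from nothing but pure functions:

-- the "State monad" is just a pure function from a state to a result
-- paired with the new state
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f m = State $ \s ->
    let (a, s1) = runState m s
    in (f a, s1)

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  mf <*> ma = State $ \s ->
    let (f, s1) = runState mf s
        (a, s2) = runState ma s1
    in (f a, s2)

instance Monad (State s) where
  m >>= k = State $ \s ->
    let (a, s1) = runState m s
    in runState (k a) s1

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s' = State $ \_ -> ((), s')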

I disagree. Using the state

I disagree. Using the state monad "effectively breaks the advantages of purity" even if state is threaded in a pure manner. The advantages concerning composition and abstraction of subprograms don't apply at the interface to the state monad. Concern about the relative ordering of expressions is significant. Even the performance advantages are greatly diminished due to threading of that monadic context.

How can a pure function not have the advantages of purity?

There are definitely downsides to pervasive use of state monads in lieu of a more functional style, but compared to just threading the state around manually, I don't see any damage to composability or abstraction of subprograms (not that I know what this latter term means).

Every Turing Machine Program is Pure

Without exception, every whole Turing machine - under its original definition - is pure. You have an input tape, an output tape, and a deterministic relationship between the two.

One can discuss whole-program composition - i.e. taking the output of one Turing Machine as input to another. In this sense, each Turing Machine is simply a pure function, and the composition of Turing Machines produces a new pure function.

But one can also discuss sub-program composition.

Composing a subprogram in a Turing Machine involves arranging instructions upon a tape, with replications or different orderings often producing very different programs. Abstraction at this layer involves replacing blocks of instructions and parameterizing them. Modularity involves linking separately developed blocks of instructions. One must think in terms of side-effects. While in some hand-wavy sense you might say that each instruction is a function of the state of the machine to produce another state, that is of no use to developers, nor even to the compiler. Instructions are not meaningfully RT or pure if the state of the machine passed to each instruction is different.

Working with a State monad is analogous to composition and abstraction of Turing Machine subprograms. It doesn't much matter to a developer of a subprogram that the 'whole program' composition is still pure. The local subprogram developer cannot take advantage of said purity... nor, to any great effect, may an interpreter or compiler, especially if supporting any sort of modularity.

...compared to just threading the state around manually...

I made no mention of manual vs. implicit threading. Excepting economy of expression, that isn't especially relevant.

Don't blame purity

I made no mention of manual vs. implicit threading.

I understand this to mean that you would equally condemn ordinary functional programs that idiomatically thread state. Fine, I agree, but such functions are undeniably pure, so I quibble with blaming the impurity of the State monad as the source of the problems.

I didn't call it impure.

I didn't call it impure. I said you lose the advantages that purity has the potential to offer. ("Haskell effectively breaks the advantages of purity via its ...")

And I should note that you can use the state monad without using Haskell's 'do' sugar. It is unclear to me what you think was being blamed, which point you are quibbling, or why you brought up implicit vs. explicit threading at all.

Not binary

Using the state monad "effectively breaks the advantages of purity" even if state is threaded in a pure manner.

This characterization implies a kind of binary situation in which one either has purity and its advantages, or not, and nothing in between. That's at odds with the reality of the state monad, which I'll attempt to show.

The advantages concerning composition and abstraction of subprograms don't apply at the interface to the state monad.

This is either backwards, or misleadingly put. An expression involving the state monad has a type involving something like State s a, i.e. a value of type 'a' that is parameterized by a state of type 's'.

To evaluate this, the monadic value must be applied to an initial state. This is a pure computation, which always produces the same output for the same input. Unlike impure languages, one is required by the type to consider the initial state as part of the input.

This is a fundamental difference between using the state monad to manage state, vs. having state be completely implicit in a language. The state monad approach can be implemented and used within the pure lambda calculus, which certainly isn't true of the other examples you mentioned, such as concurrency and I/O.

From another comment:

Working with a State monad is analogous to composition and abstraction of Turing Machine subprograms.

That's not the case, because of the point described above. Terms with a state monad type are parameterized by a state, and in fact are implemented as functions from a state to a value, and can be composed as such.

The behavior you seem to be thinking of is within an expression involving the state monad and the monadic bind operator, where one can have code in which purity appears to be violated, e.g.:

import Control.Monad.State

test :: State Int (Int, Int)
test = do
   put 5
   x <- f
   put 6
   y <- f
   return (x,y)

f :: State Int Int
f = get -- any arbitrary computation dependent on the state

Here, x and y end up with different values because of the use of put to modify the state. However, the type of 'f' makes it clear that it's a function from a state to a value - it's pure, explicitly depends on a state, and can be composed. Ditto for the type of 'test'.

However, if you invoke 'f' using the monadic bind operator, whether explicitly with >>= or implicitly using do syntax, the bind operator takes care of applying 'f' to the state. This is just higher order functions at work - a partially applied function is being passed to another function where it is given its final argument, the state.

The apparent violation of purity here is purely local, and only applies within an expression involving monadic bind. At any explicit function boundary, the type of the function will be that of a pure function parameterized by a state, and can be composed like any pure function. Even when using the function with state threaded implicitly, a caller of 'f' knows from f's type that it depends on a state, and knows the exact type of the state that f depends on, unlike the situation in languages in which state is implicit.
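
To make the "just higher order functions" point explicit, here is the same 'test' with the do-sugar removed (assuming the definitions of 'f' and the import from the example above):

-- each >>= is an ordinary function that feeds the current state to the
-- next step
testDesugared :: State Int (Int, Int)
testDesugared =
  put 5 >>= \_ ->
  f     >>= \x ->
  put 6 >>= \_ ->
  f     >>= \y ->
  return (x, y)

-- running it is plain function application on an initial state:
--   runState testDesugared 0  ==  ((5, 6), 6)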

Because of all this, if you find yourself in a situation in which you feel you have "broken the advantages of purity" in some expression, it's straightforward to refactor with explicit state, since all the functions depended on in the offending expression are pure, even if they happen to be parameterized by a state. So the example above could become:

  let x = evalState f 5
      y = evalState f 6
  in (x,y)

In a more complex example, you'd typically have to explicitly thread the state, giving something more like:

  let (x,s0) = runState f 5
      s1 = 6
      (y,s2) = runState f s1
  in (x,y)

I made no mention of manual vs. implicit threading. Excepting economy of expression, that isn't especially relevant.

The ordinary state monad is not much more than a way of implicitly threading a state through a pure computation. Presumably you're not saying that threading state manually, as in the latter example, also breaks the advantages of purity?

the monadic value must be

the monadic value must be applied to an initial state. This is a pure computation [...] Terms with a state monad type are parameterized by a state, and in fact are implemented as functions from a state to a value, and can be composed as such.

You are focusing on composition at the external interface to the State monad - external composition, in the same sense one might compose whole Turing Machines externally.

By using words like "subprogram" I intended to refer to the internal composition: developing, abstracting, and modularizing those functions that lead from state to state on the internal interface to the monad. I refer to all that stuffing between initialization and returning the final value. The subprogram composition is what it means to actually use a state monad: if you didn't need to think and develop in terms of these small-step state transforms, you'd just write a pure big-step pattern-matching function and be done with it!

It seems my choice of words failed to communicate my meaning.

a kind of binary situation in which one either has purity and its advantages, or not, and nothing in between

Well, if you refer back to my original assertion, you'll see the opening sentence of the point to which you objected was: "a language need not be 100% pure to offer the advantages of purity". Purity helps for some subprogram expressions, and doesn't help so much for others. For the places it doesn't help, alternative approaches are reasonable, such as effect typing. My mention of Haskell seems to have thrown you, but I meant simply that even Haskell doesn't benefit from purity in all subprogram components. It is quite enough that purity is an advantage in many parts of a program.

Even if one is using a state abstraction, there is a certain degree to which one is not using it that is roughly equivalent to the size of any pure functional subexpressions on the RHS of an assignment. According to Tim Sweeney, the 'not using it' covers a significant portion of some fairly large programs.

Presumably you're not saying that threading state manually, as in the latter example, also breaks the advantages of purity?

No, I say that threading state manually also breaks many advantages of purity, to the extent you program in that style. It is still a state abstraction, and still involves thinking in terms of state transforms - especially when you start including 'while' loops and the like. By being manually threaded, it also happens to hurt economy of expression at the same time.
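
As a small sketch of what "threading state manually, while-loop style" looks like (illustrative names, nothing from the posts above):

-- every iteration is a pure state transform s -> s, yet the programmer
-- still reasons about "the state before" and "the state after" each step
loopWhile :: (s -> Bool) -> (s -> s) -> s -> s
loopWhile cond step s
  | cond s    = loopWhile cond step (step s)
  | otherwise = s

-- summing 1..10 by threading an (index, total) pair through the loop
sumTo10 :: Int
sumTo10 = snd (loopWhile (\(i, _) -> i <= 10) (\(i, t) -> (i + 1, t + i)) (1, 0))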

Composition at function boundaries

You are focusing on composition at the external interface to the State monad - external composition, in the same sense one might compose whole Turing Machines externally.

I'm focusing on composition at function boundaries, which is where most significant composition happens. Any real, non-trivial program involving the state monad involves many functions, each of which is pure, as I've described.

Within an individual function, you may have 'do' or bind expressions which exhibit some of the properties of an impure program. But as I said, this is not a binary situation where you suddenly find yourself with all the disadvantages of impurity just because you're using the state monad.

The subprogram composition is what it means to actually use a state monad: if you didn't need to think and develop in terms of these small-step state transforms, you'd just write a pure big-step pattern-matching function and be done with it!

This oversimplifies the situation, ignoring the fact that real computations using the state monad are composed of many pure functions. Some of those functions may involve expressions of the kind you're concerned about, but others may not.

If you went out of your way to emulate the style of an imperative program, by writing an entire program using a single composite state value that carried all state needed by any of the program's functions, and in addition wrote your individual functions as long, monolithic do blocks, then you could say without question that the advantages of purity had been broken. But the point of using the state monad is that it provides much more precise control over state than this — control that can't be achieved by relying on fully implicit state, and control that allows the advantages of purity to be maintained, as much as possible.

For example, it's typical for a non-trivial program to contain more than one module that deals with more than one type of state. In that case, the state needed by each module is confined to the modules that need it, with types that identify what state a module's functions rely on. This supports composition of even "stateful" functions, which is one of the capabilities that can be much more problematic when dealing with fully implicit state.
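
A small sketch of that idea, with hypothetical module states and the standard Control.Monad.State import:

import Control.Monad.State

-- each "module" declares exactly the state its functions rely on
newtype Counter = Counter Int
newtype Log     = Log [String]

tick :: State Counter Int
tick = do
  Counter n <- get
  put (Counter (n + 1))
  return n

record :: String -> State Log ()
record msg = modify (\(Log ms) -> Log (msg : ms))

-- a caller runs each against its own state; nothing forces the whole
-- program to share one global state value
demo :: (Int, [String])
demo =
  let (n, _)        = runState tick (Counter 0)
      (_, Log msgs) = runState (record "hello") (Log [])
  in (n, msgs)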

No, I am saying that threading state manually also breaks the advantages of purity.

To me, this characterization seems too imprecise to provide any insight.

In particular, threading of state through a program is usually done out of necessity, not convenience. So you're essentially saying that some programs can't benefit from the advantages of purity. But the example of Haskell with the state monad refutes this: it allows you to compose such programs out of pure functions, making the use of state explicit in the types of the functions, and maximizing the advantages of purity in the presence of state.

RE: composition at function boundaries

Discussing compositions that just happen to use an underlying State monad internally doesn't reveal anything especially interesting about how code within the State monad is developed. This is why I pointed out the analogy to composing Turing Machines vs. composing the tape-based code underlying them.

And which classes of application do you assume when you make claims about 'where most significant composition happens'?

There are quite a few Haskell applications where the situation is exactly as you describe later: much of the interesting code is developed within the context of state abstractions. Simulations are a fine example. While IO or Concurrency tend also to be involved, such applications would be no less stateful without them.

So you're essentially saying that some programs can't benefit from the advantages of purity.

No, I never argued anything to that effect. What I have said is that some subprograms aren't taking advantage of purity. Whether that amounts to 10% or 30% or whatnot of the complete program is another matter. Threading state hinders leveraging advantages of purity only to the extent that your program and code development is focused on threading state.

this is not a binary situation where you suddenly find yourself with all the disadvantages of impurity

Again, I haven't argued such, but the reason I quote you here is that I don't really believe that "breaks advantages of purity" implies we suddenly suffer "all the disadvantages of impurity". Do you have a reason for this claim, or was it hyperbole?

Stamp coupling, REST, and Data Types a'la Carte

I'm focusing on composition at function boundaries, which is where most significant composition happens.

Back in the 1980s, Meilir Page-Jones wrote a very popular book on Structured Analysis where he described various kinds of 'coupling' in detail, going much further than Constantine's original definition of coupling. However, MPJ's descriptions were mostly intuitive and based on observations of the changes programmers make to the source text in response to changes in requirements. Stamp coupling, defined for the purposes of this discussion, is about packaging a bunch of independent bits and stuffing them all into one structure and then passing that structure across an interface.

The major problem with stamp coupling is not packaging independent bits into a common structure. The major problem is interpretation across an interface, and how you design that interpreter will have the biggest net effect on client/server interface segregation and responsibility segregation, accordingly. Alan Kay discusses this in The Early History of Smalltalk:

I say it this way because LISP had not only been around enough to get some honest barnacles, but worse, there were deep flaws in its logical foundations. By this, I mean that the pure language was supposed to be based on functions, but its most important components--such as lambda expressions, quotes, and conds--were not functions at all, and instead were called special forms. Landin and others had been able to get quotes and conds in terms of lambda by tricks that were variously clever and useful, but the flaw remained in the jewel. In the practical language things were better. There were not just EXPRS (which evaluated their arguments), but FEXPRS (which did not). My next question was, why on earth call it a functional language? Why not just base everything on FEXPRS and force evaluation on the receiving side when needed? I could never get a good answer, but the question was very helpful when it came time to invent Smalltalk, because this started a line of thought that said "take the hardest and most profound thing you need to do, make it great, and then build every easier thing out of it." That was the promise of LISP and the lure of lambda--needed was a better "hardest and most profound" thing. Objects should be it.

This general idea, of making everything an FEXPR, plays out in the design of many successful systems.

For example, in REST, independent bits within a mime-type can have their own URIref and therefore their own mime-type, negotiated dynamically between the client and server. Moreover, the whole message, realistically, should not be displayed "graphically" until the client and server fully resolve all these independent bits. With HTTP, technically you should be able to negotiate all these independent bits in parallel, but the truth is that modern web browsers like Firefox hardcode the user agent's connection limit to something like 4, because that just happens to be a "reasonable default" and the user agent is "dumb" in that it cannot expand connections based on environmental parameters.

Looking at Fielding's terminology, all connectors in REST are "uniform", have finite verbs and a single mandatory parameter, meaning the transport protocol (e.g. HTTP) has the semantics for parameter evaluation, and as long as the client and server play by these rules, messages can't be acted upon until their independent bits are all resolved via client/server negotiation. The Resource representation passed across the interface is recursively self-describing, using mime-types and links to other resources. Moreover, this recursion is never locked into one data representation; the user agent continuously negotiates with the server. It should even be possible to add support on the user agent end at run-time for a new mime-type via a plugin, have it be the preferred representation, and dynamically resolve new requests (from the point-in-time of plug-in installation) moving forward. Moreover, this information is not part of the Connector interface; it is the responsibility of the client and server to negotiate. It can be passed across the connection, but the connection doesn't care about these document-specific details.

A lot of interesting system design questions emerge from enforcing this extreme paradigm. For example, what to do if the user agent doesn't understand the mime-type? Use the self-describing document to find where the client can get an interpreter for that mime-type. Again, recursively, this is done using URIrefs. Or, for example, how does the user agent interpret the structure? By agreeing upon a representation, the user agent can map that representation to a structure it knows how to interpret. Links allow extremely late-binding of data structures, because the connector interface does not need to know about specific structures, so changing structures does not require upgrading the interface. As a consequence, the system interface is extremely reliable and portable. It is portable, not because of worse is better, but due to flexibility owing to extreme lazy evaluation.

How do we handle changing structure for services, such as upgrades to a service or differences of a service between user roles? The answer is that we don't want to put it in the uniform connector, because that would bloat the connector and destabilize communications, as now there is a many-to-many versioning problem at the network interface level.

How do we handle partial failure? The whole system describes how to deal with partial failure! Just look at HTTP Response Codes -- Smalltalk sort of had this design insight with the monolithic DoesNotUnderstand, but as far as I've seen in books on Smalltalk, discussion of partial failure is pretty much non-existent. So I think a fitting systems-design comparison would be Kiczales' Art of the Meta-Object Protocol versus REST and HTTP.

Also, I feel a really important paper that nails the essence of stamp coupling in functional languages, from a purity perspective, is Data Types a'la Carte. It's a really beautiful paper that shows that you can freely compose free monads. Using such a mathematical analysis as a principle for describing the strength of a service encapsulation just seems 'right' to me, as you can use ADTs to help negotiate & verify data formats.

Discussed to death

Since this is an issue that has been discussed extensively both here and elsewhere, how about just linking to previous discussions and important papers (Why Functional Programming Matters comes to mind)?

Unless, of course, you want to point to unrecognized advantages, or advantages that are often overlooked.

At some point...

It would be worth it to start a Wiki. Formalize definitions, discussions, etc.

Why so few fans of effect typing?

Effect typing is a natural alternative to purity for supporting reasoning about code. A related topic which I haven't come across much discussion of is why such a small minority of PLT researchers seem interested in/hopeful for that alternative. Discussion on LtU is pretty muted (for example), and textbooks seem to devote either one or zero chapters to the topic.

Perhaps there are "unrecognized advantages" to purity in the form of explanations for why effect typing is hard to make work in practice? I'm interested in pointers to such.

Good question

I'm not sure why that is either. Perhaps because people are holding out hope for a "simpler" (in some ways, and not in others) approach like the one discussed in Swierstra's excellent Data Types a la Carte paper?

It may just be because I am uninitiated, but it is considerably easier for me to get my head around (and even quickly implement) how to split up the IO monad using Swierstra's approach than it is for me to understand Marino and Millstein's. Along with the recently discussed paper on operational semantics for monadic computations, this seems like a clear and relatively straightforward path to a Haskell-like language with much finer-grained and perhaps even extensible (to model non-pure FFI calls) effects.

The biggest stumbling block seems to me to be the asymmetry of the coproduct injection predicates used by Swierstra, but I am reasonably confident that including a symmetric predicate for this special purpose within the type system (essentially moving from viewing coproduct functors as binary trees of functors to viewing them as sets of functors) can resolve this problem.

Purity and Effects

That's not an obvious question - or at least not one with an obvious answer.

In BitC, at least, our implementation of (pure e) did indeed rely on effect typing. The plan was for the language to automatically infer purity of expressions and procedures, with the intent that the (pure e) form expressed a requirement (i.e. a constraint) on the enclosed computation.

Among our many discussions on this topic, however, was the distinction between "pure" (i.e. a deterministic function of arguments) and "side effect free". A computation can be fully encapsulated without dropping side effects, so long as the side effects do not escape outward or inward. The "escape outward" part is obvious enough; the inward prohibition is that the computational result must not depend on mutable (i.e. by a third party) global state. Such a function is pure in the mathematical sense, and it is encapsulated in the sense discussed elsewhere on this page, but it cannot be treated as a logic term and it is not evaluable by a shallowly embedded semantics. We went back and forth on this multiple times, and eventually settled on what might be described as "side effect free with an escape hatch for initialization".

Effect systems are good for other things as well, notably (in our case) for identifying subprograms that do no heap allocation and/or subprograms that have no interactions with the collector runtime. The first was adopted for BitC with an associated (noalloc e) constraint form. The second was not, mainly because we did not consider it necessary to deal with that reductio given the goals that we were pursuing at the time.

Effect systems are good for

Effect systems are good for other things as well, notably (in our case) for identifying subprograms that do no heap allocation and/or subprograms that have no interactions with the collector runtime.

Substitute "do no" and "have no" each for "need no"; it is entirely possible that the language rules say the program conceptually uses heap allocation and possibly therefore interacts with the collector, but since a reference never escapes outward, it can be transformed from a heap allocation to a stack allocation or even directly sink it into a register allocation if it's really small data. This "needs no environment" static analysis is referred to as escape analysis.

BitC

At least from my perusal of the spec, it doesn't seem BitC has any semantics that distinguish between 'heap' and 'stack' concepts. [Edit: well, it seems I must recant: 'references', including strings and vectors, are 'heap-allocated'. The spec seems to have some wriggling room, though, in that it never actually defines what a BitC 'heap' is, or what it means to be heap-allocated.]

The distinction between heap and stack is rather frail, and really should be an implementation detail with optimizations going in either direction. When dealing with concurrency, you eventually face the fact that even stacks are effectively part of the heap. Baker and Stallman have independently developed stacks that become heaps at need, and Chicken Scheme uses a variation in its own implementation.

Bottom line

Compiler writers care about observable characteristics, not implementation details. The implementation is only an attempt at an optimal execution strategy for the observed characteristics of the program's semantics. Implementation details improve when observed characteristics can either (a) be put to better use, or (b) be observed in greater number (e.g., by waiting until a different stage of program computation to gain access to a larger or more specific set of characteristics).

Allocation is a strategy (e.g., for implementing call-by-value), and therefore a detail. Techniques such as escape analysis and copy elision just demonstrate that it is very profitable to focus on specific semantic meaning in language design (e.g., call-by-value).

Proctoscoping the Bottom Line

I think you mean PL researchers rather than compiler writers. If so, I agree, and this largely explains why O'Caml and Haskell will never become mainstream languages (much though I may find that regrettable). Compiler writers generally care very much about implementation details.

Semantics is a model of computation, and the model is not the computation. For many (perhaps most) programs, the currently prevailing view of computational meaning defined by the well-defined production of values is sufficient. But for those computations where performance and/or variance are part of the requirements, this approach to semantics fails the "total abstraction" test, and choices of strategy become part of the requirements (therefore semantics) of the program.

Mere constant factors of 40x to 100x sometimes matter, and so cache residency is sometimes a thing that a developer must manage in order to do their job. Memory size also matters in practice. Replacing manual with managed storage in a large application often doubles or triples the RAM requirement. Think about the dollar cost of that consequence on a machine farm the size of, say, Google's.

Because BitC is trying to solve problems for programmers who experience these requirements, the language needs to expose some of those concrete strategy requirements. For those applications where the strategy doesn't matter, there is no reason to object to specification of strategy within a consumed library. For those applications where the strategy does matter, inability to specify what is to be done is a sufficient and necessary reason to reject the use of a programming language.

The devil, as they say, is in the (implementation) details.

Le bon Dieu est dans le détail

O'Caml and Haskell may never become "mainstream", whatever that means - embedded systems programmers have always been neglected; it's only for the pathetic reason of energy consumption that industry is interested in helping them out.

I think by implementation details, you really mean what Walid Taha calls resource-awareness. Walid is effectively your competitor with BitC, and I think he will ultimately win - see his work on MetaOCaml device driver programming.

Also, if you think about it, resource-awareness potentially generalizes detecting for things like leakage of authority.

Ultimately, your goal is to develop a minimal model of tasks. Your ECOOP '06 position paper gives a good example of the UNIX networking stack. That is a good benchmark; but what other benchmarks do you have?

Le bon Dieu est dans le détail - The good Lord is in the details.

irony

i go to his home page and click on the link for RAP and... it hangs... and nothing ever comes back...

Position paper

Which ECOOP paper are you referring to, since I did not write one that I know about...

I meant ASPLOS '06

Programming language challenges in systems codes: why systems programmers still use C, and what to do about it

By the way, I do appreciate the fact you and Walid are both interested in the same area of research. If there is one area of Linux that has always been an Achilles heel, it is device driver support, perhaps best typified by Linus's hilariously wrong generalizations about specs.

If there is one area where Microsoft excels, it's stuff like verifying that third parties obey the spec. And Linus's "classic example" is silly, because he is arguing that layering in the OSI model decreases performance and that somehow that example applies to the discussion. In particular, RFC 3439: Some Internet Architectural Guidelines and Philosophy, Page 7, Section 3 Layering Considered Harmful does a good job discussing why architectural patterns != structural patterns, and that architectural patterns should be reasoned about dynamically. RFC 3439 puts forward a much more rational argument than Linus's silly rant.

This is a failure of the current spec

it doesn't seem BitC has any semantics that distinguish between 'heap' and 'stack' concepts

This is a failure reflecting an early draft specification. Basically, we got to putting our attention into the type system issues and hadn't yet gotten back to this. I'm just now getting back to it.

The model, in brief, is that :val types are legal in two contexts:

  1. As unboxed members of some containing type.
  2. As unboxed members of a stack frame.

A :val value can be copied into the heap through explicit action (box e), but may not otherwise be placed on the heap by the compiler (edit: see more precise capture below). Statically allocated globals, for this purpose, are not considered part of the heap; the test is whether run-time allocation is required. There is a planned construct (noalloc e) which requires that no operation performed during the evaluation of e causes heap allocation.

The BitC compiler is free to move allocations from the heap to the stack when it can. It is not free to migrate allocations from the stack to the heap.

The issue here, in the end, is the desire to write sub-programs that are known not to trigger GC. In Coyotos, for example, the bulk of the operating system (in fact, the entire OS after initialization) would execute under this restriction.

Many collected language specifications admit implementations in which stack frames are heap-allocated. This implementation is intentionally precluded in BitC.

(noalloc e) is not discretionary

Substitute "do no" and "have no" each for "need no"...

I wrote that very sloppily. In BitC, (noalloc e) is prescriptive. It expresses a requirement that a correct implementation must not perform heap allocation.

While the BitC compiler is free in general to optimize heap allocations onto the stack, those optimizations are not generally predictable by the developer, and it is difficult to specify optimizations of this form as mandatory language semantics. In consequence, the BitC noalloc requirement is syntactic, and the requirement that the expression computation does not allocate is a requirement on every conforming implementation of the language.

This means pragmatically that the compiler may not emit code that heap allocates unless (a) it is syntactically directed to do so, or (b) it has already observed some other syntax-driven heap allocation reachable from the current function, with the consequence that further heap allocations will not change the effect typing under the NOALLOC effect.

Complexity

Effect typing essentially moves the entire burden of controlling effects to the type level. The program has effects and the effect system tracks them. In contrast, programs in languages like Haskell are actually pure. The type system is simply there to enforce this.

When you move the entire burden of dealing with effects into the type system, you need complicated types. Basic things (e.g. "does this function do IO?") can be done simply enough. However, it gets more complicated when you want to deal with things like "a mutable array allocated in 'f' that is no longer referenced when 'f' returns". The external interface of 'f' should be pure which means the effect system effectively needs to keep track of what's referencing the array (and what can't) in order to verify this. Accordingly, the types of everything necessarily get more complicated; pure functions may carry additional type variables, data types may require many more variables to have the desired polymorphism, et cetera. If you look at the effect system in DDC, you'll see what I mean. It's very easy to end up with types that are awkward to read (much less write).

Haskell, on the other hand, doesn't have this problem. Functions that are pure do not need to have complex types simply because they may manipulate objects that are mutated at some other point in the program. The burden of dealing with effects is essentially shared between the type system and the expression language, and I think this leads to greater simplicity for the vast majority of programs. Of course, if you're writing programs that do a very heavy amount of IO, this may not be the ideal solution. It is, however, still workable in cases where it is non-optimal (and would be more workable if Haskell were an eager language... but that's another discussion).

As a quick example of the above, here's the type of 'map' in Haskell:

(a -> b) -> [a] -> [b]

And here's the type for 'map' in DDC:

forall a b %r0 %r1 !e0 $c0
. (a -(!e0 $c0)> b) -> List %r0 a -(!e1 $c1)> List %r1 b
:- !e1 = !{ !Read %r0; !e0 }
, $c1 = f : $c0

You also have additional complexity when it comes to program manipulation because programs themselves are not pure. Algebraic laws require additional side conditions about purity. For example, in Haskell, the order in which some 'f' is applied for the 'map' function is unimportant. In a language like DDC, it *is* important; any sort of program manipulation involving 'map' needs to take this into account. This is why I'm not entirely convinced by the argument on the link above that it's horrible to need both map and mapM; the algebraic laws that can be used for each are different, and requiring some explicitness on the part of the programmer as to their intent is not necessarily a bad thing.
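
For reference, the two Prelude functions in question, specialised to lists:

--   map  ::            (a ->   b) -> [a] ->   [b]
--   mapM :: Monad m => (a -> m b) -> [a] -> m [b]
--
-- Pure map satisfies  map f . map g == map (f . g)  unconditionally, while
-- the corresponding rewrite for mapM is only valid when the effects commute,
-- because the traversal order is observable:
orderMatters :: IO [()]
orderMatters = mapM print [1, 2, 3 :: Int]   -- always prints 1, 2, 3 in order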

In short, programming language design is all about tradeoffs. I think effect systems haven't taken off simply because they've yet to demonstrate that they can be sufficiently economical. If they were to get a foothold, I would imagine it would happen in low-level systems languages (e.g. BitC) rather than high-level languages that have already blown their complexity budgets on things like polymorphism and procedural abstraction.

sensible defaults?

if DDC assumed Haskell purity of effects by default, then the map could be written as succinctly, i'd guess. leaving the complexity for those cases when you really need it. pay as you go sorta thing. hopefully teaching people to stay pure as much as possible.

Premature Generalization?

I suspect you're projecting a particular language-design onto effect-typing as a whole. Questions such as "does this function do IO?" are not sensible in all effect-typing systems.

Consider an alternative effect-typing system: a pure function is A->B, a value of type T obtained via IO is !T. A procedure of two arguments could be typed as A->B->!C. A procedure that takes a block might be !A->!B. The type for functional map: (A->B)->[A]->[B], same as Haskell. The type for procedural map: (A->!B)->[A]->![B]. One may have types for finer-grained effects than IO (to type delay and timing effects, input-only, output-only, confinement, divergence or non-termination, resource costs, etc.). This design is analogous to the monadic approach, which will require you to write different code for the different effects.
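
Reading !T as "a computation that performs IO and yields a T", those two map types have direct Haskell renderings (a rough analogy, not a claim about any existing effect system):

-- functional map, (A -> B) -> [A] -> [B]
mapFn :: (a -> b) -> [a] -> [b]
mapFn = map

-- procedural map, (A -> !B) -> [A] -> ![B]
mapProc :: (a -> IO b) -> [a] -> IO [b]
mapProc = mapM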

DDC's more complex type discipline is intended to enable better reuse of code when the effectful version has similar 'shape' or expression to the pure version. But you can have effect-typing disciplines that are little more complex than anything in Haskell.

The difference between Haskell's monadic approach and simple effect-typing is felt most readily when dealing with modularity, distribution, and extension of code at run-time.

programs in languages like Haskell are actually pure

I do not agree. If programs in languages like Haskell were pure, then you could freely execute them twice, or execute them not at all, and have the same effect on the world modulo CPU and memory performance. Haskell programs that involve the IO monad are not pure. This has an impact on program development and upgrade cycles.

Criteria and Abstraction

Questions such as "does this function do IO?" are not sensible in all effect-typing systems.

I agree. I simply intended to give an example of something that's straightforward and does not significantly bloat the types involved.

But you can have effect-typing disciplines that are little more complex than anything in Haskell.

Again, I do agree. The question is if they're sufficient to replace or improve upon something like Haskell's monadic approach for enough cases to offset the additional type-level complication they introduce. I would consider the example you give to be insufficient for most applications (for a variety of reasons I'm sure you can imagine or are already aware of).

The difference between Haskell's monadic approach and simple effect-typing is felt most readily when dealing with modularity, distribution, and extension of code at run-time.

I agree, but it should be noted that the complexity of something like DDC's types can negatively impact modularity and extensibility. For example, a language with first class modules may allow the definition of an interface type and the existence of multiple modules that implement that interface. If such a system requires that the types of all functions in the modules be the same in order for this to work (likely given some set of associated types a la ML), I could see DDC's types causing serious problems due to incompatible variations between related functions.

To be clear, I don't mean to suggest effect types are bad in the least. My concerns revolve almost exclusively around their interaction with the plethora of features necessarily present in high-level languages. As far as I know, a language with expressive effect types that maintains sufficient flexibility without blowing the complexity budget for the domains in which high level languages are typically employed has yet to materialize. I certainly don't think it's hopeless, but I don't think I'm incorrect in thinking major obstacles lie ahead. The successful application of effect types to lower-level languages seems more likely to me in the near future, both because integration with features commonly found in such languages is likely easier and because the possible benefits (e.g. controlling the necessary manipulation of mutable data where persistent data types are not an option) would be of great importance.

I would consider the example

I would consider the example you give to be insufficient for most applications

Interesting assertion. What do you believe makes an effect-typing system insufficient for a given application?

I could see DDC's types causing serious problems due to incompatible variations between related functions.

Could you provide an example?

It seems to me that those 'forall' quantifiers allow a great deal of polymorphism.

Interesting assertion. What

Interesting assertion. What do you believe makes an effect-typing system insufficient for a given application?

It depends on the application. I'm not sure what other answer I can give you.

My only intended assertion was that the system you presented was likely insufficient for most applications if you're interested in doing functional programming. For example, just because a value is obtained via IO doesn't mean it is used in a manner where effects would be visible afterwards. Nor is it clear how your system would work in cases where a pure value references a value produced via IO; it would be a shame if the container itself and everything it references were automatically contaminated. Nor is it clear what sort of subtyping is needed in the system (e.g. you give '(A->!B)->[A]->![B]', but can I pass some function '(!A -> !B)'?). More crucially, what about exceptions? Or procedures that don't produce values via IO but are evaluated solely for their side effects (e.g. "print")? How are they tracked?

Could you provide an example?

Sure. If a module requires a function of some type '(a -> b) -> m a -> m b' for some associated types 'a :: *', 'b :: *', and 'm :: * -> *', you could instantiate it with a function of type '(Int -> String) -> [Int] -> [String]' after stating 'a = Int', 'b = String', and 'm = List'. In something like DDC though, functions need far more information in order to explain how they propagate effects. For example, is the list in both cases above produced via mutation? If so, is it the same list that was passed in, or was it some list produced internally that was mutated? Does the function satisfying 'a -> b' close over anything else, e.g. does it use a value internally for which effects might be visible? Are the strings immutable and shared within the list or are they uniquely referenced and safe to mutate without visible effects?
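
A rough Haskell rendering of that module shape (the names FooSig, fooMap, and fooList are illustrative only, with the first-class module encoded as a record):

-- the module signature, with its associated types a, b, m as parameters
data FooSig a b m = FooSig { fooMap :: (a -> b) -> m a -> m b }

-- an instantiation with a = Int, b = String, m = List
fooList :: FooSig Int String []
fooList = FooSig { fooMap = map }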

This is all extra information that is not captured by the simple type '(Int -> String) -> [Int] -> [String]' that you can get away with if you insist on persistent data structures, nor can you somehow reconstruct it afterwards. The only possibility I can imagine is that the module itself would need to also be parameterized by the effects of all of the values that are used to instantiate it. If you wanted a module system similar to ML's where you can have multiple implementations of the same signature given some associated types, that might mean instantiating a pile of effect types manually, and it would certainly mean that the effects for all functions involved would have to match such that they could produce modules of identical types if you want to use the resulting modules interchangeably as you likely often would in the case of first-class modules. And if your first class modules need to be directly parameterized by those effect types because they can't get away solely with associated types, that's a lot of extra information your functions that use those modules will be carrying around.

I'm giving these examples quickly and off the cuff, but hopefully they give an indication of the sort of problems I'm concerned about. I'm not suggesting it is impossible to get expressive effect types into a high level language while retaining sufficient flexibility (procedural abstraction, etc) without blowing out the complexity, but I do think I'm correct in thinking that a lot of work remains to be done. I will say though that I believe DDC in particular does not meet those criteria, although I realize the author likely has different goals.

Oh, there seems to be a

Oh, there seems to be a miscommunication. I mean to say that '!T' is effectively a block/procedure returning a value of type T. A value of type T is obtained through IO, and the code effecting this is typed !T. As noted above, something like !A->!B is passing one block or procedure into another. (It doesn't actually execute anything.)

I was curious as to your deprecation of it, as it is quite analogous to the monadic approach you were promoting earlier. Your understanding does much to clarify why you thought it was a bad idea.

What about exceptions? Or procedures that don't produce values via IO but are evaluated solely for their side effects (e.g. "print")? How are they tracked? Could you provide an example?

Exceptions may be another effect-type (though I'm skeptical of the utility of exceptions in pure computations; sum-types work just as well). All procedures return values, but the Unit type only has one value and thus does not return any information. Weaker IO effect-types could be available for delays (future values) and change (reactive values) and so on.

the module itself would need to also be parameterized by the effects of all of the values that are used to instantiate it

I suppose you mean that the public signature of a module might be templated by a bunch of region and effect-type specializations. But such manipulations to the public signature will only further constrain where the module is used, not how it is implemented. Indeed, by constraining the public signature, one reduces constraints on the implementation - i.e. the functions that instantiate the module may be universally quantified over the type and effect constraints supplied by the module's public interface.

I was curious as to your

I was curious as to your deprecation of it, as it is quite analogous to the monadic approach you were promoting earlier. Your understanding does much to clarify why you thought it was a bad idea.

Thanks for clarifying. Apologies for the misunderstanding; such is the nature of posting quickly while at work!

Exceptions may be another effect-type (though I'm skeptical of the utility of exceptions in pure computations; sum-types work just as well)

To play devil's advocate, you could say that exceptions in pure computations save you from having to write duplicate versions of higher order functions just to add the ability to short-circuit computations for efficiency reasons.

But such manipulations to the public signature will only further constrain where the module is used, not how it is implemented.

Perhaps I wasn't clear here. I believe I agree with your point and I was trying to say the same thing; the abstraction may become less general because implementation details may cause different instantiations to become non-compatible where they otherwise would be. For example, 'f :: [Array Int] -> Bool' and 'g :: [Array Int] -> Bool' may not be unifiable because f's list contains arrays that must be unique (because it mutates them internally) and g's contains arrays that cannot be unique (because it uses them in a way that is non-linear).

A language that uses persistent vectors instead of arrays avoids this sort of problem. For a high-level language, you can usually afford to take this sort of hit in performance (i.e. force the use of persistent data) to get more general abstractions. There are certainly other uses for effect types such as tracking IO, and I think these could be relatively lightweight additions worth integrating. Then again, if all you're tracking is such a simple thing, the utility of it becomes questionable; if your language already has a type system such as Haskell's, you might as well use an IO monad rather than adding additional type-level machinery.

I've been doing some work lately on integrating effect types into a language with procedural abstraction and first-class modules. My posting here may reflect a frustration with finding a nice spot in the design space more than anything else. Hopefully I've not overstated my case as a result.

exceptions in pure

exceptions in pure computations save you from having to write duplicate versions of higher order functions just to add the ability to short-circuit computations for efficiency reasons

It is unclear why you believe exceptions do anything to 'short-circuit' a computation. Are you assuming a specific implementation or evaluation strategy?

[Edit:] Pure, total functions will always terminate in the same deterministic result regardless of evaluation strategy. However, once you introduce 'effects' such as divergence or exceptions, you must also pick a deterministic evaluation-order semantics (such as strict, by-need, or normal-order) lest you unwittingly introduce non-determinism. That is, when two sub-evaluations result in an exception, it is necessary to know which exception has semantic precedence. But for any deterministic evaluation-order semantics, you can produce a computation that cannot be short-circuited in the presence of possible divergence or error. E.g. Assuming foo may raise an exception, then if(foo(X) and false)... will need to evaluate foo(X) to see if the exception is raised.

Thus, as far as I can tell, introducing exceptions into pure computations actually resists "the ability to short-circuit computations for efficiency reasons", rather than supports it. Exceptions also diminish benefits from parallelization and such.
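
A small Haskell illustration of the if(foo(X) and false) point; 'foo' here just stands in for a computation that may raise an exception:

foo :: Int -> Bool
foo _ = error "boom"   -- stands in for a computation that raises an exception

-- (&&) inspects its left argument first, so evaluating this raises "boom":
raises :: Bool
raises = foo 0 && False

-- For pure, total code a compiler could rewrite (x && False) to False, but
-- once exceptions are possible that rewrite changes observable behaviour.
short :: Bool
short = False && foo 0   -- short-circuits; foo is never evaluated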

the abstraction may become less general because implementation details may cause different instantiations to become non-compatible where they otherwise would be

An instantiation of a module is, in essence, a record of capabilities that conform to whichever public interface (aka 'signature') defines said module. The nicest module systems take that literally: a given record of capabilities may conform to many different signatures, which may be specified later.

It seems your argument assumes that each instantiation is defining a signature or public-interface, rather than conforming to one or more signatures.

i might be missing the point but

i thought the short-circuiting was for the code inside the function that threw the exception, not the code calling it.

Short-circuiting

Technically, the if(foo(X) and false) ... code may also be throwing an exception and thus be short-circuited, might it not? Yet, it would be more efficient if short-circuited prior to evaluating foo. Exceptions, thus, force you to compromise between efficiency and determinacy.

short-circuiting was for the code inside the function

No. The friendly devil's advocate was quite clear: the short-circuiting was for efficiency reasons.

Any performance feature must account for both efficiency and utilization. (efficiency: use fewer resources to provide equivalent service; requires eliminating waste. utilization: leverage more resources to improve service; concerns hardware scalability and parallelization.) Efficiency and utilization properties should always be studied on a global scale (well, the most-global scale that you can influence) lest you fall into the trap of simply shifting costs and waste to service clients and developers.

Thus, if the goal of exceptions in pure computations is efficiency, this should be measured on a global scale. I.e. you must account for the effect of exceptions on the calling code! Further, you must assume that utilization is also desirable. For pure computations, utilization generally is achieved through data-parallelism, load-balancing, pipelining. So, you should study the total efficiency of the exceptions mechanism in the context of mechanisms that enhance utilization.

It is unclear why you

It is unclear why you believe exceptions do anything to 'short-circuit' a computation. Are you assuming a specific implementation or evaluation strategy?

This is a control idiom in C++ Boost MPL, also used in the graph library: control is returned early by throwing the result as an object. The caller knows to expect an exception as the result; the stack is unwound and the return value is delivered to the caller's catch-block. It is super-clumsy.

It is unclear why you

It is unclear why you believe exceptions do anything to 'short-circuit' a computation. Are you assuming a specific implementation or evaluation strategy?

I was just tossing something out there. It should be clear to you how exceptions can be used to short-circuit computations in many existing languages. I'm not prescribing anything here. I would consider this bad practice.
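
For readers unfamiliar with the idiom, here is a rough Haskell rendition of exception-based short-circuiting (the names 'Found' and 'findFirst' are made up for illustration): the traversal is aborted by throwing the result, which the caller catches.

  import Control.Exception (Exception, catch, throwIO)
  import Control.Monad (when)

  -- The result is smuggled out through an exception.
  newtype Found = Found Int deriving (Show)
  instance Exception Found

  -- Stop traversing as soon as a match is found.
  findFirst :: (Int -> Bool) -> [Int] -> IO (Maybe Int)
  findFirst p xs =
    (mapM_ (\x -> when (p x) (throwIO (Found x))) xs >> pure Nothing)
      `catch` \(Found x) -> pure (Just x)

It works, but as noted above it is clumsy: the early exit is invisible in the types and forces the result into IO.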

It seems your argument assumes that each instantiation is defining a signature or public-interface, rather than conforming to one or more signatures.

That's not an assumption in my argument. I'm suggesting the interface may need to have additional parameters. Trivial example: an interface 'Foo (a :: *) (b :: !)' where 'a' is some value type and 'b' is some type that tracks uniqueness. If you have two instantiations of Foo, one where 'b' becomes "NonUnique" and one where it becomes "Unique", then these two instantiations will have incompatible types. In a setting where modules are first class, you would not be able to, for example, put both of the modules into a homogeneous list. A language without the ability to distinguish between unique and non-unique values, presumably by forcing the use of persistent data everywhere (or simply ignoring the issue and allowing unsafe mutation), would not face this problem; the two module instantiations *would* have compatible types.

This is a bit of a silly example that assumes a particular effect system, but hopefully it clarifies what I'm getting at.
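
A rough Haskell approximation of that example, for concreteness (Haskell has no real uniqueness typing, so 'Uniqueness' here is just a phantom index, and all the names are mine):

  {-# LANGUAGE DataKinds, KindSignatures #-}

  data Uniqueness = Unique | NonUnique

  -- 'Foo' is indexed by a value type and a phantom uniqueness tag.
  newtype Foo a (u :: Uniqueness) = Foo (a -> a)

  fooUnique :: Foo Int 'Unique
  fooUnique = Foo (+ 1)

  fooShared :: Foo Int 'NonUnique
  fooShared = Foo (* 2)

  -- Rejected: the elements disagree on the 'u' index, so there is no single
  -- element type for the list.
  -- broken = [fooUnique, fooShared]

Wrapping both values behind an existential (the "adaptor" mentioned below) would restore a homogeneous list, at the cost of forgetting the very index the effect system wanted to track.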

Problem?

So you're saying that the problem is that you can't place incompatible elements into a homogeneous list? Funny. I thought that preventing such behavior was the 'problem' that types are intended to solve.

Well, I suppose it would be a problem if a polymorphic consumer of a list of 'Foo' interfaces was unable to process a non-homogeneous list of Foo types. And such a restriction would certainly exist if you are unable to express or type-check non-homogeneous lists. Throwing an adaptor pattern at Foo interfaces in order to homogenize the list seems like an undesirable solution.

If a restriction exists for dubious reasons, that is a problem.

Needs subtyping

I think his point is that if some property you don't care about is responsible for making the list appear inhomogeneous to the type system, then that's a problem. I think what's needed is subtyping rules: pure => impure, etc., so that when you put two values with the same primary type into a list, there is always a least-common-denominator effect type that makes it work.

Confirmed

Yes, that was indeed what I was getting at.

Regarding subtyping, that would work for certain things (with purity types on arrows as an important case). It wouldn't solve the particular example I gave, however. Depending on the particular effect system and the particular abstraction mechanisms, the problem I'm describing may be non-existent, extremely detrimental to abstraction, or somewhere in the middle. Even if it does not restrict abstraction in an absolute sense, it may place additional burdens on the programmer designing abstractions that can be parameterized by different effect patterns.

Effect Subtyping

Your example misled me, then.

Many forms of effect-typing restrict encapsulated behaviors, i.e. implementation effects such as confinement, temporal coupling, divergence, delay, partial failure, non-determinism, and resource costs. Since these behaviors are encapsulated, they can easily be subject to a subtyping relationship.

But substructural types (linearity, region inference, etc.) restrict the external use of values - their origins and destinations. By their very nature, such things as origin and destination cannot be encapsulated. Nor may they be locally substituted. And, since they cannot be locally substituted, they cannot be subtyped with respect to the Liskov Substitution Principle. (They may be quantified or parameterized, though.) Since your example was attempting to place incompatible substructural types into a homogeneous list, the idea that you were seeking a subtype relationship didn't occur to me.

Due to differences in encapsulation, I tend to think of substructural typing as an entirely distinct concept from effect typing. However, DDC lumps them together in its documentation, and I suspect many other PL enthusiasts do the same. But if you were to follow my classifications, you could come up with general approaches to 'effect subtyping'.

For the language I've not been developing for over a year now, I use a simple rubric-based effect-subtyping of sorts: pure <: reactive <: future <: procedural, at least with regards to which may call upon others. I don't directly support any substructural types. However, static analysis tools, including optimizers, are free to recognize sealer/unsealer pairs, nonces (single-use methods), and annotations for linearity and such, and emit appropriate warnings.

But I should note that I consciously selected against implicit substitutability of effect-types; that is, I require you wrap a pure value explicitly before you may pass it to a procedural context, i.e. define AsProcedure = {fn: Value => {proc: return Value;}}. The reasoning behind this decision was based on parametricity. Effect subtyping - even where possible - may be undesirable. My earlier mention of subjecting each 'Foo' interface to an 'adaptor' refers to the more explicit approach of wrapping effects that can be substituted for one another.
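
In Haskell terms, that explicit-wrapping discipline is roughly analogous to lifting a pure function into an effectful type by hand instead of letting the type system coerce it implicitly ('asProcedure' is my name for the analogue, not part of the language being described):

  -- Explicitly wrap a pure function before using it in an effectful context.
  asProcedure :: Applicative m => (a -> b) -> (a -> m b)
  asProcedure f = pure . f

  -- e.g.  traverse (asProcedure (+ 1)) [1, 2, 3] :: IO [Int]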

I was certainly blurring the

I was certainly blurring the two, likely because my personal interests as of late have been more on substructural types as a means of avoiding visible effects rather than tracking them. Your point on effect typing restricting encapsulated behavior as opposed to external use is well-taken. I suppose the former category is what I was referring to when I said tracking IO was easy; presumably, the "ease" of it is because it doesn't limit external use.

Thanks for the clarification. While I never intended my claims regarding abstraction to apply to all effect systems, it seems I was more talking about the common integration of effect systems and substructural types. Presumably an effect system that just tracked encapsulated behaviors and had the appropriate subtyping relations wouldn't adversely affect abstraction (provided additional parameterization necessary for modules and such was also inferred). Armed with a better definition, I retract my claim.

Apples and oranges

It's not fair to compare the type of a pure map in Haskell with an effect typed map in another language. You can functionally encode a version of map which is able to produce side-effects in a functional language like Haskell, and such a thing will have a more complicated type. In fact, giving a type to such a thing that's similarly precise to the DDC effect type will probably require (or at least beg for) dependent types and end up looking close to that DDC monstrosity.

Stated another way, you can do more with that effect typed version. Currently, in Haskell, if you want to add a randomized algorithm or state to an otherwise pure numerical computation, that requires changing the type of everything on the way down. I don't see much good reason not to add this level of parameterization. The algebraic laws involved don't really change, so long as you work modulo the state changes you don't care about. If the side effects are relevant, then the algebra is more complicated, but that's unavoidable.
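
For reference, these are the two Haskell signatures usually being contrasted in this discussion ('mapM' shown specialized to lists):

  map  ::            (a ->   b) -> [a] ->   [b]
  mapM :: Monad m => (a -> m b) -> [a] -> m [b]

The monadic version is the more general one: instantiating 'm' to Identity recovers the pure behavior, which is the sense in which "you can do more" with the effect-capable version.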

Sure it's fair!

It's not fair to compare the type of a pure map in Haskell with an effect typed map in another language.

It is absolutely a fair comparison given an appropriate context. I certainly agree with your point about encoding a function of similar capability in Haskell, but it's irrelevant if one rarely if ever needs that degree of generality. I think it should be uncontroversial that A is better than B in context C if it is better for the vast majority of cases in C and sufficient for the rest. Substitute whatever you want for C here; web programming, desktop applications, systems programming, embedded applications, et cetera. My suspicion is that the tradeoffs of effect types are more likely to be acceptable in the latter two than the former two.

And here's the type for

And here's the type for 'map' in DDC: [...]

IIRC, the DDC map signature is completely inferred, so you never have to supply that incredibly detailed effect typing. I think that's an important point. The only functions that will need an explicit effect typing are the ones where you need to exert that control.

Even if you never *must*

Even if you never *must* write it, you'll certainly want to! I consider the fact that a language like DDC may actually discourage this to be a major problem.

It is considered very important to give types for all top level functions in a language like Haskell. Not only does it ensure that your assumptions about the types involved are correct, but it also makes type errors significantly easier to track down. Given that functions in DDC have types that are significantly more complicated and exhibit far more polymorphism than in Haskell, I can only imagine the difficulty one would have diagnosing type errors in a program that gave only minimal annotations.

Partial Annotations

DDC allows you to provide incomplete type annotations, and it will infer the remaining bits. I.e. you can annotate the 'map' function exactly as you might in Haskell, and both the regions and the effects will be inferred. See Effect Systems # Behind the Scenes.

Sure, but that doesn't help

Sure, but that doesn't help for the inevitable type error due to something involving regions and/or effects. At that point, you'll likely have no choice but to sift through large, inferred annotations or to begin writing them down yourself anyway.

Perhaps my recollection is

Perhaps my recollection is incorrect, but the only type errors due to regions and/or effects should result from using functions where effects are manually specified, not automatically inferred. Any higher order functions would be inferred as parametric in their effects.

If we're in the domain of manual specification, we're already paying the cost of specifying effect annotations, so there is no additional cost.
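
A small Haskell illustration of what "parametric in their effects" means ('eachPair' is a made-up example): the effect is a type parameter, so pure and effectful callers share one definition rather than needing two versions.

  import Data.Functor.Identity (Identity (..))

  -- The effect 'f' is a parameter; the caller chooses it.
  eachPair :: Applicative f => (a -> f b) -> (a, a) -> f (b, b)
  eachPair g (x, y) = (,) <$> g x <*> g y

  -- pure use:       runIdentity (eachPair (Identity . (+ 1)) (1, 2))
  -- effectful use:  eachPair print ("left", "right") :: IO ((), ())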

I'm not sure how this could

I'm not sure how this could be the case. If you or someone else specifies constraints somewhere (e.g. the function passed to another function must not mutate its third argument), and then violate them elsewhere, you're going to get an error. If none of your functions have annotations, it will likely be difficult to track down. In the presence of transactional code or parallelism, I would expect quite a lot of constraints would need to be enforced for programs to behave sensibly.

If you have the effect system and don't use it, then sure, you won't run into errors. Of course, at that point, there's no reason to have it at all.

If you or someone else

If you or someone else specifies constraints somewhere (e.g. the function passed to another function must not mutate its third argument), and then violate them elsewhere, you're going to get an error.

I agree that I'd like effect annotations to be more readable, but I don't think there's any hard data suggesting that they are completely unmanageable, as you claimed.

MLKit is the only compiler I'm aware of that has been using regions with inference for years, and their experience suggests they need to inspect and specify region annotations sometimes, and that this is often unwieldy. However, this is only because they use regions for memory management, not merely effect analysis as DDC does. There's no reason to believe the same problems transfer as a result.

If you have the effect system and don't use it, then sure, you won't run into errors. Of course, at that point, there's no reason to have it at all.

The reason for it is so the compiler knows what transformations are legal at any given program point without you having to duplicate code with different algebraic properties, i.e. map/mapM. The compiler can now duplicate map for you if needed.

I agree that I'd like effect

I agree that I'd like effect annotations to be more readable, but I don't think there's any hard data suggesting that they are completely unmanageable, as you claimed.

I never made any such claim. In fact, I explicitly said I think they have a good chance of working well in certain domains. My concerns about the tradeoffs of rich effect systems for high-level languages come from my attempts to apply them to that specific context.

The reason for it is so the compiler knows what transformations are legal at any given program point without you having to duplicate code with different algebraic properties, i.e. map/mapM.

I'll clarify then. There's no reason to expose it to the programmer if the programmer isn't going to actively make use of it and all types will be inferred. If the programmer is going to actively use it, it needs to be sufficiently easy to use and worth using for a particular application.

A simple answer...

giuseppe, the main reason programmers enjoy purity was mentioned in the first reply.

"The importance of pure computation in the eyes of the programming community is closely tied to reasoning about program behavior." (italics added)

In layman's terms, just knowing that a function call does not produce side-effects (changes to anything outside it) means that all of the "behavior" of a function is completely encapsulated within the function. This allows us to consider a function as "an island unto itself", rather than as a system that interacts with other parts of the program. Being independent, it is much simpler to wrap one's head around the function's definition.

Also, as the first reply mentioned, because we can reason about the function's definition purely by itself, it is easier for programs that reason about code to perform their reasoning tasks.

Ok, thanks everyone for the

Ok, thanks everyone for the valuable input!

Advantages of purity - security

In case anyone is still reading this thread (it is on the front page of LtU at the moment!), I have co-authored a paper on the advantages of purity to security. Purity helps programmers reason about, and/or enforce, certain security properties. Here is the paper:

Verifiable Functional Purity in Java.

RE: Purity and Security

It is true that purity encompasses properties such as confinement and determinism, which are useful when reasoning about security.

But purity also constrains expression, often more so than is necessary or desirable for secure and robust composition of services. In general, I would want to reason about security using weaker properties than full purity. Ideally, my language or environment would support me in this reasoning.

By assuming an object capability language for the environment, you may have assumed conditions under which nothing weaker than purity will suffice for reasoning about certain properties: in particular, achieving confinement of untrusted code for information assurance in an object-capability system is tricky unless you construct the code yourself from some sort of pure, readily audited source. (The audit is necessary to verify the source contains no unapproved capabilities.) An alternate environment - i.e. one supporting dataflow controls influencing the host-level communications - may allow one to perform such reasoning without constraining the code to purity.

By assuming a deterministic capability language, you have similarly made it easier to introduce 'purity' without requiring many other dedicated language features. But you'll pay for that determinism when it comes to useful composition of services and distribution of untrusted code. You can't have determinism if you have distribution, for example; the two are generally incompatible due to potential for disruption.

The argument of purity for security likely depends heavily upon which language or environment you assume, and the precision with which other properties - such as dataflow, aliasing, communication - can be controlled.