Practical rules for controlling program effects in an imperative / OOP environment.

I posted this article at reddit, and would like to hear constructive feedback here at LTU.

Thanks!


Managing concurrency effects is enormously challenging

Managing concurrency effects is enormously challenging. Monotonicity and quasi-concurrency can help.

Properties of Effects

When talking about concurrency, the main property of effects that I care about is commutativity. With concurrency, ordering is of course arbitrary—so you want to know whether different orderings will have different outcomes, right? If you can say definitively “these effects can be reordered arbitrarily and nobody will be the wiser” then that frees you up to do some really spectacular stuff.

“Oh yeah, reads of the same heap are commutative, whereas writes aren’t; but writes to different heaps are commutative…” And then you naturally arrive at principled descriptions of synchronisation, memory barriers, and so on. Effect systems and concurrency go really smashingly well together, and I’ve got some great stuff planned in that department for Kitten.
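Concretely, a rough C++ sketch of that reordering intuition (the heapA/heapB names are just illustrative):

    #include <cassert>

    int main() {
        int heapA = 1, heapB = 2;

        // Two reads of the same location commute: either order gives the same values.
        int r1 = heapA;
        int r2 = heapA;      // swapping these two lines changes nothing observable
        assert(r1 == r2);

        // Two writes to the same location do not commute: the result depends on order.
        heapA = 10;
        heapA = 20;          // swapping these two lines would leave heapA == 10

        // Writes to *different* locations commute with each other.
        heapA = 30;
        heapB = 40;          // swapping these two lines changes nothing observable

        assert(heapA == 30 && heapB == 40);
        return 0;
    }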

Effect systems and concurrency go smashingly well together

I agree, and I'll share a bit of my recent experience.

I'm currently developing Awelon Bytecode (ABC) (a tacit concatenative bytecode based on arrows rather than stacks) and one of my interesting decisions is to enforce causal commutativity via the effects model. This supports a high degree of optimization and implicit parallelism.

It turns out that even imperative effects are easily tamed to work within the constraints of causal commutativity. It requires two more simple features: substructural types and capability security, which ABC also supports. Most objects are modeled as linear capabilities that, upon receiving a message, return a new object along with any results. A few special objects can model collaborative writes or communication based on commutativity or monotonicity. Operations on multiple objects can run in parallel, and combining results from multiple objects becomes a form of implicit synchronization.
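For illustration only, here is a rough C++ analogue (not ABC) of that linear-object discipline, using an invented move-only Counter type: each "message" consumes the old handle and returns the next state along with a result.

    #include <utility>

    // A move-only handle standing in for a linear capability: you must consume
    // the old handle to get the new one, so there is exactly one live reference.
    class Counter {
        int value_;
    public:
        explicit Counter(int v) : value_(v) {}
        Counter(const Counter&) = delete;              // no aliasing
        Counter& operator=(const Counter&) = delete;
        Counter(Counter&&) = default;
        Counter& operator=(Counter&&) = default;

        // "Send a message": consumes this object, returns the next state plus a result.
        std::pair<Counter, int> increment() && {
            return { Counter(value_ + 1), value_ };
        }
    };

    int main() {
        Counter c0(0);
        auto [c1, old0] = std::move(c0).increment();   // c0 must not be used after this
        auto [c2, old1] = std::move(c1).increment();
        (void)c2;
        return (old0 == 0 && old1 == 1) ? 0 : 1;
    }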

In addition to causal commutativity, another nice property to enforce for an effects model is spatial idempotence. I prefer that idempotence refer to its original mathematical meaning of `f(x) = f(f(x))` (or `f = f f` in concatenative languages). By 'spatial idempotence' I refer to the common variation intended in most CS contexts, e.g. `do { r1 ← f(x); r2 ← f(x); return (r1,r2) } = do { r ← f(x); return (r,r) }` (or, roughly, `f dup = dup [f] dip f`). It turns out that enforcing spatial idempotence doesn't require much extra effort above causal commutativity. Mostly, this impacts how we create 'new' stateful objects, requiring we formally 'fork' an existing unique structure.

If we enforce causal commutativity and spatial idempotence pervasively, we get all the nice, declarative equational reasoning properties associated with pure functional programming... without any of the limitations with respect to open systems and decomposing applications into services. It's a very nice sweet spot in the PL design space. :)

ABC enforces these properties at the bytecode layer. Presumably, I could also model monad transformers or algebraic effects or staged DSLs. Many monads require programmers to reason carefully about the order and replication of effects, and hence are very imperative in nature. But I hypothesize that having causal commutativity and spatial idempotence at the lower layer will greatly improve the implicit concurrency within ad-hoc monads and other structures.

Managing concurrency effects is enormously easy

As long as you have the right programming model, or maybe I should call it "how I learned to stop worrying and love the mutation."

The advice in the reddit post is completely anecdotal; it seems like you could easily argue against each point, and the real "truthiness" is not clear.

Some devilish advocacy

  1. Use functions instead of methods to define program behavior.
    A method is no more than a function with an implicit “self” parameter, and it’s an enormous convenience to have the state of a particular object available when writing code concerning the class of that object, so that you can describe the general behaviour in terms of any given concrete example.
  2. Use mutation for program transformation only, and immutability everywhere else.
    An object is an actor in a dynamic system; if it were a simple value, then we wouldn’t need to bind data and behaviour together into an object in the first place.
  3. ‘constify’ arguments where possible—A F(const B& b, C& c)
    Having immutability enforced by the compiler is at best a minor convenience; if you write well-factored code, then no mutations can hide from you in a long method, so it suffices to make single-assignment a convention only. Also, explicit immutability means I can easily forget to make something immutable.
  4. Use naming conventions for nullable reference semantics - var optThing = getOptThing().
    Rather than use Hungarian notation, the compiler might as well check it for me.
  5. Use interface inheritance only as a substitute for pattern-matching / protocols, except when external libraries force it upon you.
    Interface inheritance is the OOP way of expressing compliance with a protocol.
  6. Do not use implementation inheritance except when external libraries force it upon you.
    Why not? What if I want to mix some common behaviour into other classes, and that behaviour requires certain state? Moreover, what if I want to do that multiple times? Would you also say that multiple inheritance should be banned?
  7. Do not use mutable globals or ‘singletons’.
    But some things truly are global, and mutable, and there is logically only ever one of them at a time in an instance of my application.
  8. Avoid mixing side-effects with lazily-implemented algorithms (such as with C# Linq).
    Why? If I have a lazy algorithm, then I expect to do work lazily, full stop, regardless of whether that work involves side effects.
  9. Use queues for inherently (i.e., not just circumstantially) side-effecting program operations.
    What does this gain me when I can simply run the effects I mean to run, when I mean to run them?
  10. Use Static Single Assignment for local primitive variables.
    Isn’t that the job of a compiler to do for optimisation purposes?

I'm not the author, but I

I'm not the author, but I believe I can address your questions:

A method is no more than a function with an implicit “self” parameter, and it’s an enormous convenience to have the state of a particular object available when writing code concerning the class of that object, so that you can describe the general behaviour in terms of any given concrete example.

I believe "method" is intended as "stateful procedure" and "function" is defined as "pure". These seem to be the typical meanings of these terms, and it makes the principle sound.

if [an object] were a simple value, then we wouldn’t need to bind data and behaviour together into an object in the first place.

I don't think that's necessarily true. You might only want to expose part of some piece of encapsulated data. The adapter pattern applies to immutable objects too, for instance.

Why not? What if I want to mix some common behaviour into other classes, and that behaviour requires certain state? Moreover, what if I want to do that multiple times?

Parameterization is simpler and more flexible than inheritance, always. Use interfaces to define modular signatures, and that's all you really need.
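For what it's worth, a rough C++ sketch of that (all class names invented): the shared behaviour and its state live in their own class, and anything that wants it, even twice, just takes it as a parameter.

    #include <iostream>
    #include <string>

    // The "common behaviour that requires certain state", as its own object.
    class Logger {
        std::string prefix_;
    public:
        explicit Logger(std::string prefix) : prefix_(std::move(prefix)) {}
        void log(const std::string& msg) { std::cout << prefix_ << msg << '\n'; }
    };

    // Instead of inheriting from Logger (once or multiply), take loggers as parameters.
    class OrderService {
        Logger& audit_;
        Logger& debug_;   // "mixing it in" twice is just two parameters
    public:
        OrderService(Logger& audit, Logger& debug) : audit_(audit), debug_(debug) {}
        void placeOrder() {
            debug_.log("placing order");
            audit_.log("order placed");
        }
    };

    int main() {
        Logger audit("[audit] "), debug("[debug] ");
        OrderService svc(audit, debug);
        svc.placeOrder();
        return 0;
    }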

Would you also say that multiple inheritance should be banned?

Yes.

But some things truly are global, and mutable, and there is logically only ever one of them at a time in an instance of my application.

Until you need to unit test, or you want to run multiple instances of your program in isolated sandboxes. Global mutable state is not composable and is difficult to reason about.

Why? If I have a lazy algorithm, then I expect to do work lazily fullstop, regardless of whether that work involves side effects.

Absent type and effect systems, or some principled replacement like iteratees, programs written with laziness + effects are not composable.
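A hedged C++20 analogue of the C# Linq pitfall (the container and lambda here are just an example): the side effects run zero times if the view is never walked, and run again on every walk, so seemingly harmless refactorings change which effects happen and how often.

    #include <iostream>
    #include <ranges>
    #include <vector>

    int main() {
        std::vector<int> xs{1, 2, 3};

        // Lazily mapped sequence; the lambda's side effect is deferred.
        auto doubled = xs | std::views::transform([](int x) {
            std::cout << "effect for " << x << '\n';   // hidden side effect
            return x * 2;
        });

        std::cout << "nothing has happened yet\n";      // no effects so far

        for (int v : doubled) (void)v;                  // effects run here...
        for (int v : doubled) (void)v;                  // ...and all run a second time here
        return 0;
    }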

Isn’t that the job of a compiler to do for optimisation purposes?

This principle is about robustness in the face of refactoring. If you're reusing the same variable multiple times via assignment, it's very easy to change the meaning of code even when you're performing meaning-preserving, local code transformations. If your variables are all single-assignment, this can never happen.
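A small C++ sketch of that point (the functions are invented for illustration): with a reused mutable local, moving one line silently changes a result; with single-assignment locals, the same reordering is either harmless or rejected by the compiler.

    #include <cassert>

    int reusedVariable(int input) {
        int x = input * 2;
        int a = x + 1;       // meant to see the doubled value
        x = x - 1;           // later someone reuses x...
        int b = x + 1;
        // Moving "x = x - 1;" above "int a = x + 1;" silently changes a.
        return a + b;
    }

    int singleAssignment(int input) {
        const int doubled   = input * 2;
        const int a         = doubled + 1;
        const int decreased = doubled - 1;   // a new name instead of reusing doubled
        const int b         = decreased + 1;
        // Reordering these lines either changes nothing or fails to compile
        // (a name cannot be used before its definition).
        return a + b;
    }

    int main() {
        assert(reusedVariable(3) == (3 * 2 + 1) + (3 * 2 - 1 + 1));
        assert(singleAssignment(3) == reusedVariable(3));
        return 0;
    }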

For the record

I don’t agree with most of the points that I argued. Just offering the alternative views.

some things truly are

some things truly are global, and mutable

Global mutables - in the sense of being non-local, external to the application - are fine. But ambient authority to access those globals is problematic. If globals are accessed in a capability-secure manner, we can greatly improve testability with mockups, portability, extensibility, and securability.

enormous convenience to have the state of a particular object available when writing code concerning the class of that object

It isn't clear to me that local, aliasable state is a good thing. Local state seems a de-facto aspect of OO, but I think we can get most benefits of OO - plus better properties for extensibility and runtime upgrade - if we instead securely encapsulate bindings to external state.

An object is an actor in a dynamic system; if it were a simple value, then we wouldn’t need to bind data and behaviour together

Indeed, that's the point: we don't need to bind data and behavior together. There are other means to model dynamic systems.

Interface inheritance is the OOP way of expressing compliance with a protocol.

Consider an alternative: instead of inheriting an interface, create a function that wraps a concrete object with an interface. This is another way of expressing compliance with a protocol. It's also better for separation of concerns.
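Roughly, in C++ terms (the Printable/Point names are invented for the example): the concrete type never hears about the protocol, and an adapter function produces the interface view on demand.

    #include <memory>
    #include <string>

    // The protocol, as an interface.
    struct Printable {
        virtual ~Printable() = default;
        virtual std::string render() const = 0;
    };

    // A concrete type that knows nothing about Printable.
    struct Point { int x, y; };

    // Compliance with the protocol is expressed by a wrapping function,
    // not by making Point inherit from Printable.
    std::unique_ptr<Printable> asPrintable(Point p) {
        struct Adapter : Printable {
            Point p;
            explicit Adapter(Point p) : p(p) {}
            std::string render() const override {
                return "(" + std::to_string(p.x) + ", " + std::to_string(p.y) + ")";
            }
        };
        return std::make_unique<Adapter>(p);
    }

    int main() {
        auto printable = asPrintable(Point{1, 2});
        return printable->render() == "(1, 2)" ? 0 : 1;
    }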

What if I want to mix some common behaviour into other classes

Perhaps if we have a proper 'traits' model - i.e. flat, commutative, stateless mixins - then this wouldn't be so problematic. Traits offer an interesting means to compose complete objects from 'partial' object concepts. I also think they'd work very well together with statically computed constraint models.

But, historically, most approaches to implementation inheritance and multiple inheritance are deeply flawed and should be avoided if possible.
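For what it's worth, a very rough C++ approximation of "flat, stateless mixins" (invented names; real trait systems also detect conflicts, which this sketch does not): each mixin adds behaviour in terms of an operation the host exposes, carries no state, and the composition order doesn't matter.

    #include <iostream>

    // Stateless mixins: each adds behaviour expressed via the host's value().
    template <class Host>
    struct Doubler {
        int twice() const { return 2 * static_cast<const Host&>(*this).value(); }
    };

    template <class Host>
    struct Printer {
        void print() const { std::cout << static_cast<const Host&>(*this).value() << '\n'; }
    };

    // The host composes the mixins; because they are flat and stateless,
    // listing Printer before Doubler would behave identically.
    class Cell : public Doubler<Cell>, public Printer<Cell> {
        int v_;
    public:
        explicit Cell(int v) : v_(v) {}
        int value() const { return v_; }
    };

    int main() {
        Cell c(21);
        c.print();                        // prints 21
        return c.twice() == 42 ? 0 : 1;
    }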

I can simply run the effects I mean to run, when I mean to run them

Right. That will be simple when computers do what I mean instead of what I say.

Isn’t [SSA] the job of a compiler to do for optimisation purposes?

I use this pattern, not for optimization but because it's easier for me to reason about code when variables within a function are constant. Of course, I prefer to avoid variables entirely and code in a point-free style. But when I'm stuck writing Java or C++ code, I use SSA.

the compiler might as well check it for me

That's a good philosophy wherever it's feasible. :)

Self-discipline is a limited resource, and never is it evenly distributed in a team.
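Concretely, where the language allows it, the 'opt' prefix from rule 4 can become an actual type, e.g. std::optional in C++17 (a sketch reusing the getOptThing name from the list):

    #include <optional>
    #include <string>

    // The possibility of "no thing" is visible in the type and must be unwrapped explicitly.
    std::optional<std::string> getOptThing();

    int useThing() {
        if (auto thing = getOptThing()) {      // check before use
            return static_cast<int>(thing->size());
        }
        return 0;
    }

    // Stub definition so the sketch stands alone.
    std::optional<std::string> getOptThing() { return std::nullopt; }

    int main() { return useThing(); }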

Most things others have

Most things have likely been addressed better by others than I could manage, but I will address the places where you suggest the compiler should handle things.

Remember that these are notes on how to survive in a typical imperative / OOP environment where a language like C++, C#, or Java is forced upon you. AFAIK, features like SSA enforcement are not available in these languages.

What are concurrency glitches in "Glitch" beyond overclaiming ;-)

What are the concurrency glitches in "Glitch" beyond overclaiming ;-)

PS. Is the full paper on "Glitch" accessible?

I am definitely ambitious in

I am definitely ambitious in my claims; whether or not I'm overclaiming remains to be seen :)

I haven't written a full paper on Glitch yet, and I'm still missing an important feature with respect to bringing time inside. It's next on my list of things to do/write.

I'd only like to point out that there are plenty of ways to fix mutable state beyond avoiding it or locking it behind an actor.

What sort of feedback do you want?

The ten rules you offer seem a bit arbitrary in subject matter and order. (I find rule ten grating: it amounts to praying that your peers let you follow the other rules. It exaggerates the uneven quality of the listed items.)

Is there a language-oriented aspect you want to emphasize? Usually that's what folks here find interesting. I have trouble finding a theme beyond managing mutation. Yes, it's a good idea to identify immutable parameters, to avoid mutation, and to serialize change when avoiding it is not possible.

Avoiding globals is a good idea. In practice this is rarely done, because some form of global state is often necessary, representing whatever an app manages to learn while running. To avoid global variables, you have to pass an explicit parameter representing global environment state to every method, which can be very inconvenient. If you mix libraries from different sources, it's infeasible to get them all to put state in the same environment representation. (If every library abstracted this, it wouldn't be a problem, but almost no one thinks about abstracting that.)

A language might help organize state management with a scope abstraction, but it might not be useful unless it's standard, and finding the magic one-size-fits-all solution would be hard.

I replaced rule 10 with SSA.

I replaced rule 10 with SSA.

pro global state

well, I seem to recall D. Barbour having posted some thoughts now and then (an example) about how the concern over globals is possibly a big red herring. At least if one can unravel the sweater of assumptions and history and habit enough. I don't grok it yet myself, of course.

globals as dependency injection via interfaces

Global state always seems necessary, so being anti-global translates poorly into a direct, productive result. This is an instance of a larger category of good practice covered by, "If possible, don't impose your decisions on others, forcing them to cope, instead of giving them independence." Below I amplify what I mean by avoiding globals: deferring the choice of what is global to whoever owns resources in the app environment. Alan Kay would call this strategy "late binding", which applies to a lot of things. Depending on something too early limits possible uses, which Matt Fenwick, below, correctly calls coupling, which devs in the olden days (80's and 90's) used to hope could be solved by component architectures inspired by OO-style coding. Dependency injection is a way to late-bind, so callers can pass in the dependencies they prefer, instead of using those you force on them willy-nilly.

Here's a fictional dialog where Stu burns Jim with early decisions about globals:

     "When I init your library a second time, it breaks," Jim complained.
     "Of course," Stu nodded. "My globals get initialized the first time, so it's dumb to do it again. User error: don't do that."
     "I'm trying to simulate two instances of something in one OS process," Jim explained, "so I can debug without actually doing all my testing with multiple processes. I'm guessing you don't support that."
     "Nope, can't do that," Stu shook his head. "Only one copy of my runtime is needed at once, so it's silly to have two. Use multiple processes."
     Jim sighed. "Well that makes life more complex."

Wil handles globals differently, letting Jim decide, as shown in this dialog:

     "Does your library have globals?" Jim asked despondently.
     "Sort of," Wil considered. "in the sense each runtime instance has state global to the instance. But space is provided by you, wherever you want. The library has no mutable globals declared statically inside, only const static data like string names and lookup tables. It's a bit object-oriented in the sense you construct a runtime instance of a yard where all mutable state goes."
     "Then I can have more than one yard instance in my process?" Jim anticipated.
     "Sure, as many as you want," Wil nodded. "Each call to the library takes a yard parameter, either directly or indirectly. They won't interfere with each other, unless you confuse your own traffic with them."
     "Won't they interfere in calls to use global resources?" Jim suspected.
     "No," Wil smiled, "because the only way each yard accesses the outside world is through calls to an abstract env object you passed in when the yard was constructed. So if you supply each yard with a separate env instance, and you keep them outside of each other's hair, you should be good."
    "But now I'm responsible for avoiding conflict in my multiple env instances," Jim chewed his lip.
    "Al, Ed, and Ty have multiple env implementations you can use as a starting point," Wil suggested.

Then Jim can make space global if he wants, since Wil's code doesn't need to know. Or if Jim is making a library used by Ivy, and Jim wants to let Ivy decide how and where space is allocated, their dialog might go like this:

    "Did you commit me to your space decisions?" Ivy wondered.
    "Nope," Jim assured, "here's how you allocate space I use, but you also need to allocate space and the env object I pass into Wil's library. Now you're on the hook instead of me. Am I clever, or what."
    "Yes!" Ivy pumped one fist. "Here's a tip. Don't spend it all in one place."

Whoever has top-level environment responsibility decides how resources get used. There is still global state, but nested scopes should not need to know what is global, or how many instances of the runtime exist. This is what avoiding globals means: representing requirements as an interface until a final decision is made by an owner of the resources involved, who injects them as a dependency. This works when local scopes abstract their dependencies so they can be passed as parameters.
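A minimal C++ sketch of the pattern in the dialog (Env and Yard are the names from the dialog; the methods are invented): the library declares no mutable statics, every yard is constructed against an env the caller supplies, and every call threads a yard through.

    #include <string>
    #include <vector>

    // The abstract environment the caller injects; the library never reaches
    // for ambient globals, it only calls through this interface.
    struct Env {
        virtual ~Env() = default;
        virtual void writeLog(const std::string& line) = 0;
    };

    // A "yard": all of the runtime's mutable state, owned by the caller.
    class Yard {
        Env& env_;
        std::vector<std::string> state_;
    public:
        explicit Yard(Env& env) : env_(env) {}
        void record(const std::string& item) {
            state_.push_back(item);
            env_.writeLog("recorded: " + item);
        }
    };

    // A caller-supplied Env for tests: no real I/O, fully isolated.
    struct FakeEnv : Env {
        std::vector<std::string> lines;
        void writeLog(const std::string& line) override { lines.push_back(line); }
    };

    int main() {
        FakeEnv envA, envB;
        Yard yardA(envA), yardB(envB);   // two instances in one process, no interference
        yardA.record("hello");
        yardB.record("world");
        return (envA.lines.size() == 1 && envB.lines.size() == 1) ? 0 : 1;
    }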

Not Local State

I would suggest that all state be provided in the manner you describe: an external provider 'injects' the state like any dependency-injection model. And the decisions keep getting pushed 'upwards', as from Wil to Jim and then from Jim to Ivy.

This pattern could be protected if we remove ambient authority to create 'new' stateful objects (variables, actors, etc.), and instead require explicit capabilities that create a precise line-of-authority for binding securable partitions of external state.

I suppose this isn't the same as 'global state' in connotation. But it certainly isn't 'local'.

I generally agree with all

I generally agree with all the points made. Some of these principles already have established names, so you can use those names to give the points more weight.

For example, using naming conventions like 'opt' is an example of "(Systems) Hungarian Notation", a naming convention that acts like a poor man's type system. This is also useful when we have only one "String" type; for example, we can use "uSomeInput" and "sSomeInput" to denote unsafe (user-supplied) and safe (escaped) strings.

Unfortunately, HN often degrades to the lowest common denominator of the language's built-in types, e.g. "iFoo" for ints, "sFoo" for strings, "bFoo" for bools, etc., which is pointless since the language already enforces these types. A lot of people reject the idea of HN because they've only experienced this degenerate form.
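Echoing the "the compiler might as well check it for me" reply elsewhere in the thread: where it's practical, the uSomeInput/sSomeInput convention can be promoted to actual wrapper types (a rough sketch, with a toy escaping rule):

    #include <string>

    // Distinct wrapper types instead of the "u"/"s" prefixes.
    struct UnsafeString { std::string value; };     // raw user input
    struct SafeString   { std::string value; };     // escaped/validated

    // The only way to make a SafeString is to go through escaping.
    SafeString escape(const UnsafeString& in) {
        std::string out;
        for (char c : in.value) {
            if (c == '<' || c == '>') out += '_';   // toy escaping rule
            else out += c;
        }
        return SafeString{out};
    }

    // Can only be handed escaped input; passing raw input won't compile.
    void render(const SafeString&) { /* would emit the markup */ }

    int main() {
        UnsafeString user{"<script>"};
        SafeString safe = escape(user);
        render(safe);                               // fine
        // render(user);                            // would not compile: wrong type
        return safe.value == "_script_" ? 0 : 1;
    }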

Another point I'd add is to follow Static Single Assignment for local variables rather than re-using them, since it makes inserting new statements easier later.

A point I'd make which is aimed more at scripting languages than at C#-style languages is to aim for high locality, which is basically encapsulation enforced by lexical scope: declare your variables (especially functions and classes) in the tightest scope possible. I've worked on many codebases which enforce all kinds of OO encapsulation guidelines in the server-side code, but then make everything global in their Javascript :S

SSA is a rule I've been

SSA is a rule I've been following, but forgot to mention on the list! I was pretty sure I'd miss something, though :)

Will add!


Interesting point about

Interesting point about lexical scoping of artifacts in scripting languages. I'll think about that some more!

Good to hear "coupling"

Good to hear "coupling" getting some mention.

In my field (scientific software), coupling is one of the biggest problems we face. We (scientists) need modular and flexible software, because we're continually finding and studying novel phenomena, and we don't have time to keep rewriting from the ground up (plus we're supposed to be doing science, not software!). But coupling absolutely kills us.

It's amazing how it shows up at pretty much every level, from the lowest to the highest, for instance:

  • singletons, instead of normal objects
  • free variables instead of parameters
  • over-engineered class hierarchies
  • custom input/output formats
  • presence of specific versions of libraries, databases, 3rd-party programs (but not included with the program)
  • running only under a specific shell
  • all kinds of crap in environment variables
  • compilers, global DLLs, builds
  • sensitivity to architecture and OS version

Perhaps coupling is just extremely hard to avoid.

I don't know if you have any control over the formatting, but it could be a lot easier to read, just with some simple changes.

In response to your addendum, I've sometimes wondered if there is more than one brand of OOP: the one that talented programmers employ, and the one that less-talented programmers employ (and is taught in schools, explained on everybody's blog, etc.). Perhaps OOP is only oversold by those who have an incomplete/flawed grasp of it? Perhaps OOP's failure is only that it became popular, and was incorrectly learned and applied? Perhaps any other paradigm, if it became popular, would be just as misunderstood by those who don't "get it", and hated by those who do "get it" for being misunderstood? Just a thought.

Yes, every time I bring up

Yes, every time I bring up OOP, I fear that I miscommunicate because there are so many different OOPs :)

But yes, here I talk about the OOP that is most commonly used in C-derived languages.

custom input/output

custom input/output formats

What sorts of formats are common in your domain?

Broad spectrum of formats

I'm not sure I understand your question but I'll give it a go (please let me know if I misunderstood):

lots and lots of textual formats, including CSV, pseudo-CSV (looks like CSV but with more special cases), XML, macro-like, Lisp-ish, and a whole host of others, along with binary formats (that come with custom-formatted text files holding metadata, although some store the metadata in the binary file as well).

Well, you mentioned a

Well, you mentioned a plethora of input/output formats, so I was wondering what kind they are, and if there's any effort to standardize on a few, or perhaps if someone's made the effort to create a standard library for importing from these various formats; there are probably too many though.

I figured XML and CSV would be common, and unfortunately expected pseudo-CSV.

Yes, but with more coupling

While there have been standardization efforts, this is usually the result (part of the problem may be licensing issues and loss of grant funding leading to abandonware -- tools that are old, but useful, which can't be properly maintained).

There was a nice effort to provide universal translation. They decided to translate by importing/exporting everything using a common model, but unfortunately made it an all-or-nothing approach (i.e. for the 80% that is within their model it works fine, but for the 20% on the outside, it is no help at all).

Also, the parsers/serializers are coupled to the model. Result: my valid data (output of another tool) can't be translated with that tool, and I also can't use their parser to load it into my own model.
