Write tracking for Nimrod

Hello,

I developed an algorithm to compute a function's "write set":

http://nimrod-code.org/blog/writetracking.html

While the algorithm lacks some details, I'm quite sure the analysis is sound, at least for Nimrod's case. However I wonder if I re-invented some existing algorithm (wouldn't be the first time) or if it's truly novel work.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

You need to track reads too

"Here the effect systems shows its strength: Imagine there was a genId2 that writes to some other global variable; then genId and genId2 can be executed in parallel even though they are not free of side effects!"

I think for this to be true you also need to track which variables/locations are read and make sure there's no read/write dependency between the procedures.

Yes, a good point. Read

Yes, a good point. Read tracking is a very similar algorithm, but I fear the memory overhead of keeping the read sets around and so I plan to use a simpler alternative for reads: We can assume the transitive closure of everything reachable from the parameters is read and in addtion to that detect/track read accesses to any global variable.

Worried

The examples you give are very simple (write to an easily statically-determined global or local variable) while the trouble are most likely to happen in more difficult examples (write to a dynamically-computed index of a global array, merging information from conditionals, etc.). The questions are whether your analysis can scale to these harder examples, and what proportion of the use cases is covered by what it can do.

I must say that the blog post left me worried. If I had to hazard a guess, it would be that the analysis is too naive in practice and won't work on actual code.
The question then become: how do you think of your analysis? If it's just one optimization in a framework that is expected to only work in some cases, it's ok if you're very simple (or even a bit heuristic in detecting slightly more complicated cases by making semi-arbitrary choices). If you think of it as semantics that will be advertized to your users, which they will be aware of and expect to work, then you absolutely need stronger validation of your idea: running the analysis on actual code to see how well it performs.

The problem with effect systems is that they tend to grow in monstruosities. You're happy to just track write sets today, but to cover this new case you had to add "new" as a concept to learn something about aliasing. Tomorrow you'll want finer-grained information about aliasing in the types, and you'll start thinking about linear types, explicit aliasing/nonaliasing proofs, etc.

I agree

This rule is particularly coarse:

v = f(path(a), ..., path(b), ...) # unless 'f' is 'new' (see below)
-->
writeset(v).incl(a) iff 'v' may alias 'path(a)'
writeset(v).incl(b) iff 'v' may alias 'path(b)'

If this write-set mechanism isn't stable under refactoring code into functions, that will be a big annoyance in practice, I think.

If this write-set mechanism

If this write-set mechanism isn't stable under refactoring code into functions, that will be a big annoyance in practice, I think.

A valid concern but my feeling is that it's not a problem in practice as the 'may alias' condition is rarely true.

Maybe I don't understand

Maybe I don't understand what you're proposing, but you wrote:

A path is an lvalue expression like obj.x[i].y.

So if you define function at(a, i) = a[i] and then write at(obj.x,i).y, is that treated differently than the above path? If so, I think it will be annoying. EDIT: Fixed bug

The examples you give are

The examples you give are very simple (write to an easily statically-determined global or local variable) while the trouble are most likely to happen in more difficult examples (write to a dynamically-computed index of a global array, merging information from conditionals, etc.). The questions are whether your analysis can scale to these harder examples, and what proportion of the use cases is covered by what it can do.

The algorithm is simple, but not naive. The "toy examples" in the article enumerate many important/non-trivial aspect of the language (apart from indirect recursion and the handling of closures) wrt write tracking. The algorithm is flow insensitive so there is no fundamental difference between 'x = f(a, b)' and 'x = if cond: a else: b'. Writes to a dynamically-computed index of a global array are irrelevant as then the whole array is the owner.

then you absolutely need stronger validation of your idea: running the analysis on actual code to see how well it performs.

Well I checked lots of example programs, they all "work" good enough to warrant an implementation. So yes, I'll implement it and we'll see how well it works on the compiler's source code.

The problem with effect systems is that they tend to grow in monstruosities.

As opposed to what? Dependent typing systems? You manage to worry about the naiveté of the algorithm and about the complexity of effect systems at the same time.

You're happy to just track write sets today, but to cover this new case you had to add "new" as a concept to learn something about aliasing. Tomorrow you'll want finer-grained information about aliasing in the types, and you'll start thinking about linear types, explicit aliasing/nonaliasing proofs, etc.

The algorithm already depends on a value- and type-based alias analysis and "new" is an interesting property in its own right IMO...