Nested Schedulers: A Tree Is Not Enough?

In a recent discussion, naasking mentioned the nested schedulers of Manticore.
Incidentally, I am currently interested in exactly this topic.
I am, however, confused by the fact that most nested-scheduling papers repeatedly stress the point that schedulers form a tree structure.
Indeed, a tree seemed a natural idea initially, but after playing with toy implementations I quickly began to suspect that this is too strong a constraint.

Specifically, in the face of synchronization between threads, it looks logical for a blocked thread to sponsor the progress of the thread that blocks it (let's call this "helping"; it differs from "stealing" in cardinality - more than one thread may help a given thread, while only one can steal, and the act of stealing usually transfers ownership). Yes, this may interfere with caching and processor affinity, but even just to evaluate the negative effects of the feature we have to model it.
As soon as we introduce helping, a tree no longer looks adequate. In fact, I suspect even a DAG of schedulers may be too restrictive. A cycle in the graph of helping does not necessarily mean a deadlock: first, not all helping edges are induced by a block, and second, the sponsored CPU ticks may flow out of the cycle and thus further the progress.
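
To make this concrete, here is a toy sketch of what I mean by helping, in Haskell; the Thread type, the thread table and the round-robin scheduler below are all invented for illustration, and are not taken from Manticore or any other system:

```haskell
import qualified Data.Map as M

type Tid = Int

-- A thread either finishes, does one step of work, or blocks on another thread.
data Thread = Done
            | Step Thread            -- one tick of work, then the rest
            | BlockedOn Tid Thread   -- cannot proceed until the given thread is Done

-- Spend one tick on a thread; if it is blocked, donate the tick to the blocking
-- thread ("helping") instead of wasting it.
tick :: M.Map Tid Thread -> Tid -> M.Map Tid Thread
tick ts = go []
  where
    go seen t
      | t `elem` seen = ts                       -- helping ran into a cycle: give up this tick
      | otherwise = case M.lookup t ts of
          Nothing              -> ts
          Just Done            -> ts
          Just (Step k)        -> M.insert t k ts
          Just (BlockedOn p k) -> case M.lookup p ts of
            Just Done -> M.insert t k ts         -- dependency satisfied, unblock
            Just _    -> go (t : seen) p         -- help: spend our tick on the blocker
            Nothing   -> ts

-- Round-robin: one tick per thread per round.  (A true deadlock makes this spin
-- forever; deadlock detection is outside the scope of the sketch.)
schedule :: M.Map Tid Thread -> M.Map Tid Thread
schedule ts
  | all isDone (M.elems ts) = ts
  | otherwise               = schedule (foldl tick ts (M.keys ts))
  where isDone Done = True
        isDone _    = False
```

With a producer that needs several ticks and a consumer blocked on it, the consumer's ticks now go to the producer instead of being wasted, and the consumer resumes as soon as the producer is done.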

To bring this closer to languages: what languages/calculi are known to support explicit CPU scheduling?
I know that (nested) engines can be implemented in any language with first-class continuations, but this approach is far from ideal (e.g., separately compiled or, in general, non-cooperating code cannot be preempted, as I understand it). The same limitations apply to any scheme that relies on "upwards" cooperation (a timely yield) - e.g., coroutines.
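
To illustrate the "upwards" cooperation limitation, here is a resumption-style rendering of an engine in Haskell (the Res type and runFor are mine, invented for illustration, rather than the call/cc-based engines proper): the computation has to expose its own steps, so separately compiled code that never offers a step simply cannot be preempted by this means.

```haskell
-- A computation that has either produced an answer or offers one more step.
data Res a = Answer a | More (Res a)

-- Run an engine for at most n ticks; return the answer or the remaining engine.
runFor :: Int -> Res a -> Either (Res a) a
runFor _ (Answer a)  = Right a
runFor n r | n <= 0  = Left r
runFor n (More r)    = runFor (n - 1) r
```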

Specifically, in the face of

Specifically, in the face of synchronization between threads, it looks logical for a blocked thread to sponsor the progress of the thread that blocks it

It sounds like the well-known problem of priority inversion. I believe it's generally considered bad design to have high-priority tasks waiting on low-priority tasks, so such an inversion should not actually happen. A phantom type on message passing to prevent such an inversion would be a neat application.
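
Off the top of my head, something like the following Haskell sketch (all the types and names here are invented for illustration): tag channels and processes with a phantom priority and only allow a blocking receive when the serving side's priority is at least the receiver's, so the offending shape is rejected at compile time.

```haskell
{-# LANGUAGE EmptyDataDecls, MultiParamTypeClasses #-}

data Lo   -- phantom priority levels
data Hi

newtype Chan p a = Chan [a]   -- a toy channel served by a process of priority p
newtype Proc p a = Proc a     -- a toy computation running at priority p

-- "AtLeast q p" means the server priority q is not below the client priority p.
class AtLeast q p
instance AtLeast Hi Hi
instance AtLeast Hi Lo
instance AtLeast Lo Lo
-- Deliberately no instance AtLeast Lo Hi: a Hi client cannot block on a Lo server.

-- A blocking receive is only typeable when the priorities line up.
recv :: AtLeast q p => Chan q a -> Proc p a
recv (Chan (x:_)) = Proc x
recv (Chan [])    = Proc (error "would block (toy sketch)")
```

So `recv someLoChan :: Proc Hi Int` is a type error, while a Lo client receiving from a Hi channel goes through.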

Perhaps this reference may be of help, though it seems more like an application for a research grant:

Multiscale Scheduling: Integrating Competitive and Cooperative Scheduling in Theory and in Practice, G. E. Blelloch, L. Blum, M. Harchol-Balter, R. Harper
Computer Science Department, Carnegie Mellon University

I don't see any discussion of tree scheduling in Manticore's "Nested schedulers for heterogeneous parallelism" paper though, so I'm not sure what you mean.

I haven't looked at scheduling much yet, but I plan to, so if you have any good references I'd appreciate it!

Priority Inversion

It sounds like the well-known problem of priority inversion.

I believe priority inversion applies only to cases where processes compete for a shared resource.
I, however, meant a case of data dependency, where one process (potentially of higher priority, or maybe just lucky enough to get some ticks) needs data from another. If that data is not ready yet, then instead of just wasting its ticks, the process blocked by the need for that data can help the producing process by donating some ticks to it - which may mean just executing the producer's continuation for several steps.
So, all in all, I do not think this situation is priority inversion, or that it should be avoided.

Perhaps this reference may be of help...

Thanks for the reference. I believe I skimmed the paper when it was announced on LtU; for some reason the link does not work for me now. I will put it back on my reading list.

I don't see any discussion of tree scheduling in Manticore's "Nested schedulers for heterogeneous parallelism" paper though, so I'm not sure what you mean.

Sections 4 and 5 of The Manticore runtime model do mention a tree of engines and schedulers, respectively.

I haven't looked at scheduling much yet, but I plan to, so if you have any good references I'd appreciate it!

My current interest in scheduling comes from two directions - concurrent constraint programming (classical works by Vijay Saraswat, and some modern ones), and the Algorithm = Logic + Control maxim by Robert Kowalski.
I also find Continuations and threads: Expressing machine concurrency directly in advanced languages and Cheap (But Functional) Threads very interesting.
While none of these references is directly about scheduling, I find that together they form a useful background.

I believe priority inversion

I believe priority inversion applies only to cases where processes compete for a shared resource.
I, however, meant a case of data dependency, where one process (potentially of higher priority, or maybe just lucky enough to get some ticks) needs data from another. If that data is not ready yet, then instead of just wasting its ticks, the process blocked by the need for that data can help the producing process by donating some ticks to it - which may mean just executing the producer's continuation for several steps.
So, all in all, I do not think this situation is priority inversion, or that it should be avoided.

I don't see the difference. The shared resource can be modeled as a process to which they both send messages, which seems exactly like the situation you describe. The higher-priority process is blocked by the lower-priority process because the latter's request is being serviced first. Priority inheritance will "solve" the problem by donating the priority of the highest blocked process, but this blocking structure is probably just a bad idea in the first place. The shared process should already be scheduled at the same priority as the highest priority process that makes requests on it. Are you perhaps considering some sort of asynchronous message-passing semantics?

IIRC, the EROS microkernel does a limited form of what you suggest. When domain A invokes an object serviced by domain B, the remainder of A's time slice is donated to B. Once preempted, I believe B reverts to its own scheduling policy.

Still a bit different

The shared process should already be scheduled at the same priority as the highest priority process that makes requests on it.

"Highest" makes sense only in some (quite simple, mostly flat) scheduling schemes. Section 5.2.2 of Using Hierarchical Scheduling to Support Soft Real-Time Applications in General-Purpose Operating Systems gives an overview of (some) more complicated schemes.

I don't see the difference.

Returning to the difference (as I perceive it): while in the case of priority inversion the conflict has an additional option for resolution - undoing the progress of the lower-priority process to the point where it acquired the scarce shared resource - in the case of a data dependency this option is not available: the resource was never acquired; the other process is blocked only because the resource has not been produced yet. One may explore this difference further starting from the observation that in the latter case the unblocking is monotonic - once produced, data remains available - while in the former case resources get acquired and released indiscriminately.
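
To put the monotonicity in code (a toy sketch; the IVar type and its operations are just made up here, not a reference to any particular library): a write-once cell is never released, so once the dependency is satisfied it stays satisfied, unlike a lock that is acquired and released over and over.

```haskell
import Data.IORef

newtype IVar a = IVar (IORef (Maybe a))

newIVar :: IO (IVar a)
newIVar = fmap IVar (newIORef Nothing)

-- Filling is monotonic: the cell never goes back to empty, and a second fill is ignored.
fill :: IVar a -> a -> IO ()
fill (IVar r) x = modifyIORef r (maybe (Just x) Just)

-- A consumer that sees Nothing is blocked; once it sees Just, it stays unblocked forever.
poll :: IVar a -> IO (Maybe a)
poll (IVar r) = readIORef r
```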

I can see your point that in some circumstances it may be beneficial to simply prohibit such complications - e.g., by static analysis (whether in the form of type checking or otherwise).

IIRC, the EROS microkernel does a limited form of what you suggest.

Thanks for the pointer. Will look into it.

"Highest" makes sense only

"Highest" makes sense only in some (quite simple, mostly flat) scheduling schemes. Section 5.2.2 of Using Hierarchical Scheduling to Support Soft Real-Time Applications in General-Purpose Operating Systems gives an overview of (some) more complicated schemes.

Thanks for the pointer. I just meant that the shared process P should not be starved by the processes which depend on it, thereby preventing their progress. The existence of this dependency implies that P is necessarily at least as important as any of its dependents, and any statement or use suggesting otherwise is a safety violation. This is the problem with priority inversion.

If a "higher priority" for P is unjustified, for instance because its output is not critical for the higher priority process R's operation, then promises or asynchronous semantics is justified, where R spawns an agent B to wait on P, while R continues on its merry way. B can then notify R when data is ready.

while in the case of priority inversion the conflict has an additional option for resolution - undoing the progress of the lower-priority process to the point where it acquired the scarce shared resource

I'm not sure I understand. How can progress be undone? I don't see a general way to do this, because processes can have side-effects.

Perhaps we're going about this the wrong way. I see two assumptions in your original write-up that need more analysis first:

  1. In what way are tree schedulers insufficient?
  2. What problem does 'helping' solve?

Concurrent lazy search

You may be right that we are not making progress. Let's preempt this thread with the two questions you suggested.

In what way are tree schedulers insufficient?

The previously linked dissertation by Regehr has an example motivating a "join" scheduler - a means to generalize the hierarchy of schedulers to at least a DAG.

What problem does 'helping' solve?

One of the problems motivating my interest in scheduling may be found in Compiling Constraint Handling Rules with Lazy and Concurrent Search Techniques. Or just take a look at how we communicate on LtU, for a meta-example.
Basically, you may have multiple procedures for finding a solution, and you do not know beforehand which of them terminates sooner, or at all. This requires executing them concurrently. Also, there may be data dependencies between the procedures (the "lazy" part), so the progress of one procedure (C) may become dependent on the progress of another (P). Helping is a policy (either built into the language or expressed by the programmer) to delegate the ticks of C to P while this dependency lasts. This avoids undeservedly leaking C's ticks to other branches of the computation. At the least, it aims to find the solution sooner; I never tried to construct a case where this mere optimization becomes the difference between termination and nontermination, though that may be worth a try. On the other hand, one may of course construct a counterexample where helping hurts. It is just a heuristic, and as such is better left optional (i.e., expressible by the programmer).
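
For the "racing procedures" part alone, a toy sketch with ordinary Haskell threads (firstOf is an invented helper; it deliberately does not model the helping policy itself, so a blocked solver here would simply idle rather than donate its ticks to the producer):

```haskell
import Control.Concurrent

-- Run several candidate solvers concurrently; take whichever terminates first.
firstOf :: [IO a] -> IO a
firstOf procs = do
  result <- newEmptyMVar
  tids   <- mapM (\p -> forkIO (p >>= putMVar result)) procs
  answer <- takeMVar result   -- the first solver to finish wins
  mapM_ killThread tids       -- the rest are no longer needed
  return answer
```

Helping would then change where a blocked solver's ticks go, not whether the race itself works.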

The previously linked

The previously linked dissertation by Regehr has an example motivating a "join" scheduler - a means to generalize the hierarchy of schedulers to at least a DAG.

Which dissertation? The only one I see in this thread is by Rainey, and 'join' is not mentioned anywhere in there. I am interested in seeing an example where such a generalization is necessary, or at least useful, so I have a good reference when I start looking into scheduling more closely.

Re: helping

By your description, helping seems to be a type of optimization, and not really intended as a solution to a particular problem - is that correct?

My concern with helping is that the additional dynamism of tick donation makes scheduling analysis more difficult. Real-time analyses, for instance, probably break down in the face of such dynamism.

As for whether the ticks are 'undeserved', I suppose it depends on one's perspective. It sounds like you are concerned with notions of 'fairness' and 'quality of service'. The dataflow dependencies you describe are present in Orc, and Orc's semantics are sufficiently formalized that they have been used to analyze quality-of-service guarantees. Perhaps those papers would be of interest to you, as they seem related to what you've described re: policies, stealing ticks, etc. I think they're available from the Orc homepage.

In a sense, the notion of helping/tick donation just doesn't seem natural to me. You can't speed up a hardware process you're waiting on, for instance, so it doesn't generalize to all contexts. I think this impedes compositionality, which in turn impedes the user's ability to reason about his system. That's not a very desirable outcome, so the language would need some way to express that a particular subsystem can or cannot be 'helped'. This adds complexity, but if it is well motivated, then we've gained some necessary flexibility.

Regehr's Dissertation

I meant this one, section 5.3.1.2, which introduces "join" and makes a reference to an example in 6.2.1.

Lock-free algorithms

Your description makes me think of the usual technique for implementing lock-free data structures and algorithms. Keir Fraser's thesis, Practical Lock-Freedom (a precursor to the STM work in Haskell), which I referenced a few years ago, describes a technique called "recursive helping". This isn't directly related, as it is not a scheduling technique, and the approach described in the thesis does not involve donating a specified number of "ticks" to an arbitrary computation, though it may be adaptable to that.

Thanks

It's interesting that you mention lock-free algorithms, as I remembered them just recently for another reason - extending uniprocessor scheduling of a concurrent program to a multiprocessor setting.
I will definitely take a look - it may well be that my use of the word "helping" subconsciously derives from recursive helping.