Computing Needs Time

Edward A. Lee, Computing Needs Time, Communications of the ACM, Volume 52, Issue 5 (May 2009).

The foundations of computing, rooted in Turing, Church, and von Neumann, are about the transformation of data, not about physical dynamics. This paper argues that we need to rethink the core abstractions if we really want to integrate computing with physical processes. In particular, I focus on a key aspect of physical processes that is almost entirely absent in computing, the passage of time. This is not just about "real-time systems," which accept the foundations and retrofit them with temporal properties. Although that technology has much to contribute, I will argue that it cannot solve the problem alone because it is built on flawed foundations.

The section of most direct relevance to LtU is probably section 5.2 on programming languages, which opens by saying:

Programming languages provide an abstraction layer above the ISA. If the ISA is to expose selected temporal properties, and programmers wish to exploit this, then one approach would be to reflect these in the languages.

Also potentially of interest to the LtU readership is section 5.4 on formal methods, which closes by asserting that

...type systems are formal methods that have had enormous impact. What is needed is time systems with the power of type systems.

Note: The "Tagged Signal" meta-model of computation mentioned in section 3 of the paper was previously discussed on LtU here.

Comment viewing options

Actors Model "Time"

I've always been rather fond of the 'Actors Model' understanding of time:

• a message is received after it is sent
• upon receiving a message, an actor may send more messages. Those messages occur after the message is received.
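These two rules induce a happens-before ordering. A toy sketch (the class and method names are my own, not part of any actor system) shows how the ordering falls out of a Lamport-style logical clock, where receipt jumps past the message's send time:

```python
# Hypothetical sketch: each actor keeps a logical clock; a message carries
# its send time, so "received after sent" and "caused messages happen
# after receipt" both follow from the max-then-increment rule.

class Actor:
    def __init__(self):
        self.clock = 0

    def send(self):
        self.clock += 1
        return self.clock          # timestamp carried by the message

    def receive(self, msg_time):
        # Receipt happens after the send: jump past the message's timestamp.
        self.clock = max(self.clock, msg_time) + 1
        return self.clock

a, b = Actor(), Actor()
t_send = a.send()          # a's clock advances to 1
t_recv = b.receive(t_send) # b's clock jumps to 2: received after sent
t_next = b.send()          # anything b sends now is after the receipt
print(t_send < t_recv < t_next)   # True
```

Note that this gives only a partial order: two actors that never exchange messages have incomparable clocks, which is exactly the "personal view of time" discussed further down the thread.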

I tend to think of physics in those terms. Space and time as functions of observation - that's the root of relativity.

Access to 'real time' in an Actors Model would, of course, require an actor that 'observes' something in the 'real' world and regularly reports the time (e.g. once per second). Many such actors could exist, of course, and it is likely that each of them would disagree.

That said, I can agree that we need languages that better leverage observations over time, especially including real-time and time-as-a-resource.

Using a different model

The issue isn't so much being able to model time. Many formalisms can do that. Nor is it about having access to 'real time'. Knowing the time is not the same as being able to respond in a timely manner. What Lee is interested in is designing systems that must interact with the physical world and therefore must have predictable, repeatable timing. His claim is that the standard abstractions we use to build computing systems (including actors) deliberately discard time as a part of their model, and that this makes it inherently difficult to build correct systems if those systems must interact with the physical world. Attempting to patch over the lack of time information by using something like an RTOS is a brittle and error-prone solution. Instead, Lee is proposing a change in the way computing is done, such that time is incorporated into our models from the ISA-level on up. He wants timing to be a core part of the semantics of programming languages.

What specifically is wrong,

What specifically is wrong, say, with what Ada provides? And if the issue is as you present it, why is this not the same as the requirements imposed on realtime systems?

Time as a Side Effect

Fundamentally, the problem is time as a step-function side-effect that isn't modeled, understood, or controlled by the programmer.

This makes for difficult composition. Two program fragments that may be individually correct might not be temporally correct when interleaved or placed in a sequence.

Strong timing means knowing and controlling how much 'time' takes place as a 'conceptual' side-effect of each operation.

This might not be the same as real-time, since one might simplify the model by saying that (for example) pure functions and Datalog queries take zero time, but that sending messages and accessing state take time. So long as one can 'fold' the functional operations into the time consumed by the state manipulation, a practical implementation may still be possible.
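A minimal sketch of that cost model (entirely my own construction, just to make the idea concrete): pure computation is "folded" into zero logical time, while each effect advances the clock by one tick.

```python
# Toy logical-clock cost model: pure functions cost zero ticks; state
# manipulation and message sends each cost one tick. The names here are
# hypothetical, not drawn from any existing language.

class LogicalClock:
    def __init__(self):
        self.now = 0   # logical ticks consumed so far

def pure(clock, f, *args):
    # Pure functions are modeled as instantaneous.
    return f(*args)

def effect(clock, f, *args):
    # Effects (state access, message sends) each advance time by one tick.
    clock.now += 1
    return f(*args)

clock = LogicalClock()
state = {"x": 0}

total = pure(clock, sum, [1, 2, 3])           # costs 0 ticks
effect(clock, state.__setitem__, "x", total)  # costs 1 tick
effect(clock, state.get, "x")                 # costs 1 tick
print(clock.now)   # 2: only the stateful operations consumed time
```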

It may also be possible to enforce a 'perfect model' in the same sense as enforcing a 'perfect failure detector': if the model is wrong, kill enough processes to make it right again.

An accurate model of time, combined with strong timing, is necessary to analyze for real-time properties in the face of program composition.

I don't think Lee is looking too hard for a model of time accurate enough for realtime use. Any model of time that will help the programmers predict temporal behavior of programs would be appreciated.

As it turns out, Lee specifically addresses Ada (as well as other languages that incorporate some time information):

There is a long and somewhat checkered history of attempts to insert timing features into programming languages. Ada can express a delay operation, but not timing constraints. Real-Time Java augments the Java model with a few ad-hoc features that reduce variability of timing [7]. The synchronous languages [5], such as Esterel, Lustre, and Signal, do not have explicit timing constructs in them, but because of their predictable and repeatable approach to concurrency, can yield more predictable and repeatable timing than most alternatives. They are only limited by the underlying platform.

Lee also mentions several domain-specific languages (such as Simulink) that include temporal semantics, but asserts that they "...remain outside the mainstream of software engineering ... are not well integrated into software engineering processes and tools, and they have not benefited from many innovations in programming languages."

As I understand it, the requirements are those of real-time systems. However, Lee (and John Stankovic, whom he quotes) claim that "existing technology for RTES [real-time embedded systems] design does not effectively support the development of reliable and robust embedded systems." Specifically, Lee states that

Modern processor architectures render WCET virtually unknowable; even simple problems demand heroic efforts. In practice, reliable WCET numbers come with many caveats that are increasingly rare in software. Worse, any analysis that is done, no matter how tight the bounds are, applies to only a very specific program on a very specific piece of hardware. Any change in either the hardware or software, no matter how small, renders the analysis invalid. The processor ISA has failed to provide an adequate abstraction.

You're quite right that languages intended for real-time applications have in the past included some kind of temporal semantics. The problem is that modern processors make those semantics difficult (or impossible) to guarantee. Lee is suggesting a change in processor design (towards something like the Precision Timed Machine), with a corresponding change in the way the ISA is defined. The interesting question from a PL perspective is how those changes in the underlying platform should be reflected in higher-level language design. This paper gives a few pointers to the kind of issues that might need to be addressed in designing a language for such an architecture.

I was thinking of things

I was thinking of things like Rate Monotonic Analysis, and how support for it influenced the design of the language.

RMA

Interesting. I didn't realise that RMA was specifically considered in the design of Ada. What influence did it have on the language design?

Anyway, as I understand it, RMA only guarantees that a given set of tasks can be scheduled in a way that meets their deadlines. It doesn't guarantee that a particular implementation will actually execute scheduled tasks on deadline. I think that what Lee is getting at is that if the computing platform upon which your code is running is not sufficiently predictable as far as timing goes, you can end up with a system in which tasks may be properly-scheduled but are executed too late (or too early, which can be just as bad).
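For reference, the classic sufficient condition behind RMA is the Liu-Layland utilization bound: n periodic tasks with costs C_i and periods T_i are guaranteed schedulable under rate-monotonic priorities if total utilization stays below n(2^(1/n) - 1). A small sketch of the check (which, as noted above, says nothing about whether the hardware actually executes on time):

```python
# Liu-Layland schedulability test for rate-monotonic scheduling.
# tasks: list of (cost, period) pairs in the same time units.

def rma_schedulable(tasks):
    """Return True if the Liu-Layland sufficient bound guarantees
    that the task set is schedulable under RMA."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1.0 / n) - 1)   # ~0.828 for n=2, -> ln 2 as n grows
    return utilization <= bound

print(rma_schedulable([(1, 4), (2, 5)]))  # 0.25 + 0.40 = 0.65 <= 0.828 -> True
print(rma_schedulable([(3, 4), (2, 5)]))  # 0.75 + 0.40 = 1.15 -> False
```

The test is sufficient but not necessary: a set that fails the bound may still be schedulable, which is why exact response-time analysis exists as a refinement.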

One approach to improving

One approach to improving the predictability and repeatability of cache behaviour is to simplify the operation of the cache and hand it over to programmer control. This paper describes one such approach; sadly it didn't get much traction, so it isn't covered in the Precision Timed Machine paper when they describe the issue.

If the program specifies how the cache should behave, then the combined specification of program+caching is somewhat easier to analyse than program + generic cache behaviour.

Thanks

Neat. Thanks for the pointer!

Actors Model == Asynchronous Concurrency

The independent nature of actors gives them a very personal view of time.
I believe that synchronous languages, such as Esterel and FRP, where the notion of time is shared among entities, are much closer to Lee's rants.
Asynchronous concurrency (CSP, actors, threads) goes in the opposite direction.
What do you think?

Agreed

I often think of actor models as over-reductions of most real physical processes. Physical systems, such as the ubiquitous sampled-data systems, can only be abstracted by a computation. Typically this is a solution to some difference equation, a system with state. It is hard to think of an actor-model equivalent. I tend to use the word activity instead. To be comfortable in a hybrid environment I would like a language with a formal semantics consistent with "sampled data semantics" or "clock semantics".

Some people may be interested to know that clock semantics actually has a modal logic definition. This is a recent summary with links to older results. Lax logic has been combined with effects in at least one formal language proposal previously mentioned on LtU: A Modal Language for Effects.

Temporal Reactive Displays

As a recent exercise, I've been designing a data-driven document model live user-interface (derived from Datalog, but with support for guards and stratified functional data). Conal Elliot's work with Images (Pan) was an inspiration there. This is currently described here.

One of the problems I encountered was: how do I best define animations, sounds, and transition-effects? I was attempting to avoid introducing any dependence on document history, and to keep the whole system robust against partial network failures, keep everything zoomable, and to maintain the totality of Datalog.

My eventual design was simply to add document-time as a datum, with the special provision that the browser interpolate this time from updates. I.e. if the last update to the document was three seconds ago (in the opinion of the browser) then the document-time would be three seconds more than was originally reported.
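That interpolation rule can be sketched in a few lines (the class and method names below are my own invention, not part of any described API): the browser extends the last reported document-time by however long ago, on its own monotonic clock, the update arrived.

```python
import time

# Hypothetical sketch of browser-side document-time interpolation.

class DocumentClock:
    def __init__(self):
        self.doc_time_at_update = 0.0                 # document-time carried by last update
        self.local_time_of_update = time.monotonic()  # browser's local receipt time

    def update(self, doc_time):
        """Record a fresh document-time from an update message."""
        self.doc_time_at_update = doc_time
        self.local_time_of_update = time.monotonic()

    def now(self):
        """Interpolated document-time: last reported time plus local elapsed time."""
        elapsed = time.monotonic() - self.local_time_of_update
        return self.doc_time_at_update + elapsed

doc = DocumentClock()
doc.update(10.0)            # an update arrives reporting document-time 10.0s
print(doc.now() >= 10.0)    # True: interpolated forward as local time passes
```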

I would then allow data to be defined as a continuous function of time using stratified functions or as a discrete function of time using guards. I specifically excluded 'step-functions' and 'frame functions' of time (where data at time T is a function of data at time T-dT). This would allow one to specify, for example, that a particular sound is playing starting at some point in the near future, or that the opacity and position of a display object is a continuous function of time up to some point and a constant afterwards.

The data necessary for the temporal aspect of the 'live' display can, thus, be maintained by simple subscription, just like the spatial aspect. The document being a function of time becomes orthogonal to it being 'live' and receiving updates due to changes in the world. With a monotonic time, not allowing resets or rewinds, one could also garbage-collect fact :- blah, time(T), T<K any time after T >= K.

The 'document-time' is still kept separate from wall-clock time due to the lag issues. In addition to lag-leveling or buffering (to reduce issues when updates vary in lag), there is some concern for accidentally chopping off 'new' changes due to live events. I.e. suppose the document is updated at time T to display an animation event at T+100 ms. If synchronized by wall-clock, a client 50 milliseconds away will hear this update at T+50 milliseconds and play the animation dutifully at T+100 milliseconds, but a client 400 milliseconds of lag away would hear about this update at T+400 milliseconds, and thus would need to skip the first 300 milliseconds of the animation. It would make for difficult video-conferencing and choppy animations! Sticking to 'document time' avoids these issues, but at the cost of displays being different based on 'distance' from the source - whether this is a 'cost' is questionable given its correspondence to physics and light.

Lee appeals to improving time as a quantity in the output of a 'program as a reactive IO processor'. I'd be surprised if this 'temporally reactive document' strategy, as I'm using for the UI, isn't generic enough for most such 'output' problems, where one is concerned with output in real systems. I.e. it isn't difficult to define output signals to a speaker or wire or antenna or anything else as continuous functions of time. This approach also avoids the follies of step-functions of time, where the output becomes difficult to analyze (essentially a differential equation) and thus difficult to optimize, and output also becomes difficult to compose.

The 'Actors Model' model of time is an example of a step-function: time advances a 'step' with each message received.

But, unlike Lee, I don't believe time should be integrated into the languages all the way through. Given concerns for disruption tolerance, resilience and self-healing, etc. it seems inappropriate for 'time' to be a significant concern to the open communications layer. Indeed, finding ways to undo the impact of synchronization concerns (as through state-based publish-subscribe model, use of blackboard architectures and tuple-spaces, etc.) seems more appropriate.

A layered language approach, with 'time' only being a concern for 'describing output' behaviors of 'real' systems (as in actuators, sounds, videos, user-interfaces) may be a sufficient approach for incorporating time concerns exactly where they are needed.

Domain Specific Timing Concerns

I think dmbarbour has detailed one of many special cases dealing with timing. More subtly, his time model was designed to handle the inevitable timing errors that crop up.

I am not certain how any generalized strong-timing framework will be useful without specifying how to handle timing errors. And I cannot see how handling timing errors can be done without knowing the specific domain the solution is for. Just like any robust application, the bulk of the code is dedicated to detecting and handling possible errors, not performing its main duty. For example, it is not enough to "guarantee" a method completes within 50ms. You also need to know what actions to take when this guarantee is broken.

Maybe once someone reviews enough specific domains they can see the general patterns used to deal with timing errors: I am not one of those people.

Lee brings up "...cars, medical devices, instruments, communication systems, industrial robots, toys, games..." which all must accept failure in particular ways. Each domain's model of time will be different specifically to deal with *how* timing failure will be handled. Yes, some domains do not accept failure: and hardware solutions are required. But we all know that is overkill for the plethora of systems that would benefit from strong timing, but are a little more flexible with failure.

Having a framework for proper exception handling for timing failures allows us to avoid the monetary cost of real hardware solutions to this problem. I contend it also solves the timing issues in a majority of the domains.

Computing needs time travel

While the thesis of this paper sounds reasonable, I think in practice it is about as reasonable as saying that we need to invent time travel so we can go back and fix our mistakes after they happen.

Big O time has become the standard of time in computing for good reason: actual timing is heavily affected by many different contingent and uncontrollable factors, whereas complexity limits are quite robust.

What is needed is time systems with the power of type systems.

If we took this seriously, what would it look like?

If we made it a static guarantee, the only way to implement it would be by only allowing it to run on hardware of known timing for every atomic operation with no latencies. (Obviously arbitrary distributed operations would be out.)

And if it's a dynamic guarantee, what will we do if a particular operation doesn't meet the requirement? Cause it to fail? That is likely to increase the time overage.
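A minimal sketch (my own names, nothing from the paper) makes that objection concrete: the simplest dynamic "guarantee" is really just detection after the fact, and by the time the check fires, the deadline is already blown.

```python
import time

# Hypothetical dynamic deadline check: run the operation, then compare
# the elapsed time against a budget. Note the failure is detected only
# after the budget is spent, and raising only adds more delay.

class DeadlineMiss(Exception):
    pass

def with_deadline(budget_s, f, *args):
    start = time.monotonic()
    result = f(*args)
    elapsed = time.monotonic() - start
    if elapsed > budget_s:
        raise DeadlineMiss(f"took {elapsed:.4f}s, budget {budget_s}s")
    return result

print(with_deadline(5.0, sum, range(10)))   # comfortably within budget
try:
    with_deadline(0.0, time.sleep, 0.01)    # guaranteed to miss
except DeadlineMiss as e:
    print("missed:", e)
```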

Performance requirements necessarily must be made with respect to specific implementations on specific platforms with specific scenarios, and probably can at best guarantee an average performance rather than a hard bound.

So though I sympathize with the author's frustration as it applies to his subdiscipline, I don't think he sets up a reasonable goal, at least at the level of generality this paper suggests.

If we took this seriously,

If we took this seriously, what would it look like?

My gut tells me that it is going to turn out to look very similar to parameterization constructs (i.e., polymorphic). Can anyone help flesh out this intuition?

Strongly Timed

An initial stab at a Time System, if I were aiming to achieve Strong Time Safety, would have the following basic timings, each of which could be associated with functions, procedures, methods... much like exception types or return types:

• unbounded: this method can take any amount of time, and might not even return.
• finite: this method is guaranteed to return after some finite, but unknown, amount of time. Similar to a termination guarantee, except that it might start events that never terminate... it just doesn't wait for them.
• progressive: finite and guaranteed to make progress towards termination in each quantum. (Wait-free.)
• processing bounded(Ti,Tf): this method is guaranteed to return (possibly in failure) within [Ti,Tf] units of processing time.
• absolute bounded(Ti,Tf): this method is guaranteed to return (possibly in failure) within [Ti,Tf] units of real time.

For a realistic estimation of 'bounded' times, I would probably need dependent timing (similar to dependent typing). Static analysis of this would, of course, be a pain.

Calling patterns would be in layers. Unbounded calls can call anything. Finite calls can call anything but unbounded calls. Progressive calls can call anything but finite or unbounded calls. And so on.
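Encoding the timing classes as ordered levels reduces the layering rule to a comparison. A sketch (my own encoding of the scheme described above, not an existing system):

```python
# Timing classes ordered from least to most restrictive. A method may
# only call methods whose class is at least as restrictive as what its
# own class permits: unbounded can call anything; finite anything but
# unbounded; and so on down the layers.

LEVELS = ["unbounded", "finite", "progressive", "bounded"]

def may_call(caller, callee):
    """True if the caller's timing class permits calling the callee."""
    return LEVELS.index(callee) >= LEVELS.index(caller)

print(may_call("unbounded", "finite"))     # True: unbounded calls anything
print(may_call("finite", "unbounded"))     # False: would forfeit finiteness
print(may_call("progressive", "bounded"))  # True: more restrictive is fine
```

A static checker built this way would verify each call site against the declared classes on interfaces, while inferring classes for implementations.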

Realtime would only be achieved with the most specific layers, which would be avoided outside of inner loops.

Strong timings could be attached both to 'interfaces' for black-box composition and still inferred for 'implementations'.

The language would provide asynchronous exceptions, timed promises and rendezvous, and various other constructs useful for programmers to control timing of events. In addition, constructs would exist for programmers to specify that events take place at specific times (e.g. starting in ten seconds, taking no more than 30 milliseconds, would be possible to define in this system).

For methods in an Actors Model system, I would rate by 'reply' time rather than event completion time. The two would only be the same for synchronous actors.

Anyhow, that's a sketch of how I'd do a timing system. Any other ideas?

One language I've recently read about, called ChucK, also calls itself strongly timed. It doesn't provide timings for procedures, though, and I don't believe it performs static analysis of any sort.

Resource Bounded Certification

Karl Crary & Stephanie Weirich, 2000. Resource Bounded Certification. POPL #27.

This paper describes an approach to certifying that code respects time bounds, using dependent type systems that express how long the code takes to run. In particular, they extend Typed Assembly Language to give a system they call TALres. I've not seen any work that builds on this approach.

As the authors say, there is no inference of time bounds for programs. One would have to mess about with the expressivity of a programming language to make that possible.

Lack of realism

The problem with this paper, though, is that to actually seek what it wants in reality, the only viable solution seems to be, at most, writing everything in Verilog or VHDL, synthesizing that, and programming FPGAs with such. Nothing else will really do; there is no way that general purpose computing will allow such.

We obviously cannot trust programming languages, at least if they permit Turing-complete code - as there is no way to truly verify the timing of such.

We obviously cannot trust programming languages (other than assembly) that have any more than one implementation, and if we use programming languages that have more than one implementation, we must code for a single exact implementation of such down to the exact version. Even upgrades to a compiler could change behavior, obviously.

We obviously cannot trust libraries, unless they are provided in an uncompiled state in a language that is not Turing-complete.

We obviously cannot have more than one application running at once, that is, we cannot trust any kind of multitasking involving outside code, as there is no way for us to control scheduling then. That also means that we cannot trust any kernel we have not written ourselves, as it may have behavior that we do not know timing-wise.

We obviously cannot trust systems which have interrupts, unless we have complete control over any hardware which may ever cause one to be issued. We are probably best off just using DMA combined with polling for most IO purposes, as that will have less impact on overall timing, limiting interrupts for things such as timers to trigger time-critical actions.

We obviously cannot trust hardware platforms that are not completely fixed - down to the exact versions of all the chips and boards used. Simply having the same formal behavior is not enough, as such formal behavior clearly does not include actual exact timing behavior.

With all this said, we are probably best off simply creating the hardware from scratch, generating masks for it, and sending it off to a fab to be manufactured in a particular prechosen process. Failing that for obvious time and cost-related reasons, we could fall back upon FPGAs, but then we must make sure to only use a particular exact FPGA and specific compilation/synthesizing software down to the exact version. Then we can be finally certain of timing.

major agreement

with all of the above. I just wanted to add that, in this day and age, if we did all that, we would also want to consider power as a resource and avoid writing certain javascript/flash code that cooks your legs while browsing the Internet. Computing takes space, too, so if our software/data weren't that bloated, we'd have smaller hard drives as well.

Computing needs Energy

we would also want to consider power as a resource and avoid writing certain javascript/flash code that cooks your legs while browsing the Internet

I could really benefit from that sort of analysis, as well as total energy cost analysis. Unmanned systems need as much power savings as they can achieve, and heat-management issues aren't unusual for unmanned systems either (we have vehicles running in Iraq with CPUs cooped up inside what are essentially ovens due to shuttering against sand).

I'd at least like automated "optimizations" for power and energy, but some sort of first-class analysis of these properties for systems engineering and program design would be very nice.

There are quite a few cross-cutting concerns like these. Languages have a lot more development to do.

I'm not nearly so pessimistic as you are on the possibilities of analysis and modeling for these issues, though I expect initial attempts will be more at the level of keeping programmers 'informed' of the costs of their decisions and helping with automatic analysis and optimizations - i.e. closer to 'soft temporal typing, soft energy typing, soft power typing' than to any hard variation.

Not Turing Complete

We obviously cannot trust programming languages, at least if they permit Turing-complete code - as there is no way to truly verify the timing of such.

We really can get a LOT further without going Turing complete than we usually bother going in programming languages. We very rarely, if ever, need non-termination beyond the non-termination of physical hardware/sensory input. Indeed, we probably want to do (in general) better than non-termination: for continuous sensor input, we really need to guarantee that the processing cost is proportional to (or smaller than) the number of processors serving on behalf of parties interested in each input, times the number and size of inputs.

Avoiding Turing Complete operations: total functional programming; stratified logic languages (lacking infinite-fact-generator cycles) to ensure finite processing to determine, or fail to determine, any given fact; procedural languages that forbid arbitrary loop-backs (no 'while' loop, resumable exceptions limited to being forward of the throw statement). Certain call cycles among objects or actors could also be prevented, given a suitably limited object configuration language.
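One concrete way to forbid arbitrary loop-backs while keeping iteration is to make every loop spend explicit fuel. A sketch of such a combinator (my own construction): it terminates for every input because the fuel strictly decreases.

```python
# Fuel-bounded iteration: a total replacement for 'while'. Always
# terminates, returning the final state and whether fuel ran out.

def bounded_iterate(fuel, step, state, done):
    """Apply `step` until `done(state)` holds or the fuel runs out."""
    while fuel > 0 and not done(state):
        state = step(state)
        fuel -= 1
    return state, (fuel == 0 and not done(state))

# Collatz iteration with a fuel cap: guaranteed to terminate even if
# the Collatz conjecture were false for this input.
step = lambda n: n // 2 if n % 2 == 0 else 3 * n + 1
final, exhausted = bounded_iterate(1000, step, 27, lambda n: n == 1)
print(final, exhausted)   # reaches 1 well within the fuel bound
```

The unbounded variant, if ever needed, is then a deliberate top-level choice rather than something any expression can smuggle in.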

We can push 'Turing Complete' into the highest layer possible, maybe even to the outer edges, such that to have a Turing Complete program you literally need to involve physical hardware that bounces outputs back as inputs at the very edges of the system. This isn't particularly unreasonable. How many object-configurations really need to support arbitrary cycles beyond physical input? Perhaps we should figure out which sort of limited cycles we'd most benefit from, and limit the language to just those.

Anyhow, giving up on a feature due to Turing Completeness, as though Turing Completeness is a feature rather than a largely unnecessary evil we suffer, tends to irk me a bit.

Your points on multi-tasking issues, interrupts and communications, etc. still apply, and are well taken. Even without being thwarted by Turing Completeness, accurate timing would be a difficult problem to whatever degree it needed to involve external resources.

Turing-completeness

The problem, though, is that many things that seem quite trivial are actually Turing-complete. For instance, various simple cellular automata, including not only Conway's Game of Life but also the 1-dimensional cellular automaton Rule 110, are actually Turing-complete. If one is not able to even implement Rule 110, physical hardware limitations such as memory space notwithstanding, then one must have actually imposed quite a great set of limitations upon oneself by not being Turing-complete.

Turing-completeness

Implementing the rules of such a simulation isn't what usually needs Turing-completeness - it's the unbounded update loop that does. You can usually structure things so that such loops are known at top-level. I'd expect timing requirements of such a simulation to usually be per-update. Also, a type system that maintains static bounds on run-times doesn't preclude the existence of an unbounded runtime type.
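To illustrate: Rule 110's update rule is itself a total function over a finite row, so one synchronous step always terminates. Turing-completeness only enters through the unbounded outer loop, which can live at top level. A sketch (fixed-zero boundary is my simplifying assumption):

```python
# Rule 110 lookup table: the new cell value for each 3-cell neighborhood.
# 110 in binary is 01101110, read off over neighborhoods 111..000.
RULE110 = {0b111: 0, 0b110: 1, 0b101: 1, 0b100: 0,
           0b011: 1, 0b010: 1, 0b001: 1, 0b000: 0}

def step(cells):
    """One synchronous update of a finite row (total function)."""
    padded = [0] + cells + [0]   # fixed 0 boundary on both ends
    return [RULE110[(padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]]
            for i in range(1, len(padded) - 1)]

row = [0, 0, 0, 1, 0, 0, 0]
for _ in range(3):   # bounded driver loop, visible at top level
    row = step(row)
print(row)           # the familiar leftward-growing Rule 110 pattern
```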

Once you start down the

Once you start down the Turing Complete path, forever will it dominate your destiny, Matt.

While it is true we could provide an 'unbounded runtime type', we do not need one capable of introducing an unbounded update loop.

Instead, bind the update loop to an unbounded physical event-source, such as a periodic clock signal.

Not with a global operator, either, but rather as a capability that may be granted... such that clocks can only be introduced from outside the language. There are many advantages (for reproducible testing) in denying direct access to physical resources.

Mostly I agree with you

Mostly I agree with you (which is the first part of what I said - expose the update loop as such at top level) - that's a better style. But I think it will sometimes be useful to write functions that we haven't proven to terminate without incorporating them into some framework for non-termination.

More to the point, my second comment was just addressing the logical possibility of having established time bounds on only a proper subset of the computations in your program.

Turing whisper...

But I think it will sometimes be useful to write functions that we haven't proven to terminate without incorporating them into some framework for non-termination.

That is the force of the Turing side whispering into your heart, Matt. You must be stronger... the temptation is strong, I know, but using the Turing side of the force has a great cost that will topple empires.

And with a little less allusion, I expect you work primarily in an environment where the 'control loop' - and the ability to halt or kill it - is implicitly accessible via Operating System job control. That is not an assumption that scales well to secure, open, distributed systems with any form of automatic mobility, distribution, and self-healing.

Formalism versus Realtimeness

In practice, though, I am unsure whether such formalism is really the best way to attack the issue of timing that was originally brought up in the paper, before it went in the direction of saying that everything associated with general computing needs to be rebuilt from the ground up. Now that I think of it, probably the most effective way to approach this issue within present-day general computing, and the one most likely to actually be practically implemented, is the introduction of full (optionally hard) realtime facilities into general-purpose operating systems, with user software actually taking advantage of such realtime facilities for things such as audio, combined with the establishment of overall best practices for such things.

The idea of introducing (hard) realtime into general-purpose operating systems is not nearly as incompatible with the normal operation of general-purpose operating systems as one might think. All that it really requires is that specific processes be able to be scheduled in a realtime fashion, and for such realtime processes to be able to preempt not only any non-realtime process but also the kernel itself. Consequently if, say, a media application needs the ability to output audio or to be able to grab frames off the network or read data from fixed storage, decode any video or audio data, and output them using the video and audio systems, they will be able to do so in a smooth fashion regardless of outside non-realtime load provided that they take advantage of such realtime capabilities.

Of course, this would require other parts of the system to actually be realtime-aware rather than to treat realtimeness as something merely tacked onto the side of the system. The video, audio, networking, and storage systems would all need to be able to expedite actions requested of them by realtime processes so as to handle media applications, to use the example from the paper; otherwise heavy loads from other processes could still impede any actions required by realtime processes that would be non-realtime in nature. Of course, there would be limitations to this - realtimeness cannot handle the fact that disks can only seek so fast and that disk access times are not predictable. However, for most user applications realtimeness would probably be less critical for storage than for video and audio (as if an application is reading video and audio data from a disk, the storage system could avoid latency problems by reading ahead).

Not so bleak

You raise some interesting issues. But I don't think the picture is quite as bleak as you paint it. For starters, you could say much the same things about the functional correctness of computing systems:

We obviously cannot trust programming languages, at least if they permit Turing-complete code - as there is no way to truly verify [that they will even terminate, let alone produce a correct result.]

We obviously cannot trust programming languages (other than assembly) that have any more than one implementation [if those implementations don't conform to the language specifications], and if we use programming languages that have more than one implementation, we must code for a single exact implementation of such down to the exact version. Even upgrades to a compiler could change behavior, obviously.

We obviously cannot trust libraries, unless they are provided in an uncompiled state in a language that is not Turing-complete. [How can we possibly know what behavior a library will produce in the absence of fully verifiable source code?]

We obviously cannot have more than one application running at once, that is, we cannot trust any kind of multitasking involving outside code, as there is no way for us to control [the state of registers or memory] then. That also means that we cannot trust any kernel we have not written ourselves, as it may have behavior that we do not know [state-wise].

Yet somehow we manage to produce functionally correct software despite the potential problems.

You say "Simply having the same formal behavior is not enough, as such formal behavior clearly does not include actual exact timing behavior." But what if the timing behavior was part of the formal definition of behavior? If a language spec includes timing in its definition, then conforming compilers would presumably be required to ensure that the timing of a program remained consistent even if the compiler version changed (perhaps a new compiler version would have to add no-ops to ensure a particular lower-bound on execution, if that was what was required to provide the specified behavior). Giotto is one example of a system that's designed to allow both function and timing to be specified in a platform-independent way, and leaves it to the compiler to generate code that meets the timing constraints.
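The "add no-ops to ensure a lower bound" idea is essentially Giotto's logical execution time model: a task's outputs become visible exactly at a specified boundary, no matter how fast the platform actually computes. A minimal sketch of that idea (not Giotto's actual implementation; `run_with_logical_time` is a name of my own invention):

```python
import time

def run_with_logical_time(task, period_s):
    # Run the task, then pad out to the specified logical period,
    # so a faster compiler or processor changes nothing observable.
    start = time.monotonic()
    result = task()                      # actual computation, any speed
    elapsed = time.monotonic() - start
    if elapsed < period_s:
        time.sleep(period_s - elapsed)   # the "no-ops" of the comment above
    return result

# The result arrives no earlier than 50 ms after invocation:
r = run_with_logical_time(lambda: sum(range(1000)), 0.05)
```

A real system would also have to reject tasks whose worst-case execution time exceeds the period, which is where the hard analysis lives.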

When it comes to libraries and multitasking OSes, maybe we need to rethink how those will work. Perhaps the ABI needs to include scheduling hooks or timing information. Perhaps the software installation process should include not just allocating disk or flash space but allocating schedule time. Perhaps we should allow a mixture of time-triggered tasks as well as reserved slots for contention amongst lower priority preemptive tasks. Perhaps we need to develop new techniques for dynamic scheduling of code that carries timing constraints with it. There's all sorts of interesting research to be done there.

Is the fundamental sea-change that Lee calls for really necessary? I'm not entirely sure, to be quite honest. Maybe we can continue to muddle along with layering "real-time" on top of time-free abstractions. But I think it's worth at least considering what the alternative might look like, and what it would buy us.

Re: Not so bleak

You raise some interesting issues. But I don't think the picture is quite as bleak as you paint it. For starters, you could say much the same things about the functional correctness of computing systems:

[...]

Yet somehow we manage to produce functionally correct software despite the potential problems.

This is because there are means of producing sufficiently well-functioning software other than relying on formalisms to enforce correctness. In reality, this is achieved through various software development practices, which largely allow us to create functionally correct software without such formalisms.

You say "Simply having the same formal behavior is not enough, as such formal behavior clearly does not include actual exact timing behavior." But what if the timing behavior was part of the formal definition of behavior? If a language spec includes timing in its definition, then conforming compilers would presumably be required to ensure that the timing of a program remained consistent even if the compiler version changed (perhaps a new compiler version would have to add no-ops to ensure a particular lower-bound on execution, if that was what was required to provide the specified behavior). Giotto is one example of a system that's designed to allow both function and timing to be specified in a platform-independent way, and leaves it to the compiler to generate code that meets the timing constraints.

The problem with enforcing exact timing down to the specification level is that we inevitably would have to enforce it right all the way down to the metal, and would most definitely have to enforce it at the operating system, compiler, and language runtime levels. The problem with such, though, is that hardware, operating systems, compilers, and language runtimes could never become better; the timing would be part of the platform itself, so one could never make a better processor core, or a better cache, or better DRAM, or a better scheduler, or a better compiler, or a better runtime. One would be bound to the timing specified when a platform was originally created until the time at which it would finally be obsoleted. And while this may be well and good for embedded applications or gaming consoles, this is certainly not acceptable for open general computing platforms.

When it comes to libraries and multitasking OSes, maybe we need to rethink how those will work. Perhaps the ABI needs to include scheduling hooks or timing information. Perhaps the software installation process should include not just allocating disk or flash space but allocating schedule time. Perhaps we should allow a mixture of time-triggered tasks as well as reserved slots for contention amongst lower priority preemptive tasks. Perhaps we need to develop new techniques for dynamic scheduling of code that carries timing constraints with it. There's all sorts of interesting research to be done there.

Is the fundamental sea-change that Lee calls for really necessary? I'm not entirely sure, to be quite honest. Maybe we can continue to muddle along with layering "real-time" on top of time-free abstractions. But I think it's worth at least considering what the alternative might look like, and what it would buy us.

I myself just think that the most practical approach that would actually work is to build realtime into general computing platforms from the ground up and to take it seriously, rather than to merely tack it on as an afterthought or niche function. That is, not merely to have a realtime capacity in the kernel, or even a preemptable kernel, but to construct all basic system services (whether in kernel space or user space) and libraries to support realtime operation; the kernel and userland would have to be fundamentally capable of realtime operation themselves. Most applications will not need realtime, but those which do, such as video- and audio-related applications and certain sorts of networking applications (such as voice over IP and packet filtering), will need to be able to take full advantage of realtimeness without being hampered by system services and libraries not fully designed for it. And while this may seem radical in that it would require significant changes at the kernel and userland levels of any operating system modified to fit it, it would actually be very conservative in impact, with practically no effect on normal general computing outside operating system design and implementation.

Building a "better" system

This is because there are means of producing sufficiently well-functioning software other than relying on formalisms to enforce correctness.

Really? You don't think that things like calling conventions, static analyses, type systems, memory protection, modular programming, or other ways of "enforcing" correctness have helped?

The problem with such, though, is that hardware, operating systems, compilers, and language runtimes could never become better; the timing would be part of the platform itself, so one could never make better DRAM, or a better cache, or a better scheduler, or a better compiler, or a better runtime. One would be bound to the timing specified when a platform was originally created until the time at which it would finally be obsoleted.

That rather depends on what you mean by "better", doesn't it? You seem to be implicitly equating "better" with "faster". But better can also mean "more predictable", "uses less silicon", "consumes less memory", "uses less battery power", "suffers less timing jitter". Furthermore, there's nothing to prevent you from having "better" mean "faster" so long as (for example) your compiler or runtime was made aware of the change in timing of the underlying ISA and was able to compensate appropriately to maintain the specified timing constraints of existing programs.

I myself just think that the most practical approach which would actually work is to build realtime into general computing platforms from the ground up and to take it seriously rather than to merely tack it on as an afterthought or niche function. ... construct all basic level system services (whether in kernel space or user space) and libraries to support realtime operation.

In a sense, that's precisely what Lee is suggesting. He's just arguing that to do it right requires processor architectures designed with timing in mind. But as with your suggestion, once the capabilities are in place, there's no requirement that you use them.

Re: Building a "better" system

Really? You don't think that things like calling conventions, static analyses, type systems, memory protection, modular programming, or other ways of "enforcing" correctness have helped?

They most certainly have. I was more talking about formal code verification, where one can prove that code is correct or, in this case, that its timing is correct.

That rather depends on what you mean by "better", doesn't it? You seem to be implicitly equating "better" with "faster". But better can also mean "more predictable", "uses less silicon", "consumes less memory", "uses less battery power", "suffers less timing jitter". Furthermore, there's nothing to prevent you from having "better" mean "faster" so long as (for example) your compiler or runtime was made aware of the change in timing of the underlying ISA and was able to compensate appropriately to maintain the specified timing constraints of existing programs.

"Faster" was implicit in my comment there, yes. But what I was saying is that you could never make any future changes that alter timing behavior, regardless of the merits of such changes. Hence you would be bound to the originally specified timing behavior, regardless of the merits of any future developments - and such is not acceptable for general computing.

In a sense, that's precisely what Lee is suggesting. He's just arguing that to do it right requires processor architectures designed with timing in mind. But as with your suggestion, once the capabilities are in place, there's no requirement that you use them.

He was also saying that we would need to completely replace all existing operating system and language paradigms, whereas I was saying that with proper realtime support at the operating system level, one could have systems that, to non-realtime applications, look essentially the same as before. One would not even need to discard things such as POSIX and garbage collection for normal non-realtime applications; rather, one would just need to specify a practical realtime API allowing applications that do need realtimeness to run without interference from non-realtime applications. (Such an API could range from a special realtime scheduling API combined with system services that actually use it, to a full realtime microkernel system on which POSIX is but one API provided by a set of runtime libraries and servers.)

Another Meaning of "Computing Needs Time"

Not only is it interesting to talk about how to build computing systems that take account of time constraints in their interactions with the world (the discussion so far), but also to note that computing itself takes up time. It may be interesting to turn some attention to the question of how, in a programming language (or library or operating system), we allow the programmer to express a computation that treats one or more other computations as taking time.

For example, a procedure could be passed a capability that represents the authority to chew up a certain amount of processor time, and we could invoke the capability and say to it, "here, dedicate up to half a second, divided equally, to calculating g(x) and f(y). Stop and wake me up as soon as either calculation finishes or the time is consumed." The capability invocation could eventually include in its reply representations (or abstractions) of the state of partial completion of whichever of the function applications g(x) and f(y) had not completed. The calling program could, for example, if it still possessed any further unconsumed authority to call for timed computation, call for further prosecution of the unfinished function applications.
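A hypothetical sketch of such a time-budget capability, with `spend_time` an invented name and Python threads standing in for the capability machinery (this does not capture true preemption or the partial-state abstractions, only the budget-and-wake-up interface):

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def spend_time(budget_s, *thunks):
    # Authority to spend up to budget_s seconds computing the thunks.
    # Wakes as soon as any one finishes or the budget is consumed,
    # returning completed results plus the still-pending futures
    # (crude stand-ins for the "partial completion" representations).
    pool = ThreadPoolExecutor(max_workers=len(thunks))
    futures = [pool.submit(t) for t in thunks]
    done, pending = wait(futures, timeout=budget_s,
                         return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)
    return [f.result() for f in done], pending

# "Dedicate up to half a second to g(x) and f(y)":
results, unfinished = spend_time(0.5, lambda: 2 ** 10, lambda: 3 ** 7)
```

A genuine agoric system would also need the pending computations to be resumable under a fresh budget, which threads alone do not give you.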

This relates to "agoric" systems. Computing time can be valued and its value weighed against other goods.

'Signal' language

tutorial says "An other hypothesis is that computations take a null time. So we consider that new values are produced at the same logical instant as data used in their computation."

which made me raise my eyebrows.

on the other hand, i think it would be very hard to predict for how long any given routine will run. :-)

My enthusiastic support, for whatever that's worth

I'd like to weigh in and add my enthusiastic support for all of what Lee says. For whatever my support is worth. I suppose I have some credibility since I've written hard real-time software by hand (Pro Tools Plug-Ins) for DSPs and general-purpose processors, in assembler and C++ respectively, and I've written software to generate hard real-time software automatically (Simulink RTW, now called Simulink Coder).

This is a real crisis in realtime software engineering, a kind of "dirty secret" that pretty much no one except Lee is talking about. This crisis, like many problems in software engineering, is not perceived by the general public, since behind-the-scenes heroics usually get things working in the end.

In fact this crisis, like many problems in software engineering, is not even perceived by many engineers in the midst of it, since they see nothing wrong with these heroics, i.e. to them they are "business as usual."

But we need to make the production of software a disciplined rather than heroic undertaking.

Another factor shielding this and other problems from the public eye is that even when heroics fail, those products are not allowed to get to market so only the very inquisitive shareholder will become aware of where all the money went.

Here I'll just restate the core issue, as I see it. Average hardware performance has soared but worst-case hardware performance has stagnated or degraded, and, what's worse, it is hard to predict or even test. This has made hard real-time software far harder to write.

As is traditional, by hard real-time I mean that the value of an answer goes to zero at a certain time, i.e. a late answer is as bad as no answer at all. But implicit in this whole discussion is that hard real-time also means that the time scales we are talking about are such that average performance cannot be assumed to be achieved.

In other words, perhaps it is easy to design a hard real-time system on modern hardware if it has a period of one second. To put it another way, the performance of modern hardware is great if measured in millions of instructions per second, but terrible if measured in instructions per microsecond, though these might naively be assumed to be the same.

This point about time scales may be obvious, but it is another important reason why this crisis is not widely perceived: to get around the problem, people are designing higher-latency systems. This is how, for example, audio processing on general purpose computers achieves great throughput performance: by trading this off for poor latency performance compared to dedicated-DSP systems. But not all applications can tolerate this tradeoff. For example latency is not a "free variable" in control systems!

It is perhaps interesting to speculate what the situation would be like if, even though worst-case performance had stagnated or degraded, it had not also become so hard to predict. Then perhaps we would just be wasting a lot of circuit board space, heat, and power by using a modern processor. In which case we might as well use one of the modern "spins" of the old processors that are kept alive just for this purpose. But then we'd still be limited in the scope of what our applications could do with this low-though-predictable performance. Unless we could harness parallelism, another hardware development that we and our languages have not yet adapted to very well. But that's a whole other, oft-discussed can of worms.

funny

as i was reading i kept thinking thoughts that you then summarized in your last paragraph :-) since i grew up on 6502/10 and z80. :-)

Neither does global time

We can have global time with only local state.

In practice, you'll still have global shared state resources anyway, such as databases and distributed hashtables.