Immediate mode GUIs

From http://sol.gfxile.net/imgui/:

In typical GUI applications you create a bunch of widgets, they get displayed somehow, you query the widgets for information, you send messages and data to said widgets, and finally clean things up after you're done. Some parts of the above are usually done using some kind of visual editor, but the result is usually tons and tons of code all over the place.

This is ok for most applications, but not so convenient for games or other frame-by-frame realtime applications. So, instead of having one place for init, another for ui callbacks, and a third for cleanup, IMGUI can be as simple as this:

if (button(GEN_ID, 15, 15)) {
  button_was_pressed();
}

A very interesting point in the design space of UI abstractions. It feels kind of like FRP...without the functions.
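
To make the snippet above concrete, here is a minimal sketch of what a `button` call might do internally. The `UiState` struct, the extra `ui` parameter, and the default width and height are assumptions made for illustration; they are not the tutorial's actual code, and drawing is omitted.

```cpp
#include <cstdint>

// Hypothetical per-frame input plus the small amount of UI state that survives
// between frames. All names here are illustrative.
struct UiState {
    int mouse_x = 0, mouse_y = 0;
    bool mouse_down = false;
    std::uint32_t hot = 0;     // widget under the cursor this frame
    std::uint32_t active = 0;  // widget that captured the current mouse press
};

// Returns true on the frame where a press that began on this button
// is released while the cursor is still over it.
bool button(UiState& ui, std::uint32_t id, int x, int y, int w = 64, int h = 24) {
    const bool inside = ui.mouse_x >= x && ui.mouse_x < x + w &&
                        ui.mouse_y >= y && ui.mouse_y < y + h;
    if (inside) ui.hot = id;
    if (inside && ui.mouse_down && ui.active == 0) ui.active = id;  // press begins

    bool clicked = false;
    if (!ui.mouse_down && ui.active == id) {  // press ends
        clicked = inside;
        ui.active = 0;
    }
    // A real implementation would draw the button here, styled by hot/active.
    return clicked;
}
```

The only state that outlives a frame is the pair of ids in `UiState`; the widget itself exists only for the duration of the call.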

The usual difficulty with

The usual difficulty with immediate mode is resource management. E.g. managing lifetimes of loaded textures or displayed windows or subscriptions. One can use caches and windowed memoization, but it often remains awkward and hackish.
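
One way to picture the cache approach is a frame-stamped cache: a resource stays alive only while the UI code keeps asking for it. This is a rough sketch; `Texture`, `FrameCache`, and the eviction window are invented for illustration and do not come from any particular library.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a texture or other costly resource.
struct Texture { std::string path; };

class FrameCache {
public:
    // Called from UI code on every frame in which the resource is needed.
    const Texture& get(const std::string& path) {
        auto it = entries_.find(path);
        if (it == entries_.end()) {
            ++loads_;  // a real cache would load the resource here
            it = entries_.emplace(path, Entry{Texture{path}, frame_}).first;
        }
        it->second.last_used = frame_;
        return it->second.value;
    }

    // Called once at the end of each frame: evict anything that has gone
    // unused for `window` frames (1 = not requested during this frame).
    void end_frame(std::uint64_t window = 1) {
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (frame_ - it->second.last_used >= window) it = entries_.erase(it);
            else ++it;
        }
        ++frame_;
    }

    std::size_t size() const { return entries_.size(); }
    int loads() const { return loads_; }

private:
    struct Entry { Texture value; std::uint64_t last_used; };
    std::unordered_map<std::string, Entry> entries_;
    std::uint64_t frame_ = 0;
    int loads_ = 0;
};
```

The awkwardness shows up in the window size: too small and resources thrash when a widget flickers out for a frame, too large and lifetimes stop being precise.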

In my day job, we eventually built immediate mode GUIs above retained mode dialog systems - pushing all state out to a database so it can be observed, recorded, replayed, and manipulated by external plugins. To delete a window, one could remove an element from the database. Paths through the database provide enough stability that the retained mode dialog model works pretty well. The round-trip latency for pushing all reactions through the local database has proven fast enough.

Immediate mode is a very good fit for reactive and temporal programming models (including FRP).

Immediate-mode also seems to be a prerequisite for ZUIs (zoomable user interfaces) because we must be able to systematically unload and reload UI elements as we observe them. With a windows shell, it would be infeasible to model all forms as "open" all the time (just need to zoom or navigate to view them).

I must say, a fair bit of my RDP design was influenced by my preference for immediate-mode GUIs (and ZUIs) and, in particular, the need to manage resource lifetimes precisely.

Your wish for putting at

Your wish to put windowed memoization to work more seamlessly, if not totally transparently, is obviously appealing as a powerful generalization of memoization (*).

I'd sure aim for the same sort of techniques in the overall arsenal.

To come back concretely to your Fibonacci example: there may be interesting opportunities to leverage what is known today about static analysis, so that language processors end up doing the same thing one does mentally or by hand when removing the recursion from the Fibonacci function:

namely, to notice, as you did of course, that the co-domain value mapped from n depends exclusively (i.e., "purely", in the functional sense) on the values previously mapped from n-2 and n-1, and hence needs a window of only two domain values in that example.

A first design challenge is to find the "best" place to delegate this analysis "burden" (**) or form of optimization. To external specifications or annotations? To annotations inferred more or less comprehensively, with guidance from the source language's semantics? To the language runtime? To (dynamically?) configured strategies in the interpreter/compiler/translator? Etc.

(*) in the sense that "naively" memoizing the entire subset of the domain mapped so far is a sufficient but special case of the more economical windowed memoization, which relies only on the smaller, necessary window implied by the dependencies of the direct, pure recursive call site(s), when that window can be known.

(**) but which can pay off a lot at runtime, clearly!
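
For reference, the hand-derived version of that two-value window looks like the sketch below. The function name is made up; this is the manual transformation, not something a compiler is claimed to produce automatically.

```cpp
#include <cstdint>

// Iterative Fibonacci that keeps only a sliding window of two values,
// fib(i-1) and fib(i), instead of memoizing the whole prefix of the domain.
std::uint64_t fib_windowed(unsigned n) {
    std::uint64_t prev = 0;  // fib(0)
    std::uint64_t curr = 1;  // fib(1)
    if (n == 0) return prev;
    for (unsigned i = 1; i < n; ++i) {
        const std::uint64_t next = prev + curr;  // depends only on the window
        prev = curr;
        curr = next;
    }
    return curr;
}
```

A full memo table would hold n entries here; the window holds two, which is exactly the dependency the recursive definition exposes.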

Widget Identity

The developer of this tutorial uses clever (but obtuse) preprocessor tricks to provide stable identity to widgets, based on line number in source code. This allows modeling multiple buttons, for example.

There are many more possibilities, of course. In my day job, we use dotted paths for structural identity (sometimes matching on types). In Haskell, I've used types for identity (leveraging Data.Typeable). I've been told Plan9 uses its filesystem paths more directly for most application state. In general, associating state directly with file and line number is too rigid, but something like a "path" through the code (allowing relative or dynamic paths) can work well.
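
As a rough illustration of path-based identity, the sketch below keys widget state by a dotted path built from nested scopes. `PathState` and its methods are hypothetical names; the systems mentioned above are not claimed to work this way in detail.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical store that keys widget state by a dotted path through
// nested scopes, rather than by source line or allocation identity.
class PathState {
public:
    void push(const std::string& segment) { scope_.push_back(segment); }
    void pop() { scope_.pop_back(); }

    // Full dotted path of `name` in the current scope,
    // e.g. "settings.audio.volume".
    std::string path(const std::string& name) const {
        std::string p;
        for (const auto& s : scope_) p += s + ".";
        return p + name;
    }

    // State slot for a text widget: created on first use and found again
    // on every later frame that walks the same path.
    std::string& text(const std::string& name) { return text_[path(name)]; }

private:
    std::vector<std::string> scope_;
    std::map<std::string, std::string> text_;
};
```

Two widgets named `device` under `settings.audio` and `settings.video` get separate state, and each finds its own state again regardless of the order in which the scopes are visited.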

I've spent a lot of time thinking about identities with regards to multiple challenges:

  1. stability and performance
  2. live programming
  3. orthogonal persistence
  4. administration and debugging
  5. security

It seems that, in every category, structural or content/type-driven approaches to identity are more elegant and effective than use of new and delete. The security element first seemed problematic to me (since new/delete is fresh capability) but paths can be secured by partitioning abstract spaces and chroot-jails or like techniques. (And access to a particular path can be represented by capability.)

This also appeals to my physical intuitions - resources are neither created nor destroyed, only discovered and composed.

Widget identity

I've got C++ code sitting on my hard drive from about 5 years ago that does something very similar to the OP. I didn't use the line number trick, though. Rather, I just counted widgets (implicitly forming "dotted paths") to support static layout and handler specification of panels with C functions, and then handled dynamically varying sets (e.g. list boxes) with special widgets.

But anyway, in thinking about similar problems as you, I've come to pretty much the opposite conclusion: widgets should have opaque identity rather than trying to assign a stable identity based on structure or content. But I do think pools of identifiers should be semantically local, rather than global as new/delete might imply.

Do you mind giving brief arguments in favor of the structure/context approach in some of your 5 categories? Regarding bullets 2 & 3, my approach would be to not migrate or persist the widgets themselves, but to be able to build some new ones from only the content, such that dropping and rebuilding widget state produces only minor disruption/issues. I'm guessing that you have in mind that the language/run-time would be responsible for maintaining stability of that kind of state, which you wouldn't model explicitly. So we'll end up being very close, but on different sides of a wall.

The association between

The association between widget and widget state is what needs the consistent identity. If we add a textbox, we don't want the old text in the wrong box. And it's a bad idea (for various modularity and flexibility reasons) to couple widget state with the representation of widget structure. In general, we need a consistent association between widget structure and the state it presents despite independent changes in both.

For live programming, a valuable property is that system state is always consistent with what you'd have after a reset. The idealized system is resetting every instant (much like an immediate mode GUI). As always with programming tools, we can make up for a deficiency of any systemic property with some self-discipline and foresight. But use of structural identity removes much need for self-discipline - i.e. we know we're picking the same identity every instant, and any reference distribution (e.g. for routing service subscriptions, or debug views) won't suddenly be holding references to old and now defunct identities.

I have thought a fair bit about this...

...and think I have a pretty good understanding of how immediate and retained mode relate to each other, and to FRP.

I got interested in this question because most GUIs actually need to use a mix of retained and immediate mode -- for example, think of a web page with an embedded video playing in it. It would be crazy to display the video in anything other than immediate mode, since that would waste a ton of memory, but of course you don't want to repaint the rest of the GUI on every tick, since it's not changing, and pointlessly rerendering the screen would burn a lot of CPU (and battery life) for no reason.

Basically, the idea is that you can type stateful objects in a GUI with linear types. Usually, linear types make certain forms of expression harder, but it turns out that most GUI toolkits actually work hard to maintain linearity invariants dynamically -- every node in a DOM or widget hierarchy has a parent pointer, and attempts to give a node multiple parents either cause an error (eg, in GTK) or just silently reparent the node (in JS/DOM).

Now, you can also view a stateful widget as a stream of observable states (let's call them "frames", so a widget is a linear stream of frames). Then, the difference between an immediate-mode GUI and a retained-mode is the API for manipulating widgets.

An immediate mode GUI gives you operators to construct frames, and then you use stream operators (eg, unfold) to build a stream of frames, and hence widgets. A retained-mode GUI, on the other hand, gives you operators to construct whole widgets (ie, streams of frames) at a time, and you manipulate widgets with bulk operations. For example, you can interpret something like node.style.color = "magenta" as an operation which takes a stream of frames, and then maps the function which sets the color property to magenta over the whole stream.
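
The two styles can be sketched side by side under a deliberately naive model in which a widget is a function from tick number to frame. `Frame`, `Widget`, `unfold`, and `set_color` are illustrative names, and the replay loop in `unfold` is there for clarity, not efficiency.

```cpp
#include <functional>
#include <string>

struct Frame { std::string label; std::string color; };

// A widget modeled as a stream of frames: a function from tick to frame.
using Widget = std::function<Frame(int)>;

// Immediate-mode style: frames are built one at a time by unfolding a state.
template <typename State>
Widget unfold(State seed, std::function<State(State)> step,
              std::function<Frame(State)> render) {
    return [=](int tick) {
        State s = seed;
        for (int i = 0; i < tick; ++i) s = step(s);  // replay up to this tick
        return render(s);
    };
}

// Retained-mode style: one bulk operation mapped over the whole stream,
// in the spirit of node.style.color = "magenta".
Widget set_color(Widget w, std::string color) {
    return [=](int tick) {
        Frame f = w(tick);
        f.color = color;
        return f;
    };
}
```

For example, `unfold<int>(0, step, render)` with a step that increments yields a counter widget, and `set_color(counter, "magenta")` recolors every frame of it without touching how frames are produced.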

So that's the semantic view. From the implementation side, a retained mode API lets you use memoization to reduce recomputation when bulk update operations don't happen very frequently. So the widget data structure is really a way to implement memoization.

One thing that would be really nice is to work out a way to give a uniform API, which makes adding or removing those memo structures into something that doesn't require radically restructuring your code. It's pretty common, and terrible for performance, to have API inversions, where you end up implementing an immediate-mode API on top of a retained mode API. Making that easily fixable would be good.

One thing that would be

One thing that would be really nice is to work out a way to give a uniform API, which makes adding or removing those memo structures into something that doesn't require radically restructuring your code. It's pretty common, and terrible for performance, to have API inversions, where you end up implementing an immediate-mode API on top of a retained mode API. Making that easily fixable would be good.

Couldn't agree more with this line of criticism and ideas! I can even speculate that there is probably a huge opportunity for more "killer research results" (thank you, FP ;) that would be very welcome in the (near?) future, as much by applications as by OSes or by barely higher layers, btw.

We certainly don't want to keep repeating the mistakes that have, eventually, taught us better, as you point out. When we step back and think of GUI layers as likely beneficiaries of today's theoretical and practical FP knowledge, some current GUI designs and implementations really feel like huge wastes of CPU time! Let alone of memory... (resources that could no doubt be used for something more valuable than just drawing and repainting; the waste is also hard to excuse with GPUs around, etc.)

Not so sure

For video on a webpage, decoding per frame needs to happen whether the video itself is visible or not (if nothing else because audio is still "visible", but also because the decoding is usually iterative, depending on previously decoded frames). Indeed, it's the decoding that takes time, transferring the image itself is pretty much negligible; deciding if part or all of the frame should be visible is something that happens in the rendering layer.

More on this when I have a proper keyboard in front of me rather than tapping on a carky touchscreen.

Simon

Some challenge problems and food for thought

Neel,

To what extent can you make hard programming problems easier? For example, most toolkits do not come with controls for rendering collections of data, and even fewer allow slicing and dicing that data in new ways, and even fewer let you edit the slices. One thing I have noticed is object-oriented change notification techniques don't scale well with these complex visualization requirements. Let me sketch some scenarios:

  1. Data Grid controls
    • Bulk Update: There are several scenarios to consider here.
      1. Nothing is sorted or grouped (trivial case).
      2. A column is sorted. This case is unique, because we need to know if the updated rows replace the value in this column. From a purely functional perspective, we need to know exactly what in the record was updated.
      3. A column is sorted, but the row updates another attribute than the one pertaining to the sorted column. Seems simple, but suppose there are also validations that require validating two or more columns in a single row when the row is updated. For a practical example of how this could be a problem, in current WPF Data Grids, the built-in mechanism for validation is coupled to a property change notification, which also therefore suggests to the grid that the row's value has changed, which therefore will trigger re-sorting.
      4. A column is grouped.
      5. A column is sorted and grouped.
      6. A column is filtered.
      7. An external event, such as a request to Save all records, forces validation checks to run.
      8. Some validations are ideally instantaneous, such as checks for entering duplicate values. Visually, we would like to display error icons in each row's "row status indicator" area to indicate there is an error, so that when the user mouses over the icon, they can see a report of the issue. Ideally (in my humble opinion), rather than a tooltip on mouse over, the user can use the icon to interact with the incorrect rows in a new view.

These are a lot of engineering concerns just for one component.

In addition, something I have never seen ANY toolkit get right is scrolling! Consider the following scenarios:

  1. Finite-sized UI content within a scrolling container.
    • If the finite-sized content
  2. Infinitely-sized UI content within a finite, non-scrolling container.
    • e.g. a Data Grid can display as many rows as you allow it to, and after that limit is reached, will allow the user to scroll to view the remaining content.
  3. Infinitely-sized UI content within a finite, non-scrolling container, where a portion of the visible infinitely-sized UI content can be expanded and collapsed.
    • e.g. a Data Grid can have a Row Expander, and if the expandable content's finite-size exceeds the Expander's viewport, then the content should scroll.
  4. Infinitely-sized UI content within a scrolling container.
    • e.g., a Data Grid is capable of displaying 1 million rows, and if it thinks it has infinitely many pixels to draw to, then a retained mode grid will try to retain 1 million rows worth of visualizations. However, what we really want is a scrolling container that behaves a lot like a regular container, except if the user re-sizes the window or undocks their laptop and causes a resolution change, they can still see and edit rows in the Data Grid.
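
For the million-row case, the usual trick is to retain visuals only for rows that intersect the viewport and recompute that range whenever the scroll offset or viewport size changes. A minimal sketch, assuming uniform row heights (a simplification; `RowRange` and `visible_rows` are made-up names):

```cpp
#include <algorithm>
#include <cstdint>

// Half-open range [first, last) of row indices.
struct RowRange { std::int64_t first; std::int64_t last; };

// Rows that intersect the viewport, assuming every row has the same height.
// Only these rows need live visuals; the rest exist only as data.
RowRange visible_rows(std::int64_t row_count, std::int64_t row_height,
                      std::int64_t scroll_y, std::int64_t viewport_height) {
    const std::int64_t first = std::max<std::int64_t>(
        0, std::min<std::int64_t>(row_count, scroll_y / row_height));
    // Round up so a partially visible last row is included.
    const std::int64_t end_row =
        (scroll_y + viewport_height + row_height - 1) / row_height;
    const std::int64_t last = std::max<std::int64_t>(
        first, std::min<std::int64_t>(row_count, end_row));
    return {first, last};
}
```

A window resize or resolution change just means calling this again with a new `viewport_height`; nothing about the million rows themselves has to be rebuilt.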

Furthermore, there are basic "sanity checks" any UI should provide as contractual guarantees:

  1. When I mouse out of one area and mouse into another area, the mouse out event should always fire before the mouse enter event. This is actually difficult to guarantee, and the reason has to do with mapping GUIs to screen resolution. In scenarios where "borders" must "snap" to device pixels, it's not clear if the user is coming or going. Should a UI allow the user to construct such scenarios? If so, how should the language facilitate awareness of such edge cases and improve the developer's ability to handle the situation "correctly"?
  2. Modal dialogs. These tend to be either horribly broken or incredibly useless in any toolkit I have ever used. Trying to code against them in a way that is easy to test is a nightmare. For example, in WPF, if a custom modal dialog is closing as a standard OK message box dialog is displaying, the OK/Cancel message box will select the closing window as its parent. Since its parent then gets closed, the OK message box automatically dies without returning OK or Cancel. The workaround is to use a .NET SynchronizationContext, but the flip side to asynchronous modal dialogs is that they are no longer actually modal dialogs, as they are no longer bound to the UI thread. Furthermore, in older web browsers, modal dialogs do not have Run-To-Completion semantics and it is possible to subvert modality. I'm not actively involved in web development, so I can't speak to the state of the art across browsers.
  3. On a simpler note, we shouldn't overthink GUIs at the language level, and pay attention instead to the gross accidental complexities in toolkits, such as alexfromsun's rant about how poorly designed Java Swing Listeners were.
  4. Synchronization of sound and video.
    • Again, don't overthink this. Flash is dead in part because, although artists loved the concept of working with keyframes, it made playback impossible to get right cross-platform. Thinking about how to come up with a design that you can regression test against a server farm full of different GPU, CPU and main memory configurations is a real challenge.

For what it is worth, I really like the approach David Barbour suggested on LtU years ago on how to design and use GUI toolkits. He didn't address any of the above concerns, but instead he addressed how to think of UIs as faceted configurations. I could really see his approach working, to the tune of even building an IDE to support the approach, in much the same way Visual Studio supports XAML, Xcode and Interface Builder support Cocoa, etc.

An immediate mode GUI gives

An immediate mode GUI gives you operators to construct frames, and then you use stream operators (eg, unfold) to build a stream of frames, and hence widgets. A retained-mode GUI, on the other hand, gives you operators to construct whole widgets (ie, streams of frames) at a time, and you manipulate widgets with bulk operations.

The point about immediate mode UI is the implicit while (true) loop that ensures it is executed continuously (semantically). So whatever you do inside your UI code is necessarily operating on all frames.

So that's the semantic view. From the implementation side, a retained mode API lets you use memoization to reduce recomputation when bulk update operations don't happen very frequently. So the widget data structure is really a way to implement memoization.

We can still reason about memoization in an immediate-mode API; like memoizing AST trees in a (very) incremental compiler. You could also "optimize" the while (true) loop abstraction so that it wasn't really doing everything on every frame.

I'll reply soon...

I haven't ignored your comment -- the POPL deadline is today.

Actually, the deadline is in

Actually, the deadline is in approximately 25.5 hours.

The point about immediate

The point about immediate mode UI is the implicit while (true) loop that ensures it is executed continuously (semantically). So whatever you do inside your UI code is necessarily operating on all frames.

I agree, and this is also the essence of FRP and synchronous dataflow languages: specifying a program as a behavior over (discrete) time is equivalent to specifying how to update a program state at each tick.

When I've implemented FRP languages that connect to existing toolkits, the runtime usually contains code to re-synthesize that big while loop -- I register a timer to run at (say) 60 Hz, and then re-evaluate the dataflow graph each time that timer event fires. Now, most GUI toolkits tend to specify events such as mouse clicks asynchronously. So to handle that impedance mismatch, the callback registered to a GUI event handler writes some data into a buffer, and an event stream reads that buffer to generate the data for each logical tick.
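
The buffering step can be sketched roughly as follows. `Click`, `EventBuffer`, and `tick` are invented names, and the sketch assumes the toolkit callback and the timer run on the same UI thread, so no locking is shown.

```cpp
#include <vector>

struct Click { int x; int y; };

// Bridges asynchronous toolkit callbacks to synchronous logical ticks.
// Assumes callbacks and the timer share one thread (no locking).
class EventBuffer {
public:
    // Called from the toolkit's click handler, at arbitrary times.
    void push(Click c) { pending_.push_back(c); }

    // Called once per logical tick: hands over everything that arrived
    // since the previous tick and leaves the buffer empty.
    std::vector<Click> drain() {
        std::vector<Click> out;
        out.swap(pending_);
        return out;
    }

private:
    std::vector<Click> pending_;
};

// One logical tick of a toy "dataflow graph" that just counts clicks.
// A timer firing at, say, 60 Hz would call this repeatedly.
int tick(EventBuffer& events, int click_count) {
    return click_count + static_cast<int>(events.drain().size());
}
```

From the dataflow graph's point of view, all clicks that arrived between two timer firings appear to happen at the same logical instant.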

We can still reason about memoization in an immediate-mode API; like memoizing AST trees in a (very) incremental compiler. You could also "optimize" the while (true) loop abstraction so that it wasn't really doing everything on every frame.

What I'd like to be able to say is that a retained mode API is the same thing as an immediate-mode API plus memoization. This isn't quite true, because different event handlers in GUI toolkits can fire at different rates. But it seems plausible to me that if you extend FRP with a clock calculus (a la the synchronous dataflow languages), then it could be true.

I actually think Immediate

I actually think Immediate and Retained are quite different APIs now, with no easy equivalence. We can have a retained API over an immediate one, or even an immediate API over a retained one (with memoization/identity recycling).

As I've always implemented my not-quite version of FRP (SuperGlue), it has always been either on top of a retained API (Swing) or an immediate mode API with a small retained layer. FRP provided the illusion of immediacy, but we still needed a retained implementation underneath that.

immediate vs retained

What I'd like to be able to say is that a retained mode API is the same thing as an immediate-mode API plus memoization. This isn't quite true, because different event handlers in GUI toolkits can fire at different rates. But it seems plausible to me that if you extend FRP with a clock calculus (a la the synchronous dataflow languages), then it could be true.

This seems like an odd way to distinguish between retained and immediate mode. (I should note that while I have a fair amount of experience with immediate mode GUIs, I have only a modest background in CS, so if my definitions are inconsistent with standard academic usage, please correct me.)

In my mind, the concept of immediate mode GUI means that the application supplies a function which maps the current application state to a UI, where a UI includes both a visual representation and a means of mapping input events back to state changes. It's the job of the UI library to ensure that the way the UI actually looks and behaves always reflects the result of applying this function to the current application state. In retained mode, on the other hand, the UI itself is state that the application manipulates directly. Any invariant that is supposed to exist between the UI and the application's model state is maintained explicitly by the application through event handlers. (Most retained mode GUI libraries now support data binding, which I see as a limited injection of immediate mode into an otherwise retained mode API.)

As far as this distinction goes, the rate at which UI events are delivered seems irrelevant to me. At the level of the event loop, an immediate mode library should be identical to a retained mode library. The difference is in how they determine how those events affect the UI. An immediate mode library maps the event to a state change, and then maps the new state to a new UI. The retained mode library routes the event to some widget (or application) object and lets it worry about it.

As for memoization, I understand that while I'm calling the retained mode UI 'state', you're thinking of it in a functional sense as applying successive manipulations to some starting UI. So in that sense, I wouldn't argue with you saying that the retained mode UI is memoized. However, I don't understand why you see memoization as incompatible with immediate mode. As I said, the UI is the result of applying a function to some state. The function is likely pretty complex, so memoizing its result (and the result of subexpressions within it) is important for performance. Some subexpressions are just the application of pure functions (e.g., generate an image of the number 5 rendered in 12 pt arial), so memoizing those is easy. At a higher level, though, it's a little more complicated, since the UI function is allowed to associate arbitrary state with any part of the UI (i.e., any point in the evaluation of the function), and that state can affect the mapping. However, for any given UI region, the UI is only dependent on that region's inputs and its internal state. Thus, if you can detect changes in both, you can memoize the resulting UI.

So in the example you gave above with a video playing inside a large web page, the rendering of the static portions of the web page could be memoized while the video is continuously updated. I think this is what you're asking for in the last paragraph. You specify both with a uniform, immediate mode API, but you just wrap the static portions in a 'memoize this' block, and it becomes as efficient as it would be with a retained mode API.
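
A 'memoize this' block might look something like the sketch below, where a region is re-rendered only when its inputs change. `RegionMemo` is a hypothetical helper, the inputs are flattened to a string for simplicity, and a region's internal state would have to be folded into that key as well.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical 'memoize this' helper: each UI region is identified by an id
// and re-rendered only when the value describing its inputs changes.
class RegionMemo {
public:
    const std::string& region(const std::string& id, const std::string& inputs,
                              const std::function<std::string()>& render) {
        Entry& e = cache_[id];
        if (!e.valid || e.inputs != inputs) {
            e.output = render();  // inputs changed: do the real work
            e.inputs = inputs;
            e.valid = true;
            ++renders_;
        }
        return e.output;          // otherwise reuse last frame's result
    }

    int renders() const { return renders_; }

private:
    struct Entry { bool valid = false; std::string inputs; std::string output; };
    std::map<std::string, Entry> cache_;
    int renders_ = 0;
};
```

In the web page example, the article region would pass an unchanging key and render once, while the video region would pass the frame number and render every tick, both through the same call.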

A different way to look at retained vs. immediate mode

Thomas,

Some brief background. Functional reactive programming started out as a way to express graphics programming using a denotational semantics. It came after Conal worked for years at Sun on object-oriented reactive graphics toolkits that were complete flame-outs. Conal really had a breakthrough in creating FRP.

But there are no royal roads to computer graphics. Functional reactive programming approaches like Yampa have semantics that essentially say the entire widget hierarchy needs to be destroyed and recreated at every time step. That's ridiculously expensive. Due to this engineering problem, FranTk and wxFruit have had to allow some imperative programming elements to help create and destroy subgraphs, which wash away some of the benefits of having a pure semantic model for a GUI. We want to be able to precisely reason about what an event will do.*

All NeelK is trying to do is solve this problem, as I understand it.

(Most retained mode GUI libraries now support data binding, which I see as a limited injection of immediate mode into an otherwise retained mode API.)

Data binding is orthogonal. In terms of deciding what to paint, the key is efficient damage calculation and repair: figuring out what pixels are invalidated and need repainting and what code to execute to fix that. Most retained mode systems organize widgets as a hierarchical "scene" graph that is a mix of objects and data like behaviors, lighting, cameras, and coordinate transforms. Such abstraction allows the client programmer to orient visualizations as she sees fit. In addition, the graph also helps with bounds management and "picking" (identifying which visual object the mouse is over and the precedence in which visual objects can handle a mouse click). Ideally, if we have a scene graph, it should also be dynamic and support multiple views. Dynamic scene graphs are important in retained mode graphics, because, for example, you can't do drag and drop without them and you can't do stuff like downloading web content without them. Cameras are important because having a viewing frustum basically means there is no such thing as manually repainting the screen. You simply pan, zoom, tilt and the viewing frustum is updated accordingly. It also makes arbitrary 3D transformations easy. Ideally, cameras can even be embedded within the scene graph, to maximize the pan/zoom metaphor: this allows trivially comparing two distant locations simultaneously or viewing a zoomed-out view in conjunction with a thumbnail close-up, as is common in vector graphics drawing apps like Illustrator. All of these considerations help play into a good design for damage calculation and repair.

* There are actually also a number of ways current FRP approaches don't really live up to the claim of making code easier to reason about, but that is a topic for another discussion.

Postscript: Here is a link to a research paper I discussed on my blog regarding immediate vs. retained mode graphics, as applied to supercomputing. Note there are a number of accidental complexities related to (lack of) language semantics.

http://z-bo.tumblr.com/post/380524273/extended-abstract-retained-mode-parallel-rendering

terminology

Thanks for the background info about FRP. I was aware of it and the parallels with IMGUI, but I hadn't found those particular implementations, and they look interesting. It will take me a while to get through all the papers describing them and other aspects of FRP, but I'm certainly willing to believe that memoizing an FRP GUI in Haskell presents very different challenges than memoizing an immediate mode GUI in C++. That is, if what I'm describing should even be called 'immediate mode'. I think another source of confusion for me is that I have a non-standard definition of immediate mode, in that I allow for memoization. Wikipedia's definition, on the other hand, includes the following.

Instead, the application must re-issue all drawing commands required to describe the entire scene each time a new frame is required, regardless of actual changes.

This would explain why the article you linked to states that an immediate mode API would introduce bandwidth issues. Similarly, when I first read your paragraph on minimizing updates and scene graphs, it seemed mostly irrelevant to the immediate/retained distinction, but with this stricter definition of immediate mode, it makes a lot more sense.

But now that I realize this, I'm wondering what to call my interpretation of immediate mode. Is there a standard term for an immediate mode-style API that utilizes memoization to avoid traversing/processing parts of the scene that haven't changed?

Conventional immediate mode

Conventional immediate mode APIs do not account for memoization. But an API could provide the illusion of being immediate while doing memoization transparently. In fact, FRP can be like this: the illusion of a pure declarative UI with memoization to optimize re-evaluation of the dependence graph. I don't think it's an abuse to call an API "immediate" even if its implementation is "not really."

Glitch freedom

Memoization is cool, but the related concept of glitch freedom (as applied in FrTime) is also valuable if we are going to talk about illusions.

Glitch freedom is just not

Glitch freedom is just not taking any action until your system has stabilized :)

"Consistent-eventually" systems.

That's really a "consistent-eventually" system. "Consistent-eventually" UIs used to be common, and you could watch your system catch up after a major window change. Google Maps is a nice example of a modern consistent-eventually system.

For a UI to be free of transient glitches is harder. There's the brute-force approach used in console games - redraw everything on every frame. (The game console doesn't have anything more important to do, after all.) There's using an eventually-consistent system on something fast enough that it catches up within a frame time or two. (That's Microsoft Windows on modern hardware). There are systems where some parts of the display are being updated by background tasks, while others are eventually consistent. (Smooth scroll often works that way.)

All of this is being impacted by the battery life issue in mobile, where total CPU cycles used matter.

Physics engines

Physics engines are a canonical example of consistent-eventual systems: constraints are intentionally "springy" and solved over time via an iterative algorithm.

Two thoughts

1) John Nagle, awesome.

2) Glitch freedom can be related to fault tolerance and partitioning barriers. Glitch freedom basically guarantees you can't compose a new value with an old value. It is possible to do this by just re-evaluating every function, but it is nice to have it be incremental, too. There have been discussions on LtU relating self-adjusting computation to Reps' work on attribute grammars, trying to find some principles for incremental computation.
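
A tiny diamond-shaped dependency shows what "composing a new value with an old value" looks like. The graph below is hard-coded for illustration (b = a * 2, c = a + b); updating dependents in topological order is one common way to avoid the glitch, not the only one.

```cpp
#include <vector>

// Toy dataflow graph: b = a * 2, c = a + b.
struct Graph {
    int a = 1, b = 2, c = 3;
    std::vector<int> observed_c;  // every value of c that became visible

    // Naive push: dependents are recomputed in notification order, so c is
    // first recomputed from the new a and the stale b (a glitch).
    void set_a_naive(int v) {
        a = v;
        c = a + b; observed_c.push_back(c);  // new a combined with old b
        b = a * 2;
        c = a + b; observed_c.push_back(c);  // finally consistent
    }

    // Glitch-free: recompute in topological order (b before c), so c is
    // only ever computed from values belonging to the same update.
    void set_a_ordered(int v) {
        a = v;
        b = a * 2;
        c = a + b; observed_c.push_back(c);
    }
};
```

With a set to 2, the naive version briefly exposes c = 4 before settling on 6, while the ordered version only ever exposes 6.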

Dude, terminology totally depends on what tricks you pull :)

Have a look at the following URL for an idea on one way to cache immediate mode graphics programming APIs: http://lambda-the-ultimate.org/node/2282

With partial evaluation, you compute as much of the computation as possible without having all the available inputs. Similarly, for fixed objects and lighting, specializing a ray tracer via partial evaluation essentially inlines all the traversals of those fixed nodes into the resulting specialized ray tracing program.

Another technique is called "pass separation". Basically, the output of the first part becomes the input of the second part. Obviously if you can cache the first part you can win big performance gains.

Broadly, these optimizations are called staging transformations.
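A hand-written sketch of pass separation, under the assumption of a fixed light and a simple diffuse term (make_shader and shade are illustrative names): the first pass does the work that depends only on the fixed input, and its output is captured by the second pass, which runs per query.

```python
import math


def make_shader(light_dir):
    """First pass: depends only on the fixed light, so it runs once."""
    length = math.sqrt(sum(c * c for c in light_dir))
    unit_light = tuple(c / length for c in light_dir)

    def shade(normal):
        """Second pass: runs per surface normal, reusing the cached unit vector."""
        return max(0.0, sum(n * l for n, l in zip(normal, unit_light)))

    return shade
```

A partial evaluator would derive shade from an unstaged shade(light_dir, normal) automatically; here the split is done by hand with a closure.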

Hi Thomas,Thanks for your

Hi Thomas,

Thanks for your comments. The difference I'm concerned with doesn't lie in the implementation, which as you observe is fundamentally similar in both approaches, but rather in how clients of the library are supposed to think about their design. Suppose you create a widget -- say, a button -- in a retained mode API. Now, to react to clicks of the button, you give the button widget an event handler routine, which it invokes "whenever the button is clicked". That is, as a programmer you aren't supposed to think about the event loop that is used to implement the retained mode API.

In particular, if you create two buttons, a retained mode API gives you no information about the relative orderings of the click events from the two buttons. If you need that information, you need to track it yourself with some state variables. That's what I meant by different event sources running at different rates.

In an immediate mode system, or an FRP system, there is a global clock that everyone synchronizes with. This means that a lot of concurrency-style reasoning is no longer necessary, and programming GUIs gets a lot easier. However, it also means that there isn't an immediately obvious way to go back and forth between the two styles of API.

I see. Thanks for the

I see. Thanks for the clarification. I'm starting to see that FRP and my interpretation of immediate mode are more different than I initially thought. As I understand it, the global clock that you're talking about is a consequence of the fact that in FRP, everything is modeled as a function of time.

I have a less radical design in that the event loop is not entirely hidden from the application. When the application specifies that there is a button in the UI, it also specifies what to do when the button is pressed. Since this is in C++, the "what to do" part is generally just updating some state. (I suppose if this were a pure language, you could model it as producing the next value in a stream, but that's really outside my area of expertise.) The goal of my design (and probably that of most IMGUIs implemented in impure languages) was not so much to eliminate all reasoning about state, but rather to eliminate reasoning about redundant state. In other words, to reduce the set of state variables to a minimal, orthogonal set, and then represent the entire appearance and behavior of the program as functions of these variables.

I'm curious though about the benefits of moving away from reasoning about the event loop. Can you give an example of what you mean when you say that it eliminates "concurrency-style reasoning"? UI event handlers are naturally atomic, so the normal difficulties of reasoning about stateful concurrency (as I see them) don't apply.

Simple example

In WPF, a Button has a Click event and a Command property. The Click event is raised before the Command is invoked. Since the event loop is internal to the Button object, there is no way to override this behavior. You would have to re-write Button or use some inheritance to override the Button's behavior. But since the Button's event loop is internal, the very idea of using a subclass approach indicates there is some exposing of the event loop's state process and how to muck with it. But this exposing is simply a guess, not actual mathematical reasoning.

However, I disagree with the idea that a Retained mode GUI has to work this way. Note that it is essential for a Retained mode GUI to expose the main event loop. In Win32, this is the message pump. In WinForms, this is Application.DoEvents. In WPF, this is DispatcherFrame. In Swing, this is AWT UI event Hell, as all event listeners go through the UI thread and are blocking operations. ;-0 A classic example of why you need such functionality in a retained mode system is any short-lived frame sequence whose run-to-completion is critical to your application and cannot be interrupted. e.g. the sequence of event orderings.

I do agree, though, that retained mode systems make the event loop internal whereas FRP makes it external, and also requires threading all state through the program.

Edit/postscript: MaggLite is an example of a system that uses a scene graph yet doesn't require internal event loops. Instead, they have the notion of an interaction graph.

Bottom line: Retained mode is about object identification and manipulation, not event loops.

Better late than never in this crazy world of code

What I'd like to be able to say is that a retained mode API is the same thing as an immediate-mode API plus memoization. This isn't quite true, because different event handlers in GUI toolkits can fire at different rates. But it seems plausible to me that if you extend FRP with a clock calculus (a la the synchronous dataflow languages), then it could be true.

In any case, it's pretty interesting to see that some of us (like, on this LtU thread) have enough critical thinking to start seriously wondering about the essence of such distinctions (rather fundamental, I suppose) between immediate and retained mode UIs.

Especially when one can also check for oneself that technological domain fragmentation, in design and/or implementation, has probably(?) never been so alive and kicking in these waters of our crazy world of code.

So to handle that impedance

So to handle that impedance mismatch, the callback registered to a GUI event handler writes some data into a buffer, and an event stream reads that buffer to generate the data for each logical tick.
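A rough sketch of that buffering, assuming a single-threaded event loop and made-up names (EventBuffer, on_event, drain, tick): the toolkit callback only appends, and the tick function consumes everything that arrived since the previous tick.

```python
from collections import deque


class EventBuffer:
    """Hypothetical bridge between toolkit callbacks and logical ticks."""

    def __init__(self):
        self._pending = deque()

    def on_event(self, event):
        """Registered as the GUI callback; it only records the event."""
        self._pending.append(event)

    def drain(self):
        """Called once per logical tick; returns and clears buffered events."""
        events = list(self._pending)
        self._pending.clear()
        return events


def tick(state, buffer):
    """One logical tick: fold the buffered events into the state."""
    for kind, amount in buffer.drain():
        if kind == "scroll":
            state["offset"] += amount
    return state
```

If callbacks can fire on another thread, the buffer would need a lock or a thread-safe queue; that is left out here.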

How would this technique correctly handle "free-spin mode" on my Logitech Performance Mouse MX? Many applications incorrectly handle the inputs from this mouse, because they buffer the hyper-fast scroll wheel incorrectly, and thus do not appropriately respond to when I "hit the brakes" on the mouse.

Note, I am not saying you implemented asynchronous events wrong, but I am giving you a hard test case. IBM Rational ClearCase Explorer's Merge Manager incorrectly handles these free-spin mode inputs, presumably because it is buffering the scrolling but not monitoring the event stream for the absence of scroll wheel events. To be honest, I am not sure of the cleanest way to code this myself :\

Edit: Actually, I got it; it's a simple rate-feedback equation.

Common Lisp Interface Manager's Output Recording

These ideas are a lot like CLIM's output streams with output recording and history. The default of sorts is to be able to stream out representations of objects. CLIM provides the capability to then incrementally redisplay if desired, but the simplest capability is just to write out things to display and stop worrying about them.

Object identity or "who am I?"

If we have object identity, then we definitely have encapsulated state since we could simulate that with an open dictionary (using the object's identity as the key).

Identity is something I should have talked more about in my 2007 live programming paper, but I didn't realize the problem had a name then :). One way to provide identity is to use the position markers in the editor (synced with edits of course); multiple position markers are involved if there is any kind of call chain. Then every loop iteration needs a unique ID; for a counting loop this is simple enough, just use "i," and other loops have their own unique keys (not necessarily ordered!); some loops might not have unique keys and hence we can default to order-based counting.

Now once we have identity, we can use that as a key into a dictionary to access the object's state. On refresh, we trace what keys were live before, and flush the dictionary of any keys that are no longer live on the refresh (as well as "shut" them down; i.e., remove a widget from a canvas). This is some form of live garbage collection.
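A minimal sketch of that dictionary and its "live garbage collection", with hypothetical names throughout (StateStore, state_for, and the ("row", item) key standing in for a position marker plus loop key):

```python
class StateStore:
    """Hypothetical identity-keyed state with a liveness sweep per refresh."""

    def __init__(self):
        self._state = {}
        self._live = set()
        self.closed = []  # keys shut down so far, e.g. widgets to remove

    def begin_refresh(self):
        self._live = set()

    def state_for(self, key, make_default):
        """Return the state for this identity, creating it on first use."""
        self._live.add(key)
        if key not in self._state:
            self._state[key] = make_default()
        return self._state[key]

    def end_refresh(self):
        """Flush every key that was not touched during this refresh."""
        for key in list(self._state):
            if key not in self._live:
                del self._state[key]
                self.closed.append(key)


def render(store, items, clicked=None):
    """One refresh: each loop iteration gets identity ("row", item)."""
    store.begin_refresh()
    shown = {}
    for item in items:
        row = store.state_for(("row", item), lambda: {"clicks": 0})
        if item == clicked:
            row["clicks"] += 1
        shown[item] = row["clicks"]
    store.end_refresh()
    return shown
```

State survives as long as its key keeps showing up; once an item drops out of a refresh, its state is discarded and a later reappearance starts from the default.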

Non-declarative References are definitely a problem. If it’s just signals (or YinYang-style behaviors!), then the connection won't occur if the object is no longer alive. But all bets are off if we want to store a reference to it.

overly eager GC?

Wouldn't doing garbage collection like that incorrectly discard state that is associated with regions that are inactive but still accessible? For example, if checking a check box reveals additional widgets, the state of those would be lost whenever the box is unchecked.

I use a slightly more restricted approach. Although I GC memoized results whenever the associated UI region is excluded from the current UI, I only GC state in the case you described where a loop has iterations with locally unique IDs. If an ID disappears from the sequence, then I assume the state associated with it is no longer needed. However, I'm thinking even this is overly eager, as an undo operation could make that state relevant again. I suppose one could define the reachability of UI state in terms of application state, but failing that, in an application with unlimited undo, maybe it's best to simply never discard state?

A matter of semantics

There is a big difference between the following two pieces of code:

val cb = new CheckBox();
if (cb.Checked) {
  new TextBox() { Text = "Hello" };
}

Assume that widgets are placed into their containing panel on creation. In this case, we have a text box initialized with "Hello" that is created and visible when the check box is checked. The user can edit the text box, but as soon as they "uncheck" the check box, the edits are lost. Not good. But we could write the code another way:

val cb = new CheckBox();
new TextBox() { Text = "Hello", Visible = cb.Checked };

In this case, text box will hang around as an object whether the check box is checked or not, but will only show up when the check box is checked. Any edits are preserved when the check box is unchecked and shown when the check box is checked again.

So the "Visible" property is kind of weird; it's something that we use in retained mode APIs but usually not in immediate ones. When Visible is false, it's like saying render this...but no, not really. This also implies that we don't actually render in an immediate way, that rendering is actually retained and only the API looks "immediate:" we only render what the function tells us to, but the widgets persist between function calls.
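Roughly, that arrangement could look like the sketch below (Frame, check_box, text_box, and the widget ids are all invented for illustration): the function is re-run every frame and decides what is drawn, while the widget state lives in a dictionary that outlasts each call.

```python
class Frame:
    """Hypothetical per-frame context; widget state survives across frames."""

    def __init__(self, retained):
        self.retained = retained  # widget id -> state dict, kept between frames
        self.drawn = []           # text boxes rendered this frame (only these, for brevity)

    def check_box(self, wid):
        return self.retained.setdefault(wid, {"checked": False})["checked"]

    def text_box(self, wid, initial, visible=True):
        # The state is created or reused whether or not the box is shown.
        state = self.retained.setdefault(wid, {"text": initial})
        if visible:
            self.drawn.append((wid, state["text"]))
        return state["text"]


def ui(retained):
    """Immediate-looking description of the second variant above."""
    frame = Frame(retained)
    checked = frame.check_box("cb")
    frame.text_box("tb", "Hello", visible=checked)
    return frame.drawn
```

Unchecking the box stops the text box from being drawn, but its edited text is still in the retained dictionary when the box is checked again.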

In this case, text box will

In this case, text box will hang around as an object whether the check box is checked or not, but will only show up when the check box is checked. Any edits are preserved when the check box is unchecked and shown when the check box is checked again.

I think this is probably a "dangerous" fiction. It encourages pushing application state into the UI. Little trivial examples aren't too problematic, but a few pebbles can quite easily turn into a rockslide.

Better IMO to make the UI a pure function of the application state, and provide a data layer that properly models the state transitions you want clients to experience (essentially lenses). This orthogonal data layer can then provide additional important properties, such as persistence, recovery from network partitions in case of remote UIs (like web apps), etc.

Embracing UI state

Only UI state is pushed into the UI. The debate here really is: does the UI really have state, or is all state a part of the application? I can see the appeal of ideal MVC of a state-free UI that depends solely on application state, but I think this is purist fiction. After all, the selected row of a table, how can we seriously argue that this is actually application state and not UI state?

I'm a full believer in UI state, even for check box checked values and text box text values, which we could otherwise argue should be bi-directional bindings to application state. But those declarative bindings are way too obtuse, and sometimes we might want to do something pesky like validate or regulate the change, adding even more complexity to our binding facilities. For immediate UI, these problems seem to be exacerbated.

So in my view, we have UI state, we have app state, and we have the standard assignments needed to keep the two in sync, just that with the "loop" abstraction, those assignments are often continuous.

The value produced by a

The value produced by a control is qualitatively different state than the existence of a control as in your example, or its visibility, its font, etc. The former values are the result of a user action that cannot be contained or derived from application data, the latter are defined by the application data, or are a function of application data + user input.

The more I work on this stuff, the more my thoughts align with functional relational programming. Application data and UI control values are essential state, everything else is accidental state and should be avoided.

Accidental State

I agree that some UI properties aren't intrinsically stateful: visibility, font, position, etc. I didn't mean to imply that Visible was a stateful property of TextBox; actually, it is something that is bound/connected (write-only on one side, read-only on the other). But anything that can be written by multiple players (user and code) should be state; e.g., button checked, text edited, bar scrolled. The UI probably has other state that is not observable but exists for aesthetic reasons (a button's animated push effect).

Some non-stateful properties must become stateful in certain cases, such as non-declarative animation of position through a physics engine, but we could argue that is merely binding to another stateful cell. But then maybe they could be stateful in the first place? Maybe state is the default and non-state is an exception gained through continuous assignment to some other state (data binding in WPF). This seems more usable than a stateless default.

But regardless, state must be observable in code, which I'm still trying to wrap my head around. If "state happens" and we can't avoid it, what can we do to embrace and deal with it?

User Model State

I've had to face this issue when designing ZUIs. I agree that we need more than a little state to represent the user model. I've found it useful to split state across multiple dimensions and handle each dimension separately:

  • User navigational state: relates to panning, scrolling, progressive disclosure and drill-down, sorts and filters, the address bar, etc. Easy to bookmark. Easy history (timelines, etc.) Spatial metaphor, but many dimensions to account for scrolling etc. Spaces have automatic layout, though that layout may be a function of object state (allowing some animations).
  • User eye state: model the users as wearing 'glasses' of sorts, with interchangeable and programmable lenses. These lenses provide HUD, annotations, highlights, alerts, transparency, themes and style preferences, language translations, etc. They are primarily transformative, but some lenses may be stateful or bound to stateful resources. The keyboard cursor and selection text would be provided by the lens in cooperation with the hand.
  • User hand state: includes clipboards, brushes and tools (interchangeable and programmable), references to recently selected/grabbed/touched objects (and history), etc. Influences how we interact with objects. I favor a game-like inspiration here - e.g. to navigate a locked file, you must have the right crypto-key in your key ring, which is in your hand. We can open our hand and navigate it, too.
  • Object state - Each object may keep state based on its own responsibilities. Objects may provide (or reference) hints and suggestions about how to render. Some objects (e.g. portals) may provide more spaces to navigate, depending on their APIs. Some objects may represent relationships between users and other objects - forms or contracts, created when the user interacts with an object, then kept in hand.

This structure is designed for my ZUI object browser concept. It is no-application, but code may register some objects to shared spaces, making them accessible. The role of applications is fulfilled by interaction with objects.

Support for user model (UI) state with immediate mode views seems to work quite well in the context of a browser. I doubt it would work so nicely if each application (of many) maintains its own user model.

UI has state. For example

UI has state. For example the position of a scrollbar, the position of the cursor, the column that a table is sorted by, the state of an animation. If you want to make the UI a pure function of the application state, you'll have to incorporate that state into your application state. IMO it is desirable to have some mechanism to separate internal widget state from the state of the data that the widget represents, so that the client of the widget does not have to manage the widget's internal state. Edit: I see that Sean has already said pretty much the same above.

The point is to eliminate

The point is to eliminate the duplication of state between application and UI, thus requiring synchronization logic. UI state that the UI manages entirely on its own was never a problem.

Is it?

Suppose you have a business requirement that says, "At time of save, null out all database fields which are not visible in the UI."

Where do you put this logic, and how do you test it, using today's frameworks?

I'd first ask for clarification

I'd first ask for clarification on why this clearing should happen at time of save. What is the precise user story here?

The precise user story is

The precise user story is data got screwed up somehow, but maybe the user cannot see the data due to data model relationships that imply the value should always be null. The alternative would be to always display every DB field on the UI, and if it is supposed to be null, put a warning icon next to it, so the user can manually investigate the data and clear it out. But this just really annoys users who have to handle urgent data entry requests.

You could certainly point out that it makes more sense to model this with input rules and use the input rules to reify the UI, but that's not the point.

If this is a design argument, certainly there are better ways to solve the problem, but sometimes a business asks for something and won't say no. Arbitrary requirements are arbitrary requirements. :)

The current project I am working on is a very boring project with a $50 million budget that adds no value to the business whatsoever, and just does data acquisition and maintenance. But I joined too late to set up things in a way that agrees with my tastes.

But I am trying to communicate a broader perspective on why GUIs tend to be hard. There are many concerns, such as handling input, rendering, and so forth, and any time you address so many concerns in one library, you are likely to see inelegance. This isn't an incurable disease, since programming language designers know the solution: DSLs. But creating DSLs is hard.

As developers, one of our

As developers, one of our important roles is clarification, refinement, and validation of these requirements. Even if we don't say "no", we can enter a dialog to obtain clarification, the goals of this requirement, and propose alternatives that can serve the purpose.

Anyhow, if I wished to implement this in an immediate-mode UI, I would keep the view state separate from the application, but I would parameterize the save operation with some information about the view. This might be achieved via an intermediate view structure, but the application could be ignorant of the view state.
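As a sketch of what "parameterize the save operation with some information about the view" might mean (the record fields, visible_fields, and save are invented for illustration), the view reports which fields it shows and the save function stays ignorant of how that was decided:

```python
def visible_fields(record):
    """View side: which fields the UI shows for this record (hypothetical rule)."""
    fields = ["has_spouse"]
    if record["has_spouse"]:
        fields.append("spouse_name")
    return fields


def save(record, visible):
    """Application side: keep visible fields, null out everything else."""
    return {name: (value if name in visible else None)
            for name, value in record.items()}
```

This also answers the testing question in part: save can be exercised with a plain list of field names, without constructing any UI.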

The precise user story is

The precise user story is data got screwed up somehow, but maybe the user cannot see the data due to data model relationships that imply the value should always be null.

So the precise requirement isn't that these fields are nulled on save, but is also satisfied if we could ensure they're null at time of save. Hence my request for clarification on framing the problem.

Since IMGUIs are continuous bidirectional functions binding application state to display, this means we can null the field at the point when we choose whether or not to display the UI element, since the logic of whether the element is visible clearly already exists. That's a rather simple change.
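For example, with a made-up record and a draw callback standing in for the real widget calls (edit_form, has_spouse, and spouse_name are hypothetical), the nulling sits in the else branch of the visibility test that already exists:

```python
def edit_form(record, draw):
    """One frame of a hypothetical form; `draw` stands in for widget calls."""
    draw("has_spouse", record["has_spouse"])
    if record["has_spouse"]:
        draw("spouse_name", record["spouse_name"])
    else:
        # The field is not displayed, so make sure it holds no stale value.
        record["spouse_name"] = None
```

By the time a save happens, any field the form declined to show has already been nulled, so the save path itself needs no knowledge of the view.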

Doing it the way you initially specified "on save" would have been much harder because it already assumes that the UI has state, which is not the case with IMGUIs.

I suppose one could define

I suppose one could define the reachability of UI state in terms of application state, but failing that, in an application with unlimited undo, maybe it's best to simply never discard state?

Indeed, which leads inexorably to the conclusion that undo should be reflected in application state as well, because only the application can decide when "old" state is no longer needed. So we're now full circle to the UI simply being a direct reflection of application state.

Interesting remark which I

Interesting remark which I suppose would also impact the look that we have today at the relevance of "standard" tiers in multi party architectural patterns, like for the "C" in MVC, for instance. Could it be deemed reasonably obsolete, by following such paths?

Amen.

It actually damages the cohesiveness of the application model to allow the UI to do any sort of journaling. And most interesting UI actions involve non-local semantics. For example, focus and navigation. In Win32, clicking on a menu doesn't steal focus, since the menu only exists to operate on the object that has focus.

Also, only application state can reify to the user a warning that they have broached some limit on application history. e.g. Photoshop allows flattening layers and truncating history. Likewise, application state can be used to provide richer behaviors. For example, the world's best Photoshop users would for years save their "before flattening" states in a separate file with a naming convention in case they needed to backtrack to an earlier idea. Photoshop artists understand how to manipulate continuations really well, through visual metaphors.

More thoughts

I had an interesting talk with David Ungar after my Onward talk last year. He basically said that we should aim to have our code and execution models be as close as possible (not surprising, as David was one of the primary guys behind Self). At the time I was like “Sounds nice, but I have no idea how to accomplish that,” but an immediate-mode UI really moves in that direction: the code that executes represents the UI exactly; there is no Hollywood-style control flow or opaque object graph that determines what the UI looks like. The idea being that you should be able to reason about program behavior by directly examining your code and not worrying about what encapsulated UI framework code is doing. I say: box go there, and it goes there. That is very powerful and very liberating. It especially benefits creating nice design tools as code is much more “bi-directional” with respect to the execution. I think we can see this in Bret’s video, looking at the examples...

Now, there are quite a few disadvantages to the pure direct model. First, encapsulated state is difficult to encode, while events are expressible only as polling and as such, their effects have to be sustained through global state. Now that many of the decisions made about how the code executes are tangled in that state, we lose our nice concrete bi-directional mapping between the representation of the code and the execution.

Second, the immediate way is just very inefficient; e.g., complex layout logic is expensive and often very iterative even if it can be represented without state.

Now, I'm wondering if we could use immediate mode as the abstraction itself and then enhance execution of this code with some way of first reclaiming identity, reflecting state directly in the code (e.g., think of an event as a growing stream to iterate over ala Rx), and enable some form of memoization.

Now, I'm wondering if we

Now, I'm wondering if we could use immediate mode as the abstraction itself and then enhance execution of this code with some way of first reclaiming identity, reflecting state directly in the code (e.g., think of an event as a growing stream to iterate over ala Rx), and enable some form of memoization.

Indeed, this is basically the approach described in Two for the Price of One: A Model for Parallel and Incremental Computation. You basically just need to be able to abstract over the if-block in the immediate mode code snippet above, so just lift that into a lambda. Then use the "repeat" extension of concurrent revisions to perform incremental updates over these blocks without having to recompute the entire UI every time.

I can see this for

I can see this for memoization; this has a lot of similarities to what I did in the old Scala IDE. I'm embarrassed that I didn't think of this before when talking to Sebastian about it.

However, I'm not sure how record/repeat would work for the cyclic iterative computations that are required for complex layouts. An extreme benchmark for that would be implementing a Turing reaction diffusion system, but we are getting into the realm of physics in this case; perhaps we just need special abstractions for that.