Testing release of a platform for hosting pure functional web applications

I'd like to announce the public debut of a service I've been working on. Among other things, it provides "cloud hosting" for web applications written in Ur/Web, a domain-specific functional language for "Web 2.0 programming."


This service (called Graftid) also enables communities of developers to work together to build tools that non-programmers can use to build customized web sites quickly. Anyone can upload a site-generator GUI, which is implemented in Ur/Web and generates Ur/Web code based on what a user enters into it. Everything is statically typed, and it's possible to use combinators to minimize the cost of building a new GUI. Every GUI inherits a platform for automatic deployment of applications, without the need to write a line of code that has a server-side side effect.

I'm looking for curious folks who might like to put this platform through its paces, finding bugs, security-oriented and otherwise. I hope that many LtU readers will find this a very pleasant platform for building buzzword-compliant web apps, without the need to learn much about the buzzwords and their associated technologies. :)

I also have a related question that I thought I'd include with this post: We're all used to encapsulation for examples like data structures: a class or module "owns" a representation, and the representation may only be accessed by going through the class or module's published interface. Ur/Web extends this facility to let you code a module that owns a cookie, a database table, a subtree of the client-side DOM for a particular page rendering, etc. Think "Facebook apps" with static enforcement of which app may touch which resource, but without the need for any dynamic enforcement, and with the possibility of running all the apps on the same server; we just combine first-class web app pieces with standard encapsulation techniques. Does anyone know of any other systems that allow this? Has the desirability of this facility been articulated somewhere?
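A rough Python sketch of the idea, under loud assumptions: `CookieJar` and `make_visit_counter` are hypothetical names, not Ur/Web or Graftid APIs, and Python can only approximate the guarantee dynamically (the `_slots` attribute is private by convention only), whereas Ur/Web checks it statically. The component receives an unforgeable key for one cookie and keeps it in a closure, so only its own interface can read or write that cookie.

```python
from typing import Any, Callable, Dict


class CookieJar:
    """Hypothetical per-client cookie store. Each slot is addressed by an
    unforgeable key object, so holding the key is the capability to the slot."""

    def __init__(self) -> None:
        self._slots: Dict[object, Any] = {}

    def new_cookie(self) -> object:
        key = object()  # a fresh identity that other code cannot guess or forge
        self._slots[key] = None
        return key

    def get(self, key: object) -> Any:
        return self._slots[key]

    def set(self, key: object, value: Any) -> None:
        if key not in self._slots:
            raise KeyError("not a cookie capability issued by this jar")
        self._slots[key] = value


def make_visit_counter(jar: CookieJar) -> Callable[[], int]:
    """A 'module' that owns one cookie: the key lives only in this closure,
    so no other component can read or overwrite the count."""
    key = jar.new_cookie()

    def visit() -> int:
        count = (jar.get(key) or 0) + 1
        jar.set(key, count)
        return count

    return visit
```

Two counters sharing one jar cannot disturb each other, because neither ever sees the other's key.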


we just combine first-class

we just combine first-class web app pieces with standard encapsulation techniques. Does anyone know of any other systems that allow this? Has the desirability of this facility been articulated somewhere?

The Waterken server uses capability-security principles to achieve the same ends, though it doesn't provide the degree of static safety that Ur/Web does. The primary Waterken developer has written a few articles and papers detailing how proper encapsulation techniques solve various common problems in web programs.

I only had a quick look, but

I only had a quick look, but Ur/Web looks very nice.

Separation of design and programming

I just had a brief look, sufficient to become interested and then to discover there appears to be no separation between design and code.

Is it just me, or shouldn't that be one of the first and most important things in a web application platform? The ability to let each area of expertise work "locally"? Just like encapsulation, separation of concerns, etc.

Not sure about Adam's

discover there appears to be no separation between design and code.

Not sure about Adam's opinion, but I'm not convinced this is as fruitful as is commonly believed. Certainly designers should feel free to mock up how a front-end should look and behave, but why does that imply their end-product should be merged directly with the main program's codebase? Certainly ongoing maintenance is an oft-cited factor, but do most companies have full-time designers dedicated to a project the way they have full-time developers?

And the user-friendly front-ends that enable designers often simply hamper developers trying to extend a UI control's behaviour. In my web framework project, I'm taking the opposite approach: everything is code-based and everything is dynamically, programmatically updateable, i.e. UI structure, layout, styling, control flow, contents, etc. (but exposed using declarative abstractions, like FRP). This approach may not appeal to larger development shops with dedicated designers, but there are a considerable number of small to medium shops with few or no designers that I think would benefit from this approach (myself included).

I certainly hate fiddling with awkward design tools and would prefer a more direct, programmatic interface that is inevitably safer as well (less dynamic/reflective binding to accommodate the generic design tools).

The right tools for the right job

Maybe a lot of shops don't have a full-time designer, but there are other reasons to change a design, such as A/B or multivariate testing. In any case, it makes sense to me to use a design tool for making the design, just as we use the right coding tools for coding. Considering what's good coding practice, why do you think design is different? To me it's just information encoded in another form. I want the primary tools for manipulating that information (design tools) to be the primary platform, without the need to manually transfer this information to another format. After all, what do we have computers for :-)

The reason is that we don't

The reason is that we don't have good design tools that simultaneously a) allow succinctly specifying UIs via some form other than code, and b) don't rob the programmatic backend that interfaces with the UI code of the control it inevitably needs. Bling and Socks are examples of the approach I'm describing. I suspect that any usable design tool that doesn't rob the programmer of such power will have to start with a code-based UI combinator library and build a graphical design tool on top, and not vice versa as is usually the case.
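For readers unfamiliar with the style, here is a minimal sketch of what a code-based UI combinator library looks like, written in Python purely for illustration. It is not Bling's or Socks's API; the combinator names (`text`, `button`, `row`, `column`, `labeled`) are hypothetical. Widgets are plain functions, so new combinators are ordinary code built from the primitives.

```python
from html import escape
from typing import Callable

# A widget is just a function that renders to an HTML fragment.
Widget = Callable[[], str]


def text(s: str) -> Widget:
    return lambda: escape(s)


def button(label: str, action: str) -> Widget:
    return lambda: f'<button data-action="{escape(action)}">{escape(label)}</button>'


def row(*children: Widget) -> Widget:
    return lambda: '<div style="display:flex">' + "".join(c() for c in children) + "</div>"


def column(*children: Widget) -> Widget:
    return lambda: ('<div style="display:flex;flex-direction:column">'
                    + "".join(c() for c in children) + "</div>")


# Derived combinators are ordinary functions built from the primitives,
# so the programmer keeps full control over behaviour and layout.
def labeled(label: str, w: Widget) -> Widget:
    return row(text(label + ": "), w)
```

For example, `labeled("Name", button("Save", "save"))()` renders a flex row holding the escaped label followed by the button. Under this approach, a graphical design tool would be layered on top by generating calls to these same functions.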

Combinator libraries

I think "code-based UI combinator library" is very aggressive, and would not presume that is the answer.

I think more generally you want a composition system with certain mathematical properties.

I would also assert that academics and practitioners alike would benefit from somebody deconstructing past composition systems for UI, and explain their shortcomings.

For example, XAML's WPF subset specification itself (which explains how XAML is a partial class of some code-behind file) contains abstraction flaws that cause "XAML resource not found" errors when you subclass a Window from a different assembly. It turns out partial classes are not as good in practice as some PLT researchers at MS argued, probably because they were too focused on arguing the merits of partial classes versus aspect-oriented programming.

By the way, Bling and Socks in the same sentence doesn't make much sense. Socks is fundamentally procedural, block-structured UI specification. This has been done a million times before.

I think "code-based UI

I think "code-based UI combinator library" is very aggressive, and would not presume that is the answer.

Yes, it is a strong statement of my personal view that something like this is the future. It's beyond my reach given I'm hamstrung with C#, but I plan to get as close as possible while retaining the clarity. I agree that compositional UI components are the minimum required however.

By the way, Bling and Socks in the same sentence doesn't make much sense. Socks is fundamentally procedural, block-structured UI specification. This has been done a million times before.

Each illustrates a different aspect of a single philosophy, namely usable code-based layout: Bling uses absolute positioning based on constraints, Socks uses the block-structured, flow-based layout of HTML, and both expose this functionality in a clear and familiar way via compositional UI elements.

Clear and familiar

I guess I just think we need a revolutionary step forward, like the invention of the mouse or spreadsheet, or pivot tables.

Code-based layout has been done a million times before. Usually, its advocates do not even have a computational model for performance, or for security. Lame.

See ZoHo Creator for one particularly cool example of styling code layout. You can also search the LtU archives for papers on scannerless parsing and concrete syntax trees.

I guess I just think we need

I guess I just think we need a revolutionary step forward, like the invention of the mouse or spreadsheet, or pivot tables.

Right, I meant "the future" absent some revolutionary new idea.

Usually, its advocates do not even have a computational model for performance, or for security. Lame.

Not sure I understand what you mean by "computational model". I use capabilities for security, but not sure if we're talking about the same thing.

I use capabilities for

I use capabilities for security

And how do you handle the confused deputy problem?

For performance, see my recent thread on CSS.


Uh, the confused deputy is one of the basic examples used to hoodwink people into wanting capabilities. I assume you know that, so I figure you must mean that everybody is wrong and you can't really use them for that?
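For reference, here is a toy Python sketch of the confused deputy and the object-capability fix. The filesystem is a dict, and all class names are hypothetical. The ambient compiler resolves a client-supplied path using its own authority, so a client can name the billing file. The capability version writes only through objects the client hands it. (In Python, nothing stops code that holds the dict from minting its own `FileCap`; the sketch assumes clients never receive the dict, which an ocap language would enforce.)

```python
from typing import Dict


class FileCap:
    """Capability to one file in a toy filesystem (a dict of path -> text)."""

    def __init__(self, fs: Dict[str, str], path: str) -> None:
        self._fs, self._path = fs, path

    def write(self, data: str) -> None:
        self._fs[self._path] = data

    def append(self, data: str) -> None:
        self._fs[self._path] = self._fs.get(self._path, "") + data


class AmbientCompiler:
    """Deputy that resolves client-supplied *names* with its own authority."""

    def __init__(self, fs: Dict[str, str]) -> None:
        self._fs = fs  # broad authority: can write any path, including billing

    def compile(self, source: str, output_path: str) -> None:
        self._fs[output_path] = f"compiled({source})"
        self._fs["/sys/billing"] = self._fs.get("/sys/billing", "") + "1 run\n"


class CapCompiler:
    """Deputy that writes only through capabilities it was handed."""

    def __init__(self, billing: FileCap) -> None:
        self._billing = billing  # its own authority, not reachable by name

    def compile(self, source: str, out: FileCap) -> None:
        out.write(f"compiled({source})")
        self._billing.append("1 run\n")
```

Passing `"/sys/billing"` as the output path clobbers the billing log in the ambient version. In the capability version, the client can only pass a capability it already holds, so the deputy cannot be tricked into using its own authority on the client's behalf.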

Paranoid security

My advice when approaching security is to be paranoid. For an example of how clever an attack can be, please read The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86). This paper is one I've been thinking of posting to the LtU front page because, to me, if you replace the authors' term "gadgets" with "combinators", the PLT aspect of this attack becomes clear. It's a crazy paper, because it shows you can mechanize attacks with PLT.

My comments below in this thread about hardened stateless session cookies are another example. Okay, so you say you've "proven" your application is secure and you don't need to worry about cookies because of all those security details, right? Wrong! Secure your cookies anyway!
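Here is a minimal Python sketch of a basic signed, expiring stateless session cookie, using only the standard `hmac` and `hashlib` modules. The key and function names are hypothetical. It shows only the baseline the hardened scheme builds on: authenticity and expiry are checked with no server-side session table. The hardened variant adds further protections that this sketch does not attempt.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical key; a real deployment would load a long random secret from secure storage.
SERVER_KEY = b"replace-with-a-long-random-secret"


def _tag(key: bytes, payload: str) -> str:
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_cookie(user: str, now: int, ttl: int = 3600, key: bytes = SERVER_KEY) -> str:
    """Stateless session cookie of the form user|expiry|HMAC(user|expiry)."""
    payload = f"{user}|{now + ttl}"
    return f"{payload}|{_tag(key, payload)}"


def verify_cookie(cookie: str, now: int, key: bytes = SERVER_KEY) -> Optional[str]:
    """Return the user name if the cookie is authentic and unexpired, else None."""
    parts = cookie.rsplit("|", 2)
    if len(parts) != 3:
        return None
    user, expires, tag = parts
    expected = _tag(key, f"{user}|{expires}")
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return None
    if int(expires) < now:  # safe: the expiry was authenticated above
        return None
    return user
```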

Edit: I also think an API shouldn't just document what is there intentionally by design, but also what, by design, will never be there over time. Application Domains in .NET do not guard against this, since the system administrator can inadvertently turn a knob that allows unnecessary authority to leak into the application.

Confused Deputy Problem?

In case you mistook what Sandro said, he was talking about object capabilities, not those silly POSIX capabilities or whatnot. CDP (even in UI) is pretty much solved by use of ocaps. (I assume you already know the details.)

Taming image capturing, etc.

Most GUI toolkits I know of that allow image capture/print do not use an ocaps model.

I was not suggesting to Sandro that he didn't know what he was doing. I was simply suggesting that toolkits in general are not designed bottom up for secure, robust composition.

WPF is a good example. I'm curious how Sandro locks down Silverlight or WPF. Figuring out all these edge cases is a pain in the ass, and mostly due to the fairly insecure way WPF provisions resources to elements in the control tree and allows Imaging classes to snapshot any Visual. (I believe this design choice was made so that XPS would be easy to use and also, like many dumb WPF design decisions, made to make XBAP "sandboxes" a "viable" option.)

Am I making a stupid point? I feel like I agree with you but feel like I am advocating bottom-up security, not trying to wrap security with something like Caja or whatever.

I feel like I have read about this problem before, as a generalization of clickjacking. I can't find a link anywhere, though. Maybe I just dreamed it up.

WPF is a good example. I'm

WPF is a good example. I'm curious how Sandro locks down Silverlight or WPF.

I don't do anything about the ambient authority intrinsic to the CLR at the moment. Also, my project doesn't use these backends (yet), but were I to extend the scope to enforce capability security on local code, then some Joe-E type static analysis would be needed to enforce capability security (or isolate every component in its own AppDomain with no permissions and with only a trusted API for IPC).

My scope is very limited at the moment, consisting of just web UIs for my daily work, but there's no intrinsic limitation to the web. By using capabilities, I meant, a) server-side objects are named by capabilities, and b) the UI library itself follows a capability discipline, so UI components are naturally isolated. A UI component can still get around this via the CLR's ambient authorities.

However, a program's continuations are captured and serialized, so exploiting ambient authority is already more difficult since it does not survive restart, migration, etc. The next logical step would be to enforce that property.

Model-driven interface design

My friend Matthew Holloway (ex-Matthew Cruickshank), who is the author of docvert (an MS Word to XML transformer), created PilferPage as a demo of how to do media-type independent UI components.

It is more general than the Web. I don't 100% like his source language, PilferPage, but the transforms are a good proof of concept. Of course, similar things were attempted in the early 90s.

Compilation and Distribution

When compiling from a capability language to an ambient authority language, one must use a trusted source - one that won't leverage or introduce ambient authorities inappropriately, yet that will convert object capabilities to ambient forms, appropriately.

As a consequence, one cannot compose code of an ambient authority language and still ensure capability security unless you can know or prove that all such code was produced (or could have been produced) from a capability language by a trusted compiler.

You don't encounter any real problems until you attempt to distribute or load ambient authority code, whether it be libraries, scripts, plug-ins, or replicated or migrated objects. The straightforward solution is to distribute in a capability language (after running it through an obfuscator, if desired, and even various optimizers), then compile it down, perhaps via JIT. Analysis of code once compiled is considerably more difficult.

To ask how one locks down WPF is to assume that WPF is a format in which code is composed into a project, or is produced as output from untrusted sources. I would not assume that this is the case.

Impractical definition of Trusted

I agree with everything you say. I just think you are implicitly arguing from the impractical position that it is easy to trust code safely. I'd need to build a bytecode loader that denies the API from using features from the underlying powerbox (really, in this case w/ WPF/Silverlight, just a sandbox w/ APIs not designed bottom-up to be secure).

I like the way Laurentiu Cristofor breaks this down in his blog entry Security in a nutshell.

Making sure programs can only do what the programmer intended is hard, especially when the underlying system's designers see security and trustworthy computing differently than you do. Since all deployed code becomes legacy code the moment its environment is upgraded, there is no good way to prevent or reason about certain permissions w/o building your own powerbox on top of the environment itself, effectively creating your own VM. I also suspect this complicates integration for plug-ins, because now you are defining your own API on top of the platform, and whatever standard code they may have wanted to reuse may not be directly reusable. Even Adam's Graftid appears to rely on everyone writing Ur/Web.

Rejecting the Powerbox

Giddyap! It's my hobby horse! Time to ride!

It is easy, and safe, to trust capability secure code. Capability secure code running locally, on your machine, has no access to any capabilities that it would not have possessed if running remotely, on an untrusted machine, across a perfect network.

Of course, you shouldn't provide a powerbox to a remote, untrusted machine! As discussed elsewhere, the powerbox pattern is nearly as gross a violation of POLA as ambient authority. In ambient authority design, libraries aggregate and specialize broad authorities in order to achieve a useful end: a service. A powerbox design follows the same approach. Naturally, if one must refine and specialize capabilities, then one has violated POLA.

We should make a goal of rejecting such 'refinement' patterns from capability languages. Facet pattern, powerbox pattern, etc. smell of design failure.

This requires that authorities be available to developers in the specific, fine-grained forms they require, at the time they need them. Ideally, a new service can be developed by combining a few caps with a shallow bit of glue code and a few functional transforms.

Such a new service would, in turn, be specific and fine-grained and available for further composition. But one must somehow bootstrap this cycle. I suggest the IDE is the place to start, providing developers a library of capabilities, along with references to public and private registries where new services can be introduced. Instead of developing then sharing service libraries (or plug-ins), one develops then shares a reference to a new distributed service.
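A small Python sketch of "a few caps with a shallow bit of glue code and a few functional transforms", assuming caps are modelled as plain callables (the names are hypothetical). The resulting service is itself just another capability that can be handed on for further composition.

```python
from typing import Callable

Reader = Callable[[], str]      # capability to read one value
Writer = Callable[[str], None]  # capability to write one value


def make_shouting_mirror(src: Reader, dst: Writer) -> Callable[[], None]:
    """A new service built from two caps, a pure transform, and shallow glue.
    It can touch nothing except what src and dst designate."""
    def run() -> None:
        dst(src().upper())
    return run
```

Whoever creates the service decides exactly which source and sink it may use; the service never needs, and never receives, anything broader.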

Automatic code distribution, primarily replication and migration, can handle the rest. Since the service is implemented as capability-secure code, it doesn't need any ambient authority; therefore, you can easily run untrusted services (or relevant fragments of one) on your machine. Since you don't deliver a powerbox to the untrusted service, you don't need to worry whether it is leveraging more authority than you wished it to.

Instead, security risk is shifted to the 'distributor' of code, who may accidentally distribute secrets or sensitive capabilities, which may then be abused by the recipient - by you, if you're the one hosting the untrusted code. This is good: distributors are far more likely to act with enlightened self-interest to prevent distribution of sensitive information and capabilities, and there is no need for artificial limiters - no sandboxes.

there is no good way to prevent or reason about certain permissions w/o building your own powerbox on top of the environment itself, effectively creating your own VM. I also suspect this complicates integration for plug-ins.

What we need is distribution of code. Plug-ins are nothing but a way of packaging and distributing code, and that packaging is not essential (or desirable) complexity. By eliminating packaging, we can move to a tierless architecture.

We do benefit from a capability-secure VM for distribution of code, but once we have one we don't need to build another atop it. Such a VM would need to reflect certain ambient authorities as capabilities. I use plug-ins to support this to ensure the set of ambient authorities handled is extensible; it also serves as a capability-secure basis for FFI. [details clipped]

However, 'plug-ins' at the language layer aren't anything special. If you have automated code distribution, then plain old capabilities will do the job. If a capability is remote, and the runtime uses it a few times, it might request a local copy or migration of the cap. Or the runtime might instead attempt to send the object using the remote cap to live nearby it. If I can't do either, then I simply have a tiered architecture, which is no worse than state-of-the-art today. The exact choice can depend on a number of optimization decisions, developer annotations, secrecy requirements, etc.

While the VM-layer capability factories and the IDE-layer capability library may look similar to powerboxen, I posit that they are not powerboxen because they are not used as powerboxen.

David, I would really

David, I would really suggest trying to condense your comments - it's a burden to get through, hiding your message on topics you've clearly otherwise thought about. It's not easy to be concise but informative writing is more about the reader than the writer. In this case, I bet you could have gotten across at least 80% of the message in less than half the length.


I'll probably refine the above a few more times, as is my usual habit; indeed, I did so once even as you commented.

Being concise is easy.

But I should probably elaborate and say: it is easy to be too concise, and lose the relevant information.


Facet pattern, powerbox

Facet pattern, powerbox pattern, etc. smell of design failure.

I'm not sure you can really do away with the Powerbox pattern, as it fulfills a fundamental role, particularly in user-driven delegation where dynamically increasing authority is a common and necessary operation. Any kind of user shell or UI must inevitably implement some sort of Powerbox, for instance.

Consider how a user would transfer content, say a Word document, from one program to another, absent ambient file systems, a clipboard, etc. Some agent must be aware of source app and target app, and have a capability to both, in order to connect them together. The window manager may fulfill this role in the case of a drag and drop, or the user's shell may do this when the user maps a file from his namespace into the program's private namespace. Either way, this is the Powerbox pattern. We should infer as much delegation as possible from user designations and avoid Powerbox-like intermediaries where possible, but the need for one in a user-facing system is inescapable I think.

I would contend that a

I would contend that a database of capabilities made available to the user through a UI or shell is not the powerbox pattern.

Powerbox pattern would be dropping such a database into an application or service, with purpose that said application pick and choose which capabilities it will use based on arbitrary meta-data - such as the name associated with each capability. I.e. you give the app a big toolbox of power, and the app grabs the specific tools it needs. By comparison, a principled UI isn't simply grabbing tools and using them; it only serves as an agent to the user - and the database itself is a high quality alternative to human memory or pencil and paper.

Powerbox pattern is defined, like all other patterns, as much by how the components are used as by the structure of the components. It must be, lest the pattern lose meaning and utility.

you give the app a big

you give the app a big toolbox of power, and the app grabs the specific tools it needs

I don't believe this is correct. The classic use of a Powerbox is a word processor that the user wishes to use to edit a file. By default the untrusted app has no access to any of the user's files, so it requests a "file open" of the Powerbox with the file stream settings and an app-specific message, such as "Please select a file you wish to edit.".

The Powerbox then presents the user with a trusted "file open" dialog with the message and settings, and the user picks the file. The Powerbox then opens the file with the required settings and passes a file stream capability back to the untrusted app (or, in some cases, maps the file into the app's private namespace).

The Powerbox is simply a trusted intermediary between untrusted programs and the user. The Plash shell and a trusted UI perform these exact functions, and I don't see how you can get rid of this pattern fully.
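A toy Python sketch of the file-open Powerbox as described above. The classes are hypothetical, the user's dialog is simulated by a `choose` callback, and returning a read-only capability is an assumed choice of "settings". The untrusted app never holds the filesystem, only the single file the user picked.

```python
from typing import Callable, Dict, List, Optional


class ReadOnlyFile:
    """Capability to read exactly one file in a toy filesystem (path -> text)."""

    def __init__(self, fs: Dict[str, str], path: str) -> None:
        self._fs, self._path = fs, path

    def read(self) -> str:
        return self._fs[self._path]


# Stands in for the trusted "file open" dialog: (prompt, paths) -> chosen path or None.
Chooser = Callable[[str, List[str]], Optional[str]]


class Powerbox:
    """Trusted intermediary: holds the user's file authority, shows the app's
    prompt through a trusted dialog, and hands back only the chosen file."""

    def __init__(self, fs: Dict[str, str], choose: Chooser) -> None:
        self._fs, self._choose = fs, choose

    def request_file(self, prompt: str) -> Optional[ReadOnlyFile]:
        path = self._choose(prompt, sorted(self._fs))
        return None if path is None else ReadOnlyFile(self._fs, path)


class WordProcessor:
    """Untrusted app: its only route to the user's files is the powerbox."""

    def __init__(self, powerbox: Powerbox) -> None:
        self._powerbox = powerbox

    def open_document(self) -> Optional[str]:
        f = self._powerbox.request_file("Please select a file you wish to edit.")
        return None if f is None else f.read()
```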


My initial exposure to Powerbox had it explained in the manner I did above, as a sort of inverted Sandbox, never as a 'trusted intermediary'. For example, a Powerbox might grant access to a region of a FileSystem as an alternative to allowing Ambient Authority to do so.

But after a bit of searching, I see it is used by the E language community with reference to the situation you describe. Based on their wording, I suspect the 'file dialog' refers to an interesting use-case for the powerbox - providing a file cap interactively - rather than speaking of the powerbox itself. You may wish to review the documentation.

In any case, I have no objection to users holding many capabilities and picking between them. Nor do I object to a prompt for the user to provide a capability, though I would favor those be extremely seldom. If a 'capability manager powerbox' is simply a UI cap that supports the application in providing options along with the prompt, I won't object to that pattern - so long as a user is the one choosing the capability (none of that 'program wants to XYZ, allow?' nonsense).

I do object to application code patterns that involve a chunk of code getting a large number of caps then not making practical use of most or all of them. This includes powerbox-as-inverted-sandbox. I.e. if you're given access to a subtree of a filesystem, you had better be doing something to every file in there... anything else would be a violation of POLA.

For reference purposes, here

For reference purposes, here are the pages describing Powerbox and the Powerbox pattern on the E wiki. And yes, the "open file" dialog is an instance of the Powerbox pattern, and I refer to it as the classic use case because the Powerbox pattern itself was first devised to solve this exact problem in the CapDesk capability-secure desktop (or at least it was first so named in the CapDesk paper).

What do you mean by security

What do you mean by security for a UI library? What needs to be secured?

UI Security

A secure UI library should resist a malicious component's attempts to abuse the user or interact with other components in the same UI. One should assume the library is, in the general case, being used to build mashups between mutually untrusted components. A desktop is one such mashup, containing mutually untrusted applications. (It's worth noting that most modern UI frameworks are very far from secure.)

Example abuses: applications might take pictures of other applications (i.e. so your Shareware game can steal the credit card info you just entered at some website). Applications might steal focus from one another (i.e. by raising a window to the top) - which might cause you to enter password text into the wrong window. An application might attempt to cause you to press a button in another application, i.e. via use of opaque buttons that act transparently, or perhaps as a mouse game.

There are quite a few things that ought to be 'secure' in a UI framework. Reading Ping's Secure Interaction Design might give you some ideas.

Shouldn't such security be

Shouldn't such security be provided by the OS instead of by the UI library (like WPF, Qt) that is built on top of the OS? I do not see what code-based layout has to do with security:

Z-Bo: Code-based layout has been done a million times before. Usually, its advocates do not even have a computational model for performance, or for security. Lame.

Also, this is a relatively unimportant issue. With current OSes, if you have malicious programs running, you are already toast. In any case, one should not dismiss a new UI library because it is not secure in this way.

Jules, most of that isn't

Jules, most of that isn't true in the web approach to end-user applications, where we actually have a shot at getting this stuff right bottom-up. Check out some of Helen Wang's papers (a light entry point would be her recent workshop paper on a secure multiapplication browser/os).

Even in this case, the

Even in this case, the security should be provided by the browser, not by a high-level (possibly code-based) UI library like jQuery. I still don't understand why the "code based layout" UI library is the right place for security. It seems to me that such a library should be built on top of a secure low-level facility (like an OS's UI primitives or HTML+JS). BTW, Helen Wang's papers are interesting. Thanks :)

I still don't understand why

I still don't understand why the "code based layout" UI library is the right place for security.

Just to be clear, I never said something like this.

Even in this case, the security should be provided by the browser, not by a high-level (possibly code-based) UI library like jQuery.

Just to be clear, jQuery is an ambient authority that is allowed promiscuous access to the DOM. jQuery only exists because 99.9% of all programmers don't understand client/server responsibility segregation and how to package programs. jQuery is a run-time whitebox composition system; I've no idea what a security system for a run-time whitebox composition system would even look like, or if that even makes sense. Stuff like jQuery Live Events would need a token to permit safe updates, so that the composition system could be trustworthy.
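To illustrate the contrast, here is a Python sketch (hypothetical names, toy DOM) of a component that is handed authority over its own subtree instead of jQuery-style document-wide selection. The handle offers no parent or sibling navigation, so an advertisement component cannot touch the article next to it. As with the other sketches, Python enforces this only by convention.

```python
from typing import List


class Node:
    """Toy DOM node."""

    def __init__(self, tag: str) -> None:
        self.tag = tag
        self.text = ""
        self.children: List["Node"] = []


class Subtree:
    """Capability over one node and its descendants. It exposes no parent or
    sibling navigation, so a component cannot reach outside its own subtree."""

    def __init__(self, node: Node) -> None:
        self._node = node

    def set_text(self, s: str) -> None:
        self._node.text = s

    def append(self, tag: str) -> "Subtree":
        child = Node(tag)
        self._node.children.append(child)
        return Subtree(child)  # authority over the new child only
```

The page owner creates one slot per component and hands each component only its own `Subtree`; updates can still be arbitrarily dynamic, but they are confined to the slot.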

No, asking for the OS to

No, asking for the OS to handle it is just borrowing trouble.

The OS needs a secure UI library same as any other. There is nothing special about being the OS.

Also, the desktop provided by a UI is just one example of a shared UI space, among many. Browsers, for example, must regularly combine mutually untrusted components: different pages, and different elements within each page - often from multiple sources (advertisements, content, feeds). It also isn't uncommon for modern applications to be 'extensible' by various means - plug-ins and scripts and such - these components should still be secure against one another.

The UI libraries used by extensible applications, and the UI libraries used by the OS (with its 'applications' extending the desktop), have the same basic security requirements.

In any case one should not dismiss a new UI library because it is not secure in this way.

We don't need any 'new' insecure UI libraries. We have plenty of them. While I might not dismiss a UI library just because its implementation is insecure, I would not hesitate to dismiss 'innovative new UI designs, frameworks, or architectures' that cannot readily be secured.

How can e.g. Qt make sure

How can e.g. Qt make sure that another application does not take a screenshot? The OS is special in that it CAN prevent this.

The relevant question is: if

The relevant question is: if Qt were the OS's UI library, could it prevent an insecure screenshot? Or, put another way: if you performed mashups in Qt, to what degree could you prevent a malicious Qt component from taking a screenshot of a security-sensitive component?

The OS is not special; it is also using a 'UI library'. Application UI libs are often a wrapper atop it.

An OS desktop is just another mashup of mutually distrustful UI components. To suggest that the OS should handle all UI security is to suggest that the OS is the only place mashups should be allowed - i.e. that applications should not link or embed one another or be extensible. And that isn't very reasonable.

Well, yes, it could: just

Well, yes, it could: just remove the takeScreenshot() function... I don't get your point?

Who says we want to be rid

Who says we want to be rid entirely of the ability to take screenshots? Security is not about taking features away, but rather about ensuring they are used legitimately. (A feature is not secured if you take it away! Denial of authorized service is insecurity. Disconnecting from the network is the easiest way to ensure your network connection is insecure.)

My point is that if you seek egg-shell security in a world where mashups and extensible applications are the rule, you'll end up with a lot of rotten eggs. Pushing the responsibility to the OS, but failing to handle security within extensible and mashup applications, is egg-shell security.

Missing his point

In current architectures, a screen shot is a frame buffer capture, not a request for the UI to render itself to memory, so you can't write a UI that makes screen shots secure if the OS opens access to the frame buffer. Whether or not the video card driver (or more specifically, the interface to that driver) is part of the "OS" is a separate question. I think a reasonable point of view is that the security model IS the OS.


But if the API lets you walk to a Visual object and demand, without authority, that it snapshot itself and return the snapshot to that unknown requestor, then it doesn't matter if the OS imposes word-datum limits, because the API lets you subvert the base and limit registers.

This is why "objects should not implement half a responsibility, or more than one responsibility; instead only a Single Responsibility."

Don't really follow

I only vaguely follow your word-datum analogy and don't see the relevance of the single responsibility principle. My point was that in current (used) OS architectures, taking screen shots doesn't usually involve the UI, and I don't think that's necessarily a bad thing. Screen capture should just be a capability like any other, that needs to be treated carefully by the user because of its security implications. If you want to build some kind of redacted screenshot mechanism into your UI, fine. I see the need for a secure UI as a separate issue.

It isn't separate

Giving the users a whole-screen-shot capability, i.e. via keyboard, is reasonable, since they have that capability regardless. Giving applications access to a whole-screen-shot capability, however, is not a security-conscious decision. Pointing to the "current (used) OS architectures" is hardly a point on your side: there are more than a few malicious screen-scraper applications out there, and the "current (used) OS architectures" therefore serve as yet another point that they should have a secure UI framework.

But it is useful for a UI library to provide a screen-shot capability for a variety of reasons, including the ability to regression-test the UI implementation. One might be able to 'screenshot' a region, but not necessarily pick up any embedded content... and certainly not able to capture the surrounding regions for which you bear no responsibility.

If that requires a different implementation, then so be it. Pointing at the existing implementation as an excuse for not doing better is a bit silly.

Besides, the most natural place for a 'screen-shot' capability is the UI. What purpose has a screen-shot without a UI? But if you're saying that the screen-shot cap should be separated from the rendering cap, I'd agree.


You saved me a bunch of time.

I was also going to explain 'GDI printers' and how printing to a network printer has poor fault tolerance characteristics in Windows when using GDI printers.

The other point I was going to make was that, even if the OS UI primitives are capability secure, a VM w/ Retained Mode rendering has all of its drawing primitives on top of that UI primitive. Therefore, it has to have powerbox access to the underlying Device Context (using Win32 terminology). This is why WPF isn't secure, and fundamentally hard to secure. Silverlight actually took a step in the right direction by burying the WPF Visual class. (The security problem I am describing here is a lot like the one in "The Geometry of Innocent Flesh on the Bone", the paper I linked earlier.)

There is an additional concern, which is separating drawing from cache invalidation: e.g. synchronization to the frame buffer is (or should be) separate from flow of control and flow of data. Separating synchronization from flow of control is a standard systems theory concept, as by doing so you are not locked into a particular fancy graph algorithm to determine flow of control. I think, if you want to be truly fine-grained, you should be able to say, "only give access to every 30th frame", because now you are defining resource consumption of the screenshot cap in terms of how much of the processor it can hog. -- You're not only saying, "Okay, I trust you to encode this Visual to a JPEG." You're also saying, "In terms of the application's main event loop, you can do it under these conditions this many times."

You should also be able to effectively create a motion capture program using this, where you are permitted the capability of capturing every frame (for UI regression testing, this means being able to play back failed functional tests specified in a Selenium-like framework; doing this in WPF today mainly involves using Jeremiah Morrill's untrusted hacks into milcore.dll and P/Invoke'ing some hidden video capture features.)
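To make the "every 30th frame" idea concrete, here is a minimal Python sketch of an attenuated screenshot capability. Everything in it (`FrameSource`, `make_throttled_capture`, frames as strings) is hypothetical and not part of WPF or any real graphics API; it only assumes the holder is handed the returned closure and never the renderer itself. Python closures can be pried open by introspection, so this illustrates the pattern rather than enforcing it the way an ocap language would.

```python
class FrameSource:
    """Hypothetical stand-in for a renderer; each tick produces a new frame."""

    def __init__(self):
        self.frame_no = 0

    def tick(self):
        self.frame_no += 1

    def snapshot(self):
        return f"frame-{self.frame_no}"


def make_throttled_capture(source, every_n):
    """Grant a capture function that succeeds at most once per permitted frame.

    A frame is permitted when frame_no is a multiple of every_n; every_n=1
    grants the "capture every frame" authority a motion-capture tool needs.
    """
    last_frame = None

    def capture():
        nonlocal last_frame
        n = source.frame_no
        # Deny frames outside the contract, and repeat captures of one frame.
        if n == 0 or n % every_n != 0 or n == last_frame:
            return None
        last_frame = n
        return source.snapshot()

    return capture


# Usage: run 60 frames of the event loop under an "every 30th frame" grant.
src = FrameSource()
cap = make_throttled_capture(src, 30)
shots = []
for _ in range(60):
    src.tick()
    shot = cap()
    if shot is not None:
        shots.append(shot)
print(shots)  # ['frame-30', 'frame-60']
```

Denials simply return `None` here; a real design would also have to decide how the holder learns about, and tolerates, a refused capture.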

I still haven't found where I came up with all of this. It is entirely possible I came up with it and didn't credit myself, oh well. If I did come up with it, it was only as a reaction to massively parallel tile rendering schemes I've been reading lately (all of which appear stupid and ugly if you know PLT).

User vs. applications

Giving applications access to a whole-screen-shot capability, however, is not a security-conscious decision.

Any command that the user can trigger with his finger should be available for use in trusted applications. Do you really disagree with this? Maybe by 'application' you mean a more restricted class, in which case I think I pretty much agree with the rest of what you and Z-Bo have said here.

Oh, except primitive screen-shot cap (frame buffer access) belongs at the level of the rendering cap, I think. "Take a screen shot and store it as a JPG in my pictures folder" belongs at the UI level.

Any command that the user

Any command that the user can trigger with his finger should be available for use in trusted applications. Do you really disagree with this?

As a general rule, one should be able to delegate the capabilities one possesses. If a user wished to delegate his or her capability to take a screenshot to another application, that's fine.

OTOH, you should ensure that said capabilities are well synchronized with his or her ability to trigger commands via finger.

For example, a user's ability to take a screen-shot should not long survive his or her exiting viewing range of said screen. One way to achieve this is to simply associate it with the 'Print Screen' button - which assumes the keyboard is within viewing distance of the screen. If one does wish to support delegation, there are - of course - alternative approaches. One could put time-limits on a Print Screen cap, or perhaps associate it with some other user artifact (such as creating a cap that only works when a particular smart-card is inserted). (Note: I'm not assuming that user is owner.)
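A time-limited Print Screen cap can be sketched as a wrapper that checks a deadline on every use. This is a minimal Python illustration, not any OS's API: `make_expiring_cap` and `print_screen` are hypothetical names, and the clock is injectable only so the expiry can be demonstrated without sleeping.

```python
import time


def make_expiring_cap(action, lifetime_s, clock=time.monotonic):
    """Wrap `action` in a capability that stops working after lifetime_s seconds."""
    deadline = clock() + lifetime_s

    def cap(*args, **kwargs):
        if clock() >= deadline:
            raise PermissionError("screenshot capability expired")
        return action(*args, **kwargs)

    return cap


def print_screen():
    # Hypothetical stand-in for the real Print Screen action.
    return "whole-screen.png"


# Usage with a fake clock: the delegated grant lasts 5 seconds.
now = [100.0]
shot = make_expiring_cap(print_screen, 5.0, clock=lambda: now[0])
print(shot())  # whole-screen.png
```

Tying the cap to a smart card instead would replace the deadline check with a "card present" check; the wrapper shape stays the same.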

You mention "trusted application" but how are you distinguishing such a thing? An application entrusted with a screen-shot cap might not be entrusted with much else. Confinement is a powerful tool for security.

Trusted applications

For example, a user's ability to take a screen-shot should not long survive his or her exiting viewing range of said screen.

So I shouldn't be able to setup automated screen capture of every frame and then walk away from the keyboard? I think he should. By "trusted application" I just mean "application the user has trusted with this capability."


David was referring to the scenario where you are connected to the machine via something like PuTTY and have not exported the window manager (via X11 or DWM) to your remote terminal. Hitting 'Print Screen' arguably doesn't make much sense there, especially in conventional OSes, because the 'PrtScr' button on EN-US QWERTY keyboards is generally mapped to the OS's active window manager. [Edit: Although, it does raise the idea that you could, in principle, sidestep the window manager's networkable graphics protocol, and see the desktop on the remote machine via sharing the same Retained Mode object model.]

Also, for applications I build, it is entirely possible for there to be a terminal and/or a GUI for the app, simultaneously. They're both connected to the same distributed network feeds, so using both simultaneously isn't conceptually complicated. For terminals, if you use Linux, you probably know of the fairly fantastic *NIX program screen(1), which allows you to attach and detach programs from your terminal; thus when you exit the terminal, you are not logged out completely, because your detached session is still hosting the program you were using.

So I shouldn't be able to

So I shouldn't be able to setup automated screen capture of every frame and then walk away from the keyboard? I think he should.

A machine may have more than one user, you know. I think I'll avoid any Operating Systems written by someone who believes users 'should' be able to maintain a screen-capture capability on machines from which they've since walked away.

In the role of the machine owner, you'll have different capabilities than a mere user.

Small adjustment in view

If the user has his/her own workspace, completely isolated from other users, then snapshotting remotely while other users are using the system is not only legitimate, but necessary to generalize the OS into a terminal server feeding to N terminal clients.

You should also not limit machine owner role to locally logged in users. A user, whether remote or local, should have to present a separate credential as machine owner in order to do this sort of stuff.

A machine may have more than

A machine may have more than one user, you know.

It might. And each user might have his own UI.

Isolation is the Exception

Most UIs operate in shared spaces, shared views. Often two or more users will start manipulating a shared object (e.g. a document, or a database) concurrently.

The pretense of isolation, of each user having his own UI, is a common failing among OSes today. Associating a UI with a particular user is a mistake that led to user-based authentication and ambient authority in the first place. Associating UI with a user will also resist many application mashups and interesting compositions.

I posit that any 'secure UI library' must avoid this pretense. Isolation between users is the exception, rather than the rule: Isolation lasts only until a UI capability is shared.

Anyhow, I would agree with a fundamental position: that users can screenshot exactly that which they possess the capability to display to a screen somewhere.

Not only that

...but all OSes have a naive structure that limits cryptographically secure, resumable sessions. Also, connecting to a display manager does not automatically provision assistive technologies; the specifications for XDMCP, ICCCM and RFB are mostly bolted-on hacks and can't efficiently achieve such rich UX. I will not go into the details here of why X's Multi-head R-and-R is pointless if you understand PLT.

What is a UI?

I don't really agree with what you've written, and I wonder if we both have the same "UI" in mind. I mean the GUI - visual and aural interface elements (windows, buttons, etc.) through which the user interacts with an application. I do not consider the applications themselves and shared documents and databases as part of the UI. I would expect sharing a UI capability to be rare - you e.g. share access to a printer, not access to the print button widget.


It is not so unusual to share UI capabilities between distrustful components. Browsers and web systems do it all the time, via linking and navigation, iframes and transclusions, embedded Flash advertisements, or older plug-in components that render video to a region in the window. (Google Chrome even puts each plug-in into a different process.) Even internally, many applications are extensible via plug-ins or scripts that may be less than fully trusted. For these, it may be necessary to 'share' UI caps even within the application. 'Insecure' UI libraries achieve this by sharing almost everything.

Desire to share applications among many users isn't uncommon either. Today we have coarse-grained desktop-sharing hacks with inadequate security. We'd like the ability to share specific apps, or specific windows, or revocable caps to specific windows, or even specific buttons. While mashups aren't extremely popular, they do demonstrate that, if users had the ability to break apps down into caps and reorganize and share just the desired pieces of them, they would certainly do so (though there would be more flexibility if one could obtain a cap associated with the button's behavior and description (text, tooltip), rather than a rendering cap).
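A "revocable cap to a specific window" is commonly built as a forwarder that the grantor can later cut off (often called a caretaker in the capability literature). Below is a minimal Python sketch under that assumption; `make_revocable` and `view_window_7` are hypothetical names, and the window is just a function returning a string.

```python
def make_revocable(target):
    """Return (forwarder, revoke) for a callable capability.

    The forwarder delegates to `target` until `revoke` is called; afterwards
    it refuses, and it no longer holds any reference to `target`.
    """
    cell = [target]

    def forwarder(*args, **kwargs):
        if cell[0] is None:
            raise PermissionError("capability revoked")
        return cell[0](*args, **kwargs)

    def revoke():
        cell[0] = None

    return forwarder, revoke


def view_window_7():
    # Hypothetical capability to render one specific window.
    return "pixels of window 7"


# Alice shares a revocable view of one window with Bob and keeps `revoke`.
bobs_view, revoke = make_revocable(view_window_7)
print(bobs_view())  # pixels of window 7
revoke()
```

Bob can pass `bobs_view` on to others, but every copy goes dead at once when Alice revokes, which is what the coarse desktop-sharing hacks lack.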

The reason sharing UI caps is 'rare' today isn't lack of demand; the problem is lack of ability.

You say that a shared document is not part of the UI, but I think that if you display pixels functionally related to a document, and I display pixels functionally related to the same document, then we clearly have visual interface elements through which we, the users, are interacting in a shared space with the object-browser application. My interest in Zoomable User Interfaces smears these lines quite thoroughly.

It seems to me that your vision of what 'GUI' means is limited to what is most fashionable at the moment. I accept a broader view, influenced by web mashups, Croquet Project, and plenty of work developing components in multi-operator and multi-tier command and control systems.

Still no counter-example

All of the things you just listed can be done without sharing a UI between users, as long as each user's UI satisfies certain signatures. Funny you mention zoomable interface. I hadn't heard the term, but I've designed and am building one for work. I still have no idea how being able to zoom around your UI smears the lines between anything - unless you are imagining that each user gets his own plot of virtual real-estate in your virtual plane? Even then, if I'm designing it, each user gets his own UI.

Forms Example

Consider HTML: each user gets his or her "own" copy of a form. Each user effectively has different buttons, possibly with different authorities depending on their identity. Further, in Dynamic HTML, each browser constructs a personal little playpen for the specific user, i.e. there is no shared identity between one user's 'document' object and another's; there is no shared view.

But this is not the only possible design.

Another design would have each form possess a global shared identity, such that two people with the same URI to a form will literally see the same form (to within whatever its layout constraints happen to be). As one user edits it, all others will see updates to it in real-time. In this design there need be no distinction between users that touch a form: unless one went to some special effort, different users would generally have the same capabilities for both update and view.

If one wished to provide an initially isolated form, that can be done - i.e. create a new form and return a reference to a view of the form as a reply to a user action (such as pressing a button). Said user's browser may then navigate to the form, or otherwise incorporate it into the current view. One may also encourage isolation by having forms be single-use, and self-destruct them as part of a 'Submit' behavior. But this doesn't enforce isolation. Isolation is the exception, not the rule, and lasts only until said user decides to share the new capability.
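The "initially isolated, single-use form" design can be sketched in a few lines of Python. `Form` and `on_new_form_button` are hypothetical (not an HTML or browser API); the point is that isolation comes only from nobody else holding the reference, and submit disables the form.

```python
class Form:
    """Hypothetical form object with its own identity.

    Anyone holding a reference sees the same state; submit() makes the form
    single-use by disabling every later operation.
    """

    def __init__(self, fields):
        self._values = {f: "" for f in fields}
        self._live = True

    def _check(self):
        if not self._live:
            raise PermissionError("form already submitted")

    def set(self, field, value):
        self._check()
        if field not in self._values:
            raise KeyError(field)
        self._values[field] = value

    def view(self):
        self._check()
        return dict(self._values)

    def submit(self):
        self._check()
        self._live = False  # self-destruct as part of the Submit behavior
        return dict(self._values)


def on_new_form_button(fields):
    """Reply to a button press with a reference to a fresh form."""
    return Form(fields)


# Isolation lasts only until the reference is shared.
alices_form = on_new_form_button(["name", "email"])
bobs_copy = alices_form          # Alice pastes the reference to Bob
alices_form.set("name", "Alice")
print(bobs_copy.view())          # {'name': 'Alice', 'email': ''}
print(bobs_copy.submit())        # Bob can submit it, too
```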

Zoomable UI - or at least most visions thereof, including the one I'm pursuing - follows this latter design: in many Zoomable UIs, documents and forms and such are abstracted as being "open" all the time, for so long as they exist, even if they aren't immediately visible. Since they are open and users may share a view, every object effectively becomes a widget requiring synchronization between users. A document, for example, is essentially a big text-area widget.

What it means to "share" is that updates by one entity can be observed by another. To share a GUI means the same thing: that updates to widgets (text spaces, radio buttons, check boxes, canvases, sliders, etc.) by one user will be seen by others. If arbitrary objects may be shared (as in a capability system) and every viewable object is a synchronized widget (as in a traditional Zoomable UI), then the lines blur: you cannot make a meaningful distinction between sharing a document and sharing UI. Sharing one is the same as sharing the other.
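"Updates by one entity can be observed by another" is, at its simplest, the observer pattern. This Python sketch (hypothetical `SharedWidget`, not any toolkit's class) shares only widget state; it says nothing about how each user's view renders that state.

```python
class SharedWidget:
    """Hypothetical shared widget state (e.g. a check box) with many observers."""

    def __init__(self, value):
        self._value = value
        self._observers = []

    def observe(self, callback):
        self._observers.append(callback)
        callback(self._value)  # a new view first sees the current state

    def update(self, value):
        self._value = value
        for callback in list(self._observers):
            callback(value)


checkbox = SharedWidget(False)
alice_view, bob_view = [], []  # each list records what one user's view shows
checkbox.observe(alice_view.append)
checkbox.observe(bob_view.append)
checkbox.update(True)          # Alice clicks the box
print(alice_view, bob_view)    # [False, True] [False, True]
```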

You seem to make a goal of avoiding sharing, but I frankly can't imagine why you'd want to do so. Sharing is the basis of more than just mashups; it's also a powerful basis for resumable UI and the ability for users to move from one terminal to another.

As far as "virtual real-estate" goes, no - I'm not imagining that each user has "his own plot of virtual real-estate". In the design I'm pursuing, the 'view' held by a user is effectively defined by a reactive expression, which may be bookmarked (and shared). A data capability can serve as a primitive FRP expression for embedding an initial view, but the 'address bar' equivalent could just as easily include a whole layout and document definition. Navigation through a space is reflected by updates to the reactive expression, as are spatial manipulations to layout, style or accessibility transforms, and filtering. Users share UI spaces insofar as their respective expressions share a view of objects - i.e. have a non-empty intersection of visual or aural capabilities.

Still no sale

I want different fonts, color-schemes, and layouts than you. I don't want to see tooltips when you mouse over things, or your mouse cursor for that matter. I don't want to see buttons depress like playerless piano keys as you click. Should we be able to share a form by sharing a URI? Sure, but I have my UI view of it, and you have yours.

I think the orchestra is starting to play...

Should we be able to share a

Should we be able to share a form by sharing a URI? Sure, but I have my UI view of it, and you have yours.

To the extent your actions control part of my view, I cannot reasonably say that the view is entirely my own. When a user shares a capability to view a system, one ends up with a shared view, even if the sharing between views is not 100%.

But it seems we're duking it out over definitions, which suggests further debate will be without profit. You see UI as an artifact produced by a rendering element. I see UI as something that can be rendered when it becomes convenient to do so. Based on later discussion, you assume that the renderer must be provided to the UI. I assume the opposite: that the UI must be provided to the renderer.

I think my definition is the only sensible one when it comes to understanding UI mashups, resumable UIs, zoomable UIs, persistent UIs, windowing (hidden or occluded UI), etc. I suspect you also see your definition as the only sensible one. Oh, and I like to butter my bread at the top.


So in
we're just
but your
are better.

One nit

You should see buttons depress, if you gave me the capability to control your mouse and keyboard and other input device stream via my input device streams. This is sort of how GoToMeeting works using RFB, but David and I are both talking about something much richer. GoToMeeting is a good example, though, since you can select a single top-level window the remote user can see on your local machine.

this thread isn't visually narrow enough

so i'd like to add to it. i suspect that, as was pointed out, there is a subtly different connotation for "UI" being used here. i suspect Matt is thinking "hey, a user has their own keyboard, mouse, and display, that is their own UI" which isn't how David is thinking about it?

The Blue and Pink Planes

plenty of work developing components in multi-operator and multi-tier command and control systems.

At some point, I would love to hear you explain in depth how your defense-related day job influences your position on PLT and especially GPU/GUI API design.

Blue and Pink planes is an allusion to Alan Kay's 1997 OOPSLA speech, where a Pink plane is incremental, and a Blue plane is orthogonal to the Pink plane and ascending it allows you to create something revolutionary.

Just to be clear

The JPEG encoder would be trusted, so that the most important issue here is you are saying, "I trust you to use the JPEG encoder functionality in conjunction with this portion of screen real estate, under these contractual obligations (e.g. 'you may only screenshot once every 30th frame'). I don't, however, trust you to use PSD encoder because I've determined it interferes with important real-time requirements."

It might be coolest to be able to screenshot using a PSD-like file format (any file format with layers), because then you can separate the composite according to actual UI primitives. So the functionality could be much richer than your conventional Operating System's screen capture functionality, which mostly uses GDI-like printers to copy from the device context to the data store. However, if I don't give you the capability to encode to PSD, then you can't do it.

Oh, except primitive screen-shot cap (frame buffer access) belongs at the level of the rendering cap, I think. "Take a screen shot and store it as a JPG in my pictures folder" belongs at the UI level.

What effects will the partitioning you are arguing for have on software engineering -ilities?

Personally, I think the partitioning depends on the application and its set of use cases, and what the architect thinks is appropriately secure given the user's security needs. Practically speaking, it also depends on how the OS authors chose to design the Graphics API; the frame buffer might have certain compositing effects that you do not want the object to have the capability to use as digital entropy.


I don't think I agree that the JPEG encoder needs to be trusted. You should probably be able to specify computational limits, though.

I sort of have a hard time understanding the use case for this UI level screen shot tool. If I'm taking screenshots, I usually want shots of the whole screen. I'd like the OS to allow this, but treat it as the potential security hazard that it is. The PSD screenshot idea might be nice, particularly if the hidden parts of windows were also captured, but I'm still not sure about the use case.

Trusted JPEG Encoding = supremely fine-grained case

You can specify computational limits, but the whole point of placing limits on resources is to prevent abuse. Specifying only computational limits would be awkward, and might not be easily exposed to the user or third-party plug-ins.

Computational limits could be specified in addition to the convenience you afford other programmers by saying, "Look, we made sure this JPEG encoding crap works and is isolated from the performance requirements of the app's critical functions. Don't worry about rolling your own and testing to make sure it fits w/in our resource limit restrictions."

I'd like the OS to allow this, but treat it as the potential security hazard that it is.

Sorry, I don't understand what that means.

I sort of have a hard time understanding the use case for this UI level screen shot tool.

It would probably help if I knew you better, such as what GUI programs you worked on, as that would help close any communication gap. Apart from that, it goes without saying that I am extremely closed-minded and visionary; part of that vision is simply extremely fine-grained, robust composition. I think such software will ultimately feedback and change how we create programming languages.

But probably the simplest example would be a WPF 3 / Silverlight 3+ VisualBrush. VisualBrush is sealed and therefore can't be subverted by clients, so really the only further requirement is a bottom-up ocaps security model.

Edit: The more advanced scenario, which I alluded to earlier, is fault-tolerant network printing. This is something you cannot do with GDI.

Extremely fine grained composition

If I understand you, I'm on board with fine grained composition, actually. A focus of the language I'm working on is making parameterization easy, which I think supports this style. But passing in a trusted JPG encoder is the easy part. How do you forbid a component from bringing its own JPG encoder to the party? I suppose you could try to make JPG-encoded values opaque types, but that won't stop me as a component from encoding JPGs and discarding them, just to spite you.

VERY good question

And exactly why I am totally rethinking how people go about designing GPU APIs and GUI APIs.

The short answer is you need sealing and the GUI API must be able to give the developer control over whether to forward a direct arbitrary encoding capability, or a very specific, fine-grained encoding capability.
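Sealing in the capability sense means a matched sealer/unsealer pair: anyone may hold a sealed box, but only the matching unsealer can open it. A minimal Python sketch follows; `make_brand` is a hypothetical name, and Python's introspection means this shows the idea rather than enforcing it.

```python
import weakref


def make_brand():
    """Return a matched (seal, unseal) pair.

    seal(value) returns an opaque box; only the unseal from the same pair can
    recover the value. The lookup table is private to this closure.
    """
    table = weakref.WeakKeyDictionary()

    class SealedBox:
        def __repr__(self):
            return "<sealed box>"

    def seal(value):
        box = SealedBox()
        table[box] = value
        return box

    def unseal(box):
        try:
            return table[box]
        except (KeyError, TypeError):  # foreign box, or not weak-referenceable
            raise PermissionError("not sealed by this brand") from None

    return seal, unseal


seal, unseal = make_brand()
box = seal(b"jpeg bytes")
print(box)          # <sealed box>
print(unseal(box))  # b'jpeg bytes'
```

Note the limit Matt raised: sealing controls who can read an encoder's output, not whether a component burns cycles encoding its own JPEGs and discarding them; that needs resource limits on top.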

A focus of the language I'm working on is making parameterization easy

I am not sure what that means.

I think Matt just meant that

I think Matt just meant that taking a screenshot can currently bypass the UI layer entirely and simply access the framebuffer directly, so any isolation measures to prevent this must already be OS-level protection mechanisms, i.e. a chroot jail of some sort on UNIX. I agree that subsetting the whole-screenshot into regions via a trusted window manager is ultimately necessary to make this work.


Just to be clear, those regions do not have to contain each other, or even be rectangular.
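One way a trusted window manager could hand out such regions is as a capture capability over an arbitrary set of pixel coordinates. A minimal Python sketch, with a hypothetical `make_region_capture` and a toy framebuffer of characters:

```python
def make_region_capture(framebuffer, region):
    """Grant capture of exactly the pixels in `region`, a set of (x, y) pairs.

    framebuffer is a list of rows, so pixel (x, y) is framebuffer[y][x].
    Regions can be any shape; two granted regions may overlap, nest, or be
    disjoint.
    """
    allowed = frozenset(region)

    def capture():
        return {(x, y): framebuffer[y][x] for (x, y) in allowed}

    return capture


fb = [list("abc"), list("def"), list("ghi")]
l_shape = {(0, 0), (0, 1), (0, 2), (1, 2)}  # an L-shaped, non-rectangular region
cap = make_region_capture(fb, l_shape)
print(sorted(cap().items()))
# [((0, 0), 'a'), ((0, 1), 'd'), ((0, 2), 'g'), ((1, 2), 'h')]
```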

The OS is special in that it

The OS is special in that it CAN prevent this.

This is just a fundamental misunderstanding of modern operating system design research, particularly ocaps systems.

People often recommend Levy's book and Miller's thesis and Rees's thesis on this matter, because they are recent. But as far back as 1967 I've found arguments for protection down to the hardware level (JK Iliffe's Basic Machine Principles, which according to Vladimir Safonov influenced the design of the Russian "Elbrus" supercomputers [the first supercomputers written purely in a high-level language w/o a prosaic assembler]). Iliffe's early critique of the limitations of early timesharing multiprogramming systems in Chapter 1 is alarming when you realize it was written in 1967!

Ask yourself this: What is the kernel going to prevent? How will it notify the application of such prevention? How will the application tolerate such an access fault? Monolithic statements like "OS can do very broad idea X" are asking for trouble, and an appeal to God Objects and no partitioning of control.

Please read up on this issue, and feel free to e-mail me off list if you do not understand. I'll do my best to get back to you.

I do not get what code based

I do not get what code based layout has to do with security:

Containment pointwise reasoning tends to delegate authority using the visual or logical containment hierarchy of the GUI. IMHO, this has to do with how the GUI Toolkit API designers try to encode their GUI system into the language's type system via its static production rules. I think this is mostly a modern notion, too, because if you look at Newman and Sproull's books on Interactive Computer Graphics in the '70s, they don't really talk about type systems. Neither is it present in Sketchpad. The scene-object graph metaphor appears to have come later. I therefore think this late-coming attraction is excess baggage that should be dropped.

Sean McDirmid has a couple of interesting examples of an alternative model based on "physics" where the programmer is allowed full control over layout.

You can still create a concrete syntax for your UI a'la XAML's WPF module or DOM Level-2, but you should not tightly couple it to your semantics and use it to hide an (unchangeable) layout engine state-process.

It would probably be more beneficial if you could do containment pointwise reasoning using, say, dependent types.

No worries :-)

Luckily, there's no need to consider this as an "us vs. them" question, since Ur/Web supports what you want quite nicely. Just implement your final application as a functor (in the sense of the ML module system) over a set of templates. Each template can be a function from data acquisition methods to HTML pages. Since Ur is a pure language, assigning a template a type outside the "IO monad" naturally prevents it from doing more than you'd expect a template to do.
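The shape of "an application parameterized over templates, where a template is a function from data acquisition methods to HTML" can be sketched in Python, with one template for brevity. This is only an analogy: `make_app`, `fetch` and `designer_template` are hypothetical, and Python cannot enforce the purity that Ur/Web's types give you, so here the template is merely *not handed* any effectful operation.

```python
import html
from typing import Callable

Fetch = Callable[[str], str]       # read-only data acquisition given to a template
Template = Callable[[Fetch], str]  # a template: fetch function -> HTML page


def make_app(page_template: Template) -> Callable[[], str]:
    """Build an application parameterized over a template.

    Loosely analogous to applying an ML functor to a template module: the
    template receives only `fetch`, never the app's state or other effects.
    """
    db = {"title": "Hello", "body": "<World>"}  # hypothetical app-owned state

    def fetch(key: str) -> str:
        return html.escape(db[key])

    def handle_request() -> str:
        return page_template(fetch)

    return handle_request


def designer_template(fetch: Fetch) -> str:
    return f"<h1>{fetch('title')}</h1><p>{fetch('body')}</p>"


app = make_app(designer_template)
print(app())  # <h1>Hello</h1><p>&lt;World&gt;</p>
```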

How does the front-end for assigning a template compare

to, say, Expression Blend or Eclipse Rich Client Platform, or BIRT, or Pentaho BI, etc.?

This I think is the essence of Michel's question.

From my use of Ur/Web, there

From my use of Ur/Web, there is nothing to stop a programmer from building these kinds of abstractions. One could create the kind of "designer interface" that just involves writing HTML code with some inline directives to display computed values.

Not that this would necessarily be desirable, but the option certainly exists.

Obvious thing to check

Does anyone know of any other systems that allow this? Has the desirability of this facility been articulated somewhere?

Although I haven't played around with either your system or WebDSL, WebDSL would seem to be the most similar thing out there to have a look at.

OPA, of course :)

OPA uses the same kind of combination techniques in web applications, at a somewhat different abstraction level. Essentially, state is encapsulated in sessions, a unit of state + capabilities + concurrency + distribution, reminiscent of process algebras (or of Erlang with capabilities), which serves in particular to transparently implement Comet-style calls, hot-code updates, etc.

While the low-level implementation of some sessions may involve (private) database paths, (private) cookies, (private) references, (private) special urls, etc. all of this is hidden from the user behind a unified set of high-level primitives. Note that, where Ur/Web's paradigm seems oriented towards static composition, OPA's sessions can be composed dynamically. Both languages are statically typed, of course.

I'm currently working on a paper explaining all of this in more detail.


As I understand the online OPA examples, there is manual mapping of URLs from strings to handler functions. How does this work with composing components, without needing to worry about how their URL schemes fit together?

Given your clean ML-y way of treating database values and references, I see how they play well with sessions. In OPA, are cookies also first-class capabilities that can't be forged?

Can an OPA session have abstract types associated with it? If not, can this kind of modularity be supported in some other way?

Can a session effectively control a subtree of the DOM?

How does this work with

How does this work with composing components, without needing to worry about how their URL schemes fit together?

In OPA, developers can provide URLs (or, more precisely, families of URL parsers) they wish to handle manually, e.g. for providing end-users with bookmarkable or easy-to-remember addresses. These URLs can be composed by the compiler but there is no guarantee that the schemes will fit together nicely.

Most URL families (including CSS, Ajax, Comet, etc.), however, are generated and composed automatically by the compiler. In these cases, users don't need to worry about fitting together.
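Why generated URLs compose without coordination can be illustrated with a toy router that mounts each component under a prefix it generates itself. This is not OPA's mechanism, just a hypothetical Python sketch (`compose` and the components are made up); hand-chosen bookmarkable URLs would bypass it and could still clash.

```python
def compose(components):
    """Mount each component's handlers under a generated prefix.

    components is a list of (name, {path: handler}) pairs. The prefix starts
    with the component's unique index, so no two components can collide,
    whatever names or paths they chose.
    """
    routes = {}
    for index, (name, handlers) in enumerate(components):
        prefix = f"/c{index}-{name}"
        for path, handler in handlers.items():
            routes[prefix + path] = handler
    return routes


poll = {"/submit": lambda: "vote recorded"}
chat = {"/submit": lambda: "message sent"}
routes = compose([("poll", poll), ("chat", chat)])
print(sorted(routes))  # ['/c0-poll/submit', '/c1-chat/submit']
```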

In OPA, are cookies also first-class capabilities that can't be forged?

In OPA, cookies are considered a low-level implementation detail. The library doesn't contain bindings to manipulate them directly, although nothing prevents anyone from adding them.

Can a session effectively control a subtree of the DOM?

Again, not at that level of abstraction. A session can control a user interface widget.

User interface widgets

Can a user interface widget control a subtree of the DOM, such that no other component can change that part of the DOM without going through the owning widget's interface?
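What such ownership means can be sketched in Python: the widget builds its subtree privately and hands out only a render function and its own operations. `Node`, `make_counter_widget` and `Page` are hypothetical (not a browser DOM API), and Python introspection could still reach the closure; in Ur/Web the guarantee would come from type abstraction instead.

```python
class Node:
    """Minimal hypothetical DOM node (not a browser API)."""

    def __init__(self, tag, text="", children=()):
        self.tag = tag
        self.text = text
        self.children = list(children)

    def render(self):
        inner = self.text + "".join(child.render() for child in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"


def make_counter_widget():
    """Create a widget that owns a private subtree; only `increment` mutates it."""
    label = Node("span", "0")
    root = Node("div", children=[label])
    count = 0

    def increment():
        nonlocal count
        count += 1
        label.text = str(count)

    # A lambda rather than the bound method root.render, which would expose
    # root through its __self__ attribute.
    return {"render": lambda: root.render(), "increment": increment}


class Page:
    """The page composes widgets through their render functions only."""

    def __init__(self):
        self._slots = []

    def mount(self, render):
        self._slots.append(render)

    def render(self):
        return "<body>" + "".join(render() for render in self._slots) + "</body>"


counter = make_counter_widget()
page = Page()
page.mount(counter["render"])
counter["increment"]()
counter["increment"]()
print(page.render())  # <body><div><span>2</span></div></body>
```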

Subtree security

This seems to assume that the DOM is a good model for security in the face of designer-developer user stories. I think the DOM is a huge hack and implicitly assumes containment pointwise reasoning is a good way to allow for robust UI behaviors. The bloat of CSS, HTML and JavaScript suggests otherwise. jQuery is allowed ambient authority over the DOM precisely because of this containment pointwise reasoning assumption. You even see this containment pointwise reasoning in Peter Van Roy's Concepts, Techniques, and Models of Computer Programming (chapter 10). Cay Horstmann's books also advocate this. So does Craig Larman's book on Object-Oriented Analysis (weakly posturing that the notion of "Controller" is inherent to OO).

A big reason why I am interested in Constraint Multiset Grammars and Hyperedge Replacement Grammars for GPUs is to come up with an alternative model that is more dynamic and yet doesn't sacrifice static safety guarantees.

Edit: I reread what David wrote. David's point to you appears to be similar to mine, but I can't say for sure. Bottom line: the DOM and a user interface component are two fundamentally different things. The DOM presumes subtrees for modeling state, which is simply a logical disaster waiting to happen from a developer-designer user-story standpoint. This is a major reason why some of us, including me, have said to Ian Hickson: why are you working so hard on HTML5? Just give us a raw GPU library and let those of us who know what we are doing build our own abstractions (I've personally advocated in the past the dubious position that we use something like Google Gears to help parallelize drawing to Canvas). Give us a Canvas control. People give Tim Berners-Lee way too much credit, especially the REST folks. They argue that HTML's genius is that it limits non-RESTful abuse.* I think that argument is ridiculous, and it only makes sense to people who don't design systems for a living. HTML5 breaks with much of HTML's philosophy anyway, incorporating a rich "network object model" for how to interpret HTML.

* If you are building custom hypermedia types, even if you embed them inside an old hypermedia type for compatibility with the user agent, then a key design question is: "How do you express links?" In HTML, the href attribute on an anchor (a) element and the URIref value of the action attribute on a form element precisely define the semantics of these actions. With DHTML, however, the composition system subverts these semantics and pushes them into the developer's control via the DOM; this is precisely why the DOM is a step backwards. Incidentally, many REST advocates do not understand this system design point. You can spot it by how hungry they are for RDF and similar ontology-engineering specs. RDF has no definitions for how to express links, which is why you have working specifications like RDF-FORMs. For this reason, returning HTML, XML, JSON, RDF/XML or Turtle from the same request requires the client to know special details about link semantics across the board. This has a real, broad engineering impact on how you design the linker and loader in your distributed system!

I don't understand what

I don't understand what you're looking for references to. Ownership types (or encodings with dependent types a la Fine) seem pretty solid on the theoretical front. Fine also starts to look at compiling down to a malicious/multiprincipal environment (the browser). Obviously, the protection goals are different from those of, say, Jif.

If you mean building secure APIs on top of a notion of ownership, it's really cool to see you doing that; there seems to be a big lack of developers using such languages, so I wouldn't expect many safe APIs (Joe-E is cool in this respect, but I don't know how far along it is). I think Jean Yang's PLDI paper this year had a lot of this, but in the OS domain. The use of dependent types to mechanize PL research helped its marketing among PL enthusiasts, but that was a narrow use; it's cool to see you (and at least two other groups) applying these ideas to deploying code in browsers!

If you mean statically typed web browser programming... if your code naively translates to JS, it likely cannot safely interact with other JS code. You need to jump through hoops for full abstraction, whether rewriting a la Caja, changing the language (Ben Livshits and I did this for ML-ish interactions with JS as part of our Oakland paper), or a proxy system (see my WWW paper). Keep in mind those are about safe JS interactions (in which case trampolining might have actually been enough) -- sharing the DOM/BOM is substantially harder as browser versions and even the API standard are surprising, and we don't even know what it means to safely share access (e.g., Helen Wang's experiments with pixel sharing). Of potential interest, Arjun Guha has been doing a lot with types for JS.

For running multiple applications on the same server... isn't that de facto? For multiple components on the same page, same thing. The coolest form of this that I've seen is probably Seaside's, but the expressive challenge wasn't one of safety.

If you were clearer about what you want references to, I might be able to give better ones than this. E.g., what are your threat and trust models, allowable solution approaches and expressive requirements, etc.?


Thanks for the detailed response!

What I'm after is actually more about software engineering than security. Encapsulation is motherhood-and-apple-pie for most programmers, even though we don't usually think about the different classes or modules in a program as at war with each other. Rather, complex programs are easier to understand if we chop them into pieces separated by well-defined boundaries.

My question is about past systems and insights about chopping web applications into pieces with impenetrable boundaries, assuming that you have control over all the code involved (so you can, e.g., apply static type-checking). A module or class owns a database table, a cookie, an AJAX RPC handler, a Comet communication channel, a DOM subtree, etc., and everyone else who accepts the same static discipline can only get at these things by going through the module's published interface.

Still not sure what you're going for

I'm not sure what the productivity benefits of fine-grained partitioning of the resources you listed are. Some represent limited resources, so the concern might be resource management. Many frameworks have been used to manage local resource consumption (e.g., local state is split across cookies, sqlite, XHRs, etc.) and non-local resource requests (Doloto, GWT, etc.); I haven't seen studies about composing such frameworks in an interoperable way.

If the desire is to be able to have a central point to modify use of these resources, that's a classical application of AOP which is increasingly examined for the web but hasn't shown enough utility for mass adoption. There's a stated desire, and while my belief is this will get more important, it isn't a mass one at this point.

Using the word 'impenetrable' makes me think of security. However, given support for function abstraction, and setting aside the fact that JS is not truly capability secure, I'd argue the natural programming model *is* to be ***encapsulated. Bringing in static guarantees seems to be more a question of assurance; when new abstractions are brought in, the push has been away from the base web abstractions. I still don't think I'm in the right mindset for your question; perhaps you have a leading example of a scenario where we need an only-seemingly-impenetrable boundary in the web stack that isn't already naturally addressed?

***There are some crazy bits of the DOM API that might be tweaked (e.g., registering a 'message' handler for postMessage lets you snoop on all incoming messages), plus cookies and DOM-as-global as you mentioned, but at the level of SE, most of it is already encapsulated. I'd like to see these things tweaked at the API level (fixing the assembly language). Interesting to ponder is that for the resources that are not directly encapsulated, SOP is a heavyweight way to achieve encapsulation, but (frustratingly) it is almost never used.

Edit: perhaps comparing to facilities provided by RoR, GWT, Script#, or Seaside would help? This conflates an application framework with raw web programming though, and you seem to be interested in the latter with maybe a LAMP stack on the backend.


My simplest "real" example is a component (let's call it Auth) that encapsulates a usernames+passwords database table and a login cookie. Other components can ask Auth to display a login form at a particular place, or they can ask Auth which user is logged in in the current session, but other components can't access the password table or the cookie directly.
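A rough TypeScript analogue of the Auth component, assuming a hypothetical API (`addUser`, `login`, `currentUser`): the password table and the session store are ECMAScript private fields, so other code can only go through the public methods. Unlike Ur/Web, the boundary here is enforced by the JS language at run time rather than by a static module system. Passwords are kept in plaintext only to keep the sketch short, and login-form rendering is omitted.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical Auth component. #users and #sessions are ECMAScript private
// fields: code outside the class body cannot even name them.
class Auth {
  // Password table (plaintext ONLY for brevity; real code stores salted hashes).
  #users = new Map<string, string>();
  // Session store: opaque login-cookie value -> user name.
  #sessions = new Map<string, string>();

  addUser(name: string, password: string): void {
    this.#users.set(name, password);
  }

  // On success, returns a random token the framework would send as the login cookie.
  login(name: string, password: string): string | null {
    if (this.#users.get(name) !== password) return null;
    const token = randomBytes(16).toString("hex");
    this.#sessions.set(token, name);
    return token;
  }

  // Other components may ask who is logged in, but never see the tables.
  currentUser(token: string | null | undefined): string | null {
    if (token == null) return null;
    return this.#sessions.get(token) ?? null;
  }
}

const auth = new Auth();
auth.addUser("alice", "correct horse");
const token = auth.login("alice", "correct horse");
console.log(auth.currentUser(token));          // alice
console.log(auth.currentUser("forged-token")); // null
// Writing auth.#users here, outside the class body, is a syntax error.
```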

This isn't about what's necessary to achieve some goal. It's about making large programs easier to understand by placing hard limits on the way particular resources may be accessed.


To clarify, this is also the kind of thing we're doing in OPA. Except, in our case, we're doing this in a more functional/concurrent way and less in a database/mutable manner.

Depending on JavaScript and using Caja

Have you (Adam C.) looked into mapping any JavaScript-generated code into Caja instead of pure JS?

Not sure who 'you' is, but

Not sure who 'you' is, but if it's me: we did in an early form of one of the projects. Mark Miller has been pushing some important changes into the JS spec, motivated by his experiences, that will hopefully obviate the need for Caja by putting it into the language and reducing it to a library of patterns (e.g., our membrane and view ones).
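Membrane-style patterns like these build on ES2015 proxies. As a minimal sketch, the snippet below uses the standard `Proxy.revocable` to build a caretaker, the simplest revocable forwarder; a full membrane would also wrap every object passed across the boundary, which this sketch does not do. The `account` object is hypothetical.

```typescript
// A caretaker hands out a forwarding proxy while keeping the power to revoke it.
function caretaker<T extends object>(target: T): { proxy: T; revoke: () => void } {
  // An empty handler forwards every operation to the target unchanged.
  return Proxy.revocable(target, {});
}

const account = { balance: 100, owner: "alice" };
const { proxy: view, revoke } = caretaker(account);

console.log(view.balance); // 100 (forwarded to account)
revoke();
try {
  console.log(view.balance);
} catch (e) {
  // Every operation on a revoked proxy throws a TypeError.
  console.log("revoked:", (e as Error).name); // revoked: TypeError
}
```

Revoking cuts off whoever holds `view` without touching `account` itself; a membrane extends this by wrapping everything reachable through the proxy, so that one `revoke` severs the whole object graph.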

*However*, the full abstraction problem still exists; even if JS is a capability language, its dynamism makes it very easy to leak capabilities. For example, consider enforcing public/private method access modifiers, or the integrity of a functor, in the presence of JS call stack inspection (if made cap-secure), or if Ur compiled by lambda-lifting into a list of globals; a cursory glance revealed both problems in Script#.
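A small illustration of the lambda-lifting point, assuming a hypothetical naive compiler: at the source level a counter's state is reachable only through its closure, but once closures are lifted into top-level functions whose environments sit in a shared table (`ENVS` here, a made-up name), any other code in the same scope can read or forge that state.

```typescript
// Source-level intent: `count` is reachable only through the returned closure.
function makeCounter(): () => number {
  let count = 0;
  return () => ++count;
}

// Hypothetical output of a naive lambda-lifting compiler: closure environments
// live in a table shared by all code in this scope, and a "closure" is an index.
const ENVS: { count: number }[] = [];
function makeCounterLifted(): number {
  ENVS.push({ count: 0 });
  return ENVS.length - 1;
}
function counterLifted(envIndex: number): number {
  return ++ENVS[envIndex].count;
}

const counter = makeCounter();
counter();
counter();
const lifted = makeCounterLifted();
counterLifted(lifted);
counterLifted(lifted);

// The closure version offers no handle on `count`, but the lifted state can be forged.
ENVS[lifted].count = 1000;
console.log(counter(), counterLifted(lifted)); // 3 1001
```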

Getting back to Adam's post: there are tricky parts of login (e.g., why images on Facebook still often got leaked), but the basic part of making a login interface was never an issue in my experience (again, we're skipping tricky UI attacks). I'm all for supporting at least DAC on user data (incorporating the persistence layer), so perhaps that's a better example. Last time I talked to them, the WebDSL guys said they decided they needed to make this shift as well.

I meant Adam

Adam did not clarify what he meant by "static enforcement of which app may touch which resource, but without the need for any dynamic enforcement" (in his original post).

I was not sure if this excluded needing something like Caja.

Edit: Fixed my poor English.

I think he means static

Based on what Ur is, I think he means static checking of direct per-subject access, akin to Nikhil Swamy's Fine line of research.

While we normally focus on the language-level parts of Caja, it really is an eco-system (e.g., API taming, which is crazy hard).

Just to be clear

The idea behind still requiring Caja (or equivalent) in Ur/Web is that an Ur/Web application hosted in a client's user agent can still be susceptible to XSS if it is mashed up in an iframe.

I am saying that I don't see the necessity of static enforcement, as I understand it, because static enforcement doesn't work perfectly when the environment the code ends up hosted in can't be ocap-secured. Nikhil Swamy distinguishes between stateless and stateful authorization policies, but Fine assumes the presence of DCIL. Swamy did not show that he could use Fine to target an IL like JavaScript.

Sidenote: I had looked at Ur/Web before this post, which is why I asked Adam about it in the OOPSLA thread just before he posted *this* thread to LtU; I first looked at it when Paul Snively mentioned that Adam is working on a book about using Coq and its tactic language to develop proofs. I didn't have enough time to really dig into how he was guarding against such attack vectors.

My big thing is to do security in layers, doing security the paranoid way.

Edit: So my question about using Caja instead of pure JS can be turned into: what does your secure IL look like on the client side, in the browser?

Edit: perhaps comparing to

Edit: perhaps comparing to facilities provided by RoR, GWT, Script#, or Seaside would help? This conflates an application framework with raw web programming though, and you seem to be interested in the latter with maybe a LAMP stack on the backend.

What about more basic infrastructure like WebMachine, which pretty much IS raw web programming? It strips away abstractions like MVC, MVVM, MVP, Model-1, Model-2, ActiveRecord, DataMapper, ORM, blah, blah, blah.

This is the sort of infrastructure I wish IIS and ASP.NET were based on. The only thing I don't like is that its dispatcher seems to encourage out-of-band communication between the client and server, which yields weak service encapsulation by default and makes for a hard versioning story.

Happstack is a Haskell framework that technically should be way better than UriTemplate-based "dispatchers" since you can reason more directly about the HttpRequest and determine exactly how to route a message and just what that means in terms of set-up and teardown. Happstack still isn't nirvana, though.

web concerns summary?

i'd love to read a short position paper that lists as many bullet items as possible like 'how to make the dispatch not suck', perhaps distilled from, as you suggested elsewhere, getting all the web stack people into a conference room to duke it out / stone soup it up a little.


could be improved to better support security. It does not have static safety guarantees, and it mostly encourages spaghetti coding.

I would also be very careful about not leaking authority or fingerprinting information, etc., with a login cookie. What you are effectively claiming is that you can transparently manage a protocol for the programmer. See Murdoch's excellent paper "Hardened Stateless Session Cookies".
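For a sense of what a framework takes on here, below is a minimal stateless session cookie authenticated with HMAC-SHA256, assuming Node's built-in `node:crypto` module and a hypothetical hard-coded server key. This is only the basic scheme, not Murdoch's hardened variant, which, as we understand it, is designed so that an attacker who can read the server's authentication database still cannot mint valid cookies. It also does nothing about the fingerprinting or authority-leak concerns above; it only checks integrity and expiry.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical server secret; in practice load a long random key from configuration.
const KEY = Buffer.from("replace-with-a-long-random-secret");

function mac(payload: string): string {
  return createHmac("sha256", KEY).update(payload).digest("hex");
}

// Cookie value: "<user>|<expiry in unix seconds>|<hex HMAC over user|expiry>".
function issueCookie(user: string, expiresAt: number): string {
  if (user.includes("|")) throw new Error("user name must not contain '|'");
  const payload = `${user}|${expiresAt}`;
  return `${payload}|${mac(payload)}`;
}

// Returns the user name if the MAC verifies and the cookie has not expired, else null.
function verifyCookie(cookie: string, now: number): string | null {
  const parts = cookie.split("|");
  if (parts.length !== 3) return null;
  const [user, exp, tag] = parts;
  const expected = Buffer.from(mac(`${user}|${exp}`), "hex");
  const given = Buffer.from(tag, "hex");
  // timingSafeEqual requires equal lengths, so check that first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  if (!(Number(exp) > now)) return null; // expired, or expiry not a number
  return user;
}

const cookie = issueCookie("alice", 2000000000);
console.log(verifyCookie(cookie, 1700000000));                               // alice
console.log(verifyCookie(cookie.replace("alice", "mallory"), 1700000000)); // null
```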

Strongly worded warning: the last thing you want to do security-wise is commit "Wadler's Folly" all over again, promising the world to practitioners and then totally ignoring security. Every time PLT academics do this, practitioners become more wary of listening to you guys and more convinced they should just stick to Rails, JBoss, Spring, or whatever. "Why learn ML?" they ask. "These guys don't even understand security... and they are trying to convince us these 'monad comprehensions' help us... how?"

There are too many frameworks here that promise the world with no explanation of what that world actually looks like. Tellingly, they cannot tell you what features are missing from that world. Consider echo, TrimPath Junction, UnCommonWeb, Links, Volta, Jif, HaXe, OPA, Andro (formerly the Andromeda Project), LittleSteps, GWT, Seaside, Ruby on Rails, ASP.NET MVC, ASP.NET Web Forms, J2EE Servlets, Facelets, StringTemplate + domain model, Ousterhout's Fiz framework, Django, Merb, SQL Server Modeling Platform (formerly codenamed "Oslo"), SproutCore, Objective-J and the Cappuccino web framework, etc. LtU user Noel from untyped.com also had an ICFP'07 paper describing their 'security pipelines' approach, but it failed to really give a convincing explanation of their whole framework. Some of these have been discussed before in the LtU archive.

I've tried and experimented with all of these; they are all lame IMHO.

There are other issues in building a one-size-fits-all solution like GraftId that simply suggest to me GraftId will not scale as much as you might like. The main issue is concurrency control and transaction management for the data store. This is way too deep an issue to discuss here on LtU.

GC and Closures?

I have a project coming up that might be done as a web app, so I played around with a number of the Ur/Web demos, and it looks very interesting.

One thing I noticed is your VM interpreter written in JavaScript. It is very unfortunate that JS supports neither implicit nor explicit tail calls, which makes an Ur -> JS compiler difficult, but a quick glance makes me think this particular aspect is rather toy-ish at this stage in the game.

I'm curious though, what's the problem with garbage collection and closures? Is there a particular reason for avoiding them, or is it more an issue of avoiding the need for a complicated run-time environment?
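On the tail-call point: a common workaround when compiling to JS, where engines in practice do not eliminate tail calls, is a trampoline. In this minimal sketch (all names hypothetical), each tail call returns a thunk instead of calling directly, and a driver loop runs thunks until a final value appears, so stack depth stays constant even across a million mutually recursive calls. This is a generic technique, not a description of how Ur/Web's compiler works.

```typescript
// A step either finishes with a value or asks the driver to run another step.
type Bounce<T> = { done: true; value: T } | { done: false; next: () => Bounce<T> };

const done = <T>(value: T): Bounce<T> => ({ done: true, value });
const call = <T>(next: () => Bounce<T>): Bounce<T> => ({ done: false, next });

// Driver loop: iterates instead of recursing, so the JS stack stays flat.
function trampoline<T>(b: Bounce<T>): T {
  for (;;) {
    if (b.done) return b.value;
    b = b.next();
  }
}

// Mutually tail-recursive even/odd, written the way a compiler might emit them.
function isEven(n: number): Bounce<boolean> {
  return n === 0 ? done(true) : call(() => isOdd(n - 1));
}
function isOdd(n: number): Bounce<boolean> {
  return n === 0 ? done(false) : call(() => isEven(n - 1));
}

console.log(trampoline(isEven(1000000))); // true, with no stack overflow
```

The price is an allocation per tail call and an indirect jump through the driver, which is one reason compilers often trampoline only calls they cannot prove are safe to make directly.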

Runtime issues

Are you asking about why the compiler doesn't let you allocate closures in server-side code? If so, the answer is more or less that there are clear performance benefits from avoiding that, and I haven't found a need for it so far, so it hasn't seemed worth implementing. It almost certainly wouldn't be a big deal to support closure allocation within the existing region-based memory management. I have a hunch garbage collection is overkill for server-side pieces of almost all real applications.
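Region-based memory management can be pictured as a per-request arena: everything allocated while handling a request comes from one region, and the whole region is released at once when the request finishes, so request-local data needs no tracing collector. The toy bump allocator below illustrates the idea in TypeScript; it is not Ur/Web's actual C runtime, and `Region` and `handleRequest` are hypothetical names.

```typescript
// A toy region: a fixed buffer plus a bump pointer. Allocation is O(1), and
// freeing everything is a single reset of the pointer.
class Region {
  private readonly buf: Uint8Array;
  private top = 0;

  constructor(size: number) {
    this.buf = new Uint8Array(size);
  }

  // Reserve n bytes and return a view onto them; throws if the region is full.
  alloc(n: number): Uint8Array {
    if (this.top + n > this.buf.length) throw new RangeError("region exhausted");
    const view = this.buf.subarray(this.top, this.top + n);
    this.top += n;
    return view;
  }

  used(): number {
    return this.top;
  }

  // Release everything at once. Views handed out earlier must not be used after
  // this, just as pointers into a freed region must not be dereferenced.
  reset(): void {
    this.top = 0;
  }
}

// Hypothetical request handler: all scratch data comes from the region.
function handleRequest(region: Region, body: string): number {
  const scratch = region.alloc(body.length);
  for (let i = 0; i < body.length; i++) scratch[i] = body.charCodeAt(i) & 0xff;
  return scratch.reduce((sum, b) => sum + b, 0);
}

const region = new Region(1024);
console.log(handleRequest(region, "abc"), region.used()); // 294 3
region.reset(); // end of request: free everything in one step
console.log(region.used()); // 0
```

Allocating closures would fit the same discipline as long as none of them outlives the request, which is presumably why closure support within the existing scheme is described above as not a big deal.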

I'm not sure it's fair to call my virtual machine implementation "toy-ish," given how rare it is for serious computation to go on client-side.




I'm not sure it's fair to call my virtual machine implementation "toy-ish," given how rare it is for serious computation to go on client-side.

To be clear, it's not that I think that implementing a VM in javascript is inherently toy-ish. (though I do think it's unfortunate that it's necessary.) I just noticed things such as your linked-list environment representation.
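To make the environment-representation remark concrete, here is a sketch of two ways an interpreter might store variables addressed by de Bruijn index (the innermost binding is index 0). Looking up index i in a linked list walks i links, while a flat array gives constant-time lookup at the cost of copying when the environment is extended. Neither is the Ur/Web interpreter's code; all names are hypothetical.

```typescript
// Linked-list environment: each binding points to the enclosing ones.
type Env<V> = { value: V; rest: Env<V> } | null;

function extend<V>(env: Env<V>, value: V): Env<V> {
  return { value, rest: env };
}

// Lookup by de Bruijn index walks i links: O(i).
function lookupList<V>(env: Env<V>, i: number): V {
  let e = env;
  for (let k = 0; k < i; k++) {
    if (e === null) throw new RangeError("unbound variable");
    e = e.rest;
  }
  if (e === null) throw new RangeError("unbound variable");
  return e.value;
}

// Flat environment: index 0 is the innermost binding. Lookup is O(1), but
// extension copies the array (flat closure conversion typically copies only
// the free variables a closure actually uses).
function extendFlat<V>(env: readonly V[], value: V): V[] {
  return [value, ...env];
}
function lookupFlat<V>(env: readonly V[], i: number): V {
  if (i < 0 || i >= env.length) throw new RangeError("unbound variable");
  return env[i];
}

let listEnv: Env<string> = null;
let flatEnv: string[] = [];
for (const name of ["x", "y", "z"]) {
  listEnv = extend(listEnv, name);
  flatEnv = extendFlat(flatEnv, name);
}
console.log(lookupList(listEnv, 2), lookupFlat(flatEnv, 2)); // x x
```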