Coding at the Speed of Touch

Phew, I'm done with my paper submission* and would appreciate feedback from anyone interested in this topic. Abstract:

Although programming is one of the most creative things that one can do with a computer, there is currently no way to make programs on an increasingly popular class of tablet computers. Tablets appear unable to support capable (proficient) programming experiences because of their small form factor and touch-centric input. This paper demonstrates how co-design of a programming language and its environment can overcome these challenges to create a language, YinYang, that focuses on do-it-yourself game creation on tablets. YinYang's programming model is based on tile and behavior constructs that simplify program structure for effective display and input on tablets, and also supports the definition and safe reuse of new abstractions to be competitive with capable programming languages. This paper details YinYang's design and evaluates our initial experience through a prototype that runs on current tablet hardware.

* Please excuse the extra icon click; I can't figure out how to link files from SkyDrive directly.

Edit: updated paper and new abstract; this will go back to the reviewers soon.


Requires a Windows Live account

Maybe I'll be less lazy later today if no one posts a clear link.

Just keep clicking.

I didn't need a Windows Live account. Just keep clicking on the icon and you'll get to the paper.

It's the mobile site

Thanks for the tip. The login is apparently only required for mobile devices. I am able to get the paper from my laptop.


Why don't you just upload your draft to arXiv?

Unfortunately, too many file

Unfortunately, too many file share sites are blocked in China; SkyDrive seems to work though, well, kind of.

Impressed but

I need to play with it. On a Surface ;)

I am curious about converting some of this code to Subtext, similar to the Haskell Damage port, so that multiple representations are possible.

Also, Ken Arnold had a paper at SPLASH about searching for Scratch tiles by objective. I mention this because of the conversations we have had in the past about tapping the powers of a Watson-like computer.

As I write this message, I am on an iPad (and hate it ;)). There are no arrow keys for editing text, so I don't bother to do things I consider simple, like providing copious hyperlinks. Even above, the link is plaintext. Forget correcting typos like capitalization. This has forced me to accept that we need a lot more input methods than we have today, including code search.

Search is a tricky problem

Search is a tricky problem and I'm only now getting a handle on it. Sometimes people don't know what they want and are just browsing for something; even a keyword wouldn't work here. Sometimes they know what they want, but what they could use isn't annotated correctly. My hope is that organizing tiles in semantic graphs that are then navigated in menus can help in both cases, but I'll have to scale the system up before I have a good idea of how that will work.

Typing on an iPad is a pain. I wonder if a structured language could ever be used for communication. What if we were limited to iPads, how would we adapt our language (SMS thx k ....) or would we just come up with a smarter input?


The best way to improve bandwidth in a one-to-one communication would probably be to have a camera record oneself speaking (like video-conferencing, but possibly asynchronous). For situations where video isn't suitable (because it gives too much information about the context), or where the receiving end would like faster access to the information, we would use a speech-to-text system to translate what we say into text. Have you thought about voice-controlled systems?

The major barrier to voice innovation... is that the best people use their skills at financial trading firms to predict stock movements. :(

I thought that was the major

I thought that was the major barrier to making any progress on string theory (or the theory of everything, all those physics PhDs working on wall street...).

Conditional accept.

Conditional accept.

I've just posted a new draft

I've just posted a new draft of the paper and modified the post accordingly. This will go out in a few days; the ideas should be much clearer now :)

Accept accept.

Accept accept.

Questions about the paper

What does "Part | Bank | Design Point" mean at the bottom of page 4?

As a readability comment, I found the use of bespoke vocabulary (and the overloading of the meaning of the word "tile") to be off-putting at first. It required a bit of effort to figure out what it was you were really saying. In particular, I'm not certain I understand the semantics of "extending a tile in an act". It uses the vocabulary of a type system but seems to specify a semantics more akin to method invocation.

But very interesting work! Congratulations on getting the paper accepted.

What does "Part | Bank |

What does "Part | Bank | Design Point" mean at the bottom of page 4?

Declares a bank part, and the position of the bank part is specified via direct manipulation in the designer.

As a readability comment, I found the use of bespoke vocabulary (and the overloading of the meaning of the word "tile") to be off-putting at first. It required a bit of effort to figure out what it was you were really saying. In particular, I'm not certain I understand the semantics of "extending a tile in an act". It uses the vocabulary of a type system but seems to specify a semantics more akin to method invocation.
But very interesting work! Congratulations on getting the paper accepted.

Vocabulary is difficult. On the one hand I have this metaphor, a physical tile; on the other hand I lose the distinction between invoke and extend, since they are both handled the same way. I really wanted to use "play" as in "play a tile" in a board game, where play is used both to invoke and to extend.

Readability is the primary criticism from the reviewers and must be fixed as a condition for acceptance. So any advice/ideas on presentation would be very helpful (I'll post the updated paper here regardless).

A detail about "See" and "Seen"

Having a look at your new draft, I've noticed the "See/Seen" feature of your language, which I found interesting. Sorry for the it's-not-the-point aspect; this is probably not what you expect feedback on, and I suppose this was already present in the former draft.

In the first code example, the relation between the "See" action and the "Seen" argument seems nice linguistically, but a bit ad hoc. You explain it later -- page 10 of the draft -- by the fact that "Seen" is a name "exported" by the tile "See".

This is interesting. I've been wondering what makes the pattern matching feature of Hope then MLs and Miranda/Haskell so nice to use. They combine control flow (selecting which branch of code to execute) and binding (adding new names to the environment). The mix of those two orthogonal aspects creates a very natural feature.

In your language, tiles handle control flow (the "success" of one tile decides the execution of the following tiles), effects (tiles can act) and binding (through exports). It's interesting.

The combination of flow+binding feels to me more natural than, say, effects+binding or flow+effects+binding. This is probably because I'm spoiled by pattern matching, but this example goes in my direction: See is a "control flow tile". I'm wondering whether there is some underlying meaningful separation that could be made, or if the generality is actually necessary. Have you come across natural examples that combine both flow and effects in a central way?

Finally, an important difference between ML patterns and the "export" feature of your tiles is that the bound names are apparent in a pattern, while they are implicit (and hard-coded) in your case. This is probably what gave me the feeling of See/Seen being an "ad-hoc" construct. I'm wondering if that could be an obstacle to code understandability: you "have to know" which names are exported by a given tile. Your contextual system helps with that -- exported tiles appear as suggestions in the tile selection menu -- but I'm not sure that's enough. If you weren't under strict space limitations, I would suggest displaying the exported names in the tile, e.g. "See(Seen)", and in the menu selection, the name of the exporting tile, e.g. "Seen(See)". This issue can probably be managed with a solid "social" naming convention, e.g. using the past participle consistently, or with subtler user interface clues, such as an arrow pointing to the "defining tile", or even a color code.

I borrowed this from Kodu.

I borrowed this from Kodu. In Kodu, a robot can "See" something and then access the result through an "It (Seen)" tile. They only support a fixed set of tiles, so this is encoded in the grammar; I had to make this behavior plausible in the type system.

Rotate and Move are examples of tiles that combine control flow and effect. They do something to (possibly) achieve some goal, and when that goal is reached they succeed and pass execution to the next tile on the right. So, for example, Rotate(Seen) will succeed when the object is oriented toward the seen object, and will also act to achieve that success.
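To make that concrete, here is a rough sketch in Python rather than YinYang. Everything in it is a hypothetical stand-in, not the actual implementation: a tile is modeled as a function that returns whether it succeeded, may act on the object, and may export names into an environment, and I assume a row of tiles is re-run on every game tick. Angle wrap-around is ignored to keep it short.

```python
import math
from dataclasses import dataclass


@dataclass
class Body:
    x: float
    y: float
    heading: float = 0.0  # radians


def see(target):
    """Hypothetical See tile: always succeeds and exports the name "Seen"."""
    def tile(actor, env):
        env["Seen"] = target  # binding
        return True           # control flow: let the next tile run
    return tile


def rotate(name, step=0.5, tolerance=1e-6):
    """Hypothetical Rotate tile: acts until the actor faces env[name]."""
    def tile(actor, env):
        seen = env[name]
        wanted = math.atan2(seen.y - actor.y, seen.x - actor.x)
        error = wanted - actor.heading  # assumption: no wrap-around handling
        if abs(error) <= tolerance:
            return True                 # goal reached: succeed
        # Effect: turn toward the goal by at most `step` radians this tick.
        actor.heading += max(-step, min(step, error))
        return False                    # not there yet: tiles to the right wait
    return tile


def run_row(tiles, actor):
    """Run one row left to right; stop at the first tile that does not succeed."""
    env = {}
    for tile in tiles:
        if not tile(actor, env):
            return False
    return True
```

Calling `run_row([see(monster), rotate("Seen")], robot)` once per tick turns the robot a little each time and only reports success once it faces the monster, which is the "acts to achieve its own success" behavior described above.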

I don't yet have a good example for effect/control flow/binding. I mean, I have examples, but they are too convoluted to feel natural.

Seen will appear as a tile option under See in the menu; it's just that this layer is optimized away because Seen is the only option under See. If See had two options, then it would appear as its own category. I have thought a lot about composite names: enhancing the name of a tile when there isn't enough context around for the user to guess what it does. So if Seen appears without See, then it should indicate its relationship to See, perhaps with a prefix. A similar problem occurs with "Explode" and "Monster": the Explode tile looks weird and general without the Monster tile it extends.

Freedom and fuzziness

While sorting some papers I've collected, I quickly re-read "Evaluating a new programming language" by Steven Clarke, which has been discussed on LtU.

In it I found a passing remark about IntelliSense that is very much related to your "Freedom and fuzziness" section; I thought you might be interested in the quote.

For example while the Intellisense feature (described earlier) helps increase visibility by showing all of the member functions and methods that an object or class supports, one participant reported in the questionnaire how it forced him into a particular order of working. In order for Intellisense to work, classes and objects must already be defined. There is no way Intellisense can show information for an object that does not exist. The participant wanted to start working with the top-level class in the specification. However, this class used all of the lower level classes. Without a definition of these classes, Intellisense could not be invoked to provide useful information. Thus the participant was forced to define the lower level classes before he could work on the top-level class.

(For the curious reader, here is the beginning of the related section in Sean's paper; it is followed by two explanations of how to alleviate the problem)

Unfortunately, programmers often dislike graphical languages because they constrain the order in which they can make edits, which disrupts their focus. For example, a symbol must typically be defined before use even if the programmer would rather use the symbol first and define it later to preserve their focus on the current code. In contrast, free-form editing allows programmers to write incorrect code that they can fix later at their convenience.

Mind rot

See also Does Visual Studio Rot the Mind. I've always thought that this problem could be fixed by smarter IDEs. If the developer uses a symbol that the IDE doesn't recognize, the IDE could just insert it in the symbol table somewhere and begin to figure out how it is defined from the way the developer uses it. This would require IDEs to become smarter (use more machine learning), but it is completely possible.

My current research is

My current research is hitting this topic more: the problem with structured editing is that it (a) forces you to work in a specific order and (b) doesn't present you with choices that are contextually related but not exactly valid for the current edit site. Ah!

It's almost like this is a degenerate problem of static typing though; you have to do the proof in the right order in your head and in your code. Possibly, this is why some people prefer dynamic languages that are less constraining. Even using a language like Scala, which has great support for type inference, I seem to be stuck in a "declare type, use type" mode of work. Now I wonder: does static typing have a significant cost with respect to freedom of programmer thought processes?

IDE support

Hi Sean,

I appreciate your research. Isn't the static typing "cost" really a problem with the contextual inference of the IDE?

For example, because of the test-driven development trend, VS20xx started interrogating the user ("Do you want to generate this method/class?").

It seems to me that the future is really MDA development along with very smart IDEs that have (not built-in) rulesets about what you want to do.

Isn't the holy grail for domain experts to program, guided by question/answer rulesets built from the knowledge of technical experts?

Flexibility of the language

Flexibility of the language has a heavy influence on tooling. The fact is that language shapes thought, and the IDE just augments that thought. So if the IDE asks the user "do you want to generate a class/method?" there must be a reason for that, a context to trigger the action, a permanent change that will be made. If the wrong change is made, the tooling becomes useless and the programmer must fix things the old-fashioned way.

MDA is beyond me as I'm not a software engineering researcher. I tend to think at the level of programming while MDA entails a whole development methodology. There is nothing wrong with that, but many people just want to program. Do tests help a program that doesn't need to be tested (yet, at this time, or ever if it's just exploration)? Given the language of the acronym-heavy SE world, it is very hard to follow or evaluate what they are doing.

Hugo Liu's Metafor used NLP to interactively ask questions about the program being written and to ask for clarifications. The end result is just a skeleton at this point, but the approach shows some promise. But if you were going to go this route, you might want to come up with a language that was suitable for it, with a semantics that was robust with respect to ambiguity and underspecification.

Flexibility of structured editing

Doesn't that depend mostly on how the structured editing is implemented?

I assume you mean that you have to define a function before you use it. You can easily imagine a structured editor where this is not necessary, and even a structured editor that will automatically define stubs for the undefined functions you're using. That said, I almost always program in a bottom up manner and define variables and functions before using them. Even if a structured editor didn't provide an automatic stubs feature, it would not be a problem for me to manually define stubs in a few cases. Other people's way of working might be different.
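As a minimal sketch of the automatic-stubs idea (in Python, purely for concreteness; this is not how any particular structured editor works), one can scan a fragment for calls to names that are not defined anywhere and emit a placeholder definition for each. The helper names `undefined_calls` and `make_stubs` are made up for this example, and the "is it defined?" check is deliberately crude: it ignores scoping and imports.

```python
import ast
import builtins


def undefined_calls(source):
    """Map each called-but-undefined function name to the argument count of its first call."""
    tree = ast.parse(source)
    defined = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)      # assigned variables
        elif isinstance(node, ast.arg):
            defined.add(node.arg)     # function parameters
    missing = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            name = node.func.id
            if name not in defined:
                missing.setdefault(name, len(node.args))
    return missing


def make_stubs(source):
    """Return source text defining a raising stub for every undefined call."""
    stubs = []
    for name, arity in undefined_calls(source).items():
        params = ", ".join(f"arg{i}" for i in range(arity))
        stubs.append(
            f"def {name}({params}):\n"
            f"    raise NotImplementedError('{name} is a stub')\n"
        )
    return "\n".join(stubs)
```

With this, a top-level function can be written first in terms of helpers that do not exist yet; the generated stubs keep the program loadable, and each stub fails loudly if it is actually reached.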

Structured editors do have other flexibility issues, for example if you want to do edits that do not preserve code structure:

    if(...){ A }else{ B }   =>   A B

In textual editors you can just delete the if(...){ }else{ } text, but in a structured editor this might be more difficult. I think that this, too, can be solved in a structured editor. You have to provide a more flexible way to rearrange sub-expressions. For example, you could imagine dragging A outside the if statement with the mouse, or with cut-and-paste:


You can do the same thing for B, and then delete the whole if statement. In general it would also be helpful to have a scratch space to temporarily store sub-expressions, whether by dragging expressions to there or in the form of a flexible keyboard controlled clipboard (or a combination of both).
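The sequence above can be sketched on a toy tree as follows. This is only an illustration under my own assumptions: `Stmt`, `If`, and the three edit operations are hypothetical names, and the scratch space is just a list used as a stack. Every operation leaves the tree structurally valid, which is the property a structured editor wants to keep.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    text: str


@dataclass
class If:
    cond: str
    then: list = field(default_factory=list)
    orelse: list = field(default_factory=list)


def position(block, node):
    """Find a node by identity, so equal-looking nodes are not confused."""
    for i, candidate in enumerate(block):
        if candidate is node:
            return i
    raise ValueError("node is not in this block")


def cut(block, node, scratch):
    """Move a node out of a block and onto the scratch space."""
    scratch.append(block.pop(position(block, node)))


def paste_before(block, anchor, scratch):
    """Move the most recently cut node back into a block, just before `anchor`."""
    block.insert(position(block, anchor), scratch.pop())


def delete_empty_if(block, node):
    """Delete an if statement, but only once both branches have been emptied."""
    if node.then or node.orelse:
        raise ValueError("branches still contain code")
    del block[position(block, node)]
```

Cutting A out of the then-branch and pasting it before the if, doing the same for B, and then calling `delete_empty_if` leaves a block containing just A followed by B.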

Contextual Clipboard!

That's an intriguing idea, Jules. We could have a 'clipboard' that essentially serves typed/structured, contextual source fragments and other objects, rather than raw text. The board could have multiple such objects, and just paste those relevant in context (perhaps asking for disambiguation).

That could probably be applied to UIs more generally. I am reminded of a discussion on c2 regarding hand vs. pointer - i.e. a hand carries a set of things (concepts, objects, tools) that might be relevant in a given scenario or might be manipulated directly (e.g. pick up block of code 'A' then decide to refactor it 'in hand' before putting the pieces back).

It's not a new idea,

It's not a new idea: existing operating systems already support arbitrary data in the clipboard, as well as dragging and dropping arbitrary data between applications.

With a contextual clipboard that uses types or something like that to decide what to paste, you have to be careful. The problem that can arise is that it becomes difficult for the user to predict what will happen when he does a paste. You want it to be effortless to predict what exactly will be pasted, and doing contextual type inference in your head takes effort. In most cases the difficulty of predicting what will happen will outweigh the cost of cycling through the clipboard in an easily predictable way. Additionally, there will be cases where it's the user's intention to paste code in a place where it doesn't type check, to be corrected later.

As for a scratch area to drag things to: usually there is a big margin to the right of the code available on the screen. Perhaps we can let users treat the space not as a linear one-dimensional space running from top to bottom and left to right, but as a 2D space in which they can place sub-expressions anywhere. That way you don't need a designated scratch space.

Contextual Clipboard

I'm familiar with existing clipboards, but I don't believe they qualify as the same idea. Supporting arbitrary data seems a very different concept than supporting contextual data.

I agree that it is difficult for humans to 'infer' context, but I don't believe this will prove a problem. A casual answer: if the best option is not the last item copied, display a 'ghost' (transparent, small view) of the copied data for an extra confirmation. We could also cycle through ghosts.

Alternatively, always present ghosts. This could be a user option. Cycling through choices is a fine alternative to brainpower.

But allowing badly typed code into a program might not be a good option. It would defeat the purpose of a structured editor, for example. Putting it into a scratch-space, perhaps using some sort of structured spatial relationship to existing code (along with comment boxes and such) does seem a reasonable alternative.

But allowing badly typed

But allowing badly typed code into a program might not be a good option. It would defeat the purpose of a structured editor, for example. Putting it into a scratch-space, perhaps using some sort of structured spatial relationship to existing code (along with comment boxes and such) does seem a reasonable alternative.

Type inference can help figure out what types are needed to make code well typed. Perhaps we just need more aggressive forms of inference, based on context, to make structured editors more usable.

Type inference

I agree. Actually, I was sort of assuming that to be the case.

On allowing badly typed code

On allowing badly typed code in a structured editor: I think it's very helpful during programming. Often code has to become temporarily ill typed during an edit, or at least it requires jumping through hoops to keep it well typed all the time. The same goes for making code temporarily ill-scoped (e.g. pasting a piece of code that doesn't have the required lexical environment).

For example, suppose you want to change the type of a function slightly because you want to return a Set instead of a List. Now you have a chicken-and-egg problem: you can edit the body first to make the function use sets, or you can edit the type signature first, but both lead to temporarily ill-typed code.

Of course it would be very helpful to indicate pieces of code that are ill typed, just like current IDEs do. Then, as soon as the programmer pastes something, he sees that the result is ill typed, and he can decide to cycle further or to use the ill-typed one anyway, as appropriate. When a programmer pastes something his intention is not "paste anything that I cut here"; he has a very specific piece of code in mind that he intends to paste.

Even the smartest programmer has very limited brainpower, so I don't think you can say that cycling through choices is for dumb programmers ;) For any programmer we want to maximize the amount of brainpower that he can spend on the task at hand, and to minimize the amount of brainpower that he needs to spend to accommodate the editor.

If you allow temporarily ill-typed code I think the best option is to paste in a more predictable and familiar way (e.g. most recently cut first). If your editor does not allow ill-typed code I agree that the best solution is to only paste well-typed fragments, but it would still be helpful to display an indication "the most recently cut piece is not well typed here" to prevent frustration with finding out why it's not pasting the right thing.
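Here is a small sketch of those two paste policies, assuming a toy setting where each fragment carries a single type label and a paste site expects exactly one type. `Fragment`, `Clipboard`, and the wording of the warning are all invented for illustration; a real editor would ask its type checker instead of comparing strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    code: str
    type: str  # hypothetical type label, e.g. "Int" or "List"


class Clipboard:
    def __init__(self):
        self._items = []  # oldest first, most recently cut last

    def cut(self, fragment):
        self._items.append(fragment)

    def ghosts(self, hole_type):
        """All fragments, most recent first, each flagged with whether it fits the hole."""
        return [(f, f.type == hole_type) for f in reversed(self._items)]

    def paste(self, hole_type, allow_ill_typed):
        """Return (fragment, warning). Either element may be None."""
        ghosts = self.ghosts(hole_type)
        if not ghosts:
            return None, None
        newest, newest_fits = ghosts[0]
        if allow_ill_typed or newest_fits:
            return newest, None  # predictable: most recently cut first
        for fragment, fits in ghosts:
            if fits:
                return fragment, "the most recently cut piece is not well typed here"
        return None, "nothing on the clipboard is well typed here"
```

The `ghosts` list doubles as the cycling order: an editor could show each entry in turn and mark the ones whose flag is false.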

In any case, a clipboard is not something to store hundreds or even tens of items in. If you want to do a big code restructuring, the 2D space to put fragments of code in is a better option. You can make use of spatial memory to remember what went where, and you can always see the pieces of code right on your screen (though you could of course also display the clipboard).

I'm familiar with existing clipboards, but I don't believe they qualify as the same idea. Supporting arbitrary data seems a very different concept than supporting contextual data.

Yes, your idea of a contextual clipboard is new, but my "idea" of a clipboard holding structured data is not. I just made that comment to prevent any wrong impression that I think the idea is new :)


I think the ability to temporarily make a mess of things is important.

This seems closely related

This seems closely related to the fact that, although conceptually one can support cyclic data structures in a PL without mutation, in practice cyclic data structures are apt to be constructed by mutation (either explicitly by the programmer, or implicitly by the underlying implementation).

Good analogy

That is a good observation. I think it applies more generally than cyclic data structures. When you have an existing data structure, you can do two things to change it:

1. You do the functional way of building a new data structure that copies the spine but shares parts with the old one.

2. You just mutate the part you want to change.
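A minimal sketch of the two options on a binary search tree, just to make the contrast concrete (the `Node` type and function names are made up for this example):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert_persistent(node, value):
    """Option 1: return a new tree that copies the search path and shares the rest."""
    if node is None:
        return Node(value)
    if value < node.value:
        return Node(node.value, insert_persistent(node.left, value), node.right)
    return Node(node.value, node.left, insert_persistent(node.right, value))


def insert_in_place(node, value):
    """Option 2: mutate the existing tree and return its root."""
    if node is None:
        return Node(value)
    current = node
    while True:
        if value < current.value:
            if current.left is None:
                current.left = Node(value)
                return node
            current = current.left
        else:
            if current.right is None:
                current.right = Node(value)
                return node
            current = current.right
```

Note that the two do not mix freely: after a persistent insert, the old and new trees share subtrees, so a later in-place insert into a shared subtree is visible through both roots.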

The latter sometimes has to temporarily break invariants. To do that reliably you either need to be single threaded (a single programmer editing the program), or you need to have some kind of concurrency control: a distributed version control system. As that is most often implemented, it basically amounts to multi-version concurrency control. Multiple programmers work on their own copy or snapshot, and when they are done with their "transaction" they commit the results wholesale. The difference is that these transactions are not retried on conflict; instead they are merged in some way with a merge tool. Locking is also sometimes used in version control, so that one programmer has exclusive access to a file, and others can't mutate it meanwhile. Interestingly, as in programming language land, this is falling out of favor in version control land: distributed version control systems are replacing centralized version control systems. Another concept of version control is push/pull requests that put a subset of the state one party has into the state that another party has.

Can this analogy be used fruitfully? For example what abstractions can be provided for conflicting transactions merging their results instead of retrying one of the transactions? Do push/pull in version control correspond to something useful in programming language land? Can we provide more interesting topologies for transactions than flat or nested? In version control the dependency graph doesn't have to be a tree. If A, B and C are repositories, then changes made in A, B or C can be committed in A, B or C as chosen by the programmers. In transactions the dependency is always a tree: the master program A creates transactions B and C, and B and C can create child transactions B' and C' of their own, but in the end B' will be committed into B and B will be committed into A. One programmer can work on multiple repositories in parallel. Is it useful to allow a process/thread to work in multiple transactions in parallel? That is not something Clojure or Haskell STM provides AFAIK. Is keeping a history instead of only the most recent value useful?

code - data

repository - snapshot

programmer - process/thread

task/bug fix - transaction

change set - write set

?? - read set

commit - commit

locking a file - locking an object

history - ??

push/pull - ??

unit test - validation check

distributed version control - multi version concurrency control

Mess and Merge

The idea of 'merging' transactions after conflict has been done before in various database systems. The trouble is that merges tend to be very 'domain specific' and benefit from having specialized update operators.

This is not easy to achieve in a generic manner. For example, since text is fairly opaque to a merge tool, conflicts are often sent to an expert system (a human) for help. This is acceptable for code mostly because said human was also responsible for causing the updates.

One programmer can work on multiple repositories in parallel. Is it useful to allow a process/thread to work in multiple transactions in parallel?

A transaction doesn't really correspond to a repository. If there are cyclic relationships between repositories, for example, then we'd probably want a distributed (multi-repo) transaction to avoid a period of inconsistency.

Repositories are probably better as administrative/security units.

temporarily ill typed

I agree with most of what you say, though I think the apparent need for 'temporarily ill typed' code depends very heavily on the type system, the module system, and the editing mechanism.

For live programming, it might be interesting to have an IDE that allows you to build a distributed patch, then 'commit' it, atomically, when it is well typed and passes all tests. Until commit, the old code keeps running.
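A single-process sketch of that commit discipline, with many assumptions: `LiveProgram` is a made-up name, definitions are plain Python callables, and "well typed and passes all tests" is represented by a list of check functions supplied by the caller. A genuinely distributed patch would need much more than this.

```python
class LiveProgram:
    def __init__(self, definitions):
        self._live = dict(definitions)  # what is currently running
        self._pending = {}              # the patch being built

    def stage(self, name, fn):
        """Add or replace a definition in the patch without affecting running code."""
        self._pending[name] = fn

    def call(self, name, *args):
        """Always runs the committed version."""
        return self._live[name](*args)

    def commit(self, checks):
        """Apply the whole patch at once if every check accepts the candidate program."""
        candidate = {**self._live, **self._pending}
        if all(check(candidate) for check in checks):
            self._live = candidate  # one reference swap: all or nothing
            self._pending = {}
            return True
        return False                # old code keeps running; patch stays staged
```

A failed commit leaves both the running definitions and the staged patch untouched, so the developer can keep editing and try again.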

I did not mean to suggest that cycling is for 'dumb' people; I actually meant it as you say: developers are free to use their brainpower elsewhere. And I agree that the indicator is a good idea. (You could even just cycle through the ghosts and have a red 'X' in the lower corner of each fragment that is ill typed.)

Wrong place.

Wrong place.

Zany Doodle

I recently ran into Zany Doodle while perusing the Haskell libraries for 2D graphics (Haskell -> Hipmunk -> Chipmunk -> Zany Doodle).

Zany Doodle is a puzzle game, but involves 'programming' the environment by drawing it - with virtual crayons (different colors represent different materials), and a basic physics model. It seems to me this would be an interesting model for programming with touch, whether one uses fingers or a stylus.

Several videos are on YouTube. But here's one example to get started.

Chorded Keyboard

Douglas Engelbart's chorded keyboard via multi-touch might be an interesting possibility for the cases where developers need text.

touchscheme / lisping

This Lisp/Scheme editor/environment for the iPad popped onto my radar yesterday. It might be worth a look, although it's still quite keyboard intensive.

3D Touch

Some recent interaction work by Microsoft using see-through displays and Kinect.

This looks like it could contribute a lot to programming, esp. live programming.

Believe me, we have been

Believe me, we have been through brainstorms on how to enable programming via Kinect. It doesn't seem feasible right now for two reasons: the accuracy of these camera-based systems just isn't very good, and the programmer would get tired (gorilla arms) very quickly from waving their arms about without anything to support them (being in a zero-G environment would eliminate that problem; maybe programming for the ISS?).


Microsoft has released videos of some of these brainstorm sessions, too. All the ones I have seen are incredibly lame and offer zero material benefit other than maybe getting people out from behind a desk. For example, SQL Kinection for DBA'ing SQL Server.

As for 3D touch, I believe that for many kinds of programming it is not the number of dimensions that matters but rather the responsiveness of the input. See the Applied Sciences Group at Microsoft and their work on high performance touch. The goal at Microsoft is 1ms latency, two orders of magnitude improvement over today's devices. Perhaps then people will think Coding at the Speed of Touch is cool.

Weird, assuming a frame rate

Weird. Assuming a frame rate of 60 fps, it seems to me that they only need to get to about 15 ms or so response time. Also, consider high resolution retina displays like the one on the new iPad: computers are just becoming more real, with fewer digital artifacts. This only affects PL indirectly though, I think.
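For reference, the frame-period arithmetic behind that estimate, assuming a fixed 60 Hz refresh and ignoring every other source of delay in the pipeline:

```latex
\[
T_{\mathrm{frame}} = \frac{1000\ \mathrm{ms}}{60} \approx 16.7\ \mathrm{ms},
\qquad
\frac{T_{\mathrm{frame}}}{1\ \mathrm{ms}} \approx 17
\]
```

So a 1 ms target is roughly 17 times shorter than one frame at that refresh rate.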

I don't think any of these Kinect projects are very serious; it's probably just fun right now. The next big domain for programming will definitely be speech-based dialogue systems. After that, I'd think tactile/haptic interfaces would start playing a role. The main problem with Kinect is that you have nothing to touch...

Speech based

I've been brainstorming a bit about speech-based systems for the Golden-i. But I think they would be augmented quite a lot by having gestures.

Well, speech can be sign

Well, speech can be sign language. I can imagine if we can crack the dialogue problem, then we'd have to support some other form of "speech" to be cubicle-farm compliant.

Interactive Languages

I think to "crack the dialogue problem" will require pursuing the more interactive models, i.e. so the language IDE can get clarification from the user. And effective use of context will be essential, too.

I have some ideas there, but I'd like to separate the programming into layers: a user, the user-agent, and the program. The program can be persisted or shared between multiple user-agents. The user-agent is specific to each user and understands context, learning the user's "accent" and small clues. The user communicates with the user-agent.

One idea I have is to pursue programs as an extensible stable model of constraints, such that state is more an incidental property rather than an essential one. It's easy to apply constraints, name them, coordinate them relative to observed conditions, maintain and release them. Composing locally stable elements would scale very well, since changes would tend to propagate the absolute minimum necessary. A user-agent would then maintain constraints on behalf of the user.