Disruptive PLT Revisited

In 2002, Todd Proebsting gave a talk about disruptive programming language technologies. We discussed it 6 years ago, and even earlier in 2002, but it seems like the right time to talk about this again. What has been disruptive in the past 6 years, and what will be disruptive in the next 10? We can talk either about the technologies or the results themselves. I'll seed this topic with some initial posts.


lmeyerov (reposted from another thread)

However, it is not interesting in the sense of disruptive and orders-of-magnitude. What was?

** MapReduce: simple high-level structuring primitives are enough for a good chunk of big data

** Ruby on Rails etc: database integration and structuring for MVC

** JavaScript/web: streamlining composition via an exposed DOM (mashups, extensions, search engines). Furthermore, lowering the bar for programming through JSVM optimizations, heavy tolerance to programming error, and having a smaller language than its predecessors.

** Scratch: visual computing simplified enough that kids can quickly pick it up yet still do powerful things + streamlined sharing

** DSLs: Max/MSP is currently my favorite :)

** Unit testing: significant enough that we can compare it in cost/benefit to type-rich programming
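To make the MapReduce point concrete, here is a minimal single-machine sketch of the model in Python. The real systems distribute the map, shuffle, and reduce phases across a cluster; the function names here are invented for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_reduce(inputs, mapper, reducer):
    # Map: emit (key, value) pairs for every input record.
    pairs = chain.from_iterable(mapper(x) for x in inputs)
    # Shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: fold each group down to one result per key.
    return {key: reducer(key, values) for key, values in groups.items()}

# The classic word count.
docs = ["to be or not to be", "to code is to be"]
counts = map_reduce(
    docs,
    mapper=lambda doc: [(word, 1) for word in doc.split()],
    reducer=lambda word, ones: sum(ones),
)
```

The disruptive part is that the programmer only writes the two lambdas; partitioning, fault tolerance, and parallelism live behind the `map_reduce` interface.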

borderline

It maybe comes on the border of the 6 year mark, but the only thing I've really seen as drastic in the recent past is the proliferation of many solid (and mainly declarative) JavaScript libraries like jQuery.

Actually, JS libraries are a

Actually, JS libraries are a good point and are at least as disruptive as Ruby on Rails. That JS is a great language for library implementors and a so-so language for application developers is very significant: it turns out making the ecosystem easier to build benefits users.

What should disrupt?

I think it is interesting to discuss how PL technologies could disrupt industry as well as which ones have or could disrupt. Here is my biased list of possible future disruptive results and technologies:

• Another order of magnitude increase in who can and does program. Many white collar jobs are transformed into 50% programming jobs that automate society and allow for drastically increased productivity to the effect of a 40% increase in world-wide GDP (all numbers made up).
• Live programming and time-twisting debugging (see Bret Victor's recent talk) eliminate the gulf between code and execution, allowing for drastically increased programmer productivity.
• Programming is possible and pleasant on any device you use in your day-to-day life, meaning phones, tablets, smart rooms, etc...keyboards are no longer necessary (though still useful), you can program with touch, voice, ...
• Dialogue systems become advanced enough that we can have Siri-like two-way conversations with computers to build and modify programs (see Metafor for some initial work in that direction).
• Vast repositories of code + intelligent search tools allow programmers to reuse existing code very easily. Programming becomes much less about expressing novel computations and more about simply finding code to reuse and gluing it into your application. As a result, wheels are not re-invented and everyone is more productive.
• Advanced machine learning can process vast repositories of existing code to write new code for you.
• The end of the dynamic/static typing debate: someone designs a type system that is as flexible as dynamic and as safe and informative as static.

A key feature of a

A key feature of a disruptive technology is that it enables something that is prohibitively hard today. If PLs are a hammer, disruptive technologies are about targeting new nails. So, as you put it, dialogue systems may enable programming-on-the-go.

However, what could static/hybrid typing enable? Verifying languages, operating systems, and software is underwhelming; we build them anyway. Harking back to John Hughes and Why FP Matters [#modularity]: perhaps software magnitudes safer and bigger than what there is today. This is still too vague, so can we get a concrete and compelling nail to hit? If we really believe that "type-rich" languages give an expressive advantage over unityped ones, what can we now build with that expressiveness? One thing may be languages for building languages: Backus's Turing Award lecture was on one such language to rule them all.

For anything else that is proposed, I suggest a similar validation.

It's about usability: the dynamic typing people are unhappy with static typing because of its inflexibility and because it enforces a rigid top-down structure. The static typing people are unhappy with dynamic typing because they don't get any semantic feedback. The nail to hit is to make both camps happy, which I think would mean that our programming experience is vastly better.

Today we have to make painful language choices...and I don't believe hybrid typing is really the answer, where you can choose dynamic or static for a variable, choosing your tradeoff but still making it.

So what is a milestone in

So what is a milestone in semantic feedback? I'm looking for "if I get X I can now do Y" where the linkage between X and Y is fairly obvious.

For example, "types may be the killer technique for reasoning about big, gnarly coordination problems so now I can get a crowd to control the same robot."

I'm a bit biased here with

I'm a bit biased here with my own agenda. I really think of types as being essential to good programmer GPS...i.e., code completion to help you find APIs during programming. I'm working on an Onward paper on this right now, and will have something more substantial by the deadline (April 13th).

Killer features of type

Killer features of type system:

• Code completion
• Catch stupid mistakes as you type
• Writing/reading the types helps you to understand the program
• Better error messages. You get a red underline on a function call with "function foo expects an int, got a string" when you hover your mouse over it, rather than a random function that foo calls internally producing a stack trace -- or worse if callbacks are involved: you might get an error out of a completely unrelated part of your code. This makes me think: for a given stack trace you could also put a red underline on all calls that appear in the stack trace.

All of these improve programmer productivity. It's not that you couldn't do a specific thing X, it's that you can do X in n/3 hours instead of n hours. Of course, dynamic typing wins on other points. Hence the value in combining the advantages of the two.

I can't say much here to

I can't say much here to avoid a friend getting scooped, but I've played with both typed and dynamic/synthesis code completion tools, and I'm very unexcited by the typed ones right now :)

Prospector is still one of my favorite papers here, but the typed view is archaic and misses a lot of the important data in practice: it should include sloppy English (insert vs add vs put vs ...) and concrete data (imagine trying to generate some jQuery selector where you have an instance of the document for which you can show input/output pairs).

At a higher level, I don't see the X->Y relationship here, and believe there's a better X'. Maybe there's something where Y -> X -- if we can do Y, that means we (probably) built X to do it? Furthermore, the code completion might be X -- mechanism is unimportant -- with Y being non-expert programming.

Disruptive things that have

Disruptive things that have happened:

• Rails disrupted the web development landscape
• Javascript VMs improved an order of magnitude
• (Re)discovery of the usefulness of XMLHttpRequest (AJAX) enabled web applications
• The combination of the previous two points leading to widely adopted "standard libraries" for Javascript (like jQuery)
• C# async support brings easy asynchronous programming to the masses

What will be disruptive:

• DSL for specifying interactive UIs. XAML is on the right track, but much more can be done.
• Programmers will stop editing character arrays, realizing that character arrays are not a convenient representation of code (neither for computers nor for humans). Fueled by touchscreens.
• Better UIs for programmers in general: live feedback on code changes, unification of undo with version control, debugging with a recorded trace with customizable visualization of data structures over time, etc.
• Automatic client/server partitioning for web applications
• Predicate dispatch will finally replace single dispatch in at least one new dynamically typed language
• With the advent of key/value stores, the next logical step is transparent persistence of native objects to a key/value store
• A practical dynamic/static typing hybrid will be devised based on abstract interpretation of first class contracts
• A practical whole program optimizing compiler will be implemented for a well behaved [1] but dynamically typed language (like Stalin, but able to compile large programs in reasonable time).

[1] Not like Javascript or Ruby, where higher order behavior (in the form of objects) combined with mutation is rampant. More like Scheme, without first class continuations.

A refinement to the

A refinement to the transparent persistence problem: simple distribution of shared data. E.g., sharing with both oneself (across devices) and others. Get that + UI programming, and you have a good chunk of the mobile app space covered :)

Yup, that's what I meant

Yup, that's what I meant with automatic client/server partitioning. I first thought of web applications, but of course the same goes for mobile :) Can be generalized to peer-to-peer sharing, but in practice it's going via a centralized server anyways, so we might as well try to tackle that easier problem first. There are issues that need to be addressed: security, latency reduction, data model upgrades, etc.

Cloud services like EC2,

Cloud services like EC2, Heroku, and Azure make it fairly easy to assume that the future involves some scalable centralization; P2P doesn't seem that important anymore. Perhaps what we need is "transparent cloud integration." I've tried out a few ideas that focus on making distribution transparent, but none of them have really been practical (mainly b/c these abstractions have leaked like crazy).

Actually, it's even more

Actually, it's even more rosy. The speed of modern hardware is so great that the vast, vast majority of web sites or mobile apps never need to scale beyond a single server. For example, suppose your web site has 10 million pageviews per day. That is around 100 requests per second. Any cheap hardware can handle this with ease if you cache your data in memory and don't use a language that's 200x slower than C.

P2P doesn't seem that

P2P doesn't seem that important anymore.

There's P2P in implementation and in functionality. P2P implementations are generally an unnecessary complication in the internet age (though obviously not always). However, based on watching senior ugrads at Berkeley build collaborative/interactive mobile apps where data is shared, especially in real time, the P2P functionality scenario is far from solved. Basically, we can barely handle manual merges with stuff like Git, and require locks and social convention in stuff like Google docs: this was hard even before you throw in the other 'expected' features (security, versioning, notifications).

There are some cool frameworks working on this; it's a real problem (... and opportunity).

Finally, I'm actually not worried about the performance side; that seems solvable once you have a usable model.

For connected devices, I

For connected devices, I think this is not a problem. What worked for me is this very simple model:

• We have the "true" state stored on the server
• Users edit the state in their clients
• On every edit, the edit gets sent to the server
• The server serves as a synchronization point that gives the updates of all clients a total order
• The server distributes this totally ordered sequence of updates to all clients
• The clients synchronize their state accordingly

Of course, this only works perfectly when everyone is connected, but it still works great even for off-line edits. For general off-line you will always have cases where you need a human in the loop.
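The model above can be sketched in a few lines. This is an in-process Python stand-in for real networked clients, with all names invented for illustration: the server's arrival order plays the role of the total order.

```python
class SyncServer:
    """Holds the "true" state and gives every edit a total order."""
    def __init__(self):
        self.log = []        # the totally ordered sequence of edits
        self.clients = []

    def connect(self, client):
        self.clients.append(client)
        client.catch_up(self.log)   # bring a late joiner up to date

    def submit(self, edit):
        # Arrival order at the server *is* the total order.
        self.log.append(edit)
        for client in self.clients:
            client.apply(edit)

class Client:
    def __init__(self):
        self.state = {}

    def catch_up(self, log):
        for edit in log:
            self.apply(edit)

    def apply(self, edit):
        key, value = edit
        self.state[key] = value

server = SyncServer()
alice, bob = Client(), Client()
server.connect(alice)
server.connect(bob)

# Both users edit; the server's ordering decides who "wins".
server.submit(("title", "draft by alice"))
server.submit(("title", "draft by bob"))
```

Because every client applies the same log in the same order, all connected replicas converge; off-line clients replay the log on reconnect.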

My point is that, when it's

My point is that, when it's as easy to add a collaborative form widget as it is to add a normal one, I think we'll see a lot more collaborative widgets :)

P2P doesn't seem that important anymore?

On the contrary, I expect a resurgence of P2P now that browsers are getting P2P-enabled.

I would even go so far as to speculate that the browser will become one of the most important server platforms. Why? Once browsers are P2P-enabled, running a "server" will be as easy as visiting a URL, and keeping the tab open. That's disruptive!

From a PLT perspective, P2P means a much more leisurely approach to PLs. Given that you're running on your users' hardware, you can dedicate many more cycles to every one of them.

[It's easy to see that for many - not all, of course - application providers, going P2P will be more economical than running a data center. I expect that free software especially will leverage browser-to-browser communications.]

I would love to agree with

I would love to agree with you, I think P2P is a great idea also especially living in a country where the government wants to control all information (and blocks the likes of Heroku).

But I'm seeing a trend now away from P2P, basically since the most interesting devices are mobile and cloud is becoming fairly cheap.

Disruptive things that have

Disruptive things that have happened

I'm still surprised that no one's mentioned how monads have gone mainstream via the LINQ API. .NET now has monadic queries for lists (IEnumerable), asynchronous programming (IAsyncResult), mobile code (IQueryable), and reactive programs (IObservable), and that's just the MS-provided libraries.
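The connection is easy to see in miniature: LINQ's SelectMany is the sequence monad's bind. A hypothetical Python analogue (not the .NET API; names invented for illustration) might look like:

```python
def select_many(xs, f):
    # SelectMany / monadic bind for sequences:
    # map f over xs, then flatten one level.
    return [y for x in xs for y in f(x)]

# Composing "queries": pair each user with each of their orders,
# the shape a C# "from u in users from o in u.Orders ..." query desugars to.
users = [
    {"name": "ann", "orders": [1, 2]},
    {"name": "bo",  "orders": [3]},
]
pairs = select_many(users, lambda u: [(u["name"], o) for o in u["orders"]])
```

Swap the list for a sequence of events or an asynchronous result and the same bind shape gives you Rx-style and async-style composition.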

unification of undo with version control

Funny, that's exactly something I've been playing with. It's a natural data model for web programs, which must support an undo of sorts for the back button.

Is LINQ really about monads or is it just inspired by monads and composition in general? I think the continuation trick is nice, but it's an implementation detail; the point about monads, I thought, was that you were using them to manage your effects explicitly.

I considered including

I considered including Monads/LINQ, but I did not because LINQ is both less and more than monads. First, LINQ only has the bind monad operation and it does not have the unit operation (you are of course free to define one, it's just not part of the syntactic sugar nor part of the documented interface). Second, the bind operation is just one of its many operators. Still, LINQ is worth including in its own right even if it is not exactly monads.

Do you believe LINQ will

Do you believe LINQ will lead to more reactive etc. applications? What is the "Y" that gets enabled that is currently too hard and is clearly enabled by LINQ? This still seems vague..

LINQ has enabled order of

LINQ has enabled order of magnitude simplifications in all the domains I listed above. There is no intrinsic application that couldn't be done without it, but it certainly would have been significantly more difficult and error-prone. As for whether the reactive LINQ extensions will lead to more reactive applications, I can only assume so. Most of the demos on Rx.NET involve reactive UIs.

This really isn't compelling

This really isn't compelling as currently phrased -- everyone is *already* building UIs for applications. From this perspective, I'm tempted to put it into the category of incremental improvements.

LINQ may be the approach to achieving bigger disruptions. For example, data-centric programming where data is automatically reactive, secured, visualizable, etc.; current techniques are too burdensome to achieve all of these together. That sounds more disruptive.

This really isn't compelling

This really isn't compelling as currently phrased -- everyone is *already* building UIs for applications.

They aren't building them with an order of magnitude less code. The equivalent event-based code would be a nightmare.

Order of magnitude simplifications permit order of magnitude increases in complexity, along the lines of what you describe.

LINQ is still not used often

LINQ is still not used often for event-based code, so I don't get what you mean here (maybe you mean Async rather than Rx?). LINQ makes comprehensions a bit better, for sure, but is it a disruptive change?

LINQ may be the approach to

LINQ may be the approach to achieving bigger disruptions.

I'm not sure it's completely disruptive, but the adoption of LINQ and JavaScript mean that more programmers than ever are familiar with at least basic functional programming concepts. This is what I see as the bigger immediate impact than the API.

I don't think that this will lead to wider functional language adoption, but I think it will allow more complex concepts to be introduced without the adoption problems that they would have encountered even a few years ago.

Some nostalgia in the

Some nostalgia in the link...remember when Real was a real major player in video and audio? Now I don't even know how to open their videos anymore.

Replacing for-loops by

Replacing for-loops by higher order functions is properly disruptive since it replaces something which works for the masses by something which could possibly work for them as well or even better.

You cannot disrupt something which doesn't exist, doesn't work at all or which only a few specialists can handle, such as interactive theorem proving. You could disrupt Java by Rails or commodified software by open source. "Yacc is dead" was a disruptive slogan.

When Rails came up it wasn't only Java and J2EE that were attacked but the inheritance of the 1950s-style "organization man" IT work culture. Disruptive technologies are celebrated like liberations, not like incremental improvements, and they are hated with passion by counterrevolutionaries who are going to "debunk myths". They don't even have to be excellent if they have strong momentum and are promising.

What is the connection between disruption and technological fashion? The legacy of "cloud computing" may be NoSQL. "Big Data" is often even defined in terms of a bundle of technologies such as MapReduce, Hadoop and R. NoSQL was clearly meant to be disruptive but the mentioned "Big Data" techniques are rather supplemental. The disruption may happen on another level though, if at all.

We are still bad at finding counterexamples, but since we are at it: what happened to Simonyi's "Intentional Programming Suite", which had impressive demos already in 2001?

I think there's an emerging

I think there's an emerging possibility of farmed-out computing. This is based on:

* Homomorphic encryption: The ability to perform useful computations on encrypted data.
* Widespread Javascript: Many Web sites now require Javascript, and browsers will happily download and run whatever they're asked. Every Web site has its own BOINC farm, which scales with the number of users (since it is the users themselves).
* Sandboxing, proof-carrying-code, etc.: The ones making computing resources available (cloud hosts, browsers running Javascript) don't have to make security compromises to do so. This also works the other way: servers can trust results sent from untrusted clients, by verifying the accompanying proofs.

It's already possible to create Web sites where Javascript does all the rendering (ie. Javascript templating engines); servers just need to decide what to send.
It is already possible to create sites that put all control-flow into the browser (ie. continuation-based frameworks); servers just need to check permissions for access requests.
With homomorphic encryption and proof-carrying results, servers can farm off any other processing they need to clients; servers just need to coordinate browsers.
If browsers allow P2P communication with each other, even this can be farmed out.

In the other direction,

In the other direction, there is the negative disruptive trend of walled app gardens that threaten to seriously hamper PL innovation. Rather than automatically ensure the quality/safety of the code, they just vet it and assign it a reputation. Unfortunately, this approach basically precludes downloading any third-party code in your app outside of the Javascript sandbox.

I've been thinking...

JS compilers seem to be reasonably competent at handling first-order imperative programs. So the strategy of generating JS which allocates a big typed array and then treating it as main memory for a program manipulating a handful of integer-valued register variables seems like it could work pretty well. (That is, it could get you gcc -O0 quality performance.)

This really only leaves a factor of 4 or so on the floor, which is small enough that it is not inconceivable that an aggressive MLton-style compilation scheme could recover it for you. (And MLton-style compilers could do a lot better if there were fewer effects: really we need a pure (not ML), strict (not Haskell) language for the source.)

Code size seem to be tractable, too. The emscripten experience is that with minifying and compression, the generated JS is within a factor of 2 of the size of the comparable binaries. I don't know about browser caching strategies, which do present a potential worry.
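The "big typed array as main memory" strategy is easy to sketch in any language. Here is an illustrative Python version (a real asm.js/emscripten target would use a JS typed array and integer-coerced locals; addresses and values here are invented):

```python
# A 64 KiB "heap" plus a handful of integer "register" variables,
# in the style of emscripten-generated code.
heap = bytearray(1 << 16)

def store32(addr, value):
    heap[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

def load32(addr):
    return int.from_bytes(heap[addr:addr + 4], "little")

# Lay out a five-element int array in the heap...
base, n = 0, 5
for i in range(n):
    store32(base + 4 * i, (i + 1) * 10)

# ...then sum it using only integer registers and heap loads,
# the shape first-order compiled C takes after translation.
acc, ptr = 0, base
for _ in range(n):
    acc += load32(ptr)
    ptr += 4
```

The point is that every operation is a first-order integer operation or an array access, exactly the subset JS VMs already compile well.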

That is already happening. Emscripten perhaps is not really a mature tool, but the use of commercial products like Mandreel is pretty real. At least real enough that engines like V8 feel pressure to optimize for such use cases.

Hey, that's great!

Maybe you could write a blog post for the Chrome blog where you outline what kinds of code are most favorable for V8...?

Not so easy

That sort of request comes up often, but beyond common sense and well-known rules of thumb, it is not as easy as you might think. Today's JS performance model, if you can dare call it that, is very complicated, dependent on far too many dynamics, and evolving much too quickly to distil (and maintain) a set of simple and stable rules. Also, we don't really want to encourage optimizing for the version of the day when that is outdated already with the next release. Rather, we prefer people to write natural code, and when they observe serious unexpected performance problems, point us to them so that they can be fixed.

Here you go

Ex-V8 programmer (and now Dart team member) Florian Loitsch has recently been writing blog posts which cover some of what you're looking for.

The change in trust model

The change in trust model here is particularly fascinating. Consider using several EC2-based apps. A lot of performance possibilities are enabled -- the apps can share big data -- and sharing between the apps can be moderated by EC2. For example, I can let app X give data to app Y, but require that the data disappear at the end of my session: EC2 can pretty easily provide that guarantee for app Y (run in a tainted mode), and, transitively, to the user. Likewise, EC2 may allow me to control the data app Y has, such as deleting it (or anything tainted by it).

End-to-end formal partial certification of a complete system

I envision a big software system (for example the software that runs on my personal computer) that is "certified" end-to-end, with a continuum of verification techniques, from full functional correctness to much weaker guarantees, according to the cost/benefit analysis of each component, but where the different techniques can talk to each other instead of just being informally glued together (Mr. X verified the compiler in proof system Foo, my linker was verified by automated system Bar, and I believe the two formalizations coincide).

More specifically:
- The programming languages should have a formal semantics
- The standard libraries also have formal semantics, and their behavior is formally proved
- But maybe the window managers don't, because specifying this is too hard. Instead, only some behavioral properties/theorems/invariants are specified. My window manager is proved in a proof assistant, because it's a cool and simple one, but the one my neighbor uses is only checked through random testing. But the fuzzer tests for properties expressed formally, and can map them to the formal semantics of the implementation language (so that when given the program for white-box testing it can actually exhaust all execution paths, or maybe call a validated SAT solver to do a best-attempt); the fuzzer is itself certified because it's a component very relied upon. My neighbor's neighbor's manager doesn't even use white-box testing, but only enforces invariants dynamically through contracts. Oh well. At least when it crashes, we know who to blame.
- All this of course rests (through a formally-certified compiler) on a formal model of the hardware behavior; nobody knows how the hardware guys came up with the model (it's still a proprietary technology), but if there was a bug they would lose almost all their clients, so it must be pretty thorough.

I believe that would be quite disruptive, and needs important progress from both the programming languages and techniques and the "verification" tools (test libraries, contract systems, fuzzers, type systems, automated theorem provers, proof assistants...) to be made really practical -- but I think we're past the "Social Processes and Proofs" stage and have convincing evidence that formalized specification and verification is possible on a large scale.

As a first step, I would already be quite happy with a large-scale adoption of capability security (Caja, Capsicum); I'm quite confident we can get order-of-magnitude improvements on the amount of unneeded authority (or authority needed only because of bad design) our programs run with. That would be an extremely effective way to mitigate damage from program vulnerabilities. Regarding disruptive applications, see dmbarbour's usual motivations -- or lmeyerov's cloud dreams in the present topic. But I'm not sure how much this is PL-related; I believe that could already be done in existing mainstream languages, with better system designs and a bit of pressure on the application programmers.

How does mass-scale adoption of this disrupt anything? I can see, maybe, blackhat/whitehat companies being closed, but are people now building something they couldn't before? Maybe yes in that budgets can reallocate towards developers working on other things..

Imagine if we had rock solid

Imagine if we had rock-solid systems (100% verified): what would change compared to today's world? Honestly, I can imagine everything would be a bit more efficient, but not drastically different. But I could be underestimating the productivity increase on the programmer's side?

Unfortunately, for the

Unfortunately, for the foreseeable future there is a productivity decrease, not an increase. Until this is turned around, formal verification won't be disruptive.

There are two strategies here: (1) start with a specification language that gives 100% static guarantees like Coq, and try to make it more expressive and automate as much of the proving as possible; (2) start with a non-formal but expressive specification language like contracts, and try to get static guarantees out of it (e.g. with abstract interpretation and automatic or human-aided theorem provers). Even when the theorem prover doesn't succeed, we can still do more than just runtime checks. We can give our functions randomized inputs to trigger contract failures. We can do whitebox testing with SMT solvers.

(2) seems much more promising to me, especially since you can probably get productivity improvements from day 1 just by using contracts. Unfortunately, almost everybody is doing (1).
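A minimal sketch of strategy (2)'s starting point -- runtime contracts plus randomized testing -- with all names invented for illustration:

```python
import random

def contract(pre, post):
    """Attach a runtime-checked pre- and postcondition to a function."""
    def wrap(f):
        def checked(x):
            assert pre(x), f"precondition failed for {x!r}"
            result = f(x)
            assert post(x, result), f"postcondition failed for {x!r} -> {result!r}"
            return result
        checked.pre = pre
        return checked
    return wrap

@contract(pre=lambda x: x >= 0,
          post=lambda x, r: r * r <= x < (r + 1) * (r + 1))
def isqrt(x):
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r

def fuzz(f, gen, runs=200):
    """Randomized testing: hunt for inputs that break the contract."""
    failures = []
    for _ in range(runs):
        x = gen()
        if not f.pre(x):
            continue        # skip inputs the caller isn't allowed to pass
        try:
            f(x)
        except AssertionError as exc:
            failures.append((x, str(exc)))
    return failures

# No counterexamples found: the contract held on every sampled input.
assert fuzz(isqrt, lambda: random.randrange(10_000)) == []
```

This gives a benefit from day 1 (checked assertions plus fuzzing), and the same `pre`/`post` predicates are exactly what an abstract interpreter or SMT-based tool would later try to discharge statically.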

I think you misunderstood my

I think you misunderstood my question. If verification was easy and convenient, would the world change very much? I could imagine the bottleneck is then defining problems formally so they can be verified (most systems in industry are difficult to define even informally). But then there is semi-verification that simply proves the absence of some bug classes and requires no specification (e.g., no null pointer exceptions).

I see life being better, but not disruptively so: some money and lives will be saved, but nothing hugely significant. It might allow us to build drastically larger systems, but if formal specification is the bottleneck, then we would have to solve that problem first before we would ever be able to take advantage of the benefits of verification.

I think we are in agreement,

I think we are in agreement, namely that the expressiveness of formally specified properties is of utmost importance. Contracts ("assertions on steroids"), I think, are already expressive enough for many properties that you want to verify. Though obviously not for specifying complete functional correctness (and I doubt complete formal specifications of any form will ever justify their cost for the reason you mention: most real world systems are about as hard to formally specify as they are to program in the first place -- sometimes programming them is easier because you don't need to do a formal model of the outside world).

Whether it would really be disruptive even if it were easy to verify properties I do not know, but probably yes. Imagine if you could write an arbitrary assertion in the middle of your program, and the compiler could tell you whether that assertion can fail. That seems extremely useful to me (though we can't do that yet for most interesting assertions).

Already there are people out there who use dynamically-typed languages because they don't think the benefit of statically-typed languages outweighs their drawbacks. Say we get verification, but at the cost of some flexibility + extra work for making declarations more expressive. We would then be in this spot again, where the tradeoff isn't so clear.

Actually, having the assertion there itself is already very useful: it will blow up at run-time as long as I have enough test cases. What, then, is the added value of being able to verify that assertion statically? Could I then get rid of all my test cases? Actually, that could be huge: "compiler, here is my arbitrary assertion, now tell me how it doesn't hold (if it doesn't)."

Yes, exactly

Yes, exactly. Also, a type signature is just an assertion (or contract) that asserts that something has a given type.

var s = ...;
assert(s is String);

-vs-

string s = ...;

Our current interface to type systems consists of the responses "this is always going to hold" and "I couldn't prove that this is always going to hold, so I am going to refuse to run your program". This is a rather limited interface. Other responses are useful, like "I know for sure that this can fail in some cases: here is an input that makes your assertion fail" or "I proved that this assertion is going to fail for *any* input" or "I couldn't prove that this is always going to hold, but neither randomized testing nor whitebox testing produced an input that makes this assertion fail". Current type systems basically don't distinguish between any of these, and instead treat all of these cases identically as a static type error. Of course, you want these responses to be color-coded underlines, just like a type error gets a red underline in IDEs now, with the possibility to hover over one for an explanation: a red underline for assertions with a known example that makes them fail, a yellow underline for assertions with no known failing input, no underline for assertions proven not to fail, and alarm bells ringing for assertions proven to fail for any input.

Try this and you will be amazed: Pex For Fun.

We can do other useful things; for instance, with randomized testing you can correlate assertion failures with the lines that were executed. For example:

if (i < 100) {
    x = "foo";
} else {
    x = 3;
}

var s = x;
assert(s is String);

If we do randomized testing with random i's we can see that the else branch is highly correlated with the assertion failure, and give this information to the programmer. You can get this information out of whitebox testing too.
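That correlation can be sketched concretely; here is a toy illustration of the idea (not any particular tool's machinery): run the snippet with random inputs, record which branch executed, and tally branch against assertion outcome.

```python
import random

def run_once(i):
    """Mirror of the snippet above; returns (branch taken, whether the assertion held)."""
    if i < 100:
        x = "foo"
        branch = "then"
    else:
        x = 3
        branch = "else"
    return branch, isinstance(x, str)  # the `s is String` assertion

tally = {"then": [0, 0], "else": [0, 0]}  # branch -> [held, failed]
for _ in range(1000):
    branch, held = run_once(random.randint(0, 200))
    tally[branch][0 if held else 1] += 1

# Every observed failure is in the else branch; that is exactly the
# correlation a tool could report back to the programmer.
print(tally["then"][1] == 0 and tally["else"][0] == 0)  # True
```

A whitebox testing tool would reach the same conclusion deterministically by solving the path condition `i >= 100`.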

Our current interface to

Our current interface to type systems consists of the responses "this is always going to hold" and "I couldn't prove that this is always going to hold, so I am going to refuse to run your program"

I think that's a mischaracterization. It's more akin to "you claimed X and !X simultaneously, so of course I can't run your program". The kinds of X one can claim are set by the type system. Some are more flexible than others, and so permit subtler distinctions which avoid semantic contradiction. The example you cite has a path-sensitive semantics, and one could in principle derive a type-and-effect system for this.

I suspect Jules had dependent types in mind, because I think his characterization describes the situation there pretty well.

For the record, I believe

For the record, I believe the characterization is correct when applied to most currently-used static type systems: they accept programs that they can "prove correct" (in the "they will not go wrong" sense), and reject the rest, including programs that are correct but that they can't verify.

This is most apparent with dependent types, because then the "proving correctness" is really obvious, but that also holds of, say, ML-like type systems. Of course statically-typed languages with a weaker type system than that (C?) tend to prove nothing at all.

The dual approach (which he also mentions) is "reject only programs that I can prove incorrect", and is used in some static analyses of dynamic languages. One benefit of that point of view is that it often allows producing counter-examples (this is not correct because *this* input will provoke a bug); exhaustiveness checkers for pattern matching also do this.

To sum up, I do think he's right in this description of static type systems. That's a limitation of all formal systems; the pragmatic workaround is to include an "escape hatch" in the program to introduce unproved assertions. One can also rewrite the program so that the difficult invariants are checked at runtime instead -- runtime failure being another escape hatch out of the "do not go wrong" guarantee.

Edit: I believe one reason why Jules and I would disagree with naasking could be the classic divide between Church-style and Curry-style ways to think about typed programs. Are programs intrinsically typed objects (eg. typed lambda-terms in some type system), or are they untyped descriptions of a dynamic semantics that *can* be accompanied with a well-typing derivation? On the precise matter of "which programs does my compiler reject", I tend to hold the second view: all untyped programs can be thought of, and it makes sense to ask if there are "good programs" that cannot be given a derivation in the fixed type system considered.

I suppose naasking is taking the first point of view: thinking of all the programs as already-typed objects, and saying that if any program is rejected by the type-checker, then it is, by definition, a nonsensical program. His position leads to the same conclusion (.. at some point we need stronger type systems to write more programs, or at least pragmatic escape hatches), so I don't think it's wrong, but it just seems less intuitive to me. Yet in different situations I prefer the Church view.

(My way to remember which is "Curry" and which is "Church" is that Curry, as a logician, participated in the early work on type inference: inference is about writing untyped programs and then recovering typing derivations.)

Type-dependent behavior,

Type-dependent behavior, such as type-based dispatch and operator overloading (or typeclasses), makes it difficult for me to accept the general view that "all untyped programs could be thought of, and it makes sense to ask if there are "good programs" that cannot be given a derivation in the fixed type system considered."

If we remove types from the behavior of a language, then we can get pluggable type systems. This is an approach I favor at least in theory, though I haven't tried it much in practice.

Elaboration

Type-dependent behavior is often explained through an "elaboration phase" where term markers are inserted at the relevant places in your program (in a way directed by the types), so that after this annotation process only the terms (including those markers) determine dynamic semantics -- you have "type erasure".

For example, type classes in Haskell are type-dependent behavior, but you could also say that they are a way to use types to infer "the real program inside", that is, one with explicit dictionary-passing (.. or any other implementation of type classes) and instance selection, and which is not type-dependent anymore.

So you can still consider that you have a type-independent "core language", which has nicer properties, but that you expose to your users a second "surface language" that, among other things, uses types to reduce the burden of writing programs.

(The view that types really are important is also fine. And with sufficiently sophisticated type systems with dependent types, it can be difficult not to take it.)

Types by any other name

The elaboration view may benefit an implementer, or even a programmer who wishes to be closer to the metal. But it does nothing to disentangle the source language from its type system. Whenever you need to perform a program-wide transform to explain something, you've essentially changed languages.

If you could model elaboration with a syntactically local transform to the same surface language, then you really could claim a core language. But this condition doesn't apply.

Regarding your last point: I think the sophistication of the type system has nothing to do with it. Dependent, structural, and sub-structural typing are all compatible with pluggable types so long as the types are passive - no effect on semantically observable behavior.

What you're saying is true

What you're saying is true in a literal sense if you don't have type inference, but in a practical sense that is what type systems do. Take the following example:

class Foo:
    def constructor(f, x):
        this.f = f
        this.x = x
    def invoke():
        this.f(this.x)

def func1(x): print(x + 1)
def func2(y): print(y + "baz")

list = [new Foo(func1, 3), new Foo(func2, "baz")]
for item in list:
    item.invoke()

If you rewrote this in C# you would be forced to give it types that inevitably result in you claiming something that the C# type checker considers contradictory. But that is a fault of the type system not being powerful enough: the actual program runs just fine. With existential types, you can do it.
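A rough sketch of that existential encoding in Python with type hints (the names here are illustrative, not from the thread): the pair (f, x) is wrapped in a class whose type parameter is hidden from the list's element type.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

class Foo(Generic[T]):
    """Packs a function together with an argument of matching type.

    From outside, the parameter T is hidden: a list can hold a
    Foo wrapping (int -> None, int) and one wrapping (str -> None, str)
    uniformly. This is the existential: "there exists some T such that
    f : T -> None and x : T"."""
    def __init__(self, f: Callable[[T], None], x: T) -> None:
        self.f = f
        self.x = x

    def invoke(self) -> None:
        self.f(self.x)  # safe: f and x agree on T by construction

def func1(x: int) -> None:
    print(x + 1)

def func2(y: str) -> None:
    print(y + "baz")

items: list[Foo] = [Foo(func1, 3), Foo(func2, "baz")]
for item in items:
    item.invoke()  # prints 4, then bazbaz
```

The key design point is that the only operation exposed on the packed pair is `invoke()`, so no client can ever observe the hidden type at a wrong type.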

But you could still not do this one:

class Storage:
    def constructor():
        this.storage = new Array()
        this.key = 0
    def makeRef(val):
        this.storage[this.key] = val
        ref = new Ref(this.storage, this.key)
        this.key++
        return ref

class Ref:
    def constructor(storage, key):
        this.storage = storage
        this.key = key
    def get():
        return this.storage[this.key]
    def set(val):
        this.storage[this.key] = val

s = new Storage()
r1 = s.makeRef("foo")
r2 = s.makeRef(3)

Say your language has type inference. Given the above program without type annotations, it will now try to construct a proof that no type errors will happen (namely, it will try to find types and type-check the program). So if your compiler refuses to compile the above program, it refused because it was not able to prove that no type error can happen.

Of course, this point only applies if you subscribe to the Curry school of thought. The other school says that programs without types don't have meaning in the first place. I find that a bit of a cop-out in relation to this issue. And you can rephrase the issue as "either your type system is not powerful enough or your type inferencer is not powerful enough". Though in this case the Curry point of view is more useful, as it also allows you to think about the other kind of static analysis that tries to find a counter-example.

As Matt notes, this is more pronounced if you are trying to specify stronger properties. For example, if you have a library procedure log() that requires its argument to be positive, then you'll likely have a lot of cases where your tools fail to prove that this precondition is satisfied.

Edit: apparently gasche meanwhile wrote largely the same explanation :)

Pex for fun

I got slammed immediately with "possible negation of minvalue of type int32", so that was a nice ha-ha.

Who'd have thought that

Who'd have thought that writing an abs() function was that difficult ;)

(although arguably that's a language design failure because ints aren't integers)

The type system I'm working

The type system I'm working on could maybe work with that. You create a value "0" and it is an integer; if you just add to it, it's still an integer; but as soon as you subtract, you move to a more general int.

Now the only problem to overcome with this type system is its seeming need for whole-program analysis :).

In this case the problem is

In this case the problem is not so much negative numbers in general, but that an int has a minimum value (-2^31 or equivalent) and that the negative of this minimum value exceeds the maximum value (2^31 - 1).
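The failure mode can be sketched in a few lines of Python by simulating two's-complement wrap-around (Python's own ints are unbounded, so the 32-bit behavior is emulated here; function names are invented for the sketch):

```python
def to_int32(n):
    """Wrap an unbounded integer into two's-complement int32 range."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

def abs32(x):
    """A naive abs() on simulated 32-bit ints, as in the Pex puzzle."""
    return to_int32(-x) if x < 0 else x

INT32_MIN = -2**31
print(abs32(5))          # 5
print(abs32(-5))         # 5
print(abs32(INT32_MIN))  # -2147483648: negating INT32_MIN wraps back to itself
```

So abs() of the minimum value is negative, which is exactly the counter-example the checker reports.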

As for your type system, it seems very interesting. I don't understand why you need whole program analysis though? If you have type signatures for every procedure/method then everything can be inferred locally?

Ah yes, overflow, I wasn't

Ah yes, overflow, I wasn't thinking about this. Isn't the hardware cheap enough now to just trap and throw an exception? I once repurposed a SAT solver to try and find these errors, but it seems like a hard problem in general...and perhaps most of our problems with it would go away with dynamic faulting?

The need for WPA arises from overriding and function pointers. Perhaps type signatures could be computed based on use, and something like a Liskov substitution property could ensure that method implementations do not violate the previously observed behavior during their use. But I'm not sure what this really means yet.

Safety disruption

I can see, maybe, blackhat/whitehat companies being closed

Oh, no, I don't think so. Formal methods can propose a continuum of progressively stronger (but costlier) techniques to be confident about a program's correctness, but there is still a fundamental formal/informal gap at the edges, and one should not believe that there could be "absolute correctness".

If I start proving correctness properties of a program that is security-sensitive, attackers will:
1. try to find places where the formal specification doesn't do what I expect (that's a "bug" in a formally verified software)
2. attack aspects of the software behavior that were not specified; for example I have specified functional correctness without considering intensional properties such as response times, energy consumption, and am exposed to side-channel attacks
3. consider the outer picture and attack another part of the system; maybe the security policy is just good, except the people rightfully at the top can be bribed; or the implemented and duly certified protocol actually fails to ensure some desirable properties (eg. the definition of privacy it used is not strong enough with respect to new mass-deanonymization techniques; that's not a case of 1. because the informal specification also didn't consider this)

So I think there will always be this attacker/defender competition, and "absolute safety" is too beautiful to be true. That said, while there are technologies that are of interest to both attackers and defenders (eg. code obfuscation techniques) (see a short essay on the question by Bruce Schneier), powerful verification seems to be mostly of advantage to defenders. So we could see a relative decrease in the volume of computer attacks; I do think the average user would be safer -- as would already be the case with non-verified system designs having better respect for the principle of least privilege. But surely there would still be people with enough money and power to effectively break things when it's interesting enough; and some of the others would simply move to greener pastures -- would shoplifting rise if credit card fraud became significantly harder?

but are people now building something they couldn't before?

Well, first I think mass-scale adoption of usable static checking techniques could provoke sizable constant-factor productivity gains. I've read numerous times (but never with really serious numbers to back it up) that debugging/maintenance costs are at least half of the total production cost of a program. If you can remove a sizable fraction of this cost by detecting defects earlier, you can make big gains. A 30% reduction in software production cost is not a ten-fold change, but it's still a non-negligible constant-factor gain that is as interesting as a lot of things in the current "disruptive" list -- I would like live code editing very much, but I think for most of my personal use cases the gain wouldn't be 30% in total.

There are a lot of unit tests being written out there that could trivially be automatically tested by feeding the intended specification to an automated program verifier.

In fact I think there are a lot of analogies to be drawn between what you would gain from pervasive visualization of program behavior and pervasive static verification (proof or test). They're both about exposing the program's semantics, and both benefit when you can compose different subsystems together to get verification, or visualization, of the big system. Just as visualization lets you see "live" when a program's behavior changes in a significant good or bad way, a check turning red with "this invariant is now violated" will provide useful feedback. Probably they are useful in different situations -- I wouldn't know how to visualize everything I prove, and conversely -- but I suspect an argument for one could be turned into an argument for the other, and conversely. Of course there is a difference: specification lends itself better to automation, while visualization is more useful in situations where you don't know what you're looking for.

Disruptive generality

Currently, most library encodings fail to cleanly separate many relevant concerns and instead encode ad hoc choices that limit the applicability of those libraries in certain ways. For example, a decision to use a certain data structure or to use OS threads may limit scaling to a large number of instances but is a reasonable choice for the primary use case. Even if a library tries to cover the 95% case, there are many dimensions along which a library may be unsuitable and so these probabilities of unsuitability tend to multiply. The language angle is to find techniques to untangle the concerns and produce libraries that faithfully encode domain expertise in various areas, so that they are almost always applicable without artificial limitation. This would hopefully result in teams building upon each others' work instead of perpetually rebuilding foundations.

I'm not sure if this counts as a disruptive idea, since I don't think there is a single "magic bullet" that will get us there (AOP was basically trying to solve this problem), but rather this will take lots of domain specific thought. But the language techniques that help organize this domain knowledge will be important and could maybe be considered disruptive.

There are many ways to go about this problem. First, we could "just" improve our abstractions so that we can encode logic more separately from how it is used. Ya, just... not that easy, as we've seen with AOP :).

The whole area of code mining in the SE field is related; basically this is tool support for the programmer-archaeologist approach. I really hope we can eventually apply machine learning to this problem: vast and powerful Watson/Google-like systems process all the world's code to help you write new code that has similar logic but is maybe mashed up and used in a different context from the original code. That would be disruptive, but we are maybe 20 years away from that; and like fusion we will probably still be 20 years away 20 years from now.

SMT Solvers

I think getting developers standard access to SMT solvers to the same degree they currently access map and list structures would easily become a disruptive technology.

We're already almost there! But I agree completely: SMT solvers are a generalization of finite domain constraints, plus you get more efficient solving by leveraging CDCL SAT solvers (though I don't think Z3 has efficient finite domain constraints at the moment). A related useful thing is a convex optimization library. Embedded datalog and embedded constraint handling rules are other useful things. And bayesian inference.
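To make the "solver as an everyday library" idea concrete, here is a deliberately naive sketch: a brute-force finite-domain solver with a map-and-list-flavored interface. This is not an SMT solver (a real one, such as Z3, solves symbolically and is vastly faster); all names are invented for the illustration.

```python
from itertools import product

def solve(domains, constraints):
    """Brute-force finite-domain constraint solver.

    domains: dict mapping variable name -> iterable of candidate values.
    constraints: list of predicates over the assignment dict.
    Returns the first satisfying assignment, or None if unsatisfiable.
    """
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        env = dict(zip(names, values))
        if all(c(env) for c in constraints):
            return env
    return None

# "Find x, y in 0..9 with x + y == 10 and x * y == 21"
model = solve(
    {"x": range(10), "y": range(10)},
    [lambda e: e["x"] + e["y"] == 10,
     lambda e: e["x"] * e["y"] == 21],
)
print(model)  # {'x': 3, 'y': 7}
```

The point is the interface: declaring constraints and asking for a model feels as ordinary as using a dict, which is what standard developer access to SMT solvers would look like.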

Merging of PL and Databases

With relational logic, persistence (not necessarily orthogonal), transactions, and concurrency.

The big disruptive trend in

The big disruptive trend in databases in the last few years has been the realization that you often don't want a relational database: key-value stores (NoSQL) and graph databases (Neo4j) are often more desirable because you need to make a completely different set of tradeoffs (scalability, performance) than conventional ACID relational databases allow.

Now, in that context, we could actually look at this problem from a variety of angles in PL. We've already sort of explored how to integrate key value stores in PL (I think), but I don't think we have really looked at graphs (the big-data kind) yet.

The NoSQL movement will die

The NoSQL movement will die out as a fad, I think. The problem isn't so much that ACID databases can't scale; it's that the ones that do cost way too much today. It was the same for small-scale relational databases 20 years ago, and now there are many free alternatives.

Currently working on a project migrating away from a MySQL installation to a "more scalable" solution with less relational guarantees, my team is really feeling the pain every day. I wonder how many thousands of poor bastards in the world are implementing indexes, joins, and a broken transactional model on top of these "hot" NoSQL databases.

Processors aren't getting much faster, but there are more of them and memories and disks are still growing fast. The datasets that people think are too huge for relational DBs today will fit easily in one machine or a single rack in 5 or 10 years.

The PL opportunity here is huge, I think.

Not really. There is big

Not really. There is big money in this, and if you look at all the work being done by Google and Bing, it's quite amazing how quickly things have shifted. Both companies have plenty of money to buy decent databases (I heard Microsoft even has a database product), but the simple fact is that relational databases just don't work for many big-data problems. It's not that machines are too slow; it's that they have to be seriously distributed, and so maintaining ACID and high performance is not really possible (at least, this is how it's been explained to me).

If you come into NoSQL with a relational mindset, it is definitely very painful. That is why all this stuff comes from, and is used by, the systems (and to a lesser extent PL) communities, dragging the DB community kicking and screaming with it. Now... there are still problems where relational ACID is the best solution. I see some companies making bad decisions based on hype, but isn't this always the case with disruptive technology?

I believe the merging of PL with relational DBs was a holy grail topic in the 70s - 90s, but it never really went anywhere; most people have given up on this dream for now. What is the new dream?

ACID

Google's App Engine is ACID. I also believe that non-ACID NoSQL will die out.

NoACID?

I would not be surprised if we continue with relational in a non-ACID form. Though, I sort of hope SQL in particular dies out, and gets replaced by something sane.

Transactions are only necessary because we favor non-monotonic updates to our databases. If we change our design patterns, we can use streaming SQL. Use of temporal data can augment this by modeling deletion as an insert.

CALM offers an effective alternative to ACID. We'll need new idioms and design patterns to master it, but I think it capable of meeting a great many needs.
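The "deletion as an insert" idiom above can be sketched as an append-only log of timestamped facts (a simplified illustration of the temporal-data pattern, not any particular system's API; all names are invented):

```python
import time

class TemporalStore:
    """Append-only store: every write, including deletion, is an insert.

    A "delete" is just a later fact whose value is a tombstone; a read
    picks the latest fact for a key. The log itself only ever grows,
    which is what makes the update pattern monotonic."""
    _TOMBSTONE = object()

    def __init__(self):
        self.log = []  # list of (timestamp, key, value) facts

    def insert(self, key, value):
        self.log.append((time.monotonic(), key, value))

    def delete(self, key):
        self.insert(key, self._TOMBSTONE)  # deletion is an insert

    def get(self, key, default=None):
        latest = default
        for _, k, v in self.log:  # log is already in timestamp order
            if k == key:
                latest = default if v is self._TOMBSTONE else v
        return latest

db = TemporalStore()
db.insert("user:1", "alice")
db.delete("user:1")
db.insert("user:1", "alice v2")
print(db.get("user:1"))  # alice v2
print(len(db.log))       # 3 facts: nothing was destructively removed
```

Because the full history is retained, queries "as of" any past timestamp remain answerable, which is part of what makes the temporal model attractive.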

Funny, a world with ACID

Funny, a world with ACID NoSQL and non-ACID relational.

Sounds familiar

Currently working on a project migrating away from a MySQL installation to a "more scalable" solution with less relational guarantees, my team is really feeling the pain every day. I wonder how many thousands of poor bastards in the world are implementing indexes, joins, and a broken transactional model on top of these "hot" NoSQL databases.

Oh, I feel your pain. Same problem on my previous project. I sometimes wonder how many projects have already wasted their time greenspunning 1/2-assed, 1/3-broken, and 1/4-optimal relational database fragments on top of Bigtable.

Again, just because a new

Again, just because a new technology is misapplied doesn't invalidate its usefulness. We aren't going to be running Bing on top of SQL Server anytime soon (nor is Google).

We are still sorting out when to use relational and non-relational DB solutions.

You seem to be implying that

You seem to be implying that Google (the search engine) runs on a non-ACID (NoSQL) database. From what we know, Google previously ran on a custom batch-processing system (The Anatomy of a Large-Scale Hypertextual Web Search Engine), and switched to a custom incremental computing system (Large-scale Incremental Processing Using Distributed Transactions and Notifications) - with ACID transactions. The storage system it offers as a service, App Engine, also has ACID properties.

I believe Google has many storage systems, some of them ACID and some of them not (again, ACID is not bad, it's just not good for all situations). But really, this is all second-hand information from the search people in my research group, who work on Bing of course :)

I guess the question is... for our dream of merging DB with PL, is there only one true model to shoot for, or a bunch of different viable models that accommodate different use cases? Lately, I've been thinking about PL support for large/volatile graph databases, where the graph structure is somehow inferred from the code that uses it.

A unified schema, basic types

At the very least we should aim for unifying the schema definition language to have a central place where the data model is at least described, even if the semantics of querying and updating the data vary greatly over different storage systems. I think the recent work on data representation synthesis is a good motivator for how useful such a specification could be to the PL world.

Part of unifying the schema definition is having some base (primitive) types: integers, strings, etc. It seems schema languages cannot always agree so far, even on that front.

I think Don Sym's type

I think Don Syme's type providers and information-rich programming are related here, but they concern more the interchange of data between programs (and computers) than its storage.

Instead of adapting a language to use an existing database, you can also adapt the database to the language. The "schema" can be whatever data type definitions your language already provides. One simple model is the following: have a persistent reference, PRef[T]. You create it by doing new PRef(filename). When you store a value into the reference, it will be stored in the file. Additionally, it will attach listeners to any mutable references inside the value, so that if you mutate them the file can be updated accordingly. If you do this you'll probably want some facility to support efficient indexed queries in your language, and perhaps STM that integrates with the persistence. And you have to think about schema evolution.
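The PRef idea can be sketched in Python using pickle (a toy version with invented names: it re-serializes the whole value on each store and omits the change-listener machinery, indexed queries, STM integration, and schema evolution the comment mentions):

```python
import os
import pickle
import tempfile

class PRef:
    """A persistent reference: assignments to .value are written to a file.

    The "schema" is whatever data types the language already has; here
    anything picklable. A fresh PRef on the same file sees the stored value."""
    def __init__(self, filename):
        self.filename = filename
        self._value = None
        if os.path.exists(filename):
            with open(filename, "rb") as f:
                self._value = pickle.load(f)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, v):
        self._value = v
        with open(self.filename, "wb") as f:  # persist on every store
            pickle.dump(v, f)

path = os.path.join(tempfile.mkdtemp(), "state.pickle")
ref = PRef(path)
ref.value = {"users": ["alice", "bob"]}

ref2 = PRef(path)           # stands in for a later process reopening the file
print(ref2.value["users"])  # ['alice', 'bob']
```

The hard parts the comment raises start exactly where this sketch stops: detecting mutation of interior references, and surviving a change to the data type definitions between runs.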

Mechanizing law may be

Mechanizing law may be another disruption. At least in the US, very few eyes go over individual laws and those that would want to understand the ramifications of new laws cannot. This is definitely on the long-term side of things :)

A few off the top of my

A few off the top of my head:

Cloud computing like EC2 and Heroku makes it trivially easy to throw hardware at a problem. That, in turn, makes slower and slower languages more viable for use in production. Cheap hardware and scalability are, I think, a big part of the reason you see Ruby, Python, JS etc. all being used for real apps now, and not just C as far as the eye can see like it was a decade ago.

GitHub and other open-source code-sharing venues make it trivially easy to create, share, and contribute to open source projects. That, in turn, makes it take much less effort to bootstrap the library ecosystems needed to get new languages off the ground and into real use.

Ubiquitous industrial-strength bytecode VMs make it easier to get new languages going by leveraging the performance work put into the VM and the existing libraries that target it. Would we have Scala, Clojure, and Groovy without the JVM? Would we have CoffeeScript without V8 et al.?