Beyond LINQ: A Manifesto For Distributed Data-Intensive Programming

The LINQ project as embodied by C# 3.0 and Visual Basic 9 brings concepts from functional programming such as type-inference, lambda-expressions, and most importantly monad comprehensions into mainstream object-oriented programming. This is a definitively exciting for the programming language community, but realistically, it is just a tiny step towards democratizing building distributed data-intensive applications. To merely approach that goal there is still much work to do in (at least) the following areas:

Tools and IDE
It is fair to say that the the days of writing code using a text editor and batch compiler are over. Visual Studio, Eclipse, and Emacs are the norm rather than the exception. However, whenever you meet an (ex)-Smalltalk or VB6 programmer, they reminiscence the highly interactive development environment, scripters cannot live without their REPL, and tools like Ruby on Rails disrupt traditional development because of its simplicity and quick turn-around time. To simplify programming for the masses we need to shake of the yoke of the dreaded (edit, compile, run, debug)* loop and replace it with a lightweight (edit=compile=run=debug) experience.
Language and Type Systems
Writing distributed data-intensive applications naturally means dealing with many forms of data, relational, XML, objects, typed, semi-structured, or untyped. Current languages are not well equipped to deal any scrap of data, and much language and type-system innovation is required such explicit relationships, contracts,  layered type-systems, seamlessly dealing with both static and dynamic typing in the same program, extensibility, etc. The challenge is to package advanced ideas from programming language research in such a way that you do not need a PhD in type-theory to understand them.
Runtime and Libraries
Sometimes, and preferably, the compiler is able map all language and type extensions to an existing runtime such as the JVM or the CLR. However, often this is not feasible and we need to extend the runtime infrastructure to accommodate these new features. A prime example is the support for generics in the CLR versus the JVM, other examples include efficient support for first class continuations, query execution, etc. Obviously, a dynamic language is ultimately all about how dynamic the underlying runtime infrastructure is unless you emulate the dynamic features of the language, which kills interoperability. In addition to runtime support, new language and type-system features often need extensive library and infrastructure support. For example, in the context of LINQ, the language extensions are just the tip of the iceberg, and in fact the bulk of the work is in the libraries such as XLinq and in particular the OR-mapping infrastructure.
Transactions Everywhere
To make programming web services accessible to the masses, we need have to have a comprehensible way to deal with concurrency. In addition, on the desktop itself we need some way to harness the upcoming multi-core revolution. We believe that transactions are the most promising approach in this space. In fact transaction are the only way ordinary people can deal with concurrency, unless of course you are a sadomasochist who likes to wear black leather and play with locks.

As you can imagine, this is a lot of work and it will keep us language geeks off the streets for a long, long, time! And in case you are currently wandering the streets looking for a job as a compiler writer, virtual machine hacker, tool smith, etc. drop me an email. We have several job openings available.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

masses

"simplify programming for the masses"

I thought that was the goal of the likes of scripting languages like PHP or tools like RoR or powerful IDEs. Can it get any dumber so that even grandma can apply for a java job?

"package advanced ideas from programming language research in such a way that you do not need a PhD in type-theory to understand them"

nice jab at Haskell... ;)

abuse

"Can it get any dumber so that even grandma can apply for a java job?"
Presumably you wish to suggest that your own grandmas lack intelligence.

no

she's just too old for the computer thing. But, perhaps with a well suited "mass-market" programming language and IDE...

Good stuff

I really enjoyed the paper Static Typing Where Possible, Dynamic Typing Where Needed. It wasn't too technical for enthusiasts like myself.

I wonder where this leaves Java though. Traditionally, Sun has been reluctant to put new features into the Java language for fear of overcomplicating it, but subcumbed due to pressure from developers looking at .NET and C#. It seems to me that a possible course is to concentrate on the JVM/libraries and start afresh with a new language.

Previously on LtU

That paper was previously discussed on LtU here - I don't know if you've seen the discussion before, but thought I'd provide a pointer just in case.

Great

Thanks for the heads-up.

The Golden Middle Way

There is a wonderful new paper Hybrid Type Checking by Cormac Flanagan. He uses a statically undecidable type system (dependent functions and refinement types) with dynamic checks to ensure soundness. The hybrid checker also has more precision than a convensional static checker!

I wish I could agree with you about "Static typing where possible", IMO, it reads like flamebait.

Also, The link above has an extra / at the end.

ps2pdf

I've made available a version in PDF.

I want contracts == hybrid type checking

Note that the "I want contracts" paragraph in my paper describes exactly the idea of hybrid type checking.

Monad comprehensions?

I'm not a functional programming expert, so most of the papers about monads that I've read have gone straight over my head.. but from what I do know about them, it seems to me that C# 3.0 doesn't have monads. It is, after all, mainly an imperative language, and there's no need to pass around "the state of world" because statements are executed in order and have side effects.

Can someone point out which features of C# 3.0 are supposed to involve monad comprehensions?

The state monad is just one of many interesting monads

The state monad is just one example of many other possible monads, but the concept of monads is useful beyond that.

The monad comprehensions in C# and Visual Basic are used for queries, and the library provides "instances" for IEnumerable (the standard list monad) and for IQueryable (the query monad as in HaskellDB).

Hope this helps.

I see.. sort of. Can you

I see.. sort of. Can you recommend any introductions to the concept of monads from an imperative programmer's perspective?

Phil rocks!

Phil Wadler's original papers are still my favorites.

Phil does rock!

Mostly a me too post. Phil Wadler has a second sense with making good puns for paper titles and almost all his papers are very enjoyable to read.

You need to know a bit

You need to know a bit of Haskell or a similar language for this: What the Hell are Monads?

The same for this one: Monads for Functional Programming

Our very own discussion

Nice!

Thanks for the update... Recent C# developments are very exciting, and it sounds like you're really focused on advancing the state of the art on all fronts. It's very cool to see such practical progress, especially from an industry juggernaut.

I do really wish, though, that this stuff wasn't going to be tied to the Microsoft platform. Ethical or business disagreements aside, I've always been pretty uninspired by the Windows aesthetic, and as I don't expect you to re-invent the OS as well, I'm afraid I'll be stuck admiring your work from a distance. Obviously advancing the state of the art fosters competition, so I expect that we'll start to see some more interesting work outside of the Microsoft fold as well, but few companies can drive adoption like Microsoft, so it remains to be seen whether other offerings will gain (or retain) mainstream acceptance.

Also a question... You said, The challenge is to package advanced ideas from programming language research in such a way that you do not need a PhD in type-theory to understand them. Do you have any thoughts on this paper (GADTs and OOP, Kennedy and Russo)? Does it meet your criteria? It never got too much attention around here, unfortunately, although it's very good academia/industry gap-bridging work. Any chance we'll see this in C#?

I do really wish, though,

I do really wish, though, that this stuff wasn't going to be tied to the Microsoft platform. Ethical or business disagreements aside, I've always been pretty uninspired by the Windows aesthetic, and as I don't expect you to re-invent the OS as well, I'm afraid I'll be stuck admiring your work from a distance. Obviously advancing the state of the art fosters competition, so I expect that we'll start to see some more interesting work outside of the Microsoft fold as well, but few companies can drive adoption like Microsoft, so it remains to be seen whether other offerings will gain (or retain) mainstream acceptance.

Who says it's going to be tied to the windows platform? Haven't you heard of Mono?

Re: Haven't you heard of Mono?

Haven't you heard of patents? :-} Even if there is a claim that the ECMA-submitted core isn't hindered by patents, the fact is that anything which could have patents could have lawyers involved, which clearly then becomes a horrendously evil situation.

The same argument could be applied for any software under any license, admittedly. The reality is one has to try to guess the intentions of folks through their license and their past behaviour. Take a look at the history of MS (or IBM, or Sun, or even little weasels with JPEG or whatever patents), mix that with the license, and compare against things which are put out under, say, GPL.

About Patents

Haven't you heard of patents? :-} Even if there is a claim that the ECMA-submitted core isn't hindered by patents, the fact is that anything which could have patents could have lawyers involved, which clearly then becomes a horrendously evil situation.

Sun has lots of patents too. If you're going to worry about software patents then you might as well get out of the software field. If you write 300 lines of code, you're probably infringing on some patent somewhere.

Re: Patents

I think you are quite correct, that the difference between Sun and Microsoft when it comes to any sort of proof that they won't some day try to screw the little guy is zero. The issue of patents pretty much encumbers everything - very, very sad from my perspective. I guess as far as languages are concerned, the way to go is with something that was very GPL / BSD from the beginning? (As you point out, that doesn't guarantee anything either, but it would seem to be less blatantly fraught with peril.) The whole software patent thing seems so blatently corrupt and stifling and just plain wrong it makes me weep.

Of course...

Well, for one thing there's the intellectual property issues mentioned in another comment. But as far as I'm concerned, that's not really the main thing.

Even if all of the work that Erik et al are doing will run on Mono seamlessly and with binary compatibility (including Visual Studio, Linq, etc., which I highly doubt), there's still the fact that Mono is one of many smaller players in an open source ecosystem, and doesn't have the weight of a Microsoft to drive adoption. So I guess I have two concerns: (1) they're likely to be stuck on a treadmill, (2) I doubt it will acheive the pervasiveness of Java or .Net.

In fact, I think Mono is good and exciting work, and I definitely hope it continues to mature. But it's exactly the kind of project I was referring to when I said "we'll start to see some more interesting work outside of the Microsoft fold as well," with all the problems that come with that.

Mono is fine

LINQ will run just fine on Mono - it doesn't involve any changes to the runtime, just a couple new libraries and syntax changes to the C# and VB. You can compile a LINQ program with C# 3.0 and run it on .NET 2.0; there's no reason you shouldn't be able to run it on Mono as well.

Whether Mono can achieve the pervasiveness of .NET seems irrelevant to me. Mono and .NET are both implementations of the CLI. As long as Mono and .NET can run the same programs, there's no compatibility problem, and it doesn't matter which implementation is more pervasive. Getting to that point is just a matter of implementing all the libraries that ship with .NET.

Still not convinced.

Getting to that point is just a matter of implementing all the libraries that ship with .NET.

Just a matter of implementing all the libraries? I think you might be underestimating the scale of that project... From Erik's original post: "For example, in the context of LINQ, the language extensions are just the tip of the iceberg, and in fact the bulk of the work is in the libraries..."

And remember that my interest isn't primarily in deploying applications. My interest is primarily in writing software. So whether or not I can run my app on Mono is irrelevant to me if I still need to run VS on Windows. I don't have Windows. Look at the whole stack of components mentioned in the original post (Dev Tools, Language, Runtime, Libraries)... I want that whole stack to be OS-agnostic. Looking at other modern mainstream languagues (Java, Python, Ruby) they do seem to have largely achieved this.

Maybe I'm misunderstanding you, and I really will be able to run all of the MS development tools without running Windows. If that's the case, I apologize, and I really will look forward to it.

LINQ requires no CLR changes

Right now, I am working on a POPL paper on LINQ with Gavin Bierman. One of our goals is to formally prove that LINQ requires no CLR changes and that all C# 3.0 extensions can be erased into ordinary C# 2.0.


Interestingly, for Visual Basic 9.0 we show that we cannot compile into Visual Basic 8.0 and we aim to show what extensions are required to Visual Basic late-binding runtime library. But again, these require no CLR changes.


Visual Basic a very interesting topic for programming language semantics research.

Microsoft should officially support Linux/Unix...?

That's what it sounds like you're saying.

Sure

As far as language tools go... Sure, I wish they would. Maybe it just doesn't make business sense, but I'm not totally convinced. And it's not like the idea is totally ludicrous: Sun supports Windows, after all.

Anyway, like I said, I can't really make a business case for it... It's just wishful thinking, plain and simple.

The business brains at

The business brains at Microsoft know if it's viable to support alternative OSs or not. And most likely it's not.

Stop this "wishful thinking" crap. "I wish" someone would just drop a couple million in cash on my doorstep.

Static typing where possible?

Erik, in your paper you agree with this quote by John Hughes:

it is a logical impossibility to make a language more powerful by omitting features

This statement sets off warning bells in my head. What about simplicity?

Contrast with Lao-Tse:

Thirty spokes share the wheel's hub;
It is the center hole that makes it useful.
Shape clay into a vessel;
It is the space within that makes it useful.
Cut doors and windows for a room;
It is the holes which make it useful.
Therefore benefit comes from what is there;
Usefulness from what is not there.

Simplistic Simplicity

I think you have a simplistic view of simplicity ;) Joking aside, I think the simplicity in terms of language design comes from well organized features and powerful primitives. Number of features is largely irrelevent if they are properly orthagonal and self consistent.

on orthogonality

Yukihiro Matzumoto, creator of Ruby:

The orthogonal features, when combined, can explode into complexity.

Great text, by the way.

Hence th esecond half of the

Hence th esecond half of the statement - properly organized. Matz organizes his language around the principle of least surprize and an "everything is an object" structure. Scheme supports numerous things and organizes everything around lambda expressions. Perl supports everything and is organized around a hodge-podge of english and unix (okay so not all organizing principles are successful...). Arguably the urge for a simple elegant programming languages goes down the same road that physics takes - a quest for a unifying principle from which all other features can be naturally derived.

I agree with simplicity and

I agree with simplicity and minimalism. What John Hughes refers to is the misconception that if you take an existing imperative language such as C, and leave out assignment, the resulting language suddenly suddenly becomes more powerful, i.e "functional" or "declarative". It just becomes "assignment free", and hence less expressive.

In your anology it would be like removing the roof of a house. You should be careful not to remove the essence of things.

Hope this helps,

?

Yes, I know, I've read "Why Functional Programming Matters".

The point I was trying to get across: C# 3.0 is too complex. Why?

Mike Gunderloy: "I freely admit that my eyes glazed over. As far as I'm concerned, C# is now going the way of C++: it's got syntax that I will never, ever use and is well on its way to complexifying its way out of my life."

Well

The point I was trying to get across: C# 3.0 is too complex. Why?

Well, they could make it simpler while retaining the expressive power that 3.0 brings to C#, but it wouldn't be C# anymore. ;-)

Kidding aside, I really wonder what Mr. Gunderloy ment when he said C# was "complexifying its way out of [his] life." Is he referring to lambda expressions and expression trees, or to their implementation in C# 3.0? If the former, then many would beg to differ, if the later, then if we fix the implementation, it wouldn't be C# anymore (syntactically, it would be difficult to find another implementation that fits in with C# 3.0). Whether or not these features should be added into a language that is as syntax bound as C# as an afterthought is a very good question.

I'm not really convinced that C# is getting more complex: I think it has been complex the whole time: its just that nobody noticed it until now. That is, C# (really any decendant of C) isn't very ameable to changes as fundamental as this without the addition of new syntax.

So, to answer your question, as I see it, C# 3.0 is so complex because its the syntax of the language requires it.

?

Okay, so the bridge collapsed because the concrete was poor. And the question becomes, why does Microsoft sell poor concrete?

Trolling aside, I was hoping to hear sound engineering justifications for:

"extension methods" (local extensions to existing classes in a statically-typed environment)

"expression trees" (macros that run at runtime because there's no macroexpansion time)

the many different forms of lambda (Smalltalk did fine with just one)

object and collection initializers (instead of a proper argument-passing convention)

query expressions (why isn't the new macro mechanism expressive enough to encode them, why does the standard need "query expressions" with magic method names?)

And so on.

Good questions

LtU is a good place to discuss the technical issues and questions. These questions are a good start.

good start

Good questions, yes.

But it seems Erik has decided to leave them unanswered, and I couldn't find definite answers anywhere else on the Net. It's sad.

Upon further thought.

I reread the article on expression trees, and I kind of have to agree with you. It does seem overly complicated. My guess is they realize that Java isn't their only competitor anymore. They have these “new” (but older than Java!) languages like Python and Ruby gaining popularity. (Python and Ruby are much more popular in the Windows world than they were just 4 years ago, and are rapidly gaining acceptance in the business world.) These languages allow all sorts of programming and meta-programming possibilities, and can walk circles around C# all day long when it comes to DSLs and library creation. (What I mean by that is, these types of languages can use their meta-programming “kung-fu” to create libraries that much simpler and more elegant.) They realize that they need to compete with that, but they’re syntactically bound (old code might break if you introduce new syntax), as well as being bound by other concerns. C# isn’t a “new” language anymore: they have a lot of customers who would be upset of all of their old code broke when they upgraded to C# 3.0 and who won’t take the risk of upgrading if C# 3.0 breaks old code. Python and Ruby, on the other hand, have had much longer to refine themselves, and haven’t had to fear customers getting upset and leaving over broken code. Heck, Guido wants to make changes to Python 3000 that could break a lot of old Python code, but he isn’t overly concerned about that. Microsoft, on the other hand, could never get away with that sort of behavior. The result: Python, Ruby, and other “open source” languages are much more dynamic languages (no pun intended), and have a much cleaner design as a result (old cruft gets thrown out).

You do bring up some interesting points, and it would be interesting to hear what sort of defense they have for their decisions.

Type-directed translation

The interesting thing about comprehensions and expression trees in VB and C# is that they are based on a type-directed translation.

For instance in Lisp/Scheme you use quote and quasi-quote to build code as data expression trees, whereas in VB and C# it is based the type to which you assign and the compiler will insert quotes for you.

My question is how would you do that in Python or Ruby?

Good question!

For instance in Lisp/Scheme you use quote and quasi-quote to build code as data expression trees, whereas in VB and C# it is based the type to which you assign and the compiler will insert quotes for you.

Oh, duh! I'm sorry I must of missed that. Yeah, it makes more sense now. Thanks for the comment. Of course the syntax is still kind of crappy, but like I said in an earlier post, Microsoft is rather constrained when it comes to modifying C#. They don't have a lot of wiggle room, change too much and old stuff breaks. That is not an option for Microsoft (they have customers; if all their costomers stop using .NET for fear that all their old stuff breaks, it's a pointless product).

My question is how would you do that in Python or Ruby?

Good question! To tell you the truth, I don't know Python very well. I don't know Ruby either, but from what I understand such meta-programming is possible in Ruby as well. I do know Lisp/Scheme pretty well. The only reason I used Ruby and Python as examples is because, I've heard all sorts of raving about meta-programming in these languages and these languages are much more popular than Scheme or Lisp. I've looked at django, for example, and it is a pretty cool framework, and does all sorts of stuff that would be difficult to do in C# (or is it difficult because I'm not a very good C# programmer?). If I understand correctly, Python does allow one to look at and manipulate syntax trees, but from what I've seen it looks more complicated than the expression trees in VB and C#.

Anyway, C# 3.0 does look like its trying to compete with (integrate the features of) more dynamic languages such as Python or Scheme. (Or at least they should, because if they don't they might find themselves behind again. These languages are rising in popularity!) I think this is a good thing, because I work in a Microsoft shop, so I have to code in C#. If C# 3.0 can gain the expressive power of Scheme (while still being C#!) that would be pretty cool.

Python does allow one to

Python does allow one to look at and manipulate syntax trees

How about Perl6?

Perl6

Perl6 has macros. See Apocalypse 6. The general approach seems to be Lisp-like, but the details are too weird for me to understand.

This is indeed what I was

This is indeed what I was referring to.

The Apocalypse was discussed here, of course.

Well, I'll be.

I haven't really followed the development of Perl6 much in the last few years. Looks interesting. Thanks for the link.

Ruby

Ruby doesn't have macros. You can embed queries in the language (sorta) by overloading operators on placeholder objects. Criteria, a Ruby library, does just that... doesn't seem to be very popular, though. A Smalltalk analog would be ROE.

An alternative approach would be using closures instead of macros, as this snippet by Sam Ruby does. But this doesn't work with databases, as far as I can see.

Python

All I could find for Python was SQLComp. It seems to be based on expression trees, just like LINQ. No type-directed translation - a wrapping call to make_query() is used instead.

On further thought... Erik, I don't see the point of type-directed translation here. Don't we lose type inference and composability?

Tools and IDE: Perhaps I'm

Tools and IDE: Perhaps I'm in the minority but I do work with a lot of C# code outside of the Visual Studio IDE. Visual Studio is good at many things but there are many text based tools and scripting languages that are convenient in breaking up the problem of managing a large code base. One must never forget that at some level of abstraction, the code is just a bunch of data that has to be managed like any other data.

Bottom line for me: Keep enhancing Visual Studio as a tool for managing code. But please! don't assume that an IDE is the only way to manage code. Make it as seamless as possible to approach the code in numerous fashions. I curse the days of VB when much of the forms information was stored in binary formats. Fortunately, Microsoft has gone away from these formats. I'd hate to see the day when the code was no longer viewable in Notepad.

Language and Type Systems: The two features of C# 2.0 that I most often use are the nullable types and typed dictionaries (genericity). The biggest issues that remains with nullable types is that not all nulls are equal - a value returned from the database is not a C# null, but rather DBNull.Value. I understand why there's a distinction, but it does cause some extra work going back and froth. Having typed dictionaries is fantastic, but we still have the remaining collections that are just typed as objects - accessing data collections being the most commonly used one. Would be nice if we had a method to type these statically.

Which I suppose is where 3.0 comes in. From what I've seen of LINQ, I think it's gonna be great. Instead of handling all this stuff as strings, we can actually use the type system to our advantage. Related to the first bullet point, though, I don't want to see the ability to create the code tied exclusively to the Visual Studio environment. There should be a way to declare or harvest the structure of the external data source that does not wed the IDE to the underlying link with the data source (e.g. SQLServer). In fact, one should be able to write code against a (currently) non-existant database.

Runtime and Libraries: Not to knock the competition, but I think it's been a mistake to lock the JVM up. There have been some improvements in Java the Language, but Java the Virtual Machine hasn't changed much in the last 10 years. Java the API library has been the vast source of their enhancements. They've done a very good job on libraries, but I can't help but think that improvements to the runtime environment were in order. The goal was to maintain a stable platform across all platforms - a goal which I think they've succeeded.

Still, MS is in the enviable position of being able to make changes to the CLR and C# (and VB, etc...). But I don't know that this flexibility will last indefinitely. Issues of version management and updating software take over at some point.

Transactions Everywhere: As long as you're putting new stuff together, why not something along the lines of single assignment variables with futures (ala Oz/Alice)?

WRT transactions, I don't necessarily agree that it makes writing concurrent programs that much easier. It does make rollbacks manageable and it does provide automatic locking. But writing concurrent applications is still difficult when we are in a shared state concurrency model. Those novice programmer will initially not use transactions. Then we they find issues in the concurrency, they are going to start locking everything. You still have to use judgement when deciding at what level you want the abstraction to be atomic. Place it too low, and you are not going to get proper concurrency. Place it too high, and you'll have a single process thread running, putting everyone else to sleep.

Futures in Scala

You mention single assignment/dataflow variables...I just posted a proof of concept implementation in Scala you might find interesting. Scala's unification of Functions and Objects lets us do some very neat things.

Future.scala

In there you'll find six implementations of futures:

1. Blocking/Dataflow -- if value not available, block until it is.
2. Promise -- externally settable result.
3. Lazy -- calculate on demand, in calling thread.
4. Concurrent -- spawn new thread for calclation, blocks on get until available.
5. Dependent -- calculated based on modifiable values, fixed expression.
6. Cell -- calculated based on modifiable, variable expression.

These are all unified under Function0[T], so any type of future can be passed anywhere you need a T. Scala's "implicit" keyword is used to "automagically" insert all necessary unboxing, and can also be used to lift functions on T to operate seamlessly on functions of Future[T].