The Evolution Of LINQ And Its Impact On The Design Of C#

MSDN Magazine, June 2007

"The language design team now had several prototypes to get feedback on. So we organized a usability study with many participants who had experience with both C# and SQL. The feedback was almost universally positive, but it was clear there was something missing. In particular, it was difficult for the developers to apply their knowledge of SQL because the syntax we thought was ideal didn’t map very well to their domain expertise."

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

They ruined it!

I like lambda expressions, type inference and extension methods. But this whole query expressions stuff leaves a very bad impression.

They basically ruined the language: added tons of new keywords and a syntax that does not look like a C family language at all just to appease people who are irritated by a few brackets and want their queries to look exactly like SQL.

I would accept such unnecessary syntax candy for a language like VB.NET. But C# used to be a decent language.

It is amazing: C# is now just 10 years old, and it already has accumulated an amazing amount of cruft. In another 10 years it will probably be an abomination.

I don't think the syntax

I don't think the syntax looks far from a C family language this way:

var locals =
   customers
       .Where(c => c.ZipCode == 91822)
       .Select(c => new { FullName = c.FirstName + “ “ + c.LastName, 
                          HomeAddress = c.Address });

Now the SQL-like syntax is a bit dumb I think, as it is unable to mimic completely SQL (eg. FROM comes after SELECT). But as a SQL user I am not shocked by it. Not a C# user though so you may be better placed to comment on the general direction of the language :)

That way is perfectly OK

That way is perfectly OK. It only uses "general purpose" language constructs like lambdas and extension methods.

The sql like syntax is what I have a problem with. It is basically a domain specific language that is only marginally shorter than the above, yet it adds a whole lot of keywords and complexity to C# and with the lack of brackets it looks totally out of place in a C# program.

MS seems to have a tendency to extend the language every time they think of something that would be neat, instead of making the language flexible enough that it does not have to be extended all the time.

I guess in C# 4.0 we can expect native support for XML statements in C#, and whenever a new buzzword technology gets hyped they will include it in the language to be buzzword compatible.

To clarify:

This is good.

var locals =
   customers
       .Where(c => c.ZipCode == 91822)
       .Select(c => new { FullName = c.FirstName + “ “ + c.LastName, 
                          HomeAddress = c.Address });

It replicates SQL syntax as good as possible using generic language constructs without adding any new keywords and complexity to C#.

This is stupid:

var locals = (from c in customers
              where c.ZipCode == 91822
              select new { FullName = c.FirstName + “ “ +
                          c.LastName, HomeAddress = c.Address})
             .Count();

It is not even shorter. And it does not look completely like SQL. It is just a completely useless bastardization of C#. It is really a pity, since I preferred C# to java.

My all time favorite is Clean, followed by Scala. But C# was not that bad, and I have to use it at work. So this really pisses me off.

Syntactic sugar

Just curious: how would you judge other "syntactic-sugars" in C#, like the foreach statement, the using statement or the lock statement? Each of them could be rewritten using (roughly) the same amount of code with while-loops and finally clauses. If you're really minimalistic, you could also replace every for loop with a while loop, and every switch-case with a series of if-clauses.

I'm not expressing an oppinion here: I'm just curious what kind of syntactic sugar is considered "ok" and what isn't.

Butting in

If I may offer my opinion here: the sugar of foreach(), using(), and lock() are all C#-flavoured. That is, they look and work like other block-level functions (for(), if(), while(), etc). They are C# "contractions" for dealing with common idioms, which factor those idioms into new (but syntactically-regular) block-level functions.

The SQL-like syntax for LINQ is very, very different. It does not produce new syntactically-regular C# constructs, but embeds an entirely distinct syntax in a specific location (as a collection-valued expression). Worse, it's not a replacement for an idiom as much as an alternative syntax for a subset of a type of function call (whew!). That is, the SQL-like syntax expands in the compiler to a series of chained calls to extension methods, but the mapping of SQL-like statments to methods is fixed, the scope of the mapping is non-extensible, and the range of possible chained calls is pretty limited.

Had the C# group implemented the SQL-like syntax as an extensible component of the compiler, so that one could define new mappings into new extension methods, I'd be much less opposed to the new syntax. But it was put in as a compiler special-case, and there's little to be done about it now.

It is, unfortunately, a perfect compromise: it's not exactly SQL, so you can't simply embed working SQL code; it's not C#, so you can't mix it directly with C# code; and it's not extensible, so what you get is what you've got.

The SQL-like syntax for LINQ

The SQL-like syntax for LINQ is very, very different. It does not produce new syntactically-regular C# constructs...

So if I got what you mean, this is the core problem: that foreach (a in b) { c; } seems "syntactically regular" while (from a in b select c where d) does not. Right? (As far as I can see, everything else you say applies to foreach as well)

What kind of syntax would have been "syntactically regular" for such a feature? What if they had picked a more traditional list-comprehension syntax, something like [a from b in c where d] instead of from b in c select a where d? Would that have been ok?

An interesting question

An interesting question. I haven't thought much about it, since I would probably use the dotted extension syntax in many cases:
var result = data.Where(x => x.foo > 100).Select(x => new {fieldA = x.bar, fieldB = x.baz});
It's not SQL, but it reads (to my eye) like pretty much any other C# code.

Regarding foreach, if in is a (meta)operator, then it's precisely a C-style block-level function, like for. Trying to define the SQL-like words as (meta)operators, though, runs into C#'s limitations (shared by all C-derived languages) regarding operator precedence, so simply piling on new words requires learning new (and rather complicated) rules for how they relate -- note that the section of the spec describing query expansion runs to more than seven pages.

Once again, I'm perfectly happy to find syntactic changes and expansion in C#, but I would have preferred it to be done in an extensible way, such that the provided SQL-like query expansions lived in a library, instead of being baked into the compiler. That way one could learn to read the code defining the new syntax and be up to speed, and then be able to follow any future syntax additions as well (this is the way Nemerle works, for instance). Instead, we get a monolithic block of new syntax, with a well-defined but arbitrary set of expansion rules, and the next time they do this, we'll have another, unrelated set of things to remember. The cognitive space available for C# isn't unlimited, and I rather feel the SQL-like syntax is an unnecessary squandering of part of it.

Please note I understand why they did it that way, I just don't happen to agree with the decision.

This may not have been your

This may not have been your intent, but the interesting thing about that question is that all the examples you mention are replacable with higher order functions with minimal syntax changes. Indeed, the (good version of the) Java 7 closures proposal is intended to allow them as such (with some syntactic sugar to make them look more like the traditional control flow constructs). Presumably C# 3.0 lambdas could do the same.

SQL query syntax is good

They basically ruined the language: added tons of new keywords and a syntax that does not look like a C family language at all just to appease people who are irritated by a few brackets and want their queries to look exactly like SQL.

SQL gets no respect. :) But here's a programming language technology that has survived for 30+ years and has not been supplanted. There's a better Lisp. There's a better C. There's not a better SQL.

You may have missed the part where the paper's authors did a usability study and found that the syntax you like so much, with the lambdas and chained method calls, was unusable. What should they have done at that point?

I've always felt that the way programs talk to databases was stuck in the stone ages, and that fixing it was a matter of programming language design.

I can't really disagree with the people who feel that programming language features ought to be a bit more general-purpose. But I haven't heard anyone try to argue substantively that LINQ isn't the best thing of its kind.

The Third Manifesto

There's not a better SQL.
[...]
But I haven't heard anyone try to argue substantively that LINQ isn't the best thing of its kind.

You might enjoy reading The Third Manifesto.

If only

they'd release a real implementation of their thoughts, to put their money where their mouth is. (Personally, I love their commentary, and I think releasing such a thing would be a fun to check out.)

The benefit of query comprehension syntax

The prime benefit of query comprehension syntax in C# 3.0 is that it automatically carries outer iteration variables in a SelectMany, Join, or GroupJoin clause. For example, this query:
  from c in dataContext.Customers
  from p in c.Purchases
  orderby p.Description, c.Name
  select p.Description + " was purchased by " + c.Name;
is rather unpleasant if written in lambda syntax:
  dataContext.Customers
  .SelectMany (c => c.Purchases.Select (
                    p => new { c.Name, p.Description } ))
  .OrderBy (temp => temp.Description)
  .ThenBy (temp => temp.Name)
  .Select (temp => temp.Description + " was purchased by "
                   + temp.Name);

A similar scenario arises with the 'let' clause.

Joseph Albahari

Re: The benefit of query comprehension syntax

Is it necessary to do it that way? Could you use:

from c in dataContext.Customers
from p in dataContext.Purchases
where p in c.Purchases
orderby p.Description, c.Name
select p.Description + " was purchased by " + c.Name

PLs should offer the tools for writing DSLs, not be DSLs themsel

PLs should offer the tools for writing DSLs, not be DSLs themselves. It's very bad from a language point of view to make specific functionality like SQL part of a language.

With a little imagination, even C++ could have an SQL DSL in it. For example:

BEGIN_DATABASE(database)
    BEGIN_TABLE(customers)
        COLUMN(id, int)
        COLUMN(name, string)
    END_TABLE()
END_DATABASE()

struct Q1 {
    int id;
    string name;
};

Query q1 = database.
    select(database.customers.id, database.customers.name).
    from(database.customers).
    where(database.customers.id >= 10 && database.customers.name.like("john"));

You are right

The sad thing is that C# 3.0 has the tools to write domain-specific language extensions (extension methods and lambdas). But just to save a few characters, they decided to bloat the language with unnecessary keywords and constructs.

I guess nobody at microsoft has ever heard of KISS. They still seem to think that complexity is a virtue.

Microsoft is a business enterprise,

so they are not interested in orthogonal solutions, they are only interested in delivering useful functionality to their clients. That's why the abomination that is called WIN32 exists.

I guess nobody at microsoft has ever heard of KISS

"Whoever does not understand LISP, is doomed to reinvent it."

embrace and extend

Once you have SQL embedded in the language you can start adding SQL-Server specific functions and perhaps Windows specific ODBC commands directly to C#. Then 3rd party programs written in C# become more tightly tied to the Microsoft architecture. Remember, Microsoft's goals are to cause business to accidentally become strongly tied to their platform. I'm not sure embedding a DSL isn't very useful for achieving the goals they really want to achieve.

How programming languages

How programming languages can create lock in is on topic for LtU, of course. But let's try to steer this thread away from Microsoft bashing, which is not.

Of course, the C# version is

Of course, the C# version is a lot shorter, and the C++ version violates the DRY principle twice, at 3 completely different places in the code. I'm sure anyone who has used MFC knows how error-prone constructions like that are.

But apart from that, I can't really see how this is supposed to work anyway:

Query q1 = database.
select(database.customers.id, database.customers.name).
from(database.customers).
where(database.customers.id >= 10 && database.customers.name.like("john"));

How exactly would you convert C++ expressions like "database.customers.id >= 10" into SQL queries? Would that work for the tertiary operator or type conversions that can not be overloaded? How much of the query can be checked at compile time, how much has to be checked at runtime? This approach seems to have serious drawbacks (which is probably why C++-programmers still use lower-level DB-APIs mostly).

PLs should offer the tools for writing DSLs, not be DSLs themselves.
I'm guessing they chose the other way so that IDE features like Intellisense can still work.

Why shouldn't the language provide this?

I'd rather have one SQL DSL with defined semantics in the language core than 18 ad-hoc open source ones.

One of the big things that Microsoft is trying to accplish with LINQ is to make the data persistence layer 'first class'. I think that's a good thing. Only time will tell how well it works in practice though.

Well, if the language is

Well, if the language is expressive enough you can have a standard SQL DSL as part of the standard library, which up to a point is what the LINQ guys tried to achieve, by the way.

How exactly would you

How exactly would you convert C++ expressions like "database.customers.id >= 10" into SQL queries?

The field 'id' of member 'customers' contains all the meta-information needed for formulating the SQL query. It also overloads operator >= so as that when invoked, it produces a binary expression of type 'greater than or equal'.

Would that work for the tertiary operator

What does the tertiary operator has to do with SQL?

Even if it was needed, a wrapper IF class is more than enough.

or type conversions that can not be overloaded?

You can always use wrapper classes for those types.

I'm guessing they chose the other way so that IDE features like Intellisense can still work.

Intellisense can work in either setup. It's a matter of parsing and translation.

How exactly would you

How exactly would you convert C++ expressions like "database.customers.id >= 10" into SQL queries?

The field 'id' of member 'customers' contains all the meta-information needed for formulating the SQL query. It also overloads operator >= so as that when invoked, it produces a binary expression of type 'greater than or equal'.

I guessed something like that. Using the C++ type system like this always sounds nice in theory, but in practice it leads to incomprehensible compiler error messages, code that's hard to understand, because the lambda-abstraction isn't perfect and you have to think on the type-system-level if you're looking for an error. If wrapper-classes are involved, the code is usually harder to write, too, because you're essentially writing code in 2 different languages. The compiler often can't spot errors, e.g. if you're using the ternary operator instead of some wrapper-IF-class. Also, it doesn't make debugging any easier.

I respect the intellectual accomplishment behind things like boost::lambda (and I know the people who made it are a lot smarter than I am), but I really doubt it's practical usefulness. FWIW, hardly any C++ programmer I've worked with uses libraries like that in real-world projects.

Intellisense can work in either setup. It's a matter of parsing and translation.

Now look at the state of current C++-IDEs, and current IDEs for Java, C#, VB, and think again...

I guessed something like

I guessed something like that. Using the C++ type system like this always sounds nice in theory, but in practice it leads to incomprehensible compiler error messages, code that's hard to understand, because the lambda-abstraction isn't perfect and you have to think on the type-system-level if you're looking for an error. If wrapper-classes are involved, the code is usually harder to write, too, because you're essentially writing code in 2 different languages. The compiler often can't spot errors, e.g. if you're using the ternary operator instead of some wrapper-IF-class. Also, it doesn't make debugging any easier.

I respect the intellectual accomplishment behind things like boost::lambda (and I know the people who made it are a lot smarter than I am), but I really doubt it's practical usefulness. FWIW, hardly any C++ programmer I've worked with uses libraries like that in real-world projects.

My example did not involve lambdas. It was a DSL for SQL. The purpose of it was to show how to possibly make the database handling typesafe using libraries instead of hardcoding it in the language. As a DSL, it's easier to use, easier to understand, typesafe, and the errors it produces are much more comprehensive than anything else.

Now look at the state of current C++-IDEs, and current IDEs for Java, C#, VB, and think again...

Microsoft Visual C++ 8 works flawlessly in this respect. But in this case the compiler vendor is the same as the IDE vendor.

My example did not involve

My example did not involve lambdas.

Have a look at boost::lambda. They create special types for lambda parameters, override operators, create wrapper classes for non-overridable operators. That's essentially what you suggested.

It was a DSL for SQL. The purpose of it was to show how to possibly make the database handling typesafe using libraries instead of hardcoding it in the language. As a DSL, it's easier to use, easier to understand, typesafe, and the errors it produces are much more comprehensive than anything else.

I guess it might produce comprehensive errors, I just doubt they'll be comprehensible. (SNCR)
But seriously: Look at the compiler error messages you get in "simple" C++ programs that use STL: If you know the technology in and out you can usually figure out what's the problem, but especially for C++ beginners the messages are anything but helpful - sometimes they can't even tell in which file the error is. Do you really think this is going to be easier with a complex type system like the one you suggest?
Also, I have serious doubts about that "easier to understand" part: I read that Linq article, and I think I pretty much understand the basics. The compiler does most of the work (especially the expression -> expression tree transformation) On the other hand, to really understand what's going on in such a macro-template-SQL-DSL I'd have to dig a lot deeper. And as soon as I have to do a little debugging, and have to step into one of those template functions, I really have to know what's going on under the hood.

So in the end, I guess we have contrary oppinions and experiences about this subject. Good thing there's more than one programming language in the world!

Have a look at

Have a look at boost::lambda. They create special types for lambda parameters, override operators, create wrapper classes for non-overridable operators. That's essentially what you suggested.

No. I never used lambdas. I used operator overloading to form an SQL expression tree.

But seriously: Look at the compiler error messages you get in "simple" C++ programs that use STL: If you know the technology in and out you can usually figure out what's the problem, but especially for C++ beginners the messages are anything but helpful - sometimes they can't even tell in which file the error is. Do you really think this is going to be easier with a complex type system like the one you suggest?

That's a fault of compiler designers. It really depends on what you consider being the error in templates: a) is the template wrong, or b) is the template used wrong?

For me, it should always be b), because the compiler can semi-compile templates and check them for consistency.

Here is a practical example in VC++ 6.0:

#include 
#include 
#include 
using namespace std;

struct Foo {
	int i;
};

struct Foo_greater {
};

int main() {
	vector v1;
	sort(v1.begin(), v1.end(), Foo_greater());
	return 0;
}

The error output is:

c:\program files\microsoft visual studio\vc98\include\algorithm(633) : error C2064: term does not evaluate to a function
        c:\program files\microsoft visual studio\vc98\include\algorithm(598) : see reference to function template instantiation 'void __cdecl std::_Unguarded_insert(struct Foo *,struct Foo,struct Foo_greater)' being compiled

By opening the source code at file 'algorithm' at line 633, the code is:

{for (_RI _M = _L; _P(_V, *--_M); _L = _M)

The source code at line 598 is:

_Unguarded_insert(_F, _Ty(*_F), _P); }}

Let's see what the compiler knows:

1) it knows where the template 'sort' is used.
2) it knows that the predicate P of function 'sort' requires 'operator()(A, B)'
3) it knows that predicate P is 'Foo_greater'.

With all these things known, why does the compiler output such a crappy message? The compiler could easily have said:

file 'main.cpp', line 15: struct 'Foo_greater' does not implement 'bool operator(const Foo &a, const Foo &b)'

I think the problem with template error reporting is the wrong choices by the compiler implementors, not templates themselves.

Also, I have serious doubts about that "easier to understand" part: I read that Linq article, and I think I pretty much understand the basics. The compiler does most of the work (especially the expression -> expression tree transformation) On the other hand, to really understand what's going on in such a macro-template-SQL-DSL I'd have to dig a lot deeper. And as soon as I have to do a little debugging, and have to step into one of those template functions, I really have to know what's going on under the hood.

Nothing prevents you from stepping into the expression-forming code. In most cases, you wouldn't need to go deeper, because you would know the SQL statement is formed correctly. But if you need to go deeper, you can always step in the code.

So in the end, I guess we have contrary oppinions and experiences about this subject. Good thing there's more than one programming language in the world!

I haven't seen much an argument, though. Perhaps you care to elaborate on your opinion?

I haven't seen much an

I haven't seen much an argument, though. Perhaps you care to elaborate on your opinion?

Well, your point seems to be: "With a sufficiently smart compiler, all those problems would vanish". There is no argument against this, except that such a "sufficiently smart compiler" does not exist. Now either compiler writers are stupid (which I doubt) or giving sensible error messages in any case (especially the cases you never anticipated) is really a hard problem (read: AI-hard, like parsing a sentence in a natural language).

Anyway, I can only use the compilers that are available and work with the co-workers I have, and with those, building a non-trivial DSL with the C++-typesystem in a real-world application would be suicide.

Not really.

Now either compiler writers are stupid (which I doubt) or giving sensible error messages in any case (especially the cases you never anticipated) is really a hard problem (read: AI-hard, like parsing a sentence in a natural language).

Or the compiler writers had priorities and fixing their error messages was not a top priority for them (because they had to ship the product).

Anyway, I can only use the compilers that are available and work with the co-workers I have, and with those, building a non-trivial DSL with the C++-typesystem in a real-world application would be suicide.

The situation is not as bad as you describe. In the context of SQL, 'strange' errors would be minimal anyway.

My point is that it is better for such functionality to come as a library and not as part of the language. Changing the compiler is much more costly than changing a library.

In the context of SQL,

In the context of SQL, 'strange' errors would be minimal anyway.

I seriously doubt that. If you want to convice me otherwise, I guess you'd have to give me a sample implementation and show me that the compiler errors you get for typical porgramming mistakes are helpful for someone who doesn't understand how your DSL is embedded into the type system (i.e. a user). But I guess that would be a bit off-topic here ;-)

My point is that it is better for such functionality to come as a library and not as part of the language. Changing the compiler is much more costly than changing a library.

So, your point is: If something can either be done with a (sufficiently rich) typesystem (like that of C++) or can be hard-coded into the compiler, then doing it in the compiler is generally bad. Is that what you mean?

If so, how about lambda expressions? What about tuples and tuple assignment (e.g. a, b = b, a)? What about list comprehensions? All of these could be implemented (at least in C++) without explicit support from the compiler - yet many modern languages like Haskell oder Python provide these features "hard-coded" into the compiler.

I seriously doubt that. If

I seriously doubt that. If you want to convice me otherwise, I guess you'd have to give me a sample implementation and show me that the compiler errors you get for typical porgramming mistakes are helpful for someone who doesn't understand how your DSL is embedded into the type system (i.e. a user)

My DSL is not embedded into the type system, it uses the type system of C++. You don't have to see an implementation to understand that strange errors would be minimal, because the concept is very simple, and there are very few templates included in the library. Each column of the database has a C++ type, and expressions on columns accept values of that type only; no templates involved.

If so, how about lambda expressions? What about tuples and tuple assignment (e.g. a, b = b, a)? What about list comprehensions? All of these could be implemented (at least in C++) without explicit support from the compiler - yet many modern languages like Haskell oder Python provide these features "hard-coded" into the compiler

You are comparing oranges and apples. SQL is very specific, it is used for databases. On the other hand, lambda expressions, tuples, list comprehensions are generic computation tools.

If you can do a language where all these computation tools are not hardcoded in the language but are part of the library, then you would have the advantage: each time you want to change or add something, you wouldn't have to modify the compiler.

LINQ domain-specific?

LINQ isn't tied to SQL - you can use it to query anything. For example, I recently needed to use reflection to extract all types in an assembly that were based on 'TestBase', and then get all their parameterless public static methods. Here's the LINQ query:
  from t in Assembly.GetExecutingAssembly().GetTypes()
  where typeof (TestBase).IsAssignableFrom (t)				
  from m in t.GetMethods (BindingFlags.Static | BindingFlags.Public)
  where m.GetParameters().Count() == 0
  orderby m.Name
  select m;
I much prefer this to the imperative approach.

What if the domain is

What if the domain is "querying"?

If the domain is "querying",

If the domain is "querying", perhaps that's general enough for first-class support in a programming language.

Sure. We have no argument

Sure. We have no argument about this. But that's neither here nor there, since a small core language+libraries is first-class support, if the language is expressive enough. That's an orthogonal issue.

By the way, this discussion makes it clear that SQL-like queries can indeed by integrated into languages, for objects of various kinds, if one wants - by the route chosen by the LINQ designers or using a different implementation approach.

One problem with the

One problem with the implementation of the SQL-like syntax for LINQ is that it's treated as such a secial case by the compiler. There are certain optimisations that are performed internally on the query (when the query is specified in the SQL-like syntax) that render it unusable for certain kinds of non-DBMS collections. That's an implementation issue, of course, but I think it's due to the language group adding the SQL-like syntax to the language for a very small and specific use case, instead of as a more general solution.

The saving grace, for me, is that the SQL-like syntax is entirely optional in the language. I can write queries using dotted chains and be quite happy. That's a very nice thing.

There are certain

There are certain optimisations that are performed internally
on the query (when the query is specified in the SQL-like syntax)
that render it unusable for certain kinds of non-DBMS collections

Can you give an example?

Query translation rules

Have a look at the C# language specification (specifically, §26.7.1.7). The translation for
from c in customers select c
becomes
customers
which bypasses the projection process. In the DBMS context, this is a useful optimisation. In other contexts, projection has side effects, or is otherwise useful (for instance, the projection process might serve to materialise an object only referred to "virtually" in the query itself).

Prepared Statements

The other thing LINQ doesn't particularly map cleanly to is prepared statements, since it generates a fresh SQL every time.

It may have been better for LINQ to be defined using an attribute against an existing function. e.g.

@SQL
def FindUsersByAgeAndPostcode(age, postcode):
    ...

That should work

I think creating prepared statements should work. See this blog entry from Rico Mariani from MS.

How does this actually work

How does this actually work (compilation-wise)?

Expression Trees

That is actually quite nice. It uses a feature called expression trees. Basically you can convert each lambda into a function or into an
expression tree.

When you have a complex query with select, where, orderby etc, it gets converted into an expression tree which can obviously traversed straightforward and converted to SQL.

But it can also be analyzed, optimized and converted in some other representation like a prepared statement.

For example if you have something like a select condition

var young=persons.Where(person=>person.Age<20 && person.Age<30);

the expression graph can be analyzed and the redundant expression persion.Age<30 removed before converting into SQL.

That's the part I know

That's the part I know about... I was wondering at what stage can the DBMS get involved to optimize the "plan" based on indexing information etc (i.e, "statis SQL binding").

The generic flow is

The generic flow is something like:

code -> [compiler] -> extension method calls -> [IQueryable implementation] -> expression tree -> [IQueryable implementation] -> operations

In the specific case of LINQ-to-SQL using C#, it ends up being something like:

C# code -> [C# compiler] -> extension method calls -> [query provider]-> expression tree -> [query provider] -> SQL code -> [query planner] -> DB operations

The "expression tree -> [query provider] -> SQL code" step involves the query provider "compiling" the expression tree to produce SQL "object code". This is normally done each time the expression is used (specifically, each time an iterator is opened against the expression). If you're going to re-use an expression, you can request the query provider to capture the compilation, so that it can skip the compile step. The captured compilation is parameterised by the collection object, so that the same "pattern" of query can be applied to a variety of objects without recompiling.

The compiled query will produce structurally identical SQL code each time. I believe that it will also use SQL parameters where possible, so that re-application of the captured compilation will attempt to produce completely identical SQL code (expect for parameter values). This is where the DBMS gets involved (for SQL Server, at least): when it sees the first SQL statement, it creates a query plan. When the query is re-used, SQL Server sees the same SQL query, and re-uses its query plan. That's about the extent of optimisation at the moment.

Note that the nature of the captured compilation is left to the query provider -- it's opaque outside that scope. It would be perfectly reasonable for a given query provider, written by a given DBMS vendor, to optimise the captured compilation using information from the DBMS as it sees fit. As far as I'm aware, no one's doing that yet.

Note as well that none of this is specific to DBMSs, as all of this processing is intended to be abstract. One could produce a query provider that operated against a web-service query interface for example, and produced query code in a service-specific form when compiling an expression tree. Such a provider could also make use of whatever "hinting" was available when optimising its compilation.

Although this is a bit far afield of the original posting, I think that LINQ-to-SQL is as interesting in its own right as LINQ itself. LINQ has been credited with "sneaking lazy functional evaluation into the mainstream", and that's true. But LINQ-to-SQL can be seen as "sneaking multistage compilation into the mainstream", at least as a first step.

IQueryable

IQueryable is likely to be the most interesting LINQ implementation in the near term (I'd written "provider" earlier, but that's already got a different meaning around LINQ). Various implementations of it are already showing up, and can be educational to work through:

And, since it was mentioned, Rico's post on LINQ to SQL performance, including query compilation: DLinq (Linq to SQL) Performance (Part 4)

I hope this helps --

Keith Farmer [MSFT: LINQ to SQL]

Thanks guys, I should have

Thanks guys, I should have remembered IQueryable. Just to make sure I am not confusing things: IQueryable is the "protocol" between what Wolf called hte "query provider" and the compiler (or is the run time system)? I thought it operated on a much higher level than that.

There are a set of

There are a set of extensions that live on the Queryable class, that target objects of the type IQueryable (IQueryable of T, actually). The presence of these extensions means that, if you provide an implementation of IQueryable, the compiler can emit calls to these extension methods, which themselves orchestrate calls against the IQueryable interface as the blog postings I linked to demonstrate.

One interesting side effect which I should try out some time would be packaging up the queries as functions that take an IQueryable of T. Then you could swap out which IQueryable you want to use and get equivalent queries made to different data stores (potentially with different query languages).

I definitely recommend following a couple of the query providers that have been posted, to get a feel for how they're put together.

So, as Wolf notes, it's not outside the scope of LINQ for a provider to use additional information to inform how it translates an expression tree to a particular query against a particular data store.

User name

I suggest changing the user name to you real name, so that you don't have to add your name manually...

I like extension methods

I have to say the part I like most about all this is extension methods. They seem very useful and look like a very clean way to work around some common limitations. (I'm not a C# programmer so please forgive me if I say something completely wrong).

I'm wondering, do they only work on interfaces or can you use them on any type? If you can then you can add methods to already existing types, simulating something like Ruby's open classes, and you can't break encapsulation. Do extension methods work with autoboxing? If they did then maybe you can write something Rubyesqe like:

10.Times( {
    Console.print("hello");
} );

Does the type system allow an extension method to act as an implementation of a method defined in an interface? The reason I ask is because adding an interface method is a breaking API change, because any implementation that did not have the new method would no longer compile. If you could add a method to an interface and also provide an extension method as a default implementation they you could easily get around this limitation (without resorting to AOP, ugh!).

Check the spec

Check the C# language specification, §26.2 for all the low-down details on extension methods.

They can indeed be defined on any type, but they don't work with autoboxing (as usual, I think string gets special treatment). There are other (subtle) limitations at the moment, such as no support for extension properties, but nothing that can't be fixed later.

Regarding using extension methods to patch "broken" APIs, you might be interested in this.

Humour

The funny thing is that IBM had embedded sql in Cobol 20 years ago.

Yes...

But with COBOL's syntax, no one noticed.

PG'OCaml

I'd like to insert an obligitary advert / link to PG'OCaml which is my SQL syntax extension for OCaml.

It works by asking the PostgreSQL database to compile the SQL statements and return the types of their placeholders [arguments] and tuples [type of returned rows]. We can use this information when compiling the OCaml program to extend type checking and type inference from OCaml code straight into inline SQL code.

The whole thing is only a few hundred lines of code (a camlp4 macro in fact), but adds the complete PostgreSQL SQL syntax, type safe, into OCaml. And we've even used it to develop some very large web software (in other words: yes, it really works!)

Rich.

Why not a programming model like db4o?

I find db4o (www.db4o.com) the nicest database approach out there. I think it increases productivity at least five times, since it also takes care of modifying the schema as the code changes. The db4o approach makes the class definition the actual schema.

Not A SQL-Like Syntax

LINQ is not about a SQL-like syntax directly. See this:
[ x | x <- xs : x % 2 == 0]
(This is a monad comprehension for list monad.)
Now see this:
Select x from xs where x => x%2==0;
And "Select", "From" and "Where" are used because programmers have a near enough meaning for them in mind.
This is a more broad pattern of composing functionality in a very generic way.
So this LINQ thing can point to - potentially - every "set" that is somehow like a 'iteratable' list.
(And this is just a bit of Monad pattern!)