Is Small Still Beautiful?

(An LtU-appropriate punchline awaits at the end.) I suppose I'm just old enough to have been raised on the "small is beautiful" philosophy, and I still hold in awe some languages built from a relatively spare set of primitive concepts: Forth, Smalltalk, Scheme, C and the Unix shell + utilities + pipeline all come readily to mind.

But recently I've had some time on my hands, and I spent it "swimming" about in the programming language space. A few observations follow.

Some of our modern languages (some a decade-plus old already) have type systems that require a PhD to understand fully. We have low-level threading models in languages like C and Java that almost require a PhD to use effectively in any sufficiently complex system. (Not to make a fetish of the PhD.)

In Pike's PowerPoint talk on systems research, recently posted in another thread, he mentions, IIRC, that 80% of the Inferno (?) effort was spent conforming to existing, externally imposed standards! Turning to the most mainstream, popular, *production* languages, we have TRULY giant libraries that boggle the mind.

Designing an effective GUI library for the modern, newly more complex UI was a grand challenge of the late '80s and early '90s.

But today, we have Java/JVM, Perl, Python, C#/CLR (different, but still pregnant with oodles of MS APIs), MS binary APIs, a growing body of de facto Linux binary APIs, and even fairly rapidly growing complex Scheme libraries (PLT) that actually require a separate, complex application to locate, manage and update - and that's prior to presumably learning and eventually using the libraries in our applications. We're not in Kansas anymore.

The documentation effort alone for any of these language specific "package databases" is daunting.

And all the while, the famous "principle of least surprise" grows stronger and stronger with each new generation of computer user and each new generation of computer programmer. It recalls the joking "10 principles" of successful programming language design, the first of which, IIRC, was "use C-style curly braces" :-)

I guess that Pike's PowerPoint on systems research had a big impact on me, and I readily seemed able to apply it to today's situation with programming language design and development. Will new languages increasingly be relegated to "toy" status until more and more design efforts and research just wither away?

And at least my aging collection of language texts still emphasizes a notion of a programming language made up of a small set of data and control concepts easily combined - typically, the smaller the better. It reminds me of my formal logic training, where the search for a set of primitives to provide the foundations of logic, set theory and mathematics formed some sort of holy grail (sounds a little like Scheme?).

So I ask - do we need an about-face? Do we need to study (and teach) how to build *large* programming languages: languages with type-checked, integrated SQL syntax; built-in rich XML support; myriad native persistence, serialization and network communications facilities; a diverse family of concurrency mechanisms; and language-level transaction support, including distributed transaction facilities (MQ Series style?) to better support cluster computing?

As for library infrastructure and a (poor) degree of platform-based language interoperability, we have the JVM we know and love today frankly by historical accident. We have the CLR because MS has to produce a "one-up" version of whatever else is popular in the computing world. I won't restate the many, many gripes about each made by folk targeting, or potentially targeting, these platforms for their new, innovative languages (while acknowledging that surely each also has its many interesting implementation virtues). But I will invite us to recall the gripes :-)

It's arguable that we need an academic/industry consortium effort to redesign the JVM (presuming we don't start from scratch), with a new, concerted focus on language support:

  • advanced calling-convention support, such as generalized last-call optimization (or, just for a brain teaser, think of efficiently supporting Common Lisp calling conventions, even CLOS multimethods, combined with higher-order functions);
  • integration of compiler analysis with runtime call/loop support for optimizing GC or thread-switch safe points;
  • optimized execution models to support logic programming and expert-system-style languages;
  • type systems divorced, via some well-defined barrier, from the (more limited) capabilities of the runtime/JIT, so that new, innovative (unanticipated) future type systems can live at the language level;
  • safe and efficient intermixing of manifestly and latently typed code and data;
  • rule- or specification-based per-language calling conventions to facilitate "auto-glue" supporting automatic cross-language library interoperability;
  • support for compiler and linker customization to accommodate a variety of module systems of varying complexity;
  • the same, potentially, for macro facilities - and yada yada yada.

That's just a brief scenario based on examples, but I hope you get the idea. I'm sure many of us could go on and on, based on current personal research or commercial interests, likely isolating even more fundamental and/or timely issues that beg for attention in order to support language innovation in this apparent new era of "Big is Beautiful."

Like it or not, are we in the era of "Big is Beautiful" language design, and if so, what are we to do about it?

Put another way, given the issues described above, the raw CPU itself gets in our way least of all! So what's the problem? The problem is the scale of the libraries one must support in a modern language. The problem is increasing the productivity of smallish research teams by sharing a low-level three-address code, a set of SSA optimizations, a code generator, and other relatively neutral infrastructure. The problem is composing language features, *larger* features, on a *larger scale* than the minimalist principles laid down in the days of yore.

In summary, it *appears* that the glue holds some promise, and clearly some languages benefit from it more than others. So can the "glue" truly become the *solution* for future language research, design and implementation?

Scott


Link

Link to Rob Pike's systems talks: http://herpolhode.com/rob/utah2000.pdf

In Pike's PowerPoint talk on systems research, recently posted in another thread, he mentions, IIRC, that 80% of the Inferno (?) effort was spent conforming to existing, externally imposed standards!

This doesn't surprise me. Do you have a link to the thread?

Like it or not, are we in the era of "Big is Beautiful" language design, and if so, what are we to do about it?

It's important to distinguish between languages and libraries. Languages can be simple and still be useful. Libraries increased in size and complexity because the applications we write increased in size and complexity. I don't think there is a way around this other than writing simpler applications.

Link to Rob Pike's systems

Link to Rob Pike's systems talks: http://herpolhode.com/rob/utah2000.pdf

I went to that specific talk as a grad student. And at the lunch we had afterward I remember arguing a lot, but not about what...

It's important to distinguish between languages and libraries. Languages can be simple and still be useful. Libraries increased in size and complexity because the applications we write increased in size and complexity. I don't think there is a way around this other than writing simpler applications.

My own hypothesis is that libraries have become more and more like toolkits: they define a framework within which the application is expressed, rather than serving as secondary components accessed through a simple API. Languages have followed suit, focusing more and more on top-down configuration of frameworks and not on building things bottom-up out of unrelated components.

Consortium or wiki?

Bootstrapping a consortium to redesign a universal computing platform is a difficult task. To start off a little simpler, how about a wiki that serves as an up-to-date survey of open problems (both theoretical and practical) and research results in PL? It would focus not on a particular language or methodology, but rather compare how different approaches fare and where they break down (the sorts of things you pointed out). Perhaps this could serve as a qualitative counterpart to quantitative performance benchmarks.

Simple languages work on complex platforms

Anecdotally, I find Python and Java to be well suited for writing web apps on Google's App Engine.

While these languages usually have a huge support system (libraries, preprocessors, bytecode manipulation, etc.), and may thus be considered large and complex languages, when writing GAE apps I find that I use only a small, simple subset of them.

In fact, one could even strip a lot of dynamism from Python (making it a simpler language), without impacting the experience (one of the reasons being that GAE in effect reloads code on every request).

So, at least for web apps on comfortable platforms, I think simple small languages do have a future.

-----------------------
P.S. Some more notes:

So I ask - do we need an about-face? Do we need to study (and teach) how to build *large* programming languages: languages with type-checked, integrated SQL syntax; built-in rich XML support;

No, provide macros (and extensible syntax, if you really have to).

myriad native persistence, serialization [facilities]

No, provide a metaobject protocol.
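
To make this concrete, here is a minimal sketch in Python of the kind of metaobject protocol I mean (the class names and the to_dict method are my own invented illustration, not any particular language's MOP):

    # A metaclass records each class's declared data fields, so one
    # generic serializer can live in a library instead of being baked
    # into the language as a native persistence feature.
    class Meta(type):
        def __new__(mcls, name, bases, ns):
            cls = super().__new__(mcls, name, bases, ns)
            cls._fields = [k for k, v in ns.items()
                           if not k.startswith('_') and not callable(v)]
            return cls

    class Record(metaclass=Meta):
        def to_dict(self):
            # one serializer for every class built with this protocol
            return {f: getattr(self, f) for f in self._fields}

    class Point(Record):
        x = 0
        y = 0

    p = Point()
    p.x, p.y = 3, 4
    print(p.to_dict())  # {'x': 3, 'y': 4}

The same hook could drive pickling, ORM mapping, or wire formats without any of them being language features.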

network communications facilities

No, that's a pure library thing.

a diverse family of concurrency mechanisms; language-level transaction support, including distributed transaction facilities (MQ Series style?) to better support cluster computing?

I don't know; I have a feeling that this stuff belongs in the OS.

Orthogonal is still beautiful

The relevant criterion is orthogonality (or representational independence), not size. The orthogonal core may be mostly of interest to the language implementer. Orthogonality should make the implementation of one construct independent of the implementation of another construct, particularly in optimization.

Harmonic analysis is the appropriate comparison, since "orthogonal" is formally defined in that context. While every complete basis can represent any function in the relevant function space, any given basis is better at representing some functions than others; notably the functions which are a finite linear combination of the basis functions. So trigonometric series represent one kind of function very efficiently, wavelets another. It's similar with programming languages.
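
For reference, here is the standard formal notion being borrowed (textbook definitions, nothing PL-specific):

    \langle e_m, e_n \rangle = \int e_m(x)\,\overline{e_n(x)}\,dx = \delta_{mn},
    \qquad
    f = \sum_n \langle f, e_n \rangle\, e_n

A basis represents f "efficiently" when only a few of the coefficients \langle f, e_n \rangle are significantly nonzero; "orthogonal" language features would, by analogy, contribute independent coefficients.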

Edited to improve clarity after being accused of being elaborate spam: Ouch!

Is some kind of elaborate

Is this some kind of elaborate spam?

I don't think so

I don't think so... it seems to me to be a good-faith effort to comment on the original post, although I'll admit it comes off as eccentric.

The post makes a rather tortured analogy to Harmonic Analysis, an area of mathematics I don't know much about. But I do know something about linear algebra, and programs are not linear combinations of features. Programming language design is notoriously non-linear, in that languages can be much more (or less!) than the sum of their features. The same is true of programs; Adobe's software is particularly notorious for being feature-oriented instead of solution-oriented.

The point that different languages have different strengths and weaknesses when expressing an idea is something most people here know already, although I think most of us can agree that there is a partial order between languages as well.

As for what orthogonality has to do with representation independence, I haven't a clue. The final sentence also seems incoherent.

Edit: OK, I think I see what the author might be talking about: representation independence implies some level of orthogonality in implementation. But orthogonal language features might have non-orthogonal implementation characteristics, e.g. nested functions and function pointers. If your language has both, and you don't restrict their interaction, then as a practical matter you'll need some kind of automatic memory management.
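
To make the nested-functions example concrete, here is a minimal Python sketch (my own illustration; Python happens to resolve exactly this interaction with garbage-collected environments):

    def make_counter():
        n = 0                 # a binding local to make_counter's activation
        def inc():
            nonlocal n        # the nested function captures that binding
            n += 1
            return n
        return inc            # the "function pointer" escapes its scope

    counter = make_counter()  # make_counter's activation is over...
    print(counter())          # 1
    print(counter())          # 2  ...yet the environment must live on,
                              # hence automatic memory management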

Edit 2: The more I think about this post, the more sense it makes. The analogy aside, it seems less eccentric and incoherent than it is dense.

My apologies, then. My

My apologies, then. My spam-o-meter may be overly sensitive these days.

Orthogonality in the core

His last sentence about an "orthogonal core" actually resonates with me quite strongly, even if it is phrased a little roughly.

He's using the word orthogonal in the pragmatic sense that there shouldn't be overlap between language constructs. As a very simple example, consider the abuse of templates in C++ to get recursion at compile time. Contrast this with D's approach of simply adding 'static' to function declarations, reusing the already existing concept of recursive functions. Not only is this simpler for the user of the language, but also for the language implementer.

I do think this idea is relevant when designing the 'core' of a language. However, when considering the "outer layers", one starts abandoning "genericity" in favor of "convenience", which implies that combinations of constructs may be wrapped up in a way that is no longer orthogonal but instead forms a kind of shorthand notation.
Naturally, some programmers will insist that this type of thing should go into libraries, while others will want to inject these "macros" directly into the syntax of the language. I believe this is often where debate about "big" versus "small" can be constructive, since the metrics used may be subjective or, more interestingly, domain-specific.

"Convenience kills"

I agree with your comment that even languages with elegant, orthogonal core features - most of them, anyway - give way to convenience toward the language's "outer layers."

Skipping complex, fancy language features toward the "periphery" for the moment, ask yourself how many different looping constructs or conditional execution constructs even "tiny" languages will nearly *always* feature. Go's "overloading" of the for special form is just fun with syntax - there are still a number of different looping constructs.

If I think about it, I'd favor the lambda and rec special forms for looping (one can easily replicate repeat/until, which is not the case with named let) even in a strongly typed language; and I would pick CL's cond special form for conditional execution (although a match operator with when guards might be a better single conditional execution mechanism, at the cost of much greater complexity).
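
As a rough, hypothetical illustration of that single-mechanism idea (in Python 3.10+ syntax rather than CL or Scheme, since that's what I have handy):

    # One construct covering equality cases, destructuring, and
    # "when"-style guards - my example, not CL's cond:
    def describe(point):
        match point:
            case (0, 0):
                return "origin"
            case (x, 0) if x > 0:      # a guard, like a when clause
                return "positive x axis"
            case (x, y) if x == y:
                return "diagonal"
            case _:
                return "somewhere else"

    print(describe((0, 0)), describe((3, 0)), describe((2, 2)))
    # origin positive x axis diagonal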

For plain conditional execution and looping, it's hard to justify much more than these simple but adequate constructs - yet we STILL find the panoply of named let, letrec, labels, while, repeat/until, do, for, loop, comprehensions, if/then/else, when, until, switch/case, and I'm sure I'm forgetting a handful more.

Needless to say, as you commented, we need not stray far outside the core to find ever more overlapping "convenient" features. On the way to the Harvard Freshman dining hall, someone had spray painted "Convenience Kills" on the otherwise nice brick wall surrounding that part of Harvard Yard - truer words were never spray painted on a wall!

Scott

Mmm...

Needless to say, as you commented, we need not stray far outside the core to find ever more overlapping "convenient" features. On the way to the Harvard Freshman dining hall, someone had spray painted "Convenience Kills" on the otherwise nice brick wall surrounding that part of Harvard Yard - truer words were never spray painted on a wall!

I don't know that I entirely agree with the spirit of your post. In my mind orthogonality serves two purposes:

  • Support for "generic" programming.
  • Providing elegant implementation and analysis.

I think the real sin is not having a clear mapping from your outer layers to the core language, or making the core unavailable so that users are forced to use non-orthogonal constructs (unless this is actually an explicit requirement of the language).

Non-orthogonal constructs (in this sense) are just "syntactic" templates. While I agree that too much sugar "causes cancer", I don't think a few well-considered shorthands are unreasonable for most languages. But perhaps this is just personal preference. Then there's also the question of whether a particular feature is better handled by a library or by the compiler itself. Different languages, different requirements.

EDIT: I thought I should try to return to the topic of your original comments by mentioning that perhaps libraries could also be designed with orthogonality in mind. Maybe "layering" APIs in a similar fashion is the way to go about making them glue-friendly.

orthogonality in the semantics

The non-orthogonality of a core language arises in the semantics. For my favorite example, the proper tail recursion requirement of Scheme makes binding (the creation of an activation record) and recursion orthogonal features.

Say you start with a language that has parameterized subroutines that use a function-call syntax, but whose parameters are simple static variables assigned by side effects. If you add binding to the language without the guarantee of proper tail recursion, the two features are non-orthogonal. In particular, it means that even if you write programs where only one activation record is needed per function (i.e., tail-recursive programs), you still pay the price of the binding feature. With proper tail recursion, the semantics are orthogonal.
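
A minimal Python sketch of the cost (my own; Python is handy here precisely because it does not guarantee proper tail calls):

    def count_tail(i, n):
        # Tail-recursive: the recursive call is the last action, so a
        # Scheme implementation would reuse one activation record.
        if i >= n:
            return i
        return count_tail(i + 1, n)

    try:
        count_tail(0, 10**6)          # one fresh frame per call...
    except RecursionError:
        print("stack overflow: we paid for binding we never used")

    def count_iter(i, n):
        # The same semantics with the tail call eliminated by hand:
        while i < n:
            i += 1
        return i

    print(count_iter(0, 10**6))       # runs in constant space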

I think Todd Veldhuizen's work on guaranteed optimization is along similar lines.

Tortured analogy

Actually I'm not aware of any formal definition of orthogonal for programming languages that can compare with the situation in linear algebra and harmonic analysis.

Even though wavelets and trigonometric series can represent each other, the coefficients in the infinite series do not decay fast enough for this formal equivalence to translate into a practical one. That is, even though the two bases are not orthogonal in the infinite setting, there is a sense in which they are "orthogonal" when the number and size of the coefficients are constrained. It has to do with defining "efficient representation" in a sense that is transitive: if basis A represents the functions in basis B efficiently, and basis B represents function f efficiently, then basis A represents function f efficiently. Note I am not claiming I have such a definition of efficient representation, just that any such definition should have this transitive property.

There is a similar conceptual linkage between how we use the term "orthogonal" and the notion of "efficient expression". For example, you can reduce the SK combinator calculus to one based on a single combinator, but doing so causes an explosion in the size of terms. This explosion is what makes me hesitant to deny that S and K are orthogonal, even though they can be expressed with a smaller vocabulary (basis).
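
The explosion is easy to see with the one-combinator ("iota") basis. A small Python sketch (my own encoding of the standard folklore derivations):

    S = lambda x: lambda y: lambda z: x(z)(y(z))
    K = lambda x: lambda y: x

    I = S(K)(K)               # identity is already a 3-symbol SK term
    assert I(42) == 42

    iota = lambda x: x(S)(K)  # a single combinator sufficient for both

    I2 = iota(iota)                  # I = ii
    K2 = iota(iota(iota(iota)))      # K = i(i(ii))    - 4 symbols
    S2 = iota(K2)                    # S = i(i(i(ii))) - 5 symbols
    assert K2(1)(2) == 1
    assert S2(K)(K)(99) == 99

Each step up in "minimality" of the basis makes ordinary terms longer, which is exactly the sense in which S and K earn their independence.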

There are other ties between harmonic analysis and expressing programs with formal (infinite) series in noncommutative algebra, but I'm not conversant enough with them to avoid violating the "ungrounded discussion" policy. If there is someone here with that knowledge, I'd love some pointers on the literature.

Kernel Languages And Expressiveness

Actually I'm not aware of any formal definition of orthogonal for programming languages that can compare with the situation in linear algebra and harmonic analysis.

There are two concepts which can be compared: kernel languages (pdf) and expressiveness (ps.gz).

Unfortunately, the situation is not as convenient or as fundamental as it is with mathematical orthogonal bases, since it is much more work to take a given language and decompose it into basis elements, and many language features that were not explicitly designed with this approach are bound to overlap in power.

Which is too bad, because I think more consistent and coherent languages would result from designers starting with these concepts at the fore of their minds.

Calculi perhaps?

It would be nice if there were some systematic way to analyze whether there is overlap between concepts. A pipe dream?

To me it appears that the various calculi are trying to achieve something like this - e.g., the lambda calculus describing the "space" of all computable functions.

Calculi Conundrum

To me it appears that the various calculi are trying to achieve something like this

Yes that's true. For example, part of the program for Milner's bigraphs formalism is explicitly to provide a lingua franca of distributed computation. The expressiveness paper explicitly explores concepts by translating into calculi.

I think the catch is that a lot depends on how you translate from a given language into your calculus. There may be more than one way to do it, and some of those ways may not clarify the concepts as well as others.

Language => Calculus <=> Semantics?

Oh, yes I see what you mean. I feel a little dumb for not having quickly skimmed the expressiveness paper before posting.

I think the catch is that a lot depends on how you translate from a given language into your calculus. There may be more than one way to do it, and some of those ways may not clarify the concepts as well as others.

I was going to mention that I find it interesting that a mathematical space can have more than one orthogonal basis, and that there exist transformations between them. The idea sounds a little similar to the concept of translating between formal languages, which is also related to denotational semantics, I guess.

Ian Piumarta and VPRI

... make points very similar to the one you are making.

So I don't think this is a "tortured analogy"... just one consistently ignored. ;-)

Extensibility

In the same way that a simple core library gives rise to a complex and feature complete set of libraries, an extensible core language gives rise to a complex and feature complete language.

Academic or research efforts can acquire all the necessities of a successful "modern" language by offering extensibility and the means to tap into an existing language library.

For me, I think the following features make a good core language: continuations, structure extension (prototyping), pattern matching, macros, message passing.

Open or Distributed Setting

While I've made a few stabs at designing a system with an extensible core language, I have been unable to reconcile the issues surrounding code distribution - especially in open systems (which is to say, code distribution across trust and security boundaries) - with the issues surrounding any extension aimed at supplying an end-to-end runtime feature (GC, transactions, persistence, data-flow secrecy, redundancy, disruption tolerance and well-defined partial-failure modes, demand-driven publishing, even distribution itself).

Extensible syntax, OTOH, causes no such problems... one simply distributes a post-parse structure (an AST being one possibility).
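
A minimal sketch of that "distribute the post-parse structure" idea, using Python's stdlib ast module (the wire-format details here are invented for illustration; a real system would use a proper serialization):

    import ast

    source = "total = sum(x * x for x in range(10))"
    tree = ast.parse(source)

    # The post-parse structure, not the surface syntax, crosses the wire:
    print(ast.dump(tree)[:60], "...")

    # The receiving side compiles and runs the structure directly:
    code = compile(tree, filename="<remote>", mode="exec")
    ns = {}
    exec(code, ns)
    print(ns["total"])  # 285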

In any case, we may do better to drop the notion of libraries-as-deliverables and start thinking about alternative forms of modularity at the service layer. The 'applications and libraries' approach to code distribution has far too many fundamental security, instancing (including upgrade), and composability problems - i.e., for a library to support any form of persistence or access to shared resources requires ambient authority, and there simply is no standard way to compose applications or services as there once was with Unix pipes and filters. I discussed this elsewhere, and won't go into more detail here.

What I'm aiming for is a combination of publish-subscribe model and tier-less programming, i.e. where live services - not libraries which can be instanced into services - are the main unit of composition... and where, when you create a new instance of a service, the appropriate bits and pieces of the implementation are distributed automatically to both the publisher and consumer resources (spontaneously forming multi-cast networks and such).

I'd also like to see a massive change to how 'GUI applications' are developed and distributed. The current approach to applications has trouble with at least persistence, zoomability, accessibility (for screen-readers, internationalization, transforms for small screens on mobile devices), composition (and styling), security principles, and sharing (i.e. consider producing a shared form F where edits by one user are seen by others, and then embedding this form in another arbitrary application A, then internationalizing A).

Something like REST + Publish/Subscribe as a basis could (with enough support) be better in almost every way... and would also integrate very well with the sort of automatic code distribution described above (i.e. distributing or duplicating part of the server on the client, along with the necessary subscriptions, but without introducing a DOM).

I think Small is Still Beautiful, but that we are being driven to these 'monolithic' solutions due to lack of appropriate tools for a very small number of very large problems: security (authority, privacy), persistence, and distribution. I think that if we solve these problems from within a language (or runtime) and achieve wide distribution of the runtime, then 'small' can be practical again... at least until we discover the next big problem. There will, of course, always be a next big problem, but it won't be a significant one until we make progress on the existing problems.

FWIW, I really like the direction of your comments

I'm partial to the "solve these problems from within a language (or runtime)" approach to the problems you describe. 20 or 30 years ago, we had a "system" and "environment" which solved certain hard problems of the day in a standard and reusable fashion (think Unix, VMS, etc.), providing "Terra firma" for both the design/development and programming/use of innovative, relatively "small" languages.

We need to find our Terra firma once again, and yes, it probably means solving the central hard problems you enumerate as well as very elegantly exposing these solutions to the rest of "the system."

Scott

The Demise of the Core of Orthogonal Features?

First, thanks for the comments. Second, I'm sorry my original post was pretty freewheeling.

Third, I found Pike's presentation on system research *analogous* to language research, as Pike speaks explicitly about *actually building* "complete things" during research efforts; and he bemoans excessive specialization and phenomenological approaches in current system research.

Off the cuff, I'd propose that the equivalent "problematic approach" in language research is the "toy language" with its elegant core of orthogonal and composable features, plus something akin to C's standard library and maybe, today, a TCP/IP library as well.

20 years ago, this might be a formidable tool (heck, awk is pretty cool, IMHO), but industry and open-source development of production languages renders these efforts "toys" - just like the too-focused, relatively minor and myopic efforts Pike criticizes in his presentation on the problems with systems research.

The issues are both (1) libraries and (2) language features. Regarding the former, I have in front of me P.J. Plauger's "The Standard C Library", all 498 pages of it including appendixes and the index. It includes the complete source of the C library along with very considerable and wise commentary. Brodie's "Starting Forth" weighs in at 346 pages, and Stroustrup's and Ellis's "The Annotated C++ Reference Manual" is 447 pages, including the index and, again, considerable amounts of sagacious commentary.

And the 1029 pages of Steele's "CLtL2", once the subject of either awe or ridicule based on its sheer size, now describe only a relatively modest-size language, its module/package system and its I/O, collection, object-system and mathematical libraries. A lot has changed in 20 years!

Now imagine a book comprising the printed Java Docs of the entire Java standard library (Server), including both GUI libraries (are there more now?), JDBC SQL DB access and so forth. I conjecture that such a tome, even minus much or any wise commentary, would likely run to thousands upon thousands of pages. And this tome would include no source code and little to no "specification quality" prose. Just thousands of pages of APIs!

Something has changed in the last 20 years or so, and part of it is the "expected" size of the library of a truly useful, "real" programming language, as opposed to a "toy" language.

20 years ago, I recall vividly arguments about whether or not Lisp should have a standard FFI; today, the "scripting" language Perl has built-in support for Unix shared memory and a library database, CPAN, that would take many careers to master.

As for language features, there are too many examples. In one recent online presentation, I recall a comment that MS's LINQ feature already comprises over a million lines of code! And, AFAIK, despite its alleged theoretical elegance, its primary use is for syntactically and semantically nice database queries. Gosh, weren't MS's ODBC (MS's original SQL DB API, akin to Java's JDBC) or the oodles of follow-on MS DB APIs enough?

So what happened to "orthogonality"? What happened to using libraries (typically function- or object-based) in lieu of new language features? Nah, I guess users want new language features that cost a million-plus lines of code instead.

We can take a more detailed look at Python's (not so) simple list comprehensions. But don't map, filter, zip, reduce, etc. do the job just as nicely? Why do we need a new syntax-supported language feature when "orthogonality" and "composition" are the hallmarks of good language design?
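
For instance, the same computation both ways (a trivial sketch of my own):

    xs = range(10)

    # The language-level syntax:
    squares_of_evens = [x * x for x in xs if x % 2 == 0]

    # The "orthogonal" composition of library primitives:
    squares_of_evens2 = list(map(lambda x: x * x,
                                 filter(lambda x: x % 2 == 0, xs)))

    assert squares_of_evens == squares_of_evens2 == [0, 4, 16, 36, 64]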

The answer: "small primitive features," "orthogonality" and "composition" may remain important, but THEY ARE NO LONGER the *primary* hallmarks of "good" language design.

Today, languages beg for EVER MORE SYNTAX-SUPPORTED FEATURES - whether it's Perl's bizarre (IMHO) flexible syntax, or Python's list comprehensions, or something even fancier such as Scheme's SRFI 42 "Eager Comprehensions", which, yes, cheats thanks to Scheme's ability to add new syntax at will, but also appropriately provides "collectors" for a wide variety of data structures instead of just lists.

And if that takes a few thousand, or a few hundred thousand, or even millions of lines of code - so be it. Users expect it: more language-level features and convenience, overlap be damned. I'll bet good money that the "concurrency issue", excepting C and C++, will be "solved" only by languages that provide extremely friendly and safe language-level, syntactic concurrency support to users - primitives, and users' theoretical acumen to properly "compose" and manipulate them, forever be damned.

So there we have it: large languages with many overlapping features, each with supporting, convenient language syntax; and on top of these relative language behemoths, giant libraries of literally hundreds or even thousands of modules, some of which add further syntactic support to the degree the language allows it.

That is what 95+% of programmers under 35 years old consider an "OK" *candidate* for a programming language.

So back to Pike's presentation and set of concerns: applying it to language design research, how will it be possible in the future for researchers to build "real" languages and not just myopically focused "toy" languages, small libraries, etc.?

Scott

Big language research

So back to Pike's presentation and set of concerns: applying it to language design research, how will it be possible in the future for researchers to build "real" languages and not just myopically focused "toy" languages, small libraries, etc.?

Big PL research is as dead as big systems research is. These days, you couldn't get funding to do a new OS or a new language, rightly because there is so much prior work out there that has yet to be digested.

Actually, new language research isn't so dead; e.g., consider Scala out of Switzerland. But even here, the research focus is on more bite-sized features as opposed to the language itself, which is a more pragmatic bundling of those features and not a noteworthy contribution by itself. Also, Scala builds heavily on the JVM, which is a huge savings in needed effort.

My approach is to focus on "toy" languages that are special purpose and complete, as well as "libraries" that rely on the extensibility features of existing languages. This is usually enough to explore/validate an idea, which, once validated as useful, could every so often be collected into a larger programming language effort (same with systems).

Not dead yet

Big PL research is as dead as big systems research is. These days, you couldn't get funding to do a new OS or a new language, rightly because there is so much prior work out there that has yet to be digested.

This is highly unfair to the numerous people doing "Big PL" research. Scala, GHC and PLT Scheme are all big languages in which people get real work done, developed entirely by academics. And your claim about Scala focusing on "bite-sized features" is just wrong - Martin and his team have put a big effort into coming up with a language that fits together nicely.

Other big new language design efforts outside of academia include Clojure and Fortress, both of which are Big PL in every sense.

I think OCaml should also be

I think OCaml should also be mentioned.

This is highly unfair to the

This is highly unfair to the numerous people doing "Big PL" research. Scala, GHC and PLT Scheme are...

Please reread my comment, and note the use of "Big PL research." I in no way said that big PL design/development was dead, just the ability to get a single grant to do this. Instead, you've got to scrounge around for various sources of funding using smaller proposals based on problems that don't necessarily require a big language solution, or the funding that goes along with such a solution. This also applies to publishing opportunities in a SIGPLAN conference.

And your claim about Scala focusing on "bite-sized features" is just wrong - Martin and his team have put a big effort into coming up with a language that fits together nicely.

Yes, I know this from first-hand experience. You have to go out and find funding for your language via various small grants that focus on bite-sized features, and the subject of those grants strongly influences the evolution of your language... because otherwise you won't get more grants.

Other big new language design efforts outside of academia include Clojure and Fortress, both of which are Big PL in every sense.

I'm not sure about Clojure: is it a big PL research effort with a team of 5-10 people and a stable funding source? Fortress is/was definitely an academic effort in that it has/had a government/academic (DARPA) funding source and is/was not funded by Sun. Do you know what happened to Fortress when their DARPA contract didn't get renewed? The other languages involved in the HPCS program (X10, Chapel) could be seen as the last real big PL efforts, but even these are projects with only a few people that heavily reuse existing technologies (e.g., X10 and Java).

In general, all research efforts have had to adjust to reduced funding and the end of big/long term grants. This applies equally to PL as it has for systems. These days, to survive and produce, you have to build on existing technologies and choose your innovation points very carefully.

There are at least three

There are at least three reasons I don't buy this:

1. Bush is no longer in charge of US science funding

2. Funding for multi/manycore and cloud computing projects (US industry, used to be mostly US government)

3. Funding for security projects (US government)

I do agree that there isn't as much interest in new general programming models for basic large-scale SE, as opposed to new languages that address the above.

Furthermore, the trend of empirical and more end-to-end examination of how we build software in conferences like ICSE and FSE is revealing that our languages are pretty disconnected from what's going on -- traditional big PL felt more like a mathematical art, and this work suggests we're finally starting to understand the domain like a science. Unless there's a concrete, motivated challenge for the language (like hw-driven performance or particular security properties), it's hard not to call bullshit on whole-cloth language designs as being too short-sighted.

Imagine proposing something like Self today (e.g., a big step like the Worlds work or Coherent Reactions). We might find it intriguing... but insufficiently compelling to move funding from, say, concolic testing groups. However, if we claimed the Worlds work is appropriate for an 800-core machine or for writing a secure, collaborative MS Office... that'd be different.

I understand your argument

I understand your argument but I'm going to disagree a bit:

  1. Bush is no longer in charge, but the US is becoming bankrupt. Even if we want to throw more money at science, we probably can't.
  2. What industry or an OSS group considers research isn't really at the same level as what academics consider research. Research to me is very much a pie-in-the-sky affair, and industry more often than not can't afford to work like that.
  3. Security projects are more concerned with existing software and techniques than with creating new languages that fix problems - e.g., bug-checking tools for C++ rather than languages that are simply safer in the first place. Very short-term pragmatic, even when funding comes from the government.

None of this has been proved or disproved: we are both making guesses about the future that may or may not pan out. At this very moment, I personally believe big PL research is dead, but resurrection is always possible if our priorities change.

Big steps... I'm curious, when do you think was the last time someone made a big step in PL? Early or late '90s? Sometimes I wish I was 20 years older than I am.

Last big step?

Big steps... I'm curious, when do you think was the last time someone made a big step in PL?

My "Ha, ha, only serious!" answer would be some date in the early 1960s.

I'm currently rereading a book that offers a highly-detailed description of a working implementation of a block-structured language with sophisticated features (such as call-by-name parameters). The compiler produces code for a virtual machine, which is also described in detail in the book.

The book is ALGOL 60 Implementation by Randell and Russell, completed in late 1963 (copyright 1964 by Academic Press). It is a fascinating exercise to re-read some of the classics in our field from the late 50s and early 60s and observe how many ideas from that time frame are still fresh (or barely emerging) with respect to mainstream programming.

The reference to mainstream programming instead of academic research is deliberate. IMHO a "big step" must actually enter the arena of practice; the lag in programming is still quite large.

FYI

And your claim about Scala focusing on "bite-sized features" is just wrong - Martin and his team have put a big effort into coming up with a language that fits together nicely.

Sean was one of Martin's postdocs...

how will it be possible in

how will it be possible in the future for researchers to build "real" languages and not just myopically focused "toy" languages, small libraries, etc.?

By building off of common language runtimes. If you look at the more... "real" languages developed recently, they all tend to be built off of the JVM or .NET. Scala, Boo, F# to a degree, Clojure... That allows the language designer to focus on the language at least and not the wide-ranging functionality required for practical application (and thus not-"toy" status).

Re: That is what 95+% of

That is what 95+% of programmers under 35 years old consider an "OK" *candidate* for a programming language.

Being under 35 and a programmer, I take offense at this, as well as questioning it ;-). Look how much code people write in C and C++ (without template libraries turning them into something else) even today!

IMHO, people use Java, C#, Python, and Perl more because of their VMs than their language features (well, except Python). Each of those languages includes a mechanism for distributing code libraries without worrying about ABI compatibility. This is a first for anything as far above the expressive level of C as they are, and it makes these languages powerful.

Complexity

Data point: as someone who has written a C# parser in C++ via hand-crafted recursive descent with k-token lookahead, Java is much simpler than C#. I converted the parser to Java by deleting a bunch of semantic constructs and swirling a couple, and switching the lexer to tokenize Java keywords. Java actually has a tight little grammar compared to C# (but not compared to SML or Haskell).

What are you replying

What are you replying to?

I'm not sure I understand your comment. It just looks like data, not information. What is it in response to? The idea that many programmers prefer C# and consider it a more beautiful language than Java, despite having a large discrete structure?

The next Revolution in Evolution will be small...

...here's my Not So Humble Opinion of what the future holds.

All current mainstream stuff will suffer ever-ballooning Mahlerian elephantiasis and gigantism.

Until a tiny, easily analyzable language with powerful global analysis, rewriting, refactoring and factoring facilities gathers enough steam to trump them all.

We are too small and dumb to understand the programs we are writing today.

We are going to need a lot of help from language design and automated tools to understand how the next generation of programs works.

I bet the authors of this future language will credit Joy, Cat and Factor more than Java and C#.

I dream of a gcc back end that outputs code in one of these small languages.... and then refactors and factors the result.

I'm convinced we can shrink the current corpus of gcc-compatible programs by a factor of 100, i.e., achieve the same functionality with a source-code body a hundred times smaller.

It's still just a dream...

Funny you should mention ....

Funny you should mention Joy, Cat and Factor and then speculate on shrinking the current code base by a factor of 100.

Dream or not, back when I was teaching myself Forth in the early 1990s, I was AMAZED at the number of features, the functionality, libraries, tools and programs crammed into very, very tiny Forth distributions. In those days, we used to speculate about how Forth and Lisp could share so many interesting general attributes, yet with Forth being so "tiny" and Lisp so "giant." I'm still not sure I systematically understand all the reasons why.

I thought Cat had real potential as a language design and implementation. I had just downloaded the distribution recently when I learned, if I understand correctly, that any further development and maintenance of Cat has come to a halt.

It's too bad, because aside from good ol' Scheme (I use PLT's Typed Scheme where possible) and Lisp, I'm looking to languages with manifest types (both for compilation/performance and for compile-time type checking).

Scott

Parts of a Solution?

First, I'm surprised that no one brought up arenas where small is not only beautiful but necessary - say, languages for phones or the oodles of different types of embedded applications.

Second, if "real" programming languages are now "big" (in either or both senses: libraries and features) and research is mostly conducted "in the small," we have a problem. I'm not qualified to even begin to address the latest fashions in research grants.

Third, I think there are at least parts of a potential solution all around us. In no particular order....

Something like the JVM can provide access to giant libraries of functionality, but the JVM wasn't really designed as a platform for "delivery of advanced research languages." So we rightly kvetch about TCO, other calling-convention limitations, the lack of fixnums, the boxing of doubles in numeric-intensive code, the generics implementation and so on. Oddly, something like the JVM "stack" that provided *fewer* features might be more suitable to the problem at hand!

Now IBM certainly loves the JVM, IBM is interested in language research, IBM works with academics, and IBM is no stranger to workaday computing pragmatics.

So where is the effort to rejigger the JVM to:

  • remove certain features, or make them more flexible, generic and "CPU"-like, to better support a broader range of language semantics;
  • provide first-class support for static data (static arrays, structs, etc.);
  • work through the multi-calling-convention support issue;
  • work through the calling-convention interoperability issue at least as well as the SWIG folks have, so that via some standard platform IDL (ideally produced by one or more new standard platform parsers) we might imagine writing goofy "skinned" MPEG players for our desktop boxen in SML and some latest-and-greatest concurrent constraint variant of Prolog, with some Java libs playing backup;
  • complement the current bytecode-to-JIT model with a "compile to native code upon install" model;
  • allow for some degree of pluggable module systems;
  • design pluggable source compilation and source management systems to support ye olde javac *.java file-oriented computing environment, Forth/Smalltalk/Lisp-style per-method compilation, and yet-to-be-imagined ways of "using" programming languages;
  • provide "hooks" to support customization for potentially widely different ways of determining recompilation dependencies;
  • redesign javadoc to support languages with different lexical conventions, while hopefully better supporting a "system-wide", multi-language repository of library/language documentation;
  • support a standard debugging infrastructure, with "hooks" for the language-specific intermediate representation(s) to propagate source-level information through to the compiler/runtime/debugger;
  • keep banging on tools for Eclipse to make it ever easier to build at least an OK "modern" IDE for new languages;
  • build the infrastructure for the equivalent of CPAN and its mirrors, to gather, catalog, accept bug reports on, version, etc. all of the oodles of new libraries we're going to have;
  • lay down some law (probably via greatly deprecated status granted to single-platform libraries) on the whole *nix/Mac/Win religious b.s. After all, the point is to have a lot of *cross-platform* language implementation and runtime infrastructure to support research - so no, you can't just post a paper in TeX in lieu of reasonable system-tool-generated API documentation.

I have other thoughts (such as widely divergent degrees of *will* to have one's research consumed by the masses from person to person, department to department), but my fingers are tired and you're probably bored by now.

Oddly, the very presentation on *system* research by Pike that put this "Big is Now Beautiful" language research conundrum in my head has a lot to say about what I'm proposing as a potential (partial) solution to the conundrum:

We need a REAL *SYSTEM* that supports language design, implementation, source/module management, IDE, documentation, compilation, calling convention "glue" generation, linking, and a flexible "modern" ready-made runtime.

We already have such a *system* in place now - Emacs; lex/yacc; gcc; gas; ar; make; TeX. The problem is that it's just no longer good enough for the age of researching, designing and implementing "Big is Beautiful" programming languages.

Scott

First, I'm surprised that no

First, I'm surprised that no one brought up arenas where small is not only beautiful but necessary - say, languages for phones or the oodles of different types of embedded applications.

There is no relation. C++ and Ada are both used in embedded environments, and they are hardly "small" languages.

Small runtime

I think the author of the above post was focusing on "small as in run-time".

I'm not sure I'd agree, though. The run-time for C/C++ is pretty much the operating system, by happenstance. If one were running the C/C++ run-time atop a capability-secure Java operating system, I doubt it'd look small or cheap. (Process-level isolation is super-expensive...)

If that's the case, I think

If that's the case, I think the author is mixing up many types of "small". The topic of this thread was about "small beautiful languages", not about runtimes. Generally small beautiful languages have heavier runtimes because many low-level details are abstracted over.

My notions of embedded programming are way out of date

I still think of bearded dudes up at NASA (I used to go to their Forth SIG every once in a while) using tiny Forth systems to cram control systems for "outer space stuff" into 16K - a picture that is probably now way out of date.

And yes, this would mean both a small language runtime as well as a small language design - and it's likely that here I do tend to conflate the two. So accept my apologies for faulty logic and forgive my backwards notions of embedded system programming.

Scott

Da Vinci Machine

It sounds like a number of things that you're interested in are within the goals of the Da Vinci Machine project. I'm assuming you know about it, so maybe you'd like to clarify why you don't think it's a good direction?

My only other thought, really, is that it sounds like you're really asking for a lot, much of which is outside the range of things that "programming languages" traditionally has encompassed. And not only in the sense of academic research... A lot of PL hobbyists are really just interested in tinkering with PL ideas, and lots of the stuff in your list just wouldn't scratch that itch. I think there are a lot of very good reasons why much of what you're proposing has not and will not come to pass, even if it should.

I must admit my ignorance

The Da Vinci Machine project is definitely at least kin to the types of solutions to the types of "Big is Now Beautiful" problems I've been thinking about. I've only skimmed the surface of the information available on their Web site, but it's a heavy duty effort to remove some significant impediments to programming language design and implementation. Very cool, and I'll be delving into this effort further.

Scott

We have the CLR because MS

We have the CLR because MS has to produce a "one-up" version of whatever else is popular in the computing world.

What? Please don't spread false statements around. MS invented the CLR at least in part due to uncertainty about JVM licensing with Sun. Please see the Wikipedia article on Microsoft's JVM.

The bigger picture is that Bill Gates' lovechild idea is "networkable graphics", which has more or less come to fruition (Silverlight). A managed runtime, such as the JVM or CLR, is necessary for such an idea to be possible. (This is not opinion... it comes straight from lead MS architect Chris Anderson's preface to the book, Essential WPF.)

OK

OK