The Next Mainstream Programming Languages

I didn't see anyone post them yet, so here are the slides from Tim Sweeney's POPL talk entitled "The Next Mainstream Programming Languages: A Game Developer's Perspective". I know Tim and I aren't the only game developers who follow LtU, and I figure even non-game developers might find them quite interesting!

http://www.cs.princeton.edu/~dpw/popl/06/Tim-POPL.ppt

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

PDF Version of the slides?

Are these slides available as PDF-Version as well?

PDF version

I found a pdf version on the homepage of the "Advanced Functional Programming" seminar course at Saarland University

(I haven't been able to download the .ppt (server seems down), so I'm not sure it's exactly the same.)

PDF of Tim Sweeney's POPL talk

The PDF you found was created from the PowerPoint slides using Keynote -- so they are the same, except for animations.

Good points...

As an ex-game coder I agree with his empasis on concurrency, this is only going to be more and more important and not just in games. This is the major angle people interested in functional programming can leverage with people interested in performance.

Then again the only success I had in getting functional code into a game engine was a scene-graph scripting language which would have been hell to debug if it had side-effects.

Not sure about being able to eliminate all numerical/array index etc. exceptions though. A lot or most maybe but not all IMO.

Also not sure about lenient eval' either.

Very interesting and important stuff.

The conclusion is that we need better 'type' systems that can catch more requirements and programmer assumptions. This ties in beautifully with the other thread about static vs dynamic typing. Programming languages should generally go the static way, doing as much as they can in compile time, but also leave the door open for dynamic stuff, should it be necessary.

I agree with most of the points raised, but I have questions on the following:

1) I do not see how variable length integers are that important for games. Wouldn't 64-bit integers be enough? what game value would go over 64 bits?

2) bounds checking is not needed if the language makes indices be strong types. ADA has this feature very nicely done.

3) I did not understand how horizontal subclassing will offer more than vertical subclassing, as the new features introduced by horizontal subclassing will not be used by the rest of the game engine anyway without modifying the engine.

4) I do not agree that garbage collection is the end to memory management. I think there exists a 'calculus' for resource management (or effects management). After all, if we take a program and lay down its instructions one after the other, it is pretty obvious when an effect takes place.

I liked the part that shows how ineffective "modern" programming languages are. In just 3 lines of code, there are 7 possible sources of error or uneccessary overhead. If only programmers could realize this...

A few things

Storage size is important for performance - smaller storage requirements improves usage of cpu caches and can help prevent paging and keep minimum sys requirements down.

For some things just having a form of non-null reference would be handy. In fact wouldn't using C++ references prevent the problem that the parameters could be null here?

C++ refs: You'd think so, wouldn't you?

Consider the following code:

T* f() {...}
void g(T& y) {...}

void h()
{
  T* x=f();
  T& y=*x;
  g(y);
}

I've seen this happen, when f() and g() were written in different epochs of a team's development practices. Say, the original developers didn't like references, so they wrote f() to return a pointer; later hires managed to get the culture changed, so new code, such as g(), was written to take references. Then h() was written that had to talk to both.

C++

Null references are undefined behavior but they can still be created. An example

agreed !

It seems to be a way to get around non-extensible classes in a static class world.
--
Trojan Encyclopedia

Horizontal Subclassing

3) I did not understand how horizontal subclassing will offer more than vertical subclassing, as the new features introduced by horizontal subclassing will not be used by the rest of the game engine anyway without modifying the engine.

I think the idea of "horizontal subclassing" is to enable modification, extension, and redefinition of an already existing class. For example, altering an Actor to give it another field that *all* classes extending Actor will inherit and without editing the Actor source code.

It seems to be a way to get around non-extensible classes in a static class world.

Tim, am I close?

Open classes?

I think the idea of "horizontal subclassing" is to enable modification, extension, and redefinition of an already existing class.

Is not this usually called open classes?

How is it different from just adding new fields to class Actor?

How is that different from just adding new fields to class Actor? If all classes extending Actor will need those new fields, then it makes sense to add the field to class Actor. Already compiled code will have no use for those fields anyway. If, on the other hand, only certain classes need those fields, a simple refactoring (i.e. a new base class inheriting from Actor with those fields) solves the problem.

In either case, horizontal subclassing seems reduntant.

Modifying class Actor

Modifying class Actor directly would solve the immediate problem, but remember that for a game developer using "UnrealEngine3," class Actor belongs to the engine code you liscensed from Epic. If two months later Epic comes out with "UnrealEngine3.1" with cool new features you want to use, then you have try to merge your local changes into the new code.

I think Sweeny is arguing for "horizontal subclassing" so that liscensees of his engine can customize it to their liking while still being able to take advantage of future releases of the product.

Hi!

Hi!

would a good version control system not solve Sweenys immediate problems?
(of changing the basecode of the Unreal Engine and then merge the Unreal Engine changes back into the old codebase.)

Is there a version control system like Monticello (Smalltalk) for c++?

Merging: not a great solution

(wince) I don't know about Monticello, but in my random experience merging code is time-consuming and error-prone, even with good tools. (I like Araxis Merge, combined with any decent version control system.) Merging code is a common solution to a range of problems, but that doesn't mean it's pretty.

Choice of programming language definitely affects one's ability to cope with change. If programming language designers found better ways to support this (alternatives to merging), they'd be welcome.

My question is still not answered.

I still do not understand how horizontal subclassing will solve the problem. What is the difference between horizontal subclassing and vertical subclassing? Sweeny's problem could be solved by introducing a class between Actor and Actor-derived classes. Changing all references of 'new Actor' to 'new MyActor' is a matter of using the right tool...

Not That Easy

It's really not that easy to modify a large class hierarchy in C++ (or, presumably, UnrealScript) in such a way that you don't risk introducing bugs when the base class changes, which, again, Tim has said happens with their licensees every couple of weeks! To gain a better understanding as to why that might be, read Oleg Kiselyov's excellent OOP Critique or Peter Norvig's Java IAQ. Virtual classes are a great solution to the problem, but are somewhat difficult to keep type-safe. The paper "On the (Un)reality of Virtual Types," mentioned elsewhere in this thread, contains an excellent explanation of the challenges inherent in a type-safe virtual class system while at the same time offering multiple type-safe solutions that are extremely easy to express in O'Caml. The executive summary is that any system with parametric polymorphism, structural typing, correct binary method typing, and open self-typing allows for the easy implementation of virtual classes. It's worth noting that Scala also has no difficulty expressing virtual classes.

But why

I have to concur, I don't see the benefit of extending classes horizontally.

Considering that the base level engine will never make use of the extensions introduced in Actor, then the only users of those extensions are in the specific application that defines them. The underlying Unreal framework would not even know those extensions were present.

So, that being the case, do you not achieve the same results by creating a MyActor with the enhancements needed that subclasses Actor, then require that this be the Actor class that your application is allowed to use? Then if the Unreal engine changes, your modifications are isolated.

What does horizontal extension provide that normal vertical inheritance doesn't?

What if Actor isn't a leaf class?

Imagine there are other classes like Player that extend Actor in the engine framework. How are you going to create MyPlayer such that it extends both MyActor and Player? (If you said Multiple Inheritance, then now you have two problems)

Use delegation?

You're right, an open class model would simplify this situation, one I hadn't considered. But, assuming a small number of possible child classes off the non-leaf class you wish to modify, can you not also do it via a delegation model?

If you define a simple object, MyNewFeatures that handles your new features and an interface IMyNewFeatures for those methods, then define MyActor extending Actorand MyPlayer extending Player and havet both implement IMyNewFeatures. Then if you delegate their interface method calls to an instance of MyNewFeatures, do you not have the same result?

Again, neither MyActor nor MyPlayer are used by the base engine, only in your higher level code, so you can actually treat them in your code as IMyNewFeature objects. It won't matter to your code which one it receives, so long as it fullfills the required contract. You wouldn't be able to treat MyPlayer as an instance of MyActor, which depending on design, could be an issue, but you would be able to treat both as IMyNewFeatures.

Obviously if there are a large number of child types that you want to inherit the new features, the work effort here can increase, so I'm not arguing that open class models are unnecessary. In a newly designed langauge, that type of feature would be something to consider from the beginning. But, with existing lanaguages, there are alternatives that can get you a good portion of the way to what you need, albeit with more work.

Workarounds

I agree there are workarounds, but piling everything into one big IMyNewFeature interface looks unattractive to me.

In a real engine, the root of the inheritance hierarchy probably won't be Actor, but rather Thing or Object, and you can easily have a somewhat large number of classes (even non-leaf ones) in the hierarchy. Also, there will likely be a desire for more than two levels of framework - for example separating the First-person Shooter logic into its own layer inheriting from the generic game layer.

Also, your suggestion lets you cast down to the IMyNewFeature interface as needed, but you haven't explained how you're going to cast back to either Player or Actor when you need to interact with the engine layer. It sounds like that will either involve an unnecessary runtime check or a supression of the type system.

It looks to me that without a language improvement you end up either with a system that's pretty verbose or that lacks some nice property. With the right language improvement you can have straightfoward code with nice properties. And note: Named frameworks with inheritance are more flexible than just open classes in that you can have e.g. ShooterGame and PuzzleGame frameworks that extend GenericGame's classes in different ways.

Sorry. Can you explain, what

Sorry. Can you explain, what does "horizontal subclassing" and "vertical subclassing" means?

These aren't the normal

These aren't exactly standard terms used, but...

Vertical subclassing is normal subclassing.

Horizontal subclassing is a method of altering and extending an already existing class without altering its source.

The units of extension are

The units of extension are different.

In vertical subclassing you have a class extending the class.

In horizontal subclassing you have a class library extending other class library. Or module/package extending another one.

I think UML2 package extension mechanism specifies this model of work. They have quite a funny scenarios defined. ML modules might also come close, but my knowledge of ML is quite limited.

The idea is actually might be useful in other contexts, so I suggest looking into it. I borrowed the idea as it comes in UML2 for ETL to implement grammar extension mechanism.

How does Ada do it?

2) bounds checking is not needed if the language makes indices be strong types. ADA has this feature very nicely done.

I looked around for information on Ada's solution, but couldn't find what I needed. How does Ada do it?

Dependently typed array bounds would guarantee no out-of-bounds accesses without having to perform any dynamic checks.

Run time...

Not all bounds checking is static. The Ada model supports runtime checks for various error conditions, and array bounds are a part of this model.

"Lenient" evaluation

-- versus eager or lazy.

Can anyone point to more information about this?

I see that there was a discussion of this about 1.5 years ago (here, which also makes mention of "the next mainstream programming language(s)"), but haven't seen much else.

"How Much Non-strictness do Lenient Programs Require?"

How Much Non-strictness do Lenient Programs Require?

While not dedicated solely to description of what is lenient evaluation strategy, this paper is the only (free) online resource I could find on the topic. I didn't look hard, so there may be others as well...

Lenient

Lenient evaluation is just non-stict, non-lazy implementation, aka:

1. Inputs to a function need not be defined/valid -- not strict

2. Unnecessary work may be performed -- not lazy

My understanding is that most non-strict languages are "lenient" ones.

ID, on the mosoon

You can draw a 2x2 grid:

 strict   non-strict    |
-------------------------
 C, ML     Id (lenient) | eager evaluation
           Haskell      | lazy evaluation

(credit for the diagram goes to... I don't recall. SimonPJ perhaps.)

Of course, once you've drawn that, you wonder what goes in the last corner, and what a lazy strict language would look like.

Bounds Checking and Array Comprehensions

Check out SAC for some good work on dimension intedependent array comprehensions and a type system for array indexes.

The slides were interesting,

The slides were interesting, but without hearing the lecture it is hard to understand the deatils. Nonetheless, I feel he is right about the importance on concurrency.

A couple of quotes I found interesting

There is not a simple set of “hotspots” to optimize!

A lot of people falsely assume that in any project there's always a small core that needs to run fast, which can be coded in C, etc., but the rest can be written in your favorite but slow language.

90% of integer variables in Unreal exist to index into arrays

Wow !

Garbage collection should be the only option

This is very interesting coming from a game dev.

Garbage collection could be controlled by game state.

Garbage collection could be controlled by game state.

Imagine if you will, you burst into a room. Monsters are everywhere and you have only a couple seconds to respond before you're toast. Do you want the GC to run now? Hell no!

What if you could defere the GC run until the room was cleared. As soon as there are no monsters in sight, go ahead and process that long overdue memory sweep. Or run it whenever the player opens a menu or otherwise pauses the game.

By defering the cost of memory deallocation until appropriate times, it seems possible to more than make up for the cost for GC. Obviously there will be times when a memory sweep cannot wait, but careful tunning should at least reduce those occurances.

Real-time GC

Presumably what you really need is some form of real-time garbage collection, to ensure that your frame-rates never drop too low. There's plenty of research in this area: you could try here, here, or here as starting points, or just hit Google.

I like that idea. I wonder

I like that idea. I wonder how users would react to a tip that they could increase performance of their game by occasionally pausing it. Most people expect a pause while they quicksave their FPS game -perhaps a good time to GC as well.

GC at the end of a level

Some games with GC-ed scripting languages do a garbage collection between levels. Obviously this means there needs to be an upper bound on how much garbage can be created, but I know of several shipped games that used this technique. (The trend at the moment is to not have levels, though, and have a big seamless world.)

GC would be fine here

You're taking the performance of superscalar computers that can perform billions of operations per second and putting it into a very loose human frame of reference. I'd say a long GC, provided you don't have an enormous heap or you're doing generational collections, would be 2 milliseconds. That's less than 1/8th of a total frame at 60fps. This would be completely fine in an action game like Quake.

Especially with C++

Yes, GC would probably be excellent - in a language like C++ where stack allocation is extensive you don't get the same problems as in Java/Python where nearly everything is an object and tons of temporary objects are created behind the scenes so putting excess load on the GC. A well managed game engine that kept allocation to a minimum combined with coarse-grained objects and stack-allocated temporary objects would probably require very few GCs, and they'd be very fast.

Note that there is research

Note that there is research on Java compiler's optimisation to allocate objects on the stack when it can see that the object is not referenced at the end of the function.
This would reduce the pressure on the GC.

Escape analysis

Right, escape analysis can be used to prove that an object is stack allocatable, and this is the "clean" way to do it as it means the developer does not have to think about these things themselves. And actually the bulk of the proof is showing that it cannot be accessed by other threads.

But, stack allocation is still not with us in production/shipping JVMs, and 10 years ago C++ developers were still benefiting from this. So you could argue that waiting around for the clean solution (in this and other ways) did Java a great deal of damage in the public eye, and unfortunately this damage is often genearlised out to all garbage collection.

Holy cow, this almost

Holy cow, this almost doesn't look like SPAM! Are the bots really getting that smart?

And most other languages too

Again, you're jumping to the "GC is slow in a human frame of reference" conclusion. Tons of temporary objects is no big deal--provided you're using a compacting collector that only cares about live objects.

"magic stack GC" versus live-object GC

I'm not so sure:
Allocating objects on the stack allows them to all be deallocated once you leave the stack frame ("free" GC) - as long as you KNOW they're not needed past that stack-frame (and thus don't need to scan them). I find this very useful in C++, like alloca() in C. (of course, it would be nice to have some guarantee about the "KNOW"ing :)

..whereas allocating "Tons of temporary objects" is still taking up memory until your GC runs.

If nothing else, I would think the latter would give more page-faults & cache-misses, and not be favored by memory-bound apps.

I suppose using the stack for objects with known lifetimes is something like "Linear Lisp", which I wish there were more of..

Region inference

Allocating objects on the stack allows them to all be deallocated once you leave the stack frame ("free" GC) - as long as you KNOW they're not needed past that stack-frame (and thus don't need to scan them).

Region inference can do this kind of thing for you automatically in many cases and also generalizes the idea dramatically. It's been discussed on LtU several times in the past. A good primer on this material is Mads Tofte's introductory slides.

Programming with Regions in the MLKit

I've been reading it during the last couple of days (on the train to work and back). While regions (in MLKit) have some nice properties, there also seems to be some gotchas that the programmer needs to be aware of. I'm not sure I would want to give up the flexibility/freedom of expression that garbage collection gives (luckily MLKit offers both regions and gc). However, regions (without gc) might be an excellent approach in some special domains (real-time / embedded), where resources (time / space) are limited.

Hot-spots

A lot of people falsely assume that in any project there's always a small core that needs to run fast, which can be coded in C, etc., but the rest can be written in your favorite but slow language.

Yes, this is an all too common fallacy. I used to work for Tim on Unreal Engine 2 and 3 and other projects, and one of the programmers who often worked on optimizations, Dan Vogel, was fond of pointing out that UE2 had a basically flat profile. It didn't start out that way, but when you've been optimizing your code for a while it inevitably converges towards this performance profile. The reason many non-game apps don't experience this is that they have a very different idea of what "sufficient performance" means.

Sufficient performance

I'm curious - by the time a game gets to the flat profile, does it usually have sufficient performance? I can understand how repeated profiling would lead software to converge to a flat profile: if you always optimize the hotspots, then eventually there will be no hotspots, because they'll already be coded in C and tightly optimized. But this is only a problem if you need to do further optimization - otherwise, you're done, and profiling has done its job.

Sufficient performance

Well, it better, or you're screwed! Like Tim mentioned, once all feasible code optimizations have been attempted/implemented, you generally start making compromises on the art/level content to increase performance.

Another thing: for a console game the minimum bar is probably a rock solid frame-rate at around 25 Hz. But higher frame-rates generally lead to smoother, better looking experiences, so game programmers are often chasing them.

By the way, there are forces at work that help to flatten the profile which I didn't mention. Especially when optimizing graphics code you are dealing with a pipelined architecture and thus a bottleneck in one of the pipeline stages means that you can shove more workload into the other non-bottlenecked stages without noticably decreasing performance. So, for instance, if you are fill-bound you can push more polygons for "free". Or if your GPU performance is memory bandwidth bound from fetching texture data, you can use more complicated pixel shaders (as long as the extra complexity is arithmetical and doesn't involve extra pixel fetches). Taking advantages of these kinds of oppurtunities also serves to flatten the performance profile.

SPAM

Ah, I see. They are mining previous comments to create new comments with embedded SPAM. Really sneaky!

It's worse than that...

The newest spam-bots hack into accounts of regular contributors and accuse random posts from years ago of being spam.

Deep philosophy

If spam is deleted in the forest, and Matt M doesn't read it, was it ever there? ;-)

Yes... We need to fix the

Yes... We need to fix the spam deletion procedure so that replies get deleted too).

I figured...

...it was something like that. But this approach seemed more fun than just asking what I was missing :).

Games can make compromises that others cannot though

Remember we are talking about a game. Games are not important. If my game spends 90% of its time in arcTangent() [which of course would use Newton's method or some variation to find results to many decimal places] I can write a new arcTangent() that isn't as accurate and call it good enough. (most likely a lookup table)

If my bridge simulation turns out to spend 90% of its time in arcTangent(), there may not be much I can do about it without risking a bridge collapsing.

In the real world most programs are not bottlenecked by the CPU these days. CPUs are fast, programmer time is expensive. Nobody cares. So long as are are not stupid in choosing algorithms things will be fast enough.

That's just wrong...

... any game engine developer would tell you otherwise.

Tripple-A games always push the envelope and will always try to maximize the performance (especially in the game console world, where most of the money is). Therefore, CPUs are *always* slow and yes, we do very much care about performance. That won't change.

What *has* changed is that we've realized that most of the game *logic* processing can be executed using slower methods. This is 50-80% of the code, and *productivity* is by far the most important factor. Therefore, higher-level languages are quite suited for these kinds of tasks.

Not necessarily slower methods...

(well, since apparently spam bots resurfaced this thread...)

Not necessarily slower methods...

Some game companies are moving towards the Mono VM for game logic processing. Interpreted languages (like Lua) carry really high performance consequences, but a VM can JIT the instructions. Moreover, a VM lets you choose from a wider variety of High-Level Languages. With an interpreted language, you pretty much lock yourself into that one langauge. What happens if you want to create an External DSL? Then you are writing another interpreter. Presumably, you might do what Quake 3 did and effectively LLVM the game logic DSL at load-time - this assumes you can stabilize the game logic around basic patterns that are easy to compile down to efficient machine instructions.

The more important void here

[eh didn't feel like continuing this conversation, sorry -- wish i could delete a post, not just edit it =/ ]

GC for memory, ok but for resources?

While 'GC only' for memory seems possible, there is also the problem when an object is used to control the access of some OS (scarse) resources such like files, in which case RAII seems (to me) the only sensible approach..

So I'd say GC only seems to be too restrictive..

PS:
Never heard before about 'lenient evaluation' interesting read even though I'm just starting.

RAII

While 'GC only' for memory seems possible, there is also the problem when an object is used to control the access of some OS (scarse) resources such like files, in which case RAII seems (to me) the only sensible approach..

RAII mixes together initialization, memory management and resource acquisition, which is an undeniably useful idiom in C++. In garbage collected languages you can still get much of the same effect. For instance, in Lisp-like languages it's common to use WITH-* macros. In C#/.NET you can implement the IDisposable interface for your class X and then do

using (X x = f()) { ... }

which will automatically call the Dispose() method on x when the section is exited. So there are plenty of ways to deal with resources that don't confound resource management with memory allocation.

On RAII in higher-order languages

Using fxCop in addition to the C# compiler

Sweeney's presentation lists a number of errors that could be detected at build time but currently aren't:

  • Accessing arrays out-of-bounds
  • Dereferencing null pointers
  • Integer overflow
  • Accessing uninitialized variables

I wonder if he tried tools such as fxCop which can check for a number of these and can be extended to verify more.

FxCop can't be fully complete and sound.

FxCop does what it can with what it's given, but you need more typing information to fully guarantee safety.

You can never fully

You can never fully gaurantee safety.

Garbage collection

Garbage collection performance in games:

Though an Unreal Engine 3 game can easily consume 512MB memory (on console) or up to 2GB (on PC, in the editing tools), less than 10% of that memory consists of data that contains pointers. The rest is bulk binary resources - like texture maps, sounds, animations, which don't require scanning for references.

50MB of data is well within the range of a mark-and-sweep garbage collector taking a few milliseconds within a game running at 30 frames per second. This is what we do. Realtime garbage collection is another possibility, but we didn't want to try to implement that within a primarily C++ engine.

Garbage collecting memory:

I do believe this is completely practical as the sole memory management solution, even in a realtime application like a game.

Garbage collecting resources (file handles, etc):

This is a very different question, because resource freeing has observable consequences beyond performance and memory consumption -- unlike garbage collection, which is justified by the realization that if you don't have any pointers to a value in memory, then it can simply dissapear (or not) without observable consequences.

For example, if you have a file handle open for writing, then other applications can't open that file. You want such resource usage to be clear and deterministic, so that files don't just remain open for a random duration depending on the garbage collector's internals.

In general, I would not advocate garbage collection of OS resources or any other thing requiring explicit cleanup. For example, I think that Java/C# finalizers are a misguided idea, because they have observable, nondeterministic consequences. That is exactly the sort of feature a high-level, secure language should avoid. The bizarre finalization state diagrams for those languages should be enough to indicate that something is wrong here!

For resource freeing, constructs which guarantee that every resource is freed look promising (think of wrapping a file handle in an abstraction like a Haskell State monad). Or just plain old handles with explicit closing (and thus the risk of not closing a handle, closing it when it's not open, etc).

FxCop:

These sorts of tools are useful for analyzing code written in existing/legacy languages. Microsoft has some scarily powerful analyzers along these lines that they use internally, as well. But they're all built on the idea that you have code written in a language without strong safety guarantees, and you want to statically analyze them to figure out some of the things that might possibly go wrong at runtime. That is useful, but it's not comparable to the guarantees one would have in a language with strong safety guarantees.

I remember writing game code in assembly language a long time ago. Then macro assemblers came out and made assembly code easier to write correctly. But the eventual solution was C, which led to ~5X higher programming productivity and a comparable reduction in bugs. Ultimately tools like this are just workarounds for legacy problems to which genuine solutions are known ... and on the verge of coming into commercial practice.

Garbage collection without type information?

C++ garbage collection would greatly benefit if it was assisted by the compiler. During a scan, only pointers to objects need to be scanned. Unfortunately, it is not possible without assistance from the compiler. That is the reason third-party solutions are not good enough for performance-intensive applications.

Memory management and resource management is not the same. Resources other than memory are usually few in a program and not interconnected. A program may have files, sockets, windows and other types of handles, but usually these handles are 'dead ends', i.e. they do not contain references to other handles. Memory, on the other hand, is a totally different beast: a block of memory usually contains references to other blocks of memory. So the idea of finalizers is a bad one, since resources other than memory should be freed by RAII techniques (for deterministic resource management), but memory should be handled by garbage collector.

I will say it again, because no one seemed to notice it: could it be that there exists a "calculus" for resource management?

Resource management calculus

In general this is not possible in a Turing-complete language, because of the halting problem. Presumably the goal of your resource management calculus is to determine if there is some statement within the program that can still access the resource, and free that resource iff there is not. Now replace that statement with "halt()", and your resource management calculus can determine if any arbitrary Turing machine can halt. But we know this is impossible, so such a resource management calculus cannot exist.

Garbage collection, reference counting, RAII, and WITH-blocks are all conservative approximations. They tell you if it's safe to delete a given resource, but they do not guarantee they'll always find every safe situation. That's why it's still possible to leak memory in a GC'd app: just hold onto references longer than they're needed. RAII and WITH-blocks can also fail, but they usually do this because the program went into an infinite loop within the stack frame in question, so you have bigger and more obvious problems.

There may be better resource-management solutions (and personally I think Eros OS has an interesting approach: the disk is just treated as one more level in the memory hierarchy, and everything is automatically persisted. Wonder how they deal with garbage), but "perfect" resource management is impossible.

How about a 'limited' resource management calculus?

Of course, it can not be determined if an algorithm halts or not, but I think it is doable to infer from the source code when some resource is no longer needed.

First of all, humans do it, by running an approximation of the program in their brain.

Secondly, I think it is possible to improve resource management by annotating (either manually or automatically by inference) if an object is stored in a temporary or another long-lived object.

Here is an example of a double-linked List class that shows the above concept with annotations(in Java style). The annotation stored means that an object is stored in another object; The annotation temp means an object is not stored:

private class Node {
    private Node m_prev;
    private Node m_next;
    Object m_object;
}

private class ListIterator implements Iterator {
    public ListIterator(Node node : stored) {
        m_node = node;
    }

    public boolean hasNext() {
        return m_node.m_next != null;
    }

    public Object getNext() {
        Object result = m_node.m_object;
        m_node = m_node.m_next;
        return result;
    }

    Node m_node;
}

public class List {
    public Iterator find(Object object : temp) : temp {
        for(Node node = m_first; node != null; node = node.m_next) {
            if (((Comparable)node.m_object).compare(object) == true) return new ListIterator(node);
        }
        return null;
    }

    public void add(Object object: stored) {
        Node node = new Node;
        node.m_prev = m_last;
        node.m_object = object;
        if (m_count) m_last.m_next = node; else m_first = node;
        m_last = node;
        m_count++;
    }

    public void remove(Iterator t: temp) {
        ListIterator it = (ListIterator)t;
        if (it.m_node.m_prev) it.m_node.m_prev.m_next = it.m_node.m_next; 
        else m_first = it.m_node.m_next;
        if (it.m_node.m_next) it.m_node.m_next.m_prev = it.m_node.m_prev; 
        else m_last = it.m_node.m_prev;
        m_count--;
    }

    public void remove(Object object: temp) {
        Iterator it = find(object);
        if (it != null) remove(it);
    }

    private Node m_first;
    private Node m_last;
    int m_count;
}

Let's see using the above:

class Main {
    public static main(String[] args) {
        List list = new List;
        list.add(new Integer(1));
        Iterator it = list.find(new Integer(1));
        if (it != null) list.remove(it);
    }
}

By analysing the method main using the annotations, we can infer the following:

public static main(String[] args) {
    List list = new List;                      //list is temp
    list.add(new Integer(1));                  //new Integer(1) is stored
    Iterator iter = list.find(new Integer(1)); //new Integer(1) is temp
    if (iter != null) list.remove(it);         //iter is temp
}

We can see that objects list, new Integer(1)(the one used in find) and iter can be safely deleted after execution of main.

I think with a little more work on this, garbage collection could be made much more effective (and in many programs, eliminated completely).

Refcounting

That's basically reference counting, with additional programmer annotations that tell the compiler "Don't refcount this" (the temp annotation). At least, I hope it's refcounting, because otherwise you can run into big problems. What happens if you put an object into two different containers?

I also don't understand some of the defaults you're using. When you create an object within a function as "new Node", is it assumed to be stack allocated? What happens with the assignment to m_last then? Increment the reference count? What happens if you have cycles? Who owns objects that are returned from functions? (Obviously not the callee, but the caller might not be collection, and may do arbitrary things to the reference like passing it to multiple functions.)

Anyway, I don't really see the point, because this system is:

  • Unsafe. If a programmer forgets a temp or stored annotation, they end up with a dangling pointer
  • Inflexible. If you look at large Java projects (say, Netbeans or Eclipse), most objects are nothing more than a collection of references to other objects that manage their behavior. Yet they're not containers of those other objects, more like clients of their services. For example, a ClassPath in Netbeans is used in all sorts of places, from resolving imports to suggesting autocompletions to flagging type errors to compiling the project. This is very different from how C++ programs work, and I doubt that IntelliJ/Netbeans/Eclipse could have been written in C++. Look how Microsoft had to rewrite portions of Visual Studio in C# to get some of the features that were cropping up in Java IDEs.
  • Slow. This'll probably surprise you, but Java allocation is faster than C++.

I'd encourage you to try and pursue this - who knows, maybe you'll find something - but IMHO garbage collection's the best approach out there for memory management, and garbage-collection + orthogonal persistence + formalized concurrency calculi like CSP are the best approach for general resource management.

It is not reference counting.

What happens with this system it is that the compiler does the checks at compile time, instead of run time.

If you put an object in two different containers, the object's lifetime is equal to the longest lifetime between the two containers.

"new Node" is presumably allocated on the heap, but the compiler finds out that it does not "escape" beyond a certain block of code, so it can be safely deleted after this block of code is executed.

There is no problem with cycles, since cyclic objects also have a lifetime.

No one owns an object that is returned from a function. What I presented is a solution to improve garbage collection by doing optimizations based on code flow.

If a reference is passed to arbitrary functions, then the parameters of those functions are also annotated, thus making the compiler aware of how they use their data.

The system is not unsafe, because the annotations were only for comprehension. The compiler will insert those annotations.

I know that Java allocation is faster than in C++, since Java allocation is linear. But C++ wins due to many objects allocated on the stack.

If you put an object in two

If you put an object in two different containers, the object's lifetime is equal to the longest lifetime between the two containers.

How do you determine a container's lifetime? Does making the container (or its reference) a global result in its lifetime being that of the process?

Will the programmer be able to implement their own containers? As I understand your proposal, certain common collection operations (e.g: remove an object, empty a collection, etc.) will require the collection to somehow delete objects (e.g: those that are no longer in the collection). Without access to manual memory management (to delete when appropriate, etc.), I don't see how you'll be able to support cyclic objects as you'll need to treat reference as containment and thus might not be able to infer lifetimes (a < b < a). If collections cannot delete things when necessary, I don't see how you can avoid a space leaks in common patterns such as using a queue to store units of work for threads in a pool.

It's not possible to

It's not possible to determine all things statically, but a great deal of things can be done about it that are not touched now.

How do programmers manage manual memory management in their mind? they abstract the program as a sequence of operations without loops, and thus they are able to understand when something should be deleted.

The same thing could be done by a compiler, if the compiler was fed with the appropriate information.

My opinion is that whatever the human brain can detect in a program, a compiler can also detect. The problem lies that no one has sat down yet and defined the set of information a compiler needs so as that it can do the same work as the human brain.

Before you say that the human brain is inherently more powerful than a CPU, I will say that when I am trying to understand when I should delete something in a C++ program, I do not use all the information at hand, but only certain things in a program that make sense to the problem I have to deal with. In other words, I use the proper abstractions. If those abstractions are fed to a compiler, the compiler could do the same job I do, but without the distractions (i.e. with 100% success instead of 90%+).

But what if the abstractions

But what if the abstractions that you typically use in your reasoning typically ignore certain details which in boundary cases become very important? Then the compiler too will only cover that same 90%. How would we represent these abstractions in the compiler and how do we convince ourselves that the abstractions are sound? We can't just use a fixed logic, because then there would always be useful statements that can't be encoded in that logic. The alternative to a sound logic doesn't seem like a good idea to me, because a compiler that makes tranformations based upon assumptions that we can't show to be true is a compiler that'd I want to be very careful when using.

Not all cases can be

Not all cases can be covered, of course. In very complex algorithms, the confusion of the programmer will be transferred to the compiler. The difference with using an automated tool to do the check is that if there is at least one contradiction, the tool will find it, no matter how complex the algorithm is...whereas we humans we may not be so lucky.

The reason we humans cover the 90% and not the 100% is that we get distracted from external things, we forget, our emotions get in the way etc. A compiler would not have such problems. For example, I can easily assume that a variable X never gets value 0 in a program, but two weeks later I can come back and write 'X := 0'...or another programmer may come in and write that for me.

I'm not quite following

I'm not quite following where the statement regarding emotions comes into play....

Think about what can distract you when you are "programming".

There are many things that can distract a person who's designing/programming/debugging. It's easy for humans to lose focus; machines have the advantage in this domain.

It's not possible to

It's not possible to determine all things statically, but a great deal of things can be done about it that are not touched now.

I agree. There are so many avenues that haven't be explored in programming languages, and programming environments for that matter.

How do programmers manage manual memory management in their mind? they abstract the program as a sequence of operations without loops, and thus they are able to understand when something should be deleted.

I suspect it's a bit more complicated than that. It's my belief that the haulting problem afflicts humans as much as computers and that the same way we work around it can be applied to program analysis. Obviously the times where even we can't figure it out statically will likely be impossible for the computer, as well, but getting a compiler to utilize a fraction of our ability could yield a large boost in lowering memory usage for a GCed language.

profiling memory use

Curtis W: ...but getting a compiler to utilize a fraction of our ability could yield a large boost in lowering memory usage for a GCed language.

To lower high water memory usage for a language with gc, I think one focus should be standardized profiling in a language, showing both source of use and patterns of growth.

This leverages a human mind's ability to process patterns by simply making them visible. Then a coder can reason about causes and experiment scientifically by making changes and observing results.[cf]

Visibility is key to getting the mind into play. Highly expressive languages make it easy to request unintended effects like excessive memory use. Good profiling and debugging tools should make it easy to observe effects, so a coder can compare actual to expected results.

Abstractions...

One of the things that help me decide where I can safely delete an object is the "linearity" abstraction: I draw a line where I put all significant milestones in the lifetime of an object, including its birth (start of line) and its death (end of line). Then I try to fold this line in one of the program's loops. I guess something similar can work for a compiler.

Has there ever been a study about what abstractions programmers use in their minds?

linear types

One of the things that help me decide where I can safely delete an object is the "linearity" abstraction: I draw a line where I put all significant milestones in the lifetime of an object, including its birth (start of line) and its death (end of line). Then I try to fold this line in one of the program's loops. I guess something similar can work for a compiler.

Sounds like linear types to me, which are definitely useful for automatic memory management and regions, etc.

Boehm GC

The Boehm GC can be made less conservative by giving it type information in C++ apps, typically with the help of the compiler. I think there are smart pointers you can use that let the collector know to only scan those values, but I don't recall the details.

Anyway, it's certainly possible, and these days nothing stops you using G++ or LLVM and creating your own C++ compiler modified however you wish :)

C++ Runtime Data

Something to look at here might be item #9 of Paul Wilson's OOPS Group Publications at the University of Texas: "Portable Runtime Type Description for Conventional Compilers." An implementation is apparently available via FTP, but it's not obvious from where.

Note that item #9 is related to item #14, "Real-Time Non-Copying Garbage Collection." This is definitely real; it's used in the RScheme system.

some random comments

some random comments

can you get aspects for c++? i would have thought so. they seem like a possible solution to the problem of extending base classes, based on my limited experience with java (especially if the extensions are orthogonal to the original classes, so really it's more like multiple inheritance from a bunch of base classes).

lazy evaluation seems a rather extreme solution to uninitialized variables - couldn't the compiler just warn instead?!

lenient evaluation *is* eager evaluation, isn't it?

arrays are preferable to lists? doesn't that mean imperative counters? have you considered lists as an alternative to checking array bounds via dependent types? maybe there's some kind of hybrid implementation that gives close-to-array efficiency for simple cases? [later - list comprehensions only give you map, not fold, afaik]

type inference doesn't scale for modules - have you looked at ml?

Type inference and scaling.

type inference doesn't scale for modules - have you looked at ml?

Since Tim is obviously reading this forum it would be interesting to hear a clarification of that statement, "Type inference doesn't scale". The slide in question seems to mix that statement along with some "scary" error messages - which is of course a valid observation, but in my humble opinion not the same problem as "type inference doesn't scale".

two more things.

two more things.

first, i read elsewhere a talk by another games programmer claiming that the majority of time was spent in searching (A* tree search). anyone know where/how that fits with the numbers here?

second, a day later, i don't remember any reference to queues. it seems like they might help with the "interacting objects" problem. both in the future (erlang, lightweight thread per object?) and now (allowing you to "straighten out" the inter-dependencies so that single threading doesn't go round in loops between closely coupled objects).

i would certainly look at erlang and ml, rather than haskell, if thinking about high performance multi-threaded applications. seems to me that you want a strongly typed erlang, basically (how close is alice to that?)

A* search, message-passing

first, i read elsewhere a talk by another games programmer claiming that the majority of time was spent in searching (A* tree search). anyone know where/how that fits with the numbers here?

I think there's basically no way that claim could be true for a modern game if he was referring to A* search specifically, even for something like an RTS that does a crap load of path-finding.

If you take an extremely broad definition of 'search' that encompasses everything from frustum culling to collision detection to A* search and other AI-related activities, then it would have at least a chance of being true.

second, a day later, i don't remember any reference to queues. it seems like they might help with the "interacting objects" problem. both in the future (erlang, lightweight thread per object?) and now (allowing you to "straighten out" the inter-dependencies so that single threading doesn't go round in loops between closely coupled objects).

He mentions message-passing in there. Tim's definitely familiar with Erlang-style concurrency.

I would think that the games

I would think that the games programer talking about A* search meant games as in chess.

We were using A* for path

We were using A* for path finding in RTS (Real-Time Strategy) games 8+ years ago. For the unintiated RTS games involve building bases, creating units and getting them to fight each other. Click on unit or units click on the other end of the map and they figure out how to get there.

Oh, thats intressting, but

Oh, thats intressting, but even if you did use A* i doubt that your games spent more than half there time doing it, as was claimed above. By the way can you tell which games?

I can easily see any game

I can easily see any game targetting hardware from, say, 1996-2000 in which hundreds of units are finding paths across large maps on a regular basis spending half its time doing pathfinding. The RTS genre's traditionally gone easy on the graphics for much this reason, where possible getting improvements in look from better artwork.

Yep

During that time period at least (I don't know about now) tweaking A* to get acceptable performance was a fairly common topic in game development resources that I looked at. Generally using A* to obtain an optimal path simply took too much time for real-time games where you don't want the computer (or the user's unit) to grind to a halt for a second or two, so they'd reduce the quality of the path to get acceptable response times. You didn't even need more than one unit back then--computing a single path over a complicated map could take more than the few hundred milliseconds that you had available, and that would be enough to make the game unplayable.

Pretty much any real-time game that involved moving stuff around a map had to deal with these problems, for example games like Baldur's Gate and Fallout (real time RPG's), Warcraft and Command & Conquer (RTS's). All had generally abysmal path finding quality.

20-30% in our case if memory

20-30% in our case if memory serves me correctly. We did have to precompile maps (thereby precluding dynamic maps), and make the algorithm incremental to achieve that though. The was also a fudge to get units moving before the search was completed.

Haven't worked on an RTS in years, last time I heard Epsilon-Admissible Iterative-Deepening A* was the next big thing but this could be out of date especially now that dynamic maps are being tried.

lenient = non-strict + eager

After reading the mentioned paper I think that lenient is not an evaluation strategy, but a pair of (non-strict) semantics and (eager) evaluation. Similar to lazy evaluation (which implies, AFAIK, non-strict semantics). Other non-lazy evaluation strategies which satisfy non-strict semantics are: speculative and optimistic.

Anyone here knows about a good paper talking about the relationship between all these evaluation strategies?

No, but CPS is a good bet

Unless you are asking just about some fact-sheet (this strategy scores 5 for expressivity, etc.).
My worst concern is that most of these terms are abused and never standardized in the first place - lazy of one researcher may well mean non-strict of another.

As a side note, it would be interesting to see how various evaluation strategies interact with various delimited continuation operators...

Full comparison

No, I really want to understand what are the real differences behind those, because we have much research in optimistic evaluation and I have no idea if the results are valid to lenient evaluation.

Just curious

because we have much research in optimistic evaluation

Do you mean the CS community as a whole, or some proprietary research?

The reason I ask is I am very interested in (business) process semantics, and in some use cases they require something that feels like optimistic evaluation.

We, the people...

The CS community has lots of it, I have zero. Actually I have a bit more than zero, but in PLT and type systems not implementation strategies.

Oops, CPS is a bad bet

It just occurred to me that while CPS-transforming to lambda terms is appropriate to define call-by-value or call-by-name, but if one wants to express lenient evaluation strategy faithfully, then the target language must have some notion of concurrency. Which means, translating to pi calculus (or join calculus, m-calculus, etc.) is a better way to compare various evaluation strategies.

On the other hand, "CPS" in some papers does not assume the target is lambda calculus, so I might have pretend I didn't mean that as well :-)

Futures, maybe

Although I cannot point you to any paper that mentions this explicitly, all three evaluation strategies can trivially be expressed with futures. For example, in Alice ML, the three forms of reduction for application can be employed as follows:

strict: f(e)
lazy: f(lazy e)
lenient: f(spawn e)

That is, the different forms of future directly correspond to the different evaluation strategies.

Futures are great...

but what about speculative and optimistic? IIUC you need compiler suport (or some sort of staged introspection) to decide if you should evaluate some parameters. Also what happens in the lenient case if e diverges but f neither use it nor diverge? Do we have to explicitly kill the divergent thread in the implementation of f?

Good questions

Re speculative and optimistic: I'd say these are different because they rather are interesting on the level of implementation than semantics. The distinction is not a clear one, though, because it depends on the language. Lazy and lenient can both be considered implementation strategies for non-strictness, but they are semantically distinguishable in a language with side effects. In contrast, for speculative and optimistic evaluation observability usually is not desired, as far as I know.

The case of f(spawn e) where e diverges but is unused will have the correct behaviour semantically, but it would indeed leave a junk thread running forever, unless you kill it explicitly. This of course might be undesirable operationally. But I consider it a rather pathological situation and would be surprised if it posed much of a problem in practice.

Garbage-collecting threads?

The case of f(spawn e) where e diverges but is unused will have the correct behaviour semantically, but it would indeed leave a junk thread running forever, unless you kill it explicitly.

Isn't it possible to collect junk threads in PLs without (undeclared) side-effects? By junk thread I would define a thread of the future that is garbage itself (unreachable from root set) - in absence of side-effect this thread can do nothing but consume resources and return the result that will be ignored, so it is semantically just to kill it.

But I consider it a rather pathological situation and would be surprised if it posed much of a problem in practice.

I suspect there are a lot of use cases - parallelOr being the simplest (granted, in this simplest case GC is not the best option performance-wise).

[on edit: ha, so it was not a deja vu: I already asked about something like that in the past... I have to return and rethink that...]

Isn't it possible to

Isn't it possible to collect junk threads in PLs without (undeclared) side-effects? By junk thread I would define a thread of the future that is garbage itself (unreachable from root set) - in absence of side-effect this thread can do nothing but consume resources and return the result that will be ignored, so it is semantically just to kill it.

Maybe not the exact same scenario, but interesting nonetheless: The Incremental Garbage Collection of Processes...

We discuss an approach to a problem that arises in this context: futures which were thought to be relevant when they were created become irrelevant through being ignored in the body of the expression where they were bound. The problem of irrelevant processes also appears in multiprocessing problem-solving systems which start several processors working on the same problem but with different methods, and return with the solution which finishes first. This parallel method strategy has the drawback that the processes which are investigating the losing methods must be identified, stopped, and reassigned to more useful tasks.

The solution we propose is that of garbage collection. We propose that the goal structure of the solution plan be explicitly represented in memory as part of the graph memory (like Lisp's heap) so that a garbage collection algorithm can discover which processes are performing useful work, and which can be recycled for a new task.

An incremental algorithm for the unified garbage collection of storage and processes is described.

In principle, yes


Isn't it possible to collect junk threads in PLs without (undeclared) side-effects?

In principle, yes. Although I would expect most threads to not be free of side-effects, even in pure languages. In any case, you'd need an effect system or monadic type system to find out.

OTOH, Alice can, in some situations, collect threads that are blocked for eternity - e.g. when they wait for a future that is not reachable by any runnable thread (normal garbage collection actually does that almost for free).


I suspect there are a lot of use cases - parallelOr being the simplest (granted, in this simplest case GC is not the best option performance-wise).

In general, sure. I was referring to the particular situation of using threads to emulate lenient function calls - I don't think you normally would want to apply a function that does not use its argument to an argument whose evaluation diverges (unless you plan to do real lazy programming).

The difference lies in application

Interpreting Functions as Pi-Calculus Processes: A Tutorial

The paper gives encodings of CBV, CBN, and parallel CBV into pi calculus. See section 7 for a variant of encoding that differentiates evaluation strategies only by the translation of application (though for some reason this section ignores parallel CBV :-( ).

Why virtual classes?

How is that different from just adding new fields to class Actor? If all classes extending Actor will need those new fields, then it makes sense to add the field to class Actor.

In a one-developer world, you would just add the new functionality to your base class. But in the case of Unreal Engine 3, we maintain and distribute a large software framework used by a hundred or so other developers.

We're continually adding new features to the framework, so if an external developer adds a new member function to our base class's source in their local codebase, then they'll need to merge their changes into all subsequent versions of our framework, and we release frequently - every week or two.

In this case, a virtual class solution that enables a team to extend an entire class hierarchy in a framework, without modifying the framework's source, would be very valuable.

Obviously there are lots of workarounds available (see the slides) - adding extra data components, dynamically casting them to the expected type, etc. But they're quite error-prone.

by the time a game gets to the flat profile, does it usually have sufficient performance?

When all commercially-practical code optimizations have been exausted, further optimizations have to come from the game's content: reducing the number of visible objects, enemies, etc. At that point, it's just a performance vs quality tradeoff, as opposed to performance vs cost.

Arrays are preferable to lists? doesn't that mean imperative counters?

Comprehensions and folds are fine substitutes for imperative iteration.

Type inference doesn't scale

My 1-line example produced an incomprehensible error message in HUGS, but it was easy enough to see what was wrong with that code. Given a multi-hundred like Haskell program with fairly complex typing, it can be combinatorially more difficult to figure out what's wrong, given such an error message. This property doesn't seem at all tenable in a mainstream programming language.

Note that I'm only arguing against type inference: inferring x::Numeric in "f x = x+1". Type propagation, deriving the typing x::Numeric from "x = 7", is not problematic.

On Virtual Types

Note that notions of virtual types/classes are addressed very well in Scalable Component Abstractions and On the (un)reality of virtual types. You'll have to scroll down a bit on the latter to find the reference.

Update: I should have pointed out that one upshot of the above is that Scala and O'Caml lend themselves particularly nicely to maintaining groups of classes that are extended independently of each other.

Nested Inheritance

I think Nystrom, Chong and Myers's OOPSLA 2004 paper Scalable Extensibility via Nested Inheritance is also worth mentioning here (it's been mentioned before, but didn't get the attention I thought it deserved).

It allows you to subclass (implying subtyping) a group of classes, so that, just like when you override virtual methods, references to the overridden classes in the original group of classes now refer to the overridden versions.

A key difference with virtual classes and the like, is that nested inheritance works at the class-level, not the object-level: a class is an attribute of the enclosing class, not of an instance of this class. This made it easier to prove the soundness of their approach. (And arguably, it also might make their language extension easier to use than full-blown virtual classes.. that's just wild speculation, though..)

However, when the nested inheritance paper was written, the soundness of gbeta's and Caesar's virtual classes-approach hadn't been proven. This has changed recently (at POPL 2006, to be exact :-)): A Calculus of Virtual Classes by Ernst, Ostermann and Cook looks very interesting, but I haven't had a chance to study it yet. (I skimmed it briefly and they discuss the nested inheritance paper in their related work. At first glance, they seem to have some valid points about how nested inheritance actually isn't easier to use than their approach...)

As far as I can tell from your slides, both approaches (especially nested inheritance) seem to be pretty close to what you're looking for.

responses

My 1-line example produced an incomprehensible error message in HUGS, but it was easy enough to see what was wrong with that code. Given a multi-hundred like Haskell program with fairly complex typing, it can be combinatorially more difficult to figure out what's wrong, given such an error message. This property doesn't seem at all tenable in a mainstream programming language.

Incomprehensible? Such an error is easy to read for any programmer who has been using Haskell for a little while. It's telling you that there's no instance of Num for strings, so f can't be applied to them. When you look at the definition of f, that makes perfect sense, since it adds the components of the pair that you pass it. As an alternative to looking at the source for f, you can simply ask hugs or ghci to give you the inferred type. I don't see how the error is any more or less confusing than the one you're getting in C. Here's what GHC gives:

test.hs:2:7:
No instance for (Num [Char])
arising from the literal `3' at test.hs:2:7
Probable fix: add an instance declaration for (Num [Char])
In the first argument of `f', namely `(3, "4")'
In the definition of `a': a = f (3, "4")

Lazy evaluation isn't really all that costly -- it's more the case that most programmers are used to thinking in strict languages, and expect the lazy language to behave similarly with regard to performance. I think it's commonly forgotten that there are two ways to optimize a Haskell program: make it stricter, or make it lazier. Lots of people forget the second one. It's actually not so hard to get decent performance from lazy code if you're used to thinking in terms of demand. Sure, the compiler could also be a lot better in terms of the code that it produces, but compilers for strict languages have been around a lot longer than for lazy languages, and it's an easier problem there in the first place. I don't think it's a fundamental problem with non-strict semantics, and the benefits of non-strict semantics in terms of compositionality of programs are so incredibly huge, that it's worth a closer look. :) Laziness only when explicitly specified will defeat the whole usefulness of it in the first place, because strict functions will be all over the place and will prevent the gain in compositionality that laziness gives you.

There is a right place for strictness, but it's actually rather small compared to the cases when laziness wins out. You want strictness when reducing large data structures into small ones in a way which uses all of the large structure (i.e. not a search). I'd claim that in all other cases, laziness is actually preferable or equivalent to strictness in terms of performance. Strict higher order functions can be provided to deal with cases where you really want the strictness: see foldl' (note the prime) in Data.List as a good example.

If you want to know why lists are so popular in Haskell, ask why loops are so popular in C. :) The reason is mostly the same. Due to laziness, lists, along with the usual higher order functions are essentially reified loops which can be transformed all at once. In the absence of laziness, yes, arrays may be preferable. It's laziness that allows you to think of data structures like control structures.

Here's a nice (perhaps too small) example of where laziness improves compositionality. I could give much larger ones, and Why Functional Programming Matters gives lots -- from numerical computation to game AI.

Consider writing a function which checks if a particular string x occurs as a substring of another string y. A natural way to break this problem up is to check if x is a prefix of any of the tails of y. In an imperative language, you end up writing a couple of nested loops, with early breaks to proceed to the next tail as soon as it becomes obvious that x doesn't occur there. In Haskell, we can write:

isSubstringOf x y = any (isPrefixOf x) (tails y)

If Haskell was strict, this would be a very stupid way to write this function. Computing all the tails of y, then checking every single one of them to see if x is a prefix, then summarising that to see if any of the tests returned true is asymptotically much worse, and usually quite impractical. With lazy evaluation, the above code is essentially the same as the nested C loops with the early break out tests. We can successfully reuse existing tools like 'any', 'isPrefixOf' and 'tails', whereas if the language was strict, we'd be back to square one.

Another odd thing is that you complain about the syntax. The syntax of Haskell is actually one of the things which I like most about it. I suppose this is a matter of taste, but could you be more specific here? I think the syntax is really clean and easy on the eyes.

I'm still not really sure what you mean about type inference not scaling. Perhaps it doesn't scale to certain extensions of the type system (higher rank types / GADTs / existential types), but when combined with a relatively small number of programmer annotations, it's incredibly nice. Most Haskell programmers write type signatures on important top-level bindings anyway, as a form of documentation, to aid in thought about how to write the function, and to improve error messages. I still wouldn't want to lose type inference altogether though, because it would be a horrible pain to get anything done. You really don't want to explicitly type everything, and it's very often nice to be able to omit type signatures when they're unwieldy, or you're just writing something quick.

- Cale

Incomprihensible error messages

Incomprehensible? Such an error is easy to read for any programmer who has been using Haskell for a little while.

I am going to side with Tim on this one. It is incomprihensible, and more research into type error reporting is needed. Bastian Heeren's thesis has a lot of interesting information on the subject, if you were to skip all chapters but one I would recommend atleast skimming Chapter 12 of it.

Lazy evaluation vs. Streams

If Haskell was strict, this would be a very stupid way to write this function.

Isn't it a property of lists in Haskell having two uses: (algebraic) lists and (co-algebraic) streams? In a strict language with streams you can write similar code and have similar results.

I'm not sold on non-strictness as a default for improving compositionality, IMHO all you need is a way to declare codata in you language.

The C++ traits pattern

Re: virtual classes, have you looked at the C++ template traits pattern?

I know templates get a lot of flack for their syntactic ugliness, but this is one problem they can really solve quite well.

Basically you make everything a template, and inherit from a traits class passed in as one of the template parameters. I can then instantiate an Actor<MyCustomData>, which can inherit custom methods, fields, etc. There is a lot of knowledge in the C++ community about how to deal with whole familes of relations between such constructs, for instance how to implement things like:

	Collision<CustomCollisionData> Actor<CustomActorData>::DetectCollision();

and have the code be able to query the CustomActorData traits to automatically determine the appropriate type for CustomCollisionData.

Modern C++ Design, by Andrei Alexandrescu, is pretty much the definitive text on this stuff: check out http://erdani.org/

What about the verbosity problem?

Paul Graham's article about what makes hackers prefer some programming languages over others makes the interesting point that programmers are lazy -- concise syntax is highly desireable. I think this is part of why I enjoy coding in Python but hate coding in Java. Tim's idea of adding more annotations seems like it would make this problem even worse. I've been thinking over compromises but I'm curious if anyone else has thought about this.