C++ has indeed become too "expert friendly"

Once upon a time we might have leavened the LtU fare with lighter articles like this: The Problem with Programming, an interview with Bjarne Stroustrup.

Now I see that stuff on programming.reddit.com.


unreliable parts

Thanks, that article's worth reading. I especially liked the following statement, which seemed rather close to my own assessment of why so much software seems bad.

Stroustrup: Software developers have become adept at the difficult art of building reasonably reliable systems out of unreliable parts.

The process doesn't converge on good behavior. As you add more unreliable parts, the result diverges, getting less reliable in more places. C++ wasn't designed to support scavenger-based coding, where folks find piles of code in libraries here and there to be sintered together. But that's largely what everyone wants to do and tends to do. Small flaws can cause (well, usually do cause) subtle problems that get harder to find as the code base grows.

I guess Bjarne is pointing out that the expected recipe for successful development in C++ takes a small number of talented cooks with above-average skills (for today's definition of average). Does "expert friendly" emphasize the wrong issue?

Anyway, I think C++ makes a fine assembly language for implementing some other higher-level platform with fewer clear options for doing the wrong thing. Piles of code written in something like this might be less prone to chaotic interactions, especially if it were willing to "waste" both compile and run-time cycles on looking for (and reporting) problems, so bad behavior was probabilistically detected more often than in classical C++ systems.

Efficient languages need not be unreliable.

Take C++ for example*:

- replace its current type system with a static strong type system
- remove casts
- add algebraic types
- remove pointer arithmetic
- make pointers non-nullable by default
- make array access bounds-checked unless the size is known at compile time
- remove pointer to void

and a much safer language would arise, one with the same efficiency as current C++ but without its unreliability...

(*) The list is by no means exhaustive.
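Interestingly, several items on that list later acquired library-level approximations in standard C++ itself. Purely as an illustration of the flavor being asked for (not of the proposal itself), here is a sketch using std::variant for algebraic types and checked element access; everything used is standard C++17:

#include <iostream>
#include <variant>
#include <vector>

//an algebraic (sum) type: a Shape is exactly one of these alternatives
struct Circle { double radius; };
struct Square { double side; };
using Shape = std::variant<Circle, Square>;

double area(const Shape& s) {
    //no cast, no tag field to forget: the variant knows what it holds
    if (auto c = std::get_if<Circle>(&s))
        return 3.14159265 * c->radius * c->radius;
    auto& sq = std::get<Square>(s);
    return sq.side * sq.side;
}

int main() {
    std::cout << area(Shape{Circle{2.0}}) << '\n';
    std::vector<double> v{1.0, 2.0, 3.0};
    std::cout << v.at(1) << '\n';  //bounds-checked: throws instead of scribbling
    //v.at(10);                    //would throw std::out_of_range
}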

You forgot the most important one

- remove delete/free

In other words, remove manual memory management. But we already have that: it's Java.

Not really.

remove delete/free

Given that it was me that started this thread, I can assure you GC is the #1 requested feature in my list. But I don't hold my breath...:-)

But we already have that: it's Java.

C++'s semantics differ from Java's (as you already know, of course). Java does not have value objects, stack-based allocation, templates, etc. I would hardly call it the same. C++ is more efficient due to these features.
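For concreteness, a minimal sketch of what value objects and stack allocation buy (the types here are illustrative):

#include <cstdio>

struct Point { int x, y; };      //a value object: no heap, no indirection

struct Rect {
    Point topLeft, bottomRight;  //members stored inline, not as references
};

int main() {
    Rect r = {{0, 0}, {10, 20}}; //the whole aggregate lives on the stack
    std::printf("%d\n", r.bottomRight.x);
}   //no delete, no collector: the storage vanishes with the frame

In Java (at least as of this writing) each Point would instead be a separate heap object reached through a reference.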

Beat Me To It

It's important to note that the request was for a language as efficient as C++. That's an anti-goal for Java, as I'm sure everyone reading this knows. Of course, everyone reading this likely also knows that I believe the sane alternative to C++ is O'Caml, but I won't recapitulate the reasons why here. :-)


If you allow stack-based

If you allow stack-based allocation, you again have a potential source of unreliability:

A* bug()
{
  int i;               //i lives in bug()'s stack frame
  return new A(&i);    //A keeps &i, but i dies when bug() returns: dangling
}

Assume A saves the pointer to i as its state. And now? Forbid taking the address of stack-allocated objects as well?

When it comes to "safe and efficient" programming, the first language I can think of is O'Caml. Why not use that?

Ignoring the C++ and O'Caml

Ignoring the C++ and O'Caml debate for a moment. I think that the code snippet highlights a fundamental problem with the C/C++/D/etc. mode of thinking that a pointer is a pointer is a pointer.

In other words it seems to me that the following notions should all be different types:

- a pointer to an object in a stack frame,
- a pointer to an object allocated with new
- a pointer to memory returned from malloc et al.
- a pointer to an array of objects allocated with new[]
- a pointer to an array of objects in a stack frame
- a pointer to a temporary
- a shared pointer
- a memory address

Of course in C++ this is all really just a "pointer" which means little more than "memory address".

IMHO a large number of the safety problems with C++ et al. are related to this fundamental issue.

Now getting back to the debate, my contribution would be: why not just use Cyclone?

CCured

Now getting back to the debate, my contribution would be: why not just use Cyclone?
Or possibly something like CCured.
CCured is an extension of the C programming language that distinguishes among various kinds of pointers depending on their usage. The purpose of this distinction is to be able to prevent improper usage of pointers and thus to guarantee that your programs do not access memory areas they shouldn't access. You can continue to write C programs but CCured will change them slightly so that they are type safe.

CCured is a source-to-source translator for C. It analyzes the C program to determine the smallest number of run-time checks that must be inserted in the program to prevent all memory safety violations. The resulting program is memory safe, meaning that it will stop rather than overrun a buffer or scribble over memory that it shouldn't touch.
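CCured infers its pointer kinds (SAFE, SEQ, WILD) automatically, so the following is only a hand-rolled C++ approximation of the sort of bounds-carrying pointer and run-time check it arranges; the SeqPtr name is made up:

#include <cassert>
#include <cstddef>

//a stand-in for a CCured-style SEQ pointer: it carries its bounds
//and checks every indexed access at run time
struct SeqPtr {
    int* p;
    int* end;
    int& operator[](std::size_t i) const {
        assert(p + i < end && "out-of-bounds access caught at run time");
        return p[i];
    }
};

int main() {
    int buf[4] = {1, 2, 3, 4};
    SeqPtr s{buf, buf + 4};
    int x = s[2];   //fine
    //int y = s[7]; //would stop the program instead of overrunning buf
    return x;
}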

Or Eiffel or Ada

Eiffel and Ada both produce efficient code on par with C++, come with most of the things Achilleas is asking for (except algebraic data types), are available right now, are tried and tested, and are easy enough for any C++ programmer to learn. There's also, as others have pointed out, O'Caml which provides everything Achilleas is looking for. If safety is a concern wrt C++ there's no need to whine, just start using something else.

pointers for horses

Instead of having different types for pointers (I agree on the memory-address distinction), why not put them in different regions? Region-based memory management has lots of nice properties and can be mixed with GC (e.g. by having a parallel GC'ed region).
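For concreteness, a toy region in C++ terms; this is a deliberately naive sketch (a real region system would also handle alignment, object lifetimes and non-trivial destructors):

#include <cstddef>
#include <vector>

//a toy region: objects are bump-allocated and all freed at once when
//the region dies, so there is no per-object delete to get wrong
class Region {
    std::vector<char*> blocks;
    char* cur = nullptr;
    std::size_t left = 0;
public:
    void* alloc(std::size_t n) {
        if (n > left) {                          //need a fresh block
            std::size_t sz = n > 4096 ? n : 4096;
            cur = new char[sz];
            blocks.push_back(cur);
            left = sz;
        }
        void* p = cur;
        cur += n;
        left -= n;
        return p;
    }
    ~Region() { for (char* b : blocks) delete[] b; }
};

int main() {
    Region r;
    int* xs = static_cast<int*>(r.alloc(100 * sizeof(int)));
    xs[0] = 42;
}   //the whole region is reclaimed here, in one shot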

Another possibly simpler solution

Another, possibly simpler, solution would be to remove the address-of operator (operator &), thus disallowing taking the address of a value object.

This solution would solve a whole lot of problems:

1) taking the address of a stack-allocated object.
2) taking the address of a member object.
3) taking the address of an array element.

Items #2 and #3 would help GC as well, since they would rule out pointers into the middle of blocks, solving another fundamental issue with GC in C++.

Passing parameters by reference would be no problem if parameters could be labeled 'in', 'out' or 'inout'.

Example:

class MyObject {
public:
    //a pointer
    MyObject *other;
};

//objects passed by reference
void test(in MyObject obj1, inout MyObject obj2) {
    //error
    obj1.other = obj2;
}

int main() {
    MyObject obj1;
    MyObject obj2;
    MyObject *obj3 = new MyObject;

    //allowed
    obj1.other = obj3;

    //error
    obj1.other = &obj2;

    //invoke by reference
    test(obj1, obj2);
}

Careful

Note that arguments passed by reference might end up in a closure (if the language isn't dull). That means that anything passed like that generally has to be heap-allocated. This is roughly what C# does these days, IIRC.

No problem with closures.

Parameters labeled in/out/inout would be passed by reference, but they would actually be value types; they could not be assigned to pointers. Therefore any assignment to a member or local variable would copy the object. If the programmer desires to keep a reference to an object passed as a parameter, he/she should use a pointer instead; and since pointers would only be able to point to heap-allocated objects, there is no problem with closures.
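As an aside, C++ itself later grew closures that make exactly this trade-off explicit; a minimal sketch in C++11 terms:

#include <functional>
#include <iostream>

std::function<int()> make() {
    int n = -1;
    auto byValue = [n]() { return n; };   //copies n: safe to return
    //auto byRef = [&n]() { return n; };  //would refer to a dead frame
    return byValue;
}

int main() {
    std::cout << make()() << '\n';  //prints -1; the copy outlives the frame
}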

Problem with closures

Consider:

typedef int func(int);

func f(in int a[]) {
  int ff(int i) { return a[i]; };  //ff closes over the parameter a
  return ff;                       //...and escapes f's frame
}

func g() {
  int a[100];
  return f(a);
}

No taking of addresses anywhere. Still, it will blow up if `a' is not put on the heap.

By value... except for arrays

That's a problem that shows up in other places in C. Everything is passed by value, except for arrays. I fully understand how the pointer is passed by value, etc.; still, that does not excuse the behaviour :) The well-known hack of using a struct containing an array simply shows that we should be using a container that copies the underlying array. We'll be back to using pointers nearly everywhere, but at least they'll be explicit.
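For concreteness, the hack and the asymmetry it papers over:

#include <cstdio>

struct Wrapped { int a[4]; };  //the well-known hack: wrap the array

//'a' decays to a pointer, so the caller sees the write
void mutateRaw(int a[4]) { a[0] = 99; }

//the whole struct, array included, is copied
void mutateWrapped(Wrapped w) { w.a[0] = 99; }

int main() {
    int raw[4] = {0};
    Wrapped wr = {{0}};
    mutateRaw(raw);
    mutateWrapped(wr);
    std::printf("%d %d\n", raw[0], wr.a[0]);  //prints "99 0"
}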

Call-by-reference

...was the issue under discussion. ;-) The "in" is supposed to enforce call-by-reference (cbr). It does not really matter for the example whether it's an array or something else you pass. I just chose it because arrays are one sort of large object you typically don't want to copy around the stack all the time.

There is an implicit address-of-value-object in there.

The attribute 'in' makes the object 'a[]' a value object managed by reference. The operation 'a[i]' takes the address of 'a', which is an operation that is not allowed on value objects.

Irrelevant

As I already said, it is irrelevant whether you use an array or any other kind of object. Defining indexing in terms of pointers is a particular ill-design of C that had to go anyway for the goals you have in mind (otherwise arrays could never be value objects).

Regardless, replace the array by a struct and the indexing by member access. Same problem, no "implicit address-of" operator. You can even use a plain int if you prefer.

EDIT: just to make it crystal clear, here is the simplest possible example demonstrating the full issue:

typedef int func();

func f(in int n) {
  int ff() { return n; };
  return ff;
}

func g() {
  int i = -1;
  func ff = f(i);
  i = 0;
  return ff;
}

int main() {
  return g()();  // should give 0
}

Not irrelevant at all.

Defining indexing in terms of pointers is a particular ill-design of C that had to go anyway for the goals you have in mind (otherwise arrays could never be value objects)

I do not really see the problem: any object passed with 'in' attribute would have value semantics, so it would have to be copied. An array without a size is a pointer to a memory block; since it could not be copied, the compiler would flag an error.

Regardless, replace the array by a struct and the indexing by member access. Same problem, no "implicit address-of" operator. You can even use a plain int if you prefer.

EDIT: just to make it crystal clear, here is the simplest possible example demonstrating the full issue:

Not really. The following code:

int ff() { return n; };

creates a copy of 'n', so there should be no problem.

No copying

any object passed with 'in' attribute would have value semantics, so it would have to be copied

I don't understand what you are saying. Where is it copied? Surely not at the call to f, because then it wouldn't be cbr. During closure construction? That would be wrong. Please reread the latter example. There is no and there cannot be any copying, because that would screw the semantics (the program would return -1).

Screw the semantics

If i is copied, it would behave very differently from a &reference (result -1 instead of 0). Maybe that is just "unsupported usage"?

Yes, copying

The purpose of the 'in' modifier on parameters is to avoid copying values, not to pass references. 'In' parameters are implemented as references but treated like values.

If you would like the 'int' to stick around as a variable, then you have to do the following:

typedef int func();

func f(int *n) {
  int ff() { return *n; };
  return ff;
}

func g() {
  int *i = new int(-1);
  func ff = f(i);
  *i = 0;
  return ff;
}

int main() {
  return g()();  // should give 0
}

So what

Not what you said originally, but anyway: just construct an analogous example using an 'out' or 'inout' parameter, which surely is a reference.

I don't think you will win

I don't think you will win at this game... From what Achilleas said I can see that 'out' is not allowed (or valid) to be read and 'inout' is just an 'in' that can receive a new value and return it to the caller (semantically a copy on exit).

As Achilleas says your only bet is to use a pointer.

Options


typedef void func();

func f(out int n) {
  void ff() { n = 0; };
  return ff;
}

func g() {
  int i;
  func ff = f(i);
  return ff;
}

int main() {
  g()();  // crash?
  return 0;
}

AFAICS, you have only three ways to prevent the crash: (1) heap-allocate i, (2) rule out the definition of ff, (3) let ff assign into some safe nirvana that you allocate in the closure. The second option would undermine the regularity of the language and the third would be flat-out insane. Both would cripple the expressiveness of closures, particularly in conjunction with higher-order functions that you might want to use to compute an out result. Of course, that would be a very C-ish design choice, but I thought that Achilleas wanted to repair the language. ;-)

EDIT: There is a fourth option: ruling out the return statement in f, i.e. disallow passing a closure that refers to an out argument out of the scope of that argument. But I'm not sure how this could be statically detected in less trivial examples.

'Func' is an object with a member 'n'.

In your example, 'func' is an object with a member 'n'. If you really want to store a reference to a variable in a closure, your only option would be a heap-allocated variable.

Both would cripple the expressiveness of closures, particularly in conjunction with higher-order functions that you might want to use to compute an out result.

Not really. The case where an out result should be stored in a closure and passed around as a function is rare, and it is covered by the heap-allocated variable solution. The most common case is passing a function as an argument that accepts an out parameter in which it stores its results. For example:

void filter(in List src, out List dst, void func(in List::iterator item, out List dst)) {
    for(List::iterator it = src.begin(); it != src.end(); ++it) {
        func(it, dst);  //func decides whether to copy *it into dst
    }
}

In the above example, a higher order function is used to collect elements of a list into another list.

but I thought that Achilleas wanted to repair the language

I don't have high hopes that someone is listening...it is more of a theoretical exercise...

That's what I originally said.

That's what I originally said, and it is valid for 'out' and 'inout' parameters as well.

For example:

func f(out int n) {
  int ff() { return n; }; //n is copied!
  return ff;
}

func g() {
  int i = -1;
  func ff = f(i);
  i = 0;
  return ff;
}

int main() {
  return g()(); 
}

Let's not forget that 'ff' is implemented as an object allocated on the heap, and therefore all its pointer members should also point to heap-allocated objects.

In other words, if you want to keep a heap-allocated reference to something, that something must itself be a heap-allocated block, and the type system ensures that.

Mh...

That's what I originally said

Sorry, maybe what you meant, but not what you said:

Passing parameters by reference would be no problem if parameters could be labeled 'in', 'out' or 'inout'.

which is what I replied to.

func f(out int n) {
  int ff() { return n; }; //n is copied!
  return ff;
}

This is not the example I had in mind, see my other post (in fact, it doesn't even seem to be valid, making a read access to an out-only parameter).

In other words, if you want to keep a heap-allocated reference to something, that something must itself be a heap-allocated block, and the type system ensures that.

I hope you are aware that this is a severe restriction that implies that closures aren't really closures, because they cannot close over mutable variables from surrounding scopes - they can only copy their content (and if this is done silently, as you seem to propose, then I expect a lot of surprise and subtle bugs).

Actually...

Sorry, maybe what you meant, but not what you said

Actually, that's what I meant right from the start...perhaps you did not understand it.

This is not the example I had in mind, see my other post (in fact, it doesn't even seem to be valid, making a read access to an out-only parameter).

Indeed...sorry for the error, but you get the idea.

Passing parameters by reference would be no problem if parameters could be labeled 'in', 'out' or 'inout'.

which is what I replied to.

Passing parameters by reference does not imply that the parameters are reference types.

I hope you are aware that this is a severe restriction that implies that closures aren't really closures, because they cannot close over mutable variables from surrounding scopes - they can only copy their content

Sorry, but I do not see the restriction. If you want closures over mutable variables from the surrounding stack, use heap-allocated objects. It's simple. Closing over stack variables should be prohibited.

(and if this is done silently, as you seem to propose, then I expect a lot of surprise and subtle bugs)

I don't see that either. The rule is very simple: parameters labeled 'in/out/inout' are objects passed by reference whose address cannot be taken.

Stack-based objects

Forbid taking the address of stack-allocated objects as well?

C# allows passing addresses of stack-allocated objects only as function parameters (ref and out parameters, and the implicit this parameter in methods of such objects). This is a nice compromise.
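The closest C++ analogue of that discipline, sketched below; the language, of course, enforces none of it:

#include <cstdio>

struct Counter { int n; };

//borrowing a stack object for the duration of a call: fine
void bump(Counter& c) { ++c.n; }

//storing the address past the call: the hazard the C# rule prevents
Counter* leaked;
void bad(Counter& c) { leaked = &c; }

int main() {
    Counter local = {0};  //stack-allocated
    bump(local);          //safe: the reference cannot outlive the call
    bad(local);           //compiles in C++; a C#-style rule would reject it
    std::printf("%d\n", local.n);
}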

Java Performance

Java does not have value objects, stack-based allocation, templates, etc. I would hardly call it the same. C++ is more efficient due to these features.

Java will also allocate on the stack when it can prove that it is safe to do so. C++ forces the developer to make the decision, often to disastrous effect.

You can use polymorphism to write generic code in Java, as is done with the standard collection classes. With Java's partial-evaluation/method-specialization feature this gives you the same performance as C++ generics but without the huge memory overhead of having to internally generate type-specific versions for each type. Java's HotSpot compiler only creates specializations of classes which are actually used a lot. Infrequently used classes are not specialized, which saves memory and can even improve performance through better cache utilization.

Java's average amortized memory allocation takes less than three CPU instructions, whereas C++'s malloc/free combination takes around 27 (on SPARC, according to Sun).

The performance advantages of GC are discussed in the famous paper by Andrew Appel, of ML fame, called Garbage Collection Can Be Faster Than Stack Allocation.

Java also derives several other advantages from waiting until run time to perform compilation. It can collect real-world statistics for things like branching or method in-lining so that it can make better choices than a statically compiled language like C++ can.

Consider the example where you have three methods, a(), b(), c(), such that a() calls b() and b() calls c(). Both C++ and Java have size limits for the methods that they will in-line. In C++, if c() is sufficiently small then it will be in-lined into b(). However, this might push b() over the in-lining limit, preventing b() from being in-lined into a(). With Java, c() would only be in-lined into b() if it were actually a hot spot; if it weren't, and b() were a hot spot, b() could still be in-lined into a().
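For concreteness, the shape of that example sketched in C++, where the decision is made once from size heuristics alone (sizes here are purely illustrative):

#include <cstdio>

static int c_(int x) { return x + 1; }      //tiny: easily in-lined
static int b_(int x) { return c_(x) * 2; }  //small, until c_ is pasted in
static int a_(int x) { return b_(x) - 3; }  //may lose the chance to in-line b_

//a static compiler decides once; a JIT that has watched the program run
//can decline to in-line a cold c_ into b_, keeping b_ small enough to
//in-line into a hot a_
int main() {
    std::printf("%d\n", a_(5));
}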

When working with arrays, Java is allowed to make some optimizations that C++ cannot make because of possible pointer aliasing. This is the same reason that C++ is slightly slower than Fortran when working with arrays.

Many C++ programmers assume that because Java doesn't give them the "freedom" to perform many optimizations themselves, those optimizations aren't being done. The truth, however, is that Java has just moved the complexity of deciding when and how to make many types of optimizations out of the hands of the language and its developers and into the compiler and the runtime environment. If you have the compiler technology to do it, that seems like a much better solution.

Java will also allocate on the stack

Java will also allocate on the stack when it can prove that it is safe to do so.

Certainly doable, but I have never actually seen it in any Java compiler so far.

C++ forces the developer to make the decision, often to disastrous effect.

It's not that much of a problem. Local objects are usually passed as references to functions that take by-reference arguments.

In C++, there are some conventions followed by almost everyone:

1) References may point either to local or to heap-allocated data; keeping the address of an object passed by reference is ill-advised, and personally I have never seen it done anywhere.

2) Pointers are usually for heap-allocated data.

With Java's partial-evaluation/method-specialization feature this gives you the same performance as C++ generics but without the huge memory overhead of having to internally generate type-specific versions for each type.

Last time I checked, Java converted primitives to Object-derived instances, something called type erasure, if I recall correctly: the type exists only at compile time; at run time, the usual Object-based code runs.

This certainly does not have the performance advantages of C++, where operations on primitives can be deeply inlined and take advantage of each CPU's dedicated hardware. My own tests indicated that using an Integer instead of an int halves the performance of the code.

But perhaps this slowdown does not matter today; there is plenty of CPU horsepower around.

Java's average amortized memory allocation takes less than three CPU instructions, whereas C++'s malloc/free combination takes around 27 (on SPARC, according to Sun).

I still do not believe that. What about synchronization? The collector has to be locked during allocation. Furthermore, the allocated block most probably has to be added to a global linked list of objects.

Java also derives several other advantages from waiting until run time to perform compilation. It can collect real-world statistics for things like branching or method in-lining so that it can make better choices than a statically compiled language like C++ can.

Consider the example where you have three methods, a(), b(), c(), such that a() calls b() and b() calls c(). Both C++ and Java have size limits for the methods that they will in-line. In C++, if c() is sufficiently small then it will be in-lined into b(). However, this might push b() over the in-lining limit, preventing b() from being in-lined into a(). With Java, c() would only be in-lined into b() if it were actually a hot spot; if it weren't, and b() were a hot spot, b() could still be in-lined into a().

When working with arrays, Java is allowed to make some optimizations that C++ cannot make because of possible pointer aliasing. This is the same reason that C++ is slightly slower than Fortran when working with arrays.

Many C++ programmers assume that because Java doesn't give them the "freedom" to perform many optimizations themselves, those optimizations aren't being done. The truth, however, is that Java has just moved the complexity of deciding when and how to make many types of optimizations out of the hands of the language and its developers and into the compiler and the runtime environment. If you have the compiler technology to do it, that seems like a much better solution.

Agreed, but I also disagree :-). Although what you say has a basis, in reality the yields of these optimizations do not seem to make Java equal to C++, let alone faster.

I recently had this debate with a colleague of mine over sorting (quicksort). Even with the latest Java (1.5), C++ is faster (Microsoft STL 8.0), even with the program doing nothing other than sorting. And the bigger the array we used, the wider the gap between C++ and Java. Finally, Java stopped working around the 100,000,000-integer mark, whereas the C++ program managed to finish, although with a severe blow to the page file...

Of course the above is my opinion, and in no way conclusive...but, on the other hand, I have not seen real-life examples of Java being faster than C++...

Escape Analysis

Certainly doable, but I have never actually seen it in any Java compiler so far.

Java SE 6 performs escape analysis in order to decide when it is safe to allocate on the stack.

I still do not believe that. What about synchronization? The collector has to be locked during allocation. Furthermore, the allocated block most probably has to be added to a global linked list of objects.

Objects are allocated from thread-local pools, so synchronization is only required infrequently, when a thread needs a new pool. With Java you have two options for improving performance: you can buy better or more CPUs, or you can buy more memory. With C++ you only have the CPU option. With one recent application I got a 35% performance improvement just by adding an extra half gig of memory. The extra memory made memory allocation very cheap because virtually all objects could be collected for free from the eden space. It's nice to have this option with Java, as you can't always add more or bigger CPUs, and depending on your application, the performance/$ gain from extra memory is often cheaper than from CPU.

There exist primitive collection classes for Java, but you're correct that primitives are faster/cheaper, and it would be nice if you didn't have to write them separately.

I recently had this debate with a colleague of mine over sorting (quicksort). Even with the latest Java (1.5), C++ is faster (Microsoft STL 8.0), even with the program doing nothing other than sorting. And the bigger the array we used, the wider the gap between C++ and Java. Finally, Java stopped working around the 100,000,000-integer mark, whereas the C++ program managed to finish, although with a severe blow to the page file...

This points to a genuine advantage of C++, namely that garbage collection doesn't work well in combination with virtual memory, because the garbage collector keeps your working set artificially large. Java works best with lots of memory. Low memory makes garbage-collected allocation more expensive, and low memory leads to virtual-memory paging, which doesn't work well with garbage collection either. A double whammy. This isn't an issue for me, though, as my laptop has two gigs of RAM and our servers usually have tens of gigs.

Also, when benchmarking Java, be sure to use the "-server" option and give it a few runs to warm up the JIT. I've noticed that application performance can continue improving even after a day of uptime.

Partially agreed.

Java SE 6 performs escape analysis in order to decide when it is safe to allocate on the stack.

But that analysis succeeds only in trivial cases. It is not possible for a complex operation, especially when it is the allocated object's state that dictates whether it should be allocated on the heap or on the stack. In other words, if it were possible to determine in all cases whether an object should be on the heap or on the stack, we might as well determine when an object should be deleted, and therefore get rid of garbage collection altogether.

For example, an object A might have a pointer to another object B. When this pointer is null, object A can live on the stack. But when the pointer is not null, the pointed-to object B may point back to A, and therefore A cannot live on the stack.

And the run-time analysis takes CPU cycles (and memory) away...
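A sketch of that data-dependent escape, using the hypothetical A and B from above:

#include <cstdlib>

struct A;
struct B { A* back; };  //B may point back at an A

struct A { B* b; };

B* maybeLink(A& a) {
    if (std::rand() % 2) {     //decided at run time, not at compile time
        B* heapB = new B{&a};  //&a is stored in a heap object here...
        a.b = heapB;
        return heapB;          //...and that object escapes the call
    }
    return nullptr;            //on this path nothing escapes
}

int main() {
    A a = {nullptr};  //a static analysis cannot know which path runs, so it
    maybeLink(a);     //must conservatively treat 'a' as escaping and keep it
    return 0;         //off the stack
}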

This points to a genuine advantage of C++, namely that garbage collection doesn't work well in combination with virtual memory, because the garbage collector keeps your working set artificially large. Java works best with lots of memory.

That's a very serious problem. Most people usually have a few Word documents open, a browser, an e-mail client, we developers an IDE (or two), a torrent client and a chat application. If all these apps were in Java, a modern computer would have a hard time dealing with all of them.

That's the reason people prefer µTorrent to Azureus: µTorrent feels lightweight and fast, Azureus feels heavy and slow. And the same is true for many other apps: Microsoft Office vs. StarOffice (the Java version), Firefox vs. Java browsers, Thunderbird vs. Java e-mail clients, etc.

Escape analysis


Java SE 6 performs escape analysis in order to decide when it is safe to allocate on the stack.

In my C++ code, there are mainly three kinds of "value typed" objects:

(1) Simple structs (like POINT, RECT) on the stack. These objects are typically short-lived, and they have no destructor, so allocating them on the heap and letting the GC take care of them wouldn't really hurt performance.

(2) Objects that are members of classes. Having members allocated separately on the heap adds a small memory overhead (for the pointer) and hurts data locality. If performance is important, this can be bad.

(3) Containers. This is (for me) by far the most important use case for native types: if I create an array/vector of points, complex numbers (think number crunching, signal processing) or RGB values (think image processing), I really need the elements to be consecutive in memory, otherwise I won't get anywhere near acceptable performance.
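To make case (3) concrete (sizes illustrative):

#include <cstdio>
#include <vector>

struct RGB { unsigned char r, g, b; };

int main() {
    //in C++ a vector of value objects is one contiguous block:
    //exactly what number crunching and image processing want
    std::vector<RGB> image(1024 * 768);
    for (RGB& px : image) px.r = 255;

    //the Java-style layout would be an array of references, each element
    //a separate heap object: an extra indirection per pixel, scattered memory
    std::printf("%zu bytes, contiguous\n", image.size() * sizeof(RGB));
}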

I'm no expert on Java VM technology, can it really solve the important cases (2) and (3)? In real-world code?

No

There are no solutions for 2) or 3) in current JVMs other than manually inlining the class contents. Note however that the data-locality hits are not as bad as you think, as generational garbage collectors will often (but not always) end up placing closely related objects adjacent in memory. Because of this, I'd say that 2), in practice, is rarely a problem. 3) is, and this does limit Java's penetration in scientific environments.

Argh!

Java will also allocate on the stack when it can prove that it is safe to do so. C++ forces the developer to make the decision, often to disastrous effect.

Is "often" misunderstanding LIFO/stack scoping a level of incompetence and cluelessness we should expect from developers? It's one of the simplest patterns of object lifetimes, and by remembering a few simple rules (don't store references to stack-allocated objects or return them), you can decide whether or not it applies.

The performance advantages of GC are discussed in the famous paper by Andrew Appel, of ML fame, called Garbage Collection Can Be Faster Than Stack Allocation.

Argh! Let's keep this in perspective. The paper *actually* says that the crossover point for GC vs. explicit free occurs when you have 7 times as much physical memory as active memory. Even with a bad malloc that incurs terrible fragmentation, that's still probably more than three times as much memory as you're actually using, or about two years' worth of hardware progress thrown away.

Also, on the general "sufficiently smart compiler" argument, do you (or does anyone) know how many of the optimizations you mention are actually done by hotspot/jikes/whatever? My impression is that the Lisp and Smalltalk folks have had decades to put these into production compilers, with limited success.

Sufficiently Smart Compiler

Also, on the general "sufficiently smart compiler" argument, do you (or does anyone) know how many of the optimizations you mention are actually done by hotspot/jikes/whatever?

All of the optimizations that I mentioned are performed by HotSpot. Escape analysis and allocating on the stack when it is safe to do so were added in Java SE 6, i.e. Mustang, but all of the other optimizations have been available for several years.

You can read more about escape analysis in Java here.

Developer skill

Is "often" misunderstanding LIFO/stack scoping a level of incompetence and cluelessness we should expect from developers?

In general, we should assume that errors will occur in all individual human endeavors, and that while error rates will (usually) decrease as the skill of the participant increases, they will never decrease below some base level. In computer programming, that base level of error rate turns out to be "often". For software engineering, you can get the error rate to decrease below that level with extensive automated testing, automated error analysis, and multi-developer review, but even that is still not enough for high-reliability domains.

...and by remembering a few simple rules...

Excellent. One couldn't ask for a pithier recipe for engineering failure. I may have to have a plaque made of it. It'll have a picture of the Challenger, a chart of the flight path of the Mars Climate Orbiter, and a source fragment from the Morris Worm, commemorating those who couldn't quite remember a simple rule. Systems that require human beings to consistently remember and follow rules, without providing automatic cross-checks and backstops, are simply failures waiting to happen.

In computer programming,

In computer programming, that base level of error rate turns out to be "often".

I'm sorry, but I have to call bullshit here. First, I'm not saying "all programming is easy", I'm saying "certain concepts are easy," or "I would not hire someone to program who cannot understand certain basic things." Second, we accept certain levels of error in nearly all domains. We let people use hammers and nails to build things, even though they hammer their thumbs occasionally. Because hammers can hurt thumbs, we take them away from children, but we trust adults, as we should trust software developers (unless it's "mission critical" in a way you're willing to back up with money).

Excellent. One couldn't ask for a pithier recipe for engineering failure.

Again with the all-or-nothing reasoning. Even better, it's accompanied by irrelevant examples (hardware inspection failure, metric-for-standard units, buffer overrun). All human activities have rules and require some degree of learning and understanding. And again, we tolerate certain levels of failure (driving) and/or restrict the set of people who can engage in the activities (smoking, driving, drinking).

Skill only gets you so far

I'm saying "certain concepts are easy," or "I would not hire someone to program who cannot understand certain basic things."

Sure, but even skilled developers working with easy concepts will produce error rates that are simply too high for most domains, unless backstopped by intensive testing, automatic error prevention systems, and code review. The state of the art really is that bad. Skill and understanding simply don't make a dent in the rates of many large classes of error, and there are practical limits as to how disciplined most developers are capable of being on their own. The rules behind manual stack allocation are easy, and most C++ developers are fairly skilled. That still means that screwups will occur more often than is acceptable in many domains. These screwups can be completely avoided by using memory-safe languages.

Even better, it's accompanied by irrelevant examples (hardware inspection failure, metric-for-standard units, buffer overrun)

Actually, these are quite relevant. There exist mainstream languages now that allow automatic compiler checking of unit types, and pretty much every language bar C/C++ prevents buffer overruns, some at compile-time. The Challenger fits as well. The proximate cause wasn't a hardware inspection failure, but a failure to follow an operational rule (don't launch when the temperature is below X) under the influence of performance pressures, previous schedule slips, tight deadlines, and patchy domain understanding. The analogy to software development is left to the reader.

The rules for preventing all of these failures were simple and easy to follow, indeed simpler than the rules for stack-allocating objects in C++. The rules got followed faithfully and skillfully by diligent and clever engineers. Well, most of the time, anyway. Call it 99 times out of 100, which is really about all you can hope for. Sadly, it wasn't good enough.