Petition for adding garbage collection to C++.

Dear fellow LtU members,

I took the initiative and started a petition for adding garbage collection to C++. You can find it here:

petition for adding garbage collection to C++.

Maybe if we get enough signatures, the C++ standards committee will hear us!


Not trying to be inflammatory...

Are there any sizeable application domains for which garbage-collected C++ would be appropriate? I would have thought that most of the C++ development remaining was in realms where garbage collection is viewed as inappropriate (games, embedded, etc.). Areas where GC would be a big win for C++ (enterprise, web) have pretty much been ceded to Java and the new scripting languages. Again, I'm not trying to flame; I'm just trying to figure out who would benefit from this.

Graphics

There are many applications where high throughput is required but where occasional latency is acceptable (unlike games and embedded applications). This includes things like 3D modelling and animation packages, and 2D image processing and compositing packages. These types of applications are typically structured as a C++ core, an array of C++ modules, and a scripting language to drive these modules. GC might help in these domains, although these applications sometimes require quite complex resource management that might make them unsuitable for GC.

About games...

Maybe Paul or Tim can double-check this, but at POPL 2006, Tim Sweeney remarked that the Unreal engine uses garbage collection everywhere. IIRC, the reason was that latency bugs in manual memory management were just too annoying to debug, and realtime GC just eliminated that entire class of errors.

GC in games

I'd be surprised if GC was actually used *everywhere* -- I'm guessing it's mainly for actors, resources, and other stuff that persists between frames. Maybe Tim can speak more on this.

A language like Java (a flat GC'ed memory space, where structs cannot contain other structs) is death for interactivity. To create a data structure with 10,000 items (which can occur every frame in some cases) you need to either create 10,000 objects which are GC'ed, or arrays with 10,000 elements (a hack).

It's interesting that Tim actually considered and rejected Java as a candidate for Unreal's scripting language...

IIRC...

Tim, please correct me if I misremember, but:

I think the Unreal technology GCs everything that is reflected in both C++ and UnrealScript. Of course, if you launch UnrealEd, extract the UnrealScript code from UT2004, and start browsing, that's a lot of stuff, and no doubt there's even more in the forthcoming Unreal Technology 3 titles.

I don't know what "structs cannot contain other structs" means.

There were good reasons for Tim to reject Java, especially at the time. There might be a VM with a good enough real-time GC today, but nevertheless, UnrealScript has some features, e.g. language-level support for hierarchical finite-state machines, that are really helpful to game developers and that Java lacks. IMHO, UnrealScript is an excellent 1995 design, but clearly Tim is heading in other directions these days. I can't wait to see the language he uses in Unreal Technology 4.

Composite objects

I don't know what "structs cannot contain other structs" means.

I'm blanking on the proper term for this -- but what I meant was that an object in Java can only be composed of primitives and references (it cannot contain arrays or other objects inline). This increases your total reference count, and thus your total GC workload. The workaround is to "denormalize" your class structures, but this trades simplicity of design for efficiency.

Gotcha!

Ah, you mean that you can't arrange for a couple of objects to be laid out end-to-end in memory, rather than maintaining a pointer from one to the other. Gotcha!
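The difference can be shown with plain C++ (a hypothetical Point/Segment pair, not from any particular engine): embedding gives one contiguous object with no internal references for a collector to trace, while the Java-style layout needs a separate heap object and a traced reference per component.

```cpp
#include <cstddef>

// Inline composition: the Points live inside the Segment itself.
struct Point   { double x, y; };
struct Segment { Point a, b; };           // one allocation, zero internal pointers

// Java-style composition: each Point is a separate heap object,
// and the "struct" holds references the GC must trace.
struct RefSegment { Point* a; Point* b; };

// A Segment is just four doubles laid out end-to-end.
inline bool inline_layout_is_flat() {
    return sizeof(Segment) == 2 * sizeof(Point);
}
```

With 10,000 Segments per frame, the inline layout is one array of plain data, while the reference layout creates 30,000 heap objects for the collector to track.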

don't forget

"most of the C++ development remaining was in realms for which garbage collection is viewed as inappropriate (games, embedded, etc.)"

don't forget most infrastructure and core system services in most modern platforms...

I was under the impression...

that that was still mostly vanilla C, rather than C++.

Am I missing something when

Am I missing something when I suggest using Boehm's GC library?

We've done it in Inkscape,

We've done it in Inkscape, but it's painful to integrate. For a desktop-style application (particularly a graphics one) you don't have the luxury of using it in full malloc-replacement mode because RGBA image buffers have a tendency to look like arrays of pointers.

The Boehm GC also doesn't play nicely with libraries that create their own threads, or with tools like valgrind; the necessary setup for the GC to be able to inspect the C stack is fairly intrusive.

(Obviously if I had the opportunity to rewrite from scratch, we wouldn't be doing Inkscape in C++.)

It's not true that there are no technical obstacles in GC in C++

A conservative GC like Boehm's is prone to memory leaks because it must treat any bit pattern that looks like a pointer to some object as a potential pointer which keeps that object alive. I've heard that this actually happens when the memory used is comparable in size to the address space (in the range of gigabytes for 32-bit machines) and contains lots of random-looking bytes (e.g. compressed video).

It's also slow. I've tested that it's even slower than malloc. It must be slow because a potential pointer may point into the middle of an object, so the GC must find the beginning of each object; because it can't move objects to other addresses; and because it doesn't know where pointers are located, so it must scan all allocated memory.

OTOH a precise GC is incompatible with traditional C++ assumptions about data representation, e.g. that pointers are POD types, or that structures without virtual member functions have no allocated fields besides those specified by the programmer.
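To make the "find the beginning of each object" cost concrete, here is a toy version of the lookup a conservative collector must perform for every candidate pointer. This is a sketch of the idea only, not Boehm's actual data structure (which uses block headers rather than a search tree), and the addresses are made up:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// base address -> allocation size: a stand-in for the collector's heap metadata
using AllocMap = std::map<std::uintptr_t, std::size_t>;

// Given a word that might be a pointer, find the allocation it falls inside,
// or 0 if it points at no live object. Interior pointers (into the middle of
// an object) must keep the whole object alive.
inline std::uintptr_t find_base(const AllocMap& heap, std::uintptr_t addr) {
    auto it = heap.upper_bound(addr);      // first allocation starting above addr
    if (it == heap.begin()) return 0;      // nothing at or below addr
    --it;                                  // candidate: nearest base at or below addr
    if (addr < it->first + it->second)     // inside [base, base + size)?
        return it->first;
    return 0;                              // stray bit pattern: not a pointer
}

// A tiny fake heap for demonstration.
inline AllocMap sample_heap() {
    return AllocMap{{0x1000, 16}, {0x2000, 64}};
}
```

A real collector performs this check for every pointer-sized word on the stack and in scanned memory, which is part of why conservative scanning costs what it does.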

Already being considered

In this article, Stroustrup says that optional GC will most likely be part of the upcoming standard:

C++0x will most likely support optional garbage collection and it will support concurrency in the form of a machine model plus standard library facilities supporting threads (and maybe more). Some would consider that radical, but I don’t; people have used garbage collection with C++ for years (where that makes sense) and just about everybody uses threads sometime. In these cases, the issue is simply to standardize current practice.

Are there any sizeable

Are there any sizeable application domains for which garbage-collected C++ would be appropriate?

There are many domains where garbage-collected C++ is needed:

  • desktop applications. There are big applications like Word, Excel, Firefox etc. for which the performance of other languages is not adequate. For example, 20% less performance in Word would result in big delays in document repagination. Personally I have seen Word, Excel, Firefox, Explorer, Visual Studio and other flagship applications die horribly from memory management / pointer problems.
  • defense applications. Many apps in this sector have a real-time part and a non-real-time part where garbage-collected C++ fits well.
  • game development tools. Many developers are frustrated at having to write their tools in a different programming language than the one their game is written in.
  • mobile applications. Check out this post...there are many similar posts around.

Areas where GC would be a big win for C++ (enterprise, web) have pretty much been ceded to Java and the new scripting languages

The only reason Java exists is because C++ does not have garbage collection. Take that away, and nobody will need Java anymore. Remember that Java began as the language Oak, and Oak was created after someone tried to build an infrastructure for mobile telephony in C++ and failed.

A conservative GC like Boehm's is prone to memory leaks because it must treat any bit pattern that looks like a pointer to some object as a potential pointer which keeps that object alive. I've heard that this actually happens when the memory used is comparable in size to the address space (in the range of gigabytes for 32-bit machines) and contains lots of random-looking bytes (e.g. compressed video).

It's also slow. I've tested that it's even slower than malloc. It must be slow because a potential pointer may point into the middle of an object, so the GC must find the beginning of each object; because it can't move objects to other addresses; and because it doesn't know where pointers are located, so it must scan all allocated memory.

I confirm this, and thanks a lot for the input. Indeed, Boehm's GC is good work, but not the appropriate solution.

In this article, Stroustrup says that optional GC will most likely be part of the upcoming standard

Unfortunately, the new upcoming standard in 2009 does not have garbage collection.

people have used garbage collection with C++ for years

I have not seen a garbage-collected C++ application in my entire life. Where did Mr. Stroustrup find them?

Source?

Unfortunately, the new upcoming standard in 2009 does not have garbage collection.

Can you give a source? I couldn't find any final word on the inclusion or exclusion of garbage collection, although I may have missed it, as I don't follow too closely what's happening regarding C++0x.

In any case, that they would have "most likely" added it shows that they are aware of the fact that GC in C++ would be a good thing and considered it seriously, and if they dismissed it, there probably was a reason. So instead of petitioning, you might rather want to identify why the old proposal was not appropriate, write a better proposal and submit it...

I do not have a final word

I do not have a final word either, but my impression is that they will not put GC in C++0x...and this impression is reinforced by the fact that the C++ standards committee gives priority to things other than GC. There is only one vague reference to the article Stroustrup posted on Artima, and that's it.

EDIT:

My opinion in favor of C++ has been reinforced lately after finding pages like this. How's that for a "simple-to-use language that the rest of us can use"?

EDIT:

The artima article also says that the C++ committee has stopped accepting any other proposals.

Unfortunately, the new

Unfortunately, the new upcoming standard in 2009 does not have garbage collection.

What do you mean? The standards committee is nowhere near completion. The C++ standards committee only finished accepting proposals last autumn/winter; now they are carefully reviewing them all, and a draft spec is estimated to be available in 2007/8. As far as I'm aware, there has been no official confirmation of whether or not GC will be accepted into C++0x.

All indications suggest that, at the very least, C++0x will acknowledge GC within the C++ standard itself by providing more defined behaviour; in other words, C++0x will have better support for optional GC.

Making a petition is unnecessary because the standards committee has been discussing GC in C++ for years.

desktop applications. There

desktop applications. There are big applications like Word, Excel, Firefox etc. for which the performance of other languages is not adequate. For example, 20% less performance in Word would result in big delays in document repagination. Personally I have seen Word, Excel, Firefox, Explorer, Visual Studio and other flagship applications die horribly from memory management / pointer problems.

Big applications like that are not speed-critical. In cases like that, improving the algorithms used yields a much greater speed boost than using a faster language. Even so, just because a language isn't as low-level as C++ doesn't mean it's not fast (or faster). Most of the .NET languages are more than fast enough for pretty much anything outside of a few uncommon application domains. If you really care about garbage collection and speed, you could give O'Caml a try, which is about as fast as C++ for most things (although I personally can't stand the language).

game development tools. Many developers are frustrated at having to write their tools in a different programming language than the one their game is written in.

That's hardly a reason. Not everyone likes bashing their head on a rock because they don't like to learn.

the old mantra

"The only reason Java exists is because C++ does not have garbage collection. Take that away, and nobody will need Java anymore."

except, of course, to run and support the tons of legacy Java code that's been developed in the meantime while C++ was sleeping...

as they say, old programming languages never really die.

www.inkscape.org

The vector drawing program inkscape uses libgc.

why do a petition?

despite the general opinion, the c++ committee is not ignoring the necessity of natively supported garbage collection for some of the language's application domains.

see this paper for work on the topic by boehm et al. (which, btw, has already been in active discussion for a year now)

you can follow any of the bigger recent newsgroup threads about the topic for insight into the technical and design issues around gc and c++, for example this posting fest spawned by alexandrescu where various committee members attended.

personally i hope for gc in c++0x, but it may very well be that the next standard will not provide it. if so, though, it will be because of the technical and/or democratic complexity of introducing a fundamental change to the language, and not for lack of interest.

Thanks a lot for the link.

Thanks a lot for the link. The author's views are similar to mine.

I really liked the paragraph where it says that "C++ is used in domains where manual memory management is not needed". At last, after all these years, someone recognizes the truth.

personally i hope for gc in c++0x, but it may very well be that the next standard will not provide that.

That's what I am afraid of.

because of the technical and/or democratic complexity of introducing a fundamental change in the language

It is not that difficult to add GC to C++. Microsoft already has C++/CLI out. Why should we have to wait so many years for something so important?

It is not that difficult to

It is not that difficult to add GC to C++. Microsoft already has C++/CLI out. Why should we have to wait so many years for something so important?

Because it's nontrivial; there are many issues to resolve: choosing an appropriate model for C++, how it affects the rest of the standard, defining behaviours and what is undefined and left as an implementation detail, etc., etc.

If you follow the mailing list of the C++ standards committee you will see how difficult it really is and how many years it's been under discussion.

Quite. A LtU discussion of

Quite. A LtU discussion of the specific semantic implications is, of course, appropriate.

I am aware of the technical

I am aware of the technical details of GC and I see no great difficulty in introducing one into C++. There are technical difficulties, but nothing that cannot be solved.

well...

It is not that difficult to add GC to C++. Microsoft has already C++/CLI out.

c++/cli introduced the concept of a managed pointer type which represents a boxed value, in contrast to existing raw pointers which directly represent a value at a particular memory location. this additional semantics allows for a better gc (e.g. reduced memory usage through relocation), but essentially breaks compatibility with large parts of existing idiomatic pointer usage.

this may be ok for microsoft, where historically standard idioms were ignored in favour of (often inferior) proprietary alternatives, but it is clear (and comforting) that standard c++ is bound to follow existing conventions in future extensions.

boehm proposes to add currently non-intrusive gc solutions intrusively, while maintaining known pointer semantics. ensuring backwards compatibility while keeping the proposal implementable is non-trivial at best.

by the way, native gc support may be politically important to increase public acceptance of c++, but i'd like to stress that there already are viable solutions, like the free boehm collector or commercial alternatives, that have been successfully applied to projects of any size.

c++/cli introduced the

c++/cli introduced the concept of a managed pointer type which represents a boxed value, in contrast to existing raw pointers which directly represent a value at a particular memory location. this additional semantics allows for a better gc (e.g. reduced memory usage through relocation), but essentially breaks compatibility with large parts of existing idiomatic pointer usage.

There are certain things that shall not be done with GC pointers, for example casting a pointer to an integer and back. Semantics like these are the responsibility of the end user of the language. The language shall provide a clear way to say whether a pointer is GC'd or not, and Microsoft's solution is a step in the right direction.

Personally I would not change the syntax or add new pointer types, but I would have the following options:

  • globally enabled garbage collection, via a compiler flag or pragma; programs would be garbage-collected as is, with no new syntax.
  • controlling GC areas with pragmas.
  • using a special attribute 'gc' before classes and pointers to make them GC'able.
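As a rough illustration of the third option, here is a toy, library-level sketch of opt-in collected pointers. All names here are hypothetical, and this is deliberately minimal: it tracks only roots held in `gc_ptr` slots and does not trace object fields, which any real collector would have to do.

```cpp
#include <cstdlib>
#include <set>

namespace toygc {

inline std::set<void*>&  allocations() { static std::set<void*>  s; return s; }
inline std::set<void**>& roots()       { static std::set<void**> s; return s; }

// Allocate collectable memory; freed by collect() once unreferenced.
inline void* gc_alloc(std::size_t n) {
    void* p = std::malloc(n);
    allocations().insert(p);
    return p;
}

// Opt-in GC pointer: registers its own slot as a root while it exists.
template <typename T>
class gc_ptr {
public:
    explicit gc_ptr(T* p = nullptr) : p_(p) { roots().insert(slot()); }
    gc_ptr(const gc_ptr& o) : p_(o.p_)     { roots().insert(slot()); }
    ~gc_ptr()                              { roots().erase(slot()); }
    T* get() const { return p_; }
private:
    void** slot() { return reinterpret_cast<void**>(&p_); }
    T* p_;
};

// Free every allocation not referenced by a live gc_ptr; return the live count.
inline std::size_t collect() {
    std::set<void*> reachable;
    for (void** r : roots())
        if (*r) reachable.insert(*r);
    for (auto it = allocations().begin(); it != allocations().end();) {
        if (!reachable.count(*it)) { std::free(*it); it = allocations().erase(it); }
        else ++it;
    }
    return allocations().size();
}

}  // namespace toygc

// One pointer kept, one dropped: collect() should reclaim exactly the dropped one.
inline std::size_t demo() {
    toygc::gc_ptr<int> kept(static_cast<int*>(toygc::gc_alloc(sizeof(int))));
    { toygc::gc_ptr<int> dropped(static_cast<int*>(toygc::gc_alloc(sizeof(int)))); }
    return toygc::collect();
}
```

The point of the sketch is that the opt-in boundary is expressible without new pointer syntax; the hard parts a real proposal must settle (tracing fields, interior pointers, interaction with raw `T*`) are exactly the ones this toy leaves out.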

ensuring the backwards-compatible nature while keeping the proposal implementable is non-trivial at best.

Care to elaborate? I can truly find nothing very difficult in keeping backwards compatibility.

but i'd like to stress that there already are viable solutions like the free boehm collector or commercial alternatives that have been successfully applied to projects of any size.

Of course there are. I have even written my own. But, in the end, if it is not part of the official standard, it is avoided. Perhaps not by me, but by the managers (and I have examples of this).

difficulties

The language shall provide a clear way to say that a pointer is GC'd or not, and Microsoft's solution is a step in the right direction.

a syntactic difference between collected and non-collected pointers is a matter of taste. a semantic difference is rather dangerous though. for example, CLI arrays cannot participate in the iterator pattern, which renders stl algorithms useless for them.

microsoft tackles all of these problems. the different syntax implies different semantics to the careless programmer. a duplicated stdcli namespace provides the stl with cli support.

this does not suffice for a standards-based solution. imagine standard c++ shipping with two flavours of the stl...

Care to elaborate? I can truly find nothing very difficult in keeping backwards compatibility.

type unsafety doesn't play very well with gc. are mistyped pointer values to be considered for collection?

pointers used as array iterators create problems as well. is an array whose only remaining pointer points into its middle considered for collection as soon as its first byte is no longer pointed to?

see the original gc proposal by bjarne stroustrup for more of these issues. existing practice creates conflicts with conventional gc concepts, which is why c++ is particularly difficult to gc.

a syntactical difference

a syntactic difference between collected and non-collected pointers is a matter of taste. a semantic difference is rather dangerous though. for example, CLI arrays cannot participate in the iterator pattern, which renders stl algorithms useless for them.

microsoft tackles all of these problems. the different syntax implies different semantics to the careless programmer. a duplicated stdcli namespace provides the stl with cli support.

this does not suffice for a standards-based solution. imagine standard c++ shipping with two flavours of the stl...

No, shipping two versions of the STL would be wrong. But again, I do not see what the big deal is. If two kinds of pointers are used, then STL containers and algorithms could be customized (as to which type of pointer to use) either by template parameters or by type traits (or by pragmas/attributes). I prefer type traits, as it is easier to manage customization using traits (the type declaration is decoupled from some of its attributes, thus making the syntax easier).
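A minimal sketch of the trait-based approach (all names here are hypothetical): the container asks a specializable trait which pointer type to use for T, so a single container template can serve both collected and raw pointers without duplicating the library.

```cpp
#include <type_traits>

// Stand-in for a collected pointer type; a real one would register with the GC.
template <typename T>
struct traced_ptr { T* raw; };

// Default: plain raw pointers.
template <typename T>
struct pointer_traits_for { using pointer = T*; };

// A type the program opts into garbage collection, via a trait specialization.
struct Node;
template <>
struct pointer_traits_for<Node> { using pointer = traced_ptr<Node>; };

// The container never hard-codes a pointer kind; it reads it from the trait.
template <typename T>
struct toy_list { using pointer = typename pointer_traits_for<T>::pointer; };

inline constexpr bool int_uses_raw =
    std::is_same<toy_list<int>::pointer, int*>::value;
inline constexpr bool node_uses_traced =
    std::is_same<toy_list<Node>::pointer, traced_ptr<Node>>::value;
```

The design point is the one argued above: the opt-in decision lives in one trait specialization next to the type declaration, not in the container's interface.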

type unsafety doesn't play very well with gc. are mistyped pointer values to be considered for collection?

A gc pointer is considered for collection. If the programmer does something stupid like ptr = 0xff0012cd, then it is the programmer's problem.

pointers as array iterators create problems as well. are arrays pointed to into the middle considered for collection as soon as the first byte is not any longer pointed to?

No. Pointers pointing into the middle of arrays are no different from pointers pointing to members of objects. If an object is reachable from any pointer, whether it points to the object's start or into its middle, it should not be collected.

see the original gc proposal by bjarne stroustrup for more of these issues. existing practice creates conflicts with conventional gc concepts, which is why c++ is particularly difficult to gc.

Thanks for the link, but I do not see the "difficulties" as very ...difficult problems to solve.

EDIT:

I have just read the small document you posted (thanks) and indeed I cannot find anything too difficult to tackle...nothing that accounts for delaying GC for C++ so many years.

Absolute pointers

If the programmer does something stupid like ptr = 0xff0012cd, then it is the programmer's problem.

It's been more than a decade since I used C or C++, but the pattern of addressing hardware through a pointer to an absolute memory address was fairly common when I dealt with hardware. Has something replaced this pattern of usage? That is, what kind of differentiation exists in pointer types? Pointing at an absolute location may be an error in some cases, but for embedded devices it is a fairly attractive feature.
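The pattern in question, sketched with the address from the grandparent post (a made-up register address, not real hardware): the language draws no pointer-type distinction here, so a GC proposal has to specify that such pointers are simply opaque to the collector.

```cpp
#include <cstdint>

// Classic memory-mapped I/O idiom: treat a fixed physical address as a
// device register. 0xff0012cd is the made-up address from this thread.
// We form the pointer but never dereference it, since no device is there.
inline volatile std::uint8_t* device_status_reg() {
    return reinterpret_cast<volatile std::uint8_t*>(
        static_cast<std::uintptr_t>(0xff0012cdu));
}

// A collector must neither relocate nor reclaim whatever this "points" at;
// it falls outside the GC-managed heap entirely.
inline std::uintptr_t device_status_addr() {
    return reinterpret_cast<std::uintptr_t>(device_status_reg());
}
```

This is one argument for an explicit opt-in model: pointers like this would simply never be declared collectable, so the collector never inspects them.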

It is difficult

It is not that difficult to add GC to C++. Microsoft has already C++/CLI out.

C++/CLI has introduced 3 kinds of pointers in addition to the old T* (T^, pin_ptr<T>, and interior_ptr<T>), and one new kind of reference (T%, which semantically corresponds to interior_ptr<T>). There are separate GC'd and non-GC'd heaps in addition to objects allocated on the stack and statically. Classes are divided into standard C++ classes, .NET reference classes, and .NET value classes (let's ignore .NET enums in addition to C++ enums, and .NET delegates in addition to C++ function pointers and function objects and boost/new-C++ function templates).

There are some rules about which objects may be put on which storage areas, which kinds of objects may contain which kinds of references, and rules about when objects are referenced and when they are copied. I'm not familiar with the details.

In addition to C-style arrays and standard C++ vectors, there are garbage-collected .NET arrays and .NET collection classes. In addition to C++ templates, there are .NET generics, using the same syntax but different semantics.

In short, the language is schizophrenic, and contains both C++ and .NET objects together, each with its own set of rules and restrictions. It doesn't show how to make C++ a garbage-collected language; it shows how to let the C++ and GC'd worlds cooperate. It may have done that as well as is possible.

Why not just use D?

Isn't D appropriate for the class of problems that C++ w/ GC would be used for? (It has similar syntax, generics, and it compiles to native code.)

First of all, in D objects

First of all, in D objects are like in Java: they are all allocated on the heap.

Secondly, I dislike D template syntax.

Heap Allocation

they are all allocated on the heap.

Maybe I'm totally clueless, but isn't heap allocation (or something similar) necessary for garbage collection? So I guess you want to have the speed of objects allocated on the stack and garbage collection for objects allocated on the heap?

Secondly, I dislike D template syntax.

Can't say that I like C++ template syntax either - but then opinions are a cheap commodity. What specifically can you not do with D templates?

Maybe I'm totally clueless,

Maybe I'm totally clueless, but isn't heap allocation (or something similar) necessary for garbage collection?

Yes, GC objects are allocated on the heap.

Can't say that I like C++ template syntax either - but then opinions are a cheap commodity. What specifically can you not do with D templates?

D templates are similar to C++ templates; it is the syntax I do not like, not the capabilities. Of course, D's syntax is what it is in order to make the grammar LALR(1), which C++'s isn't.

hello

First post here at LTU.

I am a long-time (2+ years?) lurker, but never really posted, as a good amount of stuff here is *way* outta my league.

D objects are generally allocated on the heap except when using the auto keyword.

class Foo {}
auto Foo x = new Foo();
// this is allocated on the stack, and destructed on leaving scope

D has also recently acquired a small amount of type inference:
auto x = new Foo();

This statement has no type declaration, so it is not allocated on the stack; but D structs are allocated on the stack and have most of the class capabilities. I think it's a good tradeoff.

D also just got new lambda-like delegate syntax, and policy-like scope guards: scope(exit), scope(failure), scope(success).

Sorry for rambling but D really is a better C++.

A minor correction.

The auto keyword in D does not allocate D objects on the stack. It allocates them on the heap, but their destructor is invoked when the scope is exited.

D has many goodies, but I want my objects to live on the stack as well as on the heap. I am not ready to surrender stack allocation.
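In later-C++ terms (a rough analogy, not a feature available in 2006 C++), D's scoped `auto` object behaves like a heap allocation owned by a scoped smart pointer: the object lives on the heap, but its destructor runs deterministically at scope exit.

```cpp
#include <memory>

// Track live instances so the scoped destruction is observable.
struct Foo {
    static inline int live = 0;   // C++17 inline static member
    Foo()  { ++live; }
    ~Foo() { --live; }
};

// Heap allocation whose destructor runs at scope exit:
// roughly what D's `auto Foo x = new Foo();` gives you.
inline int scoped_demo() {
    {
        std::unique_ptr<Foo> x(new Foo());  // on the heap, but scope-bound
    }                                       // ~Foo() runs here
    return Foo::live;                       // back to zero live objects
}
```

Which illustrates the parent's point: deterministic destruction and stack allocation are separate properties, and D's `auto` gives you only the former.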

you are correct

I remember Walter considered allocating auto objects on the stack, since they must go out of scope.

Nevertheless structs remain on the stack. They have almost all the methods of classes, and can be parameterized.

I am still curious why you put so much emphasis on this?

Are you thinking of speed? My programs run the same as or faster than my C++ equivalents. It does require different idioms in certain cases, but the power is still there.

And I finish faster allowing the optimizer between my ears to help out.

The new syntax for delegates is very nice:

auto times2 = (int x){ return x * 2; };
writefln( times2(100) );

Thanks to array functions being automatically applied to arrays the following is possible:

int[] array;
int[] x = array.where( (int r) { return r<=10; } );

int[] y = array.map( (int r) { return r*3; } );

All this with the compiled speed of C++, and faster compile times.
It isn't perfect, but it's at least as good as C++ imo.

Details...

The auto keyword in D does not allocate D objects on the stack. It allocates them on the heap, but their destructor is invoked when the scope is exited.

D has many goodies, but I want my objects to live on the stack as well as on the heap. I am not ready to surrender stack allocation.

Are we discussing languages here or implementation details?

Performance is generally an

Performance is generally an implementation detail. For example, it's conceptually possible to analyze whether or not a variable is ever used or stored outside the function, allowing the compiler to put what it can on the stack.

And that's why...

... people who do not understand C++ will never understand it. Performance isn't an afterthought in C++. The biggest blocker to multiple inheritance in C++ wasn't syntactic detail, or even semantic meaning; the biggest argument against it was performance, since you've got to re-adjust the virtual table for the second and third, etc., bases. Maybe I'm too young to remember when C++ was pushed as some sort of magical OOP solution (I've only been coding for a decade, and commercially for half that). But modern C++ is about retaining absolute efficiency whilst providing comfortable interfaces.

Admittedly, the average C++ coder is willing to live with more than necessary. Look at the stuff that Boost is producing. There is a reason that something like the multi_index container hasn't been produced in any other imperative language (to my knowledge; please correct me otherwise). No other language allows the sort of control that C++ allows whilst building abstractions.

Of course there are problems. There are issues. But as someone else here has pointed out, the things that people usually suggest simply throw away bits of C++ that people actually use, trying to shoot for 80%, which for once is not good enough. And this is coming from a guy whose other language of choice is Haskell.

eh?

I'm confused...what are you getting at? C++ is probably the worst language I can imagine in terms of its relationship with optimization, not because it isn't generally fast, but because it sacrifices a lot of flexibility for speed by default, which leads to a lot of extra work when that speed isn't needed. Examples of this include non-virtual classes, lack of introspection capabilities, lack of dynamic dispatch by default, etc., all of which are inferrable by the compiler most of the time. I'd rather the language default to flexibility, while still giving me the chance to optimize later if I need to. Actually, one might even say that the strictness of C++ encourages premature optimization.

Feature not bug

I'd rather the language default to flexibility, while still giving me the chance to optimize it later if I need to.

I agree, as I'm sure many others here do. Historically, however, what you describe hasn't been so easily achieved, and the languages most abstracted from the machine still tend to have issues in that area. In that context, the C++ "relationship with optimization" has been one of its biggest selling points.

flexibility wanted

Anton van Straaten: I agree, as I'm sure many others here do.

Absolutely; it's agonizing that the default is not flexibility before speed. The only escape that looked easy to me was writing a high-level flexible language in C++ that lets me drop into C++ when I finally get to the optimization stage, after covering my bases first. Most people use scripting-class languages for this general purpose.

Many features and support frameworks don't get done (ever) in C++ projects because there isn't time to do them, and ironically most have slim performance needs. Often there's only time to blindly careen down the critical path once for a product, hoping the organic growth of change in response to discoveries won't render old code too chaotic.

Curtis W Actually, one might even say that the strictness of C++ supports premature optimization.

That's it in a nutshell. Let's pretend coding is like writing prose. In C++ you tend to spend all your time writing optimal versions of first drafts without any time to revise or explore your thesis, or to add illustrations, guides, and tables of contents. Also, there's no time to proofread the result, and you couldn't proofread if you wanted to, since optimization forbids review.

Exactly

I believe even Stroustrup said something like: "Inside C++, there's a cleaner language trying to get out". Everyone knows that C++ is a huge mess; no other language has a language reference that thick (not even Ada). When was the last time you pondered the use of a protected pure destructor? However, the particular combination of features, whether by luck or by design, that has congregated in C++ means that some very interesting pieces of code can be written. For example, a custom memory allocator that allocates things inside the AGP aperture can be written, and integrated into existing code seamlessly, using the same coding pattern.

Below it's been mentioned that programmers might be underestimating the compiler. While this might be true for trivial things like data packing or instruction ordering, it becomes less true when it involves things that the compiler has no knowledge of: data invariants, platform guarantees, etc. The only languages existing right now that could deliver this data to the compiler in a declarative form are dependently typed languages. If you want, look at C++ as an attempt to carry out multi-staged imperative programming. At each level you walk the compiler through what you want, often overriding what it's warning you about, hushing it with some arcane invocation. Until we have a dependently typed language that we can actually explain everything to, C++ will manage to pull off efficiencies and cute/neat (for C++ at least) abstractions that other languages can't.

My defence of C++ basically amounts to that it actually allows me to build more abstractions in an executionally efficient manner than other languages. It does not allow me to do this easily, but something is better than nothing. After a while, you get used to the crap, and understand that you're looking at patterns that actually mean something underneath, and begin to marvel that it's even possible.

C++ occupies a very interesting design point; one that in the wake of its success and subsequent shitstorm is very easy to miss and underestimate.

I think it's psychological

I think it's a psychological thing: many people deeply mistrust the compiler's ability to optimize well and want full control for themselves, because they think they can do it better (even when they're wrong). Many programmers are simply control freaks.

OTOH: At the moment there is no real alternative to C++. Creating a nice language is one thing, but to make it really useful you also need libraries and tools. And I don't even see a real alternative language-wise to C++: while it's possible (and maybe even relatively simple) to create such a language, most attempts simply went in the wrong direction to really satisfy most C++ users.

Examples of this include:

... lack of introspection capabilities, lack of dynamic dispatch, etc.

Eh? C++ does support dynamic dispatch, just only single dispatch, not predicate or multiple dispatch. And C++ does support a (very) limited form of introspection via RTTI; if accepted into C++0x, it will also support eXtended Type Information (XTI).
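For instance (Shape/Circle are illustrative names, not from the thread): a virtual call resolves on the dynamic type of one receiver only, which is exactly single dispatch; resolving on the dynamic types of two arguments has no direct language support.

```cpp
#include <cassert>
#include <string>

// Single dynamic dispatch: the call site picks the override based on
// the dynamic type of the receiver alone.
struct Shape {
    virtual ~Shape() {}
    virtual std::string name() const { return "shape"; }
};
struct Circle : Shape {
    std::string name() const { return "circle"; }
};

// Only 's' is dispatched dynamically; a collide(Shape&, Shape&) that
// resolved on both arguments would need hand-written double dispatch.
std::string describe(const Shape& s) { return s.name(); }
```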

Also, let's not ignore the fact that C++ has rich compile-time introspection capabilities via the type traits in std::tr1 or Boost. No offense, but are you sure you know (enough) C++ to really make such comments or judgments?
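A small sketch of what those traits enable; this uses the standard `<type_traits>` header (the std::tr1/Boost traits of the era provide the same names), and `kind_of` is an invented example:

```cpp
#include <cassert>
#include <type_traits>  // standardized from the std::tr1/Boost type traits

// Compile-time introspection: select behavior from properties of T,
// all resolved by the compiler rather than at run time.
template <typename T>
int kind_of() {
    if (std::is_pointer<T>::value)  return 1;  // pointer types
    if (std::is_integral<T>::value) return 2;  // built-in integer types
    return 0;                                  // everything else
}
```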

As I mentioned earlier, people seem to be ignoring what (advanced) C++ programmers actually do with C++, most likely because they don't really know, or don't know completely.

In Curtis' Defense...

...I took his point to be precisely that C++ doesn't have multiple dispatch, that this is a big enough issue to be addressed, e.g. in Alexandrescu's Modern C++ Design, and that C++'s RTTI is essentially useless for purposes such as developing orthogonal persistence implementations or supporting precise garbage collection, two of the most obvious domains in which a proper reflection/introspection facility would help.

XTI seems to be perpetually on its way; I can't even find a reference implementation anywhere. C++0x isn't due to be final until 2009 (!) and that's just the spec—we all know how long it takes for our favorite compilers to conform to new standards, and if you've been following the process, you know that, e.g. "auto" support basically means a rewrite for most compiler teams. This doesn't give me warm fuzzy feelings about C++ (then again, it's far past too late to give me warm fuzzies about C++).

Yes, I meant multiple

Yes, I meant multiple dispatch.

Why not D

That's exactly what I was thinking. Just the fact that there's no header file crap is a huge improvement over C++, plus all the other features.

Achilleas, I disagree with an earlier post you made claiming that the only reason Java exists is that it had GC. GC is a feature, but I believe the big benefit of Java over C++ is the number of cross-platform packages that Java provides out of the box.

wtf?!

"Just the fact that there's no header file crap is a huge improvement over C++"

What are you talking about?! C/C++ headers are very handy interface/module specs. All I need to do to use a library is take a look at the header, rather than, say, navigate through a large body of Java source, searching the methods one by one...

I don't believe it's good design to leave that kind of need to external tools, i.e. class browsers in IDEs...

Are you kidding?

Hehe, so you're saying that all the rest of the languages that don't use header files (basically all of them except C/C++) got it wrong? Maybe you should propose header file inclusion in the next revisions of Ruby, Lisp, Python, Java, etc... because you're having such a hard time using proper tools.

interfaces

All I'm saying is that header files provide a handy standalone interface spec. Don't try to read too much into it.

Java has proper interfaces, and it'd be good if all created classes were backed by one or more of these, but alas, since it's not required, they are seldom used except to implement system interfaces...

Python will get its own interface mechanism eventually. But even now it's not really hard to browse Python code and get a listing of the methods -- not to mention you can quickly get them via dir() -- because the language is rather dense compared to Java and the like. The same can be said of Ruby/Perl/etc. Scheme module systems provide similar listings of exported symbols, so you can quickly search for the definitions.

But really, separate interfaces containing the prototypes of exported functions are a must. I truly enjoy C/C++ header files. Pascal does something similar, except the interface and implementation sections live in the same file. Of course, distributed binaries come with only the interface section intact...

Interfaces have nothing to do with header files

As others have stated, C/C++ header files are basically a textual substitution hack, probably an artifact of the compiler technology of the time and of not having a module system.

Header files have nothing to do with Java interfaces, which are more of a subtyping contract. Python getting interfaces has more to do with an eventual real compilation implementation (it already has duck typing) and with Guido's interest in positioning Python as an "enterprisey" language.

In any case, reading header files for documentation is a pretty piss-poor strategy for understanding interfaces that lack proper documentation, and/or for programmers who lack proper tools. As a last resort it's fine, but you end up wading through a bunch of macro crap and, in the case of C++, some actual implementation code.

regardless

header files represent the interface to a particular library.

"Header files have nothing to do with Java interfaces"

conceptually, they are both abstract prototypes of functionality promised to be implemented by others. By following the interface -- a contract -- you can be sure the desired functionality will be present to you.

"In any case, reading header files for documentation is a pretty piss-poor strategy for understanding interfaces"

No documentation is as faithful as a coded interface. And seemingly, developers have dealt just fine with it for about 30+ years...

header files represent the

header files represent the interface to a particular library.

No, not really. Header files also tend to contain many symbols that are not intended for public use.

conceptually, they are both abstract prototypes of functionality promised to be implemented by others. By following the interface -- a contract -- you can be sure the desired functionality will be present to you.

Conceptually, yes, but from a language semantics point of view, no.

No documentation is as faithful as a coded interface. And seemingly, developers have dealt just fine with it for about 30+ years...

Except that you will likely need context, whether that is implementation details and/or proper documentation, in order to use those functions.

But if I'm just interested in symbols, I'd rather have an editor that understands the language and presents those symbols to me with visibility context, without my having to wade through header files.

I've coded C++ as my job for

I've coded C++ as my job for the last year and not once consulted a header file as interface documentation, probably because I depend mostly on popular open-source libraries, so they're well documented.

Also, using dir() in Python is superior because it's clear what is meant to be public/_protected/__private, while this is not the case with headers.

collaboration

dataangel: ...not once consulted a header file as interface documentation.

If you had a coworker who wrote any code you needed to use -- or vice versa -- how would the two of you inform each other about API and usage conventions? Read the code?

Some coworkers seldom (sometimes never :-) write documentation, and often resist giving any evidence about their coding intentions (since then errors are undeniable).

hmm

"it's clear what is meant to be public/_protected/__private, while this is not the case with headers."

In the case of vanilla C, the headers represent the public interface of the library. Functions and data meant to be kept private to the translation unit are declared static, which is a consistent practice.

It's kind of obnoxious, though, to declare a C++ class in a header and expose the private members that users of the class should not be concerned with...
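The vanilla-C convention can be sketched in one translation unit (`counter_` is a hypothetical library prefix; in a real project the declaration would live in counter.h and the rest in counter.c):

```cpp
#include <cassert>

/* What would live in counter.h: the public interface, with the
   counter_ prefix standing in for a namespace. */
unsigned counter_next(void);

/* What would live in counter.c: 'static' gives the helpers internal
   linkage, keeping them out of the library's public interface. */
static unsigned counter_current = 0;
static unsigned counter_step(void) { return 1; }

unsigned counter_next(void) { return counter_current += counter_step(); }
```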

Ancient technology

The problem with header files is not that they separate interface from implementation - that is a good thing.

The problem is that this is done via an incredibly primitive, fragile, and ineffective mechanism, namely textual inclusion. No separate compilation, no proper interface/implementation conformance checks, no namespacing, no protection from horrible macro interference, no nothing.

Have you never wondered why a C compiler spends, say, 80% of its time processing the same header files over and over again? Why you have to wrap every header in a stupid #ifdef? Why you need to be so careful about the order of #includes? And so on.
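The "stupid #ifdef" in question is the include-guard idiom; since inclusion is purely textual, idempotence has to be simulated by hand. A sketch (WIDGET_H is a hypothetical guard name), with the second guarded region standing in for a second #include of the same header:

```cpp
#include <cassert>

// First textual inclusion of "widget.h": the guard macro is not yet
// defined, so the contents are compiled and the guard is set.
#ifndef WIDGET_H
#define WIDGET_H
struct Widget { int id; };
#endif

// Second textual inclusion of the same header: the guard is now
// defined, so the body is skipped, avoiding a redefinition of Widget.
#ifndef WIDGET_H
#define WIDGET_H
struct Widget { int id; };
#endif
```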

if it's working...

"based on an incredibly primitive, fragile, and ineffective mechanism, namely textual inclusion. No separate compilation"

what good would separate compilation bring to the table, for an interface consisting only of function prototypes that get inserted into the final code?

"no proper interface/implementation conformance checks"

why would that matter, if the interface is textually inserted into the code, so the compiler will complain about any mismatches anyway?

"no namespacing"

C's single namespace is really not as bad as most people picture it. If two classes share the same name in Java, you disambiguate by using the full module path in all references to the class. In C, you prepend a proper prefix to the functions, so it's mostly just a hardwired "path".

"no protection from horrible macro interference."

some people actually find macros, even ones as primitive as cpp's, quite handy.

"Have you never wondered why a C compiler spends, say, 80% of its time in processing the same header files over and over again?"

no, because I actually do "wrap every header in a stupid #ifdef", since that's the standard idiom when dealing with C and I'd rather follow it than fight it.

If it's working, don't fix it. There's no need to transform C/C++ into OCaml or Java.

Here:

Here are some more rationales for proper module support (section 4).

Really, this is widely

Really, this is widely regarded as a huge flaw in C++. Textual inclusion of headers is an arcane technique. The language is in desperate need of a module system, and there are a few proposals to add one to the next standard (but I won't hold my breath).

CMod

The CMod project might be a good starting point.

CMod is a tool written to enforce a modular programming style in C code, to ensure that modules, when linked together, yield a type correct program.

C syntax does not provide direct support for modules. But over time, programmers have developed a discipline of modular programming which treats .c files as module implementations and .h files as interfaces. However, without proper enforcement (of the requirements of a module system), mistakes can be made which will not be reported by a normal C compiler. In particular, one module can declare that it requires some symbol of type T1, but in the module defining the symbol it actually has type T2. This problem is compounded and obfuscated by the extensive use of preprocessor directives which is common in large C programs. These mistakes can lead to confusing errors that manifest themselves at runtime and are hard to track down.

The reason this problem exists in C is because the language has only an informal notion of an external interface: the .h file. CMod provides a way to treat .h files as more formal interfaces. In particular, it enforces three rules as to how .h files should and should not be used; if the rules are followed, checked by CMod at compile time and link time, then the final program is likely to be type correct. In addition to this benefit, it is widely recognized that modular code is easier to maintain, easier to extend, and more readable.

small correction

D has no *required* separate header files.

You can and do make them as you like.

EDIT:
The dmd compiler generates header files for you as well.

Achilleas, I disagree with

Achilleas, I disagree with an earlier post you made claiming that the only reason Java exists is that it had GC. GC is a feature, but I believe the big benefit of Java over C++ is the number of cross-platform packages that Java provides out of the box.

GC is one of the primary reasons why there is a large number of cross-platform packages. It is much easier to write libraries when you do not have to deal with manual memory management.

If C++ had GC, I would have written many libraries myself (as open source, that is), including a GUI library. I started to do so, but the project died because it was getting too complex.

GC only a small part of the reason

GC is only a small part of the reason that Java has the level of library support it does. The portable bytecode format, simple standard distribution format, standard and extensible dynamic loading capabilities, and low-cost high-quality cross-platform build tools are equally important.

Distributing a common C++ library that can be used by all developers pretty much requires distribution as source, including a build file, which may or may not work in the user's environment. Using the library may or may not require building it, but certainly requires worrying about supported OSes and link file formats, expected locations of header files, and probably a good deal more stuff that I've mercifully blocked out. You'll also need to worry about all of that for any libraries the library you want depends on, and if you need multiple versions of the same library, god help you.

Distributing a common library in Java requires slapping a .jar file on a website, with a javadoc tree if you're really being posh. Using that library requires nothing more than pointing your IDE or build script at that file or URL. The latest tools will automatically handle versioning and dependency management, although those tools will be obsoleted when built into Java 1.7.

Java has eclipsed C++ in the enterprise and web spaces for a lot of reasons, including portability, lower training costs, decreased complexity, higher security, and some amazing low-cost high-quality tools. Merely adding GC to C++ isn't going to do anything to change that.

I disagree.

I disagree.

simple standard distribution format

Any language can have that.

standard and extensible dynamic loading capabilities

I have happily used dynamic loading in C++ applications through DLLs in many apps.

and low-cost high-quality cross-platform build tools

...which are made possible due to GC.

Distributing a common C++ library that can be used by all developers pretty much requires distribution as source, including a build file, which may or may not work in the user's environment.

Not really. People are happy to download the binary versions of the library they want to use, and this has been happening for quite a few years. And there are not that many different platforms to care about these days; you can count them on two hands.

but certainly requires worrying about supported OSes

You need to worry about supported OSes in Java as well. Not all libraries are available for every O/S under the sun, and each O/S requires its own wrapper from the Java side...which is exactly the same as in C++.

and link file formats,

Hardly a problem, because libraries are easily available for all major compilers, since most of the compilers are free.

expected locations of header files

That's no different from supplying a classpath in Java programs. The actions I have to take to link a library in C++ are exactly the same as in Java: I have to point the C++ IDE to the directories where the include files and libraries live, just like I have to point the Java IDE to the directories that contain the Java libraries.

Distributing a common library in Java requires slapping a .jar file on a website, with a javadoc tree if you're really being posh. Using that library requires nothing more than pointing your IDE or build script at that file or URL. The latest tools will automatically handle versioning and dependency management, although those tools will be obsoleted when built into Java 1.7.

That's exactly what I have been doing with DevCPP on Win32: I click 'update' and the latest library for Mingw32 is downloaded and installed, automatically.

Java has eclipsed C++ in the enterprise and web spaces for a lot of reasons, including portability,

There are thousands of portable C++ libraries around the web.

lower training costs,

That's a function of 'decreased complexity'.

decreased complexity, higher security, and some amazing low-cost high-quality tools.

And all that is because of garbage collection.

Do a mental experiment: take garbage collection out of Java and re-evaluate your statement. None of your arguments will really hold, except perhaps the argument about bytecode. Without GC, it would not be possible to have decreased complexity, low-cost high-quality tools, higher security etc.

By the way, which are the low-cost, high-quality tools you are talking about?

You're missing the point

I never indicated that any of the nice things I listed were impossible in C++. I meant to indicate that they were more difficult, non-standard, more expensive in terms of developer time, and that these difficulties adversely impacted the amount and quality of libraries available. Moreover, it's not GC that causes those difficulties, it's mostly compilation to native link formats rather than zipped-up bytecode.

Take security, for example. There's nothing that prevents C++ code from being as secure as Java code, it's just a hell of a lot more difficult to do, requiring painstaking care and attention to a lot of details. That means more expensive, and lowers the probability that any given C++ library is secure. That's what people mean when they say "We aren't using C++ because of security risks".

It does absolutely no good to examine the things that are possible if you are trying to explain the way the world is, or could feasibly be. Instead, you have to examine the things that are economical. "It could be done, but it takes a lot more work for not much payoff" is almost precisely equivalent to "It won't be done". Adding garbage collection to C++ isn't going to make any of the things I listed any more economical, so it's not going to make cross-platform libraries any more likely to materialize.

Why not just use D?

Why not just use D?

If we are going to suggest alternatives, then why not use O'Caml, for exactly the same reasons suggested?

Big difference

A C++ programmer will almost immediately recognize and be able to code in D.

That wasn't really my point,

That wasn't really my point; the same could be said of Cyclone or any safer C dialect, and I personally do not like D or think it is better than C++ in every respect.

As far as I'm concerned, D still has a long way to go (despite some of its advantages) before it is truly better than C++, and the fact of the matter is that C++ is still evolving besides.

It's pretty much the point though

Your personal dislike of D is pretty much irrelevant to why D would be taken up by C/C++/Java/C# programmers before Ocaml. Yes, syntax matters.

I'm not C++ programmer then?

I'm not a C++ programmer then? I really cannot fathom a C#/Java programmer taking up D to replace either of those in their jobs.

You still didn't get my point (but you proved it for me); I never said that syntax didn't matter, either.

More likely than Ocaml

Once again, it's not about your personal dislike of D. It's about D being a more likely alternative than Ocaml for someone coming from a C++ background. That is why the OP brought up D and not some other random statically typed, natively compiled language.

I have to agree with Dave.

I have to agree with Dave. O'Caml is nothing like C++.

That was why the OP brought

That was why the OP brought up D and not other random statically typed, natively compiled language.

Suggesting O'Caml as an alternative to C++ isn't that random at all, despite the syntax.

I have to agree with Dave. O'Caml is nothing like C++.

I never said it is; well, I never meant it. Admittedly, I did miss (or it was added later) the statement about having a "similar syntax", but I did not say it's like C++.

I'm suggesting another alternative: just as with D, O'Caml has parametric polymorphism, and you can compile O'Caml to native code.

My original point was that my opinion may be irrelevant, but making suggestions of alternatives is just as off-topic here.

Felix

Felix seems like a good middle ground.

C++ doesn't need GC -- it's got RAII

That RAII hasn't been mentioned anywhere in this discussion is telling.

If you are having problems with memory resources in C++, here's some advice: "DON'T PROGRAM C++ LIKE IT'S C -- IT'S NOT." Use smart pointers. Use the STL containers. Don't call new and delete outside of constructors and destructors.

Hell, if you're writing C++ code that could be improved by garbage collection, it's almost certainly not even exception safe.
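A minimal sketch of that advice, using std::unique_ptr (C++11; the equivalents of the era would be std::auto_ptr or boost::scoped_ptr) -- `sum_buffer` is an invented example:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

// RAII: ownership is tied to scope, so every exit path (normal return
// or exception) releases the resource without an explicit delete.
int sum_buffer(bool fail) {
    std::unique_ptr<std::vector<int> > buf(new std::vector<int>(3, 7));
    if (fail)
        throw std::runtime_error("oops");  // buffer still freed here
    int total = 0;
    for (std::size_t i = 0; i < buf->size(); ++i) total += (*buf)[i];
    return total;                          // ...and freed here too
}
```

This is exactly why RAII code tends to be exception safe for free: cleanup is in destructors, which run on both paths.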

Even RAII Doesn't Scale

Let me refer you to Tim Sweeney's POPL '06 slides where he discusses the Unreal 3 Technology. Pertinent observations include:

  • Gears of War is about 250,000 lines of combined C++ and UnrealScript
  • The underlying Unreal 3 engine is about another 250,000 lines of C++
  • Gameplay simulation objects are "usually garbage collected"
  • "Garbage collection should be the only option" in "The Next Mainstream Programming Language"

Some more details about why Epic makes the engineering choices that they make can be found in Tim and Martin Sweitzer's GDC 2006 session, "Building a Flexible Game Engine: Abstraction, Indirection, and Orthogonality," the MP3 of which is available (cheap!) here. Highly recommended.

The bottom line for very large C++ codebases seems to be that lifecycles become too complex to reason about effectively, and RAII therefore becomes too confining. It's interesting to note that the Unreal technology has always included real-time garbage collection, and the Unreal technology runs neck-and-neck in performance to similar technology that, AFAIK, does not rely on garbage collection, e.g. id Software's technology. It's also interesting to note that O'Caml, which in most domains achieves competitive performance with C/C++, also features extremely effective garbage collection.

So I think garbage collection has, at long last, won enough battles, and what battles it hasn't won it seems that region inference will, and manual memory management is, at long last, on its way out. (Of course, in LtU-ish timeframes, "last legs" and "on its way out" should be interpreted liberally, say another 10-15 years.)

Update: I'm wrong about Unreal always having had real-time GC, as this blast-from-the-past mailing-list thread reveals. It's fascinating to read what Tim was thinking about in 1999!

Most people find that RAII scales fine

That Sweeney thinks RAII could never have worked in a project that never had it is unconvincing.

The best thing that can be said for garbage collection is that it's the ideal tool for 110-IQ programmers and half-baked designs. That's not denigration of GC. Much of the world runs on software programmed by 110-IQ programmers working from half-baked designs. GC is more forgiving to people who don't know the craft.

On the other hand, well-designed projects from the 130+ IQ programmer crowd will always have their place (this statement is anathema to the productivity gurus, I know). So I'd be surprised if GC takes over, even given 15 years.

Let's not a start a

Let's not start a "language for smart people" thread again (search the archive if you must). Let me just remark that since some of the classic examples of GC come from the most highbrow languages, I, for one, wouldn't want to defend the claim that GC is for 110-IQ programmers (I guess this refers to Wadler, Peyton Jones, etc. ;-)

Forgiving to them

The "is this language for smart people" question is the most overlooked element of language design. It probably needs more discussion. But another time and another place.

To say a feature is forgiving to the bad programmer is very different from saying that a feature is only useful to the bad programmer. I was careful to only say the first. (On the other hand, I'd probably agree with the statement that GC is most useful to the bad programmer, and is less useful for someone who knows his craft.)

LFSP

If you are interested in this you should read our previous thread. Search the archive if you want more...

As a development manager...

On the other hand, I'd probably agree with the statement that GC is most useful to the bad programmer, and is less useful for someone who knows his craft.

As a development manager, why exactly would I want my most skilled developers spending their time on workaday memory management issues, rather than building new features or high-quality infrastructure?

Either way, why should a

Either way, why should a newly designed language not support automatic memory management? Why force manual management on the programmer when this is a solved problem? GC is perfectly good in the general case and when you want more control then regions/reaps allow this, while retaining safety.

I personally do expect GC to have taken over within 15 years. What language won't be using it? Already, Java/C#/Haskell/Erlang/ML all use garbage collection. The notable exceptions are C and C++. I think it is unlikely C++ will remain dominant for applications programming. C will probably survive for low level software like operating systems, but even then there are languages like Cyclone that are well suited to this area and choose to replace manual management with regions.

If this discussion (which is

If this discussion (which is kinda similar to the "I like D"/"I don't" subthread) is to continue, how about pointing to specifics? While I am sure I know what you guys think are the arguments for your opposing positions, I may be wrong, and you guys may also be arguing without knowing the real position of the other party to the debate. What are the disadvantages (please give specific data, preferably published) of GC? What are the counterarguments to that (you can point to the archives for links to relevant papers)? What are the advances in the field that might make GC relevant (ditto)? Why are these not enough?

If you have personal experiences worth sharing please do so, but hey - let's remember not to overgeneralize (see the policies doc for more ;-)

experience and opinion

Ehud Lamm: If you have personal experiences worth sharing please do so,...

Most of my gc experience is with (early) mark and sweep collectors and (later) copying collectors I wrote myself for Lisp and Smalltalk implementations, so I've little real world feedback on systems used by lots of people. But around 2001 I noticed something about processor speeds that affected my gc plans.

I was writing infrastructure to move bytes around in bulk, and I wanted some time statistics on moving blocks of memory in RAM, so I wrote tests to get crude numbers for back-of-the-envelope calculations on expected server behavior. I was amazed to find I could move a megabyte from one place to another in one millisecond, or do something complex like crc32 in 6ms per megabyte. (I'm sure things have only gotten faster since then.)

Since most of the complaints I'd ever heard about gc (from the olden days) were about latency interfering with real time response, these numbers made me excited. I realized I could easily copy collect a few megabytes in a few milliseconds, and thus have high responsiveness in collected systems. Provided I made sure I partitioned memory into small enough (ie several megabyte) disjoint partitions, I could ensure no particular gc event had latency more than a few milliseconds.

So I thought the increase of processor and memory speeds during the '90s had turned an improvement in quantity into an improvement in kind, since it was now feasible to implement imperceptible garbage collection, given some moderate discipline in memory partitioning. I wrote about this in my weblog back then, since I expected the effect of partitioning to be similar to generational garbage collection in terms of latency and memory footprint.

Oh, that's the other old complaint about gc: a naive implementation might reserve too much free space to receive the result of a copy during collection. Classical (1970) Cheney copy collection describes the use of memory in two alternating semispaces, implying latent waste of half of memory. But if memory is divided into N disjoint partitions with no references between partitions, you need only enough free space to copy the largest partition under copy gc. This decreases the memory footprint for memory in use, and avoids some complexities in generational gc.
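The footprint argument above can be put in numbers; a sketch (the partition sizes are arbitrary illustrations):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Classic two-semispace copying GC mirrors the whole heap, so the
// reserved free space equals the total of all partitions.
std::size_t reserve_two_semispace(const std::vector<std::size_t>& parts) {
    return std::accumulate(parts.begin(), parts.end(), std::size_t(0));
}

// With N disjoint partitions (no cross-partition references) collected
// one at a time, the reserve need only fit the largest partition.
std::size_t reserve_partitioned(const std::vector<std::size_t>& parts) {
    return parts.empty() ? std::size_t(0)
                         : *std::max_element(parts.begin(), parts.end());
}
```

For partitions of 4, 2, and 8 MB, the classic scheme reserves 14 MB of copy space while the partitioned scheme reserves 8 MB.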

Ehud Lamm: What are the disadvantages (please give specific data, preferably published) of GC?

Next I offer unpublished intuitive opinion that would be hard to falsify, and therefore suffers from lack of scientific rigor and lack of evidence. But a game theorist might have a fun go with it.

I'm only aware of a gc disadvantage -- for some styles of gc -- when it is used alongside manual memory allocation in the same address space as a collected model requiring pervasive exactness and correctness. It would be easy for sloppy C and C++ code to whack the gc runtime with a tiny memory corruption; so I'd expect gc to go down first, like a canary in a coal mine. A conservative collector like the Boehm collector in C++ wouldn't have this problem, since it errs in the other direction, of not collecting when in doubt.

This is only a disadvantage to someone who really wants to mix random uncollected C++ libraries in the same address space as a gc engine that's very sensitive to corruption. Too much C++ and C code has a high tolerance for memory poisoning, having evolved in a memory-poisoned environment. This is a result you can derive from worse-is-better philosophy plus some game theory.

I'd advise folks to make a gc address space immutable from the view of C code, however you arrange for this, if you don't want gc to bear the brunt of blame for failures caused by memory corruption.

Real-time GC

Provided I made sure I partitioned memory into small enough (ie several megabyte) disjoint partitions, I could ensure no particular gc event had latency more than a few milliseconds.

That sounds about right. Real-time GC systems like IBM's Metronome (described further in this article) claim "worst-case delays of 2 milliseconds". I just mention that one because I saw it demoed - there are others out there.

RAINAP (not a panacea)

Using RAII for memory management scales fine in the kind of projects which are suited to using RAII for memory management. Not coincidentally, many of the kinds of projects that get developed in C++ are a very good fit for RAII.

However, there are many memory allocation patterns for which RAII isn't well suited. For example, in cases where shared objects have a lifetime exceeding that of the variables which reference them, using RAII typically means using something like reference-counted smart pointers. This has performance implications, and also creates issues with reclaiming memory from cyclic structures. These issues can be quite a show-stopper for many kinds of applications.
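The cycle problem shows up in a few lines. This sketch uses the modern std::shared_ptr/std::weak_ptr (the Boost smart pointers of the era behave the same way), with a crude live-object counter to observe the leak:

```cpp
#include <cassert>
#include <memory>

// Reference counting cannot reclaim cycles: two nodes holding strong
// pointers to each other keep their counts above zero forever.
struct Node {
    std::shared_ptr<Node> next;  // strong edge: contributes to the count
    std::weak_ptr<Node>   prev;  // weak edge: observes without owning
    static int live;             // instrument to observe destruction
    Node()  { ++live; }
    ~Node() { --live; }
};
int Node::live = 0;

void link_two_nodes() {
    std::shared_ptr<Node> a = std::make_shared<Node>();
    std::shared_ptr<Node> b = std::make_shared<Node>();
    a->next = b;
    b->prev = a;  // were prev a shared_ptr, neither Node would ever be freed
}   // both nodes destroyed here, because the back edge is weak
```

Breaking each cycle with a weak pointer works only when you know, statically, which edge is the "back" edge; in genuinely graph-like data that knowledge often doesn't exist, which is the tracing collector's advantage.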

I've only touched on some of the issues. For a more detailed explanation of some of the other factors, including others which hinder scalability, see Why is Garbage Collection A Good Thing. That should help in understanding why, in a system such as Unreal, it is highly unlikely that RAII alone would be a sufficient solution for memory management. (And why garbage collection is used gratefully by those with high IQs, too...)

My personal opinion

I find that RAII often is sufficient for my needs when I program in C++ (which is rarely nowadays). But one area where it falls flat on its face, and I strongly suspect that many others share this view, is graph-like data structures (which, incidentally, is where reference-counted pointers fail too). For example, writing the parser stage of a compiler is more or less straightforward, but it is extremely painful to write even the basic structure of the optimization passes with graphs of basic blocks. In such a case, in C++, I'd be more inclined to use something like pool allocation.

A Good Point That Someone Else Made...

... is that you can't use RAII when you're exposing an interpreter to your users, i.e. when your allocation/deallocation patterns are truly dynamic. This is precisely the case that Tim ran into with the Unreal technology: there are C++ and UnrealScript objects that are intrinsically coupled and with totally dynamic lifetimes. Consider that, from the Unreal console, you can "summon" an instance of any UnrealScript class, which by definition means a paired C++ class. RAII can't help you here.

Region Papers

In the cases where GC might not be the best solution there is region-based memory management, as mentioned earlier.

Papers on regions, reaps and region inference can be found here at the MLKit website, and at the Cyclone site here.

There is also a brief explanation and comparison to GC here.

For those in the dark, read

For those in the dark, read up on RAII on Wikipedia.

Also relevant for this discussion is this paper (by Boehm).

RAII versus

RAII works wonders in many applications. It's also predictable, which is good in latency sensitive contexts.
Now consider multithreaded programs with lots of shared data. For shared objects you'd have to use a technique such as reference counting, with the counter protected by a mutex or atomic operations, even if your code is exception safe.
That's a lot of overhead every time you want to access the object.
Moreover, you risk inadvertently getting hold of, and storing, a raw pointer.

At this point the (performance+ease of development)/drawbacks ratio may tip towards the use of gc.

This is one of the reasons Java is being increasingly used for long-lived server software.

And by the way, using gc for shared objects need not preclude RAII.
Objects can still be allocated on the stack or as part of an enclosing structure (by value).
Operator delete can still perform its job and tell the GC an object is definitely out of scope.

Obviously your requirements

Obviously your requirements differ from mine. Please read up a little on why RAII is no replacement for GC; there are many links around.

exception safety

if you're writing C++ code that could be improved by garbage collection, it's almost certainly not even exception safe

Yep. Exception safety in C++ is hard compared to almost any other language out there. The principal reason is (drumroll) manual memory management: most exception-safety problems I've seen were memory leaks, which a GC would've fixed easily. Exception safety in Java is much less of an issue, because Java has GC.

The root cause is deeper, I think. C++ relies heavily on mutation - there are always those pesky side effects to undo. In a language where immutable objects are the default, like Haskell or ML, the programmer almost never has to worry about "exception safety".

On the subject of current alternatives

While we are on the subject of *current* alternatives to C++ (those with syntax similar to C/C++), I have a chance to say something I've been meaning to say for a while but haven't found the right opportunity for, so now seems like the perfect place.

I think there are good reasons why all current alternatives to C++ (which have similar syntax to C/C++) will never take the majority of C++ programmers away.

These alternatives either:

They (over)emphasize one or two (or maybe even three) main features which fix problems with C++, while typically removing certain other features without actually replacing them with better alternatives. At the same time they still have (relatively) ad hoc type systems, just as with C++; no real formalism has been applied, and they don't offer a rich type system.

Or

They have a formal (or semi-formal) type system that again only emphasizes dealing with one, two, or three issues with C/C++.

Either way, they don't deal with all the other things that advanced C++ programmers do with C++, which rarely have to do with low-level programming (but often with efficiency).

For instance D: I would say D falls into the first category. Last time I checked, D doesn't have a preprocessor, or has only a small subset of C/C++'s preprocessor (I can't remember exactly). Before anyone jumps down my throat, just wait and read everything first; give me a chance.

Advanced C++ programmers who know what they're doing will use the C++ preprocessor for macro metaprogramming, with boost.preprocessor to aid them.

I know what you're about to say: "but D supports template metaprogramming....". Yes, I know, but what about the repetitive boilerplate code of template metaprogramming? Let me tell you, when you do a lot of template metaprogramming (typically with Boost MPL) you can get a bucketload of repetitive boilerplate (meta)code.

This brings up another thing: whether they realize it or not, advanced C++ programmers do a form of multi-staged programming. If we have a look:

  1. boost.preprocessor (top-level, level-1)
  2. boost mpl (level 2)
  3. boost.fusion (v2) (level 3)

As far as I'm aware there is no support for multi-staged programming in D, only support for template metaprogramming & static assertions.

Another important feature lacking in D (last time I checked) is type deduction of template function arguments. This is a no-no for anything like the C++ standard library containers & algorithms, and more importantly for expression templates and the writing of domain-specific embedded languages (DSELs) in the C++ style, or for any method of writing DSELs in D (as far as I'm aware).

The majority of what advanced C++ programmers do boils down to essentially two things:

  • Try to get as much as possible into the type system, checked and/or verified (or at least attempted) at compile time.
  • Write code that is as generalized as possible while applying (domain-specific) optimizations for special cases at compile time when the exact type is known, as happens in modern implementations of C++ standard library algorithms.

Clearly a C++ programmer wants a language with a very rich type system, together with the ability to do metaprogramming in multiple stages, while still being able to work close to the hardware model when necessary.

Anyway, getting back on track: some of the current alternatives also force certain features without giving options, a prime example being garbage collection. GC shouldn't be the only option; there should be a variety of options, such as having regions as well as GC.

Unfortunately, at present there is no such alternative with a similar syntax to C++.

If we really want to entice the majority of C++ programmers away and have a truly better alternative that fixes C++, we need to keep in mind what C++ programmers actually do in C++, as well as fixing the known issues with C++.

So how do we fix C++ and give something that is truly better?

I've mentioned earlier what advanced C++ programmers do, so we know the majority of the things they do in C++; now we can focus on actually fixing C++.

So what we need to do is start by "rolling back" C++, all the way back to a purely functional subset of C (let's base it on the C99 standard), just as with the SAC programming language, and remove all suspicious and/or non-referentially-transparent features and undefined behaviors that may be in the standard. Change the compilation system: no more textual substitution, no more traditional (dumb) linkers, no more single-pass processing limitations.

Now we have a very simple language reminiscent of the simply typed lambda calculus, and we all know the benefits a purely functional language can give. It now becomes easy to define how language constructs behave and interact in the face of concurrency/parallelism, garbage collection, etc.

Before that we should fix & tighten the type system: make it richer & more expressive by changing it into a dependently typed one, adding pattern-matching capabilities and local type inference. Add a real syntactic macro system and support for multi-staged programming, not just at compile time. The macro system should interact smoothly with the type system. Together these should allow for dependently typed metaprogramming.

From there, add a formal effects system to the type system, like uniqueness types (linear and dependent types) and linear regions; support GC as well.

From then on, do what you want, but keep it simple and apply formalism.

No comments or opinions then?

No comments or opinions then?

My Immediate Reaction...

...is that what you're describing won't be similar enough to C++ in any meaningful sense to get lots of converts. OTOH, from a conceptual standpoint, what you're describing sounds an awful lot like Tim Sweeney's design ideas to me, so I like them. The difference is that Tim can implement his design, include it in the Unreal 4 technology, and instantly have tens of thousands, if not hundreds of thousands, of people itching to learn it. Releasing a "successor" to C++ in the wild would be vastly harder.

Unreal Engine 4

I'm designing a language which will look very similar to Ontic and have similar semantics to the ones Tim described in the old 'Python Metaclass Programming' thread.

So Tim is still exploring these ideas too? If this is so I'm very pleased, especially if they end up in UnrealScript or in some other part of Unreal Engine 4. Can you (or Tim) elaborate?

I Certainly Can't!

I (regrettably) don't work for Tim. :-) Hopefully he will elaborate when he feels comfortable doing so.

...is that what you're

...is that what you're describing won't be similar enough to C++ in any meaningful sense to get lots of converts.

I've spoken to some C++ programmers about it, and the general consensus is that they would be converts if such a language existed. Since no such language exists, we cannot say whether or not it would get a lot of converts.

OTOH, from a conceptual standpoint, what you're describing sounds an awful lot like Tim Sweeney's design ideas to me, so I like them.

It is similar, though there are some minor suggestions in Tim's paper I don't completely agree with; that isn't a problem. Also, nothing was mentioned about support for meta- & multi-staged programming that works smoothly together with a dependently typed language, as in the paper I linked to.

The difference is that Tim can implement his design, include it in the Unreal 4 technology, and instantly have tens of thousands, if not hundreds of thousands, of people itching to learn it. Releasing a "successor" to C++ in the wild would be vastly harder.

Well, we can only hope. I wish more details on his reference implementation were given than in his paper.

I think pure fp with

I think pure fp with referential integrity isn't ready for the masses yet. While referential integrity is a very nice property, it invalidates many well known programming patterns and makes some previously simple things much more complicated.

And for a language which should be able to do low-level programming it's especially problematic, because most 'low level' programming uses and creates side effects and is thus dependent on execution order. While you can capture this with monads, this would create another level of complexity which would prevent the success of such a language.

A very strict static type system is also problematic. Most big systems require at least a bit of dynamism, so having a type system which is totally static can make certain things difficult or even impossible.

I think pure fp with

I think pure fp with referential integrity isn't ready for the masses yet. While referential integrity is a very nice property, it invalidates many well known programming patterns and makes some previously simple things much more complicated.

I think you mean referential transparency; expressions & functions in a purely functional language are referentially transparent by nature, it's not something added on. Anyway, this is not what I meant; you've misunderstood me. Please read carefully what I wrote.

And for a language which should be able to do low-level programming it's especially problematic, because most 'low level' programming uses and creates side effects and is thus dependent on execution order. While you can capture this with monads, this would create another level of complexity which would prevent the success of such a language.

Again, please read carefully: did I mention anything about monads? (Despite the correspondence between monads & effects in Wadler's paper.)

A very strict static type system is also problematic. Most big systems require at least a bit of dynamism, so having a type system which is totally static can make certain things difficult or even impossible.

Not unless you have a rich type system, which is what I suggested already; and besides, this is exactly the case with C++. I'm talking about providing suggestions for making a truly better alternative to C++ for what advanced C++ programmers actually do with C++, as well as what Tim mentions in his paper; I'm not inventing or suggesting arbitrary features of some new arbitrary language.

You wrote "remove all

You wrote "remove all suspicious and/or non-referentially transparent features". So I thought you wanted a language with referential transparency. (I wrongly wrote 'integrity', because I had just read a paper about referential integrity in virtual machines and mixed it up. Sorry.)

did i mention anything of monads?

No. But how do you want to create the necessary flexibility in a referentially transparent language without using monads? How do you ensure a certain execution order for execution-order-dependent hardware accesses or similar low-level things?

While it's probably possible (especially with the metaprogramming abilities you propose), I'm not sure it's really that easy, or more flexible than using monads.

Not unless you have a rich type system, which is what I suggested already; and besides, this is exactly the case with C++.

Even a rich static type system remains static. How do you extend types depending on user interaction without recompilation? How do you handle dynamic linking? How do you easily integrate dynamically typed scripting languages? In C++ it's always possible to use reinterpret_cast for those cases. And while it isn't used very often (and is then often well hidden inside templates which do lots of checks), it's nonetheless sometimes necessary.

While I think the features you propose are useful in a more abstract high-level language, for a C++ replacement you simply need features which enable unchecked access down to the 'metal'.

C++ already has a very primitive kind of dependent types in the possibility of using values as template parameters (which is often 'misused' for template metaprogramming).

And I think it remains questionable whether it's really possible to do static array bounds checking the way Tim mentions in his paper in real-world applications, because a single dynamically sized buffer in an inference chain would be enough to break it.

You don't have to use monads

You don't have to use monads for impure computations. A few pure languages use uniqueness types instead. Another approach is using effects typing to annotate impure functions.

You should check into it again

D has template type inference now.

It also has auto type inference, which C++ does not have. This saves some code typically employed by macros.

We have DTL, a container library, but often I just wrap the built-in arrays/hashtrees with a class customized for the task at hand.

It has the cleanest delegate syntax you are likely to see in a C++-esque language (see my other post).

It has a *static if* construct which is evaluated at compile time and can be type-driven (along with the *is* operator).

The scope operators are also nonexistent(?) in any other imperative language I'm familiar with.

Walter has left the syntax open to Fortran-like vector operations (not implemented yet).

Instead of propping up C++, why not just try D? If it's missing so much power, why isn't everyone who is using it saying so?

The biggest complaints are:
- lack of reference return types
- no static initializer for non-static structs and arrays.

On top of all of this, Walter frequents the newsgroup often and is very open to suggestions and answering questions.

If D were to vanish I would not be able to go back to C++ now.

-DavidM

Thanks but no thanks, i'm not convinced

I think you've missed the point of my post, which isn't about C++ vs. D; I only used D as an example. It's the same with virtually all the other current alternatives (with similar syntax).

D has template type inference now.

I see there is still no support for implicit template instantiation, which is a major issue despite the complexity it may add to the language. It looks as though you still cannot inherit from template type parameters in a class template, and there is no bounded quantification in D. C++ doesn't have that either, but C++0x is getting first-class *Concepts* (which are similar to Haskell type classes, but not quite).

It also has auto type inference, which C++ does not have.

C++ is getting local inference in C++0x, by changing the semantics of the auto keyword.

We have DTL a container library, but often I just wrap the built in arrays/hashtrees with a class customized for the task at hand.

It's nothing like, or on par with, the C++ standard library containers & algorithms, and never will be unless support for implicit template instantiation is added. There is nothing like the iterator concepts & generic algorithms either. Yes, there are some minor issues with the current model, but with past experience and the help of Boost iterators & iterator concepts they are going to get an overhaul. The C++ standard library & Boost libraries are only going to get better & better as C++ evolves.

It has the cleanest delegate syntax you are likely to see in a C++-esque language (see my other post).

Are you aware of what is on offer in C++? std::tr1/boost::function, various generic signals & slots libraries which subsume delegates, DSELs for lambda expressions in C++, and C++0x is most likely to gain lambda functions as part of the core. This is only the start of it all.

It has a *static if* construct which is evaluated at compile time and can be type-driven (along with the *is* operator).

Where do you think these ideas came from? This is already possible in C++: look into std::tr1 type traits or Boost type traits, Boost MPL, static assertions, the concept checking library & C++0x Concepts.

Instead of propping up C++, why not just try D?

I'm not propping up C++; this is not about advertising C++ or C++ vs. D. I'm (or we are) talking about how we should go about designing a new language which is a truly better alternative to C++.

I've looked into D in the past, and I've looked at it briefly today; as far as I'm concerned it's not a truly better alternative in all aspects.

D is not what I have in mind. Why reintroduce another ad hoc type system & templates? Parametric polymorphism, ad hoc polymorphism and metaprogramming should all be separate things (that operate together smoothly), not the intertwined monstrosity that is C++ & D templates.

If its missing so much power, why isn't everyone who is using it saying so?

That's not what I'm saying. Besides, why is there only a small minority of D programmers? Why haven't the C++ masses come rushing over to use D? Well, I (partly) answered that in my previous post above.

If D were to vanish, I would not be able to go back to C++ now.

I'm not suggesting that. As said before, this isn't about promoting C++ (quite the opposite), and it isn't about C++ vs. D.

Unreal Engine 3 garbage collection

Since several folks asked:

In Unreal Engine 3, we implement garbage collection for all "heavyweight" objects: those objects for which we maintain complete metadata, support orthogonal persistence, etc., and which are similar in capability to instances of Java/C# objects. However, we also have "lightweight" allocations which we manage directly without garbage collection.

Typically during gameplay, there are 40,000 heavyweight objects around, with extremely complex chains of references (including cyclic references) and ownership relationships, thus making garbage collection a huge win in productivity. There are then hundreds of thousands of lightweight allocations whose ownership relationships are extremely simple and thus easily manageable explicitly.

Overall I wouldn't endorse a proposal to add garbage-collection features to C++, since the language is sufficiently low-level that most applications will want to handle memory management quite uniquely. Of course, I do see GC as perfectly appropriate for higher-level languages like Java and C#, which don't expose pointers-as-arrays or unchecked memory access, deal with low-level OS data structures, etc.

It seems like 90% of the effort of implementing GC on top of C++ is in collecting proper metadata for everything. Thus it would be useful for future C++ standards to address the problem of reflection very thoroughly, while leaving actual GC to the application.

Wilson and Johnstone Agree

Portable Run-Time Type Description for Conventional Compilers describes their system for extracting type information from the debugging data generated by C++ compilers, which they use for their real-time GC as well as their "Texas" persistent store. That's a tough way to get the metadata, but there it is!

there is also reflex

There is also Reflex, which offers a different approach to adding reflection to C++. It consists of a library component and a reflection data generator based on gcc-xml, which makes it non-intrusive and compiler-independent as well.

... It seems like 90% of

... It seems like 90% of the effort of implementing GC on top of C++ is in collecting proper metadata for everything. Thus it would be useful for future C++ standards to address the problem of reflection very thoroughly, while leaving actual GC to the application.

There has been a C++0x proposal by Bjarne Stroustrup (I think); it's called the XTI (eXtended Type Information) library.

Nobody knows yet whether this will get in; proposals only stopped being accepted late last year, so the C++ standards committee is in the process of reviewing them all. We won't know until roughly 2007/8, when the C++0x draft spec is available.

Tim Sweeney

The best thing that can be said for garbage collection is that it's the ideal tool for 110-IQ programmers and half-baked designs.

You're essentially claiming that programmers with high IQs prefer to solve problems in difficult, unproductive ways.

In large software projects with complex data relationships and multiple programmers, garbage collection yields very significant productivity gains compared to manual memory management. A whole class of bugs (dangling pointers, memory leaks) and impediments to understanding (exactly who should free which object when) are eliminated.

A practical, productivity-focused programmer -- of whatever IQ -- will welcome any areas where basic language features can improve productivity and software reliability. And he will outperform higher-IQ programmers who stick with lower-productivity solutions.

so to sum it up

.NET *is* C++ with garbage collection. Using D or a decent template library solves the problem as well. So this petition is a great discussion starter, but cannot be taken literally.

get HnxGC with RAII

The HnxGC library supports C++ applications with the RAII (deterministic reclamation) design pattern.

It is an accurate, pauseless (block-free, lock-free, no-suspend) concurrent tracing collector combined with reference counting. Efficient and portable: no cost for registering root-set pointers, no scanning of the root set, and more features...

website: http://hnxgc.harnixworld.com