Why do they program in C++?

Over at comp.lang.c++.moderated, there is a thread created by the C++ guru Scott Meyers on the subject: "Why do you program in C++?".


In my experience, C++ is alive and well -- thriving, even. This surprises
many people. It is not uncommon for me to be asked, essentially, why
somebody would choose to program in C++ instead of in a simpler language
with more extensive "standard" library support, e.g., Java or C#.


It's a truly neutral question: given that there are many
languages to choose from, why do you choose to program in C++? I don't
care if the reasons are technical, political, social, or what, I'm just
honestly curious.

I thought this might be interesting.

Here's why I do it...

As someone who programs C++ for a living, here's why I use it for the stuff I do at work. (Keep in mind, some of these reasons are imposed by the boss). There shouldn't be any surprises here, really--it's the standard set of reasons (or excuses, if you prefer).

* While there are things I dislike about the language (and I've often voiced many criticisms), there are also things I like about the language. Compared to many of the other OO languages (e.g., Java), C++ has better support for the functional programming style.

* My current project(s) are embedded systems; C++ is readily available for our target hardware. A common codebase is shared across multiple hardware platforms with widely varying CPU and memory systems.

* C is the language of choice for interfacing with the underlying operating systems. No need for an FFI.

* As an embedded system, a good chunk of the code our team writes consists of frobbing hardware registers. This is trivial to do (and do reasonably portably) in C++.

* Good support for generic programming.

* I know the language well. Just as importantly, the rest of the team knows it well, too. Several of them know other languages (including non-industrial ones), but there isn't any other language that is architecturally appropriate that wouldn't require significant retraining or replacement of staff.

* Current engineering policy is to only use languages for which there exists a large user community and talent pool--especially for mainline tasks. On one project I work on, Java is used as well for the application layer and user interface. Python can be found in some of our products (none that I'm working on), as is Tcl--so we're not necessarily limited to C and its derivatives. (We once put Smalltalk in an embedded system, back before my time; that should make it obvious where I work. :)

* Performance is an issue, though perhaps an overrated one. Some of the products I work on have sub-100 MHz processors in them, so use of interpreted or bytecoded languages is restricted to non-time-critical stuff.

* Memory footprint is also an issue. Several products I work on do not have sufficient system memory to effectively make use of garbage collection (without the GC dominating the CPU). None of them have a disk.
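The "frobbing hardware registers" bullet above can be sketched as follows; the device layout, addresses, and bit masks are invented for illustration, not taken from any real part:

```cpp
#include <cstdint>

// Hypothetical register block for an imaginary UART. volatile tells the
// compiler every access is a real bus access that must not be optimized away.
struct UartRegs {
    volatile std::uint32_t data;    // transmit/receive register
    volatile std::uint32_t status;  // status flags
    volatile std::uint32_t ctrl;    // control bits
};

constexpr std::uint32_t kTxReady = 1u << 5;  // invented status bit

// In firmware this pointer would come from a fixed physical address,
// e.g. reinterpret_cast<UartRegs*>(0x40001000); taking it as a parameter
// keeps the sketch testable on a host machine.
inline void uart_put(UartRegs* uart, std::uint8_t byte) {
    while (!(uart->status & kTxReady)) { /* spin until ready */ }
    uart->data = byte;
}
```

Porting to a different board then mostly means swapping out the register definitions, which is the portability point being made.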

Wow...

...Scott must work where I do! Well, apart from the "embedded Smalltalk" part.

I can, without exception, second all of Scott's comments. As I've said before, I am perfectly satisfied that C++ is almost surely the wrong language for the server side of my domain. It almost makes sense to me on the embedded side, but so far the compilers and libraries there are abysmal—so much so that I find myself thinking we might be better off with plain C.

But the retraining issue is a totally legitimate one, and it would be politically infeasible to convince the management team that the investment would be repaid in higher-quality products developed more quickly, even if it happens to be true. C'est la vie.

What sort of embedded platforms are you using?

GCC 3.x is available, in some form, for most 32-bit embedded processor families; in many cases embedded OS and RTOS vendors will provide complete C/C++ toolchains. GCC isn't the best C/C++ compiler out there, but it certainly works if nothing else is available. Many platforms have commercial C/C++ compilers as well.

I'm often puzzled by those who prefer C to C++; much advice of that sort seems to be based on myth, bad experiences with early C++ implementations (it took a while to get things like templates and exceptions right) or simple outright dislike for the language. Libraries ought not be an issue; any C library (possibly excluding things dependent on new C99 features like "restrict") should be easily callable from C++ code--even if only available in binary form.
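The "any C library is callable from C++" point rests on C linkage. A minimal sketch, with a hypothetical stand-in function defined locally in place of the vendor's library:

```cpp
// The vendor's C header would normally be wrapped like this so the C++
// compiler leaves the names unmangled:
//   extern "C" { #include "vendor.h" }
// frob_device is a made-up stand-in, defined here so the sketch links.
extern "C" int frob_device(int id) {
    return id * 2;  // in real life: implemented inside the C library
}

// Ordinary C++ code calls it directly; no foreign-function interface
// layer is involved.
inline int use_from_cpp(int id) {
    return frob_device(id);
}
```

This works even when the C library is available only in binary form, since only the declarations need the `extern "C"` treatment.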

The main exceptions I can think of, where C would be preferable, are:

* Team is unfamiliar with C++ and/or OOP.

* VERY tight memory restrictions and/or performance requirements. (C is often chosen on these grounds, though in many cases the justification turns out to be groundless.)

* Architecture/platform that has a good C compiler but lacks a good C++ compiler (many DSPs, certain small-scale processor families)

* Legacy system issues which prevent or confound introduction of C++.

* A heavily K&R-ish codebase which would make a modern C++ compiler barf (and no desire to port it).

Even if you ignore the OO features (subtyping/inheritance) and program in a procedural or functional style, C++ has lots to offer that C does not: better typechecking, generics, better metaprogramming support, etc.
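A small illustration of the typechecking/generics point: where C generic code goes through void* and casts (qsort-style), a C++ function template is checked at compile time:

```cpp
// A trivial function template: works for any type supporting '<', with
// no casts and no loss of type information.
template <typename T>
const T& max_of(const T& a, const T& b) {
    return (b < a) ? a : b;
}

// max_of(3, 7) and max_of(2.5, 1.5) both compile; a mixed call like
// max_of(3, "x") is rejected by the compiler rather than failing at run time.
```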

At any rate, I checked in the employee database...no, you don't work here. Unless you post to LtU under an assumed name. :)

VERY tight memory restriction

VERY tight memory restrictions and/or performance requirements. (C is often chosen on these grounds, though in many cases the justification turns out to be groundless.)

The concern about memory is real in some very small applications. C++ generates a lot of structures (e.g. for exceptions and RTTI) which are still "there" even if you don't use them. If the compiler lets you turn these off, then this isn't such a problem, but then you're using a C++ subset, not C++ proper.

There are a few concerns which are specific to library vendors which may apply:

  • Some platforms still have binary compatibility issues, such as differing object/vtable layout and symbol mangling.
  • Upgrading libraries without requiring clients to be recompiled is harder to do in C++, and sometimes platform-dependent.
  • More languages provide easy ways to bind to C libraries than to bind to C++ libraries. (This is more of a concern in the open source world.)

Apart from that, I agree with you. I don't know why most people would prefer C to C++ in this day and age, especially when you consider that most C compilers are really C++ compilers with features turned off. Even if you only use C++ as "a better C", you still win.

ABI Compatibility...

... is a real issue. While it's true that GCC 3.x is available for many platforms, in one form or another, we do have to concern ourselves with link-time compatibility for our customers, who often aren't willing to migrate away from whatever compiler/library combination the embedded device vendor is hawking.

In any case, having kvetched about it, I should mention that we do, in fact, still code in C++ even on our embedded devices. We just don't enjoy it very much. :-)

C vs C++

I would usually choose C over C++. My reasons are:

  • I only ever use C for very small performance critical pieces of code, or to interface with specific libraries etc. Typically, C++ doesn't really add much here.
  • From my experience, C++ is a lot less portable than C. What I mean by this is that I have so far been unable to write any C++ code which managed to compile cleanly first time on 2 different platforms. Use of the STL seems to be the kiss of death in this regard. Perhaps this is my own incompetence, but no other language I've used has been this tricky to port.
  • Related to the above: the STL seems to be the single biggest reason for preferring C++ over C for the work I do. However, I rarely program without having a Tcl library present, so I can get most of what I need from that (e.g. hashtables, dynamic Unicode-aware strings, event loop, simple sockets, GUI, etc), and get a nice, compact configuration language for free!
  • Complexity: as mentioned above, my code is generally structured as small packages which are loaded into a Tcl interpreter to be tied together. For these sort of small pieces of code, C++ just adds complexity without really giving much gain. C++ is a much more complicated language than C.

In short: I'm trying to minimise the amount of code I have to write in low-level languages. C++ seems like more of a win if you're writing most, if not all, of your code in C++. That's a situation I'd like to avoid. I'd benefit more, I think, from a nice functional language (e.g. Ocaml, Haskell) with garbage collection, pattern matching, etc. However, I haven't had much luck integrating such a language with Tcl, unfortunately.

What are your problems with i

What are your problems with integrating Haskell or OCaml with Tcl? Also, why are you mainly writing stuff using Tcl, if I may ask?

Haskell/Ocaml and Tcl

What are your problems with integrating Haskell or OCaml with Tcl?

Last time I tried, it was building dynamic libraries, suitable for [load]ing into the Tcl interpreter. You can build a static customized tclsh, with whatever Haskell etc code baked in, but I'd prefer to not have to do that. I came across some code for linking (byte-code compiled) OCaml to Perl, and was going to try and adapt it, but so far haven't got round to it. I notice that GHC now has some preliminary support for dynamic libraries [1], so now might be the time to revisit this (although, I might wait for it to mature a bit first).

Also, why are you mainly writing stuff using Tcl, if I may ask?

Plenty of reasons. Mostly, for the implementation/libraries, which are really very good. It's also pretty good at meta-programming, but prefers using normal (runtime) procedures over Lisp-style macros, and defers interpretation of arguments to individual commands, which avoids some quoting issues. (e.g., there are no special forms in Tcl, beyond the minimal syntax defined in the Tcl.n man page). There are warts, and I would personally like to see it move in a more Scheme-like direction (Tcl 8.5/9.0 are doing so, in some ways, with big integers coming in 8.5, some proposals for lambda, and even some preliminary talk of things like closures and continuations on the wiki), while maintaining some of its more unique aspects.

I sometimes gaze longingly at some particularly elegant piece of Haskell, but mainly as a means of replacing C, rather than Tcl. My ideal language would somehow be able to scale between Tcl's very hands-off approach (interpretation as a technique of last resort, almost), and Haskell's elegant type system.

Perhaps I'm being stupid, but

Perhaps I'm being stupid, but I can't see any documentation for the C++ libraries for embedding Tcl on http://tcl.tk. Where would I find this?

Tcl library docs

Tcl is a C library. The docs for the functions it exports are at http://www.tcl.tk/man/tcl8.5/TclLib/contents.htm. There are more docs scattered around the wiki, e.g. this page on C++. The sample chapters from Practical Programming in Tcl/Tk should be useful too.

You can either embed Tcl in your program, or you can restructure your program to be a set of Tcl-loadable packages and use tclsh (or wish, or tclkit...) as your "main". The latter is generally the preferred approach these days, for various reasons.

Thanks.

Thanks.

Advantages of C over C++

Portability (through time and space) is important, and C++ is far worse than C here, but the primary reason I prefer not to use C++ is that it's so bug-prone. http://www.parashift.com/c++-faq-lite/ is a several-hundred-page document that consists mostly of "gotchas" in C++ --- subtle bugs that mostly can't happen in C. In some applications, these subtle bugs are outweighed by C++'s support for higher-level idioms not supported at all in C.

However, I mostly use C as a slightly-higher-level version of assembly language. In this application, C++'s advantages are not very compelling, but some of its disadvantages remain. Generally I write higher-level code in Python.

(In Wheat, however, we're using C++ to implement the virtual machine. And by "we", I mostly mean Mark. But I think it's probably the right choice.)

Why I prefer C to C++

Maybe you work in places with lots of good programmers, but I have seen far more abuse of C++ than good use of it. People seem to pick their favourite feature and over-use it. Spaghetti inheritance, template torture, exception returns outnumbering normal returns... I sometimes get the impression that Windows programmers learned from the so-called Microsoft Foundation Classes, and the latter were written by the Mad Arab.

Or, to use a popular mixed metaphor, "C gives you enough rope to shoot yourself in the foot; C++ gives you a bazooka."

My big personal peeves are:

  • Scoping. When I see some code referring to "foo", is it a member variable, or a global (or static or class) variable? Or does it belong to some superclass? I keep writing "this->" in front of everything, just so I can keep track of it. ("But you don't need to do that!" "I know; I want to; I wish you'd do it, too.") Yes, there are naming conventions, but they're often not followed (especially in the older and cruftier bits of the code base), and they're different in every code base, anyway. You can do horrible things in C with macros, but everyone knows that you use ALL_CAPS unless you're damn sure that you've made them behave like normal variables and/or functions.
  • Auto-typedef-ing. The single biggest wart in the C grammar is the type name syntax and "typedef" introducing an unbounded number of new keywords. So C++ goes and makes it worse. I like writing "struct" in front of everything. It tells me that this object has complex internal state. I like writing "class" in front of everything, too, which attracts the same comments as the above. I only use typedef when I really need an arithmetic type and I really can't use a built-in one. Indeed, in C, I often write "foo_frob(struct foo *foo, arg1, arg2, ...)", which drives C++ programmers crazy.
  • The syntax. In the 90s, I remember people wondering if it was possible to parse C++ unambiguously, or if there was an unresolvable corner case. Maybe X3J16 solved that, but it can still be a remarkable pain, especially combined with the above two issues. By fitting into the corners of C, already a fairly densely-encoded language, it's possible to approach Perl 5 regular expressions for clarity. And some people seem to like writing that way.
Now, it is well known that "there is no language in which it is the slightest bit difficult to write bad code", but once I've stripped a piece of code down to the few places where I actually need virtual functions or whatever, the overhead of creating a vtable by hand in C is generally negligible, and is dwarfed by the comments explaining why it's needed and what the invariants are, anyway.
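For readers who haven't seen it, the hand-built vtable the comment refers to looks roughly like this (written in a C-compatible style; the shape example is illustrative):

```cpp
// The "vtable" is just a struct of function pointers.
struct ShapeOps {
    double (*area)(const struct Shape* self);
};

// Every "object" carries a pointer to its ops table.
struct Shape {
    const struct ShapeOps* ops;
};

// "Derived class": embed the base as the first member.
struct Square {
    struct Shape base;
    double side;
};

static double square_area(const struct Shape* self) {
    const struct Square* sq = (const struct Square*)self;
    return sq->side * sq->side;
}

static const struct ShapeOps square_ops = { square_area };

// "Virtual" dispatch, by hand.
static inline double shape_area(const struct Shape* s) {
    return s->ops->area(s);
}
```

The explicit ops pointer is the overhead being called negligible; the comments around it explaining the invariants are usually the bigger cost.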

Now, automatic constructors and destructors are rather handy, and for some mathematical-type code, operator overloading is a wonderful syntax convenience, but if I'm doing high-level programming, I'll use a high-level language.

But other than that, the only thing I find worthwhile about C++ is the STL. Which is a very good thing, but the cost at which it was bought... now J. Random Cruftycoder can obfuscate his weird, special-purpose and buggy data structure by wrapping it in template syntax and feel virtuous that he's writing "reusable code".

I don't deny that C++ is a powerful tool, able to span from low to fairly high levels, but I have seen it in the hands of idiots too often.

Embedded code

On top of my comments above, for embedded programming, I have to think hard about every single allocation: "what do I do if it fails?" (There's an important exception for startup code, before anything data-dependent happens.)

Thus, I hate implicit allocation. I have a strictly finite amount of RAM available, and that's it. I preallocate when possible, and cleanly back out when not.
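The preallocation discipline can be sketched as a fixed pool, where exhaustion is an ordinary, checkable condition rather than an out-of-memory surprise (names are illustrative):

```cpp
#include <cstddef>

// All storage is reserved up front; acquire() never touches the heap,
// and returning nullptr is the clean "back out" path described above.
template <typename T, std::size_t N>
class Pool {
    T slots[N];
    bool used[N] = {};
public:
    T* acquire() {
        for (std::size_t i = 0; i < N; ++i) {
            if (!used[i]) { used[i] = true; return &slots[i]; }
        }
        return nullptr;  // pool exhausted: caller must handle it
    }
    void release(T* p) { used[p - slots] = false; }
};
```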

It's not a case of "tight" memory - I usually have over a megabyte - but the fact that I can't just print "out of memory" and return to the shell.

Not to mention, I have to provide the run-time as well. Remote firmware upgrade is always fun, particularly when you don't have enough non-volatile storage for two full copies. A nice simple run-time has advantages there.

Replacements for C++?

For a long time I have been looking to replace C/C++ with something better, but what would be a good choice? What would be a better C++ than C++?

I've looked at D (from Digital Mars), and it seems promising. And C# fixes many of the problems of C++, but it does not compile to native code, so it can't truly replace C/C++. Are there other modern alternatives?

Ada

I am not sure what your requirements are, but I suggest checking out Ada.

Objective-C

The only systems which made widespread use of Objective-C appear to be NeXT and now Apple, but it is quite elegant and useful. Portable too, as part of GCC. The GNUstep framework is available for Linux and Windows, and Apple has open-sourced their lower-level libraries for writing portable Objective-C code.

It is a thin layer on top of C, providing Smalltalk-like behaviour. Being a dynamic runtime system, it is much easier to link to other dynamic languages (Python, Lisp, Ruby, etc.) than C++ or even C.

IMHO...

Kkaa: For a long time I have been looking to replace C/C++ with something better, but what would be a good choice? What would be a better C++ than C++?

I believe Objective Caml to be a quite reasonable replacement for C++ for 90% of the domains that I've seen C++ applied within. O'Caml offers all of the usual benefits of the ML family such as higher-order functions, currying and partial application, algebraic data types and pattern matching, and Hindley-Milner type inference. It offers key benefits of imperative languages such as mutable references and arrays. It offers all of the usual benefits of a class-based object-oriented language such as inclusion polymorphism, multiple inheritance, method overriding, and data encapsulation. It offers an extremely powerful module system, including modules that can be parameterized by other modules so that, e.g., a "currency" module can be parameterized by a "dollar" module implemented in terms of floats or a "euro" module, also implemented in terms of floats, so that you can't accidentally mix operations on dollars and euros but there won't be any code duplication.
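Since this is a C++ thread, it is worth noting that the no-mixing half of the currency example can be loosely approximated with a tag-parameterized template (a sketch only; OCaml's functors are considerably more general):

```cpp
// One shared implementation, distinguished only by an empty "tag" type,
// so Dollars and Euros cannot be mixed by accident.
template <typename Tag>
class Currency {
    double amount_;
public:
    explicit Currency(double a) : amount_(a) {}
    double amount() const { return amount_; }
    Currency operator+(Currency other) const {
        return Currency(amount_ + other.amount_);
    }
};

struct DollarTag {};
struct EuroTag {};
using Dollars = Currency<DollarTag>;
using Euros   = Currency<EuroTag>;

// Dollars(1.0) + Euros(1.0) fails to compile; same code, no duplication.
```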

Unlike many advanced alternative languages, O'Caml doesn't fetishistically abstract away from the machine: integers' range is determined by the machine word size, whether 32 or 64 bit. There are specific 32- and 64-bit integer sizes, and arrays of these or of float values are guaranteed to be contiguous and unboxed for efficient numerical computing and/or compatibility with C/C++ memory layout. Being a member of the ML family, O'Caml is garbage-collected, but its GC is an extremely efficient generation-scavenging collector. If you want arbitrary-precision numbers, you can get them, from a standard module even, but it is a distinct module.

O'Caml has a reasonable set of standard libraries, including, of course, text and binary I/O, strings, lists, hash tables, etc. Some modules go beyond the obvious: regular expressions, threads, sockets, graphics, lazy evaluation...

O'Caml excels as an environment for developing DSLs. You can go the traditional route and use ocamlyacc and ocamllex and write your semantic actions by hand, or you can take advantage of the camlp4 preprocessor to build your front end. camlp4 and the O'Caml back-end are well-integrated; if you write a new front-end with camlp4 the O'Caml compiler won't parse your code again; it will deal directly with the AST that your parser hands it, all while retaining source coordinates for error-reporting purposes.

What really grabs me about O'Caml, though, is its attention to pure pragmatics: there is an interactive REPL (Read, Eval, Print Loop, called a "toplevel" in O'Caml terminology), a bytecode compiler coupled with a GDB-like debugger that offers "time-travel debugging" so that you can "undo" the execution of your program when it would be helpful, and a native-code compiler coupled with a profiler for those heavy micro-optimization tasks. There are also several foreign-function interfaces to C/C++, and lots of people have used them: there are wrappers for database APIs, OpenGL, crypto libraries, etc.

I can't emphasize enough how important this last point is. O'Caml is the only statically-typed language I know of that is competitive with Lisp/Scheme/Smalltalk for fun, interactive development and C/C++ for the ability to deliver small, fast native-code binaries. Highly recommended.

It's a shameless plug but I agree

I personally agree. I use both C++ and OCaml rather heavily and most of the things I do in C++ I could do with less difficulties and less run-time errors in OCaml -- if the libraries I'm using weren't C++-specific, that is.

I look forward to using Mono + F# or OCamlIL and being able to use all these libraries in a language-neutral way.

can i hijack a little corner

can i hijack a little corner of this thread to ask for recommendations of a good c++ compiler that runs on windows/cygwin and linux, and is free. is gcc the best there is? by good i mean as consistent as possible with the standard, rather than generating fast or compact code (i'm particularly interested in using templates). i need to improve my c++ abilities because in a year or so there may be a good job opportunity using the language.

Certainly

GCC 3.4.3's support for the standard is quite good, and GCC 4.0's will be even better. Also, GCC 4.0's codegen for C++ is vastly improved over GCC 3's. Either compiler does just fine with various C++ acid tests (Boost, Crypto++, etc.)

thanks!

thanks!

Needed for non-desktop platforms

I write code for game consoles. C++ is almost always the out-of-the-box officially supported language. I could see using a language like Forth as well, but there's no way I'd ever suggest that :)

It's difficult to take a language like Erlang or Haskell or OCaml and have it run well on a system where you can't dynamically allocate memory willy-nilly (because there's often no memory mapping hardware, so fragmentation can be really bad). And you have to avoid situations where you have tens of megabytes of data that the garbage collector could have to run through at an inopportune time, when you've only got 16 milliseconds per frame of total processing time.

I'd appreciate pointers to high-level language implementations designed for embedded systems (I know about Lua, but that's a "scripting" language by design, not something you'd want to write the bulk of your code in.)

MLKit

James Hague: I'd appreciate pointers to high-level language implementations designed for embedded systems (I know about Lua, but that's a "scripting" language by design, not something you'd want to write the bulk of your code in.)

How about MLKit?

"All memory allocation directives (both allocation and de-allocation) are inferred by the compiler, which uses a number of program analyses concerning lifetimes and storage layout. The ML Kit is unique among ML implementations in this respect... Programmers who are interested in real-time programming can exploit the absence of garbage collection: there are no interruptions of unbounded duration at runtime."

I'm interested in something similar

I occasionally build low-cost robots in my spare time and it'd be fun to try some functional reactive programming in the real thing. Some of my robots only have a few hundred bytes of RAM so there's probably not much hope there. But one has 512K and it'd be cool to find a nice functional language that runs in that space. Even better, an interactive interpreter that runs in 512K so I can control it 'live'.

There are a number of project

There are a number of projects to run functional stuff on bare PC metal - such as movitz, or the Haskell projects mentioned here in the last couple of days. It's probably worth checking to see if there's a practical way of using scheme on the metal.

Hudak's embedded FRP

I know that Zhanyong Wan and Paul Hudak worked on real-time functional reactive programming for industrial robots. You can find more info on Zhanyong Wan's publications, especially his PhD thesis.

--Shae Erisson - ScannedInAvian.com

The Transterpreter is an Occa

The Transterpreter is an Occam interpreter/runtime:

We think occam is really best suited for programming robots, because robots always have to deal with lots of stuff at the same time. Therefore, we've also made it possible to write occam programs to control the LEGO Mindstorms. Give it a try; you'll love it.

As occam closely models the CSP calculus, we can write clean, concise, concurrent programs that can then be executed anywhere---on Mac OS X, Windows, *NIX, on handhelds and clusters, and yes, little robots too.

Also, Danny Dubé did BIT, "A Very Compact Scheme System for Embedded Applications", and with Marc Feeley, PICBIT, a Scheme interpreter for PIC chips (also covered in Running Scheme on a PIC microcontroller).

I'm not sure about the availability of source code with any of these.

Transterpreter code availability

The last time I talked to Transterpreter folks (a few months back now), they said they had no plans to make the source code for the Transterpreter generally available. However, they did mention that they'd be happy to supply the source code to people who had a genuine interest in working on the code base.

Scheme compiler for Lego Mindstorms

Here's a Scheme compiler for Lego Mindstorms, which comes with source in Scheme. The source is only 1570 lines -- the compiler was apparently written by three students in three days. How hard could it be to retarget for another CPU? ;)

Of course, the Transterpreter is apparently written in C, so if the target CPU has a C compiler, then porting that ought to be easier, if the source can be obtained.

Re: LEGO Scheme + Porting

Yep; although, that particular compiler is actually a cross-compiler, so unfortunately it isn't going to be particularly portable to new platforms. It was, as the pages said, a proof-of-concept. But, it was impressive to see happen. :)

Regarding porting the Transterpreter, we've had good success porting it to other platforms.

http://www.transterpreter.org/blog/archives/2005/02/portable_you_be.html

We have a port to the Sharp Zaurus completed as well (which took almost no time), and a few other platforms in various stages of partial completion. Again, people who are interested in working on ports are welcome to contact us, and we can find ways of getting them involved.

(Hm. An awful lot of posts for me today...)

Source

Apologies for not keeping up on all-things-LtU!

I should say that our intentions are to release the source, but both of the primary authors of the Transterpreter (myself and Christian) are within 6 months of writeup for our PhDs. Supporting a source code release is not, we don't think, a good idea at this time.

We have the infrastructure in place (version control in a place we can create accounts, etc.), but are not ready for any of the administrative overhead at this time. We won't let it bitrot, though, we promise.

"C++ is bigger than ever"

according to Bjarne Stroustrup.

Here is why:

As a professional programmer that earns his living coding C++ applications, here is why:

a) C++ is faster. No matter what you have heard, the overall speed of C++ applications (either real or perceived) is greater than that of other languages.
b) Templates and generic programming without too much brain twisting. Templates and type safety are very important. If C++ didn't have templates, it would not be used.
c) Direct interfacing with C and O/S routines.
d) Plethora of good libraries.

I think it is of greater importance to ask why they don't program in languages other than C++.

Here are my reasons:

a) Java and C# can't do many intelligent things that C++ can.
b) Having everything in classes does not always make sense.
c) Python and Perl have weird syntax.
d) ML and Haskell have even weirder syntax.
e) The most important reason: it is not directly visible, from the available online resources, that what I can do with C++, I can do with LISP/ML/Haskell/O'caml/etc etc.

This last reason is the most important one. My job is about doing desktop defense application programming (radar consoles, simulation systems etc). Here is what I want a programming language to support:

1) signals and slots, for doing model-view-controller apps.
2) networking.
3) memory mapped data structures.
4) multitasking.
5) a rich GUI.
6) value classes.
7) operator overloading.
8) XML and databases.
9) clever and efficient data structures and algorithms support.

Let me tell you, after a two year search, what I have found:

Java does not support signals and slots, but a generic observer-observable pattern which is a pain in the ass to use. Java does not support memory-mapped data structures, nor value classes, nor operator overloading. Java interfacing with C and OpenGL is weak, to say the least.

C# is not an option, because my apps must run on Linux as well as MS Windows.

Smalltalk has an ancient GUI and tens of different flavors.

I have searched for example desktop apps in LISP/ML/Haskell/OCaml that support the above, but I have found nothing. All I find is broken links, 100 ways to code the factorial function, insane amounts of work on how continuations are the best thing since sliced bread, etc etc.

In other words, it's not the elegance of a language that counts, but rather the most cliche of arguments: what one can do with a language, in the shortest space and time.

So after all, I am back to C++. I am using Qt, which is the best piece of software ever written, for all practical purposes, and it covers all the needs for the types of projects I want to do. I never do any memory management, I don't need garbage collection, I have direct support for signals and slots (although not part of C++), a very rich GUI that actually makes sense, direct support for OpenGL, and all the goodies C++ offers me, like operator overloading, value classes (you can't tell how valuable value classes are until Seconds and Milliseconds get confused! I used value classes to separate them and provide implicit conversions between them), smart and efficient data structures (no need to copy stuff over and over, as in functional languages), etc.
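The Seconds/Milliseconds trick mentioned in passing is easy to sketch with two small value classes and a one-way implicit conversion (illustrative, not the poster's actual code):

```cpp
// A Seconds value must be constructed explicitly...
class Seconds {
    double value_;
public:
    explicit Seconds(double v) : value_(v) {}
    double value() const { return value_; }
};

// ...but converts implicitly to Milliseconds, so the two units can be
// combined deliberately yet never silently confused.
class Milliseconds {
    double value_;
public:
    explicit Milliseconds(double v) : value_(v) {}
    Milliseconds(Seconds s) : value_(s.value() * 1000.0) {}
    double value() const { return value_; }
};
```

Passing a raw `double` where either type is expected is a compile error, which is exactly the confusion the value classes prevent.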

Most of what you say makes no

Most of what you say makes no sense whatsoever. For example, what "intelligent things" can C++ do that Java or C# can't?

Signals and slots are not directly supported by anything in the C++ standard - they are not a language feature. Conversely, there is no reason why a C# or Java library could not support them as well as a C++ library does.

I could go on, but I won't.

"For example, what "intellige

"For example, what "intelligent things" can C++ do that Java or C# can't?"

Check out templates for starters. There is a rich field of fancy stuff to be discovered which can't be done "similarly enough" in Java (always think about compile time vs. runtime, for instance) :o)
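A classic example of the compile-time point: the template below is evaluated entirely by the compiler, which has no direct counterpart in Java generics (erased at compile time) or reflection (run time only):

```cpp
// Factorial computed during compilation via template instantiation.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {
    static constexpr unsigned long value = 1;
};

// Usable wherever a compile-time constant is required, e.g.:
//   char buffer[Factorial<5>::value];  // an array of 120 bytes
```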

"Signals and slots are not directly supported by anything in the C++ standard - they are not a language feature. Conversely, there is no reason why a C# or Java library could not support them as well as a C++ library does."

You are assuming that the expressiveness of both languages is close enough ("Conversely, there is no reason why a BASIC library could not support them as well as a C++ library does.")

I don't agree with many points of Achilleas, but at least we should make some effort in disagreeing :)

Ok, leaving aside the fact th

Ok, leaving aside the fact that Java does not have compile-time metaprogramming, I see no justification for the idea that there are "intelligent things Java cannot do", nor that Java cannot support slots or any other architectural pattern in a way that is easy to use.

As regards compile-time evaluation, for the vast majority of applications in C++, if not all, a reflective capability only as powerful as Java's is quite enough to allow programming the same kinds of patterns in a way that is no less concise or easy. Java generics cover the simple uses of templates.

I see no need to enter into deep arguments about capabilities with someone who is ignorant of the basics.

Nanny, Nanny.

I see no need to enter into deep arguments about capabilities with someone who is ignorant of the basics.

As best as I can tell, we are in a very shallow ravine that has little to no running water.

Personally, I think anyone that can program in C++ effectively is, by definition, pretty smart - you have to be smart to figure out and keep track of all the intricacies. Me, I'm a dumb programmer who likes to not worry about a lot of details simultaneously - which is why I don't get along well with the likes of C or C++.

Sure - but I'm not going to a

Sure - but I'm not going to argue about what Java or any other language can or cannot do with someone who appears to be wilfully ignorant of it, making claims that have no basis in fact whatsoever, providing no evidence because they have none.

Who's to convince?

The person you're arguing with? Or the smart people? Or the dumb people? Or yourself? Can't say I blame you for wanting to optimize your time, but why the caustic nature of these posts?

Not that it makes any difference, but Java is way too clever for me as well.

[...]Java is way too clever f

[...]Java is way too clever for me as well.

Really? I find that Java stops me from shooting myself in the foot with things like accidentally passing pointers to short-lived char* buffers, while also preventing me from easily creating well-factored, generic code.

Library implementors do need a little bit more of what it takes to create something good in Java, to be sure. Doesn't mean that it can't be done, or that once it has been, downloading and using a Java library is harder than downloading and using a C++ library.

Somewhat relative

I do find Java programming easier because it has built-in GC and doesn't use memory pointers. However, Java's got its own set of warts. For example, how do you determine when a type should be constructed as a class hierarchy (limited to single inheritance) and when it should be an interface (no code inheritance)? Whichever path you choose, you'll be fighting it all the way along. And then there's the distinction between primitive types and object types.

May just be me, but I find that all the programs I construct are heavily reliant on the construction of Types. Get the types right and everything else falls into place (somewhat akin to getting a database design correct). If I have to fight the Type system, then I find myself in a pitched battle with my software, never really being satisfied with the tradeoffs.

Objects vs primitives

Many people are bothered at the distinction between objects vs primitive types (int, boolean, double, etc.) in Java; it is often suggested that a major deficiency of Java is that not everything is an object (as opposed to Smalltalk, for example, where everything is).

In C++, the problem is, at first glance, worse. Not only are the primitives not objects, but the language happily allows you to manufacture MORE things which aren't objects--unions, structs, enums, and pointers/arrays to such. (We'll gloss over struct vs class for now).

However--the C++ approach seems more natural to me. Why? In Java, objects and non-objects have differing semantics and (in general) cannot be interchanged. Intrinsics live "on the stack", whereas objects live in the heap. One object cannot be embedded in another without a level of indirection. Pre Java 5, one couldn't use a primitive in an object context without an explicit cast to a class such as Integer, Boolean, etc. Primitive types cannot be aliased without wrapping them in a containing object of some sort.

In C++, on the other hand, the polymorphic objects, the primitive types, and the non-polymorphic aggregations all exist on the same plane. One can use any type in C++ as a template parameter, or as the basis for overloading. All types can be allocated in any memory region, or embedded "inline" within another object. In short, whereas the primitives and the objects are segregated in Java, in C++ they are not--something which I think makes C++ more expressive.

Of course, that comes at a price--the C++ solution permits references to out-of-scope objects to be created. By allowing internal pointers, it makes implementing GC much more difficult.

Thoughts?

It's not just that C++ gives

It's not just that C++ gives you more ways of doing things - it genuinely is easier to shoot yourself in the foot with stack allocation, as you actually have to be aware of the problems.

By contrast, Java holds your hand very tightly, which can incur performance penalties, and makes the language more confusing to learn (it took me ages to work out why I couldn't use 'new' on int). However, I've never found that this aspect makes it harder to actually achieve a particular end (except, of course, in relation to performance).

Classes vs. interfaces [OT]

For example, how do you determine when a type should be constructed as a class hierarchy (limited to single inheritance) and when should it be an interface (no code
inheritance)?

I don't really see the problem in making that decision. In Java, I think of all types as being interfaces in principle — classes are really about implementation. I think possibly the reason confusion arises is that when a class will have a single interface that won't be used anywhere else, you can save time and keystrokes and maybe some mental overhead by simply defining the class without an explicit interface. But in that case, the class still has an implicit interface. You could define the interface explicitly, change all references to the classname (where used as a variable type) to use the interface instead, and the design and semantics of the program wouldn't change.

Whichever path you choose, you'll be fighting it all the way along.

The fighting happens because you can't reliably use subclassing for implementation inheritance when you need to implement multiple interfaces. You either need to pretend Java is C and use lower-tech reuse mechanisms, like composition and delegation; or use the new generics stuff, Java version permitting; or use some kind of metaprogramming, like AOP. Either way, I agree it's a fight.

Good Point...

...this led me to think a little bit about some systems that I feel sit at some kind of local maximum with respect to how I seem to measure "quality" of implementation in Java. I came up with perennial favorite Hibernate, but also, in a similar but possibly even simpler vein, SimpleORM and Bhavaya. I'm also reminded of ActiveMQ and JGroups. What do I find in common among these systems?
  • They solve real-world problems: object persistence and distributed computing.
  • They take throughput and scalability seriously.
  • They take standards (OMG, J2EE) seriously, at least enough so to play nicely with other children.
  • They take the cost of developer time seriously.
On this last point I should say that I didn't think it could get any better than Hibernate on the Java persistence front, but both SimpleORM and Bhavaya strive to do even better, even extending to eliminating Hibernate's XML configuration files! The real shining example, though, is ActiveMQ. On one hand it's "just" a JMS 1.1 implementation. On the other, it plugs into any J2EE 1.4 container plus several other popular containers, supports some eight different transports, three persistence frameworks, makes non-Java code and DHTML first-class citizens in the messaging infrastructure, supports clustering and hot deployment, and supports dynamic discovery. Oh, and it's open source.

Technically speaking, what these systems seem to have in common is getting extreme leverage from Java's reflection, dynamic proxies, and in some cases the availability of high-quality bytecode-generation libraries to, as transparently as possible, minimize both the cognitive distance between task and accomplishment and the amount of repetitive, tedious, error-prone glue code often required to bridge task and accomplishment.

There's a lesson in here, as well as in systems like Ruby on Rails, for fans of statically-typed languages that lack reflection and/or easy in-place code generation, IMHO. In my own thinking about having a "slider" from static to dynamic typing, I need to remember that just having that slider isn't enough, but that the space on either side of the knob needs programmatic access to the space on the other side in order for the full potential power of the slider to be realized.

Your foot will thank you for

Your foot will thank you for not using char * buffers and using std::vector instead. Failing that std::auto_ptr, or boost::shared_ptr.

I was talking about Java 1.4.

I was talking about Java 1.4. I haven't tried Java 1.5, and using Java 1.5 is a non-argument, because our projects date back to 1998. And Java generics are much less powerful than C++ templates, especially because one cannot use primitives in collections.

Here is a real-life example: an OpenGL display built with Java uses ids (of type 'int') for display objects. Whereas in C++ these ids are kept in a simple map (with no boxing overhead on insertion, removal, or search), in Java 1.5 an Integer object is created every time there is a display action, in order to create the key to search the map for which Java object was clicked. In a display with real-time video (real-time plots fed from the actual radar), we had to use a two-processor system to achieve a good frame rate with Java. The previous version, made with C++, needed only one CPU.

There's a fast way to map int

There's a fast way to map integers to objects, and it's called an array. You may have heard of this data structure.

Anonymous attitude

it's called an array. You may have heard of this datastructure.

I have this theory that postings seem more collegial and less snarky when posted with less anonymity. (Not sure what causes this perception; must be some deep human factor. ;-) )

Most of us here post with our real, full names as a sign that we are prepared to stand behind our opinions.

Might I recommend this practice to you, too, Marcin?

I'm happy to go on the record

I'm happy to go on the record as Marcin Tustin. I'm still going to call someone who comes up with unsubstantiated rubbish, though.

Surely the question is why you're uncomfortable with me challenging nonsense?

It's not an argument....

...it's a contradiction. (via Monty Python sketch).

Guess we could get into an argument about Turing completeness meaning that anything from one language can be expressed in any other Turing complete language. But then we'd be back into the subjective argument about which language makes it "easier".

Personally, I find language advocacy to be a non-productive avenue of exploration. Original poster thinks he's most productive in C++. You think you're most productive in Java. To each his own. What is rubbish is the pretense that either C++ or Java must be defended. If the language does what you want and you find yourself productive, then that's all there is to say.

Actually, I find it hard to s

Actually, I find it hard to say which language I think is worse - probably Java. However, I don't like to see people lay down claims that are factually inaccurate, or simply incapable of proof while presented as facts.

It's simply disingenuous to say that Java is inferior because a bad way of coding something, for which an alternative which is as easy to use exists, performs badly. I don't like it, and I see no reason to let that kind of thing pass.

The record shows...

I'm still going to call someone who comes up with unsubstantiated rubbish, though.

By all means do. I have observed, however, that doing so in a collegial manner tends to have better effects both on the discourse and on the interlocuters.

Just a suggestion.

Surely the question is why you aren't willing to challenge nonsense?

You're right; I never give anyone a hard time here. I really should be ashamed of myself. ;-)

But seriously, when discussing PL preferences, there is always more than a pinch of "de gustibus non est disputandum".

There is a terrible habit I've observed in myself (and perhaps other programmers as well) to over-generalize one's experience as typical of all programming environments and problems.

When someone says that they find a particular language better suited to "serious problems", they usually mean the serious problems that THEY routinely run into, which may be quite different in their requirements from mine, even nominally in the same field of effort.

So I try not to get too worked up about such things, I'm more interested to see if I can detect any substantive features of PL design that demonstrably contribute to particular suitabilities.

Sometimes such things ARE to be found.

I think you'll find that I've

I think you'll find that I've never said in general (nor, at all, in this topic) that one language is superior to another. Only that certain criticisms are patently unfounded and based on ignorance.

Ignoring your ironic tone, I

Ignoring your ironic tone, I should say that you have not understood the problem at hand at all. Let me state it again: a radar generates plots (lots of plots per second), these plots are mapped onto an interactive OpenGL display, where the user can do various things. There are various labels attached to each plot, and various user actions. Each OpenGL object on the screen is identified by a unique integer id generated automatically; this unique id must be associated with a Java object. An array cannot be used in this situation, simply because the range of ids given by the OpenGL library is not known, and OpenGL objects and their ids are destroyed at some point in time. An array is simply not a good data structure; an associative array is.

Allocate an array of size MAX

Allocate an array of size MAXID+1, where MAXID is the biggest possible id number (I'm guessing that it's MAX_INT). This will work just fine, will be faster than any hashtable, but will take more memory (unless, of course, you normally have very many handles open at one time). Otherwise, it will have the same functionality as a hashtable, for example it will still hold references to objects after they have become invalid.

Haha, this was funny.

Haha, this was funny.

Pray tell why?

Pray tell why?

Feeding trolls

It won't work on architectures where sizeof(int) == sizeof(size_t):

$ cat foo.c
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(int argc, char ** argv) {
   int * arr;

   printf("Trying to allocate %d * %lu = %lu bytes\n",
          INT_MAX, (unsigned long)sizeof(int),
          (unsigned long)(INT_MAX * sizeof(int)));
   arr = malloc(INT_MAX * sizeof(int));
   if (arr == NULL) {
      puts("NULL");
   }
   return 0;
}
$ gcc -o foo foo.c
$ ./foo
Trying to allocate 2147483647 * 4 = 4294967292 bytes
NULL
$

I don't think Java will handle allocating an array twice the size of virtual memory any better.
And even on architectures where it works (like AMD64), allocating 8GB to hold a few thousand handles seems excessive.

Actually, I failed to take ac

Actually, I failed to take account of the fact that this would require 16GB (assuming that both Java references and ints are 32 bits).

It is true that it is rather annoying to have to write your own hashtable implementation for integers, no matter how simple it is.

It's a little more difficult than it looks

Given the difficulties of making a fast, thread-safe and memory efficient hashtable for production usage I would just use one of the many libraries that have already implemented this. For instance: http://trove4j.sourceforge.net/overview.shtml

The fact you did not understa

The fact that you did not understand what I was saying does not mean I was wrong. 'Intelligent things' are all the template/operator-overloading tricks one can do, which Java can't do. For example, boost::spirit or boost::functional cannot be done with Java.

Signals and slots are not a language feature, but the language allows for a good implementation. Java, on the other hand, can't do signals and slots, because it lacks templates and function/member pointers. The observer/observable pattern leads to the ugly solution of using a flag to notify the observer about the kind of update that happened in the observable, which leads to spaghetti 'if-then'/'switch' blocks.

C# has events built right into the language, but as I said earlier, for the kind of job I do, C# cannot be used.

Please explain the difference

Please explain the difference between a function pointer and a pointer to an object with a known interface. (Hint: Lexical closure)

A function pointer (in common

A function pointer (in common terminology) lacks the state to be an object.

Correct; however, there is no

Correct; however, there is no reason why an object has to have state.

Misconception

smart and efficient data structures (no need to copy stuff over and over, as in functional languages), etc etc...

You might be interested in this posting (from this thread).

In any case, the idea that you have to copy excessively in FPLs is a gross misconception.

Basically, the poster of the

Basically, the poster of the posting you posted is cheating a little:

The poor performance of the STL compared to OCaml's library is, in this case, directly due to the imperative design of the STL and the functional design of OCaml's set implementation and not to do with any given implementation. Functional programming is simply vastly more efficient in this case.

Cheat number 1: two different algorithms are compared, not two different programming languages. The non-copying style of O'caml can easily be done with C++. But that implies the data are static, i.e. they don't change.

Specifically, due to its imperative design, the STL's set_union function must explicitly copy all elements in both sets as the originals are mutable and, therefore, may be changed. In contrast, referential transparency allows OCaml's Set.union function to refer back to branches of balanced binary tree in the input sets whenever they can be reused.

But if you add elements to those data, they will be copied. Whereas in C++, they won't. Let's say you have a huge data set of 10000 records, and you want to sort it. The functional way will copy the 10000-record array again and again, whereas in C++ the sorting happens in place. I suspect the functional algorithm will run out of memory.

I see comparisons of apples and oranges here. It's not fair.

He is comparing paradigms

Cheat number 1: two different algorithms are compared, not two different programming languages.

He does not primarily compare languages, rather trade-offs of paradigms - which is a useful comparison to a certain extent.

You claimed that one needed to "copy stuff over and over" in FPLs and silently implied that this was a grossly inefficient approach. The article shows by anecdote that (1) this statement is not true, and (2) that overuse of stateful data structures for alleged efficiency can actually backfire big time.

The non-copying style of O'caml can easily be done with C++. But that implies the data are static, i.e. they don't change.

Sure it can be done - though not that easily, because it is much harder to achieve properly without garbage collection.

But the point is, it is not done in C++, because of the widespread naive belief in the superior efficiency of stateful data structures.

But that implies the data are static, i.e. they don't change.

Not sure what you mean.
But if you add elements to those data, your data will be copied. Whereas in C++, they won't. Let's say you have a huge data set of 10000 records, and you want to sort it. The functional way will copy the 10000 record array again and again, whereas in C++ the sorting happens in place.

No, the "functional way" won't copy everything over and over again, simply because it wouldn't use arrays. ;-) For most tasks, including sorting, there are efficient algorithms that do not rely on in-place update.

Besides, all functional languages provide means to implement imperative algorithms efficiently, e.g. by providing imperative arrays. Some of them even manage to do this without losing the nice properties of being "pure".

Informal summary

A tree is not an array ;0)

Wrong summary

That should have been: an array is not a tree. Ah, well ;-)

Neither is a list

but all three can be used in terms of each other. More specifically, an array is an indexed list. And a list or an array can easily be represented as a tree. From an abstract data type standpoint, any of these forms can be used to store the data, using accessors to hide the internal representation.

Well...

I agree! . . . . . Que?

Stating the logical conclusion....

...for the emoticon impaired. :-)

From This...

...I can only conclude that O'Caml needs better marketing, especially if you believe that Qt is the best piece of software ever written. :-) And aren't you conveniently ignoring that Qt requires the use of a preprocessor to overcome C++ deficiencies?

I've written quite a bit about O'Caml—recently, even—so I won't rehash it here. Suffice it to say that if you don't think it has good GUI support (albeit for GTK rather than Qt), "non-copying" data structures, "value classes," OpenGL support, etc. then you really should look more closely.

As for syntax, all syntax looks weird at first, and I definitely can't help you if you think O'Caml's syntax is even weirder than Perl's!

The only point that I think we agree on is that it would be nice to see G'Caml again, but truthfully I don't even find myself thinking about that most of the time when I'm writing in O'Caml. On the other hand, I don't find myself thinking about it much when I'm writing in C++, either, so maybe the take-away from that is that I don't do much scientific/numerical computing and therefore don't suffer much from not overloading operators.

But all of this misses the main point, which in my case is that O'Caml continues to let me write correct code vastly faster than C++ does, and generally speaking the performance of that correct code is within epsilon of the performance of the C++ code. Even when I have to tune the O'Caml code, it's vastly easier to do that than it is to make an arbitrary body of broken C++ code correct.

It pains me, as basically a free-market absolutist, to have to say this, but given that I keep hearing the same essential mythology about alternatives to C++ in general, and O'Caml in particular, I have to wonder whether the solution to the imbalance won't have to consist of someone starting a software company that simply disallows the use of C++ in their products. I would actually rather take a group of experienced programmers, in any language, and teach them O'Caml for, say, two months after hiring them, having them create no work product during that time, than have them hit the ground running in C++. Unless I've been fortunate enough to hire a company of Meyers, Alexandrescus, Sutters, and Parents, I pretty much guarantee you the team will still be more productive than a C++ team would.

For all practical purposes, Q

For all practical purposes, Qt is the best library there is. Qt needs a pre-processor, but only because Trolltech wanted to easily convert between text (.ui files) and code; otherwise, they would have used templates.

I would be more than happy if you post some links with good O'caml libraries in the sections I referred above.

Fair Enough...

Achilleas Margaritis: For all practical purposes, Qt is the best library there is.

I still maintain that "all practical purposes" is far too broad, but if Qt is working for you, that's great.

Achilleas: Qt needs a pre-processor, but only because Trolltech wanted to easily convert between text (.ui files) and code; otherwise, they would have used templates.

Assuming templates existed when they started. I dunno. I've looked at Qt many times over the years and never came away impressed. Obviously, your mileage varies. :-)

Achilleas: I would be more than happy if you post some links with good O'caml libraries in the sections I referred above.

Happy to oblige:

  1. signals and slots, for doing model-view-controller apps.
    On the (un)reality of virtual types
  2. networking.
    If you mean basic IETF protocol stuff, I'd suggest ocamlnet. If you're referring to process group communication in distributed systems, let me highly recommend Ensemble. If you're talking about CORBA, try CamlORBit.
  3. memory mapped data structures.
    Please see the O'Caml standard library's Bigarray module. Look for the "map_file" function. That, coupled with O'Caml's standard marshaling functions, might do. You might also find this post helpful.
  4. multitasking.
    I refer you to O'Caml's standard "Thread," "Condition," "Event," "Mutex," and "ThreadUnix" modules.
  5. a rich GUI.
    LablGTK is quite nice, and also integrates nicely with LablGL.
  6. value classes.
    This is addressed quite nicely with O'Caml's module language. The example they give is using floats to implement currencies, but to have specific currencies, e.g. dollars and euros, be incompatible types to avoid errors.
  7. operator overloading.
    We saw these, and more, in G'Caml, but so far it hasn't gotten past the prototype stage. I'm still frankly not convinced that operator overloading is a big win.
  8. XML and databases.
    Boy, where to begin? OCaml-MySQL, OCaml-Sqlite, OCamlBDB, or just go straight to OCamlODBC and talk to any database with an ODBC driver? For XML, you can start with the expat wrapper or go whole-hog to PXP. If that's still not enough, how about CDuce or, if a new O'Caml-derived language seems like cheating, just OCaml + CDuce, which just embeds some critical CDuce features into the O'Caml 3.08.2 compiler?
  9. clever and efficient data structures and algorithms support.
    Once again, where to begin? Baire perhaps, or ExtLib, or Okasaki's Purely-Functional Data Structures? Maybe you just want a nice Trie implementation? Or perhaps you need some nice graph theory support?

Shouldn't it tell you something that just by following links from the O'Caml site over lunchtime that I was able to address all of your points, and that only the operator overloading response seems the least bit lacking? Doesn't it bother you at all that the multitasking and memory-mapped data structures questions are answered within the context of O'Caml's standard libraries, and that the "rich GUI" question is one link away and also refers to OpenGL?

It's stuff like this that persuades me that many (most?) people who claim to have "investigated alternatives" and "found them lacking" are lying.

Thanks a lot for the links. I

Thanks a lot for the links. I'll try to check them out all.

Assuming templates existed when they started. I dunno. I've looked at Qt many times over the years and never came away impressed. Obviously, your mileage varies. :-)

What was the part that you had a problem with?

Conclusions

Here are my thoughts on the material from the links you have given me. Please allow for a margin of error, since I am not familiar with O'Caml and I need to learn the language in parallel with the examples.

signals and slots, for doing model-view-controller apps. On the (un)reality of virtual types

The link you have given me presents a typical observer-observable pattern as found in Java. Here is what I think about it:

1) it's not signals and slots, because the interface between observers and observables is fixed. I want to have a parameterized interface, like in C# events or Qt signals and slots.

2) it's not something used throughout all other libraries.

3) Documentation is not formal. Is it a library or a research paper?

networking. If you mean basic IETF protocol stuff, I'd suggest ocamlnet. If you're referring to process group communication in distributed systems, let me highly recommend Ensemble. If you're talking about CORBA, try CamlORBit.

Ocamlnet is a fine library, but it offers a subset of the capabilities of the Java SDK/Qt. CGI is not relevant anymore, since more advanced technologies exist. The documentation quality differs wildly from part to part: some parts are documented with ocamldoc, others aren't.

Ensemble seems a fine library if one wants to build a communication mechanism with different protocols, but I failed to see how it is useful in practical applications: how do I do basic stuff like opening a TCP/IP port, waiting for connections, connecting, error reporting, etc.? I have to say I did not understand much of Ensemble, but I suspect this is because it is an entirely different and irrelevant domain from my line of work. Its documentation is in PDF format.

CamlORBit seems like a good wrapper around CORBA's ORBit, but the documentation is again in a different format.

memory mapped data structures. Please see the O'Caml standard library's Bigarray module. Look for the "map_file" function. That, coupled with O'Caml's standard marshaling functions, might do. You might also find this post helpful.

I don't see anything related to memory-mapped data structures. I see information on memory-mapped files though, from a guy that wants his app to have full state persistence. What I talked about is encoding of binary messages as data structures. Ada is the best language for this, since it was developed from the start with that mentality, and C++ comes a close second.

LablGTK is quite nice, and also integrates nicely with LablGL.

LablGTK is a very nice wrapping around GTK, but the problem is GTK itself: Swing and Qt are much better toolkits. GTK does not follow the model-view-controller paradigm (from what I know so far), and the LablGTK documentation is unfinished (and yet in another HTML style).

value classes. This is addressed quite nicely with O'Caml's module language. The example they give is using floats to implement currencies, but to have specific currencies, e.g. dollars and euros, be incompatible types to avoid errors.

Nice example, but how are ML modules any different than C++ classes? I can do in C++ the exact same thing.

operator overloading. We saw these, and more, in G'Caml, but so far it hasn't gotten past the prototype stage. I'm still frankly not convinced that operator overloading is a big win.

Consider the following C++ code:

Vector r = a + b - c;

And the following Java code:

Vector r = new Vector();
r.add(a, b);
r.subtract(c);

Not only does it take much longer to write, but it is not as elegant as the C++ version.

Now consider a two-page algorithm like a fighter trajectory computation, and you will see why operator overloading is really important.

Operator overloading goes hand-in-hand with value classes: why shouldn't the Currency type (in the context of the example above) be used with operators? It's a number anyway.

XML and databases. Boy, where to begin? OCaml-MySQL, OCaml-Sqlite, OCamlBDB, or just go straight to OCamlODBC and talk to any database with an ODBC driver? For XML, you can start with the expat wrapper or go whole-hog to PXP. If that's still not enough, how about CDuce or, if a new O'Caml-derived language seems like cheating, just OCaml + CDuce, which just embeds some critical CDuce features into the O'Caml 3.08.2 compiler?

OCaml-MySQL / OCaml-ODBC: same API as C, even down to the lowercase keywords, and yet another documentation style.

PXP: almost the same API as Qt: document class, node class, etc.

CDuce: extension language. As you said, it does not count.

clever and efficient data structures and algorithms support. Once again, where to begin? Baire perhaps, or ExtLib, or Okasaki's Purely-Functional Data Structures? Maybe you just want a nice Trie implementation? Or perhaps you need some nice graph theory support?

Baire: in the author's own words, "Baire is a data structures library for the Caml language. Please note that Baire is an ongoing work, even if already quite usable." Not good enough for big commercial projects. Documentation is in French, where it exists at all.

ExtList: exactly the same stuff as Qt or the JDK.

Shouldn't it tell you something that, just by following links from the O'Caml site over lunchtime, I was able to address all of your points?

Unfortunately, you did not address any of my points. There are several drawbacks in all the above links that still prevent me from going to my boss and suggesting O'Caml.

Doesn't it bother you at all that the multitasking and memory-mapped data structures questions are answered within the context of O'Caml's standard libraries, and that the "rich GUI" question is one link away and also refers to OpenGL?

What standard libraries? All I saw is a bunch of individuals, each doing his own thing, often overlapping with other people's work (in data structures, for example). If you want to call these "O'Caml standard libraries", then perhaps I should call the entire SourceForge catalog the C++ standard libraries, since many of the O'Caml libraries are actually on SourceForge. I will not do that, though: C++ has only one standard library, the STL, and Java has the JDK.

It's stuff like this that persuades me that many (most?) people who claim to have "investigated alternatives" and "found them lacking" are lying.

Ok, it's reality time. Since you accuse me of lying (rather politely), let me tell you that you have no knowledge of the requirements of commercial projects like defense applications or hospital information systems. These projects are never gonna be done in development environments like the one you suggest. The O'Caml language is fine, but a development environment is not only the language: it's the libraries, the support, the documentation, the availability of information. So here is reality, for you:

On one side, we have mature environments like C++ and Java, with a small number of libraries developed by corporations that stake their existence on them (Trolltech, Sun), covering most of our needs, with consistent architecture, consistent documentation, and huge support.

On the other, we have (mature?) environments like O'Caml, with lots of little libraries spread over the web with no immediate visibility, with support left in the hands of individuals who code away in their free time, with architectural designs that are mostly incompatible with one another, and with inconsistently presented documentation.

Furthermore, O'Caml libraries are not vastly different from C++ or Java libs. I would say they are 90% similar.

Conclusion: which one would I choose to base my defense contract on? Or my hospital information system? Or my web store? Certainly not O'Caml as you have presented it. It's too much of a risk for business environments.

Well, another conclusion

Achilleas, I believe the above pointers were only meant to show you that there are ways of doing the things you wanted in OCaml as in C++, not necessarily that they are better. This was an answer to your


I have searched for example desktop apps in LISP/ML/Haskell/OCaml that support the above, but I have found nothing. All I find is broken links, 100 ways to code the factorial function, insane amounts of work on how continuations are the best thing since sliced bread, etc etc.

And yes, the absence of Qt bindings for OCaml is a shame -- I personally prefer Gtk to Swing, though. YMMV.

The standard library is documented on OCaml's website. You'll find low-level networking, concurrency, marshalling/serialization, data structures...

As you write yourself, you are obviously not familiar with OCaml, which makes this whole discussion pointless. Obviously, for any task, you will use the tools you know best and trust. For most tasks, you will use C++, the STL and Qt. For most tasks, I would rather use OCaml, its standard library and Gtk. Note that I have experience with C++ and STL (and only toy experience with Qt).

P.S. :

I still do not understand what you mean by "memory-mapped data structures". Unless perhaps you want to do some low-level DMA transfers or something similar, I believe this is a standard case of marshalling.
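For reference, a minimal sketch of what "a standard case of marshalling" means in O'Caml, using the built-in Marshal module (the header record here is just an example of mine, not from the thread):

```ocaml
(* Serialize a value to a string and read it back with the built-in
   Marshal module. Note: Marshal.from_string is not type-safe; the
   annotation below is trusted by the compiler, not checked. *)
type header = { id : int; length : int }

let encoded = Marshal.to_string { id = 1; length = 8 } []
let decoded : header = Marshal.from_string encoded 0
```

This covers persistence and inter-process exchange of O'Caml values, though not the fixed wire formats that Achilleas seems to have in mind.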

Thanks!

As David said, I was (as always) reacting to overly-strong claims of "I couldn't find that for O'Caml," and my reaction can be summed up as "then you didn't try very hard." Elsewhere I've already acknowledged the marketing difficulty related to languages other than C/C++ in certain domains. With respect to the standard libraries, my point was that the threading support and memory-mapped file support is included in O'Caml's standard libraries, so asking about them suggests that you (Achilleas, not David) couldn't even be bothered to look at the standard libraries. etc.

People do use O'Caml in commercial production environments. Obviously, this isn't happening on a scale that looks impressive from a pure numbers perspective—rather akin to Apple selling computers and operating systems compared to Microsoft selling operating systems—but it's plain ignorant to suggest that O'Caml isn't suited for production work.

Achilleas: Ok, it's reality time. Since you accuse me of lying (rather politely), let me tell you that you have no knowledge of the requirements of commercial projects like defense applications or hospital information systems. These projects are never gonna be done in development environments like the one you suggest.

I'm glad that you feel I was polite. In fairness, I now see that your concerns aren't actually about available features or libraries, but rather uncertainty about or disagreement with their relative level of maturity. I would argue that those concerns are existential: if I find a third-party C++ library, I have just as much effort to vet it as I do an O'Caml library. I'm just as (more, actually) concerned about quality of implementation, documentation, throughput, scalability...

Now let me tell you that if you think I've never done commercial projects like defense applications or hospital information systems, you're wrong, and if you think those systems have always been done in C/C++ or even Java, you're wrong again. Let me also tell you that if I ever find out I've been cared for in a hospital that uses a C/C++ system, I'll sue for criminal negligence. There's no excuse for using such radically unsafe technology in an environment in which people's health or even lives could be at stake.

[OT] Hospitals

I'm curious: aren't hospitals running computers and embedded CPUs on which OSes are written in C ?

Note that this is not a troll. Rather a question related to hardware/OS trust (which might be better suited for a forum topic).

Trusted Computing Base

David Teller: I'm curious: aren't hospitals running computers and embedded CPUs on which OSes are written in C ?

Note that this is not a troll. Rather a question related to hardware/OS trust (which might be better suited for a forum topic).

Right on both counts, IMHO: I'm stipulating that the OS is part of the Trusted Computing Base, and I'll refrain from commenting on why that might or might not be wise on a per-OS basis. :-)

Not just the OS

Software for things like MRI machines has certainly been written in C/C++. Of course, a malfunctioning MRI probably won't kill you, at least not directly.

In fact, I suspect your list of potential targets for lawsuits on this basis is longer than you could pursue in a lifetime. Perhaps you've finally found the way to break the mainstream language stranglehold. You could turn C/C++ into every lawyer's favorite language. "I don't know what C++ is, I just know I get rich suing anyone who uses it!" However, I'd watch out for those silently overflowing integers in OCaml — you could be next!

I See Your Point and Raise You an Overflow!

Anton van Straaten: Perhaps you've finally found the way to break the mainstream language stranglehold. You could turn C/C++ into every lawyer's favorite language. "I don't know what C++ is, I just know I get rich suing anyone who uses it!"

That was exactly the 75% tongue-in-cheek point, very much inspired by the anti-Supers lawsuits in "The Incredibles." :-)

Anton: However, I'd watch out for those silently overflowing integers in OCaml — you could be next!

Indeed, so I'd use this Overflow module and OcamlExc to check my code for uncaught exceptions. :-)

I'll call

That was exactly the 75% tongue-in-cheek point, very much inspired by the anti-Supers lawsuits in "The Incredibles." :-)
I'm apparently behind on the relevant "literature"! I'll have to go find a 5 year old mentor to help get me up to speed.
Indeed, so I'd use this Overflow module and OcamlExc to check my code for uncaught exceptions. :-)

Hmm. I must confess this gives me deja vu. The language has fast, unchecked integers because people demand performance (as evidenced by the other thread about C-class performance). The performance is one factor which drives its acceptance for some real applications. But then we need to start retrofitting solutions to the problems which were created by the pursuit of performance. Sound familiar?

This is not particularly a critique of OCaml itself, which is obviously fulfilling many people's wants. (Note I didn't say "needs".) Other statically typed functional languages also have usability problems when it comes to mixing numeric data types, for example. I think I'm going to remain holed up in my bunker with Scheme's seamless, non-overflowing, dynamically typed numeric tower for a while longer, at least until I can get one of those famous sliders to smoosh all the numeric types into a single one.

Towering Numbers

Anton van Straaten: I'm apparently behind on the relevant "literature"! I'll have to go find a 5 year old mentor to help get me up to speed.

Oh, "The Incredibles" is far from appropriate for 5-year-olds. It is deservedly Pixar's first PG-rated film. I personally would agree with 13-and-over.

Anton: Hmm. I must confess this gives me deja vu. The language has fast, unchecked integers because people demand performance (as evidenced by the other thread about C-class performance). The performance is one factor which drives its acceptance for some real applications. But then we need to start retrofitting solutions to the problems which were created by the pursuit of performance. Sound familiar?

Certainly. My once again somewhat tongue-in-cheek point was only that if in a certain domain you can't afford silent overflow, then you can make it "unsilent." Of course, you can also use the Big_Int or Num modules for arbitrary-precision integral or rational arithmetic. So there are actually a range of choices here, which can be made on a per-project basis according to requirements and what's learned from the appropriate combination of experience and profiling.
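As a concrete sketch of making native-int overflow "unsilent" -- this is a hand-rolled check of mine, not the linked Overflow module:

```ocaml
exception Overflow

(* Checked addition on O'Caml's native int: if both operands have the
   same sign but the sum's sign differs, the addition wrapped around. *)
let add_checked a b =
  let s = a + b in
  if (a >= 0) = (b >= 0) && (s >= 0) <> (a >= 0) then raise Overflow
  else s

let ok = add_checked 1_000_000 2_000_000

let overflowed =
  try ignore (add_checked max_int 1); false
  with Overflow -> true
```

The per-operation cost of the check is what you trade away when you reach for Big_int or Num instead, which make overflow impossible rather than merely detectable.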

Anton: This is not particularly a critique of OCaml itself, which is obviously fulfilling many people's wants. (Note I didn't say "needs".) Other statically typed functional languages also have usability problems when it comes to mixing numeric data types, for example. I think I'm going to remain holed up in my bunker with Scheme's seamless, non-overflowing, dynamically typed numeric tower for a while longer, at least until I can get one of those famous sliders to smoosh all the numeric types into a single one.

You're not going to hear me argue that the Scheme numeric tower isn't a thing of wonder from a conceptual point of view, and if you care strictly about correctness over performance it is, AFAIK, the only game in town. All I was saying was that it isn't necessarily the case that O'Caml's overflows on its int types were silent, and therefore I could balance safety and performance according to my needs. Having said that, I'd be very interested in seeing what a good optimizing Scheme compiler like Stalin did with some heavy numerical code.

Reflection and language redesign

The language has fast, unchecked integers because people demand performance (as evidenced by the other thread about C-class performance). The performance is one factor which drives its acceptance for some real applications. But then we need to start retrofitting solutions to the problems which were created by the pursuit of performance. Sound familiar?

This kind of retrofitting wouldn't be a problem in a language with adequate support for reflection-oriented programming. Does camlp4 actually provide this support, I wonder?

would argue that those conce

would argue that those concerns are existential: if I find a third-party C++ library, I have just as much effort to vet it

That's why I said "C++ + Qt" and "Java + JDK": they minimize the need for 3rd-party libraries.

if you think those systems have always been done in C/C++ or even Java, you're wrong again

I never said they weren't. But why does C++ have to be slammed so hard, to the point of ironically asking "why do people use C++?" They use it because it is one of the best languages, that's why. If O'Caml is not used, it is because it's no better than C++, plain and simple.

There's no excuse for using such radically unsafe technology in an environment in which people's health or even lives could be at stake

The 'unsafeness' is mostly in your mind. C++ is as safe as you care to make it. Put some effort into it, and it becomes totally safe. Remember that no algorithm can be proven 100% correct, so why should OCaml be safer? Just because it does not have pointers? A pointer is a kind of value! Wrap it up in a safe pointer class and you are done!

Let's also not forget that operating systems used in critical environments, like Solaris or BSD or Linux, are programmed in... C. There are unsafe programmers, not unsafe programming languages.

P.S. : I still do not unde

P.S. :

I still do not understand what you mean by "memory-mapped data structures". Unless perhaps you want to do some low-level DMA transfers or something similar, I believe this is a standard case of marshalling.

I think he means something like this (be warned: I haven't written Ada in a while, and hardware manipulation in Ada even less so)...

type mouse_data is record
   x : integer range 0 .. 128; -- mouse for a very small screen ;)
   y : integer range 0 .. 128;
   button_status : status;
end record;

for mouse_data use record
   x at 0 range 0 .. 7;
   y at 1 range 0 .. 7;
   button_status at 2 range 0 .. 15;
end record;

for mouse_data'size use System.storage_unit * 4; -- constrain to 4 storage units (usually bytes).

for mouse_data'alignment use 1; -- byte alignment (packed).
...

main_mouse : mouse_data;
for main_mouse'address use some_intrinsic_sys_routine(16#000000ff#);

procedure initialise_driver is
begin
   main_mouse.x := 0;
   main_mouse.y := 0;
   -- ...
end initialise_driver;

This uses two aspects of Ada's hardware support. First, it uses a representation clause to lay out the structure we expect (for blah use record ...), and second, it maps a record to an address (for blah'address use ...). Ada also allows you to map this to a pointer (access type) and manipulate it safely (and back).

Ada's hardware support is about giving you the ability to do hardware-level manipulation with some level of safety. With the right code you can map records (structures) to physical memory, and you still get static checks on the structures you manipulate (you can fly without the checks if you really, really want to, but I ain't getting in that plane ;) ).



I stopped using Ada a while ago, but am thinking of going back to it for a toy project. It will be interesting to see what's happening with GNAT in GCC 4.0 and what's going on with Ada 2006.



Chris

p.s. the best reference on Ada's low-level facilities is Cohen's "Ada as a Second Language", but I couldn't find anything online to give you a flavour. Perhaps Ehud knows of a better example?

p.p.s. the compiler doesn't have to honour representation clauses to be an Ada compiler, but something like GNAT will unless it's just plain odd.

Thanks

Okay, thanks. That's the kind of thing I hadn't seen used since... well, since I last programmed hardware "drivers" under MS-DOS in Turbo Pascal.

mmap

I thought the original poster was referring to making use of mmap() etc. from C. This provides the ability to efficiently map data living on disk (for example) into your address space. From what I gather, it's used a lot for low-level database routines. I've never used it myself (directly), and may be wildly off the mark here.
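In O'Caml terms, a minimal sketch of the same idea (assuming a reasonably recent release with Unix.map_file; older ones exposed Bigarray.Array1.map_file instead):

```ocaml
(* Map a file into the address space as a Bigarray and write it in
   place. With [shared = true], the mapping is backed by the file, so
   the write below is visible through ordinary file I/O as well. *)
let path = Filename.temp_file "mmap_demo" ".bin"

let () =
  let fd = Unix.openfile path [ Unix.O_RDWR ] 0o600 in
  (* Request 16 bytes; the file is grown to that size as needed. *)
  let map =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.char Bigarray.c_layout true [| 16 |])
  in
  Bigarray.Array1.set map 0 'X';
  Unix.close fd

(* Read the first byte back through the normal file interface. *)
let first_byte =
  let ic = open_in_bin path in
  let c = input_char ic in
  close_in ic;
  c
```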

Achilleas, I believe that the

Achilleas, I believe that the above pointers only meant to show you that there are ways of doing the things you wanted in OCaml as in C++, not necessarily that they are better.

But in order for me, my company, and my friends to jump on the O'Caml bandwagon, there must be something that O'Caml does so much better than C++ that it justifies the jump. I just don't see it (with the information at hand so far).

As you write yourself, you are obviously not familiar with OCaml, which makes this whole discussion pointless.

But I am familiar with many programming languages; I know ML, and I have figured out how OCaml works (despite quirky things like using '#' for method invocation). If I, with good experience in various programming languages, can't find anything vastly better in OCaml, how are typical business programmers (who only know Java or VB, for example) supposed to do so?

I still do not understand what you mean by "memory-mapped data structures"

"Memory-mapped" data structures are data structures that map directly to memory layout. For example, the struct:

struct MessageHeader {
    unsigned short id;
    unsigned short length;
};

is a memory-mapped data structure. I can use the above in C/C++ to read an arriving message header without the overhead of instantiating other objects. For example:

int main()
{
    SOCKET s = /* ... create and connect the socket ... */;
    MessageHeader header;
    for (;;)
    {
        recv(s, reinterpret_cast<char*>(&header), sizeof(header), 0);
        switch (header.id)
        {
            // ... handle message ...
        }
    }
}

The above is very important for real-time applications: crucial for hard real-time, and still quite important for soft real-time. If it were done in Java, I would have to instantiate a class, then read from a buffer, then assign the data from the buffer to the instance, etc. Too much overhead, especially when there are a lot of messages going in and out.
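For comparison, the usual O'Caml approach is to recv into one reusable buffer and decode the fields in place, which also avoids per-message allocation. A sketch (the decoding helper is mine, and the buffer contents are faked here in place of a real recv):

```ocaml
(* Decode a little-endian unsigned 16-bit field directly from a byte
   buffer, without building any intermediate message object. *)
let get_u16_le buf off =
  Char.code (Bytes.get buf off)
  lor (Char.code (Bytes.get buf (off + 1)) lsl 8)

(* One buffer, reused for every incoming message header. *)
let buf = Bytes.create 4

let () =
  (* Pretend recv just filled [buf]: id = 0x0102, length = 8. *)
  Bytes.set buf 0 '\x02'; Bytes.set buf 1 '\x01';
  Bytes.set buf 2 '\x08'; Bytes.set buf 3 '\x00'

let id = get_u16_le buf 0
let length = get_u16_le buf 2
```

This is not quite the C overlay trick (the fields are decoded, not aliased), but the steady-state allocation per message is zero, which is what matters for the soft real-time case described above.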

glad you found your answer.

And it only took one day of cursory reading. Now that that's settled, perhaps we can get back to discoursing on a level that doesn't amount to "I've spent 10 years writing C++; show me in one day how I can do the same thing in another programming language".

Personally, I'd have more confidence in the software for Hospitals and Defense systems if they were written in Ada, but that's probably just me. As a business entrepreneur of sorts, I understand the concept of risk. I also understand the concept that doing the same thing as everyone else involves less risk, but it also means that I have less to differentiate my offering (less reward as well).

Math without overloading doesn't need to be that ugly

Now consider a two-page algorithm like a fighter trajectory computation, and you
will see why operator overloading is really important.

I'm not going to argue that operator overloading isn't nice sometimes, but your example isn't a good demonstration of why. Without overloading, I'd write this:

Vector r = a + b - c;

as this:

Vector r = vsub(vadd(a,b), c);

which I don't see as that much less readable.

Re: Math without overload doesn't need to be that ugly

which I don't see as that much less readable

Here's a quote from Guy Steele along similar lines:

(/ (* (- 2 3)
      (+ 4 1))
   (+ 11 3))

[...]

This two-dimensional effect is difficult to achieve with infix.

Yes, infix may win for small expressions. Years of experience have convinced me that prefix wins for large expressions.

large expressions?

Well, I think there's still a reason that mathematicians don't use prefix.

In any case, how large should expressions get? Not that large, IMHO.

Re: large expressions?

I think there's still a reason that mathematicians don't use prefix.

While it's true that they don't use it to the exclusion of everything else, it must be stressed that they do use it extensively — as in f(u(x,y),v(x,z)).

There might be a reason for the use of infix operators, but I'm not sure what that reason is. Care to enlighten? I'm genuinely curious.

Communication device

Programming is a matter of symbolic manipulation, and infix is a fairly normal technique in mathematical notation. It helps some people (but not all) read the code more easily.

Mathematicians and their notation

Prepare for a lot of sentences beginning with "And":

Mathematicians are funny with their notation. Sure, you've got infix notation. And of course prefix notation is common, for things like functions named 'f'. But mathematicians have all other sorts of whacked out notation that they use when they don't feel like using prefix notation. There's postfix notation (factorial signs), topfix notation (putting a bar over top for the conjugate), vertical infix notation ("three over five"), top-and-left-fix notation (square roots), top-and-left infix notation (nth roots), outfix notation (absolute value), and let's not even get into the notation used for representing functions that turn functions into other functions (integral signs, derivatives).

And that's not the end of it. There are all sorts of ways to imply functions simply by placing symbols in different orientations with respect to one another. Putting two variable letters together implies multiplication, as does putting a number and letter together (with the number on the left side), as does putting anything next to a parenthesized expression. (Unless it's a function.) And let's not forget that while one form of function notation requires parentheses like f(n) and is spoken "f of n," another form of function notation simply places the argument as subscript, and we end up saying "a sub n." But sometimes it's actually not a function; it's just part of the variable name. And then of course, if the right-hand symbol is superscripted, you just get exponentiation. And since exponentiation is not commutative, you get interesting arguments about how to interpret x^y^z. And if the exponent is placed in parentheses, it suddenly means the nth derivative. And if the exponent actually is a 'prime', well, that's the first derivative. Then there's combination notation, putting numbers above one another like a fraction, except with nothing in between, and two large parentheses on both sides.

Generally speaking, these forms of notation are used because they've been found to be more convenient. Considering all the interesting forms of notation my text editor can't do, why would I ever want to give up infix? It's all I have left.

nitpick

"documentation is unfinished (and yet in another HTML style)"

What's wrong with documentation in HTML? Quite frankly, no PDF matches the quirkiness of going through an HTML manual with Lynx. And no, I'm not kidding.

Seems we all ride the hump these days

Might as well join the band and say that I also mostly program O'Caml (and the occasional C, Python, and Haskell) these days. Actually, I guess I would still be programming Haskell for most needs (LoC/functionality) if it hadn't been for its laziness (often nice, but I don't like the occasional performance hit), referential transparency (weave a monad into an otherwise perfectly clear program just because I need a random number or some unique id? You're kidding, right?), and occasional type quirks.

Still, I think I can see why C++ is the industrial language of choice (can't we all): it scales in all directions (from 4KB to 4GB and beyond, from inline assembly to GC-ed functional programming, from almost untyped up to strong static typing), has a strong template system which supports generics as well as compile-time code generation, runs everywhere (portable assembly), hardly needs maintenance (yes, most other languages' code does rot, especially with frequently changing languages and unstable APIs), is very closely bound to the hardware and OS (direct programming against the API), and most important: doesn't restrict the programmer (what is wrong with pointer arithmetic if I need it?). Did I mention performance? Yeah, that too.
I guess the statement is "C++ == absolute freedom, and I value that".

How often do you find lazines

How often do you find laziness a performance hit? I haven't programmed a lot of Haskell (practically none), so I don't know. Do compilers have the ability to optimise IO monad code in the same way that any strict code would be? And is that even a feasible solution for optimising parts of a programme for which strictness is preferred?

Often

Often, but I admit it is domain-specific. My programs occasionally construct graphs which easily have more than half a million nodes. Lazily, insertions are only evaluated after you inspect the graph, and the representation with lazy nodes explodes in these cases. Since most of my programming is referentially transparent, and the eager normal form of a graph is complex but small, I want eager evaluation.

In short: I don't even want to be bothered by thinking over the lazy evaluation of the programs I write - I never need it since I don't write programs with expressions which aren't evaluated.

The other two questions: yes and no ;-)

I have to wonder whether the

I have to wonder whether the solution to the imbalance won't have to consist of someone starting a software company that simply disallows the use of C++ in their products.

Sounds good. Want to hire me :) ?

More seriously, I believe that this is the only way of deciding OCaml's claim of being better than C++ for all/many/a number of developments. That, or perhaps launching a relatively large-scale OCaml-based open-source project.

Any ideas ?

The Language Wars

Paul Snively wrote:
...I have to wonder whether the solution to the imbalance won't have to consist of someone starting a software company that simply disallows the use of C++ in their products.

Really, that's what all of these conflicts over whose language is best come down to: if language x is so much better and more productive, then use it. If it's as good as the claims, then it will net you a commercial advantage. It'd almost be better if other companies didn't use language x ;)

You see a similar situation in the battles over the use of formal methods in software development. Now, I've been guilty of advocacy in that particular war myself, mostly because I hate seeing people doing something one way when I know there's a better way. But, at the end of the day, if the use of formal methods really does allow bugs to be caught sooner it will confer an economic advantage on those companies using them. Some companies are using that advantage (see e.g. Praxis Critical Systems). Others are not. Perhaps there are market niches for both.

Trade offs redux

if language x is so much better and more productive, then use it. If it's as good as the claims, then it will net you a commercial advantage

As with many "evolutionary market forces" arguments, the assumption is that utility is a linear value that can be maximized.

But in practice there are trade offs. For example, for most businesses, abundance of people with experience in a language (ease of hiring) will trump almost all other considerations in the long run, because it is a pain that managers feel directly.

The only way to get a level playing field on which to assess the productivity benefits of X versus Y (language or method) would be to have equally large pools of proficient practitioners of both and try them on a large scale.

(Of course, there are other barriers to a level playing field as well.)

There is no level playing field

As you point out, you're never going to get a level playing field. But that's beside the point. The only reason you would want a level playing field is to be able to objectively assess "productivity". Why would you want to do that? To convince others that they should adopt your pet technique. My point was: why bother with advocacy -- just use what works for you. If a company sees a competitive advantage in using a particular technique, they will use it. Or at least they should. If they're not, they'll get crushed by someone who does (assuming said technique is all it's cracked up to be). The only reason to do advocacy is internally, to convince your employer that your pet technique is a good one. If they don't listen, maybe it's time to start your own company, and crush the competition...

New languages do get adopted (otherwise we'd all still be writing asm or FORTRAN). New techniques do (eventually) make their way into industry practice (witness the current push towards formal verification in the VLSI world).

Can't argue with you here. I

Can't argue with you here.

I think the problem is that we crave the global maximum ("best language"), yet must accept that in real life the best you can do is move towards a local maximum ("best language given available implementations and all other constraints").

P.S

This isn't an exact analogy, since it mixes the fitness landscape with the environmental factors, but I hope I managed to get my point across.

Play ball!

If a company sees a competitive advantage in using a particular technique, they will use it. Or at least they should. If they're not, they'll get crushed by someone who does (assuming said technique is all it's cracked up to be).

This is exactly what I'm arguing does NOT happen, because technical improvements tend to be incremental rather than revolutionary, and even when the effect is quite strong, it is frequently swamped by other more visible criteria, such as community effects.

New languages do get adopted

Yes, but not dramatically unless they can benefit from a founder effect, such as Perl with early CGI, or C with Unix, or a big marketing push from a "big dog" (or a few) in IT e.g. Java or C++.

Whether we like it or not, if we want "better" languages to be adopted in the mainstream, we will need to think like marketers as much as like PL theoreticians. We can't just wait for "better" features to win in the marketplace without a concerted effort.

(Unless of course we are happy with them being our little secret... ;-) )

Revolution through evolution

This is exactly what I'm arguing does NOT happen, because technical improvements tend to be incremental rather than revolutionary, and even when the effect is quite strong, it is frequently swamped by other more visible criteria, such as community effects.

There is a strategy to build revolution on evolution: provide compiler/interpreter implementations for established languages that are built on better languages, and then show that the implementation lets you do some cool thing X that is trivial in the better language but very non-trivial for legacy implementations of the legacy language. Once you have sufficient mindshare, you can then offer your better language as an alternative way of programming on your implementation. This seems to have been more or less what Stackless Python was about, and it has more or less been working.

Scala is trying this

My impression is that this is the path that Martin Odersky and the Scala people try to pursue. I hope they will be successful.

Downside is that this approach usually demands significant compromises in the "better" language that tend to put off those PL folks in search of the "ideal" language...

Phases of language adoption

Stumbled on Phases of Language Adoption (via a JavaLobby post) that figures there are 3 kinds of language adopters: (a) Explorers, (b) Pioneers, and (c) Townsfolk. As with all analogies, it's a gross oversimplification, since the spectrum is more of a continuum.

Wine and Women

Reminds me of a quote about going through a language: you learn to love it, you learn to know it, you learn to regret it.

Haskell Examples

The first issue of The Monad.Reader has a simple memory game written with gtk2hs. A more complicated demo article is scheduled for the third issue.

The #haskell irc channel is a wealth of information and helpful people.

All of the nine points you mention are supported by Haskell. If you want helpful urls for each point, I would be happy to respond in greater detail.

--Shae Erisson - ScannedInAvian.com

Yeap, as I mention above, I w

Yeap, as I mention above, I will be more than happy to find the relevant libraries in any of the good languages. If you know the links, post them, don't hide'em!

By the way, I don't claim that C++ is better than any of the languages LtU prefers. I am into LISP lately, and I have to say LISP is quite clever (to the point of making me want to make a new language which has the hardware-related capabilities of C and the linguistic capabilities of LISP). But the truth is, C++ is one of the most complete environments for what I do (and Java for server-side programming that does not need much intelligent stuff).

The individualization of LtU

I don't claim that C++ is better than any of the languages LtU prefers.

I, for one, don't know which languages this refers to. LtU has many readers and quite a few contributors. Each has his own experiences and tastes. Some love Haskell, some prefer Scheme. Some work with C, others with Java, and yet others with C++. Naturally the list goes on.

The nice thing is that we usually manage to discuss things without starting holy wars about which language is better, by focusing on specific uses of specific language features.

There is clear division betwe

There is a clear division between languages: the imperative and the functional. LtU members seem to prefer the functional ones.

For me, a programming language is a tool. If Java makes me more productive for a certain task, I will use it. If assembly is better, I will use that too. It just happens that I have programmed a lot in C++, so I felt I could answer the original question 'why people still program in C++'. No language holy wars for me, please.

It seems to me that what is needed is a language that offers 'linguistic' abstraction instead of 'executional' abstraction. 'Executional' abstraction means to abstract execution details away, whereas 'linguistic' abstraction means to abstract language details away. If, for example, a programming language had only one type ('bit'), only one operation (assignment) and only two possible values ('0' or '1'), but a powerful way to provide layers upon layers of linguistic abstraction, I would prefer it any day. It seems that the solution to reuse and enhanced productivity lies in a programming language that is able to process itself, instead of only being able to process programs.

If it sounds like the principle behind LISP, it is because it is. But another source of inspiration is C, and specifically the C preprocessor: by extrapolating from an ever-increasing number of clever preprocessor tricks, productivity is enhanced in previously untold ways, both in quality and in quantity. Boost.Preprocessor is one of the best actual examples amongst them.

Preprocessor ?

Correct me if I'm wrong but I tend to believe that C's (and C++'s) preprocessor, although often useful, is also one of its biggest problems.

Every so often, when switching compiler (typically, between gcc and vc++), platform or library version, I end up having to debug the compilation process itself, digging deep into thousands of lines of headers and macros to find out why expression A, once compiled, ends up with syntax errors and/or different behaviours....

Also, I'm not familiar with professional C/C++ tools for code management, refactoring and analysis, but every complex macro layer I've seen (and there's at least one such layer per toolkit involved) ends up completely breaking all the tools I'm using, sometimes with quite annoying effects (i.e. impossibility to find symbol definitions or to generate API documentations).

Last but not least, I suspect that C/C++'s horribly slow compilation is largely due to the awful lot of string-crunching done by the preprocessor -- heck, I compiled larger programs about ten times faster with Turbo Pascal on my old 286.

Note that I'm not against this notion of "linguistic abstraction". I think it's interesting. I just hope I'll eventually find a language which does it well. Note that I'm not a Lisp programmer (yet ?).

Preprocessor

"Last but not least, I suspect that C/C++'s horribly slow compilation is largely due to the awful lot of string-crunching done by the preprocessor -- heck, I compiled larger programs about ten times faster with Turbo Pascal on my old 286."

The preprocessor is actually very fast. It's called cpp on most systems, and you can try running it separately. The C/C++ preprocessor is very simple, well on par with a complex regex (a little exaggeration). The biggest problem with C++ is that it's unparseable with a context-free parser (the usual sort). Take the following three lines:

int (x), y, *z;
int (x), y, new(int);
int (x), y, *z = 0;

The first declares two ints and a pointer to an int. The second line converts x to an int, discards it (noop), discards y (noop), and introduces a memory leak. The second line is in fact an expression, and not a declaration. However, the parser can't tell until it gets to the end of the line (thus your parser requires unbounded lookahead). The third line could in fact be an expression, but the rules say that it's a declaration. So your parser needs semantic info to resolve things. Even this wouldn't be that bad -- C has it as well (though C++ complicates things with references and whatnot). The real killer is templates. "Good C++" (i.e. STL-style, Boost-style) uses templates copiously. Templates create more ambiguities that need to be resolved later, thus requiring the compiler to work harder.

Compile Times

What genneth said. Keep in mind, for example, that "std::string" isn't just a class, but a typedef of a class template: std::basic_string instantiated with a character type trait for char and a standard allocator.

If you really want to kill your compile times, just start using Spirit heavily. If you really really want to kill your compile times, use Spirit to parse XML. :-)

There is clear division betwe

There is clear division between languages: the imperative, and the functional. LtU members seem to prefer the functional ones.

I am far from certain that this is true in any meaningful way.

For me, a programming language is a tool.

Of course, for me too. But you won't find many who will object to this platitude.

No language holy wars for me, please.

Great. I hate them.

Let me humbly suggest, however, that claiming one language (especially one of the mainstream ones everyone here knows) is superior to all others, and that some piece of software is the best software ever written without having read all software ever produced (an impossibility), isn't the best way to avoid language wars...

Need to fix the &lt;/i&gt;

I think your second italic close tag is broken, Ehud.

Fixed, thanks

Maybe this will fix things

Delete (or just ignore this)

Let me humbly suggest, howeve

Let me humbly suggest, however, that claiming one language (especially one of the mainstream ones everyone here knows) is superior to all others, and that some piece of software is the best software ever written without having read all software ever produced (an impossibility), isn't the best way to avoid language wars...

I am not claiming that C++ is the best. I am claiming that OCaml, Haskell etc are not that much better over C++ that one has to ironically ask "why do they still use C++". For me they are marginally better, with the margin being so small that it does not justify the trade-off of performance and productivity.

Have you ever programmed?

I mean, in Haskell (my language of choice)?

My favorite example is a complete Lagrangian physics integration system with arbitrary-precision integration in under 500 lines of code. It consisted of a sufficient set of functions: symbolic differentiation, symbolic simplification, integration, and evaluation.

I coded it totally in two spare days.

In C, the lazy-evaluation emulation layer alone took 380 lines of code after a week of coding. Then I gave up and forgot about the C rewrite completely.

So, for me Haskell isn't marginally better. It is miles ahead.

I am not claiming that C++ is the best

Actually, you did claim that C++ is the best. "One of the best," naturally, but the "one of" here reads as an "is".

And I know the answer for "why do they use C++." The answer is: alternatives, even much better ones, require work. Unusual work. That is all.

More than language has to be taken into account

In C, the lazy-evaluation emulation layer alone took 380 lines of code after a week of coding. Then I gave up and forgot about the C rewrite completely.

Well, so you found out the hard way that C is not an adequate language for dealing with symbolic computations, in almost all cases.

And I know the answer for "why do they use C++." The answer is: alternatives, even much better ones, require work. Unusual work. That is all.

Exactly. When using language X one has to take into account its environment as well: available libraries, documentation, interoperability (e.g., with other languages), stability, portability, cost of maintenance, etc.

At risk of stating the obvious: even if a language excels in one particular area, it does not mean it is the overall best choice.

I perceive a massive difference.

I have not used C++ extensively, but I do believe that marginally better is an understatement.

I do believe that the trade-off in performance and productivity is worthwhile. In my opinion and in my experience, Haskell gives many times the productivity of C++.

Since I don't particularly want this to devolve into a shouting match, how can we usefully compare the two?

--Shae Erisson - ScannedInAvian.com

For language manipulation and

For language manipulation and other tasks that involve a lot of union types, pretty much any language with algebraic datatypes and pattern-matching is likely to come way ahead of C++ in terms of programmer productivity.

The same applies rather rapidly any time you find an abstraction that's hard to express effectively in C++ and is easy in Haskell - I find a lot of these.

I Think This is Fair...

...and I'll add (again; LtU regulars already know this) that I write C++ for a living, and it's marginally tolerable now vs. when I last left off doing it for a living some 7.5 years ago, thanks primarily to the introduction of the STL, improved standards compliance on the part of the available compilers (although VC++ didn't get there until 7.1), and truly multi-paradigm libraries such as Boost.

Speaking of Boost, with respect to O'Caml vs. C++, I'm really not as partisan as I sound in the other thread: some months back someone suggested that he had to write his own expression-language parser because Spirit parsers weren't re-entrant or threadsafe. I pointed out to him that because Spirit parsers are recursive descent he got re-entrancy for free, and a quick look at the documentation revealed that with a single #define before #include-ing the Spirit headers and avoiding unprotected shared state, e.g. by using Phoenix closures, he'd have thread safety. This was appropriate because he was using C++ for other perfectly good reasons. But it was another example of someone saying "X can't do that," and me feeling that they hadn't really tried.

Interesting Point...

Achilleas Margaritis: By the way, I don't claim that C++ is better than any of the languages LtU prefers. I am into LISP lately, and I have to say LISP is quite clever (to the point of wanting me to make a new language which has the hardware-related capabilities of C and the linguistic capabilities of LISP).

I don't think there's an implementation yet, but you might want to keep an eye on BitC. Another strong candidate might be Cyclone, designed as a safe dialect of C, but with various language features that are very familiar to functional-language programmers, albeit more from the ML family than the Lisp family.

Haskell's Nine Points.

signals and slots, for doing model-view-controller apps.

I'd use MVar and Channel from Control.Concurrent for this.

networking

IPv4 is standard, IPv6 is available separately, there's also a pure Haskell TCP/IP stack in House.

memory mapped data structures

Yes, see the Haskell Foreign Function Interface, or darcs' FastPackedString for usage examples.

multitasking.

Control.Concurrent does

  • coroutines in a single OS thread
  • coroutines that use a pool of OS threads
  • bound threads for OpenGL and other libraries that require thread-local scope.

Full SMP support is on the way in GHC.

Update: SMP support is now in the development tree.

a rich GUI.

value classes

Haskell uses 'newtype' for checked type aliases.

operator overloading

Haskell has typeclasses, and you can define your own operators. See the HaskellDemo for a simple typeclass example.

XML and databases

For xml, there's HaXml, Haskell Xml Toolbox.

For database support there's HSQL, HaskellDB, and several others.

clever and efficient data structures and algorithms support

Too many to list, any specific requests past what's in the standard libraries?


--Shae Erisson - ScannedInAvian.com

However

What is true, however, is that there are relatively few desktop applications written in non-C/C++, partly because many libraries -- especially toolkits -- are C/C++-specific.

Yes, if I look at the apps I've used during the past few years, I can probably find around 20 of them written in Python, 3 or 4 in OCaml, one in C# and one in Delphi. Compare this to hundreds or thousands written in C/C++.

This does not mean that any of these languages is inferior to C/C++. Maybe just that they need a little "push".

Completely unwarranted jab

OK, I can't resist being a total jerk.

I think it's hilarious that most people list "type safety" as one of C++'s benefits, as if it were actually type safe.

Well, it IS typesafe if...

...you compare to C.

It's also typesafe, for the most part, if you avoid the following:

* pointer arithmetic, other than as defined by the language standard (i.e. traversing an array or one-past-the-end, use of offsetof and casts to traverse a struct).

* C-style casts or reinterpret_cast

* Abuse of unions

* Uninitialized pointers

* Low-level memory operations (like memset) on objects

* Object slicing

* Creating pointers to out-of-scope objects (i.e. returning pointers to your stack frame)

* Creating pointers to already-deallocated regions of memory (double-deleting objects).

All but the last can be dealt with by simply not doing the indicated thing--either the indicated thing is bad programming practice in any case; or the thing is a C idiom for which C++ provides a more suitable alternative. One can replace pointer arithmetic on raw pointers with STL containers and iterators; one can use the safe casts (static_cast, dynamic_cast), and one can compile with -Wall and pay attention when "uninitialized variable" warnings appear.

The last, of course, is the tough one. The runtime (other than via a tracing garbage collector) cannot prevent double-deletion of objects (probabilistic solutions using guard words and such are tractable); verifying that no object is deleted more than once is not amenable to static analysis without severely restricting the type system (see ATTAPL, as usual). This is, in my mind, the strongest argument for garbage collection; stronger than avoiding memory leaks (GC'd programs can still leak memory--even if all dead objects are collected, there is often much live semantic garbage lying around in any large non-trivial program).

You'd need a much longer list

For example,
  • null pointers
  • arrays (because of their completely unsound interpretation as pointers)
  • delete (it's not just double deletion, it's dangling pointers in general)
  • all kinds of std library functions
  • and much more things I already repressed

Null pointers are another killer, almost as bad as dangling pointers. Since they are completely unsafe, you cannot call something like dynamic_cast or operator new type-safe either, because they may give rise to null pointers.

I claim that it is virtually impossible to identify a safe subset of C++ that would still be workable.

NULL and type safety

A lot depends on how you define "type-safety". If you mean "provably typesafe, such that type errors cannot occur at runtime", then no language with null pointers can be typesafe. Proving that an arbitrary pointer isn't NULL is undecidable, just like proving that the divisor of an arbitrary division operation is not 0 is undecidable.

If, on the other hand, you define "type safe" as "type errors have well-defined semantics", then NULL is less offensive. Dereferencing NULL is undefined according to the C++ standard, but most C++ implementations (and ALL modern implementations you will encounter in the server room or on the desktop) provide well-defined behavior for NULL dereference--a processor exception, usually followed by termination of the process (with platform-dependent trapping mechanisms available). With this looser definition of typesafe, NULL doesn't pose a problem.

(Some embedded platforms map NULL to a valid address; I've had the misfortune to work on one of those...)

That said, it would be nice if C++ (or Java) provided a feature analogous to the one in Nice: a rebindable pointer that is guaranteed not to be NULL. If you don't need rebinding, C++ references are a good substitute for this.

One minor correction to your post--operator new does NOT return a null pointer if insufficient memory exists; it throws an exception (bad_alloc) instead. (There is a variation of operator new, the "nothrow" version, that does act like C's malloc() and returns NULL if an out-of-memory condition occurs; and early C++ implementations did return NULL. But nowadays, it does not.)

Re: Well, it IS typesafe if...

It's also typesafe, for the most part, if you avoid the following:

Did you mean to write, it is typesafe if you avoid libraries that use the following?

Alternatively, why do you *not* program in C++

After reading Scott Meyers's two books, or 85 things to remember at all times to prevent yourself from shooting your foot off with C++, I realized that I would never be able to keep all that in my head. I realize there are subsets of C++ that can be used effectively, and I occasionally have to use them, but there are many different subsets in use, and with the various non-standard libraries these almost become different languages to learn.

Why do I not program in C++?

Well,

* If I want to play around with programming languages

* If I am doing something for which garbage collection is appropriate

* If I want to eliminate the bugs that accompany explicit pointer arithmetic and/or memory management (and the loss of these features is acceptable for my application)...

* If I'm doing something heavy in string/text manipulations

* If I want dynamic typing

* If I want to write a "script"--in other words, I want my program to live (and be deployed) as source, and not have any (obvious-to-the-user) compilation/link stages around.

etc....

Any of the above is ample reason to NOT use C++.

Yeap, all these reasons are v

Yeap, all these reasons are valid reasons for not using C++, but there are lots of other reasons to use it.

And I don't agree that Haskell or any other language simply enhances one's productivity; it all depends on the project. You may code an exotic algorithm in under 500 lines of code, but there are other problems that C/C++ excels in. Databases, for example.

By the way, does anybody have examples of recursive-descent parsers done with functional languages? I would like to see how efficient they are.

Interesting...

Achilleas Margaritis: And I don't agree that Haskell or any other language simply enhances one's productivity; it all depends on the project. You may code an exotic algorithm in under 500 lines of code, but there are other problems that C/C++ excels in. Databases, for example.

Given that my day job consists of keeping a C++ codebase talking to a database, and I've done the equivalent thing in half a dozen different languages, I have to say that I find this a particularly inapt example: database integration, at least today, is one of the things that C++ is most miserable at. Something like Hibernate in Java or Og or Active Records in Ruby is vastly superior to anything I've found in the C++ world. You really want some kind of tuple type and reflection to accomplish the C++/SQL mapping, and even with heavy use of the preprocessor and template metaprogramming, C++ just isn't competitive with frameworks in languages that actually support such concepts. And yes, I'm keeping my eye on Boost.Interfaces and Boost.Names, even at this early stage.

Achilleas: By the way, does anybody have examples of recursive-descent parsers done with functional languages? I would like to see how efficient they are.

Sure. See camlp4 or the tersely-named "cf" library for O'Caml, Parsec for Haskell, and I'm sure someone knows of something equivalent for Standard ML or other languages.

Hmm, he might talk about impl

Hmm, he might talk about implementing (well-performing/robust/*buzzword) database systems.

Tuples in Java?

Did you really mean that? I really miss them.

Straw man?

This seems like a bit of a straw man.

Because they are forced?

I do not think any sane programmer would use a badly-designed, anachronistic, terrible language like C++ unless they are either totally unaware of what a language could provide or forced by the circumstances. C++ can serve as an example of how not to design a language. Though it is not really C++'s fault: it is just a lame attempt to "upgrade" the badly designed, terrible language C, which is an even bigger anachronism.

Yes and No

1. That's the only reason why I use C++.
2. C was actually well designed. But as a high-level assembly language, not as a general-purpose, one-size-doesn't-quite-fit-anybody language.

Not so simple

I do not think any sane programmer would use a badly-designed, anachronistic,
terrible language like C++ unless they are either totally unaware of what a
language could provide or forced by the circumstances.

I don't think it's nearly so simple as that. In lots of circumstances, you want
a good match for the hardware and predictability, in that you can roughly
understand how your code is compiled. Predictability also means that you know
when you're handing off control to a magic unknown function, like malloc in C.
With garbage collection, by contrast, there's always the chance that a 50
millisecond collection will occur at a bad time (for example, the expected
frame rate for most video games only gives you 16 milliseconds per frame of
total processing time). You can tangle yourself up impressively with C++ if you
aren't careful, but you can strip it down and keep your code transparent and
straightforward if you desire. There's something to be said for simplicity.
Given new hardware (say, for an embedded DVD player or somesuch), is it easier
to write a C compiler for it or a Haskell compiler? Depends on your background,
I guess, but it's easier (for me) to see how to map C to most hardware.

Of course most of this stems from effort being put into CPUs that are designed
to run C well. If the original IBM PC had a Scheme chip in it, instead of an
8086, maybe everyone would be complaining about how we're all forced into using
Scheme :)

Just to clear up a few things...

Pardon any errors in my history; much of this occurred while I was still in grade school/junior high. :)

The original 8086 was not designed to run C, as anyone who has ever messed with a "far pointer" will attest. At the time the 8088 was developed, C was still something intimately tied to the Unix-based minis of the time, usually VAXes and such.

I'm not aware of any evidence that the 808x was targeted at any high-level language (its instruction set is a pain in the ass for both human programmers and compiler back-ends to use). The microcomputers of that era (CP/M systems in business, as well as the 8-bit "home computers" from Apple, Atari, Commodore, and others) were generally not programmed in C. Assembly language was required for any serious software (due to performance considerations), and various flavors of BASIC were usually used for simple tasks. The first high-level language to make serious inroads on the PC (as well as the Mac) was Pascal, not C.

The 16-bit 68k-based machines of the era (early Macs, as well as the Commodore Amiga and Atari ST series) did have C compilers available, but most serious applications development was still done in assembly.

Most widely available, general purpose processors of that era, if they were targeted towards any language at all, were targeted towards manual assembly language programming. Both the VAX and the Motorola 68k series have reputations for being assembly-programmer-friendly (the x86 isn't friendly to anyone, it seems).

In the late 80s and early 90s, of course, the CISC vs RISC wars heated up. Unix-based machines started making more and more inroads into business computing; and C was the dominant programming language on these beasts. It soon became clear that the 68k and VAX instruction sets were architectural dead-ends performance-wise, and RISC arose. RISC machines were designed primarily to allow very short cycle times and pipelining; but it also was the case that code generation for RISC machines was easier to do automatically (the output of a compiler) than manually. As RISC machines don't provide HW support for dynamic languages (no tag bits, etc.), static languages--chiefly C, and later C++--would become the language of choice for high-performance code.

C/C++ also became the systems language of choice for the PC world when Windows and OS/2 (and Linux as well) started shipping. (MS-DOS was written in assembly). At about this time, the x86 series (which was up to the 80386) was getting clobbered performance-wise by competing RISC designs, so Intel started migrating the franchise to an underlying RISC design (with a whole lot of logic to map the x86 instruction set to the RISC core) with the 80486--which was a fully pipelined architecture. Superscalar execution was introduced with the Pentium, and the rest is history there. (This hybrid architecture comes with a steep price, as modern Pentium-class CPUs consume enough current to jump-start a small car! Though the voltage is considerably less...)

C became the preferred systems language on the Mac when Apple abandoned the 68k architecture in favor of the PowerPC, and was forced to rewrite the OS (which was written in 68k assembly). C was the systems language of choice for IBM machines running the POWER architecture, which the PowerPC was based on. The switch to Mach with OS X further cemented C's stature in the Macintosh world.

It still isn't quite correct to say that modern CPUs were designed to execute C, to the exclusion of other languages. Certainly, the x86 wasn't designed with C in mind. Modern CPUs certainly support non-stack-based static languages (like C or Pascal) better than dynamic languages (Lisp, Smalltalk) or stack-based ones (Forth). But the predominance of C/C++ in the marketplace today is more likely due to it being the de-facto systems language for the dominant operating systems.

Let me rephrase my comment

Let me rephrase my comment: most popular CPUs are designed to be very
procedurally-oriented in the generally expected way. You don't see
hardware-level support for type tags (but you do on the SPARC), CDR-coding,
garbage collection, or anything else "exotic." C, as it was designed to be a
low-level language for such hardware, is also very procedurally-oriented in the
generally expected way. You can more or less see how C features map to the
hardware.

(As an aside, there were some "high-level" instructions added to the x86 in the
80s, but they're usually microcoded--and therefore frowned upon--these days.
BOUND was for Pascal-style checking of array bounds. ENTER and LEAVE were for
setting up and tearing down stack frames.)

ccNUMA appears procedural also.

I think that cache coherency in non-uniform memory access systems is also a procedural/mutable pattern.

If the hardware instead used allocation of immutable chunks and garbage collection, you wouldn't need to broadcast changes to the data in the cache of each CPU. Only the garbage collector could invalidate cached data.

I think this sort of approach might be faster for data that fits into the different caches (level 1, level 2, etc) because you wouldn't need to allocate a chunk of main memory for temporary items.



This is just something that I thought up recently, but Google shows that I'm by no means the first to think along these lines. See Cache-Conscious Copying Collectors by Henry Baker.

--Shae Erisson - ScannedInAvian.com

What a great opportunity

to recycle Alan Perlis's famous quip (itself a take on an Oscar Wilde quote) that "LISP programmers know the value of everything and the cost of nothing."


The Prolog version of Alan Perlis's quote

"A Prolog programmer knows the cost of everything but the value of nothing."

Here is the explanation: Prolog has a simple operational model, so a program's execution cost is easily calculated. However, Prolog has logic variables. A logic variable always represents a fixed value, but the value is unknown before the variable is bound.

Of course most of this stems

Of course most of this stems from effort being put into CPUs that are designed
to run C well. If the original IBM PC had a Scheme chip in it, instead of an
8086, maybe everyone would be complaining about how we're all forced into using
Scheme :)

Or maybe they wouldn't be complaining at all.. :)

What's different about a Scheme chip?

I've been playing with FPGAs recently and I've implemented some simple microprocessors. This stuff was modelled on what I already know about microprocessors I've used in my life ranging from the 6502 to the Pentium. What should I do differently if I wanted to run, say, a simple functional language on my CPUs?

Scheme chips

You might start by reading about the Symbolics Lisp machine; a computer which ran LISP "on the metal". (The dialect of Lisp that ran on these was one of the precursors to Common Lisp). A Lisp Machine should have little trouble running Scheme; although a dedicated Scheme machine (a processor optimized for running Scheme) might have better support for continuation-passing style.

At any rate, some possible hardware features which might be beneficial for dynamic languages:

* "Tag" bits to distinguish pointers from literal integers, and an instruction set that understands these tags and can act accordingly.

* For LISP-y languages, the ability to store an entire cons cell in a register (mainly for performance reasons), and the ability to quickly extract the car and cdr

* Better hardware support for garbage collection (things like write barriers and such). Or even GC itself in hardware.

Others can list more.

One problem, of course, is that getting many of these things to work properly requires more than just modifying the CPU: memory devices may need modification (to store the extra metadata), as do bus architectures, etc. The CPU architectures in wide use today enjoy numerous economies of scale that would be difficult to duplicate in a radically different architecture. As a result, trying to re-invent the Lisp Machine is probably a losing proposition. I suspect that the avenue toward improved dynamic-language performance is improved compilers and runtime environments (and possibly more dynamic-language-friendly OSes as well). Type inference, where possible, is a great benefit here, as it can eliminate many instances where tag bits might be needed.

Dylan

It may be worth looking at Gwydion Dylan here. The compiler uses a two-word representation (one word for the type tag, one word for the data), on the basis that its type inference is usually good enough to eliminate one of the words for almost every variable.

Tag bits

The simplest addition would be a tag bit to indicate which memory cells contain addresses.

The game is trivial, but playing the game trivially is untrivial

[I am going for football quotations for a while; the above one is translated from football professor J. Cruyff ;-)]

What functional language are you thinking of?

In some sense tag bits provide autoboxing of integers (in registers/memory locations) on the assembly level.

At first glance, this is good for Lisp-like languages, which are dynamically typed; statically typed functional languages mostly don't need tagging, since the types of expressions are known at compile time.

Since it is, in some sense, an autoboxing feature, one problem tag bits don't nicely get rid of is the sane treatment of run-time type errors (for a dynamically typed language): for example, what do you do when you add a pointer and an int? Of course you can solve this at the assembly level, but then you need to think through the operational semantics of the assembly instructions very carefully. Modest type inferencing should deal with a lot of these problems.

Guido van Rossum decided against tag bits for Python because of the added unwanted level of complexity in the compiler; why think about it if it is unknown what benefit it actually has?

I think OCaml uses tag bits just for its GC, so it can easily determine the set of reachable nodes. Here, my guess is they must even take some performance hit, in added assembly instructions, for keeping the tag-bit invariant.
In the end, even if it does improve run-time performance, this introduces a split situation, since they don't need the tags for arithmetic; tag-bit-aware assembly would, in their case, reduce the instruction count, so you would expect some small increase in performance, but then again (caches, buses, pipelining)... My guess is that if you have strong typing you should be able to exploit that even in the GC algorithm... but then again... [insert small PhD thesis here]

In the end it is about delegation of concerns, isn't it? So the question becomes: do you really want to delegate this kind of information to the assembly level - where is the benefit exactly?

[duh, edited heavily and still find my own post annoying, ah well]

Appel on tag bits

I previously summarized Appel's take on tag bits in this post.

Look, the ball minimally has to go between those two poles

[Yet another nugget of Cruyff]

Ha, missed that one; it seems LtU needs a PL wikipedia. Predictably, if the GC needs RTTI, then you have to store that information somewhere (in a register if possible, in a record, or in some type table). I would personally favor the record, so, well, it seems we are going back to Algol.

Lambda: the ultimate...

... Opcode