PLs and large scale development

Personally I'm very comfortable with using a combination of C++ (with STL, Fusion2, Phoenix2), Python (with PyGTK, SciPy) and SWIG. But then programming is just one of my hobbies (audio stuff and some quantum mechanics, mainly).

So I'd like to hear some opinions from more experienced or even professional developers out of curiosity. What do you think about the common PLs in use with regard to large scale development? Please don't compare different languages, but instead tell me about a particular PL you have used and why it is suited for a bigger team of programmers or why it is a hindrance.

EDIT: "large scale" = "more than a handful of developers"


What do you mean by "large scale"?

What I'm working on right now is about 500k lines of code. That's still several orders of magnitude smaller than the largest-scale projects out there.

Most large-scale projects are written, I suspect, in a multitude of languages, and delivered as a multitude of separate programs of various types which communicate via various means (the filesystem, databases, networks/the Internet), and have various ways of interacting with users.

With those caveats, I suspect most large-scale projects gravitate towards the "mainstream" industrial languages (C/C++, Java, .NET languages, "P" languages, SQL)--simply because that's where the plethora of programmers is. Exceptions are most likely found in applications with particular requirements (telecom/Erlang, DOD/Ada). Even if a 500-million-line (C++ or Java) project could be shrunk to one tenth the size by coding it in a suitable higher-level language, where are you going to find the programmers to deliver a 50-million-line Lisp or Haskell or Smalltalk application?

Re

Well, that's what I suppose. But I don't really know anything about that. That's why I'm interested in some opinions / observations on this topic.

mutable

"where are you going to find the programmers"
Same place we looked when we needed C++ and Java programmers - the programmers we already have, retrained. (Some might suggest that domain knowledge is harder to acquire than technical skill.)

800 lbs. Gorilla

The 800 lbs. gorilla here is that C++ in particular was marketed—sorry, but here it comes—on a lie: that you could take C programmers and they'd become effective C++ programmers without significant retraining. Java was, IMHO, marketed on a similar lie: Java is just C++ with the point(er)y things removed, when in fact its semantics are a lot closer to a statically-typed Smalltalk's than to C++. The visual I have of C++ and Java marketing is a trail of breadcrumbs leading to a gingerbread house; the screaming coming from inside is from the C programmers circa 1990-1993 and the C++ programmers circa 1995-1998 resisting the change that they were misled into.

This is why I think something like Tim Sweeney's The Next Mainstream Programming Language is so important. We need to acknowledge up front that prior experience and syntactic familiarity matters, and that if we're going to significantly change semantics out from under people, then it needs to not be by subterfuge (or, to be charitable and probably more accurate, ignorance and accident) but rather by onion-layering, e.g. mutation looks syntactically ("x = y") and behaves in common usage patterns just like it does in C/C++/Java/everything else... but when your requirements make the common usage pattern break down, you can learn that you've actually been using Software Transactional Memory all along, and with some new composable language constructs, your code works fine in multithreaded, multi-core circumstances without horrendous global coordination of locks/mutexes/etc. Common control-flow patterns look just like they do elsewhere, but when you suddenly need to find all matches of a pattern instead of just the first one, the common usage easily generalizes to something like Icon's goal-directed evaluation. And so it goes—or at least, so I fervently hope it goes.

Of course it is true that

Of course it is true that learning new languages is quite some work. However you can have others supervise the "newbies" for some weeks and correct their mistakes and have them read some useful books on the language and look at older projects for getting some clues on the appropriate coding style. Oh, four "ands" in a single sentence... ok, whatever.

BTW Tim Sweeney's lecture got me interested in programming languages again. This way I found out about l-t-u.org.

I remember T. S. writing: "Type-inference doesn't scale to large projects." Why? He didn't give any further explicit reasons as far as I remember.

Type inference and large projects

A couple thoughts, which may be way off base.

1) Sometimes, you don't WANT type inference, especially at module boundaries. Sometimes you want to explicitly declare the types of things, because that maps onto a domain or functional requirement, and have the compiler reject code which doesn't conform to this external specification, rather than trying to "fill in the blanks" and getting it wrong (there's a small sketch of this below). Of course, fans of typeless languages have one solution to this problem which is probably useful in a type-inference environment as well: a robust set of unit tests.

2) Aren't type inference algorithms superlinear in running time with respect to code size? How about for modular programs--is type-inferring a module dependent only on the size of the module, or do the sizes of its dependencies matter?
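To illustrate the first point, here's a minimal OCaml sketch (the module and function names are invented for illustration): giving the module an explicit signature makes the compiler check the implementation against the declared types at the boundary, instead of inferring whatever the code happens to mean.

module Invoice : sig
  val total : float list -> float        (* the interface we intend *)
end = struct
  let total prices = List.fold_left ( +. ) 0.0 prices
end

(* If the body drifted to, say, List.fold_left ( + ) 0 prices, inference
   alone would happily assign it the type int list -> int; with the
   signature above, the mismatch is rejected right at the module boundary. *)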

Good points

I seem to recall Tim writing elsewhere that all he meant was that you should declare function argument and return types, otherwise letting them propagate. It's interesting to note that this is exactly what Scala and Felix require.
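In OCaml, for instance, that style would look something like the following hedged sketch (the function and names are purely illustrative): the argument and return types are written out, while the local bindings inside are left to the inferencer.

let mean (samples : float list) : float =
  let sum = List.fold_left ( +. ) 0.0 samples in   (* inferred: float *)
  let n = List.length samples in                   (* inferred: int   *)
  sum /. float_of_int n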

Hindley-Milner type

Hindley-Milner type inference is nearly linear in the size of the largest type of your program. However, it is possible to write programs whose largest type is doubly-exponential in the size of the program.

Many of the subterms of such a type can be shared, so even with sharing it still takes up to exponential space (rather than doubly exponential). This is why type inference has worst-case exponential run time: simply storing that much type information in memory has to take exponential time.

For example, the following OCaml code will take quite a while to print its type:

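(* each x_{n+1} applies x_n twice, so the size of the printed type
   roughly squares at every step -- doubly exponential overall *)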
let x1 = fun y -> (y,y) in
  let x2 = fun y -> x1 (x1 y) in
    let x3 = fun y -> x2 (x2 y) in
      let x4 = fun y -> x3 (x3 y) in
        let x5 = fun y -> x4 (x4 y) in
          x5 (fun z -> z)

Happily, nobody ever writes programs like this on purpose.

definition modules, implementation modules

"Sometimes, you don't WANT type inference, especially at module boundaries."
iirc Clean definition modules are made up of type header definitions - so anything exported from a module always has an explicit type definition.
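Roughly the same split exists in OCaml (the file and function names here are invented for illustration): the .mli interface file plays the role of a Clean definition module, so every export carries an explicit type even if the .ml body never writes one.

(* parser.mli -- the "definition module" *)
val parse : string -> int list

(* parser.ml -- the "implementation module"; its types are inferred,
   but they must match the interface above *)
let parse line =
  String.split_on_char ',' line |> List.map int_of_string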

Just wanted to note that

Just wanted to note that Stroustrup is insistent that C++ was not 'marketed', that AT&T provided no marketing budget, and he frequently complains that the reason people started thinking Java fixed C++'s (alleged) mistakes was Sun's marketing dollars. Not to say that there wasn't any C++ marketing by vendors... but I'm not sure it took off because of marketing.

Definer vs. Vendor

Yes, I didn't mean to imply "marketed by Bjarne Stroustrup or AT&T"; in fact, I didn't mean "marketed" formally at all, but rather that much of the justification—from vendors, from books on C++, from technical managers at employers, whatever—strongly had this flavor to it. On the contrary, I've read that all Bjarne Stroustrup wanted to accomplish initially was type-safe linkage!

"Type-safe linkage" - and

"Type-safe linkage" - and there's still no standard ABI even today. Naynaynay...

Given that C++ is typically compiled to native...

a "standard ABI"--promulgated by the C++ committee and binding on all platforms--would be difficult if not impossible to produce.

ABI specs for compiled-to-native code should come from system vendors (including OS suppliers, CPU suppliers, computer companies, and consortia of the above). It's not unreasonable, after all, for there to be different ABIs for Windows/IA32, Linux/IA32, and Linux/PowerPC; code compiled for one platform isn't going to run on another without recompilation.

C++ also has to respect C's lax history in this regard.

That said, some things that ARE within the scope of the C++ standard probably should have been standardized, at least for "reasonable" (non-legacy) platforms. Things like name-mangling, etc.

Name-mangling was what I

Name-mangling was what I mainly thought of when I said ABI. So ABI is a misnomer for what I meant, sorry.

Name mangling

Name mangling is deliberately left un-standardized to prevent incompatible ABIs from being mixed accidentally.

See How To Write Shared Libraries by Ulrich Drepper for lots of entertaining info about C++ ABIs:

In the new mangling scheme used in today's gcc versions and all other compilers which are compatible with the common C++ ABI the names start with the namespaces and class names and end with the member names.... The mangled names for the two member functions differ only after the 43rd character. This is really bad performance-wise if the two symbols should fall into the same hash bucket.

Also, there are some further chuckles to be found in OpenOffice.org startup and relocation processing.

The 1600 lbs. Gorilla...

Where I work we have a 38-year-old application that's about 40 million LOC of COBOL (!!!) with about 500,000 LOC of Java web interface (I don't really know, that's just an estimate). The COBOL dinosaurs still use the green-screen mainframe editor that offers little more functionality than MS Notepad. Clients pay a premium for this thing because there is so much business logic tied up in it that it's damn near unfathomable. Plus there's almost zero documentation (their motto is 'the code's the documentation', yeah right).

So how many of these beasts are left in the world? Several I imagine.

one. singular. gorilla. every little step it takes.

I worked on a Fortran-to-C++ reengineering project just after college. It was kind of a pain, and it was nothing like the size of your beast. Maybe any refactoring/extracting/understanding/reengineering tool that could solve such a problem is something to actually avoid, since the AI involved could lead to doom.

so heavy ... but ... does it work fine?

OK, I guess you don't like the way that application was built; it's certainly not the kind of work I would like to do.

But ... IMHO some points to analyse are:
- does the monster app fulfill the requirements for which it exists?
- how many errors does it show in production? None, a reasonable amount, too many?
- how difficult is it to change the app as the requirements change?

People outside the software world don't pay for nice software as we software people define it; they pay for the possibility of doing whatever (or something!) they want to do. That's their software aesthetic criterion.

Now we have experience with monster projects, and I think the best thing we can do is actually learn from it: design programming tools and environments taking this concrete experience as a basis, and test them in real-world situations ...

... as if we were engineers :D

PS: please excuse me for my awful English :$

In my (admittedly limited) experience...

...retraining developers to use a new language and then deploying them on a new project which uses that language is a dangerous thing to do. Especially if the language involves a different paradigm (such as going from a procedural to an OO language, or from OO to a non-subsumptive functional language). I've seen it tried several times, often with disastrous results. Not because the programmers in question were bad--but because becoming fully productive in a language, for most people, requires a bit of experience--to learn the idioms and corners of the language.

If you are going to deploy a new language, it is probably wiser to staff your first project with a mixture of hired guns from the outside, who have language experience but perhaps no domain knowledge, and your own staff, who have domain knowledge but beginner-level skill in the language. And even then, I would be very nervous about doing this on a crucial project--better for people to get experience on something which can afford to fail.

"get experience on something

"get experience on something which can afford to fail"
I don't recall any of those :-)

"disasterous results ... Not because the programmers in question were bad ..."
I'll disagree - I recall "disasterous results" when the new language was hoped to be a silver bullet and the actual problem was that the programming staff weren't very good.

The Lake Wobegon effect :-)

I've seen bad programmers

On the projects I'm thinking of, none of the programming staff was of the God-awful sort that makes regular contributions to the Daily WTF. But still, your point is valid--programming teams tend to overestimate their skill--especially those who don't hang out on places like LtU where we are constantly reminded how clueless we really are compared to some of the heavyweights here.

But yes--both C++ and Java were peddled (somewhat) as "silver bullets" upon their introduction. Which were then used to shoot ourselves in the feet.

I especially like the

I especially like the Dunning-Kruger effect linked from there.

When I first began learning

When I first began learning to code, I kind of assumed that most programmers were like me: people who like to find new and better ways to solve problems.

I have written at least trivial programs in C, C++, Python, Ruby, Standard ML, OCaml, Haskell, Scheme, Common Lisp, Emacs Lisp, Erlang, Java, Scala, Eiffel, Lua... and probably others I can't think of right now.

But it turns out most programmers aren't really interested in learning anything new. Why did these people get into this field?

Learning...

True, it's kind of sad to see such a mentality, and not just in programming. I know a few people who aren't even interested in reading (the "Hah, now that I've finished school, I'll never touch a book again!" type). But I just can't imagine ever stopping reading whatever I can get my hands on that's related to my interests. Not that I read a lot, but at least I do so regularly.

I strongly believe you should keep developing yourself your whole life long. Unfortunately many people seem to get stuck along the way. And I've started noticing this even though I'm just 23...

Careful...

...programmers still interested in learning new things might not be strongly motivated to learn new languages. For example, I know plenty of people who are very interested in things like (say) graphics or machine learning, and after they pick up R or Matlab they don't show much interest in learning new languages, because they perceive the benefits to be tangential to the focus of their interest. But they're still curious and learning new things.

This is not to deny the existence of inert lumps, though.

Control freaks

Those programmers love the sense of control. They love that the computer does exactly what they tell it to do. They hate bugs, because then some of that control is lost. They also absolutely hate trying a new language, because while they are learning it they have less control than with the language they have already mastered.

Most people do not learn new

Most people do not learn new languages for their own sake. There is also too much overlap between them to make language-zapping an exciting activity (now you have 30 channels, but you're watching very similar programs).

People learn a new language

* for their job
* for a special activity / tool they want to master
* when they are language researchers or designers

One word

Ada.

this comparison study by

this comparison study by ericsson, analysing a concrete telecommunications problem, gives sound arguments why erlang is a good choice for large scale development of distributed, fault tolerant systems.

i'd like to add the obvious, though: by far the most LOC in telecommunications are still written in c, c++, or java for more mundane reasons than the technical merits.

i'd like to add the obvious,

i'd like to add the obvious, though: by far the most LOC in telecommunications are still written in c, c++, or java for more mundane reasons than the technical merits.

Most lines of code I can see, but what about the most sophisticated? What about the most reliable?

Most lines of code I can

Most lines of code I can see, but what about the most sophisticated? What about the most reliable?

as a matter of fact, in roughly the same proportion as the LOC, the most sophisticated and most reliable products are also usually implemented in the aforementioned languages. ericsson does not actually stand out in the industry.

it is a common misunderstanding that better languages yield better products, just as it is another common misunderstanding that only languages strong inside your domain provide for sophisticated development.

large scale development often reduces to implementing DSLs which break the large problem down into manageable chunks. ironically, in telecommunications this usually boils down to reimplementing a good portion of erlang in the implementation language of choice.

Ericsson's ban

Just a side note. People who use Erlang probably know this, but I was surprised when I first learned that Erlang was considered a legacy language / banned for new projects by Ericsson a while ago. This says a lot about the status of programming languages at large tech companies.

Ericsson charging ahead

I'm not an Ericsson insider but I can tell you that this is old history from when C++/Java/UML were hyped towards executives. Today Erlang is bigger than ever within Ericsson and they're shipping major new products on it. They even managed to hire Joe Armstrong back.

charging ahead

The email linked to by Kay is actually Joe pointing out that he's back at Ericsson, and giving the "continuation" since then. An interesting read. Joe's take on corporate policy about such things applies to many more companies than Ericsson.

this reminds me of this very

this reminds me of this very amusing ericsson internal erlang marketing video, which provides a quite bizarre display of the wide gap between the executive and technical sides of a company.

somehow, though, it still manages to show off erlang. :-)