ACM Queue: Security Problem Solved?

Buffer overflows are one problem that the world seems to know how to solve, as evidenced by languages such as Java, C#, and Python that are not susceptible to the issue. On the surface, the reason why we still are plagued by the problem is obvious: we still use C, C++, and assembly in a heck of a lot of applications.

Among the things mentioned in this essay: string-handling libraries, static analysis tools, Cyclone, and zero-knowledge protocols.

What happened to all the editors? Is everyone on vacation? ;-)

The essay is not about buffer

The essay is not about buffer overflows only, but about security in general. So it's unfair to mention C/C++/assembly as the only source of problems regarding security.

First of all, a common misconception is that buffer overflow is a security problem. It is a problem of safety, not security; that a safety problem can be used to compromise security is a side effect. A buffer overflow is like a prison wall falling down: it lets inmates escape, but the falling wall could also kill people.

Secondly, C/C++/assembly have their place in the world of programming. There are certain domains that need these languages, especially low level programming. But these languages come with a big sticker saying that 'the programmer is in full control'. So if someone feels he/she does not want to have all that control, there are other languages that can be used for software development.

Thirdly, C++ provides the means for preventing buffer overflows, since the STL's indexable containers (std::vector, std::deque, std::string) offer bounds-checked access. The effort required to use it is minimal; in most cases, one has to call the method 'at()', which throws std::out_of_range on a bad index, instead of using the operator '[]'. If one considers that too much, then he/she should not program at all.
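To make the 'at()' versus '[]' point concrete, here is a minimal sketch. The helper name checked_read is hypothetical (it is not part of the STL or of the essay); the only library behavior relied on is that std::vector::at() throws std::out_of_range for an invalid index.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper, for illustration only.
// Copies values[index] into `out` and returns true when the index is valid.
// On an out-of-range index it returns false and leaves `out` untouched,
// because at() throws before the assignment happens. With operator[] the
// same bad index would be undefined behavior instead.
bool checked_read(const std::vector<int>& values, std::size_t index, int& out) {
    try {
        out = values.at(index);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Whether to catch the exception near the access, as here, or let it propagate is a design choice; the point is only that the overflow becomes a reportable error rather than silent memory corruption.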

Finally, a large (if not the largest) share of the responsibility lies with CPU designers, especially Intel, who did not provide a good way to make data non-executable in modern CPUs. The best solution would be to provide separate R/W/X bits for each protection ring at page descriptor level (the 386 page descriptor has enough room for that; only 6 bits are required: 2 for each type of access, since there are 4 protection rings). Instead we got the NX bit, and rather late.
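One possible reading of the 6-bit proposal, sketched under loud assumptions: each access type gets a 2-bit field holding the least privileged ring allowed to perform that access. This is not a real x86 page-table format, and the names Access, pack_permissions, and is_allowed are invented for illustration.

```cpp
#include <cstdint>

// Hypothetical encoding, NOT an actual x86 page descriptor layout.
// Three 2-bit fields, one per access type. Each field stores the least
// privileged ring (0 = most privileged, 3 = least) that may perform
// that kind of access on the page.
enum class Access : unsigned { Read = 0, Write = 1, Execute = 2 };

// Packs the three ring thresholds into the low 6 bits of a byte.
constexpr std::uint8_t pack_permissions(unsigned read_ring,
                                        unsigned write_ring,
                                        unsigned exec_ring) {
    return static_cast<std::uint8_t>((read_ring & 3u) |
                                     ((write_ring & 3u) << 2) |
                                     ((exec_ring & 3u) << 4));
}

// Returns true when code running in `ring` may perform `access`.
constexpr bool is_allowed(std::uint8_t permissions, Access access, unsigned ring) {
    return ring <= ((permissions >> (2u * static_cast<unsigned>(access))) & 3u);
}
```

Under this reading, a data page packed as pack_permissions(3, 3, 0) is readable and writable from ring 3 but executable only from ring 0, which is the property a stack-smashing attack on a user process would run into.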

Just a little point.

So if someone feels he/she does not want to have all that control, there are other languages that can be used for software development.

I don't like the use of "does not want to have all that control" here. It shouldn't be a matter of wanting control, as much as needing it, in my opinion. This is a fundamental principle of security theory – don't give the user more control over a system than they need. If a programmer does not need low-level system/data control, should they really have it?

A million programmers can be wrong

Secondly, C/C++/assembly have their place in the world of programming.

That's certainly true for assembler, as a direct interface to traditional CPUs. But as for C/C++, ignoring legacy requirements and what people think they need, what features of these languages do you think cannot be obtained by languages with fewer poor design choices?

WYCIWYG performance.

WYCIWYG performance.

fast enough

Well, software must be fast enough, but after that...

Some other old languages seem fast enough: for example, Modula-2 tools tailored for on-board spacecraft software development.

What is fast enough, btw?

Some percentage of C programs are written without solid performance requirements. The performance requirement is then solidified post hoc. The implementation might reach only 50% of the performance achievable in C, but this is never investigated, so nobody knows.

In many cases writing in C is like buying insurance so that the program won't be too slow, rather than a way to make the program very fast.

FUD

buying insurance so that the program won't be too slow

It seems you're saying that some choose C to meet a performance requirement that doesn't actually exist. If that's the case, shouldn't we regard it as incompetence?

Maybe

Requirements tend to be nebulous early on. Even when one openly admits that one doesn't know how fast the program should run, one has to put down a concrete figure. One then errs on the side of caution (read: C, heh, this is not the segv kind of caution).

When all is said and done, one realizes that it was possible to write in Java/Scheme/etc.

Thus, I wouldn't really think of this syndrome as incompetence. At least not technical incompetence.

This is a requirement far les

This is a requirement far less often than people think that it's a requirement though, isn't it? Other than very low-level systems, is it ever needed?

I don't want to judge about p

I don't want to judge about people in general.

I agree that this requirement is mostly needed for relatively low-level applications, and applications with hard real-time/responsiveness constraints.

C has no monopoly...

...on predictable performance. Ada, for example, has the same sort of properties as C in this area, and has been proven in real-world, performance-intensive applications.

The reality is that we're stuck with the C family and its warts only for legacy reasons, and because of erroneous perceptions like the one I'm arguing against here. People often mistake C's ubiquity at the system level as meaning that something like C (unsafe and low-level) must be necessary at that level, but that simply isn't true.

Please don't forget that C ha

Please don't forget that C has proven itself as well.

Proven to produce buffer overflows and memory corruption?

The point is that C/C++ have a number of serious flaws, all of which we know how to correct, essentially without technical cost. The only real reason we put up with those flaws is because so much computing infrastructure has been developed to depend on C, which makes replacing it impractical. What I'm objecting to is the idea that a language with such flaws is somehow technically necessary at some level in a system, or for some domains. It's not.

My point is that there are ap

My point is that there are applications/areas/requirements for which C/C++ are still well-suited (as in: there are no better pragmatic options).

This is not an absolute statement or valuation.

Slow software sucks

The point is that C/C++ have a number of serious flaws, all of which we know how to correct, essentially without technical cost. The only real reason we put up with those flaws is because so much computing infrastructure has been developed to depend on C, which makes replacing it impractical. What I'm objecting to is the idea that a language with such flaws is somehow technically necessary at some level in a system, or for some domains. It's not.

This is true. The world would still run just fine if 80% of the C/C++ programs were written in <insert favorite language here>. On the other hand, I regularly have to use programs written in Java, and I can say with confidence that Java is noticeably slow. I haven't had the pleasure of using large programs written in an FP language (at least not that I am aware of), so I can't say how fast or slow those are, on average. But I will say that *writing* Java is much more enjoyable than *running* it. And I would wager that most of Java's slowness comes from its enthusiasm for heap allocation and its GC. Which leads me to conclude that if FP programs were as common as Java programs, they would exhibit comparable performance.

While C++ does truly have many flaws, it has one thing that makes it a speed demon by default: stack allocation. It surely has not cornered the market on this form of storage, but because stack allocation is idiomatic C++, equivalent Java programs tend to be much slower than their C++ counterparts. You can preach all day long about how GC can be faster than manual heap allocation in certain scenarios, but it will never be faster than stack allocation.

Granted, C++ programs still go to the heap quite often. And there are times when GC compares favorably with malloc()/free(), or even beats it. But devices like smart pointers can offer deterministic storage lifetimes along with GC-like code safety. All this adds up to performance that more or less comes for free. You don't even have to touch a custom-written allocator for these benefits. And that's why most C/C++ programs I run across are pleasantly fast, and most Java programs I see are obviously written in Java.
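A small sketch of the deterministic-lifetime point, assuming a hypothetical Tracked type and run_scope function that exist only to record when objects are created and destroyed. It says nothing about speed; it only shows that both the stack object and the heap object owned by a smart pointer are released at a known point, with no explicit delete and no collector.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type for illustration: logs its own construction and destruction.
struct Tracked {
    Tracked(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("create " + name_);
    }
    ~Tracked() { log_.push_back("destroy " + name_); }

    std::string name_;
    std::vector<std::string>& log_;
};

// Runs one scope containing a stack object and a smart-pointer-owned heap
// object, and returns the order in which lifetime events happened.
std::vector<std::string> run_scope() {
    std::vector<std::string> log;
    {
        Tracked on_stack("stack", log);                              // automatic storage
        std::unique_ptr<Tracked> on_heap(new Tracked("heap", log));  // heap, owned by the pointer
        log.push_back("end of scope");
    }  // destructors run here in reverse declaration order: heap object, then stack object
    return log;
}
```

The returned log has five entries, ending with "destroy heap" followed by "destroy stack", so the release points are fixed by scope rather than by when a collector happens to run.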

Right now, a JVM is taking 145 MB of system RAM (for Eclipse). The next piggiest process is only taking 43 MB, and that's because it is rendering a 768-page PDF. So I don't have any objection to wider use of FP langs. But if they all produce executables like Java does, I'll stick with C++, thank you very much.

Good News!

David B. Held: And I would wager that most of Java's slowness comes from its enthusiasm for heap allocation and its GC. Which leads me to conclude that if FP programs were as common as Java programs, they would exhibit comparable performance...So I don't have any objection to wider use of FP langs. But if they all produce executables like Java does, I'll stick with C++, thank you very much.

You're in luck! Your assumptions are erroneous and, partially as a consequence, no modern functional language implementation with which I'm familiar (SML/NJ, PolyML, MLton, O'Caml, GHC) produces code that, on average, is anywhere near as poor in CPU resource utilization as Java. I'll go farther and say that you would have to invest a fair amount of effort in getting the native compilers of any of these systems to produce code as slow as even JITted Java, and keep in mind that each and every one of these languages is garbage-collected.

The reasons why these implementations, and Java, perform as they do are complex and subtle. I therefore recommend that you pick one of them and work through whatever tutorials you can find (O'Caml has several, including some very good book-length ones that are freely available), and find out how they actually work. Particularly if your point of comparison is Java, I'm sure you'll be pleasantly surprised. Since your preferred point of comparison is C++, for competitive runtime performance (and I mean truly competitive, i.e. some software will be faster in one language, and some will be faster in the other language), let me recommend O'Caml or MLton.

My Experiences

In my experience Java on the client sucks, and in most cases I lay the blame for that on Swing. Now Eclipse doesn't use Swing, so perhaps we need another excuse for it. You claim lack of stack allocation is the reason. I think it is more subtle than that. A Java object takes up at least something like 16 bytes (I forget the exact number) without even taking account of the space needed to store its data. You'll find that a piece of data in, say, ML takes up maybe 4 bytes more than the size of the data, and less if the compiler unboxes it (which MLton, at least, does aggressively). The reason for this is twofold. Firstly, data is simpler: it doesn't need to keep a link to a vtable, or identify its type (in statically typed, type-erasing implementations), or carry the other overhead that Java requires. Secondly, the semantics are simpler, making compiler optimisations easier.

Who said anything about slow?

This is true. The world would still run just fine if 80% of the C/C++ programs were written in <insert favorite language here>.

My claim is stronger than that: the world would run better if 100% of the C/C++ programs were written in better languages. The current status quo is nothing but a legacy problem, we rely on unsafe and semantically-challenged languages for historical reasons, not technical ones.

This isn't a claim about FP languages in particular, or the superiority of automatic GC, or anyone's favorite language. It's merely pointing out that we don't have to accept the egregious problems in C/C++ just to get the performance and other qualities we need.

Looking at the real "situation on the ground" right now, as Paul and Noel have pointed out, existing FP languages have already proven that they can compete with C/C++ on performance, even with automatic GC.

Regarding GC, if you truly have an application for which traditional automatic GC is unacceptable, there are alternatives like region-based GC (see e.g. MLKit for an implementation of this). But even if you have to resort to manual memory management, it can still be done more safely than C/C++ do it. While C++ has provided a number of tools that allow you to work around its problems, they don't solve them.

So I don't have any objection to wider use of FP langs. But if they all produce executables like Java does, I'll stick with C++, thank you very much.

Java is not an FP language, not even close. While I understand the logic that's led you to connect GC and heap allocation to Java's performance issues, all you have to do is look to real FP languages out there to find out that your conclusion is wrong.

"It's the Economy, stupid!"

My claim is stronger than that: the world would run better if 100% of the C/C++ programs were written in better languages. The current status quo is nothing but a legacy problem, we rely on unsafe and semantically-challenged languages for historical reasons, not technical ones.

This isn't a claim about FP languages in particular, or the superiority of automatic GC, or anyone's favorite language. It's merely pointing out that we don't have to accept the egregious problems in C/C++ just to get the performance and other qualities we need.

But the whole point is that we *do* have to accept those problems exactly because C/C++ *does* run so much of the world's software. The dirty secret is that if Java is currently the most popular language in terms of number of programmers, it is not so by a large margin, and it is more realistic to believe the reports that Java and C/C++ tend to trade places over time. That ubiquity is exactly why we do not see large projects written in *ML, Haskell, Scheme, (insert favorite "pure" language here). Nobody is going to spend money commercially to develop something that does not give a quick ROI compared to the entrenched languages.

When you compare the C++ toolchain to your average FP toolchain, there is no comparison. C++ offers far more choice of tools, libraries, and support vendors. So from a purely business perspective, FP as a paradigm has to either overcome a huge hurdle of entrenched imperativism, or it has to differentiate itself into a niche market and then evolve into the broader market like Java. Java started out as applets, plain and simple. FP needs to find its "applet". Until it does, people will be more or less forced to choose the C++/Java toolchains to achieve the support and predictability demanded by a fast-moving economy.

It's neither a technical nor an historical problem. It's an economic problem.

So, I agree that the world would run better if software were written in language X, for some suitably agreeable value of X, with the caveat that *X is the entrenched language*. Since the agreeable values of X do not currently satisfy the caveat, I must disagree.

Separating arguments

But the whole point is that we *do* have to accept those problems exactly because C/C++ *does* run so much of the world's software.

Yes, I've pointed that out several times, in different words. However, my whole point is that there's a strong tendency amongst people who use C/C++, and haven't been exposed to significantly better, but equally capable alternatives (i.e. Java doesn't count), to believe that things are the way they are for technical reasons related to their technical requirements. I'm pointing out that this is not the case.

somewhat puzzled

What does "proven itself" mean here?
Are you saying that (arbitrary language) Ada hasn't proven itself?

Surely not, I was using as we

Surely not. I used "as well" precisely to avoid that interpretation (is this a language problem? Should I have used "also" or "too"?).

language is difficult

It would have been clear if you'd been explicit: "Please don't forget that C has proven itself as well as Ada" (as well as any other language...)