In Search of the Ideal Programming Language

The ever-enticing search for the ideal programming language produced this 1997 article by Sergey Polak. Although the article is somewhat dated, I liked its comments about strings:

The discussion of arrays also brings to mind the subject of strings. No matter what anyone says, it is my firm belief that any language, regardless of its purpose, must have a powerful and flexible string-handling facility built-in. A program is very rare if it has no need for string handling, and I myself have had to write a great deal of programs, both at work and for my own uses, that depended heavily on strings. Some languages put strings in as an afterthought, and others put in some very basic features and leave the rest to library routines. That just can not be. The text string is a fundamentally important data type and can not be ignored, nor can it be relegated to blatant impersonation by some other type, such as array of characters. A string data type is required in a good language.

The very popular language C, and C++ as well, have horrendous string-handling facilities. Not only is the programmer required to declare his strings as character arrays, but there simply is no way to deal with strings as entities in the language.

Ouch. So true. That is not to endorse the specific string implementation recommendation from the article. (I have previously commented about implementation ideas, including communication buffers.)

Do the man a favor and save the article to disk for offline reading so as to minimize his bandwidth hits.

P.S. You're welcome, Ehud. I'll now be Internet-disabled for a week.



One of my favorite subjects.

As far as I can tell, no language gets this right. I like SNOBOL and Icon, but their approach is too heavy-handed for a general-purpose language, and too obscure for most purposes.

Don't get me started on C, Pascal, Ada and their ilk.

Who uses character arrays for

Who uses character arrays for strings in C++ anymore? That's what std::string is for.

Verbing weirds language.

No matter what anyone says, it is my firm belief that

Is there any point in arguing with a person who says this?

any language, regardless of its purpose, must have a powerful and flexible string-handling facility built-in.

This misses the point. Sure, we want a "powerful and flexible string-handling facility", but if the language is powerful enough to define abstractions which behave as if they were built in, it is better not to build them in, but rather to put them in a library.

The people who want this and that built in are the people who do not understand the meaning of "compositional", and cannot envision a programming language better than the ones they have used.

Sorry to disappoint you

We're not talking about "this and that," but the very fundamental things called strings. Every language has some built-ins. Many (most?) include strings as built-ins, and for those that don't, I consider it folly to build in dozens of numeric types (signed and unsigned byte/short/long/double-long, float, double, extended, complex float, complex double, complex extended, etc., etc., etc., etc.) while relegating strings to the libraries.

Where do you draw the line, Frank? Why not drop all the way down to a language that only shuffles bits around - since all data is composed of bits - and puts everything else in libraries? If anything merits built-in status, it has got to be strings.

I don't understand what Ehud means by heavy-handed. I have yet to find Icon's equal, and std::string is not it.

I am not a crook!

Where do you draw the line, Frank?

That's easy: if it's definable and efficient, I leave it for the libraries; if not, I make it primitive.

Why not drop all the way down to a language that only shuffles bits around - since all data is composed of bits - and puts everything else in libraries?

I would if I could! Indeed, I have considered it: a functional language with a single base type bool.

But all data is not "composed of bits". All data is representable using sequences of bits. You can also represent all data using sequences of characters. Or as natural numbers—no sequences needed. Or as trees, etc.

However, even for any one of these, the choice of representation is extremely arbitrary. For example, a representation as sequences of characters amounts to a parser for a language of strings, and obviously there is no best choice. A representation in the naturals is similar to a Gödel encoding. What you really want is to abstract away from this arbitrariness, while still supporting each possible representation. And that is the point of my research.

Case sensitivity and other atrocities

Some other issues from the article:

On case sensitivity:
"This is something I am not allowed to do when I program in C, C++ or Java. If I need to use the if construct, I must type it in lower case if I want the program to compile. This restriction certainly makes the job of writing the compiler easier by about one iota, but accomplishes nothing....A good programming language will not be case-sensitive."

A bit of sarcasm:
"Luckily this has been put into a convenient function that goes by the name strcpy() (the "o" must have been omitted for the sake of efficiency)."

(AFAIK, the 'o' is missing because early linkers used only the first six characters of symbols.)

On array indexing:
"What makes matters much worse is that in C, and subsequently in C++, all array subscripts begin at 0. If the application calls for an array with subscripts ranging from -15 to 134, the programmer must write his own code to do the necessary arithmetic."

On the putrid symbols "{" and "}":
"The words BEGIN and END stand out, jump at the reader from the page, indicating a block, whereas the tiny {} braces can get lost in volumes of code, even with proper indentation and syntax highlighting."

The impression I get is that the author has decided that the ideal programming language already exists, and it is called "Pascal."

Frank is mostly right

Strictly speaking, text strings are not essential. A string is an array of some element type, usually monomorphic and usually of variable length. Most languages support such structures.

What makes them text is simply an external interpretation of their component values as characters or glyphs, and that interpretation is itself defined externally, by standards such as ASCII or Unicode. Sure, support Unicode semantics, but clearly with a library.

The fact that array support in many languages, C included, is very primitive is a completely orthogonal issue.

Fix the array system, and you get perfect string support.

Not always so

I'd agree with Marius that you're pretty doomed if the language does arrays badly.

One point about strings that I don't think is stressed enough, though, is that different problems call for different representations.

Array-based or null-terminated strings are good for simple work. Ropes are very useful for problems that involve lots of string mashing, e.g. SGI STL's std::rope. Interned or managed strings pay off when comparisons are the order of the day, à la java.lang.String.

I find the main constraint is system library calls that require one particular form of string instead of an iterator or some other abstraction. There are separate-compilation and instantiation concerns with that approach, though.