CaSe SenSitIviTy! What is its purpose in programming language syntax?

I grew up with Pascal. It's been my language of choice for doing almost everything. Every time I try to switch to other (case sensitive) languages, specifically C++/C#, I am incredibly put off by the case sensitivity of the syntax.

What really gets me, especially from a readability point of view, are variable declarations in the following fashion:


TD_SOMETYPE td_sometype;

That's the most braindead thing I've ever seen, naming a variable after the type, except it's unique because the character case is different....

Not only does this kill readability, but the very nature of the case sensitive syntax means that you're constantly having to think about variable names, instead of just using the bloody things in that activity otherwise known as programming.

And don't get me started on the debugging headaches it causes simply because you typed "S" instead of "s", somewhere. To me, it makes the coding process needlessly complicated.

The ordinal value of a character shouldn't change its meaning except if it specifically occurs as data. Of course, the source code of a program IS data, but only for use by the compiler. Why should we as programmers have to suffer just to keep the compiler happy?

Somebody, please give me some good solid reasons why case-sensitivity is useful. M$ had a golden opportunity with C#, yet they kept it case-sensitive. WHY? Surely it wasn't to support existing code bases, was it?

Case sensitivity

Well, case-insensitivity means making the decision that certain characters, from a to z, are to be deemed equivalent to certain other characters, those from A to Z. You could call this an arbitrary and unnecessary complication.

I personally don't hold strong feelings

I personally don't hold strong feelings either way.

Case sensitivity does seem to fly in the face of hundreds of years of written conventions. We may wince at alice instead of Alice, but the meaning remains the same.

On the flip side, the main advantage is that it often saves the effort of having to think of a name for a variable.

Good code comprises small functions, so the meaning of something like:

TypeSignature typeSignature;

is perfectly clear.

Naming conventions

Having case sensitive identifiers can make code more readable, through the use of naming conventions. The obvious example from C is using ALL_CAPS for the names of macros and #defined constants. This avoids confusion with functions and variables, which otherwise look the same. Another example would be naming typedefs and classes with an initial capital, and using an initial lower case letter for variables, functions, struct fields, and methods.

I agree that 'TD_SOMETYPE td_sometype;' is pretty stupid, but that is an example of a programmer being careless about the choice of names, rather than a defect in the language.

As for coding errors where you type a capital instead of a lower case letter, the compiler will pick these up and show you where the error is. I cannot recall a single case where a typo caused a programme to be compiled with a bug, rather than the compiler bombing out with an error message.

It is also worth noting that modern human language writing systems based on Latin/Greek/Cyrillic alphabets use case sensitivity to enhance meaning or structure, e.g., an initial cap to mark the start of a sentence or indicate a proper name, and all caps for acronyms. Of course, errors in usage of case do not usually cause human language writing to be meaningless, but humans are more flexible than compilers.

Not being familiar with Pascal, I would like to know what advantages case-insensitivity brings: e.g. things which you can say more clearly with a case-insensitive alphabet.

Readability

"Having case sensitive identifiers can make code more readable, through the use of naming conventions."

Programs, perhaps even more than most prose, are "write once, read many times". Independent of naming conventions, the consistency enforced by case sensitive programming languages adds to their readability, because the same entity is always written the same way.

I suppose a rule contributing more strongly to readability would be to have case sensitivity, but also to disallow any identifiers in a given namespace from differing only by case. Anyone know of any languages with that sort of rule?

Not a language, but still

"I suppose a rule contributing more strongly to readability would be to have case sensitivity, but also to disallow any identifiers in a given namespace from differing only by case. Anyone know of any languages with that sort of rule?"

Sounds like Windows file system (is it NTFS I have on my XP?).

Visual Studio

NTFS is not case sensitive the way Windows uses it (it preserves the case you typed but ignores case when matching names), and yes, that is what you have on your XP 'puter.

Visual Studio with VB.NET, I know, will correct the case of a variable based on prior usage.
I don't know if there is similar functionality for C# (but I am pleased to be finding out soon).
So maybe the best case does exist, but I doubt it.

I hate sloppy coders and am very meticulous with the formatting of my code. I don't mind being reminded when I mistype a variable name. If you are struggling with case sensitivity, I suggest you be sure to use a good IDE. Color coding is invaluable for me when I am trying to remember a PHP function name. Code completion is great, except for when it is annoying (depends on my mood, I guess).

It should also be mentioned that

It should also be mentioned that there exist a lot of languages with different types of identifiers, separated by the case of the first letter.

Simplicity?

I think that case sensitivity is just used because it's simpler for the compiler, and that it doesn't matter.

You're missing case insensitivity only because you're used to it.

Case insensitive code is ugly

Careless use of case insensitive names looks horrible:

var WIdget = 1;
VAR OtHerWIDget = widgeT;

No one writes code like that, you always use some case convention. What use does a case insensitive language have if you already write your code in an (almost) case sensitive way?

Case sensitivity does not prevent UgLY CodE

Case sensitivity does not prevent you from creating UgLY CodE, imo. I agree that consistent conventions are already a sort of case sensitivity, but ultimately my point is that we are humans, and not machines. The code should serve us, not the other way around.

To my way of thinking, program code is supposed to enable a human-centric way of representing a solution to a computing problem. Enforcing case sensitivity does not increase the expressive power of the language; instead (for me) it becomes a stumbling block, albeit a minor one, to the successful solution of the problem. To me, it's a step backwards towards assembly language.

My biggest gripe is still that it allows the TYPE type style of declaration, like HWND HWnd. God, that irritates me to no end, and it appears perfectly acceptable to many C/C++ programmers.

I appreciate accuracy and consistency in program code as much as the next guy, but when the compiler is being anal about the case of a letter, it takes away from the expression of the problem and instead forces you to focus on the vagaries of the tool.

When you use a screwdriver, you shouldn't have to be worried about the intricacies of the tool itself - the tool should allow you to solve the problem at hand without creating problems that are due to the design of the tool itself.

Imagine a left-handed screw-driver.... luckily screwdrivers are case-ignorant :-p

My biggest gripe is still that

"My biggest gripe is still that it allows the TYPE type style of declaration, like HWND HWnd. God, that irritates me to no end, and it appears perfectly acceptable to many C/C++ programmers."

What's your problem with that?
I mean, it depends on the context, but if I have a function like:
void SetWindowTitle(HWND hWnd, const wchar_t* title);
What's wrong with that?
I think that's similar to naming a counter variable 'i': If the meaning is clear from the context, what's wrong with it? Calling it "iCounter" doesn't really make it more readable.

ATHING theThing

To me, C-style declarations can be read as "reserve some storage for one of these; and label that reserved storage with this identifier for future reference." It's as if there's an implicit indefinite article associated with the type, and a definite article with the variable name: so declaring "FOO foo" to me seems similar to allowing anaphoric reference to just "the otherwise-anonymous FOO that I built earlier". Strong medicine, and, as in natural language, it should not be used unless the role of the anonymous entity is absolutely unambiguous.

Anecdote about the perils

I used to believe case sensitivity was a mistake, until I did this in the case insensitive language PL/SQL (syntax now entirely forgotten):
function IsValidUserLogin(user:string, password :string):bool begin
   result = select * from USERS
            where USER_NAME=user and PASSWORD=password;
   return not is_empty(result);
end
This passed unnoticed for several months on a low-volume production system, and no harm came of it. But it is a nasty bug, sprung from case insensitivity, coding conventions, and the way humans read code. The lesson for me was that:
Things that are the same should look the same.

The GNU Ada compiler has a nice option to warn about case mismatches in an otherwise case insensitive language.

What's the bug?

I suspect it's the PASSWORD=password clause in the SQLish query, but I don't know PL/SQL well enough to know whether PASSWORD is interpreted as a variable name (same as password) due to the two names matching each other (modulo case), or whether the left side of a = in the query string is parsed differently than the right.

It appears to me that the problem

It appears to me that the problem is not really the case-ignorance of the language per se, but that the language allows you to inline the query members in a typeless fashion (even when you have "identical" params) instead of, for example, putting the whole query into a string type and executing the string type.

In my opinion the compiler should have at least generated a warning on the duplicate name, because ultimately it's a logic error.

I don't fully understand...

There is nothing typeless about that piece of code. PL/SQL is statically typed: because the code is compiled and executed inside the DB, it is fully statically checked against the schemas of the relevant tables. SQL queries are as much a part of the syntax as any other entity in PL/SQL; in fact, the language is a procedurally extended SQL.

I don't fully understand your final comment, so pick your answer: are any of the comments below apropos? If not, please correct me.

This tautology would be obvious to a computer, but I think it's not too hard to imagine a similar scenario where the computer couldn't help you.

The problem here is the combination of case insensitivity and shadowing. Disallowing the second would obviously have prevented this particular problem, as would introducing case sensitivity.

But I really believe both of these 'fixes' ignore the fundamental issue, that things that are the same should look the same, because we might rely as much on the 'shape' of words as on their actual letters when reading. But then, I also find rules shared by human and computer languages, such as "names/types should always be capitalized", sensible and helpful as well. They irk me on a logical level, but work for me on a psychological level.
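The tautology discussed above can be reproduced outside PL/SQL. What follows is only a minimal sketch, using Python's built-in sqlite3 module as a stand-in (an assumption for illustration; the original code was PL/SQL, and the table contents and function names here are hypothetical). SQLite also matches column names without regard to case, so a bare `password` inside the query text refers to the PASSWORD column rather than to the function argument.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE USERS (USER_NAME TEXT, PASSWORD TEXT)")
    conn.execute("INSERT INTO USERS VALUES ('alice', 'correct-horse')")
    return conn


def is_valid_login_buggy(conn, user, password):
    # The word 'password' in the SQL text is resolved as the PASSWORD column
    # (names match case-insensitively), so the clause compares the column
    # with itself and the 'password' argument is never used.
    rows = conn.execute(
        "SELECT * FROM USERS WHERE USER_NAME = ? AND PASSWORD = password",
        (user,),
    ).fetchall()
    return len(rows) > 0


def is_valid_login_fixed(conn, user, password):
    # Named placeholders keep the arguments visibly distinct from columns.
    rows = conn.execute(
        "SELECT * FROM USERS WHERE USER_NAME = :user AND PASSWORD = :pw",
        {"user": user, "pw": password},
    ).fetchall()
    return len(rows) > 0


conn = make_db()
print(is_valid_login_buggy(conn, "alice", "wrong"))  # True: the bug
print(is_valid_login_fixed(conn, "alice", "wrong"))  # False
```

The fixed version works for the reason given in the thread: the argument and the column no longer look the same, so neither the reader nor the query engine can confuse them.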

I refer to the "typeless fashion"

I refer to the "typeless fashion" of the language in the sense that it allows you to ambiguously use field names as if they were identifiers, without any specific type. The problem would have been avoided if the SQL field identifiers were typed, for example:

SELECT ... WHERE "PASSWORD"=password,

That would remove the ambiguity (a field name is a string, after all...).

The compiler/interpreter should be able to determine that the field identifier following the WHERE clause creates an ambiguity within the parameter list, and should issue a warning to that effect.

But in PL/SQL they are, basically

But in PL/SQL they are, basically, ordinary identifiers, and the FROM syntax is just another way of introducing a scope, like a function declaration. The dangerous part is probably that the identifiers introduced are, as you say, implicit rather than explicit, much like in Delphi's (?) fabled WITH syntax, which is well known to be dangerous.

An analogous example in a language without these kinds of expressions takes stranger coding conventions, but still works.

function ValidUserLogin(USER :String, PASSWORD :String) begin
    password :String = PasswordOfUser(PASSWORD);
    return PASSWORD == password;
end

Unicode, generality

Case isn't general in Unicode, so each parser/compiler has to define its own rules. (Eszett is a silly but common example of the need for this.) Since this would result in endless variation and confusion, case sensitivity is easier.
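To make the Eszett point concrete, here is a small sketch using Python's built-in string methods (the helper names are made up for this example). It shows two "ignore case" comparisons that disagree, which is the kind of choice each case-insensitive language would have to make for itself.

```python
eszett = "\u00df"            # LATIN SMALL LETTER SHARP S
dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def same_ignoring_case_naive(a, b):
    """Compare by lowercasing both sides, the obvious first attempt."""
    return a.lower() == b.lower()


def same_ignoring_case_folded(a, b):
    """Compare using Unicode case folding instead."""
    return a.casefold() == b.casefold()


# Uppercasing one character yields two characters.
print(eszett.upper())  # SS

# The two comparison rules give different answers for the same pair.
print(same_ignoring_case_naive("STRASSE", "stra\u00dfe"))   # False
print(same_ignoring_case_folded("STRASSE", "stra\u00dfe"))  # True

# Lowercasing one character can also yield two code points.
print(len(dotted_capital_i.lower()))  # 2
```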

Also, a case-sensitive program will run regardless of compiler/interpreter case sensitivity. This is not true of a case-insensitive program.

Not true

"Also, a case-sensitive program will run regardless of compiler/interpreter case sensitivity. This is not true of a case-insensitive program."
Here is a case-sensitive program that does not run in a case-insensitive implementation of Scheme:
(define E
  (lambda (x e)
    (if (pair? x) (E (car x) (E (cdr x) e)) (+ e 2))))
The only way to be safe is to rely on neither case-sensitivity nor insensitivity. And that's not the same as being case-sensitive.

Aziz,,,

CORBA IDL

while not normally a source of inspiration for PL design, had the following policy:

* IDL files are case-sensitive: FOO is a different identifier than foo or Foo.

* Symbols may NOT differ only by case. If FOO is a valid identifier in a given scope; it is an error to introduce foo or Foo as identifiers.

This policy was probably necessary to deal with the numerous target languages, some case-sensitive and some not. Code which has THIS property (cases are used consistently, and no two symbols which differ only by case) may be safely used in either environment.
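A rule of this kind is easy to check mechanically. The sketch below is not taken from any IDL compiler; `case_collisions` is a hypothetical helper that groups the identifiers of one scope by their case-folded form and reports any group with more than one spelling.

```python
from collections import defaultdict


def case_collisions(identifiers):
    """Return groups of identifiers that differ only by case."""
    groups = defaultdict(set)
    for name in identifiers:
        # Identifiers with the same folded form land in the same group.
        groups[name.casefold()].add(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


print(case_collisions(["FOO", "foo", "Bar", "baz", "Foo"]))
# [['FOO', 'Foo', 'foo']]
```

Code that passes such a check uses case consistently and never relies on case alone to tell two names apart, so it should mean the same thing in a case-sensitive and a case-insensitive environment.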

In many languages, case-folding is nontrivial

In many natural languages, case-folding is nontrivial or otherwise "icky". This is, IMHO, the main argument for case sensitivity.

Beyond that, I don't care much either way--arguments about whether case sensitivity or the lack thereof is better often remind me of arguments over endianness, tab settings in editors, and where to put the curly brace.

In other words--it's often important to HAVE a standard; but what that standard is probably doesn't matter much.

EDIT: Changed languages in the first sentence to natural languages--which is what I meant (as opposed to programming languages, which can define their case rules as they see fit).

history, purpose

Pascal was created to be an introductory language. [As such, it is better than C, C++, and Java, but it is inferior in most other respects.] As a noob, you may have been pleased to see that a variable named George is the same as a variable named george -- fewer "typos" means fewer bugs, right?

However, decades ago, before Pascal, when you /felt/ the size of your compiler/interpreter, why waste space teaching it to treat different characters as the same character?
(char)97 != (char)65
Variable names can be shorter, less descriptive, more 1337, easier to type. Code lines can be shorter, to fit in the 80-char terminal without ugly wrapping.
And conventions are as good an idea as namespaces.

My suggestion: get used to case-sensitivity.

Since when does case sensitivity

Since when does case sensitivity result in shorter variable names?

Oh, and because I'm a Pascal programmer by and large, I'm a "noob"?

Here's a clue. Catch.

Need case enforcement

I think any case-ignorant language should require the identifier uses to match the case of the identifier definition. That would eliminate one of the problems cited above. One disadvantage of case-ignorance is that you'd have to start separating words with underscores to make sure they never clash. For example: "PerSon" and "Person" (yeah, I need to find a more compelling example of this).

Case-sensitivity isn't all that bad. The inconsistent capitalization is just because the C language doesn't have different syntactic contexts for type names and variable names. The "Var : Type" syntax used by Pascal and other programming languages allows this (though I forgot whether Pascal actually does distinguish between type and variable identifiers). Besides, the uppercase/lowercase convention isn't universal; there's also the "_t" suffix.

Case-ignorance with case enforcement is, in practice, almost the same as case-sensitivity with distinct namespaces. The only remaining difference is that a case-ignorant system can't distinguish between "Person" and "PerSon", which is enough to put me in the case-sensitivity camp. But this disadvantage would be inconsequential to those who use underscores to separate the parts of an identifier.
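As a rough illustration of "case-ignorance with case enforcement", the sketch below models a single scope in which names are looked up without regard to case, but every use must be spelled the way the name was declared. The `Scope` and `CaseMismatch` names are invented for this example and do not describe any particular compiler.

```python
class CaseMismatch(Exception):
    """Raised when a name matches a declaration except for its case."""


class Scope:
    def __init__(self):
        # Maps the case-folded name to the spelling used at declaration.
        self._declared = {}

    def declare(self, name):
        key = name.casefold()
        if key in self._declared:
            raise CaseMismatch(
                f"{name!r} clashes with existing {self._declared[key]!r}"
            )
        self._declared[key] = name

    def use(self, name):
        spelling = self._declared.get(name.casefold())
        if spelling is None:
            raise NameError(f"{name!r} is not declared")
        if spelling != name:
            raise CaseMismatch(f"{name!r} should be written {spelling!r}")
        return spelling


scope = Scope()
scope.declare("UrlEncoder")
print(scope.use("UrlEncoder"))  # UrlEncoder
try:
    scope.use("URLEncoder")
except CaseMismatch as err:
    print(err)  # 'URLEncoder' should be written 'UrlEncoder'
```

Note that this scheme still cannot hold both "Person" and "PerSon" in one scope, which is the remaining difference from full case sensitivity mentioned above.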

The uses of case sensitivity

Your personal preferences are not necessarily shared by the rest of humanity. Case sensitivity is useful for coding conventions such as Taligent's or Symbian's. Anders Hejlsberg, the Chief Architect of C#, also designed Turbo Pascal and Delphi so it's hard to argue that he was unaware of the Real Ultimate Power of case-insensitivity.

I try and follow the Taligent

I try and follow the Taligent method wherever I can, but ultimately it's a convention for clarity to humans, not the compiler. It has nothing to do with whether the language is case sensitive or not, afaic.

I think both systems are too heavy.

I use a very simple coding convention:

1) Class identifiers start with a capital and contain no underscores. Words are separated by capital letters.

2) Local variables and methods are like class identifiers but with the first letter in lower case.

3) Member fields are like local variables prefixed with 'm_'.

4) Constants are in upper case with words separated by underscores.

Example:

class Foo {
    void method(int data) {
    }
    int m_data;
    static const int MAX_SIZE = 3;
};

We used to have Hungarian notation in our company, but we ditched it because it is very difficult to follow consistently, and it gets in your way when you are thinking about the problem at hand.

Case is often meaningful

If you transliterate physical equations to programs, you'll note that case is very important in standard physical notation or much of mathematics. For example, in thermodynamics, there are always kT terms hanging around, where k is Boltzmann's constant and T is a temperature. Substituting values, you often see something like:

k (273.16 K)

where the capital K means Kelvin. You certainly don't want to get these mixed up, any more than you want to get mW (milliwatts) or MW (megawatts) mixed up. And note all the times in mathematical expressions that you'll, say, have a small "t" and a capital "T" in the same equation.

Inventing new names for these variables just to get around your language's case-insensitivity adds a potential source of error.
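In a case-sensitive language the notation can be carried over directly. The snippet below is a small sketch in Python (not Frink), with plain floats standing in for real unit handling; the values of `T` and `t` are arbitrary examples.

```python
k = 1.380649e-23  # Boltzmann constant in J/K (lower-case k)
T = 273.16        # a temperature in kelvin (upper-case T)
t = 2.0           # an elapsed time in seconds (lower-case t), distinct from T

mW = 1e-3         # one milliwatt, expressed in watts
MW = 1e6          # one megawatt, expressed in watts

thermal_energy = k * T  # the kT term from the text
print(thermal_energy)
print(MW / mW)          # how many milliwatts make up a megawatt
```

With case-insensitive identifiers, `t` and `T` (or `mW` and `MW`) would name the same variable, and one of each pair would need an invented name.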

In my programming language Frink, I try to follow correct orthography for units of measure, including the style conventions for the SI. This makes it easier to write and understand the program--it simply follows normal notation as closely as possible. If you follow normal spelling, capitalization, and mathematical rules of precedence, Frink will usually understand it perfectly.

And, as others have mentioned, proper case insensitivity for more than one language can be hard. See some notes from the Frink documentation about casing.

I come from a math background

I come from a math background, and I really want to be able to write

for n:=1 to N do ... od

I also come from a real-world background, and I always thought China and china refer to different things:

enum countries { India, China }
enum products { tea, china }

Finally, I have been on IRC #math channels forever, and I observe that most students who write like "how do u solve U^2 - 2u + 1 = 0? NE1?" are also sloppy minds.

Don't do that

The answer to your concern is don't do that. Sure in Pascal PerSon and pERsON are the same thing, but I hope you would never mix case like that, even though it is legal.

Likewise in C I can have Person and person as separate variables, but in practice we just don't do that. In fact I have some code with two variables that could both logically be named person, and I spent some time to come up with variable names that would be descriptive (essentially say person) and yet be different by more than case. Even though I know that I'm allowed to differ variables by case only, I never considered it.

If you know the language at all you can look at FOO foo = bar; and know instantly that FOO is the type and foo is the variable. In practice I discourage such notation, but it isn't confusing, because those are not two different variables; they are a type and a variable. Even so, I won't do it. (Except for HWND hwnd, where the convention of everyone doing it is more important than code purity.)

Case Sensitivity and Unicode

Michael Greenberg is almost certainly correct in pointing to Unicode as the main reason for the survival of case sensitivity in modern programming languages. Every programming language is a compromise between various competing forces and, at the present, internationalisation via Unicode is one of those forces. Unfortunately this can significantly complicate the programmer's job.

For example, Java allows identifiers to be in Unicode. Harold Thimbleby pointed out that this means that it would be possible to have two distinct variables named 'A', one from the Latin alphabet and one from the Greek. This introduces a completely new class of programming error which can only be debugged by looking at a hex dump of the program source code! (Alright, you could go through the source code and delete and retype all the occurrences of that variable name, but that is equally silly.)
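The same situation can be shown in Python, which also accepts Unicode identifiers. This is only an illustrative sketch: the two names are built from escape sequences and run through `exec` so that the difference stays visible in the source, which would not be the case if the characters were typed directly.

```python
import unicodedata

latin_a = "A"            # U+0041
greek_alpha = "\u0391"   # U+0391, usually drawn exactly like the Latin A

# The code points and their official names are the only reliable tell.
for ch in (latin_a, greek_alpha):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Both are valid identifiers, and they name two different variables.
namespace = {}
exec(f"{latin_a} = 1\n{greek_alpha} = 2", namespace)
print(namespace[latin_a], namespace[greek_alpha])  # 1 2
```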

The Unicode standard does address

issues of using Unicode for programming language identifiers: see Unicode Standard Annex #31 (rev 5), part of the Unicode 4.1 standard.

The Annex includes recommendations for case folding and case sensitivity, as well as other thorny Unicode issues (such as normalization). It doesn't make a recommendation for or against case sensitivity, though case folding--as noted above--is tricky in Unicode.

It doesn't address the issue of different code points (with different functions) being represented with visually similar glyphs (an issue which goes beyond letters--consider the large number of slashes, quotes, dashes, and parens defined in Unicode); though Unicode Technical Report 36 (which deals with script spoofing and numerous other security related issues) does deal with this (though from a security perspective).

Not all case insensitive languages

Not all case insensitive languages are seen as toy or learning languages. Lisp, for instance, capitalizes all symbols unless they are defined with a special syntax or READ is altered. That makes it effectively case insensitive, since almost all symbols are case folded.
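A much simplified model of that reader behaviour, written in Python rather than Lisp, might look like the sketch below. The `read_symbol` function and the use of vertical bars as the "special syntax" are assumptions for illustration; real Lisp readers have more rules than this.

```python
def read_symbol(token):
    """Return the symbol name a simplified Lisp-style reader would produce."""
    # A token wrapped in vertical bars keeps its case exactly as written.
    if len(token) >= 2 and token.startswith("|") and token.endswith("|"):
        return token[1:-1]
    # Every other token is folded to upper case.
    return token.upper()


print(read_symbol("car") == read_symbol("CAR") == read_symbol("Car"))  # True
print(read_symbol("|car|") == read_symbol("car"))                      # False
```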

Case sensitivity is just a bit of semantics that you get used to, sort of like Python's whitespace/block system.

Come to think of it, the requirement

Come to think of it, the requirement for compatibility with the expectations of the many millions of existing C and C++ programmers was obviously a much more significant force for case sensitivity. I was trying to get the idea over that a language may well be sub-optimal when viewed from any individual perspective but can still be optimal when viewed as a whole. Case sensitivity is one of these perspectives.

Being an old Pascal programmer, my natural leaning is towards case insensitivity, however, as I get older I am tending to regard syntactic issues as irrelevant noise which just gets in the way of understanding what the program actually does (maybe I am transforming into a Lisp programmer!).

I dunno

A class named "Apple" describes what it is. A variable named "apple" then makes perfect sense:

Fruit apple = new Apple();
Person person = new Person();

I do it all the time...am I bad?

If you stick to that naming convention, you'll be fine

But what happens (in Java) if you declare two identifiers at the same scope, which differ only in case? I don't think the language itself cares, but given that package and class names map onto the filesystem of the underlying OS, and certain OS's (i.e. Windows) have case-insensitive filesystems, what happens?

Again, it's the naming conventions...

That's why you have the naming conventions (well, among other reasons). If you stick to them, you'll only use CamelCase with classes, and you'll never have a class named Camelcase or camelcase or CAMELCASE anyway. Consistent rules for abbreviations help a lot here too.

But you're right, it can lead to big problems if you don't stick to conventions. I once had a compiler design class where the professor gave us two classes, Exp and EXP, both in the same package. They represented the same thing, but at two different phases in the compilation process. (Yeah, they should've been in different packages, but professors are lazy.) I had to rename the class and all usages before I could get it to compile on Windows (and actually, the copy overwrote the original class, so the first indication was a missing method on the class).

In practice, I've found this is only a problem in school, where the professors are too lazy and the students too arrogant to stick to naming conventions. In real programming jobs, you learn very quickly that you stick to the conventions or die. Your coworkers tend to make sure of it, if your deadlines don't. ;-)

Both solutions to the same problems

I think there are other problems here (for instance some languages allow the exact same name to mean different things depending on usage (I believe Lisp has this as a legacy)). But, in the end I think case sensitivity and case insensitivity are divergent solutions to the same problems: confusion about names is bad, and case can help or hurt.

1. Case sensitivity avoids the problem of misreading two uses of the same name that differ by case (either on purpose or by accident). The tradeoff is that without smart editor support the user gets confronted with an aggravating and time consuming error at compile time if a typo occurs (was that class called UrlEncoder or URLEncoder?). And while it (may) allow users to reuse names, and some coding conventions might like this, it doesn't prevent someone from sowing the seeds of confusion by declaring (purposely or by accident) two names that differ subtly by case (i.e., if there are both a URLEncoder and a UrlEncoder).

2. Case insensitivity eliminates this masking, and it mitigates the effects of typos - I believe there are some papers that discuss the value of this with respect to command line interfaces and making things easier in general. But (in absence of a smart editor or compiler) it doesn't actually force typos to be fixed and this affects maintainability.

Which of course leads to what smart compilers and editors do - try to help users fix these problems early and maybe even enforce naming constraints (essentially a union of the two approaches).

Dave Christianson

Comparisons with other systems

English has use for case when reading, but has no aid to memorising case - that is, I can visually see that "XF86Config" is not the same as "xf86config", but there is no easy way to remember which is which, or to verbally discuss them, except "The ecks-eff-eight-six-config with the capital x, f and c".

Yes, when I was beginning Java, I also found it infuriating to see:

SomeType someType = new SomeType();

and not understand what was happening - at least partly because when reading I can't pronounce case differences, so it reads as "sometype sometype equals new sometype".

This is possibly what leads to my annoyance when I think "you-are-ell encoder", write "URLEncoder", and then find it should have been UrlEncoder with this library.

However, I would rather a language let me use differing case where I feel that it's an acceptable way of expressing something, than for it to force me into having "Apple objApple" or some other stilted and awkward workaround.