switch statement design

Recently there were a few discussion on some MS blogs about the design of the "switch" statement. I'm curios what readers of LtU have to say about it.

See: Links to blog discussions about "switch".

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Divide and prove

Having read several papers on dependent types, I now look at all branching instructions as dividing problem with "big" type into subproblems with "smaller" types. For example:
// takes string of any length
X blah(String s)
{
  return s.length() > 10 ? doh(s) : dooh(s);
}

// takes string of length > 10
X doh(String s) {...}

// takes string of length <= 10
X dooh(String s) {...}

Of course, in PLs with more advanced type systems the statements in comments become part of the types of the functions.

Any "smart" use of C++ switch can be seen as a split of one type into disjoint union of other types. This requires runtime type check ("predicate"), but gives power and uniformity far beyond those of C++. Basically, any branching can be expressed by casting.

So my answer to the question "How would you like your switch?" is "à la Epigram".

PS: a quiz for those who didn't read the blogs about "switch": what language is praised in the following quote?

It's the first language I've used that is, for the most part, consistently designed from start to finish - everything makes sense, has a purpose, and coaxes the developer toward writing bug-free code.

Stories are converging

Well, some people said this better then I:
As Filinski has pointed out [5], a product type in continuation space is equivalent to a sum type in value space. For example, we can regard the %if function as being the converter between these two forms for the boolean sum type.
This quote is from Multi-Return Function Call, discussed on LtU. The same goes for switch, I guess.

BTW, Filinski's Master thesis was already discussed on LtU, as well. It's really interesting how different stories intertwine.

After accepting the notion of "sum values are equivalent to product continuations", one needs just to buy usefulness of Wadler's "views" to slice (algebraic) types arbitrarily as opposed to just by their constructors, and here you have another (happy) convert to the Epigramic way :-)

Is the question about break?

Just glancing at the links, the source of the irritant seems to be the C based break required to prevent falling through on the switch statement. I'd prefer a Pascal sort of construction myself, with each alternative being a block (syntactic sugar for a multiple if then else construct). Fallthrough is rare and more times than not it is an accident.

Be that as it may, the C switch construct is what it is, and can be used to get the correct program flow. A larger question from Java is why the case statements are limited to primitive integers? Why can't you branch on other types of values? Why can't you branch based on a range a values? And finally, why can't you branch based on conditions that don't represent a discrete value?

Of course, the OO programmers will tell you that a switch statement of any sorts, especially the longish ones, is a good indication that there is a type (object) waiting to be factored. That's all well and good, and switch statements should be looked at to determine the proper factoring. But software, at least of the Extreme kind, is an iterative process that does not require everything be exactly factored along proper boundaries. Do the least to get the software functioning correctly in the smallest amount of time.

Of course, it's probably telling that Smalltalk (from which XP arose) does not have a native Switch/Case construct. :-)

Tradeoffs

The C switch statement is limited to integers because it's a tradeoff. It allows the statement to be optimized into a jump table at the ASM level, and thus run in constant time (whereas a large if/else if construct has a linear worst-case time).

A language could choose to allow arbitrary data in switch cases, but the resulting machine code would run no better than an if/else if construct.

There's a good discussion on the C switch, with example C and ASM output, over on PerlMonks.org.

Doesn't convince me

A compiler should be able to do the static analysis on a statically typed language to emit different code depending on what type is being switched on.

Which could then open the door for writing custom optimizations for different types with a general, slow, but correct, way otherwise.

Not just efficiency

This misses the point, though, that part of C's closeness to the machine depends on each language construct having a single underlying machine implementation. It would be harder to write fast or (more precisely) predictable C code if what you got from a switch was so dramatically at the whim of your sufficiently smart compiler.

It already is a whim ...

Most optimizing compilers will compile even an integer-only switch into jump tables or chains of if-else, or combinations thereof, depending on how sparse the cases are.

In the past, sparse switches

In the past, sparse switches were often handled by just doing a sequence of subtractions of constants until zero was found (not much more efficient than a chain of "if-else").

Modern compilers use algorithms such as binary search, so you can rely on switches being pretty efficient, even for sparse cases.

ISTR reading that Hoare argued in favor of a C-switch like construct in a meeting (predating C) discussing Algol design. The basis of the argument was very much from a compiler implementation perspective (at the time, making language constructs simple to implement correctly and efficiently was a priority), which is probably also the rationale for the way switch is in C.

Does this predictability argu

Does this predictability argument really apply to modern C compilers with having to handle quite complicated pipeline and caching issues? In other words, there are many other places where C code is already quite unpredictable. Since the solution of both problems is profiling, I can't see the problem.

m[ia]cro-tweaking

As with all things, it is a matter of degree. I guess I would consider the cases you mention (insn reordering) "micro-tweaks", in that they are local transformations that don't have any real reflection in the source language. While compilers do higher-level things with no associated source, like unrolling, blocking, and vectorizing loops, these optimizations are at least controlled by flags (and are often done explicitly in libraries). Sure, profiling is good and necessary for tuning a large app, but in C you typically have a decent idea what the compiled result will be for a particular function without having to run it through the compiler and see. Contrast this with compiling Lisp with SBCL. If you give it enough information, it will generate very good code, but to make it do so, you have to become intimately familiar with the optimizer. In other words, one often has little idea what the compiled form of some piece of code will be (performance-wise-speaking, of course) without testing it to see if it needs another type declaration somewhere.

This property is valuable in at least two ways: First, if you're writing a compiler that goes through C, you want to have a reasonable idea how each piece of C it generates will look in assembly before you have to take a profiler to it. Second, if you're writing a piece of code you want to perform well on multiple compilers, it may not be possible to profile with each. In other words, you need to have a sense of what Joe average compiler will do to your code, and write an app that doesn't suck when compiled by Joe.

usablity

Although efficiency is important in designing a language feature I think that others should be taken into account: how easy it is to reason about the correctenss of the program? how much power does it bring to the programmer? how robust will it be to changes in program text?

If you read one of the last posts in jaybaz's blog you will see the "line of thought" that made me give these suggestions for "switch":

  • allow it on any type
  • remove "break", "goto" and even the "case" keywords
  • add powerfull ways to specify partitions of a type's set of values (e.g. regular expressions for strings, integer ranges like 3.. 8, etc.)

The most important source of inspiration in suggesting these was pattern matching in OCaml. I've heard there are extensions that do matching on ADTs but I've been shy to propose it in the C# world..