Static typing may require runtime type checking?

This may be obvious to many of you, but it got me thinking:

module Main where

data Z = X Int | Y Int

xy z = case z of
X a -> "X"
Y a -> "Y"

x = X 0

y = Y 1


Above Haskell program works as expected: (xy x) outputs X and (xy y) outputs Y. So although Haskell uses static typing, at times it needs to do type checking at runtime? Any studies on overhead of runtime type checking in languages employing static vs. "dynamic" typing?

Comment viewing options

This is not runtime typing

This is a common misconception. Case analysis over algebraic datatypes is not runtime typing, no more than, say, the case distinction over integers in

sgn n = case n of
0 -> 0
n -> n div abs n

Technically, datatype constructors have the same kind of status in a type system as record labels. You would not think about the latter as introducing dynamic typing either. Constructors are just indexes, and case an indexed branch. In fact, your example already shows that it is not about types: both constructors carry an Int, so which one is seen does not affect typing.

In a sense, it is.

While the complete coverage of the dispatch is statically checkable, there is a dynamic type dispatch occurring here. If this should not be considered a form of dynamic type checking, what should we call it?

In abstract, it seems to me that a fully statically checked language should be incapable of conditionally generating exceptions based on run-time values, yet we commonly accept violations of this rule on vector bounds checks, which seems to raise a philosophical question:

Where is the line, exactly?

No, it isn't

Dispatch based on dynamic values is not at all related to typechecking. Typechecking is about ruling out certain behaviors statically. Claiming that all non-constant functions are doing 'dynamic type checking' makes the term meaningless.

As for 'fully statically checked', the term ultimately doesn't make sense. Every interesting program has dynamic behavior based on input. Trying to identify when we have reached 'full' checking is a fools errand - some things are predicted statically in some languages, and not in others. In no language is everything predicted statically, and in no language is everything determined dynamically.

Yes and No

In a program in a pure language where all inputs are known at compile time, it is, in principle, possible for the compiler to reduce the program to its output. My poster-child example for this is the C++ template factorial metaprogram: given a constant integer at compile time, the compiler produces a structure whose "value" field is the factorial of the input. The only thing that happens at runtime is the retrieval of that value and its presentation.

I take shap's point (at the risk of putting words in his mouth, but at least he's around to correct me!) to be that, if a given function is total on its domain, then there shouldn't be a means by which the function can conditionally violate its totality. For example, OCaml (rightly) disallows narrowing type coercions precisely because that operation can only be partial, possibly raising an exception, and the propagation of that exception would make what might otherwise be a total function partial.

I also take Sam's point, though, that just because a language construct takes a dynamic value and chooses an action conditionally based on that values, it doesn't mean that we're now dealing with dynamic typing. But getting to clarity about why not probably involves a conversation about observable vs. non-observable effects, abstraction boundaries, etc. Perhaps we should try to cast such a conversation in terms of some specific calculus—maybe one of the recent calculi of types and effects—especially if the calculus supports some notion of dependent types, in which these issues become even more blurry when treated informally, but become more crisp and clear once formalized.

M

In a program in a pure language where all inputs are known at compile time, it is, in principle, possible for the compiler to reduce the program to its output. My poster-child example for this is the C++ template factorial metaprogram: given a constant integer at compile time, the compiler produces a structure whose "value" field is the factorial of the input. The only thing that happens at runtime is the retrieval of that value and its presentation.

This is true, but trivial. Programs where all the inputs are known ahead of time are not interesting.

if a given function is total on its domain, then there shouldn't be a means by which the function can conditionally violate its totality

This is just a tautology. If it's total, it must be total. But note that totality and static checking are not the same thing at all. Also, lack of exceptions and static checking are not the same thing.

I also take Sam's point, though, that just because a language construct takes a dynamic value and chooses an action conditionally based on that values, it doesn't mean that we're now dealing with dynamic typing.

If that is what we meant, then every program, ever, is 'dynamically typed'. While I'm not a big fan of the term, we still shouldn't interpret it in such a meaningless way.

Converging to a point?

Sam Tobin-Hochstadt: Programs where all the inputs are known ahead of time are not interesting.

No, but a language specification and/or implementation being able to tell, and doing the right thing, might be.

Sam Tobin-Hochstadt: This is just a tautology. If it's total, it must be total. But note that totality and static checking are not the same thing at all. Also, lack of exceptions and static checking are not the same thing.

Definitionally, I absolutely agree with you. The fact is, however, that in many (most?) languages, this tautology is violated with wild abandon.

Sam Tobin-Hochstadt: If that is what we meant, then every program, ever, is 'dynamically typed'. While I'm not a big fan of the term, we still shouldn't interpret it in such a meaningless way.

I get the impression that, somehow, you've arrived at the impression that I'm disagreeing with you. I'm not. On the contrary; I'm attempting to explore where the intersection of your observations and shap's lies, given that shap has developed at least two operating systems and at least one new programming language that has very interesting things to say about type inference and mutation, and is explicitly intended to support software verification in some fashion. Personally, I think the intersection justifies a closer look at dependent types, but I also take a point that shap has made to me both in personal communication and on the BitC mailing list that dependent types can be very problematic for a language like BitC, IIRC, by virtue of introducing the possibility of leaking authority via the escape of lambdas outside the context of their construction.

There's a lot to think about here. That doesn't mean that I disagree with the need for terminological exactitude—again, on the contrary. It just means that I trust that shap, at least, knows what he means, and that one way to arrive at clarity might be by reference to a specific calculus, or family of such calculi.

Terminological confusion

The fact is, however, that in many (most?) languages, this tautology is violated with wild abandon.

What do you mean by that? How could a tautology be violated?

On the larger point, while I think highly of shap's contributions as well, I think what he was pointing to as a philosophical question is instead created by confusion of terminology.

I also really don't understand the point you're trying to make about dependent types - the particular things checked by a type system don't enter in to the point I was trying to make.

Trying to Get Taut

Sam Tobin-Hochstadt: What do you mean by that? How could a tautology be violated?

This specific tautology is routinely violated in popular programming languages, if for no better reason than that, while the programmer might intuitively expect a function to be total, it often won't be due to a failure to account for the range of dynamically caused exceptions that can be raised, not necessarily by the function itself, but rather by functions below it in the call chain. Saying that "the function is total or it isn't" is, of course, true as a matter of mathematical logic, but in my opinion isn't particularly helpful in the face of writing real software in real languages. And shap's comment was, after all, specifically about the possibility of a conditional expression given dynamic input conditionally raising an exception, i.e. not being able to statically prove that the function is total.

Sam Tobin-Hochstadt: On the larger point, while I think highly of shap's contributions as well, I think what he was pointing to as a philosophical question is instead created by confusion of terminology.

That's certainly possible, so I hope that shap will elaborate.

Sam Tobin-Hochstadt: I also really don't understand the point you're trying to make about dependent types - the particular things checked by a type system don't enter in to the point I was trying to make.

The point that I was trying to make is that it's hard to say things like "the particular things checked by a type system don't enter in to the point I was trying to make" with any bite, given things like GADT's (as another poster pointed out) or dependent types. Also, of course, that the question of what are "static" and what are "dynamic" semantics of a language become much more complex, and thankfully much more interesting, given a dependent type system than one that isn't. In particular, learning what, e.g. Haskell or OCaml can or cannot statically prevent isn't very informative relative to learning what, e.g. Epigram or Agda can or cannot statically prevent.

What's a functional programming language?

I would agree with those who, in broad strokes, paint JavaScript/Python/Ruby/Lisp as "the same language". I'd also agree that Python and JavaScript are imperative language while Lisp is functional; "functional" versus "imperative" applies more to individual programs than languages, and I think applying these labels to languages themselves is more a statement about how the community typically uses the language.

I think what Paul was trying to say is that a ADT's idiomatic disjunction includes every possible case; one could certainly omit one of the cases, but this less than idiomatic. (Though some people idiomatically omit a case if they know it's not possible via purely local reasoning)

Compare this to the typical usage of dynamic typing, on the other hand, where the prevailing idiom is to not use typecases, assume that you got passed one specific type (or a small idiomatic disjunction of types, e.g. null? or pair?) and let a run-time error occur if somebody passed in a kind of value you weren't expecting.

Confused

I'm really confused as to what point you're trying to make here, and what in my post you're replying to. I didn't say anything about functional languages, or about typecase, or about idiomatic code.

"Totality" of a Scheme function versus ML array dereferencing

In retrospect don't think Paul was trying to say that exactly, but I'll try to clarify what I'm trying to say.

The vast majority of Scheme functions are not total over all sexps, and while the domain of most functions are relatively understandable, a few Scheme functions make rather convoluted, undocumented assumptions about what their domain is. I remember one time I fixed a bug in somebody else's code, I took it to them and while they were happy with the improvement, they argued it wasn't really a bug, just a hidden assumption. I thought it was a pretty ludicrous assumption, and still would classify it as a bug, not least because it did something incorrect outside its domain instead of raising an error, even if it was "known" but undocumented.

Compare this to ML, where the domain of a function is usually assumed to be the set of all inhabitants of the argument's type, instead of some subset thereof. As shap points out, there are some notable exceptions, array indexing being an obvious one. :-)

So while there is a great deal of similarity between value-based dispatch and dynamic type-based dispatch, at the end of the day it feels very different due to the prevailing idioms: much like Lisp is a functional language while JavaScript is imperative, even though these two languages are more alike than they are different.

Ultimately I would not call the original post "runtime type checking," basically agreeing with both you and Andreas, however I'd also argue that the difference isn't as profound as you might suggest. Sidhu demonstrates an unfamiliarity with the vocabulary of PL Theory, but the thought process isn't so terribly wrong either.

So what should we call this

I'm less interested in whether this is dynamic typing than in learning what it should properly be called.

That's the issue

At the heart of this thread is a discrepancy between the statically typed language's and dynamically typed language's definitions of 'type'. In a dynamically typed language, 'type' is practically synonymous with 'constructor tag'. Though you should be careful using such terminology around PLTs ;).

Call it discrimination on a

Call it discrimination on a variant.

In any case, it certainly isn't 'dynamic typing' because, with a dynamic type, I could do such things as pass a value like 'U string' to my function of type '{type: X int | Y int} -> string', and doing so wouldn't be caught until the pattern fell through and either raises an exception or performs an implicit conversion to (glass half full ? make everyone more productive : break the code someplace else).

I say potato, you say...

I think this is a bit of a word game here, not unlike the classic debate over "goto considered harmful" versus proper treatment of tail calls... a debate that seems to have been settled decades ago unless your name happens to Guido. ;-)

To bolster Shap's side of the story, dynamic typing and discriminated union share the same implementation techniques; whether it be using the high- or low-order bits of a pointer, or keeping the tag in a separate byte, etc. And if you want to write an interpreter for a dynamically typed language in ML or Haskell, then your value type would likely be one big disjunction...

tomato

To bolster Shap's side of the story, dynamic typing and discriminated union share the same implementation techniques; whether it be using the high- or low-order bits of a pointer, or keeping the tag in a separate byte, etc. And if you want to write an interpreter for a dynamically typed language in ML or Haskell, then your value type would likely be one big disjunction...

This is fundamentally confusing the meta and object levels. A Scheme interpreter written in ML is a typed program, but the Scheme programs it runs are not typed.

But the object level informs the meta

Classic example being the "discovery" of Scheme itself, when Guy Steele and Gerry Sussman started looking at the implementation of what they had done and realized that they could remove the "actor" construct from Scheme because the implementations were basically the same, and that actors were "just" lexically scoped procedures in the continuation passing style.

object/meta

First, I think you mean that the meta-level can inform the object level (actor' and lambda' were object-language constructs in the original Scheme interpreter).

Second, it's certainly possible to learn things about a language by implementing it. But that doesn't mean that the implementation technique *is* the language. We can implement integers in many different ways, but we shouldn't imagine that any of them *are* the integers.

Re: object/meta

Probably, I hadn't ever quite heard those terms before in this context, so I was trying to infer their meaning from their usage.

Abstract algebra often treats isomorphic things as equals. While I tend to agree that we shouldn't start confusing implementations for their langauges, if two things can be implemented the same way they probably are more alike than they are different.

Again, I agree with both Andreas and you, and don't really want the current terminology to change. I'd summarize this thread so far as:

1. Andreas points out Sidhu's violation of the shibboleths of current PL jargon.

2. Shap points out that there is similarity too. You, me, and Paul argue about this in all the children of his post.

3. Meanwhile, John Cowan points out a key substantive difference between run time typing and disjoint unions down in his reply, ever so lonely as of now. :-)

meta/object

I agree with most of your post, but I wanted to point out that while this:

if two things can be implemented the same way they probably are more alike than they are different.

is true, it isn't relevant here. If you were implementing ML in ML, and Scheme in ML, you wouldn't use the same technique to implement variants in ML as you would for the s-exp datatype.

By no means...

...you can certainly implement a "dynamically typed" language in a way that that does not make use of any tags or discriminated unions. Namely, you can implement the pure, untyped lambda calculus. There is no possibility of any kind of runtime error, and so an implementation of it does not need to do any tag checks.

This is not a trivial point; the fact that data flow can be turned into control flow, and vice-versa, lies at the heart of Church encodings, continuation-passing style, defunctionalization, the construction of abstract machines, the expressiveness of System F, the relationship between lazy and eager evaluation, and many other subjects in PL.

Dispatch based on dynamic

Dispatch based on dynamic values is not at all related to typechecking.

I don't think that's precisely true. If Z were a GADT, the branch could be selected statically. It all depends on the degree of precision in your types. At one end of the spectrum we have a program who's execution is fully specified in the type signature parameterized by the inputs, and at the other end we have dynamically typed languages where no type information is known statically.

It all depends on the degree

It all depends on the degree of precision in your types.

Right. It all depends on the type systems. It is not a coincidence that some errors are termed variant errors rather than type errors, and some constraints are termed subtype constraints rather than type constraints...

If Z were a GADT, the branch could be selected statically.

No, that's not right. GADTs allow the return type of the constructor to vary based on which constructor was selected. This doesn't mean that which branch will be taken can be predicted statically. In the classic example of an interpreter in Haskell using GADTs, the branches couldn't possibly be predicted statically.

Case analysis

shap: While the complete coverage of the dispatch is statically checkable, there is a dynamic type dispatch occurring here. If this should not be considered a form of dynamic type checking, what should we call it?

What's wrong with case analysis? Or discrimination?

In the rest of your posting I suppose you are alluding to non-exhaustive cases. But they are just (occasionally convenient) syntactic sugar - their meaning can be explained entirely statically and locally, without the need to introduce types into the operational semantics. That is the line, I would argue. Not all languages allow such cases, by the way, and most others have obligatory warnings, i.e. qualify them at least as "soft" type errors.

If you think about what a Z is ...

you see that it's just an Int with an extra bit saying whether it's an X Int or a Y Int. So when Haskell examines a Z, it can do runtime discrimination between X Ints and Y Ints, but it can't deal with anything not in the preset framework of Zs.