The Current Practical Limit of Static Typing

I've been asked to review some papers on the topic of application development using dynamically typed languages. Many of these papers make claims about static typing that I believe are untrue. However, I believe there is a limit on what can practically be achieved with static typing, and there certainly is a limit on what can practically be achieved right now. Where do you think that limit lies? Some ideas: anything involving runtime code modification seems difficult. Meta-circularity likewise. Most current statically typed languages offer no useful support for arrays.

Furthermore

I nominate the Oleg-hour as the standard unit for measuring the difficulty of encoding a problem's constraints in available statically typed languages. As in:

"Dude, static array index checking is, like, 3 Oleg-minutes of work"

"Right, so that's a year then."

interpretation

Runtime interpretation of type-structured data tends to be difficult¹ to reflect back into the host language's type system. This manifests itself in different ways depending on whether you're writing an interpreter (you mentioned meta-circularity), unpickling serialized data, or reading structured data off the wire. These have typically been motivating examples used in the literature on adding a type Dynamic to statically typed languages.

One place I've found where types get in the way is when I'm interacting with or testing a program. Types are program invariants that are universally quantified over program traces, but unit tests and interaction at the REPL are more existential in flavor: they attempt to produce a single trace that demonstrates an example or counterexample. In these kinds of scenarios, I've found types are often not helpful--and when you're playing with constructs with particularly tricky types (I'm looking at you, call/cc), they can really be a distraction.
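To make the call/cc point concrete, here is a minimal Haskell sketch (my illustration, using Control.Monad.Cont from mtl; the function name is made up). Even callCC's type, ((a -> m b) -> m a) -> m a, takes some staring at before a REPL session with it makes sense:

    import Control.Monad (when)
    import Control.Monad.Cont

    -- Early exit via a captured continuation. Note the type of 'exit':
    -- Int -> Cont r b, polymorphic in b because the call never returns --
    -- exactly the kind of tricky type that distracts during exploration.
    divOrDefault :: Int -> Int -> Int
    divOrDefault x y = evalCont $ callCC $ \exit -> do
      when (y == 0) (exit 0)       -- jump out with a default value
      return (x `div` y)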

¹Note I said difficult, not impossible... it seems nothing's impossible, provided your name is Oleg, Ralf, or Simon.

Intensional Polymorphism

Typed reflection interests me, and I've been collecting papers on "intensional polymorphism", which has the potential to deliver such capabilities:

1. Compiling Polymorphism Using Intensional Type Analysis (the original paper): http://citeseer.ist.psu.edu/harper95compiling.html
2. Fully Reflexive Intensional Type Analysis: http://citeseer.ist.psu.edu/trifonov00fully.html
3. Intensional Polymorphism in Type-Erasure Semantics: http://citeseer.ist.psu.edu/crary98intensional.html
4. http://stinet.dtic.mil/oai/oai?&verb=getRecord&metadataPrefix=html&identifier=ADA285340
5. Lambda-L: http://www.cis.upenn.edu/~dimitriv/itaname/
6. Encoding Intensional Type Analysis: http://www.seas.upenn.edu/~sweirich/papers/depabs/depabs.pdf
7. Type-Preserving Garbage Collectors: http://www.cs.princeton.edu/~danwang/Papers/tpsrvgc/
8. Reflection in Logic, Functional and Object-Oriented Programming: a Short Comparative Study: http://citeseer.ist.psu.edu/106401.html
9. Reflections on Reflection (Henk Barendregt): http://lambda-the-ultimate.org/node/91
10. Principled Scavenging: http://flint.cs.yale.edu/flint/publications/gc.pdf
11. A Typed Interface for Garbage Collection: http://citeseer.ist.psu.edu/572192.html
12. Bratin Saha's thesis: http://flint.cs.yale.edu/flint/publications/saha-diss.pdf
13. Higher-Order Intensional Type Analysis: http://citeseer.ist.psu.edu/weirich03higherorder.html
14. Self-Interpretation and Reflection in a Statically Typed Language: http://webpages.cs.luc.edu/~laufer/papers/reflection93.pdf

Combining this with staging constructs à la Metaphor [1] also seems promising.

[1] http://lambda-the-ultimate.org/node/624
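For a taste of staging in Haskell terms (a sketch only: Template Haskell's quote/splice mechanism is a rough analogue of MetaOCaml/Metaphor staging, without the same typing guarantees for generated code), here is the classic staged power function:

    {-# LANGUAGE TemplateHaskell #-}
    module Power where

    import Language.Haskell.TH

    -- Builds, at compile time, the expression x * (x * ... * 1).
    power :: Int -> Q Exp -> Q Exp
    power 0 _ = [| 1 |]
    power n x = [| $x * $(power (n - 1) x) |]

    -- From another module: cubed5 = $(power 3 [| 5 |])
    -- which splices in 5 * (5 * (5 * 1)).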

Theorem provers

Theorem provers are getting better all the time -- SAT solvers can solve some incredibly large constraint problems posed in propositional logic, and first-order (FOPC) provers are coming along nicely.

As far as I know, the only theoretical limits to inference come from self-reference, as in Gödel's incompleteness theorems and the halting problem. Sure, SAT is NP-complete, but on practical problems the progress is real.

The question is whether, if you hitched a prover like Darwin up to type inference, you would get added usability for your pains. There's room for some ingenuity as people invent descriptions of values that are both useful and provable by the then-current crop of systems.

Josh

acronym police

In theory, practice is the same as theory

I agree that in theory there isn't any limit to static typing, beyond incompleteness. However, it certainly isn't practical (cost-effective) now. As a result we have type systems that can prove simple theorems with a bit of user assistance. A lot of the progress in static typing seems to be in finding ways to prove a bit more without much user input.

However, I'm more interested in the current state of the art in practical terms (i.e. deployed in a language like Haskell or O'Caml that one might reasonably use in a setting where language design is not the main concern). I'm selfishly interested in evaluating the reasonableness of the claims in the papers I'm reviewing, and also interested to see where things stand in a field I haven't paid much attention to for a little while.

For example, I think the following are still problematic:

- Dynamic code modification. As far as I know O'Caml and Haskell have very poor support for loading code on the fly. I can see why this might be problematic with type erasure. In contrast, Common Lisp systems let you modify a program at runtime. (I'd be interested to know what Erlang offers.)

- Arrays. If you're not using SAC, a static type system doesn't give you much beyond 'hey, this is an array'. SAC is fairly inexpressive (first-order!) and not very polished. Oleg's work might be a counterargument; the last time I skimmed it, it didn't stick.

- Pointer/bit twiddling. Again, Oleg has something to say here. I don't know how practical it is.

Dynamic code

(Warning: shameless plug.)

Alice ML and Acute are two MLish languages that combine static typing with dynamic import and export of code. Alice ML even performs type erasure for polymorphic functions, though not for modules.

Neither currently performs verification of imported code (i.e. checking the type integrity of the code representation), though I see no problem in principle with this (just a lot of engineering work).

Not sure if this qualifies as code "modification", though.

Dynamic code 2 ...

Dynamic Applications From the Ground Up deals with lambdabot, a plugin-based IRC robot, and Yi, an Emacs clone that uses Haskell (!!!) as its Emacs Lisp replacement.

Dynamic code, Arrays, bit twiddling

Noel wrote: "As far as I know O'Caml and Haskell have very poor support for loading code on the fly."

I would say that both OCaml and GHC have excellent facilities for loading code on the fly. In fact, when you work in the toplevel of OCaml or GHCi, dynamically loading code (including compiled code and C shared libraries) is what you do all the time. The OCaml toplevel has a special #load directive for loading compiled code (.cmo/.cma). GHCi loads compiled Haskell code (.hi/.o files) automatically if they are available. Linked-in shared libraries are loaded transparently.
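This can also be done from inside a compiled program, not just at the toplevel. A minimal sketch (my addition, using the third-party hint library that wraps the GHC API; not something from the post above) of loading and running Haskell code at runtime, with the result still checked against an expected type:

    import Language.Haskell.Interpreter

    -- Evaluate a Haskell expression supplied at runtime, checked
    -- against the type we expect ([Int]) before it is ever run.
    main :: IO ()
    main = do
      r <- runInterpreter $ do
             setImports ["Prelude"]
             interpret "map (+ 1) [1, 2, 3]" (as :: [Int])
      case r of
        Left err -> putStrLn ("load error: " ++ show err)
        Right xs -> print xs   -- [2,3,4]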

Loading code on the fly is what makes MetaOCaml work. When you evaluate .! code-expression, the code expression is compiled, and the corresponding (byte or native) code is dynamically linked into the running system and executed.

I'm not sure how much that has to do with static typing. The main problem in loading code on the fly is supporting dynamic linking in the run-time system. Also vexing is the issue of versioning: if a code fragment is compiled given some definition of the data type Foo and then loaded into a program that also uses a data type named Foo, is it the same data type or not? This is the general issue of global naming, versioning, etc. For example, see the issue of serializing generative records in Scheme.

Regarding arrays: Haskell, OCaml, or any other language providing existentials or generative structures in a local context supports statically safe array indexing without the bounds check. That definitely improves the performance of complex array-intensive code.
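Here is a minimal Haskell sketch of that trick (my rendering, after Oleg's bounds-check elimination; all the names are made up for illustration). A rank-2 "brand", as in runST, ties each checked index to the one array it was checked against:

    {-# LANGUAGE RankNTypes #-}

    import Data.Array

    newtype BArray s i e = BArray (Array i e)   -- array branded with phantom s
    newtype BIndex s i   = BIndex i             -- index carrying the same brand

    -- The rank-2 quantifier keeps the brand from escaping, so a BIndex
    -- can only ever be used with the array it came from.
    withBArray :: Array i e -> (forall s. BArray s i e -> r) -> r
    withBArray a k = k (BArray a)

    -- The one place a bounds check happens.
    checkIndex :: Ix i => BArray s i e -> i -> Maybe (BIndex s i)
    checkIndex (BArray a) i
      | inRange (bounds a) i = Just (BIndex i)
      | otherwise            = Nothing

    -- Statically safe access: no check is needed here. (A real
    -- implementation would use an unchecked primitive such as
    -- Data.Array.Base.unsafeAt; plain (!) stands in for it.)
    (!.) :: Ix i => BArray s i e -> BIndex s i -> e
    BArray a !. BIndex i = a ! i

    -- Usage: any index that typechecks with arr is known to be in bounds.
    example :: Int
    example = withBArray (listArray (0, 9) [0 .. 9]) $ \arr ->
      case checkIndex arr 5 of
        Just i  -> arr !. i
        Nothing -> 0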

Pointer and bit twiddling are not incompatible with static typing either. Please see 'High-level Views on Low-level Representations' by Diatchki, Jones et al. (ICFP 2005), their 'Strongly Typed Memory Areas' (Haskell Workshop 2006), and the Hume project. The ICFP 2005 paper is specifically about bit twiddling.
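Even without those systems, ordinary bit twiddling is perfectly well typed; the papers above go further and give the bit-level layouts themselves types. A trivial illustration in plain Haskell (my example, not from the papers):

    import Data.Bits (shiftL, shiftR, (.&.), (.|.))
    import Data.Word (Word32, Word8)

    -- Pack three 8-bit channels into one word. The Word8/Word32 types
    -- rule out width mistakes at the boundaries, though the shift
    -- distances themselves are still unchecked plain numbers.
    packRGB :: Word8 -> Word8 -> Word8 -> Word32
    packRGB r g b =
          fromIntegral r `shiftL` 16
      .|. fromIntegral g `shiftL` 8
      .|. fromIntegral b

    red :: Word32 -> Word8
    red w = fromIntegral ((w `shiftR` 16) .&. 0xff)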

Bit stream programming in Erlang

For functional bit fiddling: http://lambda-the-ultimate.org/node/1921

Although that doesn't address static typing.

What is static and dynamic typing?

Can you give a meaningful definition of static and dynamic typing? It seems to me there is no such definition, and consequently it makes no sense to discuss limitations one way or another.

What we describe as "static typing" tends to be compilers that are "more strict" about types at compile-time, but there is no principled distinction from "dynamic typing" where there is "less emphasis" on compile-time checking, as far as I can see.

Previously

An LtU search for "static vs. dynamic typing" will turn up lengthy and heated threads on this issue. I suggest looking there. Or you can look up type theory's view of the issue in Chapter 1 of Pierce's TAPL. He gives a widely accepted (among theorists) definition of "type system", which is consistent with type theory's historic roots. By this definition, typing is an inherently static notion, such that "static typing" is simply a tautology, and "dynamic typing" a traditional misnomer, as he points out. I'm not aware of any equally solid alternative definition that would succeed in making "dynamic typing" (as usually understood) a meaningful term.

theorem or conversation?

It seems to me there is no such definition...

If someone were trying to prove a theorem about static and/or dynamic typing, you would be right.

But for meaningful and useful communication between humans, mathematical definitions aren't necessary. Sure, universal properties of programming languages are arbitrary and slippery to nail down precisely. But in practice, static and dynamic typing have perfectly reasonable and useful definitions.

To a first approximation:

Static typing = a program analysis executed prior to evaluation, almost always syntax-directed, that detects potential violations of language abstractions (often of the form "primop δ cannot be applied to values of certain forms") and rejects programs (i.e. prevents them from evaluating), in order to guarantee that those abstractions cannot be violated by the evaluator.

Dynamic typing = protection of such language abstractions without the use of a static analysis, enforced during evaluation as checks resulting in runtime errors.
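To make the second definition concrete, here is a toy evaluator (my sketch, not the parent comment's) in which the abstraction "primops apply only to suitable values" is protected exactly by checks made during evaluation:

    -- "Dynamic typing" in the sense above: violations of the language's
    -- abstractions surface as runtime errors from explicit checks.
    data Val = N Int | B Bool deriving Show

    data Exp = Lit Val | Add Exp Exp | If Exp Exp Exp

    eval :: Exp -> Either String Val
    eval (Lit v)    = Right v
    eval (Add a b)  = do
      x <- eval a
      y <- eval b
      case (x, y) of
        (N m, N n) -> Right (N (m + n))
        _          -> Left "runtime type error: + applied to a non-number"
    eval (If c t e) = do
      v <- eval c
      case v of
        B True  -> eval t
        B False -> eval e
        _       -> Left "runtime type error: non-boolean condition"

A static type checker for the same little language would instead reject a term like Add (Lit (B True)) (Lit (N 1)) before evaluation ever starts, which is the first definition.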

As I say, in practice, these are reasonable definitions, even though they wouldn't be useful for theorems (because an adversary can always game them).