Quantitative comparison of unit testing vs. static typing?

Has anyone seen an objective comparison, e.g., quantitative analysis, of whether unit testing will find the bugs an ML-ish type system will reveal? There are additional benefits to both approaches, and many claims and anecdotal experiences about these things, but I'm wondering if anyone has seen anything more objective?

For example, the DiamondbackRuby etc. guys might have something like a true positive and false positive rate on checking code that was only unit tested.

Edit: Ask and ye shall receive (in a tiny part): a study of the bugs found by translating some unit tested Python apps into Haskell.

I don't see how it could be

It's trivial to stack the deck by making sure there are unit tests to check the types of the values returned by each procedure.

The wonderful thing about

The wonderful thing about most Rubyists, Pythonistas, etc. is that they don't use research tools for type analysis, but they still do worry about things like test coverage. I don't think this is that outlandish -- I suspect somebody has already done it, e.g., to validate their research.

Well, beyond a certain

Well, beyond a certain critical mass, static typing sure helps your unit tests, in the same vein as it helped at compile time.

There sure are examples of large unit test suites in projects heavily based on dynamic languages. And that's about all those folks can get to avoid being totally naked.

I would argue, though, that neither static typing nor huge unit test suites are enough to spot the most subtle bugs in many cases.

It's probably better to have both, especially from the point of view of a project manager with tight deadlines and ultra-short release cycles; but if you don't have static typing, you can still unit test. If you have static typing but no unit tests, you're probably not working to make a living out of your code, or at least not in the industry's corporate jungle.

Maybe you're on an island with plenty of time and proof checkers instead (OK, exaggerating a bit).

Of course, AFAICT.

EDIT
Oh, and if you have neither static types nor unit tests... well, then you probably value being surprised more than anything else.

Relevant question thread

Relevant question thread from Programmer's StackExchange. Nothing conclusive, and I would guess all the evidence used in these debates is anecdotal.

I'm actually reaching for a

I'm actually reaching for a lower target than the general "static vs. dynamic" mess. For example, long-term productivity is hard to measure, even though people try. I'm just curious about the technical point of catching ML-style type errors on code that already has good unit test coverage. I think we can answer that (and if we can't, that's pretty concerning).

My question is flawed, but more in the sense that ML-style type errors may be false positives.

I think we need such

I think we need such unbiased empirical studies. But if you come into such a study already subscribing to a particular position (static typing is good), then confirmation bias is a strong possibility no matter how careful you are. Also, it's not even clear what the study would look like.

What would the experiment look like? Take a simple ML-style language that none of your population is familiar with. Divide a population of programmers into 4 groups of around 20 or so programmers (so you need about 80 total, probably more if you want more solid results): (a) can use static typing, (b) has dynamic typing but is encouraged to do unit testing, (c) has both static typing and unit tests, and (d) has neither. Measure the performance of each population. However, you run the danger that your population doesn't write enough code to really hit the type system.

Best if you can eliminate people from the study, as having them is always kind of annoying :). So instead of measuring productivity directly, you can do a simple analysis of how much unit test code is necessary to detect the same number of errors that static type checking detects, and then somehow claim that this is related to productivity. But this connection is too vague, and it would only be considered ceremonial evidence in a publication (to get past the PC on an otherwise good paper).

Then, you could analyze bug reports for a variety of large projects using statically typed and dynamically typed languages. Classify the bug reports for dynamic languages as type errors (detectable with static typing) or not; compare. But in this case you don't really get to observe how the code was developed, just the resulting maintenance costs, and the projects will be different enough that the comparison won't mean much.

So none of these really feel very satisfactory. So instead, we go off and do the first study again, except instead of using real programmers we use evolved neural networks that have reached a decent local maximum and evaluate how efficiently they write code in 4 variants of the language. Of course, getting a neural network to write unit tests is hard, so maybe at this point we find that we've lost our sanity on the topic.

Here is another potentially

Here is another potentially useful experiment: rather than have your population write code, you have them fix code that is heavily type broken. Your (a) population has the fun of diagnosing static type error messages from the compiler, your (b) population gets to write a bunch of unit tests, your (c) population does both, and your (d) population just gets to look at the code and make changes without testing or compiler feedback.

Hmm...sounds like a nice paper.

[Edit] To do this right, make the type errors catch-able by the compiler only if type annotations are added to the program so that the (a) and (c) populations have to do some work for the compiler to be useful. Then run some pilot studies to make sure that the test is sort of fair, unbiased, and more importantly interesting. Select your population from a group of programmers with similar experience; e.g., the top N% of students in a big intermediate-level programming course. Offer them incentives for just showing up (~10 bucks) and a prize (~100 bucks) if they have the best productivity in their population (it would be unfair to rate across populations). For better results, repeat the study for different experience levels.

[Edit 2]: provide a nice unit testing interface for populations (b) and (c) so they can easily write their unit tests.

[Edit 3]: your pilot population can and should be biased. Ideally, you'd have both advocates of static typing and of unit testing, and you would develop tests that, in your pilots, seem to benefit the productivity of both about equally. The final tests might need to contain a warm-up problem so that you can take the learning curve out of the final results.

Finally, you really have to build a new simple language to facilitate the user study, as (1) you don't want any other feature messing up your results and (2) you want to use basically the same language to test both static typing and unit testing, with differences appropriate only to each case (e.g., the addition of types for (a)). Typing might even need to be optional for group (c) (since they should have the flexibility to use static typing or not). Racket should make doing something like this easy.

"Right"?

To do this right, make the type errors catch-able by the compiler only if type annotations are added to the program so that the (a) and (c) populations have to do some work for the compiler to be useful.

Wait, why would that be anything like "right"? One of the advantages of types is that you shouldn't need to do such extra work.

Ah, I forgot we were dealing

Ah, I forgot we were dealing with ML here, where type inference is fairly comprehensive. Fair enough; as far as type inference is concerned, the testee should be able to rely on it. The only problem here is that such typing will be overly conservative in some cases, but then that should be part of what is studied.

Stacked evaluation?

This sounds like an evaluation stacked against dynamic checking to me.

It sounds like comparing the ability to go out on the town and come home in one piece between talkers and fighters, by starting the test at "You are in a fistfight".

A decent dynamic programmer wouldn't get into this situation.

For an ideal but unrealistic experiment, you'd randomly assign programmers to working dynamically and statically for ten years, and then check who's more productive at building a series of different types of systems.

A more realistic option may be to collect data, on the dynamic side, on how much time is lost to the lack of type checking. Below is a proposed experiment. Steps 1-6 give you the fraction of time spent on type errors; they capture one large claimed benefit of types (catching errors easily). The experiment could be stopped there. Steps 7-8 give you the number of bugs left in the system that would have been caught by type checking; this captures the other large claimed benefit. This does not cover the costs or benefits of the types during development itself. Types can be used to enable refactoring and code browsing; this benefit is not counted in the experiment below. The cost of increased compile time and of reduced flexibility during development (including the need to make things type check before you can run them) is also not counted.

  1. Assemble a group of programmers with significant experience in dynamic language development
  2. Have them all implement some reasonable problem(s)
  3. Track all saves of the program
  4. Track all executes of the program / tests
  5. Have the programmers mark "finished" at some point
  6. Manually go through all the traces from the work (which diffs happened), and determine which save/execute portions correspond to type errors. This will show you how much of the development time was spent on debugging type errors.
  7. Go through the resulting programs and add type checking, either by translating them into a more strictly typed language or by adding annotations that check types at run time, and then run the programs on lots of different data. (I have a framework for adding "type annotations" to Ruby at http://people.freebsd.org/~eivind/ruby/types/; a rough sketch of the general idea, in Python, follows this list.)
  8. Find out how many type errors were left in, and estimate the importance of the bugs they represent.
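
To make step 7 more concrete, here is a minimal Python sketch of the kind of retrofitted run-time type annotation I have in mind (this is not the Ruby framework linked above; the decorator and the example function are made up for illustration):

import functools

def typed(*arg_types, returns=None):
    """Check argument and return types at call time; raise TypeError on a mismatch."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            for i, (arg, expected) in enumerate(zip(args, arg_types)):
                if not isinstance(arg, expected):
                    raise TypeError(f"{fn.__name__}: argument {i} should be "
                                    f"{expected.__name__}, got {type(arg).__name__}")
            result = fn(*args)
            if returns is not None and not isinstance(result, returns):
                raise TypeError(f"{fn.__name__}: result should be {returns.__name__}")
            return result
        return wrapper
    return decorate

@typed(str, int, returns=str)
def repeat(text, times):
    return text * times

repeat("ab", 3)    # fine
repeat("ab", "3")  # TypeError: a latent type error surfaces on this run, and gets counted

Running the already unit-tested programs under annotations like this, on lots of different data, is what would give you the residual type-error count in step 8.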

On a more anecdotal level, my own data collection has been mentally tracking the complicated type errors I get when I program in dynamic languages; and excepting shell scripting, over fifteen years I can count the number of serious problems on the fingers of one hand - where I count as a "serious problem" anything that's taken more than ~15 minutes to fix. Most of the problems I can think of have related to semi-weak checking/auto-casting and would have been a problem in a static language with a similar type situation as well - I've had a couple that came from auto-vivification in Perl, one that came from hashes with 0/"0" in Ruby, and one that came from strings in Python being considered iterable as the characters of the string. I've also had a lot of low-level pain from Perl not checking that you have the right number of arguments to a method. Each case didn't cost a lot of time, but it did cost some time regularly.

Though I've spent more time in dynamic languages, I've spent much more time tracking down null pointer errors in Java than I have dealing with problems from dynamic typing. In my experience, the cost of a type error in a dynamically checked language with argument count checks (i.e., Ruby, Python) is usually on the same order as the cost of a type error in a statically typed language. In a statically typed language, you compile the program (which takes some seconds), the error immediately pops up, you fix it, and you compile again. In a dynamic language, you run the program or immediate tests (which takes some seconds), the error immediately pops up, you fix it, and you run again.

The compile time for the static systems I've worked with is often longer than the time to start up a dynamic program, so, counter-intuitively, it is often faster to fix a type problem in a dynamic language than in a static language, even when the type system catches the problem.

No true scotsman fallacy

A decent dynamic programmer wouldn't get into this situation.

No decent C programmer would get into a situation where his buffers overflow or he double-frees memory. Of course, we all know that even decent programmers make plenty of these mistakes.

Any reasonable person would simply recognize that programs are complicated and it's nearly impossible to keep all programs you're currently maintaining in your head at the same time. Furthermore, programs are worked on by different people who all have different knowledge of the program properties. Refactoring can often break many assumptions in code you may not have even written, so even if you are a "decent" dynamic programmer, your programs are inherently limited by the worst programmer you have to work with. Sometimes you are brought in to fix broken programs and you just have to deal with this scenario, even as a "decent" dynamic programmer.

The scenario Sean posted may seem biased against dynamic languages, but this bias is imposed by reality, not introduced by the experimenter.

Different realities

Let me rephrase to better capture my original intent: I believe this case to be very uncommon. I have never encountered a checked-in dynamic program with massive type mess-ups; I've only encountered that at the moment when somebody was attempting to change something and didn't understand what they were doing, and at that point looking at the diffs against whatever is in version control makes it possible to untangle (or just throw the latest change away and start from fresh, working code). It is a situation that's occasionally reasonable to get into temporarily when you're working with statically typed languages (because you have the type system to get it back on track); it's a situation that's not reasonable to get into with dynamic languages, so experienced dynamic programmers will tend to have working habits that avoid getting into it.

As such, I feel that "How do I get out of this situation" is mostly irrelevant. How much effort goes into avoiding the situation - that's relevant. How quickly you get out once you're there - not so much.

I have never encountered a

I have never encountered a checked-in dynamic program with massive type mess-ups;

My intuition for asking is that there are perception issues. Bug finding advocates and static analysis enthusiasts can reasonably claim that exceptional/rare code paths have been, by definition, run less and therefore may have bugs that testing (or any other incomplete methodology) would not reveal. This may or may not be true. As a dynamic programmer, you perceive few bugs -- but how true is that?

Presumably the dynamic

Presumably the dynamic programmer will still perceive those type errors, namely as bug reports from production code. That way he can still get an accurate perception of how effective testing is. If a bug is really never perceived, then was it an important bug in the first place?

-------

The last bug I found that was arguably a type error is the following. There is a slider in a GUI whose position is sometimes set in response to other events. This was using Qt with Python. When the external event was triggered it sometimes resulted in weird behavior, with the slider's marker suddenly changing from pointing upwards to pointing downwards to being square. Eventually I tracked the problem down to a call to slider.setTickPosition(pos). That seems like reasonable code, especially because when you look at the QSlider documentation it is the only method with position in its name. The problem is that this method does not change the position of the marker, instead it changes the style of the slider. The method that is needed is setSliderPosition, which is defined in QAbstractSlider. Why did the code not raise a type error? Because setTickPosition can take an integer indicating the style: 0 for style X, 1 for style Y, 2 for style Z, etc. Furthermore, instead of giving an error for setTickPosition(1000), it just does 1000 modulo the number of styles, and applies that style. Note that the error would not have been caught even if the code was in a statically typed language.

This kind of type error that does not exhibit itself as a type error is in my experience the most common: it can occur in both statically and dynamically typed languages. The problem is that the API designer did not choose the appropriate data type, and instead interprets any integer as a valid input. Another error that can be prevented by appropriate data types is SQL injection: have two different string types for SQL strings and user input strings, and make sure that user input strings cannot be used as SQL strings. Another is numbers with units. You can have this too in both static form and dynamic form. In static form you have units of measure as in F#, and in dynamic form you represent a number as a (float, unit) pair.
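
As a minimal Python sketch of the dynamic form of that last idea (no unit conversions, and all names made up for illustration):

class Quantity:
    """A number tagged with a unit; arithmetic checks units at run time."""
    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __add__(self, other):
        if self.unit != other.unit:
            raise TypeError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.value + other.value, self.unit)

    def __repr__(self):
        return f"{self.value} {self.unit}"

distance = Quantity(3.0, "m")
duration = Quantity(2.0, "s")
distance + Quantity(1.5, "m")   # 4.5 m
distance + duration             # TypeError at run time: the unit mistake is caught

The same trick works for the SQL example: a distinct wrapper type for SQL strings that refuses to be built directly from raw user input.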

It's probably a fairly accurate perception...

...I didn't stop writing Dylan and Scheme code because the programs I wrote had too many bugs: there are plenty of idioms and tricks for avoiding those bugs. I stopped using dynamically-typed languages because I grew to see those idioms as a hindrance and encumbrance.

In particular, I wanted to stop writing pointless wrapper code, and I wanted to start writing very higher-order code. This is against good style for dynamically-typed functional programming, since they both make tracking errors much harder by decoupling where errors are detected from where they occurred.

But if you don't want to do those things, then you won't see a problem.

Healthy perspective, but a little terse :)

In particular, I wanted to stop writing pointless wrapper code, and I wanted to start writing very higher-order code. This is against good style for dynamically-typed functional programming, since they both make tracking errors much harder by decoupling where errors are detected from where they occurred.

That's probably a healthy perspective, and feeds my (pet) suspicion that FP researchers view the perceived benefits (and costs) very differently from FP practitioners and non-FP practitioners.

Unfortunately, I don't understand either of these statements (a little too general). What static features did you want to replace wrapper code -- is the concern similar to what Andreas was describing? Likewise, what did you want to write in a higher-order way that was frustrating before -- computation over types for some particular scenarios?

I'm curious if this might, again, be a situation of a perceived vs. actual difference (as ended up being the case for Andreas). Similarly, whether the scenario is one that is noticed at all by general programmers (in the sense of Wadler that FP needs to look beyond building theorem provers and compilers).

There were two things I

There were two things I wanted to do: first, I wanted to use data abstraction to detect errors early, *without* paying a runtime cost. For example, consider a stateful gensym module which returns fresh symbols by incrementing a counter.

You can represent fresh symbols with integers, but this makes it easy to accidentally pass in a number to a function which expects a symbol, or vice-versa. IMO, programming for debuggability in a dynamically-typed language leads you to wrap that integer in (the moral equivalent of) an object, so that you get an error as soon as you try to call the wrong method on it. You can do even better by coupling this with contract checks, which give you errors and correct blame assignment on top of that. However, I was reluctant to embrace contracts because they can change the big-O space complexity of code (eg, by breaking tail-recursion). Data abstraction (in the sense of existential datatypes/ML modules) gives you those benefits, *without* any runtime overhead. The types also free you from having to write any wrapper code; the only thing that changes is that your signatures get shorter.
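
As a minimal Python sketch of that wrapper style (the original context was Scheme/Dylan; the names here are made up), the point is that a misused value fails at its first use instead of flowing onward as a plain integer:

import itertools

class Symbol:
    """Hides the underlying counter so a fresh symbol can't be confused with an int."""
    _counter = itertools.count()

    def __init__(self):
        self._id = next(Symbol._counter)

    def __repr__(self):
        return f"<sym {self._id}>"

def gensym():
    return Symbol()

s = gensym()
s + 1   # TypeError right here, instead of a wrong answer much later

Data abstraction in the ML sense gives the same early failure at type-checking time, without the wrapper object or its runtime cost.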

More important than that, though, was trying to write higher-order programs like parser combinators. Even small errors lead to hard-to-debug errors, since the error would not manifest until you used the full parser, which meant diagnosing an error in a small bit of a higher-order data structure on a large input. However, basically all such errors are easily caught by simple ML-style type checking. Basically, all the programs I wanted to write were so much easier to write in ML that I couldn't resist.

Then I decided I wanted to write my own FPL, and grad school followed. :)

It's never a wash though...

This is quite interesting. Using a dynamic language requires you to keep your code as simple as possible and avoid heavy abstractions or anything else that will be difficult to debug. Consequently, using a language with a rich static type system can encourage heavier abstractions, but will probably have some amount of cognitive overhead even with the compiler making sure your types match up.

Even if the compiler is inferring my types, I still need to understand them. I have to understand why the compiler made its judgement so I can fix my type errors. So now I've got this type system to fight, but at least when the compiler finally says yes, I will be confident that I've fought it well enough.

I went in a backwards direction when I moved from Scala to C#. I went from a rich language where elegance was possible, and so you spent time chasing it, to a fairly pragmatic language where elegance was unachievable and so you just spent more time writing functionality. Language shapes thought, so to say, and it's not clear which is better in the sense that worse can be better sometimes (or there is a general advantage to the KISS principle).

I'm preparing a survey right

I'm preparing a survey right now about these sorts of things, so I'm wondering if there's a way to capture Neel's intuition. Something along the lines of "Do you believe X is more important to you than Y." I have this for the static types vs. unit testing thing, but I'm still searching for something else: "Is the type signature an important part of the documentation", "By avoiding side effects, how much more modular do you expect your code to be", etc. Wrappers seem a little too specific of a concern here :)

I expect your results will

I expect your results will be obscured by ideology: group A will claim their path is the way, group B likewise their own path. It especially gets crazy because you are bumping into "worse is better" phenomena. And people lie about their efficiency: they could have spent too much time perfecting their types or unit tests, and they fail to account for that in their assessments.

I still think performance testing on the approaches is necessary for any kind of usable (less biased) results. Perhaps if we could identify the core aspects of each approach, those could be diced into simple cognitive performance tests.

I want to expose the biases,

I want to expose the biases, and those of different demographics.

What do you mean by performance testing? Objective tests / analyses to see the utility of various techniques, such as I proposed at the top of the thread?

I view that as a step after: I first want to establish beliefs, and only then support them / knock them down. I suspect, for example, researchers and practitioners, and even different researchers, are talking past each other. Much of the discussion in this thread, for example, reveals such confusion on unit testing vs. static types, and this is by those who understand these ideas relatively well!

Cognitive dissonance ensures

Cognitive dissonance ensures that well-entrenched beliefs endure. If you attack the beliefs directly, I think, you won't have much influence.

By perf testing, I mean measuring performance using various approaches, so you can indirectly measure difficulty and, say, cognitive burdens. Provide two systems with different properties that emulate some of the aspects of static typing and unit testing, perhaps even disconnected from the task of programming; find the essence of the debate and perhaps identify a gradient of solutions/effects. But even if you can manage to collect numbers, they will probably not be taken seriously; we just aren't used to science.

Stop worrying about us researchers; we have the strongest opinions but don't really matter anyways. Lead by example, design and ship languages; let the market decide even if it's not "right."

Edit: it is definitely not inconceivable that someone will discover a new approach, or a vital twist on an existing approach, that will lead to an experience vastly better than our current state of the art, ending the debate with disruption. Example: a huge breakthrough in theorem proving makes static typing much more appealing, or big advances in ML (machine learning) make dynamic typing vastly more appealing (since ML would benefit from more flexible/less rigid assumptions in code). So far nothing like that has happened, and so the debate continues.

Higher levels of abstraction are essential

Sean: Using a dynamic language requires you to keep your code as simple as possible and avoid heavy abstractions or anything else that will be difficult to debug. Consequently, using a language with a rich static type system can encourage heavier abstractions, but will probably have some amount of cognitive overhead even with the compiler making sure your types match up.

[...] Language shapes thought, so to say, and it's not clear which is better in the sense that worse can be better sometimes (or there is a general advantage to the KISS principle).

I don't follow this argument at all, or why you disregard higher abstractions as "heavy". Being able to work at higher levels of abstraction is the ultimate goal of programming languages. There can be little doubt that today's software could not be written with the abstractions from 1960. Likewise, it seems highly unlikely that you are going to be able to write the software of 2060 with today's common abstractions. So anything that supports a move to higher abstractions is not just clearly good, it is essential for making progress.

I think we disagree about

I think we disagree about the general goals of PL; it's not, I believe, to work at higher levels of abstraction, but simply to make it easier to write better programs. You can definitely go too far with your abstractions; over-abstraction is a well-known anti-pattern. It really depends on your context; e.g.,

  • What is the programmer hacking together and how much thought should it require? Is it a simple assembly task, or are you writing flight control software for a multi-million-dollar plane? Remember the value of the duct-tape programmer.
  • What is the amount of indirection one can tolerate?
  • When is a higher-order function too higher order?
  • How many layers are too many in an architecture?
  • Are combinator parsers clearly better than writing a recursive descent parser by hand, or is it just a different set of tradeoffs? You can get lost with combinator parsers even when you can rely on rich static typing; sometimes simpler is better.

Much of the software written today (2012) is done in dynamic languages with fairly simple (dynamic) type systems; the improvements over 1960 are mostly matters of convenience and of better hardware that can afford things like garbage collection.

I do agree that we need "better" abstractions, but I don't believe that these abstractions are necessarily higher-level (meaning more abstract). Complexity is our enemy, but sometimes the abstractions we use to contain complexity introduce complexity themselves. Perhaps the better abstractions are simpler and less high-level, but they just "fit" better?

I also was puzzled by the

I also was puzzled by the remark that, as I understood it, said when using a dynamic language one would need to avoid heavy abstractions. The main reason I could think of for avoiding heavy abstractions in Scheme would be that Scheme isn't dynamic enough — it uses phase-separated macros instead of fexprs, which means various kinds of abstractions require laboring in a second-class programming model different from and clumsier than the primary computational model of Scheme.

You could over-abstract in

You could over-abstract in dynamic languages also, but the cost can become quite high since the compiler isn't keeping track of types for you. Sometimes our languages are deficient in some way that discourages certain abstractions, but maybe we are better off without them. It's definitely something that shipping language designers (those who ship languages) seem to think about a lot, and something we might miss as non-shipping designers (speaking for myself, of course).

Bug counts vs business cost

I'll readily concede that there most likely are some type-related errors left in dynamic programs I've worked with, but I don't see any reason to think they're very frequent. I don't have the perception of finding any more bugs on random code reading when I read dynamic code I've written than when I read static code I've written, which means it most likely isn't that much more frequent.

I also don't get significantly more bugs back from production.

This doesn't mean that static typing can't be good - I personally hope that we'll get to a situation where mainstream languages have static typing in a non-annoying[1] form - just that trying to recruit people who chose dynamic languages by arguing bug counts is not going to get you any friends, and that setting up experiments based on this has to be done very carefully, or you'll just be dismissed as uninformed by the dynamic language community.

[1] Non-annoying static types would let me run programs without fixing the type errors first. And be succinct. And generate reasonable error messages. And overall provide enough benefits to outweigh the cases where it blocks me from using the structure I'd prefer for a program.

Random data point

I'll readily concede that there most likely are some type-related errors left in dynamic programs I've worked with, but I don't see any reason to think they're very frequent.

As a random data point, I remember hearing stats a while ago that something on the order of 30% (IIRC) of the crashes logged on YouTube servers every day are actually unexpected dynamic type errors.

I don't have the perception of finding any more bugs on random code reading when I read dynamic code I've written than when I read static code I've written, which means it most likely isn't that much more frequent.

Or it means that you don't find them by just reading your code, which is one of the motivations for having the compiler double check for you.

And overall provide enough

And overall provide enough benefits to outweigh the cases where it blocks me from using the structure I'd prefer for a program.

I've heard this complaint often from dynamic typing enthusiasts, so perhaps you can elaborate: what sorts of program structures were impeded by static typing?

I've heard this complaint

I've heard this complaint often from dynamic typing enthusiasts, so perhaps you can elaborate: what sorts of program structures were impeded by static typing?

Here are a few, probably in order from most to least important:

  1. It stops me from doing run-time patching for testing. In a static language, you need to do dependency injection or similar to be able to disconnect code for testing. In a dynamic language, you just override the part you want to disconnect, run the test, and restore it (see the sketch after this list).
  2. It stops code that adapts to external data. In a dynamic language, you can write up something that morphs "type" based on external data - e.g., an ORM that shows up with all the fields in the database.
  3. It stops using eval to incorporate configuration parts.
  4. It removes the ability to have arrays of differently structured entities. This can be used for constructing sort-of DSLs, and for me is often a natural way to evolve a program, needing less up-front design than alternative ways of doing the same thing.
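
As a minimal, self-contained Python sketch of the run-time patching in point 1 (the function names are made up; in real code you would usually rebind an attribute of another module the same way):

# Production code: fetch_rate would normally call an external service.
def fetch_rate(currency):
    raise RuntimeError("no network access in tests")

def price_in(currency, amount_usd):
    return amount_usd * fetch_rate(currency)

# Test: override the collaborator, run the test, then restore it.
def test_price_in():
    global fetch_rate
    original = fetch_rate
    fetch_rate = lambda currency: 0.5
    try:
        assert price_in("EUR", 10) == 5.0
    finally:
        fetch_rate = original

test_price_in()

No dependency injection or special test seams are needed; the call is simply rebound for the duration of the test.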

I think these are all fairly fundamental to dynamic vs. static programming. None of them are critical, but they each come with some advantages. Comparing any particular dynamic language to any particular static language, there will probably be an additional list of advantages and disadvantages on each side.

It stops me from doing

It stops me from doing run-time patching for testing. In a static language, you need to do dependency injection or similar to be able to disconnect code for testing. In a dynamic language, you just override the part you want to disconnect, run the test, and restore it.

But is your complaint the inability to test, or the inability to test the specific way you want to? Some type systems indeed make it hard to mock or write unit tests, but this doesn't seem endemic to typing itself.

It stops code that adapts to external data. In a dynamic language, you can write up something that morphs "type" based on external data - e.g., an ORM that shows up with all the fields in the database.

I'm not sure what you mean by "morphing type". There are plenty of typed languages that generate code based on DB schemas. You can even do this "live", although this requires a runtime "code generation"/evaluation model, just as it does in dynamic languages.

It stops using eval to incorporate configuration parts.

This seems merely due to limited runtime support rather than static typing. Having the compiler accessible as a library to a program would allow you to compile at runtime and load code of an expected type. Is this what you mean?

It removes the ability to have arrays of differently structured entities. This can be used for constructing sort-of DSLs, and for me is often a natural way to evolve a program, needing less up-front design than alternative ways of doing the same thing.

This is indeed the go-to example for dynamic vs. static. I suppose I'm just not convinced this is all that appealing. We could achieve the same in a typed context by having a universal type and injecting all values into that type automatically.

To me this seems like the same bad idea as most other types of implicit conversions: when you intended it, it's nice, and when you didn't, you'll never know until it's too late. Why not design a model that's not prone to such accidents?

All are conveniences

But is your complaint the inability to test, or the inability to test the specific way you want to? Some type systems indeed make it hard to mock or write unit tests, but this doesn't seem endemic to typing itself.

All of these are things that make certain tasks more convenient; all of them are possible to program around. It is clearly possible to construct unit-testable programs in a typed language; you just need to structure them differently.

In the case of tests, programming around it leads to a forced program structure (dependency injection or functional style) in all typed languages I'm aware of. I assume it would be possible to design a type system that allowed these kinds of overrides - for instance, by allowing recompilation of that unit of the system with a "monkey patch" in place. It would, however, be very close to dropping static typing overall - it would mean that the calls in question didn't actually necessarily go where they said they did.

You can even do this "live", although this requires a runtime "code generation"/evaluation model, just as it does in dynamic languages.

In a dynamic language, this will often be a very simple pattern - e.g., in Ruby, you just define "method_missing" and handle that call. I don't think a static system can do this; it basically means that all calls must be valid, which makes it dynamic. It would be possible to have a system that was dynamic only at this point and static for most of the rest of the system.
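
The Python analogue is "__getattr__"; here is a minimal sketch, with a plain dict standing in for a row fetched from the database (the class and field names are made up):

class Record:
    """Exposes whatever columns the row happens to have, discovered at run time."""
    def __init__(self, row):
        self._row = row   # in a real ORM this would come from the database schema

    def __getattr__(self, name):
        try:
            return self._row[name]
        except KeyError:
            raise AttributeError(name)

user = Record({"id": 7, "email": "a@example.com"})
user.email   # 'a@example.com' - no class definition ever listed this field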

Having the compiler accessible as a library to a program would allow you to compile at runtime and load code of an expected type. Is this what you mean?

Yes. In retrospect, it was a bad example; I tend to think of compiler/runtime as separate, but I know that in theory it doesn't have to be that way.

It removes the ability to have arrays of differently structured entities. This can be used for constructing sort-of DSLs, and for me is often a natural way to evolve a program, needing less up-front design than alternative ways of doing the same thing.

This is indeed the go-to example for dynamic vs. static. I suppose I'm just not convinced this is all that appealing.

It's appealing only sometimes. I said "often", and that's probably a bit of an overstatement - it depends on what I'm working on, and happens probably a couple of times a year. The advantage of this is exploratory programming; it's an easy way to get ahead without being bogged down in formality (like designing a class).

Overall, when I've finished implementing the program, the data will have some kind of structure that is amenable to a type (albeit often a somewhat complicated one). However, as I'm working on it, I'm playing around with the data, trying out different ways of structuring it. I'll probably have the data inconsistent for a while. And I don't want my environment to bug me about that at that time - I want to be able to just get on with what I'm doing.

We could achieve the same in a typed context by having a universal type and injecting all values into that type automatically.
To me this seems like the same bad idea as most other types of implicit conversions: when you intended it, it's nice, and when you didn't, you'll never know until it's too late. Why not design a model that's not prone to such accidents?

The ideal state would be a model that, without adding any costs, is not prone to such accidents.

Now, all models that we have (and probably all we can make) add some costs. At the trivial level, any type checker will either reject some programs that would execute fine, or fail to terminate on some programs. Also, static languages will typically demand that you have the types line up before they allow you to execute the program - and this requires a shift from whatever I am doing to satisfying the type checker, often satisfying the type checker for something different than what I was thinking of last.

This is a core point that seems to be generally ignored by static typing enthusiasts: I *don't* want my errors to be shoved in my face as early as possible. I want them when I want them. I want to be able to start executing my program, type errors and all, and then fix the test I'm most interested in. Maybe that will drive the structure of the rest of the program - so the type errors may never end up being fixed at all, because that code gets removed, or rewritten so much that the initial errors don't matter.

From just today: 1) A data

From just today:

1) A data structure that is a container of T plus a function on T's. I would have needed existential types to hide the T, which I don't get in many languages.

2) A couple of functions that take a heterogeneous list of items that the function will convert to the interface it needs the elements to be, with an extensible conversion function (a rough sketch of the dynamic version follows this list). I know of no statically typed language that makes this possible. You could do this with type classes and existential types, but that requires explicitly introducing a wrapper on each element of the list, and even that does not give you the extensibility.

3) A couple of functions that work on arrays of arbitrary dimensionality. Would need a complicated specialized type system or complicated type structure in a powerful type system to do this (or give up and just use a runtime checked Tensor[T] data type of any dimensionality).

4) A couple of places where a variable is initialized to None/null which is set at a later time. Would require introducing an unnecessary option type.

5) A small text area in a visualization that allows you to customize the computation that leads to the visualized data by writing some code, which the application runs dynamically. This is of course doable in theory in a statically typed language, but in practice it is hard, especially getting the data from the piece of code into the application. This is a hassle-free one-liner in most dynamic languages.
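
For #2, a minimal Python sketch of what I mean (the DSL here is reduced to a single Lit node, and all names are made up):

class Lit:
    """A DSL node wrapping a primitive value."""
    def __init__(self, value):
        self.value = value

# Conversion table from primitive types to DSL nodes; callers can register more entries.
conversions = {int: Lit, float: Lit, str: Lit}

def to_node(x):
    ctor = conversions.get(type(x))
    return ctor(x) if ctor else x   # primitives get lifted, DSL nodes pass through

def sequence(items):
    """Accepts a heterogeneous list mixing DSL nodes and bare primitives."""
    return [to_node(x) for x in items]

sequence([1, "two", Lit(3)])   # all three elements end up as DSL nodes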

As you see these problems are not insurmountable. The problem is that a type system imposes a cognitive burden. With dynamic types you can just write down what you have in your head, instead of constantly having to solve the puzzle of how to fit it into a type system. For example I did not even realize that I had needed an existential type for #1 until I read your question here. I simply wrote down what I meant. If I was constrained by a type system, I'd probably not even have thought of some of these things. For example it's likely that I would have just defined a few copies of the function in #3 that work on 1D, 2D and 3D arrays. And for #2 the design would have turned out differently (and worse). #5 would not have happened.

That is not to say that a type system would not have made things easier in other places, it certainly would.

As you see these problems

As you see these problems are not insurmountable. The problem is that a type system imposes a cognitive burden. With dynamic types you can just write down what you have in your head, instead of constantly having to solve the puzzle of how to fit it into a type system.

Why not just write what's in your head in a static language too? As you said, you could just define an abstract Tensor[T] data type and track units at runtime as you would in a dynamically typed language. Where it's easy to rule out errors via types, you can do so, and where it's not immediately obvious, don't sweat it.

It seems that the cognitive burden is largely self-imposed, not necessarily intrinsic. I'm guilty of overcomplicating types to get the static guarantees I'd like too, but I acknowledge I'm a masochist that way.

2) A couple of functions that take a heterogeneous list of items that the function will convert to the interface it needs the elements to be, with an extensible conversion function. I know of no statically typed language that makes this possible. You could do this with type classes and existential types, but that requires explicitly introducing a wrapper on each element of the list, and even that does not give you the extensibility.

I'm trying to understand the context where such a solution is required. Suppose you have a person who had never heard of dynamic typing, and he was faced with the same problem that you consider to be naturally solved by "heterogeneous lists with custom functions". How would he have to solve the problem you faced?

This is a hassle-free one-liner in most dynamic languages.

Is it a hassle-free one-liner because of the hassles of typing, or because of the runtime support for eval? Because runtime support is a red herring when discussing dynamic vs. static typing. Typing does not preclude eval; it's simply uncommon in typed languages because they have often only been used in contexts where eval isn't needed (and because compilers are often heavyweight).

Since I know you're familiar with .NET, we now have the compiler-as-a-service library. Are the one-liners now possible in F#?

Why not just write what's in

Why not just write what's in your head in a static language too?

Because then you need to sprinkle your code with casts, which makes your code ugly and defeats the purpose of static typing. You're more likely to use a less general and less convenient but typeable solution that probably involves code duplication, but in both cases you've spent a lot of time figuring out that there was no way to fit it into the types (for example, give an ordinary programmer who does not know what an existential type is the task of writing something that requires an existential type, in a language that does not support them -- I can assure you that he will spend a LOT of time before he figures out that it's not possible). I do not think the burden is self-imposed. With a dynamic language these issues are simply non-issues. In a static language they do become issues. It's not me that makes them issues, it's the type system.

How would he have to solve the problem you faced?

He would have to call the appropriate conversion function for the right type at the point where he is building up the lists to pass to the library.

Because runtime support is a red herring when discussing dynamic vs. static typing.

In theory perhaps, but in practice it's not. Virtually all dynamically typed languages have this; virtually all statically typed languages don't. This is no coincidence: it's simply harder with static types.

The compiler as a service library does not have support for F# as far as I know. But suppose that C# is enough. Even then it would be quite a bit more difficult to set up communication channels between the child program and parent program.

As I said, none of these are show stoppers, but these problems do help tip the scale to Python for many tasks.

Because then you need to

Because then you need to sprinkle your code with casts, which makes your code ugly and defeats the purpose of static typing.

The purpose is to write a working program. You claimed that a program required a certain structure that defeats typing. Then you criticized a design in a typed language that avoids the alleged typing problems as defeating static typing. But that's what you wanted to begin with! I don't think you're being fair.

Finally, unless I've misunderstood your example, you don't need to cast at all except at the edges of the computation, i.e., at the beginning to lift the input values, and at the end to extract a result or an error.

He would have to call the appropriate conversion function for the right type at the point where he is building up the lists to pass to the library.

I was asking what sort of problem necessitated these lists to begin with.

Virtually all dynamically typed languages have this; virtually all statically typed languages don't. This is no coincidence: it's simply harder with static types.

I'm not denying it's harder, but it's not so hard as you seem to imply. Rather, it's trivial in dynamic languages to support eval since that's just the interpreter itself. It's not trivial in a static language, but it's not especially difficult either. I could write a C# library right now that calls the C# compiler to build an assembly from a code fragment; it would take maybe 30 minutes.

The reason it's not typically provided in static languages is because the mindset in this domain is entirely different, and eval is not a solution most would have even considered. With static languages now being used in more dynamic domains, its absence is being felt more strongly. A little chicken/egg problem here.

Lack of 'eval' considered a feature

FWIW, quite a few language designers - even from the dynamic camp - consider the absence of 'eval' a feature (see e.g. Google's Dart). And if you look at how 'eval' is actually used in the wild then it's not hard to see why they think so.

Eval is a very powerful and

Eval is a very powerful and useful primitive to expose to users, especially for data analysis software. For example, if there is somewhere you can specify a formula in the user interface (e.g. a spreadsheet), then eval quickly gives you a powerful formula language. Or a filter expression for filtering a table of data, etc.
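
A minimal Python sketch of the filter case (the rows and the filter expression are made up; safety is ignored here and picked up in the replies below):

rows = [
    {"name": "widget", "price": 4.0, "qty": 10},
    {"name": "gadget", "price": 25.0, "qty": 2},
]

user_filter = "price * qty > 45"   # a filter expression typed in by the user

matching = [row for row in rows if eval(user_filter, {}, row)]
# each row's columns are visible as names inside the expression; only 'gadget' matches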

Eval is too powerful

Yes, and it also tends to give them access to a wide range of other things that they shouldn't be able to access or even know about. Using 'eval' for this purpose is a well-known integrity foot bazooka.

JSON started out being implemented with JavaScript's 'eval'. The web crowd quickly realised that this was a horribly bad idea.

Eval is only really bad if

Eval is only really bad if your language exposes ambient authority. Most languages do, hence eval generally causes problems. Fixing the real problem is probably better than eliminating a useful tool.

A good eval doesn't have

A good eval doesn't have these problems, though you're right that most languages don't have a good eval. The trick is being able to specify the top level lexical environment as a parameter to eval. For trusted users an unsafe eval is okay (in my case it's a GUI application running on the person's own computer, so if they want to erase their hard disk that's fine with me).
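
In Python, for example, you can pass the environment explicitly, so the evaluated expression only sees the names you hand it (a sketch; stripping builtins this way limits what the code can name, but it is not a hardened sandbox):

# The caller decides exactly which names the evaluated code can see.
env = {"__builtins__": {}, "length": 10.0, "width": 4.0, "max": max}

eval("length * width", env)      # 40.0
eval("max(length, width)", env)  # 10.0
eval("open('/etc/passwd')", env) # NameError: 'open' is not defined in this environment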

A good citizen

I smiled at the similarity of attack and defense techniques used in this thread.

Most static languages don't have eval, "this is no coincidence".

Is it a hassle-free one-liner because of the hassles of typing, or because of the runtime support for eval? Because runtime support is a red herring when discussing dynamic vs. static typing.

In theory perhaps, but in practice it's not. Virtually all dynamically typed languages have this; virtually all statically typed languages don't. This is no coincidence: it's simply harder with static types.

Most languages have a "bad" eval, but that is okay.

Using 'eval' for this purpose is a well-known integrity foot bazooka.

A good eval doesn't have these problems, though you're right that most languages don't have a good eval.

I think it is fair to say that we want to have a service for user-provided code at runtime in a controlled environment (no ambient authority). Admittedly, if existing solutions are likely to be bad (flawed or nonexistent), anyone is free to choose their preferred poison. That said, I have seen a lot of user-scriptable programs (usually with an embedded interpreter for a script language), including GIMP, {Microsoft,Open,Libre}Office, Firefox, and I was never under the impression that the host language (and whether it was statically typed or not) made a big difference -- except maybe for Emacs.

Bad evals are not okay, but

Bad evals are not okay, but given a bad eval it's okay to use it in certain circumstances. Perhaps the reason why those user-scriptable programs don't use the host language for extension is that the host language does not provide eval? Arguably Firefox is similar to Emacs in this regard: the front end is written in JS, which is why it is so easily and thoroughly extensible. Firefox and Emacs, besides being programs that can be extended in the language they're written in, also happen to be the most widely extended software. A coincidence?

Firefox is not really written in Javascript

It's a bit far-fetched to claim that Firefox is "extended in the language it's written in". According to Ohloh, 14% of Firefox source code is in Javascript. It's certainly true that the XUL/Js combo is touted as helping extensibility, yet I'm not convinced by the idea that Js was selected because it has eval; I'd rather bet they found it more convenient and user-friendly (or user-familiar) than the other embeddable languages they knew. I'm not even sure the extension mechanism actually uses "eval", as it is piloted by a native engine that has control over the Javascript execution anyway.

Besides, the use case you presented for eval was giving the user more control over visualization or computations by allowing them to write some code. That's far closer to VBA macros for Excel than to Firefox or Emacs scripting, and in magnitude it may be a bigger use of embedded scripting. By the way, the Excel use case immediately suggests that "non-sandboxed eval is not an issue if it's only for the user on their own machine" is not a stellar idea in the long run.

I don't know why JS was

I don't know why JS was selected either, and I know that Firefox's back end is not written in JS (it's the same for Emacs, which is written in C, although a much larger proportion is elisp). My point is that it is easier to make an application extensible if the language it's written in is the same as the extension language, because then you already have the right APIs ready. For example, because Firefox's front end is written in JS, there likely already exist JS APIs for manipulating the menu bar, so you can directly use them in the extension language. If instead the extension language were something else, you'd need to write bridges for all the APIs. Especially for applications that are more on the throwaway end of the spectrum than FF and Emacs, you can see why using a built-in eval can be cost-effective when embedding another language is not. Which is why I like languages that provide eval.

That a safe eval is needed if you share the formulas with others is a good point.

Restricted forms of eval

Dynamic languages could meet static half-way by offering evals over terminating, perhaps even data-only (i.e. JSON notation), subsets of their grammar, and with restricted access to language facilities or environments.

I'm not discussing popular dynamic languages other than JavaScript, but it is feasible (I've been experimenting with a modular language privately) and it manages to merge the perspectives a bit.
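
Python already ships one small instance of this idea: ast.literal_eval evaluates only literal data (numbers, strings, lists, dicts, and so on) and rejects everything else.

import ast

ast.literal_eval("{'threshold': 3, 'tags': ['a', 'b']}")   # -> a plain dict
ast.literal_eval("__import__('os').remove('x')")           # -> ValueError: not a literal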

I didn't claim that is was

I didn't claim that it was required (Turing completeness, etc.); I was simply answering your question "what sorts of program structures were impeded by static typing?" by taking a couple of program structures from dynamically typed code that would be impeded by static typing.

If the solution is to throw the type system out with casts, then I do not consider that a success of that type system. I did not want to write casts, and I did not want to think about where exactly to put them and whether perhaps they can be avoided by some different typed design. What I wanted was to write down the program that exists in my head. I do not want to have to think about the constraints that the type system imposes on the programs that I can express. Thinking about the program alone is difficult enough. As I've said a couple of times before, I'd love a type system that is there to help me instead of me being there to help it. Let me express anything, don't force anything on me, and let it give me information that it can deduce about the program I wrote (like "this can possibly go wrong, but I'm not sure" or "this definitely goes wrong if x has value foo").

I was asking what sort of problem necessitated these lists to begin with.

Ah :) It was a mini-DSL where the elements of the list are elements of the DSL, OR some primitive thing like int or string that can be lifted to a value in the DSL. It's not too hard to manually apply a conversion, but it's not pretty either.

It's not trivial in a static language, but it's not especially difficult either. I could write a C# library right now that calls the C# compiler to build an assembly from a code fragment; it would take maybe 30 minutes.

Right, it's certainly not impossible. There would be some difficulty in getting typed values back out of the assembly. C#'s dynamic can help here, but hey, that's dynamic typing. The point is I don't have to do all this work in a dynamically typed language: I just call eval and be done with it. I don't have to worry about shipping the C# compiler, it's incredibly fast compared to compiling and running a bunch of small expressions, and I don't have to worry about where to put the file to compile, etc. [some of these are solved by project Roslyn]

I was simply answering your

I was simply answering your question "what sorts of program structures were impeded by static typing?" by taking a couple of program structures from dynamically typed code that would be impeded by static typing.

Perhaps I should clarify: as Sean said, language shapes thought, so the point of my question was to ascertain whether exposure to dynamic languages makes you want to write programs using that mindset, instead of using the actual language you have at hand. Hence the scenario where I posited a person who had never encountered dynamic typing. Such a person may never even consider heterogeneous lists, and so on, because such a thing is either impossible or not worth the hassle.

The fact that you are aware of that "shortcut" via dynamically typed data structures may close off avenues of thought that would have led to a simple typed solution. For instance, Danvy's solution to the printf problem in ML.

It was a mini-DSL where the elements of the list are elements of the DSL, OR some primitive thing like int or string that can be lifted to a value in the DSL. It's not too hard to manually apply a conversion, but it's not pretty either.

What were the lists used for? I get that you'd have to lift values into the DSL, I'm just trying to understand the problem more fully. There are plenty of examples of this sort of thing that people considered impossible to do in a typed context, but most of them have reasonable solutions these days.

I do not want to have to think about the constraints that the type system imposes on the programs that I can express.

This is only true of some typed languages, like C#. Other typed languages simply ascribe no meaning at all to ill-typed expressions, so it's not simply a matter of considering type constraints in addition to program constraints, it's a matter of using the formal language you have at hand to model your program. Either your language requires an elaborate model for your program, or it can express your model concisely.

Even for those scenarios where elaborate models are necessary, there are often simplifications to trade off static safety for speed, instead of trading off all safety. For instance, embed a monadic language for dynamic typing in the standard library. Now there's no need to ever resort to a dynamically typed language, just use the typed one and inject/project from Univ when that would simplify a program.
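A rough sketch of the inject/project part of that idea, in Haskell with a deliberately tiny universe (the names and the choice of cases are mine, purely for illustration):

{-# LANGUAGE FlexibleInstances #-}

-- a universal type: "dynamically typed" values live here
data Univ = UInt Int | UStr String | UList [Univ]

class InUniv a where
  inj :: a -> Univ          -- always succeeds
  prj :: Univ -> Maybe a    -- can fail, but only at this boundary

instance InUniv Int where
  inj = UInt
  prj (UInt n) = Just n
  prj _        = Nothing

instance InUniv String where
  inj = UStr
  prj (UStr s) = Just s
  prj _        = Nothing

-- code that wants to be "dynamic" works over Univ; the rest of the
-- program keeps precise types and only pays at the inj/prj boundary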

No need for closures!

Taking the first question first: I believe experience with any language will make you want to use the best features of that language when they fit the problem at hand. So experience with dynamic languages will form a way of thinking and working that fits dynamic languages; experience with static languages will form a way of thinking and working that fits those. Any transition comes with costs that are not there for somebody who is "native": you lose the ability to do the useful things you're used to, and you don't yet have the expertise to fully exploit the useful things in the new environment. This applies both to people used to dynamic languages trying to use static languages and, more subtly, to people used to static languages trying to use dynamic languages. The latter have at least two phases available before becoming proficient: first, they use only static type structures and don't exploit the language being dynamic; then, if they stay with dynamic languages, they go giddy and overuse the dynamic features. Neither of these is an ideal use of the language; use dynamic features when they make things easier, avoid them when there is no point.

As for explicitly marking which types are dynamic: let me try to illustrate why this isn't always a good solution by using an example in a different context (which hopefully resonates more closely with your way of thinking about programming problems):

For instance, just encode all your closures as

static final class ArgType {
  final Foo foo;
  ...
  ArgType(Foo foo) {
    this.foo = foo;
  }
};
...
  return new Function() {
    @Override
    public RetType call(ArgType args) {
      ...
    }
  };

... and all your calls as

  rettype = func.call(new ArgType(arg1, arg2));

Now there's no need to ever resort to introducing closures, just use the simple pattern above when that would simplify a program.

Having to explicitly mark types and mark them as dynamic has the same kind of impact for DSL use as the above has for the use of closures - it can be done, but it makes the declaration exceedingly noisy. In the case of a DSL, this is in a specific context where you're trying to remove noise.

A couple of functions that

A couple of functions that work on arrays of arbitrary dimensionality. You would need a complicated specialized type system, or a complicated type structure in a powerful type system, to do this (or give up and just use a runtime-checked Tensor[T] data type of arbitrary dimensionality).

Been giving some thought to your matrix example with typed units. Definitely an interesting case that stretches the limits of current typed languages. A few possible solutions:

  1. Part of the awkwardness is that while matrices are 2D, type parameter lists are 1D. So perhaps a 2D type parameter syntax would help somewhat. I'm skeptical though.
  2. The requirement to specify type parameters within the declaration, as opposed to the definition, makes this awkward, i.e. we are required to write "type ('a, 'b) pair = { fst:'a, snd:'b }", when it would be nicer to simply write "type pair = { fst:'a, snd:'b }" and have ('a,'b) pair be inferred. This would mean you only need to specify the large list of type parameters for a matrix once instead of twice. I like this solution much better.
  3. Matrix operations are better specified as type-level functions, since they operate element-wise. Matrices themselves are inductive structures defined by type-level numerals, so this seems like a good fit. This solution is probably the best, but rather cutting edge at the moment. Haskell only just got type-level number support.
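To give a flavour of option 3, here is a minimal sketch of length-indexed vectors in Haskell (a toy, not a proposal for the full matrix-with-units problem):

{-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

data Nat = Z | S Nat

-- the length lives in the type, so mismatched dimensions are compile errors
data Vec (n :: Nat) a where
  VNil  :: Vec 'Z a
  VCons :: a -> Vec n a -> Vec ('S n) a

-- element-wise combination of two vectors of the same static length
vzip :: (a -> b -> c) -> Vec n a -> Vec n b -> Vec n c
vzip _ VNil         VNil         = VNil
vzip f (VCons x xs) (VCons y ys) = VCons (f x y) (vzip f xs ys)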

Null exceptions are a great

Null exceptions are a great example of a frequent dynamic error that does not arise in decent typed languages.

Compile times are a non-issue. If C++ or Java is the basis of your experience then rest assured that their abysmal compile times have zero to do with typing.

More importantly, you are probably not aware that competent programmers in modern typed languages make heavy use of typeful abstractions to get much more checking out of types than just the naive errors you are familiar with.

Evidence? I've seen static

Evidence? I've seen static languages that convert NPEs into some other kind of error, like case-not-matched; it seems dishonest to me to say "they eliminated the problem!" Surely, a decent static type system can get rid of the more careless NPEs, but those are the ones that are easy to fix anyway and can be detected by unsound bug-checking tools such as Coverity. The hard NPEs don't go away at all; e.g., the ones that arise during complex initialization of a cyclic data structure.

Abysmal Java compile times? Java compiles fairly fast; C++ compiles slowly because it's just an abysmal language (templates). Scala compiles slowly mostly because of its advanced type system. But whatever, all compile times are rather reasonable.

Ideological programmers spend lots of time either (static) designing nice typeful abstractions that may or may not pay off, or (dynamic) writing unit tests for everything that may or may not pay off. Competent programmers are much more pragmatic and look for the best value of both approaches.

Null pointer exceptions are so 1980

Anecdotal evidence is that I can't even remember when I last saw, say, an 'Option' exception in ML, which would be its equivalent to an NPE. Simply because types are not implicitly inhabited by null, you only use the explicit option type in the rare places where it is actually necessary, and you always case-eliminate it explicitly. I'm pretty sure that most programmers in similar languages can confirm this experience.

My point was that this can

My point was that this can work for eliminating the careless NPEs, but not the hard ones that we are really worried about; the ones that would really require some amount of theorem proving to eliminate statically. This is from my experience with Scala (which supports both).

Don't blame null.

I agree with Andreas for the most part. Variants and pattern matching really eliminate nearly all of these kinds of errors. You just simply match over the patterns and are usually prepared for the "none" case. Coupled with nice pattern-match syntax, it makes putting in a default case really easy. What is truly evil is the combination of objects that can be null and trusting the object to do the dispatch (virtually or multiply). The obsession with objects and structs and pointers is really the culprit, IMO, not the concept of a null or default value.

For the life of me, I don't

For the life of me, I don't know how None is any different from null. How is None somehow safer than null? Is it just a nice pattern match, or is it the fact that you can specify that the result is not null (say Is(t) vs. Maybe(t))?

The problems with object dispatch don't go away when null is transformed into none. Sure we can delay knowing about the problem with Map, but this is often more harmful than useful.

Because variants don't have

Because variants don't have the properties that their cases offer: if you have e: Maybe T, you can't just write e.someMethodOnT(); instead you have to either pattern match or do an e.assertExists().someMethodOnT() or similar, the latter being a runtime check (for example a helper function assert() of type Maybe T -> T). In practice, languages with pattern matching make it so much easier to do the pattern match right where you are that you almost never do the latter. Besides, Maybe is simply *screaming* at you that it might be None. You can use the type system to push the asserts out of the internals and into the interface; inside you just use T everywhere, and by virtue of the static type system your code assumes, indeed *screams*, that the T must exist to use it.
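In Haskell-ish terms, "pushing the check to the interface" looks something like this (hypothetical names; the point is that the internals never mention Maybe at all):

data User = User { name :: String }

-- the boundary: the only place that deals with "might be missing"
greetById :: Maybe User -> String
greetById Nothing     = "no such user"
greetById (Just user) = greet user    -- from here on, a User is guaranteed

-- the internals: no null/None checks anywhere, the type says it exists
greet :: User -> String
greet u = "hello, " ++ name u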

Pattern matches are still

Pattern matches are still expensive syntactically. Most of the time, you aren't really sure what to do if the value comes up null; you aren't really expecting it. So with options you either just de-ref immediately (incurring a run-time NPE on failure), or push it to the method you are calling, which might have legitimate reasons to go either way based on dependencies that you can't express in the type system but are still (you think, without proof) static. Such is the world of imperative programming.

pattern matches are worth the syntactic cost

Having worked on large-scale codebases in ML, the syntactic cost of match seems quite minimal compared to its bugfinding power.

The idea that this would only get rid of "trivial" errors does not ring true to me in the least. Pattern matches and variant types help you avoid many errors that otherwise cause real systems to fail. I suppose in some sense these errors are trivial, but the point of a good type system is to enforce simple rules uniformly (and cheaply), which lets you spend more time on the deep structure of your program.

Not if you are only pattern

Not if you are only pattern matching on None and Some, and especially when you have nothing interesting to do with the None match other than crash. Pattern matching is much more useful for other cases, but Opt isn't really one of them. The only win I can think of is when you have interesting patterns wrapped up in an Opt; then at least the complexity of the Opt check is amortized by other interesting work.

Pattern matching with option

is I think still quite useful. I agree that there are other, richer uses of pattern matching. But my experience, which includes dealing with large scale production systems where correctness is of great importance, is that having the types tell you when you do and don't have to worry about null is itself quite valuable (because if you play your cards right, for most values, you don't). Indeed, I wish I had better hints from the type system about other kinds of partiality, like exceptions.

None vs. Null

There are three primary differences:
1) Maybe is parametric, i.e. you can have `Maybe (Maybe a)`.
2) Developers must explicitly check for `Some a | None` and handle the None condition. Or they must abstract some patterns (functor, applicative, monad, etc.) to perform these checks on their behalf.
3) Developers can write functions that safely assume all arguments are actually provided or results will be valid (by *not* using Maybe). Thus they can control where and when the problem is experienced.

Of course, many languages with Maybe still have their equivalent to `null`. Such as `undefined` in Haskell. But developers can be disciplined and not use undefined. I certainly avoid its use whenever possible, even if it means plumbing Maybe throughout a program.

Some variations of Maybe, such as Nullable, drop the parametricity property. It can be difficult to reason about generic programming or algebraic properties of types without parametricity (1 + 1 sometimes equals 1).
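A small, made-up illustration of the parametricity point: with a proper Maybe, `Maybe (Maybe a)` keeps the two kinds of "missing" apart, where a flat Nullable collapses them.

-- hypothetical example: looking up a configuration setting
lookupSetting :: String -> Maybe (Maybe Int)
lookupSetting "timeout" = Just (Just 30)   -- key present, value set
lookupSetting "proxy"   = Just Nothing     -- key present, explicitly unset
lookupSetting _         = Nothing          -- key absent altogether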

You earlier state:

The hard NPEs don't go away at all; e.g., the ones that arise during complex initialization of a cyclic data structure.

This does not match my experience. But I should clarify by saying: there are other idioms for complex initialization of cyclic data structures than use of nulls and state. Laziness, fixpoints, single-assignment logic variables are all possibilities.
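As a small example of the laziness option (a sketch with made-up names), a cyclic structure can be built in Haskell by "tying the knot", with no nulls and no mutation:

data Node = Node { label :: String, next :: Node }

twoCycle :: (Node, Node)
twoCycle = (a, b)
  where
    a = Node "a" b   -- a refers to b...
    b = Node "b" a   -- ...and b refers back to a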

Combination of things

I agree with Andreas as well, but also agree with your point. For one, I think what you call the careless case is the common case, and thus eliminating it eliminates most errors (and yes, the main point is that results are not null unless specifically wrapped in an Option type). The other thing is that the languages that encourage option types and pattern matching are functional and discourage the type of imperative hook-up-the-pointers-one-at-a-time style that leads to the deeper problems you're talking about.

I would argue strongly

I would argue strongly against the last point: they make the hook-up-pointers harder in general and therefore discourage those designs, but you often need those designs! So your FPL actually gets in the way at this point, b/c you need to do something bad like express a real graph.

At least Scala gives you flexibility, there are some nice use cases for Option but most of the time you just deal with raw references. Option definitely doesn't make null go away as a concern; null is often the lesser of two evils.

Sort of agree, but not really

Sometimes what you need is mutation (e.g. updating a graph), but as I think back on NULL pointer bugs I've tracked down, I have the impression that most of them weren't from an error in the logic of a function that used mutation in a disciplined way. Rather they were the result of reasoning over convoluted imperative code that attempted to handle complex state transitions with ad hoc mutations. Or where helper functions were called on partially updated objects that should probably still satisfy the invariant of the helper (right??).

Of course, you can avoid these kinds of traps/bad designs in imperative languages, too, but the 'immutable by default' mindset helps.

Working in the reactive

Working in the reactive domain, I write a lot of non-deterministic imperative code, so my experiences are definitely different. In this case, the null arises because my orderings are off and I need to add more run-time enforced ordering constraints to the system.

In my field, immutable by default is never really a serious option for the important stuff. That is something you save for your small value-like data structures.

Edit: I guess the ability to avoid NPEs with better constructs is related to the ability to adopt functional programming, not necessarily as an aspect of a powerful type system (though FPLs generally have those more than imperative languages). Throwing support for options into an imperative language alone isn't good enough, the whole style of programming has to change.

Non-proliferation of failure

How is None somehow safer than null? Is it just a nice pattern match, or is it the fact that you can specify that the result is not null (say Is(t) vs. Maybe(t))?

Both, but the latter in particular. In practice you can confine use of optional types to very few places. And if you do, there will be clear, explicit points in the program where you go from Opt(T) to T, which then can fail locally. But failure cannot silently proliferate to unrelated sites in the code.

It is true that cyclic initialization is a challenge. But (1) it is rather rarely needed in full generality IME, (2) for simple cases letrec is enough, (3) for others you can use (implicit or explicit) laziness instead of mutation, see Haskell, where FRP works just fine, (4) in a language that is impure anyway nothing prevents you from using Ref(Opt T) if you must, and (5) Promise(T) or some other form of single assignment reference actually is the more adequate solution if you really need incremental construction.

The huge advantage of Promise(T) over Ref(Opt T) again is that failure cannot silently proliferate. When you access an uninitialised field you get an exception locally and immediately.
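A single-threaded sketch of that difference in Haskell (illustrative only): a write-once promise whose read fails immediately if the field was never initialised, rather than handing out a value that blows up somewhere else later.

import Data.IORef

newtype Promise a = Promise (IORef (Maybe a))

newPromise :: IO (Promise a)
newPromise = fmap Promise (newIORef Nothing)

fulfill :: Promise a -> a -> IO ()
fulfill (Promise r) x = do
  old <- readIORef r
  case old of
    Nothing -> writeIORef r (Just x)
    Just _  -> error "promise already fulfilled"

force :: Promise a -> IO a
force (Promise r) = do
  v <- readIORef r
  case v of
    Just x  -> return x
    Nothing -> error "accessed uninitialised promise"  -- fails here, not later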

None of these techniques is specific to functional programming either. It is true of course that you run into fewer problems with fewer side effects. But that's an advantage of FP, not a disadvantage of explicit option types.

The huge advantage of

The huge advantage of Promise(T) over Ref(Opt T) again is that failure cannot silently proliferate. When you access an uninitialised field you get an exception locally and immediately.

Is that true? If I access a field with an option type, then my failure will still silently proliferate. Likewise, if I access my fields and immediately do something with it (even if just assert-not-null), then my NPE will go off in the right place. The only advantage we get is that we can declare that the null check was already done in the type system by returning a non-nullable T.

Ok, the extra dethunk step makes the check explicit, and is probably more elegant than assert-not-null. And the non-nullable T saves on redundant null checks, but then you have to merge back into an Opt[T]. Again, if you just dump Opt[T] on someone in an impure language, it's not meaningful without changing how they program in more drastic ways.

And that is cyclic recursive state that must be mutated and updated incrementally (the world changes, fix the model). But I don't want to get into another argument about FRP nested in this one :)

Good will and discipline

Likewise, if I access my fields and immediately do something with it (even if just assert-not-null), then my NPE will go off in the right place.

Right, except that often enough that's not what is happening, because there is no distinction between ordinary and "dangerous" variable accesses, and it's just too easy to forget about it. I think the omnipresence of NPEs in production code is evidence that good will and discipline alone is not sufficient. (And I have certainly fallen into this trap myself.)

Again, if you just dump Opt[T] on someone in an impure language, its not meaningful without changing how they program in more drastic ways.

Yes, you have to make good use of it (essentially, by avoiding it whenever possible). But it's not a "drastic" change by any measure. And I fail to see what it has to do with pure vs impure.

Re: Good will and discipline

Good will and discipline

And that is the difference between the philosophy of a strongly static language vs. other languages (dynamic or just statically typed with an inexpressive type system). Brush your teeth, go to the gym, meditate, control yourself, and think deeply before you do anything.

NPEs have a quick-but-dirty quality that can't be dismissed as completely unreasonable, and it contributes to their continued popularity versus more typeful and safer alternatives. Getting back to the question at hand, then, it's a matter of reasoning about which one is better in a particular context.

Strong update

Regarding cyclic initialization, I think none of your solutions is entirely satisfying; that's one not-infrequent use case of `Obj.magic` in production OCaml code. There are people working on type systems with strong update¹ and I think we'll eventually have those statically checked in impure typed languages.

¹: local bias and all, I'm thinking of François Pottier's work, but there is also for example Fähndrich's work on "Delayed Types".

Cyclic initialisation

Regarding cyclic initialization, I think none of your solution are entirely satisfying

I agree wholeheartedly, but they are all better than mutation-based initialisation with implicit null pointers (or if you ask me, using Obj.magic, for that matter).

Cyclic initialisation is one of these idioms for which no good static checking seems to exist yet, at least not for the general case. There have been quite a few papers on the subject, but AFAICT, none of them seems sufficiently practical.

So we have to live with dynamic checking for the time being. But have that as tight as possible! Null pointers are the opposite.

Unison

We indeed agree. I think those systems will become practical soon. I don't usually play the prediction game, but I bet that this class of "simple errors" will have been globally eradicated in ten years, either by explicit adoption of richer type checks in the mainstream language du jour, or equivalently sneaking in as a separate static analysis component that gets enough widespread usage in practice to be considered standard -- sadly, I'm under the impression that Microsoft is ahead of the game there.

All of that static analysis

All of that static analysis tech excels at detecting NPEs because the easy stupid cases are easy for the tools. The complicated cases are just not amenable to anything but theorem proving, but we aren't arguing about that.

Given that dynamic languages keep getting more popular, not less, I think the PL community really needs to rethink fundamental assumptions about where programmers really find value! NPEs are apparently not painful enough to push the mass adoption of safer languages, and in fact the opposite is occurring.

I'm not convinced the

I'm not convinced the adoption of dynamic languages is being driven by null pointers and less safety. Rather more likely, it's being driven by features that simply aren't available in prevalent static languages, such as concise and more powerful reflection.

My only point was that

My only point was that dynamic languages are being adopted in spite of null pointers (and consequently less compile-time safety). I'm then guilty of concluding that the problem with null pointers is not large enough to deter adoption.

The most valuable reuse is

The most valuable reuse is specifically unanticipated. Such things stretch the boundaries of what can be done with available elements; stretch a bit here, stretch a bit there, and it all adds up — in ways that by nature cannot be specifically foreseen, which is the point. It doesn't take much to destroy that, by making each of those bits impossible to stretch. Dynamic languages maximize what can be stretched.

Dynamic languages maximize

Dynamic languages maximize what can be stretched.

I don't think this is true -- particularly not if you intend "dynamic languages" to include the style of reflection popular in dynamic languages. It might be true in comparison to currently popular static languages, but ultimately static languages will be able to offer flexibility that requires an understanding of the code that dynamic languages lack.

ultimately static languages

ultimately static languages will be able to offer flexibility that requires an understanding of the code that dynamic languages lack.

I've heard something like this said before. I'm struggling to grok the intuition behind it (my own intuition suggests it should be a secondary effect, whereas dynamic languages removing restrictions is a primary effect and therefore likely to overwhelm the secondary; but being proven wrong would be good too :-).

Can you, by any chance, give a simple example illustrating additional flexibility enabled by static languages? (To help kick-start my intuition.)

I've dipped into this area a

I've dipped into this area a couple of times before. Don't just think of types as a bondage and discipline thing whose jobs are to say no (for your own good). Think of them as providing semantic feedback (code completion) and peripheral awareness about relationships in your code.

But a static language can never be as "flexible" as a dynamic language if by flexible you mean "accept more programs:" the dynamic language has the additional luxury of saying no less because run-time data is present.

I'm not exactly sure what Matt is referring to on reflection. You can always reflect your program at run-time. The difference is that you might not be able to detect static relationships that could be inferred by a compiler statically, which you could use to change behavior of the program; i.e., you know something will happen in the future and run some code now to prepare for it.

Technically

But a static language can never be as "flexible" as a dynamic language if by flexible you mean "accept more programs:"

Well, technically speaking, that is of course incorrect. You can always weaken the precision of a type system as far as you like, even to the point where only one type is left (as Bob Harper likes to insist, untyped is a special case of typed). But you also lose all the advantages on the way.

That is almost the same as

That is almost the same as saying dynamically typed is a special case of statically typed, which I definitely have no qualms about (most statically typed languages include some amount of dynamic typing). More static typing generally means less flexibility, although the static typing added can be flexible (e.g., structural vs. nominal types). Ideal static typing would be as flexible as full dynamic typing, which is not achievable in practice (it requires heavy amounts of automated theorem proving).

Examples and what 'dynamic' means

For one example, add automatic unit conversions/dimension checking to a physical calculation. We can do that in a dynamic language, but in a calculation intensive program might lose an order of magnitude of performance [edit: It wouldn't surprise me at all to learn that this would be almost free in good implementation of a dynamic language -- this was just on my mind having tried this recently in Python. Feel free to ignore this example.].

For an example of something that just isn't possible dynamically (which isn't to say that it's impossible in a dynamic language, but you have to statically encode the extra data), pick something that requires analysis before the fact, like Sean said. Maybe we want to convert a function to run on a GPU. Maybe we want to build a schema describing the data used by our program before running it.

It's possible that we're not using the words the same way, but to me dynamic is more than just untyped. It means that the constructs of the program don't have a chance to participate in the semantics until run time, when they are executed. In a static language, you have a static pass in which the constructs contribute statically to the semantics. The way I use the words, that static pass can do more than just type check and reject programs from the later pass. When phrased this way, it seems clear that the static approach is more flexible.

On the units: you mention

On the units: you mention that a good implementation might compile the dynamic checks away if it can prove that they cannot fail. Even better is if the implementation does that in the compiler rather than in a JIT/at runtime, because then you can show the programmer which unit checks are guaranteed to succeed. You can generalize this to: run-time assertions + compile-time analysis gives you a type system that tries to prove that things don't fail (and dually, randomized/whitebox/etc. testing of assertions that tries to prove that things do fail).

You can run code on a GPU in a dynamic language. You just need to have access to the code at run time, which for example in Javascript you have (calling tostring on a function gives you the code).

In my opinion the largest loss of dynamically typed languages is type classes, and inferring code from types in general.

No smoking gun

To me, dynamic programming languages munge data and meta-data that should be phase-separated. Yes, you can probably tease that data back apart for an efficient implementation, but that comes with a complexity cost. Also, that mixing just makes the code uglier, IMO. Type-directed code selection (type classes) is a good feature, but honestly you can probably hack together something dynamic that works almost as well there too, and in the rare cases where it fails to pick the instance you want, lets you specify it manually. (Note: I think that type class instances should be resolved at edit time).

Phase Separation

Perhaps you'd favor Newspeak's approach to reflection, which provides securable separation of data and meta-data, and would not be unsuitable even for a static language.

There is a lot of design space between dynamic and static. One I'm interested in involves live staged programming, where each `stage` seems dynamic to those below it and static to those above it. It isn't very clear to me whether such a language should be considered static or dynamic, but `phase separation` should be clean enough.

I think Newspeak's approach

I think Newspeak's approach is in the right direction, but doesn't go far enough to fixing reflection. Or maybe it does -- I haven't studied it too deeply.

Can you elaborate on live staged programming? I think I know the problem you're trying to solve, but I don't know what you mean by "seems dynamic" to the layer below.

Live staged programming

Live programming means I can modify code while it is running. Staged programming means I use one program to build another, which is subject to its own compilation. Live staged programming is at the intersection of these properties. In my case, I model a program as continuously, reactively building and modifying another program.

I think of this as a tower, each layer a program that is built and maintained by the layer below it. At any given layer, if we look down or around everything has a longer or equal lifetime than we do (and thus seems static), and if we look up everything has a shorter lifetime than we do because we're responsible for maintaining it and the code we install is subject to runtime data (and it thus seems dynamic). Static vs. dynamic is relative to our point of view.

Many of these towers may exist in parallel, and communicate indirectly. This models multiple applications, or federated services.

The lowest layer, human source code, is no different. I.e. we can model the interpreter that turns source code into a program as the next layer down. We can model traditional compilation by asserting that certain inputs are frozen, though nothing like that really exists (in practice all software, all the way down to firmware, is subject to updates).

More interesting is the reasoning we can perform at each layer, and for communication between them. I.e. we can ensure correctness of any given subprogram before installing it, but we need some way to reason about correctness of behavior over time.

A constraint for live staged programming is that I do not model state as internal to the program. I need to replace subprograms without losing information. So I model state as external to the program, e.g. in abstract databases or registers.

Interesting, but nested too

Interesting, but nested too deep, need more new topics.

Live programming is hard

I agree the way to approach it is to isolate state. And even then, care will need to be taken to support it. For example, closures in that state need to be tamed.

I'm still not sure I follow what you have in mind by "dynamic." Do you just mean that you use a tagged representation for values in a higher layer? i.e. you have a type Entity which represents a value in the typed layer above you, which includes a descriptor of the type of that value?

I agree with Sean that we're writing into the margin here.

I've done this before of

I've done this before of course. The state has to be externalized somehow and managed somehow, which SuperGlue did using cells that generally persisted across program changes. The recently introduced Circa language proposes something like inline state, which seems similar to what I used cells for. This is a very interesting area and I think this is the year to start work on it. If there are any bright students who are interested in this topic and want to come to MSRA for an internship...

Inline State

I've approached various notions of `inline state`, but I've not found a satisfactory concurrency model for it that doesn't rely on linear types. The issue lies in controlling the update function concurrently.

External state can look and feel a lot like internal or inline state. For example, an IDE could actively display external state mixed in with the code, so long as it has a stable static identity. This is the approach I pursue.

The difference is that external state comes with its own update model. I've developed a few state and update models suitable for a reactive, event-less paradigm - tuple spaces, reactive term rewriting, reactive state transition systems, even some stateless stable models.

Dynamic just means behavior

Dynamic just means behavior is determined at runtime. The important bit is that you can't reason about a dynamic program the same way you can reason about a static program. (But this isn't a binary property.) Perhaps you connote "dynamic" specifically with "dynamically typed", but I do not.

[edit] Relevantly, reasoning about correctness means much more than reasoning about data types, at least in an open system. I favor capabilities to provide structured control for access to effects, but they must be leveraged in a disciplined way to augment static reasoning.

relative to POV

dmbarbour: Static vs. dynamic is relative to our point of view.

Very nicely said. I can't encourage you enough to pursue that line of thought. Next questions sound like: When is this POV applicable? What can be seen from there? Do contracts vary by POV? Which invariants are true across which views?

Considering multiple views at once gets confusing. Terminology is sparse and weak here, and could use basic elaboration.

The difference between global and local is similar. Managing that distinction is a major reason for several kinds of technique, including object oriented programming (which I have always considered just namespace local dispatching). In the other thread where I mentioned boundary logic, I later decided it would be useful to discuss global-vs-local views, but I restrained myself.

Just like nothing is completely static when everything can be updated, it's hard to find a completely global context.

If you're selecting the code

If you're selecting the code at edit time then indeed dynamic languages can support it as well (because then the inference does not need to be perfect). That's a great idea actually! I'd love to see it explored for dynamically typed languages.

Can you expand on what you consider data and what you consider metadata and why they should be phase separated?

In the case of units the dynamic approach shines in simplicity (just overload arithmetic) whereas adding units statically is hard or impossible to do from within the language in most languages. Units would be very helpful for my work, but unfortunately the static variant as found in e.g. F# is not enough. For example I need matrices with different units in different entries. This already requires a fairly elaborate type system. [Since I only care about a single result, the static guarantees add nothing in this case. If the program I ran to get the result did not have unit errors in that particular run, that's enough. I don't care whether there exists another run that can have unit errors (which is of course extremely unlikely if it does not have unit errors in a non-trivial run).]
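For concreteness, a rough sketch of the dynamic "just overload arithmetic" approach, written in Haskell only because that's the language used for the other sketches here (the representation is mine; a real library would be richer):

import qualified Data.Map as M

-- the dimension travels with the value and is checked at runtime
data Qty = Qty { mag :: Double, dims :: M.Map String Int }
  deriving Show

qty :: Double -> [(String, Int)] -> Qty
qty x ds = Qty x (M.fromList ds)

instance Num Qty where
  Qty x dx + Qty y dy
    | dx == dy  = Qty (x + y) dx
    | otherwise = error "unit mismatch in addition"
  Qty x dx * Qty y dy = Qty (x * y) (M.filter (/= 0) (M.unionWith (+) dx dy))
  negate (Qty x d) = Qty (negate x) d
  abs    (Qty x d) = Qty (abs x) d
  signum (Qty x _) = Qty (signum x) M.empty
  fromInteger n    = Qty (fromInteger n) M.empty

-- qty 3 [("m",1)] * qty 2 [("s",-1)]  gives 6.0 m/s
-- qty 3 [("m",1)] + qty 1 [("s",1)]   fails at runtime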

Good example

By meta-data and data, I mean that the type descriptor is meta-data and the value is data. You shouldn't be able to ask "what type of thing is this?" or even "is this an instance of class X?" like you do in dynamic languages.

Your Matrix example is interesting. I would have Matrices with varying dimensionality of coefficients in my (Python) code base, too, if I wasn't projecting to scalars first for efficiency. I think I can do a good job of solving this with my language, and I'm going to steal this as an example problem.

I don't think practically we want to get into dependently typing all of your matrix operations. Rather, we want to have a type Quantity that has dimension as part of its value and then specialized types for e.g. Length, that subtype Quantity. So the Matrix would have entries of type Quantity and dimensional agreement would be something you could assert and prove separately. Such assertions could be used by an optimizer to e.g. do a fast O(MxO) pass to figure out the dimensions of the result when multiplying MxN and NxO matrices.

Did you choose the option

Did you choose the option "HTML" instead of "Plain Text + HTML"? In any case you can separate paragraphs with <p>.

For matrices as they are normally used, namely as linear functions, the situation is even better. The type of a matrix is a function type vec[n] -> vec[m]. You need n+m units for this. Multiplying matrices is function composition, and can be checked in O(m) time.

The question about data vs metadata comes down to whether it's a good idea to have a Top type that you can do anything with. I've been thinking about a dynamically typed language where you cannot ask the type of something, where type tags are used only to detect errors. Types become purely metadata, but they are still spread between two stages: checked at compile time where this can be done automatically or where a human is willing to prove it by hand, deferred until runtime where that's too onerous.

On the other hand, type dispatch is a convenient mechanism to implement serialization/deserialization, quick and dirty presentation to humans, etc. What did you have in mind to solve this problem? If you have ("dynamically typed") type classes then this concern might go away, but isn't a sufficiently powerful type class mechanism indistinguishable from type dispatch?

Thanks

Yes, HTML was selected.

I agree that dynamic tagging for type safety isn't as bad. That's what I had in mind when I wrote "particularly not if you intend [...] the style of reflection popular in dynamic languages" in my original reply to John Shutt.

I might use "type classes" to pick a serialization routine to use when based on the type of "x" when you write "serialize x". But that serialize will be written using meta-data, not by recursively invoking serialize.

Function.prototype.toString

You just need to have access to the code at run time, which for example in Javascript you have (calling tostring on a function gives you the code).

That is not quite true. First of all, toString is not guaranteed (by the standard) to return the original source code. More importantly, it does not give you access to the lexical environment of the function, so that you cannot properly interpret the function's free variables. Some JS libs do some hacks based on toString, but it's really just that, awful hacks, not usable in general.

Right, I did not mean that

Right, I did not mean that Javascript can support it well (or at all), just that it's not impossible in dynamically typed languages.

Typeful Dynamic

the largest loss of dynamically typed languages is type classes, and inferring code from types in general

You'd have a hard time getting that position past CLOS users.

There isn't any particular difficulty supporting type-specialized dispatch tables - for predicate dispatch, multimethods, typeclasses, etc. - separate from the objects.

For securable, scalable, open distributed computing, I'd prefer these patterns be explicit (no implicit global dispatch tables!). Regardless, they're valuable patterns.

Can you

Can you express e.g. the Read type class in CLOS?

Inference

CLOS does not natively support inferring the output type for selecting a read method in Lisp the way Haskell would. But if you are willing to provide an indicator of the output type (a symbol, for example) then there should be no difficulty expressing a Read class in Lisp.

In practice, the question would not arise. You do not express Haskell's `Text.Read` in CLOS. There are more dynamic approaches to extensible grammars.

[edit:] I suggest we do not conflate the issues of inferring the type and inferring code from the type.

No conflation

The whole rave about type classes comes from the way they allow inferring code along with the types. If you had to give the types explicitly, then you'd lose 95% of the power of type classes. So you have to consider both kinds of inferences together.

Requiring an explicit type at every use site is no better (likely worse) than having to explicitly pass some proxy value to dispatch on, so what would be the point?

If you have generic or

If you have generic or first-class types, much of the power of typeclasses is retained. In Haskell, it is not unusual to annotate values with types when we need specific results (e.g. `typeOf (undefined :: b)`). Nor is it unusual to annotate every module-level function so inference remains local.

Requiring an explicit - but first class - indicator of type is much more expressive than annotating types at every call site. It is little different than requiring a value to dispatch on, except that you don't have a full value for specifying return types.

Pessimistic

I'm a bit more pessimistic, actually. The real usability problems pop up for recursion across abstraction boundaries (e.g., separately compiled recursive modules, classes, recursive higher-order functions, etc). Then no clever analysis helps, you have to force the user to specify all kinds of dependency annotations at the boundary. And for anything non-trivial that is going to be highly tedious and brittle.

Not necessarily Tedious or Brittle

Clever analysis - in the form of soft, stable constraint models - can help a lot even across abstraction boundaries. Configurations and dependencies don't need to be tedious or brittle - they can be typeful, open, extensible, and securable.

Today we have a lot of patterns - dependency injection frameworks, abstract pluggable factories, etc.. If we formally and pervasively model resource discovery, constraints, and linking, I believe we could achieve many advantageous properties. It's just an area of language design that seems neglected.

The usability issue is with specification, not checking

I probably wasn't particularly clear. The problem I'm pointing out is not one of analysis (e.g. respective type systems already exist) but one of user-required specification. At an abstraction boundary, the programmer necessarily has to write down all constraints on the import/argument/whatever, otherwise there can be no separate compilation/compositional type-checking.

Assume I write a function that recursively combines two arguments, e.g. with some form of mixin composition:

f (a : A) (b : B) = ...a with b...

There is no way you can type-check such a function without requiring the user to express all the necessary dependency constraints on a and b explicitly, e.g. as part of the types A and B.

I agree there is a

I agree there is a specification burden at interface boundaries, but I'm not sure they are "fragile". I believe that if a concept is semantically necessary for reasons that can be understood, the interface exposing this concept can be made equally easy to understand. So I hope that, if the user is able to write this code, he should also be able to explain why it works. Of course, finding the right interface description language is difficult and a research topic.

(Regarding termination of definitions across recursive mixins, I think we should try reusing the sized types approach of the dependently typed community, which has had the same problem with stronger incentives to check it statically, but perhaps a bigger complexity budget.)

So I hope that, if the user

So I hope that, if the user is able to write this code, he should also be able to explain why it works.

While a seemingly reasonable assumption, this is where I think typing falls down for some people. Explaining why a program works takes place in a different language (the "impoverished" type language) than writing code (the rich term language). I speculate these "two levels" are a main reason dynamic typing enthusiasts prefer tests and contracts, which are all term-level.

I'm not sure what bridging this gap would entail. Some way of unifying types and terms in some model of staged compilation I imagine, perhaps via some sort of flow and path sensitive abstract interpretation. You would still want some sort of concise description of semantics that we might call a type, but it would also need to be a meaningful term of some sort in some earlier stage.

For instance, a type is a program the compiler executes to generate a residual program that is serialized to disk. The term-type distinction merely distinguishes the execution stage. Note that you still have what amounts to a "type system", but it's all within a uniform syntax. I haven't even begun to think what such a uniform syntax might look like.

I'm wondering how far we can

I'm wondering how far we can push whole-program analysis in a language as a form of type inference. Haxe does whole-program analysis to seemingly good effect, but they make their type system less expressive, I think, to get there.

Unification of terms and types is a good goal. A type is merely an abstraction of a term anyways, could we possibly make them (a) executable and (b) generally check that an abstraction of an expression by a type is reasonable (aka, type checking)?

A type is merely an

A type is merely an abstraction of a term anyways, could we possibly make them (a) executable and (b) generally check that an abstraction of an expression by a type is reasonable (aka, type checking)?

Indeed along the lines I was thinking. "Types are propositions, programs are proofs", so why not have propositions and proofs be the same term language you are compiling, just executing in a different phase?

For simplicity, assume a strongly normalizing untyped term language with staging support. You can then define a DSL that provides ML with HM inference, which essentially means evaluating the propositions of the type program, producing either false with a list of errors, or true with a code fragment consisting of the residual program.

This has the flavour of Active Libraries and fexpr, since some control over environment would be required to compile some typed programs.

Dependent types

Isn't this what dependent types are all about? That types are simply aspects of terms in your programs?

Yes, but I was thinking of

Yes, but I was thinking of something more restricted than full dependent types, and perhaps a little more familiar to dynamic typing enthusiasts. One of the main problems with dependent types is that types can't be erased, in general.

Although in principle, I suppose that there's little difference between the two, since any set of boolean propositions + a program composing those propositions you'd care to write in my approach, could cross stages into the residual "compiled" program as well.

Magpie's type system was

Magpie's type system was something like this at some point.

Bootstrapping a Type System

Sort of. I definitely wish

Sort of. I definitely wish to be rigourous about the phase distinction; not sure if Magpie is. The nature of the propositions you can write is open-ended, where Magpie seems inherently restricted to lexical languages.

The idea is essentially: start with Piumarta's open, extensible object model which has 3 primitives: objects, symbols and vtables, with 5 associated methods. We can translate to a more standard approach and consider values, environments and symbols to be primitives. Add metacircularity via a staging distinction. Piumarta has the beginnings of such a distinction via symbol interning, ie. a string value (stage n) becomes a message type (stage n+1) once interned.

We can use this to introduce a 'compile' interpreter, which partially/abstractly evaluates the program and produces a residual program, where values in stage n become static bindings and types in stage n+1.

You could do something similar in a typed context via a tagless final encoding, although I'm not sure it makes as much sense. The metacircularity is where the real trickiness comes in. Jacques Carrette mentioned he was working on this IIRC.

Definitely going to have to try this out when I have some time.

Types carry propositions

I've mentioned my thinking on this before, which is to make the type system a mechanism that automatically infers propositions about the code. Not Curry-Howard style, but rather direct propositions about the code (e.g. "f: T->T" ===> "forall x:T. (f x):T".) You can always drop to the ambient logic to do your reasoning. A proof can be presented as a sequence of true assertions close enough together that the gaps can be filled in automatically.

Regarding the specific question of how to prove cyclic dependencies terminate in the abstract, I think the idea will be to provide a mechanism for reasoning as above, and then box up a few patterns of reasoning that come up.

Separate compilation is

Separate compilation is something I'm willing to weaken a bit - i.e. true "separate" compilation only for service abstractions rather than ad-hoc abstraction boundaries. There are viable alternatives to separate compilation within each service or application, such as caching and incremental or staged compilation.

When I spoke of constraints, it wasn't just for checking. They serve simultaneously as specification, telling the linker or compiler how to discover dependencies and adapt applications to their environments.

~No experience with H-M inferred languages

My comparisons are mostly based on Java/C/C++/Pascal style type checking. Those are the ones most people (not here on LtU, but most people in general) are thinking of when they talk of typed languages. And I feel the null-hole in the type system is so large that it makes it mostly meaningless.

I have no significant experience with static typing in non-mainstream typed languages; I've played a tiny bit with Haskell, but that's about it. I know about the use of types more extensively, but I have no experience with how much difference that makes in practice, and only a vague understanding of how some of it is done (ie, I know about algebraic data types and some of the things you can do, and monads and some of the things you can do, but haven't used them for structuring anything.)

I've seen people say that it makes a large difference, and a couple of anecdotes from people that have transitioned away from Haskell that say that having things checked doesn't make much difference after all, though knowing the style is helpful.

What I do bring to the table is fairly long experience with dynamic languages, and the experience that the problems "static programmers" have with dynamic languages aren't noticeable problems once you've got some years of experience with dynamic languages. You learn where to direct your attention and unconsciously avoid programming patterns that would create problems. You also get much more benefit out of the dynamic parts, as you learn ways to use them. This also goes in the opposite direction: when you've used a restricted tool for a while (e.g., a typed language), you don't even think of the ways of structuring your code that would not be possible to express in that language, and you learn to exploit the strengths of that language.

Take existing populations of

Take existing populations of code with accessible source and source history (github, sourceforge, mozilla's browser and javascript, etc.). Run Lint-like error-detecting algorithms on them at different snapshots in time. Categorize the errors. Identify which classes of errors would likely have been caught with an ML system. Compare projects using unit tests against those that do not. Compute how long errors remain in the program, and how often they regress, using those histories. Perhaps estimate `overheads` - how much work goes into unit tests per percentage point of benefit.

We should mine the resources available to us, take a more `naturalistic` view of CS (similar to social sciences). Lab approaches might be better in some cases... but it's a lot easier to buy some processing time on EC2 to run snapshots of a thousand project histories.

I expect raw data will serve us best for the sorts of questions you ask.

I'm not sure what we would

I'm not sure what we would learn from this. The programs are written by different populations and usually in completely different domains. So if a web app is written in a dynamic language and accesses a bunch of web services, what would you compare it to? By going with a more scientific lab approach, you could come up with some real conclusions.

The code data mining approach is definitely very useful for other tasks, like figuring out interesting API correlations, clone discovery, and so on. It's definitely the direction that we want to go in, but then we are avoiding looking at problems that require expensive lab testing.

The scientific lab approach

The scientific lab approach might give you some conclusions. Whether they can be generalized, or are even applicable in a realistic scenario, is a different question. I would not be surprised if using FFI and web services extensively mitigates the need for or utility of rich static types (unless, perhaps, you use some sort of zero-tier application approach like Opa or Ur).

IIRC, it is especially hard to account for long-term effects in a lab setting with humans, e.g. the expert's curve, or network effects from sharing libraries and frameworks, or the growing cost of education in libraries.

You are asking questions

You are asking questions about human performance that you cannot really observe very well indirectly with all sorts of other factors in play.

Can we really compare apples to oranges? Could you really compare Haskell's performance on a financial transaction app vs. Ruby's web app performance? No, you need a common domain, but then you'd have to find existing comparable data in both languages. And were the Haskell programmers smarter than the Ruby programmers? Many companies require Haskell simply to weed out the noise. Oh no...

If we really want truth, we would have to control so many factors that it is unreasonable to do it outside of the lab. And you are definitely correct: the results of the lab study would be pretty narrow, subject to only the feature being tested in a very clean environment.

We must compare apples to oranges.

Consider your proposed trial: two variations of ML, controlled to vary in only one characteristic - one has static types, the other does not. The bias that seems to have escaped you: some features such as type-based dispatch and typeful programming, accessible only in statically typed languages, may also be essential for their expressiveness and productivity.

How is it reasonable to control for all the variables, when doing so introduces such biases in our tests?

Network effects also apply to language design, via feature interaction, library and IDE integration, etc.. You know this better than most. Attempting to control certain variables is likely to result in naive and controversial conclusions.

But we can, at least, compare Ruby to Ruby, Python to Python, and JavaScript to JavaScript, when deciding how well unit tests catch errors that would likely have been caught by an ML system.

But to answer the question,

But to answer the question, you will only discover that an apple is an apple and an orange is an orange. This isn't science.

I'm not arguing with you that the conclusions that could be drawn from a lab study would be narrow. Time pressure is a biggy, the fact that we are measuring performance in one sitting vs. a real life development project. However, you would have to make these sacrifices to get some real scientifically repeatable results, the fuzziness occurs then in how you interpret these results.

I believe that the study as I've laid it out would yield some interesting/usable results and is worthwhile. But it wouldn't settle the debate for sure by any means. We can't really expect closure.

OT: "Type-based dispatch"?

Programmers using dynamically typed languages dispatch on (dynamic) types all the time. Granted, their language doesn't do it for them, but the capability is there.

Furthermore, if the language designer isn't on a crusade against syntactic sugar, you can have things like this (from the Pure examples):

tak x::int y::int z::int
		= tak (tak (x-1) y z) (tak (y-1) z x) (tak (z-1) x y)
		    if y < x;
		= z otherwise;

tak x::double y::double z::double
		= tak (tak (x-1.0) y z) (tak (y-1.0) z x) (tak (z-1.0) x y)
		    if y < x;
		= z otherwise;

Those ::int and ::double things are nothing but guards on untyped term rewriting, but they provide the effect of type-based dispatch.

Type based dispatch

Dispatching on a dynamic property is a different thing from dispatching on a static type. For one thing, you can dispatch on static type without actually having a value. You can emulate that to some extent in dynamically typed languages, but you do it via dictionary passing rather than dynamic property based dispatch.
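A tiny Haskell sketch of dispatching on a static type with no value in hand (return-type polymorphism): the instance is chosen from the expected type alone, which is what you end up emulating with explicit dictionary or proxy passing in a dynamic language.

class Default a where
  def :: a   -- no argument at all: nothing to inspect at runtime

instance Default Int  where def = 0
instance Default Bool where def = False
instance Default [a]  where def = []

-- (def :: Int) picks the Int instance purely from the static type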

A lesson on lab results vs.

A lesson on lab results vs. throwing lots of data at the problem and shaking hard...

Methodologically it would

Methodologically it would make the most sense to write a dynamically typed ML or Haskell interpreter and then see which type errors are caught by unit tests. I think the debate would be settled pretty quickly.

The Erlang Dialyzer work?

Dialyzer is a static analysis tool for Erlang that does a lot of things, but in particular uses an "I reject when I can prove you're wrong" static type system¹ -- and can use type annotations that are part of the formatted documentation for each definition.

¹: so it's really not "ML-ish": it underapproximates failure rather than underapproximating safety, and IIRC it is based on some amount of subtyping.

See the main Dialyzer web page here, with links to several papers, both describing the tool itself and experience reports on its use.

Dialyzer was first used on tools that were "critical" for some Erlang infrastructure, well tested, and already in production. It still found a fair number of defects (from the paper "... a war story"):

Indeed, we were positively surprised by the amount of discrepancies Dialyzer managed to identify, given the amount of testing effort already spent on the safety-critical components of these projects and the conservatism of the methods which Dialyzer version 1.0 currently employs.

The paper Gradual Typing of Erlang Programs: A Wrangler Experience (Konstantinos Sagonas and Daniel Luna, 2008) describes how the Dialyzer authors used the tool in a semi-automated way to analyze the code of a software project, Wrangler, that they were not familiar with and that was already tested:

The code of Wrangler has various interesting characteristics with respect to what we want to do. First, it has been developed by researchers who are experts in typed functional programming. For this reason, we expected that Wrangler’s code base would be written in a type disciplined manner and would not contain (m)any type errors. Second, we expected that its code base would contain an interesting set of uses of higher order functions — possibly more than in most Erlang code bases out there — and this would be challenging for our tools and approach. Third, the authors of Wrangler have been heavily involved in a project related to testing Erlang programs and have used Wrangler in conjunction with sophisticated testing technology such as QuviQ’s QuickCheck tool for Erlang [2]. Finally, the authors of Wrangler are aware of the tools of our group: as they acknowledge in Wrangler’s homepage they ‘make use some of the ideas from Dialyzer’. In short, we expected that this would be a relatively easy task. Let’s see what we found.

Neat! The Wrangler program

Neat! The Wrangler program is fairly complicated -- PL research still doesn't know how to test and verify such programs -- so this is a great benchmark of a hard case. I may be reading it wrong, but the report makes it sound like Dialyzer found relatively few bugs for such a large codebase (~1 per 2,000 LOC). I was surprised, esp. because, as they noted, the code uses a lot of higher order functions and the domain/range of refactoring is richly structured.

The choice of Wrangler is a little different from what I'm interested in however:

1. Wrangler is a research project that warns "We DO NOT recommend to use this prototype on your production source just yet!". This raises all sorts of alarm bells.

2. According to "Testing Erlang Refactorings with QuickCheck", Wrangler doesn't actually seem to use high-coverage unit testing. Wrangler relies upon a different and experimental testing methodology. Unit testing advocates believe that the behavior of small functions should be tested individually, something that symbolic execution of large end-to-end tests will not exercise.

Basically, this is exactly the type of paper I was hoping for, except I wish they hadn't picked an academic project for the case study, because it is a bit of a strawman :)

Many wrong assumptions

I see many bogus assumptions in this thread (and in the actual literature I have seen on the subject):

  • You could take an untyped language and add a type system and see what effect that has. That is bogus, because these languages have typically been designed without types in mind. A type system added after the fact almost always performs substantially worse.
  • You could take a typed language and remove its type system and see what effect that has. That is bogus, because these languages have typically been designed with types in mind. Some languages embracing types, e.g. Haskell, do not even have an untyped semantics.
  • You could design a new language, and provide a typed and an untyped version of it, and compare those. Again, this is bogus, essentially for the same reasons as the previous two points.
  • All type systems are alike. Obviously, completely wrong. The quality and expressiveness of a type system can lie anywhere on a very broad spectrum, with vastly different trade-offs that affect productivity in various ways, and differently in different situations.
  • The style of code that is written in typed and untyped languages is roughly the same. I strongly doubt that is the case. From my personal experience, using more advanced abstraction is usually much harder in untyped languages. For example, higher-order abstractions -- combinator libraries and such -- are very hard to use without types (because dynamic checks are fundamentally restricted to first-order), so people don't use them much. OTOH, reflection is much harder in typed languages.
  • All programming tasks benefit similarly from the presence or absence of types. Clearly, that is not the case, so how do you pick your domain?
  • All project sizes benefit similarly from the presence or absence of types. In my experience, the larger the project and the team, and the more third party code is involved, the more important types get.
  • Productivity is only about writing programs. That is completely wrong. Productivity is primarily about maintaining programs in the long run. And that is much harder to test properly.
  • Quality of implementation does not matter. Clearly, non-trivial type systems also stand and fall with the quality of the error messages they give. And those can be worlds apart across different implementations of the same language.
  • Types or no types is only a question of programmer productivity. It also affects implementation and tool making cost. And language sanity.

So, until anybody has a great idea how to keep bias out of any of these parameters (and others), while still keeping the experiment manageable, I claim that any such experiment will be more misleading than revealing.

Edit: A central point I failed to make explicit is the ability to instrument type systems to achieve semantic checking, see my later post.

Science is hard

Everything you say is valid, but so what? There may be no way to study, in one go, every single factor that you are concerned about. You have to chop it up into a bunch of small bits that you can then do experiments with. Then at least you have hard unbiased data with respect to the experiment performed. How you use this data is the problem: if you are using it to claim A is better than B, then that's dishonest. If you use it to make informed decisions about language design, then that's much more valid.

Now, there are of course pointless (or poorly done) empirical experiments just like I claim there are pointless proofs. But if you want to know an answer to a question (and you aren't already biased about it), going the controlled experiment route is definitely an option. Just realize that it won't put to rest any ideological debates, and do it right (meaning 3-6 months prep work just to get to the point where the experiment can be done) or go home.

[Edit] Everything you say about static typing being beneficial in large projects with multiple people also applies to unit tests. Also, dynamically typed languages aren't untyped, they are just typed at run time rather than at compile time (usually with more flexible rules).

Unavoidable bias

Then at least you have hard unbiased data

That is the bit that I very much doubt. I believe that any such experiment will, intentionally or unintentionally, but unavoidably, be biased. Heavily. And so will its outcome. Because you cannot meaningfully isolate the variables.

The best you can do realistically is studies that give you soft biased data, and you can try to keep the bias in mind in the evaluation of this data. But that will be fairly subjective.

You are definitely right,

You are definitely right, but data itself is unbiased (by definition). The way the experiment is set up will have bias, and your sample will be biased. So you have to be very transparent about your experiment, and how you selected your sample.

Really, it's just like doing proofs. Your abstraction of the problem biases the "usefulness" of a proof even if the proof is formally solid.

Funny

Ironically, this is why Stefan Hanenberg does these studies! His whole intention is to take classical Empirical Software studies and do them over and see if he can find totally contradictory results.

There are similar

There are similar difficulties in all experimental sciences, yet they are able to get useful results. "Getting meaningful results is hard, let's just stick to our mathematical proofs" is not very satisfying. Designing languages and type systems without getting experimental data is like only doing theoretical physics. Or designing medicines on theoretical chemical grounds but not testing them.

In physics, all information about the universe comes from experiments. You cannot determine which of two self-consistent theories is correct on purely theoretical grounds. The same goes for programming language design. Programming languages are to be used by actual humans, and the only way to determine the value of an idea is to test it with actual humans. Saying that you will not do *any* experiments because it's hard is *less* rigorous and scientific, not more. The alternative is to view programming language design as a purely mathematical exercise. That is some very boring pure mathematics, though.

because dynamic checks are fundamentally restricted to first-order

This is true in some theoretical sense, but in practice you can get higher order checks from first order checks. A type like f : int -> int means that when you pass an int to f, it will return an int. You can check this dynamically. Of course you'll only get an error message once f is actually called and misbehaves. See also Contracts for Higher-Order Functions.
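
A rough OCaml sketch of that idea, in the style of higher-order contracts (all names invented for illustration): the function contract is assembled from first-order checks that only fire when the wrapped function is actually applied.

exception Contract_violation of string

(* A contract is just a checking wrapper around a value. *)
type 'a contract = 'a -> 'a

(* First-order: check a predicate immediately. *)
let flat (name : string) (pred : 'a -> bool) : 'a contract =
  fun v -> if pred v then v else raise (Contract_violation name)

(* Higher-order: a contract on functions defers the domain and range
   checks until the function is called. *)
let arrow (dom : 'a contract) (rng : 'b contract) : ('a -> 'b) contract =
  fun f -> fun x -> rng (f (dom x))

let positive = flat "positive" (fun n -> n > 0)

(* Claims to map positive ints to positive ints; the lie is only detected
   at a call site that exercises it. *)
let bad = arrow positive positive (fun n -> n - 10)

(* bad 3 passes the domain check but raises Contract_violation "positive"
   when the result -7 is checked; nothing is flagged until a test calls bad. *)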

Not really

The alternative is to view programming language design as a purely mathematical exercise.

So pure mathematics is the only alternative to evaluating language designs by measuring Joe Idiot's productivity?

Designing a programming language with user studies looks to me like trying to devise a winning chess strategy by measuring "what works". You can probably do a bunch of studies and figure out such maxims as "statistically it's best to move your pawn after your opponent moves his." It would work in theory if you developed a complete play book for the game, but the game is far too complex for that to work in practice.

Similarly, in programming languages, I suspect productivity studies really only measure what worked and what didn't (and I suspect the most useful studies here are long term, big project successes/failures with so many variables compared to the number of data points that it's hard to draw conclusions). Any deeper conclusions about what should be changed to get better results require actual understanding of the issues involved, and that understanding doesn't require any user studies. Maybe the programmer requires training and the language is fine.

statistically it's best to

statistically it's best to move your pawn after your opponent moves his

Actually, the strongest Go programs do exactly that ;) Chess is a well defined mathematical game. Programming language design is not, but at least officially, academia is treating it as such. Your comparison with chess illustrates this nicely.

Sure you can use whatever methods you want to come up with new ideas. The same happens in physics and pharmaceutics. Physicists use subjective things like "elegance" to find new theories.

I'm not saying that the same statistical studies should be done for programming language productivity. There can be other kinds of studies, for example building a more than 10 line program with a new language idea. Or comparing programs written with and without the language feature. Or having a professional programmer use your language and report his experience. All of these are far more convincing evidence that your idea actually works in practice than the current alternative of doing nothing beyond a 10 lines or less motivating example. Or, at the very least, actually implementing the language or language feature, so that people reading your paper can try it out!

For example I think it would be valuable to set up a service where you send an implementation of your new idea along with some instructions. The service will then recruit some programmers who will build something with it. Their screen is captured for the whole duration. They answer your questionnaire and provide free form comments afterwards. All of this data (including the complete screen capture and resulting code) is publicly available, and you can summarize it in your paper. Of course it will cost some money, but given the size of the software industry this should be doable, especially since this kind of thing is *very* cheap compared to experiments in other fields. For example a single lab mouse can cost in excess of $500. A measurement device in physics can easily cost more than $1 million. Or, $9 billion. Sure, it's far from perfect, but I think some useful data can be collected (and if the programmers get excited, research can have a lot of impact more quickly).

Take for example closures. I suspect that most people here think that they are a very good thing. Why is it that we believe this? How would one give convincing evidence to somebody who can program but who doesn't know what closures are?

Take an example from physics. We believe that an object's velocity stays constant unless you exert a force on it. A long time ago people thought that a force is continually needed to keep an object in motion. How would one give convincing evidence to somebody who hasn't thought about the matter, say 200 years ago? This is not easy. If physicists had given up on experiments as easily as people do in programming languages, they wouldn't have gotten anywhere. Many of the same caveats apply: "you did an experiment showing that the speed of light is constant in New York, but how can you be sure that it happens everywhere?". Then you do an experiment in London, then in Hong Kong, etc. The point is that even if an experiment has a lot of caveats because it is studying a specific situation, if you do a lot of these necessarily flawed experiments, then a trend can start to emerge and good evidence can be built.

Still disagree

Take for example closures. I suspect that most people here think that they are a very good thing. Why is it that we believe this? How would one give convincing evidence to somebody who can program but who doesn't know what closures are?

This is the main question, but I don't think closures are very good in isolation. They just fit into a certain style of programming in an important way.

I still find your analogies to physics and medicine pretty seriously flawed. A better analogy might be problem solving techniques in mathematics. I hesitate to make the comparison since you've been complaining about comparisons between programming languages and mathematics, but the problem solving approaches behind mathematics are much fuzzier, human driven, and closer to programming languages than the formal product of mathematics. And while there certainly have been attempts to improve problem solving techniques by studying what works, I don't think that's been a serious contributor to new techniques. Most progress has been made by the introspection of skilled practitioners. I think the same will be true for programming languages in the foreseeable future.

So how do we know that closures are good? Because skilled practitioners say they are. Just because this answer is highly informal and ad hoc doesn't mean you can do better with user studies.

I agree, programming is very

I agree, programming is very similar to doing mathematics. But unlike in programming, there is no scientific field that claims to be designing the language of mathematics for mathematicians to use, a field that publishes papers claiming that its new notation or language improves mathematicians' productivity. If we wanted to do the same as mathematicians, we'd give up on designing programming languages and let them be designed by practitioners.

Because skilled practitioners say they are. Just because this answer is highly informal and ad hoc doesn't mean you can do better with user studies.

Sure, that's why I proposed (I probably edited this after you started typing your reply) alternatives to doing user studies with objectively measurable outcomes (like measuring the time it takes to complete a task). One of them was letting skilled practitioners use your design and then asking what they think. Do you think that would be valuable?

I think the people designing

I think the people designing a language for mathematicians call themselves category theorists. I guess I can see feedback being useful to help get a beginner's eye, but I'm not sure how useful. Should the newbie opinion influence the design or mainly just the tutorial?

I think an intelligent

I think an intelligent programmer in the field that you are targeting can give more than just any newbie's opinion. Remember it's not just for the writer of the paper but also an independent opinion for the readers of the paper. For example if you are designing a language for web programming then a professional web programmer can tell you whether what you are doing is actually helpful in practice. Or actually writing a non trivial web application in it can tell you whether what you are doing is helpful in practice. Not all research will benefit from this of course, especially not the more theoretical research. But for example the recent post mortem of BitC could have been different if it had actually been exposed to systems programmers or used to do real world systems programming at an earlier stage. Quoting from there: "The distractions of academia aside, it is fair to ask why we weren't building small "concept test" programs as a sanity check of our design.". Or for example Sean's "Magnetic Intellisense": I think it can be tremendously useful to write a moderately large program with it, or to let other programmers test it and describe their experience.

Experience is extremely

Experience is extremely valuable. I just think that what you do with it is dig into the experience reports, try to figure out what's going on, and then think hard about how to improve things. I think quality of data (allowing you to really understand what's going on) trumps the quantity you'd get from larger polls.

Not Chess Strategy

More like trying to build a `better` Chess game by tweaking various rules. This could actually work fairly well if the rule-tweaks are controlled and have relatively shallow effects (e.g. deciding a `better` layout for the pieces behind the layer of pawns, or starting White with one fixed pawn forward).

But the dynamic typing vs. static typing is like trying to compare Chess to Stratego by looking at the winning strategies...

Off topic

Related.

We present initial results from ACCME, A Co-operative Co-evolutionary Metroidvania Engine, which uses co-operative co-evolution to automatically evolve simple platform games

Invention and Design

The normal "science" of invention and design involves obtaining feedback, studying the frustrations and successes of users and seeking to limit the former and augment the latter.

There are many experiments we can run, and much data we can acquire, that don't need much control over variables in a laboratory setting. There is no dichotomy between mathematical proof vs. laboratory experiment.

Indeed, I expect the information we can acquire from a naturalistic setting would be more valuable for language designers than the isolated, narrow results of a controlled lab experiment.

The data acquired from a

The data acquired from a naturalistic setting is very different from that obtained in the laboratory. There is a reason why we still build multi-billion dollar super colliders: to observe phenomena in controlled settings and know things for sure. You can't really compare them.

But you are right that naturalistic data is much cheaper to collect, and usually very fruitful. We will continue to learn more there than in the lab; we might as well pick the low hanging juicy fruit first. But there comes a point where the lab experiment is useful and worthwhile. An absolutist position will always be wrong.

Agreed. There is certainly a

Agreed. There is certainly a time and place for lab experiments. Ideally the experiment can be isolated to controlling just a few variables.

Another profitable mode is simulation, which can be useful to study emergent behavior and feature interaction. But I don't know how to apply this in language design. (Maybe it would be suitable if we were also designing languages for exploratory programming, i.e. edit-and-tweak-until-correct, in which case we could systematically explore language designs to discover which converge more readily on a valid program.)

That was what my crazy

That was what my crazy neural network approach was all about. Let the nets use two methods (still need four sample groups), and see which ones "perform" better after an appropriate amount of evolution so they reach nice local maxima. Of course, I have no idea how to make that work, but it would solve our human bias problems. The nets don't represent actual human behavior, but somehow would measure an intrinsic efficiency in the approaches.

Thank You!

As Alan Kay likes to say, the difference between good designers and everyone else is attention to details, the thousands and thousands of details.

Types or no types is only a question of programmer productivity. It also affects implementation and tool making cost. And language sanity.

I hate language designers who claim they designed their language to be "cheap to tool", too, and then give only three or four examples of what that means. You just end up with a "cheap tool" instead. For example, in Chris Anderson's book Essential Windows Presentation Foundation, he says that Microsoft created XAML and WPF Styles because HTML and CSS are very difficult to tool. Yet, I have run into many scenarios where I, another developer, or a graphic designer found the language difficult to use. My most recent complaint was posted yesterday to the WPF Forums: namespace collision is undefined by the language spec, as are many other common language features. Some of these issues have rendered Visual Studio totally unusable in the past without researching workarounds.

Apart from those complaints, a common issue I found with "XAML is cheaper to tool than CSS" is that it focuses on the wrong criteria for language design. Anybody can build a tool cheaply. But cheap costs result in missed scenarios. For example, if I have a total summary at the bottom of a data grid, and the summary is numeric, then it should be right-aligned. Yet, to do this, I have to copy and paste 3 pages of code and edit one line of code. There is nothing similar to, say, Jigsaw-like combinators for composing the Look & Feel of a user interface.

In addition, graphic designers with a background in Flash found WPF animation confusing, since Flash uses frame-based animation. Understandably, WPF doesn't use frame-based animation since it is harder to synchronize playback correctly with a frame-based animation suite, one of the many problems that ultimately led to Adobe killing Flash due to the difficulty of testing on so many devices.

I agree that checking for a

I'm a little surprised that I had to write this response. The main deviation in what I'm asking from what is standard for evaluating program analyses is that I want to use code that has been unit tested as the baseline. Type systems are still subject to traditional program analysis evaluation metrics, and I agree that careful setup and measurement is important. I was simply asking about research that has done this; the baseline is more specific than what I recall seeing before.

Perhaps some context for my question will help. My interest in "unit testing vs. types" is because offline finding of a certain class of errors is a benefit that programmers can observe. Does unit testing share this perceived benefit of typed programming, and is it therefore worth checking whether unit testing enthusiasts undervalue types because they do not see the other benefits? Before examining beliefs and perceptions, I was wondering if unit testing enthusiasts were simply wrong in that unit testing does not, in practice, reveal ML-style type errors.

To clarify:

Objectivity metric: I asked for an objective one-way comparison: when checking types on unit tested code, what is the true positive rate? I was careful not to mention abstract properties that I'm not sure how to measure, such as your discussion of productivity or implementation and tool making cost. The notion of true positive might need to be partitioned, e.g., for an open vs. closed system, but that sort of thing is traditionally experiment-specific.

What type system: I intentionally wrote an "ML-ish" type system. This is generally understood as avoiding extensions for dependent typing, watering down polymorphism, etc. I assume similar care should be taken when picking a baseline of a "unit tested" program, which is why I objected to the Wrangler case study. Better questions are along the lines of "should the entire program be typed, and if not, how is the missing code handled?"

What it means to type untyped code: I nowhere claimed that you can simply run a type analysis on a dynamic program and achieve a reasonable result, nor did I ask for an exact ML type system. I even explicitly mentioned DRuby, which had to jump through all sorts of hoops! Consider Arjun's type analysis, which was designed to better handle some forms of dynamism (wrt traditional ML) that JS authors use. To use it on our Flapjax framework, which was written with only a loose notion of contracts in mind, Arjun still had to rewrite and restructure parts of the code. The need for refactorings revealed false positives and various other adoption barriers, but it was certainly doable and measurable.

Style of code: I think it's fine in practice. I agree that it introduces bias, but this is the sort of bias I would want to see proven or emerge as relevant. For example, I agree that developers will use features in a language even if they are not available in another one -- initial staging of code was a big challenge for DRuby and a big cause for the refactoring of Flapjax. However, the analysis, give or take, was able to run after that.

Again, I don't see this as a very problematic experiment. My request was more out of laziness: I know researchers in gradual/soft/etc. typing of dynamic languages have rigorously analyzed real code, and I was hoping someone knew of a good study on using it on a program that was already covered by unit tests.

Perhaps the takeaway, here, is that no, you are not aware of any such evaluation :) Thanks for trying. I'm sad that this request turned into exactly what I asked for it not to.

Just to be clear, my post

Just to be clear, my post wasn't intended as a reply to your question but rather as accumulative reply to various other suggestions in this thread.

Sorry that this does not help your quest. :)

Great, I was rather

Great, I was rather perplexed :)

You mentioned "objective"

You mentioned "objective" and "quantitative" in your original post, but now you are mentioning "in practice," which is quite a different kind of data that is mostly anecdotal. Were you looking for a study that was impossible to exist in the first place? If all you wanted was qualitative case studies, why did you ask for quantitative? Are we just disagreeing on terminology?

This discussion was useful even if you are sad about it. I realize now we are all over the map on what counts as evidence in our field. That most everyone in the thread has rejected the usefulness of empirical science to PL is sad, but I guess not unexpected. Doing real experiments is hard work for the amount of usable data obtained, and the opportunity cost is staggering (how many grad students want to go off and spend 6 months just doing an experiment?). But still, our field has plenty of room for empiricists if anyone wants to volunteer, just as it does for mathematicians and designers.

I'm quite interested in what you consider "rigorous analysis" of gradual/soft/typing?

For the record

I'm a skeptic of user study driven design, not an opponent of it.

Out of curiosity

What are the big results to date out of HCI / user studies of programming languages?

Good experiments are

Good experiments are difficult to find even in papers from the HCI community. Usually they tack qualitative user studies onto design papers; the evaluation is very subjective and seems like PC ceremony to me. All the good experiments seem to be done in experimental psychology, plus a small amount of work that crosses over into HCI.

I heard of one real experiment done by Brad Myers' group on whether named parameters or positional parameters were more usable, but I can't find a link to this work in a publication. What was being tested was very simple, but that is to be expected in a good experiment, and they found that named parameters were more useful.

Now, is that a big result? No, but it's a good result for a PL designer to have when deciding what their method call syntax should look like. We are not going to uncover big results with experimental data; actually, I would claim that we aren't going to uncover big results in the evaluation process at all -- that's what design is for! Rather, evaluation only informs design.

Well, maybe neutrinos can go faster than the speed of light, but then maybe not.

Boolean syntax considered harmful

Their psychology of programming paper showed that one of the most basic parts of a language, boolean expressions, is an error-prone mess for novices. That was a fun one :) I would have liked to see it in a wider context, but that was already interesting.

We take a lot for granted and needlessly keep the bar high for programming -- improvements in this area seem important given how widespread computers are. I'm guessing responses to this work won't be appearing in a conference near you, but that's not Myers's fault.

ECOOP and OOPSLA seem to be

ECOOP and OOPSLA seem to be trying to include more empirical work in the conference with specific empirical tracks, but of course all the papers are thrown into one pile during selection.

Measuring the quality of

Measuring the quality of features by their effect on novices is a rather misguided approach, though. Unless one believes that we should dumb down all languages to BASIC.

Booleans are an abstraction. Abstractions always have a learning curve.

Yep, agreed -- that's what I

Yep, agreed -- that's what I meant about a wider context. The novice case is uninteresting to PL in its current state.

Your second part seems to conflict with the study. In surveys and interviews, the subjects were perfectly able to reason about booleans. Explaining away the notion of boolean computations as a learning curve issue doesn't seem complete.

I'd like to see a version where non-novices are asked similar questions, and, likewise, code audits that examine bug fixes relating to boolean expressions.

Learning Curve

I wouldn't want to dumb down our languages (no artificial glass ceilings), but I would like to see some attention dedicated to controlling that learning curve - making it continuous, gradual, and accessible through UI... for experts, novices, and two-year-olds.

Today we need to unlearn and restructure things at so many stages in the growth of our applications and ourselves as programmers (e.g. concurrency, distribution, security, persistence, scale, etc.).

Experience bias

There is a very good reason that experiments are done with novices. Once users become invested in certain tools and reach a certain level of experience, they become too biased by their experience and cannot reasonably be compared to other users, even other experts, as everyone's experiences are necessarily quite different. As a result, blank slates are quite desirable if you want good results.

I agree that we shouldn't be drawing conclusions just for novices, and if you measure something like the complexity of boolean operators, you want to measure more generic cognitive burdens, which necessarily affect learnability but are not equivalent to it.

Really, doing these experiments involves measuring very specific situations under ideal rather than real world conditions. We must then make inferences from specific data to more general situations, which has to be done carefully. But again, isn't this what we do with proofs? I mean, you take your complex/impossible to formalize language, create a mini-language that you argue is representative of what we care about, do the proof, then make a conclusion that is more general than the proof is actually evidence for.

User studies save lives :)

This is currently my favorite result about looking at human phenomena when designing a solution: trying to stop HIV. Why are programming practices more special than other human ones? The burden of proof is on your side for this one ;-)

Edit: this was a response to your first question. For your second, see Erik Meijer's "Confessions of a Used Language Salesman." I'm not sure what you mean by user-driven design, but it's fairly well established that focusing on the people side of technology will improve it, and we even have a name for that: innovation. As concrete examples, Erik's focus on adoption led to LINQ and phantom types. I'm not as familiar with the HCI community, but I'd say the importance of direct / tangible manipulation is starting to be felt by the PL community, as are related ideas like programming-by-demonstration. We even have a section of POPL+PLDI devoted to this basic direction. Synthesis is interesting, for a large part, due to the new usability-driven models of programming!

I'm using quantitative in the same

I'm using quantitative in the same sense as any social scientist. 'In practice' as in performed by people in the community. You are right that only looking at a few projects is qualitative (statistically insignificant and instead in favor of depth). Qualitative is generally a good starting point in that it is easier and will orient the quantitative -- I'll take what I can get :)

The Wrangler case study was a great example of the type of evaluation I wanted. The result tables were unclear and they only looked at one project (and not the type I wanted), but they did push the analysis all the way through. DRuby, Arjun's work, and scalable software analysis in general (stuff in the security community such as Ben Livshits's and Adrienne Felt's mass security audits, analysis of C/Linux, etc.) are often good and even primarily judged on the strength of the evaluation. Hope that helps! I should have known that asking about this in the context of types would be too controversial ;-)

As an example of a semi-rigorous quantitative analysis, my internship at Macromedia years back was to help transition hundreds of thousands of lines of ActionScript into the new gradually typed language being co-designed. Every time the analysis was enriched in some way, we had to figure out the impact on the code: was it revealing bugs (answer: surprisingly few), were the false positives easy to work around (answer: mostly but not always), etc. I wasn't expecting anything on that scale for what I asked for here :)

Some further refinement necessary

Within empirical software engineering, there is a movement to better define terminology and try to map techniques to disciplined methodologies. For example, it is not unit testing vs. static typing. Instead, it is unit testing vs. test-driven development.

Even within test-driven development, I have observed a wide range of testing techniques:

  1. Arrange-Act-Assert
  2. Record & Replay
  3. Fluent DSLs
  4. Parameterized Unit Testing (e.g. Pex and Moles)
  5. etc.

There are also strong arguments about which approach is best. See for example the author of Moq, a .NET mocking library:

I could put more links here but I am heading to a friend's birthday party. If this looks new to you or you don't understand why it may be important, then please reach out to me and I can help teach you. I also have a good pulse on the most recent literature from Empirical Software Engineering research.

Thanks.

For example, it is not unit

For example, it is not unit testing vs. static typing. Instead, it is unit testing vs. test-driven development.

Why do you say that?

Where I originally got this question was a (widely respected) systems designer discussing his love of Ruby and the Ruby community. He believes their use of unit testing has, in practice, covered much of the benefit of static typing (as is likewise currently practiced). I don't recall him postulating a particular testing technique going alongside the static typing.

Re: Why do you say that?

Testing techniques matter a lot. For example, product line software engineering tends to use statically typed systems, but commingles static types with an orthogonal array testing strategy (OATS). Applying OATS here is very likely to be more effective than any other technique in terms of computational time to discover defects, because feature explosion is a thorny problem that can't be dealt with well using any other integration technique I am aware of.

I don't think respect matters if their opinion is simply truthiness: Truth that comes from the gut, not facts. If the words they use are "love" and "belief" then it is simply an idea that needs to be validated.

Anecdotally, I know a systems researcher (who is a successful system designer worth nearly billions) who has said positive things about Ruby. But you know what was hilarious? He couldn't explain how a line of Ruby code worked. He actually wrote in a paper that a Ruby database API (ActiveRecord) inspired his DSL, but when I asked him to explain how, he clearly didn't know what he was saying beyond vague knowledge of the syntax. When the conversation got detailed, it became a one on one conversation between just me and him so he could save face in front of his colleagues who were co-authors. I actually walked him through how the code worked, and knew exactly where to go in the code to describe what it was doing and why and how it benefitted the programmer.

Also, if you want an inspiring case study of TDD, use PyPy and RPython as examples.

Something that would be cool: MapReduce all Ruby projects on Github, Codeplex, Google Code, SourceForge, etc. Figure out which testing libraries they use using data mining techniques where you train the machine based on what code fragments to associate with various libraries and techniques. Product-sum analyses might be even more interesting, since you can then weight the result of what the "Ruby community" is by things like commit velocity, downloads, etc.

For Typed Racket

In my thesis, I go into some of what you're looking for. Unfortunately, there's no quantification of how well tested the various programs I analyzed were.

It sounds like the random

It sounds like the random number generator is the closest candidate (I assume standard lib code requires some community-checked level of testing). No bugs were found. Thanks!

ML typing does not include

ML typing does not include subtyping. Hence any two distinct types are inherently incompatible. So if the ML type system says there is a type error, and if it is right about that (no false positive), then the first time that code runs it will raise a type error. More precisely, if the ML type checker says "expected type X but got type Y for expression e", then as soon as somebody actually does something non-polymorphic with the result of e, there will be a run-time type error.

For this reason I think that if you ripped out the type checker from an ML implementation and extended all primitives with dynamic checking, then virtually all true positive type errors would be caught by high-coverage unit tests, provided the program actually does something with the values it computes.
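
As a crude sketch of what "push the checks into the primitives" could mean (a toy universal value type in OCaml, not any existing implementation):

(* Every value carries a tag; every primitive checks the tags of its
   operands, so a genuine type error surfaces as soon as a test
   exercises the offending expression. *)
type value =
  | Int of int
  | Str of string
  | Fun of (value -> value)

let add a b =
  match a, b with
  | Int x, Int y -> Int (x + y)
  | _ -> failwith "run-time type error: + expects two ints"

let apply f v =
  match f with
  | Fun g -> g v
  | _ -> failwith "run-time type error: not a function"

(* add (Int 1) (Str "two")  ==> Failure "run-time type error: ..." *)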

Dynamically typed ML?

That is only one interpretation of what ML would look like if it were dynamically typed. Another interpretation is that it would just be duck typing all around, so code that would be rejected by static ML could be accepted and run correctly in dynamic ML. This would actually be in the spirit of static ML, I think, as in OCaml objects that support duck typing statically.

Or we could argue that we are talking about the same thing. In any case, the code wouldn't break with respect to typing during execution until it hit mismatched unit types or accessed a struct member that didn't exist.

Right, if you extend ML with

Right, if you extend ML with objects or other kinds of subtyping that argument no longer holds.

Many ML type errors would not lead to exceptions

I think it's simply untrue that ML type errors would lead to exceptions. For example, one very common trick we pull is to mint new, abstract types that underneath are simply strings. So, you might write

module Username : Identifier = String
module Filename : Identifier = String

Now, Username.t and Filename.t are abstract types, and so the type-system will prevent you from confusing them. In an untyped setting, putting a username where a filename is expected will not trigger a runtime exception, full stop.

And I don't think this is a small issue. A large fraction of the benefit that one gets from an ML-style type system comes from explicitly coaxing out more guarantees from the type system than would be implied from the straightforward data layout.
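
For concreteness, a minimal OCaml rendering of the trick above (Identifier, Username, and Filename are just the names from the example, not a real library): sealing the same string implementation twice yields two incompatible abstract types, and erasing the types removes exactly that check.

module type Identifier = sig
  type t
  val of_string : string -> t
  val to_string : t -> string
end

module StringId = struct
  type t = string
  let of_string s = s
  let to_string s = s
end

(* Sealing the same implementation twice yields two distinct abstract types. *)
module Username : Identifier = StringId
module Filename : Identifier = StringId

let read_file (f : Filename.t) = open_in (Filename.to_string f)

(* read_file (Username.of_string "alice")
   ==> rejected statically: Username.t is not Filename.t.
   With types erased, both are plain strings and the call goes through. *)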

Now, Username.t and

Now, Username.t and Filename.t are abstract types, and so the type-system will prevent you from confusing them. In an untyped setting, putting a username where a filename is expected will not trigger a runtime exception, full stop.

That's not true. For example, putting a username where a filename is expected will lead to a file-not-found exception. This just happened to me yesterday ;-)

I agree in spirit, however. For example, adding a string to a number in JS causes a coercion and, to get an exception, you'd either have to change the static type checks into dynamic ones or hope that the coercion triggers a downstream exception (such as doing a multiply on the resulting string). Andreas's comment captures some of the subtlety here.

Ah, I didn't think about the

Ah, I didn't think about the case where the dynamic type is the same but the static type is different. Though it depends on how you interpret a dynamically typed version of ML: if you introduce a new type tag for every type, including types that are just a wrapper around a single other type, you would get a dynamic type error for Username and Filename.
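
A toy sketch of that translation (invented names, and only the flavour of it): give each declared type its own run-time tag, even for types that merely wrap a string, and the confusion becomes observable.

(* Each abstract type compiles to a distinct tag, even though both wrap
   the same underlying string. *)
type dyn = Tagged of string * string   (* (type name, payload) *)

let mk_username s = Tagged ("Username", s)
let mk_filename s = Tagged ("Filename", s)

let open_tagged = function
  | Tagged ("Filename", f) -> open_in f
  | Tagged (t, _) -> failwith ("run-time type error: expected Filename, got " ^ t)

(* open_tagged (mk_username "alice") now fails loudly instead of probing
   the file system for a file called "alice". *)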

What about phantom types?

I think that you're underestimating the role of type errors that would not lead to dynamic errors at all. In serious ML code, they're everywhere.

I'm not saying that phantom

I'm not saying that phantom types, etc. are uncommon. I'm saying that whether they lead to dynamic type errors depends on how you translate the code to be dynamically typed. If you tag/untag each value that crosses a module boundary with its static type then they do lead to dynamic errors. Phantom types are no different.

A new claim

I think we now roughly agree. ML with the types stripped off simply does not have the property you proposed, but one could come up with some new way of encoding the ML type system in such a way that what is caught by the ML type system would, when violated, lead to a runtime error. As Andreas' post points out, there's a lot of subtlety to getting this right, but I agree that such an encoding probably exists. It doesn't sound terribly efficient or practical, though. And saying we're going to get efficiency back by adding laziness makes me suspicious: pervasive laziness causes at least as many performance problems as it solves.

Instrumenting the type system

That is indeed a point I should have made far more explicit in my list: type checking is not just checking representations, although most inexperienced people invariably seem to assume that.

When you have a strong enough type system, then there are various ways to instrument it. Abstract data types are one obvious and very frequent example provided specifically for this purpose. Fancier techniques like phantom types come to mind, too. More advanced type systems provide even richer typing constructs, e.g. refinement types that the programmer can employ to get more semantic guarantees out of type checking.
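
As one small illustration of such instrumentation (a sketch with invented names), a phantom type parameter lets the checker track a property that has no run-time representation at all, so stripping the types leaves nothing behind for a dynamic check to see:

(* The 'state parameter is purely static: it never exists at run time. *)
type raw
type sanitized

type 'state query = Query of string

let of_user_input (s : string) : raw query = Query s

(* Naive quote-doubling, purely for illustration. *)
let sanitize (Query s : raw query) : sanitized query =
  Query (String.concat "''" (String.split_on_char '\'' s))

(* Only sanitized queries may reach the database layer. *)
let run (Query s : sanitized query) : unit =
  ignore s (* ... hand off to the database here ... *)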

Note also that dynamic tagging is not quite equivalent, because (1) it is much more costly (consider a List(Int->Int) that you abstract to List(T->Int)), and (2) it is not identity-preserving and thus inapplicable when mutable state is involved (consider abstracting List(Ref(Int)) to List(Ref(T))). In the latter case, types can provide a form of abstraction that is simply unavailable in untyped languages.

You can redefine the

You can redefine the identity comparison operator to ignore wrappers. Tagging can indeed be costly, but not asymptotically so if you do it lazily. Refinement types can also be dynamically checked.

Giving types a dynamic semantics this way is interesting because then you're not forced to formally prove them correct before your program even runs; instead you can leave some properties to be checked at run time. The more powerful your types (e.g. refinement types), the more important this becomes, because proving them correct is often too costly whereas lightweight checking gets you a bit less confidence for a much lower cost.

Not realistic

Yes, in principle you can build magic into the language to ignore (arbitrary towers of) wrappers, but that is rather unrealistic. In particular, every primitive operation that might involve stateful objects would have to operate modulo wrappers (because you cannot push them into the state).

But it's getting even more complicated than that. Once you have type constructors, you also need to be able to commute wrappers in certain ways. Consider going from List(Int) to List(T) to C(T) to C(Int). I actually formalized such higher-order abstraction coercions as part of my thesis, but I wouldn't consider this approach appropriate for actual implementation.

Can you give an example with

Can you give an example with state that would be problematic?

I don't understand how commuting wrappers can happen; can you explain this? Naively I'd think that a value enters a module, something happens internally, then it leaves the module. So wrappers would follow a stack discipline?

Examples

Here is a stupid but simple example. Assume you are given some ML-ish module:

module A =
{
  current = Ref 0
  next () = ++current
}

And for some client code, you want to provide an abstract interface to that:

signature S =
{
  type T
  current : Ref T
  next : () -> ()
}

module B : S = {type T = Int; include A}

No problem with type abstraction, but it would be incorrect at this point to wrap the content of the reference, because module A still assumes it is an integer. The only way this kind of abstraction could be made to work without types is if the language had some built-in magic that can delay all wrapping/unwrapping for references (and similarly, all other stateful entities) until an accessor or mutator primitive is applied.

"Stack discipline" doesn't hold when you abstract type constructors independently. Consider:

signature Coll =
{
  type U : * -> *
  fromList : List a -> U a
  toList : U a -> List a
}

module C : Coll =
{
  type U = List
  fromList xs = xs
  toList xs = xs
}

signature Abs =
{
  type T
  wrap : c Int -> c T
  unwrap : c T -> c Int
}

module A : Abs =
{
  type T = Int
  wrap c = c
  unwrap c = c
}

x1 = [3, 4]  ;; List Int
x2 = A.wrap x1  ;; List A.T
x3 = C.fromList x2  ;; C.U A.T
x4 = A.unwrap x3  ;; C.U Int

(Today's MLs don't have constructor polymorphism like I used with wrap and unwrap above, but you can easily achieve the same effect with a functor.)

Thanks!

Thanks for the explanation.

The problem you mention about state is not specifically about state? A reference cell can be represented as a pair of functions to get and set the ref. We already know how to delay wrap/unwrap checks around functions. The same mechanism works for refs that are represented as a pair of functions (and for any other data type interface). Of course, you need a mechanism to hook into for all primitive data.
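
A quick sketch of that encoding (names invented here): once a reference is a pair of closures, the abstraction wrapper is applied at each access rather than pushed into the store, just as with function contracts.

type 'a cell = { get : unit -> 'a; set : 'a -> unit }

let cell_of_ref (r : 'a ref) : 'a cell =
  { get = (fun () -> !r); set = (fun v -> r := v) }

(* Wrapping with a pair of conversions delays wrap/unwrap until each
   get/set, the same way higher-order contracts delay checks on functions. *)
let wrap_cell (wrap : 'a -> 'b) (unwrap : 'b -> 'a) (c : 'a cell) : 'b cell =
  { get = (fun () -> wrap (c.get ())); set = (fun v -> c.set (unwrap v)) }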

Can you explain how to do it with a functor? I'm not familiar with this style of programming but I expect that you parameterize Abs by c, and then the call to wrap becomes A(List).wrap x1, and the unwrap becomes A(C.U).unwrap x3? Doesn't this break information hiding, because A(List) and A(C.U) are different modules, yet A(C.U) can peek inside A(List)'s protected information?

Probably you had a different design in mind, where the quantifiers are in the other order. Instead of forall c: exists t: {...}, you want exists t: forall c: {...} so that the information hiding boundary is shared among all c's. I think the problem here is abstracting over type constructors at all, or at least I don't think that the runtime contracts literature has anything to say about that (yet). Perhaps somebody with more knowledge about contracts can comment on this?

Yes

A reference cell can be represented as a pair of functions to get and set the ref.

Indeed, but that effectively means forcing an OO interface onto all state primitives in the language, which may or may not be what you want.

Can you explain how to do it with a functor?

It's very simple: essentially, a polymorphic function is just a degenerate form of functor that takes only type parameters. So instead of, say, the polymorphic wrap function, you can make it a functor (inside A):

signature Abs =
{
  type T
  module Wrap : {type C : * -> *} -> {wrap : C Int -> C T}
  ...
}
module A : Abs =
{
  type T = Int
  module Wrap {type C : * -> *} = {wrap c = c}
  ...
}

And likewise for unwrap (or you combine them both in one functor). Indeed, this gives exists T. forall C. {...}, as you guessed.

Ah, I see. Actually I think

Ah, I see. Actually I think that can be translated almost verbatim to contracts (though it's certainly possible that I'm missing something). I'll try to translate it to contracts and test it later.

Proxies/Chaperones

You can definitely implement all of this using proxies as in JS (see Tim Disney's OOPSLA 2011 paper) or chaperones in Racket (that's how contracts for mutable state and abstract data are implemented).

Hm...

I don't think that is a fair comparison, though. First of all, both proxies and chaperones are extremely heavyweight reflective mechanisms that are not at all orthogonal to the rest of the language.

Moreover, proxies are only applicable if you limit yourself to objects and OO-style abstraction. AFAICS, they don't generally help with simulating ADT-style abstraction. Racket's chaperones can impersonate a larger set of values (everything primitively mutable in Racket?), but I still doubt they subsume ADT-style abstraction. But then again, I don't know enough about them to be able to tell for sure. How would they express the toy examples I gave above?

Well, you said originally

Well, you said originally that this support wasn't realistic. I agree that it certainly reaches deep into the internals of the language.

As for enforcing ADTs, all you need for your first example is generative structures at runtime. The second example is trickier in the Haskell-esque style you present; as functors, I think it would be easier.

Fair enough

Fair enough. But cost aside, I am still unconvinced that you can express the general case. AFAICS, you have to resort to some heavy-duty membrane pattern to simulate abstraction over the content of first-class stateful entities. Not sure if/how this would scale to the constructor case.

As for polymorphism vs functors, the two (see my follow-up reply to Jules for the functor version) are equivalent, so the difference shouldn't matter. In particular, the functors themselves are not performing any data abstraction, only the outer module does.

Dynamic abstraction

My intuition is that you could give a wrapper-oriented reading of realizability interpretations of types (roughly, unary logical relations that characterize types as sets of elements satisfying some dynamic property) that would describe how to dynamically seal modules, wrapping components at higher types in higher-order wrappers, in the general case. I suppose that's more or less what Sam is doing.

The cost could be high, but these look like things that could be handled relatively well by current optimization techniques, tracing for example.

I find it a bit strange, however, that type abstraction would result in a change of value representation -- and raise corresponding concerns of wrapper accumulation, commutation, traversal, etc. I would find it more natural to use a pair encoding `(value, runtime type)` where only the second part would be affected by abstraction. Maybe that would also be easier to implement not-too-slowly. That's reminiscent of the "tagged hardware" design of the CRASH project.

Finally, I'm wondering if either technique could lend itself to "secure" abstraction, in the sense that we get strong guarantees that even non-cooperating code can't break the abstraction barrier. For example, could type abstraction have the runtime effect of encrypting the sealed data with a crypto key that is known only to the internal module?

I have little knowledge of the previous work on runtime encapsulation. Apologies if this is old stuff -- and references are welcome.

Abstraction as encryption

Abstraction as encryption was exactly the idea behind Pierce & Sumii's cryptographic lambda calculus, which interprets wrapping and unwrapping as encryption/decryption. Tag creation is key generation, obviously. Semantically, this is no different from ordinary tagging, though, because you cannot observe the encryption within the language. You'd need to talk about an embedding into a lower-level language (with more observations) to actually say something interesting, e.g. prove full abstraction.

A lot depends on the details of the low level language. For example, can the context look into closures? If it can, then full abstraction wouldn't hold AFAICS, as you must be careful not just not to pass out a key, but also not to pass out any closure containing a key.

Indeed, when getting the

Indeed, when getting the CRASH link it should have occurred to me to look at Pierce's bibliography. Thanks for the reference, and sorry for the laziness underlying my previous answer.

About your closure remark: if you present closure conversion as a typed source-to-source transform, you hide the closure environment with an existential type. Wouldn't this encoding naturally provide the security guarantees you're looking for?

The closure remark

No worries. The closure remark wasn't so much about how to model, but what to model. For example, if the target environment is machine code, then an attacker will potentially have random access to all memory, including closure environments. In that case, the model you suggest is simply inadequate. If the target environment is something more well-behaved, say the JVM, then the situation is different, and everything should be groovy (though probably still hard to prove).

Encryption as Abstraction

If you start from an embedded but capability-secure language and you want to securely model ADTs, the concept of sealer/unsealer pairs is very useful. The actual encryption can be performed lazily if necessary.

It's a useful observation in the dynamic vs. static typing battles.
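
For concreteness, here is a minimal sketch of the sealer/unsealer interface in Python (all names are made up, and Python itself cannot enforce the barrier against a determined client, so this only illustrates the shape of the API, not its security; a capability-secure host, or the lazy encryption mentioned above, would provide the actual enforcement):

def make_sealer_unsealer():
    # Each call creates a fresh, private Box class, playing the role of a new brand/key.
    class Box:
        def __init__(self, contents):
            self._contents = contents
        def __repr__(self):
            return "<sealed>"
    def seal(value):
        return Box(value)
    def unseal(box):
        if type(box) is not Box:   # only values sealed by the matching sealer are accepted
            raise TypeError("value was not sealed by the matching sealer")
        return box._contents
    return seal, unseal

seal, unseal = make_sealer_unsealer()
secret = seal(42)
print(secret)          # <sealed>
print(unseal(secret))  # 42
other_seal, other_unseal = make_sealer_unsealer()
# other_unseal(secret) raises TypeError: sealed by a different sealer.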

Academic?

Isn't this whole sub-thread a bit academic? I think that it's pretty clear that in existing languages, you can't easily get dynamic versions of the kinds of static guarantees you get out of a H-M type system. Yeah, maybe you can get it to work with a lot of cleverness and a lot of overhead, but it's pretty clearly impractical given the current state of the art.

No disagreement there

No disagreement there. I'd actually go further and conjecture that this will remain an academic discussion, even for future dynamic languages.

Daring

I do not often think that, but I would like to see you proven wrong!

(Whether or not Racket counts as a counter-example depends on whether you consider Racket "academic". It is not efficient, even for a dynamic language, and I think it is now more useful as an academic idea vehicle than for the real-world software that gets developed in it (well, PLT Redex could be seen as a killer application), but maybe that will change or, as I rather suspect, its ideas could diffuse into the next mainstream dynamic language -- or the next static language.)

Racket's speed & practicality

I believe news.ycombinator.com runs on Racket. While Racket might not be the fastest language, it's certainly faster than Ruby & Python (often by over an order of magnitude).

I think you're confused

I think you're confused about Racket. It's much faster than paradigmatic "dynamic" languages such as Perl, Python, or Ruby. Single-threaded performance in Racket is most comparable to Clojure or Erlang, with the caveat that these comparisons are always terribly flawed. There are people building real, large-scale (1,000,000+ LOC) Racket applications.

Communication

It's much faster than paradigmatic "dynamic" languages such as Perl, Python, or Ruby.

I meant to say it's not a *fast* dynamic language (like Lua or some JavaScript engines), but indeed my comment was misleading.

There are people building real, large-scale (1,000,000+ LOC) Racket applications.

That's neat, and I think you should communicate more on this. I subscribed to the Racket blog (or, for that matter, your blog) and I don't hear about it at all. Where can I get relevant information?

Fast

Looking at small benchmarks such as the Language Shootout, Racket is ahead of v8 in single-core performance on both x86 and x64.

Here is Neil Van Dyke talking about "Racket in the large".

I stand corrected.

I had looked into Racket performance a few months ago, and for some reason remembered the wrong thing -- maybe it was being compared to native languages, so the results were less impressive. Thanks for clarifying matters.

Works For Me :)

As far as I can tell from this discussion, the Racket contract system currently lets you express all of the invariants you get out of a standard HM-style type system (leaving aside Andreas' constructor polymorphism). Racket provides contracts for both polymorphism and existential types in the standard library.

How far does it go?

I'm curious how far this goes in practice; in particular, if you look at a large, sophisticated codebase written in ML, how many of the static guarantees could in practice be captured by contracts? Also, I wonder how much is required in the way of annotations to capture those invariants. One thing that's lovely about the ML type system is how very tersely your invariants can be written down.

I do think it would be lovely to have a contract system in ML that was well integrated with the type system, so that guarantees captured by types would be implemented and checked as they are now, and where contracts that require run-time enforcement could be handled as well.

Ocaml contracts prototype

Something like the Dynamic contract checking for OCaml prototype?

Indeed!

That definitely looks interesting. Thanks for the pointer.

Squadron scramble

Looking back on it, it seems that the Squadron Scramble code was actually unit tested extensively. There, 5 bugs were found.

The random number generation code probably didn't have an extensive test suite (SRFIs rarely do). Additionally, almost everything in that program is of type Real, making the type system less useful for finding errors.

Cool, good catch. For the

Cool, good catch. For the benefit of others:

Squadron Scramble This application, created by Matthias Felleisen, is a version of the multi-player card game Squadron Scramble [Kaplan 2002], which resembles Rummy. The original untyped implementation consists of 10 PLT Scheme modules, totaling 2200 lines of implementation code, plus 500 lines of unit tests. The program maintains data about the set of cards in an external XML file, which it reads on startup.

The 4:1 code/test ratio probably isn't a good way of quantifying unit test quality, but it's something. I'm not sure what would be good (e.g., the automated testing community likes branch coverage).

This is a fairly unrelated

This is a fairly unrelated remark, but while I don't think branch coverage is related to typing information per se, I have (anecdotally) found it quite similar in effect to the warnings provided by an exhaustiveness checker for pattern matching of algebraic datatypes.

In MLish languages (Haskell included), and in a certain range of applications, an important part of the control flow is done by potentially deep pattern matching, and non-exhaustive patterns are good markers of ignored cases (and, very often, bugs). A pattern exhaustiveness checker, even naive, can be very helpful.

This is also related to Andreas's remark that code evolution and maintainability is paramount. Just like adding a method to a base class in an OOP program requires cross-cutting changes to the program, adding a case in a sum type is a frequent case of program evolution, and is quite effectively supported by the warnings emitted by the exhaustiveness checkers -- if one does not abuse fragile patterns. But for pragmatic reasons this static analysis is not directly part of "type checking" (except for OCaml polymorphic variants).

PS: I don't mean that exhaustiveness checkers can replace code coverage checkers, or conversely, only that they provide the same feeling of "ah, I forgot to handle this case again".
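
For readers who haven't used such a checker, here is a rough Python analogue of the evolution scenario (the types are invented; in ML the match is terser and the check is built into the compiler). With mypy-style checking, adding a constructor to the union flags every match that forgot to handle it:

from dataclasses import dataclass
from typing import Union
try:
    from typing import assert_never        # Python 3.11+
except ImportError:
    from typing_extensions import assert_never

@dataclass
class Circle:
    radius: float

@dataclass
class Square:
    side: float

# Evolution step: add a Triangle case here and re-run the type checker;
# every place that forgot to handle it gets reported.
Shape = Union[Circle, Square]

def area(s: Shape) -> float:
    if isinstance(s, Circle):
        return 3.14159 * s.radius ** 2
    if isinstance(s, Square):
        return s.side ** 2
    # If a case is missing above, a checker such as mypy reports that `s` is
    # not `Never` here; at runtime this line raises instead of falling through.
    assert_never(s)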

On the bugs of Squadron

On the bugs of Squadron Scramble in your thesis:

In one location, the string->number procedure was used, while neglecting that the result might be #f if the provided string cannot be parsed as a number. The assert procedure converts this possibility into an exception, as described above.

Four similar functions, all used as examples, call a constructor function with the wrong number of arguments. Fixing these was very simple.

Can you expand on the last paragraph here? Were these function calls example code that was not updated to reflect the change to a define-struct, presumably because the example code was never run after that? I'm not sure if lmeyerov wants to count these (or the insertion of an assert) as bugs found. Don't get me wrong, Typed Scheme is awesome and valuable, but perhaps the value is not in finding bugs in unit tested code, because it seems that there simply aren't many? (You write that the porting to Typed Scheme required a thorough review of the original code, which is generally a very effective way to find bugs on its own.)

Yes, the example code hadn't

Yes, the example code hadn't been updated. Ironically for the general subject of discussion, the examples were intended for integration testing, if I remember correctly.

The thorough review was generally only needed to revise code that didn't typecheck, which didn't substantially come into play for this example (which wasn't done by me, I should point out). So that typically didn't find bugs, in my experience.

On the larger point, finding bugs in non-trivial working software is not easy, and not the goal of Typed Racket, so while I think the experiment Leo proposes is interesting, it wouldn't really change my opinion about types one way or the other.

To be clear, I don't think

To be clear, I don't think the value of types for commercial software development lies in simple bug finding. In the future, it may be useful for finding more bugs (e.g., sharing state bugs in concurrent code), but I agree with other statements here about enforcing good documentation, providing hooks for tool development, etc. My concern is that, if programmers *do* think types are primarily for finding simple localized bugs, type enthusiasts are going to have serious adoption issues. I've been thinking about the adoption / sociology issues, not the traditional technical ones.

Types do provide bug finding

Types do provide bug finding as you type (in IDEs at least). With unit tests you first have to actually write the tests, and they tend not to run unless your entire file is in a valid state.

As Jules points out, I think

As Jules points out, I think that "most bugs typing would find are also found by untyped unit tests" does not indicate that typing is not efficient at bug finding, only that it does not find many more bugs than untyped unit tests.

It could be argued (and I think it is very reasonable) that static checking has a tighter feedback loop and lower costs than writing unit tests. Or at least the part of static checking that can be performed at relatively little cost, such as ML-style type inference. I decide to ignore for a moment that the mere possibility of inference is the sign of some language design choices that may have other, possibly negative, effects on overall development cost.

An obvious counter-argument is that despite the presence of such static checking, unit tests are still needed to evaluate functional correctness (at least in the 99% of cases where fully static proof of functional correctness is judged a bad cost/assurance compromise for the domain). What is the point of catching bugs through types if you still need to write the unit tests after this, and would catch bugs at this point anyway?

One could still argue that pre-testing bug catching has a lower cost. One could argue that the knowledge that the code is well-typed decreases the cost of writing unit tests: maybe a lot of untyped tests are unnecessary and need not be written? I must say I have very little practical experience of unit testing in dynamic languages, and am unsure how much of that work solely goes toward increasing assurance of type-correctness.

There is this same question of transition between a static and a dynamic model when using contracts. Why would I put effort into annotating my code with static code annotations if I know that I will later make the choice to add more expressive dynamic contract checks anyway? This conflict is lessened in languages where the transition between the static and dynamic variants of an assurance-increasing technique is facilitated. (That is, Racket, mostly.)

Testing for the type in unit

Testing for types in unit tests is unnecessary 99% of the time, because if there is a type error then normal tests will fail anyway (either because some function raised a dynamic type error or because the assertion fails). For example:

def sort(xs):
	if len(xs) <= 1: return xs
	return sort([x for x in xs[1:] if x <= xs[0]]) + \
	       [xs[0]] + \
	       sort([x for x in xs[1:] if x > xs[0]])

Let's write a test for this:

def is_sorted(xs):
	return all(a <= b for a,b in zip(xs,xs[1:]))

assert is_sorted(sort([3,2,5,1,2,3,2]))

The chances of this assertion still passing after introducing a type error are very low. For example, if you forgot to put the pivot xs[0] into a list [xs[0]], then the list concatenation in sort will raise a type error. If you put the pivot in too many lists [[xs[0]]] then there will be no type error but the assertion will still fail. In general things have to conspire in very strange ways for unit tests to pass despite type errors. The more common case is that you didn't test a particular piece of code (usually error handling).

Contracts nibble at this as-you-type advantage of types. One can easily imagine a contract system that, in addition to checking values dynamically, also uses quickcheck-like randomized testing as you type. If you annotate a function with the contract int -> int then it will throw random ints at the function and the dynamic checks will make sure that the contract holds. You can also easily imagine exploiting this to do "contract inference": assign a trial contract to the value whose contract we want to infer, and decide by randomized testing whether it is a good contract. You have to do this in an intelligent way because the space of all contracts is huge, and because you want to assign the most general contract. I think it's doable by searching a space of contracts from weakest to strongest.

For example if you had the value x -> x+1 you might have the following series of tests when searching over the space of contracts C = {Any, None, Int, List[C], Func[C,C]}:

Any? passes.
  Int? fails.
  List[Any]? fails, this prunes an infinite number of candidate contracts List[C]
  Func[None,Any]? passes.
    Func[Int,Any]? passes
      Func[Int,Int]? passes
      Func[Int,List[Any]]? fails
  etc.

This way you can infer a set of "locally strongest" contracts. Perhaps instead of exploring weakest to strongest it is better to maintain a set of upper AND lower bounds on the contract and do a kind of binary search, to deal with positive and negative contract positions in a symmetric way (you'd start with Any...None, then go to Func[None,Any]...None, etc.). You probably need some additional mechanisms and a lot of engineering to make it work in practice, but it would be an interesting experiment I think. Imagine an IDE that displays the inferred contract above each function as you type them. Functions with type errors would get very weak inferred contracts that you'd be able to spot. For example the contract of the function x -> 1 + x + "foo" would be inferred to be Func[None,None], meaning that the function doesn't work for any values.
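
As a sanity check that this hangs together, here is a very naive Python sketch of both pieces: randomized checking of a function contract, and a "search" that is just a fixed, hand-written candidate list ordered weakest to strongest. All class and function names are invented; a real tool would generate and prune the contract space (List[...], nested Func[...], upper and lower bounds) rather than enumerate it:

import random

class AnyC:
    def check(self, v): return True
    def generate(self): return random.choice([0, 1, -1, 42, "foo", [1, 2, 3]])
    def __repr__(self): return "Any"

class NoneC:
    # The empty contract: no value satisfies it.
    def check(self, v): return False
    def generate(self): raise ValueError("no inhabitants")
    def __repr__(self): return "None"

class IntC:
    def check(self, v): return isinstance(v, int) and not isinstance(v, bool)
    def generate(self): return random.randint(-100, 100)
    def __repr__(self): return "Int"

class FuncC:
    def __init__(self, dom, cod): self.dom, self.cod = dom, cod
    def check(self, v, trials=100):
        # Randomized checking: throw random domain values at v, check the results.
        if not callable(v): return False
        for _ in range(trials):
            try:
                arg = self.dom.generate()
            except ValueError:
                return True          # Func[None, _]: vacuously satisfied by any function
            try:
                if not self.cod.check(v(arg)): return False
            except Exception:
                return False
        return True
    def __repr__(self): return "Func[%r,%r]" % (self.dom, self.cod)

def infer(v):
    # Return the candidate contracts that v passes, weakest first.
    candidates = [AnyC(), IntC(),
                  FuncC(NoneC(), AnyC()), FuncC(IntC(), AnyC()), FuncC(IntC(), IntC())]
    return [c for c in candidates if c.check(v)]

print(infer(lambda x: x + 1))          # [Any, Func[None,Any], Func[Int,Any], Func[Int,Int]]
print(infer(lambda x: 1 + x + "foo"))  # [Any, Func[None,Any]] -- the type error shows up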

Cost is one of those fluid

Cost is one of those fluid things. I've found both static and dynamic checks to be rapid, and depending on scenario, to have different benefits. Likewise, a lot of useful lightweight static checks need not adhere to rigid type disciplines (e.g., tab completion is still effective for dynamic languages). Many researchers are observing an inflection point in the static/dynamic design space because the cost of compute is going down much more rapidly than the increase in program size and number of tests. This gets interesting because, for example, I might write a dynamic contract for a function and ask for it to be synthesized, or expect continuous unit testing.

For example, when hacking on the C3 web browser, I ran unit tests maybe every 5-10 minutes because most of my cores were idle anyways. If I can develop a browser in this manner today, what does that say about the design of research tools that will be adopted further out? I think it's worth being critical about the comparative and long-term costs/benefits of different approaches. For example, types will make many search-based techniques more tractable, and thus act like an optimization for tools relying upon them. That, to me, is a significant change in thinking.

Tooling will have a lot of

Tooling will have a lot of influence on this. If you can write unit tests that are automatically invoked by the IDE when the code they target changes, you can get timely feedback about your errors in a very pervasive way. Static typing gives us that for free, but it's always very limited. If we focus too much on static typing, we might ignore other tools that are equally useful in different ways.

Static types make some search tasks easy, but not all. Many properties are too complex to encode as static types, but we'd still like to "find" them; and so we have naming conventions and comments. I actually don't think formal encodings are in competition with run-time/informal encodings; we really need to understand and leverage both.

Hey, dude, it already exists.

If you can write unit tests that are automatically invoked by the IDE when the code they target changes, you can get timely feedback about your errors in a very pervasive way.

In fact, not only does it exist today, but there is no requirement "if you write unit tests", because an IDE testing plug-in does it for you: Pex! Here is a screenshot where the IDE presents a test coverage progress indicator of unit tests it has generated so far.

Science!

[edited top-level post to reflect finding]

Maybe...

...it just proves that the unit tests were not quite complete enough. But that's really more of a polemic point of view.

Maybe this is not the right place, but in my opinion the striving for Proof using type systems such as Haskell's, while nice academically, is not wholly applicable in the engineering world.

If there is measurable benefit, after cost/benefit analysis, in applying particular technologies such as static type systems of Haskell's ilk, then they should be used.

But in engineering, tolerance limits are often set that do not require the 'bestest', 'safest' solutions; sometimes 'just good enough' will do. E.g. some mission-critical system, such as a nuclear reactor, might benefit from more Proof, but a simple script to check for unreachable URLs might not.

It's not even clear that

It's not even clear that formal methods lead to more reliable systems at all, even disregarding the costs. If you compare the two-step process of first developing a formal specification and then implementing it vs. developing the implementation directly, it's not clear which leads to better software. There is an implicit assumption that the specification is good, or at least more likely to be good than a directly developed executable program. For all but the most mathematical programs I find this highly unlikely. The continuous feedback you get from being able to look at the results of the executable program while you're developing it probably vastly outweighs the advantage you get because the specification language is not constrained to be executable (there are of course programs where that is very advantageous; for example, specifying a sorting routine is easier than implementing it).

More focus on the whole development process is needed than "we proved it correct according to the spec, so we're done" if we are to effectively identify which parts of a program benefit from which reliability techniques (verification being one of them) and to have them smoothly interoperate for the complete program. For example, I want to be able to specify properties in a single notation for both randomized testing and formal verification, so that I can first specify a property and later decide whether the added guarantees of formal proof are worth it on top of automated randomized testing of that property.
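
To make the "specifying a sorting routine is easier than implementing it" point concrete, here is a small Python sketch in the spirit of the earlier quicksort example (prop_sort_correct is a made-up name): the specification is the two-line property, the implementation is the quicksort, and today the property is only tested at random, although the same statement is what a proof would establish:

import random
from collections import Counter

def sort(xs):
    # The quicksort from earlier in the thread, restated to keep this self-contained.
    if len(xs) <= 1:
        return xs
    return (sort([x for x in xs[1:] if x <= xs[0]])
            + [xs[0]]
            + sort([x for x in xs[1:] if x > xs[0]]))

def prop_sort_correct(xs):
    # The specification: the output is ordered and is a permutation of the input.
    ys = sort(xs)
    ordered = all(a <= b for a, b in zip(ys, ys[1:]))
    permutation = Counter(ys) == Counter(xs)
    return ordered and permutation

# Quickcheck-style randomized testing of the property; in principle the same
# property could later become a proof obligation in a verification tool.
for _ in range(1000):
    xs = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
    assert prop_sort_correct(xs), xs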

That sounds very much like

That sounds very much like the reasons some people like to use dynamic and interactive languages such as Lisp.

To be able to easily test loose concepts independently or as part of the overall system is very appealing. It allows for a very iterative development approach, which is nice when a problem domain is ill-defined.

As the system moves closer to the known requirements, extra robustness can be added.

Some people may prefer to do that 'exploration' at compile time; I cannot speak from experience there because C++ is the only statically typed language I use at the moment (at work), and I prefer CL for personal hacks.

Straw man

You're doing a bit of a straw man argument here: you're opposing a rigid, waterfall specify-then-implement model to a continuous implement-test feedback loop. In practice, the certified software developments I know of also use an implement-prove loop: specifications and software are usually "co-designed", built mutually from an unclear idea of the goal to a precise end product; similarly, both of them can be left unfinished in areas where it was not deemed worth the costs at a given point in time.

I completely agree with the need for smooth integration of all kinds of correctness-improving techniques. There are people working on that; in fact there is a whole conference series called "Tests and Proofs" (TAP).

I agree, it was a bit of a

I agree, it was a bit of a straw man, especially if you already share a similar view. But other people don't, and find it a crazy idea that using Ruby could be better for correctness than Coq. You can do iterative development of the specification/implementation; what I was trying to say is that you need the tools to facilitate this (or, phrased negatively, the presence of formal verification should not inhibit iterative and experimental development). One of those tools is testing/proving of the same specifications. Others are being able to experiment in ways that violate preconditions, to base proofs on unproven assumptions, to have tests where a human is the correctness/error detection oracle, etc. More generally, defining correctness as a binary property based around a formal spec is too narrow a view. Whether software works is a probabilistic and subjective property, and tools should guide the development process as much as possible toward good solutions (quickly and cheaply).

Formal methods, and even

Formal methods, and even more importantly *available* and *usable* formal tools, can help tremendously, while unit tests require a minimum of discipline even to write correctly (read: usefully, covering what your many code reviews have still overlooked).

And tons of unit tests can be useless, too, if they are redundant with each other or fed with improper inputs. I have seen all sorts of strange animals over the years, including unit tests that didn't even decently cover the nominal cases of the components being tested. The sort of thing that makes you mad when you discover it weeks or months later, after jumping into a project where you had been reassured with an "oh, yeah, we do unit tests of course, duh!"

Yeah. Right.

But if we're speaking of large-scale software, a fortiori with heavy legacies, I would never exclusively drop one in favor of the other (unit tests "versus" static typing/analysis).

Not with Turing-complete languages, anyway. That would just be silly: accepting, eyes wide open, that you're shooting yourself in the foot.

Whoever claims otherwise is daydreaming, IMO. We're only in 2012. Or the person doesn't actually care about the real-world dilemmas the vast majority of programmers are actually busy coping with.

I appeal for ever more user-friendly and ubiquitous formal tools; I'd almost beg the budget guys to get us those, if needed. But I'll also demand to keep whatever fraction of my unit tests is deemed necessary, and likely for a couple more decades to come, at the very minimum, I'm afraid.

Now I wonder how much better

Now I wonder how much better they can get than Haskell with Coq or F* solutions :)

Similar to the other peer

Like the other peer comments, I don't think this study proves very much. It looks at the end result of the programming activity rather than the activity itself. What the study shows is that static types can improve on a mature solution, but it doesn't say anything about the maturation itself; it doesn't say how static types can improve the act of programming. As if program code were simply construction material assembled in a deterministic way by construction worker programmers, which is not true!

What the study shows is that

What the study shows is that static types can improve on a mature solution

Yes. Crucially, it starts to quantify the improvement (and a bit of the cost). I agree a lot more can be done -- science is incremental :)