Static vs. Dynamic Languages: A Literature Review

We've mentioned some empirical studies of programming languages a few times, but I haven't seen a comprehensive list we can use as a reference.

Fortunately, I just came across this pretty decent overview of existing literature on how types impact development. Whether you agree or disagree with Dan Luu's position, the comprehensive list warrants a front-page post in my opinion.

One point worth noting is that all the studies used relatively inexpressive languages with bland type systems, like C and Java, and compared those against untyped equivalents. A future study ought to use a more expressive language, like OCaml, Haskell or F#, which I think would yield more pertinent data for this age-old debate.

Part of the alleged benefit of types is documentation that helps with refactoring without violating invariants. So another future study I'd like to see is one where participants develop a program meeting certain requirements in their language of choice. They would have as much time as needed to satisfy a correctness test suite. They would then be asked, many months later, to add a new feature to the program they developed. I expect that the maintenance effort a language requires is more important than the effort required for initial development, because programs change more often than they are written from scratch.

This could be a good thread on how to test the various beliefs surrounding statically typed and dynamically typed languages. If you have any studies that aren't mentioned above, or some ideas on what would make a good study, let's hear it!

an archaeology of successes and failures

how to test the various beliefs surrounding statically typed and dynamically typed languages.

It seems to me that prior to those questions is, in some sense, the question of what is worth believing about any programming language.

For example, consider a claim of the form "This language reduces bugs per programmer hour by around 8%"; or "This language shaves 15% off the time it takes programmers to complete a certain kind of challenge task."

Notoriously, it is very difficult to prove such claims in a convincing way. That raises a prior question: are these really important questions worth the effort of testing?

An archaeology of programming language successes and failures might be helpful for getting a sense of what qualities (absolute or relative) among programming languages matter to their uptake, economic impact, and so on.

My own hypothesis is that such an archaeology would suggest that a programming language must greatly reduce the amount of labor needed for some class of programs, greatly expand what kinds of programs can be written at all, or both. In other words, a programming language will succeed if it either helps to eliminate jobs, helps to create a whole new industry, or some combination of those. (Thus, I doubt the marginal differences along axes such as typing are all that important.)

I think being able to

I think being able to produce programs robust against any or many errors impacts every success criterion you list, and protects against many reasons for failure you omit. I think types are indeed one tool to produce such robust programs.

Seems pretty relevant to me, and really in principle, it's just a question of what kinds of types help produce robust programs at high productivity, not whether types can. Perhaps the type properties we currently check aren't that important, but higher-level modular session/protocol-typing would be very beneficial. Empirical studies will bear this out, so what kind of studies are best?

People are notoriously hard

People are notoriously hard to study empirically. You can test simple things out for sure, but how many PL features are simple enough to be reliably empirically tested in a controlled manner? Especially when you want to cross the threshold from simple types to more complicated ones. And of course, you have to balance power with usability, as always (power is useless if it can't be used).

So when empirical methods fail, what else can we do? Incidentally, Jonathan Aldrich and company did a qualitative study on their type state work. It was quite controversial; there were many arguments when we discussed it:

http://www.cs.cmu.edu/~aldrich/papers/icpc15-searching.pdf

See also expert walkthroughs that are used to guide design for complex professional tools where usability studies are much less reliable.

People are notoriously hard

People are notoriously hard to study empirically. You can test simple things out for sure, but how many PL features are simple enough to be reliably empirically tested in a controlled manner?

I don't think simplicity is relevant if the magnitude of the result is sufficiently large. For instance, I don't think the refactoring study I suggested would be that difficult unless the difference is very small. Have students do the first program at the beginning of a course, and the change at the end.

Re: type state, that's an interesting study. Sounds like type state + an IDE geared towards type state reasoning, like highlighting valid transitions, would handle the simpler questions of this sort. Protocols that involve creating multiple objects to get the state you want are not so trivially addressed. Perhaps you could declare a type for a new binding designating the desired type state, then use a key combo on the variable designating the starting state to begin a local search of the possible transitions until a path is found.
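
To make that concrete, here is a rough sketch of such a transition search in Haskell (purely illustrative: the typestate names, the transition table, and the `findPath` helper are all made up, not taken from Aldrich's work or any existing IDE):

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- Hypothetical typestate names and the operations that move between them.
type State = String
type Op    = String

-- Illustrative transition table for a file-like protocol.
transitions :: Map.Map State [(Op, State)]
transitions = Map.fromList
  [ ("Closed", [("open", "Open")])
  , ("Open",   [("read", "Open"), ("close", "Closed")])
  ]

-- Breadth-first search for a sequence of operations taking one typestate to another.
findPath :: State -> State -> Maybe [Op]
findPath start goal = go [(start, [])] (Set.singleton start)
  where
    go [] _ = Nothing
    go ((s, path) : rest) seen
      | s == goal = Just (reverse path)
      | otherwise =
          let next  = [ (s', op : path)
                      | (op, s') <- Map.findWithDefault [] s transitions
                      , not (s' `Set.member` seen) ]
              seen' = foldr (Set.insert . fst) seen next
          in go (rest ++ next) seen'

-- findPath "Closed" "Open" == Just ["open"]
```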

You would be including a

You would be including a bunch of biases at that point, and your results might not be very accurate. E.g. how much training and knowledge do you expect participants to have? Are you basically leading them in the study? Also, are you evaluating usability or...something else?

Even getting reasonable results on things that we are pretty sure are useful is very difficult. It is much easier to test a cognitive process (e.g. abstract reasoning capabilities) that might be related to your type system, get some results, and then show that this process is actually related to your type system through additional tests.

Anyways, these are always easier to think about than to do.

You would be including a

You would be including a bunch of biases at that point where your result might not be very accurate. E.g. how much training and knowledge do you expect participants to have?

Does it matter? A large enough sample where people get to choose their own language would account for such variations. A simple-to-administer exercise given as part of a class over 3 or 4 semesters should yield enough participants to qualify. As far as costs go, that seems pretty low. Certainly the studies already outlined would have cost much more.

Biases tend to amplify each

Biases tend to amplify each other, not cancel each other out! Your results would be so noisy that they probably wouldn't even be useful in non-scientific contexts (e.g. to get a feeling for whether the technique is better). Success in the results would probably have more to do with the individual than with whatever you were testing, and you wouldn't be able to see any correlations.

Anyways, typically in these cases of ideological investigation, one already knows what results they want to show, so they just have to make the data fit :)

Biases tend to amplify each

Biases tend to amplify each other, not cancel each other out!

Often, sure. So can you actually point out what sort of bias such an experiment would exhibit? Because it seems that with a large sample of the sort I suggest, there will be newbies and experts across the spectrum of both dynamically and statically typed languages, so an appropriate statistical average + the standard deviations of completion time should be a meaningful measure of maintainability.

And if the survey includes self-assessed skill level, or preferably the number of years and/or months programming in said language, that will provide a more objective means by which to filter the data.

So they are working on one

So they are working on one program? Who chooses that? What about processing lots of unstructured data using a very strict statically typed language? I'm sure there are example programs biased in favor of types also. And...big project, small project, think to program, program to think, big team, small team, lone wolf, write time, read time, debug time, quick-and-dirty prototype, safety-critical life support system, CRUD app, web app, compiler, UI, framework, everything from scratch....

Not to mention there is no control group in your study. The languages are quite different already even if you ignore their typeyness, and the libraries as well. Establish a clear baseline, then study a small feature against that baseline. Grand uncontrolled studies don't work.

Control groups aren't needed

Control groups aren't needed for all studies. I'm not sure why one would be needed in this case. It doesn't even sound like you're responding to the study idea I suggested, so let me outline it once more:

Each participant writes their own program in a language of their choice, using just the language's standard library. The program takes an input and prints an output that's checked by a comprehensive test suite the experimenters use to ensure program correctness. Months later, participants extend that same program with new behaviour that's also checked against a test suite for the extended behaviour. Development time and the time needed to extend the program are tracked, and participants report their skill level with the language in some way (like number of months of experience). A number of such experiments with different tasks would yield a broad range of data across many problem types.

So can you please be more specific about what biases you expect to see in such an experiment? Certainly one experimental outcome wouldn't be definitive, but a range of such experiments covering various tasks certainly would provide compelling evidence of the maintenance burden of existing programming languages, both typed and untyped.

Ok, so really no control at

Ok, so really no control at all. Random people writing random old-fashioned string IO programs (that are best written in Perl) making random changes after a few months. Oh, and something about types, right? So how do you expect to get a clear signal out of so many random signals? Whatever your results are, it is just too unorganized to be convincing.

Let me put it this way: what concrete hypothesis are you testing for in the study? Be specific. If it is "types make maintenance easier", balance that with competing approaches, as in "regression tests make maintenance easier". And a null case: no tests and no types should really suck. Then start with the same base language, split your participants into three or four groups (null, tests, types, types and tests). I think Stefan has done something like this already. Don't forget to lock the participants in a dungeon during the 3-month wait, though getting sign-off on that from your ethics board might be hard (at least they aren't lab mice).

So how do you expect to get

Random people writing random old-fashioned string IO programs (that are best written in Perl) making random changes after a few months.

That's an awful lot of assumptions you're making. I left the type of program unspecified so the problem can be changed to suit any number of tasks. I can't even fathom why you'd assume string IO. They could implement Ackermann's function for all you know.

So how do you expect to get a clear signal out of so many random signals? Whatever your results are, it is just too unorganized to be convincing.

Empirical studies like this are conducted all the time, and broad, meaningful correlations can be derived. Just look at the study analyzing github. Way noisier than what I'm suggesting, yet still some interesting results.

Let me put it this way: what concrete hypothesis are you testing for in the study? Be specific.

I already explained this:

Certainly one experimental outcome wouldn't be definitive, but a range of such experiments covering various tasks certainly would provide compelling evidence of the maintenance burden of existing programming languages, both typed and untyped.

Now say, if existing typed languages were correlated with better outcomes [1] overall while accounting for experience, that's suggestive (Edit: and if you think this isn't the case, please explain why). Like I said, certainly not definitive, but there's no need to be precise from the get go and nail down exactly what type of reasoning yields this benefit, and I'm not sure why you're focusing on this precision as the only desirable measurement.

[1] where "better outcome" is defined as less time spent overall to satisfy the test suite

That's an awful lot of

That's an awful lot of assumptions you're making. I left the type of program unspecified so the problem can be changed to suit any number of tasks. I can't even fathom why you'd assume string IO. They could implement Ackermann's function for all you know.

Well, a focus on functions would definitely make FP look better. Any program kind you choose will be a huge source of bias, obviously.

Empirical studies like this are conducted all the time, and broad, meaningful correlations can be derived. Just look at the study analyzing github.

The study analyzing github was not studying people, but artifacts produced by people. This is quite reasonable and common: if studying people is hard, study the code they produce instead. You are proposing to study people this way, do you have a link to a study like that?

You are proposing to study

You are proposing to study people this way, do you have a link to a study like that?

Study people how? How am I not exactly studying code produced by this experiment to reach conclusions exactly like the github paper?

Instead of disparate code bases with arbitrary input/output and correlating against defect rate, I'm suggesting fixing the input/output and fixing defects at zero (against the test suite), and measuring development time instead.

psychology vs. archeology

We can study people indirectly by the artifacts they left behind. It's not a great way to study them, and the results require a lot of (hopefully unbiased) interpretation, but often they are dead and long gone, so we don't have many other options. In our case, the programmers might still be alive, but they are otherwise inaccessible to us.

On the other hand, the artifacts are unbiased: they weren't influenced by the study (no spooky observer bias/quantum effects). So take the github study: the study itself didn't influence the artifacts. Contrast that to a controlled experiment in the lab...you have to be careful to correct for all sorts of biases that arise in a lab setting.

What you are proposing is the worst of both worlds: an archaeological study that biases the artifacts upfront by what you call "fixing", rather than a study of artifacts as they arise naturally in the wild. Whatever results you get from that are not reliable. Either go with archaeology or go with a controlled experiment, but don't mash the two up haphazardly.

So take the github study:

So take the github study: the study itself didn't influence the artifacts. Contrast that to a controlled experiment in the lab...you have to be careful to correct for all sorts of biases that arise in a lab setting.

You're once again making assumptions. I don't see a lab setting proposed in any of my comments. The experiment can be done a number of ways: as an ordinary lab that forms part of a regular course (which students already do), or as a take-home assignment with self-reported development times, each of which has its own limitations.

And if you think the results would be so unreliable, you should be able to name a specific bias this design would introduce. Your vague warnings aren't convincing.

Right, you aren't trying to

Right, you aren't trying to do a controlled experiment, you are only controlling a little bit. You aren't trying to do archaeology like the GitHub study either, because you are biasing the artifacts being studied (by merely stating the ground rules). So it is neither a controlled experiment nor archaeology....what are you calling it? Can you point out an example of such an experiment, given that you aren't doing something like the GitHub study at all?

As for specific biases, start here:

https://en.wikipedia.org/wiki/Observer-expectancy_effect

In research, experimenter bias occurs when experimenter expectancies regarding study results bias the research outcome.[2] Examples of experimenter bias include conscious or unconscious influences on subject behavior including creation of demand characteristics that influence subjects, and altered or selective recording of experimental results themselves.

Scientific control attempts to reduce bias:

A scientific control is an experiment or observation designed to minimize the effects of variables other than the single independent variable.[1] This increases the reliability of the results, often through a comparison between control measurements and the other measurements.

But you aren't doing that, as you admit. Then there is archaeology as with the github study, but you would have to avoid influencing the artifacts at all to do that.

Can you point out an example

Can you point out an example of such an experiment, given that you aren't doing something like the github study at all?

Sure, there are literally thousands of medical studies using self-reported data, which influence the direction of future research on larger scales.

Re: observer-expectancy. Where does this factor into my hypothetical study? The students are given a problem to solve that is checked by a test suite, and they do it on their own time but subject to a deadline as part of class. They are asked merely to total the time taken to write a program that passes all the tests, and possibly some other survey questions, like rating the relative difficulty of the problem. Some months down the road, they add a new feature and pass a new test suite, and fill out the same survey again.

Where exactly is the observer bias? Where exactly does any bias enter into this experiment, beyond the usual self-reporting bias?

Repeating the experiment on different types of problems eliminates programming language bias, which was the point of such a low-cost study.

A small number of concurrent validity experiments can confirm or repudiate the validity of the self-reporting data, as is usually done. These can be tailored to a meaningful subset of the languages that were either most used by students in the self-reporting dataset, or that provided sufficient data and provide a meaningful cross-section of programming language power.

Sources of biases

I agree with Sean that you're vastly underestimating the bias problem. In this post I make up some plausible examples. But if you're asking this question, I wonder how carefully you read Dan Luu's discussion, since some of it is about surprising results that might be due to biases.

If you want a specific example of bias or experimental error, you'd need to define the experiment in much more detail. Since you didn't, I'll refine the experiment myself in various ways. I'd guess many of these criticisms probably apply also to the GitHub study and limit what we can learn from that. I'll also advance alternative explanations which I might or might not believe in, as in "maybe Haskellers are smarter than Javaers". Please don't take offense at them: my point is not that any of these hypotheses is true or not, just that it's hard to exclude them, and maybe that somebody actually believes in it.

Background: I haven't done such experiments, but I've followed a one-semester course on empirical methods, so I know that's essentially a separate career. Separately, I heard of a master's thesis supervised by a professor (Christian Kästner) and done by a CSist with a psychology background (Janet Siegmund, née Feigenspan): the whole thesis was about the experimental design needed for a meaningful controlled experiment. That student is now a researcher, but I'm not sure that, after quite a few years, we have all the needed building blocks.
Say, suppose you want to control for programming experience: one of her papers was about how to estimate it reliably.

Suppose that participants are free to choose which language to use, and some participants choose Haskell while others choose Java (a "less typed" language). Suppose for simplicity that participants using Haskell are orders of magnitude more effective than the others. I claim we still can't be sure of any specific explanation; in particular, we can't be sure it relates to the languages themselves. Even then, we wouldn't know which specific feature matters, especially whether types are relevant, but since it's hard to just "add cool types" without changing the language design, I won't try doing that.

For instance, this might simply mean that some programmers are inherently smarter than others, no matter what language they use (which I'd find plausible), and they happen to choose Haskell because it's more fashionable due to unfounded claims about it, even if it were intrinsically somewhat worse. In that case, a startup should still pick Haskell over Java, but the result is irrelevant to a university choosing which language to pick.

Alternatively, it's widely believed that Haskell has a longer learning curve (and some would say it's harder to learn), so participants who learned Haskell enough to pick it might simply have more programming experience, or be smarter.

Conversely, if Java came out to be better, that might depend on its educational material, library documentation, library availability, tooling, and what not.

To fight all this, some controlled experiments (like Hanenberg's at OOPSLA 2010, mentioned in Luu's review, http://courses.cs.washington.edu/courses/cse590n/10au/hanenberg-oopsla2010.pdf) compare languages constructed so that all these aspects are identical.

Worse yet, some Haskell libraries seem to depend on your ability to do type-based reasoning (I've seldom seen point-free style in Scheme, for instance).
Type-based reasoning is something fairly abstract and thus (in general) cognitively hard (I understand there's immense evidence that abstract reasoning is hard; what I find most fun is that some languages never abstracted the concept of "three" out of "three apples" and "three horses").
Maybe Haskell types are better if you can reason about them in your head, but otherwise you'd need non-existing tool support, so this might bias against Haskell.

Let's now get to the problem choice — there you have experimenter bias. If we suppose that some language is more suitable for some problem, for both essential and accidental reasons, how do we pick an unbiased problem? For instance, should our problem require implementing a GUI? I often hear that OOP is actually good for GUIs; but maybe GUIs happen to be biased against Haskell for accidental reasons (nobody built good libraries for it).

In particular, how do we test that our problem is not biased toward certain languages? In fact, we probably should also expect bias from the way the problem is stated.

The "easy" way to do this properly is to repeat the experiment across different domains. If we want to do good statistics on it, let's take at least 30 different problems. BTW, we should probably try reproducing each experiment, and see if we manage at all — we'll probably fail. OK, not so easy after all.

If we assume experienced programmers, we might be able to ignore differences in concrete syntax — otherwise, we should also look at syntactic differences, as in research on the Quorum language. Experiments suggest that Java syntax is no more usable for beginners than random syntax.

Thanks for the detailed

Thanks for the detailed reply. I agree the experiment I describe is underspecified and, not being an expert, I expect it will remain so, but I have described a sufficiently complete skeleton of this experiment in the comments above, IMO. As for your specific list of possible biases:

1. Haskellers might be smarter: sure, but my experiment wasn't designed around only two languages where this bias would kill any results, it's designed to be inclusive, gathering as much data on as many languages as possible for the same sort of work. If there are consistent trends among statically typed languages that share commonalities (like Haskell, F#, OCaml), that's suggestive even if it's not definitive. This yields a meaningful target for more focused experiments with more controls to nail down specifically what property is advantageous. Can you honestly tell me right now which languages are actually most productive for meeting a specification on a deadline? That seems like an important starting point for any further study.

2. Choice of problem: I agree the problem choice can introduce bias, which is why I suggested a wide variety of problems. Perhaps 30 different problems, like you said. I also at some point suggested using a class already studying various problems, like a class on advanced data structures and algorithms, as a viable pool of participants since they're already doing labs like this. Let students pick their own language, and make lab submission subject to automated tests with a survey. Repeat it every year, and even at different institutions if you can standardize the set of problems and get some cooperation on this.

3. Phrasing of problem: this certainly could introduce bias, but the known biases resulting from phrasing can be mitigated, and the unknown biases will only reveal themselves by actually conducting the experiment.

4. Syntax: this is indeed a problem if the participants are beginners with the language, but presumably a participant will choose a language with which they are somewhat familiar to meet their deadline. You can't perform this experiment with neophytes and derive suggestive information about types, though it will yield suggestive information about syntax and semantics as you say.

5. Experience with language: certainly the more effective Haskellers might simply be more experienced with Haskell, but the survey of the time taken can also ask how much experience they have with the language. With enough data, we can analyze the effectiveness of programmers with X months of experience in Haskell vs. Java vs. ...

The point of the experimental framework I suggested was to make it cheap to execute and encourage cooperation to gather as much data as possible on effectiveness. Even if not definitive on specifically what is effective about some languages, it can be suggestive, and actually provides a direction for more focused experiments. It seems like a critical start to identifying what actually seems to work without stumbling about trying expensive, highly controlled experiments with flawed languages on small scale problems which have no ultimate meaning.

To fight all this, some controlled experiments (like Hanenberg's at OOPSLA 2010, mentioned in Luu's review, http://courses.cs.washington.edu/courses/cse590n/10au/hanenberg-oopsla2010.pdf) compare languages constructed so that all these aspects are identical.

Except typed and untyped languages are simply not the same. Only the most basic languages and basic programs could be studied this way, which isn't meaningful at all. You certainly can't conduct this experiment with a language that utilizes sophisticated type-directed resolution like Haskell, or type-enforced protocol safety like TyPiCal because, to my knowledge, there is either no dynamically typed equivalent, or the performance wouldn't be remotely acceptable so no one would choose to use such a language anyway.

I think you're describing a

I think you're describing a slight variant of the ICFP Programming Contest and similar things (where the submissions are made by self-selected teams). Do you want to run stats on winning languages :-)?
https://en.wikipedia.org/wiki/ICFP_Programming_Contest#Prizes.

The essential problem with uncontrolled experiments is that you risk measuring just noise. To us programmers, experimental controls seem to come out of nowhere, but psychologists have adopted them for good reasons. I'd guess they were adopted over the course of time, as people tried and failed to learn much from uncontrolled experiments.

Even with all the restrictions we use nowadays, we still often just measure noise (google for "reproducibility crisis" for info — this ranges e.g. from biology to psychology and sociology). Worse, you can reliably perform rigorous experiments proving parapsychology is real (see http://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/).

Hence, results would probably boil down to what the experimenter wants to see and be a poll on experimenters; or at least, once I can claim that, you have no way of disproving me.

I'm not sure this means that you shouldn't do this experiment if that's your plan. But I think all these problems are one reason why researchers are careful with exploratory experiments.

But ignoring all of that, let me get to some specific points.

With enough data, we can analyze the effectiveness of programmers with X months of experience in Haskell vs. Java vs. ...

I agree data on that would be nice. Since Haskell has a steep learning curve, I conjecture Haskell would fail many (some) of these comparisons for equal X. What I'd hope to see, however, is that Haskell programmers become more effective after enough experience.

Except typed and untyped languages are simply not the same.

I totally agree, but if you just compare actual languages, you are not comparing specific features.

If there are consistent trends among statically typed languages that share commonalities (like Haskell, F#, OCaml), that's suggestive even if it's not definitive. This yields a meaningful target for more focused experiments with more controls to nail down specifically what property is advantageous. Can you honestly tell me right now which languages are actually most productive for meeting a specification on a deadline? That seems like an important starting point for any further study.

Your experiment still tests something else. If single-person teams each pick a language and have to meet a certain specification by a deadline, which language is used by the productive programmers? That doesn't help to pick a language when the people have already been picked, but it is still a somewhat worthwhile question, as long as you don't try generalizing it to questions that might seem similar.

+1

+1

A variant of ICFP with a

A variant of ICFP with a final survey is only the starting point for what I suggested. The difference is that participants in an ICFP-like contest are self-selected, not randomly selected like the makeup of a programming class, as I suggested. Self-selection is an obvious source of bias.

You'd also need to add the second part consisting of changing/extending the program with new features. I can't remember if that's ever been a part of ICFP.

Even with all the restrictions we use nowadays, we still often just measure noise (google for "reproducibility crisis" for info — this ranges e.g. from biology to psychology and sociology). Hence, results would probably boil down to what the experimenter wants to see and be a poll on experimenters; or at least, once I can claim that, you have no way of disproving me.

Sure, which is why the experiment is easily conducted at numerous institutions with their own experimenters; each institution can even try a variation of the same survey questions to ascertain whether there's some implicit bias in how the question is asked.

The point is to be cheap and easy to execute, since there seems little real funding for controlled experiments of the kind you and Sean prefer.

I totally agree, but if you just compare actual languages, you are not comparing specific features.

But since we both agree those features are fundamentally incomparable, all you can check is the meta-claim that in *most* scenarios, *most* typed languages are superior to *most* untyped languages [1]. This requires an experiment very much like the one I suggested, just with more precise measurements of timing, something which costs money that virtually nobody is willing to spend.

So where does that leave us exactly? Ideally, using a less precise, noisier measurement to hopefully drive interest in obtaining a more precise measurement.

[1] edit: this obviously isn't the only metric of interest, just an example; for instance, dynamically typed languages may be better on average, but certain typed languages may be better than any other dynamically typed ones, which is also a result this experiment would find with some better controls

To be honest

I always felt that functional programming, and type based reasoning, is for the simple minded.

That isn't to say that you cannot do entirely baroque, complex things, but the core of functional programming, just gluing an application together through function composition, is utterly simplistic.

I saw the same with Erik Meijer in a comment on C#. He has the same problem I do: what to do with all those different keywords.

I can't say I understand it, but to claim that Haskell is for people who can think better, or more abstractly, no, I don't think that's true.

Programming shouldn't be

Programming shouldn't be hard. Simple-minded people (or people who don't want to devote complex reasoning to it) should be able to program. I'd rather spend my mental effort solving problems.

That's exactly the problem with FP these days

I remember the days when FP was heralded as an utterly simple, safe, and robust manner of creating a program. And that was true. I remember typing in a program in a text editor at home, removing one or two bugs in a minute at the university, and then handing in the assignment.

But for the last fifteen years, since it was recognized that FP doesn't solve all problems well, people have been cheering about ridiculously complex solutions to yet another corner case FP doesn't handle in a straightforward fashion.

I don't mind research but that's the opposite of what I want. I want the process of creating programs to be simple and in the reach of most people.

That you need to be a seasoned Haskell programmer to make use of the language is not a good thing. And, no, that shouldn't be inevitable either.

Oh I totally agree. But I

Oh I totally agree. But I just want to point out that we want simple-minded programming experiences, not clever ones. Haskell is on the worst-offender list, but you can find cleverness in C++ code also.

Fibonacci always makes FP look easy, at least.
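
For what it's worth, the textbook definition really is short (the naive, exponential version, just to illustrate the point):

```haskell
-- The classic definition that makes FP look easy (naive and exponential).
fib :: Integer -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)
```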

Hey!

Nah, that was something like a 500-line program, I remember. A SECD machine or something; I forget the assignment. And that looked easy too.

Something as simple as Miranda was wonderful.

With Haskell, well, guess I am not going to post my opinion anymore.

In defense of Haskell, the

In defense of Haskell, the goal is to understand the problem well enough to come up with as elegant a solution as possible. It is a worthy goal, and can be a useful constraint, but a lot of programming falls outside of that goal.

Not the main point

Let me reiterate my disclaimer: my main point is that if something like that were true, or if the opposite were true, it'd confound the results. Actually claiming "Haskell's for people who can think better" would be offensive (unless we had robust proof, but as discussed that's not in sight), and that's not my style.

But the Haskell `lens` library, for instance, does seem rather abstract.
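
For a taste of that abstraction, here is a minimal van Laarhoven lens sketch (just the core idea, not the actual `lens` library API; `Point` and `_x` are made-up examples):

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Functor.Const    (Const (..))
import Data.Functor.Identity (Identity (..))

-- A lens is a function polymorphic over a functor; the functor you pick
-- decides whether you are reading or updating.
type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

view :: Lens s a -> s -> a
view l = getConst . l Const

over :: Lens s a -> (a -> a) -> s -> s
over l f = runIdentity . l (Identity . f)

-- A hand-written lens for the first field of an example record.
data Point = Point { px :: Int, py :: Int } deriving Show

_x :: Lens Point Int
_x f (Point x y) = fmap (\x' -> Point x' y) (f x)

-- view _x (Point 1 2)        == 1
-- over _x (+ 10) (Point 1 2) == Point {px = 11, py = 2}
```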

I agree. But whether that's

I agree. But are all these abstractions a good thing?

Abstraction is good, but it

Abstraction is good, but it has a non-trivial cognitive cost. We can deal with that cost either by "making programmers smarter" or selecting for smarter programmers, or we can create languages/experiences that try to reduce that cost (e.g. program by example).

We should definitely focus more on training up abstract thinking (aka math: not calculating, but abstracting) earlier, and on doing a better job of it.

Don't think so

Everybody can think abstractly; otherwise you would constantly run into doors. After gaining some decent grasp of algebra, I don't think you need to train students much more.

I'd rather have universities focus on things that matter for industry. The hard stuff. Complexity theory, machine learning, differential equations, model-based design.

Maybe we both agree, but I see Haskell more as a harmless hobby for academics than anything else. If you say 'learn more abstract thought', I hear you saying 'learn Category Theory and Haskell', both of which I find examples of how not to move forward. (Though some people should do it.)

"Not running into doors" is

"Not running into doors" is quite different: that is forming abstractions and then applying them automatically. It really isn't abstract thinking! We evolved our ability for abstract thinking with our ability to use speech...which then developed into our ability to do math and so on; we are kind of unique in that regards.

The problem is the way we teach math today: it is more about calculating and memorizing, and not enough about forming abstractions, which is what we really need. Everything else is just a specific application of our ability to abstract, so a decent foundation is needed to complement more specific concepts.

I'm actually quite sympathetic to your point of view. I would totally prefer searching for ways to augment our abstraction abilities (e.g. through concrete examples) rather than require us to think harder while programming! Haskell completely misses the point, focusing on the ability to represent elegant abstractions as opposed to coming up with them in the first place. The lack of debugging capabilities just reinforces that (well-typed programs can't go wrong → Haskell checks to make sure your thinking is correct → not much help in coming up with the correct thinking!).

Here is an interesting website that goes in the direction of what I mean:

http://betterexplained.com/articles/adept-method/

So this is a method for teaching math, but I think we need something like that in the programming environment: we need help in discovering our solutions, not just the ability to write them down when we have them.

It was a bit tongue in cheek

No, I agree people differ in the various manners in which you can think abstractly, but my point is that the bottom line for programmers is somewhere around algebra. Once you pass that hurdle it can be expected that you have all the cognitive faculties it takes to program.

I fully agree with that method you link to.

The thing is, I don't think the solution for software engineers in the future is more abstraction along the lines of category theory. Most programming is mundane. When I think of an expert programmer with a master's thesis, that is someone who is able to create a complex piece of software (a physics simulation), in a complex language (say C++), while juggling performance and other requirements.

So, it's someone who can write, has a decent grasp of mathematics and physics, has a decent grasp of complex languages and the underlying machine model, and can handle a lot of requirements simultaneously.

And I think the future for CS students is in being able to translate mathematical, or physics, models into programs. Where the goal of us language designers is providing languages for that.

Of course, the bulk of programming is office automation. But a lot of that will go into the cloud, and I think it is bachelor-level work. Not for academics.

I simply see no future for abstract things like category theory there. I saw a mention of Dijkstra in some other thread. Well, he was wrong and he did great damage to CS. Now we have all kinds of people chasing logics in the name of correctness nobody cares much about, and people creating 'mathematical' languages and employing category theory to make it even more robust.

It's all nice, but to me a category theorist is someone who failed his differential equations exam. We need the latter kind of people in CS.

The thing is, I don't think

The thing is, I don't think the solution for software engineers in the future is more abstraction along the line of category theory.

Uhm, does anyone actually solve problems with category theory? I'm honestly curious: I thought category theory was just meant to describe monads. There is plenty of theory that isn't useful, or not immediately useful, or useful in only niche cases. And anyway, it is not knowledge of theories that is useful, but the meta-ability to learn, use, and discover theories relevant to your problem.

Limited Time Budget

But you have a limited time budget for educating people, and learning particular cases of math takes a teacher and practice. I'd rather have students learn some physics, like mechanics, than waste their time.

Learning any theories will

Learning any theories will lead to the eventual formation of generalized theory skills, so it's all good.

Well. That's not my

Well. That's not my experience. Nobody gives you time for that.

But anyway. Merry XMas & Best Wishes.

does anyone actually solve

does anyone actually solve problems with category theory [...] I'm honestly curious: I thought category theory was just meant to describe monads.

For my own part, I've found categories useful for conceptual clarity: framing things in terms of objects and arrows, and occasionally working out how the parts fit together into larger structures such as adjunctions, helps provide a sense of the lay of the land, as it were. I've never personally found any use for high-powered theorems about categories; in fact, I've found that the high-powered stuff tends to get in the way, as it's about reasoning about programs rather than working with programs themselves.

Haskell completely misses

Haskell completely misses the point, focusing on the ability to represent elegant abstractions as opposed to coming up with them in the first place.

I believe that "being able to represent elegant abstractions" is one of the main things that PL research actually contributes to language design. A common type of paper starts with some ugly kind of program that people have to write, and ends up with a more elegant version of that kind of program + the needed language support.
As a PLer, I think this kind of thing does have its place, so "misses the point" feels unfair to me.

I started arguing again why I also appreciate your goals, but I think I'd be repeating myself. As also discussed, it's hard to measure progress there, so kudos for trying.

Also, +1 on mathematical teaching.

Since you mention educational methods and "coming up with ideas", I'm afraid that while I advertise "How to Design Programs" all the time, I might not have mentioned that to you. It's an excellent curriculum, and I'm currently teaching with it.

I guess it depends on what

I guess it depends on what you think the point is. It is true that a lot of PL research has been chasing "encode solutions as elegantly as possible", which is a worthy goal but, as Bret Victor's work has suggested, isn't really the holy grail.

I'm not clear on how HtDP innovates at all in this area. Glancing at the book, it doesn't teach you how to solve problems, just how solutions can be encoded (they could maybe focus on and greatly expand part IV). All the same, it is not teaching problem solving. The SICP, on the other hand, goes very clearly in that direction, and the SICM really brings it home. Sussman on the latter:

Classical mechanics is deceptively simple. It is surprisingly easy to get the right answer with fallacious reasoning or without the real understanding. To address this problem Jack Wisdom and I, with help from Hardy Mayer, have written a book with the title of this talk (Structure and Interpretation of Classical Mechanics) and are teaching a class at MIT that uses computational techniques to communicate a deeper understanding of Classical mechanics. We use computational algorithms to express the methods used to analyze dynamical phenomena. Expressing the methods in a computer language forces them to be unambiguous and computationally effective. Formulating a method as a computer-executable program and debugging that program is a powerful exercise in the learning process. Also, once formalized procedurally, a mathematical idea becomes a tool that can be used directly to compute results

Yet, I guess, even here, the thinking is still mostly done outside of the box (aka the computer). It is increasingly being done inside the box with projects like iPython, however (and the scientific computing crowd has done much more in this area than PL has).

re what kind of studies are best

My own hypothesis is that such an archaeology would suggest that a programming language must either (or both) greatly reduce the amount of labor needed for some class of programs, or greatly expand what kinds of program can be written at all. In other words, a programming language will succeed if it either helps to eliminate jobs, helps to create a whole new industry, or some combination of those.

I think being able to produce programs robust against any or many errors impacts every success criterion you list, and protects against many reasons for failure you omit. I think types are indeed one tool to produce such robust programs.

[....]

[...] it's just a question of what kinds of types help produce robust programs at high productivity [...]

Empirical studies will bear this out, so what kind of studies are best?

Convincing proofs that a programming language creates a large labor savings can be made through direct demonstration using challenge problems. For example, the power of AWK was demonstrated through numerous one-liner and near-one-liner examples of programs that otherwise took many tens or hundreds of lines. The power of FORTRAN was demonstrated John Henry style, by doing the work of many assembly language programmers with fewer people in less time.

Similarly, if a language opens up qualitatively new programming capacities, that too can be demonstrated directly. chem(1) was directly shown to eliminate a manual drawing process, a draftsman's job, between chemist and printer. Javascript was directly shown to expand the capability of web pages in ways previously impossible. (Conversely, Java mostly failed to demonstrate any new capability until it grew libraries for so-called middleware applications.)

Proving labor savings or new capabilities of significant scale is thus easily done by demonstration.

Testing how that relates to adoption in the field is an archaeological task that can be carried out by looking at old industry publications, general news publications, school curricula, business records, employment statistics, and so on.

Well, PL papers already

Well, PL papers already often (try to) sketch the essence of such problems. We can argue on how well they manage, but often their (academic) examples convey the essence of such demonstrations.

The problem is that "convincing a reviewer this way" might not correlate with practical usefulness; many claim that AspectJ's academic fad has shown this.

re Well, PL papers already

Well, PL papers already often (try to) sketch the essence of such problems. We can argue on how well they manage, but often their (academic) examples convey the essence of such demonstrations.

The Bell Labs series (C, the shell utils, awk, the text utils, C++ etc.) initially appeared in the literature largely as experience reports (substantially about large, immediate labor savings).

The first FORTRAN paper opens with an experience report: http://www.softwarepreservation.org/projects/FORTRAN/paper/BackusEtAl-FortranAutomaticCodingSystem-1957.pdf (about large, immediate labor savings).

Perl was announced by posting its source and advising existing users of awk and sed to switch, this after co-workers of the author found it useful in practice.

Go was presented as the outcome of studying the use of labor at Google and getting to work on a systems language to rationalize it a bit.

I think that reviewers and academia in general are stymied these days because the low-hanging fruit is largely picked, and new developments in reliable labor savings from new language designs are mostly marginal. (If the labor savings from your breakthrough isn't good enough to kill jobs, why should industry care much?)

That leaves journals with a lot of pages to fill and so I can believe the paper judging practices tend towards the arbitrary, political, and aesthetic.

Robustness

I think one must look to factors other than correctness as criteria for software development. In particular, rapidity of development, training of programmers, training of managers, support for cooperation, success of testing methods.

Both statically type-checked software and other stuff can have bugs. Both need testing. All development processes, both coding and training, cost real money.

Today, an Australian company (Atlassian) listed on US stock exchanges for a huge amount (3 times its estimated value). One analyst said this is because it isn't a startup: it has been making a small profit for years. Atlassian makes tools for programmers.

Their primary implementation language is Python. The founders are CS grads from UNSW.
These guys were reasonably well trained. Why did they choose a dynamically typed language which, in my view, is great for small programs but utterly incapable of supporting anything significant? (Having been involved in several non-trivial Python projects, that's my experience anyhow.)

I like their product

I am personally in the "types are a necessity for large-scale software engineering" camp, but they did manage to create a nice product in Python.

I suspect that if your problem domain doesn't benefit much from the added safety of types, just engineering an application well outdoes any benefits from the type system.

Probably their application is largely a matter of managing simple structured data well. A type system would buy an extra check on that, but the language offerings probably don't outweigh the added functionality of Python.

Science requires an open mind

Given that almost anyone working on programming languages holds some firm beliefs about type systems, I doubt that objective studies on the subject are possible. Scientific research requires an openness to accept whatever the result is. If you are out to prove your point, you will probably succeed, even if it is wrong.

The page under discussion here starts with a clear statement: we need to prove the advantages of type systems. Do you expect something unbiased after that?

We are already heavily

We are already heavily invested in our positions, so only healthy doses of confirmation bias can avoid midlife crises. It helps that really knowing is really hard, so it is easy to brush off finding out what the truth is.

Experience

Before I did lots of programming I did not care about type systems. I first programmed in BASIC and assembler. My first encounter with a type system was C, but I have encountered many others since then. I have since done extensive work in dynamic languages. Whilst I like programming in JavaScript (it's actually a good mix of object and functional if you ignore the bad parts), the lack of types causes performance problems, and you often have to imagine types, and manually make sure you don't augment objects or change the "type" of a variable, or performance suffers. They had to add typed arrays to make numeric performance acceptable. So there are practical, measurable effects. People also make up types when documenting functions. You end up coercing values to int by or-ing them with zero.

All I can say in the end is that types would make it easier for me to program. This is probably due to the way I think about programming. Other people may have a different cognitive approach that suits them better. In the end neither is right: there is probably more than one 'brain architecture' that makes an effective programmer, and those groups will work better with different programming languages. The reason we can't agree is that we are different, and both can be correct for their cognitive pattern.

This leads to the conclusion we will always need a variety of languages to suit the variety of people who program. None is really better than the other, but for each person there is an optimal one.

I was talking more about PL

I was talking more about PL researchers, who have invested heavily in their chosen religion. Very few of us straddle the line between static and dynamic typing. Personally I think both are great and both suck. Dynamic typing is very fluid, provides little resistance, and provides for a very easy-to-understand concrete-type debugging experience (you run the code!). Static typing can provide for earlier feedback, enables code completion, and...makes performance easy.

I don't think people are really that diverse. We have common hardware, common processes, our intrinsic abstraction abilities are about the same. But some of us are Christians and some of us are Buddhists, and others...religion has nothing to do with brain architecture.

"Sapir-Whorf hypothesis" of static types

Because abstract reasoning is known to be hard for most humans, except those people that have the raw talent for abstract math, I conjecture that proficiency with static types correlates with that talent. I'm thinking of the kind of static types that Haskell is (maybe) moving toward: say, Edward Kmett's code.

I called this a "Sapir-Whorf hypothesis" (on a whim) for a meta-level reason: I expect that whether this is "true" or "false" depends on how strictly or weakly you interpret it. For instance, if I take "advanced static type" to mean "use Coq/Agda and prove your program correct", the claim pretty much reduces to the tautological "to be talented at proving you must be talented at proving" :-).

There are, at least, a

There are, at least, a couple of fundamentally different sorts of minds that could be fairly described as having "talent for abstract math". They differ by, as best I've figured it, how well they handle masses of raw data; the ones with poor "rote memory" have a different perception of what simplicity means than the ones with strong rote memory. Likely patron saints of these two types would be, I think, Albert Einstein for low-rote, Leonhard Euler for high-rote. Seems to me you'd get, statistically, different reactions to elaborate type systems from those two sorts of minds.

The page under discussion

The page under discussion here starts with a clear statement: we need to prove the advantages of type systems. Do you expect something unbiased after that?

Actually, the page I linked takes the position that there isn't a clear advantage to type systems. I think there is, though I didn't even imply that in my original post. So what bias do you see exactly?

Let's discuss

During the last few weeks we have had a "Which language is better" discussion http://lambda-the-ultimate.org/node/5277 here, and now we have a "Which typing system is better" one.

Of course all these discussions have not defined the problem correctly, so all solutions are possible.

Now, to continue this trend, shall we start a vi-emacs flame war next? I'm on the vi side.

Meh

I think we're doing pretty well at not flaming, all things considered. It would also seem... challenging... to justify vi-versus-emacs as relevant to LtU. (If it's a dichotomy, I'd take the emacs side; but it bothers me that emacs Lisp uses dynamic scope, which sabotages the great potential of fexprs... Aha! Maybe there's a way to make vi-versus-emacs a PL topic after all. <evil grin>)

Types or languages?

I haven't read the paper yet, but I see from the description that it claims to be about static vs. dynamic languages, but is actually about types. There are languages like ISLisp and R6RS Scheme that are entirely static with the exception of the type system: that is, essentially everything except the dynamic types of the values of variables is known already at compile time. (This is not necessarily true of their implementations, which often allow dynamic redefinition at the REPL, for instance.) This is completely different from such truly dynamic languages as Common Lisp, Python, and Ruby, where everything can be redefined at runtime (and even Common Lisp has both static structures and dynamic classes). And there is nothing in principle to prohibit basically dynamic languages with static typing, though I can't think of any such language offhand. I wonder how much of the claimed advantages of static/dynamic typing are really advantages of static/dynamic languages in a broader sense.

Statically Typed Dynamic Languages

Can I turn that around and talk about dynamic typing in statically typed languages, which is effectively runtime polymorphism? In Haskell this is achieved with existential types, in C++ with virtual methods. Personally I think the distinction between static polymorphism and dynamic polymorphism is important and something that should be expressed in the syntax of the language. In both Haskell and C++ you can say either "I know exactly the type of this, it is an Int", which is static typing, or "I don't know the type of this, but I know it implements this interface", which is existential typing with type classes, or virtual methods implementing an abstract base class. Dynamic languages can be seen as having a type class (or abstract base class) for each method, allowing duck typing.
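
A minimal Haskell sketch of that second case (Showable and describe are illustrative names, not from any library): the existential wrapper says "some value of an unknown type that implements Show", which is about as close as a statically typed program gets to "I don't know the type of this":

{-# LANGUAGE ExistentialQuantification #-}

-- "I don't know the type, but it implements this interface":
-- an existential box constrained by a type class.
data Showable = forall a. Show a => Showable a

describe :: Showable -> String
describe (Showable x) = show x   -- only the Show interface is usable here

main :: IO ()
main = mapM_ (putStrLn . describe)
             [Showable (42 :: Int), Showable "hello", Showable True]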

Dynamic and static typing are separate things

Sure, you can have mostly dynamic typing with one static type, or mostly static typing with one dynamic type per static type, or anything in between. But this is still all about types. I'm trying to get a discussion going on dynamic vs. static languages independent of types.

What is a dynamic language?

If you change a class you change its type. One of the things that kills JavaScript performance is augmenting objects, as the compiler has to create a new hidden class.

Runtime polymorphism is allowing objects of multiple classes to be stored in the same location. Augmenting an object will change its class, so this is runtime polymorphism.

Inferring types in Javascript

What's preventing JavaScript from inferring types like Haskell and company do? If a type doesn't change at runtime, there's an opportunity for a speed optimization. Well, I'm waiting for this inference mechanism to boost speed to somewhere in the neighborhood of asm.js.

Have you seen the number of

Have you seen the number of type systems you can choose from in Haskell?

No, how many are there?

No, how many are there? I thought there was just the one.

It uses some variant of

It uses some variant of System F where you can twiddle a lot of parameters. But good luck bolting that on top of JavaScript; it doesn't handle mutation.

Monads

Haskell provides mutation in the state and IO monads. The simplest way to type JS would be to have everything in the IO monad.
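
For concreteness, a minimal sketch of IO-monad mutation via an IORef (the counter is purely illustrative):

import Data.IORef

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)   -- allocate a mutable cell
  writeIORef counter 10            -- plain assignment
  modifyIORef counter (+ 1)        -- in-place update
  readIORef counter >>= print      -- prints 11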

That's probably a non-solution

It's a fundamental problem with universal quantification. I see no reason to believe they would somehow have magically solved that problem by simply moving mutation to type classes.

Monads and Mutation

So, the State monad and the IO monad provide mutation in Haskell; why would it not work?

Unsure

I guess it's somewhere here: you can safely thread along a reference of known type in Haskell. The problem is that if you tried to bolt that on top of JavaScript, you couldn't thread along a reference whose type isn't known from context. (In another approach, you'd blow the monad up into something like a database of all reference types used in a program. Impossible.)

If it were that simple, the problem would have been solved nicely in ML languages ages ago. It isn't; they need a value restriction. You're going to inherit that problem when typing a mutable language, no matter what.

Exactly this.

Exactly this. I would also add that it's not really a coincidence that mutation-embracing statically typed languages also have explicit support for nominal subtyping; it just goes hand in hand with assignment.

That problem requires unsafePerformIO

That problem is documented for Haskell, but triggering it requires mutable references and unsafePerformIO. The docs do claim it's the same problem that requires the value restriction in ML, and that you can't trigger it without unsafePerformIO:
https://hackage.haskell.org/package/base-4.8.1.0/docs/System-IO-Unsafe.html

To understand why: you can create a polymorphic computation that will return a reference (with newIORef, you get forall a. IO (IORef a)), but
you can't get to IO (forall a. IORef a), so without unsafePerformIO you can't create an actual polymorphic reference.
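
Roughly the example from the linked docs (modulo naming): once unsafePerformIO smuggles the reference out of IO, it really is polymorphic, and the type system can be broken:

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

{-# NOINLINE cell #-}
cell :: IORef [a]                  -- a genuinely polymorphic reference
cell = unsafePerformIO (newIORef [])

main :: IO ()
main = do
  writeIORef cell [42 :: Int]      -- instantiated at a = Int
  xs <- readIORef cell
  print (xs :: String)             -- instantiated at a = Char: the Int is reinterpreted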

But Javascript is mutable

The question was "Why can't we bolt a Haskell type system on top of Javascript?"

JavaScript is mutable, Haskell is immutable. Your answer is: Haskell has a correct type system for an immutable language. That isn't, well, not really, the question.

I still don't trust Haskell's type system though. But I am a sceptic. Has there ever been a proof of correctness?

Existential or Dependent.

One solution is to make all mutable values existential, hence they can only be accessed by an interface on the value.

The other is of course dependent types, which might be the cleaner solution, as you can later recover the type of the value stored in the mutable at runtime.

Subtyping.

Subtyping.

If you can isolate variables

If you can isolate variables that do not depend on subtyping, there is a way out for optimizing.
Edit: but there is the "eval" function, which messes up all reasoning about types, so I think it is hard to see a solution.

OO languages (and JS is OO,

OO languages (and JS is OO, no doubt about it) depend heavily on subsumption. Even mutation involves subsumption (x := y is the same as y <: x, since assignment implies that any value of y could be a value of x, but not the other way around).

So even if we ignore eval, H&M is just not going to work very well.

JS is functional.

That's odd, because it has first-class functions and closures. You can define a monad (the 'Promise' is actually a monad with return = Promise.resolve and bind = then). I don't see any subsumption. There is no class hierarchy or subtyping. Assignment replaces (changes) the type of the variable. Most JS programmers ignore eval; it should probably be removed from the standard, and a lambda abstraction used instead.

All languages have first

All languages have first-class functions and closures these days. You can also define monads in them too, not that anyone does that. I guess you could use JavaScript as a functional language and not rely on subsumption. Just don't interface with any libraries or DOM manipulation, and you are set!

When people say they program in JavaScript, it's not just about a language and a compiler.

V8

The V8 compiler docs specifically talk about variables changing type, and about functions being monomorphic and polymorphic. They do not mention subsumption; see: http://www.html5rocks.com/en/tutorials/speed/v8/.
When I say I program in JavaScript, I know exactly what I mean. Googling subsumption and JavaScript returns no relevant links. I would go as far as to say there is no subsumption in JavaScript. Do you have an example illustrating subsumption that could not be typed in a Haskell-like type system? I think you are using the wrong typing model: JS is a prototype-based, dynamically typed language, and subsumption is something specific to classical OO with class inheritance hierarchies, which JS does not have.

Everyone is starting to use Promises in JavaScript, all the official HTML5 APIs are moving this way, so soon all async APIs in JS will be monadic.

Subsumption is such a

Subsumption is such a primitive concept that languages are expected to support it. It is only the odd purely functional language like Haskell that manages to really avoid it, and the less pure functional languages support it, but without very good type inference.

A := B IS subsumption; it entails subtyping. If you try to solve it via unification, you will find that you can no longer type check most programs. Subtyping via classic OO inheritance is just another form of it, but it isn't even the most common kind (mutable assignment is).

Incidentally, type inference for languages with subtyping is hard for the same reason that pointer analysis is hard... they are basically the same thing. Semi-unification is a PITA.

A := B

A := B is assignment. If A had type 'X' before the assignment, it has type 'Y' after the assignment; the type changes dynamically at runtime. We are not limited to a static view of the type of A that has to remain the same (that is static typing). There is no subsumption unless you come from the limited viewpoint of a class hierarchy with superclasses and subclasses and static typing. If you do not have a hierarchical class relationship, there is nothing to subsume, and if you don't try to represent the dynamic type of 'X' with a single static type you don't have subsumption either. So if you lack either one of those properties, you don't have subsumption, and JS has neither. Just because you have a hammer doesn't mean everything is a nail.

Unification has nothing to do with this, as that is a static type system thing. JavaScript has no static type system, so there is no unification in the (non) type system either.

It is exactly subsumption.

It is exactly subsumption. If x is a value in B, then after the assignment it is also a value in A. If y is a value in A, then after the assignment we know nothing more about it (it is not necessarily a value in B!). Even covariance properly applies to fields (fields have the same typing behavior as type parameters; covariant usage means only reading, not writing, the field). This flow is exactly what subtyping models.

It isn't a coincidence that the value flow properties between assignment and subtyping are exactly the same, and why HM makes sense for pure functional languages (no assignment, no subsumption, unify!).

Unification has everything to do with it if you are going to apply reliable type inference to JS programs efficiently via Hindley-Milner. Of course, that isn't the only way to do type inference, but the alternatives are largely unexplored territory.

Static vs Dynamic

You are only thinking of a static type system where the type of 'A' has to represent all the types that the variable can hold at runtime. With dynamic types this is simply not the case. If I had to statically type JavaScript, it seems more like union types. For example:

function f(a) {
   if (a.length) {
      console.log('array');
   } else if (a|0) {
      console.log('integer');
   }
}
var x = [1,2,3];
f(x);
x = 3;
f(x);

Where the type of x is 'Array[Int] \/ Int'.

But of course in a dynamic language like JS, x is an Array, and then it is an Integer. There needs to be no relation between Array and Integer; 'x' simply takes these simple monomorphic types at different points in time. No subtyping, no subsumption.
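
If one did want to write that union down, a minimal Haskell sketch (JsVal and its constructors are purely illustrative names) would be an ordinary tagged sum, with the dispatch that the JS code does dynamically becoming a pattern match:

data JsVal = JsArray [Int] | JsInt Int

f :: JsVal -> IO ()
f (JsArray _) = putStrLn "array"
f (JsInt _)   = putStrLn "integer"

main :: IO ()
main = do
  f (JsArray [1, 2, 3])   -- x = [1,2,3]; f(x)
  f (JsInt 3)             -- x = 3;       f(x)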

Generally we can have a heterogeneous array, and we don't care what the contents are; they can be cats, or any other animals. All we care about is whether they implement the method we call (duck typing). No contra- or covariance, just simply: does it implement method 'X'? In some contexts we can treat an array of animals as cats. E.g.:

function Cat() {
    this.meow = function meow() {
        console.log('meow');
    };
}

function Dog() {
    this.woof = function woof() {
        console.log('woof');
    };
}

var animals = {
    field: [new Cat, new Dog]
};

function stroke_cats(cats) {
    cats.field.forEach(function(cat) {
        if (cat.meow) {  // duck-type check: only call meow if this object has it
            cat.meow();
        }
    });
}

stroke_cats(animals);

That could happen, and there

That could happen, and there would be no good type to be found. Note that even if you rely on structural typing, value flow is still important if you want to be static; there is still technically subsumption unless you want to deal with assignment via unification. But in reality, JavaScript is a platform for using libraries designed to be used in a certain way, which is why common optimizations that involve implicitly uncovering nominal classes (like animal) work at all. You can say JavaScript is duck typed, but you'll find tons of libraries that just assume nominally typed values (they won't work unless the value is an instance of an expected class).

You could apply whatever type system you wanted on top of JavaScript and just live with the consequences... but if those consequences mean not using your favorite libraries, it isn't gonna fly. So if you want to support "JavaScript" as it is used today, a Hindley-Milner type system backed up with some existential quantification probably won't work. Better to go design a new language and ecosystem that conform to the desired typing ideology and then plug it in via transpilation and interop.

Duck Typing and Object Prototypes.

The reality is JavaScript is duck typed, like Python. Some people may make certain assumptions, but they will come unstuck. From experience, if you don't check a property before accessing it you are asking for runtime errors. It may be that people do not care about writing reliable programs in JavaScript, but for me that is important, so defensive programming is necessary; hence you have to assume duck typing for any object passed to your code from elsewhere. If other people are not as rigorous, you have to do the checks for them before calling their code (which you would not have to do if they wrote their code with the correct assumptions). Trying to program JavaScript like it is Java/C# is a mistake. Personally I think adding 'classes' to the standard was a mistake; people should instead embrace prototype-based objects, as they are a core feature of JavaScript. I like this six-part post on JS: http://www.walkercoderanger.com/blog/2014/02/javascript-minefield/ and the second part, on TypeScript, explains quite well what is wrong with trying to treat JS as a Java/C#-style OO language.

In the cat/dog/animals example, 'cat'ness can be modelled as a type-class nicely, so that stroke-cat would have the type '(exists a . Catish a) => [a] -> IO ()'. Inferring type classes does not seem to be a problem. For the first example you would infer a type 'Int | Array Int', which would have to refer to a boxed value.
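
A hedged sketch of that encoding (Catish, Animal and the instances are illustrative names, not an established API). Note that in this version a Dog simply cannot be put into the list, so the runtime if (cat.meow) check disappears:

{-# LANGUAGE ExistentialQuantification #-}

class Catish a where
  meow :: a -> IO ()

data Cat = Cat
instance Catish Cat where
  meow _ = putStrLn "meow"

data Dog = Dog   -- no Catish instance, so a Dog can't go in the list below

data Animal = forall a. Catish a => Animal a   -- "something catish, type unknown"

strokeCats :: [Animal] -> IO ()
strokeCats = mapM_ (\(Animal a) -> meow a)

main :: IO ()
main = strokeCats [Animal Cat]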

When you build an optimizing

When you build an optimizing compiler, you don't look at the code that could be written in a language; you look at the code that is written in the language. Likewise, when you retrofit a type system on a dynamic language, you try to capture the typing properties of existing code. And when you evolve the language... ditto.

Classes were added to JavaScript because people were using them anyway. TypeScript is able to get a lot of mileage with a class-based type system augmenting existing libraries. V8 and the other optimizers find it very productive to implicitly infer classes for objects. The fact is that programmers were using JavaScript like Java/C# anyway, probably because they weren't provided much direction or training in using it as a prototype language. And they most definitely weren't using JavaScript like it was Haskell!

Given that the language didn't have classes, you then have to be very conservative just in case people aren't using them. V8 can handle implied classes changing, with a significant performance hit; TypeScript is only optionally typed.

Promises again

With promises becoming the de facto way of doing asynchronous programming in JavaScript, more people are using it like Haskell (monads) than you think. There are also plenty of people, like the author of the article I linked to (and Douglas Crockford), who seem to agree with me that treating JS like Java/C# is a bad thing. There are plenty of libraries written in a functional or prototypical way. If you only see classes and TypeScript, then you are only looking at a part of the JS community (perhaps a Microsoft-oriented part?). The reason people want to program like Java/C# is simply the number of people who already know those languages. They have a hammer and see JS as another nail. It's a bit like people who try to program Haskell like Java/C# and get frustrated that it doesn't have objects, or try to implement them. Adding classes to JS to make these people happy has made it harder to define what JavaScript actually is. People should have been more confident in defending the JS object model, instead of turning the language into a bad imitation of Java/C#.

I will agree that performance is improved by keeping object properties static, but I don't think TypeScript is the answer.

Personally I have moved away from having object methods, and use objects as records and functions as modules, so that I can store state directly in IndexedDB and send it to/from a server. IndexedDB has a functional API using promises, and this is apparently the way future browser APIs will be going.

So you mean...when

So you mean...when JavaScript programmers started using promises, they stopped using mutable assignment?

JavaScript has a prototype object model in the spirit of, maybe... Self. Self's object model is heavily mutable, obviously. I don't think you can get any farther away from Haskell than that! And I think, for the most part, prototypes have failed for very good reasons, which is why almost everyone shoehorns their own class pattern on top of them.

The reason people want to program in JavaScript is just web. It has nothing to do with the language itself. If they could have used Java or C# or Python or Ruby instead, many of them would have.

BTW, how does Haskell handle type inference for a mutable variable represented in a monad? Does it just unify with each mutation site? Or would some magic happen with row types?

I like javascript

I like JavaScript more than Java/C# because it is more functional. I think the reason people try and use it like Java/C# is because they don't know any better, and have not been significantly exposed to the functional way of thinking.

Promises do mean using less mutability, because the then/bind operator passes the result of the promise into the first argument of the function chained with 'then', resulting in a pattern exactly like Haskell's IO monad. Also, in Haskell you can use IORefs for mutability freely within the IO monad, so there really is no difference except syntax. You have an 'IORef a' type from which you can monadically get and put values. There are no row types in Haskell unless you do some type class stuff, but I don't see why you would need them. You can have an existential type in an IORef for mutable runtime polymorphism.

JavaScript and Python are practically the same language (excepting surface syntax). They have been adding features from each other for ages, Python taking first-class functions and lambdas from JS, and JS taking yield and generators from Python. The only remaining difference is the classical vs prototypical object model. JavaScript used the prototypical model because of the performance improvement: looking up a method on an object takes one less indirection (methods can be on the object itself, so you don't have to follow a class pointer to get to them). Because of this, Python will never be as fast as JS. For performance you want to use the prototype model statically. I am not a fan of Ruby, I use Java/C# when I have to, I quite like Python, but I actually prefer JS, although I always use strict mode.

Existential Type?

You can have an existential type in an IORef for mutable runtime polymorphism.

Riiight. Sorry, my understanding is that with mutable runtime polymorphism or some form of subtyping (even in the form of the specialization of universal quantifiers) you always immediately run into co-/contravariance issues.

My understanding was that Haskell simply threads one specific known type along and doesn't deal with subtyping.

This is the canonical ML example:

(* the canonical unsound program that the value restriction rejects:
   without it, r2 would read back an int from a cell that r1 wrote a string into *)
val r: 'a option ref = ref NONE
val r1: string option ref = r
val r2: int option ref = r
val () = r1 := SOME "foo"
val v: int = valOf (!r2)

I don't know Haskell that well, but now I am interested. How are constructions like this prevented in Haskell? Or, is the type system simply broken in the case of mutability?

type error

In Haskell, translating your code would return a cryptic type error about escaping skolem constants: the type `a` is not accessible outside the construction of `r`. To represent a ∀a constraint on a type, in Haskell you'd use something like:

data IORefR = forall a. IORefR (IORef (Maybe a))   -- needs ExistentialQuantification; a newtype can't wrap an existential

This is roughly the same as `IORef Bool` for the observations you can make on it, except you can only set it to Nothing. Which might be a useful feature (e.g. the ability to 'reset' an IORef that someone else is using, or to model a memory manager for cached computations, etc.).

Covariance and Invariance

The Skolem error hints that you try to specialize a type in a context where you cannot. But the question, apart from the ML code, is more fundamental than that.

Pure languages normally assume covariance for containers. Covariance breaks with mutable containers where the simplest solution is to assume invariance.

If they assume covariance somewhere for IORef types you should be able to break the type system. Unless they treat IORef types specially, I don't see how you're going to solve the covariance/invariance problem.

(You trivially usually have covariance in typed pure languages since list int <: list a.)

Haskell doesn't have subtyping

You are correct in your earlier understanding that Haskell simply threads along one type. Though, this type can be existential and model an interface or abstract base class easily enough:

data CatRef = forall a. Cat a => CatRef (IORef a)

Of course, wise programmers would separate the mutability concern and write:

data CatVal = forall a. Cat a => CatVal a
type CatRef = IORef CatVal

I can't say I've ever missed or even felt the absence of subtyping in Haskell. Subtyping is just one means to the end of generic, reusable code. Modeling generic types is another means to that end.

An interesting related point

An interesting related point which I was previously unaware of is that types are still co/contravariant even without subtyping. Co/contravariance comes from the existence of map/comap functions. A type like List is covariant because there exists a function map : (a -> b) -> List a -> List b. A type like Predicate t = t -> Bool is contravariant because there exists a function comap : (a -> b) -> Predicate b -> Predicate a.

Subtyping is the same thing except the (a -> b) function is a projection of a subtype a into a supertype b.
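
A small Haskell sketch of that point, with the definitions written out by hand rather than taken from any library:

newtype Predicate t = Predicate { runPredicate :: t -> Bool }

-- List is covariant: an (a -> b) is pushed forward over the contents.
mapList :: (a -> b) -> [a] -> [b]
mapList = map

-- Predicate is contravariant: an (a -> b) can only be pulled backward.
comap :: (a -> b) -> Predicate b -> Predicate a
comap f (Predicate p) = Predicate (p . f)

(The contravariant package spells comap as contramap, if you want the library version.)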

I think you need more words


map : (a -> b) -> List a -> List b
map f as = []   -- has the right type, but is plainly not the map you wanted

Indeed, you need two

Indeed, you need two laws:

map id = id
map f . map g = map (f . g)

Covariance and the map function.

I don't get this. "map :: (a -> b) -> [a] -> [b]" involves no co- or contravariance as far as I can see. 'a' is a single type, and 'b' is a single type. The types of 'a' and 'b' must be statically determined at compile time, so the above is simply a template where you substitute the types at compile time. So after substitution you are left with (for example) "(Int -> Float) -> [Int] -> [Float]"; it's a simple map from one type to the other.

It's the function itself is

It's not the function itself that is co- or contravariant; it's the List type. The function is a witness to the co- or contravariance of the type constructor. A type constructor F is covariant if there is a map : (a -> b) -> F a -> F b, contravariant if comap : (a -> b) -> F b -> F a, and invariant if invmap : (a -> b, b -> a) -> (F a -> F b, F b -> F a).

View a function `a -> b` as a generalized "upcast" from `a` to `b`. Then a type like (a -> b) -> (F a -> F b) is saying that if `a` can be upcast to `b` then `F a` can be upcast to `F b`.

Covariant functor

List is a covariant functor - it has a

map: (F[A], (A -> B)) -> (F[A] -> F[B])

Predicate is a contravariant functor - it has a

comap: (F[A], (A -> B)) -> (F[B] -> F[A])

Using <: to denote subtyping

Covariance in subtyping says that

(F[A], A <: B) -> (F[A] <: F[B])

Contravariance in subtyping says that

(F[B], A <: B) -> (F[B] <: F[A])

E.g:

List[Manager] -> Manager <: Employee -> List[Manager] <: List[Employee]

in other words: "if you have a list of managers, you have a list of employees, but not the reverse"

IsOverpaid?[Employee] -> Manager <: Employee -> IsOverpaid?[Employee] <: IsOverpaid?[Manager]

in other words: "if you can determine if an employee is overpaid, you can determine a manager is overpaid, but not the reverse"

So what Jules is saying is that the covariant subtyping rule is just:

map <:

and the contravariant one:

comap <:

Injective and projective

Isn't this slightly misusing co- and contravariant, in the sense that the list only contains a single type, so there is no variance (of the types in the list)? If anything this is a semantic edge case where we are saying a single type would behave co-/contravariantly if we extended to a heterogeneous list setting. Technically you may include the homogeneous case in either the co- or contravariant bucket, but it is degenerate. Projective and injective would seem a better description of the functors.

As to the subtyping relation, A and B are different types; if A is a String and B is an Int, there is no subtype relation. The mistake is assuming that the list holds more than one type. In HM type systems it does not. As A and B have no subtyping relation, neither do the respective lists.

No misuse, just using the

No misuse, just using the only definition of co- and contravariance for functors in category theory.
Mathematically, subtyping is just a special case as the others said.

Can vs Must

I agree you can think of it as covariance and contravariance, but I disagree that those properties exist independent of subtyping. If a type system has no definition of subtype, there is no co-/contravariance, hidden or otherwise. By introducing those terms you are actually redefining the type system to include subtyping. In other words, I disagree with Jules Jacobs' "interesting related point".

What worries me..

Is that a type system may not explicitly denote, or use, a subtype relation but that subtype relation may exist in the 'mathematical' sense.

Colloquially, this is about having a basket of apples, which is a basket of things, and someone else decides to put an orange in that basket, which means you end up eating some pretty weird 'apple.'

I personally more or less assume a subtyping relation exists in the mathematical sense in Haskell; on closer inspection it will turn out that Haskell assumes covariance for the arguments of containers, and there probably exists some corner case where the IORef monad breaks.

But maybe I am too much of a pessimist.

But an optimist is just another badly informed pessimist.

So, there's that too.

basket of apples

You could take `IORef [Apple]` and observe or interact with it in the context of `(Fruit a) => IORef [a]`. In this context you could eat some apples, or duplicate them. But that doesn't mean you could stick an orange into the basket. You'd get a type error saying: I can't prove this orange is an 'a'.

The type of an IORef is invariant. So is the type of a list. Haskell doesn't do subtyping. It does type unification. The closest we have to a "mathematical subtyping" relationship is the ability to work with an unknown type through the limited lens of zero or more typeclasses.
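
To make the apples-and-oranges point concrete, a sketch with illustrative names (Fruit, Apple and Orange are not from any library):

import Data.IORef

class Fruit a where
  calories :: a -> Int

data Apple  = Apple
data Orange = Orange

instance Fruit Apple  where calories _ = 95
instance Fruit Orange where calories _ = 45

-- Fine through the typeclass lens: works on a basket of any one fruit.
tally :: Fruit a => IORef [a] -> IO Int
tally basket = sum . map calories <$> readIORef basket

eatOne :: Fruit a => IORef [a] -> IO ()
eatOne basket = modifyIORef basket (drop 1)

-- Rejected by the type checker: Orange is not the caller's 'a'.
-- addOrange :: Fruit a => IORef [a] -> IO ()
-- addOrange basket = modifyIORef basket (Orange :)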

A := B IS subsumption?

This interpretation doesn't seem especially reasonable in the case of *single assignment*. If variable A is in a context subject to SSA transforms, or is a field in a Promise or other model of single-assignment variables, for example, there is no reason to assume any form of subsumption.

SSA is basically

SSA is basically functionalizing assignment in a local variable context. By giving each assignment site its own variable, you can avoid having to merge their types (beyond phi merges, of course).

Globally speaking, if a variable only has one assignment over time, it really is not mutable. If you can create an SSA representation over time, then you've basically forked consumers of the variable into their own independent things, which isn't really viable when a variable is truly mutable.

And in the case of collections, there isn't even one assignment at a time. E.g. set.add(a) and set.add(b) can be modelled as set.E := a and set.E := b. If you have one of set.E, it could be a or b.

Types are inferred in secret.

The JavaScript compiler infers the hidden classes and tries to infer types for all variables. If you look at V8 optimisation logs, you can see the bailouts and deoptimisations that occur when assumptions made by inference fail. If inference fails, you are back to the slow way of doing things. The problem is every compiler/interpreter does it differently; if it were visible in the syntax or type system, you could reason about code performance using the static typing. You could recover the benefits of dynamic typing through some mechanism for runtime polymorphism (existential types are one possibility).

Everyone does it in secret

Everyone does it in secret because it isn't reliable, and if it isn't reliable, the programmer can't really reason about it or rely on the optimization always occurring. They want to keep it best-effort because they can't do any better.

Again, people want to program in JavaScript, not some variant of Haskell encoded in JavaScript that is on its own in terms of libraries and code reuse. If they wanted the latter, why not just go with Elm at that point?

TypeScipt

The reason I program in JavaScript, not Elm, TypeScript, or PureScript, is that it is what the browsers implement, and I prefer the direct approach: dealing with browser variations and writing the most performant code seems easier when writing JS directly, and I want access to the available libraries. Also, I have to make sure that other developers who might work on the code in the future can understand and modify it.

What you can say is that keeping things static in JS improves performance, so maybe writing in TypeScript and compiling to JS is the way to go. Certainly major frameworks like Angular2 are going that way, so if you want to use those tools, you are going to have to get used to static typing.

PureScript is like an eager version of Haskell, which sounds cool, but I am concerned that JS performance is optimised for mutation, and adding a pure layer on top is going to cause performance problems, as well as limiting library choices (or having to implement foreign function imports for libraries). I have resisted adding a compilation stage to authoring JS, but it may be that TypeScript is the way to go. That way the JS generated will be statically typed, and the JS compiler should be able to infer all the (hidden) static classes at runtime, as the TypeScript compiler has made sure the types are static.

I realise there is some inconsistency here, in that I prefer compiled languages in general, but resist adding a compilation stage to web development. I guess there is something about the compile target being another high-level language rather than machine code that worries me in some way. I think I will have to force myself to use one of the languages that target JavaScript and see if I feel the same after using it for a while.

I think that Javascript

I think that JavaScript should have an alternative in the form of a statically typed bytecode inside browsers. Like an assembler, but of the kind that could be run inside a browser (well, maybe even assembler itself, sandboxed, but with an interface to the HTML DOM).

But it was always hard to round up a new standard. Maybe in the future the W3 consortium should consider embracing a kind of bytecode in HTML X. Who wouldn't like a number of different languages to be supported, while avoiding JavaScript with its performance and other issues?

Languages like Elm are using JavaScript as their bytecode. But JavaScript is a high-level language that drags in unneeded complexity, which results in a lot of issues. Nevertheless, we are witnessing the rapidly growing syntax and semantics of all languages, including JavaScript. And who has to embrace it over and over again? The almighty W3 consortium and no one else. Programmers want to be free, without the W3C. With a generally accepted bytecode, programmers would be free to build any programming standard, extended in whatever way, without worrying about non-standard intermediate compilation to JavaScript, because "compiling to web bytecode" would be the accepted standard.

Edit: centralization was never a good idea, and that is exactly what the W3C does. Give us some bytecode; we want to be free.

Java all over again?

Given a bytecode, are we not back to where we were with Java? I think the DOM and CSS are important to the success of JavaScript, allowing web design to merge with application design. Perhaps with a bytecode with the same scope as JavaScript, limited to manipulating the web page and the other standardised APIs provided by the web browser, it might be different.

I wonder what you mean by a typed byte-code? Are you proposing a tagged memory model? It would make arrays slow.

When I think of byte-code,

Don't be surprised, but when I think of byte-code, I'd want it to be as similar to assembler as possible, given the variety of processors on the market. It should be some generalization of Intel, ARM and other assembly languages.

So, why go that low-level? There are two reasons:
(1) Speed. That way an AOT compiler should produce fast native code from the byte-code.
(2) Generality. What do we know about the programming languages of the future? Right now we know about imperative, OOP, FP and maybe some other (still unknown to me) paradigms. If we restrict our byte-code to just one paradigm (like Java byte-code did), we can't efficiently implement other paradigms on top of it, given the speed, size and memory consumption of programs. I'd like to leave all options open when it comes to enabling existing or yet unknown programming paradigms. I think that an assembler-like byte-code is the most efficient way to implement all the things we do and don't yet know about.

Typed byte-code? Surely we want our programs and libraries to be modular and reusable. To achieve that, the byte-code should implement some type system for its modules. So some starting set of rules should exist for sharing data between the DOM and modules, and between the modules themselves. We could call it a typed assembler.

But if I think a bit further, what is the DOM? It is another high-level construct, again subject to big changes in the future of browsers, like JavaScript is. What I have in mind is to embrace a minimal DOM that has the smallest number of elements (starting with a canvas-like module and some basic event system). Higher-level elements like textboxes, command buttons and tables should be modularly programmed and compiled to byte-code, and since the byte-code would be very, very low-level, it should be (with the help of an AOT compiler) reasonably fast.

Now, this last idea of a low-level DOM might seem somewhat strange in the first place, but again, why should we restrict ourselves to HTML? Why shouldn't we open up the possibility of having PDF or ODF (or anything else) rendered inside the browser through basic low-level elements, under the condition that all of them run reasonably fast?

So the idea is to have the smallest number of building elements for programming and the DOM, and to occasionally (when new versions are available) download high-level building elements in the form of byte-code from specialized sites, maintained by the W3C or anyone else we want. Upon downloading a web page, the browser should check whether its cache has all the libraries needed to show the page, and if it doesn't, it should update them from the relevant sites.

I'm not sure how to realize everything said here, but I think it would be a good way to go with browsers and web technologies. And in the end, that way we could choose between static and dynamic languages when targeting a web audience.

Typed Assembly Language

I can understand typed assembly language, but the type annotations get stripped out when it is assembled to machine code. I do not understand what typed machine code would be. Consider a typical RISC instruction (add.int32 r1 r2 r3), which adds the signed 32-bit ints in r1 and r2, returning the result in r3. How does this instruction know whether the values in r1 and r2 are integers? They could have been loaded from floating-point values. The only way to do this is to have type tags in each register encoding the primitive type. This means an instruction like (load.int32 0x1000 r1) needs to check that the data at memory address 0x1000 is an int32 before transferring it. This means that every memory address needs to be tagged with its type, and this in turn means that arrays become a problem.

Actually, there is a partial solution to this I have thought of, but I have not seen it in any real machine language: make an array a primitive type, addressable only using indexed addressing. This way there can be a single type tag at the beginning, so something like (loadarray.int32 0x0(r0) r1), which, although slower than a direct load, is what most compiled languages would generate anyway. Of course that would preclude pointers directly to array elements like you can have in 'C', so it is definitely not a simple thing to design.

Typing so static it's invisible

I've skimmed papers on typed assembly language, typed machine language, and proof-carrying code a few times, and it sounds like you're on track to collide with ivanvodisek's meaning:

  • If you have a type tag on the array, you don't necessarily need it on the array elements.
  • If you have a type annotation on the code that interacts with some array, you don't necessarily need a type tag on the array. At this point, we see type annotations interspersed with code, and we can plainly recognize the program as being statically typed.
  • If you have a single type annotation on the whole program, you don't necessarily need type annotations scattered throughout the code (for some type systems, anyway). I think this is the format ivanvodisek has in mind.
  • If you have a type tag that resides in whatever system accesses the program, you don't necessarily need a type tag on the program itself. At this point, it might be hard to recognize that the program has a static typing at all, unless you already know what type it is.

I think most papers on typed machine language assume the last point of view: The machine language itself doesn't include any type annotations, but somehow the host knows what type it's supposed to be anyway.

A good illustration of this is TAS/Load (described in Hicks, Weirich, and Crary's "Safe and Flexible Dynamic Linking of Native Code"). When a TAS/Load program is compiled, its types are stripped off like you say. Nevertheless, any host program that calls load must somehow pass in a representation of the type expected.

Lying

The type tag on the whole program might be lying. Unless the CPU type checks every individual instruction with tags everywhere (the CPU operating as a typed interpreter), all you are doing is a Java-like byte-code verifier - which is really no different from type erasure and untyped real machine code.

Having written more than one

Having written more than one Java bytecode verifier... uhm... what? The old verifiers use DFA as a form of abstract interpretation; type tags that lie will cause verification errors. The new verifiers do instruction-by-instruction type checking directly (a classmate wrote this at Sun with Gilad).

Machine code

Because the verifier is separate from the execution, you can trick it into accepting code, as happened in practice with Java several times.

I was thinking more of hardware implementations, though: if the CPU can execute code without types, how does it know the code conforms to the type annotations? In effect the CPU would not care about type annotations at all, and you just have untyped machine code. I am not thinking about the specific example of a web browser, but a more general platform with typed machine code.

Trusting the typechecker

I was more thinking of hardware implementations though, if the CPU can execute code without types, how does it know the code conforms to the type annotations?

As far as I understand TAS/Load, the typechecker is part of the trusted computing base. So is load. A host program can't execute code without typechecking it, because if it did, the host program itself wouldn't have typechecked in the first place.

"The CPU can" do a lot of things that the trusted computing base can't do. Not all capabilities of the hardware need to be trusted -- just the ones that are in the formalism, and those can still be a broad and sufficient set of operations for programming. If for some reason I don't trust my hardware's implementation of something that is part of the trusted computing base, then wouldn't you know it, the trusted computing base... isn't. Misplaced trust can be betrayed, and that's how it goes.

machine code vs assembly language

Indeed, I am aware of typed assembly language and how it works; I was contrasting this with typed machine code (i.e. the code the machine actually executes). Perhaps I made a bit of a leap from typed byte-code to typed machine code, but if you think of a byte code as what the 'virtual CPU' actually executes, then there is little difference.

I would say the original comment above should have said "verified byte code" not "typed byte code" as I see those are two very different things.

The verifier can have bugs,

The verifier can have bugs; I found some. Most of the bugs weren't related to DFA or type checking at all, but were of the stupid "didn't check the index bound correctly" variety.

The CPU will just assume the instructions it executes are well formed, relying on trapping, VM hardware, and process boundaries if they aren't. It would be too expensive to perform type checking during execution at the CPU level; even if you provide that abstraction, the checking is going to be done in software.

arithmetic duck

Can't you write a universal Turing machine using only integer arithmetic? Type checking couldn't account for safety then. Or posed as a question: if it could, how would it guarantee it?

You just described an

You just described a unityped language where the type is Int; a type system will vacuously prove that you'll never get a type error, only an integer or a non-type runtime error.
If you care, you can also describe dynamically typed languages as unityped, with type "Any" (as some like), and you'll get the corresponding guarantee: you'll never get a type error, only an "Any" value or a non-type runtime error.

In both cases, though, the above implies that "trying to call a non-function" isn't a type error, so most avoid talking of a type system there.

verified-byte-code vs typed-byte-code

Verified byte code seems to be something distinct from typed byte code (where all the data in the machine model is typed). I think the other people posting in this thread are talking about verified byte code (where, post-verification, it's just normal untyped byte code). If you think about typed byte code as contrasted with verified byte code, and re-read my posts, they should make sense.

Accessibility, WebAssembly, and Web page quines

If we restrict our byte-code just to one paradigm (like Java byte-code did), we can't efficiently implement other paradigms on top of it [...]

Efficiently implementing other paradigms on top of the Von Neumann architecture is pretty difficult as well. We end up with complicated workarounds like garbage collectors and transactional memory.

Nevertheless, the Von Neumann architecture has high accessibility, in the practical sense that everyone who already uses the Web just so happens to own Von Neumann-optimized hardware. :) I think it could be a well justified choice in that respect.

In practice, I think asm.js is catching on, now that Edge implements it. Between asm.js, PNaCl, and the upcoming WebAssembly standard, something like what you're talking about could be on the way, or it could already exist.

What I'm having in mind is to embrace a minimal DOM that has the least number of elements in its complexity (starting with canvas-like module, and some basic event system).

It's interesting to think about the design of a minimalistic DOM. I've had it on my mind for a while, and I think it would be quite a bit different from a canvas tag.

Canvases are nice as long as the user is accessing the document in a specific way, but they're not naturally hierarchical or composed of text, which makes it difficult to search them using text, trace them back to their corresponding textual source code to take excerpts, embed abstracted widgets into them as sub-hierarchies, and navigate them with keyboards and screen readers.

So, I think I would look for a minimal DOM that is essentially nothing but a plain hierarchical structure. On this hierarchy, any annotations related to input and output methods, such as vector graphics, raster canvases, keyboards, and speech, would all be handled by extensions, with the exception of annotations whose data format resembles the same format the source code is maintained in (e.g. plain text). Those annotations could be attached using a standard quotation syntax for the ease of building quines and fractals (e.g. Web pages that show people how to build Web pages, or widgets that help with the debugging of other widgets).

Whenever I think about this topic, I'm indecisive about it. Like, sometimes I'm not sure how "extensions" would actually be loaded, and sometimes I'm not sure how they would avoid stepping on each other's toes. I'm not even sure I want to recreate the Web to be exactly the same kind of experience it already is. Sometimes I think Angular.js and the shadow DOM have a lot of elegant ideas about abstraction in the DOM, even if I sometimes don't appreciate some other messes they're making at the same time. :) Until I manage to come up with any better ideas, the thought guides other projects I do in the meantime.

Asm.js and DOM

Normal JS is actually faster than asm.js. asm.js only makes sense as a compilation target for pre-existing programs, but it turns out marshalling data introduces more overhead than asm.js saves. The JavaScript compilers aggressively apply exactly the same optimisations to all JS code, so there is no real advantage to hand-coding asm.js.

As for the DOM, what makes JS programming useful is that I can use divs and form inputs, and I can let the UX designer style them in CSS. I don't want to go back to drawing lines and plotting pixels, unless I am developing a specific component that needs it. For graphs I would use SVG, so it is only really pixel-based image manipulation that you need a canvas for.

What would be reasonable would be a DOM and CSS with the legacy stuff removed. You probably only need flex-box layout, for example.

modules

I don't want to go back to drawing lines and plotting pixels

We'd have a library for that, named DOM.js or DOM.web.asm or something. If we don't like that library, we can do our own replacement and put it up for the public. It's all about being as modular as we can. The fewer modules that are hard-wired into a browser, the better.

Asm.js vs other compiler-generated JS?

Normal JS is actually faster than asm.js. asm.js only makes sense as a compilation target for pre-existing programs, but it turns out marshalling data introduces more overhead than asm.js saves. The JavaScript compilers aggressively apply exactly the same optimisations to all JS code, so there is no real advantage to hand-coding asm.js.

I hear you, but I think it's still relevant to ivanvodisek's bytecode vision. I wouldn't hand-write a Von Neumann Web bytecode either (if I could help it), and as such, all the uses I would have for bytecode or Asm.js would be after I'd given up on hand-written JavaScript for my application anyway.

If I only have to interact with the DOM once per animation frame, and if I have a choice between a compiler that generates Asm.js code and an equivalent compiler that generates high-level JS code, do you think the latter code would tend to be more efficient?

The former is easier

I think it would be easier to write a compiler with asm.js as a target, but I think a lot would depend on the exact code. The problem will come when you have to copy the frame of animation out of the asm.js heap and into the canvas. It would be faster to manipulate the canvas directly. I think asm.js needs a way to pass typed arrays as arguments to functions. This should not break too much, as asm.js would still only be able to allocate on its heap. Without this, I think targeting normal JS, but in a static way like asm.js does, would be faster. I think most JS compilers apply the optimisations wherever they can, irrespective of the "use asm" flag, but the flag enables a verifier that warns if something cannot be optimised.

For my application, hand-written asm.js was slower than the best normal JavaScript version doing image manipulations that updated a canvas once per frame.

Memory aliasing

In asm.js, the memory is just an ArrayBuffer, so one can create views that alias the underlying storage. AFAIK the canvas APIs don't generally work with typed array views, so that fault really lies with the canvas API.

More APIs should accept pre-allocated memory aliases

Canvas does use typed arrays now, and you can create an ImageData object from your own Uint8ClampedArray. I didn't realise this at the time, so this would help one end of the process. At the other end I am using the pako library to inflate the data, and pako does not seem to have a mode for using a pre-allocated buffer. So there are always going to be problems, and not all APIs support it, but it seems to be better than I thought. I'll re-benchmark my code using the ImageData constructor.

Now that I've come back to

Now that I've come back to this thread, I'm disappointed that despite the flurry of comments, there hasn't been more discussion of actual empirical research on these questions, or what empirical studies might be good to perform. The linked article covers a wide variety, but surely not everything that's out there.

Difficult

I read the "literature review" with interest, and I think it is a very useful resource, but I also found it saddening. The covered studies represent an extremely large amount of work, and they are almost all found wanting and difficult to draw conclusions from. A single criticism, "unclear how much of the result are due to difference in programmer's skill", suffices to cast serious doubt on the validity of a large majority of the results, and for those that are not put in doubt in this way it seems rather clear that, while the results may be solid, generalizing them in any way (from a specific pair of languages to "typed vs. untyped", for example) is impossible.

Then there is the part about how the results were blown out of proportion in the various language communities. This is also rather sad.

I'm convinced that doing empirical studies on programming is a difficult and much-needed task. But even if I had had any desire to design and run such studies personally (I don't), I think this review of existing work would have sufficed to discourage me. It clearly demonstrates that running those studies is very difficult, extremely demanding, and gives highly specific results. Which is of course better than nothing, but it is not for everyone.

On a positive note, I found that some programming usability studies have interesting side results that are often more convincing or inspiring than the main results. For example, The impact of syntax colouring on program comprehension, by Advait Sarkar, 2015, shows heat maps obtained by eye-tracking technology, observing which parts of a code screen programmers look at. I was not terribly wanting of empirical results on code highlighting (I know which parts of it I found useful by personal experimentation, and don't care much about the rest), but I found those heat maps fairly interesting -- the idea of learning more about how programmers read code seems very enticing.

You got it

Yes, empirical research is that hard, so I think you read Dan Luu's review correctly.

Empirical research in psychology is also hard, yet we learned something from it.

However:
- you'd need many more people and much more money on this, just like the psychologists. Actually, redirecting more psychologists to work on this (like Janet Siegmund) might be what we need.
- the questions we ask might actually be hard to answer on an absolute scale.

As mentioned, we're also just getting started — for instance, Janet and others have written a paper on how to reliably assess programming experience, so now we can at least do it correctly.

what empirical studies might be good to perform

Sorry to repeat myself but there is a two-pronged research program laid out here:

http://lambda-the-ultimate.org/node/5286#comment-90876

When intuitions can be harmful

An interesting blog post on a paper measuring the effect of linguistic intuitions on the correctness of our reasoning. The study discusses a dual-process model of judgment whereby humans typically use System 1 for quick, effortless and intuitive reasoning, and fall back on System 2 for slower, more analytical reasoning to correct System 1 in scenarios of disfluency.

The study seeks to measure more precisely when System 2 is engaged, which is directly relevant to program correctness if indeed programming utilizes more of our linguistic/System 1 reasoning than our analytical/System 2 reasoning. We need to know when the problem we're modeling actually exceeds our intuitive grasp, and whether any particular language features encourage this at the appropriate times (or conversely, inappropriately cover for this deficiency).

This may also have relevance to debugging. One experiment triggered System 2 simply by using a font that made some letters look like symbols (Gadgets = G@dget$), and another by using a smaller and slightly translucent font, i.e. black 12 pt vs. grey 10 pt. Perhaps engaging such a mode while debugging, while counterintuitive, might help us find the problematic code that our intuition skips over as unproblematic.

The differences in correctness are significant. In the font size/shading scenario, 90% of fluent participants answered at least one question incorrectly, whereas only 35% of the disfluent participants did so. This suggests some interesting experimental ideas for tooling.

On a language design front, Sean has often claimed that OOP is more linguistically natural, and functional languages are more analytical and less linguistic. If true, then this study would suggest that programming using functional languages would yield more correct programs, on average, possibly at the cost of some time (this cost is not quantified in this study, but if anyone has a link please share!).

It would further suggest that the most crucial aspects of a problem ought to be modelled using functional idioms to maximise potential for correctness, and only the simpler "glue code" that composes programs ought to be object-oriented.

Misguided

I think the objective is to train "system 1" to be effective at the task at hand (correctness checking in this particular case) - not to rely on "system 2". E.g. you don't want "system 2" driving your car. However, I think this particular case may be misguided as well - we want automated tools to do tedious correctness checking, not people. Before computers you had to rely on people (mathematicians and logicians) to do proof verification. Now you don't. This is opening up programming to the mainstream, as opposed to it being confined to borderline Asperger types - which is a good thing in my opinion.

Training insufficient

You can train system 1 for many scenarios, but not all scenarios you will encounter, and sometimes the differences a new situation has to a trained one are subtle. Your intuition will necessarily fail to recognize when this is the case, and the linked study proves that.

Correctness checking is all well and good, but you still need to express that proof, and debug proofs that don't check, which still involves a choice of system 1 and 2, and system 1 will rarely be up to the task of debugging in the first place because it likely made the initial mistake.

Performance

It seems pretty obvious that even with a modicum of training, system 1 significantly outperforms system 2 at any task you can name. I'd imagine the borderline-asperger types who are effective at manual algebraic manipulation are utilizing their own (trained) system 1 heavily in doing that.

I'm not sure where you

I'm not sure where this obviousness comes from, because it seems to depend entirely on what metrics you use for "outperform". According to the paper, the study designed tasks that did not take any longer using either system, and yet system 2 consistently outperformed system 1 when the metric was "correct answers".

Now, if your metric is "lines of code produced", system 1 will certainly outperform system 2 on average. If your metric is "correct lines of code produced", it's not so clear-cut. When your metric is "extensible and correct lines of code produced", which is closer to the real metric we want, I'm afraid I simply don't follow the deduction that makes system 1 obviously superior.

sports, music, crafts, cooking, etc, etc

As mentioned, if you look at expertise in any task - you'll see it's mostly system 1. System 2 plays an editorial or supervisory role. Perhaps this study is highlighting an essentially editorial task. But the idea that we should generally transfer work to system 2 to gain better performance seems quite wrong.

But the idea that we should

But the idea that we should generally transfer work to system 2 to gain better performance seems quite wrong.

I don't see anywhere this claim was made in either the paper or my comments, where I'm assuming by "performance" you mean "speed of delivery".

Performance

By performance I mean speed of the correct completion of a task. Correctness is a factor. I may have misunderstood you but I got the impression that you were suggesting that programming language and IDE design should target system 2 - that was my objection.

Of course they should target

Of course they should target system 2 when correctness is at stake. Being faster in completing a task incorrectly is of little use. Some programming activity should focus on correctness, which means they should engage system 2.

For example, like I said in my original post, the need to debug already indicates a failure of system 1, so continuing to engage system 1 to find the problem is often futile.

Ok, but to me this implies a misunderstanding of roles

When system 1 fails, yes, system 2 must intervene. But that should be the exceptional case - even in programming and IDE design - imo. These should mostly be optimized for system 1 - including its required training for such tasks - it seems to me.

It's not exceptional though

It's not exceptional though because every program needs debugging and extension. Even during writing, system 2 is needed. For instance, it seems pretty reasonable to engage system 2 when designing contract preconditions, postconditions and invariants. It also seems reasonable to engage it when designing automated tests. Also for types that are supposed to preserve invariants. You are vastly undervaluing system 2 in my opinion.
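
To make that concrete, here is a minimal sketch (invented names, nothing from the study) of a type that preserves an invariant; deciding on the invariant and where to check it is the system 2 step, after which system 1 can use the type freely:

```haskell
module Percentage (Percentage, mkPercentage, getPercentage) where

-- Invariant: the wrapped value always lies in [0, 100]. The bare
-- constructor is not exported, so the only way to build a value is the
-- smart constructor, which checks the precondition exactly once.
newtype Percentage = Percentage Double

mkPercentage :: Double -> Maybe Percentage
mkPercentage x
  | x >= 0 && x <= 100 = Just (Percentage x)
  | otherwise          = Nothing

getPercentage :: Percentage -> Double
getPercentage (Percentage x) = x
```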

Maybe

Maybe, but my argument is based on the fact that using system 2 is onerous, "effortful", etc. Thus offloading the editorial function to automated tools and/or system 1 as much as possible seems desirable. Arguably, type-checkers, auto-completion, even debuggers are examples of that.

Being faster in completing a

Being faster in completing a task incorrectly is of little use.

Wait, are you assuming correctness is black or white?

"Black or white" in the

"Black or white" in the sense that a given solution ought to satisfy a specific set of currently known requirements. Spec validity is out of scope here, as usual. Of course, what's currently known can change given spec mistakes.

So you are assuming that

So you are assuming that "satisfy a known requirement" is black or white, right? Specs are always wrong, of course; it's just a matter of how wrong they are. Also, the problem with writing correct software really isn't our inability to satisfy correctness according to a spec, but our inability to specify what "correct" means in the first place.

My understanding of "satisfy

Certainly if requirements state that users ought to be able to enter such and such data, and it should be validated with bounds X on data of type Y, these domain requirements ought to be satisfied.

That said, the developer should always be on the lookout for contradictions in the spec, which are not uncommon given specs are often wrong as you say. Including more detail about user stories in the spec helps elucidate the true goals and ferret these out early. This again argues for system 2 processing of the spec in my opinion.

And yes, "correctness" without a machine checked proof of the spec is at best an approximation, but it's what we have to work with for now.

"correctness" without a

"correctness" without a machine checked proof of the spec is at best an approximation

"Correctness" with a machine-checked proof of the spec is generally an illusion. Since it begs the question of whether the spec is right and relies on acceptance by a machine that lacks common sense.

According to the paper, the

According to the paper, the study designed tasks that did not take any longer using either system, and yet system 2 consistently outperformed system 1 when the metric was "correct answers".

Now, if your metric is "lines of code produced", system 1 will certainly outperform system 2 on average. If your metric is "correct lines of code produced", it's not so clear-cut. When your metric is "extensible and correct lines of code produced", which is closer to the real metric we want, I'm afraid I simply don't follow the deduction that makes system 1 obviously superior.

Is "correct answer" the only metric that matters? Or even "extensible and correct"? Really, the only metric that matter is can I produce an acceptable solution to the problem before my competitors can. If I produce a more correct extensible solution, but my competitors beat me to market with a solution that consumers are OK with, they won't care if my solution is more correct or extensible.

Is this very surprising? I

Is this very surprising? I mean, intuition is just an optimization that we make by generalizing our experiences (or a genetic generalization of our ancestors' experience). It is a bias; biases can be wrong, but they are right often enough that they are efficient to follow as long as they seem accurate.

On a language design front, Sean has often claimed that OOP is more linguistically natural, and functional languages are more analytical and less linguistic. If true, then this study would suggest that programming using functional languages would yield more correct programs, on average, possibly at the cost of some time (this cost is not quantified in this study, but if anyone has a link please share!).

This is completely my viewpoint. OO really allows us to be very loose and fuzzy with what we know about the problem, allowing us to assign arbitrary meanings to names. It more easily allows us to program something even when we aren't quite sure about what we are programming.

Functional programming, at least the pure Haskell variety, requires very good understanding of the problem you are trying to solve to encode anything at all. You can't really hide behind arbitrary names. So the Haskell programmer spends a lot of time thinking about their problem so they can really understand it and come up with the most elegant solution possible.

It would further suggest that the most crucial aspects of a problem ought to be modelled using functional idioms to maximise potential for correctness.

But you have to get there first! How can I go from an intuitive, vague understanding of a problem to a formally correct one? On a whiteboard and with notebook paper? I want to start coding right away; I want to think through the problem with code.

Much of science consists of confirming certain intuitions

Much of science consists of confirming certain intuitions. Perhaps it's not surprising to some, but this has not been studied specifically in the context of programming languages either to my knowledge, and is not necessarily widely known outside the domain of psychology. This thread is about gathering empirical research that has a potential impact on programming languages, and this definitely qualifies!

I would also find it very surprising if you were aware of all the different ways the study triggered disfluency to engage system 2. Unless you're knee deep in psychology or perhaps HCI as a career, and even then, some discussion might reveal a new angle.

But you have to get there first! How can I go from an intuitive, vague understanding of a problem to a formally correct one? On a whiteboard and with notebook paper? I want to start coding right away; I want to think through the problem with code.

What if code isn't the best way to transition from vague and intuitive to formal?

Finally, I disagree with your characterization of OO and Haskell's suitability for solving unknown problems. With either language you can add or omit as much information as you'd like to make the problem simpler with fewer invariants, or harder and check more invariants. Haskellers simply tend to add more checked properties once they understand the problem better, but this isn't a necessity. But since this topic keeps coming up, perhaps another thread to discuss the particulars would be useful for clarifying OOP vs. functional. I'd be very surprised if there were a significant difference, given that the dualities entail that any solution in OO would have a corresponding functional expression, just with different extensibility properties.

re: suitability for solving unknown problems

I'd very much like to read such a discussion. Having more of a fleshed-out group understanding of the gamut of possibilities across and within paradigms sounds interesting, very useful, and important to me; my $0.02.

(E.g. as random fodder, Sean: why could or couldn't your IDE work with FP style?)

What if code isn't the best

What if code isn't the best way to transition from vague and intuitive to formal?

Then what is? Because programmers spend a lot of time figuring out what code to write, not just looking at code that is already written.

Finally, I disagree with your characterization of OO and Haskell's suitability for solving unknown problems.

But you previously said:

If true, then this study would suggest that programming using functional languages would yield more correct programs, on average, possibly at the cost of some time (this cost is not quantified in this study, but if anyone has a link please share!).

So what is it then? If FP engages System 2 right away, then it is definitely going to have a disadvantage in working through problems intuitively.

I must admit, I only ever hear about the elegant solutions Haskellers have come up with for problem X. That they focus on these elegant solutions biases me to think that they are necessary for writing code in Haskell. How quick and dirty can Haskell code be?

Quick and Dirty FP

Haskell code can easily be first order, embedded in a half-specified monad to which one is still adding effects or tweaking APIs. A lot of my code looks like this at first... until I grasp a bigger picture and eventually refactor the model. Pure FP is effective at modeling and embedding imperative code, stream processing state machines, etc.. It can be agile.
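
Something like the following sketch is typical of that stage (assuming the mtl package; the names are made up): a deliberately under-specified monad with everything lumped together, to be teased apart later.

```haskell
import Control.Monad.State        -- from mtl; everything in one pile for now
import Control.Monad.IO.Class (liftIO)

-- Placeholder application monad: just a list-of-strings log over IO.
-- The "real" effects (config, errors, metrics, ...) come later.
type App a = StateT [String] IO a

step :: String -> App ()
step msg = do
  modify (msg :)                  -- crude logging; may become a real effect
  liftIO (putStrLn msg)           -- direct IO for now; easy to replace

run :: App ()
run = do
  step "loading config"           -- stubbed-out stages of the eventual program
  step "processing input"
  step "writing output"

main :: IO ()
main = evalStateT run []
```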

Pure FP is easy to refactor due to local reasoning and equational reasoning, separation of effects from evaluation, disentanglement from the environment, easy testing of components. Refactoring for elegance is safe and predictable. Consequently, people who like elegance will take advantage. But many programs and libraries don't start that way.

You look only at the elegant end-product of FP and assume wrongly that it must be written elegantly from whole cloth by vulcans. You assume wrongly that local and equational reasoning is all about the end product, where in reality it supports the full development path and valid intuitions about what a given change will effect.

I'd really like to know more

I'd really like to know more about this. Whenever I sit down with an FP language, I feel like I'm being bombarded with an ideology of "do things right the first time". The lack of a good debugger experience in Haskell seems to reinforce that (because... well-typed programs can't go wrong?). My contact with the FP community (admittedly, very smart people like SPJ) and what they seem to talk about (elegant solutions to problems that have quick and dirty solutions in other languages) only reinforces that. Not only that, but the common academic spiel is "people should write programs like they write proofs", but I'm not even sure what I'm going to prove (this is probably a common occurrence).

The problems I prefer to work on generally defy obvious elegant solutions, or even clear understandings of the sub-problems, so I have to spend a lot of time prototyping. Time invested in making a solution "more right" is often wasted when the solution isn't appropriate at all.

Whenever I sit down with a

Whenever I sit down with an FP language, I feel like I'm being bombarded with an ideology of "do things right the first time". The lack of a good debugger experience in Haskell seems to reinforce that (because... well-typed programs can't go wrong?).

It worries me that we tend to think the alternative to getting everything right the first time is using a debugger. This seems a false dichotomy. I dislike debuggers; or at least, I dislike all the debuggers I've encountered.

Not only that, but the common academic spiel is "people should write programs like they write proofs", but I'm not even sure what I'm going to prove (this is probably a common occurrence).

I do think creating programs and creating proofs are substantially the same activity, at least when done fluently they are. To me, that doesn't mean programming is proof, but rather, that creating programs and creating proofs are both deeply intuitive processes in which the design is conjured from nowhere-rationale, perhaps from the realm of either Polyhymnia or Erato.

Types First in Haskell

Haskell is certainly a "types first" language and encourages you to at least specify some types up front. Similarly, there are OOP languages that heavily focus on types, design by contract, correctness, etc. (e.g. Eiffel). There are languages for both paradigms that play relatively fast and loose with types. I consider the emphasis on correctness, in both cases, to be separate from the programming paradigm. Functional programming is about equational reasoning and purity, not the type system.

Haskell, at least, does not overly interfere with anyone who is rapidly prototyping. Types first? Sure, just add a few placeholder types and adjust as needed. The type system helps you find obvious errors and fix them, but doesn't force your types (or program) to be 'correct' in any larger sense of meeting relevant requirements or passing tests.
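
For instance, a "types first" prototype can be little more than placeholder types and stub definitions (a contrived sketch, not anyone's real project); the checker keeps the sketch consistent while the details are still unknown.

```haskell
type UserId  = Int           -- placeholder; might become a newtype later
data Request = Request       -- fields to be decided
data Report  = Report deriving Show

fetchRequests :: UserId -> IO [Request]
fetchRequests _ = return []  -- stub; just enough to compile and run

summarise :: [Request] -> Report
summarise _ = Report         -- stub

pipeline :: UserId -> IO Report
pipeline uid = summarise <$> fetchRequests uid

main :: IO ()
main = pipeline 42 >>= print
```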

As an aside, I certainly like elegant solutions, and refactoring towards them. If I'm asked to talk about or show off code, it won't be the quick and dirty stuff that I'm still hammering out. I wonder how much your opinion of FP is influenced by a biased sampling.

My point about the types is

My point about the types is that I really just want a debugger. Even assuming the types don't get in the way when I'm not using anything heavily abstract, they still don't provide all the feedback necessary for problem exploration.

We of course show off our best code when asked, and we rarely talk about the journeys taken to achieve that code.

Debugger

I'm not fond of debuggers, and I'm not convinced they contribute to problem exploration. Mostly, they're a source of frustration to me. I would like a live programming environment, though. And zero button testing. And a good REPL, with graphics as a bonus (turtle graphics, diagrams, SVG, etc.). Maybe something like iPython notebook, combining a REPL with live programming and graphics...

Live programming is just a

Live programming is just a full-time debugger mated to the language (or could be; it's probably more than that). Reminds me I need to do that ECOOP live programming workshop proposal.

I think languages carry with them philosophies about how we solve problems with programming. Many in the FP crowd do not believe much in debuggers, I guess that is just part of the philosophy.

A live programming

A live programming environment isn't what I assume or expect from a debugger. The properties I associate most strongly with 'debuggers' are the opposite of liveness: stepping, stopping, breakpoints, poking around in a frozen world.

Rather than cultural language philosophy, another hypothesis is that FP simply doesn't benefit as much from conventional breakpoint debuggers. There's less state, fewer aliasing errors, rarely any `for` loops with potential fencepost errors, and more control over effects. Also, it's easy to use functions independently of context. With imperative programming, it might be relatively convenient to run the entire program to a breakpoint as a brute force method to get external resources into the right state. With pure FP, those external resources aren't a concern; you can just evaluate the subprogram in a REPL or similar.
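
For example (an invented toy, but representative): a pure function can be exercised straight from GHCi with no breakpoints to arrange and no program state to reconstruct.

```haskell
-- A tiny pure module; 'discount' can be poked at from GHCi in isolation.
module Pricing (Price(..), discount) where

newtype Price = Price Rational deriving (Eq, Ord, Show)

-- 10% off orders over 100, otherwise unchanged.
discount :: Price -> Price
discount (Price p)
  | p > 100   = Price (p * 9 / 10)
  | otherwise = Price p

-- In GHCi:
--   ghci> discount (Price 250)
--   Price (225 % 1)
--   ghci> discount (Price 40)
--   Price (40 % 1)
```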

Anyhow, if a live programming environment works for you, maybe look into Lamdu.

Please to get the chocolate cohabitating with the peanut butter

I want both/all of the "debugger" and the "live coding" features. And everything else. All married well in a glorious IDE experience. Being able to stop the world is very useful to me. Being able to play with things live is, too. I beg nobody to throw babies out or forget to put them in the bath.

Then what is [the best way

Then what is [the best way to transition from vague and intuitive to formal]? Because programmers spend a lot of time figuring out what code to write, not just looking at code that is already written.

I don't know. Perhaps relational diagrams of some sort. Perhaps any one of a thousand things beyond my limited knowledge of HCI. Perhaps even paper and pencil.

Has your opinion on paper and pencil's unsuitability been tested against the best prototype development environments? That would be a useful addition to this empirical study thread, as all opinions on this matter seem to be anecdotal at this point.

One of the tests in the above study demonstrated that engaging system 2 led to a deeper understanding of a scenario than engaging system 1 (the mp3 player description experiment). If so, then system 2 ought to be used at the beginning of design when reading the requirements, and system 1 to write out the relationships you've now internalized, possibly with brief interruptions of system 2 if you're expressing important invariants and such.

Viewing software as uniquely, or even primarily, a system 1 process just seems incorrect. Optimal results would seem to require quite a bit of give and take.

But you previously said: [...] So what is it then?

I previously simply stated your claim about OOP and functional languages and the implications of this study if those claims are true. I've now said that I'm skeptical your claims are true, at least to the extent you've described elsewhere.

That they focus on these elegant solutions biases me to think that they are necessary for writing code in Haskell. How quick and dirty can Haskell code be?

Since you're obviously familiar with Scala, you can write Haskell98 code that's just case classes and straight pattern matching. If you need to abstract over a number of such types, like you would by adding an interface to a number of objects, then you just add a type class and the requisite instances. Integrating with existing code is the most challenging part if that library uses elaborate types for effectful code, but anecdotal reports suggest this largely becomes a matter of getting used to the incantations, like what happens when familiarizing yourself with any new language's standard library.
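
To illustrate with a throwaway toy domain of my own invention: the data declarations below play the role of Scala's case classes, and the type class is the after-the-fact "interface" over several of them.

```haskell
-- Plain Haskell98: data declarations and pattern matching stand in for
-- case classes; the type class below is the "interface" added later.
data Shape   = Circle Double | Rect Double Double
data Sticker = Sticker String Double    -- label and diameter

area :: Shape -> Double
area (Circle r) = pi * r * r
area (Rect w h) = w * h

-- Abstract over both types only once a common operation is needed.
class Priced a where
  price :: a -> Double

instance Priced Shape where
  price s = 0.05 * area s               -- priced by surface area

instance Priced Sticker where
  price (Sticker _ d) = 0.10 * d        -- priced by diameter

main :: IO ()
main = print (price (Circle 3) + price (Sticker "LtU" 5))
```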

System 1 Software Development

Viewing software as uniquely, or even primarily, a system 1 process just seems incorrect. Optimal results would seem to require quite a bit of give and take.

Software as written today might be mostly system 2. But many people feel it shouldn't be this way, that software today is not developed optimally and is not nearly as accessible as it could be or should be.

I believe that PX and UX should be fully unified. Using applications - even posting to a forum like LtU - should be usefully viewable as 'programming'. Developing rich interactive forums - e.g. something akin to REPLs, interactive fictions, console apps, or MUDs - should be trivial from within the same forum (rather than going to edit a separate codebase). Further, barriers between applications must be lowered such that apps can easily be composed, integrated, refactored, abstracted, and reasoned about like language constructs.

In this context, the vast majority of 'code' is first order. Abstraction, in this world, isn't the primary approach to software development. Rather, we mostly just grow a first-order codebase. Abstraction is something to help with repetitive tasks, improve consistency, reduce latency, or derive information from data embedded in the codebase. A bit like user-macros, wiki templates, and a bit of query language or search.

Requirements aren't something we start with. They're something that we develop together with the software. Unless, of course, we already have written a lot of similar software... in which case, the requirements should have appropriate abstractions (types, interfaces, APIs, etc.) already. The main special case for requirements coming ahead of the software is when we're first implementing requirements known from experience in other programs or programming languages.

The details may be unique to my vision, but the theme of pushing much software into system 1 thinking is not. Looking into history, we can see Seymour Papert (with Logo), Alan Kay (with direct manipulation of objects as APIs in early Smalltalk environments), developers of Hypercard, 'Experiments in Oval', Morphic, spreadsheets, all different takes on similar ideas. More recent efforts include Sean McDirmid's work on YinYang and his more recent IDEs, Chris Granger's Eve, Paul Chiusano's Unison, and my own Awelon project.

It's important to support system 2 thinking, of course, but it doesn't need to be primary. Really, it could be an afterthought, with the benefits of hindsight, rather than something that happens before we begin programming.

Software as written today

Software as written today might be mostly system 2.

I don't think that's the case. I think most programmers primarily use system 1 once they're comfortable with their language and environment. Perhaps our languages and tools are still poorly adapted to system 1, as I think Sean would say, but there's definitely plenty of fast and loose development going on, rather than slow, careful and deliberate development characteristic of system 2.

Development being system 1 also explains the inertia of programming languages and frameworks: it takes a long time to become productive with a new language and/or framework because we're using system 2 to learn, but every project after we've adapted to the idioms is practically a breeze. Programming is thus typically system 1, even today.

Requirements aren't something we start with.

That seems bizarre, because every client that contacted me, ever, has opened with "we need X, Y and Z, and have problems with W". Perhaps you mean something more specific by "requirements", but every list of requirements starts informal like this, and becomes more formal as it's elaborated. Elaboration consists of mapping out user stories which ferrets out specific requirements before I even type any code.

It's important to support system 2 thinking, of course, but it doesn't need to be primary. Really, it could be an afterthought, with the benefits of hindsight, rather than something that happens before we begin programming.

This also seems bizarre to me. I recommend reading the study if you haven't, because I think you'll change your opinion. The flaws inherent to system 1 are clear from the experiments. We evolved both systems for a reason, and we should exploit both as appropriate. To suggest that we'd need system 2 for typically forgiving, informal interactions with the real world in which we evolved, but we don't need it for unforgiving, formal interactions with a computer for which system 2 is well suited just doesn't make sense to me.

every client that contacted

every client that contacted me, ever, has opened with "we need X, Y and Z,

If your clients are coming to you with requirements, they've probably already spent some time exploring the problem without your input. This exploration might take the form of working with spreadsheets, interacting with web applications of similar businesses, struggling with an existing app developed by an intern years ago, etc.. It isn't as though every client that has contacted you, ever, is a complete virgin in the software world. Nor are you, for that matter, which is why you can look at user stories and translate some of them into requirements. It's also why so many such 'requirements' are aligned with a particular solution rather than with an overall purpose.

Ideally, professional programmers would only need to get involved a much smaller fraction of the time. That means, as late arrivals, we will see more requirements that have already matured to a greater extent.

The flaws inherent to system 1 are clear from the experiments.

You seem to have concluded too much from the study. System 1 reasoning isn't "inherently flawed". It's only flawed when our shortcuts in reasoning are badly aligned with a system's construction, i.e. where intuition leads us to incorrect answers. The solution is to provide an environment where the shortcuts are valid, where the weaknesses don't become problems. In such an environment, system 1 thinking is not flawed. (cf. Nudge by Thaler and Sunstein, for physical examples).

It's unfortunately easy to create and buy into a system that offers an illusion of alignment with our natural intuitions, but where those intuitions guide us wrongly. For example, our intuition that 'objects perceived' have some fundamental reality outside the observer runs into all sorts of philosophical bundle/substance/objecthood/identity issues, and leads in OOP to maintenance, extensibility, and integration problems we solve badly with dependency injection, observer and visitor patterns, ORMs, etc..

But we can find these problems and address or avoid them. We can study cognitive dimensions of notations and design for good qualities on those dimensions. It's much easier to focus on intuitions learned within a programming or user application environment (the same thing, if PX and UX are unified), rather than trying to leverage 'natural' intuitions. The important bit is to ensure that those 'learned' intuitions are valid, i.e. that they don't have exceptions, corner cases, discontinuities when scaling, etc. that we might encounter in real world software.

To suggest that we'd need system 2 for typically forgiving, informal interactions with the real world in which we evolved, but we don't need it for unforgiving, formal interactions with a computer for which system 2 is well suited just doesn't make sense to me.

We have vastly more control over our software development environment than we do over the real world. Consequently, we have the opportunity to make software development align with system 1 thinking far more effectively than does the real world in which we evolved. IRL, "it depends" and "it's complicated" are common answers. In software, we can control dependencies and do a lot to keep things simple.

Also, it probably doesn't help that most software interacts with the real world in non-intuitive ways. Humans didn't evolve with traffic lights, traffic jams, massive economies and nations, robotic house-cleaners, etc.. Most of our evolution involves little tribes and hunter-gatherer economies. Many of our natural intuitions (moral, social, linguistic, etc.) are oriented around group identity, authority, and shared experience.

You seem to have concluded

You seem to have concluded too much from the study. System 1 reasoning isn't "inherently flawed". It's only flawed when our shortcuts in reasoning are badly aligned with a system's construction. The solution is to provide an environment where the shortcuts are valid, where the weaknesses don't become problems. In such an environment, system 1 thinking is not flawed.

Thank you.

It seems to me this whole approach is in danger of overthinking the problem. It's my understanding the paper this is coming from wasn't looking at programming language design at all, and it seems quite a leap to apply their results thereto — to the extent one trusts their results. My confidence is not strengthened by their use, early in the paper, of what I've long considered a classic example of incautious reasoning: assumptions about why people might prefer to drive rather than fly. In any case, afaict system 1 is closely allied with sapient thought (which is very good at some kinds of tasks, bad at others), system 2 closely allied with reasoning like a computer (which is a good way to do some things, an abysmally bad way to do others). It seems to me we've been consistently making the mistake of trying to make humans think more like computers and then claiming humans are a sort of inferior version of a computer (after going to so much trouble to try to make them so), when we should be finding ways to let computers do what they're good at while humans do what computers frankly can't.

Though I also worry that the cognitive-dimensions-of-notations thing could also be in danger of overthinking the problem. There's intuitive, and then there's unintuitive.

It seems to me this whole

It seems to me this whole approach is in danger of overthinking the problem. It's my understanding the paper this is coming from wasn't looking at programming language design at all, and it seems quite a leap to apply their results thereto — to the extent one trusts their results.

Correct, it wasn't about language design, but this is a dominant cognitive model in psychology with many experiments behind it. LtU is about discussing ideas for programming languages, which often includes development environments like live programming, and this seems pertinent to those types of experiments.

So assuming two cognitive modes, which can be switched via visual cues as they do in the study, then this has useful implications for development environments when a given context suggests a specific cognitive mode would be more useful. The visual cues the study uses might give some people ideas, which is what we're all about here.

We've gone a little afar from this point in discussing the specifics of when system 2 is actually useful for programming, if at all (which baffles me), but such is life.

If your clients are coming

If your clients are coming to you with requirements, they've probably already spent some time exploring the problem without your input.

Like I said, it depends what you mean by "requirements". Client requirements are often vague or wrong considering where they want to go or how their business actually works, but they have them; sometimes it's just as simple as they "want an online presence". My point was only that requirements do not co-evolve with code right from the get-go, some sort of requirements always precede code or you'd have no code to write.

You seem to have concluded too much from the study. System 1 reasoning isn't "inherently flawed". It's only flawed when our shortcuts in reasoning are badly aligned with a system's construction, i.e. where intuition leads us to incorrect answers.

You basically just repeated exactly what I've said this whole thread, namely, that each evolved for a purpose and we should utilize both as appropriate. I never said that system 1 was "inherently flawed" with the implication that we should never use it, I said it had flaws, as does system 2 which I've also noted in this thread, and that each has contexts in which its use is more appropriate.

Now the context of these comments is entirely lost so let me re-establish: my reply was to your claim that supporting system 2 need not be primary, that it can be an afterthought with benefit of hindsight, and this is what I took issue with. The part of the development cycle during which it's easiest and cheapest to ensure good end results is the beginning. Engaging system 2 here to ensure accurate requirements, even if it's only initial requirements for an exploratory phase, ensures the most effective use of resources down the road. Isn't this obvious?

The flaws of system 1 in this domain are clear from the study, as it absorbs less information, and with less accuracy, both of which are critical to establishing requirements. Just look at experiment 2 contrasting heuristic cues and systematic cues. The system 1 participants were heavily influenced by the look of the person giving a product review instead of the content itself. How exactly is this mode of thinking useful when requirements gathering?

In software, we can control dependencies and do a lot to keep things simple.

Absolutely, but this doesn't entail that we should only rely on system 1 and system 2 ought to be an afterthought. System 1 isn't good at recognizing when it's no longer suitable, or when it's about to make a mistake. Feedback is needed for this, but in software development this feedback can be long-delayed and sometimes disastrous. A more proactive approach to engaging system 2 in some scenarios seems useful.

in software development this

in software development this feedback can be long-delayed and sometimes disastrous

You seem to be assuming this is an essential property for software development. I would posit otherwise. With appropriate control over effects, feedback need never be 'disastrous' (you can easily run software in a simulator). And with a good programming environment, feedback needn't be long-delayed.

Rapid prototyping should, in common problem domains, be little more expensive than whipping up a spreadsheet or power-point presentation. With augmented reality (or appropriate workrooms) it is feasible in the future to turn whiteboard sessions into rapid prototyping sessions. When the cost of rapid prototyping is marginal, we can afford to be wrong early and experiment more frequently. Relying on system 1 heuristic estimates of 'requirements' based on experience with similar software (as a user and programmer) is acceptable, especially since real requirements are difficult to tease out before a few iterative development cycles.

Most PLs today have many terrible properties: massive overheads to get started on a new project, long feedback cycles, frustrating security and environment mockup properties, social barriers to refactoring and extending shared libraries, a lot of accidental complexity from state and explicit caching. Even the best mainstream languages score poorly in a rubric of cognitive dimensions of notation. When suffering such an environment, yeah, I can see how system 2 thinking up front could help a lot. The implicit goal becomes to reuse as much of the prototype as possible with as few changes as possible because anything else is too painful.

One of my guiding hypotheses in PL design has been: Any programming language deficiency can be resolved by discipline and foresight. Relying on discipline or foresight is a sign of deficiency. System 2 thinking up front is ultimately a way of 'relying on foresight'. And I believe it is caused by many non-essential deficiencies in the programming languages and associated toolsets we use today.

I failed the ouroboros test

"Any programming language deficiency can be resolved by discipline and foresight. Relying on discipline or foresight is a sign of deficiency. System 2 thinking up front is ultimately a way of 'relying on foresight'. And I believe it is caused by many non-essential deficiencies in the programming languages and associated toolsets we use today."

I fail to understand this. It reads to me like you are saying something can be created ex nihilo that has zero deficiencies? I figure you can't mean that, really. Pretty please unpack/restate in other terms for dense slow me.

discipline and foresight

"Civilization advances by extending the number of important operations which we can perform without thinking of them." - Alfred North Whitehead.

Same for PLs. In my experience, the presence of up front, repetitive, sustained, or systematic use of discipline and foresight - of heavy duty 'thinking' - has proven a very useful signal that something's rotten. I can frequently trace it to a common set of problems - e.g. lack of extensibility, or poor support for orthogonal persistence, or aliasing of state, or difficult integration of heterogeneous data resources, or weak support for EDSLs, or a slow feedback cycle, or opaque effects.

I have not suggested I can create a PL with zero deficiencies. But many common PL deficiencies can be addressed - mitigated, if not eliminated. Also, any PL I do create is informed by the strengths and flaws of all those hundreds of PLs I've studied, thus development is far from "ex nihilo".

Turing captured it nicely

Instruction tables will have to be made up by mathematicians with computing experience and perhaps a certain puzzle-solving ability. There need be no real danger of it ever becoming a drudge, for any processes that are quite mechanical may be turned over to the machine itself.

System 1 vs 2.

It has been a while since I taught basic programming (to tie in with the timertask in the paper above). One of the things that struck me as the hardest to do was attempting to explain the basic intuition (of programming) to people who are learning to program for the first time. There is so much reasoning that we shortcut via experience when we actually become programmers that the bootstrap process of teaching someone how to get there is incredibly difficult.

The model that I had in my head was an abduction/deduction feedback loop as used in AI as a model of human learning. There is a picture of one in Fig 7 in this page on knowledge and theories. It seems like a good description of the zone that a programmer spends most of their time in: surrounded by the problem, building bits of scaffolding to explain the solution to themselves. My somewhat hacky attempt to explain this to programming students is on slide 16 of this lecture. Apologies to anyone who follows the link, that was back in my beamer days and it is rather ugly on the eyes.

The claim that I would make, which I don't have any particular justification for but I suspect is quite widely believed, would be:

  • Each of the cycles on slide 16 is an instance of the feedback loop from Fig 7.
  • The top half of the loop (Abduction+Deduction) is system 2.
  • The bottom half of the loop (Application+Induction) is system 1.
  • (Good) programmers flip constantly between these two approaches as a natural process of problem solving.
  • Problem solving in programming is naturally recursive so that we can be operating at points on this feedback loop for different scales in the program.

As I mentioned I don't have a particular justification for this, but when I've spoken to other experienced programmers this seems to be how they operate: my own introspection would roughly match some anecdotal "evidence". I've never seen anyone try to capture this theory of how programming works in an experiment, but I know other teachers who also assume this model strongly enough to bake it explicitly into their programming courses. It would be interesting to hear whether this matches other people's assumptions about how the process works, or if in fact it throws up any glaring contradictions with established results.

That all seems fairly

That all seems fairly reasonable. Feedback seems crucial to flipping, which something like Sean's research provides quite nicely. My conjecture is simply that some steps of slide 16's workflow should sometimes intentionally engage system 2 because it's more suitable, rather than it happening accidentally because you happened to catch some bit of feedback that caused you to double-take.

Has your opinion on paper

Has your opinion on paper and pencil's unsuitability been tested against the best prototype development environments? That would be a useful addition to this empirical study thread, as all opinions on this matter seem to be anecdotal at this point.

I'm a firm believer that the computer is meant to augment our intelligence and productivity. It is not just a slave to execute our well-reasoned commands! The question is not whether pen and paper are suitable, but rather what I can do with the computer in front of me, which has a tremendous amount of computational capability these days.

But believe me, I have pen and paper sitting next to my computer right now as we speak, and I use it a lot. I just wish that didn't have to be the case. I would be even more disturbed if I was expected to just work out my problem first before writing code.

One of the tests in the above study demonstrated that engaging system 2 led to a deeper understanding of a scenario than engaging system 1 (the mp3 player description experiment). If so, then system 2 ought to be used at the beginning of design when reading the requirements, and system 1 to write out the relationships you've now internalized, possibly with brief interruptions of system 2 if you're expressing important invariants and such.

I'm not denying that system 2 leads to a deeper understanding. But system 2 is expensive and necessarily mentally taxing; I don't want to initiate it unless I really have to, because there are a lot of things to do in working out a problem. System 2 is basically "suck it up and eat your vegetables, or wear the hair shirt!" Sometimes you need to, but it shouldn't be the first answer.

Viewing software as uniquely, or even primarily, a system 1 process just seems incorrect. Optimal results would seem to require quite a bit of give and take.

Absolutes are always wrong absolutely.

Since you're obviously familiar with Scala, you can write Haskell98 code that's just case classes and straight pattern matching. If you need to abstract over a number of such types, like you would by adding an interface to a number of objects, then you just add a type class and the requisite instances.

This is one reason I felt so liberated by C#. I always felt compelled to aim for premature elegance in Scala, because it had the power to achieve that. With C#, I can just "go ugly early", settle for less elegant solutions (because more elegant ones are not very expressible), and get on with solving the rest of my problems. Humans are innately obsessed with perfection even when it is unnecessary; having a language that tempers expectations is quite useful from a psychological point of view.

Executing Commands

I don't think computers do augment our intelligence, and I don't think there is a single example to date of them doing so. They are just slaves to execute our commands, and not realising this leads to a lot of novice programming errors.

Maybe in the far future, when we have strong AI, machine intelligence will augment human intelligence, but until then I don't find anything a computer does truly intelligent. Intelligence requires understanding what you are doing and what it means. Of course they augment our productivity by doing things fast and repeatably, as well as allowing rapid search of data, which has become very useful now that things are networked together.

don't think computers do

I don't think computers do augment our intelligence, and I don't think there is a single example to date of them doing so.

Who programs without Stackoverflow, Google, these days?

Knowledge vs Intelligence

If anything, Google increases our knowledge; it does not increase our intelligence. Programming is not a test of intelligence. Programming requires intelligence, knowledge, perseverance, and probably a whole host of other attributes.

Pedantry?


in·tel·li·gence
[inˈtelijəns]
NOUN
the ability to acquire and apply knowledge and skills:
"an eminent man of great intelligence"
synonyms: intellectual capacity · mental capacity · intellect · mind

the collection of information of military or political value:
"the chief of military intelligence"
synonyms: information gathering · surveillance · observation

See also https://en.m.wikipedia.org/wiki/Intelligence_amplification

Acquire and Apply

The ability to acquire and apply knowledge (or skills) is not itself knowledge, and is independent of knowledge. Something may possess a lot of knowledge but be unable to apply it, or to acquire new knowledge. Something else may be able to apply the little knowledge it has extremely well, and achieve great things.

You are making up your own

You are making up your own term definitions and distinctions and expecting people to just agree. I don't see how this adds anything to the argument; you can just tell academia they are completely wrong about the existence of intelligence amplification because... pedantry.

Explanation

I was only explaining what the definition you posted actually said; however, you are right that it adds nothing to the argument. If I ignore the part about intelligence amplification, I broadly agree with the rest of what you are saying. I don't agree with the claims of amplifying intelligence, and I don't find anything new in the above comments that I haven't heard before, so I will simply state that I don't believe computers amplify human intelligence any more than an encyclopedia or a hammer does.

overly narrow definitions

Your entire line of argument on the subject strikes me as analogous to: A hammer doesn't augment human strength because it doesn't help the human lift more. Someone with a broader understanding of what it means to 'augment strength' might scoff. A hammer, after all, enables a human to wield their strength with greater effect. What is that, if not augmentation?

Access to an encyclopaedia or a cookbook arguably does augment a human's intelligence. It serves as an external memory, can help a human make better decisions or better comprehend some phenomena. Computers can do a great deal more for us than serve as external knowledge resources.

Knowledge isn't intelligence, sure. Similarly, a hammer isn't strength. Nor are glasses the same as eyesight. But, in each case, the former can augment the latter.

Augmenting intelligence

A computer doesn't need to be intelligent to augment intelligence any more than a sword needs to wield itself to augment our capacity for violence.

But a computer does need to be available, in a timely manner (due to social or physical latency requirements, short term memory, etc.), with appropriate services active, to effectively augment intelligence. Thus, we probably won't see augmentation outside niche or high latency areas before wearable computing and augmented reality are a thing.

Technologies like Hololens are promising in this regard... but don't yet have the slim and sexy form factor needed to become socially acceptable.

Sum of Intelligences

I think to increase intelligence you need to add more intelligence. All a computer is doing in the case of Google is increasing our knowledge; knowledge and intelligence are different things. Someone who spent a long time learning stuff by rote is not more intelligent than someone who knows nothing about that subject. This is why IQ tests give unseen problems, so that you cannot have learned the answer in advance. An unsupervised intelligence test for the 21st century needs to have ungoogleable answers to be a true test of intelligence.

Not at all

Augmenting intelligence only requires tools to make better decisions, or gain better understanding, etc.

Access to knowledge is a valid tool. Knowledge becomes vastly more effective as a tool if placed in timely and spatial context - e.g. to counter memetic bullshit arguments, or to highlight the next ingredients when cooking, or to properly lace shoes and swiftly fold t-shirts without reviewing a document multiple times.

But knowledge is nowhere near the only thing computers can do for us without being 'intelligent'. Tools to express and solve constraint problems - something humans are horrible at - are a way to augment our intelligence. Wolfram Alpha, which makes it easy to explore many variables together with real world data, is another. Expert systems can provide good advice within their domain. Simulations can help teach well understood subjects (physics, chemistry, historical battles, etc.) more effectively and swiftly. We can also improve perception and introduce new senses.

IQ isn't a true test of intelligence. I wouldn't be surprised to see ML systems passing IQ tests without being good at anything else.

Why is more intelligence required?

I think to increase intelligence you need to add more intelligence.

Why? I wonder if this is a claim that you could provide any evidence for, or if you are relying on an underlying assumption.

I would claim that a calculator makes me more intelligent. Clearly it is a far simpler machine than a computer, and it does not contain any intelligence to add to my own. But it does augment a basic process that I use when applying my intelligence to a problem. Without it I am limited to numerical reasoning that I can perform comfortably in my head (a few digits), or I have to switch to using pen and paper to tackle problems of a larger size. If the application of my intelligence to a problem is thwarted because a property of the problem makes its numerical representation too large for me to deal with using my default tools, then it seems to me that that is a lower level of intelligence. I get the impression that you are trying to draw a very fine line between the amount of intelligence and the size of the effect produced by application of intelligence. I'm not really sure what that line is: I'm only familiar with an operational definition of intelligence, so it's not clear to me how we would begin to define it in a non-operational way.

Intelligence is independent from knowledge.

I would argue that intelligence is a property independent of knowledge. An encyclopaedia contains a lot of knowledge, but it is not intelligent. Take a person who owns an encyclopaedia and take it away from them; they are not suddenly less intelligent.

Acceleration

So where would this leave a tool that accelerates the application of intelligence, without adding intelligence or new knowledge? For example, a calculator. Does that increase intelligence, or is the "amount" of intelligence independent of the set of problems that can be solved?

hardly binary

Intelligence isn't some binary thing. It is context sensitive. It is fractal. It is relative. So there is the intelligence of knowing to pick up the calculator in the first place. Applying tricks on how to use it. Of understanding when it can't save you (like how I desperately tried once or twice to get my HP Graphing Calculator to solve my math test problems, and mostly failed). Of figuring out what the ultimate calculator could do that the current ones don't. Any number of things.

Yes

Yes I would agree, but I wondered what Keean's underlying model of intelligence was. If it can only be "increased" by other intelligence then it starts to sound more like a one-dimensional quantity. I wondered how that would work - the only definitions of intelligence that I am aware of are context-sensitive and operational. Can intelligence level X solve problem Y, or the family of related problems Z.

I'm guessing that the ultimate calculator would look a lot like a desktop computer...

Intelligence

The Turing test is one way to test intelligence. I think it is much easier to say what is not intelligent, than it is to define what it is.

Testing

I don't think we have any test that measures "intelligence" in any deeply meaningful sense of the word. The Turing test says more about the judges than the contestants. Mensa, from what I've heard, has plenty of stupid people in it. I use the term "sapience" not because it's any better defined than "intelligence", but because being less used it carries less baggage; better a term we're unsure of than a term we're incorrectly sure of.

Observation of Intelligence

If I observe a student, and note the student has independently thought about a problem and solved it from first principles, I am inclined to credit them with more intelligence than someone who just googled the answer. Therefore the system of student + Google demonstrates less evidence of intelligence than just the student (because intelligence is not just searching for a pre-defined answer using text matching?).

Depends on the question

If you set textbook problem 3122 for a student, and they google that label and copy the answer that they find then they have demonstrated no intelligence beyond understanding the name of the question.

If you set an unusual problem for a student and it cannot be identified by keyword then they must:

  • Identify the salient parts of the problem.
  • Google for information about those parts.
  • Work out how to generalise the information that they've found into the setting they were given.
  • Synthesize a specific solution to their problem using the general / overlapping answers they found.

Not only does using google effectively on a real problem demonstrate intelligence - but it is also a better test of a student's ability to do the job that we are training them for. The real issue is whether or not the tester has taken the time to build a set of questions that are not trivial to find the answers to, and also to remove questions from the set as they appear on stackoverflow.

The presence of google (and open-book testing in general) means that the test of intelligence becomes dynamic rather than static as we must account for an increasing set of accessible background knowledge.

looking up the answer

Yes, looking up the answer doesn't require sapience at all. A computer can outperform humans at Jeopardy! without the computer demonstrating even the slightest glimmering of sapience. It is possible to enhance the effective performance of a sapient mind by giving it the ability to look things up; indeed, long before computers appeared on the scene it was a truism that a college education was largely about turning the student into an expert at recognizing what they need and knowing where and how to find it. It has also long been true that the student needs to become expert in when not to look things up, and there we are getting ourselves into trouble in the computer age because we're building infrastructure that prevents people from using any mode other than looking things up. Without that flexibility, the ability to look things up can go from enhancing sapience to stifling it.