Dynamic typing in OCaml

On many occasions on this site we've discussed embedding a dynamic type checking scheme in a statically typed language using a Univ type. On many of these occasions, it's been suggested that this is an in-principle solution that would never be usable in practice, Turing tar-pit, etc., etc. Well, someone decided to put their money where our mouths are, and now we have Dynaml.

I've only briefly looked at the tutorial, but this definitely goes a long way toward demonstrating a plausible Univ embedding of a dynamic type system. Of course, I'd be curious to hear what everyone else thinks...

(from the caml weekly news...)

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Clean Dynamics

For 3 years, Clean has had a hybrid static/dynamic type system, developed so they could work on mobile code and functional OS. See Towards a Strongly Typed Functional Operating System, 2002



Of course they wouldn't approve if Clean was used as a dynamically type checked language :-)

"It is possible to write CLEAN programs in which all arguments are dynamically typed. You can do it, but it is not a good thing to do: programs with dynamics are less reliable (run-time type errors might occur) and much more inefficient. Dynamics should be used for the purpose they are designed for: type-safe storage, retrieval, and communication of arbitrary expressions between (independent) applications."

Language Report, p80

yeah

Wow; I was thinking about this literally not 2 minutes before coming here and reading this.

What I was thinking was:
A. higher order types
B. the ability to tag values with their type, and thus have a tagged type
C. A special procedure application operator that takes a regular procedure and additional arguments, checks its tagged arguments for arity and type, applies the procedure if everything's ok or else raises an error... just like a dynamically typed language.

So you do (in Lispy syntax):
(dynacall + (tag 1) (tag 2))

And you could also write procedures that take dynamic arguments that could be applied with dynacall.

Skimming the DynaML tutorial, it doesn't appear that it has a standard, general way to apply statically typed procedures to your dynamically typed values. You have to manually extract the integers from the dynamically typed values, and then apply +.

Myths

Dynamically typed programs are not less reliable. That's just a myth. As far as I can tell, noone has ever convincingly showed that they are.

Someone who prefers to program in a dynamically typed language doesn't want to put so much effort as apparently necessary just to make things a little more dynamic. (In other words, if the subtext is that we now have a good combination of static and dynamic typing, then that's also a myth.)

Has the integration of static type checking into (the dynamically typed) CMU CL and SBCL implementations of Common Lisp already been discussed around here?

Pascal

Myths ?

They may not have to be less reliable but, well, (almost) everything which isn't checked by the computer has to be checked by a human being at some point. It's all well for developers who are able to do it by themselves. It's harder when you work in a team where documentations are not precise enough or not available early enough.

In additon, the latency between making a type mistake and finding out about that mistake is much longer when you have to wait for the testing phase to find out this kind of mistakes. As always, the longer it takes to catch a programming error, the harder it is to fix it.

Myths !

  • Not everything has to be checked by the computer, for example, aesthetical characteristics. If a program is elegant, it's easier to understand, but you cannot automatically check elegance.
  • Do you have evidence that team programming becomes harder without static type systems? (A study, for example?)
  • A type error is not a program error. Most program errors cannot be detected by a type system. Many type errors are not errors, but actually requried in programs which require a higher degree of flexibility. So in general, there is no correlation between program errors and type errors. (That's what most advocated of static typing don't get. "Type error" is a term that is completely defined within the boundaries of type systems, there is no connection to the "outside world". The assumed correlation between type errors and program errors is just an axiom that has to be believed. There is no practical evidence that this correlation actually holds.)
  • Therefore, statements that try to make you believe that catching type errors late in the game is a bad thing is actually meaningless.
  • It's true that it's bad to catch programming errors late in the game. But that's a different topic. (See above.)

Mists ?

Not everything has to be checked by the computer, for example, aesthetical characteristics. If a program is elegant, it's easier to understand, but you cannot automatically check elegance.

True. Not in contradiction to what I said, but true.

Do you have evidence that team programming becomes harder without static type systems? (A study, for example?)

No, just my personal experience. And anyway, what I wrote (or what I mean to write) is that, when working in a team, it takes much more careful documentation to avoid what actually amounts to type-related problems. That's not a condemnation of dynamic typing, mind you, just a different set of tools for the same task. I personally often prefer static typing, as I can sleep more easily at night knowing that some mistakes cannot be made by any team-mate (or me). Note that I do not have experience of large team developments. Also note that intermediate solutions, such as pluggable non-compulsory type systems (such as Ocamlexc) are also interesting tools, which I believe would deserve more focus than they currently receive.

A type error is not a program error.

What you meant to say, I believe, is that a static type error is not always a program error. Dynamic type errors, on the other hand, are by definition programming errors. It turns out that, in my experience, most type errors catch what could have developped into actual dynamic type errors. Ergo programming errors.

That might be dependent of the domain you're working on, of course. My latest developments are on a distributed virtual machine (using OCaml's static type system) and on e-book software (using XPCom's dynamic type system).

Therefore, statements that try to make you believe that catching type errors late in the game is a bad thing is actually meaningless.

See above.

doesNotUnderstand isn't always a programming error

In Smalltalk, the error condition #doesNotUnderstand (which corresponds to what static type systems would consider a type error) is frequently used for legitimate metaprogramming purposes, and occur in correct programs. A common use is as a delegation mechanism--if a message send returns #doesNotUnderstand, the class can "trap" the condition and dispatch the message to another object.

As mentioned in the header, retrieval of arbitrary data from external sources (which may be spoofed, incorrect, corrupt, or using an obsolete schema) probably requires dynamic typing to do properly; in this context #doesNotUnderstand doesn't mean the program is incorrect, but that the data being read is.

Dynamic typing is also useful in the context of programs deployed in separate chunks--though static module systems may be a better solution in the long term.

The thing that does annoy me about the dynamic/static debate is that too many people treat it as a binary thing, and/or a matter of "purity". The staticness (or OO-ness, or FP-ness, or dynamic-ness) of a languages is quoted as if it's a figure of merit, and because a tool is well-suited to a particular end, it is proclaimed superior. Like with many things, such decisions are best left in the hands of informed engineers attempting to meet a set of requirements. (Of course, many industrial programmers and architects lack the requisite skill and/or discipline implied by my use of the word "informed", and just do whateverthehell they want, or whatever worked last time...)

Agree

In Smalltalk, the error condition #doesNotUnderstand (which corresponds to what static type systems would consider a type error) is frequently used for legitimate metaprogramming purposes, and occur in correct programs. A common use is as a delegation mechanism--if a message send returns #doesNotUnderstand, the class can "trap" the condition and dispatch the message to another object.

Actually, I would say that this case should be handled by a reflection mechanism within a statically typed context. Java is surprisingly good at providing this type of information.

The thing that does annoy me about the dynamic/static debate is that too many people treat it as a binary thing, and/or a matter of "purity".

Couldn't agree more. I'm not interested in a language that is "pure X", for most values of X. Especially for a language that purports to be "general purpose", it's important that it find a middle ground in order to provide the best tools for the job at hand. Usually, when a language sticks to "pure X", its users end up finding ways around that purity in order to do the things they really want to anyway.

In C++ and Java, there is a time and a place for variant types (hence, Boost.Any and Boost.Variant), which is nothing more than an embedding of a dynamic type system within a statically typed language. Even polymorphic data types are a limited form of dynamic typing, and I don't see anyone in the static crowd saying those should go away.

The dynamic crowd is perfectly happy to let the compiler check their array bounds and manage their memory. Why not let it verify your types now and then? Dynamic typing can get out of control when you don't know whether "2" + "2" == "4" or "22". Recursion is beautiful, except when you have to convert to a less elegant tail-recursive form in order to get performance comparable to iteration. Imperative code makes it easy to write efficient algorithms except when you need to write a whole class to define an anonymous function.

Every paradigm has its strengths and weaknesses, and the languages that survive will be the ones that allow programmers to exploit the strengths most synergistically.

ST is rubber, DT glue

Do you have evidence that team programming becomes harder without static type systems? (A study, for example?)

&

Many type errors are not errors, but actually requried in programs which require a higher degree of flexibility.

Do you have any evidence that most type errors are harmless?

eh. Probably nobody has any evidence for anything. But I guess that's why this is such a fun holy war.

FWIW, I choose static typing. And emacs over vi.

The burden of providing evidence...

Statically typed languages have to reduce the expressivity of the "base" language in order to be able to make predictions about program behavior at runtime. This is because a type check can produce three values: Usage of types is correct, usage of types is incorrect, or it is unknown whether usage of types is correct or not. The last case is the interesting one. Statically typed languages (those with the goal of enforcing correct type usage) have to reject programs for which they don't know whether they are correct or not. So essentially, in the general case they have to reject some programs that would actually be correct. In other words, the programmers' freedom of expressing correct programs is reduced. (That's because the programmers' freedom of expressing any kind of program - correct or incorrect ones - is reduced, and that's intentional.)

So, to put it differently, there is a price to pay for using a statically typed language. The burden of providing evidence whether that price is worth to pay is on those who suggest it, not on those who do not suggest it.

So I don't need to provide evidence that type errors are harmless. However, I have actually provided evidence that statically typed languages can be more error prone in some cases than dynamically typed languages. See Dynamic vs. Static Typing - A Pattern-Based Analysis.

I am happy when a compiler performs static checks and warns about potentially incorrect programs. But I would like to have the right to overrule these warnings and let the program run anyway.

Enough already

It is quite unnecessary to trun every thread into a rehashed discussion of static vs. dynamic type checking.

I for one am losing interest in LtU beacuse of this, and I am sure I am not the only one.

If these wars really have to take place, how about starting separate threads, so the rest of us can keep up with the discussion about the item actually under discussion? Thanks.

Fair enough.

Fair enough.

OK.

OK, you're right. Sorry, I'll stop now.

Small meta-comment

These technical discussions are somehow amusing because it seems each party can live perfectly well with their own paradigm but it can't live with the fact that the other party can live well with theirs too. While dyn typers state that they don't have any code stability problems caused by type errors ( or just very few ) static typers don't be aware of problems with bloat code for sake of fullfilling typing requirements and coding/refactoring speed. So everything is perfect! But in a perfect world no two parties can exist with two radical different opinions on the same matter. The mere existence of the other party proves them somehow wrong: I can't live with my neighbour who prays to another god or elects another political party. That's like a virtual yugoslavia existing at a critical point and is in danger to break into parts each uncertain moment. Sometimes it helps being aware of defects in the human constitution more than of the wealths of PL features.

Kay

Small meta-response ;)

I am not against static typing. But claims that static typing is clearly better than dynamic typing are wrong.

No.

Static typing is more restricted, and thus objectively easier to make perform better.

You're (hypothetically) apply

You're (hypothetically) applying static and dynamic typing to the same program and looking at the result. I don't think that's fair: statically-typed and dynamically-typed programs are organized differently.

I mean that, in the real world, static typing isn't very important as a tool for program correctness. The value of static typing lies in program organization and style.

There's a distinct "modern C++ style" (lots of small classes with no virtual methods; heavy use of templates for program generation; transactional exception-safety). There's a "modern Java style" (community-standard interfaces with many implementations; inner classes; runtime configurability). There is a "Haskell style" - I'm not smart enough to characterize it yet, but I know it when I see it :-)

I stand by the opinion that those styles (born mainly from the respective type systems) are valuable and should be studied. If you say "dynamic typing is the way to go", you're throwing out the baby with the bathwater.

Types also add expressiveness

Type systems indeed do restrict expressiveness at one end. But you are completely ignoring the fact that, at the same time, all but the most primitive ones also add expressiveness at another (via explicit type constraints, type abstraction, value-independent type dispatch, etc.). Hence your conclusion about proof obligation is a non-sequitur.

Added expressiveness of static types

It's correct that static types can add expressiveness at another level, but I didn't ignore it, I just didn't talk about it because it's irrelevant for the point I was trying to make.

The claim that static type systems help to reduce program errors, i.e., that there is a useful relationship between type errors and program errors, needs convincing evidence (not just data points). If you simply drop that claim, you can still enjoy the shift of expressiveness from the dynamic to the static level if that helps you better expressing your programs - I don't want to convince anyone that dynamic typing is strictly better either, because that's also not true.

strange association...

i know many perl hackers die by vi. and many C++ guys are all emacs. but then, emacs is also absolutely essential for Lisp hackers and many other languages...

i don't think you should imply that emacs is closer to static typing.

i'm an emacs guy and mostly favoring dynamic languages...

emacs & ST

I wasn't implying anything, I was just choosing sides in another holy war that has no evidence.

Myths ?

A type error is not a program error.

Ah, huh? Not all program errors are type errors, but all type errors are program errors. (4 + "four") may not be a type error in your favorite language (although by doing so, the language may have violated the traditional semantics of "+"), but anytime you try to use a value in an operation that isn't defined for that value, you will get a failure at some point. What happens with that failure is another issue. "Type error" may only be defined within the boundry of a type system, but dynamically typed languages, like all reasonable languages, have type systems, too. Could you give an example otherwise?

Dynamic typing is easier to implement than decent static typing, and indecent static typing is a very bad thing, but that really is a different topic.

foo 4:r = 1 : foo rfoo "fou



foo 4:r = 1 : foo r
foo "four":r = 0 : foo r
foo nil = nil

foo [4 ,"four"]

Would probably be a type error in haskell. (I'm not sure I don't use the language personaly.) But with dynamic typing it could verry well be part of a working program.

Intent

If nothing else, types help clarify intent. Is that function returning a list of Foo's or Integers?


data Foo = I Integer | S String | Nil deriving Show

main = do print $ foo [I 4, S "four"]
          print $ bar [I 4, S "four", Nil]

foo :: [Foo] -> [Integer]
foo (I 4:r)      = 1 : foo r
foo (S "four":r) = 0 : foo r
foo [] = []

bar :: [Foo] -> [Foo]
bar (I 4:r)      = I 1 : bar r
bar (S "four":r) = I 0 : bar r
bar (Nil:r)      = Nil : []


>If nothing else, types help

If nothing else, types help clarify intent.

Yes. Static typing is good for alot of things. But its not without a cost.

foo [I 4, S "four"]

is not the same as

foo [4 , "four"]

Is that function returning a list of Foo's or Integers?

Both. In a dynamicaly typed language you could use a Integer any place you could use a Foo making Integer a subtype of Foo. (If you define Foo as the type that foo take a list of as argument.) Obviously I didn't define any type Foo so in that sense foo obviously return Integers.

foo in haskell

Soon, GHC will support impredicative types, allowing code like:

class Foo a where foo :: a -> Int
instance Foo Int where foo 4 = 1
instance Foo String where foo "four" = 0

x = map foo [1,"four"]

Type error...

Make this: "A static type error is not necessarily a program error."

I have hoped that this was clear from the context.

Static typing needs to exist

Static typing needs to exist because:

1) it gives some of us a warm fuzzy feeling, just like the one you get when your tests pass successfully

2) it yields interesting research (e.g. STM)

I suggest we stop the static/dynamic wars.

Static / dynamic "wars"

An end of flame wars between advocates of static and dynamic typing is a noble goal. But this will only work when both sides stop to suggest that their respectively preferrred programming style is "clearly" the better choice and the other will lead to chaos (or "unreliable programs", whatever that means).

Programs written in dynamically typed languages are not less or more prone to errors than programs written in statically typed languages. People who claim otherwise have the burden of providing evidence, but that rarely happens, except for the occasional anecdotal stories that don't prove anything.

If you prefer one programming style over the other, and are productive and happy with it, that's fine, go ahead! But be aware of the fact that yours is just a subjective preference, not an objectively better approach.

World peace

If you prefer one programming style over the other, and are productive and happy with it, that's fine, go ahead! But be aware of the fact that yours is just a subjective preference, not an objectively better approach.

I'd agree, but you don't seem too convinced yourself. Quote:

Static typing is more error-prone that dynamic typing.

:-)

...and happiness...

OK, you got me. Read the paper, though, it becomes clearer what I have really meant.

I will change that paragraph on my website. Thanks for spotting this.

3)

It can guarantee certain performance or complexity characteristics without running the program. (And thus hitting the Halting problem and undecidability.)

This is the big one, actually.

3) ?

I don't know about you, but usually, when my program hits the Halting problem, I have more pressing trouble than typing. Such as trying to locate that nasty infinite loop :)

3-)

Ha! I make sure none of my programs never ever halt, see my ocaml source code (!). Why would I need halting programs, anyway? It's not like the problems cease to exist, in general. Anyway, halting is seriously uncool, and a solved problem since the 60s.

Btw, it's not like motorola since the HC11 is doing something to fix the halting problem.

I work on embedded instruments...

...that have to run 24/7. I don't want my programs to halt, at least not until the user turns off the power. :)

Uhm.

All I meant is the fact that dynamic run-time checks cannot guarantee correctness.

Of course, nothing can guarantee "correctness"

...even in the presence of a perfect specification (one that is consistent, comprehensive, correct, and computer-readable). Given that specifications for computer programs rarely even get ONE of the four C's right, the best programmers can do is approximate a proof of correctness.

Guaranteeing correctness is a red herring. Improving the chances of correctness, however, is not.

Static type systems do allow automated reasoning about certain facets of a program's behavior, at the (small, in this case) cost of not being able to accept (statically typecheck) some number of type-correct programs.

How important or useful this feature is to getting some project X done is highly dependent on the nature of X. I've said it a zillion times, and I'll say it again:

* Insistence on only one sort of type system (manifest/latent, static/dynamic) as suitable for all of computer science is foolish.
* A skilled practitioner of programming will be well-versed in both, and able to apply either as it makes sense.
* Decisions as to what techniques and tools are appropraite for a particular project are best made by informed software engineers, not by research scientists investigating a new technology (or by uninformed programmers who blindly use whatever they know or like best, even if it's a square peg being driven into a round hole). Just because a metallurgist can develop higher-strength steel, doesn't make it his business to dictate to civil engineers how they should build their bridges. And just because a post-and-beam bridge worked when crossing Muddy Creek, doesn't mean that a civil engineer should use it for crossing the Mississippi.
* Decisions as to what techniques and tools are (in)appropriate for the programming/CS community as a whole should be made by that community as a whole; not by individual contributors spouting platitudes.
* Unfortunately, the discpline has far too few informed software engineers among its members, and an astonishing lack of professionalism. This is compounded by the fact that many technical decisions on software projects are made by persons unsuited to the role. Both are compounded by the fact that many customers of programming services (whether direct employers of programmers, or those who hire consultants) are unwilling to pay for top-drawer expertese or professionalism. This is further compounded/evidenced by an astonishing disregard--if not contempt--for the work of the research community among many members of the programming profession (including many otherwise competent programmers), including several programming subcultures who view the discipline as fundamentally art rather than science.
* This lack of professionalism increases the temptation for research scientists (unfettered by the concerns which plague industrial programmers) to assume the mantle of computer ethicist, and engage in the spouting of platitudes--often for or against some tool or methodology, on the often-disputable grounds that is is either wondrously good or inherently evil. Such harangues often falls on deaf ears, and only serves to increase the gulf between theory and practice.

Now that I've offended just about everybody with my rant, I'll stop now. :)

Beam me up!

I totally agree. Unfortunately, to make it into a real statement you need about four more commandments and a bit more biblical tone. ;-)

Commandments

How about this:

Thou shalt have no other Codd but E.F., and Larry Ellison shall make his profits.

Any others? :)

Going downhill...

I always thougt Codd was about relational databases, what has Ellison got to do with that?

Now, now...

lets not drink TOO much of the dbdebunk kool-aid, shall we?

Oracle, Sql Server, Informix, MySQL, and the rest of 'em are relational databases under the hood. It's just SQL that ain't relational. :)

I feel OCaml is a complex eno

I feel OCaml is a complex enough language already: type inference, parameterized types, objects, labels and polymorphic variants, modules, etc. With the addition of Dynaml, the language becomes about as complex as C++.

Read the tutorial (especially the section on equality functions), and I think you'll feel the same way.

This is probably a fair comme

This is probably a fair comment about making O'Caml too complex, which is why this project isn't really intended to become "part of" O'Caml. It's just an experiment.

I also agree that the stuff on dynamic equality needs work. I don't think there is anything there that is intrinsically complicated, just the current implementation/docco is lacking. If you were writing a list equality function in O'Caml, it would probably have type:

('a -> 'a -> bool) -> 'a list -> 'a list -> bool

Which is very close to what dynaml wants. The complexity comes from the fact that we cannot (yet) overload on multiple parameters, so the function has to be uncurried:

(('a * 'a) -> bool) -> ('a list * 'a list) -> bool

And then the other complication is explaining how dynaml automatically finds the first parameter, based on what values you give it to compare.

The overloading stuff is very new, it's only just been implemented, and was intended to demonstrate dynaml's power, not it's simplicity. I think it will improve and simplify with time. And maybe the tutorial could benefit from using a simpler example in this section, eg an overloaded function for creating string representations.

Some "evidence" in the static/dynamic typing debate.

Here's a piece from my own experience:

I have a reasonably sized application (~140000 lines) written in a dynamically typed language (python). This has been developed over more than 6 years by a team of 10+ developers. It has test suites, but they are by no means complete.

I'm certain that there are type errors present in the code. These will have been introduced due to re-factorings over the years, in corner cases not adequately exercised by the test suites. Occasionally these will show up for users, and be hastily patched. This is a strong argument for comprehensive testing, especially in dynamically typed languages.

However, if the application were written in a statically typed language it could not have these type errors. This is a fact.

So, static typing provides guarantees that certain classes of errors cannot occur if the code compiles. The more sophisticated the type system, the broader the guarantees. Dynamic typing provides no guarantees, but only a sliding scale by which the probability of errors can ben reduced by increasing the testing effort.

The open (and unresolvable) question, of course, is whether the guarantees associated with statically typed code outway the additional costs in developing it...

Evidence (of bugs and cost reductions)

This is a strong argument for comprehensive testing, especially in dynamically typed languages.

Unfortunately, the state space of even a relatively trivial program is sufficiently large that "comprehensive" testing simply isn't feasible. On the other hand, as dynamic language advocates are quick to point out, "not all program errors are type errors". Neither approach is going to catch every possible error. But a combination of static type-checking and lots of testing will catch more bugs than either approach alone. Personally, I'm all for a belt-and-suspenders approach when it comes to bug elimination.
The open (and unresolvable) question, of course, is whether the guarantees associated with statically typed code outway the additional costs in developing it...
Ah. Since "evidence" is the emphasis in this thread, I assume that you have some evidence to support the assertion that developing statically type-checked code costs more than developing dynamically type-checked code. Evidence which also supports the notion that it's actually the dynamic nature of the language that results in lower development costs, and not the inclusion of higher-order programming constructs and higher levels of abstraction that produces cost reductions.

I agree...

Unfortunately, the state space of even a relatively trivial program is sufficiently large that "comprehensive" testing simply isn't feasible

Yep.

I'm all for a belt-and-suspenders approach when it comes to bug elimination.

Me too, after experience with team development of large static and dynamically typed systems.

I assume that you have some evidence to support the assertion that developing statically type-checked code costs more than developing dynamically type-checked code.

Not at all. I was not intending to say that the costs were higher. Only that they might be (I'm not convinced they are). Instead of writing "the additional costs", I should have written "any additional costs".

QuickCheck

In my experience, the 'bug space' of a program is easily found with a tool like QuickCheck. QuickCheck finds corner cases and other problems quickly. It still depends on the quality of your test data generator, but those aren't difficult to write.

I think test driven development decreases the cost of developing software, whether dynamically or statically typed. I wrote up a TDD extension for QuickCheck that saves failing test values for later retesting, it works for me.

I do have years of experience dealing with large codebases in Python. I often find code in Plone and its extensions that never could have worked. In those cases I wish for static typing, but I think unit tests would help just as much to improve the code quality (though ina different way).

--Shae Erisson - ScannedInAvian.com