Flexible Addition of Static Typing to Dynamically Typed Programs

An idea occurred to me the other day, and I was wondering if anyone has seen anything like it. I'm sure I can't be the only one to have thought of this. The idea is to take the type annotations that we normally have in statically typed programs and place them in a separate file. A program in a language which allows this would look like it was dynamically typed, but static typing could be added progressively by adding declarations in a separate file. A good IDE could present a merged view of the source.

The thought behind this is that languages seem to mix two concerns in the program text: 1) the form of the program, and 2) an error detection regimen enforced by the compiler. It seems that they could be decoupled. A piece of program text could, for instance, have one set of types in one program, and another set of types in another.

Note that this has nothing to do with type inferencing. The idea is to do this for a "run of the mill" dynamically typed language.
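
To make this concrete, here is a minimal Python sketch of the idea (the file names and the DECLARATIONS table are hypothetical, not an existing mechanism): the program text carries no annotations, and a separate file supplies the types that a checker, or an IDE's merged view, could use.

# program.py -- the source looks dynamically typed; no annotations appear here.
def area(width, height):
    return width * height

# program_types.py -- the hypothetical separate declarations file. A checker or
# an IDE could read it alongside program.py and present a merged, typed view,
# without the program text ever mentioning it.
DECLARATIONS = {
    "area": {"width": int, "height": int, "return": int},
}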

Here's a link to a blog describing the idea.

I apologize for the lame attempts at entertainment at the beginning of the blog. They, regretfully, obscure my point a bit.

I like it!

I've been reading this site for a while now, but this is the first time I felt compelled to create an account so I could post. I think this is a great idea to explore. I don't know that I would go as far as to say that strong typing is a form of bad coupling... but I have come to endorse the idea that strong typing can often give a false sense of security as well as get in the way of stating intent simply and clearly.

Another reference

You should check out Gilad Bracha's paper on optional types. It's linked through this page.

Oh that's a nice paper

Suddenly, I don't feel so alone anymore. I've come to many of the same conclusions.

It's a pity he isn't doing his research in Squeak.

Types of individual expressions?

It seems your desire is to have types checked at the property/method boundaries (others have proposed a module boundary). The problem is that these methods are composed of individual expressions, for which you haven't provided a way to supply type information. This is going to make it rather difficult to do much static type checking. The best you could do is have the compiler roll out a contract check that verifies the runtime types of the function parameters and return values (a pre and post contract). That might be a nice way to auto-generate tests, but it certainly doesn't qualify as static type checking - at least not a sound one.
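
For what that rolled-out contract check might look like, here is a rough Python sketch (the contract decorator and its names are invented for illustration): the declared parameter and return types are verified when the function is called, a dynamic check at the boundary rather than static type checking.

import inspect

def contract(**declared):
    """Turn externally declared types into pre/post condition checks."""
    def decorate(func):
        params = list(inspect.signature(func).parameters)
        def wrapper(*args):
            for name, value in zip(params, args):                   # precondition
                if name in declared and not isinstance(value, declared[name]):
                    raise TypeError(f"{name} must be {declared[name].__name__}")
            result = func(*args)
            if "returns" in declared and not isinstance(result, declared["returns"]):
                raise TypeError("return value has the wrong type")  # postcondition
            return result
        return wrapper
    return decorate

@contract(n=int, returns=int)
def double(n):
    return n + n

double(2)      # fine
double("2")    # fails the precondition at call time, not at compile time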

One other point that should be conveyed is that type systems should not be viewed as an impediment to getting things done. Rather, one should view the act of constructing programs as an implementation detail for manipulating types. Whether one believes that dynamic or static typing is the best path, you still have to be cognizant of types. Indeed, a programmer in a dynamically typed language should have more type discipline than those in static languages, as the compiler is not guiding your types for you.

Yes

I see your point about local expressions. Perhaps a little bit of type inferencing could help, giving a local the type of its initializing expression for example. I admit that this is not thought-through all the way.

Regarding type systems as an impediment, I admit that I have a different perspective. For the past three years, I've been traveling around helping people try to get unit tests in place in statically typed code that was not designed for unit testing, and it is a rather horrible experience. You have to do a lot of work to break dependencies that are exacerbated by the dominant type systems in the industry. I admit that my experience is atypical, but I really do think it points to a problem.

Cognizance of types while developing new code? I fully agree. It's important whether we work statically or dynamically.

typeof(x++) = ?

Taking a simple case of addition, one might suppose that, given x is an integer and 1 is an integer, the result of

x += 1

is also an integer. But if we define an integer as a value that has a finite range (composed of 31 or 32 bits), then we have to worry about the value of x causing an overflow in the arithmetic. A static type system can not guarantee the absence of an overflow error (though there is work being done in this area). The only thing that it can guarantee is that if the operation is successful, the value in x will be of type integer.

Many dynamic languages, however, don't like these range boundaries, so they silently promote the type of x from an integer to a bigint. From a typing standpoint, this means that you cannot statically guarantee that the result of the addition will be of a certain type - it may be int or it may be bigint.
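
A toy Python model of that silent promotion (the Int32 class is invented purely for illustration; Python's own integers are already unbounded): the result type of the addition depends on the runtime value, which is exactly why it cannot be pinned down statically.

INT_MAX = 2**31 - 1

class Int32:
    """A fixed-width integer that silently promotes on (upper-bound) overflow."""
    def __init__(self, value):
        assert -2**31 <= value <= INT_MAX
        self.value = value

    def __add__(self, other):
        result = self.value + int(other)
        if result > INT_MAX:      # the sum no longer fits the declared range,
            return result         # so promote to an unbounded int ("bigint")
        return Int32(result)

x = Int32(INT_MAX)
print(type(x + 0).__name__)   # Int32
print(type(x + 1).__name__)   # int: same expression, different result type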

Of course, this points to a different sort of problem with static type checking. First, why have the distinction between int and bigint? Well, there is a matter of speed. Second, if we can have a definition of a discrete range, why can't we have more control over that process - statically guaranteeing that a number is in the range of plus or minus 10 for example?

puzzled

"A static type system can not guarantee the absence of an overflow error... The only thing that it can guarantee is that if the operation is successful, the value in x will be of type integer."

"why can't we... statically guaranteeing that a number is in the range of plus or minus 10 for example?"

Wouldn't the inability to statically guarantee the absence of overflow also explain why we couldn't statically guarantee the absence of underflow and overflow for a subrange?

Different ends of the problem

Not being able to statically prove that an overflow can't occur on the predetermined fixed-size ranges would also be a problem if one opens up the capability to have subrange types. However, many (if not most) static compilers don't allow you to statically guarantee a subrange. Specifically, if the type were truly a subrange of plus or minus 10, the operation x = 10 + 1 would not fail, and x would hold a value that violates its declared type (i.e. there is no runtime error). The next operation on x cannot assume that the subrange of x has been guaranteed.

Of course, there are static languages (like Pascal and IIRC Ada) that do have the capability to define subranges beyond the normal 8, 16, 32 and 64 bits. But that's the question I was posing: if enforcing these discrete subranges is a good thing, then why not supply the same guarantees for types that range over a different set of numbers? No particular reason it can't be done, other than it makes mixing these number types with other number types more complex.
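
For comparison, here is roughly what the Pascal-style guarantee looks like when it is only enforced at run time, sketched in Python with invented names: the declaration narrows the base type, and a guard checks every assignment, so x can never silently hold an out-of-range value.

class Subrange:
    """A declared subrange of the integers, checked at assignment time."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def check(self, value):
        # The guard a compiler would emit for every assignment to this subrange.
        if not (self.lo <= value <= self.hi):
            raise ValueError(f"{value} is outside {self.lo}..{self.hi}")
        return value

SMALL = Subrange(-10, 10)

x = SMALL.check(10)        # fine
x = SMALL.check(10 + 1)    # raises ValueError: a runtime guard, not a static guarantee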

statically prove subrange overflow

My puzzlement came from the way you seemed to ask for static guarantees on subranges, after already acknowledging that a static type system can not guarantee the absence of overflow.

Now I think you are just talking about having a representation for subranges that can be statically checked for consistency, and range checked at runtime.

"Subrange types were introduced in Pascal (and adopted in Modula) for two reasons: (1) to indicate that a variable accepts a limited range of values of the base type and to allow a compiler to generate appropriate guards for assignments, and (2) to allow a compiler to allocate the minimal storage space needed to store values of the indicated subrange. This appeared desirable in connection with packed records. Very few implementations have taken advantage of this space saving facility, because the additional compiler complexity is very considerable. Reason 1 alone, however, did not appear to provide sufficient justification to retain the subrange facility in Oberon.
N. Wirth - From Modula to Oberon

Probably got off track...

Really just touching the tip of the iceberg when it comes to type inference. From a dynamic standpoint, you could elect to make the type of all whole numbers be the same - with no consideration for the typical machine sizes (i.e. really big int). But then you might have to sacrifice some of the speed gains that static type systems usually try to take advantage of. If you elect to use the machine ranges, then you have to suppress the auto-conversion from one type representation to another.

Anyhow, since the original poster was using the example of an increment operation over an Integer type, I was just wondering what he meant when he assigned the type integer. Is this a range-limited integer in terms of a register machine (2^32 bit combinations)? Or is it an integer as in the infinite variety? Either way can be, and has been, made to work. It's a multitude of little issues like this that have to be decided in making a type system.

WRT Wirth, the problem that I saw with Pascal's range types is that the math between values with different subranges was fairly hard to get a handle on. For example, if I multiply two numbers that range between 0 and 10, what is the range of the result? Is it 0 to 10; or 0 to 100; or is it just promoted to a non-subrange numeric type (32 bit, etc.)? And it becomes even more complex when you start doing math on numbers that have different subranges.
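
Interval arithmetic gives one plausible answer to the multiplication question: the tightest result range comes from the corner products of the operand ranges. A minimal Python sketch (my own, not how any particular compiler does it):

def multiply_ranges(a, b):
    """Given value ranges a = (lo, hi) and b = (lo, hi), return the range of a * b."""
    corners = [x * y for x in a for y in b]
    return (min(corners), max(corners))

print(multiply_ranges((0, 10), (0, 10)))    # (0, 100), not (0, 10)
print(multiply_ranges((-10, 10), (0, 5)))   # (-50, 50): mixed subranges compose too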

TaPL

I'm at work so I don't have it with me, but the questions you are asking are almost straight out of chapter 8 or 9 of Types & Programming Languages.

As far as I understand, it's not possible to guarantee statically that the operation won't overflow without running the program.

If I remember correctly, GHC's Integer (big int) uses a native Int until it needs more space as well.

You can optimize bigint...

...to use machine ranges when it is small (byte to short to int to long to bigint). However, this is just an optimization trick, in that the type is really always bigint. There is some overhead in this method, though: whereas overflow of a plain machine integer can just either (a) be ignored or (b) throw an exception, here each math instruction has to check for an overflow and change the representation (though a bit or two can be sacrificed to encode the current value's representation).

Concrete examples?

You have to do a lot of work to break dependencies that are exacerbated by the dominant type systems in the industry.

Could you give some examples? I don't understand how static types are hurting. What kind of dependencies exist that don't with dynamic typing?

Dependencies in static languages

Here's the simplest one: in C++ (or Java, C#, etc) a concrete class A has a method that receives an instance of a concrete class B. It's hard to test that method if you don't want to create a B or if B does something nasty that you wouldn't like to have happen in a test. There are many ways to solve this one, the simplest is to extract an interface. A while back, I wrote a book about my experience getting past these sorts of testing and dependency issues. It's called 'Working Effectively with Legacy Code' and it ends with a catalog of 24 dependency breaking techniques.
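
The same move, translated into Python-with-annotations terms (all class names here are invented): extract just the operations A needs from B into a structural interface, so a test can hand A a lightweight fake instead of a real B.

from typing import Protocol

class B:
    def do_work(self) -> None:
        ...  # imagine this talks to a database, which we don't want in a test

class BLike(Protocol):
    """The extracted interface: only what A actually needs from B."""
    def do_work(self) -> None: ...

class A:
    def method(self, b: BLike) -> None:   # was: b: B
        b.do_work()

class FakeB:
    def do_work(self) -> None:
        pass

A().method(FakeB())   # the test no longer has to construct a real B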

class Foo implements SomeConcreteClass

I think adding support to Java for declaring some class "implements" another class would solve this. It should be possible to make a class Foo a subtype of the public interface implied by SomeConcreteClass simply by declaring that it implements that class (and of course actually implementing those methods).

The public interface implied by a class would simply consist of that class's public methods and fields. The "public fields" part makes this non-trivial, of course; but then again, I think treating (public) fields differently from methods was a bad idea (people end up writing getters and setters themselves anyway, so no speed gain there).

If public fields are dynamically bound and if you can override them with a pair of methods to do custom setting and getting, I see no reason for barring them from interface definitions. I think the "implementing a class"-feature would then really be trivial.

I think adding support to

I think adding support to Java for declaring some class "implements" another class would solve this.

Well, in Java you can just extend the class and override methods, assuming that the class isn't final. That would also fit in with the dynamic typing model.

Thanks Michael for posting the example. The "catalog of 24 dependency breaking techniques" sounds interesting. If you have a writeup somewhere that has escaped book form I'm sure we'd all be interested :)

Java only has single inheritance

Well, in Java you can just extend the class and override methods, assuming that the class isn't final.

... and assuming that's the only class you're inheriting from.

and update all construction sites

Note that your suggested solution requires one to update all points in the code where objects of that class are constructed to call your new subclass's constructor.

(And, as another person noted, it also assumes that you have not extended that class elsewhere.)

I think your solution's need to update the code's construction sites really highlights why we need a new linguistic feature for this; the construction sites for A should be oblivious to the superclasses of A, including ones that were declared after the fact.

Recap

(First off, thank you for pointing out the things I hadn't thought of; interesting stuff. I should've phrased my post more carefully, as I do realise that you can hardly 'solve' such a hard problem by thinking about it for a couple of minutes.)

My suggestion really only boiled down to language support for the 'extract interface'-refactoring. There are of course many more facets to this problem, as you and others suggested. I tried to summarise the different points (I apologise if I misunderstood, please correct me if I'm wrong or if I missed any):

  • There may be several construction sites (although the original post was simply talking about passing a mock instance to some other class)
    • I think this could be solved by support for virtual classes,
      which is (more or less) orthogonal to the language feature I suggested. It should be possible to (1) override a final class and (2) specify whether you want to inherit from the class you're overriding or you only want to establish a subtyping relation (the latter probably being your only option if the class was final).
  • You're forced to call the super-constructor in a subclass (only applies when extending a class, not when you simply implement its implied interface)
    • See Smalltalk for a more elegant solution (separate allocation, the task of the class, and initialisation, an instance-level thing)
  • You can simply extend the class and override some methods
    • Well, I was talking about the case where:

      • you simply don't want to inherit any part of the implementation of the original class, you only want to be a (behavioural) subtype
      • the class you want to be a subtype of is final, or has final methods that you want to override
      • there are multiple classes you'd like to subtype (not subclass)
  • People may add methods to the class you're extending
    • Don't see how to handle this in the language, besides the research on the Fragile Base Class problem (related work: reuse contracts, selective open recursion). In general however, I think that, if you change a (public/published) interface, you have to deal with the dependencies you break.
  • Java treats properties and methods differently
    • The difference between a field and a getter&setter method should be hidden (thus allowing you to override a field by a getter&setter-pair and vice versa, as in Scala or C#)

In summary, I think coupling classes&types and subtyping&subclassing is a good default, but it should be easy to separate them. That's what I had in mind with my initial suggestion.

Despite the fact that 'final' is a tool of the devil..

Yes, you can extend and override methods if the class isn't final.. and if there isn't anything happening in the constructor or any superclass's constructor that you wouldn't like to have happen in a test.

Extend and override is a quick fix, but I like to go to interfaces because it's pretty easy for people to add new methods to the class you're extending, forcing you to go back to the extension to override those new methods. It's also bad from a dependency point of view.

I don't have a writeup of the 24 techniques outside of the book :( but a few are named on this blog. I need to write more articles.

Not so much a static typing problem...

...as it is Java's (and C++'s) (a) conflating the idea of defining types with code inheritance; (b) having a difference between properties and methods; (c) having two methods of building types (interfaces and class inheritance); (d) having a proclivity for final (or non-virtual) methods. All of these things create grief in terms of substitution (polymorphism).

Use ML functors to break dependencies

There are many ways to solve this one, the simplest is to extract an interface.

Unfortunately, as we both know, that is rather intrusive in the mainstream statically typed OO languages.

In ML you can define an interface for B non-intrusively, make the module that you are actually working on into a functor, and instantiate the functor once for the actual production code and another time for tests, using the desired test stub code instead of the real B. Let's see roughly how it would look.

Here is the code that you want to test:

structure Foo =
   struct
      fun bar (a_baz : Baz) = (*...*) fooBar a_baz (*...*)
      (* ... *)
   end

It directly depends on the Baz type and performs operations (the call to fooBar) on Baz values. So, we turn the Foo module into a MkFoo functor:

functor MkFoo (type Baz
               val fooBar : Baz * Baz -> Baz
               (* ... And specifications for other
                  necessary operations on Bazes ... *)) =
   struct
      fun bar (a_baz : Baz) = (*...*) fooBar a_baz (*...*)
      (* ... *)
   end

Then we instantiate the functor once for the production system:

structure Foo = MkFoo (type Baz = Baz
                       val fooBar = fooBar
                       (* ... *))

This is quite mechanical. In particular, you don't have to modify anything inside the struct ... end module definition of the original code.

For testing, you simply instantiate the MkFoo functor with the test stubs:

local
   structure Foo = MkFoo (type Baz = MyTestBaz
                          val fooBar = myTestFooBar
                          (* ... *))
in
  (* unit tests for Foo *)
end

More or less how the Erlang type system used to work

Back when there was research into adding more type information to Erlang programs (something that appears to have gone away).

Very like Common Lisp

Common Lisp declarations (with a few exceptions) work exactly like this, except that they are not in a separate file. A declaration states that the value of an identifier always belongs to a certain type. Implementations may ignore a declaration, use it to generate a run-time check, or generate code which assumes that the declaration is correct.

The local-expression issue is handled by syntax:

(the <type> <expr>)

is an expression whose value is <expr>, and which declares that the value of <expr> belongs to the type named <type>. It too may be ignored, used to generate a run-time error, or assumed correct.

The problem with all such schemes is that it is a black art to know just which declarations should be added to optimize for performance and/or safety on particular implementations (and it is possible to construct declarations which actually pessimize these things).

CMUCL docs

The problem with all such schemes is that it is a black art to know just which declarations should be added to optimize for performance and/or safety on particular implementations

For CMUCL the Advanced Compiler Use and Efficiency Hints section of the manual is very good. I think it applies to SBCL too.

There was the Strongtalk system

Strongtalk was a Smalltalk system that allowed optional type annotations of variables and would generate optimized code based on the annotation. It would also generate generic code for cases where the variable didn't happen to contain an object of that type.

Sadly, Sun acquired the company working on Strongtalk and forced the programmers to use their powers for evil - making them build the Hotspot runtime optimizer for the dreaded Java language.

Yet another instance of a big company with too much money setting the state of the art back another 10 years.

forced?

forced the programmers to use their powers for evil...
Are you suggesting they were unwilling participants?

Perhaps "sold out" is a better term

Seems that everyone has their price.

their choice

Let He Who Is Without Sin Cast The First Stone

There must be a reason I'm not rich

that might be part of it. :-)

?

too much time spent throwing stones?

It was a pun

or palindrome

or something

If types are optional...

...then type declarations can't change the meaning of your program (obviously). To programmers used to today's dynamic languages or things like Common Lisp this usually sounds like a good thing.

This ends up meaning that you can't rely on types for static overload resolution (although multimethods can provide similar behavior). Perhaps more importantly, it also means that the effect of the expression "x.foo" for a particular value of x must be independent of the context where this expression appears (including the declared type of x, if any).

This doesn't sound like a big deal, but it precludes having a class with two members with the same name. If you don't think that this is a problem, then look at how creators of Python libraries typically mangle the "internal" member names of their classes to avoid collisions (or how Common Lisp users will create slots whose "names" are unique symbols rather than strings).
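
For reference, Python's own double-underscore convention is one such mangling scheme: a member named __state is stored as _ClassName__state, so two classes in the same hierarchy can each keep a "private" member of that name without colliding.

class Base:
    def __init__(self):
        self.__state = "base"        # stored as _Base__state

class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__state = "derived"     # stored as _Derived__state, no collision

d = Derived()
print(d._Base__state, d._Derived__state)   # base derived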

Many people make the mistake of assuming that type information is only useful for proving some correctness properties on your program, or providing hints to an optimizer. Once type declarations are required rather than optional, though, you can also use them to resolve the meaning of code from its lexical context.

Type Driven Overloading Considered Harmful

Many people make the mistake of assuming that type information is only useful for proving some correctness properties on your program, or providing hints to an optimizer. Once type declarations are required rather than optional, though, you can also use them to resolve the meaning of code from its lexical context.

I can only assume you mean type-driven method overloading as implemented in C++ or Java. Having worked in languages with both, I've concluded that overloading just isn't that valuable and only adds confusion. I just don't see the benefits of being able to say

foo.bar(7)
foo.bar('seven')

rather than

foo barInteger: 7
foo barString: 'seven'

In fact, the second version is more explicit and obvious - so I prefer it.

Method overloading isn't the issue...

[Apologies for the long reply, but I am not smart enough to make a short one...]

I agree that static method overloading doesn't buy you very much, although there are two obvious counter-arguments/questions:

1 - Isn't decorating your method calls with the explicitly type-qualified names "Integer" and "String" an ad-hoc equivalent of adding type declarations on your variables?
2 - How would you deal with the ad-hoc polymorphism that comes up for things like mathematical operators? Without static overloads you either have to write one uber-function that handles all cases (not modular) or use explicit functions with type-qualified names instead of handy operators.

The real problem has more to do with "unintentional" overloading, of either data or function members, and the way it breaks modularity. This can arise for example if I want to make a single object that satisfies two protocols (A and B), which both happen to use the method name "foo" (for different reasons). If type decls are required, then a piece of library code that calls "x.foo()" can use the static type of x to turn this into the more explicit "x.A::foo()" or "x.B::foo()". Without type decls this code is ambiguous at runtime.

The same problem arises with library versioning. Suppose I subclass class C in library L, and my subclass adds a new member "internalDetails" for some nominally private data, and then L version 1.1 comes out and the new version of class C has its own "internalDetails" member. Without declared types to resolve which is meant from context, both implementations will share one storage location, and hilarity will ensue.

Typically people counter this by saying one of:
- Decorate your member names to avoid conflict.
- Be prepared to modify library code when conflicts occur.
- Be prepared to modify your application code when new library versions ship.
- Change your member access syntax to explicitly name the "owning" class of the member you want.
- Put member accessors under the control of the module system, and make clients explicitly import the version of a given member name that they want (this is Dylan's solution).
- This problem doesn't occur in practice.

The first three options all break modularity pretty hard, but get recommended a lot when this comes up. The syntax change or module-system approach would make your language as brittle as if you put in static type decls, and would seem to be more work for the user, so why do it?

As for the last point, at the very least Tim Sweeney has claimed (on these boards) that this kind of modularity concern actually matters in game development (where lots of third-party libraries need to be integrated, and clients want to drop in new library versions without any surprises).

[edit: cleaned up many typos]

Some responses

The real problem has more to do with "unintentional" overloading, of either data or function members, and the way it breaks modularity. This can arise for example if I want to make a single object that satisfies two protocols (A and B), which both happen to use the method name "foo" (for different reasons).

In 16 years, this has never happened to me. (Making various typedefs of BOOLEAN work together in C/C++ - now that happens a LOT.)

If it did, I suppose I'd come up with some kind of adaptor/proxy, handing an A proxy for myself to things expecting an A and a B proxy to things expecting a B.

How would you deal with the ad-hoc polymorphism that comes up for things like mathematical operators? Without static overloads you either have to write one uber-function that handles all cases (not modular) or use explicit functions with type-qualified names instead of handy operators?

The usual solution is to use double dispatch (also called "visitor"). However, I can see how you might find this unsatisfactory from an extensibility perspective. I ran across this when trying to implement Qualifiers for Java objects. It seems that new Integer(5).equals(new Long(5)) returns false, whereas int i = 5; long j = 5; i == j evaluates to true.

My solution was to allow each type to register a comparator in a dictionary keyed by class pairs (everything is an object/has a class). It did mean that there was one place in the code that listed all types exhaustively - but it was only one place rather than scattered through the code like the typical double dispatch implementation.
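
A rough Python sketch of that dictionary-of-comparators idea (the names here are mine): comparisons are registered per pair of classes, so the exhaustive listing lives in one table instead of being scattered through the code.

comparators = {}

def register(type_a, type_b, func):
    comparators[(type_a, type_b)] = func

def compare(a, b):
    func = comparators.get((type(a), type(b)))
    if func is None:
        raise TypeError(f"no comparator registered for {type(a).__name__}, {type(b).__name__}")
    return func(a, b)

# The one place that lists the combinations exhaustively:
register(int, int, lambda a, b: a == b)
register(int, float, lambda a, b: float(a) == b)
register(float, int, lambda a, b: a == float(b))

print(compare(5, 5.0))   # True, unlike Java's new Integer(5).equals(new Long(5))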

It's all a question of form, though - the other solution, where you have overloading, is to write operators for all the types anyhow - there is no free lunch - it's a grouping problem more than anything else.

The compiler can't know how to perform all operations

If you create a new type, and you want it to interact with the existing number system (I'm still thinking in terms of creating a new numeric type that responds to all the usual mathematical operators and mixes with all the existing numeric types), then you have to write the code that says how to add an int to it, how to add a float to it, and how to add every other kind of number you already have. How is the compiler going to automate that?

Answer: it can't. Given n types with m operators among them, you still have to implement on the order of n^2 * m functions to handle all the combinations (I'm assuming you're starting with zero types and the compiler doesn't know anything). Adding a new type still costs on the order of n * m additional work to do the integration. That is irreducible.

Whether you use a dictionary of comparators, or double dispatch using type overloading, or double dispatch using named methods doesn't matter.

I should mention that Java's utter lack of extensibility required the external dictionary based approach.

The dictionary

was just an example of a dispatching strategy. One that the compiler might well use under the hood. (I don't know a lot about how multimethods are done - but this seems a likely candidate for implementation strategy).

I think we've strayed pretty far from the original topic here.

Related

Soft typing and pluggable type systems seem like the relevant keywords, I think.

Soft and pluggable

Yes. I've been out of the loop on some recent thoughts. I just got a link to Gilad Bracha's paper on optional type systems yesterday, for instance. It seems to be the same set of ideas.

Quote from Gilad

The following quote from Gilad's weblog is worth recalling:

There are endless religious arguments over the merits of one approach or the other. These debates are often pointless, because the split between the mandatory and dynamic type religions is a false dichotomy.

Someone should write a nice, not too opinionated, FAQ on the dynamic vs static typing question...

Yup, as I was saying...

Good idea, though keep in mind I'm biased...

Yes, I mentioned this idea once on usenet, of separating type information into a separate file in this manner.

But since I didn't pursue it, the idea was fated to go nowhere for all this time. ;)

Chuck

http://www.lexspoon.org/ti/

Lex Spoon (and others?) created a dynamic type inferencing system/technique for Smalltalk. I think this is quite close to what you're interested in.