Type systems and software evolution

Since the "why are type systems interesting" thread is getting so long, I thought I'd start a new thread on this somewhat more specific issue.

I agree with Frank that software development is concerned with building artefacts (i.e, program, program modules, classes, libraries and what not) about which we can reason statically - which seems to me to be tantamount to saying that we must be able to understand the product we are building, not just be able to run it.

But let's back up a bit, and consider the problematic issue of gathring software requirements. This is one of the issues software engineering is struggling with. We aren't very good at establishing what the real customer requirements are, and even when we do, customers change their mind, or want more features in the next release. The iterative life cycle model isn't only (or even primarily) about fixing bugs. It's about evolving software per customer requirements.

Now let's assume a best case scenario, where we have consistent, even formal, requirements, and we design a software system that fulfills them. We can reason statically, with the aid of a static type system, and convince ourselves that the software does what's expected from it.

Now comes the next cycle, because users want changes and additional features ("show longer names on the tracker page", "enable keywords for comments" etc.) Some would argue that unless we are talking about making trivial changes, we should start form scratch and redesign the system. For various reasons, most of them obvious, this isn't what usually happens. We modify the exisiting sytem. Techniques like regression testing help us make sure we haven't caused too much damage.

Now, let's consider the question of whether the type system helps us with this sort of activity. Obviously, types give that same static guarantees for the changed program they give any type correct program. But can the type system do more? It's likely that some of the changes require chaneging the types we use in our system. Which type systems help us make such changes locally? Do types introduce unnecessary coupling to the sytem, making evolution harder? How about support for (sound) refinement, which would allow checking that changes are consistent with prior specification, while not propagating the the changes in the system in such a way that the impact of the change becomes unmanagable?

I think these sort of questions are worth considering. Thye may help us design better type systems, which would help solve a real need.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Re: Tayssir's Release post

Unfortunately, your view of software development seems to have a dependency on "releases." However, there are methods of development which depend on the emergent properties of computing, symbiosis with a human, or exploratory coding.

This sounds like a theoretical tarpit I would rather not enter. I am not interested in emergent behavior, whatever that is, unless it says something practical about software.

This is like rejecting valid forms of creativity, such as brainstorming, because they do not purport to have a sound logical result at each step or even at the end. Even if each step does culminate in a product of some sort.

My model doesn't reject these things, it just doesn't account for them. But then, neither does any other model. Why should it? It's not useful to talk about intermediate steps unless you can relate them in some meaningful fashion to the final product.

That's why I mentioned limits and converging sequences. I can give you an infinite number of sequences of numbers whose last element is n, but the only interesting ones are the ones where you can assign some meaning to the way in which the earlier numbers determine later ones. Maybe at some future time you will see a new pattern; but that's then, and not now.

As a sidenote, it seems you have a mathematical way to chop away philosophical issues.
That's right. I don't want to pontificate about philosophical issues if I can address them mathematically. IMO, philosophy (and methodology) is useful only where technical tools fall short.
Like when confronted with the question, "What is is?" you use a traditional mathematical definition, which of course was developed to make the philosphical question trival so you could give an easy answer
I think you misunderstand why I take this tack. It is really the opposite of what you seem to think, and frankly I would rather accuse you of defining away the important issues in these debates.

I am not trying to define away philosophical problems; I am trying to factor problems into parts which can be addressed conclusively (mathematically, say), and parts which cannot (philosophical or methodological parts). This discussion is a perfect example.

I have been saying that we should regard a development process as a finite sequence of different programs. So for example, Mozilla 0.9 is a different program from Mozilla 0.8. Many others would rather regard the entire sequence as one program, so Mozilla is one program and 0.8 and 0.9 are just "variants" (or something) of the same program.

The problem with the latter definition is that you can no longer talk about correctness (or efficiency or whatever) of programs. If Mozilla itself is a program, then it is always possible that, somewhere down the line, they will come out with a bug-free version, or a "complete version" or whatever. You don't know. Even the developers don't know.

To say anything useful about particular releases, then, you have to define the status of a "release". It's not a "program", you insist. OK then, what the hell is it? How do I make a useful statement about it? Sure, you can redefine "correct" and "efficient" and so on for "releases", but then you have to redefine a huge number of other terms too. This way lies madness.

Unfortunately, most people do not bother to do even that. They think that by renaming the concept, that they can get around the responsibilities and obligations associated with correctness or whatever. "Sure, this release has bugs/doesn't support Mac OS X/is a bit slow/whatever. The next one will be better, though." Yeah, great; a claim which can be infinitely postponed is not a claim. (Consider the Eldred vs. Ashcroft case.) An infinite program is not a program. A sequence of numbers is not a number. A starlet is not a star, an incumbent is not a winner, and a conjecture is not a fact.

If you regard each version of a project as a different program, though, you can reuse the tools that we already have for talking about programs. You can talk about correctness, behavior, efficiency and so on. There is nothing about that which prevents you from relating different programs produced by the same project, though. You can say one release refines another, or release X produces a different output from release Y on input Z. It doesn't prevent you from talking about maintenance or patches or hot-patching; it just means that when you talk about these things, you retain the ability to express yourself in well-defined, meaningful ways.

The issue here is similar to the one with imperative update. People who have never seen a functional language think that the only way to write the list append function is not as a function, but as a procedure which updates on of the lists. And then they get confused because they forget that a "variable" does not denote a value; it denotes a location in memory. In an FP language, of course, it is natural to write an append function which is a function, and produces a new function as a result rather than updating. And a "variable" really is a variable and does denote a value, with no exceptions.

Having said that, it is probably the case that I have sometimes formalized part of a problem, and then made it seem as though I've also addressed the remaining part. If I do so, you should call me on it. It's not my intention to do so.

I wrote, and you complained about:

BTW, if you consider two values/functions to be equal merely because they are assigned the same name, then we will never agree on anything. [...] If you alter the program so it behaves differently, I consider it a different program.

[BTW, the last sentence is wrong; I consider two programs unequal even if they are syntactically different with the same behavior. But if they have the same behavior, we can say they are extensionally equal as functions.]

You seem to dislike this. I would guess that it's because you think that the only way to patch a release is by identifying two definitions with the same name. But that's not the case, and in fact by doing so you are losing the ability (or at least making it more difficult) to talk about behavior in a sensible way.

More generally, I don't like this model of patching because it has nothing to do with the meaning of the program to be patched. I can name any part of a program any way I like, and it doesn't matter what that part does. I might make some definitions local; if I can only patch global definitions, then I have to be clairvoyant and be sure to bind everything which is likely to break at the top level. (OO inheritance has the same problem.) It imposes a hierarchy on programs which is artificial. And so on and so on.

Do I know a better solution? No. But by distinguishing between a thing and its name, I can at least point out the problems with a naive solution.

Let's not continue this here...

I am not interested in continuing this discussion in this current thread; Ehud is talking about something else, closer to what you consider "practical" and are interested in. Therefore this is no place for my post to have a new life, and can only serve to ruin his apparent desire for an important, constructive thread.

I don't dislike anything you say, merely I waited 'till the end of a long thread and wanted to work out something which was unresolved in my mind. I offered my email address for you to tell me that I was beating a dead horse (an English phrase which means continuing a discussion beyond anyone's interest) or causing pointless controversy.

If you wish to delete your response in this thread and put it in the originating one, I will delete this post. (By editing it so it is blank?)

Take a look at Michael Hicks's work....

He has been designing type systems for safe dynamic update of programs (ie, changing representation types and function definitions at runtime), and has worked out a lot of interesting stuff about it:

Dynamic Software Updating

Since this paper was written, he has worked out an inference algorithm for identifying program points where updates can safely happen, as well as better support for updating closures.

Reuse Support in Type Systems

Reuse Support in Type Systems

In this chapter [4] types and type systems were discussed with respect to their support for enhancing the reusability of software. Current programming languages form a very mixed bag in what they provide, and those few with a large user community that combine good ways for making code generic solve the problems at source level. It is easy to pinpoint what the conflicting requirements are:

  • programmers want flexibility, which generally reduces specification precision, but at the same time
  • programmers want a good specificational functionality, since that enhances quality.

Solving this quest for flexibility, while holding up to the promise of quality enhancing features, generally results in solutions with either a large cost in the software construction phase (as is the case in Ada), or else at runtime (as is the case with languages supporting type identification). The latter approach seems to be the most widely accepted, even though code not needing the added checks may suffer in performance.

The use of polymorphic types like in the typed lambda calculus provides a good way out of the dilemma, since these types make more of the hidden 'type flow' explicit. Generic and opaque types, as well as polymorphic variables can easily be modeled. Parameterized types can be provided for also, as was shown in the previous section. The introduction of instantiation for universally quantified types allows type parameterization without the introduction of a new type expression. It also cleans up the type system in the sense that all forms of instantiation - i.e. the removal of polymorphism from a generic type or item through the application to an actual type parameter - are now applied to the same kind of type.

With the introduction of these polymorphic types in a programming language, support for the construction of reusable software is increased considerably. Of course the language in itself is not enough to solve the total problem of software reuse, but it provides a platform that allows the gap between strict specification and (unfortunately) non-strict implementation to be reduced.