How to classify type of ML's exn ? (Naive question)

So exn seems kind of like an 'data' type, but with some odd kind of 1-level subclassing (of exn). But exn certainly doesn't seem like a Class type in any simple + common notion of a such a thing.

So what exactly is exn? Why isn't it possible for developers to create these types of their own? (or is my SML wrong?).

Are there other languages where one can create these odd 'exn'-style types? Does generalizing creation of these sorts of types create some other problems or burdens on the rest of the type system?

Aside from raise, are there general classes of problems (development level or problem domain level) that these 'exn' style types are meant to solve?

Many thanks for any answers or resources.

Scott

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

exn is an extensible sum

What this means is that exn is a sum type which you can, with a run-time side-effect, create new branches for. The resemblance to OO comes from the fact that sum types in ML look a bit like classes in OO -- in both cases there's a runtime tag that you can look at. However, there is no subtyping associated with exn.

It's a minor misfeature of ML that the only extensible sum in the language is the one used by the exception mechanism (ie, you can raise exns). These two features (extensible sums and exceptions) are conceptually orthogonal, and should not be coupled.

What makes it a minor rather than serious misfeature is that it's possible to implement the functionality of exn in ML, because ML is pretty much the only language that supports both implicit side effects and true type abstraction.

module type EXTENSIBLE_SUM =
sig
  type tag
  type exn

  type 'a ops = {
    tag : tag;
    make : 'a -> exn;
    case : tag -> exn -> 'a option
  }

  val newtag : unit -> 'a ops
end

module E : EXTENSIBLE_SUM =
struct
  type tag = int
  type exn = tag * (unit -> unit)
  type 'a ops = {
    tag : tag;
    make : 'a -> exn;
    case : tag -> exn -> 'a option
  }

  let gensym =
    let counter = ref 0 in
    fun () -> (incr counter; !counter)

  let newtag () =
    let return = ref None in
    let tag = gensym() in
    let make v =
      (tag, fun () -> return := (Some v)) in
    let case tag' (tag, thunk) =
      let update =
	if tag = tag' then thunk else (fun () -> return := None) in
      (update();
       !return)
    in
    {tag = tag; make = make; case = case}
end

What the module E does is to implement the extensibility features of the exn type. The newtag() operation creates a new branch (identified by a tag), which you can embed values into using the returned make function. Given an exn, you can test whether it has the relevant tag with the case function.

It does this with a bit of higher-order imperative code: morally this exploits the fact that the heap can be viewed as a giant extensible sum type (i.e., where the pointer values are the tags, and the pointed-to object is the tagged data).

Can you help me understand

Can you help me understand how newtag works? It looks, to my naive eyes, that successive calls to 'make' and 'case' should be clobbering/overwriting the single shared reference cell. Obviously, this isn't the case, but why?

That ref is only read

That ref is only read immediately after it's written, when case forces the thunk. That code works around ML not having dependent types. Instead of having an exn be a dependent pair (tag, value of corresponding type), a thunk of type unit->unit (which works for any exception) is used to place a value of the appropriate type in a per tag ref. Again, the ref is only modified during case, as the thunk is run to get the value into the right ref if tags match.

One ref per exn

There is one reference created per exception constructor. It is only used to tunnel back the argument of such a constructor to the `case' operator: if the tag matches it will be set to Some x, otherwise to None. This is a way to get the value of type 'a without having 'a show up in the implementation of type exn (because it is dependent on the tag, and you cannot express that in ML).

Extensible/open datatype/sum

They are called "extensible datatypes" or "open sum types" or a combination thereof, see e.g. Harper & Stone's type-theoretic account of SML. There is no subtyping involved, it is essentially just a datatype with an open set of (generative) constructors. (Although there have been proposals for generalizing them to hierarchical datatypes, which does involve subtyping and acts more like a class hierarchy.)

As for languages that support user-defined versions of them: in Alice ML (an SML extension) we allow the users to introduce their own "exttypes", and exceptions just become a special case of that. There are a number of uses for this, e.g. for generative names, to model property lists, or for message types — basically any situation where you need a datatype but there is no single location or module in the program that can/should know all possible variants for it.

So in a language with a

So in a language with a module system, like ML, would names for new constructors for these open sum types be module-qualified? If so, that would address most name collision issues. If not, then I'm curious how a name collision would be handled.

Interesting, this "open sum type". Me'thinks me'likes....

In a sense, yes

Constructors are "generative", which means that each evaluation of a constructor declaration creates a fresh first-class constructor. This is the case for exceptions already. Nothing really to do with modules, you can observe it with functions like here:

fun f () = let exception E in (E, fn g => g () handle E => ()) end
val (exn1, handler1) = f ()
val (exn2, handler2) = f ()
val _ = handler2 (fn () => raise exn1)  (*) will not catch the exception...

Ah yes, roger that.

Ah yes, roger that.

Open unions and exceptions are profoundly related

I believe open unions and exceptions are not conceptually orthogonal;
their coupling betrays their deep relationship. One may glean that
relationship from using multi-prompt delimited continuations to
implement local exceptions in OCaml (which were not generally possible
in OCaml until a few days ago, when OCaml 3.12 was released):

http://okmij.org/ftp/ML/ML.html#poly-exn

In fact, if a (typed) language supports open unions (in one form or
another) and the continuation monad, the language expresses
multi-prompt delimited continuations. Conversely, any (typed) language
that supports multi-prompt delimited continuations also supports
reference cells, and hence the other forms of open unions.

Hm, I'm not sure

Hm, I'm not sure I'm convinced. Couldn't you, by the same argument, also claim that exceptions and references are profoundly related? That would seem like a bit of a stretch. What all of these constructs, including prompts, have in common is a form of allocation aka generativity, but beyond that I don't know...

Missing return

I read that link, and the impression I got is that you need to go through all that because the language lacks a 'return' statement.

Statement which can be easily implemented using continuation-passing style, by the way... So I am surprised it is not there.