Working around limitations of whole-program monomorphization

I am interested in writing a compiler capable of doing whole-program monomorphization (similar to MLton) to allow for efficient data representations and other optimizations. However, I am also interested in supporting a number of features such as higher rank types, existential types (wrapped up via data constructors), user-definable kinds, and polymorphic recursion.

Since I'm only interested in being able to remove polymorphism over values, an obvious way of making this work would be to restrict things like polymorphic recursion to occurring over non-value-kinded types. Images of types, combined with explicit wrapping and unwrapping, would allow things like the usual typed interpreter example (pretend Haskell has user-definable kinds):

{-# LANGUAGE GADTs, EmptyDataDecls #-}

-- kind Image_Kind
data Int_Image  -- :: Image_Kind
data Bool_Image -- :: Image_Kind

data Image a where
  Image_Int  :: { int  :: Int  } -> Image Int_Image
  Image_Bool :: { bool :: Bool } -> Image Bool_Image

data Exp a where
  Val :: Image Int_Image -> Exp Int_Image
  Gt  :: Exp Int_Image -> Exp Int_Image -> Exp Bool_Image
  If  :: Exp Bool_Image -> Exp a -> Exp a -> Exp a

-- eval :: forall (a :: Image_Kind). Exp a -> Image a
eval :: Exp a -> Image a
eval (Val n)    = n
eval (Gt a b)   = Image_Bool (int (eval a) > int (eval b))
eval (If a b c) = if bool (eval a)
                    then eval b
                    else eval c

x :: Int
x = int $ eval (If (Gt (Val $ Image_Int 5)
                       (Val $ Image_Int 3))
                   (Val $ Image_Int 42)
                   (Val $ Image_Int 0))

Language support for getting "images" of types would make this less verbose.

Is there any prior work in this area? How far can you push the limits? Are there any clever compilation schemes for tagging values with appropriate size/type data for selecting amongst monomorphic functions at runtime (which is essentially what you're doing manually with Image_Int and Image_Bool)?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

User-definable Kinds

pretend Haskell has user-definable kinds:

I think that Haskell does support user-definable kinds. See type families.

Though Haskell implements this via added polymorphism rather than reducing it...


Also, MultiParamTypeClasses can support function selection based on kinds.

It is unclear to me which "whole program" you want monomorphed. Do you mean the program expressed as instance of type Exp? Or do you mean the Haskell program.

Type families are similar to

Type families are similar to user-definable kinds in some sense, but they're not what I'm after. What I have in mind looks much more like Omega. It doesn't much matter for this really.

Anyway, I'm talking about making an entire program, main and all the way down, monomorphic. As I only care to do this for efficiency reasons, I only care about value-kinded type variables being removed.

Even removal is likely not strictly necessary; if the language enforced that variables used in problematic ways (e.g. introduced via a higher rank quantifier or existential quantifier, called at a different type via polymorphic recursion, et cetera) were only allowed when additional information of the data representation was attached to the value classified by that type, it would probably be enough for the compilation approach I have in mind.

For example, you might disallow a type for a constructor like exists a. (a -> Int) -> a -> Foo because you will then not know the size of a when matching on a value of type Foo, but allow a type like exists a. (Box a -> Int) -> Box a -> Foo because you know the box will be of a fixed size and contain information about the size of a. In other words, by requiring that user explicitly box values when sufficient representation information cannot be determined statistically, you can guarantee that boxing will not occur in all other cases.

To simplify, I want fancy types and additional polymorphism without resorting to a universal (i.e. boxed) representation of values. I'm hoping this can be achieved with only minor restrictions on the language. I want to avoid penalizing runtime performance for the entire program simply because one minor part makes use of existentials or polymorphic recursion.

Implementing polymorphism

This is a good summary from Greg Morrisett of the different approaches to implementing polymorphism:

Indeed, but this simply

Indeed, but this simply states my problem. It doesn't give an answer to it.

Another disadvantage of monomorphization is that we're really forced
to have 2nd-class polymorphism which I believe is way too limiting.
For instance, it precludes things like existentials, GADTs, and data
structures like you'll find in Okasaki's book that demand support for
polymorphic recursion.

What I want to do is monomorphize in the presence of existentials, GADTs, polymorphic recursion, et cetera. I have some ideas on how the type system can enforce explicit boxing and on how you can move polymorphic recursion over value-kinded types to types of non-value kinds, but I'm curious if anyone else has tread down this path before.

It's all very well to say monomorphization is too limiting, but if your language implementation generates code that's too slow for the task at hand, it's useless. I don't want to have to drop down to C because my language insisted on GADTs for safety reasons!

Edit: I missed the part about "polymorphism as products" mentioned as option five. I'll have to take a further look at that. Is Compiling with Polymorphic and Polyvariant Flow Types what I'm looking for? Perhaps such an approach might reduce the number of restrictions required.

There might be a sixth approach, though

There might be a sixth approach, though, successfully attempted before:


Type Inference for Late Binding. The SmallEiffel Compiler.

Efficient Dynamic Dispatch without Virtual Function Tables. The SmallEiffel Compiler.

I'm not sure how that could relate/be feasible wrt the GADT-related support in your language, though, but these sets of "living types" computed at compile time (Colnet's papers above) also have some similarities with the "products" mentioned in that fifth approach (of Morrisett's notes).