Embedded Interpreters

This is a tutorial on using type-indexed embedding projection pairs when writing interpreters in statically-typed functional languages. The method allows (higher-order) values in the interpreting language to be embedded in the interpreted language and values from the interpreted language may be projected back into the interpreting one. This is particularly useful when adding command-line interfaces or scripting languages to applications written in functional languages. We first describe the basic idea and show how it may be extended to languages with recursive types and applied to elementary meta-programming. We then show how the method combines with Filinski's continuation-based monadic reflection operations to define an extensional version of the call-by-value monadic translation and hence to allow values to be mapped bidirectionally between the levels of an interpreter for a functional language parameterized by an arbitrary monad. Finally, we show how SML functions may be embedded into, and projected from, an interpreter for an asynchronous pi-calculus via an extensional variant of a standard translation from lambda into pi.

Another paper from Nick Benton.

Like the previous one this paper is dense and detailed, but this time there are some useful practical techniques that may come handy next time you build a DSL in a functional language.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Another Take

This sounds virtually identical to Norman Ramsey's Embedding an Interpreted Language Using Higher-Order Functions and Types. Not that there's anything wrong with having consensus in the community...

You're right, ...

... he even says so:

We have already mentioned Ramsey’s (2003; 2004)
independent use of the same basic technique to embed the LuaML interpreter.

Type Classes

In the LuaML paper, it briefly mentions that Haskell's type classes simplify the interface. This isn't too surprising as the EP type is just a dictionary and the functions using them are just dictionary creation functions. All of that is handled behind the scenes with Haskell's type classes.

Some excerpts of a variation of one of the earlier interpreters in Benton's paper:

data U = UF !(U -> U) | UP !U !U | UU | UB !Bool | UI !Int | US !String
    deriving Show

class EmbedProject a where
    embed :: a -> U
    project :: U -> a

instance EmbedProject Int where
    embed = UI
    project (UI i) = i

instance EmbedProject U where
    embed = id
    project = id

instance (EmbedProject a, EmbedProject b) => EmbedProject (a,b) where
    embed (a,b) = UP (embed a) (embed b)
    project (UP a b) = (project a,project b)

instance (EmbedProject a, EmbedProject b) => EmbedProject (a -> b) where
    embed f = UF (embed . f . project)
    project (UF f) = project . f . embed

iter :: U -> (Int -> U -> U) -> Int -> U
iter m f 0 = m
iter m f n = f n (iter m f (n - 1))

builtins = [("*", embed ((*) :: Int -> Int -> Int)),
            ("true",embed True),
            (">",embed ((>) :: Int -> Int -> Bool)),
            ("toString",embed (show :: U -> String)),
            ("iter",embed iter)]
            
eval (Id v) env = fromJust (lookup v env)
eval (Int i) env = embed i
eval (String s) env = embed s
eval (Bool b) env = embed b
eval (App f x) env = project (eval f env) (eval x env)
eval (Pair a b) env = embed (eval a env,eval b env)
eval (Let v e b) env = eval b ((v,eval e env):env)
eval (If c t e) env = if project (eval c env) then eval t env else eval e env
eval (Lam v b) env = embed (\x -> eval b ((v,x):env))
eval (LetFun f v e b) env = eval b ((f,f'):env)
    where env' = (f,f'):env
          f' = embed (\x -> eval e ((v,x):env'))

Some things are simplified by using type classes: We only need to provide enough information to resolve to a monomorphic type for both embedding and projecting, often times that can be no information at all. There is no need to manage dictionaries explicitly, i.e. it's likely quite easy to mess up the definition of '-->'. embed and project are useful in defining the evaluator. All that's necessary to embed another type is another instance (and likely changing U as it's currently, but that problem can be gotten rid of by other techniques.) Supporting monads should be straightforward.

"supporting monads should be straightforward"

Actually, no. See the discussion in Section 6.1 about why you can't embed into a monadic interpreter in Haskell or in ordinary ML. What's surprising is that by using an amazingly clever idea due to Andrzej Filinski, you *can* do it in an ML with control operators (e.g. SML/NJ or MLton (but unfortunately not SML.NET)).

I didn't mean via the reify/reflect way

... which is obviously impossible in Haskell (unless you make all functions be in a monad that supports shift/reset but that kind of defeats the point though it may still have some use). The issue in the paper is how to turn a "pure" type into it's monadic equivalent. However, it should be mostly straightforward to simply embed the monadic equivalent. This means you can't embed map, but instead must embed mapM (actually uncurried functions may be a bit more convenient).

How about GADTs?

It seems like we could use GADTs to (mostly) use the host language's typesystem to represent the hosted language's.

type F s t = s -> Eval t

data Expr t where
  Id   ::                 String -> Expr t
  Con  ::                      t -> Expr t
  Lam  ::       String -> Expr t -> Expr (F s t)
  App  :: Expr (F s t) -> Expr s -> Expr t

eval           :: Expr t -> Eval t
eval (Id v)    = lookupTypedVar v
eval (Con x)   = return x
eval (Lam v e) = return (\x -> setLocalVar v x e)
eval (App f x) = do f'  Eval t
...
setLocalVar    :: String -> s -> Eval t -> Eval t
...

Functions like lookupTypedVar and setLocalVar require some work, but at the very least they can be done with an embedding/projection pair and a typeclass.

Unfortunately, lifting host functions into the interpreted language is tricker this way because we have to transform α → β into α → Eval β so that plus :: Int → Int → Int becomes embed plus :: Int → Eval (Int → Eval Int). There's probably a clever way to do that, but it escapes me.

Also, there's the question of how to parse the interpreted language and produce the appropriately-typed values of Expr. I think there's a trick for that, but I could be wrong.

Re: How about GADTs?

Unfortunately, lifting host functions into the interpreted language is tricker this way because we have to transform α → β into α → Eval β so that plus :: Int → Int → Int becomes embed plus :: Int → Eval (Int → Eval Int). There's probably a clever way to do that, but it escapes me.

This is covered in the paper and was the issue Nick Benton raised with my comment. In the particular case you mention you can do it and should be able to finagle the type class machinery to do it for you, but in the general higher order case you have problems: how do you turn an: (A → B) → C into (A → M B) → M C ? You'd have to have some way of turning (A → M B) into an (A → B) which isn't possible in general unless the effects represented by M can be handled implicitly.

Re: How about GADTs?

You're right, that loses the "higher-order" aspect.

I suppose if more functions were defined over generic Arrow types, this would be less of a problem. Just use Kleisli m instead of (→).

(On the other hand, I wouldn't want to have to explain type signatures like ArrowFunctor α φ ⇒ α β γ → α (φ β) (φ γ) to newbies.)

Extending to projection errors

I attempted to extend the example above with simple error handling when projection fails, ie:

class EmbedProject a where
    embed :: a -> U
    project :: U -> Maybe a

It all goes straightforwardly...

instance EmbedProject Int where
    embed = UI
    project (UI i) = Just i
    project _ = Nothing

instance (EmbedProject a, EmbedProject b) => EmbedProject (a,b) where
    embed (a,b) = UP (embed a) (embed b)
    project (UP a b) = do a' <- project a
			  b' <- project b
			  return (a',b')
    project _ = Nothing

...etc, until it comes to to defining the instance of EmbedProject for functions. embed is ok - incorrect parameters map to return values of the UU type. But I'm lost as to what needs to happen with the project function. The definition belows doesn't compile because (project.f.embed) has type a->Maybe b, but I need something of type Maybe (a->b):

instance (EmbedProject a, EmbedProject b) => EmbedProject (a -> b) where
    embed f = UF (\a -> maybe UU (embed.f) (project a) )
    project (UF f) = project.f.embed
    project _ = Nothing

Any suggestions? Is there something obvious (or non-obvious!) that I am missing?

One way

[I just came across this whole embedded interpreters idea and found it quite interesting. My response in this discussion is probably way too late but perhaps someone might still find it useful...]

This can be solved by splitting the EmbedProject instance for functions into two parts:

instance (EmbedProject a, EmbedProject b) => EmbedProject (a -> b) where
    embed f = UF (\a -> maybe UU (embed . f) (project a))

instance (EmbedProject a, EmbedProject b) => EmbedProject (a -> Maybe b) where
    project (UF f) = Just (project . f . embed)
    project _ = Nothing

This allows to embed any host language function into the interpreter (as before) but projection can only be done onto functions of the form a -> Maybe b.

The reason is that the execution of an interpreted function could fail with a dynamic type error and so an infallible function signature like (a -> b) doesn't work.