Types and reflection

In my day job as a Java programmer I use a lot of tools that rely heavily on reflection, and I've come up with quite a few uses of reflection that simplify my work. Having also become quite fond of OCaml and its powerful type system, I've started wondering whether combining reflection with a powerful type system is possible. The two features seem quite at odds with each other: reflection completely undermines the type system, something I also see every day in my day job.

Has anyone looked at ways of combining the power of these two language features? In OCaml I'll have to resort to something like camlp4 if I want to do the stuff I use reflection for in Java. But it seems to me that there might be a middle ground between syntactic extension and the metaprogramming allowed by reflection. Or is there some fundamental reason why this is impossible? As you probably understand, I really don't have any clue what this is called, or if it exists, or if it's useful, so I'm curious about anything that might shed some light on it.

Reflective OCaml

Good subject!

Using my terminology: reflection is an abstraction-breaking mechanism, whereas the type system exists to ensure the integrity of abstractions.

It is quite interesting to think about how these combine, and Frank's suggestion is a good one. My personal take is that there's still room for new ideas concerning these issues. I think we need better abstraction breaking mechanisms (i.e., less powerful than reflection) which will allow for more controlled abstraction breaking, by utilizing the type system. Reflection is too powerful to be left in the hands of programmers ;-)

I like...

Why not?

My take on reflection: reflection is the ability to reason about the current context (i.e. the environment, situation and/or state). This suggests a knowledge base containing context information and other knowledge about the context (i.e. more rules and facts). Further, in order to make use of this we need to be able to act on the context to change things, while keeping track of the changes. A language for doing this is called Lewis.

They should combine quite nicely

OCaml's type system is 100% compile time: the executable code generated has no type information (well, almost none: non-pointers are tagged, but that's about it). Reflection is pretty much 100% run time.
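
The tagging mentioned above can be observed from inside OCaml through the standard library's Obj module. The sketch below is only an illustration: the helper name describe is made up here, and Obj is an unsafe, implementation-specific interface that ordinary code should not rely on.

```ocaml
(* describe is a hypothetical helper: it reports only what the runtime
   representation reveals, which is far less than the static type. *)
let describe (v : 'a) : string =
  let r = Obj.repr v in
  if Obj.is_int r then "immediate (tagged word)"
  else if Obj.tag r = Obj.string_tag then "heap block: string"
  else if Obj.tag r = Obj.double_tag then "heap block: float"
  else "heap block: other"

let () =
  (* ints, chars and booleans are all immediates and look alike at run time *)
  List.iter print_endline
    [ describe 42; describe 'a'; describe true;
      describe "hello"; describe 1.5; describe (1, 2) ]
```

Note that an int, a char and a boolean cannot be told apart this way, which is why a getClass-style operation would need a richer runtime representation than the one OCaml uses today.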

Ints would remain tagged, with the low bit set to 1, chars and booleans stored in words whose low two bits would be 10, pointers would be words with the low two bits being 00, and objects on the heap would have their class pointers stored with them in memory. Variant types would be implemented as pointers to "special" words in memory. So the getClass function, written in C, would look something like:

class_t * getClass(word_t value) {
    if ((value & 1u) == 1) {
        /* it's an int */
        return &intClass;
    } else if ((value & 2u) == 2u) {
        /* it's a char or boolean */
        if ((value & 4u) == 4u) {
            /* it's a char */
            return &charClass;
        } else {
            /* it's a boolean */
            return &booleanClass;
        }
    } else {
        /* it's a pointer or a variant type- class
           pointer is stored just before the object.
        */
        return (class_t *) ((word_t *) value)[-1];
    }
}

Adding reflection and run-time type checking would have a small cost in performance, but the success of languages that have it (including Java, C#, Python, Ruby, etc.) suggests that the benefits outweigh the costs.

Reflection v. Introspection

Introspection (sometimes known as structural reflection) is easy to add to just about any language (with a preprocessor at worst). Simply store the type information and such in some standard place, usually extra methods on an object in OO languages. This is often handy for generating boilerplate code automatically at runtime. I'm not sure if andnaess means just this.

Behavioral reflection is the stickier one. It includes operations to change things at runtime. In simple cases this is not much different, though it does lead to the ability to break invariants by violating abstractions. But for more ambitious types of reflection the issues are at least technically complex, especially when one wants to combine it with, and reflect on, rich type information. In more ambitious schemes you can completely redefine a method/function/whatever at runtime. This, obviously, utterly destroys the type guarantees. (In really ambitious schemes, it is possible to redefine the very language itself.) I believe Java (and probably now C#) are the only statically typed languages that support this to any degree (and in Java's case, in typical baroque Java style).

Still, it does look like you could either not allow functions to change type (but allow the implementation to change if it types the same), or you could simply re-typecheck the affected code when the change is made. The latter involves having the typechecker available at runtime; note that you still keep the guarantees, but it is now possible to get type errors at runtime (at "load" time, rather), and of course you'd need a "bulk" update operation. Ironically, the dynamically typed language Self has done a good chunk of research on how to make exactly this reasonably efficient.

I personally think the pieces are in place to make a usable statically typed language with fairly sophisticated reflection capabilities, but I don't think it has been done yet to the extent I believe possible, nor do I think that it would retrofit onto an existing language particularly well without some work.
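
The first option (let the implementation change as long as the type does not) already has a small-scale analogue in OCaml: a mutable reference holding a function. A minimal sketch, with the names pricing_rule and price_of invented for illustration:

```ocaml
(* The type of the slot is fixed when it is created; the compiler
   rejects any replacement that is not also an int -> int. *)
let pricing_rule : (int -> int) ref = ref (fun cents -> cents)

(* Callers go through the reference, so they see the current implementation. *)
let price_of cents = !pricing_rule cents

let () =
  Printf.printf "%d\n" (price_of 1000);
  (* "Redefine" the function at run time with another int -> int. *)
  pricing_rule := (fun cents -> cents + cents / 4);
  Printf.printf "%d\n" (price_of 1000)
  (* pricing_rule := (fun s -> s ^ "!") would be a compile-time type error *)
```

This is of course much weaker than redefining arbitrary methods, but it shows that "change the behavior, keep the type" needs no runtime type information at all.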

I'm not sure about that termi

I'm not sure about that terminology. The way I read it, structural reflection is reflection regarding the data structures of the language, and behavioral reflection is reflection regarding the very language itself. Introspection or not introspection (there needs to be a better word for this) is orthogonal to that. Also, there is the possibility of static behavioral reflection, which could perhaps be statically typed.

Me neither

I do intend for structural reflection to be reflection on data structures, but often this is restricted to just being able to "look at" them and not modify them, or only "modify" them in trivial ways (that incidentally preserve the structure), e.g. changing the value of a field but not, say, adding and removing methods. Introspection is, to me, quite definitely not orthogonal to "reflection". It is the reification part of reflection, without which you have little more than dynamic loading (useful, but not as hard). My impression is that introspection is often used to emphasize that it is only the reification part. I am probably abusing the term "behavioral reflection". It is often used to mean specifically being able to modify language constructs. I am using it in a way that is likely less strong, though I include that sense as well.

I didn't mean to say that int

I didn't mean to say that introspection is orthogonal to reflection. It's definitely a type of reflection, and probably the most usual. Just that it's orthogonal to the division between behavioral and structural reflection. Also note the existence of behavioral introspection, like the eval and apply functions.

In a FPL....

Wouldn't a reflective operation that purported to modify a type NOT modify the type (possibly breaking invariants of all sorts; invariants assumed by the compiler when typechecking or inference occurred), but instead create a NEW type?

THAT would be somewhat interesting, I would think. Of course, if you create new types at runtime then, out of necessity, you must defer typechecking until then, but I'm not allergic to dynamic typing. And while many statically typed languages do type-erasure as an optimization; there is no REQUIREMENT that this be done--objects could still have an implicit pointer to a vtable or type object--a pointer which can be ignored if the compiler can assign definite types to a given term.

One more point: The languages that have the really gnarly reflective capabilities--Smalltalk and CLOS come to mind (and the advocates of both languages often claim that advanced reflection is a required feature for them to take a language seriously)--usually use this feature as part of the DEVELOPMENT environment. The way you edit a class in Smalltalk is you use the object browser to manipulate the class definition, using the reflective capabilities built into the language. It's far less common, AFAIK, for running PROGRAMS to mutate existing types as part of their normal operation. With modern IDEs that offer the apparent seamlessness of image-based environments, yet maintain rigorous isolation between the source code of a program and its running instances, I'm not sure full-boat reflection is a necessary or desired feature anymore.

Typechecking at runtime

"you must defer typechecking until then, but I'm not allergic to dynamic typing"

I have said this a few other places before, but I think it is important that running the typechecker at runtime is (for rich type systems at least) different from dynamic typechecking.

I don't think I understand

It's different from most dynamic type systems of today, but how do you rationalize it not being dynamic typing at all? And what do you then mean by dynamic typing?

Agreed...

...though I was intentionally avoiding the question of whether formal typechecking (assignment of types to terms) or simple verification of operations on objects (how "dynamic typing" is usually implemented) is employed.

Virtually all production dynamically-typed languages do the latter. How much research has there been on the former? If coupled with dynamic translation (say, from bytecode to machine code), it could conceivably make dynamic languages run much faster, and produce more understandable type-error diagnostics than #doesNotUnderstand.

"Runtime" typechecking

Yes, I have thought of the case of "future" "dynamically" typed languages. However, I still think it is fundamentally different, and pushing "dynamic" typechecking toward it would eventually result in something rather different from a dynamically typed language. That said, I can't prove this, because there is no commonly accepted formal definition of "dynamic typechecking". Still, here are some aspects that differ from dynamic typechecking. Any code is typechecked before it is executed (i.e. at load time), and as a result all code is typechecked once instead of every time through. Type erasure is still completely possible (though for the purposes of reflection this is probably less desirable, it is probably useful for efficiency reasons). Non-local features of type systems, e.g. unification of polymorphic types, can be dealt with, which is rather beyond the "tag checking" of most (all?) current dynamically typechecked languages. Properties and guarantees of static type systems, e.g. parametricity, will be maintained. On the flip side, the restrictions will remain as well: it will be possible to have "runtime typechecked" code that won't "run" (load) but that would run under today's dynamically typechecked languages.

In a nutshell, we are simply moving "compile time" to "load time"; this should not automatically change the nature of the language drastically. A case in point is hs-plugins, which does (or did at least) simply call the compiler at runtime and link in the resulting code.

OK. I think I understand now.

OK. I think I understand now. Yes, load-time type checking is really neither dynamic nor static typing. Actually, I think the terms static and dynamic typing are losing their meaning. Few languages, if any, have purely static or dynamic typing; everything's a hybrid.

What uses?

In my day-job as a Java programmer I use a lot of tools that rely heavily on reflection, and I've come up with quite a few uses of reflection that can simplify my day job.

I'd be interested in hearing how you use reflection, if you don't mind sharing. It's always nice when we can bring real-world requirements into the mix.

It's not advanced stuff

One example is form processing and validation. For example, I can write a class that has some properties, and based on the types of these properties the form is processed and handled. So that takes a lot of drudgery out of the coding. I could of course do this with code generation, but it feels like overkill.

A lot of it comes from the concept of "configuration by convention". I tend to use a lot of conventions in my code, and by using reflection I can in fact enforce those conventions at the same time as I save the world from more XML pollution.

But most important is probably the tools I use. Hibernate, for example, uses a lot of reflection. The Spring framework has it all over the place. The web framework we use is full of reflection code. And the problem is that I'm getting a lot of run-time errors that I suspect could be prevented by a type-checker.

I suppose what I'm seeing is that I want macros. Or some other means of extending the language here and there. But I hate having to give up compile-time type checks, because after having seen how powerful they can be (OCaml), I have gone from being a fan of dynamic typing to being a fan of static typing.

So I'm imagining that there is this wonderful point where OCaml and Lisp meet perfectly, and that we unfortunately haven't found it yet.

XmlRpc and Haskell

Type based handling is possible. As a demonstration, I write unit tests for Plone via XmlRpc from Haskell.

The XmlRpc library does the right thing for outgoing and incoming values based on the declared types of the arguments.

remoteTestList = 
    TestList 
    [
     (remote url "ping" :: IO String) ~>= "hello"
    ,(remote url "addtwonumbers" (4 :: Int) (2 :: Int) :: IO Int) ~>= 6 
    ,(remote url "getgroup" "shapr" :: IO String) ~>= "group_shapr"
    ,(remote url "setvalue" "foo" :: IO String) ~>= "foo"
    ,(remote url "getvalue" :: IO String) ~>= "foo"
    ,((remote url "setvalue" "xyzzy" :: IO String) 
      >> (remote url "getvalue" :: IO String)) ~>= "xyzzy"
    ,(remote url "creategroup" "shapr" :: IO String) ~>= "group_shapr"
    ]

(Pretty colors from HsColour. The operators are from HUnit, slightly extended to compare a monadic result to a pure expected value.)

Polyvariadic functions in Haskell

And just in case you're wondering just how that works, see Oleg's polyvariadic functions in Haskell.

Could you clarify

One example is form processing and validation. For example, I can write a class that has some properties, and based on the types of these properties the form is processed and handled. So that takes a lot of drudgery out of the coding.

It's not clear to me as to why this would require reflection. Couldn't this be done with plain OOP?

Because of types

I basically say that "this class represents a form", the class might have the properties

String name;
Integer volume;
BigDecimal price;

And when I process the data coming in from the web (strings) I validate the name property as a String, the volume as an Integer, and the price as a BigDecimal. I have to use reflection to know what the types of the properties are. I could of course use regular OOP; it's just that I like the more "magic" approach, because form processing is something I do all the time, and the code is extremely boilerplate-loaded.

But this is not a very good example because there's no real problem with type safety. What got me thinking was that I was pondering how to add some more sophisticated stuff using annotations, and I found that it would be nice if I could use annotations to set up methods to call. But then the method call would happen using the reflection API, and that means that type safety goes out the window. The method name would just be a string.
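
One partial middle ground, sketched here in OCaml rather than Java, is to keep the lookup by name but give every registered function the same static type. The name can still be wrong, but that failure is confined to the lookup; a call that succeeds can never be ill-typed. The registry and the handler name below are hypothetical, not part of any framework mentioned in this thread.

```ocaml
(* All handlers share one type, so the table itself is statically typed. *)
let handlers : (string, string -> (int, string) result) Hashtbl.t =
  Hashtbl.create 8

let register name f = Hashtbl.replace handlers name f

let () =
  register "positive" (fun s ->
    match int_of_string_opt s with
    | Some n when n > 0 -> Ok n
    | Some _ -> Error "must be positive"
    | None -> Error "not an integer")

(* Calling by name: an unknown name becomes an ordinary Error value
   instead of a failure deep inside a reflection API. *)
let call name arg =
  match Hashtbl.find_opt handlers name with
  | Some f -> f arg
  | None -> Error ("no handler named " ^ name)
```

The price is that every handler must fit the one agreed type, which is exactly the flexibility that reflective method calls give up type safety to get.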

This example suggests to me t

This example suggests to me that you might want to think backwards from the way you usually think when working with languages like O'Caml.

What do I mean by that? Well, instead of thinking in terms of having a data structure that contains a string field, an integer field, and a bigdecimal field, and wanting to automatically infer which form validation routines to use for each, approach it the other way around. (Notice that I used the word infer here--that's because by turning things around, we make use of type inference to avoid redundancy.)

Here's an example of how that might work for the above:

(* NOTE: All this code was written on the fly, totally untested, etc. *)

(* Assumed for illustration: a form request is a list of
 * (field name, value) pairs, and get_field looks a field up by name *)
type form_request = (string * string) list
let get_field (req : form_request) field_name = List.assoc_opt field_name req

(* validators raise one of these exceptions if validation fails *)

exception Validation_missing
exception Validation_failed of string

(* A validator takes a string (field name or partial field name),
 * followed by a form request structure, and either
 * returns a value of the appropriate type, raises Validation_missing
 * if the field is absent, or raises Validation_failed with an
 * explanation *)
type 'a validator = string -> form_request -> 'a

(* The string validator accepts any string *)
let string_validator field_name req =
   match get_field req field_name with
     | Some x -> x
     | None -> raise Validation_missing

(* The integer validator only accepts integers *)
let int_validator f r =
  match int_of_string_opt (string_validator f r) with
    | Some n -> n
    | None -> raise (Validation_failed "That's no integer!")

(* Here's a more complicated one (the point type is assumed) *)
type point = Point of int * int
let point_validator f r =
  let x = int_validator (f ^ ".x") r in
  let y = int_validator (f ^ ".y") r in
  Point (x,y)

(* And a combinator *)
let option_validator v f r =
  try Some (v f r) with Validation_missing -> None

(* And finally, a function using the above: *)

let my_request_handler r =
  let name = string_validator "name" r in
  let volume = int_validator "volume" r in
  let optional_price = option_validator int_validator "price" r in
  ...

Now, there are a couple of things going on here. I think it's clear how the type inference is working for us: Now, we define a validator, and let the type of the validator define the type of our resulting value. This also happily provides "for-free" the ability to have different validators for the same resulting type.

Another thing that's going on here is that we can have higher-order validators, like option_validator. We could also have, for example, a list validator that automatically chops up a list for us, and uses a sub-validator on pieces.
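
The list validator mentioned above could look something like the sketch below. It restates the basic definitions so that it stands alone, and it assumes (this is not in the original post) that a request is a list of name/value pairs and that list items are posted under indexed names such as qty.0, qty.1:

```ocaml
exception Validation_missing
exception Validation_failed of string

(* Assumption: a request is just a list of (field name, posted string). *)
type form_request = (string * string) list

let string_validator f (r : form_request) =
  match List.assoc_opt f r with
  | Some x -> x
  | None -> raise Validation_missing

let int_validator f r =
  match int_of_string_opt (string_validator f r) with
  | Some n -> n
  | None -> raise (Validation_failed (f ^ ": not an integer"))

(* Apply the sub-validator v to f.0, f.1, ... until a field is missing. *)
let list_validator v f r =
  let rec go i acc =
    match v (f ^ "." ^ string_of_int i) r with
    | x -> go (i + 1) (x :: acc)
    | exception Validation_missing -> List.rev acc
  in
  go 0 []

let quantities =
  list_validator int_validator "qty"
    [ ("qty.0", "3"); ("qty.1", "5"); ("name", "x") ]
```

The element type of the result is inferred from whichever sub-validator is passed in, so list_validator int_validator yields an int list without any annotation.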

And finally, note that there are various tricks that can make the above even more syntax-light. See http://www.brics.dk/RS/98/12/ for a great example of how this works for that great reflection-like troll of yore, printf. Also search around for stuff about parser combinators to see just how scarily powerful this can get on the "validation" side.

Anyway, I just wanted to point out that instead of breaking down data structures to find out what code to run, you can run code to find out what data structures to build. :)

Thinking backwards is hard

And the longer I get stuck in Java land, the harder it gets. That's why I play with OCaml in my spare time.

Anyway, this was an interesting read although I'm not entirely sure I understood everything.

However, my example was extremely simplified, I only answered the question -- why I needed the types -- there's more to it. My plan was to use annotations as well, as in this example from Sun: http://java.sun.com/developer/technicalArticles/J2SE/constraints/annotations.html

So the types are probably the smallest part of the validation; I just thought that since I have to declare types in Java I might as well try to put them to good use! A second reason for doing it this way is that the form object is responsible for reading the request and keeping track of the original values as posted by the user, because you want those values to redisplay if the form failed to validate, no matter how wrong they are. Also, I actually want form objects to know how to render themselves, or have some FormRenderer that can introspect into a form and, based on type information and annotations, render the form fields using input, textarea, checkbox etc.

But as I'm sitting here I think I'm starting to appreciate your idea more. Take for example the integer validation. For integers you probably want to test min/max values. In some cases you may also want more general tests such as simply negative or positive, and one can probably come up with uses for more exotic things such as that the integer is odd or even, or a power of two. So, you could create the various validators that are able to check each little aspect, and you can combine them to form the validator you want. That's a very powerful approach indeed!
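
A rough sketch of that idea, with invented names (at_least, at_most, even, both, all): each small check inspects an already-parsed integer, and a combinator chains them.

```ocaml
(* A check either passes the value through or explains what is wrong. *)
type 'a check = 'a -> ('a, string) result

let at_least lo : int check = fun n ->
  if n >= lo then Ok n else Error (Printf.sprintf "must be at least %d" lo)

let at_most hi : int check = fun n ->
  if n <= hi then Ok n else Error (Printf.sprintf "must be at most %d" hi)

let even : int check = fun n ->
  if n mod 2 = 0 then Ok n else Error "must be even"

(* Run c1, then c2 on its result; stop at the first failure. *)
let both (c1 : 'a check) (c2 : 'a check) : 'a check = fun x ->
  match c1 x with
  | Ok y -> c2 y
  | Error _ as e -> e

(* Combine any number of checks, left to right. *)
let all checks = List.fold_left both (fun x -> Ok x) checks

let volume_check = all [ at_least 0; at_most 100; even ]
```

Here volume_check 42 succeeds, while volume_check 101 stops at the at_most check and reports its message.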

But alas, it does not help me get rid of the more tedious tasks of using the right form field for the input, and keeping track of the user's actual input variables. I also need to make sure all errors that happen during validation are automatically kept track of, I don't want all my form handling code to contain the same repetitive boilerplate code that checks if each validation succeeds, and if not, store the error message somewhere.
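
The bookkeeping described here (keep every raw input for redisplay, collect every error) can itself be pushed into one helper, so individual handlers do not repeat it. This is a sketch under assumptions: the outcome record, the field function and the request-as-association-list are all invented for illustration.

```ocaml
type form_request = (string * string) list

(* What a processed form remembers, whether or not validation succeeded. *)
type outcome = {
  mutable raw : (string * string) list;     (* what the user typed, for redisplay *)
  mutable errors : (string * string) list;  (* field name, message *)
}

(* Read one field, record its raw text, parse it, and log any error.
   Returns None on failure so the caller can carry on with other fields. *)
let field out (req : form_request) name
    (parse : string -> ('a, string) result) : 'a option =
  let text = Option.value (List.assoc_opt name req) ~default:"" in
  out.raw <- (name, text) :: out.raw;
  match parse text with
  | Ok v -> Some v
  | Error msg -> out.errors <- (name, msg) :: out.errors; None

let parse_int s =
  match int_of_string_opt (String.trim s) with
  | Some n -> Ok n
  | None -> Error "not an integer"

let parse_nonempty s = if String.trim s = "" then Error "required" else Ok s

let process req =
  let out = { raw = []; errors = [] } in
  let name = field out req "name" parse_nonempty in
  let volume = field out req "volume" parse_int in
  (name, volume, out)
```

The handler in process contains no error-handling boilerplate of its own; after it runs, out.errors and out.raw hold everything needed to redisplay the form.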

What I really want to do is encode 5 years of experience dealing with web forms in code. I have a fairly good idea about sensible defaults that will work in 80% of the cases, so I'm trying to apply the 80/20 rule here. But it's crucial that every default can be overloaded for those 20% of cases where the defaults are not good enough. Reflection allows me to infer these defaults from a combination of type information and annotations.
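
Some of that inference from types can be imitated without run-time reflection by making the type description an ordinary value. The sketch below uses a GADT so that each description carries its OCaml type; the constructor names and the widget strings are illustrative assumptions, not an existing library.

```ocaml
(* A value of type 'a ty describes the type 'a. *)
type _ ty =
  | Text : string ty
  | Number : int ty
  | Flag : bool ty
  | Optional : 'a ty -> 'a option ty

(* A sensible default widget, chosen from the type description alone. *)
let rec default_widget : type a. a ty -> string = function
  | Text -> "input type=text"
  | Number -> "input type=number"
  | Flag -> "input type=checkbox"
  | Optional t -> default_widget t

(* A default parser whose result type follows the description. *)
let rec parse : type a. a ty -> string -> a option = fun t s ->
  match t with
  | Text -> Some s
  | Number -> int_of_string_opt s
  | Flag -> Some (s = "on")
  | Optional t' ->
    if s = "" then Some None else Option.map Option.some (parse t' s)

(* Overriding a default is just passing a different function. *)
let render ?(widget = default_widget) t name =
  Printf.sprintf "<%s name=%s>" (widget t) name
```

For the remaining 20% of cases, a call such as render ~widget:(fun _ -> "textarea") Text "notes" replaces the default while the parser for Text stays type-checked.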

General links

Since we are on the topic of reflection, you might want to check out these two threads and reflect on them.

Macros?

I finally had a chance to look at MetaOCaml, and to me it looks very much like macros. Is there a difference?

And as always...

... I was confused by the interface and posted a comment to my own posting. I was viewing the post from Frank Atanassow, and that was the post I wanted to post an answer to.