How useful is reflection, anyway?

In a post in a different thread, Ben Titzer explains why he doesn't include reflection capabilities in his language Virgil, a language intended for very small and/or deeply embedded systems. In the context of his requirements, the exclusion makes perfect sense.

Introspection (the ability of a program to inspect itself at runtime) is well-established as a powerful facility--especially in languages with otherwise weak typing systems, or in dynamically typed languages. Some argue that it breaks Liskov substitution (as via introspection one can discover the differences between a subtype and a supertype; differences which are not visible from the interface of the type), but it's useful to have around.

But what of reflection--the ability for a running program to *modify* itself--changing subroutines or type definitions "on the fly", and in some cases having extant data objects reconfigured when the underlying schema changes? Many LISPs (culminating in CLOS; far less so in Scheme) are highly reflective, as is Smalltalk. More recently, many of the popular scripting languages feature reflective capabilities. RDBMSs, which generally have no phase distinction as they have to be running 24/7, are highly reflective systems--the schema of a particular relation (table) is defined in a different relation, which can be altered by a user with sufficient authority (a DBA and nobody else, one hopes).

Excluding the RDBMS example, and other high-availability, dynamically-reconfigurable systems--the most common use of reflection capabilities (again, here I mean mutating runtime structures on the fly; not merely inspecting them) seems to be... debugging and development. A common complaint about "traditional" languages with a phase distinction is that the edit/compile/run/debug cycle is too long; and that use of reflection improves programmer productivity. OTOH, many non-reflective languages nowadays come with rapid-turnaround editing environments (whether a REPL or something fancier).

Again excluding databases and such ('cause they've been covered above, not because I don't think they are worth considering)--and now excluding debugging/development--what about production code? Should deployed code, in the general case, ever be using reflection capabilities to "rewrite" itself--especially "destructive" changes? What user requirements call for reflective capabilities--requirements which are harder to implement in languages with a phase distinction (and lack of mutability of structures handled in earlier phases)?

Much

A good survey on reflection is Types and Reflection.

reflection, product definitions, politics

The most common use of reflective systems is the "complete system" of a "general purpose computer." In that context, reflection is what allows users to control what software runs on the computer.

So, if your programming environment as a whole doesn't include reflection, then you have made a tool which, by form and function, creates a power imbalance between those who only get to run the software vs. those who get to modify the software.

The user need that can be partially satisfied by complete reflection is the need to be able to use software in freedom.

A "complete system" in that sense doesn't so commonly coincide with where programming language borders are typically drawn. A unix-like system, for example, is comprised of code in many conventionally distinguished languages. There's a "naturalistic" perspective, perhaps, in which -- so combined -- those various languages can be regarded as a single "union language".

-t

I was big on reflection once

I was big on reflection once too. I now think that staging, polymorphism, and polytypism are just about as powerful as reflection, but safer.

[Edit: I would be curious to hear of any applications that can't be handled by the combination of the above three.]

"Safer"?

It seems to me you are making an assumption that computational reflection is unsafe. What theoretical argument is there for it being unsafe?

Ben Titzer says:

ii.) Most programmers are either not competent enough to be trusted, or are simply not geared to that mindset. Automatically transforming the program on a metalevel opens up not only the gates of heaven, but the sheer pits of hell. As a potential user of other people's code, I definitely _do not_ want to debug their meta-whiz-bang transformation along with the original program it is operating on.

It seems to me like Ben recapitulates the safety argument in the quote above.

In addition, Scott Johnson asks, "How useful is reflection, anyway?" In a nutshell, the answer is that reflection is the path of least resistance whenever you need the benefit of interrogative knowledge. Be careful not to envision anything in terms of its usefulness ahead of envisioning its semantics. The first rule of programming is to figure out what you want to say before you figure out how to say it. If what you want to say is interrogative, then reflection is a good candidate for how to say it.

The troubling thing I see in the safety argument is that the arguments presented are environmental factors that probably can be addressed through increased education and engineering simplicity. If I were to raise a safety concern, it'd be about concurrency. Obeying "one fact, one place, one time" is necessary for interrogative statements to have any value whatsoever.

o O

In addition, Scott Johnson asks, "How useful is reflection, anyway?" In a nutshell, the answer is that reflection is the path of least resistance whenever you need the benefit of interrogative knowledge. Be careful not to envision anything in terms of its usefulness ahead of envisioning its semantics. The first rule of programming is to figure out what you want to say before you figure out how to say it. If what you want to say is interrogative, then reflection is a good candidate for how to say it.

Now I need the benefit of interrogative knowledge.

The first rule of

The first rule of programming is to figure out what you want to say before you figure out how to say it.

You can't convince a petit bourgeois to go wild. Even transgressions have to be severely limited. So how you want to say something matters quite a lot for what you want to say. I don't see why programming should be different here. Maybe because there is the end user as a superego? But once a programmer programs for other programmers, the lines become fuzzy.

It seems to me you are

It seems to me you are making an assumption that computational reflection is unsafe. What theoretical argument is there for it being unsafe?

Even a cursory reading of the post points this out: reflection involves changing functions and types on the fly. What type system do you know of that can safely do this?

The deeper answer is: I'm not assuming anything, I'm stating a conclusion I've reached after years of reading and research into reflective systems. Structural and behavioural reflection breaks abstractions, thus violating their invariants, and violates parametricity, thus hindering the expressiveness of the language. Thus, it is unsafe.

Geoff Washburn has proposed a line of research that seeks to address these limitations, and most of the arguments against undisciplined reflection as found in most current systems are described in detail there.

Also, as Matt M said, without a proper understanding of what "interrogative" means, or at least what sorts of problems are naturally solved by "interrogative" solutions, I remain unconvinced that reflection is necessary.

One feature I did neglect to mention in my prior post was dynamics. Once a language has the aforementioned features + dynamics, I can't think of any problems that can't be solved equally well in a statically typed language.

[Edit: my personal opinion is that polytypism + staging can provide any sort of dynamics one would need, but that's only a conjecture at this point.]

Even a cursory reading of

Even a cursory reading of the post points this out: reflection involves changing functions and types on the fly. What type system do you know of that can safely do this?

Ooh, Ooh! Pick me! Is it a dependent type system?

(I can't say I see the need for reflective code, but that doesn't mean you can't prove it to be correct.)

You can modify the definition of a

You can actually modify the definition of a function on the fly in a dependent type system?

[Edit: corrected the question so it actually made sense. In any case, that capability of dependent types is new to me.]

Worse

You can make another function's definition dependent on some aspect of that function's types or values. Scary, but perhaps this is the logic we're really looking for in single-language generative programs. Of course, programs that generate programs in another language are a different story.

Yes, I realize this, which

Yes, I realize this, which is why I find it hard to believe that one could redefine functions and types on the fly. Dependent type systems are theorem provers, which means that functions and types have to meet very strict dependencies for the language to remain safe, which is a really tall order for reflection-constructed types and functions.

Hope you don't see this as nitpicking, but...

Structural and behavioural reflection breaks abstractions, thus violating their invariants, and violates parametricity, thus hindering the expressiveness of the language. Thus, it is unsafe.

The wording here is what bothers me.

Abstractions are based upon assumptions; they are merely symbols without them. The assumptions are what makes abstractions possible (e.g., digital logic would not be possible without the notion of a propagation delay). If the assumptions change, then the specification has changed. The biggest concern with regard to invariants should be how you guarantee that the code you replace is trustworthy. It's one thing to say the invariants have been preserved, and it is another to preserve them. Mark Miller brings this to light in his discussion of capability security and the E programming language.

[Edit: my personal opinion is that polytypism + staging can provide any sort of dynamics one would need, but that's only a conjecture at this point.]

Great. I always suggest that starting with some anecdote can be a great way to find a topic to research. It gives you passion and interest. However, at this point, it's only an opinion -- and it's intuitively appealing to you. Have you given any thought as to how you plan to flesh this intuition out into something rigorous?

Don Chamberlin, influential in the design of SQL, began his fascination with terse, powerful database manipulation syntax and semantics through what he and his friend called The Query Game. In a nutshell, he had a problem domain -- typical inventory warehouse with distributorships -- and questions that might be asked about the problem domain by domain experts. He would then try to imagine the best way to query the database to get that information, comparing Codd's style to Bachman's style, &c. SQL leaves a lot to be desired, particularly in data mining (a problem domain where the target of a data manipulation is unknown, something SQL took for granted), but this anecdote demonstrates a potentially effective way for you to begin building up your claim. I'd suggest making a Game of sorts for yourself. Perhaps The Looking Glass Game? :-P

Abstractions are based upon

Abstractions are based upon assumptions; they are merely symbols without them. The assumptions are what makes abstractions possible.

I'm not sure what definition of abstraction you're using, but an abstract data type (abstraction) has a fairly well-defined meaning in the literature. Assuming it is a fully sealed type with a set of operations provided for it, you cannot access the internals of the type in any way or you violate fundamental assumptions of the very language in which you have expressed your solution.

Mark Miller (MM) would agree with me. See his "defense of manual serialization". Reflection is often used for serialization, and MM instead defends the position that objects must be active participants in any serialization protocol. He's mostly right.

Types help you maintain invariants, as does encapsulation. Reflection violates at least the latter.

Abstraction!=ADT

Abstraction isn't always short for abstract data type. It certainly isn't in "lambda abstraction"!

True, but when talking about

True, but when talking about structural reflection, we are talking about abstract data types. :-)

Confidence vs. certainty

Types help you maintain invariants, as does encapsulation. Reflection violates at least the latter.

A QA process like Cleanroom Software Engineering (CSE) demands far more than types and encapsulation to maintain invariants. A rule of thumb I use is that types provide confidence, but not certainty. Reflection provides a way to do live maintenance of the code. I already spoke about trustworthiness.

you cannot access the internals of the type in any way or you violate fundamental assumptions of the very language in which you have expressed your solution.

There are four levels of specification: defined, implementation defined, unspecified, and undefined. It is certainly allowable to change the internals of a type and swap out an implementation defined attribute in response to new needs. First, think about how you maintain code and improve its performance. Second, think about systems that require 100% uptime. You can measure in armored trucks the bags of money banks deliver to IBM for mainframes, so there is certainly a real world, practical demand. How is hot swapping hardware in a mainframe that different from computational reflection?

Scale

When swapping out a board, or a disk, or even a processor--one is dealing with components on a fairly large scale--components which are designed to be exchanged in this manner. (Adding hotswap capability to a circuit board or assembly greatly complicates the design).

The equivalent in software is replacing a DLL, or an executable, or a kernel module or driver. Again, such "modules of deployment" are designed to be configured by end-users or administrators--folks besides the originating programming team.

The hardware equivalent to a highly reflective PL would be the act of replacing a resistor or other component, or adding a "green wire", to a circuit board **while it is energized and plugged into the system**. To do so safely at that scale, one would either a) have to really know what they are doing, and be especially handy with the tools, or b) design the system such that individual parts can be replaced in that manner.

Many reflective systems do in fact take the latter approach--much of the complexity of CLOS is not in rigging record types on top of lambdas (which isn't hard to do), but in adding sufficient levels of indirection and metacruft to permit class redefinitions to be performed "safely". "Safely" doesn't mean your program won't fail in some way--using reflection to delete the accountNumber attribute from all instances of the bankAccount class is seldom a good idea (unless one is sure the data can be re-created if needed and the change is "only" a refactoring)--it simply means that your program won't experience an uncontrolled crash.

Likewise, Smalltalk pays dearly for the become: method (which changes the class of an object--a very lightweight reflective technique). Implementations either need to use tombstones, or tracing techniques to hunt down all references to the object and re-seat them. And can become: be used in a static type system?
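
To picture the cost, here is a sketch of the usual implementation trick in Java (Handle is an invented name; this illustrates the "tombstone" indirection idea, not any actual Smalltalk VM): every object is reached through an extra cell, so become: is a single pointer swap, but every access pays for the extra hop.

// A tombstone-style indirection cell: all clients hold a Handle rather
// than the object itself, so swapping the target "re-seats" every
// reference at once.
final class Handle {
  private Object target;
  Handle(Object initial) { target = initial; }
  Object get() { return target; }                  // the per-access toll
  void become(Object replacement) { target = replacement; }
}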

In an unconstrained reflective system, where you can alter or delete behaviors or attributes at will--it is quite difficult to reason about the behavior of the program beforehand--you can't assume that any type will support any particular behavior (even if extant at compile time)--because the rug can be yanked out from under you at runtime; and showing otherwise is (in general) undecidable. The best you can do is ensure that attempts to shoot oneself in the foot produce a controlled and immediate failure rather than a crash.

With constrained reflection, you at least have a chance.

A rule of thumb I use is

A rule of thumb I use is that types provide confidence, but not certainty.

Types provide certainty; you just have to be clear on what certainties they provide. The more powerful the type system, the more certainty you can attain.

Reflection provides a way to do live maintenance of the code.

Reflection is not needed for live maintenance and/or upgrade. As a matter of fact, I'd say reflection leads you to exactly the wrong solution. Why would you ever do "live" maintenance on a deployed solution? ie. live debugging and patching. Arguably, the right way is to replicate a live system to a test server and debug THAT, then upgrade the live code with the fix. That is readily achievable with only dynamics support.

For a hint as to how, consider how Erlang provides upgrade: a server process receives an "upgrade" message, which provides a closure to the new server function. The old server function then performs a tail call to the function with its private state. How do you load that new function? Dynamics. That's all you need, and full static type safety is preserved. Where does reflection enter the picture?
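
To make the shape of that protocol concrete, here is a minimal Java sketch of the same idea: a mailbox-driven server whose behavior is swapped by an upgrade message while its private state carries over. All names (Msg, Work, Upgrade, Server) are invented for illustration; this models the Erlang pattern rather than reproducing Erlang's actual mechanism.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiFunction;

interface Msg {}

class Work implements Msg {
  final int payload;
  Work(int payload) { this.payload = payload; }
}

class Upgrade implements Msg {
  final BiFunction<Integer, Integer, Integer> newBehavior; // (state, payload) -> new state
  Upgrade(BiFunction<Integer, Integer, Integer> b) { newBehavior = b; }
}

class Server implements Runnable {
  private final BlockingQueue<Msg> mailbox = new LinkedBlockingQueue<>();
  private BiFunction<Integer, Integer, Integer> behavior;
  private int state;

  Server(BiFunction<Integer, Integer, Integer> initial, int initialState) {
    behavior = initial;
    state = initialState;
  }

  void send(Msg m) { mailbox.add(m); }

  public void run() {
    try {
      while (true) {
        Msg m = mailbox.take();
        if (m instanceof Upgrade) {
          // The "upgrade" is just the arrival of a new function value;
          // the private state carries over untouched.
          behavior = ((Upgrade) m).newBehavior;
        } else {
          state = behavior.apply(state, ((Work) m).payload);
        }
      }
    } catch (InterruptedException e) {
      // interrupted: shut the server down
    }
  }
}

Nothing here mutates an existing definition; the new behavior arrives as an ordinary value, which is why static type safety survives.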

I already spoke about trustworthiness.

Honestly, "trust" is such an overloaded term that I don't see why you would try to use it as a metric. What is the property of "trustworthiness"? Soundness, safety, robustness, these are all useful, concrete program properties. Types help you with the first two, E's event loop concurrency can help you with the last.

It is certainly allowable to change the internals of a type and swap out an implementation defined attribute in response to new needs.

Sometimes. And how do you know it's safe? Only if the new type signature matches the old one, so existing clients don't break. How does reflection maintain such invariants if it can bypass them? The proposed mechanism contradicts the goals.

Second, think about systems that require 100% uptime.

I've suggested nothing that would preclude high availability systems, as my thought experiment demonstrated.

How is hot swapping hardware in a mainframe that different from computational reflection?

Because hardware doesn't break the defined type interface. If it did, it would cause hardware failures indicating that it needs swapping out. :-)

An ignorant question about Erlang's type safety

For a hint as to how, consider how Erlang provides upgrade: a server process receives an "upgrade" message, which provides a closure to the new server function. The old server function then performs a tail call to the function with its private state. How do you load that new function? Dynamics. That's all you need, and full static type safety is preserved.

I know little about Erlang's type system beyond the fact that it was retro-fitted with one sometime in the late 90s, but this raises worries in my mind. What does the type of the old server process say about the possible types of the closure the upgrade message passes? What formal calculi model this sort of message-passing?

Plugging in closures where they were not expected has caused trouble for PLs with excellent credentials before...

Erlang doesn't have a type

Erlang doesn't have a type system, merely a few static analyses which approximate some simple type checking.

As for closures, I'm not sure what you mean regarding the troubles.

As for your question, this is indeed a tough problem to solve safely. The best answer I have at the moment, is that access to the upgrade function/message shouldn't be available via the public interface, but is instead returned as a second value of the type's constructor. This makes it a capability held by trusted subjects/code. Thus, only the creator, or those to whom he has delegated the capability, are capable of upgrading a live object. Capabilities are good for enforcing integrity like this.
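
A minimal Java sketch of that arrangement, with invented names (Counter, UpgradeCap): the factory returns the public object and the upgrade capability as two separate values, so only code explicitly handed the capability can swap behavior on the live object.

import java.util.function.UnaryOperator;

// The public interface: clients can use the counter, but there is no
// upgrade hook anywhere on it.
interface Counter { int next(); }

// The capability, handed out separately at construction time.
interface UpgradeCap { void swap(UnaryOperator<Integer> newStep); }

final class CounterCell {
  private int n = 0;
  private UnaryOperator<Integer> step = i -> i + 1;

  static final class Created {
    final Counter counter;
    final UpgradeCap cap;
    Created(Counter c, UpgradeCap u) { counter = c; cap = u; }
  }

  // The factory returns the object *and* the capability as separate
  // values; only the creator, or a delegate, ever holds the capability.
  static Created create() {
    CounterCell cell = new CounterCell();
    Counter pub = () -> { cell.n = cell.step.apply(cell.n); return cell.n; };
    UpgradeCap cap = f -> cell.step = f;
    return new Created(pub, cap);
  }
}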

Closure trouble

Thanks for the comments. There could be a really good POPL paper there...

As for closures, I'm not sure what you mean regarding the troubles: I had in mind when I wrote that A Generalization of Exceptions and Control in ML-like Languages, which documents, in section 1.1, a source of failures of subject reduction in SML/NJ using call/cc.

Thanks for the pointer. I am

Thanks for the pointer. I am investigating safe, secure upgrade and other ideas for my language, so I may yet write something. I'll definitely post here if I do. :-)

Dialyzer

Does Dialyzer do anything better for these issues?

"more" certainty... or "less" skeptical?

A rule of thumb I use is that types provide confidence, but not certainty.

Types provide certainty, you just have to be clear on what certainties they provide. The more powerful the type system, the more certainty you can attain.

It seems we are talking past each other on the confidence vs. certainty issue. I don't see any disagreement, just rhetoric. The idea of "more" certainty can also be stated in terms of confidence: you are more confident, and less skeptical, that your solution is free of inadequacies. Therefore, I don't disagree with you at all. I just think it's a matter of rhetoric; let's stay away from that.

I've suggested nothing that would preclude high availability systems, as my thought experiment demonstrated.

Pardon, but what thought experiment are you referring to? Sometimes these threaded replies make it hard to know what the correct call to reference is.

How is hot swapping hardware in a mainframe that different from computational reflection?

Because hardware doesn't break the defined type interface. If it did, it would cause hardware failures indicating that it needs swapping out. :-)

Fine. :-) Now explain to me how you can describe hardware swapping using an analogy different from computational reflection. Polytypism, polymorphism, and staging are at your disposal, along with any other notions you may need.

It seems to me that you oppose reflection if and only if it breaks the type contract. Is my interpretation correct?

Re: confidence vs certainty,

Re: confidence vs certainty, to me one implies absolutes, the other merely probabilities, but if you think we're in agreement, no need to pursue it further.

Pardon, but what thought experiment are you referring to? Sometimes these threaded replies make it hard to know what the correct call to reference is.

The one referred to in the same post to which you just replied. I gave a simple example for upgrading live code in a statically type-safe way, which is one necessary facility in HA systems.

[Edit: I also elaborated on the approach slightly in another post in this thread, where a capability to the upgrade facility is used to maintain integrity.]

Fine. :-) Now explain to me how you can describe hardware swapping using an analogy different from computational reflection. Polytypism, polymorphism, and staging are at your disposal, along with any other notions you may need.

So basically, you're asking for a model of a finite state machine which represents a hardware interface. How is this any different from any other interaction problem? You can model hardware as a monad, an actor, a process, or just an abstract data type. Nothing fancy required. For instance:

signature HARDWARE =
sig
  type device
  datatype 'b response = Unplugged | Ok of 'b
  val write_data : device -> string -> int response
end

It seems to me that you oppose reflection if and only if it breaks the type contract. Is my interpretation correct?

If and only if it breaks any invariants in the language that one expects to rely upon. Respecting typing is one invariant. Respecting the interface of abstract/sealed/encapsulated objects such that they can maintain the integrity of their private state is another.

Geoff Washburn's thesis, which I linked to previously, seeks to use flow analysis to ensure that reflection doesn't violate such invariants. If that's indeed achievable, then we can talk about using reflection seriously.

Abstraction

An abstract data type is only one example of abstraction. The ability to write a set of rules determining what the abstract data type will be, depending on the actual data present, is another example of abstraction over the abstract data type. Most mutable reflection systems break this because at some level they deal with string matching instead of type matching, or ruled type matching. Once again, dependent types are an abstract mechanism that includes reflection as part of its abstraction.

Is reflection unsafe if

Is reflection unsafe if generated code conforms to the requirements of the existing code?

For example, in a Java application, classes could be built at run-time. If these classes implement interfaces known by the program, then type safety is not violated.
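
For instance, a sketch of that discipline in plain Java (Command and PluginLoader are invented names): the class is located reflectively, but it is only ever used through an interface the host program already knows, so clients never touch an unchecked type.

// An interface known to the host program at compile time.
interface Command { void execute(); }

class PluginLoader {
  // Load a class by name at runtime; safety rests on the asSubclass
  // check, after which the instance is used only as a Command.
  static Command load(String className) throws Exception {
    Class<? extends Command> cls =
        Class.forName(className).asSubclass(Command.class);
    return cls.getDeclaredConstructor().newInstance();
  }
}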

Not just safe, but right

Part of the point is not just that reflection/introspection/generation of code can potentially produce code or modifications that are "unsafe"--i.e., that violate the language's type system--it is also that it is really hard to get things right, even if the results do typecheck.

As much as the static type system may catch potential bugs in the base language, I am not convinced it's going to catch logic errors in the construction of a new program based on processing the meta-level representation of the program. I'm not sure about you, but I have, and often do, write programs that typecheck but nevertheless do not compute the correct result :)

So even if the safety issues are solved, there is the deeper issue of how I might begin to understand _your_ code that processes your (and my) programs, and produces new programs and program fragments. And how do I go about identifying and correcting it when it is incorrect (but its output nevertheless typechecks)?

I'm not sure about you, but

I'm not sure about you, but I have, and often do, write programs that typecheck but nevertheless do not compute the correct result :)

Maybe your type system just isn't strong enough. ;-)

So even if the safety issues are solved, there is the deeper issue of how I might begin to understand _your_ code that processes your (and my) programs, and produces new programs and program fragments. And how do I go about identifying and correcting it when it is incorrect (but its output nevertheless typechecks)?

I've found debugging the reflective programs I've written in C# via the ordinary debugger is generally sufficient. Then again, one could argue that my uses have simply been too limited. I would then argue that limited use of reflection is the only legitimate use. So I think we agree. :-)

One problem of metaprogramming in general

...whether implemented via reflection, preprocessor macros, string ops + eval, syntax-level term rewriting systems (think C++ templates), or standalone tools whose output is fed into a compiler--is that many such systems are hard to debug. Why? Oftentimes, your tools are debugging the generated code rather than the original source--which, especially for nontrivial metaprogramming systems, is frequently gobbledygook. (An equivalent problem is trying to debug a program written in a HLL by using a debugging system which only provides access to the generated machine code.)

The solution to this problem requires that a) the underlying "host" language, to which the metaprogramming is reduced, provide some way to correlate statements in the metaprogram with statements in the generated code; and b) the person writing metaprograms use these facilities (or the necessary metadata be generated automatically). If you turn on debugging in your favorite (C) compiler, you'll see the exe balloon by an order of magnitude; much of this is due to the additional metadata (the "dwarf" or "stabs" in Unix-land, for example) needed to support source-level debugging.

Many metaprogramming environments and metaprograms don't bother, unfortunately. And in some cases, the problem is difficult. C++ has had templates for 15 years now, and the error messages emitted by most C++ compilers when you use a template incorrectly are *still* largely incomprehensible junk. The next revision of C++ promises new features to assist with better specifying template constraints and documenting errors--but these won't necessarily help with legacy code.

And C++ templates are a rigidly staged system. The problem is undoubtedly far worse for buggy metaprograms generated at runtime.

The problems you describe

The problems you describe with other metaprogramming systems are generally solved by multistaging, which is why I suggested it. I agree that better tools or techniques are still needed even with MetaOCaml.

good point

The aspect-oriented programming people like to talk about how good it is to separate concerns, but in their case that also means the person reading the original source code may have no idea what's going on unless he has a pointer from the original source code to the pointcut code that alters it invisibly.

Multiple dimensions

Aspects, syntactic templates, type inference, subclassing, and a host of other powerful compile time technologies all make it hard to figure out what's going on by looking at one small block of code. That's not necessarily an argument against those features, though. It's perhaps more of an argument that if there is real power in muddying the relationship between localized chunks of source code and the final AST then our tools will need to continue to grow beyond being glorified text editors.

Smalltalkers have known this for a long time, but their focus was largely on the runtime image since that's where they do their more complicated magic. As the compile time image gets more complex in some languages we'll need more sophisticated tools to work with that phase as well.

Hear, hear...

This is one of the most important and high-leverage research directions today.

Unfortunately, cultural pressure between PL and SE people seems to push people away from the border between languages and tools. Tool people prefer yesterday's languages, and language people prefer yesterday's tools. A sad state of affairs. (I should note that this is obviously IMHO only, and that there are exceptions, blah blah blah.)

In any case I think it's a really important area and I'd like to see more and better work in this direction.

A simple challenge

Here is a simple puzzle you might attempt to solve. It's a real world example. C++ was used as an implementation language.

Product managers used to write file system properties into an XML document called the profile. Each file was uniquely characterized by a path. The purpose of a program called the checker was to check file attributes and contents against specifications. As a special requirement, the tests had to be executed in the order of the files in the profile, which was arbitrary. Each test was implemented as a method in the checker. At runtime the profile was read and path data of files were sent to the checker. The test method names were such that the checker could map path data onto the method names. So path data alone were sufficient to identify the test methods of the corresponding files.

Design challenge. Call the methods without use of runtime reflection.

Keep in mind...

my comments and questions are primarily concerning *mutable* reflection--altering the structure of a program via a reflection interface. In CLOS you can redefine a class schema, and all instances of the class will automatically get the new attributes or whatnot.

The use case you describe--calling functions specified at runtime--requires introspection--examining the program structure, but not its modification.

I've little doubt of the benefits of read-only reflection; and of its potential for far less abuse. :)

O.K. so we have that :)

Another example of constructive reflection: I just released version 0.2 of generator_tools for Python. It implements a missing feature, namely copying and pickling live generators--a feature that is available in Stackless Python ("tasklet pickling") but not in the CPython distribution. It makes use of reflection on each level: inspection of generator objects, bytecode inspection, code object construction, GeneratorSnapshot construction for the purpose of serialization/deserialization ... and it completely avoids the use of C code.

It's essentially a language feature written in pure Python--something known for long in the Lisp communities, as you noted. The portability is limited, though, because the implementation relies on bytecode inspection. Given a different context, such as Java and the JVM, this would imply that language features could be distributed and installed as class files, no matter which implementation of the JVM is used, as long as it is standards compliant.

necessity of reflection

I don't mean to be a pill, but the necessity of reflection is a property of the programmable computer as such. When you are talking about subsystems in which reflection is omitted, all that you are doing is splitting the programmability of a computer into two separate economic roles--creating a contentious power relationship. In the era of Web 2.0-style surveillance, it is pretty easy to ask why anyone would want to do that.

-t

So that's why they call it a "Power" button!

I think there's a terminological problem here, or something. Scott is trying to focus on reflection which mutates the original program somehow. Presumably, that doesn't include changing the source code of a program and rebuilding it, unless that happens while the program is running, without stopping the program. So are you saying that systems that require a stop & restart in order to make a program change create a contentious power relationship?

All von Neumann machines

are reflective--at a base level, reflection (or not) doesn't materially affect economic or political power relationships. These are an interesting topic, but not terribly relevant to this thread. We have seen in recent years a number of devices (TiVos, numerous gaming consoles, iPhones and other mobile devices) which attempt to go to great lengths to prevent the user from modifying the underlying firmware, presumably to deny the end user some economic benefit, such as choice in mobile networks, the ability to run games published in foreign markets (or pirated games FTM), etc.

But DRM and "trusted computing" as such has little to do with programming languages. Most of the systems described above are written in low-level (and nonreflective) languages like C1, and use non-PL design elements such as cryptography/digital signatures and tamper resistant hardware to keep the hackers out. (And most such attempts inevitably fail--how many "trusted" consumer devices out there remain uncracked?)

So no... I'm not focusing on reflection at the machine layer, or at the operating system layer. I'm even ignoring, for the most part, coarse-grained reflection at the application layer, such as the loading of DLLs, use of Java class loaders, or other means to permit a program to be composed from components at runtime.

I'm focusing solely on reflection at the language layer, where a program written in language L can do reasonably fine-grained manipulation of itself at the level of the syntax or semantics of L. In particular, destructive reflection--where extant (and live) system elements are modified (as opposed to, say, using string manipulation functions to compose source code and passing it off to eval()--an operation which creates a new function, but doesn't modify the behavior of any existing ones).

All of which has little to do with who owns the computer and/or its OS or firmware.

[1] Some might view C as a highly reflective language, as its low-level nature lets you do whatever you want, including modification of object code. However, as such hacking is highly platform-dependent and requires use of constructs which are officially undefined, I am not considering C reflective for purposes of this discussion.

the news from Harvard

[Anton's subject line really cracked me up.]

I don't think there's as much disconnect as you think. When should a program have this ability? What's the use case? I gave a clear one.

More conventionally, I've heard of (not touched or tasted) mission critical OLTP systems that want live updates. Similarly, systems that use techniques like "code evolution".

-t

Even so...

I've touched some, and it's still not obvious to me that reflection is the best way to achieve live updates. A coarser-grained approach based on clustering or something is more common (for a reason, IMHO), and it also addresses the problem of how to deal with unexpected failures.

Here is a case where introspection would be useful

I am currently making a C++ GUI application with a large data model. I am using templates for member declarations of aggregates, so that I do not have to write set and get methods. The member declarations also have static constraints. This allows me to do, for example:

enum Type {
    Type1 = 1,
    Type2,
    Type3
};

class Item : public ModelRecord {
public:
    ModelProperty<Enum<Type, Type1, Type2, Type3>, Type2> type;
    ModelProperty<Range<10, 20>, 15> value;
};

The class ModelProperty automatically registers itself to the ModelRecord it belongs. In order to achieve this, I have overloaded the operator new for ModelRecord in order to keep the memory range that the record occupies; when a ModelProperty is constructed, it searches the model record ranges and attaches itself to the appropriate model record.

The classes ModelRecord and ModelProperty have methods for converting themselves to XML and for converting XML to objects. The above would be converted to:

<Item>
    <type>2</type>
    <value>15</value>
</Item>

In order to convert them to XML, I have to manually specify the name of each record and property, in the constructor of each record class.

If I had reflection:

1) I wouldn't need to keep the memory range of each model record. I would find the appropriate model record at run-time by examining the introspection information of each object.

2) I wouldn't have to specify the tag names manually; I would use the class and member names from introspection information.

The classes above do some other interesting things as well: a model property has a signal 'modified'; a model record has a slot 'setModified' (I am using Qt, hence the terminology). When a model property registers itself to the record it belongs, it also connects its modified signal to the setModified slot of the record; thus, when I build a tree of objects, modifying a property somewhere in the tree results in notifying the root object that something was modified, enabling me to do interesting things like setting the save button to enabled, put an asterisk near the name of the edited document etc.

Now that I wrote all this, I wonder how would I do the above in Haskell. Is someone interested in presenting working code? I would like something more than 'use the zipper monad', please.

OK

Use Scrap More Boilerplate:

Writing boilerplate code is a royal pain. Generic programming promises to alleviate this pain by allowing the programmer to write a generic “recipe” for boilerplate code, and use that recipe in many places. In earlier work we introduced the “Scrap your boilerplate” approach to generic programming, which exploits Haskell’s existing type-class mechanism to support generic transformations and queries.

This paper completes the picture. We add a few extra “introspective” or “reflective” facilities, that together support a rich variety of serialisation and de-serialisation. We also show how to perform generic “zips”, which at first appear to be somewhat tricky in our framework. Lastly, we generalise the ability to over-ride a generic function with a type-specific one.

All of this can be supported in Haskell with independently-useful extensions: higher-rank types and type-safe cast. The GHC implementation of Haskell readily derives the required type classes for user-defined data types.

Paul already gave the right

Paul already gave the right answer, I just wanted to expound on it a moment: all redundancy and boilerplate that I'm aware of is eliminated by polymorphism and polytypism combined. Most languages now support adequate polymorphism, and polytypism has been getting more attention of late. So what's left for reflection?

Serialization/pickling perhaps, which is slightly polytypic in that it's a function defined on the structure of types, but there's an aspect of serialization beyond polytypism: code loading/linking. There are two ways I can see to address this: a native facility for pickling ala Alice ML, or multistaging by interpreting a bytecode so the pickling facility is built in the language itself. The only remaining problem with the latter approach is pickling modules, which explains my interest in first-class modules. I'm still investigating solutions. Alice ML also solves this via "packages".

What is polytypism?

And how is it different from polymorphism? Is it similar to the kind of structural typing C++ templates allow or ..?

I gave an intro here.

I gave an intro here. In short, it's programming on the structure of types.

For example...

Consider the problem of writing a serialization function for some arbitrary object. Other than doing it by hand, several metaprogramming techniques are frequently brought to bear.

In languages with introspection (Java, Smalltalk, CLOS)--one can use introspection to discover an object's type, and iterate over its fields at runtime. This technique defers much to runtime, and may require downcasting, as the introspection routines typically return Object or some other root type, rather than the type of the actual object.
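
As a concrete illustration, here is a sketch of that runtime technique in Java (NaiveSerializer is an invented name). Note that every field value comes back as Object, and that setAccessible sidesteps encapsulation--the very complaint raised elsewhere in this thread. (On recent JDKs the module system may refuse the setAccessible call.)

import java.lang.reflect.Field;

class NaiveSerializer {
  // Walks an object's declared fields at runtime, emitting name=value
  // pairs. The static types of the fields are not available; everything
  // arrives as Object.
  static String serialize(Object o) throws IllegalAccessException {
    StringBuilder sb = new StringBuilder(o.getClass().getSimpleName()).append("{");
    for (Field f : o.getClass().getDeclaredFields()) {
      f.setAccessible(true); // bypasses encapsulation
      sb.append(f.getName()).append("=").append(f.get(o)).append(";");
    }
    return sb.append("}").toString();
  }
}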

In some systems, preprocessing is used to generate such things from a specification. CORBA and RPC work this way; an "IDL compiler" iterates over a type defined in an interface definition language, and emits equivalent type definitions in the target language, as well as functions to do things like pickling (marshalling) and unmarshalling.

With polytypic programming, the structure of a type is available to functions which run at compile-time.

With polytypic programming,

With polytypic programming, the structure of a type is available to functions which run at compile-time.

Not always. With term-level representation types ala RepLib, I don't see why a new type and its corresponding representation type couldn't be loaded/defined at runtime. I believe the important difference is that the signature of the representation type is known statically.

True...

when I mentioned compile-time, I didn't mean to exclude runtime. Stuff that the compiler knows can always be made available to the runtime in suitable fashion (though many PLs still discard much of the information retained by the compiler).

It's going the other direction that's hard.

I think what you are really

I think what you are really trying to say is that it can't be done if there's type erasure.

Yes, and no...

in a statically-typed language where the most specific type of every term is known, type queries can be made available to the runtime at the point of request, without having to make type info generally available.

If I have the following program (in pseudocode):

let Integer a = 5;

...

let Type typeOfA = typeof(a);

the RHS of the second statement is known at compile-time; and can be treated like any other literal.

If, on the other hand, I can weaken types (as in an OO system with subtyping, or in a language with algebraic sums), then some runtime info will be needed to rediscover the strongest type. As similar runtime machinery is also needed to do dynamic dispatch in such languages, 'tis no big deal.

There's different levels of type erasure, of course. Discarding metadata is IMHO seldom useful, especially if there's only a per-type penalty rather than a per-instance or per-reference penalty. OTOH, compiling to code in a type-unsafe language (such as a machine's instruction set) which contains no typechecks, but works by virtue of being the output of a type-safe translator, is frequently a very useful thing to do.

Preventive maintenance

But it does make sense to always push the role of compile-time checks to the limit and reduce the use of execution checks. The distinction is important. It is a commentary on conventional thought that the usual reaction is that it's "faster" to do the checks at compile time; the true advantage is preventive maintenance.

Use the Command Pattern.

Use the Command Pattern. Read the command to invoke from the XML file, instantiate a Command object, filling in whatever parameters are necessary based on that command, and invoke it. No reflection, nor even introspection, necessary. Yes, you end up with a method that dispatches XML commands to Command objects, but this can be automatically generated by a pre-processor.
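
A minimal sketch of that dispatch table in Java (Check and Checker are invented names, and the registered paths are made up): the mapping from profile entries to test code is ordinary data, and--as noted above--the registration block is exactly the part a pre-processor could generate.

import java.util.HashMap;
import java.util.Map;

interface Check { void run(String path); }

class Checker {
  private final Map<String, Check> checks = new HashMap<>();

  Checker() {
    // Hand-written here; in practice this table would be generated
    // from the profile by a pre-processor.
    checks.put("config.xml", path -> System.out.println("checking attributes of " + path));
    checks.put("data.bin", path -> System.out.println("checking contents of " + path));
  }

  void dispatch(String path) {
    Check c = checks.get(path);
    if (c == null) throw new IllegalArgumentException("no test for " + path);
    c.run(path);
  }
}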

I just completed a work assignment maintaining a multi-million line Java environment that used reflection to do precisely what Kay describes -- reading a DSL and dispatching to methods to invoke each DSL statement's functionality. Debugging it was A NIGHTMARE. You'd be diagnosing the problem up until you get into the reflector, and then, *poof* the trail ends and there's nowhere for your debugger to acquire additional context from.

Setting breakpoints was futile because there were many classes which implemented commands--which one was used? In short, it is the worst experience of my professional coding career.

Both reflection/introspection

and "use of a preprocessor to generate Command objects", are simply different means of metaprogramming.

Run-time v. Compile-time

There's a big difference between compile-time and run-time metaprogramming. Though introspection is a pretty trivial thing to do given compile-time metaprogramming.

The difference isn't that important

If the DSL could have called any method in the entire system, his "command pattern" generator would have statically created something much like what the Java runtime does. So, modulo dynamic loading considerations, the difference isn't that important in this case.

In essence he created a (non-destructive) reflection system for a statically determined subset of the available methods.

[Edit: The original author put too much emphasis on the "command pattern". In a functional language the same could be done by creating a map from strings to function values. In a first order procedural language a giant switch could have been used.]

Is it necessary or just another name?

Do dependent types (as in Epigram) make "mutable" reflection unnecessary, or is it just another name for a more structured version of the same thing? I'm suspicious of reflection systems that use regular expressions. If we're going to reflect and mutate, we should do it with some kind of control.

Pointers anyone?

Another point of view on reflection: basically, it is how one programs without pointers. Java has reflection because it doesn't have pointers. You can't write a Lisp-like language/interpreter in Java without reflection. List structure with dotted pairs requires pointers. Judging from some early Java books, this is actually the thought process behind Java reflection. They eliminated pointers to get rid of all those pointer errors and re-invented Lisp, I mean reflection, as a replacement.

Edit: I just woke up from a really strange experience and realized that this makes no sense at all. Consider it deleted.

I don't see how pointers are

I don't see how pointers are relevant to any of your arguments, and your lisp argument is bizarre. If JRuby is possible, then JLisp is certainly possible as well.

Reflection as in Java is certainly a more principled way of interpreting a data structure than trying to interpret its pointer layout, but who actually does that? If anything, reflection is more like Ada's "type model" approach, where a type declaration allows the compiler to generate all sorts of other information about the type and provide it to the programmer as attributes (in this case, the declared type's internal structure).

Not so

You can't write a Lisp-like language/interpreter in Java without reflection.

Java minus reflection is still Turing equivalent. You can create an interpreter for any Turing equivalent language in any other Turing equivalent language.

List structure with dotted pairs requires pointers.

Depends on what you mean by "pointer." Since you said that Java doesn't have pointers I assume you mean that dotted pairs require C style pointers. In that case, I beg to differ. Dotted pairs require references similar to what Java (and a bunch of languages) use.

But it doesn't matter either way. If it really needs C-style pointers, then since we're talking about an interpreter we could, theoretically, create "pointers" that are offsets in a giant vector of bytes if we wanted to. That's not how I would write a Lisp in Java, but it certainly could be done.

Basically it is how one programs without pointers.

There is a large set of languages with neither pointers nor reflection. Are you saying one can't write programs in Haskell98? Here's an implementation of a large subset of Scheme in Haskell.

Dotted pairs require

List structure with dotted pairs requires pointers.

Depends on what you mean by "pointer." Since you said that Java doesn't have pointers I assume you mean that dotted pairs require C style pointers. In that case, I beg to differ. Dotted pairs require references similar to what Java (and a bunch of languages) use.

Linked lists are implemented in Java by reflection. See for example Chapter 14 of "Java by Example" where they implement a Lisp in Java using reflection.

There is a large set of languages with neither pointers nor reflection. Are you saying one can't write programs in Haskell98?

Lisps such as Common Lisp are highly reflective, as pointed out above; the same could be said for Haskell. But to achieve this type of behavior in C or C++ requires the use of a pointer type.

Linked lists are implemented

Linked lists are implemented in Java by reflection. See for example Chapter 14 of "Java by Example" where they implement a Lisp in Java using reflection.

So? Because one misguided example uses reflection, they must ALL use reflection? Here's a linked list:

public class LinkedList {
  Object current;
  LinkedList next;

  public LinkedList(Object value, LinkedList next) {
    current = value;
    this.next = next;
  }

  public Object getCurrent() { return current; }
  public LinkedList getNext() { return next; }
}

In what way is reflection required?

Lisps such as Common Lisp are highly reflective, as pointed out above; the same could be said for Haskell.

Haskell is not reflective. Embedding a model of a machine (ie. Lisp) within another machine (ie. C or Haskell) means you can inspect the embedded machine's state. You don't need pointers or reflection to do so at all. Efficiency is the only reason you might use pointers (ie. address arithmetic) in a C implementation of Lisp. Address arithmetic is the only distinguishing feature of pointers from Java-like references, and references are all you really need.

You messed up your definition a bit

A cons cell need not point to another LinkedList structure in its cdr. You need to have a distinguished nil object at least to terminate the list and you can also have "dotted lists" of the form (1 . 2), where both the car and cdr point to objects. So a closer to realistic implementation would look like this:

public class Cons extends LispObject {
  LispObject car;
  LispObject cdr;

  public Cons(LispObject _car, LispObject _cdr) {
    car = _car;
    cdr = _cdr;
  }

  public LispObject car() { return car; }
  public LispObject rplaca(LispObject _car) { return car = _car; }
  public LispObject cdr() { return cdr; }
  public LispObject rplacd(LispObject _cdr) { return cdr = _cdr; }
}

This is, of course, quite an abstraction, and the nuts and bolts of the interface would need to be tuned to achieve any sort of performance, but your main point was correct - introspection is not needed to create an interpreter. If you want to talk about a compiler, though...

Nil = null. It's not very

Nil = null. It's not very safe, but it's minimal.

But you said:

But you said:

So? Because one misguided example uses reflection, they must ALL use reflection?

It seems to me that if you are going to call someone else's example "misguided", then your counterexample should be exemplary, and usable by a client programmer without unintended consequences. Your example is not exemplary because, as you said, "it's not very safe". I'd also say it's not "minimal" but less than minimal. It doesn't satisfy the mathematical recursive definition of a list.

It's not supposed to satisfy

It's not supposed to satisfy the "mathematical recursive definition" of a list because neither lisp nor the CS concept of a linked list directly does so. Linked lists and recursive lists are not the same concept, however closely related they may be.

Nor should we expect counterexamples to be perfectly engineered software for client programmers when they're simply there to demonstrate that a statement is patently false and that using the language feature in question is introducing huge amounts of unnecessary complication. It would be nice to have some time for actual discussion rather than polishing examples.

Reflection in Java

In what way is reflection required?

It is required if you want to create a dynamic Lisp-like language in an environment without pointer types, such as Java. This is the point I was trying to make above.

Reflection is not required,

Reflection is not required, which is the point I was trying to make above.

Is it possible that you're

Is it possible that you're confusing 'reflection' with 'references' (the most likely candidate for "Java's replacement for pointers")?

This thread has gone into bizarro world

Linked lists are implemented in Java by reflection.

A statically type safe linked list can be implemented in most any language with first order polymorphism - e.g. Java - without anything like reflection.

But we were talking about heterogeneous pairs, not linked lists. Here's the thing: a program written in Java to use heterogeneous pairs for arbitrary linked lists would certainly be awkward at best, due to the way its type system works. But that's beside the point. It doesn't undermine my ability to write a Lisp interpreter in Java.

I might start with an interface called LispObject which specifies a tag method that returns the runtime "type" of my LispObject. DottedPair would implement that interface and would have LispObject members called car and cdr. There'd be similar classes to implement symbols and numbers and all the other Lisp stuff you've come to know and love.
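
A sketch of that design in Java (all names invented; following the "Nil = null" suggestion elsewhere in the thread, null stands in for nil):

interface LispObject { String tag(); }

final class LispSymbol implements LispObject {
  final String name;
  LispSymbol(String name) { this.name = name; }
  public String tag() { return "symbol"; }
}

final class DottedPair implements LispObject {
  LispObject car, cdr; // plain references; no reflection anywhere
  DottedPair(LispObject car, LispObject cdr) { this.car = car; this.cdr = cdr; }
  public String tag() { return "pair"; }
}

// The list (a b) is then just:
//   new DottedPair(new LispSymbol("a"),
//                  new DottedPair(new LispSymbol("b"), null));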

Lisps such as Common Lisp are highly reflective, as pointed out above; the same could be said for Haskell.

Are we talking about the same Haskell? Or the same meaning of the word "reflective?"

But to achieve this type of behavior in C or C++ requires the use of a pointer type.

Yes, but what does that have to do with implementing a Lisp interpreter in a language with neither pointers nor reflection? Again you can write an interpreter for any language, even a highly reflective one like Common Lisp, in any other Turing equivalent language. That's kinda fundamental to computer science. It might be tedious, awkward, or ugly but it's always possible.

Haskell98 is in no way

Haskell98 is in no way reflective - sum types do not count! - and while GHC supports some features that can be considered reflective, the Dynamic type is at best a limited form of introspection.

Types

Wow, and I thought this would be uncontroversial. I read over Chapter 14 in "Java by Example" and find that the reason for the 18 pages of source code called java.lang.reflector is to find objects and methods of the right type. It is not really the absence of pointers since references are used instead. The need for a separate reflection capability in a Lisp interpreter seems to be due to types and not pointers or references.

But still, there doesn't seem to be a need for any separately defined reflection in the C/C++ system. I would like to discuss this more, but it may be too unfamiliar to me.

Still not needed, unless...

Wow, and I thought this would be uncontroversial

If all you want to do is build a Lisp interpreter in Java then you don't need anything like reflection. There shouldn't be any controversy here. Turing and Church proved it in the 1930s.

If, on the other hand, you want to build a Lisp interpreter that can interact with arbitrary Java libraries then you'll need reflection of some sort (e.g. Java's reflection libraries or dynamic byte code generation or both).

But still, there doesn't seem to be a need for any separately defined reflection in the C/C++ system

A Lisp interpreter in C or C++ could not call just any existing library. In stock standard ANSI C and C++, pointers or not, there's no standard way to find a pointer to arbitrary code at runtime given just the name of a function or whatever. DLLs and SOs and the libraries to work with them add some reflective capability in order to support dynamic linking. Things like COM go even further. But all these aren't part of C or C++. They're separately defined reflection systems.

[Edit: but "separately defined" is a side-issue. The real point is that a pointer gives you a handle to something, but it doesn't automatically give you any particular power to find that something in the first place.]

Functional reflection

One can use introspection and some form of "eval()" in a functional language to attain the same ends as a "mutating reflective" language.

For example, using a non-mutating form of lisp, one could introspect on functions (looking at their tree representation), build new forms, and execute them.

If one did this in Haskell (or camlp4?) you could do it all with type-safety (see naasking's note on staging, and Oleg et al.'s work on staging (e.g. here) seems relevant too, although I certainly don't understand it all).

If you use linear types, you could even have the _implementation_ do 'destructive' updates behind-the-scenes, while having none of this visible at the language level (although I haven't seen any implementation of linear types that needs _no_ annotations, but if you're writing higher-level code that only takes advantage of e.g. a library you might not need any).

Henry Baker wrote a lot about linear types (see e.g. here), and as I recall a lot of his interest was in embedded systems (e.g. 'The Forth shall be First'). When I say you can use "eval()", a full-fledged parser+compiler obviously won't work for most embedded systems, but if the eval() is a suitable compile-time macro, or if one instead works with a parsed (tree) representation and the compiler is able to understand how to transform the needed manipulations into corresponding manipulations of the compiled code, then this can still work.

I'm still waiting for a nice functional language (typed scheme?) making good use of linear types..

Reflection is good, but costly

I believe that reflection is good, at least in large software systems, but it does have a cost.

The most interesting reflection is "compiled reflection", which requires that the system contain a compiler able to generate code. This is interesting but intrinsically costly. On embedded systems (like those targeted by Virgil) this is too expensive.

A less interesting reflection is "interpretative reflection", which is just the ability to introspect, and perhaps update, the system state (including its call stack) at runtime. This requires a compiler (external to the system) and a runtime (included in the system), so it also has a cost. During execution, the accesses are essentially interpreted (and so are costly), as in e.g. the JVM.

As mentioned before, similar to that is the ability to do multi-staged computation, as in e.g. MetaOCaml.

I believe that reflective systems are necessarily complex (and I accept this) but this complexity is sometimes too costly (in some embedded systems, like my dishwasher or my 10€ watch, it is too expensive).

I believe that reflective

I believe that reflective systems are necessarily complex (and I accept this) but this complexity is sometimes too costly (in some embedded systems, like my dishwasher or my 10€ watch, it is too expensive).

Forth is low-level, reflective, and small. I think Forth serves as a counterexample to your claims.