Program in a program

A couple of weeks ago (the LtU database seems corrupted, so I can't find the link) I posted about my theory of data parallel computing. I make a bunch of claims (not necessarily in that paper) that my idea can lead to more efficient software than some other parallel programming systems, and I've been wondering why that is.

At some level, my Integrative Model for Parallelism is based on dataflow, used as an intermediate representation, to be exact. And I think one reason dataflow approaches can be efficient is that the dataflow graph is a representation of the program in the program. By building up a dataflow graph, the program reconstructs a limited version of itself that can then be analyzed by a higher-level tool than either the compiler or a traditional runtime.

My question to this esteemed audience: surely I'm not the first one to observe this. Is there a formalism, or at least a vocabulary to describe this phenomenon?

(Background info: I know that dataflow is a 1980s, early '90s at best, phenomenon, but it's been enjoying a resurgence, with OpenMP tasks (OMP v3 and later), Intel TBB & CnC, and dedicated linear algebra software such as SuperMatrix and Quark. In all these cases the C/Fortran program creates a task graph out of ordinary function pointers and submits it to some scheduler which can be defined at a fairly high level, in a library itself written in C.)




That's the one.

partial evaluation, adaptive optimization, staged programming

I'm not sure I understand what phenomenon you're trying to describe, but you might look into partial evaluation, adaptive optimization, dynamic recompilation, supercompilation, and staged programming for some appropriate vocabulary.

Multi-stage programming

Thanks. It'll take me a while to investigate all these concepts, but I'm reading Walid Taha's thesis, "Multi-stage Programming", and I'm finding some of the ideas there:

"Having Run in the language is important if we want to use code constructed using the other MetaML constructs, without going outside the language."

So there you have the program-in-a-program concept fairly explicitly.


I recently realized that you can abuse C# generics + structs + interfaces for a kind of multi-stage programming. In C#, if you have a generic type or method Foo<T> and T is a struct, the CLR will create a specialized version Foo_T. You can exploit this by representing your data at the type level. For example, you can represent a regex at the type level with the types Alternative<A,B>, Sequence<A,B>, Repeat<A>, and Character. This lets you write a Match method that causes the CLR to generate a specialized automaton.

I'm not sure if the CLR

I'm not sure if the CLR actually specializes all generic instantiations, though. At any rate, I find this behavior quite useful for generic programming, whether or not it forces specialized versions.

AFAIK the CLR specializes

AFAIK the CLR specializes for all value types, but shares code for all reference types. So if you define Alternative<A,B>, Sequence<A,B>, Repeat<A>, and Character as structs then any method that receives them as a type argument gets specialized.

Nice to know. Do you have a

Nice to know. Do you have a link to evidence for this assumption? It's something I've wondered about but have never been able to really verify; if true, that could be very useful.


There aren't many assurances.

There aren't many assurances. It is probably safer for my use case to just reinvent templates, especially since I'm using the DLR to generate code anyway.

A more authoritative link,

A more authoritative link, from the CLR program manager.

Bling builds dataflow graphs

Bling builds dataflow graphs at run-time to be compiled into databinding code, physics code, GPU code, and so on. It is quite efficient, but the phase distinction makes debugging very hard. The more indirect programming gets, the harder it becomes.