Lambda the Ultimate

inactiveTopic Ed Felten: Programs vs. Data
started 2/5/2003; 12:10:46 PM - last post 2/7/2003; 2:05:21 PM
Ehud Lamm - Ed Felten: Programs vs. Data  blueArrow
2/5/2003; 12:10:46 PM (reads: 1981, responses: 10)
Ed Felten: Programs vs. Data
Hal Abelson starts the forward he wrote for EOPL with following observation:

This book brings you face-to-face with the most fundamental idea in computer programming: The interpreter for a computer language is just another program.

This observation is so basic for understadning programming languages that it is easy to forget just how profound it is. Ed Felten is writing about programs vs. data, which is basically another way to approach this fundamental truth of computer science.

A few more words from Abelson:

It sounds obvious, doesn't it? But the implications are profound. If you are a computational theorist, the interpreter idea recalls Goedel's discovery of the limitations of formal logical systems, Turing's concept of a universal computer, and von Neumann's basic notion of the stored-program machine.

Indeed, without coming to terms with these basic ideas of computability theory it is impossible to study programming languages. For example, type systems (a favorite topic around here) should be decidable. Interesting computational properties are in general not decidable. Remarkably, a type system can ensure that all program terminate, but cannot ensure that all programs written in a Turing complete language not encounter divide by zero problems.

The subtle difference between these two goals is an attribute of universality, which is grounded in overcoming the distinction between programs and data.

By the way, from a computer science perspective, this topic is closely related to another pet peeve of Felten's The Fallacy of the Almost-General-Purpose Computer.


Posted to general by Ehud Lamm on 2/5/03; 12:25:26 PM

Ehud Lamm - Re: Ed Felten: Programs vs. Data  blueArrow
2/5/2003; 2:27:41 PM (reads: 1266, responses: 0)
Note that the argument above can be used to argue that a course in programming languages is required for decent CS education.

Max Dornseif - Re: Ed Felten: Programs vs. Data  blueArrow
2/5/2003; 3:30:27 PM (reads: 1225, responses: 2)
So is my understanding right that there is no information-theoretical way to formally destinguish between program and data?

Ehud Lamm - Re: Ed Felten: Programs vs. Data  blueArrow
2/6/2003; 3:45:56 AM (reads: 1199, responses: 1)
It's not really an information theory question. Information theory can in fact give you useful clues (e.g., code and data often have different entropies). But from a theoretical computer science perspective there is no formal way to distinguish code from data in general. Practical special cases exist, but not a general distinction.

Notice that one man's program can be another's data.

When I write a program in emacs, the program is data. When I compile it, the program is again data (for the compiler). The resulting object code is data for the loader. The object code in memory, when executed, is "program", but may still serve as data for a debugger.

Things are even more fun when you think about virtual machines (like the Java JVM). The JVM is executing your compiled Java program, so the class file is a program. But if you examine the JVM itself, you notice that treats the class files as data (read from file and interpreted).

These cases are not special cases. The same goes for HTML in the browser (code) and in the validator (data) etc.

You may wonder whether it is possible to answer the original question, when considering pairs of (program,data), instead of single values. For example (browser,html file) will return "program" where as (editor, html file) will return "data". This is also not possible in general, but I'll leave the explanation for another time (essentially, this approach requires you know what the program "does", and this is undecidable).

Alex Sauer-Budge - Re: Ed Felten: Programs vs. Data  blueArrow
2/6/2003; 5:48:51 AM (reads: 1137, responses: 1)
Ehud wrote:
Remarkably, a type system can ensure that all program[s] terminate, but cannot ensure that all programs written in a Turing complete language not encounter divide by zero problems.

I am curious about this statement (and still very new to programming language theory). I looked in a few online PL textbooks but didn't find this proposition discussed. Can a type system for a "useful" language be devised that ensures all programs written in the language terminate? If I can do something "useful", like find the root of a function with Newton's method, does this mean that the type checker will itself prove the necessary Newton convergence theorems? Since the convergence of Newton's method depends on the properties of the function whose roots are sought, does this mean that the function must be supplied at compile-time (furthering the ambiguity between program and data)?

Ehud Lamm - Re: Ed Felten: Programs vs. Data  blueArrow
2/6/2003; 5:58:05 AM (reads: 1244, responses: 0)
While we are discussing these fascinating and fundamental questions, let me mention that they are related to another important and interesting phenomenon: self replication.

The possibility of self reproducing automata is quite amazing. By showing the self reproducing automata exist we can convince oursleves that self reproduction (or replication), a major feature of biological life, can be mechanized, and its computational properties studied.

Johnny von Neumann studied this question in the late 1940s (before the discovery of DNA!) His Theory of self-reproducing automata is a tour de-force.

A noteworthy feature of von Neumann's model of self replication is the double-faceted use of the information that serves as the blueprint: It first serves as instructions to be interpreted so as to construct a new universal constructor, after which this same blueprint is copied unmodified, to be atached to the new offspring constructor - so that it may replicate in its turn.

For more information about self-replication research and von Neumann's seminal work read Moshe Sipper, Fifty years of research on self-replication: An overview, Artificial Life, vol. 4, no. 3, pp. 237-257, Summer 1998.

Michael Vanier - Re: Ed Felten: Programs vs. Data  blueArrow
2/6/2003; 6:19:18 PM (reads: 1064, responses: 1)
Note that the argument above can be used to argue that a course in programming languages is required for decent CS education.

Does anyone really argue against this?

Note that proving program termination is in general also undecidable, except in highly impoverished type systems (e.g. simply typed lambda calculus).

Ehud Lamm - Re: Ed Felten: Programs vs. Data  blueArrow
2/7/2003; 1:26:45 AM (reads: 1098, responses: 0)
I assume everybody here knows about the halting problem...

Frank Atanassow - Re: Ed Felten: Programs vs. Data  blueArrow
2/7/2003; 5:16:05 AM (reads: 1042, responses: 1)
I am curious about this statement (and still very new to programming language theory). I looked in a few online PL textbooks but didn't find this proposition discussed. Can a type system for a "useful" language be devised that ensures all programs written in the language terminate?

Yes, for example Charity. Dependent type systems are another approach, but I don't think they're realistic.

If I can do something "useful", like find the root of a function with Newton's method, does this mean that the type checker will itself prove the necessary Newton convergence theorems?

It is more like you are proving convergence, and the proof you write is executable. The typechecker's role is to ensure that the code you write is a proof; it won't magically come up with a convergence proof for you.

Since the convergence of Newton's method depends on the properties of the function whose roots are sought, does this mean that the function must be supplied at compile-time (furthering the ambiguity between program and data)?

No.

If termination depends on the properties of a function, then in a Charity-like language you will need to come up with an inductive datatype for representing functions because termination is ensured by structural induction. This means you will need to define a little language of terms which denote functions and compute the necessary properties by recursing over them in a particular manner.

In this case, the functions are functions on reals, though, and I am pretty positive you will not be able to define such a datatype in Charity. Your only hope for coding up Newton's method in Charity would be to figure out a way of generalizing it to a more discrete domain.

In a dependently typed language, it would be possible (in a dependently typed language anything is possible), but you would undoubtedly need a huge library encoding information about Dedekind or Cauchy reals, real analysis, etc. Coding up such stuff is difficult enough; making it efficient would be a vast undertaking.

Ehud Lamm - Re: Ed Felten: Programs vs. Data  blueArrow
2/7/2003; 5:44:46 AM (reads: 1079, responses: 0)
To clarify, the subtle difference between the two statements I gave is that by ensuring termination, the type system essentially makes our language less then Turing-complete. However, if the language is Turing complete (as required by the second statement above), non-trivial properties are undecidable.

Ehud Lamm - Re: Ed Felten: Programs vs. Data  blueArrow
2/7/2003; 2:05:21 PM (reads: 1040, responses: 0)
Seeing as you mentioed Charity, I wonder if anyone has links to interesting Charity code examples? The Charity website leaves much to be desired.