Term Rewrite System Implementations?

Is anyone aware of concrete term rewriting systems in the form of computer languages? It seems to me a very promising area as a programming paradigm, yet I have found no sign of it as far as I have searched the web, not even on the Wikipedia page on programming paradigms. Are we looking at a yet unexplored area of programming? In my personal opinion, term rewriting might be the way humans think at some higher abstraction level, above all the neural networks native to the human brain. The essence of term rewriting seems so natural to me as a thinking framework for solving math, physics, chemistry or logic problems that I'm surprised there are no implementations of it in the form of programming languages.

Does anyone know of any implementation? And should term rewriting be accepted as a distinct programming paradigm?

Thank you,
Ivan

Term-rewriting programming

hidden

So it was hidden under functional languages? Thank you very much for the link, it is a valuable resource to me. I find Refal and Stratego/XT especially interesting.

Still, I wonder why term rewriting is not more popular for programming commercial applications. Maybe because of a lack of dedicated programming libraries?

To that list of languages

I've seen Curry mentioned. Maybe it should be in that list of languages.

Corrections to the list and pitch for Pure.

Clean belongs to the Haskell family, so it's not generalized term rewriting (that is, it has constructor discipline). I don't know much about Refal, but it appears to be eager with constructor discipline as well.

Q is the ancestor of Pure, and has been abandoned for a while in favor of Pure, which basically subsumes it (Q is a pure interpreter, Pure is an LLVM-based JIT compiler).

Pure, on the other hand, is the real thing. It is eager, dynamically typed (all terms belong to a single Herbrand universe), and has no distinction between constructors and functions (which means rules are not guaranteed to be confluent, as that is inconsistent with Turing-completeness).

That link isn't working.

It comes out in blue boldface but isn't clickable because, although it's an <a> element, it has no href attribute.

Sssh...

According to the Clean people, Clean is older than Haskell. I think they find the notion a bit offensive but, anyway, there are only four or five of them, so if you don't mention it too loudly it's okay.

Clean generalizes term rewriting to graph rewriting. AFAIK, early Haskell implementations did the same.

I don't see how confluence affects Turing completeness. A language can be confluent and still be Turing complete...

There is rewriting and rewriting

The operational semantics of functional languages such as ML, Haskell, and Clean may often be expressed as rewrite systems (for call-by-need you need to either manipulate graphs, as you mention, or have non-local rewrite rules that also update a store). Each program then corresponds to some rewrite system, but the converse is not in general true: for an arbitrary rewrite system it is not clear whether it corresponds to a program in those languages.

By contrast, most of the systems discussed here strive to let you write down *any* rewrite system (usually within a class of systems with certain natural properties) and think of it as a program. This means that the expressive freedom, the set of useful concepts, and the implementation techniques are in practice very different.

I don't see the problem

Each program then corresponds to some rewrite system, but the converse is not in general true: for an arbitrary rewrite system it is not clear whether it corresponds to a program in those languages.

This seems counterfactual. A rewrite system normally has a precise and clear 'operational' semantics, which should be implementable in any Turing-capable language.

Link?

Hm

No, I think my claim is correct.

There is a difference between "the operational semantics of my language in terms of rewrite systems determines that program P corresponds to the rewrite system S" and "program P implements the rewrite system S". Any program in a language with a small-step operational semantics can be seen as a rewrite system, but if you implement a rewrite system S as a program P, the rewrite system naturally associated to P by the operational semantics will *not* be S. It will be a larger rewrite system that does one or several reduction steps for each corresponding step in S, and results in a larger state from which a simpler state in S can be decoded.

I'm claiming that the rewrite systems that correspond to the operational semantics of programs do not cover the whole space of rewrite systems (or even most reasonably defined sub-classes of them); this does not contradict the fact that they can still *implement* (encode) arbitrary rewrite systems.

Seems obvious

It seems obvious to me that an arbitrary rewrite system may not be expressible within the limitations of any given language. Let's call a rewrite system that implements Haskell 'H', and a rewrite system that implements x86 assembly 'X'. It should be obvious that a program 'P' written in X is not always a valid program in H (because H limits programs to those that are sound in its type system); however, we should be able to rewrite 'P' into a new program 'Q' that observably does the same thing as P but is a valid program in H. Or have I missed the point?

Rewrite System != Program

I don't see how a program corresponds to a rewrite system?

A rewrite system is a grammar plus a rewrite strategy. You can define a program in a rewrite system as an expression according to the grammar.

Looks like bollocks to me.

rewrite sequence == program

Bullshit

And what do programs have to do with typed systems? Programs can be untyped.

Easy now, here is your answer

This statement seems reasonable to me: "A proof is a program; the formula it proves is a type for the program."

Edit:
Suppose you have a rewrite rule:

sum (a, b) -> a + b

"sum (a, b)" is a type, "a + b" is a result, the whole expression is in the same time a program, and a single step proof that "sum (a, b)" corresponds to "a + b".

Rewrite System != Type System

But the general question is what programs have to do with rewrite systems. As I stated, a rewrite system is a grammar with a rewrite strategy; a program is an expression in that grammar.

That doesn't mean the grammar describes a typed system. So the CH correspondence has nothing to do with it.

If you combine

If you combine multiple rewrite rules, you get a complex function that takes some input on one end and returns some output on the other end. In between is a calculation that gradually transforms the input into the output. And the input has some type, just as the result does. Now, if that is not a program, someone please explain, because I'm getting confused.

This is my understanding:

program == function, right?
function == rewrite rule sequence, right?
rewrite rule sequence == proof, right?
-----------------------------------------
so 1. program == rewrite rule sequence
so 2. program == proof
so 3. function == proof


Am I missing something? Isn't a rewrite strategy itself a program that transforms a typed input to some output?

Untyped SK

Let's take a small rewrite system.

K x y   = x
S x y z = x z (y z)

The rewrite strategy tells you how to rewrite. A program is something like "SK(KS)SKKSS(SKSKKSS(SKKKSSSK)SSKSS)".

No types needed.
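
For what it's worth, here is a rough Haskell sketch (an illustration added here, with invented names) of how such an untyped combinator system can be executed: terms are just S, K and applications, and the strategy keeps rewriting the outermost redex it can find.

-- A rough sketch (assumed names): untyped SK terms with a
-- leftmost-outermost rewrite strategy.
data SK = S | K | App SK SK deriving Show

step :: SK -> Maybe SK
step (App (App K x) _)         = Just x                          -- K x y   = x
step (App (App (App S x) y) z) = Just (App (App x z) (App y z))  -- S x y z = x z (y z)
step (App f a)                 = case step f of                  -- otherwise reduce a subterm
                                   Just f' -> Just (App f' a)
                                   Nothing -> App f <$> step a
step _                         = Nothing

-- Rewrite until no redex remains (may not terminate in general).
normalize :: SK -> SK
normalize t = maybe t normalize (step t)

-- e.g. normalize (App (App (App S K) K) S)  ==>  S   (since S K K behaves as identity)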

Rules take expressions

that have a syntax type by their nature (for example, math or chemistry expressions). The contained expressions, on the other hand, could represent even untyped programming languages such as BASIC or the like. It is about typing at a different level: I'm talking about the syntax types of the accepted left-hand sides of rules.

P.S.
I imagine a program as a set of terminals that get transformed to other terminals by rewrite rules.

P.P.S.
I'll admit, the Curry-Howard correspondence might still be misunderstood by me, but I'll try to explain it another way: imagine a correspondence between a series of applied rules and a series of computations. I still claim that any set of rules represents a declarative program in the sense of its possible input and defined output.

well, sort of


program == function, right?

Some programs are functions. The difference between the two is stuff that these days is often marginalized, literally and figuratively, by calling it "side-effects".

function == rewrite rule sequence, right?

Well, invoking a function can be represented by a sequence of rewrite-rule applications, supposing the language has known small-step operational semantics. Most sequences of rewrite-rule applications don't happen to be instances of the small-step operational semantics of some known programming language.

rewrite rule sequence == proof, right?

Any finite sequence of rewrite-rule applications proves that the first term of the sequence can be reduced to the last term of the sequence by means of the rewrite rules that were applied. It's not really insightful in general to call that a "proof". Nor to treat the theory of the associated rewriting system as a "type theory".

The Curry-Howard correspondence, even when generalized, is an isomorphism between reasoning about propositions in certain formal logics and reasoning about types in certain programming systems (the basic one is propositions in intuitionistic logic and types in typed lambda-calculus). It is not a correspondence between formal logic and computation, as such.

As a practical matter, rewriting systems can be used to program a system whose purpose is to transform terms over some term algebra. Rewriting systems are mainly of interest when they are Church-Rosser and compatible; proving Church-Rosser-ness requires some care for the form of the rewriting system. If you need to write that sort of program, and you can afford to guarantee CR compatibility, you may find it convenient to program using a rewriting system.

A sort of, agreed.

The point is: rewrite rules can be used for computing values. They take a typed input and they produce an output. In that way, they represent declarative program code very similar to functional programming.

They take a typed input and

They take a typed input and they produce an output.

Somewhat, yes. Though how typing applies is not entirely obvious to me. And confluence is a consideration that doesn't ordinarily come into play but would with rewriting systems. The essence of both regular CRSs and regular SRSs, as I recall, is that applying any rewrite rule will not disable any other redex. So those confluence proofs don't help with βη-calculus.

Got me there...

I have to admit that expressions like CRS, SRS, redex and βη-calculus confuse me. Though I understand what confluence is.

These are types I'm talking about.

Well, fwiw,

I can maybe explain the βη thing, which illustrates the part that matters about the other ones.

The β-rule is the sole rewrite rule of λ-calculus:

((λx.M) N) → M[x ← N]

That is, if you have a redex that consists of a λ-function applied to an operand, you can rewrite that redex as the body of the function (M) with the operand (N) substituted for the parameter (x).

For "ordinary" λ-calculus, that's the only rewrite rule. However, there's another rule that sometimes comes in really handy, called the η-rule, which says that an expression (λx.(M x)) is equivalent to M. Which makes sense; after all, the only thing you can actually do with (λx.(M x)) is apply it to an operand, and applying that to any operand is equivalent to just applying M to the operand.

(λx.(M x)) → M

If you study partial evaluation, which is basically optimizing a program based on partial (incomplete) knowledge of what the inputs will be, from a λ-calculus perspective that sort of optimization is heavily dependent on η-reduction.

Now, it turns out that if you have both the β-rule and the η-rule, the calculus is still Church-Rosser. (Church-Rosser means that if a term is reducible in two different ways, then there's always some term that both paths can be further reduced to: for all T1,2,3, if  T1 →* T2  and  T1 →* T3  then there exists some T4 such that  T2 →* T4  and  T3 →* T4.) The point about Klop and all that is simply that you have to construct a custom proof that it's Church-Rosser, because the combination of the β- and η-rules isn't covered by these big general Church-Rosser-ness theorems. The key requirement for those big general theorems is that the rewrite rules not interfere with each other's redexes in exactly the way that the β- and η-rules do interfere with each other's redexes. That is, suppose you have a term of the form

((λx.(M x)) N)

There are two different redexes here; the whole thing is a β-redex, and the λ part of it is an η-redex. But the big general theorems require that when you reduce one redex, it doesn't mangle any other redexes in the expression, and in fact if you reduce either of these redexes it causes the other one to disappear. So the big general theorems don't apply to a calculus that has both the β- and η-rules in it (even though βη is a very well-known combination of rules and isn't hard to custom-prove Church-Rosser if you've already proven that the β-rule alone is Church-Rosser).
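
A worked instance of that critical pair (an illustration added here, in the notation above, and assuming the usual side condition that x does not occur free in M): both one-step reductions land on the same term, which is why βη can still be proven Church-Rosser by a custom argument.

((λx.(M x)) N) → (M N)   by the β-rule (substitute N for x in (M x))
((λx.(M x)) N) → (M N)   by the η-rule (first collapse (λx.(M x)) to M)

Whichever redex you pick, the other one disappears, yet the two paths already agree after a single step.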

βη

Thanks for the quick tour. If I understood correctly, the β-rule does parameter application (for saturating function parameters), and the η-rule does parameter abstraction (for forming functions). It is nice to see some technical background behind common sense.

Actually, this duality between the β and η rules, in my own understanding of it, has been following me for quite a long time now. I've tried several times to rewrite them both using a single new, non-dual atomic operator similar to logical implication, but without success. Whichever way I flip it over, both rules need to exist as different atomic operators.

I don't find Church-Rosser-ness very useful. What if an expression has to evaluate to several different results (even math has such examples)?

I'm not sure of that

I'm not sure of that description of η. The η rule allows you to eliminate a λ when all it says is "apply such-and-such function to the given operand". Why apply a function that says to apply another function, when you could just apply the other function directly?

If you want to know all the possible roots of an equation, say, you don't want to nondeterministically get some root; you want a list of all of them (or some sort of collection of all of them). So you still want a single result. Church-Rosser-ness means that when you're reasoning about the behavior of the rewriting system, you can always assume that ambiguity in the order of reductions doesn't matter: no matter which redex you exercise first, a wrong decision can't cut you off from reaching the final answer. In essence, Church-Rosser-ness allows you to get a deterministic result from a nondeterministic reduction process.
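
A tiny worked illustration of that last point (added here, not from the thread): with ordinary arithmetic rewriting, the term (2+3)*(4+5) contains two redexes, and whichever one you reduce first, you reach the same normal form:

(2+3)*(4+5) → 5*(4+5) → 5*9 → 45
(2+3)*(4+5) → (2+3)*9 → 5*9 → 45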

η

η is used for optimization, then.

Using a list as a single deterministic result from rules makes sense to me now. Church and Rosser might be smart people after all. Well, they were smart anyway; it is just a question of whether I'm in a position to understand them or not :)

I agree that a single unified normal form could be a great property for a set of rules and it could save a lot of headaches.

Confused

marco, you are the one that wrote:

Clean generalizes term rewriting to graph rewriting. AFAIK, early Haskell implementations did the same.

And that made perfect sense to me, but I tried to explain in a rigorous way that Clean and Haskell are still conceptually very different from languages based on rewrite systems (e.g. Pure).

Now you ask:

I don't see how a program corresponds to a rewrite system?

And I don't really understand how someone who wrote the first sentence could ask the second. If Clean/Haskell "generalize term rewriting into graph rewriting", does that not mean that Clean/Haskell programs are rewrite systems?

Looks like bollocks to me.

This is rather insulting. I would appreciate it if you could explain which part of what I wrote you don't understand or don't agree with, instead of making vague statements that give me no idea of which part of my argument I should try to refine to make progress in the conversation -- if there is a conversation in the first place.

As I stated before

A term rewrite system is a grammar plus a rewrite strategy. A program is an expression in that grammar. Look at the SK example.

Clean implements a rewrite system, not programs. That should be trivial.

Proposing that a Clean program is a rewrite system is akin to proposing that VI (the editor) is equal to C (the implementation language), or that lambda calculus is a program.

For the rest, well, you didn't get the basic definitions right. It's nonsense.

definitions

As far as I can tell, your argument is equivalent to: "an interpreter isn't a program, because the program is the input to the interpreter". (Feel free to substitute 'compiler' or 'term rewrite system', etc., for 'interpreter'.) What's wrong with understanding BOTH to be programs, just at different levels/layers?

A term rewrite system (assuming it's expressed in an executable language) is a program that takes programs as inputs and produces a different representation of the same program (perhaps constrained, if not confluent) as output.
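
As a sketch of that two-layer view (a Haskell illustration added here; the names T, Rule, rewriteOnce and normalize are invented): the rewriter below is an ordinary program, while the terms it rewrites play the role of programs at the next layer down.

-- A sketch (assumed names): a generic rewriter over a tiny term type,
-- driven by a list of rules.  The rewriter is a program; the terms it
-- is given are "programs" at another layer.
data T = Var String | Node String [T] deriving (Eq, Show)

type Rule = T -> Maybe T   -- a rule either fires on a term or it doesn't

-- Apply the first rule that fires, trying the whole term before its subterms.
rewriteOnce :: [Rule] -> T -> Maybe T
rewriteOnce rules t =
  case [t' | r <- rules, Just t' <- [r t]] of
    (t':_) -> Just t'
    []     -> case t of
      Node f args -> Node f <$> rewriteFirst args
      _           -> Nothing
  where
    rewriteFirst []     = Nothing
    rewriteFirst (a:as) = case rewriteOnce rules a of
      Just a' -> Just (a' : as)
      Nothing -> (a :) <$> rewriteFirst as

-- Rewrite to normal form (may diverge if the rules allow it).
normalize :: [Rule] -> T -> T
normalize rules t = maybe t (normalize rules) (rewriteOnce rules t)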

Yeah well

You wouldn't call an interpreter a language, now would you?

Confluence

Although proving Church-Rosser-ness can be quite challenging in general, there are known large classes of rewriting systems for which it has already been proven, so one only has to check that the set of schemata satisfies the conditions for the proof. The well-known one is regular CRSs, for which Church-Rosser-ness was proven in Jan Willem Klop's 1980 dissertation (and doing so was such a mess that everyone just cites Klop for it). I proved Church-Rosser-ness for a superset of those in my dissertation, regular SRSs (because I needed to cope with substitution functions other than β-substitution, and also because Klop's standardization theorem used a standard order that wasn't ideal for moderating Plotkin correspondence theorems).

K framework

You see people using the K language from time to time in ICFP and POPL papers. This could be worth looking at.

Interesting

Thanks for the link, very interesting. The K framework is actually very close to the project I'm developing. We both use BNF-ish rules to define syntax and term rewriting to define semantics. But their focus is formal language verification, while mine is practical programming and artificial intelligence.

I took a look at their funding listings. Maybe I'm on the wrong team :o)

wonderful PSA

Wow, I love this: this video (well, if you watch it sped up anyway) is a wonderfully fast, to-the-point, and somewhat hilarious (in a sad way) public service announcement for why formalism matters.

Maude?

Maude?

The OBJ family in general?

Rewriting schemata as a language?

Re the use of rewriting schemata to specify an algorithm, I used both meta-circular evaluators and rewriting calculi to describe the evaluator algorithm of Kernel in my dissertation, and had some things to say about the tradeoffs between the two strategies (§2.3.3):

The specification of a reduction system can be a lucid medium for articulating fine points of order of operations (which computational activities precede which others). The entire specification is geared toward its reduction rule schemata, and each schema is just a statement of one specific kind of permissible succession from computational state to computational state. Moreover, the schemata are understood separately from each other: whether a schema can be applied to a given term is determined entirely within that schema, regardless of any other schemata in the system. So each schema immediately expresses a specific point for the human reader to study in isolation.

The downside of this unremitting independence between schemata, for descriptive purposes, is that while detailed orderings are made explicit, any algorithmic structure larger than a single schema is made implicit. The overall modular structure of the reduction algorithm may have been known to a human who wrote the description; but the description doesn’t preserve that information, so the human reader has to reconstruct it.

An important consequence of the lack of overall structure is that there is nothing to prevent multiple schemata from being applicable to the same term. This tendency toward nondeterministic reduction is magnified by taking the compatible closure of the rules, which facilitates the construction of terms with multiple redexes.

[...]

[The next thing the section discusses is Church-Rosser-ness.]

I've experienced similar modularity concerns with logic programming.

What kind

of structure would one expect?

I've been experimenting with the structurization of terms and got some results. Suppose we have a tree structure of terms bundled up with rewriting rules. Terms are just that: they accept some form of syntax and stitch it into the AST. Rewriting rules have terms on the left and on the right side; they accept terms from the left side and put terms from the right side into the AST. The problem was which rewriting rules could even be applied, and at what places. The answer is: if a rule's output (right-hand side) is of the same type as a term placed anywhere inside the term tree, that term can be replaced by the left side of the rule, which then forms the appropriate right-hand term.

And here I ran into the same problem you did. I wanted to narrow down the applicable rule set, so I decided: only rules placed on sibling nodes can be applied. If some rule is nested deeper in the structure and you want it to be applicable, you can drag its reference to a sibling node and get it considered when creating input. This is not "the" solution, it is "a" solution, but it works for now.

Example:

(
  (Sum <- (a <- @int, '+', b <- @int)) -> @int (@a + @b) |
  Term <- (
    Number <- @int |
    String <- @str
  )
).Term

Here we extracted term "Term" from braces. When a parser gets to "@int" below "Number", it knows that "Sum"-s right side is also "@int" and tries to parse "Sum"-s left side. If it passes, it returns its calculated righ side. Deeper nodes in structure are ignored, if we dont drag them to be siblings of "Term" inside braces. Because we extract only "Term" from braces, its siblings are ignored on regular parse (they get into account only through same-type-and-sibling mechanism).