## Practical: Designing a graph matching language.

I've written a graph pattern-matcher (Common Lisp), and I was wondering if anyone could offer comments regarding primitives or operators to modify or add. Mostly, I tried to extend regexes to graphes (or any data structure that can contain references), and make it easier to extend the language. Maybe I should have posted this on my own webspace and pasted a link. If it is the case, simply tell me and I'll fix that.

Ob-Grocery list of features: circularity-aware (for the graph that is to be matched, not the pattern against which to match - circular code is wrong ;), not biased towards any data structure, easily extensible and, of course, a dreaded sexp-based syntax.

The core of the language is:

• alt: like the | operator in regexes, it provides ways to define failover patterns. (alt f1 f2 f3 ...) tries to match the corresponding subgraph against f1, and, if it fails, tries f2, f3, and so on. (alt) fails automatically.
• and: Succeeds if all of its subclauses match against the subgraph (the subclauses are matched in left to right order). Fails as soon as any of the subclauses fail.
• access: (access accessor pattern) Matches the pattern against the result of applying accessor (any common-lisp function) on the corresponding graph node. Backtracks gracefully when the accessor throws an error.
• pred: (pred predicate) Succeeds if the predicate (a full-blown Common Lisp function) returns true on the graph node, fails otherwise.
• Digression: I also allow a limited form of state in the presence of matching. Basically (so far), I can bind matches to a name, and each bindings only succeeds if all the matches bound to a given name are all deeply equal. For that reason, pred [when its optional second argument is true] can also fail or succeed based on the result of predicates on both the graph node and the names bound (with the subgraphs to which they are bound) so far. I think this could be comparable to one-way unification.

bindpred: (bindpred predicate). Succeeds or fails based on the result of a predicate on the current graph node and the names bound so far. If the predicate succeeds, its second return value is the updated alist of bindings.

And that's it for the primitives. Lazy, one-level deep, macroexpansion (to allow recursive macros) does the rest.

Other operators are defined in terms of the previously described primitives and macroexpansion.

• atom: (atom) matches any atom. (atom foo) matches any atom that is equal to foo.
• any: (any) matches anything. (any type) matches anything that is of that type.
• bind: (bind name pattern) tries to bind "name" to the graph node if "pattern" matches against the node. It expands to:
(and pattern
(%bind-now name))
where %bind-now expands into a bindpred clause. %bind-now does the "tries to bind 'name' to the graph node" part of bind's job description. If there is no previous bindings to "name" or it is deeply equal to the current graph node, it succeeds (and saves a bindings from name to the graph node if needed); otherwise, it fails.
• rep: The star of the show, and the only recursive construct so far. rep is analogous to the (kleene?) star in regexes.

(rep pattern pattern-final &rest accessors) [everything after pattern-final is kept in a list of accessors] matches a graph node against "pattern," and if that is successful, all of that node's children (those that are accessed through the accessors) against "pattern," and so on. When "pattern" doesn't match, it still succeeds if "pattern-final" matches against the same graph node and fails otherwise. rep is implemented purely in terms of macroexpansion: (rep pattern pattern-final accessor1 accessor2 ...) expands into:

(alt (and pattern
(access accessor1 (rep pattern pattern-final accessor1 accessor2 ...))
(access accessor2 (rep pattern pattern-final accessor1 accessor2 ...))
...)
pattern-final)

Note that the expansion is recursive, which is why macroexpansion is lazy.

• cons: The only datastructure-specific operator. (cons &key (car-clause '(any)) (cdr-clause '(any)))

Matches any cons whose car matches car-clause (which defaults to (any), which matches anything) and whose cdr matches cdr-clause (same default). It expands into:

(and (pred consp)            ;;consp is a predefined predicate
(access car car-clause)
(access cdr cdr-clause))
(with the appropriate defaults)

Example usage:

To match any combination of cons with only 2 atoms in it:

(rep (cons)
(alt (bind a (atom))
(bind b (atom)))
car cdr)
It traverses conses' car and cdr, matching any cons. When the current node isn't a cons, it saves the corresponding atom under the names a or b. If "a" is already bound to something that isn't equal to the current node, it tries to bind it to "b". If, again, "b" is already bound to something that isn't equal to the current node, the matching fails. Thus, it can only match up to two different non-cons atoms (I think there is some redundance here ;). Note that when it does match successfully, the bindings' alist is returned as a second return value. No point in wasting all that work!

Now, it obviously wouldn't be very hard to build a recursive function to do the same job -- in fact, this is exactly what the matcher will do. However, making the same function work in the presence of circularity would make this already repetitive job a tad more tedious.

Comments, questions or suggestions (additional operators or another syntax, for example) would be appreciated.

Paul Khuong

EDIT: How _I_ expect to use this: I want to write a compiler that works, as much as possible, through graph rewriting. I hope the matcher can help a lot there.

EDIT2: Fixed the expansion for rep

## Comment viewing options

### Some papers and links along the way.

Diaplan. Graph- and diagramm-based programming language. Papers on Diaplan contain theory about graph-with-variables reduction.

And paper on Graph Query Language. It seem that it does match your work by some parts.

I, when thinking about code generator, also envision structure-query language with graph-like representation.

So, I think you're on the right way.

### Another link

At CP(Graph), constraint programming over Graphs is described. May be of interest to you.

### Yet another...

You might also be interested in SPARQL, the W3C's query language for RDF.

It's hard sometimes to dig through the copious amounts of specification produced by the W3C, but RDF is basically a graph description language based on triples like (node, edge-label, node), and SPARQL is basically a query language that uses patterns to specify graph structure.