Lambda the Ultimate

inactiveTopic Semantics-based Filtering: Logic Programming's Killer App?
started 2/20/2002; 3:11:21 AM - last post 2/21/2002; 3:04:20 PM
Ehud Lamm - Semantics-based Filtering: Logic Programming's Killer App?  blueArrow
2/20/2002; 3:11:21 AM (reads: 2346, responses: 5)
Semantics-based Filtering: Logic Programming's Killer App?
Gopal Gupta, Hai-Feng Guo, Arthur Karshmer, Enrico Pontelli, Desh Ranjan, Bryan Milligan, N. Datta, O. El Khatib, M. Noamany, and X. Zhou. PADL02, LNCS 2257, pp. 82-100, 2002.

The talk was dedicated to semantic filtering: a translation of a document from one formal notation to another while preserving its meaning. Examples are translating: code from Fortran to C; an image from an XWD to the TIFF formats; a database query to a query for a differently-structured database. In a simpler case, semantic filtering is porting, e.g., a C program from Unix to Windows.

Thanks, Oleg!


Posted to Logic/Declerative by Ehud Lamm on 2/20/02; 3:22:51 AM

Ehud Lamm - Re: Semantics-based Filtering: Logic Programming's Killer App?  blueArrow
2/20/2002; 3:14:44 AM (reads: 1620, responses: 0)
The talk was dedicated to semantic filtering: a translation of a document from one formal notation to another while preserving its meaning. Examples are translating: code from Fortran to C; an image from an XWD to the TIFF formats; a database query to a query for a differently-structured database. In a simpler case, semantic filtering is porting, e.g., a C program from Unix to Windows.

Semantics of the source language -- to be preserved during the translation -- is represented compositionally by Horn clauses. This makes the semantics directly executable by a logic programming system, and permits automatical generation of provably correct translators. In many cases, a reverse translator can also be generated automatically from the same specification.

The paper introduces the approach and describes it in more detail on concrete applications, such as translating a Nemeth Math Braille notation to LaTeX, translating HTML to VoiceXML, and providing interoperability between several bioinformatics genome analyses packages.

Semantics of the source language to be preserved during translation is denotational semantics. As the paper explains, denotational semantics has three components:

  • mapping from a concrete document to its parsed tree (syntax)
  • mathematical objects that express the meaning -- typically sets or lattices of values (semantic domain)
  • mapping from parsed tree to elements of semantic domain (valuation function).

In the approach of the paper, the valuation function is given as a set of Horn clauses, and semantic domain is the abstract or concrete syntax of the target language. Thus the meaning of the source language is given in terms of the target language.

The translating procedure is then as follows: parse the source document into its Source_AST, prove a goal 'ast_relation(Source_AST,Target)' and pretty print the 'Target' (AST).

The syntax of the source language is given in Prolog's Definite Clause Grammars (DCG). Thus both the syntax and the semantics of the source language are specified declaratively as Horn clauses. The specification can be "executed" by a Prolog system, which is tantamount to translation.

Since the denotational program (which expresses the meaning of one AST in terms of the other) is a logical program, it can be used for verification of the source. The reverse translator is also possible.

Prolog's definite clause grammars (DCG) can handle context-sensitive languages. As the paper notes, many languages created by domain experts (rather than computer scientists) turn out context-sensitive.

A translation from a Nemeth Math Braille notation to LaTeX is a very impressive application. The extremely convoluted Nemeth Math Braille notation was first specified in 1951. The notation is very context sensitive, to let the reader know as much of the context of a phrase as possible. For example, pow(x,y)+1 will be written in Nemeth Math Braille notation as x^y"+1 (the double-quote means returning to the base level), whereas pow(a,(pow(x,y)+1))+2 will be written a^x^^y^+1"+2. The notation specified as several hundred examples, no formal description has existed. Machine translation from the full Nemeth Math Braille notation was considered _impossible_. With the logical programming approach, a translator to LaTeX was built only in 3-4 man-months of work!

Another impressive application discussed in the paper is making several genome analyses packages inter-operate (see www.swbic.org for background). A typical analysis includes querying of a database of genetic information, aligning of sequences and inferring of phylogenic information. The packages that run each stage read and write generally incompatible data formats. Logic programming has built several _bi-directional_ translators, using NEXUS as an internal format. Parsing from NEXUS was challenging: NEXUS was developed by computational biologists to be human readable and highly general. As the result, the format is horrific. For example, NEXUS permits nested comments and comments inside tokens! DCG grammars proved sufficient; but they had to explicitly resort to non-determinism [guessing and backtracking]. The NEXUS translator was produced in 6 man-month. It was the first complete NEXUS translator. As the presenter mentioned, the translation speed is good: translating 100K of bioinformatics data from one format to another takes seconds.

The paper also talks about HTML to VoiceXML conversion. However, they used a subset of HTML (I believe with explicitly closed tags). They're still working on a translator from the full HTML (where closing tags can be omitted).

The paper concludes that

  • the speed of creating translators (a matter of several man-months for what previously considered intractable data formats),
  • the ability to get the reverse translator for free
  • provable correctness of generated translator
  • specification of the language semantics that is rather suitable to machine analyses and deductions could make semantic filtering a killer application for logic programming.

Related approaches:

S.Stepney. High Integrity Compilation. Prentice Hall, 1993. Building provably correct compilers by declaratively specifying mapping between parse tree patterns and machine instructions.

Non-LALR parser generators: ANTRL, a pred-LL(k) parser generator by T. Parr: www.antlr.org

-- Oleg

Jay Han - Re: Semantics-based Filtering: Logic Programming's Killer App?  blueArrow
2/20/2002; 12:38:22 PM (reads: 1525, responses: 0)
FWIW, a white paper from http://www.galois.com/technology/whitepapers.htm

Legacy Translation White Paper

Abstract

Galois Connections has developed a suite of legacy code translation tools, enabling companies to solve their legacy code problems quickly and cheaply, without also abandoning the knowledge embedded in legacy systems. Because these tools are semantically aware, they are capable of translating the vast majority of legacy code automatically, requiring only minimal intervention from a maintenance engineer, reducing the time and cost of legacy code translation to affordable levels.

Brent Fulgham - Re: Semantics-based Filtering: Logic Programming's Killer App?  blueArrow
2/21/2002; 10:33:42 AM (reads: 1399, responses: 0)
Is this paper available anyplace on-line? I did some quick searching on Google, but didn't come up with anything.

Oleg - Re: Semantics-based Filtering: Logic Programming's Killer App?  blueArrow
2/21/2002; 2:41:36 PM (reads: 1361, responses: 1)
The paper is available online, but not for free: http://www.springer.de/cgi-bin/search_book.pl?isbn=3-540-41768-0

However, the primary author's website http://www.utdallas.edu/~gupta/
points out to a number of papers and manuscripts that cover various aspects of the PADL presentation, in particular
http://www.cs.nmsu.edu/~gupta/sem.html
http://www.cs.nmsu.edu/~gupta/assistive.html

Ehud Lamm - Re: Semantics-based Filtering: Logic Programming's Killer App?  blueArrow
2/21/2002; 3:04:20 PM (reads: 1431, responses: 0)
Those that have a Springer Link account can get the papers here.