archives

Online Learning of Relaxed CCG Grammars for Parsing to Logical Form

Online Learning of Relaxed CCG Grammars for Parsing to Logical Form, Luke S. Zettlemoyer and Michael Collins. Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007.

We consider the problem of learning to parse sentences to lambda-calculus representations of their underlying semantics and present an algorithm that learns a weighted combinatory categorial grammar (CCG). A key idea is to introduce non-standard CCG combinators that relax certain parts of the grammar—for example allowing flexible word order, or insertion of lexical items— with learned costs. We also present a new, online algorithm for inducing a weighted CCG. Results for the approach on ATIS data show 86% F-measure in recovering fully correct semantic analyses and 95.9% F-measure by a partial-match criterion, a more than 5% improvement over the 90.3% partial-match figure reported by He and Young (2006)

This paper isn't exactly PL, though Ehud has been okay with the odd foray into computational linguistics before. I thought it was interesting to see machine learning work make use a typed formalism like categorial grammars to handle long-range dependencies, and it leaves me wondering if it's possible to set these kinds of techniques onto program analysis problems.

One neat thing about the CCG formalism is that you have parsing, which is described more or less as a typechecking problem, and separately you have the semantic constraints, which are basically lambda-calculus terms that build up a term in first-order logic. So I can imagine doing something like writing down how you're supposed to use a library as semantic constraints that add up to a Prolog program, and then at compile time the compiler walks the typing derivation to construct the program, and then runs it to figure out if you're doing something dumb (according to the library designers). I don't know if that actually makes sense, but this work certainly prompted me to think.