Is this possible? Has anyone done it?
I'm not thinking so much of natural languages (I know that there do exist human-generated attempts at formal grammars for English) as of reverse-engineering source code. Source code should be easier because you can make some useful assumptions about syntax (matching braces, for example).
My motivation is the need to pretty-print code written in an obscure language (CL, the scripting language for IRAF). I found GPP on the 'net, but if I understand correctly it's not as generic as you might think, because it requires a context-free grammar for the language in question.
This raises another practical issue - it's not clear to me that CL has a useful context-free grammar. It was designed back when such things were (I suspect) the concerns only of quiche-eaters, and it is implemented in yacc with a fair amount of behind-the-scenes management of state. But maybe an approximate grammar is possible (I don't have a good explanation for what "approximate grammar" means).
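To make "approximate grammar" slightly more concrete, here is a minimal sketch in Python of the crudest version I can think of: the only syntax it assumes is that braces nest and that double-quoted strings can hide braces. The `reindent` function, the `TOKEN` pattern and the sample input are all made up for illustration; none of this is real CL syntax or anything GPP does, and I haven't checked how CL actually quotes strings.

```python
import re

# Assumed token classes: a double-quoted string (with backslash escapes),
# a single brace, or a run of anything else. This is a guess, not CL's lexer.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|[{}]|[^{}"]+')

def reindent(source, indent="    "):
    """Re-indent source using only the assumption that braces nest."""
    depth = 0
    out = []
    for raw in source.splitlines():
        line = raw.strip()
        tokens = TOKEN.findall(line)
        # Braces inside quoted strings are part of a string token,
        # so they are not counted here.
        opens = tokens.count("{")
        closes = tokens.count("}")
        # A line that begins with a closing brace is shown one level out.
        shown = max(depth - 1, 0) if line.startswith("}") else depth
        out.append(indent * shown + line if line else "")
        # Never let unbalanced input push the depth below zero.
        depth = max(depth + opens - closes, 0)
    return "\n".join(out)

sample = 'procedure foo {\nif (x) {\nprint("}")\n}\n}'
print(reindent(sample))
```

This only fixes indentation and knows nothing about statements or expressions, which is roughly what I mean by "approximate": enough structure to be useful for pretty-printing without pretending to be the language's real grammar.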
I suspect that the hardest part of the problem isn't generating *a* grammar, but generating one that is both compact and reflects the semantics of the language.
Of course there are probably simpler solutions to the problem (I'm going to try vgrind with each language in turn and see what happens). But I thought the problem was interesting and someone here might be tickled by it.
Sorry for not posting or reading here recently - new job.