Literature on recovering grammars?

Hi, for a while now I've been interested in how one can generate a grammar from a corpus of source code. I have a vague idea of how I might go about it, and I was hoping to get pointers to more info.

The basic idea is that we try to find the smallest "regular expression" that accepts the samples in our corpus, where "regular expression" really means something powerful enough to parse a context-free language. We would start by finding the strings that occur over and over again, and we would match those with a simple string matcher. Then we would attempt to find a regexp that characterises the stuff that occurs between the commonly occurring substrings. I am much less certain how to do this part; the ideas that I have are too vague to describe.

Another thing that I think is important is that we would address the complexity issue by first running our grammar recoverer on a small sample. The idea is that the small sample would do the bulk of the work in discovering the grammar, while later iterations would do less work because they only refine what was done before. I think this could be done by running the recovered grammar on new samples, seeing where it fails, and doing only enough work to refine it into one that works.

Anyway, I know this is all vague, but can anyone point me to research on how to recover a grammar for a language from samples of text in that language? I am going to a nearby college to check out some books on formal languages, but I'm doubtful that I'll find what I'm looking for.

By Holgly Morgan at 2007-02-24 01:10 | LtU Forum
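To make the first step of the post concrete (find what recurs across samples, match it literally, and leave the rest as gaps that still need a pattern), here is a minimal Python sketch. It is only an illustration under strong assumptions, not the poster's method or an established algorithm: samples are split on whitespace, a token counts as "fixed" if it appears in every sample, and the names `common_tokens`, `skeleton`, and the `<?>` hole marker are made up for this example.

```python
from collections import Counter


def common_tokens(samples, min_fraction=1.0):
    """Return tokens that appear in at least min_fraction of the samples."""
    support = Counter()
    for text in samples:
        # Count each token once per sample, so support means "number of samples".
        support.update(set(text.split()))
    threshold = min_fraction * len(samples)
    return {tok for tok, n in support.items() if n >= threshold}


def skeleton(text, fixed):
    """Keep fixed tokens; collapse each run of other tokens into one hole."""
    out = []
    for tok in text.split():
        if tok in fixed:
            out.append(tok)
        elif not out or out[-1] != "<?>":
            out.append("<?>")
    return out


# Hypothetical toy corpus, assumed purely for illustration.
samples = [
    "if x then y else z end",
    "if a < b then print a else print b end",
    "if done then stop else go on end",
]

fixed = common_tokens(samples)
for s in samples:
    print(" ".join(skeleton(s, fixed)))
```

On this toy corpus every sample collapses to the same skeleton, `if <?> then <?> else <?> end`. The holes are exactly the "stuff in between" that the post is unsure how to characterise; a real recoverer would have to infer structure for them, which this sketch does not attempt.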