Learn Python to implement a complicated static code analyzer

I will have to implement a complicated static code analyzer.

I have been using Soot. It is not hard, but it seems time-consuming to get acquainted with its large API.

I believe an easier way is to use Python (see http://suif.stanford.edu/~courses/cs243/hw6/hw6.html ).

But I really do not want to start from "Hello, world." Would you PL guys who happen to know Python tell me where I should start to learn Python for this purpose?


GCC Python Plugin

The GCC Python Plugin includes quite a sophisticated static code analyzer. Look at libcpychecker/absinterp.py, for example.


Does that mean I would need extensive knowledge of C/C++?

Actually, I have been using Java to analyze Java source. I would like to use Python to perform the same task, because Soot's API seems too large and has a long learning curve. My goal is to prototype, implement, and test some new ideas. So where would be a good starting point to learn Python for this specific purpose?

Soot, Java, and size

Soot is not based around working with Java; it's based around working with an IR above bytecode. One IR is a bit like a high-level, register-based three-address assembly, another is basically the same thing but with SSAisms, and the third is a very thin veneer above Java bytecode.
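To give a feel for what "three-address" means here, the following is a minimal sketch in Python. It is not Soot's IR or API: it lowers a single Python assignment (parsed with the standard `ast` module) into statements that each have at most one operator. The function name `to_three_address` and the `$t1`, `$t2` temporary naming are assumptions made up for this illustration.

```python
import ast
import itertools

# Operators this sketch knows how to print; anything else is rejected.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_three_address(source):
    """Lower one assignment such as 'x = a + b * c' to three-address form.

    Hypothetical helper for illustration only; it handles names,
    constants, and the four binary operators listed in OPS.
    """
    counter = itertools.count(1)
    code = []

    def lower(node):
        # Names and constants are already simple operands.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            # Lower both operands first, then emit one instruction
            # that stores the result in a fresh temporary.
            left = lower(node.left)
            right = lower(node.right)
            temp = f"$t{next(counter)}"
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise NotImplementedError(type(node).__name__)

    stmt = ast.parse(source).body[0]
    if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)):
        raise NotImplementedError("only 'name = expression' is supported")
    result = lower(stmt.value)
    code.append(f"{stmt.targets[0].id} = {result}")
    return code


if __name__ == "__main__":
    for line in to_three_address("x = a + b * c"):
        print(line)
    # $t1 = b * c
    # $t2 = a + $t1
    # x = $t2
```

Every statement in the output has one operator and named operands, which is what makes this kind of IR convenient for dataflow analyses.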

If you want to do static analysis on Java source then it's the wrong tool to use. I haven't used FindBugs, but it is all about static analysis of Java source and lets you write your own analyses.

If an IR is appropriate for you then give Soot another look. I've used Soot to prototype an optimizing compiler backend, which meant doing both analysis and code gen. Soot is not as huge as it seems at first. The trick is to ignore the IRs that aren't important for your needs.

Picking a program analysis framework

Picking a program analysis framework is not really about the language that you will use (at least not as long as it is Python vs Java vs yet-another-imperative language). In most cases, the custom analysis that you want to develop depends on other program analyses, such as type information, points-to analysis, call graphs, etc. Frameworks like Soot implement such foundational analyses so that you don't have to write them (and you really don't want to write those for complicated languages like Java, unless you have way too much time on your hands).

So, what you need to do is figure out what foundational program analyses you need for your own analysis, and then pick a framework that implements those best.

The Stanford course notes that you refer to do not really use Python for program analysis, btw. The invocation you see there on the web page is only the main driver of the program analysis, which actually invokes Java code. The core of the analysis is implemented in yet another language, a dialect of Datalog. bddbddb is the Datalog engine that interprets those Datalog rules.
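To make the Datalog idea concrete, here is a rough sketch of how a tiny, flow-insensitive points-to analysis could be stated as two rules and evaluated by naive fixed-point iteration in plain Python. The relation names `new`, `assign`, and `pointsTo`, and the function `points_to`, are assumptions for this illustration; they are not the actual rules or schema used by bddbddb or Doop, which are far more complete.

```python
# Rules being evaluated (Datalog-style, hypothetical relation names):
#   pointsTo(v, h) :- new(v, h).
#   pointsTo(v, h) :- assign(v, w), pointsTo(w, h).


def points_to(new_facts, assign_facts):
    """Compute pointsTo by applying the rules until nothing changes.

    new_facts:    set of (variable, heap_object) pairs, one per allocation
    assign_facts: set of (destination, source) pairs, one per 'dst = src'
    """
    # First rule: every allocation is a pointsTo fact.
    pts = set(new_facts)

    changed = True
    while changed:
        changed = False
        # Second rule: copy the source's targets to the destination.
        for dst, src in assign_facts:
            for var, heap in list(pts):
                if var == src and (dst, heap) not in pts:
                    pts.add((dst, heap))
                    changed = True
    return pts


if __name__ == "__main__":
    new_facts = {("a", "h1"), ("b", "h2")}
    assign_facts = {("c", "a"), ("d", "c"), ("c", "b")}
    for var, heap in sorted(points_to(new_facts, assign_facts)):
        print(var, "->", heap)
```

A real Datalog engine evaluates rules like these far more efficiently than this loop does, and the rules for a language like Java need many more relations (fields, calls, types), which is the main reason to reuse an existing framework.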

(Shameless plug: You might like Doop, which is work by Yannis Smaragdakis and me. See http://doop.program-analysis.org . Doop is based on Datalog as well, but uses the Datalog engine of LogicBlox, the company I work for these days.)

Thank you all for being so helpful!

perhaps consider MELT

You might also consider MELT (see gcc-melt.org), which is a high-level Lisp-like domain-specific language to extend GCC. It gives you access to many internal GCC representations, while giving you the power of a high-level Lisp-like language (objects, reflection, functional programming style, efficient garbage collection, etc.). It also provides you with powerful pattern-matching facilities.

And you can also put chunks of C code in MELT (since it is translated to C).

To really take advantage of MELT (or even of the GCC Python Plugin), you'll need to understand some of GCC's internals (and that is probably the most time-consuming part).

PS. I am the main developer of MELT.