Facebook open sources "Infer", static program analysis tool

This is the company they bought, right?

Yes

Judging from the authors of the blog post, this is indeed from Monoidics.

It says it in the linked post.

Please read. Thanks.

Context

This was a question meant to be read within the context of LtU, because it came up during the discussion of a paper I don't like.

Tell your servants how to behave. Thanks.

Seems to track a lot of properties you solve in C++ with RAII

I wouldn't know. "Yet another static analyzer?" Does it do more than other analyzers? And a lot of the things it tracks don't occur when you write 'modern' C++ and follow RAII rules.

I don't use null pointers, I don't use raw pointers, I never use malloc or new, and all resources are released on destruction, right? (Oh, and I loop over vectors with the new C++ range-based for loop exclusively?)

Probably RAII doesn't solve everything though; I am not that good a C++ programmer.
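To be concrete, the kind of code I mean looks roughly like this (a minimal sketch of the idiom, nothing to do with the Infer examples; the file name is a placeholder):

    #include <fstream>
    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    struct Widget { int id = 0; };

    // The stream is an RAII object: it is closed in its destructor on every
    // exit path, so there is no release call for an analyzer to flag as missing.
    std::vector<std::string> read_lines(const std::string& path) {
        std::ifstream in(path);                 // acquired here
        std::vector<std::string> lines;
        for (std::string line; std::getline(in, line); )
            lines.push_back(line);
        return lines;                           // closed automatically, even on exceptions
    }

    int main() {
        // No raw new/delete: ownership lives in the type.
        auto w = std::make_unique<Widget>();
        std::cout << w->id << '\n';

        // Range-based for loop instead of index/iterator juggling.
        for (const auto& line : read_lines("input.txt"))
            std::cout << line << '\n';
    }   // w and the temporary vector are destroyed here; nothing to free by hand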

Tracking race conditions is nice, I guess, but I wonder whether those, too, won't be fixed by adhering to specific software principles.

I'll see whether it makes any inroads into global software engineering practice. For me it's a wait-and-see tool.

Um

Most programs have non-trivial resource acquisition semantics.

Yah

But will it deduce any problems given the non-trivial semantics of those programs?

The examples given are trivialities that hardly occur in C++ using RAII (they might occur in C, though), and on the website the authors discuss at length the possibility of false positives and all the semantic checks the tool doesn't do.

Scale


Facebook Infer uses logic to do reasoning about a program's execution, but reasoning at this scale — for large applications built from millions of lines of source code — is hard. Theoretically, the number of possibilities that need to be checked is more than the number of estimated atoms in the observable universe. Furthermore, at Facebook our code is not a fixed artifact but an evolving system, updated frequently and concurrently by many developers. It is not unusual to see more than a thousand modifications to our mobile code submitted for review in a given day. The requirements on the program analyzer then become even more challenging because we expect a tool to report quickly on these code modifications — in the region of 10 minutes — to fit in with developers' workflow. Coping with this scale and velocity requires advanced mathematical techniques. Facebook Infer uses two such techniques: separation logic and bi-abduction.

They are solving a problem of scalability.
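Roughly (my own paraphrase of the separation-logic papers, not a quote from the post): the frame rule is what lets the analysis reason only about the part of the heap a piece of code actually touches, and bi-abduction is the inference step that guesses the missing precondition and the leftover frame, so procedure summaries can be computed independently and reused:

    % Frame rule: a local spec {P} C {Q} extends to any untouched resource R,
    % provided C modifies no variable occurring free in R.
    \frac{\{P\}\; C\; \{Q\}}
         {\{P \ast R\}\; C\; \{Q \ast R\}}
    \qquad \mathrm{mod}(C) \cap \mathrm{fv}(R) = \emptyset

    % Bi-abduction: given the current symbolic heap A and the precondition B
    % of the next call, infer an "anti-frame" M and a "frame" F such that
    A \ast M \;\vdash\; B \ast F

That frame/anti-frame machinery is what makes the analysis compositional, which is their answer to the scale and velocity problem.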

Also, consider: RAII does not dictate the lifetime of objects; it only ties objects to a scope.
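For example (a toy sketch of my own, not from Infer's examples; Connection, schedule_work and pending are made-up names): the moment ownership moves out of the scope that created it, the destruction point is no longer visible in that scope, and that is exactly the lifetime question an analyzer has to answer.

    #include <functional>
    #include <memory>
    #include <vector>

    struct Connection {
        ~Connection() { /* close the underlying socket here (RAII) */ }
    };

    std::vector<std::function<void()>> pending;   // run later, maybe on another thread

    void schedule_work() {
        auto conn = std::make_shared<Connection>();   // created in this scope
        // The callback shares ownership: the connection now lives until the
        // last copy of this closure is destroyed, not until the '}' below.
        pending.push_back([conn] {
            // ... use *conn ...
        });
    }   // nothing is released here; the enclosing scope no longer decides the lifetime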

I'll see it when it emerges

Coping with this scale and velocity requires advanced mathematical techniques.

God, man. I have taught how to handle resource allocation in RTOSes and have done formal verification of small algorithms. I've also seen hundreds, if not thousands, of applications written by students and (mostly small) enterprises.

It's a non-trivial problem and it's not clear 'advanced mathematical techniques' will help a lot.

If anything, 'advanced mathematical techniques' prove complexity results which indicate they can't solve SE problems since you're bound to be solving an intractable or undecidable problem. Resource allocation questions are best solved with solid software engineering, not math.

I'll see where it ends up, but I am not overly enthusiastic yet.

yes and no

I want both. I want the developers to suck less inherently, and I also want there to be massive investments in safety nets like these static checkers. Maybe if the static checker knew there was a "design pattern" that would have prevented the need for the checker, then it could demote the programmer / shock them.

You have math against you

Yeah, that and a pony too. But the fact remains: there are lots of complexity results which indicate that what you want is blocked by intractability and undecidability.

Better languages, better libraries, better software engineering are a far safer bet than hoping that math will 'magically' solve engineering problems since the complexity results indicate math can't.

Even the verification of trivial protocols cannot be done mechanically but has to be done by interactive proof tools and can take years. And you still don't prove a lot of interesting properties.

It's nice research, these static checkers, but it's very unlikely it'll scale to anything substantive in the near term.

concur

well, yes, i'd like the languages and systems to not suck in the first place other than only wrt those things which are intractable. :-)

Checking patterns

Maybe if the static checker knew there was a "design pattern" that would have prevented the need for the checker, then it could demote the programmer / shock them.

It would be nice, but the approach that gets closest to that result doesn't scale very well. Pick a particular flavour of expressing the pattern over code: abstract semantics / theorems / model-checking, etc. Pick a fresh-faced engineer who seems suited to picking it up. Leave them on it for about five years, and award them a PhD if they find anything of general interest while designing this one pattern. Suddenly their marketability goes through the roof, and so perhaps they don't stick around to design a second pattern. Goto step 1.

Pattern matching on code would be insanely useful in other settings as well. Regex scales nicely to trees (e.g. XPath), but becomes difficult to express over directed graphs. Converting a graph pattern to a set of matching paths to see if it hits on a particular program is hard. Approximating it leads to the problem of removing infeasible paths from the solution to make it precise enough to be useful.

People are relatively easy though, hence the SE people chasing the low-hanging fruit. It is the most pragmatic way to bring down the defect rate in software. "Don't do X, otherwise we will fire you" can lead to an enlightening and productive discussion about what X is and how it occurs in many different scenarios. Achieving that useful level of interaction with static analysis seems to be tricky: specialise the analyser enough to pick up on process-specific defects, and in a timeframe shorter than "years".

Examples are always interesting. The ones in the blog posting were too simplistic and trivial to convey anything about what Infer can do. The bi-abduction technique looks very interesting although I don't really see the importance of the Framing Rule (and I have no paywall access at the weekend). It would be good to see an example of what they can achieve using these techniques that would be difficult in a standard Abstract Interpretation.

Edit: so many years and I still can't use a basic reply form properly. So sad. This was supposed to be a reply to raould above, as the quote seems completely out of context now.

XPath on directed property graphs

Regex scales nicely to trees (e.g. XPath), but becomes difficult to express over directed graphs.

It's worth checking out Gremlin, which supports XPath-style traversal queries against a "property graph".

Thanks

Fantastic! I had not come across Gremlin or SPARQL before. It seems that there has been some real progress in this area over the past decade. From (a very quick) read over the docs it seems that there is a gap between the two. SPARQL names the graph vertices as variables to be instantiated by matching the pattern. It produces a very expressive syntax for the pattern, but limits the pattern matches to constant-sized regions of the graph. Gremlin can describe arbitrary regions, but it uses a Turing-complete language to do so, resulting in a syntax that is quite cumbersome and procedural.

I am now quite curious how big this gap is. Are there any languages more powerful than SPARQL (e.g. variable-sized patterns, transitive closures, etc.) that are less than Turing-complete? Alternatively, are there any proofs about classes of patterns that cannot be matched in a Turing-incomplete language?

Datalog

You should read up on Datalog, which is quite expressive for graph queries. Datomic's particular flavor of datalog is approachable.

Been a while

Last time I saw Datalog it was a simplified version of Prolog suitable for data descriptions rather than programming. I assume that the graph is encoded directly as vertex and edge predicates, and then patterns are encoded so that the SLD tree binds the match in the graph structure. How does this scale for transitive queries? Is there a complexity issue from backtracking?

Check out

Check out http://event.cwi.nl/grades2015/02-nguyen.pdf to see how a modern implementation of datalog performs on graph queries.
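On the backtracking worry: engines in this family generally evaluate bottom-up (semi-naive), not by SLD resolution, so there is no backtracking at all; the cost is governed by the size of the derived relation and of the joins. A toy sketch of semi-naive evaluation of the usual transitive-closure rules, path(X,Y) :- edge(X,Y) and path(X,Y) :- edge(X,Z), path(Z,Y) (my own illustration in C++, nothing to do with the implementation in the paper):

    #include <iostream>
    #include <set>
    #include <utility>
    #include <vector>

    using Fact = std::pair<int, int>;   // (from, to)

    // Semi-naive evaluation: each round joins only against the facts derived
    // in the previous round ("delta"), so nothing is recomputed and nothing
    // backtracks; the nested-loop join is just to keep the sketch short.
    std::set<Fact> transitive_closure(const std::vector<Fact>& edges) {
        std::set<Fact> path(edges.begin(), edges.end());   // path(X,Y) :- edge(X,Y).
        std::set<Fact> delta = path;                       // newest facts
        while (!delta.empty()) {
            std::set<Fact> next;
            for (const auto& [x, z] : edges)               // edge(X,Z)
                for (const auto& [z2, y] : delta)          // delta_path(Z,Y)
                    if (z == z2 && !path.count({x, y}))
                        next.insert({x, y});               // new path(X,Y)
            path.insert(next.begin(), next.end());
            delta = std::move(next);
        }
        return path;
    }

    int main() {
        // a tiny made-up graph: 1 -> 2 -> 3 -> 4
        for (const auto& [a, b] : transitive_closure({{1, 2}, {2, 3}, {3, 4}}))
            std::cout << a << " -> " << b << '\n';
    }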

Wow!

A very impressive piece of work. I need to spend some time reading it very thoroughly, but given the algorithm and the difference in timings for the query set, that is something I will need to download and investigate. Thanks for the tip.

prior work

The last static analyzer for Android that I found was Amandroid:

http://amandroid.sireum.org/

Android can also benefit from the various Java tools. Objective-C might benefit from methods in Frama-C or Astrée. For now, I'm glad I don't develop on iOS, as there's not much available for verification.