binpac: A yacc for Writing Application Protocol Parsers

R. Pang, V. Paxson, R. Sommer, and L. Peterson. ACM Internet Measurement Conference. October 2006.

A key step in the semantic analysis of network traffic is to parse the traffic stream according to the high-level protocols it contains. This process transforms raw bytes into structured, typed, and semantically meaningful data fields that provide a high-level representation of the traffic. However, constructing protocol parsers by hand is a tedious and error-prone affair due to the complexity and sheer number of application protocols. This paper presents binpac, a declarative language and compiler designed to simplify the task of constructing robust and efficient semantic analyzers for complex network protocols. We discuss the design of the binpac language and a range of issues in generating efficient parsers from high-level specifications. We have used binpac to build several protocol parsers for the "Bro" network intrusion detection system, replacing some of its existing analyzers (handcrafted in C++), and supplementing its operation with analyzers for new protocols. We can then use Bro's powerful scripting language to express application-level analysis of network traffic in high-level terms that are both concise and expressive. binpac is now part of the open-source Bro distribution.

Binpac nicely abstracts away issues such as large numbers of concurrent, asynchronous parsing processes and protocol specifics (such as HTTP's chunked encoding). A parser for a large part of HTTP is presented in the paper and fits on half a page. The authors have also written parsers for CIFS/SMB, DCE/RPC, DNS, NCP, and Sun/RPC.
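To give a sense of what is being abstracted away, here is a minimal hand-written sketch in Python of an incremental decoder for HTTP's chunked encoding. It is not binpac code and does not come from the paper; the class name `ChunkedDecoder` and its methods are hypothetical, and trailer fields are skipped rather than parsed. The point is only that a parser fed arbitrary packet-sized pieces has to carry explicit state between calls, which is the kind of bookkeeping a declarative specification lets you avoid writing yourself.

```python
class ChunkedDecoder:
    """Hand-written incremental decoder for HTTP/1.1 chunked transfer coding.

    Hypothetical illustration only: not binpac output, not code from the paper.
    Trailer header fields after the last chunk are skipped, not parsed.
    """

    def __init__(self):
        self.buf = b""           # bytes received but not yet consumed
        self.state = "size"      # "size", "data", "crlf", "trailer", or "done"
        self.remaining = 0       # bytes still missing from the current chunk
        self.body = bytearray()  # decoded payload so far

    def feed(self, data: bytes) -> None:
        """Consume one network segment, which may split a chunk anywhere."""
        self.buf += data
        while self.state != "done":
            if self.state == "size":
                line = self._take_line()
                if line is None:
                    return  # size line not complete yet
                # Drop optional chunk extensions after ';', then parse hex size.
                self.remaining = int(line.split(b";", 1)[0].strip(), 16)
                self.state = "data" if self.remaining else "trailer"
            elif self.state == "data":
                if not self.buf:
                    return  # wait for more chunk data
                piece = self.buf[:self.remaining]
                self.buf = self.buf[len(piece):]
                self.body += piece
                self.remaining -= len(piece)
                if self.remaining == 0:
                    self.state = "crlf"
            elif self.state == "crlf":
                if len(self.buf) < 2:
                    return  # CRLF after chunk data not complete yet
                if self.buf[:2] != b"\r\n":
                    raise ValueError("missing CRLF after chunk data")
                self.buf = self.buf[2:]
                self.state = "size"
            elif self.state == "trailer":
                line = self._take_line()
                if line is None:
                    return
                if line == b"":
                    self.state = "done"  # empty line ends the message

    def _take_line(self):
        """Return the next CRLF-terminated line without the CRLF, or None."""
        end = self.buf.find(b"\r\n")
        if end < 0:
            return None
        line, self.buf = self.buf[:end], self.buf[end + 2:]
        return line
```

One such object would be needed per connection, and `feed` would be called once per arriving segment. Multiply that by every length field, encoding, and message type in a protocol, and the appeal of generating this state handling from a specification becomes clearer.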

PADS

PADS is a similar idea:

http://www.research.att.com/viewPublication.cfm?id=659
http://www.research.att.com/viewPublication.cfm?id=785

The papers themselves are available only through the ACM Digital Library, I believe.

Ragel

Is this similar to how Mongrel has been using Ragel as an HTTP protocol parser?

http://www.zedshaw.com/tips/ragel_state_charts.html

Similar, but not similar; would this work for log parsing?

Do you think this tool could also be used for log parsing?