binpac: A yacc for Writing Application Protocol Parsers.
R. Pang, V. Paxson, R. Sommer, and L. Peterson. ACM Internet Measurement Conference. October 2006.
A key step in the semantic analysis of network traffic is to parse the traffic stream according to the high-level protocols it contains. This process transforms raw bytes into structured, typed, and semantically meaningful data fields that provide a high-level representation of the traffic. However, constructing protocol parsers by hand is a tedious and error-prone affair due to the complexity and sheer number of application protocols. This paper presents binpac, a declarative language and compiler designed to simplify the task of constructing robust and efficient semantic analyzers for complex network protocols. We discuss the design of the binpac language and a range of issues in generating efficient parsers from high-level specifications. We have used binpac to build several protocol parsers for the "Bro" network intrusion detection system, replacing some of its existing analyzers (handcrafted in C++), and supplementing its operation with analyzers for new protocols. We can then use Bro's powerful scripting language to express application-level analysis of network traffic in high-level terms that are both concise and expressive. binpac is now part of the open-source Bro distribution.
Binpac nicely abstracts away issues such as large numbers of concurrent, asynchronous parsing processes and protocol specifics (such as HTTP's chunked encoding). A parser for a large part of HTTP is presented in the paper and fits on half a page. The authors have also written parsers for CIFS/SMB, DCE/RPC, DNS, NCP, and Sun/RPC.
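For flavor, here is a rough hand-written OCaml analogue of the kind of parser a binpac record declaration compiles into. This is only an illustrative sketch: binpac itself is a DSL that generates C++, and the wire format below (a 1-byte type, a 2-byte big-endian length, then the payload) is invented for this example.

```ocaml
(* Hypothetical wire format: 1-byte message type, 2-byte big-endian
   length, then that many payload bytes. binpac generates code of this
   shape from a declarative record description. *)
type message = { msg_type : int; payload : string }

(* Network parsers see partial input; signal "resume when more arrives". *)
exception Need_more_data

let parse_message (buf : string) (pos : int) : message * int =
  if String.length buf - pos < 3 then raise Need_more_data;
  let msg_type = Char.code buf.[pos] in
  let len = (Char.code buf.[pos + 1] lsl 8) lor Char.code buf.[pos + 2] in
  if String.length buf - pos - 3 < len then raise Need_more_data;
  ({ msg_type; payload = String.sub buf (pos + 3) len }, pos + 3 + len)
```

The returned offset lets the caller resume at the next message in the same buffer, which is how incremental, per-connection parsing of a traffic stream proceeds.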
Parsers written using our library are remarkably similar to BNF; they are almost entirely free of solution-space (i.e., programming language) artifacts. Our system allows the grammar to be specified as a separate class or mixin, independent of tools that rely upon it, such as parsers, syntax colorizers, etc. Thus, our grammars serve as a shared executable specification for a variety of language processing tools. This motivates our use of the term executable grammar. We discuss the language features that enable these pleasing results, and, in contrast, the challenge our library poses for static type systems.
Executable Grammars in Newspeak (pdf), Gilad Bracha
Executable Grammars in Newspeak JAOO 2007 slides (pdf)
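Newspeak's mixin-based grammars don't translate directly to most languages, but the separation the abstract describes can be approximated with an ML functor: the grammar is written once against an abstract "semantics" signature, and each tool instantiates it with its own interpretation. A minimal OCaml sketch, with all names invented here:

```ocaml
(* The interface a tool must provide to give meaning to grammar rules. *)
module type SEMANTICS = sig
  type t
  val num  : string -> t
  val plus : t -> t -> t
end

(* The grammar itself, written once, independent of any particular tool.
   A real grammar would use parser combinators; this toy one just splits
   "1+2+3"-style input on '+'. *)
module Grammar (S : SEMANTICS) = struct
  let parse (src : string) : S.t =
    match List.map S.num (String.split_on_char '+' src) with
    | [] -> invalid_arg "empty input"
    | x :: rest -> List.fold_left S.plus x rest
end

(* Tool 1: an evaluator. *)
module Eval = Grammar (struct
  type t = int
  let num = int_of_string
  let plus = ( + )
end)

(* Tool 2: a renderer, standing in for a syntax colorizer. *)
module Render = Grammar (struct
  type t = string
  let num s = "[num " ^ s ^ "]"
  let plus a b = a ^ " + " ^ b
end)

(* Eval.parse "1+2+3" = 6
   Render.parse "1+2+3" = "[num 1] + [num 2] + [num 3]" *)
```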
F. Vahid. It's Time to Stop Calling Circuits "Hardware". IEEE Computer Magazine, September 2007.
The advent of field-programmable gate arrays requires that we stop calling circuits “hardware” and, more generally, that we broaden our concept of what constitutes “software.” ...
Developing modern embedded software capable of executing on multiprocessor and FPGA platforms requires expertise not just in temporally oriented modeling (W comes after X) like writing sequential code but also in spatially oriented modeling (Y connects with Z) like creating circuits.
An interesting take on where programming should be heading in the future -- and consequently, where programming languages should also be heading. This article is somewhat related to the recent discussion here on LtU about FPGA CPUs. As the excerpt above illustrates, Vahid draws a distinction between what he calls "temporally-oriented" computing, which focuses on sequence, and "spatially-oriented" computing, which focuses on connectivity of components. His basic argument is that traditional programming languages (and traditional programming education) focus on temporally-oriented computing, but that the growing use of FPGAs as an integral part of many systems (particularly embedded systems) necessitates a greater emphasis on programming in a spatially-oriented mode. We don't tend to talk much about "hardware description" languages like VHDL and Verilog here on LtU, but perhaps they are the answer (or at least part of the answer) to Ehud's recent question about which languages we should be discussing to stay ahead of the curve.
Derivation and Evaluation of Concurrent Collectors, Martin T. Vechev, David F. Bacon, Perry Cheng, and David Grove. ECOOP 2005.
There are many algorithms for concurrent garbage collection, but they are complex to describe, verify, and implement. This has resulted in a poor understanding of the relationships between the algorithms, and has precluded systematic study and comparative evaluation. We present a single high-level, abstract concurrent garbage collection algorithm, and show how existing snapshot and incremental update collectors can be derived from the abstract algorithm by reducing precision. We also derive a new hybrid algorithm that reduces floating garbage while terminating quickly. We have implemented a concurrent collector framework and the resulting algorithms in IBM’s J9 Java virtual machine product and compared their performance in terms of space, time, and incrementality. The results show that incremental update algorithms sometimes reduce memory requirements (on 3 of 5 benchmarks) but they also sometimes take longer due to recomputation in the termination phase (on 4 of 5 benchmarks). Our new hybrid algorithm has memory requirements similar to the incremental update collectors while avoiding recomputation in the termination phase.
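For readers who haven't met the two classical barrier families the abstract contrasts, here is a toy OCaml model of the difference (my own sketch, not code from the paper, which derives both from a single abstract collector):

```ocaml
(* Toy heap: objects carry a mark bit and an array of pointer fields. *)
type obj = { mutable marked : bool; fields : obj option array }

(* Work list of objects the concurrent collector still has to scan. *)
let gray : obj list ref = ref []

let shade o =
  if not o.marked then begin o.marked <- true; gray := o :: !gray end

(* Incremental update (Dijkstra-style): shade the NEW target, so a
   pointer installed during collection is not missed. This is the family
   that may need rescanning, the "recomputation in the termination
   phase" cost the abstract mentions. *)
let write_incremental src i new_target =
  shade new_target;
  src.fields.(i) <- Some new_target

(* Snapshot-at-the-beginning (Yuasa-style): shade the OLD target, so
   everything reachable when collection started survives. Termination is
   quick, but more floating garbage is retained. *)
let write_snapshot src i new_target =
  (match src.fields.(i) with Some old -> shade old | None -> ());
  src.fields.(i) <- Some new_target
```

The paper's point is that both barriers are instances of one abstract algorithm at different precision settings, which is what makes the hybrid derivation possible.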
Status Report: The Manticore Project. Along the lines of Concurrent ML comes a new language design that is aimed at the parallelism of multicore CPUs.
The Manticore project is an effort to design and implement a new functional language for parallel programming. Unlike many earlier parallel languages, Manticore is a heterogeneous language that supports parallelism at multiple levels. Specifically, we combine CML-style explicit concurrency with fine-grain, implicitly threaded, parallel constructs. We have been working on an implementation of Manticore for the past six months; this paper gives an overview of our design and a report on the status of the implementation effort.
Interesting material in terms of approach and implementation, based on an ML-style language (sans mutable variables). But the underlying assumption that language design offers the best path to exploiting parallelism is probably the key appeal for LtU:
The problem posed by such processors is how to effectively exploit the different forms of parallelism provided by the hardware. We believe that mechanisms that are well integrated into a programming language are the best hope for achieving parallelism across a wide range of applications, which is why we are focusing on language design and implementation.
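Manticore's surface language is its own, but the CML style of explicit concurrency the quote refers to is easy to taste in OCaml, whose standard threads library includes a CML-inspired Event module with first-class synchronous events. A minimal sketch (compile against the threads library, e.g. with ocamlfind's threads.posix package):

```ocaml
(* CML-style rendezvous: send and receive are first-class events that
   only take effect when committed with Event.sync. *)
let () =
  let ch : string Event.channel = Event.new_channel () in
  let _producer =
    Thread.create
      (fun () -> Event.sync (Event.send ch "hello from a CML-style channel"))
      ()
  in
  (* Events compose before committing: Event.wrap transforms a result,
     and Event.choose (not used here) selects among alternatives. *)
  let msg = Event.sync (Event.wrap (Event.receive ch) String.uppercase_ascii) in
  print_endline msg
```

Manticore layers implicitly threaded, fine-grain parallel constructs on top of this kind of explicit concurrency, which is the "heterogeneous" part of the design.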
And as long as we're on the subject, I see that Reppy's book on Concurrent ML is now in paperback form (moving it more into my price range).
I am in the process of relocating to the United States for a couple of years (we'll be staying in the Palo Alto area), so I will probably be less active around LtU in the next couple of weeks until I settle in. I also expect it will take me more time to respond to emails.
Witnessing Side-Effects, Tachio Terauchi and Alex Aiken. ICFP 2005.
We present a new approach to the old problem of adding side effects to purely functional languages. Our idea is to extend the language with “witnesses,” which is based on an arguably more pragmatic motivation than past approaches. We give a semantic condition for correctness and prove it is sufficient. We also give a static checking algorithm that makes use of a network flow property equivalent to the semantic condition.
If I understand the idea in this paper correctly, you take a functional language with a possibly non-deterministic or parallel execution model, and then add references to it. To keep this from being impossible to reason about, you add dataflow tokens to each reference operation (assignment, allocation, reading) to ensure that they don't happen until each op's predecessors have happened -- and you make the tokens first class, so that the programmer can directly specify the amount of serial execution needed. Then you can do an analysis to ensure that the reduction is confluent, which means that you have no races.
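A toy rendering of that reading (my own model, far simpler than the paper's formalism): make the token an explicit value that every reference operation consumes and produces, so that data dependence on tokens is what serializes effects, and anything not connected by a token chain is free to run in any order.

```ocaml
(* Hypothetical witness-token API: every effect takes and returns a
   token, so the program text spells out exactly which effects are
   ordered. The names here are invented for illustration. *)
module Witness : sig
  type token
  val start  : token
  val join   : token -> token -> token          (* merge two orderings *)
  val wref   : token -> 'a -> 'a ref * token    (* allocation *)
  val wread  : token -> 'a ref -> 'a * token    (* dereference *)
  val wwrite : token -> 'a ref -> 'a -> token   (* assignment *)
end = struct
  type token = unit   (* tokens carry no data; only their flow matters *)
  let start = ()
  let join () () = ()
  let wref () v = (ref v, ())
  let wread () r = (!r, ())
  let wwrite () r v = r := v
end

(* The read below is ordered after the write because it consumes the
   token the write produced; no ambient sequencing is assumed. *)
let example () =
  let open Witness in
  let r, t1 = wref start 0 in
  let t2 = wwrite t1 r 42 in
  let v, _t3 = wread t2 r in
  v
```

The paper's contribution is then the static side: a checking algorithm, based on a network flow property, that verifies the token graph forces enough ordering for the reduction to be confluent.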
Shape Analysis with Structural Invariant Checkers, Bor-Yuh Evan Chang, Xavier Rival, and George C. Necula. SAS 2007.
Developer-supplied data structure specifications are important to shape analyses, as they tell the analysis what information should be tracked in order to obtain the desired shape invariants. We observe that data structure checking code (e.g., used in testing or dynamic analysis) provides shape information that can also be used in static analysis. In this paper, we propose a lightweight, automatic shape analysis based on these developer-supplied structural invariant checkers. In particular, we set up a parametric abstract domain, which is instantiated with such checker specifications to summarize memory regions using both notions of complete and partial checker evaluations. The analysis then automatically derives a strategy for canonicalizing or weakening shape invariants.
I saw a talk by Evan, and thought this approach is fundamentally the right way to go, because the proper sharing behavior of heap data is information that's integral to the design and not at all in the code. So programmers should be able to supply this information to their analyses.
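For concreteness, this is the flavor of ordinary, developer-written checking code the analysis takes as its specification. The paper works over C-like heaps, so the following hypothetical OCaml checker for a sorted singly-linked list is only an analogy:

```ocaml
(* A checker one might already have for testing or dynamic analysis:
   the list is None-terminated and its data fields are nondecreasing.
   The analysis interprets such code symbolically to get its shape
   invariant, rather than requiring a separate logical specification. *)
type node = { data : int; next : node option }

let rec sorted_list (l : node option) : bool =
  match l with
  | None -> true
  | Some { next = None; _ } -> true
  | Some { data; next = Some n } -> data <= n.data && sorted_list (Some n)
```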
OMeta: an Object-Oriented Language for Pattern Matching
A new paper by Alessandro Warth and Ian Piumarta, related to the Reinvention of Programming project:
This paper introduces OMeta, a new object-oriented language for pattern matching. OMeta is based on a variant of Parsing Expression Grammars (PEGs) [5]—a recognition-based foundation for describing syntax—which we have extended to handle arbitrary kinds of data. We show that OMeta’s general-purpose pattern matching provides a natural and convenient way for programmers to implement tokenizers, parsers, visitors, and tree transformers, all of which can be extended in interesting ways using familiar object-oriented mechanisms. This makes OMeta particularly well-suited as a medium for experimenting with new designs for programming languages and extensions to existing languages.
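The "arbitrary kinds of data" point is the interesting generalization: PEG operators make sense over any stream of values, not just characters. A small OCaml sketch of PEG-style recognizers over a polymorphic element type (invented here for illustration; OMeta's actual contribution is layering object-oriented rule extension on top of this idea):

```ocaml
(* A recognizer consumes a prefix of the input and returns the rest,
   or fails. Nothing here mentions characters, so the same operators
   match tokens or tree nodes just as well. *)
type 'a parser = 'a list -> 'a list option

let sat p = function
  | x :: rest when p x -> Some rest
  | _ -> None

(* sequencing: run q on whatever p leaves behind *)
let seq p q input =
  match p input with Some rest -> q rest | None -> None

(* PEG's ordered choice: commit to the first alternative that matches *)
let alt p q input =
  match p input with Some _ as r -> r | None -> q input

(* zero-or-more repetition (assumes p consumes input when it succeeds) *)
let rec star p input =
  match p input with Some rest -> star p rest | None -> Some input

let digit = sat (fun c -> c >= '0' && c <= '9')
let number = seq digit (star digit)
(* number ['4'; '2'; '+'] = Some ['+'] *)
```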
"Each chapter describes a particular aspect of the P-code system, each section discussing a particular procedure or group of procedures, sometimes preceded by an explanation of the data-structures used." Pascal Implementation