Real-life use case - which PLs support it nicely?

I am going to initiate a marginally on-topic discussion, Ehud - please correct me if it is not on-topic at all.

Let's say I have a well-defined binary file format [1], consisting of structural elements of different granularity, ranging from unsigned ints to tagged blocks to sequences to complete files, and I want to express these structural elements in my PL, covering at least reading the file, analyzing/transforming it at the level of structural elements (as opposed to bits/bytes), and writing it back. An example would be optimizing a Flash file for size while preserving its functionality.

Additional points for:

  1. Compile-time checking to some extent (yes, it's unfair)
  2. Natural support of tagged blocks - e.g., block with tag 22 must have this type, while block with tag 42 must have that type (explicit "switch" does not count)
  3. Natural support of bit-fields (explicit bit-shifting does not count)
  4. Ditto with variable bit-field length (Byte n; BitField[n] x; BitField[x] y)
  5. Symmetric serialization/deserialization specification (ideally this should be completely defined by the structure of type)
Well, I realize that this sounds too much like a DSL for describing file formats, but the question is: which general-purpose PLs come sufficiently close to this (very informal) specification? Is dependent typing the most natural solution for variable length of fields and tagged blocks?

[1] If you are familiar with SWF, you may think of this hypothetical format as SWF (if not, think RIFF, MIDI, etc.).
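
To make points 2 and 5 concrete, here is a minimal Python sketch of tagged blocks dispatched through a registry rather than an explicit switch. Everything in it is an assumption made for illustration: the container layout (1-byte tag, 2-byte big-endian payload length, payload), the block types Size and Flags, and the helper names block, read_blocks, and write_blocks. It is not SWF or RIFF, and being Python it offers no compile-time checking.

```python
import struct
from dataclasses import dataclass, astuple

# Hypothetical container layout (NOT SWF or RIFF): each block is a
# 1-byte tag, a 2-byte big-endian payload length, then the payload.
BLOCK_TYPES = {}

def block(tag, fmt):
    """Register a dataclass as the type for `tag`; `fmt` is its struct layout."""
    def register(cls):
        cls.tag = tag
        cls.layout = struct.Struct(fmt)
        BLOCK_TYPES[tag] = cls
        return cls
    return register

@block(22, ">HH")   # tag 22 must have this type
@dataclass
class Size:
    width: int
    height: int

@block(42, ">B")    # tag 42 must have that type
@dataclass
class Flags:
    bits: int

def read_blocks(data):
    """Decode a byte string into a list of typed blocks."""
    pos, blocks = 0, []
    while pos < len(data):
        tag, length = struct.unpack_from(">BH", data, pos)
        pos += 3
        cls = BLOCK_TYPES[tag]  # KeyError on an unregistered tag
        blocks.append(cls(*cls.layout.unpack(data[pos:pos + length])))
        pos += length
    return blocks

def write_blocks(blocks):
    """Inverse of read_blocks, driven by the same per-type layout."""
    out = bytearray()
    for b in blocks:
        payload = b.layout.pack(*astuple(b))
        out += struct.pack(">BH", b.tag, len(payload)) + payload
    return bytes(out)
```

The layout attached to each type is the only description of its encoding, so reading and writing cannot drift apart; the tag-to-type constraint, however, is only checked at run time.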

On topic

This is on topic for LtU so long as the discussion remains about PL constructs and not the viability of binary file formats.

I suggest you look at PL/I's based attribute and its file types and routines. Quite old-fashioned but also quite elaborate...

Erlang bit syntax

Erlang bit syntax could be what you are looking for.

Looks interesting

...even though not statically typed :-)

I would prefer a type-driven approach because (at least theoretically) it would be more usable in specifications of formats (as opposed to implementations).

The current state of the art seems to be using something like a good ol' C struct/union with free-form comments describing the intended constraints.

As a side note - why not separate the issue into two, and say that the abstract structure (kinda semantic) is defined separately from the mapping to bits (kinda syntactic)? I guess because too often the process of reading bits must be guided by the interpretation of the abstract structure (e.g., the value of the first field denotes the number of bits in the second field, whose value in turn denotes the number of bits in the third field, etc.). That's why I am considering dependent types. Then again, will it really make standards more accessible (and of course I have no influence over standardization organizations, but let's think of this as a gedankenexperiment)?
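
The dependency described above, where one field's value gives the bit width of the next, can be sketched in Python as follows. This is only an illustration under stated assumptions: the record is the hypothetical "Byte n; BitField[n] x; BitField[x] y" from the original post, bits are taken most-significant first, and the names SPEC, parse, and build are made up here. The widths are ordinary run-time functions, so nothing is checked statically as it would be with real dependent types.

```python
# Hypothetical record: "Byte n; BitField[n] x; BitField[x] y".
# Each field is (name, width), where width is a function of the
# fields already known, so a single spec drives both directions.
SPEC = [
    ("n", lambda f: 8),
    ("x", lambda f: f["n"]),
    ("y", lambda f: f["x"]),
]

def parse(spec, data):
    """Read fields MSB-first from `data` (bytes); returns a dict."""
    bits = int.from_bytes(data, "big")
    remaining = len(data) * 8
    fields = {}
    for name, width in spec:
        w = width(fields)
        if w > remaining:
            raise ValueError(f"not enough bits for field {name!r}")
        remaining -= w
        fields[name] = (bits >> remaining) & ((1 << w) - 1)
    return fields

def build(spec, fields):
    """Inverse of parse; zero-pads the final byte."""
    bits, total = 0, 0
    known = {}
    for name, width in spec:
        w = width(known)  # may depend only on earlier fields
        value = fields[name]
        if value >> w:
            raise ValueError(f"{name!r}={value} does not fit in {w} bits")
        bits = (bits << w) | value
        total += w
        known[name] = value
    pad = -total % 8
    return (bits << pad).to_bytes((total + pad) // 8, "big")
```

For example, build(SPEC, {"n": 3, "x": 5, "y": 19}) packs 8 + 3 + 5 bits into two bytes, and parse(SPEC, ...) on the result recovers the same three values. The abstract structure (the field names) and the bit mapping (the widths) live in one spec precisely because the second cannot be evaluated without the first.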

Practical Common Lisp

Isn't this exactly what the MP3 chapters in Practical Common Lisp are about? No compile-time checking in CL, though :-).

The Next 700 Data Description Languages & PADS

Andris Birkmanis: Well, I realize that this sounds too much like a DSL for describing file formats...

And the problem with that would be...?

Given your own perfectly valid (IMHO) characterization of your question, let me recommend The Next 700 Data Description Languages and PADS.

Update: Andris Birkmanis: Is dependent typing the most natural solution for variable length of fields and tagged blocks?

From "The Next 700 Data Description Languages":

At the heart of our work is a data description calculus (DDC), designed to capture the core features of data description languages... We base our calculus on a dependent type theory because as we have seen, it is common in data description languages for expressions to appear within types.

Wow

Looks like what I needed, thanks a lot!

I might still return after reading the paper (mwahaha) :-)

[on edit: I meant the DDC paper]

rebol

PADS is great, but you might also like to take a look at Rebol, which has a lot of great ways of working with bitsets.
Some examples:

matching types shows matching of bitsets as well
http://www.rebol.com/docs/core23/rebolcore-15.html#section-5

some bitset tools written in rebol
http://www.codeconscious.com/rebol/scripts/bitsets.r

some tips
http://www.codeconscious.com/rebol/tips-and-techniques.html#binary!

That is not to say that Rebol has no problems, although I think these problems are mainly cultural.