Programming Languages in the Markup World

The Extreme Markup 2006 conference has finished and the proceedings are up. If you have any interest in the intersection of programming and markup languages, there were a few papers worth looking at. Here are my choices:


Thoughts on Streaming Combinators

There's a typo in the definition of select, which should be select splitter -> splitter unless suppress.

It strikes me that, when the paper talks about "chunks of input", it is referring to an implicit notion of a stream-of-streams. A splitter sends "chunks" (== elements of the stream-of-streams) to the true output, and delimits them from other chunks with data sent to the false output. In the case of the xml-element splitter, there might not be any data between chunks of input, so the splitter has to cheat by sending empty XML comments into the false output.

Is there a good reason for leaving this stream-of-streams implicit? Perhaps it interferes with the semantics of the nested combinator (which I have yet to get my head around)?
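A toy Haskell sketch of what I mean (my own encoding, not the paper's interface): label each item of the flat stream with the splitter's boolean, then group by the label.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Hypothetical flat encoding of a splitter's output: each item is
-- labelled True (chunk contents, the true output) or False (delimiter
-- data, the false output).  Grouping by the label recovers the
-- implicit stream-of-streams.
chunks :: [(Bool, a)] -> [[a]]
chunks = map (map snd) . filter (fst . head) . groupBy ((==) `on` fst)
```

With a delimiter item in between, chunks [(True,'a'),(True,'b'),(False,','),(True,'c')] gives ["ab","c"]; remove the (False,',') pair and the two chunks merge into one. That, I suppose, is exactly why the xml-element splitter has to emit those empty comments.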

Interface simplicity

There's a typo in the definition of select, which should be select splitter -> splitter unless suppress.

There is an error indeed, but it's in the previous two definitions. They should have been


filter where splitter → if splitter then filter else as-is
filter unless splitter → if splitter then as-is else filter

Quite embarrassing.
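Read item by item, the corrected definitions might be sketched in Haskell like this (a simplification I should flag: the real combinators route whole sections of the stream chosen by the splitter, not individual items):

```haskell
-- Item-wise sketch only: the actual combinators operate on stream
-- sections, not single items.  whereC and unlessC are ad-hoc names.
whereC, unlessC :: (a -> a) -> (a -> Bool) -> [a] -> [a]
f `whereC` p  = map (\x -> if p x then f x else x)  -- filter where splitter
f `unlessC` p = map (\x -> if p x then x else f x)  -- filter unless splitter
```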

Is there a good reason for leaving this stream-of-streams implicit?

One very good reason is leaving the streams as simple as possible (but no simpler). Introducing another feature to the interface between components means that every component and combinator must handle the new feature. Or alternatively, you could have different types of streams and then different classes of components that can handle those streams. Either way, more complexity gets added. Some examples of that complexity can be seen in Flow-Based Programming, which has sub-streams and control signals all over the place.

But frankly, I was surprised myself when I discovered how far the framework can get by dealing with flat streams. While extending the framework, I kept expecting to run into a problem which absolutely requires some control signal communication between the components. That may happen yet, of course, but so far I've always managed to find a workaround.

The next challenge will be to design some regular-expression combinators. The main problem I can foresee now is how to match an empty string. If I overcome that hurdle, I should be able to build a better Unix shell.

As for nested combinator, it is rather counter-intuitive. All other flow-control combinators can work with stateless components, in which case they work pretty much the same way as the ancient block diagrams. The nested combinator will produce an infinite loop if it's given two stateless splitters. It can only work if one or both arguments' behaviour depends on the previous input. Other than that tidbit, there's nothing special about it.

I might have found another

I might have found another typo. Should there be an else as-is clause at the end of the definition of descendant-element-named?

One very good reason is leaving the streams as simple as possible (but no simpler). ... Either way, more complexity gets added.

And your Streaming Combinators are a great example of "simple as possible (but no simpler)" in action. I could argue that empty XML comments are really a kind of control signal, but then I would be quibbling over matters of taste and expediency.

Do you think you would have made the same choices if your implementation language had been, say, Haskell?

As for the nested combinator, it is rather counter-intuitive....

After taking another look at the descendant-element-named example, I think I get it. I was confused earlier because I was worried about buffering & synchronisation problems, but that's an issue you address in other parts of the paper.

No, that one is fine

I might have found another typo. Should there be an else as-is clause at the end of the definition of descendant-element-named?

No, that one is ok. Why would you expect else as-is?

If you'd like to examine the source code, it's available here. It's around 1500 lines of OmniMark code. The presentation slides are not on-line yet, but I can send them to you if you're interested.

Do you think you would have made the same choices if your implementation language had been, say, Haskell?

Funny you should ask. I have already started working on a Haskell implementation. It's been rather painful so far, but I hope it gets easier once I pass the initial hurdle of finding the optimal data structures to represent sources and sinks.

To answer your design question, some things are certainly different. OmniMark does not (yet) provide polymorphic sources and sinks: all streams are character streams. The main reason I chose Haskell is its type system. One thing that Haskell will make possible is streams of streams, but I don't know yet whether I'll actually need them.

foreach ... then ... else

Why would you expect else as-is?

The rest of the paper led me to believe that a foreach expression has the form foreach splitter then filter else filter, but in the code I was questioning, there is no else clause:

foreach descendant-element-named "file-path"
then as-is join ((select xml-element-content)
                 >> (select last (!substring "/"))
                 >> (prepend "")
                 >> (append ""))

It's optional

The clauses then and else of if and foreach combinators are optional. If a clause is not specified, the default argument value is as-is. After looking through the paper, I see there's no way to deduce that. Syntactic and semantic details like these were left unexplained for reasons of brevity, but I didn't realize that the examples were relying on some of those details. Sorry.

splitters vs predicates

I don't understand why you define many splitters rather than defining one splitter that takes a boolean stream and a value stream and splits the values based on the booleans. This is how most other series-processing systems work (e.g., Lisp's Series macros or Haskell's span function). This seems to me more flexible, since boolean streams can be used to split one stream based on another, etc.

[edit: hmm, actually Haskell's span is a counter example since it takes a predicate, not a stream of booleans, and it doesn't really split a list item by item -- is there a Haskell function that does that?]
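As far as I can tell there is no standard function for it, but an item-by-item split driven by a boolean stream is a one-liner (splitBy is a name I just made up, not a library function):

```haskell
import Data.List (partition)

-- Split a value stream according to a parallel boolean stream:
-- True items go left, False items go right.  (splitBy is ad hoc,
-- not part of the standard library.)
splitBy :: [Bool] -> [a] -> ([a], [a])
splitBy bs xs = (map snd ts, map snd fs)
  where (ts, fs) = partition fst (zip bs xs)
```

So splitBy [True,False,True] "abc" gives ("ac","b"), and the boolean stream can of course be computed from a stream other than the one being split.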

Shifting the responsibility

I don't understand why you define many splitters rather than defining one splitter that takes a boolean stream and a value stream and splits the values based on the booleans.

But who provides the boolean stream? You would replace N different splitters by one splitter and N functions from value stream to boolean stream. These functions may seem easy to define, but they're not. Check out Continuations and Transducer Composition to see what they'd look like in Lisp.

boolean streams

You would replace N different splitters by one splitter and N functions from value stream to boolean stream. These functions may seem easy to define, but they're not.

I don't see why they are not easy to define. You only need to provide one function that lifts any predicate to a stream processor that takes a stream and returns a boolean stream. This is trivial. It is a mapping.

in Haskell:

boolStream :: (a -> Bool) -> [a] -> [Bool]
boolStream p xs = map p xs

Not quite

Have you looked at the link I posted in my previous reply?

Never mind. I'll explain it here.

The first problem is that a map works only for stateless stream transformers, i.e., those that always map the same item to the same boolean. Most interesting transformers will map an item to a boolean depending on the items preceding it. You can model these using a left fold:

   -- needs: import Data.List (mapAccumL)
   boolStream :: (acc -> a -> (acc, Bool)) -> acc -> [a] -> [Bool]
   boolStream p acc xs = snd (mapAccumL p acc xs)

Assuming this definition, this is what the second-simplest splitter combinator, >&, might look like:

   type Splitter a acc = (acc -> a -> (acc, Bool), acc)

   (>&) :: Splitter a acc1 -> Splitter a acc2 -> Splitter a (acc1, acc2)
   (p1, acc1) >& (p2, acc2) = (p3, (acc1, acc2))
      where p3 (a1, a2) x = ((a1', a2'), b2)
              where (a1', b1) = p1 a1 x
                    (a2', b2) = if b1 then p2 a2 x else (a2, False)

I haven't tested this code, but I don't think it can get any simpler and I wouldn't call it simple as it stands. And I'm not sure that the non-CPS model can cover all possible cases.
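For concreteness, here is a runnable version of the same idea, with a toy stateful splitter thrown in (seen and run are ad-hoc names invented for this example; seen c answers True from the first occurrence of c onward):

```haskell
import Data.List (mapAccumL)

type Splitter a acc = (acc -> a -> (acc, Bool), acc)

-- Lift a stateful step function to a boolean-stream producer.
boolStream :: (acc -> a -> (acc, Bool)) -> acc -> [a] -> [Bool]
boolStream p acc xs = snd (mapAccumL p acc xs)

-- The >& combinator; the two accumulator types are kept distinct so
-- that the result can carry both components' states.
(>&) :: Splitter a acc1 -> Splitter a acc2 -> Splitter a (acc1, acc2)
(p1, acc1) >& (p2, acc2) = (p3, (acc1, acc2))
  where p3 (a1, a2) x = ((a1', a2'), b2)
          where (a1', b1) = p1 a1 x
                (a2', b2) = if b1 then p2 a2 x else (a2, False)

-- A stateful splitter: True from the first occurrence of c onward.
seen :: Eq a => a -> Splitter a Bool
seen c = (\s x -> let s' = s || x == c in (s', s'), False)

-- Run a splitter over a list, yielding its boolean stream.
run :: Splitter a acc -> [a] -> [Bool]
run (p, acc0) = boolStream p acc0
```

Here run (seen 'b') "aabcc" yields [False,False,True,True,True], while run (seen 'b' >& seen 'c') "aabcc" yields [False,False,False,True,True]: true only from the first 'c' at or after the first 'b'. No plain map can express either.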

stateful stream mappers

Have you looked at the link I posted in my previous reply?

Yes, but I didn't see what you were getting at.

I build stream processors in C for doing real-time audio and music composition, using something like Protothreads scheduled via trampolines. It is trivial to make stateful stream mappers this way. I compose the streams in a purely functional manner, in a Smalltalk-like language, to run remotely on an audio synthesis server.

I'll just say that I prefer separating predicates from stream splitters (I've done it that way myself), and leave it at that.

Human readability, as well...

Is there a good reason for leaving this stream-of-streams implicit?

Something has been bugging me since I first replied to this question, and I may have found what it was. Another good reason is that a flat stream can be readable by humans. Now, this was certainly not my reason for designing the framework this way, and I think the benefits of human readability are overestimated. But it has been cited as an important factor in the success of Unix and XML, two designs that have succeeded beyond anybody's expectations. So there must be something in it.

Human readability...

Good point!

I'm still not sure that I agree with you, but I'm so impressed with the overall design that I feel quite churlish for quibbling over minor details.