What is the best literate programming tool/environment/research work?

I'm building an IDE to support our next generation architecture at work, and believe a better documentation tool than NDoc would be nice.

When talking about documentation, the first thought that comes to mind is "literate programming". Currently I'm reading three of the four full-length books on the subject, I've already read the Literate Programming FAQ as well as Marc van Leeuwen's Literate Programming in C CWEBX manuscript.

I don't have time to endlessly evaluate prior art in this space, and was hoping there was a language guru here at LtU who knows a lot about this esoteric area of language research: making code more like English prose and the prose more like code, to the point of unifying code and comments.

To be clear, the tool doesn't have to be called a "literate programming tool" to qualify as such! Even if it is only a partial idea for LP, if it is really cool and slick, then I want to demo it. A good example is Emacs' MMM-Mode.

I love examples! TIA


That's easy

What is the best literate programming tool/environment/research work?


PLT Scheme now supports Literate Programming


LP better for interface than for implementation code.

In my experience, literate programming (which I used once for a one-year project) is better suited for interface code (i.e. *.h in C or C++, or *.mli in OCaml) than for implementation code (*.c in C, *.cc in C++, *.ml in OCaml).

Literate programming is important for API description. It is much less useful in practice for code. And once your code is "literate" going backwards to straight code is not very easy.

If you are designing a language and its IDE, I would suggest that having formatted comments is very useful.

My common experience when

My common experience when discussing my design pains with other programmers is apathy or confusion, a sort of "What are you after here?"

My current environment doesn't have the same problems as other environments, but its avoidance/solution to those problems creates a host of new problems.

However, you have done a good job summarizing current limitations of literate programming. I think the biggest bottleneck is what I call the "Knuth model of LP". I don't think we should be doing LP in terms of WEB, TANGLE, and WEAVE -- I feel these are tremendous design hacks that are largely the product of our poor integration of concerns between documentation and coding. Especially as I believe live programming languages are extremely valuable from a competitive standpoint, I want an LP tool that integrates painlessly with a live programming environment.

code comments versus literate style ..

I use doxygen for documenting my C/C++ code which is more along the lines of comment formatting that you suggest. When someone else needs to look at the code, understand and debug it, I've found it adequate to have good comments interleaved with the code.
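To make the comparison concrete, here is a minimal sketch of the kind of doxygen-commented C++ this refers to; the function and its name are invented for illustration:

```cpp
#include <stdexcept>

/**
 * @brief Compute the greatest common divisor of two non-negative integers.
 *
 * Uses Euclid's algorithm; the comment block sits directly above the
 * code it describes, so doxygen can extract it into the API reference.
 *
 * @param a  first operand (must be >= 0)
 * @param b  second operand (must be >= 0)
 * @return   gcd(a, b); gcd(0, 0) is defined here as 0
 */
int gcd(int a, int b) {
    if (a < 0 || b < 0)
        throw std::invalid_argument("operands must be non-negative");
    while (b != 0) {
        int t = a % b;  // invariant: gcd(a, b) is unchanged each iteration
        a = b;
        b = t;
    }
    return a;
}
```

The comment lives next to the code it documents, which is exactly the "interleaved" style being described, rather than a separate woven document.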

However, when it comes to documenting an algorithm, it usually happens that the code evolves on its own while the documentation lags behind. This is a problem that the literate programming style is supposed to solve. (Whether it does is stuff for debate.) For this purpose, I've also used fairly extensive documentation at the implementation level just so I can edit both in the same place.

i don't get it

i mean, isn't literate programming just an example of the definition of insanity: doing something that doesn't work over and over, in fact just doing more of it?

my pie in the sky feeling

is that it won't make much sense until there is some way for the natural language parts to be auto checked against the code to show when things get out of sync.

On my desk, I have a copy of

On my desk, I have a copy of Lanczos' Linear Differential Operators. Somehow this book is still good 50 years after the fact, and I can't compile any of the "code" in it.

Likewise, MIT students aren't throwing themselves off buildings because Strang's linear algebra textbook contains theorems they can't auto check.

The real point behind a good documentation system is simply to make any inconsistencies obvious. Javadoc was the first industry-wide tool to advocate putting comments in the source code, and studies later showed it was a wise choice. Javadoc comments are just one "data structure" for relating code to comments, and vice versa. And it's a data structure that hasn't improved that much. NDoc is the best advancement so far, and it's not that special. Why? Because most of the advances NDoc has made have nothing to do with the data structure, but instead with munging that data structure.

I also think "natural language parts [needing] to be auto checked against code" is silly. Those 80+ page specs we give clients usually are only written up after we give them a (usually) free prototype they can play with and comment on, when we've nailed down 80% of what they need to do in 20% of the features. Nobody reads them, they're very boring. They're actually meaningless, because natural language is a horrible way to document systems. Usually, in large OO systems, the length of natural language documentation is inversely proportional to the quality of the design.

perhaps just going to disagree :-)

In my mind, the Javadoc thing is not a whole lot better. If there is any flying in the face of DRY, there will be problems due to it. Having comments or literate whatever that have to parallel actual code is asking for a lot more effort to be put into maintaining the system. While that might not be an impossible thing to do, it is, I think, quite different from regular (admittedly not wonderful) programming.

(Tho, I could perhaps believe it being less bad in terse/succinct languages like Haskell as opposed to boilerplate-heavy lame things like Java.)

Re: NL auto checked, what i'm thinking is that if there is !DRY then i want the comments to be unable to drift from the code, if that is possible (and vice versa). Which makes me think of systems which purport to use a controlled NL to good effect.

p.s. I think there is a significant difference between the text and equations in a book vs. the nature of living code, so the examples you give aren't sufficient to change my mind so far :-)

"Fast, Good or Cheap, pick two"

Javadoc is fast and cheap.

If there is any flying in the face of DRY, there will be problems due to it.

A consequence of Javadoc being fast+cheap. As I've already said, Javadoc's data structures for relating code and comments are pretty lame for the year 2009. @See doesn't cut it any more. Sun actually made an open source replacement for ctags, so instead of improving Javadoc we have two different islands for expressing the same system responsibility: detecting inconsistencies between code and comments.

Moreover, modern IDEs provide features like Intellisense that can provide a one line description of functions. Borland was the first to pioneer this synchronous communication between the system and the programmer, and SLIME/SWANK has an asynchronous feedback model that effectively does the same thing.

Which makes me think of systems which purport to use a controlled NL to good effect.

For your Attempto example: I wasted 30 minutes trying to find a demo or workable example of what human beings have to read and understand, i.e. the syntax. I gave up searching. Why can't academics (and open source projects in general) be more like, oh, say, jQuery in presenting (a) what the code is like and (b) a mission statement? Instead, Attempto tells me how it relates to a whole bunch of theories I don't have time to learn or care about, so it just makes itself irrelevant.

p.s. I think there is a significant difference between the text and equations in a book vs. the nature of living code, so the examples you give aren't sufficient to change my mind so far :-)

* The Lanczos and Strang examples paraphrase the classic LP example given by van Ammers and Ramsey in the LP FAQ.

* The Spring SimpleFormController was intended to illustrate how too much plain English is an overall design smell, indicating poor design. The fact that it has the "Simple" prefix for a class about 6 levels deep in a class hierarchy is "enough said", but then you can also just look at the documentation, and as your eyes glaze over reading "John Updike Does Web Programming", you can just feel it is bad. I'm not claiming any system I design will turn John Updike novels into Illuminated Scrolls scribed by mountain monks, decorated with golden curlicues and calligraphy.

Instead, I'm simply advocating better human-computer interaction, with emphasis on human factors.

NL vs code

Re: NL auto checked, what i'm thinking is that if there is !DRY then i want the comments to be unable to drift from the code, if that is possible (and vice versa). Which makes me think of systems which purport to use a controlled NL to good effect.

Wouldn't that be of the same complexity as somehow having the capability to spit out the natural language description given the code? I think the point of, and value in, commenting code is specifying the same thing in at least two *different* ways. For example, I have to write C++ code at work, but I often find Haskell code to be a very succinct expression of what I want to achieve. So I often leave the Haskell equivalent of a page of C++ code as a two-line comment :) If that Haskell code needs more explanation, I add some prose as well.
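As a small invented illustration of that practice, a C++ function with its Haskell equivalent left as a comment might look like this (the function and names are made up for the example):

```cpp
#include <vector>

// Haskell equivalent:  sumOfSquares = sum . map (^2)
// The comment restates the intent in a second, very different notation,
// so a reader can cross-check the C++ against the one-liner.
int sum_of_squares(const std::vector<int>& xs) {
    int acc = 0;
    for (int x : xs)
        acc += x * x;
    return acc;
}
```

The two notations can drift apart just like prose and code can, but the Haskell line is short enough that checking it against the loop by eye is cheap.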

Btw, I usually write gobs of english prose to describe a few lines of Haskell code for some personal exploratory code :)

re: of the complexity

yeah, it might well be that the only checkable specification that is of any use is... one written in code, not in prose.

to me it just argues that we need to stay the heck away from bifurcating our explanations into code vs. prose and instead simply make our code itself as damned good as possible, and have lots of e.g. FitNesse or BDD tests to make it as clear as possible what the system thinks the requirements are.


"Autodoc" on the

"Autodoc" on the Commodore-Amiga predated Javadoc by several years. In fact, half of the book, "Amiga ROM Kernel Reference Manual: Includes and Autodocs", are just print-outs of all the autodocs in the ROM OS.

I'm sure the idea goes earlier still; I can't imagine that folks at Commodore would have invented the technique.

Example: The Quick C-- Compiler

The entire Quick C-- compiler is implemented as a Noweb literate program, mostly in OCaml and Lua. Here is the main module, for example. The resulting LaTeX and PDF files are not on the site, so I can't link to them.
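For readers who haven't seen noweb source: documentation chunks start with `@`, code chunks are named with `<<name>>=`, and chunks can reference each other by name; `notangle` extracts the code and `noweave` typesets the prose. A minimal hypothetical fragment (the chunk names and OCaml code are invented, not taken from Quick C--) looks roughly like:

```
@ The driver parses the command line and hands off to the compiler
proper. Keeping argument handling in its own chunk lets the prose
discuss it separately from the main control flow.

<<main>>=
let () =
  <<parse the command line>>
  Driver.compile ()

<<parse the command line>>=
let args = Array.to_list Sys.argv in
ignore args
```

The point of the chunk mechanism is that code can be presented in whatever order best serves the explanation; `notangle` reassembles it into compilable source order.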



You might want to check out Physically Based Rendering: From Theory to Implementation. It's a book (and software) on raytracing techniques written in the literate style. The literate software was written by the authors themselves, so you might send an inquiry to one of its authors, Greg Humphreys, about it.


I had no idea Humper's book was written in literate style. It has been on my amazon.com wish list for years.


I thought some people might like to see what I've been digging up on my own.

Ralf Hemmecke's Aldor is a computer algebra system based on Axiom that is written in a literate style. It uses noweb, which appears to be the most popular plug-in/LP tool due to its Emacs support and Emacs mode integration with MMM-Mode.

Are you just thinking about

Are you just thinking about a document that describes code (~literate programming), or do you want a "live" document that can be executed and reflect the result of running code? If the latter, you should take a look at Sweave.
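For context, a Sweave source file (`.Rnw`) is LaTeX prose with executable R chunks delimited by `<<>>=` ... `@`; running it through Sweave executes the chunks and splices their output (and inline `\Sexpr{}` values) into the final document. A minimal sketch, with invented chunk contents:

```latex
\documentclass{article}
\begin{document}

We simulate one hundred draws from a standard normal; their mean
is \Sexpr{round(mean(x), 2)}.

<<simulate, echo=TRUE>>=
x <- rnorm(100)
summary(x)
@

\end{document}
```

This is the "live document" model: the numbers in the prose are computed from the code at document-generation time, so they cannot silently drift from it.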

My initial thought was...

Why not both?

Thanks for the Sweave example.

To give you an idea of where I am going with this, I am one of the "fancier" WPF/Silverlight programmers out there and know the framework inside and out. I could debate for hours and hours on the differences between SVG, XPS and *TEX, but let's just say I've made my decision and "it is XPS, because TEX and SVG are fundamentally batch-mode systems" (simplifying things).

You certainly do want both

You certainly do want both (if you'll search the archives for my posts about Sweave you'll see that's roughly what I was after). But notice that the type of interaction is different (consider "run time" versus "compile time", if you will). Another way to see the point is to ask about the type of artifact you are after: a program, or the output of executing the program. Sweave-style systems add an extra meta-level, in some sense.

I have yet to see a nice semantic model combining both these feature sets together. I am sure it is not impossible to come up with one, which is why I keep on looking...

I know what you mean

Most architects and even academic researchers falsely claim they've created the right level of abstraction to solve this and similar problems. A good example is Magritte, the web application framework that is super-dynamic but super-monolithic. As dynamism is strictly a matter of designing the application protocol before the API, the super-monolithic part is unacceptable.

I also am an intense practitioner of multi-stage programming, owing to my strength being statistics and scientific analysis of research data.

Most people think too much in terms of compile-time versus run-time. They also think multi-stage programming is about code generation optimization. It is not. Multi-stage programming is a mathematical way to systematically break down a problem into logical stages. Code optimization is simply a "neat trick" that involves moving significant choices into the preprocessor phase, compile phase or link phase. Run-time optimization based on multi-stage programming is also theoretically possible (and worth doing), but the industry is a ways away from anything more advanced than LLVM-spit. Thinking beyond optimization, while it is cool for Basic Linear Algebra Routine proxies like Jacques Carette's MetaOCaml BLAR, there are more practical layman use cases for it. You can't have truly live and literate programs without breaking past the conventional Edit-Compile-Debug cycle; you need something like EnterStage-EditSubsequence-ErrorCheck-Verify cycling. EnterStage would be the key for assessing the quality of a literate program, as it loads the associated relevant documentation.

My basic opinion is that ad-hoc upgrades (the Edit-Compile-Debug cycle) are harmful to a system's cohesiveness, and that too many programmers use techniques that are not only unnecessary but downright counter-productive. Some of these are forced upon them, others they accept as easy skills to learn, and yet others are simply the result of no social mores to institute better practices. Only very rarely do we see academics pull society along, a good example being Ted Codd's data independence principle to counter ordering, index and access path dependence. Part of the problem is how fragmented research is: go to OOPSLA or ICFP and ask around if anyone knows who Bernard Thalheim is.

For what it is worth, I do plan to pump out some mini-articles on LP future directions, mostly deconstructing problems with current environments. I could link them to you if you want more food for thought.

... and mathematica

liveness of notebooks

Do mathematica notebooks have a fine-grained notion of liveness?

What I am thinking of is that liveness is itself a property, that survives "publication". I see no reason why I have to decide, at "report generation" phase, whether to link live code/data or the static results of code from a previous phase (which may have at one point been live code).

yes, I think

I'm unable to access the wolfram site for links now, but notebooks are indeed live in the sense that their liveness survives publication .. and the author has control over it.


i can believe in use cases for the other way 'round as well, that one might wish to see a specific version rather than only be able to see the latest.


Versioning and publication are really simple to get right, actually.

And you want semantic versioning, not blob versioning.

The hard part is creating the bootstrapping code to make the Big Bang a reality.

I don't have plans to actually allow external users the rights to version documents, though. I'd rather not my servers become Iron Mountain for storing / shredding electronic documents. Instead, it makes more sense to allow users to save published documents by exporting them to a "universal print file format" like XPS. They can then do their own blob versioning.


I'd like to bring up Miller Puckette and David Zicarelli's Max graphical language which has now grown into Max/MSP/Jitter for audio, MIDI and video processing. I was a Max user a long time ago and I loved the "live-ness" of the tool that few systems can compete with today and many (such as Apple's Quartz Composer) try to emulate. Miller Puckette himself continues to swear by the graph model with his open source version of Max called Pure Data.

A program in a Max family language (usually called a "patch") looks like its own documentation that you can play with live. You can run a patch, probe the data flowing and edit the patch as it is running. Text comments further aid the reading of a graph.

On a related topic, I've lamented the absence of higher level abstraction tools in such graphics languages, but it looks like Quartz Composer has made some effort in that direction through "macro patches" that can do things to "sub-patches" such as iterate them.

Another such graphical environment worth looking at is MIT's Scratch.

.. ok now I'm getting off topic :)

that's not off-topic... ..

that's not off-topic...

.. ok now I'm getting off topic :)

That's friggin' awesome! I've actually always searched for an amazing composition tool after reading Notes from the Metalevel.