Interesting Standard Libraries to Study

Lately, I've been perusing Plauger's "The Standard C Library" again with interest.

Well, it occurred to me that I'd like to examine the design of other languages' standard libraries that might be particularly noteworthy for any variety of reasons: how the library might make interesting use of language (including module) features, what the libraries might include or exclude in a noteworthy manner, how the library chops up the world in terms of what is abstract and what is concrete, what is built into the language and what is relegated to a library routine, etc.

Needless to say, almost every programming language has some library or another, far too many to profitably examine. So I'm looking for some particularly exemplary samples, and samples of standard libraries, not samples of just nifty languages.

I'll throw in that due to the nature of the exercise, fairly stellar library documentation ranks pretty highly, if not an absolute prerequisite.

Thanks!

-Scott

STL, Haskell 98

For influence, it would be hard to beat Stepanov and Musser's STL.

The Haskell 98 prelude and standard libraries are a good example of idiomatic Squiggol-like and monadic coding.

Good call on STL

STL certainly qualifies as part of a "standard" library, if in part "standard" means "general purpose." It is certainly well documented. I'm sure it's evolved, but I believe I have one of the original Addison Wesley volumes on the early STL, which should be far more pleasing to read than scouring source comments :-)

Is there any interesting, quality expository treatment of the design of the Haskell prelude?

Thanks! -Scott

STL documentation

I find the documentation at the SGI STL website to be an excellent resource. For a better understanding of the principles underlying the STL design, you might be interested in the book Elements of Programming written by Alex and me. (Tom Duff mentioned it on LtU here.)

Prelude considered dubious

I'd exercise some caution regarding the standard Haskell 98 Prelude. At small scales it is, if memory serves, written for idiomatic clarity and simplicity rather than performance, and differs from the (more efficient) code you get by default when using Prelude functions in a program compiled by GHC.

At larger scales it has some less-than-ideal design choices, sometimes as poorly-considered simplifications, sometimes from retaining decisions made for older versions of Haskell or ancestor languages, etc.

In other words, it's nice to read as Haskell, but not as much as a library.

Scala

Scala's standard library is very intertwined with its language. And not just the standard library: most libraries designed specifically for Scala heavily leverage even its most advanced features.

How to find the std Scala library docs?

I have the slightly outdated 2.8.0.RC1. Under doc, there is a "manual" in .pdf and .html form, but it appears to tell me nothing I want to know. Is there an official Scala standard library and a central, organized repository for the library?

Thanks much!

-Scott

Most of the documentation

Most of the documentation that a library user would need is in the Scaladoc (as in Java, embedded in the code and extracted to HTML with a tool). Otherwise, there are smatterings of documentation in the form of articles, e.g., the architecture of the Scala collection API is over on Artima.

And the scaladoc...

...is here.

May not qualify

as "fairly stellar library documentation"

Agreed

Any documentation system written in 2010 that doesn't have some mechanism for allowing people to post examples of how to use the library is passé. The Scala documentation is typical programmer documentation: reams of what something does, not how to use it. At least Microsoft put effort into LINQ and provided documentation on how to use the DSEL.

Also, Alex Buckley ripped one of the Scaladoc developers on their IDE support at PLATEAU 2010, and said he filed two bugs before the poor guy's presentation. The gist of Alex's complaint was that Scala was calling their system "modern" yet it still had the same IDE integration issues as Javadoc. From the perspective of most real world developers, basing a design on how people actually check and read documentation (e.g., in the IDE and not in a browser) is important. Scala does allow some documentation to be viewed in the IDE, but it does not guarantee that the version of the documentation matches the version of the library being used.

I don't think documentation

I don't think documentation is included in the library zip; maybe it should be (otherwise another solution is needed to avoid version drift).

I always thought that the IDE is the place to view documentation; it just doesn't make sense to view it anywhere else. Not only that, but you should be able to easily access examples, find stuff, and ask questions, all from your IDE. Also, your users are the best source of documentation for your library, so adding and editing docs should be as easy as editing a wiki.

That sounds like Collaborative Scaladoc

Any documentation system written in 2010 that doesn't have some mechanism for allowing people to post examples of how to use the library is passé.

Then this is what you want: http://code.google.com/p/collaborative-scaladoc/

About IDE-based vs. browser-based views of API documentation. That's actually two different (albeit related) topics.
From what I've read, the browser-based paradigm is not the main theme of the paper [1] presented at PLATEAU (rather, it is the cognitive aspects, i.e., how developers benefit from precise, navigable type info displayed in a readable manner).

[1] http://infoscience.epfl.ch/record/151520

Main themes aren't always the most profitable

The author who presented at PLATEAU had a broader goal of simply making developers' lives better. Part of his goal in attending the PLATEAU workshop was to get feedback and ideas to improve Scala's documentation and make jumping into Scala easier. He was proud of the design of Scala and wanted more people to use the language and enjoy a development environment as good as the language. While the paper was indeed very focused, he did cover a lot of ground in the Q&A session afterward in terms of fielding ideas from the audience and also discussing stuff he'd been looking at.

Buckley's comments were not some ad-hoc attack on the author's work, by the way. They only came after somebody asked why the presentation only showed documentation in a web browser.

I have to say, though, that islands of documentation systems are the biggest problem today. CMU has developed dozens of good ideas for documentation systems but there is no systematic, modular way to glue these things together (both from the library writer's perspective of the documentation, and the reader's perspective of peeling away layers of detail and creating a lightweight doc that fits their reading/browsing style). These islands develop primarily because you need novelty to get published, and nobody funds "integration of concerns" as a genuine human factors concern. It's B.S., but that is how things are.

D

Because it's designed as a fresh start by die-hard C++ers. E.g. std.algorithm, std.range.

Excellent suggestion

D is another "systems programming" language, so IIRC from casual inspection, it doesn't attempt to just absorb the entire computing universe in a "standard" library like JVM and CLR based languages (not to mention "scripting" languages).

I recall reading a paper on range-based vs. iterator-based collection and algorithm libraries. I don't recall if this paper directly referenced D, but D may present an interesting comparison with C++'s STL, not to mention comparing D's take on other "typical" standard libraries with ANSI C's.

In this day and age of huge so-called "standard" libraries (we need a new name for these Stonehenge scale monoliths), I'm as interested in what is left out of the language and its library standard as in what is included.

Thanks! -Scott

re: we need a new name for these Stonehenge scale monoliths

The standard monolith.

"Iterators must go"

Iterators Must Go, by Andrei Alexandrescu, who's one of the designers of D's stdlib.

Anecdotally, I've used ranges modelled after D's in one of my hobby languages - with nice results. As long as you stick to forward ranges, they're basically equivalent to plain old iterators, so there are no surprises, and the code for library functions like map is the same. Then you can add bidirectional ranges and other stuff on a per-collection basis as desired.
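To make the "forward ranges are basically iterators" point concrete, here is a minimal sketch in Python. The three operations mirror D's `empty`, `front` and `popFront` primitives (spelled `pop_front` here); the class names `ListRange` and `MapRange` and the helper `to_list` are made up for illustration and are not taken from D's library or from the hobby language mentioned above.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class ListRange(Generic[T]):
    """Forward range over a list: a window that only shrinks from the front."""

    def __init__(self, items: List[T], pos: int = 0) -> None:
        self._items = items
        self._pos = pos

    def empty(self) -> bool:
        return self._pos >= len(self._items)

    def front(self) -> T:
        return self._items[self._pos]

    def pop_front(self) -> None:
        self._pos += 1


class MapRange(Generic[T, U]):
    """Lazy map. It is itself a forward range wrapping another forward range."""

    def __init__(self, fn: Callable[[T], U], source) -> None:
        self._fn = fn
        self._source = source

    def empty(self) -> bool:
        return self._source.empty()

    def front(self) -> U:
        # The function is applied on demand, only to the current element.
        return self._fn(self._source.front())

    def pop_front(self) -> None:
        self._source.pop_front()


def to_list(r) -> list:
    """Drain any forward range into a Python list."""
    out = []
    while not r.empty():
        out.append(r.front())
        r.pop_front()
    return out
```

Since `MapRange` only relies on the three forward-range operations, it works unchanged over any collection that provides them; a bidirectional range would add operations at the back without touching this code.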

Libraries

I'll give a broad overview and go into detail as I feel necessary.

  • Finite Element Methods; there is one based on object-oriented principles but I am unaware of a FEM library using a more functional approach e.g. in Haskell
  • Basic Linear Algebra Subprograms (many flavors of this in many languages, but BLAS is the basic banner acronym)
  • URI-related, correctly implementing the RFC spec for URIs while allowing good composition
  • Database management, ranging from a superset of "migrations" API to "query management" API (aka ORM). Examples include Ruby on Rails Migrations and SQL Server Management Objects (SMO), ApexSQL's API, Entity Framework (ORM), Hibernate/NHibernate (ORM).
  • Enterprise scheduling software
  • System management APIs, e.g. Puppet and cfengine
  • State machine definition and generation, in particular for embedded systems; examples include Boost::FSM, Boost::StateChart, Ragel State Machine Compiler, Samek's Quantum Framework, etc.
  • Parsing; parser generators and parser combinators

There are more I am interested in, but that's a good starter. Scott, if you have any feedback, it would be appreciated.

Smalltalk

The perennial example of a strong standard library would be Smalltalk 80. The library is covered in depth in Goldberg et al's Smalltalk 80 manual.

Another ancient tome already on the shelf

Good old "Smalltalk the Language" by Goldberg. I'll certainly take another peek. IIRC, over the years when I've peeked at Squeak and a few commercial STs in the early '90s, the class browsers seemed to indicate some kind of additional package-style scheme that I don't recall in the original ST-80.

Thanks! -Scott

Squeak is in some ways superior

In particular, the System.Editor namespace (or whatever it is that governs atomic edits) and the Monticello package for handling versioning. This approach to versioning is fairly unique to Squeak, and there are good discussions on the Squeak mailing list about this feature. It is one of those libraries that, as an outsider, you probably couldn't understand, or would simply discount, at first glance.

Pharo by Example 2, draft chapter on Monticello

I just saw that the Pharo By Example 2 book home page has a draft chapter about Monticello that you might like reading.

available for free

These Smalltalk and Pharo books are available for free at http://stephane.ducasse.free.fr/FreeBooks.html

Goo

Goo is another nice example. It basically takes everything known about classic Lisp, and puts it into a 19-page manual. (Goo, Dylan, EuLisp, ISLISP... are basically the same language: the dynamic language that would be king ;) )

I recall eagerly reading the

I recall eagerly reading the EuLisp standard a few decades ago. Will definitely look at Goo. A concise 19-pages is just too enticing to pass up.

Thanks! -Scott

The Art of the Metaobject Protocol

If only reflection APIs were as useful as this one.

You mean reflection APIs like...

...this one?

Mirrors: Design Principles for Meta-level Facilities of Object-Oriented Programming Languages. Phil Wadler wrote on his blog about the design: "One of the best OO design papers I have read. I'm convinced, mirrors are the way to reflect; I'm particularly struck by their importance for capability-based security."

In addition, purely declarative logic reflection has been baked directly into programming languages, such as Maude and Archon, that have a more mathematically rigorous definition for how they work.

Here are some basic issues, that as I recall Clavel pointed out in his thesis as well:

1) ad-hoc set of primitives; how will we ever know when we have defined the perfect metaobject protocol for intercession, introspection, etc. concepts? Having used CLOS, I can tell you that there are times when it just doesn't quite do what you want or have the right features, and so you have to bend your model to CLOS rather than use CLOS to naturally describe your model. The essential point captured by Clavel is, How will we know when it is time to stop?
2) different software needs different constraints on reflection; how will we guarantee correctness when something like CLOS allows us to do whatever we want? In a pure logic environment, we can reason statically about the equations and say definitively that we're abusing such and such. CLOS doesn't even guarantee portability across Common Lisp implementations.

A separate issue is that some of the features in CLOS mix code reuse and other engineering techniques. It would be nice if we could tackle reuse separately, such as in Assmann's Invasive Software Composition framework...

Finally, you might want to look at Marcus Denker and Eric Tanter's work on Squeak and Pharo.

I'm not sure the CL MOP is a "standard library"

In general, I'd readily classify Common Lisp as a "negative example" of a "standard library" well endowed language. Not that I don't have much familiarity with and affection for CL!

I'm seeking language definitions with some notion of a minimal "core language" (this may be highly variable) and then relegation of a high degree of "required, pragmatic functionality" to one or more "standard libraries."

How about a negative example

Admittedly, I have not looked at it in several years, but when I started using O'Caml, the standard library was rather thin, and often implemented very naively. For example, map over lists (or something similar) was not written tail-recursively. As a result, it would blow the stack on lists of a few thousand elements.
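The failure mode is easy to reproduce in any language with a bounded call stack. Below is a rough Python sketch, not the O'Caml library code: lists are modeled as nested `(head, tail)` pairs with `None` as the empty list, and all function names are invented for the example. Python does not eliminate tail calls, so the accumulator-passing version is written as a loop, which is what a tail-recursive definition compiles down to in O'Caml.

```python
def from_list(xs):
    """Build a cons-style list: (head, tail) pairs ending in None."""
    cell = None
    for x in reversed(xs):
        cell = (x, cell)
    return cell


def to_list(cell):
    """Flatten a cons-style list back into a Python list."""
    out = []
    while cell is not None:
        out.append(cell[0])
        cell = cell[1]
    return out


def map_naive(fn, cell):
    """Textbook map. The pair is built after the recursive call returns,
    so every element costs one stack frame."""
    if cell is None:
        return None
    head, tail = cell
    return (fn(head), map_naive(fn, tail))


def map_acc(fn, cell):
    """Accumulator-passing map: collect results in reverse, then reverse.
    Stack use is constant no matter how long the list is."""
    acc = None
    while cell is not None:
        head, cell = cell
        acc = (fn(head), acc)
    out = None
    while acc is not None:
        head, acc = acc
        out = (head, out)
    return out
```

With CPython's default recursion limit, `map_naive` raises `RecursionError` on a list of a few thousand elements, while `map_acc` handles the same input; the price is a second pass to reverse the accumulated result.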

Right.

That's at least one way I define "interesting".

For example, I think the SMO (SQL Server Management Objects) API is awful. It really only supports a limited set of use cases, due to the fact that cloning breaks invariants. A more functional, stateless API would make more sense for manipulating SQL Server configuration.
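As a purely hypothetical sketch of what "functional, stateless" could mean here (this is not SMO's API, and the `DatabaseConfig` type and its fields are invented for illustration): configuration is an immutable value, every "change" produces a new value, and the invariants are re-checked on each one, so copying can never yield a half-valid object.

```python
from dataclasses import dataclass, replace

# Illustrative set of allowed values; an assumption for this sketch.
RECOVERY_MODELS = ("FULL", "SIMPLE", "BULK_LOGGED")


@dataclass(frozen=True)
class DatabaseConfig:
    """Immutable description of a database; hypothetical, not an SMO type."""

    name: str
    recovery_model: str = "FULL"
    max_size_mb: int = 1024

    def __post_init__(self) -> None:
        # Invariants are checked on every construction, including copies
        # made with dataclasses.replace, which goes through __init__.
        if not self.name:
            raise ValueError("name must be non-empty")
        if self.recovery_model not in RECOVERY_MODELS:
            raise ValueError(f"unknown recovery model: {self.recovery_model}")
        if self.max_size_mb <= 0:
            raise ValueError("max_size_mb must be positive")


def derive_test_copy(config: DatabaseConfig) -> DatabaseConfig:
    """'Clone' a config for testing without touching the original."""
    return replace(config, name=config.name + "_test", recovery_model="SIMPLE")
```

A separate step would then diff two such values and emit whatever commands bring a server from one to the other; that part is left out of the sketch.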

Similarly, ApexSQL isn't very modular and it uses very poor control partitioning. Furthermore, there is no hook in the API into using the front-end GUI to visualize the differences in the diff'ing API.

And when I mentioned object-oriented finite element method libraries, I was suggesting something one could Google and read about to learn about the problem domain and solutions proposed.

A nought example

After a negative example has been given, how about a nought example?
J does not seem to have any need for a standard library, because the language itself abounds in operators. They do have the so called “phrases”, written in J, but this is rather a collection of (supposedly) good solutions to a random set of problems than a standard library, and is not a part of the language definition (which only consists of a Dictionary).
So, if there is anything in this example related to your question, it is perhaps asking another one, such as “When and why is a standard library not needed in a language?”.

What is a standard library?

I think it's important to ask ourselves what should be a language's "standard library"? For example, imho, to include some "stdxml" among a language's "standard" library will usually display (not always) some weakness in the language definition (compared to "C" or Java, just to be prosaic).

By comparison, to use a hypothetical Lisp as an example, I can imagine a library into which "with-open-file" and "format" might be stuffed. Or some ML'ish language might have a "vector" library into which "vec-map" and "vec-foldr" might be stuffed. Or perhaps there would be some higher order abstraction - who knows? That's the fun part!
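As a small illustration of pushing such a facility into a library, here is a sketch in Python of a `with_open_file` routine written as an ordinary higher-order function. The name is borrowed from the Lisp form mentioned above, but the signature is an assumption made for this example; in Common Lisp the real thing is a macro, whereas here a function that takes the body as a callback plays that role.

```python
from typing import IO, Callable, TypeVar

R = TypeVar("R")


def with_open_file(path: str, mode: str, body: Callable[[IO], R]) -> R:
    """Open `path`, hand the handle to `body`, and close the handle
    afterwards, even if `body` raises."""
    handle = open(path, mode)
    try:
        return body(handle)
    finally:
        handle.close()


def read_all(path: str) -> str:
    """Example client: the caller never sees open() or close()."""
    return with_open_file(path, "r", lambda f: f.read())
```

All the core language has to supply is first-class functions and some form of `try`/`finally` (or `unwind-protect`); the resource-management idiom itself lives entirely in the library.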

I suppose one obvious and major question is: what language features _can_ be expressed and implemented in a library as opposed to the "core" language? And does it ultimately matter to programmers, aside from language mavens? And if so, then why?

-S.

Philosophical differences abound

Perhaps trite and obvious in a place like LtU, but it's the first thing that popped into my head upon reading your question: the distinction between "standard library functionality" and "core language feature" is going to depend heavily on the philosophy of the language designer, specifically in three areas:

  • Fundamental power of the language primitives
  • Orthogonality of language concepts
  • Permission for end-programmers to extend the language concepts

The primitives themselves will control how much composition we can perform to build new concepts; consider Lisp as a great example of primitives with unlimited potential for expansion. Yes, you have to learn to "think in lists" to do it, but it's there; and the uniformity of the primitives (everything's a list already) makes this incredibly powerful. C is in a similar boat with pointers; you can construct all manner of data structures and algorithms using pointers and primitive types, albeit without much of a safety net in many cases.

C is also obviously a little bit different from Lisp in that it doesn't have uniformity across its primitives; there are values, and then pointers to values; this touches on orthogonality. The more orthogonal your primitives are, generally speaking, the fewer you need; and the fewer you have, the more powerful they tend to be. Tcl is another great example of this philosophy in action, in my opinion. Again, orthogonality helps immensely in terms of what type of composition we can do with the built-in language concepts.

Lastly, some languages (Java, for instance) expressly don't want the end-programmer fooling around with adding new concepts to the language. (Sometimes they don't want anyone adding concepts to the language, end-programmer or otherwise; insert obvious joke at Java's expense regarding closures.)

I think these three elements comprise the key factors that affect the library/core-language distinction in most cases. If you can compose the core elements of the language richly, why add the composed concepts directly to the core? The library is the sensible spot to put them. If you can't, on the other hand, you have little choice; if you can't express a concept in the language itself using a composition of its primitive concepts, you're stuck building it into the core language itself.

Of course there may be corner cases where a standard-library concept becomes ubiquitous (or has unique requirements) such that it makes sense for optimization reasons to stuff it into the language core. I think that's something that historically happened for strings a long time ago, for instance (languages without true string primitives notwithstanding).

A standard library

A standard library, as I understand it, is a library intended to be common and ubiquitous enough that language implementors might as well place it in the language standard - possibly in order to offer dedicated optimizations or integrate syntactic sugar for it. For example, natural numbers are modeled in the standard library for some languages (Maude, Charity) via Peano arithmetic - but they receive syntactic sugar (you can write '3' in source rather than 's s s z') and, under the hood, they are represented using machine integers. Lists, list comprehensions, and monads have received such dedicated attention in Haskell. We can use [1, 2, 3] instead of 1:2:3:[]. Strings similarly receive dedicated attention in many languages that model them in libraries.
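To illustrate the split between the library-level model and the sugar, here is a rough Python sketch. It is not how Maude or Charity implement naturals; `Z`, `S`, `from_literal`, `to_int` and `add` are names made up for the example. The library defines naturals as zero and successor, and the "sugar" is just a translation from a decimal literal into that structure.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Z:
    """Zero."""


@dataclass(frozen=True)
class S:
    """Successor of another natural."""

    pred: "Nat"


Nat = Union[Z, S]


def from_literal(n: int) -> Nat:
    """What literal sugar would expand to: 3 becomes S(S(S(Z())))."""
    if n < 0:
        raise ValueError("naturals are non-negative")
    result: Nat = Z()
    for _ in range(n):
        result = S(result)
    return result


def to_int(n: Nat) -> int:
    """Collapse the Peano structure to a machine integer."""
    count = 0
    while isinstance(n, S):
        count += 1
        n = n.pred
    return count


def add(a: Nat, b: Nat) -> Nat:
    """Library-level addition: add(Z, b) = b and add(S(a), b) = add(a, S(b)),
    written as a loop."""
    while isinstance(a, S):
        a, b = a.pred, S(b)
    return b
```

An implementation that knows about this library can keep the `Z`/`S` view as the semantics while storing a machine integer underneath, which is the kind of dedicated optimization described above.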

In my own designs, I've given some thought towards representing the standard list under-the-hood in terms of finger-tree ropes, despite modeling it in terms of the common recursive structure (head:tail | nil). This would force developers to use standard foldl, foldr, sort, and map functions for ideal performance, but would also allow the list to serve as the data structure for linear data - as a deque, for large strings, et cetera.

For example, imho, to

For example, imho, to include some "stdxml" among a language's "standard" library will usually display (not always) some weakness in the language definition (compared to "C" or Java, just to be prosaic).

I don't understand why including an XML processing API shows a weakness in the language definition. Many Lisp shops I know of have custom, hand-rolled XML processing APIs not written by library authors but simply hackers... and they suffer for it. Reams of code, and you can't just hire a random J. Lisp Hacker off the street who knows your XML API or whatever.

The complexity adds up.

XML should be a standard API. Interestingly, the Scala design team was split on whether Scala should support XML. The dude who did the API as a grad student got ripped by half the team, kind of made fun of (direct comments from him, made publicly [edit: see, for example, the Acknowledgements section of Burak Emir's scala xml manual]). But he was dead on accurate in the need for the feature and API.

There are also a lot of different ways to process XML, some more cumbersome than others.

Moreover, you're missing a very basic idea. Communication benefits from a standard encoding, so that every language can speak a common encoding and then just work from there with the actual spec. Sure, that encoding might not be ideal (e.g., compressibility) in all circumstances, but it doesn't have to be. Human time is more important. And you really only need one default standard encoding, and if you really have a bottleneck (e.g., inflation or deflation), you can develop custom ones.

Scala's XML support was rare

Scala's XML support was rare as it went straight into the language with its compiler module, so it needed to be in the core library otherwise the compiler would break.

Standardization is a worthy and yet ultimately futile goal. There are many ways of doing the same thing (e.g., parse/output XML), each with its own tradeoffs: XML-like syntax or something more concise? More safety or quicker hacking? Performance? ....

Better to make every library for a language a part of its standard library, easy enough to access without any intermediate steps. I've proposed this before, and I'm getting closer to talking about something more concrete :)

You might have misunderstood

You might have misunderstood what I meant by "standard."

I haven't chimed in too much on the philosophical stuff, but by "standard library" I roughly mean a group of modules (packages, compilation units, etc.) that formally accompany the language definition.

My own interests do tend a) toward the plural, hence my above critique of CL's "the library" approach; b) toward the minimal, relying on both language expressive power and compiler prowess to author high performing and highly functional 3rd party libraries (these may enjoy status for some time in some communities as de facto standards) for oodles of unanticipated future problem domains; and c) toward the separable and optional, for example not having to include "stdio.h" in some embedded application.

Of course, I imagine there are classes of programming languages where these criteria don't apply well.

I understand

And I realize some systems don't have "standard out" and "standard in", and even fewer might have "console in" and "console out".

I just think there are standard semantics, standard syntax, and standard pragmatics, and the ideal language does all three.

I think if you're interested in this approach, then you should dive deep into Maude and Scheme. I know you seem familiar with Scheme already. Maude is especially interesting, since it is designed in layers, e.g., OO-Maude builds on Core-Maude. David's comments are also an example of how interesting Maude can be.