Using Hackage to Inform Language Design

The idea of mining code repositories to tease out language design issues have been discussed before on LtU. In this paper, J. Garrett Morris looks at the usage of overlapping instances in the Haskell code repository site hackage on deciding whether to include the feature in a new language.

Using Hackage to Inform Language Design
Abstract

Hackage, an online repository of Haskell applications and libraries, provides a hub for programmers to both release code to and use code from the larger Haskell community. We suggest that Hackage can also serve as a valuable resource for language designers: by providing a large collection of code written by different program- mers and in different styles, it allows language designers to see not just how features could be used theoretically, but how they are (and are not) used in practice.We were able to make such a use of Hackage during the design of the class system for a new Haskell-like programming language. In this paper, we sketch our language design problem, and how we used Hackage to help answer it. We describe our methodology in some detail, including both ways that it was and was not effective, and summarize our results.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Use with care

I consider this methodology suspect in general.

I have a package on Hackage. I have a problem in my code which I would like to solve with overlapping instances, but have not done so because I consider the feature broken in its current form.

So the problem with the methodology is that you only see the positive instances - where a programmer was able to solve their problem using the language feature in question - and not the negative instances - where the programmer thought that language feature should be the solution to their problem, but was unable to get it to work.

The point wasn't to decide

The point wasn't to decide how to *extend* the language, but rather to discover to what extent *removing* an extension would hurt common idioms, and what possible (ideally, more tractable) features could replace that extension.

This is sort of like regression testing against hackage, but for language features rather than API changes.

Don't cross the instances

I have a problem in my code which I would like to solve with overlapping instances (...)

...and then you'd have two problems!

Seriously though, if it's not too far off topic, what is it you hoped to accomplish with overlapping instances? Usually that's only necessary for heavy duty type-level metaprogramming, and if you're talking about HaskellForMaths I don't immediately see where it would need that kind of stuff.

Personally, I tend to view overlapping instances as an irredeemably bad idea that is, regrettably, as yet the only way to implement certain useful things in GHC-flavored Haskell. So I'm curious how other people feel about the extension.

What I wanted overlapping instances for ...

... was the following.

An algebra is a vector space which is also a ring (plus some additional conditions). Many algebras arise from a basis which is a monoid. For example, the algebra of commutative polynomials arises from the monoid of commutative monomials; the group algebra arises from a group (which is a monoid). However, not all algebras are of this form, for example the complex numbers as an algebra over the reals (because i^2 = -1, not 1).

So the overlapping instances are:
- That I want to say that whenever the basis is a monoid, we have the monoid algebra. Something like:
instance (Fractional k, Monoid m) => Algebra k m where ...
- But I also want to say, for some specific bases which are not monoids, that they give rise to an algebra. For example:
data CBasis = Re | Im
instance Algebra Double CBasis where ...

It seems to me that in maths, overlapping instances are probably all over the place, just because we tend to define structures at a level of abstraction for which there are many different ways to provide concrete instances.

(Another example that springs immediately to mind is (finite) projective planes. Given any (finite) field k, then PG2(k) is a projective plane, which leads to a family of instances. However, there are also more anomalous non-desarguesian planes, which may require case by case construction of the instances.)

That's a tricky one

Ah, I see. In the general case, though, would overlapping instances even work anyway? It won't help when there's no unambiguous instance. For example, at least four plausible instances of Monoid exist for something like Float.

Something to disambiguate is necessary; the standard libraries tend to use newtype wrappers, which aren't ideal but get the job done. Could you do something similar, e.g.:
instance (Fractional k, Monoid m) => Algebra k (MonoidWrapper m) where ..., perhaps? I'm not sure what would be useful, since I unfortunately don't really know much about the maths in question (despite the fact that just today I was writing something closely related... and probably making a complete hash of it).

The example you gave, though, is the usual sort of "want to select an instance based on class membership of parameters" thing, which is one place where using overlapping instances is particularly grating--since typically, the instances shouldn't actually overlap (e.g., CBasis isn't an instance of Monoid), and if they do it's likely to be a bug. Ugh.

Suspect compared to what?

Suspect compared to what? The alternative method that I know is to drive language design decisions by divine inspiration.

Great paper!

Looks like yet another question from the thread I started awhile back here on LtU is getting addressed, somehow.

See: What data sets would tell you the most about what sort of programming language to design?"

However, with such papers, it is important that the authors understand the limitations of the study. For example, GHC only added support for deriving {Functor, Foldable, Traversable} about 1.5 years ago, and it may have only been this year that the GHC compiler switches for it were standardized. If we look at the factors contributing to why a particular package even exists on hackage and think about whether some factors were even contributable to that package but not others, then we would have more well-defined sensitivity analysis and more information to inform how to data mine Haskell programs to design new languages.