How can middle school algebra help with domain specific languages?

As a programmer with no training in type theory or abstract math (beyond what I've learned on LtU), I'm curious to know if the laws of basic, middle school, algebra can help design or understand better domain specific languages.

Recently I saw a presentation on how algebraic data types don't just carry the name 'algebra' but actually support many of the same operations as simple numbers. This will be obvious to people on this forum, but such ideas are very surprising to programmers at large. The linked YouTube presentation describes not just products and sums but also x-to-the-power-of-y style operations, and then shows how one can even take derivatives of type expressions, which yield very useful objects such as zippers.
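
As a concrete illustration (my own sketch, not taken from the talk), here is the classic list zipper in Haskell. The pair of lists around the focus is exactly the "one-hole context" that the derivative of the list type describes:

  -- A zipper for lists: the elements before the focus (stored reversed),
  -- the focused element, and the elements after it.
  data ListZipper a = ListZipper [a] a [a]

  -- Move the focus one step left or right, if possible.
  left, right :: ListZipper a -> Maybe (ListZipper a)
  left  (ListZipper (l:ls) x rs) = Just (ListZipper ls l (x:rs))
  left  _                        = Nothing
  right (ListZipper ls x (r:rs)) = Just (ListZipper (x:ls) r rs)
  right _                        = Nothing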

There have been interesting articles[pdf] on derivatives of regular expressions. Again, Kleene algebra seems to define basic operations (+, *, etc.), although the derivative part is beyond me.
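
For anyone curious, the derivative of a regular expression (Brzozowski's construction) is short enough to sketch directly in Haskell. This is a rough, minimal version of the standard definition, not code from the linked article:

  data Regex
    = Empty             -- matches no strings at all
    | Epsilon           -- matches only the empty string
    | Lit Char
    | Alt Regex Regex   -- the "+" of Kleene algebra
    | Seq Regex Regex   -- concatenation
    | Star Regex

  -- Does the expression accept the empty string?
  nullable :: Regex -> Bool
  nullable Empty     = False
  nullable Epsilon   = True
  nullable (Lit _)   = False
  nullable (Alt r s) = nullable r || nullable s
  nullable (Seq r s) = nullable r && nullable s
  nullable (Star _)  = True

  -- Brzozowski derivative: the language of strings w such that c:w is in r.
  deriv :: Char -> Regex -> Regex
  deriv _ Empty     = Empty
  deriv _ Epsilon   = Empty
  deriv c (Lit d)   = if c == d then Epsilon else Empty
  deriv c (Alt r s) = Alt (deriv c r) (deriv c s)
  deriv c (Seq r s)
    | nullable r    = Alt (Seq (deriv c r) s) (deriv c s)
    | otherwise     = Seq (deriv c r) s
  deriv c (Star r)  = Seq (deriv c r) (Star r)

  -- Matching is just repeated differentiation.
  matches :: Regex -> String -> Bool
  matches r = nullable . foldl (flip deriv) r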

Modern database systems are based on relational algebra, and this algebra is beautifully simple. Although I haven't seen it described specifically using basic arithmetic operators, it does contain Cartesian product, sums, and even division.

So many of these very practical tools that programmers use every day are based on what looks similar to school algebra (they even contain the word 'algebra' in their names). Where can I learn more about how to design DSLs using algebraic operators? My assumption is that by starting with a domain and being forced to come up with interpretations of +, -, *, /, etc. (interpretations which follow specific rules), I'll end up with a pretty minimal yet expressive language.

Finally, I actually started thinking about this after implementing Peyton-Jones & Eber's "Composing Contracts." That language describes a couple of basic types, Contracts and Observables, and then operations on these types: And, Or, Give, Scale, which map pretty easily to +, *, negate, etc. Once again, a beautifully simple language can describe a complex set of financial instruments. What's more, the constructs of this language (like so many others) seem very closely related to school algebra! What might it mean to take the derivative of a contract? What might it mean to take its square or square root?
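
For readers who haven't seen the paper, here is a very rough sketch of the core idea in Haskell. The constructor names follow the paper's combinators loosely, and the use of a plain Double for scaling (rather than an observable) is a simplification of mine:

  data Currency = USD | EUR deriving Show

  data Contract
    = Zero                      -- the "0": no rights, no obligations
    | One Currency              -- receive one unit of a currency now
    | Give Contract             -- negation: swap rights and obligations
    | And Contract Contract     -- "+": acquire both contracts
    | Or Contract Contract      -- acquire the more valuable of the two
    | Scale Double Contract     -- scale all payments by a constant
    deriving Show

  -- "c1 `And` Give c2" then behaves like subtraction, c1 - c2.
  minus :: Contract -> Contract -> Contract
  minus c1 c2 = And c1 (Give c2)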


Abstract Algebra

What you’re discovering is the field of abstract algebra, which lets us talk about algebraic structures in terms of the operations they support. If you prove some fact using the operations of a particular structure, then you know it is true for any instance of that structure.

For example, a monoid is a very simple algebraic structure. It consists of a set S and a binary operation • such that these statements are true:

  • For all A, B, and C in S, (A • B) • C = A • (B • C).
  • There is an element of S called 1, such that for any A in S, A • 1 = 1 • A = A.

And it turns out that lots of things have the structure of a monoid:

  • Strings, with string concatenation, where the identity is the empty string.
  • Lists, with list concatenation, where the identity is the empty list.
  • Sets, with set union, where the identity is the empty set.
  • Pairs whose elements are monoids, where (A, B) • (C, D) = (A • C, B • D) and the identity is (1, 1).

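To make this concrete, here is a minimal sketch of the monoid structure as a Haskell type class, together with a couple of the instances above (the names are chosen to avoid clashing with the standard library's Monoid, and the laws live only in the comments; nothing enforces them):

  class MyMonoid a where
    unit  :: a             -- the identity element, the "1" above
    (<+>) :: a -> a -> a   -- the associative operation "•"

  -- Lists (and therefore strings) under concatenation.
  instance MyMonoid [c] where
    unit  = []
    (<+>) = (++)

  -- Pairs of monoids, combined componentwise.
  instance (MyMonoid a, MyMonoid b) => MyMonoid (a, b) where
    unit              = (unit, unit)
    (a, b) <+> (c, d) = (a <+> c, b <+> d)
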
There are many other, more powerful algebraic structures, and they are regularly used to construct DSLs. The example you cited is a ring, which has addition, multiplication, and negation—with several properties about how these operations relate to one another.
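
A ring can be sketched the same way. This is just an illustrative class of my own (Haskell's standard Num class plays a roughly similar role), with the ring laws left as documentation:

  class Ring a where
    zero, one :: a
    add, mul  :: a -> a -> a
    neg       :: a -> a

  -- The integers form a ring in the obvious way.
  instance Ring Integer where
    zero = 0
    one  = 1
    add  = (+)
    mul  = (*)
    neg  = negate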

In Haskell, people regularly design DSLs using monads, which have two operations: bind (for composing operations together) and unit (for constructing operations from values). And as it turns out, a monad is a kind of monoid!
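
As a tiny illustration of the monadic style (my own toy example, not a full DSL): bind corresponds to Haskell's >>= and unit to return/pure, and do-notation is sugar for bind.

  -- A calculator that may fail, built on the Maybe monad.
  safeDiv :: Double -> Double -> Maybe Double
  safeDiv _ 0 = Nothing
  safeDiv x y = Just (x / y)

  -- Composing steps with bind: the whole computation fails if any step does.
  average :: Double -> Double -> Double -> Maybe Double
  average a b c = do
    s <- pure (a + b + c)   -- unit: inject a plain value into the monad
    safeDiv s 3             -- a possibly-failing step, sequenced by bind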

is abstract algebra used to design a dsl?

Thanks for your detailed answer, Jon. I've heard of rings, monoids, groups, etc., but was never motivated to learn greater and greater rigor in describing integers :) As usual, what's considered abstract in math may turn out to be eminently practical in computer science.

I get how, given a language, it can be described by various structures from abstract math. How about the other way around? Say I need to create a domain-specific language: is it practically helpful to see if it fits a monoid or a ring or a monad?

Is it common practice (or should it be) to start with a domain (a date library, an animation language, a financial contract language, etc.) and see if it makes sense to implement it as one structure or another? Would the design become more consistent, expressive or elegant if one started to think of a DSL in such terms?

The few documents I've read about DSL design talk a great deal about parsing languages or embedding them in existing languages. Unfortunately, I haven't found anything that provides a more systematic way of designing a good DSL.

my subjective $0.02 :-)

q1: a1: yes.
q2: a2: no, yes.
q3: a3: probably yes, but one can still screw it up in other ways i'm sure.

Direct-sum monoid

Pairs whose elements are monoids, where (A, B) • (C, D) = (A • C, B • D) and the identity is (1, 1).

Of course, technically you mean that the components of the pair are elements of (fixed) monoids, not the monoids themselves.

Too much overloading!

I meant “pairs whose elements are members of sets which form monoids with some other operations and identities, which I here refer to using a convenient abuse of overloaded notation because I am a lazy functional programmer”.

Consistency, Orthogonality and Simplicity

For me the idea is that mathematicians have had a very long time to refine their concepts, so those concepts are broadly useful and generally applicable. Mathematical operators should have a universal meaning; so, for example, using '+' for integer or int_32 addition makes sense, however using it for string concatenation is not a good idea (because it's not commutative).

So you have the objects on which the DSL is to operate and the functions you wish to perform. Sometimes there are alternative, equivalent minimal sets of functions. I think it is better to use the familiar algebraic concepts of addition, multiplication, etc. (and their associated operators) rather than making up new function definitions. However, this should only be done if the objects obey the requirements for the operators. Overloading '+' with something non-commutative would be worse than just having a function called 'concatenate(strings...)'.

Another point is that just because you can 'add' something doesn't mean it makes sense to multiply or square-root it. This is why there is a hierarchy of algebraic structures: not every monoid is a group, and not every group is a ring. It's more about discovering whether your domain of interest is a ring. If it is, then use that ring's operators correctly. If it is not, do not use those operators at all.
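
One way to keep that discipline in a Haskell DSL (a sketch of my own, with illustrative names): give the merely-associative operation and the commutative one different classes, so '+' is only available where its laws actually hold.

  -- An associative operation with no commutativity promise.
  class Combinable a where
    (<.>) :: a -> a -> a

  -- Commutativity and an identity are promised (in documentation only).
  class Combinable a => CommutativeMonoid a where
    zero' :: a
    (+.)  :: a -> a -> a
    (+.)  = (<.>)          -- law: x +. y == y +. x

  -- Strings get the associative operator, but never '+.'.
  instance Combinable [c] where
    (<.>) = (++)

  newtype Total = Total Integer deriving Show

  instance Combinable Total where
    Total x <.> Total y = Total (x + y)

  instance CommutativeMonoid Total where
    zero' = Total 0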

string concatenation?

however using it for string concatenation is not a good idea (because it's not commutative)

Hmm. Do you prefer multiplicative monoids to additive ones? (Or do you prefer that multiplication be commutative, too? :-) I've always thought + a very natural choice for string concatenation, because conceptually "adding stuff" is what it's doing.

It's the definition of "adding stuff" that's at issue.

To me, "adding stuff" implies, among other things, an operation where the order of the operands doesn't matter. String concatenation isn't addition because "alpha" + "beta" != "beta" + "alpha".

The issue is that if I'm looking at an expression with variable names rather than literals, and thinking about what it means and how to express it more simply, the + operator implies a whole set of transformations that simply aren't valid on concatenation, and I have to stop and think about what type these things are before realizing I'm wrong.

That said, I'd have no objection to there being an infix operator for concatenation: it's a very useful operation and a concise way to express it would be great. It's just that that operator can't be + because + means a category of operations which simply does not include concatenation.

Algorithms

The question is, are all algorithms defined on '+' valid for the operation? I can't think of many algorithms defined only in terms of + apart from summation (which comes back to commutativity).

If we have something that implements a few more operators, then the requirement becomes more interesting.

In terms of commutativity of multiplication: if there are significant algorithms that depend on commutativity, then there should be a separate operator for non-commutative multiplication. The decision for me would be based on looking at which algorithms you would lose.
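
To make the "which algorithms would you lose" point concrete, here is a small Haskell sketch (my own illustration). A left-to-right fold works for any monoid, but a reduction that combines chunks out of order, as a parallel implementation might, only agrees with it when the operation is commutative.

  import Data.List (foldl')

  -- Order-preserving reduction: any monoid will do.
  reduceOrdered :: Monoid m => [m] -> m
  reduceOrdered = foldl' (<>) mempty

  -- Combines the two halves in the "wrong" order, as a parallel reduction
  -- might; it only matches reduceOrdered when (<>) is commutative.
  reduceUnordered :: Monoid m => [m] -> m
  reduceUnordered xs =
    let (front, back) = splitAt (length xs `div` 2) xs
    in  reduceOrdered back <> reduceOrdered front

  -- With the Sum monoid from Data.Monoid the two agree; with strings
  -- (concatenation) they generally do not:
  --   reduceOrdered   ["ab","cd"]  ==> "abcd"
  --   reduceUnordered ["ab","cd"]  ==> "cdab"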

why is type 'pair' a product type?

Thanks, Keean. Perhaps you can help me understand just what it means to '+' something.

In type theory, addition is described as a union, None | Some(...), where the value is of type None OR Some(...). A product type is a pair (A, B). My gut reaction would have been to make a pair represent '+', just because '+' is something I think of as combining two things, and the object which has been 'added' contains BOTH of the operands, not either one.

I believe 'product type' refers to the Cartesian product, as I've seen in relational algebra. However, it doesn't make intuitive sense to me why a pair type isn't more naturally a '+' rather than a '*'.

Intuition for product types

You can think of types as (naïve) sets of possible values. Call the number of such values the magnitude. What you are “summing” with a sum type is those sets—in other words, Either a b is the disjoint union of a and b, and magnitude(Either a b) = magnitude(a) + magnitude(b). Similarly, what you are “multiplying” with a product type is the sets—Pair a b is the Cartesian product (as you mentioned) of a and b, and magnitude(Pair a b) = magnitude(a) × magnitude(b).
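
A quick way to see this in Haskell (my own toy example): Bool has 2 values and Ordering has 3, and enumerating the sum and product types gives exactly 2 + 3 and 2 × 3 values.

  -- All 2 + 3 = 5 values of the sum type Either Bool Ordering.
  sumValues :: [Either Bool Ordering]
  sumValues = map Left [False, True] ++ map Right [LT, EQ, GT]

  -- All 2 * 3 = 6 values of the product type (Bool, Ordering).
  productValues :: [(Bool, Ordering)]
  productValues = [ (b, o) | b <- [False, True], o <- [LT, EQ, GT] ]

  -- length sumValues     ==> 5
  -- length productValues ==> 6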

It’s also not intuitive because in natural language we can say “and” to mean several things, including sums, concatenation, and pairing, but for products we say “times”—i.e., not “with”-ness, but repetition.

There is something to be said for the intuition of union and intersection types rather than sums and products—magnitude(Union a b) = magnitude(a ∪ b) and Union a a = a, i.e., union is idempotent; magnitude(Intersection a b) = magnitude(a ∩ b) and Intersection a a = a as well. These are particularly useful when subtyping is involved, because they correspond to “meet” and “join” in the interface hierarchy.

in natural language we can say “and” to mean several things

Natural language is not necessarily being imprecise; rather, using "and" for sums can refer to the sum of exponents, which is a product, i.e.
"3 pebbles and 2 pebbles = 5 pebbles"
means
pebble^3 × pebble^2 = pebble^5

or, in PL jargon, "a pair of a 3-tuple of pebbles and a pair of pebbles can be transformed into a 5-tuple of pebbles".

Of course, since sums and products are dual, and indirection is common in everyday language, people can and do on occasion refer to one indirectly by way of the other.

Products and sums of their cardinalities

If you think about types that are finite sets of values, then product and sum types have a cardinality that is the product or sum of the original cardinalities. For instance, the product of two 3-valued types is a 9-valued type, and the sum of two 3-valued types is a 6-valued type.

More generally, the two operators have a distributive law: Every type (A * (B + C)) can be converted to ((A * B) + (A * C)) and back.
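
In Haskell terms, that conversion is a pair of mutually inverse functions (a standard exercise; the names below are just illustrative):

  -- Witnesses for A * (B + C)  ≅  (A * B) + (A * C).
  distribute :: (a, Either b c) -> Either (a, b) (a, c)
  distribute (a, Left  b) = Left  (a, b)
  distribute (a, Right c) = Right (a, c)

  factor :: Either (a, b) (a, c) -> (a, Either b c)
  factor (Left  (a, b)) = (a, Left  b)
  factor (Right (a, c)) = (a, Right c)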