Unordered pairs and their representation

Occasionally situations arise where a type of unordered pairs would be very handy.

Perhaps the most important example is for representing edges in undirected graphs.

I came across another example that is instructional in a paper by C.J. Date entitled "The Primacy of Primary Keys: An Investigation". He wonders how to define a relation to store information about marriages, under the definition that a marriage is between two people. He gives a definition like:

MARRIAGE ( HUSBAND, WIFE, DATE_OF_MARRIAGE )
    CANDIDATE KEY ( HUSBAND, DATE_OF_MARRIAGE )
    CANDIDATE KEY ( WIFE, DATE_OF_MARRIAGE )

and raises the issue that there is no reason to prefer designating one key or the other as primary. If you amend the example so that there are not distinguished roles like "husband" or "wife" that could provide sensible labels for two separate attributes each storing a single person identifier, then this seems to be a natural spot for an unordered pair type. (Please refrain from discussing the politics of this definition, there are plenty of places on the internet for that.)

I set about thinking of how to make such a type and what its interface could be. I ended up with nearly the exact same interface I had for general sets. I am working on a language for expressing database queries, so I want to strictly control non-determinacy. (Adopting Haskell concrete syntax for purposes of communication...) I have a member :: a -> UnorderedPair a -> Bool containment test, an instance Eq a => Eq (UnorderedPair a), and a function like elements :: UnorderedPair a -> Set a. There's no pattern matching on unordered pairs because it would be so easy to accidentally shoot yourself in the foot with it, and no pattern matching on sets either. There is toList :: Set a -> NonDeterministic [a] though if you need an escape hatch. I also have an instance Ord a => Ord (UnorderedPair a) and inorder :: Ord a => UnorderedPair a -> (a, a).

Overall I'm reasonably happy with this. But I got to wondering whether it would be better to have a more general facility and view unordered pairs as a special case of it. You can look at them as the quotient of pairs with respect to an equivalence relation that permutes the ordering, and then you get unordered triples and ... too, if you can think of anything to use them for. Or you can look at them as a refinement type of sets with cardinality 2.

Those definitions of course differ on whether (3,3) :: UnorderedPair Integer though, and it seems like there may be applications for both answers. A fair amount of literature discusses undirected graphs with loops edges from one vertex to itself. But at least as much literature discusses graphs where such edges are disallowed, and disallowing them seems more natural for the marriage example. I suppose with general purpose refinement types you could look at sets with cardinality in [1 : 2], having your cake and also eating it.

The quotient type approach struggles with one of my goals, because presumably the pattern matching function that applies on ordered pairs would still apply to the quotient unordered pairs without flagging the result as non-deterministic.

If you take the ad hoc approach (neither quotients nor refinement types) it is difficult to think of names for the unordered pairs with and without allowing the degenerate pairs relating a value to itself, so that is another challenge.

I wasn't able to dig up too many examples through search engines, does anyone have thoughts on the best approach to modeling these?

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Unordered Tuples

... to have a more general facility and view unordered pairs as a special case of it.
There already is a more general facility: the Relational Model.
From a RM approach, your first example is exactly the same as MARRIAGE ( DATE_OF_MARRIAGE, WIFE, HUSBAND ), or any permutation of the attributes. Your unordered triple is right there. IOW a tuple is a _set_ of attribute/value pairs, not a sequence of attributes.

[I'll point out right away that SQL _doesn't_ do it that way. SQL allows positional reference, and indeed tables with repeated attributes, or anonymous attributes. Abandon hope all ye who enter.]

Very topically in NZ, we've just enacted same-sex marriage legislation, so could we put MARRIAGE ( DATE_OF_MARRIAGE, WIFE, WIFE )? Well no, because there would then be no way to refer unambiguously to one of the partners. (For example to validate that a person hadn't married themselves, or wasn't polygamous.)
Best I can think of is a surrogate:
MARRIAGE ( MARRIAGEID, DATE_OF_MARRIAGE )
MARRIAGEPARTNER ( MARRIAGEID, PARTNER )
(With a constraint that there be exactly two partners for each marriage.)
For your unordered graphs with loops, allow edges with one or two vertexes.

Failing that, call the attributes PARTNER1, PARTNER2 and provide some arbitrary way of distinguishing the partners (alpha name sequence, oldest first, position on the application form, ...).

Whether order matters is at least as much down to how your application treats the data as the 'physical storage model'.

[P.S. Good to see that you're reading C.J. Date, but not so good if you think that example is anything about ordering of attributes.]

Unordered is an imprecise word

AntC, that is a good correction.

I'll say that I completely agree with all of that, and am fully on board that there is no ordering of attributes.

Amend my previous remarks to say that I am interested in looking at indistinguishable pairs, pairs where you can't tell by ordering or naming which is "first" or "husband" or so forth because there is no basis for providing the names/indices.

I don't see a reason to "refer unambiguously to one of the partners". The whole nature of the situation is that there is no way to refer unambiguously to one of the partners, because there is no basis for distinguishing them. It doesn't follow that you can't "validate that a person hadn't married themselves, or wasn't polygamous", both of those are easily done using the interface I outlined.

I agree that this discussion has nothing to do with physical storage models or ordering of attributes, but I still think there is an important issue here.

A few more notes

The solution with two relation variables and a constraint that there be exactly two partners is basically isomorphic to defining unordered pairs as a refinement type of sets with cardinality 2, except that it requires inventing and tracking marriageids.

Which is natural enough in the case of marriages but seems fairly unmotivated in the case of general undirected graphs. If you use a definition like this to represent the type of undirected graphs you end up with all the usual issues around fresh name generation when a direct implementation of unordered/indistinguishable pairs would sidestep that.

... to have a more general

... to have a more general facility and view unordered pairs as a special case of it.

You've probably already considered this (and this isn't a useful comment so much a vague pointer), but this reminds me of work on combinatorial species as a basis for data types. If you're looking for a "general facility," you could do worse than to review that literature.

This?

Are you talking about the stuff like Quotient Types: A Modular Approach by Aleksey Nogin? (This may not be anywhere near the initial work on it, I haven't had time to chase down all the references yet. This is just what google lead me too.)

I came across that earlier in the week. It is very interesting but requires changing the presentation of all kinds of things away from the familiar data List a = Nil | Cons a (List a) and introducing a bunch of high-powered machinery.

I think this is great work, and I could be persuaded, but I am leaning towards thinking that this isn't a good value in terms of what you get for the added complexity.

Suggestion.

Are you not describing multisets?

[edit: simplified]

How?

If so I'm not sure I see the connection.

Or possibly we are using different definitions of multisets, the one I am familiar with is the quotient of lists with respect to permutation of their elements. Or another way to look at it is as a set and a function that tells you how many "copies" of each element are in the set.

So, { 3, 6, 9, 9, 9 } is a multiset of integers, but it isn't an unordered/indistinguishable pair of integers.

I suppose you could consider the refinement type of multisets so that there are exactly two elements (considering multiplicity) instead of the refinement type of sets so that there are either 1 or 2 elements, but I don't see what that would gain you. If anything that would make an even bigger difference between the type of unordered pairs allowing "loops" and the type of unordered pairs with strictly distinct elements.

Yes

Yes, I was thinking of a multiset being a function f:S -> {0,1,2,...}, and for your example, the restriction of |f|=2 which gives an unordered pair.

You know your problem alot better than I but when you wrote:

But I got to wondering whether it would be better to have a more general facility and view unordered pairs as a special case of it.

I automatically thought multisets. More general than a set, which is f:S -> {0,1}, and unordered pairs, whether they permit multiplicity or not, are handled nicely.

Are you sure you're not really looking for a more specific facility?

I think I see

Ah, I see. I meant a more general way of creating more specific types. For example, adding general refinement types to the language is more general in the sense I was talking about, because in addition to giving you these pairs it would give you things like the type of all positive rationals as a refinement of the rationals, etc.

Arbitrary Fixed Ordering

For most use-cases, such as "describing edges in undirected graphs", we can arbitrarily insist on a (min,max) ordering. This can be achieved by logic in a 'smart' constructor.

I would be interested if you can find a use-case for unordered pairs where the elements themselves cannot be ordered. But even so, I would suggest having fixed-order pairs for the common use-cases. Unordered pairs aren't very useful for 'generic' programming anyway (since they are not algebraic), so I don't see a lot of value in developing them as if they were.

There are orderings and then there are orderings

I don't have a use-case for unordered pairs where the elements themselves cannot be ordered by the machine, because for nearly everything you can convert to some canonical form and then then order it by the binary representation of the canonical form, if nothing better is at hand.

On the other hand, and motivating my inquiry into this, I do have a lot of use cases where the ordering, while it exists mechanically, doesn't exist logically and so I prefer to hide it.

For example consider the type of telephone numbers. Sure, you can model them as integers, or as lists of keypresses, or whatever. Suppose for the sake of argument that there is a canonical form (I don't know enough about the PSTN to know if that is actually true). So whichever of these representations you choose will come with a fairly natural ordering.

But I don't want to have an instance Ord TelephoneNumber because I want it to be a type error for people to compare telephone numbers as if that made sense.

Same comments apply, perhaps even more strongly, to the types of made-up identifiers that are all over databases. Product IDs, Employee IDs, etc.

It's great that the ordering is actually around, and I think we can take advantage of it with another type class that tracks which types have any ordering at all, even if it is a gibberish ordering, so that they can be placed in hash tables and so that the unordered pairs can be stored in canonical order to save comparison time, and so forth. But I really think there is value to be had in not binding it to the ubiquitous ,

I'm afraid I don't fully understand your remarks about unordered pairs not being algebraic. Is there something wrong with the Haskell data UnorderedPair a = ... type that I outlined above? I rather like it but I may be missing a really bad drawback.

Partial or cyclic orderings

There may also be interesting examples for types with partial or cyclic orderings.

For a partial order example, AIUI, there are collation functions for unicode that only partially order strings.

For a cyclic order example, there's this controversy.

For something totally unordered, we might have an undirected graph with vertices labeled by their cartesian coordinates or spherical coordinates. There is an mechanical ordering the machine can use, but it's not a logical one so this is along the lines of the earlier examples I gave.

Arbitrary Ordering

There are cases where orderings do not exist logically, e.g. we cannot order `IO()` operations or `a->b` functions based on any observable, computable properties. But none of your examples qualify. Rather, you present cases where many logical orderings can be computed from observable properties, but thus where any particular choice for the Ord instance may seem arbitrary.

I don't believe arbitrary ordering is a problem. Enumerations, modulo arithmetic or cycles, characters, etc. all have arbitrary orderings. The significant concern is protecting the invariants or documented properties of the typeclass, not ensuring there is only one way to do it. (Similarly, the arbitrariness of `Hashable` is not a very relevant issue.)

I'm afraid I don't fully understand your remarks about unordered pairs not being algebraic.

I assume you are familiar with algebraic types. Unordered pairs lose ordering information in every pairing. With unordered pairs, 2*2=3 (i.e. a pair of booleans has three values) and 3*3=6 (and k*k has k(k+1)/2 possible values). You also have very little flexibility in combining types.

Non-algebraic types aren't necessarily problematic, but they do require more concrete knowledge of the types to understand the resulting system behavior, so they hinder generic programming. In the contrapositive, since it is unlikely we'll use them for generic programming, there is less value in developing generic unordered pair types.

Ah, OK

I think we have to agree to disagree on whether allowing arbitrary ordering is a problem. I agree with your position for general purpose languages, for the most part. I think for the query language I am aiming for that they are bad news because it means that anyone reading the query has to understand all the nuances of which arbitrary orderings have been assigned to which types. For similar reasons I've gotten rid of operator precedence, the language requires explicit grouping in a lot of places where more traditional practice has been to invent mostly-arbitrary precedence hierarchies (or even user-defined ones).

You say "there are cases where orderings do not exist logically, e.g. we cannot order IO () operations or a->b functions based on any observable, computable properties." That's true, but those are cases where orderings do not exist mechanically. My examples point to where orderings don't exist logically. The type data Turtle = Leonardo | Michelangelo | Donatello | Raphael has no logical ordering although clearly we have 4! choices if we want to pick an ordering to use mechanically. I think it's better to have Donatello >= Leonardo be a type error rather than doing some potentially difficult to remember thing that is based on our choice of ordering. Worse, some languages do it by declaration order, so if someone changes the definition to data Turtle = Donatello | Leonardo | Michelangelo | Raphael they potentially change the semantics. This isn't at all a new idea, lots of mainstream languages don't allow ordering comparisons between pointers, for example, because there is no logical ordering even though there is clearly a mechanical ordering.

Protecting the invariant can be done with the smart constructor approach, I'm fine with that. I'm trying to steer people away from writing queries that don't mean what they seem like they mean.

Perhaps even more importantly I am trying to strictly control and track non-determinacy so that the system can have more leeway in deciding what to cache. Rather than analyzing peoples arbitrary functions that pattern match against logically-unordered pairs to see if they behave the same way regardless of which order they non-deterministically saw (which is undecidable) I think I'd rather give them an interface that lets you do 95% of the safe things you want to do and has an explicitly marked escape hatch for the other 5%.

So that means that no quotient types are algebraic types in that sense, correct? I'm not convinced quotient types are a good idea as a general facility either, at least not in languages without dependent types. I'm not sure what you mean by "you also have very little flexibility in combining types" though, it seems like there is a generic type generator unordered pairs that works whenever its argument has decidable equality.

Your characterization of

Your characterization of mechanical vs. logical seems a little inconsistent. I think I could order `a->b` functions or `IO()` operations mechanically - based on internal representation (e.g. bytecode), stable names or pointer comparisons, etc.. But I cannot order them logically.

It seems to me that turtles have 24 (4!) logical orderings. We could find some that seem less arbitrary - e.g. based on lexicographic ordering of the names, or canonical age, or left-to-right order on the cover of the May'84 comic. All those orderings are computable from intrinsic, observable properties. (The same cannot be said of pointer comparisons or other 'mechanical' orderings.) What, other than apparent arbitrariness, distinguishes ninja turtles from data SmallNum = One | Two | Three | Four?

I can understand the desires to protect against arbitrariness. When heuristics, default values, or other arbitrary decisions become part of a library, it is similar to implementation details leaking into the semantics. OTOH, most benefits of determinism (e.g. with regards to testing, validation, maintenance, replication) are not sensitive to arbitrariness. So when I must make a choice between 'non-deterministic' vs. 'deterministic but arbitrary', I will favor the latter.

not sure what you mean by "you also have very little flexibility in combining types" though, it seems like there is a generic type generator unordered pairs that works whenever its argument has decidable equality.

I mean the obvious: Regular pairs allow combining two different types (a*b). Unordered pairs only allow combining a type with itself (a*a). Relatively: very little flexibility in combining types.

You could split the

You could split the (arbitrary) ordering out of the type by using a simple wrapper, for example:

data Turtle = Leonardo | Michelangelo | Donatello | Raphael

data OrderableTurtle = OT Turtle

instance Ord OrderableTurtle where
  (OT x) <= (OT y) = (toNum x) <= (toNum y)
                     where toNum t = case t of
                                          Leonardo     => 0
                                          Michelangelo => 1
                                          Donatello    => 2
                                          Raphael      => 3

data OtherOrderableTurtle = OOT Turtle

instance Ord OtherOrderableTurtle where
  -- ...

One example of this is Haskell's Monoid instances, since we can turn Bool into a Monoid using AND or OR, and we can turn numeric types into a Monoid using + or *.

If you have sufficient control over the implementation, the ordering relation can be derived from the datatype automatically, which removes the need to update the ordering when the datatype changes.

Wrong words, but I think I am being consistent

Logical and mechanical may not be the best choice of words.

Maybe fundamental and arbitrary might be better ones. I'm not sure.

You can't order a->b functions or IO () operations even mechanically in the sense I meant, because you can't order them in a way that respects their extensional equality. In comparison you can do that with pointers, unordered pairs, enumerations with no important or distinguished ordering among the members, telephone numbers (under the previous assumption that they have a canonical form), etc.

Question.....

If you can order unordered pairs, and you cannot order a->b functions, then can you, or can you not, order unordered pairs of a->b functions?

Just how are you ordering unordered pairs?

I was vague there, sorry.

I was vague there, sorry. You can order unordered pairs when you can order the type that they are pairs of. instance Ord a => instance UnorderedPair a is what I meant.

Ordering Pointers

you can't order them in a way that respects their extensional equality. In comparison you can do that with pointers

You've a mistake in reasoning here. I think it leads to the inconsistency I'm seeing.

Pointers generally cannot be compared based on any observable properties. I.e. if I have a pointer to an object O, and a pointer to O' that transparently forwards every message to O, then I cannot distinguish those two pointers. Unless, of course, if I provide some 'mechanical' ordering (e.g. based on an address of an object, and barring copy-collection). If I provide a mechanical ordering, then suddenly I am able to distinguish pointers (even those that otherwise would be logically equivalent) based on the provided mechanical ordering. But the same can be said if I provide a mechanical ordering on functions or IO operations.

Pointers/references simply aren't in the same category as enumerations or telephone numbers. For telephone numbers, many orderings can be constructed based on observable properties. Pointers are in the same category as functions and IO ops. For pointers, functions, IO ops, etc. you can't construct the ordering; it must be provided to you based on characteristics or implementation details that would otherwise not be observable.

Pointers as I meant them are

Pointers as I meant them are in the exact same category as telephone numbers. The mistake you are making is that you are thinking of pointers as references. I agree that you can't compare references based on observable properties. But a pointer is just the telephone number for a certain piece of memory. You can compare pointers for equality because their intensional equality is their extensional equality, no two pointers name the same piece of memory. It's only when you conflate them with references that they can't be compared for extensional equality. Pointers need not point to objects.

References are in the same category as functions and IO ops. Raw pointers are not.

There's actually an even tighter distinction than that. If you consider the meaning of a reference to be the identity of what it refers to, then you can compare them for equality. It's only if you consider the meaning of a reference to be the meaning of what it refers to, and what it refers to does not have decidable extensional equality, that you cannot compare them. This is the case you were explaining.

Pointers are in the same

Pointers are in the same category as functions and IO ops: you cannot distinguish pointers except by a provided, implementation-dependent comparison operations. This is not a mistake I'm making.

You're using the word 'pointer' where you might mean 'address'. There is a difference. Pointers don't need to be addresses (and historically, they aren't, due to issues such as split memory or copy collection). And addresses don't need to be pointers (e.g. URLs are also addresses).

It's only if you consider the meaning of a reference to be the meaning of what it refers to, and what it refers to does not have decidable extensional equality, that you cannot compare them.

Again, similar can be said of functions. If you consider the meaning of a function to be the code describing it (which is mostly true in a fexpr-based language or pattern calculus) then you can compare functions based on their code. If you consider the meaning of a function to be how it transforms inputs to outputs, then they don't have decidable equality.

That's fine. I thought that

That's fine. I thought that was the pointer vs. reference distinction. If it's the address vs. pointer distinction then replace everywhere I said pointer with address.

The crux

I think this is the crux of it. You wrote:

What, other than apparent arbitrariness, distinguishes ninja turtles from data SmallNum = One | Two | Three | Four?

I can understand the desires to protect against arbitrariness. When heuristics, default values, or other arbitrary decisions become part of a library, it is similar to implementation details leaking into the semantics. OTOH, most benefits of determinism (e.g. with regards to testing, validation, maintenance, replication) are not sensitive to arbitrariness. So when I must make a choice between 'non-deterministic' vs. 'deterministic but arbitrary', I will favor the latter.

It is precisely the apparent arbitrariness that distinguishes Turtle from SmallNum. I think that's an important distinction though.

"When we must make a choice between 'non-deterministic' and 'deterministic but arbitrary', I will favor the latter."

I favor the former in the context of a database language, because it forces programmers to be explicit everywhere that they depend on that arbitrariness. This preserves the ability for the system to choose a different arbitrary answer later, e.g. for performance reasons, without compromising existing programs.

To give a related example of the same design principle, when presented with a query without a sort specification (without the equivalent of a SQL ORDER BY clause), my system intentionally shuffles the first 5 tuples of the result before returning them to the client. This surfaces the issue in testing if the query plan selected happens to have arbitrarily returned your results in some sensible order as a result of decisions it made for performance reasons (a very common scenario due to indexing), forcing you to specify the sort if you intend to rely on it.

Your choice is probably superior for general purpose languages.

In practice, programs are

In practice, programs are developed and tested against one implementation. When another implementation is used, or the non-deterministic implementation-dependent decisions change, programs often break badly. This is one of the common issues that plague actors model code. I agree that shuffling or randomizing things can help, and I have often encouraged actors model 'debug mode' development run using a randomized message ordering. But even then, the testing that must be performed with non-determinism is considerably more exhausting.

Also: I think that, historically, non-determinism hasn't been as successful for performance benefits as one might imagine it to be. I'm not sure of the reasons for this, but I imagine it's difficult to tweak decisions in a way that will be better 'in general', so either a lot of specialized analysis is required, or developers would need to more explicitly specify what they want anyway (in which case determinism would do just as well). Further, determinism offers its own performance advantages, e.g. with regards to replicated computations. A common strategy for networked video games is to replicate most of the computations and thus reduce the network load to just user-inputs, but this requires determinism to avoid divergence. Web servers might similarly be mirrored to support high loads, but we don't want them to diverge.

I would be interested if you

I would be interested if you can find a use-case for unordered pairs where the elements themselves cannot be ordered.

I recently hit a brick wall when I needed to create sets of types, since I couldn't create an ordering for them (without hacking the host language, at least).

Inspired by algebraic effect handlers in Idris, I tried to implement sideways-composition of monads. If two monad transformer stacks "FooT Bar" and "BarT Foo" can be proved (using dependent types) to behave the same, then we can collapse them into a monad "Effectful {Foo, Bar}", where "{}" is a set (or unordered pair without repetition, in the case of two elements).

This would allow more fine-grained types than the IO sledgehammer, for example we could keep track of which files are accessed:

getLine : (f : Filename) -> FileRead f String
getLine f = ...

myApplication : Effectful {(FileRead "config.ini") (FileRead "data.sqlite")} ()
myApplication = do conf <- getLine "config.ini"
                   data <- openSqlite "data.sqlite"
                   ...

main : IO ()
main = runEffectful myApplication

However, I gave up on the idea after several failed attempts to implement sets of types in an extensible way (to allow user-defined types, via classes or suchlike).