Extensible tuples?

I've been mulling over a language design issue which would be nicely solved if tuples behaved like a monoid. Is anyone aware of a language where tuples can be arbitrarily extended?

For example, if # is the tuple extension operator:

f :: a -> b -> a#b 
f x y = x # y


f 1 "dog" ==> (1,"dog")
f (1,"dog") true ==> (1,"dog",true) 
f () ("dog", true) ==> ("dog", true)

It seems that type inference is still possible here, though I'm not sure exactly how it would work (i.e., how do you unify the types a*b*c and d#e?).
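
For concreteness, here is roughly the flavor of thing I'm after, sketched in GHC Haskell as heterogeneous lists with a type-level append (purely illustrative, not a design proposal; note this encoding makes injection explicit, since a scalar must be wrapped as a one-element list before it can be extended):

{-# LANGUAGE DataKinds, GADTs, TypeFamilies, TypeOperators #-}
import Data.Kind (Type)

-- an "extensible tuple" is a list indexed by its element types
data HList (ts :: [Type]) where
  HNil :: HList '[]
  (:>) :: t -> HList ts -> HList (t ': ts)
infixr 5 :>

-- type-level append computes the type of a concatenation
type family (as :: [Type]) ++ (bs :: [Type]) :: [Type] where
  '[]       ++ bs = bs
  (a ': as) ++ bs = a ': (as ++ bs)

-- (#) concatenates; f 1 "dog" becomes (1 :> HNil) # ("dog" :> HNil)
(#) :: HList as -> HList bs -> HList (as ++ bs)
HNil      # ys = ys
(x :> xs) # ys = x :> (xs # ys)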

Sort of

I toyed around with this a little while back. There are a number of little issues that make them difficult to work with. Two somewhat major ones:

How do you distinguish between (1, "dog", true) and ((1, "dog"), true) or the opposite grouping? I ended up going with two operators: one to signify the append, and one for 'plain' tupling. I'm not sure that's a *good* solution (or if there is a good solution); it easily leads to cases where the code doesn't do what the programmer expects.

The other problem is in typing. Is (1, "dog", true) considered usable where functions expect an (int,string)? There's no technical problem with disallowing it, but again this leads to cases where the language doesn't do what a programmer expects.

I don't know of a published language that allows this sort of thing, but I'd be surprised if there wasn't one.

Two tupling operators

I might have been a little unclear. I was also envisioning two tupling operators: (,) works as usual and (#) extends/concatenates tuples. Thus, (1,"dog",true) has type int * string * bool and can't be used in place of an int * string argument.

In addition, there would be a t1#t2 type for a tuple extension over unknown types.

This last part is where I'm afraid type inference would get hairy.
The inference engine should simplify a#b to a proper tuple type whenever possible.

x = (1,2) # 3  
Derivation:
(1,2) :: int * int
3 :: int
(#) :: a -> b -> a#b
(1,2)#3 :: int*int # int ==> int * int * int
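
To make that simplification concrete, here is a minimal sketch in Haskell of the rewrite pass I have in mind (the Ty AST and its names are invented for illustration):

-- hypothetical AST; Ext represents the a#b type form
data Ty = TInt | TVar String | Tuple [Ty] | Ext Ty Ty
  deriving (Eq, Show)

-- rewrite a#b to a flat tuple once both sides are known;
-- non-tuples coerce to unary tuples, type variables stay stuck
simplify :: Ty -> Ty
simplify (Ext a b) = ext (simplify a) (simplify b)
  where
    ext x y = case (asTuple x, asTuple y) of
                (Just xs, Just ys) -> Tuple (xs ++ ys)
                _                  -> Ext x y   -- stuck on a variable
    asTuple (Tuple ts) = Just ts
    asTuple (TVar _)   = Nothing
    asTuple (Ext _ _)  = Nothing
    asTuple t          = Just [t]
simplify (Tuple ts) = Tuple (map simplify ts)
simplify t          = t

-- simplify (Ext (Tuple [TInt, TInt]) TInt) ==> Tuple [TInt, TInt, TInt]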

If sufficient type information isn't available at a particular instance...well, then I'm not really sure what to do. That's why I'm looking to see how others handle this sort of thing.

Implicit injection

I'm pretty sure that type inference won't fly like that if you insist on keeping injection into (unary) tuple types implicit. For example, what's the type of

f x = (666 # x).2

It could be either of

f : a -> a
f : a # b -> a
f : a1 * ... * aN -> a1   (for all N>=1)

which, as far as I can see, are incomparable. The first and second type seem wrong, because if a gets instantiated with a tuple type, the result you get out is not actually of type a. The third seems more like your intention, but you do not even have a proper way to express it.

What you want is very similar to non-disjoint union types. For a number of good reasons, type inference being one of them, most languages provide disjoint unions, where you have to apply constructors explicitly to inject into a union.

If you can live with explicit injection, then you could look at all the literature on record calculi and polymorphic record typing (because tuples are basically a special case of records).

Implicit vs. Explicit?

Sorry, I'm unfamiliar with your terminology. Can you tell me what explicit injection would look like in this case?

Unary tuples

Basically, you need to distinguish between a plain type like, say, int, and the unary tuple (* int) (or whatever syntax you prefer). Same on the expression level. The upshot is that you get a clear distinction between (a # b) and (*a # b) and (*a # *b) = (a * b). And f would be replaced by either of the following (different) functions, making clear the intention:

f1 : a -> a
f1 x = (666 # *x).2

f2 : *a # b -> a
f2 x = (666 # x).2

This is only a rough sketch, of course.

Ah

Ok, I see. Unfortunately, maintaining that distinction is more messy than I can bear. I'll check out the above link on extensible records and see how far that takes me.

Another try

This is a bit of a kludge, but I'd still appreciate your feedback. What if the extended-tuple type can only be destructured as an ordinary tuple when it does not contain naked type variables?

Thus, the function

f x = match (666 # x) with (_, y) -> y

would be rejected by the type checker since (int # a) cannot be unified with a proper tuple. You could, however, do:

z = (1,2)
match 1 # z with (a,b,c) -> a+b+c

since the expression 1#z has type int # (int*int).

This would certainly slow down type checking, but it seems like it should at least work.

Destroys completeness

This destroys completeness of type inference, especially because it becomes dependent on order. The following "obvious" example will probably not type check:

f x = ((666 # x).2) + x

while this one likely will:

f x = x + ((666 # x).2)

Is there a reason...

... that you can't just use (a*b)*c?

Categorically, all there is to a tuple is a set of projection operators each of which can extract a different value. You're free to represent the structure upon which those projection operators operate in any well-typed manner you wish.

For tuples to be useful, whichever function to which you pass the tuple must be capable of analyzing the tuple (via projection or pattern matching) and assigning a useful meaning to the analysis. To analyze an extensible tuple would require the ability to break it down into some chunk and 'the rest'... e.g. pattern matching on (a,b,c)#rest_of_tuple... much as you would with lists or other arbitrary-sized collection structures. The advantage you'd get is that each value could have a distinct and statically determined type, such that you could apply polymorphism (e.g. based on typeclass) to select functions. It would also be useful for metaprogramming. The disadvantage is that not all types could be statically determined (e.g. in Haskell, an extensible tuple could be of infinite arity, returning a predictable sequence of types.)
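
In an HList-style encoding like the one sketched at the top of the thread, that chunk-and-rest analysis might look like this (purely illustrative):

hhead :: HList (t ': ts) -> t
hhead (x :> _) = x

htail :: HList (t ': ts) -> HList ts
htail (_ :> xs) = xs

Matching a pattern like (a,b,c)#rest then amounts to three hheads plus an htail for the remainder.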

I suspect your language design already includes static typing and polymorphism (perhaps via typeclasses), or the whole idea of extensible tuples would be somewhat useless for you. But I'm honestly curious what particular use you think you'll find in them. If it is mostly for metaprogramming, it may be that a solution won't require the unification itself be first-class functional.

I've written extensible tuple libraries of arbitrary (but finite) arity for C++ before (via template metaprogramming). The tuples there were necessary for a challenge to provide an iterator over 'all combinations' from multiple collections. While it would have been trivial with this system to write a function that takes a tuple and a value and returns a larger tuple of the appropriate type, the benefits of doing so would be more difficult to identify... esp. since C++ of 2003 lacks support for the 'auto' type. You could only (usefully) pass such a tuple to functions designed to break down and analyze tuples with polymorphic functions. There were no problems with the template-driven type productions, but then again the type structure inherently prevented recursion and problems associated with first-class functions. Still, I could get you the code if you're interested.

Tuple length genericity

I'm working on a language with implicit scaling, meaning the following transformation is implicitly applied:

(+) :: (int * int) -> int 

[1 2] + [3 4]
==> map  (+) (zip [1 2] [3 4])

Now, if I only care about binary operators, then this works using the usual zip function (i.e., zip :: [a] -> [b] -> [a*b]). If, however, I would like to perform this transformation for arbitrary functions, then it seems beneficial to have a zip function with signature

zip :: [a] -> [b] -> [a#b]

Now, I could make implicit scaling work for any number of arguments.

f :: (int * int * int) -> int 
f ([1 2], [3 4], [5 6])
==> map f (zip [1 2] (zip [3 4] [5 6]))

If I *didn't* have tuple extensibility then this would fail since I would be passing arguments of type (int * (int * int)) to the function f.
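
In plain Haskell the workaround has to be spelled out per arity, which is exactly what I'd like to avoid (f3 and flatten3 are just illustrative names):

f3 :: (Int, Int, Int) -> Int
f3 (a, b, c) = a + b + c

-- reassociate the nested pairs produced by nested zips
flatten3 :: (a, (b, c)) -> (a, b, c)
flatten3 (a, (b, c)) = (a, b, c)

example :: [Int]
example = map (f3 . flatten3) (zip [1,2] (zip [3,4] [5,6]))
-- ==> [9, 12]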

Zip Calculus

You may want to look at the Zip Calculus.

A bit problematic

Someone on LtU suggested it to me a little whiles back. It seemed, initially, to fit the bill but unfortunately:

1) n-tuple type signatures are unwieldy/unfriendly to the programmer

2) despite the name, you can't actually implement zip/zipWith/etc. using n-tuples unless you also supply built-in seqTupleL/seqTupleR functions. This works, but it's far from elegant.

3) most importantly, Tullsen doesn't provide an inference algorithm and cooking one up on my own doesn't seem trivial.

I hope he does some follow-up work. The Zip Calculus is neat, but not yet usable in its current form.

This sounds like APL / J

Instead of "tuple," think "array."

Sort of

APL/J is the language family I'm aiming for. Unfortunately, arrays don't (easily) have a statically verifiable shape/length.

Extensible records

Why not then treat tuples as a special case of extensible records?

Seconded

I second the suggestion to look at Daan Leijen's work.

Thanks

Thank you both.

Type inference

Type inference for records with a record extension operator is easy, but type inference for extensible records with a symmetric record concatenation operator is a lot more messy. (See e.g. Didier Remy's papers from the early nineties for details.)

Type inference for tuples with a tuple extension operator is just as easy, but it looks like you are after symmetric concatenation. Perhaps you can get some mileage out of Remy's cunning hack, that allows you to do a limited form of record concatenation 'for free'.

What feature do you want exactly?

I'm curious what you're trying to do and if you only need scalar extension (e.g. push : (Stack A) -> b -> (Stack A b)) or if you need concatenation. If you only need the former, you should look at how stack-based languages are typed, namely Cat at the moment.

Scalar *promotion*

I'm working on a statically typed array language. I'd like to maintain the brevity and expressiveness of a language like Matlab, but with an HM-family type system. A reasonable-seeming way to deal with arrays is to simply define all relevant functions and operators in terms of scalars and insert "promotions" where necessary.

1 + [2 3] 
==> map (+) (distl 1 [2 3])
==> map (+) [(1,2) (1,3)]
==> [3 4]

[2 2] .* [3 4] 
==> map (.*) (zip [2 2] [3 4])
==> map (.*) [(2,3) (2,4)]
==> [6 8]
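
As a sanity check, the binary case is easy to write down in Haskell (distl and the other names here are only for illustration):

distl :: a -> [b] -> [(a, b)]
distl x = map (\y -> (x, y))

scalarPlus :: [Int]
scalarPlus = map (uncurry (+)) (distl 1 [2,3])    -- ==> [3,4]

elementwise :: [Int]
elementwise = map (uncurry (*)) (zip [2,2] [3,4]) -- ==> [6,8]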

This (hopefully) works for binary operators, but what if we wanted to write something like:

f :: num*num*num -> num
f([1], [2], [3])

we run into a problem. How do we splice together an array of 3-tuples to use as an argument to (map f)? That's the problem I wanted to solve with extensible tuples.

Attacking a Fly with a Sledgehammer

You're aiming a rather generic facility at that problem.


tuple_flatten3 ((a,b),c) = (a,b,c)
zip3 as bs cs = map tuple_flatten3 (zip (zip as bs) cs)
map f (zip3 [1] [2] [3])

It seems what you're really aiming to do is avoid (generically) the repetition of such things as 'tuple_flatten3 .. tuple_flattenN' and the potential 'zip3..zipN' you could define alongside it. And you're targeting tuples in particular because they are the only structural type your language possesses.

But such patterns of fn2..fnN will tend to repeat themselves once you start using them. There is a reason that people have been aiming to support variadic templates in C++. So I do understand your intentions in avoiding that problem.

Can what you're aiming at be done? I believe so.

Being structural types, the type-descriptors for these extensible tuples should be isomorphic to the pattern-matching and constructors you allow over the tuples. And I expect you could probably escape with some relatively simplistic type unification and cover 80% or more of all application cases you'll ever encounter.


tuple_rotate (a)#X#(b) = (b,a)#X
tuple_addend ()#X a = ()#X#(a)

I'd say go for the simplistic unification algorithm - one that you'll know will be fast, imperfect, and 'good enough' in a manner that allows strictly less than you'd prefer. That gives you the freedom to fix it in a later version of the language without breaking backwards compatibility. It isn't as though you're solving a critical problem here.

More generally, if your language had more structural types, what you're really aiming for is functionally produced Type Descriptors by use of pattern-matching. You might want to look into first class Type Descriptors and allowing function-types to be specified in terms of functions over Type Descriptors. (Doing so is still less than dependent typing.)

Which typing rules?

This is likely due to my insufficient experience with type systems (just started reading TAPL yesterday), but I can't think of good-enough typing rules that don't run into the problem Andreas pointed out below (messing up completeness of the type system). Can you give me an example of what you had in mind?

Also, can you point me to any papers on Type Descriptors? I've never heard of them and I didn't see anything on Google Scholar.

Thanks,
Alex

The loss of 'completeness'

The loss of 'completeness' is only because there is no 100% perfect solution to the problem Andreas presented. But there is an 80% solution available. Simply unify:

f x = (666 # x).2

as

f :: (a#b) -> a

And be done with it. To make it an 80% solution, include an implicit coercion a => (a#()) when 'a' is a non-tuple type used in an extension. [Edited 20081114; original text called it a 'unification'].
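
In terms of the hypothetical Ty AST sketched earlier in the thread, that coercion step might be as simple as (illustrative only):

coerce :: Ty -> Ty                    -- a => a#() for known non-tuple types
coerce t@(Tuple _) = t                -- already a tuple
coerce t@(Ext _ _) = t                -- already an extension
coerce t@(TVar _)  = t                -- unknown: leave for the solver
coerce t           = Ext t (Tuple []) -- e.g. int => int#()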

As far as 'type descriptors' go, I refer to values that describe types. For example, the Type of Type Descriptors might be something like:

data TypeDescriptor = Tuple [TypeDescriptor]
                    | Lambda TypeVar TypeVar
                    | TaggedUnion [(String, TypeDescriptor)]
data TypeVar = Concrete TypeDescriptor
             | Polymorphic [TypeClassReq] Variable
type TypeClassReq = String   -- placeholder for the sketch
type Variable = String       -- placeholder for the sketch

Having a language's TypeDescriptor be explicitly available for functional operations within the language is reasonable only for a language with purely structural typing.

Doing this allows one to apply functions to TypeDescriptors to produce TypeDescriptors.

   TypeDescriptor -> TypeDescriptor 

The ability to produce such functions is the basis for System F, but System F uses a function and application language independent of the one used for values (often represented as a big Lambda or big Pi, distinct from little lambda). Making TypeDescriptor an explicit value in the language goes in the opposite direction, allowing regular functions to produce new TypeDescriptors (and, thus, new types).
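
As a toy illustration of that direction (building on the TypeDescriptor sketch above; extendTuple is an invented name), an ordinary function can compute the descriptor of an extended tuple, essentially '#' at the descriptor level:

extendTuple :: TypeDescriptor -> TypeDescriptor -> TypeDescriptor
extendTuple (Tuple ts) t = Tuple (ts ++ [t])
extendTuple t1 t2        = Tuple [t1, t2]   -- non-tuples coerce to unary tuples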

Most of the time there isn't much advantage to it, but there also isn't much loss. The few cases I've seen where it seems to be helpful is when doing: (a) metaprogramming, and (b) distributed programming, and (c) especially any combination of the two (supporting typed pipes and sockets and files, producing modules and services at runtime, etc.).

I'm not certain it would work with type inference. But, then, type inference itself isn't all that great with structural typing (since there is often more than one 'type' compatible with a given term).

Unsound

But that's not an 80% solution. AFAICS, it's simply unsound.

Imagine this program:

g (h : int * int -> int * int) = h (3,4)
x = g f

The type of x clearly is int*int, according to your suggestion. However, here is how it evaluates:

x = g f
  = f (3,4)
  = (666 # (3,4)).2
  = (666,3,4).2
  = 3 (* Oops... *)

Edit: Fixed typo screwing up example.

Edit 2: Made the typing more explicit.

Types reflect operators. If

Types reflect operators. If (666 # (3,4)) == (666,3,4) then, necessarily, (int*int) == (int # (int # ())). Given this, f (3,4) would unify cleanly as 'int'.

Without a monomorphism limit, x would also unify as 'int' in the example you provide.

If there is a monomorphism limit under the language in question, then your program wouldn't typecheck because f would need to have two types: (int#(int#())) -> int (from the application f (3,4)) and (a#())->a (from the unification of f with the type declared for h; 'a' is unknown but provably could not be an extended type).

Either way, your example doesn't demonstrate the 80% solution to be unsound.

Typo

I'm sorry, I made a typo that screwed up the example. The application f (3,4) should have been h (3,4). Fixed now.

Do you now see how it is unsound?

Unsoundness would require

Unsoundness would require that the example both pass the typecheck and produce an unsafe operation. I don't see your example producing these in combination.

Consider that (int*int) is equivalent to (int#(int#())). Given f and h with the following types:

f :: (a#_) -> a
h :: (int*int) -> (int*int)

There are two possible unifications that can be produced by consistent application of rules. Either:

  1. h is treated as (int#(int#())) -> (int#(int#()))
  2. or h is treated as (int*int)#() -> (int*int)#()

Which of these unification approaches is chosen MUST be consistent with the choice of the '#' operator. So consider your example: (666 # (3,4)). In accordance with the above two possibilities, either:

  1. (3,4) is consistently treated as (int#(int#()))
  2. or (3,4) is consistently treated as (int*int)#()

In your example, (666 # (3,4)) is producing 666#(3#(4#())) == (666,3,4). If you are to be consistent with this example, then you must also be treating (3,4) in general as (3#(4#())), and you must similarly be treating (int*int) in general as (int#(int#())). Remaining consistent with your example:


 h :: (int#(int#())) -> (int#(int#()))
 f :: (a#_) -> a
 f (int#(int#())) => int

f is not type compatible with h. The typecheck will fail the program (g f).

You're right that if the example was passed by the typecheck, it would be proof of unsoundness. But I'm really not seeing how any set of typing rules consistent with your use of the '#' operator would ever allow f to unify with h.

It is only at the corner cases where things get hairy. Hairy. Not necessarily unsound. The corner case here is the automatic coercion of a non-tuple to a unary tuple in order that it can be processed and extended. Thus (f 4) => (f (4#())) => (666#(4#())).2 => (666,4).2 => 4.

In the earlier version of your example, you were clearly aiming at such automatic coercions by specifying 'h :: a -> a'. You implied that ((a#_) -> a) could be passed around as a first-class function of type (b -> b).

This could be done if (a#() == a) would evaluate to true in a manner that doesn't involve any implicit coercions. As is, any automatic coercion of f : (a#_ -> a) to h : (b -> b) would result in f' a = f (a#()), automatically coercing the first argument to a unary tuple.

I agree that I didn't go into enough detail on exactly how I imagined such a unification would occur.

Associativity applies both ways

You have (int * int)#() = int * int. Consequently, int * int -> int * int = (int * int)#() -> int * int. This clearly is an instance of a#b -> a. So the application g f is well-typed according to your rules.

Consistency is relevant. I

Consistency is relevant.

I don't deny that (int*int)#() is an instance of (a#b). But if (int*int) is treated as (int*int)#(), then (3,4) must also be treated as (3,4)#()... i.e. as a unary tuple containing a pair.

Thus, if you unify (int*int) as (int*int)#(), then you must also use the following evaluation rule: f (3,4) => f ((3,4)#()) => (666#((3,4)#())).2 => (666,(3,4)).2 => (3,4).

Whichever coercion rules you use for specifying the operator you must also use for the type unification.

No inconsistency

But if (int*int) is treated as (int*int)#(), then (3,4) must also be treated as (3,4)#()... i.e. as a unary tuple containing a pair.

Sure.

Thus, if you unify (int*int) as (int*int)#(), then you must also use the following evaluation rule: f (3,4) => f ((3,4)#()) => (666#((3,4)#())).2 => (666,(3,4)).2 => (3,4).

No. 666#((3,4)#()) = 666#(3,4) = (666,3,4). And similarly, int#((int*int)#()) = int#(int*int) = int*int*int.

This flattening was the whole point of Alex' question and his # operator.

No. 666#((3,4)#()) =

No. 666#((3,4)#()) = 666#(3,4) = (666,3,4).

Either you treat (int*int) as (int#(int#())) for both typing AND evaluation purposes, or you reject it for both typing AND evaluation purposes. You are using one approach for typing and another for evaluation. That seems inconsistent to me.

And flattening is achieved by treating (int*int) as (int#(int#())), such that 666#(3,4) is treated as (666#(3#(4#()))). But this doesn't necessarily exclude the possibility for (a,b)#(c,d) to be ((a,b),c,d). Alex is looking for a solution, and there are many possible solutions.

In any case, the problem of associativity goes away if you simply treat a => (a#()) as a coercion whenever a non-tuple is passed to a function requiring tuple extension. A potential inverse coercion is also viable, such that (a == a#()) if and only if a is not a tuple. I never intended (a == a#()) to be true for cases where 'a' is already a tuple.

I don't know what you're getting at

Either you treat (int*int) as (int#(int#())) for both typing AND evaluation purposes, or you reject it for both typing AND evaluation purposes. You are using one approach for typing and another for evaluation.

No, I don't. Please point me to the place where I would do.

And flattening is achieved by treating (int*int) as (int#(int#())), such that 666#(3,4) is treated as (666#(3#(4#()))). But this doesn't necessarily exclude the possibility for (a,b)#(c,d) to be ((a,b),c,d).

I have no idea what you are aiming at. Alex' intention was having a tuple concatenation operator (on term and type level) with (a,b)#(c,d) = (a,b,c,d), and a#(b,c) = (a,b,c) if a is not a tuple (obviously, this also implies (a,b)#() = (a,b)). That's what I replied to. You seem to have a different thing in mind, basically an associative version of *. But I fail to see how that addresses his problem.

Edit: Removed a less relevant paragraph.

Alex' intention was having a

Alex' intention was having a tuple concatenation operator [...] you seem to have a different thing in mind, basically an associative version of *. But I fail to see how that addresses his problem.

The 'problem' Alex presented is an N-arity zip, such that systematically writing (zipN As (zipN Bs Cs)) would produce a neat list of type [(A*B*C)]. The 'problem' is tuple extension, not tuple concatenation. See page title.

You are using one approach for typing and another for evaluation.

No, I don't. Please point me to the place where I would do.

You are treating (int*int) as (int*int)#() when pattern-matching its type against (a#b). The only way this can be made consistent with the operator is to have (3,4)#() evaluate to the unary tuple ((3,4),), such that in (a#b), 'a' is equal to (int*int) and 'b' is equal to ().

The need for consistency between how you evaluate an operator and how you compute a type for that operator is fundamental. When defining a new operator, you can start with either the type or the evaluation rule, but you MUST make the other consistent. And keep in mind that we are attempting to define a new operator.

Confusion

The 'problem' is tuple extension, not tuple concatenation.

Not in the initial posting and the subthread I originally responded to.

Re-reading your earlier posts I now realise that we have been talking past each other. You are basically redefining # to mean (right-associative) * and suggest living with that weaker system. The latter is ok, although I don't see how it is enough for Alex' purposes (in particular, it is not symmetric). The former was a good way to cause confusion, especially since I don't see much of a point in distinguishing # from * with this approach. ;-)

You are treating (int*int) as (int*int)#() when pattern-matching its type against (a#b). The only way this can be made consistent with the operator is to have (3,4)#() evaluate to the unary tuple ((3,4),), such that in (a#b), 'a' is equal to (int*int) and 'b' is equal to ().

I think the confusion here is that the concatenation operation a#b Alex was suggesting is in fact ambiguous when a and b are just type variables. It will mean different things with different instantiations of the variables. There simply is no way to interpret it consistently before you know the instantiations! (Your suggestion above is not consistent either.) That was my whole point, and the reason why the typing rule you gave - or any other! - is unsound for that system.

Explicit injection like I was proposing in the other subthread removes the ambiguity. So does your suggestion (I think), although in a more restrictive way. A third way would be to introduce implication on the type level, but that is guaranteed to also introduce ultimate headaches for inference.

Explicit injection isn't

Explicit injection isn't necessary. Implicit coercion of non-tuples to unary tuples, when it comes time to 'extend' them, will do the job. This implicit coercion can be integrated into the typing rules as well in order to allow certain forms of function passing, should you demand it.

Er...

...no. It makes no difference whether your type system factors out coercions or folds them into the typing rule for concatenation. The ambiguity arises either way.

(Ignoring the fact that type inference is pretty much doomed to be incomplete anyway when you have implicit coercions.)

But it does make a

But it does make a difference whether your language coerces only non-tuples or allows just anything to coerce to a unary tuple. Ambiguity exists if and only if there are multiple legal evaluation paths.

So how

So how does increasing the number of possible coercions decrease the number of possible evaluation paths?

It doesn't. Ambiguity

It doesn't.

Ambiguity exists if and only if there are multiple legal evaluation paths... and, even then, only if those paths have different results. So avoiding ambiguity only requires ensuring that all legal evaluation paths have the same result. This is most readily done by ensuring there is either one or zero legal evaluation paths for any given program.

Adding a coercion to allow an evaluation (one path) where before there was no legal evaluation (zero paths) does not result in ambiguity. For example, coercing a non-tuple into a unary tuple where there is no evaluation path for non-tuples (such as applying a tuple-extension operator to a non-tuple) is unambiguous.

Right. So?

Right. So why did you suggest implicit coercions as an alternative to explicit injections in the situation above? The whole point there was to reduce the number of valid interpretations in order to avoid ambiguity.

I'm sorry, but I find your style of shifting topics unproductive.

Because...

Right. So why did you suggest implicit coercions as an alternative to explicit injections in the situation above?

Because they are an alternative to explicit injection. Further, implicit coercion comes closer than explicit injection to accomplishing what Alex has specified in his use cases at a syntactic level. That seems a good enough reason to suggest it.

The whole point there was to reduce the number of valid interpretations in order to avoid ambiguity.

And I'm saying that you can get by with more valid interpretations and still avoid ambiguity. So reducing the number of valid interpretations isn't necessarily helping you avoid ambiguity.

I'm sorry, but I find your style of shifting topics unproductive.

And I found your earlier question in response to the issue of ambiguity ("So how does increasing the number of possible coercions decrease the number of possible evaluation paths?") to be misleading. Avoiding ambiguity neither implies nor requires that one "decrease the number of possible evaluation paths".

Moot

Because they are an alternative to explicit injection.

Not in the approach I was referring to. Only with the much weaker notion of tuple extension you favor. But I have mentioned these alternatives already, so I find your comment to that simply moot, if not misleading.

Only with the much weaker

Only with the much weaker notion of tuple extension you favor.

I favor the weaker notion of tuple extension because none of Alex's use-cases truly require anything stronger than extension of tuples from just one side, and any strictly less capable (80%) solution is both much easier to accomplish without flaw and much easier to later modify under constraints of backwards compatibility. Your assertion that coercion creates ambiguity with the "only" exception being this weaker notion of tuple extension is unfounded and almost certainly incorrect.

But I do agree we are "talking past each other". We aren't accomplishing anything. For now, I move to table this discussion.

I assumed

I assumed that it was clear from context that I meant the "only" of the two (or three) alternatives under discussion. I don't see where I suggested otherwise.

Ur includes a generic record

Ur includes a generic record concatenation operator, which is close to what you're asking about. I don't make any attempt to provide complete type inference, though.