## Indispensable use cases for record subtyping?

Are there any cases where subtyped records are truly indispensable? Standard examples are cases like Point3D as a subtype of Point2D, but this isn't convincing.

Certainly Point2D corresponds to the (x, y) plane of Point3D, but that's only one of the three axis-aligned planes in 3D space. So while subtyping lets you program naturally in one 2D subset, chances are you'll need operations along arbitrary 2D planes, so you probably won't ever use Point2D's functions in programs using Point3D.

Other cases might be something like a CEO and an Engineer both subtyping a Person record type, but this seems like a domain-modelling failure. CEOs and engineers aren't different kinds of people; they're both just people with different roles. So I'd argue a more natural expression would be a single Person record with a sum field: type Role = CEO | Engineer.
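For concreteness, the sum-field alternative might look like this sketch in TypeScript (all names here are illustrative, not from any real codebase):

```typescript
// A single Person record with a Role sum field, instead of CEO/Engineer subtypes.
type Role =
  | { kind: "CEO" }
  | { kind: "Engineer"; specialty: string };

interface Person {
  name: string;
  role: Role;
}

// Functions over Person work uniformly; role-specific logic pattern-matches.
function title(p: Person): string {
  switch (p.role.kind) {
    case "CEO":
      return `${p.name}, Chief Executive`;
    case "Engineer":
      return `${p.name}, ${p.role.specialty} Engineer`;
  }
}

const alice: Person = { name: "Alice", role: { kind: "CEO" } };
const bob: Person = { name: "Bob", role: { kind: "Engineer", specialty: "Compiler" } };
```

Every function written for Person applies to both; adding a new role extends the sum, not the record.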

So are there any truly indispensable use cases for record subtyping, where it's truly the most natural expression of the domain?

### This is a question I've raised elsewhere with no replies

Specifically, I think the important question is whether a posteriori subrecords are necessary. A priori they may be very useful, but need no special machinery. For example, I wrote a set/bag library using a single data structure, composed of a hash table mapping elements to their number of occurrences and a flag indicating the maximum number of occurrences. The exposed set-* and bag-* functions checked the flag and then called an internal sob-* function which did the actual work. The only sob-* functions that needed to check the flag again were those which added or removed elements. It would be annoying to add a third subtype along similar lines (say, minibags with at most 4 elements), and it couldn't be done without the source code, but it's far from clear that you would really need such a thing anyway ("zero, one, or infinity").

### A relation

IMHO, subtyping is a complex relation between types that can be achieved by a set of more elementary operators. Sum and product operators, along with some others like intersect, divide (opposite of product), and a map function, can form an algebra for dealing with subtyped data.

I believe that the mentioned system of elementary operators is not the only system that can *hold-define* subtyping. I'd dare to say that the essential operators of a Turing machine also form such a system in terms of the runtime data it operates on. Therefore, any decent computer language has everything it needs to form a subtyping relation at runtime, even if subtyping is not a built-in feature at the language meta level.

The question is how clean the subtyping implementation can be. Probably the most raw implementation would involve an assembler command sequence with some I/O for checking types and operations on data, but it would be a very dirty implementation.

When choosing elementary operations, I like this combination: sum, product, intersect and map. What I'd like to see is a Turing complete language that uses only these. Yet to be seen...

### Higher-order & API boundaries

Consider some higher-order function f(x, g) that uses some element of x and then at some point calls g(x). If you want to allow g to accept additional elements in x, you need to somehow account for that in the type of f.

This is pretty common in framework code: architectures with callbacks or hooks, for example, where you want to allow extensible parameters with dynamic extent or request scope. The Go folks wrote about the Context interface that has been promoted to the stdlib: https://golang.org/pkg/context/ - In particular, note the example on the Value(key interface{}) interface{} method, which leverages private nominal types to accomplish a sort of branding that enables private entries in the parameter map.

As for the Point2D vs. Point3D example: I think the way you deal with that is by encoding nominal typing into a structural type via a brand field. For example, if you have a structure {tag: Point2D, x: 123, y: 456}, then {tag: Point3D, x: 123, y: 456, z: 789} is no longer a subtype.
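The brand-field trick is easy to demonstrate in a structurally typed language. A minimal TypeScript sketch (illustrative types, not a real library):

```typescript
// Literal "tag" fields make these structural types behave nominally.
type Point2D = { tag: "Point2D"; x: number; y: number };
type Point3D = { tag: "Point3D"; x: number; y: number; z: number };

function norm2(p: Point2D): number {
  return Math.sqrt(p.x * p.x + p.y * p.y);
}

const p2: Point2D = { tag: "Point2D", x: 3, y: 4 };
const p3: Point3D = { tag: "Point3D", x: 3, y: 4, z: 12 };

// norm2(p3); // rejected by the type checker: "Point3D" is not "Point2D",
// so Point3D is no longer a structural subtype of Point2D, despite having
// all of its fields.
```

Without the tag fields, the checker would happily accept the 3D point wherever a 2D point is expected.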

### Dimensions, dimensions

 
Point4D = {
    Origin: {
        Origin: {
            Origin: {
                Origin: null,
                Extrusion: 12
            },
            Extrusion: 34
        },
        Extrusion: 56
    },
    Extrusion: 78
};

x = Point4D.Origin.Origin.Origin.Extrusion;
y = Point4D.Origin.Origin.Extrusion;
z = Point4D.Origin.Extrusion;
t = Point4D.Extrusion;


### Dimensions continued

Ok, so I've done a bit of homework and this is what I've found out, if it can help:

//note that @Null is not an empty set. It behaves like (@Null × A) -> A, and (A × @Null) -> A
Dimension ⊆ @Float64;
SpaceTime ⊆ (
    Origin ⊆ (@Null ∪ @SpaceTime) ×
    Extrusion ⊆ @Dimension
);

//a bit of pattern matching does the magic
Point1D ⊆ @SpaceTime (@Dimension);
Point2D ⊆ @SpaceTime (@Dimension × @Dimension);
// you can even do this with the same effect
Point3D ⊆ @SpaceTime (@Point2D × @Dimension);

P ⊆ @Point3D (128 × 256 × 512);
x ⊆ @P.Origin.Origin.Extrusion;
y ⊆ @P.Origin.Extrusion;
z ⊆ @P.Extrusion;

There is a chance that parameter application could be done by an intersection like in the following example:

 P ⊆ @Point3D ∩ (128 × 256 × 512); 

so it reports an error if wrong parameters are used.

### Consider some higher-order

Consider some higher-order function f(x, g) that uses some element of x and then at some point calls g(x). If you want to allow g to accept additional elements in x, you need to somehow account for that in the type of f.

So you're suggesting that higher-order composition might require subtyped records of some sort:

type A ⊆ B
g : B -> ()
f : A ⊆ B.A -> (B -> ()) -> ()
f x:A = g x


I'm not convinced. In what way must the extra fields in A meaningfully be tied to B via a subtyping relation? This has an equally valid, albeit in some sense "less efficient", expression by passing around another product type:

g : B -> ()
f : (A,B) -> (B -> ()) -> ()
f (x,y) g = g y
--OR more simply
f : A -> B -> (B -> ()) -> ()
f x y g = g y


In general, it seems any extra fields with which you'd want to extend an existing product type, can instead be more easily provided by including the original product type as a field in a new product type.
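A minimal sketch of that wrap-instead-of-extend pattern, in TypeScript (names are illustrative):

```typescript
// Instead of extending B with new fields, nest B inside a new record A.
// All functions on B still apply to a.base; new functions are defined on A.
interface B { count: number }
interface A { label: string; base: B }

// A function written only against B.
function increment(b: B): B {
  return { count: b.count + 1 };
}

// A function written against the new, larger record A.
function describe(a: A): string {
  return `${a.label}: ${a.base.count}`;
}

const a: A = { label: "hits", base: { count: 0 } };
const a2: A = { ...a, base: increment(a.base) };
```

No subtyping relation between A and B is needed; A simply carries a B.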

In particular note the example on the Value(key interface{}) interface{} method, which leverages private nominal types to accomplish a sort of branding that enables private entries in the parameter map.

I don't quite see how this is related to record subtyping. Are you saying the Context interface is somehow a union of all record extensions made to some base record? Because that doesn't sound like something Go can do.

More likely that Context has some type of internal map and Value() indexes the map via some compiler-generated, private type identifier (your brand), in which case I don't see how record subtyping is related to data structures and runtime types.

### The important bit: API Boundaries

In general, it seems any extra fields with which you'd want to extend an existing product type, can instead be more easily provided by including the original product type as a field in a new product type.

Sure, if you control all of the existing types. However, in practice, you frequently do not control all of the existing types.

Are you saying the Context interface is somehow a union of all record extensions made to some base record?

No, I'm saying that the Context interface solves the same problem (extensible parameters crossing API boundaries) by providing untyped accessors.

More likely that Context has some type of internal map and Value() indexes the map via some compiler-generated, private type identifier (your brand)

That's exactly how it works. It contains a map[interface{}]interface{}, which is essentially just Map<Dynamic, Dynamic> in more popular notation. The empty interface type is just a struct of *reflect.Type and unsafe.Pointer. You're right about the "brand". All types are the equivalent of a "newtype" in Haskell, and you can make un-exported types. The primary reason that the Context type doesn't expose the underlying map directly is to prevent you from enumerating the keys, which gives you some safety against other code peeking at your private parameters, but still lets them flow through at runtime.
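The same branding idea can be sketched outside of Go. In TypeScript, a module-private symbol plays the role of Go's unexported key type: outside code can neither forge nor enumerate the key (illustrative names throughout):

```typescript
// A Context-like parameter map with a private, unforgeable key.
const userKey: unique symbol = Symbol("user");

type Ctx = Map<symbol, unknown>;

// Only this module can write the entry...
function withUser(ctx: Ctx, name: string): Ctx {
  const next = new Map(ctx);
  next.set(userKey, name);
  return next;
}

// ...and only this module can read it back.
function currentUser(ctx: Ctx): string | undefined {
  return ctx.get(userKey) as string | undefined;
}

const ctx = withUser(new Map(), "alice");
```

Other code can pass the map around freely, but the "user" entry is effectively private to this module.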

in which case I don't see how record subtyping is related to data structures and runtime types.

I only pointed out Context because it addresses the use case. It's a quasi-dynamic solution to the problem: it leverages static types to provide branding, but reflection/dynamics to provide extensibility. However, at runtime, the extra values flowing across the API boundary could very well be known statically. If you had row polymorphism for extensible parameter maps, then you could say "this defines a polymorphic function that takes at least these context parameters." Then at the call site, that type narrows to include additional parameters.

### Sure, if you control all of

Sure, if you control all of the existing types. However, in practice, you frequently do not control all of the existing types.

I'm afraid I still don't follow. If some module X requires a record B, and you write some functionality that depends on X but also needs fields in record A, then why wouldn't A just include a field for record B and pass that along? Or create a new record type C that includes both A and B as fields if you don't control record A?

However, at runtime, the extra values flowing across the API boundary could very well be known statically. If you had row polymorphism for extensible parameter maps, then you could say "This definition is of a polymorphic function takes at least these context parameters." Then at the call site, that type narrows to include additional parameters.

I'm skeptical it would be so easy to express this with row polymorphism in a way that isolates it from other modules accessing it, but you could achieve the same thing by having the context be a polymorphic parameter and giving clients accessor functions on that parameter.

### Passthrough

If some module X requires a record B, and you write some functionality that depends on X but also needs fields in record A, then why wouldn't A just include a field for record B and pass that along?

Now you're missing the "higher-order" part. It's not about module X depending on record B. It's about module X wanting to call a function in module Y, where module Y calls back to some function provided by X. If Y wants to allow passthrough of additional data, it needs to offer either 1) dynamic extensibility (e.g. a map) or 2) some kind of polymorphism.

A concrete use case from Go is request-scoped parameters. Say you want to write some so-called "middleware" function that takes an HTTP request and decorates it with the current user. The middleware function takes a context of type ρ, and the next function it calls takes a context of some type conjoin(user, ρ). In this case, you didn't write the middleware function, so you can't add a user key to its context type yourself.
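The conjoin(user, ρ) shape can be approximated with generics in TypeScript, where the type parameter R acts as the row variable (a hedged sketch; the authentication step is faked, and all names are made up):

```typescript
// Whatever else the context carries (R) flows through untouched; the
// middleware only promises to add a "user" field before calling next.
type WithUser<R> = R & { user: string };

function userMiddleware<R extends object, A>(
  ctx: R,
  next: (ctx: WithUser<R>) => A
): A {
  // A real server would authenticate the request here; hard-coded for the sketch.
  return next({ ...ctx, user: "alice" });
}

const result = userMiddleware({ requestId: 42 }, (ctx) => `${ctx.user}#${ctx.requestId}`);
```

The caller's extra field (requestId) is preserved in the narrowed type seen by next, without the middleware knowing about it.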

### Say you want to write some

Say you want to write some so-called "middleware" function that takes an HTTP request and decorates it with the current user. The middleware function takes a context of type ρ, and the next function it calls takes a context of some type conjoin(user, ρ).

I think the specifics are going to matter for clarity, so is this your middleware function:

middleware : ∀ a.ρ -> (ρ * User -> a) -> a
-- or alternately:
middleware : ∀ a.ρ -> (ρ -> User -> a) -> a


So module X provides module Y with a polymorphic function which acts as a sort of row variable. If this accurately captures your middleware, what do extensible records actually add to this? If this does not accurately capture the scenario, can you provide a type signature that does?

### Unordered composition

can you provide a type signature that does?

Not really, no, since I find almost all polymorphic type signatures to be completely inscrutable.

However, if I understand your type signatures correctly, you're creating an ordered product of some unknown data plus the User data. That may work for this very narrow use case, but it won't work when you have a rich stack of middleware where the order matters. You wind up in the nonsensical world of problems experienced by things like a monad transformer stack, where you're manually indexing into ordinal positions in a product type. If you add an extra middleware function into the pipeline, or reorder some, then all of a sudden you have to change a lot of code.

### Top-down vs. bottom-up

I think you're imposing an artificial top-down view of this pipeline, when the application can just as easily build the pipeline it needs bottom-up. In side-effecting pseudo-ML, something like:

module Pipeline : sig
type 'a t
val pipe : 'a t -> ('a -> 'b) -> ('a -> 'b) t
val hook : 'a t -> ('a -> ()) -> ()
val run  : 'a t -> 'a -> ()
end


So the program builds the pipeline it needs, in the order it wants, and can hook into it at any point. If it starts with 'a=HttpRequest, and adds 'a=Session and 'a=User, then it would end up with (HttpRequest -> Session -> User) Pipeline.t, which you hook into with a function (HttpRequest -> Session -> User -> ()). The explicit ordering isn't a big deal; a simple wrapper can swap argument order for modules that expect a different order, and all you need for this solution are first-class functions.
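One loose reading of that pseudo-ML signature, sketched in TypeScript using nothing but first-class functions (not the author's exact semantics; names are illustrative):

```typescript
// A pipeline over A is just a registry of hooks plus a way to extend it.
class Pipeline<A> {
  private hooks: Array<(a: A) => void> = [];

  // Extend the pipeline with a stage: running this pipeline feeds the
  // transformed value into the next one.
  pipe<B>(f: (a: A) => B): Pipeline<B> {
    const next = new Pipeline<B>();
    this.hook((a) => next.run(f(a)));
    return next;
  }

  hook(f: (a: A) => void): void {
    this.hooks.push(f);
  }

  run(a: A): void {
    for (const f of this.hooks) f(a);
  }
}

// Bottom-up construction: start with a request, derive a length stage.
const root = new Pipeline<string>();
const lengths = root.pipe((s) => s.length);

let seen = -1;
lengths.hook((n) => { seen = n; });
root.run("hello");
```

Each stage only knows the type flowing into it; the application composes stages in whatever order it chooses.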

Given a top-down view though, I'm still not convinced that extensible products are actually the right solution. You have each stage of the pipeline extending the request record with its own extra field which incurs a lot of unnecessary memory allocations. If you can lift all the extensions to happen once at initialization, I'd bet it will have an encoding without extensible products, which is why I asked for a type sig so we can discuss something concrete.

Otherwise, I think the dynamic map is the better solution in top-down architectures, and you can ensure correct indexing via static capabilities, which again don't require extensible products.

### Overspecifying

The explicit ordering isn't a big deal, a simple wrapper can swap argument order for modules that expect a different order

The explicit ordering actually is a huge deal. Consider what happens when both the pipeline framework and several stages/hooks in the pipeline are written by different authors. The person composing the pipeline needs to add wrapper functions in order to shuffle parameters around. This is complete noise and will cause a serious maintenance headache when you change one component and have to insert/remove/shuffle a bunch of adaptors.

More importantly, using a product type that is both ordered and closed is semantically wrong. It over-constrains the problem. What you want to express is the notion "I take in some data, including, but not limited to, X, Y, and Z, and then pass all of that data on to the next component." That imposes no order and makes membership explicitly open. Over-specifying types leads to brittle programs that require dramatic non-local changes when refactoring, among other problems.

If you ask me, ignoring unordered & open data is one of the biggest ongoing mistakes of language design. You need both ordered/unordered and open/closed. Usually ordered+closed go together well, as do unordered+open. You can get very far with vectors + maps/sets/whatever. Lots more languages should take this path.

Anyway, that's all for me on this matter.

### I'm not convinced that

I'm not convinced that argument shuffling is as huge a deal as you're implying, nor am I convinced that extensible products actually solve this problem any better. Like I said, extending a single record at each pipeline stage has terrible memory behaviour, so you want to build a single record and accept it once into your application, then delegate the constrained subsets to subcomponents, i.e. entryPoint : { r | request : HttpRequest, db : DbConnection, oauth : OAuthProvider } -> ().

This is literally no different than having a single entry point into your application that accepts all of your dependencies, (HttpRequest -> DbConnection -> OAuthProvider -> ()), and then delegating to the subcomponents the subsets they need.
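In a language with structural subtyping, the "delegate the subsets" part requires no machinery at all. A TypeScript sketch (dependency types stubbed out as strings for brevity):

```typescript
// The full dependency record, accepted once at the entry point.
interface Deps { request: string; db: string; oauth: string }

// A subcomponent declares only the structural subset it needs.
function handleAuth(d: { request: string; oauth: string }): string {
  return `auth(${d.request} via ${d.oauth})`;
}

function entryPoint(d: Deps): string {
  // Structural subtyping lets the full record flow to the narrower
  // parameter type with no adaptation or shuffling.
  return handleAuth(d);
}
```

The narrowing happens at the call, so no per-stage record extension (and its allocations) is needed.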

At both points, you have all of the dependencies in one place. If you want to allow other components to hook earlier in the pipeline before all dependencies are available (which you can forbid by design to avoid this possibility), shuffling combinators are one-liners:

Pipeline.hook p (fun a unused b -> componentInit b a)
-- or
let swap f = fun a b -> f b a
Pipeline.hook p (swap componentInit)


Neither of our approaches can ensure a total order of pipeline stages so that only needed dependencies are created before delegating to components, so some arguments must be ignored: yours by row variable, mine by dropped arguments. The type system will tell you precisely which shuffles are incorrect when swapping components, which are 10-second fixes each time. In terms of software maintenance, this seems like a pretty trivial problem, so I don't see the headache. In your approach, the above would be barely nicer:

-- 'a : {r | request : HttpRequest, db : DbConnection, oauth : OAuthProvider }
-- componentInit : { r | request : HttpRequest, oauth : OAuthProvider } -> ()
Pipeline.hook p componentInit


Of course, your approach assumes that each independently developed component will use the same labels when accepting the same component types, which I'm skeptical will be true in general. If untrue, then you need projection wrappers even in your scenario.

I think this example has probably hit a dead end; do you have another use case for open data that might be more compelling?

### Barely Nicer

What you call "barely nicer", I call not over-constraining my program. You might be OK with the one-liner data shuffles, but if changing some small part of your program causes non-local compiler errors that can trivially be fixed up by data shuffles, then you've over-constrained the types by specifying order where no order applies.

This sort of viral compiler error is precisely why working programmers frequently grow frustrated with languages having type systems more advanced than the typical OOP ones. It's also why inference is so important. If you force people to change irrelevant stuff to satisfy the compiler, then you invite them to write worse code to work around the problem. Consider the lack of effect inference on checked exceptions that leads Java programmers to try/catch/printStackTrace to shut up the compiler. The same API-boundary problem occurs there too: somebody creates a class without a declared exception type and you subclass it, but now you need to try/catch/rethrow as an unchecked exception in order to satisfy the effect type when overriding a method.

### Consider the lack of effect

Consider the lack of effect inference on checked exceptions that leads Java programmers to try/catch/printStackTrace to shut up the compiler.

I don't think this is a faithful analogy. Changes to checked exception signatures incur unpredictable, far-reaching non-local breakages, which is why inference is so important. That's not the case for the example we've been discussing. Changing argument order in a pipeline which is hooked into N times requires at most N small tweaks (and N is small in this application). Changing an exception signature of a function called N times requires at least N changes at each call site, possibly changing each call site X's exception context, and then repeating the previous steps for each site that calls X.

A much better example of the kind of row polymorphism you've been describing is Caml-Shcaml. They use row polymorphism to replace the unstructured, line-oriented text output of typical shell pipelines with streams of structured records, and pipeline transducers are roughly functions over polymorphic records. This is the most compelling case I've seen so far, but:

1. shell scripts are quick ad-hoc programs that are not necessarily intended to be maintainable long-term. Hardly typical of large-scale structured programming.
2. this example argues for polymorphic field access, not for any kind of record extensibility
3. the justification for polymorphic field access implicitly depends on assigning types to field-polymorphic programs, i.e. their motivating example is cat /etc/passwd | cut -d: -f1, where the field being selected is the user name. But if we don't artificially constrain ourselves to assigning types to existing combinators, then that script is easily replaced by something like cat /etc/passwd | map (fun x -> x.username). The "cut" combinator serves no real purpose when you already have first-class functions.
4. row subtyping doesn't solve this problem, which is what I originally asked about; so there's still no evidence of real utility to record subtyping that I've seen so far
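The replacement suggested in (3) is easy to make concrete. A TypeScript sketch with a made-up record type standing in for parsed passwd entries:

```typescript
// With first-class functions and typed records, a field-selecting lambda
// does the job of a "cut"-style combinator.
interface PasswdEntry { username: string; uid: number }

const passwd: PasswdEntry[] = [
  { username: "root", uid: 0 },
  { username: "alice", uid: 1000 },
];

// The moral equivalent of: cat /etc/passwd | cut -d: -f1
const names = passwd.map((x) => x.username);
```

No polymorphic field access is needed here; the lambda fixes the record type it projects from.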

### data integration

Extensible records only simplify data shuffling along one dimension (the linear structure of the record). They don't help with different representations (slightly different labels, types). They don't help with deeply structured data (e.g. operating on a node in a tree, composite structures, different groupings of data). They don't help with distributed representation (gathering data from multiple locations in a collection, redistributing results). Do extensible records really offer much benefit?

Data model integration is something I've spent a lot of time thinking about, as part of my broader vision. There are many spatial (scatter/gather, representation), temporal (event aggregation and translation), and security issues when integrating heterogeneous models used by different parts of a system. The problems are a bit easier within a single language, but not by much. I consider the existence of 'frameworks' evidence of 'abstraction failure' in modern programming paradigms when it comes to heterogeneous, real-time data integration.

It is my impression that weak partial solutions (like extensible records) don't offer much to solve the larger problem. Instead, developers end up with inscrutable type signatures and a relatively specialized set of benefits.

Techniques such as lenses have potential to work more generally. I developed a paradigm around an effectful, reactive variant of spatial-temporal lenses - reactive demand programming - based on my observations of the problem. I plan to pursue such things further, over the next few years.

When I consider which data types would do me the most good, I suspect polymorphic variants (i.e. lightweight dependent pairs) would do me more good than order-independent extensible records. If I want order-independent records, I can construct or sort a list of polymorphic variants by their label.

### Namespaces and Flat structures

They don't help with different representation (slightly different labels, types).

Slightly different labels are usually caused by a lack of a namespace mechanism for the record keys. Compare to Clojure's namespaced keywords: You can comfortably have :foo/name and :bar/name in one map and refer to them succinctly within their respective namespaces.

Slightly different types can often be accommodated by subtyping, often of the extensible record variety.

They don't help with deeply structured data (e.g. operating on a node in a tree, composite structure, different groupings of data),

Namespaced keys help here too by enabling flatter representations. For example, instead of representing a song and its album as {:id 123, :name "A Song", :album {:id 456, :name "The Album"}}, you can represent it as {:song/id 123, :song/name "A Song", :album/id 456, :album/name "The Album"}, and suddenly this representation is much more broadly useful: it can be used where either a song or an album is expected.
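The same flattening trick can be sketched in a statically typed setting, with slash-prefixed string keys playing the role of Clojure's namespaced keywords (illustrative types only):

```typescript
// Two "namespaces" of fields, each its own interface.
interface SongFields { "song/id": number; "song/name": string }
interface AlbumFields { "album/id": number; "album/name": string }

// One flat record satisfies both interfaces structurally.
const rec: SongFields & AlbumFields = {
  "song/id": 123,
  "song/name": "A Song",
  "album/id": 456,
  "album/name": "The Album",
};

// Code expecting only album data accepts the flat record as-is.
function albumName(a: AlbumFields): string {
  return a["album/name"];
}
```

Because the keys never collide, the flat record can flow anywhere a song or an album is expected.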

Do extensible records really offer much benefit?

Yes. At least in my experience with Clojure's maps and, to a lesser extent, Go's struct embedding. Programs that use associative data instead of positional data tend to be much more resilient to change, since most changes are adding new keys.

developers end up with inscrutable type signatures

Can't argue with this. Stop writing so many type signatures. Or at least choose a better way to represent records: keysets.

If I want order-independent records, I can construct or sort a list of polymorphic variants by their label.

This is so bazaar to me, since I want order-independent records approximately half of the time. Forcing a total order on to keys seems like a weird way to accomplish this and fails to accommodate the "open" requirement.

### bazaar

If you have namespace labels, you'll just need to translate between slightly different labels from entirely different namespaces. E.g. instead of {x, y} vs. {x,y,z} for point2D vs. point3D, you'll have {p2d/x, p2d/y} vs. {p3d/x, p3d/y, p3d/z}. And when you need to use your 3D point in some other library, you'll learn that it suddenly expects {Point3D/X, Point3D/Y, Point3D/Z}.

That's a contrived example, of course. But it's representative of my experience integrating systems (my day job for five years - integrating robotics systems with diverse sensors, payloads, software drivers).

Your composite song/album record is contrived in its own way. Why would song and album be part of the same record anyway? There are many different compositions: album description is referenced by song, album contains a collection of songs, etc. And when there's more than one way to do it, people will do it every which way. So you'll still have the full translation issues between independently developed services.

Programs that use associative data instead of positional data tend to be much more resilient to change, since most changes are adding new keys.

I agree that associative structures are relatively robust to extension. Though, I think whether most changes involve "adding keys" depends primarily on whether that's on a path of least resistance. There are quite a few ways to do associative structures. Component Entity System and Behavior Object Tag come to mind.

Anyhow, changes to a data structure by a single developer don't even begin to cover what you describe as "important bit" issues surrounding API boundaries.

This is so bazaar to me

Funny you should misspell 'bizarre' so appropriately. Because API boundaries really are a bazaar. And the many ways of representing things (data, events, APIs, security, etc.) is a source of many challenges for data and system integration.

### scale affects clean design and lack or presence of mess

If you ignore scale and complexity, I think subtyping is strictly an optimization, by doing similar things in one place. (I.e.: this code applies to more than one subtype, which happen to have this part in common).

Are there any cases where subtyped records are truly indispensable?

Probably not indispensable; more like: it greatly reduces (net) ways to cause trouble.

You can always duplicate code, so each type has its own disjoint set of associated routines. But managing that duplication gets harder the more there is, eventually surpassing a dev's ability to track when reasoning, once things get hairy enough. (If tools handle it, a dev must still debug the tools, so the problem persists.)

It seems another variant in the "together or scattered?" decision in design. Together is often an argument for efficiency, or for clarity, but it entangles things. Since you can always gather things that were scattered, it doesn't seem formally different in result from together. But when it comes to cohesion, enough quantity can cause a difference in quality.

This example may not seem related: if you represent dependents of objects in memory, is the dependency relation stored as a separate relation? Or does it hang right off each object that might have dependents? The former makes them less entangled. The latter puts the info right at hand when you modify state that might require notification. If this happens with extremely high frequency, and usually there are no dependents, it is more efficient to see the answer is no right now. You get the same results either way, if you don't care how fast things run, or whether some details are weirdly privileged. Visibility can affect developer reasoning, and given more loose pieces a bigger mess can be made; so there are pragmatic concerns.

### You can always duplicate

You can always duplicate code, so each type has its own disjoint set of associated routines.

I'm not sure why you think duplication would be needed. Imagine that whenever you'd want to extend a record type X with a new field, you instead simply create a new record type Y with the new fields and a field referencing X. All functions on X are still defined only on X; all new functions are defined on Y.

This arrangement would seem to be better factored for testing purposes as well, so I don't understand your references to duplication and scattering.

### to understand, expect 'divergent thinking' instead of convergent

I didn't make my point clear. No duplication is needed if you subtype just as you say, which is the normal way to do it in C. My point was that if you didn't want to subtype, you could duplicate, though this would have little to recommend it beyond making things independent.

My point about scattering was that it didn't materially change things, besides multiplying entities, not that it was a good idea without a reason. I aimed for a remark on necessity by comparing alternatives. But most things I say over a sentence in length get a dicey reception.

### My point was that if you

My point was that if you didn't want to subtype, you could duplicate, though this would have little to recommend it beyond making things independent.

This is what I was asking about. I don't see how this would work. The example I gave wasn't of subtyping, it's just building a tree of records matching the dependencies, rather than flattening that tree into one subtyped record.

### Effects models. World models.

Row polymorphic types are very useful for modeling an extensible set of 'effects' each with a separate handler or capability token. Each subprogram may only use a few effects, but our overall program uses the composite set.

For world->world functions, it is similarly common that subprograms in composition operate on different (partially overlapping) fragments of the world. In that case, having each world be an appropriate subtype is useful.

But there are other approaches. Lenses. Typeclasses. Traits. Interfaces. I do not believe row polymorphic types are essential. Just sometimes convenient.

In the case of using Point3D as a Point2D, it seems some form of lens would be appropriate, i.e. projecting a 2D plane intersecting our point in 3D space, possibly with curvature. Of course, this wouldn't help unless either the Point2D API was written with lenses in mind or our language enables their use transparently (getters, setters). A simple, robust option might be to just explicitly project 3D to 2D and back.

### Row polymorphic types are

Row polymorphic types are very useful for modeling an extensible set of 'effects' each with a separate handler or capability token. Each subprogram may only use a few effects, but our overall program uses the composite set.

But extensible effects don't really support the case for record subtyping, as they only require open unions. It seems overkill to include the former just for the latter.

I'm looking for some examples where record extension and subtyping are really the most natural solution to a problem. I cited examples where the availability of record subtyping easily leads you astray from a proper domain model, so if there are no good cases, I plan to include extensible sums but not extensible records.

### sums vs. products

If you take polymorphic labeled sums together with first-class functions, you effectively get polymorphic labeled products because you can ask the function for its value at a given label, and you can construct new functions that override the value at a given label.
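That encoding can be made concrete: a "record" is a function from labels to values, and overriding a field builds a new function that shadows the old one. A minimal TypeScript sketch (the open label space is modeled as plain strings):

```typescript
// A product encoded as a function from labels to values.
type Rec = (label: string) => number;

// The empty record: every label maps to a default.
const origin: Rec = (_label) => 0;

// Override the value at one label, delegating to the old record otherwise.
function set(r: Rec, label: string, v: number): Rec {
  return (l) => (l === label ? v : r(l));
}

// "Extending" the record is just function composition.
const p = set(set(origin, "x", 3), "y", 4);
```

Asking p for a label walks the chain of overrides, exactly the "ask the function for its value at a given label" move described above.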

To go the opposite direction, treating a product as a sum type, requires dependent types. Or dynamic types.

So yeah, I can see why you might prefer to focus on sums at the language level. They do offer a better return on investment in the context of the conventional static-typing assumption that all branches of a conditional expression should return the 'same' type.

I guess the main advantage of supporting both products and sums would be better GC support for the products compared to simulating them via functions.

### I guess the main advantage

I guess the main advantage of supporting both products and sums would be better GC support for the products compared to simulating them via functions.

Absolutely, but if extensible products tend to yield bad domain models, I'd just as soon not include them. The path of least resistance should then (hopefully) yield more maintainable programs. If they ever need to be simulated, then I suppose that will yield the data I'm looking for.

What about final encodings of data types?

Isn't symmetry worth something too? Having normal sums and products is good, so why make it asymmetric for extensible sums and products?

Another argument is that extensible sums and products are nothing but ordinary sums and products with extra convenience. You can, to some extent, do homegrown row polymorphism with Pair[Int,Pair[Bool,R]], where R is the rest of the tuple, i.e. the row variable. Row polymorphism extends this to normal records with named fields rather than nested pairs that have to be indexed into manually.
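
The nested-pair trick can be sketched with ordinary parametric polymorphism in TypeScript (names illustrative): R is the row variable, and a function that only needs the leading field stays polymorphic in the rest.

```typescript
type Pair<A, B> = { fst: A; snd: B };

// Only needs the leading Int field; polymorphic in the rest of the row R.
const bumpFirst = <R>(p: Pair<number, R>): Pair<number, R> =>
  ({ fst: p.fst + 1, snd: p.snd });

// Works on any row extending the Int prefix, e.g. Pair[Int, Pair[Bool, String]]:
const row: Pair<number, Pair<boolean, string>> =
  { fst: 41, snd: { fst: true, snd: "rest" } };

const bumped = bumpFirst(row); // fst becomes 42; the rest is untouched
```

The manual indexing (`snd.fst`, `snd.snd`, ...) is exactly the inconvenience that named-field row polymorphism removes.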

### Isn't symmetry worth

Isn't symmetry worth something too? Having normal sums and products is good, so why make it asymmetric for extensible sums and products?

Software engineering reasons. Like I said, it's fairly common to use product extension where it's not appropriate; it seems harder to make this mistake with sum extension. I'm trying out an opinionated stance on what sorts of extensions are actually useful and maintainable.

Final encodings seem like the best example presented so far, where you might want to extend a record of functions representing a final encoding. I'll have to think about it.

### Regarding the path of least

Regarding the path of least resistance leading to bad designs I think there is a bit of a difference between subtyping and row polymorphism, because row polymorphism is much like parametric polymorphism. If you really need it you can get subtyping out of it by existentially quantifying the row variable, but that's not nearly as much a path of least resistance as having subtyping everywhere by default.

### I agree, but the issues I

I agree, but the issues I raised also question the utility of record extension, which also applies to row polymorphic extensible records. I see the utility of polymorphic field access, but why would you want to extend a record?

The main application I can think of is an embedding of a relational language, where you may project a set of fields into a product for joins and selection, but do tuples not suffice? If your product is so large that a tuple is unwieldy, then you should probably use a record type with a name anyway.

### Tuples suffice, but names

Tuples suffice, but names are more convenient. Having row polymorphism for records and variants without having record/variant extension might be good enough. A reason for not including record/variant extension would be simplicity, but I don't immediately see why this construct would lead to bad paths of least resistance like subtyping.

Let S be a finite set of labels for a particular record or variant type e.g. S = {Foo,Bar,Baz} and let F : S -> Type be a function that assigns a type to each label. Then a record type is simply the dependent product Πx:S. F(x) and a variant type is simply the dependent sum Σx:S. F(x). Modeling records and variants as syntactic sugar over these constructs may be a good approach. This allows you to do row polymorphism and record extension and even fancier things like record merge in user space, and you can choose whether or not you add syntactic sugar for each particular construct.
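
TypeScript's mapped types give a rough approximation of this construction (a sketch only, not the full dependent-type story; names illustrative): S is a label sum, F assigns a type to each label, and the record and variant are both derived from them.

```typescript
// The label set S and the type assignment F : S -> Type.
type S = "Foo" | "Bar" | "Baz";
type F = { Foo: number; Bar: boolean; Baz: string };

// Record ≈ Πx:S. F(x): a value at every label.
type Rec = { [K in S]: F[K] };

// Variant ≈ Σx:S. F(x): one label paired with the value at that label.
type Var = { [K in S]: { tag: K; value: F[K] } }[S];

const r: Rec = { Foo: 1, Bar: true, Baz: "hi" };
const v: Var = { tag: "Foo", value: 1 };

// Polymorphic field access falls out of the encoding:
const get = <K extends S>(rec: Rec, k: K): F[K] => rec[k];
```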

### Row polymorphic extension is

Yes, names are more convenient, which is why I said when tuples would be inconvenient, you should use a named record type.

Row polymorphic extension is definitely a murkier case than subtyping, but a) like you said, it's easy to simulate subtyping if you have it, so those coming from subtyped languages will abuse it until they learn better, and b) extension is certainly not a cheap operation, so there's a performance-path-of-least-resistance case for leaving it out.

I'm open to it though, but like with subtyping, I'd like to see some actual use cases first.

### least resistance

Isn't symmetry worth something too? Having normal sums and products is good, so why make it asymmetric for extensible sums and products?

Language design isn't just about providing the most power to the programmer, it's about providing a good programming experience. Asymmetric designs with a clear "path of least resistance" can be useful insofar as they guide independent developers to consistent models that are more easily integrated.

### Row polymorphism modelling effects

I wish I were convinced row polymorphism is useful for managing effects. I have a system where any procedure or function type can be indexed by any type (including record types), written D -> [E] C. Using (scoped) row polymorphism, certain things can be calculated, but it seems limited. I use D -> 0 for a procedure, so:

```
proc f[E] (g: D -> [E] 0) (x: D) : [(IO:1 | E)] {
  g x; doIO;
}
```

This says f has whatever effects E that g has, plus it does IO; IO is a record field label with type unit. The effects type annotation is purely manual (the code has to be analysed and the type calculated by hand). Procedure and function index types are checked by the usual specialisation rules. There are some obvious uses: for example, with the default index type unit, a function parameter of type D -> C means D -> [1] C, so the type checker guarantees the function argument is annotated pure, which means the function itself can safely be annotated as effect-free.
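
A hypothetical sketch of the same shape in TypeScript (illustrative only, not the system described above): the effect row is a record of capability tokens, the row variable E is an ordinary type parameter, and f demands its own IO capability on top of whatever g needs.

```typescript
type IO = { io: (msg: string) => void };

// f is polymorphic in the effects E of its argument g, and adds IO
// on top: callers must supply capabilities of type E & IO.
const f = <E>(g: (caps: E) => void) => (caps: E & IO): void => {
  g(caps);          // run g with its own effects E
  caps.io("done");  // f's additional IO effect
};

const log: string[] = [];
const run = f<{ count: () => void }>((c) => c.count());
run({ count: () => log.push("count"), io: (s) => log.push(s) });
// log is ["count", "done"]
```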

Since it is also possible to remove known fields, we can also provide annotations that say that an argument must throw exception X, and since we catch it in our function, we can remove it from our type index.

```
proc f[E] (g: D -> [(ThrowX:1 | E)] 0) (x: D) : [E] {
  try { g x; } catch X => ;
}
```

But I question whether this is really useful "in the large". A few simple examples are not the same as the impact on a large system. For example, OCaml polymorphic variants look really cool, but in practice they're not as useful as they seem, because extra flexibility always incurs extra housekeeping costs.

```
distribute (manufacturer, product, region) <: produce (manufacturer, product)
```

Wherein subtyping means an existence dependency?

### Is this supposed to model

Is this supposed to model distributors and producers, where a distributor is a producer that's tied to a region? If so, like the Person example, these seem to be roles for a commercial entity, not intrinsically different kinds of entity.

But I'm not sure what "existence dependency" might mean here, so I'm probably off.

### Relationships, not entities

It's supposed to model distribution of products in regions by manufacturers, and production of products by manufacturers. By existence dependency I meant a (composite) foreign key constraint - distribution implies production. However, production doesn't necessitate distribution.

Unlike your Person example, you can't just add a Role field: a nullable Role field conflates different predicates and isn't a clean solution at all. Also, unlike your Point example, you can't just project distribute to get produce.
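
One reading of the `<:` claim, sketched with structural width subtyping in TypeScript (table and field names illustrative): any distribute row is usable where a produce row is expected, with no explicit projection.

```typescript
type Produce = { manufacturer: string; product: string };
type Distribute = Produce & { region: string };

// Code written against produce-shaped keys:
const produceKey = (p: Produce): string => `${p.manufacturer}/${p.product}`;

const d: Distribute = { manufacturer: "Acme", product: "Widget", region: "EU" };

// A distribute row can be passed directly. Note this gives the
// implication only row-by-row: it says nothing about the FK constraint
// between the two tables as sets.
const k = produceKey(d); // "Acme/Widget"
```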

### Got it, so you have two

Got it, so you have two composite keys, one of which has an extra field. So how does subtyping between these keys actually help?

Relationally, you'd probably also want to slice this data many ways, just like the point example, i.e. you'd want to see (manufacturer, region), (product, region), etc. So it seems like you'd still want explicit projection.

And while you can't project a distribute result set to get a produce result set, I'm not clear why subtyping would help.

### In a DBMS with support for

In a DBMS with support for FK constraints and joins, it's redundant. In non-relational systems like NoSQL stores, it could provide some of the same benefits.

### Can you elaborate how key

Can you elaborate how key subtyping can help you with a simple key-value store? I don't understand what benefits this is supposed to yield.