## Formalization and programming language design -- explained to all

As part of an introduction to my own academic work, I wrote a rather general introduction (for non-specialists) to the mathematical approach to programming language design. This introduction reasonates with some eternal LtU debates, so I thought that I could propose it here.

I suspect that many frequent posters it will find it very old-school. It is old school, and it is not at all radical text -- I could be more radical, but this would not be appropriate for this document. In particular, it is mostly a post-hoc justification of the general "mathematical" approach as represented in mainstream PL conferences. I'm in favor of supporting diversity and underdog approaches, but I still think the mathematical approach has a very strong case, and I think the claims in this text could make consensus across many LtU members. Feedback is welcome.

Humans programmers have invented many different symbolic representations for computer programs, which are called programming languages. One can think of them as languages used to communicate with the computer, but it is important to remember that programming is also a social activity, in the sense that many programs are created by a collaboration of several programmers, or that programs written by one programmer may be reused, inspected or modified by others. Programs communicate intent to a computer, but also to other human programmers.

Programmers routinely report frustration with the limitations of the programming language they use -- it is very hard to design a good programming language. At least the three following qualities are expected:

• concision: Simple tasks should be described by simple, not large or complex programs. Complex tasks require complex programs, but their complexity should come solely from the problem domain (the specificity of the required task), not accidental complexity imposed by the programming language.

For example, early Artificial Intelligence research highlighted the need for language-level support for backtracking (giving up on a series of decisions made toward a goal to start afresh through a different method), and some programming languages make this substantially easier than others.

• clarity: By reading a program description it should be easy to understand the intent of its author(s). We say that a program has a bug (a defect) when its meaning does not coincide with the intent of its programmers -- they made a mistake when transcribing their thoughts into programs. Clarity is thus an essential component of safety (avoiding program defects), and should be supported by mechanized tools to the largest possible extent. To achieve clarity, some language constructions help programmers express their intent, and programming language designers work on tools to automatically verify that this expressed intent is consistent with the rest of the program description.

For example, one of the worst security issues that was discovered in 2014 (failure of all Apple computers or mobile phones to verify the authenticity of connections to secure websites) was due to a single line of program text that had been duplicated (written twice instead of only once). The difference between the programmer intent (ensure security of communications) and the effective behavior of the program (allowing malicious network nodes to inspect your communications with your online bank) was dramatic, yet neither the human programmers nor the automated tools used by these programmers reported this error.

• consistency: A programming language should be regular and structured, making it easy for users to guess how to use the parts of the language they are not already familiar with. Programming languages must be vastly easier to learn than human languages, because their use requires an exacting precision and absence of ambiguity. In particular, consistency supports clarity, as recovering intent from program description requires a good knowledge of the language: the more consistent, the lower the risks of misunderstanding.

Of course, the list above is to be understood as the informal opinion of a practitioner, rather than a scientific claim in itself. Programming is a rich field that spans many activities, and correspondingly programming language research can and should be attacked from many different angles: mathematics (formalization), engineering, design, human-machine interface, ergonomics, psychology, linguistics, sociology, and the working programmers all have something to say about how to make better programming languages.

This work was conducted within a research group -- and a research sub-community -- that uses mathematical formalization as its main tool to study, understand and improve programming languages. To work with a programming language, we give it one or several formal semantics (defining programs as mathematical objects, and their meaning as mathematical relations between programs and their behavior); we can thus prove theorems about programming languages themselves, or about formal program analyses or transformations.

The details of how mathematical formalization can be used to guide programming language design are rather fascinating -- it is a very abstract approach of a very practical activity. The community shares a common baggage of properties that may or may not apply to any given proposed design, and are understood to capture certain usability properties of the resulting programming language. These properties are informed by practical experience using existing languages (designed using this methodology or not), and our understanding of them evolves over time.

Having a formal semantics for the language of study is a solid way to acquire an understanding of what the programs in this language mean, which is a necessary first step for clarity -- the meaning of a program cannot be clear if we don't first agree on what it is. Formalization is a difficult (technical) and time-consuming activity, but its simplification power cannot be understated: the formalization effort naturally suggests many changes that can dramatically improve consistency. By encouraging to build the language around a small core of independent concepts (the best way to reduce the difficulty of formalization), it can also improve concision, as combining small building blocks can be a powerful way to simply express advanced concepts. Finding the right building blocks, however, is still very much dependent of domain knowledge and radical ideas often occur through prototyping or use-case studies, independently of formalization. Our preferred design technique would therefore be formalization and implementation co-evolution, with formalization and programming activities occurring jointly to inform and direct the language design process.

## Comment viewing options

### Simplicity, clarity, conciseness, consistency - Human factors

Let me first say that I'm a supporter of formal methods and that logical consistency is of course a foundational requirement for a programming language. However I think a common mistake in formal languages is to assume conciseness and/or consistency in notation directly correspond to clarity or simplicity for (most) human beings. Before automated tools I think - when type-checkers were people - having a highly regular, minimal notation was probably helpful for the type of people willing to do such tasks - as it is a tedious and highly error-prone task. However, if you take natural language as a cue - you'll see that it retains a certain amount of redundancy and that the very most common constructs are irregular. Compressed, highly-regular constructs tend to come across as mind-numbing to most people. Since programming languages have already become far more mainstream than traditional math and logic and seem to be ever-increasingly so, this may be something worth considering should you wish to reach a broad range of users and to avoid that eyes-glazed-over response. Now, I'm not saying I know what the best "language" should be from a human factors point of view - I'm just suggesting that it may not turn out to be the most concise or consistent (in the sense of regular structure) one. Also irregularity is undesirable from a meta-programming point of view so it seems there's a challenge in incorporating such effectively - note that we've done that to accommodate machine "factors" - e.g boxing of primitives.

### Specific examples?

Not knowing much about linguistic, I don't feel comfortable arguing about natural languages. While I could see some of what you say being potentially true, I think I would find it more convincing if supported by specific examples. Which features of programming-languages would you suspect of being mind-numbing, and would be helped by a bit more irregularity?

The only example I can find would be "parentheses in Lisp", but:
- I am not convinced people actually mind once they have passed the familiarity barrier (but that may matter nonetheless for adoption);
- In any case it is quite clear to me that the math-based approach sketched in this introduction is ineffective to judge most details of concrete syntax (it only catches the spectacular mistakes such as Coffeescript's absence of local binding construction) and thus would not apply in this case.

Do we have examples of somewhat less superficial aspects of a programming language where your point on regularity decreasing usability applies?

### off the top of my head..

- point vs point-free notation
- foreach / select / do-notation vs lambda/function-call notation for functors/monads
- while loops vs tail-recursion
- Record and data declarations vs Tuples and Either

### Domain-specific constructions?

Thanks, the examples really help.

Point-free notation is easily readable when it describes a linear processing pipeline (composition of one-input one-output transformations), and it seems to me that it is in this quality that it is most often used, with much delight in some communities; I would say that in this situation it is not clear that one option is clearer than the other. The named notations have a clear advantage in readability when either the introduction of an intermediate name helps comprehension (this is linguistic, but not related to (ir)regularity); or when we start needing non-linear plumbing, for example expressing (fun x y -> x * x + y * y) in a concatenative language is a pain.

I'm not sure what you mean on the foreach/select side. I find it notable that, some language have a "regular" discipline of foreach-style iteration, see Racket iteration for example. (But this is not presented as composition of atomic combinators, so it may actually support your point.) On a similar note, people appreciate the monad-comprehension syntax generalizing the set-comprehension syntax, which I suspect is more regular than foreach/select and equally convenient. On the other hand, it is also often preferred over the generic use of >= operators, which are arguably more regular.

From this example and the one on loops and tail-recursion, I would propose the following principle: for simplicity, people prefer to use the least expressive construction that still allows them to express their goal. Language and library designers are of course free to express the less expressive constructions in term of the more general ones, but this raises language design questions:
- which less-expressive notations should be retained for this usage?
- is this a question of familiarity with existing systems, or are there natural hierarchies of expressive constructs?

On the second question, I would vote for the second choice (natural hierarchies), both because it is the more pleasing choice, and because this is clearly true in certain domains, such as formal language grammars for example, from regular languages to LL/LR to context-free grammars to arbitrary parsers.

### foreach/select

Yes, I meant monad-comprehensions. For, foreach, select, in addition to do-notation have been used in various languages.

I mentioned redundancy in addition to irregularity. Some of these are examples of that.

As regards point-free notation - I doubt the communities that delight in it fall into the mainstream.

I don't know the answers to the questions you raise.

Also to the above list I'd add:

- uncurried vs curried functions

### General Principle

for simplicity, people prefer to use the least expressive construction that still allows them to express their goal

I think this reverses cause and effect. The lesser expressive construct usually only exists because it is simpler than the more general construct. If the general construct is simpler, we would have gotten rid of or never added the less expressive, more complicated construct. Sometimes, for simplicity, we choose a less expressive construct, but restricted expressiveness isn't the preference.

### That ignores the paradox of

That ignores the paradox of choice, though. The more general construct is more complicated intrinsically because more can be done with it. It doesn't narrow the design space as much as a more limited construct does.

### I don't agree

Generality doesn't imply complication. There are two ways you can generalize something: by handling additional cases explicitly or by removing irrelevant assumptions. The former tends to add complication but the latter tends to make things more general without complication.

### Generality does imply

Generality does imply complexity where choice is involved. This is basically saying that parametric > non parametric or abstract > concrete, not incredibly controversial statements. You might be able to make it up with complexity savings elsewhere, but that is something else.

### simplicity vs. power

You can't say anything sweeping like "the least expressive construct is better because"
1) people prefer it
or
2) it optimizes better

Let me give a counter example.
Lua has only one collection type, tables.
Lua users are preferring perhaps the most complex type because the trade off is that THEY ONLY HAVE TO LEARN ONE TYPE.
so simpler can be I only learn one type, but it's a more expressive type.

Similarly, in terms of optimization, in the context of a language with untyped variables, having fewer types optimize better than having more optimal types. So the jitted version of Lua has only one collection type and only one numeric type (doubles). That optimizes better than having a more complex numerical stack, and the one collection type, under the covers, optimizes to multiple types depending on use, arrays, queues, mutable tables, immutable tables, the combination of the tables and arrays - the caveat being that the optimizer has to be able to back out of an optimization at run time when the assumption fails.

So which is preferred depends on context.

For sophisticated users, there could be a preference for complex types and constructs because the more expressive uses of those might be wanted. So even constructs that can invalidate whole programs might be wanted, threads, full continuations (say if one wants non determinism or search or constraints).

### Agreed; premise is weak

That's a great example.

And more to the point, programmers are typically asked to be comfortable with *not* precisely knowing what's going on. Likewise, language features are instead pushed into libraries, and their definition is subject to change.

Put another way: how deep must the understanding go before there are diminishing returns to formalization?

I am not against formal semantics, but I am all for high impact work, and problem definition is the starting point :)

### Knowing what's going on

Unfortunately you have to depend on knowing what the optimizer can do.

For instance luajit can make arrays that are queues by cheating about where the array starts, but if you make holes in the middle rather than the end, eventually it can turn your queue into a hash table.

So even though the abstract behavior is the same, the difference in speed can be important.

Similarly previous versions of luajit could optimize using a table as an object definition with immutable methods - but not if the same object was also indexed as an array.

You can't get away from having to know what's going on, even if just for the sake of speed and memory usage.

I always thought the problem with Prolog's depth first SLD resolution was that you have to know exactly how it works, not because it's powerful, but because it has limitations that could be considered bugs. And the non-logical features require that you know exactly the algorithm.

Another example of "you have to know everything". In a logic language with a smarter search and no extra-logical features, then you could almost get away with knowing just "it works" - except you still have to know how it scales.

### Unwarranted prescriptivism

Certainly I sympathize with and share the motivation to study (programming) languages from the mathematical perspective. I become concerned when the descriptive approach (build a mathematical model of how people use a language and what and how they intend to express) becomes prescriptive: build the language according to so-and-so clearly right'' principles.

For example, I take an issue with the following claim.

consistency: A programming language should be regular and structured, making it easy for users to guess how to use the parts of the language they are not already familiar with. Programming languages must be vastly easier to learn than human languages, because their use requires an exacting precision and absence of ambiguity. In particular, consistency supports clarity, as recovering intent from program description requires a good knowledge of the language: the more consistent, the lower the risks of misunderstanding.
Whence this should be'' come? What is the evidence for such a prescription for regularity? Why ambiguity is implicated in the difficulty learning (and using)? The claim is not borne by facts. For a very thorough discussion, I strongly recommend the following recent paper by Wasow http://www.stanford.edu/~wasow/Ambiguity.pdf
He wrote many papers on the topic of ambiguity, http://www.stanford.edu/~wasow/Lapointe.pdf and I would recommend all of them.

I can't help to quote the following paragraph from http://homepages.abdn.ac.uk/k.vdeemter/topiCS-Khanetal.pdf

Various aspects of the problem were explored in three experiments in which responses by human participants provided evidence about which reading was most likely for certain phrases, which phrases were deemed most suitable for particular referents, and the speed at which various phrases were read. We found a preference for "clear" expressions to "unclear" ones, but if several of the expressions are "clear" then brief expressions are preferred over non-brief ones even though the brief ones are syntactically ambiguous and the non-brief ones are not; the notion of clarity was made precise using Kilgarriff's Word Sketches. We outline an implemented algorithm that generates noun phrases conforming to our hypotheses.
Hence ambiguous sentences turns out to be preferred by the speakers.

Our talk with Chung-chieh Shan at OBT workshop a few years ago touched on the topic: http://okmij.org/ftp/Computation/index.html#collaborative-PL

Another example: one may think that irregular verbs are sort of a hold-over, and the more frequently a verb is used, the more likely it would eventually become regular. The reality is more complex. In fact, the more often an irregular verb is used, the more likely it will remain irregular. http://blogs.discovermagazine.com/notrocketscience/2009/09/29/the-evolution-of-the-past-tense-how-verbs-change-over-time/

Paradoxically, it even happens that a regular verb becomes irregular: https://stancarey.wordpress.com/2010/06/18/snuck-sneaked-in/
Hence claiming that regularity is best seems premature.

In short, if we acknowledge that programs are written for humans to communicate (to each other and to a computer), then we should study what makes it easy for the speakers rather than what makes it easy for the grammarians.

### Ambiguity is out

Thanks for the feedback; you and chrisoliver make valid points on the "ambiguity" remark. I now agree it is weak and I will remove it.

Here would be the redacted claim:

consistency: A programming language should be regular and structured, making it easy for users to guess how to use the parts of the language they are not already familiar with. In particular, consistency supports clarity, as recovering intent from program description requires a good knowledge of the language: the more consistent, the more predictable, the lower the risks of misunderstanding.

I still support my defense of regularity. I think the important point here is predictability -- the principle of least surprise.

### So then use that.

So then use that. Consistency is vague, Principle of least surprise has some action.

More importantly, what is the drawback to too much consistency, if any?

### Ralph Waldo Emerson?

Consistency is the hobgoblin of little minds? :-p

### texture, traction, and recall

I have a very unscientific answer, when measure is evidence collected impartially in appropriately designed experiments. But I spent an inordinately long time thinking about it, years ago, when comparing and contrasting choices on exactly that dimension. I was trying to explain suckage caused by too much uniformity, in terms of how the human mind works. (And we're often just guessing about how a human mind works.)

what is the drawback to too much consistency, if any?

It seems hard for the mind to latch onto character of different places when it all looks the same. If there is a part of memory helped by spatial mnemonics, attaching meaning or content to location, it can be compromised by bland uniformity erasing distinctions. You end up with fewer random access buckets, because you can't tell buckets apart very well. A bigger bucket population is enabled by weird buckets.

Think of things people complain about because "it all looks the same", which is partly a problem of not having learned to attend parts that vary which happen to be important. Until one learns keywords in a code language, and how they relate, code seems like gibberish.

Uniformity causes a kind of mental smoothness, which makes it easier to slide around, which is good when you are trying to reason from one place to another. But it causes those different places to merge into undifferentiated sameness, which hinders seeing differences in parallel on problems where bigger chunking would be helpful. White noise makes a lousy Rorschach image. Code viewed as binary 1's and 0's would look like white noise.

Irregular verbs in natural language seems explained by Shannon entropy fairly well -- more bits for important parts -- but there may also be a chunking element as well to help attention get traction on the flow.

### A good start, but is

A good start, but is uniformity the same as consistency?

But then that's the point: consistency, conscision, clarity...none of them really have formal definitions in usability contexts (well, consistency is at least a cognitive dimension, but I didn't see Green's work invoked anywhere). What we then have are a bunch of false equivalences. I will state it again: there has been no demonstrated connection between formal PL semantics and usability, and there probably isn't one.

Empirically, there might be something here. Entropy, chunking, overly homogenous encodings without appropriate landmarks, sounds promising. These could all lead to testable hypotheses, unlike say consistency.

### word definitions have diminishing returns

Words are fuzzy enough one can argue (via sophistry) a particular word means whatever you want, if you try hard. But loosely, I suspect consistency is more conceptual while uniformity is more perceptual. Ideas are consistent, while appearance is uniform. But we can't see ideas. :-) So ideas are easy to conflate with means of presentation. Semantically, consistency is more about lack of contradiction, and uniformity is more about regular structure. But near synonyms get used for one another all time, almost exactly to avoid too much uniformity. (The more often a word is repeated, the less it means, because it begins to dominate and lose role by contrast.) I can imagine someone using consistent in a definition of uniform.

I think I was trying to use uniformity to help define consistency. A consistent design is one that uniformly avoids self-interference, because there's a plan about how moving parts play in compatible ways without conflict. The plan, perhaps never expressed per se, imposes some uniformity in pursuit of consistency. Now you can see I am already this close to word-game induced nonsense, and re-use of words has dulled clarity of points.

If I really want something clear, I try for a little darwinian selection, by having characters speak in a dialog, trying to trash one another's inferior sallies. But this is so non-conformist the form overwhelms the value a bit.

### Formal definitions are

Formal definitions are formal, no sophistry allowed! So when someone says consistency, they have a complete definition for the term. In contrast, when Green says consistency in the CD framework, it means "learning some parts explicitly allow other parts to be learned implicitly"...a definition defined by its effect! Not formal at all, of course.

Formal definitions are by there very nature descriptive in some sort of world abstraction that doesn't include humans or usability in any shape or form. There simply is no formal non-empirical reasoning about human behavior and performance, so it shouldn't be surprising that there is no connection between formal PL and PL usability.

Hmm, which makes me wonder: what if we defined formal models about how humans behave like we do with math for say real physics? Where you could define consistency with some real human consequence? Of course, it would all probably be very wrong, it maybe not...formal work on machine learning could be very useful in this regard.

### Formalism and usability

it shouldn't be surprising that there is no connection between formal PL and PL usability.

I disagree, as I wrote:

The community shares a common baggage of properties that may or may not apply to any given proposed design, and are understood to capture certain usability properties of the resulting programming language. These properties are informed by practical experience using existing languages (designed using this methodology or not), and our understanding of them evolves over time.

Type soundness is related to usability. Principal type inference is related to usability. Hygienic macros are related to usability. The list goes on.

It is definitely true that there is no *formal* connection between formal PL and non-formal usability. There is certainly value in studying this connection using a different scientific method; we discussed empirical user-studies, Oleg above backed up his claims on ambiguity by referring to linguistics works. But it is also important to recognize that the method that has been used so far, which is to combine a practice of formalization with our own's programmers experience of using implementations of the formal systems, is still relevant and useful.

### If your type system or macro

If your type system or macro system is broken, you can definitely make a case that this is 'bad' for usability, but we haven't actually learned anything about usability! So to say, it must work...sure, but then what?

We are in the dark pseudo science age of usability, and the current method might as well be hemetherepy.

### ahem

So... Unlicensed back-street clinics performing live transplants from language to language. New constructs rejuvenating the life of old language. Usability cry the surgeons and unleash duck typed Algol on the world.

### Not broken is good, right?

I have trouble making sense of this comment. It sounds like you're saying that traditional PL is just about building languages that aren't broken. Don't the usability guys want to start with a model that isn't broken before they paint on racing stripes?

If usability becomes about "usably not broken" or "usably fast", then...carry on as usual. If you care about user interaction, for every gasche who claims "type soundness is good for usability" we'll have a gbracha who claims that "type unsoundness is good for usability". In the former case they are trying to justify tech decisions post priori, in the latter, it's all about usability driving tech choices, the results are quite different.

### Not aware

I'm not aware that Gilad ever claimed that "type unsoundness is good for usability". All he is saying is that soundness often comes with too high a price tag in terms of complexity, i.e., that there is a usability trade-off.

### Yes, that sounds right.

Yes, that sounds right. Experience design is all about making trade offs, and we don't get to think in terms of good and bad anyways. My point was that technologies should be brought in to satisfy certain experience goals, they are not a foundation for the experiences themselves.

### Mathematicians are people too

Many (if not all) of the same usability issues are already present in traditional math. When physicists do dimensional analysis they are acting as human type-checkers. Pattern matching is a usability feature. As gasche says - the list goes on. However, the end result has been that the usability barrier is so high that 99.9% of the population can't get over it. Thanks to automated tools the story is somewhat different with PL - supposedly the most common job in some US states is now software engineer. It's a sad and obvious fact to everyone on Ltu that mainstream languages such as C++ and Javascript are crap - but it seems pretty clear from the numbers you can't just look to traditional math for usability answers.

### Adaptive Categorisation and Sparse Coding.

This reminds me of adaptive categorisation, where languages have more words for highly used concepts. A categorisation depends on its usefulness, too few categories and it discriminates nothing, too many and it is not performing a useful categorisation either. There are competitive algorithms that look at the ratio of the number of uses of a category to the number of uses of the categorised concept, and split highly used categories into new sub-categories, and recycle the less used categories to do so, with categories in different concepts competing for the finite resources available. I believe this has a relationship to the deep-learning concept of sparse-coding.

### planning adaptive in PL design

Adaptive encoding and categorization is a good association. That it happens in natural language was more an underlying premise in my post. I implied, indirectly, that programming language design does it artificially with goals in mind, both in terms of human usability and good fit for target application space. Overtly I was saying smooth consistency in PL design can have a drawback, owing to a quirk that human minds seem to track irregular detail that stands out, perhaps helping guide attention. Foreground and background are funny things in human perception. The right mix of context and substance might be hard to get.

In PL design, maybe something should act like sprocket holes on the edges of film feeding it through a projector, as far as directing attention is concerned. If it blends too smoothly with actual content, users might confuse logical levels. Also, if content in different places can blur together, because it's easy to do, or the machinery encourages this, it has both pros and cons. When you need to jump sideways, this is easy; but when you need to focus, you have distractions or (worse) confusions.

Not that this matters -- not sure why I'm talking about it. Maybe I've seen a lot of attention failures, and confusing unrelated things is a standout problem. Perhaps it contributes to unnecessary coupling. "I was eating an ice cream cone, and needed a bus ticket, so I wrote an ice-cream-bus-ticket manager." Stuff like that.

### Scala

I'm not sure how exactly the consistency claim should be expressed (gasche revised his claim). But your answer is also (by necessity) ambiguous, so let me talk examples.

Take Scala. It has various aspects that nobody would claim are consistent, and people are hard at work on fixing them — it violates either phrasing of "consistency", and the Internet is rich in rants against them. This is certainly not the ambiguity you describe as desirable — many Scala issues are mostly unintentional bugs, not features.

I think your idea is interesting, but it might be premature — by decades. Can we have ad-hoc language features that fit our expectations yet don't come back to bite us in more complex cases?

Alternatively: why are the languages you commonly typically considered examples of mathematical elegance, and designed according to such prescriptivist principles?

as a way to side-step the ux issues, by throwing the baby out with the bathwater.

### Like s-expr

I was thinking of the typechecker. You're replying about Scala's syntax, right? That's also a problem, but that's one that at least experienced users can deal with — I don't look forward to teaching Scala syntax though.

### What are the consistency

What are the consistency principles that Scala violates?

### Scalac is unpredictable to its authors

To some, including me, this question feels like "But is the sky actually blue?". But I know YMMV, so I'll try to articulate an answer, even though it is hard.

Since we're using gasche's definition, I think Scalac violates:

making it easy for users to guess how to use the parts of the language they are not already familiar with

However, what Scalac violates is more basic:

making it possible at all for users to guess with high confidence how to use the parts of the language they have huge familiarity with, with access to the full spec, without needing to double-check with the compiler.

Not only it has no spec, but even its authors have usually a hard time predicting what Scalac will do on some code, or sometimes even what should happen. Work on DOT and dotty is figuring that out, exactly by making Scala more well-founded, in the kind of tradition that gasche hints at.

Depending on how you use Scala, this issue ranges from a theoretical to a daily problem; IIRC you worked on Scalac, but Scalac is actually well-tested on itself, so some of the worse issues might be less prominent. (Also, Moors once claimed on some ML "10% of the users hit 90% of the problems).

For me this was a big issues for a long while — one of my first big projects was LMS-like, but wrapping the Scala collection library in its full glory. With that I found enough bugs that Scalac hackers got to know me. In another projects I used Scalac much more conservatively, so I ran into fewer issues.

For concreteness, I think these two pieces of code should be equivalent for type inference, yet they aren't. The difference feels hard to predict to me:

case class ViewNoTypeAnnot[T, Repr <: TraversableLike[T, Repr]](base: Repr)
{
//def copy(base: Repr) = ViewNoTypeAnnot(base) //Does not compile: Nothing is inferred instead of T
def copy(base: Repr) = ViewNoTypeAnnot[T, Repr](base)
}

case class ViewNoTypeAnnotWorkingInference[T, Repr <: TraversableLike[T, Repr]](base: Repr with TraversableLike[T, Repr]) {
def copy(base: Repr) = ViewNoTypeAnnotWorkingInference(base)
}


This is from https://issues.scala-lang.org/browse/SI-5298, 4.5 years ago.

To take different examples from somebody else, see this blog post:
http://yz.mit.edu/wp/true-scala-complexity/

To be fair, Scala is not alone — I also find Agda's type checking/inference to be hard to predict, but that feels more acceptable somehow (maybe just because Agda doesn't claim to be a mainstream PL).

On the other side:

For contrast, since some issues relate to type inference, compare with Joshua Dunfield's and Neelk's work on bidirectional type checking. Beyond the huge amount of math, their main story is to give a simple specification for bidirectional type inference to the user, for usability purposes. I find this sort of thing important for usability, and I wonder what you'd think of this aspect.

### I don't really buy the

I don't really buy the "consistency = predict what you don't know about a language from what you do know about it" definition. I mean, it isn't really meaningful. Predictable? Reliable? We don't have a great vocabulary to talk about these things. This is where empirical study could really pay off in coming up with a model and principles in what type inference is "good" and what is "not so good."

Giving the programmer the specification, even if simple, won't help much (it is definitely not sufficient if necessary). Programs have a lot of moving parts, they could be huge, and if type inference occurs over a large amount of code, even with simple rules, the programmer isn't going to be able to replicate it in their head. As analogy, consider Lisp: very simple syntax and semantics, but in aggregate of entire large programs, simplicity of language syntax/semantics isn't enough to save the programmer...that cannot play the computer even if the rules were simple enough that they could...there are just too many steps involved.

We must also focus on better modularity (separate compilation and separate reasoning). But there is something else: dynamic languages still have huge usability advantages wrt usability, since they relegate type errors to the case of debugging with concrete examples...something that is much more tangible than abstract type theory. With a debugger, you can pull out thinking from your head and put it back into the computer...something type systems could make a better effort at.

Anyone who uses Scala heavily will find lots of bugs (I did also, even got my code banned), this is just a right of passage.

### Just this example

Before answering on the rest: I conjecture "current Scala with bugs" will be less worse/less usable, in some sense, than "Scala without bugs". On request, I tried to explain/rationalize in which sense, and you seem to disagree with every step of the rationalization — with interesting points. But do you also disagree with the conclusion? Because I'm honestly not sure.

### We can agree that "Scala

We can agree that "Scala without bugs" is better for usability than "Scala with bugs", but that is really more about engineering than usability.

But I guess we could argue that Scala with a formal proof would be easier/harder to use than Scala without a formal proof. In that sense, having the formal proof means it is simple enough to formalize, but that simplicity could create more moving parts that create even more complexity (e.g. as typically happens with Lisp's simplicity). Not having a formal proof could come about as a result of having a semantics that does more heavy lifting, so the implementation might have more bugs, but in general programs can have fewer moving parts, so it becomes a wash.

Scala was always in the latter category. Ya, it was ambitious, and the lack of simplicity in its semantics led to an implementation that could only be validated through testing. On the other hand, this didn't bug me that much, since I got a lot more out of it...Scala gave me usability elsewhere.

### Scala's Unsoundness

The only thing that put me off is the unsoundness of Scala's type system. It's not that it has bugs, nor that there is no formal proof, but that an allegedly deliberate decision was made to abandon soundness. Things can go wrong, and it's not a bug, it's a feature... And the JVM, the only two things that put me off Scala are its unsoundness and the JVM. Oh, and the fact the syntax is a mess, and lack of kind polymorphism, the four things that put me off Scala are ...

If you were not constrained by the limitations of the JVM, would you really design a language like Scala?

### H&M type systems like

H&M type systems like Haskell exist. OO is popular because people like subtyping, and if you give up nominal subtyping, Scala isn't interesting. Those languages already exist and are perpetually relegated to niches.

Unsoundness doesn't bother me. I can understand why many feel its like scratching nails on a chalk board, but most people are ok with it.

### Popularity of Object Oriented Programming

I am not sure OO is popular because people like subtyping. I remember when OO first came to promenance and people were still formally being taught 'C'. I spent much time trying to persuade people they should be teaching 'OO' at the time. The point is they did not naturally think in OO hierarchies, they had to be taught it, and they still did not get it right. The correct OO solutions are not obvious, hence the need for books like the gang of four's "Design Patterns", and UML modeling.

I think there is room for an OO and functional crossover which is why I spent time working with Oleg and Ralf on the OOHaskell paper. There was a lot in there about the precise relationships between OO concepts and HM type systems, including subtyping, built on top of HM with type-classes. http://arxiv.org/pdf/cs/0509027v1

### The problem with subtyping

The problem with subtyping (+ bounded generics) is not getting a sound type system (well, maybe a bit), but with type inference. I'm not sure how OOHaskell handles type inference.

### This was my experience as

This was my experience as well. I ended up splitting my type system into a coarse unification based system and a precise forward-propogation-only system based on a type lattice. This is kind of the idea of refinement types, too.

### Ya, I used folding and

Ya, I used folding and unfolding of hierarchical trees of type members to get it to work out. Not sure yet about soundness though :)

### HM plus Type Classes

The unification system sounds like HM, and the forward-propagation part sound like type-classes. You can of course integrate forward-propagation with backtracking and remain sound (as forward propagation of constraints behave like a committed choice logic language).

### Hmm

I see how you mean it, but I think that's a bit different from what I meant. The basic inference problem is this:

f x = [myDog, x]

If you have types Animal and Dog, then this could have type (Animal -> [Animal]) or (Dog -> [Dog]) or something more general but more complicated like (forall a>Dog. a -> [a]). I'm saying I would infer (Animal -> [Animal]) for the coarse type but then have something more specific for the precise type.

### See Below

I would be interested in your thoughts on the material in the paper, or at least the fragment presented below which shows how such things can be inferred.

Edit: In effect the default is duck-typing, so in the example with printPP the inference would be any argument that provides a print method is acceptable. To restrict it to a nominal type requires an annotation (but the mitigation is all classes and interfaces are types anyway so if the function is a method it will have a type signature). However the function that accepts a nominal type can be expanded to accept any subtype of that nominal type with no further annotations.

This is of course assuming static polymorphism. In your example you use a list '[]' which is normally monomorphic. For runtime polymorphism you could use an existential type, but then you can only use the base-class interface to access the contents and no down-cast to the sub-type is possible. I think here is where the complexity of existentials is a problem. The simplest solution is to use a sum type:

data Animal = Dog {...} | Cat {...} | ... etc


The reason people often try and use types here is because sum types are normally closed and type-classes or subtype hierarchies are open. To me open-sum-types seem a better solution than trying to use existential types. Something like:

open Animal

Animal |= Dog {...}
:
:
Animal |= Cat {...}


Now:

f x = [myDog, x] -- is a list of dogs
f x = [Dog myDog, x] -- is a list of Animals


Edit2: An interesting variant would be an open-sum-type that constrains all the types in the disjoint union to have the same type-class (interface) so that you can call common methods without having to match the tag.

### No implicit casts

It's been a while since I looked at the OO Haskell paper, but I think you've mentioned one main difference: it doesn't support implicit upcasting. It supports lightweight casts that basically just tell the type system where to cast, but you still have to specify them. I do that, too, but with a trade-off: casting breaks inference. So if you write this:

f x = [myDog, cast x]

Then I'm not able to infer anything sensible for x and I'll report an error. It looks like OOHaskell will just infer a big bag of constraints along the lines of "x is of type a and a is narrowable to type b and b is ...". If you just sprinkle 'narrow' everywhere, do you recover a useable form of implicit upcasting? I kind of doubt it, but I'm not sure what breaks exactly. I'm leery of big-bag-of-constraints semantics.

Implicit upcasting is important for me since I want precise types that subsume refinement types. If we have to insert a cast at a transition from Int{>1} to Int{>0}, then we'd basically have to insert casts everywhere.

### Refinement Types As Constraints

I think what we called downcasting (casting towards the superclass) you call upcasting. The other one (casting towards a subclass is nearly always unsafe. I am not sure implicit casting is a semantic problem, and you could include the downcast for all parameters as syntactic sugar without problems. Don't confuse writing the 'proof' by translation into Haskell with the surface syntax the language would have. Obviously you cannot recover the original type after a downcast, so a disjoint union type makes more sense when you want to do this.

Refinement types can be modelled as constraints, and for specific domains (like natural numbers, 32 bit integers etc) rules can be added. In other words numbers are a special case, and there is no casting involved. The type would be something like:

lessThanZero :: (LessThan Zero x) => x -> x


You can even use polymorphic recursion to lift a value to a type, allowing a value level function to assert a type level constraint, so that no type annotations are necessary to establish this:

let y = lessThan 2 x


Would give y the type constraint LessThan Succ(Succ(Zero)) X, so refinement types do not need subtyping. Both refinement types and subtypes can be implemented as type constraints independently.

### Refinement Types As Constraints

I don't have strong opinions on the term, but I've always seen class diagrams drawn with superclasses on top and it looks like the Wikipedia downcasting article uses the term the way I would.

You're right that you could probably model all of this with constraints. I wouldn't go so far as to say that refinement types don't need subtyping, though. It's just that you can model subtyping with constraints. Again, if you have an Int{>1} and want to pass it to a function expecting an Int{>0}, you want implicit subtyping.

And regardless of whether you model this with general constraints, I think usability will suffer if you use bag-of-constraints for your inferred types. Again consider this example:

f x = [myDog, cast x]

You can't infer that x must be an Animal here. In fact, in general it needn't be. You could have a list containing myDog and the number 5 and assign it a type that indicates that its members have no common operations. I don't think this is a good default for how lists should behave. I'd rather get an error in this situation.

You can recover my scheme with constraints by adding a rule that whenever you have a type constraint a < b, you can infer that cat a ~ cat b (if two types are comparable, then their categories / coarse types unify).

### Refinement types are constraints

In functional programming trees have their roots in the ground, although I think it confused me too, I didn't decide which way round things were in that paper.

I find treating refinement types as subtypes slightly strange. You are gong to run into problems with complex inequalities. What you want to do is collect all the constraints on a variable and then solve. The best way to do this is with something like an SMT solver. If I Google refinement types, all the top indexed papers seem to be treating them as constraints and using SMT solving techniques which seems the natural way to do it to me.

If I wasn't clear I prefer disjoint union types for runtime polymorphism, so I would see:

f x = [myDog, x] -- a list of dogs
f x = [Dog myDog, x] -- a list of animals


as being unambiguous, and more user friendly as sum types are really easy to understand, much easier than subtyping with casting.

If you want members with common operations you can call without unpacking each option in a case statement, you want an existential type with a typeclass (interface).

data Box = forall a . Interface a => Box {unbox :: a}
data PolyList = [Box]


So you can put any type that implements an 'Interface' in the list. There is no casting from the base-class, after all you don't know what the original type was. This is clearly more complex and harder to understand than the disjoint union type.

What would be interesting would be a hybrid between an existential type and a disjoint union, where you can call methods on a common interface directly, but unpack using a case statement and the disjoint union constructors to access the original type. This is easily achievable by making Animal an instance of the interface.

data Animal = Dog dog | Cat cat
instance Interface Animal where
method (Dog dog) = method dog
method (Cat cat) = method cat
type PolyList = [Animal]


The interface for Animal is obviously boilerplate, you might even be able to use a deriving clause in Haskell. In any case you could effectively generate this interface implementation for Animal automatically in new language.

### Subtyping

I'm not sure what links you're looking at, but the first paper that I find is about refinement types and laziness in Haskell. The second, on adding refinement types to ML, has an abstract that begins with the sentence:

We describe a refinement of ML’s type system allowing the specification of recursively defined subtypes of user-defined datatypes.

Collecting constraints seems like a reasonable way to describe how to implement the mechanism, but if you can pass an Int{>1} to Int{>0}, how is that not subtyping?

I think a preference for disjoint union types is appropriate in some situations, but not in others. The Animal example probably isn't a good one because it's not a real problem.

### Not Subtyping

What is a Refinement Type?
In a nutshell,

Refinement Types = Types + Logical Predicates
That is, refinement types allow us to decorate types with logical predicates (think boolean-valued Haskell expressions) which constrain the set of values described by the type, and hence allow us to specify sophisticated invariants of the underlying values.

So a refinement type is a logical predicate on a type, and it goes on to discuss solving with SMT. As Haskell does not have subtyping, this is clearly not using subtyping.
This is a 'if all you have got is a hammer, everything looks like a nail' kind of thing. I suspect OO programmers approach this from a subtyping perspective, and people used to HM and type-classes see it as solving logical predicates.
This comes back to my earlier comment about backtracking, effectively this kind of types + logical predicates is really a logic language, so rather than implementing some hobbled cludge of a type system (and I am referring to my early attempts for clarity) which turns out to be a badly implemented Prolog interpreter, I would write a sound logic interpreter built like a Prolog interpreter cleanly, and it would just need a few rules to do type inference etc.

### Still subtyping

As Haskell does not have subtyping, this is clearly not using subtyping.

It's well known that subtyping can be modeled without subtyping by insertion of coercions (see TAPL, I think). I think we're just disagreeing on wording.

You will perhaps be surprised to learn that I don't consider my type system to be a hobbled cludge. I could type everything with a bag-of-constraints, too. I just don't think that's a good idea.

### Not Surprised

I am not surprised to learn that, I am sure you would not have written it that way if you did. Not everyone is going to agree on the best way to do things, otherwise there would only be one programming language.

I think this is a bit of a rorschach test, we can look at the same pattern of dots and one of us sees subtyping, and the other sees constraints. That does not mean it is either, its just a pattern of dots.

Constraint solving is the more general approach and can deal with overlapping ranges like (x : 3 < x < 5) unified with (y : 4 < y < 6) to correctly give (z : 4 < z < 5) and more complex interval and repeating ranges.

### (I know you weren't

(I know you weren't surprised. One way of reading what you had written was characterizing the system I was proposing as hobbled cludge, so I was making a joke.)

I still don't agree with that wording. I'm not choosing subtyping instead of constraints. I'm saying that you need subtyping in the surface language to make refinement types palatable. You can desugar the subtyping into coercions / constraints - that's one way to view subtyping - but you still want subtyping in the user facing presentation.

Think about the example I keep citing. You have a value that's known to be greater than one. You want to pass it to a function that requires a parameter greater than zero. You don't want to have to explicitly coerce in such a situation. In other words, you want subtyping.

I don't see that part of what I've written as contentious. It is possible I'm using terminology in a nonstandard way, but I don't think I am. The substantive part of my proposal is to have coarse types (I actually call them type categories) that are solved by unification. This fits just fine into the kind of framework you're talking about, but it means sticking to a discipline where subtyping constraints always imply category equality. Your observation that subtyping constraints can just be modeled as ordinary constraints doesn't eliminate what I find useful about this discipline, namely that my categories are simple and don't look like bags-of-constraints.

### User Facing

Type classes propagate constraints, and do not need explicit coercion. Are you saying type-classes are equivalent to subtyping?

I think your course types are the same as HM types and your subtypes can be implemented with type-class constraints.

If you are saying it gives you a nice way to pretty-print the types and their constraints, I don't see any problem with that.

I think my original point that you could implement your system using the tools provided with OOHaskell + syntactic sugar still stands. I felt you were objecting to that, and I was trying to explain why it was true, although I may have got sidetracked a bit.

### Can we do an example?

Yes, my coarse types play a role similar to HM types. It gives a nice way to pretty-print, plus it allows you to rely on type inference for things like overload resolution / instance selection without trying to understand a morass of constraints.

If you're saying that you're going to use type classes without any implicit casts, then maybe I don't understand what you're proposing. Can we walk through how the example I keep mentioning would work just using type classes? You have a variable x which is of type Int{>1}. You have a function of type Int{>0} -> String. You want to make the call (f x). How does that work?

### An Example

This is fairly quickly put together, but here's a example:

{-# LANGUAGE
ScopedTypeVariables,
MultiParamTypeClasses,
FlexibleInstances,
FlexibleContexts
#-}

data Z = Z
data S a = S a

data Gt a = Gt Int deriving Show

class GreaterThan a b
instance GreaterThan Z (Gt Z)
instance GreaterThan Z (Gt (S a))
instance GreaterThan a (Gt b) => GreaterThan (S a) (Gt (S b))

(x :: Gt (S Z)) = Gt 7 -- we trust the programmer to honor the guarantee
(y :: Gt Z) = Gt 5

f :: (GreaterThan (S Z) a, Show a) => a -> String
f x = show x

> f x
"Gt 7"

> f y
"... No instance for (GreaterThan (S Z) (Gt Z)) ..."


Note: I changed it a bit so I could show a failure, so have a function that requires a natural number greater than one. In the example we trust the types assigned to data.

We encode guarantees as data types and requirements as type classes. The type class checks the guarantee is good enough, and the program will only type check if all the requirements are met with appropriate guarantees.

### Improved Example

This is a bit better:

{-# LANGUAGE
ScopedTypeVariables,
MultiParamTypeClasses,
FlexibleInstances,
FlexibleContexts
#-}

data Z = Z
data S a = S a

data Gt a = Gt Int deriving Show

class SubRange a b
instance SubRange (Gt Z) (Gt Z)
instance SubRange (Gt Z) (Gt (S a))
instance SubRange (Gt a) (Gt b) => SubRange (Gt (S a)) (Gt (S b))

(x :: Gt (S Z)) = Gt 7 -- we trust the programmer to honor the guarantee
(y :: Gt Z) = Gt 5

f :: (SubRange (Gt (S Z)) a, Show a) => a -> String
f x = show x


There is actually an interesting thing in common with one of my recent posts on logic languages, which is the duality in Prolog of clauses as evaluation and structure. In my mini language experiment with function like things in a logic language where it became apparent that stuctures need to be separated from evaluation, we see the same thing here, data types model Prolog clauses used as structure, and type-classes model Prolog clauses as evaluation (although as committed choice not backtracking, but if instances do not overlap there is no difference). We could write the type level program in Prolog as:

sub_range(gt(z), gt(z)).
sub_range(gt(z), gt(s(X))).
sub_range(gt(s(X)), gt(s(Y))) :- sub_range(gt(X), gt(Y)).

:- sub_range(gt(s(z)), gt(z)).


Edit: I prefer to think of this as a meta-program accessing static meta data about the program, which can include types, constants, and the ast-tree itself (for type inference etc). This avoids confusion with runtime polymorphism where people consider types that only exist at runtime and cannot be known statically. When thinking of these as type level programs you run into problems with things like the types in a polymorphic recursion that depends on a runtime value.

### Does this work?

In order to avoid the cast that I was inquiring about, you've subtly shifted the type of f from (Int{>1} -> String) to something akin to (forall a. a < Int{>1} => a -> String). That seems to work in this case, but I'm not sure how it generalizes. For example, how would you use this trick on the successor function which I would type (Int{>0} -> Int{>1})? Now you have the same integral type on both sides of the arrow, but with different ranges.

I prefer to think of this as a meta-program accessing static meta data about the program.

I agree that's a useful point of view.

### Successor

The type of the successor function could be defined as:

class Natural a
instance Natural Z
instance Natural a => Natural (S a)

g :: Natural a => Gt a -> Gt (S a)
g (Gt x) = Gt (x + 1)

> g(x)
"Gt 8"
> g(y)
"Gt 6"
> g(g x)
"Gt 9"


Adding the constraint 'Natural a' makes sure 'a' is restricted to natural numbers. As for type inference, certain primitive functions must declare a type, but that is normal.

We could put the constraint in the data type, but that is generally considered bad form in Haskell.

### Nevermind

I asked a question about writing double, but nevermind. I see how you'd generalize this. You're basically moving much of the computation to type level. Let me mull over it.

### Double

We can implement arithmetic at the type level, including more efficient baseN arithmetic, (and if using Prolog at the meta-level, these can be numbers built into the meta-language, the implementation with S and Z is just to get it working in Haskell's type system):

class Add a b c | a b -> c
instance Add a b c => Add (S a) b (S c)

class Mul a b c | a b -> c
instance Mul Z a Z
instance (Mul a b c, Add b c d) => Mul (S a) b d

type Zero = Z
type One = S Z
type Two = S (S Z)


We can then redefine the succ function above into this style and define double:

successor :: Add One a b => Gt a -> Gt b
successor (Gt x) = Gt (x + 1)

double :: Mul Two a b => Gt a -> Gt b
double (Gt x) = Gt (2 * x)


This all becomes a very simple logic meta-program, even HM type inference is a relatively simple logic program.

Edit: I could have defined a logic program to evaluate 'subtype' instead of 'subrange', but I hope you can see these would be very different logic programs, and the correct definition of subtype is much harder to write and more error prone than the definition of subrange. This is what I mean about not using subtyping, there is no 'subtype' predicate on types that has to deal with a large number of types and combinations of types.

### This doesn't seem like refinement types

What you're describing with subranges doesn't really seem like refinement types. Open the Liquid Types paper you linked me to and search for "subtype". It's an important part of that paper. The requirement of refinement types is the ability to write Type{Predicate} for an arbitrary predicate. If you want to subsume refinement types, you need to propose a mechanism to handle arbitrary predicates semi-automatically. For example Int{\x. x>10 and (even x)}.

It seems to me that to get refinement types in this type class style, you'd have to systematically convert every function that takes type T into a function that takes an arbitrary subtype of type T. Since functions are parametric and don't inspect the types they are defined on, this doesn't really get you any power. It means the rule for function application is just unification, having moved all of the subtyping constraints into the type class mechanism, but I can't figure out why that's helpful. I can tell it appeals to you that the result is a "simple logic program", but I'm not sold on that being important.

As an aside, the phantom approach whereby you skip defining a direct semantics for a type and instead have the semantics implicit in the allowed type transitions is ugly to me. I'd rather define the type semantically and then prove the transitions sound. I think it's better for programmers.

### Boolean Predicates

In your example: >10 and (even x) is just a boolean predicate. This predicate can be implemented in a logic language (which was in effect what I was doing), so you can of course express any restriction.

I only lifted the operations to the type level, because the operations affect the range. For example I assumed that when adding one to a value you could add one to the range start as well, as a value greater than one when you add one to it, you can be sure its greater than two (no matter what the value is, providing the condition was true to start with). Somewhere you have to encode all these rules for your primitive operations. You saw the encoding in Haskell for the rule for numeric ranges under addition, how would you write these down formally?

I don't get what you are saying here "It seems to me that to get refinement types in this type class style, you'd have to systematically convert every function that takes type T into a function that takes an arbitrary subtype of type T.". It restricts functions to only compile when they are passed valid arguments, isn't this exactly the power you are looking for?

I suspect you are confusing functions that modify the guarantees attached to types, with functions that have requirements that must be satisfied. The earlier example with the 'to string' function showed the requirement side. Using those definitions:

> f y
... No instance for (SubRange (Gt (S Z)) (Gt Z)) ...

> f (successor y)
"Gt 6"


because 'f' requires a value with a guarantee it is greater than one, and 'y' only guarantees to be greater than zero the first application fails. However the second succeeds because the successor of y must be greater than one.

The only bit that you cannot easily do in Haskell is the path dependent types, however now consider a logic meta-language that can also access the syntax tree. This allows us to neatly write the type inference, and infer the types from the code AST directly such that:

if x > 3 then
-- here we can guarantee gt(3)
else
-- here we can guarantee le(3)
endif


The types generated here are all inequalities on integers, and we only have to deal with satisfiability modulo integer number theory, not subtyping.

### GADTs with type level predicates

I'm trying to understand how the technique you're describing handles general refinement types. I guess to generalize what you've done with ranges, you'd need type level predicates. So maybe something like:

data RT p where -- refinement type of Int 

RT :: Pred p => RT Int

Then you have functions like this:

succ :: RT {\n. p (n+1)} -> RT p

That bit in curly brackets is magic that lifts the function to type level.

I can imagine that something like that would work and get you pretty much full refinement types. This seems much more complicated to work with than just using subtyping, though.

Edit: Yes, I was deliberately avoiding the issue of path-dependent types.

### Guarantees and Requirements

One of the things I don't like about the subtyping approach is that it makes no distinction between a guarantee and a requirement. Given some type (x : x > 0) is this a guarantee that the variable is greater than zero, or a requirement that it must be? Such confusion makes debugging harder. The subtyping relationship makes this implicit by relying on covariance properties to force functions to only accept arguments satisfying their requirements. This means the relationship is only clear at the point of application, and you have to introduce a subtyping constraint if you want to propagate the type to solve at another point in the AST. This is just obfuscating what is really going on. It seems much clearer to separate requirements (expressed with constraints, for which type classes will do) and guarantees (which can be encoded in the types). We can extend the definitions I gave, keeping guarantees in the type, and requirements in the constraints by implementing a simple interpreter at the meta-level. The requirements can safely be propagated without losing their relationship to the function application, allowing all the requirements for a variable to be gathered and simplified without adding the complexity of a subtyping constraint. Guarantees can also be propagated independently allowing solving at different points in the AST to aid debugging. Once guarantees and requirements have been turned into subtype constraints it is very hard to get back to guarantees and requirements unless the co/contravariance information about each application is propagated along with the subtype constraints.

### Assumptions and conclusions?

By guarantee do you mean conclusion and by requirement you mean assumption? I like that to be handled the way it is done in classical logic - assumptions are anything left of the turnstile but there is no difference in form between assumptions and conclusions. Maybe I'm missing your meaning, though.

Also, you seem to be saying here that you could have just extended your previous definitions to handle different types of refinement types. I didn't think that was true. Suppose you have some code written using your Gt definition. Then somewhere else you also want to track parity (odd/even). It seems to me that to do that you'd have to replace Gt with a more general GtWithParity type. Without subtyping, you can't implicitly cast this to a Gt type, so you have to insert casts to use the functions defined on Gt. No? This is why I concluded that you'd need to use an arbitrary predicate lifted to type level to support general refinement types.

### Guarantees vs Assumptions

I thought my terminology was fairly clear. Requirement is common notion, for example we might say indexing an array has the requirement that the index is not greater than the length of the array. It is a restriction on the arguments for the function to be valid. Guarantee is perhaps less common term, but the meaning seems straightforward, it is a variable who's value has somehow already been proved to satisfy the guarantee.

You are right that 'Gt' needs to be extended to a more general predicate in a simple expression language.

data Z = Z
data S a = S a

type Zero = Z
type One = S Z
type Two = S (S Z)

data RT f p = RT Int deriving Show

data Var
data Gt a b
data Sub a b
data Mul a b
data Div a b
data And a b
data Mod a b

class SAT a b
-- this is not trivial

-- this is a guarantee (from the programmer that x > 0)
x :: RT Var (Gt Var Zero)
x = RT 27

-- this is a requirement that (x > 1) is satisfied by argument 'a'.
f :: SAT (Gt Var One) (f, p) => RT f p -> String
f (RT x) = show x

-- these functions modify the guarantee
successor :: RT f p -> RT (Sub f One) p
successor (RT x) = RT (x + 1)

double :: RT f p -> RT (Div f Two) p
double (RT x) = RT (2 * x)


The SAT predicate is going to be non-trivial as it has to implement an SMT solver. This may or may not be possible in Haskell, I have not really thought about it much. My intention was not to implement this in Haskell, but to use a logic meta language to solve, taking advantage of type classes being equivalent to logic clauses, and types being equivalent to logic structures.

I had to split the RT type in two, so in RT f p, 'f' is a function applied to the variable before the predicate is called, (x : p(f x)). This lets us compose the transformations of x like this (x : p(f . g x)).

### Duplication of the program at type level

1. I don't think you need to have guarantees. At the introduction of an RT type, you can basically have a "singleton" RT -- the predicate indicates that a single value is allowed in the type. The SAT solver can work out the rest.

2. It seems like you shouldn't need a function and predicate parameter to RT. Maybe you could use an existential, since the rule for a value belonging to the image of a function is "exists x in domain such that ..."

3. Biggest concern: doesn't this approach basically require you manually re-encoding the entire program fragment that uses refinement types at the type level? This seems unacceptable for usability.

(This has been an interesting conversation, though. Thanks!)

### Questions.

1. The RT type represents a guarantee, the SAT solver constraint is the requirement. Really the important thing for me is that the types are parametric and there is no covariance or contravariance involved, the type class is a constraint that is either satisfiable or unsatisfible.

2. The problem is we need to compose operations like this:

> :t double(successor x)
double(successor x) :: RT (Mul Two (Add One Var)) (Gt Var One)


We start with (x : x > 1) then (x : (x - 1) > 1) and finally (X : (x - 1) / 2 > 1).
So how do we transform (x : x > 1) into (x : (x - 1) > 1) ? My answer was to define the predicate (x : (f x) > 1), so now we just compose the inverse functions, so we get:

x :: (x : (f x) > 1) where f = id
successor x :: (x : (f x) > 1) where g = \y . y - 1, f = g . id
double (successor x) :: (x : (f x) > 1) where g = \y . y - 1, h = \y . y / 2, f = h . g . id


How else would you do this?

3. How else do you express the type transformations required? Language builtins would have the correct types already defined, so when composing you would not have to repeat at the type level. Note, I found a mistake in the above code, as we need the to compose the inverse functions, so it is not just repeating at the type level. In any case an automated transform can be applied to the value-level code to generate the type level structures.

### Subject

1. My point is that you can always give a more specific guarantee:
x :: RT Var (Eq Var TWENTYSEVEN) x = RT 27

This step can be fully automated. And the "requirement" part, that it's greater than zero, can be specified with the SAT/type class framework. In this way we eliminate the "trust me" step that 27 is greater than 0... we can prove that.

2. Generically, if you have a function f : A -> B, then the predicate / refinement type at B corresponding to the image of a predicate p on A is given by q = \b. exists a:A. p a and b = f a. I don't really see how to make this simpler. With the inverse function idea, what do you do if there isn't an inverse?

3. I'm not sure how to fix everything right now, but it seems to be more complicated than the refinement type approach. You seem to have created a harder problem to solve with this approach. Namely, it seems like in this approach we're having to keep track of what happens to subtypes of Int under functions instead of the refinement approach which just focuses on particular values. The relationship between the ideas isn't very clear to me right now, though.

### Assignment

What do you do if the Variable is not assigned until runtime, but you want to statically prove it is compliant? How do you deal with path dependent types?

In any case you are saying to apply the function to the constraint instead. That would require the requirements as 'type data', and making the guarantees with type class constraints, I will update the example.

I am not sure what you mean by some functions not having an inverse? (Well I do know, but I think it's not an issue). What is a good example of a problematic function? What happens to the guarantees when you multiply for example (a -> b -> c) consider the cases where one of a and b make a guarantee, and then where they both do.

I don't see any fundamental issues here. I may not have time to update the example until later.

### Example with Requirement Propagation

We need to define substitution to get this to work.

class Subst s t a b | s t a -> b -- b = (s ~> t) a
instance Subst s t s t
instance Subst s t Z Z
instance Subst s t a b => Subst s t (S a) (S b)
instance (Subst s t a c, Subst s t b d) => Subst s t (Gt a b) (Gt c d)
instance (Subst s t a c, Subst s t b d) => Subst s t (Add a b) (Add c d)
instance (Subst s t a c, Subst s t b d) => Subst s t (Mul a b) (Mul c d)

-- requirements
f ::  RT (Gt Var One) -> String
f (RT x) = show x

-- functions
successor :: Subst Var (Add One Var) p q => RT q -> RT p
successor (RT x) = RT (x + 1)

double :: Subst Var (Mul Two Var) p q => RT q -> RT p
double (RT x) = RT (2 * x)

-- guarantees
guarantee :: SAT p1 p2 => Int -> p1 ->  RT p2
guarantee x _ = RT x

-- example
test = f (double (successor (guarantee 5 (undefined :: Gt Var Zero))))


If you run this with no instances for SAT you get the following error:

No instance for (SAT (Gt Var Zero) (Gt (Mul (S (S Z)) (Add One Var)) (S Z)))


Which looks about right for the satisfiability problem.

### Singleton types

I'm mobile for a while but I think I see how to simplify things along the lines of my last post. Basically you use singleton types instead of RT types. Then a predicate can be entirely specified with type class constraints. So the basic idea is to lift values to type level and then do something akin to normal refinement types at type level.

### Lifting Values To Type Level

Its pretty simple to lift the value to the type level, you can even do it using polymorphic recursion. However I am not sure that's what you want to do, otherwise you limit yourself to only working with static constant values.

Obviously it really needs path dependent types to work. I think I need an example that provides alternation with guarantees. Something like:

path = do
if (x > 0) then
putStrLn $f (double (successor (guarantee x (undefined :: Gt Var Zero)))) else putStrLn "Fail"  Ideally the guarantee is lifted directly from the 'if' statement. ### Not static Lift expressions to type level just like you've been doing. But don't have expressions for the predicate. Do that with type classes. Code: data Expr = ...  data Sing a e where -- Singleton . . . Sing a :: Expr e => a -> Sing a e -- fixed plus :: forall ex, ey. Sing Int ex -> Sing Int ey -> Sing Int (Plus ex ey) plus (Sing x) (Sing y) = Sing (x+y)  foo :: forall e. Pred (GT e 1) => Sing Int e -> String  You might want to change the way that expressions interact with types, but that's the general idea. I think by working with singletons, you avoid the extra complexity I was observing that arises from trying to compute images of larger types. ### GADT Syntax All this seems to be doing is duplicating the value level computation at the type level? I don't actually see that the types are adding anything here? What am I misunderstanding? Is you have a value level computation y = double (successor x) there is no point in simply replicating this at the type level, as it requires you to know 'x' before you can know the value of 'y'. This will only know if the constraint is satisfied at runtime, so you may as well replace the constraint with an "assert" statement. Here's an expansion of your example to a working program: data Plus a b data Gt a b class GT a b instance GT (S a) Z instance GT a b => GT (S a) (S b) class Add a b c | a b -> c instance Add Z a a instance Add a b c => Add (S a) b (S c) class Expr e instance Expr Z instance Expr a => Expr (S a) instance (Expr a, Expr b) => Expr (Plus a b) class Eval e r | e -> r instance Eval Z Z instance Eval a b => Eval (S a) (S b) instance (Eval a c, Eval b d, Add c d e) => Eval (Plus a b) e class Pred p instance (Eval a c, Eval b d, GT c d) => Pred (Gt a b) data Sing a e where Sing :: Expr e => a -> Sing a e plus :: Expr (Plus ex ey) => Sing Int ex -> Sing Int ey -> Sing Int (Plus ex ey) plus (Sing x) (Sing y) = Sing (x + y) foo :: Pred (Gt e One) => Sing Int e -> String foo (Sing x) = show x test = foo (plus (Sing 1 :: Sing Int One) (Sing 1 :: Sing Int One))  Now this works, doesn't need any SAT solver, and actually evaluated the type level expression in predicate, such that it will fail if the value passed to 'foo' is not greater than one. However, this is not doing anything that this much simpler solution would do: bar :: Int -> String bar x = if x > 1 then show x else undefined test2 = bar (1 + 1)  ### Not static You didn't need the SAT solver here because you could statically evaluate everything in sight. But what if you had: baz :: Pred (Gt e Two) => Sing Int e -> String baz x = foo (x minus one) This shows why you need a theorem prover in the logic and also I think shows why it's not true that this is just a static mechanism where you have to "know" x first. The type of x can be parameterized with an arbitrary expression and you have to make progress / prove things in that abstract environment. Aside: The question remains of whether this is a good idea. Surely once we've noted that automated type level replication is possible in this way, we shouldn't actually do it like this. We ought to just let the type system have access to value expressions, IMO. ### How to solve Right now I can't seem to get 'baz' to work, it does not seem to propagate the constraints properly. It is no longer clear if 'Pred' is a requirement or a guarantee, and there is no obvious constraint where the SAT problem is to be solved. The encodings I gave had a clear 'SAT' constraint that was either satisfied or unsatisfied. I don't see that here. (In answer to the aside, the way I would actually implement this is with a logic meta program that has access to the syntax tree, and can infer types, and access constant values and variable names). ### Guarantees vs. Requirements I could try to help you later, but I'm not particularly adept at Haskell's type class system. Any time I've tried to do anything even marginally complicated with it, I've found it to be a royal pain. I don't see the guarantee vs. requirement distinction as important. As I understand it, requirements are things you assume and guarantees are things you have to prove. In this framework, if you give an explicit type declaration to a function, then you're stating your assumptions (requirements). You'll encounter proof obligations (need for guarantees) any time you try to call another function from within it. If you don't specify any types and let type inference do its thing, then you'll get a bunch of constraints. If they aren't satisfiable then you won't be able to call that function analogously to the way that you can't call a function with a parameter in the empty type in a strict language. There is a little bit of a disconnect here since Haskell isn't a strict language. I'm not sure what the best way would be to make the type level lazy too. ### Lazy Type System Why does that matter, as you are trying to prove everything statically at compile time, you prove the constraints on all branches if the program. Alternatively, is that the difference why we don't seem to be able to agree on what is necessary? Are you assuming this happens at runtime as information becomes available, like some kind of dependent types? As for the bunch of constraints, this makes me think you are talking about runtime when you will know the actual value and can see if the constraints are satisfied or not by the value. If so I can explain the difference. To make your system work you need to dynamically insert code at runtime, for example an 'if' statement where the value is input or created, that tests all the constraints. This is in effect generating code from types. I think this is a bad idea because the programmer does not get to inspect and optimise this code. In my system this is replaced by path dependent types which make a guarantee (hence why you don't see the need for them). The guarantee takes the place if the machine generated code, and is written by the programmer. The programmer has full visibility of exactly what code will run, and the type system does what is always has, validates the code before allowing it to compile, there is no invisible code generated from the type system. So I think the guarantee vs requirements thing is important, in fact it is the critical difference between the approaches that allows one to operate without dynamic code generation. I think the example is okay, using GADT syntax with a single constructor is no different from a normal datatype: data Sing a e where Sing :: a -> Sing a e data Sing a e = Sing a  given: minus :: Sing Int ex -> Sing Int ey -> Sing Int (Take ex ey) minus (Sing x) (Sing y) = Sing (x - y) foo :: Pred (Gt e One) => Sing Int e -> String foo (Sing x) = show x baz :: Pred (Gt e Two) => Sing Int e -> String baz x = foo (x minus one)  with no instances of 'Pred' we get Could not deduce (Pred (Gt (Take e One) One)) arising from a use of ‘foo’ from the context (Pred (Gt e Two))  Which show it is trying to do the right thing, but it needs to be able to derive one predicate from the other. We can then attempt to define Pred. class Pred p instance Pred (Gt (S a) Z) instance Pred (Gt a b) => Pred (Gt (S a) (S b)) instance (Add b c d, Pred (Gt a d)) => Pred (Gt (Take a b) c) > baz one No instance for (Pred (Gt Z (S Z))) arising from a use of ‘baz’ > baz two No instance for (Pred (Gt Z Z)) arising from a use of ‘baz’ > baz three "2"  However we run into problems if we change the constraint: baz :: Pred (Gt e Three) => Sing Int e -> String baz x = foo (x minus one) Could not deduce (Pred (Gt e (S (S Z)))) arising from a use of ‘foo’ from the context (Pred (Gt e Three))  The problem is we lose the ability to specify the rewrite of one Predicate to another. This works, but has an infinite recursion, so will hit the depth limit rather than fail due to lack of matching: class Pred p instance Pred (Gt (S a) Z) instance Pred (Gt a b) => Pred (Gt (S a) (S b)) instance {-# OVERLAPPABLE #-} (Add b c d, Pred (Gt a d)) => Pred (Gt (Take a b) c) instance {-# INCOHERENT #-} Pred (Gt a (S (S b))) => Pred (Gt a (S b)) baz :: Pred (Gt e Three) => Sing Int e -> String baz x = foo (x minus one) > baz two Context reduction stack overflow; size = 101 > baz three No instance for (Pred (Gt Z Z)) arising from a use of ‘baz’ > baz four "3"  I am not sure if I can do any better at the moment. ### Laziness I think the problem is that we're cooking up a logic to reasons over the semantics of the base language, and the logic required for laziness is just more complicated. I vote to call this issue a red herring and move along. I think the meat of the discussion remaining is that you think runtime code generation will be required for this, but I'm not sure why. The expression parameter to a Sing type is phantom -- there's no need to generate any code for such a thing. I expect this to be a fully compile time way to check refinement types. Can you give an example of types that aren't be erasable? I think path dependence fits neatly with the above framework if we use a special 'if' that lifts to type level: if' :: Sing Bool e -> (Pred e => a) -> (Pred (Not e) => a) -> a So, if the then-branch is taken, then the expression defining the condition can be assumed to hold. If the else-branch is taken, it's complement can. (And 'Pred' isn't a very good name for this class, since it's really supposed to assert that the predicate is true.) BTW, nice job working all of this out! Hitting problems with overlapping / incoherent instances doesn't surprise me at all. I wonder if type families would have been better for writing eval in a functional style? (I'm not suggesting you try.) ### No Difference Then I don't think the laziness is as issue either. The 'if' looks like the kind of thing I was thinking, so it appears we are talking about the same thing here too. I will have a think about improving the implementation, and whether I prefer the constraint propagation, or type-interpreter based approach better. ### No backtracking Isn't part of the problem going to be that it's very difficult to write a proof search in a logic language without backtracking? ### Backtracking. An aside about my own direction at the moment: If you think about a type class with no value level functions, like Add above this can be rewritten in Prolog-like syntax as: add(z, A, A). add(s(A), B, s(C)) :- add(A, B, C).  So with that mapping in mind, my own language has backtracking at the type level. In fact it is basically a Prolog-like meta-language. Once working in a logic-language the type inference algorithm is much easier to write, and check it is correct. So much so that my project became more developing a logic language intended for proof carrying compiler implementation than the original language at the moment. I think the eventual language will be meta-level proofs in a logic language, coupled with a high performance low level imperative language for implementation. The interesting thing is many language constructs like type-classes do not need explicit support, as they become trivial meta-programs in a logic language with backtracking. ### Sounds good Yeah, that sounds nice. ### upcasting and downcasting... I approximately never want subtypes unless I'm using type as a shorthand for method dispatch. If I have 'dog' and 'cat' as subclasses of 'animal', and I have an animal.makenoise() method that both have specialized, I want to have some routine that takes an argument of type 'animal' and then makes a call to animal.makenoise() on that object. And if it's a cat I want cat.makenoise() aka "meow" and if it's a dog I want dog.makenoise() aka "woof." I don't want to have to write a dispatcher, or make a cast, and I don't want to have the routine that makes the call to that method have to care in any way what kind of animal it is. I don't even want to recompile that routine when I add the subtype 'cow' with its own specialized makenoise() method. In other words, I find the "dangerous" cast to be the only time I ever even want subtypes. Otherwise you don't achieve any abstraction so what's the point? ### I think a language with ducktyping and where object.super is not legal outside of a method and where there is no casting is a good idea. Even inside of a method super is very dangerous, it means that each subtype is totally dependent on the stability of its supertype. But I have no solution for that problem other than getting rid of inheritance altogether. Which isn't a horrible idea either. ### Interfaces without subtyping Dog and cat can both implement the Animal interface and you get what you want. An interface is the same thing as a type-class, and a type-class can be interpreted as a constraint on type. So you may want to write (using Rust syntax): fn do_what_I_want<A : Amimal>(a : &A) { a.makenoise() }  Where we are saying 'a' is a thing that can be any type 'A', as long as 'A' implements the 'Animal' trait (interface). There are no casts involved at all. ### Are Cats Animals? The people of Rust haven't decided how to handle variance, yet. ### Variance. It has runtime polymorphism so I can have a collection of objects that support the animal interface. What does it need variance for? I haven't missed it yet, and I never missed it in Haskell. Edit: reading the docs, Rust subtyping is only used for lifetimes, so variance is only an issue with lifetime markers. Reading the discussion about whether lifetimes are co- or contra-variant in different positions I already feel this was a design mistake. Would lifetime polymorphism be enough? ### Minimal Type Annotation OOHaskell also tried to work out what can and cannot be inferred (there's a lot in the paper). For example printable_point x_init s = do x <- newIORef x_init returnIO$ nominate PP -- Nominal!
\$ mutableX .=. x
.*. moveX .=. (\d -> modifyIORef x (+d))
.*. print .=. ((s # getX ) >>= Prelude.print)
.*. emptyRecord

printPP (aPP::N PP x) = aPP # print -- accept PP only
printPP’ o = printPP (nUpCast o) -- accept PP and nominal subtypes


The implementation in Haskell is the proof of soundness (assuming Haskell is sound), the syntax could be replaced by whatever syntactic sugar you like.

This shows the creation of a nominal type 'PP' without any type annotations. And shows a function that accepts subtypes without any annotations.

The actual function that accepts a nominal type needs an annotation, but if this function were part of an interface or class definition the type signature is needed in any case to define that interface or class.

Hence outside the definition of type level constructs like classes and interfaces (which are type level constructs to start with) no type annotations are necessary.

### JVM-related warts or not

I'll answer separately on soundness, here I'll stick to the JVM.
Some warts are clearly due to the JVM. For instance, a member of a class can be a method or a function — the distinction is mostly due to the JVM. Many parametricity violations also come from there. Oh, and erasure — the problem is not erasure itself, but the fact that the JVM requires only some erasure, and this becomes very messy.

Should you be comparing to Haskell: note that Scala is a language from the ML family, so strictness, implicit effects and subtyping were there on purpose (https://twitter.com/Blaisorblade/status/701800572173869057). Subtyping comes from making the ML module language (which *has* subtyping) first-class.
(The best example is the lack of instance coherence in Scala, but that's clearly not related to the JVM or to your comments, so I'll keep that out).

If you want a purer version of Scala, I think you should look at what Derek Dreyer and Andreas Rossberg (and others) have been doing with MixML, 1ML, "Modular Type Classes" and friends in the last decade. Where Scala has IIUC still extra expressivity and design differences.

An important design difference is nominality — ML functors are structural, and I remember Rossberg insists on that. Scala has to be nominal because of the JVM; I remember that papers on Scala really argue for nominality, but I can't remember their argument.

(And of course Scala is less finished compared to Haskell and that's annoying, but I'm leaving that out, and in part I'm assuming an "idealized Scala").

### Intentionally abandon soundness?? Nope

Yes, Scala has soundness problems, but that's not considered "a feature" (there's something to your claim arguably, but the correct version is very different) — that would be appropriate for Dart. And actually, since Dart explicitly claims to trade soundness for simplicity, it is an interesting example of what Sean is claiming. If possible, somebody should test that. But Dart should be another subthread.

First, people are working on soundness, with Coq proofs recently announced on ArXiv.

What *is* a feature is that things that are obvious in Scala are not there in ML because ML researchers considered them "unsound" while in fact they're just much harder to get right. And instead of just writing papers, Odersky did what he thought was enough foundational research and made Scala successful in practice (well, it's not JavaScript, but it's also not a research language like Clean or Mercury).

Then it became clear that Scala and Scalac were collapsing under their own complexity, and Odersky & C. went back to the drawing board and to soundness proofs. This is not the standard academic strategy; however, I'm not sure you could pull this off otherwise. Without Scala's success, it's hard to justify the amount of resources spent on proving it sound. If a PhD student did that, it'd be brave, but almost a career suicide, academically speaking.

Viceversa, if you tried to write Scalac with Haskell's level of rigor, it'd probably take 10 years and hundreds of papers, and you'd have trouble getting them published. To some extent, I suspect here peer review would slow down progress (you can only publish a paper if you convince reviewers, so what you propose can't be too different). Though OTOH related research gets published, even favorably — heck, Pure Subtype Systems (which is closely related) *was* published at POPL without a soundness proof.

Maybe I mixed up Dart and Scala there. Have you read the OOHaskell paper I linked to above? The fact that you can express a translation of things like multiple inheritance and subtyping in a sound type system is a proof of soundness by translation.

### I have seen that in the

I have seen that in the past, but never read it in full I'm afraid. (I just rechecked the abstract). Not a 100% sure why that's relevant: are you implying that Scala's a bad idea and OOHaskell has enough expressivity?

But the problem with Scala is with type members and upper and lower bounds, which I'd strongly guess go far beyond OOHaskell (type members aren't mentioned). If you care, I could try to summarize why type members matter (or point to the literature on ML modules).

Without bounds, you could stick to 1ML and translate to System Fomega (or even System F), following F-ing modules (as the 1ML paper does) proving soundness like you propose. System Fsub has upper bounds, and IIUC nobody tried to exploit that to extend 1ML with... type refinement? Not sure what you'd get.

Either way, even just adding lower bounds made things much more complex with standard proof techniques. The issue is that during reduction, since you have two bounds that can be refined independently, they can become contradictory — and if a type member T is a supertype of Any but a subtype of Int, you can cast anything to T and then to Int, violating soundness. For a long time it wasn't clear how to prevent that, until http://arxiv.org/abs/1510.05216 showed why this can't happen.

### Type Classes

Can't upper and lower bounds be implemented in type classes? An upper bound type is just a kind of constraint on the type parameter, something that can be soundly provided by type classes.

f :: (SubtypeOfX y) => y -> y


Casting is part of the problem, In my opinion, good language designs do not need casting. It is the attempt to recover type information (which in my opinion is inherently static) from runtime state that is the problem. I don't think you should do this, simply use open discriminated union types instead (although I am not aware of any languages that do provide them).

Edit: Trying to express this more clearly, the unsoundness comes from situations at runtime where you think you have type A, but in fact you have type B. There is a limit to what you can prove statically (a type system is a limited proof system). If you can prove you have type A when a type A is expected then the program is sound. Static polymorphism is easily handled by static type techniques. If you have runtime polymorphism then you cannot say what the type is, although you can restrict it to an abstract interface (which is type-class), hence existentially quantified types constrained by type-classes can represent runtime polymorphism.

### By far not enough

I'm afraid your comment seems overly underinformed — if that's due to my summary, I apologize.

Your example has a limited form of second-class subtyping (you can't even abstract over X); Data types a la carte is already better. Scala is about extremely first-class modules — in the dynamic semantics for the DOT soundness proofs, types must be passed around at runtime (well, that's already needed for Fsub as they show).

Also, I'm confused what you mean by casting. Do you mean downcasting? That does not exist in DOT, and why would one expect soundness with that? Do you mean the subsumption rule, that is, upcasting X to Y when X

I didn't claim to explain the problem, only to mention a few of the keywords. Also, note that the problem is born when you're using subtyping to encode an extremely expressive module system with abstract types. Your example can't even abstract over X, and I bet it cannot encode DOT. If you want to try, please consider the actual DOT literature.

So here's a short and incomplete summary of the issue: Scala allows expressing T.X

You might wonder why that can't be avoided, or why that expressivity is needed. Scala people have written papers at FOOL 2012 and OOPSLA 2014 showing why the issue is hard, and now have 2 papers on the successful soundness proofs. I won't try to summarize them.

What should be clear is that showing only how to deal just upper bounds is not addressing the problem, unlike many think.

Also, similar problems can show up even *without* any subtyping. For instance, a similar problem shows up with typecheckers for dependent types — here's a note I wrote on the relation years ago:

Also, to give an idea of the complexity of DOT, remember there are around 25 years between Fsub (beginning of the 90s) and a sound DOT. Full Fsub is already surprisingly tricky (compared to kernel Fsub).

### Can't fully separate engineering and formalization

Yes, some Scalac bugs are just bugs. But I think that's not true for all ones — for some bugs, people simply didn't know what should the compiler do, and some of those can't be solved without a better formalization. And (arguably) Odersky went back to a formalization because he thinks so too. I think he's not doing math for the sake of math, and I'm sure Tiark Rompf isn't either — he calls himself a pragmatist.

To be sure, I use Scala, Haskell and (a bit) ML, and I totally agree on Scala's expressivity advantages. But I also think some of this expressivity has usability problems — how many Scala developers master singleton types and can use the cake pattern well? That none read "Scalable component abstractions" doesn't help, but I think even that paper is not the full story about the cake pattern.

### Anyone who uses Scala heavily will find lots of bugs

Anyone who uses Scala heavily will find lots of bugs

I am interested in the current state of Scala. Any real show stoppers, like fundamental flaws?

### Confused on Lisp

I'm not sure what you mean there. Yes, even in the simplest language you can still write programs that are too big to fit your head. And yes, a "simpler" language can give you less support for managing that complexity. For me, a clear example is that ML modules (or their Scala equivalent) allows you to enforce modularity and manage bigger programs. Some claim that Haskell can't scale because it's non-modular, and I think this claim deserves investigation. Haskellers should look at the Backpack paper for a version of this claim signed by Peyton Jones :-).*

But I don't get why the simple semantics of Lisp make this problem worse, and since you use this example again, I'd like to understand it better.

Unless you refer to dynamic scoping, which simplifies interpreters (no closures around) but complicates using the language. But I think PL theorists consider lexical scoping mathematically simpler, or at least more elegant, than dynamic scoping, though this point is probably arguable.

*OTOH, Haskell's successes suggest that "writing a nice, clean program, instead of screwing together modules" is maybe not as unworkable as we thought, which fits some of Thomas Lord's claims. No, this is very far from proven, and all I'm saying is a lie under some meaning of "modularity". But there's something to be understood.

### Usability is a holistic

Usability is a holistic property. If complexity in your language can simplify your programs, then you might still be winning as long as you can appropriately internalize language complexity. Minimizing the number of moving parts that a programmer has to think about is key of course (i.e. modularity), but when evaluating language usability, you have to look at actual programs.

I come back to Lisp because it is based a on a few simple concepts...most obvious is syntax. It is the simplest syntax in the world, but many people don't really like it, since it can be very hard to handle as code grows large (and all those parens...).

### Scheme

has the most powerful constructs, but unreadable syntax. And don't tell me that you can't have a grammar and be homoiconic. Prolog is homoiconic.

And the hygienic macro system is a bad design. The opaque token system under it is a bad design - they broke sexpressions by making tokens. They also should have avoided making it a restricted DSL, they should have made it out of scheme calls and much more powerful.

Lisp isn't as powerful as scheme, and it's too big.

but you can learn a lot from programming in either language.

### Scheme++?

I believe you have opinions about how to move on past those issues :-), would you be so kind as to summarize what language(s) you see as making the most forward progress?

### I don't have time for an essay right now

and my idea of how to move past those issues isn't the same as copying other languages.  maybe I'm taking a lot from Prolog.

Uhm I like scheme's smallness. The language can be described in a small document and implemented in a short program. I think those are useful properties.

Lua has those properties and because of that is popular as an embedded language.

I happen to like its syntax for reasons I've mentioned at length in the past and so won't go into here. I also find the fact that it uses the call stack differently than other languages useful. Lua allows the number of returned values to mismatch the number of values expected. Same in the other direction. That turns out to save a lot of glue code.

But besides designing a syntax to be simpler and easier to scan, my answer for how to make one that has those properties AND is homoiconic AND is extensible is to go back to operator precedence grammars. Operator precedence grammars compose more easily than the usual kinds of context free grammars.

I would also add the innovations of distfix operators (ones that have more than one symbol and possibly more than two sides), and the innovation of allowing the user to define more than one kind of expression or context. So, within the delimiters of a distix operator might sit a totally different kind of expression that parses differently. Or even the same expression at a specified level of precedence (which can be useful for obscure reasons).

Operator grammar expression can be matched on directly, as happens in prolog, or they can be automatically converted to s-expressions and back. So there is nothing about this design that makes walking code harder than in Lisp.

As for macros and hygiene, all one really needs is to be able to specify if each symbol in a macro is:
1) a local variable being created in the macro (and as such should have a unique name)
2) refers to a specific definition visible in the macro generation environment
3) refers to something that should be visible at the top level of module being compiled into.
3a) something new that should be DEFINED at the top level of the module being compiled into. Also, should it be exported with the module?
4) should not be renamed, ie is locally visible under that name
5) should come from a custom environment, such as a name from a class binding
5a) should be defined into an object/custom environment. ie creating it rather than finding already there.
6) is a user supplied name, taken from the code itself, so its meaning depends on the original context. It is dangerous to manipulate these and there has to be library code that can do it and preserve its referential properties. The worse case is moving a local variable to a place where its name is hidden. See below for a mechanism to handle this case.
7) is a user supplied name from a macro parameter list, its meaning is decided by the intended use of the macro, and the macro programmer has to be able to give it the appropriate context such as 1) or 5)
8) should come from a chain of bindings at expansion time, say try 5) if it fails try 3) if it fails try 1) if it fails try 4) if it fails then error

There should be a textual way of specifying where an id looks up its binding, something like module_name::id. Macros should not depend on opaque properties of tokens.

The worse case mentioned above in 6) requires that lambdas be annotated with unique names so that they can be referred back to reliably. Note that the lambda that must be an annotated can be OUTSIDE of the macro proper.

Also, I like fexprs as a way of dealing with equations and code that writes code at runtime. But I'm not happy with John Shutt's definition. I think for the sake of efficient compilation and ease of human understanding, it's important that that every time a specific symbol is seen in a specific block, it necessarily refers to the same thing. You shouldn't be allowed to reevaluate what a symbol refers to, that should be set wherever it appears in a lambda/let/define or library include - and there should be no way to change it other than those. One exception is that it's useful to be able to create top level definitions from somewhere other than the top. The fact that 'define' in scheme does something different at the top level feels like a mistake.

These are the kinds of things you only notice when you're trying to make advanced macros such as object systems.

Another thing missing from scheme is the easy to use matching (and convenient for matching) backtracking search that one has in prolog. If one combined the macros with prolog matching, and combined fexpers with prolog matching (only different in that the variables in a fexper have already been bound) then one has a something rather more powerful than before. This is only an innovation if you think that prolog is innovative. So we're catching up to the 70s

Search and reified search and lazy evaluation and logic programming, and (possibly) sat solving are other areas I could write essays on (in that last case not yet!) and am interested in, but I'm going to keep it short:

Constructs that can change the meaning of whole the rest of the program:
1) usually considered evil. But that includes the now important threads - so everyone is busy trying hobble the rest of the language to be more robust against the existance of threads. So
1a) the new evil isn't threads, it's state because they're a dichotomy
2) Continuation
2a) amb and searches
3) laziness, memoizing and memory leaks

I don't consider these things evil. I hate languages that try to force you into one choice, one side of a dichotomy:
1) So Clojure which tries to banish state in favor of threads is bad but in a different way than languages which lack threads

You should get the choice.

I have an idea for dealing with continuations/amb/search:
define stackless functions as a type, distinct from stack-using functions. And only stackless functions can be captured in a continuation or reified search.

Make a call to a stackless function take a different syntax than a call to a stack-using function.

In this way search/amb/continuations don't change the meaning of a program because they never appear where they weren't expected.

One idea has to do with memoizing and also possibly reified search.

The problem with laziness is that you can't specify how much memory is set aside to memoize what, and you can't specify which values can be thrown away and recalculated.

I'm not interested at the moment with fixing lazy memoization, but am interested in the easier problem, but more useful, of doing that sort of thing with search.

So one can have a sub language that's convenient for saving partially completed searches and expanding them.

And you can specify how much space to set aside to store leaves. And when you run out of space, then it automatically switches from purely stored leaves to iterative deepening.

Enough ideas for now? I wonder how many I'll decide I forgot.

### thanks!

great food for thought!

### fexprs and textual representation

It occurs to me that the variable reference problem changes for fexprs.

I'm starting to think that evaluation should have at least two phases.

One gives lambdas and other contexts unique names and annotates variable or fexprs to refer to their context names (note that for fexprs the context will be to code that is not even in the text).

Then it takes a second phase to actually evaluate the code.

In Racket, tokens have some fields for this that can't even be examined and can't be controlled properly. Some, perhaps, that can't be set initially except in limited ways. It's a horrible mess that makes a lot of things impossible, or very hard.

That's what I want to avoid.

So, maybe for me a token looks like (*token* [association list]) before the first phase. In any case I demand that all code have a valid textual representation

### support for search and other features

Other ideas involve support for search.

Code blocks with an undo section that can only run on backtracking.
Logical variables.
Mutable logical variables (ie mutations reverse on backtracking).

Multiple search strategies.

Nonchronological backtracking. If you can detect the source of a problem, then you backtrack all the way back to that problem. That could mean a non-local cut feature - pass a label then cut to it then fail or it could be a nonlocal fail-to feature.

Note that most complex sorts of reordering search could get to be more like lazy eval and actually reorder the program, but I haven't worked anything like that out. But it could be useful if you want the ability to create a Curry-like sublanguage. That's a lazy functional logical programming language that uses 'needed-narrowing' and is based on Haskell.

Laziness is interesting. I'm not that interested in it, but I am interested in supporting multiple dialects in the same code, so laziness is an obvious dialect to have to demonstrate having such...

Another feature I like is duck typed objects and Smalltalk like syntax that names each parameter every time the message is sent and does not need parentheses at leas for single level expressions.

I like reified messages, a-la Smalltalk. They simplify a lot of code if you can just save a message and send it to any random object later.

So along with stealing from Prolog I want to steal from Smalltalk. So here we are trying to catch up to 40 years ago.

### continuation aware/thread aware segregation and other questions

I've been thinking about how to apply the principle I had to segregate continuation vulnerable parts of a program from the rest and how to apply that to threads.

Perhaps thread visible variables could be segregated from other variables - say you have to access them through a lock, a dummy lock or through an atomic access lock.

The idea is not to make races impossible, but just to make the code that could have races obviously set apart from code that could not.

Another question is, what would I call a language with these features.

There is no point calling it a Lisp or Scheme. The committees who control the standards move too slowly - they would not be open to radical features like these.

Another question is:
Are cons cells obsolete? They're a data structure from a time when memory was expensive. We could go with some other basic data structure. For instance, lua's unified table/array/queue/class/object catch all. Lua is missing a "copy" construct but I would put one in. But having a catch all like that seems inherently unfriendly to threads.

### I like your train of thought here

Continuations as a kind of jump are slightly too raw, like a goto, so analysis is hard when evaluating impact of potential flow of control on desired relationships (such as: all these conditions taken together obey an invariant under concurrency that can occur).

So more structure would help. The idea of structured programming was originally a reaction to wild-west-style use of goto, especially in assembler. Higher level patterns are easier to evaluate, and can follow standards. Control flow in cooked form is easier to analyze than raw, and coders are more likely to follow convention recognized by readers, so maintainers are more often correct in grasp of intent and detail.

To avoid historical connotation of "structured", because we don't really care about history, one can use synonyms like orderly, methodical, framed, tidy, systematic, disciplined, etc., and interchange them to prevent fixation on a narrow definition. All we need is the idea of system, plan, and patterns of usage that follow rules. It can help to use one word when being vague on purpose, but another when saying something precise. For example, "way" is very vague, even too vague, so people won't register it any more than preposition "of".

When the problem is threads or fibers, a language should be able to see them as features, and whether control flow usage is methodical, honoring state access patterns intended to preserve invariants (serialized reads and writes on these several related data structures). It's hard to evaluate "who" has access to things and whether throttled by synchronization, when there aren't any rules or patterns lending structure to methodical organization with a systematic plan. (Yes, laying it on thick there, as a wee joke.)

You need mechanisms that identify all parts of something inside a critical section, and the ways in which one enters and exits that critical section, while making this clear to the original coder and later readers. Relying on "it's obvious after you understand all the code" is poor form and a poor basis for automated checking. Monitors are a really old school way to frame part of this, but threads and fibers need to be added to the plan.

Now a parting remark on optimism. :-) In 1990, a bunch of C++ coders said, "I know, let's just make each of the primitive data structures thread-safe." That would take care of it, you see: no more races. The interesting part here is that this plan was not obviously stupid to folks involved, and required a walk-through to outline the problem. What if A and B are thread-safe individually, but a combined (A+B) at the same time is what you need thread-safe? If you protect the relation of both at once, the thread-safety of each individually is redundant and a cause of cycle-wasting lock contention ... so you don't want A and B thread-safe separately then.

A language should have features that frame concurrent execution so you can see it in code, as well as organize things that need protecting in critical sections. Leaving threads and critical sections to be handled entirely in a coder's mind is a recipe for problems.

### Showing where non-local effects are

As a side note, I originally came up with the "two kinds of calls" thing out of necessity rather than as good practice. I am implementing continuations on top of Lua as a CPS transform. Lua guarantees tail call elimination in all cases, so that's not a problem. But if I want to coexist with preexisting libraries, then I have no choice but to treat continuation-allowing calls differently.

But it was an example of a DIFFERENT principle that excited me, that of having apparently incompatible techniques coexist in one language. Specializing is the essence of optimization, but different specializations are not compatible with each other...

But THEN it occurred to me how deeply clarifying it is to have to specify which calls and which code has to be reentrant.

I don't have a way of making multiple threads and their data quite as clear without limiting what techniques people use. Sadly, even if variables that are globally visible (ie shared across threads) are marked both in declaration and use, the fact is that visibility is transitive in multiple ways and
1) data points at data
2) data transferred from a globally visible variable to a thread local variable may still be globally visible.

My idea was only to mark the declaration and accesses to the globally visible variables - which doesn't solve the transitivity problem. And I don't WANT to solve that, because any "solution" will limit what programmers can do. So at best this is a hint, and it's up to programmers to heed that hint and not lose track of what it means.

I was NOT intending to force code to be structured to any greater extent than C++ does with locks. And in the case of dummy locks (say for initialization and times when you know that the data isn't REALLY accessable across threads) not at all, and also in the case of atomic instructions, also not at all.

I don't want to limit what algorithms people use, because it is precisely within this kind of code where optimizations can be significant.

If people need to program come up with torturous nonblocking algorithms using with various C++ like memory models and atomic access, I don't want to stop them. And I DO know that they're so torturous that published algorithms are often flawed, and I've put flawed algorithms into products myself.

Notice that for concurrent processing we don't have anything like offsetting calls the way we do for continuations.

Really the best answer I have is to supply a bunch of safe, nonblocking queue data structures and suggest that programmer do NOT share data between threads except by handing it off from one thread to another unshared.

If you have more ideas, let me know. Maybe small grained concurrency would help. Loops that can be concurrent. I don't know. Shared data concurrency likes non-mutable functional data structures that instanciate unfilled portions rather than mutate.

Maybe we should provide however many of those will solve problems.

Another trick that isn't used except in advance systems is that of:
1) Aggregating unsafe work for later
2) counting out the sharing threads, and doing the aggregated work, and then restarting the shared threads, ie. PHASES

PHASES and QUEUES are the techniques that I'm familiar with. All this other fancy stuff I'm not, and I wonder if they work as well.

Maybe phases and queues are a better answer for highly parallel work after all.

My final question is, should non-visible variables be FORCED to be non-visible by making them all thread local. Should we prevent the programmer from cheating? My feelings are mixed. On processors with lots of registers like 64 bit intel, I feel "why not?" but it kind of runs against my "don't dumb down, don't forbid, don't make people pay for what they don't want" principle.

### showing more everything sounds good

(Your long attention span is refreshing. I have so many verbal conversations where folks assume my point will be in sight within seven words, which is never true.I can't even identify things I want to relate in seven words, so I get interrupted part way into naming a first. Not even prefacing remarks with "I am about to relate three things, the first is..." will work. Only drawing an introductory picture first works; then it becomes obvious nodes and arcs will need description. A picture is essential for random-access rejoinder in discussion: you can point at the answer and elaborate. The funny part is when folks ask if I need to take a picture of my drawing afterward. I offer "I'm the one who drew it" as polite reminder. People are so weird. It's apparently impossible I could have been thinking that before I drew it. Projection seems likely.)

Adding structure (or orderliness, method, framing, system, discipline, etc) might end up being an add-on kludge when living inside an existing system. Even if you start from scratch, only so many things can be close to one another in a flat two dimensional text representation. Bandwidth is limited, which sucks. There might not be a nice partial ordering that makes one concern most important, and therefore deserving of being outermost at the top, inside of which everything is embedded as a dangling detail.

Writing things down separately as top level concerns and then relating them has the problem that you can relate them incorrectly, or in contradiction, which violates a desire to say important things once only (in the best order). I prefer important things get expressed, explicitly, so it can be checked by both people and machine -- even if this must be redundant and not all in one place. I would rather admit potential error in expression than not be able to express.

So basically I very much approve of your specifying explicitly what code is re-entrant, or any other salient detail a person should note. That's a great strategy. Actual realization is just tactical detail negotiated with your users.

In code people need 1) a way to get more than one thing happening concurrently, 2) a way to keep them from interfering by default in some kind of domains that lack direct overlap, and 3) a way to order effects when interference is going to happen because local semantics requires it (we're sharing this mutable thing). The more implicit those things are, the harder to check and easier to forget to consider. Simplification can go too far, with failure caused by mess below threshold of perception. Everything needs a way to pop the hood and check the engine.

Really the best answer I have is to supply a bunch of safe, non-blocking queue data structures and suggest that programmer do NOT share data between threads except by handing it off from one thread to another unshared.

That sounds perfect, if queues are easy enough to perceive and reason about, perhaps expressed as some other related thing (people sometimes forget pipes are basically queues). Being able to make data explicitly immutable is good too. And when it must be mutable, provide a way to manage it properly with your favorite flavor of sync primitive. As long as a dev can go into the blueprints, point at a gate, and correctly predict what happens when this one gets stuck.

In the vein of "more ideas", the complex ones I have are about large scale organization of concurrent elements, and being able to track history of interactions. (I want green threads to run with command lines so lightweight processes have shell histories despite being embedded. You can have pure binary interfaces, but debugging may be miserable.) Simpler ideas are more about optimization, like keeping context switches from sucking in fine granularity. To keep from murdering CPU cache-line data caches, you want control flow to follow dataflow, so message hand-off tends to activate the next green thread in the same pipeline, all running in the same native thread timeslice. Basically you want scheduling to wake nearest neighbor any time a fiber parks, for data locality.

On the topic of forcing compliance with some systematic orderly method, it seems a good idea when it reduces the amount of hard radiation spewing software particles through a process address space, corrupting memory relationships. Rule-breaking can't really be done in volume and remain stable. Once there are hundreds or thousands of places where a rule was broken dangerously, there's no good way to triage when debugging.

### Update

Spurred by your question, I've worked for a few weeks on parser that may eventually turn into my Scheme++ (though the underlying engine is Lua not scheme so that will affect things).

The main conceit of the parser is that despite the bald assertions to the contrary in all of the literature, it IS possible to build a computer language entirely out of an operator precedence grammar, especially if you allow distfix operators. That's important because operators compose much better than other kinds of grammar constructs.

As an example, here is a definition, from actual code, of "if then" etc. as a distfix operator:

def_dist_data(main_exp,
{ ['if']={ initial=true, follow={'then'}, semantic_in = {__exp}},
['then']={ follow={'elseif','else','end'}, semantic_in={__statement} },
['else']={ follow= {'end'}, semantic_in={__statement}},
['elseif']={ follow={'then'}, semantic_in={__exp}},
['end']={final=true }, semantic_out={__statement}})

Since this is Lua code, some table keys have to be quoted as [''] because they're also Lua keywords.

This example shows some subtle problems, partially because I'm trying to support a grammar that isn't really suited to an operator precedence specification (this example being a Lua grammar).

Note the semantic attributes. In a grammar better suited to OPG, the difference between a statement and an expression (if there was one) would be syntactic, making semantic attributes unnecessary in parsing. But to do that, then the two sub-grammars, expression and statement, would always have to show up enclosed and separated by terminals... And that doesn't happen in Lua so the subgrammars have to be merged and the illegal combinations of the two disallowed by matching on semantic tags.

If the grammar was more suited to OPG then the definition would look like this:

def_dist_data(main_statement,
{ ['if']={ initial=true, follow={'then'}, exp = main_exp},
['then']={ follow={'elseif','else','end'}, exp = main_statement },
['else']={ follow= {'end'}, exp = main_statement},
['elseif']={ follow={'then'}, exp = main_exp},
['end']={final=true }})

The semantic tags are gone replaced by subgrammars, the grammar it's in is main_statement instead of main_exp, and the whole statement doesn't need a semantic output.

In the actual grammar, disambiguating, for instance, the various uses of the comma involves complicated matching on tags. That does show, I suppose, that this method fails that simple compose property when the grammar isn't suited to it.

This is an example of what that looks like:
def_op(main_exp,',','left',1000,{
{{__par_list,__exp_list},{__name_list},{__elipses}},
{{__par_list,__exp_list},{__id},{__elipses}},
{{__par_list,__exp_list},{__name_list},{__elipses}},
{{__name_list,__exp_list,__var_list},{__id},{__id}},
{{__name_list,__exp_list,__var_list},{__name_list},{__id}},
{{__var_list,__exp_list},{__var},{__var}},
{{__var_list,__exp_list},{__var_list},{__var}},
{{__exp_list},{__var_list},{__exp}},
{{__exp_list},{__exp},{__exp}},
{{__exp_list},{__exp_list},{__exp}},
} )

The first of each clause is the semantic output, the next two mean (for left and right) "must match at least one of the listed tags"

As I said, in a more suitable grammar, none of that matching would be necessary.

Once I get the parser finished I can try to implement the rest of language in various possible syntaxes. The first goal is to have a version that translates these languages into lua in a dynamic system.

### Non-scalable Lisp syntax?!?

I'm genuinely confused you're talking about syntax. And about syntax not scaling to large codebases, and that being somehow obvious? I must be missing something huge. You know Kuhn's incommensurability — which I learned from Dick Gabriel's Onward Essay? A HUGE case of incommensurability — and if I didn't know you, I'd just take if for nonsense — but I do know you.

Going to the merit. If you're ever counting parens in Lisps you're doing it wrong, as a colleague explained me; just add them till you close the right one, and auto-indent the program to show structure. I used to hate s-expressions, and then learned to use them just fine. I still wouldn't write a numerical algorithm without adding syntax sugar for array references (which you can) ;-), but that kind of case might be about it.

Of course, you need to use the right workflow, but the rules are simple — and until we get smarter-than-us strong AIs, we'll always need some rules when programming.

Also, after my Racket experience, I'd say s-expressions are closest to projectional editing that is usable today — my editor is always a lexically valid s-expression (unless I decide otherwise), essentially by simply having electric parens done right. At least Eclipse's support for Scala is (in comparison) horrible, though that's a most unfair comparison.

### I look at this comment and I

I look at this comment and I have no idea what you are replying to. LtU is just too deeply nested, like a Lisp program.

### Presumably the problem isn't

Presumably the problem isn't how deeply nested it is, but how well or badly the nesting is presented. In both cases.

### What he's responding to

It's easy to see what he's responding do, just line up the left edge of his comment and scroll up till you find a comment that goes further left.

Sean might be confused because Blaisorblade misunderstood the comment (also by Sean) that he's responding to, because when Sean wrote "but many people don't really like it, since it can be very hard to handle as code grows large", he wasn't talking about databases of code growing large, he was talking about large snarls of syntax-free code being hard to understand and maintain.

### Touchscreens

It sounds as if you are using one of those old fashioned spinning wheels mounted on a mouse, possibly with a keyboard acting as an accomplice. On a touchscreen, or even a mac touchpad where the scrolling is 2D, it is easier said than done. I guess we have to throw precision under the wheel of progress.

Reading the rendered Drupal output is horrific - significant whitespace is only usable in a local region, err like in lisp, not globally. But the html that it generates is very nicely structured. Am I to believe that the LtU readership is not using the more readable alternative? Use the source luke.

### PL designers are not

PL designers are not graphics designers, just look at how ugly our languages are! Knowing how to code also doesn't make one a web designer...the best web designers spend all their time designing and very little, or none, coding.

### Nesting is one of those

Nesting is one of those things where each level adds increasingly amounts of complexity. So 3 levels, ok, no problem. 10 levels, 100 levels? There is a reason why facebook only allows two levels, why menus can't be more than 3 levels deep, why programming languages hide nesting with syntax even though we all know its still technically there. Somehow people just begin freak out at deep nesting, it creates anxiety.

### Ironic and perfect

People who can't even follow conversations on this website due to how badly the human eye/brain follows nesting, have no right to ever say again that indenting editors make Lisp as readable as other languages!

### feedback

I wrote a rather general introduction (for non-specialists) to the mathematical approach to programming language design. [...] I think the claims in this text could make consensus across many LtU members. Feedback is welcome.

There is some vagueness in the first paragraph that I think suggests deeper problems. At the risk of sounding nit-picky, let me try to reach the deeper issues via the superficial ones:

Humans programmers have invented many different symbolic representations for computer programs, which are called programming languages.

"Human programmers have invented": To the best of my knowledge, all known programmers and all known inventors are human.

"symbolic representations for computer programs": This description applies to flow charts and to the stylized prose Knuth uses to express some algorithms in "The Art of Computer Programming" (yet neither is normally considered to be a programming language).

"representation": Every representation has something which is represented: evidently there is "the program" and "the representation of the program". Yet in vernacular language, the text 10 PRINT "hello" is a program in BASIC.

I'm afraid that what might be said in that first sentence, boiled down, is that programming languages are languages for programming.

The aim of this first paragraph does get a little clearer here:

One can think of them as languages used to communicate with the computer, but it is important to remember that programming is also a social activity, in the sense that many programs are created by a collaboration of several programmers, or that programs written by one programmer may be reused, inspected or modified by others. Programs communicate intent to a computer, but also to other human programmers.

That is clearly trying to say that programming languages have a dual character: programs "communicate" to both computer and programmer, but what does it mean to "communicate with a computer" (is that different from "supplying input to a computer"?) and what, exactly, is being communicated?

What seems to me missing here, in the first place, is acknowledgment that computer programs are interpreted by computing systems (e.g., executed, run, etc.).

If we focus on the automatic interpretation of programs, it becomes apparent that the subject of every programming language is a particular space of possible executions. The Java programming language is a notation for a large set of computations that can be done on a JVM. Scheme is a notation for a set of computations characterized by a particular abstract syntax and denotational semantics. C is a notation for a set of computations characterized by various operational semantics prose standards for C. That kind of thing.

Many of the PLT breakthroughs from the earliest days through the mid-1970s are concerned with how, in general, to characterize "spaces of possible executions". Here I would point to advances in the development of operational, axiomatic, and ultimately denotational semantics.

Much of that early work asked "How can we precisely and usefully characterize the semantics of a program language vis a vis the meaning of programs in that language?" where the meaning of a program is understood as some essential facts about its faithful execution.

With hindsight it seems, in retrospect, that the question of defining programming languages was definitively solved by the end of that period, the mid-70s.

By 1976 we have the operational, axiomatic, and denotational approaches and everything since, in the area of defining languages, refers to these, perhaps elaborating them.

In parallel with developing theories of how to define languages, historic PLT research has also addressed the questions of what possible definitions of spaces of executions are of interest.

Thus we have since the 50s or so a proliferation of families of languages, each family representing a distinct concept of "possible executions". Here I would non-exhaustively name: algorithmic languages, logic languages, functional languages, and actor languages. Each of those broad categories further splits into many variations. So, for example, algorithmic languages might be broken down into stack-discipline languages and languages with more general continuations. On a different axis, algorithmic languages might break down into languages that use "machine" data types, and those with some more abstract idea of types (like Scheme). Logic languages might be broken down according to various approaches to negation. So on.

Recently discussed examples such as programming languages for knitting machines or programming languages for machine learning suggest that it is far too premature to declare we are done inventing new "spaces of possible executions" -- new families of languages.

In any event, the subjects of programming languages are their corresponding spaces of possible executions and, indeed, programming languages express possible executions in a form that can be automatically interpreted by a computing system, and ideally also in a way that helps people communicate with one another about programs.

In that light, the assertion that there exists a "mathematical approach" to PLT concerned with:

• concision
• consistency
• clarity

seems cramped.

In particular, there is no indication there that creating new execution models is a big part of the game.

Furthermore, it is nice to see mentioned that programs are a medium of human communication and field of human cooperation but it is notable that that insight is immediately abandoned and in its place, we proceed with concern for the anecdotal "frustrations" of individual programmers.

For example, it seems to me that the mathematical structure of modularization in a particular language -- what information appears on which sides of various modular boundaries -- must have something to do with the kind of institutional organization and patterns of communications human programmers engage in. For example, what information two departments of programmers must (or ought not) share with one another about their code surely has some relation to the precise semantics of modularity in the language they use.

What I'm left with, finally, from the essay is a sense that the mathematical study of programming language is mainly about formalization, analysis, and refinement of a more or less given foundation of settled languages. That impression certainly reflects a lot of the direction of current work as it appears on LtU.

Yet I keep thinking that is missing the forest for a few trees. For one thing the social implications of a language design seem susceptible to formalization, analysis, and refinement. For another, the question of inventing new models of execution seems to be missing.

### Thanks

Thanks for your feedback. I will go back and meditate on the first half of the comments later, but I would like to address your point on "inventing new models of execution".

I would tend to see "inventing new models of execution" as an activity that arises from the search for conciseness and clarity in programming.

For example, my understanding is that language support for backtracking was inspired by the AI research of the time, and that it eventually lead to logic programming as we know it -- it would be interesting to know whether we can trace the idea of backtracking from an extension to the programming languages of the time to the radical change of logic programming, or whether the latter was developed independently from a model-theoretic culture and reconnected to this application domain later.

I wrote

Finding the right building blocks, however, is still very much dependent of domain knowledge and radical ideas often occur through prototyping or use-case studies, independently of formalization. Our preferred design technique would therefore be formalization and implementation co-evolution, with formalization and programming activities occurring jointly to inform and direct the language design process.

and I thought of programming paradigms as instances of those "building blocks", although I would agree in retrospect that the formulation as it is written seems rather more modest, rather like finding the right API or base combinators in a DSL. I may try to reformulate this.

Moving out from the text to a more general discussion, I think that we cannot (and should not) demand from the formalisation activity to be the source of creativity that provokes radical change. Formalisation has an impressive simplification power, and I think the clarity it brings often helps pivot around an idea, and practically suggests some of the radical ideas that come when studying an existing design. But formalisation should not be our only activity when working on programming language design; in particular, I believe it should be paired with personal practice of programming, and hopefully with the study of other ways of obtaining insight on programming activities.

### models of execution

I would tend to see "inventing new models of execution" as an activity that arises from the search for conciseness and clarity in programming.

I see that. I think you might (or might not) benefit by drawing that out a bit more.

I think that we cannot (and should not) demand from the formalisation activity to be the source of creativity that provokes radical change.

That is a beautiful formulation of an important truth, I think.

### Non-human programmers and inventors

Check out the Humies awards.

### ML vs. PL

I would say that ML is dual of PL now. One deals with getting machines to handle abstractions automatically, the other focuses on making it easier for humans to do it.