archives

Formalization and programming language design -- explained to all

As part of an introduction to my own academic work, I wrote a rather general introduction (for non-specialists) to the mathematical approach to programming language design. This introduction reasonates with some eternal LtU debates, so I thought that I could propose it here.

I suspect that many frequent posters it will find it very old-school. It is old school, and it is not at all radical text -- I could be more radical, but this would not be appropriate for this document. In particular, it is mostly a post-hoc justification of the general "mathematical" approach as represented in mainstream PL conferences. I'm in favor of supporting diversity and underdog approaches, but I still think the mathematical approach has a very strong case, and I think the claims in this text could make consensus across many LtU members. Feedback is welcome.

Humans programmers have invented many different symbolic representations for computer programs, which are called programming languages. One can think of them as languages used to communicate with the computer, but it is important to remember that programming is also a social activity, in the sense that many programs are created by a collaboration of several programmers, or that programs written by one programmer may be reused, inspected or modified by others. Programs communicate intent to a computer, but also to other human programmers.

Programmers routinely report frustration with the limitations of the programming language they use -- it is very hard to design a good programming language. At least the three following qualities are expected:

  • concision: Simple tasks should be described by simple, not large or complex programs. Complex tasks require complex programs, but their complexity should come solely from the problem domain (the specificity of the required task), not accidental complexity imposed by the programming language.

    For example, early Artificial Intelligence research highlighted the need for language-level support for backtracking (giving up on a series of decisions made toward a goal to start afresh through a different method), and some programming languages make this substantially easier than others.

  • clarity: By reading a program description it should be easy to understand the intent of its author(s). We say that a program has a bug (a defect) when its meaning does not coincide with the intent of its programmers -- they made a mistake when transcribing their thoughts into programs. Clarity is thus an essential component of safety (avoiding program defects), and should be supported by mechanized tools to the largest possible extent. To achieve clarity, some language constructions help programmers express their intent, and programming language designers work on tools to automatically verify that this expressed intent is consistent with the rest of the program description.

    For example, one of the worst security issues that was discovered in 2014 (failure of all Apple computers or mobile phones to verify the authenticity of connections to secure websites) was due to a single line of program text that had been duplicated (written twice instead of only once). The difference between the programmer intent (ensure security of communications) and the effective behavior of the program (allowing malicious network nodes to inspect your communications with your online bank) was dramatic, yet neither the human programmers nor the automated tools used by these programmers reported this error.

  • consistency: A programming language should be regular and structured, making it easy for users to guess how to use the parts of the language they are not already familiar with. Programming languages must be vastly easier to learn than human languages, because their use requires an exacting precision and absence of ambiguity. In particular, consistency supports clarity, as recovering intent from program description requires a good knowledge of the language: the more consistent, the lower the risks of misunderstanding.

Of course, the list above is to be understood as the informal opinion of a practitioner, rather than a scientific claim in itself. Programming is a rich field that spans many activities, and correspondingly programming language research can and should be attacked from many different angles: mathematics (formalization), engineering, design, human-machine interface, ergonomics, psychology, linguistics, sociology, and the working programmers all have something to say about how to make better programming languages.

This work was conducted within a research group -- and a research sub-community -- that uses mathematical formalization as its main tool to study, understand and improve programming languages. To work with a programming language, we give it one or several formal semantics (defining programs as mathematical objects, and their meaning as mathematical relations between programs and their behavior); we can thus prove theorems about programming languages themselves, or about formal program analyses or transformations.

The details of how mathematical formalization can be used to guide programming language design are rather fascinating -- it is a very abstract approach of a very practical activity. The community shares a common baggage of properties that may or may not apply to any given proposed design, and are understood to capture certain usability properties of the resulting programming language. These properties are informed by practical experience using existing languages (designed using this methodology or not), and our understanding of them evolves over time.

Having a formal semantics for the language of study is a solid way to acquire an understanding of what the programs in this language mean, which is a necessary first step for clarity -- the meaning of a program cannot be clear if we don't first agree on what it is. Formalization is a difficult (technical) and time-consuming activity, but its simplification power cannot be understated: the formalization effort naturally suggests many changes that can dramatically improve consistency. By encouraging to build the language around a small core of independent concepts (the best way to reduce the difficulty of formalization), it can also improve concision, as combining small building blocks can be a powerful way to simply express advanced concepts. Finding the right building blocks, however, is still very much dependent of domain knowledge and radical ideas often occur through prototyping or use-case studies, independently of formalization. Our preferred design technique would therefore be formalization and implementation co-evolution, with formalization and programming activities occurring jointly to inform and direct the language design process.

Earl Grey; the story of a new programming language

"Screw it, I'll make my own!"

If you ask me why I made a programming language, I could justify it in a lot of ways, point out its strengths, what I think it does better than the others, and so on. But I don't think that's really the driving force. As I see it, that driving force is, basically, a kind of conceit. A typical programmer will learn one or several well-established languages, depending on what they aim to achieve. They will adapt their way of thinking to fit these tools as best they can. But perhaps you don't want to adapt. You don't like any of the tools you can find because they are never exactly the way you want them, it's like they don't fit your brain at the moment. If you won't adapt to the language, the only alternative is to adapt the language to you. And if you are like me it becomes a bit of an obsession, an itch you just have to scratch...