As part of an introduction to my own academic work, I wrote a rather general introduction (for non-specialists) to the mathematical approach to programming language design. This introduction reasonates with some eternal LtU debates, so I thought that I could propose it here.
I suspect that many frequent posters it will find it very old-school. It is old school, and it is not at all radical text -- I could be more radical, but this would not be appropriate for this document. In particular, it is mostly a post-hoc justification of the general "mathematical" approach as represented in mainstream PL conferences. I'm in favor of supporting diversity and underdog approaches, but I still think the mathematical approach has a very strong case, and I think the claims in this text could make consensus across many LtU members. Feedback is welcome.
Humans programmers have invented many different symbolic
representations for computer programs, which are called programming
languages. One can think of them as languages used to
communicate with the computer, but it is important to remember that
programming is also a social activity, in the sense that many programs
are created by a collaboration of several programmers, or that
programs written by one programmer may be reused, inspected or
modified by others. Programs communicate intent to a computer, but
also to other human programmers.
Programmers routinely report frustration with the limitations of the
programming language they use -- it is very hard to design
a good programming language. At least the three following
qualities are expected:
concision: Simple tasks should be described by simple,
not large or complex programs. Complex tasks require complex
programs, but their complexity should come solely from the problem
domain (the specificity of the required task), not accidental
complexity imposed by the programming language.
For example, early Artificial Intelligence research highlighted the
need for language-level support for backtracking (giving up
on a series of decisions made toward a goal to start afresh through
a different method), and some programming languages make this
substantially easier than others.
clarity: By reading a program description it should be
easy to understand the intent of its author(s). We say that
a program has a bug (a defect) when its meaning does not
coincide with the intent of its programmers -- they made a mistake
when transcribing their thoughts into programs. Clarity is thus an
essential component of safety (avoiding program defects), and should
be supported by mechanized tools to the largest possible extent. To
achieve clarity, some language constructions help programmers
express their intent, and programming language designers work on
tools to automatically verify that this expressed intent is
consistent with the rest of the program description.
For example, one of the worst security issues that was discovered in
2014 (failure of all Apple computers or mobile phones to verify the
authenticity of connections to secure websites) was due to a single
line of program text that had been duplicated (written twice instead
of only once). The difference between the programmer intent
(ensure security of communications) and the effective behavior of
the program (allowing malicious network nodes to inspect your
communications with your online bank) was dramatic, yet neither the
human programmers nor the automated tools used by these programmers
reported this error.
-
consistency: A programming language should be regular and
structured, making it easy for users to guess how to use the parts
of the language they are not already familiar with. Programming
languages must be vastly easier to learn than human languages,
because their use requires an exacting precision and absence of
ambiguity. In particular, consistency supports clarity, as
recovering intent from program description requires a good knowledge
of the language: the more consistent, the lower the risks of
misunderstanding.
Of course, the list above is to be understood as the informal opinion
of a practitioner, rather than a scientific claim in
itself. Programming is a rich field that spans many activities, and
correspondingly programming language research can and should be
attacked from many different angles: mathematics (formalization),
engineering, design, human-machine interface, ergonomics, psychology,
linguistics, sociology, and the working programmers all have something
to say about how to make better programming languages.
This work was conducted within a research group -- and a research
sub-community -- that uses mathematical formalization as its main tool
to study, understand and improve programming languages. To work with
a programming language, we give it one or several formal semantics
(defining programs as mathematical objects, and their meaning as
mathematical relations between programs and their behavior); we can
thus prove theorems about programming languages themselves, or about
formal program analyses or transformations.
The details of how mathematical formalization can be used to guide
programming language design are rather fascinating -- it is a very
abstract approach of a very practical activity. The community shares
a common baggage of properties that may or may not apply to any given
proposed design, and are understood to capture certain usability
properties of the resulting programming language. These properties are
informed by practical experience using existing languages (designed
using this methodology or not), and our understanding of them evolves
over time.
Having a formal semantics for the language of study is a solid way to
acquire an understanding of what the programs in this language
mean, which is a necessary first step for clarity -- the
meaning of a program cannot be clear if we don't first agree on what
it is. Formalization is a difficult (technical) and time-consuming
activity, but its simplification power cannot be understated: the
formalization effort naturally suggests many changes that can
dramatically improve consistency. By encouraging to build the language
around a small core of independent concepts (the best way to reduce
the difficulty of formalization), it can also improve concision, as
combining small building blocks can be a powerful way to simply
express advanced concepts. Finding the right building blocks,
however, is still very much dependent of domain knowledge and radical
ideas often occur through prototyping or use-case studies,
independently of formalization. Our preferred design technique would
therefore be formalization and implementation co-evolution, with
formalization and programming activities occurring jointly to inform
and direct the language design process.
Recent comments
23 weeks 2 days ago
23 weeks 2 days ago
23 weeks 2 days ago
45 weeks 3 days ago
49 weeks 5 days ago
51 weeks 2 days ago
51 weeks 2 days ago
1 year 1 week ago
1 year 6 weeks ago
1 year 6 weeks ago