LtU Forum
"Screw it, I'll make my own!"
If you ask me why I made a programming language, I could justify it in a lot of ways, point out its strengths, what I think it does better than the others, and so on. But I don't think that's really the driving force. As I see it, that driving force is, basically, a kind of conceit. A typical programmer will learn one or several well-established languages, depending on what they aim to achieve. They will adapt their way of thinking to fit these tools as best they can. But perhaps you don't want to adapt. You don't like any of the tools you can find because they are never exactly the way you want them, it's like they don't fit your brain at the moment. If you won't adapt to the language, the only alternative is to adapt the language to you. And if you are like me it becomes a bit of an obsession, an itch you just have to scratch...
As part of an introduction to my own academic work, I wrote a rather general introduction (for non-specialists) to the mathematical approach to programming language design. This introduction resonates with some eternal LtU debates, so I thought I would share it here.
I suspect that many frequent posters will find it very old-school. It is old school, and it is not a radical text at all -- I could be more radical, but that would not be appropriate for this document. In particular, it is mostly a post-hoc justification of the general "mathematical" approach as represented in mainstream PL conferences. I'm in favor of supporting diversity and underdog approaches, but I still think the mathematical approach has a very strong case, and I believe the claims in this text could find consensus among many LtU members. Feedback is welcome.
Human programmers have invented many different symbolic
representations for computer programs, which are called programming
languages. One can think of them as languages used to
communicate with the computer, but it is important to remember that
programming is also a social activity, in the sense that many programs
are created by a collaboration of several programmers, or that
programs written by one programmer may be reused, inspected or
modified by others. Programs communicate intent to a computer, but
also to other human programmers.
Programmers routinely report frustration with the limitations of the
programming language they use -- it is very hard to design
a good programming language. At least the following three qualities are expected:
concision: Simple tasks should be described by simple programs, not large or complex ones. Complex tasks require complex
programs, but their complexity should come solely from the problem
domain (the specificity of the required task), not accidental
complexity imposed by the programming language.
For example, early Artificial Intelligence research highlighted the
need for language-level support for backtracking (giving up
on a series of decisions made toward a goal to start afresh through
a different method), and some programming languages make this substantially easier than others (a small code sketch of such a backtracking search follows this list of qualities).
clarity: By reading a program description it should be
easy to understand the intent of its author(s). We say that
a program has a bug (a defect) when its meaning does not
coincide with the intent of its programmers -- they made a mistake
when transcribing their thoughts into programs. Clarity is thus an
essential component of safety (avoiding program defects), and should
be supported by mechanized tools to the largest possible extent. To
achieve clarity, some language constructs help programmers
express their intent, and programming language designers work on
tools to automatically verify that this expressed intent is
consistent with the rest of the program description.
For example, one of the worst security issues discovered in 2014 (the failure of Apple computers and mobile phones to verify the authenticity of connections to secure websites) was due to a single
line of program text that had been duplicated (written twice instead
of only once). The difference between the programmer intent
(ensure security of communications) and the effective behavior of
the program (allowing malicious network nodes to inspect your
communications with your online bank) was dramatic, yet neither the
human programmers nor the automated tools used by these programmers
reported this error.
consistency: A programming language should be regular and
structured, making it easy for users to guess how to use the parts
of the language they are not already familiar with. Programming
languages must be vastly easier to learn than human languages,
because their use requires an exacting precision and absence of
ambiguity. In particular, consistency supports clarity, as
recovering intent from program description requires a good knowledge
of the language: the more consistent the language, the lower the risk of misunderstanding.
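To make the backtracking example above concrete, here is a minimal sketch of a hand-rolled backtracking search, written in OCaml; every name in it is invented for the illustration. The point is the amount of scaffolding needed when the language offers no direct support: the search, the undoing of choices, and the plumbing of partial results all have to be spelled out by hand.

```ocaml
(* A hand-rolled backtracking search: assign a digit to each variable and,
   when a partial assignment cannot be completed, give up on it and try the
   next candidate. *)
let rec search ok assigned vars =
  match vars with
  | [] -> if ok assigned then Some assigned else None
  | var :: rest ->
      (* try each candidate digit; on failure, backtrack to the next one *)
      List.fold_left
        (fun found digit ->
           match found with
           | Some _ -> found
           | None -> search ok ((var, digit) :: assigned) rest)
        None
        [0; 1; 2; 3; 4; 5; 6; 7; 8; 9]

(* Example: find digits x and y such that x + y = 7 and x < y. *)
let () =
  let ok a =
    List.assoc "x" a + List.assoc "y" a = 7 && List.assoc "x" a < List.assoc "y" a
  in
  match search ok [] ["x"; "y"] with
  | Some solution ->
      List.iter (fun (v, d) -> Printf.printf "%s = %d\n" v d) solution
  | None -> print_endline "no solution"
```

In a language with built-in nondeterminism (Prolog, say, or a nondeterminism monad), the programmer states only the candidates and the constraint, and the give-up-and-retry machinery comes from the language; that is the kind of concision the example refers to.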
Of course, the list above is to be understood as the informal opinion
of a practitioner, rather than a scientific claim in
itself. Programming is a rich field that spans many activities, and
correspondingly programming language research can and should be
attacked from many different angles: mathematics (formalization),
engineering, design, human-machine interface, ergonomics, psychology,
linguistics, sociology, and working programmers all have something to say about how to make better programming languages.
This work was conducted within a research group -- and a research
sub-community -- that uses mathematical formalization as its main tool
to study, understand and improve programming languages. To work with
a programming language, we give it one or several formal semantics
(defining programs as mathematical objects, and their meaning as
mathematical relations between programs and their behavior); we can
thus prove theorems about programming languages themselves, or about
formal program analyses or transformations.
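To give a concrete, if tiny, flavour of what "programs as mathematical objects" means, here is a toy expression language together with an evaluation function defining its meaning, written in OCaml. This is only an illustrative sketch, not the semantics of any real language discussed here; on paper the same definition would typically be presented as a grammar plus inference rules or equations.

```ocaml
(* Programs as mathematical objects: a datatype of expressions ... *)
type expr =
  | Int of int
  | Add of expr * expr
  | If of expr * expr * expr   (* the condition 0 is treated as "false" *)

(* ... and their meaning: a function from programs to the values they compute. *)
let rec eval (e : expr) : int =
  match e with
  | Int n -> n
  | Add (e1, e2) -> eval e1 + eval e2
  | If (c, e_then, e_else) -> if eval c <> 0 then eval e_then else eval e_else

(* Once the meaning is pinned down, claims such as "this transformation
   preserves behavior" become precise statements about [eval] that can be
   proved or refuted. *)
let () = assert (eval (If (Int 1, Add (Int 2, Int 3), Int 0)) = 5)
```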
The details of how mathematical formalization can be used to guide
programming language design are rather fascinating -- it is a very
abstract approach to a very practical activity. The community shares a common body of properties that may or may not apply to any given
proposed design, and are understood to capture certain usability
properties of the resulting programming language. These properties are
informed by practical experience using existing languages (designed
using this methodology or not), and our understanding of them evolves
over time.
Having a formal semantics for the language under study is a solid way to
acquire an understanding of what the programs in this language
mean, which is a necessary first step for clarity -- the
meaning of a program cannot be clear if we don't first agree on what
it is. Formalization is a difficult (technical) and time-consuming
activity, but its simplifying power cannot be overstated: the
formalization effort naturally suggests many changes that can
dramatically improve consistency. By encouraging designers to build the language around a small core of independent concepts (the best way to reduce
the difficulty of formalization), it can also improve concision, as
combining small building blocks can be a powerful way to simply
express advanced concepts. Finding the right building blocks,
however, still depends very much on domain knowledge, and radical
ideas often occur through prototyping or use-case studies,
independently of formalization. Our preferred design technique would
therefore be formalization and implementation co-evolution, with
formalization and programming activities occurring jointly to inform
and direct the language design process.
A technical report by Ravi Chugh et al. Abstract:
We present the SKETCH-N-SKETCH editor for Scalable Vector Graphics (SVG) that integrates programmatic and direct manipulation, two modes of interaction with complementary strengths. In SKETCH-N-SKETCH, the user writes a program to generate an output SVG canvas. Then the user may directly manipulate the canvas while the system infers realtime updates to the program in order to match the changes to the output. To achieve this, we propose (i) a technique called trace-based program synthesis that takes program execution history into account in order to constrain the search space and (ii) heuristics for dealing with ambiguities. Based on our experience writing more than 40 examples and from the results of a study with 25 participants, we conclude that SKETCH-N-SKETCH provides a novel and effective workflow between the boundaries of existing programmatic and direct manipulation systems.
This was demoed at PLDI to a lot of fanfare. Also see some videos. And a demo that you can actually play with, sweet!
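If you want a rough mental model of the trace-based idea before diving into the paper, here is a deliberately tiny sketch (my own reconstruction in OCaml, not Sketch-n-Sketch's actual machinery): every value in the output remembers which program constant it came from, so a direct edit of the output can be pushed back through that trace into the program.

```ocaml
(* A toy "program": named constants that the output is computed from. *)
let program = [ ("cx", 100); ("cy", 50) ]

(* Rendering produces output attributes paired with a trace: the name of the
   constant each value originates from. *)
let render prog =
  [ ("circle.cx", (List.assoc "cx" prog, "cx"));
    ("circle.cy", (List.assoc "cy" prog, "cy")) ]

(* Direct manipulation: the user drags the circle, and the edit is pushed
   back to the originating constant via the trace. *)
let direct_manipulate prog attr new_value =
  let _, origin = List.assoc attr (render prog) in
  List.map (fun (name, v) -> if name = origin then (name, new_value) else (name, v)) prog

let () =
  let prog' = direct_manipulate program "circle.cx" 120 in
  Printf.printf "cx is now %d\n" (List.assoc "cx" prog')  (* prints 120 *)
```

The real system has to cope with output values computed from several program locations at once, which is where the constrained synthesis search and the ambiguity heuristics from the abstract come in.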
I want to know exactly what my software is doing. Better still, I want cryptographic proof that my software executes each and every computation step correctly.
Wouldn’t you?
Currently, I don’t know what Windows 10 is doing (or it is very hard to find out) and I hate that.
That’s because most of Windows 10:
- is compiled to machine code that bears no resemblance to the original source code,
- hides intricate networks of mutable objects behind abstract interfaces,
- and destroys precious machine state after machine state.
Welcome to SPREAD!
In SPREAD, all data, machines states, code and computations are cryptographically authenticated.
To put it mildly, some practical issues had to be resolved to make SPREAD a reality. First and foremost, SPREAD almost completely eradicates mutable state.
Obviously, keeping all the state for authentication purposes would require enormous amounts of storage. SPREAD solves that issue by cryptographically ‘signing’ states incrementally, while still allowing full user-control over which ones need to be signed, and at what level of granularity.
Alternatively, full machine states can also be stored incrementally by SPREAD. In turn, this allows the pervasive re-use of state that was calculated earlier.
So SPREAD kinda acts like a spreadsheet, because spreadsheets also re-use the previous states of a workbook to optimally recalculate the next.
Unlike SPREAD, however, spreadsheets are incapable of keeping all their versions around. And typically, Excel plug-ins completely destroy the (otherwise) purely functional nature of spreadsheets. In contrast, SPREAD only allows referentially transparent functions as primitives.
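One way to picture that spreadsheet-like reuse (my sketch below, not SPREAD's actual machinery) is memoization keyed by a cryptographic digest of the input: a state that hashes to a known key is never recomputed, and referential transparency is exactly what makes that reuse safe.

```ocaml
(* Reuse of previously computed states, sketched as a cache keyed by a
   cryptographic digest of the input. Uses OCaml's stdlib Digest module (MD5)
   purely for illustration. *)
let cache : (string, int) Hashtbl.t = Hashtbl.create 16

(* stand-in for an expensive but referentially transparent computation *)
let expensive (input : string) : int = 2 * String.length input

let cached_compute (input : string) : int =
  let key = Digest.to_hex (Digest.string input) in
  match Hashtbl.find_opt cache key with
  | Some result -> result              (* this state was seen before: reuse it *)
  | None ->
      let result = expensive input in
      Hashtbl.add cache key result;    (* record the freshly 'signed' state *)
      result

let () =
  assert (cached_compute "hello" = 10);
  assert (cached_compute "hello" = 10)  (* the second call hits the cache *)
```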
SPREAD builds on top of a recent breakthrough in cryptography called SeqHash. Unfortunately, SeqHash has gone largely unnoticed, but hopefully this post will stir some renewed interest. To honour SeqHash, I have called my extension SplitHash:
SplitHash is an immutable, uniquely represented Sequence ADT (Authenticated Data Structure):
- Like SeqHashes, SplitHashes can be concatenated in O(log(n)).
- But SplitHash extends SeqHash by allowing Hashes to also be split in O(log(n)).
- It also solves SeqHash's issue with repeating nodes by applying RLE (Run Length Encoding) compression.
- And to improve cache coherence and memory bandwidth, SplitHashes can be optionally chunked into n-ary trees.
SplitHash is the first known History-Independent (HI) ADT that holds all these properties.
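To give outsiders a mental model, here is a heavily simplified OCaml sketch of an authenticated sequence: each node carries a digest that commits to its contents, so concatenation only has to hash the two root digests instead of re-hashing the whole sequence. Note what the sketch deliberately leaves out -- unique (history-independent) representation, O(log(n)) balanced concatenation and splitting, RLE of repeated nodes, chunking -- which is exactly the hard part that SeqHash and SplitHash contribute.

```ocaml
(* A naive authenticated sequence: every node's digest commits to its
   contents. This is an illustration only, not SplitHash. *)
type 'a auth_seq =
  | Leaf of { value : 'a; digest : string }
  | Node of { left : 'a auth_seq; right : 'a auth_seq; digest : string }

let digest_of = function
  | Leaf { digest; _ } -> digest
  | Node { digest; _ } -> digest

let leaf (show : 'a -> string) (v : 'a) : 'a auth_seq =
  Leaf { value = v; digest = Digest.to_hex (Digest.string ("leaf:" ^ show v)) }

(* Concatenation hashes only the two root digests, not the elements. *)
let concat (l : 'a auth_seq) (r : 'a auth_seq) : 'a auth_seq =
  let digest =
    Digest.to_hex (Digest.string ("node:" ^ digest_of l ^ "/" ^ digest_of r))
  in
  Node { left = l; right = r; digest }

(* Recover the plain sequence from the authenticated structure. *)
let rec to_list = function
  | Leaf { value; _ } -> [ value ]
  | Node { left; right; _ } -> to_list left @ to_list right

let () =
  let s1 = concat (leaf string_of_int 1) (leaf string_of_int 2) in
  let s2 = concat (leaf string_of_int 1) (leaf string_of_int 2) in
  assert (to_list s1 = [1; 2]);
  assert (digest_of s1 = digest_of s2)  (* same construction, same digest *)
```

In this naive version the digest still depends on how a sequence was concatenated, which is precisely the history-dependence that SeqHash's uniquely represented trees remove.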
Sorry about the obvious self interest, but I'm just so excited about what I've created that I need to share this.
(This group and the diaspora seem to be doing all sorts of interesting things. I got around to this by wondering about static analysis of JavaScript wrt performance.)
As a developer
Who ends up on concurrent systems
I would like to be able to debug them.
(Even before I run them.)
Persuasive Prediction of Concurrency Access Anomalies
Jeff Huang, Charles Zhang
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology
Predictive analysis is a powerful technique that exposes concurrency bugs in un-exercised program executions. However, current predictive analysis approaches lack the persuasiveness property as they offer little assistance in helping programmers fully understand the execution history that triggers the predicted bugs. We present a persuasive bug prediction technique as well as a prototype tool, PECAN, for detecting general access anomalies (AAs) in concurrent programs. The main characteristic of PECAN is that, in addition to predicting AAs in a more general way, it generates "bug hatching clips" that deterministically instruct the input program to exercise the predicted AAs. The key ingredient of PECAN is an efficient offline schedule generation algorithm, with proof of the soundness, that guarantees to generate a feasible schedule for every real AA in programs that use locks in a nested way. We evaluate PECAN using twenty-two multi-threaded subjects including six large concurrent systems, and our experiments demonstrate that PECAN is able to effectively predict and deterministically expose real AAs. Several serious and previously unknown bugs in large open source concurrent systems were also revealed in our experiments.
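For readers who have not met the term: an "access anomaly" is, roughly, unsynchronized conflicting access to shared state whose outcome depends on thread scheduling. Here is a minimal sketch of one (my own, written with OCaml 5 domains; the paper's subjects are Java programs):

```ocaml
(* An unsynchronized check-then-update: with an unlucky interleaving both
   withdrawals pass the check and the balance goes negative. Requires OCaml 5. *)
let balance = ref 100

let withdraw amount =
  if !balance >= amount then
    (* another domain may run between the check and the update *)
    balance := !balance - amount

let () =
  let d1 = Domain.spawn (fun () -> withdraw 80) in
  let d2 = Domain.spawn (fun () -> withdraw 80) in
  Domain.join d1;
  Domain.join d2;
  Printf.printf "balance = %d\n" !balance
```

A predictive tool in the style of PECAN takes one observed execution of such a program and tries both to predict the anomaly and to produce a schedule that deterministically reproduces it.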
(sotto voce: I guess there's something to be said for using just utterly unprincipled, unrestricted, unconstrained, awful things like rampant concurrency, and Java, JavaScript, et al., because it gives the Really Smart People in the world something to attack and improve and show off around. I mean, if we all had the luxury of Doing Things Right from the get-go, I feel like lots of valuable insights (with wider application than their originating research) would never have been discovered or created.)
Knitting is the process of creating textile surfaces out of interlocked loops of yarn. With the repeated application of a basic operation – pulling yarn through an existing loop to create a new loop – complicated three-dimensional structures can be created [4]. Knitting machines automate this loop-through-loop process, with some physical limitations arising from their method of storing loops and accessing yarns [1, 3]. Currently, knitting machines are programmed at a very low level. Projects such as AYAB [2] include utilities for designing knit colorwork, but only within a limited stitch architecture; designers working in 3D usually do so via a set of pre-designed templates [4].
From Lea Albaugh and James McCann, "Challenges Facing a High-Level Language for Machine Knitting", POPL 2016.
Project Lamdu, a live-programming environment with a little something for everyone, including things like:
- ...the canonical representation of programs should not be text, but rich data structures: Abstract syntax trees.
- Effect Typing... allows a live environment to actually execute code as it is being edited, safely, and bring the benefits of spreadsheets to general purpose programming
- When types are rich enough, much of the program structure can be inferred from the types.
- Integrated revision control and live test cases will allow "Regression Debugging".
I have used Typed Lua. I have kept an eye on Typed Racket and Typed Clojure. Overall, I currently get the impression that they somehow don't quite get over some hurdles that would allow them to really shine. And thus people who wanted to love and use them are leaving them instead.
A post today re: Typed Clojure echoes this previous one:
In September 2013 we blogged about why we’re supporting Typed Clojure, and you should too! Now, 2 years later, our engineering team has made a collective decision to stop using Typed Clojure (specifically the core.typed library).
I come not to bury Typed X, but to praise them. Can somebody please work on figuring out what it is that is missing or needs to be tweaked to make them more usable? (Anything from speed to culture.) Or should we truly conclude that trying to augment / paper over a dynamic ecosystem is just for the most part doomed to fail?
Someone has a need to talk.
People have mentioned Bedrock on and off here over the years. I found the latest paper to be an exciting read since for some strange reason I like the idea of verification.
From Network Interface to Multithreaded Web Applications:
A Case Study in Modular Program Verification.
This paper reports on one case study applying modular proof techniques in the Coq proof assistant. To our knowledge, it is the first modular verification certifying a system that combines infrastructure with an application of interest to end users. We assume a nonblocking API for managing TCP networking streams, and on top of that we work our way up to certifying multithreaded, database-backed Web applications.