Is the "Getting started"-guide up-to-date?

The Getting Started thread is very educational, but I am writing to point out that, as time progresses, more literature comes into being, and there might now be a more efficient way to digest programming language theory, category theory, programming language design, functional programming, and so on. I would love to see a curated list that describes a nice progression path from apprentice to journeyman and beyond, and that could still be community-maintained, much like the Getting Started guide.

There could even be different suggested paths. I, for one, have written a lot of software, and my knowledge mostly pertains to maintainability and to software-engineering approaches to distributing knowledge and writing robust systems. I am, however, very interested in the concepts this site covers, but I have great difficulty approaching them solely from a theoretical standpoint. I need application, and I find that only once one's knowledge moves beyond a particular point does it become possible to dream up one's own applications and sandbox experiments.

And I am sure there are other individuals who approach these concepts from a more mathematical standpoint, but I cannot speak for them as a group, as I am not among them.

Kind regards,
Filip Allberg


Programming Languages are an open problem.

A large part of the challenge in describing how to come up to speed on programming language theory is that, despite what feels to be the case, programming languages turn out to be an open problem, with some very large fundamental challenges remaining unsolved.

One of the classic problems with reasoning about programming languages is the halting problem. To understand this problem, it is instructive to read up on the lambda calculus and methods of evaluating it. To read papers on the lambda calculus and the simply typed lambda calculus, you need a passing familiarity with the sequent calculus.
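As a taste of what "methods of evaluating it" means in practice, here is a minimal sketch of normal-order beta reduction for the untyped lambda calculus. The tuple encoding of terms is my own toy invention, not taken from any particular paper, and the `fuel` guard is there precisely because of the halting problem: reduction may never terminate.

```python
# Toy untyped lambda-calculus evaluator (normal-order beta reduction).
# Terms: ("var", name) | ("lam", name, body) | ("app", f, a)

def subst(term, name, value):
    tag = term[0]
    if tag == "var":
        return value if term[1] == name else term
    if tag == "lam":
        # assume bound names are unique, so no capture-avoidance needed here
        return term if term[1] == name else ("lam", term[1], subst(term[2], name, value))
    return ("app", subst(term[1], name, value), subst(term[2], name, value))

def reduce_step(term):
    """One leftmost-outermost beta step; None if no redex remains."""
    if term[0] == "app":
        f, a = term[1], term[2]
        if f[0] == "lam":
            return subst(f[2], f[1], a)   # beta: (lam x. body) a -> body[x := a]
        r = reduce_step(f)
        if r is not None:
            return ("app", r, a)
        r = reduce_step(a)
        if r is not None:
            return ("app", f, r)
    elif term[0] == "lam":
        r = reduce_step(term[2])
        if r is not None:
            return ("lam", term[1], r)
    return None

def normalize(term, fuel=1000):
    """Reduce to normal form; 'fuel' guards against non-termination."""
    while fuel > 0:
        nxt = reduce_step(term)
        if nxt is None:
            return term
        term, fuel = nxt, fuel - 1
    raise RuntimeError("no normal form found within fuel limit")

# K combinator: (lam x. lam y. x) a b reduces to a
K = ("lam", "x", ("lam", "y", ("var", "x")))
t = ("app", ("app", K, ("var", "a")), ("var", "b"))
assert normalize(t) == ("var", "a")
```

The `fuel` parameter is the practical face of undecidability: no evaluator can tell in general whether dropping the limit would loop forever.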

Another great problem that is slowly falling to analysis is heap management and, more generally, resource management. There are several approaches people have developed: garbage collection, linear types and their logical counterpart linear logic, and, most recently, separation logic.

Theoretically, garbage collection is very simple: mark-and-sweep or copying collection can be written in just a few very obvious lines.
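To make "a few very obvious lines" concrete, here is a toy mark-and-sweep collector over an in-memory object graph. The `Obj` class and the two-phase structure are my own illustration, not any production collector:

```python
# Toy mark-and-sweep collector over a small object graph.
class Obj:
    def __init__(self, name):
        self.name, self.refs, self.marked = name, [], False

def mark(obj):
    """Mark phase: flag everything reachable from this object."""
    if obj.marked:
        return
    obj.marked = True
    for child in obj.refs:
        mark(child)

def sweep(heap):
    """Sweep phase: keep marked objects; everything else is reclaimed."""
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False          # reset flags for the next collection cycle
    return live

def collect(roots, heap):
    for r in roots:
        mark(r)
    return sweep(heap)

a, b, c = Obj("a"), Obj("b"), Obj("c")
a.refs.append(b)                  # a -> b; c is unreachable garbage
heap = [a, b, c]
assert [o.name for o in collect([a], heap)] == ["a", "b"]
```

The engineering difficulty of real collectors lies elsewhere (pause times, cycles with finalizers, concurrency), but the core algorithm really is this short.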

Linear types are similarly simple to understand: a linear type is the type of a variable that must be used exactly once. The theory on them is a bit more interesting, as it pulls you into substructural logics, which again takes a bit of sequent calculus to read. Typechecking for substructural types can be done either by clever typechecking rules or by syntactic checks that enforce the use-once property.
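The syntactic check mentioned above can be sketched as an occurrence count: every linearly bound name must occur exactly once in its scope. The little AST here (`let_lin`, `pair`) is a hypothetical encoding of my own, purely for illustration:

```python
# Syntactic use-once check for linear variables over a toy AST.
# Terms: ("var", x) | ("let_lin", x, rhs, body) | ("pair", a, b)

def uses(term, name):
    """Count free occurrences of 'name' in 'term'."""
    tag = term[0]
    if tag == "var":
        return 1 if term[1] == name else 0
    if tag == "let_lin":
        n = uses(term[2], name)           # occurrences in the rhs
        if term[1] != name:               # unless shadowed, also count the body
            n += uses(term[3], name)
        return n
    return uses(term[1], name) + uses(term[2], name)   # pair

def check_linear(term):
    """True iff every linearly bound variable occurs exactly once."""
    if term[0] == "let_lin":
        _, x, rhs, body = term
        return uses(body, x) == 1 and check_linear(rhs) and check_linear(body)
    if term[0] == "pair":
        return check_linear(term[1]) and check_linear(term[2])
    return True

ok  = ("let_lin", "x", ("var", "y"), ("var", "x"))             # x used once: fine
dup = ("let_lin", "x", ("var", "y"),
       ("pair", ("var", "x"), ("var", "x")))                   # x used twice: rejected
assert check_linear(ok) and not check_linear(dup)
```

Real systems do this with typing contexts that are split between subterms rather than a whole-term count, but the enforced invariant is the same.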

Separation logic and its kin are also a flavor of substructural logic, and the theory is weirdly incomplete; as I recall, there is no support for function application in separation logic. To understand separation logic you of course need everything needed to understand linear logic and linear types.

A grand problem is described in John C. Reynolds's 1978 paper Syntactic Control of Interference. Basically, the problem is: can we make it obvious which pieces of code have conflicting side effects? What Reynolds constructs in that paper is a system in which any program with side effects can be evaluated lazily (and always yield the same result), which honestly is a grand accomplishment and something very much desired, as it makes reasoning about programs much easier. Unfortunately, the proposed solution requires something like whole-program inspection to implement, and is unreasonably restrictive.

Still, Syntactic Control of Interference describes a very real problem, and has spawned work to tackle it. At the same time, the paper is quite readable and concrete.

The work on separation logic actually comes from looking for better solutions to the problem presented in Syntactic Control of Interference. To get to separation logic there is a detour through the logic of bunched implications, which is supported by work in category theory.

On the days when separation logic works, it provides a way of looking at programs and how they manage memory and other resources that is much simpler and more intuitive than other methods, and, most importantly, it appears to work for the ordinary memory-management paradigms people use in languages like C. This suggests the possibility of a programming language similar to C in which it is impossible to use memory after it is freed, or to produce a location in memory with no pointers pointing to it (a leak). I am currently investigating that possibility.

The most interesting thing I have learned from studying separation logic is that proofs can be made simpler and more maintainable if the logic used for the proof can work with concepts closer to the ones programmers use. That matters because making programs robust in the face of malicious actors seems to require proof.

That is the path I have followed through the maze. I hope that helps. I certainly would not be surprised if more branches of mathematics wind up being useful for someone in some weird but very important way in dealing with programming languages, as that is the nature of mathematics.

Expressiveness

Seeing the preceding nifty overview, I wondered if I could put together anything at all akin to it for my own area of PL design, which is even more wide open; to put it another way, nobody has much of a clue how to approach it. The area is expressiveness, in the broadest possible sense; I don't think it actually has a name as such, for that very reason. There is, of course, the whole business about imperative versus declarative, which is not unrelated but is an at least partly separate issue, part of what to say rather than how to say it.

Some bits of history:

Peter Landin set out the concept of syntactic sugar, which is syntax that doesn't really add anything that couldn't simply be rewritten using other features of the language. Macros were the primary vehicle of the extensible languages movement of the sixties; in Standish's post-mortem-ish survey of extensible languages in 1975, he called that kind of extension paraphrase; he also named two other kinds, orthophrase adding a capability the language didn't already have, and metaphrase changing the rules by which phrases in the language are interpreted. (Personally I suspect this trichotomy may be more distracting than useful since fexprs, a feature of much interest to me, seem to straddle the line between paraphrase and metaphrase.)
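Landin's notion of syntactic sugar, and Standish's "paraphrase", can be made concrete with a toy desugaring pass. The `let` form and the tuple AST below are my own minimal illustration; the point is only that `let x = e in b` adds nothing a lambda application cannot express:

```python
# Toy desugaring: rewrite let-expressions into lambda application.
# Terms: ("var", x) | ("lam", x, body) | ("app", f, a) | ("let", x, e, b)

def desugar(term):
    tag = term[0]
    if tag == "let":                       # let x = e in b
        _, x, e, b = term
        # paraphrase: (lam x. b) e
        return ("app", ("lam", x, desugar(b)), desugar(e))
    if tag == "lam":
        return ("lam", term[1], desugar(term[2]))
    if tag == "app":
        return ("app", desugar(term[1]), desugar(term[2]))
    return term                            # variables pass through unchanged

t = ("let", "x", ("var", "y"), ("var", "x"))
assert desugar(t) == ("app", ("lam", "x", ("var", "x")), ("var", "y"))
```

Orthophrase and metaphrase, by contrast, cannot be implemented as a source-to-source rewrite like this one; that is exactly what makes the trichotomy interesting (and, for features like fexprs, blurry).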

Landin's mentor Christopher Strachey coined the term first-class value: a value that has all the rights and privileges normally afforded to values in the language. It is illustrated particularly well by an example of something that isn't first-class: procedures in ALGOL are not first-class, for a technical reason later known in the Lisp community as the funarg problem. The problem was resolved in Lisp with the shift from dynamic scope to static scope. (Strachey, btw, effectively created the academic discipline of PL design.)
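The funarg problem can be seen by contrasting lexical capture with dynamic lookup. Python is lexically scoped, so the dynamic case below is simulated with an explicit stack of frames; all helper names (`env_stack`, `dyn_lookup`, the `make_adder_*` functions) are hypothetical scaffolding for the demonstration:

```python
# Lexical capture vs simulated dynamic scope (the upward funarg problem).
env_stack = [{}]   # dynamic scope: lookups search the call stack, not the definition site

def dyn_lookup(name):
    for frame in reversed(env_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)

def make_adder_dynamic(n):
    env_stack.append({"n": n})
    f = lambda x: x + dyn_lookup("n")   # 'n' is resolved at call time
    env_stack.pop()                     # the defining frame is gone before f is called
    return f

def make_adder_lexical(n):
    return lambda x: x + n              # 'n' is captured from the defining scope

add5 = make_adder_lexical(5)
assert add5(2) == 7                     # the closure remembers n = 5

add3 = make_adder_dynamic(3)
env_stack.append({"n": 100})            # some unrelated caller binds its own n
assert add3(2) == 102                   # dynamic lookup sees the caller's n, not 3
```

The returned dynamic function has lost its defining environment, so it picks up whatever `n` the caller happens to have bound; lexical closures are exactly the fix Lisp adopted.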

Aad van Wijngaarden proposed orthogonality as a principle for the design of ALGOL 68; there was a rival design principle, I think maybe it was called diagonality, that didn't prevail (I also never quite understood the rival principle, which is typically the fate of the losing side in such debates). The Revised Report on ALGOL 68 takes orthogonality seriously enough that its table of contents is two-dimensional.

Matthias Felleisen, apparently as a byproduct while developing his dissertation in which he defined relatively well-behaved lambda-like calculi for describing imperative language features (continuations and mutable variables), proposed a formal definition of a "more expressive than" relation between programming languages, building in part on Landin's notion of syntactic sugar. Notable as a rare objective measure in a very subjective area.

It's widely recognized that there are two polar-opposite philosophies for language design: one restrictive, to prevent the programmer from doing bad stuff, and one permissive, to empower the programmer to do good stuff. Opponents may call the former "bondage-and-discipline languages" and the latter "unstructured languages" (perhaps there's a similar term of ridicule I'm not familiar with; or maybe followers of the restrictive path consider "unstructured" a sufficient insult). Douglas McIlroy, back around 1970 when the extensible-languages movement was at its height (he more-or-less cofounded the movement), described these philosophies as "fascist" and "anarchist". I suspect that framing has curious cultural dynamics: Europeans are much more sensitive about the term fascism, whereas in the US, as best I recall, we've never had a president assassinated by a fascist, but we did have one quite explicitly assassinated in the name of an anarchist agenda. Interestingly, none of the milestones I named takes sides on the restrictive/permissive dimension.

There's the expression problem, of course, which has been discussed from time to time on LtU. (For example, here.) My own efforts have tended, at least in recent years, to fall broadly on the permissive side, and the dictum I've recommended to keep things sane on that side is "dangerous things should be difficult to do by accident". I've speculated, but have yet to prove formally, that fexprs have a property similar to Felleisen's expressiveness, which I call abstractiveness.

All of which seems to me to somehow have more overall shape to it than I've been able to capture here.

Many dimensions

I might want safety in terms of performance, but loosey-gooseyness in terms of security. Or whatever else. There's not just one kind of expressiveness in my mind. Or at least expressiveness can be seen through various lenses, or broken down into various categories. All of which interplay.

Concrete examples:
* I want to know subsets of JavaScript that are the fastest with current VMs so I can stick to that.
* I want all my dynamic languages to secretly be completely amenable to progressive gradual static type inference checking so that I can personally use that even if others do not. Flow seems to work. Things like TypedLua, TypedClojure, TypedRacket in my mind don't so much, perhaps.
* I want to be able to control ownership of things cf. linear types or whatever so that I can be sure I didn't screw up my use of resources.

order vs chaos

Your point about fascist vs anarchist poles in design seems quite relevant to a larger viewpoint about context surrounding PL design (about where a language aims to fit in existing schemes).

two polar-opposite philosophies for language design, one restrictive to prevent the programmer from doing bad stuff, and one permissive to empower the programmer to do good stuff.

Context is hard to talk about, partly because it is often assumed (the "way things are done", as if unchanging and unchangeable). But a language is designed with some context in mind, which has a model of how things work when it comes time for execution via interpretation by some hardware. There is also a social context, since source code is also processed by people, which affects whether a language is more or less suitable for a development process practiced by a group using that language. But this just emphasizes that more than one context is always present. For example, ignoring hardware and people, there is yet another context: an ecology of tools that will interact with the code in different phases. Where you draw the lines may depend on what you worry about.

The fascist pole pursues order-by-force, with a focus on one model in typically one context, assuming a fixed ontology: your app will do this, the only events are these, code will arrive in boxcars in this format, and trains will run on time (or heads will roll). It promises safety by imperial fiat. What could go wrong while we're in charge? But any plan you want outside the rules will be crushed. I exaggerate for effect of course.

The anarchist pole pursues chaos-by-play, with a focus on variable models cooperating by contract. Whatever your ontology, if you can say this when that happens, we can get by. But the goal is not chaos as much as variance, and play is just another synonym for freedom. The aim is freedom to vary, but a possible result is chaos. You express things, following minimal rules, and the meaning is emergent, fitting into a context created by the resulting system model, which need not be one statically predefined by the language.

Code passes through hands and eyes of developers, then ends up getting interpreted in some runtime context where other things already happen, but new code can run too, and that forms a context surrounding new code defining its living space. A language must define what is meant by many things. For example, what is a program? You can just take an existing definition from a particular platform, and say a program in this PL is whatever is meant by how that OS does things. "I just work here, I don't make things up." This sort of design makes it easier to lean toward order-by-force: you might take the model as given, and only allow devs to specify a wiggle here and there where choice is permitted.

If instead you want to let devs do things that look crazy to the system, but work for a new app, you need something more flexible, with more explicit definitions about models and about how things interact despite not enforcing a common schema everyone knows is true. More "meta" is needed to permit variance that still harmonizes. I don't have a political point here, though one is easy to read into my description, since fascism generally sounds bad while freedom is admired. Rather, my point is that something fancier has to happen in language design, so that meta-rules bound the anarchy, if you favor that pole in design philosophy.

Where not to look

You can also read that as a rather long list of where not to look.

Are you referring to the

Are you referring to the above comments?

No

No, I was just contemplating the original post on the 'theory of programming languages' and the first comment on that.

I more or less read this view of what programming language theory should be as "mathematicians need jobs too." Well, that's fine with me, but looking around in the world, I just don't see much use for the subjects noted in the two posts.

It depends if you want to be

It depends if you want to be a PLT academic or you just want to design/build programming languages/experiences. They seem like increasingly different topics.

what's it about?

be a PLT academic or you just want to design/build programming languages/experiences. They seem like increasingly different topics.

So then what is the academic part about anymore?

PLT is about itself (add a

PLT is about itself (insert a joke about Y combinators here). Ever since the big bug scare of the late '90s and early '00s, security and formal verification have been a dominating focus of much of the community (just look at the PLDI 2016 accepted-paper list).

There is also now more interest in applying PLT to other fields, like biology.

I expect the community could fork between traditional PLT and those of us who are interested in programmer experiences and don't see the L part as an interesting topic for us in itself.

Y joke around?

There is also now more interest in applying PLT to other fields, like biology.

I am terrified to suspect you refer to so-called genetic engineering which already, all on its own, has a dangerously vacuous cloud of obfuscating bs kicked up around actual lab and field practices that are objectively, existentially reckless.

Among other things...

PL has applications pretty much everywhere in biology, including genetic engineering (ie, "synthetic biology"). But there are many interesting applications outside of that -- things like languages for formalizing lab protocols (reproducibility is an ever-bigger problem as the size and complexity of experiments go up), scaling up lab automation (eg, languages for specifying microfluidic circuits), and theoretical work on modelling cell signalling pathways (eg, using the pi-calculus).

The most exciting thing I've heard about lately is applying language technology to help researchers and medical professionals with vigilance tasks. Eg, a medical prescription in a hospital might say something like "do a pain check every two hours, and give a dose of painkillers if the patient is in serious pain, and abort the prescription and call in a doctor if more than three doses are given in any twelve hour period".

This is basically a program that is executed by a team of nurses. One prescription isn't hard to manage, but if they have a dozen patients with a couple of dozen prescriptions, it's really easy to miss a check or fail to communicate to the rest of the team that a drug has been administered. So by formalizing prescriptions as programs, you can compile them to an app that will notify nurses when checks need to be made, and provide a database that they can use to see all the things that everyone else has already done.
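As a rough sketch of the "prescription as a program" idea, the example above might be modelled as periodic checks, a dose log, and a guarded abort condition over a rolling window. The class and method names here are hypothetical illustration, not POP-PL's actual syntax or semantics:

```python
# Hypothetical sketch: a prescription as a small executable protocol.
from datetime import datetime, timedelta

class Prescription:
    def __init__(self, check_every_hours=2, max_doses=3, window_hours=12):
        self.check_every = timedelta(hours=check_every_hours)
        self.max_doses = max_doses
        self.window = timedelta(hours=window_hours)
        self.doses = []                     # timestamps of administered doses

    def next_check(self, last_check):
        """When the next pain check is due (eg, for an app notification)."""
        return last_check + self.check_every

    def record_dose(self, now):
        self.doses.append(now)

    def must_abort(self, now):
        """Abort and call a doctor if more than max_doses fall in the window."""
        recent = [t for t in self.doses if now - t <= self.window]
        return len(recent) > self.max_doses

rx = Prescription()
t0 = datetime(2016, 1, 1, 8, 0)
for h in (0, 3, 6, 9):                      # four doses within twelve hours
    rx.record_dose(t0 + timedelta(hours=h))
assert rx.must_abort(t0 + timedelta(hours=9))
```

The value of formalizing this is exactly what the comment describes: the shared dose log and the computed `next_check` replace error-prone team memory with something a machine can track.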

See POP-PL: A Patient-Oriented Prescription Programming Language.

There is nothing wrong with

There is nothing wrong with this, but I don't think the PL work will stand out anywhere near as much as the ML work in the same fields (granted, they are working at different ends of a big problem). PL is more and more about boring compliance work done to make bureaucrats happy and fend off lawsuits. Worthy work, for sure, just not very sexy.

And we really don't have this programming thing worked out yet. Shouldn't somebody be looking at that?

Medical errors are actually

Medical errors are actually the number-three cause of death in the US, killing about 440,000 people per year -- only heart disease and cancer kill more people. When I talked with Spencer Florence, he told me that an early version of this system cut opioid overdose rates at the hospital he's working with from 7/month to zero, where they have stayed.

If you scale this kind of approach out, you're talking about making visible reductions in the mortality rate for the whole country. Honestly, it doesn't matter whether the work "stands out" if it can save that many lives.

Like I said, ML has much

Like I said, ML has much more potential in that field. Although they are working at opposite sides of the problem, I have a hunch that advances in ML applied here will obsolete most of what comes out of PL. And if you think about it, this really is the battle between logical AI and machine-learned AI played out all over again, since the PL solutions are very similar to the former. Logical AI keeps losing, and will keep losing.

In response to your other point: saving lives is a worthy cause; it's just not mine, at least not in the bondage-and-discipline way. There are plenty of others who aren't into it either; it's annoying how much funding it has sucked up, and it's telling that programming in general has moved away from it. Yes, lives need to be saved in hospitals, but programs need to get written too.

AI?

And if you think about, this really is the battle between logical AI and machine learned AI played out all over again, since the PL solutions are very similar to the former.

Could you please explain how the work described by neelk relates to "logical AI"?

The most exciting thing I've

The most exciting thing I've heard about lately is applying language technology to help researchers and medical professionals with vigilance tasks. Eg, a medical prescription in a hospital might say something like "do a pain check every two hours, and give a dose of painkillers if the patient is in serious pain, and abort the prescription and call in a doctor if more than three doses are given in any twelve hour period".

This is basically a program that is executed by a team of nurses. One prescription isn't hard to manage, but if they have a dozen patients with a couple of dozen prescriptions, it's really easy to miss a check or fail to communicate to the rest of the team that a drug has been administered. So by formalizing prescriptions as programs, you can compile them to an app that will notify nurses when checks need to be made, and provide a database that they can use to see all the things that everyone else has already done.

Sounds a lot like symbolic reasoning (the logic-based approach to AI) to me.

Where's the reasoning?

That sounds like it's just providing a minimal programming language to non programmers (from my limited experience with medicine, I'd guess you could handle most cases with a fairly straightforward UI).

I went through and glanced

I went through and glanced at the paper. According to the related-work section, symbolic-reasoning AI systems are the traditional solution to this problem, and... they are different in a very unclear way. Surely they are doing less, but it is hard not to see this as just a stripped-down AI system. Here is the quote from the paper:

Computer scientists have already made several attempts to improve clinical practice; these efforts have come largely from the AI community (Peleg et al. 2000; Sharar et al. 1998; Tu and Musen 1999). Most of these efforts, at a high level, consist of building ontologies and decision support systems for medical practice, in an attempt to make more accurate diagnoses, generate treatment plans, or detect hazards. Unlike these approaches, we are attempting to build on the established strengths of software, relieving clinicians from the tedious, error-prone tasks that compromise patient safety. We leave the challenging tasks of interacting with patients, gathering evidence, and diagnosing to people who are well-trained for it and, perhaps more importantly, love to do it.

This just looks like a cool

This just looks like a cool contract language, the medical equivalent of the financial contract languages we've seen before. There's no AI in it that I can see.

It depends on what "AI" is.

It depends on what "AI" is. Today, we would probably say not, since we've moved over to thinking of intelligence as machine learned. But 10 or 20 years ago when symbolic reasoning and ontologies were still a thing....

abstraction

I wonder if you could help machine learning by supplying ontologies a priori. Can ML learn to generate ontologies?

To use the popular-psychology description, symbolic processing is like a left brain without a right brain, and ML is like a right brain without a left one.

To get human like intelligence you need the two to feed off each other.

ML can generate ontologies

ML can generate ontologies given a large enough corpus. We have a project in our lab upstairs that does exactly that (though the clusters are unlabeled).

ML is quite limited in its ability to perform abstract reasoning. It is perfectly capable of generating abstractions, but it can really only use those abstractions if they are somehow pre-included. Humans apply their abstract-reasoning skills to the topology of the network being trained, for example.

But that isn't really symbolic reasoning, which is more like rule execution according to well-defined semantics; as we would say in PL, normal program execution.