The most obsolete infrastructure money could buy - my worst job ever

A funny article by Juho Snellman about really existing legacy software engineering and PLT.

For example on my first day I found that X was running what was supposedly [the] largest VAXcluster remaining in the world, for doing their production builds. Yes, dozens of VAXen running VMS, working as a cross-compile farm, producing x86 code. You might wonder a bit about the viability of the VAX as computing platform in the year 2005. Especially for something as cpu-bound as compiling. But don't worry, one of my new coworkers had as their current task evaluating whether this should be migrated to VMS/Alpha or to VMS/VAX running under a VAX emulator on x86-64!

Why did this company need to maintain a specific C compiler anyway? Well, they had their own ingenious in-house programming language that you could think of as an imperative Erlang with a Pascal-like syntax that was compiled to C source. I have no real data on how much code was written in that language, but it'd have to be tens of millions lines at a minimum.

The result of compiling this C code would then be run on an ingenious in-house operating system that was written in, IIRC, the late 80s. This operating system used the 386's segment registers to implement multitasking and message passing. For this, they needed the a compiler with much more support for segment registers than normal. Now, you might wonder about the wisdom of relying on segment registers heavily in the year 2005. After all use of segment registers had been getting slower and slower with every generation of CPUs, and in x86-64 the segmentation support was essentially removed. But don't worry, there was a project underway to migrate all of this code to run on Solaris instead.

F* (FStar) reworked and released as v0.9.0

I sure hope the aggregate effect of F*, ATS, Rust, and such is to more quickly bring about even better static checking.

v0.9.0 (released before ICFP 2015). Countless improvements, including an interactive mode, a new extraction mechanism to OCaml and F#, the ghost effect, a lot of new examples (micro-F* formalization, Wysteria, etc), hyper-heaps, build-config support, quite a bit of cleanup of the code base.

F* (pronounced F star) is an ML-like functional programming language aimed at program verification. Its type system is based on a core that resembles System Fω (hence the name), but is extended with dependent types, refined monadic effects, refinement types, and higher kinds. Together, these features allow expressing precise and compact specifications for programs, including functional correctness properties. The F* type-checker aims to prove that programs meet their specifications using a combination of SMT solving and manual proofs. Programs written in F* can be translated to OCaml or F# for execution.

The latest version of F* is written entirely in F*, and bootstraps in OCaml and F#. It is open source and under active development on GitHub. A detailed description of this new F* version is available in a recent draft. You can learn more about F* by following the online tutorial.

verified ML

Am I dreaming to hope that more verification is a good thing, and is on the way?

CakeML: a project that aims to make proof assistants into trustworthy and practical program development platforms. At the heart of our approach is a new dialect of ML, which we call CakeML. CakeML is designed to be both easy to program in and easy to reason about formally in proof assistants for higher-order logic.

I'm not saying nobody else is doing / has done cool verified stuff (e.g. cf. verified PreScheme, et. al.), I am just posting this because I'm excited that this project is apparently alive and kicking. I just want to be able to "sudo apt-get install" some day and start programming away, feeling like there's a little bit more safety-blanket feeling there.

Available here with hackernews and reddit discussions ongoing.

Even though I'm not a big fan of Haskell, I'm pretty excited about this. It represents a trend where PL is finally taking holistic programmer experiences seriously, and a move toward interactivity in program development that takes advantage of our (a) rich type systems, and (b) increasing budget of computer cycles. Even that they are trying to sell this is good: if people can get used to paying for tooling, that will encourage even more tooling via a healthy market feedback loop. The only drawback is the MAS sandbox, app stores need to learn how to accept developer tools without crippling them.

Python, Machine Learning, and Language Wars. A Highly Subjective Point of View

A nice article that describes tradeoffs made in choosing a PL for a domain-specific field.

Another "big" question

To continue the interesting prognostication thread, here is another question. Several scientific fields have become increasingly reliant on programming - ranging from sophisticated data analysis to various kinds of standard simulation methodologies. Thus far most of this work is done in conventional languages, with R being the notable exception, being a language mostly dedicated to statistical data analysis. However, as far as statistical analysis goes, R is general purpose -- it is not tied to any specific scientific field [clarification 1, clarification 2]. So the question is whether you think that in the foreseeable future (say 5-15 years) at least one scientific field will make significant use (over 5% market share) of a domain specific language whose functionality (expressiveness) or correctness guarantees will be specific to the scientific enterprise of the field.

It might be interesting to connect the discussion of this question to issues of "open science", the rise in post-publication peer-review, reproducability and so on.

Will scientific fields give rise to hegemonic domain specific languages (within 5-15 years)?

pollcode.com free polls

word2vec

So I made some claims in another topic that the future of programming might be intertwined with ML. I think word2vec provides some interesting evidence to this claim. From the project page:

simple way to investigate the learned representations is to find the closest words for a user-specified word. The distance tool serves that purpose. For example, if you enter 'france', distance will display the most similar words and their distances to 'france', which should look like:

                 Word       Cosine distance
-------------------------------------------
spain              0.678515
belgium              0.665923
netherlands              0.652428
italy              0.633130
switzerland              0.622323
luxembourg              0.610033
portugal              0.577154
russia              0.571507
germany              0.563291
catalonia              0.534176

Of course, we totally see this inferred relationship as a "type" for country (well, if you are OO inclined). Type then is related to distance in a vector space. These vectors have very interesting type-like properties that manifest as inferred analogies; consider:

It was recently shown that the word vectors capture many linguistic regularities, for example vector operations vector('Paris') - vector('France') + vector('Italy') results in a vector that is very close to vector('Rome'), and vector('king') - vector('man') + vector('woman') is close to vector('queen') [3, 1]. You can try out a simple demo by running demo-analogy.sh.

I believe this could lead to some interesting augmentation in PL, in that types can then be used to find useful abstractions in a large corpus of code. But it probably requires an adjustment in how we think about types. The approach is also biased to OO types, but I would love to hear alternative interpretations.

Unstructured casting considered harmful to security

Unstructured casting (e.g. Java, C#, C++, etc.) can be harmful to security.

Structured casting consists of the following:

1: Casting self to an interface implemented by this Actor
2: Upcasting
a) an Actor of an implementation type to the interface type of the implementation
b) an Actor of an interface type to the interface type that was extended
3: Conditional downcasting of an Actor of an interface type to an extension interface type.  (An implementation type cannot be downcast because there is nothing to which to downcast.)


Claim: All other casting is unstructured and should be prohibited.
Note that this means that an implementation cannot be subtyped although an implementation can use other implementations for modularity.

Edit: The above was clarified as a result of a perceptive FriAM comment by Marc Stiegler

Actor DepositOnlyAccount[initialBalance:Euro] uses SimpleAccount[initialBalance]｡
implements Account using
deposit[anAmount] →
⍠Account⨀SimpleAccount.deposit[anAmount]¶
// use deposit message handler from SimpleAccount (see below)
getBalance[ ] → ⦻¶      // always throw exception
withdraw[anAmount:Euro] → ⦻§▮   // always throw exception


As a result of the above definition, DepositOnlyAccount⊒Account and
DepositOnlyAccount has message handlers with the following signatures:


getBalance[ ] ↦ ⦻,   //  always throws exception
withdraw[ ] ↦ ⦻,     //  always throws exception
deposit[Euro] ↦ Void▮


The above makes use of the following:

Interface Account with
getBalance[ ]↦Euro,
deposit[Euro]↦Void,
withdraw[Euro]↦Void▮

Actor SimpleAccount[startingBalance:Euro]
myBalance ≔ startingBalance｡
// myBalance is an assignable variable
// initialized with startingBalance
implements Account using
getBalance[ ] →   myBalance¶
deposit[anAmount] →
Void                    // return Void
afterward  myBalance ≔ myBalance+anAmount¶
// the next  message is processed with
//  myBalance reflecting the deposit
withdraw[anAmount:Euro]:Void →
(amount > myBalance) �
True ⦂ Throw  Overdrawn[ ] ⍌
False ⦂ Void           //  return Void
afterward myBalance ≔ myBalance–anAmount ⍰§▮
//  the next  message is processed with updated myBalance


Harnessing Curiosity to Increase Correctness in End-User Programming

Harnessing Curiosity to Increase Correctness in End-User Programming. Aaron Wilson, Margaret Burnett, Laura Beckwith, Orion Granatir, Ledah Casburn, Curtis Cook, Mike Durham, and Gregg Rothermel. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '03). (ACM paywalled link).

Despite their ability to help with program correctness, assertions have been notoriously unpopular--even with professional programmers. End-user programmers seem even less likely to appreciate the value of assertions; yet end-user programs suffer from serious correctness problems that assertions could help detect. This leads to the following question: can end users be enticed to enter assertions? To investigate this question, we have devised a curiosity-centered approach to eliciting assertions from end users, built on a surprise-explain-reward strategy. Our follow-up work with end-user participants shows that the approach is effective in encouraging end users to enter assertions that help them find errors.

Via a seminar on Human Factors in Programming Languages, by Eric Walkingshaw. To quote Eric's blurb:

This paper introduces the surprise-explain-reward strategy in the context of encouraging end-user programmers to test their programs. Attention investment provides a theory about how users decide where to spend their attention based on cost, risk, and reward. Surprise-explain-reward provides a strategy for altering this equation. Specifically, it attempts to lower the (perceived and actual) costs associated with learning a new feature, while making the reward more immediate and clear.

OcaPic: Programming PIC microcontrollers in OCaml

Most embedded systems development is done in C. It's rare to see a functional programming language target any kind of microcontroller, let alone an 8-bit microcontroller with only a few kB of RAM. But the team behind the OcaPic project has somehow managed to get OCaml running on a PIC18 microcontroller. To do so, they created an efficient OCaml virtual machine in PIC assembler (~4kB of program memory), and utilized some clever techniques to postprocess the compiled bytecode to reduce heap usage, eliminate unused closures, reduce indirections, and compress the bytecode representation. Even if you're not interested in embedded systems, you may find some interesting ideas there for reducing overheads or dealing with constrained resource budgets.