## A language for blind uncomprehending idiots who have no idea how programs work.

I'm at present working on a thing which isn't really a programming language, because it's to be used by a thing which isn't really a programmer.

This is part of a machine learning project. Each 'program' is a genome which is operated on by a genetic algorithm. The genetic algorithm of course is the aforementioned blind uncomprehending idiot that has no idea how programs work. It combines and mutates genomes at random; the fact that these genomes code for programs is irrelevant to it.

The PL challenge is to build a programming language (genome encoding) that has the evolvable properties that will enable a GA to make progress. Ordinary programming languages (Koza's work with LISP included) are extraordinarily poorly suited to evolutionary methods.

What is required for evolvability:

First, small changes in the program (usually) make small changes in semantic behavior;

Second, combining elements of different programs results in a new one that's reasonably likely to have semantics similar to both (or all) parents, although there must be a chance that it does something completely different and unexpected instead.

Third, the genetic encoding must support the production of programs that are arbitrarily short or long, arbitrarily simple or complex, and must be able to get there mostly in 'baby steps.'

Fourth, there must be significant retention and potential recombination of structure even when that structure does not semantically affect the current program. (Non-coding or 'junk' DNA, alias 'introns', is a reservoir of potential that may be invoked by later mutations, and often codes for something useful when a particular mutation happens.)

Fifth, absolutely random genomes must make a program that has a definite semantics - any runtime or compile error is instantly lethal to that genome.

Sixth, tree representations with branch-swap combination operations are largely considered a failed experiment at this point, because they fail many of the evolvability criteria given above. Most combinations did not express anything close to the semantics of any parent, and mutations weren't usually 'small' changes as measured in semantics. There wasn't much of a 'smooth gradient' that genetic algorithms could climb. So, it's time to build something else.

Our blind uncomprehending idiot is attempting to debug the code. It has lots of patience and lots of energy, but absolutely no clue how the genomes are related to semantics; it just has a 'combination' operator and a 'mutation' operator and gets fed 'solution effectiveness' information that it uses to determine which genomes to favor or disfavor.

Now, what kind of programming language should the blind uncomprehending idiot use? Remember that readability, syntactic cleanliness, intuitive design, etc, score absolutely zero here. The programmer can't read, has no idea what syntax is, and has no 'intuition' as such. Its efforts to debug mostly amount to slicing programs at random places and in random directions and sticking them together with bits it sliced from other genomes.

I've managed to build a 'language' that seems to have good evolvable properties for GAs and specifies neural networks. It is, however, horrifying to human sensibilities.

The genomes are metaprograms. A path through them is traversed (each state contains a 'next state' pointer), and any 'output bits' stored at states along the path are output in sequence and become part of the input to the neural network builder. So the genome works something like a funge, and the output of running the funge is a linear program.

The 'structure' results from the path taken through the genome funge. The path is specified by the "next operation" addresses in each node. If part of a path is particularly important then 'introns' can arise to protect it from minor mutations. Such introns would take the form of next-instruction pointers in nodes near the path node, pointing back onto the path. Mutations to instruction pointers usually change the destinations by a small distance in funge-space so the idea of introns in nodes that are 'near' the path makes sense.
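For concreteness, here is a minimal sketch of that traversal in TypeScript. The node layout, the names, and the step cap are my own assumptions, not the actual encoding:

```typescript
// Hypothetical funge-style genome: an array of nodes, each carrying optional
// output bits and a pointer to the next node. Running the genome walks the
// pointer chain and concatenates the output bits; the resulting bit string
// is what a separate interpreter (e.g. a network builder) would consume.
interface GenomeNode {
  output: string; // bits emitted when this node is visited ("" for none)
  next: number;   // address of the next node
}

function runGenome(genome: GenomeNode[], start = 0, maxSteps = 100): string {
  let bits = "";
  let pc = start;
  // A step cap plus wrapped addressing means every random genome has a
  // definite, error-free semantics (criterion five above).
  for (let step = 0; step < maxSteps; step++) {
    const node = genome[((pc % genome.length) + genome.length) % genome.length];
    bits += node.output;
    pc = node.next;
  }
  return bits;
}
```

Mutating a node's `next` pointer reroutes the path locally, while nodes off the current path are carried along silently as potential 'introns'.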

I've been pretty careful with the format of the bit strings accepted by the neural network builder, so that no catenation of output messages results in an outright error. It might result in a completely useless network, but never an error. And artificial neural networks are fairly forgiving about retaining most of their functionality when a few nodes here and there are changed or missing, or when a few new ones appear, so this seems more "reasonable" than trying to evolve Lisp programs using subexpression exchange.

I think it would be possible to use the same funge-ish mechanism to create output strings that are read as other kinds of program as well, with a different kind of 'interpreter' than the one that builds neural networks. But my confidence is less, because most other kinds of 'program' are far less forgiving of minor structural changes. If you were doing any semantics other than constructing neural networks, how would you do it?

## Branch Forward Only

The fly-by-wire flight software for the Saab Gripen (a lightweight fighter) went a step further. It disallowed both subroutine calls and backward branches, except for the one at the bottom of the main loop. Control flow went forward only. Sometimes one piece of code had to leave a note for a later piece telling it what to do, but this worked out well for testing: all data was allocated statically, and monitoring those variables gave a clear picture of most everything the software was doing. The software did only the bare essentials, and of course, they were serious about thorough ground testing.

No bug has ever been found in the "released for flight" versions of that code.

A quote from John Carmack on Inlined Code. I'm interested in hearing more about that style of programming, and any programming language that facilitates that style.

## Alice, Bob, and Penthesilea: mutually suspicious code and data owners

I'm wondering what is known, if anything, about safe interactions between mutually suspicious data and code owners. Here's an Alice and Bob story about it:

Alice has some data she wishes to keep confidential. Bob has some code which he wishes to keep confidential. Alice is willing to pay Bob for the right to use his code to operate on her data in some way. I'll assume that Alice assumes that Bob's code actually does the Right Thing, and doesn't leak Alice's data in any way. However, Alice doesn't want Bob to have general access to her data, and Bob doesn't want Alice to have general access to his code.

The conventional solutions are for Bob to compile or otherwise obfuscate his code, and then provide it to Alice under a grave and perilous license, or for Alice to upload her data to Bob's system and constrain him by a similarly grave and perilous contract. However, in modern times neither Alice nor Bob may wish the responsibility of owning and running the computers that the application of Bob's code to Alice's data may require.

So they turn to Penthesilea, who will rent out lots of computers to either Alice or Bob at very reasonable prices. However, Penthesilea is not interested in accepting any grave and perilous legal documents, and disclaims all liability for Bad Things happening. What can Alice and Bob do in order to keep each's confidential material safe from the other and (so far as possible) from Penthesilea as well?

Thoughts and (especially) references to related work would be greatly appreciated.

Update: changed "Alice's code" to "Alice's data"

## 7th Workshop on the Evaluation and Usability of Programming Languages and Tools (PLATEAU) - Call for Papers

7th Workshop on the Evaluation and Usability of Programming Languages and Tools (PLATEAU)

Co-located with SPLASH 2016
Amsterdam, Netherlands

PLATEAU 2016
http://2016.splashcon.org/track/plateau2016

CALL FOR PAPERS

Programming languages exist to enable programmers to develop software effectively. But how efficiently programmers can write software depends on the usability of the languages and tools that they develop with. The aim of this workshop is to discuss methods, metrics and techniques for evaluating the usability of languages and language tools. The supposed benefits of such languages and tools cover a large space, including making programs easier to read, write, and maintain; allowing programmers to write more flexible and powerful programs; and restricting programs to make them more safe and secure.

PLATEAU gathers the intersection of researchers in the programming language, programming tool, and human-computer interaction communities to share their research and discuss the future of evaluation and usability of programming languages and tools.

TOPICS

Some particular areas of interest are:

empirical studies of programming languages
methodologies and philosophies behind language and tool evaluation
software design metrics and their relations to the underlying language
user studies of language features and software engineering tools
visual techniques for understanding programming languages
critical comparisons of programming paradigms
tools to support evaluating programming languages
psychology of programming
domain specific language (e.g. database languages, security/privacy languages, architecture description languages) usability and evaluation

PLATEAU encourages submissions of three types of papers:

Research and position papers: We encourage papers that describe work-in-progress or recently completed work based on the themes and goals of the workshop or related topics, report on experiences gained, question accepted wisdom, raise challenging open problems, or propose speculative new approaches. We will accept two types of papers: research papers up to 8 pages in length; and position papers up to 2 pages in length.

Hypotheses papers: Hypotheses papers explicitly identify beliefs of the research community or software industry about how a programming language, programming language feature, or programming language tool affects programming practice. Hypotheses can be collected from mailing lists, blog posts, paper introductions, developer forums, or interviews. Papers should clearly document the source(s) of each hypothesis and discuss the importance, use, and relevance of the hypotheses on research or practice. In addition, we invite language designers to share some of the usability reasoning that influenced their work. These will serve as an important first step in advancing our understanding of how language design supports programmers. Papers may also, but are not required to, review evidence for or against the hypotheses identified. Hypotheses papers can be up to 4 pages in length.

Submission site: PLATEAU papers should be submitted via HotCRP.
https://plateau2016.hotcrp.com/

Format: Submissions should use the SIGPLAN Proceedings Format (http://www.sigplan.org/Resources/Author/), 10 point font. Note that by default the SIGPLAN Proceedings Format produces papers in 9 point font. If you are formatting your paper using LaTeX, you will need to set the 10pt option in the \documentclass command. If you are formatting your paper using Word, you may wish to use the provided Word template that supports this font size. Please include page numbers in your submission. Setting the preprint option in the LaTeX \documentclass command generates page numbers. Please also ensure that your submission is legible when printed on a black and white printer. In particular, please check that colors remain distinct and font sizes are legible.

All types of papers will be published in the ACM Digital Library at the authors’ discretion.

KEYNOTE

Alan Blackwell
Professor
Computer Laboratory
University of Cambridge
Cambridge, United Kingdom
http://www.cl.cam.ac.uk/~afb21/

DATES
Submission deadline: August 1, 2016

PROGRAM COMMITTEE
Kelly Blincoe, Auckland University of Technology, New Zealand
Jeff Carver, University of Alabama, USA
Kathi Fisler, Worcester Polytechnic Institute, USA
Tudor Gîrba, Independent, Switzerland
Stefan Hanenberg, University of Duisburg-Essen, Germany
Andrew Ko, University of Washington, USA
Brad Myers, Carnegie Mellon University, USA
Peter-Michael Osera, Grinnell College, USA
Janet Siegmund, University of Passau, Germany
Jeremy Singer, University of Glasgow, United Kingdom
Emma Söderberg, Google, USA
Andreas Stefik, University of Nevada, Las Vegas, USA
Ian Utting, University of Kent, United Kingdom
Philip Wadler, University of Edinburgh, United Kingdom

ORGANIZERS
Craig Anslow, Middlesex University, UK
Thomas LaToza, George Mason University, USA
Joshua Sunshine, Carnegie Mellon University, USA

## Is there a language with the ability to write arbitrary type functions?

I'm looking for a usable, fully implemented language that supports a kind of 'programming in types'. That is, it should allow the definition of type functions that take types as parameters, return values that are (existing or new) types, and support compile-time logic to implement those type functions. The generated types should of course then be used to instantiate functions that use those types.

I know that some of this is possible in C++, but the underlying type system in C++ is not really suitable.

I know that anything is possible in Lisp, but again I'm really looking for a language with a modern type system and functions on those types at compile time.

For background, this is in support of my Andl project link here. The need is to define generic operations on tuple types as they are transformed by the relational algebra, with full type inference.
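One mainstream point of comparison: TypeScript's type-level language can express this kind of type function, computing new types from old ones at compile time. The relational flavour below is my own illustration, not Andl's API:

```typescript
// A tuple (row) type modelled as an object type.
type City = { id: number; name: string; population: number };

// A hand-written type function: rename one attribute, keeping the rest.
// Uses a mapped type with key remapping and a conditional type.
type Rename<T, From extends keyof T, To extends string> = {
  [K in keyof T as K extends From ? To : K]: T[K];
};
type City2 = Rename<City, "id", "cityId">; // { cityId: number; name: string; population: number }

// Built-in type functions: projection, and merging headings (join-like).
type Projected = Pick<City, "id" | "name">;
type Supplier = { sid: number; city: string };
type Joined = City & Supplier;

// The generated types then instantiate ordinary value-level functions.
function project(c: City): Projected {
  return { id: c.id, name: c.name };
}
```

Haskell (type families) and Scala 3 (match types) offer comparable compile-time functions on types, with rather different underlying type systems than C++ templates.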

## how to design PL support for effects emerging from parallel non-determinism?

Let's make an on-topic place to discuss programming language relevant issues in concurrency and non-determinism. Software development in general isn't really on topic. It really needs to be about an aspect of PL design or implementation, or else LtU isn't suitable.

It seems folks designing a PL would like a language to be suitable for performance work, in a world where scaling better may imply doing more things in parallel, to get more hardware usage per unit of time in lieu of processors actually running faster. The plan is something like: well if cycles have stalled we can just use more cores. In general this seems to lead to unpredictability, which perhaps might be addressed by PL tech, somehow or other.

Something is missing in how developers specify what they think code will do, in the large. It's a common belief it will be obvious. ("It should be obvious what the code does from reading it, so comments are superfluous.") This is slightly like thinking other people will recognize a tune you have in mind when you drum out its rhythm with your fingers, when in fact they will only hear you drumming your fingers. A pattern you expect to be obvious won't be.

Maybe more than one layer of grammar would help. (Just an idea.) In addition to local language syntax, you could specify larger patterns of what will happen, in terms of expected patterns in future inter-agent transactions, whether the agents are objects or something else. There seems to be a lack of current art in capturing developer intent about the shape of event sequences. Sometimes I suspect a few people intend to fix this with determinism in language behavior, which doesn't seem a good fit for fallible distributed internet components.

I have nothing to post unless a remark catches my eye. If I have something interesting to say, I will. (I don't blog any more, and I'm all but unreachable except here, unless you call my phone or email my work address. My online presence is near zero. I don't even check LinkedIn.)

What specifically would you do to a programming language to address the emergent elephant that results from code specifying tail, legs, trunk, etc?

## PHOG: Probabilistic Model for Code

A paper presented as a poster at ICML. Abstract:

We introduce a new generative model for code called probabilistic higher order grammar (PHOG). PHOG generalizes probabilistic context free grammars (PCFGs) by allowing conditioning of a production rule beyond the parent non-terminal, thus capturing rich contexts relevant to programs. Even though PHOG is more powerful than a PCFG, it can be learned from data just as efficiently. We trained a PHOG model on a large JavaScript code corpus and show that it is more precise than existing models, while similarly fast. As a result, PHOG can immediately benefit existing programming tools based on probabilistic models of code.

Also comes with reviews.
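To make the abstract's key idea concrete, here is a toy contrast between PCFG and PHOG-style conditioning. The numbers and context keys are invented; the real PHOG learns a program that computes the conditioning context by walking the partial syntax tree:

```typescript
type Dist = Record<string, number>; // production rule -> probability

// PCFG: a rule's probability is conditioned only on the parent non-terminal.
const pcfg: Record<string, Dist> = {
  Expr: { "Expr -> Ident": 0.6, "Expr -> Call": 0.4 },
};

// PHOG-style: the conditioning key carries richer context, e.g. the token
// to the left of the position being expanded.
const phog: Record<string, Dist> = {
  "Expr|after '.'": { "Expr -> Ident": 0.9, "Expr -> Call": 0.1 },
  "Expr|after '('": { "Expr -> Ident": 0.3, "Expr -> Call": 0.7 },
};

function score(model: Record<string, Dist>, key: string, rule: string): number {
  return model[key]?.[rule] ?? 0; // unseen contexts score zero in this toy
}
```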

## Transpiling a dynamically typed language to a statically typed language

Hi, I have been pondering what I want to do for my master's thesis (2+ years away) and I have an idea. I am writing to ask if the question has a known answer or is even theoretically possible to investigate.

My idea is to write a source-to-source compiler from a Lisp_2 to Rust, but I do not know if it is possible to transpile from a dynamic language to a statically typed language.

Below is a rough draft of the introductory chapter. Beware that my writing style implies that I already know that producing such a transpiler is possible, but I do not.

Kind regards,
Filip Allberg

A Tale of Two Languages

Imbuing Lisp with the thread-safety guarantees of Rust by using a source-to-source compiler.

Introduction

This short introductory chapter starts by presenting the question we are trying to answer, followed by a summary of our suggested solution that provides a definitive answer to that question. The chapter ends with an overview of the subsequent chapters.

The question

By compiling a language X into a target language Y through the use of a source-to-source compiler (also known as a transpiler), the executable program (created by an additional compilation step) inherits certain qualities from the language Y, particularly its execution model.

Transpiling X to a target language Y may put constraints on the definition of X and possibly have a negative impact on the runtime environment of the final executable, but certain desirable traits may be inherited from Y, for example cross-platform portability.

A commonly selected target language for a transpiler is the programming language C. Since C is available on practically any machine, the C code generated from a program written in X is likely to be portable as well.

Moreover, any optimizations that the C compiler can achieve are automatically and implicitly available to the source language X. C compilers excel at carrying out a great many optimizations with respect to allocating registers, laying out code, and choosing address modes. As designers of the source language we are liberated from such issues.

In the case of C there are numerous pain points, such as pointers running amok and the lack of built-in garbage collection. If the source language is a garbage-collected, type-safe language, such problems have to be addressed in the source-to-source compiler.

Furthermore, there are certain limits imposed on our source language by the limitations of C, such as a maximum of 32 arguments in function calls or fewer than 16 levels of lexical blocks.

Inherently, each target language presents its own set of advantages and drawbacks with respect to the source language.

The Rust programming language, a systems programming language that is a contender in the same domain as C and C++, has the desirable attribute of guaranteeing thread safety and threads without data races.

The question then is, can a specific source language compiled into Rust be unequivocally determined to be thread-safe through the merits of the source-to-source compiler alone?

The solution and answer

In this text we present a source-to-source compiler from a Lisp_2 dialect to Rust, whereby we can, through the use of the Rust compiler rustc, compile the generated code. Using the generated programs, we can conclude, with the addition of formal theoretical deductions, that valid programs written in the aforementioned Lisp dialect are thread-safe and provide threads without data races.

Overview

Since there is no additional body of work yet, this subsection cannot be fully written, but a rough layout of the remainder of the text would probably look like this:

Chapter 1. An overview of Lisp and a formal description of our dialect as well as an overview of our concurrency model (not sure if the latter is necessary)

Chapter 2. A precursory introduction to Rust, with references to papers demonstrating the validity of the claims that Rust provides thread-safety and data-race guarantees, as this is a cornerstone of our work.

Chapter 3. Examples from concrete programs written in the Lisp dialect that will serve as benchmarks when evaluating the code generated by our compiler.

Chapter 4. An overview of the implementation of the compiler that also discusses alternative implementation choices and why they were not chosen.

Chapter 5 (maybe several chapters): Explanations of the different compilation phases. It is probable that we will encounter some automata here.

Chapter 6: Code generation optimizations and translation strategies to portable Rust code.

Chapter 7: Evaluation of benchmarks

Chapter 8: Discussions of possible applications, such as tiering source-to-source compilers in a layered fashion to successively imbue desirable traits from each language in the compilation sequence to the top-most source language.

Appendices: Complete source code for the compiler

## Viability of a static type system (like ML) for a relational language?

I'm building a relational language. I like the idea of a static type system, but after toying for a while with the interpreter I wonder how feasible it is.

Here, I think a "type" is the whole head of the relation (even scalars like "1" are relations), and most things feel fine, but the join operator causes trouble: it could merge heads (or not) depending on the kind of join. So I'm saying that the relational model generates new types on the fly, which can't be specified beforehand.

So, if I have a query like:

 city = Rel[id:Int name:Str population:Int]

 city where .id=1
 city select [.id]
 city select [toString(.id)]
 city union city

I see the type of each relation change. It is impractical to write out the type of each combination by hand, but I like the idea of typing most of it. Can the compiler be made to work with this, similar to ML?

And how do the SQL languages work? Are they dynamically typed?
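For comparison, a language with structural record types and inference gets much of the way there: each operator's result heading is computed by the compiler rather than written by hand. A sketch in TypeScript (my own illustration, not the poster's language):

```typescript
type CityRow = { id: number; name: string; population: number };

// select [.id] : projection narrows the heading; the result type is inferred.
function selectId(rows: CityRow[]) {
  return rows.map(r => ({ id: r.id })); // inferred: { id: number }[]
}

// select [toString(.id)] : a computed attribute changes the heading's type.
function selectIdAsString(rows: CityRow[]) {
  return rows.map(r => ({ id: String(r.id) })); // inferred: { id: string }[]
}

// join : merging two headings is an intersection of the two row types.
function join<A extends object, B extends object>(
  left: A[],
  right: B[],
  match: (a: A, b: B) => boolean,
): (A & B)[] {
  const out: (A & B)[] = [];
  for (const a of left)
    for (const b of right) if (match(a, b)) out.push({ ...a, ...b });
  return out;
}
```

SQL engines do something similar in spirit: the result heading of each operator is derived statically when the statement is compiled, even though client APIs often surface it dynamically.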

## Recursive types

In some literature, including Wikipedia, there is a distinction I don't quite understand made between iso-recursive and equi-recursive typing. In both cases, however, there is a concept of unrolling a recursive type by replacing the fixpoint variable with the recursive term.

I believe this idea is suboptimal and should be replaced by a better one I shall describe below. I am interested if this has been done before, and if my analysis (and the concrete algorithm I have written) is correct.

The problem with iso-recursion is that it induces an equivalence relation on type representations, and thus a partition, which fails to place equi-equivalent types in the same class. Here is an example:

([] -> [([] -> fix-2)])
([] -> [fix-2])

Here [..] denotes an n-ary tuple type, -> is a function type, and fix-2 is a fixpoint whose binder is implicitly 2 levels up in the structure. These encodings represent the same type, the second one is recursive but its unrolling is not equal to the first type.

I have found a solution to this problem: instead of a full unrolling, we can just unroll one level. This is much better! In particular my re-rolling algorithm subsumes the standard re-rolling by a full circle, and also produces a unique canonical representative of the class. It should work for monomorphic and first-order polymorphic types, at least if there is only a single fixpoint.

I will not give the algorithm here but I will give a description because it is fun! Consider a piece of string representing your type, with coloured segments for each level. The end of the string is knotted around the string some way up. The knot is called the fixpoint and the place it is knotted to is called the binding point.

The algorithm for re-rolling the string simply rolls the circle up the string one segment and compares the colours. If they're equal, roll up another segment. Continue until you either run out of string, or you get a mismatch, in which case backtrack by one. You can finish by cutting the matching tail off and doing a new knot.

Unrolling is the same. Just unroll ONE segment to shift the fixpoint binder down out of the way.
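The procedure can be sketched concretely. Here a type's spine is a list of 'colours', modelled as a non-cyclic prefix plus the knotted loop; the representation and names are my modelling of the description above, not the author's OCaml code:

```typescript
// A coloured string with a knot: `prefix` is the part before the binding
// point, `cycle` is the knotted loop (the fixpoint refers cycle.length
// segments up). The infinite unrolling is prefix ++ cycle ++ cycle ++ ...
interface Rolled {
  prefix: string[];
  cycle: string[];
}

// Re-rolling: roll the circle up the string one segment at a time. If the
// segment just above the knot matches the loop's last segment, absorb it
// (rotating the loop one step); stop at the first mismatch. The result is
// a canonical representative with the shortest possible prefix.
function reroll(t: Rolled): Rolled {
  const prefix = [...t.prefix];
  const cycle = [...t.cycle];
  while (prefix.length > 0 && prefix[prefix.length - 1] === cycle[cycle.length - 1]) {
    prefix.pop();
    cycle.unshift(cycle.pop()!); // move the binding point one segment up
  }
  return { prefix, cycle };
}

// Unrolling ONE level: push one loop segment back onto the prefix, shifting
// the fixpoint binder down out of the way.
function unrollOne(t: Rolled): Rolled {
  const cycle = [...t.cycle];
  const head = cycle.shift()!;
  cycle.push(head);
  return { prefix: [...t.prefix, head], cycle };
}
```

Re-rolling an unrolled type gives back the same canonical form, which is the sense in which one-level unrolling is the inverse of re-rolling.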

A standalone demo in Ocaml is here:

https://github.com/felix-lang/felix/blob/master/m.ml