LtU Forum

Programming with a Differentiable Forth Interpreter

On arxiv. Abstract:

There are families of neural networks that can learn to compute any function, provided sufficient training data. However, given that in practice training data is scarce for all but a small set of problems, a core question is how to incorporate prior knowledge into a model. Here we consider the case of prior procedural knowledge such as knowing the overall recursive structure of a sequence transduction program or the fact that a program will likely use arithmetic operations on real numbers to solve a task. To this end we present a differentiable interpreter for the programming language Forth. Through a neural implementation of the dual stack machine that underlies Forth, programmers can write program sketches with slots that can be filled with learnable behaviour. As the program interpreter is end-to-end differentiable, we can optimize this behaviour directly through gradient descent techniques on user specified objectives, and also integrate the program into any larger neural computation graph. We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex transduction tasks such as sequence sorting or addition with substantially less data and better generalisation over problem sizes. In addition, we introduce neural program optimisations based on symbolic computation and parallel branching that lead to significant speed improvements.
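
(For readers who haven't used Forth: the "dual stack machine" the abstract refers to is simply a data stack for operands plus a return stack for call/return addresses. Below is a minimal, non-differentiable sketch of such a machine in C, purely for orientation; the opcode names and layout are my own and nothing here is taken from the paper.)

    #include <stdio.h>

    /* A toy dual-stack machine in the style of Forth: a data stack for
       operands and a return stack for call/return addresses. */
    enum op { PUSH, ADD, CALL, RET, HALT };

    struct instr { enum op op; int arg; };

    static void run(const struct instr *prog) {
        int data[256];                       /* data stack */
        int rstack[256];                     /* return stack */
        int dp = 0, rp = 0, pc = 0;          /* stack pointers, program counter */

        for (;;) {
            struct instr i = prog[pc++];
            switch (i.op) {
            case PUSH: data[dp++] = i.arg; break;
            case ADD:  dp--; data[dp - 1] += data[dp]; break;
            case CALL: rstack[rp++] = pc; pc = i.arg; break;   /* save return address */
            case RET:  pc = rstack[--rp]; break;
            case HALT: printf("top of data stack: %d\n", data[dp - 1]); return;
            }
        }
    }

    int main(void) {
        /* push 1, push 2, call the "add word" at index 4, then halt */
        const struct instr prog[] = {
            {PUSH, 1}, {PUSH, 2}, {CALL, 4}, {HALT, 0},
            {ADD, 0}, {RET, 0},              /* the called word */
        };
        run(prog);                           /* prints: top of data stack: 3 */
        return 0;
    }

The paper's contribution, per the abstract, is to make the state of such a machine continuous and its transitions differentiable, so that slots left in a program sketch can be trained by gradient descent alongside the rest of a neural computation graph.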

Theory of syntax extensions: does it exist?

I've become interested in the question of syntax extensions: not the general theory of extending syntax, nor the mechanisms for extending syntax (there are many, from the C preprocessor to Racket's extended syntax-case), but some basis for deciding what syntaxes should be provided and how they ought to be organized. Most languages have fixed syntax, either because their creators took it for granted or as a matter of principle. Those that do allow syntax extension usually have cultural constraints against overusing it, and the extended syntaxes provided by the standard library are a rag-bag whose members depend on history or the creator's whims or both. Where syntax libraries are concerned, we seem to be in the same state that procedure libraries were in during the 1960s.

Is there state of the art about this that I haven't heard about yet? Please inform and/or enlighten me.

For the record...

This Google-Oracle trial (##googacle) is hysterical. Sarah Jeong seems to be the person to follow.

Are there any PL experts/LtU folk serving as expert witnesses?

STEPS Toward the Reinvention of Programming, 2012 Final Report

It appears that the final report from the STEPS program has been recently released at long last:

STEPS Toward the Reinvention of Programming, 2012 Final Report

If computing is important -- for daily life, learning, business, national defense, jobs, and more -- then qualitatively advancing computing is extremely important. For example, many software systems today are made from millions to hundreds of millions of lines of program code that is too large, complex and fragile to be improved, fixed, or integrated. (One hundred million lines of code at 50 lines per page is 5000 books of 400 pages each! This is beyond human scale.)

What if this could be made literally 1000 times smaller -- or more? And made more powerful, clear, simple, and robust? This would bring one of the most important technologies of our time from a state that is almost out of human reach -- and dangerously close to being out of control -- back into human scale.

An analogy from daily life is to compare the Great Pyramid of Giza, which is mostly solid bricks piled on top of each other with very little usable space inside, to a structure of similar size made from the same materials, but using the later invention of the arch. The result would be mostly usable space, and require roughly 1/1000 the number of bricks. In other words, as size and complexity increase, architectural design dominates materials.

The "STEPS Toward Expressive Programming Systems" project is taking the familiar world of personal computing used by more than a billion people every day -- currently reqiring hundreds of millions of lines of code to make and sustain -- and substantially recreating it using new programming techniques and "architectures" in dramatically smaller amounts of program code. This i made possible by new advances in design, programming, programming languages, systems organization, and the use of science to analyze and create models of software artifacts.

...and there seems to be new research activity with Alan Kay starting up (HARC), which might at least be inspired by STEPS, if not use it as a jumping-off point.

A Farewell to FRP in Elm

Making signals unnecessary with The Elm Architecture

...the big benefit is that Elm is now significantly easier to learn and use. As the design of subscriptions emerged, we saw that all the toughest concepts in Elm (signals, addresses, and ports) could collapse into simpler concepts in this new world. Elm is designed for ease-of-use, so I was delighted to stumble upon a path that would take us farther with fewer concepts. To put this in more alarmist terms, everything related to signals has been replaced with something simpler and nicer.

Algebra Of Pointers

I have been thinking a little about pointers, and what kind of mathematical structure they form. They seem to be like a one-dimensional vector (as opposed to a simple scalar). If we use the terms location for a point in n-dimensional space and distance for a vector, then we get the following properties: adding a distance to a location results in a location; subtracting a distance from a location results in a location; subtracting a location from a location results in a distance.

Where it gets more tricky is that adding a location to a location would seem to be invalid (I don't know what type of thing this is), and yet averaging two locations to get the mid-point seems perfectly reasonable. Topologically, what is going on here? What is the type of a location added to a location?

If we treat all locations as distances with an origin at zero, then everything makes sense again in terms of types, except for zero itself which seems to still be a location and not a distance (or it all gets horribly self-referential).

Any thoughts on how to make sense of this, what the type of the sum of two locations might be, etc?
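
One way to make the types concrete (my own sketch, not from the post) is to mirror what C already does with pointers and ptrdiff_t: give locations and distances distinct types, provide location - location and location + distance, and compute the midpoint as the affine combination a + (b - a)/2, which never needs a location + location operation:

    #include <stdio.h>

    /* Distinct wrapper types so the compiler keeps the two notions apart:
       a Location is a point, a Distance is a displacement between points. */
    typedef struct { double x; } Location;
    typedef struct { double dx; } Distance;

    /* location + distance -> location */
    Location loc_add(Location p, Distance d) { return (Location){ p.x + d.dx }; }

    /* location - location -> distance */
    Distance loc_sub(Location a, Location b) { return (Distance){ a.x - b.x }; }

    /* distance scaled by a plain number -> distance */
    Distance dist_scale(Distance d, double k) { return (Distance){ d.dx * k }; }

    /* The midpoint never adds two locations: it is a + (b - a)/2,
       an affine combination whose coefficients sum to 1. */
    Location midpoint(Location a, Location b) {
        return loc_add(a, dist_scale(loc_sub(b, a), 0.5));
    }

    int main(void) {
        Location a = { 2.0 }, b = { 10.0 };
        printf("midpoint = %g\n", midpoint(a, b).x);   /* prints 6 */
        /* loc_add(a, b); would not type-check: there is no location+location. */
        return 0;
    }

In the usual mathematical vocabulary, locations form an affine space (a torsor) over the vector space of distances: a sum of locations has no meaning on its own, but affine combinations whose coefficients sum to 1 (midpoints, weighted averages) are always expressible via differences, which is why the average feels reasonable even though the intermediate sum does not type-check.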

Session Types in a Linearly Typed Multi-Threaded Lambda-Calculus

The paper Session Types in a Linearly Typed Multi-Threaded Lambda-Calculus, by Hongwei Xi, Zhiqiang Ren, Hanwen Wu, and William Blair, 2016.

We present a formalization of session types in a multi-threaded lambda-calculus (MTLC) equipped with a linear type system, establishing for the MTLC both type preservation and global progress. The latter (global progress) implies that the evaluation of a well-typed program in the MTLC can never reach a deadlock. As this formulated MTLC can be readily embedded into ATS, a full-fledged language with a functional programming core that supports both dependent types (of DML-style) and linear types, we obtain a direct implementation of session types in ATS. In addition, we gain immediate support for a form of dependent session types based on this embedding into ATS. Compared to various existing formalizations of session types, we see the one given in this paper is unique in its closeness to concrete implementation. In particular, we report such an implementation ready for practical use that generates Erlang code from well-typed ATS source (making use of session types), thus taking great advantage of the infrastructural support for distributed computing in Erlang.

Programming by poking: why MIT stopped teaching SICP

A blog post; excerpt:

In this talk at the NYC Lisp meetup, Gerry Sussman was asked why MIT stopped teaching the legendary 6.001 course, which was based on Sussman and Abelson’s classic text The Structure and Interpretation of Computer Programs (SICP). Sussman’s answer was that: (1) he and Hal Abelson got tired of teaching it (having done it since the 1980s). So in 1997, they walked into the department head’s office and said: “We quit. Figure out what to do.” And more importantly, (2) that they felt that the SICP curriculum no longer prepared engineers for what engineering is like today. Sussman said that in the 80s and 90s, engineers built complex systems by combining simple and well-understood parts. The goal of SICP was to provide the abstraction language for reasoning about such systems.

Today, this is no longer the case. Sussman pointed out that engineers now routinely write code for complicated hardware that they don’t fully understand (and often can’t understand because of trade secrecy.) The same is true at the software level, since programming environments consist of gigantic libraries with enormous functionality. According to Sussman, his students spend most of their time reading manuals for these libraries to figure out how to stitch them together to get a job done. He said that programming today is “More like science. You grab this piece of library and you poke at it. You write programs that poke it and see what it does. And you say, ‘Can I tweak it to do the thing I want?'”. The “analysis-by-synthesis” view of SICP — where you build a larger system out of smaller, simple parts — became irrelevant. Nowadays, we do programming by poking.

Also, see the Hacker News thread. I thought this comment was useful:

What should we consider fundamental?

A fair question, and a full answer would be too long for a comment (though it would fit in a blog post, which I'll go ahead and write now since this seems to be an issue). But I'll take a whack at the TL;DR version here.

AI, ML, NLP, and web design are application areas, not fundamentals. (You didn't list computer graphics, computer vision, robotics, embedded systems -- all applications, not fundamentals.) You can cover all the set theory and graph theory you need in a day. Most people get this in high school. The stuff you need is just not that complicated. You can safely skip category theory. What you do need is some amount of time spent on the idea that computer programs are mathematical objects which can be reasoned about mathematically. This is the part that the vast majority of people are missing nowadays, and it can be a little tricky to wrap your brain around at first. You need to understand what a fixed point is and why it matters. You need automata theory, but again, the basics are really not that complicated. You need to know about Turing-completeness, and that in addition to Turing machines there are PDAs and FSAs. You need to know that TMs can do things that PDAs can't (like parse context-sensitive grammars), that PDAs can do things that FSAs can't (like parse context-free grammars), and that FSAs can parse regular expressions, and that's all they can do.

You need some programming language theory. You need to know what a binding is, and that there are two types of bindings that matter: lexical and dynamic. You need to know what an environment is (a mapping between names and bindings) and how environments are chained. You need to know how evaluation and compilation are related, and the role that environments play in both processes. You need to know the difference between static and dynamic typing. You need to know how to compile a high-level language down to an RTL. For operating systems, you need to know what a process is, what a thread is, some of the ways in which parallel processes lead to problems, and some of the mechanisms for dealing with those problems, including the fact that some of those mechanisms require hardware support (e.g. atomic test-and-set instructions). You need a few basic data structures. Mainly what you need is to understand that what data structures are really all about is making associative maps that are more efficient for certain operations under certain circumstances.

You need a little bit of database knowledge. Again, what you really need to know is that what databases are really all about is dealing with the architectural differences between RAM and non-volatile storage, and that a lot of these are going away now that these architectural differences are starting to disappear. That's really about it.
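
As a concrete illustration of the point about environments and chaining above (a minimal sketch in C, not part of the quoted comment): an environment is a frame mapping names to bindings, and each frame points to its enclosing frame, so lookup searches the innermost frame first and falls through outward, which is exactly what lexical scoping requires.

    #include <stdio.h>
    #include <string.h>

    /* One binding: a name bound to a value. A frame is a small array of
       bindings plus a pointer to the enclosing (parent) environment. */
    struct binding { const char *name; int value; };

    struct env {
        struct binding *bindings;
        int count;
        struct env *parent;        /* NULL for the global environment */
    };

    /* Look a name up in the innermost frame first, then walk the chain
       outward -- this is the "environment chaining" of lexical scope. */
    const struct binding *lookup(const struct env *e, const char *name) {
        for (; e != NULL; e = e->parent)
            for (int i = 0; i < e->count; i++)
                if (strcmp(e->bindings[i].name, name) == 0)
                    return &e->bindings[i];
        return NULL;               /* unbound variable */
    }

    int main(void) {
        struct binding globals[] = { {"x", 1}, {"y", 2} };
        struct binding locals[]  = { {"x", 10} };          /* shadows global x */

        struct env global = { globals, 2, NULL };
        struct env local  = { locals, 1, &global };

        printf("x = %d\n", lookup(&local, "x")->value);    /* 10: inner frame wins */
        printf("y = %d\n", lookup(&local, "y")->value);    /* 2: found via the chain */
        return 0;
    }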

Was there a language with an explicit call stack?

When writing a runtime system or an interpreter in C (though TBH it’s true about all programming languages that I know of) with, e.g., moving garbage collection or support for continuations, you find that, surprisingly, the call stack is too abstracted away by the semantics, and you basically have to roll a separate one yourself (without the benefits of hardware support) or resort to things like getcontext(3) or GCC’s split stacks, which are technically extensions of the semantics that need their own definition. LLVM tries to support precise GC, but falls short of real-world uses. Another facet of this problem is the unpredictability of stack overflows with alloca(3), C99 variable-length arrays, or just plain old recursion. (I have to count my recursive calls so that my recursive-descent parser doesn’t crash on malicious input? Really?)

This motivates the question: was there a language that officially permitted or even required managing one’s own call stack, inspecting stack frames and the like? Note that this doesn’t mean not having some sort of subroutine linkage, but could mean banning recursion unless the user explicitly allocates space for it, so that the implicit stack is statically guaranteed to be bounded. (FORTRAN didn’t have recursion, of course, but I doubt it was for this reason.) Things I’d want to (be able to) implement in such a language are, again, a moving GC, closures, coroutines, and continuations. I’d expect the language to be reasonably ALGOL- or ML-like so that FORTH and its kin are excluded.

On second thought, it looks like the requirements of a moving GC are orthogonal to those of continuations and well-defined stack-overflow behaviour, but closures seem to be somewhere in between, so I’m leaving this as one question with possibly multiple valid answers.
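
Short of such a language, the usual workaround is exactly what the post describes: roll the stack yourself as an ordinary heap data structure, so that depth is bounded by what you choose to allocate and overflow becomes a checkable condition rather than an unpredictable crash. A minimal illustrative sketch in C (my own, not from the post):

    #include <stdio.h>
    #include <stdlib.h>

    struct node { int value; struct node *left, *right; };

    /* Sum a binary tree iteratively. The pending work lives in an explicit,
       heap-allocated stack that we grow ourselves, so "stack overflow" turns
       into an ordinary allocation failure we can detect and handle. */
    static int tree_sum(const struct node *root) {
        size_t cap = 16, sp = 0;
        const struct node **stack = malloc(cap * sizeof *stack);
        int sum = 0;
        if (!stack) return 0;                      /* out of memory */

        if (root) stack[sp++] = root;
        while (sp > 0) {
            const struct node *n = stack[--sp];
            sum += n->value;

            if (sp + 2 > cap) {                    /* grow the explicit stack */
                const struct node **bigger = realloc(stack, 2 * cap * sizeof *stack);
                if (!bigger) break;                /* handle exhaustion gracefully */
                stack = bigger;
                cap *= 2;
            }
            if (n->left)  stack[sp++] = n->left;
            if (n->right) stack[sp++] = n->right;
        }
        free(stack);
        return sum;
    }

    int main(void) {
        struct node leaf1 = {1, NULL, NULL}, leaf2 = {2, NULL, NULL};
        struct node root  = {3, &leaf1, &leaf2};
        printf("sum = %d\n", tree_sum(&root));     /* prints 6 */
        return 0;
    }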

PL's hotness challenge

Blog post on HN. Only the intro is related to PL:

I’m trying to get into neural networks. There have been a couple big breakthroughs in the field in recent years and suddenly my side project of messing around with programming languages seemed short sighted. It almost seems like we’ll have real AI soon and I want to be working on that. While making my first couple steps into the field it’s hard to keep that enthusiasm. A lot of the field is still kinda handwavy where when you want to find out why something is used the way it’s used, the only answer you can get is “because it works like this and it doesn’t work if we change it.”

Putting my ear to the ground, competition from ML has become more and more common, not just in PL but also in systems. I know many researchers who are repurposing their skills right now, while it has become more difficult to find grad students and interns.

So, is this something we should be worried about? What could happen to make PL more competitive in terms of exciting results/mindshare?
