Learning to Execute and Neural Turing Machines

First, Learning to Execute, a paper on training neural nets to execute simple programs. Abstract:

Recurrent Neural Networks (RNNs) with Long Short-Term Memory units (LSTM) are widely used because they are expressive and are easy to train. Our interest lies in empirically evaluating the expressiveness and the learnability of LSTMs in the sequence-to-sequence regime by training them to evaluate short computer programs, a domain that has traditionally been seen as too complex for neural networks. We consider a simple class of programs that can be evaluated with a single left-to-right pass using constant memory. Our main result is that LSTMs can learn to map the character-level representations of such programs to their correct outputs. Notably, it was necessary to use curriculum learning, and while conventional curriculum learning proved ineffective, we developed a new variant of curriculum learning that improved our networks’ performance in all experimental conditions. The improved curriculum had a dramatic impact on an addition problem, making it possible to train an LSTM to add two 9-digit numbers with 99% accuracy.
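To make the training setup concrete, here is a minimal sketch of how one might generate (program, target) pairs of the kind the abstract describes: a short program evaluable in a single left-to-right pass, paired with its printed output as the character-level prediction target. The exact program distribution below is an illustrative guess, not the paper's.

```python
import random

def make_example(num_len=4):
    # One (program, target) pair: a tiny straight-line program whose
    # printed output the network must predict character by character.
    a = random.randint(10 ** (num_len - 1), 10 ** num_len - 1)
    b = random.randint(10 ** (num_len - 1), 10 ** num_len - 1)
    c = random.randint(1, 9)
    program = f"a={a};b={b}*{c};print(a+b)"
    target = str(a + b * c)
    return program, target

prog, out = make_example()
print(prog, "->", out)
```

Curriculum learning then amounts to scheduling `num_len` (and similar difficulty knobs) over the course of training, starting with short numbers and working up.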

Next, Neural Turing Machines, a paper on training networks with Turing-machine-like external memory. Abstract:

We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.
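The "attentional processes" the abstract mentions include content-based addressing: the controller emits a key vector, and read weights over memory come from cosine similarity sharpened by a key-strength parameter and normalised with a softmax. A minimal numpy sketch, with made-up memory contents:

```python
import numpy as np

def content_address(memory, key, beta):
    # Cosine similarity between the key and every memory row,
    # sharpened by the key strength beta, then softmax-normalised.
    eps = 1e-8
    sims = memory @ key / (np.linalg.norm(memory, axis=1)
                           * np.linalg.norm(key) + eps)
    w = np.exp(beta * sims)
    w /= w.sum()
    return w, w @ memory  # attention weights and the blended read vector

M = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w, r = content_address(M, np.array([1.0, 0.0]), beta=5.0)
```

Because the read is a weighted blend over all rows rather than a hard lookup, the whole operation stays differentiable, which is what lets the system be trained with gradient descent.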

Perhaps these are baby steps to a future field of neuro PL.


Nice juxtaposition

Two very different and interesting approaches - nice combination.

The first paper (Learning to Execute) really confuses me though. Are they trying to learn an interpreter for a language?

I read their intro as describing the following learning task:
* Given a character stream input
* Learn the target numerical string output

So the RNN is encoding the mapping (lexer + interpreter). But then I find their examples quite confusing, so I've probably misunderstood something. In the memorisation examples towards the end, is each example a random sample from that domain, shown with the predicted output value?

It wasn't my idea...I found

It wasn't my idea...I found the two papers talked about together here in a discussion about RNNs.

In the appendix at the end they present the program, the result of evaluating the code (the target), and the results predicted under a number of training techniques.

Deep API Learning

[didn't start a new topic as it reminded me of this one].

This popped up today on Hacker News:

Deep API Learning
Xiaodong Gu, Hongyu Zhang, Dongmei Zhang, and Sunghun Kim
arXiv link

Quick summary: take a very large corpus of annotated code (not manually labelled; comments are treated as human-readable annotations). Feed it into an RNN and use it to predict sequences of API calls from a given human-readable query. It's not the first work in the area, but the RNN learns a model of the language rather than using a semantics-free bag-of-words approach.

It was quite cute. Test sequences are in Table 2 (page 8), mostly quite trivial, although this one was interesting (standard threats to validity about cherry-picking test sequences apply):

Query: "copy a file and save it to-your destination path".
Result: FileInputStream.new FileOutputStream.new FileInputStream.getChannel FileOutputStream.getChannel
FileChannel.size FileChannel.transferTo FileInputStream.close
FileOutputStream.close FileChannel.close FileChannel.close
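For reference, that predicted call sequence is the standard Java NIO channel-transfer idiom for copying a file, i.e. the model recovered a real idiom rather than a random bag of calls. A structurally similar sketch in Python (file names are hypothetical):

```python
# Set up a hypothetical source file so the sketch is self-contained.
with open("src.bin", "wb") as f:
    f.write(b"some file contents")

# Open both ends, transfer the bytes, close everything -- the same
# shape as the predicted FileInputStream/FileChannel sequence.
with open("src.bin", "rb") as src, open("dst.bin", "wb") as dst:
    dst.write(src.read())
```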

Tea. Earl Grey. Hot.

Could a neuroscientist understand a microprocessor?

Somewhat related research about deep learning. I still have to read it.

Could a neuroscientist understand a microprocessor?, E. Jonas, K. Kording. Pre-print.