Google Open Sources Skynet

Well. Not that exactly. But a product interesting in its own right.

TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.

The site is light on information. It shows off a visual programming tool with a backend in Python/C++, so that seems in line with current interest on LtU.

I am interested in data center applications these days, but I'm not sure how it does its magic within Google's data centers.

Python is actually a

Python is actually a front-end also. All the front-end code does is produce a (symbolic) data flow graph that is then pumped into an engine for the heavy-duty stuff. The technique is fairly standard among these toolkits (symbolic data flow graph, automatic/symbolic differentiation for back propagation, etc.), and is also how I did Bling a while back (GPU languages were already doing this way before ML was on the radar).
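To make the "front-end only builds a graph" idea concrete, here is a minimal sketch in plain Python. It is not TensorFlow's API or Bling's: `Node`, `var`, `const`, `evaluate` and `grad` are hypothetical names made up for illustration, and `grad` does simple symbolic differentiation by building a second graph, whereas real toolkits typically use reverse-mode automatic differentiation.

```python
class Node:
    """One vertex in a symbolic data flow graph (hypothetical toy class)."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op              # "const", "var", "add" or "mul"
        self.inputs = tuple(inputs)
        self.value = value        # constant value, or variable name

    # Operator overloading: arithmetic builds graph nodes, it computes nothing.
    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def const(v):
    return Node("const", value=float(v))


def var(name):
    return Node("var", value=name)


def evaluate(node, env):
    """Stand-in for the 'engine': walk the graph and compute a number."""
    if node.op == "const":
        return node.value
    if node.op == "var":
        return env[node.value]
    a, b = (evaluate(i, env) for i in node.inputs)
    return a + b if node.op == "add" else a * b


def grad(node, wrt):
    """Symbolic differentiation: returns a new graph for d(node)/d(wrt)."""
    if node.op == "const":
        return const(0)
    if node.op == "var":
        return const(1 if node.value == wrt else 0)
    a, b = node.inputs
    if node.op == "add":
        return grad(a, wrt) + grad(b, wrt)
    # Product rule for "mul".
    return grad(a, wrt) * b + a * grad(b, wrt)


x, y = var("x"), var("y")
f = x * x + x * y                      # builds a graph; nothing is computed yet
env = {"x": 3.0, "y": 2.0}
print(evaluate(f, env))                # value of f
print(evaluate(grad(f, "x"), env))     # value of df/dx = 2x + y
```

The point of the split is that `f` is just data, so a separate engine is free to optimize it, differentiate it, or ship it to another device before anything runs.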

I have a colleague, Frank Seide, who is working on a language called BrainScript in this direction...it uses a lot of FP/Array Programming techniques to abstract over loops (since you can't iterate directly, but must generate nodes in a graph). For the PL crowd, this is a very lucrative/timely direction for research.
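As a rough illustration of why loops need abstracting (this is not BrainScript syntax, which I am not reproducing here), the sketch below uses a fold to generate graph nodes instead of iterating at run time. The tuple encoding of nodes and the names `node`, `fold` and `evaluate` are assumptions made up for the example.

```python
import operator
from functools import reduce

OPS = {"add": operator.add, "mul": operator.mul}


def node(op, left, right):
    """A graph node is just a tuple (op, left, right); leaves are variable names."""
    return (op, left, right)


def fold(op, items):
    """'Loop' at graph-construction time: emits one node per element."""
    return reduce(lambda acc, item: node(op, acc, item), items)


def evaluate(n, env):
    """Toy engine: the graph it receives contains no loop, only nodes."""
    if isinstance(n, str):
        return env[n]
    op, left, right = n
    return OPS[op](evaluate(left, env), evaluate(right, env))


total = fold("add", ["x0", "x1", "x2"])
print(total)   # ('add', ('add', 'x0', 'x1'), 'x2')
print(evaluate(total, {"x0": 1, "x1": 2, "x2": 3}))
```

The higher-order `fold` is the only place iteration appears, which is where FP and array-programming idioms earn their keep in this setting.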

"Dataflow" at Google.

This sounds somewhat similar to the library described in this video from @scale, although they look like different projects in the same area at Google. The approach in the video does not focus on numerical computation, but it does have some interest in the area of scheduling data-flow operations over live sources. The discussion about fast, imprecise sampling of the dataflow functions over the stream to achieve low latency (in comparison to the more precise, but higher latency, batch operations) may suit your interest in data center magic.

Spark is used for some

Spark is used for some machine learning tasks, but is a more general data flow/streaming system. Graph computation engines are also in vogue these days. The systems world is quite active with new libraries that are not exactly languages.

Give that woman a raise! Performance?

Yah, this is more or less what I expected. A combinator/dataflow language expressed as a library in Java where computations expressed in point-free style are analyzed and pushed to the back-end.
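For anyone who hasn't seen the style, here is a small sketch of the general idea, written in Python rather than Java and not modelled on the actual library from the talk. `Stage`, `Map`, `Filter`, `fuse_maps` and `run` are hypothetical names; the "analysis" is a single toy optimization (fusing adjacent maps) and the "back-end" is just a local loop.

```python
class Stage:
    """One step of a pipeline, kept as data so it can be inspected later."""

    def __init__(self, kind, fn):
        self.kind = kind   # "map" or "filter"
        self.fn = fn


def Map(fn):
    return Stage("map", fn)


def Filter(fn):
    return Stage("filter", fn)


def fuse_maps(stages):
    """Analysis pass: merge adjacent map stages into a single map."""
    fused = []
    for stage in stages:
        if fused and stage.kind == "map" and fused[-1].kind == "map":
            f, g = fused[-1].fn, stage.fn
            # Default arguments pin f and g for this particular fusion.
            fused[-1] = Stage("map", lambda x, f=f, g=g: g(f(x)))
        else:
            fused.append(stage)
    return fused


def run(stages, data):
    """Stand-in back-end: executes the plan locally, one stage at a time."""
    data = list(data)
    for stage in stages:
        if stage.kind == "map":
            data = [stage.fn(x) for x in data]
        else:
            data = [x for x in data if stage.fn(x)]
    return data


# The pipeline never names its data: it is a composition of stages.
plan = [Map(lambda x: x + 1), Map(lambda x: x * 2), Filter(lambda x: x % 3 == 0)]
optimized = fuse_maps(plan)
print(len(plan), len(optimized))
print(run(optimized, range(6)))
```

Because the plan is a value rather than running code, the library can rewrite it before deciding where and how to execute it.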

I had expected that they would use Scala in interactive mode on a front-end to get rid of all that boilerplate code you need to write in Java.

Still, batch processing seems to have some overhead in the cloud since they "need to wait until VMs are up and running." Yet another legacy, I guess. If I got it right, she reported a two-minute wait time.

I imagine they boot into some Linux image with Java, send it a jar, then run that jar. That's a lot of overhead, which could be avoided if the language supported an Erlang-like VM where code could be 'hot-swapped' in at will.

(I also don't buy into their auto-magical scaling of computations.)

Very nice technology though. Good presentation too. Google seems to be way ahead at the moment.