Introducing PathQuery, Google's Graph Query Language

We introduce PathQuery, a graph query language developed to scale with Google's query and data volumes as well as its internal developer community. PathQuery supports flexible and declarative semantics. We have found that this enables query developers to think in a naturally "graphy" design space and to avoid the additional cognitive effort of coordinating numerous joins and subqueries often required to express an equivalent query in a relational space. Despite its traversal-oriented syntactic style, PathQuery has a foundation on a custom variant of relational algebra -- the exposition of which we presently defer -- allowing for the application of both common and novel optimizations. We believe that PathQuery has withstood a "test of time" at Google, under both large scale and low latency requirements. We thus share herein a language design that admits a rigorous declarative semantics, has scaled well in practice, and provides a natural syntax for graph traversals while also admitting complex graph patterns.

Things that are somewhat interesting to me, from an engineering standpoint:

1. PathQuery has a module/compilation system, enabling re-use of PathQuery modules across projects. (Someone mentioned that Google already has around 40,000 PathQuery modules internally...)
2. PathQuery supports native functions so that some query pieces can be evaluated procedurally (peephole optimization)
3. Use of relational algebra to enable a lot of known optimizations, plus future optimizations

Also, from a socio-linguistic perspective, Graph Languages are effectively the new Object-Relational Mapping layer, but they solve an interesting organizational problem of allowing multiple teams to code in different languages, without needing to re-write / re-implement entities and mapping configurations in each language. It's the Old New Thing again...
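
To make the "graphy vs. relational" point from the abstract concrete, here is a minimal sketch in plain Python (not PathQuery syntax; the edge data is made up): a two-hop traversal written as chained edge-follows, next to the self-join an equivalent relational query has to coordinate.

    # Hypothetical edge list of (subject, predicate, object) triples.
    EDGES = [
        ("alice", "manages", "bob"),
        ("bob", "manages", "carol"),
        ("bob", "manages", "dave"),
    ]

    def follow(nodes, predicate):
        """Follow every edge labelled `predicate` out of `nodes`."""
        return {o for (s, p, o) in EDGES if p == predicate and s in nodes}

    # Traversal style: alice -manages-> ? -manages-> ?
    two_hop = follow(follow({"alice"}, "manages"), "manages")

    # Relational style: the same question as an explicit self-join.
    two_hop_join = {
        o2
        for (s1, p1, o1) in EDGES if p1 == "manages" and s1 == "alice"
        for (s2, p2, o2) in EDGES if p2 == "manages" and s2 == o1
    }

    assert two_hop == two_hop_join == {"carol", "dave"}

Each additional hop in the traversal version is one more call to follow; in the join version it is another correlated scan whose variables have to be kept straight, which is the coordination overhead the abstract alludes to.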

Google announces Logica: organizing your data queries, making them universally reusable and fun

You can read more about it at the Google Open Source blog post, Logica: organizing your data queries, making them universally reusable and fun.

They advocate for a Datalog-like language they developed internally at Google.

The reason?

Good programming is about creating small, understandable, reusable pieces of logic that can be tested, given names, and organized into packages which can later be used to construct more useful pieces of logic. SQL resists this workflow. Although you can encapsulate certain repeated computations into views and functions, the syntax and support for these can vary among implementations, the notions of packages and imports are generally nonexistent, and higher-level constructions (e.g. passing a function to a function) are impossible.
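
The complaint about missing named, reusable, composable pieces is easy to see with a small sketch. The following is plain Python generating SQL text, not Logica (in Logica the analogous units are predicates, which can be named, tested, and imported); the table and column names are invented for illustration.

    def active_users(table="users", days=30):
        # A named, testable, reusable piece of query logic.
        return (f"SELECT id FROM {table} "
                f"WHERE last_seen >= DATE('now', '-{days} days')")

    def count_of(subquery):
        # A higher-order construction: a query transformer that can be applied
        # to any subquery, the kind of composition the quote says SQL lacks.
        return f"SELECT COUNT(*) AS n FROM ({subquery})"

    print(count_of(active_users(days=7)))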

Coq will be renamed

From the Coq-club:

The Coq development team acknowledges the recent discussions (started on the Coq-Club mailing list) around Coq's logo and name.

We wish to thank everyone that participated in these discussions. Testimonies from people who experienced harassment or awkward situations, reports about students (notably women) who ended up not learning / using Coq because of its name, were all very important so that the community could fully recognize the impact of the current name and its slang meaning in English, especially with respect to gender-diversity in the Coq community.

For these reasons, the Coq development team is open to a renaming.

Suggestions for alternative names go here.

LAMBDA: The ultimate Excel worksheet function

Post by Andy Gordon and Simon Peyton Jones on LAMBDA giving Excel users the ability to define functions.

Ever since it was released in the 1980s, Microsoft Excel has changed how people organize, analyze, and visualize their data, providing a basis for decision-making for the millions of people who use it each day. It’s also the world’s most widely used programming language. Excel formulas are written by an order of magnitude more users than all the C, C++, C#, Java, and Python programmers in the world combined. Despite its success, considered as a programming language Excel has fundamental weaknesses. Over the years, two particular shortcomings have stood out: (1) the Excel formula language really only supported scalar values—numbers, strings, and Booleans—and (2) it didn’t let users define new functions.

Until now.

Google Brain's Jax and Flax

Google's AI division, Google Brain, has two main products for deep learning: TensorFlow and Jax. While TensorFlow is the better known of the two, Jax can be thought of as a higher-level language for specifying deep learning algorithms: you write ordinary Python/NumPy-style functions, which Jax traces and compiles, automatically eliding code that doesn't need to run as part of the model.

Jax evolved from Autograd, and is a combination of Autograd and XLA. Autograd "can automatically differentiate native Python and Numpy code. It can handle a large subset of Python's features, including loops, ifs, recursion and closures, and it can even take derivatives of derivatives of derivatives. It supports reverse-mode differentiation (a.k.a. backpropagation), which means it can efficiently take gradients of scalar-valued functions with respect to array-valued arguments, as well as forward-mode differentiation, and the two can be composed arbitrarily. The main intended application of Autograd is gradient-based optimization."
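
A few lines of JAX illustrate the properties described in that quote: reverse-mode gradients of scalar-valued functions with array-valued arguments, derivatives of derivatives, and composition with forward mode. jax.grad, jax.jacfwd, and jax.jit are the actual public API; the functions being differentiated here are just toys.

    import jax
    import jax.numpy as jnp

    def loss(w):
        # A scalar-valued function of an array-valued argument.
        return jnp.sum(jnp.tanh(w) ** 2)

    grad_loss = jax.grad(loss)          # reverse-mode differentiation
    print(grad_loss(jnp.arange(3.0)))   # gradient w.r.t. the whole array

    def f(x):
        return jnp.tanh(x)

    ddf = jax.grad(jax.grad(f))         # a derivative of a derivative
    mixed = jax.jacfwd(jax.grad(f))     # forward-mode over reverse-mode
    fast = jax.jit(ddf)                 # compiled with XLA
    print(ddf(0.5), mixed(0.5), fast(0.5))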

Flax is a neural-network library built on top of Jax, which allows for easier definition and customization of models.
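
To give a feel for what that looks like, here is a minimal sketch using Flax's linen API; the layer sizes and input shape are arbitrary.

    import jax
    import jax.numpy as jnp
    import flax.linen as nn

    class MLP(nn.Module):
        @nn.compact
        def __call__(self, x):
            x = nn.relu(nn.Dense(128)(x))
            return nn.Dense(10)(x)

    model = MLP()
    x = jnp.ones((1, 32))
    params = model.init(jax.random.PRNGKey(0), x)  # initialize parameters
    logits = model.apply(params, x)                # run the model
    print(logits.shape)                            # (1, 10)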

What do you see as the future of domain specific languages for AI?

Built to Last

Mar Hicks. Built to Last. Logic. Issue 11, "Care".

It was this austerity-driven lack of investment in people—rather than the handy fiction, peddled by state governments, that programmers with obsolete skills retired—that removed COBOL programmers years before this recent crisis. The reality is that there are plenty of new COBOL programmers out there who could do the job. In fact, the majority of people in the COBOL programmers’ Facebook group are twenty-five to thirty-five-years-old, and the number of people being trained to program and maintain COBOL systems globally is only growing. Many people who work with COBOL graduated in the 1990s or 2000s and have spent most of their twenty-first century careers maintaining and programming COBOL systems...

In this sense, COBOL and its scapegoating show us an important aspect of high tech that few in Silicon Valley, or in government, seem to understand. Older systems have value, and constantly building new technological systems for short-term profit at the expense of existing infrastructure is not progress. In fact, it is among the most regressive paths a society can take.

Recently, work on the history of technology has become increasingly sophisticated, moving beyond telling the story of impressive technology to unraveling the social, political, and economic forces that affected the development, deployment, and use of a wide range of technologies and technological systems. Luckily, this trend is beginning to manifest itself in studies of the history of programming languages. While not replacing the need for careful, deeply informed studies of the internal intellectual forces affecting the development of programming languages, these studies add a sorely needed perspective to the stories we tell.

Tackling the Awkward Squad for Reactive Programming

https://2020.ecoop.org/details/ecoop-2020-papers/19/Tackling-the-Awkward-Squad-for-Reactive-Programming-The-Actor-Reactor-Model

Sam Van den Vonder, Thierry Renaux, Bjarno Oeyen, Joeri De Koster, Wolfgang De Meuter

Reactive programming is a programming paradigm whereby programs are internally represented by a dependency graph, which is used to automatically (re)compute parts of a program whenever its input changes. In practice reactive programming can only be used for some parts of an application: a reactive program is usually embedded in an application that is still written in ordinary imperative languages such as JavaScript or Scala. In this paper we investigate this embedding and we distill “the awkward squad for reactive programming” as 3 concerns that are essential for real-world software development, but that do not fit within reactive programming. They are related to long lasting computations, side-effects, and the coordination between imperative and reactive code. To solve these issues we design a new programming model called the Actor-Reactor Model in which programs are split up in a number of actors and reactors. Actors and reactors enforce a strict separation of imperative and reactive code, and they can be composed via a number of composition operators that make use of data streams. We demonstrate the model via our own implementation in a language called Stella.
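
For readers new to the paradigm, the dependency-graph idea in the first sentence of the abstract can be sketched in a few lines of Python. This is a generic toy, not Stella or the Actor-Reactor Model: a derived cell re-runs its function whenever one of its inputs changes.

    class Cell:
        """A source value; setting it pushes the change to dependents."""
        def __init__(self, value=None):
            self.value = value
            self.dependents = []

        def set(self, value):
            self.value = value
            for d in self.dependents:
                d.recompute()

    class Derived(Cell):
        """A value computed from other cells, recomputed on any input change."""
        def __init__(self, fn, *inputs):
            super().__init__()
            self.fn, self.inputs = fn, inputs
            for i in inputs:
                i.dependents.append(self)
            self.recompute()

        def recompute(self):
            self.value = self.fn(*(i.value for i in self.inputs))
            for d in self.dependents:
                d.recompute()

    celsius = Cell(20)
    fahrenheit = Derived(lambda c: c * 9 / 5 + 32, celsius)
    celsius.set(25)
    print(fahrenheit.value)  # 77.0

The "awkward squad" in the paper is precisely what this toy glosses over: what should happen when the recomputed function runs for a long time, performs side effects, or has to coordinate with imperative code.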

The Simple Essence of Algebraic Subtyping: Principal Type Inference with Subtyping Made Easy

The Simple Essence of Algebraic Subtyping: Principal Type Inference with Subtyping Made Easy, Lionel Parreaux, ICFP 2020.

MLsub extends traditional Hindley-Milner type inference with subtyping while preserving compact principal types, an exciting new development. However, its specification in terms of biunification is difficult to understand, relying on the new concepts of bisubstitution and polar types, and making use of advanced notions from abstract algebra. In this paper, we show that these are in fact not essential to understanding the mechanisms at play in MLsub. We propose an alternative algorithm called Simple-sub, which can be implemented efficiently in under 500 lines of code (including parsing, simplification, and pretty-printing), looks more familiar, and is easier to understand.
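
To get a feel for what principal type inference with subtyping produces, consider the textbook-style example (my own illustration, not taken from the paper) select = fun b -> fun x -> fun y -> if b then x else y. Plain Hindley-Milner has to unify both branches and infers bool -> 'a -> 'a -> 'a, whereas algebraic subtyping keeps the two arguments distinct and infers the principal type bool -> 'a -> 'b -> ('a ∨ 'b), where the result is the least upper bound of the two argument types, a union type.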

There's also an introductory blog post and an online demo.

Stephen Dolan's Algebraic Subtyping (discussion) unexpectedly provided a solution to the problem of combining type inference and subtyping, but used somewhat heavy and unusual machinery. Now Lionel Parreaux shows that the system can be implemented in a very straightforward and pleasing way. Here's to hoping that it makes it into real languages!

Applications of Blockchain to Programming Language Theory

Let's talk about Blockchain. The goal of this forum topic is to highlight its usefulness to programming language theory and practice. If you're familiar with existing research efforts, please share them here. And feel free to suggest ways Blockchain could improve languages and developer productivity.

As one tasty example: Blockchain helps to formalize thinking about mutual knowledge and common knowledge, and potentially about sharing intergalactic computing power through vast distributed computing fabrics. If we can design contracts in a way that maximizes the use of mutual knowledge while restricting common knowledge to situations where you have to "prove your collateral", third-party transactions could eliminate a lot of back-office burden. There might be benefits to other areas of computer science from such research as well.

Some language researchers, like Mark S. Miller, have always dreamed of Agoric and the Decades-Long Quest for Secure Smart Contracts.

Some may also be aware that verification of smart contracts is an important research area, because of the notorious theft of a purse via a logic bug in an Ethereum smart contract.

Turnstile+: Dependent Type Systems as Macros

In 2017, a team from Northeastern University released Turnstile, a framework for implementing propositionally typed languages in Racket; cf. naasking's story Type Systems as Macros. The system was really nice because it allowed type systems to be expressed in a manner similar to the way theoretical PL researchers would express them in a paper, and because it hooked into Racket's clean compiler backend.

Now Stephen Chang, one of that team, together with new coauthors Michael Ballantyne, Usamilo Turner and William Bowman, has released a rewrite that they call Turnstile+, along with a POPL article, Dependent Type Systems as Macros. From that article's introduction:

Turnstile+ represents a major research leap over its predecessor. Specifically, we solve the major challenges necessary to implement dependent types and their accompanying DSLs and extensions (which Turnstile could not support), while retaining the original abilities of Turnstile. For example, one considerable obstacle was the separation between the macro expansion phase and a program’s runtime phase. Since dependently typed languages may evaluate expressions while type checking, checking dependent types with macros requires new macrology design patterns and abstractions for interleaving expansion, type checking, and evaluation. The following summarizes our key innovations.

  • Turnstile+ demands a radically different API for implementing a language’s types. It must be straightforward yet expressive enough to represent a range of constructs from base types, to binding forms like Π-types, to datatype definition forms for indexed inductive type families.
  • Turnstile+ includes an API for defining type-level computation, which we dub normalization by macro expansion. A programmer writes a reduction rule using syntax resembling familiar on-paper notation, and Turnstile+ generates a macro definition that performs the reduction during macro expansion. This allows easily implementing modular type-level evaluation.
  • Turnstile+’s new type API adds a generic type operation interface, enabling modular implementation of features such as error messages, pattern matching, and resugaring. This is particularly important for implementing tools like tactic systems that inspect intermediate type-checking steps and construct partial terms.
  • Turnstile+’s core type checking infrastructure requires an overhaul, specifically with first-class type environments, in order to accommodate features like dependent binding structures of the shape [x:τ]..., i.e., telescopes [de Bruijn 1991; McBride 2000].
  • Relatedly, Turnstile+’s inference-rule syntax is extended so that operations over telescopes, or premises with references to telescopes, operate as folds instead of as maps.

The code is available at https://github.com/stchang/macrotypes.