Introducing PathQuery, Google's Graph Query Language

We introduce PathQuery, a graph query language developed to scale with Google's query and data volumes as well as its internal developer community. PathQuery supports flexible and declarative semantics. We have found that this enables query developers to think in a naturally "graphy" design space and to avoid the additional cognitive effort of coordinating numerous joins and subqueries often required to express an equivalent query in a relational space. Despite its traversal-oriented syntactic style, PathQuery has a foundation on a custom variant of relational algebra -- the exposition of which we presently defer -- allowing for the application of both common and novel optimizations. We believe that PathQuery has withstood a "test of time" at Google, under both large scale and low latency requirements. We thus share herein a language design that admits a rigorous declarative semantics, has scaled well in practice, and provides a natural syntax for graph traversals while also admitting complex graph patterns.

A few things I find interesting from an engineering standpoint:

1. PathQuery has a module and compilation system, which enables reuse of PathQuery modules across projects. (Someone mentioned that Google already has around 40,000 PathQuery modules internally.)
2. PathQuery supports native functions, so that some pieces of a query can be evaluated procedurally (a kind of peephole optimization).
3. PathQuery is founded on a variant of relational algebra, which enables many known optimizations and leaves room for future ones.
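To make the third point a bit more concrete, here is a minimal Python sketch of why a relational foundation helps. It is not PathQuery syntax and not Google's algebra (the abstract defers that exposition); the `Edge` relation, the sample data, and the function names are all assumptions made up for illustration. It only shows that a two-hop traversal can be written either as hop-by-hop steps or as a self-join of an edge relation, which is the kind of equivalence an optimizer can exploit.

```python
from typing import Iterable, List, NamedTuple, Set


class Edge(NamedTuple):
    """One row of a hypothetical edge relation: (src, label, dst)."""
    src: str
    label: str
    dst: str


# Toy data, invented for this example.
EDGES: List[Edge] = [
    Edge("alice", "friend", "bob"),
    Edge("alice", "friend", "carol"),
    Edge("bob", "lives_in", "paris"),
    Edge("carol", "lives_in", "tokyo"),
    Edge("dave", "lives_in", "oslo"),
]


def step(frontier: Set[str], label: str, edges: Iterable[Edge]) -> Set[str]:
    """One hop: select edges by label, join on src, project dst."""
    return {e.dst for e in edges if e.label == label and e.src in frontier}


def traverse(start: Set[str], path: List[str], edges: List[Edge]) -> Set[str]:
    """Traversal style: apply one hop per label, shrinking the frontier early."""
    frontier = start
    for label in path:
        frontier = step(frontier, label, edges)
    return frontier


def join_then_filter(start: Set[str], edges: List[Edge]) -> Set[str]:
    """Relational style: self-join the edge relation, then filter.

    Written naively, this compares every pair of edges before filtering.
    """
    return {
        e2.dst
        for e1 in edges
        for e2 in edges
        if e1.dst == e2.src
        and e1.src in start
        and e1.label == "friend"
        and e2.label == "lives_in"
    }


if __name__ == "__main__":
    print(sorted(traverse({"alice"}, ["friend", "lives_in"], EDGES)))
    print(sorted(join_then_filter({"alice"}, EDGES)))
```

Both functions answer "where do Alice's friends live?" and return the same set. Because the two forms are equivalent, an engine is free to evaluate the cheaper one, for example by applying the `start` and label filters before the join rather than after it.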

Also, from a sociolinguistic perspective, graph query languages are effectively the new object-relational mapping (ORM) layer. They do solve an interesting organizational problem, though: multiple teams can code in different programming languages without rewriting or re-implementing entities and mapping configurations in each one. It's the Old New Thing again...