## Running Probabilistic Programs Backwards

I saw this work presented at ESOP 2015 by Neil Toronto, and the talk was excellent (slides).

Running Probabilistic Programs Backwards
Neil Toronto, Jay McCarthy, David Van Horn
2015

Many probabilistic programming languages allow programs to be run under constraints in order to carry out Bayesian inference. Running programs under constraints could enable other uses such as rare event simulation and probabilistic verification---except that all such probabilistic languages are necessarily limited because they are defined or implemented in terms of an impoverished theory of probability. Measure-theoretic probability provides a more general foundation, but its generality makes finding computational content difficult.

We develop a measure-theoretic semantics for a first-order probabilistic language with recursion, which interprets programs as functions that compute preimages. Preimage functions are generally uncomputable, so we derive an abstract semantics. We implement the abstract semantics and use the implementation to carry out Bayesian inference, stochastic ray tracing (a rare event simulation), and probabilistic verification of floating-point error bounds.

(also on SciRate)

The introduction sells the practical side of the work a bit better than the abstract.

Stochastic ray tracing  is one such rare-event simulation task. As illus- trated in Fig. 1, to carry out stochastic ray tracing, a probabilistic program simulates a light source emitting a single photon in a random direction, which is reflected or absorbed when it hits a wall. The program outputs the photon’s path, which is constrained to pass through an aperture. Millions of paths that meet the constraint are sampled, then projected onto a simulated sensor array.

The program’s main loop is a recursive function with two arguments: path, the photon’s path so far as a list of points, and dir, the photon’s current direction.

simulate-photon path dir := case (find-hit (fst path) dir) of absorb pt −→ (pt, path) reflect pt norm −→ simulate-photon (pt, path) (random-half-dir norm) 

Running simulate-photon (pt, ()) dir, where pt is the light source’s location and dir is a random emission direction, generates a photon path. The fst of the path (the last collision point) is constrained to be in the aperture. The remainder of the program is simple vector math that computes ray-plane intersections.

In contrast, hand-coded stochastic ray tracers, written in general-purpose languages, are much more complex and divorced from the physical processes they simulate, because they must interleave the advanced Monte Carlo algorithms that ensure the aperture constraint is met.

Unfortunately, while many probabilistic programming languages support random real numbers, none are capable of running a probabilistic program like simulate-photon under constraints to carry out stochastic ray tracing. The reason is not lack of engineering or weak algorithms, but is theoretical at its core: they are all either defined or implemented using [density functions]. [...] Programs whose outputs are deterministic functions of random values and programs with recursion generally cannot denote density functions. The program simulate-photon exhibits both characteristics.

Measure-theoretic probability is a more powerful alternative to this naive probability theory based on probability mass and density functions. It not only subsumes naive probability theory, but is capable of defining any computable probability distribution, and many uncomputable distributions. But while even the earliest work  on probabilistic languages is measure-theoretic, the theory’s generality has historically made finding useful computational content difficult. We show that measure-theoretic probability can be made computational by

1. Using measure-theoretic probability to define a compositional, denotational semantics that gives a valid denotation to every program.
2. Deriving an abstract semantics, which allows computing answers to questions about probabilistic programs to arbitrary accuracy.
3. Implementing the abstract semantics and efficiently solving problems.

In fact, our primary implementation, Dr. Bayes, produced Fig. 1b by running a probabilistic program like simulate-photon under an aperture constraint.

## Comment viewing options

### Okay, this I find excellent.

No joke. Simple idea, good theory, has an implementation, no idea what algorithms it enables beyond what they imagine now. New stuff.

It meets all my criteria, doesn't get any better.

### Relation to Hansei?

I'm curious how this work relates to HANSEI, which seems to easily include recursion, etc. and in fact doesn't seem to be "first-order" at all.

"Measure-theoretic probability is a more powerful alternative to this naive probability theory based on probability mass and density functions. It not only subsumes naive probability theory, but is capable of defining any computable probability distribution, and many uncomputable distributions."

Yeah, that's the problem, all right. :-) OTOH...

"And it turns out that algorithmic information theory is really a constructive version of measure (probability) theory..." — Gregory J. Chaitin. Exploring Randomness (Kindle Locations 31-32). Kindle Edition.

We already have a theory of constructive measure theory. It's just called Algorithmic Information Theory or, more often, Kolmogorov Complexity.

It'd be nice if the people studying probability, logic, and AIT figured out they're studying the same thing.

### Relation to Hansei?

I am far from an expert on probablistic languages, so I will certainly say many wrong things.

I think that the core idea of HANSEI is to have a very shallow embedding of the probabilistic operations inside an existing language using delimited control. This allows to define inference algorithms as libraries, manipulating the model given as a first-class value ((unit -> 'a) as a poor man's delimcc+random monad, or reified as non-normalized exploration trees). In particular, HANSEI would in principle also solve this part of the problem:

Hand-coded stochastic ray tracers, written in general-purpose languages, are much more complex and divorced from the physical processes they simulate, because they must interleave the advanced Monte Carlo algorithms that ensure the aperture constraint is met.

HANSEI would also allow you to decouple the implementation of the physical process (user code written against the HANSEI library) from the inference algorithms optimizing the exploration of the probabilistic space (as generic libraries manipulating HANSEI exploration trees).

The question, however, is whether this shallow form is expressive enough to allow efficient implementations for the scenarios studied in the above paper, such as stochastic ray tracing. Embedded Probabilistic Programming, by Oleg Kiselyov and Chung-chieh Shan, 2009, describes "importance sampling" as an HANSEI-implementable algorithm that avoids getting stuck in early-success or early-failure scenarios by exhaustively exploring a fixed number of lookahead steps.

Not being familiar with these algorithms and problems, I have no idea whether this idea would be sufficient to handle the raytracing problem, or could be adapted to work. Reusing it as is seems doubtful, as the branching width of random directions in the half-space is huge, but it may be possible to "sample" the lookahead subspace (just as the paper is sampling preimages) to make it scale. If the raytracing problem uses rays that are reflected many times before reaching the aperture (I don't know), then an algorithm optimized to avoid early success or early failures situation may be in trouble (I don't know).

Post Scriptum: In any case, I think what is said about HANSEI in the Related Works section of the above paper ("Hansei is based on density functions") may be a bit off the mark: I see Hansei as a shallow embedding technique that is relatively decoupled from the probability semantics one decides to give to the probabilistic operators. This would also mean that the other contribution of the paper, namely the detailed study of the relation between the various semantics, is probably valuable to the users of HANSEI and related techniques as well.