#import "../common.typ": *
#import "@preview/prooftrees:0.1.0": *
#show: doc => conf("Probabilistic Programming", doc)
#import "@preview/simplebnf:0.1.0": *

#let prob = "probability"
#let tt = "tt"
#let ff = "ff"
#let tru = "true"
#let fls = "false"
#let ret = "return"
#let flip = "flip"
#let real = "real"
#let Env = "Env"
#let Bool = "Bool"
#let dbp(x) = $db(#x)_p$
#let dbe(x) = $db(#x)_e$
#let sharpsat = $\#"SAT"$
#let sharpP = $\#"P"$

- Judea Pearl
- Probabilistic graphical models

*Definition.* Probabilistic programs are programs that denote #prob distributions.

Example:

```
x <- flip 1/2
y <- flip 1/2
return x or y
```

$x$ and $y$ are random variables that come from coin flips. Instead of having a single value, the output of the program is a _function_:

$ db( #[```
x <- flip 1/2
y <- flip 1/2
return x or y
```] ) = cases(tt mapsto 3/4, ff mapsto 1/4) $

Here $tt$ is "semantic" true and $ff$ is "semantic" false.

- Sample space: $Omega = {tt, ff}$
- A #prob distribution on $Omega$ is a map $Omega -> [0, 1]$
- Semantic brackets $db(...)$ map a program to its denotation

=== TinyPPL

Syntax:

$ #bnf(
  Prod(
    $p$,
    annot: $sans("Pure program")$,
    {
      Or[$x$][_variable_]
      Or[$ifthenelse(p, p, p)$][_conditional_]
      Or[$p or p$][_disjunction_]
      Or[$p and p$][_conjunction_]
      Or[$tru$][_true_]
      Or[$fls$][_false_]
      Or[$e$ $e$][_application_]
    }),
) $

$ #bnf(
  Prod(
    $e$,
    annot: $sans("Probabilistic program")$,
    {
      Or[$x arrow.l e, e$][_assignment_]
      Or[$ret p$][_return_]
      Or[$flip real$][_random_]
    }),
) $

Semantics of pure terms:

- $dbp(p) : Env -> Bool$
- $dbp(x) (rho) = rho(x)$, e.g. $dbp(x) ([x mapsto tt]) = tt$
- $dbp(tru) (rho) = tt$
- $dbp(p_1 and p_2) (rho) = dbp(p_1) (rho) and dbp(p_2) (rho)$
  - the second $and$ is the "semantic" $and$

An environment $rho$ is a mapping from identifiers to $Bool$.

Semantics of probabilistic terms:

- $dbe(e) : Env -> ({tt, ff} -> [0, 1])$
- $dbe(flip 1/2) (rho) = [tt mapsto 1/2, ff mapsto 1/2]$
- $dbe(ret p) (rho) = v mapsto cases(1 "if" dbp(p) (rho) = v, 0 "else")$
- $dbe(x <- e_1 \, e_2) (rho) = v' mapsto sum_(v in {tt, ff}) dbe(e_1) (rho) (v) times dbe(e_2) (rho [x mapsto v])(v')$

This is the "monadic semantics" of PPLs: running a program gives back a #prob distribution.

- https://homepage.cs.uiowa.edu/~jgmorrs/eecs762f19/papers/ramsay-pfeffer.pdf
- https://www.sciencedirect.com/science/article/pii/S1571066119300246

=== Tractability

What is the complexity class of computing these semantics?

#sharpP:
- Input: a boolean formula $phi$
- Output: the number of satisfying assignments of $phi$
- Example: $#sharpsat (x or y) = 3$

See also Toda's theorem: https://en.wikipedia.org/wiki/Toda%27s_theorem

This language is actually incredibly intractable: there is a reduction from #sharpsat to TinyPPL.

Reduction:
- Given a formula like $phi = (x or y) and (y or z)$
- Write a program where each variable is assigned a $flip 1/2$:

$x <- flip 1\/2 \
y <- flip 1\/2 \
z <- flip 1\/2 \
ret (x or y) and (y or z)$

The count is recovered from the program's denotation:

$ #sharpsat (phi) = 2^("# vars") times db("encoded program") (emptyset) (tt) $

*Question.* Why do we care about the computational complexity of our denotational semantics?

_Answer._ It gives a lower bound on any operational semantics.

*Question.* What's the price of adding features like product/sum types?

_Answer._ Any time you add a syntactic construct, it comes at a price.

=== Systems in the wild

- Stan: https://en.wikipedia.org/wiki/Stan_(software)
- TensorFlow Probability (Google): https://www.tensorflow.org/probability
- Pyro (Uber): https://pyro.ai/
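The denotational semantics above can be turned directly into an exact-inference interpreter by summing over both values at each bind. The following is a minimal Python sketch (the encoding of terms as closures is my own, not part of TinyPPL):

```python
from fractions import Fraction

# Pure terms are encoded as callables Env -> bool; an Env is a dict
# mapping variable names to booleans (tt = True, ff = False).
# Probabilistic terms are callables Env -> {bool: Fraction}.

def flip(p):
    """[[flip p]](rho) = [tt -> p, ff -> 1 - p]"""
    def denote(rho):
        return {True: Fraction(p), False: 1 - Fraction(p)}
    return denote

def ret(pure):
    """[[return p]](rho) puts all mass on the value of the pure term."""
    def denote(rho):
        v = pure(rho)
        return {True: Fraction(int(v)), False: Fraction(int(not v))}
    return denote

def bind(x, e1, e2):
    """[[x <- e1, e2]](rho)(v') = sum_v [[e1]](rho)(v) * [[e2]](rho[x -> v])(v')"""
    def denote(rho):
        out = {True: Fraction(0), False: Fraction(0)}
        for v, pr in e1(rho).items():
            for vp, pr2 in e2({**rho, x: v}).items():
                out[vp] += pr * pr2
        return out
    return denote

# The running example: x <- flip 1/2, y <- flip 1/2, return x or y
prog = bind("x", flip(Fraction(1, 2)),
       bind("y", flip(Fraction(1, 2)),
            ret(lambda rho: rho["x"] or rho["y"])))

print(prog({})[True])   # → 3/4
print(prog({})[False])  # → 1/4
```

Note that this interpreter does the exponential sum naively: each `bind` doubles the work, matching the #P-hardness discussed above.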
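The reduction identity $#sharpsat (phi) = 2^("# vars") times db("encoded program") (emptyset) (tt)$ can be checked by brute force for the example formula. A small self-contained Python sketch (enumeration stands in for evaluating the encoded TinyPPL program, since each assignment of the three independent fair flips has probability $1\/8$):

```python
from itertools import product
from fractions import Fraction

# phi = (x or y) and (y or z), as a pure term over three variables.
phi = lambda x, y, z: (x or y) and (y or z)

# Denotation of the encoded program at tt: each of the 2^3 assignments
# to the three flips has probability (1/2)^3 = 1/8.
prob_tt = sum(Fraction(1, 8)
              for bits in product([True, False], repeat=3)
              if phi(*bits))

# Recover the model count: #SAT(phi) = 2^(# vars) * [[program]](∅)(tt)
num_models = 2**3 * prob_tt
print(num_models)  # → 5
```

Checking by hand: $y = tt$ satisfies $phi$ for all four choices of $x, z$, and $y = ff$ requires $x = z = tt$, so $#sharpsat (phi) = 5$.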