#import "../common.typ": *
#import "@preview/prooftrees:0.1.0": *

#show: doc => conf("Probabilistic Programming", doc)

#import "@preview/simplebnf:0.1.0": *

#let prob = "probability"
#let tt = "tt"
#let ff = "ff"
#let tru = "true"
#let fls = "false"
#let ret = "return"
#let flip = "flip"
#let real = "real"
#let Env = "Env"
#let Bool = "Bool"
#let dbp(x) = $db(#x)_p$
#let dbe(x) = $db(#x)_e$
#let sharpsat = $\#"SAT"$
#let sharpP = $\#"P"$

- Judea Pearl: probabilistic graphical models

*Definition.* Probabilistic programs are programs that denote #prob distributions.

Example:

```
x <- flip 1/2
y <- flip 1/2
return x if y
```

$x$ is a random variable that comes from a coin flip. Instead of producing a single value, the program's output is a _function_ that equals

$
db( #[```
x <- flip 1/2
y <- flip 1/2
return x if y
```]
)
=
cases(tt mapsto 3/4, ff mapsto 1/4) $

$tt$ is "semantic" true and $ff$ is "semantic" false.

Sample space: $Omega = {tt, ff}$. A #prob distribution on $Omega$ is a function $Omega -> [0, 1]$ whose values sum to 1.

Semantic brackets $db(...)$ give the denotation (the meaning) of a program.

=== TinyPPL

Syntax

$
#bnf(
  Prod( $p$, annot: $sans("Pure program")$, {
    Or[$x$][_variable_]
    Or[$ifthenelse(p, p, p)$][_conditional_]
    Or[$p or p$][_disjunction_]
    Or[$p and p$][_conjunction_]
    Or[$tru$][_true_]
    Or[$fls$][_false_]
    Or[$e$ $e$][_application_]
  }),
)
$

$
#bnf(
  Prod( $e$, annot: $sans("Probabilistic program")$, {
    Or[$x arrow.l e, e$][_assignment_]
    Or[$ret p$][_return_]
    Or[$flip real$][_random_]
  }),
)
$

Semantics of pure terms

- $dbp(p) : Env -> Bool$
- $dbp(x) ([x mapsto tt]) = tt$
- $dbp(tru) (rho) = tt$
- $dbp(p_1 and p_2) (rho) = dbp(p_1) (rho) and dbp(p_2) (rho)$
- the second $and$ is the "semantic" $and$, i.e. the operation on Booleans rather than syntax

An environment $rho$ is a mapping from identifiers to Booleans.

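To make this concrete, here is a minimal Python sketch of the pure-term evaluator; the tuple encoding of terms and all names are my own, not from the lecture:

```python
# Sketch of [[p]]_p : Env -> Bool, with pure terms encoded as nested tuples
# (encoding and names are illustrative, not from the lecture).
def eval_pure(p, env):
    tag = p[0]
    if tag == "var":      # [[x]](rho) = rho(x)
        return env[p[1]]
    if tag == "const":    # true / false
        return p[1]
    if tag == "and":      # the right-hand 'and' is the semantic (Python) and
        return eval_pure(p[1], env) and eval_pure(p[2], env)
    if tag == "or":
        return eval_pure(p[1], env) or eval_pure(p[2], env)
    if tag == "if":       # if p1 then p2 else p3
        return eval_pure(p[2], env) if eval_pure(p[1], env) else eval_pure(p[3], env)
    raise ValueError(f"unknown pure term: {p!r}")

# [[x and true]]([x |-> tt]) = tt
assert eval_pure(("and", ("var", "x"), ("const", True)), {"x": True}) is True
```
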
Semantics of probabilistic terms

- $dbe(e) : Env -> ({tt, ff} -> [0, 1])$
- $dbe(flip 1/2) (rho) = [tt mapsto 1/2, ff mapsto 1/2]$
- $dbe(ret p) (rho) = v mapsto cases(1 "if" dbp(p) (rho) = v, 0 "else")$
// - $dbe(x <- e_1 \, e_2) (rho) = dbp(e_2) (rho ++ [x mapsto dbp(e_1)])$
- $dbe(x <- e_1 \, e_2) (rho) = v' mapsto sum_(v in {tt, ff}) dbe(e_1) (rho) (v) times dbe(e_2) (rho [x mapsto v])(v')$
- "monadic semantics" of PPLs
  - https://homepage.cs.uiowa.edu/~jgmorrs/eecs762f19/papers/ramsay-pfeffer.pdf
  - https://www.sciencedirect.com/science/article/pii/S1571066119300246

Getting back a probability distribution: for these terms the weights over ${tt, ff}$ always sum to 1.

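A minimal sketch of this enumeration semantics in Python, following the flip/return/bind equations above; the tuple encoding of terms and all names are mine, and the example program returns `x or y`, which also yields the 3/4, 1/4 distribution:

```python
# Sketch of [[e]]_e : Env -> ({tt, ff} -> [0, 1]) by exhaustive enumeration.
# Probabilistic terms (my encoding): ("flip", r), ("ret", p), ("bind", x, e1, e2).
# eval_pure repeats the abbreviated pure evaluator so this block runs on its own.
def eval_pure(p, env):
    tag = p[0]
    if tag == "var":   return env[p[1]]
    if tag == "const": return p[1]
    if tag == "or":    return eval_pure(p[1], env) or eval_pure(p[2], env)
    if tag == "and":   return eval_pure(p[1], env) and eval_pure(p[2], env)
    raise ValueError(p)

def eval_prob(e, env):
    tag = e[0]
    if tag == "flip":                      # [[flip r]](rho) = [tt -> r, ff -> 1 - r]
        return {True: e[1], False: 1.0 - e[1]}
    if tag == "ret":                       # point mass on [[p]]_p(rho)
        v = eval_pure(e[1], env)
        return {True: float(v), False: float(not v)}
    if tag == "bind":                      # x <- e1; e2: sum over both values of x
        _, x, e1, e2 = e
        out = {True: 0.0, False: 0.0}
        for v, pr in eval_prob(e1, env).items():
            for v2, pr2 in eval_prob(e2, {**env, x: v}).items():
                out[v2] += pr * pr2
        return out
    raise ValueError(e)

# x <- flip 1/2; y <- flip 1/2; return x or y
prog = ("bind", "x", ("flip", 0.5),
        ("bind", "y", ("flip", 0.5),
        ("ret", ("or", ("var", "x"), ("var", "y")))))
print(eval_prob(prog, {}))                 # {True: 0.75, False: 0.25}
```
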
=== Tractability

What is the complexity class of computing these denotations? #sharpP. The canonical #sharpP-complete problem is #sharpsat:

- Input: boolean formula $phi$
- Output: number of solutions to $phi$
- $sharpsat(x or y) = 3$

https://en.wikipedia.org/wiki/Toda%27s_theorem

This language is actually incredibly intractable: there is a reduction from #sharpsat to TinyPPL.

Reduction:

- Given a formula like $phi = (x or y) and (y or z)$
- Write a program where each variable is assigned a $flip 1/2$ and the formula is returned:

$x <- flip 1\/2 \
y <- flip 1\/2 \
z <- flip 1\/2 \
ret (x or y) and (y or z)$

How hard is this? The probability that the encoded program returns $tt$, scaled by $2^("# vars")$, is exactly the model count:

$#sharpsat (phi) = 2^("# vars") times db("encoded program") (emptyset) (tt)$

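A brute-force sanity check of this identity for $phi = (x or y) and (y or z)$ (illustrative Python, not part of the lecture):

```python
# Check #SAT(phi) = 2^(#vars) * Pr[encoded program returns tt]
# for phi = (x or y) and (y or z), by brute force over all assignments.
from itertools import product

phi = lambda x, y, z: (x or y) and (y or z)

num_models = sum(phi(*w) for w in product([False, True], repeat=3))

# Each variable is an independent flip 1/2, so every assignment has weight 1/8,
# and Pr[tt] is the fraction of satisfying assignments.
pr_tt = sum(0.5 ** 3 for w in product([False, True], repeat=3) if phi(*w))

assert num_models == 5
assert abs(2 ** 3 * pr_tt - num_models) < 1e-9
```
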
*Question.* Why do we care about the computational complexity of our denotational semantics?

_Answer._ It gives us a lower bound on the cost of any operational semantics that implements it.

*Question.* What's the price of adding features like product/sum types?

_Answer._ Any time you add a syntactic construct, it comes at a price.

=== Systems in the wild

- Stan: https://en.wikipedia.org/wiki/Stan_(software)
- TensorFlow Probability (Google): https://www.tensorflow.org/probability
- Pyro (Uber): https://pyro.ai/

=== Tiny Cond

With observe, we can prune worlds that cannot happen.

This gives an un-normalized semantics, where the probabilities no longer need to sum to 1.

#let observe = "observe"

$ db(observe p \; e) (rho) (v) = cases(
db(e) (rho) (v) "if" db(p) (rho) = tt,
0 "else",
) $

To normalize, divide the un-normalized weight by the sum of the weights:

$ db(e)_"norm" (rho)(v) = frac(db(e) (rho) (v), db(e) (rho) (tt) + db(e) (rho) (ff)) $

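A small self-contained Python sketch of the un-normalized and normalized semantics for a hypothetical program `x <- flip 1/2; y <- flip 1/2; observe (x or y); return x` (program and names are mine, not from the lecture):

```python
# Sketch: un-normalized vs normalized semantics for
#   x <- flip 1/2; y <- flip 1/2; observe (x or y); return x
from itertools import product

unnorm = {True: 0.0, False: 0.0}
for x, y in product([False, True], repeat=2):
    weight = 0.5 * 0.5                 # both flips are fair
    if x or y:                         # observe (x or y): worlds failing the
        unnorm[x] += weight            # observation contribute weight 0
Z = unnorm[True] + unnorm[False]       # normalizing constant (here 3/4)
norm = {v: w / Z for v, w in unnorm.items()}
print(unnorm)                          # {True: 0.5, False: 0.25}
print(norm)                            # {True: 0.666..., False: 0.333...}
```
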
== Operational sampling semantics

Expectation of a random variable, $EE$:

$ EE_Pr [f] = sum_(w in Omega) Pr(w) times f(w) \
approx 1/N sum_(i = 1)^N f(w_i) quad "where" w_i tilde Pr
$

Approximate with a finite sample of size $N$, where the $w_i$ range over the sample.

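For example, a Monte Carlo sketch in Python estimating the probability of $x or y$ for two fair flips, whose exact value is 3/4 (program and names are illustrative):

```python
# Monte Carlo estimate of E_Pr[f] ~= (1/N) * sum of f over N samples.
# Here f(w) = 1 if (x or y) holds in world w, so the expectation is 3/4.
import random

random.seed(0)
N = 100_000
hits = 0
for _ in range(N):
    x = random.random() < 0.5          # flip 1/2
    y = random.random() < 0.5          # flip 1/2
    hits += (x or y)
print(hits / N)                        # close to 0.75 for large N
```
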
To prove our runtime strategy sound, we're going to relate it to an _expectation_.

https://en.wikipedia.org/wiki/Concentration_inequality

=== Big step semantics

#let bigstep(sigma, e, v) = $#sigma tack.r #e arrow.b.double #v$

$ sigma tack.r angle.l e . rho angle.r arrow.b.double v $

$tack.r$ is read as "in the context of".

We externalized the randomness and put it all in $sigma$, a stream of pre-drawn random values.

This is like pattern matching on the stream: if the head of $sigma$ is $v$, then $flip 1/2$ evaluates to $v$:

$ v :: bigstep(sigma, flip 1/2, v )$

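Read operationally, $sigma$ can be a pre-drawn list of numbers in $[0, 1)$, one per flip. A hedged Python sketch of such a big-step evaluator (tuple encoding and names are mine):

```python
# Sketch of the big-step judgement  sigma |- <e, rho> ⇓ v, with the randomness
# externalized into sigma, a list of pre-drawn numbers in [0, 1) (one per flip).
# Terms reuse the tuple encoding of the earlier sketches.
def eval_pure(p, env):                     # abbreviated pure evaluator
    tag = p[0]
    if tag == "var":   return env[p[1]]
    if tag == "const": return p[1]
    if tag == "or":    return eval_pure(p[1], env) or eval_pure(p[2], env)
    if tag == "and":   return eval_pure(p[1], env) and eval_pure(p[2], env)
    raise ValueError(p)

def big_step(sigma, e, env):
    """Return (value, remaining sigma)."""
    tag = e[0]
    if tag == "flip":                      # pattern-match on the head of sigma
        u, rest = sigma[0], sigma[1:]
        return u < e[1], rest
    if tag == "ret":
        return eval_pure(e[1], env), sigma
    if tag == "bind":                      # x <- e1; e2
        _, x, e1, e2 = e
        v, rest = big_step(sigma, e1, env)
        return big_step(rest, e2, {**env, x: v})
    raise ValueError(e)

# sigma = [0.3, 0.9]: the first flip comes up tt (0.3 < 0.5), the second ff.
prog = ("bind", "x", ("flip", 0.5),
        ("bind", "y", ("flip", 0.5),
        ("ret", ("or", ("var", "x"), ("var", "y")))))
print(big_step([0.3, 0.9], prog, {}))      # (True, [])
```
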
== Lecture 3

=== Rejection sampling

=== Search

$x <- flip 1/2 \
y <- ifthenelse(x, flip 1/3, flip 1/4) \
z <- ifthenelse(y, flip 1/6, flip 1/7) \
ret z$

You can draw a search tree of probabilities. Add up the probabilities to get the probability that a program returns a specific value.

You can share $z$ since it doesn't depend directly on $x$. This builds a *binary decision diagram*.

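As a sanity check, the search-tree sum for this program can be spelled out directly; the arithmetic below is just the branch probabilities read off the program above (Python, illustrative only):

```python
# Enumerate the search tree of the program above:
#   x <- flip 1/2; y <- if x then flip 1/3 else flip 1/4;
#   z <- if y then flip 1/6 else flip 1/7; ret z
# and add up the probability of every branch in which z = tt.
p_tt = 0.0
for x, px in [(True, 0.5), (False, 0.5)]:
    py_true = 1/3 if x else 1/4            # Pr[y = tt | x]
    for y, py in [(True, py_true), (False, 1 - py_true)]:
        pz_true = 1/6 if y else 1/7        # Pr[z = tt | y]: independent of x,
                                           # which is the sharing a BDD exploits
        p_tt += px * py * pz_true
print(p_tt)                                # Pr[the program returns tt]
```
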
=== Knowledge compilation

https://en.wikipedia.org/wiki/Knowledge_compilation

Studies the relationship between the hardness of propositional reasoning tasks and the syntactic form of the formula.

SAT for DNF is easy. What kinds of structure enable efficient reasoning?

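For example, satisfiability of a DNF can be decided term by term, as in this small Python sketch (the integer encoding of literals is mine):

```python
# SAT for DNF is easy: a DNF is satisfiable iff some term has no pair of
# complementary literals. Literals are ints: +i for x_i, -i for "not x_i".
def dnf_sat(terms):
    return any(not (set(t) & {-lit for lit in t}) for t in terms)

print(dnf_sat([[1, -2], [2, -2]]))   # True:  (x1 and not x2) is consistent
print(dnf_sat([[1, -1], [2, -2]]))   # False: every term is contradictory
```
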
==== Succinctness

$cal(L)_1$ is more succinct than $cal(L)_2$ if it's efficient (polynomial-time) to translate (in a semantics-preserving way) programs written in $cal(L)_2$ to programs written in $cal(L)_1$.

==== Canonicity of BDDs

For a fixed variable order, there is only one (reduced) BDD structure for any particular formula.