d
parent 3b18a625a5
commit cb2b3db493
2 changed files with 211 additions and 1 deletions
3
.gitignore
vendored
@@ -1,4 +1,5 @@
*.asv
.vscode
*.pdf
*.zip
*.zip
.DS_Store
209
assignments/hwk01/HW1.typ
Normal file
@@ -0,0 +1,209 @@
#set document(
  title: "Assignment 1",
  author: "Michael Zhang <zhan4854@umn.edu>",
)

#let c(body) = {
  set text(gray)
  body
}
#let boxed(body) = {
  box(stroke: 0.5pt, inset: 2pt, outset: 2pt, baseline: 0pt, body)
}

1. *(20 points)* #c[Derive the VC dimension of the following classifiers.]

   a. #c[What is the VC dimension, $d_c$, of a threshold $c$ in $RR$? The classification function
   is specified by $f(x) = +1$ if $x > c$ and $f(x) = -1$ if $x lt.eq c$. Prove your answer.]

   - VC dimension is #boxed[1]
     - A single point can be shattered. For ex: choose the point $\{3\}$. A threshold at $c = 2$ labels it + and a threshold at $c = 4$ labels it -
     - Cannot shatter 2 points, since the larger point can never be labeled - while the smaller point is labeled +
       - Choose any points $\{a, b\}$ in increasing order. The labeling a=+, b=- would need $a > c$ and $b lt.eq c$, so $b lt.eq c < a$, which contradicts $a < b$
       - The trivial case of the 2 points equaling each other also doesn't work, since the case where they are labeled differently cannot be distinguished
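
A brute-force check can help sanity-check a claimed VC dimension for the threshold classifier. The sketch below is only an illustration: the helper names and the grid of candidate thresholds are assumptions made for the demo. A finite grid is enough here because the labels only change when $c$ crosses one of the points.

```py
from itertools import product


def threshold_labels(points, c):
    """Labels from the classifier f(x) = +1 if x > c, else -1."""
    return tuple(+1 if x > c else -1 for x in points)


def is_shattered(points, thresholds):
    """True if every +/- labeling of `points` is produced by some threshold."""
    achieved = {threshold_labels(points, c) for c in thresholds}
    all_labelings = product([+1, -1], repeat=len(points))
    return all(labels in achieved for labels in all_labelings)


# Hypothetical grid of candidate thresholds: 0.0, 0.5, ..., 6.0
thresholds = [k / 2 for k in range(0, 13)]

print(is_shattered([3], thresholds))     # True: c = 2 gives +, c = 4 gives -
print(is_shattered([2, 4], thresholds))  # False: (2=+, 4=-) is never produced
```

The second call fails only on the labeling where the smaller point is + and the larger point is -, which no threshold of the form $x > c$ can produce.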

   b. #c[What is the VC dimension, $d_I$, of intervals in $RR$? The classification function
   specified by an interval $[a,b]$ labels any example positive iff it lies inside the interval
   $[a,b]$. Prove your answer.]

   - VC dimension is #boxed[2]
     - 2 points can be shattered. For ex: choose points $\{2, 4\}$. Every labeling is achieved by some interval:

       #table(
         columns: (auto, auto, auto),
         [2], [4], [interval],
         [+], [+], [[1, 5]],
         [+], [-], [[1, 3]],
         [-], [+], [[3, 5]],
         [-], [-], [[6, 8]],
       )
     - Cannot shatter 3 points with the (positive, negative, positive) pattern, since everything inside the interval must be labeled positive.
       - Choose any points $\{x_1, x_2, x_3\}$ in increasing order. The labeling $x_1$=+, $x_2$=-, $x_3$=+ cannot be achieved with any interval: an interval containing $x_1$ and $x_3$ also contains $x_2$, which sits between them
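
The interval case can be sanity-checked by brute force as well. This is a minimal sketch, assuming a hypothetical integer grid of candidate endpoints; the function names are illustrative and not part of the assignment.

```py
from itertools import product


def interval_labels(points, a, b):
    """Labels from the interval classifier: +1 iff a <= x <= b, else -1."""
    return tuple(+1 if a <= x <= b else -1 for x in points)


def is_shattered(points, endpoints):
    """True if every +/- labeling of `points` is produced by some [a, b]."""
    achieved = {
        interval_labels(points, a, b)
        for a in endpoints
        for b in endpoints
        if a <= b
    }
    all_labelings = product([+1, -1], repeat=len(points))
    return all(labels in achieved for labels in all_labelings)


# Hypothetical grid of candidate endpoints: 0, 1, ..., 9
endpoints = list(range(0, 10))

print(is_shattered([2, 4], endpoints))     # True
print(is_shattered([2, 4, 6], endpoints))  # False: (+, -, +) is never produced
```

For the two-point set, the four intervals found by the search include the ones used in the proof, such as $[1, 5]$ and $[6, 8]$.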

2. *(20 points)* #c[Find the Maximum Likelihood Estimation (MLE) for the following pdf.
   In each case, consider a random sample of size $n$. Show your calculation.]

   a. #c[$f(x|theta) = frac(1, theta) e^(-frac(x, theta)), x>0, theta>0$]

   - To find the MLE, first find the log likelihood function:
     $ frak(L) (theta|x) &= log( product_t frac(1, theta) e^(-frac(x^t, theta)) ) \
       &= sum_t ( log(frac(1, theta)) + log(e^(-frac(x^t, theta))) ) \
       &= sum_t ( log(frac(1, theta)) - frac(x^t, theta) ) $
   - Then take the partial with respect to $theta$
     $ frac(partial frak(L), partial theta) &= sum_t frac(partial, partial theta) ( log(frac(1, theta)) - frac(x^t, theta) ) \
       &= sum_t ( -frac(1, theta) + frac(x^t, theta^2) ) $
   - Now set it to 0 to find a local maximum
     $ 0 &= sum_t ( -frac(1, theta) + frac(x^t, theta^2) ) \
       sum_t frac(1, theta) &= sum_t frac(x^t, theta^2) \
       sum_t 1 &= sum_t frac(x^t, theta) \
       sum_t 1 &= frac(1, theta) sum_t x^t \
       n &= frac(1, theta) sum_t x^t \
       theta &= boxed(frac(sum_t x^t, n)) $
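
The closed form (the sample mean) can be checked numerically. The following is a rough sketch rather than part of the derivation: the true parameter, sample size, and seed are arbitrary assumptions, and `random.expovariate` takes the rate $1 / theta$ rather than $theta$ itself.

```py
import math
import random


def log_likelihood(theta, xs):
    """Log likelihood of f(x|theta) = (1/theta) * exp(-x/theta)."""
    return sum(-math.log(theta) - x / theta for x in xs)


random.seed(0)
true_theta = 2.0  # hypothetical value chosen for the demo
# expovariate is parameterized by the rate lambda = 1 / theta
xs = [random.expovariate(1.0 / true_theta) for _ in range(1000)]

theta_hat = sum(xs) / len(xs)  # closed-form MLE: the sample mean

# The closed form should score at least as well as nearby values of theta
for factor in (0.5, 0.9, 1.1, 2.0):
    assert log_likelihood(theta_hat, xs) >= log_likelihood(factor * theta_hat, xs)

print(f"{theta_hat = :.3f}")
```

With 1000 samples the estimate should land close to the assumed `true_theta`, though the exact printed value depends on the seed.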

   b. #c[$f(x|theta) = 2 theta x^(2 theta - 1), 0 < x lt.eq 1, 0 < theta < infinity$]

   - Find the log likelihood function:
     $ frak(L) (theta|x) &= log( product_t 2 theta (x^t)^(2 theta - 1) ) \
       &= sum_t ( log(2 theta) + log((x^t)^(2 theta - 1)) ) \
       &= sum_t ( log(2 theta) + (2 theta - 1) log(x^t) ) $
   - Take the partial with respect to $theta$
     $ frac(partial frak(L), partial theta) &= sum_t frac(partial, partial theta) ( log(2 theta) + (2 theta - 1) log(x^t) ) \
       &= sum_t ( frac(1, theta) + 2 log(x^t) ) $
   - Set to 0
     $ 0 &= sum_t ( frac(1, theta) + 2 log(x^t) ) \
       -sum_t 2 log(x^t) &= sum_t frac(1, theta) \
       -theta sum_t 2 log(x^t) &= sum_t 1 \
       -theta sum_t 2 log(x^t) &= n \
       theta &= boxed(-frac(n, sum_t 2 log(x^t))) $
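
This estimator can also be checked numerically. The sketch below assumes inverse-CDF sampling, which works because the CDF of this density is $F(x) = x^(2 theta)$ on $0 < x lt.eq 1$; the true parameter, sample size, and seed are arbitrary choices for the demo.

```py
import math
import random


def log_likelihood(theta, xs):
    """Log likelihood of f(x|theta) = 2*theta * x**(2*theta - 1), 0 < x <= 1."""
    return sum(math.log(2 * theta) + (2 * theta - 1) * math.log(x) for x in xs)


random.seed(0)
true_theta = 1.5  # hypothetical value chosen for the demo
# Inverse-CDF sampling: F(x) = x**(2*theta), so x = u**(1 / (2*theta)).
# 1 - random.random() lies in (0, 1], which avoids taking log(0) later.
xs = [(1.0 - random.random()) ** (1.0 / (2 * true_theta)) for _ in range(1000)]

# Closed-form MLE from the derivation: -n / (2 * sum of log x)
theta_hat = -len(xs) / (2 * sum(math.log(x) for x in xs))

# The closed form should score at least as well as nearby values of theta
for factor in (0.5, 0.9, 1.1, 2.0):
    assert log_likelihood(theta_hat, xs) >= log_likelihood(factor * theta_hat, xs)

print(f"{theta_hat = :.3f}")
```

Because every sample satisfies $0 < x lt.eq 1$, each $log(x^t)$ is at most 0, so the estimate comes out positive as the problem requires.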

3. *(20 points)* #c[Let $P(x|C)$ denote a Bernoulli density function for a class $C in \{C_1, C_2\}$
   and $P(C)$ denote the prior.]

   a. #c[Given the priors $P(C_1)$ and $P(C_2)$, and the Bernoulli densities specified by $p_1 equiv p(x = 0|C_1)$ and $p_2 equiv p(x = 0|C_2)$, derive the classification rules for classifying a sample $x$ into $C_1$ and $C_2$ based on the posteriors $P(C_1|x)$ and $P(C_2|x)$. (Hint: give rules for classifying $x = 0$ and $x = 1$.)]

   - The posteriors $P(C_i|x)$ can be found by expanding the Bayes' theorem equation:
     - $P(C_i|x) = frac(p(x|C_i) P(C_i), sum_(k=1)^2 p(x|C_k) P(C_k))$
     - Since $p_1 = p(x=0|C_1)$, we can expand this into a general case for $p(x|C_1)$ by using the Bernoulli density formula: $p(x|C_1) = p_1^(1-x) (1-p_1)^x$
     - Since $p_2$ is defined in an analogous way, I'll write $p(x|C_i) = p_i^(1-x) (1-p_i)^x$
     - Expanded form: $P(C_i|x) = frac(p_i^(1-x) (1-p_i)^x P(C_i), sum_(k=1)^2 p_k^(1-x) (1-p_k)^x P(C_k))$
   - To determine the classification rules, pick the $C_i$ with the maximum posterior. Both posteriors share the same denominator, so only the numerators need to be compared:
     - For $x=0$, pick $C_1$ if $P(C_1|x=0) > P(C_2|x=0)$, i.e. if $p_1 P(C_1) > p_2 P(C_2)$, else $C_2$
     - For $x=1$, pick $C_1$ if $P(C_1|x=1) > P(C_2|x=1)$, i.e. if $(1-p_1) P(C_1) > (1-p_2) P(C_2)$, else $C_2$
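
The rule above can be sketched in a few lines of Python. The function names and the example numbers are hypothetical, chosen only to show the rule picking different classes for $x = 0$ and $x = 1$.

```py
def posterior(x, priors, p0):
    """Posteriors P(C_i | x) for a single Bernoulli feature x in {0, 1}.

    priors[i] is P(C_i); p0[i] is p_i = p(x = 0 | C_i).
    """
    joint = {i: p0[i] ** (1 - x) * (1 - p0[i]) ** x * priors[i] for i in priors}
    evidence = sum(joint.values())  # shared denominator of Bayes' theorem
    return {i: joint[i] / evidence for i in joint}


def classify(x, priors, p0):
    """Pick the class with the larger posterior (ties go to C_2)."""
    post = posterior(x, priors, p0)
    return 1 if post[1] > post[2] else 2


# Hypothetical numbers, not taken from the assignment
priors = {1: 0.5, 2: 0.5}
p0 = {1: 0.8, 2: 0.3}

print(classify(0, priors, p0))  # 1, since 0.8 * 0.5 > 0.3 * 0.5
print(classify(1, priors, p0))  # 2, since 0.2 * 0.5 < 0.7 * 0.5
```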

   b. #c[Consider $D$-dimensional independent Bernoulli densities]

   #c[$ P(x|C) = P(x_1, x_2, dots.c, x_D|C) = product_j P(x_j|C) $]

   #c[specified by $p_(i j) equiv p(x_j = 0|C_i)$ for $i = 1, 2$ and $j = 1, 2, dots.c, D$. Derive the classification rules for classifying a sample $x$ into $C_1$ and $C_2$. It is sufficient to give your rule as a function of $x$.]

   - The posteriors $P(C_i|bold(x))$ can be found by expanding the Bayes' theorem equation:
     - $P(C_i|bold(x)) = frac(p(bold(x)|C_i) P(C_i), sum_(k=1)^2 p(bold(x)|C_k) P(C_k))$
     - Since $p_(i j) = p(x_j=0|C_i)$, we can expand this into a general case for $p(bold(x)|C_i)$ by using the multivariate form of the Bernoulli: $p(bold(x)|C_i) = product_(j=1)^D p_(i j)^(1-x_j) (1-p_(i j))^(x_j)$
   - To determine the classification rules, pick the $C_i$ with the maximum posterior
     - We use the discriminant function found in the slides, $g_i(bold(x)) = p(bold(x)|C_i) P(C_i)$, to compare the posteriors, since the denominator is the same for both classes
     - If $g_1(bold(x)) > g_2(bold(x))$, then choose $C_1$ else choose $C_2$
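
A minimal sketch of this rule, with one substitution: it compares $log g_i(bold(x))$ instead of $g_i(bold(x))$. The log is monotone, so the chosen class is the same, and the sum avoids underflow when $D$ is large. The helper names and the example parameters are hypothetical.

```py
import math


def log_discriminant(xs, prior, p0):
    """log g_i(x) = log P(C_i) + sum_j [(1 - x_j) log p_ij + x_j log(1 - p_ij)].

    p0[j] is p_ij = p(x_j = 0 | C_i). Every p0[j] is assumed to lie strictly
    between 0 and 1 so that both logs are defined.
    """
    total = math.log(prior)
    for x, p in zip(xs, p0):
        total += (1 - x) * math.log(p) + x * math.log(1 - p)
    return total


def classify(xs, priors, p0s):
    """Choose C_1 if g_1(x) > g_2(x), else C_2."""
    g1 = log_discriminant(xs, priors[1], p0s[1])
    g2 = log_discriminant(xs, priors[2], p0s[2])
    return 1 if g1 > g2 else 2


# Hypothetical parameters for D = 3, not taken from the assignment
priors = {1: 0.5, 2: 0.5}
p0s = {1: [0.9, 0.8, 0.7], 2: [0.2, 0.3, 0.4]}

print(classify([0, 0, 0], priors, p0s))  # 1: 0.504 * 0.5 > 0.024 * 0.5
print(classify([1, 1, 1], priors, p0s))  # 2: 0.006 * 0.5 < 0.336 * 0.5
```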

   c. #c[Follow the definition in 3(b) and assume $D = 2, p_(11) = 0.6, p_(12) = 0.1, p_(21) = 0.6$, and $p_(22) = 0.9$. For two different priors ($P(C_1) = 0.2$ or $0.8$ and $P(C_2) = 1 - P(C_1)$), calculate the posterior probabilities $P(C_1|x)$ and $P(C_2|x)$. (Hint: Calculate the probabilities for all possible samples $(x_1, x_2) in \{(0, 0), (0, 1), (1, 0), (1, 1)\}$).]

   - I wrote the following Python program to compute these values:

```py
from itertools import product


def calc_posterior(p_c1: float, D: int, p_ij: dict[tuple[int, int], float]):
    # Priors P(C_1) and P(C_2) = 1 - P(C_1)
    priors = {
        1: p_c1,
        2: 1 - p_c1,
    }

    def p_x_given_Ci(xs: list[int], i: int):
        # Product over features of p_ij^(1 - x_j) * (1 - p_ij)^(x_j);
        # note that j is 0-indexed here
        s = 1.0
        for j in range(len(xs)):
            s *= pow(p_ij[i, j], 1.0 - xs[j]) * pow(1.0 - p_ij[i, j], xs[j])
        return s

    posteriors = {}
    for i in [1, 2]:
        for xs in product([0, 1], repeat=D):
            numer = p_x_given_Ci(xs, i) * priors[i]

            def each_denom(k): return p_x_given_Ci(xs, k) * priors[k]
            denom = sum(map(each_denom, priors.keys()))
            # Star-unpacking inside a subscript needs Python 3.11+
            posteriors[*xs, i] = numer / denom

    print("Priors:", priors)
    for xs in product([0, 1], repeat=D):
        print(f"{xs = }")
        for i in [1, 2]:
            prob = posteriors[*xs, i]
            print(f" * C{i}: {prob:0.3f}")
        print()


def prob_3c():
    D = 2
    p_ij = {}
    p_ij[1, 0] = 0.6
    p_ij[1, 1] = 0.1
    p_ij[2, 0] = 0.6
    p_ij[2, 1] = 0.9

    calc_posterior(0.2, D, p_ij)
    calc_posterior(0.8, D, p_ij)


if __name__ == "__main__":
    prob_3c()
```

   - The values that it output are:

```
Priors: {1: 0.2, 2: 0.8}
xs = (0, 0)
 * C1: 0.027
 * C2: 0.973

xs = (0, 1)
 * C1: 0.692
 * C2: 0.308

xs = (1, 0)
 * C1: 0.027
 * C2: 0.973

xs = (1, 1)
 * C1: 0.692
 * C2: 0.308

Priors: {1: 0.8, 2: 0.19999999999999996}
xs = (0, 0)
 * C1: 0.308
 * C2: 0.692

xs = (0, 1)
 * C1: 0.973
 * C2: 0.027

xs = (1, 0)
 * C1: 0.308
 * C2: 0.692

xs = (1, 1)
 * C1: 0.973
 * C2: 0.027
```
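
As a hand check of the first entry, using only the numbers given in the problem with $P(C_1) = 0.2$ and $bold(x) = (0, 0)$, the posterior for $C_1$ works out to the same value the program printed:

$ P(C_1|bold(x)) = frac(0.6 times 0.1 times 0.2, 0.6 times 0.1 times 0.2 + 0.6 times 0.9 times 0.8) = frac(0.012, 0.444) approx 0.027 $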