1.**(20 points)** \c{Derive the VC dimension of the following classifiers.}
a. \c{What is the VC dimension, $d_c$, of a threshold $c$ in $\mathbb{R}$? The classification function
is specified by $f (x) = +1$ if $x > c$ and $f (x) = -1$ if $x \le c$. Prove your answer.}
- VC dimension is \boxed{1}
- A single point can always be shattered by placing the threshold on either side of it
- For example, choose the point $\{2\}$: the threshold $c = 1$ labels it $+$ (since $2 > 1$), and $c = 3$ labels it $-$ (since $2 \le 3$)
- Cannot shatter any 2 points: the (positive, negative) labeling in increasing order is unachievable, since every point above the threshold must be positive
- Concretely, choose any points $\{a, b\}$ with $a < b$. Labeling $a=+$ requires $c < a$, but then $b > a > c$, so $b$ is also labeled $+$
- The trivial case $a = b$ also fails, since two coincident points can never receive different labels
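- As a sanity check, a brute-force enumeration (a sketch; the helper names are illustrative, and the point sets and candidate thresholds are hand-picked so that one threshold falls in each region the points carve out):

```python
from itertools import product

def labels(points, c):
    """Apply f(x) = +1 if x > c else -1 to each point."""
    return tuple(1 if x > c else -1 for x in points)

def shatterable(points, thresholds):
    """True iff every +/-1 labeling of the points is realized by some threshold."""
    realized = {labels(points, c) for c in thresholds}
    return all(lab in realized
               for lab in product((1, -1), repeat=len(points)))

# One threshold below, between, and above the points covers all cases.
print(shatterable([2], [1, 3]))        # True: a single point is shattered
print(shatterable([2, 4], [1, 3, 5]))  # False: (+, -) is never realized
```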
b. \c{What is the VC dimension, $d_I$ , of intervals in $\mathbb{R}$? The classification function
specified by an interval $[a,b]$ labels any example positive iff it lies inside the interval
$[a,b]$. Prove your answer.}
- VC dimension is \boxed{2}
- Any two distinct points can be shattered; each of the four labelings is realized by some interval
- For example, choose points $\{2, 4\}$:
- 2=+, 4=+ => interval $[1, 5]$
- 2=+, 4=- => interval $[1, 3]$
- 2=-, 4=+ => interval $[3, 5]$
- 2=-, 4=- => interval $[6, 8]$
- Cannot shatter any 3 points: the (positive, negative, positive) labeling would force the interval to contain the middle point, which must be labeled negative
- Concretely, choose any points $\{x_1, x_2, x_3\}$ with $x_1 < x_2 < x_3$. The labeling $x_1=+$, $x_2=-$, $x_3=+$ cannot be achieved: any interval $[a, b]$ containing $x_1$ and $x_3$ satisfies $a \le x_1 < x_2 < x_3 \le b$, so it also contains $x_2$
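- The same brute-force check works here (a sketch; helper names are illustrative, and the candidate endpoints interleave the points so that every distinct interval behavior appears):

```python
from itertools import combinations, product

def labels(points, a, b):
    """Apply the interval classifier: +1 iff a <= x <= b."""
    return tuple(1 if a <= x <= b else -1 for x in points)

def shatterable(points, endpoints):
    """True iff every labeling of the points is realized by some interval [a, b]."""
    realized = {labels(points, a, b)
                for a, b in combinations(sorted(endpoints), 2)}
    return all(lab in realized
               for lab in product((1, -1), repeat=len(points)))

print(shatterable([2, 4], [1, 3, 5, 6]))        # True: 2 points shattered
print(shatterable([2, 4, 6], [1, 3, 5, 7, 8]))  # False: (+, -, +) unrealizable
```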
2.**(20 points)** \c{Find the Maximum Likelihood Estimation (MLE) for the following pdf.
In each case, consider a random sample of size $n$. Show your calculation.}
a. \c{$f(x|\theta) = \frac{1}{\theta} e^{-\frac{x}{\theta}} , x>0 , \theta>0$}
- To find the MLE, first write the log-likelihood of the sample $x_1, \ldots, x_n$:
$$
\ell(\theta) = \log \prod_{i=1}^{n} \frac{1}{\theta} e^{-\frac{x_i}{\theta}} = -n \log \theta - \frac{1}{\theta} \sum_{i=1}^{n} x_i
$$
- Set the derivative with respect to $\theta$ to zero and solve:
$$
\frac{\partial \ell}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta^2} \sum_{i=1}^{n} x_i = 0 \quad \Longrightarrow \quad \hat{\theta}_{MLE} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}
$$
- The MLE is the sample mean
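- A quick numerical check of this result (a sketch; the true $\theta$, seed, sample size, and grid are arbitrary choices): maximizing the log-likelihood over a grid should recover the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.5
x = rng.exponential(scale=theta_true, size=10_000)  # mean of f(x|theta) is theta

def log_likelihood(theta):
    # log L(theta) = -n log(theta) - sum(x_i) / theta
    return -x.size * np.log(theta) - x.sum() / theta

grid = np.linspace(0.5, 5.0, 2001)
theta_hat = grid[np.argmax([log_likelihood(t) for t in grid])]
print(theta_hat, x.mean())  # both should be close to theta_true = 2.5
```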
a. \c{Given the priors $P (C_1)$ and $P (C_2)$, and the Bernoulli densities specified by $p_1 \equiv p(x = 0|C_1)$ and $p_2 \equiv p(x = 0|C_2)$, derive the classification rules for classifying a sample $x$ into $C_1$ and $C_2$ based on the posteriors $P (C_1|x)$ and $P (C_2|x)$. (Hint: give rules for classifying $x = 0$ and $x = 1$.)}
- The posteriors $P(C_i | x)$ follow from Bayes' theorem:
$$
P(C_i | x) = \frac{p(x|C_i) P(C_i)}{p(x|C_1) P(C_1) + p(x|C_2) P(C_2)}
$$
- Since $p_1 = p(x=0|C_1)$, the Bernoulli density gives the likelihood for general $x \in \{0, 1\}$: $p(x|C_1) = p_1^{(1-x)} (1-p_1)^x$
- $p_2$ is defined analogously, so $p(x|C_i) = p_i^{(1-x)} (1-p_i)^x$ for $i = 1, 2$
- To determine the classification rules, pick the $C_i$ with the maximum posterior:
- For $x=0$, pick $C_1$ if $P(C_1|x=0) > P(C_2|x=0)$; otherwise pick $C_2$
- For $x=1$, pick $C_1$ if $P(C_1|x=1) > P(C_2|x=1)$; otherwise pick $C_2$
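- Since the evidence $p(x)$ is common to both posteriors, each rule reduces to comparing the class-conditional likelihood weighted by the prior:
$$
x = 0: \text{ choose } C_1 \text{ iff } p_1 P(C_1) > p_2 P(C_2), \qquad
x = 1: \text{ choose } C_1 \text{ iff } (1-p_1) P(C_1) > (1-p_2) P(C_2)
$$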
b. \c{Consider D-dimensional independent Bernoulli densities}
$$
\c{
P (x|C) = P (x_1, x_2, \cdots , x_D|C) = \prod\limits_j P (x_j |C)
}
$$
\c{specified by $p_{ij} \equiv p(x_j = 0|C_i)$ for $i = 1, 2$ and $j = 1, 2, \cdots , D$. Derive the classification rules for classifying a sample $\mathbf{x}$ into $C_1$ and $C_2$. It is sufficient to give your rule as a function of $\mathbf{x}$.}
- The posteriors $P(C_i|\mathbf{x})$ follow from Bayes' theorem:
$$
P(C_i|\mathbf{x}) = \frac{p(\mathbf{x}|C_i) P(C_i)}{p(\mathbf{x}|C_1) P(C_1) + p(\mathbf{x}|C_2) P(C_2)}
$$
- Since $p_{ij} = p(x_j = 0|C_i)$ and the components are independent, the class-conditional likelihood for a general $\mathbf{x} \in \{0, 1\}^D$ is $p(\mathbf{x}|C_i) = \prod\limits_{j=1}^{D} p_{ij}^{(1-x_j)} (1-p_{ij})^{x_j}$
- To determine the classification rule, pick the $C_i$ with the maximum posterior
- Since the evidence $p(\mathbf{x})$ is common to both classes, we use the discriminant function from the slides, $g_i(\mathbf{x}) = p(\mathbf{x}|C_i) P(C_i)$; taking logs gives the rule explicitly as a function of $\mathbf{x}$:
$$
g_i(\mathbf{x}) = \sum_{j=1}^{D} \left[ (1 - x_j) \log p_{ij} + x_j \log (1 - p_{ij}) \right] + \log P(C_i)
$$
- If $g_1(\mathbf{x}) > g_2(\mathbf{x})$, choose $C_1$; otherwise choose $C_2$
c. \c{Follow the definition in 3(b) and assume $D = 2, p_{11} = 0.6, p_{12} = 0.1, p_{21} = 0.6$, and $p_{22} = 0.9$. For two different priors ($P (C_1) = 0.2$ or 0.8 and $P (C_2) = 1 - P (C_1)$), calculate the posterior probabilities $P (C_1|x)$ and $P (C_2|x)$. (Hint: Calculate the probabilities for all possible samples $(x_1, x_2) \in \{(0, 0), (0, 1), (1, 0), (1, 1)\}$).}
- These values can be computed with a short Python program; a minimal version, assuming the likelihood form from 3(b) and the parameter values above, is sketched below:
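```python
from itertools import product

# p[i][j] = p(x_j = 0 | C_i) for the D = 2 model in 3(b)
p = {1: [0.6, 0.1],   # p_11, p_12
     2: [0.6, 0.9]}   # p_21, p_22

def likelihood(x, i):
    """p(x | C_i) = prod_j p_ij^(1 - x_j) * (1 - p_ij)^x_j"""
    result = 1.0
    for xj, pij in zip(x, p[i]):
        result *= pij ** (1 - xj) * (1 - pij) ** xj
    return result

for prior1 in (0.2, 0.8):
    priors = {1: prior1, 2: 1.0 - prior1}
    print(f"P(C1) = {priors[1]}, P(C2) = {priors[2]}")
    for x in product((0, 1), repeat=2):
        # Bayes' theorem: P(C_i | x) = p(x | C_i) P(C_i) / p(x)
        joint = {i: likelihood(x, i) * priors[i] for i in (1, 2)}
        evidence = sum(joint.values())
        print(f"  x = {x}: "
              f"P(C1|x) = {joint[1] / evidence:.4f}, "
              f"P(C2|x) = {joint[2] / evidence:.4f}")
```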