Michael Zhang 2023-11-09 21:29:17 -06:00
parent 301c76e4a1
commit 15481ebb04
11 changed files with 372 additions and 10 deletions

.envrc (Normal file, 1 line changed)

@@ -0,0 +1 @@
layout python3

.gitignore (vendored, 3 lines changed)

@@ -3,3 +3,6 @@
*.pdf
*.zip
.DS_Store
.venv
.direnv

@@ -24,19 +24,19 @@
```
>> AllProblem1
Dataset 1:
Model 1: (train err = 5%), (test error = 20%)
Model 2: (train err = 6%), (test error = 17%)
Model 3: (train err = 7%), (test error = 18%)
Dataset 2:
Model 1: (train err = 7%), (test error = 23%)
Model 2: (train err = 14%), (test error = 56%)
Model 3: (train err = 13%), (test error = 53%)
Dataset 3:
Model 1: (train err = 1%), (test error = 12%)
Model 2: (train err = 19%), (test error = 45%)
Model 3: (train err = 2%), (test error = 5%)
```
b. #c[*(5 points)* State which model works best on each test data set and explain why you believe this is the case. Discuss your observations.]
@@ -47,9 +47,68 @@
- For dataset 2, model 1 worked the best.
- For dataset 3, model 3 worked the best.
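The separate covariance matrices work better when each class has its own shape (different variances or correlations between features), while the shared matrix works better when the classes have similar covariances, since pooling both classes gives a more reliable estimate. The third model works better when there is not much data, because with little data the estimated dependencies between features may be inaccurate.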
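To make the comparison concrete, here is a small NumPy sketch of the three variants on synthetic data. It assumes model 1 fits $S_1$ and $S_2$ separately, model 2 pools them into one shared $S$, and model 3 keeps only the diagonals of $S_1$ and $S_2$; the model 3 reading, the names `fit_gaussian_classifier` and `predict`, and the synthetic data are illustrative assumptions, not the assignment's code or datasets.

```python
import numpy as np

def fit_gaussian_classifier(X, r, mode="separate"):
    """Maximum likelihood fit of a two-class Gaussian classifier.

    mode="separate": S1 and S2 estimated independently
    mode="shared":   pooled S = P(C1) S1 + P(C2) S2 (model 2)
    mode="diagonal": S1 and S2 with off-diagonal entries zeroed
                     (assumed reading of model 3)
    """
    N = len(X)
    priors, means, covs = [], [], []
    for c in (0, 1):
        Xc = X[r == c]
        m = Xc.mean(axis=0)
        D = Xc - m
        priors.append(len(Xc) / N)
        means.append(m)
        covs.append(D.T @ D / len(Xc))  # ML estimate divides by N_i, not N_i - 1
    if mode == "shared":
        S = priors[0] * covs[0] + priors[1] * covs[1]
        covs = [S, S]
    elif mode == "diagonal":
        covs = [np.diag(np.diag(Si)) for Si in covs]
    return priors, means, covs

def predict(X, priors, means, covs):
    """Pick the class with the largest log P(C_i) + log N(x; m_i, S_i), constants dropped."""
    scores = []
    for p, m, S in zip(priors, means, covs):
        D = X - m
        _, logdet = np.linalg.slogdet(S)
        maha = np.sum(D @ np.linalg.inv(S) * D, axis=1)  # (x - m)^T S^-1 (x - m) per row
        scores.append(np.log(p) - 0.5 * logdet - 0.5 * maha)
    return np.argmax(np.column_stack(scores), axis=1)

# Synthetic demo: class 1 is stretched along x1 and squeezed along x2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], [1.0, 1.0], size=(200, 2)),
               rng.normal([3.0, 3.0], [3.0, 0.5], size=(200, 2))])
r = np.repeat([0, 1], 200)
for mode in ("separate", "shared", "diagonal"):
    params = fit_gaussian_classifier(X, r, mode)
    err = np.mean(predict(X, *params) != r)
    print(f"{mode:9s} training error = {err:.3f}")
```

With a shared covariance the quadratic terms cancel and the decision boundary is linear, so it cannot follow classes with different shapes; the separate and diagonal fits give quadratic boundaries.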
c. #c[*(15 points)* Write the log likelihood function and derive $S_1$ and $S_2$ by maximum likelihood estimation of model 2. Note that since $S_1$ and $S_2$ are shared as $S$, you need to add the log likelihood function of the two classes to maximizing for deriving $S$.]
Model 2 uses a single covariance matrix $bold(S)$ for both classes. With $r_i^t = 1$ if $bold(x)^t in C_i$ and $0$ otherwise, the log likelihood is the sum of the two class log likelihoods:
$ cal(L)(bold(m)_1, bold(m)_2, bold(S) | cal(X)) = sum_t sum_(i=1)^2 r_i^t (log P(C_i) - frac(d, 2) log 2 pi - frac(1, 2) log |bold(S)| - frac(1, 2) (bold(x)^t - bold(m)_i)^T bold(S)^(-1) (bold(x)^t - bold(m)_i)) $
Maximizing over the priors and means gives the same estimates as with separate covariances, $P(C_i) = frac(N_i, N)$ and $bold(m)_i = frac(sum_t r_i^t bold(x)^t, N_i)$. For $bold(S)$, every sample from both classes contributes to the same terms. Using $log |bold(S)| = - log |bold(S)^(-1)|$, $frac(diff, diff bold(A)) log |bold(A)| = bold(A)^(-1)$ for symmetric $bold(A)$, and $frac(diff, diff bold(A)) bold(u)^T bold(A) bold(u) = bold(u) bold(u)^T$:
$ frac(diff cal(L), diff bold(S)^(-1)) &= sum_t sum_(i=1)^2 r_i^t (frac(1, 2) bold(S) - frac(1, 2) (bold(x)^t - bold(m)_i) (bold(x)^t - bold(m)_i)^T) \
&= frac(N, 2) bold(S) - frac(1, 2) sum_(i=1)^2 sum_t r_i^t (bold(x)^t - bold(m)_i) (bold(x)^t - bold(m)_i)^T = 0 $
Solving for $bold(S)$:
$ bold(S)_1 = bold(S)_2 = bold(S) &= frac(1, N) sum_(i=1)^2 sum_t r_i^t (bold(x)^t - bold(m)_i) (bold(x)^t - bold(m)_i)^T \
&= frac(N_1, N) hat(bold(S))_1 + frac(N_2, N) hat(bold(S))_2 = P(C_1) hat(bold(S))_1 + P(C_2) hat(bold(S))_2 $
where $hat(bold(S))_i = frac(1, N_i) sum_t r_i^t (bold(x)^t - bold(m)_i) (bold(x)^t - bold(m)_i)^T$ is the maximum likelihood covariance of class $C_i$ on its own. Each sample is centered on its own class mean $bold(m)_i$, not on the overall mean $bold(m) = P(C_1) bold(m)_1 + P(C_2) bold(m)_2$: the covariance about $bold(m)$ is larger than $bold(S)$ by the between-class scatter $sum_i P(C_i) (bold(m)_i - bold(m)) (bold(m)_i - bold(m))^T$.
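As a numerical sanity check of the shared covariance estimate, the NumPy sketch below fits the priors, class means and per-class maximum likelihood covariances on synthetic two-class data (an assumption; these are not the assignment's datasets). It confirms that the pooled sum over both classes divided by $N$ equals the prior-weighted average of the per-class covariances, that the covariance about the overall mean exceeds it by the between-class scatter, and that rescaling the estimate lowers the log likelihood. The helper `log_likelihood` is hypothetical.

```python
import numpy as np

def log_likelihood(X, r, priors, means, S):
    """Model 2 log likelihood: sum over t and i of r_i^t log(P(C_i) N(x^t; m_i, S))."""
    d = X.shape[1]
    _, logdet = np.linalg.slogdet(S)
    S_inv = np.linalg.inv(S)
    total = 0.0
    for i, (p, m) in enumerate(zip(priors, means)):
        D = X[r == i] - m
        maha = np.sum(D @ S_inv * D, axis=1)
        total += np.sum(np.log(p) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha))
    return total

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(120, 3)),
               rng.normal(2.0, 1.5, size=(80, 3))])
r = np.repeat([0, 1], [120, 80])
N = len(X)

priors = [np.mean(r == i) for i in (0, 1)]                     # P(C_i) = N_i / N
means = [X[r == i].mean(axis=0) for i in (0, 1)]               # m_i
S_class = [np.cov(X[r == i].T, bias=True) for i in (0, 1)]     # per-class ML covariance, divides by N_i
S_shared = priors[0] * S_class[0] + priors[1] * S_class[1]     # P(C1) S1 + P(C2) S2

# The same estimate written as one sum over both classes, divided by N.
pooled = sum((X[r == i] - means[i]).T @ (X[r == i] - means[i]) for i in (0, 1)) / N
print(np.allclose(S_shared, pooled))                           # True

# Covariance about the overall mean = shared S + between-class scatter.
m = X.mean(axis=0)
between = sum(p * np.outer(mi - m, mi - m) for p, mi in zip(priors, means))
print(np.allclose(np.cov(X.T, bias=True), S_shared + between))  # True

# Scaling S away from the ML estimate lowers the log likelihood.
best = log_likelihood(X, r, priors, means, S_shared)
print(all(log_likelihood(X, r, priors, means, c * S_shared) < best for c in (0.9, 1.1)))  # True
```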
// Then:
// - for the sample covariance $S_1$, $s_(i i) = frac(sum_t (x_i^t - m_i)^2, P(C_1)) = frac(sum_t (x_i^t)^2 - 2x_i^t m_i + m_i^2, P(C_1))$ (with mean drawing from $m_1$)
// - for the sample covariance $S_2$, $s_(i i) = frac(sum_t (x_i^t - m_i)^2, P(C_2)) = frac(sum_t (x_i^t)^2 - 2x_i^t m_i + m_i^2, P(C_2))$ (with a different mean drawing from $m_2$)
// the overall covariance is $s_(i j) = frac(sum_t (x_i^t - m_1)(x_j^t - m_2), N)$
// deriving from $S_1$ and $S_2$ you get:
// $ s_(i j) &= frac(1,N) ((S_1)_(i j) P(C_1) + (S_2)_(i j) P(C_2)) \
// &= frac(1,N) (sum_t (x_i^t)^2 - 2x_i^t (m_1)_i + (m_1)_i^2 + (x_i^t)^2 - 2x_i^t (m_2)_i + (m_2)_i^2) \
// &= frac(1,N) (sum_t 2(x_i^t)^2 - 2x_i^t ((m_1)_i + (m_2)_i) + (m_1)_i^2 + (m_2)_i^2) \
// $
// Combining covariance matrixes:
// $ bold(S) &= frac(sum_t (bold(x)^t - bold(m)_1) (bold(x)^t - bold(m)_2)^T, N) \
// &= frac(sum_t bold(x)^t^T bold(x)^t - bold(x)^t bold(m)_2 - bold(m)_1 bold(x)^t + bold(m)_1 bold(m)_2, N) \
// $
// // The maximum likelihood of a single class can be found with:
// // $ theta^* &= "argmax"_theta cal(L) (theta|bold(X)) \
// // frac(diff, diff theta) cal(L) (theta|bold(X)) &= 0 \
// // frac(diff, diff theta) log(cal(l) (theta|bold(X))) &= 0 \
// // frac(diff, diff theta) sum_t log(p(x^t|theta)) &= 0 \
// // frac(diff, diff theta) sum_t log(frac(1,(2 pi)^(d/2) |bold(Sigma)_i|^(1/2)) exp(-frac(1,2)(bold(x)-bold(mu)_i)^T bold(Sigma)_i^(-1) (bold(x)-bold(mu)_i))) &= 0 \
// // $
2. #c[*(50 points)* In this problem, you will work on dimension reduction and classification on a Faces dataset from the UCI repository. We provided the processed files `face_train_data_960.txt` and `face_test_data_960.txt` with 500 and 124 images, respectively. Each image is of size 30 #sym.times 32 with the pixel values in a row in the files and the last column identifies the labels: 1 (sunglasses), and 0 (open) of the image. You can visualize the $i$th image with the following matlab command line:]

assignments/hwk03/EMG.m (Normal file, 64 lines changed)

@@ -0,0 +1,64 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Name: EMG.m
% Input: x - a nxd matrix (nx3 if using RGB)
% k - the number of clusters
% epochs - number of iterations (epochs) to run the algorithm for
% flag - flag to use improved EM to avoid singular covariance matrix
% Output: h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params
% m - a kxd matrix, the maximum likelihood estimate of the mean
% Q - vector of values of the complete data log-likelihood function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [h, m, Q] = EMG(x, k, epochs, flag)
% variables
num_clusters = k; % number of clusters
eps = 1e-15; % small value that can be used to avoid obtaining 0's
lambda = 1e-3; % value for improved version of EM
[num_data, dim] = size(x);
h = zeros(num_data, num_clusters); % expectation of data point being part of a cluster
S = zeros(dim, dim, num_clusters); % covariance matrix for each cluster
b = zeros(num_data,num_clusters); % cluster assignments, only used for initialization of pi and S
Q = zeros(epochs*2,1); % vector that can hold complete data log-likelihood after each E and M step
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TODO: Initialise cluster means using k-means
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TODO: Determine the b values for all data points
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TODO: Initialize pi's (mixing coefficients)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TODO: Initialize the covariance matrix estimate
% further modifications will need to be made when doing 2(d)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Main EM loop
for n=1:epochs
%%%%%%%%%%%%%%%%
% E-step
%%%%%%%%%%%%%%%%
fprintf('E-step, epoch #%d\n', n);
[Q, h] = E_step(x, Q, h, pi, m, S, k);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TODO: Store the value of the complete log-likelihood function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%
% M-step
%%%%%%%%%%%%%%%%
fprintf('M-step, epoch #%d\n', n);
[Q, S, m] = M_step(x, Q, h, S, k);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TODO: Store the value of the complete log-likelihood function
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
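The initialization TODOs in EMG.m (k-means means, hard assignments b, mixing coefficients pi, covariances S) can be sketched in NumPy as follows. This is a minimal sketch, not the assignment's MATLAB solution: the names `kmeans` and `init_em` are hypothetical, covariances are stored as a k x d x d array rather than MATLAB's d x d x k, and the "improved" branch assumes the fix is adding lambda times the identity to each covariance, which is one common way to keep them invertible.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns cluster means (k x d) and hard labels (n,)."""
    rng = np.random.default_rng(seed)
    m = x[rng.choice(len(x), size=k, replace=False)].copy()   # k random points as seeds
    for _ in range(iters):
        dist = ((x[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)  # n x k squared distances
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):              # keep the old mean if a cluster empties
                m[j] = x[labels == j].mean(axis=0)
    labels = ((x[:, None, :] - m[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return m, labels

def init_em(x, k, lam=1e-3, improved=False):
    """Initialize m, b, pi and S for EM from a k-means clustering."""
    n, d = x.shape
    m, labels = kmeans(x, k)
    b = np.zeros((n, k))
    b[np.arange(n), labels] = 1.0                # one-hot cluster assignments
    counts = b.sum(axis=0)
    pi = counts / n                              # mixing coefficients
    S = np.zeros((k, d, d))
    for j in range(k):
        D = x[labels == j] - m[j]
        S[j] = D.T @ D / max(counts[j], 1.0)     # within-cluster covariance
        if improved:
            S[j] += lam * np.eye(d)              # assumed "improved EM" regularizer
    return m, b, pi, S

# Usage on two tight synthetic clusters of RGB-like points.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.1, size=(100, 3)),
               rng.normal(1.0, 0.1, size=(100, 3))])
m, b, pi, S = init_em(x, k=2, improved=True)
```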

@@ -0,0 +1,21 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Name: E_step.m
% Input: x - a nxd matrix (nx3 if using RGB)
% Q - vector of values from the complete data log-likelihood function
% h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params
% pi - vector of mixing coefficients
% m - cluster means
% S - cluster covariance matrices
% k - the number of clusters
% Output: Q - vector of values of the complete data log-likelihood function
% h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Q, h] = E_step(x, Q, h, pi, m, S, k)
[num_data, ~] = size(x);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TODO: perform E-step of EM algorithm
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
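The E-step TODO can be sketched in NumPy as below, under the same assumptions (k x d x d covariances; the names `log_gauss` and `e_step` are hypothetical). The responsibilities h are normalized in log space with the log-sum-exp trick, swapped in here as an alternative to the `eps` constant declared in EMG.m, and Q is the expected complete-data log-likelihood, Q = sum_t sum_j h_tj (log pi_j + log N(x_t; m_j, S_j)).

```python
import numpy as np

def log_gauss(x, m, S):
    """log N(x; m, S) evaluated for every row of x."""
    d = x.shape[1]
    _, logdet = np.linalg.slogdet(S)
    D = x - m
    maha = np.sum(D @ np.linalg.inv(S) * D, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def e_step(x, pi, m, S):
    """Return responsibilities h (n x k) and the expected complete-data log-likelihood Q."""
    # log_joint[t, j] = log pi_j + log N(x_t; m_j, S_j)
    log_joint = np.column_stack([np.log(pi[j]) + log_gauss(x, m[j], S[j])
                                 for j in range(len(pi))])
    # log-sum-exp normalization so that very small densities do not underflow to 0
    top = log_joint.max(axis=1, keepdims=True)
    log_px = top + np.log(np.exp(log_joint - top).sum(axis=1, keepdims=True))
    h = np.exp(log_joint - log_px)
    Q = np.sum(h * log_joint)
    return h, Q

# Usage on a toy problem with two well-separated clusters.
x = np.array([[0.0, 0.0], [0.1, -0.1], [5.0, 5.0], [5.1, 4.9]])
pi = np.array([0.5, 0.5])
m = np.array([[0.0, 0.0], [5.0, 5.0]])
S = np.stack([np.eye(2), np.eye(2)])
h, Q = e_step(x, pi, m, S)
```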

@@ -0,0 +1,34 @@
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Name: M_step.m
% Input: x - a nxd matrix (nx3 if using RGB)
% Q - vector of values from the complete data log-likelihood function
% h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params
% S - cluster covariance matrices
% k - the number of clusters
% Output: Q - vector of values of the complete data log-likelihood function
% S - cluster covariance matrices
% m - cluster means
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Q, S, m] = M_step(x, Q, h, S, k)
% get size of data
[num_data, dim] = size(x);
eps = 1e-15;
lambda = 1e-3; % value for improved version of EM
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TODO: update mixing coefficients
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TODO: update cluster means
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TODO: Calculate the covariance matrix estimate
% further modifications will need to be made when doing 2(d)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
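A matching NumPy sketch of the M-step, with the hypothetical name `m_step`, k x d x d covariances, and the same assumed lambda-times-identity regularizer for the improved version. The MATLAB `M_step` as committed returns only `Q`, `S` and `m`, so updated mixing coefficients would not reach the next E-step unless they are also returned; this sketch returns `pi` alongside the other parameters.

```python
import numpy as np

def m_step(x, h, lam=1e-3, improved=False):
    """Update mixing coefficients, means and covariances from responsibilities h (n x k)."""
    n, d = x.shape
    Nj = h.sum(axis=0)                            # effective number of points per cluster
    pi = Nj / n                                   # mixing coefficients
    m = (h.T @ x) / Nj[:, None]                   # responsibility-weighted means, k x d
    S = np.zeros((len(Nj), d, d))
    for j in range(len(Nj)):
        D = x - m[j]
        S[j] = (h[:, j, None] * D).T @ D / Nj[j]  # responsibility-weighted covariance
        if improved:
            S[j] += lam * np.eye(d)               # assumed "improved EM" regularizer
    return pi, m, S

# Usage: with one-hot responsibilities the updates reduce to per-cluster statistics.
x = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
h = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
pi, m, S = m_step(x, h)
```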

@@ -0,0 +1,87 @@
function [] = Problem2()
% file names
stadium_fn = "stadium.jpg";
goldy_fn = "goldy.jpg";
% load image and preprocess it
goldy_img = double(imread(goldy_fn))/255;
stadium_img = double(imread(stadium_fn))/255;
% convert RGB images
goldy_x = reshape(permute(goldy_img, [2 1 3]), [], 3); % convert img from NxMx3 to N*Mx3
stadium_x = reshape(permute(stadium_img, [2 1 3]), [], 3);
% get dimensionality of stadium image
[height, width, depth] = size(stadium_img);
% set epochs (number of iterations to run algorithm for)
epochs = 10;
%%%%%%%%%%
% 2(a,b) %
%%%%%%%%%%
index = 1;
figure();
for k = 4:4:12
fprintf("k=%d\n", k);
% call EM on data
[h, m, Q] = EMG(stadium_x, k, epochs, false);
% get compressed version of image
[~,class_index] = max(h,[],2);
compress = m(class_index,:);
% 2(a), plot compressed image
subplot(3,2,index)
imagesc(permute(reshape(compress, [width, height, depth]),[2 1 3]))
index = index + 1;
% 2(b), plot complete data likelihood curves
subplot(3,2,index)
x = 1:numel(Q); % one x-position per stored log-likelihood value
c = repmat([1 0 0; 0 1 0], length(x)/2, 1);
scatter(x,Q,20,c);
index = index + 1;
end
shg
%%%%%%%%
% 2(c) %
%%%%%%%%
% get dimensionality of goldy image, and set k=7
[height, width, depth] = size(goldy_img);
k = 7;
% run EM on goldy image
[h, m, Q] = EMG(goldy_x, k, epochs, false);
% plot goldy image using clusters from EM
[~,class_index] = max(h,[],2);
compress = m(class_index,:);
figure();
subplot(2,1,1)
imagesc(permute(reshape(compress, [width, height, depth]),[2 1 3]))
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% TODO: plot goldy image after using clusters from k-means
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% begin code here
% end code here
shg
%%%%%%%%
% 2(e) %
%%%%%%%%
% run improved version of EM on goldy image
[h, m, Q] = EMG(goldy_x, k, epochs, true);
% plot goldy image using clusters from improved EM
[~,class_index] = max(h,[],2);
compress = m(class_index,:);
figure();
imagesc(permute(reshape(compress, [width, height, depth]),[2 1 3]))
shg
end
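Problem2.m flattens each image into one RGB row per pixel with `reshape(permute(img, [2 1 3]), [], 3)` and builds the compressed image by replacing every pixel with the mean of its most responsible cluster. NumPy reshapes in row-major order, so the permute is not needed there; the hypothetical helpers below sketch the same round trip.

```python
import numpy as np

def image_to_pixels(img):
    """H x W x 3 image -> (H*W) x 3 matrix, one row per pixel in scanline order.

    Equivalent to MATLAB's reshape(permute(img, [2 1 3]), [], 3), because NumPy
    reshapes in row-major (C) order.
    """
    return img.reshape(-1, img.shape[2])

def pixels_to_image(pixels, height, width):
    """Inverse of image_to_pixels (MATLAB: permute(reshape(p, [width, height, 3]), [2 1 3]))."""
    return pixels.reshape(height, width, -1)

def compress(h, m):
    """Replace each pixel by the mean color of the cluster with the largest h value."""
    class_index = np.argmax(h, axis=1)
    return m[class_index]

# Usage on a tiny 2 x 3 "image" quantized to two colors.
img = np.arange(18, dtype=float).reshape(2, 3, 3) / 17.0
pixels = image_to_pixels(img)
h = np.array([[0.9, 0.1]] * 3 + [[0.2, 0.8]] * 3)   # first image row -> cluster 0, second -> 1
m = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
out = pixels_to_image(compress(h, m), 2, 3)
```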

assignments/hwk03/goldy.jpg (binary, Normal file)

Binary file not shown. Size after this commit: 30 KiB.

@@ -0,0 +1,41 @@
#let dfrac(a, b) = $display(frac(#a, #b))$
= Problem 1a
Given:
#let ww = $bold(w)$
#let xx = $bold(x)$
#let vv = $bold(v)$
#let XX = $bold(X)$
- $E(ww_1,ww_2,vv|XX) = - sum_t (r^t log y^t + (1 - r^t) log(1 - y^t))$
- $y^t = "sigmoid"(v_2 z^t_2 + v_1 z^t_1 + v_0)$
- $z^t_1 = "ReLU"(w_(1,2)x^t_2 + w_(1,1)x^t_1 + w_(1,0))$
- $z^t_2 = tanh(w_(2,2)x^t_2 + w_(2,1)x^t_1 + w_(2,0))$
Using the convention $x_(j=1..D)$, $y_(i=1..K)$, and $z_(h=1..H)$, with $x_0 = 1$ and $z_0 = 1$ so that $w_(h,0)$ and $v_0$ act as biases.
Solved as:
- $
frac(diff E, diff v_h) &= sum_t frac(diff E, diff y^t) frac(diff y^t, diff v_h) \
&= - sum_t (r^t dot frac(1, y^t) - (1-r^t) dot frac(1, 1-y^t)) (y^t z^t_h (1-y^t)) \
&= - sum_t (frac(r^t, y^t) - frac(1-r^t, 1-y^t)) (y^t z^t_h (1-y^t)) \
&= - sum_t (frac(r^t (1-y^t)-y^t (1-r^t), cancel(y^t) (1-y^t))) (cancel(y^t) z^t_h (1-y^t)) \
&= - sum_t (frac(r^t - y^t, cancel(1-y^t))) (z^t_h cancel((1-y^t))) \
&= - sum_t (r^t - y^t) z^t_h \
$
- $
frac(diff E, diff w_(1,j)) &= sum_t frac(diff E, diff y^t) frac(diff y^t, diff z^t_1) frac(diff z^t_1, diff w_(1,j)) \
&= - sum_t (frac(r^t, y^t) - frac(1-r^t, 1-y^t)) (y^t (1-y^t) v_1) (x^t_j cases(0 "if" ww_1 dot xx^t <0, 1 "otherwise")) \
&= - sum_t (r^t - y^t) v_1 x^t_j cases(0 "if" ww_1 dot xx^t <0, 1 "otherwise") \
$
- $
frac(diff E, diff w_(2,j)) &= sum_t frac(diff E, diff y^t) frac(diff y^t, diff z^t_2) frac(diff z^t_2, diff w_(2,j)) \
&= - sum_t (r^t - y^t) v_2 x^t_j (1-tanh^2(ww_2 dot xx^t)) \
$
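These gradients can be checked numerically. The NumPy sketch below implements the network from the given equations, with $x_0 = 1$ and $z_0 = 1$ carrying the biases, and compares the analytic gradients with central finite differences on random inputs and labels. The function names and the random data are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def forward(w1, w2, v, X):
    """Network from Problem 1a. w1, w2 = [w_h0, w_h1, w_h2]; v = [v0, v1, v2]."""
    Xb = np.column_stack([np.ones(len(X)), X])      # prepend x_0 = 1 for the bias
    z1 = np.maximum(0.0, Xb @ w1)                    # ReLU hidden unit
    z2 = np.tanh(Xb @ w2)                            # tanh hidden unit
    y = 1.0 / (1.0 + np.exp(-(v[0] + v[1] * z1 + v[2] * z2)))  # sigmoid output
    return Xb, z1, z2, y

def error(w1, w2, v, X, r):
    """Cross-entropy E(w1, w2, v | X)."""
    y = forward(w1, w2, v, X)[3]
    return -np.sum(r * np.log(y) + (1 - r) * np.log(1 - y))

def analytic_grads(w1, w2, v, X, r):
    """Gradients from the derivation: dE/dv_h, dE/dw_(1,j), dE/dw_(2,j)."""
    Xb, z1, z2, y = forward(w1, w2, v, X)
    Z = np.column_stack([np.ones(len(X)), z1, z2])  # z_0 = 1 for the bias v0
    dv = -(r - y) @ Z
    dw1 = -((r - y) * v[1] * (Xb @ w1 >= 0)) @ Xb    # ReLU derivative: 0 if w1.x < 0, else 1
    dw2 = -((r - y) * v[2] * (1 - np.tanh(Xb @ w2) ** 2)) @ Xb
    return dv, dw1, dw2

def numeric_grad(f, p, step=1e-6):
    """Central finite differences of a scalar function f at parameter vector p."""
    g = np.zeros_like(p)
    for j in range(len(p)):
        e = np.zeros_like(p)
        e[j] = step
        g[j] = (f(p + e) - f(p - e)) / (2 * step)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
r = rng.integers(0, 2, size=50).astype(float)
w1, w2, v = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)

dv, dw1, dw2 = analytic_grads(w1, w2, v, X, r)
num_dv = numeric_grad(lambda p: error(w1, w2, p, X, r), v)
num_dw1 = numeric_grad(lambda p: error(p, w2, v, X, r), w1)
num_dw2 = numeric_grad(lambda p: error(w1, p, v, X, r), w2)
print(np.allclose(dv, num_dv, atol=1e-5),
      np.allclose(dw1, num_dw1, atol=1e-5),
      np.allclose(dw2, num_dw2, atol=1e-5))
```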
= Problem 1b

@@ -0,0 +1,52 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-1/(1 - y)\n"
]
}
],
"source": [
"from sympy import *\n",
"from sympy.abc import *\n",
"\n",
"print(diff(log(1-y), y))\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

Binary file not shown. Size after this commit: 57 KiB.