diff --git a/.envrc b/.envrc new file mode 100644 index 0000000..4e46a90 --- /dev/null +++ b/.envrc @@ -0,0 +1 @@ +layout python3 \ No newline at end of file diff --git a/.gitignore b/.gitignore index 036c568..1e2eda9 100644 --- a/.gitignore +++ b/.gitignore @@ -3,3 +3,6 @@ *.pdf *.zip .DS_Store + +.venv +.direnv \ No newline at end of file diff --git a/assignments/hwk02/HW2.typ b/assignments/hwk02/HW2.typ index 4409440..41e712d 100644 --- a/assignments/hwk02/HW2.typ +++ b/assignments/hwk02/HW2.typ @@ -24,19 +24,19 @@ ``` >> AllProblem1 Dataset 1: - Model 1: (train err = 5% ), (test error = 20% ) - Model 2: (train err = 6% ), (test error = 17% ) - Model 3: (train err = 7% ), (test error = 18% ) + Model 1: (train err = 5%), (test error = 20%) + Model 2: (train err = 6%), (test error = 17%) + Model 3: (train err = 7%), (test error = 18%) Dataset 2: - Model 1: (train err = 7% ), (test error = 23% ) - Model 2: (train err = 14% ), (test error = 56% ) - Model 3: (train err = 13% ), (test error = 53% ) + Model 1: (train err = 7%), (test error = 23%) + Model 2: (train err = 14%), (test error = 56%) + Model 3: (train err = 13%), (test error = 53%) Dataset 3: - Model 1: (train err = 1% ), (test error = 12% ) - Model 2: (train err = 19% ), (test error = 45% ) - Model 3: (train err = 2% ), (test error = 5% ) + Model 1: (train err = 1%), (test error = 12%) + Model 2: (train err = 19%), (test error = 45%) + Model 3: (train err = 2%), (test error = 5%) ``` b. #c[*(5 points)* State which model works best on each test data set and explain why you believe this is the case. Discuss your observations.] @@ -47,9 +47,68 @@ - For dataset 2, model 1 worked the best. - For dataset 3, model 3 worked the best. + The separate covariance matrices work better when each class has its own individual dimensions, while the shared matrix works better when they're closer. The third model will work better when there's not a lot of data and the dependency may be inaccurate. + c. 
#c[*(15 points)* Write the log likelihood function and derive $S_1$ and $S_2$ by maximum likelihood estimation of model 2. Note that since $S_1$ and $S_2$ are shared as $S$, you need to add the log likelihood function of the two classes to maximizing for deriving $S$.] - The maximum likelihood of a single + Starting with priors, $P(C_1) = frac(N_1, N)$ and $P(C_2) = frac(N_2, N)$. + + Supposing individual sample mean is $m_1 = frac(sum_t r_1^t x_1^t, N_1)$ and $m_2 = frac(sum_t r_2^t x_2^t, N_2)$, the combined mean $m$ can be found by + + $ bold(m) = m_1 p(C_1) + m_2 p(C_2) $ + + Then, for the final $S$, we have: + + #let wtf = $m_1 p(C_1) + m_2 p(C_2)$ + + $ s_i &= frac(1, N) sum_t (bold(x)^t - bold(m)) (bold(x)^t - bold(m))^T \ + &= frac(1, N) sum_t (bold(x)^t (bold(x)^t)^T - bold(x)^t bold(m)^T - bold(m) (bold(x)^t)^T + bold(m) bold(m)^T) \ + $ + + In our $S_i$, we have: + + $ s_j &= frac(1, N_i) sum_t^N_i (bold(x)^t - bold(m)_i) (bold(x)^t - bold(m)_i)^T \ + $ + + When we add that together as a weighted sum we get: + + $ s_j &= frac(1, N) (sum_t^N_1 (bold(x)^t - bold(m)_1) (bold(x)^t - bold(m)_1)^T + sum_t^N_2 (bold(x)^t - bold(m)_2) (bold(x)^t - bold(m)_2)^T) \ + &= frac(1, N) (sum_t^N bold(x)^t (bold(x)^t)^T - (sum_t^N_1 bold(x)^t bold(m)_1^T + sum_t^N_2 bold(x)^t bold(m)_2^T) - (sum_t^N_1 bold(m)_1 bold(x)^t + sum_t^N_2 bold(m)_2 bold(x)^t) + (sum_t^N_1 bold(m)_1 bold(m)_1^T + sum_t^N_2 bold(m)_2 bold(m)_2^T)) \ + &= frac(1, N) (sum_t^N bold(x)^t (bold(x)^t)^T - bold(x)^t (sum_t^N_1 bold(m)_1^T + sum_t^N_2 bold(m)_2^T) - (sum_t^N_1 bold(m)_1 + sum_t^N_2 bold(m)_2) bold(x)^t + (sum_t^N_1 bold(m)_1 bold(m)_1^T + sum_t^N_2 bold(m)_2 bold(m)_2^T)) \ + &= frac(1, N) sum_t^N (bold(x)^t (bold(x)^t)^T - bold(x)^t bold(m)^T - bold(m) (bold(x)^t)^T + bold(m) bold(m)^T) \ + $ + + which matches the above equation + + + // Then: + + // - for the sample covariance $S_1$, $s_(i i) = frac(sum_t (x_i^t - m_i)^2, P(C_1)) = frac(sum_t (x_i^t)^2 - 2x_i^t m_i + m_i^2, P(C_1))$ 
(with mean drawing from $m_1$) + // - for the sample covariance $S_2$, $s_(i i) = frac(sum_t (x_i^t - m_i)^2, P(C_2)) = frac(sum_t (x_i^t)^2 - 2x_i^t m_i + m_i^2, P(C_2))$ (with a different mean drawing from $m_2$) + + // the overall covariance is $s_(i j) = frac(sum_t (x_i^t - m_1)(x_j^t - m_2), N)$ + + // deriving from $S_1$ and $S_2$ you get: + + // $ s_(i j) &= frac(1,N) ((S_1)_(i j) P(C_1) + (S_2)_(i j) P(C_2)) \ + // &= frac(1,N) (sum_t (x_i^t)^2 - 2x_i^t (m_1)_i + (m_1)_i^2 + (x_i^t)^2 - 2x_i^t (m_2)_i + (m_2)_i^2) \ + // &= frac(1,N) (sum_t 2(x_i^t)^2 - 2x_i^t ((m_1)_i + (m_2)_i) + (m_1)_i^2 + (m_2)_i^2) \ + // $ + + // Combining covariance matrixes: + + // $ bold(S) &= frac(sum_t (bold(x)^t - bold(m)_1) (bold(x)^t - bold(m)_2)^T, N) \ + // &= frac(sum_t bold(x)^t^T bold(x)^t - bold(x)^t bold(m)_2 - bold(m)_1 bold(x)^t + bold(m)_1 bold(m)_2, N) \ + // $ + + // // The maximum likelihood of a single class can be found with: + + // // $ theta^* &= "argmax"_theta cal(L) (theta|bold(X)) \ + // // frac(diff, diff theta) cal(L) (theta|bold(X)) &= 0 \ + // // frac(diff, diff theta) log(cal(l) (theta|bold(X))) &= 0 \ + // // frac(diff, diff theta) sum_t log(p(x^t|theta)) &= 0 \ + // // frac(diff, diff theta) sum_t log(frac(1,(2 pi)^(d/2) |bold(Sigma)_i|^(1/2)) exp(-frac(1,2)(bold(x)-bold(mu)_i)^T bold(Sigma)_i^(-1) (bold(x)-bold(mu)_i))) &= 0 \ + // // $ 2. #c[*(50 points)* In this problem, you will work on dimension reduction and classification on a Faces dataset from the UCI repository. We provided the processed files `face_train_data_960.txt` and `face_test_data_960.txt` with 500 and 124 images, respectively. Each image is of size 30 #sym.times 32 with the pixel values in a row in the files and the last column identifies the labels: 1 (sunglasses), and 0 (open) of the image. 
You can visualize the $i$th image with the following matlab command line:] diff --git a/assignments/hwk03/EMG.m b/assignments/hwk03/EMG.m new file mode 100644 index 0000000..2b96290 --- /dev/null +++ b/assignments/hwk03/EMG.m @@ -0,0 +1,64 @@ + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +% Name: EMG.m +% Input: x - a nxd matrix (nx3 if using RGB) +% k - the number of clusters +% epochs - number of iterations (epochs) to run the algorithm for +% flag - flag to use improved EM to avoid singular covariance matrix +% Output: h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params +% m - a kxd matrix, the maximum likelihood estimate of the mean +% Q - vector of values of the complete data log-likelihood function +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +function [h, m, Q] = EMG(x, k, epochs, flag) + + % variables + num_clusters = k; % number of clusters + eps = 1e-15; % small value that can be used to avoid obtaining 0's + lambda = 1e-3; % value for improved version of EM + [num_data, dim] = size(x); + h = zeros(num_data, num_clusters); % expectation of data point being part of a cluster + S = zeros(dim, dim, num_clusters); % covariance matrix for each cluster + b = zeros(num_data,num_clusters); % cluster assignments, only used for initialization of pi and S + Q = zeros(epochs*2,1); % vector that can hold complete data log-likelihood after each E and M step + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % TODO: Initialise cluster means using k-means + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % TODO: Determine the b values for all data points + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % TODO: Initialize pi's 
(mixing coefficients) + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % TODO: Initialize the covariance matrix estimate + % further modifications will need to be made when doing 2(d) + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + % Main EM loop + for n=1:epochs + %%%%%%%%%%%%%%%% + % E-step + %%%%%%%%%%%%%%%% + fprintf('E-step, epoch #%d\n', n); + [Q, h] = E_step(x, Q, h, pi, m, S, k); + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % TODO: Store the value of the complete log-likelihood function + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + %%%%%%%%%%%%%%%% + % M-step + %%%%%%%%%%%%%%%% + fprintf('M-step, epoch #%d\n', n); + [Q, S, m] = M_step(x, Q, h, S, k); + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % TODO: Store the value of the complete log-likelihood function + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + +end \ No newline at end of file diff --git a/assignments/hwk03/E_step.m b/assignments/hwk03/E_step.m new file mode 100644 index 0000000..460270f --- /dev/null +++ b/assignments/hwk03/E_step.m @@ -0,0 +1,21 @@ +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +% Name: E_step.m +% Input: x - a nxd matrix (nx3 if using RGB) +% Q - vector of values from the complete data log-likelihood function +% h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params +% pi - vector of mixing coefficients +% m - cluster means +% S - cluster covariance matrices +% k - the number of clusters +% Output: Q - vector of values of the complete data log-likelihood function +% h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params 
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +function [Q, h] = E_step(x, Q, h, pi, m, S, k) + + [num_data, ~] = size(x); + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % TODO: perform E-step of EM algorithm + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +end \ No newline at end of file diff --git a/assignments/hwk03/M_step.m b/assignments/hwk03/M_step.m new file mode 100644 index 0000000..4505b1a --- /dev/null +++ b/assignments/hwk03/M_step.m @@ -0,0 +1,34 @@ +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +% Name: M_step.m +% Input: x - a nxd matrix (nx3 if using RGB) +% Q - vector of values from the complete data log-likelihood function +% h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params +% S - cluster covariance matrices +% k - the number of clusters +% Output: Q - vector of values of the complete data log-likelihood function +% S - cluster covariance matrices +% m - cluster means +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +function [Q, S, m] = M_step(x, Q, h, S, k) + + % get size of data + [num_data, dim] = size(x); + eps = 1e-15; + lambda = 1e-3; % value for improved version of EM + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % TODO: update mixing coefficients + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % TODO: update cluster means + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % TODO: Calculate the covariance matrix estimate + % further modifications will need to be made when doing 2(d) + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + + +end \ No newline at end of file diff --git a/assignments/hwk03/Problem2.m b/assignments/hwk03/Problem2.m new file mode 100644 index 0000000..447304f --- 
/dev/null +++ b/assignments/hwk03/Problem2.m @@ -0,0 +1,87 @@ +function [] = Problem2() + + % file names + stadium_fn = "stadium.jpg"; + goldy_fn = "goldy.jpg"; + + % load image and preprocess it + goldy_img = double(imread(goldy_fn))/255; + stadium_img = double(imread(stadium_fn))/255; + + % convert RGB images + goldy_x = reshape(permute(goldy_img, [2 1 3]), [], 3); % convert img from NxMx3 to N*Mx3 + stadium_x = reshape(permute(stadium_img, [2 1 3]), [], 3); + + % get dimensionality of stadium image + [height, width, depth] = size(stadium_img); + + % set epochs (number of iterations to run algorithm for) + epochs = 10; + + %%%%%%%%%% + % 2(a,b) % + %%%%%%%%%% + index = 1; + figure(); + for k = 4:4:12 + fprintf("k=%d\n", k); + + % call EM on data + [h, m, Q] = EMG(stadium_x, k, epochs, false); + + % get compressed version of image + [~,class_index] = max(h,[],2); + compress = m(class_index,:); + + % 2(a), plot compressed image + subplot(3,2,index) + imagesc(permute(reshape(compress, [width, height, depth]),[2 1 3])) + index = index + 1; + + % 2(b), plot complete data likelihood curves + subplot(3,2,index) + x = 1:numel(Q); + c = repmat([1 0 0; 0 1 0], length(x)/2, 1); + scatter(x,Q,20,c); + index = index + 1; + end + shg + + %%%%%%%% + % 2(c) % + %%%%%%%% + % get dimensionality of goldy image, and set k=7 + [height, width, depth] = size(goldy_img); + k = 7; + + % run EM on goldy image + [h, m, Q] = EMG(goldy_x, k, epochs, false); + + % plot goldy image using clusters from EM + [~,class_index] = max(h,[],2); + compress = m(class_index,:); + figure(); + subplot(2,1,1) + imagesc(permute(reshape(compress, [width, height, depth]),[2 1 3])) + + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % TODO: plot goldy image after using clusters from k-means + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + % begin code here + + % end code here + shg + + %%%%%%%% + % 2(e) % + %%%%%%%% + % run improved version of EM on goldy image + [h, m, Q] = 
EMG(goldy_x, k, epochs, true); + + % plot goldy image using clusters from improved EM + [~,class_index] = max(h,[],2); + compress = m(class_index,:); + figure(); + imagesc(permute(reshape(compress, [width, height, depth]),[2 1 3])) + shg +end \ No newline at end of file diff --git a/assignments/hwk03/goldy.jpg b/assignments/hwk03/goldy.jpg new file mode 100644 index 0000000..5f045da Binary files /dev/null and b/assignments/hwk03/goldy.jpg differ diff --git a/assignments/hwk03/hw3_sol.typ b/assignments/hwk03/hw3_sol.typ new file mode 100644 index 0000000..80dc96a --- /dev/null +++ b/assignments/hwk03/hw3_sol.typ @@ -0,0 +1,41 @@ +#let dfrac(a, b) = $display(frac(#a, #b))$ + += Problem 1a + +Given: + +#let ww = $bold(w)$ +#let xx = $bold(x)$ +#let vv = $bold(v)$ +#let XX = $bold(X)$ + +- $E(ww_1,ww_2,vv|XX) = - sum_t (r^t log y^t + (1 - r^t) log(1 - y^t))$ +- $y^t = "sigmoid"(v_2 z^t_2 + v_1 z^t_1 + v_0)$ +- $z^t_1 = "ReLU"(w_(1,2)x^t_2 + w_(1,1)x^t_1 + w_(1,0))$ +- $z^t_2 = tanh(w_(2,2)x^t_2 + w_(2,1)x^t_1 + w_(2,0))$ + +Using the convention $x_(j=1..D)$, $y_(i=1..K)$, and $z_(h=1..H)$. 
+ +Solved as: + +- $ + frac(diff E, diff v_h) &= sum_t frac(diff E, diff y^t) frac(diff y^t, diff v_h) \ + &= - sum_t (r^t dot frac(1, y^t) - (1-r^t) dot frac(1, 1-y^t)) (y^t z^t_h (1-y^t)) \ + &= - sum_t (frac(r^t, y^t) - frac(1-r^t, 1-y^t)) (y^t z^t_h (1-y^t)) \ + &= - sum_t (frac(r^t (1-y^t)-y^t (1-r^t), cancel(y^t) (1-y^t))) (cancel(y^t) z^t_h (1-y^t)) \ + &= - sum_t (frac(r^t - y^t, cancel(1-y^t))) (z^t_h cancel((1-y^t))) \ + &= - sum_t (r^t - y^t) z^t_h \ + $ + +- $ + frac(diff E, diff w_(1,j)) &= sum_t frac(diff E, diff y^t) frac(diff y^t, diff z^t_1) frac(diff z^t_1, diff w_(1,j)) \ + &= - sum_t (frac(r^t, y^t) - frac(1-r^t, 1-y^t)) (y^t (1-y^t) v_1) (x^t_j cases(0 "if" ww_1 dot xx^t <0, 1 "otherwise")) \ + &= - sum_t (r^t - y^t) v_1 x^t_j cases(0 "if" ww_1 dot xx^t <0, 1 "otherwise") \ + $ + +- $ + frac(diff E, diff w_(2,j)) &= sum_t frac(diff E, diff y^t) frac(diff y^t, diff z^t_2) frac(diff z^t_2, diff w_(2,j)) \ + &= - sum_t (r^t - y^t) v_2 x^t_j (1-tanh^2(ww_2 dot xx^t)) \ + $ + += Problem 1b diff --git a/assignments/hwk03/playground.ipynb b/assignments/hwk03/playground.ipynb new file mode 100644 index 0000000..7a7d71b --- /dev/null +++ b/assignments/hwk03/playground.ipynb @@ -0,0 +1,52 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "-1/(1 - y)\n" + ] + } + ], + "source": [ + "from sympy import *\n", + "from sympy.abc import *\n", + "\n", + "print(diff(log(1-y), y))\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": 
"3.11.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/assignments/hwk03/stadium.jpg b/assignments/hwk03/stadium.jpg new file mode 100644 index 0000000..72b6592 Binary files /dev/null and b/assignments/hwk03/stadium.jpg differ