hwk 3

2023-11-09 21:29:17 -06:00 · 15481ebb04
commit 15481ebb04
parent 301c76e4a1
11 changed files with 372 additions and 10 deletions
--- a/.envrc
+++ b/.envrc
@@ -0,0 +1 @@
+layout python3
--- a/.gitignore
+++ b/.gitignore
@@ -3,3 +3,6 @@
 *.pdf
 *.zip
 .DS_Store
+
+.venv
+.direnv
--- a/assignments/hwk02/HW2.typ
+++ b/assignments/hwk02/HW2.typ
@@ -47,9 +47,68 @@
  - For dataset 2, model 1 worked the best.
  - For dataset 3, model 3 worked the best.

+  Separate covariance matrices work better when each class has its own distinct spread, while the shared matrix works better when the class spreads are similar. The third model works better when there is little data, so that estimated dependencies between features may be inaccurate.
+
  c. #c[*(15 points)* Write the log likelihood function and derive $S_1$ and $S_2$ by maximum likelihood estimation of model 2. Note that since $S_1$ and $S_2$ are shared as $S$, you need to add the log likelihood function of the two classes to maximizing for deriving $S$.]

-  The maximum likelihood of a single 
+  Starting with the priors, $P(C_1) = frac(N_1, N)$ and $P(C_2) = frac(N_2, N)$.
+
+  With class sample means $bold(m)_1 = frac(sum_t r_1^t bold(x)^t, N_1)$ and $bold(m)_2 = frac(sum_t r_2^t bold(x)^t, N_2)$, the combined mean $bold(m)$ is
+
+  $ bold(m) = bold(m)_1 P(C_1) + bold(m)_2 P(C_2) $
+
+  The covariance of all samples around the combined mean is:
+
+  #let wtf = $m_1 p(C_1) + m_2 p(C_2)$
+
+  $ bold(S)_"all" &= frac(1, N) sum_t (bold(x)^t - bold(m)) (bold(x)^t - bold(m))^T \
+  &= frac(1, N) sum_t (bold(x)^t (bold(x)^t)^T - bold(x)^t bold(m)^T - bold(m) (bold(x)^t)^T + bold(m) bold(m)^T) \
+  $
+
+  For each class covariance $bold(S)_i$, we have:
+
+  $ bold(S)_i &= frac(1, N_i) sum_t^N_i (bold(x)^t - bold(m)_i) (bold(x)^t - bold(m)_i)^T \
+  $
+
+  Weighting each class covariance by its prior and adding gives the shared estimate:
+
+  $ bold(S) &= P(C_1) bold(S)_1 + P(C_2) bold(S)_2 \
+  &= frac(N_1, N) dot frac(1, N_1) sum_t^N_1 (bold(x)^t - bold(m)_1) (bold(x)^t - bold(m)_1)^T + frac(N_2, N) dot frac(1, N_2) sum_t^N_2 (bold(x)^t - bold(m)_2) (bold(x)^t - bold(m)_2)^T \
+  &= frac(1, N) (sum_t^N_1 (bold(x)^t - bold(m)_1) (bold(x)^t - bold(m)_1)^T + sum_t^N_2 (bold(x)^t - bold(m)_2) (bold(x)^t - bold(m)_2)^T) \
+  &= frac(1, N) sum_i sum_t r_i^t (bold(x)^t - bold(m)_i) (bold(x)^t - bold(m)_i)^T \
+  $
+
+  This has the same form as the covariance around the combined mean above, except that each sample is centered on its own class mean $bold(m)_i$ instead of $bold(m)$, so the two agree only when $bold(m)_1 = bold(m)_2$.
+
+
+  // Then:
+
+  // - for the sample covariance $S_1$, $s_(i i) = frac(sum_t (x_i^t - m_i)^2, P(C_1)) = frac(sum_t (x_i^t)^2 - 2x_i^t m_i + m_i^2, P(C_1))$ (with mean drawing from $m_1$)
+  // - for the sample covariance $S_2$, $s_(i i) = frac(sum_t (x_i^t - m_i)^2, P(C_2)) = frac(sum_t (x_i^t)^2 - 2x_i^t m_i + m_i^2, P(C_2))$ (with a different mean drawing from $m_2$)
+
+  // the overall covariance is $s_(i j) = frac(sum_t (x_i^t - m_1)(x_j^t - m_2), N)$
+
+  // deriving from $S_1$ and $S_2$ you get:
+
+  // $ s_(i j) &= frac(1,N) ((S_1)_(i j) P(C_1) + (S_2)_(i j) P(C_2)) \
+  // &= frac(1,N) (sum_t (x_i^t)^2 - 2x_i^t (m_1)_i + (m_1)_i^2 + (x_i^t)^2 - 2x_i^t (m_2)_i + (m_2)_i^2) \
+  // &= frac(1,N) (sum_t 2(x_i^t)^2 - 2x_i^t ((m_1)_i + (m_2)_i) + (m_1)_i^2 + (m_2)_i^2) \
+  // $
+
+  // Combining covariance matrixes:
+
+  // $ bold(S) &= frac(sum_t (bold(x)^t - bold(m)_1) (bold(x)^t - bold(m)_2)^T, N) \
+  // &= frac(sum_t bold(x)^t^T bold(x)^t - bold(x)^t bold(m)_2 - bold(m)_1 bold(x)^t + bold(m)_1 bold(m)_2, N) \
+  // $
+
+  // // The maximum likelihood of a single class can be found with:
+
+  // // $ theta^* &= "argmax"_theta cal(L) (theta|bold(X)) \
+  // // frac(diff, diff theta) cal(L) (theta|bold(X)) &= 0 \
+  // // frac(diff, diff theta) log(cal(l) (theta|bold(X))) &= 0 \
+  // // frac(diff, diff theta) sum_t log(p(x^t|theta)) &= 0 \
+  // // frac(diff, diff theta) sum_t log(frac(1,(2 pi)^(d/2) |bold(Sigma)_i|^(1/2)) exp(-frac(1,2)(bold(x)-bold(mu)_i)^T bold(Sigma)_i^(-1) (bold(x)-bold(mu)_i))) &= 0 \
+  // // $

 2. #c[*(50 points)* In this problem, you will work on dimension reduction and classification on a Faces dataset from the UCI repository. We provided the processed files `face_train_data_960.txt` and `face_test_data_960.txt` with 500 and 124 images, respectively. Each image is of size 30 #sym.times 32 with the pixel values in a row in the files and the last column identifies the labels: 1 (sunglasses), and 0 (open) of the image. You can visualize the $i$th image with the following matlab command line:]

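The shared-covariance estimate in 1(c) can be checked numerically. The following is a minimal sketch, not part of the commit: it uses made-up one-dimensional samples (so each covariance reduces to a scalar variance), and the helper names `mean` and `ml_variance` are hypothetical. It compares the prior-weighted sum of the class variances with the variance of all samples around the combined mean; the two differ by the between-class term $P(C_1) P(C_2) (m_1 - m_2)^2$.

```python
# Illustrative data only: two small 1-D classes (not taken from the assignment).
class_1 = [1.0, 2.0, 3.0]
class_2 = [6.0, 8.0, 10.0, 12.0]


def mean(xs):
    return sum(xs) / len(xs)


def ml_variance(xs, center):
    # Maximum likelihood (1/N) variance around a given center.
    return sum((x - center) ** 2 for x in xs) / len(xs)


n1, n2 = len(class_1), len(class_2)
n = n1 + n2
p1, p2 = n1 / n, n2 / n              # priors P(C_1), P(C_2)
m1, m2 = mean(class_1), mean(class_2)
m = p1 * m1 + p2 * m2                # combined mean

s1 = ml_variance(class_1, m1)
s2 = ml_variance(class_2, m2)
shared = p1 * s1 + p2 * s2           # centered on each class mean
total = ml_variance(class_1 + class_2, m)  # centered on the combined mean
between = p1 * p2 * (m1 - m2) ** 2   # between-class term

print(shared, total, between)
```

With these samples, `total` equals `shared + between` up to rounding, which is why the shared estimate and the overall covariance coincide only when the class means are equal.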
--- a/assignments/hwk03/EMG.m
+++ b/assignments/hwk03/EMG.m
@@ -0,0 +1,64 @@
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+% Name: EMG.m
+% Input: x - a nxd matrix (nx3 if using RGB)
+%        k - the number of clusters
+%        epochs - number of iterations (epochs) to run the algorithm for
+%        flag - flag to use improved EM to avoid singular covariance matrix
+% Output: h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params
+%         m - a kxd matrix, the maximum likelihood estimate of the mean
+%         Q - vector of values of the complete data log-likelihood function
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+function [h, m, Q] = EMG(x, k, epochs, flag)
+    
+    % variables
+    num_clusters = k; % number of clusters
+    eps = 1e-15; % small value that can be used to avoid obtaining 0's
+    lambda = 1e-3; % value for improved version of EM
+    [num_data, dim] = size(x);
+    h = zeros(num_data, num_clusters); % expectation of data point being part of a cluster
+    S = zeros(dim, dim, num_clusters); % covariance matrix for each cluster
+    b = zeros(num_data,num_clusters); % cluster assignments, only used for initialization of pi and S
+    Q = zeros(epochs*2,1); % vector that can hold complete data log-likelihood after each E and M step
+    
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+    % TODO: Initialise cluster means using k-means
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+   
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+    % TODO: Determine the b values for all data points
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+   
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+    % TODO: Initialize pi's (mixing coefficients)
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+    % TODO: Initialize the covariance matrix estimate
+    %       further modifications will need to be made when doing 2(d)
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+    
+    % Main EM loop
+    for n=1:epochs
+        %%%%%%%%%%%%%%%% 
+        % E-step
+        %%%%%%%%%%%%%%%%
+        fprintf('E-step, epoch #%d\n', n);
+        [Q, h] =  E_step(x, Q, h, pi, m, S, k);
+        
+        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+        % TODO: Store the value of the complete log-likelihood function
+        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+        %%%%%%%%%%%%%%%%
+        % M-step
+        %%%%%%%%%%%%%%%%
+        fprintf('M-step, epoch #%d\n', n);
+        [Q, S, m] = M_step(x, Q, h, S, k);              
+        
+        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+        % TODO: Store the value of the complete log-likelihood function
+        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+    end
+end
--- a/assignments/hwk03/E_step.m
+++ b/assignments/hwk03/E_step.m
@@ -0,0 +1,21 @@
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+% Name: E_step.m
+% Input: x - a nxd matrix (nx3 if using RGB)
+%        Q - vector of values from the complete data log-likelihood function
+%        h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params
+%        pi - vector of mixing coefficients 
+%        m - cluster means
+%        S - cluster covariance matrices
+%        k - the number of clusters
+% Output: Q - vector of values of the complete data log-likelihood function
+%         h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+function [Q, h] = E_step(x, Q, h, pi, m, S, k)
+
+    [num_data, ~] = size(x);
+    
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+    % TODO: perform E-step of EM algorithm
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+end
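The E-step stub above leaves the computation of `h` and `Q` open. As a minimal sketch of what such a step computes, written in Python rather than the assignment's MATLAB and restricted to scalar data with one variance per cluster (both simplifying assumptions), the posterior `h` and the expected complete-data log-likelihood could look like this. The names `gaussian_pdf` and `e_step` are hypothetical, and this is not the course's reference solution.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(x, pi, m, s):
    """Return (h, q) for 1-D data x and k clusters.

    h[t][i] is the posterior probability that sample t came from cluster i;
    q is the expected complete-data log-likelihood under those posteriors.
    Assumes every sample has nonzero density under at least one cluster.
    """
    k = len(pi)
    h, q = [], 0.0
    for xt in x:
        # Joint weight pi_i * p(x^t | cluster i) for every cluster.
        joint = [pi[i] * gaussian_pdf(xt, m[i], s[i]) for i in range(k)]
        norm = sum(joint)
        row = [j / norm for j in joint]
        h.append(row)
        # Skip zero-weight terms so that 0 * log(0) does not raise an error.
        q += sum(row[i] * math.log(joint[i]) for i in range(k) if joint[i] > 0.0)
    return h, q


h, q = e_step([0.0, 0.1, 5.0], pi=[0.5, 0.5], m=[0.0, 5.0], s=[1.0, 1.0])
```

Each row of `h` sums to one, so the samples near 0 are assigned almost entirely to the first cluster and the sample at 5 to the second.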
--- a/assignments/hwk03/M_step.m
+++ b/assignments/hwk03/M_step.m
@@ -0,0 +1,34 @@
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+% Name: M_step.m
+% Input: x - a nxd matrix (nx3 if using RGB)
+%        Q - vector of values from the complete data log-likelihood function
+%        h - a nxk matrix, the expectation of the hidden variable z given the data set and distribution params
+%        S - cluster covariance matrices
+%        k - the number of clusters
+% Output: Q - vector of values of the complete data log-likelihood function
+%         S - cluster covariance matrices
+%         m - cluster means
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+function [Q, S, m] = M_step(x, Q, h, S, k)
+     
+    % get size of data
+    [num_data, dim] = size(x);
+    eps = 1e-15;
+    lambda = 1e-3; % value for improved version of EM 
+
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+    % TODO: update mixing coefficients
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+    % TODO: update cluster means
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+    
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+    % TODO: Calculate the covariance matrix estimate 
+    %       further modifications will need to be made when doing 2(d)
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+    
+end
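The three M-step TODOs (mixing coefficients, means, covariance) can be illustrated with a minimal Python sketch for scalar data. The function name `m_step` and the argument `lam` are hypothetical. Adding `lam` to each variance is only an assumed one-dimensional analogue of a regularizer that keeps the covariance non-singular; the source states that `lambda` belongs to the improved version of EM but does not say how it is applied.

```python
def m_step(x, h, lam=0.0):
    """Update mixing coefficients, means, and variances from responsibilities h."""
    n, k = len(x), len(h[0])
    pi, m, s = [], [], []
    for i in range(k):
        n_i = sum(h[t][i] for t in range(n))  # soft count of cluster i
        mean_i = sum(h[t][i] * x[t] for t in range(n)) / n_i
        var_i = sum(h[t][i] * (x[t] - mean_i) ** 2 for t in range(n)) / n_i
        pi.append(n_i / n)
        m.append(mean_i)
        s.append(var_i + lam)  # assumed regularizer: keeps the variance away from 0
    return pi, m, s


x = [0.0, 2.0, 10.0]
h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
pi, m, s = m_step(x, h, lam=1e-3)
```

In this example the second cluster holds a single point, so its unregularized variance would be exactly zero; with `lam` it stays at `1e-3`.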
--- a/assignments/hwk03/Problem2.m
+++ b/assignments/hwk03/Problem2.m
@@ -0,0 +1,87 @@
+function [] = Problem2()
+
+    % file names
+    stadium_fn = "stadium.jpg";
+    goldy_fn = "goldy.jpg";
+
+    % load image and preprocess it
+    goldy_img = double(imread(goldy_fn))/255;
+    stadium_img = double(imread(stadium_fn))/255;
+    
+    % convert RGB images
+    goldy_x = reshape(permute(goldy_img, [2 1 3]), [], 3); % convert img from NxMx3 to N*Mx3
+    stadium_x = reshape(permute(stadium_img, [2 1 3]), [], 3);
+
+    % get dimensionality of stadium image
+    [height, width, depth] = size(stadium_img);
+
+    % set epochs (number of iterations to run algorithm for)
+    epochs = 10;
+
+    %%%%%%%%%%
+    % 2(a,b) %
+    %%%%%%%%%%
+    index = 1;
+    figure();
+    for k = 4:4:12 
+         fprintf("k=%d\n", k);
+    
+          % call EM on data
+         [h, m, Q] = EMG(stadium_x, k, epochs, false);
+     
+         % get compressed version of image
+         [~,class_index] = max(h,[],2);
+         compress = m(class_index,:);
+     
+         % 2(a), plot compressed image
+         subplot(3,2,index)
+         imagesc(permute(reshape(compress, [width, height, depth]),[2 1 3]))
+         index = index + 1;
+     
+         % 2(b), plot complete data likelihood curves
+         subplot(3,2,index)
+         x = 1:numel(Q);
+         c = repmat([1 0 0; 0 1 0], length(x)/2, 1);
+         scatter(x,Q,20,c); 
+         index = index + 1;
+     end
+     shg
+
+    %%%%%%%%
+    % 2(c) %
+    %%%%%%%%
+    % get dimensionality of goldy image, and set k=7
+    [height, width, depth] = size(goldy_img);
+    k = 7;
+
+    % run EM on goldy image
+    [h, m, Q] = EMG(goldy_x, k, epochs, false);
+
+    % plot goldy image using clusters from EM
+    [~,class_index] = max(h,[],2);
+    compress = m(class_index,:);
+    figure();
+    subplot(2,1,1)
+    imagesc(permute(reshape(compress, [width, height, depth]),[2 1 3]))
+
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+    % TODO: plot goldy image after using clusters from k-means
+    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+    % begin code here
+
+    % end code here 
+    shg 
+    
+    %%%%%%%%
+    % 2(e) %
+    %%%%%%%%
+    % run improved version of EM on goldy image 
+    [h, m, Q] = EMG(goldy_x, k, epochs, true);
+
+    % plot goldy image using clusters from improved EM
+    [~,class_index] = max(h,[],2);
+    compress = m(class_index,:);
+    figure();
+    imagesc(permute(reshape(compress, [width, height, depth]),[2 1 3]))
+    shg
+end
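The compression step in `Problem2.m` (`max(h,[],2)` followed by `m(class_index,:)`) assigns each pixel to its most probable cluster and replaces it with that cluster's mean color. A minimal Python sketch of the same idea, with a hypothetical function name and made-up values:

```python
def compress_pixels(h, m):
    """Replace each pixel by the mean color of its most probable cluster."""
    compressed = []
    for row in h:
        best = max(range(len(row)), key=lambda i: row[i])  # hard assignment
        compressed.append(m[best])
    return compressed


# Three pixels, two clusters (pure red and pure blue); illustrative values only.
h = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
m = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
pixels = compress_pixels(h, m)
```

The compressed image therefore contains at most `k` distinct colors, one per cluster mean.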
--- a/assignments/hwk03/goldy.jpg
+++ b/assignments/hwk03/goldy.jpg
--- a/assignments/hwk03/hw3_sol.typ
+++ b/assignments/hwk03/hw3_sol.typ
@@ -0,0 +1,41 @@
+#let dfrac(a, b) = $display(frac(#a, #b))$
+
+= Problem 1a
+
+Given:
+
+#let ww = $bold(w)$
+#let xx = $bold(x)$
+#let vv = $bold(v)$
+#let XX = $bold(X)$
+
+- $E(ww_1,ww_2,vv|XX) = - sum_t (r^t log y^t + (1 - r^t) log(1 - y^t))$
+- $y^t = "sigmoid"(v_2 z^t_2 + v_1 z^t_1 + v_0)$
+- $z^t_1 = "ReLU"(w_(1,2)x^t_2 + w_(1,1)x^t_1 + w_(1,0))$
+- $z^t_2 = tanh(w_(2,2)x^t_2 + w_(2,1)x^t_1 + w_(2,0))$
+
+Using the convention $x_(j=1..D)$, $y_(i=1..K)$, and $z_(h=1..H)$.
+
+Solved as:
+
+- $
+    frac(diff E, diff v_h) &= sum_t frac(diff E, diff y^t) frac(diff y^t, diff v_h) \
+    &= - sum_t (r^t dot frac(1, y^t) - (1-r^t) dot frac(1, 1-y^t)) (y^t z^t_h (1-y^t)) \
+    &= - sum_t (frac(r^t, y^t) - frac(1-r^t, 1-y^t)) (y^t z^t_h (1-y^t)) \
+    &= - sum_t (frac(r^t (1-y^t)-y^t (1-r^t), cancel(y^t) (1-y^t))) (cancel(y^t) z^t_h (1-y^t)) \
+    &= - sum_t (frac(r^t - y^t, cancel(1-y^t))) (z^t_h cancel((1-y^t))) \
+    &= - sum_t (r^t - y^t) z^t_h \
+  $
+
+- $
+    frac(diff E, diff w_(1,j)) &= sum_t frac(diff E, diff y^t) frac(diff y^t, diff z^t_1) frac(diff z^t_1, diff w_(1,j)) \
+    &= - sum_t (frac(r^t, y^t) - frac(1-r^t, 1-y^t)) (y^t (1-y^t) v_1) (x^t_j cases(0 "if" ww_1 dot xx^t <0, 1 "otherwise")) \
+    &= - sum_t (r^t - y^t) v_1 x^t_j cases(0 "if" ww_1 dot xx^t <0, 1 "otherwise") \
+  $
+
+- $
+    frac(diff E, diff w_(2,j)) &= sum_t frac(diff E, diff y^t) frac(diff y^t, diff z^t_2) frac(diff z^t_2, diff w_(2,j)) \
+    &= - sum_t (r^t - y^t) v_2 x^t_j (1-tanh^2(ww_2 dot xx^t)) \
+  $
+
+= Problem 1b
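The gradients derived for Problem 1a can be sanity-checked against central finite differences. The sketch below is not part of the commit: the data, weights, and function names are made up for illustration, and index 0 of each weight vector is treated as the bias (so $z_0 = 1$ and $x_0 = 1$).

```python
import math


def pre_activations(w1, w2, x):
    """Return (a1, a2, xb): both hidden pre-activations and the biased input."""
    xb = (1.0, x[0], x[1])
    a1 = sum(w * xi for w, xi in zip(w1, xb))
    a2 = sum(w * xi for w, xi in zip(w2, xb))
    return a1, a2, xb


def forward(w1, w2, v, x):
    a1, a2, _ = pre_activations(w1, w2, x)
    z1, z2 = max(0.0, a1), math.tanh(a2)  # ReLU and tanh hidden units
    y = 1.0 / (1.0 + math.exp(-(v[0] + v[1] * z1 + v[2] * z2)))
    return z1, z2, y


def error(w1, w2, v, data):
    """Cross-entropy E = -sum_t (r log y + (1 - r) log(1 - y))."""
    total = 0.0
    for x, r in data:
        _, _, y = forward(w1, w2, v, x)
        total -= r * math.log(y) + (1 - r) * math.log(1 - y)
    return total


def analytic_grads(w1, w2, v, data):
    gv, gw1, gw2 = [0.0] * 3, [0.0] * 3, [0.0] * 3
    for x, r in data:
        a1, _, xb = pre_activations(w1, w2, x)
        z1, z2, y = forward(w1, w2, v, x)
        delta = -(r - y)
        for h, zh in enumerate((1.0, z1, z2)):
            gv[h] += delta * zh                      # dE/dv_h
        relu_grad = 0.0 if a1 < 0.0 else 1.0
        for j in range(3):
            gw1[j] += delta * v[1] * relu_grad * xb[j]        # dE/dw_(1,j)
            gw2[j] += delta * v[2] * (1.0 - z2 ** 2) * xb[j]  # dE/dw_(2,j)
    return gv, gw1, gw2


def numeric_grad(f, params, eps=1e-6):
    """Central finite-difference gradient of f at params."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


# Illustrative inputs (x1, x2) with labels r; chosen away from the ReLU kink.
data = [((0.5, -1.0), 1), ((1.5, 2.0), 0), ((-1.0, 0.5), 1)]
w1, w2, v = [0.1, 0.4, -0.3], [-0.2, 0.3, 0.5], [0.05, 0.7, -0.6]

gv, gw1, gw2 = analytic_grads(w1, w2, v, data)
num_gv = numeric_grad(lambda p: error(w1, w2, p, data), v)
num_gw1 = numeric_grad(lambda p: error(p, w2, v, data), w1)
num_gw2 = numeric_grad(lambda p: error(w1, p, v, data), w2)
```

If the analytic and numeric values agree to several decimal places, the closed-form expressions are consistent with the error function. The check is only meaningful away from the ReLU kink, where the derivative is not defined.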
--- a/assignments/hwk03/playground.ipynb
+++ b/assignments/hwk03/playground.ipynb
@@ -0,0 +1,52 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "-1/(1 - y)\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sympy import *\n",
+    "from sympy.abc import *\n",
+    "\n",
+    "print(diff(log(1-y), y))\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.6"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/assignments/hwk03/stadium.jpg
+++ b/assignments/hwk03/stadium.jpg