Michael Zhang 2023-10-22 01:04:22 -05:00
parent 5e820c4227
commit 920475fdde
5 changed files with 26 additions and 13 deletions

View file

@@ -6,14 +6,15 @@ function [] = Back_Project(training_data, test_data, n_components)
 % stack data
 data = vertcat(training_data, test_data);
-% TODO: perform PCA
+% perform PCA
 coeff = pca(data);
 % for each number of principal components
-for n = 1:length(n_components)
+for n_idx = 1:length(n_components)
+    n = n_components(n_idx);
     % TODO: perform the back projection algorithm using the first n_components(n) principal components
     W = coeff(:,1:n);
     % TODO: plot first 5 images back projected using the first
     % n_components(n) principal components
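For reference, a minimal sketch of the back-projection step the remaining TODOs describe, assuming one image per row and that `coeff` comes from MATLAB's `pca` (which centers the data internally, so the mean must be added back); the 32-by-30 reshape follows the assignment's image layout but may need transposing:

```matlab
% Back-project onto the first n principal components (a sketch, not the
% committed script): reconstruct X_hat = mu + (X - mu) * W * W'.
mu = mean(data, 1);            % pca() works on centered data
W = coeff(:, 1:n);             % first n principal components
Z = (data - mu) * W;           % project centered data down to n dims
back_projected = Z * W' + mu;  % weighted sum of components, plus mean
% show the first of the reconstructed images
imagesc(reshape(back_projected(1, :), 32, 30)'); colormap(gray);
```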

View file

@@ -25,6 +25,8 @@
c. #c[*(15 points)* Write the log likelihood function and derive $S_1$ and $S_2$ by maximum likelihood estimation of model 2. Note that since $S_1$ and $S_2$ are shared as $S$, you need to add the log likelihood functions of the two classes and maximize the sum to derive $S$.]
2. #c[*(50 points)* In this problem, you will work on dimension reduction and classification on a Faces dataset from the UCI repository. We provided the processed files `face_train_data_960.txt` and `face_test_data_960.txt` with 500 and 124 images, respectively. Each image is of size 30 #sym.times 32, stored as pixel values in one row of the file, and the last column gives the label of the image: 1 (sunglasses) or 0 (open). You can visualize the $i$th image with the following MATLAB command line:]
```matlab
@@ -35,10 +37,12 @@
a. #c[*(10 points)* Apply Principal Component Analysis (PCA) to find the principal components of the combined training and test sets. First, visualize the first 5 eigen-faces using a command line similar to the one above. This can be accomplished by completing the _TODO_ comment headers in the `Eigenfaces.m` script.]
#figure(image("images/eigenfaces.png"))
b. #c[*(20 points)* Generate a plot of the proportion of variance (see Figure 6.4 (b) in the main textbook) on the training data, and select the minimum number ($K$) of eigenvectors that explain at least 90% of the variance. Show both the plot and $K$ in the report. This can be accomplished by completing the _TODO_ headers in the `ProportionOfVariance.m` script (a minimal sketch of this cutoff computation follows this list). Project the training and test data onto the $K$ principal components and run KNN on the projected data for $k = {1, 3, 5, 7}$. Print out the error rate on the test set. Implement your own version of a K-Nearest Neighbor classifier (KNN) for this problem. Classify each test point using a majority rule, i.e., by choosing the most common class among the $k$ training points it is closest to. In the case where two classes are equally frequent, perform a tie-break by choosing whichever class has, on average, a smaller distance to the test point. This can be accomplished by completing the _TODO_ comment headers in the `KNN.m` and `KNN_Error.m` scripts.]
#figure(image("images/prop_var.png"))
#figure(image("images/prop_var.png"))
I used $K = 41$.
I used $K = 41$.
c. #c[*(20 points)* Use the first $K = {10, 50, 100}$ principal components to approximate the first five images of the training set (the first five rows of the data matrix) by projecting the centered data onto the first $K$ principal components, then "back project" (a weighted sum of the components) to the original space and add the mean back. For each $K$, plot the reconstructed images. This can be accomplished by completing the _TODO_ comment headers in the `Back_Project.m` script. Explain your observations in the report.]
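As referenced in part (b), a minimal sketch of the 90%-variance cutoff, using the fifth output of MATLAB's `pca` (the percent of variance each component explains); variable names are illustrative, not taken from `ProportionOfVariance.m`:

```matlab
% Choose the smallest K whose components explain >= 90% of the variance.
[~, ~, ~, ~, explained] = pca(training_data);  % percent per component
prop_var = cumsum(explained) / 100;            % cumulative proportion
K = find(prop_var >= 0.90, 1);                 % e.g., the K = 41 reported above
plot(prop_var);
xlabel('Number of eigenvectors'); ylabel('Proportion of variance');
```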

View file

@@ -13,7 +13,7 @@ function [test_err] = KNN(k, training_data, test_data, training_labels, test_lab
 % for each data point (row) in the test data
 for t = 1:n
-    % TODO: compute k-nearest neighbors for data point
+    % compute k-nearest neighbors for data point
     distances = pairwise_distance(:,t);
     [~, smallest_indexes] = sort(distances, 'ascend');
     smallest_k_indexes = smallest_indexes(1:k);
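This hunk reads column `t` of a `pairwise_distance` matrix built earlier in `KNN.m` (outside the diff); a minimal way to build it, assuming Euclidean distance with training points as rows and test points as columns:

```matlab
% pairwise_distance(i, j) = Euclidean distance from training point i to
% test point j (assumed layout, matching distances = pairwise_distance(:,t))
pairwise_distance = pdist2(training_data, test_data);
n = size(test_data, 1);   % test points iterated over in the loop above
preds = zeros(n, 1);      % predictions filled in per test point
```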
@@ -25,20 +25,20 @@ function [test_err] = KNN(k, training_data, test_data, training_labels, test_lab
         distances_by_class(i,1) = class;
         distances_by_class(i,2) = mean(this_class_distances);
     end
-    distances_by_class_table = array2table(distances_by_class);
-    % TODO: classify test point using majority rule. Include tie-breaking
+    % classify test point using majority rule. Include tie-breaking
     % using whichever class is closer by distance. Fill in preds with the
     % predicted label.
     smallest_k_labels = training_labels(smallest_k_indexes);
     % Try to resolve ties
     labels_by_count = tabulate(smallest_k_labels);
     labels_by_count_sorted = sortrows(labels_by_count, 2, 'descend');
     most_frequent_label = labels_by_count_sorted(1,:);
     most_frequent_label_count = most_frequent_label(2);
     labels_that_have_most_frequent_count = labels_by_count_sorted(labels_by_count_sorted(:,2) == most_frequent_label_count, 1);
     if length(labels_that_have_most_frequent_count) > 1
-        common_indexes = find(ismember(distances_by_class, labels_that_have_most_frequent_count));
+        common_indexes = ismember(distances_by_class(:,1), labels_that_have_most_frequent_count);
         common_distances = distances_by_class(common_indexes,:);
         sorted_distances = sortrows(common_distances,2);
         preds(t) = sorted_distances(1,1);
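The hunk is truncated before the no-tie branch; for clarity, a compact standalone sketch of the full vote-plus-tie-break rule the comments describe (variable names follow the code above, but this is an illustration, not the committed file):

```matlab
% Majority vote over the k nearest labels; break ties by mean distance.
counts = tabulate(smallest_k_labels);                      % [label count pct]
tied_labels = counts(counts(:,2) == max(counts(:,2)), 1);  % modal label(s)
if numel(tied_labels) > 1
    % tie: pick the tied class with the smallest mean distance to the point
    rows = ismember(distances_by_class(:,1), tied_labels);
    closest_first = sortrows(distances_by_class(rows, :), 2);
    preds(t) = closest_first(1, 1);
else
    preds(t) = tied_labels;
end
```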

View file

@@ -3,17 +3,25 @@
 function [] = KNN_Error(neigenvectors, ks, training_data, test_data, training_labels, test_labels)
 % perform PCA
 data = vertcat(training_data, test_data);
 coeff = pca(data);
-% TODO: project data using the number of eigenvectors defined by neigenvectors
+% project data using the number of eigenvectors defined by neigenvectors
+eigenvectors = coeff(:,1:neigenvectors);
+projected_data = data * eigenvectors;
+% split matrix back out
+training_rows = size(training_data, 1);
+projected_training_data = projected_data(1:training_rows,:);
+projected_test_data = projected_data(training_rows+1:end,:);
-% TODO: compute test error for kNN with differents k's. Fill in
+% compute test error for kNN with different k's. Fill in
 % test_errors with the results for each k in ks.
 test_errors = zeros(1,length(ks));
 for i = 1:length(ks)
     k = ks(i);
-    test_errors(i) = KNN(k, training_data, test_data, training_labels, test_labels);
+    test_errors(i) = KNN(k, projected_training_data, projected_test_data, training_labels, test_labels);
 end
 % print error table
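A hypothetical driver for `KNN_Error`, assuming the data layout from the problem statement (960 pixel columns followed by a label column) and the $K = 41$ reported above:

```matlab
% Load the assignment's files: each row is 960 pixels plus a label column.
train = load('face_train_data_960.txt');
test  = load('face_test_data_960.txt');
% Project onto K = 41 eigenvectors; evaluate kNN for k in {1, 3, 5, 7}.
KNN_Error(41, [1 3 5 7], train(:, 1:960), test(:, 1:960), ...
    train(:, end), test(:, end));
```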

Binary file not shown (new image added, 147 KiB).