Michael Zhang 2023-10-22 01:04:22 -05:00
parent 5e820c4227
commit 920475fdde
5 changed files with 26 additions and 13 deletions

Back_Project.m

@@ -6,14 +6,15 @@ function [] = Back_Project(training_data, test_data, n_components)
     % stack data
     data = vertcat(training_data, test_data);
-    % TODO: perform PCA
+    % perform PCA
+    coeff = pca(data);
     % for each number of principal components
-    for n = 1:length(n_components)
+    for n_idx = 1:length(n_components)
+        n = n_components(n_idx);
         % TODO: perform the back projection algorithm using the first n_components(n) principal components
+        W = coeff(:,1:n);
         % TODO: plot first 5 images back projected using the first
         % n_components(n) principal components
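For reference, the remaining TODO asks for a PCA back-projection: project each image onto the first n principal directions, then map back to pixel space. Below is a minimal standalone sketch of that step, assuming `data` and `coeff` as in the hunk above; the `mu` output, the example n, and the reshape orientation are illustrative assumptions, not code from the script.

```matlab
% Sketch of PCA back-projection (illustrative, not the graded script).
% pca() centers the data internally, so the mean must be added back.
[coeff, score, ~, ~, ~, mu] = pca(data);   % data: one image per row
n = 10;                                    % example number of components
W = coeff(:, 1:n);                         % first n principal directions
projected = (data - mu) * W;               % same as score(:, 1:n)
back_projected = projected * W' + mu;      % reconstructed images, same size as data
% Visualize one reconstructed 30x32 face (reshape orientation depends on
% how the pixels were flattened into rows):
imagesc(reshape(back_projected(1, :), 32, 30)'); colormap(gray);
```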

Homework write-up (Typst)

@@ -25,6 +25,8 @@
 c. #c[*(15 points)* Write the log likelihood function and derive $S_1$ and $S_2$ by maximum likelihood estimation of model 2. Note that since $S_1$ and $S_2$ are shared as $S$, you need to add the log likelihood functions of the two classes and maximize the sum when deriving $S$.]
 2. #c[*(50 points)* In this problem, you will work on dimension reduction and classification on a Faces dataset from the UCI repository. We provided the processed files `face_train_data_960.txt` and `face_test_data_960.txt` with 500 and 124 images, respectively. Each image is of size 30 #sym.times 32, with its pixel values stored as a row in the files, and the last column identifies the label of the image: 1 (sunglasses) or 0 (open). You can visualize the $i$th image with the following matlab command line:]
 ```matlab
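For problem 1c, the hint amounts to summing the two per-class log likelihoods and maximizing the sum over the single shared covariance. As a reference point only (a sketch of the standard pooled-covariance result, not the requested derivation), the stationary condition gives:

```latex
% Joint log likelihood with a shared covariance S (x^t in class C_i with mean m_i):
\log L(S) = -\frac{Nd}{2}\log 2\pi - \frac{N}{2}\log|S|
            - \frac{1}{2}\sum_{i=1}^{2}\sum_{t \in C_i}(x^t - m_i)^\top S^{-1}(x^t - m_i)
% Setting the derivative with respect to S to zero yields the pooled estimate:
\hat{S} = \frac{1}{N}\sum_{i=1}^{2}\sum_{t \in C_i}(x^t - m_i)(x^t - m_i)^\top
        = \frac{N_1}{N}S_1 + \frac{N_2}{N}S_2
```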
@@ -35,6 +37,8 @@
 a. #c[*(10 points)* Apply Principal Component Analysis (PCA) to find the principal components with the combined training and test sets. First, visualize the first 5 eigen-faces using a similar command line as above. This can be accomplished by completing the _TODO_ comment headers in the `Eigenfaces.m` script.]
+#figure(image("images/eigenfaces.png"))
 b. #c[*(20 points)* Generate a plot of the proportion of variance (see Figure 6.4 (b) in the main textbook) on the training data, and select the minimum number ($K$) of eigenvectors that explain at least 90% of the variance. Show both the plot and $K$ in the report. This can be accomplished by completing the _TODO_ headers in the `ProportionOfVariance.m` script. Project the training and test data onto the $K$ principal components and run KNN on the projected data for $k = {1, 3, 5, 7}$. Print out the error rate on the test set. Implement your own K-Nearest Neighbor (KNN) classifier for this problem. Classify each test point using a majority rule, i.e., by choosing the most common class among the $k$ training points it is closest to. In the case where two classes are equally frequent, perform a tie-breaker by choosing whichever class has on average a smaller distance to the test point. This can be accomplished by completing the _TODO_ comment headers in the `KNN.m` and `KNN_Error.m` scripts.]
 #figure(image("images/prop_var.png"))
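For part (b), the proportion of variance and the 90% cutoff can be read from `pca`'s `explained` output. A minimal sketch follows; it assumes `training_data` holds one image per row with the label column already removed, which may differ from the actual `ProportionOfVariance.m` interface.

```matlab
% Proportion of variance on the training data and the minimum K covering 90%.
[~, ~, ~, ~, explained] = pca(training_data);   % percent variance per component
prop_var = cumsum(explained) / 100;             % cumulative proportion in [0, 1]
K = find(prop_var >= 0.9, 1);                   % smallest K explaining >= 90%
plot(prop_var);
xlabel('Number of eigenvectors'); ylabel('Proportion of variance explained');
fprintf('K = %d eigenvectors explain %.1f%% of the variance\n', K, 100 * prop_var(K));
```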

KNN.m

@@ -13,7 +13,7 @@ function [test_err] = KNN(k, training_data, test_data, training_labels, test_labels)
     % for each data point (row) in the test data
     for t = 1:n
-        % TODO: compute k-nearest neighbors for data point
+        % compute k-nearest neighbors for data point
         distances = pairwise_distance(:,t);
         [~, smallest_indexes] = sort(distances, 'ascend');
         smallest_k_indexes = smallest_indexes(1:k);
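The `pairwise_distance` matrix indexed above is built outside this hunk. One plausible construction, inferred only from how `pairwise_distance(:,t)` is used (so the helper call and the orientation are assumptions, not code from KNN.m), is a training-by-test Euclidean distance matrix:

```matlab
% Hypothetical setup matching the usage above: row i, column t holds the
% Euclidean distance from training point i to test point t.
pairwise_distance = pdist2(training_data, test_data);   % n_train x n_test
n = size(test_data, 1);                                  % number of test points
preds = zeros(n, 1);                                     % predictions filled in the loop
```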
@@ -25,20 +25,20 @@ function [test_err] = KNN(k, training_data, test_data, training_labels, test_labels)
             distances_by_class(i,1) = class;
             distances_by_class(i,2) = mean(this_class_distances);
         end
-        distances_by_class_table = array2table(distances_by_class);
-        % TODO: classify test point using majority rule. Include tie-breaking
+        % classify test point using majority rule. Include tie-breaking
         % using whichever class is closer by distance. Fill in preds with the
         % predicted label.
         smallest_k_labels = training_labels(smallest_k_indexes);
+        % Try to resolve ties
         labels_by_count = tabulate(smallest_k_labels);
         labels_by_count_sorted = sortrows(labels_by_count, 2);
         most_frequent_label = labels_by_count_sorted(1,:);
         most_frequent_label_count = most_frequent_label(2);
         labels_that_have_most_frequent_count = labels_by_count_sorted(labels_by_count_sorted(:,2) == most_frequent_label_count,1);
         if length(labels_that_have_most_frequent_count) > 1
-            common_indexes = find(ismember(distances_by_class, labels_that_have_most_frequent_count));
+            common_indexes = ismember(distances_by_class, labels_that_have_most_frequent_count);
             common_distances = distances_by_class(common_indexes,:);
             sorted_distances = sortrows(common_distances,2);
             preds(t) = sorted_distances(1,1);
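As a reference for the rule described in the comments, here is a compact standalone sketch of the majority vote with the mean-distance tie-break. It is illustrative only: it reuses the loop's variable names (`distances`, `smallest_k_indexes`, `preds`, `t`) but is not the code committed in KNN.m.

```matlab
% Majority vote over the k nearest labels; ties go to the class with the
% smaller mean distance to the test point (illustrative sketch).
smallest_k_labels = training_labels(smallest_k_indexes);   % labels of the k nearest
classes = unique(smallest_k_labels);
votes = zeros(length(classes), 1);
mean_dist = zeros(length(classes), 1);
for c = 1:length(classes)
    in_class = (smallest_k_labels == classes(c));
    votes(c) = sum(in_class);
    mean_dist(c) = mean(distances(smallest_k_indexes(in_class)));
end
tied = (votes == max(votes));            % classes tied for the most votes
if sum(tied) == 1
    preds(t) = classes(tied);
else
    tied_classes = classes(tied);
    [~, best] = min(mean_dist(tied));    % break the tie by smaller mean distance
    preds(t) = tied_classes(best);
end
```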

KNN_Error.m

@@ -3,17 +3,25 @@
 function [] = KNN_Error(neigenvectors, ks, training_data, test_data, training_labels, test_labels)
     % perform PCA
+    data = vertcat(training_data, test_data);
+    coeff = pca(data);
+    % project data using the number of eigenvectors defined by neigenvectors
+    eigenvectors = coeff(:,1:neigenvectors);
+    projected_data = data * eigenvectors;
-    % TODO: project data using the number of eigenvectors defined by neigenvectors
+    % split matrix back out
+    training_rows = size(training_data, 1);
+    projected_training_data = projected_data(1:training_rows,:);
+    projected_test_data = projected_data(training_rows+1:end,:);
-    % TODO: compute test error for kNN with different k's. Fill in
+    % compute test error for kNN with different k's. Fill in
     % test_errors with the results for each k in ks.
     test_errors = zeros(1,length(ks));
     for i = 1:length(ks)
         k = ks(i);
-        test_errors(i) = KNN(k, training_data, test_data, training_labels, test_labels);
+        test_errors(i) = KNN(k, projected_training_data, projected_test_data, training_labels, test_labels);
     end
     % print error table
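A side note on the projection in this hunk (an observation about MATLAB's `pca`, not part of the committed code): `pca` centers the data before computing the coefficients, so `data * eigenvectors` differs from the principal-component scores only by the constant offset `mean(data) * eigenvectors`. Every row shifts by the same amount, so nearest-neighbor distances in the projected space are unchanged. The centered projection can also be taken directly from the second output:

```matlab
% Equivalent centered projection using pca's score output (sketch).
[coeff, score] = pca(data);                           % score = (data - mean(data)) * coeff
projected_data_centered = score(:, 1:neigenvectors);  % same pairwise distances as data * coeff(:, 1:neigenvectors)
```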

New binary image added (147 KiB), presumably images/eigenfaces.png, the figure referenced in the write-up above.