idgi
parent 5e820c4227
commit 920475fdde
5 changed files with 26 additions and 13 deletions
@@ -6,14 +6,15 @@ function [] = Back_Project(training_data, test_data, n_components)
     % stack data
     data = vertcat(training_data, test_data);

-    % TODO: perform PCA
+    % perform PCA
     coeff = pca(data);

     % for each number of principal components
-    for n = 1:length(n_components)
+    for n_idx = 1:length(n_components)
+        n = n_components(n_idx);

         % TODO: perform the back projection algorithm using the first n_components(n) principal components
         W = coeff(:,1:n);

         % TODO: plot first 5 images back projected using the first
         % n_components(n) principal components
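For reference, a minimal sketch of how the remaining TODOs in `Back_Project.m` could be completed (not the committed solution; it reuses `data` and `W` from the hunk above, and the centering, reshape orientation, and plotting layout are assumptions based on the assignment text further down):

```matlab
% Illustrative completion of the back-projection TODOs: center the data,
% project onto the first n principal components, back-project, add the mean.
mu = mean(data, 1);                 % mean image over the stacked data
centered = data - mu;               % implicit expansion (R2016b and later)
proj = centered * W;                % N x n scores
recon = proj * W' + mu;             % weighted sum of components, plus mean

% plot the first 5 back-projected images
figure;
for img = 1:5
    subplot(1, 5, img);
    % assumes 30 x 32 images stored row-wise; orientation may need a transpose
    imagesc(reshape(recon(img, :), 32, 30)');
    colormap gray; axis image off;
end
```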
@@ -25,6 +25,8 @@
c. #c[*(15 points)* Write the log likelihood function and derive $S_1$ and $S_2$ by maximum likelihood estimation of model 2. Note that since $S_1$ and $S_2$ are shared as $S$, you need to add the log likelihood functions of the two classes and maximize the sum to derive $S$.]
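As a reference for item c., a sketch of the derivation (an editorial sketch, not part of the assignment; it assumes Alpaydin-style notation with class means m_i, class priors P(C_i), and a shared covariance S): summing the two classes' log likelihoods and maximizing over S gives a pooled estimate.

```latex
% Summed log likelihood under a shared covariance S (sketch):
\mathcal{L}(S) = -\frac{Nd}{2}\log 2\pi - \frac{N}{2}\log\lvert S\rvert
  - \frac{1}{2}\sum_{i=1}^{2}\sum_{t \in C_i} (x^t - m_i)^\top S^{-1} (x^t - m_i)

% Setting the gradient with respect to S^{-1} to zero yields
\hat{S} = \frac{1}{N}\sum_{i=1}^{2}\sum_{t \in C_i} (x^t - m_i)(x^t - m_i)^\top
        = \hat{P}(C_1)\,\hat{S}_1 + \hat{P}(C_2)\,\hat{S}_2
```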
2. #c[*(50 points)* In this problem, you will work on dimension reduction and classification on a Faces dataset from the UCI repository. We provide the processed files `face_train_data_960.txt` and `face_test_data_960.txt` with 500 and 124 images, respectively. Each image is of size 30 #sym.times 32, with its pixel values stored in one row of the file; the last column gives the image's label: 1 (sunglasses) or 0 (open). You can visualize the $i$th image with the following MATLAB command line:]

```matlab
@@ -35,10 +37,12 @@
a. #c[*(10 points)* Apply Principal Component Analysis (PCA) to find the principal components with the combined training and test sets. First, visualize the first 5 eigenfaces using a similar command line as above. This can be accomplished by completing the _TODO_ comment headers in the `Eigenfaces.m` script.]

#figure(image("images/eigenfaces.png"))

b. #c[*(20 points)* Generate a plot of the proportion of variance (see Figure 6.4 (b) in the main textbook) on the training data, and select the minimum number ($K$) of eigenvectors that explain at least 90% of the variance. Show both the plot and $K$ in the report. This can be accomplished by completing the _TODO_ headers in the `ProportionOfVariance.m` script. Project the training and test data onto the $K$ principal components and run KNN on the projected data for $k = {1, 3, 5, 7}$. Print out the error rate on the test set. Implement your own version of a K-Nearest Neighbor classifier (KNN) for this problem. Classify each test point using a majority rule, i.e., by choosing the most common class among the $k$ training points it is closest to. In the case where two classes are equally frequent, perform a tie-breaker by choosing whichever class has, on average, a smaller distance to the test point. This can be accomplished by completing the _TODO_ comment headers in the `KNN.m` and `KNN_Error.m` scripts.]

#figure(image("images/prop_var.png"))

I used $K = 41$.

c. #c[*(20 points)* Use the first $K = {10, 50, 100}$ principal components to approximate the first five images of the training set (the first five rows of the data matrix) by projecting the centered data onto the first $K$ principal components, then "back project" (weighted sum of the components) to the original space and add the mean. For each $K$, plot the reconstructed image. This can be accomplished by completing the _TODO_ comment headers in the `Back_Project.m` script. Explain your observations in the report.]
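For part b., a minimal sketch of the proportion-of-variance computation (illustrative, not the committed `ProportionOfVariance.m`; assumes `training_data` is in scope):

```matlab
% Cumulative proportion of variance explained, and the smallest K
% reaching 90% (sketch).
[~, ~, latent] = pca(training_data);      % eigenvalues in descending order
prop_var = cumsum(latent) / sum(latent);  % proportion of variance explained
K = find(prop_var >= 0.90, 1);            % minimum K covering at least 90%

plot(prop_var);
xlabel('Number of eigenvectors');
ylabel('Proportion of variance explained');
```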
@@ -13,7 +13,7 @@ function [test_err] = KNN(k, training_data, test_data, training_labels, test_labels)
     % for each data point (row) in the test data
     for t = 1:n
-        % TODO: compute k-nearest neighbors for data point
+        % compute k-nearest neighbors for data point
         distances = pairwise_distance(:,t);
         [~, smallest_indexes] = sort(distances, 'ascend');
         smallest_k_indexes = smallest_indexes(1:k);
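`pairwise_distance` is computed above this hunk; a sketch of one way it could be formed (assumes `pdist2` from the Statistics and Machine Learning Toolbox, the same toolbox that provides `pca` and `tabulate`):

```matlab
% Euclidean distances between every training point and every test point;
% column t holds the distances used when classifying test point t.
pairwise_distance = pdist2(training_data, test_data);
```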
@@ -25,20 +25,20 @@ function [test_err] = KNN(k, training_data, test_data, training_labels, test_labels)
             distances_by_class(i,1) = class;
             distances_by_class(i,2) = mean(this_class_distances);
         end
         distances_by_class_table = array2table(distances_by_class);

-        % TODO: classify test point using majority rule. Include tie-breaking
+        % classify test point using majority rule. Include tie-breaking
         % using whichever class is closer by distance. Fill in preds with the
         % predicted label.
         smallest_k_labels = training_labels(smallest_k_indexes);

         % Try to resolve ties
         labels_by_count = tabulate(smallest_k_labels);
         labels_by_count_sorted = sortrows(labels_by_count, 2, 'descend');  % most frequent first
         most_frequent_label = labels_by_count_sorted(1,:);
         most_frequent_label_count = most_frequent_label(2);
         labels_that_have_most_frequent_count = labels_by_count_sorted(labels_by_count_sorted(:,2) == most_frequent_label_count,1);
         if length(labels_that_have_most_frequent_count) > 1
-            common_indexes = find(ismember(distances_by_class, labels_that_have_most_frequent_count));
+            common_indexes = ismember(distances_by_class(:,1), labels_that_have_most_frequent_count);
             common_distances = distances_by_class(common_indexes,:);
             sorted_distances = sortrows(common_distances,2);
             preds(t) = sorted_distances(1,1);
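The tie-breaking above could also be factored into a small helper; a self-contained sketch (the function name and signature are hypothetical, not part of the commit):

```matlab
function label = majority_vote(neighbor_labels, neighbor_distances)
% Majority vote over the k nearest neighbors, breaking ties by choosing
% the class with the smaller mean distance to the test point.
classes = unique(neighbor_labels);
votes = arrayfun(@(c) sum(neighbor_labels == c), classes);
winners = classes(votes == max(votes));
if numel(winners) > 1
    mean_dists = arrayfun(@(c) mean(neighbor_distances(neighbor_labels == c)), winners);
    [~, idx] = min(mean_dists);
    winners = winners(idx);
end
label = winners(1);
end
```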
@@ -3,17 +3,25 @@
function [] = KNN_Error(neigenvectors, ks, training_data, test_data, training_labels, test_labels)

+    % perform PCA
+    data = vertcat(training_data, test_data);
+    coeff = pca(data);

+    % project data using the number of eigenvectors defined by neigenvectors
+    eigenvectors = coeff(:,1:neigenvectors);
+    projected_data = data * eigenvectors;

-    % TODO: project data using the number of eigenvectors defined by neigenvectors
+    % split matrix back out
+    training_rows = size(training_data, 1);
+    projected_training_data = projected_data(1:training_rows,:);
+    projected_test_data = projected_data(training_rows+1:end,:);

-    % TODO: compute test error for kNN with differents k's. Fill in
+    % compute test error for kNN with different k's. Fill in
     % test_errors with the results for each k in ks.
     test_errors = zeros(1,length(ks));
     for i = 1:length(ks)
         k = ks(i);

-        test_errors(i) = KNN(k, training_data, test_data, training_labels, test_labels);
+        test_errors(i) = KNN(k, projected_training_data, projected_test_data, training_labels, test_labels);
     end

     % print error table
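A sketch of the final `% print error table` step (assumes `ks` and `test_errors` as defined above):

```matlab
% Display the test error for each k as a two-column table.
results = array2table([ks(:), test_errors(:)], ...
    'VariableNames', {'k', 'test_error'});
disp(results);
```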
BIN  assignments/hwk02/images/eigenfaces.png  (new file, 147 KiB; binary file not shown)