upd

2023-10-25 08:58:03 -05:00 · 301c76e4a1
commit 301c76e4a1
parent 0fc9d8a01f
1 changed file with 21 additions and 12 deletions
--- a/assignments/hwk02/HW2.typ
+++ b/assignments/hwk02/HW2.typ
@@ -22,22 +22,31 @@
  a. #c[*(30 points)* Implement all three models and test your program on the three pairs of training data and test data. The main script function, Problem 1 (training data file, test data file), is given, and this script should not be modified. There are 3 scripts that need to be completed for Problem 1 (`Error_Rate.m`, `Param_Est.m`, `Classify.m`). The _TODO_: comment headers must be completed in all 3 of these files. These _TODO_ headers describe exactly what code needs to be written to obtain full credit. The script `Error_Rate.m` calculates the error rate. `Param_Est.m` estimates the parameters of each multivariate Gaussian distribution under the 3 different models. `Classify.m` classifies the test data using the learned models. For each test dataset, the problem calls several functions and prints the training error rate and test error rate of each model to the MATLAB command window.]

  ```
-  >> Problem1('training_data1.txt', 'test_data1.txt')
-  Model 1: (train err = 28.0% 28.0% 100.0% ), (test error = 19.0% 19.0% 100.0% )
-  Model 2: (train err = 27.0% 27.0% 100.0% ), (test error = 25.0% 25.0% 100.0% )
-  Model 3: (train err = 29.0% 29.0% 100.0% ), (test error = 26.0% 26.0% 100.0% )
-  >> Problem1('training_data2.txt', 'test_data2.txt')
-  Model 1: (train err = 29.0% 29.0% 100.0% ), (test error = 19.0% 19.0% 100.0% )
-  Model 2: (train err = 15.0% 15.0% 100.0% ), (test error = 15.0% 15.0% 100.0% )
-  Model 3: (train err = 27.0% 27.0% 100.0% ), (test error = 22.0% 22.0% 100.0% )
-  >> Problem1('training_data3.txt', 'test_data3.txt')
-  Model 1: (train err = 30.0% 30.0% 100.0% ), (test error = 21.0% 21.0% 100.0% )
-  Model 2: (train err = 0.0% 0.0% 100.0% ), (test error = 0.0% 0.0% 100.0% )
-  Model 3: (train err = 30.0% 30.0% 100.0% ), (test error = 28.0% 28.0% 100.0% )
+  >> AllProblem1
+  Dataset 1:
+  Model 1: (train err = 5% ), (test error = 20% )
+  Model 2: (train err = 6% ), (test error = 17% )
+  Model 3: (train err = 7% ), (test error = 18% )
+
+  Dataset 2:
+  Model 1: (train err = 7% ), (test error = 23% )
+  Model 2: (train err = 14% ), (test error = 56% )
+  Model 3: (train err = 13% ), (test error = 53% )
+
+  Dataset 3:
+  Model 1: (train err = 1% ), (test error = 12% )
+  Model 2: (train err = 19% ), (test error = 45% )
+  Model 3: (train err = 2% ), (test error = 5% )
  ```

  b. #c[*(5 points)* State which model works best on each test data set and explain why you believe this is the case. Discuss your observations.]

+  Judged by test error, each dataset has a different best-performing model:
+
+  - For dataset 1, model 2 worked the best.
+  - For dataset 2, model 1 worked the best.
+  - For dataset 3, model 3 worked the best.
+
  c. #c[*(15 points)* Write the log-likelihood function and derive $S_1$ and $S_2$ by maximum likelihood estimation for model 2. Note that since $S_1$ and $S_2$ are shared as $S$, you need to add the log-likelihood functions of the two classes and maximize their sum to derive $S$.]

  The maximum likelihood of a single
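
The ranking given in part b of the hunk above can be checked directly against the reported numbers. The sketch below is an illustration in Python, not part of the assignment's MATLAB code. The `test_error` table is copied from the `AllProblem1` output in the diff, and `best_model` is a hypothetical helper name.

```python
# Test error rates (percent) copied from the AllProblem1 output in the diff.
# Outer key: dataset number. Inner key: model number.
test_error = {
    1: {1: 20, 2: 17, 3: 18},
    2: {1: 23, 2: 56, 3: 53},
    3: {1: 12, 2: 45, 3: 5},
}


def best_model(errors_by_model):
    """Return the model number with the lowest test error (hypothetical helper)."""
    return min(errors_by_model, key=errors_by_model.get)


# Pick the winner per dataset and report it alongside its test error.
best = {dataset: best_model(errs) for dataset, errs in test_error.items()}
for dataset, model in best.items():
    print(f"Dataset {dataset}: model {model} ({test_error[dataset][model]}% test error)")
```

On these numbers the margin is small for dataset 1 (17% against 18% and 20%) and large for datasets 2 and 3, which may be worth noting in the discussion.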
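
For part c, the standard result for two Gaussian classes that share one covariance matrix is that the maximum likelihood estimate is the pooled scatter about each class's own mean, $S = (N_1 S_1 + N_2 S_2) / N$, where $N_1$ and $N_2$ are the class sample counts and $N = N_1 + N_2$. The Python sketch below computes that estimate numerically. It assumes model 2 is exactly this shared-covariance model, and the function names are hypothetical and not taken from `Param_Est.m`.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length samples."""
    n = len(rows)
    d = len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(d)]


def scatter_matrix(rows, mean):
    """Sum of outer products (x - mean)(x - mean)^T over all samples."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                s[a][b] += diff[a] * diff[b]
    return s


def shared_covariance(class1, class2):
    """Pooled ML covariance: per-class scatters summed, divided by total N."""
    n = len(class1) + len(class2)
    d = len(class1[0])
    total = [[0.0] * d for _ in range(d)]
    for rows in (class1, class2):
        sc = scatter_matrix(rows, mean_vector(rows))
        for a in range(d):
            for b in range(d):
                total[a][b] += sc[a][b]
    return [[total[a][b] / n for b in range(d)] for a in range(d)]


# Tiny made-up example: two samples per class, two features.
S = shared_covariance([[0, 0], [2, 0]], [[0, 1], [0, 3]])
print(S)
```

Each class is centered on its own mean before the scatters are summed, which is what distinguishes the pooled estimate from the covariance of the combined data.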