csci5451/assignments/01/report.md

---
geometry: margin=2cm
output: pdf_document
title: CSCI 5451 Assignment 1
date: \today

author: |
  | Michael Zhang \<zhan4854@umn.edu\> $\cdot$ ID: 5289259
---

1.  _A short description of how you went about parallelizing the classification algorithm. You should include how you decomposed the problem and why, i.e., what were the tasks being parallelized._

    The parallelization I used was incredibly simple, primarily just parallelizing outer iterations. For the OpenMP version I also went ahead and parallelized by the rows, but only if there were more cores than the number of dimensions, as a simple heuristic.

    The reason I didn't go further was that further breaking down of the for loops incurred more overhead from managing the parallelization than was actually gained. I have run this several times and the gains were either neglient, or it actually ran slower than the serial version.

    Some other optimizations I did are:

    - inlined most of the calculations to require as few loops as possible
    - moved all allocations to the top level
    - arranged my data buffer in column-major order instead since the iteration pattern was by dimension rather than by row

2.  _Timing results for 1, 2, 4, 8, and 16 threads for the classification. You should include results with outer iterations set to 10._

    Run on my local machine:

    ```
    ./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 1
    Program time (compute): 0.0069s
    ./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 2
    Program time (compute): 0.0027s
    ./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 4
    Program time (compute): 0.0027s
    ./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 8
    Program time (compute): 0.0033s
    ./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 16
    Program time (compute): 0.0031s
    ./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 1
    Program time (compute): 21.5287s
    ./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 2
    Program time (compute): 10.6175s
    ./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 4
    Program time (compute): 5.2198s
    ./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 8
    Program time (compute): 4.5690s
    ./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 16
    Program time (compute): 3.6433s
    ./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 1
    Program time (compute): 0.0033s
    ./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 2
    Program time (compute): 0.0017s
    ./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 4
    Program time (compute): 0.0011s
    ./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 8
    Program time (compute): 0.0020s
    ./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 16
    Program time (compute): 0.0032s
    ./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 1
    Program time (compute): 21.7196s
    ./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 2
    Program time (compute): 10.4035s
    ./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 4
    Program time (compute): 5.2449s
    ./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 8
    Program time (compute): 4.1550s
    ./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 16
    Program time (compute): 3.5328s
    ```

    Run on `csel-plate` (slower clock speed but significantly more cores):

    ```
    ./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 1
    Program time (compute): 0.0519s
    ./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 2
    Program time (compute): 0.0288s
    ./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 4
    Program time (compute): 0.0248s
    ./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 8
    Program time (compute): 0.0335s
    ./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 16
    Program time (compute): 0.0299s
    ./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 1
    Program time (compute): 739.2866s
    ./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 2
    Program time (compute): 375.7334s
    ./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 4
    Program time (compute): 187.6661s
    ./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 8
    Program time (compute): 93.6721s
    ./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 16
    Program time (compute): 46.9217s
    ./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 1
    Program time (compute): 0.0298s
    ./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 2
    Program time (compute): 0.0163s
    ./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 4
    Program time (compute): 0.0122s
    ./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 8
    Program time (compute): 0.0108s
    ./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 16
    Program time (compute): 0.0099s
    ./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 1
    Program time (compute): 730.4170s
    ./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 2
    Program time (compute): 375.2444s
    ./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 4
    Program time (compute): 187.1316s
    ./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 8
    Program time (compute): 93.7702s
    ./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 16
    Program time (compute): 46.7320s
    ```

    This data was generated using the `run_benchmark.sh > out.txt` script.

## NOTES

I noticed that the loss sometimes fluctuates rather wildly. I think this is because there's no fixed learning rate, so instead of going incrementally, we kind of just take each dimension's minimum and haphazardly combine them together. In Wikipedia's description[^1] of the algorithm, they take the $w_i$ in particular that results in the minimal loss _by itself_, and then only use that $w_i$ for that outer iteration. I'm wondering if this will produce a better convergence, but I tried to stick to implementing the algorithm described in the pdf for the sake of the assignment since I'm guessing the effectiveness of the machine learning model isn't the important thing here.

[^1]: https://en.wikipedia.org/wiki/Coordinate_descent

Also, there's a part in the end of the program that performs validation on the trained model by using a train/test data set split. I didn't count this towards execution time but felt that it was important enough to keep since it ensured that my program was still behaving correctly.
polish 2023-10-09 09:17:14 +00:00			`---`
			`geometry: margin=2cm`
			`output: pdf_document`
			`title: CSCI 5451 Assignment 1`
			`date: \today`

			`author: \|`
			`\| Michael Zhang \<zhan4854@umn.edu\> $\cdot$ ID: 5289259`
			`---`

			`1. _A short description of how you went about parallelizing the classification algorithm. You should include how you decomposed the problem and why, i.e., what were the tasks being parallelized._`

report 2023-10-09 13:38:30 +00:00			`The parallelization I used was incredibly simple, primarily just parallelizing outer iterations. For the OpenMP version I also went ahead and parallelized by the rows, but only if there were more cores than the number of dimensions, as a simple heuristic.`
polish 2023-10-09 09:17:14 +00:00
			`The reason I didn't go further was that further breaking down of the for loops incurred more overhead from managing the parallelization than was actually gained. I have run this several times and the gains were either neglient, or it actually ran slower than the serial version.`

report 2023-10-09 13:38:30 +00:00			`Some other optimizations I did are:`

			`- inlined most of the calculations to require as few loops as possible`
			`- moved all allocations to the top level`
			`- arranged my data buffer in column-major order instead since the iteration pattern was by dimension rather than by row`
polish 2023-10-09 09:17:14 +00:00
			`2. _Timing results for 1, 2, 4, 8, and 16 threads for the classification. You should include results with outer iterations set to 10._`

report 2023-10-09 13:38:30 +00:00			`Run on my local machine:`

polish 2023-10-09 09:17:14 +00:00			```
			`./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 1`
			`Program time (compute): 0.0069s`
			`./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 2`
			`Program time (compute): 0.0027s`
			`./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 4`
			`Program time (compute): 0.0027s`
			`./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 8`
			`Program time (compute): 0.0033s`
			`./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 16`
			`Program time (compute): 0.0031s`
			`./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 1`
			`Program time (compute): 21.5287s`
			`./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 2`
			`Program time (compute): 10.6175s`
			`./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 4`
			`Program time (compute): 5.2198s`
			`./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 8`
			`Program time (compute): 4.5690s`
			`./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 16`
			`Program time (compute): 3.6433s`
			`./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 1`
			`Program time (compute): 0.0033s`
			`./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 2`
			`Program time (compute): 0.0017s`
			`./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 4`
			`Program time (compute): 0.0011s`
			`./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 8`
			`Program time (compute): 0.0020s`
			`./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 16`
			`Program time (compute): 0.0032s`
			`./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 1`
			`Program time (compute): 21.7196s`
			`./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 2`
			`Program time (compute): 10.4035s`
			`./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 4`
			`Program time (compute): 5.2449s`
			`./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 8`
			`Program time (compute): 4.1550s`
			`./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 16`
			`Program time (compute): 3.5328s`
			```

report 2023-10-09 13:38:30 +00:00			Run on `csel-plate` (slower clock speed but significantly more cores):

			```
			`./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 1`
			`Program time (compute): 0.0519s`
			`./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 2`
			`Program time (compute): 0.0288s`
			`./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 4`
			`Program time (compute): 0.0248s`
			`./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 8`
			`Program time (compute): 0.0335s`
			`./lc_pthreads ./dataset/small_data.csv ./dataset/small_data.csv 10 16`
			`Program time (compute): 0.0299s`
			`./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 1`
			`Program time (compute): 739.2866s`
			`./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 2`
			`Program time (compute): 375.7334s`
			`./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 4`
			`Program time (compute): 187.6661s`
			`./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 8`
			`Program time (compute): 93.6721s`
			`./lc_pthreads ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 16`
			`Program time (compute): 46.9217s`
			`./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 1`
			`Program time (compute): 0.0298s`
			`./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 2`
			`Program time (compute): 0.0163s`
			`./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 4`
			`Program time (compute): 0.0122s`
			`./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 8`
			`Program time (compute): 0.0108s`
			`./lc_openmp ./dataset/small_data.csv ./dataset/small_data.csv 10 16`
			`Program time (compute): 0.0099s`
			`./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 1`
			`Program time (compute): 730.4170s`
			`./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 2`
			`Program time (compute): 375.2444s`
			`./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 4`
			`Program time (compute): 187.1316s`
			`./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 8`
			`Program time (compute): 93.7702s`
			`./lc_openmp ./dataset/MNIST_data.csv ./dataset/MNIST_label.csv 10 16`
			`Program time (compute): 46.7320s`
			```

polish 2023-10-09 09:17:14 +00:00			This data was generated using the `run_benchmark.sh > out.txt` script.

update 2023-10-09 09:30:22 +00:00			`## NOTES`
polish 2023-10-09 09:17:14 +00:00
update 2023-10-09 09:30:22 +00:00			I noticed that the loss sometimes fluctuates rather wildly. I think this is because there's no fixed learning rate, so instead of going incrementally, we kind of just take each dimension's minimum and haphazardly combine them together. In Wikipedia's description[^1] of the algorithm, they take the $w_i$ in particular that results in the minimal loss _by itself_, and then only use that $w_i$ for that outer iteration. I'm wondering if this will produce a better convergence, but I tried to stick to implementing the algorithm described in the pdf for the sake of the assignment since I'm guessing the effectiveness of the machine learning model isn't the important thing here.
polish 2023-10-09 09:17:14 +00:00
update 2023-10-09 09:30:22 +00:00			`[^1]: https://en.wikipedia.org/wiki/Coordinate_descent`

			`Also, there's a part in the end of the program that performs validation on the trained model by using a train/test data set split. I didn't count this towards execution time but felt that it was important enough to keep since it ensured that my program was still behaving correctly.`