csci5451/assignments/03/report.typ
2023-12-10 17:38:53 -06:00

== Step 2-4
For steps 2-4, I computed the full set of outgoing nodes for each process,
sorted it, and used each node's position in the sorted order to identify which
node is being sent.
This saves an extra round of communication and lets me index the same items in
every loop.
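
The sorted-position idea can be sketched as follows. This is only an illustration and not the code from the submission: the helper names `sorted_outgoing` and `slot_of` are hypothetical, and removing duplicate node ids is an assumption on my part.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorted list of the node ids sent to one peer.
// De-duplication is an assumption, not something the report states.
std::vector<int> sorted_outgoing(std::vector<int> nodes) {
    std::sort(nodes.begin(), nodes.end());
    nodes.erase(std::unique(nodes.begin(), nodes.end()), nodes.end());
    return nodes;
}

// Position of `node` in the sorted list. Because sender and receiver can
// both derive the same sorted list, this position identifies the node
// without sending its id alongside the value.
std::size_t slot_of(const std::vector<int>& sorted_nodes, int node) {
    auto it = std::lower_bound(sorted_nodes.begin(), sorted_nodes.end(), node);
    assert(it != sorted_nodes.end() && *it == node);  // node must be in the list
    return static_cast<std::size_t>(it - sorted_nodes.begin());
}
```

Since the list does not change between iterations, the same slot can be reused in every loop.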
== Step 5
I exchanged data using the unstructured communication approach, doing an
all-to-all transfer.
To read the result efficiently, I first tried the approach given in the slides.
I also tried binary search, since it gives $O(log n)$ lookups.
However, this took a long time (up to 45 seconds for the 10,000 case) and was
the bottleneck. Using the STL's `std::map` proved to be orders of magnitude
faster.
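
A minimal sketch of a `std::map`-based lookup is shown here. The entry layout (a node id paired with a value) and the names `Entry`, `index_received`, and `value_of` are assumptions made for illustration; the submission's actual data layout is not described in this report.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical layout: each received entry is (node id, value).
using Entry = std::pair<int, int>;

// Build the lookup table once after the exchange.
std::map<int, int> index_received(const std::vector<Entry>& received) {
    std::map<int, int> table;
    for (const Entry& e : received) {
        table[e.first] = e.second;  // a later entry overwrites an earlier one
    }
    return table;
}

// Look up the value of a remote node. This sketch assumes that every
// node that is queried was actually received.
int value_of(const std::map<int, int>& table, int node) {
    auto it = table.find(node);
    assert(it != table.end());
    return it->second;
}
```

Each call to `find` is logarithmic in the number of received entries, so the table only pays off if it is built once and queried many times.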
== Other remarks
On the original example dataset, the program performed poorly with larger
numbers of processes. I have an explanation for this after looking at the
performance characteristics of the run: it completes in one iteration in which
every single edge is assigned. The data distribution also indicates that almost
everything is connected to the first node, so the work isn't balanced.
I've written a generation script in Python using the `igraph` library. The
generated graphs, by size, have the following component counts:
- 1,000: 93 components
- 10,000: 947 components
- 100,000: 9,423 components
- 1,000,000: 92,880 components
Using this data, I was able to achieve much better speedup. I didn't attach the
actual data files but they can be generated with the same script (seeded for
reproducibility).
*NOTE:* I noticed afterwards that the example data was changed again, this time
to a more balanced graph, so the numbers below will not reflect the poorer
performance described above.
== Timing on example dataset
This experiment was performed on CSELabs by using my bench script, and the table
was generated with another script.
#table(
columns: (auto, auto, auto, auto, auto, auto),
[], [1], [2], [4], [8], [16] ,
[1000],
[0.0249s #linebreak() 0.0151s],
[0.0234s #linebreak() 0.0122s],
[0.0206s #linebreak() 0.0099s],
[0.0491s #linebreak() 0.0248s],
[0.0177s #linebreak() 0.0106s],
[10000],
[0.2929s #linebreak() 0.1830s],
[0.2933s #linebreak() 0.1540s],
[0.2457s #linebreak() 0.1178s],
[0.3793s #linebreak() 0.1328s],
[0.2473s #linebreak() 0.1197s],
[100000],
[3.7888s #linebreak() 2.4881s],
[3.7592s #linebreak() 2.0212s],
[3.3819s #linebreak() 1.6036s],
[2.9485s #linebreak() 1.3954s],
[2.8593s #linebreak() 1.3107s],
[1000000],
[46.7895s #linebreak() 31.9648s],
[45.2284s #linebreak() 24.8540s],
[40.3994s #linebreak() 20.2851s],
[36.9628s #linebreak() 17.6794s],
[35.7110s #linebreak() 16.6276s],
)
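
To read speedup off the table, each time in a row can be divided into the time in the first column. The sketch below assumes the columns are process counts (1 to 16), so the first column is the baseline. The helper name `speedups` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Speedup of each run relative to the first entry, which is assumed to be
// the 1-process baseline.
std::vector<double> speedups(const std::vector<double>& times) {
    assert(!times.empty() && times.front() > 0.0);
    std::vector<double> out;
    out.reserve(times.size());
    for (double t : times) {
        out.push_back(times.front() / t);
    }
    return out;
}
```

For the 1,000,000 row, the first times in each cell (46.7895s down to 35.7110s) give a speedup of roughly 1.31 in the last column, and the second times (31.9648s down to 16.6276s) give roughly 1.92.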