== Steps 2-4

For steps 2-4, I computed each process's outgoing nodes, sorted them, and used
each node's position in the sorted order to identify which nodes are being sent.

This saves an extra communication and lets me index the same items in each
loop.

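The sorted-position idea can be sketched as follows. This is a minimal illustration rather than the actual implementation: the names `sorted_outgoing` and `position_of` are hypothetical, and removing duplicate node ids is an assumption.

```cpp
#include <algorithm>
#include <vector>

// Sort the outgoing node ids and drop duplicates (an assumption), so that
// every node gets a stable position that is the same in each loop.
std::vector<int> sorted_outgoing(std::vector<int> nodes) {
    std::sort(nodes.begin(), nodes.end());
    nodes.erase(std::unique(nodes.begin(), nodes.end()), nodes.end());
    return nodes;
}

// Position of `node` in the sorted list, or -1 if it is not an outgoing node.
long position_of(const std::vector<int>& sorted, int node) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), node);
    if (it == sorted.end() || *it != node) return -1;
    return static_cast<long>(it - sorted.begin());
}
```

Because both sides can rebuild the same sorted list, a position alone is enough to say which node a value belongs to, which is what avoids the extra communication.
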
== Step 5

I exchanged data using the unstructured communication approach, doing an
all-to-all transfer.

To read the results efficiently, I first tried the approach given in the slides.
I also tried binary search, since each lookup should take $O(log n)$ time.
However, this took a long time (up to 45 seconds for the 10,000 case) and was
the bottleneck. Using the STL's `std::map` proved to be orders of magnitude
faster.

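A `std::map` lookup of received values might look like the sketch below. The layout of the receive buffer is an assumption (two parallel arrays of node ids and values), and `build_lookup` and `received_value` are hypothetical names, not the functions used in the submission.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Build a lookup table from received node ids to their received values.
// `ids` and `values` are assumed to be parallel arrays from the receive buffer.
std::map<int, int> build_lookup(const std::vector<int>& ids,
                                const std::vector<int>& values) {
    std::map<int, int> lookup;
    for (std::size_t i = 0; i < ids.size() && i < values.size(); ++i) {
        lookup[ids[i]] = values[i];  // a later duplicate overwrites an earlier one
    }
    return lookup;
}

// Return the received value for `node`, or `fallback` if nothing was received.
int received_value(const std::map<int, int>& lookup, int node, int fallback) {
    auto it = lookup.find(node);
    return it == lookup.end() ? fallback : it->second;
}
```

The map is built once per exchange, after which each `find` call is a single logarithmic lookup.
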
== Other remarks

On the original example dataset, it performed poorly with larger numbers. I have
an explanation for this after looking at the performance characteristics of the
run: it completes in one iteration where every single edge is assigned. The data
distribution also indicates that almost everything is connected to the first
node, so the graph isn't balanced.

I've written a generation script in Python using the `igraph` library. The
generated graphs have the following component counts:

- 1,000: 93 components
- 10,000: 947 components
- 100,000: 9,423 components
- 1,000,000: 92,880 components

Using this data, I was able to achieve much better speedup. I didn't attach the
actual data files, but they can be generated with the same script (seeded for
reproducibility).

*NOTE:* I noticed afterwards that the data was changed again, this time to a more balanced graph, so the numbers will not reflect the poorer performance.

== Timing on example dataset

This experiment was performed on CSELabs by using my bench script, and the table
was generated with another script.

#table(
  columns: (auto, auto, auto, auto, auto, auto),
  [], [1], [2], [4], [8], [16],

  [1000],
  [0.0249s #linebreak() 0.0151s],
  [0.0234s #linebreak() 0.0122s],
  [0.0206s #linebreak() 0.0099s],
  [0.0491s #linebreak() 0.0248s],
  [0.0177s #linebreak() 0.0106s],

  [10000],
  [0.2929s #linebreak() 0.1830s],
  [0.2933s #linebreak() 0.1540s],
  [0.2457s #linebreak() 0.1178s],
  [0.3793s #linebreak() 0.1328s],
  [0.2473s #linebreak() 0.1197s],

  [100000],
  [3.7888s #linebreak() 2.4881s],
  [3.7592s #linebreak() 2.0212s],
  [3.3819s #linebreak() 1.6036s],
  [2.9485s #linebreak() 1.3954s],
  [2.8593s #linebreak() 1.3107s],

  [1000000],
  [46.7895s #linebreak() 31.9648s],
  [45.2284s #linebreak() 24.8540s],
  [40.3994s #linebreak() 20.2851s],
  [36.9628s #linebreak() 17.6794s],
  [35.7110s #linebreak() 16.6276s],
)