== Step 2-4

For steps 2-4, I computed each process's outgoing nodes, sorted them, and used each node's position in the sorted order to identify which nodes are being sent. This saves an extra communication and lets me index the same items in every loop.

== Step 5

I exchanged data using the unstructured communication approach, doing an all-to-all transfer. To read the result efficiently, I first tried the approach given in the slides. I also tried binary search, since each lookup would take $O(log n)$ time. However, this took a long time (up to 45 seconds for the 10,000 case) and was the bottleneck. Using the STL's `std::map` proved to be orders of magnitude faster.

== Other remarks

On the original example dataset, the program performed poorly with larger numbers of processes. I have an explanation for this after looking at the performance characteristics of the run: it completes in one iteration, in which every single edge is assigned. The data distribution also indicates that almost everything is connected to the first node, so the work is not balanced.

I've written a generation script in Python using the `igraph` library. The generated graphs have the following component counts:

- 1,000: 93 components
- 10,000: 947 components
- 100,000: 9,423 components
- 1,000,000: 92,880 components

Using this data, I was able to achieve much better speedup. I didn't attach the actual data files, but they can be regenerated with the same script (seeded for reproducibility).

*NOTE:* I noticed afterwards that the example data was changed again, this time to a more balanced graph, so the numbers below will not reflect the poorer performance.

== Timing on example dataset

This experiment was run on CSELabs using my bench script, and the table was generated with another script.
#table(
  columns: (auto, auto, auto, auto, auto, auto),
  [], [1], [2], [4], [8], [16],
  [1000],
  [0.0249s #linebreak() 0.0151s],
  [0.0234s #linebreak() 0.0122s],
  [0.0206s #linebreak() 0.0099s],
  [0.0491s #linebreak() 0.0248s],
  [0.0177s #linebreak() 0.0106s],
  [10000],
  [0.2929s #linebreak() 0.1830s],
  [0.2933s #linebreak() 0.1540s],
  [0.2457s #linebreak() 0.1178s],
  [0.3793s #linebreak() 0.1328s],
  [0.2473s #linebreak() 0.1197s],
  [100000],
  [3.7888s #linebreak() 2.4881s],
  [3.7592s #linebreak() 2.0212s],
  [3.3819s #linebreak() 1.6036s],
  [2.9485s #linebreak() 1.3954s],
  [2.8593s #linebreak() 1.3107s],
  [1000000],
  [46.7895s #linebreak() 31.9648s],
  [45.2284s #linebreak() 24.8540s],
  [40.3994s #linebreak() 20.2851s],
  [36.9628s #linebreak() 17.6794s],
  [35.7110s #linebreak() 16.6276s],
)
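
== Illustrative sketch for steps 2-5

The sorted-position indexing from steps 2-4 and the two lookup strategies compared in step 5 can be illustrated roughly as follows. This is a minimal sketch, not the submitted implementation: it leaves out all MPI communication, and the names `sorted_unique`, `position_by_search`, and `build_index` are hypothetical helpers made up for this example. It also assumes node IDs fit in a `long` and that duplicate IDs should be collapsed, neither of which is stated above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: sort and deduplicate the node IDs a process sends,
// so that each node's index in the result can stand in for the node itself.
std::vector<long> sorted_unique(std::vector<long> ids) {
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}

// Lookup by binary search: returns the position of `id` in the sorted
// vector, or -1 if it is absent. Each call is O(log n).
long position_by_search(const std::vector<long>& sorted_ids, long id) {
    auto it = std::lower_bound(sorted_ids.begin(), sorted_ids.end(), id);
    if (it == sorted_ids.end() || *it != id) {
        return -1;
    }
    return static_cast<long>(it - sorted_ids.begin());
}

// Lookup by std::map: build the ID-to-position index once, then query it.
// The input is assumed to be sorted already (checked in debug builds only).
std::map<long, std::size_t> build_index(const std::vector<long>& sorted_ids) {
    assert(std::is_sorted(sorted_ids.begin(), sorted_ids.end()));
    std::map<long, std::size_t> index;
    for (std::size_t pos = 0; pos < sorted_ids.size(); ++pos) {
        index.emplace(sorted_ids[pos], pos);
    }
    return index;
}
```

Both lookups are logarithmic per query, so this sketch does not by itself explain or reproduce the timing difference reported in step 5. It only shows the shape of the two approaches.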