## Introduction

The purpose of this assignment is to become familiar with unstructured, input-data-dependent communication patterns by developing a parallel connected-components algorithm using label propagation in MPI. This is a non-trivial programming assignment, so you need to start thinking about it and working on it as soon as possible.

You need to write a program that takes as input one file describing a graph G and outputs the connected components of the graph, i.e., the vertex labels identifying the component to which each vertex belongs. The input file consists of a set of (i j) space-separated tuples (each on its own line) corresponding to the edges of the graph. You can assume that the input files are sorted in increasing (i, j) order.

Here is the output function you should use:
```c
/**
 * @brief Write a vector of labels to a file.
 *
 * @param filename The name of the file to write to.
 * @param labels The array of labels.
 * @param nlabels How many labels to write.
 */
static void print_labels(
    char const * const filename,
    unsigned const * const labels,
    size_t const nlabels)
{
  size_t i;
  FILE * fout;

  /* open file */
  if((fout = fopen(filename, "w")) == NULL) {
    fprintf(stderr, "error opening '%s'\n", filename);
    abort();
  }

  /* write labels to fout */
  for(i = 0; i < nlabels; ++i) {
    fprintf(fout, "%u\n", labels[i]);
  }

  fclose(fout);
}
```
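
For reference, a minimal usage sketch is shown below. It assumes the full label vector has already been collected onto one rank (as in step 6 of the parallelization strategy); `rank`, `all_labels`, and `nvtxs` are illustrative names, not requirements.

```c
/* Hypothetical usage: after the labels have been gathered onto one rank,
 * that rank writes them to the output file given on the command line.
 * 'rank', 'all_labels', and 'nvtxs' are illustrative names. */
if(rank == 0) {
  print_labels(argv[2], all_labels, nvtxs);
}
```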
## Parallelization strategy

Design your program so that it follows these steps:

1. One process reads the file and distributes the data to the other processes using a 1D decomposition (each rank gets approximately the same number of vertices). You should take care to perform this decomposition using consecutive vertices (i.e., if there are two processes and 60 vertices, then process 1 should be responsible for the first 30 vertices and process 2 should be responsible for the second 30 vertices). Not performing the 1D decomposition in this way will result in large increases in your timings. A sketch of such a decomposition is given after this list.
2. Each process analyzes the non-local edges that are contained in its portion of the graph.
3. Each process determines which process stores the non-local vertices corresponding to its non-local edges.
4. All processes communicate to figure out which process needs to send what data to which other processes.
5. The processes perform the transfers of non-local labels and the updates of local labels until convergence.
6. The results are gathered to a single process, which writes them to disk.
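
To make step 1 (and the owner lookup needed in step 3) concrete, here is a minimal sketch of a block 1D decomposition over consecutive vertices. The names (`block_start`, `owner_of`, `nvtxs`, `npes`) are illustrative, not part of the assignment.

```c
#include <stddef.h>

/* First vertex owned by process p when nvtxs vertices are split among npes
 * ranks; the remainder (nvtxs % npes) is spread one vertex at a time over
 * the first ranks so block sizes differ by at most one. */
static size_t block_start(size_t const nvtxs, int const npes, int const p)
{
  size_t const base = nvtxs / npes;
  size_t const rem  = nvtxs % npes;
  return (size_t)p * base + ((size_t)p < rem ? (size_t)p : rem);
}

/* Which process owns global vertex v.  A linear scan over the ranks is shown
 * for clarity; a closed-form inversion of block_start() works as well. */
static int owner_of(size_t const v, size_t const nvtxs, int const npes)
{
  int p;
  for(p = 0; p < npes; ++p) {
    if(v < block_start(nvtxs, npes, p + 1)) {
      return p;
    }
  }
  return npes - 1;
}
```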
```
initialization by assigning a unique label to each node in G

pre-compute the unstructured communication information

while not converged (convergence means NO node changes its label)
    exchange non-local labels
    for each local node n
        assign the new label of n to be the maximum label of its neighbors (local and non-local)
```
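One possible shape of this loop in MPI is sketched below. It assumes the counts, displacements, and send lists were precomputed in steps 2-4, that ghost labels are stored directly behind the owned labels in the order they are received, and that the adjacency lists were remapped to these local indices. `MPI_Alltoallv` is used purely for illustration (point-to-point sends and receives are equally valid), and every name here is hypothetical.

```c
#include <stddef.h>
#include <mpi.h>

/* Sketch of the propagation loop (step 5).  labels[] holds the nlocal owned
 * labels followed by the ghost labels; xadj[]/adjncy[] is the local adjacency
 * structure with neighbor IDs already remapped to local indices; send_idx[]
 * lists the owned labels other ranks asked for, ordered to match the
 * precomputed counts and displacements.  All parameter names are hypothetical. */
static void propagate(
    unsigned * const labels, size_t const nlocal,
    size_t const * const xadj, size_t const * const adjncy,
    size_t const * const send_idx, size_t const nsend,
    unsigned * const sendbuf,
    int const * const sendcounts, int const * const sdispls,
    int const * const recvcounts, int const * const rdispls)
{
  int global_changed = 1;

  while(global_changed) {
    int local_changed = 0;

    /* pack the owned labels that other processes asked for */
    for(size_t i = 0; i < nsend; ++i) {
      sendbuf[i] = labels[send_idx[i]];
    }

    /* exchange labels: received ghost labels land directly behind the owned labels */
    MPI_Alltoallv(sendbuf, sendcounts, sdispls, MPI_UNSIGNED,
                  labels + nlocal, recvcounts, rdispls, MPI_UNSIGNED,
                  MPI_COMM_WORLD);

    /* each owned vertex takes the maximum label over itself and its neighbors */
    for(size_t v = 0; v < nlocal; ++v) {
      unsigned max = labels[v];
      for(size_t e = xadj[v]; e < xadj[v + 1]; ++e) {
        if(labels[adjncy[e]] > max) {
          max = labels[adjncy[e]];
        }
      }
      if(max != labels[v]) {
        labels[v] = max;
        local_changed = 1;
      }
    }

    /* converged only when no process changed any of its labels */
    MPI_Allreduce(&local_changed, &global_changed, 1, MPI_INT, MPI_LOR,
                  MPI_COMM_WORLD);
  }
}
```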
Please make sure that you design your data structures in such a way that accessing the received non-local labels does not cost any more than accessing your local labels. Also, your solution should be memory scalable (i.e., allocating a label vector of size n is not an option). You should use exactly as much memory as is required to store the labels that you need to compute the connected components.
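
One common way to satisfy this requirement is to remap global vertex IDs to local indices once, during the precompute step, so the propagation loop only does direct array indexing. The sketch below (all names hypothetical) keeps a sorted array of the ghost vertex IDs a process needs and binary-searches it only while building the adjacency structure; afterwards, ghost label i simply lives at `labels[nlocal + i]`.

```c
#include <stddef.h>

/* Sketch: map a global vertex ID to a local index.  Used only while building
 * the local adjacency structure, not inside the propagation loop.  'my_start'
 * is this rank's first owned vertex, 'nlocal' its vertex count, and
 * 'ghost_ids' a sorted array of the nghost non-local vertex IDs that appear
 * in this rank's edges.  All names are hypothetical. */
static size_t to_local(size_t const v,
                       size_t const my_start,
                       size_t const nlocal,
                       size_t const * const ghost_ids,
                       size_t const nghost)
{
  if(v >= my_start && v < my_start + nlocal) {
    return v - my_start;          /* owned vertex */
  }

  /* binary search (lower bound) in the sorted ghost list */
  size_t lo = 0, hi = nghost;
  while(lo < hi) {
    size_t const mid = lo + (hi - lo) / 2;
    if(ghost_ids[mid] < v) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return nlocal + lo;             /* ghost vertex: index past the owned labels */
}
```

With every edge endpoint remapped once in this way, the propagation loop performs only direct array indexing, so reading a ghost label costs no more than reading a local one.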
## Testing

A few test graphs are available on the csel-plate machines at /export/scratch/CSCI5451_F23/assignment-3/dataset. The input files are organized such that the first line contains the number of nodes and the number of edges (space separated). The following lines contain the edges of the graph. The graphs are undirected (i.e., if (i, j) is present in the input file, then (j, i) will be present as well). The graphs do not contain any self-loops (i.e., there is no i such that (i, i) is in an input file). Finally, the file names indicate the number of vertices in the graph (e.g., 12.txt has 12 vertices). There are 50 components in 1000.txt, 500 components in both 10000.txt and 100000.txt, and 5000 components in 1000000.txt.
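
As an illustration of this format, a sequential read of the header and edge list might look like the sketch below. The function name and the choice of fscanf are not required, and error handling is kept minimal.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch: sequentially read a test graph's header and edge list. */
static void read_graph(char const * const filename)
{
  FILE * fin = fopen(filename, "r");
  if(fin == NULL) {
    fprintf(stderr, "error opening '%s'\n", filename);
    abort();
  }

  size_t nnodes, nedges;
  fscanf(fin, "%zu %zu", &nnodes, &nedges);   /* first line: node and edge counts */
  /* nnodes would be used to size the 1D decomposition */

  for(size_t e = 0; e < nedges; ++e) {
    size_t i, j;
    fscanf(fin, "%zu %zu", &i, &j);           /* one edge (i j) per line */
    /* ... hand edge (i, j) to the process that owns vertex i ... */
  }

  fclose(fin);
}
```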
Your output file should contain n lines; the i-th line should contain the component label for node i. As an example, if we have 4 nodes with 2 edges, (0, 1) and (2, 3), then your output file, using the above algorithm, should be as follows.
```
1
1
3
3
```
## Timing

Use the function MPI_Wtime() to time sections of your source code. You should provide times for steps 2-5 and also a separate time for step 5. You must use barriers before timers are started and stopped to ensure accurate timing results. For your timings, it is very important that you only exchange the needed label information between processes. Exchanging more label information than is necessary will result in large increases in runtime on the provided test graphs.
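
A minimal sketch of this timing pattern is shown below; step 5 is timed the same way around the propagation loop, and `rank` is an illustrative name.

```c
/* Fragment: wall-clock timing of steps 2-5, with barriers so every process
 * starts and stops the timer at the same point. */
double t25_start, t25_elapsed;

MPI_Barrier(MPI_COMM_WORLD);
t25_start = MPI_Wtime();

/* ... steps 2-5 ... */

MPI_Barrier(MPI_COMM_WORLD);
t25_elapsed = MPI_Wtime() - t25_start;

if(rank == 0) {
  print_time25(t25_elapsed);   /* provided print function, given below */
}
```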
You should use these functions to print your times:
```c
/**
 * @brief Output the seconds elapsed for steps 2-5. This excludes input and
 * output time. This should be wallclock time, not CPU time.
 *
 * @param seconds Seconds spent on steps 2-5.
 */
static void print_time25(
    double const seconds)
{
  printf("2-5 Time: %0.04fs\n", seconds);
}
```
```c
/**
 * @brief Output the seconds elapsed for step 5. This excludes input and
 * output time. This should be wallclock time, not CPU time.
 *
 * @param seconds Seconds spent on step 5.
 */
static void print_time5(
    double const seconds)
{
  printf("5 Time: %0.04fs\n", seconds);
}
```
## What you need to turn in

1. The source code of your program. Your main source file (containing the main function and all MPI calls) should be named lpa.c.
2. A short writeup describing how you went about solving steps 2-5. For step 5, describe how your label propagation code accesses the non-local labels that it received.
3. Timing results on 1, 2, 4, 8, & 16 processors for performing steps 2-5 above on graph 100000.txt (100,000 vertices).
4. Timing results for just step 5 above on 100000.txt (100,000 vertices).

Please do NOT include the test files.
## Code Submission Guidelines

1. A makefile must be provided to compile and generate the executable, which must be named lpa.
2. The first line of the output must be the total time taken (steps 2-5), followed by the time taken for step 5. Use the provided functions.
3. Your program should be invoked via:

```
mpirun -np 8 ./lpa <graph> <labels>
```
4. The report must be named report.pdf.
5. All files (code + report) must be in a single directory, and the directory's name should be your University Student ID.
6. Submissions must be .tar.gz files only.

## Evaluation criteria

Here are the criteria that you will be graded on:

1. Efficiency in performing each one of the steps.
2. Low memory complexity.
3. Speedup. I would like to evaluate the speedup for performing steps 2-5 and also for step 5 alone.