https://github.com/dreeseaw/countsort-openmp
A C++ & OpenMP implementation of the count sort algorithm
Last synced: 7 months ago
- Host: GitHub
- URL: https://github.com/dreeseaw/countsort-openmp
- Owner: Dreeseaw
- Created: 2018-11-06T19:03:36.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2018-12-07T18:11:41.000Z (almost 7 years ago)
- Last Synced: 2025-01-30T13:27:12.636Z (8 months ago)
- Language: C++
- Homepage:
- Size: 7.81 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# CountSort-OpenMP
#### An OpenMP implementation of the count sort algorithm

For my OpenMP code, each thread uses a private middle array to count the appearances of each value in its subsection of the unsorted array, and then applies prefix sums to that array. After the prefix sums, I have to reduce the per-thread arrays, index by index, into a total array that the last part of my code uses. The trick here was to use a ‘for’ pragma with a ‘reduction(+:mid_array(K))’ clause. The for loop ranges from 0 to the number of threads being used, so each thread gets one iteration of the loop, which performs a reduction across the array with the respective thread's version of the middle array.
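
The reduction idea can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes OpenMP 4.5 or later, which allows an array section such as `counts[:k]` in a `reduction` clause, and it applies the reduction directly to the counting loop, which is a simpler variant than the one-iteration-per-thread loop described above. The names `histogram` and `count_sort` are hypothetical, and the final rebuild is kept serial for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): counts how often each value
// in [0, k) appears in data. Every value in data is assumed to lie in [0, k).
std::vector<long long> histogram(const std::vector<int>& data, int k) {
    assert(k > 0);
    std::vector<long long> total(static_cast<std::size_t>(k), 0);
    long long* counts = total.data();
    const long long n = static_cast<long long>(data.size());

    // Each thread gets a zero-initialised private copy of counts[0..k);
    // OpenMP adds the copies element by element when the loop ends.
    #pragma omp parallel for reduction(+ : counts[:k])
    for (long long i = 0; i < n; ++i) {
        counts[data[i]] += 1;
    }
    return total;
}

// Hypothetical driver: rebuilds the sorted array from the reduced counts.
// This part is serial here; only the counting phase is parallel.
std::vector<int> count_sort(const std::vector<int>& data, int k) {
    const std::vector<long long> counts = histogram(data, k);
    std::vector<int> sorted;
    sorted.reserve(data.size());
    for (int v = 0; v < k; ++v) {
        // Append counts[v] copies of the value v.
        sorted.insert(sorted.end(), static_cast<std::size_t>(counts[v]), v);
    }
    return sorted;
}
```

Built without OpenMP support, the pragma is ignored and the code runs serially with the same result. Private copies of a reduction array are often placed on each thread's stack, so a very large `k` may need a larger stack size or a manually merged per-thread array instead.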
My trick to create more parallelism in the final part of the algorithm was to assign a range of K values to each thread, so that different threads weren’t trying to read from and write to the same index simultaneously. To reduce contention further, I added a padding of 8 (CPAD) to each element of the middle array. These two changes combined allowed me to parallelise this last part of the code.
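
A rough sketch of that last phase is shown below. It is an illustration under stated assumptions, not the author's implementation: `CPAD` is the padding factor named above, while `scatter_by_value`, `cursor`, and the choice of `long long` elements are hypothetical. With 8-byte elements, a stride of 8 puts each entry 64 bytes apart; one plausible reading of the padding is that it keeps entries owned by different threads on separate cache lines.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t CPAD = 8;  // padding factor named in the text

// Hypothetical helper: given counts[v] for every value v, writes the sorted
// output. Logical entry v of the padded array lives at index v * CPAD.
std::vector<int> scatter_by_value(const std::vector<long long>& counts) {
    const long long k = static_cast<long long>(counts.size());

    // Padded exclusive prefix sums: cursor[v * CPAD] is the first output
    // position for value v.
    std::vector<long long> cursor(counts.size() * CPAD, 0);
    long long total = 0;
    for (long long v = 0; v < k; ++v) {
        cursor[v * CPAD] = total;
        total += counts[v];
    }

    std::vector<int> out(static_cast<std::size_t>(total));

    // schedule(static) without a chunk size gives each thread one contiguous
    // range of values, so no two threads touch the same cursor entry or the
    // same slice of the output.
    #pragma omp parallel for schedule(static)
    for (long long v = 0; v < k; ++v) {
        long long& pos = cursor[v * CPAD];  // write cursor owned by one thread
        for (long long c = 0; c < counts[v]; ++c) {
            out[pos++] = static_cast<int>(v);
        }
    }
    return out;
}
```

Because each value owns a disjoint slice of the output, no locks or atomics are needed in the parallel loop.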
Here are some timings for sorting a 100-million-element array of integers in the range 0-100K, using N threads on dual 10- or 12-core Xeon E5-2680 V2 processors.
| 1 (Serial) | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18 | 20 |
| ---------- | ----- | ----- | ----- | ----- | ------ | ------ | ------ | ------ | ------ | ------ |
| 2.50s | 1.32s | 1.23s | 0.98s | 0.84s | 0.77s | 0.73s | 0.71s | 0.69s | 0.69s | 0.71s |
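
To read the timings as scaling figures, speedup can be computed as the serial time divided by the N-thread time, and parallel efficiency as speedup divided by N. For example, the best time in the table, 0.69s at 16 threads, corresponds to a speedup of roughly 3.6x over the 2.50s serial run. The helper below is a small sketch of that arithmetic; `Scaling` and `scaling` are hypothetical names, and the first entry is assumed to be the serial run.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record for one column of the timing table.
struct Scaling {
    int threads;
    double speedup;     // serial time / time with this many threads
    double efficiency;  // speedup / threads
};

// Assumes seconds[0] is the serial (1-thread) time and that both vectors
// have the same length; returns an empty result otherwise.
std::vector<Scaling> scaling(const std::vector<int>& threads,
                             const std::vector<double>& seconds) {
    std::vector<Scaling> rows;
    if (seconds.empty() || threads.size() != seconds.size()) {
        return rows;
    }
    const double serial = seconds.front();
    for (std::size_t i = 0; i < seconds.size(); ++i) {
        const double s = serial / seconds[i];
        rows.push_back({threads[i], s, s / threads[i]});
    }
    return rows;
}
```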