{"id":17632726,"url":"https://github.com/apostolis1/parallel-processing-systems","last_synced_at":"2026-04-29T15:03:26.379Z","repository":{"id":250010814,"uuid":"831508501","full_name":"apostolis1/Parallel-Processing-Systems","owner":"apostolis1","description":"Project of the undergrad course \"Parallel Processing Systems\" - NTUA","archived":false,"fork":false,"pushed_at":"2024-07-24T14:38:35.000Z","size":3410,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-07T05:02:47.517Z","etag":null,"topics":["benchmark","c","cuda","mpi","openmp","parallel-computing"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apostolis1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-20T19:01:26.000Z","updated_at":"2024-07-24T14:52:56.000Z","dependencies_parsed_at":"2024-07-24T18:24:50.608Z","dependency_job_id":"18fdfefe-20a0-4f71-9fcb-0d61d5da5970","html_url":"https://github.com/apostolis1/Parallel-Processing-Systems","commit_stats":null,"previous_names":["apostolis1/parallel-processing-systems"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/apostolis1/Parallel-Processing-Systems","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apostolis1%2FParallel-Processing-Systems","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apostolis1%2FParallel-Processing-Systems/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/apostolis1%2FParallel-Processing-Systems/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apostolis1%2FParallel-Processing-Systems/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apostolis1","download_url":"https://codeload.github.com/apostolis1/Parallel-Processing-Systems/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apostolis1%2FParallel-Processing-Systems/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32430803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T13:34:34.882Z","status":"ssl_error","status_checked_at":"2026-04-29T13:34:29.830Z","response_time":110,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","c","cuda","mpi","openmp","parallel-computing"],"created_at":"2024-10-23T01:45:16.853Z","updated_at":"2026-04-29T15:03:26.343Z","avatar_url":"https://github.com/apostolis1.png","language":"C","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Parallel Processing Systems Lab\n\n| Authors                                                                |\n|------------------------------------------------------------------------|\n| Dimitra Leventi ([@dileventi](https://github.com/dileventi))           |\n| Dimitrios Mitropoulos ([@dimitrismit](https://github.com/dimitrismit)) |\n| 
Apostolis Stamatis ([@apostolis1](https://github.com/apostolis1))      |\n\n## Overview\n\nWe conducted benchmarks using different configurations of resources (nodes, processors, processors per node), tasks (where applicable) and input sizes to determine scalability and bottlenecks.\n\nThe results of the benchmarks and their in-depth analysis can be found in the report.\n\nThe report also contains critical parts of the source code.\n\n\n## Lab 1 - Conway's Game of Life using OpenMP\n\nGiven a serial algorithm of Conway's Game of Life:\n1. Detect the parallelization possibilities\n2. Implement a solution using OpenMP in a shared address space architecture\n3. Perform benchmarks\n\n## Lab 2 - Parallelization and optimization on shared memory architectures\n\nGiven a serial K-means algorithm:\n\n1. Add the necessary synchronization commands when accessing shared resources, so the algorithm can be run on a parallel system\n2. Improve the algorithm of (1) by creating local data structures to avoid synchronization using reduction\n3. Perform benchmarks\n\n## Lab 3 - Locks and mutex on shared memory architectures \n\nBenchmark different lock implementations for parallel systems, compare and interpret the results \n\n1. pthread_mutex_t lock from the Pthreads library\n2. pthread_spinlock_t lock from the Pthreads library\n3. test-and-set lock\n4. test-and-test-and-set lock\n5. array based lock\n6. linked list lock from chapter 7 of \"The Art of Multiprocessor Programming\"\n\nImplementation of the Floyd-Warshall algorithm using parallel tasks, understanding the limitations of `parallel for`\n\n## Lab 4 - Concurrent data structures\n\nBenchmark the following implementations of a concurrent doubly linked list:\n\n1. Coarse-grain locking\n2. Fine-grain locking\n3. Optimistic synchronization\n4. Lazy synchronization\n5. 
Non-blocking synchronization\n\n## Lab 5 - Parallelization and optimization on NVIDIA GPUs using CUDA\n\nDifferent implementations and optimizations of the K-means algorithm\n\n1. Naïve version: The nearest-cluster calculation is offloaded to the GPU\n2. Transpose version: Implement column-based indexing for the arrays (instead of the row-based indexing used in the naïve version)\n3. Shared version: Move the frequently accessed `clusters` array to the shared GPU memory\n\n\n## Lab 6 - Parallelization and optimization on distributed memory architectures\n\nGiven the serial versions of Jacobi and Gauss-Seidel kernels for the ... problem:\n1. Identify parallelism possibilities in the Jacobi and Gauss-Seidel kernels\n2. Design and implement a solution for a distributed memory architecture using message passing with MPI\n3. Perform benchmarks","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapostolis1%2Fparallel-processing-systems","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapostolis1%2Fparallel-processing-systems","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapostolis1%2Fparallel-processing-systems/lists"}