{"id":19038603,"url":"https://github.com/garyhtou/parallel-zip","last_synced_at":"2025-04-23T19:46:18.827Z","repository":{"id":152165813,"uuid":"459003090","full_name":"garyhtou/Parallel-Zip","owner":"garyhtou","description":"A multi-threaded program that compresses files using semaphores, locks, and RLE.","archived":false,"fork":false,"pushed_at":"2022-03-19T04:44:14.000Z","size":9321,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-18T04:54:31.449Z","etag":null,"topics":["concurrency","cpsc3500","locks","multithreading","pzip","rle","semaphore","zip"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/garyhtou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-02-14T03:29:55.000Z","updated_at":"2024-11-21T10:38:28.000Z","dependencies_parsed_at":null,"dependency_job_id":"fe26daae-82fd-455c-b8ce-66a063c6d23f","html_url":"https://github.com/garyhtou/Parallel-Zip","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garyhtou%2FParallel-Zip","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garyhtou%2FParallel-Zip/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garyhtou%2FParallel-Zip/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/garyhtou%2FParallel-Zip/manifests","owner_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/garyhtou","download_url":"https://codeload.github.com/garyhtou/Parallel-Zip/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250502690,"owners_count":21441281,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["concurrency","cpsc3500","locks","multithreading","pzip","rle","semaphore","zip"],"created_at":"2024-11-08T22:04:10.129Z","updated_at":"2025-04-23T19:46:18.813Z","avatar_url":"https://github.com/garyhtou.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🛤️ Parallel Zip (`pzip`)\n\n## About\n\nParallel Zip (`pzip`) is a multi-threaded program that compresses a list of\ninput files specified in the command line arguments using Run Length Encoding\n(RLE). It uses **locks** and **semaphores** to ensure multiple threads\ncan safely access a shared unbounded buffer. Additional semaphores are also used\nto order the output (print in the same order as the input list).\n\n\u003csub\u003eMore information can be found [here](/assignment/Project3_para_zip.pdf).\u003c/sub\u003e\n\n## Team members and contribution\n\n- Gary Tou ([@garyhtou](https://github.com/garyhtou))\n- Castel Villalobos ([@impropernoun](https://github.com/impropernoun))\n- Hank Rudolph ([@hankrud](https://github.com/HankRud))\n\n## Design Considerations\n\n### Parallelizing the compression\n\nWe used multiple threads to compress the input files. This allows us to run the\ncompression algorithm in parallel.\n
In addition, we saved this compressed data in\nmemory to reduce the time spent in the ordering **semaphores'\ncritical section**.\n\n### Determine the number of threads to create\n\nUsing `get_nprocs()`, we determine the number of processors available on the\nsystem. This number is used as the maximum thread count (unless the system does\nnot have multiple cores, in which case it defaults to 5). The program will\nnot create more threads than needed (except for the 5 default threads).\n\n### Efficiency of each thread\n\nBy **memory mapping** input files, using a **thread pool**, and storing\ncompressed data in memory until it is that file's turn to print, we can\nefficiently perform each piece of work in parallel.\n\n### Access the input files efficiently\n\nWe used **memory mapping** to access the input files efficiently. This\ngives us quicker and simpler access to the file contents. In addition, the\nmemory mapping occurs in the worker threads, so input files are read and\nprocessed concurrently.\n\n### Coordinating multiple threads\n\nWe used a lock to protect shared data (the job queue), a semaphore to keep\nworker threads from running when the queue is empty, and multiple semaphores\nto order the printed output.\n\n### Terminating threads in the thread pool\n\nWe created a `kill` boolean in the job struct (this struct is added to the job\nqueue). Whenever a worker thread receives a new job, it checks the `kill`\nboolean.\n
If `kill` is `true`, the thread terminates and exits appropriately.\n\n## Strengths and Weaknesses\n\nStrengths:\n\n- Parallelizes the compression algorithm\n- Saves compressed data to memory before printing\n  - Prevents computation and printing bottleneck\n- Faster than `wzip`\n- Handles potential system call errors\n\nWeaknesses:\n\n- Only one thread per file\n- Uses Run Length Encoding (RLE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgaryhtou%2Fparallel-zip","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgaryhtou%2Fparallel-zip","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgaryhtou%2Fparallel-zip/lists"}