https://github.com/garyhtou/parallel-zip
A multi-threaded program that compresses files using semaphores, locks, and RLE.
concurrency cpsc3500 locks multithreading pzip rle semaphore zip
- Host: GitHub
- URL: https://github.com/garyhtou/parallel-zip
- Owner: garyhtou
- Created: 2022-02-14T03:29:55.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-03-19T04:44:14.000Z (about 4 years ago)
- Last Synced: 2025-04-18T04:54:31.449Z (about 1 year ago)
- Topics: concurrency, cpsc3500, locks, multithreading, pzip, rle, semaphore, zip
- Language: C++
- Homepage:
- Size: 8.89 MB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 🛤️ Parallel Zip (`pzip`)
## About
Parallel Zip (`pzip`) is a multi-threaded program that compresses a list of
input files specified in the command line arguments using Run Length Encoding
(RLE). It implements **locks** and **semaphores** to ensure multiple threads
can safely access a shared unbounded buffer. Additional semaphores are also used
to order the output (print in the same order as the input list).
More information can be found in the
[assignment description](/assignment/Project3_para_zip.pdf).
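As a rough illustration of what RLE does, the sketch below collapses consecutive repeated bytes into (count, byte) runs. The function names `rle_encode` and `rle_serialize` and the 4-byte count width are assumptions made for this example, not taken from the project's source.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One run: how many times `second` repeats consecutively.
using Run = std::pair<std::uint32_t, char>;

// Collapse consecutive repeated bytes into (count, byte) runs.
std::vector<Run> rle_encode(const std::string& input) {
    std::vector<Run> runs;
    for (char ch : input) {
        if (!runs.empty() && runs.back().second == ch) {
            ++runs.back().first;        // extend the current run
        } else {
            runs.emplace_back(1u, ch);  // start a new run
        }
    }
    return runs;
}

// Serialize each run as a count followed by the byte itself.
// Assumption: the count is written as 4 bytes in host byte order.
std::string rle_serialize(const std::vector<Run>& runs) {
    std::string out;
    out.reserve(runs.size() * (sizeof(std::uint32_t) + 1));
    for (const Run& run : runs) {
        const char* count_bytes = reinterpret_cast<const char*>(&run.first);
        out.append(count_bytes, sizeof(run.first));
        out.push_back(run.second);
    }
    return out;
}
```

With this layout, `rle_encode("aaabcc")` yields the runs (3, `a`), (1, `b`), (2, `c`), which serialize to 15 bytes.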
## Team members and contribution
- Gary Tou ([@garyhtou](https://github.com/garyhtou))
- Castel Villalobos ([@impropernoun](https://github.com/impropernoun))
- Hank Rudolph ([@hankrud](https://github.com/HankRud))
## Design Considerations
### Parallelizing the compression
We used multiple threads to compress the input files, which lets the
compression algorithm run in parallel. In addition, each thread keeps its
compressed data in memory until it is that file's turn to print, which reduces
the time spent in the **critical section** guarded by the ordering
**semaphores**.
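One way to picture the ordering is a chain of "turn" semaphores, one per input file. This is a minimal sketch assuming POSIX `sem_t`; the `OrderedOutput` class and the `run_reverse_order_demo` function are hypothetical names for illustration, and a string stands in for standard output. Each worker compresses first and only then waits for its turn, so the critical section is a single append.

```cpp
#include <semaphore.h>

#include <cerrno>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: one "turn" semaphore per input file. Only the first
// semaphore starts at 1, so output always begins with file 0.
class OrderedOutput {
public:
    explicit OrderedOutput(std::size_t file_count) : turns_(file_count) {
        for (std::size_t i = 0; i < file_count; ++i) {
            sem_init(&turns_[i], 0, i == 0 ? 1 : 0);
        }
    }
    ~OrderedOutput() {
        for (sem_t& turn : turns_) sem_destroy(&turn);
    }
    OrderedOutput(const OrderedOutput&) = delete;
    OrderedOutput& operator=(const OrderedOutput&) = delete;

    // Call this after the data is already compressed in memory.
    void emit(std::size_t index, const std::string& compressed) {
        // Wait for this file's turn, retrying if a signal interrupts the wait.
        while (sem_wait(&turns_[index]) == -1 && errno == EINTR) {}
        sink_ += compressed;  // the whole critical section is one append
        if (index + 1 < turns_.size()) sem_post(&turns_[index + 1]);
    }

    const std::string& sink() const { return sink_; }

private:
    std::vector<sem_t> turns_;
    std::string sink_;  // stands in for stdout in this sketch
};

// Start the workers in reverse order to show the output order still holds.
std::string run_reverse_order_demo() {
    const std::vector<std::string> compressed = {"A", "B", "C"};
    OrderedOutput output(compressed.size());
    std::vector<std::thread> workers;
    for (std::size_t i = compressed.size(); i-- > 0;) {
        workers.emplace_back(
            [&output, &compressed, i] { output.emit(i, compressed[i]); });
    }
    for (std::thread& worker : workers) worker.join();
    return output.sink();
}
```

Even though the thread for the last file starts first, `run_reverse_order_demo()` returns `"ABC"`, because each `emit` blocks until the previous file has posted its successor's semaphore.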
### Determine the number of threads to create
Using `get_nprocs()`, we determine the number of processors available on the
system. This number is used as the maximum thread count, unless the system does
not have multiple cores, in which case the program defaults to 5 threads. The
program does not create more threads than needed (except for the 5 default
threads).
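The rule above might look roughly like the following. This is a sketch, not the project's code: the names `choose_thread_count`, `thread_count_for`, and `kFallbackThreads` are hypothetical, and it assumes "needed" means one thread per input file. `get_nprocs()` is a glibc extension declared in `<sys/sysinfo.h>`.

```cpp
#include <sys/sysinfo.h>  // get_nprocs(), a glibc extension

#include <algorithm>

// Fallback used when the system reports a single processor.
constexpr int kFallbackThreads = 5;

// Pure decision logic, kept separate so it can be checked without
// depending on the machine it runs on.
int choose_thread_count(int processors, int file_count) {
    if (processors <= 1) return kFallbackThreads;
    // Assumption: one thread per file is the most that is ever needed.
    return std::min(processors, file_count);
}

// Convenience wrapper that queries the running system.
int thread_count_for(int file_count) {
    return choose_thread_count(get_nprocs(), file_count);
}
```

For example, under these assumptions an 8-processor machine compressing 3 files would start 3 threads, while the same machine compressing 20 files would start 8.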
### Efficiency of each thread
By **memory mapping** input files, using a **thread pool**, and storing
compressed data in memory until it is each file's turn to print, we can
efficiently perform each piece of work in parallel.
### Access the input files efficiently
We used **memory mapping** to access the input files efficiently. This gives us
easier and quicker access to the file contents. In addition, the memory mapping
occurs in the worker threads, which allows input files to be read and processed
concurrently.
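A minimal sketch of read-only memory mapping with POSIX `mmap` follows. The `MappedFile` wrapper is a hypothetical name introduced for this example; a worker thread would construct one per input file and read the bytes directly from `data()`.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper around a read-only mapping of a whole file.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        const int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed: " + path);

        struct stat info;
        if (fstat(fd, &info) != 0) {
            close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        size_ = static_cast<std::size_t>(info.st_size);

        if (size_ > 0) {  // mmap rejects zero-length mappings
            void* addr = mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (addr == MAP_FAILED) {
                close(fd);
                throw std::runtime_error("mmap failed: " + path);
            }
            data_ = static_cast<const char*>(addr);
        }
        close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_ != nullptr) munmap(const_cast<char*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* data() const { return data_; }  // nullptr for an empty file
    std::size_t size() const { return size_; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Because each mapping is created inside the thread that uses it, no lock is needed around the file contents themselves.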
### Coordinating multiple threads
We used a lock to protect shared data (the job queue), a semaphore to keep
worker threads from running while the queue is empty, and multiple semaphores
to order the printed output.
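The lock and the "queue is not empty" semaphore can be combined as in the sketch below. It assumes `std::mutex` for the lock and a POSIX `sem_t` for the counter; the `Job` fields and the `JobQueue` class are hypothetical names for illustration and may differ from the project's own types.

```cpp
#include <semaphore.h>

#include <cerrno>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical job record: which file to compress and its output position.
struct Job {
    std::size_t index;
    std::string path;
};

// Unbounded job queue: the mutex protects the queue itself, and the
// semaphore counts queued jobs so consumers sleep while it is empty.
class JobQueue {
public:
    JobQueue() { sem_init(&available_, 0, 0); }
    ~JobQueue() { sem_destroy(&available_); }
    JobQueue(const JobQueue&) = delete;
    JobQueue& operator=(const JobQueue&) = delete;

    void push(Job job) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            jobs_.push(std::move(job));
        }
        sem_post(&available_);  // wake one waiting worker
    }

    Job pop() {
        // Block until a job exists, retrying if a signal interrupts the wait.
        while (sem_wait(&available_) == -1 && errno == EINTR) {}
        std::lock_guard<std::mutex> guard(lock_);
        Job job = std::move(jobs_.front());
        jobs_.pop();
        return job;
    }

private:
    std::mutex lock_;
    sem_t available_;
    std::queue<Job> jobs_;
};
```

Posting the semaphore only after the job is in the queue means a worker that wakes up from `sem_wait` is always guaranteed to find at least one job once it holds the lock.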
### Terminating threads in the thread pool
We added a `kill` boolean to the job struct (the struct that is added to the
job queue). Whenever a worker thread receives a new job, it checks the `kill`
boolean. If `kill` is `true`, the thread stops taking jobs and exits
appropriately.
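The shutdown scheme described here can be sketched as a worker loop that stops at the first job whose `kill` flag is set. Only the `kill` field comes from the description above; `PoolJob`, `PoolQueue`, and `worker_loop` are hypothetical names, and the actual compression step is left as a comment.

```cpp
#include <semaphore.h>

#include <cerrno>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical job record; `kill` marks a shutdown request.
struct PoolJob {
    bool kill;
    std::string path;
};

// Small lock-plus-semaphore queue so this sketch stands on its own.
class PoolQueue {
public:
    PoolQueue() { sem_init(&pending_, 0, 0); }
    ~PoolQueue() { sem_destroy(&pending_); }
    PoolQueue(const PoolQueue&) = delete;
    PoolQueue& operator=(const PoolQueue&) = delete;

    void push(PoolJob job) {
        {
            std::lock_guard<std::mutex> guard(lock_);
            jobs_.push(std::move(job));
        }
        sem_post(&pending_);
    }

    PoolJob pop() {
        while (sem_wait(&pending_) == -1 && errno == EINTR) {}
        std::lock_guard<std::mutex> guard(lock_);
        PoolJob job = std::move(jobs_.front());
        jobs_.pop();
        return job;
    }

private:
    std::mutex lock_;
    sem_t pending_;
    std::queue<PoolJob> jobs_;
};

// Body of one pool thread. Returns how many real jobs it handled.
int worker_loop(PoolQueue& queue) {
    int handled = 0;
    while (true) {
        PoolJob job = queue.pop();
        if (job.kill) break;  // shutdown request: stop taking work
        // ... map job.path, compress it, and wait for its turn to print ...
        ++handled;
    }
    return handled;
}
```

Under this scheme the main thread would push one `kill` job per worker after the last real job and then join the threads, since each `kill` job stops exactly one worker.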
## Strengths and Weaknesses
Strengths:
- Parallelizes the compression algorithm
- Saves compressed data to memory before printing
- Prevents computation and printing bottleneck
- Faster than `wzip`
- Handles potential system call errors
Weaknesses:
- Only one thread per file
- Uses Run Length Encoding (RLE), which only compresses well when the input
  contains long runs of repeated characters