https://github.com/randikabanura/bdat_assignment
- Host: GitHub
- URL: https://github.com/randikabanura/bdat_assignment
- Owner: randikabanura
- License: mit
- Created: 2023-02-25T16:39:48.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-03-04T05:15:31.000Z (over 2 years ago)
- Last Synced: 2025-03-23T18:52:50.592Z (3 months ago)
- Language: HiveQL
- Size: 2.89 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# BDAT Assignment (MapReduce vs Spark)
## Dataset
According to a 2010 report by the US Federal Aviation Administration,
domestic flight delays cost passengers, airlines and other parts of the economy
32.9 billion dollars a year. More than half of that amount comes
from the pockets of passengers, who not only lose time waiting for their planes to leave but
also miss connecting flights, spend money on food and have to sleep in hotel rooms while they're stranded.

## MapReduce via HiveQL
With Hive installed, the HQL script can be run with the following arguments:
```shell
--hivevar delay_type_col_name= # (CarrierDelay, etc.)
--hiveconf hive.session.id=calculate-flight-delay--1 # (CarrierDelay, etc.)
--hiveconf hive.execution.engine=mr
```

## Spark
With Spark installed, the Python script can be run with the following arguments:
```shell
--data_source
--output_uri
--delay_type_col_name # (CarrierDelay, etc.)
--iterations 1
```

## Comparison
See the table below for details on the time taken across queries and iterations.
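
For anyone who wants to reproduce such timings, the following is a minimal sketch of a wall-clock timing harness, not the method used in this repository. The helper names (`hive_command`, `spark_command`, `time_command`) and any script paths passed to them are hypothetical; only the `--hivevar`, `--hiveconf` and Spark script arguments come from the sections above. It assumes the HQL script is passed with Hive's `-f` option and the Python script is launched through `spark-submit`.

```python
import statistics
import subprocess
import sys
import time


def hive_command(script_path, delay_type_col_name):
    """Build a Hive CLI call with the arguments listed in the HiveQL section.

    `script_path` is a hypothetical placeholder for the HQL script.
    """
    return [
        "hive",
        "--hivevar", f"delay_type_col_name={delay_type_col_name}",
        "--hiveconf", "hive.execution.engine=mr",
        "-f", script_path,
    ]


def spark_command(script_path, data_source, output_uri,
                  delay_type_col_name, iterations=1):
    """Build a spark-submit call with the arguments listed in the Spark section.

    `script_path` is a hypothetical placeholder for the Python script.
    """
    return [
        "spark-submit", script_path,
        "--data_source", data_source,
        "--output_uri", output_uri,
        "--delay_type_col_name", delay_type_col_name,
        "--iterations", str(iterations),
    ]


def time_command(command, runs=1):
    """Run `command` `runs` times and return the wall-clock seconds of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises if the command fails
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in command so the sketch runs without Hive or Spark installed.
    demo = [sys.executable, "-c", "pass"]
    timings = time_command(demo, runs=3)
    print(f"mean: {statistics.mean(timings):.3f}s over {len(timings)} runs")
```

To time the real jobs, replace the stand-in command with the result of `hive_command(...)` or `spark_command(...)`. Wall-clock time measured this way includes process startup, which can be a noticeable share of short runs.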
## Presentation and Demo
The following video presents MapReduce vs Spark and shows how each task is executed.
[Watch the video](https://drive.google.com/file/d/10x7jTuetRrKrgC8gFRyjz__U_6FlX7qn/view?usp=share_link)
## Author
Name: [Banura Randika Perera](https://github.com/randikabanura)
Linkedin: [randika-banura](https://www.linkedin.com/in/randika-banura/)
Email: [[email protected]](mailto:[email protected])

## Show your support
Please ⭐️ this repository if this project helped you!
## License
See [LICENSE](LICENSE) © [randikabanura](https://github.com/randikabanura/)