https://github.com/bernhard-42/world-indicators-spark
- Host: GitHub
- URL: https://github.com/bernhard-42/world-indicators-spark
- Owner: bernhard-42
- Created: 2016-02-17T15:11:42.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-02-17T16:57:16.000Z (over 9 years ago)
- Last Synced: 2025-04-07T18:02:30.138Z (6 months ago)
- Size: 23.4 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
README
Given the [World Development Indicator Data](https://www.kaggle.com/worldbank/world-development-indicators/downloads/world-development-indicators-release-2016-01-28-06-31-53.zip) from [Kaggle public data](https://www.kaggle.com/worldbank/world-development-indicators),
this [Apache Zeppelin](https://zeppelin.incubator.apache.org/) notebook tries to solve the following tasks:
- Load the data from HDFS (CSV with commas inside quoted fields)
- Transform the data from stacked to unstacked, i.e. indicator keys as columns instead of rows
- Store the data as [ORC](https://orc.apache.org/)
- Reload the data from [ORC](https://orc.apache.org/) and run some simple queries
- Come up with a SQL solution for the same result using the stacked (original) data

A Markdown copy of the Zeppelin notebook can be found in [Code.md](Code.md).
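
To make the stacked-to-unstacked idea concrete, here is a minimal sketch that uses only the Python standard library instead of Spark, so it is not the notebook's code (that lives in Code.md). It parses a CSV with commas inside quoted fields and pivots the stacked rows with plain SQL via conditional aggregation. The column names follow what the Kaggle `Indicators.csv` is assumed to look like, the sample values are made up, and `load_stacked` and `unstack_with_sql` are hypothetical helper names.

```python
import csv
import io
import sqlite3

# Toy rows in the stacked layout (one row per country, indicator and year).
# The header is an assumption about Indicators.csv; the values are made up.
SAMPLE_CSV = '''CountryName,CountryCode,IndicatorName,IndicatorCode,Year,Value
"Egypt, Arab Rep.",EGY,"Population, total",SP.POP.TOTL,2000,100
"Egypt, Arab Rep.",EGY,GDP (current US$),NY.GDP.MKTP.CD,2000,200
"Korea, Rep.",KOR,"Population, total",SP.POP.TOTL,2000,300
'''


def load_stacked(text):
    """Parse CSV text; the csv module keeps commas inside quoted fields."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (r["CountryName"], r["CountryCode"], r["IndicatorCode"],
         int(r["Year"]), float(r["Value"]))
        for r in reader
    ]


def unstack_with_sql(rows, indicator_codes):
    """Pivot stacked rows into one row per (country, year) using SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE indicators (country_name TEXT, country_code TEXT, "
        "indicator_code TEXT, year INTEGER, value REAL)"
    )
    conn.executemany("INSERT INTO indicators VALUES (?, ?, ?, ?, ?)", rows)
    # Conditional aggregation: one MAX(CASE ...) column per indicator code.
    pivot_cols = ", ".join(
        "MAX(CASE WHEN indicator_code = ? THEN value END)"
        for _ in indicator_codes
    )
    sql = (
        f"SELECT country_code, year, {pivot_cols} FROM indicators "
        "GROUP BY country_code, year ORDER BY country_code, year"
    )
    result = conn.execute(sql, indicator_codes).fetchall()
    conn.close()
    return result


rows = load_stacked(SAMPLE_CSV)
wide = unstack_with_sql(rows, ["SP.POP.TOTL", "NY.GDP.MKTP.CD"])
for row in wide:
    print(row)
```

Indicators missing for a given country and year come back as `None` (SQL `NULL`). The same `GROUP BY` with one `CASE` expression per indicator should carry over to Spark SQL, though that has not been checked against the notebook.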