https://github.com/bernhard-42/world-indicators-spark


Host: GitHub
URL: https://github.com/bernhard-42/world-indicators-spark
Owner: bernhard-42
Created: 2016-02-17T15:11:42.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2016-02-17T16:57:16.000Z (over 9 years ago)
Last Synced: 2025-04-07T18:02:30.138Z (6 months ago)
Size: 23.4 KB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: Readme.md

README

Given the [World Development Indicator Data](https://www.kaggle.com/worldbank/world-development-indicators/downloads/world-development-indicators-release-2016-01-28-06-31-53.zip) from [Kaggle public data](https://www.kaggle.com/worldbank/world-development-indicators), this [Apache Zeppelin](https://zeppelin.incubator.apache.org/) notebook tries to solve the following tasks:

- Load the data from HDFS (csv with commas in quotes)

- Transform the data from stacked to unstacked, i.e. indicator keys as columns instead of rows

- Store data as [ORC](https://orc.apache.org/)

- Reload the data from [ORC](https://orc.apache.org/) and run some simple queries

- Come up with a SQL solution for the same result using the stacked (original) data
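
The stacked-to-unstacked step and its SQL counterpart can be sketched outside Spark. The following is a minimal illustration using only the Python standard library, not the notebook's actual code (that is in Code.md). The column names (`CountryName`, `CountryCode`, `IndicatorName`, `IndicatorCode`, `Year`, `Value`), the sample values, and the helper names `load_stacked`, `unstack`, and `unstack_sql` are assumptions made for this example. SQLite stands in for Spark SQL, and the ORC steps are not covered.

```python
import csv
import io
import sqlite3

# Made-up sample in stacked (long) form: one row per country, indicator and year.
# Column names and values are illustrative assumptions, not taken from the dataset.
SAMPLE_CSV = '''CountryName,CountryCode,IndicatorName,IndicatorCode,Year,Value
"Korea, Rep.",KOR,"Population, total",SP.POP.TOTL,2000,100.0
"Korea, Rep.",KOR,"GDP (current US$)",NY.GDP.MKTP.CD,2000,5.0
Germany,DEU,"Population, total",SP.POP.TOTL,2000,200.0
Germany,DEU,"GDP (current US$)",NY.GDP.MKTP.CD,2000,9.0
'''


def load_stacked(text):
    """Parse CSV text; the csv module keeps commas inside quoted fields intact."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["Year"] = int(row["Year"])
        row["Value"] = float(row["Value"])
    return rows


def unstack(rows):
    """Pivot to one record per (CountryCode, Year), keyed by indicator code."""
    wide = {}
    for row in rows:
        key = (row["CountryCode"], row["Year"])
        wide.setdefault(key, {})[row["IndicatorCode"]] = row["Value"]
    return wide


def unstack_sql(rows, codes):
    """Same pivot in SQL: conditional aggregation over the stacked table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE indicators "
        "(CountryCode TEXT, IndicatorCode TEXT, Year INTEGER, Value REAL)"
    )
    con.executemany(
        "INSERT INTO indicators VALUES (?, ?, ?, ?)",
        [(r["CountryCode"], r["IndicatorCode"], r["Year"], r["Value"]) for r in rows],
    )
    # One MAX(CASE ...) column per requested indicator code. The codes are bound
    # as parameters; the column aliases c0, c1, ... are positional.
    cols = ", ".join(
        f"MAX(CASE WHEN IndicatorCode = ? THEN Value END) AS c{i}"
        for i in range(len(codes))
    )
    sql = (
        f"SELECT CountryCode, Year, {cols} FROM indicators "
        "GROUP BY CountryCode, Year ORDER BY CountryCode, Year"
    )
    result = con.execute(sql, list(codes)).fetchall()
    con.close()
    return result


rows = load_stacked(SAMPLE_CSV)
wide = unstack(rows)
table = unstack_sql(rows, ["SP.POP.TOTL", "NY.GDP.MKTP.CD"])
```

Here `wide` maps each `(CountryCode, Year)` pair to a dictionary of indicator values, while `table` holds the same information as SQL result rows with one column per requested indicator. Conditional aggregation needs the list of indicator codes up front, which is the main trade-off against a dynamic pivot.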

A Markdown copy of the Zeppelin notebook can be found in [Code.md](Code.md)
