https://github.com/alexitc/challenge-etl
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/alexitc/challenge-etl
- Owner: AlexITC
- Created: 2017-04-13T22:05:22.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2017-04-14T20:58:33.000Z (about 9 years ago)
- Last Synced: 2025-08-04T11:22:29.747Z (11 months ago)
- Language: Scala
- Size: 43.3 MB
- Stars: 1
- Watchers: 2
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.md
# ODA-Hackathon
## ETL Challenge
This project implements a very simple ETL (extract, transform, load) pipeline.
The main application, built with Scala and Spark, takes a CSV file or a MySQL table as input and loads it into a Hadoop file system (HDFS).
When the application runs, it checks whether the model already exists and, if it does, performs an incremental update.
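The existence check described above can be pictured with a minimal sketch. It is not the project's actual code: the `<output-location>/<key>` path layout, the `WriteMode` type, and the names `modelPath` and `chooseMode` are all assumptions made for illustration, and the existence check is passed in as a function so the sketch runs without a Hadoop cluster.

```scala
object IncrementalDecision {
  // Hypothetical write modes; the README does not say how the real application models this.
  sealed trait WriteMode
  case object CreateModel extends WriteMode
  case object AppendToModel extends WriteMode

  // Assumed layout: each model is stored under <output-location>/<key>.
  def modelPath(outputLocation: String, key: String): String =
    s"${outputLocation.stripSuffix("/")}/$key"

  // `exists` stands in for a real file system lookup (for example against HDFS).
  def chooseMode(path: String, exists: String => Boolean): WriteMode =
    if (exists(path)) AppendToModel else CreateModel
}
```

In a Spark job, a decision like this would typically translate into choosing between `SaveMode.Append` and an initial write, though the README does not confirm which approach the application takes.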
## Running
Before running the application, install the following dependencies:
- JDK 8
- SBT
### Importing CSV file
Run the following command:
```
sbt "run csv [output-location] [key] [csv-file-path]"
```
Where:
- [output-location] is the location where the imported data is stored.
- [key] is the name of the model that you are importing.
- [csv-file-path] is the path to the CSV file to import.
For example, run the following to import the example `check-in.csv` file:
```
sbt "run csv hdfs://localhost:9000 checkin csv/check-in.csv"
```
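To make the argument order concrete, here is a hedged sketch of how the three CSV arguments could be read from the command line. The `CsvImport` case class and the `parse` function are hypothetical names, not taken from the project.

```scala
object CsvArgsSketch {
  // Hypothetical holder for the three arguments that follow the "csv" subcommand.
  final case class CsvImport(outputLocation: String, key: String, csvFilePath: String)

  // Returns Right(parsed) for a well-formed "csv" invocation, Left(usage message) otherwise.
  def parse(args: Array[String]): Either[String, CsvImport] = args match {
    case Array("csv", outputLocation, key, csvFilePath) =>
      Right(CsvImport(outputLocation, key, csvFilePath))
    case _ =>
      Left("usage: run csv [output-location] [key] [csv-file-path]")
  }
}
```

With the example command above, `parse` would receive `Array("csv", "hdfs://localhost:9000", "checkin", "csv/check-in.csv")`.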
### Importing MySQL table
Run the following command:
```
sbt "run mysql [output-location] [table] [host] [port] [database] [user] [password]"
```
Where:
- [output-location] is the location where the imported data is stored.
- [table] is the name of the table that you are importing (it is handled the same way as the key when importing a CSV file).
- [host] is the MySQL host.
- [port] is the MySQL port.
- [database] is the MySQL database where the table is located.
- [user] is the MySQL user.
- [password] is the MySQL password.
For example, run the following to import the example `person` table:
```
sbt "run mysql hdfs://localhost:9000 person 127.0.1.1 33060 hackathon root root"
```
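The connection arguments map naturally onto a JDBC URL. The sketch below shows one way that mapping could look. `MysqlImport`, `jdbcUrl`, and `parse` are hypothetical names, and the README does not state how the application actually builds its connection, so treat this as an illustration only.

```scala
import scala.util.Try

object MysqlArgsSketch {
  // Hypothetical holder for the seven arguments that follow the "mysql" subcommand.
  final case class MysqlImport(
      outputLocation: String,
      table: String,
      host: String,
      port: Int,
      database: String,
      user: String,
      password: String) {
    // Standard MySQL JDBC URL shape: jdbc:mysql://host:port/database
    def jdbcUrl: String = s"jdbc:mysql://$host:$port/$database"
  }

  // Returns Right(parsed) for a well-formed invocation, Left(error message) otherwise.
  def parse(args: Array[String]): Either[String, MysqlImport] = args match {
    case Array("mysql", outputLocation, table, host, port, database, user, password) =>
      Try(port.toInt).toOption match {
        case Some(p) => Right(MysqlImport(outputLocation, table, host, p, database, user, password))
        case None    => Left(s"invalid port: $port")
      }
    case _ =>
      Left("usage: run mysql [output-location] [table] [host] [port] [database] [user] [password]")
  }
}
```

For the example command above, `jdbcUrl` evaluates to `jdbc:mysql://127.0.1.1:33060/hackathon`.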
## Development
For development tasks, you also need to install `docker` and `docker-compose`.
To set up the environment, run `docker-compose up` and then `./fill_mysql.sh`.
Use `sbt compile` to compile the application, and run the application using the root user.