https://github.com/gaussalgo/mlp_2017_workshop_hadoop
- Host: GitHub
- URL: https://github.com/gaussalgo/mlp_2017_workshop_hadoop
- Owner: gaussalgo
- Created: 2017-04-20T21:43:18.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-06-01T10:51:39.000Z (over 7 years ago)
- Last Synced: 2024-11-08T12:34:35.627Z (2 months ago)
- Size: 1.73 MB
- Stars: 1
- Watchers: 1
- Forks: 5
- Open Issues: 0
# MLP 2017 - Advanced data analysis on Hadoop clusters workshop
This repository holds the materials for the Machine Learning Prague 2017 workshop created by Gauss Algorithmic, focusing on methods and techniques of advanced data analysis we've used in enterprise environments.
## Goal of the workshop
The goal of this workshop is to give attendees a blueprint for building an end-to-end, enterprise-ready ML solution and to demonstrate its use on typical corporate ML use cases (telco, digital marketing).
## Speakers & mentors
Johnson Darkwah - Big Data Solution Architect - Gauss Algorithmic
Karel Vaculik - Data Scientist - Gauss Algorithmic
Jiri Polcar - Chief Data Scientist - Gauss Algorithmic
Balazs Gaspar - Pre-sales Engineer - Cloudera
## Setup
To run the workshop successfully, we suggest forking this repo and then cloning your fork to a local machine or directly to your cloud instances. If you come across any mistakes, don't hesitate to contact us or open an issue in the GitHub repo.
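The fork-and-clone step could look roughly like the sketch below. It assumes you have already created the fork with the Fork button on GitHub. The `clone_fork` helper and the `upstream` remote name are illustrative choices, not part of the workshop materials, and the username you pass in is a placeholder for your own account.

```shell
#!/bin/sh
# Sketch only: clone your fork and (optionally) track the original repository.
REPO="mlp_2017_workshop_hadoop"

# clone_fork is a hypothetical helper; its argument is your GitHub username.
clone_fork() {
    github_user="$1"
    # Clone your fork (created beforehand on GitHub).
    git clone "https://github.com/${github_user}/${REPO}.git" || return 1
    # Optional: register the original repository as "upstream"
    # so that later fixes can be pulled into your fork.
    git -C "$REPO" remote add upstream "https://github.com/gaussalgo/${REPO}.git"
}
```

After sourcing the snippet, call `clone_fork your-username` on your local machine or on each cloud instance where you want the materials.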
## Workshop assumptions
The workshop material assumes you have sufficient knowledge and experience in:
- Preparing a Linux platform for production use (CentOS)
- Programming in Python and/or Scala
- Working with your cloud provider's environment
## Workshop topics
* Basics of production Hadoop ecosystems.
* Challenges of production data science work.
* Architecture and other concepts.
* Cluster installation.
* [Telco churn use case](https://github.com/gaussalgo/MLP_2017_workshop)