https://github.com/jswong65/machine_learning_nanodegree
Projects of Udacity Machine Learning nanodegree
machine-learning numpy pandas python scikit-learn scipy
- Host: GitHub
- URL: https://github.com/jswong65/machine_learning_nanodegree
- Owner: jswong65
- Created: 2016-10-30T19:42:53.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-03-26T19:17:54.000Z (almost 7 years ago)
- Last Synced: 2024-10-29T20:06:40.425Z (2 months ago)
- Topics: machine-learning, numpy, pandas, python, scikit-learn, scipy
- Language: HTML
- Homepage:
- Size: 3.45 MB
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Udacity Machine Learning Nanodegree
Project implementations for the Udacity Machine Learning Nanodegree. These projects cover different aspects of machine learning, including **Supervised Learning**, **Unsupervised Learning**, **Reinforcement Learning**, and **Model Evaluation & Validation**.

Several Python data analysis packages are used for the project implementations.
* **Numpy**: Performs numerical operations.
* **Pandas**: Data I/O, manipulation, and visualization.
* **Matplotlib, seaborn**: Data visualization.
* **scikit-learn**: Builds, trains, and tests machine learning models.

Many datasets used in these projects can be found on the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).

| Project | Description |
|---------|-------------|
| [titanic_survival_exploratio](https://github.com/jswong65/Machine_Learning_Nano_Degree/tree/master/titanic_survival_exploratio) | **An intro project to Machine Learning.** Explores variables that can be used to predict the survival of Titanic passengers, including socio-economic class, gender, age, and fare. The results imply that **gender**, **age**, and **socio-economic class** can be important variables for prediction. |
| [boston_housing](https://github.com/jswong65/Machine_Learning_Nano_Degree/tree/master/boston_housing) | **Model Evaluation & Validation.** The goal of this project is to predict Boston housing prices. <ul><li>Apply **DecisionTreeRegressor** to predict the housing prices.</li><li>Evaluate the model with the **R-squared** score, the **learning curve**, and the **model complexity curve** (bias-variance trade-off).</li><li>Use **grid search** and **K-fold cross-validation** to find the parameters that optimize the prediction model.</li></ul> |
| [finding_donors](https://github.com/jswong65/Machine_Learning_Nanodegree/tree/master/finding_donors) | **Supervised Learning.** The goal of this project is to find donors for a charity. <ul><li>Data preprocessing<ul><li>Log transformation for skewed continuous variables</li><li>Data normalization for numerical variables (MinMaxScaler)</li><li>One-hot encoding for categorical variables (pandas.get_dummies)</li></ul></li><li>Train, evaluate, and compare three classifiers, **KNeighborsClassifier**, **RandomForestClassifier** (bagging), and **GradientBoostingClassifier** (boosting), using both accuracy and the F-beta score.</li><li>Use **grid search** and **cross-validation** to find the parameters for model optimization.</li><li>Use principal component analysis (PCA) to reduce the dimensions of the data.</li></ul> |
| [customer_segments](https://github.com/jswong65/Machine_Learning_Nano_Degree/tree/master/customer_segments) | **Unsupervised Learning.** The goal of this project is to create customer segments. <ul><li>Feature exploration<ul><li>Use a **box plot** and a **histogram** to examine the distribution of individual variables</li><li>Leverage a **scatter plot matrix** and a **heatmap** to study correlation between variables</li><li>Apply **multiple coordinate** to investigate relationships between multiple variables</li></ul></li><li>Data preprocessing<ul><li>Perform feature scaling (using the natural logarithm) to reduce the skewness of highly skewed data</li><li>Apply **Tukey's method** to identify the outliers to be removed</li></ul></li><li>Compare **K-means clustering** and the **Gaussian mixture model** (GMM) for data clustering.</li><li>Apply **GMM** to perform data clustering, and use the **silhouette coefficient** as well as the **Bayesian information criterion** (BIC) to choose the number of clusters.</li></ul> |
| [smartcab](https://github.com/jswong65/Machine_Learning_Nano_Degree/tree/master/smartcab) | **Reinforcement Learning.** The goal of this project is to train a smartcab to drive. <ul><li>Apply **Q-Learning** to teach a cab to drive safely and efficiently in a simulation.</li><li>Identify appropriate features for modeling the smartcab in its environment (building a state).</li><li>Attach rewards and punishments to different outcomes to teach the cab to reach the destination as soon as possible without causing an accident.</li></ul> |
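
The grid search and K-fold cross-validation step listed for boston_housing can be sketched roughly as follows. This is a minimal illustration, not the project's notebook code: the data is synthetic, and the `max_depth` range, the five folds, and the random seeds are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): one feature, noisy sine target.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

# Try tree depths 1..10 and score each with R-squared over 5 folds.
search = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    scoring="r2",
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]  # depth with the best mean R-squared
best_score = search.best_score_                # mean cross-validated R-squared
```

A shallow tree tends to underfit (high bias) and a very deep tree tends to overfit (high variance), which is why the depth is selected by cross-validated score rather than training score.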
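
The finding_donors row compares classifiers on both accuracy and the F-beta score. The sketch below shows how the two metrics can differ on the same predictions. The labels are made up, the helper names are hypothetical, and `beta=0.5` (which weights precision more than recall) is an assumption, since the table does not state the value used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fbeta(y_true, y_pred, beta=0.5):
    """F-beta score for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up labels: 3 positives out of 8, the model finds only one of them.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

acc = accuracy(y_true, y_pred)
f_half = fbeta(y_true, y_pred, beta=0.5)
f_one = fbeta(y_true, y_pred, beta=1.0)
```

On imbalanced data, accuracy can look acceptable while the F-beta score stays low, which is one reason to report both.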
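
The customer_segments preprocessing (log scaling followed by Tukey's method) might look like the sketch below. The sample values are invented, the function name is hypothetical, and the multiplier `k=1.5` is the conventional choice for Tukey's fences, assumed here rather than taken from the project.

```python
import numpy as np


def tukey_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value lies outside Tukey's fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    step = k * (q3 - q1)  # k times the interquartile range
    return (values < q1 - step) | (values > q3 + step)


# Invented spending figures with one extreme value.
spending = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 95.0])

log_spending = np.log(spending)          # natural log reduces right skew
mask = tukey_outlier_mask(log_spending)  # flag points outside the fences
cleaned = log_spending[~mask]            # keep only the non-outliers
```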
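
Choosing the number of GMM clusters with the silhouette coefficient and BIC, as described for customer_segments, can be sketched as below. The two synthetic blobs, the candidate range of 2 to 5 components, and the seeds are all assumptions for illustration. A higher silhouette coefficient is better, while a lower BIC is better.

```python
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic stand-in data (assumption): two well-separated 2-D blobs.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(100, 2)),
    rng.normal(loc=(5.0, 5.0), scale=0.5, size=(100, 2)),
])

results = {}
for k in range(2, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    labels = gmm.predict(X)
    results[k] = {
        "silhouette": silhouette_score(X, labels),  # cluster separation, in [-1, 1]
        "bic": gmm.bic(X),                          # fit penalized by model size
    }

best_by_silhouette = max(results, key=lambda k: results[k]["silhouette"])
best_by_bic = min(results, key=lambda k: results[k]["bic"])
```

The two criteria do not always agree on real data, so it can help to inspect both before settling on a cluster count.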
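
The Q-Learning step in the smartcab project can be illustrated with a tabular update rule. This is a minimal sketch and not the project's agent: the `QLearner` class, the action names, the example states, the reward values, and the `alpha`, `gamma`, and `epsilon` settings are all hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = [None, "forward", "left", "right"]  # hypothetical action set


class QLearner:
    """Tabular Q-learning with an epsilon-greedy policy."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor for future rewards
        self.epsilon = epsilon  # probability of taking a random action
        self.rng = random.Random(seed)
        # Q[state][action] starts at 0.0 for every unseen state.
        self.q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def choose_action(self, state):
        """Explore with probability epsilon, otherwise pick a best-valued action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        q_values = self.q[state]
        best = max(q_values.values())
        return self.rng.choice([a for a, v in q_values.items() if v == best])

    def update(self, state, action, reward, next_state):
        """Move Q(state, action) toward reward + gamma * max Q(next_state, .)."""
        best_next = max(self.q[next_state].values())
        target = reward + self.gamma * best_next
        self.q[state][action] += self.alpha * (target - self.q[state][action])


learner = QLearner()
state = ("red", "forward")         # hypothetical (traffic light, waypoint) state
next_state = ("green", "forward")

# Running a red light is punished; waiting is rewarded.
learner.update(state, "forward", reward=-10.0, next_state=next_state)
learner.update(state, None, reward=2.0, next_state=next_state)
```

After these two updates the learner prefers waiting at a red light, because that action now has the highest Q-value for the state.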