https://github.com/jswong65/machine_learning_nanodegree

Projects of Udacity Machine Learning nanodegree

machine-learning numpy pandas python scikit-learn scipy

# Udacity Machine Learning Nanodegree

Project implementations for the Udacity Machine Learning Nanodegree. These projects cover different aspects of machine learning, including **Supervised Learning**, **Unsupervised Learning**, **Reinforcement Learning**, and **Model Evaluation & Validation**.

Several Python data-analysis packages are used to implement the projects:
* **NumPy**: Performs numerical operations.
* **pandas**: Data I/O, manipulation, and visualization.
* **Matplotlib, seaborn**: Data visualization.
* **scikit-learn**: Builds, trains, and tests machine learning models.

Many of the datasets used in these projects are available from the [UC Irvine Machine Learning Repository](https://archive.ics.uci.edu/ml/index.php).

| Project | Description |
|---------|-------------|
| [titanic_survival_exploratio](https://github.com/jswong65/Machine_Learning_Nano_Degree/tree/master/titanic_survival_exploratio) | **An introductory project to machine learning.** Explores various variables that can be used to predict the survival of Titanic passengers, including socio-economic class, gender, age, and fare. The results suggest that **gender**, **age**, and **socio-economic class** are important variables for prediction. |
| [boston_housing](https://github.com/jswong65/Machine_Learning_Nano_Degree/tree/master/boston_housing) | **Model Evaluation & Validation.** The goal of this project is to predict Boston housing prices.<ul><li>Apply **DecisionTreeRegressor** to predict housing prices.</li><li>Evaluate the model with the **R-squared** score, the **learning curve**, and the **model complexity curve** to study the bias-variance trade-off.</li><li>Use **grid search** and **K-fold cross-validation** to find the parameters that optimize the prediction model.</li></ul> |
| [finding_donors](https://github.com/jswong65/Machine_Learning_Nanodegree/tree/master/finding_donors) | **Supervised Learning.** The goal of this project is to find donors for a charity.<ul><li>Data preprocessing<ul><li>Log transformation of skewed continuous variables</li><li>Normalization of numerical variables (MinMaxScaler)</li><li>One-hot encoding of categorical variables (pandas.get_dummies)</li></ul></li><li>Train, evaluate, and compare three classifiers, **KNeighborsClassifier**, **RandomForestClassifier** (bagging), and **GradientBoostingClassifier** (boosting), using both accuracy and F-beta score.</li><li>Use **grid search** and **cross-validation** to find the parameters for model optimization.</li><li>Use principal component analysis (PCA) to reduce the dimensionality of the data.</li></ul> |
| [customer_segments](https://github.com/jswong65/Machine_Learning_Nano_Degree/tree/master/customer_segments) | **Unsupervised Learning.** The goal of this project is to create customer segments.<ul><li>Feature exploration<ul><li>Use **box plots** and **histograms** to examine the distribution of individual variables</li><li>Use a **scatter plot matrix** and a **heatmap** to study correlations between variables</li><li>Use **multiple coordinate** plots to investigate relationships among multiple variables</li></ul></li><li>Data preprocessing<ul><li>Scale features with the natural logarithm to reduce the skewness of highly skewed data</li><li>Apply **Tukey's method** to identify outliers for removal</li></ul></li><li>Compare **K-means clustering** and the **Gaussian mixture model** (GMM) for clustering the data.</li><li>Apply **GMM** to cluster the data, using the **silhouette coefficient** and the **Bayesian information criterion** (BIC) to choose the number of clusters.</li></ul> |
| [smartcab](https://github.com/jswong65/Machine_Learning_Nano_Degree/tree/master/smartcab) | **Reinforcement Learning.** The goal of this project is to train a smartcab to drive.<ul><li>Apply **Q-Learning** to teach a cab to drive safely and efficiently in a simulation.</li><li>Identify appropriate features for modeling the smartcab's state in the environment.</li><li>Attach rewards and penalties to different outcomes so that the cab learns to reach its destination as quickly as possible without causing an accident.</li></ul> |
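
The exploration in **titanic_survival_exploratio** comes down to comparing survival rates across passenger groups. A minimal pandas sketch of that idea follows. The tiny DataFrame is made up for illustration, and the column names (`Survived`, `Sex`, `Pclass`, `Age`), the age buckets, and the helper `survival_rate_by` are assumptions rather than the project's actual code.

```python
import pandas as pd

# Hypothetical toy sample; the rows are invented, column names are assumed.
passengers = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 0, 1, 0, 1],
    "Sex": ["female", "male", "female", "male", "female", "male", "male", "female"],
    "Pclass": [1, 3, 2, 3, 3, 1, 2, 1],
    "Age": [29, 22, 4, 35, 18, 54, 40, 30],
})

def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Fraction of passengers who survived, grouped by one column."""
    return df.groupby(column, observed=True)["Survived"].mean()

# Bucket the continuous Age variable so it can be compared group-wise.
passengers["AgeGroup"] = pd.cut(
    passengers["Age"], bins=[0, 10, 40, 100], labels=["child", "adult", "senior"]
)

for col in ["Sex", "Pclass", "AgeGroup"]:
    print(survival_rate_by(passengers, col), end="\n\n")
```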
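
For **boston_housing**, the grid search with K-fold cross-validation and the model complexity curve can be sketched with scikit-learn as below. This is not the project's code: `make_regression` generates a synthetic stand-in for the housing data (scikit-learn removed its bundled Boston dataset in version 1.2), and the `max_depth` range, the fold counts, and the helper name `fit_model` are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import (GridSearchCV, KFold, train_test_split,
                                     validation_curve)
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for housing features and prices.
X, y = make_regression(n_samples=300, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

def fit_model(X, y):
    """Grid-search max_depth of a decision tree with 10-fold CV, scored by R^2."""
    grid = GridSearchCV(
        estimator=DecisionTreeRegressor(random_state=0),
        param_grid={"max_depth": list(range(1, 11))},
        scoring="r2",
        cv=KFold(n_splits=10, shuffle=True, random_state=0),
    )
    grid.fit(X, y)
    return grid.best_estimator_

model = fit_model(X_train, y_train)
print("best max_depth:", model.get_params()["max_depth"])
print("test R^2:", r2_score(y_test, model.predict(X_test)))

# Model complexity curve: mean training vs. validation R^2 for each depth.
depths = list(range(1, 11))
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X_train, y_train,
    param_name="max_depth", param_range=depths,
    cv=KFold(n_splits=5, shuffle=True, random_state=0), scoring="r2",
)
for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d:2d}  train R^2={tr:.3f}  validation R^2={va:.3f}")
```

A growing gap between training and validation R^2 at larger depths is the usual sign of high variance (overfitting), while low scores on both at small depths indicate high bias.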
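
The three preprocessing steps listed for **finding_donors** map onto a few lines of NumPy, pandas, and scikit-learn. The sketch below uses a hypothetical census-style frame whose column names and values are invented. It applies `np.log1p` (the log of 1 + x) so that zero values stay finite, which may differ from the exact transform used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows; column names and values are illustrative assumptions.
raw = pd.DataFrame({
    "age": [39.0, 50.0, 38.0, 53.0],
    "capital-gain": [2174.0, 0.0, 0.0, 14084.0],
    "hours-per-week": [40.0, 13.0, 40.0, 50.0],
    "workclass": ["State-gov", "Self-emp", "Private", "Private"],
})

skewed = ["capital-gain"]
numerical = ["age", "capital-gain", "hours-per-week"]
categorical = ["workclass"]

features = raw.copy()
# 1. Log-transform heavily skewed columns; log1p keeps zeros at zero.
features[skewed] = np.log1p(features[skewed])
# 2. Rescale each numerical column to the [0, 1] range.
features[numerical] = MinMaxScaler().fit_transform(features[numerical])
# 3. One-hot encode categorical columns into indicator columns.
features = pd.get_dummies(features, columns=categorical)

print(features.columns.tolist())
```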
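
Comparing the three **finding_donors** classifiers on both accuracy and F-beta score might look like the following. Synthetic, imbalanced data from `make_classification` stands in for the census data, `beta=0.5` (weighting precision more heavily than recall) is an assumed choice, and each model uses default hyperparameters before any grid search.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, fbeta_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary data with a 75/25 class imbalance.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.75, 0.25], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

classifiers = {
    "KNeighbors": KNeighborsClassifier(),
    "RandomForest (bagging)": RandomForestClassifier(random_state=0),
    "GradientBoosting (boosting)": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, clf in classifiers.items():
    y_pred = clf.fit(X_train, y_train).predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    f_half = fbeta_score(y_test, y_pred, beta=0.5)
    results[name] = (acc, f_half)
    print(f"{name:28s} accuracy={acc:.3f}  F0.5={f_half:.3f}")
```

On imbalanced data, accuracy alone can look good for a model that mostly predicts the majority class, which is why the F-beta score is reported alongside it.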
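
In **customer_segments**, Tukey's method flags a value as an outlier when it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. A minimal sketch that applies it after the natural-log transform is shown below. The spending figures, the column names, and the helper `tukey_outliers` are hypothetical.

```python
import numpy as np
import pandas as pd

def tukey_outliers(df: pd.DataFrame, k: float = 1.5) -> list:
    """Sorted row indices lying outside [Q1 - k*IQR, Q3 + k*IQR] in any column."""
    flagged = set()
    for col in df.columns:
        q1, q3 = np.percentile(df[col], [25, 75])
        step = k * (q3 - q1)
        outside = (df[col] < q1 - step) | (df[col] > q3 + step)
        flagged.update(df.index[outside].tolist())
    return sorted(flagged)

# Hypothetical annual spending per customer in two product categories.
spending = pd.DataFrame({
    "Fresh": [12669, 7057, 6353, 13265, 22615, 9413, 12126, 7579, 5963, 3],
    "Milk": [9656, 9810, 8808, 2000, 5410, 8259, 3199, 4956, 3648, 6000],
})

log_spending = np.log(spending)  # natural log reduces right skew
outliers = tukey_outliers(log_spending)
clean = log_spending.drop(index=outliers)
print("outlier rows:", outliers)  # [9]: the tiny "Fresh" value
```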
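
Choosing the number of clusters for **customer_segments** with the silhouette coefficient and BIC can be sketched as a loop over candidate component counts. Here `make_blobs` generates a synthetic stand-in for the preprocessed customer data, and the range of 2 to 6 components is an assumption. A higher silhouette coefficient and a lower BIC both indicate a better fit.

```python
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic 2-D data standing in for the (log-scaled, outlier-free) features.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

scores = {}
for n in range(2, 7):
    gmm = GaussianMixture(n_components=n, random_state=0).fit(X)
    labels = gmm.predict(X)  # hard cluster assignment for each point
    scores[n] = (silhouette_score(X, labels), gmm.bic(X))
    print(f"n={n}  silhouette={scores[n][0]:.3f}  BIC={scores[n][1]:.1f}")

best_by_silhouette = max(scores, key=lambda n: scores[n][0])
best_by_bic = min(scores, key=lambda n: scores[n][1])
print("best by silhouette:", best_by_silhouette, " best by BIC:", best_by_bic)
```

The two criteria do not always agree; when they differ, the choice usually also weighs how interpretable the resulting segments are.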
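
The **smartcab** Q-Learning bullet rests on the tabular update Q(s, a) ← Q(s, a) + α·(r + γ·max Q(s′, ·) - Q(s, a)). A generic sketch with epsilon-greedy exploration follows. The action set (`None`, `"forward"`, `"left"`, `"right"`), the class name `QLearningAgent`, and the state tuple, rewards, and hyperparameters in the usage lines are hypothetical rather than taken from the project.

```python
import random
from collections import defaultdict

# Assumed action set for a cab at an intersection (None = stay put).
ACTIONS = (None, "forward", "left", "right")

class QLearningAgent:
    """Tabular Q-learning with epsilon-greedy exploration (a generic sketch)."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor for future rewards
        self.epsilon = epsilon  # probability of taking a random action
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def best_action(self, state):
        """Greedy action; ties are broken at random."""
        values = [self.q[(state, a)] for a in ACTIONS]
        top = max(values)
        return self.rng.choice([a for a, v in zip(ACTIONS, values) if v == top])

    def choose_action(self, state):
        """Explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return self.best_action(state)

    def learn(self, state, action, reward, next_state):
        """Move Q(s, a) toward reward + gamma * max_a' Q(s', a')."""
        future = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + self.gamma * future
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

# Hypothetical state: (next waypoint, traffic light, oncoming traffic).
state = ("forward", "green", None)
agent = QLearningAgent(alpha=0.5, gamma=0.0, epsilon=0.0)
agent.learn(state, "forward", reward=2.0, next_state=state)
agent.learn(state, "left", reward=-1.0, next_state=state)
print(agent.choose_action(state))  # forward
```

Breaking ties at random keeps an untrained agent from always favoring the first entry in `ACTIONS`, and setting `epsilon` to 0 in the usage lines makes the final choice deterministic.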