https://github.com/mightypixel/mightylab

A collection of small projects in the field of data science.

concept data-science machine-learning python spark study


# MightyLab
A collection of small projects in the field of data science. Each project is independent and serves mostly to demonstrate a concept and help me understand it better. This repository is mostly for keeping track of my study, but hopefully it can be useful to someone else too. :)

## About
I recently finished Stanford's [Machine Learning by Andrew Ng](https://www.coursera.org/learn/machine-learning) course on Coursera. Sadly, sharing the programming assignments is against the Honor Code, which I respect, so I will not upload them here. If you are looking for copy-paste material, this is not the place. To study the topics from the course further, I plan to implement the concepts in Python. I won't translate the Octave code into Python; instead I'll try to improve the prediction models and maybe find more interesting datasets.

Besides the Coursera-inspired mini-projects, I plan to add my notes and homework for the [Sofia University FMI course on machine learning with Python](http://fmi.machine-learning.bg/ "Course Website"), as well as some big data processing programs written with Spark and Scala (some of them related to [Advanced Analytics with Spark](http://shop.oreilly.com/product/0636920035091.do)).

Last but not least, I'll keep my own data-related projects here until they prove worthy of their own repository.

I hope this repository can help someone else on their learning path. Either way, if you have feedback, please tweet me [@mightypixel](https://twitter.com/mightypixel) or open a PR on GitHub.

## Roadmap
- Fundamentals:
  - Linear regression
    - [Crime vs Population correlation](https://github.com/MightyPixel/MightyLab/blob/master/Fundamentals/linear_regression-crime_vs_population/population_vs_crimes.ipynb)
    - [House price prediction](https://github.com/MightyPixel/MightyLab/blob/master/Fundamentals/linear_regression-house_sales/house_sales.ipynb)
  - Logistic regression
    - [Exam Scores](/Fundamentals/logistic_regression/Logistic\ Regression.ipynb)
  - Multi-class classification with neural network
  - SVM
  - K-Means clustering and PCA
  - Anomaly Detection
  - Recommendation systems
- Dataset exploration:
  - [Titanic](https://github.com/MightyPixel/MightyLab/blob/master/Fundamentals/logistic_regression-titanic/Titanic.ipynb)
  - Exoplanet Hunting in Deep Space (TODO: https://www.kaggle.com/keplersmachines/kepler-labelled-time-series-data)
  - Uber Drives (TODO: https://www.kaggle.com/zusmani/uberdrives)
- [Big data with Spark](./Analysis/README.md)
  - [word count (aka hello world in spark)](https://github.com/MightyPixel/MightyLab/blob/master/wikipedia/src/main/scala/com/oangelov/wikipedia/wikipedia/WordCount.scala)
  - [Wikipedia](./Analysis/Wikipedia.scala)
  - [stackoverflow](./Analysis/stackoverflow.scala)
  - [linkage](./Analysis/linkage.scala)
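
For readers who have not met the word count example before, here is a minimal sketch of the idea in plain Python. It is not the Scala code from this repository and it does not use Spark at all. The function name `word_count` and the sample sentences are made up for illustration. The comments name the Spark operations (`flatMap`, `map`, `reduceByKey`) that each step loosely corresponds to.

```python
from collections import defaultdict


def word_count(lines):
    """Count how often each word occurs in an iterable of text lines.

    Hypothetical helper for illustration only; it mimics the three
    stages of the classic Spark word count on a single machine.
    """
    # flatMap stage: turn all lines into one flat stream of lowercase words
    words = (word for line in lines for word in line.lower().split())
    # map stage: pair every word with the count 1
    pairs = ((word, 1) for word in words)
    # reduceByKey stage: sum the 1s for each distinct word
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)


if __name__ == "__main__":
    sample = ["to be or not to be", "that is the question"]  # made-up input
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

In Spark the same three stages run in parallel across partitions of the input, which is why this small job is often used as a first program.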