https://github.com/suvoooo/machine_learning

Some fundamental machine learning and data-analysis techniques are explained through realistic examples.

machine-learning pandas python3 seaborn sklearn

## Machine Learning and Data Analysis

### This repo contains introductions to, and examples of, some of the most important machine learning and data-analysis techniques.
#### Filenames are preceded by DDMMYY. For descriptions and more, check the Wiki page.
#### A dedicated _Deep Learning Repository_ similar to this one is [here](https://github.com/suvoooo/Learn-TensorFlow).
----------------------------------------------------------------------------------------------------------------------------

#### Libraries
![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54) ![NumPy](https://img.shields.io/badge/numpy-%23013243.svg?style=for-the-badge&logo=numpy&logoColor=white) ![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white) ![scikit-learn](https://img.shields.io/badge/scikit--learn-%23F7931E.svg?style=for-the-badge&logo=scikit-learn&logoColor=white) ![TensorFlow](https://img.shields.io/badge/TensorFlow-%23FF6F00.svg?style=for-the-badge&logo=TensorFlow&logoColor=white) ![SciPy](https://img.shields.io/badge/SciPy-%230C55A5.svg?style=for-the-badge&logo=scipy&logoColor=%white) ![pymc3](https://drive.google.com/uc?export=view&id=1oi-5--D8kcgJdVV_GAI-pZq-ZKr0STOX)

-----------------------------------------------------------------------------------------------------------------------------------------

*PCA_Muller.py 190818:* Principal component analysis example with breast cancer data-set.
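A minimal sketch of the kind of analysis this entry describes, assuming the copy of the breast cancer data that ships with scikit-learn (`load_breast_cancer`) and a two-component projection; the function name `pca_breast_cancer` is hypothetical and not taken from `PCA_Muller.py`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_breast_cancer(n_components=2):
    """Standardize the features, then project onto the leading principal components."""
    X, y = load_breast_cancer(return_X_y=True)
    # PCA is sensitive to feature scale, so standardize first.
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components)
    X_pca = pca.fit_transform(X_scaled)
    return X_pca, y, pca.explained_variance_ratio_


X_pca, y, ratios = pca_breast_cancer()
print("projected shape:", X_pca.shape)
print("explained variance ratio:", ratios)
```

The explained variance ratio indicates how much of the total variance each retained component captures.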

*270918: RidgeandLin.py, LassoandLin.py:* Lasso and Ridge regression examples.
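As a rough illustration of how Ridge and Lasso differ from plain linear regression, the sketch below fits all three on synthetic data. The data, the `alpha` value, and the function name `compare_linear_models` are assumptions for illustration and do not come from the repository scripts.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge


def compare_linear_models(alpha=1.0, seed=0):
    # Synthetic data (an assumption): 10 features, only 3 carry signal.
    X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                           noise=10.0, random_state=seed)
    models = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=alpha),   # L2 penalty shrinks coefficients
        "lasso": Lasso(alpha=alpha),   # L1 penalty can zero some out
    }
    return {name: model.fit(X, y).coef_ for name, model in models.items()}


coefs = compare_linear_models()
for name, coef in coefs.items():
    print(f"{name:>6}: L2 norm = {np.linalg.norm(coef):.2f}, "
          f"exact zeros = {int(np.sum(coef == 0))}")
```

Ridge shrinks the coefficient vector relative to ordinary least squares, while Lasso can drive some coefficients exactly to zero, which is why it is often used for feature selection.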

*081018: bank.csv*, data set on a Portuguese company selling products to random customers over phone call(s). The data-set description is available [here](http://archive.ics.uci.edu/ml/datasets/Bank+Marketing).

*161018: gender_purchase.csv*, data-set of two columns describing customers buying a product depending on gender.

*111118: winequality-red.csv*, red wine data set, where the output is the quality column which ranges from 0 to 10.

*121118: pipelineWine.py*, A simple example of using Pipeline and GridSearchCV together on the red wine data.
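The general pattern of combining a pipeline with a grid search might look like the sketch below. It uses synthetic stand-in data rather than `winequality-red.csv`, and the choice of `SVC`, the parameter grid, and the function name `tune_pipeline` are assumptions, not the contents of `pipelineWine.py`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_pipeline(seed=0):
    # Stand-in data (an assumption); the repository script uses the red wine CSV.
    X, y = make_classification(n_samples=300, n_features=8, random_state=seed)
    pipe = Pipeline([
        ("scaler", StandardScaler()),  # refit inside every CV fold
        ("svc", SVC(kernel="rbf")),
    ])
    # "<step name>__<parameter>" routes each grid entry to its pipeline step.
    param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}
    search = GridSearchCV(pipe, param_grid, cv=5)
    search.fit(X, y)
    return search


search = tune_pipeline()
print(search.best_params_, round(search.best_score_, 3))
```

Putting the scaler inside the pipeline means it is fitted only on the training part of each fold, which avoids leaking information from the validation part.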

*24112018: lagmult.py*, This program demonstrates a simple constrained optimization problem using figures.
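For a sense of what a simple constrained optimization looks like numerically, here is a small sketch using Lagrange multipliers. The problem (minimize x² + y² subject to x + y = 1) is a hypothetical example and is not necessarily the one plotted in `lagmult.py`.

```python
import numpy as np


def solve_lagrange_system():
    """Minimize x^2 + y^2 subject to x + y = 1 via the Lagrange conditions.

    Setting the gradient of L = x^2 + y^2 - lam * (x + y - 1) to zero
    gives a linear system in (x, y, lam):
        2x - lam = 0,   2y - lam = 0,   x + y = 1
    """
    A = np.array([[2.0, 0.0, -1.0],
                  [0.0, 2.0, -1.0],
                  [1.0, 1.0, 0.0]])
    b = np.array([0.0, 0.0, 1.0])
    x, y, lam = np.linalg.solve(A, b)
    return x, y, lam


x, y, lam = solve_lagrange_system()
print(f"x = {x:.3f}, y = {y:.3f}, lambda = {lam:.3f}")
```

The solution is x = y = 0.5 with multiplier 1: the point on the line x + y = 1 closest to the origin, where the circular contours of the objective touch the constraint.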

*11122018: Consumer_Complaints_short.csv*, 3 columns describing the complaints, product_label and category. The complete file can be obtained from [Govt.data](https://catalog.data.gov/dataset/consumer-complaint-database/resource/2f297213-7198-4be1-af1e-2d2623e7f6e9).

*13122018: Text-classification_compain_suvo.py*, Classifies the consumer complaints data described above.

*1912018: SVMdemo.py*, This program shows the effect of using an RBF kernel to map from 2D space to 3D space. The animation requires ffmpeg on a Unix system.
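The idea of lifting 2D points into 3D with an RBF can be sketched as follows. This is a simplified illustration with a single RBF bump centred at the origin, not the full kernel feature map and not the code in `SVMdemo.py`; the ring data and the function names are assumptions.

```python
import numpy as np


def make_rings(n=100, seed=0):
    """Two concentric classes in 2D: inner radius below 1, outer radius in [2, 3]."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, size=2 * n)
    radius = np.concatenate([rng.uniform(0, 1, n), rng.uniform(2, 3, n)])
    X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
    y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return X, y


def lift_with_rbf(X, gamma=1.0):
    """Append a third coordinate z = exp(-gamma * ||x||^2)."""
    z = np.exp(-gamma * np.sum(X ** 2, axis=1))
    return np.column_stack([X, z])


X, y = make_rings()
X3d = lift_with_rbf(X)
# In 3D, a horizontal plane z = const separates the two classes.
print("min z (inner class):", X3d[y == 0, 2].min())
print("max z (outer class):", X3d[y == 1, 2].max())
```

The two classes cannot be split by a straight line in 2D, but after the lift the inner points sit high on the bump and the outer points sit near zero, so a plane separates them.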

*05032019: IBM_Python_Web_Scrapping.ipynb*, Deals with basic web scraping, string handling, and image manipulation.

*06042019: datacleaning*, Folder containing files and images related to data cleaning with pandas.

*08062010: DBSCAN_Complete*, Folder containing files and images related to applying the DBSCAN algorithm to cluster weather stations in Canada.
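A minimal DBSCAN sketch on toy data, to show how the algorithm labels dense groups and marks isolated points as noise. The points, the `eps` and `min_samples` values, and the function name are assumptions for illustration; the folder itself works with Canadian weather station data.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def cluster_toy_points(seed=0):
    # Toy data (an assumption): two tight blobs plus one isolated point.
    rng = np.random.default_rng(seed)
    blob_a = rng.normal(loc=(0.0, 0.0), scale=0.1, size=(50, 2))
    blob_b = rng.normal(loc=(5.0, 5.0), scale=0.1, size=(50, 2))
    outlier = np.array([[10.0, -10.0]])
    X = np.vstack([blob_a, blob_b, outlier])
    # eps: neighbourhood radius; min_samples: neighbours needed for a core point.
    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
    return X, labels


X, labels = cluster_toy_points()
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters:", n_clusters, "| noise points:", int(np.sum(labels == -1)))
```

Unlike k-means, DBSCAN does not need the number of clusters in advance, and points that belong to no dense region receive the label `-1`.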

*13072019: SVM_Decision_Boundary*, Pipeline and GridSearchCV are used to find the best-fit parameters for an SVM, and the decision-function contours of the SVM classifier for binary classification are then plotted.

*28122019: DecsTree*, Folder contains a notebook using a decision tree classifier on the [Bank Marketing Data-Set](http://archive.ics.uci.edu/ml/datasets/Bank+Marketing).

*07032020: Conjugate Prior*, Folder contains a notebook that discusses the concept of conjugate priors, including an introduction to [PyMC3](https://docs.pymc.io/).
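The textbook example of a conjugate prior is the Beta prior for a binomial likelihood, where the posterior is again a Beta distribution. The sketch below shows that update in plain Python, without PyMC3; the function names and the numbers are hypothetical and not taken from the notebook.

```python
def beta_binomial_update(alpha, beta, heads, tosses):
    """Posterior Beta parameters after `heads` successes in `tosses` trials.

    A Beta(alpha, beta) prior with a binomial likelihood gives a
    Beta(alpha + heads, beta + tosses - heads) posterior.
    """
    if not 0 <= heads <= tosses:
        raise ValueError("heads must lie between 0 and tosses")
    return alpha + heads, beta + (tosses - heads)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical numbers: Beta(2, 2) prior, then 7 heads in 10 tosses.
post_a, post_b = beta_binomial_update(2, 2, heads=7, tosses=10)
print(post_a, post_b, round(beta_mean(post_a, post_b), 3))  # 9 5 0.643
```

Because the posterior stays in the same family, the update reduces to adding counts to the prior parameters, with no sampling or numerical integration required.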

*29052020: ExMax_Algo*, Folder contains a notebook with a complete explanation of the Expectation Maximization algorithm.
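As a compact illustration of the E-step and M-step, here is a sketch of EM for a one-dimensional mixture of two Gaussians in NumPy. The mixture model, the initialization, and the synthetic data are assumptions chosen for brevity and may differ from the example in the notebook.

```python
import numpy as np


def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)


def em_two_gaussians(x, n_iter=100):
    """EM for a 1D two-component Gaussian mixture.

    Returns weights, means, variances, and the log-likelihood per iteration.
    """
    weights = np.array([0.5, 0.5])
    means = np.array([x.min(), x.max()])        # crude but deterministic start
    variances = np.array([x.var(), x.var()])
    log_liks = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = weights * normal_pdf(x[:, None], means, variances)  # shape (n, 2)
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        log_liks.append(np.log(total).sum())
        # M-step: re-estimate parameters from responsibility-weighted data.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances, np.array(log_liks)


# Synthetic data (an assumption): two unit-variance components at -2 and 3.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
weights, means, variances, log_liks = em_two_gaussians(x)
print("means:", np.round(means, 2), "weights:", np.round(weights, 2))
```

Each iteration cannot decrease the log-likelihood, so tracking `log_liks` is a convenient sanity check on an implementation.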

*11092020: AdaptiveLoss.ipynb*, File contains a description and a simple implementation of the robust and adaptive loss function. [Original Paper by J. Barron](https://arxiv.org/pdf/1701.03077.pdf). More details on [TDS](https://medium.com/@saptashwa/the-most-awesome-loss-function-172ffc106c99).
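The general robust loss from the linked paper can be sketched in NumPy for a fixed shape parameter `alpha`. This covers only the fixed-shape form; the adaptive variant, where `alpha` is learned during training, is not shown, and the function name `general_robust_loss` is hypothetical rather than taken from the notebook.

```python
import numpy as np


def general_robust_loss(x, alpha, c=1.0):
    """General robust loss for residual x, shape alpha, and scale c.

    alpha = 2 gives L2, alpha = 1 a smoothed L1 (Charbonnier),
    alpha = 0 Cauchy, alpha = -2 Geman-McClure, alpha = -inf Welsch.
    """
    z = (np.asarray(x, dtype=float) / c) ** 2
    # The closed form is singular at these three values, so use their limits.
    if alpha == 2:
        return 0.5 * z
    if alpha == 0:
        return np.log1p(0.5 * z)
    if alpha == -np.inf:
        return 1.0 - np.exp(-0.5 * z)
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)


residuals = np.array([0.0, 1.0, 3.0, 10.0])
for a in (2, 1, 0, -2, -np.inf):
    print(a, np.round(general_robust_loss(residuals, a), 3))
```

Lower values of `alpha` grow more slowly for large residuals, which reduces the influence of outliers on the fit.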

*31092020: pima_diabetes.ipynb*, File contains a description of data preparation and of choosing the best machine learning algorithm for a binary classification task.
A few more details are on the [kaggle kernel](https://www.kaggle.com/suvoooo/eda-and-choosing-best-classifier-on-pima-diabetes).

*15112020: terrorism_kaggle.ipynb*, Notebook contains elaborate examples of how to think about problems and interpret large-scale data using the [Global Terrorism Database](https://www.kaggle.com/START-UMD/gtd). Apart from the Pandas Groupby and Crosstab methods, I have also used the Folium and Basemap libraries to visualize Leaflet maps and 2D data on maps, respectively. More on [The Startup](https://medium.com/swlh/practical-data-analysis-using-pandas-global-terrorism-database-20b29009adad).

*15022021: FocalLoss_Ex.ipynb*, Notebook contains a detailed explanation of how Focal Loss works. Please read the original [Focal Loss paper](https://arxiv.org/abs/1708.02002). An example of implementing Focal Loss using TensorFlow is also shown. For more detail, check the post on [TDS](https://towardsdatascience.com/a-loss-function-suitable-for-class-imbalanced-data-focal-loss-af1702d75d75).
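The binary focal loss formula from the paper can be sketched in NumPy as below. This is an illustration in NumPy rather than the TensorFlow implementation in the notebook, and the function name and default values are assumptions.

```python
import numpy as np


def binary_focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    # p_t is the predicted probability of the true class.
    p_t = np.where(y_true == 1, p_pred, 1.0 - p_pred)
    # alpha_t weights the positive class by alpha, the negative by 1 - alpha.
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


y = [1, 0, 1, 0]
p = [0.9, 0.1, 0.3, 0.6]
print("gamma = 0:", binary_focal_loss(y, p, gamma=0.0, alpha=0.5))
print("gamma = 2:", binary_focal_loss(y, p, gamma=2.0, alpha=0.5))
```

With `gamma = 0` and `alpha = 0.5` the expression reduces to half the usual binary cross-entropy. Increasing `gamma` down-weights examples that are already classified confidently, so training focuses on the hard ones.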

*19062021: Augly_Try.ipynb*, Notebook contains examples of image augmentation using [Facebook's AugLy](https://ai.facebook.com/blog/augly-a-new-data-augmentation-library-to-help-build-more-robust-ai-models/) library. For more detail, check the notebook and the [TDS](https://towardsdatascience.com/facebook-just-launched-the-coolest-augmentation-library-augly-3910c05db505) post.

*24122021: NB_LogisticReg.ipynb*, Notebook explains the connection between Gaussian Naive Bayes and Logistic Regression and determines the parameters of Logistic Regression starting from GNB. The notebook is self-explanatory, but you can also check the [TDS post](https://towardsdatascience.com/connecting-naive-bayes-and-logistic-regression-binary-classification-ce69e527157f).
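The connection can be checked numerically with a short sketch. It assumes the common textbook setting in which each feature's variance is shared across the two classes, so that the GNB posterior takes exactly the logistic form; the function names and parameter values are hypothetical and not taken from the notebook.

```python
import numpy as np


def gnb_to_logistic(mu0, mu1, var, prior1):
    """Logistic-regression weights implied by GNB with class-independent variances."""
    mu0, mu1, var = (np.asarray(a, dtype=float) for a in (mu0, mu1, var))
    w = (mu1 - mu0) / var
    b = np.log(prior1 / (1.0 - prior1)) + np.sum((mu0 ** 2 - mu1 ** 2) / (2.0 * var))
    return w, b


def gnb_posterior(x, mu0, mu1, var, prior1):
    """P(y = 1 | x) computed directly from Bayes' rule."""
    x, mu0, mu1, var = (np.asarray(a, dtype=float) for a in (x, mu0, mu1, var))

    def log_lik(mu):
        return np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mu) ** 2 / var)

    log_joint1 = np.log(prior1) + log_lik(mu1)
    log_joint0 = np.log(1.0 - prior1) + log_lik(mu0)
    return 1.0 / (1.0 + np.exp(log_joint0 - log_joint1))


# Hypothetical parameters for two features.
mu0, mu1, var, prior1 = [0.0, 0.0], [1.0, 2.0], [1.5, 0.5], 0.3
x = np.array([0.3, -1.2])
w, b = gnb_to_logistic(mu0, mu1, var, prior1)
print("sigmoid(w.x + b):", 1.0 / (1.0 + np.exp(-(w @ x + b))))
print("GNB posterior:   ", gnb_posterior(x, mu0, mu1, var, prior1))
```

Both lines print the same probability, because the quadratic terms in `x` cancel in the log-odds when the variances do not depend on the class.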

------------------------

## License

Distributed under the Apache License. See `LICENSE.md` for details.

-----------------------------
## Contacts

[Saptashwa](https://www.linkedin.com/in/saptashwa/).