https://github.com/suvoooo/machine_learning
Some fundamental machine learning and data-analysis techniques are explained through realistic examples.
machine-learning pandas python3 seaborn sklearn
Last synced: 3 days ago
- Host: GitHub
- URL: https://github.com/suvoooo/machine_learning
- Owner: suvoooo
- License: other
- Created: 2018-09-19T03:30:27.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-09-18T14:55:05.000Z (4 months ago)
- Last Synced: 2025-01-13T05:03:30.623Z (3 days ago)
- Topics: machine-learning, pandas, python3, seaborn, sklearn
- Language: Jupyter Notebook
- Homepage:
- Size: 52.3 MB
- Stars: 120
- Watchers: 9
- Forks: 202
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
## Machine Learning and Data Analysis
### This repo contains an introduction to, and examples of, some of the most important machine learning and data-analysis techniques.
#### Filenames are preceded by the date (DDMMYY or DDMMYYYY). For descriptions and more, check the Wiki page.
#### A dedicated _Deep Learning Repository_ similar to this one is [here](https://github.com/suvoooo/Learn-TensorFlow).
------------------------

#### Libraries

![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54) ![NumPy](https://img.shields.io/badge/numpy-%23013243.svg?style=for-the-badge&logo=numpy&logoColor=white) ![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white) ![scikit-learn](https://img.shields.io/badge/scikit--learn-%23F7931E.svg?style=for-the-badge&logo=scikit-learn&logoColor=white) ![TensorFlow](https://img.shields.io/badge/TensorFlow-%23FF6F00.svg?style=for-the-badge&logo=TensorFlow&logoColor=white) ![SciPy](https://img.shields.io/badge/SciPy-%230C55A5.svg?style=for-the-badge&logo=scipy&logoColor=%white) ![pymc3](https://drive.google.com/uc?export=view&id=1oi-5--D8kcgJdVV_GAI-pZq-ZKr0STOX)

------------------------

*PCA_Muller.py 190818:* Principal component analysis example with the breast cancer data set.
*270918: RidgeandLin.py, LassoandLin.py:* Lasso and Ridge regression examples.
*081018: bank.csv*, data set describing a Portuguese company selling products to random customers over phone call(s). The data-set description is available [here](http://archive.ics.uci.edu/ml/datasets/Bank+Marketing).
*161018: gender_purchase.csv*, data-set of two columns describing customers buying a product depending on gender.
*111118: winequality-red.csv*, red wine data set, where the output is the quality column which ranges from 0 to 10.
*121118: pipelineWine.py*, A simple example of applying Pipeline and GridSearchCV together using the red wine data.
*24112018: lagmult.py*, This program demonstrates a simple constrained optimization problem using figures.
*11122018: Consumer_Complaints_short.csv*, 3 columns describing the complaints, product_label and category. Complete file can be obtained from [Govt.data](https://catalog.data.gov/dataset/consumer-complaint-database/resource/2f297213-7198-4be1-af1e-2d2623e7f6e9).
*13122018: Text-classification_compain_suvo.py*, Classifies the consumer complaints data described above.
*1912018: SVMdemo.py*, This program shows the effect of using an RBF kernel to map from 2D space to 3D space. The animation requires ffmpeg on Unix systems.
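
As a rough illustration of the idea behind this entry (not the script itself), the sketch below lifts 2D points to 3D by adding a single Gaussian radial feature `z = exp(-gamma * (x^2 + y^2))`. The helper names `lift_to_3d` and `ring`, the value of `gamma`, and the toy ring data are assumptions made for this example. Strictly speaking, the RBF kernel corresponds to an infinite-dimensional feature space, so one radial feature is only a convenient picture.

```python
import math

def lift_to_3d(point, gamma=1.0):
    """Append a Gaussian radial feature z = exp(-gamma * r^2) to a 2D point."""
    x, y = point
    return (x, y, math.exp(-gamma * (x * x + y * y)))

def ring(radius, n=8):
    """Return n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

# Two concentric rings: no straight line separates them in 2D.
inner = [lift_to_3d(p) for p in ring(0.5)]
outer = [lift_to_3d(p) for p in ring(2.0)]

# After the lift, a horizontal plane z = threshold separates the two classes.
threshold = 0.5 * (min(p[2] for p in inner) + max(p[2] for p in outer))
```

Every lifted inner point sits above the plane and every lifted outer point below it, which is the effect a 2D-to-3D animation would typically visualize.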
*05032019: IBM_Python_Web_Scrapping.ipynb*, Deals with basic web scraping, string handling, and image manipulation.
*06042019: datacleaning*, Folder containing files and images related to data cleaning with pandas.
*08062010: DBSCAN_Complete*, Folder containing files and images related to applying the DBSCAN algorithm to cluster weather stations in Canada.
*13072019: SVM_Decision_Boundary*, Pipeline + GridSearchCV are used to find the best-fit parameters for an SVM, and the decision-function contours of the SVM classifier for binary classification are then plotted.
*28122019: DecsTree*, Folder contains a notebook using a decision tree classifier on the [Bank Marketing Data-Set](http://archive.ics.uci.edu/ml/datasets/Bank+Marketing).
*07032020: Conjugate Prior*, Folder contains a notebook where the concept of a conjugate prior is discussed, including an introduction to [PyMC3](https://docs.pymc.io/).
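
A minimal sketch of a conjugate update, assuming the Beta-Binomial pair as the example (the notebook may use a different pair, and it uses PyMC3 rather than plain Python). A Beta(a, b) prior on a success probability combined with k successes in n trials gives a Beta(a + k, b + n - k) posterior. The function names here are hypothetical.

```python
def beta_binomial_update(a, b, successes, trials):
    """Return the posterior (a, b) for a Beta(a, b) prior and binomial data."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return a + successes, b + (trials - successes)

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Uniform prior Beta(1, 1), then 7 successes observed in 10 trials.
post_a, post_b = beta_binomial_update(1, 1, successes=7, trials=10)
posterior_mean = beta_mean(post_a, post_b)  # 8 / (8 + 4)
```

Because the posterior stays in the Beta family, the update is just two additions and no sampling is needed.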
*29052020: ExMax_Algo*, Folder contains a notebook completely explaining the Expectation Maximization algorithm.
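
For orientation, here is a minimal sketch of EM on a two-component 1D Gaussian mixture in plain Python. The choice of model, the synthetic data, the initialisation, and all function names are assumptions of this example and are not taken from the notebook.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1D Gaussian mixture with EM.

    Returns (weights, means, variances, log_likelihoods).
    """
    # Crude initialisation (an assumption of this sketch): the data extremes.
    means = [min(data), max(data)]
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    log_likelihoods = []

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k])
                     for k in range(2)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([j / total for j in joint])
        log_likelihoods.append(log_lik)

        # M-step: re-estimate parameters from responsibility-weighted data.
        for k in range(2):
            n_k = sum(r[k] for r in resp)
            weights[k] = n_k / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / n_k
            variances[k] = sum(r[k] * (x - means[k]) ** 2
                               for r, x in zip(resp, data)) / n_k

    return weights, means, variances, log_likelihoods

# Synthetic data: two clusters centred at 0 and 5 (hypothetical values).
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(5.0, 1.0) for _ in range(200)])
weights, means, variances, log_likelihoods = em_two_gaussians(data)
```

The recorded log-likelihood should never decrease from one iteration to the next, which is a handy sanity check for any EM implementation.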
*11092020: AdaptiveLoss.ipynb*, File contains a description and a simple implementation of the robust and adaptive loss function. [Original Paper by J. Barron](https://arxiv.org/pdf/1701.03077.pdf). More details on [TDS](https://medium.com/@saptashwa/the-most-awesome-loss-function-172ffc106c99).
*31092020: pima_diabetes.ipynb*, File contains a description of data preparation and of choosing the best machine learning algorithm for a binary classification task. A few more details are in the [kaggle kernel](https://www.kaggle.com/suvoooo/eda-and-choosing-best-classifier-on-pima-diabetes).
*15112020: terrorism_kaggle.ipynb*, Notebook contains elaborate examples of how to think about problems and interpret large-scale data using the [Global Terrorism Database](https://www.kaggle.com/START-UMD/gtd). Apart from the Pandas Groupby and Crosstab methods, I have also used the Folium and Basemap libraries for visualizing Leaflet maps and 2D data on maps, respectively. More on [The Startup](https://medium.com/swlh/practical-data-analysis-using-pandas-global-terrorism-database-20b29009adad).
*15022021: FocalLoss_Ex.ipynb*, Notebook explains in detail how Focal Loss works. Please read the original [Focal Loss paper](https://arxiv.org/abs/1708.02002). An example of implementing Focal Loss using TensorFlow is also shown. For more detail check the post on [TDS](https://towardsdatascience.com/a-loss-function-suitable-for-class-imbalanced-data-focal-loss-af1702d75d75).
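
A minimal plain-Python sketch of the binary focal loss from the paper, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The notebook uses TensorFlow; the scalar functions, their names, and the `eps` clipping below are assumptions of this example. The defaults `gamma=2` and `alpha=0.25` follow the values highlighted in the paper.

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy for one example: p is the predicted P(class 1), y is 0 or 1."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, eps), 1.0))

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one example: cross-entropy scaled by alpha_t * (1 - p_t)^gamma."""
    p_t = p if y == 1 else 1.0 - p               # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)                # clip to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = binary_focal_loss(0.9, 1)  # well-classified positive
hard = binary_focal_loss(0.1, 1)  # badly misclassified positive
```

With `gamma=2`, the well-classified example keeps only a small fraction of its cross-entropy while the misclassified one keeps most of it, which is how the loss shifts attention to hard examples in class-imbalanced data.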
*19062021: Augly_Try.ipynb*, Notebook contains examples of image augmentation using [Facebook's Augly](https://ai.facebook.com/blog/augly-a-new-data-augmentation-library-to-help-build-more-robust-ai-models/) Library. For more detail check the notebook and [TDS](https://towardsdatascience.com/facebook-just-launched-the-coolest-augmentation-library-augly-3910c05db505) post.
*24122021: NB_LogisticReg.ipynb*, Notebook explains the connection between Gaussian Naive Bayes and Logistic Regression and determines the parameters of Logistic Regression starting from GNB. The notebook is self-explanatory, but you can also check the [TDS post](https://towardsdatascience.com/connecting-naive-bayes-and-logistic-regression-binary-classification-ce69e527157f).
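
The connection can be sketched in the simplest case, assuming one feature and a variance shared by both classes (a simplification chosen for this example, with hypothetical function names and parameter values). Under that assumption the GNB posterior is exactly a sigmoid of a linear function, with `w = (mu1 - mu0) / var` and `b = (mu0^2 - mu1^2) / (2 var) + log(prior1 / prior0)`.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gnb_posterior(x, mu0, mu1, var, prior1):
    """P(y = 1 | x) from Bayes' rule with Gaussian class-conditional densities."""
    joint1 = prior1 * gaussian_pdf(x, mu1, var)
    joint0 = (1.0 - prior1) * gaussian_pdf(x, mu0, var)
    return joint1 / (joint0 + joint1)

def logistic_params_from_gnb(mu0, mu1, var, prior1):
    """Weight and bias of the logistic model implied by the GNB parameters."""
    w = (mu1 - mu0) / var
    b = (mu0 ** 2 - mu1 ** 2) / (2 * var) + math.log(prior1 / (1.0 - prior1))
    return w, b

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical GNB parameters for illustration.
mu0, mu1, var, prior1 = -1.0, 2.0, 1.5, 0.3
w, b = logistic_params_from_gnb(mu0, mu1, var, prior1)

# The two columns printed below should agree at every x.
for x in (-3.0, 0.0, 0.5, 4.0):
    print(x, gnb_posterior(x, mu0, mu1, var, prior1), sigmoid(w * x + b))
```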
------------------------
## License
Distributed under the Apache License. Read `LICENSE.md` for details.
-----------------------------
## Contacts

[Saptashwa](https://www.linkedin.com/in/saptashwa/).