awesome-machine-learning

A curated list of awesome Machine Learning frameworks, libraries and software.
https://github.com/ml13571/awesome-machine-learning

  • Python

    • General-Purpose Machine Learning

      • deap - Evolutionary algorithm framework.
      • pydeep - Deep Learning In Python. **[Deprecated]**
      • mlxtend - A library consisting of useful tools for data science and machine learning tasks.
      • neon - Nervana's [high-performance](https://github.com/soumith/convnet-benchmarks) Python-based Deep Learning framework [DEEP LEARNING]. **[Deprecated]**
      • Neural Networks and Deep Learning - Code samples for my book "Neural Networks and Deep Learning" [DEEP LEARNING].
      • Annoy - Approximate nearest neighbours implementation.
      • TPOT - Tool that automatically creates and optimizes machine learning pipelines using genetic programming. Consider it your personal data science assistant, automating a tedious part of machine learning.
      • pgmpy
      • DIGITS - The Deep Learning GPU Training System (DIGITS) is a web application for training deep learning models.
      • Orange - Open source data visualization and data analysis for novices and experts.
      • milk - Machine learning toolkit focused on supervised classification. **[Deprecated]**
      • TFLearn - Deep learning library featuring a higher-level API for TensorFlow.
      • REP - an IPython-based environment for conducting data-driven research in a consistent and reproducible way. REP is not trying to substitute scikit-learn, but extends it and provides better user experience. **[Deprecated]**
      • rgf_python - Python bindings for Regularized Greedy Forest (Tree) Library.
      • skbayes - Python package for Bayesian Machine Learning with scikit-learn API.
      • fuku-ml - Simple machine learning library, including Perceptron, Regression, Support Vector Machine, Decision Tree and more. It is easy to use and easy to learn for beginners.
      • Xcessiv - A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling.
      • PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
      • PyTorch Lightning - The lightweight PyTorch wrapper for high-performance AI research.
      • skorch - A scikit-learn compatible neural network library that wraps PyTorch.
      • ML-From-Scratch - Implementations of Machine Learning models from scratch in Python with a focus on transparency. Aims to showcase the nuts and bolts of ML in an accessible way.
      • xRBM - A library for Restricted Boltzmann Machine (RBM) and its conditional variants in Tensorflow.
      • stacked_generalization - Implementation of machine learning stacking technique as a handy library in Python.
      • modAL - A modular active learning framework for Python, built on top of scikit-learn.
      • Cogitare
      • Parris - Parris, the automated infrastructure setup tool for machine learning algorithms.
      • Turi Create - Machine learning from Apple. Turi Create simplifies the development of custom machine learning models. You don't have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.
      • mlens - A high-performance, memory-efficient, maximally parallelized ensemble learning library, integrated with scikit-learn.
      • MindsDB - Open Source framework to streamline use of neural networks.
      • StellarGraph - Machine learning on graphs: a Python library for machine learning on graph-structured (network-structured) data.
      • BentoML
      • MiraiML - An asynchronous engine for continuous and autonomous machine learning, built for real-time usage.
      • numpy-ML
      • Neuraxle
      • Cornac - A comparative framework for multimodal recommender systems with a focus on models leveraging auxiliary data.
      • Catalyst - High-level utils for PyTorch DL & RL research. It was developed with a focus on reproducibility, fast experimentation and code/idea reuse, so that you can research and develop something new rather than write another regular train loop.
      • Fastai - High-level wrapper built on top of PyTorch which supports vision, text, tabular data and collaborative filtering.
      • scikit-multiflow - A machine learning framework for multi-output/multi-label and stream data.
      • Lightwood - A PyTorch-based framework that breaks down machine learning problems into smaller blocks that can be glued together seamlessly, with the objective of building predictive models with one line of code.
      • bayeso - A simple, but essential Bayesian optimization package, written in Python.
      • mljar-supervised - An Automated Machine Learning (AutoML) python package for tabular data. It can handle: Binary Classification, MultiClass Classification and Regression. It provides explanations and markdown reports.
      • evostra - A fast Evolution Strategy implementation in Python.
      • Determined - Scalable deep learning training platform, including integrated support for distributed training, hyperparameter tuning, experiment tracking, and model management.
      • PySyft - A Python library for secure and private Deep Learning built on PyTorch and TensorFlow.
      • OPFython - A Python-inspired implementation of the Optimum-Path Forest classifier.
      • Opytimizer - Python-based meta-heuristic optimization techniques.
      • Gradio - A Python library for quickly creating and sharing demos of models. Debug models interactively in your browser, get feedback from collaborators, and generate public links without deploying anything.
      • Hub - Fastest unstructured dataset management for TensorFlow/PyTorch. Stream & version-control data. Store even petabyte-scale data in a single numpy-like array on the cloud accessible on any machine. Visit [activeloop.ai](https://activeloop.ai) for more info.
      • Synthia - Multidimensional synthetic data generation in Python.
      • ByteHub - An easy-to-use, Python-based feature store. Optimized for time-series data.
      • Backprop - Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
      • River
      • Sklearn-genetic-opt - Hyperparameter tuning using evolutionary algorithms, with built-in callbacks, plotting, remote logging and more.
      • Evidently
      • Streamlit
      • Optuna
      • Deepchecks
      • Shapash
      • Eurybia
      • Colossal-AI - An open-source deep learning system for large-scale model training and inference with high efficiency and low cost.
      • dirty_cat - facilitates machine-learning on dirty, non-curated categories. It provides transformers and encoders robust to morphological variants, such as typos.
      • Upgini - automatically searches through thousands of ready-to-use features from public and community shared data sources and enriches your training dataset with only the accuracy improving features.
      • AutoML-Implementation-for-Static-and-Dynamic-Data-Analytics
      • SKBEL
      • NannyML - Estimates post-deployment model performance without access to targets.
      • cleanlab - A data-centric AI package for data quality and machine learning with messy, real-world data and labels.
      • AutoGluon - AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data.
      • PyBroker - Algorithmic Trading with Machine Learning.
      • Frouros
      • CometML - A best-in-class MLOps platform with experiment tracking, model production monitoring, a model registry, and data lineage from training straight through to production.
      • Okrolearn
      • Opik
      • DataComPy - A library to compare Pandas, Polars, and Spark data frames. It provides stats and lets users adjust for match accuracy.
      • DataVisualization - A GitHub repository where you can learn data visualization from basics to intermediate level.
      • Cartopy - Cartopy is a Python package designed for geospatial data processing in order to produce maps and other geospatial data analyses.
      • SciPy - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
      • AutoViz - A tool for automated visualization, described in a Medium article.
      • Numba - Python JIT (just in time) compiler to LLVM aimed at scientific Python by the developers of Cython and NumPy.
      • Mars - A tensor-based framework for large-scale data computation which is often regarded as a parallel and distributed version of NumPy.
      • igraph - Python binding to the igraph library, a general-purpose graph library.
      • Pandas - A library providing high-performance, easy-to-use data structures and data analysis tools.
      • Vaex - A high performance Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. Documentation can be found [here](https://vaex.io/docs/index.html).
      • Open Mining - Business Intelligence (BI) in Python (Pandas web interface) **[Deprecated]**
      • PyMC - Markov Chain Monte Carlo sampling toolkit.
      • zipline - A Pythonic algorithmic trading library.
      • PyDy - Short for Python Dynamics, used to assist with workflow in the modelling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.
      • SymPy - A Python library for symbolic mathematics.
      • statsmodels - Statistical modelling and econometrics in Python.
      • astropy - A community Python library for Astronomy.
      • matplotlib - A Python 2D plotting library.
      • bokeh - Interactive Web Plotting for Python.
      • altair - A Python to Vega translator.
      • d3py - A plotting library for Python, based on [D3.js](https://d3js.org/).
      • PyDexter - Simple plotting for Python. Wrapper for D3xterjs; easily render charts in-browser.
      • ggplot - Same API as ggplot2 for R. **[Deprecated]**
      • ggfortify - Unified interface to ggplot2 popular R packages.
      • Kartograph.py - Rendering beautiful SVG maps in Python.
      • PyQtGraph - A pure-python graphics and GUI library built on PyQt4 / PySide and NumPy.
      • pycascading
      • Petrel - Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.
      • Blaze - NumPy and Pandas interface to Big Data.
      • emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
      • windML - A Python Framework for Wind Energy Analysis and Prediction.
      • vispy - GPU-based high-performance interactive OpenGL 2D/3D data visualization library.
      • cerebro2 - A web-based visualization and debugging platform for NuPIC. **[Deprecated]**
      • NuPIC Studio - An all-in-one NuPIC Hierarchical Temporal Memory visualization and debugging super-tool! **[Deprecated]**
      • SparklingPandas
      • Seaborn - A python visualization library based on matplotlib.
      • ipychart - The power of Chart.js in Jupyter Notebook.
      • pastalog - Simple, realtime visualization of neural network training performance.
      • Dora - Tools for exploratory data analysis in Python.
      • Ruffus - Computation Pipeline library for python.
      • SOMPY - Self Organizing Map written in Python (Uses neural networks for data analysis).
      • somoclu - Massively parallel self-organizing maps: accelerates training on multicore CPUs, GPUs, and clusters, and has a Python API.
      • HDBScan - implementation of the hdbscan algorithm in Python - used for clustering
      • visualize_ML - A python package for data exploration and data analysis. **[Deprecated]**
      • scikit-plot - A visualization library for quick and easy generation of common plots in data analysis and machine learning.
      • Bowtie - A dashboard library for interactive visualizations using flask socketio and react.
      • lime - Lime is about explaining what machine learning classifiers (or models) are doing. It is able to explain any black box classifier, with two or more classes.
      • PyCM - PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix, and a proper tool for post-classification model evaluation that supports most classes and overall statistics parameters
      • Dash - A framework for creating analytical web applications built on top of Plotly.js, React, and Flask
      • Lambdo - A workflow engine for solving machine learning problems by combining in one analysis pipeline (i) feature engineering and machine learning (ii) model training and prediction (iii) table population and column evaluation via user-defined (Python) functions.
      • TensorWatch - Debugging and visualization tool for machine learning and data science. It extensively leverages Jupyter Notebook to show real-time visualizations of data in running processes such as machine learning training.
      • dowel - A little logger for machine learning research. Output any object to the terminal, CSV, TensorBoard, text logs on disk, and more with just one call to `logger.log()`.
      • MiniGrad
      • Map/Reduce implementations of common ML algorithms - Notebooks showing how to implement common ML algorithms (including k-means and alternating least squares) from scratch using Python NumPy, and how to then make these implementations scalable using Map/Reduce and Spark.
      • BioPy - Biologically-Inspired and Machine Learning Algorithms in Python. **[Deprecated]**
      • CAEs for Data Assimilation - Convolutional autoencoders for 3D image/field compression applied to reduced order [Data Assimilation](https://en.wikipedia.org/wiki/Data_assimilation).
      • handsonml - Fundamentals of machine learning in python.
      • SVM Explorer - Interactive SVM Explorer, using Dash and scikit-learn
      • pattern_classification
      • thinking stats 2
      • hyperopt
      • 2012-paper-diginorm
      • A gallery of interesting IPython notebooks
      • ipython-notebooks
      • data-science-ipython-notebooks - Continually updated Data Science Python Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, and various command lines.
      • decision-weights
      • Sarah Palin LDA - Topic Modelling the Sarah Palin emails.
      • Diffusion Segmentation - A collection of image segmentation algorithms based on diffusion methods.
      • Scipy Tutorials - SciPy tutorials. This is outdated, check out scipy-lecture-notes.
      • Crab - A recommendation engine library for Python.
      • BayesPy - Bayesian Inference Tools in Python.
      • scikit-learn tutorials - Series of notebooks for learning scikit-learn.
      • sentiment-analyzer - Tweets Sentiment Analyzer
      • sentiment_classifier - Sentiment classifier using word sense disambiguation.
      • group-lasso - Some experiments with the coordinate descent algorithm used in the (Sparse) Group Lasso model.
      • jProcessing - Kanji / Hiragana / Katakana to Romaji converter. Edict dictionary and parallel sentence search. Sentence similarity between two Japanese sentences. Sentiment analysis of Japanese text. Run Cabocha (ISO-8859-1 configured) in Python.
      • mne-python-notebooks - IPython notebooks for EEG/MEG data processing using mne-python.
      • Neon Course - IPython notebooks for a complete course around understanding Nervana's Neon.
      • pandas cookbook - Recipes for using Python's pandas library.
      • climin - Optimization library focused on machine learning, pythonic implementations of gradient descent, LBFGS, rmsprop, adadelta and others.
      • Allen Downey’s Data Science Course - Code for Data Science at Olin College, Spring 2014.
      • Allen Downey’s Think Complexity Code - Code for Allen Downey's book Think Complexity.
      • Allen Downey’s Think OS Code - Text and supporting code for Think OS: A Brief Introduction to Operating Systems.
      • Python Programming for the Humanities - Course for Python programming for the Humanities, assuming no prior knowledge. Heavy focus on text processing / NLP.
      • Dive into Machine Learning with Python Jupyter notebook and scikit-learn - "I learned Python by hacking first, and getting serious *later.* I wanted to do this with Machine Learning. If this is your style, join me in getting a bit ahead of yourself."
      • TDB - TensorDebugger (TDB) is a visual debugger for deep learning. It features interactive, node-by-node debugging and visualization for TensorFlow.
      • Introduction to machine learning with scikit-learn - IPython notebooks from Data School's video tutorials on scikit-learn.
      • Practical XGBoost in Python - A comprehensive online course about using XGBoost in Python.
      • Introduction to Machine Learning with Python - Notebooks and code for the book "Introduction to Machine Learning with Python"
      • Pydata book - Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media
      • Homemade Machine Learning - Python examples of popular machine learning algorithms with interactive Jupyter demos and math being explained
      • Prodmodel - Build tool for data science pipelines.
      • the-elements-of-statistical-learning - This repository contains Jupyter notebooks implementing the algorithms found in the book and summary of the textbook.
      • Hyperparameter-Optimization-of-Machine-Learning-Algorithms - Code for hyperparameter tuning/optimization of machine learning and deep learning algorithms.
      • Heart_Disease-Prediction - Given clinical parameters about a patient, can we predict whether or not they have heart disease?
      • Flight Fare Prediction - A project to gauge understanding of the machine learning workflow, and of regression techniques in particular.
      • Keras Tuner - An easy-to-use, scalable hyperparameter optimization framework that solves the pain points of hyperparameter search.
      • Kinho - Simple API for Neural Network. Better for image processing with CPU/GPU + Transfer Learning.
      • nn_builder - nn_builder is a python package that lets you build neural networks in 1 line
      • NeuralTalk - NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.
      • NeuralTalk - NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences. **[Deprecated]**
      • Neuron - Neuron is a simple class for time series predictions. It uses LNU (Linear Neural Unit), QNU (Quadratic Neural Unit), RBF (Radial Basis Function), MLP (Multi Layer Perceptron), and MLP-ELM (Multi Layer Perceptron - Extreme Learning Machine) neural networks trained with gradient descent or the Levenberg–Marquardt algorithm. **[Deprecated]**
      • Data Driven Code - Very simple implementation of neural networks for dummies in python without using any libraries, with detailed comments.
      • Machine Learning, Data Science and Deep Learning with Python - LiveVideo course that covers machine learning, Tensorflow, artificial intelligence, and neural networks.
      • Jina AI
      • sequitur
      • Rockpool - A machine learning library for spiking neural networks. Supports training with both torch and jax pipelines, and deployment to neuromorphic hardware.
      • Sinabs - A deep learning library for spiking neural networks which is based on PyTorch, focuses on fast training and supports inference on neuromorphic hardware.
      • Tonic - A library that makes downloading publicly available neuromorphic datasets a breeze and provides event-based data transformation/augmentation pipelines.
      • lifelines - lifelines is a complete survival analysis library, written in pure Python
      • Scikit-Survival - scikit-survival is a Python module for survival analysis built on top of scikit-learn. It allows doing survival analysis while utilizing the power of scikit-learn, e.g., for pre-processing or doing cross-validation.
      • Tensorflow-Federated
      • open-solution-home-credit - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Home-Credit-Default-Risk) for [Home Credit Default Risk](https://www.kaggle.com/c/home-credit-default-risk).
      • open-solution-googleai-object-detection - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Google-AI-Object-Detection-Challenge) for [Google AI Open Images - Object Detection Track](https://www.kaggle.com/c/google-ai-open-images-object-detection-track).
      • open-solution-salt-identification - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Salt-Detection) for [TGS Salt Identification Challenge](https://www.kaggle.com/c/tgs-salt-identification-challenge).
      • open-solution-ship-detection - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Ships) for [Airbus Ship Detection Challenge](https://www.kaggle.com/c/airbus-ship-detection).
      • open-solution-data-science-bowl-2018 - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Data-Science-Bowl-2018) for [2018 Data Science Bowl](https://www.kaggle.com/c/data-science-bowl-2018).
      • open-solution-value-prediction - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Santander-Value-Prediction-Challenge) for [Santander Value Prediction Challenge](https://www.kaggle.com/c/santander-value-prediction-challenge).
      • wiki challenge - An implementation of Dell Zhang's solution to Wikipedia's Participation Challenge on Kaggle.
      • kaggle insults - Kaggle Submission for "Detecting Insults in Social Commentary".
      • kaggle_acquire-valued-shoppers-challenge - Code for the Kaggle acquire valued shoppers challenge.
      • kaggle-cifar - Code for the CIFAR-10 competition at Kaggle, uses cuda-convnet.
      • kaggle-blackbox - Deep learning made easy.
      • kaggle-accelerometer - Code for Accelerometer Biometric Competition at Kaggle.
      • kaggle-advertised-salaries - Predicting job salaries from ads - a Kaggle competition.
      • kaggle amazon - Amazon access control challenge.
      • kaggle-bestbuy_big - Code for the Best Buy competition at Kaggle.
      • kaggle-bestbuy_small