awesome-machine-learning

A curated list of awesome Machine Learning frameworks, libraries and software.
https://github.com/eric-erki/awesome-machine-learning

  • Python

    • General-Purpose Machine Learning

      • jieba - Chinese Words Segmentation Utilities.
      • SnowNLP - A library for processing Chinese text.
      • genius - A Chinese segmenter based on Conditional Random Fields.
      • nut - Natural language Understanding Toolkit.
      • Rosetta - Text processing tools and wrappers (e.g. Vowpal Wabbit)
      • PyNLPl - Python Natural Language Processing Library. General purpose NLP library for Python. Also contains some specific modules for parsing common NLP formats, most notably for [FoLiA](http://proycon.github.io/folia/), but also ARPA language models, Moses phrasetables, GIZA++ alignments.
      • python-ucto - Python binding to ucto (a unicode-aware rule-based tokenizer for various languages).
      • python-frog - Python binding to Frog, an NLP suite for Dutch (POS tagging, lemmatisation, dependency parsing, NER).
      • python-zpar - Python bindings for [ZPar](https://github.com/frcchang/zpar), a statistical part-of-speech tagger, constituency parser, and dependency parser for English.
      • PyStanfordDependencies - Python interface for converting Penn Treebank trees to Stanford Dependencies.
      • Distance - Levenshtein and Hamming distance computation.
      • Fuzzy Wuzzy - Fuzzy String Matching in Python.
      • jellyfish - A Python library for approximate and phonetic matching of strings.
      • editdistance - Fast implementation of edit distance.
      • textacy - Higher-level NLP built on spaCy.
      • stanford-corenlp-python - Python wrapper for [Stanford CoreNLP](https://github.com/stanfordnlp/CoreNLP)
      • CLTK - The Classical Language Toolkit.
      • yase - Transcode a sentence (or other sequence) to a list of word vectors.
      • Polyglot - Multilingual text (NLP) processing toolkit.
      • DrQA - Reading Wikipedia to answer open-domain questions.
      • Dedupe - A Python library for accurate and scalable fuzzy matching, record deduplication and entity resolution.
      • Snips NLU - Natural Language Understanding library for intent classification and entity extraction
      • steppy - Lightweight Python library for fast and reproducible machine learning experimentation. Introduces a very simple interface that enables clean machine learning pipeline design.
      • auto_ml - Automated machine learning for production and analytics. Lets you focus on the fun parts of ML, while outputting production-ready code, and detailed analytics of your dataset and results. Includes support for NLP, XGBoost, CatBoost, LightGBM, and soon, deep learning.
      • machine learning - Automated build consisting of a [web-interface](https://github.com/jeff1evesque/machine-learning#web-interface) and a set of [programmatic-interface](https://github.com/jeff1evesque/machine-learning#programmatic-interface) APIs for support vector machines. The corresponding datasets are stored in a SQL database, and the generated models used for prediction are stored in a NoSQL datastore.
      • Bayesian Methods for Hackers - Book/iPython notebooks on Probabilistic Programming in Python.
      • Featureforge - A set of tools for creating and testing machine learning features, with a scikit-learn compatible API.
      • scikit-learn - A Python module for machine learning built on top of SciPy.
      • SimpleAI
      • pattern - Web mining module for Python.
      • Pylearn2 - A Machine Learning library based on [Theano](https://github.com/Theano/Theano).
      • keras - High-level neural networks frontend for [TensorFlow](https://github.com/tensorflow/tensorflow), [CNTK](https://github.com/Microsoft/CNTK) and [Theano](https://github.com/Theano/Theano).
      • Lasagne - Lightweight library to build and train neural networks in Theano.
      • hebel - GPU-Accelerated Deep Learning Library in Python.
      • Chainer - Flexible neural network framework.
      • gensim - Topic Modelling for Humans.
      • topik - Topic modelling toolkit.
      • PyBrain - Another Python Machine Learning Library.
      • Brainstorm - Fast, flexible and fun neural networks. This is the successor of PyBrain.
      • Crab - A flexible, fast recommender engine.
      • python-recsys - A Python library for implementing a Recommender System.
      • thinking bayes - Book on Bayesian Analysis.
      • Image-to-Image Translation with Conditional Adversarial Networks - Implementation of image-to-image (pix2pix) translation from the paper by [Isola et al.](https://arxiv.org/pdf/1611.07004.pdf) [DEEP LEARNING]
      • Restricted Boltzmann Machines - Restricted Boltzmann Machines in Python. [DEEP LEARNING]
      • Bolt - Bolt Online Learning Toolbox.
      • CoverTree - Python implementation of cover trees, near-drop-in replacement for scipy.spatial.kdtree
      • nilearn - Machine learning for NeuroImaging in Python.
      • neuropredict - Aimed at novice machine learners and non-expert programmers, this package offers easy and comprehensive machine learning in Python (evaluation and a full report of predictive performance, without requiring you to code) for neuroimaging and any other type of features. It aims to absorb much of the ML workflow, unlike packages such as nilearn and pymvpa, which require you to learn their API and write code to produce anything useful.
      • Pyevolve - Genetic algorithm framework.
      • breze - Theano based library for deep and recurrent neural networks.
      • pyhsmm - Library for approximate unsupervised inference in Bayesian Hidden Markov Models (HMMs) and explicit-duration Hidden semi-Markov Models (HSMMs), focusing on the Bayesian nonparametric extensions, the HDP-HMM and HDP-HSMM, mostly with weak-limit approximations.
      • SKLL - A wrapper around scikit-learn that makes it simpler to conduct experiments.
      • neurolab - https://github.com/zueve/neurolab
      • Spearmint - Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper: Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle and Ryan P. Adams. Advances in Neural Information Processing Systems, 2012.
      • Pebl - Python Environment for Bayesian Learning.
      • python-timbl - A Python extension module wrapping the full TiMBL C++ programming interface. Timbl is an elaborate k-Nearest Neighbours machine learning toolkit.
      • deap - Evolutionary algorithm framework.
      • pydeep - Deep Learning In Python.
      • mlxtend - A library consisting of useful tools for data science and machine learning tasks.
      • neon - Nervana's [high-performance](https://github.com/soumith/convnet-benchmarks) Python-based Deep Learning framework [DEEP LEARNING].
      • Neural Networks and Deep Learning - Code samples for my book "Neural Networks and Deep Learning" [DEEP LEARNING].
      • Annoy - Approximate nearest neighbours implementation.
      • skflow - Simplified interface for TensorFlow, mimicking scikit-learn.
      • pgmpy
      • DIGITS - The Deep Learning GPU Training System (DIGITS) is a web application for training deep learning models.
      • milk - Machine learning toolkit focused on supervised classification.
      • TFLearn - Deep learning library featuring a higher-level API for TensorFlow.
      • REP - An IPython-based environment for conducting data-driven research in a consistent and reproducible way. REP does not try to replace scikit-learn; it extends it and provides a better user experience.
      • skbayes - Python package for Bayesian Machine Learning with scikit-learn API.
      • fuku-ml - Simple machine learning library, including Perceptron, Regression, Support Vector Machine, Decision Tree and more. It is easy to use and easy to learn for beginners.
      • Xcessiv - A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling.
      • PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
      • ML-From-Scratch - Implementations of Machine Learning models from scratch in Python with a focus on transparency. Aims to showcase the nuts and bolts of ML in an accessible way.
      • xRBM - A library for Restricted Boltzmann Machine (RBM) and its conditional variants in TensorFlow.
      • stacked_generalization - Implementation of the machine learning stacking technique as a handy library in Python.
      • Cogitare
      • Parris - Parris, the automated infrastructure setup tool for machine learning algorithms.
      • Turi Create - Machine learning from Apple. Turi Create simplifies the development of custom machine learning models. You don't have to be a machine learning expert to add recommendations, object detection, image classification, image similarity or activity classification to your app.
      • mlens - A high-performance, memory-efficient, maximally parallelized ensemble learning library, integrated with scikit-learn.
      • Open Mining - Business Intelligence (BI) in Python (Pandas web interface)
      • PyMC - Markov Chain Monte Carlo sampling toolkit.
      • zipline - A Pythonic algorithmic trading library.
      • SymPy - A Python library for symbolic mathematics.
      • statsmodels - Statistical modeling and econometrics in Python.
      • bokeh - Interactive Web Plotting for Python.
      • vincent - A Python to Vega translator.
      • d3py - A plotting library for Python, based on [D3.js](https://d3js.org/).
      • PyDexter - Simple plotting for Python. Wrapper for D3xterjs; easily render charts in-browser.
      • ggplot - Same API as ggplot2 for R.
      • ggfortify - Unified ggplot2 interface to popular R packages.
      • Kartograph.py - Rendering beautiful SVG maps in Python.
      • PyQtGraph - A pure-Python graphics and GUI library built on PyQt4 / PySide and NumPy.
      • pycascading
      • Petrel - Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.
      • Blaze - NumPy and Pandas interface to Big Data.
      • emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
      • vispy - GPU-based high-performance interactive OpenGL 2D/3D data visualization library.
      • NuPIC Studio - An all-in-one NuPIC Hierarchical Temporal Memory visualization and debugging super-tool!
      • SparklingPandas
      • pastalog - Simple, realtime visualization of neural network training performance.
      • Dora - Tools for exploratory data analysis in Python.
      • Ruffus - Computation pipeline library for Python.
      • SOMPY - Self Organizing Map written in Python (Uses neural networks for data analysis).
      • somoclu - Massively parallel self-organizing maps: accelerates training on multicore CPUs, GPUs, and clusters, and has a Python API.
      • HDBScan - Implementation of the HDBSCAN algorithm in Python, used for clustering.
      • visualize_ML - A python package for data exploration and data analysis.
      • scikit-plot - A visualization library for quick and easy generation of common plots in data analysis and machine learning.
      • Bowtie - A dashboard library for interactive visualizations using flask socketio and react.
      • lime - Lime is about explaining what machine learning classifiers (or models) are doing. It is able to explain any black box classifier, with two or more classes.
      • PyCM - A multi-class confusion matrix library written in Python that accepts either input data vectors or a direct matrix. It is a tool for post-classification model evaluation that supports most class-level and overall statistics.
      • Dash - A framework for creating analytical web applications built on top of Plotly.js, React, and Flask
      • Map/Reduce implementations of common ML algorithms - Implementations of common ML algorithms (including k-means and alternating least squares) using Python NumPy, and how to then make these implementations scalable using Map/Reduce and Spark.
      • BioPy - Biologically-Inspired and Machine Learning Algorithms in Python.
      • SVM Explorer - Interactive SVM Explorer, using Dash and scikit-learn
      • pattern_classification
      • thinking stats 2
      • hyperopt
      • 2012-paper-diginorm
      • ipython-notebooks
      • data-science-ipython-notebooks - Continually updated Data Science Python Notebooks: Spark, Hadoop MapReduce, HDFS, AWS, Kaggle, scikit-learn, matplotlib, pandas, NumPy, SciPy, and various command lines.
      • decision-weights
      • Sarah Palin LDA - Topic Modeling the Sarah Palin emails.
      • Diffusion Segmentation - A collection of image segmentation algorithms based on diffusion methods.
      • Scipy Tutorials - SciPy tutorials. This is outdated, check out scipy-lecture-notes.
      • Crab - A recommendation engine library for Python.
      • BayesPy - Bayesian Inference Tools in Python.
      • scikit-learn tutorials - Series of notebooks for learning scikit-learn.
      • sentiment-analyzer - Tweets Sentiment Analyzer
      • sentiment_classifier - Sentiment classifier using word sense disambiguation.
      • group-lasso - Some experiments with the coordinate descent algorithm used in the (Sparse) Group Lasso model.
      • jProcessing - Kanji / Hiragana / Katakana to Romaji converter. Edict dictionary and parallel sentence search. Sentence similarity between two Japanese sentences. Sentiment analysis of Japanese text. Run Cabocha (ISO-8859-1 configured) in Python.
      • mne-python-notebooks - IPython notebooks for EEG/MEG data processing using mne-python.
      • Neon Course - IPython notebooks for a complete course around understanding Nervana's Neon.
      • pandas cookbook - Recipes for using Python's pandas library.
      • climin - Optimization library focused on machine learning, pythonic implementations of gradient descent, LBFGS, rmsprop, adadelta and others.
      • Allen Downey’s Data Science Course - Code for Data Science at Olin College, Spring 2014.
      • Allen Downey’s Think Complexity Code - Code for Allen Downey's book Think Complexity.
      • Allen Downey’s Think OS Code - Text and supporting code for Think OS: A Brief Introduction to Operating Systems.
      • Dive into Machine Learning with Python Jupyter notebook and scikit-learn - "I learned Python by hacking first, and getting serious *later.* I wanted to do this with Machine Learning. If this is your style, join me in getting a bit ahead of yourself."
      • TDB - TensorDebugger (TDB) is a visual debugger for deep learning. It features interactive, node-by-node debugging and visualization for TensorFlow.
      • Introduction to machine learning with scikit-learn - IPython notebooks from Data School's video tutorials on scikit-learn.
      • Introduction to Machine Learning with Python - Notebooks and code for the book "Introduction to Machine Learning with Python"
      • Pydata book - Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media
      • NeuralTalk - NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences.
      • Neuron - Neuron is a simple class for time series prediction. It uses LNU (Linear Neural Unit), QNU (Quadratic Neural Unit), RBF (Radial Basis Function), MLP (Multi Layer Perceptron), and MLP-ELM (Multi Layer Perceptron - Extreme Learning Machine) neural networks trained with gradient descent or the Levenberg–Marquardt algorithm.
      • Data Driven Code - Very simple implementation of neural networks for dummies in python without using any libraries, with detailed comments.
      • open-solution-home-credit - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Home-Credit-Default-Risk) for [Home Credit Default Risk](https://www.kaggle.com/c/home-credit-default-risk).
      • open-solution-googleai-object-detection - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Google-AI-Object-Detection-Challenge) for [Google AI Open Images - Object Detection Track](https://www.kaggle.com/c/google-ai-open-images-object-detection-track).
      • open-solution-salt-identification - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Salt-Detection) for [TGS Salt Identification Challenge](https://www.kaggle.com/c/tgs-salt-identification-challenge).
      • open-solution-ship-detection - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Ships) for [Airbus Ship Detection Challenge](https://www.kaggle.com/c/airbus-ship-detection).
      • open-solution-data-science-bowl-2018 - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Data-Science-Bowl-2018) for [2018 Data Science Bowl](https://www.kaggle.com/c/data-science-bowl-2018).
      • open-solution-value-prediction - Source code and [experiment results](https://app.neptune.ml/neptune-ml/Santander-Value-Prediction-Challenge) for [Santander Value Prediction Challenge](https://www.kaggle.com/c/santander-value-prediction-challenge).
      • wiki challenge - An implementation of Dell Zhang's solution to Wikipedia's Participation Challenge on Kaggle.
      • kaggle insults - Kaggle Submission for "Detecting Insults in Social Commentary".
      • kaggle_acquire-valued-shoppers-challenge - Code for the Kaggle acquire valued shoppers challenge.
      • kaggle-cifar - Code for the CIFAR-10 competition at Kaggle, uses cuda-convnet.
      • kaggle-blackbox - Deep learning made easy.
      • kaggle-accelerometer - Code for Accelerometer Biometric Competition at Kaggle.
      • kaggle-advertised-salaries - Predicting job salaries from ads - a Kaggle competition.
      • kaggle amazon - Amazon access control challenge.
      • kaggle-bestbuy_big - Code for the Best Buy competition at Kaggle.
      • kaggle-bestbuy_small
      • Kaggle Dogs vs. Cats - Code for Kaggle Dogs vs. Cats competition.
      • Kaggle Galaxy Challenge - Winning solution for the Galaxy Challenge on Kaggle.
      • Kaggle Gender - A Kaggle competition: discriminate gender based on handwriting.
      • Kaggle Merck - Merck challenge at Kaggle.
      • Kaggle Stackoverflow - Predicting closed questions on Stack Overflow.
      • wine-quality - Predicting wine quality.
      • DeepMind Lab - DeepMind Lab is a 3D learning environment based on id Software's Quake III Arena via ioquake3 and other open source software. Its primary purpose is to act as a testbed for research in artificial intelligence, especially deep reinforcement learning.
      • Gym - OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
      • Serpent.AI - Serpent.AI is a game agent framework that allows you to turn any video game you own into a sandbox to develop AI and machine learning experiments. For both researchers and hobbyists.
      • Universe - Universe is a software platform for measuring and training an AI's general intelligence across the world's supply of games, websites and other applications.
      • ViZDoom - ViZDoom allows developing AI bots that play Doom using only the visual information (the screen buffer). It is primarily intended for research in machine visual learning, and deep reinforcement learning, in particular.
      • Roboschool - Open-source software for robot simulation, integrated with OpenAI Gym.
      • Retro - Retro Games in Gym
      • SLM Lab - Modular Deep Reinforcement Learning framework in PyTorch.
      • NumPy - A fundamental package for scientific computing with Python.
      • matplotlib - A Python 2D plotting library.
      • graphlab-create - A library with various machine learning models (regression, clustering, recommender systems, graph analytics, etc.) implemented on top of a disk-backed DataFrame.
      • Surprise - A scikit for building and analyzing recommender systems.
      • NLTK - A leading platform for building Python programs to work with human language data.
      • TextBlob - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.
      • rgf_python - Python bindings for Regularized Greedy Forest (Tree) Library.
      • pygal - A Python SVG Charts Creator.
      • albumentations - A fast and framework-agnostic image augmentation library that implements a diverse set of augmentation techniques. Supports classification, segmentation, and detection out of the box. It was used to win a number of deep learning competitions at Kaggle, Topcoder, and CVPR workshops.
      • Pattern - A web mining module for the Python programming language. It has tools for natural language processing, machine learning, among others.
      • spaCy - Industrial strength NLP with Python and Cython.
      • steppy-toolkit - Curated collection of neural networks, transformers and models that make your machine learning work faster and more effective.
      • Theano - An optimizing math compiler in Python for array-oriented expressions, generating code through GPU meta-programming.
      • TensorFlow - Open source software library for numerical computation using data flow graphs.
      • yahmm - Hidden Markov Models for Python, implemented in Cython for speed and efficiency.
      • caravel - A data exploration platform designed to be visual, intuitive, and interactive.
      • NuPIC - Numenta Platform for Intelligent Computing.
      • igraph - Python binding to the igraph library, a general-purpose graph library.
      • astropy - A community Python library for Astronomy.
      • bqplot - An API for plotting in Jupyter (IPython).
      • Suiron - Machine Learning for RC Cars.
      • loso - Another Chinese segmentation library.