awesome-machine-learning

A curated list of awesome Machine Learning frameworks, libraries and software.
https://github.com/eric-erki/awesome-machine-learning

  • APL

    • naive-apl - Naive Bayesian Classifier implementation in APL.
  • C

      • HTK - The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models.
      • Darknet - Darknet is an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation.
      • Recommender - A C library for product recommendations/suggestions using collaborative filtering (CF).
      • Hybrid Recommender System - A hybrid recommender system based upon scikit-learn algorithms.
      • neonrvm - neonrvm is an open source machine learning library based on the Relevance Vector Machine (RVM) technique. It is written in C and comes with Python bindings.
      • CCV - C-based/Cached/Core Computer Vision Library, a modern computer vision library.
      • VLFeat - VLFeat is an open and portable library of computer vision algorithms, with a MATLAB toolbox.
  • C++

      • DLib - DLib has C++ and Python interfaces for face detection and training general object detectors.
      • Distributed Machine Learning Toolkit (DMTK) - A distributed machine learning (parameter server) framework by Microsoft. It enables training models on large datasets across multiple machines. Tools currently bundled with it include LightLDA and Distributed (Multisense) Word Embedding.
      • DLib - A suite of ML tools designed to be easy to embed in other applications.
      • encog-cpp
      • libfm - A generic approach that allows mimicking most factorization models through feature engineering.
      • PyCUDA - Python interface to CUDA
      • shark - A fast, modular, feature-rich open-source C++ machine learning library.
      • sofia-ml - Suite of fast incremental algorithms.
      • CRFsuite - CRFsuite is an implementation of Conditional Random Fields (CRFs) for labeling sequential data.
      • OpenCV - OpenCV has C++, C, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS.
      • VIGRA - VIGRA is a generic cross-platform C++ computer vision and machine learning library for volumes of arbitrary dimensionality with Python bindings.
      • BanditLib - A simple Multi-armed Bandit library.
      • CatBoost - General purpose gradient boosting on decision trees library with categorical features support out of the box. It is easy to install, contains fast inference implementation and supports CPU and GPU (even multi-GPU) computation.
      • CNTK - The Computational Network Toolkit (CNTK) by Microsoft Research is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph.
      • DeepDetect - A machine learning API and server written in C++11. It makes state-of-the-art machine learning easy to work with and integrate into existing applications.
      • DyNet - A dynamic neural network library that works well with networks whose structure changes for every training instance. Written in C++ with bindings in Python.
      • Fido - A highly-modular C++ machine learning library for embedded electronics and robotics.
      • LightGBM - Microsoft's fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
      • proNet-core - A general-purpose network embedding framework based on pair-wise representation optimization.
      • Shogun - The Shogun Machine Learning Toolbox.
      • Timbl - A software package/C++ library implementing several memory-based learning algorithms, among which IB1-IG, an implementation of k-nearest neighbor classification, and IGTree, a decision-tree approximation of IB1-IG. Commonly used for NLP.
      • Warp-CTC - A fast parallel implementation of Connectionist Temporal Classification (CTC), on both CPU and GPU.
      • XGBoost - A parallelized, optimized, general-purpose gradient boosting library.
      • LKYDeepNN - A header-only C++11 neural network library. Low dependency, with native documentation in Traditional Chinese.
      • xLearn - A high performance, easy-to-use, and scalable machine learning package, which can be used to solve large-scale machine learning problems. xLearn is especially useful for solving machine learning problems on large-scale sparse data, which is very common in Internet services such as online advertisement and recommender systems.
      • BLLIP Parser - BLLIP Natural Language Parser (also known as the Charniak-Johnson parser).
      • colibri-core - C++ library, command line tools, and Python binding for extracting and working with basic linguistic constructions such as n-grams and skipgrams in a quick and memory-efficient way.
      • CRF++ - Open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data & other Natural Language Processing tasks.
      • frog - Memory-based NLP suite developed for Dutch: PoS tagger, lemmatiser, dependency parser, NER, shallow parser, morphological analyzer.
      • libfolia - C++ library for the [FoLiA format](http://proycon.github.io/folia/)
      • MeTA - [MeTA : ModErn Text Analysis](https://meta-toolkit.org/) is a C++ Data Sciences Toolkit that facilitates mining big text data.
      • MIT Information Extraction Toolkit - C, C++, and Python tools for named entity recognition and relation extraction
      • ucto - Unicode-aware regular-expression based tokenizer for various languages. Tool and C++ library. Supports FoLiA format.
      • Kaldi - Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0. Kaldi is intended for use by speech recognition researchers.
      • ToPS - An object-oriented framework that facilitates the integration of probabilistic models for sequences over a user-defined alphabet.
      • grt - The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library designed for real-time gesture recognition.
      • CUDA - A fast C++/CUDA implementation of convolutional neural networks. [DEEP LEARNING]
      • Vowpal Wabbit (VW) - A fast out-of-core learning system.
      • EBLearn - Eblearn is an object-oriented C++ library that implements various machine learning models
      • CXXNET - Yet another deep learning framework, with fewer than 1,000 lines of core code. [DEEP LEARNING]
      • Featuretools - A library for automated feature engineering. It excels at transforming transactional and relational datasets into feature matrices for machine learning using reusable feature engineering "primitives".
      • DSSTNE - A software library created by Amazon for training and deploying deep neural networks using GPUs which emphasizes speed and scale over experimental flexibility.
      • Stan - A probabilistic programming language implementing full Bayesian statistical inference with Hamiltonian Monte Carlo sampling.
      • mlpack - A scalable C++ machine learning library.
  • Clojure

      • Incanter - Incanter is a Clojure-based, R-like platform for statistical computing and graphics.
      • Clojure-openNLP - Natural Language Processing in Clojure (opennlp).
      • Inflections-clj - Rails-like inflection library for Clojure and ClojureScript.
      • Clojush - The Push programming language and the PushGP genetic programming system implemented in Clojure.
      • Infer - Inference and machine learning in Clojure.
      • Clj-ML - A machine learning library for Clojure built on top of Weka and friends.
      • Encog - Clojure wrapper for Encog (v3) (Machine-Learning framework that specializes in neural-nets).
      • Fungp - A genetic programming library for Clojure.
      • Statistiker - Basic Machine Learning algorithms in Clojure.
      • clortex - General Machine Learning library using Numenta’s Cortical Learning Algorithm.
      • comportex - Functionally composable Machine Learning library using Numenta’s Cortical Learning Algorithm.
      • lambda-ml - Simple, concise implementations of machine learning techniques and utilities in Clojure.
      • PigPen - Map-Reduce for Clojure.
      • Envision - Clojure Data Visualisation library, based on Statistiker and D3.
      • Touchstone - Clojure A/B testing library.
      • DL4CLJ - Clojure wrapper for Deeplearning4j.
  • Common Lisp

      • cl-online-learning - Online learning algorithms (Perceptron, AROW, SCW, Logistic Regression).
      • cl-random-forest - Implementation of Random Forest in Common Lisp.
      • mgl - Neural networks (Boltzmann machines, feed-forward and recurrent nets), Gaussian processes.
      • mgl-gpr - Evolutionary algorithms.
      • cl-libsvm - Wrapper for the libsvm support vector machine library.
  • Crystal

      • machine - Simple machine learning algorithm.
  • Elixir

      • Simple Bayes - A Simple Bayes / Naive Bayes implementation in Elixir.
      • Stemmer - An English (Porter2) stemming implementation in Elixir.
  • Erlang

      • Disco - Map Reduce in Erlang.
  • Go

      • SVGo - The Go Language library for SVG generation.
      • go-porterstemmer - A native Go clean room implementation of the Porter Stemming algorithm.
      • paicehusk - Golang implementation of the Paice/Husk Stemming Algorithm.
      • snowball - Snowball Stemmer for Go.
      • go-ngram - In-memory n-gram index with compression.
      • sentences - Golang implementation of Punkt sentence tokenizer.
      • eaopt - An evolutionary optimization library.
      • Go Learn - Machine Learning for Go.
      • go-pr - Pattern recognition package in Go.
      • go-ml - Linear / Logistic regression, Neural Networks, Collaborative Filtering and Gaussian Multivariate Distribution.
      • bayesian - Naive Bayesian Classification for Golang.
      • go-galib - Genetic Algorithms library written in Go / Golang.
      • Cloudforest - Ensembles of decision trees in Go/Golang.
      • gobrain - Neural Networks written in Go.
      • GoNN - GoNN is an implementation of Neural Network in Go Language, which includes BPNN, RBF, PCN.
      • go-mxnet-predictor - Go binding for MXNet c_predict_api to run inference with pre-trained models.
      • neat - Plug-and-play, parallel Go framework for NeuroEvolution of Augmenting Topologies (NEAT).
      • go-graph - Graph library for Go/Golang language.
      • RF - Random forests implementation in Go.
      • word-embedding - Word Embeddings: the full implementation of word2vec, GloVe in Go.
      • MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Go, Javascript and more.
      • Glot - Glot is a plotting library for Golang built on top of gnuplot.
  • Haskell

      • hnn - Haskell Neural Network library.
      • haskell-ml - Haskell implementations of various ML algorithms.
      • HLearn - a suite of libraries for interpreting machine learning models according to their algebraic structure.
      • hopfield-networks - Hopfield Networks for unsupervised learning in Haskell.
      • caffegraph - A DSL for deep neural networks.
      • LambdaNet - Configurable Neural Networks in Haskell.
  • Java

      • IRIS - [Cortical.io's](http://cortical.io) FREE NLP, Retina API Analysis Tool (written in JavaFX!) - [See the Tutorial Video](https://www.youtube.com/watch?v=CsF4pd7fGF0).
      • Stanford Topic Modeling Toolbox - Topic modeling tools for social scientists and others who wish to perform analysis on datasets.
      • OpenNLP - a machine learning based toolkit for the processing of natural language text.
      • ClearTK - ClearTK provides a framework for developing statistical natural language processing (NLP) components in Java and is built on top of Apache UIMA.
      • AMIDST Toolbox - A Java Toolbox for Scalable Probabilistic Machine Learning.
      • ELKI - Java toolkit for data mining. (unsupervised: clustering, outlier detection etc.)
      • RankLib - RankLib is a library of learning to rank algorithms.
      • WalnutiQ - object oriented model of the human brain.
      • Weka - Weka is a collection of machine learning algorithms for data mining tasks.
      • Twitter Text Java - A Java implementation of Twitter's text processing library.
      • LingPipe - A tool kit for processing text using computational linguistics.
      • ClearNLP - The ClearNLP project provides software and resources for natural language processing. The project started at the Center for Computational Language and EducAtion Research, and is currently developed by the Center for Language and Information Research at Emory University. This project is under the Apache 2 license.
      • CogcompNLP - This project collects a number of core libraries for Natural Language Processing (NLP) developed in the University of Illinois' Cognitive Computation Group, for example `illinois-core-utilities`, which provides a set of NLP-friendly data structures and a number of NLP-related utilities that support writing NLP applications, running experiments, etc.; `illinois-edison`, a library for feature extraction from illinois-core-utilities data structures; and many other packages.
      • aerosolve - A machine learning library by Airbnb designed from the ground up to be human friendly.
      • Datumbox - Machine Learning framework for rapid development of Machine Learning and Statistical applications.
      • H2O - ML engine that supports distributed learning on Hadoop, Spark or your laptop via APIs in R, Python, Scala, REST/JSON.
      • htm.java - General Machine Learning library using Numenta’s Cortical Learning Algorithm.
      • java-deeplearning - Distributed Deep Learning Platform for Java, Clojure, Scala.
      • Mahout - Distributed machine learning.
      • Hydrosphere Mist - A service for deploying Apache Spark MLlib machine learning models as real-time, batch, or reactive web services.
      • ORYX - Lambda Architecture Framework using Apache Spark and Apache Kafka with a specialization for real-time large-scale machine learning.
      • rapaio - statistics, data mining and machine learning toolbox in Java.
      • SmileMiner - Statistical Machine Intelligence & Learning Engine.
      • Hadoop - Hadoop/HDFS.
      • Onyx - Distributed, masterless, high performance, fault tolerant data processing. Written entirely in Clojure.
      • Spark - Spark is a fast and general engine for large-scale data processing.
      • Impala - Real-time Query for Hadoop.
      • CMU Sphinx - An open source speech recognition toolkit based purely on a Java speech recognition library.
      • FlinkML in Apache Flink - Distributed machine learning library in Flink.
      • Stanford Classifier - A classifier is a machine learning tool that will take data items and place them into one of k classes.
      • CoreNLP - Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words.
      • Stanford POS Tagger - A Part-Of-Speech Tagger (POS Tagger).
      • Meka - An open source implementation of methods for multi-label classification and evaluation (extension to Weka).
      • Neuroph - Neuroph is lightweight Java neural network framework
      • SystemML - flexible, scalable machine learning (ML) language.
      • Storm - Storm is a distributed realtime computation system.
      • Cortical.io - Retina: an API performing complex NLP operations (disambiguation, classification, streaming text filtering, etc...) as quickly and intuitively as the brain.
      • Stanford Parser - A natural language parser is a program that works out the grammatical structure of sentences.
      • Stanford Named Entity Recognizer - Stanford NER is a Java implementation of a Named Entity Recognizer.
      • Stanford Word Segmenter - Tokenization of raw text is a standard pre-processing step for many NLP tasks.
      • Stanford Phrasal - A state-of-the-art statistical phrase-based machine translation system, written in Java.
      • Stanford English Tokenizer - A tokenizer for English text.
  • Javascript

      • Twitter-text - A JavaScript implementation of Twitter's text processing library.
      • D3.js
      • dimple
      • amCharts
      • Datamaps - Customizable SVG map/geo visualizations using D3.js.
      • ZingChart - A library written in vanilla JavaScript for big data visualization.
      • Learn JS Data
      • figue - K-means, fuzzy c-means and agglomerative clustering.
      • Machine Learning - Machine learning library for Node.js
      • mil-tokyo - List of several machine learning libraries.
      • natural - General natural language facilities for node.
      • Knwl.js - A Natural Language Processor in JS.
      • NLP Compromise - Natural Language processing in the browser.
      • nlp.js - An NLP library built in Node over Natural, with entity extraction, sentiment analysis, automatic language identification, and more.
      • dc.js
      • D3xter - Straightforward plotting built on D3.
      • statkit - Statistics kit for JavaScript.
      • datakit - A lightweight framework for data analysis in JavaScript
      • Z3d - Easily make interactive 3D plots built on Three.js.
      • Clusterfck - Agglomerative hierarchical clustering implemented in Javascript for Node.js and the browser.
      • Clustering.js - Clustering algorithms implemented in Javascript for Node.js and the browser.
      • Gaussian Mixture Model - Unsupervised machine learning with multivariate Gaussian mixture model.
      • Node-fann - FANN (Fast Artificial Neural Network Library) bindings for Node.js
      • Keras.js - Run Keras models in the browser, with GPU support provided by WebGL 2.
      • Kmeans.js - Simple Javascript implementation of the k-means algorithm, for node.js and the browser.
      • LDA.js - LDA topic modeling for Node.js
      • Learning.js - JavaScript implementation of logistic regression/C4.5 decision tree.
      • machineJS - Automated machine learning, data formatting, ensembling, and hyperparameter optimization for competitions and exploration. Just give it a .csv file!
      • Node-SVM - Support Vector Machine for Node.js
      • Brain - Neural networks in JavaScript **[Deprecated]**
      • Bayesian-Bandit - Bayesian bandit implementation for Node and the browser.
      • Synaptic - Architecture-free neural network library for Node.js and the browser.
      • kNear - JavaScript implementation of the k nearest neighbors algorithm for supervised learning.
      • NeuralN - C++ neural network library for Node.js. It is well suited to large datasets and multi-threaded training.
      • kalman - Kalman filter for Javascript.
      • shaman - Node.js library with support for both simple and multiple linear regression.
      • ml.js - Machine learning and numerical analysis tools for Node.js and the Browser!
      • Pavlov.js - Reinforcement learning using Markov Decision Processes.
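Pavlov.js, the last entry above, frames reinforcement learning as solving a Markov Decision Process (MDP). As a hedged illustration of that technique only, here is a minimal value-iteration sketch in Python (chosen because this list spans many languages). It is not Pavlov.js's API: the function name `value_iteration`, the nested-dict transition format, and the toy "start"/"goal" MDP are all hypothetical choices made for this example.

```python
"""Value iteration on a tiny, hypothetical MDP (illustrative; not Pavlov.js's API)."""

from typing import Dict, List, Tuple

# Hypothetical format: transitions[state][action] -> list of (probability, next_state, reward).
# A state with no actions is treated as terminal.
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(
    outcomes: List[Tuple[float, str, float]], values: Dict[str, float], gamma: float
) -> float:
    """Expected one-step return of an action: sum of p * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(
    transitions: Transitions, gamma: float = 0.9, tol: float = 1e-10
) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Return optimal state values and a greedy policy via Bellman optimality backups."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: its value stays 0
                continue
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:  # stop once no state value changed noticeably
            break
    # Greedy policy: in each non-terminal state pick the action with the highest Q-value.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


if __name__ == "__main__":
    # Toy MDP: "safe" reaches the goal with reward 1; "risky" pays 3 half the time,
    # otherwise costs 1 and returns to "start".
    mdp: Transitions = {
        "start": {
            "safe": [(1.0, "goal", 1.0)],
            "risky": [(0.5, "goal", 3.0), (0.5, "start", -1.0)],
        },
        "goal": {},
    }
    values, policy = value_iteration(mdp, gamma=0.9)
    print(values)
    print(policy)
```

In this toy MDP the risky action should come out ahead: its value satisfies V = 0.5 * 3 + 0.5 * (-1 + 0.9 * V), so V = 1/0.55 (about 1.82), compared with 1.0 for the safe action.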