awesome-python-data-science

A curated list of Python libraries used for data science.
https://github.com/thomasjpfan/awesome-python-data-science

  • AutoML

    • Nevergrad - Gradient-free optimization.
    • featuretools - Automated feature engineering.
    • auto-sklearn - Automated machine learning.
    • tpot - Automated machine learning.
    • auto_ml - Automated machine learning.
    • MLBox - Automated machine learning Python library.
    • devol - Automated deep neural network design via genetic programming.
    • skll - SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
    • autokeras - Automated machine learning in Keras.
    • SMAC3 - Sequential Model-based Algorithm Configuration.
  • Data Gathering

    • gain - Web crawling framework based on asyncio.
    • MechanicalSoup - A Python library for automating interaction with websites.
    • Pandarallel - Parallel pandas.
    • parse - Parse strings using a specification based on the Python format() syntax.
    • CleverCSV - CleverCSV is a Python package for handling messy CSV files.
  • Deep Learning Frameworks

    • tensorlayer - A Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
    • TensorFlow - Deep learning framework.
  • Deep Learning Projects

    • tensorflow-wavenet - DeepMind's WaveNet.
    • DeepRecommender - Recommender systems.
    • DrQA - Reading Wikipedia to Answer Open-Domain Questions.
    • vqa.pytorch - Visual Question Answering in PyTorch.
    • Half-Life Regression - Model for spaced repetition practice.
    • learning-to-learn - Learning to Learn in TensorFlow.
    • capsule-networks - A PyTorch implementation of the NIPS 2017 paper "Dynamic Routing Between Capsules".
    • Mask_RCNN - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow.
    • lightnet - Bringing pjreddie's DarkNet out of the shadows.
    • pytorch-openai-transformer-lm - OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI.
    • maskrcnn-benchmark - Fast, modular reference implementation of Semantic Segmentation and Object Detection algorithms in PyTorch.
    • LovaszSoftmax - Lovász-Softmax loss.
    • ludwig - Ludwig is a toolbox built on top of TensorFlow that lets you train and test deep learning models without writing code.
  • Deep Learning Tools

    • lightly - Lightly is a computer vision framework for self-supervised learning.
    • TorchDrift - TorchDrift is a data and concept drift library for PyTorch.
    • Edward - Probabilistic programming language in TensorFlow.
    • pomegranate - Probabilistic modelling.
    • DLTK - Deep Learning Toolkit for Medical Image Analysis.
    • sonnet - TensorFlow-based neural network library.
    • rasa_core - Dialogue engine.
    • luminoth - Computer Vision.
    • allennlp - NLP Research library.
    • spotlight - PyTorch recommender framework.
    • tensorforce - TensorFlow library for applied reinforcement learning.
    • keras-vis - Neural network visualization toolkit for Keras.
    • hyperas - Keras + Hyperopt.
    • tensorboard_logger - Log TensorBoard events without touching TensorFlow.
    • foolbox - Python toolbox to create adversarial examples that fool neural networks.
    • pytorch/vision - Datasets, Transforms and Models specific to Computer Vision.
    • gluon-nlp - NLP made easy.
    • pytorch/ignite - High-level library to help with training neural networks in PyTorch.
    • Netron - Visualizer for deep learning and machine learning models.
    • gpytorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch.
    • tensorly - Tensor Learning in Python.
    • einops - Deep learning operations reinvented.
    • hiddenlayer - Neural network graphs and training metrics for PyTorch, TensorFlow, and Keras.
    • pytorch-lightning - The lightweight PyTorch wrapper.
    • tensorboard-pytorch - TensorBoard for PyTorch.
    • segmentation_models.pytorch - Segmentation models with pretrained backbones.
  • Deployment

    • evidently - Evidently helps evaluate machine learning models during validation and monitor them in production.
    • onnx - Open Neural Network Exchange.
    • lore - Lore makes machine learning approachable for Software Engineers and maintainable for Machine Learning Researchers.
    • kubeflow - Machine Learning Toolkit for Kubernetes.
    • airflow - ETL.
    • mlflow - Open source platform for the complete machine learning lifecycle.
    • sklearn-porter - Transpile trained scikit-learn estimators.
    • sklearn-compiledtrees - Compiled Decision Trees for scikit-learn.
  • Exploration

    • fitter - Simple class to identify the distribution from which a data sample was generated.
    • Dora - Exploratory data analysis.
    • mlxtend - A library of extension and helper modules for Python's data analysis and machine learning libraries.
    • yellowbrick - Visual analysis and diagnostic tools.
    • sklearn-evaluation - scikit-learn model evaluation.
    • missingno - Missing data visualization.
    • hypertools - Gaining geometric insights into high-dimensional data.
    • scikit-plot - Plotting functionality to scikit-learn objects.
    • elih - Explain Machine Learning.
    • kmeans_smote - Oversampling for imbalanced learning based on k-means and SMOTE.
    • pyUpSet - UpSet suite of visualisation methods.
    • lime - Explaining the predictions of any machine learning classifier.
    • SauceCat/PDPbox - Partial dependence plot toolbox.
    • eli5 - Debug machine learning classifiers and explain their predictions.
    • rfpimp - Permutation and drop-column importance for scikit-learn random forests.
    • pypeln - Concurrent data pipelines made easy.
    • pycm - Multi-class confusion matrix library in Python.
    • great_expectations - Always know what to expect from your data.
    • alibi - Algorithms for monitoring and explaining machine learning models.
    • InterpretML - Fit interpretable models. Explain blackbox machine learning.
    • cleanlab - Finding label errors in datasets and learning with noisy labels.
    • dtale - Flask/React client for visualizing pandas data structures.
    • dabl - Data Analysis Baseline Library.
    • XAI - An eXplainability toolbox for machine learning.
    • explainerdashboard - This package makes it convenient to quickly deploy a dashboard web app that explains the workings of a (scikit-learn compatible) machine learning model.
    • alibi-detect - Open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series.
    • Skater - Model Agnostic Interpretation.
    • pandas-profiling - Profiling reports for pandas DataFrame objects.
    • shap - A unified approach to explain the output of any machine learning model.
  • Feature Extraction

    • Audio

      • python_speech_features - Speech features.
      • speechpy - A Library for Speech Processing and Recognition.
      • magenta - Music and Art Generation with Machine Intelligence.
      • librosa - Audio and music analysis.
      • pydub - Manipulate audio with a simple and easy high level interface.
      • pytorch/audio - Simple audio I/O for PyTorch.
    • General Feature Extraction

      • dirty_cat - Encoding methods for dirty categorical variables.
      • sklearn-pandas - Pandas integration with sklearn.
      • datacleaner - Tool that automatically cleans data sets and readies them for analysis.
      • fancyimpute - Multivariate imputation and matrix completion algorithms.
      • raccoon - DataFrame with fast inserts and appends.
      • kmodes - k-modes and k-prototypes clustering algorithms.
      • annoy - Approximate Nearest Neighbors.
      • scikit-feature - Filter methods for feature selection.
      • mifs - Parallelized Mutual Information based Feature Selection module.
      • skggm - Scikit-learn compatible estimation of general graphical models.
      • Impyute - Data imputations library to preprocess datasets with missing data.
      • eif - Extended Isolation Forest for Anomaly Detection.
      • featexp - Feature exploration for supervised learning.
      • feature_engine - Feature engineering package with sklearn-like functionality.
      • stumpy - STUMPY is a powerful and scalable Python library that can be used for a variety of time series data mining tasks.
      • n2 - Lightweight approximate Nearest Neighbor library which runs faster even with large datasets.
      • compressio - Compressio provides lossless in-memory compression of pandas DataFrames and Series.
      • pdpipe - Easy pipelines for pandas DataFrames.
    • Geolocation

    • Images and Video

    • Ranking/Recommender

      • Surprise - Analyzing recommender systems.
      • trueskill - TrueSkill rating system.
      • LightFM - Hybrid recommendation algorithm.
      • implicit - Collaborative Filtering for Implicit Datasets.
    • Text/NLP

      • preprocessing - Simple interface for the CMU Pronouncing Dictionary.
      • BlingFire - A lightning fast Finite State machine and REgular expression manipulation library.
      • Fuzzy - Soundex, NYSIIS, Double Metaphone.
      • wordfreq - Library for looking up the frequencies of words in many languages, based on many sources of data.
      • BERT-pytorch - Google AI 2018 BERT PyTorch implementation.
      • gensim - Topic Modeling.
      • pattern - Web mining module.
      • probablepeople - Parsing unstructured western names into name components.
      • Expynent - Regular expression patterns.
      • mimesis - Generate synthetic data.
      • parserator - Domain-specific probabilistic parsers.
      • usaddress - Parsing unstructured address strings into address components.
      • python-phonenumbers - Python port of Google's libphonenumber.
      • jellyfish - Approximate and phonetic matching of strings.
      • langid - Stand-alone language identification system.
      • fuzzywuzzy - Fuzzy String Matching.
      • snowball - Snowball compiler and stemming algorithms.
      • leven - Levenshtein edit distance.
      • flashtext - Extract keywords from sentences or replace keywords in sentences.
      • polyglot - Multilingual text (NLP) processing toolkit.
      • sentencepiece - Unsupervised text tokenizer for Neural Network-based text generation.
      • pyfasttext - Binding for fastText.
      • python-wordsegment - English word segmentation.
      • pyahocorasick - Exact or approximate multi-pattern string search.
      • Wordbatch - Parallel text feature extraction for machine learning.
      • langdetect - Port of Google's language-detection library.
      • translation - Uses web services for text translation.
      • unidecode - ASCII transliterations of Unicode text.
      • pytorch/text - Data loaders and abstractions for text and NLP.
      • sent2vec - General purpose unsupervised sentence representations.
      • pyhunspell - Python bindings for the Hunspell spellchecker engine.
      • facebook/fastText - Library for fast text representation and classification.
      • textblob - Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
      • facebook/InferSent - Sentence embeddings (InferSent) and training code for NLI.
      • nmslib - Non-Metric Space Library.
      • ftfy - Fixes mojibake and other glitches in Unicode text, after the fact.
      • fletcher - Pandas ExtensionDtype/ExtensionArray backed by Apache Arrow.
      • textacy - NLP, before and after spaCy.
      • hmtl - Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP.
      • pytext - A natural language modeling framework based on PyTorch.
      • flair - A very simple framework for state-of-the-art Natural Language Processing.
      • LASER - Language-Agnostic SEntence Representations.
      • transformer-xl - Attentive Language Models Beyond a Fixed-Length Context.
      • nlpaug - Data augmentation for NLP in your machine learning projects.
      • sumy - Automatic summarization of text documents and HTML.
      • textract - Extract text from any document.
      • newspaper - News extraction, article extraction and content curation.
      • pytorch-pretrained-BERT - PyTorch version of Google AI's BERT model with script to load Google's pre-trained models.
      • textdistance - Compute distance between sequences.
    • Time Series

      • Merlion - A Machine Learning Library for Time Series.
      • Darts - Darts is a Python library for easy manipulation and forecasting of time series.
      • Greykite - A flexible, intuitive and fast forecasting library.
      • Causality - Causal analysis.
      • PyFlux - Time series library for Python.