https://github.com/thomasjpfan/awesome-python-data-science

# Awesome Python Data Science

A curated list of Python libraries used for data science.

## Contents

- [Machine Learning Frameworks](#machine-learning-frameworks)
- [Scientific](#scientific)
- [Outlier Detection](#outlier-detection)
- [Deep Learning Frameworks](#deep-learning-frameworks)
- [Deep Learning Tools](#deep-learning-tools)
- [Deep Learning Projects](#deep-learning-projects)
- [Visualization](#visualization)
- [AutoML](#automl)
- [Exploration](#exploration)
- [Feature Extraction](#feature-extraction)
- [Trading](#trading)
- [Misc](#misc)
- [Deployment](#deployment)
- [Profiling](#profiling)
- [Python Tools](#python-tools)
- [Data Gathering](#data-gathering)

## Machine Learning Frameworks

- [scikit-learn](http://scikit-learn.org/stable/) - Machine learning.
- [CatBoost](https://catboost.yandex) - Gradient boosting library with categorical features support.
- [LightGBM](http://lightgbm.readthedocs.io) - Fast, distributed, high performance gradient boosting.
- [XGBoost](https://xgboost.readthedocs.io/en/latest/) - Scalable, portable and distributed gradient boosting.
- [PyMC](https://github.com/pymc-devs/pymc3) - Probabilistic Programming.
- [statsmodels](https://github.com/statsmodels/statsmodels) - Statistical modeling and econometrics.
- [SymPy](https://github.com/sympy/sympy) - A computer algebra system.
- [NetworkX](https://networkx.github.io/) - Creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
- [dask-ml](https://github.com/dask/dask-ml) - Distributed and parallel machine learning.
- [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn) - Under-sampling and over-sampling for imbalanced datasets.
- [lightning](https://github.com/scikit-learn-contrib/lightning) - Large-scale linear models.
- [scikit-optimize](https://github.com/scikit-optimize/scikit-optimize) - Sequential model-based optimization with a `scipy.optimize` interface.
- [BayesianOptimization](https://github.com/fmfn/BayesianOptimization) - Global optimization with Gaussian processes.
- [gplearn](https://github.com/trevorstephens/gplearn) - Genetic Programming.
- [python-glmnet](https://github.com/civisanalytics/python-glmnet) - glmnet package for fitting generalized linear models.
- [hmmlearn](https://github.com/hmmlearn/hmmlearn) - Hidden Markov Models.
- [vecstack](https://github.com/vecxoz/vecstack) - Stacking (machine learning technique).
- [modAL](https://github.com/cosmic-cortex/modAL) - Modular Active Learning framework.
- [deap](https://github.com/DEAP/deap) - Evolutionary computation framework.
- [pyro](https://github.com/uber/pyro) - Deep universal probabilistic programming with PyTorch.
- [civisml-extensions](https://github.com/civisanalytics/civisml-extensions) - scikit-learn-compatible estimators from Civis Analytics.
- [hyperopt-sklearn](https://github.com/hyperopt/hyperopt-sklearn) - Hyper-parameter optimization for sklearn.
- [scikit-survival](https://github.com/sebp/scikit-survival) - Survival analysis built on top of scikit-learn.
- [dstoolbox](https://github.com/ottogroup/dstoolbox) - Tools that make working with scikit-learn and pandas easier.
- [modin](https://github.com/modin-project/modin) - Unify the way you interact with your data.
- [pyomo](https://github.com/Pyomo/pyomo) - Python Optimization MOdels.
- [BAMBI](https://github.com/bambinos/bambi) - BAyesian Model-Building Interface.
- [combo](https://github.com/yzhao062/combo) - A Python Toolbox for Machine Learning Model Combination.
- [fastai](https://github.com/fastai/fastai) - The fast.ai deep learning library, lessons, and tutorials.
- [pycaret](https://github.com/pycaret/pycaret) - Low-code machine learning library in Python.
- [river](https://github.com/online-ml/river) - River is a Python library for online machine learning.

## Scientific

- [NumPy](http://www.numpy.org/) - A fundamental package for scientific computing with Python.
- [SciPy](http://www.scipy.org/) - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
- [Pandas](http://pandas.pydata.org/) - A library providing high-performance, easy-to-use data structures and data analysis tools.
- [Numba](http://numba.pydata.org/) - NumPy aware dynamic Python compiler using LLVM.
- [blaze](https://github.com/blaze/blaze) - NumPy and Pandas for databases.
- [astropy](http://www.astropy.org/) - Astronomy and astrophysics.
- [Biopython](http://biopython.org) - Computational molecular biology and bioinformatics.
- [PyDy](http://www.pydy.org) - Multibody Dynamics.
- [nilearn](https://github.com/nilearn/nilearn) - NeuroImaging.
- [patsy](https://github.com/pydata/patsy) - Describing statistical models using symbolic formulas.
- [numexpr](https://github.com/pydata/numexpr) - Fast numerical array expression evaluator.
- [dask](https://github.com/dask/dask) - Parallel computing with task scheduling.
- [or-tools](https://github.com/google/or-tools) - Google's Operations Research tools. Classical CS algorithms.
- [cvxpy](https://github.com/cvxgrp/cvxpy) - Python-embedded modeling language for convex optimization problems.

## Outlier Detection

- [PyOD](https://github.com/yzhao062/pyod) - Versatile Python library for detecting anomalies in multivariate data.
- [DeepOD](https://github.com/xuhongzuo/DeepOD) - Deep learning-based outlier/anomaly detection.

## Deep Learning Frameworks

- [TensorFlow](https://github.com/tensorflow/tensorflow) - DL Framework.
- [PyTorch](http://pytorch.org) - DL Framework.
- [Keras](https://keras.io) - High-level neural networks API.
- [tensorlayer](https://github.com/tensorlayer/tensorlayer) - A Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
- [mxnet](https://mxnet.incubator.apache.org) - Apache MXNet: A flexible and efficient library for deep learning.

## Deep Learning Tools

- [TorchDrift](https://github.com/torchdrift/torchdrift/) - TorchDrift is a data and concept drift library for PyTorch.
- [Edward](https://github.com/blei-lab/edward) - Probabilistic programming language in TensorFlow.
- [pomegranate](https://github.com/jmschrei/pomegranate) - Probabilistic modelling.
- [skorch](https://github.com/dnouri/skorch) - Scikit-learn compatible neural network library that wraps PyTorch.
- [DLTK](https://github.com/DLTK/DLTK) - Deep Learning Toolkit for Medical Image Analysis.
- [sonnet](https://github.com/deepmind/sonnet) - TensorFlow-based neural network library.
- [rasa_core](https://github.com/RasaHQ/rasa_core) - Dialogue engine.
- [luminoth](https://github.com/tryolabs/luminoth) - Computer Vision.
- [allennlp](https://github.com/allenai/allennlp) - NLP Research library.
- [spotlight](https://github.com/maciejkula/spotlight) - PyTorch recommender framework.
- [tensorforce](https://github.com/reinforceio/tensorforce) - TensorFlow library for applied reinforcement learning.
- [tensorboard-pytorch](https://github.com/lanpa/tensorboard-pytorch) - TensorBoard for PyTorch.
- [keras-vis](https://github.com/raghakot/keras-vis) - Neural network visualization toolkit for keras.
- [hyperas](https://github.com/maxpumperla/hyperas) - Keras + Hyperopt.
- [spaCy](https://spacy.io) - Natural Language processing.
- [tensorboard_logger](https://github.com/TeamHG-Memex/tensorboard_logger) - Log TensorBoard events without touching TensorFlow.
- [foolbox](https://github.com/bethgelab/foolbox) - Python toolbox to create adversarial examples that fool neural networks.
- [pytorch/vision](https://github.com/pytorch/vision) - Datasets, Transforms and Models specific to Computer Vision.
- [gluon-nlp](https://github.com/dmlc/gluon-nlp) - NLP made easy.
- [pytorch/ignite](https://github.com/pytorch/ignite) - High-level library to help with training neural networks in PyTorch.
- [Netron](https://github.com/lutzroeder/Netron) - Visualizer for deep learning and machine learning models.
- [gpytorch](https://github.com/cornellius-gp/gpytorch) - A highly efficient and modular implementation of Gaussian Processes in PyTorch.
- [tensorly](https://github.com/tensorly/tensorly) - Tensor Learning in Python.
- [einops](https://github.com/arogozhnikov/einops) - Deep learning operations reinvented.
- [hiddenlayer](https://github.com/waleedka/hiddenlayer) - Neural network graphs and training metrics for PyTorch, TensorFlow, and Keras.
- [segmentation_models.pytorch](https://github.com/qubvel/segmentation_models.pytorch) - Segmentation models with pretrained backbones.
- [pytorch-lightning](https://github.com/williamFalcon/pytorch-lightning) - The lightweight PyTorch wrapper.
- [lightly](https://docs.lightly.ai/index.html) - Lightly is a computer vision framework for self-supervised learning.

## Deep Learning Projects

- [fairseq](https://github.com/pytorch/fairseq) - Sequence-to-Sequence Toolkit.
- [tensorflow-wavenet](https://github.com/ibab/tensorflow-wavenet) - DeepMind's WaveNet.
- [DeepRecommender](https://github.com/NVIDIA/DeepRecommender) - Recommender systems.
- [DrQA](https://github.com/facebookresearch/DrQA) - Reading Wikipedia to Answer Open-Domain Questions.
- [vqa.pytorch](https://github.com/Cadene/vqa.pytorch) - Visual Question Answering in PyTorch.
- [Half-Life Regression](https://github.com/duolingo/halflife-regression) - Model for spaced repetition practice.
- [learning-to-learn](https://github.com/deepmind/learning-to-learn) - Learning to Learn in TensorFlow.
- [capsule-networks](https://github.com/gram-ai/capsule-networks) - A PyTorch implementation of the NIPS 2017 paper "Dynamic Routing Between Capsules".
- [Mask_RCNN](https://github.com/matterport/Mask_RCNN) - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow.
- [lightnet](https://github.com/explosion/lightnet) - Bringing pjreddie's DarkNet out of the shadows.
- [pytorch-openai-transformer-lm](https://github.com/huggingface/pytorch-openai-transformer-lm) - OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI.
- [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) - Fast, modular reference implementation of instance segmentation and object detection algorithms in PyTorch.
- [LovaszSoftmax](https://github.com/bermanmaxim/LovaszSoftmax) - Lovász-Softmax loss.
- [ludwig](https://github.com/uber/ludwig) - Ludwig is a toolbox built on top of TensorFlow that lets you train and test deep learning models without writing code.

## Visualization

- [Great Tables](https://github.com/posit-dev/great-tables) - Absolutely Delightful Table-making in Python.
- [PyGWalker](https://docs.kanaries.net/pygwalker) - Turns pandas and polars dataframes into a Tableau-like user interface for visual exploration.
- [diagrams](https://github.com/mingrammer/diagrams) - Diagrams lets you draw the cloud system architecture in Python code.
- [matplotlib](http://matplotlib.org/) - 2D plotting.
- [seaborn](https://seaborn.pydata.org) - Visualization library.
- [bokeh](https://github.com/bokeh/bokeh) - Interactive web plotting.
- [plotly](https://plot.ly/python/) - Collaborative web plotting.
- [dash](https://github.com/plotly/dash) - Interactive Web plotting.
- [altair](https://github.com/altair-viz/altair) - Declarative statistical visualization.
- [folium](https://github.com/python-visualization/folium) - Leaflet.js Maps.
- [geoplot](https://github.com/ResidentMario/geoplot) - High-level geospatial data visualization.
- [datashader](http://datashader.org) - Graphics pipeline system.
- [mplleaflet](https://github.com/jwass/mplleaflet) - Convert Matplotlib plots into interactive Leaflet web maps.
- [matplotlib-venn](https://github.com/konstantint/matplotlib-venn) - Area-weighted Venn diagrams.
- [pyLDAvis](https://github.com/bmabey/pyLDAvis) - Interactive topic model visualization.
- [cufflinks](https://github.com/santosjorge/cufflinks) - Productivity Tools for Plotly + Pandas.
- [scatterText](https://github.com/JasonKessler/scattertext) - Visualizations of how language differs among document types.
- [plotnine](https://github.com/has2k1/plotnine) - ggplot for Python.
- [mizani](https://github.com/has2k1/mizani) - A scales package for Python.
- [bqplot](https://github.com/bloomberg/bqplot) - Plotting library for IPython/Jupyter Notebooks.
- [PtitPrince](https://github.com/pog87/PtitPrince) - Raincloud plots.
- [joypy](https://github.com/sbebo/joypy) - Ridgeline plots.
- [dtreeviz](https://github.com/parrt/dtreeviz) - Decision tree visualization and model interpretation.
- [ipyvolume](https://github.com/maartenbreddels/ipyvolume) - 3D plotting for Python in the Jupyter notebook based on IPython widgets using WebGL.

## AutoML

- [Nevergrad](https://github.com/facebookresearch/nevergrad) - Gradient-free optimization.
- [featuretools](https://github.com/Featuretools/featuretools) - Automated feature engineering.
- [auto-sklearn](https://github.com/automl/auto-sklearn) - Automated machine learning.
- [tpot](https://github.com/EpistasisLab/tpot) - Automated machine learning.
- [auto_ml](https://github.com/ClimbsRocks/auto_ml) - Automated machine learning.
- [MLBox](https://github.com/AxeldeRomblay/MLBox) - Automated machine learning Python library.
- [devol](https://github.com/joeddav/devol) - Automated deep neural network design via genetic programming.
- [skll](https://github.com/EducationalTestingService/skll) - SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
- [autokeras](https://github.com/jhfjhfj1/autokeras) - Automated machine learning in Keras.
- [SMAC3](https://github.com/automl/SMAC3) - Sequential Model-based Algorithm Configuration.

## Exploration

- [mlxtend](https://github.com/rasbt/mlxtend) - A library of extension and helper modules for Python's data analysis and machine learning libraries.
- [yellowbrick](https://github.com/DistrictDataLabs/yellowbrick) - Visual analysis and diagnostic tools.
- [pandas-profiling](https://github.com/pandas-profiling/pandas-profiling) - Profiling reports for pandas DataFrame objects.
- [Skater](https://github.com/datascienceinc/Skater) - Model Agnostic Interpretation.
- [Dora](https://github.com/NathanEpstein/Dora) - Exploratory data analysis.
- [sklearn-evaluation](https://github.com/edublancas/sklearn-evaluation) - scikit-learn model evaluation.
- [fitter](http://pythonhosted.org/fitter/) - Simple class to identify the distribution from which a data sample was generated.
- [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization.
- [hypertools](https://github.com/ContextLab/hypertools) - Gaining geometric insights into high-dimensional data.
- [scikit-plot](https://github.com/reiinakano/scikit-plot) - Plotting functionality to scikit-learn objects.
- [elih](https://github.com/fvinas/elih) - Explain Machine Learning.
- [kmeans_smote](https://github.com/felix-last/kmeans_smote) - Oversampling for imbalanced learning based on k-means and SMOTE.
- [pyUpSet](https://github.com/ImSoErgodic/py-upset) - UpSet suite of visualisation methods.
- [lime](https://github.com/marcotcr/lime) - Explaining the predictions of any machine learning classifier.
- [pandas-summary](https://github.com/mouradmourafiq/pandas-summary) - An extension to the pandas DataFrame `describe` function.
- [SauceCat/PDPbox](https://github.com/SauceCat/PDPbox) - Partial dependence plot toolbox.
- [shap](https://github.com/slundberg/shap) - A unified approach to explain the output of any machine learning model.
- [eli5](https://github.com/TeamHG-Memex/eli5) - Debug machine learning classifiers and explain their predictions.
- [rfpimp](https://github.com/parrt/random-forest-importances) - Permutation and drop-column importance for scikit-learn random forests.
- [pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines made easy.
- [pycm](https://github.com/sepandhaghighi/pycm) - Multi-class confusion matrix library in Python.
- [great_expectations](https://github.com/great-expectations/great_expectations) - Always know what to expect from your data.
- [alibi](https://github.com/SeldonIO/alibi) - Algorithms for monitoring and explaining machine learning models.
- [InterpretML](https://github.com/interpretml/interpret) - Fit interpretable models. Explain blackbox machine learning.
- [cleanlab](https://github.com/cgnorthcutt/cleanlab) - Finding label errors in datasets and learning with noisy labels.
- [dtale](https://github.com/man-group/dtale) - Flask/React client for visualizing pandas data structures.
- [dabl](https://github.com/dabl/dabl) - Data Analysis Baseline Library.
- [XAI](https://github.com/EthicalML/xai) - An eXplainability toolbox for machine learning.
- [explainerdashboard](https://github.com/oegedijk/explainerdashboard) - This package makes it convenient to quickly deploy a dashboard web app that explains the workings of a (scikit-learn compatible) machine learning model.
- [alibi-detect](https://github.com/SeldonIO/alibi-detect) - Open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series.

## Feature Extraction

### General Feature Extraction

- [sklearn-pandas](https://github.com/scikit-learn-contrib/sklearn-pandas) - Pandas integration with sklearn.
- [pdpipe](https://github.com/shaypal5/pdpipe) - Easy pipelines for pandas DataFrames.
- [engarde](https://github.com/TomAugspurger/engarde) - Defensive data analysis.
- [datacleaner](https://github.com/rhiever/datacleaner) - Tool that automatically cleans data sets and readies them for analysis.
- [categorical-encoding](https://github.com/scikit-learn-contrib/categorical-encoding) - sklearn compatible categorical variable encoders.
- [fancyimpute](https://github.com/iskandr/fancyimpute) - Multivariate imputation and matrix completion algorithms.
- [raccoon](https://github.com/rsheftel/raccoon) - DataFrame with fast insert and appends.
- [kmodes](https://github.com/nicodv/kmodes) - k-modes and k-prototypes clustering algorithm.
- [annoy](https://github.com/spotify/annoy) - Approximate Nearest Neighbors.
- [scikit-feature](https://github.com/jundongl/scikit-feature) - Filter methods for feature selection.
- [mifs](https://github.com/danielhomola/mifs) - Parallelized Mutual Information based Feature Selection module.
- [skggm](https://github.com/skggm/skggm) - Scikit-learn compatible estimation of general graphical models.
- [dirty_cat](https://dirty-cat.github.io/stable/index.html) - Encoding methods for dirty categorical variables.
- [Impyute](https://github.com/eltonlaw/impyute) - Data imputations library to preprocess datasets with missing data.
- [eif](https://github.com/sahandha/eif) - Extended Isolation Forest for Anomaly Detection.
- [featexp](https://github.com/abhayspawar/featexp) - Feature exploration for supervised learning.
- [feature_engine](https://github.com/solegalli/feature_engine) - Feature engineering package with sklearn-like functionality.
- [stumpy](https://github.com/TDAmeritrade/stumpy) - STUMPY is a powerful and scalable Python library that can be used for a variety of time series data mining tasks.
- [n2](https://github.com/kakao/n2) - Lightweight approximate Nearest Neighbor library which runs faster even with large datasets.
- [compressio](https://github.com/dylan-profiler/compressio) - Compressio provides lossless in-memory compression of pandas DataFrames and Series.

### Time Series

- [Merlion](https://github.com/salesforce/Merlion) - A machine learning library for time series.
- [Darts](https://github.com/unit8co/darts) - A Python library for easy manipulation and forecasting of time series.
- [Greykite](https://github.com/linkedin/greykite) - A flexible, intuitive and fast forecasting library.
- [Causality](https://github.com/akelleh/causality) - Causal analysis.
- [traces](https://github.com/datascopeanalytics/traces) - Unevenly-spaced time series analysis.
- [PyFlux](https://github.com/RJT1990/pyflux) - Time series library for Python.
- [prophet](https://github.com/facebook/prophet) - Tool for producing high quality forecasts.
- [tsfresh](https://github.com/blue-yonder/tsfresh) - Automatic extraction of relevant features from time series.
- [tslearn](https://github.com/rtavenar/tslearn) - Machine learning toolkit dedicated to time-series data.
- [pyts](https://github.com/johannfaouzi/pyts) - A Python package for time series transformation and classification.
- [sktime](https://github.com/alan-turing-institute/sktime) - A scikit-learn compatible Python toolbox for learning with time series data.
- [stumpy](https://github.com/TDAmeritrade/stumpy) - Matrix profiles.
- [luminaire](https://github.com/zillow/luminaire) - ML driven solutions for monitoring time series data.
- [NeuralProphet](https://github.com/ourownstory/neural_prophet) - A Neural Network based Time-Series model, inspired by Facebook Prophet and AR-Net, built on PyTorch.

### Audio

- [python_speech_features](https://github.com/jameslyons/python_speech_features) - Speech features.
- [speechpy](https://github.com/astorfi/speechpy) - A Library for Speech Processing and Recognition.
- [magenta](https://github.com/tensorflow/magenta) - Music and Art Generation with Machine Intelligence.
- [librosa](https://github.com/librosa/librosa) - Audio and music analysis.
- [pydub](https://github.com/jiaaro/pydub) - Manipulate audio with a simple and easy high level interface.
- [pytorch/audio](https://github.com/pytorch/audio) - Simple audio I/O for PyTorch.

### Images and Video

- [pillow](https://github.com/python-pillow/Pillow) - PIL fork.
- [scikit-image](http://scikit-image.org/) - Image processing.
- [hmap](https://github.com/rossgoodwin/hmap) - Image histogram remapping.
- [pyocr](https://github.com/openpaperwork/pyocr) - A wrapper for Tesseract and Cuneiform (Optical Character Recognition).
- [scikit-video](https://github.com/aizvorski/scikit-video) - Video processing.
- [moviepy](http://zulko.github.io/moviepy/) - Video editing.
- [OpenCV](http://opencv.org/) - Open Source Computer Vision Library.
- [SimpleCV](http://simplecv.org/) - Wrapper around OpenCV.
- [label-maker](https://github.com/developmentseed/label-maker) - Data Preparation for Satellite Machine Learning.
- [face_recognition](https://github.com/ageitgey/face_recognition) - Facial recognition.
- [imgaug](https://github.com/aleju/imgaug) - Image augmentation.
- [pyvips](https://github.com/jcupitt/pyvips) - Fast image processing.
- [ImageHash](https://github.com/JohannesBuchner/imagehash) - Image hashing.
- [Augmentor](https://github.com/mdbloice/Augmentor) - Image augmentation library.
- [PyAV](https://github.com/mikeboers/PyAV) - Bindings for FFmpeg.
- [imutils](https://github.com/jrosebr1/imutils) - Convenience functions for basic image processing operations.
- [albumentations](https://github.com/albu/albumentations) - Fast image augmentation library.

### Geolocation

- [geojson](https://github.com/frewsxcv/python-geojson) - Python bindings for GeoJSON.
- [geopy](https://github.com/geopy/geopy) - Python Geocoding Toolbox.
- [OSMnx](https://github.com/gboeing/osmnx) - Street networks.
- [reverse-geocoder](https://github.com/thampiman/reverse-geocoder) - A fast, offline reverse geocoder.
- [pysal](https://github.com/pysal/pysal) - Spatial Analysis Library.
- [geopandas](https://github.com/geopandas/geopandas) - Tools for geographic data.

### Text/NLP

- [wordfreq](https://github.com/rspeer/wordfreq) - Library for looking up the frequencies of words in many languages, based on many sources of data.
- [BlingFire](https://github.com/Microsoft/BlingFire) - A lightning fast Finite State machine and REgular expression manipulation library.
- [BERT-pytorch](https://github.com/codertimo/BERT-pytorch) - Google AI 2018 BERT PyTorch implementation.
- [pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT) - PyTorch version of Google AI's BERT model with script to load Google's pre-trained models.
- [gensim](https://github.com/piskvorky/gensim) - Topic Modeling.
- [pattern](https://github.com/clips/pattern) - Web mining module.
- [probablepeople](https://github.com/datamade/probablepeople) - Parsing unstructured western names into name components.
- [Expynent](https://github.com/lk-geimfari/expynent) - Regular expression patterns.
- [mimesis](https://github.com/lk-geimfari/mimesis) - Generate synthetic data.
- [pyenchant](https://github.com/rfk/pyenchant) - Spell checking.
- [parserator](https://github.com/datamade/parserator) - Domain-specific probabilistic parsers.
- [scrubadub](https://github.com/datascopeanalytics/scrubadub) - Clean personally identifiable information from dirty dirty text.
- [usaddress](https://github.com/datamade/usaddress) - Parsing unstructured address strings into address components.
- [python-phonenumbers](https://github.com/daviddrysdale/python-phonenumbers) - Python port of Google's libphonenumber.
- [jellyfish](https://github.com/jamesturk/jellyfish) - Approximate and phonetic matching of strings.
- [pronouncing](https://pronouncing.readthedocs.io/en/latest/) - Simple interface for the CMU Pronouncing Dictionary.
- [langid](https://github.com/saffsd/langid.py) - Stand-alone language identification system.
- [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy) - Fuzzy String Matching.
- [Fuzzy](https://github.com/yougov/Fuzzy) - Soundex, NYSIIS, Double Metaphone.
- [snowball](https://github.com/snowballstem/snowball) - Snowball compiler and stemming algorithms.
- [leven](https://github.com/semanticize/leven) - Levenshtein edit distance.
- [flashtext](https://github.com/vi3k6i5/flashtext) - Extract keywords from sentences or replace keywords in sentences.
- [polyglot](https://github.com/aboSamoor/polyglot) - Multilingual text (NLP) processing toolkit.
- [sentencepiece](https://github.com/google/sentencepiece) - Unsupervised text tokenizer for Neural Network-based text generation.
- [pyfasttext](https://github.com/vrasneur/pyfasttext) - Binding for fastText.
- [python-wordsegment](https://github.com/grantjenks/python-wordsegment) - English word segmentation.
- [pyahocorasick](https://github.com/WojciechMula/pyahocorasick) - Exact or approximate multi-pattern string search.
- [Wordbatch](https://github.com/anttttti/Wordbatch) - Parallel text feature extraction for machine learning.
- [langdetect](https://github.com/Mimino666/langdetect) - Port of Google's language-detection library.
- [translation](https://github.com/littlecodersh/translation) - Uses web services for text translation.
- [nltk](http://www.nltk.org) - Natural Language Toolkit.
- [unidecode](https://github.com/avian2/unidecode) - ASCII transliterations of Unicode text.
- [pytorch/text](https://github.com/pytorch/text) - Data loaders and abstractions for text and NLP.
- [textdistance](https://github.com/orsinium/textdistance) - Compute distance between sequences.
- [sent2vec](https://github.com/epfml/sent2vec) - General purpose unsupervised sentence representations.
- [pyhunspell](https://github.com/blatinier/pyhunspell) - Python bindings for the Hunspell spellchecker engine.
- [facebook/fastText](https://github.com/facebookresearch/fastText) - Library for fast text representation and classification.
- [textblob](https://github.com/sloria/textblob) - Simple, Pythonic text processing: sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
- [facebook/InferSent](https://github.com/facebookresearch/InferSent) - Sentence embeddings (InferSent) and training code for NLI.
- [nmslib](https://github.com/nmslib/nmslib) - Non-Metric Space Library.
- [ftfy](https://github.com/LuminosoInsight/python-ftfy) - Fixes mojibake and other glitches in Unicode text, after the fact.
- [fletcher](https://github.com/xhochy/fletcher) - Pandas ExtensionDType/Array backed by Apache Arrow.
- [textacy](https://github.com/chartbeat-labs/textacy) - NLP, before and after spaCy.
- [hmtl](https://github.com/huggingface/hmtl) - Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP.
- [pytext](https://github.com/facebookresearch/pytext) - A natural language modeling framework based on PyTorch.
- [flair](https://github.com/zalandoresearch/flair) - A very simple framework for state-of-the-art Natural Language Processing.
- [LASER](https://github.com/facebookresearch/LASER) - Language-Agnostic SEntence Representations.
- [transformer-xl](https://github.com/kimiyoung/transformer-xl) - Attentive Language Models Beyond a Fixed-Length Context.
- [textstat](https://github.com/shivam5992/textstat) - Calculate readability statistics of a text object - paragraphs, sentences, articles.
- [nlpaug](https://github.com/makcedward/nlpaug) - Data augmentation for NLP in machine learning projects.
- [sumy](https://github.com/miso-belica/sumy) - Automatic summarization of text documents and HTML.
- [textract](https://github.com/deanmalmgren/textract) - Extract text from any document.
- [newspaper](https://github.com/codelucas/newspaper) - News extraction, article extraction and content curation.

### Ranking/Recommender

- [recommenders](https://github.com/microsoft/recommenders) - Examples and best practices for building recommendation systems.
- [Surprise](https://github.com/NicolasHug/Surprise) - Analyzing recommender systems.
- [trueskill](https://github.com/sublee/trueskill) - TrueSkill rating system.
- [LightFM](https://github.com/lyst/lightfm) - Hybrid recommendation algorithm.
- [implicit](https://github.com/benfred/implicit) - Collaborative Filtering for Implicit Datasets.

## Trading

- [Clairvoyant](https://github.com/anfederico/Clairvoyant) - Identify and monitor social/historical cues.
- [zipline](https://github.com/quantopian/zipline) - Algorithmic Trading Library.
- [qstrader](https://github.com/mhallsmoore/qstrader/) - Advanced Trading Infrastructure.

## Misc

- [mmh3](https://github.com/hajimes/mmh3) - MurmurHash3, a set of fast and robust hash functions.
- [fbpca](https://github.com/facebook/fbpca) - Fast Randomized PCA/SVD.
- [annoy](https://github.com/spotify/annoy) - Approximate Nearest Neighbors.
- [pipeline](https://github.com/PipelineAI/pipeline) - Standard Runtime For Every Real-Time Machine Learning.
- [crayon](https://github.com/torrvision/crayon) - A language-agnostic interface to TensorBoard.
- [faiss](https://github.com/facebookresearch/faiss) - A library for efficient similarity search and clustering of dense vectors.
- [pyod](https://github.com/yzhao062/pyod) - Comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data.

## Deployment

- [evidently](https://github.com/evidentlyai/evidently) - Evidently helps evaluate machine learning models during validation and monitor them in production.
- [onnx](https://github.com/onnx/onnx) - Open Neural Network Exchange.
- [lore](https://github.com/instacart/lore) - Lore makes machine learning approachable for Software Engineers and maintainable for Machine Learning Researchers.
- [kubeflow](https://github.com/kubeflow/kubeflow) - Machine Learning Toolkit for Kubernetes.
- [airflow](https://github.com/apache/incubator-airflow) - ETL.
- [mlflow](https://github.com/databricks/mlflow) - Open source platform for the complete machine learning lifecycle.
- [sklearn-porter](https://github.com/nok/sklearn-porter) - Transpile trained scikit-learn estimators.
- [sklearn-compiledtrees](https://github.com/ajtulloch/sklearn-compiledtrees) - Compiled Decision Trees for scikit-learn.

## Profiling

- [mem_usage_ui](https://github.com/parikls/mem_usage_ui) - Measuring and graphing memory usage of local processes.
- [viztracer](https://github.com/gaogaotiantian/viztracer) - VizTracer is a low-overhead logging/debugging/profiling tool that can trace and visualize your Python code execution.
- [py-spy](https://github.com/benfred/py-spy) - Sampling profiler for Python programs.
- [memory_profiler](https://pypi.python.org/pypi/memory_profiler) - Monitor memory usage of a Python program.
- [line_profiler](https://github.com/rkern/line_profiler) - Line-by-line profiling.
- [filprofiler](https://github.com/pythonspeed/filprofiler) - Fil, a memory profiler designed for data processing applications.
- [scalene](https://github.com/emeryberger/scalene) - High-performance CPU and memory profiler for Python.
- [python-flamegraph](https://github.com/evanhempel/python-flamegraph) - Statistical profiler which outputs in a format suitable for FlameGraph.

## Python Tools

- [Typer](https://github.com/tiangolo/typer) - Build CLIs with type hints.
- [hydra](https://hydra.cc) - Framework for elegantly configuring complex applications.
- [neurtu](https://github.com/symerio/neurtu) - A Python package for parametric benchmarks.
- [pyprojroot](https://github.com/chendaniely/pyprojroot) - Finding project directories in Python.
- [datasette](https://datasette.io) - An open source multi-tool for exploring and publishing data.
- [delorean](https://github.com/myusuf3/delorean) - Time Travel Made Easy.
- [pip-tools](https://github.com/nvie/pip-tools) - Keeps dependencies up to date.
- [devpi](http://doc.devpi.net/latest/) - PyPI server and packaging/testing/release tool.
- [Jupyter Notebook](https://jupyter.org) - Notebooks are awesome.
- [click](https://github.com/pallets/click) - CLI package.
- [sacredboard](https://github.com/chovanecm/sacredboard) - Dashboard for sacred.
- [sacred](http://sacred.readthedocs.io/en/latest/) - Reproduce computational experiments.
- [magic-wormhole](https://github.com/warner/magic-wormhole) - Get things from one computer to another, safely.

## Data Gathering

- [gain](https://github.com/gaojiuli/gain) - Web crawling framework based on asyncio.
- [MechanicalSoup](https://github.com/MechanicalSoup/MechanicalSoup) - A Python library for automating interaction with websites.
- [camelot](https://github.com/socialcopsdev/camelot) - Camelot: PDF Table Extraction for Humans.
- [Pandarallel](https://github.com/nalepae/pandarallel) - Parallel pandas.
- [great_expectations](https://github.com/great-expectations/great_expectations) - A framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests.
- [parse](https://github.com/r1chardj0n3s/parse) - Parse strings using a specification based on the Python format() syntax.
- [CleverCSV](https://github.com/alan-turing-institute/CleverCSV) - CleverCSV is a Python package for handling messy CSV files.