{"id":21116,"url":"https://github.com/thomasjpfan/awesome-python-data-science","name":"awesome-python-data-science","description":"A curated list of Python libraries used for data science.","projects_count":327,"last_synced_at":"2026-04-14T17:00:26.435Z","repository":{"id":104388377,"uuid":"121789998","full_name":"thomasjpfan/awesome-python-data-science","owner":"thomasjpfan","description":"A curated list of Python libraries used for data science.","archived":false,"fork":false,"pushed_at":"2024-06-19T02:09:10.000Z","size":142,"stargazers_count":96,"open_issues_count":1,"forks_count":26,"subscribers_count":6,"default_branch":"master","last_synced_at":"2026-03-30T21:03:19.078Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thomasjpfan.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-16T19:11:22.000Z","updated_at":"2026-03-12T08:30:43.000Z","dependencies_parsed_at":"2024-01-13T14:36:17.659Z","dependency_job_id":"c83a3c88-012d-403c-85a2-ad8ce800b26c","html_url":"https://github.com/thomasjpfan/awesome-python-data-science","commit_stats":{"total_commits":171,"total_committers":1,"mean_commits":171.0,"dds":0.0,"last_synced_commit":"415ca3e340c4f79a37e3bf7e0c387ba28bbe7397"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/thomasjpfan/awesome-python-data-science","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasjpfan%2Fawesome-python-data-science","tags_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasjpfan%2Fawesome-python-data-science/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasjpfan%2Fawesome-python-data-science/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasjpfan%2Fawesome-python-data-science/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thomasjpfan","download_url":"https://codeload.github.com/thomasjpfan/awesome-python-data-science/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasjpfan%2Fawesome-python-data-science/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31806209,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T11:13:53.975Z","status":"ssl_error","status_checked_at":"2026-04-14T11:13:53.299Z","response_time":153,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"readme":"# Awesome Python Data Science\n\nA curated list of Python libraries used for data science.\n\n## Contents\n\n- [Machine Learning Frameworks](#machine-learning-frameworks)\n- [Scientific](#scientific)\n- [Outlier Detection](#outlier-detection)\n- [Deep Learning Frameworks](#deep-learning-frameworks)\n- [Deep Learning Tools](#deep-learning-tools)\n- [Deep Learning 
Projects](#deep-learning-projects)\n- [Visualization](#visualization)\n- [AutoML](#automl)\n- [Exploration](#exploration)\n- [Feature Extraction](#feature-extraction)\n- [Trading](#trading)\n- [Misc](#misc)\n- [Deployment](#deployment)\n- [Profiling](#profiling)\n- [Python Tools](#python-tools)\n- [Data Gathering](#data-gathering)\n\n## Machine Learning Frameworks\n\n- [scikit-learn](http://scikit-learn.org/stable/) - Machine learning.\n- [CatBoost](https://catboost.yandex) - Gradient boosting library with categorical features support.\n- [LightGBM](http://lightgbm.readthedocs.io) - Fast, distributed, high performance gradient boosting.\n- [Xgboost](https://xgboost.readthedocs.io/en/latest/) - Scalable, Portable and Distributed Gradient Boosting.\n- [PyMC](https://github.com/pymc-devs/pymc3) - Probabilistic Programming.\n- [statsmodels](https://github.com/statsmodels/statsmodels) - Statistical modeling and econometrics.\n- [SymPy](https://github.com/sympy/sympy) - A computer algebra system.\n- [NetworkX](https://networkx.github.io/) - Creation, manipulation, and study of the structure, dynamics, and functions of complex networks.\n- [dask-ml](https://github.com/dask/dask-ml) - Distributed and parallel machine learning.\n- [imbalanced-learn](https://github.com/scikit-learn-contrib/imbalanced-learn) - Perform under sampling and over sampling.\n- [lightning](https://github.com/scikit-learn-contrib/lightning) - Large-scale linear models.\n- [scikit-optimize](https://github.com/scikit-optimize/scikit-optimize) - Sequential model-based optimization with a `scipy.optimize` interface.\n- [BayesianOptimization](https://github.com/fmfn/BayesianOptimization) - Global optimization with gaussian processes.\n- [gplearn](https://github.com/trevorstephens/gplearn) - Genetic Programming.\n- [python-glmnet](https://github.com/civisanalytics/python-glmnet) - glmnet package for fitting generalized linear models.\n- [hmmlearn](https://github.com/hmmlearn/hmmlearn) - Hidden Markov 
Models.\n- [vecstack](https://github.com/vecxoz/vecstack) - stacking (machine learning technique).\n- [modAL](https://github.com/cosmic-cortex/modAL) - Modular Active Learning framework\n- [deap](https://github.com/DEAP/deap) - Evolutionary computation framework.\n- [pyro](https://github.com/uber/pyro) - Deep universal probabilistic programming with PyTorch.\n- [civisml-extensions](https://github.com/civisanalytics/civisml-extensions) - scikit-learn-compatible estimators from Civis Analytics.\n- [hyperopt-sklearn](https://github.com/hyperopt/hyperopt-sklearn) - Hyper-parameter optimization for sklearn.\n- [scikit-survival](https://github.com/sebp/scikit-survival) - Survival analysis built on top of scikit-learn.\n- [dstoolbox](https://github.com/ottogroup/dstoolbox) - Tools that make working with scikit-learn and pandas easier.\n- [modin](https://github.com/modin-project/modin) - Unify the way you interact with your data.\n- [pyomo](https://github.com/Pyomo/pyomo) - Python Optimization MOdels.\n- [BAMBI](https://github.com/bambinos/bambi) - BAyesian Model-Building Interface.\n- [combo](https://github.com/yzhao062/combo) - A Python Toolbox for Machine Learning Model Combination.\n- [fastai](https://github.com/fastai/fastai) - The fast.ai deep learning library, lessons, and tutorials.\n- [pycaret](https://github.com/pycaret/pycaret) -  Low-code machine learning library in Python.\n- [river](https://github.com/online-ml/river) - River is a Python library for online machine learning.\n\n## Scientific\n\n- [NumPy](http://www.numpy.org/) - A fundamental package for scientific computing with Python.\n- [SciPy](http://www.scipy.org/) - A Python-based ecosystem of open-source software for mathematics, science, and engineering.\n- [Pandas](http://pandas.pydata.org/) - A library providing high-performance, easy-to-use data structures and data analysis tools.\n- [Numba](http://numba.pydata.org/) - NumPy aware dynamic Python compiler using LLVM.\n- 
[blaze](https://github.com/blaze/blaze) - NumPy and Pandas for databases.\n- [astropy](http://www.astropy.org/) - Astronomy and astrophysics.\n- [Biopython](http://biopython.org) - Computational biology and bioinformatics.\n- [PyDy](http://www.pydy.org) - Multibody Dynamics.\n- [nilearn](https://github.com/nilearn/nilearn) - NeuroImaging.\n- [patsy](https://github.com/pydata/patsy) - Describing statistical models using symbolic formulas.\n- [numexpr](https://github.com/pydata/numexpr) - Fast numerical array expression evaluator.\n- [dask](https://github.com/dask/dask) - Parallel computing with task scheduling.\n- [or-tools](https://github.com/google/or-tools) - Google's Operations Research tools. Classical CS algorithms.\n- [cvxpy](https://github.com/cvxgrp/cvxpy) - Python-embedded modeling language for convex optimization problems.\n\n## Outlier Detection\n\n- [PyOD](https://github.com/yzhao062/pyod) - Versatile Python library for detecting anomalies in multivariate data.\n- [DeepOD](https://github.com/xuhongzuo/DeepOD) - Deep learning-based outlier/anomaly detection\n\n## Deep Learning Frameworks\n\n- [Tensorflow](https://github.com/tensorflow/tensorflow) - DL Framework.\n- [PyTorch](http://pytorch.org) - DL Framework.\n- [Keras](https://keras.io) - High-level neural networks API.\n- [tensorlayer](https://github.com/tensorlayer/tensorlayer) - A Deep Learning and Reinforcement Learning Library for Researchers and Engineers.\n- [mxnet](https://mxnet.incubator.apache.org) - Apache MXNet: A flexible and efficient library for deep learning.\n\n## Deep Learning Tools\n\n- [TorchDrift](https://github.com/torchdrift/torchdrift/) - TorchDrift is a data and concept drift library for PyTorch.\n- [Edward](https://github.com/blei-lab/edward) - Probabilistic programming language in TensorFlow.\n- [pomegranate](https://github.com/jmschrei/pomegranate) - Probabilistic modelling.\n- [skorch](https://github.com/dnouri/skorch) - Scikit-learn PyTorch.\n- [DLTK](https://github.com/DLTK/DLTK) - 
Deep Learning Toolkit for Medical Image Analysis.\n- [sonnet](https://github.com/deepmind/sonnet) - TensorFlow-based neural network library.\n- [rasa_core](https://github.com/RasaHQ/rasa_core) - Dialogue engine.\n- [luminoth](https://github.com/tryolabs/luminoth) - Computer Vision.\n- [allennlp](https://github.com/allenai/allennlp) - NLP Research library.\n- [spotlight](https://github.com/maciejkula/spotlight) - Pytorch Recommender framework.\n- [tensorforce](https://github.com/reinforceio/tensorforce) - TensorFlow library for applied reinforcement learning.\n- [tensorboard-pytorch](https://github.com/lanpa/tensorboard-pytorch) - Tensorboard for pytorch.\n- [keras-vis](https://github.com/raghakot/keras-vis) - Neural network visualization toolkit for keras.\n- [hyperas](https://github.com/maxpumperla/hyperas) - Keras + Hyperopt.\n- [spaCy](https://spacy.io) - Natural Language processing.\n- [tensorboard_logger](https://github.com/TeamHG-Memex/tensorboard_logger) - Log TensorBoard events without touching TensorFlow.\n- [foolbox](https://github.com/bethgelab/foolbox) - Python toolbox to create adversarial examples that fool neural networks.\n- [pytorch/vision](https://github.com/pytorch/vision) - Datasets, Transforms and Models specific to Computer Vision.\n- [gluon-nlp](https://github.com/dmlc/gluon-nlp) - NLP made easy.\n- [pytorch/ignite](https://github.com/pytorch/ignite) - High-level library to help with training neural networks in PyTorch.\n- [Netron](https://github.com/lutzroeder/Netron) - Visualizer for deep learning and machine learning models.\n- [gpytorch](https://github.com/cornellius-gp/gpytorch) - A highly efficient and modular implementation of Gaussian Processes in PyTorch.\n- [tensorly](https://github.com/tensorly/tensorly) - Tensor Learning in Python.\n- [einops](https://github.com/arogozhnikov/einops) - Deep learning operations reinvented.\n- [hiddenlayer](https://github.com/waleedka/hiddenlayer) - Neural network graphs and training metrics for 
PyTorch, Tensorflow, and Keras.\n- [segmentation_models.pytorch](https://github.com/qubvel/segmentation_models.pytorch) - Segmentation models with pretrained backbones.\n- [pytorch-lightning](https://github.com/williamFalcon/pytorch-lightning) - The lightweight PyTorch wrapper.\n- [lightly](https://docs.lightly.ai/index.html) - Lightly is a computer vision framework for self-supervised learning.\n\n## Deep Learning Projects\n\n- [fairseq](https://github.com/pytorch/fairseq) - Sequence-to-Sequence Toolkit.\n- [tensorflow-wavenet](https://github.com/ibab/tensorflow-wavenet) - DeepMind's WaveNet.\n- [DeepRecommender](https://github.com/NVIDIA/DeepRecommender) - Recommender systems.\n- [DrQA](https://github.com/facebookresearch/DrQA) - Reading Wikipedia to Answer Open-Domain Questions.\n- [vqa.pytorch](https://github.com/Cadene/vqa.pytorch) - Visual Question Answering in Pytorch.\n- [Half-Life Regression](https://github.com/duolingo/halflife-regression) - Model for spaced repetition practice.\n- [learning-to-learn](https://github.com/deepmind/learning-to-learn) - Learning to Learn in Tensorflow.\n- [capsule-networks](https://github.com/gram-ai/capsule-networks) - A PyTorch implementation of the NIPS 2017 paper \"Dynamic Routing Between Capsules\".\n- [Mask_RCNN](https://github.com/matterport/Mask_RCNN) - Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow.\n- [lightnet](https://github.com/explosion/lightnet) - Bringing pjreddie's DarkNet out of the shadows.\n- [pytorch-openai-transformer-lm](https://github.com/huggingface/pytorch-openai-transformer-lm) - OpenAI's finetuned transformer language model with a script to import the weights pre-trained by OpenAI.\n- [maskrcnn-benchmark](https://github.com/facebookresearch/maskrcnn-benchmark) - Fast, modular reference implementation of Semantic Segmentation and Object Detection algorithm in PyTorch.\n- [LovaszSoftmax](https://github.com/bermanmaxim/LovaszSoftmax) - Lovász-Softmax loss.\n- 
[ludwig](https://github.com/uber/ludwig) - Ludwig is a toolbox built on top of TensorFlow for training and testing deep learning models without writing code.\n\n## Visualization\n\n- [Great Tables](https://github.com/posit-dev/great-tables) - Absolutely Delightful Table-making in Python.\n- [PyGWalker](https://docs.kanaries.net/pygwalker) - Turns pandas and polars dataframes into a Tableau-like user interface for visual exploration.\n- [diagrams](https://github.com/mingrammer/diagrams) - Diagrams lets you draw the cloud system architecture in Python code.\n- [matplotlib](http://matplotlib.org/) - 2D plotting.\n- [seaborn](https://seaborn.pydata.org) - Visualization library.\n- [bokeh](https://github.com/bokeh/bokeh) - Interactive web plotting.\n- [plotly](https://plot.ly/python/) - Collaborative web plotting.\n- [dash](https://github.com/plotly/dash) - Interactive Web plotting.\n- [altair](https://github.com/altair-viz/altair) - Declarative statistical visualization.\n- [folium](https://github.com/python-visualization/folium) - Leaflet.js Maps.\n- [geoplot](https://github.com/ResidentMario/geoplot) - High-level geospatial data visualization.\n- [datashader](http://datashader.org) - Graphics pipeline system.\n- [mplleaflet](https://github.com/jwass/mplleaflet) - Matplotlib plots from Python into interactive Leaflet web maps.\n- [matplotlib-venn](https://github.com/konstantint/matplotlib-venn) - Area-weighted venn-diagrams.\n- [pyLDAvis](https://github.com/bmabey/pyLDAvis) - Interactive topic model visualization.\n- [cufflinks](https://github.com/santosjorge/cufflinks) - Productivity Tools for Plotly + Pandas.\n- [scatterText](https://github.com/JasonKessler/scattertext) - Visualizations of how language differs among document types.\n- [plotnine](https://github.com/has2k1/plotnine) - ggplot for python.\n- [mizani](https://github.com/has2k1/mizani) - scales package.\n- [bqplot](https://github.com/bloomberg/bqplot) - Plotting library for 
IPython/Jupyter Notebooks.\n- [PtitPrince](https://github.com/pog87/PtitPrince) - Raincloud plots.\n- [joypy](https://github.com/sbebo/joypy) - Ridgeline plots.\n- [dtreeviz](https://github.com/parrt/dtreeviz) - Decision tree visualization and model interpretation.\n- [ipyvolume](https://github.com/maartenbreddels/ipyvolume) - 3d plotting for Python in the Jupyter notebook based on IPython widgets using WebGL.\n\n## AutoML\n\n- [Nevergrad](https://github.com/facebookresearch/nevergrad) - Gradient-free optimization.\n- [featuretools](https://github.com/Featuretools/featuretools) - Automated feature engineering.\n- [auto-sklearn](https://github.com/automl/auto-sklearn) - Automated machine learning.\n- [tpot](https://github.com/EpistasisLab/tpot) - Automated machine learning.\n- [auto_ml](https://github.com/ClimbsRocks/auto_ml) - Automated machine learning.\n- [MLBox](https://github.com/AxeldeRomblay/MLBox) - Automated Machine Learning python library.\n- [devol](https://github.com/joeddav/devol) - Automated deep neural network design via genetic programming.\n- [skll](https://github.com/EducationalTestingService/skll) - SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.\n- [autokeras](https://github.com/jhfjhfj1/autokeras) - Automated machine learning in Keras.\n- [SMAC3](https://github.com/automl/SMAC3) - Sequential Model-based Algorithm Configuration.\n\n## Exploration\n\n- [mlxtend](https://github.com/rasbt/mlxtend) - A library of extension and helper modules for Python's data analysis and machine learning libraries.\n- [yellowbrick](https://github.com/DistrictDataLabs/yellowbrick) - Visual analysis and diagnostic tools.\n- [pandas-profiling](https://github.com/pandas-profiling/pandas-profiling) - Profiling reports for pandas DataFrame objects.\n- [Skater](https://github.com/datascienceinc/Skater) - Model Agnostic Interpretation.\n- [Dora](https://github.com/NathanEpstein/Dora) - Exploratory data analysis.\n- 
[sklearn-evaluation](https://github.com/edublancas/sklearn-evaluation) - scikit-learn model evaluation.\n- [fitter](http://pythonhosted.org/fitter/) - Simple class to identify the distribution from which a data sample was generated.\n- [missingno](https://github.com/ResidentMario/missingno) - Missing data visualization.\n- [hypertools](https://github.com/ContextLab/hypertools) - Gaining geometric insights into high-dimensional data.\n- [scikit-plot](https://github.com/reiinakano/scikit-plot) - Plotting functionality to scikit-learn objects.\n- [elih](https://github.com/fvinas/elih) - Explain Machine Learning.\n- [kmeans_smote](https://github.com/felix-last/kmeans_smote) - Oversampling for imbalanced learning based on k-means and SMOTE.\n- [pyUpSet](https://github.com/ImSoErgodic/py-upset) - UpSet suite of visualisation methods.\n- [lime](https://github.com/marcotcr/lime) - Explaining the predictions of any machine learning classifier.\n- [pandas-summary](https://github.com/mouradmourafiq/pandas-summary) - An extension to pandas dataframes describe function.\n- [SauceCat/PDPbox](https://github.com/SauceCat/PDPbox) - Partial dependence plot toolbox.\n- [shap](https://github.com/slundberg/shap) - A unified approach to explain the output of any machine learning model.\n- [eli5](https://github.com/TeamHG-Memex/eli5) - Debug machine learning classifiers and explain their predictions.\n- [rfpimp](https://github.com/parrt/random-forest-importances) - Permutation and drop-column importance for scikit-learn random forests.\n- [pypeln](https://github.com/cgarciae/pypeln) - Concurrent data pipelines made easy.\n- [pycm](https://github.com/sepandhaghighi/pycm) - Multi-class confusion matrix library in Python.\n- [great_expectations](https://github.com/great-expectations/great_expectations) - Always know what to expect from your data.\n- [alibi](https://github.com/SeldonIO/alibi) - Algorithms for monitoring and explaining machine learning models.\n- 
[InterpretML](https://github.com/interpretml/interpret) - Fit interpretable models. Explain blackbox machine learning.\n- [cleanlab](https://github.com/cgnorthcutt/cleanlab) - Finding label errors in datasets and learning with noisy labels.\n- [dtale](https://github.com/man-group/dtale) - Flask/React client for visualizing pandas data structures\n- [dabl](https://github.com/dabl/dabl) - Data Analysis Baseline Library\n- [XAI](https://github.com/EthicalML/xai) - XAI - An eXplainability toolbox for machine learning\n- [explainerdashboard](https://github.com/oegedijk/explainerdashboard) - This package makes it convenient to quickly deploy a dashboard web app that explains the workings of a (scikit-learn compatible) machine learning model.\n- [alibi-detect](https://github.com/SeldonIO/alibi-detect) - Open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series.\n\n## Feature Extraction\n\n### General Feature Extraction\n\n- [sklearn-pandas](https://github.com/scikit-learn-contrib/sklearn-pandas) - Pandas integration with sklearn.\n- [pdpipe](https://github.com/shaypal5/pdpipe) - Easy pipelines for pandas DataFrames.\n- [engarde](https://github.com/TomAugspurger/engarde) - Defensive data analysis.\n- [datacleaner](https://github.com/rhiever/datacleaner) - Tool that automatically cleans data sets and readies them for analysis.\n- [categorical-encoding](https://github.com/scikit-learn-contrib/categorical-encoding) - sklearn compatible categorical variable encoders.\n- [fancyimpute](https://github.com/iskandr/fancyimpute) - Multivariate imputation and matrix completion algorithms.\n- [raccoon](https://github.com/rsheftel/raccoon) - DataFrame with fast insert and appends.\n- [kmodes](https://github.com/nicodv/kmodes) - k-modes and k-prototypes clustering algorithm.\n- [annoy](https://github.com/spotify/annoy) - Approximate Nearest Neighbors.\n- 
[scikit-feature](https://github.com/jundongl/scikit-feature) - Filter methods for feature selection.\n- [mifs](https://github.com/danielhomola/mifs) - Parallelized Mutual Information based Feature Selection module.\n- [skggm](https://github.com/skggm/skggm) - Scikit-learn compatible estimation of general graphical models.\n- [dirty_cat](https://dirty-cat.github.io/stable/index.html) - Encoding methods for dirty categorical variables.\n- [Impyute](https://github.com/eltonlaw/impyute) - Data imputations library to preprocess datasets with missing data.\n- [eif](https://github.com/sahandha/eif) - Extended Isolation Forest for Anomaly Detection.\n- [featexp](https://github.com/abhayspawar/featexp) - Feature exploration for supervised learning.\n- [feature_engine](https://github.com/solegalli/feature_engine) - Feature engineering package with sklearn like functionality.\n- [stumpy](https://github.com/TDAmeritrade/stumpy) - STUMPY is a powerful and scalable Python library that can be used for a variety of time series data mining tasks.\n- [n2](https://github.com/kakao/n2) - Lightweight approximate Nearest Neighbor library which runs faster even with large datasets.\n- [compressio](https://github.com/dylan-profiler/compressio) - Compressio provides lossless in-memory compression of pandas DataFrames and Series.\n\n### Time Series\n\n- [Merlion](https://github.com/salesforce/Merlion) - A Machine Learning Library for Time Series\n- [Darts](https://github.com/unit8co/darts) - darts is a Python library for easy manipulation and forecasting of time series.\n- [Greykite](https://github.com/linkedin/greykite) - Greykite: A flexible, intuitive and fast forecasting library\n- [Causality](https://github.com/akelleh/causality) - Causal analysis.\n- [traces](https://github.com/datascopeanalytics/traces) - Unevenly-spaced time series analysis.\n- 
[PyFlux](https://github.com/RJT1990/pyflux) - Time series library for Python.\n- [prophet](https://github.com/facebook/prophet) - Tool for producing high quality forecasts.\n- [tsfresh](https://github.com/blue-yonder/tsfresh) - Automatic extraction of relevant features from time series.\n- [tslearn](https://github.com/rtavenar/tslearn) - Machine learning toolkit dedicated to time-series data.\n- [pyts](https://github.com/johannfaouzi/pyts) - A Python package for time series transformation and classification.\n- [sktime](https://github.com/alan-turing-institute/sktime) - A scikit-learn compatible Python toolbox for learning with time series data.\n- [stumpy](https://github.com/TDAmeritrade/stumpy) - Matrix profiles.\n- [luminaire](https://github.com/zillow/luminaire) - ML driven solutions for monitoring time series data.\n- [NeuralProphet](https://github.com/ourownstory/neural_prophet) - A Neural Network based Time-Series model, inspired by Facebook Prophet and AR-Net, built on PyTorch.\n\n### Audio\n\n- [python_speech_features](https://github.com/jameslyons/python_speech_features) - Speech features.\n- [speechpy](https://github.com/astorfi/speechpy) - A Library for Speech Processing and Recognition.\n- [magenta](https://github.com/tensorflow/magenta) - Music and Art Generation with Machine Intelligence.\n- [librosa](https://github.com/librosa/librosa) - Audio and music analysis.\n- [pydub](https://github.com/jiaaro/pydub) - Manipulate audio with a simple and easy high level interface.\n- [pytorch/audio](https://github.com/pytorch/audio) - simple audio I/O for pytorch.\n\n### Images and Video\n\n- [pillow](https://github.com/python-pillow/Pillow) - PIL fork.\n- [scikit-image](http://scikit-image.org/) - Image processing.\n- [hmap](https://github.com/rossgoodwin/hmap) - Image histogram remapping.\n- [pyocr](https://github.com/openpaperwork/pyocr) - A wrapper for Tesseract and Cuneiform (Optical Character Recognition).\n- 
[scikit-video](https://github.com/aizvorski/scikit-video) - Video processing.\n- [moviepy](http://zulko.github.io/moviepy/) - Video editing.\n- [OpenCV](http://opencv.org/) - Open Source Computer Vision Library.\n- [SimpleCV](http://simplecv.org/) - Wrapper around OpenCV.\n- [label-maker](https://github.com/developmentseed/label-maker) - Data Preparation for Satellite Machine Learning.\n- [face_recognition](https://github.com/ageitgey/face_recognition) - Facial recognition.\n- [imgaug](https://github.com/aleju/imgaug) - Image augmentation.\n- [pyvips](https://github.com/jcupitt/pyvips) - Fast image processing.\n- [ImageHash](https://github.com/JohannesBuchner/imagehash) - Image hashing.\n- [Augmentor](https://github.com/mdbloice/Augmentor) - Image augmentation library.\n- [PyAV](https://github.com/mikeboers/PyAV) - Bindings for FFmpeg.\n- [imutils](https://github.com/jrosebr1/imutils) - Convenience functions to make basic image processing operations.\n- [albumentations](https://github.com/albu/albumentations) - fast image augmentation library.\n\n### Geolocation\n\n- [geojson](https://github.com/frewsxcv/python-geojson) - Python bindings for GeoJSON.\n- [geopy](https://github.com/geopy/geopy) - Python Geocoding Toolbox.\n- [OSMnx](https://github.com/gboeing/osmnx) - Street networks.\n- [reverse-geocoder](https://github.com/thampiman/reverse-geocoder) - A fast, offline reverse geocoder.\n- [pysal](https://github.com/pysal/pysal) - Spatial Analysis Library.\n- [geopandas](https://github.com/geopandas/geopandas) - Tools for geographic data.\n\n### Text/NLP\n\n- [wordfreq](https://github.com/rspeer/wordfreq) - Library for looking up the frequencies of words in many languages, based on many sources of data.\n- [BlingFire](https://github.com/Microsoft/BlingFire) - A lightning fast Finite State machine and REgular expression manipulation library.\n- [BERT-pytorch](https://github.com/codertimo/BERT-pytorch) - Google AI 2018 BERT pytorch implementation.\n- 
[pytorch-pretrained-BERT](https://github.com/huggingface/pytorch-pretrained-BERT) - PyTorch version of Google AI's BERT model with script to load Google's pre-trained models.\n- [gensim](https://github.com/piskvorky/gensim) - Topic Modeling.\n- [pattern](https://github.com/clips/pattern) - Web mining module.\n- [probablepeople](https://github.com/datamade/probablepeople) - Parsing unstructured western names into name components.\n- [Expynent](https://github.com/lk-geimfari/expynent) - Regular expression patterns.\n- [mimesis](https://github.com/lk-geimfari/mimesis) - Generate synthetic data.\n- [pyenchant](https://github.com/rfk/pyenchant) - Spell checking.\n- [parserator](https://github.com/datamade/parserator) - Domain-specific probabilistic parsers.\n- [scrubadub](https://github.com/datascopeanalytics/scrubadub) - Clean personally identifiable information from dirty dirty text.\n- [usaddress](https://github.com/datamade/usaddress) - Parsing unstructured address strings into address components.\n- [python-phonenumbers](https://github.com/daviddrysdale/python-phonenumbers) - Python port of Google's libphonenumber.\n- [jellyfish](https://github.com/jamesturk/jellyfish) - Approximate and phonetic matching of strings.\n- [pronouncing](https://pronouncing.readthedocs.io/en/latest/) - Simple interface for the CMU Pronouncing Dictionary.\n- [langid](https://github.com/saffsd/langid.py) - Stand-alone language identification system.\n- [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy) - Fuzzy String Matching.\n- [Fuzzy](https://github.com/yougov/Fuzzy) - Soundex, NYSIIS, Double Metaphone.\n- [snowball](https://github.com/snowballstem/snowball) - Snowball compiler and stemming algorithms.\n- [leven](https://github.com/semanticize/leven) - Levenshtein edit distance.\n- [flashtext](https://github.com/vi3k6i5/flashtext) - Extract Keywords from sentence or Replace keywords in sentences.\n- [polyglot](https://github.com/aboSamoor/polyglot) - Multilingual text NLP processing 
toolkit.\n- [sentencepiece](https://github.com/google/sentencepiece) - Unsupervised text tokenizer for Neural Network-based text generation.\n- [pyfasttext](https://github.com/vrasneur/pyfasttext) - Binding for fastText.\n- [python-wordsegment](https://github.com/grantjenks/python-wordsegment) - English word segmentation.\n- [pyahocorasick](https://github.com/WojciechMula/pyahocorasick) - Exact or approximate multi-pattern string search.\n- [Wordbatch](https://github.com/anttttti/Wordbatch) - Parallel text feature extraction for machine learning.\n- [langdetect](https://github.com/Mimino666/langdetect) - Port of Google's language-detection library.\n- [translation](https://github.com/littlecodersh/translation) - Uses web services for text translation.\n- [nltk](http://www.nltk.org) - Natural Language Toolkit.\n- [unidecode](https://github.com/avian2/unidecode) - ASCII transliterations of Unicode text.\n- [pytorch/text](https://github.com/pytorch/text) - Data loaders and abstractions for text and NLP.\n- [textdistance](https://github.com/orsinium/textdistance) - Compute distance between sequences.\n- [sent2vec](https://github.com/epfml/sent2vec) - General purpose unsupervised sentence representations.\n- [pyhunspell](https://github.com/blatinier/pyhunspell) - Python bindings for the Hunspell spellchecker engine.\n- [facebook/fastText](https://github.com/facebookresearch/fastText) - Library for fast text representation and classification.\n- [textblob](https://github.com/sloria/textblob) - Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.\n- [facebook/InferSent](https://github.com/facebookresearch/InferSent) - Sentence embeddings (InferSent) and training code for NLI.\n- [nmslib](https://github.com/nmslib/nmslib) - Non-Metric Space Library.\n- 
[ftfy](https://github.com/LuminosoInsight/python-ftfy) - Fixes mojibake and other glitches in Unicode text, after the fact.\n- [fletcher](https://github.com/xhochy/fletcher) - Pandas ExtensionDType/Array backed by Apache Arrow.\n- [textacy](https://github.com/chartbeat-labs/textacy) - NLP, before and after spaCy.\n- [hmtl](https://github.com/huggingface/hmtl) - Hierarchical Multi-Task Learning - A State-of-the-Art neural network model for several NLP tasks based on PyTorch and AllenNLP.\n- [pytext](https://github.com/facebookresearch/pytext) - A natural language modeling framework based on PyTorch.\n- [flair](https://github.com/zalandoresearch/flair) - A very simple framework for state-of-the-art Natural Language Processing.\n- [LASER](https://github.com/facebookresearch/LASER) - Language-Agnostic SEntence Representations.\n- [transformer-xl](https://github.com/kimiyoung/transformer-xl) - Attentive Language Models Beyond a Fixed-Length Context.\n- [textstat](https://github.com/shivam5992/textstat) - Calculate readability statistics of a text object - paragraphs, sentences, articles.\n- [nlpaug](https://github.com/makcedward/nlpaug) - Augmenting nlp for your machine learning projects.\n- [sumy](https://github.com/miso-belica/sumy) - Automatic summarization of text documents and HTML.\n- [textract](https://github.com/deanmalmgren/textract) - Extract text from any document.\n- [newspaper](https://github.com/codelucas/newspaper) - News extraction, article extraction and content curation.\n\n### Ranking/Recommender\n\n- [recommenders](https://github.com/microsoft/recommenders) - Examples and best practices for building recommendation systems\n- [Surprise](https://github.com/NicolasHug/Surprise) - Analyzing recommender systems.\n- [trueskill](https://github.com/sublee/trueskill) - TrueSkill rating system.\n- [LightFM](https://github.com/lyst/lightfm) - Hybrid recommendation algorithm.\n- [implicit](https://github.com/benfred/implicit) - Collaborative Filtering for 
Implicit Datasets.\n\n## Trading\n\n- [Clairvoyant](https://github.com/anfederico/Clairvoyant) - Identify and monitor social/historical cues.\n- [zipline](https://github.com/quantopian/zipline) - Algorithmic Trading Library.\n- [qstrader](https://github.com/mhallsmoore/qstrader/) - Advanced Trading Infrastructure.\n\n## Misc\n\n- [mmh3](https://github.com/hajimes/mmh3) - MurmurHash3, a set of fast and robust hash functions.\n- [fbpca](https://github.com/facebook/fbpca) - Fast Randomized PCA/SVD.\n- [annoy](https://github.com/spotify/annoy) - Approximate Nearest Neighbors.\n- [pipeline](https://github.com/PipelineAI/pipeline) - Standard Runtime For Every Real-Time Machine Learning.\n- [crayon](https://github.com/torrvision/crayon) - A language-agnostic interface to TensorBoard.\n- [faiss](https://github.com/facebookresearch/faiss) - A library for efficient similarity search and clustering of dense vectors.\n- [pyod](https://github.com/yzhao062/pyod) - Comprehensive and scalable Python toolkit for detecting outlying objects in multivariate data.\n\n## Deployment\n\n- [evidently](https://github.com/evidentlyai/evidently) - Evidently helps evaluate machine learning models during validation and monitor them in production.\n- [onnx](https://github.com/onnx/onnx) - Open Neural Network Exchange.\n- [lore](https://github.com/instacart/lore) - Lore makes machine learning approachable for Software Engineers and maintainable for Machine Learning Researchers.\n- [kubeflow](https://github.com/kubeflow/kubeflow) - Machine Learning Toolkit for Kubernetes.\n- [airflow](https://github.com/apache/incubator-airflow) - ETL.\n- [mlflow](https://github.com/databricks/mlflow) - Open source platform for the complete machine learning lifecycle.\n- [sklearn-porter](https://github.com/nok/sklearn-porter) - Transpile trained scikit-learn estimators.\n- [sklearn-compiledtrees](https://github.com/ajtulloch/sklearn-compiledtrees) - Compiled Decision Trees for scikit-learn.\n\n## Profiling\n\n- 
[mem_usage_ui](https://github.com/parikls/mem_usage_ui) - Measuring and graphing memory usage of local processes.\n- [viztracer](https://github.com/gaogaotiantian/viztracer) - VizTracer is a low-overhead logging/debugging/profiling tool that can trace and visualize your Python code execution.\n- [py-spy](https://github.com/benfred/py-spy) - Sampling profiler for Python programs.\n- [memory_profiler](https://pypi.python.org/pypi/memory_profiler) - Monitoring memory usage of a Python program.\n- [line_profiler](https://github.com/rkern/line_profiler) - Line-by-line profiling.\n- [filprofiler](https://github.com/pythonspeed/filprofiler) - Fil, a memory profiler designed for data processing applications.\n- [scalene](https://github.com/emeryberger/scalene) - High-performance CPU and memory profiler for Python.\n- [python-flamegraph](https://github.com/evanhempel/python-flamegraph) - Statistical profiler which outputs in format suitable for FlameGraph.\n\n## Python Tools\n\n- [Typer](https://github.com/tiangolo/typer) - Build CLIs with type hints.\n- [hydra](https://hydra.cc) - Framework for elegantly configuring complex applications.\n- [neurtu](https://github.com/symerio/neurtu) - A Python package for parametric benchmarks.\n- [pyprojroot](https://github.com/chendaniely/pyprojroot) - Finding project directories in Python.\n- [datasette](https://datasette.io) - An open source multi-tool for exploring and publishing data.\n- [delorean](https://github.com/myusuf3/delorean) - Time Travel Made Easy.\n- [pip-tools](https://github.com/nvie/pip-tools) - Keeps dependencies up to date.\n- [devpi](http://doc.devpi.net/latest/) - PyPI server and packaging/testing/release tool.\n- [Jupyter Notebook](https://jupyter.org) - Notebooks are awesome.\n- [click](https://github.com/pallets/click) - CLI package.\n- [sacredboard](https://github.com/chovanecm/sacredboard) - Dashboard for sacred.\n- [sacred](http://sacred.readthedocs.io/en/latest/) - Reproduce computational experiments.\n- 
[magic-wormhole](https://github.com/warner/magic-wormhole) - Get things from one computer to another, safely.\n\n## Data Gathering\n\n- [gain](https://github.com/gaojiuli/gain) - Web crawling framework based on asyncio.\n- [MechanicalSoup](https://github.com/MechanicalSoup/MechanicalSoup) - A Python library for automating interaction with websites.\n- [camelot](https://github.com/socialcopsdev/camelot) - Camelot: PDF Table Extraction for Humans.\n- [Pandarallel](https://github.com/nalepae/pandarallel) - Parallel pandas.\n- [great_expectations](https://github.com/great-expectations/great_expectations) - A framework that helps teams save time and promote analytic integrity with a new twist on automated testing: pipeline tests.\n- [parse](https://github.com/r1chardj0n3s/parse) - Parse strings using a specification based on the Python format() syntax.\n- [CleverCSV](https://github.com/alan-turing-institute/CleverCSV) - CleverCSV is a Python package for handling messy CSV files.\n","created_at":"2024-01-13T12:56:00.519Z","updated_at":"2026-04-14T17:00:26.436Z","primary_language":"Python","list_of_lists":false,"displayable":true,"categories":["Scientific","Feature Extraction","Machine Learning Frameworks","Python Tools","Visualization","Deep Learning Tools","Exploration","Profiling","Deep Learning Frameworks","Trading","Data Gathering","AutoML","Deployment","Misc","Deep Learning Projects","Outlier Detection"],"sub_categories":["Images and Video","Ranking/Recommender","General Feature Extraction","Text/NLP","Audio","Geolocation","Time Series"],"projects_url":"https://awesome.ecosyste.ms/api/v1/lists/thomasjpfan%2Fawesome-python-data-science/projects"}