https://github.com/maxwelllzh/python-packages-for-data-geeks

A curated list of useful Python packages for data geeks

# A curated list of nice packages for us data people

## Time Series
name|owner|stars|description
---|---|---|---
[**AnomalyDetection**](https://github.com/twitter/AnomalyDetection)|twitter|3.1k|Anomaly detection with R
[**stumpy**](https://github.com/TDAmeritrade/stumpy)|TDAmeritrade|879|Stumpy is a powerful and scalable python library that can be used for a variety of time series data mining tasks
[**gluon-ts**](https://github.com/awslabs/gluon-ts)|awslabs|765|Gluonts - probabilistic time series modeling in python
[**RobustSTL**](https://github.com/LeeDoYup/RobustSTL)|LeeDoYup|120|Unofficial implementation of robuststl: a robust seasonal-trend decomposition algorithm for long time series (aaai 2019)

## Feature Engineering
name|owner|stars|description
---|---|---|---
[**featuretools**](https://github.com/FeatureLabs/featuretools)|FeatureLabs|4.8k|An open source python library for automated feature engineering
[**Augly**](https://github.com/facebookresearch/AugLy)|facebookresearch|3.3k|A data augmentations library for audio, image, text, and video.
[**great_expectations**](https://github.com/great-expectations/great_expectations)|great-expectations|2.7k|Always know what to expect from your data.
[**categorical-encoders**](https://github.com/scikit-learn-contrib/categorical-encoding)|scikit-learn-contrib|1.1k|A library of sklearn compatible categorical variable encoders
[**fancy-impute**](https://github.com/iskandr/fancyimpute)|iskandr|735|Multivariate imputation and matrix completion algorithms implemented in python
[**dirty-cat**](https://github.com/dirty-cat/dirty_cat/)|dirty-cat|158|Encoding methods for dirty categorical variables

## Pandas Extensions
name|owner|stars|description
---|---|---|---
[**pandas-profiling**](https://github.com/pandas-profiling/pandas-profiling)|pandas-profiling|5.9k|Create html profiling reports from pandas dataframe objects
[**pdpipe**](https://github.com/pdpipe/pdpipe)|pdpipe|557|Easy pipelines for pandas dataframes.
[**pydqc**](https://github.com/SauceCat/pydqc)|SauceCat|211|Python automatic data quality check toolkit
[**pandas_flavor**](https://github.com/Zsailer/pandas_flavor)|Zsailer|186|The easy way to write your own flavor of pandas
[**pandas-log**](https://github.com/eyaltrabelsi/pandas-log)|eyaltrabelsi|154|The goal of pandas-log is to provide feedback about basic pandas operations. it provides simple wrapper functions for the most common functions that add additional logs

## Feature Selection
name|owner|stars|description
---|---|---|---
[**scikit-feature**](https://github.com/jundongl/scikit-feature)|jundongl|845|Open-source feature selection repository in python
[**boruta**](https://github.com/scikit-learn-contrib/boruta_py)|scikit-learn-contrib|615|Python implementations of the boruta all-relevant feature selection method.
[**ppscore**](https://github.com/8080labs/ppscore.git)|8080labs|321|Predictive power score (pps) in python
[**minepy**](https://github.com/minepy/minepy)|minepy|114|Minepy - maximal information-based nonparametric exploration
[**stability-selection**](https://github.com/scikit-learn-contrib/stability-selection)|scikit-learn-contrib|94|Scikit-learn compatible implementation of stability selection.

## Model Tuning
name|owner|stars|description
---|---|---|---
[**mlflow**](https://github.com/mlflow/mlflow)|mlflow|5.3k|Open source platform for the machine learning lifecycle
[**nni**](https://github.com/microsoft/nni)|microsoft|4.5k|An open source automl toolkit for neural architecture search, model compression and hyper-parameter tuning.
[**metaflow**](https://github.com/Netflix/metaflow)|Netflix|2k|Build and manage real-life data science projects with ease.
[**skopt**](https://github.com/scikit-optimize/scikit-optimize)|scikit-optimize|1.6k|Sequential model-based optimization with a `scipy.optimize` interface
[**optuna**](https://github.com/optuna/optuna)|optuna|1.5k|A hyperparameter optimization framework

## AutoML
name|owner|stars|description
---|---|---|---
[**jina**](https://github.com/jina-ai/jina)|jina-ai|7.6k|Cloud-native neural search framework for 𝙖𝙣𝙮 kind of data
[**autokeras**](https://github.com/keras-team/autokeras)|keras-team|7k|An automl system based on keras
[**tpot**](https://github.com/EpistasisLab/tpot)|EpistasisLab|6.5k|A python automated machine learning tool that optimizes machine learning pipelines using genetic programming.
[**auto-sklearn**](https://github.com/automl/auto-sklearn)|automl|4.1k|Automated machine learning with scikit-learn
[**darts**](https://github.com/quark0/darts)|quark0|2.8k|Differentiable architecture search for convolutional and recurrent networks

## Dimension Reduction
name|owner|stars|description
---|---|---|---
[**umap**](https://github.com/lmcinnes/umap)|lmcinnes|3.4k|Uniform manifold approximation and projection
[**star-clustering**](https://github.com/josephius/star-clustering)|josephius|83|A clustering algorithm that automatically determines the number of clusters and works without hyperparameter fine-tuning.

## Machine Learning
name|owner|stars|description
---|---|---|---
[**pattern**](https://github.com/clips/pattern)|clips|7.2k|Web mining module for python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
[**VowpalWabbit**](https://github.com/VowpalWabbit/vowpal_wabbit)|VowpalWabbit|6.7k|Vowpal wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
[**xlearn**](https://github.com/aksnzhy/xlearn)|aksnzhy|2.6k|High performance, easy-to-use, and scalable machine learning (ml) package, including linear model (lr), factorization machines (fm), and field-aware factorization machines (ffm) for python and cli interface.
[**lightning**](https://github.com/scikit-learn-contrib/lightning)|scikit-learn-contrib|1.3k|Large-scale linear classification, regression and ranking in python
[**Metrics**](https://github.com/benhamner/Metrics)|benhamner|1.2k|Machine learning evaluation metrics, implemented in python, r, haskell, and matlab / octave
[**mlens**](https://github.com/flennerhag/mlens)|flennerhag|553|Ml-ensemble – high performance ensemble learning
[**NGBoost**](https://github.com/stanfordmlgroup/ngboost.git)|stanfordmlgroup|335|Natural gradient boosting for probabilistic prediction
[**polylearn**](https://github.com/scikit-learn-contrib/polylearn)|scikit-learn-contrib|191|A library for factorization machines and polynomial networks for classification and regression in python.

## Bayesian Statistics
name|owner|stars|description
---|---|---|---
[**pyro**](https://github.com/pyro-ppl/pyro)|pyro-ppl|6.2k|Deep universal probabilistic programming with python and pytorch
[**pymc**](https://github.com/pymc-devs/pymc3)|pymc-devs|4.6k|Probabilistic programming in python: bayesian modeling and probabilistic machine learning with theano
[**Edward**](https://github.com/blei-lab/edward)|blei-lab|4.6k|A probabilistic programming language in tensorflow. deep generative models, variational inference.

## Deep Learning
name|owner|stars|description
---|---|---|---
[**Autograd**](https://github.com/HIPS/autograd)|HIPS|4.4k|Efficiently computes derivatives of numpy code.
[**RAdam**](https://github.com/LiyuanLucasLiu/RAdam)|LiyuanLucasLiu|1.8k|On the variance of the adaptive learning rate and beyond
[**einops**](https://github.com/arogozhnikov/einops)|arogozhnikov|1.6k|Deep learning operations reinvented (for pytorch, tensorflow, chainer, gluon and others)
[**Pytorch Metric Learning**](https://github.com/KevinMusgrave/pytorch-metric-learning)|KevinMusgrave|1.3k|The easiest way to use deep metric learning in your application. modular, flexible, and extensible. written in pytorch.

## Model Training
name|owner|stars|description
---|---|---|---
[**horovod**](https://github.com/horovod/horovod)|horovod|11.8k|Distributed training framework for tensorflow, keras, pytorch, and apache mxnet.
[**tfx**](https://github.com/tensorflow/tfx)|tensorflow|1.2k|Tfx is an end-to-end platform for deploying production ml pipelines

## Distributed
name|owner|stars|description
---|---|---|---
[**ray**](https://github.com/ray-project/ray)|ray-project|13.3k|An open source framework that provides a simple, universal api for building distributed applications. ray is packaged with rllib, a scalable reinforcement learning library, and tune, a scalable hyperparameter tuning library.
[**dask**](https://github.com/dask/dask)|dask|7.5k|Parallel computing with task scheduling

## Federated Learning
name|owner|stars|description
---|---|---|---
[**FATE**](https://github.com/WeBankFinTech/FATE)|FederatedAI|1.1k|An industrial level federated learning framework

## Confident Learning
name|owner|stars|description
---|---|---|---
[**cleanlab**](https://github.com/cgnorthcutt/cleanlab)|cgnorthcutt|1.2k|Find label errors in datasets, weak supervision, and learning with noisy labels.

## Causal Inference
name|owner|stars|description
---|---|---|---
[**Edward**](https://github.com/blei-lab/edward)|blei-lab|4.6k|A probabilistic programming language in tensorflow. deep generative models, variational inference.
[**dowhy**](https://github.com/microsoft/dowhy)|microsoft|2.3k|Dowhy is a python library for causal inference that supports explicit modeling and testing of causal assumptions. dowhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
[**CausalML**](https://github.com/uber/causalml)|uber|2.1k|Uplift modeling and causal inference with machine learning algorithms
[**EconML**](https://github.com/microsoft/EconML)|microsoft|943|Alice (automated learning and intelligence for causation and economics) is a microsoft research project aimed at applying artificial intelligence concepts to economic decision making. one of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal …

## NLP Preprocessing
name|owner|stars|description
---|---|---|---
[**jieba**](https://github.com/fxsjy/jieba)|fxsjy|23k|Jieba Chinese word segmentation
[**HanLP**](https://github.com/hankcs/HanLP)|hankcs|19.4k|Natural language processing for the next decade. tokenization, part-of-speech tagging, named entity recognition, syntactic & semantic dependency parsing, document classification
[**datasets**](https://github.com/huggingface/datasets)|huggingface|8.6k|🤗 the largest hub of ready-to-use nlp datasets for ml models with fast, easy-to-use and efficient data manipulation tools
[**Chinese Word Embeddings**](https://github.com/Embedding/Chinese-Word-Vectors)|Embedding|6.6k|100+ pre-trained Chinese word vectors
[**sentencepiece**](https://github.com/google/sentencepiece)|google|3.3k|Unsupervised text tokenizer for neural network-based text generation.
[**ckiptagger**](https://github.com/ckiplab/ckiptagger)|ckiplab|1.1k|Ckip neural chinese word segmentation, pos tagging, and ner
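[**jiagu**](https://github.com/ownthink/Jiagu)|ownthink|1k|Jiagu deep learning NLP toolkit: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering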
[**TextAttack**](https://github.com/QData/TextAttack)|QData|902|Textattack 🐙 is a python framework for adversarial attacks, data augmentation, and model training in nlp
[**MiNLP**](https://github.com/XiaoMi/MiNLP)|XiaoMi|494|Xiaomi natural language processing toolkits
[**fastHan**](https://github.com/fastnlp/fastHan)|fastnlp|163|fastHan is a Chinese NLP toolkit built on fastNLP and PyTorch that is as easy to call as spaCy.

## NLP Models
name|owner|stars|description
---|---|---|---
[**pytorch-transformers**](https://github.com/huggingface/pytorch-transformers)|huggingface|17.8k|🤗 transformers: state-of-the-art natural language processing for tensorflow 2.0 and pytorch.
[**xlnet**](https://github.com/zihangdai/xlnet)|zihangdai|5.3k|Xlnet: generalized autoregressive pretraining for language understanding
[**MatchZoo**](https://github.com/NTMC-Community/MatchZoo)|NTMC-Community|3.2k|Facilitating the design, comparison and sharing of deep text matching models.
[**GPT2-Chinese**](https://github.com/Morizeyao/GPT2-Chinese)|Morizeyao|2.4k|Chinese version of gpt2 training code, using bert tokenizer.
[**ALBERT**](https://github.com/brightmart/albert_zh)|brightmart|1.7k|A lite BERT for self-supervised learning of language representations; a large collection of pre-trained Chinese ALBERT models
[**bert4keras**](https://github.com/bojone/bert4keras)|bojone|1.2k|Light reimplementation of BERT for Keras
[**AliceMind**](https://github.com/alibaba/AliceMind)|alibaba|820|Alibaba's collection of encoder-decoders from mind (machine intelligence of damo) lab
[**FinBert**](https://github.com/valuesimplex/FinBERT)|valuesimplex|356|
[**gensen**](https://github.com/Maluuba/gensen)|Maluuba|284|Learning general purpose distributed sentence representations via large scale multi-task learning

## Representation Learning
name|owner|stars|description
---|---|---|---
[**sentence-transformers**](https://github.com/UKPLab/sentence-transformers)|UKPLab|3k|Sentence embeddings with bert & xlnet
[**top2vec**](https://github.com/ddangelov/Top2Vec)|ddangelov|489|Top2vec learns jointly embedded topic, document and word vectors.
[**glyce embedding**](https://github.com/ShannonAI/glyce)|ShannonAI|238|Code for neurips 2019 - glyce: glyph-vectors for chinese character representations

## Image Processing
name|owner|stars|description
---|---|---|---
[**imgaug**](https://github.com/aleju/imgaug)|aleju|9k|Image augmentation for machine learning experiments.
[**albumentations**](https://github.com/albumentations-team/albumentations)|albumentations-team|7.2k|Fast image augmentation library and easy to use wrapper around other libraries. documentation: https://albumentations.ai/docs/ paper about library: https://www.mdpi.com/2078-2489/11/2/125
[**imagededup**](https://github.com/idealo/imagededup)|idealo|2.7k|😎 finding duplicate images made easy!
[**imutils**](https://github.com/jrosebr1/imutils)|jrosebr1|2.6k|A series of convenience functions to make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying matplotlib images easier with opencv and python.

## Object Detection
name|owner|stars|description
---|---|---|---
[**mmdetection**](https://github.com/open-mmlab/mmdetection)|open-mmlab|7.4k|Open mmlab detection toolbox and benchmark
[**keras-YOLO3**](https://github.com/qqwweee/keras-yolo3)|qqwweee|5.8k|A keras implementation of yolov3 (tensorflow backend)
[**Light Facial Detection**](https://github.com/Linzaer/Ultra-Light-Fast-Generic-Face-Detector-1MB)|Linzaer|3.9k|💎1MB lightweight face detection model
[**SSD-Tensorflow**](https://github.com/balancap/SSD-Tensorflow)|balancap|3.8k|Single shot multibox detector in tensorflow
[**detr**](https://github.com/facebookresearch/detr)|facebookresearch|3.4k|End-to-end object detection with transformers
[**FastMaskRCNN**](https://github.com/CharlesShang/FastMaskRCNN)|CharlesShang|3k|Mask rcnn in tensorflow
[**u2net**](https://github.com/NathanUA/U-2-Net)|NathanUA|873|The code for our newly accepted paper in pattern recognition 2020: "u^2-net: going deeper with nested u-structure for salient object detection."
[**TFace**](https://github.com/Tencent/TFace)|Tencent|454|A trusty face recognition research platform developed by tencent youtu lab

## OCR
name|owner|stars|description
---|---|---|---
[**easyOCR**](https://github.com/JaidedAI/EasyOCR)|JaidedAI|8.3k|Ready-to-use ocr with 40+ languages supported including chinese, japanese, korean and thai
[**chineseocr-lite**](https://github.com/ouyanghuiyu/chineseocr_lite)|ouyanghuiyu|5.7k|Ultra-lightweight Chinese OCR with vertical text recognition and ncnn inference (dbnet (1.8M) + crnn (2.5M) + anglenet (378KB)); the total model size is only 4.7M
[**InvoiceNet**](https://github.com/naiveHobo/InvoiceNet)|naiveHobo|1.5k|Deep neural network to extract intelligent information from invoice documents.

## Recommendation
name|owner|stars|description
---|---|---|---
[**recommenders**](https://github.com/microsoft/recommenders)|microsoft|6.6k|Best practices on recommendation systems
[**DeepCTR**](https://github.com/shenweichen/DeepCTR)|shenweichen|3.3k|Easy-to-use,modular and extendible package of deep-learning based ctr models.
[**DeepFM**](https://github.com/ChenglongChen/tensorflow-DeepFM)|ChenglongChen|1.5k|Tensorflow implementation of deepfm for ctr prediction.
[**neural-collaborative-filtering**](https://github.com/hexiangnan/neural_collaborative_filtering)|hexiangnan|988|Neural collaborative filtering
[**deepmatch**](https://github.com/shenweichen/DeepMatch)|shenweichen|781|A deep matching model library for recommendations & advertising. it's easy to train models and to export representation vectors which can be used for ann search.
[**xDeepFM**](https://github.com/Leavingseason/xDeepFM)|Leavingseason|656|

## Outlier Detection
name|owner|stars|description
---|---|---|---
[**alibi-detect**](https://github.com/SeldonIO/alibi-detect)|SeldonIO|206|Algorithms for outlier and adversarial instance detection, concept drift and metrics.

## Graph
name|owner|stars|description
---|---|---|---
[**graph_nets**](https://github.com/deepmind/graph_nets)|deepmind|3.9k|Build graph nets in tensorflow
[**dgl**](https://github.com/dmlc/dgl)|dmlc|3.4k|Python package built to ease deep learning on graph, on top of existing dl frameworks.
[**graphSAGE**](https://github.com/williamleif/GraphSAGE)|williamleif|1.9k|Representation learning on large graphs using stochastic graph convolutions.
[**SNAP**](https://github.com/snap-stanford/snap)|snap-stanford|1.5k|Stanford network analysis platform (snap) is a general purpose network analysis and graph mining library.
[**stellargraph**](https://github.com/stellargraph/stellargraph)|stellargraph|1.2k|Stellargraph - machine learning on graphs
[**plato**](https://github.com/Tencent/plato)|Tencent|874|Plato, Tencent's high-performance distributed graph computing framework
[**spektral**](https://github.com/danielegrattarola/spektral)|danielegrattarola|810|Graph neural networks with keras and tensorflow 2.
[**simple-graph**](https://github.com/dpapathanasiou/simple-graph)|dpapathanasiou|499|This is a simple graph database in sqlite, inspired by "sqlite as a document database"

## Searching
name|owner|stars|description
---|---|---|---
[**faiss**](https://github.com/facebookresearch/faiss)|facebookresearch|9.7k|A library for efficient similarity search and clustering of dense vectors.
[**annoy**](https://github.com/spotify/annoy)|spotify|6.7k|Approximate nearest neighbors in c++/python optimized for memory usage and loading/saving to disk
[**haystack**](https://github.com/deepset-ai/haystack)|deepset-ai|2.1k|🔍 end-to-end python framework for building natural language search interfaces to data. leverages transformers and the state-of-the-art of nlp. supports dpr, elasticsearch, hugging face’s hub, and much more!

## Adversarial Learning
name|owner|stars|description
---|---|---|---
[**pytorch-CycleGAN-and-pix2pix**](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix)|junyanz|13.1k|Image-to-image translation in pytorch
[**CycleGAN**](https://github.com/junyanz/CycleGAN)|junyanz|9.7k|Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.
[**GANHacks**](https://github.com/soumith/ganhacks)|soumith|8.4k|Starter from "how to train a gan?" at nips2016
[**pix2pix**](https://github.com/phillipi/pix2pix)|phillipi|7.7k|Image-to-image translation with conditional adversarial nets
[**DCGAN**](https://github.com/carpedm20/DCGAN-tensorflow)|carpedm20|6.5k|A tensorflow implementation of "deep convolutional generative adversarial networks"
[**ALAE**](https://github.com/podgorskiy/ALAE)|podgorskiy|1.8k|[cvpr2020] adversarial latent autoencoders
[**DoppelGANger**](https://github.com/fjxmlzn/DoppelGANger)|fjxmlzn|45|Using gans for sharing networked time series data: challenges, initial promise, and open questions, imc 2020

## Model Interpretation
name|owner|stars|description
---|---|---|---
[**SHAP**](https://github.com/slundberg/shap)|slundberg|7.2k|A unified approach to explain the output of any machine learning model.
[**LIME**](https://github.com/marcotcr/lime)|marcotcr|6.8k|Lime: explaining the predictions of any machine learning classifier
[**Tensorwatch**](https://github.com/microsoft/tensorwatch)|microsoft|2.5k|Debugging, monitoring and visualization for python machine learning and data science
[**eli5**](https://github.com/TeamHG-Memex/eli5)|TeamHG-Memex|1.8k|A library for debugging/inspecting machine learning classifiers and explaining their predictions
[**PDPBox**](https://github.com/SauceCat/PDPbox)|SauceCat|382|Python partial dependence plot toolbox

## Visualization
name|owner|stars|description
---|---|---|---
[**Dash**](https://github.com/plotly/dash)|plotly|10.8k|Analytical web apps for python & r. no javascript required.
[**prettymaps**](https://github.com/marceloprates/prettymaps)|marceloprates|7.2k|A small set of python functions to draw pretty maps from openstreetmap data. based on osmnx, matplotlib and shapely libraries.
[**Seaborn**](https://github.com/mwaskom/seaborn)|mwaskom|6.6k|Statistical data visualization using matplotlib
[**Plotly**](https://github.com/plotly/plotly.py)|plotly|5.8k|An open-source, interactive graphing library for python (includes plotly express) ✨
[**streamlit**](https://github.com/streamlit/streamlit)|streamlit|5.4k|Streamlit — the fastest way to build custom ml tools
[**folium**](https://github.com/python-visualization/folium)|python-visualization|4.3k|Python data. leaflet.js maps.
[**altair**](https://github.com/altair-viz/altair)|altair-viz|4.3k|Declarative statistical visualization library for python
[**dash sample apps**](https://github.com/plotly/dash-sample-apps)|plotly|2k|Open-source demos hosted on dash gallery
[**scikit-plot**](https://github.com/reiinakano/scikit-plot)|reiinakano|1.7k|An intuitive library to add plotting functionality to scikit-learn objects.
[**CNN-Visualizer**](https://github.com/poloclub/cnn-explainer)|poloclub|1.7k|Learning convolutional neural networks with interactive visualization. https://poloclub.github.io/cnn-explainer/

## Development Toolkit
name|owner|stars|description
---|---|---|---
[**free-apis**](https://github.com/public-apis/public-apis#geocoding)|public-apis|65.9k|A collective list of free apis for use in software and web development.
[**bash-bible**](https://github.com/dylanaraps/pure-bash-bible#strip-pattern-from-start-of-string)|dylanaraps|23.3k|📖 a collection of pure bash alternatives to external processes.
[**python-fire**](https://github.com/google/python-fire)|google|15.8k|Python fire is a library for automatically generating command line interfaces (clis) from absolutely any python object.
[**black**](https://github.com/psf/black)|psf|13.7k|The uncompromising python code formatter
[**PySnooper**](https://github.com/cool-RR/PySnooper)|cool-RR|12.9k|Never use print for debugging again
[**poetry**](https://github.com/sdispater/poetry)|sdispater|7.1k|Python dependency management and packaging made easy.
[**free api**](https://github.com/fangzesheng/free-api)|fangzesheng|6.5k|Collects free API services, acting as a carrier of APIs
[**fastapi**](https://github.com/tiangolo/fastapi)|tiangolo|6.5k|Fastapi framework, high performance, easy to learn, fast to code, ready for production
[**playwright-python**](https://github.com/microsoft/playwright-python)|microsoft|4.9k|Python version of the playwright testing and automation library.
[**hypothesis**](https://github.com/HypothesisWorks/hypothesis)|HypothesisWorks|4k|Hypothesis is a powerful, flexible, and easy to use library for property-based testing.
[**modin**](https://github.com/modin-project/modin)|modin-project|3.6k|Modin: speed up your pandas workflows by changing a single line of code
[**pyautogui**](https://github.com/asweigart/pyautogui)|asweigart|3.2k|A cross-platform gui automation python module for human beings. used to programmatically control the mouse & keyboard.
[**jupytext**](https://github.com/mwouts/jupytext)|mwouts|3k|Jupyter notebooks as markdown documents, julia, python or r scripts
[**papermill**](https://github.com/nteract/papermill/)|nteract|2.7k|📚 parameterize, execute, and analyze notebooks
[**handcalcs**](https://github.com/connorferster/handcalcs)|connorferster|2.3k|Python library for converting python calculations into rendered latex.
[**lark**](https://github.com/lark-parser/lark)|lark-parser|2k|Lark is a parsing toolkit for python, built with a focus on ergonomics, performance and modularity.
[**sqlfluff**](https://github.com/sqlfluff/sqlfluff)|sqlfluff|1.8k|A sql linter and auto-formatter for humans
[**handout**](https://github.com/danijar/handout)|danijar|1.8k|Turn python scripts into handouts with markdown and figures
[**urwid**](https://github.com/urwid/urwid)|urwid|1.7k|Console user interface library for python (official repo)
[**more-itertools**](https://github.com/more-itertools/more-itertools)|more-itertools|1.5k|More routines for operating on iterables, beyond itertools
[**xarray**](https://github.com/pydata/xarray)|pydata|1.5k|N-d labeled arrays and datasets in python
[**icecream - debugging**](https://github.com/gruns/icecream)|gruns|1.4k|🍦 sweet and creamy print debugging.
[**pygooglenews**](https://github.com/kotartemiy/pygooglenews)|kotartemiy|816|If google news had a python library
[**bottleneck**](https://github.com/pydata/bottleneck)|pydata|540|Fast numpy array functions written in c
[**wily**](https://github.com/tonybaloney/wily)|tonybaloney|445|A python application for tracking, reporting on timing and complexity in python code

## Tutorial
name|owner|stars|description
---|---|---|---
[**Python 100 days**](https://github.com/jackfrued/Python-100-Days)|jackfrued|70.8k|Python - from novice to master in 100 days
[**Command line tutorial in one page**](https://github.com/jlevy/the-art-of-command-line)|jlevy|66.5k|Master the command line, in one page
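[**Deep Learning 500 Questions**](https://github.com/scutan90/DeepLearning-500-questions)|scutan90|35.3k|Deep Learning 500 Questions explains, in question-and-answer form, frequently discussed topics in probability, linear algebra, machine learning, deep learning, computer vision, and more, to help the author and interested readers. The book has 18 chapters and more than 500,000 characters. Because the author's expertise is limited, readers are asked to point out anything inappropriate in the book. To be continued. For cooperation, contact [email protected]. All rights reserved; infringement will be pursued. tan 2018.06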
[**Learn Regex**](https://github.com/ziishaned/learn-regex)|ziishaned|31.9k|Learn regex the easy way
[**500 lines or less**](https://github.com/aosabook/500lines)|aosabook|23.9k|500 lines or less
[**Data Science Tutorial notebook**](https://github.com/donnemartin/data-science-ipython-notebooks)|donnemartin|17.7k|Data science python notebooks: deep learning (tensorflow, theano, caffe, keras), scikit-learn, kaggle, big data (spark, hadoop mapreduce, hdfs), matplotlib, pandas, numpy, scipy, python essentials, aws, and various command lines.
[**Awesome tensorflow**](https://github.com/jtoy/awesome-tensorflow)|jtoy|15.3k|Tensorflow - a curated list of dedicated resources http://tensorflow.org
[**NLP progress**](https://github.com/sebastianruder/NLP-progress)|sebastianruder|13k|Repository to track the progress in natural language processing (nlp), including the datasets and the current state-of-the-art for the most common nlp tasks.
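[**Neural Networks and Deep Learning - Xipeng Qiu (邱锡鹏)**](https://github.com/nndl/nndl.github.io)|nndl|12k|The book "Neural Networks and Deep Learning" by Xipeng Qiu
[**wtfpython-cn**](https://github.com/leisurelicht/wtfpython-cn)|leisurelicht|9.2k|Chinese translation of wtfpython / work completed / my ability is limited, so help improving the translation is welcome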
[**object-detection-papers**](https://github.com/hoya012/deep_learning_object_detection)|hoya012|8.1k|A paper list of object detection using deep learning.
[**MLAlgorithms**](https://github.com/rushter/MLAlgorithms)|rushter|7.8k|Minimal and clean examples of machine learning algorithms implementations
[**numpy-ml**](https://github.com/ddbourgin/numpy-ml)|ddbourgin|7.8k|Machine learning, in numpy
[**Reinforcement-learning-introduction**](https://github.com/ShangtongZhang/reinforcement-learning-an-introduction)|ShangtongZhang|7.7k|Python implementation of reinforcement learning: an introduction
[**deep learning drizzle**](https://github.com/kmario23/deep-learning-drizzle)|kmario23|7.1k|Drench yourself in deep learning, reinforcement learning, machine learning, computer vision, and nlp by learning from these exciting lectures!!
[**Google Research**](https://github.com/google-research/google-research)|google-research|6k|Google ai research
[**GNN Papers**](https://github.com/thunlp/GNNPapers)|thunlp|5.7k|Must-read papers on graph neural networks (gnn)
[**minGPT**](https://github.com/karpathy/minGPT)|karpathy|5.3k|A minimal pytorch re-implementation of the openai gpt (generative pretrained transformer) training
[**UGATIT**](https://github.com/taki0112/UGATIT)|taki0112|4.4k|Official tensorflow implementation of u-gat-it: unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation
[**tensorflow2_tutorials_chinese**](https://github.com/czy36mengfei/tensorflow2_tutorials_chinese)|czy36mengfei|4k|TensorFlow 2 tutorials in Chinese, continuously updated (current version: tensorflow2.0), tag: tensorflow 2.0 tutorials
[**Tensorflow-2.x-Tutorials**](https://github.com/dragen1860/TensorFlow-2.x-Tutorials)|dragen1860|3.7k|Tensorflow 2.x tutorials and examples, including cnn, rnn, gan, auto-encoders, fasterrcnn, gpt, bert examples, etc. Introductory example code and hands-on tutorials for tf 2.0.
[**Machine Learning Notes from Prof. Yida Xu**](https://github.com/roboticcam/machine-learning-notes)|roboticcam|3.5k|My continuously updated machine learning, probabilistic models and deep learning notes and demos (1500+ slides), with video links
[**Awesome graph classification**](https://github.com/benedekrozemberczki/awesome-graph-classification)|benedekrozemberczki|2.5k|A collection of important graph embedding, classification and representation learning papers with implementations.
[**NLP-Beginner**](https://github.com/FudanNLP/nlp-beginner)|FudanNLP|2.3k|NLP getting-started tutorial
[**openNRE**](https://github.com/thunlp/OpenNRE)|thunlp|2k|An open-source package for neural relation extraction (nre)
[**Microsoft NLP examples**](https://github.com/microsoft/nlp)|microsoft|1.9k|Natural language processing best practices & examples
[**anomaly detection**](https://github.com/yzhao062/anomaly-detection-resources)|yzhao062|1.9k|Anomaly detection related books, papers, videos, and toolboxes
[**Stanford Natural Language Understanding Course**](https://github.com/cgpotts/cs224u)|cgpotts|727|Code for stanford cs224u
[**Generative Models in TF2**](https://github.com/timsainb/tensorflow2-generative-models)|timsainb|690|Implementations of a number of generative models in tensorflow 2. gan, vae, seq2seq, vaegan, gaia, spectrogram inversion. everything is self contained in a jupyter notebook for easy export to colab.
[**Dimensional reduction algos**](https://github.com/heucoder/dimensionality_reduction_alo_codes)|heucoder|553|Python implementations of dimensionality reduction algorithms such as PCA, LDA, MDS, LLE, and t-SNE
[**Generative Deep Learning**](https://github.com/davidADSP/GDL_code)|davidADSP|491|The official code repository for examples in the o'reilly book 'generative deep learning'
[**reinforcement learning**](https://github.com/dalmia/David-Silver-Reinforcement-learning)|dalmia|417|Notes for the reinforcement learning course by david silver along with implementation of various algorithms.
[**Keras Text classification**](https://github.com/yongzhuo/Keras-TextClassification)|yongzhuo|277|Chinese long-text classification, short-sentence classification, multi-label classification, and two-sentence similarity with Keras; base classes for building character, word, and sentence embedding layers (embeddings) and network layers (graph); fasttext, textcnn, charcnn, textrnn, rcnn, dcnn, dpcnn, vdcnn, crnn, bert, xlnet, albert, attention, deepmoji, han, capsule network (capsulenet), transformer-encode, seq2seq, ent, dmn,
[**Graph neural network implementation by Microsoft**](https://github.com/microsoft/tf-gnn-samples)|microsoft|161|Tensorflow implementations of graph neural networks

## Fun Stuff
name|owner|stars|description
---|---|---|---
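[**funNLP**](https://github.com/fighting41love/funNLP)|fighting41love|14.9k|Chinese and English sensitive words, language detection, domestic and international mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character-decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, traditional/simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, job title lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English text segmentation, various Chinese word vectors, company name collection, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food lexicon, legal lexicon, automobile lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese Q&A dataset, collection of sentence similarity matching algorithms, BERT resources, text generation & summarization tools, cocoNLP information extraction tool, regex matching for domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University artificial intelligence technology…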
[**tiler**](https://github.com/nuno-faria/tiler)|nuno-faria|3.8k|👷 build images with images
[**Hacking neural nets**](https://github.com/Kayzaks/HackingNeuralNetworks)|Kayzaks|1.9k|A small course on exploiting and defending neural networks
[**KnockKnock**](https://github.com/huggingface/knockknock)|huggingface|1.7k|🚪✊knock knock: get notified when your training ends with only two additional lines of code
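[**break-captcha**](https://github.com/zhaipro/easy12306)|zhaipro|1.4k|Automatic recognition of 12306 CAPTCHAs using machine learning algorithms
[**GNE**](https://github.com/kingname/GeneralNewsExtractor)|kingname|908|General-purpose extractor for the main text of news web pages (alpha version).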
[**pyforest**](https://github.com/8080labs/pyforest)|8080labs|689|Pyforest - feel the bliss of automated imports

## Trading
name|owner|stars|description
---|---|---|---
[**zipline**](https://github.com/quantopian/zipline)|quantopian|11.2k|Zipline, a pythonic algorithmic trading library
[**tensortrade**](https://github.com/tensortrade-org/tensortrade)|tensortrade-org|2k|An open source reinforcement learning framework for training, evaluating, and deploying robust trading agents.
[**mlfinlab**](https://github.com/hudson-and-thames/mlfinlab)|hudson-and-thames|1.3k|Mlfinlab helps portfolio managers and traders who want to leverage the power of machine learning by providing reproducible, interpretable, and easy to use tools.
[**tf-quant**](https://github.com/google/tf-quant-finance)|google|773|High-performance tensorflow library for quantitative finance.

## Contribution Guide

Add your favourite packages to `package.json`, and run `package_info.py` to update the page :)
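
This README does not show the layout of `package.json` or the contents of `package_info.py`, so the following is only a rough sketch of how such a script could turn package records into the tables used on this page. The record fields (`name`, `url`, `owner`, `stars`, `description`) and the helper names `format_stars` and `render_section` are assumptions made for illustration, not the repository's actual code.

```python
import json


def format_stars(count: int) -> str:
    """Abbreviate a star count roughly as the tables do (e.g. 3100 -> '3.1k')."""
    if count < 1000:
        return str(count)
    value = f"{count / 1000:.1f}"
    if value.endswith(".0"):
        value = value[:-2]  # show '2k' rather than '2.0k'
    return f"{value}k"


def render_section(title: str, packages: list[dict]) -> str:
    """Render one section as a pipe table, sorted by stars in descending order."""
    rows = sorted(packages, key=lambda p: p["stars"], reverse=True)
    lines = [f"## {title}", "name|owner|stars|description", "---|---|---|---"]
    for p in rows:
        # Escape pipes so a description cannot break the table layout.
        description = p["description"].replace("|", "\\|")
        lines.append(
            f"[**{p['name']}**]({p['url']})|{p['owner']}"
            f"|{format_stars(p['stars'])}|{description}"
        )
    return "\n".join(lines)


# Hypothetical input; the real package.json schema may differ.
sample = json.loads("""
{
  "Time Series": [
    {"name": "gluon-ts", "url": "https://github.com/awslabs/gluon-ts",
     "owner": "awslabs", "stars": 765,
     "description": "Gluonts - probabilistic time series modeling in python"},
    {"name": "stumpy", "url": "https://github.com/TDAmeritrade/stumpy",
     "owner": "TDAmeritrade", "stars": 879,
     "description": "Stumpy is a powerful and scalable python library"}
  ]
}
""")

for section_title, section_packages in sample.items():
    print(render_section(section_title, section_packages))
```

Sorting by star count mirrors the ordering the tables on this page appear to follow. A real script would presumably also fetch the owner, star count, and description from GitHub rather than reading them from the JSON file.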