Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/akfamily/akshare

AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.

academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock

Last synced: 28 Oct 2024

https://github.com/wowchemy/wowchemy-hugo-themes

🚨 GROW YOUR AUDIENCE WITH HUGOBLOX! 🚀 HugoBlox is an easy, fast no-code website builder for researchers, entrepreneurs, data scientists, and developers. Build stunning sites in minutes. Quickly create beautiful websites with drag-and-drop, customizable templates, and built-in SEO tools!

academic blog blog-engine cms data-science documentation-tool github-pages hugo hugo-theme jupyter netlify open-science page-builder portfolio r rmarkdown rstudio static-site-generator theme website-builder

Last synced: 02 Nov 2024

https://github.com/hugoblox/hugo-blox-builder

🚨 GROW YOUR AUDIENCE WITH HUGOBLOX! 🚀 HugoBlox is an easy, fast no-code website builder for researchers, entrepreneurs, data scientists, and developers. Build stunning sites in minutes. Quickly create beautiful websites with drag-and-drop, customizable templates, and built-in SEO tools!

academic blog blog-engine cms data-science documentation-tool github-pages hugo hugo-theme jupyter netlify open-science page-builder portfolio r rmarkdown rstudio static-site-generator theme website-builder

Last synced: 01 Nov 2024

https://github.com/vaexio/vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

bigdata data-science dataframe hdf5 machine-learning machinelearning memory-mapped-file pyarrow python tabular-data visualization

Last synced: 29 Oct 2024

https://github.com/blue-yonder/tsfresh

Automatic extraction of relevant features from time series.

data-science feature-extraction time-series

Last synced: 28 Oct 2024

https://github.com/jackzhenguo/python-small-examples

Goodbye to dull tutorials: dedicated to building practical, bite-sized Python examples. For more high-quality Python tutorials, see https://ai-jupyter.com

data-science machine-learning python python-gui python-web pytorch tensorflow

Last synced: 29 Oct 2024

https://github.com/HugoBlox/hugo-blox-builder

😍 EASILY BUILD THE WEBSITE YOU WANT - NO CODE, JUST MARKDOWN BLOCKS! Create any kind of website with blocks, no code required. One app, no dependencies, no JS.

academic blog blog-engine cms data-science documentation-tool github-pages hugo hugo-theme jupyter netlify open-science page-builder portfolio r rmarkdown rstudio static-site-generator theme website-builder

Last synced: 25 Oct 2024

https://github.com/catboost/catboost

A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

big-data catboost categorical-features coreml cuda data-mining data-science decision-trees gbdt gbm gpu gpu-computing gradient-boosting kaggle machine-learning python r tutorial

Last synced: 28 Oct 2024

https://github.com/activeloopai/deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

ai computer-vision cv data-science data-version-control datalake datasets deep-learning image-processing langchain large-language-models llm machine-learning ml mlops python pytorch tensorflow vector-database vector-search

Last synced: 31 Oct 2024

https://github.com/activeloopai/Hub

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

ai computer-vision cv data-science data-version-control datalake datasets deep-learning image-processing langchain large-language-models llm machine-learning ml mlops python pytorch tensorflow vector-database vector-search

Last synced: 10 Aug 2024

https://github.com/drivendata/cookiecutter-data-science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

ai cookiecutter cookiecutter-data-science cookiecutter-template data-science machine-learning

Last synced: 14 Nov 2024

https://github.com/drivendataorg/cookiecutter-data-science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

ai cookiecutter cookiecutter-data-science cookiecutter-template data-science machine-learning

Last synced: 29 Oct 2024

https://github.com/mrdbourke/machine-learning-roadmap

A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.

data data-science deep-learning machine-learning

Last synced: 14 Oct 2024

https://github.com/firmai/industry-machine-learning

A curated list of applied machine learning and data science notebooks and libraries across different industries (by @firmai)

data-science datascience example firmai jupyter-notebook machine-learning practical-machine-learning python

Last synced: 15 Oct 2024

https://github.com/rasbt/python-machine-learning-book-2nd-edition

The "Python Machine Learning (2nd edition)" book code repository and info resource

data-science deep-learning machine-learning python scikit-learn tensorflow

Last synced: 13 Oct 2024

https://github.com/unit8co/darts

A Python library for user-friendly forecasting and anomaly detection on time series.

anomaly-detection data-science deep-learning forecasting machine-learning python time-series

Last synced: 29 Oct 2024

https://github.com/microsoft/dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

bayesian-networks causal-inference causal-machine-learning causal-models causality data-science do-calculus graphical-models machine-learning python3 treatment-effects

Last synced: 28 Aug 2024

https://github.com/py-why/dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

bayesian-networks causal-inference causal-machine-learning causal-models causality data-science do-calculus graphical-models machine-learning python3 treatment-effects

Last synced: 28 Oct 2024

https://microsoft.github.io/dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

bayesian-networks causal-inference causal-machine-learning causal-models causality data-science do-calculus graphical-models machine-learning python3 treatment-effects

Last synced: 03 Oct 2024

https://github.com/scikit-learn-contrib/imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

data-analysis data-science machine-learning python statistics

Last synced: 28 Oct 2024

https://github.com/jwilber/roughViz

Reusable JavaScript library for creating sketchy/hand-drawn styled charts in the browser.

charting-library d3v5 dashboard data-science data-visualization visualization

Last synced: 29 Oct 2024

https://github.com/h2oai/h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

automl big-data data-science deep-learning distributed ensemble-learning gbm gpu h2o h2o-automl hadoop java machine-learning naive-bayes opensource pca python r random-forest spark

Last synced: 29 Oct 2024

https://github.com/mahmoud/boltons

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

cache data-science data-structures file json python queue recursive standard-library statistics utilities

Last synced: 28 Oct 2024

https://github.com/Visualize-ML/Book3_Elements-of-Mathematics

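Book_3_《数学要素》 (Elements of Mathematics) | Iris Book series: from basic arithmetic to machine learning; now published; corrections are still welcome, and readers who submit the most corrections will receive a free copy!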

data-science linear-algebra machine-learning mathematics matrix

Last synced: 11 Nov 2024

https://github.com/rushter/data-science-blogs

A curated list of data science blogs

data-science machine-learning

Last synced: 13 Nov 2024

https://github.com/rhiever/data-analysis-and-machine-learning-projects

Repository of teaching materials, code, and data for my data analysis and machine learning projects.

data-analysis data-science evolutionary-algorithm ipython-notebook machine-learning python

Last synced: 11 Oct 2024

https://github.com/jpmorganchase/python-training

Python training for business analysts and traders

banking binder binder-ready cib data-science finance jpmorgan jupyter jupyterlab python

Last synced: 11 Oct 2024

https://github.com/dair-ai/ML-Course-Notes

🎓 Sharing machine learning course / lecture notes.

ai data-science deep-learning machine-learning natural-language-processing

Last synced: 08 Nov 2024

https://github.com/skypilot-org/skypilot

SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

cloud-computing cloud-management cost-management cost-optimization data-science deep-learning distributed-training finops gpu hyperparameter-tuning job-queue job-scheduler llm-serving llm-training machine-learning ml-infrastructure ml-platform multicloud spot-instances tpu

Last synced: 28 Oct 2024

https://github.com/HazyResearch/snorkel

A system for quickly generating training data with weak supervision

ai data-augmentation data-science data-slicing labeling machine-learning python snorkel training-data weak-supervision

Last synced: 05 Aug 2024

https://github.com/snorkel-team/snorkel

A system for quickly generating training data with weak supervision

ai data-augmentation data-science data-slicing labeling machine-learning python snorkel training-data weak-supervision

Last synced: 26 Oct 2024

https://hazyresearch.github.io/snorkel

A system for quickly generating training data with weak supervision

ai data-augmentation data-science data-slicing labeling machine-learning python snorkel training-data weak-supervision

Last synced: 16 Oct 2024

https://github.com/airbnb/knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professions.

data data-analysis data-science knowledge

Last synced: 29 Oct 2024

https://github.com/pymupdf/PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps

Last synced: 06 Nov 2024

https://github.com/ujjwalkarn/DataSciencePython

Common data analysis and machine learning tasks using Python

data-science data-scientists python python-tutorial

Last synced: 26 Oct 2024

https://github.com/lux-org/lux

Automatically visualize your pandas dataframe via a single print! 📊 💡

data-science exploratory-data-analysis jupyter pandas python visualization visualization-tools

Last synced: 11 Oct 2024

https://github.com/aaronwangy/Data-Science-Cheatsheet

A helpful 5-page machine learning cheatsheet to assist with exam reviews, interview prep, and anything in-between.

cheatsheet data-science machine-learning

Last synced: 07 Nov 2024

https://github.com/rasbt/mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.

association-rules data-mining data-science machine-learning python supervised-learning unsupervised-learning

Last synced: 28 Oct 2024

https://github.com/flyteorg/flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

data data-analysis data-science dataops declarative fine-tuning flyte golang grpc kubernetes kubernetes-operator llm machine-learning mlops orchestration-engine production production-grade python scale workflow

Last synced: 28 Oct 2024

https://github.com/blei-lab/edward

A probabilistic programming language in TensorFlow. Deep generative models, variational inference.

bayesian-methods data-science deep-learning machine-learning neural-networks probabilistic-programming statistics tensorflow

Last synced: 13 Oct 2024

https://github.com/opensource9ja/danfojs

Danfo.js is an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.

danfojs data-analysis data-analytics data-manipulation data-science dataframe javascript pandas plotting-charts stream-data stream-processing table tensorflow tensors

Last synced: 14 Nov 2024

https://github.com/javascriptdata/danfojs

Danfo.js is an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.

danfojs data-analysis data-analytics data-manipulation data-science dataframe javascript pandas plotting-charts stream-data stream-processing table tensorflow tensors

Last synced: 13 Oct 2024

https://github.com/evidentlyai/evidently

Evaluate and monitor ML models from validation to production. Join our Discord: https://discord.com/invite/xZjKRaNp8b

data-drift data-science hacktoberfest html-report jupyter-notebook machine-learning machine-learning-operations mlops model-monitoring pandas-dataframe production-machine-learning

Last synced: 28 Oct 2024

https://github.com/Nyandwi/machine_learning_complete

A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

computer-vision data-analysis data-science data-visualization datascience deep-learning keras machine-learning matplotlib neural-networks nlp numpy open-source pandas python scikit-learn seaborn tensorflow

Last synced: 05 Nov 2024

https://github.com/hadley/r4ds

R for data science: a book

book bookdown data-science r

Last synced: 14 Oct 2024

https://github.com/goq/telegram-list

List of Telegram groups, channels & bots // List of interesting Telegram groups, channels, and bots // List of chats for programmers

bot coding community data-science data-science-club deep-learning devops devops-teams frontend hacker-news linux machine-learning microsoft news programming programming-languages smm telegram telegram-group theory

Last synced: 31 Oct 2024

https://github.com/BoltzmannEntropy/interviews.ai

It is my belief that you, the postgraduate students and job-seekers for whom the book is primarily meant, will benefit from reading it; however, it is my hope that even the most experienced researchers will find it fascinating as well.

artificial-intelligence autograd bayesian-statistics convolutional-neural-networks data-science deep-learning ensemble-learning feature-extraction graduate-school information-theory interview-preparation jax jobs logistic-regression loss-functions machine-learning python pytorch pytorch-tutorial

Last synced: 30 Oct 2024

https://github.com/okfn-brasil/serenata-de-amor

🕵 Artificial Intelligence for social control of public administration | **This repository does not receive frequent updates. Check out the README**

artificial-intelligence civic-tech data-science machine-learning open-data politics

Last synced: 15 Oct 2024

https://github.com/fluxml/flux.jl

Relax! Flux is the ML library that doesn't make you tensor

data-science deep-learning flux machine-learning neural-networks the-human-brain

Last synced: 15 Oct 2024

https://github.com/datawhalechina/competition-baseline

Competition knowledge, code, and approaches for data mining, computer vision, natural language processing, and recommender systems

data-competition data-science deep-learning kaggle

Last synced: 16 Nov 2024

https://github.com/dsgiitr/d2l-pytorch

This project reproduces the book Dive Into Deep Learning (https://d2l.ai/), adapting the code from MXNet into PyTorch.

book computer-vision d2l data-science deep-learning dive-into-deep-learning mxnet nlp pytorch pytorch-implmention

Last synced: 14 Oct 2024

https://github.com/hill-a/stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

baselines data-science gym machine-learning openai python reinforcement-learning reinforcement-learning-algorithms toolbox

Last synced: 30 Oct 2024