Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/activeloopai/deeplake

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

ai computer-vision cv data-science data-version-control datalake datasets deep-learning image-processing langchain large-language-models llm machine-learning ml mlops python pytorch tensorflow vector-database vector-search

Last synced: 31 Jul 2024

https://github.com/drivendataorg/cookiecutter-data-science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

ai cookiecutter cookiecutter-data-science cookiecutter-template data-science machine-learning

Last synced: 01 Aug 2024

https://github.com/drivendata/cookiecutter-data-science

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.

ai cookiecutter cookiecutter-data-science cookiecutter-template data-science machine-learning

Last synced: 02 Aug 2024

https://github.com/mrdbourke/machine-learning-roadmap

A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.

data data-science deep-learning machine-learning

Last synced: 31 Jul 2024

https://github.com/rasbt/python-machine-learning-book-2nd-edition

The "Python Machine Learning (2nd edition)" book code repository and info resource

data-science deep-learning machine-learning python scikit-learn tensorflow

Last synced: 30 Jul 2024

https://github.com/firmai/industry-machine-learning

A curated list of applied machine learning and data science notebooks and libraries across different industries (by @firmai)

data-science datascience example firmai jupyter-notebook machine-learning practical-machine-learning python

Last synced: 31 Jul 2024

https://github.com/unit8co/darts

A Python library for user-friendly forecasting and anomaly detection on time series.

anomaly-detection data-science deep-learning forecasting machine-learning python time-series

Last synced: 31 Jul 2024

https://github.com/py-why/dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

bayesian-networks causal-inference causal-machine-learning causal-models causality data-science do-calculus graphical-models machine-learning python3 treatment-effects

Last synced: 31 Jul 2024

https://github.com/microsoft/dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

bayesian-networks causal-inference causal-machine-learning causal-models causality data-science do-calculus graphical-models machine-learning python3 treatment-effects

Last synced: 28 Aug 2024

https://github.com/scikit-learn-contrib/imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

data-analysis data-science machine-learning python statistics

Last synced: 30 Jul 2024

https://github.com/h2oai/h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

automl big-data data-science deep-learning distributed ensemble-learning gbm gpu h2o h2o-automl hadoop java machine-learning naive-bayes opensource pca python r random-forest spark

Last synced: 30 Jul 2024

https://github.com/jwilber/roughViz

Reusable JavaScript library for creating sketchy/hand-drawn styled charts in the browser.

charting-library d3v5 dashboard data-science data-visualization visualization

Last synced: 31 Jul 2024

https://github.com/mahmoud/boltons

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

cache data-science data-structures file json python queue recursive standard-library statistics utilities

Last synced: 30 Jul 2024

https://github.com/rushter/data-science-blogs

A curated list of data science blogs

data-science machine-learning

Last synced: 31 Jul 2024

https://github.com/Visualize-ML/Book3_Elements-of-Mathematics

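Book 3, "Elements of Mathematics" | Iris Book series: from basic arithmetic to machine learning; now published; corrections are still welcome, and readers who submit many corrections will receive a free copy!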

data-science linear-algebra machine-learning mathematics matrix

Last synced: 02 Aug 2024

https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects

Repository of teaching materials, code, and data for my data analysis and machine learning projects.

data-analysis data-science evolutionary-algorithm ipython-notebook machine-learning python

Last synced: 31 Jul 2024

https://github.com/dair-ai/ML-Course-Notes

🎓 Sharing machine learning course / lecture notes.

ai data-science deep-learning machine-learning natural-language-processing

Last synced: 01 Aug 2024

https://github.com/skypilot-org/skypilot

SkyPilot: Run LLMs, AI, and Batch jobs on any cloud. Get maximum savings, highest GPU availability, and managed execution—all with a simple interface.

cloud-computing cloud-management cost-management cost-optimization data-science deep-learning distributed-training finops gpu hyperparameter-tuning job-queue job-scheduler llm-serving llm-training machine-learning ml-infrastructure ml-platform multicloud spot-instances tpu

Last synced: 31 Jul 2024

https://github.com/snorkel-team/snorkel

A system for quickly generating training data with weak supervision

ai data-augmentation data-science data-slicing labeling machine-learning python snorkel training-data weak-supervision

Last synced: 30 Jul 2024

https://github.com/HazyResearch/snorkel

A system for quickly generating training data with weak supervision

ai data-augmentation data-science data-slicing labeling machine-learning python snorkel training-data weak-supervision

Last synced: 05 Aug 2024

https://github.com/airbnb/knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professions.

data data-analysis data-science knowledge

Last synced: 31 Jul 2024

https://github.com/ujjwalkarn/DataSciencePython

Common data analysis and machine learning tasks using Python

data-science data-scientists python python-tutorial

Last synced: 30 Jul 2024

https://github.com/lux-org/lux

Automatically visualize your pandas dataframe via a single print! 📊 💡

data-science exploratory-data-analysis jupyter pandas python visualization visualization-tools

Last synced: 31 Jul 2024

https://github.com/flyteorg/flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

data data-analysis data-science dataops declarative fine-tuning flyte golang grpc kubernetes kubernetes-operator llm machine-learning mlops orchestration-engine production production-grade python scale workflow

Last synced: 31 Jul 2024

https://github.com/rasbt/mlxtend

A library of extension and helper modules for Python's data analysis and machine learning libraries.

association-rules data-mining data-science machine-learning python supervised-learning unsupervised-learning

Last synced: 30 Jul 2024

https://github.com/blei-lab/edward

A probabilistic programming language in TensorFlow. Deep generative models, variational inference.

bayesian-methods data-science deep-learning machine-learning neural-networks probabilistic-programming statistics tensorflow

Last synced: 31 Jul 2024

https://github.com/opensource9ja/danfojs

Danfo.js is an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.

danfojs data-analysis data-analytics data-manipulation data-science dataframe javascript pandas plotting-charts stream-data stream-processing table tensorflow tensors

Last synced: 02 Aug 2024

https://github.com/javascriptdata/danfojs

Danfo.js is an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.

danfojs data-analysis data-analytics data-manipulation data-science dataframe javascript pandas plotting-charts stream-data stream-processing table tensorflow tensors

Last synced: 31 Jul 2024

https://github.com/aaronwangy/Data-Science-Cheatsheet

A helpful 5-page machine learning cheatsheet to assist with exam reviews, interview prep, and anything in-between.

cheatsheet data-science machine-learning

Last synced: 01 Aug 2024

https://github.com/evidentlyai/evidently

Evaluate and monitor ML models from validation to production. Join our Discord: https://discord.com/invite/xZjKRaNp8b

data-drift data-science hacktoberfest html-report jupyter-notebook machine-learning machine-learning-operations mlops model-monitoring pandas-dataframe production-machine-learning

Last synced: 31 Jul 2024

https://github.com/pymupdf/PyMuPDF

PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

data-science epub extract-data font mupdf ocr pdf pdf-documents pymupdf python table-extraction tesseract text-processing text-shaping xps

Last synced: 01 Aug 2024

https://github.com/Nyandwi/machine_learning_complete

A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

computer-vision data-analysis data-science data-visualization datascience deep-learning keras machine-learning matplotlib neural-networks nlp numpy open-source pandas python scikit-learn seaborn tensorflow

Last synced: 01 Aug 2024

https://github.com/okfn-brasil/serenata-de-amor

🕵 Artificial Intelligence for social control of public administration | This repository does not receive frequent updates. Check out the README.

artificial-intelligence civic-tech data-science machine-learning open-data politics

Last synced: 30 Jul 2024

https://github.com/goq/telegram-list

List of Telegram groups, channels & bots // List of interesting Telegram groups, channels, and bots // List of chats for programmers

bot coding community data-science data-science-club deep-learning devops devops-teams frontend hacker-news linux machine-learning microsoft news programming programming-languages smm telegram telegram-group theory

Last synced: 31 Jul 2024

https://github.com/hadley/r4ds

R for data science: a book

book bookdown data-science r

Last synced: 02 Aug 2024

https://github.com/BoltzmannEntropy/interviews.ai

It is my belief that you, the postgraduate students and job-seekers for whom the book is primarily meant, will benefit from reading it; however, it is my hope that even the most experienced researchers will find it fascinating as well.

artificial-intelligence autograd bayesian-statistics convolutional-neural-networks data-science deep-learning ensemble-learning feature-extraction graduate-school information-theory interview-preparation jax jobs logistic-regression loss-functions machine-learning python pytorch pytorch-tutorial

Last synced: 31 Jul 2024

https://github.com/FluxML/Flux.jl

Relax! Flux is the ML library that doesn't make you tensor

data-science deep-learning flux machine-learning neural-networks the-human-brain

Last synced: 31 Jul 2024

https://github.com/dsgiitr/d2l-pytorch

This project reproduces the book Dive Into Deep Learning (https://d2l.ai/), adapting the code from MXNet into PyTorch.

book computer-vision d2l data-science deep-learning dive-into-deep-learning mxnet nlp pytorch pytorch-implmention

Last synced: 31 Jul 2024

https://github.com/datawhalechina/competition-baseline

Competition knowledge, code, and approaches for data mining, computer vision, natural language processing, and recommender systems

data-competition data-science deep-learning kaggle

Last synced: 01 Aug 2024

https://github.com/louisfb01/start-machine-learning

A complete guide to start and improve in machine learning (ML) and artificial intelligence (AI) in 2024 without ANY background in the field and stay up-to-date with the latest news and state-of-the-art techniques!

artificial-intelligence cheat-sheets course coursera coursera-machine-learning data-science deep-learning learn-to-code learning learning-python linear-algebra machine-learning neural-networks practice probability-statistics read-articles tutorial tutorials youtube youtube-playlist

Last synced: 31 Jul 2024

https://github.com/hill-a/stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

baselines data-science gym machine-learning openai python reinforcement-learning reinforcement-learning-algorithms toolbox

Last synced: 31 Jul 2024

https://github.com/Azure/MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft

azure azure-machine-learning azure-ml azureml data-science deep-learning machine-learning notebook

Last synced: 31 Jul 2024

https://github.com/faridrashidi/kaggle-solutions

🏅 Collection of Kaggle Solutions and Ideas 🏅

awesome competition data-mining data-science kaggle machine-learning solutions

Last synced: 31 Jul 2024

https://github.com/nteract/hydrogen

Run code interactively, inspect data, and plot. All the power of Jupyter kernels, inside your favorite text editor.

atom data-science hydrogen ipython jupyter jupyter-kernels nteract repl

Last synced: 31 Jul 2024

https://github.com/marimo-team/marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.

artificial-intelligence data-science data-visualization developer-tools machine-learning notebooks pipeline python reactive web-app

Last synced: 31 Jul 2024

https://github.com/aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

amazon-athena amazon-sagemaker-notebook apache-arrow apache-parquet athena aws aws-glue aws-lambda data-engineering data-science emr etl glue-catalog lambda modin mysql pandas python ray redshift

Last synced: 01 Aug 2024

https://github.com/justmarkham/scikit-learn-videos

Jupyter notebooks from the scikit-learn video series

data-science jupyter-notebook machine-learning python scikit-learn tutorial

Last synced: 01 Aug 2024

https://github.com/spotify/chartify

Python library that makes it easy for data scientists to create charts.

bokeh data-science plots plotting python visualization

Last synced: 30 Jul 2024

https://github.com/chiphuyen/python-is-cool

Cool Python features for machine learning that I used to be too afraid to use. Will be updated as I have more time / learn more.

advanced-python data-science machine-learning python-tutorials python3

Last synced: 30 Jul 2024

https://github.com/ploomber/ploomber

The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

data-engineering data-science jupyter jupyter-notebooks machine-learning mlops notebooks papermill pipelines pycharm vscode workflow

Last synced: 01 Aug 2024

https://github.com/fastai/course-nlp

A Code-First Introduction to NLP course

data-science machine-learning nlp python

Last synced: 31 Jul 2024

https://github.com/deepchecks/deepchecks

Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

data-drift data-science data-validation deep-learning html-report jupyter-notebook machine-learning ml mlops model-monitoring model-validation pandas-dataframe python pytorch

Last synced: 31 Jul 2024