Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/aws/aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

amazon-athena amazon-sagemaker-notebook apache-arrow apache-parquet athena aws aws-glue aws-lambda data-engineering data-science emr etl glue-catalog lambda modin mysql pandas python ray redshift

Last synced: 23 Dec 2024

https://github.com/nteract/hydrogen

:atom: Run code interactively, inspect data, and plot. All the power of Jupyter kernels, inside your favorite text editor.

atom data-science hydrogen ipython jupyter jupyter-kernels nteract repl

Last synced: 24 Dec 2024

https://github.com/lancedb/lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch with more integrations coming.

apache-arrow computer-vision data-analysis data-analytics data-centric data-format data-science dataops deep-learning duckdb embeddings llms machine-learning mlops python rust

Last synced: 23 Dec 2024

https://github.com/justmarkham/scikit-learn-videos

Jupyter notebooks from the scikit-learn video series

data-science jupyter-notebook machine-learning python scikit-learn tutorial

Last synced: 19 Dec 2024

https://github.com/deepchecks/deepchecks

Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

data-drift data-science data-validation deep-learning html-report jupyter-notebook machine-learning ml mlops model-monitoring model-validation pandas-dataframe python pytorch

Last synced: 23 Dec 2024

https://github.com/spotify/chartify

Python library that makes it easy for data scientists to create charts.

bokeh data-science plots plotting python visualization

Last synced: 24 Dec 2024

https://github.com/chiphuyen/python-is-cool

Cool Python features for machine learning that I used to be too afraid to use. Will be updated as I have more time / learn more.

advanced-python data-science machine-learning python-tutorials python3

Last synced: 21 Dec 2024

https://github.com/ploomber/ploomber

The fastest ⚡️ way to build data pipelines. Develop iteratively, deploy anywhere. ☁️

data-engineering data-science jupyter jupyter-notebooks machine-learning mlops notebooks papermill pipelines pycharm vscode workflow

Last synced: 23 Dec 2024

https://github.com/fastai/course-nlp

A Code-First Introduction to NLP course

data-science machine-learning nlp python

Last synced: 19 Dec 2024

https://github.com/databricks/koalas

Koalas: pandas API on Apache Spark

big-data data-science dataframe mlflow pandas pydata spark

Last synced: 24 Dec 2024

https://github.com/alibaba/graphscope

🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba

analytics big-data data-science graph graph-analytics graph-computation graph-computing graph-data graph-neural-networks gremlin

Last synced: 23 Dec 2024

https://github.com/alibaba/GraphScope

🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba

analytics big-data data-science graph graph-analytics graph-computation graph-computing graph-data graph-neural-networks gremlin

Last synced: 06 Nov 2024

https://github.com/opengeos/leafmap

A Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment

data-science dataviz folium geoparquet geopython geospatial geospatial-analysis gis ipyleaflet jupyter jupyter-notebook leafmap mapping plotly python solara streamlit streamlit-webapp whiteboxtools

Last synced: 23 Dec 2024

https://github.com/ethen8181/machine-learning

:earth_americas: machine learning tutorials (mainly in Python3)

data-science deep-learning jupyter-notebook machine-learning python python3

Last synced: 18 Dec 2024

https://github.com/spark-notebook/spark-notebook

Interactive and Reactive Data Science using Scala and Spark.

apache-spark data-science notebook reactive scala spark

Last synced: 19 Dec 2024

https://github.com/modelscope/data-juicer

Making data higher-quality, juicier, and more digestible for foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

chinese data-analysis data-science data-visualization dataset gpt gpt-4 instruction-tuning large-language-models llama llava llm llms multi-modal nlp opendata pre-training pytorch sora streamlit

Last synced: 22 Dec 2024

https://github.com/andypetrella/spark-notebook

Interactive and Reactive Data Science using Scala and Spark.

apache-spark data-science notebook reactive scala spark

Last synced: 12 Oct 2024

https://github.com/antonycourtney/tad

A desktop application for viewing and analyzing tabular data

csv data-analysis data-science database desktop-application duckdb parquet-viewer pivot-tables pivots tabular-data

Last synced: 28 Oct 2024

https://github.com/aksnzhy/xlearn

High-performance, easy-to-use, and scalable machine learning (ML) package, including linear models (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and command-line interfaces.

data-analysis data-science factorization-machines ffm fm machine-learning statistics

Last synced: 18 Dec 2024

https://github.com/quadratichq/quadratic

Quadratic | Spreadsheet with Python, SQL, and AI

ai data data-analysis data-engineering data-science etl python quadratic spreadsheet sql wasm webgl

Last synced: 18 Dec 2024

https://github.com/determined-ai/determined

Determined is an open-source machine learning platform that simplifies distributed training, hyperparameter tuning, experiment tracking, and resource management. Works with PyTorch and TensorFlow.

data-science deep-learning distributed-training hyperparameter-optimization hyperparameter-search hyperparameter-tuning keras kubernetes machine-learning ml-infrastructure ml-platform mlops pytorch tensorflow

Last synced: 23 Dec 2024

https://github.com/mrdbourke/zero-to-mastery-ml

All course materials for the Zero to Mastery Machine Learning and Data Science course.

data-science deep-learning machine-learning

Last synced: 24 Dec 2024

https://github.com/gokumohandas/mlops-course

Learn how to design, develop, deploy and iterate on production-grade ML applications.

data-engineering data-quality data-science deep-learning distributed-ml llms machine-learning mlops natural-language-processing python pytorch ray

Last synced: 18 Dec 2024

https://github.com/parrt/dtreeviz

A python library for decision tree visualization and model interpretation.

data-science decision-trees machine-learning model-interpretation python random-forest scikit-learn visualization xgboost

Last synced: 23 Dec 2024

https://github.com/GokuMohandas/mlops-course

Learn how to design, develop, deploy and iterate on production-grade ML applications.

data-engineering data-quality data-science deep-learning distributed-ml llms machine-learning mlops natural-language-processing python pytorch ray

Last synced: 30 Oct 2024

https://github.com/dotnet/interactive

.NET Interactive combines the power of .NET with many other languages to create notebooks, REPLs, and embedded coding experiences. Share code, explore data, write, and learn across your apps in ways you couldn't before.

csharp data-science dotnet-interactive fsharp interactive-programming jupyter notebooks polyglot polyglot-dev powershell

Last synced: 24 Dec 2024

https://github.com/giswqs/leafmap

A Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment

data-science dataviz folium geoparquet geopython geospatial geospatial-analysis gis ipyleaflet jupyter jupyter-notebook leafmap mapping plotly python streamlit streamlit-webapp whiteboxtools

Last synced: 11 Oct 2024

https://github.com/libffcv/ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)

data-science machine-learning pytorch

Last synced: 24 Dec 2024

https://github.com/rbhatia46/data-science-interview-resources

A repository listing resources that will help you prepare for a Data Science/Machine Learning interview. New resources added frequently.

artificial-intelligence data-science data-science-interview interview-questions interview-resources learning-resources machine-learning machine-learning-interview

Last synced: 04 Dec 2024

https://github.com/rasbt/deep-learning-book

Repository for "Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python"

artificial-intelligence data-science deep-learning machine-learning neural-network python pytorch tensorflow

Last synced: 19 Dec 2024

https://github.com/teamhg-memex/eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions

crfsuite data-science explanation inspection lightgbm machine-learning nlp python scikit-learn xgboost

Last synced: 20 Dec 2024

https://github.com/TeamHG-Memex/eli5

A library for debugging/inspecting machine learning classifiers and explaining their predictions

crfsuite data-science explanation inspection lightgbm machine-learning nlp python scikit-learn xgboost

Last synced: 09 Nov 2024

https://github.com/matheusfacure/python-causality-handbook

Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.

causal-inference causality data-science econometrics harmless-econometrics impact-estimation python

Last synced: 17 Dec 2024

https://matheusfacure.github.io/python-causality-handbook/

Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.

causal-inference causality data-science econometrics harmless-econometrics impact-estimation python

Last synced: 05 Nov 2024

https://github.com/rbhatia46/Data-Science-Interview-Resources

A repository listing resources that will help you prepare for a Data Science/Machine Learning interview. New resources added frequently.

artificial-intelligence data-science data-science-interview interview-questions interview-resources learning-resources machine-learning machine-learning-interview

Last synced: 07 Nov 2024

https://github.com/whylabs/whylogs

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

ai-pipelines analytics approximate-statistics calculate-statistics constraints data-constraints data-pipeline data-quality data-science dataops dataset logging machine-learning ml-pipelines mlops model-performance python statistical-properties

Last synced: 17 Dec 2024

https://github.com/justinzm/gopup

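Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; high-growth ("Qianlima") and unicorn companies; Xinwen Lianbo transcripts; film and TV box office data; university lists; epidemic data…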

covid19-data data data-analysis data-science datasets economic-data gopup index-data python

Last synced: 19 Dec 2024

https://github.com/visualize-ml/book7_visualizations-for-machine-learning

Book 7, "Machine Learning" | Iris Book series: From Arithmetic to Machine Learning; criticism and corrections welcome

baysian data-science linear-algebra machine-learning machine-learning-algorithms matrix

Last synced: 19 Dec 2024

https://github.com/yunabe/lgo

Interactive Go programming with Jupyter

data-science go golang jupyter-notebook jupyter-notebook-kernel machine-learning repl

Last synced: 21 Dec 2024

https://github.com/eventual-inc/daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust

big-data data-engineering data-science dataframe distributed-computing machine-learning python rust

Last synced: 24 Dec 2024

https://github.com/reiinakano/scikit-plot

An intuitive library to add plotting functionality to scikit-learn objects.

data-science machine-learning plot plotting scikit-learn visualization

Last synced: 24 Dec 2024

https://github.com/mito-ds/mito

The mitosheet package, trymito.io, and other public Mito code.

data data-analysis data-science data-visualization jupyter pandas python streamlit-component

Last synced: 24 Dec 2024

https://github.com/PizzaDeDados/datascience-pizza

🍕 Repository for gathering information about study materials in data analysis and related fields, companies that work with data, and a dictionary of concepts

dados data-science data-scientists hacktoberfest machine-learning

Last synced: 25 Oct 2024

https://github.com/IBM/claimed

The goal of CLAIMED is to enable low-code/no-code rapid prototyping style programming to seamlessly CI/CD into production.

data-science machine-learning

Last synced: 14 Dec 2024

https://github.com/claimed-framework/component-library

The goal of CLAIMED is to enable low-code/no-code rapid prototyping style programming to seamlessly CI/CD into production.

data-science machine-learning

Last synced: 24 Dec 2024

https://github.com/pizzadedados/datascience-pizza

🍕 Repository for gathering information about study materials in data analysis and related fields, companies that work with data, and a dictionary of concepts

dados data-science data-scientists hacktoberfest machine-learning

Last synced: 02 Dec 2024

https://github.com/weijie-chen/linear-algebra-with-python

Lecture Notes for Linear Algebra Featuring Python. This series of lecture notes walks you through the must-know concepts that form the foundation of data science and advanced quantitative skill sets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, and others who want to quickly refresh their linear algebra with the help of Python computation and visualization.

computational-science data-analysis data-science data-visualization diagonalization eigenvalues eigenvectors gram-schmidt jupyter linear-algebra linear-transformations mathematics matrix matrix-calculations multivariate-normal-distribution null-space python singular-value-decomposition symmetric-matrices vector-space

Last synced: 18 Dec 2024

https://github.com/weijie-chen/Linear-Algebra-With-Python

Lecture Notes for Linear Algebra Featuring Python. This series of lecture notes walks you through the must-know concepts that form the foundation of data science and advanced quantitative skill sets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, and others who want to quickly refresh their linear algebra with the help of Python computation and visualization.

computational-science data-analysis data-science data-visualization diagonalization eigenvalues eigenvectors gram-schmidt jupyter linear-algebra linear-transformations mathematics matrix matrix-calculations multivariate-normal-distribution null-space python singular-value-decomposition symmetric-matrices vector-space

Last synced: 30 Oct 2024

https://github.com/Eventual-Inc/Daft

Distributed data engine for Python/SQL designed for the cloud, powered by Rust

big-data data-engineering data-science dataframe distributed-computing machine-learning python rust

Last synced: 06 Nov 2024

https://github.com/mckinsey/causalnex

A Python library that helps data scientists to infer causation rather than observing correlation.

bayesian-inference bayesian-networks causal-inference causal-models causal-networks causalnex data-science machine-learning

Last synced: 19 Dec 2024

https://github.com/quantumblacklabs/causalnex

A Python library that helps data scientists to infer causation rather than observing correlation.

bayesian-inference bayesian-networks causal-inference causal-models causal-networks causalnex data-science machine-learning

Last synced: 20 Nov 2024