An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/FilippoBovo/production-data-science

Production Data Science: a workflow for collaborative data science aimed at production

collaborative data-science production workflow

Last synced: 02 May 2025

https://github.com/girder/girder

A data management platform for the web, developed by Kitware

data-analytics data-management data-science javascript kitware python resonant

Last synced: 08 Apr 2026

https://github.com/aeturrell/skimpy

skimpy is a lightweight tool that provides summary statistics about variables in data frames within the console.

data-science eda exploratory-data-analysis pandas statistics summary-statistics

Last synced: 10 Oct 2025

https://github.com/filippobovo/production-data-science

Production Data Science: a workflow for collaborative data science aimed at production

collaborative data-science production workflow

Last synced: 05 Apr 2025

https://github.com/dcai-course/dcai-lab

Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻

course data-centric-ai data-science deep-learning homework lab machine-learning

Last synced: 26 Mar 2025

https://github.com/pgalko/BambooAI

A lightweight library that leverages Language Models (LLMs) to enable natural language interactions, allowing you to source and converse with data.

ai ai-agents data-analysis data-science gemini groq llm mistral ollama openai-api pandas pinecone python vector-database

Last synced: 23 Mar 2025

https://github.com/openintrostat/openintro-statistics

📚 An open-source textbook written at the college level. OpenIntro also offers a second college-level intro stat textbook and also a high school variant.

data-science latex openintro statistics textbook

Last synced: 26 Jan 2026

https://github.com/blackhc/toma

Helps you write algorithms in PyTorch that adapt to the available (CUDA) memory

data-science gpu machine-learning python pytorch

Last synced: 12 Apr 2025

https://github.com/BlackHC/toma

Helps you write algorithms in PyTorch that adapt to the available (CUDA) memory

data-science gpu machine-learning python pytorch

Last synced: 08 May 2025

https://github.com/tlkh/ai-lab

All-in-one AI container for rapid prototyping

cuda data-science deep-learning docker jupyter nvidia pytorch tensorflow

Last synced: 05 Apr 2025

https://github.com/ptyadana/data-science-and-machine-learning-projects-dojo

Collections of data science, machine learning and data visualization projects with pandas, sklearn, matplotlib, tensorflow2, Keras, and various ML algorithms such as random forest classifier, boosting, etc.

boosting-algorithms data-analysis data-science data-visualization deep-learning keras machine-learning machine-learning-algorithms natural-language-processing pandas probability-statistics scikit-learn seaborn tensorflow

Last synced: 05 Apr 2025

https://github.com/desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data

Last synced: 22 Nov 2025

https://github.com/kevintpeng/Learn-Something-Every-Day

A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General. Read the rendered results here ->

algorithm aws blog computer-science course-materials data-engineering data-science education educational engineering learning math mathematics research software-engineering university unix waterloo

Last synced: 20 Mar 2025

https://github.com/jobream/List-of-Learning-Resources

This collection provides a list of educational resources for Software Engineers. Feel free to add your favorite resources as well and help others in their journey of learning.

competitive-programming computer-science data-science resources software-engineering web-development

Last synced: 02 May 2025

https://github.com/DataScienceUB/introduction-datascience-python-book

Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications

analytics data data-science datascience machine-learning python sentiment-analysis

Last synced: 19 Jul 2025

https://github.com/plotly/dash-table

OBSOLETE: now part of https://github.com/plotly/dash

dash data-science data-visualization plotly plotly-dash python react table

Last synced: 04 Apr 2025

https://github.com/firmai/pandasvault

Advanced Pandas Vault - Utilities, Functions and Snippets (by @firmai).

data-science data-structures dataframe functions pandas python snippets table tips

Last synced: 06 May 2025

https://github.com/ashishpatel26/resourcebank_cv_nlp_mlops_2022

This repository offers a goldmine of materials for students of computer vision, natural language processing, and machine learning operations.

computer-vision data-science deep-learning mlops natural-language-processing

Last synced: 05 Apr 2025

https://github.com/epistasislab/scikit-rebate

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

data-science feature-selection python

Last synced: 16 May 2025

https://github.com/5agado/data-science-learning

Repository of code and resources related to different data science and machine learning topics. For learning, practice and teaching purposes.

data-science deep-learning jupyter-notebook learning-by-doing machine-learning statistics

Last synced: 17 Apr 2025

https://github.com/sforaidl/genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL

algorithm-implementations benchmarking data-science deep-learning gym hacktoberfest machine-learning neural-network openai python pytorch reinforcement-learning reinforcement-learning-algorithms

Last synced: 09 Oct 2025

https://github.com/george0st/qgate-model

ML/AI meta-model, used in MLRun/Iguazio/Nuclio, see qgate-sln-<MLRun | solution>

data-science feature-store iguazu machine-learning meta-model mlops mlrun nuclio quality-assessment quality-assurance quality-gate testing

Last synced: 05 Sep 2025

https://github.com/okfn-brasil/rosie

🤖 Python application responsible for Serenata de Amor's intelligence

artificial-intelligence data-science machine-learning

Last synced: 28 Mar 2025

https://github.com/Niketkumardheeryan/ML-CaPsule

ML-capsule is a project for beginner and experienced data science enthusiasts who don't have a mentor or guidance and wish to learn machine learning. Using our repo, they can learn ML, DL, and many related technologies through different real-world projects and become interview-ready.

analytics data-analysis data-science data-visualization datascience deep-learning deep-neural-networks deployment flask heroku-deployment machine-learning python r statistics streamlit-webapp

Last synced: 05 May 2025

https://github.com/rebecca-vickery/data-science-learning-resources

A comprehensive list of free resources for learning data science

artificial-intelligence data data-science machine-learning python

Last synced: 26 Apr 2025

https://github.com/ClimbsRocks/machineJS

[UNMAINTAINED] Automated machine learning: just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml

auto-ml automated-machine-learning automl data-science data-scientists javascript javascript-library kaggle machine-learning machine-learning-algorithms machine-learning-library ml numerai scikit-learn

Last synced: 19 Jul 2025

https://github.com/finos/jupyterlab_templates

Support for jupyter notebook templates in jupyterlab

data-science dataviz jupyter jupyterlab jupyterlab-extension machine-learning notebook

Last synced: 19 Jul 2025

https://github.com/tobgu/qframe

Immutable data frame for Go

data-frame data-science dataframe go golang immutable

Last synced: 04 Apr 2025

https://github.com/Chicago/food-inspections-evaluation

This repository contains the code to generate predictions of critical violations at food establishments in Chicago. It also contains the results of an evaluation of the effectiveness of those predictions.

cdph chicago data-science food-poisoning open-data open-science public-health

Last synced: 27 Mar 2025

https://github.com/xoolive/traffic

A toolbox for processing and analysing air traffic data

adsb air-traffic-data data-analytics data-science data-visualisation declarative-pipeline mode-s trajectory

Last synced: 14 May 2025

https://github.com/kunalj101/Data-Science-Hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and so on.

computer-vision data data-analysis data-science data-visualization dataset hacks image-augmentation ipynb machine-learning nlp nlp-machine-learning numpy pandas pandas-dataframe pandas-python pandas-tutorial python python3 tips-and-tricks

Last synced: 05 May 2025

https://github.com/SforAiDl/genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL

algorithm-implementations benchmarking data-science deep-learning gym hacktoberfest machine-learning neural-network openai python pytorch reinforcement-learning reinforcement-learning-algorithms

Last synced: 01 May 2025

https://github.com/climbsrocks/machinejs

[UNMAINTAINED] Automated machine learning: just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml

auto-ml automated-machine-learning automl data-science data-scientists javascript javascript-library kaggle machine-learning machine-learning-algorithms machine-learning-library ml numerai scikit-learn

Last synced: 05 Apr 2025

https://github.com/kunalj101/data-science-hacks

Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas, and so on.

computer-vision data data-analysis data-science data-visualization dataset hacks image-augmentation ipynb machine-learning nlp nlp-machine-learning numpy pandas pandas-dataframe pandas-python pandas-tutorial python python3 tips-and-tricks

Last synced: 28 Oct 2025

https://github.com/basedosdados/sdk

⚙️ Maintenance code for the datalake (metadata and access packages) | 📖 Docs: https://basedosdados.github.io/sdk/

bigquery dados-abertos data-science govtech hacktoberfest hacktoberfest2022 open-data python r sql transparencia

Last synced: 14 May 2025

https://github.com/aiguofer/gspread-pandas

A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.

data data-analytics data-engineering data-science dataframes google google-sheets google-spreadsheets gspread pandas python sheets

Last synced: 15 May 2025

https://github.com/ledell/user-machine-learning-tutorial

useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016.org/tutorials/10.html

data-science deep-learning ensemble-learning gradient-boosting-machine machine-learning r random-forest tutorial

Last synced: 05 Apr 2025

https://github.com/ledell/useR-machine-learning-tutorial

useR! 2016 Tutorial: Machine Learning Algorithmic Deep Dive http://user2016.org/tutorials/10.html

data-science deep-learning ensemble-learning gradient-boosting-machine machine-learning r random-forest tutorial

Last synced: 19 Jul 2025

https://github.com/plotly/plotly_matlab

Plotly Graphing Library for MATLAB®

d3 d3js data-science data-visualization matlab plotly technical-computing webgl

Last synced: 15 May 2025

https://github.com/kevinliao159/mydatascienceportfolio

Applying Data Science and Machine Learning to Solve Real World Business Problems

api data-science data-visualization machine-learning neural-networks nlp recommendation-system spark

Last synced: 05 Apr 2025

https://github.com/airalcorn2/Michael-s-Data-Science-Curriculum

This is the companion curriculum to my guide to becoming a data scientist.

curriculum data-science machine-learning statistics

Last synced: 13 Jul 2025

https://github.com/weijie-chen/econometrics-with-python

Tutorials on econometrics featuring Python programming. This is a crash course for reviewing the most important concepts and techniques of basic econometrics. The theory is presented lightly, without the hassle of derivations, and the Python code is straightforward.

data-analysis data-science econometrics economics python statistics time-series

Last synced: 05 Apr 2025

https://fraud-detection-handbook.github.io/fraud-detection-handbook/

Reproducible Machine Learning for Credit Card Fraud Detection - Practical Handbook

credit-card credit-card-fraud data-mining data-science fraud-detection machine-learning open-data

Last synced: 19 Nov 2025

https://github.com/datalayer/jupyter-ui

🪐 ⚛️ React.js components 💯% compatible with 🪐 Jupyter https://jupyter-ui-storybook.datalayer.tech

data data-product data-science data-visualisation datalayer ipywidgets jupyter jupyterlab lumino notebook reactjs ui

Last synced: 15 May 2025

https://github.com/EpistasisLab/scikit-rebate

A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning.

data-science feature-selection python

Last synced: 27 Mar 2025

https://github.com/thoughtworks/mlops-platforms

Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

azureml data-science databricks dataiku datarobot google-ai-platform h2oai iguazio knime kubeflow machine-learning mlflow mlops pachyderm sagemaker seldon

Last synced: 07 Aug 2025

https://github.com/solegalli/feature-engineering-for-machine-learning

Code repository for the online course Feature Engineering for Machine Learning

data-science feature-engineering feature-extraction machine-learning python

Last synced: 04 Apr 2025

https://github.com/yzkang/My-Data-Competition-Experience

A summary of my experience from multiple top-5 finishes in machine learning and big data competitions: packed with practical material, help yourself.

automl catboost data-science deep-learning feature-engineering feature-selection gan hyperparameter-optimization kaggle-competition lightgbm machine-learning model-fusion model-selection python sql tianchi-competition xgboost

Last synced: 27 Apr 2025

https://github.com/Desbordante/desbordante-core

Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows running data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

anomaly-detection correlations data-analytics data-cleaning data-cleansing data-engineering data-exploration data-mining data-mining-algorithms data-preprocessing data-profiling data-science data-wrangling exploratory-data-analysis feature-engineering feature-extraction feature-selection knowledge-discovery spreadsheets tabular-data

Last synced: 03 Apr 2025

https://github.com/dagshub/fds

Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc

data-science dvc git

Last synced: 12 Apr 2025

https://github.com/operatorai/modelstore

modelstore is a Python library that allows you to version, export, and save a machine learning model to your filesystem or a cloud storage provider.

data-science keras machine-learning mlops modelstore python-library pytorch s3-storage scikit-learn tensorflow transformer

Last synced: 14 May 2025

https://github.com/DagsHub/fds

Fast Data Science, AKA fds, is a CLI for Data Scientists to version control data and code at once, by conveniently wrapping git and dvc

data-science dvc git

Last synced: 08 May 2025

https://github.com/plotly/dashR

Create data science and AI web apps in R

dash data-science data-visualization plotly plotly-dash python r react web-application

Last synced: 15 Mar 2025

https://github.com/matrix-profile-foundation/matrixprofile

A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.

algorithms anomaly-detection clustering data-mining data-science hacktoberfest matrixprofile motif-discovery python python2 python3 segmentation time-series time-series-analysis

Last synced: 16 May 2025

https://github.com/jkrumbiegel/chain.jl

A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.

data-analysis data-science julia julia-language julia-package macro pipeline

Last synced: 12 Apr 2025

https://github.com/jkrumbiegel/Chain.jl

A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.

data-analysis data-science julia julia-language julia-package macro pipeline

Last synced: 14 May 2025

https://github.com/ozlerhakan/datacamp

DataCamp data-science and machine learning courses

data-analysis data-science datacamp datacamp-course deep-learning machine-learning python statistics visualization

Last synced: 05 Apr 2025

https://github.com/aunum/goro

A High-level Machine Learning Library for Go

data-science go golang machine-learning machinelearning

Last synced: 04 Apr 2026

https://github.com/anothersamwilson/miceforest

Multiple Imputation with LightGBM in Python

data-science imputed-values mice-algorithm python random-forest

Last synced: 08 Apr 2025

https://github.com/adicherlavenkatasai/ml-workspace

Machine Learning (Beginners Hub): information (courses, books, cheat sheets, live sessions) related to machine learning, data science, and Python

cheat-sheets convolutional-networks data-science deep-learning deep-neural-networks gans harvard-edx interview-questions machine-learning python

Last synced: 11 Apr 2025

https://github.com/xorbit01/webpalm

🕸️ Crawl in the web network

crawler crawling data data-science datamining go golang hack mining osint redteam spider tool

Last synced: 15 Dec 2025

https://github.com/souzatharsis/open-quant-live-book

An open source, hands-on and fully reproducible book in quantitative finance, data science and econophysics. Join us and help Make Wall Street Great Again!

algo-trading altdata data-science econophysics financial-analysis financial-markets machine-learning open-source quantitative-finance

Last synced: 27 Jan 2026

https://github.com/astronomer/astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

airflow apache-airflow bigquery dags data-analysis data-science elt etl gcs pandas postgres python s3 snowflake sql sqlite workflows

Last synced: 13 Apr 2025

https://github.com/aaronpenne/data_visualization

A collection of my data visualizations, mostly in Python.

data-science data-visualization python3 visualization

Last synced: 26 Feb 2026

https://github.com/maxhalford/xam

Personal data science and machine learning toolbox

data-science machine-learning preprocessing python stacking

Last synced: 19 Aug 2025

https://github.com/MaxHalford/xam

Personal data science and machine learning toolbox

data-science machine-learning preprocessing python stacking

Last synced: 08 May 2025

https://github.com/zhiningliu1998/imbalanced-ensemble

🛠️ Class-imbalanced Ensemble Learning Toolbox. | A library for class-imbalanced and long-tail machine learning

class-imbalance classification data-mining data-science ensemble ensemble-imbalanced-learning ensemble-learning ensemble-model imbalanced-classification imbalanced-data imbalanced-learning long-tail machine-learning multi-class-classification python python3 scikit-learn sklearn

Last synced: 15 May 2025

https://github.com/XORbit01/webpalm

🕸️ Crawl in the web network

crawler crawling data data-science datamining go golang hack mining osint redteam spider tool

Last synced: 14 Apr 2025