An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/vertica/verticapy

VerticaPy is a Python library that exposes scikit-like functionality for conducting data science projects on data stored in Vertica, taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

big-data data-science data-visualization machine-learning preparation python python-library vertica

Last synced: 06 Apr 2026

https://github.com/aws/amazon-redshift-python-driver

Redshift Python Connector. It supports Python Database API Specification v2.0.

amazon-redshift aws-redshift data-analysis data-science

Last synced: 04 Mar 2026

https://github.com/matyushkin/ds

👨‍🔬 In Russian: An updated, structured collection of free resources on Data Science topics: courses, books, open data, blogs, and ready-made solutions.

bookshelf cheatsheets courses data data-science machine-learning reddit roadmap russian russian-language statistics

Last synced: 04 Apr 2026

https://github.com/anki-code/xonsh-cheatsheet

Cheat sheet for the xonsh shell with copy-pastable examples. The best doc for new users.

awesome awesome-cheatsheet cheat-sheet cheat-sheets cheatsheet cheatsheets console data-science devops devops-scripts hacking shell terminal xonsh xontrib

Last synced: 31 Dec 2025

https://github.com/ocademy-ai/machine-learning

Learn AI together, for free. AI learning and teaching resources for everyone.

ai data-engineering data-science deep-learning jupyter jupyter-notebook machine-learning ml mlops python scikit-learn visualization

Last synced: 16 Apr 2025

https://github.com/ashutosh1919/truvisory

This project provides resources for users who want to find good LinkedIn posts that contain resources for learning any Technology, Design, Self-Branding, Motivation, etc. You can visit the project by:

career data-science design full-stack linkedin linkedin-posts linkedin-profile marketing motivation opensource react react-template reactjs self-branding stacks ui-design website-design website-template

Last synced: 03 Apr 2025

https://github.com/toUpperCase78/formula1-datasets

Datasets & Analyses for Formula 1 World Championship

analysis data-science datasets formula1 jupyter-notebook motorsports python racing

Last synced: 26 Sep 2025

https://github.com/coqui-ai/trainer

🐸 - A general purpose model trainer, as flexible as it gets

ai data-science deep-learning machine-learning pytorch

Last synced: 16 May 2025

https://github.com/datatalksclub/datatalksclub.github.io

The web page for DataTalks.Club

data-science jekyll machine-learning

Last synced: 25 Oct 2025

https://github.com/trainingbypackt/data-science-for-marketing-analytics

Achieve your marketing goals with the data analytics power of Python

data-science data-visualization matplotlib numpy pandas python seaborn

Last synced: 09 Apr 2025

https://github.com/zeno-ml/zeno

AI Data Management & Evaluation Platform

ai data-science evaluation evaluation-framework machine-learning python

Last synced: 18 Apr 2025

https://github.com/google-aai/sc17

SuperComputing 2017 Deep Learning Tutorial

data-science deep-learning google-cloud-platform machine-learning tutorial

Last synced: 19 Jul 2025

https://github.com/fastverse/fastverse

An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R

c cpp data-aggregation data-manipulation data-science data-transformation high-performance low-dependency panel-data r rstats statistical-computing time-series weights

Last synced: 12 Dec 2025

https://github.com/speedml/speedml

Speedml is a Python package to speed start machine learning projects.

data-science machine-learning python

Last synced: 12 Jul 2025

https://github.com/shaildeliwala/delbot

It understands your voice commands, searches news and knowledge sources, and summarizes and reads out content to you.

ai bot bots chatbot data-science flask natural-language-processing python

Last synced: 16 May 2025

https://github.com/Speedml/speedml

Speedml is a Python package to speed start machine learning projects.

data-science machine-learning python

Last synced: 09 May 2025

https://github.com/activitysim/activitysim

An Open Platform for Activity-Based Travel Modeling

activitysim bsd-3-clause data-science microsimulation python travel-modeling

Last synced: 09 Sep 2025

https://github.com/dataplane-app/dataplane

Dataplane is an Airflow inspired unified data platform with additional data mesh and RPA capability to automate, schedule and design data pipelines and workflows. Dataplane is written in Golang with a React front end.

airflow data data-analysis data-engineering data-integration data-pipelines data-science dataplane datawarehouse etl finance golang kubernetes pipelines robotics-process-automation rpa scheduler workflow workflow-automation workflows

Last synced: 27 Dec 2025

https://github.com/build-on-aws/cloud-clubs-learner-library

A library for learners! Whether or not you're a part of AWS Cloud Clubs, take a look in this library for free, open, leveled content for students 18+ worldwide

ai aws containers data-analytics data-science databases iot kubernetes ml mobile-development security serverless web web-development

Last synced: 09 Apr 2025

https://github.com/saimadhu-polamuri/DataAspirant_codes

Complete machine learning model codes

data-mining data-science machine-learning python

Last synced: 13 Jul 2025

https://github.com/mvlearn/mvlearn

Python package for multi-view machine learning

data-science machine-learning multiview-learning python

Last synced: 21 Oct 2025

https://github.com/Fixy-TR/fixy

Our goal is to build an open source spelling assistant/checker that solves many different problems in the Turkish NLP literature together, proposes unique approaches, and fills the gaps in existing work. It corrects spelling mistakes in users' texts with a deep learning approach while also performing semantic analysis on the texts, so that errors arising in that context can be detected and corrected as well.

acikhack2 ai artificial-intelligence bert data-science deep-learning deeplearning keras natural-language-processing neural-network neural-networks nlp python

Last synced: 03 May 2025

https://github.com/tmthyjames/achoo

Achoo uses a Raspberry Pi to predict if my son will need his inhaler on any given day using weather, pollen, and air quality data. If the prediction for a given day is above a specified threshold, the Pi will email his school nurse and me, notifying us that he may need preemptive treatment. Community-sourced health monitoring!

air-quality-data data-science diy pollen prediction python r raspberry-pi weather

Last synced: 07 May 2025

https://github.com/ahammadmejbah/fueling-ambitions-via-book-discoveries

This series uncovers the most valuable insights from groundbreaking books in AI, Machine Learning, and Data Science, helping you accelerate your learning journey. Each episode transforms complex theories into practical knowledge, making advanced topics more accessible and actionable.

data-science data-structures data-visualization deep-learning generative-ai machine-learning

Last synced: 12 Apr 2025

https://github.com/predict-idlab/powershap

A power-full Shapley feature selection method.

data-science feature-selection machine-learning shap

Last synced: 07 Jul 2025

https://github.com/Ronak-59/Stock-Prediction

Smart Algorithms to predict buying and selling of stocks on the basis of Mutual Funds Analysis, Stock Trends Analysis and Prediction, Portfolio Risk Factor, Stock and Finance Market News Sentiment Analysis and Selling profit ratio. Project developed as a part of NSE-FutureTech-Hackathon 2018, Mumbai. Team: Semicolon

algorithms artificial-intelligence data-science lstm-neural-network machine-learning risk-analysis sentiment-analysis stock-prediction stock-price-prediction visualisation

Last synced: 02 Jun 2026

https://github.com/blobcity/python-for-data-science

A collection of Jupyter Notebooks for learning Python for Data Science.

data-science jupyter jupyter-notebook jupyter-notebooks learn-python python

Last synced: 13 Feb 2026

https://github.com/oracle-samples/oci-data-science-ai-samples

This repo contains a series of tutorials and code examples highlighting different features of the OCI Data Science and AI services, along with a release vehicle for experimental programs.

ai conda data-science data-science-notebooks deep-learning jupyter-notebook machine-learning oci oracle-cloud-infrastructure python

Last synced: 15 May 2025

https://github.com/netflix/metaflow-service

:rocket: Metadata tracking and UI service for Metaflow!

ai data-science machine-learning metaflow ml ml-infrastructure ml-platform productivity ui

Last synced: 01 Jul 2025

https://github.com/jgoerner/data-science-stack-cookiecutter

🐳📊🤓Cookiecutter template to launch an awesome dockerized Data Science toolstack (incl. Jupyter, Superset, Postgres, Minio, AirFlow & API Star)

airflow apistar cookiecutter data-science docker docker-image jupyter minio postgres python superset

Last synced: 29 Mar 2025

https://github.com/iterative/vscode-dvc

Machine learning experiment tracking and data versioning with DVC extension for VS Code

data data-science dvc machine-learning python visual-studio-code vscode vscode-extension

Last synced: 18 Jun 2025

https://github.com/Laurae2/Laurae

Advanced High Performance Data Science Toolbox for R by Laurae

data-science laurae machine-learning r supervised-learning xgboost

Last synced: 20 Jul 2025

https://github.com/rhenanbartels/hrv

A Python package for heart rate variability analysis

data-science hacktoberfest hrv python signal-processing

Last synced: 05 Apr 2025

https://github.com/danaugrs/go-tsne

t-Distributed Stochastic Neighbor Embedding (t-SNE) in Go

3d data-science dimensionality-reduction go machine-learning tsne unsupervised-learning visualization

Last synced: 30 Apr 2025

https://github.com/kevin-hanselman/dud

A lightweight CLI tool for versioning data alongside source code and building data pipelines.

data-engineering data-pipelines data-science dataset dvcs machine-learning mlops

Last synced: 29 Dec 2025

https://github.com/ideonate/cdsdashboards

JupyterHub extension for ContainDS Dashboards

bokeh data-science jupyter jupyterhub panel plotly-dash rshiny streamlit visualization

Last synced: 04 Apr 2025

https://github.com/ahammadmejbah/machine-learning-book-collections

Machine learning is the study and development of data-driven strategies to enhance task performance. It is a subfield of AI.

data-science deep-learning machine-learning

Last synced: 05 Mar 2025

https://github.com/nteract/bookstore

📚 Notebook storage and publishing workflows for the masses

data-science notebook nteract scheduling storage versioned-buckets

Last synced: 07 Apr 2025

https://github.com/agilescientific/striplog

Lithology and stratigraphic logs for wells or outcrop.

data-mining data-science geology petrophysics sedimentology swung-stack

Last synced: 09 Apr 2025

https://github.com/rapidsai/node

GPU-accelerated data science and visualization in node

cuda data-science data-visualization gpgpu gpu nodejs

Last synced: 16 May 2025

https://github.com/analysiscenter/batchflow

BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.

data-science machine-learning pipeline pipeline-framework python python3 workflow workflow-engine

Last synced: 25 Oct 2025

https://github.com/ActivitySim/activitysim

An Open Platform for Activity-Based Travel Modeling

activitysim bsd-3-clause data-science microsimulation python travel-modeling

Last synced: 15 Mar 2025

https://github.com/cyb3r-monk/rita-j

Implementation of RITA (Real Intelligence Threat Analytics) in Jupyter Notebook with improved scoring algorithm.

cybersecurity data-science dfir jupyter-notebook threat-hunting

Last synced: 09 Mar 2026

https://github.com/microsoft/finnts

Microsoft Finance Time Series Forecasting Framework (FinnTS) is a forecasting package that utilizes cutting-edge time series forecasting and parallelization on the cloud to produce accurate forecasts for financial data.

business data-science feature-selection finance finnts forecasting machine-learning microsoft r r-package rstats time-series

Last synced: 15 May 2025

https://github.com/alertadengue/pysus

Library to download, clean, and analyze openly available datasets from the Brazilian universal health system, SUS.

data-science geospatial health

Last synced: 18 May 2026

https://github.com/drakearch/kaggle-courses

Kaggle courses and tutorials to get you started in the Data Science world.

data-science deep-learning machine-learning pandas python

Last synced: 16 Apr 2025

https://github.com/voila-dashboards/voici

Voici turns any Jupyter Notebook into a static web application

dashboards data-science emscripten jupyter jupyterlite voila-dashboard wasm

Last synced: 19 Jun 2025

https://github.com/coqui-ai/Trainer

🐸 - A general purpose model trainer, as flexible as it gets

ai data-science deep-learning machine-learning pytorch

Last synced: 19 Jul 2025

https://github.com/robmarkcole/hass-data-detective

Explore and analyse your Home Assistant data

data data-science home home-assistant home-automation

Last synced: 16 May 2025

https://github.com/launchflow/buildflow

BuildFlow is an open source framework for building large scale systems using Python. All you need to do is describe where your input is coming from and where your output should be written, and BuildFlow handles the rest. No configuration outside of the code is required.

batch data-science pipeline python streaming

Last synced: 16 Jul 2025

https://github.com/multimeric/PandasSchema

A validation library for Pandas data frames using user-friendly schemas

data-science pandas schema validation

Last synced: 23 Apr 2025

https://github.com/multimeric/pandasschema

A validation library for Pandas data frames using user-friendly schemas

data-science pandas schema validation

Last synced: 12 Dec 2025

https://github.com/robmarkcole/HASS-data-detective

Explore and analyse your Home Assistant data

data data-science home home-assistant home-automation

Last synced: 06 Apr 2025

https://github.com/seg/2016-ml-contest

Machine learning contest - October 2016 TLE

contest data-science fun geophysics geoscience machine-learning

Last synced: 19 Jul 2025

https://github.com/joeymeyer/raspberryturk

The Raspberry Turk is a robot that can play chess—it's entirely open source, based on Raspberry Pi, and inspired by the 18th century chess playing machine, the Mechanical Turk.

3d-printing chess computer-vision data-science machine-learning raspberry-pi robotics

Last synced: 12 May 2025

https://github.com/swoop-inc/spark-alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

data-engineering data-science scala spark

Last synced: 07 May 2025

https://github.com/microsoft/30daysof

30 Days of Learning Resources, Samples and Curricula

azure data-science powerapps pwa serverless staticwebapp

Last synced: 14 Apr 2026

https://github.com/curiousily/Machine-Learning-from-Scratch

Succinct Machine Learning algorithm implementations from scratch in Python, solving real-world problems (Notebooks and Book). Examples of Logistic Regression, Linear Regression, Decision Trees, K-means clustering, Sentiment Analysis, Recommender Systems, Neural Networks and Reinforcement Learning.

artificial-intelligence book classification data-science machine-learning machine-learning-algorithms neural-networks notebook recommender-systems regression reinforcement-learning sentiment-analysis

Last synced: 20 Jul 2025

https://github.com/Azure/DataScienceVM

Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)

ai azure big-data data-analysis data-science deep-learning dsvm machine-learning ml python r sqlserver

Last synced: 20 Jul 2025