An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/InseeFrLab/onyxia

🔬 Data science environment for k8s

bluehats data-science datalab helm insee kubernetes onyxia

Last synced: 27 Dec 2024

https://github.com/hugoblox/theme-research-group

👥 Easily create a stunning Research Group, Team, or Business Website with no-code

academia academic blogdown college-website data-science hugo hugo-theme landing-page landing-page-theme r research research-group research-lab research-lab-website research-tool team-website university university-website wowchemy

Last synced: 14 Apr 2025

https://github.com/KiranGershenfeld/VisualizingTwitchCommunities

Graphing communities on Twitch.tv in a visually intuitive way

community data-science python twitch visualization

Last synced: 14 Mar 2025

https://github.com/aporia-ai/mlnotify

🔔 No need to keep checking your training - just one import line and you'll know the second it's done.

data-science deep-learning deeplearning machine-learning machinelearning machinelearning-python ml notification notifications opensource python python3 tool tools

Last synced: 05 Apr 2025

https://github.com/datmo/datmo

Open source production model management tool for data scientists

artificial-intelligence data-science deep-learning machine-learning reproducibility version-control

Last synced: 02 May 2025

https://github.com/gdsbook/book

This book serves as an introduction to a whole new way of thinking systematically about geographic data, using geographical analysis and computation to unlock new insights hidden within data.

data-analysis-python data-science geographic-data geographical-information-system spatial-analysis spatial-data-analysis spatial-statistics statistics

Last synced: 15 Mar 2025

https://github.com/larswaechter/voici.js

A Node.js library for pretty-printing your data in the terminal 🎨

console data-science javascript shell terminal tty typescript

Last synced: 05 Apr 2025

https://github.com/jovianhq/opendatasets

A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.

data-science datasets machine-learning python

Last synced: 04 Apr 2025

https://github.com/yzhao062/data-mining-conferences

Ranking, acceptance rate, deadline, and publication tips

data-mining data-science research

Last synced: 09 Apr 2025

https://github.com/tommyod/efficient-apriori

An efficient Python implementation of the Apriori algorithm.

apriori-algorithm association-rules data-mining data-science machinelearning

Last synced: 15 May 2025
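To make "the Apriori algorithm" concrete, here is a minimal pure-Python sketch of the classic level-wise search. It is not the efficient-apriori API, and the function name `apriori` and its signature are hypothetical. The search joins frequent (k-1)-itemsets into k-item candidates, prunes any candidate that has an infrequent subset, and keeps the candidates whose support (the fraction of transactions containing them) meets a threshold.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Hypothetical helper: return {frozenset(itemset): support} for all frequent itemsets.

    Support is the fraction of transactions that contain the itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {s for s in (frozenset([i]) for i in items) if support(s) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent
```

Take three baskets, ("eggs", "bacon", "soup"), ("eggs", "bacon", "apple") and ("soup", "bacon", "banana"), with `min_support=0.5`. The sketch keeps {eggs, bacon} and {bacon, soup}. It prunes {eggs, bacon, soup} because its subset {eggs, soup} is infrequent. A production library avoids the repeated full scans this sketch performs.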

https://github.com/upgini/upgini

Data search & enrichment library for Machine Learning → Easily find and add relevant features to your ML & AI pipeline from hundreds of public and premium external data sources, including open & commercial LLMs

automated-feature-engineering automl automl-pipeline chatgpt data-enrichment data-science feature-engineering feature-extraction feature-selection features kaggle kaggle-solution large-language-models llm machine-learning open-data open-datasets public-data python-library scikit-learn

Last synced: 15 May 2025

https://github.com/mljar/plotai

PlotAI - Your Ultimate Plotting Assistant! 📊🤖 Use ChatGPT-3.5 to create plots in Python and Matplotlib directly in your Python script or notebook.

charts chatgpt data-science llm matplotlib plots python visualization

Last synced: 15 May 2025

https://github.com/anonyfox/elixir-scrape

Scrape any website, article or RSS/Atom Feed with ease!

data-science elixir feed html information-retrieval readability rss scrape scraping

Last synced: 13 Apr 2025

https://github.com/petrobras/3w

Promotes development of ML algorithms for early detection and classification of undesirable events in offshore oil wells.

anomaly-detection data-science machine-learning multivariate-time-series-analysis oil-well-monitoring

Last synced: 08 Apr 2025

https://github.com/machine-learning-apps/issue-label-bot

Code for the Issue Label Bot, an app that automatically labels issues using machine learning and is available on the GitHub Marketplace. This is also the code for the blog article "How to automate tasks on GitHub with machine learning for fun and profit".

bigquery bootstrap data-science deep-learning end-to-end-application flask gcp-cloud gharchive github-api-v3 github-app keras kubernetes machine-learning machine-learning-tutorials nlp production-machine-learning tensorflow

Last synced: 23 Jan 2025

https://github.com/autonlab/auton-survival

Auton Survival - an open source package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Events

causal-inference counterfactual-inference data-science deep-learning graphical-models machine-learning python regression reliability-analysis survival-analysis time-to-event

Last synced: 01 May 2025
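"Censored time-to-events" means that some subjects leave the study before the event is observed. The classic nonparametric way to estimate a survival curve under right censoring is the Kaplan-Meier estimator. The sketch below is a minimal pure-Python illustration of that estimator. It is not taken from auton-survival, and the function name is hypothetical. At each distinct event time t, the survival estimate is multiplied by (1 - events/at_risk). Censored subjects then leave the risk set without counting as events.

```python
def kaplan_meier(durations, observed):
    """Hypothetical helper: Kaplan-Meier survival estimate for right-censored data.

    durations: time to event or censoring for each subject.
    observed:  True if the event occurred, False if the subject was censored.
    Returns a list of (time, S(time)) at each distinct event time.
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        # Count events and censorings tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= events + censored
    return curve
```

Take durations `[1, 2, 2, 3, 4]` with `observed = [True, True, False, True, False]`. The curve steps to roughly 0.8, 0.6 and 0.3 at times 1, 2 and 3. The subject censored at time 4 shrinks the risk set but never lowers the curve.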

https://github.com/profjsb/python-seminar

Python for Data Science (Seminar Course at UC Berkeley; AY 250)

data-science distributed-computing machine-learning python visualization

Last synced: 27 Nov 2024

https://github.com/databrickslabs/tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc.), AS OF joins, downsampling, and interpolation

data-analysis data-science pandas python scala time-series timeseries timeseries-analysis timeseries-data

Last synced: 29 Apr 2025
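An AS OF join matches each left-hand row to the most recent right-hand row whose timestamp is at or before the left row's timestamp, for example attaching the latest quote to each trade. The sketch below is a minimal pure-Python illustration of those semantics on plain lists. It is not tempo's API, which operates on Spark DataFrames. The function name and the `(timestamp, value)` tuple layout are assumptions, and the right-hand input is assumed to be sorted by timestamp.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Hypothetical helper: attach the latest right value with ts <= left ts.

    left, right: lists of (timestamp, value) tuples; right must be sorted by timestamp.
    Returns a list of (ts, left_value, right_value_or_None).
    """
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, lval in left:
        # Number of right rows with timestamp <= ts; the last of them is the match.
        i = bisect_right(right_ts, ts)
        rval = right[i - 1][1] if i > 0 else None
        out.append((ts, lval, rval))
    return out
```

Using `bisect_right` makes an exact timestamp tie count as "as of" that time. A left row that precedes every right row gets `None`, mirroring the nulls a left AS OF join produces.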

https://github.com/maxhumber/redframes

General Purpose Data Manipulation Library

data-science pandas python

Last synced: 05 Apr 2025

https://github.com/microsoft/genalog

Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and text alignment capabilities.

data-generation data-science machine-learning ner ocr-recognition python synthetic-data synthetic-data-generation synthetic-images text-alignment

Last synced: 04 Apr 2025

https://github.com/kamu-data/kamu-cli

Next-generation decentralized data lakehouse and a multi-party stream processing network

blockchain data-as-code data-management data-science datafusion flink jupyter kamu open-data open-data-fabric spark sql

Last synced: 15 May 2025

https://github.com/solegalli/feature-selection-for-machine-learning

Code repository for the online course Feature Selection for Machine Learning

data-science feature-selection machine-learning python

Last synced: 15 May 2025

https://github.com/ml-tooling/ml-hub

🧰 Multi-user development platform for machine learning teams. Simple to set up within minutes.

data-science docker jupyter jupyterhub machine-learning python

Last synced: 06 Apr 2025

https://github.com/weecology/retriever

Quickly download, clean up, and install public datasets into a database management system

data data-retrieval data-science dataset datasets hacktobefest python

Last synced: 03 Apr 2025

https://jbryer.github.io/likert/

Package for analyzing Likert-based items.

data-science r visualization

Last synced: 06 May 2025

https://github.com/CJWorkbench/cjworkbench

The data journalism platform with built in training

data-analysis data-journalism data-science data-visualization journalism notebook

Last synced: 24 Nov 2024

https://github.com/yamafaktory/hypergraph

Hypergraph is a data structure library for creating directed hypergraphs, in which a hyperedge can join any number of vertices.

data data-science data-structure data-structures hypergraph hypergraphs rust rust-lang rustlang

Last synced: 15 May 2025
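Unlike an ordinary graph edge, a hyperedge can connect any number of vertices. The Hypergraph library itself is written in Rust, and the Python class below is only a toy illustration of the data structure. Its class and method names are hypothetical. Representing a directed hyperedge as an ordered tuple of vertices is also an assumption; other formulations use separate tail and head sets. The class keeps a vertex-to-hyperedge incidence index and derives "directed adjacency" from consecutive vertices inside each hyperedge.

```python
from collections import defaultdict


class DirectedHypergraph:
    """Hypothetical toy directed hypergraph: each hyperedge is an ordered tuple of vertices."""

    def __init__(self):
        self.vertices = set()
        self.hyperedges = []                # list of (label, tuple_of_vertices)
        self._incident = defaultdict(set)   # vertex -> indices of hyperedges containing it

    def add_vertex(self, v):
        self.vertices.add(v)

    def add_hyperedge(self, vertices, label=None):
        # Reject hyperedges that reference vertices not yet in the graph.
        missing = [v for v in vertices if v not in self.vertices]
        if missing:
            raise KeyError(f"unknown vertices: {missing}")
        idx = len(self.hyperedges)
        self.hyperedges.append((label, tuple(vertices)))
        for v in vertices:
            self._incident[v].add(idx)
        return idx

    def hyperedges_of(self, v):
        """All hyperedges that contain vertex v, in insertion order."""
        return [self.hyperedges[i] for i in sorted(self._incident[v])]

    def adjacent_from(self, v):
        """Vertices that directly follow v inside some hyperedge (directed adjacency)."""
        out = set()
        for _, verts in self.hyperedges:
            for a, b in zip(verts, verts[1:]):
                if a == v:
                    out.add(b)
        return out
```

With hyperedges `("a", "b", "c")` and `("c", "a", "d")`, vertex `a` belongs to both hyperedges and is directly followed by `b` and `d`.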

https://github.com/data-describe/data-describe

data⎰describe: Pythonic EDA Accelerator for Data Science

analysis data-science eda exploratory-data-analysis pypi

Last synced: 12 Apr 2025

https://github.com/alibaba/feathub

FeatHub - A stream-batch unified feature store for real-time machine learning

apache-flink data data-engineering data-quality data-science feature-engineering feature-store machine-learning mlops streaming

Last synced: 04 Apr 2025

https://github.com/PPshrimpGo/BDCI2018-ChinauUicom-1st-solution

This is the first-place solution to the China Unicom challenge of BDCI2018.

competition data-science

Last synced: 28 Nov 2024

https://github.com/PKU-DAIR/Hetu

A high-performance distributed deep learning system targeting large-scale and automated distributed training.

artificial-intelligence autograd data-science deep-learning deep-neural-networks distributed-systems distributed-training embeddings gpu high-dimensional machine-learning python state-of-the-art

Last synced: 20 Mar 2025

https://github.com/greenelab/scihub

Source code and data analyses for the Sci-Hub Coverage Study

crossref data-science doi journals libgen open-data sci-hub scimag scopus

Last synced: 09 Apr 2025

https://github.com/kde/labplot

LabPlot is a FREE, open source and cross-platform Data Visualization and Analysis software accessible to everyone.

data-analysis data-science data-visualization fitting graph graph2d plotting scientific-plotting scientific-visualization

Last synced: 16 May 2025

https://github.com/iterative/terraform-provider-iterative

โ˜๏ธ Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes

aws azure cloud cloud-computing cloud-infrastructure cloud-orchestration cloud-storage cml data-science developer-tools gcp gpu hacktoberfest k8s machine-learning mlops terraform terraform-provider terraform-provider-iterative tpi

Last synced: 30 Mar 2025

https://github.com/drivendataorg/deon

A command line tool to easily add an ethics checklist to your data science projects.

data-ethics data-science ethics machine-learning

Last synced: 04 Apr 2025

https://github.com/Dyakonov/PZAD

Course "Applied Problems of Data Analysis" (Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University)

data-mining data-science data-visualization education lectures machine-learning ml russian slides

Last synced: 27 Nov 2024

https://github.com/Ibotta/sk-dist

Distributed scikit-learn meta-estimators in PySpark

data-science machine-learning ml scikit-learn spark

Last synced: 25 Nov 2024

https://github.com/microsoft/NimbusML

Python machine learning package providing simple interoperability between ML.NET and scikit-learn components.

data-science machine-learning ml mlnet nimbusml python scikit-learn

Last synced: 18 Apr 2025

https://github.com/amanovishnu/ineuron-full-stack-data-science-assignments

This repository features assignments and projects from the iNeuron full-stack data science course, providing valuable resources for learners to enhance their skills and apply their knowledge.

computer-vision data-science datascience deep-learning exploratory-data-analysis linear-regression machine-learning natural-language-processing python recommender-system sql statistics

Last synced: 08 Apr 2025

https://github.com/merantix-momentum/squirrel-core

A Python library that enables ML teams to share, load, and transform data in a collaborative, flexible, and efficient way :chestnut:

ai cloud-computing collaboration computer-vision cv data-ingestion data-mesh data-science dataops datasets deep-learning distributed internal machine-learning ml natural-language-processing nlp python pytorch tensorflow

Last synced: 01 Apr 2025

https://github.com/glm-tools/pyglmnet

Python implementation of elastic-net regularized generalized linear models

data-science elastic-net glm lasso machine-learning python

Last synced: 01 May 2025
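Elastic-net regularization adds a weighted mix of L1 and L2 penalties to a GLM's loss. The L1 part drives some coefficients exactly to zero, and the L2 part stabilizes correlated features. The sketch below covers only the Gaussian (squared-error) member of the GLM family. It uses cyclic coordinate descent with soft-thresholding, a standard solver for this objective. It does not reproduce pyglmnet's API or necessarily its solver, and the function names are hypothetical. `lam` is the overall penalty strength and `alpha` is the L1/L2 mix. Each feature column is assumed to have a nonzero mean square, so the denominator is never zero.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net_gaussian(X, y, lam, alpha, n_iter=500):
    """Hypothetical helper: coordinate descent for the Gaussian elastic net.

    Minimizes (1/2n)*sum((y - b0 - X.b)^2) + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2).
    X is a list of rows, y a list of targets. Returns (b0, b).
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Unpenalized intercept: mean of the residuals that ignore b0.
        b0 = sum(y[i] - sum(X[i][k] * b[k] for k in range(p)) for i in range(n)) / n
        for j in range(p):
            # Correlation of feature j with the partial residual (feature j excluded).
            rho = sum(
                X[i][j] * (y[i] - b0 - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            denom = sum(X[i][j] ** 2 for i in range(n)) / n + lam * (1 - alpha)
            b[j] = soft_threshold(rho, lam * alpha) / denom
    return b0, b
```

With `lam=0` the updates reduce to ordinary least squares, so noiseless data generated as y = 1 + 2·x1 - 3·x2 is recovered. With a large `lam` and `alpha=1` (pure lasso), every coefficient is thresholded to zero and only the intercept remains.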

https://github.com/amanovishnu/ineuron-full-stack-data-science-assignment-collection

This repository features assignments and projects from the iNeuron full-stack data science course, providing valuable resources for learners to enhance their skills and apply their knowledge.

computer-vision data-science datascience deep-learning exploratory-data-analysis linear-regression machine-learning natural-language-processing python recommender-system sql statistics

Last synced: 28 Feb 2025

https://github.com/mybridge/python-articles

Monthly Series - Top 10 Python Articles

data-science data-visualization django flask python python3

Last synced: 09 May 2025

https://github.com/project-ryoma/ryoma

Common AI agent framework solving your data problems

ai data-science llm

Last synced: 02 May 2025

https://github.com/senseyeio/roger

Golang RServe client. Use R from Go

data-science go r rserve scientific-computing

Last synced: 09 Apr 2025

https://github.com/jupyter-naas/naas

Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)

ai binder data data-science data-transformation engine etl integration jupyter jupyterlab notebooks open-source pipeline

Last synced: 03 Apr 2025

https://github.com/stocknear/backend

Backend of stocknear - Open Source Stock Analysis

data data-science fastapi fastify finance javascript machine-learning nodejs pocketbase python redis

Last synced: 16 May 2025

https://github.com/nshiab/simple-data-analysis

Easy-to-use and high-performance JavaScript library for data analysis. Works with tabular and geospatial data.

analysis bun data data-analysis data-science duckdb geospatial javascript node node-js nodejs spatial spatial-analysis sql typescript

Last synced: 10 Apr 2025

https://github.com/Griperis/BlenderDataVis

Data visualisation addon for Blender

blender blender-addon chart data-science data-visualisation

Last synced: 09 May 2025

https://github.com/kraina-ai/srai

Spatial Representations for Artificial Intelligence - a Python library toolkit for geospatial machine learning focused on creating embeddings for downstream tasks

artificial-intelligence data-science geo geospatial machine-learning python spatial spatial-analysis srai

Last synced: 15 May 2025

https://github.com/flyteorg/flytekit

Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started with and learn, and highly extensible.

automation data data-science extensible flyte flyte-tasks hacktoberfest mlops pypi python sdk spark workflows

Last synced: 14 May 2025

https://github.com/rasgointelligence/RasgoQL

Write python locally, execute SQL in your data warehouse

data-analysis data-science pandas python sql

Last synced: 27 Nov 2024

https://github.com/svenkreiss/pysparkling

A pure Python implementation of Apache Spark's RDD and DStream interfaces.

apache-spark data-processing data-science python

Last synced: 07 Apr 2025

https://github.com/vopani/datatableton

100 exercises to learn Python Datatable

data-science datatable pydatatable python tutorial-exercises

Last synced: 12 May 2025

https://github.com/durgeshsamariya/data-science-roadmap

Roadmap to learn Data Science and related areas.

data-science data-science-resources learn-data-science roadmap

Last synced: 20 Feb 2025

https://github.com/empower-ai/dsensei

AI-powered key driver analysis tool that pinpoints the root causes behind metric fluctuations in one minute.

analytics business-analytics business-intelligence data data-analytics data-insights data-science

Last synced: 08 Apr 2025

https://github.com/wizardforcel/data-science-notebook

:book: ๆฏไธ€ไธชไผŸๅคง็š„ๆ€ๆƒณๅ’Œ่กŒๅŠจ้ƒฝๆœ‰ไธ€ไธชๅพฎไธ่ถณ้“็š„ๅผ€ๅง‹

data-analysis data-science machine-learning notebook numpy pandas sklearn tensorflow

Last synced: 10 Apr 2025