An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/mam-dev/debianized-jupyterhub

:package: ♃ Debian packaging of JupyterHub, a multi-user server for Jupyter notebooks

data-science debian-packages deployment devops dh-virtualenv jupyter-notebook jupyterhub omnibus-packages python-3

Last synced: 28 Oct 2025

https://github.com/slickml/slick-ml

SlickML 🧞: Slick Machine Learning in Python

data-science machine-learning python

Last synced: 17 Mar 2026

https://github.com/dayyass/graph-based-clustering

Graph-Based Clustering using connected components and spanning trees.

clustering data-science graph graph-algorithms hacktoberfest machine-learning python sklearn

Last synced: 03 Mar 2026

https://github.com/noahgift/devml

Product of Pragmatic AI Labs: Machine Learning, Statistics and Utilities around Developer Productivity, Company Productivity and Project Productivity

ai churn-statistics data-science defects git github jupyter-notebook machine-intelligence machine-learning pandas productivity python seaborn visualization

Last synced: 01 Mar 2026

https://github.com/raybellwaves/cfanalytics

Downloading, analyzing and visualizing CrossFit data

crossfit crossfit-games data-frames data-science python

Last synced: 04 Mar 2026

https://github.com/aatmunbaxi/orgroamtools

Helper library for data analysis of org-roam collections

data-science emacs exploratory-data-analysis library org-roam personal-knowledge-management python

Last synced: 09 Apr 2025

https://github.com/amadeusitgroup/cpmml

cPMML is a C++ library for scoring machine learning models serialized with the Predictive Model Markup Language (PMML)

ai data-science machine-learning ml model-deployment model-scoring pmml

Last synced: 25 Apr 2025

https://github.com/nitya/pydata-analysis-workshop

Step-by-step workshop for the "Simplifying Data Analysis" talk

data-science data-visualization python workshop

Last synced: 05 Sep 2025

https://github.com/alessandrocorradini/harvard-data-science-professional

Repository for the Data Science Professional Program from Harvard University on edX

data-analysis data-science datascience edx harvardx machine-learning machinelearning mooc moocs r r-language

Last synced: 13 Jul 2025

https://github.com/rasbt/hbind

Calculates hydrogen-bond interaction tables for protein-small molecule complexes, based on protein PDB and protonated ligand MOL2 structure input. Raschka et al. (2018) J. Computer-Aided Molec. Design

bioinformatics computational-biology data-science hydrogen-bonds protein-ligand-interfaces

Last synced: 05 Oct 2025

https://github.com/akabe/docker-iocaml-datascience

Dockerfile of Jupyter (IPython notebook) and IOCaml (OCaml kernel) with libraries for data science and machine learning

data-science deep-learning docker functional-programming iocaml jupyter-notebook machine-learning ocaml

Last synced: 07 Oct 2025

https://github.com/koheiw/proxyc

R package for large-scale similarity/distance computation

data-science distance-measures r similarity-measures

Last synced: 05 Apr 2025

https://github.com/dylan-profiler/compressio

Lossless in-memory compression of pandas DataFrames and Series powered by the visions type system. Up to 10x less RAM needed for the same data.

compression data-science dtype hacktoberfest pandas python types

Last synced: 15 Apr 2025

https://github.com/datasnakes/orthoevolution

An easy-to-use, comprehensive Python package that aids in the analysis and visualization of orthologous genes. 🐵

bash bioinformatics biology biosql blast data-science ftp genetics ncbi orthologs orthologues orthology orthology-inference pbs phylogenetics python qsub sequence-alignment sge shell

Last synced: 15 Apr 2025

https://github.com/hugoblox/theme-markdown-slides

🎙 Create beautiful presentations in Markdown. Write, share, and present your slides using the open, future-proof Markdown standard

blogdown data-science hugo hugo-learn-theme hugo-theme jupyter latex-math lms markdown markdown-slides mermaid obsidian obsidian-publish r reveal-js rstudio slides slideshow-maker static-site-generator theme

Last synced: 15 Oct 2025

https://github.com/thudm/kdd-industrial-papers

A list of recent industrial papers in KDD'16–'18

data-mining data-science kdd paper-list

Last synced: 03 Mar 2025

https://github.com/mauroandretta/aifootballpredictions

AIFootballPredictions is an ML-based system to predict if a football match will have over 2.5 goals. Using historical data from top European leagues (Serie A, EPL, Bundesliga, La Liga, Ligue 1), it employs advanced feature engineering and model training techniques to provide accurate predictions. Perfect for sports analytics enthusiasts.

data-science football football-analytics football-prediction machine-learning prediction predictive-modeling soccer soccer-analytics

Last synced: 04 Apr 2025

https://github.com/antl3x/codeplot

▱ Codeplot is your infinity canvas for data exploration.

data-science data-visualization dataframe matplotlib pandas python

Last synced: 04 Oct 2025

https://github.com/mainakrepositor/data-analysis

Different types of data analytics projects: EDA, PDA, DDA, TSA, and much more.

data-analysis data-science deeplearning machine-learning-algorithms neural-networks time-series-analysis tsa

Last synced: 06 Mar 2026

https://github.com/brunorosilva/todoist-analytics

A simple app for weekly and monthly review of tasks in Todoist.

analytics dashboard data-science streamlit todoist

Last synced: 29 Jul 2025

https://github.com/urigoren/decorators4ds

Useful decorators every Data Scientist should know

data-science jupyter-notebook pyspark regression-testing s3 s3-bucket slack timer

Last synced: 15 Jul 2025

https://github.com/datapane/examples

Datapane Examples

data-science datapane jupyter python

Last synced: 22 Jul 2025

https://github.com/outerbounds/metaflowbot

Slack bot for monitoring your Metaflow flows!

data-science metaflow ml mlops slack slack-bot

Last synced: 22 Apr 2025

https://github.com/luiscib3r/solar-rad-forecasting

These notebooks capture the entire research and implementation process for building several neural-network-based machine learning models that predict solar radiation levels from historical data recorded by meteorological stations.

convolutional-neural-networks data-science deep-learning forecasting machine-learning rnn rnn-tensorflow

Last synced: 06 Mar 2026

https://github.com/bsomps/opengeoplotter

A PyQt5 app catered to the exploration industry for visualizing geologic drill hole data with features like cross-sections, simple 3D views, strip logs, scatter plots, and downhole line plots. Includes data transformation techniques like factor analysis, desurveying, and alpha-beta conversion.

cross-sections data-science drilling exploration geology geoscience pyqt5 python strip-logs

Last synced: 14 Jan 2026

https://github.com/oldratlee/data-science-practice

Data science practice

anaconda data-science python statistics

Last synced: 10 Apr 2025

https://github.com/serkor1/slmetrics

A high-performance R :package: for supervised and unsupervised machine learning evaluation metrics written in 'C++'.

armadillo armadillo-library artificial-intelligence cpp cran cran-r data-analysis data-science eigen3 machine-learning performance-metrics r r-package r-stats rcpp rcpparmadillo rcppeigen statistics supervised-learning

Last synced: 18 Feb 2026

https://github.com/danlessa/coursera-networkx

Notebooks used in the Network Data Science with NetworkX and Python guided course

course-project coursera data-science network-science networkx

Last synced: 10 Oct 2025

https://github.com/dlopezyse/ai-for-bio

A free and collaborative space for Machine Learning 🤖 applied to Biology 🧬

bioinformatics biology biotech biotechnology course data-science education large-language-models llms machine-learning notebooks python

Last synced: 28 Feb 2025

https://github.com/SOM-Research/DescribeML

DescribeML is a Visual Studio Code language plug-in to describe machine-learning datasets in a structured format. Build better data describing the composition, provenance and social concerns of your dataset.

data-science dataset-generation datasets describeml langium machine-learning model-driven modeling open-data open-datasets visual-studio-code vscode

Last synced: 25 Oct 2025

https://github.com/haleshot/marimo-tutorials

Collection of marimo tutorials which encompass notebook/app examples in varying domains - CS/AI/ML

artificial-intelligence computer-vision data-science image-processing jax llm machine-learning marimo marimo-notebook pytorch recommender-system tensorflow

Last synced: 07 May 2025

https://github.com/kairen/learning-spark

Tidy up Spark and Hadoop tutorials.

bigdata data-science hadoop spark

Last synced: 10 Apr 2025

https://github.com/gsa/fedramp-ollalab-lean

The OllaLab-Lean project is designed to help both novice and experienced developers rapidly set up and begin working on LLM-based projects.

artificial-intelligence data-science jupyterlab llm llm-inference prompt-engineering prompt-templates research-and-development streamlit

Last synced: 11 Jul 2025

https://github.com/Haleshot/marimo-tutorials

Collection of marimo tutorials which encompass notebook/app examples in varying domains - CS/AI/ML

artificial-intelligence computer-vision data-science image-processing jax llm machine-learning marimo marimo-notebook pytorch recommender-system tensorflow

Last synced: 19 Apr 2025

https://github.com/0x0be/scrapeadvisor

A user-friendly python-based GUI which provides sentiment analysis of users' reviews toward a specific TripAdvisor facility

data-mining data-science python3 r scraping sentiment-analysis sentiment-classification text-mining tripadvisor tripadvisor-scraper web-scraping

Last synced: 04 Apr 2025

https://github.com/denadai2/google_street_view_deep_neural

Deep Neural Network model to predict security perception from Google Street View images. Model based on AlexNet CNNs

computational-social-science computer-vision data-science deep-learning urban-planning urban-science

Last synced: 15 Mar 2025

https://github.com/societe-generale/aikit

Automated machine learning package

automl data-science machine-learning python

Last synced: 15 Jul 2025

https://github.com/dawievlill/datascience-871

Data science module for economists written mostly in Julia and R

data-analysis data-science machine-learning

Last synced: 27 Feb 2025

https://github.com/epigen/unsupervised_analysis

A general purpose Snakemake workflow and MrBiomics module to perform unsupervised analyses (dimensionality reduction & cluster analysis) and visualizations of high-dimensional data.

cluster-analysis cluster-validation clustering clustering-algorithm clustree data-science data-visualization densmap dimensionality-reduction heatmap high-dimensional-data leiden-algorithm pca principal-component-analysis snakemake umap unsupervised-learning visualization workflow

Last synced: 15 Apr 2025

https://github.com/theengineeringworld/python-data-science

Python Data Science contains all the data sets and Jupyter notebook files for the YouTube course "Python Data Science Course" at http://youtube.com/theengineeringworld.

data data-analysis data-mining data-science data-visualization jupyter-notebook jupyter-notebooks machine-learning python python27

Last synced: 17 Nov 2025

https://github.com/anitagraser/eda-protocol-movement-data

Step-by-step exploratory movement data analysis protocol in a Jupyter notebook

data-quality-assessment data-science exploratory-data-analysis movement-data

Last synced: 25 Feb 2025

https://github.com/dharasim/mcr

Musical Corpora Register: A list of some music datasets

data-science datasets music

Last synced: 05 Jan 2026

https://github.com/computationalcore/introduction-to-python

A very useful collection of Jupyter Notebooks, which aims to introduce the Python programming language.

data-analysis data-science fundamental google-colab jupyter-notebook jupyter-notebooks numpy pandas python python-language python-programming python3

Last synced: 24 Apr 2025

https://github.com/arthurpaulino/miraiml

MiraiML: asynchronous, autonomous and continuous Machine Learning in Python

data-science hyperparameter-optimization machine-learning python

Last synced: 19 Apr 2025

https://github.com/infoslack/ml-book

Understanding Machine Learning with Scikit-Learn and TensorFlow in practice

data-science machine-learning matplotlib numpy pandas python scikit-learn tensorflow

Last synced: 14 Jul 2025

https://github.com/mkcor/advanced-pandas

Pandas is a powerful tool for data exploration and analysis (including timeseries).

data-analysis data-science labeled-data notebooks python3 teaching-materials

Last synced: 12 Oct 2025

https://github.com/azure/aml-run

GitHub Action that allows you to submit a run to your Azure Machine Learning Workspace.

aml azure azure-machine-learning data-science machine-learning mlops

Last synced: 20 Oct 2025

https://github.com/klaus78/data-science-flashcards

A large collection of challenges on Data Science and Machine Learning.

data-science hacktoberfest jekyll-website machine-learning python

Last synced: 20 Aug 2025

https://github.com/climopy-dev/climopy

🌏🌍🌎 A succinct toolset for analyzing climate data. This project is a work-in-progress.

climate-analysis climate-science data-science python xarray xarray-accessor

Last synced: 20 Jul 2025

https://github.com/rjbergerud/open-source-for-common-good

A list I'm keeping of active open source projects that serve a social or environmental goal.

citizen-science civic-tech community data-science humanity non-profit social social-impact sustainability

Last synced: 07 Mar 2026

https://github.com/Azure/aml-run

GitHub Action that allows you to submit a run to your Azure Machine Learning Workspace.

aml azure azure-machine-learning data-science machine-learning mlops

Last synced: 29 Jul 2025

https://github.com/lter/lterpalettefinder

Extract Color Palettes from Photos and Pick Official LTER Palettes

color-palette-generator data-science r r-package

Last synced: 11 Jul 2025

https://github.com/lourd/react-google-sheet

Pulling data from Google Sheets with React components

api-client data-science google-sheets javascript react spreadsheets

Last synced: 13 Apr 2025

https://github.com/mrankitgupta/python-roadmap

I am sharing Python lessons, from scratch to intermediate level, with practice sets that I studied during my 66DaysofData journey into data analytics.

66daysofdata analytics ankitgupta data-analysis data-analysis-python data-analytics data-mining data-science data-structures data-visualization jupyter matplotlib mrankitgupta numpy pandas programming python python-library python3

Last synced: 14 Jul 2025

https://github.com/thomasnield/bayes_user_input_prediction

Demonstration of using Naive Bayes to predict user inputs with Kotlin 1.2 std-lib

bayes bayes-classifier data-science kotlin

Last synced: 27 Mar 2025

https://github.com/anselmoo/spectrafit

📊📈🔬 SpectraFit is a command-line and Jupyter-notebook tool for quick data-fitting based on the regular expression of distribution functions.

console-application curve-fitting data-analysis data-analysis-python data-science data-visualization fitting juypter-notebook python science science-research scientific-plotting spectral-analysis spectroscopy

Last synced: 25 Nov 2025

https://github.com/mpds-io/mpds-api

Tutorials, notebooks, issue tracker, and website on the MPDS API: the data retrieval interface for the Materials Platform for Data Science

calphad crystal-structure crystallography data-science materials materials-informatics materials-platform materials-science mpds-api mpds-platform phase-diagram phase-diagrams

Last synced: 30 Jan 2026

https://github.com/sidgupta234/codingninjas_datascience_machinelearning

The notebooks are written in a way that they are sufficient on their own to learn the basics of Python, Machine Learning and Data Science.

data-science jupyter-notebook machine-learning python-3

Last synced: 25 Oct 2025

https://github.com/elisim/hydra-sklearn-pipelines

Code accompanying the blogpost: "Creating Configurable Data Pre-Processing Pipelines by Combining Hydra and Sklearn" by Eli Simhayev & Benjamin Bodner

data-science hydra machine-learning scikit-learn

Last synced: 09 Aug 2025

https://github.com/staircase-dev/piso

Pandas Interval Set Operations: providing methods for set operations, analytics, lookups and joins on pandas' Interval, IntervalArray and IntervalIndex

data-analysis data-science data-structures interval interval-arithmetic interval-set pandas set set-operations set-theory

Last synced: 20 Aug 2025

https://github.com/jongheepark/bayesiansocialscience

Data Science Methodology for Social Scientists (code repository)

bayesian change-point data-science network social-science textbook

Last synced: 18 Apr 2025