An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/andrea-ballatore/open-geo-data-education

Open Geospatial Datasets for GIS Education: This is a repository of open geospatial datasets to be used in an educational context. I created these files over years of teaching Geographic Data Science and GIS. All original datasets are freely available online with open data licenses (see the dataset attribution for details). All the datasets in this repository have been selected, cleaned, harmonised, and repackaged for GIS exercises in a higher-education context. This is a pretty time-intensive process that other educators can hopefully avoid by using these versions.

data-science geojson geospatial-data geospatial-datasets gis gis-data gis-education tsv

Last synced: 15 Mar 2025

https://github.com/XpressAI/xircuits

Simple visual programming environment for JupyterLab

data-science jupyterlab python

Last synced: 25 Oct 2025

https://github.com/ncfrey/resources

A Highly Opinionated List of Open Source Materials Informatics Resources

data-science getting-started materials-informatics materials-science resources tutorials

Last synced: 17 Mar 2026

https://github.com/Invictify/Jupter-Notebook-REST-API

Run your Jupyter notebooks as a REST API endpoint. This isn't a Jupyter server, just a way to run your notebooks as a REST API endpoint.

data-science data-science-pipelines docker dockerfile fastapi jupyter python rest-api

Last synced: 15 Mar 2025

https://github.com/woz-u/DS-Student-Resources

Data Science Student Companion Notebooks and Data Lake

data-analysis data-science data-visualization machine-learning nosql python r sql statistics

Last synced: 20 Jul 2025

https://github.com/great-expectations/great_expectations_action

A GitHub Action that makes it easy to use Great Expectations to validate your data pipelines in your CI workflows.

actions continuous-integration data-integrity data-quality data-science mlops

Last synced: 07 Apr 2025

https://github.com/bramvanroy/spacy_conll

Pipeline component for spaCy (and other spaCy-wrapped parsers such as spacy-stanza and spacy-udpipe) that adds CoNLL-U properties to a Doc and its sentences and tokens. Can also be used as a command-line tool.

conll conll-u data-science machine-learning natural-language-processing nlp pandas parser python spacy spacy-extension spacy-pipeline stanford-machine-learning stanford-nlp stanza udpipe

Last synced: 13 Apr 2025

https://github.com/visgl/deck.gl-data

Data for the examples of the deck.gl data visualization library (https://uber.github.io/deck.gl/#/)

data data-science data-visualization uber

Last synced: 12 Jun 2025

https://github.com/manumerous/vpselector

Visual Pandas Selector: Visualize and interactively select time-series data

data-science data-visualization pandas python selector

Last synced: 26 Mar 2025

https://github.com/imdeepmind/neuralpy

NeuralPy: A Keras-like deep learning library that works on top of PyTorch

data-science deep-learning keras library machine-learning neural-network neuralpy neuralpy-torch python pytorch

Last synced: 13 Aug 2025

https://github.com/tirthajyoti/synthetic-data-gen

Various methods for generating synthetic data for data science and ML

classification data data-science machine-learning python regression symbolic-computation time-series

Last synced: 30 Apr 2025

https://github.com/mainakrepositor/datasets

A collection of roughly 200 datasets. You can call it mini-Kaggle :)

csv data data-science database datasets image-files mini-kaggle ml nlp-machine-learning tsv

Last synced: 01 Mar 2025

https://github.com/piquette/qtrn

A CLI tool to streamline financial market data analysis :wrench:

cli data data-science finance go golang options quotes scraper stock stock-analysis stock-market

Last synced: 15 May 2025

https://github.com/thoughtspile/hippotable

👩🏻‍🔬📊 Lightweight data analysis in your browser

csv dashboard data-analysis data-science javascript table visualization

Last synced: 06 Oct 2025

https://github.com/flintml/flintml

ML infrastructure for teams that just want to get sh*t done.

data-science deltalake jupyter machine-learning mlops polars

Last synced: 18 Jan 2026

https://github.com/trainingbypackt/applied-deep-learning-with-python

Applied Deep Learning with Python, published by Packt

data-science deep-learning machine-learning python

Last synced: 10 Apr 2025

https://github.com/ploomber/soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊

data-engineering data-science jupyter jupyter-notebooks machine-learning mlops workflow

Last synced: 10 Apr 2025

https://github.com/siddhujetty/Product-analytics-insights-collection

My Solutions to "A Collection of Data Science Take-Home Challenges" by Giulio Palombo.

data-science machine-learning r-programming solutions take-home-test

Last synced: 29 Jul 2025

https://github.com/5agado/conversation-analyzer

Analyzer and statistics generator for text-based conversations. Includes a Facebook scraper and parser.

data-science facebook quantified-self scraper

Last synced: 16 Apr 2025

https://github.com/cannlytics/cannabis-data-science

🚀 Cannabis Data Science repository powered by 🔥 Cannlytics. 🧑‍🚀 Meetup, code, and advance cannabis science 🧪. Join the fun!

cannabis cannabis-api cannabis-data data-science python statustics

Last synced: 19 Jun 2026

https://github.com/ndleah/8-week-sql-challenge

#8WeekSQLChallenge by Danny Ma.

data-analysis data-science sql

Last synced: 25 Oct 2025

https://github.com/urbslab/streamline

Simple Transparent End-To-End Automated Machine Learning Pipeline for Supervised Learning in Tabular Binary Classification Data

automl-pipeline binary-classification data-science data-visualization feature-selection imputation machine-learning model-application statistical-analysis supervised-learning

Last synced: 12 Jul 2025

https://github.com/wurmlab/oswitch

Provides access to complex Bioinformatics software (even BioLinux!) in just one command.

bioinformatics data-science docker virtualization

Last synced: 10 Apr 2025

https://github.com/dominodatalab/domino-research

Projects developed by Domino's R&D team

data-science mlflow mlops python sagemaker

Last synced: 11 Apr 2025

https://github.com/kianweelee/Edator

A Python package that performs exploratory data analysis for users. It also generates three types of output files: a cleaned CSV, plots, and a text report.

data-analysis data-science exploratory-data-analysis

Last synced: 08 May 2025

https://github.com/jonrau1/SyntheticSun

SyntheticSun is a defense-in-depth security automation and monitoring framework that uses threat intelligence, machine learning, managed AWS security services, and serverless technologies to continuously prevent, detect, and respond to threats.

anomaly-detection automation aws aws-security aws-serverless data-science data-visualization elasticsearch geolocation guardduty incident-response kibana machine-learning misp sagemaker security-automation security-tools serverless threat-detection threat-intelligence

Last synced: 12 Jul 2025

https://github.com/nbarrowman/vtree

An R package for calculating and drawing variable trees

data-science data-visualization exploratory-data-analysis r statistics

Last synced: 11 Oct 2025

https://github.com/scicloj/wolframite

An interface between Clojure and Wolfram Language (the language of Mathematica)

clojure data-science mathematica wolfram-language

Last synced: 05 Apr 2025

https://github.com/Vitruves/nail-parquet

Fast parquet command line tool with many functions, nailed it!

cli command-line-tool data-science database-management parquet parquet-format xlsx

Last synced: 18 Mar 2026

https://github.com/capitalone/dataCompareR

dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.

compare-data data data-analysis data-science r

Last synced: 30 Jul 2025
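
To illustrate the kind of report a dataset comparison produces (rows present in only one dataset, and matching rows whose values differ), here is a minimal Python sketch. It is not dataCompareR's API (dataCompareR is an R package); it assumes both datasets are lists of dicts that share a unique key column, and the default key name `id` is hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def compare_datasets(a: List[Row], b: List[Row], key: str = "id") -> Dict[str, Any]:
    """Summarise similarities and differences between two row-oriented datasets.

    Assumes `key` (a hypothetical column name) uniquely identifies each row.
    """
    index_a = {row[key]: row for row in a}
    index_b = {row[key]: row for row in b}
    only_a = sorted(set(index_a) - set(index_b))  # keys missing from b
    only_b = sorted(set(index_b) - set(index_a))  # keys missing from a
    mismatches = []  # (key value, column, value in a, value in b)
    for k in sorted(set(index_a) & set(index_b)):
        row_a, row_b = index_a[k], index_b[k]
        # Compare every column that appears in either version of the row.
        for col in sorted(set(row_a) | set(row_b)):
            if col != key and row_a.get(col) != row_b.get(col):
                mismatches.append((k, col, row_a.get(col), row_b.get(col)))
    return {"only_in_a": only_a, "only_in_b": only_b, "mismatches": mismatches}
```

Keying rows by an identifier, rather than comparing by position, keeps the report meaningful when the two datasets are sorted differently.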

https://github.com/reymond-group/faerun-python

A Python module for generating interactive views of chemical spaces.

chemical-spaces chemistry data-science data-visualization plotting python

Last synced: 06 Mar 2026

https://github.com/anovos/anovos

Anovos: an open-source library for scalable feature engineering using Apache Spark

bigdata data-science feature-engineering feature-recommendation machine-learning pyspark python scale transformation visualization

Last synced: 08 Apr 2026

https://github.com/aiwithqasim/Free-Artificial-Intelligence-Resources

Welcome to this open-source repository of free artificial intelligence resources. Benefit from the free resources mentioned here, and kindly star and fork this repository so that it reaches as many people as possible and everyone can take advantage of it.

ai article artificial-intelligence artificial-neural-networks blog data-science datascientist deep-learning freeresources hacktoberfest hecktoberfest2021 jobs machine-learning machine-learning-algorithms natural-language-processing nlp project python3 youtube

Last synced: 01 Apr 2025

https://github.com/glemaitre/pyparis-2018-sklearn

PyParis tutorial on machine learning using scikit-learn

data-science machine-learn pandas scikit-learn

Last synced: 09 Oct 2025

https://github.com/xiaodaigh/jlboost.jl

A 100%-Julia implementation of gradient-boosting regression tree algorithms

catboost data-science gbdt gbrt lightgbm machine-learning tree tree-boosting-algorithms xgboost

Last synced: 17 Jan 2026
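
For readers new to gradient-boosted regression trees, here is a minimal Python sketch of the core loop under squared loss: start from the mean, then repeatedly fit a depth-1 tree (a "stump") to the current residuals, which are the negative gradient of 0.5 * (y - F(x))**2, and add a shrunken copy of it to the model. This is a generic textbook version on a single numeric feature, not jlboost's API or its implementation; `n_rounds` and `learning_rate` are assumed illustrative defaults.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Stump:
    """A depth-1 regression tree on one feature, split at `threshold`."""
    threshold: float
    left: float   # prediction for x <= threshold
    right: float  # prediction for x > threshold

    def predict(self, x: float) -> float:
        return self.left if x <= self.threshold else self.right


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Choose the split that minimises the squared error of the residuals."""
    best: Tuple[float, Stump] = (float("inf"), Stump(0.0, 0.0, 0.0))
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]  # never empty: t is in xs
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right) if right else lm
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if sse < best[0]:
            best = (sse, Stump(t, lm, rm))
    return best[1]


def gradient_boost(
    xs: Sequence[float],
    ys: Sequence[float],
    n_rounds: int = 50,
    learning_rate: float = 0.1,
) -> Callable[[float], float]:
    """Fit a boosted ensemble of stumps under squared loss; return a predictor."""
    base = sum(ys) / len(ys)  # F_0: constant initial prediction
    stumps: List[Stump] = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the squared loss.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump.predict(x) for x, p in zip(xs, preds)]

    def predict(x: float) -> float:
        return base + learning_rate * sum(s.predict(x) for s in stumps)

    return predict
```

The learning rate trades speed of fit for robustness: each stump only corrects a fraction of the remaining error, so more rounds are needed but the ensemble is less prone to overfitting individual trees.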

https://github.com/felipenoris/math-server-docker

The ideal multi-user data science server with JupyterHub and RStudio, ready for Python, R, and Julia.

data-science docker julia julia-language jupyter jupyter-kernels jupyterhub jupyterlab latex python rstudio-servers shiny-server

Last synced: 21 Sep 2025

https://github.com/bcgov/bcmaps

An R package of map layers for British Columbia

data-science env r r-package rstats

Last synced: 04 Apr 2025

https://github.com/grailbio/bio

Bioinformatic infrastructure libraries

bioinformatics data-science golang

Last synced: 19 Apr 2025

https://github.com/tomasonjo/graphs-network-science

Accompanying repository for my book about Graph Data Science

algorithms data-science graph graph-algorithms machine-learning

Last synced: 30 Apr 2025

https://github.com/umessen/fhir-pyrate

FHIR-PYrate is a package that provides a high-level API to query FHIR servers for bundles of resources and return the structured information as pandas DataFrames. It can also filter resources using regex and spaCy, and download DICOM studies and series.

data-science fhir fhirpath healthcare pyrate python ship ukessen ume

Last synced: 03 Feb 2026

https://github.com/robertmartin8/udemyml

Templates, code and notes for Kirill Eremenko's Machine Learning course

data-science machine-learning python r tutorial udemy udemy-machine-learning

Last synced: 30 Apr 2025

https://github.com/verynifty/RolodETH

A Rolodex of popular Ethereum chain addresses.

data-science ethereum ethereum-blockchain

Last synced: 12 May 2025

https://github.com/davidrpugh/pybea

Python package for downloading data from the Bureau of Economic Analysis (BEA) data API.

data-science economics python-3

Last synced: 17 Aug 2025

https://github.com/shenxiangzhuang/pythondataanalysis

The data and code used in my book.

data-science python3 webcrawler

Last synced: 08 Aug 2025

https://github.com/uc-r/Advanced-R

Advanced Analytics with R training material, delivered in a two-day format

data-science educational-materials r training-materials workshop-materials

Last synced: 04 May 2025

https://github.com/lesander/netflix-viewing-activity

:tv: Download your Netflix account viewing activity in JSON or CSV.

chrome-extension csv data-science javascript js json netflix netflix-api

Last synced: 15 Mar 2026

https://github.com/yusufcinarci/data-science-projects

This repo contains data science projects ranging from beginner to upper level. I will add the projects I complete in this field to this repo every day, in the hope that they will be useful to those who, like me, are interested in data science and are just starting...

data-analysis data-science data-science-projects jupyter jupyter-notebook python

Last synced: 14 Mar 2026

https://github.com/data-centric-ai/dcbench

A benchmark of data-centric tasks from across the machine learning lifecycle.

data-science machine-learning

Last synced: 27 Mar 2025

https://github.com/pbower/minarrow

Apache Arrow and Polars compatible, Rust-first columnar data library for real-time and systems workloads

arrow data-science dataengineering polars rust

Last synced: 17 May 2026

https://github.com/paris-saclay-cds/ramp-workflow

Toolkit for building predictive workflows on top of pydata (pandas, scikit-learn, pytorch, keras, etc.).

data-challenge data-science python ramp

Last synced: 14 Dec 2025

https://github.com/castlelemongrab/parlance

A minimum-dependency ECMAScript client library and CLI tool for Parler, a "free speech" social network that accepts real money to buy "influence" points to boost organic non-advertising content

data-science datascience datascraping disinformation es7 hatespeech javascript law-enforcement misinformation node nodejs osint parlance parler social-media social-networks speech twitter

Last synced: 15 Apr 2025

https://github.com/mdeff/ntds_2019

Material for the EPFL master course "A Network Tour of Data Science", edition 2019.

data-science education epfl graph-neural-networks graphs network-science

Last synced: 01 Aug 2025

https://github.com/cannlytics/cannlytics

🔥 Cannlytics = cannabis + analytics. Data pipelines, user interfaces, and the best statistics in the game. Made with ❤️

cannabis cannabis-api cannabis-app cannabis-data cannabis-scripts cannabis-strains cannabis-variety cannabisapp data-mining data-science django firebase machine-learning metrc nlp python strain-data terpene-profile terpenes

Last synced: 19 Jun 2026

https://github.com/gesiscss/css_methods_python

A full course of self-explanatory and freely available materials on CSS methods

data-science jupyter-notebook python

Last synced: 15 Mar 2025

https://github.com/argilla-io/biome-text

Custom Natural Language Processing with big and small models 🌲🌱

allennlp data-science natural-language-processing nlp pytorch

Last synced: 07 Oct 2025

https://github.com/devsgnr/breadroll

breadroll 🥟 is a simple, lightweight library for data processing operations, written in TypeScript and powered by Bun.

bun csv csv-parser data-engineering data-science data-transformation eda exploratory-data-analysis tsv tsv-parser

Last synced: 11 Oct 2025

https://github.com/meteostat/weather-stations

A list of public weather stations everyone can edit and share.

climate data-science json meteostat weather weather-stations

Last synced: 20 Jul 2025

https://github.com/mitre/menelaus

Online and batch-based concept and data drift detection algorithms to monitor and maintain ML performance.

concept-drift data-drift data-science drift-detection machine-learning statistics

Last synced: 16 Mar 2026
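
As a rough illustration of what an online drift detector does, here is a minimal Page-Hinkley test in plain Python. Page-Hinkley is a generic textbook technique chosen for illustration; this is not menelaus's API or necessarily one of its algorithms, and the `delta` and `threshold` defaults are arbitrary assumptions that would need tuning per stream.

```python
from typing import Iterable, Optional


class PageHinkley:
    """Minimal Page-Hinkley test for an upward shift in a stream's mean.

    delta: tolerated magnitude of change (assumed default).
    threshold: alarm level, often written as lambda (assumed default).
    """

    def __init__(self, delta: float = 0.005, threshold: float = 5.0) -> None:
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # m_t: cumulative deviation from the running mean
        self.min_cum = 0.0  # M_t: smallest m_t seen so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental running mean
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


def first_drift(stream: Iterable[float], detector: PageHinkley) -> Optional[int]:
    """Return the index of the first observation that triggers an alarm, if any."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: mean 0 for 100 steps, then an abrupt shift to mean 1.
    data = [0.0] * 100 + [1.0] * 100
    print(first_drift(data, PageHinkley()))  # prints 105
```

Because the detector keeps only a running mean and two cumulative sums, it processes each observation in constant time and memory, which is what makes it usable in an online (streaming) setting rather than only on stored batches.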