An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/pottekkat/heart-disease-classifier

Given clinical parameters of a patient, can we predict whether or not they have heart disease?

data-science data-visualization heart-disease-analysis heart-disease-predictor jupyter-notebook machine-learning

Last synced: 25 Oct 2025

https://github.com/tnwei/nbread

Snappy previews of Jupyter notebooks from the command line, with ranger integration

data-science jupyter python ranger

Last synced: 22 Apr 2025

https://github.com/dr-montasir/mnjs

MATH NODE JS (MNJS): A tiny math library for Node.js and JavaScript in the browser

data-analysis data-science javascript js jsdelivr library math nextjs npm react svelte sveltekit ts typescript yarn

Last synced: 26 Apr 2025

https://github.com/andrewhinh/captafied

Multimodal Table Understanding

data-science python

Last synced: 31 Jan 2026

https://github.com/ahammadmejbah/artificial-intelligence-research-and-development-projects

The field of Artificial Intelligence (AI) is a frontier of computer science that focuses on creating systems capable of performing tasks that would typically require human intelligence. This encompasses a wide range of capabilities such as visual perception, speech recognition, decision-making, and language translation.

data-engineering data-science data-visualization database datascience deep-learning deep-learning-algorithms deep-neural-networks deep-reinforcement-learning machine-learning machine-learning-algorithms machine-vision machinelearning

Last synced: 27 Apr 2025

https://github.com/anshumansinha3301/matplotlib_visualizations

Some Graphs using Matplotlib in Python

data-science matplotlib python

Last synced: 07 Oct 2025

https://github.com/chicolucio/ifood-case-data-analyst

Teaching project for the Data Science course I teach at Hashtag

classification-model clustering data-science python segmentation sklearn sklearn-pipeline teaching

Last synced: 07 Oct 2025

https://github.com/teddyoweh/cheat-model

NLP Text Binary Probabilistic Classification Model for predicting cheat statements

data-science machine-learning nlp tokenizer

Last synced: 23 Aug 2025

https://github.com/dsacms/deduplifhir

Prototype for basic deduplication and aggregation of eCQM data

ai cmsoss-tier3 data-science deduplication electron government healthcare poetry python

Last synced: 13 Apr 2025

https://github.com/akbaritabar/dask-duckdb-dbeaver

Parallelised, out-of-memory data analysis using Dask in Python and DuckDB and DBeaver in SQL, using the publicly accessible ORCID 2019 XML files as an example

data-analysis data-science pandas parallel-computing python

Last synced: 08 Aug 2025

https://github.com/splines/deutsche-bahn-analysis

🚆 Analysis of delays of the Deutsche Bahn (DB)

data-science delay deutsche-bahn public-transport railway

Last synced: 15 Apr 2025

https://github.com/syamkakarla98/datascience_head_start

This repository focuses on building a learning path into data science.

data-analysis data-science data-visualization machine-learning machinelearning-python python3

Last synced: 03 May 2025

https://github.com/olekscode/examples-pca-tsne

Some examples of using PCA and t-SNE for dimensionality reduction in Python and R

data-science dimensionality-reduction examples pca t-sne

Last synced: 18 Mar 2025

https://github.com/chaitanyak77/predictive-maintenance-of-gearbox-using-vibration-sensors-data-

This project focuses on the critical task of predictive maintenance in industrial settings, specifically targeting gearbox machinery. By harnessing the power of vibration sensor data, I have developed a predictive maintenance solution that enables early detection of potential faults and failures in gearboxes

data-science internship-task machine-learning

Last synced: 25 Sep 2025

https://github.com/yash22222/ibm-csrbox-internship-project

The objective of the Data Analytics internship at CSRBOX is to provide interns with hands-on experience in applying data analytics techniques to real-world projects in the field of corporate social responsibility (CSR). Interns will gain practical skills in data collection, cleaning, analysis, visualization, and reporting, while working on projects

data-mining data-preprocessing data-science exploratory-data-analysis feature-engineering lemmatization machine-learning pandas pos-tagging random-forest random-forest-classifier scikit-learn sentiment-analysis web-scraping wordcloud

Last synced: 22 Apr 2025

https://github.com/SamEdwardes/pydatafaker

A python package to create fake data with relationships between tables.

data data-science fake-data python

Last synced: 09 Jul 2025

https://github.com/ritvik19/vizard

Intuitive, Interactive, Easy and Quick Visualizations for Data Science Projects

data-analysis data-science data-visualization

Last synced: 10 Apr 2025

https://github.com/open-risk/dataqualitytoolkit

Python toolkit for evaluating and visualizing the data quality of Excel spreadsheets

data-quality data-quality-measurement data-science excel spreadsheet

Last synced: 23 Oct 2025

https://github.com/tushar2704/stats-mosaic

Statistical-Minds is a comprehensive GitHub repository that aims to provide a growing collection of curated content and projects centered around statistics and its intersection with data science, machine learning, and artificial intelligence.

aritificial-intelligence data-analytics data-science machine-learning statistical-learning statistical-methods statistics streamlit

Last synced: 07 Aug 2025

https://github.com/thecoderpinar/big-tech-financial-insights

🚀 A comprehensive project analyzing Big Tech stock prices using time series analysis, volatility modeling, and macroeconomic indicators. Featuring interactive dashboards and automated reporting! 📈💼

data-analysis data-science finance machine-learning macroeconomics stock-analysis time-series-analysis volatility-modeling

Last synced: 03 Apr 2025

https://github.com/oscarsaharoy/functionfit

generate functions by placing points on a graph

data-science regression

Last synced: 29 Oct 2025

https://github.com/vicotrbb/data_science

Repository created to store all my studies about data science, machine learning and artificial intelligence.

data-science machine-learning python roadmap studies

Last synced: 14 Apr 2025

https://github.com/tsdataclinic/TREC

Transit Resilience for Essential Commuting (TREC)

climate-change data-science transit-data

Last synced: 20 Jul 2025

https://github.com/twipped/spiral

A bio-cycles tracker for all humans

biology data-science health mobile react-native transgender womens-health

Last synced: 10 Jul 2025

https://github.com/upsonic/server

Self-Driven Autonomous Python Libraries

data data-science gpt-4o library-management ml mlops python

Last synced: 22 Aug 2025

https://github.com/psyplot/psyplot-gui

Graphical User Interface for the psyplot package

data-science gui interactive ipython psyplot qtconsole sphinx

Last synced: 02 May 2025

https://github.com/srohit0/ml-misc

Miscellaneous Machine Learning and Data Analysis Projects

colaboratory data-analysis data-science data-visualization google-colab machine-learning-algorithms

Last synced: 15 Apr 2025

https://github.com/app-generator/devtool-data-converter

Open-Source Data Converter - CSV, XLS, DF | AppSeed

appseed-sample data-converter data-science

Last synced: 01 Aug 2025

https://github.com/thecoderpinar/hms-brainactivity-analysiss

Welcome to the GitHub repo for "HMS - EEG Exploration & Neurocritical Care Journey"! Explore EEG data, understand wave patterns, and delve into conditions like LPDs, GPDs, LRDA, and GRDA.

critical-care data-analysis data-science data-visualization deep-neural-networks eeg eeg-signals exploratory-data-analysis healthcare medical-research neuroscience signal-processing

Last synced: 30 Apr 2025

https://github.com/the-data-dilemma/parquettohuggingface

ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scripts to generate and upload the data.

audio-dataset audio-processing automatic-speech-recognition data-analysis data-science dataset healthcare-application huggingface huggingface-datasets pandas parquet parquet-generator python3 speech-data speech-recognition speech-to-text speech-translation

Last synced: 21 Aug 2025

https://github.com/tushar2704/common_datasets

Common-datasets is a GitHub repository dedicated to providing a wide collection of common datasets for practicing and learning data science and machine learning.

aritificial-intelligence data-analytics data-engineering data-science data-visualization database dataset-generation datasets machine-learning

Last synced: 09 Aug 2025

https://github.com/shreeparab1890/fifa-wc-2022-qatar-data-analysis-eda

This is a Jupyter Notebook (IPython Notebook) with exploratory data analysis (EDA) of FIFA World Cup Qatar 2022 match data.

data-analysis data-analysis-python data-science data-visualization eda fifa matplotlib-pyplot numpy pandas plotly-express python-3

Last synced: 08 Mar 2026

https://github.com/fxstein/code-server-python

VSCode Code Server for Python Developers and Data Scientists

code-server data-science developer docker home-automation iot python synology vscode

Last synced: 25 Jul 2025

https://github.com/lars-quaedvlieg/swizz

Modular Python package for simple visualization and ML pipelines.

data-science latex machine-learning open-source plotting python research tables utilities

Last synced: 22 Jun 2025

https://github.com/arm-university/smart-school-projects

A collection of accessible and engaging projects for teachers and learners that utilise the more advanced features of Arduino in real-world contexts.

arduino coding computerscience computing data-science education educationprojects pbl physical-computing projects stem

Last synced: 15 Jun 2025

https://github.com/kleinhenz/wiki-network-extractor

Python module for extracting link networks from Wikimedia XML dumps

data-science network-graph python

Last synced: 07 May 2025

https://github.com/bradleyboehmke/r-training-text-mining

Resources for my Text Mining with R course (Mar 8-9, 2018)

data-science education r teaching teaching-materials text-analysis text-mining

Last synced: 13 Apr 2025

https://github.com/alexcj10/analyzing-amazon-sales-data

This repository is dedicated to analyzing Amazon sales data to identify trends and insights that can help improve sales strategies and performance.

amazon beautifulsoup data-analysis data-science data-visualization ecommerce machine-learning matlpotlib numpy pandas python sales skilearn

Last synced: 10 Jul 2025

https://github.com/synthesized-io/synthesized-notebooks

Discover the art of enhancing your data using generative modelling in these notebooks.

data-privacy data-science generative-modelling ml notebooks synthetic-data

Last synced: 14 Jul 2025

https://github.com/fredhutch/tfcb_2022

Course website for MCB 536 Tools for Computational Biology

data-science

Last synced: 16 Aug 2025

https://github.com/kevinknights29/regression--battery-life-prediction

This project uses a regression algorithm to predict the battery life of Li-Ion batteries using the NASA Batteries PCoE dataset

data-science jupyter-notebook

Last synced: 19 Jul 2025

https://github.com/tushar2704/ml-portfolio

This repository showcases a collection of machine learning projects in various domains, demonstrating my skills and expertise as a data scientist and machine learning engineer. Each project provides step-by-step instructions, code, and visualizations to showcase the data analysis and modeling techniques employed.

artificial-intelligence data-science machine-learning portfolio python streamlit-tushar2704 tushar2704

Last synced: 07 May 2025

https://github.com/edaaydinea/estimating-the-probability-of-confirmed-covid-19-cases-taking-into-the-intensive-care-unit-icu-

This repository includes the slides and coding parts for the Estimating the Probability of Confirmed COVID-19 Cases Taking into the Intensive Care Unit (ICU).

covid-19 data-analysis data-science data-visualization machine-learning

Last synced: 11 Apr 2025

https://github.com/akbaritabar/bibliodemography_imprs_phds_2022_idem187

Materials for the day 4 of the course on "Topics in Digital and Computational Demography" on Using large-scale bibliometric data for demographic research; Advantages and pitfalls of using Scopus data to trace internal and international scholarly migration worldwide, Instructor: Aliakbar Akbaritabar

computational-social-science data-science demographic-research migration-research python python3 rstats sql

Last synced: 07 May 2025

https://github.com/cworld1/r-learning

Some notes by CWorld from learning the R language

book data-science learn machine-learning r

Last synced: 09 Jul 2025

https://github.com/apreshill/ohsu-basic-stats

Introduction to Data Wrangling, Analysis, & Communication

data-science education r-stats statistics teaching

Last synced: 05 Mar 2025

https://github.com/cjdoris/chevrons.jl

Your friendly >> chevron >> based syntax for piping data through multiple transformations.

data data-science data-transformation julia julia-lang julia-language macros piping repl

Last synced: 07 Mar 2026

https://github.com/jobar8/subsurface_hackathon_2017

Three notebooks to jump start a data science project

data-science geophysics groundwater ipywidgets

Last synced: 28 Jan 2026

https://github.com/kennethleungty/pymysql-demo

PyMySQL - Connecting Python and SQL for Data Science

data-analysis data-science mysql pandas python sql

Last synced: 12 Jul 2025

https://github.com/pathwiselabs/pixel-pipeline

A Python application with Gradio UI for batch processing and captioning of images, allowing for easy integration with AI image training workflows.

data-cleaning data-science flux generative-ai stable-diffusion stable-diffusion-webui

Last synced: 04 Mar 2026

https://github.com/dse-capstone-sharknado/advancedbpr

Amazon Recommendation System built on a BPR TensorFlow implementation

data-prep data-science exploratory-analysis ipynb machine-learning recommender-system

Last synced: 15 Oct 2025

https://github.com/ivanrs297/endoscopycorruptions

The endoscopycorruptions Python package provides utilities to simulate common image corruptions that might occur during endoscopic procedures. This tool is designed to assist in the development and testing of image processing algorithms intended for endoscopic imagery by introducing realistic corruptions into clean images.

computer-vision data-science machine-learning medical-imaging python

Last synced: 25 Apr 2026

https://github.com/akashkobal/data-science

I'm excited to share my data science project🚀, where I've applied various techniques and insights to solve a specific problem. The project follows best practices for maintainability and reproducibility, using the Data Science Project Template. Dive into the project to explore the code, datasets, documentation, and resources that showcase MyJourney

akash akash-kobal akashkobal applied-data-science artificial-intelligence classification data-science dataanalysis dataanalytics datascienceproject datascientist deep-learning kobal machine-learning prediction regression

Last synced: 17 Mar 2026

https://github.com/rolv-io/rolvapp

Rolv is your AI-powered research assistant for life sciences!

ai biology data-analysis data-science genomics life-sciences medicine

Last synced: 02 Mar 2026

https://github.com/hissain/jscipy

Java Scientific Computing Library for Signal Processing, Filters, and Transformations. A NumPy/SciPy port for JVM & Android, used in Machine Learning and Data Science.

android chebyshev-filter cubic-splines data-science dsp fft findpeaks hilbert-transform interpolation java machine-learning numerical-computing python resample savitzky-golay scientific-computing scipy scipy-signal signal-processing

Last synced: 23 Jan 2026

https://github.com/moindalvs/forecasting_airline_passengers_traffic

Forecast airline passenger numbers. Prepare a document for each model explaining how many dummy variables you created and the RMSE value for each model. Finally, state which model you will use for forecasting.

additive arima-forecasting data-science double-exponential-smoothing forecasting holt-winters holt-winters-forecasting multiplicative sarima-model seasonality-analysis simple-exponential-smoothing stationarity stationarity-test time-series-forecasting timeseries-analysis trend-analysis triple-exponential-smoothing

Last synced: 23 Apr 2025

https://github.com/recodehive/recode-website

recodehive helps you learn and master data skills, and encourages you to contribute to open source.

data data-science dataengineering opensource python sql tutorials website

Last synced: 15 Mar 2026

https://github.com/bitliner/d3-bipartite-graph

Hello-world example of a bipartite graph in D3.js

charts data-science data-visualization graph

Last synced: 11 Jun 2025

https://github.com/fearlesssolutions/engineering-practice-domains

A mono-repo for the Engineering Practice Domains of Development, Data, Infrastructure, Testing, and Platforms

data data-engineering data-science database-design devops drupal end-to-end-testing engineering infrastructure machine-learning salesforce security testing web-development

Last synced: 26 Oct 2025

https://cufctl.github.io/mlbd/

Repository for the machine learning / big data creative inquiry

data-science high-performance-computing machine-learning python tensorflow

Last synced: 16 Mar 2025

https://github.com/urbanclimatefr/coursera-applied-data-science-with-python

This repository contains the materials for "Applied Data Science with Python", a specialization offered by the University of Michigan through Coursera.

coursera data-science machine-learning python3

Last synced: 22 Apr 2025

https://github.com/mrgeislinger/udacitydand_proj_wrangleandanalyzedata

Wrangling and analyzing data project for Udacity's Data Analyst Nanodegree. Wrangles WeRateDogs™ (@dog_rates) Twitter data from local, online, and Twitter API sources.

data-analysis data-analyst data-science datascience jupyter-notebook python3 twitter udacity-data-analyst-nanodegree udacity-nanodegree

Last synced: 09 Oct 2025

https://github.com/mettekou/matrixprofile

The matrix profile data structure and associated algorithms for mining time series data

algorithms anomaly-detection clustering data-mining data-science dotnet fsharp matrixprofile motif-discovery segmentation time-series time-series-analysis

Last synced: 14 Jan 2026

https://github.com/josechirif/2018-house-price-estimation---melbourne-australia

This project estimates the price of a house in Melbourne from its characteristics.

data data-science python

Last synced: 14 Apr 2025

https://github.com/zasper-io/zasper-benchmark

Benchmarking Zasper v/s JupyterLab (Jupyter Server)

ai data-science ipython jupyter jupyter-notebook jupyterlab machine-learning zasper

Last synced: 17 May 2026

https://github.com/omegaml/dashserve

develop and serve Plotly Dash apps in Jupyter Notebook or JupyterLab

data-science plotly plotly-dash scikit-learn

Last synced: 17 Mar 2026

https://github.com/tulip-lab/modern-data-science

Modern Data Science Course

big-data data-science python

Last synced: 21 Feb 2026