An open API service indexing awesome lists of open source software.

Data Science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/sirius248/introduction-to-data-science-in-python

Introduction to Data Science in Python (Coursera)

data-science python

Last synced: 11 Nov 2025

https://github.com/kalyan4636/python-eering

PYTHON PROJECT WITH SOURCE CODE. The best Python project name is one that is descriptive, memorable, and fun for you to say. Don't be afraid to get creative and use emojis to make your project stand out! 📈

artificial-intelligence artificial-intelligence-algorithms data-science deep-learning django framework machine-learning machine-learning-algorithms numpy opencv opencv-python opensource pandas pil-tinker pillow python python-3 python-library python3

Last synced: 23 Apr 2025

https://github.com/kennethleungty/pymysql-demo

PyMySQL - Connecting Python and SQL for Data Science

data-analysis data-science mysql pandas python sql

Last synced: 12 Jul 2025

https://github.com/adrtod/rchallenge

A simple data science challenge system using R Markdown and Dropbox.

challenge data-science r

Last synced: 21 Feb 2026

https://github.com/gesiscss/ptm

Introduction to Natural Language Processing with a special emphasis on the analysis of Job Advertisements

binder data-science information-retrieval labour-market nlp r text-mining topic-modeling

Last synced: 07 May 2025

https://github.com/badr-moufad/cookiecutter-simple-ds-project

A simple cookiecutter template to structure your Data Science projects.

cookiecutter data-science project-structure python simple-ds-project

Last synced: 23 Apr 2025

https://cufctl.github.io/mlbd/

Repository for the machine learning / big data creative inquiry

data-science high-performance-computing machine-learning python tensorflow

Last synced: 16 Mar 2025

https://github.com/akkefa/ml-notes

Notes for Mathematics for Machine Learning and Data Science.

book computer-science data-science linear-algebra mathematics notes probability statistics topics

Last synced: 04 Feb 2026

https://github.com/zackakil/friendlier-data-labelling

Code resources for generating a Google Form for labelling data.

data-science google google-apps-script google-forms google-sheets machine-learning

Last synced: 04 Oct 2025

https://github.com/tulip-lab/modern-data-science

Modern Data Science Course

big-data data-science python

Last synced: 21 Feb 2026

https://github.com/urbanclimatefr/coursera-applied-data-science-with-python

This repository contains the materials to "Applied Data Science with Python", a specialization provided by University of Michigan through Coursera.

coursera data-science machine-learning python3

Last synced: 22 Apr 2025

https://github.com/csfelix/datascience-exercises

🐍 Just some DataScience exercises, nothing more... 🐍 (🔑 KeyWords: python, data science, data analysis, pandas 🔑)

data-analysis data-science datascience pandas python python3

Last synced: 05 Jul 2025

https://github.com/akbaritabar/bibliodemography_imprs_phds_2022_idem187

Materials for day 4 of the course "Topics in Digital and Computational Demography", on using large-scale bibliometric data for demographic research: advantages and pitfalls of using Scopus data to trace internal and international scholarly migration worldwide. Instructor: Aliakbar Akbaritabar

computational-social-science data-science demographic-research migration-research python python3 rstats sql

Last synced: 07 May 2025

https://github.com/josechirif/2018-house-price-estimation---melbourne-australia

The project proposes to calculate the price of a Melbourne house according to its characteristics.

data data-science python

Last synced: 14 Apr 2025

https://github.com/mituskillologies/ds-diploma-internship-jun24

Programs conducted at MITU Skillologies, Pune office in internship training on Data Science during June-July 2024 for Diploma Engineering Students.

data-analytics data-science data-visualization machine-learning project python python3

Last synced: 09 Apr 2025

https://github.com/mettekou/matrixprofile

The matrix profile data structure and associated algorithms for mining time series data

algorithms anomaly-detection clustering data-mining data-science dotnet fsharp matrixprofile motif-discovery segmentation time-series time-series-analysis

Last synced: 14 Jan 2026

https://github.com/blacksuan19/redash-python

A more complete Redash API Python client

dashboards data-science data-visualization python

Last synced: 24 Apr 2025

https://github.com/noorkhokhar99/plagiarsim-checker

Plagiarism checker using cosine algorithm #Plagiarsimchecker

ai api checker data-science database nlp nlptk plagiarsim python

Last synced: 16 Oct 2025

https://github.com/epiverse-trace/epi-training-kit

An e-learning strategy for training on analysis, modelling and response to outbreaks and epidemics in Latin-America and the Caribbean

data-science e-learning epidemics training

Last synced: 07 Jul 2025

https://github.com/moindalvs/forecasting_airline_passengers_traffic

Forecast airline passenger traffic. Prepare a document for each model explaining how many dummy variables were created and the RMSE value for each model. Finally, state which model you will use for forecasting.

additive arima-forecasting data-science double-exponential-smoothing forecasting holt-winters holt-winters-forecasting multiplicative sarima-model seasonality-analysis simple-exponential-smoothing stationarity stationarity-test time-series-forecasting timeseries-analysis trend-analysis triple-exponential-smoothing

Last synced: 23 Apr 2025

https://github.com/hissain/jscipy

Java Scientific Computing Library for Signal Processing, Filters, and Transformations. A NumPy/SciPy port for JVM & Android, used in Machine Learning and Data Science.

android chebyshev-filter cubic-splines data-science dsp fft findpeaks hilbert-transform interpolation java machine-learning numerical-computing python resample savitzky-golay scientific-computing scipy scipy-signal signal-processing

Last synced: 23 Jan 2026

https://github.com/phazerooman/dcai-ocr-krooki

OCR model trained to extract coordinates from Omani title deeds ("krooki") using a Data-Centric AI (DCAI) approach.

ai data-science ocr python

Last synced: 12 Apr 2025

https://github.com/moindalvs/resume_screening_and_parser

Business objective: the document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention. Sample data set details: resumes and financial documents.

data-science doc2txt doc2vec docx-converter docx-to-pdf docx2txt pdf-document-processor pdf2txt streamlit text text-analysis text-classification text-mining text-processing unstructured-data

Last synced: 23 Apr 2025

https://github.com/sondosaabed/introduction-to-sql

Udacity course covering SQL for data scientists; these are my solutions for the lessons and the project.

aggregations data-science dvd-rental-database joins nanodegree sql subqueries udacity-nanodegree

Last synced: 21 Jan 2026

https://github.com/the-data-dilemma/parquettohuggingface

ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scripts to generate and upload the data.

audio-dataset audio-processing automatic-speech-recognition data-analysis data-science dataset healthcare-application huggingface huggingface-datasets pandas parquet parquet-generator python3 speech-data speech-recognition speech-to-text speech-translation

Last synced: 21 Aug 2025

https://github.com/mrgeislinger/udacitydand_proj_wrangleandanalyzedata

Wrangling and analyzing data project for Udacity's Data Analyst Nanodegree. Wrangles WeRateDogs™ (@dog_rates) Twitter data from local, online, and Twitter API sources.

data-analysis data-analyst data-science datascience jupyter-notebook python3 twitter udacity-data-analyst-nanodegree udacity-nanodegree

Last synced: 09 Oct 2025

https://github.com/zehracakir/verimadenciliginotlarim

My notes and my own studies in the Data Mining course in the computer engineering department of Süleyman Demirel University

classifying clustering data data-mining data-science linear-regression machine-learning pandas python

Last synced: 18 Jun 2025

https://github.com/edaaydinea/python-ml-dl-ds-projects

This repository includes artificial intelligence, machine learning, data science, and computer vision projects written in Python.

computer-vision data-science deep-learning machine-learning projects python

Last synced: 02 Jul 2025

https://github.com/ashwinpn/advanced-python

Python for Machine Learning/AI/DS, Game Theory and Convex Optimization using Python, Managing Docker in Python, Web Scraping / Development in Python using Django and Flask, Functional Programming in Python.

convex-optimization data-science docker flask functional-programming game-theory machine-learning machine-learning-algorithms python web-development web-scraping

Last synced: 13 Apr 2025

https://codeformunich.github.io/radlquartier/

Command-line tool to prepare and extract bike sharing data. Plus example implementations of visualizations and an example website.

data-science data-visualization munich open-data visualization

Last synced: 02 May 2025

https://github.com/tushar2704/ml-portfolio

This repository showcases a collection of machine learning projects in various domains, demonstrating my skills and expertise as a data scientist and machine learning engineer. Each project provides step-by-step instructions, code, and visualizations to showcase the data analysis and modeling techniques employed.

artificial-intelligence data-science machine-learning portfolio python streamlit-tushar2704 tushar2704

Last synced: 07 May 2025

https://github.com/alexcj10/analyzing-amazon-sales-data

This repository is dedicated to analyzing Amazon sales data to identify trends and insights that can help improve sales strategies and performance.

amazon beautifulsoup data-analysis data-science data-visualization ecommerce machine-learning matlpotlib numpy pandas python sales skilearn

Last synced: 10 Jul 2025

https://github.com/edaaydinea/estimating-the-probability-of-confirmed-covid-19-cases-taking-into-the-intensive-care-unit-icu-

This repository includes the slides and code for the project "Estimating the Probability of Confirmed COVID-19 Cases Taking into the Intensive Care Unit (ICU)".

covid-19 data-analysis data-science data-visualization machine-learning

Last synced: 11 Apr 2025

https://github.com/cworld1/r-learning

Some notes by CWorld from learning the R language

book data-science learn machine-learning r

Last synced: 09 Jul 2025

https://github.com/bradleyboehmke/r-training-text-mining

Resources for my Text Mining with R course (Mar 8-9, 2018)

data-science education r teaching teaching-materials text-analysis text-mining

Last synced: 13 Apr 2025

https://github.com/ruban2205/machine_learning_fundamentals

This repository contains a collection of fundamental topics and techniques in machine learning. It aims to provide a comprehensive understanding of various aspects of machine learning through simplified notebooks. Each topic is covered in a separate notebook, allowing for easy exploration and learning.

adaboost-classifier agglomerative-clustering apriori-algorithm data-preprocessing data-science ensemble-learning fuzzy-cmeans-clustering machine-learning machine-learning-algorithms machine-learning-models multilayer-perceptron python random-forest-classifier self-organizing-map single-layer-perceptron

Last synced: 26 Oct 2025

https://github.com/manikantasanjay/crop_yield_prediction_regression

Crop Yield Prediction using various ML approaches - Random-Forest Regressor, Gradient-Boosting Regressor, Decision-Tree Regressor, Support-Vector Regressor

crop-yield-prediction data-science decision-trees gradient-boosting random-forest regression-analysis support-vector-machines

Last synced: 11 Apr 2025

https://github.com/paypal/gators

Gators is a package to handle model building with big data and fast real-time pre-processing, even for a large number of QPS, using only Python.

big-data data-science machine-learning python

Last synced: 02 May 2025

https://github.com/kleinhenz/wiki-network-extractor

Python module for extracting link networks from Wikimedia XML dumps

data-science network-graph python

Last synced: 07 May 2025

https://github.com/philipperemy/github-full-data-set

Generating GitHub data (~1M repositories May 2017).

data-science dataset github github-api kaggle machine-learning

Last synced: 07 May 2025

https://github.com/hassanalgoz/python

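The Pythonic Book (كتاب البايثونية): a practical introduction to learning programming in Python. A book aimed at programming beginners from technical or non-technical backgrounds. It presents the fundamental concepts in clear language and in a logical sequence, with real-world problems and useful applications, avoiding haphazardness and superficiality. Suitable for self-study as well as for teaching.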

ai arabic curriculum data-science learn-to-code learning-by-doing programming project-based-learning python

Last synced: 03 May 2025

https://github.com/prem07a/credit-score-classification

This is an ML project based on credit score classification.

data-science fastapi feature-extraction machine-learning python3 sklearn-classify website

Last synced: 13 Apr 2025

https://github.com/smac-group/ds

:notebook: This book is currently under development and has been designed as a support for students who are following (or are interested in) courses that provide the basic knowledge to master "statistical programming" with R. Compiled textbook:

data-science github programming r rstudio statistics

Last synced: 22 Jul 2025

https://github.com/sabyasachi-seal/stockmarketprediction

Stock Market Prediction using Numerical and Textual Analysis

aiml analysis data-science data-visualization machine-learning notebook prediction python

Last synced: 08 May 2025

https://github.com/hritik5102/fundamentals_of_ds_ml_dl

The repository encompasses the core concepts of Python, Statistics, Machine Learning, Deep Learning, Computer Vision, and Natural Language Processing.

computer-vision data-science deep-learning machine-learning neural-network statistics

Last synced: 13 May 2025

https://github.com/jobar8/subsurface_hackathon_2017

Three notebooks to jump start a data science project

data-science geophysics groundwater ipywidgets

Last synced: 28 Jan 2026

https://github.com/synthesized-io/synthesized-notebooks

Discover the art of enhancing your data using generative modelling in these notebooks.

data-privacy data-science generative-modelling ml notebooks synthetic-data

Last synced: 14 Jul 2025

https://github.com/egenn/pdsr

Code for the online PDSR book

data-science learning r rstats

Last synced: 17 Jun 2025

https://github.com/sachinl0har/data-analytics

Data Analytics in Python. Numpy, Pandas, Matplotlib, Seaborn. Still Learning...

data-analytics data-science data-visualization matplotlib numpy pandas python seaborn

Last synced: 07 Jul 2025

https://github.com/thecoderpinar/hms-brainactivity-analysiss

Welcome to the GitHub repo for "HMS - EEG Exploration & Neurocritical Care Journey"! Explore EEG data, understand wave patterns, and delve into conditions like LPDs, GPDs, LRDA, and GRDA.

critical-care data-analysis data-science data-visualization deep-neural-networks eeg eeg-signals exploratory-data-analysis healthcare medical-research neuroscience signal-processing

Last synced: 30 Apr 2025

https://github.com/linwin-cloud/linwin-db-server

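In the vast ocean of modern big data, computers are deeply bound to information and data, and database software is what carries these hundreds of millions of records. Linwin Data Server is a high-performance, Chinese-developed database written in Java. It supports Chinese domestic operating systems and Linux, and supports multiple users. It uses a NoSQL structure with a self-developed database language, mys, intended to be simpler, more convenient, and more efficient. All create, read, update, and delete operations on user data happen in memory, while reads and writes to disk are handled by dedicated threads so that they do not get in the way.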

data data-science database hashmap http java javascript key-value linux programming-language python server typescript webserver website

Last synced: 05 Mar 2026

https://github.com/akashkobal/data-science

I'm excited to share my data science project🚀, where I've applied various techniques and insights to solve a specific problem. The project follows best practices for maintainability and reproducibility, using the Data Science Project Template. Dive into the project to explore the code, datasets, documentation, and resources that showcase MyJourney

akash akash-kobal akashkobal applied-data-science artificial-intelligence classification data-science dataanalysis dataanalytics datascienceproject datascientist deep-learning kobal machine-learning prediction regression

Last synced: 17 Mar 2026

https://github.com/pathwiselabs/pixel-pipeline

A Python application with Gradio UI for batch processing and captioning of images, allowing for easy integration with AI image training workflows.

data-cleaning data-science flux generative-ai stable-diffusion stable-diffusion-webui

Last synced: 04 Mar 2026

https://github.com/raphaelsenn/playervectors

Implementation of the paper "Player Vectors: Characterizing Soccer Players Playing Style from Match Event Streams".

data-science

Last synced: 04 Mar 2026

https://github.com/rolv-io/rolvapp

Rolv is your AI-powered research assistant for life sciences!

ai biology data-analysis data-science genomics life-sciences medicine

Last synced: 02 Mar 2026

https://github.com/neverinfamous/postgres-mcp

Secure PostgreSQL Administration & Observability with Code Mode: True V8 Isolate Sandbox Replacing 248 Specialized Tools for up to 90% Token Savings. Includes Tool Filtering, Payload Optimization, HTTP/SSE, OAuth 2.1, Audit & Token Logging, Deterministic Error Handling, Support for 12 Extensions (pgvector, PostGIS, pg_partman, pg_cron & more).

ai-agents citext code-mode data-science database database-management developer-tools hypopg kcache ltree mcp npm oauth2 pg-cron pg-partman pgcrypto pgvector postgis postgresql typescript

Last synced: 09 Apr 2026

https://github.com/ucdavisdatalab/workshop_web_maps

Learn to build an interactive web map to display spatial data

data-science geospatial-visualization teaching-materials ucdavis ucdavis-datalab workshop

Last synced: 05 Mar 2026

https://github.com/tchlux/util

My machine learning, optimization, and data science utilities package.

data-science machine-learning numerical-optimization python-utilities splines statistics visualization

Last synced: 02 May 2026

https://github.com/mine-cetinkaya-rundel/feedback-at-scale

Slides and sample learnr tutorial for rstudio::global(2021) talk

data-science gradethis learnr rstats tutorial

Last synced: 11 Feb 2026

https://github.com/betaandbit/wykresy

Wykresy od kuchni (Charts: a behind-the-scenes look)

data-science wizualizacja wykresy

Last synced: 01 Feb 2026

https://github.com/f0nzie/volve-reservoir-model-evolution

Volve dataset. Reservoir model. Analyze steps and field cumulatives from Eclipse PRT file

data-science petroleum-engineering reservoir-modeling reservoir-simulation rstats

Last synced: 11 Feb 2026