An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/recodehive/recode-website

recodehive helps you learn and master data skills, and encourages you to contribute to open source.

data data-science dataengineering opensource python sql tutorials website

Last synced: 15 Mar 2026

https://github.com/giswqs/learning-scipy

Learning SciPy for Numerical and Scientific Computing

data-science jupyter-notebook python scipy

Last synced: 12 May 2025

https://github.com/dse-capstone-sharknado/advancedbpr

Amazon Recommendation System built on a BPR TensorFlow implementation

data-prep data-science exploratory-analysis ipynb machine-learning recommender-system

Last synced: 15 Oct 2025

https://github.com/rolv-io/rolvapp

Rolv is your AI-powered research assistant for life sciences!

ai biology data-analysis data-science genomics life-sciences medicine

Last synced: 02 Mar 2026

https://github.com/the-pew-inc/the-pew

ThePew is an advanced system of records that enables enterprises to detect trends and patterns from questions to drive marketing and business decisions toward their goals.

data data-science docker javascript machine-learning postgresql rails ruby

Last synced: 06 Oct 2025

https://github.com/linwin-cloud/linwin-db-server

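In the vast ocean of modern big data, computers are deeply bound to information and data, and database software is what carries those hundreds of millions of records. Linwin Data Server is a high-performance database developed in China and written in Java. It supports Chinese domestic and Linux operating systems as well as multi-user operation. It uses a NoSQL structure and a self-developed database operation language, mys, which is simpler, more convenient, and more efficient. All create, read, update, and delete operations on user data happen in memory, while disk reads and writes are handled by dedicated threads so that they do not get in the way.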

data data-science database hashmap http java javascript key-value linux programming-language python server typescript webserver website

Last synced: 05 Mar 2026

https://github.com/raphaelsenn/playervectors

Implementation of the paper "Player Vectors: Characterizing Soccer Players Playing Style from Match Event Streams".

data-science

Last synced: 04 Mar 2026

https://github.com/akashkobal/data-science

I'm excited to share my data science project 🚀, where I've applied various techniques and insights to solve a specific problem. The project follows best practices for maintainability and reproducibility, using the Data Science Project Template. Dive into the project to explore the code, datasets, documentation, and resources that showcase my journey.

akash akash-kobal akashkobal applied-data-science artificial-intelligence classification data-science dataanalysis dataanalytics datascienceproject datascientist deep-learning kobal machine-learning prediction regression

Last synced: 17 Mar 2026

https://github.com/mrgeislinger/udacitydand_proj_wrangleandanalyzedata

Wrangling and analyzing data project for Udacity's Data Analyst Nanodegree. Wrangles WeRateDogs™ (@dog_rates) Twitter data from local, online, and Twitter API sources.

data-analysis data-analyst data-science datascience jupyter-notebook python3 twitter udacity-data-analyst-nanodegree udacity-nanodegree

Last synced: 09 Oct 2025

https://github.com/fearlesssolutions/engineering-practice-domains

A mono-repo for the Engineering Practice Domains of Development, Data, Infrastructure, Testing, and Platforms

data data-engineering data-science database-design devops drupal end-to-end-testing engineering infrastructure machine-learning salesforce security testing web-development

Last synced: 26 Oct 2025

https://github.com/hassanalgoz/python

The Pythonic Book (كتاب البايثونية): a practical introduction to learning programming in Python. Aimed at programming beginners from technical or non-technical backgrounds, it presents the core concepts in clear language and in a logical sequence, with real-world problems and useful applications, avoiding randomness and superficiality. Suitable for self-study as well as for teaching.

ai arabic curriculum data-science learn-to-code learning-by-doing programming project-based-learning python

Last synced: 03 May 2025

https://github.com/edaaydinea/estimating-the-probability-of-confirmed-covid-19-cases-taking-into-the-intensive-care-unit-icu-

This repository includes the slides and code for the project "Estimating the Probability of Confirmed COVID-19 Cases Taking into the Intensive Care Unit (ICU)".

covid-19 data-analysis data-science data-visualization machine-learning

Last synced: 11 Apr 2025

https://github.com/edaaydinea/python-ml-dl-ds-projects

This repository includes artificial intelligence, machine learning, data science, and computer vision projects written in Python.

computer-vision data-science deep-learning machine-learning projects python

Last synced: 02 Jul 2025

https://github.com/sachinl0har/data-analytics

Data Analytics in Python. Numpy, Pandas, Matplotlib, Seaborn. Still Learning...

data-analytics data-science data-visualization matplotlib numpy pandas python seaborn

Last synced: 07 Jul 2025

https://codeformunich.github.io/radlquartier/

Command-line tool to prepare and extract bike sharing data, plus example implementations of visualizations and an example website.

data-science data-visualization munich open-data visualization

Last synced: 02 May 2025

https://github.com/bradleyboehmke/r-training-text-mining

Resources for my Text Mining with R course (Mar 8-9, 2018)

data-science education r teaching teaching-materials text-analysis text-mining

Last synced: 13 Apr 2025

https://github.com/hritik5102/fundamentals_of_ds_ml_dl

The repository encompasses the core concepts of Python, Statistics, Machine Learning, Deep Learning, Computer Vision, and Natural Language Processing.

computer-vision data-science deep-learning machine-learning neural-network statistics

Last synced: 13 May 2025

https://github.com/cworld1/r-learning

Some notes CWorld took while learning the R language

book data-science learn machine-learning r

Last synced: 09 Jul 2025

https://github.com/tushar2704/ml-portfolio

This repository showcases a collection of machine learning projects in various domains, demonstrating my skills and expertise as a data scientist and machine learning engineer. Each project provides step-by-step instructions, code, and visualizations to showcase the data analysis and modeling techniques employed.

artificial-intelligence data-science machine-learning portfolio python streamlit-tushar2704 tushar2704

Last synced: 07 May 2025

https://github.com/kevinknights29/regression--battery-life-prediction

This project uses a regression algorithm to predict the battery life of Li-Ion batteries using the NASA Batteries PCoE dataset

data-science jupyter-notebook

Last synced: 19 Jul 2025

https://github.com/egenn/pdsr

Code for the online PDSR book

data-science learning r rstats

Last synced: 17 Jun 2025

https://github.com/zehracakir/verimadenciliginotlarim

My notes and my own studies in the Data Mining course in the computer engineering department of Süleyman Demirel University

classifying clustering data data-mining data-science linear-regression machine-learning pandas python

Last synced: 18 Jun 2025

https://github.com/sabyasachi-seal/stockmarketprediction

Stock Market Prediction using Numerical and Textual Analysis

aiml analysis data-science data-visualization machine-learning notebook prediction python

Last synced: 08 May 2025

https://github.com/thecoderpinar/hms-brainactivity-analysiss

Welcome to the GitHub repo for "HMS - EEG Exploration & Neurocritical Care Journey"! Explore EEG data, understand wave patterns, and delve into conditions like LPDs, GPDs, LRDA, and GRDA.

critical-care data-analysis data-science data-visualization deep-neural-networks eeg eeg-signals exploratory-data-analysis healthcare medical-research neuroscience signal-processing

Last synced: 30 Apr 2025

https://github.com/paypal/gators

Gators is a package to handle model building with big data and fast real-time pre-processing, even for a large number of QPS, using only Python.

big-data data-science machine-learning python

Last synced: 02 May 2025

https://github.com/alexcj10/analyzing-amazon-sales-data

This repository is dedicated to analyzing Amazon sales data to identify trends and insights that can help improve sales strategies and performance.

amazon beautifulsoup data-analysis data-science data-visualization ecommerce machine-learning matlpotlib numpy pandas python sales skilearn

Last synced: 10 Jul 2025

https://github.com/broadinstitute/pooled-cell-painting-profiling-recipe

👩‍🍳 Recipe repository for image-based profiling of Pooled Cell Painting experiments

carpenter-lab cell-painting data-science in-situ-sequencing pooled-cell-painting pooled-screen recipe

Last synced: 01 Mar 2026

https://github.com/prem07a/credit-score-classification

An ML project for credit score classification

data-science fastapi feature-extraction machine-learning python3 sklearn-classify website

Last synced: 13 Apr 2025

https://github.com/ashwinpn/advanced-python

Python for Machine Learning/AI/DS, Game Theory and Convex Optimization using Python, Managing Docker in Python, Web Scraping / Development in Python using Django and Flask, Functional Programming in Python.

convex-optimization data-science docker flask functional-programming game-theory machine-learning machine-learning-algorithms python web-development web-scraping

Last synced: 13 Apr 2025

https://github.com/nok/weka-porter

Transpile trained decision trees from Weka to C, Java or JavaScript.

data-science machine-learning weka

Last synced: 09 May 2025

https://github.com/ruban2205/machine_learning_fundamentals

This repository contains a collection of fundamental topics and techniques in machine learning. It aims to provide a comprehensive understanding of various aspects of machine learning through simplified notebooks. Each topic is covered in a separate notebook, allowing for easy exploration and learning.

adaboost-classifier agglomerative-clustering apriori-algorithm data-preprocessing data-science ensemble-learning fuzzy-cmeans-clustering machine-learning machine-learning-algorithms machine-learning-models multilayer-perceptron python random-forest-classifier self-organizing-map single-layer-perceptron

Last synced: 26 Oct 2025

https://github.com/lars-quaedvlieg/swizz

Modular Python package for simple visualization and ML pipelines.

data-science latex machine-learning open-source plotting python research tables utilities

Last synced: 22 Jun 2025

https://github.com/spidy20/kaggle_kernels

Contains data science and machine learning code, data visualization code, and datasets

clustering data-science data-visualization eda kaggle-competition kaggle-dataset kaggle-scripts kmeans-clustering

Last synced: 12 Apr 2025

https://github.com/kleinhenz/wiki-network-extractor

Python module for extracting link networks from Wikimedia XML dumps

data-science network-graph python

Last synced: 07 May 2025

https://github.com/manikantasanjay/crop_yield_prediction_regression

Crop Yield Prediction using various ML approaches - Random-Forest Regressor, Gradient-Boosting Regressor, Decision-Tree Regressor, Support-Vector Regressor

crop-yield-prediction data-science decision-trees gradient-boosting random-forest regression-analysis support-vector-machines

Last synced: 11 Apr 2025

https://github.com/ishijo/Taylor-Swift-Lyrics

Database (.txt and .csv) of all Taylor Swift song lyrics up to April 2023

data-science dataset datasets nlp-machine-learning taylor-swift text-mining

Last synced: 27 Jul 2025

https://github.com/epiverse-trace/epi-training-kit

An e-learning strategy for training on analysis, modelling, and response to outbreaks and epidemics in Latin America and the Caribbean

data-science e-learning epidemics training

Last synced: 07 Jul 2025

https://github.com/moindalvs/forecasting_airline_passengers_traffic

Forecast airline passenger traffic. Prepare a document for each model explaining how many dummy variables were created and the RMSE value for each model. Finally, state which model you will use for forecasting.

additive arima-forecasting data-science double-exponential-smoothing forecasting holt-winters holt-winters-forecasting multiplicative sarima-model seasonality-analysis simple-exponential-smoothing stationarity stationarity-test time-series-forecasting timeseries-analysis trend-analysis triple-exponential-smoothing

Last synced: 23 Apr 2025

https://github.com/kennethleungty/pymysql-demo

PyMySQL - Connecting Python and SQL for Data Science

data-analysis data-science mysql pandas python sql

Last synced: 12 Jul 2025

https://github.com/fxstein/code-server-python

VSCode Code Server for Python Developers and Data Scientists

code-server data-science developer docker home-automation iot python synology vscode

Last synced: 25 Jul 2025

https://github.com/kingabzpro/annual-recycled-energy-saved-in-singapore

Learn how much energy Singapore saves per year by recycling plastics, paper, glass, and ferrous and non-ferrous metals

cleaning-data data-analysis data-science deepnote energy environment

Last synced: 19 Jun 2025

https://github.com/moindalvs/resume_screening_and_parser

Business objective: the document classification solution should significantly reduce manual human effort in HRM. It should achieve a higher level of accuracy and automation with minimal human intervention. Sample data set details: resumes and financial documents.

data-science doc2txt doc2vec docx-converter docx-to-pdf docx2txt pdf-document-processor pdf2txt streamlit text text-analysis text-classification text-mining text-processing unstructured-data

Last synced: 23 Apr 2025

https://github.com/ksdkamesh99/medium-blogs

A collection of all my Medium and Analytics Vidhya articles on computer science technologies such as AI, ML, deep learning, and more

analytics-vidya-articles computer-science data-science deep-learning machine-learning medium medium-blogs python

Last synced: 12 May 2025

https://github.com/the-data-dilemma/parquettohuggingface

ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scripts to generate and upload the data.

audio-dataset audio-processing automatic-speech-recognition data-analysis data-science dataset healthcare-application huggingface huggingface-datasets pandas parquet parquet-generator python3 speech-data speech-recognition speech-to-text speech-translation

Last synced: 21 Aug 2025

https://github.com/eurobios-mews-labs/acrocord

This package provides useful tools for interacting with a PostgreSQL server using pandas DataFrames

data data-science database pandas-dataframe postgresql psycopg2 python python3 sqlalchemy table-factory

Last synced: 15 Apr 2025

https://github.com/kalyan4636/python-eering

Python project with source code. The best Python project name is one that is descriptive, memorable, and fun for you to say. Don't be afraid to get creative and use emojis to make your project stand out! 📈

artificial-intelligence artificial-intelligence-algorithms data-science deep-learning django framework machine-learning machine-learning-algorithms numpy opencv opencv-python opensource pandas pil-tinker pillow python python-3 python-library python3

Last synced: 23 Apr 2025

https://github.com/arm-university/smart-school-projects

A collection of accessible and engaging projects for teachers and learners that utilise the more advanced features of Arduino in real-world contexts.

arduino coding computerscience computing data-science education educationprojects pbl physical-computing projects stem

Last synced: 15 Jun 2025

https://github.com/carpentries-incubator/open-science-with-r

Carpentry-style lesson on how to use R and RStudio together with Git and GitHub to promote Open Science practices.

alpha carpentries data-science dplyr ggplot2 git github lesson open-science r rstudio scripting tidyr

Last synced: 02 Sep 2025

https://github.com/chrislemke/autoembedder

PyTorch autoencoder with additional embeddings layer for categorical data 🚘

anomaly-detection autoencoder data-science embedding machine-learning neural-network python pytorch pytorch-ignite

Last synced: 15 Apr 2025

https://github.com/akkefa/ml-notes

Notes for Mathematics for Machine learning and Data Science.

book computer-science data-science linear-algebra mathematics notes probability statistics topics

Last synced: 04 Feb 2026

https://github.com/ahammadnafiz/predicta

Predicta: Simplify your workflow with our powerful data analysis and machine learning tool.

analytics data-science data-visualization dataanalysis machine-learning pandas project python streamlit streamlit-webapp webapp

Last synced: 28 Jul 2025

https://github.com/smac-group/ds

📓 This book is currently under development and has been designed as a support for students who are following (or are interested in) courses that provide the basic knowledge to master "statistical programming" with R. Compiled textbook:

data-science github programming r rstudio statistics

Last synced: 22 Jul 2025

https://github.com/fredhutch/tfcb_2022

Course website for MCB 536 Tools for Computational Biology

data-science

Last synced: 16 Aug 2025

https://github.com/nemeslaszlo/sentiment-analysis-and-stock-values

Sentiment analysis of economic news headlines, examining their effect on stock market changes without the full article or analysis. Raising awareness and generating clicks are also important roles of business news headlines. The effect can be demonstrated.

bert data-science data-visualization nltk recurrent-neural-network tensorflow textblob vader-sentiment-analysis

Last synced: 25 Jul 2025

https://github.com/x-tabdeveloping/rvfln

A Python implementation of random vector functional link networks and broad learning systems using scikit-learn's regressor and classifier APIs

broad-learning data-science deep-learning machine-learning scikit-learn sklearn sklearn-compatible

Last synced: 22 Mar 2025

https://github.com/badr-moufad/cookiecutter-simple-ds-project

A simple cookiecutter template to structure your Data Science projects.

cookiecutter data-science project-structure python simple-ds-project

Last synced: 23 Apr 2025

https://github.com/rafaelpermec/live-broker-api

A study of back-end web scraping, simulating a brokerage that handles the buying and selling of assets and client cash flow in real time.

authentication authorization backend-api cheerio data-science express helmet jwt-authentication mysql nodejs typescript web-scraping

Last synced: 19 Apr 2025

https://github.com/timkong21/polyp-segmentation

Polyp segmentation tool utilizing U-Net for accurate medical image analysis, designed to enhance early detection and diagnosis of colorectal cancer. Features a user-friendly Streamlit web app for easy image processing and analysis, leveraging the Kvasir-SEG dataset for improved healthcare outcomes.

aws-s3 cancer-detection colonoscopy computer-vision data-augmentation data-science deep-learning diagnostics healthcare machine-learning medical-application medical-image-analysis medical-image-processing medical-image-segmentation opencv polyp-segmentation python streamlit tensorflow u-net

Last synced: 14 Apr 2025

https://github.com/dataship/python-dataship

Lightweight Python tools for reading, writing, and storing data, locally and over the internet

column-store data-science machine-learning numpy pandas

Last synced: 23 Apr 2025

https://github.com/westlake-ai/dmt-learn

An Explainable Deep Network for Dimension Reduction

data-science dimension-reduction python

Last synced: 07 Aug 2025