An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/mindinventory/mrmst-main

Introducing MRMST – Your go-to solution for analyzing investment portfolio risks on the fly. Whether it's the break of dawn, midnight, or right after a trade, seize control of your portfolio with unprecedented ease and precision.

artificial-intelligence data-science financial-analysis machine-learning python stock-market streamlit

Last synced: 09 Jul 2025

https://github.com/systemvll/censys-client

A censys.io client that allows you to use multiple API keys

censys-api censys-client censys-search data-science shodan

Last synced: 21 Jun 2025

https://github.com/snth/split-apply-combine

Presentation about the split-apply-combine strategy in Data Science and Python

data-science jupyter-notebook python

Last synced: 11 Apr 2025

https://github.com/xperimental/ipromnb

Jupyter notebook kernel for running Prometheus queries.

data-science jupyter-notebook jupyter-notebook-kernel prometheus

Last synced: 22 Jun 2025

https://github.com/ahammadmejbah/data-science-interview-questions

Data Science is an interdisciplinary field that uses scientific methods, algorithms, and data analysis to extract valuable insights and knowledge from large and complex datasets, helping organizations make data-driven decisions and solve problems.

data-mining data-science data-visualization datascience interview machine-learning python

Last synced: 09 Jul 2025

https://github.com/anshumansinha3301/bitfusion_dynamics_gene_project

Project based on gene expression and concepts related to genetics, made using Python (Matplotlib)

data-science genetics matplotlib

Last synced: 10 Jul 2025

https://github.com/hneth/i2ds

Introduction to data science (i2ds)

data-science education r r-package

Last synced: 05 May 2025

https://github.com/umitkaanusta/fraudringdetection-trustnetworks-trying-new-approach

Detecting fraud rings in a p2p trust network with high AUC score (better than 9 of 10 well-known algorithms)* using basic and intuitive trust metrics

data-science fraud fraud-detection graph graph-theory machine-learning network network-science social-network-analysis

Last synced: 27 Mar 2025

https://github.com/djego/ecommerce-peru-scrap-cli

Ecommerce Perú Scrap CLI is an open source project that extracts product data by category and exports it to CSV, JSON, and other structured file formats

cli data data-science python3 scraping

Last synced: 17 Mar 2025

https://github.com/cosmoduende/r-lego

StrangeR things: Building 2D and 3D models… with LEGO bricks? in R. How to emulate and visualize LEGO bricks in 2D and 3D with the "brickr" package in R

brickr data-analytics data-science data-visualization dataviz lego lego-sets lego-universe r-language r-library r-package r-programming r-script r-studio rstats rstats-package rstudio

Last synced: 11 Apr 2025

https://github.com/thebutlah/makrl

makrl - modular algorithm kit for reinforcement learning

data-science deep-learning deep-reinforcement-learning halite neural-networks reinforcement-learning

Last synced: 18 Mar 2025

https://github.com/bradleyboehmke/uc-bana-7025

Additional resources for the UC BANA 7025 Data Wrangling course

data-science data-visualization data-wrangling r

Last synced: 13 Apr 2025

https://github.com/nazchanel/fake-news-detection-webapp

A Flask webapp that detects fake news with a given text input using the power of Natural Language Processing. Deployment on Heroku failed due to the program's large memory consumption.

data-science dataset keras keras-tensorflow machine-learning natural-language-processing nlp nlp-machine-learning python scikit-learn tensorflow

Last synced: 06 Mar 2026

https://github.com/tushar2704/taipy-cookiecutter

This template provides a solid foundation for your projects, incorporating best practices and a streamlined structure. Whether you're a beginner or an experienced developer, this template will help you kickstart your projects in a Taipy manner.

data-science deep-learning llms machine-learning nlp python taipy taipy-core taipy-gui template-project tushar2704

Last synced: 07 May 2025

https://github.com/uznetdev/global-internet-users

This repository contains a project that analyzes global internet user data. The project includes scripts and tools to visualize and interpret various aspects of internet usage across the world. You can use this project under the MIT license!

data-science matplotlib pandas pandas-dataframe pandas-python plotly plotly-dash python python3 seabor seaborn streamlit streamlit-application streamlit-dashboard streamlit-web streamlit-webapp visualization web website

Last synced: 11 Apr 2025

https://github.com/datasets/collective

📦 DataHub Collective's home and digital garden including notes and ideas on maintaining, curating and publishing (open) data.

data-engineering data-management data-science datasets open-data open-datasets opendata

Last synced: 17 Jan 2026

https://github.com/vsimkus/vae-conditional-sampling

[TMLR] Research code for the paper "Conditional Sampling of Variational Autoencoders via Iterated Approximate Ancestral Sampling".

conditional-sampling data-science importance-sampling incomplete-data mcmc missing-data vae

Last synced: 30 Jun 2026

https://github.com/UniversalDataTool/courseware

Create instructions for labeling datasets using the Universal Data Tool

annotators courseware data-science dataset hacktoberfest label

Last synced: 04 Apr 2025

https://github.com/amineouerfellii/econometron

A Python package for time series forecasting and economic analysis, providing tools for simulation, estimation, and model evaluation with a focus on scalability and research applications.

data-science deep-learning dsge-models econometrics forecasting localprojections macroeconometrics prediction projection-methods time-series

Last synced: 02 Apr 2026

https://github.com/nafisalawalidris/bitcoin-price-analysis-before-the-2024-halving

Analysing Bitcoin's price movements pre-2024 halving using Python, data analysis and machine learning to forecast future trends in cryptocurrency markets.

analysis bitcoin bitcoin-price cryptocurrency data-science machinelearning prediction python

Last synced: 14 Jul 2025

https://github.com/kalebu/worldmeter-coronavirus-scraper

A Python program that tracks coronavirus statistics from the Worldometer website

beautifulsoup coronavirus data-extraction data-science python-tanzania tanzania webscraping worldmeter-coronavirus-scraper

Last synced: 08 May 2025

https://github.com/shridhar1504/sales-forecasting-datascience-project

Develop a data science project using historical sales data to build a regression model that accurately predicts future sales. Preprocess the dataset, conduct exploratory analysis, select relevant features, and employ regression algorithms for model development. Evaluate model performance, optimize hyperparameters, and provide actionable insights.

data-analytics data-cleaning data-science data-testing data-visualization forecasting-models machin model-evaluation model-fitting prediction predictive-modeling python3 regression-algorithms salesforecast sklearn-library supervised-learning

Last synced: 30 Oct 2025

https://github.com/bhavik-jikadara/peowt

Predicting the Energy Output of Wind Turbine Based on Weather Conditions

data-science deep-learning kaggle lstm machine-learning mysql nlp prediction python sklearn sql windpo

Last synced: 04 Apr 2025

https://github.com/omarsar/text_mining_lab_2017

Requirements for Text Mining Summer Course (Lab Session)

ai data-minig data-science deep-nlp machine-learning nlp text-mining word2vec

Last synced: 10 Apr 2025

https://github.com/equinor/sti

An investigation into the feasibility of using deep neural networks for local planning problems in well trajectory engineering.

data-science deep-neural-networks ml trajectory trajectory-optimization well wellbore

Last synced: 01 May 2025

https://github.com/beeva-jorgezaldivar/plumberModel

Create APIs for the deployment of R models with minimal code

api caret data-science deployment machine-learning plumber r

Last synced: 30 Jul 2025

https://github.com/creativepurus/PWSKILLS-Assignments

🌟 This repository is related to the assignments based upon 🧠 Artificial Intelligence, 🤖 Machine Learning and 💻 Data Science given by PWSKILLS for the course "DATA SCIENCE MASTERS - IMPACT BATCH 1" 🤓👨‍🎓

artificial-intelligence assignment assignment-solutions assignments creative-coding creative-commons creativepuru data-science ineuron ineuron-ai ineuron-assignments jupyter jupyter-notebook jupyter-notebooks machine-learning physicswallah project pwskills python python3

Last synced: 10 Mar 2025

https://github.com/javaidiqbal11/time-series-forecasting-using-arima-sarima

This repo is for time series forecasting using ARIMA and SARIMA models with Python 3.x

arima-model data-science forecasting-models python3 sarima-model time-series

Last synced: 08 Sep 2025

https://github.com/rhenkin/visxhclust

A Shiny app and functions for visual exploration of hierarchical clustering.

clustering data-analysis data-science r r-package r-shiny rstats shiny-apps

Last synced: 02 Apr 2025

https://github.com/srinivasrm/mutual-funds-analysis-and-prediction

In this project I performed analysis and prediction of 1-, 3-, and 5-year returns on 1064 mutual funds in India. I scraped the data from a website that is the most visited website for mutual fund investments. I tested several regression models: linear model, SGD Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge, MLP Regressor, and linear model (Lasso). I then selected the best-performing model, performed hyperparameter tuning, and deployed an interactive application that can generate the visualization and send an email with the visualization to the user's email address.

beautifulsoup data-analysis data-base data-cleaning data-science deployment etl finanace frontend funds machine-learning mutual mutual-funds pgsql python scikit-learn sql streamlit web webapplication

Last synced: 27 Oct 2025

https://github.com/mohammadvhossein/tf-gym

The TF Gym repo shares daily TensorFlow projects on ML/DL, including RL, providing educational resources for beginners and practical examples for experienced users with detailed instructions for applications like image classification and text generation.

ai artificial-intelligence computer-vision data-science deep-learning iris kears machine-learning mnist modeling nlp poetry-generator tensorflow time-series translator

Last synced: 10 Apr 2025

https://github.com/avinashkranjan/basic-data-analysis-and-visualization-in-python

📊 Some of the most important Python tools in data science for data analysis and data visualization.

data-analysis data-science matplotlib matplotlib-pyplot numpy pandas plotly seabourne

Last synced: 30 Oct 2025

https://github.com/tushar2704/everyday-sql

Welcome to Everyday SQL Sheets – your go-to resource for everyday SQL cheat sheets, pro tips, interview questions, and more. Whether you're a beginner looking to learn SQL or an experienced developer seeking quick reference materials, this application has got you covered.

artificial-intelligence cheatsheet data-analysis data-science database mysql postgresql query-language sql sqlalchemy streamlit streamlit-tushar2704 tushar2704

Last synced: 05 Apr 2026

https://github.com/matthewcarbone/bootcamp

A collection of tutorials and resources for data science and machine learning

data-science education machine-learning

Last synced: 07 May 2025

https://github.com/nguyenanht/john-toolbox

This is my own toolbox to explore data science

data-science machine-learning pipeline python pytorch scikit-learn

Last synced: 10 Apr 2025

https://github.com/ewilk0/sklearn_special_ensembles

A library that creates robust, special-purpose ensembles from sklearn-type base models.

artificial-intelligence data-science ensemble-learning machine-learning

Last synced: 10 Apr 2025

https://github.com/sergio11/online_payment_fraud

Fraud detection using Deep Neural Networks to predict fraudulent transactions in financial data. 🚨🤖 Complete process from EDA and data preprocessing to model training and evaluation. 📊🔍

classification data-preprocessing data-science deep-neural-networks dnn exploratory-data-analysis financial-fraud fraud-detection fraud-detection-model imbalanced-data keras machine-learning neural-network python smote tensorflow

Last synced: 17 Aug 2025

https://github.com/shervinnd/btc_close_price_predict_ml

Predicting Bitcoin closing prices with machine learning methods, testing linear models and using a linear regression model.

bitcoin cryptocurrency data data-science datamining finance linear-regression linerregression machine-learning machine-learning-algorithms machinelearning ml numpy pandas predictive-modeling python regression sklearn

Last synced: 24 Oct 2025

https://github.com/josechirif/job-studies-relationship-a-kaggle-survey

Data science project on a Kaggle employment survey. Link in the notebook and README

data data-science python

Last synced: 25 Jun 2025

https://github.com/juliasouz/julia-projects

A collection of projects developed in Julia for learning and practice.

data-science julia julia-language julialang scientific-computing

Last synced: 10 Apr 2025

https://github.com/croach/jupyter_report_starter_kit

A starter kit for crafting reports based on Jupyter notebooks

data-science jupyter-notebook python reproducible-research

Last synced: 14 May 2026

https://github.com/omarsar/friendly_data_science

Material and resources for the "Friendly Data Science" YouTube series.

analytics data-science datamining deep-learning natural-language-processing neural-networks text-mining

Last synced: 07 Sep 2025

https://github.com/ynikitenko/lena

Lena is an architectural framework for data analysis

analysis-framework analysis-pipeline data-analysis data-science

Last synced: 30 Apr 2025

https://github.com/baptvit/artificial_intelligence

My courses and activities in Artificial Intelligence

data-science deep-learning excel machine-learning python r

Last synced: 22 Jul 2025

https://github.com/walterowisk/dio_labproject-pipeline-etl-python

Project challenge proposed by DIO as part of the Santander Bootcamp 2023 - Data Science with Python

colab-notebook data-science dio-bootcamp etl etl-pipeline google-colab python

Last synced: 12 Apr 2025

https://github.com/nasdin/kaggle-competitions-nasdin

Competitive Data Science, ML & AI competitions on Kaggle. A repo for my projects and progress. My road to becoming a Kaggle Grandmaster.

ai data-science kaggle kaggle-competition machine-learning python

Last synced: 12 Apr 2025

https://github.com/inoueakimitsu/wbic_bml

Statistical Causal Inference Library using Bayesian Mixed LiNGAM and WBIC

causal-inference data-science model-selection pymc3 python statistics

Last synced: 07 May 2025

https://github.com/turingtest37/sequencerj.jl

Julia-language port of the Sequencer algorithm, originally developed in python (https://github.com/dalya/Sequencer). The Sequencer finds trends in 1-dimensional data sets and has been used by its original authors for data analysis in astrophysics, seismology, image processing, etc. Contributions are welcome!

analysis data-science julia julia-language julia-library trend

Last synced: 07 May 2025

https://github.com/mert-byrktr/scrape-epl-data

Scrape English Premier League data from fbref and save it as CSV for future work.

beautifulsoup beautifulsoup4 data-science epl football football-data pandas python python3 requests scraping webscraping

Last synced: 07 May 2025

https://github.com/creativepurus/pwskills-projects

🌟 This repository contains the projects on 🤖🧠💻 Artificial Intelligence, Machine Learning and Data Science given by PWSKILLS for the course "DATA SCIENCE MASTERS - IMPACT BATCH 1" 🚀

artificial-intelligence artificial-neural-networks creativepuru data-science datascience-machinelearning ineuron ineuron-ai ineuron-assignments ineuronassignment jupyter jupyter-notebook machine-learning powerbi pwskills python python3 pythonproject pythonprojects readme readme-profile

Last synced: 12 Apr 2025

https://github.com/praveen1664/easy-machine-learning

This is a curated list of easy machine learning frameworks, libraries and software (by language)

c cpp data-science deep-learning machine-learning neural-network

Last synced: 10 May 2026

https://github.com/anil951/diagno-guide

DiagnoGuide : AI for Precise Diagnosis, Personalized Medication, & Nearby Hospitals

data-science disease-prediction hospital-locator machine-learning medical medication medication-reminder ml random-forest svm

Last synced: 12 Apr 2025

https://github.com/pabvald/julia-for-data-science

Exercises from the course Julia for Data Science. Guide of the most important Julia libraries for Data Science.

course data-science julia library-catalogue

Last synced: 11 Oct 2025

https://github.com/fatihilhan42/data-science-projects

This repo contains beginner-to-upper-level projects in the field of data science. I will add the projects I do in this field to this repo every day, in the hope that they will be useful to those who, like me, are interested in data science and are just starting out...

data-analysis data-engineering data-mining data-science data-structures data-visualization database datascience fatihilhan fortytwo fortytwofficial jupyter-notebook python

Last synced: 11 Oct 2025

https://github.com/quantumudit/analyzing-whiskyexchange-whisky

This project focuses on scraping data related to Japanese whisky from The Whisky Exchange website, performing the necessary transformations on the scraped data, and then analyzing and visualizing it using Jupyter Notebook and Power BI.

data-analysis data-science data-transformation data-visualization etl jupyter-notebook power-bi python webscraping

Last synced: 02 May 2026

https://github.com/mnpsnuwan/dj_ds

Django Data Science report application.

data-science data-visualization django html-css-javascript python reports

Last synced: 27 Jan 2026

https://github.com/gkar90/realized-volatility

Realized Volatility for stocks in Python

data-science data-visualization trading volatility

Last synced: 30 Apr 2025

https://github.com/sondosaabed/finding-charity-donors

In this project I apply supervised learning techniques and an analytical mindset to data collected for the U.S. Census to help CharityML (a fictitious charity organization) identify people most likely to donate to their cause.

data-science donors intro-to-machine-learning-udacity machine-learning machine-learning-algorithms nanodegree python supervised-learning supervised-machine-learning udacity-nanodegree

Last synced: 19 Feb 2026

https://github.com/cparmet/pandas-checks

🐼🩺 Pandas Checks: Non-invasive health checks for Pandas method chains

data-analysis data-engineering data-science method-chaining pandas

Last synced: 27 May 2026

https://github.com/elc/jupyter-book-template-cookiecutter

A cookiecutter version of the Book Template repo for Jupyter Book

book cookiecutter cookiecutter-template data-science jupyter jupyter-book template

Last synced: 20 Oct 2025

https://github.com/mch-fauzy/data-science

Repository containing a portfolio of data science and machine learning projects, presented in the form of IPython notebooks

data-analysis data-science data-visualization ipython-notebooks machine-learning natural-language-processing portfolio

Last synced: 24 Sep 2025

https://github.com/ethan-wickstrom/rrrs

Welcome to RRRS, a rapid, hyper-optimized CSV random sampling tool designed with performance and efficiency at its core. Crafted meticulously in Rust, RRRS offers an unparalleled solution for extracting random data samples from CSV files swiftly and effortlessly.

analytics cli command-line command-line-tool data data-analysis data-science dataset rust rust-lang sample samples

Last synced: 16 May 2025

https://github.com/anas436/ai-smart-attendance-system

A Face Recognition Web App for Smart Attendance System Using AI

artificial-intelligence computer-vision data-science face-recognition streamlit

Last synced: 16 Oct 2025

https://github.com/knaaptime/rypyrx

reproducible papers with quarto and pixi

academia data-science reproducible-research

Last synced: 07 May 2025

https://github.com/sondosaabed/preprocessing-for-machine-learning-in-python

DataCamp intermediate course on how and when to perform data preprocessing in any machine learning project to get the data ready for modeling

data-preprocessing data-science data-scientist datacamp-course machine-learning machine-learning-pipeline python

Last synced: 09 Apr 2025

https://github.com/sondosaabed/datacamp-data-scientist-track

26+ hours of learning on the Data Science with Python and SQL career track, in preparation for the Data Scientist with Python Certification

data-science datacamp intermediate-sql joining machine-learning python python-packages sql

Last synced: 09 Apr 2025

https://github.com/radanalyticsio/base-notebook

An image for running Jupyter notebooks and Apache Spark in the cloud on OpenShift

apache-spark data-science jupyter-notebook notebook openshift

Last synced: 06 Apr 2025

https://github.com/louis-heraut/card

🎴 Card of Analyse and Diagnostic in R for a user-friendly experience of data aggregation with parametrisation file.

aggregation climate-change climate-data climate-science data-analysis data-science diagnostic environment environment-variables hydrology hydrology-statistical inrae r statistics tools user-friendly

Last synced: 09 Mar 2026

https://github.com/timkong21/medical-appointment-no-show-prediction

A machine learning solution predicting patient no-shows in healthcare appointments. This project integrates EDA, data processing, feature engineering, and XGBoost modeling, with a workflow spanning from Snowflake data retrieval to AWS deployment (S3, SageMaker, Lambda, API Gateway), aiming to enhance appointment management in medical ERP systems.

api aws aws-lambda aws-s3 data-preprocessing data-science exploratory-data-analysis feature-engineering healthcare hyperopt hyperparameter-tuning hypothesis-testing machine-learning predictive-modeling python sagemaker snowflake sql statistical-analysis xgboost

Last synced: 26 Feb 2026