An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/tushar2704/everyday-sql

Welcome to Everyday SQL Sheets, your go-to resource for everyday SQL cheat sheets, pro tips, interview questions, and more. Whether you're a beginner looking to learn SQL or an experienced developer seeking quick reference materials, this application has got you covered.

artificial-intelligence cheatsheet data-analysis data-science database mysql postgresql query-language sql sqlalchemy streamlit streamlit-tushar2704 tushar2704

Last synced: 05 Apr 2026

https://github.com/josechirif/job-studies-relationship-a-kaggle-survey

Data science project on a Kaggle employment survey. Link in the notebook and README.

data data-science python

Last synced: 25 Jun 2025

https://github.com/georgiosioannoucoder/vera

Voice Emotion Recognition of Audio (VERA) is an open-source project created for the Data Science track of the CUNY Tech Prep (CTP) program, Cohort 8. 🔊

audio-classification classification cnn-model data-science emotion emotion-recognition librosa machine-learning speech-emotion-recognition voice-emotion

Last synced: 09 Aug 2025

https://github.com/iguptashubham/metro-operations-optimization

Metro Operations Optimization refers to the systematic process of enhancing the efficiency, reliability, and effectiveness of Metro services through various data-driven techniques and operational adjustments.

data-analysis-project data-science data-science-projects

Last synced: 11 Jun 2026

https://github.com/vanessaaleung/ds-interview-prep

A website that helps you practice data science interview questions

data-science firebase react web

Last synced: 18 May 2026

https://github.com/vrathi101/taxfilingfusion

This package will provide the ability for users to access IRS data combined with geographic data in a powerful way.

data-science geographic-data python tax-database

Last synced: 29 Aug 2025

https://github.com/srinivasrm/mutual-funds-analysis-and-prediction

In this project I analyzed and predicted the 1-, 3-, and 5-year returns of 1,064 mutual funds in India. I scraped the data from the most visited website for mutual fund investments. I tested several regression models: linear regression, SGD Regressor, Random Forest Regressor, Decision Tree Regressor, Ridge, MLP Regressor, and Lasso. I then selected the best-performing model, tuned its hyperparameters, and deployed an interactive application that generates a visualization and emails it to the user's address.

beautifulsoup data-analysis data-base data-cleaning data-science deployment etl finanace frontend funds machine-learning mutual mutual-funds pgsql python scikit-learn sql streamlit web webapplication

Last synced: 27 Oct 2025

https://github.com/numbats/cassowaryr

Compute scagnostics on your scatterplots

data-science data-visualization eda high-dimensional-data multivariate

Last synced: 19 Feb 2026

https://github.com/jimbrig/lossrunAnalyzer

R Package and Shiny App to Analyze Insurance Lossruns

actuarial data-analysis data-mining data-science insurance r record-linkage risk-management shiny

Last synced: 30 Jul 2025

https://github.com/monasri001/ai-based-job-recommendation-system

An AI-powered job recommendation system leveraging machine learning and MongoDB to match users with suitable job opportunities based on skills, experience, and locations.

ai automation-recommendation-engine data-science job-recommendation-system machine-learning mongodb naive-bayes python scikit-learn

Last synced: 30 Oct 2025

https://github.com/trflorian/image-processing

Explore multiprocessing for an image processing task

data-science opencv-python python

Last synced: 15 Apr 2025

https://github.com/negativenagesh/spam-ham_email_detection_machine_learning

This project classifies spam/ham emails using machine learning algorithms such as LGR, NB, RF, and DT. Based on the accuracy and precision scores, I chose logistic regression for the classification. I used Streamlit for the frontend.

app data-analysis data-cleaning data-engineering data-science data-visualization data-visualizations jupyter-notebook logistic-regression machine-learning modeling naive-bayes-classifier nlp python

Last synced: 12 Apr 2025

https://github.com/rsangole/blog

My personal blog documenting some work in R, stats and ML

data-science machine-learning personal-blog quarto r time-series visualization

Last synced: 12 May 2025

https://github.com/shamspias/gpt3-data-preprocessing

This repository contains code for preprocessing text data from PDF and DOCX files for use with GPT-3. It includes steps such as tokenization, removal of stop words and punctuation, and formatting for GPT-3 input.

artificial-intelligence data-preprocessing data-preprocessing-pipelines data-science gpt-3 machine-learning

Last synced: 30 Jul 2025

https://github.com/afondiel/cs-courses

This is a list of free computer science courses and resources available on GitHub and the internet.

ai algorithms compter-vision computer-science computer-science-courses cpp data-science free-courses machine-learning python rust software-engineering

Last synced: 22 Jul 2025

https://github.com/cbozan/graduation-project

A graduation project that categorizes popular search phrases using Python and Spark and presents them on a website to inspire creators.

crisp-dm data-cleaning data-science machine-learning nlp nlp-machine-learning spark spark-mllib

Last synced: 14 Jul 2025

https://github.com/chuongmep/jupyterbim

A project that lets you explore the power of data in AEC by connecting interactively with Jupyter Notebook.

aec ai autodesk bim bimdata data-science

Last synced: 22 Aug 2025

https://github.com/yash22222/tata-data-visualisation-virtual-internship

Data Visualisation: Empowering Business with Effective Insights. Gain insights into leveraging data visualisations as a tool for making informed business decisions.

basics ceo charts cmo data-analysis data-interpretation data-science data-visualization graphs machine-learning mcq microsoft-excel microsoft-power-bi microsoft-word powerpoint-presentations python tableau tata tata-data-visualisation

Last synced: 22 Jul 2025

https://github.com/mch-fauzy/data-science

Repository containing a portfolio of data science and machine learning projects, presented as IPython notebooks.

data-analysis data-science data-visualization ipython-notebooks machine-learning natural-language-processing portfolio

Last synced: 24 Sep 2025

https://github.com/mrgeislinger/lesson-git-for-data-science

Lesson on using Git and GitHub for data scientists. Associated lecture recording in the README.

data-science git github lesson

Last synced: 12 May 2025

https://github.com/kingabzpro/data-science-projects-on-datacamp

Guided data science courses on DataCamp, covering data cleaning, pipelines, visualization, predictive analysis, and machine learning with both Python and R.

data-science datacamp learning machine-learning postgresql python r sql visualization

Last synced: 27 Sep 2025

https://github.com/cosmoduende/r-ufo-sightings

Are we alone in the universe? - Data Analysis and Data Visualization of UFO sightings with R. How to analyze and visualize data on UFO sightings from the last century in the USA and the rest of the world with the R language.

data-analysis data-analytics data-science data-visualisation data-visualization data-visualizations dataviz ovni ovni-dataset r-code r-language r-programming r-stats ufo ufo-analysis ufo-dataset ufo-sighting ufo-sightings

Last synced: 13 May 2025

https://github.com/louis-heraut/card

🎴 Card of Analyse and Diagnostic in R for a user-friendly experience of data aggregation with a parametrisation file.

aggregation climate-change climate-data climate-science data-analysis data-science diagnostic environment environment-variables hydrology hydrology-statistical inrae r statistics tools user-friendly

Last synced: 09 Mar 2026

https://github.com/arv-anshul/ineuron-money-laundering

A project from the iNeuron internship portal to build an ML model that predicts money laundering.

data-science ineuron-ai internship machine-learning project python3

Last synced: 13 Sep 2025

https://github.com/sayakpaul/mlplanner

Contains data, notebooks and other files of FloydHub's mini-series on machine learning project structuring, model debugging, various tips and tricks and more

data-science deep-learning floydhub machine-learning

Last synced: 17 Sep 2025

https://github.com/shervinnd/btc_close_price_predict_ml

Predicting Bitcoin closing prices with machine learning: testing linear models and using a linear regression model.

bitcoin cryptocurrency data data-science datamining finance linear-regression linerregression machine-learning machine-learning-algorithms machinelearning ml numpy pandas predictive-modeling python regression sklearn

Last synced: 24 Oct 2025

https://github.com/bartekpog/messenger-analysis

Messenger chat analyzer. Take a look at the in-depth study of your chat history.

algorithms data-science data-visualization exploration messenger python text-analysis

Last synced: 22 Mar 2025

https://github.com/thecoderpinar/earthquake_prediction_analysis_project

🌍 Welcome to the Earthquake Prediction Analysis Project! 🚀 This project aims to predict earthquake magnitudes using LSTM neural networks and analyze seismic data. Explore, analyze, and forecast earthquakes with ease! 📈🔮

analysis data-analysis data-science earthquake-prediction geocoding geology lstm lstm-neural-networks machine-learning matlab matlab-deep-learning open-source time-series visualization

Last synced: 16 Aug 2025

https://github.com/fatihilhan42/spacex_falcon-9_first_stage_landing_prediction

IBM project: SpaceX launch analysis in Python (gather data - data wrangling - sql and visualization data analysis - prediction model - dashboard - final report)

data-science elonmusk ibm machine-learning prediction spacex spacex-api spacex-launches

Last synced: 24 Jul 2025

https://github.com/sblack4/microsoft-professional-program-data-science

Notes 📝 from the Microsoft Professional Program Data Science track offered on edx.org

data-science edx-course machine-learning microsoft mooc notes python

Last synced: 05 Mar 2026

https://github.com/vicotrbb/pylexitext

Pylexitext is a Python library that aggregates a series of NLP methods, text analysis, content converters and other useful stuff.

data-science machine-learning nlp python python3

Last synced: 14 Apr 2025

https://github.com/baptvit/artificial_intelligence

My courses and activities in Artificial Intelligence

data-science deep-learning excel machine-learning python r

Last synced: 22 Jul 2025

https://github.com/immunogenomics/amp_phase1_ra_viewer

🌻 View single-cell RNA-seq and mass cytometry data in synovial tissues from patients with RA or OA.

cytof-data data-science data-visualization r rna-seq-data shiny-apps single-cell-rna-seq

Last synced: 27 Jun 2025

https://github.com/gbennnn/learn-data-science

Data Science for Beginners | Repository for the Introduction to Data Science course

data-science jupyter-notebook python

Last synced: 26 Jun 2025

https://github.com/radanalyticsio/base-notebook

An image for running Jupyter notebooks and Apache Spark in the cloud on OpenShift

apache-spark data-science jupyter-notebook notebook openshift

Last synced: 06 Apr 2025

https://github.com/ahammadmejbah/data-science-interview-questions

Data Science is an interdisciplinary field that uses scientific methods, algorithms, and data analysis to extract valuable insights and knowledge from large and complex datasets, helping organizations make data-driven decisions and solve problems.

data-mining data-science data-visualization datascience interview machine-learning python

Last synced: 09 Jul 2025

https://github.com/uznetdev/global-internet-users

This repository contains a project that analyzes global internet user data. The project includes scripts and tools to visualize and interpret various aspects of internet usage across the world. It is available under the MIT license.

data-science matplotlib pandas pandas-dataframe pandas-python plotly plotly-dash python python3 seabor seaborn streamlit streamlit-application streamlit-dashboard streamlit-web streamlit-webapp visualization web website

Last synced: 11 Apr 2025

https://github.com/umitkaanusta/fraudringdetection-trustnetworks-trying-new-approach

Detecting fraud rings in a p2p trust network with high AUC score (better than 9 of 10 well-known algorithms)* using basic and intuitive trust metrics

data-science fraud fraud-detection graph graph-theory machine-learning network network-science social-network-analysis

Last synced: 27 Mar 2025

https://github.com/xperimental/ipromnb

Jupyter notebook kernel for running Prometheus queries.

data-science jupyter-notebook jupyter-notebook-kernel prometheus

Last synced: 22 Jun 2025

https://github.com/juliasouz/julia-projects

A collection of projects developed in Julia for learning and practice.

data-science julia julia-language julialang scientific-computing

Last synced: 10 Apr 2025

https://github.com/systemvll/censys-client

A censys.io client that allows you to use multiple API keys

censys-api censys-client censys-search data-science shodan

Last synced: 21 Jun 2025

https://github.com/sergio11/online_payment_fraud

Fraud detection using Deep Neural Networks to predict fraudulent transactions in financial data. 🚨🤖 Complete process from EDA and data preprocessing to model training and evaluation. 📊🔍

classification data-preprocessing data-science deep-neural-networks dnn exploratory-data-analysis financial-fraud fraud-detection fraud-detection-model imbalanced-data keras machine-learning neural-network python smote tensorflow

Last synced: 17 Aug 2025

https://github.com/avinashkranjan/basic-data-analysis-and-visualization-in-python

📊 Some of the most important Python tools in data science for Data Analysis and Data Visualization.

data-analysis data-science matplotlib matplotlib-pyplot numpy pandas plotly seabourne

Last synced: 30 Oct 2025

https://github.com/jgphilpott/polyplot

A data exploration application inspired by Ola Rosling's Trendalyzer software.

d3js data-exploration data-science ola-rosling threejs trendalyzer

Last synced: 11 Jul 2025

https://github.com/beeva-jorgezaldivar/plumberModel

Create APIs for the deployment of R models with minimal code

api caret data-science deployment machine-learning plumber r

Last synced: 30 Jul 2025

https://github.com/practicalli/clojure-data-science

Techniques and tools for data science with Clojure

book clojure data-science repl

Last synced: 01 Jul 2025

https://github.com/djego/ecommerce-peru-scrap-cli

Ecommerce Perú Scrap CLI is an open source project that extracts product data by category and exports it to CSV, JSON, and other structured file formats

cli data data-science python3 scraping

Last synced: 17 Mar 2025

https://github.com/thebutlah/makrl

makrl - modular algorithm kit for reinforcement learning

data-science deep-learning deep-reinforcement-learning halite neural-networks reinforcement-learning

Last synced: 18 Mar 2025

https://github.com/mohammadvhossein/tf-gym

The TF Gym repo shares daily TensorFlow projects on ML/DL, including RL, providing educational resources for beginners and practical examples for experienced users with detailed instructions for applications like image classification and text generation.

ai artificial-intelligence computer-vision data-science deep-learning iris kears machine-learning mnist modeling nlp poetry-generator tensorflow time-series translator

Last synced: 10 Apr 2025

https://github.com/croach/jupyter_report_starter_kit

A starter kit for crafting reports based on Jupyter notebooks

data-science jupyter-notebook python reproducible-research

Last synced: 14 May 2026

https://github.com/nazchanel/fake-news-detection-webapp

A Flask webapp that detects fake news with a given text input using the power of Natural Language Processing. Deployment on Heroku failed due to the program's large memory consumption.

data-science dataset keras keras-tensorflow machine-learning natural-language-processing nlp nlp-machine-learning python scikit-learn tensorflow

Last synced: 06 Mar 2026

https://github.com/omarsar/text_mining_lab_2017

Requirements for Text Mining Summer Course (Lab Session)

ai data-minig data-science deep-nlp machine-learning nlp text-mining word2vec

Last synced: 10 Apr 2025

https://github.com/kozodoi/dptools

Python package with utilities for data processing, aggregation, feature engineering and data versioning

aggregation data-preparation data-preprocessing data-science feature-engineering python

Last synced: 08 May 2025

https://github.com/mindinventory/mrmst-main

Introducing MRMST, your go-to solution for analyzing investment portfolio risks on the fly. Whether it's the break of dawn, midnight, or right after a trade, seize control of your portfolio with unprecedented ease and precision.

artificial-intelligence data-science financial-analysis machine-learning python stock-market streamlit

Last synced: 09 Jul 2025

https://github.com/tushar2704/taipy-cookiecutter

This template provides a solid foundation for your projects, incorporating best practices and a streamlined structure. Whether you're a beginner or an experienced developer, this template will help you kickstart your projects in a Taipy manner.

data-science deep-learning llms machine-learning nlp python taipy taipy-core taipy-gui template-project tushar2704

Last synced: 07 May 2025

https://github.com/datasets/collective

📦 DataHub Collective's home and digital garden including notes and ideas on maintaining, curating and publishing (open) data.

data-engineering data-management data-science datasets open-data open-datasets opendata

Last synced: 17 Jan 2026

https://github.com/omarsar/friendly_data_science

Material and resources for the "Friendly Data Science" YouTube series.

analytics data-science datamining deep-learning natural-language-processing neural-networks text-mining

Last synced: 07 Sep 2025

https://github.com/creativepurus/PWSKILLS-Assignments

🌟 This repository contains the assignments on 🧠 Artificial Intelligence, 🤖 Machine Learning and 💻 Data Science given by PWSKILLS for the course "DATA SCIENCE MASTERS - IMPACT BATCH 1" 🤓👨‍🎓

artificial-intelligence assignment assignment-solutions assignments creative-coding creative-commons creativepuru data-science ineuron ineuron-ai ineuron-assignments jupyter jupyter-notebook jupyter-notebooks machine-learning physicswallah project pwskills python python3

Last synced: 10 Mar 2025

https://github.com/hneth/i2ds

Introduction to data science (i2ds)

data-science education r r-package

Last synced: 05 May 2025

https://github.com/askbuddie/machine-learning-workshop-2019

Data Science & Machine Learning Workshop files

data-science dataset machine-learning

Last synced: 11 Apr 2025

https://github.com/matthewcarbone/bootcamp

A collection of tutorials and resources for data science and machine learning

data-science education machine-learning

Last synced: 07 May 2025

https://github.com/UniversalDataTool/courseware

Create instructions for labeling datasets using the Universal Data Tool

annotators courseware data-science dataset hacktoberfest label

Last synced: 04 Apr 2025

https://github.com/snth/split-apply-combine

Presentation about the split-apply-combine strategy in Data Science and Python

data-science jupyter-notebook python

Last synced: 11 Apr 2025

https://github.com/bradleyboehmke/uc-bana-7025

Additional resources for the UC BANA 7025 Data Wrangling course

data-science data-visualization data-wrangling r

Last synced: 13 Apr 2025

https://github.com/ynikitenko/lena

Lena is an architectural framework for data analysis

analysis-framework analysis-pipeline data-analysis data-science

Last synced: 30 Apr 2025

https://github.com/nafisalawalidris/bitcoin-price-analysis-before-the-2024-halving

Analysing Bitcoin's price movements pre-2024 halving using Python, data analysis and machine learning to forecast future trends in cryptocurrency markets.

analysis bitcoin bitcoin-price cryptocurrency data-science machinelearning prediction python

Last synced: 14 Jul 2025

https://github.com/walterowisk/dio_labproject-pipeline-etl-python

Project challenge proposed by DIO as part of the Santander Bootcamp 2023 - Data Science with Python

colab-notebook data-science dio-bootcamp etl etl-pipeline google-colab python

Last synced: 12 Apr 2025

https://github.com/vsimkus/vae-conditional-sampling

[TMLR] Research code for the paper "Conditional Sampling of Variational Autoencoders via Iterated Approximate Ancestral Sampling".

conditional-sampling data-science importance-sampling incomplete-data mcmc missing-data vae

Last synced: 30 Jun 2026

https://github.com/anshumansinha3301/bitfusion_dynamics_gene_project

Project based on gene expression and related genetics concepts, made using Python (Matplotlib)

data-science genetics matplotlib

Last synced: 10 Jul 2025