An open API service indexing awesome lists of open source software.

Data Science

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge from structured and unstructured data. Data scientists perform data analysis and preparation, and their findings inform high-level decisions in many organizations.

https://github.com/just-krivi/real-estate-market-analysis

Streamlit web app using custom ML models (multiple linear regression and one-to-many multiclass kernel SVM) to predict real estate prices; also scrapes and analyzes real estate listings in Serbia

data-science docker gradient-descent machine-learning multiclass-support-vector-machine multiple-linear-regression postgresql python scrapy stramlit svm webscraping

Last synced: 04 Oct 2025

https://github.com/thetallprogrammer/stock-contender-app

Welcome to Stock Contender, an AI-powered tool designed to assist your market analysis. This tool is not an investment advisor and does not guarantee profits. Invest at your own risk. Stay updated with my latest developments.

artificial-intelligence chat-gpt data-science financial-data-analysis financial-technology fintech investment-analysis machine-learning openai openai-api python stock-market stock-prediction stock-trading

Last synced: 05 Sep 2025

https://github.com/nsembleai/nsvision

nsvision is an image data pre- and post-processing and data augmentation library. It provides utilities for working with image data.

data-science docker image image-classification image-manipulation image-processing jupyter library normalization numpy object-detection opencv opencv-python pillow python python-3 python-library python3 reduce-image-dimensions split-data

Last synced: 22 Feb 2026

https://github.com/tiagoantao/virtual-core

A data science core based on Docker containers

data-science docker

Last synced: 14 Mar 2026

https://github.com/valohai/minihai

An open-source application for running notebooks server-side

data-science jupyter jupyter-notebook machine-learning notebook

Last synced: 02 Aug 2025

https://github.com/yoshoku/numo-openblas

Numo::OpenBLAS builds and uses OpenBLAS as a background library for Numo::Linalg

data-science machine-learning numo openblas ruby

Last synced: 25 Apr 2025

https://github.com/tsg405/sql-for-data-science----coursera

This repo contains starter files, coursework, and programming assignments for the course SQL for Data Science from the University of California, Davis (Coursera)

california chinook-database coursera data-science query-language quiz sql sqlite ucdavis-datalab yelp-dataset

Last synced: 14 Apr 2025

https://github.com/stappit/blog

I often post solutions to textbook exercises, including: Bayesian Data Analysis (BDA) by Gelman et al; Causal Inference in Statistics Primer (CISP) by Pearl et al; Purely Functional Data Structures (PFDS) by Okasaki.

bayesian-data-analysis blog data-analysis data-science gelman hakyll haskell pearl purely-functional-data-structures solutions stan static-site statistical-inference statistics

Last synced: 14 Mar 2025

https://github.com/jamesquinlan/intro-python

Introduction to Programming and Data Science with Python

data-science nlp python python-3

Last synced: 18 Aug 2025

https://github.com/waylonwalker/kedro-auto-catalog

Kedro catalog create with default configuration

data data-science kedro kedro-catalog kedro-hook kedro-plugin

Last synced: 12 Jun 2025

https://github.com/macropin/random-name-generator

Generate random male and female names with real-world probability.

data-science python random-generation test-data-generator

Last synced: 17 Jul 2025

https://github.com/ZackAkil/friendlier-data-labelling

Code resources for generating a Google Form for labelling data.

data-science google google-apps-script google-forms google-sheets machine-learning

Last synced: 04 Apr 2025

https://github.com/mohidex/data-pipeline-on-gcp

The Real-time Ecommerce Data Collection and Processing project empowers businesses with real-time insights by efficiently extracting, processing, and storing ecommerce data from multiple sources. Combining Golang and Python, this cutting-edge solution streamlines data handling from diverse ecommerce websites.

beautifulsoup data-engineer data-pipeline data-science database datastore dependency-injection firebase firestore gcp go golang google google-cloud pubsub python solid-principles storage web-scraping

Last synced: 14 Apr 2025

https://github.com/iamyajat/whatsapp-chat-analyzer-api

An API to analyse WhatsApp chats and generate insights

data-analysis data-science fastapi python whatsapp

Last synced: 17 Oct 2025

https://github.com/archie-cm/churn-analysis-ecommerce-customer

The objective of this project is to predict customer churn and lost opportunity, and to provide recommendations to the business team so the company can apply customer personas in its retention strategy and monitor results through an interactive dashboard.

data-science feature-engineering machine-learning python scikit-learn

Last synced: 23 Apr 2025

https://github.com/zeitsperre/canada-climate-python

A set of methods for collecting, parsing, converting, and presenting Environment Canada Weather Station data

canada climate data-science python weather

Last synced: 22 Jul 2025

https://github.com/sondosaabed/data-analyst-nanodegree

I acquired a full scholarship from Google Launchpad. Covers advanced data wrangling skills for working with messy, complex real-world datasets, and highly customized visualizations using the Matplotlib Python library

data-science dataanalysis datawrangling nanodegree python udacity-nanodegree

Last synced: 09 Apr 2025

https://github.com/inseefrlab/grandedim

Code accompanying the working paper "L'économétrie en grande dimension" (High-dimensional econometrics)

data-science econometrics high-dimensional-data publication r statistics

Last synced: 13 Jun 2025

https://github.com/yisaienkov/tinysets

The project aims to collect various datasets for tasks such as classification, clustering, and object detection. The purpose of these datasets is to quickly check the performance of models and algorithms.

algorithms classification data data-science dataset datasets kaggle kaggle-dataset lego lego-minifigures lego-sets object-detection pypi python regression text-classification tinysets

Last synced: 14 Apr 2025

https://github.com/alipsa/matrix

Groovy library for working with tabular data.

analytics data-science groovy tables

Last synced: 02 Apr 2026

https://github.com/virajbhutada/spotify-track-analysis-and-recommendation

Experience a comprehensive exploration of Spotify's musical landscape seamlessly transitioned from Tableau visualizations to SQL analysis. Dive into track inventory, streaming metrics, and sonic trends via interactive dashboards, while leveraging SQL queries for deeper insights into KPIs and cross-platform rankings.

audio-analysis data-analysis data-analytics data-science data-visualization eda machine-learning-library ml-models mysql recommendation-system spotify spotify-data spotify-dataset sql-database sql-server streaming-metrics tableau tableau-public trends-analysis

Last synced: 28 Apr 2025

https://github.com/vbyan/deeva

🚀 Deeva - your smart analytics companion for Object Detection datasets

data data-science data-visualization datasets deeva machine-learning object-detection plotly python statistics streamlit visualization

Last synced: 26 Jun 2025

https://github.com/dhhruv/kisaani

"Kisaani" is an application that takes required parameters intelligently or from the database of the location (from the cloud) and provides the list of best crops suited for that land. The application should also be able to collect the outcome after cultivation and apply correction as appropriate for further advisories. The details of the crops for the region and conditions are provided. Applications should be interactive, user friendly for farmers (provide local language support) and should provide support in real time.

crop crop-recommendation data-science ieee ieee-hackathon machine-learning

Last synced: 07 Mar 2026

https://github.com/nhs-south-central-and-west/data-science-guides

Guides for common data science tasks, in R & Python

data-science machine-learning python r regression

Last synced: 03 May 2025

https://github.com/invia-flights/blitzly

Lightning-fast way to get plots with Plotly ⚡️

data-analysis data-science plotly plotting-in-python python visualization

Last synced: 14 Jan 2026

https://github.com/stink-po/boxoffice_api

Unofficial Python API for Box Office Mojo

data-science dataset movies-and-cinemas scraper

Last synced: 07 Sep 2025

https://github.com/memgonzales/pisa-2018-analysis

Jupyter notebook presenting the process of data preparation, research question formulation, data analysis, and data modeling with the goal of extracting insights from the 2018 PISA Dataset

data-cleaning data-modeling data-science data-visualization exploratory-data-analysis jupyter-notebook matplotlib numpy oecd-data pandas pisa scipy statistical-inference

Last synced: 13 Jun 2025

https://github.com/zmoooooritz/stapy

An easy to use SensorThings API Client written in Python

api cli data-science database ogc python sensor sensor-data sensorthings sensorthings-api

Last synced: 17 Jan 2026

https://github.com/matcom/programming-for-data-science

Programming course for the Data Science degree at the Faculty of Mathematics and Computer Science of the University of Havana.

data-science data-science-python introduction-to-data-science introduction-to-programming introduction-to-python matcom matcom-uh programming programming-course python python-data-science university-of-havana

Last synced: 12 Oct 2025

https://github.com/itzmeanjan/corporatez

Data analysis done on Ministry of Corporate Affairs, Govt. of India's open data to get deeper insight, with :heart:

company-data corporate data-science data-visualization govt-company india matplotlib opendata python3 visualization

Last synced: 14 Oct 2025

https://github.com/dataship/python-dataship

Lightweight Python tools for reading, writing, and storing data, locally and over the internet

column-store data-science machine-learning numpy pandas

Last synced: 23 Apr 2025

https://github.com/ruivieira/nim-mentat

A Nim library for data science and machine learning

data-science library machine-learning nim scientific-computing

Last synced: 10 Aug 2025

https://github.com/montanaz0r/mit-6.0002-course

My solutions to the assignments of MIT 6.0002 course.

data-science mit python python3

Last synced: 12 Aug 2025

https://github.com/nikoshet/new-york-city-taxi-fare-prediction-machine-learning

Project for course 'Machine Learning' for M.Sc. 'Data Science and Machine Learning' in NTUA

data-science keras-tensorflow machine-learning numpy pandas python

Last synced: 12 Sep 2025

https://github.com/sanvishal/Exoplanet-Explore

An interactive data visualization of exoplanets

animation d3js data-analysis data-science exoplanet python space visualization

Last synced: 14 Apr 2025

https://github.com/navdeep-g/sdss-2019

Interpretable Machine Learning with rsparkling

data-science h2o-3 machine-learning r rsparkling spark sparklyr xai

Last synced: 07 Apr 2025

https://github.com/tchlux/util

My machine learning, optimization, and data science utilities package.

data-science machine-learning numerical-optimization python-utilities splines statistics visualization

Last synced: 02 May 2026

https://github.com/deezinn/fecomercio-dataanalysis

A data analysis system that integrates and processes information from multiple sources, converting raw data into valuable insights.

csv data-science jupyter-notebook powerbi

Last synced: 15 Jul 2025

https://github.com/pfed-prog/catalonia_data

We analyzed air quality in Catalonia using data from the Catalan Transparency Portal.

data-science dspyt jupyter-notebook ocean oceanprotocol python3

Last synced: 05 Oct 2025

https://github.com/ikegwukc/csc-405-605_spring_2022

Introductory Data Science Course Taught at UNCG (Spring 2022)

data-science datascience

Last synced: 09 Oct 2025

https://github.com/curiousily/simple-neural-network-with-tensorflow-js

Build a simple Neural Network model in TensorFlow.js to make a laptop buying decision. Learn why Neural Networks need activation functions and how you should initialize their weights.

artificial-intelligence data-science deep-learning javascript machine-learning neural-networks tensorflow

Last synced: 26 Apr 2025

https://github.com/raynardj/forgebox

The deep learning tool box

data-science machine-learning nlp pandas-dataframe

Last synced: 16 Oct 2025

https://github.com/2krishnayadav/datta-vishleshan.ai

DattaVishLeshan.AI is one of the first data analysis projects developed in India, and perhaps even in the world. The name "Datta" means "data," while "Vishleshan" translates to "analysis." It is a pioneering AI project dedicated to advanced data analysis.

data-analysis data-science data-visualization database

Last synced: 03 Jul 2026

https://github.com/andersy005/dask-notebooks

Dask tutorials for Big Data Analysis and Machine Learning as Jupyter notebooks

dask data-science distributed-computing jupyter-notebook parallel-computing python

Last synced: 30 Aug 2025

https://github.com/scicloj/tcutils

Utility functions for working with tablecloth datasets

clojure data-science scicloj tablecloth tmd

Last synced: 09 Apr 2025

https://github.com/d3group/dddex

The package 'data-driven density estimation x' (dddex) turns any standard point forecasting model into an estimator of the underlying conditional density

data-science density-estimation operations-research

Last synced: 24 Apr 2025

https://github.com/cricksmaidiene/mids_machine_learning

🤖 A unified repository of coursework fragments from UC Berkeley MIDS ML courses

coursework data-science generative-ai jupyter-notebook machine-learning numpy pandas prompt-engineering scikit-learn spark tensorflow uc-berkeley

Last synced: 10 Oct 2025

https://github.com/snowflakedb/snowpark-checkpoints

Snowpark Python / Spark Migration Testing Tools

data-analytics data-engineering data-science python snowflake sql

Last synced: 31 Aug 2025

https://github.com/wpanas/ml-snippets

Code snippets for faster ML development

data-science hacktoberfest machine-learning pandas python seaborn snippets

Last synced: 07 Apr 2025

https://github.com/wlongxiang/dutch_traffic_monitor

Visualize traffic on the Dutch highway A9 as an example

computer-vision data-science deep-learning object-detection opencv visualization

Last synced: 04 Aug 2025

https://github.com/lockedata/opentrainingcontent

An MIT and CC BY 4.0 licensed repository of training materials from Locke Data

data-science open-course r-stats

Last synced: 29 Jul 2025

https://github.com/sanketrs/implementation-of-modern-data-engineering-architecture-with-fabric_analytics

Building a next-generation hybrid data pipeline architecture that combines the power of Microsoft Fabric, Azure Cloud, and Power BI. This pipeline is engineered to tackle the challenges of real-time data ingestion, multi-layered processing, and analytics, delivering business-critical insights.

azure azure-data-factory azure-fabric bi-analytics big-data-analytics big-data-projects cloud-data-warehouse cloud-dataflow data-analytics data-engineering data-engineering-pipeline data-engineering-project data-pipeline-monitoring data-science data-visualization data-warehouse etl etl-framework etl-pipeline

Last synced: 14 Apr 2026

https://github.com/insightsengineering/nest

Website for the Nest project 🪺

clinical-trial-analysis data-science nest r shiny website

Last synced: 12 Sep 2025

https://github.com/solarwinds/probprog-workshop

Workshop on probabilistic programming for Tech Summit 2020 (Brno, December 12 2019)

data-science machine-learning probabilistic-programming statistics

Last synced: 24 Apr 2025

https://github.com/ngupta23/more

This is a helper package for pandas, visualizations and scikit-learn

data-science helpers pandas python scikit-learn visualization visualizations

Last synced: 24 Oct 2025

https://github.com/tikerlade/ml-art-spbsu-sirius-2024

Materials from a course on ML & Art taught during the MKN Sirius 2024 session

art data-science orange

Last synced: 29 Jul 2025

https://github.com/koldlight/r4ds

R for data science course

course data-analysis data-science data-viz r

Last synced: 30 Apr 2025

https://github.com/ndleah/tsf-data-science-internship

This repository contains the tasks performed during the Data Science and Business Analytics Internship at The Sparks Foundation

data-science data-visualization exploratory-data-analysis machine-learning powerbi python virtual-internship

Last synced: 20 Sep 2025

https://github.com/hmiladhia/piskle

A serialization package optimized for scikit-learn

data-science machine-learning python scikit-learn serialization

Last synced: 28 Oct 2025

https://github.com/abhinav-ark/mal_lyrics_analysis

Preprocessing and EDA on a Dataset of Malayalam Songs and Lyrics

data-science eda jupyter-notebook python

Last synced: 22 Jul 2025

https://github.com/zsxkib/ttds-g35-cw3

TTDS Group Project: Video Games Search Engine. Sakib Ahamed, Dan Buxton, Kenza Amira, Wini Lau, Mansoor Ahmad

corpora data-science neural-ranking-models pagerank query search-engine technologies text text-analysis text-classification ttds web-search

Last synced: 10 Apr 2025

https://github.com/jackgerrits/reductionml

Reduction-based machine learning framework with a focus on contextual bandits

contextual-bandits data-science machine-learning online-learning rust

Last synced: 10 Apr 2025

https://github.com/curiousily/linear-regression-with-tensorflow-js

Build a Linear Regression model using TensorFlow.js and use it to predict house prices

artificial-intelligence data-science javascript linear-regression machine-learning tensorflow tensorflowjs

Last synced: 01 Sep 2025

https://github.com/gbeckers/birdwatcher

A Python computer vision library for animal behavior

animal behavior computer-vision data-science ffmpeg opencv python science

Last synced: 13 Oct 2025

https://github.com/ucbds-infra/ds-course-infra-guide

An educator's guide to creating a data science course

data-science jupyter jupyter-book

Last synced: 07 Oct 2025

https://github.com/ren294/log-analysis-project

This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.

apache-kafka apache-nifi apache-spark big-data big-data-analytics cassandra cassandra-driver data-engineering data-science grafana hadoop hadoop-hdfs hive powerbi spark-rdd spark-sql spark-streaming

Last synced: 08 Jul 2025

https://github.com/erictleung/data-science

:computer: Repository for teaching materials and notes on machine learning and data science for freeCodeCamp

data-cleaning data-engineering data-science data-visualization freecodecamp learning machine-learning mathematics notes python statistics

Last synced: 25 Mar 2025