An open API service indexing awesome lists of open source software.

data

Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)

https://github.com/nxion/sql-data-warehouse-project

Building a modern data warehouse with MS SQL Server, ETL processes, data modeling, and analytics.

data data-analysis data-analytics data-engineering data-lakehouse data-warehouse datalake datascience etl etl-job medallion-architecture ms mssql sql sql-query sql-server

Last synced: 05 Jun 2026

https://github.com/fastpix/android-data-kaltura

This SDK enables seamless integration with Kaltura Player, offering advanced video analytics via the FastPix Dashboard

analytics android-sdk data fastpix kaltura kaltura-player metrics sdk video video-metrics

Last synced: 21 Apr 2026

https://github.com/vishwas-chakilam/movies-review-scraping-analysis

A project for collecting, cleaning, and analyzing movie data. Includes scripts for web scraping (deprecated) and using the OMDb API to fetch movie details. Analyze and visualize data with Python and Power BI to uncover insights and trends in movie ratings and genres.

data dataanalysis datacleaning datavisualization matplotlib-python numpy-library pandas python webscraping

Last synced: 21 Apr 2026

https://github.com/zawaung7791/streamlit-data-viewer

Data previewer using Streamlit, Plotly, and Python

data plotly python streamlit

Last synced: 21 Apr 2026

https://github.com/jdenn0514/surveycore

Core Survey Analysis Infrastructure

data r resear survey-analysis

Last synced: 21 Apr 2026

https://github.com/samridhisainii/airbnb-data-analysis

Data analysis of airbnb dataset

analysis data data-visualization eda models

Last synced: 16 May 2026

https://github.com/coryson/osm-mla-finder

Python script to locate institutions employing Medical Laboratory Assistants in Germany, developed for BTZ – Berufliche Bildung Köln GmbH. It uses OpenStreetMap, SerpAPI, and web scraping to find and verify relevant labs, clinics, and diagnostic centers.

beautifulsoup data openstreetmap osm python scraping serpapi webscraping

Last synced: 24 Apr 2026

https://github.com/cyberoctane29/python-for-data-analysis

A repository dedicated to learning Python for data analysis, data science, and data analytics. This collection of Jupyter notebooks covers practical exercises and concepts from the Google Advanced Data Analytics Professional Certificate program.

data data-analysis data-analytics data-science python

Last synced: 24 Apr 2026

https://github.com/issacto/kowloonwestparking

Deployed Web App

data hongkong react

Last synced: 24 Apr 2026

https://github.com/marielachirinosr/cyclistic-data-analytics-project

This project explores user behavior within a fictional bike-sharing system, modeled after Cyclistic, operating in Chicago.

data data-visualization pandas powerbi-report powerbi-visuals python

Last synced: 24 Apr 2026

https://github.com/mehmetkahya0/gallstone_dataset_analysis_project

Gallstone Disease (Gallstone-1) Dataset Analysis (https://archive.ics.uci.edu/dataset/1150/gallstone-1)

analysis analytics data data-analysis data-science data-visualization database graph matplotlib python

Last synced: 25 Apr 2026

https://github.com/thinkphp/my-react-tictactoeai-app

React Tic Tac Toe app component based on artificial intelligence

ai algoirthms data datastructures games javascript react

Last synced: 25 Apr 2026

https://github.com/mlkav/tri-hita-karana

Project Tri Hita Karana - Future Knowledge G20 Bali. DTS Kominfo x Binar Academy.

bali data data-science g20 science

Last synced: 06 Jun 2026

https://github.com/marielachirinosr/hotel-data-analysis

Pandas & Matplotlib Learning Analysis. Repository featuring data analysis projects using Pandas and Matplotlib libraries

data data-analysis matplotlib pandas python

Last synced: 25 Apr 2026

https://github.com/jigyasag18/multiple-disease-detection-app

This repository contains the implementation of a Multiple Disease Detection System, which employs advanced machine learning techniques for early detection and prediction of prevalent diseases, including diabetes, heart disease, and Parkinson's disease. The system utilizes a variety of patient health metrics such as demographics and medical history.

data datapreprocessing machine-learning machine-learning-algorithms machinelearningmodel prediction python streamlit streamlit-webapp

Last synced: 07 Jun 2026

https://github.com/luminati-io/seleniumbase-with-proxy

SeleniumBase with authenticated proxies to bypass restrictions, enhance web scraping, and manage rotating proxies for better data extraction.

data data-collection proxy-server python residential-proxy selenium seleniumwire web-scraping

Last synced: 27 Apr 2026

https://github.com/ioanzicu/batch_loading_one-to-many_data_model

Unesco Batch Loading One-to-Many Data using Django

batch data django sqlite3

Last synced: 27 Apr 2026

https://github.com/yuweaec/project-scidatapipeline

A comprehensive toolkit for processing, simulating, and analyzing scientific data, integrating Python, Fortran, and Jupyter notebooks for seamless workflows.

analysis data pipeline processing scientific simulation

Last synced: 27 Apr 2026

https://github.com/demkeys/lazydatatransfer

Lazy method to transfer up to 64 KB of data over the network using UDP

data data-trans network python transfer udp

Last synced: 07 Jun 2026
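
The single-datagram UDP transfer described in the entry above can be sketched as follows. This is a minimal illustration, not the lazydatatransfer repository's own code: the function names are hypothetical, and it assumes IPv4, where one datagram carries at most 65,507 bytes of payload (just under 64 KB). UDP itself gives no delivery guarantee, so the sketch only round-trips over the loopback interface.

```python
import socket

# Largest payload that fits in one IPv4 UDP datagram:
# 65,535 bytes minus the 20-byte IP header and the 8-byte UDP header.
MAX_UDP_PAYLOAD = 65507


def send_datagram(payload: bytes, address: tuple) -> int:
    """Send payload as a single UDP datagram; reject anything too large to fit."""
    if len(payload) > MAX_UDP_PAYLOAD:
        raise ValueError(
            f"payload is {len(payload)} bytes; the limit is {MAX_UDP_PAYLOAD}"
        )
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, address)


def receive_datagram(sock: socket.socket) -> bytes:
    """Read one datagram from an already-bound UDP socket."""
    data, _sender = sock.recvfrom(65535)
    return data


def loopback_roundtrip(payload: bytes) -> bytes:
    """Bind a receiver on localhost, send to it, and return what arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        receiver.settimeout(2.0)  # avoid blocking forever if the datagram is lost
        send_datagram(payload, receiver.getsockname())
        return receive_datagram(receiver)
```

Calling `loopback_roundtrip(b"hello")` sends the bytes to a temporary local socket and returns them. Anything larger than the limit would need to be split into several datagrams and reassembled by the receiver.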

https://github.com/schenkd/tweetminer

Data Miner for Twitter Streaming API

data dataminer datamining java twitter twitter-api twitter4j

Last synced: 07 Jun 2026

https://github.com/santiagoenriquega/custom_database

Python-based database library for database management, indexing, transactions, and constraints, showcasing foundational database concepts.

data data-engineering database database-design python

Last synced: 27 Apr 2026

https://github.com/chompfoods/stub-inflector

Inflector server stub for the Chomp Food & Recipe Database API. Use our API to get high-quality data on recipes and 875,000+ branded/grocery foods plus raw ingredients.

api branded chomp data database food grocery inflector ingredients nutrition raw recipe-api recipes server stub stub-inflector stub-server

Last synced: 27 Apr 2026

https://github.com/gngdb/llamass

LLAMASS is an arbitrary collection of tools I've put together to deal with motion data

amass data pose pytorch

Last synced: 28 Apr 2026

https://github.com/oguzhanfatihkucuk/data-analytics-project-kafka-spark

The data in this project was collected in a database using Apache Kafka and processed with Apache Spark Streaming. The project aims to create a forecasting model and analyze sales forecasts per customer.

big-data data data-visualization hadoop kafka ml mlpipeline plt pyhton spark

Last synced: 28 Apr 2026

https://github.com/shreeparab1890/indian-elections-2019-analysis-eda

This IPython notebook contains an exploratory data analysis (EDA) of the Indian Lok Sabha Elections 2019.

data data-analysis data-science data-visualization eda exploratory-data-analysis matplotlib numpy pandas plotly python python3 visualization

Last synced: 28 Apr 2026

https://github.com/priyanshubiswas-tech/e-commerce_data_analysis

Analyzes 9,994 e-commerce transactions to uncover insights on sales trends, customer behavior, profitability, and logistics using EDA and visualization. Identifies top products, customer segments, and shipping efficiencies to optimize marketing, inventory, and operations, making it valuable for retail, finance, and logistics.

data data-analysis data-visualization pandas pandas-dataframe plotly-analytics-projects plotly-express python

Last synced: 28 Apr 2026

https://github.com/mrlynn/sizing-exercise-data-generator

Data Generator for December 2017 Sizing Exercise

data generator mongodb

Last synced: 28 Apr 2026

https://github.com/i-am-uchenna/sql-data-warehouse-project

The Data Warehouse and Analytics Project is a comprehensive initiative designed to demonstrate the end-to-end process of building a modern data warehouse and deriving actionable insights through SQL-based analytics.

architecture business-intelligence crm data data-analysis database database-management datawarehouse erp etl etl-pipeline model sql sqlserver

Last synced: 15 May 2026

https://github.com/kfrural/customer-churn-prediction

Customer churn prediction using machine learning. The project follows CRISP-DM and KDD methodologies, including data preprocessing, feature engineering, modeling, and evaluation. It also features an interactive dashboard for visualizing results.

crisp-dm data jupyter kdd python

Last synced: 29 Apr 2026

https://github.com/mtalhaofc/nutrition_system

A simple AI-powered web app built using Streamlit that provides personalized weekly meal plans and nutrition recommendations based on user demographics, health goals, and nutritional preferences.

cosine-similarity data data-science food machine-learning model nutrition pandas python streamlit

Last synced: 29 Apr 2026

https://github.com/mumtaz4118/scraping-medium-and-data-analytics

The file DataExtraction.py extracts information from the JSON files scraped by the scraper medium_scrapper_post.py. To extract information from JSON files scraped by medium_scrapper_tag_archive.py (which scrapes the tags archive), use Data_Extraction_Archive_Tags.py.

data data-analysis data-analytics data-extraction data-preprocessing data-science data-scraping deep-learning machine-learning python

Last synced: 29 Apr 2026

https://github.com/stdlib-js/array-struct-factory

Return a constructor for creating arrays having a fixed-width composite data type.

array composite data factory javascript node node-js nodejs stdlib struct structure typed typed-array types

Last synced: 29 Apr 2026

https://github.com/shoaib1522/data-aggregator-tool-in-python

Illustrations of the concepts used in the "Data Aggregation Tool" scenario for a Data Science Engineer, as described in the accompanying document (PDF)

data data-science dataaggregation lists python-script python3 sets-python tuples

Last synced: 29 Apr 2026

https://github.com/mr-dhan/eda-sales-customer-transactions

In the competitive world of retail business, a deep understanding of customer behavior is an important foundation for strategic decision-making. However, customer transaction data is often large and complex, so an effective analysis process is needed to uncover valuable insights.

dashboard data data-analysis data-analysis-python data-science data-visualization eda python

Last synced: 29 Apr 2026

https://github.com/chandansoren/financial-budget-analysis

Financial budget for 2021

analytics data python

Last synced: 29 Apr 2026

https://github.com/koltyakov/pgcopy

🐘 PostgreSQL data migration tool

cli data database golang migration postgresql sync

Last synced: 29 Apr 2026

https://github.com/diegoperea20/pytorch-vs-tensorflow

Testing the differences between the PyTorch and TensorFlow libraries across prediction and classification applications; each offers improvements depending on the problem or data set assigned.

classification data images prediction pytorch tensorflow

Last synced: 29 Apr 2026

https://github.com/istinnew/eniac_ab_insight

Dive into a comprehensive analysis aimed at boosting iPhone 13 sales by optimizing the Click-Through Rate (CTR) of the “SHOP NOW” button. Compare different button designs and determine the most effective strategy for increasing engagement.

ab-testing data data-analysis data-engineering data-science data-visualization google googlecolab libraries python testing testing-tools visual-studio-code

Last synced: 29 Apr 2026
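
Comparing the click-through rates of two button designs, as the entry above describes, is commonly done with a two-proportion z-test. The sketch below is one way to do it in plain Python. It is an assumption about the kind of analysis involved, not the repository's actual method, and the function names and the counts in the usage note are hypothetical.

```python
import math


def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR is the fraction of impressions that resulted in a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def two_proportion_z_test(clicks_a: int, views_a: int,
                          clicks_b: int, views_b: int) -> tuple:
    """Return (z, two-sided p-value) for the difference CTR_B - CTR_A."""
    ctr_a = click_through_rate(clicks_a, views_a)
    ctr_b = click_through_rate(clicks_b, views_b)
    # Pooled CTR under the null hypothesis that both designs perform equally.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if std_err == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (ctr_b - ctr_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, `two_proportion_z_test(100, 1000, 150, 1000)` compares a hypothetical design A (100 clicks in 1,000 views) with design B (150 clicks in 1,000 views); a small p-value suggests the difference in CTR is unlikely to be due to chance alone.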

https://github.com/dxtaner/graphql_events

Graphql-Events

data events graphql

Last synced: 29 Apr 2026

https://github.com/fs23yayan/membuatfungsidatapemrosesan

Creating Data Processing Functions for Data Science in Marketing: Customer Segmentation with Python - Part 2

data function processing

Last synced: 29 Apr 2026

https://github.com/devcsrj/docparsr-jvm

JVM client for https://github.com/axa-group/Parsr

data document extraction nlp ocr pdf

Last synced: 08 Jun 2026

https://github.com/axnjr/csv-parser-utils

My own Pandas in Go, Python & Rust: utility methods for handling CSV files in core Go & Rust, with bindings for Python.

csv data dataanalysis datatools go golang golang-application pandas python rs rust

Last synced: 29 Apr 2026

https://github.com/gvatsal60/ds-on-kaggle

A collection of data science projects, experiments, and insights from Kaggle competitions and datasets

data data-science data-visualization numpy pandas python3

Last synced: 29 Apr 2026

https://github.com/patrickdavies100/pipeline38

An application to automate the creation and execution of SQL queries.

data pandas-dataframe pipeline postgresql psycopg2 sqlalchemy

Last synced: 30 Apr 2026

https://github.com/abhinav330/instagram-influencers-analysis

This Jupyter Notebook focuses on preprocessing and visualizing data from an Instagram profiles dataset. It includes data loading, inspection, visualization, and some data preprocessing steps.

data data-science data-visualization exploratory-data-analysis exploratory-data-visualizations influncer-products instagram scikit-learn sklearn

Last synced: 08 Jun 2026

https://github.com/samiksha29-patil/hr-employee-data-analysis-visualization-in-python

This project focuses on analyzing an HR Employee Dataset that contains details about employees such as demographics, job status, salaries, performance reviews, satisfaction levels, and attrition reasons.

csv-files data data-visualization dataanalysis matplotlib numpy pandas python seaborn

Last synced: 30 Apr 2026

https://github.com/omarsaad21/it-salary-eda

A Python EDA project on IT department salary data, with data exploration and data visualization to answer several questions about the dataset

data explotary-data-analysis juypter-notebook numpy pandas python visualization

Last synced: 30 Apr 2026

https://github.com/chompfoods/sdk-jaxrs-cxf

JAXRS-CXF SDK for the Chomp Food & Recipe Database API. Use our API to get high-quality data on recipes and 875,000+ branded/grocery foods plus raw ingredients.

apache-cxf api branded chomp cxf data database food grocery ingredients java jax-rs nutrition raw recipe-api recipes sdk

Last synced: 30 Apr 2026

https://github.com/fatihilhan42/olympics-data-analysis-with-python

A data analysis of the Olympics from 1896 to 2016, done in Python.

data data-science dataanalysis datavisualization jupyter-notebook olympics python

Last synced: 30 Apr 2026

https://github.com/ddeepanshu-997/datascience-e-commerce-shopping-details-

This project applies data preprocessing techniques to clean the dataset using libraries, derives insights and analyses to find the top-selling items ("hot picks"), and uses data visualization libraries to surface trends and other aspects, as a small contribution to the field of data science.

cleaning-data data data-science data-visualization dataframe datapreprocessing dataset libraries matplotlib-pyplot numpy pandas plots python visualization

Last synced: 30 Apr 2026

https://github.com/dnut/json-match-finder

Python application used to match listings against openings via authenticated JSON API access.

data data-structures data-wrangling database json-api python-application python-modules

Last synced: 01 May 2026

https://github.com/dhimmel/hgnc

Extracting human gene families from HGNC

data gene-families genes hgnc hugo human

Last synced: 01 May 2026

https://github.com/dantetrb/diabetes-readmission-dbt

Predictive analytics on diabetic patient readmissions using dbt, DuckDB and Python – with explainability and clustering.

clustering data dataengineering dbt diabetes duckdb hdbscan healthcare jupyter lime readmission-prediction sql

Last synced: 01 May 2026

https://github.com/lut-ful/ibm-capstone-project-stack-overflow-job-survey

IBM Data Analyst Professional Certificate program final project.

cognos data data-analytics looker power-bi python sql statics

Last synced: 01 May 2026

https://github.com/ahmed-naserelden/astro-success-analytics

This project analyzes key factors influencing success in the Space Race using data science techniques. It includes data collection, machine learning modeling, and insightful visualizations to predict mission outcomes.

data dataanalysis python

Last synced: 01 May 2026

https://github.com/dnut/associations

Python 3 library to identify high-dimensional statistical relationships in any data set.

analytics arch-linux association-rules data data-analysis data-mining data-science machine-learning python-modules

Last synced: 01 May 2026

https://github.com/shauryauppal/mydatatoolkit

A toolkit for data scientists to get work done faster, easier, and in a smarter way.

analytics awesome-list data data-science hacktoberfest

Last synced: 08 Jun 2026

https://github.com/chompfoods/sdk-kotlin

Kotlin SDK for the Chomp Food & Recipe Database API. Use our API to get high-quality data on recipes and 875,000+ branded/grocery foods plus raw ingredients.

api branded chomp data database food foods grocery ingredients kotlin nutrition raw recipe-api recipes sdk sdk-kotlin

Last synced: 01 May 2026

https://github.com/nel-zi/climainsights

Developed an automated ETL pipeline using Apache Airflow and Python to collect, process, and store weather data from multiple cities via Weatherstack API. Implemented data cleaning, orchestration, and error handling to ensure accuracy and scalability.

airflow apache-spark data data-engineering engineering etl-pipeline

Last synced: 01 May 2026

https://github.com/sandygcabanes/etl-earthquake-data-from-usgs-google-cloud-composer-airflow

Airflow, Google Cloud Composer, GCS, BigQuery, Python. This automated pipeline pulls daily earthquake data from a trusted public source, stores it securely in the cloud, and organizes it into clean, searchable tables for analysis.

cloud composer dag data engineering etl etl-pipeline google json python

Last synced: 01 May 2026

https://github.com/sorairolake/japanese-era-dataset

Dataset of Japanese eras (日本の元号)

data dataset date japanese-calendar japanese-era json toml wareki yaml

Last synced: 01 May 2026

https://github.com/sebastianbrzustowicz/github-data

Java + Spring Boot. Application for sending requests to GitHub API and collecting received data.

api ci data github json junit mapping parallel repository rest-api stream

Last synced: 01 May 2026

https://github.com/muhammadadilnaeem/bcg-data-science-job-simulation-on-forage-august-2024

This repository contains all the tasks, code, and documentation completed during the BCG Data Science job simulation on The Forage platform. The simulation focused on analyzing customer churn, building predictive models, and presenting insights for a major utility company.

bcg customer-churn-prediction-with-machine-learning data data-science forage numpy pandas

Last synced: 01 May 2026

https://github.com/robwiederstein/covid-19-ky

Monitor US covid-19 cases w/ Johns Hopkins data

data data-visualization leaflet plotly r shell

Last synced: 02 May 2026

https://github.com/lurenss/healthypandas

A library that takes raw output from the export of the iPhone Health app and produces pandas dataframes.

data health ios pandas

Last synced: 02 May 2026

https://github.com/rbreeze/dashboard

My personal health dashboard, with daily stats on food and sleep. It has undergone several redesigns since 2015.

css dashboard data data-visualization design front-end google-sheets google-sheets-api health html javascript personal-health-record personal-website running static static-site visualization

Last synced: 02 May 2026

https://github.com/gcoronelc/ucv_gdi-1_202302-a2

Data and Information Management Workshop I with Gustavo Coronel.

data data-science database databases machine-learning machinelearning oracle sql sql-server

Last synced: 02 May 2026

https://github.com/hafs96/prediction_consommation-de-carburant

In this project, the goal is to develop a model that predicts whether a car has high or low fuel consumption based on its technical characteristics.

analysis data data-visualization machine-learning testing training

Last synced: 09 Jun 2026

https://github.com/mubashirsidiki/olympics-data-enigeering

Worked with Azure Data Factory, Databricks, Data Lake Storage, and Synapse Analytics to build an ETL pipeline for processing and analyzing Olympic Games data from Kaggle.

analytics azure big-data data dataengineering devops pipeline

Last synced: 02 May 2026

https://github.com/s1dewalker/electric-future

Visual Analysis: Future of Automotive Industry

data data-visualization machine-learning python3 regression-analysis tableau

Last synced: 02 May 2026

https://github.com/jesuscc1993/data-cleaner-extension

Clears browser data in a single click.

application-data chrome chrome-extension data

Last synced: 02 May 2026

https://github.com/prakashpandey16/sql_data_warehouse_project

Building a modern data warehouse with SQL Server, including ETL processes, data modeling, and analytics.

cleaning-data data data-engineering data-science database etl-pipeline sqlserver

Last synced: 03 May 2026

https://github.com/ebrizzzz/data-visualization-project-using-tableau

A data visualization project for the Visual Data Analysis course (Spring Term 2025) at the University of Skövde. This project explores the factors influencing national happiness scores across different global regions from 2005 to 2022.

analytics data data-analysis data-science data-visualization python regression tableau

Last synced: 16 Jun 2025

https://github.com/asacxyz/flutter_aplicando_persistencia_de_dados

Companion repository for the course "Flutter: aplicando persistência de dados" (Flutter: applying data persistence)

dart data data-storage flutter persistence persistent-storage sqflite sql sqlite

Last synced: 03 May 2026

https://github.com/tn3w/moviedb-json

A JSON library with 981,530 films.

data database db json movie movie-database movies

Last synced: 03 May 2026

https://github.com/arnavk-09/phishing-detection

🎣 Detect Phishing URLs with Data Pre-fitted... API & Web UI

csv data fastapi flask python scikit-learn

Last synced: 03 May 2026

https://github.com/yugsumeet17/churn-analysis-project--power-bi-sql-machine-learning

Dataset Explained, Project Goals & Metrics Required, SQL Server ETL & Data Cleaning, Power BI Data Load, Transformation, Blueprint & Measures, Power BI Visualization - Summary Page, Building Machine Learning Model - Random Forest, Power BI Visualization - Churn Prediction Page

data data-visualization dataanalytics excel postgresql powerbi python3

Last synced: 03 May 2026

https://github.com/yash-chauhan-dev/spark_cluster_docker

Set up a local Spark cluster, Hadoop (HDFS), Airflow, and PostgreSQL on Docker with ease, without any local installations

apache-spark data data-engineering data-engineering-pipeline deployment docker docker-compose hadoop hdfs local-development localhost pyspark python

Last synced: 04 May 2026

https://github.com/qrailibs/dataflow

✨ Data processing in Node.js made multithreaded and type-safe.

data dataprocessing multithread node

Last synced: 04 May 2026