An open API service indexing awesome lists of open source software.

data

Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)

https://github.com/adadalshabab/machine-predictive-maintenance-classification

This repository hosts a machine predictive maintenance classification project, aimed at predicting the maintenance needs of industrial machinery before it fails. By leveraging machine learning algorithms, this project seeks to enhance operational efficiency and reduce downtime by identifying potential maintenance requirements proactively.

data data-science datanalysis datanalytics machine-learning machine-learning-algorithms matplotlib-pyplot pandas

Last synced: 17 May 2026

https://github.com/istinnew/cook-me-up

[In Progress] Welcome to Cook-Me-Up! This project aims to analyze and organize cooking recipes using data analysis (Python, BigQuery SQL, Looker Studio etc.) and machine learning techniques. The goal is to simplify meal preparation and offer users a comprehensive database of culinary delights.

bigquery clustering cookme culinary data data-science dataanalysis datavisualization looker-studio machine-learning python recipe-search recipes unsupervised-learning

Last synced: 16 May 2026

https://github.com/ericmaddox/nyc-crime-analytics

Analyzes and visualizes crime data from the NYC Police Department using interactive maps and heatmaps, leveraging the NYC Open Data API.

crime-analysis crimedata data datavisualization esri folium heatmap nycopendata python python3 rtcc

Last synced: 24 Jun 2025

https://github.com/madhuresh2011/kulturehire-internship

☺️ Hi folks! During my internship at KultureHire, I completed a real-world Data Analyst project. I created an interactive dashboard using pivot tables, conducted a thorough analysis, and provided actionable recommendations. I'm excited to share my work and the insights I discovered.

data data-analytics data-cleaning data-standardization data-visualization excel excel-pivot-charts excel-pivot-tables genz-aspirations my-sql

Last synced: 17 Feb 2026

https://github.com/nel-zi/zipco_foods

Developed an automated ETL pipeline using Python and Apache Airflow to consolidate fragmented CSV sales data into a normalized Azure SQL database for Zipco Foods.

airflow apache-spark data dataengineering etl pyspark wsl

Last synced: 03 May 2026

https://github.com/phette23/nces-ipeds-archive

download NCES IPEDS data

data datarescue ipeds nces

Last synced: 30 Oct 2025

https://github.com/toofancodes/h1b-dashboard-insights

An interactive Tableau dashboard that visualizes H1B visa data from the USCIS Employer Data Hub, offering insights into application trends, top employers, and geographic distributions. Showcases advanced data visualization, analytics, and business intelligence skills.

analysis analytics business-intelligence dashboard data data-visualization h1b h1b-visa interactive-data tableau

Last synced: 20 Jan 2026

https://github.com/amethyst-php/post

A comment, a note, a post, a pseudo-chat. It can really be anything.

amethyst amethyst-package api data laravel post

Last synced: 17 May 2026

https://github.com/rameshaditya/dynamic-hybrid-data-grid

Facilitates faster read-and-write of large ordered collections of data.

algorithms data data-structures storage

Last synced: 23 Feb 2025

https://github.com/weecology/updating-data

Hugo website for instructions on how to make a regularly updating data pipeline

continuous-analysis continuous-integration data gh-actions living-data netlify travis-ci

Last synced: 17 Feb 2026

https://github.com/cmutel/jester

Import data from the olca-schema JSON-LD format into the HESTIA JSON-LD schema

agriculture data json-ld life-cycle-assessment ontology

Last synced: 26 Jul 2025

https://github.com/denisecase/cintel-03-data

Getting started with interactive data analytics in Python

analytics data interactive python shiny

Last synced: 11 Apr 2025

https://github.com/shivamsharma32/ipl-2022-analysis

The IPL 2022 Analysis project is a data-driven exploration of the Indian Premier League (IPL) 2022 cricket tournament. The analysis focuses on utilizing Python programming and various libraries to analyze and visualize the performance of teams, players, and key metrics in the IPL 2022 season.

data dataana dataanalytics datavi matplotlib python

Last synced: 17 May 2026

https://github.com/denisecase/buzzline-04-case

Adding live visualizations to streaming data applications

animation data kafka matplotlib python streaming

Last synced: 11 Apr 2025

https://github.com/aguven6/inmemory-data-processor

Converts tabular data to columnar data with an index. The aim is to process huge data faster, especially in aggregation operations.

columnar-storage data data-structures parallel-computing parallel-programming processing

Last synced: 17 May 2026

https://github.com/praveendecode/data-analysis

Implemented data analysis projects with interactive Streamlit UI for user-friendly data exploration and insights presentation

data data-science dataanalysis exploratory-data-analysis insights python streamlit-dashboard tableau tableau-public

Last synced: 04 Apr 2025

https://github.com/stkisengese/numpy-data-fundamentals

A comprehensive collection of NumPy exercises covering array manipulation, slicing, broadcasting, random data generation, and real-world data analysis applications.

data data-analysis numpy pre-processing

Last synced: 16 May 2026

https://github.com/rd-uk/rduk-data-sqlite

SQLite Data Provider implementation for rduk-data

data rduk sqlite

Last synced: 16 May 2026

https://github.com/ciscorn/japanmesh-rs

A Rust library for handling Japanese Grid Square Code (JIS X 0410:2002 地域メッシュコード)

census data geospatial japan rust

Last synced: 11 Jan 2026

https://github.com/pulipulichen/pts-local-news-dataset

A dataset containing local news from Public Television Service.

data dataset

Last synced: 27 Mar 2026

https://github.com/naufalbasara/superstores-pipeline

Data Pipeline on Dummy E-commerce with Apache Airflow

airflow data data-engineering data-pipeline data-warehouse postgresql

Last synced: 16 May 2026

https://github.com/paulveillard/cybersecurity-analytics

An ongoing collection of awesome software, libraries, learning tutorials, documents and books, technical resources and cool stuff about Analytics Engineering in Cybersecurity.

analytics bigdata bigquery cybernetics cybersecurity data data-engineering data-science encryption encryption-decryption seo seo-friendly seo-optimization

Last synced: 28 Mar 2025

https://github.com/hyfi06/unam-careers

A utility package for retrieving career information from UNAM.

career data npm-package unam

Last synced: 16 May 2026

https://github.com/sharmadhiraj/plot-pi

Graphical Representation of PI

data data-visualization html javascript js mathematics plot

Last synced: 28 Mar 2025

https://github.com/ellisvalentiner/legislation-embeddings

Embeddings for U.S. Congress legislation

data embeddings machine-learning nlp python

Last synced: 12 Aug 2025

https://github.com/ranjeetj06/insighthub

InsightHub is a data analytics project that helps automate the entire process of preparing, analyzing, and reporting on CSV data.

analysis begineer data springboot

Last synced: 17 May 2026

https://github.com/erictleung/2018-new-coder-survey

Code to wrangle data from the 2018 New Coder Survey by freeCodeCamp

data data-cleaning dataset freecodecamp new-coders-survey programmers

Last synced: 03 Apr 2025

https://github.com/eloyhere/semantic-java

Semantic-Java is a modern Java stream processing framework with zero dependencies, distributed through Maven. It elegantly blends the fluency of Java Streams, the laziness of JavaScript generators, and intelligent index-based control inspired by database indexing, which suits it to time-series, event streams, and high-performance data pipelines as a Maven dependency.

data functional functional-programming java pipeline stream

Last synced: 07 Apr 2026

https://github.com/chrisrobertsjr/chrisrobertsjr

Welcome to my GitHub Profile!

data data-analysis java r sql statistics

Last synced: 03 May 2026

https://github.com/danicaalana/breast-cancer-random-forest

This project was developed as part of Digital Skill Fair (DSF) 35.0 - Data Science by Dibimbing. I am using the Wisconsin Breast Cancer Diagnostic Dataset from scikit-learn, which is a classic and very easy binary classification dataset.

breast-cancer-classification breast-cancer-wisconsin data eda machine-learning-algorithms python random-forest-classifier

Last synced: 16 May 2026

https://github.com/debruine/faux.jl

Julia version of faux for data simulation

data julia simulation

Last synced: 28 Mar 2025

https://github.com/shreedata/data-analysis-using-python-libraries-

The COVID-19 pandemic has significantly impacted India, necessitating a detailed analysis of the virus’s spread within the country. In this project, we explore an India-specific COVID-19 dataset, leveraging Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn.

covid-19 data data-cleaning data-visualization datana kaggle-dataset matplotlib numpy pandas-python python3 pythonlibrarires scikit seaborn

Last synced: 28 Mar 2025

https://github.com/muneeb1030/webscrapper_politifact

This initiative seeks to extract and analyze fact-checking data from Politifact.com, providing valuable insights into political statements, rulings, and the evolving information landscape.

data data-collection dataanalysis python3 scrapy scrapy-spider webscraping

Last synced: 09 Sep 2025

https://github.com/tadiusfrank2001/data_mining_projects_labs_cs145

A collection of data mining course assignments to implement advanced predictive statistical analysis models

algorithms data data-mining data-science deep-learning predictive-modeling python3 wide-learning

Last synced: 16 May 2026

https://github.com/ishansurdi/data-visualisation-empowering-business-with-effective-insights

The following tasks are completed for Data Visualization: Empowering Business with Effective Insights on Forage in October 2024. It is important to note that this should not be interpreted as an endorsement.

chart communicating-insights-and-analysis dashboard data data-analysis forage powerbi powerbi-visuals tableau tata tata-group virtual-internship visual visualization

Last synced: 17 Feb 2026

https://github.com/umstek/sampler

Generate elaborate random data instantly.

data faker javascript json sample

Last synced: 20 Jul 2025

https://github.com/mx51/data-dictionary-action

GitHub Action for generating and checking freshness of data dictionaries

action analytics data

Last synced: 17 Jan 2026

https://github.com/akesling/csvb

Have CSV? Use CSVB!

analytics csv data database

Last synced: 02 Feb 2026

https://github.com/jefking/copyblobs

Copies all files in a container to another container, in another storage account.

aci arm azcopy azure blob container copy data file files from instant move one-time simple storage sync template to transfer

Last synced: 27 Mar 2025

https://github.com/eslamdyab21/apara-data-gui

Custom application for Apara's data wrangling scripts. Technologies used are Qt Designer and PyQt5 for the GUI, and Pandas and NumPy for the data work.

csv data data-analysis data-wrangling gui pandas pyqt5-desktop-application qt5-gui

Last synced: 17 May 2026

https://github.com/huspacy/huspacy-resources

Resources for building and evaluating huspacy

data huspacy

Last synced: 21 Mar 2025

https://github.com/octoenergy/tentaclio-gdrive

A Python project containing all the dependencies for the gdrive tentaclio schema

data

Last synced: 24 Jun 2025

https://github.com/tuscanicz/doctrine-data-applier

Symfony bundle for Doctrine Migrations of data using doctrine entities

data database doctrine entity migrations symfony symfony-bundle

Last synced: 02 Feb 2026

https://github.com/octoenergy/tentaclio-databricks

Module that adds Databricks support to tentaclio

data

Last synced: 24 Jun 2025

https://github.com/octoenergy/tentaclio-s3

A Python project containing all the dependencies for the s3 tentaclio schema.

data

Last synced: 24 Jun 2025

https://github.com/octoenergy/tentaclio-athena

A Python project containing all the dependencies for the awsathena+rest tentaclio schema.

data

Last synced: 24 Jun 2025

https://github.com/octoenergy/tentaclio-postgres

A Python project containing all the dependencies for the postgresql tentaclio schema.

data

Last synced: 24 Jun 2025

https://github.com/octoenergy/tentaclio-gs

A Python project containing all the dependencies for the gs tentaclio schema.

data

Last synced: 24 Jun 2025

https://github.com/prcharan592/olympic-insights-historical-data-analytics-in-r

This project analyzes 120 years of Olympic history (1896–2016), uncovering trends and insights from the data

data data-analytics data-science data-visualization kaggle r-programming

Last synced: 03 Apr 2025

https://github.com/kaizadp/bbwm_moisture

HOBO data for soil moisture - Bear Brook Watershed in Maine

data hobo-data soil-moisture

Last synced: 17 May 2026

https://github.com/theduardomaciel/cc-pe

Course materials, R scripts, and datasets used during the Probability and Statistics course.

data probability r statistics

Last synced: 27 Mar 2025

https://github.com/vijaykumar1303/sales-data-analysis-and-dashboard-development

To analyze sales data to uncover insights into sales performance, trends, and patterns, and to develop an interactive dashboard that provides a comprehensive view of sales metrics and KPIs.

data dataanalysis datacleaning datavisualisation dax-query powerbi powerquery sql sqldataanalysis

Last synced: 11 Feb 2026

https://github.com/wilcotomassen/lorem-datum-core

Java based data generator for data simulation

data dataset generator java lorem-ipsum simulated-data

Last synced: 11 Jan 2026

https://gitlab.com/sean-c/pdf_rules

Turn PDFs into CSVs by defining rules

Data Cleaning automation data data parsing

Last synced: 14 Apr 2025

https://github.com/maximkrouk/storage

Lightweight framework for storing data (beta)

cache data keychain memmory storage swift swift5-1 userdefaults

Last synced: 30 Oct 2025

https://github.com/ahmad-ali-rafique/logistic-regression-modeling

An in-depth exploration of logistic regression models, including data cleaning, model building, and performance evaluation on various datasets.

accuracy confusion-matrix data dataanalytics logistic-regression logistic-regression-classifier machine-learning-algorithms mlmodels model modelling regression-models

Last synced: 11 Sep 2025

https://github.com/lukaszkn/data-software-engineering-interview-questions

Data and Software engineering interview questions

data engineering interview-questions python

Last synced: 20 Jul 2025

https://github.com/campiohe/geomask

A very simple lib for creating geometric masks from spatial data using regular grids.

climate data gis weather

Last synced: 30 Dec 2025

https://github.com/mightymetrika/scdtb

Single Case Design Toolbox

data math r science statistics

Last synced: 04 Jan 2026

https://github.com/ramtinsoltani/safe-cli

A simple command-line interface that encrypts and decrypts UTF-8 files using AES-256.

aes-256 cli data data-hook decryption encryption generator handlebars hooks markup partial partial-decryption password safe swap temp temporary tool

Last synced: 16 Apr 2026

https://github.com/chocolateboy/corrigenda

Corrections, addenda, and deltas for data that's wrong on the Internet

addenda api corrections corrigenda data json json-data

Last synced: 27 Mar 2025

https://github.com/pawlo77/messenger-analyser

Repo for Data Visualization project, part of IAD study program at Faculty of Mathematics and Information Science, Warsaw University of Technology

data visualization

Last synced: 17 May 2026

https://github.com/nathanieliskandar26/data-analysis-project

This project demonstrates my ability to clean and analyze data using Python and SQL so far. The dataset used for this analysis focuses on general customer information. Through this project, I aimed to uncover meaningful insights and trends by cleaning the data and performing structured queries.

analysis data data-cleaning jupyter-notebook mysql mysql-database python

Last synced: 19 Apr 2026

https://github.com/apparaomulpuri/readline

Explains the usage of the readLine function in Swift.

data fromkeyboard keyboard reading readline swift

Last synced: 29 Mar 2025

https://github.com/vin20777/drone-data-layer

Drone Project Data Layer

csharp data drone layer software-design

Last synced: 18 May 2026

https://github.com/bodfdaf/api

API data service provider

api data detail instagram lazada shopee tiktok video

Last synced: 11 Mar 2025

https://github.com/pedelriomarron/spanish-api-covid19

COVID-19 data from Spain (by Datadista) as a service

api covid-19 covid-19-spain data now spain zeit

Last synced: 12 Mar 2025

https://github.com/solrikk/vargen

VarGen (Variation Generator) is a user-friendly desktop application designed to simplify the creation of product variations from CSV files.

csv-files csv-format csv-parser data data-engineering excel excelparser python

Last synced: 29 Mar 2025

https://github.com/jorgeatgu/dataset-elecciones-28a

Datasets generated from El País's general elections dataset

28a data elecciones2019 elections spain

Last synced: 16 May 2026

https://github.com/a-poor/taro

A package for repeatable rectangular data transformations in Python.

data data-science data-transformation pipeline pypi-package python

Last synced: 13 Oct 2025

https://github.com/yash22222/olympic-games-analytics-using-apache-spark

The "Olympic Games Analytics Using Apache Spark Databricks" project explores data from the Olympic Games (1896-2016) to identify trends and insights. Using Apache Spark for big data processing and Databricks for visualization, the project analyzes key factors like top-performing countries and athlete attributes, showcasing real-world analytics.

apache apache-kafka apache-spark big-data-analytics csv data data-analytics data-visualization databricks excel mysql olympics regions

Last synced: 03 May 2026

https://github.com/hallmx/mx_utils

Utility scripts for software development in data science

colaboratory data development nbdev python science scripts software utlities

Last synced: 19 May 2026

https://github.com/gsmith257-cyber/BIT3434CVE

BIT 3434 project on data mining CVEs and exploits

cve data data-mining exploits research-project

Last synced: 10 Mar 2025

https://github.com/nabilaagha/chest-x-ray-medical-diagnosis-using-deep-learning

This project uses deep learning to classify chest X-ray images for disease detection. It involves data preprocessing, pre-trained CNN models, and the ChestX-ray8 dataset to enhance medical diagnostics with AI.

computer-vision data data-processing deep-learning juypter-notebook medical-image-processing x-ray-images

Last synced: 15 Dec 2025

https://github.com/pratik-codes/zomato_data_eda

Cleaned and analysed messy data, and created a predictive model with an accuracy of 93% using a tree regressor algorithm

bengaluru data data-cleaning data-science famous-restaurants restaurants-delivering-online restraunts

Last synced: 27 Mar 2025

https://github.com/encelo/nctracer-data

Data files for the ncTracer project

data icons ncine

Last synced: 15 Jan 2026

https://github.com/cemoktra/data_series

time series handling

data lazy-evaluation time-series

Last synced: 29 Oct 2025

https://github.com/luminati-io/zoominfo-dataset-samples

A sample dataset of over 1000 ZoomInfo companies, extracted using the Bright Data API, ideal for market growth, lead generation, and market analysis.

b2b business companies data data-extraction database dataset datasets web-scraping zoominfo

Last synced: 17 Mar 2025

https://github.com/noorkhokhar99/text-to-speech-demo

Text to Speech Demo

data python roboflow

Last synced: 27 Mar 2025

https://github.com/ahabdel/amazon-web-scraper

Amazon web scraper that scrapes pricing adjustments and provides updates on a day-to-day basis

data web-scraping

Last synced: 29 Oct 2025

https://github.com/takamoso/umami

Cross-browser compatibility data.

browser compat compatibility data dataset json

Last synced: 27 Mar 2025

https://github.com/annaanastasy/mushroom-binary-classification-eda-ml

Explored and modeled a competition dataset of mushroom species, focusing on data cleaning, exploratory data analysis, and building machine learning models for accurate classification of edible and poisonous mushrooms.

binary-classification data data-cleaning-and-preprocessing data-science exploratory-data-analysis machine-learning-algorithms xgboost-classifier

Last synced: 29 Mar 2025