An open API service indexing awesome lists of open source software.

data

Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)

https://github.com/yugoff/ml-kaggle-regression-with-a-mohs-hardness-dataset

Your Goal: For this Episode of the Series, your task is to use regression to predict the Mohs hardness of a mineral, given its properties

data gradient-boosting kaggle kaggle-competition regression-models

Last synced: 18 May 2026

https://github.com/mkshah605/personal-brand-development

A data-driven approach to a personal brand development project.

branding data data-science growth music personal

Last synced: 12 Sep 2025

https://github.com/roggersanguzu/weather-medical-expense-prediction-ml-models

This repo contains one model for determining rainfall patterns and another for medical expense prediction.

data data-analysis data-science datasets joblib machine-learning machine-learning-algorithms scikitlearn-machine-learning

Last synced: 30 Aug 2025

https://github.com/opengeoshub/vdownload

A Powerful Geospatial Data Downloader

data geospatial opendata

Last synced: 19 May 2026

https://github.com/ate47/playerdata

Get data about a player with a command

bukkit-plugin command data spigot-plugin

Last synced: 30 Aug 2025

https://github.com/kammarah/studentdata

I created & deployed a Streamlit app to store, manage & analyze student data. 📊🎓

connection data data-analysis data-visualization deploy deployments libraries python streamlit streamlit-webapp webapp

Last synced: 18 May 2026

https://github.com/annaanastasy/mushroom-binary-classification-eda-ml

Explored and modeled a competition dataset of mushroom species, focusing on data cleaning, exploratory data analysis, and building machine learning models for accurate classification of edible and poisonous mushrooms.

binary-classification data data-cleaning-and-preprocessing data-science exploratory-data-analysis machine-learning-algorithms xgboost-classifier

Last synced: 29 Mar 2025

https://github.com/encelo/nctracer-data

Data files for the ncTracer project

data icons ncine

Last synced: 15 Jan 2026

https://github.com/pratik-codes/zomato_data_eda

Cleaned and analysed messy data, then created a predictive model with an accuracy of 93% using a tree regressor algorithm.

bengaluru data data-cleaning data-science famous-restaurants restaurants-delivering-online restraunts

Last synced: 27 Mar 2025

https://github.com/haykam821/circle-tracking

A tool for generating Markdown tracking of the Circle of Trust experiment.

circle data markdown reddit subreddit tracker trust

Last synced: 19 May 2026

https://github.com/gsmith257-cyber/BIT3434CVE

BIT 3434 project on data mining CVEs and exploits

cve data data-mining exploits research-project

Last synced: 10 Mar 2025

https://github.com/hallmx/mx_utils

Utility scripts for software development in data science

colaboratory data development nbdev python science scripts software utlities

Last synced: 19 May 2026

https://github.com/yash22222/olympic-games-analytics-using-apache-spark

The "Olympic Games Analytics Using Apache Spark Databricks" project explores data from the Olympic Games (1896-2016) to identify trends and insights. Using Apache Spark for big data processing and Databricks for visualization, the project analyzes key factors like top-performing countries and athlete attributes, showcasing real-world analytics.

apache apache-kafka apache-spark big-data-analytics csv data data-analytics data-visualization databricks excel mysql olympics regions

Last synced: 03 May 2026

https://github.com/itrauco/data-dirtying-tool

A simple command-line tool to generate dirty data and do common data things in Google Cloud

data data-analysis data-engineering data-ops data-pipeline data-science data-visualization data-wrangling dirty-data google-cloud machine-learning

Last synced: 24 Feb 2025

https://github.com/a-poor/taro

A package for repeatable rectangular data transformations in Python.

data data-science data-transformation pipeline pypi-package python

Last synced: 13 Oct 2025

https://github.com/solrikk/vargen

VarGen (Variation Generator) is a user-friendly desktop application designed to simplify the creation of product variations from CSV files.

csv-files csv-format csv-parser data data-engineering excel excelparser python

Last synced: 29 Mar 2025

https://github.com/ngupta23/data_prep_helper

A helper package for preparing and combining data from a variety of sources

data data-science dataprep datapreparation dataprocessing helpers python

Last synced: 03 Apr 2025

https://github.com/bilgehangecici/datatypeconverter

Converting integer and floating numbers to appropriate bit-level representation.

data datatypeconverter java machine-level variables

Last synced: 30 Mar 2025

https://github.com/passly-nl/data

Source code of the data layer.

data passly ticketing typescript

Last synced: 27 May 2026

https://github.com/yourdataarchitect/french-realestate-data-pipeline

This repository contains a fully automated data pipeline built with Apache Airflow to extract, clean, analyze, and report real estate listings from Seloger. It pushes data to MongoDB, Elasticsearch, and Google Sheets, with real-time Slack alerts for monitoring.

airlfow data datanalysis datapipeline market-intelligence real-estate

Last synced: 31 Dec 2025

https://github.com/pedelriomarron/spanish-api-covid19

Data from Spain of COVID-19 (by Datadista) as a service

api covid-19 covid-19-spain data now spain zeit

Last synced: 12 Mar 2025

https://github.com/bodfdaf/api

api data service provider

api data detail instagram lazada shopee tiktok video

Last synced: 11 Mar 2025

https://github.com/onyxwizard/coding-challenges

A collection of fundamental recursion problems solved in Java, demonstrating core concepts like base cases, recursive decomposition, and problem-solving strategies for beginners. Perfect for mastering the art of thinking recursively!

algomaster algorithm-challenges algorithms algorithms-and-data-structures coding data datastructures hackerrank java java-8 leetcode neetcode takeuforward w3schools

Last synced: 03 Jul 2026

https://github.com/fliplet/fliplet-widget-data-source-query

Data Source Query Provider

data provider widget

Last synced: 11 Apr 2025

https://github.com/vin20777/drone-data-layer

Drone Project Data Layer

csharp data drone layer software-design

Last synced: 18 May 2026

https://github.com/apparaomulpuri/readline

Explains the usage of the readLine function in Swift.

data fromkeyboard keyboard reading readline swift

Last synced: 29 Mar 2025

https://github.com/encoreshao/data-science

Data analysis examples using Jupyter Notebook and Python.

data dataanalysis encore jupyter-notebook

Last synced: 29 Mar 2025

https://github.com/phette23/nces-ipeds-archive

Download NCES IPEDS data

data datarescue ipeds nces

Last synced: 30 Jun 2026

https://github.com/kameronbrooks/datalys2-reporting

Datalys2 Reports allows you to create rich, interactive reports by simply defining a JSON configuration embedded in your HTML. It handles the layout, data visualization, and interactivity, so you don't need to write custom React code for every report.

data data-visualization html react

Last synced: 08 Apr 2026

https://github.com/nathanieliskandar26/data-analysis-project

This project demonstrates my ability to clean and analyze data using Python and SQL so far. The dataset used for this analysis focuses on general customer information. Through this project, I aimed to uncover meaningful insights and trends by cleaning the data and performing structured queries.

analysis data data-cleaning jupyter-notebook mysql mysql-database python

Last synced: 19 Apr 2026

https://github.com/shahules786/titanic-analysis

Different analyses of the Titanic accident (data from Kaggle)

analyze data titanic-kaggle

Last synced: 26 Jun 2025

https://github.com/jigyasag18/financial-risk-analysis-project

The Credit Card Financial Risk Analysis Dashboard is a real-time Power BI tool designed to provide insights into credit card transactions and customer demographics. It features interactive visualizations, efficient data processing, and actionable insights to support decision-making. Utilizing data from a SQL database, the dashboard tracks key metrics

data dataanalysis database datacleaning datapreprocessing dataprocessing datavisualization financial-analysis financialriskanalysis mysql powerbi sql statistical-analysis

Last synced: 06 Mar 2026

https://github.com/pawlo77/messenger-analyser

Repo for Data Visualization project, part of IAD study program at Faculty of Mathematics and Information Science, Warsaw University of Technology

data visualization

Last synced: 17 May 2026

https://github.com/ramtinsoltani/safe-cli

A simple Command-line Interface which encrypts and decrypts UTF-8 files using AES-256.

aes-256 cli data data-hook decryption encryption generator handlebars hooks markup partial partial-decryption password safe swap temp temporary tool

Last synced: 16 Apr 2026

https://github.com/injamul3798/cpp_stl-discussion

As we know, the STL is one of the most used tools in competitive programming.

data list map set structure vector

Last synced: 02 Apr 2025

https://github.com/amarlearning/exploring-the-evolution-of-linux

Data Analysis about the development of the Linux operating system by exploring its Git repository history.

cleaning-data data data-analysis data-wrangling datacamp first-commit git-history linux

Last synced: 12 May 2026

https://github.com/talitalobo/statistics-with-python

Repo about statistical concepts and (not always) their python implementation.

data data-science machine-learning statistics

Last synced: 11 Jan 2026

https://github.com/eyluldursun/data-science-project

This project involves a data science analysis conducted on the Obesity Data Set. The study explores factors influencing obesity, includes data visualization, and develops predictive models. The goal of the project is to gain insights to help prevent obesity.

data data-science obesity r rmarkdown

Last synced: 26 Jun 2025

https://github.com/mightymetrika/scdtb

Single Case Design Toolbox

data math r science statistics

Last synced: 04 Jan 2026

https://github.com/tomcardoso/journalism-data-intersection

A talk on working at the intersection of journalism and data science

data data-journalism journalism

Last synced: 15 May 2025

https://github.com/lukaszkn/data-software-engineering-interview-questions

Data and Software engineering interview questions

data engineering interview-questions python

Last synced: 20 Jul 2025

https://github.com/ahmad-ali-rafique/logistic-regression-modeling

An in-depth exploration of logistic regression models, including data cleaning, model building, and performance evaluation on various datasets.

accuracy confusion-matrix data dataanalytics logistic-regression logistic-regression-classifier machine-learning-algorithms mlmodels model modelling regression-models

Last synced: 11 Sep 2025

https://github.com/wilcotomassen/lorem-datum-core

Java based data generator for data simulation

data dataset generator java lorem-ipsum simulated-data

Last synced: 11 Jan 2026

https://github.com/kaizadp/bbwm_moisture

HOBO data for soil moisture - Bear Brook Watershed in Maine

data hobo-data soil-moisture

Last synced: 17 May 2026

https://github.com/akashlogics/street-data-tracking

Detect, track, and count the number of persons walking across the path(s) using YOLO. This Python project tracks people moving across predefined street zones.

analysis data excel newdataset object-detection opencv python python3 yolo

Last synced: 19 May 2026

https://github.com/bishtrishu/netflix_movies_dashboard

This project is a comprehensive dashboard for analyzing Netflix movies and shows. Using a combination of Power BI, Python, and Excel, this dashboard provides insights into various aspects of Netflix's content library.

ai artifical-intelligense dashboard data dataanalysis dataanalyst dataanalytics datacleaning datahandling datascience datavisualization excel machine-learning msexcel powerbi report

Last synced: 09 Feb 2026

https://github.com/tadiusfrank2001/pythonprojects

A compilation of fun Introduction to Python lab coding projects introducing the fundamentals of data science, databases, and Python libraries

data data-science databases gamedesign python pythonlibrarires sorting-algorithms sqlite string-manipulation

Last synced: 06 May 2026

https://github.com/publici/state-integrity-data

Data from a comprehensive assessment of state government accountability and transparency

data

Last synced: 04 Feb 2026

https://github.com/igor-starostenko/sabre

Slice your files like a champ with **sabre**

data golang package

Last synced: 28 Mar 2025

https://github.com/metapsy-project/data-panic-psyctr

Database of psychotherapy for panic disorder compared to control conditions

data

Last synced: 18 Mar 2026

https://github.com/rrwen/twitter2mongodb-cli

Command line tool for extracting Twitter data to MongoDB databases

api cli cmd command data database get interface line mdb media mongo mongod mongodb post social stream tool tweet twitter

Last synced: 06 May 2026

https://github.com/neurazum-ai-department/tumor-stages-dataset---v1

Synthetic MRI data generated by the 'HF' and 'Vbai' models based on real data.

brain data dataset datasets image mri neuroscience tumor tumor-segmentation

Last synced: 18 Mar 2026

https://github.com/ludreinsalvador/global-covid-19-data-analysis

Contains Power BI dashboards that visualize and analyze global COVID-19 cases, deaths, and vaccination trends using data from the World Health Organization (WHO). The project aims to provide insights into the pandemic's impact and vaccination progress worldwide through dynamic reports and advanced analytics.

analytics covid-19 covid19-data data data-analysis data-collection data-transformation data-visualization

Last synced: 26 Feb 2026

https://github.com/davidkhala/ai

GenAI index

data dify huggingface

Last synced: 27 Feb 2026

https://github.com/enescidem/twitter-topic-modeling

Topic modeling is an unsupervised method to identify topics in text. This project analyzes tweets from prominent Turkish accounts to uncover underlying themes in their shared content.

data data-science machine-learning nlp topic-modeling twitter x

Last synced: 10 Feb 2026

https://github.com/softloud/spunk

Nutritional interventions for male infertility: a systematic review and meta-analysis

cochrane data evisynth living

Last synced: 18 Mar 2026

https://github.com/azkarmoulana/winter-of-data-2019

:snowflake: :snowman: Winter of Data is coming..... :wolf:

data data-science machine-learning mathematics

Last synced: 05 Feb 2026

https://github.com/ekoepplin/dbt-bigquery-core

How to get data into BigQuery (or DuckDB) and set up dbt tests for SODA cloud monitoring

bigquery data data-quality dbt dlt duckdb gcp soda

Last synced: 06 May 2026

https://github.com/bastianolea/simel_mercado_laboral

Statistical data from SIMEL, obtained via web scraping

chile data genero laboral social tiempo

Last synced: 17 Jun 2026

https://github.com/bastianolea/minsal_suicidios

Cases of attempted suicide and completed suicide in Chile

chile comunas data genero salud tiempo

Last synced: 19 Jan 2026

https://github.com/dahsie/machine_learning_from_scratch

This project aims to implement some basic machine learning techniques (e.g. MinMaxScaler, StandardScaler, TF-IDF, PCA, Logistic Regression, LDA, KNN, Naive Bayes Classifier) using only Python, NumPy, and pandas. This will enable me to hone my data science skills.

classification clustering data data-processing datascience machienlearning nlp nltk numpy pandas python regression

Last synced: 04 May 2026

https://github.com/fiedsch/data_util

Miscellaneous utilities for data files, such as variable name lists

data helper management php

Last synced: 14 Jun 2025

https://github.com/unkaktus/pktconn

wrapper around io.ReadWriteCloser that implements gopacket's 'device'

connection data gopacket packet

Last synced: 29 May 2026

https://github.com/miozilla/snowden

snowden :snowman::video_game: : VR Game # Snowflake # Data Engineering # ELT

data elt engineering snowflake sql vr-game

Last synced: 11 Feb 2026

https://github.com/sweta-kaundilya/power-bi-learning-projects

This repository contains completed exercises while learning Power BI

data datavisualization dax powerbi powerquery

Last synced: 27 Feb 2026

https://github.com/praveendecode/retail-revenue-forecasting

Designed an end-to-end ML model pipeline, forecasting department-wide sales by accounting for holiday markdown effects, spanning data collection to inferencing.

azure collection data datapreprocessing docker exploratory-data-analysis feature-engineering featureimportance model modelbuilding modeldeployment modelselction python report tableau

Last synced: 16 Apr 2026

https://github.com/bastianolea/mineduc_desvinculacion

Incidence rates of student dropout (desvinculación) in primary and secondary education, by year, commune, and gender.

chile comunas data educacion social tiempo

Last synced: 10 Oct 2025

https://github.com/pbinkley/tweets-national-emergency-library

A twarc harvest of tweets related to Internet Archive's National Emergency Library (2020-03-23 to 2021-02-13)

data social

Last synced: 11 Feb 2026

https://github.com/code-str8/time-series-forecasting

Developing a model that effectively forecasts the unit sales of numerous items across various Favorita stores with precision.

data dataanalysis forcasting machine-learning time-series visualizations

Last synced: 31 Mar 2025

https://github.com/redgoose-dev/baguni

A web program for storing and browsing images

data explorer file management upload

Last synced: 14 Apr 2026

https://github.com/project-renard/test-data

Files for testing

data

Last synced: 27 Feb 2026

https://github.com/kaiepi/ra-annotations

Thread-safe static buffer

data type

Last synced: 13 Jul 2025

https://github.com/khalyomede/request

Function to validate request data for V.

data function request validate vlang

Last synced: 12 Feb 2026

https://github.com/vianneymi/amplifai

Amplifai is a package that allows you to transform your raw unstructured text into structured data in a few lines of code.

data data-mining extraction langchain llm pydantic

Last synced: 27 Feb 2026

https://github.com/fabsdevx/file-format-converter-handout

Data Engineering project for learning purposes. Credits to itversity

csv csv-import data data-engineering database pandas python

Last synced: 06 May 2026

https://github.com/deliprofesor/breast-cancer-detection-using-svm-with-smote-and-model-optimization

This project analyzes health and lifestyle factors influencing heart attack risk using statistical methods and machine learning, with Ridge Regression identified as the best predictive model.

classification data data-preprocessing data-science data-visualization gridsearchcv machine-learning python roc-curve smote svm

Last synced: 10 Apr 2025

https://github.com/sillyash/untappd-viz

A data visualisation page using public datasets and HTML/CSS/JS with D3.js.

beer beer-statistics data data-analysis data-visualization kaggle kaggle-dataset public-dataset school-project

Last synced: 18 May 2026

https://github.com/charlenry/python_data_science

My hands-on lab notebooks on Python for Data Science

analysis data dataframe jupyter kaggle matplotlib notebook numpy pandas pyplot python science seaborn visualisation

Last synced: 25 Jun 2026

https://github.com/juanpablodiaz/beertv

A Next.js full-stack app that displays funny beer TV ads

api-routes data next tailwindcss

Last synced: 07 May 2026

https://github.com/luminati-io/ZoomInfo-dataset-samples

A sample dataset of over 1000 ZoomInfo companies, extracted using the Bright Data API, ideal for market growth, lead generation, and market analysis.

b2b business companies data data-extraction database dataset datasets web-scraping zoominfo

Last synced: 09 Apr 2025