An open API service indexing awesome lists of open source software.

data

Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)

https://github.com/adadalshabab/data-engineering-gcp-project

An end-to-end modern data engineering project, including deployment of an ETL pipeline on Google Cloud Platform, using BigQuery for data analysis and Looker to generate an insight dashboard.

bigquery data data-science data-visualization databases dataengineering-a engineering etl-pipeline looker-studio powerbi

Last synced: 19 Jan 2026

https://github.com/tyriek-cloud/nyc-dca-etl

Created an ETL pipeline to merge two CSV files (converted to JSON) into a Parquet file using Azure Data Factory. The data was extracted from NYC Open Data (https://opendata.cityofnewyork.us/), and I created a Blob Container within an existing storage account.

azure azure-data-factory blob-storage data data-engineering etl-pipeline

Last synced: 21 Jan 2026

https://github.com/mikeschinkel/go-testdata-defaulter

Simple package for Go to set table-driven test data defaults, so that tables in tests need only include data that differs from the defaults.

data defaults package testing tests

Last synced: 13 Oct 2025

https://github.com/deepanshkhurana/facebook-birthdays

Python script to create a .csv from Facebook's Event Data to list Birthdays.

data facebook python

Last synced: 14 Oct 2025

https://github.com/tabarzin/dh

A collection of links to various resources on Digital Humanities

data digitalhumanities opensource

Last synced: 24 Jan 2026

https://github.com/polyee99/kaggle-titanic-data-analytics

Jupyter notebook that predicts which passengers survived or died in the tragic Titanic event.

data eda jupiter-notebook matplotlib numpy pandas python regression-analysis test-train-split visualization

Last synced: 05 Feb 2026

https://github.com/mominurr/fire-gas-leak-detection-system

A real-time fire prevention system integrating IoT sensors and computer vision to trigger evacuations.

ai computer-vision data datascience machine-learning ml python yolo

Last synced: 27 Jan 2026

https://github.com/jpcurada/exploralytics

A Python package for creating intermediate Plotly visualizations

data eda plotly python visualization

Last synced: 05 Feb 2026

https://github.com/datamine/yelp-date

Does being on a date impact the score on a Yelp review? Let's find out!

data ipython ipython-notebook pandas python python-2 yelp yelp-reviews

Last synced: 14 Apr 2026

https://github.com/intersystems-ib/workshop-smart-data-fabric

Learn the main ideas involved in developing a Smart Data Fabric using InterSystems IRIS

analytics data datafabric interoperability smart

Last synced: 14 Apr 2026

https://github.com/st-universe/data

The STU data assets

assets data stu

Last synced: 14 Mar 2026

https://github.com/jigyasag18/project-diwali-sales-analysis

This project analyzes retail sales data during the Diwali festival using exploratory data analysis (EDA) to identify buyer demographics and product preferences. The findings reveal that the primary purchasers are married women aged 26-35 from Uttar Pradesh, Maharashtra, and Karnataka, working in IT, Healthcare, and Aviation.

analysis data datapr datapro eda jupyter-notebook python realtimedata

Last synced: 01 Jun 2026

https://github.com/rizkipragustono/extract_from_excel

Excel Contact Data Parser with Country Code Formatting

data excel extract python transform

Last synced: 18 May 2026

https://github.com/poissonconsulting/klexdatr

An R package of data from the Kootenay Lake Exploitation Study

cran data fish kootenay-lake rstats

Last synced: 16 Oct 2025

https://github.com/tyriek-cloud/statistical-work-sample

The purpose of this study is to test whether a sample of people who have siblings is independent of a sample of people who hold an opinion on whether patients with incurable diseases should be allowed to die.

analysis data spss statistics t-test

Last synced: 22 Jan 2026

https://github.com/bdr-pro/streamlint

Ultra-cool Streamlit app where you can interact with widgets, see data in action, and even upload and download files

data streamlit

Last synced: 14 Apr 2026

https://github.com/vanduc1102/parse-stackoverflow-data

Parse Stack Overflow data

data parser stackoverflow

Last synced: 16 Oct 2025

https://github.com/bhemen/aave-data

Borrowing and lending data sets from the Aave protocol on Ethereum

aave borrow data ethereum lend python

Last synced: 05 Feb 2026

https://github.com/psgebeline/harvard-data-science

My work for the nine courses in Harvard's data science program, each with notes/assignments. Work in progress.

data linear-regression machine-learning modeling probability-theory r visualization wrangling

Last synced: 19 Oct 2025

https://github.com/parvezk/d3-fundamentals

D3 library API fundamentals

charts d3 data graphs visualization

Last synced: 19 Oct 2025

https://github.com/octoenergy/tentaclio-snowflake

A Python project containing all the dependencies for the Snowflake tentaclio schema.

data

Last synced: 20 Oct 2025

https://github.com/dilkushsingh/webscraping-with-selenium-and-beautifulsoup

Scraped a popular tech gadgets website using Selenium and BeautifulSoup, and performed data analysis on the scraped data.

beautifulsoup data datacleaning datagathering eda exploratory-data-analysis python selenium webscraping

Last synced: 24 Feb 2026

https://github.com/mohibmirza-py/email-verifier-script

Streamlit app to verify emails in bulk

ai analysis data streamlit

Last synced: 29 Apr 2026

https://github.com/politicaargentina/opinar

📈 ICG toolbox for R - Indice de Confianza en el Gobierno 🇦🇷 (Universidad Torcuato Di Tella)

argentina data political-science politics public-opinion

Last synced: 22 Oct 2025

https://github.com/robertoostenveld/dcn.dsc_62002071_01_114_v1

Simon task M/EEG data [Data set].

data datalad open-data

Last synced: 23 Jan 2026

https://github.com/shubhamsoni98/prediction-with-binomial-logistic-regression

To predict client subscription to term deposits and optimize marketing strategies by identifying potential subscribers.

binomial data data-science eda machine-learning matplotlib pipeline python scikit-learn seaborn sklearn sql visualization

Last synced: 06 Feb 2026

https://github.com/andrewl/danelaw

Geopackage containing the boundary of the Danelaw

data geospatial medieval viking

Last synced: 23 Jan 2026

https://github.com/sankooc/validatez

object validation for node

data validate

Last synced: 13 May 2026

https://github.com/brianlesko/r_data_science_stat5730

Written by Brian Lesko, this repository contains R scripts demonstrating data science topics, largely originating from study at Ohio State. Contents are written in RStudio using R Markdown files. As of 1/21/23, future projects concerning data science, statistics, and machine learning will be in Python in my machine learning repository.

data data-analysis flight-data ggplot2 olympics-data r-markdown tidyverse

Last synced: 23 Jan 2026

https://github.com/harmanveer-2546/reducing-data-entries

A way to delete data entries from a CSV/Excel file. For an Excel file, use excel instead of csv in the code.

csv data data-entry delete-data excel numpy pandas python

Last synced: 05 May 2026

https://github.com/mikeasilva/api_data

API Data makes working with open data APIs easy.

api data python

Last synced: 23 Jan 2026

https://github.com/byndyusoft/byndyusoft.data.relational

Relational abstractions for Byndyusoft.Data.Relational.

byndyusoft data dataaccess db relational-databases

Last synced: 25 Oct 2025

https://github.com/metapsy-project/data-psychosis-psyctr

Database of psychological interventions for schizophrenia and psychosis compared to control conditions.

data

Last synced: 16 Mar 2026

https://github.com/encelo/wetpaper-data

Data files for the WetPaper project

data icons ncine

Last synced: 23 Jan 2026

https://github.com/alsult/alsult

Aliia Sultanova Portfolio

data datascience programming python

Last synced: 23 Jan 2026

https://github.com/prateekmaj21/tableau-public-links

Tableau work as part of Data Visualization [AI&DS_205]

data data-visualization dataanalytics tableau-public

Last synced: 24 Jan 2026

https://github.com/mfurmanczyk/wh-sales

E-commerce analytics data warehouse ETL made with Apache Spark.

airflow data data-engineering data-warehouse kotlin python spark

Last synced: 24 Jan 2026

https://github.com/robertoostenveld/dccn.dsc_3015055.00_583_v1

The FieldTrip-SimBio Pipeline for EEG Forward Solutions [Data set].

data datalad open-data

Last synced: 24 Jan 2026

https://github.com/woctezuma/hidden-gems-data

Data available to compute regional rankings of hidden gems.

data hidden-gems steam steam-reviews

Last synced: 06 Feb 2026

https://github.com/semcod/code2llm

Python Code Flow Analysis Tool - Static analysis for control flow graphs (CFG), data flow graphs (DFG), and call graph extraction

ast cfg code code2data code2logic code2process data dfg diagram flow graphs llm

Last synced: 01 Jun 2026

https://github.com/eugenedakin/des-encryption-decryption

Encrypt and Decrypt text in Xojo using DES - Written in Native Xojo Language - Cross Platform

data data-encryption-standard decryption des encryption standard xojo

Last synced: 24 Feb 2026

https://github.com/atharvapathak/twitter_sentiment_analysis_project

Twitter sentiment analysis is the process of analyzing tweets posted on the Twitter platform to determine the overall sentiment expressed within them. It involves using natural language processing (NLP) and machine learning techniques to classify tweets.

api bag-of-words bert cnn data gbm nltk rnn spacy twitter

Last synced: 28 Jan 2026

https://github.com/dynamiatools/module-importer

DynamiaTools extension for importing data from Excel files

data dynamia excel import java zk

Last synced: 06 Feb 2026

https://github.com/cmdrvl/rvl

rvl reveals the smallest set of numeric changes that explain what actually changed between two datasets — or confidently tells you nothing changed.

cli csv data data-quality data-validation diff finance numerical-analysis open-source ops rust tooling

Last synced: 25 Feb 2026

https://github.com/spatialcurrent/go-flat

Recursively flatten a slice of slices.

big-data bigdata data

Last synced: 29 Jan 2026

https://github.com/spatialcurrent/go-counter

Simple library and command line program for generating frequency distributions.

big-data bigdata data

Last synced: 29 Jan 2026

https://github.com/nasa-pds/nucleus

Nucleus is a software platform used to create workflows for the Planetary Data System (PDS).

data ingestion pds planetary workflow

Last synced: 06 Feb 2026

https://github.com/audeering/datasets

Data cards for public audb datasets

audb audio data management

Last synced: 29 Jan 2026

https://github.com/aimin-nur/data-analyst-model-predictive

A data analyst project that aims to identify the characteristics of customers likely to accept marketing campaign offers.

analyst data mechine-learning visualization

Last synced: 29 Jan 2026

https://github.com/apoorv74/njdg-stats

Tracking data from the National Judicial Data Grid's (NJDG) district courts portal

data git-scraping judiciary law

Last synced: 29 Jan 2026

https://github.com/tpltnt/wir_vs_virus_hackathon_projects

A list of all projects / challenges for the WirVsVirus hackathon as CSV

coronavirus csv data hackathon raw-data

Last synced: 29 Jan 2026

https://github.com/apigear-io/template-qtcpp

QtC++ technology template

data plugin qml qt qt5

Last synced: 25 Feb 2026

https://github.com/dfsp-spirit/neuroimaging_testdata

Contains test data for unit tests, used in developing neuroimaging software. Ignore this. Licenses in the individual archives.

data unittesting

Last synced: 25 Feb 2026

https://github.com/chenxingqiang/modeling_tabular_data

# modeling_tabular_data | Keywords: modeling_tabular_data focusing on modeling_tabular_data.

data modeling tabular

Last synced: 30 Jan 2026

https://github.com/rosacarla/databases

Databases used in practical activities of the IGTI MBA Data Analytics program.

data data-analytics dataset

Last synced: 19 Mar 2026

https://github.com/bearaujus/bdatamatrix

Structured Tabular Data Management in Go

data go golang matrix

Last synced: 30 Jan 2026

https://github.com/bubblymaps/bubblymaps

The open source bubbler map. Mapping the world's water fountains. Open Code, Open Data.

bubbler bubbly-maps data fountain map open-source water

Last synced: 31 Jan 2026

https://github.com/opdev1004/totjs

A file format (not totally new) for managing human-readable data in a file. JS version.

data data-storage data-store database database-management hacktoberfest hactoberfest-accepted nodejs

Last synced: 31 Jan 2026

https://github.com/azmag/spm-dashboard

System Performance Measures are a selection of criteria used by the Department of Housing and Urban Development (HUD) to evaluate how local Continua of Care are performing.

data human-services spm

Last synced: 31 Jan 2026

https://github.com/pythoncoderunicorn/jamesbeardaward

A repo for James Beard Award data

data dataset jamesbeard

Last synced: 07 Feb 2026

https://github.com/drostlab/biodbretrievr

Retrieve and efficiently index entire biological sequence databases

biological-data biological-sequences data databasestoring retrieval

Last synced: 26 Feb 2026

https://github.com/mahtabranjbar/onlineshopping_analysis_dashboard

This project analyzes online shopper behavior using various machine learning models and EDA techniques.

dashboard data dataanalysis eda machine-learning streamlit

Last synced: 08 Feb 2026

https://github.com/gman-au/white-knight

Experimental .NET data abstraction using specification pattern

abstractions data datastore dotnet repository-pattern specification-pattern

Last synced: 17 Mar 2026

https://github.com/michaelfromyeg/lyrics

Lyric-store and API hosted on Git.

data lyrics

Last synced: 08 Feb 2026

https://github.com/suhanyujie/ai-driving-data

Some AI driving data

ai-driving-car data

Last synced: 08 Feb 2026

https://github.com/neurazum-ai-department/tumor-stages-dataset---v1

Synthetic MRI data generated by the 'HF' and 'Vbai' models based on real data.

brain data dataset datasets image mri neuroscience tumor tumor-segmentation

Last synced: 18 Mar 2026

https://github.com/dysnomia-studio/achieve-games-dump

Public dump of parts of the achieve.games database, including the Steam games list

data dump games steam steam-api steam-game steam-games

Last synced: 27 Feb 2026

https://github.com/enescidem/twitter-topic-modeling

Topic modeling is an unsupervised method to identify topics in text. This project analyzes tweets from prominent Turkish accounts to uncover underlying themes in their shared content.

data data-science machine-learning nlp topic-modeling twitter x

Last synced: 10 Feb 2026