An open API service indexing awesome lists of open source software.

data

Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)

https://github.com/cobluestars/dataherd-raika

"Dataherd-Raika is a library designed to simulate large-scale user behavior datasets. It takes a single user event (like a click or keyword input) and, by applying simple probability distributions and custom variables, expands it into a vast dataset."

big-data data data-generation data-generator data-science front-end javascript machine-learning npm-package simulator statistics typescript user-behavior user-experience

Last synced: 02 Jan 2026

https://github.com/alexandregazagnes/ghisa

ghisa (GitHub Import Statistic Analyzer) is free and open-source software, an app, and a Python package that helps you analyze the import statistics of your GitHub repositories.

analytics data dependencies git github github-api import package pypi python skills tool

Last synced: 27 Jun 2025

https://github.com/adrian-pasek-prv/data-modeling-with-cassandra

Create a data model in Apache Cassandra for a music streaming app

apache-cassandra data data-engineering data-modeling python

Last synced: 02 Jan 2026

https://github.com/beangreen247/osfetch-old.sh

Script that fetches system information and displays it to the user

247 bash bean beangreen247 data fetch green information neofetch neofetch-clone os script sh shell storage system tem zsh

Last synced: 02 Nov 2025

https://github.com/ibz-04/data-encryption

Encrypts and decrypts hospital patient data, such as audio and image files

data decryption encryption

Last synced: 23 Jul 2025

https://github.com/oguzgn/a-case-study-for-a-livestreaming-platform

This project aims to analyze livestream watch times of users across different regions. The goal is to identify the top 5 users with the highest watch time for each region. The analysis involves multiple SQL transformations to extract meaningful insights from the data.

bigquery data data-analysis data-modeling live-streaming sql

Last synced: 23 Jun 2025

https://github.com/bredalis/matplotlib

📊 Library to create graphs in Python 📊

data graphics librery matplotlib matplotlib-pyplot python

Last synced: 30 Mar 2025

https://github.com/vulcalien/vulcdataformat

Simple data storage system for Java.

data data-storage java serialization

Last synced: 25 Feb 2025

https://github.com/e-kotov/mapineqr

Access Mapineq inequality indicators via API

data demogrpahy r rstats socio-economic-indicators

Last synced: 06 Apr 2025

https://github.com/mierune/tinygrib2

(experimental) A tiny toolkit for parsing JMA's GRIB2 files.

data grib grib2 meteorology rust weather

Last synced: 27 Jun 2025

https://github.com/tobinchilongo/oop-school-library

This project consists of a Ruby script for a school library app. I implemented encapsulation and inheritance in Ruby by creating classes that represent the students and teachers in the school.

data database gemfile input-output preserve rspec-testing rubocop unit-test

Last synced: 02 May 2026

https://github.com/bhpcv252/dda-binapprox-on-fits

Using the binapprox algorithm to efficiently estimate the median of each pixel from a set of astronomy images in FITS files.

astronomy data median python

Last synced: 22 Mar 2025
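The binapprox idea can be sketched for the values of a single pixel as follows. This is a minimal Python sketch, not the repository's code. The name `binapprox_median` and the default of 1000 bins are assumptions. The method relies on the fact that the median always lies within one standard deviation of the mean, so only that range needs to be binned. Values below the range are just counted, and values above it are ignored.

```python
import math

def binapprox_median(values, bins=1000):
    """Approximate the median with binapprox (hypothetical helper, not the repo's API)."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return mean  # every value is identical
    lo, hi = mean - std, mean + std  # the median always lies in this range
    width = (hi - lo) / bins
    below = 0  # values left of the binned range are only counted
    counts = [0] * bins
    for v in values:
        if v < lo:
            below += 1
        elif v < hi:  # values at or above hi are ignored
            counts[min(int((v - lo) / width), bins - 1)] += 1
    # Walk the bins until the running count reaches the middle position.
    target = (n + 1) / 2
    running = below
    for b, c in enumerate(counts):
        running += c
        if running >= target:
            return lo + width * (b + 0.5)  # midpoint of the bin holding the median
    return hi - width / 2  # fallback: midpoint of the last bin
```

For a stack of FITS images, the same procedure would run independently for each pixel position. The error is bounded by the bin width, 2σ/bins, so more bins give a closer estimate at the cost of more memory.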

https://github.com/jensz12/uhc

Datapack for Minecraft 1.13+ UHC

data minecraft pack

Last synced: 21 Sep 2025

https://github.com/maxnowack/elastic-sync

Connector to sync MongoDB documents into an Elasticsearch index

data elasticsearch mongodb sync

Last synced: 20 Jan 2026

https://github.com/ferhatgec/tuc

TinyURL CLI: generate short links from the terminal.

data little python3 request script

Last synced: 18 Feb 2026

https://github.com/stdlib-js/ndarray-empty

Create an uninitialized ndarray having a specified shape and data type.

data empty javascript matrix ndarray node node-js nodejs stdlib structure types vector

Last synced: 14 May 2025

https://github.com/jen-uis/loan-status-prediction

This repository contains project materials for the Winter STAT 206 class, University of California, Riverside, A. Gary Anderson School of Management.

data data-analysis data-analytics data-cleaning data-visualization descriptive-analytics julia julia-language jupyter-notebook predictive-analytics predictive-modeling team-collaboration

Last synced: 02 Jan 2026

https://github.com/stefanpietrusky/facts

Repository for the article in the online magazine Data Science Collective.

ai arxiv-papers beautifulsoup data flask-application gensim llama matplotlib ollama plotly pyldavis python selenium webdriver

Last synced: 09 May 2026

https://github.com/umbaji/yodi

This is the official repository for Yodi, a speech recognition model for 8 words in Ewè. The yodi package is also useful for rapid inference on speech data, especially on the mini_speech datasets.

data data-visualization keras python3 speech-recognition tensorflow

Last synced: 12 Jan 2026

https://github.com/kingsley-ezenwaka/app-profile-data-analysis

A Python data analysis project that aims to propose an app profile based on an analysis of a Google Play Store dataset.

analysis data jupyter-notebook matplotlib pandas python seaborn

Last synced: 29 Apr 2026

https://github.com/canelmas/data-producer

Fake data producer for Kafka, the console, and HTTP endpoints

data fake-content fake-data fakerjs kafka kafka-producer

Last synced: 05 Apr 2025

https://github.com/priyanshubiswas-tech/aws-etl-pipeline-on-cloud-using-glue-athena-lambda-and-redshift

Serverless ETL pipeline on AWS using Glue, Lambda, Athena, and Redshift — automates data ingestion, transformation, and analytics with scalable, event-driven architecture.

athena aws aws-glue data data-engineering etl etl-pipeline lambda redshift

Last synced: 02 May 2026

https://github.com/davidgamero/gatech-covid-chart

Line chart showing COVID-19 cases per day at Georgia Tech

covid covid19 data gatech

Last synced: 04 Jul 2026

https://github.com/nitsc/spell-from-threebodytrilogy

Implements the full chain from Gaia stellar data to 3D visualizations, three-view drawings, three-view signals, and audio of those signals, along with their inversions. This project demonstrates the feasibility of Luo Ji's "spell" from "The Three-Body Problem" trilogy.

3d 3d-graphics astronomy astronomy-astrophysics audio audio-processing data data-science data-visualization gaia graph information-technology information-visualization numpy python python-3 python3 signal signal-processing visiualization

Last synced: 02 May 2026

https://github.com/priyanka7411/customer-flight-prediction-app-mlflow

A comprehensive project predicting flight prices and customer satisfaction using machine learning models, deployed through interactive Streamlit apps.

classification customer-satisfaction data data-cleaning data-visualization feature-engineering flight-price-prediction machine-learning mlflow python regression streamlit

Last synced: 12 May 2026

https://github.com/tushar2704/interview-quest

Interview-Quest is a comprehensive collection of interview questions and answers that can help you prepare for technical interviews. Whether you're a seasoned developer looking to brush up on your skills or a job seeker preparing for your next big opportunity, this repository aims to provide valuable resources to enhance your interview readiness.

artificial-intelligence data data-science interview interview-questions machine-learning

Last synced: 23 Jan 2026

https://github.com/eddybrando/peru-year-names

Directory of Peru's official year names

data json peru

Last synced: 23 Jul 2025

https://github.com/dhimmel/erc

Processing human Evolutionary Rate Covariation data

data erc evolution evolutionary-rate-covariation genes hetionet human rephetio

Last synced: 23 Jul 2025

https://github.com/cyberoctane29/cyclistic-bike-share--analyzing-rider-behavior

Analyzed Cyclistic's bike-share data to uncover usage differences between casual riders and annual members. Utilized SQL and MySQL for data processing, R for visualisation, and Kaggle for collaboration. Insights will guide marketing strategies to convert casual riders into annual members.

data dataanalysis dataanalytics database rlanguage rmarkdown spreadsheet sql

Last synced: 22 May 2026

https://github.com/tupizz/data-processing-pipeline-aws

This project is a serverless application built with the Serverless Framework, TypeScript, and AWS services. It provides an enrichment service that processes contact information and enriches it with additional data.

aws data pipeline serverless typescript

Last synced: 13 May 2026

https://github.com/phatdev12/diem-thi-tuyen-sinh-10-da-nang

List of grade-10 entrance exam scores in Đà Nẵng, 2023-2024

data data-science dataanalytics dataset json

Last synced: 28 Jun 2025

https://github.com/tbrowder/classfactory

Provides tools to create a data collection with classes to manipulate the persistent data.

class data persistent raku

Last synced: 04 Apr 2025

https://github.com/sarincr/basics-of-julia-programming-language

Julia is a high-level, high-performance, dynamic programming language. While it is a general purpose language and can be used to write any application, many of its features are well-suited for high-performance numerical analysis and computational science.

data data-analysis data-mining data-science data-visualization dataanalysis dataanalytics datascience julia julia-language julia-library julia-package julialang machine-learning

Last synced: 19 May 2026

https://github.com/ybelenko/openapi-data-mocker-server-middleware

PSR-15 HTTP Server Middleware to create mock responses from OpenAPI Schemas (OAS 3.0).

data fake faker middleware mock mocker oas oas3 openapi psr-15 swagger

Last synced: 15 Jun 2025

https://github.com/raigu/ordered-lists-sync

Library for synchronizing ordered data with a minimal number of insert and delete operations. Suitable for large data sets in isolated environments.

data lists ordering sync syncrhonization update

Last synced: 12 Jan 2026
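When both the source and target lists are sorted and hold unique keys, a single merge-style pass finds exactly the inserts and deletes that are needed, and no key is touched twice. The following is a hypothetical Python sketch of that idea. `diff_sorted` is an illustrative name, not the library's API, and the sketch assumes unique, mutually comparable keys.

```python
def diff_sorted(source, target):
    """Return (inserts, deletes) that turn sorted `target` into sorted `source`.

    Hypothetical sketch: assumes both lists are sorted and contain unique keys.
    """
    inserts, deletes = [], []
    i = j = 0
    while i < len(source) and j < len(target):
        if source[i] == target[j]:
            i += 1  # key present on both sides: nothing to do
            j += 1
        elif source[i] < target[j]:
            inserts.append(source[i])  # key missing from target
            i += 1
        else:
            deletes.append(target[j])  # key no longer in source
            j += 1
    inserts.extend(source[i:])  # leftover source keys are all new
    deletes.extend(target[j:])  # leftover target keys are all stale
    return inserts, deletes
```

Because matching keys are skipped, the result contains only keys that differ between the two lists. This set difference is the smallest possible set of changes, and the pass runs in linear time.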

https://github.com/real-veersandhu/cia-country-comparison

Data analysis system on the CIA World Factbook

data

Last synced: 25 Feb 2025

https://github.com/lunastev/wson-rust

WSON data serialization parser

data parser serialization

Last synced: 07 Apr 2025

https://github.com/sandravizz/global_inequality_story

Dataviz Project about Global Inequality

data data-visualization inequality

Last synced: 03 Jul 2025

https://github.com/kevinsames/spark-fuse

spark-fuse is an open-source toolkit for PySpark — providing utilities, connectors, and tools to fuse your data workflows together.

data databricks fabric pyspark python spark

Last synced: 08 May 2026

https://github.com/thomd/git-scrape-hacker-news

Scrape Hacker News metadata for data analysis

data data-science git-scraping hacker-news

Last synced: 16 Sep 2025

https://github.com/stdlib-js/array-base-any-by-right

Test whether at least one element in an array passes a test implemented by a predicate function, while iterating from right to left.

any array data generic javascript node node-js nodejs predicate some stdlib structure test types validate

Last synced: 14 Apr 2025
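The right-to-left search described above can be sketched in Python. This is not stdlib's JavaScript API. The function name `any_by_right` and the `(value, index)` predicate arguments are assumptions of this sketch. The search stops at the first element, counting from the end, for which the predicate returns true.

```python
def any_by_right(xs, predicate):
    """Return True if predicate(value, index) holds for some element, scanning right to left."""
    for i in range(len(xs) - 1, -1, -1):
        if predicate(xs[i], i):
            return True  # short-circuit on the first match from the right
    return False  # empty input or no element passed
```

Scanning from the right only matters when the predicate has side effects or is expensive and matches are expected near the end. The boolean result is the same in either direction.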

https://github.com/jonsafari/toy-data

Embeddable submodule of parallel/monolingual text data, for use in testing code and sanity checks

data language-data machine-translation nlp sanity-checks toy-data

Last synced: 06 Nov 2025

https://github.com/epogrebnyak/business-conditions-digest-2017

Replicate illustration from Business Conditions Digest

data economics

Last synced: 22 Mar 2025

https://github.com/aruneshbasak/python-dsa-problems-geeksforgeeks-160-days

I will post my daily solutions to Python DSA problems from GeeksforGeeks here!

algorithms-and-data-structures and data data-structures dsa python python3 structure

Last synced: 08 May 2025

https://github.com/qeeqbox/data-security

Safeguarding your personal information (How your info is protected)

data data-security infosecsimplified qeeqbox security

Last synced: 19 Mar 2026

https://github.com/qeeqbox/data-lifecycle-management

Data Lifecycle Management (DLM) is a policy-based model for managing data in an organization

data data-lifecycle-management infosecsimplified lifecycle management qeeqbox

Last synced: 07 Mar 2026

https://github.com/kerlossony/nested-formdata

Nested-FormData is a function designed to handle nested form data structures simply and efficiently. It helps manage complex form data, making it easier to work with forms that require hierarchical data.

data forms javascript nested-structures nextjs reactjs typescript

Last synced: 08 Mar 2026

https://github.com/sixarm/sixarm_ruby_fab

SixArm.com → Ruby → Fab gem to fabricate sample data for testing

data fabrication factory fake gem mock ruby

Last synced: 24 Jul 2025

https://github.com/oya163/corteva

Corteva Data Ingestion Pipeline

corteva data engineering etl

Last synced: 25 Jul 2025

https://github.com/shysolocup/stews

Stews is a Node.js package meant to make storing data easier by mixing parts from common data types.

aepl array arrays data datatypes html javascript js json map maps nodejs object objects package set sets stews

Last synced: 25 Jul 2025

https://github.com/stonecharioteer/renfield

Synchronize and Search through Hard Drives

catalogue data search storage synchronization

Last synced: 09 Feb 2026

https://github.com/patelabhi574/hotel_reservation_analysis

Analyzes data collected by a hotel to help the owner predict which segments are most profitable. It also identifies the booking patterns and trends seen over past years at different times of the year, including website pricing at peak times based on an availability index.

data data-visualization datamodeling looker-studio powerbi reporting sql-query sql-server

Last synced: 19 Feb 2026

https://github.com/public-health-scotland/waiting_times_clinical_prioritisation

This repository contains the Reproducible Analytical Pipeline (RAP) to produce the quarterly statistics on clinical prioritisation, part of the Stage of Treatment (SoT) publication.

data healthcare nhs public-health scotland shiny shiny-app treatment waiting-time

Last synced: 26 Jul 2025

https://github.com/incubrain/awesome-maharashtra-data

A collection of datasets specific to Maharashtra, India. WIP

ai artificial-intelligence data data-analysis data-science datasets maharashtra marathi

Last synced: 23 May 2026

https://github.com/joeyism/py-cifar10

This library was created to make CIFAR-10 data easy to use. It is a wrapper around the instructions given on the CIFAR-10 site.

cifar cifar-10 cifar10 data machine-learning machinelearning

Last synced: 30 Jul 2025

https://github.com/cworld1/novel-data

The data repository of novel analysis

analysis data novel

Last synced: 01 Feb 2026

https://github.com/connectomicslab/cmtklib-data

Datalad dataset that stores all data resources of the cmtklib module of Connectome Mapper 3 (https://github.com/connectomicslab/connectomemapper3).

brain data parcellation resources software

Last synced: 16 Jan 2026

https://github.com/gallo13/neuralnetworks-deeplearning-stats-classification

Descriptive Statistics, Classification and Analysis Using Python & Python Libraries (Assignment 1)

analysis data datasets deep-learning jupyter-notebook matplotlib neural-networks numpy pandas plotting python seaborn

Last synced: 17 Apr 2026

https://github.com/outofbedlam/tine

TINE: a data pipeline runner.

data pipeline

Last synced: 05 Oct 2025

https://github.com/dixslyf/nbparts

Unpack a Jupyter notebook into its sources, outputs and metadata.

data haskell jupyter jupyter-notebook nix nix-flake

Last synced: 05 Oct 2025

https://github.com/aniketkkajania/wassupanalyzer

WhatsAnalyzer is a powerful statistical analysis tool designed for analyzing WhatsApp chats. With the ability to process chat files exported from WhatsApp, this tool provides valuable insights by generating various plots and statistics.

data data-science datavisualization streamlit streamlit-webapp webapp whatsapp whatsapp-chat

Last synced: 25 Feb 2026

https://github.com/humbertocg18/pucrs-alest-i-2.3-2023.24

Assignments, projects, exercises, and classes completed in Java for the course Algorithms and Data Structures 1, a second-semester subject.

beecrowd beecrowd-solution-in-js beecrowd-solutions-in-java data data-structures datastructures-algorithms hashmap hashtable java-8 leetcode leetcode-javascript leetcode-solutions leetcodepra pucrs sorting-algorithms

Last synced: 29 Mar 2025

https://github.com/lamden/merk

A concise implementation of a merkle tree in Python.

crypto data hash merkle structure tree

Last synced: 27 May 2026
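A Merkle tree hashes its leaves, then hashes pairs of hashes level by level until a single root hash remains. The sketch below computes only that root. It uses SHA-256 and duplicates the last node on odd-sized levels (the convention Bitcoin uses). Both choices are assumptions and may differ from this repository's implementation. `merkle_root` is a hypothetical name.

```python
import hashlib

def _h(data: bytes) -> bytes:
    """SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Compute a Merkle root over a list of byte strings (hypothetical helper)."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [_h(leaf) for leaf in leaves]  # hash every leaf first
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        # Hash each adjacent pair to form the next level up.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Changing any leaf changes the root, so two parties holding the same root can confirm they hold the same data. One side effect of the duplication rule is that `[a, b, c]` and `[a, b, c, c]` produce the same root.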

https://github.com/stdlib-js/ndarray-base-empty-like

Create an uninitialized ndarray having the same shape and data type as a provided ndarray.

base data empty javascript matrix ndarray node node-js nodejs stdlib structure types vector

Last synced: 09 Mar 2026

https://github.com/tee8z/noaa-oracle

NOAA data oracle that can be queried from the browser and can attest to events for Bitcoin DLCs in the dlctix style

data duckdb-wasm noaa-weather parquet-files sql weather

Last synced: 17 Feb 2026

https://github.com/kirkalyn13/portfolio-dashboard-site

Portfolio site; originally a service provider metrics dashboard built with React.

dashboard data data-visualization react

Last synced: 15 Apr 2026

https://github.com/rrwen/twitter2mongodb

Module for extracting Twitter data to MongoDB databases

api data database geo get location mdb media mongo mongod mongodb oauth post rest sample social stream token tweet twitter

Last synced: 06 May 2026

https://github.com/themost-framework/memory

MOST Web Framework in-memory data adapter for testing environments

adapter data orm

Last synced: 01 Jul 2026

https://github.com/allianz/yukimi

Self-service Snowflake provisioning with built-in security and policy enforcement.

ai automation data security

Last synced: 05 Jun 2026

https://github.com/jigyasag18/gold-price-prediction-project-using-machine-learning

This repository contains a machine learning project focused on predicting gold prices (GLD) using historical stock market data, including indicators such as SPX, USO, SLV, and EUR/USD. The project implements a Random Forest Regressor for price forecasting, complete with data visualization, correlation analysis, and model evaluation metrics.

data dataset jupyter-notebook jupyter-notebooks machine-learning machinelearing machinelearningalgorithms machinelearningmodel machinelearningprojects matplotlib mlproject numpy pandas randomforestregressor seaborn

Last synced: 23 Jul 2025

https://github.com/woctezuma/download-steam-screenshots-data

Data consisting of Steam screenshots.

data steam steam-api

Last synced: 19 Feb 2026

https://github.com/abdul-rafay19/youngdevinterns_machine-learning_tasks

This internship offers hands-on exposure to real-world Machine Learning applications — from data visualization and preprocessing to model development, evaluation, and deployment. It focuses on real ML workflows, problem-solving, neural networks, and hyperparameter tuning — all within a collaborative, remote, and growth-oriented environment.

ai artificial-intelligence artificial-intelligence-algorithms artificial-neural-networks data data-visualization internship machine-learning machine-learning-algorithms machinelearning ml model model-development neural-network preprocessing programming-language python task tasks youngdevintern

Last synced: 29 Apr 2026

https://github.com/akatrevorjay/helm-nuke

Nukes all Helm releases as well as Tiller-owned k8s objects that may be left lying around.

all data destroy helm plugin

Last synced: 19 Sep 2025

https://github.com/frefrik/covid19norge-api

API for COVID-19 cases in Norway

api covid covid-19 covid19 data fastapi norge norway

Last synced: 10 May 2026

https://github.com/jinsyin/datagovernance

WeChat Official Account: 「数据之道」 ("The Way of Data")

data data-governance datagovernance governance

Last synced: 30 Jan 2026

https://github.com/satur-io/estoraje

Estoraje is a very simple distributed key-value storage system written in fewer than 800 lines of code. It is temporarily consistent, highly available, lightweight, and scalable, and it offers good performance.

data database distributed go golang key-value performance training

Last synced: 07 May 2026

https://github.com/aravind-selvam/bikeshare-company-analysis

Capstone project for the Google Data Analytics Professional Certificate program, analyzing a bike-sharing company

analytics business-analytics business-intelligence data data-analysis data-visualization dataanalytics google-data-analytics postgresql sql sql-server

Last synced: 22 Apr 2026

https://github.com/howtoquitvivek/ai-crop-yeild-prediction

AI-driven crop yield prediction and agricultural optimization system (SIH 2025)

2025 2026 ai crop-yeild data minor-project ml predcition python science sih

Last synced: 23 Apr 2026

https://github.com/quasilyte/phpcorpus

A collection of various PHP code; useful for PHP tool writers who want insight into what "real-world" PHP code looks like

analysis corpus data php php-corpus

Last synced: 04 Jul 2025

https://github.com/athul64/powerbi

Financial Reports Dashboard: This repository showcases a Financial Reporting Dashboard that visualizes key financial metrics and performance insights. The dashboard contains Monthly and Annual reports, allowing users to switch between the two views to analyze data at different intervals.

data data-an data-visualization dax dax-expression powerbi

Last synced: 23 Feb 2026

https://github.com/alexscigalszky/palabras-aleatorias-data

This package has a set of datasets of random words, animals, colors, jokes, onomatopoeias, and types

aleatorias data palabras random words

Last synced: 04 Oct 2025

https://github.com/andygol/osm-diff-state

CLI tool to search OSM diff state files

custom data openstreetmap planet replication

Last synced: 24 Apr 2026

https://github.com/vapourismo/binary-io

Read and write values of types that implement Binary from and to Handles

data haskell haskell-library io parsing

Last synced: 28 Mar 2025

https://github.com/horisystems/uk_ev_data_analysis

Analysis of Electric Vehicle charging infrastructure in the United Kingdom.

data data-science electric-vehicles ev python uk united-kingdom

Last synced: 12 Jan 2026

https://github.com/divithraju/divith-aju-hadoop-pyspark-pipeline

This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.

apache-hadoop-framework apache-spark bigdata client data database dataengineering dataingestionframework datapreprocessing documentation ecommerce-platform hdfs pipeline project project-repository pyspark python3 software-engineering

Last synced: 27 Jan 2026

https://github.com/dbriane208/omdena-apprenticeship-project

This is part of my contribution to the Omdena apprenticeship program.

data data-science feature-engineering machine-learning

Last synced: 14 Mar 2026