An open API service indexing awesome lists of open source software.

data

Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)

https://github.com/utrechtuniversity/momentum-dataflow

Repository for publishing a website about the data management practices of the Momentum project

data datageneration datamanagement

Last synced: 27 Feb 2026

https://github.com/vatshayan/songs-datasets

Datasets for Songs and Music for Dancing, Emotional, Happy and scenic view

1000dataset classfication csv data datapackage datapackages dataset datasets excel free freedata freedatasets genre machine music sgenre song songs

Last synced: 18 Mar 2026

https://github.com/abhinavrobinson/mc-community-world

Minecraft community world data.

data minecraft server world

Last synced: 27 Feb 2026

https://github.com/nxank4/an-augment

A Python library for advanced and novel data augmentation, combining traditional techniques like cropping and blurring with state-of-the-art generative AI methods such as style transfer, image inpainting, and latent space interpolation. It boosts data diversity for robust machine learning applications.

computer-vision data data-augmentation data-augmentation-strategies data-augmentation-techniques generative-ai image image-processing synthetic-data

Last synced: 10 Mar 2026

https://github.com/shubhamsoni98/project_using_knn

This project applies the K-Nearest Neighbors (KNN) algorithm to predict iPhone purchases based on customer data. Using features like age, salary, and previous purchase behavior, the KNN model classifies customers into buyers and non-buyers.

anaconda analytics data data-science eda knn knn-classification machine-learning-algorithms predict project python scikit-learn tableau

Last synced: 03 Jan 2026

https://github.com/alextanhongpin/node-github-api

:page_with_curl: sample github api queries with nodejs for scraping purposes

data github-api nodejs

Last synced: 06 May 2026

https://github.com/azkarmoulana/winter-of-data-2019

:snowflake: :snowman: Winter of Data is coming..... :wolf:

data data-science machine-learning mathematics

Last synced: 05 Feb 2026

https://github.com/nisanth2004/springboot-kafka-real-world-project-wikimedia

A project that uses Apache Kafka to stream and process Wikimedia data.

async broker communication data java kafka message real-time real-time-analytics springboot wikimedia

Last synced: 14 May 2026

https://github.com/fuzzt/location-analyzer

The Location Data Analyzer is a Spring Boot application that offers insights on location data, such as counting locations by type, calculating average ratings, and identifying the most reviewed and incomplete entries. It features a simple frontend (HTML, CSS, JavaScript) and is deployed on Render.

analysis api average css data deployment docker fetch-api frontend html javascript location maven ratings render restful-api reviews spring-boot techstack

Last synced: 11 Apr 2026

https://github.com/jigyasag18/project-diwali-sales-analysis

This project analyzes retail sales data during the Diwali festival using exploratory data analysis (EDA) to identify buyer demographics and product preferences. The findings reveal that the primary purchasers are married women aged 26-35 from Uttar Pradesh, Maharashtra, and Karnataka, working in IT, Healthcare, and Aviation.

analysis data datapr datapro eda jupyter-notebook python realtimedata

Last synced: 01 Jun 2026

https://github.com/sushmashreeps/python

This repository showcases a comprehensive Python project, demonstrating expertise in backend development, data analysis, and machine learning. Built with Python 3.x, the project utilizes popular libraries like Django, Flask, NumPy, pandas, and scikit-learn. The project features efficient data processing, robust API integration, and scalable archite

api data data-science dataanalysis datavisualization game gamedeveloment python

Last synced: 12 May 2026

https://github.com/zulfachafidz/telco_churn_insight_customer_loss_prediction_with_random_forest_and_decision_tree-algorithms

The main problem in the business world is customer churn, or losing customers, especially in the telecommunications industry, which experiences very tight competition. To overcome this problem, an analysis was carried out to help the company understand how many customers have the potential to switch providers.

data data-science data-visualization dataanalysis dataanalyst dataanalytics datadrivenwithdataprovider decision-tree decision-tree-classifier decision-trees random-forest random-forest-classifier

Last synced: 01 May 2026

https://github.com/nabilaagha/chest-x-ray-medical-diagnosis-using-deep-learning

This project uses deep learning to classify chest X-ray images for disease detection. It involves data preprocessing, pre-trained CNN models, and the ChestX-ray8 dataset to enhance medical diagnostics with AI.

computer-vision data data-processing deep-learning juypter-notebook medical-image-processing x-ray-images

Last synced: 15 Dec 2025

https://github.com/oliver021/helppad-net

Versatile .NET Toolkit: A Comprehensive Set of Miscellaneous Helpers, Classes, and Utilities

assert async checks cryptographic-algorithms data date dotnet fluent functional functional-programming hash helpers parallel pipe pipeline pointers review supports tasks

Last synced: 15 Jun 2026

https://github.com/purarue/scramble-history

Parses Rubik's Cube scramble history and solve times from cstimer.net, cubers.io, and twistytimer, then merges them to give you uniform averages, data, and graphs

cstimer cubing data rubiks-cube speedsolving

Last synced: 11 Jun 2025

https://github.com/guilyx/airplane-booking

Simple airline ticket reservation program.

algorithms data linked-list

Last synced: 25 Jun 2025

https://github.com/chowington/bg-counter-tools

A set of tools that can pull data from Biogents BG-Counter smart mosquito traps and convert them into a Darwin Core compliant format.

bg-counter biogents darwin-core data internet-of-things mosquito-prevalence population-dynamics

Last synced: 10 Oct 2025

https://github.com/quantumudit/test-store-data-analysis

This repository showcases a web scraper with a pipeline structure for efficient data extraction and transformation from websites. The tool can be tailored to support data analysis and informed decision-making.

data data-visualization dataanalytics python python-webscraping webscraper webscraping-data

Last synced: 11 Apr 2026

https://github.com/vidushibhadana/eda-on-nyc-taxi-data

Conducting an Exploratory Data Analysis (EDA) on New York City taxi data and visualizing it through countplots, distribution plots (displot), and histograms using Python and its libraries.

data data-visualization jupyter-notebook matplotlib numpy pandas python seaborn

Last synced: 11 Apr 2026

https://github.com/etmendz/mendz.data

Provides tools and guidance for creating data access contexts and repositories.

context data datasettings entity-framework mendz paginginfo repository resultinfo

Last synced: 11 Jun 2025

https://github.com/ournet/ournet.web.data

Ournet web data module

data ournet web

Last synced: 04 Apr 2025

https://github.com/lorenzobloise/client_satisfaction_classification

Jupyter notebook in which satisfaction from clients reviewing European hotels is analyzed using Python libraries such as pandas, numpy and scikit-learn. Various classification models are trained and tested to predict client satisfaction.

classification data data-mining jupyter jupyter-notebook machine-learning pandas python

Last synced: 21 Feb 2026

https://github.com/jorgeatgu/dataset-elecciones-28a

Datasets generated from El País's general elections dataset

28a data elecciones2019 elections spain

Last synced: 16 May 2026

https://github.com/team-hydrogen/nasa-adc-data

All files relating to the computation of the data provided

data jupyter-notebook nasa-app-development-challenge

Last synced: 25 Mar 2025

https://github.com/nadahamdy217/movies-data-etl-using-python-gcp

Developed a comprehensive ETL pipeline for movie data using Python, Docker, and a GCP Pub/Sub emulator. Successfully processed and published the data in a local Docker environment, showcasing advanced data engineering skills.

analytics data data-engineering data-ingestion data-preparation data-preprocessing data-processing data-project docker etl etl-pipeline gcp matplotlib matplotlib-pyplot numpy pandas pubsub python scipy seaborn

Last synced: 06 Jan 2026

https://github.com/igor-starostenko/sabre

Slice your files like a champ with **sabre**

data golang package

Last synced: 28 Mar 2025

https://github.com/webobite/fact-chatbot

Fact chatbot is a project that reads a text file containing a set of facts ahead of time and answers the user's questions with useful information based on the facts provided in that file.

chatbot chatgpt chatgpt3 data data-visualization embedding-vectors generativeai nlp

Last synced: 04 May 2026

https://github.com/ikcede/hinge-data-ts-wrapper

Typescript wrapper for exported Hinge data

data hinge typescript

Last synced: 10 Oct 2025

https://github.com/zeh237/superstore-data-analytics

A Flask-based data analytics project on the Superstore dataset, using Flask, pandas, SQL, and Python

analytics data data-analysis data-science data-visualization flask python superstore

Last synced: 04 May 2025

https://github.com/mapi-developer/dapo

Simple, zero-dependency tabular data manipulation and analysis for Python.

dapo data python

Last synced: 06 Mar 2026

https://github.com/jigyasag18/fake-news-prediction-app

The Fake News Prediction App Repository offers a machine learning project that focuses on classifying news articles as fake or real. It uses a dataset of 20,000 articles and employs methods such as TF-IDF vectorization and lemmatization, achieving ~95% classification accuracy with a random forest classifier model.

data datapreprocessing logistic-regression machine-learning machine-learning-algorithms numpy pandas prediction stemming streamlit streamlit-webapp vectorization

Last synced: 11 Apr 2026

https://github.com/vaibhavmojidra/data-structures---hashtable-using-array-and-linked-list-in-java

A hash table is a data structure that stores data in an associative manner. In a hash table, data is stored in an array format, where each data value has its own unique index value. Access to data becomes very fast if we know the index of the desired data. Thus, it becomes a data structure in which insertion and search operations are very fast irrespective of the size of the data. A hash table uses an array as its storage medium and uses a hashing technique to generate the index at which an element is to be inserted or located.

arrays data data-structures hashing java linked-list mojidra vaibhav vaibhav-mojidra vaibhavmojidra

Last synced: 12 Apr 2025
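
The hash table idea described in this entry can be sketched in a few lines. This is a minimal illustration only, not code from the repository (which is written in Java): the class name `HashTable`, its methods, and the default capacity are assumptions, and plain Python lists stand in for the linked-list chains.

```python
class HashTable:
    """Illustrative hash table using an array of buckets with separate chaining."""

    def __init__(self, capacity=8):
        # The backing array; each slot holds a chain of (key, value) pairs.
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        # The hashing step: map a key to a slot in the backing array.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # new key: append to the chain

    def get(self, key, default=None):
        # Only the one chain selected by the hash is searched.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = HashTable()
table.put("apple", 3)
table.put("pear", 5)
table.put("apple", 7)
print(table.get("apple"), table.get("pear"), table.get("plum"))  # 7 5 None
```

Lookups stay fast only while the chains stay short; a fuller implementation would grow the array as it fills, which this sketch omits.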

https://github.com/entropyorg/p5-data-testimage

:notebook::camera: interface for retrieving test images

cpan data image-analysis

Last synced: 29 May 2026

https://github.com/myavuzokumus/simplemodelcomparison

This application allows users to upload datasets, handle missing data, and compare different imputation strategies.

algorithm data data-science machine-learning preprocessing streamlit

Last synced: 21 Jan 2026

https://github.com/bkataru/spotigo

AI-powered local music intelligence platform with a task runner server core to retrieve and backup spotify account data to storage(s) at set periodic intervals

ai backup cron data go intelligence local-llm music ollama rag runner spotify task-runner tool-calling

Last synced: 16 Jan 2026

https://github.com/halyusa16/basic-sql-employee-analysis

This project focuses on analyzing employee data through querying, performing table joins to connect related information, aggregating salary statistics, and using subqueries to extract meaningful insights.

data data-analytics data-exploration database mysql self-project sql

Last synced: 16 May 2026

https://github.com/keminghe/osu

Unofficial and publicly-available NPM data-package about The Ohio State University.

college data majors ohio-state organizations public students university unofficial

Last synced: 06 Jan 2026

https://github.com/ssiarhei115/cv-dbase-analysis

Analysis of the HeadHunter CV database

analysis cv data data-science resume

Last synced: 09 Apr 2025

https://github.com/jneidel/animal-names

Dataset of 100 common animal names

animals data dataset json names opendata

Last synced: 25 Mar 2025

https://github.com/jneidel/nationalities

Dataset of 100 common nationalities

data dataset json nationalities nationality opendata

Last synced: 25 Mar 2025

https://github.com/rrwen/poster-gisci-osmol

Conference poster and short paper titled "Outlier Detection in OpenStreetMap Data using the RandomForest Algorithm and Variable Contributions" for the GIScience Conference in 2016

2016 algorithm conference contribution data detection forest gis giscience learn machine open openstreetmap osm outlier paper poster random short variable

Last synced: 03 Apr 2025

https://github.com/rrwen/geohoods-to

Geospatial dataset of 1000+ aggregated variables for neighbourhoods in Toronto, ON, CA

csv data dataset geo geojson gis neighborhood neighborhoods neighbourhood neighbourhoods open open-data toronto toronto-open-data

Last synced: 25 Jun 2025

https://github.com/mierune/tinybufr

[WIP] A Rust library for decoding BUFR (Binary Universal Form for the Representation of meteorological data) files.

bufr data meteorology rust weather wmo

Last synced: 15 May 2025

https://github.com/itrauco/data-dirtying-tool

A simple command-line tool to generate dirty data and do common data things in Google Cloud

data data-analysis data-engineering data-ops data-pipeline data-science data-visualization data-wrangling dirty-data google-cloud machine-learning

Last synced: 24 Feb 2025

https://github.com/illustratien/toolphd

Make your analysis simple and reproducible

academic analysis data phd publications r r-package reproducible-research scientific

Last synced: 26 Jan 2026

https://github.com/acovaci/orbit

ORBIT: an Open source Rust-based implementation of a data Build Tool, inspired by DBT

cargo clap-rs data data-warehouse dbt rust rust-lang tokio-rs

Last synced: 16 Mar 2025

https://github.com/4strium/data-analysis-france

🔍 Script allowing the analysis and recovery of precise data on French cities.

cities csv data france python research

Last synced: 01 Apr 2025

https://github.com/codehard8/web-scrapping

This repository provides a web scraping project built with BeautifulSoup, along with related files

beutifulsoup data houses-for-sale python3 requests-library-python webscraping

Last synced: 01 Jul 2025

https://github.com/yash-rewalia/airbnb_eda_pandas

The goal of the project is to gather and analyze detailed information about the different listings in order to provide insights into hosts and property prices in a particular area, according to your preferences, room type, and number of reviews.

data data-cleaning data-insights data-preprocessing data-visualization matplotlib numpy pandas python seaborn

Last synced: 11 Apr 2026

https://github.com/elimu-ai/ml-event-simulator

🤖 Simulation of learning events and assessment events

data learning-analytics machine-learning ml

Last synced: 28 Feb 2025

https://github.com/jpcurada/exploralytics

A Python package for creating intermediate Plotly visualizations

data eda plotly python visualization

Last synced: 05 Feb 2026

https://github.com/jonprice99/regional-election-analysis

An analysis of election results in Allegheny County using Pandas and other Python libraries to better understand the voting habits, practices, and preferences of regional voters.

data data-visualization election-analysis election-data pandas python

Last synced: 05 May 2026

https://github.com/sajjad425/missingvalue

This repository provides a guide on handling missing values in Python, covering identification methods, imputation techniques (mean, median, mode, fill, interpolation), advanced methods (KNN, multiple imputation), and best practices. It includes practical examples for both numerical and categorical data.

data data-analysis-python data-science missing-value-handling missing-value-imputation

Last synced: 04 Apr 2025

https://github.com/abshek7/big-data

A repository documenting theory and practical notes on big data computing.

big-data data data-engineering mapreduce pyspark

Last synced: 15 Jun 2025

https://github.com/eva-kaushik/data-clustering

Clustering Accelerators for hard and soft clustering, including implementations of K-means, K-medoids, hierarchical clustering, fuzzy C-means, and Gaussian mixture models. Demonstrates text clustering using both hard and soft clustering algorithms.

clustering clustering-algorithm data datascience machine-learning-algorithms

Last synced: 09 Apr 2025

https://github.com/cassandrajm/reddit-dashboard

INTERACTIVE DASHBOARD: Analyzing Political Discourse on Reddit: A Multi-Faceted NLP Approach to Toxicity, Bias, and Political Stance

capstone data data-analysis data-science politics python reddit

Last synced: 09 Apr 2025

https://github.com/ahmad-mtr/prjkt_exam_schedule_test

I hate scrolling through a list of 300+ courses in my uni exam schedule, so I'm creating this. This is a test, btw :)

data strings-manipulation

Last synced: 11 Apr 2025

https://github.com/berviantoleo/bervdata

Temporary data definition as db

data

Last synced: 01 Apr 2025

https://github.com/yash-chauhan-dev/sf_analytics

Business teams often rely on data analysts to extract insights using SQL. This tool eliminates that dependency by bridging the gap between humans and data using AI.

aiml analytics data dbt langchain llm python snowflake streamlit

Last synced: 07 May 2026

https://github.com/chocolateboy/corrigenda

Corrections, addenda, and deltas for data that's wrong on the Internet

addenda api corrections corrigenda data json json-data

Last synced: 27 Mar 2025

https://github.com/writetome51/pagination-page-info

Intended to help a separate Paginator class paginate data. Specifically, this class contains the properties `itemsPerPage` and `totalPages`, which will be used by other classes.

batch data javascript paginate pagination typescript

Last synced: 09 May 2026
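
How the two properties named in this entry typically relate can be sketched as below. This is a hedged illustration in Python, not the package's TypeScript API: the function names `total_pages` and `page_slice` are hypothetical, and the entry does not say how the package itself computes `totalPages`.

```python
def total_pages(total_items: int, items_per_page: int) -> int:
    """Number of pages needed to show total_items, rounding up."""
    if items_per_page < 1:
        raise ValueError("items_per_page must be at least 1")
    # Integer ceiling division: a partly filled last page still counts.
    return (total_items + items_per_page - 1) // items_per_page


def page_slice(items, page_number: int, items_per_page: int):
    """Return the items on a 1-based page number (empty if out of range)."""
    start = (page_number - 1) * items_per_page
    return items[start:start + items_per_page]


print(total_pages(301, 25))                 # 13
print(page_slice(list(range(10)), 2, 4))    # [4, 5, 6, 7]
```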

https://github.com/mnazlukhanyan/da-projects

Portfolio of data analytics projects showcasing my skills, abilities, and experience

data data-vizualisation hypothesis-tests matplotlib pandas plotly postgresql product-metrics python scipy seaborn sql visualization

Last synced: 11 Apr 2026

https://github.com/alexmcvay/uber-data

Uber SQL clone

data data-visualization sql

Last synced: 19 Jan 2026

https://github.com/vladandreitoma/igisol_jyvaskyla_xept_experimental_campaign

A simulation toolkit together with data analysis for the Xe&Pt Exotic Nuclei Generation experiment @ Jyvaskyla, December 2022. Helping Dr. Paul Constantin with simulation development. The simulation uses Geant4, provided by CERN. Data analysis uses ROOT, also by CERN. Both are C++ based. The job distributors that run the simulation are written in Perl.

analysis architecture-design cplusplus data oop oop-principles pearl simulations

Last synced: 05 Sep 2025

https://github.com/campiohe/geomask

A very simple lib for creating geometric masks from spatial data using regular grids.

climate data gis weather

Last synced: 30 Dec 2025

https://github.com/nukopian/shell-series

Extract columns from tabular text

automation data shell

Last synced: 11 Oct 2025

https://github.com/jensostertag-archive/charts.js

A JavaScript Plugin to draw Charts to visualize Data and Statistics on Websites

charts data javascript statistics webapplication

Last synced: 22 Jun 2025

https://github.com/davorg/towerbridge

When is Tower Bridge lifting?

data hacktoberfest london perl web-scraping

Last synced: 25 Oct 2025

https://github.com/jcloh98/rental-property-finder

A web scraper that helps users find rental properties by automatically gathering and organizing listings from various websites to discover available homes and apartments.

data headless-browser node scraper scraping web

Last synced: 17 May 2026

https://github.com/merekat/hb-passiv-income

A calculator that uses historical data for various assets to estimate the passive income a user can expect based on their inputs.

assets data datajournalism etf passive-income treasury

Last synced: 19 Jul 2025

https://gitlab.com/sean-c/pdf_rules

Turn PDFs into CSVs by defining rules

Data Cleaning automation data data parsing

Last synced: 14 Apr 2025

https://github.com/hivesolutions/crossline

Simple event piping and storage infrastructure

counter data opencv warehouse

Last synced: 15 May 2026

https://github.com/kaiepi/ra-annotations

Thread-safe static buffer

data type

Last synced: 13 Jul 2025

https://github.com/GAMELEIRA/studies-database

This repository aims to hold any and all scripts for learning and practicing SQL and NoSQL database management. The project consolidates the main fundamentals and principles, along with practice exercises and project development.

data database mongodb mssql mysql nosql sql

Last synced: 03 May 2025

https://github.com/code-str8/time-series-forecasting

Developing a model that effectively forecasts the unit sales of numerous items across various Favorita stores with precision.

data dataanalysis forcasting machine-learning time-series visualizations

Last synced: 31 Mar 2025

https://github.com/ppmim/papi4k_old2

PAPI: the PANIC data reduction pipeline

data near-infrared pipeline processing

Last synced: 23 Jun 2025

https://github.com/dcmox/moxymapper

Data mapping made easy

data json mapper

Last synced: 15 May 2026