data
Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)
- GitHub: https://github.com/topics/data
- Wikipedia: https://en.wikipedia.org/wiki/Data
- Related Topics: datum
- Last updated: 2026-06-27 00:07:33 UTC
https://github.com/abhijeetdasbakshi/ecommerce-insights
A Dockerized end-to-end project that combines unsupervised machine learning for customer segmentation with scalable data pipelines. It uses MongoDB for data ingestion, Scikit-learn for clustering, Airflow for orchestration, and Streamlit for interactive visualization — enabling actionable insights into e-commerce
airflow airflow-dags ci-cd-pipeline clustering dags data data-pipelines docker docker-compose docker-container dockerfile git great-expectations kafka mongodb pca-analysis postgresql pyspark t-sne umap-learn
Last synced: 04 Apr 2026
https://github.com/ffatahillah7/snowflake-data-governance-warehouses
Welcome to the Powered by Tasty Bytes - Zero to Snowflake Quickstart focused on Data Governance! Within this Quickstart we will learn about Snowflake Roles, Role Based Access Control and deploy both Column and Row Level Security that can scale with your business.
data data-governance snowflake
Last synced: 06 Jan 2026
https://github.com/rrwen/r-reference
Quick reference to learning R
analysis beginner data guide introduction learn r reference statistics stats syntax
Last synced: 02 Jul 2025
https://github.com/jpcadena/ventas-facturas
Sales with invoices
data data-analysis data-exploration data-extraction data-science excel feature-engineering matplotlib microsoft numpy pandas powerbi product-sales pylint python receipts sales
Last synced: 12 Apr 2026
https://github.com/45harry/potato_disease_classification
Potato Disease Classification - Training, REST API, and Front End for Testing
cnn-classification data data-science datapreprocessing deep-learning fastapi flaskapi frontend keras restapi tensorflow
Last synced: 12 Apr 2026
https://github.com/0xHericles/ufcg-geojson
GeoJSON file containing the blocks and buildings of the Federal University of Campina Grande.
data data-visualization geojson map open-source ufcg university
Last synced: 24 Mar 2025
https://github.com/doppelgunner/baby
A program for storing data just for fun
data doppelgunner java note storing
Last synced: 12 Jun 2026
https://github.com/ragibasif/bobdylan
Bob Dylan
bob-dylan csv data data-science data-visualization lyrics music python
Last synced: 03 Sep 2025
https://github.com/petzi53/repair
R Datasets of the Open Repair Alliance (ORA).
Last synced: 19 May 2026
https://github.com/cpietsch/breitband
Developer repo of breitband-berlin
d3js data threejs visualization
Last synced: 02 May 2026
https://github.com/infinitode/pyautoplot
PyAutoPlot is an open-source Python library designed to make dataset analysis much easier by generating helpful detailed plots using matplotlib. It automatically generates appropriate plots based on the dataset you feed it.
analysis automatic csv data dataset dataset-analysis generation matplotlib pandas plots plotting-in-python plotting-library python
Last synced: 16 Mar 2025
https://github.com/jigyasag18/movie-recommendation-system-project
This repository features a personalized movie recommendation system that offers tailored suggestions to users. It leverages a dataset of 5,000 English-language films and utilizes data processing, feature engineering, and a cosine similarity algorithm to analyze user preferences. The system includes an intuitive user interface for easy navigation.
data datacleaning datapreprocessing machine-learning machine-learning-algorithms python streamlit streamlit-webapp
Last synced: 28 May 2026
https://github.com/beriberikix/senml-zephyr
A codec for encoding and decoding Sensor Measurement Lists (SenML) for Zephyr
codec data iot senml sensor zephyr-rtos
Last synced: 24 Mar 2025
https://github.com/anuraganalog/twitter-data-analysis
My internship work from summer 2020
analysis data eda exploratory-data-analysis jupyter-notebook nlp spotle textblob twitter wordcloud
Last synced: 20 May 2026
https://github.com/eby8zevin/android-intent
Intent & Bundle - Android Studio
android android-development android-studio bundle data intent java xml
Last synced: 03 Sep 2025
https://github.com/aniruddha-biswas/shield-insurance-business-insights
Shield Insurance Business Insights
data data-visualization dataanalysis excel mysql powerbi sql
Last synced: 01 Apr 2025
https://github.com/giuleo129/dataanalysis
This folder contains two projects focused on data analysis and statistical learning using R, covering exploratory data analysis, modeling, and predictive techniques.
data data-analysis data-science statistical-learning
Last synced: 25 Jan 2026
https://github.com/yashaswitir28/yashaswitir28.github.io
This is my Portfolio Website
data data-analysis-python data-analyst data-cleaning data-science data-visualization excel html-css ms office365 portfolio-website powerbi python sql
Last synced: 29 May 2026
https://github.com/natanast/euroleaguebasketball
An R package providing data on Euroleague Basketball
Last synced: 01 Apr 2025
https://github.com/v-mayya/quantitative-analysis-data-dashboard
Quantitative survey data analysis using R
data data-analysis data-visualization flourish r
Last synced: 01 Apr 2025
https://github.com/suchi25sathavara/data-wrangling-with-r
Analyzing Road Accidents in Victoria, Australia
data r reporting rstudio wrangling-data
Last synced: 01 Apr 2025
https://github.com/suchi25sathavara/r-projects
R projects on real-world scenarios for data analysis
data data-analysis datavisualization r
Last synced: 01 Apr 2025
https://github.com/wraith13/systematic-metasyntactic-variables
A list that lets you express the existence of different series when using metasyntactic variables.
Last synced: 14 Jun 2025
https://github.com/armand-sauzay/datasets
Datasets for machine learning
ai data datasets machine-learning ml
Last synced: 18 Jan 2026
https://github.com/h-sutiwas/r2de-2025
This repository is related to the Road To Data Engineer Bootcamp by DataTH. It contains all related coursework, some mini projects and other resources within the field of Data Engineering.
data data-engineering data-visualization docker gcp pipeline spark
Last synced: 30 Apr 2026
https://github.com/pchaparro/search-engine
Full-stack search engine built from YouTube videos obtained through web scraping
data opensearch python python3 react scraper scraping scraping-websites search search-engine semantic-search sentence-transformers typescript website
Last synced: 17 Apr 2026
https://github.com/irsol/udacity-data-foundations-nd
data data-analysis data-visualization exel sql udacity udacity-data udacity-nanodegree
Last synced: 05 Mar 2026
https://github.com/lananolana/test_data_generator
Generate test data with Telegram bot in one click: random users, files, texts and credit cards.
credit-card data data-generation fake-data random telegram-bot test-data test-data-generator test-file-generator testing testing-tools text-generation user-generator
Last synced: 18 Jan 2026
https://github.com/alextanhongpin/node-github-api
📃 Sample GitHub API queries with Node.js for scraping purposes
Last synced: 06 May 2026
https://github.com/jacoblincool/moodle-export
A streamlined library for retrieving data from Moodle.
Last synced: 07 May 2025
https://github.com/purarue/scramble-history
Parses Rubik's Cube scramble history and solve times from cstimer.net, cubers.io, and TwistyTimer, then merges them to give you uniform averages, data, and graphs.
cstimer cubing data rubiks-cube speedsolving
Last synced: 11 Jun 2025
https://github.com/filiprokita/foldertoiso
Python script that converts a specified folder into an ISO.
automation command-line-interface command-line-tool compression cross-platform data file-system folder-to-iso iso iso-image iso-tool python python-cli python-script python3 shutil utility
Last synced: 24 Mar 2025
https://github.com/powersyang/visualization
Data visualization templates
Last synced: 24 Mar 2025
https://github.com/team-hydrogen/nasa-adc-data
All files relating to the computation of the data provided
data jupyter-notebook nasa-app-development-challenge
Last synced: 25 Mar 2025
https://github.com/filipnet/infoscreen
Arduino subscribes to values over MQTT and shows the information on an OLED I2C display
arduino data display i2c mqtt oled-display-ssd1306 visualization weather weatherstation
Last synced: 12 Apr 2026
https://github.com/keminghe/osu
Unofficial and publicly-available NPM data-package about The Ohio State University.
college data majors ohio-state organizations public students university unofficial
Last synced: 06 Jan 2026
https://github.com/jneidel/nationalities
Dataset of 100 common nationalities
data dataset json nationalities nationality opendata
Last synced: 25 Mar 2025
https://github.com/cosmos-loops/cosmos-data
Cosmos.Data is an inline project of the COSMOS LOOPS PROGRAMME that provides extensions for several SQL query, RMDB/ORM, and NoSQL components.
connection-pool data mysql mysqlconnector oracle postgresql sqlite sqlkata sqlserver transaction uow
Last synced: 12 Apr 2026
https://github.com/pawal/tldmonitor-ui-go
Web UI for TLDMonitor
analysis data dns go golang mongodb statistics webapp website
Last synced: 16 Jan 2026
https://github.com/sajjad425/missingvalue
This repository provides a guide on handling missing values in Python, covering identification methods, imputation techniques (mean, median, mode, fill, interpolation), advanced methods (KNN, multiple imputation), and best practices. It includes practical examples for both numerical and categorical data.
data data-analysis-python data-science missing-value-handling missing-value-imputation
Last synced: 04 Apr 2025
https://github.com/eva-kaushik/data-clustering
Clustering Accelerators for hard and soft clustering, including implementations of K-means, K-medoids, hierarchical clustering, fuzzy C-means, and Gaussian mixture models. Demonstrates text clustering using both hard and soft clustering algorithms.
clustering clustering-algorithm data datascience machine-learning-algorithms
Last synced: 09 Apr 2025
https://github.com/denisecase/620-mod6-web-scraping
Notes on how to get started scraping content from the web
beautifulsoup4 data mining python
Last synced: 11 Apr 2025
https://github.com/coko7/vegapull-records
Cards dataset for One Piece TCG
data dataset one-piece one-piece-card-game one-piece-tcg tcg
Last synced: 26 Feb 2025
https://github.com/survi218/angular-http-service
Client-server communication using the HTTP service in Angular
angularjs client-server communication data get http-client http-requests http-response http-server post
Last synced: 16 Mar 2025
https://github.com/plandes/datdesc
Describe and optimize data
data hyperparameter-optimization hyperparameter-tuning latex table
Last synced: 04 Sep 2025
https://github.com/getconversio/dig-the-data
Data visualizations for the Conversio blog
Last synced: 12 Apr 2026
https://github.com/tttardigrado/fq
Graphs for the MEDEA project
bokehplots data data-science dataanalysis pandas physics python3
Last synced: 12 Apr 2026
https://github.com/thewillyhuman/willyos-java
willyOS for java developers
collections data data-structures java os structures
Last synced: 12 Jun 2025
https://github.com/frer0t/userverse
Creating an API for data analysis
data data-analytics spring-boot users
Last synced: 12 Apr 2026
https://github.com/matheusafonseca/deploy-ml-models-with-streamlit-udemy
This repository is dedicated to storing the code developed during the "Machine Learning Model Deployment with Streamlit" course on Udemy. The course covers basic to advanced techniques for deploying machine learning models using Streamlit.
data data-science data-visualization interface joblib layout machine-learning optimization-algorithms python python3 sklearn sklearn-datasets sklearn-library sklearn-pipeline streamlit
Last synced: 19 Apr 2026
https://github.com/zcebeci/odetector
Outlier Detection Using Cluster Analysis
anomaly-detection cluster-analysis clustering clustering-methods data datapreparation datapreprocessing exception-handling fcm fraud-detection fuzzy-clustering novelty-detection outlier-detection outlier-removal outliers partitioning pcm r surprise-exploration
Last synced: 29 Oct 2025
https://github.com/astridlyre/offhand
A Random Data Generator Library for JavaScript.
data generator javascript library random typescript
Last synced: 20 May 2026
https://github.com/nikolatechie/spotify-playlist
Data pipeline that fetches songs played in the past 24 hours using the Spotify API and saves the data in an SQLite database. Scheduled to run daily using Apache Airflow.
apache-airflow api data data-engineering python spotify sql sqlite
Last synced: 30 Apr 2026
https://github.com/davorg/cookingvinyl
Web site with info about Cooking Vinyl records
cooking-vinyl data hacktoberfest music perl
Last synced: 02 Apr 2025
https://github.com/dahsie/machine_learning_from_scratch
This project aims to implement some basic machine learning techniques (e.g. MinMaxScaler, StandardScaler, TF-IDF, PCA, Logistic Regression, LDA, KNN, Naive Bayes Classifier) using only Python, NumPy, and pandas. This will help me hone my data science skills.
classification clustering data data-processing datascience machienlearning nlp nltk numpy pandas python regression
Last synced: 04 May 2026
https://github.com/rikiitokazu/dataprojects
Data analysis practice using SQL and Python
Last synced: 12 Apr 2026
https://github.com/jpb06/kubot-dal
data data-access-layer gulp-tasks mongodb typescript
Last synced: 12 Apr 2026
https://github.com/theanujsinha01/mcdonalds-customer-analysis
This project analyzes customer feedback data to understand what drives people to like or dislike McDonald’s. Using Python and data visualization tools in a Jupyter Notebook, we explore how different factors—such as taste, price, health, and visit frequency—affect customer satisfaction.
case-study data data-visualization dataanalysis
Last synced: 05 Sep 2025
https://github.com/veivel/f1-sentiment-analysis
A sentiment analysis project on tweets about Formula 1. To be reworked.
data f1 nlp-library nlp-machine-learning
Last synced: 04 Jul 2025
https://github.com/eng-gabrielscardoso/data-science-formation
Data science course walkthrough
data data-science data-visualisation google-colab google-colaboratory google-colaboratory-notebooks python r r-lang
Last synced: 28 Feb 2025
https://github.com/ournet/topics-data
Ournet topics data package
data ournet storage topic topics topics-data topics-storage
Last synced: 12 Jun 2025
https://github.com/bcongdon/nid-data
National Inventory of Dams Data
data datasette government-data
Last synced: 21 Apr 2026
https://github.com/naliferopoulos/datamining
Bring your own pickaxe.
aueb aueb-students data data-mining machine-learning machine-learning-algorithms mining random-forest
Last synced: 25 Jan 2026
https://github.com/stefanpietrusky/factsv2
Repository for the article in the online magazine TDS.
ai arxiv-papers beautifulsoup data flask-application gensim llama matplotlib ollama plotly pyldavis python selenium webdriver
Last synced: 09 Apr 2025
https://github.com/so-cool/junction
My solution to the University of Bristol "Bristol Journey Time" Data Challenge https://So-Cool.github.io/junction
competition data modelling timeseries
Last synced: 02 Apr 2025
https://github.com/etmendz/mendz.data.oracle
Provides a generic Mendz.Data-aware context for ADO.Net-compatible access to Oracle databases.
ado-net context data database datasettings mendz oracle
Last synced: 13 Apr 2026
https://github.com/shadmanshaikh/data-analysis-and-ml-work
All of my work in Data Analysis and Machine learning
analytics artificial-intelligence data machine-learning
Last synced: 05 Jul 2025
https://github.com/white-gecko/lineage-dump
RDF dump of the device information from the lineage wiki
Last synced: 28 May 2026
https://github.com/codegouvfr/codegouvfr-sources
🧢 Static web frontend for code.gouv.fr
bluehats codegouvfr data frontend
Last synced: 28 Feb 2025
https://github.com/raghavendranhp/youtube_data_harvesting
The "YouTube Data Analyzer" is a versatile tool for businesses and content creators, enabling them to gather, analyze, and harness valuable insights from multiple YouTube channels. With streamlined data collection, storage in MongoDB, migration to SQL, and a user-friendly Streamlit interface, it empowers users to make data-driven decisions
apiintegration data datacollection eda googleapi googleapiclient matplotlib mongodb mysql mysqlconnector numpy oops pandas pymongo python pythonoops sql sqlalchemy streamlit youtube-api
Last synced: 13 Apr 2026
https://github.com/grace-mengke-hu/redditpushshiftapi
This package collects Reddit datasets and organizes the data in a MongoDB database.
Last synced: 13 Jun 2025
https://github.com/sakshamarora07/blinkit-sales-report-power-bi
This dashboard provides Blinkit with insights to optimize its grocery delivery operations and understand customer preferences. It evaluates sales trends, outlet performance, and item categories to identify key areas for improvement. The interactive visuals allow detailed exploration of sales distribution, customer ratings, and product popularity.
data data-science dataanalytics datavisualization excel powerbi sql
Last synced: 08 Jan 2026
https://github.com/farhashaad/farhashaad98
This is a repository to showcase my skills, share projects and track my progress in Data Science related projects.
data data-visualization dataanalysis matplotlib pandas python seaborn sql tableau
Last synced: 24 Apr 2026
https://github.com/vatshayan/youtube-user-analysis
Analysis of YouTube users' choices and preferences
data data-analysis data-mining data-science data-visualization dataset machine-learning machine-learning-algorithms
Last synced: 05 Feb 2026
https://github.com/stdlib-js/ndarray-vector-uint32
Create an unsigned 32-bit integer vector (i.e., a one-dimensional ndarray).
constructor ctor data javascript ndarray node node-js nodejs stdlib structure types uint32 vec vector
Last synced: 25 Apr 2026
https://github.com/luminati-io/LinkedIn-dataset-samples
Sample dataset of 1001 LinkedIn companies, extracted via Bright Data API, featuring essential data points for competitive analysis and market insights.
data database dataset linkedin linkedin-api linkedin-data linkedin-dataset linkedin-scraper sample web-scraping
Last synced: 09 Apr 2025
https://github.com/deliprofesor/virtual-reality-in-education-impact-analysis-and-insights
This project examines the impact of Virtual Reality (VR) on education, focusing on its effects on student engagement, learning outcomes, and creativity. It uses data analysis techniques like descriptive statistics, correlation analysis, and clustering to assess VR's effectiveness in enhancing learning.
clustering data data-analysis data-science data-visualization exploratory-data-analysis hypothesis-testing machine-learning python regression-analysis virtual-reality
Last synced: 14 Jun 2025
https://github.com/mukul273/spring-data-rest-jpa-demo
Spring Data Rest JPA Demo
data jpa rest spring spring-boot spring-mvc
Last synced: 20 Apr 2026
https://github.com/nagipragalathan/linkedin_backup_datas
This repository contains the backup data from my previous LinkedIn account. Unfortunately, my old LinkedIn account was compromised and subsequently blocked by LinkedIn. As a result, I created a new account, but that too got blocked for reasons unknown to me.
backup blocked data linkedin linkedin-account memory nagipragalathan recovery storage
Last synced: 18 Jan 2026
https://github.com/jstafford5380/provausio.testing.generators
Generate fake data for testing and/or mocking
data fake-data generator testing
Last synced: 14 Jan 2026
https://github.com/rosette-api/mock-data
Mock data that is used for unit testing of the Babel Street Analytics bindings
data entity-extraction entity-level-sentiment entity-linking entity-relationship entity-resolution language-detection machine-learning mock-data morphology natural-language-processing nlp relation-extraction sentiment-analysis test-framework testing text-mining text-processing tokenization
Last synced: 04 Mar 2026
https://github.com/afolabi022/getting-and-cleaning-data-course-project
Tidy Dataset Creation for Human Activity Recognition. This repository contains the code and files for cleaning and transforming the Human Activity Recognition Using Smartphones dataset into a tidy format. The project demonstrates data wrangling skills in R, including merging datasets
data data-science datacleaning r
Last synced: 25 Mar 2025
https://github.com/elkingarcia11/mlb-gameday-obp-odds
Small Python script that pulls MLB team on-base percentage (OBP) for the current season, loads today’s schedule, and writes CSV files that list each team’s OBP edge against its opponent for the day. It also labels each side of a game as betting favorite, not favorite, or equal using American moneylines from ESPN’s public game data.
api csv data http https json mlb mlb-stats-api moneyline odds python rest sports urllib
Last synced: 30 May 2026
https://github.com/shadeglare/genum
ES Next tools for processing data in a LINQ-style manner
data linq processing typescript
Last synced: 13 Apr 2026
https://github.com/q-aware-labs/bias-insights
Bias detection project for the Chicago Face Database (CFD)
ai chicago-data-portal data data-science llm statistical-analysis
Last synced: 21 Jan 2026
https://github.com/primetdmomega/webscraper
A data web scraper that looks for jobs on Glassdoor.com
Last synced: 25 Mar 2025