data
Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)
- GitHub: https://github.com/topics/data
- Wikipedia: https://en.wikipedia.org/wiki/Data
- Related Topics: datum
- Last updated: 2026-06-29 00:07:49 UTC
https://github.com/utrechtuniversity/momentum-dataflow
Repository for publishing a website about the data management practices of the Momentum project
data datageneration datamanagement
Last synced: 27 Feb 2026
https://github.com/vatshayan/songs-datasets
Datasets of songs and music categorized as dancing, emotional, happy, and scenic view
1000dataset classfication csv data datapackage datapackages dataset datasets excel free freedata freedatasets genre machine music sgenre song songs
Last synced: 18 Mar 2026
https://github.com/abhinavrobinson/mc-community-world
Minecraft community world data.
Last synced: 27 Feb 2026
https://github.com/nxank4/an-augment
A Python library for advanced and novel data augmentation, combining traditional techniques like cropping and blurring with state-of-the-art generative AI methods such as style transfer, image inpainting, and latent space interpolation. It boosts data diversity for robust machine learning applications.
computer-vision data data-augmentation data-augmentation-strategies data-augmentation-techniques generative-ai image image-processing synthetic-data
Last synced: 10 Mar 2026
https://github.com/shubhamsoni98/project_using_knn
This project applies the K-Nearest Neighbors (KNN) algorithm to predict iPhone purchases based on customer data. Using features like age, salary, and previous purchase behavior, the KNN model classifies customers into buyers and non-buyers.
anaconda analytics data data-science eda knn knn-classification machine-learning-algorithms predict project python scikit-learn tableau
Last synced: 03 Jan 2026
https://github.com/woctezuma/humble-choice-leak
Retrieve leaks for Humble Choice.
data datamining humble-bundle humble-bundle-games humble-bundle-leak humble-choice humble-choice-leak humblebundle humblebundle-leak leak leaks steam steam-games
Last synced: 27 Mar 2025
https://github.com/alextanhongpin/node-github-api
📃 Sample GitHub API queries with Node.js for scraping purposes
Last synced: 06 May 2026
https://github.com/azkarmoulana/winter-of-data-2019
❄️ ⛄ Winter of Data is coming... 🐺
data data-science machine-learning mathematics
Last synced: 05 Feb 2026
https://github.com/nisanth2004/springboot-kafka-real-world-project-wikimedia
A project that uses Apache Kafka to stream and process Wikimedia data.
async broker communication data java kafka message real-time real-time-analytics springboot wikimedia
Last synced: 14 May 2026
https://github.com/fuzzt/location-analyzer
The Location Data Analyzer is a Spring Boot application that offers insights on location data, such as counting locations by type, calculating average ratings, and identifying the most reviewed and incomplete entries. It features a simple frontend (HTML, CSS, JavaScript) and is deployed on Render.
analysis api average css data deployment docker fetch-api frontend html javascript location maven ratings render restful-api reviews spring-boot techstack
Last synced: 11 Apr 2026
https://github.com/stuffbymax/game-dependencies-db
data database game games-list json mit-license
Last synced: 15 May 2026
https://github.com/jigyasag18/project-diwali-sales-analysis
This project analyzes retail sales data during the Diwali festival using exploratory data analysis (EDA) to identify buyer demographics and product preferences. The findings reveal that the primary purchasers are married women aged 26-35 from Uttar Pradesh, Maharashtra, and Karnataka, working in IT, Healthcare, and Aviation.
analysis data datapr datapro eda jupyter-notebook python realtimedata
Last synced: 01 Jun 2026
https://github.com/ompreetham/data-structures
binary-search-tree c data data-structures datastructures graph linked-list list stack structures tree
Last synced: 25 Mar 2025
https://github.com/sushmashreeps/python
This repository showcases a comprehensive Python project, demonstrating expertise in backend development, data analysis, and machine learning. Built with Python 3.x, the project utilizes popular libraries like Django, Flask, NumPy, pandas, and scikit-learn. The project features efficient data processing, robust API integration, and scalable archite
api data data-science dataanalysis datavisualization game gamedeveloment python
Last synced: 12 May 2026
https://github.com/zulfachafidz/telco_churn_insight_customer_loss_prediction_with_random_forest_and_decision_tree-algorithms
The main problem in the business world is customer churn, or losing customers, especially in the telecommunications industry, which experiences very tight competition. To overcome this problem, an analysis was carried out to help the company understand how many customers have the potential to switch providers.
data data-science data-visualization dataanalysis dataanalyst dataanalytics datadrivenwithdataprovider decision-tree decision-tree-classifier decision-trees random-forest random-forest-classifier
Last synced: 01 May 2026
https://github.com/nabilaagha/chest-x-ray-medical-diagnosis-using-deep-learning
This project uses deep learning to classify chest X-ray images for disease detection. It involves data preprocessing, pre-trained CNN models, and the ChestX-ray8 dataset to enhance medical diagnostics with AI.
computer-vision data data-processing deep-learning juypter-notebook medical-image-processing x-ray-images
Last synced: 15 Dec 2025
https://github.com/oliver021/helppad-net
Versatile .NET Toolkit: A Comprehensive Set of Miscellaneous Helpers, Classes, and Utilities
assert async checks cryptographic-algorithms data date dotnet fluent functional functional-programming hash helpers parallel pipe pipeline pointers review supports tasks
Last synced: 15 Jun 2026
https://github.com/purarue/scramble-history
Parses Rubik's Cube scramble history and solve times from cstimer.net, cubers.io, and twistytimer, then merges them to give uniform averages, data, and graphs
cstimer cubing data rubiks-cube speedsolving
Last synced: 11 Jun 2025
https://github.com/rishabhmathur06/data_analysis-netflix
data data-analytics data-science matplotlib-pyplot numpy pandas python seaborn
Last synced: 12 Apr 2026
https://github.com/guilyx/airplane-booking
Simple airline ticket reservation program.
Last synced: 25 Jun 2025
https://github.com/chowington/bg-counter-tools
A set of tools that can pull data from Biogents BG-Counter smart mosquito traps and convert it into a Darwin Core compliant format.
bg-counter biogents darwin-core data internet-of-things mosquito-prevalence population-dynamics
Last synced: 10 Oct 2025
https://github.com/quantumudit/test-store-data-analysis
This repository showcases a web scraper with a pipeline structure for efficient data extraction and transformation from websites. The tool can be adapted to support data analysis and informed decision-making.
data data-visualization dataanalytics python python-webscraping webscraper webscraping-data
Last synced: 11 Apr 2026
https://github.com/vidushibhadana/eda-on-nyc-taxi-data
An Exploratory Data Analysis (EDA) of New York City taxi data, visualized through countplots, distribution plots (displot), and histograms using Python and its libraries.
data data-visualization jupyter-notebook matplotlib numpy pandas python seaborn
Last synced: 11 Apr 2026
https://github.com/etmendz/mendz.data
Provides tools and guidance for creating data access contexts and repositories.
context data datasettings entity-framework mendz paginginfo repository resultinfo
Last synced: 11 Jun 2025
https://github.com/afnanenayet/kaggle-titanic
The classic Kaggle Titanic data science challenge
backprop backpropagation classification classifier data forest kaggle layer learn mlp multi numpy pandas perceptron random science scikit sklearn titanic
Last synced: 12 Apr 2026
https://github.com/nouraalgohary/data-scientist-with-python
This repo comprises my solutions for the tasks assigned in the course.
data data-science data-visualization datacamp datacamp-course datacamp-data-science datacamp-exercises datacamp-solutions-python datascience python
Last synced: 15 Jun 2025
https://github.com/lorenzobloise/client_satisfaction_classification
Jupyter notebook in which satisfaction from clients reviewing European hotels is analyzed using Python libraries such as pandas, numpy and scikit-learn. Various classification models are trained and tested to predict client satisfaction.
classification data data-mining jupyter jupyter-notebook machine-learning pandas python
Last synced: 21 Feb 2026
https://github.com/jorgeatgu/dataset-elecciones-28a
Datasets generated from the El País general elections dataset
28a data elecciones2019 elections spain
Last synced: 16 May 2026
https://github.com/team-hydrogen/nasa-adc-data
All files relating to the computation of the data provided
data jupyter-notebook nasa-app-development-challenge
Last synced: 25 Mar 2025
https://github.com/nadahamdy217/movies-data-etl-using-python-gcp
Developed a comprehensive ETL pipeline for movie data using Python, Docker, and a GCP Pub/Sub emulator. Successfully processed and published the data in a local Docker environment, showcasing advanced data engineering skills.
analytics data data-engineering data-ingestion data-preparation data-preprocessing data-processing data-project docker etl etl-pipeline gcp matplotlib matplotlib-pyplot numpy pandas pubsub python scipy seaborn
Last synced: 06 Jan 2026
https://github.com/igor-starostenko/sabre
Slice your files like a champ with **sabre**
Last synced: 28 Mar 2025
https://github.com/robson-python/academic-performance
Project to evaluate students' academic performance.
csv-import data data-analysis data-science jupyter-notebook machine-learning matplotlib pandas python scikit-learn seaborn vscode
Last synced: 12 Apr 2026
https://github.com/stefanocoretta/aelfric-relatives
data old-english research-project
Last synced: 23 Feb 2026
https://github.com/climate-resource/input4mips_validation
Validation of input4MIPs data
cmip data forcing input4mips validation
Last synced: 20 Jan 2026
https://github.com/sanogotech/open-source-data-stack
modern open source data stack
airbyte airflow data data-science dbt docker postgresql python
Last synced: 11 Apr 2026
https://github.com/webobite/fact-chatbot
Fact chatbot is a project that reads a text file containing facts ahead of time and answers the user with relevant information based on the facts provided in that file.
chatbot chatgpt chatgpt3 data data-visualization embedding-vectors generativeai nlp
Last synced: 04 May 2026
https://github.com/ikcede/hinge-data-ts-wrapper
TypeScript wrapper for exported Hinge data
Last synced: 10 Oct 2025
https://github.com/zeh237/superstore-data-analytics
A Flask-based data analytics project on the Superstore dataset, using Flask, pandas, SQL, and Python
analytics data data-analysis data-science data-visualization flask python superstore
Last synced: 04 May 2025
https://github.com/mapi-developer/dapo
Simple, zero-dependency tabular data manipulation and analysis for Python.
Last synced: 06 Mar 2026
https://github.com/jigyasag18/fake-news-prediction-app
The Fake News Prediction App repository offers a machine learning project that classifies news articles as fake or real. It uses a dataset of 20,000 articles and employs methods such as TF-IDF vectorization and lemmatization, achieving ~95% classification accuracy with a random forest classifier model.
data datapreprocessing logistic-regression machine-learning machine-learning-algorithms numpy pandas prediction stemming streamlit streamlit-webapp vectorization
Last synced: 11 Apr 2026
https://github.com/vaibhavmojidra/data-structures---hashtable-using-array-and-linked-list-in-java
A hash table is a data structure that stores data associatively. Data is stored in an array, where each value is placed at an index derived from its key. Access is very fast when the index of the desired data is known, so insertion and search are fast on average regardless of the size of the data. A hash table uses an array as its storage medium and a hash function to generate the index at which an element is inserted or located.
arrays data data-structures hashing java linked-list mojidra vaibhav vaibhav-mojidra vaibhavmojidra
Last synced: 12 Apr 2025
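The hash table description above can be illustrated with a minimal sketch. The repository itself is written in Java; this Python version is an illustration only, not the repository's code. The class name `ChainedHashTable`, its method names, and the default bucket count are hypothetical, and the sketch assumes separate chaining, where each array slot holds a list of key-value pairs.

```python
class ChainedHashTable:
    """Toy hash table: a fixed array of buckets, each a list of (key, value) pairs."""

    def __init__(self, num_buckets=8):
        # The array is the storage medium; each slot starts as an empty chain.
        self._buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # The hash step: map the key to an array index.
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # new key: append to the chain

    def get(self, key, default=None):
        # Only the one bucket selected by the hash is scanned.
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ChainedHashTable()
table.put("apple", 3)
table.put("pear", 5)
table.put("apple", 4)  # replaces the earlier value for "apple"
```

Lookups stay fast on average because only one bucket is searched; when many keys collide in the same bucket, the cost grows with the length of that chain.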
https://github.com/entropyorg/p5-data-testimage
📓📷 Interface for retrieving test images
Last synced: 29 May 2026
https://github.com/myavuzokumus/simplemodelcomparison
This application allows users to upload datasets, handle missing data, and compare different imputation strategies.
algorithm data data-science machine-learning preprocessing streamlit
Last synced: 21 Jan 2026
https://github.com/bkataru/spotigo
AI-powered local music intelligence platform with a task runner server core that retrieves and backs up Spotify account data to one or more storage targets at set intervals
ai backup cron data go intelligence local-llm music ollama rag runner spotify task-runner tool-calling
Last synced: 16 Jan 2026
https://github.com/piazzai/chess-variants
Analysis of Lichess variant games
analysis chess chess-variant chess-variants data data-mining data-science data-visualization lichess lichess-database logistic-regression logit-model pgn r r-code r-scripts regression regression-analysis shell shell-scripting
Last synced: 15 May 2026
https://github.com/halyusa16/basic-sql-employee-analysis
This project focuses on analyzing employee data through querying, performing table joins to connect related information, aggregating salary statistics, and using subqueries to extract meaningful insights.
data data-analytics data-exploration database mysql self-project sql
Last synced: 16 May 2026
https://github.com/keminghe/osu
Unofficial, publicly available npm data package about The Ohio State University.
college data majors ohio-state organizations public students university unofficial
Last synced: 06 Jan 2026
https://github.com/ssiarhei115/cv-dbase-analysis
Analysis of the HeadHunter CV database
analysis cv data data-science resume
Last synced: 09 Apr 2025
https://github.com/inphyt/quantitative_single_neuron_modeling_competition_2007
Data for the Quantitative Single-Neuron Modeling Competition (2007).
bayesian-inference bayesian-methods bayesian-optimization bayesian-statistics challenge competition computational-neuroscience data electrophysiological-data electrophysiology model-calibration modeling neuronal-models neuroscience neuroscience-competition parameter-estimation simulation simulation-modeling single-neuron-model uncertainty-quantification
Last synced: 26 Jul 2025
https://github.com/jneidel/nationalities
Dataset of 100 common nationalities
data dataset json nationalities nationality opendata
Last synced: 25 Mar 2025
https://github.com/rrwen/poster-gisci-osmol
Conference poster and short paper titled "Outlier Detection in OpenStreetMap Data using the RandomForest Algorithm and Variable Contributions" for the GIScience Conference in 2016
2016 algorithm conference contribution data detection forest gis giscience learn machine open openstreetmap osm outlier paper poster random short variable
Last synced: 03 Apr 2025
https://github.com/rrwen/geohoods-to
Geospatial dataset of 1000+ aggregated variables for neighbourhoods in Toronto, ON, CA
csv data dataset geo geojson gis neighborhood neighborhoods neighbourhood neighbourhoods open open-data toronto toronto-open-data
Last synced: 25 Jun 2025
https://github.com/mierune/tinybufr
[WIP] A Rust library for decoding BUFR (Binary Universal Form for the Representation of meteorological data) files.
bufr data meteorology rust weather wmo
Last synced: 15 May 2025
https://github.com/saisurajmatta/data-warehousing-and-advanced-data-analytics
Data Analytics Project: Analyzed Promotions and Provided Tangible Insights to Sales Director
data data-analysis data-architecture data-flow-analysis data-modeling data-pipeline data-segmentation data-visualization data-warehousing docker etl etl-pipeline mssql sql tableau
Last synced: 17 May 2026
https://github.com/itrauco/data-dirtying-tool
A simple command-line tool to generate dirty data and perform common data tasks in Google Cloud
data data-analysis data-engineering data-ops data-pipeline data-science data-visualization data-wrangling dirty-data google-cloud machine-learning
Last synced: 24 Feb 2025
https://github.com/illustratien/toolphd
Make your analysis simple and reproducible
academic analysis data phd publications r r-package reproducible-research scientific
Last synced: 26 Jan 2026
https://github.com/acovaci/orbit
ORBIT: an Open source Rust-based implementation of a data Build Tool, inspired by DBT
cargo clap-rs data data-warehouse dbt rust rust-lang tokio-rs
Last synced: 16 Mar 2025
https://github.com/pawal/tldmonitor-ui-go
Web UI for TLDMonitor
analysis data dns go golang mongodb statistics webapp website
Last synced: 16 Jan 2026
https://github.com/codehard8/web-scrapping
This repository provides a web scraping project built with BeautifulSoup, along with related files
beutifulsoup data houses-for-sale python3 requests-library-python webscraping
Last synced: 01 Jul 2025
https://github.com/yash-rewalia/airbnb_eda_pandas
The goal of the project is to gather and analyze detailed information about the different entries in order to provide insights into the host and price of a property in a particular area, according to your preferences, room type, and number of reviews.
data data-cleaning data-insights data-preprocessing data-visualization matplotlib numpy pandas python seaborn
Last synced: 11 Apr 2026
https://github.com/elimu-ai/ml-event-simulator
🤖 Simulation of learning events and assessment events
data learning-analytics machine-learning ml
Last synced: 28 Feb 2025
https://github.com/jpcurada/exploralytics
A Python package for creating intermediate Plotly visualizations
data eda plotly python visualization
Last synced: 05 Feb 2026
https://github.com/jonprice99/regional-election-analysis
An analysis of election results in Allegheny County using Pandas and other Python libraries to better understand the voting habits, practices, and preferences of regional voters.
data data-visualization election-analysis election-data pandas python
Last synced: 05 May 2026
https://github.com/sajjad425/missingvalue
This repository provides a guide on handling missing values in Python, covering identification methods, imputation techniques (mean, median, mode, fill, interpolation), advanced methods (KNN, multiple imputation), and best practices. It includes practical examples for both numerical and categorical data.
data data-analysis-python data-science missing-value-handling missing-value-imputation
Last synced: 04 Apr 2025
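As a small illustration of the mean, median, and mode imputation named in the description above, here is a sketch that uses only the Python standard library. It is not taken from the repository, which may well rely on pandas or scikit-learn instead; the `impute` function, its `strategy` parameter, and the sample columns are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import mean, median, mode


def impute(values, strategy="mean"):
    """Replace None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    # Pick the statistic by name; mean/median suit numbers, mode also suits categories.
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


ages = [20, None, 30, 40, None]           # numerical column
cities = ["Oslo", None, "Oslo", "Rome"]   # categorical column

filled_ages = impute(ages, "mean")
filled_cities = impute(cities, "mode")
```

Mode is the usual choice for categorical columns, since mean and median are undefined for labels.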
https://github.com/abshek7/big-data
A repository documenting theory and practical notes on big data computing.
big-data data data-engineering mapreduce pyspark
Last synced: 15 Jun 2025
https://github.com/eva-kaushik/data-clustering
Clustering Accelerators for hard and soft clustering, including implementations of K-means, K-medoids, hierarchical clustering, fuzzy C-means, and Gaussian mixture models. Demonstrates text clustering using both hard and soft clustering algorithms.
clustering clustering-algorithm data datascience machine-learning-algorithms
Last synced: 09 Apr 2025
https://github.com/coko7/vegapull-records
Cards dataset for One Piece TCG
data dataset one-piece one-piece-card-game one-piece-tcg tcg
Last synced: 26 Feb 2025
https://github.com/jatin-mehra119/paris_housing_price-kaggle-
Paris Housing Price Kaggle Competition
data data-visualization kaggle-competition machine-learning numpy pandas predictive-modeling scikit-learn
Last synced: 29 Apr 2026
https://github.com/cassandrajm/reddit-dashboard
INTERACTIVE DASHBOARD: Analyzing Political Discourse on Reddit: A Multi-Faceted NLP Approach to Toxicity, Bias, and Political Stance
capstone data data-analysis data-science politics python reddit
Last synced: 09 Apr 2025
https://github.com/ahmad-mtr/prjkt_exam_schedule_test
I hate scrolling through a list of 300+ courses in my uni exam schedule, so I'm creating this. This is a test btw :)
Last synced: 11 Apr 2025
https://github.com/yassin522/health-insurance-cross-sell-prediction
Prediction of Vehicles Health Insurance
data data-analysis data-science machine-learning plotly python
Last synced: 15 May 2026
https://github.com/yash-chauhan-dev/sf_analytics
Business teams often rely on data analysts to extract insights using SQL. This tool eliminates that dependency by bridging the gap between humans and data using AI.
aiml analytics data dbt langchain llm python snowflake streamlit
Last synced: 07 May 2026
https://github.com/chocolateboy/corrigenda
Corrections, addenda, and deltas for data that's wrong on the Internet
addenda api corrections corrigenda data json json-data
Last synced: 27 Mar 2025
https://github.com/amethyst-php/sku
amethyst amethyst-package api data laravel sku
Last synced: 17 May 2026
https://github.com/writetome51/pagination-page-info
Intended to help a separate Paginator class paginate data. Specifically, this class contains the properties `itemsPerPage` and `totalPages`, which will be used by other classes
batch data javascript paginate pagination typescript
Last synced: 09 May 2026
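The relationship between the two properties named above can be sketched briefly. The package itself is TypeScript, and this Python sketch is not its API: the class name `PageInfo` and the `total_items` field are assumptions made for illustration, while `items_per_page` and `total_pages` mirror the `itemsPerPage` and `totalPages` properties mentioned in the description.

```python
import math
from dataclasses import dataclass


@dataclass
class PageInfo:
    total_items: int      # hypothetical: size of the full data set
    items_per_page: int   # mirrors `itemsPerPage`

    @property
    def total_pages(self) -> int:
        # Mirrors `totalPages`: round up so a partial last page still counts.
        if self.items_per_page < 1:
            raise ValueError("items_per_page must be at least 1")
        return math.ceil(self.total_items / self.items_per_page)


info = PageInfo(total_items=45, items_per_page=10)
```

A separate paginator could read `info.total_pages` to validate a requested page number before slicing the data.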
https://github.com/mnazlukhanyan/da-projects
Portfolio of data analytics projects showcasing my skills, abilities, and experience
data data-vizualisation hypothesis-tests matplotlib pandas plotly postgresql product-metrics python scipy seaborn sql visualization
Last synced: 11 Apr 2026
https://github.com/Greatwoman23/Sentiment-Analysis-on-Amazon-Products-Review
Sentiment analysis on Amazon product reviews
analysis dashboard-application data data-science datascientistproject machine-learning publication python remotejob
Last synced: 04 May 2025
https://github.com/vladandreitoma/igisol_jyvaskyla_xept_experimental_campaign
A simulation toolkit together with data analysis for the Xe&Pt Exotic Nuclei Generation experiment at Jyvaskyla, December 2022. Helping Dr. Paul Constantin with simulation development. Simulation is done using Geant4, provided by CERN. Data analysis is done using ROOT, by CERN. Both are C++ based. Job distributors to run the simulation are coded in Perl.
analysis architecture-design cplusplus data oop oop-principles pearl simulations
Last synced: 05 Sep 2025
https://github.com/campiohe/geomask
A very simple lib for creating geometric masks from spatial data using regular grids.
Last synced: 30 Dec 2025
https://github.com/jensostertag-archive/charts.js
A JavaScript Plugin to draw Charts to visualize Data and Statistics on Websites
charts data javascript statistics webapplication
Last synced: 22 Jun 2025
https://github.com/davorg/towerbridge
When is Tower Bridge lifting?
data hacktoberfest london perl web-scraping
Last synced: 25 Oct 2025
https://github.com/jcloh98/rental-property-finder
A web scraper that helps users find rental properties by automatically gathering and organizing listings from various websites to discover available homes and apartments.
data headless-browser node scraper scraping web
Last synced: 17 May 2026
https://github.com/merekat/hb-passiv-income
A calculator that uses historical data for various assets to estimate the passive income a user can expect based on their inputs.
assets data datajournalism etf passive-income treasury
Last synced: 19 Jul 2025
https://gitlab.com/sean-c/pdf_rules
Turn PDFs into CSVs by defining rules
Data Cleaning automation data data parsing
Last synced: 14 Apr 2025
https://github.com/desininja/food-delivery-realtime-data-analysis
ETL Pipeline in AWS for Real Time Data Analysis
airflow data data-engineering emr-cluster etl kinesis kinesis-strea real-time redshift
Last synced: 15 Oct 2025
https://github.com/hivesolutions/crossline
Simple event piping and storage infrastructure
Last synced: 15 May 2026
https://github.com/GAMELEIRA/studies-database
This repository aims to hold any and all scripts for learning and practicing SQL and NoSQL database management. The project consolidates the main fundamentals and principles, along with practice exercises and project development.
data database mongodb mssql mysql nosql sql
Last synced: 03 May 2025
https://github.com/code-str8/time-series-forecasting
Developing a model that effectively forecasts the unit sales of numerous items across various Favorita stores with precision.
data dataanalysis forcasting machine-learning time-series visualizations
Last synced: 31 Mar 2025
https://github.com/ppmim/papi4k_old2
PAPI: the PANIC data reduction pipeline
data near-infrared pipeline processing
Last synced: 23 Jun 2025