data
Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)
- GitHub: https://github.com/topics/data
- Wikipedia: https://en.wikipedia.org/wiki/Data
- Related Topics: datum,
- Last updated: 2026-07-03 00:07:49 UTC
- JSON Representation
https://github.com/jaldekoa/fiscaldataapi
A Python wrapper to easily retrieve data from the Fiscal Data (US Treasury) official API in pandas format.
api api-wrapper banking data finance pandas python united-states
Last synced: 27 Jan 2026
https://github.com/devlive-community/mockaroo
一个轻量级的 HTTP Mock 服务器,用于快速构建模拟数据接口,适用于前后端开发和接口测试场景。
Last synced: 08 Jul 2025
https://github.com/eve-ning/osumania_data
processed osu!mania data from osu!API
Last synced: 24 Feb 2026
https://github.com/desktopcleaner/naturemagazinescraper
Scrapes open-access Nature magazine articles and store as txt files.
data nature-magazine python scrapper word-frequency
Last synced: 06 Feb 2026
https://github.com/alejo1630/titanic_kaggle
This Python Notebook is a proposal to analyse the Titanic dataset for the Kaggle Competition, using several data science techniques and concepts.
data data-science jupyter-notebook notebook python titanic-survival-prediction
Last synced: 03 May 2026
https://github.com/miniql/miniql-express-mongodb-example
A MiniQL example for querying a MongoDB database through an Express REST API.
data database mongodb query query-language
Last synced: 19 Apr 2026
https://github.com/cainmi/data-page-project
A repository to pull code and files from, may be used to store page data links, code etc. mainly used for python for now
data html javascript python schema
Last synced: 21 Oct 2025
https://github.com/whitehathackerpr/data-visualization-tool
This is a Python-based web application that allows users to upload datasets, analyze data, and create visualizations interactively. The tool is designed for ease of use and provides a simple interface to perform basic data analysis and generate visualizations
data data-analysis data-visualization python python3
Last synced: 05 Sep 2025
https://github.com/tsvikas/covid-19-israel-data
Unofficial Github with the data published by The Israel Ministry of Health, regarding The Coronavirus disease
coronavirus-disease covid-19 csv daily-reports data health israel
Last synced: 05 Jan 2026
https://github.com/ariqf1/learn_data
Currently learning and building projects related to data pipelines, ETL processes, and data processing using Python. Passionate about scalable data solutions and modern data stack tools.
Last synced: 15 Apr 2026
https://github.com/stdlib-js/ndarray-base
Base ndarray.
array base buffer data javascript matrix multidimensional namespace ndarray node node-js nodejs ns stdlib structures types vector
Last synced: 09 Apr 2025
https://github.com/stdlib-js/array-zero-to-like
Generate a linearly spaced numeric array whose elements increment by 1 starting from zero and having the same length and data type as a provided input array.
array data float32array float64array int16array int32array javascript matrix ndarray node node-js nodejs stdlib structure typed typed-array types uint32array vector
Last synced: 07 Jan 2026
https://github.com/CheeseWithSauce/HadithsJSONFormat
Free, authentic Hadith data from sunnah.com organized bookwise specially for Muslim devs. Includes Arabic, English, and gradings. Use freely without credits. Collections: Bukhari, Muslim, Abu Dawud, Tirmidhi, Nasa'i, Ibn Majah, Malik, Riyad as-Salihin. Expanding soon, Inshallah.
api arabic data dev free hadith islam islamic muslim open-source quran sunnah
Last synced: 24 Feb 2026
https://github.com/varbrad/mindb
🗄 🔍 ⚡️ Schema-less document-oriented collection model data-store for Node & Browsers.
browser data datastore db document javascript json-schema mongo mongodb nodejs nosql query schema
Last synced: 13 Apr 2026
https://github.com/simranjeet97/quotes-analysis
Kaggle Dataset on Quotes Analysis and Visualization With Python, Pandas and MatplotLib Using Jupyter Notebook.
data data-science datavisualization jupyter-notebook kaggle kaggle-dataset machine-learning matplotlib-pyplot numpy pandas python quotes quotes-application
Last synced: 15 Apr 2026
https://github.com/aymane-maghouti/mobile-data-hive-insights
This project demonstrates the process of extracting data from a MySQL database, transferring it using Apache Sqoop, storing it in Hive Data warehouse (the data actually is store in Hadoop Distributed File System (HDFS)), and performing analysis using Hive Query Language (Hive QL) (it is a language close to SQL). Then visualize the data in Power BI,
apache-sqoop data data-integration data-visualization hadoop-hdfs hivedb hiveql powerbi
Last synced: 09 Mar 2026
https://github.com/gkapfham/ast2016-paper
Source Code of and Supporting Files for a Paper Published at AST 2016
data latex-document paper research
Last synced: 19 Oct 2025
https://github.com/pablolec/sb_querydsl_criteria_builder
Complex and dynamic frontend-to-backend queries using querydsl
api data design dynamic-queries hibernate java jpa json query query-builder querydsl querydsl-generator rest-api rsql spring spring-boot sql vue web
Last synced: 07 Feb 2026
https://github.com/ishanoshada/matplot3dex
A Matplotlib 3D Extension package for enhanced data visualization
data data-science matplotlib python-packages scikit-learn
Last synced: 05 Jan 2026
https://github.com/nnavales/desafios-data-engineer
En este proyecto abordaremos desafíos comunes en el rol de un Data Engineer con tecnologías modernas.
data data-engineering database dataengineering docker minio scrapping spark
Last synced: 01 Jun 2026
https://github.com/harmanveer-2546/supply-chain
Supply chain analytics is a valuable part of data-driven decision-making in various industries such as manufacturing, retail, healthcare, and logistics. It is the process of collecting, analyzing and interpreting data related to the movement of products and services from suppliers to customers.
customer-segmentation-analysis data data-analysis data-cleaning data-insights ggplot2 numpy pandas performance-evaluation predictive-analytics-for-business python risk-assessment sales-analysis statistical-analysis supply-chain tidyverse trend-analysis
Last synced: 10 Apr 2026
https://github.com/ttitcombe/timekeep
Defensive timeseries analysis in python
data data-science sklearn time-series time-series-analysis timeseries
Last synced: 05 Jan 2026
https://github.com/azrunguraya/kabyle-corpus-dataset
Dans l'univers du Traitement Automatique des Langues , l'accès à des datasets diversifiés et bien annotés est essentiel pour développer des modèles performants. Ce projet vise à combler cette lacune spécifique pour la langue taqbaylit, une langue berbère parlée principalement en Kabylie
ber berber berber-dataset corpus data dataset ia kabyle kabyle-art kb machine-learning nlp nlp-machine-learning python taqbaylit text words
Last synced: 31 Jul 2025
https://github.com/ispyhumanfly/prowler
Query the web, extract data from the results, and transform that data into a format you can use.
ai analytics business cryptocurrency data extract-data machine-learning mining scraping web
Last synced: 06 Sep 2025
https://github.com/elissorokin/data-analyst-portfolio-rus
Это репозиторий, в котором я демонстрирую свои навыки, делюсь проектами и отслеживаю прогресс в области анализа данных и Data Science.
ab-testing data data-analysis datalense matplotlib numpy pandas plotly portfolio postgresql python scipy seaborn sql statistical-analysis
Last synced: 25 Feb 2026
https://github.com/zalweny26/open_data_unipa
Progetto per l'esame di Laboratorio di Algoritmi 23-24, UniPa, Informatica L-31
Last synced: 26 Apr 2026
https://github.com/mftnakrsu/crm-rfm-analysis
CRM-RFM-Analysis
ai crm data data-analysis data-science deep-learning machine-learning python rfm rfm-analysis
Last synced: 16 Mar 2025
https://github.com/thiagopanini/datadelivery
Um módulo Terraform open source capaz de proporcionar um toolkit completo de infraestrutura para que usuários iniciem suas respectivas jornadas de exploração em serviços de Analytics na AWS.
analytics athena aws catalog crawler data datamesh glue s3 terraform
Last synced: 29 Nov 2025
https://github.com/tether/tether-schema
Custom protocol buffer schema for data validation
data protocol schema validation
Last synced: 09 Apr 2025
https://github.com/diegoperea20/own_dataset_segmentation_yolov8
Segmentacion y detection de objetos con propio dataset usando YOLOV8 , en el que se utiliza un dataset propio de una moneda de 200 pesos colombianos del año 2023.
coins colombia data opencv own python segmentation tensorflow yolov8
Last synced: 12 Apr 2026
https://github.com/desininja/data-engineer-interview-questions
This repository contains all the Data Engineer Interview Questions asked by interviewers.
data data-engineer-interview-questions
Last synced: 31 Mar 2025
https://github.com/stdlib-js/ndarray-base-to-reversed
Return a new ndarray where the order of elements of an input ndarray is reversed along each dimension.
base data flip javascript matrix ndarray node node-js nodejs reverse slice stdlib structure to-reversed types vector view
Last synced: 12 Apr 2026
https://github.com/himel-sarder/web-scraping-it-jobs-dataset
This project is a Python-based web scraping tool that collects job listings from TimesJobs for IT-related positions. It extracts job titles, company names, locations, and experience requirements, and saves the data into a CSV file. The tool uses BeautifulSoup and Pandas for web scraping and data manipulation.
data datascience dataset kaggle-dataset machine-learning machinelearning ml web-scraping
Last synced: 22 Feb 2026
https://github.com/rafaelfloressouza/Covid-19-Dashboard
Python web application to display COVID19 data from the world using Plotly and Dash
bootstrap covid-19 css data datavisualization plotly-dash python3
Last synced: 10 Mar 2025
https://github.com/so-cool/uobrain
My solution to the University of Bristol PURE Data Challenge
Last synced: 09 Sep 2025
https://github.com/SAP-archive/signavio-qualtrics-di
Setup an SAP Data Intelligence data pipeline to connect Qualtrics surveys data to SAP Signavio Process Intelligence via Ingestion API.
data intelligence process-intelligence qualtrics sample sap-data-intelligence sap-signavio-process-intelligence signavio
Last synced: 09 May 2025
https://github.com/ultrasage-danz/scikit-learn-ml
Machine Learning with scikit-learn by Data School
ai data data-school machine-learning macos ml scikit-learn ultrasage-dan
Last synced: 13 May 2026
https://github.com/camara94/introduction-to-data-engineering
Describe the different entities that form a modern data ecosystem. Describe and differentiate between the role and responsibilities of Data Engineers, Data Scientists, Data Analysts, Business Analysts, and Business Intelligence Analysts. Explain what Data Engineering is. List the tasks that need to be performed in a typical data engineering lifecycle. Describe what a day in the life of a Data Engineer looks like.
business-analytics business-intelligence data dataingestion dataintegration datascience machinelearning python statistical-analysis
Last synced: 09 Apr 2025
https://github.com/gmersy/data-carbon
Repository accompanying the paper: Toward a Life Cycle Assessment for the Carbon Footprint of Data
carbon-emissions carbon-footprint climate-change data data-science sustainability sustainable-software
Last synced: 31 Mar 2025
https://github.com/waylonwalker/exceltocsv
A usefull tool to convert excel spreadsheets to csv files without launching excel
csv-converter csv-files data excel python spreadsheet
Last synced: 05 May 2025
https://github.com/danreynolds/data_batcher
Data batcher batches and de-dupes data fetched in the same task of the event loop.
batching data flutter hacktoberfest
Last synced: 19 May 2026
https://github.com/abdul-rafay19/youngdevinterns_machine-learning_tasks
This internship offers hands-on exposure to real-world Machine Learning applications — from data visualization and preprocessing to model development, evaluation, and deployment. It focuses on real ML workflows, problem-solving, neural networks, and hyperparameter tuning — all within a collaborative, remote, and growth-oriented environment.
ai artificial-intelligence artificial-intelligence-algorithms artificial-neural-networks data data-visualization internship machine-learning machine-learning-algorithms machinelearning ml model model-development neural-network preprocessing programming-language python task tasks youngdevintern
Last synced: 29 Apr 2026
https://github.com/serhatderya/tabular-playground-series
This repository contains solutions of monthly Tabular Playground Series in Kaggle.
ai artificial-intelligence data data-preprocessing data-processing data-science data-visualization jupyter-notebook kaggle machine-learning numpy pandas python regression scikit-learn scikitlearn-machine-learning seaborn software statsmodels
Last synced: 11 Apr 2026
https://github.com/stdlib-js/array-zero-to
Generate a linearly spaced numeric array whose elements increment by 1 starting from zero.
array data float32array float64array int16array int32array javascript matrix ndarray node node-js nodejs stdlib structure typed typed-array types uint32array vector
Last synced: 08 Jan 2026
https://github.com/idea2app/public-meta-data
HTTP API for Public Meta Data, written in TypeScript & designed for CDN.
api cdn data http meta public typescript
Last synced: 15 Mar 2025
https://github.com/cintia0528/data_science-ab_testing
Conduct a 5-way AB Test on Montana State University Library's website, comparing the original "Interact" button with new versions ("Learn," "Help," "Connect," "Services") to boost user engagement.
abtesting bonferroni chisquare-test data data-science datacleaning datavisualization hypothesis-testing mde statistics
Last synced: 31 Mar 2025
https://github.com/cosmos-loops/cosmos-dapper
Cosmos.Dapper is a part of Cosmos.Data, a inline project of COSMOS LOOPS PROGRAMME. This repository provides a package of StackExchange.Dapper to improve development efficiency.
dapper data mysql mysqlconnector oracle postgresql sql-query sqlite sqlkata sqlserver
Last synced: 11 Apr 2026
https://github.com/spine-tools/metreload
Python application for downloading meteorological reanalysis data
Last synced: 01 Jul 2025
https://github.com/jrcichra/ingestd
HTTP server that easily ingests data into a database
data gin hacktoberfest ingest ingestion restful-api
Last synced: 28 Apr 2026
https://github.com/sbdk-dev/sbdk.dev
A complete reference implementation of a local-first ecosystem for AI-powered analytics. This repository contains the source code for the SBDK.dev website, the central hub for the SBDK suite of open-source tools.
ai-powered-analytics data data-engineering data-engineeringlocal-first data-pipeline-automation data-pipelines dbt dlt duckdb elt etl-pipeline llm local-first machine-learning pipeline sbdk semantic-layer
Last synced: 27 May 2026
https://github.com/marabesi/d3-visualization
Different visualizations using data and d3.js
charts css d3js data html js json timeline-chart visualization
Last synced: 01 May 2026
https://github.com/rayyan9477/dep
data data-science machine-learning python visualization web-scraping
Last synced: 08 May 2026
https://github.com/nesterenko-kv/object-id
ObjectIDs are a special type of identifier mainly used in MongoDB to uniquely identify documents within a collection. They consist of a 12-byte binary value that includes a timestamp, a machine identifier, a process identifier, and a counter.
c-sharp data id net object-id unique-identifier
Last synced: 16 May 2025
https://github.com/antononcube/raku-data-cryptocurrencies
Raku package of cryptocurrency data retrieval.
Last synced: 02 Apr 2025
https://github.com/akin-mustapha/portfolio-management-platform
Portfolio data ingestion pipeline
alembic-migration api dash dash-ui dashboard data data-engineering docker-compose ingestion-pipeline kafka postgres prefect stock-market system-design
Last synced: 27 May 2026
https://github.com/yord/klp-dsv
A delimiter-separated values plugin for klp (Kelpie), the small, fast, and magical command-line data processor.
csv data deserializer dsv json kelpie klp marshaller parser serializer ssv tsv
Last synced: 14 May 2026
https://github.com/oefenweb/python-untraceables
Randomizes IDs for a given set of tables making them untraceable across environments
anonymize data database mysql privacy python python2 python3 randomization
Last synced: 03 Feb 2026
https://github.com/apfirebolt/data-structures-and-algorithms-in-python
Data Structure and Algorithms in Python
algorithms data data-structures python python3 tkinter-gui
Last synced: 15 Mar 2025
https://github.com/ncgl-git/eriparse
Python code to parse the cost-of-living HTML from erieri.com, i.e. https://www.erieri.com/cost-of-living/united-states/illinois/chicago
cost-of-living crime crime-data data economic-research-institute erieri webscraper
Last synced: 14 Jan 2026
https://github.com/rapter1990/data-visualization-examples
Data Visualization Examples
data data-analysis data-visualization folium matplotlib plot plotly python seaborn visualization
Last synced: 13 Apr 2026
https://github.com/bunnysunny24/bluepulse
A Smart Water Management System
data data-processing data-visualization firebase iot machine-learning mysql-database reactjs
Last synced: 17 Mar 2025
https://github.com/rohancyberops/rp1
This project performs an analysis of Starbucks (SBUX) stock returns using R. The analysis includes both simple returns and continuously compounded returns (CC returns) for a period of one month. It also calculates the growth of $1 invested in SBUX and provides visual insights through various plots.
analysis cc data r rlanguage sbux
Last synced: 15 Mar 2025
https://github.com/yukti-09/extracting-data-from-twitter
Data From Twitter!
data data-mining extracting-data timeline tweepy tweets twitter
Last synced: 11 Oct 2025
https://github.com/dixslyf/nbparts
Unpack a Jupyter notebook into its sources, outputs and metadata.
data haskell jupyter jupyter-notebook nix nix-flake
Last synced: 05 Oct 2025
https://github.com/brandonhimpfen/data-size-parser
A tiny, practical parser for human-readable data sizes.
data data-size data-sizes npm open-source web-design web-development
Last synced: 12 Jun 2026
https://github.com/ndohvich/ndohvich
Je suis un grand fan de l'analyse des données avev PYTHON
anaconda arduino data github jypyter keras machine-learning machine-learning-algorithms numpy pandas python scikit-learn sql tensorflow visual-studio-code visualization-dashboard
Last synced: 11 Apr 2026
https://github.com/nikoshet/rust-dms-cdc-operator
The rust-dms-cdc-operator is a Rust-based utility for comparing the state of a list of tables in an Amazon RDS database with data stored in Parquet files on Amazon S3, particularly useful for change data capture (CDC) scenarios.
aws cdc data dms parquet pgdatadiff polars postgres rds rust s3 validation
Last synced: 18 Jan 2026
https://github.com/pharo-ai/data-imputers
This project contains transformers for missing value imputation
ai data data-science imputer pharo pharo-smalltalk smalltalk
Last synced: 18 Jan 2026
https://github.com/scienxlab/datasets
Some small datasets for demos, courses, testing, etc.
data open-data sample-data teaching-resources
Last synced: 09 Oct 2025
https://github.com/kamal-singh22/ai-driven-emotional-sentiments-analysis
This project leverages machine learning to analyze and classify the emotional sentiment of textual data. The goal is to accurately identify and categorize emotions, aiding applications in customer feedback analysis, social media sentiment analysis, and mental health monitoring.
analysis artificial-intelligence data emotion nlp-machine-learning python sentiment-analysis streamlit text-classification
Last synced: 14 Apr 2026
https://github.com/qeeqbox/data-states
Data states refer to structured and unstructured data divided into three categories (At Rest, In Use, and In Transit)
data data-state infosecsimplified qeeqbox
Last synced: 10 Mar 2026
https://github.com/famarks/grafarg
Grafarg is an interactive data analytics and graphical data visualization application. Grafarg being a progressive fork of Grafana 7.5.17 continues to be available under open source Apache 2.0 License
analytics charts data data-analysis data-science data-visualization grafana grafarg graph
Last synced: 19 Jan 2026
https://github.com/ilejuxepwaduzd/structured-data-extractor
🛠️ Extract structured data from messy texts using Chain-of-Thought prompting to improve processing of customer support and technical issues.
cdp chrome-fetcher data document-extraction ecommerce golang-library headless metadata-extraction ocr open-source pdf pdf-converter pdf-extractor ruby scraper shopify spider structured-data
Last synced: 10 Apr 2026
https://github.com/stdlib-js/array-base-assert-is-real-floating-point-data-type
Test if an input value is a supported array real-valued floating-point data type.
array assert base check data dtype is javascript node node-js nodejs stdlib test types util utilities utility utils valid validate
Last synced: 12 Oct 2025
https://github.com/malvfr/zap
Fill your database with fake data.
cli csv data database generator hacktoberfest mock node populate populate-database seed sql
Last synced: 21 Jan 2026
https://github.com/stdlib-js/datasets-herndon-venus-semidiameters
Fifteen observations of the vertical semidiameter of Venus, made by Lieutenant Herndon, with the meridian circle at Washington, in the year 1846.
astronomy data dataset datasets grubbs herndon javascript node node-js nodejs outlier outliers sample statistics stats stdlib venus
Last synced: 09 Oct 2025
https://github.com/mccarthy-m-g/alda
An R data package for the book "Applied longitudinal data analysis: Modeling change and event occurrence" by Singer and Willett (2003).
data growth-curves longitudinal-data mixed-models nonlinear-mixed-models r r-package structural-equation-modeling survival-analysis time-to-event
Last synced: 19 Jan 2026
https://github.com/genert/metis
Asynchronous data sender library
analytics asynchronous data dependency-free typescript
Last synced: 27 Jan 2026
https://github.com/milandjurdjevic/discriminalizer
.NET library designed for seamless JSON deserialization of objects with complex discrimination requirements, built on top of System.Text.Json.
data deserialization dotnet json
Last synced: 15 Apr 2025
https://github.com/Lemniscate-world/StratAI
This project analyzes financial assets using a Hidden Markov Model (HMM) to identify different market regimes and patterns. The analysis includes calculating daily returns, rolling volatility, and volume changes, and visualizing the hidden states identified by the HMM.
ai assets data data-science data-visualization finance financial-analysis fintech hmm-model hmmlearn machine-learning trading
Last synced: 13 Oct 2025
https://github.com/mattqdev/koalaz
Why don't use koalas as data mock? With this npm package you can!
data koala lorem-ipsum meme mock placeholder
Last synced: 13 Jan 2026