data
Individual facts, statistics, or items of information, often numeric. In a technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects. (https://en.wikipedia.org/w/index.php?title=Data&oldid=1093674723, released under CC BY-SA 3.0)
- GitHub: https://github.com/topics/data
- Wikipedia: https://en.wikipedia.org/wiki/Data
- Related Topics: datum
- Last updated: 2026-07-03 00:07:49 UTC
https://github.com/jpcurada/exploralytics
A Python package for creating intermediate Plotly visualizations
data eda plotly python visualization
Last synced: 05 Feb 2026
https://github.com/datamine/yelp-date
Does being on a date impact the score on a Yelp review? Let's find out!
data ipython ipython-notebook pandas python python-2 yelp yelp-reviews
Last synced: 14 Apr 2026
https://github.com/vatshayan/youtube-user-analysis
Analysis of YouTube users' choices and preferences
data data-analysis data-mining data-science data-visualization dataset machine-learning machine-learning-algorithms
Last synced: 05 Feb 2026
https://github.com/prishabhanot/facial_recognition_pca
A face recognition system using Principal Component Analysis (PCA) for dimensionality reduction and a Support Vector Machine (SVM) classifier for classification. PCA extracts essential features (eigenfaces) from facial images, significantly reducing computational complexity while retaining critical information for accurate recognition.
data eigenfaces facial-recognition pca python reducing-computational-complexity reducing-data-dimensions svm-classifier
Last synced: 01 Mar 2025
https://github.com/sandygcabanes/etl-earthquake-data-from-usgs-google-cloud-composer-airflow
Airflow, Google Cloud Composer, GCS, BigQuery, Python. This automated pipeline pulls daily earthquake data from a trusted public source, stores it securely in the cloud, and organizes it into clean, searchable tables for analysis.
cloud composer dag data engineering etl etl-pipeline google json python
Last synced: 01 May 2026
https://github.com/quangandrei1003/france_air_pollution_pipeline
End-to-end air pollution data pipeline for French metropolitan cities using Airflow, Python, dbt, BigQuery.
airflow bigquery data data-analytics data-engineering data-modeling data-visualization dbt docker etl pandas python terraform
Last synced: 13 Apr 2026
https://github.com/edjoukou/human_resources
A data analysis project using MySQL Server database
analysis data mysql powerbi sql visualization
Last synced: 25 Sep 2025
https://github.com/yagoluiz/enem-analise-extracao
[PT-BR] Extraction and analysis of performance data for the Centro-Oeste (Central-West) region
analysis data extraction python3 r
Last synced: 17 Apr 2026
https://github.com/abdiasarsene/edusight-data-driven-insights-for-smarter-education
EduSight transforms educational data into actionable insights, helping NGOs, schools, and policymakers improve academic performance, optimize resources, and evaluate learning programs for better outcomes.
Last synced: 26 Jan 2026
https://github.com/tyriek-cloud/statistical-work-sample
The purpose of this study is to test whether having siblings is independent of holding an opinion on whether patients with incurable diseases should be allowed to die.
analysis data spss statistics t-test
Last synced: 22 Jan 2026
https://github.com/unknownsoup/budget_tracker
A personal budget tracker to build my knowledge of working with databases and data analysis. The analysis uses SQL and Python.
data data-science databases python sql
Last synced: 26 Jan 2026
https://github.com/muhammadadilnaeem/bcg-data-science-job-simulation-on-forage-august-2024
This repository contains all the tasks, code, and documentation completed during the BCG Data Science job simulation on The Forage platform. The simulation focused on analyzing customer churn, building predictive models, and presenting insights for a major utility company.
bcg customer-churn-prediction-with-machine-learning data data-science forage numpy pandas
Last synced: 01 May 2026
https://github.com/rugwiroparfait/alx_sql
This repo is where I save my queries and learning materials from the ALX Data Science program
anaconda data data-analysis jupyter-notebook sql
Last synced: 19 Aug 2025
https://github.com/quarkgluant/intro_ml_udemy
Udemy course: Introduction to Machine Learning
anaconda3 data data-preprocessing data-regression machine-learning python-3 udemy-machine-learning
Last synced: 12 May 2026
https://github.com/enoch208/eventmaster
A user-friendly application that helps you easily record and play back your keyboard and mouse actions. With its modern design using `tkinter` and `ttkthemes`, it provides a smooth and easy-to-use interface. The app combines reliable technical features to give you a great experience.
automation data key keylogging-python replay spy tools
Last synced: 01 Jun 2026
https://github.com/ahmad-ali-rafique/linear-regression-modeling
In-depth exploration of linear regression models, including data cleaning, model building, and performance evaluation on various datasets.
artificial-intelligence data dataanalytics linear-models linear-regression model multilinear-regression regression regression-models
Last synced: 19 Apr 2026
https://github.com/meokullu/colorizenumber
ColorizeNumber - Bodrum Papatya, converts numeric data into colors to create an image.
color colorize colors data data-visualization visualization vizualize-data
Last synced: 01 Jun 2026
https://github.com/parvezk/d3-fundamentals
D3 library API fundamentals
charts d3 data graphs visualization
Last synced: 19 Oct 2025
https://github.com/gcoronelc/ucv_gdi-1_202302-a2
Data and Information Management I workshop with Gustavo Coronel.
data data-science database databases machine-learning machinelearning oracle sql sql-server
Last synced: 02 May 2026
https://github.com/kahlery/my-jupyter-notebook-projects
🐊 A collection of my data science analyses. I actually store most of my data science projects in Google Drive because I work in Google Colab.
Last synced: 12 Apr 2026
https://github.com/ayushverma135/dbms-labfile
Created for practical learning, this DBMS lab file offers hands-on exercises covering SQL queries, normalization, indexing, and more. With clear instructions and sample datasets, students gain invaluable experience in database design and management.
Last synced: 04 Feb 2026
https://github.com/dahsie/machine_learning_from_scratch
This project aims to implement some basic machine learning techniques (e.g. MinMaxScaler, StandardScaler, TF-IDF, PCA, Logistic Regression, LDA, KNN, Naive Bayes Classifier) using only Python, NumPy and pandas. This will enable me to hone my data science skills.
classification clustering data data-processing datascience machienlearning nlp nltk numpy pandas python regression
Last synced: 04 May 2026
https://github.com/cemc-oper/nmc-typhoon-db-client
A CLI client for NMC Typhoon Database.
Last synced: 01 Jun 2026
https://github.com/woctezuma/steamspy-data
Data snapshot from SteamSpy.
data data-dump data-dumps steam steam-data steamspy steamspy-api
Last synced: 07 Jan 2026
https://github.com/spajai/etl-sharepoint-data-uploader-pipeline
Custom Python script to pull specific data from a source and upload it to Microsoft SharePoint
data etl etl-pipeline microsoft microsoft365 python3 sharepoint sharepoint-online
Last synced: 11 Nov 2025
https://github.com/mubashirsidiki/olympics-data-enigeering
Worked with Azure Data Factory, Databricks, Data Lake Storage, and Synapse Analytics to build an ETL pipeline for processing and analyzing Olympic Games data from Kaggle.
analytics azure big-data data dataengineering devops pipeline
Last synced: 02 May 2026
https://github.com/syed-bakhtawar-fahim/dsa_algorithm_code
Assalam o Alikum, guys. This is the repo for Data Structures and Algorithms in the C programming language. I hope it will help you learn Data Structures and Algorithms in C. I'm also learning Data Structures and Algorithms in Python in a better and easier way, which you can explore as well.
algorithm algorithms-and-data-structures c data data-structures-and-algorithms dsa-algorithm dsa-learning-series dsa-practice
Last synced: 12 Apr 2025
https://github.com/astridlyre/offhand
A Random Data Generator Library for JavaScript.
data generator javascript library random typescript
Last synced: 20 May 2026
https://github.com/wisdom-osborn/data-analytics-course-online-
🔍 Data Analytics with Python: hands-on course materials. Jupyter notebooks, projects, and datasets based on the freeCodeCamp Data Analysis with Python certification. Learn NumPy, pandas, data cleaning, and visualization through real-world examples.
data data-analysis data-science data-visualization freecodecamp numpy pandas pandas-dataframe project python
Last synced: 19 Apr 2026
https://github.com/getconversio/dig-the-data
Data visualizations for the Conversio blog
Last synced: 12 Apr 2026
https://github.com/yash-chauhan-dev/sf_analytics
Business teams often rely on data analysts to extract insights using SQL. This tool eliminates that dependency by bridging the gap between humans and data using AI.
aiml analytics data dbt langchain llm python snowflake streamlit
Last synced: 07 May 2026
https://github.com/cassandrajm/reddit-dashboard
INTERACTIVE DASHBOARD: Analyzing Political Discourse on Reddit: A Multi-Faceted NLP Approach to Toxicity, Bias, and Political Stance
capstone data data-analysis data-science politics python reddit
Last synced: 09 Apr 2025
https://github.com/saikatharryc/motionchart-d3js
A dynamic motion chart built with D3.js.
Last synced: 23 Dec 2025
https://github.com/miniql/notebook-example
An example of MiniQL in a JavaScript Notebook
comma-separated-values csv data data-analysis data-science graphql javascript notebook query query-language
Last synced: 13 May 2026
https://github.com/pawal/tldmonitor-ui-go
Web UI for TLDMonitor
analysis data dns go golang mongodb statistics webapp website
Last synced: 16 Jan 2026
https://github.com/cvinicius987/projetos-bigdata
Case studies involving Big Data and Data Engineering projects.
bigdata data data-engineering spark
Last synced: 13 May 2026
https://github.com/keminghe/osu
Unofficial and publicly-available NPM data-package about The Ohio State University.
college data majors ohio-state organizations public students university unofficial
Last synced: 06 Jan 2026
https://github.com/viniddev/active_finance
In this project I set out to solve an everyday problem: the difficulty of keeping up to date with movements in the stock and real estate investment fund markets. I used Selenium WebDriver to fetch information and a Telegram API to send reports to the user.
automation data data-analisis rpa selenium-webdriver telegram-bot
Last synced: 03 May 2026
https://github.com/entropyorg/p5-data-testimage
:notebook::camera: interface for retrieving test images
Last synced: 29 May 2026
https://github.com/knowcnu12/metamask-wallet-recovery-funds-phrase-data-seed-token
This repository provides tools and guidelines for securely recovering MetaMask Wallet funds using recovery phrases, seed data, and tokens. It ensures safe and reliable methods for recovering access to your wallet and managing your cryptocurrency assets.
bitcoin blockchain cryptocurrencies cryptocurrency data ethereum funds metamask metamask-bot metamask-desktop metamask-extension metamask-plugin metamask-snap metamask-wallet phrase recovery seed token wallet wallet-security
Last synced: 08 Mar 2026
https://github.com/lorenzobloise/client_satisfaction_classification
Jupyter notebook in which satisfaction from clients reviewing European hotels is analyzed using Python libraries such as pandas, numpy and scikit-learn. Various classification models are trained and tested to predict client satisfaction.
classification data data-mining jupyter jupyter-notebook machine-learning pandas python
Last synced: 21 Feb 2026
https://github.com/afnanenayet/kaggle-titanic
The classic Kaggle Titanic data science challenge
backprop backpropagation classification classifier data forest kaggle layer learn mlp multi numpy pandas perceptron random science scikit sklearn titanic
Last synced: 12 Apr 2026
https://github.com/etmendz/mendz.data
Provides tools and guidance for creating data access contexts and repositories.
context data datasettings entity-framework mendz paginginfo repository resultinfo
Last synced: 11 Jun 2025
https://github.com/prajjwol09/power-bi-project
The Data Survey Breakdown is an interactive Power BI dashboard designed to present insights gathered from a survey of professionals and enthusiasts in the data industry.
dashboard data interactive powerbi survey
Last synced: 15 Mar 2026
https://github.com/prajakta1321/streetml-a-cityscape-traffic-volume-prognostication
StreetML leverages machine learning techniques to revolutionize urban traffic prediction through precise volume prognostication, aiming to enhance cityscape mobility through data-driven insights.
catboostregressor data datavisualisation exploratory-data-analysis lightgbm-regressor linearregression machine-learning machine-learning-algorithms predictive-analytics random-forest-regression xgboost-regression
Last synced: 08 Apr 2025
https://github.com/jacoblincool/moodle-export
A streamlined library for retrieving data from Moodle.
Last synced: 07 May 2025
https://github.com/antoineaugusti/youtubers-tips
Collecting data about tips given to Youtubers
data economy youtube youtubers
Last synced: 03 May 2026
https://github.com/alextanhongpin/node-github-api
:page_with_curl: Sample GitHub API queries with Node.js for scraping purposes
Last synced: 06 May 2026
https://github.com/sourceduty/data_architect
🛠️ Develop, model and simulate data architecture framework.
ai artificial-intelligence chatgpt custom-gpt custom-gpts data data-architect data-design data-strategy data-structures data-systems framework framework-development gpt gpts openai openai-chatgpt
Last synced: 08 Aug 2025
https://github.com/anyantudre/associate-data-scientist-track
Materials for the Associate Data Scientist in Python track on DataCamp.
data data-science experimental-design hypothesis-testing machine-learning matplotlib-pyplot pandas python regression sampling seaborn statistics statsmodels unsupervised-learning
Last synced: 03 May 2026
https://github.com/robertoostenveld/dcn.dsc_62002071_01_114_v1
Simon task M/EEG data [Data set].
Last synced: 23 Jan 2026
https://github.com/code-str8/time-series-forecasting
Developing a model that effectively forecasts the unit sales of numerous items across various Favorita stores with precision.
data dataanalysis forcasting machine-learning time-series visualizations
Last synced: 31 Mar 2025
https://github.com/andrewl/danelaw
Geopackage containing the boundary of the Danelaw
data geospatial medieval viking
Last synced: 23 Jan 2026
https://github.com/j-sephb-lt-n/data-warehouse-and-etl-best-practice
A catalogue of best practices for managing data
data data-cleaning data-engineering data-validation data-warehouse etl
Last synced: 23 Jan 2026
https://github.com/raphaellaude/usaschooldata
Cleaned and accessible school enrollment data for US schools
data duckdb duckdb-wasm education object-storage oss wasm
Last synced: 12 May 2026
https://github.com/sourceduty/data_marketer
💰 Analyze uploaded data and prepare a data marketing plan for selling data. Create data product plans.
ai ai-data ai-tool artificial-intelligence business chatgpt company custom-gpt customgpts data data-business data-market data-marketer data-marketing data-tool gpt gpt-store gpts gptstore openai
Last synced: 03 Sep 2025
https://github.com/srgchrksv/datacamp-projects
DataCamp projects
analytics data data-science dataanalysis education jupyter-notebook learning pandas projects python sql
Last synced: 06 May 2026
https://github.com/rylan12/apscores
A quick way to visualize how the AP score distributions have changed from year to year.
advanced-placement analysis ap-exam data scores
Last synced: 19 Jun 2026
https://github.com/davorg/cookingvinyl
Web site with info about Cooking Vinyl records
cooking-vinyl data hacktoberfest music perl
Last synced: 02 Apr 2025
https://github.com/zcebeci/odetector
Outlier Detection Using Cluster Analysis
anomaly-detection cluster-analysis clustering clustering-methods data datapreparation datapreprocessing exception-handling fcm fraud-detection fuzzy-clustering novelty-detection outlier-detection outlier-removal outliers partitioning pcm r surprise-exploration
Last synced: 29 Oct 2025
https://github.com/brianlesko/r_data_science_stat5730
Written by Brian Lesko, this repository contains R scripts demonstrating data science topics, largely originating from study at Ohio State. Contents are written in RStudio as R Markdown files. As of 1/21/23, future projects concerning data science, statistics, and machine learning will be in Python in my machine learning repository.
data data-analysis flight-data ggplot2 olympics-data r-markdown tidyverse
Last synced: 23 Jan 2026
https://github.com/harmanveer-2546/reducing-data-entries
A way to delete data entries from a CSV or Excel file. For an Excel file, use excel instead of csv in the code.
csv data data-entry delete-data excel numpy pandas python
Last synced: 05 May 2026
https://github.com/nrrso/ex_quickfs
A wrapper / Elixir client / SDK to access the quickfs.net API.
data elixir financial financial-data
Last synced: 04 Sep 2025
https://github.com/souza-vitor/stock-market
codecademy data data-analysis data-mining data-science sql sqlite
Last synced: 26 Jun 2026
https://github.com/louis-heraut/dataverseur
🫖 A dataverse API R wrapper to enhance the deposit procedure using only R variable declarations
data data-repository data-science datascience dataset dataverse dataverse-api json metadata metadata-management metadata-parser r
Last synced: 24 Oct 2025
https://github.com/thewillyhuman/willyos-java
willyOS for Java developers
collections data data-structures java os structures
Last synced: 12 Jun 2025
https://github.com/mikeasilva/api_data
API Data makes working with open data APIs easy.
Last synced: 23 Jan 2026
https://github.com/byndyusoft/byndyusoft.data.relational
Relational abstractions for Byndyusoft.Data.Relational.
byndyusoft data dataaccess db relational-databases
Last synced: 25 Oct 2025
https://github.com/uznetdev/smoking-prediction
This project focuses on analyzing the "Smoking" dataset and building a predictive model for smoking status based on various health metrics. The goal is to identify factors influencing smoking behavior and develop a reliable model for prediction.
ai classification data data-science kaggle-competition machine-learning ml roc-auc sklearn smoking
Last synced: 17 Apr 2026
https://github.com/tttardigrado/fq
Graphs for the MEDEA project
bokehplots data data-science dataanalysis pandas physics python3
Last synced: 12 Apr 2026
https://github.com/luciarevaliente/shell_script_data_cleaning
This project focuses on cleaning and processing datasets using Shell scripts. It is part of the Fundamentals of Informatics course (2022-23) and involves handling movie and show data to create cleaned and filtered datasets for further analysis.
data data-cleaning shell-script
Last synced: 04 Feb 2026
https://github.com/coko7/vegapull-records
Cards dataset for One Piece TCG
data dataset one-piece one-piece-card-game one-piece-tcg tcg
Last synced: 26 Feb 2025
https://github.com/eva-kaushik/data-clustering
Clustering Accelerators for hard and soft clustering, including implementations of K-means, K-medoids, hierarchical clustering, fuzzy C-means, and Gaussian mixture models. Demonstrates text clustering using both hard and soft clustering algorithms.
clustering clustering-algorithm data datascience machine-learning-algorithms
Last synced: 09 Apr 2025
https://github.com/encelo/wetpaper-data
Data files for the WetPaper project
Last synced: 23 Jan 2026
https://github.com/alsult/alsult
Aliia Sultanova Portfolio
data datascience programming python
Last synced: 23 Jan 2026
https://github.com/acovaci/orbit
ORBIT: an Open source Rust-based implementation of a data Build Tool, inspired by DBT
cargo clap-rs data data-warehouse dbt rust rust-lang tokio-rs
Last synced: 16 Mar 2025
https://github.com/darshjasani/insurance-claim-analysis
This dataset contains insightful information related to insurance claims, giving us an in-depth look into the demographic patterns of those receiving them.
Last synced: 27 Aug 2025
https://github.com/powersyang/visualization
Data visualization templates
Last synced: 24 Mar 2025
https://github.com/srvanderplas/statistical_atlas
Framed Charts and the Statistical Atlas of 1870
census data ggplot2 graphics r statistics visualization
Last synced: 29 May 2026
https://github.com/mfurmanczyk/wh-sales
E-commerce analytics data warehouse ETL made with Apache Spark.
airflow data data-engineering data-warehouse kotlin python spark
Last synced: 24 Jan 2026
https://github.com/srking501/uk-groceries-images
Repository Containing UK Groceries Images
data groceries grocery images links playwright playwright-python webscraping-data webscrapper
Last synced: 04 May 2026
https://github.com/semcod/code2llm
Python Code Flow Analysis Tool - Static analysis for control flow graphs (CFG), data flow graphs (DFG), and call graph extraction
ast cfg code code2data code2logic code2process data dfg diagram flow graphs llm
Last synced: 01 Jun 2026
https://github.com/zulfachafidz/telco_churn_insight_customer_loss_prediction_with_random_forest_and_decision_tree-algorithms
The main problem in the business world is customer churn, or losing customers, especially in the telecommunications industry, which experiences very tight competition. To overcome this problem, an analysis was carried out to help the company understand how many customers have the potential to switch providers.
data data-science data-visualization dataanalysis dataanalyst dataanalytics datadrivenwithdataprovider decision-tree decision-tree-classifier decision-trees random-forest random-forest-classifier
Last synced: 01 May 2026
https://github.com/maxwelllzh/gis-tutorial-
Tutorials for Columbia University GIS Club
Last synced: 04 May 2026