Projects in Awesome Lists tagged with big-data-analytics
A curated list of projects in awesome lists tagged with big-data-analytics.
https://github.com/data-centric-ai-community/fg-data-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
big-data-analytics data-analysis data-exploration data-profiling data-quality data-science deep-learning eda exploration exploratory-data-analysis hacktoberfest html-report jupyter jupyter-notebook machine-learning pandas pandas-dataframe pandas-profiling python statistics
Last synced: 08 May 2026
https://github.com/Data-Centric-AI-Community/ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
big-data-analytics data-analysis data-exploration data-profiling data-quality data-science deep-learning eda exploration exploratory-data-analysis hacktoberfest html-report jupyter jupyter-notebook machine-learning pandas pandas-dataframe pandas-profiling python statistics
Last synced: 09 Mar 2026
https://github.com/ydataai/ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
big-data-analytics data-analysis data-exploration data-profiling data-quality data-science deep-learning eda exploration exploratory-data-analysis hacktoberfest html-report jupyter jupyter-notebook machine-learning pandas pandas-dataframe pandas-profiling python statistics
Last synced: 16 Jan 2026
https://github.com/ict-bda/easyml
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
big-data-analytics learning-platform machine-learning machine-learning-platform machine-learning-studio
Last synced: 15 May 2025
https://github.com/ICT-BDA/EasyML
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
big-data-analytics learning-platform machine-learning machine-learning-platform machine-learning-studio
Last synced: 27 Mar 2025
https://github.com/dongsuo/vue-data-board
A Data Analysis Board in Vue.
bi big-data-analytics business-intelligence data-analysis data-analysis-board data-visualization databoard drag echarts element-ui no-code visualization vue
Last synced: 27 Mar 2025
https://github.com/mahmoudparsian/pyspark-tutorial
PySpark-Tutorial provides basic algorithms using PySpark
big-data big-data-analytics data-algorithms pyspark spark spark-dataframes spark-rdd
Last synced: 14 May 2025
https://github.com/v6d-io/v6d
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
big-data-analytics cloud-native cncf distributed distributed-comp distributed-systems graph-analytics in-memory-storage shared-memory sig-storage tag-storage
Last synced: 20 Mar 2025
https://github.com/caioricciuti/ch-ui
Use CH-UI to work with your data from self-hosted ClickHouse through a user-friendly interface. CH-UI is a modern and feature-rich user interface for ClickHouse databases. It offers an intuitive platform for querying ClickHouse databases, executing queries, and visualizing metrics about your instance.
big-data big-data-analytics big-data-visualization clickhouse-ui
Last synced: 24 Feb 2026
https://github.com/metatron-app/metatron-discovery
Powerful & Easy way for big data discovery
apache-druid big-data-analytics business-intelligence chart dashboard data-analytics data-visualization druid self-service sql-editor
Last synced: 28 Mar 2025
https://github.com/lithops-cloud/lithops
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
big-data big-data-analytics cloud-computing data-processing distributed kubernetes multicloud multiprocessing object-storage parallel python serverless serverless-computing serverless-functions
Last synced: 03 Jan 2026
https://github.com/rouyang2017/SISSO
A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.
big-data-analytics compressed-sensing machine-learning material-science symbolic-regression
Last synced: 04 May 2025
https://github.com/Ashish7129/Graph_Sampling
Graph Sampling is a Python package containing various approaches that sample the original graph at different sample sizes.
big-data big-data-analytics breadth-first-search data-mining graphs induction network network-analysis network-science networkx python random-walk sample sampling social-network-analysis subgraph
Last synced: 19 Jul 2025
https://github.com/archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
analysis apache-spark big-data big-data-analytics dataframe digital-humanities hadoop network-graphing pyspark python3 scala spark text-extraction webarchives
Last synced: 13 Apr 2025
https://github.com/trieu/leo-cdp-free-edition
The binary build of LEO CDP Free Edition for training purposes
arangodb big-data big-data-analytics cdp customer-analytics customer-data-platform data-analysis dataism leo-cdp survey-analysis
Last synced: 19 Jan 2026
https://github.com/u2i/egis
Egis - a handy Ruby interface for AWS Athena
aws aws-athena big-data big-data-analytics ruby ruby-gem
Last synced: 16 Jul 2025
https://github.com/ingef/conquery
Visual, interactive queries against big databases
big-data big-data-analytics java
Last synced: 08 May 2025
https://github.com/wittline/pyspark-on-aws-emr
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
aws aws-emr big-data big-data-analytics dataengineering ec2-spot ec2-spot-instances emr-cluster pyspark python spark wordcloud-generator
Last synced: 13 Apr 2025
https://github.com/arakat-community/arakat
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
big-data-analytics business-intelligence cloud-native-applications data-pipelines distributed-systems docker docker-swarm predictive-maintenance
Last synced: 07 May 2025
https://github.com/k-g-prajwal/big-data-engineering
big-data big-data-analytics data-engineering database-management python sql
Last synced: 07 May 2025
https://github.com/jaanli/american-community-survey
American Community Survey data on people and households
acs american american-community-survey big-data big-data-analytics census census-data community data-engineering dbt javascript observable observable-plot observablehq survey typescript
Last synced: 12 Apr 2025
https://github.com/azure/azurekusto
R interface to Azure Data Explorer, aka Kusto
azure azure-data-explorer azure-sdk-r big-data-analytics kusto r
Last synced: 20 Oct 2025
https://github.com/jdvelasq/courses
Support material for courses, Facultad de Minas, Universidad Nacional de Colombia
analytics big-data big-data-analytics data-science training-materials
Last synced: 23 Aug 2025
https://github.com/seeratawan01/autocapture.js
Build your own analytics - A single library that grabs every click, touch, page-view, and fill, forever.
analytics autocapture big-data-analytics events heatmap user-behavior-analytics user-behaviour user-events user-interaction
Last synced: 30 Apr 2025
https://github.com/amey-thakur/big-data-analytics-and-computational-lab-i
CSDLO7032: Big Data Analytics [BDA] & CSL704: Computational Lab - I | BE Semester VII | Computer Engineering
amey ameythakur analytics big-data big-data-analytics bigdata bigdataanalytics computational computer-engineering computer-science engineering megasatish textbooks
Last synced: 11 Mar 2026
https://github.com/amey-thakur/optimizing-stock-trading-strategy-with-k-means-clustering
Big Data Analytics [BDA] Mini Project
amey ameythakur big-data big-data-analytics bigdataanalytics computational computer-engineering engineering megasatish project
Last synced: 06 Oct 2025
https://github.com/srlozano/tinder-big-data-analysis
Big Data Analysis of Tinder done at Universitat Rovira i Virgili and Universitat Politècnica de Catalunya · BarcelonaTech
big-data big-data-analytics data-science dating-app mongodb python
Last synced: 11 Oct 2025
https://github.com/adityakamble49/loss-ratio-prediction
Predicting Loss Ratios for Auto Insurance Portfolios - ITCS 6100 Big Data Analytics for Competitive Advantage
big-data big-data-analytics data-science insurance jupyter-notebook politics python
Last synced: 04 Apr 2026
https://github.com/fiware/tutorials.big-data-flink
📘 FIWARE 305: Real-time Processing of Context Data using Apache Flink
apache-flink big-data-analytics fiware fiware-cosmos flink orion-flink-connector tutorial
Last synced: 30 Apr 2025
https://github.com/n1ghtf1re/map-of-emergency-incidents
Emergency Map lets you effectively visualize multi-dimensional information through an intuitive interface. The code is easily modified for use in a variety of areas. Color mixing enhances the perception and analysis of information.
big-data big-data-analytics big-data-visualization bigdata color-mixing colors data data-analytics data-science data-visualization data-visualization-challenges data-visualization-simpler mysql open-source-project php student-project
Last synced: 18 Mar 2025
https://github.com/nico-curti/phdthesis
PhD thesis in Applied Physics
algorithms big-data-analytics deep-neural-networks feature-selection network-analysis optimization-methods parallel-computing
Last synced: 27 Jan 2026
https://github.com/amey-thakur/hadoop
HADOOP
amey ameythakur big-data big-data-analytics bigdata bigdataanalytics computer-engineering engineering hadoop mapper megasatish reducer
Last synced: 06 Oct 2025
https://github.com/asavinov/bistro
A general-purpose data analysis engine radically changing the way batch and stream data is processed
analytics big-data-analytics edge-analytics iot stream-analytics stream-processing
Last synced: 10 Sep 2025
https://github.com/bydevmar/master_masd_fpo
This GitHub repository gathers all the courses, practical work (TP), tutorials (TD), projects, and exercises from my master's program in applied mathematics for data science. Browse it for a complete view of my academic path, offering a detailed perspective on my learning in this field.
acp afc algebra big-data-analytics dashboards data-analysis datascience economics english graph-theory latex linear-algebra non-linear-algebra probability prog python scientific-research software-package statistics
Last synced: 05 May 2025
https://github.com/jofaval/tfm-iabd
Master's Final Degree Project on Artificial Intelligence and Big Data
ai-engineering big-data big-data-analytics data-analysis data-architecture data-engineering data-science data-science-project fastapi kafka mongo-db mongodb nlp node-red nodered python sentiment-analysis spark spark-streaming transformers
Last synced: 24 Oct 2025
https://github.com/ren294/covid-data-process
This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.
airflow aws aws-ec2 aws-quicksight big-data big-data-analytics covid19-data docker docker-compose hadoop-hdfs hdfs hive kafka nifi pipeline redpanda spark spark-sql spark-streaming sparksql
Last synced: 29 Oct 2025
https://github.com/msusazureaccelerators/workplace-intelligence-accelerator
The Workplace Intelligence Accelerator leverages machine learning and big data analytics to combine and transform data, allowing customers to easily identify factors that influence how people work in their organization.
accelerator ai artificial-intelligence azure azure-devops big-data-analytics human-resources m365 machine-learning microsoft ml power-bi workplace-analytics
Last synced: 07 Oct 2025
https://github.com/noobpk/gemini-web-vulnerability-detection
Gemini-Web Vulnerability Detection (G-WVD): detecting web application vulnerabilities with deep learning
apache-kafka apache-spark artificial-intelligence big-data-analytics command-injection cross-site-scripting deep-learning docker-compose docker-image kafka pyspark sqlinjection vulnerability-detection
Last synced: 26 Apr 2025
https://github.com/sanketrs/implementation-of-modern-data-engineering-architecture-with-fabric_analytics
Building a next-generation hybrid data pipeline architecture that combines the power of Microsoft Fabric, Azure Cloud, and Power BI. This pipeline is engineered to tackle the challenges of real-time data ingestion, multi-layered processing, and analytics, delivering business-critical insights.
azure azure-data-factory azure-fabric bi-analytics big-data-analytics big-data-projects cloud-data-warehouse cloud-dataflow data-analytics data-engineering data-engineering-pipeline data-engineering-project data-pipeline-monitoring data-science data-visualization data-warehouse etl etl-framework etl-pipeline
Last synced: 14 Apr 2026
https://github.com/ren294/log-analysis-project
This project builds a scalable log analytics pipeline using a Lambda architecture for real-time and batch processing of NASA server logs.
apache-kafka apache-nifi apache-spark big-data big-data-analytics cassandra cassandra-driver data-engineering data-science grafana hadoop hadoop-hdfs hive powerbi spark-rdd spark-sql spark-streaming
Last synced: 08 Jul 2025
https://github.com/tirendazacademy/hands-on-data-science-with-gcp
Google BigQuery Tutorial
big-data big-data-analytics bigdata bigquery bigquery-ml bigqueryml cloud-computing data-analysis data-analytics data-engineering data-science dataanalysis dataengineering google-bigquery google-cloud-platform machienlearning machine-learning
Last synced: 06 Oct 2025
https://github.com/erickarpovits/big-data-challenge-2020-2021
Big Data Hackathon Competition and Challenge!
big-data big-data-analytics challenge competition education education-funding-systems karpovits python stemfellowship
Last synced: 09 Nov 2025
https://github.com/logannye/emsqrt
Process any data size with a fixed, small memory footprint. EM-√ is an external-memory ETL/log processing engine with hard peak-RAM guarantees. Unlike traditional systems that "try" to stay within memory limits, EM-√ enforces a strict memory cap, enabling you to process arbitrarily large datasets using small memory footprints.
big-data big-data-analytics cloud cloud-computing edge-ai edge-computing efficiency efficient-algorithm memory-allocation rust streaming streaming-algorithms streaming-data
Last synced: 27 Apr 2026
https://github.com/dgkanatsios/gameanalyticseventhubfunctionscosmosdatalake
Big data reference architecture and implementation for an online multiplayer game
big-data big-data-analytics data-lake-analytics event-hubs lambda-architecture
Last synced: 14 Apr 2025
https://github.com/angeligareta/spark-flight-prediction
Assignment for Cloud Computing And Big Data Ecosystems Design subject that aims to predict flight arrival time using Apache Spark and Scala.
apache-spark big-data big-data-analytics cloud-computing scala upm
Last synced: 19 May 2026
https://github.com/nconnector/automotive-market-analysis-platform
Quantitative decision making in automotive industry 🚘📊
automotive big-data-analytics canada cars data-engineering data-science django mongodb python
Last synced: 17 Apr 2026
https://github.com/harshoza36/movielens_pyspark
MovieLens Dataset analysis using Hadoop and Pyspark
big-data-analytics hadoop movielens movielens-data-analysis pyspark spark spark-sql
Last synced: 17 May 2026
https://github.com/geraked/bigdata
Implementation of Big Data Analytics Algorithms in Python
amirkabir-university association-rules big-data big-data-analytics bigdata collaborative-filtering cs246 data-mining data-science frequent-itemset-mining friendship-algorithm geraked graph kmeans-clustering locality-sensitive-hashing rabist recommender-system stanford-course stream-processing triangle-counting
Last synced: 07 Sep 2025
https://github.com/garystafford/dataproc-java-demo
Demonstration of Google Cloud Dataproc for running Spark jobs with Java
big-data-analytics dataproc gcp google java spark
Last synced: 03 Aug 2025
https://github.com/fiware/tutorials.big-data-spark
📘 FIWARE 306: Real-time Processing of Context Data using Apache Spark
apache-spark big-data-analytics fiware fiware-cosmos orion-spark-connector spark tutorial
Last synced: 27 Feb 2026
https://github.com/bayunova28/sas_visual_data_mining_machine_learning
This repository contains my weekly projects from the Big Data Analytics II course at my college
big-data-analytics big-data-projects data-science machine-learning neural-network
Last synced: 20 Mar 2026
https://github.com/aveek-saha/cricket-score-predictor
A Big data application to predict the outcome of a T20 cricket match.
big-data big-data-analytics clustering pyspark spark spark-mllib
Last synced: 11 Apr 2026
https://github.com/ssiarhei115/customer-classification
Developing an ML model that predicts a bank customer's inclination to open a deposit
big-data big-data-analytics data data-science data-visualization mashine-learning
Last synced: 09 Apr 2025
https://github.com/windi-wulandari/pbi_kimia-farma-x-rakamin
A data-driven analytics project for Kimia Farma to evaluate business performance from 2020-2023 using BigQuery. Focused on transaction data, inventory, branch operations, and product insights. Results were visualized through an interactive dashboard to support strategic decisions and optimizations.
big-data-analytics bigquery datawarehouse googlelooker sql
Last synced: 03 Jan 2026
https://github.com/chukwuemekaaham/cloud-gcp-projects
Google Cloud Platform Projects, Workshop Training and Skill Badge
anthos big-data-analytics case-study cloud-identity cloud-infrastructure cloudbuild data-engineering devsecops gcp grafana-dashboard landing-zone migration mlops prometheus service-account spinnaker sre terraform vpn
Last synced: 16 May 2026
https://github.com/vara-co/home_sales
Module 22 challenge: Using Google Colab to work on Big Data queries with PySpark SQL, parquet, and cache partitions
big-data big-data-analytics cache google-colab google-colaboratory parquet pyspark pyspark-sql
Last synced: 28 Mar 2025
https://github.com/adnanrahin/spark-flights-data-analysis
The U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics tracks the on-time performance of domestic flights operated by large air carriers. Summary information on the number of on-time, delayed, canceled, and diverted flights is published in DOT's monthly Air Travel Consumer Report and in this dataset of 2015 flight delays and cancellations.
apache-spark big-data-analytics docker docker-compose docker-container java maven spark spark-sql spark-streaming
Last synced: 08 Apr 2026
https://github.com/abroniewski/idlecompute-data-management-architecture
Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.
bdma big-data big-data-analytics bigdata dataops hadoop-hdfs machine-learning parquet pipeline pyspark-mllib
Last synced: 08 Aug 2025
https://github.com/madhurimarawat/python-projects
This repository contains the projects that I made in the Python programming language.
automobile-dataset big-data-analytics csv-files data-analysis-python data-visualization data-visualization-project documentation email-spam-classification google-app-data image-background-removal image-processing nba-dataset python python-libraries rock-paper-scissors-game streamlit streamlit-deployment streamlit-webapp
Last synced: 12 Apr 2026
https://github.com/jabhij/predictionmodels_heartdiseases
Comparison of various Machine Learning algorithms for Heart Diseases (Heart Attack) prediction.
big-data-analytics bigdata data-visualization datamodeling decision-tree hive knn logistic-regression machine-learning mapreduce naviebayes random-forest svm
Last synced: 04 Jun 2026
https://github.com/nickenshidqia/big_data_analytics_kimia_farma
A Big Data Analytics project challenge to create a data mart design and dashboard for Kimia Farma
big-data-analytics dashboard data-analyst looker-studio postgresql
Last synced: 10 Apr 2025
https://github.com/u5720002/artificial-intelligence
Collection of Artificial Intelligence projects
ai-agents ai-assistant ai-tools artificialintelligence big-data-analytics computer-vision github google-maps-api microsoft-azure ml-deployment ml-engineering ml-scan-barcode newsletter nvidia-ai robotics-programming
Last synced: 08 Jan 2026
https://github.com/theoliverlear/crypto-trader
A Spring Boot web app that buys and sells cryptocurrencies from API data sources. Its quick trading and other features allow users to leverage computer power to outperform the market.
ai ai-models big-data-analytics cryptocurrency data-science financial full-stack hibernate-jpa machine-learning nodejs python sass service spring-boot tensorflow typescript website
Last synced: 13 Apr 2026
https://github.com/vasugi2003/web-server-log-analysis-using-pyspark
Web Server Log analysis using Pyspark
algorithms analysis big-data-analytics hadoop ml prediction pyspark python3
Last synced: 14 Apr 2026
https://github.com/smusab9152/pyspark_programs_and_projects
Collection of PySpark programs and projects demonstrating the use of Apache Spark's Python API for big data processing and analysis. It includes practical implementations such as logistic regression classification, data analysis on the Iris dataset, and basic PySpark operations like temperature conversion.
apache-spark big-data big-data-analytics data-engineering distributed-computing etl pyspark spark-dataframes spark-rdd spark-sql
Last synced: 18 May 2026
https://github.com/datasciencelovers/ai-financial-market-data-analysis
Analyse Financial Market Data of AI companies with Python
ai artificial-intelligence big-data-analytics chatgpt data-analysis data-analytics data-science data-visualization financial-analysis gemini google llama machine-learning market-data-analysis matplotlib-python meta openai pandas-python python
Last synced: 05 May 2026
https://github.com/vinay-ram1999/data-engineer-playground
A fully containerized multi-service environment to prototype end-to-end ETL workflows.
airflow big-data-analytics data-engineering delta-lake docker iceberg lakehouse minio nessie postgresql spark sql trino unitycatalog
Last synced: 14 Apr 2026
https://github.com/rohith-2/big_data_analysis
Performance of Aircraft in the US from 1987 to 2008.
apache-spark big-data big-data-analytics bigdata dashboard scala spark tableau
Last synced: 23 Jan 2026
https://github.com/nathanvilbert/kaiture-agriculture-business-reports-with-power-bi
The project "Kaiture-Agriculture-Business-Reports-with-Power-BI" focuses on utilizing Business Intelligence to optimize agricultural yield and productivity. By integrating Power BI for data analysis, this project provides comprehensive insights into crop production patterns, market trends, and key factors affecting yield.
big-data-analytics data-visualization power-bi sas swot-analysis
Last synced: 19 Mar 2026
https://github.com/bayunova28/sas_viya_programming
This repository contains my weekly projects from the Big Data Analytics II course at my college
big-data-analytics big-data-projects data-science sas sas-viya
Last synced: 20 Mar 2026
https://github.com/sohhamseal/random-topic-ppts
A set of presentations I had created for various seminar work and/or coursework
big-data-analytics blockchain-technology brain-computer-interface
Last synced: 19 Mar 2026
https://github.com/hrolive/big-data-analysis-with-hadoop-and-rhadoop
Foundations of “Big Data” processing by introducing the Hadoop distributed computing architecture and providing an introductory level tutorial for Big Data analysis using Hadoop, Rhadoop, and R libraries parallel, doParallel, foreach and Rmpi.
big-data big-data-analytics hadoop hdfs hpc hpc-clusters jupyter mapreduce mpi python r rstudio unix
Last synced: 19 Apr 2026
https://github.com/varshithdupati/yelp-business-analysis
Big Data analysis on Yelp reviews/businesses for Arizona. Using Hadoop, Spark, PySpark.
arizona-state-university big-data big-data-analytics data-analysis hadoop pyspark spark yelp
Last synced: 04 May 2026
https://github.com/mrham17/spotify_streaming_analytics
Project is stable & documentation will be completed soon. Thank you for your understanding and patience.
big-data-analytics data-analysis google-colab music-data r-programming spotify streaming-analytics
Last synced: 24 Jul 2025
https://github.com/madhurimarawat/big-data-analytics
This repository demonstrates big data processing, visualization, and machine learning using tools such as Hadoop, Spark, Kafka, and Python.
apache-kafka apache-spark big-data big-data-analytics big-data-analytics-techniques data-preprocessing-and-cleaning data-stratification data-visualization hadoop-hdfs hadoop-hive hadoop-installation hadoop-mapreduce hiveql python spark-graphx spark-mllib spark-mllib-library spark-rdd spark-streaming
Last synced: 05 Apr 2026
https://github.com/zmyzheng/stack_overflow_qa_assistant
Big Data Analysis project with recommendation, cluster analysis and graph database
big-data-analytics cluster-analysis data-visualization graph-database hadoop mahout recommendation-system
Last synced: 30 Mar 2025
https://github.com/abdullahkhurshid/ecommerce-marketing-analytics
Using Apache Spark for marketing analytics
apache-spark big-data-analytics cloud-computing marketing-analytics r supervised-learning unsupervised-learning
Last synced: 12 Apr 2026
https://github.com/zmyzheng/browserassistant
Big Data & Cloud Computing project for recommendation, cluster analysis, and data visualization with Hadoop and Spark, deployed in an auto-scaling cloud environment, youtube link:
angular big-data-analytics cloud cluster-analysis data-visualization elasticsearch flask hadoop recommendation-system spark spring-boot
Last synced: 14 Apr 2026
https://github.com/hanif-syazul/analyzing-kimia-farma-sales-performance-with-gcp
This repository contains the final project for the Rakamin Big Data Analytics Internship. It includes a complete dashboard of Kimia Farma's sales performance analysis from 2020 to 2023.
big-data-analytics bigquery internship-project kimia-farma looker-studio rakamin sql
Last synced: 02 Jan 2026
https://github.com/mehwishferoz/bda-project
A Hadoop MapReduce project analyzing the Consumer Complaints dataset with five queries to extract insights like complaints by product, state, company, tags, and timely responses.
big-data-analytics hadoop hadoop-hdfs hadoop-mapreduce java-8
Last synced: 10 May 2026
https://github.com/kaustubh-indulkar/te-it-dsbda-assignmnets
This repository contains the solutions for a series of assignments covering Data Science And Big Data Analytics concepts.
big-data big-data-analytics data-analytics data-science data-visualization sppu-2019-pattern sppu-it-dept
Last synced: 29 Mar 2025
https://github.com/burcuyesilyurt/big_data_analytics_six_degrees_of_kevin_bacon
Big Data Analytics PySpark Project
Last synced: 10 Jun 2025
https://github.com/rudra-g-23/rural-financial-inclusion-govt-scheme-recommendation
A project that analyzes a government dataset to understand rural Indian financial conditions and creates an ML model for prediction.
big-data big-data-analysis big-data-analytics big-data-and-ml big-data-projects big-data-visualization ml mlops
Last synced: 22 Feb 2026
https://github.com/mituskillologies/bigdata-ait-sep24
Programs conducted at Army Institute of Technology, Pune in training on Big Data Analytics during September 2024.
apache-hadoop apache-spark big-data big-data-analytics hadoop spark
Last synced: 10 Mar 2026
https://github.com/yuvrajsaraogi/unemployment-analysis-with-python
Unemployment is measured by the unemployment rate, which is the number of people who are unemployed as a percentage of the total labour force. We have seen a sharp increase in the unemployment rate during Covid-19, so analyzing the unemployment rate can be a good data science project.
big-data big-data-analytics data-analysis data-science data-visualization engineering excel jupyter-notebook machine-learning mini-project natural-language-processing nlp project python3 sql
Last synced: 19 Apr 2026
https://github.com/aanujkhurana/bigdata-analysis
Social media big data analysis for Eminem (music artist), using RStudio and R
big-data big-data-analytics r-programming-language rstudio
Last synced: 13 Jul 2025
https://github.com/nik-kusanagi/python2.5
Return to Using Python
algorithms big-data big-data-analytics blockchain blockchain-technology jupyter machine-learning python python3
Last synced: 24 Apr 2026
https://github.com/h-fuzzy-logic/technical-writing
Technical writing samples. Includes walkthroughs and tutorials around data engineering and cloud architectures.
big-data big-data-analytics cloud data-engineering
Last synced: 12 Jan 2026
https://github.com/aalkiyumi/project-4-big-data-analysis-with-pyspark-on-weather-data
In this project, I analyzed weather data from the NCEI Global Surface Summary of Day dataset using PySpark in Jupyter Notebook. Tasks included data cleaning, statistical analysis, and forecasting for temperature, wind speed, precipitation, and extreme weather events. The project also predicts future weather patterns for Cincinnati and Florida.
big-data-analytics cs5165 data-analysis data-cleaning data-engineering data-science introduction-to-cloud-computing jupyter-notebook machine-learning precipitation-analysis predictive-modeling pyspark statistical-analysis temperature-forecasting time-series-forecasting uc uc2026 university-of-cincinnati wind-speed-data
Last synced: 17 Mar 2025
https://github.com/jabulente/histogram-visualization-with-matplotlib
This repository showcases how to create visually appealing and customized histograms using Python’s Matplotlib and Seaborn libraries. It includes examples of enhancing default plots with colors, fonts, transparency, and layout adjustments to better communicate data distribution insights.
ai big-data-analytics data-science data-storytelling data-visualization histogram matplotlib
Last synced: 22 Jul 2025
https://github.com/adwaiy2912/bda-lab
Repository contains weekly lab work and assignments for the Big Data Analytics (BDA) course
big-data-analytics hadoop hbase hive mapreduce pig-latin spark
Last synced: 13 May 2026
https://github.com/radhikareddy-chintareddy/big-data-insights-nyc-taxi-trips-2013-
A project showcasing memory-efficient big data processing using Python, focusing on scalable data handling to overcome memory constraints. Includes anomaly detection, efficient visualizations, and actionable insights from the 2013 NYC Taxi Trip dataset.
big-data-analytics csv-reader matplotlib-pyplot python
Last synced: 16 Apr 2026
https://github.com/syed-bakhtawar-fahim/datavisualization
Data Visualization with Python
big-data-analytics data data-analysis data-analysis-python data-science data-visualization pandas pyspark
Last synced: 30 Apr 2026
https://github.com/r13i/cheapest-phone-call
Small challenge to find the best phone operator to use based on call price
big-data big-data-analytics cheapest data-analysis data-cruncher pandas phone-number pricelist
Last synced: 04 May 2026