Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Projects in Awesome Lists tagged with big-data-analytics
A curated list of projects in awesome lists tagged with big-data-analytics.
https://github.com/ydataai/ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
big-data-analytics data-analysis data-exploration data-profiling data-quality data-science deep-learning eda exploration exploratory-data-analysis hacktoberfest html-report jupyter jupyter-notebook machine-learning pandas pandas-dataframe pandas-profiling python statistics
Last synced: 16 Dec 2024
https://github.com/ydataai/pandas-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
big-data-analytics data-analysis data-exploration data-profiling data-quality data-science deep-learning eda exploration exploratory-data-analysis hacktoberfest html-report jupyter jupyter-notebook machine-learning pandas pandas-dataframe pandas-profiling python statistics
Last synced: 14 Dec 2024
https://github.com/ICT-BDA/EasyML
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
big-data-analytics learning-platform machine-learning machine-learning-platform machine-learning-studio
Last synced: 30 Oct 2024
https://github.com/ict-bda/easyml
Easy Machine Learning is a general-purpose dataflow-based system for easing the process of applying machine learning algorithms to real world tasks.
big-data-analytics learning-platform machine-learning machine-learning-platform machine-learning-studio
Last synced: 21 Dec 2024
https://github.com/dongsuo/vue-data-board
A Data Analysis Board in Vue.
bi big-data-analytics business-intelligence data-analysis data-analysis-board data-visualization databoard drag echarts element-ui no-code visualization vue
Last synced: 30 Oct 2024
https://github.com/mahmoudparsian/pyspark-tutorial
PySpark-Tutorial provides basic algorithms using PySpark
big-data big-data-analytics data-algorithms pyspark spark spark-dataframes spark-rdd
Last synced: 21 Dec 2024
https://github.com/alibaba/v6d
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
big-data-analytics cloud-native cncf distributed distributed-comp distributed-systems graph-analytics in-memory-storage shared-memory sig-storage tag-storage
Last synced: 14 Dec 2024
https://github.com/v6d-io/v6d
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
big-data-analytics cloud-native cncf distributed distributed-comp distributed-systems graph-analytics in-memory-storage shared-memory sig-storage tag-storage
Last synced: 28 Oct 2024
https://github.com/metatron-app/metatron-discovery
Powerful & Easy way for big data discovery
apache-druid big-data-analytics business-intelligence chart dashboard data-analytics data-visualization druid self-service sql-editor
Last synced: 31 Oct 2024
https://github.com/lithops-cloud/lithops
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
big-data big-data-analytics cloud-computing data-processing distributed kubernetes multicloud multiprocessing object-storage parallel python serverless serverless-computing serverless-functions
Last synced: 15 Dec 2024
https://github.com/rouyang2017/SISSO
A data-driven method combining symbolic regression and compressed sensing for accurate & interpretable models.
big-data-analytics compressed-sensing machine-learning material-science symbolic-regression
Last synced: 13 Nov 2024
https://github.com/Ashish7129/Graph_Sampling
Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
big-data big-data-analytics breadth-first-search data-mining graphs induction network network-analysis network-science networkx python random-walk sample sampling social-network-analysis subgraph
Last synced: 27 Nov 2024
https://github.com/archivesunleashed/aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
analysis apache-spark big-data big-data-analytics dataframe digital-humanities hadoop network-graphing pyspark python3 scala spark text-extraction webarchives
Last synced: 18 Dec 2024
https://github.com/u2i/egis
Egis - a handy Ruby interface for AWS Athena
aws aws-athena big-data big-data-analytics ruby ruby-gem
Last synced: 24 Nov 2024
https://github.com/ingef/conquery
Visual, interactive queries against big databases
big-data big-data-analytics java
Last synced: 17 Nov 2024
https://github.com/arakat-community/arakat
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
big-data-analytics business-intelligence cloud-native-applications data-pipelines distributed-systems docker docker-swarm predictive-maintenance
Last synced: 14 Nov 2024
https://github.com/wittline/pyspark-on-aws-emr
The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on writing pyspark code.
aws aws-emr big-data big-data-analytics dataengineering ec2-spot ec2-spot-instances emr-cluster pyspark python spark wordcloud-generator
Last synced: 14 Oct 2024
https://github.com/jaanli/american-community-survey
American Community Survey data on people and households
acs american american-community-survey big-data big-data-analytics census census-data community data-engineering dbt javascript observable observable-plot observablehq survey typescript
Last synced: 07 Nov 2024
https://github.com/azure/azurekusto
R interface to Azure Data Explorer, aka Kusto
azure azure-data-explorer azure-sdk-r big-data-analytics kusto r
Last synced: 07 Oct 2024
https://github.com/k-g-prajwal/big-data-engineering
big-data big-data-analytics data-engineering database-management python sql
Last synced: 09 Nov 2024
https://github.com/seeratawan01/autocapture.js
Build your own analytics - A single library that grabs every click, touch, page-view, and fill, forever.
analytics autocapture big-data-analytics events heatmap user-behavior-analytics user-behaviour user-events user-interaction
Last synced: 09 Nov 2024
https://github.com/jdvelasq/courses
Support material for courses, Facultad de Minas, Universidad Nacional de Colombia
analytics big-data big-data-analytics data-science training-materials
Last synced: 05 Dec 2024
https://github.com/fiware/tutorials.big-data-flink
📘 FIWARE 305: Real-time Processing of Context Data using Apache Flink
apache-flink big-data-analytics fiware fiware-cosmos flink orion-flink-connector tutorial
Last synced: 17 Nov 2024
https://github.com/amey-thakur/optimizing-stock-trading-strategy-with-k-means-clustering
Big Data Analytics [BDA] Mini Project
amey ameythakur big-data big-data-analytics bigdataanalytics computational computer-engineering engineering megasatish project
Last synced: 09 Nov 2024
https://github.com/n1ghtf1re/map-of-emergency-incidents
Emergency Map lets you effectively visualize multi-dimensional information through an intuitive interface. The code is easily modified for use in a variety of areas. Color mixing enhances the perception and analysis of information.
big-data big-data-analytics big-data-visualization bigdata color-mixing colors data data-analytics data-science data-visualization data-visualization-challenges data-visualization-simpler mysql open-source-project php student-project
Last synced: 27 Oct 2024
https://github.com/adityakamble49/loss-ratio-prediction
Predicting Loss Ratios for Auto Insurance Portfolios - ITCS 6100 Big Data Analytics for Competitive Advantage
big-data big-data-analytics data-science insurance jupyter-notebook politics python
Last synced: 27 Nov 2024
https://github.com/asavinov/bistro
A general-purpose data analysis engine radically changing the way batch and stream data is processed
analytics big-data-analytics edge-analytics iot stream-analytics stream-processing
Last synced: 30 Oct 2024
https://github.com/amey-thakur/big-data-analytics-and-computational-lab-i
CSDLO7032: Big Data Analytics & CSL704: Computational Lab - I <Semester VII>
amey ameythakur analytics big-data big-data-analytics bigdata bigdataanalytics computational computer-engineering computer-science engineering megasatish textbooks
Last synced: 09 Nov 2024
https://github.com/amey-thakur/hadoop
HADOOP
amey ameythakur big-data big-data-analytics bigdata bigdataanalytics computer-engineering engineering hadoop mapper megasatish reducer
Last synced: 09 Nov 2024
https://github.com/ren294/log-analysis-project
This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.
apache-kafka apache-nifi apache-spark big-data big-data-analytics cassandra cassandra-driver data-engineering data-science grafana hadoop hadoop-hdfs hive powerbi spark-rdd spark-sql spark-streaming
Last synced: 11 Oct 2024
https://github.com/noobpk/gemini-web-vulnerability-detection
Gemini-Web Vulnerability Detection (G-WVD) detecting web application vulnerabilities with deep learning
apache-kafka apache-spark artificial-intelligence big-data-analytics command-injection cross-site-scripting deep-learning docker-compose docker-image kafka pyspark sqlinjection vulnerability-detection
Last synced: 11 Nov 2024
https://github.com/jofaval/tfm-iabd
Master's Final Degree Project on Artificial Intelligence and Big Data
ai-engineering big-data big-data-analytics data-analysis data-architecture data-engineering data-science data-science-project fastapi kafka mongo-db mongodb nlp node-red nodered python sentiment-analysis spark spark-streaming transformers
Last synced: 10 Oct 2024
https://github.com/bydevmar/master_masd_fpo
This GitHub repository gathers all the courses, practical work (TP), tutorials (TD), projects, and exercises from my master's program in applied mathematics for data science. Browse it for a complete view of my academic path, offering a detailed perspective on my learning in this field.
acp afc algebra big-data-analytics dashboards data-analysis datascience economics english graph-theory latex linear-algebra non-linear-algebra probability prog python scientific-research software-package statistics
Last synced: 17 Nov 2024
https://github.com/nico-curti/phdthesis
PhD thesis in Applied Physics
algorithms big-data-analytics deep-neural-networks feature-selection network-analysis optimization-methods parallel-computing
Last synced: 07 Nov 2024
https://github.com/ren294/covid-data-process
This project integrates real-time data processing and analytics using Apache NiFi, Kafka, Spark, Hive, and AWS services for comprehensive COVID-19 data insights.
airflow aws aws-ec2 aws-quicksight big-data big-data-analytics covid19-data docker docker-compose hadoop-hdfs hdfs hive kafka nifi pipeline redpanda spark spark-sql spark-streaming sparksql
Last synced: 11 Oct 2024
https://github.com/nconnector/automotive-market-analysis-platform
Quantitative decision making in automotive industry 🚘📊
automotive big-data-analytics canada cars data-engineering data-science django mongodb python
Last synced: 20 Dec 2024
https://github.com/dgkanatsios/gameanalyticseventhubfunctionscosmosdatalake
Big data reference architecture and implementation for an online multiplayer game
big-data big-data-analytics data-lake-analytics event-hubs lambda-architecture
Last synced: 08 Nov 2024
https://github.com/angeligareta/spark-flight-prediction
Assignment for Cloud Computing And Big Data Ecosystems Design subject that aims to predict flight arrival time using Apache Spark and Scala.
apache-spark big-data big-data-analytics cloud-computing scala upm
Last synced: 22 Nov 2024
https://github.com/erickarpovits/big-data-challenge-2020-2021
Big Data Hackathon Competition and Challenge!
big-data big-data-analytics challenge competition education education-funding-systems karpovits python stemfellowship
Last synced: 07 Nov 2024
https://github.com/garystafford/dataproc-java-demo
Demonstration of Google Cloud Dataproc for running Spark jobs with Java
big-data-analytics dataproc gcp google java spark
Last synced: 06 Dec 2024
https://github.com/aveek-saha/cricket-score-predictor
A Big data application to predict the outcome of a T20 cricket match.
big-data big-data-analytics clustering pyspark spark spark-mllib
Last synced: 05 Nov 2024
https://github.com/raghavtwenty/pyspark-realtime-streaming-sentiment-analysis
⏱ Real-Time Sentiment Analysis using PySpark and simulation of Twitter/X API using FastAPI
apache-spark big-data-analytics bigdata deep-learning fastapi opensource projects pyspark python python-advanced raghavtwenty realtime realtime-analytics realtime-streaming sentiment-analysis sentiment-classification spark-streaming streaming text-classification textblob
Last synced: 12 Oct 2024
https://github.com/hrolive/big-data-analysis-with-hadoop-and-rhadoop
Foundations of “Big Data” processing by introducing the Hadoop distributed computing architecture and providing an introductory level tutorial for Big Data analysis using Hadoop, Rhadoop, and R libraries parallel, doParallel, foreach and Rmpi.
big-data big-data-analytics hadoop hdfs hpc hpc-clusters jupyter mapreduce mpi python r rstudio unix
Last synced: 09 Nov 2024
https://github.com/srlozano/tinder-big-data-analysis
Big Data Analysis of Tinder done at Universitat Rovira i Virgili and Universitat Politècnica de Catalunya · BarcelonaTech
big-data big-data-analytics data-science dating-app mongodb python
Last synced: 30 Nov 2024
https://github.com/fiware/tutorials.big-data-spark
📘 FIWARE 306: Real-time Processing of Context Data using Apache Spark
apache-spark big-data-analytics fiware fiware-cosmos orion-spark-connector spark tutorial
Last synced: 17 Nov 2024
https://github.com/chukwuemekaaham/cloud-gcp-projects
Google Cloud Platform Projects, Workshop Training and Skill Badge
anthos big-data-analytics case-study cloud-identity cloud-infrastructure cloudbuild data-engineering devsecops gcp grafana-dashboard landing-zone migration mlops prometheus service-account spinnaker sre terraform vpn
Last synced: 11 Nov 2024
https://github.com/msusazureaccelerators/workplace-intelligence-accelerator
The Workplace Intelligence Accelerator leverages machine learning and big data analytics to combine and transform data, allowing customers to easily identify factors that influence how people work in their organization.
accelerator ai artificial-intelligence azure azure-devops big-data-analytics human-resources m365 machine-learning microsoft ml power-bi workplace-analytics
Last synced: 26 Nov 2024
https://github.com/abroniewski/idlecompute-data-management-architecture
Implementation of a big data management and analysis backbone architecture using PySpark for distributed and scalable data ingestion and MLlib for machine learning analysis. Part of Big Data Management and Analytics (BDMA) program.
bdma big-data big-data-analytics bigdata dataops hadoop-hdfs machine-learning parquet pipeline pyspark-mllib
Last synced: 12 Nov 2024
https://github.com/harshoza36/movielens_pyspark
MovieLens Dataset analysis using Hadoop and Pyspark
big-data-analytics hadoop movielens movielens-data-analysis pyspark spark spark-sql
Last synced: 12 Nov 2024
https://github.com/bayunova28/sas_visual_data_mining_machine_learning
This repository contains my weekly projects from the Big Data Analytics II course at my college
big-data-analytics big-data-projects data-science machine-learning neural-network
Last synced: 18 Dec 2024
https://github.com/geraked/bigdata
Implementation of Big Data Analytics Algorithms in Python
amirkabir-university association-rules big-data big-data-analytics bigdata collaborative-filtering cs246 data-mining data-science frequent-itemset-mining friendship-algorithm geraked graph kmeans-clustering locality-sensitive-hashing rabist recommender-system stanford-course stream-processing triangle-counting
Last synced: 09 Nov 2024
https://github.com/jabhij/predictionmodels_heartdiseases
Comparison of various Machine Learning algorithms for Heart Diseases (Heart Attack) prediction.
big-data-analytics bigdata data-visualization datamodeling decision-tree hive knn logistic-regression machine-learning mapreduce naviebayes random-forest svm
Last synced: 16 Nov 2024
https://github.com/hatoonguls/big-data-analytics
The repository contains big data analytics projects using Apache Spark, SQL, and Machine Learning models.
apache-spark big-data-analytics machine-learning-algorithms python
Last synced: 16 Nov 2024
https://github.com/srking501/csc8101_coursework
A summative coursework for CSC8101 Engineering for AI
apache-parquet apache-spark azure-databricks big-data big-data-analytics big-data-processing data-science databri databricks-notebooks delta-file nyc-taxi-dataset parquet-files pyspark
Last synced: 16 Nov 2024
https://github.com/aanujkhurana/bigdata-analysis
Social media big data analysis for Eminem (music artist), using RStudio and the R language
big-data big-data-analytics r-programming-language rstudio
Last synced: 11 Nov 2024
https://github.com/jowilf/big-data-showcase
This repository contains a project showcasing the use of Big Data technologies in processing and visualizing real-time data from an eCommerce electronics store using tools such as Apache Kafka, Spark Streaming, Spark SQL, HBase, and Plotly
big-data-analytics hbase kafka plotly-dash spark-sql spark-streaming
Last synced: 14 Nov 2024
https://github.com/rakeshkanneeswaran/project-cytosine-guanine-gc-percentage-in-genome-sequence
This repository on GitHub contains a Python program that uses data science techniques to calculate the percentage of cytosine and guanine in a genome sequence. Cytosine and guanine are two of the four nucleotide bases found in DNA, and their percentage can be used as a measure of the overall composition of a genome.
big-data-analytics data-science data-visualization genome-sequencing matplotlib pandas-library
Last synced: 13 Nov 2024
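The GC-percentage calculation described in the entry above can be sketched in a few lines. This is a minimal illustration only, not the repository's actual code: the function name gc_percentage and the choice to ignore characters other than A, C, G, and T are assumptions made for this example.

```python
def gc_percentage(sequence: str) -> float:
    """Return the percentage of G and C among the A/C/G/T bases.

    Hypothetical helper for illustration; characters other than
    A, C, G, T (for example 'N' or whitespace) are ignored.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        # No valid bases: report 0.0 rather than dividing by zero.
        return 0.0
    gc_count = sum(1 for b in bases if b in "GC")
    return 100.0 * gc_count / len(bases)


if __name__ == "__main__":
    # Two of the four bases are G or C.
    print(gc_percentage("ATGC"))
```

A call such as gc_percentage("ATGC") counts two G/C bases out of four, giving 50.0.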
https://github.com/fosfrancesco/tweet-popularity
Predict the number of retweets that a tweet about a specific museum will have.
big-data-analytics machine-learning
Last synced: 13 Nov 2024
https://github.com/madhurimarawat/big-data-analytics
This repository demonstrates big data processing, visualization, and machine learning using tools such as Hadoop, Spark, Kafka, and Python.
big-data big-data-analytics big-data-analytics-techniques hadoop-hdfs hadoop-installation hadoop-mapreduce python
Last synced: 14 Nov 2024
https://github.com/arxiver/airbnb-eda-and-regression
Big data exploration and analysis on Airbnb dataset as well as regression model for price prediction of entities
airbnb analysis big-data big-data-analytics bigdata eda python regression regression-models visualization xgboost
Last synced: 15 Nov 2024
https://github.com/jkhan01/kafka-spark-stream
Project and workaround repository that generates a producer stream to a Kafka cluster, then consumes and processes it.
apache-kafka apache-spark big-data big-data-analytics maven pyspark
Last synced: 30 Nov 2024
https://github.com/tirendazacademy/hands-on-data-science-with-gcp
Google BigQuery Tutorial
big-data big-data-analytics bigdata bigquery bigquery-ml bigqueryml cloud-computing data-analysis data-analytics data-engineering data-science dataanalysis dataengineering google-bigquery google-cloud-platform machienlearning machine-learning
Last synced: 08 Nov 2024
https://github.com/r13i/cheapest-phone-call
Small challenge to find the best phone operator to use based on call price
big-data big-data-analytics cheapest data-analysis data-cruncher pandas phone-number pricelist
Last synced: 07 Dec 2024
https://github.com/nathanvilbert/kaiture-agriculture-business-reports-with-power-bi
The project "Kaiture-Agriculture-Business-Reports-with-Power-BI" focuses on utilizing Business Intelligence to optimize agricultural yield and productivity. By integrating Power BI for data analysis, this project provides comprehensive insights into crop production patterns, market trends, and key factors affecting yield.
big-data-analytics data-visualization power-bi sas swot-analysis
Last synced: 07 Dec 2024
https://github.com/bryanfks-dev/klempoken-analysis
Analysis and forecasting model for Klempoken MSMEs
big-data-analytics data-analysis data-forecast data-visualization
Last synced: 14 Dec 2024
https://github.com/nickenshidqia/big_data_analytics_kimia_farma
Big Data Analytics project that poses the challenge of creating a data mart design and dashboard for Kimia Farma
big-data-analytics dashboard data-analyst looker-studio postgresql
Last synced: 05 Nov 2024
https://github.com/hanif-syazul/analyzing-kimia-farma-sales-performance-with-gcp
This repository contains the final project for the Rakamin Big Data Analytics Internship. It includes a complete dashboard of Kimia Farma's sales performance analysis from 2020 to 2023.
big-data-analytics bigquery internship-project kimia-farma looker-studio rakamin sql
Last synced: 21 Nov 2024
https://github.com/h-fuzzy-logic/technical-writing
Technical writing samples. Includes walkthroughs and tutorials around data engineering and cloud architectures.
big-data big-data-analytics cloud data-engineering
Last synced: 15 Dec 2024
https://github.com/vara-co/home_sales
Module 22 challenge: Using Google Colab to work on Big Data queries with PySpark SQL, parquet, and cache partitions
big-data big-data-analytics cache google-colab google-colaboratory parquet pyspark pyspark-sql
Last synced: 07 Dec 2024
https://github.com/mrzresearcharena/big-data-using-python
Odin Python Courses
big-data-analytics machine-learning python
Last synced: 16 Nov 2024
https://github.com/burcuyesilyurt/big_data_analytics_six_degrees_of_kevin_bacon
Big Data Analytics PySpark Project
Last synced: 22 Nov 2024
https://github.com/ixgnoy/visualize_movie_with_rating
Visualization using Hadoop.
big-data big-data-analytics hadoop query
Last synced: 08 Dec 2024
https://github.com/nataliabeltranarg/nosql_graphdatabases_neo4j
This repository showcases a practical exercise on graph databases using Neo4j, covering tasks like graph creation, evolution, querying, and similarity algorithms
big-data-analytics cypher-query-language datamanagement neo4j nosql python synthetic-data
Last synced: 10 Oct 2024
https://github.com/zmyzheng/browserassistant
Big Data & Cloud Computing project for recommendation, cluster analysis, and data visualization with Hadoop and Spark, deployed in an auto-scaling cloud environment, YouTube link:
angular big-data-analytics cloud cluster-analysis data-visualization elasticsearch flask hadoop recommendation-system spark spring-boot
Last synced: 11 Dec 2024
https://github.com/yash22222/olympic-games-analytics-using-apache-spark
The "Olympic Games Analytics Using Apache Spark Databricks" project explores data from the Olympic Games (1896-2016) to identify trends and insights. Using Apache Spark for big data processing and Databricks for visualization, the project analyzes key factors like top-performing countries and athlete attributes, showcasing real-world analytics.
apache apache-kafka apache-spark big-data-analytics csv data data-analytics data-visualization databricks excel mysql olympics regions
Last synced: 08 Dec 2024
https://github.com/mrham17/spotify_streaming_analytics
Project is stable & documentation will be completed soon. Thank you for your understanding and patience.
big-data-analytics data-analysis google-colab music-data r-programming spotify streaming-analytics
Last synced: 04 Dec 2024
https://github.com/smohanta23/uber_data-engineering_etl-project
This project demonstrates a comprehensive data engineering workflow using the Uber information dataset. It covers the full spectrum of data engineering pipelines, from data transformation to deployment on Google Cloud, with a focus on creating a scalable and insightful data model.
big-data-analytics bigquery cloudcomputing computeengine dashboard-application dataengineering datainsights datamodelling datapipeline datascience datavisualization etl-pipeline gcp-project googlecloudplatform mage opensource python uber uber-api
Last synced: 21 Nov 2024
https://github.com/zmyzheng/stack_overflow_qa_assistant
Big Data Analysis project with recommendation, cluster analysis and graph database
big-data-analytics cluster-analysis data-visualization graph-database hadoop mahout recommendation-system
Last synced: 11 Dec 2024
https://github.com/wlun001/youtube-video-analysis
YouTube video analysis based on datasets on Kaggle
big-data-analytics dataset kaggle scala spark
Last synced: 17 Dec 2024
https://github.com/saraasgari99/customer-big-data-analytics
Comprehensive exploratory & predictive analysis of customer behavior in e-commerce using big data analytics, visualization, and machine learning
big-data-analytics exploratory-data-analysis exploratory-data-visualizations machine-learning pandas pca python random-forest sklearn
Last synced: 07 Nov 2024
https://github.com/bayunova28/sas_viya_programming
This repository contains my weekly projects from the Big Data Analytics II course at my college
big-data-analytics big-data-projects data-science sas sas-viya
Last synced: 18 Dec 2024
https://github.com/rbalbinotti/prevendo_cons_energia_carros
Course - Big Data Analytics with R and Microsoft Azure Machine Learning - Final Project
azure big-data-analytics machine-learning r
Last synced: 08 Nov 2024
https://github.com/abdurrehman7452/search-engine-utilising-hadoop-mapreduce-technology-with-python-on-wikipedia-articles
Developing a Naive Search Engine Utilising Apache Hadoop MapReduce Technology on a dataset in comma-separated values (CSV) format containing around 5 million Wikipedia articles provided by Wikimedia, as part of an assignment for the Fundamental of Big Data Analytics (DS2004) course.
apache-hadoop big-data-analytics data-science hadoop-mapreduce mapreduce mapreduce-python search-engine wikimedia wikipedia wikipedia-articles
Last synced: 09 Nov 2024
https://github.com/srosalino/six_degrees_of_separation_and_engineering_the_perfect_cast
Leveraging PySpark to analyze the IMDB database, answer various queries, and develop machine learning models to predict a movie's popularity based on its cast
big-data big-data-analytics databricks pyspark pyspark-mllib
Last synced: 10 Nov 2024
https://github.com/syed-bakhtawar-fahim/datavisualization
Data Visualization with Python
big-data-analytics data data-analysis data-analysis-python data-science data-visualization pandas pyspark
Last synced: 06 Nov 2024
https://github.com/noobpk/gemini-bigdata
Gemini-Big Data (G-BD)
apache-kafka apache-spark big-data-analytics kafka-streams pyspark spark-streaming
Last synced: 11 Nov 2024
https://github.com/sayamalt/steel-energy-consumption-prediction-using-pyspark
A machine learning model built with PySpark that predicts the energy consumption of the steel industry, reaching an r2 score of approximately 99.5%.
apache-spark big-data-analytics big-data-processing cross-validation data-visualization exploratory-data-analysis hyperparameter-tuning machine-learning model-training-and-evaluation python regression spark sql
Last synced: 16 Nov 2024
https://github.com/mituskillologies/bigdata-ait-sep24
Programs conducted at Army Institute of Technology, Pune in training on Big Data Analytics during September 2024.
apache-hadoop apache-spark big-data big-data-analytics hadoop spark
Last synced: 16 Nov 2024
https://github.com/aalkiyumi/project-4-big-data-analysis-with-pyspark-on-weather-data
In this project, I analyzed weather data from the NCEI Global Surface Summary of Day dataset using PySpark in Jupyter Notebook. Tasks included data cleaning, statistical analysis, and forecasting for temperature, wind speed, precipitation, and extreme weather events. The project also predicts future weather patterns for Cincinnati and Florida.
big-data-analytics cs5165 data-analysis data-cleaning data-engineering data-science introduction-to-cloud-computing jupyter-notebook machine-learning precipitation-analysis predictive-modeling pyspark statistical-analysis temperature-forecasting time-series-forecasting uc uc2026 university-of-cincinnati wind-speed-data
Last synced: 23 Nov 2024