Projects in Awesome Lists tagged with pyspark-python
A curated list of projects in awesome lists tagged with pyspark-python.
https://github.com/asuiu/sparkorm
ORM for Apache Spark and schema manager for DataFrames
orm pyspark pyspark-python python python3 spark spark-orm spark-sql sparkql sqlalchemy sqlalchemy-orm
Last synced: 07 May 2025
https://github.com/anandarauf/cekatanbiz
CekatanBiz is a software tool for data analysts, business analysts, and business intelligence, developed using Python.
business-analysis business-analyst business-analytics business-intelligence businessanalytics data-analysis-python data-analyst data-analytics data-science data-visualization pyspark pyspark-notebook pyspark-python
Last synced: 11 Aug 2025
https://github.com/sarthak-1408/pyspark-tutorial
In this repo, I create a PySpark tutorial to better understand how to read and manage big data.
machine-learning pyspark pyspark-mllib pyspark-python pyspark-tutorial python3
Last synced: 14 Apr 2025
https://github.com/vigneshss-07/pyspark-acompleteguide
This repo explains PySpark modules in Python, with a practical, hands-on approach to working with big data.
pyspark pyspark-mllib pyspark-notebook pyspark-python pyspark-tutorial
Last synced: 13 Apr 2025
https://github.com/arturogonzalezm/convert_json_to_parquet
ETL (Extract, Transform, Load) job using PySpark - submodule
apache-spark etl etl-job etl-pipeline pyspark-python python python312
Last synced: 05 Mar 2026
https://github.com/camilajaviera91/pyspark-first-approach
This code demonstrates how to integrate PySpark with datasets and perform simple data transformations. It loads a sample dataset using PySpark's built-in functionalities or reads data from external sources and converts it into a PySpark DataFrame for distributed processing and manipulation.
curses fpdf google-oauth2 gspread kaggle kaggle-api matplotlib os pandas path pathlib pyspark-python pyspark-sql shutil sparksession
Last synced: 26 Jun 2025
https://github.com/abdelmajidlh/ml_diabet_predict_pyspark
Diabetes prediction using logistic regression with Python and PySpark
data-science logistic-regression machine-learning pyspark pyspark-mllib pyspark-python
Last synced: 22 Mar 2025
https://github.com/travelxml/apache-spark-pyspark-databricks
APACHE SPARK: Data Analysis, Transformation, and Visualisation with PySpark, IPL Data Analysis
apache-spark data-science data-visualization databricks databricks-notebooks dataframe ipl machine-learning pyspark pyspark-mllib pyspark-notebook pyspark-python pyspark-tutorial
Last synced: 26 Jan 2026
https://github.com/scifer99/spark-api-development
This is a template API via PySpark!
api pycharm-ide pyspark pyspark-api pyspark-python python3 scripting visual-studio-code
Last synced: 16 Feb 2026
https://github.com/soumyadipta2020/pyspark-sample
Sample code and functions for PySpark
Last synced: 28 Jul 2025
https://github.com/burhanahmed1/iris-dataset-analysis-with-pyspark
Implementation of K-means, Bisecting K-means, and Decision Tree in PySpark on the Iris dataset.
bisecting-kmeans bisecting-kmeans-clustering decision-tree decision-trees jupyter-notebook kmeans kmeans-clustering matplotlib pyspark pyspark-machine-learning pyspark-ml pyspark-mllib pyspark-python python seaborn
Last synced: 24 Dec 2025
https://github.com/mananabbasi/data-science-complete-project-using-big-data-tools-techniques-
This repository contains Databricks projects utilizing RDDs, DataFrames, and SQL to process and analyze various real-world datasets. Data cleaning and analysis have been performed using PySpark functions to handle challenges such as inconsistent formats, missing values, and complex data structures. The project ensures efficient data transformation
azure databricks databricks-industry-solutions databricks-notebooks dataframe pyspark-mllib pyspark-notebook pyspark-python python-script rdd
Last synced: 23 Jan 2026
https://github.com/lucashomuniz/project-13
Validating a Machine Learning Model for Cryptocurrency Price Forecasting with PySpark
analytics apache-spark apache-spark-framework bitcoin-price criptocurrency data-analysis machine-learning-algorithms pyspark pyspark-python python-language realtime-database
Last synced: 10 Jun 2025
https://github.com/coderjolly/pyspark-yelp-data-analysis
A comparative study of the computing efficiency of PySpark architectures versus Python-based distributed programming methodologies such as MPI, multi-threading, or multi-processing on the Yelp Kaggle dataset.
distributed-system-design distributed-systems-challenges mpi multiprocessing multithreading pyspark pyspark-python
Last synced: 27 Mar 2025
https://github.com/phaniteja5789/Real-Time-Data-Processing-Pipeline-Development
This project performs analytics on streaming data.
kafka-producer-consumer kafka-streams pyspark-python python3
Last synced: 28 Aug 2025
https://github.com/mohammadreza-mohammadi94/pyspark-analytics-hub
A PySpark repository for data analysis, machine learning projects, and hands-on exercises. Explore scalable data processing and advanced ML workflows with Spark.
large-scale-pretraining machine-learning pyspark pyspark-mllib pyspark-python python
Last synced: 22 Feb 2025
https://github.com/pixelbyaj/apache-spark
Start Apache Spark with Python - pyspark
apache-spark pyspark-python python spark winutils
Last synced: 13 Oct 2025
https://github.com/venkat-a/exploratory-data-analysis-eda-using-pyspark
Leverage the power of Apache Spark for large-scale data processing and analysis
dataframes descriptive-statistics hadoop-hdfs matplotlib plotly-express pyspark-python seaborn sql statistical-analysis visualization
Last synced: 25 Feb 2025
https://github.com/phaniteja5789/real-time-data-processing-pipeline-development
This project performs analytics on streaming data.
kafka-producer-consumer kafka-streams pyspark-python python3
Last synced: 14 May 2025