Projects in Awesome Lists tagged with pyspark-python
A curated list of projects in awesome lists tagged with pyspark-python.
https://github.com/asuiu/sparkorm
ORM for Apache Spark and DataFrames schema manager
orm pyspark pyspark-python python python3 spark spark-orm spark-sql sparkql sqlalchemy sqlalchemy-orm
Last synced: 07 May 2025
https://github.com/anandarauf/cekatanbiz
CekatanBiz is a software tool for data analysts, business analysts, and business intelligence, developed in Python.
business-analysis business-analyst business-analytics business-intelligence businessanalytics data-analysis-python data-analyst data-analytics data-science data-visualization pyspark pyspark-notebook pyspark-python
Last synced: 11 Aug 2025
https://github.com/sarthak-1408/pyspark-tutorial
In this repo, I create a PySpark tutorial to better understand how to read and manage big data.
machine-learning pyspark pyspark-mllib pyspark-python pyspark-tutorial python3
Last synced: 14 Apr 2025
https://github.com/vigneshss-07/pyspark-acompleteguide
This repo explains PySpark modules in Python, with a practical, hands-on approach to working with big data.
pyspark pyspark-mllib pyspark-notebook pyspark-python pyspark-tutorial
Last synced: 13 Apr 2025
https://github.com/arturogonzalezm/convert_json_to_parquet
ETL (Extract, Transform, Load) job using PySpark - submodule
apache-spark etl etl-job etl-pipeline pyspark-python python python312
Last synced: 02 Mar 2025
https://github.com/camilajaviera91/pyspark-first-approach
This code demonstrates how to integrate PySpark with datasets and perform simple data transformations. It loads a sample dataset using PySpark's built-in functionalities or reads data from external sources and converts it into a PySpark DataFrame for distributed processing and manipulation.
curses fpdf google-oauth2 gspread kaggle kaggle-api matplotlib os pandas path pathlib pyspark-python pyspark-sql shutil sparksession
Last synced: 26 Jun 2025
https://github.com/abdelmajidlh/ml_diabet_predict_pyspark
Diabetes prediction using logistic regression with Python and PySpark
data-science logistic-regression machine-learning pyspark pyspark-mllib pyspark-python
Last synced: 22 Mar 2025
https://github.com/scifer99/spark-api-development
This is a template API via PySpark!
api pycharm-ide pyspark pyspark-api pyspark-python python3 scripting visual-studio-code
Last synced: 12 Oct 2025
https://github.com/travelxml/apache-spark-pyspark-databricks
APACHE SPARK: Data Analysis, Transformation, and Visualisation with PySpark, IPL Data Analysis
apache-spark data-science data-visualization databricks databricks-notebooks dataframe ipl machine-learning pyspark pyspark-mllib pyspark-notebook pyspark-python pyspark-tutorial
Last synced: 29 Oct 2025
https://github.com/mohammadreza-mohammadi94/pyspark-analytics-hub
A PySpark repository for data analysis, machine learning projects, and hands-on exercises. Explore scalable data processing and advanced ML workflows with Spark.
large-scale-pretraining machine-learning pyspark pyspark-mllib pyspark-python python
Last synced: 22 Feb 2025
https://github.com/pixelbyaj/apache-spark
Getting started with Apache Spark in Python (PySpark)
apache-spark pyspark-python python spark winutils
Last synced: 13 Oct 2025
https://github.com/coderjolly/pyspark-yelp-data-analysis
A comparative study of the computing efficiency of PySpark architectures versus Python-based distributed programming approaches such as MPI, multi-threading, and multi-processing, using the Yelp dataset from Kaggle.
distributed-system-design distributed-systems-challenges mpi multiprocessing multithreading pyspark pyspark-python
Last synced: 27 Mar 2025
https://github.com/burhanahmed1/iris-dataset-analysis-with-pyspark
Implementation of K-means, Bisecting K-means, and Decision Tree in PySpark on the Iris dataset.
bisecting-kmeans bisecting-kmeans-clustering decision-tree decision-trees jupyter-notebook kmeans kmeans-clustering matplotlib pyspark pyspark-machine-learning pyspark-ml pyspark-mllib pyspark-python python seaborn
Last synced: 24 Dec 2025
https://github.com/venkat-a/exploratory-data-analysis-eda-using-pyspark
Leverage the power of Apache Spark for large-scale data processing and analysis
dataframes descriptive-statistics hadoop-hdfs matplotlib plotly-express pyspark-python seaborn sql statistical-analysis visualization
Last synced: 25 Feb 2025
https://github.com/phaniteja5789/real-time-data-processing-pipeline-development
This project performs analytics on streaming data.
kafka-producer-consumer kafka-streams pyspark-python python3
Last synced: 14 May 2025
https://github.com/soumyadipta2020/pyspark-sample
Sample PySpark code and functions
Last synced: 28 Jul 2025
https://github.com/lucashomuniz/project-13
Validating a Machine Learning Model for Cryptocurrency Price Forecasting with PySpark
analytics apache-spark apache-spark-framework bitcoin-price criptocurrency data-analysis machine-learning-algorithms pyspark pyspark-python python-language realtime-database
Last synced: 10 Jun 2025
https://github.com/phaniteja5789/Real-Time-Data-Processing-Pipeline-Development
This project performs analytics on streaming data.
kafka-producer-consumer kafka-streams pyspark-python python3
Last synced: 28 Aug 2025