An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with pyspark-python

A curated list of projects in awesome lists tagged with pyspark-python.

https://github.com/asuiu/sparkorm

ORM for Apache Spark and a DataFrame schema manager

orm pyspark pyspark-python python python3 spark spark-orm spark-sql sparkql sqlalchemy sqlalchemy-orm

Last synced: 07 May 2025

https://github.com/sarthak-1408/pyspark-tutorial

In this repo, I create a PySpark tutorial to better understand how to read and manage big data.

machine-learning pyspark pyspark-mllib pyspark-python pyspark-tutorial python3

Last synced: 14 Apr 2025

https://github.com/vigneshss-07/pyspark-acompleteguide

This repo explains PySpark modules in Python, with practical hands-on examples for working with big data.

pyspark pyspark-mllib pyspark-notebook pyspark-python pyspark-tutorial

Last synced: 13 Apr 2025

https://github.com/arturogonzalezm/convert_json_to_parquet

ETL (Extract, Transform, Load) job using PySpark - submodule

apache-spark etl etl-job etl-pipeline pyspark-python python python312

Last synced: 02 Mar 2025

https://github.com/camilajaviera91/pyspark-first-approach

This code demonstrates how to integrate PySpark with datasets and perform simple data transformations. It loads a sample dataset using PySpark's built-in functionality, or reads data from external sources, and converts it into a PySpark DataFrame for distributed processing and manipulation.

curses fpdf google-oauth2 gspread kaggle kaggle-api matplotlib os pandas path pathlib pyspark-python pyspark-sql shutil sparksession

Last synced: 26 Jun 2025

https://github.com/abdelmajidlh/ml_diabet_predict_pyspark

Diabetes prediction using logistic regression with Python and PySpark

data-science logistic-regression machine-learning pyspark pyspark-mllib pyspark-python

Last synced: 22 Mar 2025

https://github.com/mohammadreza-mohammadi94/pyspark-analytics-hub

A PySpark repository for data analysis, machine learning projects, and hands-on exercises. Explore scalable data processing and advanced ML workflows with Spark.

large-scale-pretraining machine-learning pyspark pyspark-mllib pyspark-python python

Last synced: 22 Feb 2025

https://github.com/pixelbyaj/apache-spark

Getting started with Apache Spark in Python (PySpark)

apache-spark pyspark-python python spark winutils

Last synced: 13 Oct 2025

https://github.com/coderjolly/pyspark-yelp-data-analysis

A comparative study of the computational efficiency of PySpark architectures versus Python-based distributed programming methods such as MPI, multithreading, and multiprocessing, using the Yelp Kaggle dataset.

distributed-system-design distributed-systems-challenges mpi multiprocessing multithreading pyspark pyspark-python

Last synced: 27 Mar 2025

https://github.com/soumyadipta2020/pyspark-sample

Sample code and functions for PySpark

pyspark pyspark-python python

Last synced: 28 Jul 2025

https://github.com/vladkozhuhov/mindbox_test

Test assignments for Mindbox

csharp-library pyspark-python

Last synced: 07 May 2025