https://github.com/tdiprima/pysparklab

Exploring PySpark
apache-spark bigdata data-science datavisualization pyspark pythondatascience spark sparksql

# PySparkLab

PySpark resembles Pandas in some ways, most visibly in its DataFrame API, but it is not a drop-in replacement. A brief description:

PySpark is a Python library that provides an interface to Apache Spark, a unified analytics engine for large-scale data processing. PySpark allows you to write Python code that can run on Spark's distributed computing engine, which means you can process large datasets in parallel across a cluster of machines.

## Tutorials I'm following

[PySpark Tutorial](https://youtu.be/_C8kWso4ne4?si=xjnYdQpt2cwoPBrs)

[Pyspark-With-Python](https://github.com/krishnaik06/Pyspark-With-Python.git)

[Playlist](https://www.youtube.com/watch?v=WyZmM6K7ubc&list=PLZoTAELRMXVNjiiawhzZ0afHcPvC8jpcg)

[Try Databricks free](https://www.databricks.com/try-databricks#account)

## Attribution

I'm following **Krish C Naik's [Pyspark-With-Python](https://github.com/krishnaik06/Pyspark-With-Python)** tutorial.

## License Information

This project includes code from Pyspark-With-Python, which is licensed under the **GNU General Public License v3 (GPLv3)**.
As required by GPLv3, this project is also licensed under **GPLv3**.

For more details, see the full license file: [GPLv3_LICENSE](GPLv3_LICENSE).