https://github.com/tdiprima/pysparklab
Exploring PySpark
apache-spark bigdata data-science datavisualization pyspark pythondatascience spark sparksql
Last synced: 8 months ago
- Host: GitHub
- URL: https://github.com/tdiprima/pysparklab
- Owner: tdiprima
- Created: 2024-05-24T15:55:20.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-30T19:35:56.000Z (8 months ago)
- Last Synced: 2025-01-30T20:28:49.312Z (8 months ago)
- Topics: apache-spark, bigdata, data-science, datavisualization, pyspark, pythondatascience, spark, sparksql
- Language: Jupyter Notebook
- Homepage:
- Size: 116 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PySparkLab
PySpark is similar to Pandas in some ways, but the two are not interchangeable. In brief:
PySpark is a Python library that provides an interface to Apache Spark, a unified analytics engine for large-scale data processing. PySpark allows you to write Python code that can run on Spark's distributed computing engine, which means you can process large datasets in parallel across a cluster of machines.
## Following the tutorials
- [PySpark Tutorial](https://youtu.be/_C8kWso4ne4?si=xjnYdQpt2cwoPBrs)
- [Pyspark-With-Python](https://github.com/krishnaik06/Pyspark-With-Python.git)
- [Playlist](https://www.youtube.com/watch?v=WyZmM6K7ubc&list=PLZoTAELRMXVNjiiawhzZ0afHcPvC8jpcg)
- [Try Databricks free](https://www.databricks.com/try-databricks#account)
## Attribution
I'm following **Krish C Naik's [Pyspark-With-Python](https://github.com/krishnaik06/Pyspark-With-Python)** tutorial.
## License Information
This project includes code from Pyspark-With-Python which is licensed under the **GNU General Public License v3 (GPLv3)**.
As required by GPLv3, this project is also licensed under **GPLv3**. For more details, see the full license file: [GPLv3_LICENSE](GPLv3_LICENSE).