Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ajaymahadeven/apache-spark-programs
This repository contains Apache Spark programs implemented in Python. These programs are part of my learning process for Apache Spark and are intended to serve as examples for anyone who is also learning or working with Apache Spark.
apache-spark apache-spark-sql apache-sparksql pyspark
- Host: GitHub
- URL: https://github.com/ajaymahadeven/apache-spark-programs
- Owner: ajaymahadeven
- License: MIT
- Created: 2023-03-27T00:34:56.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-03-27T00:48:34.000Z (almost 2 years ago)
- Last Synced: 2024-11-17T16:03:03.129Z (2 months ago)
- Topics: apache-spark, apache-spark-sql, apache-sparksql, pyspark
- Language: Python
- Homepage:
- Size: 3.94 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Apache Spark Programs
## DESCRIPTION
This repository contains Apache Spark programs implemented in Python. These programs are part of my learning process for Apache Spark and are intended to serve as examples for anyone who is also learning or working with Apache Spark.
---
## Installation
Before running these programs, you need to install Apache Spark and PySpark on your system. You can follow the instructions on the official Apache Spark website to download and install the latest version of Apache Spark: https://spark.apache.org/downloads.html
Once you have installed Apache Spark, you can install PySpark using pip:
pip install pyspark
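As a quick sanity check (a minimal sketch, not part of this repository), you can confirm that PySpark is importable and that a local SparkSession starts; the file name `install_check.py` is illustrative:

```python
# install_check.py -- hypothetical sanity check, not part of this repository.
# Starts a local SparkSession and prints the Spark version.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")          # run Spark locally using all available cores
    .appName("install-check")
    .getOrCreate()
)
print("Spark version:", spark.version)
spark.stop()
```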
---
## Usage
To run any of the programs in this repository, navigate to the program's directory and run the following command:
spark-submit program-name.py
Make sure to replace `program-name` with the name of the program you want to run.
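For illustration, a minimal self-contained script that could be launched with `spark-submit` might look like the sketch below; the file name `example_program.py` and its contents are hypothetical and not one of the programs listed in this repository:

```python
# example_program.py -- hypothetical minimal script, runnable as:
#   spark-submit example_program.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ExampleProgram").getOrCreate()

# Build a tiny in-memory DataFrame and show it, just to prove the job runs.
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.show()

spark.stop()
```
---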
## PROGRAMS
Here is a list of all the programs in this repository:
1. Total Spent By Customer (sorted and SparkSQL version)
2. Calculate Average Friends By Age
3. Filtering RDDs and Finding Minimum Temperature
4. Movie Ratings Counter
5. Word Count using FlatMap (see the sketch after this list)
6. Calculating Min and Max Temperature using DataFrames
7. Social Graph Analysis using Marvel Superheroes
8. Calculating Average Friends By Age using SparkSQL
9. Calculating Total Spent By Customer using DataFrames
10. Word Count using SparkSQL
11. Calculating Average Friends By Age using DataFrames
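As an example of the word-count pattern referenced in item 5, here is a minimal sketch; the script name `word_count_sketch.py` and input file `input.txt` are illustrative and are not files in this repository:

```python
# word_count_sketch.py -- illustrative only; assumes a local text file input.txt.
from pyspark import SparkContext

sc = SparkContext("local[*]", "WordCountSketch")

counts = (
    sc.textFile("input.txt")                       # one RDD element per line
      .flatMap(lambda line: line.split())          # split each line into words
      .map(lambda word: (word.lower(), 1))         # emit (word, 1) pairs
      .reduceByKey(lambda a, b: a + b)             # sum the counts per word
)

# Print the ten most frequent words.
for word, count in counts.takeOrdered(10, key=lambda pair: -pair[1]):
    print(word, count)

sc.stop()
```
---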
## CONTRIBUTIONS
If you have any suggestions or ideas for new Apache Spark programs, feel free to open an issue or submit a pull request.
---
## LICENSE
This repository is licensed under the MIT License. See the LICENSE file for more information.