Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/feliciamarlove/streaming-with-scala-and-spark
Related to Handling Fast Data with Apache Spark SQL and Streaming course on Pluralsight https://app.pluralsight.com/library/courses/apache-spark-sql-fast-data-handling-streaming/exercise-files
data-engineering hive parquet scala spark streaming
Last synced: 4 days ago
- Host: GitHub
- URL: https://github.com/feliciamarlove/streaming-with-scala-and-spark
- Owner: FeliciaMarlove
- Created: 2024-09-05T14:49:42.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-10-30T09:08:47.000Z (about 2 months ago)
- Last Synced: 2024-10-30T09:27:04.752Z (about 2 months ago)
- Topics: data-engineering, hive, parquet, scala, spark, streaming
- Language: Scala
- Homepage:
- Size: 6.84 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# How to - Read parquet files
```
pip install parquet-tools
parquet-tools show ./Output/FOLDER/FILE_PATH.parquet
```

# Course
[Handling Fast Data with Apache Spark SQL and Streaming (Pluralsight)](https://app.pluralsight.com/library/courses/apache-spark-sql-fast-data-handling-streaming/exercise-files)

# Notes
/!\ Order of operations matters.

This will run, but it takes 5 arbitrary rows first and then sorts only those 5:
```
financesDf.limit(5).orderBy("Amount").show()
```
This sorts the whole DataFrame first, then keeps the 5 rows with the smallest `Amount`:
```
financesDf.orderBy("Amount").limit(5).show()
```
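
To see the difference on concrete data, here is a minimal sketch that can be run locally. It assumes `spark-sql` is on the classpath; the object name `LimitOrderDemo`, the helper `sampleFinances`, and the sample rows are made up for illustration, and only the `Amount` column name comes from the notes above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LimitOrderDemo {

  // Hypothetical stand-in for financesDf: eight rows with an "Amount" column.
  def sampleFinances(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("A", 300.0), ("B", 10.0), ("C", 250.0), ("D", 5.0),
      ("E", 120.0), ("F", 40.0), ("G", 75.0), ("H", 1.0)
    ).toDF("Account", "Amount")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("limit-vs-orderBy")
      .master("local[*]")
      .getOrCreate()

    val financesDf = sampleFinances(spark)

    // limit first: Spark picks any 5 rows, then sorts just those 5.
    // Which 5 rows are picked is not guaranteed.
    financesDf.limit(5).orderBy("Amount").show()

    // orderBy first: all 8 rows are sorted, then the 5 smallest are kept
    // (Amount = 1.0, 5.0, 10.0, 40.0, 75.0).
    financesDf.orderBy("Amount").limit(5).show()

    spark.stop()
  }
}
```

Both queries return 5 rows sorted by `Amount`, but only the second is guaranteed to return the 5 smallest amounts in the whole dataset.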