https://github.com/feliciamarlove/streaming-with-scala-and-spark

Related to Handling Fast Data with Apache Spark SQL and Streaming course on Pluralsight https://app.pluralsight.com/library/courses/apache-spark-sql-fast-data-handling-streaming/exercise-files

data-engineering hive parquet scala spark streaming

Host: GitHub
URL: https://github.com/feliciamarlove/streaming-with-scala-and-spark
Owner: FeliciaMarlove
Created: 2024-09-05T14:49:42.000Z (4 months ago)
Default Branch: main
Last Pushed: 2024-10-30T09:08:47.000Z (about 2 months ago)
Last Synced: 2024-10-30T09:27:04.752Z (about 2 months ago)
Topics: data-engineering, hive, parquet, scala, spark, streaming
Language: Scala
Homepage:
Size: 6.84 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md

README

# How to - Read parquet files

```
pip install parquet-tools
parquet-tools show ./Output/FOLDER/FILE_PATH.parquet
```

# Course
[Handling Fast Data with Apache Spark SQL and Streaming (Pluralsight)](https://app.pluralsight.com/library/courses/apache-spark-sql-fast-data-handling-streaming/exercise-files)

# Notes

/!\ Order of operations matters.
This runs, but it takes 5 arbitrary rows first and only then sorts those 5 rows:

```financesDf.limit(5).orderBy("Amount").show```

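This sorts the whole DataFrame by `Amount` first, then keeps the first 5 rows (the 5 smallest amounts, since `orderBy` sorts ascending by default):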

```financesDf.orderBy("Amount").limit(5).show```
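
The difference is easier to see on a small data set. The sketch below is not Spark code: it uses a plain Scala `List` as a stand-in for `financesDf`, with `take` playing the role of `limit` and `sortBy` the role of `orderBy`. The `Finance` case class, the sample amounts, and the helper names are all hypothetical. One caveat: `take` on a `List` always returns the first elements, whereas `limit` on an unsorted DataFrame gives no guarantee about which rows come back.

```scala
// Plain-Scala stand-in for the DataFrame rows (hypothetical schema).
final case class Finance(account: String, amount: Double)

object LimitVsOrder {
  // Hypothetical sample data, deliberately unsorted.
  val finances: List[Finance] = List(
    Finance("a", 900.0),
    Finance("b", 15.0),
    Finance("c", 430.0),
    Finance("d", 75.0),
    Finance("e", 620.0),
    Finance("f", 5.0),
    Finance("g", 310.0)
  )

  // Analogue of financesDf.limit(n).orderBy("Amount"):
  // truncate first, then sort only the rows that survived.
  def limitThenOrder(rows: List[Finance], n: Int): List[Finance] =
    rows.take(n).sortBy(_.amount)

  // Analogue of financesDf.orderBy("Amount").limit(n):
  // sort everything, then keep the n smallest amounts.
  def orderThenLimit(rows: List[Finance], n: Int): List[Finance] =
    rows.sortBy(_.amount).take(n)

  def main(args: Array[String]): Unit = {
    // prints List(15.0, 75.0, 430.0, 620.0, 900.0)
    println(limitThenOrder(finances, 5).map(_.amount))
    // prints List(5.0, 15.0, 75.0, 310.0, 430.0)
    println(orderThenLimit(finances, 5).map(_.amount))
  }
}
```

Only the second variant returns the 5 smallest amounts overall. The first one misses `5.0` and `310.0` because those rows were dropped before the sort happened.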