https://github.com/feliciamarlove/streaming-with-scala-and-spark

Related to Handling Fast Data with Apache Spark SQL and Streaming course on Pluralsight https://app.pluralsight.com/library/courses/apache-spark-sql-fast-data-handling-streaming/exercise-files

data-engineering hive parquet scala spark streaming


# How to - Read parquet files

```
pip install parquet-tools
parquet-tools show ./Output/FOLDER/FILE_PATH.parquet
```
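The same files can also be inspected from Spark itself. The following is a minimal sketch, assuming a local `SparkSession`; the object name `ReadParquetSketch`, the helper `readParquet`, and the sample `Account`/`Amount` data are hypothetical and exist only so the example has something to read back.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadParquetSketch {
  // Hypothetical helper: loads a parquet file or directory into a DataFrame.
  def readParquet(spark: SparkSession, path: String): DataFrame =
    spark.read.parquet(path)

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; app name and master are illustrative.
    val spark = SparkSession.builder()
      .appName("read-parquet-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data, written to a temp location so the read below works.
    val path = Files.createTempDirectory("parquet-sketch")
      .resolve("finances.parquet")
      .toString
    Seq(("acc-1", 20.0), ("acc-2", 310.5))
      .toDF("Account", "Amount")
      .write
      .parquet(path)

    val df = readParquet(spark, path)
    df.printSchema() // column names and types stored in the parquet metadata
    df.show()        // prints the rows as a table

    spark.stop()
  }
}
```

To look at the course output instead, pass the path of an existing parquet folder (for example one under `./Output/`) to `readParquet`.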

# Course
[Handling Fast Data with Apache Spark SQL and Streaming (Pluralsight)](https://app.pluralsight.com/library/courses/apache-spark-sql-fast-data-handling-streaming/exercise-files)

# Notes

/!\ Order of operations matters.
This will run, but `limit(5)` is applied first: Spark keeps 5 arbitrary rows (not the 5 smallest) and then sorts only those 5 by `Amount`.

```financesDf.limit(5).orderBy("Amount").show```

This will order the whole DataFrame by `Amount` first, then keep the first 5 rows, i.e. the 5 smallest amounts.

```financesDf.orderBy("Amount").limit(5).show```
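The difference is easy to see on a small in-memory DataFrame. This is a sketch under stated assumptions: `financesDf` here is a hypothetical stand-in with a single `Amount` column and made-up values, and the object and method names are illustrative, not taken from the course.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LimitVsOrderBy {
  // Keeps 5 arbitrary rows first, then sorts only those 5.
  def limitThenOrder(df: DataFrame): DataFrame =
    df.limit(5).orderBy("Amount")

  // Sorts every row by Amount, then keeps the 5 smallest.
  def orderThenLimit(df: DataFrame): DataFrame =
    df.orderBy("Amount").limit(5)

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; app name and master are illustrative.
    val spark = SparkSession.builder()
      .appName("limit-vs-orderby")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for financesDf with made-up amounts.
    val financesDf = Seq(500.0, 20.0, 310.5, 7.25, 99.0, 1.5, 42.0, 3.0)
      .toDF("Amount")

    // Which 5 rows survive the limit is not guaranteed, so this result can
    // miss small amounts such as 1.5 or 3.0.
    limitThenOrder(financesDf).show()

    // Always the 5 smallest amounts: 1.5, 3.0, 7.25, 20.0, 42.0.
    orderThenLimit(financesDf).show()

    spark.stop()
  }
}
```

Both variants return 5 rows, so the mistake does not show up in the row count; only the values differ.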