Writing PySpark logs in Apache Spark and Databricks
https://github.com/renardeinside/pyspark-logging-examples
- Host: GitHub
- URL: https://github.com/renardeinside/pyspark-logging-examples
- Owner: renardeinside
- Created: 2022-06-11T15:01:09.000Z
- Default Branch: main
- Last Pushed: 2022-06-13T12:56:49.000Z
- Topics: apache-spark, databricks, log4j, logging, logs
- Language: Python
- Homepage: https://polarpersonal.medium.com/writing-pyspark-logs-in-apache-spark-and-databricks-8590c28d1d51
- Size: 20.5 KB
- Stars: 16
- Watchers: 1
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
# PySpark logging examples in local environment and on Databricks clusters
This repo shows how to configure PySpark logging, both in a local Apache Spark environment and on Databricks clusters.
[Link to the blog post with details](https://polarpersonal.medium.com/writing-pyspark-logs-in-apache-spark-and-databricks-8590c28d1d51).
## Local setup
Put your logging configuration in `conf/local/log4j.properties` and point the `SPARK_CONF_DIR` environment variable at the `conf/local` directory before initializing the Spark session.
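As a starting point, a `conf/local/log4j.properties` file might look like the sketch below. This uses log4j 1.x syntax (the version Spark used at the time this repo was created); the appender name, pattern, and log levels are illustrative assumptions, not the repo's exact configuration.

```properties
# Illustrative sketch of conf/local/log4j.properties (log4j 1.x syntax)
log4j.rootCategory=INFO, console

# Log to the console with a timestamped pattern
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n

# Quiet noisy Spark internals while keeping application logs verbose
log4j.logger.org.apache.spark=WARN
```

You could then launch a session with something like `SPARK_CONF_DIR="$(pwd)/conf/local" pyspark`, so Spark picks up this file instead of its bundled defaults.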
## Databricks setup
* Describe your logging configuration in `conf/databricks/driver-log4j.properties`
* Set the `DATABRICKS_CLI_PROFILE` environment variable in the `.env` file
* Upload the configuration to DBFS via `make upload-log-configuration`
* Add the init script to the cluster properties
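The repo ships its own init script; purely as an illustration, a cluster-scoped init script for this purpose might look like the sketch below. The DBFS and driver-side paths are assumptions based on a common Databricks layout, and `DB_IS_DRIVER` is the environment variable Databricks sets on driver nodes; adjust everything to match your actual upload location.

```shell
#!/bin/bash
# Hypothetical init-script sketch (not the repo's actual script):
# merge the uploaded log4j config into the driver's active
# log4j.properties, on driver nodes only.

append_driver_conf() {
  local src="$1" dst="$2"
  # Databricks sets DB_IS_DRIVER=TRUE only on the driver node.
  if [ "$DB_IS_DRIVER" = "TRUE" ] && [ -f "$src" ]; then
    cat "$src" >> "$dst"
  fi
}

# Assumed paths; the source must match wherever
# `make upload-log-configuration` placed the file.
append_driver_conf \
  /dbfs/conf/databricks/driver-log4j.properties \
  /databricks/spark/dbconf/log4j/driver/log4j.properties
```

Appending (rather than replacing) keeps the Databricks-provided defaults in place and only layers the custom loggers on top.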