https://github.com/johnsesana/pyspark-better-show

Small Function that displays Spark Dataframes "df.show()" in a better UI (similar to Pandas) in a notebook

html jupyter-notebook pyspark

# Better Show

A small function that renders a Spark DataFrame as an HTML table in a notebook, as a more readable alternative to `df.show()` (similar to how Pandas displays DataFrames).

## [Code](https://github.com/JohnSesana/Better-Spark-Show/blob/main/better_show.py)

```python
from html import escape

from IPython.display import HTML


def better_show(df, num_rows=50):
    """
    Display a PySpark DataFrame as an HTML table in a Jupyter notebook.

    Parameters:
        df (DataFrame): The PySpark DataFrame to display.
        num_rows (int): Number of rows to display. Default is 50.
    """
    # Collect the specified number of rows as a list of Row objects
    rows = df.limit(num_rows).collect()

    # Create an HTML table string with column headers
    html = "<table><tr>" + "".join(f"<th>{escape(str(col))}</th>" for col in df.columns) + "</tr>"

    # Add the rows to the table, escaping values so "<" or "&" cannot break the markup
    for row in rows:
        html += "<tr>" + "".join(f"<td>{escape(str(value))}</td>" for value in row) + "</tr>"

    html += "</table>"

    # Return an HTML object; the notebook renders it when it is the cell's last expression
    return HTML(html)
```
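
The table-building step can be tried without a Spark session. The following is a minimal sketch, assuming plain Python lists and tuples stand in for the column names and collected rows. `rows_to_html` is a hypothetical helper written for illustration and is not part of the repository. It escapes each cell with the standard library's `html.escape` so that values containing `<` or `&` do not break the markup.

```python
from html import escape


def rows_to_html(columns, rows):
    """Build an HTML table string from column names and row tuples.

    Hypothetical helper for illustration; not part of better_show.py.
    """
    # Header row: one <th> cell per column name
    header = "".join(f"<th>{escape(str(col))}</th>" for col in columns)
    parts = [f"<table><tr>{header}</tr>"]

    # Body rows: one <td> cell per value, converted to text and escaped
    for row in rows:
        cells = "".join(f"<td>{escape(str(value))}</td>" for value in row)
        parts.append(f"<tr>{cells}</tr>")

    parts.append("</table>")
    return "".join(parts)


# Sample data standing in for df.columns and the collected rows
table = rows_to_html(["id", "name"], [(1, "a<b"), (2, None)])
print(table)
```

In a notebook, passing the resulting string to `IPython.display.HTML` renders it as a table. Note that `None` values appear as the text `None` in this sketch.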

## Usage Example

```python
# Assumes an active SparkSession named `spark`

spark_df = spark.read.format("orc").load(your_table)  # `your_table` is your data path; change the format as needed

better_show(spark_df, 15)  # Shows 15 rows; the default is 50
```