https://github.com/milesgranger/pontem


# pontem
Treat PySpark DataFrames like pandas.

_This is currently just a hobby project, not suitable for use._

---

Turn something like this:
```python
# Pure PySpark API; df is type pyspark.sql.DataFrame
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

def multiply(n):
    return udf(lambda col: col * n, FloatType())

df = df.withColumn('new_col', multiply(2)(df['other_col']))
```

...into this:
```python
# Using pontem.core.DataFrame object.
df['new_col'] = df['other_col'] * 2
```
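
For a rough idea of how this kind of syntax can work, here is a minimal sketch. It is not pontem's actual implementation: `PandasLikeFrame` and `to_spark` are hypothetical names made up for illustration. It assumes the wrapped object behaves like a `pyspark.sql.DataFrame`, where `df[name]` returns a `Column` that already supports arithmetic and `withColumn` returns a new DataFrame.

```python
class PandasLikeFrame:
    """Hypothetical wrapper for illustration; not pontem's real class."""

    def __init__(self, sdf):
        # sdf is assumed to be a pyspark.sql.DataFrame, or any object
        # exposing the same __getitem__ and withColumn methods.
        self._sdf = sdf

    def __getitem__(self, name):
        # Hand back the underlying Column, which already supports
        # arithmetic operators such as `*`.
        return self._sdf[name]

    def __setitem__(self, name, column):
        # Spark DataFrames are immutable, so keep the new DataFrame
        # that withColumn returns instead of mutating in place.
        self._sdf = self._sdf.withColumn(name, column)

    def to_spark(self):
        # Expose the wrapped DataFrame again.
        return self._sdf
```

Given an existing PySpark DataFrame `spark_df`, wrapping it as `df = PandasLikeFrame(spark_df)` would let `df['new_col'] = df['other_col'] * 2` run, with `df.to_spark()` returning the updated DataFrame.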