https://github.com/milesgranger/pontem
Treat Spark like pandas.
- Host: GitHub
- URL: https://github.com/milesgranger/pontem
- Owner: milesgranger
- License: bsd-3-clause
- Created: 2017-08-06T05:43:38.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-09-03T07:35:09.000Z (over 8 years ago)
- Last Synced: 2025-07-28T01:40:46.719Z (10 months ago)
- Topics: dataframe-api, dataframes, distributed-dataframe, pandas, pyspark, spark-dataframes
- Language: Python
- Size: 33.2 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# pontem
Treat PySpark DataFrames like pandas.
_This is currently just a hobby project, not suitable for use._
---
Turn something like this:
```python
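# Pure PySpark API; df is a pyspark.sql.DataFrame
from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

def multiply(n):
    return udf(lambda col: col * n, FloatType())

# withColumn expects a Column, so pass the UDF result directly
df = df.withColumn('new_col', multiply(2)(df['other_col']))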
```
...into this:
```python
# Using pontem.core.DataFrame object.
df['new_col'] = df['other_col'] * 2
```
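
For the curious, here is a rough sketch of how pandas-style item assignment can be mapped onto PySpark. This is not pontem's actual implementation: the class name `FrameWrapper` and its `sdf` attribute are hypothetical and exist only for illustration. It relies on two PySpark behaviours: indexing a `pyspark.sql.DataFrame` by name returns a `Column` that already overloads arithmetic operators, and `withColumn` returns a new DataFrame instead of mutating the old one.

```python
# Hypothetical sketch, NOT pontem's real code: a thin wrapper that maps
# pandas-style item access onto PySpark's DataFrame.withColumn.
class FrameWrapper:
    def __init__(self, sdf):
        # sdf is expected to be a pyspark.sql.DataFrame; any object that
        # supports sdf[name] and sdf.withColumn(name, col) will work.
        self.sdf = sdf

    def __getitem__(self, name):
        # For a real Spark DataFrame this returns a pyspark.sql.Column,
        # which already supports *, +, - and friends.
        return self.sdf[name]

    def __setitem__(self, name, column):
        # Spark DataFrames are immutable, so keep the new frame that
        # withColumn returns and drop the reference to the old one.
        self.sdf = self.sdf.withColumn(name, column)
```

With a wrapper along these lines, `wrapped = FrameWrapper(df)` followed by `wrapped['new_col'] = wrapped['other_col'] * 2` would be equivalent to calling `df.withColumn('new_col', df['other_col'] * 2)` and keeping the result.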