Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/minrk/findspark
- Host: GitHub
- URL: https://github.com/minrk/findspark
- Owner: minrk
- License: BSD-3-Clause
- Created: 2015-06-12T21:34:06.000Z (over 9 years ago)
- Default Branch: main
- Last Pushed: 2022-02-11T08:01:15.000Z (over 2 years ago)
- Last Synced: 2024-05-16T02:42:00.021Z (6 months ago)
- Language: Python
- Size: 51.8 KB
- Stars: 508
- Watchers: 9
- Forks: 72
- Open Issues: 11
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md
Awesome Lists containing this project
- awesome-python-machine-learning-resources (GitHub, category: Others, last updated 11.02.2022)
README
# Find spark
PySpark isn't on sys.path by default, but that doesn't mean it can't be used as a regular library.
You can address this by either symlinking pyspark into your site-packages,
or adding pyspark to sys.path at runtime. `findspark` does the latter.

To initialize PySpark, just call:
```python
import findspark
findspark.init()

import pyspark
sc = pyspark.SparkContext(appName="myAppName")
```

Without any arguments, the SPARK_HOME environment variable will be used,
and if that isn't set, other possible install locations will be checked. If
you've installed Spark with `brew install apache-spark` on OS X, the location
`/usr/local/opt/apache-spark/libexec` will be searched.
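The lookup order above can be sketched in plain Python. This is a simplified illustration, not findspark's actual implementation; `guess_spark_home` is a hypothetical name, and the candidate list here contains only the Homebrew path mentioned above.

```python
import os

def guess_spark_home(candidates=("/usr/local/opt/apache-spark/libexec",)):
    """Simplified sketch of the lookup order: prefer the SPARK_HOME
    environment variable, then fall back to known install locations
    such as the Homebrew libexec directory."""
    home = os.environ.get("SPARK_HOME")
    if home:
        return home
    for path in candidates:
        if os.path.isdir(path):
            return path
    raise ValueError("Couldn't find Spark; set SPARK_HOME or pass spark_home")

os.environ["SPARK_HOME"] = "/opt/spark"  # example value for illustration
print(guess_spark_home())  # -> /opt/spark
```

Because the environment variable wins, exporting SPARK_HOME is the simplest way to make the detection deterministic across machines.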
Alternatively, you can specify a location with the `spark_home` argument.
```python
findspark.init('/path/to/spark_home')
```

To verify the automatically detected location, call:
```python
findspark.find()
```

Findspark can add a startup file to the current IPython profile so that the environment variables will be properly set and pyspark will be imported upon IPython startup. This file is created when `edit_profile` is set to true.
```
ipython --profile=myprofile
findspark.init('/path/to/spark_home', edit_profile=True)
```

Findspark can also add to the .bashrc configuration file, if it is present, so that the environment variables will be properly set whenever a new shell is opened. This is enabled by setting the optional argument `edit_rc` to true.
```python
findspark.init('/path/to/spark_home', edit_rc=True)
```

If changes are persisted, findspark will not need to be called again unless the Spark installation is moved.
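For reference, the persisted configuration amounts to environment setup along these lines. This is an illustrative sketch only, not findspark's exact output; the precise lines written depend on your Spark install and version.

```shell
# Illustrative sketch: the persisted state is essentially SPARK_HOME
# plus Spark's Python directory on the Python module search path.
export SPARK_HOME=/path/to/spark_home
export PYTHONPATH="$SPARK_HOME/python:$PYTHONPATH"
```

With lines like these in place (written by `edit_rc=True` or added manually), a new shell can `import pyspark` directly, without calling findspark first.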