{"id":18870558,"url":"https://github.com/alfred82santa/sparpy","last_synced_at":"2026-02-14T19:30:14.738Z","repository":{"id":44811167,"uuid":"276731777","full_name":"alfred82santa/sparpy","owner":"alfred82santa","description":"Sparpy: A spark entry point for python","archived":false,"fork":false,"pushed_at":"2022-11-28T16:21:15.000Z","size":113,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-01T06:04:50.094Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alfred82santa.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-07-02T19:42:21.000Z","updated_at":"2023-01-09T08:56:59.000Z","dependencies_parsed_at":"2023-01-23T05:46:09.645Z","dependency_job_id":null,"html_url":"https://github.com/alfred82santa/sparpy","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alfred82santa%2Fsparpy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alfred82santa%2Fsparpy/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alfred82santa%2Fsparpy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alfred82santa%2Fsparpy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alfred82santa","download_url":"https://codeload.github.com/alfred82santa/sparpy/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github"
,"repositories_count":239816857,"owners_count":19701816,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T05:21:19.481Z","updated_at":"2026-02-14T19:30:14.699Z","avatar_url":"https://github.com/alfred82santa.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"=====================================\nSparpy: A Spark entrypoint for Python\n=====================================\n\n---------\nChangelog\n---------\n\n......\nv0.5.5\n......\n\n* Added `--proxy` option in order to set a proxy to access to Python packages repositories.\n\n......\nv0.5.4\n......\n\n* Added `plugin-env` section on configuration file in order to be able to set environment\n  variables on plugin download process.\n\n* Added `--plugin-env` option (and its environment variable associated `SPARPY_PLUGIN_ENVVARS`)\n  in order to set environment variables on plugin download process. 
It could be necessary on some cases\n  using conda environments.\n\n* Added environment variable `SPARPY_CONFIG` for option `--config`.\n* Added environment variable `SPARPY_DEBUG` for option `--debug`.\n\n......\nv0.5.3\n......\n\n* Fix isparpy.\n\n......\nv0.5.2\n......\n\n* Fix ignoring all packages when exclude packages list is empty.\n\n......\nv0.5.1\n......\n\n* Fix Python package regex.\n* Fix download script.\n\n......\nv0.5.0\n......\n\n* Added `--exclude-python-packages` option in order to exclude python packages.\n* Better parsing plugins names.\n* Added `--exclude-packages` option in order to exclude spark packages.\n\n......\nv0.4.5\n......\n\n* Fix isparpy entrypoint. Allows `--class` parameter.\n* Allow to set constraints files.\n\n\n......\nv0.4.4\n......\n\n* Don't set `master` and `deploy_mode` default values.\n\n......\nv0.4.3\n......\n\n* Fix sparpy-submit entrypoint.\n* Fix `--property-file` option.\n* Fix `--class` option.\n\n......\nv0.4.2\n......\n\n* Able to use environment variables for the most of options.\n\n......\nv0.4.1\n......\n\n* Support to set pip options as configuration using `--conf sparpy.config-key=value` in order to allow to\n  use `sparpy-submit` in EMR-on-EKS images.\n\n* Allows `--class` in order to allow to use `sparpy-submit` in EMR-on-EKS images.\n* Allows `--property-file` in order to allow to use `sparpy-submit` in EMR-on-EKS images.\n\n......\nv0.4.0\n......\n\n* Added `--pre` option in order to allow pre-release packages.\n* Added `--env` option in order to set environment variables for spark process.\n* Added `spark-env` config section in order to set environment variables for spark process.\n* Write pip output when it fails.\n* Fixed problems with interactive sparpy.\n* Fixed `no-self` option in config file.\n\n* Allow to use plugins that don't use `click`. 
They must be callable with one argument of type `Sequence[str]`\n  in order to pass arguments to it.\n\n* Added `--version` option in order to print sparpy version.\n* Fixed error when a plugin requires a package which is already installed but version does not satisfy requirement.\n* `Sparpy` does not print error traceback when subprocess fails.\n\n......\nv0.3.0\n......\n\n* Enable `--force-download` option.\n* Added `--find-links` option in order to use a directory as package repository.\n* Added `--no-index` option in order to avoid to use external package repositories.\n* Added `--queue` option in order to set yarn queue.\n* Ensure driver's python executable is same python as `sparpy`.\n* Added new entry point `sparpy-download` just to download packages to specific directory.\n* Added new entry point `isparpy` in order to start an interactive session.\n\n......\nv0.2.1\n......\n\n* Force `pyspark` python executable to same as `sparpy`.\n* Fix unrecognized options.\n* Fix default configuration file names.\n\n......\nv0.2.0\n......\n\n* Added configuration file option.\n* Added `--debug` option.\n\n----------------------------\nHow to build a Sparpy plugin\n----------------------------\n\nOn package `setup.py` an entry point should be configured for Sparpy:\n\n.. code-block:: python\n\n    setup(\n        name='yourpackage',\n        ...\n\n        entry_points={\n            ...\n            'sparpy.cli_plugins': [\n                'my_command_1=yourpackage.module:command_1',\n                'my_command_2=yourpackage.module:command_2',\n            ]\n        }\n    )\n\n.. note::\n\n    Avoid to use PySpark as requirement in order to not download package from pypi.\n\n-------\nInstall\n-------\n\nIt must be installed on a Spark edge node.\n\n.. code-block:: bash\n\n    $  pip install sparpy[base]\n\n\n----------\nHow to use\n----------\n\nUsing default Spark submit parameters:\n\n.. 
code-block:: bash\n\n    $ sparpy --plugin \"mypackage\u003e=0.1\" my_plugin_command --myparam 1\n\n\n-------------------\nConfiguration files\n-------------------\n\n`sparpy` and `sparpy-submit` accept the `--config` parameter, which sets a configuration file. If it is not set,\nSparpy tries to use `$HOME/.sparpyrc`. If that file does not exist, it tries to use `/etc/sparpy.conf`.\n\nFormat:\n\n.. code-block:: ini\n\n    [spark]\n\n    master=yarn\n    deploy-mode=client\n\n    queue=my_queue\n\n    spark-executable=/path/to/my-spark-submit\n    conf=\n        spark.conf.1=value1\n        spark.conf.2=value2\n\n    packages=\n        maven:package_1:0.1.1\n        maven:package_2:0.6.1\n\n    repositories=\n        https://my-maven-repository-1.com/mvn\n        https://my-maven-repository-2.com/mvn\n\n    reqs_paths=\n        /path/to/dir/with/python/packages_1\n        /path/to/dir/with/python/packages_2\n\n    [spark-env]\n\n    MY_ENV_VAR=value\n\n    [plugins]\n\n    extra-index-urls=\n        https://my-pypi-repository-1.com/simple\n        https://my-pypi-repository-2.com/simple\n\n    cache-dir=/path/to/cache/dir\n\n    plugins=\n        my-package1\n        my-package2==0.1.2\n\n    requirements-files=\n        /path/to/requirement-1.txt\n        /path/to/requirement-2.txt\n\n    find-links=\n        /path/to/directory/with/packages_1\n        /path/to/directory/with/packages_2\n\n    download-dir-prefix=my_prefix_\n\n    no-index=false\n    no-self=false\n    force-download=true\n\n    [plugin-env]\n\n    MY_ENV_VAR=value\n\n    [interactive]\n\n    pyspark-executable=/path/to/pyspark\n    python-interactive-driver=/path/to/interactive/driver\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falfred82santa%2Fsparpy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falfred82santa%2Fsparpy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falfred82santa%2Fsparpy/lists"}