{"id":19746361,"url":"https://github.com/astrolabsoftware/spark-lsst","last_synced_at":"2026-05-14T00:37:25.805Z","repository":{"id":92459594,"uuid":"142254965","full_name":"astrolabsoftware/spark-lsst","owner":"astrolabsoftware","description":"Collection of examples combining LSST codes with Apache Spark","archived":false,"fork":false,"pushed_at":"2018-07-25T20:37:48.000Z","size":419,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-02-28T07:49:01.365Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/astrolabsoftware.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-25T06:07:22.000Z","updated_at":"2018-10-17T12:26:09.000Z","dependencies_parsed_at":"2023-06-02T12:45:29.791Z","dependency_job_id":null,"html_url":"https://github.com/astrolabsoftware/spark-lsst","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/astrolabsoftware/spark-lsst","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Fspark-lsst","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Fspark-lsst/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Fspark-lsst/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Fspark-lsst/manifests","owner_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/owners/astrolabsoftware","download_url":"https://codeload.github.com/astrolabsoftware/spark-lsst/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Fspark-lsst/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33004981,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T13:14:54.681Z","status":"ssl_error","status_checked_at":"2026-05-13T13:14:51.610Z","response_time":115,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T02:14:21.705Z","updated_at":"2026-05-14T00:37:25.800Z","avatar_url":"https://github.com/astrolabsoftware.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# spark-lsst\n\n_Collection of examples combining Apache Spark and LSST codes._\n\n## Why would you use Apache Spark in the context of LSST? \n\nAlthough [Apache Spark](http://spark.apache.org/) is not primarily meant to speed up the computation itself, it is very efficient at handling and managing large volumes of data. It often performs better than other tools, e.g. for embarrassingly parallel jobs, by optimizing data distribution (load balancing) and minimizing I/O latency.\n\nLet's also mention that one of the strengths of Apache Spark resides in its simplicity. 
It’s impressive how seamlessly Spark handles pipeline and job management. Those are managed internally without user action and are often executed more efficiently than what we could do manually by scheduling tasks with MPI for example.\n\n## What typically needs to be modified to use Apache Spark?\n\nWe acknowledge that cluster computing (HTC) in general and Apache Spark in particular can be disorienting at first for newcomers from HPC, and one often does not want to rewrite a package entirely.\nWe define 2 levels of _sparkfication_, with the first one being a minimal change of the original code for Spark to work, and the second one being a more aggressive restructuring to fully benefit from the Spark framework.\n\n### Level 0: minimal intrusion\n\nAt level 0, we mostly focus on rewriting I/O to be compliant with the Spark philosophy: input, output, and calls to external data such as external DB queries or internal log/parameter files.\nOne should keep in mind that Spark is a distributed framework, so concepts such as absolute paths or local data are often not valid here.\n\n### Level 1: code infection\n\nWe go beyond just wrapping the existing package and use the full potential of Spark: functional programming, native data source connectors (e.g. [spark-fits](https://github.com/astrolabsoftware/spark-fits)), ...\n\n## Available LSST package using Spark\n\n- [LSSTDESC/Spectractor](https://github.com/LSSTDESC/Spectractor)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrolabsoftware%2Fspark-lsst","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fastrolabsoftware%2Fspark-lsst","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrolabsoftware%2Fspark-lsst/lists"}