{"id":16600343,"url":"https://github.com/fsanaulla/spark-http-rdd","last_synced_at":"2026-04-28T08:39:33.730Z","repository":{"id":39845480,"uuid":"340506270","full_name":"fsanaulla/spark-http-rdd","owner":"fsanaulla","description":"RDD primitive for fetching data from an HTTP source","archived":false,"fork":false,"pushed_at":"2024-07-29T15:15:10.000Z","size":106,"stargazers_count":1,"open_issues_count":16,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-27T23:30:54.469Z","etag":null,"topics":["scala","spark"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fsanaulla.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"fsanaulla"}},"created_at":"2021-02-19T22:38:07.000Z","updated_at":"2021-12-21T12:10:58.000Z","dependencies_parsed_at":"2025-02-14T10:42:28.132Z","dependency_job_id":null,"html_url":"https://github.com/fsanaulla/spark-http-rdd","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/fsanaulla/spark-http-rdd","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fsanaulla%2Fspark-http-rdd","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fsanaulla%2Fspark-http-rdd/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fsanaulla%2Fspark-http-rdd/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fsanaulla%2Fspark-http-rdd/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fsanaulla","download_url":"https://codeload.github.com/fsanaulla/spark-http-rdd/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fsanaulla%2Fspark-http-rdd/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32373514,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-27T20:07:02.737Z","status":"online","status_checked_at":"2026-04-28T02:00:07.250Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["scala","spark"],"created_at":"2024-10-12T00:14:12.745Z","updated_at":"2026-04-28T08:39:33.711Z","avatar_url":"https://github.com/fsanaulla.png","language":"Scala","funding_links":["https://github.com/sponsors/fsanaulla"],"categories":[],"sub_categories":[],"readme":"# spark-http-rdd\n\n[![Scala CI](https://github.com/fsanaulla/spark-http-rdd/actions/workflows/scala.yml/badge.svg)](https://github.com/fsanaulla/spark-http-rdd/actions/workflows/scala.yml)\n[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.github.fsanaulla/spark2-http-rdd_2.12/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.github.fsanaulla/spark2-http-rdd_2.12)\n[![Scala Steward 
badge](https://img.shields.io/badge/Scala_Steward-helping-blue.svg?style=flat\u0026logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAA4AAAAQCAMAAAARSr4IAAAAVFBMVEUAAACHjojlOy5NWlrKzcYRKjGFjIbp293YycuLa3pYY2LSqql4f3pCUFTgSjNodYRmcXUsPD/NTTbjRS+2jomhgnzNc223cGvZS0HaSD0XLjbaSjElhIr+AAAAAXRSTlMAQObYZgAAAHlJREFUCNdNyosOwyAIhWHAQS1Vt7a77/3fcxxdmv0xwmckutAR1nkm4ggbyEcg/wWmlGLDAA3oL50xi6fk5ffZ3E2E3QfZDCcCN2YtbEWZt+Drc6u6rlqv7Uk0LdKqqr5rk2UCRXOk0vmQKGfc94nOJyQjouF9H/wCc9gECEYfONoAAAAASUVORK5CYII=)](https://scala-steward.org)\n\n## Installation\n\nAdd the dependency to your `build.sbt`\n\n### Spark 3\n\nCompiled for Scala 2.12\n\n```\nlibraryDependencies += \"com.github.fsanaulla\" %% \"spark3-http-rdd\" % \u003cversion\u003e\n```\n\n### Spark 2\n\nCross-compiled for Scala 2.11 and 2.12\n\n```\nlibraryDependencies += \"com.github.fsanaulla\" %% \"spark2-http-rdd\" % \u003cversion\u003e\n```\n\n## Usage\n\nLet's define our source URI:\n\n```scala\nval baseUri: URI = ???\n```\n\nWe will build our partitions on top of it using an array of `URIModifier`s that looks like this:\n\n```scala\nval uriPartitioner: Array[URIModifier] = Array(\n  URIModifier.fromFunction { uri =\u003e\n    // uri modification logic, \n    // for example appending path, adding query params etc\n  },\n  ...\n)\n```\n\n**Important**: The number of `URIModifier`s should equal the desired number of partitions. Each modified URI will be used as the\nbase URI for a separate partition.\n\nNext, we define how to handle the HTTP endpoint's responses. By default, the RDD expects a response made up of\nnewline-separated rows, and each row is processed as a separate entity by the response mapping function:\n\n```scala\nval mapping: String =\u003e T = ??? \n```\n\nThen we can create our RDD:\n\n```scala\nval rdd: RDD[T] =\n  HttpRDD.create(\n    sc,\n    baseUri,\n    uriPartitioner,\n    mapping\n  )\n```\n\nMore details are available in the source code. 
Also as an example you can use integration tests","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffsanaulla%2Fspark-http-rdd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffsanaulla%2Fspark-http-rdd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffsanaulla%2Fspark-http-rdd/lists"}