{"id":19292370,"url":"https://github.com/openaddresses/batch","last_synced_at":"2025-10-07T21:40:34.836Z","repository":{"id":38992482,"uuid":"240550063","full_name":"openaddresses/batch","owner":"openaddresses","description":"OpenAddresses/Machine based AWS Batch based ETL Processing","archived":false,"fork":false,"pushed_at":"2025-09-10T00:44:15.000Z","size":11385,"stargazers_count":6,"open_issues_count":45,"forks_count":7,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-09-17T00:54:21.600Z","etag":null,"topics":["addresses","geocoder","geocoding","geospatial","gis","openaddresses"],"latest_commit_sha":null,"homepage":"https://batch.openaddresses.io/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/openaddresses.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"open_collective":"openaddresses"}},"created_at":"2020-02-14T16:19:09.000Z","updated_at":"2025-06-03T12:50:06.000Z","dependencies_parsed_at":"2023-02-09T12:46:19.087Z","dependency_job_id":"6f54ea7a-eb0f-49ca-8b1b-1d9aed98b1dc","html_url":"https://github.com/openaddresses/batch","commit_stats":{"total_commits":1774,"total_committers":7,"mean_commits":"253.42857142857142","dds":"0.13190529875986468","last_synced_commit":"ebd5ea28b8ee8005e01ccd34f5afa3339a7164d6"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/openaddresses/batch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openaddresses%2Fbatch","tags_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/openaddresses%2Fbatch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openaddresses%2Fbatch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openaddresses%2Fbatch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/openaddresses","download_url":"https://codeload.github.com/openaddresses/batch/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/openaddresses%2Fbatch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278854216,"owners_count":26057419,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-07T02:00:06.786Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["addresses","geocoder","geocoding","geospatial","gis","openaddresses"],"created_at":"2024-11-09T22:30:12.163Z","updated_at":"2025-10-07T21:40:34.796Z","avatar_url":"https://github.com/openaddresses.png","language":"JavaScript","readme":"\u003ch1 align=center\u003eOpenAddresses Batch\u003c/h1\u003e\n\n## Deploy\n\nBefore you can deploy infrastructure, you must first set up the [OpenAddresses Deploy tools](https://github.com/openaddresses/deploy).\n\nOnce these are installed, you can create the production stack via:\n(Note: it should already exist!)\n\n```sh\ndeploy create 
prod\n```\n\nOr update to the latest GitSha or CloudFormation template via\n\n```sh\ndeploy update prod\n```\n\n### Parameters\n\nWhenever you deploy, you will be prompted for the following parameters\n\n#### GitSha\n\nOn every commit, GitHub Actions will build the latest Docker image and push it to the `batch` ECR.\nThis parameter will be populated automatically by the `deploy` CLI and simply points the stack\nto the corresponding Docker image from ECR.\n\n#### ProtomapsKey\n\nA Protomaps API key ([from here](https://protomaps.com/dashboard)) for displaying base maps underneath our address data.\n\n#### Bucket\n\nThe bucket in which assets should be saved. See the `S3 Assets` section of this document for more information.\n\n#### Branch\n\nThe branch from which weekly sources should be built. When deployed into production this is generally `master`. When\ntesting new features this can be any `openaddresses/openaddresses` branch.\n\n#### DatabaseType\n\nThe AWS RDS database class that powers the backend.\n\n#### DatabasePassword\n\nThe password to set on the backend database. It is passed to the API via Docker environment variables.\n\n#### SharedSecret\n\nAPI functions that are public currently do not require any auth at all. Internal functions, however, are protected\nby a stack-wide shared secret. This secret is an alphanumeric string that is included in a `secret` header to\nauthenticate internal API calls.\n\nThis value can be any secure alphanumeric combination of characters and is safe to change at any time.\n\n#### GithubSecret\n\nThis is the secret that GitHub uses to sign API events that are sent to this API. This shared signature allows\nus to verify that events are from GitHub. 
Only the production stack should use this parameter.\n\n## Components\n\nThe project is divided into several components\n\n| Component | Purpose |\n| --------- | ------- |\n| cloudformation | Deploy Configuration |\n| api | Dockerized server for handling all API interactions |\n| api/web | Subfolder for UI-specific components |\n| cli | CLI for manually queueing work to batch |\n| lambda | Lambda responsible for instantiating a batch job environment and submitting it |\n| task | Docker container for running a batch job |\n\n## S3 Assets\n\nBy default, processed job assets are uploaded to the bucket `v2.openaddresses.io` in the following format\n\n```\ns3://v2.openaddresses.io/\u003cstack\u003e/job/\u003cjob_id\u003e/source.png\ns3://v2.openaddresses.io/\u003cstack\u003e/job/\u003cjob_id\u003e/source.geojson\ns3://v2.openaddresses.io/\u003cstack\u003e/job/\u003cjob_id\u003e/cache.zip\n```\n\nManual sources (sources that are cached to S3 via the upload tool) are in the following format\n\n```\ns3://v2.openaddresses.io/\u003cstack\u003e/upload/\u003cuser_id\u003e/\u003cfile_name\u003e\n```\n\n## API\n\nAPI documentation is available [here](https://batch.openaddresses.io/docs/)\n\n## Development\n\nIn order to set up an effective dev environment, first obtain a copy of the metastore.\n\nCreate a local\n\n```sh\n./clone prod\n```\n\nThen from the `/api` directory, run\n\n```sh\nnpm run dev\n```\n\nNow, to build the latest UI, navigate to the `/api/web` directory in a separate tab, and run:\n\n```sh\nnpm run build --watch\n```\n\nNote: changes to the website will now be automatically rebuilt; just refresh the page to see them.\n\nFinally, to access the API, navigate to `http://localhost:5000` in your web browser.\n\n## Database\n\nAll data is persisted in an AWS RDS managed PostgreSQL 
database.\n\n![dbdiagram.io](./docs/db.png)\n","funding_links":["https://opencollective.com/openaddresses"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenaddresses%2Fbatch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenaddresses%2Fbatch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenaddresses%2Fbatch/lists"}