{"id":13734115,"url":"https://github.com/FrontierDevelopmentLab/sat-extractor","last_synced_at":"2025-05-08T10:30:57.340Z","repository":{"id":42016128,"uuid":"420218118","full_name":"FrontierDevelopmentLab/sat-extractor","owner":"FrontierDevelopmentLab","description":"Extract Satellite Imagery from public constellations at scale","archived":false,"fork":false,"pushed_at":"2022-07-04T23:57:13.000Z","size":888,"stargazers_count":75,"open_issues_count":4,"forks_count":6,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-15T02:34:33.059Z","etag":null,"topics":["earth-observation","esa","satellite","satellite-imagery","zarr"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FrontierDevelopmentLab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-10-22T19:24:25.000Z","updated_at":"2024-10-23T19:05:45.000Z","dependencies_parsed_at":"2022-09-21T13:30:43.550Z","dependency_job_id":null,"html_url":"https://github.com/FrontierDevelopmentLab/sat-extractor","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrontierDevelopmentLab%2Fsat-extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrontierDevelopmentLab%2Fsat-extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrontierDevelopmentLab%2Fsat-extractor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FrontierDevelopmentLab%2Fsat-extractor/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FrontierDevelopmentLab","download_url":"https://codeload.github.com/FrontierDevelopmentLab/sat-extractor/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253045614,"owners_count":21845736,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["earth-observation","esa","satellite","satellite-imagery","zarr"],"created_at":"2024-08-03T03:00:52.721Z","updated_at":"2025-05-08T10:30:56.911Z","avatar_url":"https://github.com/FrontierDevelopmentLab.png","language":"Python","funding_links":[],"categories":["`Python` processing of optical imagery (non deep learning)"],"sub_categories":["Download"],"readme":"\u003cdiv id=\"top\"\u003e\u003c/div\u003e\n\u003c!--\n*** Thanks for checking out the Best-README-Template. If you have a suggestion\n*** that would make this better, please fork the repo and create a pull request\n*** or simply open an issue with the tag \"enhancement\".\n*** Don't forget to give the project a star!\n*** Thanks again! Now go create something AMAZING! :D\n--\u003e\n\n\n\n\u003c!-- PROJECT SHIELDS --\u003e\n\u003c!--\n*** I'm using markdown \"reference style\" links for readability.\n*** Reference links are enclosed in brackets [ ] instead of parentheses ( ).\n*** See the bottom of this document for the declaration of the reference variables\n*** for contributors-url, forks-url, etc. 
This is an optional, concise syntax you may use.\n*** https://www.markdownguide.org/basic-syntax/#reference-style-links\n--\u003e\n\u003c!-- [![Contributors][contributors-shield]][contributors-url]\n[![Forks][forks-shield]][forks-url]\n[![Stargazers][stars-shield]][stars-url]\n[![Issues][issues-shield]][issues-url]\n[![MIT License][license-shield]][license-url]\n[![LinkedIn][linkedin-shield]][linkedin-url] --\u003e\n\n\n\n\u003c!-- PROJECT LOGO --\u003e\n\u003cbr /\u003e\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://github.com/othneildrew/Best-README-Template\"\u003e\n    \u003cimg src=\"images/satextractor.png\" alt=\"Logo\"\u003e\n  \u003c/a\u003e\n\n  \u003ch3 align=\"center\"\u003eSatExtractor\u003c/h3\u003e\n\n  \u003cp align=\"center\"\u003e\n    Build, deploy and extract satellite public constellations with one command line.\n    \u003cbr /\u003e\n   \u003ca href=\"https://github.com/othneildrew/Best-README-Template\"\u003e\n    \u003cimg src=\"images/stac.gif\" alt=\"Logo\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\n\n\u003c!-- TABLE OF CONTENTS --\u003e\n\u003cdetails\u003e\n  \u003csummary\u003eTable of Contents\u003c/summary\u003e\n  \u003col\u003e\n    \u003cli\u003e\n      \u003ca href=\"#about-the-project\"\u003eAbout The Project\u003c/a\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\n      \u003ca href=\"#getting-started\"\u003eGetting Started\u003c/a\u003e\n      \u003cul\u003e\n        \u003cli\u003e\u003ca href=\"#structure\"\u003eStructure\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#prerequisites\"\u003ePrerequisites\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#installation\"\u003eInstallation\u003c/a\u003e\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#usage\"\u003eUsage\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#contributing\"\u003eContributing\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca 
href=\"#license\"\u003eLicense\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#citation\"\u003eCitation\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#acknowledgments\"\u003eAcknowledgments\u003c/a\u003e\u003c/li\u003e\n  \u003c/ol\u003e\n\u003c/details\u003e\n\n\n\n\u003c!-- ABOUT THE PROJECT --\u003e\n## About The Project\n\n- *tldr*: **SatExtractor** gets **all revisits in a date range** for a given **geojson region** from any public satellite constellation and stores them in a **cloud friendly format**.\n\n\nThe large amount of image data makes it difficult to create datasets to train models quickly and reliably. Existing methods for extracting satellite images take a long time to process and have user quotas that restrict access.\n\nTherefore, we created an open source extraction tool, **SatExtractor**, to perform worldwide dataset extractions using serverless providers such as **Google Cloud Platform** or **AWS**, based on a common existing standard: **STAC**.\n\nThe tool scales horizontally as needed, extracting revisits and storing them in **zarr** format to be easily used by deep learning models.\n\nIt is fully configurable using [Hydra](https://hydra.cc/).\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\n\u003c!-- GETTING STARTED --\u003e\n## Getting Started\n\n**SatExtractor** needs a cloud provider to work. Before you start using it, you'll need to create and configure a cloud provider account.\n\nWe provide the implementation to work with [Google Cloud](https://cloud.google.com/), but **SatExtractor** is implemented to be easily extensible to other providers.\n\n### Structure\n\nThe package is structured in a modular and configurable way. It is basically a pipeline containing 6 important steps (separated into modules).\n\n- **Builder**: contains the logic to build the container that will run the extraction. 
\u003cdetails\u003e\n  \u003csummary\u003emore info\u003c/summary\u003e\n  SatExtractor is based on a docker container. The Dockerfile in the root dir is used to build the core package, and a reference to the specific provider extraction logic must be explicitly added to it (see the gcp example in directory providers/gcp).\n\n  This is done by setting the \u003ccode\u003e ENV PROVIDER \u003c/code\u003e  var to point to the provider directory. In the default Dockerfile it is set to gcp: \u003ccode\u003e ENV PROVIDER providers/gcp \u003c/code\u003e.\n\u003c/details\u003e\n\n- **Stac**: converts a public constellation to the **STAC standard**.  \u003cdetails\u003e\n  \u003csummary\u003emore info\u003c/summary\u003e\n  If the original constellation is not already in the STAC standard, it should be converted. To do so, you have to implement the constellation-specific STAC converter. Sentinel 2 and Landsat 7/8 examples can be found in \u003ccode\u003e src/satextractor/stac \u003c/code\u003e. The function that is actually called to perform the conversion to the STAC standard is set in the stac Hydra config file ( \u003ccode\u003e conf/stac/gcp.yaml \u003c/code\u003e).\n\u003c/details\u003e\n\n- **Tiler**: Creates tiles (patches) of the given region to perform the extraction. \u003cdetails\u003e\n  \u003csummary\u003emore info\u003c/summary\u003e\n  The Tiler splits the region into tiles using \u003ca href=https://sentinelhub-py.readthedocs.io/en/latest/examples/large_area_utilities.html\u003e SentinelHub splitter \u003c/a\u003e. For example, if a tile size of 10000m is set, the patches in your storage will be of size 10000m. \n  The tiler config can be found in \u003ccode\u003e conf/tiler/utm.yaml \u003c/code\u003e. There, the size of the tiles can be specified. \n\u003c/details\u003e\n\n- **Scheduler**: Decides how those tiles are going to be scheduled by creating extraction tasks. 
\u003cdetails\u003e\n  \u003csummary\u003emore info\u003c/summary\u003e\n  The Scheduler takes the resulting tiles from the Tiler and groups them into bigger areas to be extracted.\n\n  For example, if the Tiler split the region into 1000x1000m tiles, the scheduler can be set to group them into UTM splits of, say, 100000x100000m (100km). The scheduler also calculates the intersection between the patches and the constellation STAC assets. At the end, you'll have an object called \u003ccode\u003e ExtractionTask \u003c/code\u003e with the information to extract one revisit, one band and multiple patches. This \u003ccode\u003e ExtractionTask \u003c/code\u003e will be sent to the cloud provider to perform the actual extraction.\n\n  The scheduler config can be found in \u003ccode\u003e conf/scheduler/utm.yaml \u003c/code\u003e.\n\u003c/details\u003e\n\n- **Preparer**: Prepares the files in the cloud storage. \u003cdetails\u003e\n  \u003csummary\u003emore info\u003c/summary\u003e\n  The Preparer creates the cloud file structure. It creates the zarr groups and arrays needed to later store the extracted patches.\n\n  The gcp preparer config can be found in \u003ccode\u003e conf/preparer/gcp.yaml \u003c/code\u003e.\n\u003c/details\u003e\n\n- **Deployer**: Deploys the extraction tasks created by the scheduler to perform the extraction. \u003cdetails\u003e\n  \u003csummary\u003emore info\u003c/summary\u003e\n  The Deployer sends one message per ExtractionTask to the cloud provider to perform the actual extraction. It works by publishing messages to a PubSub queue to which the extraction is subscribed. 
When a new message (ExtractionTask) arrives, it is run automatically on the cloud with autoscaling.\n  The gcp deployer config can be found in \u003ccode\u003e conf/deployer/gcp.yaml \u003c/code\u003e.\n\u003c/details\u003e\n\n\nAll the steps are **optional** and the user decides which ones to run in the **main config file**.\n\n\n### Prerequisites\n\nTo run **SatExtractor** we recommend using a virtual env. A cloud provider user should also have been created already.\n\n### Installation\n\n\n1. Clone the repo\n   ```sh\n   git clone https://github.com/FrontierDevelopmentLab/sat-extractor\n   cd sat-extractor\n   ```\n2. Install Python packages\n   ```sh\n   pip install .\n   ```\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\n\n\u003c!-- USAGE EXAMPLES --\u003e\n## Usage\n\u0026#x1F534;\u0026#x1F534;\u0026#x1F534;\n```diff\n- WARNING!!!!:\nRunning SatExtractor will use your billable cloud provider services.\nWe strongly recommend testing it with a small region to get acquainted\nwith the process and have a first sense of your cloud provider costs\nfor the datasets you want to generate. Be sure you are running all your\ncloud provider services in the same region to avoid extra costs.\n```\n\u0026#x1F534;\u0026#x1F534;\u0026#x1F534;\n\nOnce a cloud provider user is set up and the package is installed, you'll need to grab the GeoJSON region you want (you can get it from the super-cool tool [geojson.io](http://geojson.io/)) and change the config files.\n\n\n1. Choose a region name (e.g. `cordoba` below) and create an output directory for it:\n```\nmkdir -p output/cordoba\n```\n2. Save the region GeoJSON as `aoi.geojson` and store it in the folder you just created.\n3. 
Open the `config.yaml` and you'll see something like this:\n\n```yaml\ndataset_name: cordoba\noutput: ./output/${dataset_name}\n\nlog_path: ${output}/main.log\ncredentials: ${output}/token.json\ngpd_input: ${output}/aoi.geojson\nitem_collection: ${output}/item_collection.geojson\ntiles: ${output}/tiles.pkl\nextraction_tasks: ${output}/extraction_tasks.pkl\n\nstart_date: 2020-01-01\nend_date: 2020-02-01\n\nconstellations:\n  - sentinel-2\n  - landsat-5\n  - landsat-7\n  - landsat-8\n\ndefaults:\n  - stac: gcp\n  - tiler: utm\n  - scheduler: utm\n  - deployer: gcp\n  - builder: gcp\n  - cloud: gcp\n  - preparer: gcp\n  - _self_\ntasks:\n  - build\n  - stac\n  - tile\n  - schedule\n  - prepare\n  - deploy\n\nhydra:\n  run:\n    dir: .\n```\n\nThe important thing here is to set the `dataset_name` to  `\u003cyour_region_name\u003e`, define the `start_date` and `end_date` for your revisits, your `constellations` and the tasks to be run (you will want to run `build` only once and then comment it out).\n\n**Important**: the `token.json` contains the credentials needed to access your cloud provider. In this example it contains the gcp credentials. You can see instructions for getting it below in the [Authentication](#authentication) instructions.\n\n4. Open the `cloud/\u003cprovider\u003e.yaml` and add your account info there, as in the default provided file. The `storage_root` must point to an existing bucket/bucket directory. `user_id` is simply used for naming resources.\n   (optional): you can choose different configurations by changing the module configs: `builder`, `stac`, `tiler`, `scheduler`, `preparer`, etc. There you can change things like patch_size, chunk_size.\n\n5. 
Run `python src/satextractor/cli.py` and enjoy!\n\nSee the [open issues](https://github.com/FrontierDevelopmentLab/sat-extractor/issues) for a full list of proposed features (and known issues).\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\n## Authentication\n### Google Cloud\nTo get the `token.json` for Google Cloud, the recommended approach is to create a service account:\n1. Go to [Credentials](https://console.cloud.google.com/apis/credentials)\n2. Click `Create Credentials` and choose `Service account`\n3. Enter a name (e.g. `sat-extractor`) and click `Create and Continue`\n4. Under `Select a role`, choose `Basic` -\u003e `Editor` and then click `Done`\n5. Choose the account from the list and then go to the `Keys` tab\n6. Click `Add key` -\u003e `Create new key` -\u003e `JSON` and save the file that gets downloaded\n7. Rename it to `token.json` and you're done!\n\nFor building the `sat-extractor` service, you may also need to configure the credentials used by the cloud provider command-line devkit.\nPermissions at the project-owner level are recommended.\nIf using Google Cloud Platform, you can authorize the `gcloud` devkit to access Google Cloud Platform using your Google credentials by running the command `gcloud auth login`.\nYou may also need to run `gcloud config set project your-proj-name` for `sat-extractor` to work properly.\n\n\u003c!-- CONTRIBUTING --\u003e\n## Contributing\n\nContributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.\n\nIf you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag \"enhancement\".\nDon't forget to give the project a star! Thanks again!\n\n1. Fork the Project\n2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)\n3. 
Commit your Changes (`git commit -m 'Add some AmazingFeature'`)\n4. Push to the Branch (`git push origin feature/AmazingFeature`)\n5. Open a Pull Request\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\n\n\u003c!-- LICENSE --\u003e\n## License\n\nDistributed under the BSD 2-Clause License. See `LICENSE` for more information.\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n\n## Citation \n\nIf you use this repo, please cite:\n\n```\n@software{dorr_francisco_2021_5609657,\n  author       = {Dorr, Francisco and\n                  Kruitwagen, Lucas and\n                  Ramos, Raúl and\n                  García, Dolores and\n                  Gottfriedsen, Julia and\n                  Kalaitzis, Freddie},\n  title        = {SatExtractor},\n  month        = oct,\n  year         = 2021,\n  publisher    = {Zenodo},\n  version      = {v0.1.0},\n  doi          = {10.5281/zenodo.5609657},\n  url          = {https://doi.org/10.5281/zenodo.5609657}\n}\n```\n\n\u003cp align=\"right\"\u003e(\u003ca href=\"#top\"\u003eback to top\u003c/a\u003e)\u003c/p\u003e\n\n## Acknowledgments\n\n\u003cdiv align=\"center\"\u003e\n   \u003ca href=\"https://fdleurope.org/\"\u003e\n    \u003cimg src=\"images/fdleuropeESA.png\" alt=\"fdleurope\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n\nThis work is the result of the 2021 ESA Frontier Development Lab World Food Embeddings team. We are grateful to all organisers, mentors and sponsors for providing us with this opportunity. 
We thank Google Cloud for providing computing and storage resources to complete this work.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFrontierDevelopmentLab%2Fsat-extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FFrontierDevelopmentLab%2Fsat-extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFrontierDevelopmentLab%2Fsat-extractor/lists"}