{"id":13735536,"url":"https://github.com/lifebit-ai/ecw-converter","last_synced_at":"2025-05-08T11:33:35.870Z","repository":{"id":105054657,"uuid":"179525794","full_name":"lifebit-ai/ecw-converter","owner":"lifebit-ai","description":"Dockerised python scripts \u0026 Nextflow pipeline for converting ecw files to either geotiffs or Cloud Optimised Geotiffs (COGs)","archived":false,"fork":false,"pushed_at":"2019-05-09T12:31:26.000Z","size":801,"stargazers_count":6,"open_issues_count":1,"forks_count":5,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-08-04T03:05:12.056Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lifebit-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-04-04T15:28:49.000Z","updated_at":"2023-02-22T20:43:29.000Z","dependencies_parsed_at":null,"dependency_job_id":"f481399e-57da-4681-941c-8b48d90a0560","html_url":"https://github.com/lifebit-ai/ecw-converter","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifebit-ai%2Fecw-converter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifebit-ai%2Fecw-converter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifebit-ai%2Fecw-converter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lifebit-ai%2Fecw-converter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lifebit-ai"
,"download_url":"https://codeload.github.com/lifebit-ai/ecw-converter/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224727181,"owners_count":17359532,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T03:01:07.913Z","updated_at":"2024-11-15T03:32:10.150Z","avatar_url":"https://github.com/lifebit-ai.png","language":"Python","funding_links":[],"categories":["`Python` processing of optical imagery (non deep learning)"],"sub_categories":["Cloud Native Geospatial"],"readme":"# ecw-converter\n\nDockerised python scripts \u0026 Nextflow pipeline for converting ecw files to either Geotiffs or Cloud Optimised Geotiffs (COGs).\n\n- [Motivation](https://github.com/lifebit-ai/ecw-converter#motivation)\n- [Quick Run](https://github.com/lifebit-ai/ecw-converter#quick-run)\n- [Testdata](https://github.com/lifebit-ai/ecw-converter#testdata)\n- [Docker](https://github.com/lifebit-ai/ecw-converter#docker)\n    - [Rebuilding the docker image](https://github.com/lifebit-ai/ecw-converter#rebuilding-the-docker-image) \n    - [Running on the command line with Docker](https://github.com/lifebit-ai/ecw-converter#running-on-the-command-line-with-docker)\n- [Deploit](https://github.com/lifebit-ai/ecw-converter#deploit)\n    - [Running docker on Deploit](https://github.com/lifebit-ai/ecw-converter#running-docker-on-deploit) \n        - [Import the docker image from DockerHub](https://github.com/lifebit-ai/ecw-converter#import-the-docker-image-from-dockerhub)\n        - [Running a 
job](https://github.com/lifebit-ai/ecw-converter#running-a-job)\n        - [Setting resources](https://github.com/lifebit-ai/ecw-converter#setting-resources)\n- [Nextflow](https://github.com/lifebit-ai/ecw-converter#nextflow)\n    - [Running on the command line](https://github.com/lifebit-ai/ecw-converter#running-on-the-command-line-with-nextflow)\n    - [Running Nextflow on Deploit](https://github.com/lifebit-ai/ecw-converter#running-nextflow-on-deploit)\n        - [Import the Nextflow pipeline from GitHub](https://github.com/lifebit-ai/ecw-converter#import-the-nextflow-pipeline-from-githib)\n        - [Running a Nextflow job](https://github.com/lifebit-ai/ecw-converter#running-a-nextflow-job)\n        - [Setting resources](https://github.com/lifebit-ai/ecw-converter#setting-resources-1)\n- [Cost estimate](https://github.com/lifebit-ai/ecw-converter#cost-estimate)\n- [Outputs](https://github.com/lifebit-ai/ecw-converter#outputs)\n\n## Motivation\n\nThe scripts have been used for converting a stream of ECW images from the [Denmark aerial imagery source site](https://download.kortforsyningen.dk/content/geodanmark-ortofoto-blokinddelt) into COGs (a compute-intensive process).\n\nConverting to full COGs is far better than creating regular Geotiffs. The key benefit of a COG is that it is possible to get only a section of the image if required, rather than downloading the entire file. 
When working with large files and analysing or viewing a specific section of the image, this becomes incredibly beneficial.\n(There are also further differences.)\n\n\n## Quick run\nThe tool(s) can be run on:\n* [command line with Docker](#running-on-the-command-line-with-docker)\n* [command line with Nextflow](#running-on-the-command-line-with-nextflow)\n* [Deploit with Docker](#running-docker-on-deploit)\n* [Deploit with Nextflow](#running-nextflow-on-deploit) (recommended)\n\nWhen analysing large amounts of data, Nextflow is recommended over Docker alone for increased parallelisation. \n\n## Testdata\nThe bucket containing the images (300 zips of .ecw files) can be found at: [s3://lifebit-public](https://s3.console.aws.amazon.com/s3/buckets/lifebit-public/?region=eu-west-1\u0026tab=overview#)\n\n![aws_data](https://raw.githubusercontent.com/lifebit-ai/ecw-converter/master/images/aws_data.png)\n\n\n## Docker\nThe docker image is [lifebitai/ecw_converter:latest](https://hub.docker.com/r/lifebitai/ecw_converter)\n   \nThe docker image contains scripts that were originally downloaded from [joe.peskett/ecw_converter](https://gitlab.officialstatistics.org/joe.peskett/ecw_converter.git) \u0026 then modified. \n\nThe modifications included:\n- changing the regex for input ECW files\n- removing the pushing to an S3 bucket as this is handled by Deploit\n- adding python shebang lines\n\nDependencies for the scripts such as GDAL with .ecw drivers \u0026 Python are also installed in the image.\n\nThe docker image includes the following scripts:\n- [`ecw_to_cog.sh`](ecw_converter/ecw_to_cog.sh) bash wrapper script to unzip the input files and then run the scripts below\n- [`ecw_convert_2_cog.py`](ecw_converter/ecw_convert_2_cog.py) script for converting .ecw files to both COGs and Geotiffs. There are two gdal_translate processes. 
Without the second process, you will NOT create a valid COG\n- [`validate_cog.py`](ecw_converter/validate_cog.py) validates whether a file is a valid, fully compliant COG\n\n### (Re)building the Docker image\n\nIf you wish to make any modifications to the docker image you can do so with the steps below:\n```bash\ngit clone https://github.com/lifebit-ai/ecw-converter.git \u0026\u0026 cd ecw-converter\ndocker build -t \u003cDockerHubUsername\u003e/ecw_converter:\u003ctag\u003e .\n# you can then use `docker login` \u0026 `docker push \u003cDockerHubUsername\u003e/ecw_converter:\u003ctag\u003e` to push to DockerHub\n```\n\nOnce the docker image has been built \u0026 pushed to the DockerHub registry (which has already been done under the lifebitai DockerHub account), any user can run it either on the command line or on Deploit (see more details below).\n\n### Running on the command line with Docker\n\nIf you have docker installed and zipped ECW files in your current directory, the tool can be run with the following command:\n```bash\n# you can download a zipped ecw file with `wget https://s3-eu-west-1.amazonaws.com/lifebit-public/10km_2017_612_62_ECW_UTM32-ETRS89.zip`\ndocker run -v $PWD:$PWD -w $PWD lifebitai/ecw_converter ecw_to_cog.sh\n```\n\n## Deploit\n\nDeploit is a bioinformatics platform, developed by Lifebit, where you can run your analysis over the Cloud/AWS.\n\nYou can create an account/log in [here](https://deploit.lifebit.ai/login)\n\n![deploit](https://raw.githubusercontent.com/lifebit-ai/ecw-converter/master/images/deploit.png)\n\n### Running Docker on Deploit\n\n#### Import the Docker image from DockerHub:\n\nNavigate to the pipelines page, click new to import a new pipeline. 
Then select Docker \u0026 paste the URL from DockerHub, e.g. https://hub.docker.com/r/lifebitai/ecw_converter\n\n![import](https://raw.githubusercontent.com/lifebit-ai/ecw-converter/master/images/import_docker.png)\n\n\n#### Running a job\n\nYou can then click the pipeline under the \"My pipelines\" section and select data/input parameters:\n\n![run_job](https://raw.githubusercontent.com/lifebit-ai/ecw-converter/master/images/run_job.png)\n\nNo input parameters are required. Currently, the input zipped ecw files are taken from the working directory. All of the files in the working directory will then be unzipped and the ecw files converted.\n\n#### Setting resources\n\nSelect a project \u0026 instance:\n\n![instance](https://raw.githubusercontent.com/lifebit-ai/ecw-converter/master/images/instance.png)\n\n## Nextflow\n\nNextflow is a programming language used to build data pipelines that has been widely adopted by the bioinformatics community. It was used here because of its built-in support for Docker containers and for parallelisation, which allows the conversion of each of the ECW files to take place simultaneously.\n\n### Running on the command line with Nextflow\n\nIf you have Nextflow \u0026 Docker installed and zipped ECW files in one of your directories, the pipeline can be run with the following command:\n```bash\n# you can download a zipped ecw file with `wget https://s3-eu-west-1.amazonaws.com/lifebit-public/10km_2017_612_62_ECW_UTM32-ETRS89.zip`\nnextflow run main.nf --input_folder \u003cyour_folder\u003e\n```\n\n### Running Nextflow on Deploit\n\n#### Import the Nextflow pipeline from GitHub:\n\nNavigate to the pipelines page, click new to import a new pipeline. 
Then select Nextflow \u0026 paste the URL from GitHub, e.g. https://github.com/lifebit-ai/ecw-converter\n\n![import_nextflow](https://raw.githubusercontent.com/lifebit-ai/ecw-converter/master/images/import_nextflow.png)\n\n\n#### Running a Nextflow job\n\nYou can then click the pipeline under the \"My pipelines\" section and select data/input parameters:\n\n`--input_folder` is a required parameter. The folder must contain all of the zipped ecw files to be unzipped and converted. The data can be set by clicking the blue database button and selecting your data either from an S3 bucket or by uploading the data. \n\n\n![run_nextflow_job](https://raw.githubusercontent.com/lifebit-ai/ecw-converter/master/images/run_nextflow_job.png)\n\n#### Setting resources\n\nSelect a project \u0026 instance:\n\n![instance_nextflow](https://raw.githubusercontent.com/lifebit-ai/ecw-converter/master/images/instance_nextflow.png)\n\n## Cost estimate\nResources used for four zipped ecw files, 4.35GB in total (see the job [here](https://deploit.lifebit.ai/public/jobs/5ca8cf0fe4365600b2b15a2e))\n* Resources: an m2.2xlarge (spot) instance was used. (This has 4 CPUs \u0026 34.2 GB memory)\n* Run time: 2h 46m\n* Cost: $0.43\n\nAs the bucket contains 2056GB, the cost to convert all of the files (assuming the cost scales linearly) may be around $200 (0.426 / 4.35 x 2056).\n\nAs the file conversion can be run in parallel for each of the files (by using the Nextflow pipeline), the total time taken should be roughly equal to the time taken to convert the largest file. 
\n\n![job_monitor](https://raw.githubusercontent.com/lifebit-ai/ecw-converter/master/images/job_monitor.png)\n\n## Outputs\n\nRunning the `ecw_to_cog.sh` script generates the following folders/files:\n* `zip` the input .zip files are moved to this directory\n* `tif` directory to store the generated .tif files\n* `logs`\n    * `validate_cog.log` stdout from `validate_cog.py`\n    * `unzip.log` stdout from unzipping of the files\n    * `ecw_convert_2_cog.log` stdout from `ecw_convert_2_cog.py`\n* `img`\n    * `compliant-cog` directory to contain COG files\n* `ecw` directory to store the .ecw files once unzipped\n\nWhen running the Nextflow pipeline, only the `tif`, `img` \u0026 `logs` directories are output, to save storage space.\n\nWhen run on Deploit, results are written to the user's S3 bucket generated by Deploit. They will be located in `s3://lifebit-user-data-\u003cuser_id\u003e/results/job-\u003cjob_id\u003e/results/` as shown by Deploit.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flifebit-ai%2Fecw-converter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flifebit-ai%2Fecw-converter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flifebit-ai%2Fecw-converter/lists"}