{"id":15903664,"url":"https://github.com/vfdev-5/unosat_challenge","last_synced_at":"2025-04-02T20:16:47.810Z","repository":{"id":149001873,"uuid":"220839225","full_name":"vfdev-5/UNOSAT_Challenge","owner":"vfdev-5","description":"UNOSAT Challenge","archived":false,"fork":false,"pushed_at":"2019-12-15T22:55:38.000Z","size":123081,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-08T10:44:21.763Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://challenge.phi-unet.com/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vfdev-5.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-10T19:25:39.000Z","updated_at":"2020-02-14T09:53:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"8641f051-99ae-4e53-9d88-01aae2b85f02","html_url":"https://github.com/vfdev-5/UNOSAT_Challenge","commit_stats":{"total_commits":25,"total_committers":1,"mean_commits":25.0,"dds":0.0,"last_synced_commit":"37e12db083b794e1f135a6b7f7ab3df5a7bbff60"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vfdev-5%2FUNOSAT_Challenge","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vfdev-5%2FUNOSAT_Challenge/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vfdev-5%2FUNOSAT_Challenge/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vfdev-5%2FUNOS
AT_Challenge/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vfdev-5","download_url":"https://codeload.github.com/vfdev-5/UNOSAT_Challenge/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246884773,"owners_count":20849554,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-06T12:03:38.543Z","updated_at":"2025-04-02T20:16:47.789Z","avatar_url":"https://github.com/vfdev-5.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [UNOSAT Challenge](https://challenge.phi-unet.com/)\n\nHumanitarian AI4EO Challenge For UNOSAT, ESA and CERN openlab to detect building footprints in Iraq and support the local government to plan reconstruction and development activities in the area.\n\n## Results for Phase 1\n\nWe are training convolutional neural networks designed for the segmentation task:\n\nExperiment | Validation IoU(1) | Validation F1 | Test F1 | Notes\n---|---|---|---|---\n[baseline_lwrefinenet.py](configs/train/baseline_lwrefinenet.py)| 0.506 | 0.83 | 0.688175 | LWRefineNet with CrossEntropy, validation city \"38SNE\", c01b6ccf59474808b528f55ee13b497d\n[baseline_lwrefinenet_xentropy_jaccard.py](configs/train/baseline_lwrefinenet_xentropy_jaccard.py)| 0.524 | 0.838 | 0.705516 | LWRefineNet with CrossEntropy+2*Jaccard, validation city \"38SNE\", inference with TTA, df7c6ed4870a40c1b9dcf742a6c07f0a\n[baseline_resnet50-unet.py](configs/train/baseline_resnet50-unet.py)| 0.525 | 0.839 | 0.701196 | ResNet50+UNet with CrossEntropy, 
validation city \"38SNE\", inference with TTA, 2982bbc723e049f4839ea83d1900c2a2\n[baseline_lwrefinenet_on_5b_db.py](configs/train/baseline_lwrefinenet_on_5b_db.py)| 0.532 | 0.841 | - | LWRefineNet with CrossEntropy on 5 bands (dB), validation city \"38SNE\", inference with TTA, 48b27adb07794d6c901047307d304312\n[baseline_se_resnext50-FPN_on_db.py](configs/train/baseline_se_resnext50-FPN_on_db.py)| 0.535 | 0.843 | 0.742955 | SE-ResNet50+FPN with CrossEntropy on 3 bands (dB), validation city \"38SNE\", inference with TTA, 38c8cca75f8b46a798224a146cdf4426\n[baseline_se_resnext50-FPN_on_db_lr_restart.py](configs/train/baseline_se_resnext50-FPN_on_db_lr_restart.py)| 0.543 | 0.847 | 0.643489 | SE-ResNet50+FPN with CrossEntropy, other hyperparams, LR restarts, validation city \"38SNE\", inference with TTA, 3a0b1378668547f0967974fdb56bb710\n\n\n## Code architecture\n\n- code : project package providing data processing, training/validation/inference scripts and modules with dataflow, losses, models and utils.\n    - [code/scripts/training.py](code/scripts/training.py) training script (single node, 1/N GPUs).\n    - [code/scripts/training_uda.py](code/scripts/training_uda.py) (Work-In-Progress) training script with Unsupervised Data Augmentation method (single node, 1/N GPUs).\n    - [code/scripts/inference.py](code/scripts/inference.py) inference script to validate a model or make predictions on the test dataset (single node, 1/N GPUs).\n- configs : configuration files\n    - configuration is a python file, flexible and highly configurable without any meta-language\n- experiments : some bash scripts and [mlflow](https://mlflow.org/) related file to manage ML experiments in a reproducible manner.\n    - software dependencies are set up in [experiments/conda.yaml](experiments/conda.yaml)\n    - job commands are defined in [experiments/MLproject](experiments/MLproject)\n- notebooks : Jupyter notebooks for visual checks and development.\n\n\n### Requirements\n\n- Linux OS, 
Python 3.X, pip\n- git\n- linux libs for opencv-python\n- conda\n- [mlflow](https://mlflow.org/) : `pip install mlflow`\n\n### MLflow setup\n\nSetup mlflow output path as \n```bash\ncd UNOSAT_Challenge\nexport MLFLOW_TRACKING_URI=$PWD/output/mlruns\n```\n\nCreate once \"Trainings\" and \"Inferences\" experiments\n```bash\nmlflow experiments create -n Trainings\nmlflow experiments create -n Inferences\n```\nor check existing experiments:\n```bash\nmlflow experiments list\n```\n\n### Data setup and preparations\n\nCreate symbolic links to downloaded data and output folder\n```bash\ncd UNOSAT_Challenge\nln -s /path/to/data input\nln -s /path/to/output output\n```\n\nSetup mlflow tracking path\n```bash\nexport MLFLOW_TRACKING_URI=$PWD/output/mlruns\n```\n\nGenerate 3-bands files from VV / VH separate files\n```bash\nmlflow run experiments/ -e generate_3b_images -P input_path=../input/train -P output_path=../output/train/images_3b\nmlflow run experiments/ -e generate_3b_images -P input_path=../input/test -P output_path=../output/test/images_3b\n```\n\nRasterize shape files:\n```bash\nmlflow run experiments/ -e rasterize -P input_path=../input/train -P output_path=../output/train/masks\n```\n\nGenerate train tiles:\n```bash\nmlflow run experiments/ -e generate_tiles -P input_path=../output/train/images_3b -P output_path=../input/train_tiles/images\nmlflow run experiments/ -e generate_tiles -P input_path=../output/train/masks -P output_path=../input/train_tiles/masks\n```\n\nGenerate test tiles:\n```bash\nmlflow run experiments/ -e generate_test_tiles -P input_path=../output/test/images_3b -P output_path=../input/test_tiles/images\n```\n\nGenerate train tiles stats:\n```bash\nmlflow run experiments/ -e generate_tiles_stats -P input_path=../input/train_tiles/ -P output_path=../input/train_tiles/\n```\n\n\n### Training, validation and inference\n\nTraining on a single node with 1 or N GPUs:\n\n```bash\nexport MLFLOW_TRACKING_URI=$PWD/output/mlruns\nmlflow run experiments/ 
--experiment-name=Trainings -P script_path=code/scripts/training.py -P config_path=configs/train/XXX.py -P num_gpus=1\n```\n\nValidation (load model, make prediction, compute metrics) on validation data:\n```bash\nexport MLFLOW_TRACKING_URI=$PWD/output/mlruns\nmlflow run experiments/ --experiment-name=Inferences -P script_path=code/scripts/inference.py -P config_path=configs/inference/validate_XYZ.py\n```\n\nInference on test data:\n```bash\nexport MLFLOW_TRACKING_URI=$PWD/output/mlruns\nmlflow run experiments/ --experiment-name=Inferences -P script_path=code/scripts/inference.py -P config_path=configs/inference/test_XYZ.py\n```\n\nEnsembling multiple predictions \n```bash\nexport MLFLOW_TRACKING_URI=$PWD/output/mlruns\nmlflow run experiments/ --experiment-name=Inferences -e ensemble -P input_paths=\"$PWD/output/mlruns/2/48b27adb07794d6c901047307d304312/artifacts/raw/;$PWD/output/mlruns/2/38c8cca75f8b46a798224a146cdf4426/artifacts/raw/;$PWD/output/mlruns/2/3a0b1378668547f0967974fdb56bb710/artifacts/raw\"\n```\n\nRun validation on predictions (validation tiles):\n```bash\nmlflow run experiments/ --experiment-name=Inferences -e validate -P preds_path=$PWD/output/mlruns/2/XYZ/artifacts/raw/ -P gt_path=$PWD/input/train_tiles/masks\n```\n\n### Transform predictions to submission format\n\n```bash\nexport MLFLOW_TRACKING_URI=$PWD/output/mlruns\nmlflow run experiments/ -e to_submission -P input_path=output/mlruns/2/XYZ/artifacts/raw\n```\n\nShapefiles to submit are produced in the root of `input_path` folder.\n\n#### or manually every step\n\n1) Merge tiles into a single mask image:\n```bash\nexport MLFLOW_TRACKING_URI=$PWD/output/mlruns\nmlflow run experiments/ -e merge_tiles -P input_path=output/mlruns/2/XYZ/artifacts/raw\n```\n\n2) Aggregate predictions by city\n```bash\nmlflow run experiments/ -e all_agg_by_city -P input_path=output/mlruns/2/XYZ/artifacts/raw/\n```\n\n3) Vectorize predictions\n```bash\nmlflow run experiments/ -e polygonize -P 
input_path=output/mlruns/2/XYZ/artifacts/raw/\n```\n\n### MLflow dashboard\n\nTo visualize experiments and runs, user can start mlflow dashboard:\n\n```bash\nmlflow server --backend-store-uri $PWD/output/mlruns --default-artifact-root $PWD/output/mlruns -p 6026 -h 0.0.0.0\n```\n\n### Remove deleted MLflow runs \n\n```bash\nfor i in `mlflow runs list --experiment-id=1 -v deleted_only | awk '{ print $4 }' | awk '/[0-9]+/'`; do rm -R output/mlruns/1/$i; done\n```\n\n### TODO/Ideas\n\n* [x] EDA\n    - VV / VH float images\n    - ortho-rectified =\u003e zero-pixel zones\n    - what are exactly the targets ?\n    - quality of targets ?\n    - same shape for multiple images of the same region -\u003e still correct targets ?\n    - shape rasterization produces mask for zero ortho-rectified part of image\n    - on test data, we can reduce variance using multiple images per city\n\n* [x] Dataflow\n    * [x] Merge VV / VH -\u003e (VH, VV, (VH + VV) * 0.5)\n    * [x] Rasterize shape masks\n    * [x] Generate tiles from the data\n    * [x] Generate tiles stats and fold indices\n    * [x] Create pytorch datasets and dataloaders \n\n* [x] Baselines\n    * [x] Train Light-Weight RefineNet model\n    * [x] Train ResNet-101 DeeplabV3 model\n    * [x] Train SE-ResNet-50-FPN model \n    * [x] Sampling based on target    \n\n* [ ] Ideas to accelerate/improve training    \n    * [ ] Implement data echoing/minibatch persistence to accelerate trainings ?\n    * [ ] Label smoothing ?\n    * [ ] Model with OctConv ?\n    * [x] Train a pre-segmentation classifier ? \n        =\u003e No need to do this. 
\n        =\u003e LB F1 score is computed on complete image which should have ground truth pixels\n    * [x] Use CrossEntropy + Jaccard loss ?\n    * [ ] Try to implement and train GSCNN model ?\n    * [ ] Try Unsup Data Augmentation ?\n    * [ ] Try Pseudo-Labelling of test data ?\n    * [x] Try to train on all train dataset and validate on test dataset\n    * [ ] Try multiple-input architectures\n    * [x] Validation with TTA ?\n        =\u003e depending on model, in some cases can improve predictions\n    * [x] Data normalization over training mean/std\n        * [x] Validate with LWRefineNet model on 3b =\u003e same as original 3b\n    * [x] Try different input data:\n        * [x] Generate on fly 3 input channels transformed by log(x^2)\n            * [x] Validate the change with LWRefineNet model =\u003e worse than original 3 channels\n        * [x] Generate on fly 5 input channels: (VH, VV, (VH + VV) * 0.5, VH - VV, sqrt(VH^2 + VV^2))\n            * [x] Validate the change with LWRefineNet model =\u003e same as original 3 channels\n        * [x] Sample-wise min/max normalization with `x^0.2 - 0.5`\n            * [x] Validate the change with LWRefineNet model =\u003e same or worse than original 3 channels\n    * [x] More features and channel's normalization\n        * [x] VV * VH, VV / VH =\u003e small improvements for LWRefineNet, did not work for SE-ResNet50-FPN\n    \n\n* [ ] Inferences\n    * [ ] Handler to save images with predictions: `[img, img+preds, preds, img+gt, gt]` with a metric value, e.g. 
`IoU(1)`    \n    * [x] Handler to save predictions as images with geo info(tile level)\n    * [x] Script to aggregate predictions as images for the same city\n        * [x] Check if images has the same geo extension    \n    * [x] Script to vectorize masks into shapefiles (city level)\n    * [x] Add TTA using ttach package\n    \n* [x] Check submission score on which part of data =\u003e F1 score is computed on whole test dataset\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvfdev-5%2Funosat_challenge","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvfdev-5%2Funosat_challenge","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvfdev-5%2Funosat_challenge/lists"}