{"id":18621511,"url":"https://github.com/ethz-asl/autolabel","last_synced_at":"2025-04-11T03:30:43.957Z","repository":{"id":59983805,"uuid":"539578573","full_name":"ethz-asl/autolabel","owner":"ethz-asl","description":"A project for computing high-quality ground truth training examples for RGB-D data. ","archived":false,"fork":false,"pushed_at":"2023-06-10T12:58:14.000Z","size":288,"stargazers_count":44,"open_issues_count":0,"forks_count":3,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-03-25T08:38:09.904Z","etag":null,"topics":["computer-vision","labeling-tool","machine-learning","nerf","rgb-d","robotics"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ethz-asl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-09-21T16:16:29.000Z","updated_at":"2025-03-24T23:18:44.000Z","dependencies_parsed_at":"2023-01-19T11:32:27.771Z","dependency_job_id":null,"html_url":"https://github.com/ethz-asl/autolabel","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethz-asl%2Fautolabel","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethz-asl%2Fautolabel/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethz-asl%2Fautolabel/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ethz-asl%2Fautolabel/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ethz-asl","download_url":"https://codeload.github.com/ethz-asl/autolabel/tar.gz/
refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248335348,"owners_count":21086577,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","labeling-tool","machine-learning","nerf","rgb-d","robotics"],"created_at":"2024-11-07T04:12:14.577Z","updated_at":"2025-04-11T03:30:43.553Z","avatar_url":"https://github.com/ethz-asl.png","language":"Python","funding_links":[],"categories":["Paper List"],"sub_categories":["Follow-up Papers"],"readme":"# Autolabel\n\nThe goal of this project is to facilitate research in autolabeling, scene understanding, and neural implicit feature fields.\n\nhttps://user-images.githubusercontent.com/1204635/191912816-0de3791c-d29b-458a-aead-ba020a0cc871.mp4\n\n## Getting started\n\n### Installing\n\nThe installation instructions were tested with Python 3.8 and 3.9. We recommend installing some of the dependencies through Anaconda, and these instructions assume that you are using an Anaconda environment.\n\nThe software uses CUDA, and compiling `tiny-cuda-nn` requires `nvcc`. 
If you don't have CUDA \u003e= 11.3 (including `nvcc`) installed on your system, you can install it into your Anaconda environment with:\n```\nconda install -c conda-forge cudatoolkit-dev=11.4\n```\n\nTo install PyTorch and ffmpeg, run:\n```\nconda install pytorch torchvision cudatoolkit=11.3 -c pytorch\nconda install ffmpeg\n```\n\nThen, from the root of the autolabel repository, install the remaining dependencies and autolabel itself into your Python environment:\n```\npip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch\ngit clone --recursive git@github.com:cvg/Hierarchical-Localization.git\npushd Hierarchical-Localization/\npython -m pip install -e .\npopd\n\ngit submodule update --init --recursive\npushd torch_ngp\ngit submodule update --init --recursive\npip install -e .\nbash scripts/install_ext.sh\npopd\n\n# To use LSeg features for vision-language feature fields\ngit clone https://github.com/kekeblom/lang-seg\npushd lang-seg\npip install -e .\npopd\n\n# Finally install autolabel\npip install -e .\n```\n\n### Autolabeling Usage\n\nAfter installing the project, follow these steps to run autolabel on an example scene:\n\n```\n# Download example scene\nwget http://robotics.ethz.ch/~asl-datasets/2022_autolabel/bench.tar.gz\n# Uncompress\ntar -xvf bench.tar.gz\n\n# Compute camera poses, scene bounds and undistort images using raw input images\npython scripts/mapping.py bench\n\n# Compute DINO features from color images.\npython scripts/compute_feature_maps.py bench --features dino --autoencode\n# Pretrain neural representation on color, depth and extracted features\npython scripts/train.py bench --features dino\n\n# Open the scene in the graphical user interface for annotation\npython scripts/gui.py bench --features dino\n```\n\nOnce you have annotated a scene, you can train further on the annotations, export the labels, and render a video of the annotations:\n```\n# Train further on the given annotations\npython scripts/train.py bench --features dino\n\n# Export labels 
for learning on a downstream task.\n# The --objects flag is optional and specifies how many objects of each class are in the scene.\n# It is used to remove noise from the produced segmentation maps.\n# Labels are saved to bench/output/semantic.\npython scripts/export.py bench --objects 1\n\n# Render a video of annotations and features\npython scripts/render.py bench --model-dir bench/nerf/g15_hg+freq_dino_rgb1.0_d0.1_s1.0_f0.5_do0.1/ --out bench.mp4\n```\n\n### Vision-language feature fields\n\nhttps://github.com/ethz-asl/autolabel/assets/1204635/3ab55149-c907-45e0-8da3-ca9fba090644\n\nThe repository contains an implementation of vision-language feature fields. See [`docs/vision-language.md`](docs/vision-language.md) for instructions on running the vision-language examples and the ROS node.\n\n### GUI Keybindings\n\nThe GUI can be controlled with the following keybindings:\n\n| Key          | Action                        |\n| ------------ | ----------------------------- |\n| `0`          | select background paint brush |\n| `1`          | select foreground paint brush |\n| `esc` or `Q` | shut down application         |\n| `ctrl+S`     | save model                    |\n| `C`          | clear image                   |\n\n## Scene directory structure\n\nThe scene directory structure is as follows:\n```\nraw_rgb/        # Raw, distorted color frames.\nrgb/            # Undistorted color frames, as PNG or JPG.\n  00000.jpg\n  00001.jpg\n  ...\nraw_depth/      # Raw, distorted depth frames.\n  00000.png     # 16-bit grayscale PNG images with values in millimeters.\n  00001.png     # Depth frames may be smaller than the RGB frames.\n  ...\ndepth/          # Undistorted depth frames matching a perfect pinhole camera model.\n  00000.png\n  00001.png\n  ...\npose/\n  00000.txt       # 4 x 4 world to camera transform.\n  00001.txt\n  ...\nsemantic/         # Ground truth semantic annotations provided by the user.\n  00010.png       # These might not 
exist.\n  00150.png\ngt_masks/         # Optional\n  00010.json      # Dense ground truth masks used for evaluation.\n  00150.json      # Used, e.g., by scripts/evaluate.py.\nintrinsics.txt    # 4 x 4 camera matrix.\nbbox.txt          # 6 values denoting the bounds of the scene (min_x, min_y, min_z, max_x, max_y, max_z).\nnerf/             # Contains NeRF checkpoints and training metadata.\n```\n\n## Computing camera poses\n\nThe script [`scripts/mapping.py`](scripts/mapping.py) defines a mapping pipeline that computes camera poses for your scene. The required inputs are:\n- `raw_rgb/` images\n- `raw_depth/` frames\n- `intrinsics.txt` camera intrinsic parameters\n\nThe computed outputs are:\n- `rgb/` undistorted camera images\n- `depth/` undistorted depth images\n- `pose/` camera poses for each frame\n- `intrinsics.txt` inferred camera intrinsic parameters\n- `bbox.txt` scene bounds\n\n## Datasets\n\nData can be imported from various sources, including:\n- The [Stray Scanner app](https://apps.apple.com/us/app/stray-scanner/id1557051662)\n- [Semantic-NeRF Replica renders](https://github.com/Harry-Zhi/semantic_nerf/)\n- [ARKitScenes](https://github.com/apple/ARKitScenes)\n- [ScanNet](https://github.com/ScanNet/ScanNet)\n\nSee the [data documentation](docs/data.md) for instructions on importing from each source.\n\n## Debugging\n\n### Running scenes in `instant-ngp`\n\nFor debugging, visualization, and comparing results, the project includes a script that converts scenes for use with [`instant-ngp`](https://github.com/NVlabs/instant-ngp).\n\nAssuming you have `instant-ngp` installed, you can:\n1. Convert the dataset generated by `autolabel` into a format readable by `instant-ngp` using the script [`scripts/convert_to_instant_ngp.py`](./scripts/convert_to_instant_ngp.py). Example usage:\n    ```bash\n    python scripts/convert_to_instant_ngp.py --dataset_folder \u003cscene\u003e\n    ```\n2. 
Run `instant-ngp` on the converted dataset:\n    ```bash\n    cd \u003cpath/to/instant_ngp/installation\u003e\n    ./build/testbed --scene \u003cscene\u003e/transforms.json\n    ```\n\n## Pretraining on a scene\n\nTo fit the representation to the scene without the user interface, run `scripts/train.py`. Checkpoints and metadata are stored in the scene folder under the `nerf` directory.\n\nTo use pretrained features as additional training supervision, compute the features, pretrain on them, and then open the scene in the GUI by running:\n```\npython scripts/compute_feature_maps.py --features dino --autoencode \u003cscene\u003e\npython scripts/train.py --features dino \u003cscene\u003e\npython scripts/gui.py --features dino \u003cscene\u003e\n```\n\nThe models are saved in the scene folder under the `nerf` directory, organized according to the given parameters. The GUI loads the model that matches the given parameters; if no matching model is found, it randomly initializes the network.\n\n## Evaluating against ground truth frames\n\nWe use [labelme](https://github.com/wkentaro/labelme) to annotate ground truth frames. Follow its installation instructions, for instance in a `conda` environment, and make sure that your Python version is `\u003c3.10` to avoid type errors (see [here](https://github.com/wkentaro/labelme/issues/1020#issuecomment-1139749978)). To annotate the frames in the `rgb` folder, run the following inside a scene directory:\n```\nlabelme rgb --nodata --autosave --output gt_masks\n```\nThe corresponding annotations are saved to the `gt_masks` folder. You don't need to annotate every frame; you can sample just a few.\n\nTo compute the intersection-over-union (IoU) agreement with the manually annotated frames, run:\n```\npython scripts/evaluate.py \u003cscene1\u003e \u003cscene2\u003e # ...\n```\n\n## Code formatting\n\nThis repository enforces code formatting rules using [`yapf`](https://github.com/google/yapf). 
After installing, you can format the code before committing by running:\n```\nyapf --recursive autolabel scripts -i\n```\n\n### Vim\n\nTo run formatting automatically in Vim on save, follow these steps.\n\nFirst, install `google/yapf` as a Vim plugin. If you use Vundle, add `Plugin 'google/yapf'` to your `.vimrc` and run `:PluginInstall`.\n\nCopy the file `.yapf.vim` to `$HOME/.vim/autoload/yapf.vim`, creating the autoload directory if it doesn't exist.\n\nTo run yapf on save for Python files, add `autocmd FileType python autocmd BufWritePre \u003cbuffer\u003e call yapf#YAPF()` to your `.vimrc` and then restart Vim.\n\n## Research Papers\n\nBaking in the Feature: Accelerating Volumetric Segmentation by Rendering Feature Maps - [Link](https://keke.dev/baking-in-the-feature)\n\nNeural Implicit Vision-Language Feature Fields - [Link](https://arxiv.org/abs/2303.10962)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fethz-asl%2Fautolabel","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fethz-asl%2Fautolabel","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fethz-asl%2Fautolabel/lists"}