{"id":15027513,"url":"https://github.com/cvg/pixel-perfect-sfm","last_synced_at":"2025-04-08T00:39:09.405Z","repository":{"id":37727510,"uuid":"397306592","full_name":"cvg/pixel-perfect-sfm","owner":"cvg","description":"Pixel-Perfect Structure-from-Motion with Featuremetric Refinement (ICCV 2021, Best Student Paper Award)","archived":false,"fork":false,"pushed_at":"2024-07-30T18:03:22.000Z","size":9465,"stargazers_count":1387,"open_issues_count":43,"forks_count":149,"subscribers_count":48,"default_branch":"main","last_synced_at":"2025-03-31T23:36:39.093Z","etag":null,"topics":["3d-vision","deep-learning","feature-matching","structure-from-motion","visual-localization"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cvg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-08-17T15:31:09.000Z","updated_at":"2025-03-31T14:00:34.000Z","dependencies_parsed_at":"2023-01-29T16:01:33.227Z","dependency_job_id":"bd576154-cf8f-40dd-84e3-dc736c34c16a","html_url":"https://github.com/cvg/pixel-perfect-sfm","commit_stats":{"total_commits":42,"total_committers":10,"mean_commits":4.2,"dds":0.5714285714285714,"last_synced_commit":"40f7c1339328b2a0c7cf71f76623fb848e0c0357"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvg%2Fpixel-perfect-sfm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvg%2Fpixel-perfect-sfm/tags","releases_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/cvg%2Fpixel-perfect-sfm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvg%2Fpixel-perfect-sfm/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cvg","download_url":"https://codeload.github.com/cvg/pixel-perfect-sfm/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247755560,"owners_count":20990620,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["3d-vision","deep-learning","feature-matching","structure-from-motion","visual-localization"],"created_at":"2024-09-24T20:06:35.994Z","updated_at":"2025-04-08T00:39:09.380Z","avatar_url":"https://github.com/cvg.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Pixel-Perfect Structure-from-Motion\n\n### Best student paper award @ [ICCV 2021](http://iccv2021.thecvf.com/)\n\nWe introduce a framework that **improves the accuracy of Structure-from-Motion (SfM) and visual localization** by refining keypoints, camera poses, and 3D points using the direct alignment of deep features. 
It is presented in our paper:\n- [Pixel-Perfect Structure-from-Motion with Featuremetric Refinement](https://arxiv.org/abs/2108.08291)\n- Authors: [Philipp Lindenberger](https://scholar.google.com/citations?user=FMVAi2YAAAAJ\u0026hl=en)\\*, [Paul-Edouard Sarlin](https://psarlin.com/)\\*, [Viktor Larsson](http://people.inf.ethz.ch/vlarsson/), and [Marc Pollefeys](http://people.inf.ethz.ch/pomarc/)\n- Website: [psarlin.com/pixsfm](https://psarlin.com/pixsfm/) (videos, slides, poster)\n\nHere we provide `pixsfm`, a Python package that can be readily used with [COLMAP](https://colmap.github.io/) and [our toolbox hloc](https://github.com/cvg/Hierarchical-Localization/). This makes it easy to **refine an existing COLMAP model or reconstruct a new dataset with state-of-the-art image matching**. Our framework also improves visual localization in challenging conditions.\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://arxiv.org/abs/2108.08291\"\u003e\u003cimg src=\"doc/assets/pipeline.svg\" width=\"80%\"/\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nThe refinement is composed of 2 steps:\n\n1. **Keypoint adjustment:** before SfM, jointly refine all 2D keypoints that are matched together.\n2. **Bundle adjustment:** after SfM, refine 3D points and camera poses.\n\nIn each step, we optimize the consistency of dense deep features over multiple views by minimizing a featuremetric cost. 
These features are extracted beforehand from the images using a pre-trained CNN.\n\nWith `pixsfm`, you can:\n\n- reconstruct and refine a scene using hloc, from scratch or with given camera poses\n- localize and refine new query images using hloc\n- run the keypoint or bundle adjustments on a COLMAP database or 3D model\n- evaluate the refinement with new dense or sparse features on the ETH3D dataset\n\nOur implementation scales to large scenes by carefully managing the memory and leveraging parallelism and SIMD vectorization when possible.\n\n## Installation\n\n`pixsfm` requires Python \u003e=3.6, GCC \u003e=6.1, and COLMAP 3.8 [installed from source](https://colmap.github.io/install.html#build-from-source). The core optimization is implemented in C++ with [Ceres \u003e= 2.1](https://github.com/ceres-solver/ceres-solver/) but we provide Python bindings with high granularity. The code is written for UNIX and has not been tested on Windows. The remaining dependencies are listed in `requirements.txt` and include [PyTorch](https://pytorch.org/) \u003e=1.7 and [pycolmap](https://github.com/colmap/pycolmap) + [pyceres](https://github.com/cvg/pyceres) built from source:\n\n```bash\n# install COLMAP following colmap.github.io/install.html#build-from-source, tag 3.8\nsudo apt-get install libhdf5-dev\ngit clone https://github.com/cvg/pixel-perfect-sfm --recursive\ncd pixel-perfect-sfm\npip install -r requirements.txt\n```\n\nTo use other local features besides SIFT via COLMAP, we also require [hloc](https://github.com/cvg/Hierarchical-Localization/):\n```bash\ngit clone --recursive https://github.com/cvg/Hierarchical-Localization/\ncd Hierarchical-Localization/\npip install -e .\n```\n\nFinally, build and install the `pixsfm` package:\n```bash\npip install -e .  # install pixsfm in develop mode\n```\n\nWe highly recommend using `pixsfm` with a working GPU for the dense feature extraction. All other steps run only on the CPU. 
Having issues with compilation errors or runtime crashes? Want to use the codebase as a C++ library? Check our [FAQ](./doc/FAQ.md).\n\n## Tutorial\n\nThe Jupyter notebook [`demo.ipynb`](./demo.ipynb) demonstrates a minimal usage example. It shows how to run Structure-from-Motion and the refinement, how to align and compare different 3D models, and how to localize and refine additional query images.\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"./notebooks/demo.ipynb\"\u003e\u003cimg src=\"doc/assets/demo.gif\" width=\"60%\"/\u003e\u003c/a\u003e\n  \u003cbr /\u003e\u003cem\u003eVisualizing mapping and localization results in the demo.\u003c/em\u003e\n\u003c/p\u003e\n\n## Structure-from-Motion\n\n### End-to-end SfM with hloc\n\nGiven keypoints and matches computed with hloc and stored in HDF5 files, we can run Pixel-Perfect SfM from a Python script:\n\n```python\nfrom pixsfm.refine_hloc import PixSfM\nrefiner = PixSfM()\nmodel, debug_outputs = refiner.reconstruction(\n    path_to_working_directory,\n    path_to_image_dir,\n    path_to_list_of_image_pairs,\n    path_to_keypoints.h5,\n    path_to_matches.h5,\n)\n# model is a pycolmap.Reconstruction 3D model\n```\n\nor from the command line:\n\n```bash\npython -m pixsfm.refine_hloc reconstructor \\\n    --sfm_dir path_to_working_directory \\\n    --image_dir path_to_image_dir \\\n    --pairs_path path_to_list_of_image_pairs \\\n    --features_path path_to_keypoints.h5 \\\n    --matches_path path_to_matches.h5\n```\n\nNote that:\n\n- The final refined 3D model is written to `path_to_working_directory` in either case.\n- Dense features are automatically extracted (on GPU when available) using a pre-trained CNN, [S2DNet](https://github.com/germain-hug/S2DNet-Minimal) by default.\n- The result `debug_outputs` contains the dense features and optimization statistics.\n\n### Configurations\n\nWe have fine-grained control over all hyperparameters via [OmegaConf](https://omegaconf.readthedocs.io/) configurations, which 
have sensible default values defined in `PixSfM.default_conf`. See [Detailed configuration](#detailed-configuration) for a description of the main configuration entries and their defaults.\n\n\u003cdetails\u003e\n\u003csummary\u003e[Click to see some examples]\u003c/summary\u003e\n\nFor example, dense features are stored in memory by default. If we reconstruct a large scene or have limited RAM, we should instead write them to a cache file that is loaded on demand. With the Python API, we can pass a configuration update:\n```python\nrefiner = PixSfM(conf={\"dense_features\": {\"use_cache\": True}})\n```\nor equivalently with the command line [using a dotlist](https://omegaconf.readthedocs.io/en/2.1_branch/usage.html#from-command-line-arguments):\n\n```bash\npython -m pixsfm.refine_hloc reconstructor [...] dense_features.use_cache=true\n```\n\nWe also provide ready-to-use configuration templates in [`pixsfm/configs/`](./pixsfm/configs/) covering the main use cases. For example, [`pixsfm/configs/low_memory.yaml`](./pixsfm/configs/low_memory.yaml) reduces the memory consumption to scale to large scenes and can be used as follows:\n```python\nrefiner = PixSfM(conf=\"low_memory\")\n# or, from the command line:\n#   python -m pixsfm.refine_hloc reconstructor [...] --config low_memory\n```\n\n\u003c/details\u003e\n\n### Triangulation from known camera poses\n\n\u003cdetails\u003e\n\u003csummary\u003e[Click to expand]\u003c/summary\u003e\n\nIf camera poses are available, we can simply triangulate a 3D point cloud from an existing reference COLMAP model with:\n\n```python\nmodel, _ = refiner.triangulation(..., path_to_reference_model, ...)\n```\nor\n```bash\npython -m pixsfm.refine_hloc triangulator [...] \\\n    --reference_sfm_model path_to_reference_model\n```\n\nBy default, camera poses and intrinsics are optimized by the bundle adjustment. 
To keep them fixed, we can simply override the corresponding options as:\n```python\nconf = {\"BA\": {\"optimizer\": {\n    \"refine_focal_length\": False,\n    \"refine_extra_params\": False,  # distortion parameters\n    \"refine_extrinsics\": False,    # camera poses\n}}}\nrefiner = PixSfM(conf=conf)\nrefiner.triangulation(...)\n```\nor equivalently\n```bash\npython -m pixsfm.refine_hloc triangulator [...] \\\n  'BA.optimizer={refine_focal_length: false, refine_extra_params: false, refine_extrinsics: false}'\n```\n\n\u003c/details\u003e\n\n### Keypoint adjustment\n\nThe first step of the refinement is the keypoint adjustment (KA). It refines the keypoints from tentative matches only, before SfM. Here we show how to run this step separately.\n\n\u003cdetails\u003e\n\u003csummary\u003e[Click to expand]\u003c/summary\u003e\n\nTo refine keypoints stored in an hloc HDF5 feature file:\n```python\nfrom pixsfm.refine_hloc import PixSfM\nrefiner = PixSfM()\nkeypoints, _, _ = refiner.refine_keypoints(\n    path_to_output_keypoints.h5,\n    path_to_input_keypoints.h5,\n    path_to_list_of_image_pairs,\n    path_to_matches.h5,\n    path_to_image_dir,\n)\n```\n\nTo refine keypoints stored in a COLMAP database:\n```python\nfrom pixsfm.refine_colmap import PixSfM\nrefiner = PixSfM()\nkeypoints, _, _ = refiner.refine_keypoints_from_db(\n    path_to_output_database,  # pass path_to_input_database for in-place refinement\n    path_to_input_database,\n    path_to_image_dir,\n)\n```\n\nIn either case, there is an equivalent command line interface.\n\n\u003c/details\u003e\n\n### Bundle adjustment\n\nThe second step of the refinement is the bundle adjustment (BA). 
Here we show how to run it separately to refine an existing COLMAP 3D model.\n\n\u003cdetails\u003e\n\u003csummary\u003e[Click to expand]\u003c/summary\u003e\n\nTo refine a 3D model stored on disk:\n```python\nfrom pixsfm.refine_colmap import PixSfM\nrefiner = PixSfM()\nmodel, _, _ = refiner.refine_reconstruction(\n    path_to_input_model,\n    path_to_output_model,\n    path_to_image_dir,\n)\n```\n\nUsing the command line interface:\n```bash\npython -m pixsfm.refine_colmap bundle_adjuster \\\n    --input_path path_to_input_model \\\n    --output_path path_to_output_model \\\n    --image_dir path_to_image_dir\n```\n\n\u003c/details\u003e\n\n## Visual localization\n\nWhen estimating the camera pose of a single image, we can also run the keypoint and bundle adjustments before and after PnP+RANSAC. This requires reference features attached to each observation of the reference model. They can be computed in several ways.\n\n\u003cdetails\u003e\n\u003csummary\u003e[Click to learn how to localize a single image]\u003c/summary\u003e\n\n1. To recompute the references from scratch, pass the path to the reference images:\n\n```python\nfrom pixsfm.localization import QueryLocalizer\nlocalizer = QueryLocalizer(\n    reference_model,  # pycolmap.Reconstruction 3D model\n    image_dir=path_to_reference_image_dir,\n    dense_features=cache_path,  # optional: cache to file for later reuse\n)\npose_dict = localizer.localize(\n    pnp_points2D,     # keypoints with valid 3D correspondence (N, 2)\n    pnp_point3D_ids,  # IDs of corresponding 3D points in the reconstruction\n    query_camera,     # pycolmap.Camera\n    image_path=path_to_query_image,\n)\nif pose_dict[\"success\"]:\n    # quaternion and translation of the query, from world to camera\n    qvec, tvec = pose_dict[\"qvec\"], pose_dict[\"tvec\"]\n```\n\nThe default localization configuration can be accessed with `QueryLocalizer.default_conf`.\n\n2. 
Alternatively, if dense reference features have already been computed during the pixel-perfect SfM, it is more efficient to reuse them:\n\n```python\nrefiner = PixSfM()\nmodel, outputs = refiner.reconstruction(...)\nfeatures = outputs[\"feature_manager\"]\n# or load the features manually\nfeatures = pixsfm.extract.load_features_from_cache(\n    refiner.resolve_cache_path(output_dir=path_to_output_sfm)\n)\nlocalizer = QueryLocalizer(\n    reference_model,  # pycolmap.Reconstruction 3D model\n    dense_features=features,\n)\n```\n\n\u003c/details\u003e\n\nWe can also batch-localize multiple queries equivalently to [`hloc.localize_sfm`](https://github.com/cvg/Hierarchical-Localization/blob/master/hloc/localize_sfm.py):\n\n```python\npixsfm.localize.main(\n    dense_features,  # FeatureManager or path to cache file\n    reference_model,  # pycolmap.Reconstruction 3D model\n    path_to_query_list,\n    path_to_image_dir,\n    path_to_image_pairs,\n    path_to_keypoints,\n    path_to_matches,\n    path_to_output_results,\n    config=config,  # optional dict\n)\n```\n\n## Example: mapping and localization\n\nWe now show how to run the featuremetric pipeline on the Aachen Day-Night v1.1 dataset. First, download the dataset by following [the instructions described here](https://github.com/cvg/Hierarchical-Localization/tree/master/hloc/pipelines/Aachen_v1_1#installation). Then run `python examples/sfm+loc_aachen.py`, which will perform mapping and localization with SuperPoint+SuperGlue. As the scene is large, with over 7k images, we cache the dense feature patches and therefore require about 350GB of free disk space. Expect the sparse feature matching to take a few hours on a recent GPU. We also show in [`examples/refine_sift_aachen.py`](examples/refine_sift_aachen.py) how to start from an existing COLMAP database.\n\n## Evaluation\n\nWe can evaluate the accuracy of the pixel-perfect SfM and of camera pose estimation on the ETH3D dataset. 
Refer to the paper for more details.\n\nFirst, we download the dataset with `python -m pixsfm.eval.eth3d.download`, by default to `./datasets/ETH3D/`.\n\n### 3D triangulation\n\n\u003cdetails\u003e\n\u003csummary\u003e[Click to expand]\u003c/summary\u003e\n\nWe first need to install the [ETH3D multi-view evaluation tool](https://github.com/ETH3D/multi-view-evaluation):\n\n```bash\nsudo apt install libpcl-dev  # linux only\ngit clone git@github.com:ETH3D/multi-view-evaluation.git\ncd multi-view-evaluation \u0026\u0026 mkdir build \u0026\u0026 cd build\ncmake .. \u0026\u0026 make -j\n```\n\nWe can then evaluate the accuracy of the sparse 3D point cloud triangulated with Pixel-Perfect SfM, for example on the courtyard scene with SuperPoint keypoints:\n\n```bash\npython -m pixsfm.eval.eth3d.triangulation \\\n    --scenes courtyard \\\n    --methods superpoint \\\n    --tag pixsfm\n```\n\n- omit `--scenes` and `--methods` to run all scenes with all feature detectors.\n- the results are written to `./outputs/ETH3D/` by default\n- use `--tag some_run_name` to distinguish different runs\n- add `--config norefine` to turn off any refinement or use the dotlist `KA.apply=false BA.apply=false` \n- add `--config photometric` to run the photometric BA (no KA)\n\nTo aggregate the results and compare different runs, for example with and without refinement, we run:\n\n```bash\npython -m pixsfm.eval.eth3d.plot_triangulation \\\n    --scenes courtyard \\\n    --methods superpoint \\\n    --tags pixsfm raw\n```\n\nRunning on all scenes and all detectors should yield the following results (±1%):\n\n```\n----scene---- -keypoints- -tag-- -accuracy @ X cm- completeness @ X cm\n                                  1.0   2.0   5.0   1.0   2.0   5.0 \n----------------------------------------------------------------------\nindoor        sift        raw    75.95 85.50 92.88  0.21  0.88  3.65\n                          pixsfm 83.16 89.94 94.94  0.25  0.96  3.77\n              superpoint  raw    
78.96 87.77 94.55  0.64  2.36  9.39\n                          pixsfm 89.93 94.09 97.04  0.76  2.62  9.85\n              r2d2        raw    67.91 80.25 90.45  0.55  2.12  8.85\n                          pixsfm 81.09 87.78 93.41  0.67  2.32  9.04\n----------------------------------------------------------------------\noutdoor       sift        raw    57.70 72.90 86.41  0.06  0.34  2.46\n                          pixsfm 68.10 80.57 91.59  0.08  0.42  2.75\n              superpoint  raw    53.63 68.93 83.27  0.11  0.64  4.43\n                          pixsfm 71.83 82.65 92.06  0.18  0.89  5.40\n              r2d2        raw    49.33 66.21 83.37  0.11  0.55  3.62\n                          pixsfm 67.94 81.02 91.68  0.16  0.71  3.99\n```\n\nThe results of this evaluation can be different from the numbers reported in the paper. The trends are however similar and the conclusions of the paper still hold. This difference is due to improvements of the `pixsfm` code and to changes in the SuperPoint implementation: we initially used the setup of [PatchFlow](https://github.com/mihaidusmanu/local-feature-refinement) and later switched to hloc, which is strictly better and easier to install.\n\n\u003c/details\u003e\n\n### Camera pose estimation\n\n\u003cdetails\u003e\n\u003csummary\u003e[Click to expand]\u003c/summary\u003e\n\nSimilarly, we evaluate the accuracy of camera pose estimation given sparse 3D models triangulated from other views:\n```bash\npython -m pixsfm.eval.eth3d.localization --tag pixsfm\n```\n\nAgain, we can also run on a subset of scenes or keypoint detectors. 
To aggregate the results and compare different runs, for example with and without KA and BA, we run:\n\n```bash\npython -m pixsfm.eval.eth3d.plot_localization --tags pixsfm raw\n```\n\nWe should then obtain the following table and plot (±2%):\n\n\u003ctable\u003e\u003ctr\u003e\u003ctd\u003e\n\n```\n-keypoints- -tag-- -AUC @ X cm (%)--\n                    0.1    1    10  \nsift        raw    16.92 55.39 81.15\n            pixsfm 23.08 60.47 84.01\nsuperpoint  raw    15.38 63.41 87.24\n            pixsfm 41.54 73.86 89.66\nr2d2        raw     6.15 51.70 83.46\n            pixsfm 23.85 62.41 86.89\n```\n\n\u003c/td\u003e\u003ctd\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"doc/assets/eth3d_localization.svg\" width=\"60%\"/\u003e\n\u003c/p\u003e\n\nSIFT (black), SuperPoint (red), R2D2 (green)\n\n\u003c/td\u003e\u003c/tr\u003e\u003c/table\u003e\n\nResults for the 0.1 cm threshold can vary across setups and therefore differ from the numbers reported in the paper. This might be due to changes in the PyTorch and COLMAP dependencies. 
We are investigating this but any help is welcome!\n\n\u003c/details\u003e\n\n## Advanced usage\n\n### Detailed configuration\n\nHere we explain the main configuration entries for mapping and localization along with their default values:\n\n\u003cdetails\u003e\n\u003csummary\u003e[Click to expand]\u003c/summary\u003e\n\n```yaml\ndense_features:  # refinement features\n  model:  # the CNN that extracts the features\n    name: s2dnet  # the name of one of the models defined in pixsfm/features/models/\n    num_layers: 1  # the number of output layers (model-specific parameters)\n  device: auto  # cpu, cuda, or auto-determined based on CUDA availability\n  max_edge: 1600  # downscale the image such that the largest dimension has this value\n  resize: LANCZOS  # interpolation algorithm for the image resizing\n  pyr_scales: [1.0]   # concat features extracted at multiple scales\n  fast_image_load: false  # approximate resizing for large images\n  l2_normalize: true  # whether to normalize the features so they have unit norm\n  sparse: true  # whether to store sparse patches of features instead of the full feature maps\n  patch_size: 8  # the size of the feature patches if sparse\n  dtype: half  # the data type of features when stored: half, float, or double\n  use_cache: false  # whether to cache the features on file or keep them in memory\n  overwrite_cache: false  # whether to overwrite the cache file if it already exists\n  cache_format: chunked\ninterpolation:\n  nodes: [[0.0, 0.0]]  # grid over which to compute the cost, by default a single point\n  mode: BICUBIC  # the interpolation algorithm\n  l2_normalize: true\n  ncc_normalize: false  # only works if len(nodes)\u003e1, mostly for photometric\nmapping:  # pixsfm.refine_colmap.PixSfM\n  dense_features: ${..dense_features}\n  KA:  # keypoint adjustment\n    apply: true  # whether to apply or instead skip\n    strategy: featuremetric  # regular, or alternatively topological_reference (much faster)\n    interpolation: 
${...interpolation}  # we can use a different interpolation for KA\n    level_indices: null  # we can optimize a subset of levels, by default all\n    split_in_subproblems: true  # parallelize the optimization\n    max_kps_per_problem: 50  # parallelization, a lower value saves memory, conservative if -1\n    optimizer:  # optimization problem and solving\n      loss:\n        name: cauchy  # name of the loss function, among {huber, soft_l1, ...}\n        params: [0.25]  # loss-specific parameters\n      solver:\n        function_tolerance: 0.0\n        gradient_tolerance: 0.0\n        parameter_tolerance: 1.0e-05\n        minimizer_progress_to_stdout: false  # print a progress bar\n        max_num_iterations: 100  # maximum number of optimization iterations\n        max_linear_solver_iterations: 200\n        max_num_consecutive_invalid_steps: 10\n        max_consecutive_nonmonotonic_steps: 10\n        use_inner_iterations: false\n        use_nonmonotonic_steps: false\n        num_threads: 1\n      root_regularize_weight: -1  # prevent drift by adding edges to the root node, disabled if -1\n      print_summary: false  # whether to print a detailed summary after completion\n      bound: 4.0  # constraint on the distance (in pixels) w.r.t. 
the initial values\n      num_threads: -1  # number of threads if parallelize in subproblems\n  BA:  # bundle adjustment\n    apply: true  # whether to apply or instead skip\n    strategy: feature_reference  # regular, or alternatively {costmaps, patch_warp}\n    interpolation: ${...interpolation}  # we can use a different interpolation for BA\n    level_indices: null  # we can optimize a subset of levels, by default all\n    max_tracks_per_problem: 10  # parallelization of references/costmaps, a lower value saves memory\n    num_threads: -1\n    optimizer:\n      loss:  # same config as KA.optimizer.loss\n      solver:  # same config as KA.optimizer.solver\n      print_summary: false\n      refine_focal_length: true  # whether to optimize the focal length\n      refine_principal_point: false  # whether to optimize the principal points\n      refine_extra_params: true  # whether to optimize distortion parameters\n      refine_extrinsics: true  # whether to optimize the camera poses\n    references:  # if strategy==feature_reference\n      loss:  # what to minimize to compute the robust mean\n        name: cauchy\n        params: [0.25]\n      iters: 100  # number of iterations to compute the robust mean\n      num_threads: -1\n    repeats: 1\nlocalization:  # pixsfm.localization.main.QueryLocalizer\n  dense_features: ${..dense_features}\n  target_reference: nearest  # how to select references, in {nearest, robust_mean, all_observations}\n  overwrite_features_sparse: null  # overwrite dense_features.sparse in query localization only\n  references:  # how to compute references\n    loss:  # what to minimize to compute the robust mean, same as BA.references.loss\n    iters: 100\n    keep_observations: true  # required for target_reference in {nearest, all_observations}\n    num_threads: -1\n  max_tracks_per_problem: 50  # parallelization of references, a lower value saves memory\n  unique_inliers: min_error  # how we select unique matches for each 3D point\n  QKA:  # 
query keypoint adjustment\n    apply: true  # whether to apply or instead skip\n    interpolation: ${...interpolation}\n    level_indices: null\n    feature_inlier_thresh: -1  # discard points with high feature error, disabled if -1\n    stack_correspondences: false  # stack references for equal keypoints\n    optimizer:\n      loss:  # same config as KA.optimizer.loss\n        name: trivial  # L2, no robust loss function\n        params: []\n      solver:  # same config as KA.optimizer.solver\n      print_summary: false\n      bound: 4.0  # constraint on the distance (in pixels) w.r.t. the initial values\n  PnP:\n    estimation:  # pycolmap.absolute_pose_estimation\n      ransac:\n        max_error: 12  # inlier threshold in pixel reprojection error\n        estimate_focal_length: false  # if the focal length is unknown\n    refinement:  # refinement in pycolmap.absolute_pose_estimation\n      refine_focal_length: false\n      refine_extra_params: false\n  QBA:  # query bundle adjuster\n    apply: true  # whether to apply or instead skip\n    interpolation: ${...interpolation}\n    level_indices: null\n    optimizer:\n      loss:  # same config as KA.optimizer.loss\n      solver:  # same config as KA.optimizer.solver\n      print_summary: false\n      refine_focal_length: false\n      refine_principal_point: false\n      refine_extra_params: false\n```\n\nNote that the config supports [variable interpolation](https://omegaconf.readthedocs.io/en/2.0_branch/usage.html#variable-interpolation) through OmegaConf.\n\n\u003c/details\u003e\n\n### Large-scale refinement\n\nWhen dealing with large scenes or with a large number of images, memory is often a bottleneck. 
The configuration [`low_memory`](./pixsfm/configs/low_memory.yaml) shows how to decrease the memory consumption by trading off accuracy and speed.\n\n\u003cdetails\u003e\n\u003csummary\u003e[Click to expand]\u003c/summary\u003e\n\nThe main changes are:\n- `dense_features`\n  - store as sparse patches: `sparse=true`\n  - reduce the size of the patches: `patch_size=8` (or smaller)\n  - store in a cache file: `use_cache=true`\n- `KA`\n  - chunk the optimization, loading only a subset of features at once: `split_in_subproblems=true`\n  - optimize at most around 50 keypoints per chunk: `max_kps_per_problem=50`\n- `BA`\n  - use the costmap approximation: `strategy=costmaps` (described in Section C of the paper)\n\n\u003c/details\u003e\n\nWhen runtime is the limiting factor, KA can also be sped up by optimizing only the costs with respect to the topological center of each track, with `KA.strategy=topological_reference`.\n\n### Keypoints with large noise\n\n\u003cdetails\u003e\n\u003csummary\u003e[Click to expand]\u003c/summary\u003e\n\nSome keypoint detectors with low output resolution, like D2-Net, predict keypoints that are localized inaccurately. In this case, the refinement is highly beneficial but the default parameters are not optimal. It is necessary to increase the patch size and use multiple feature layers. 
An example configuration is given in [`pixsfm_eth3d_d2net`](./pixsfm/configs/pixsfm_eth3d_d2net.yaml) to evaluate D2-Net on ETH3D.\n\n\u003c/details\u003e\n\n### Extending pixsfm\n\n- To refine your own sparse keypoints or use your own matcher, refer to [Using your own local features or matcher](https://github.com/cvg/Hierarchical-Localization/#using-your-own-local-features-or-matcher) in hloc.\n- To add different dense features, see [Using your own dense features](./doc/features.md#using-your-own-features).\n- For a description of how dense features are accessed and stored, see [doc/features.md](./doc/features.md).\n- For a description of the internals of `pixsfm`, see [Design Principles](./doc/general.md).\n\nStill have questions about `pixsfm`? Is anything in the doc unclear? Are you unsure whether it fits your use case? Please let us know by opening an issue!\n\n## Contributing\n\nWe welcome external contributions, especially to improve the following points:\n\n- [ ] make `pixsfm` work on Windows\n- [ ] train and integrate dense features that are more compact with fewer dimensions\n- [ ] build a conda package for pixsfm and pycolmap to not require installing COLMAP from source\n- [ ] add examples on how to build featuremetric problems with pyceres\n\n## BibTeX citation\n\nPlease consider citing our work if you use any code from this repo or ideas presented in the paper:\n\n```\n@inproceedings{lindenberger2021pixsfm,\n  author    = {Philipp Lindenberger and\n               Paul-Edouard Sarlin and\n               Viktor Larsson and\n               Marc Pollefeys},\n  title     = {{Pixel-Perfect Structure-from-Motion with Featuremetric Refinement}},\n  booktitle = {ICCV},\n  year      = 
{2021},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcvg%2Fpixel-perfect-sfm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcvg%2Fpixel-perfect-sfm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcvg%2Fpixel-perfect-sfm/lists"}