{"id":13738384,"url":"https://github.com/juglab/FourierImageTransformer","last_synced_at":"2025-05-08T16:33:32.452Z","repository":{"id":37399137,"uuid":"323406520","full_name":"juglab/FourierImageTransformer","owner":"juglab","description":"Fourier Image Transformer (FIT) can solve relevant image analysis tasks in Fourier space.","archived":false,"fork":false,"pushed_at":"2022-03-23T06:30:05.000Z","size":19481,"stargazers_count":97,"open_issues_count":3,"forks_count":13,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-04-18T04:01:36.783Z","etag":null,"topics":["computer-vision","image-processing","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/juglab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-12-21T17:39:20.000Z","updated_at":"2025-01-25T17:08:44.000Z","dependencies_parsed_at":"2022-07-08T16:47:24.157Z","dependency_job_id":null,"html_url":"https://github.com/juglab/FourierImageTransformer","commit_stats":null,"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juglab%2FFourierImageTransformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juglab%2FFourierImageTransformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juglab%2FFourierImageTransformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/juglab%2FFourierImageTransformer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/juglab","
download_url":"https://codeload.github.com/juglab/FourierImageTransformer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253105503,"owners_count":21855044,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","image-processing","transformer"],"created_at":"2024-08-03T03:02:20.731Z","updated_at":"2025-05-08T16:33:31.983Z","avatar_url":"https://github.com/juglab.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Fourier Image Transformer\n\nTim-Oliver Buchholz\u003csup\u003e1\u003c/sup\u003e and Florian Jug\u003csup\u003e2\u003c/sup\u003e\u003c/br\u003e\n\u003csup\u003e1\u003c/sup\u003etibuch@mpi-cbg.de, \u003csup\u003e2\u003c/sup\u003eflorian.jug@fht.org\n\nTransformer architectures show spectacular performance on NLP tasks and have recently also been used for tasks such as\nimage completion or image classification. Here we propose to use a sequential image representation, where each prefix of\nthe complete sequence describes the whole image at reduced resolution. Using such Fourier Domain Encodings (FDEs), an\nauto-regressive image completion task is equivalent to predicting a higher resolution output given a low-resolution\ninput. Additionally, we show that an encoder-decoder setup can be used to query arbitrary Fourier coefficients given a\nset of Fourier domain observations. We demonstrate the practicality of this approach in the context of computed\ntomography (CT) image reconstruction. 
In summary, we show that Fourier Image Transformer (FIT) can be used to solve\nrelevant image analysis tasks in Fourier space, a domain inherently inaccessible to convolutional architectures.\n\nPreprint: [arXiv](https://arxiv.org/abs/2104.02555)\n\n## FIT for Super-Resolution\n\n![SRes](figs/SRes.png)\n\n__FIT for super-resolution.__ Low-resolution input images are first transformed into Fourier space and then unrolled\ninto an FDE sequence, as described in Section 3.1 of the paper. This FDE sequence can now be fed to a FIT that,\nconditioned on this input, extends the FDE sequence to represent a higher-resolution image. This setup is trained using\nan FC-Loss that enforces consistency between predicted and ground truth Fourier coefficients. During inference, the FIT\nis conditioned on the first 39 entries of the FDE, corresponding to (a,d) 3x Fourier binned input images. Panels (b,e)\nshow the inverse Fourier transform of the predicted output, and panels (c,f) depict the corresponding ground truth.\n\n## FIT for Tomography\n\n![TRec](figs/TRec.png)\n\n__FIT for computed tomography.__ We propose an encoder-decoder based Fourier Image Transformer setup for tomographic\nreconstruction. In 2D computed tomography, 1D projections of an imaged sample (i.e. the columns of a sinogram) are\nback-transformed into a 2D image. A common method for this transformation is the filtered backprojection (FBP). Since\neach projection maps to a line of coefficients in 2D Fourier space, a limited number of projections in a sinogram leads\nto visible streaking artefacts due to missing/unobserved Fourier coefficients. The idea of our FIT setup is to encode\nall information of a given sinogram and use the decoder to predict missing Fourier coefficients. The reconstructed image\nis then computed via an inverse Fourier transform (iFFT) of these predictions. In order to reduce high-frequency\nfluctuations in this result, we introduce a shallow conv-block after the iFFT (shown in black). 
We train this setup\ncombining the FC-Loss, see Section 3.2 in the paper, and a conventional MSE-loss between prediction and ground truth.\n\n## Installation\n\nWe use [fast-transformers](https://github.com/idiap/fast-transformers) as the underlying transformer implementation. In our super-resolution experiments we use their\n`causal-linear` implementation, which uses custom CUDA code (prediction works without this custom code). This code is\ncompiled during the installation of fast-transformers, so the CUDA and NVIDIA driver versions must match.\nFor our experiments we used CUDA 10.2 and NVIDIA driver 440.118.02.\n\nWe recommend installing Fourier Image Transformer into a new [conda](https://docs.conda.io/en/latest/miniconda.html)\nenvironment:\n\n`conda create -n fit python=3.9`\n\nNext, activate the new environment:\n\n`conda activate fit`\n\nThen we install PyTorch for CUDA 10.2:\n\n`conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch`\n\nFollowed by installing fast-transformers:\n\n`pip install --user pytorch-fast-transformers`\n\nNow we have to install the `astra-toolbox`:\n\n`conda install -c astra-toolbox/label/dev astra-toolbox`\n\nAnd finally we install Fourier Image Transformer:\n\n`pip install fourier-image-transformer`\n\nStart the Jupyter server:\n\n`jupyter notebook`\n\n\n## Cite\n```\n@misc{buchholz2021fourier,\n      title={Fourier Image Transformer}, \n      author={Tim-Oliver Buchholz and Florian Jug},\n      year={2021},\n      eprint={2104.02555},\n      archivePrefix={arXiv},\n      primaryClass={cs.CV}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuglab%2FFourierImageTransformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuglab%2FFourierImageTransformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuglab%2FFourierImageTransformer/lists"}