{"id":23370635,"url":"https://github.com/waltersimoncini/fungivision","last_synced_at":"2025-09-04T03:33:28.051Z","repository":{"id":248572028,"uuid":"824716489","full_name":"WalterSimoncini/fungivision","owner":"WalterSimoncini","description":"Library implementation of \"No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations\"","archived":false,"fork":false,"pushed_at":"2024-10-31T12:04:13.000Z","size":100,"stargazers_count":36,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-24T14:21:13.563Z","etag":null,"topics":["computer-vision","deep-learning","retrieval","self-supervised-learning"],"latest_commit_sha":null,"homepage":"https://fungi.ashita.nl","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/WalterSimoncini.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-05T19:07:06.000Z","updated_at":"2025-03-03T06:42:10.000Z","dependencies_parsed_at":"2024-07-15T21:52:13.272Z","dependency_job_id":"5aa4806c-c2b7-4d6d-bbd3-d3e5cae54b08","html_url":"https://github.com/WalterSimoncini/fungivision","commit_stats":null,"previous_names":["waltersimoncini/fungivision"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WalterSimoncini%2Ffungivision","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WalterSimoncini%2Ffungivision/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WalterSimoncini%2Ff
ungivision/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/WalterSimoncini%2Ffungivision/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/WalterSimoncini","download_url":"https://codeload.github.com/WalterSimoncini/fungivision/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248252696,"owners_count":21072701,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","retrieval","self-supervised-learning"],"created_at":"2024-12-21T15:47:54.266Z","updated_at":"2025-04-10T16:32:30.999Z","avatar_url":"https://github.com/WalterSimoncini.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **FUNGI**: **F**eatures from **UN**supervised **G**rad**I**ents\n\n[Walter Simoncini](https://walter.ashita.nl/)\u003csup\u003e1\u003c/sup\u003e, [Andrei Bursuc](https://abursuc.github.io/)\u003csup\u003e2\u003c/sup\u003e, [Spyros Gidaris](https://scholar.google.fr/citations?user=7atfg7EAAAAJ\u0026hl=en)\u003csup\u003e2\u003c/sup\u003e, [Yuki M. Asano](https://yukimasano.github.io/)\u003csup\u003e1\u003c/sup\u003e.\n\n1. [QUVA Lab](https://ivi.fnwi.uva.nl/quva/), University of Amsterdam.\n2. [valeo.ai](https://www.valeo.com/en/valeo-ai/), Paris, France.\n\nThis library implements our [No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations](https://fungi.ashita.nl/) paper. 
If you're looking for the code to replicate our experimental results, please [click here](https://github.com/WalterSimoncini/no-train-all-gain).\n\nThe library allows you to extract **FUNGI**: **F**eatures from **UN**supervised **G**rad**I**ents from vision transformer backbones.\nFUNGI leverage the power of self-supervised losses to provide features that improve k-nearest neighbor classification for images, text and audio, and even semantic segmentation of images.\n\n## Getting Started\n\nYou can install the `fungivision` package using the following command. The package requires `Python 3.10`.\n\n```sh\npip install fungivision\n```\n\nWe provide a quick demo of the library in `demo.ipynb`, where we extract FUNGI features for the [Flowers102](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) dataset using a DINOv1 backbone. If you want to run the k-nearest neighbor classification evaluation, make sure to also install `scikit-learn`!\n\n## Example Usage\n\nWe provide an easy-to-use `FUNGIWrapper` to extract gradient features from any transformer backbone. First, initialize a torch `dataset` that returns `(PIL.Image, label)` tuples. It's important that **NO** transformation is applied to the raw images, as each SSL objective must apply its own augmentation independently. Second, initialize a transformer encoder, e.g. 
you can initialize [DINO](https://arxiv.org/abs/2104.14294) ViT-B/16 as follows:\n\n```python\nmodel = torch.hub.load(\"facebookresearch/dino:main\", \"dino_vitb16\")\n```\n\nAfter that, you can wrap the model with `FUNGIWrapper`.\n\n```python\nimport torch\nimport torch.nn as nn\n\nfrom tqdm import tqdm\nfrom fungivision.wrapper import FUNGIWrapper\nfrom fungivision.config import KLConfig, DINOConfig, SimCLRConfig\n\n\n# Run the code on GPU if possible, or fallback on the CPU\ndevice = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n\n# Wrap the model using the FUNGI feature extractor\nfungi = FUNGIWrapper(\n    model=model,\n    # The target layer is a dot-separated path to a linear layer within the model. The path\n    # used here points to the attention output projection of the last transformer block.\n    target_layer=\"blocks.11.attn.proj\",\n    device=device,\n    # Using fp16 is ~2x faster, and the downstream performance is not affected.\n    use_fp16=True,\n    # The list of objectives for which the wrapper will extract gradient features.\n    # As we use three objectives, the output features will have 3 * E dimensions,\n    # where E is the dimensionality of the model embeddings. You can reduce the\n    # feature dimensionality via PCA to maintain an iso-storage/retrieval cost.\n    extractor_configs=[\n        KLConfig(),\n        DINOConfig(),\n        # You can configure the self-supervised objectives by passing arguments\n        # to their configuration objects. See each config dataclass in\n        # src/fungivision/config for more details.\n        SimCLRConfig(num_patches=4, stride_scale=6)\n    ]\n)\n\n# You must call setup before extracting FUNGI features, as some objectives may\n# require supporting data to compute the loss, e.g. 
the SimCLR negative batch\nfungi.setup(dataset=train_dataset)\n```\n\nOnce wrapped, you're ready to extract the gradient features!\n\n```python\nfrom torch.utils.data import DataLoader\n\n# Change as appropriate depending on your system\nbatch_size = 32\nnum_workers = 18\nfeatures = []\n\ndata_loader = DataLoader(\n    dataset,\n    batch_size=batch_size,\n    shuffle=False,\n    num_workers=num_workers,\n    # This makes sure each iteration returns a list of images and a list of targets,\n    # without the data loader creating a batch by itself, which may result in errors\n    # as images may have a different size\n    collate_fn=lambda batch: zip(*batch)\n)\n\nfor images, _ in tqdm(data_loader):\n    # The sub-components of each feature are already L2-normalized independently\n    features.append(fungi(images).cpu().float())\n```\n\nBy default the wrapper does not extract the model embeddings, as each model requires its own inference transform. Assuming you've extracted them on your own, you can combine them with the gradient features as follows:\n\n```python\nembeddings = ...\nembeddings = nn.functional.normalize(embeddings, dim=-1, p=2)\n\n# Concatenate the per-batch gradient features into a single tensor\nfeatures = torch.cat(features, dim=0)\n\n# Features are now [embeddings, KL gradients, DINO gradients, SimCLR gradients]\nfeatures = torch.cat([\n    embeddings,\n    features\n], dim=-1)\n```\n\n### Creating your own SSL objective\n\nYou can create your own gradient-feature extractor, and to do so you just need to write two classes: a subclass of `BaseGradientExtractor` and its configuration dataclass. 
Assuming your loss works with a single view, such as our KL objective, you just need to implement two methods:\n\n```python\nimport torch\nimport torch.nn as nn\nimport torchvision.transforms.v2 as tf\n\nfrom torch.nn.functional import log_softmax, softmax, kl_div\nfrom fungivision.gradients.base_extractor import BaseGradientExtractor\n\n\nclass CustomGradientsExtractor(BaseGradientExtractor):\n    def input_transform(self, input_dim: int) -\u003e nn.Module:\n        # Implement the data augmentation to be applied to each input image.\n        # input_dim indicates the input dimensionality of the backbone.\n        return tf.Compose([...])\n\n    def compute_loss(self, latents: torch.Tensor, views_per_sample: int, **kwargs) -\u003e torch.Tensor:\n        # Given a batch of latent representations, compute the per-sample loss. It's\n        # extremely important that the computational graph for each individual input\n        # image is independent from the others, except for a final average of the\n        # individual losses. If this constraint is not respected the per-sample gradients\n        # will be contaminated by other batch items, and you will experience significant\n        # performance fluctuations as you change the batch size (up to 10-20-30%!).\n        #\n        # You can also test for this mistake by comparing the gradients of the same\n        # input sample when you forward it by itself and in a batch of 2 inputs. 
If\n        # the gradients are significantly different when you're testing on a CPU then\n        # the two batch items are probably interacting.\n        # \n        # NOTE: on a GPU device the gradients may be slightly different as you change\n        # the batch size even if you've done everything correctly, as modern GPUs pick\n        # the most appropriate algorithm automatically, even if you force their behavior\n        # to be deterministic.\n        #\n        # latents is a [B * V, E] tensor, where B is the batch size and V the number\n        # of views (i.e. views_per_sample). If your data augmentation generates multiple\n        # views per image you can reshape them in [B, V, E] using the following code:\n        #\n        # batch_size = latents.shape[0] // views_per_sample\n        # latents = latents.reshape(batch_size, views_per_sample, -1)\n\n        # In this function we implement our KL loss. Notice that the computational\n        # graph of batch items is only fused at the end via reduction = \"mean\"\n        latent_dim = latents.shape[1]\n\n        uniform = (torch.ones(latent_dim) / latent_dim).to(self.device)\n\n        softmax_uniform = softmax(uniform / self.temperature, dim=0)\n        softmax_uniform = softmax_uniform.unsqueeze(dim=0).repeat(latents.shape[0], 1)\n\n        softmax_latents = log_softmax(latents / self.temperature, dim=1)\n\n        # NOTE: Always use a mean reduction!\n        return kl_div(softmax_latents, softmax_uniform, reduction=\"mean\")\n```\n\nIf you accept custom configuration parameters, e.g. `self.temperature` in this case, you should also override the `__init__` method and add your parameters before the `**kwargs`. For more complex examples (that use multiple views per input image) see the DINO and SimCLR gradient extractors in `src/fungivision/gradients`. 
Once you've created your gradients extractor create a configuration dataclass as follows, which defines every user-customizable parameter for your extractor.\n\n```python\nfrom dataclasses import dataclass, asdict\n\nfrom .extractor import CustomGradientsExtractor\n\n\n@dataclass\nclass CustomConfig:\n    temperature: float = 1\n\n    def get_extractor(self, base_params: dict) -\u003e CustomGradientsExtractor:\n        # Create an instance of your feature extractor by merging the given\n        # base parameters (which are common to all extractors) and your custom\n        # parameters defined in this dataclass.\n        params = base_params | asdict(self)\n\n        return CustomGradientsExtractor(**params)\n```\n\nYou can then use your gradients extractor with `FUNGIWrapper`!\n\n```python\nfungi = FUNGIWrapper(\n    model=model,\n    target_layer=\"blocks.11.attn.proj\",\n    device=device,\n    use_fp16=True,\n    extractor_configs=[\n        CustomConfig(temperature=0.07)\n    ]\n)\n```\n\n## Related Repositories\n\nThe goal of this repository is providing an easy to use library for extracting FUNGI features from a vision transformer backbone. To reproduce the results shown in the paper please check out [this repository](https://github.com/WalterSimoncini/no-train-all-gain).\n\n## Reference\n\nIf you found our work useful please cite us as follows:\n\n```\n@inproceedings{simoncini2024fungi,\n    title={No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations},\n    author={Walter Simoncini and Spyros Gidaris and Andrei Bursuc and Yuki M. 
Asano},\n    booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},\n    year={2024},\n    url={https://openreview.net/forum?id=PRBsEz8rnV}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwaltersimoncini%2Ffungivision","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwaltersimoncini%2Ffungivision","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwaltersimoncini%2Ffungivision/lists"}