{"id":43493155,"url":"https://github.com/amathislab/wildclip","last_synced_at":"2026-02-03T10:17:39.267Z","repository":{"id":209690019,"uuid":"724717332","full_name":"amathislab/wildclip","owner":"amathislab","description":"Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models","archived":false,"fork":false,"pushed_at":"2024-03-08T15:40:34.000Z","size":5522,"stargazers_count":12,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-04-09T11:57:20.830Z","etag":null,"topics":["behavior","camera-trap","clip","computer-vision","computervision","visual-language-models"],"latest_commit_sha":null,"homepage":"https://amathislab.github.io/wildclip/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amathislab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-11-28T16:45:50.000Z","updated_at":"2024-04-05T09:47:07.000Z","dependencies_parsed_at":"2024-02-05T09:26:48.615Z","dependency_job_id":"a4882c87-6901-4ca5-9194-e158d2d03700","html_url":"https://github.com/amathislab/wildclip","commit_stats":null,"previous_names":["amathislab/wildclip"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/amathislab/wildclip","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amathislab%2Fwildclip","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amathislab%2Fwildclip/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amathislab%2Fwildclip/releases","manifests_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amathislab%2Fwildclip/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amathislab","download_url":"https://codeload.github.com/amathislab/wildclip/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amathislab%2Fwildclip/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29041057,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-03T10:09:22.136Z","status":"ssl_error","status_checked_at":"2026-02-03T10:09:16.814Z","response_time":96,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["behavior","camera-trap","clip","computer-vision","computervision","visual-language-models"],"created_at":"2026-02-03T10:17:38.418Z","updated_at":"2026-02-03T10:17:39.251Z","avatar_url":"https://github.com/amathislab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Generic badge](https://img.shields.io/badge/Contributions-Welcome-brightgreen.svg)](CONTRIBUTING.md)\n\n\u003ch1 style=\"text-align: center;\"\u003eScene and animal attribute retrieval from camera trap data with domain-adapted vision-language models.\u003c/h1\u003e\n\n\u003ch2\u003eOverview\u003c/h2\u003e\n\nWildCLIP is an adapted CLIP model that allows to retrieve 
camera-trap events with natural language from the Snapshot Serengeti dataset. Our work was selected for an oral presentation at CVPR CV4animal in Vancouver in June 2023. An extended version is available on [bioRxiv](https://www.biorxiv.org/content/10.1101/2023.12.22.572990v1).\n\nThis project intends to demonstrate how vision-language models may assist the annotation process of camera-trap datasets. \n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://github.com/amathislab/wildclip/blob/main/resources/overview.png\" width=\"600\" alt=\"Overview\"\u003e\n\u003c/p\u003e\n\n\nWe actively seek to expand the training set. **Reach out if you want to collaborate (see information below)!** \n\nThis repository is currently primarily intended for machine-learning practitioners who wish to test, understand and contribute to WildCLIP.\n\nIf you find this code or the ideas presented in our work useful, please cite:\n\n[WildCLIP: Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models](https://www.biorxiv.org/content/10.1101/2023.12.22.572990v1) \nby Valentin Gabeff, Marc Russwurm, Devis Tuia and Alexander Mathis\n\n\u003ch2\u003eCode usage (setup, inference and training)\u003c/h2\u003e\n\nInterested in trying it out? \n\nFirst install the necessary dependencies and download the models/data. \n\n\u003ch3\u003eCode requirements (installation) \u003c/h3\u003e\n\nRequired Python packages are listed in `environment.yml`, which can be used to build a conda environment.\n\n```\nconda env create --file environment.yml\nconda activate wildclip\npip install clip@git+https://github.com/openai/CLIP.git\n```\n\n\u003ch3\u003eData requirements\u003c/h3\u003e\n\n\nTo try out WildCLIP, please download the model and the data available [here](https://zenodo.org/records/10479317).\n\nFirst clone this repository, then copy the annotation files into the folder `/data` and the images into the folder `/dataset`. 
(These paths can freely be changed in the command line and configuration files, respectively).\n\n\u003ch3\u003eModel inference\u003c/h3\u003e\n\nFrom the command line in the src folder:\n\n```\npython eval_clip.py \\\n-F config_files/wildclip_vitb16_t1.yml \\\n-I ../data/test_dataset_crops_single_animal_template_captions_T1T8T10.csv \\\n-C captions/serengeti_test_queries_T1T8T10_nattr_1.txt \\\n-M ../results/wildclip_vitb16_t1/wildclip_vitb16_t1_last_ckpt.pth\n```\n\n\u003ch3\u003eModel training\u003c/h3\u003e\n\nTo fine-tune CLIP with your own data:\n```\npython fine_tune_clip.py \\\n-F config_files/wildclip_vitb16_t1.yml \\\n-I ../data/train_dataset.csv \\\n-V ../data/val_dataset.csv\n```\n\nTo fine-tune CLIP with the VR-LwF loss:\n```\npython fine_tune_clip.py \\\n-F config_files/wildclip_vitb16_t1_lwf.yml \\\n-I ../data/train_dataset.csv \\\n-V ../data/val_dataset.csv \\\n--lwf_loss \\\n--path_replayed_vocabulary captions/serengeti_anchors.txt\n```\n\nTo fine-tune CLIP-adapter with few-shots from a pretrained CLIP model:\n```\npython fine_tune_clip.py \\\n-F config_files/wildclip_vitb16_t1_fs.yml \\\n-I ../data/few_shots_dataset.csv \\\n-M ../results/wildclip_vitb16_t1/wildclip_vitb16_t1_last_ckpt.pth \\\n--few_shots \\\n-K 1 \\\n--override_adapter\n```\n\n\u003ch2\u003eResults\u003c/h2\u003e\n\n_Qualitative performance of WildCLIP in comparison to CLIP on the Snapshot Serengeti test set_\n\n\u003cp align=\"left\"\u003e\n    \u003cimg src=\"https://github.com/amathislab/wildclip/blob/main/resources/quali_combinations.png\" width=\"700\" alt=\"Overview\"\u003e\n\u003c/p\u003e\n\nMore examples can be found in the Appendix of the associated publication.\n\nResults can be reproduced in the ```src/notebooks/ResultsEvaluation.ipynb``` notebook.\n\n\u003ch2\u003eData sources\u003c/h2\u003e\n\nThe complete Snapshot Serengeti dataset is available on [lila.science](https://lila.science/datasets/snapshot-serengeti).\n\nTheir data set is released under the [Community Data 
License Agreement (permissive variant)](https://cdla.dev/permissive-1-0/).\n\nThe data subset provided corresponds to the test data of Snapshot Serengeti containing single animals only and cropped according to MegaDetector output (MDv4 at the time of the study).\n\n\u003ch2\u003eTesting on your data\u003c/h2\u003e\n\nWe also provide code to easily test the original CLIP model given a folder of images and a list of text queries.\n\nThis outputs a prediction CSV file of cosine similarities between each image and the provided queries; the results can be visualized with ```src/notebooks/PredictionVisualization.ipynb```:\n```\npython predict_clip.py -I /path/to/image_folder -C path/to/queries.txt -O /path/to/output_folder --zero-shot-clip\n```\n\nSince ```src/eval_clip.py``` requires true labels of the test images to run, ```src/predict_clip.py``` can also be used to run predictions with a fine-tuned WildCLIP model on a new set of images:\n```\npython predict_clip.py -I /path/to/image_folder -C path/to/queries.txt -O /path/to/output_folder -M wildclip_vitb16_t1_lwf_last_ckpt.pth\n```\n\n\u003ch2\u003eFuture directions\u003c/h2\u003e\n\nThis project proposes the development of ecology-specific vision-language models to facilitate the annotation of wildlife data. Our work is a proof of principle.\n\nTo be more usable by the ecological community, WildCLIP would benefit from several improvements:\n- Extend WildCLIP to more geographical regions, species and behaviors. Currently, there are not many camera trap datasets with behavioral attributes. 
Reach out if you want to collaborate.\n- Improve fine-tuning of WildCLIP from few images and captions to adapt to a task of interest.\n- Improve open-vocabulary capabilities of WildCLIP.\n- Integrate WildCLIP and adaptation functions in a graphical interface to facilitate the annotation process of camera trap datasets.\n\n\u003ch2\u003eContributing\u003c/h2\u003e\n\nIf you are interested in contributing to one of the aforementioned points, or work on a similar project and wish to collaborate, please reach out to [ECEO](https://www.epfl.ch/labs/eceo) or to the [Mathis Group](https://www.mathislab.org) at EPFL.\n\nFor code-related contributions, suggestions or inquiries, please open a GitHub issue. Code is still under active development.\n\nIf you use this code in your research, please consider citing us:\n\n```\n@article{gabeff2023wildclip,\n  title={WildCLIP: Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models},\n  author={Gabeff, Valentin and Russwurm, Marc and Tuia, Devis and Mathis, Alexander},\n  journal={bioRxiv},\n  pages={2023--12},\n  year={2023},\n  publisher={Cold Spring Harbor Laboratory}\n}\n```\n\n\u003ch2\u003eCode acknowledgments\u003c/h2\u003e\n\nWe acknowledge the following code repositories that helped to create WildCLIP:  \n\nhttps://github.com/openai/CLIP  \nhttps://github.com/mlfoundations/open_clip  \nhttps://github.com/gaopengcuhk/CLIP-Adapter/  \nhttps://github.com/locuslab/FLYP/  \n\nand the following article for the WildCLIP-LwF variant:\nhttps://arxiv.org/pdf/2207.09248.pdf\n\nThank you! Sources are mentioned in the relevant code sections. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famathislab%2Fwildclip","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famathislab%2Fwildclip","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famathislab%2Fwildclip/lists"}