{"id":30412037,"url":"https://github.com/facebookresearch/dinov3","last_synced_at":"2025-08-22T02:03:18.928Z","repository":{"id":309949883,"uuid":"1033896376","full_name":"facebookresearch/dinov3","owner":"facebookresearch","description":"Reference PyTorch implementation and models for DINOv3","archived":false,"fork":false,"pushed_at":"2025-08-14T17:58:28.000Z","size":9996,"stargazers_count":379,"open_issues_count":1,"forks_count":14,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-08-14T19:35:43.746Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/facebookresearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-07T14:11:34.000Z","updated_at":"2025-08-14T19:35:33.000Z","dependencies_parsed_at":"2025-08-14T19:46:55.292Z","dependency_job_id":null,"html_url":"https://github.com/facebookresearch/dinov3","commit_stats":null,"previous_names":["facebookresearch/dinov3"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/facebookresearch/dinov3","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Fdinov3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Fdinov3/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Fdinov3/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookrese
arch%2Fdinov3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/facebookresearch","download_url":"https://codeload.github.com/facebookresearch/dinov3/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/facebookresearch%2Fdinov3/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271574428,"owners_count":24783319,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-22T02:00:08.480Z","response_time":65,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-22T02:01:17.676Z","updated_at":"2025-08-22T02:03:18.912Z","avatar_url":"https://github.com/facebookresearch.png","language":"Jupyter Notebook","funding_links":[],"categories":["Repos","Jupyter Notebook","Paper List","对象检测_分割"],"sub_categories":["Seminal Papers","资源传输下载"],"readme":"🆕 [2025-08-14] :fire: DINOv3 backbones are now available in [Hugging Face Hub](https://huggingface.co/collections/facebook/dinov3-68924841bd6b561778e31009) and [supported](https://huggingface.co/docs/transformers/model_doc/dinov3) by the Hugging Face [Transformers](https://huggingface.co/docs/transformers/index) library\n\n# DINOv3 🦖🦖🦖\n\n**[Meta AI Research, FAIR](https://ai.meta.com/research/)**\n\nOriane Siméoni, Huy V. 
Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, \u003cbr/\u003e\nCijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, \u003cbr/\u003e\nFrancisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, \u003cbr/\u003e\nTimothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, \u003cbr/\u003e\nAndrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, \u003cbr/\u003e\nJulien Mairal, Hervé Jégou, Patrick Labatut, Piotr Bojanowski\n\n[ :scroll: [`Paper`](https://arxiv.org/abs/2508.10104)] [ :newspaper: [`Blog`](https://ai.meta.com/blog/dinov3-self-supervised-vision-model/)] [ :globe_with_meridians: [`Website`](https://ai.meta.com/dinov3/)] [ :book: [`BibTeX`](#citing-dinov3)]\n\nReference PyTorch implementation and models for DINOv3. For details, see the **[DINOv3](https://arxiv.org/abs/2508.10104)** paper.\n\n## Overview\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"1364\" height=\"1024\" alt=\"market\" src=\"https://github.com/user-attachments/assets/1411f491-988e-49cb-95ae-d03fe6e3c268\" /\u003e\n\n  \u003ci\u003e\u003cb\u003eHigh-resolution dense features.\u003c/b\u003e\u003cbr/\u003eWe visualize the cosine similarity maps obtained with DINOv3 output features\u003cbr/\u003e between the patches marked with a red cross and all other patches.\u003c/i\u003e\n\u003c/div\u003e\n\n\u003cbr/\u003e\n\nDINOv3 is an extended family of versatile vision foundation models that produce high-quality dense features and achieve outstanding performance on a wide range of vision tasks, outperforming the specialized state of the art across a broad range of settings without fine-tuning.\n\n## Pretrained models\n\n:information_source: Please follow the link provided below to request access to the model weights. Once your request is accepted, you will receive an e-mail with the complete list of URLs pointing to all the available model weights (both backbones and adapters). 
These URLs can then be used to either:\n- download the model or adapter weights to a local filesystem and point `torch.hub.load()` to these local weights via the `weights` or `backbone_weights` parameters, or\n- directly invoke `torch.hub.load()` to download and load a backbone or an adapter from its URL, again via the `weights` or `backbone_weights` parameters.\n\nSee the example code snippets below.\n\n:warning: Please use `wget` instead of a web browser to download the weights.\n\nViT models pretrained on web dataset (LVD-1689M):\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eModel\u003c/th\u003e\n      \u003cth\u003eParameters\u003c/th\u003e\n      \u003cth\u003ePretraining\u003cbr/\u003eDataset\u003c/th\u003e\n      \u003cth\u003eDownload\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-S/16 distilled\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e21M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-S+/16 distilled\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e29M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-B/16 distilled\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e86M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca 
href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-L/16 distilled\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e300M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-H+/16 distilled\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e840M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-7B/16\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e6,716M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nConvNeXt models pretrained on web dataset (LVD-1689M):\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eModel\u003c/th\u003e\n      \u003cth\u003eParameters\u003c/th\u003e\n      \u003cth\u003ePretraining\u003cbr/\u003eDataset\u003c/th\u003e\n      \u003cth\u003eDownload\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eConvNeXt Tiny\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e29M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca 
href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eConvNeXt Small\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e50M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eConvNeXt Base\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e89M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eConvNeXt Large\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e198M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nViT models pretrained on satellite dataset (SAT-493M):\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eModel\u003c/th\u003e\n      \u003cth\u003eParameters\u003c/th\u003e\n      \u003cth\u003ePretraining\u003cbr/\u003eDataset\u003c/th\u003e\n      \u003cth\u003eDownload\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-L/16 distilled\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e300M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eSAT-493M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca 
href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-7B/16\u003c/td\u003e\n      \u003ctd align=\"right\"\u003e6,716M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eSAT-493M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n\n### Pretrained backbones (via PyTorch [Hub](https://docs.pytorch.org/docs/stable/hub.html))\n\nPlease follow the instructions [here](https://pytorch.org/get-started/locally/) to install PyTorch (the only required dependency for loading the model). Installing PyTorch with CUDA support is strongly recommended.\n\n```python\nimport torch\n\nREPO_DIR = \u003cPATH/TO/A/LOCAL/DIRECTORY/WHERE/THE/DINOV3/REPO/WAS/CLONED\u003e\n\n# DINOv3 ViT models pretrained on web images\ndinov3_vits16 = torch.hub.load(REPO_DIR, 'dinov3_vits16', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\ndinov3_vits16plus = torch.hub.load(REPO_DIR, 'dinov3_vits16plus', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\ndinov3_vitb16 = torch.hub.load(REPO_DIR, 'dinov3_vitb16', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\ndinov3_vitl16 = torch.hub.load(REPO_DIR, 'dinov3_vitl16', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\ndinov3_vith16plus = torch.hub.load(REPO_DIR, 'dinov3_vith16plus', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\ndinov3_vit7b16 = torch.hub.load(REPO_DIR, 'dinov3_vit7b16', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\n\n# DINOv3 ConvNeXt models pretrained on web images\ndinov3_convnext_tiny = torch.hub.load(REPO_DIR, 'dinov3_convnext_tiny', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\ndinov3_convnext_small = torch.hub.load(REPO_DIR, 
'dinov3_convnext_small', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\ndinov3_convnext_base = torch.hub.load(REPO_DIR, 'dinov3_convnext_base', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\ndinov3_convnext_large = torch.hub.load(REPO_DIR, 'dinov3_convnext_large', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\n\n# DINOv3 ViT models pretrained on satellite imagery\n# (same hub entry points as above, loaded with the SAT-493M weights)\ndinov3_vitl16_sat = torch.hub.load(REPO_DIR, 'dinov3_vitl16', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\ndinov3_vit7b16_sat = torch.hub.load(REPO_DIR, 'dinov3_vit7b16', source='local', weights=\u003cCHECKPOINT/URL/OR/PATH\u003e)\n```\n\n### Pretrained backbones (via Hugging Face [Transformers](https://huggingface.co/docs/transformers/))\n\nAll the backbones are available in the [DINOv3](https://huggingface.co/collections/facebook/dinov3-68924841bd6b561778e31009) collection on the Hugging Face Hub and are supported by the Hugging Face [Transformers](https://huggingface.co/docs/transformers/index) library. 
Please refer to the corresponding documentation for details; below is a short example that demonstrates how to obtain an image embedding with either the `pipeline` API or the `AutoModel` class.\n\n```python\nfrom transformers import pipeline\nfrom transformers.image_utils import load_image\n\nurl = \"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg\"\nimage = load_image(url)\n\nfeature_extractor = pipeline(\n    model=\"facebook/dinov3-convnext-tiny-pretrain-lvd1689m\",\n    task=\"image-feature-extraction\",\n)\nfeatures = feature_extractor(image)\n```\n\n```python\nimport torch\nfrom transformers import AutoImageProcessor, AutoModel\nfrom transformers.image_utils import load_image\n\nurl = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\nimage = load_image(url)\n\npretrained_model_name = \"facebook/dinov3-convnext-tiny-pretrain-lvd1689m\"\nprocessor = AutoImageProcessor.from_pretrained(pretrained_model_name)\nmodel = AutoModel.from_pretrained(\n    pretrained_model_name,\n    device_map=\"auto\",\n)\n\n# Preprocess the image and move the tensors to the model's device\ninputs = processor(images=image, return_tensors=\"pt\").to(model.device)\nwith torch.inference_mode():\n    outputs = model(**inputs)\n\n# One pooled embedding per input image\npooled_output = outputs.pooler_output\nprint(\"Pooled output shape:\", pooled_output.shape)\n```\n\nwhere the `model` argument and `pretrained_model_name` above can be any of:\n- `facebook/dinov3-vits16-pretrain-lvd1689m`\n- `facebook/dinov3-vits16plus-pretrain-lvd1689m`\n- `facebook/dinov3-vitb16-pretrain-lvd1689m`\n- `facebook/dinov3-vitl16-pretrain-lvd1689m`\n- `facebook/dinov3-vith16plus-pretrain-lvd1689m`\n- `facebook/dinov3-vit7b16-pretrain-lvd1689m`\n- `facebook/dinov3-convnext-base-pretrain-lvd1689m`\n- `facebook/dinov3-convnext-large-pretrain-lvd1689m`\n- `facebook/dinov3-convnext-small-pretrain-lvd1689m`\n- `facebook/dinov3-convnext-tiny-pretrain-lvd1689m`\n- `facebook/dinov3-vitl16-pretrain-sat493m`\n- `facebook/dinov3-vit7b16-pretrain-sat493m`\n\n### Image 
transforms\n\nFor models using the LVD-1689M weights (pretrained on web images), please use the following transform (the standard ImageNet evaluation transform):\n\n```python\nfrom torchvision import transforms\n\ndef make_transform(resize_size: int = 224):\n    # Convert a PIL image to a [0, 1] tensor, resize it to resize_size x resize_size, then normalize it\n    to_tensor = transforms.ToTensor()\n    resize = transforms.Resize((resize_size, resize_size), antialias=True)\n    normalize = transforms.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225),\n    )\n    return transforms.Compose([to_tensor, resize, normalize])\n```\n\nFor models using the SAT-493M weights (pretrained on satellite imagery), please use the following transform:\n\n```python\nfrom torchvision import transforms\n\ndef make_transform(resize_size: int = 224):\n    # Same pipeline as above, but normalized with statistics for satellite imagery\n    to_tensor = transforms.ToTensor()\n    resize = transforms.Resize((resize_size, resize_size), antialias=True)\n    normalize = transforms.Normalize(\n        mean=(0.430, 0.411, 0.296),\n        std=(0.213, 0.156, 0.143),\n    )\n    return transforms.Compose([to_tensor, resize, normalize])\n```\n\n### Pretrained heads - Image classification\n\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n      \u003cth\u003ePretraining\u003cbr/\u003eDataset\u003c/th\u003e\n      \u003cth\u003eHead\u003cbr/\u003eDataset\u003c/th\u003e\n      \u003cth\u003eDownload\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-7B/16\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eImageNet\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nThe (full) classifier model can be loaded via PyTorch Hub:\n\n```python\nimport torch\n\n# DINOv3 ViT-7B/16 backbone with the ImageNet classification head\ndinov3_vit7b16_lc 
= torch.hub.load(REPO_DIR, 'dinov3_vit7b16_lc', source=\"local\", weights=\u003cCLASSIFIER/CHECKPOINT/URL/OR/PATH\u003e, backbone_weights=\u003cBACKBONE/CHECKPOINT/URL/OR/PATH\u003e)\n```\n\n### Pretrained heads - Depther trained on SYNTHMIX dataset\n\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n      \u003cth\u003ePretraining\u003cbr/\u003eDataset\u003c/th\u003e\n      \u003cth\u003eHead\u003cbr/\u003eDataset\u003c/th\u003e\n      \u003cth\u003eDownload\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-7B/16\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eSYNTHMIX\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n```python\ndepther = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_dd', source=\"local\", weights=\u003cDEPTHER/CHECKPOINT/URL/OR/PATH\u003e, backbone_weights=\u003cBACKBONE/CHECKPOINT/URL/OR/PATH\u003e)\n```\n\nFull example of running the depther on an image:\n\n```python\nfrom PIL import Image\nimport torch\nfrom torchvision import transforms\nimport matplotlib.pyplot as plt\nfrom matplotlib import colormaps\n\ndef get_img():\n    import requests\n    url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n    image = Image.open(requests.get(url, stream=True).raw).convert(\"RGB\")\n    return image\n\ndef make_transform(resize_size: int = 768):\n    to_tensor = transforms.ToTensor()\n    resize = transforms.Resize((resize_size, resize_size), antialias=True)\n    normalize = transforms.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225),\n    )\n    return transforms.Compose([to_tensor, resize, 
normalize])\n\ndepther = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_dd', source=\"local\", weights=\u003cDEPTHER/CHECKPOINT/URL/OR/PATH\u003e, backbone_weights=\u003cBACKBONE/CHECKPOINT/URL/OR/PATH\u003e)\n\nimg_size = 1024\nimg = get_img()\ntransform = make_transform(img_size)\nwith torch.inference_mode():\n    with torch.autocast('cuda', dtype=torch.bfloat16):\n        batch_img = transform(img)[None]  # add a batch dimension\n        depths = depther(batch_img)\n\n# Show the input image next to the predicted depth map\nplt.figure(figsize=(12, 6))\nplt.subplot(121)\nplt.imshow(img)\nplt.axis(\"off\")\nplt.subplot(122)\nplt.imshow(depths[0, 0].float().cpu(), cmap=colormaps[\"Spectral\"])  # cast to float32 so matplotlib can plot it\nplt.axis(\"off\")\n```\n\n### Pretrained heads - Detector trained on COCO2017 dataset\n\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n      \u003cth\u003ePretraining\u003cbr/\u003eDataset\u003c/th\u003e\n      \u003cth\u003eHead\u003cbr/\u003eDataset\u003c/th\u003e\n      \u003cth\u003eDownload\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-7B/16\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eCOCO2017\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n```python\ndetector = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_de', source=\"local\", weights=\u003cDETECTOR/CHECKPOINT/URL/OR/PATH\u003e, backbone_weights=\u003cBACKBONE/CHECKPOINT/URL/OR/PATH\u003e)\n```\n\n### Pretrained heads - Segmentor trained on ADE20K dataset\n\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n      \u003cth\u003ePretraining\u003cbr/\u003eDataset\u003c/th\u003e\n      
\u003cth\u003eHead\u003cbr/\u003eDataset\u003c/th\u003e\n      \u003cth\u003eDownload\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-7B/16\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eLVD-1689M\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eADE20K\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n```python\nsegmentor = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_ms', source=\"local\", weights=\u003cSEGMENTOR/CHECKPOINT/URL/OR/PATH\u003e, backbone_weights=\u003cBACKBONE/CHECKPOINT/URL/OR/PATH\u003e)\n```\n\nFull example of running the segmentor on an image:\n\n```python\nimport sys\nsys.path.append(REPO_DIR)  # make the dinov3 package importable\n\nfrom PIL import Image\nimport torch\nfrom torchvision import transforms\nimport matplotlib.pyplot as plt\nfrom matplotlib import colormaps\nfrom functools import partial\nfrom dinov3.eval.segmentation.inference import make_inference\n\n\ndef get_img():\n    import requests\n    url = \"http://images.cocodataset.org/val2017/000000039769.jpg\"\n    image = Image.open(requests.get(url, stream=True).raw).convert(\"RGB\")\n    return image\n\ndef make_transform(resize_size: int = 768):\n    to_tensor = transforms.ToTensor()\n    resize = transforms.Resize((resize_size, resize_size), antialias=True)\n    normalize = transforms.Normalize(\n        mean=(0.485, 0.456, 0.406),\n        std=(0.229, 0.224, 0.225),\n    )\n    return transforms.Compose([to_tensor, resize, normalize])\n\nsegmentor = torch.hub.load(REPO_DIR, 'dinov3_vit7b16_ms', source=\"local\", weights=\u003cSEGMENTOR/CHECKPOINT/URL/OR/PATH\u003e, backbone_weights=\u003cBACKBONE/CHECKPOINT/URL/OR/PATH\u003e)\n\nimg_size = 896\nimg = get_img()\ntransform = make_transform(img_size)\nwith torch.inference_mode():\n    with 
torch.autocast('cuda', dtype=torch.bfloat16):\n        batch_img = transform(img)[None]  # add a batch dimension\n        pred_vit7b = segmentor(batch_img)  # raw predictions\n        # final per-pixel segmentation map over the 150 ADE20K classes\n        segmentation_map_vit7b = make_inference(\n            batch_img,\n            segmentor,\n            inference_mode=\"slide\",\n            decoder_head_type=\"m2f\",\n            rescale_to=(img.size[-1], img.size[-2]),  # PIL's img.size is (W, H), so this is (H, W)\n            n_output_channels=150,\n            crop_size=(img_size, img_size),\n            stride=(img_size, img_size),\n            output_activation=partial(torch.nn.functional.softmax, dim=1),\n        ).argmax(dim=1, keepdim=True)\n\n# Show the input image next to the predicted segmentation map\nplt.figure(figsize=(12, 6))\nplt.subplot(121)\nplt.imshow(img)\nplt.axis(\"off\")\nplt.subplot(122)\nplt.imshow(segmentation_map_vit7b[0, 0].cpu(), cmap=colormaps[\"Spectral\"])\nplt.axis(\"off\")\n```\n\n### Pretrained heads - Zero-shot tasks with `dino.txt`\n\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n      \u003cth\u003eDownload\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eViT-L/16 distilled\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\n        \u003ca href=\"https://ai.meta.com/resources/models-and-libraries/dinov3-downloads/\"\u003e[link]\u003c/a\u003e,\n        \u003ca href=\"https://dl.fbaipublicfiles.com/dinov3/thirdparty/bpe_simple_vocab_16e6.txt.gz\"\u003evocabulary\u003c/a\u003e,\n        \u003ca href=\"https://dl.fbaipublicfiles.com/dinov2/thirdparty/LICENSE\"\u003evocabulary license\u003c/a\u003e\n      \u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\nThe full `dino.txt` model and its tokenizer can be loaded via PyTorch Hub:\n\n```python\nimport torch\n\n# DINOv3 ViT-L/16 with dino.txt text alignment\ndinov3_vitl16_dinotxt_tet1280d20h24l, tokenizer = torch.hub.load(REPO_DIR, 'dinov3_vitl16_dinotxt_tet1280d20h24l', 
source=\"local\", weights=\u003cDINOTXT/CHECKPOINT/URL/OR/PATH\u003e, backbone_weights=\u003cBACKBONE/CHECKPOINT/URL/OR/PATH\u003e)\n```\n\n## Installation\n\nThe training and evaluation code requires PyTorch version \u003e= 2.7.1 as well as a few other third-party packages. Note that the code has only been tested with the specified versions and expects a Linux environment. To set up all the required dependencies for training and evaluation, please follow the instructions below:\n\n*[micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html)* **(Recommended)** - Clone the repository, then create and activate a `dinov3` conda environment using the provided environment definition:\n\n```shell\nmicromamba env create -f conda.yaml\nmicromamba activate dinov3\n```\n\n## Getting started\n\nSeveral notebooks are provided to help you get started with DINOv3:\n- [PCA of patch features](notebooks/pca.ipynb): display the PCA of DINOv3 patch features on a foreground object (rainbow visualizations from the paper) [[Run in Google Colab]](https://colab.research.google.com/github/facebookresearch/dinov3/blob/main/notebooks/pca.ipynb)\n- [Foreground segmentation](notebooks/foreground_segmentation.ipynb): train a linear foreground segmentation model based on DINOv3 features [[Run in Google Colab]](https://colab.research.google.com/github/facebookresearch/dinov3/blob/main/notebooks/foreground_segmentation.ipynb)\n- [Dense and sparse matching](notebooks/dense_sparse_matching.ipynb): match patches from objects on two different images based on DINOv3 features [[Run in Google Colab]](https://colab.research.google.com/github/facebookresearch/dinov3/blob/main/notebooks/dense_sparse_matching.ipynb)\n- [Segmentation tracking](notebooks/segmentation_tracking.ipynb): video segmentation tracking using a non-parametric method based on DINOv3 features [[Run in Google 
Colab]](https://colab.research.google.com/github/facebookresearch/dinov3/blob/main/notebooks/segmentation_tracking.ipynb)\n\n## Data preparation\n\n### ImageNet-1k\n\nThe root directory of the dataset should contain the following files:\n\n- `\u003cROOT\u003e/test/ILSVRC2012_test_00000001.JPEG`\n- `\u003cROOT\u003e/test/[...]`\n- `\u003cROOT\u003e/test/ILSVRC2012_test_00100000.JPEG`\n- `\u003cROOT\u003e/train/n01440764/n01440764_10026.JPEG`\n- `\u003cROOT\u003e/train/[...]`\n- `\u003cROOT\u003e/train/n15075141/n15075141_9993.JPEG`\n- `\u003cROOT\u003e/val/n01440764/ILSVRC2012_val_00000293.JPEG`\n- `\u003cROOT\u003e/val/[...]`\n- `\u003cROOT\u003e/val/n15075141/ILSVRC2012_val_00049174.JPEG`\n- `\u003cROOT\u003e/labels.txt`\n\nThe provided dataset implementation expects a few additional metadata files to be present under the extra directory:\n\n- `\u003cEXTRA\u003e/class-ids-TRAIN.npy`\n- `\u003cEXTRA\u003e/class-ids-VAL.npy`\n- `\u003cEXTRA\u003e/class-names-TRAIN.npy`\n- `\u003cEXTRA\u003e/class-names-VAL.npy`\n- `\u003cEXTRA\u003e/entries-TEST.npy`\n- `\u003cEXTRA\u003e/entries-TRAIN.npy`\n- `\u003cEXTRA\u003e/entries-VAL.npy`\n\nThese metadata files can be generated (once) with the following Python code:\n\n```python\nfrom dinov3.data.datasets import ImageNet\n\n# Write the metadata files for every split into \u003cEXTRA\u003e\nfor split in ImageNet.Split:\n    dataset = ImageNet(split=split, root=\"\u003cROOT\u003e\", extra=\"\u003cEXTRA\u003e\")\n    dataset.dump_extra()\n```\n\nNote that the root and extra directories do not have to be distinct.\n\n### ImageNet-22k\n\nPlease adapt the [dataset class](dinov3/data/datasets/image_net_22k.py) to match your local setup.\n\n\u003cbr /\u003e\n\n:warning: To execute the commands provided in the next sections for training and evaluation, the `dinov3` package should be included in the Python module search path, i.e. 
simply prefix the command to run with `PYTHONPATH=.`.\n\n## Training\n\n### Fast setup: training DINOv3 ViT-L/16 on ImageNet-1k\n\nRun DINOv3 pretraining on 4 H100-80GB nodes (32 GPUs) in a SLURM cluster environment with submitit:\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \\\n  --nodes 4 \\\n  --config-file dinov3/configs/train/vitl_im1k_lin834.yaml \\\n  --output-dir \u003cPATH/TO/OUTPUT/DIR\u003e \\\n  train.dataset_path=ImageNet22k:root=\u003cPATH/TO/DATASET\u003e:extra=\u003cPATH/TO/DATASET\u003e\n```\n\nTraining takes approximately 14 hours, and the resulting checkpoint should reach 82.0% on k-NN eval and 83.5% on linear eval.\n\nThe training code saves the teacher weights to the eval folder every 12500 iterations for evaluation.\n\n### Exact DINOv3 setup: training DINOv3 ViT-7B/16\n\nDINOv3 ViT-7B/16 is trained on a private dataset. Training involves three stages:\n- Pretraining\n- Gram anchoring\n- High-resolution adaptation\n\n#### Pretraining\n\nLaunch DINOv3 ViT-7B/16 pretraining on 32 nodes (256 GPUs) in a SLURM cluster environment with submitit:\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \\\n  --nodes 32 \\\n  --config-file dinov3/configs/train/dinov3_vit7b16_pretrain.yaml \\\n  --output-dir \u003cPATH/TO/OUTPUT/DIR\u003e \\\n  train.dataset_path=\u003cDATASET\u003e:root=\u003cPATH/TO/DATASET\u003e:extra=\u003cPATH/TO/DATASET\u003e\n```\n\n#### Gram anchoring\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \\\n  --nodes 32 \\\n  --config-file dinov3/configs/train/dinov3_vit7b16_gram_anchor.yaml \\\n  --output-dir \u003cPATH/TO/OUTPUT/DIR\u003e \\\n  train.dataset_path=\u003cDATASET\u003e:root=\u003cPATH/TO/DATASET\u003e:extra=\u003cPATH/TO/DATASET\u003e \\\n  gram.ckpt=\u003cPATH/TO/GRAM_TEACHER_FROM_PREVIOUS_STEP\u003e\n```\n\n#### High-resolution adaptation\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit 
dinov3/train/train.py \\\n  --nodes 32 \\\n  --config-file dinov3/configs/train/dinov3_vit7b16_high_res_adapt.yaml \\\n  --output-dir \u003cPATH/TO/OUTPUT/DIR\u003e \\\n  train.dataset_path=\u003cDATASET\u003e:root=\u003cPATH/TO/DATASET\u003e:extra=\u003cPATH/TO/DATASET\u003e \\\n  gram.ckpt=\u003cPATH/TO/TEACHER_FROM_GRAM\u003e \\\n  student.resume_from_teacher_chkpt=\u003cPATH/TO/TEACHER_FROM_GRAM\u003e\n```\n\n## Multi-distillation\n\n### Test setup\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/train/train.py \\\n  --nodes 1 \\\n  --config-file dinov3/configs/train/multi_distillation_test.yaml \\\n  --output-dir \u003cPATH/TO/OUTPUT/DIR\u003e \\\n  --multi-distillation \\\n  train.dataset_path=\u003cDATASET\u003e:root=\u003cPATH/TO/DATASET\u003e:extra=\u003cPATH/TO/DATASET\u003e\n```\n\n## Evaluation\n\nThe training code regularly saves the teacher weights. To evaluate the model, run one of the following evaluations on a single node:\n\n### Logistic regression classification on ImageNet-1k\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/eval/log_regression.py \\\n  model.config_file=\u003cPATH/TO/OUTPUT/DIR\u003e/config.yaml \\\n  model.pretrained_weights=\u003cPATH/TO/OUTPUT/DIR\u003e/teacher_checkpoint.pth \\\n  output_dir=\u003cPATH/TO/OUTPUT/DIR\u003e \\\n  train.dataset=ImageNet:split=TRAIN:root=\u003cPATH/TO/DATASET\u003e:extra=\u003cPATH/TO/DATASET\u003e \\\n  eval.test_dataset=ImageNet:split=VAL:root=\u003cPATH/TO/DATASET\u003e:extra=\u003cPATH/TO/DATASET\u003e\n```\n\n### k-NN classification on ImageNet-1k\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/eval/knn.py \\\n  model.config_file=\u003cPATH/TO/OUTPUT/DIR\u003e/config.yaml \\\n  model.pretrained_weights=\u003cPATH/TO/OUTPUT/DIR\u003e/teacher_checkpoint.pth \\\n  output_dir=\u003cPATH/TO/OUTPUT/DIR\u003e \\\n  train.dataset=ImageNet:split=TRAIN:root=\u003cPATH/TO/DATASET\u003e:extra=\u003cPATH/TO/DATASET\u003e \\\n  
eval.test_dataset=ImageNet:split=VAL:root=\u003cPATH/TO/DATASET\u003e:extra=\u003cPATH/TO/DATASET\u003e\n```\n\n### Linear classification with data augmentation on ImageNet-1k\n\n```shell\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/eval/linear.py \\\n  model.config_file=\u003cPATH/TO/OUTPUT/DIR\u003e/config.yaml \\\n  model.pretrained_weights=\u003cPATH/TO/OUTPUT/DIR\u003e/teacher_checkpoint.pth \\\n  output_dir=\u003cPATH/TO/OUTPUT/DIR\u003e \\\n  train.dataset=ImageNet:split=TRAIN:root=\u003cPATH/TO/DATASET\u003e:extra=\u003cPATH/TO/DATASET\u003e \\\n  train.val_dataset=ImageNet:split=VAL:root=\u003cPATH/TO/DATASET\u003e:extra=\u003cPATH/TO/DATASET\u003e\n```\n\n### Text alignment on DINOv3 using dino.txt\n\nText alignment can be done following the `dino.txt` method, a.k.a. [DINOv2 Meets Text](https://arxiv.org/abs/2412.16334).\n\n```shell\n# An example config for text alignment is provided at dinov3/eval/text/configs/dinov3_vitl_text.yaml\nPYTHONPATH=${PWD} python -m dinov3.run.submit dinov3/eval/text/train_dinotxt.py \\\n  --nodes 4 \\\n  trainer_config_file=\"\u003cPATH/TO/DINOv3/TEXT/CONFIG\u003e\" \\\n  output-dir=\u003cPATH/TO/OUTPUT/DIR\u003e\n```\n\nThe command above trains text alignment on 4 nodes with 8 GPUs each (32 GPUs in total).\nNote that the text alignment model in the DINOv3 paper was trained on a private dataset; the example config in `dinov3/eval/text/configs/dinov3_vitl_text.yaml` uses the `CocoCaptions` dataset for illustration purposes only.\nPlease adapt the provided `CocoCaptions` dataset class as needed; the dataset can be found [here](https://www.kaggle.com/datasets/nikhil7280/coco-image-caption).\n\n## License\n\nDINOv3 code and model weights are released under the DINOv3 License. 
See [LICENSE.md](LICENSE.md) for additional details.\n\n## Contributing\n\nSee [contributing](CONTRIBUTING.md) and the [code of conduct](CODE_OF_CONDUCT.md).\n\n## Citing DINOv3\n\nIf you find this repository useful, please consider giving a star :star: and a citation :t-rex::\n\n```\n@misc{simeoni2025dinov3,\n  title={{DINOv3}},\n  author={Sim{\\'e}oni, Oriane and Vo, Huy V. and Seitzer, Maximilian and Baldassarre, Federico and Oquab, Maxime and Jose, Cijo and Khalidov, Vasil and Szafraniec, Marc and Yi, Seungeun and Ramamonjisoa, Micha{\\\"e}l and Massa, Francisco and Haziza, Daniel and Wehrstedt, Luca and Wang, Jianyuan and Darcet, Timoth{\\'e}e and Moutakanni, Th{\\'e}o and Sentana, Leonel and Roberts, Claire and Vedaldi, Andrea and Tolan, Jamie and Brandt, John and Couprie, Camille and Mairal, Julien and J{\\'e}gou, Herv{\\'e} and Labatut, Patrick and Bojanowski, Piotr},\n  year={2025},\n  eprint={2508.10104},\n  archivePrefix={arXiv},\n  primaryClass={cs.CV},\n  url={https://arxiv.org/abs/2508.10104},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Fdinov3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffacebookresearch%2Fdinov3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffacebookresearch%2Fdinov3/lists"}