{"id":15037028,"url":"https://github.com/apple/ml-aim","last_synced_at":"2025-05-14T21:10:58.030Z","repository":{"id":217611686,"uuid":"742567082","full_name":"apple/ml-aim","owner":"apple","description":"This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects.","archived":false,"fork":false,"pushed_at":"2025-04-23T09:07:07.000Z","size":816,"stargazers_count":1275,"open_issues_count":18,"forks_count":59,"subscribers_count":26,"default_branch":"main","last_synced_at":"2025-05-03T20:02:42.369Z","etag":null,"topics":["jax","large-scale-vision-models","mlx","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apple.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-12T19:07:45.000Z","updated_at":"2025-05-02T18:44:00.000Z","dependencies_parsed_at":"2025-01-15T14:06:38.129Z","dependency_job_id":"e7c91481-0aef-41b9-9c58-f0f40a75e8d5","html_url":"https://github.com/apple/ml-aim","commit_stats":null,"previous_names":["apple/ml-aim"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-aim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-aim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-aim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-aim/manifests","owner_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/owners/apple","download_url":"https://codeload.github.com/apple/ml-aim/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254227631,"owners_count":22035671,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["jax","large-scale-vision-models","mlx","pytorch"],"created_at":"2024-09-24T20:33:06.379Z","updated_at":"2025-05-14T21:10:53.010Z","avatar_url":"https://github.com/apple.png","language":"Python","funding_links":[],"categories":["Apps, UI \u0026 Tooling"],"sub_categories":[],"readme":"# Autoregressive Pre-training of Large Vision Encoders\n\u003cdiv\u003e\n\u003ca href=\"https://arxiv.org/abs/2411.14402\" target=\"_blank\"\u003e\u003cimg alt=\"AIMv2 arXiv\" src=\"https://img.shields.io/badge/arXiv-AIMv2-red?logo=arxiv\"/\u003e\u003c/a\u003e\n\u003ca href=\"#aimv2-model-gallery\"\u003e\u003cimg alt=\"AIMv2 model gallery\" src=\"https://img.shields.io/badge/model_gallery-AIMv2-blue\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://arxiv.org/abs/2401.08541\" target=\"_blank\"\u003e\u003cimg alt=\"AIMv1 arXiv\" src=\"https://img.shields.io/badge/arXiv-AIMv1-red?logo=arxiv\"/\u003e\u003c/a\u003e\n\u003ca href=\"aim-v1/README.md#pre-trained-backbones\"\u003e \u003cimg alt=\"AIMv1 model gallery\" src=\"https://img.shields.io/badge/model_gallery-AIMv1-blue\"/\u003e\u003c/a\u003e\n\u003c/div\u003e\n\nThis repository is the entry point for all things AIM, a family of autoregressive models that push the boundaries of\nvisual and multimodal learning:\n\n- **AIMv2**: [`Multimodal Autoregressive Pre-training 
of Large Vision Encoders`](https://arxiv.org/abs/2411.14402)  [[`BibTeX`](#citation)]\n  \u003cbr\u003e\n  Enrico Fini*, Mustafa Shukor*, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju,\n  Victor Guilherme Turrisi da Costa, Louis Béthune, Zhe Gan, Alexander T Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang,\n  Joshua M. Susskind, and Alaaeldin El-Nouby*\n- **AIMv1**: [`Scalable Pre-training of Large Autoregressive Image Models`](https://arxiv.org/abs/2401.08541) [[`BibTeX`](#citation)]\u003cbr\u003e\n  Alaaeldin El-Nouby, Michal Klein, Shuangfei Zhai, Miguel Angel Bautista, Alexander Toshev, Vaishaal Shankar,\n  Joshua M Susskind, Armand Joulin.\n\n*: Equal technical contribution\n\nIf you're looking for the original AIM model (AIMv1), please refer to the README [here](aim-v1/README.md).\n\n---\n\n## Overview of AIMv2\nWe introduce the AIMv2 family of vision models pre-trained with a multimodal autoregressive objective.\nAIMv2 pre-training is simple, straightforward to train, and scales effectively. Some AIMv2 highlights include:\n\n1. Outperforms OpenAI CLIP and SigLIP on the majority of multimodal understanding benchmarks.\n2. Outperforms DINOv2 on open-vocabulary object detection and referring expression comprehension.\n3. 
Exhibits strong recognition performance with AIMv2-3B achieving *89.5% on ImageNet using a frozen trunk*.\n\n![gh_aimv2_dark](aim-v2/assets/aimv2_overview_dark.png#gh-dark-mode-only)\n![gh_aimv2_light](aim-v2/assets/aimv2_overview_light.png#gh-light-mode-only)\n\n## AIMv2 Model Gallery\n\u003cdiv\u003e\n\u003ca href=\"#using-pytorch\"\u003e\u003cimg alt=\"PyTorch\" src=\"https://img.shields.io/badge/PyTorch-EE4C2C?style=for-the-badge\u0026logo=pytorch\u0026logoColor=white\" height=\"25\"/\u003e\u003c/a\u003e\n\u003ca href=\"#using-jax\"\u003e\u003cimg alt=\"JAX\" src=\"https://raw.githubusercontent.com/jax-ml/jax/main/images/jax_logo_250px.png\" height=\"25\"/\u003e\u003c/a\u003e\n\u003ca href=\"#using-mlx\"\u003e\u003cimg alt=\"MLX\" src=\"aim-v2/assets/mlx_logo_light.png\" height=\"25\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://huggingface.co/collections/apple/aimv2-6720fe1558d94c7805f7688c\"\u003e\u003cimg alt=\"HuggingFace\" src=\"https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-md.svg\" height=\"25\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\nWe share with the community AIMv2 pre-trained checkpoints of varying capacities and pre-training resolutions:\n\n+ [[`AIMv2 with 224px`]](#aimv2-with-224px)\n+ [[`AIMv2 with 336px`]](#aimv2-with-336px)\n+ [[`AIMv2 with 448px`]](#aimv2-with-448px)\n+ [[`AIMv2 with Native Resolution`]](#aimv2-with-native-resolution)\n+ [[`AIMv2 distilled ViT-Large`]](#aimv2-distilled-vit-large) (*recommended for multimodal applications*)\n+ [[`Zero-shot Adapted AIMv2`]](#zero-shot-adapted-aimv2)\n\n## Installation\nPlease install PyTorch using the official [installation instructions](https://pytorch.org/get-started/locally/).\nAfterward, install the package as:\n```commandline\npip install 'git+https://github.com/apple/ml-aim.git#subdirectory=aim-v1'\npip install 'git+https://github.com/apple/ml-aim.git#subdirectory=aim-v2'\n```\nWe also offer [MLX](https://ml-explore.github.io/mlx/) backend support for research and 
experimentation on Apple silicon.\nTo enable MLX support, simply run:\n```commandline\npip install mlx\n```\n\n## Examples\n\n### Using PyTorch\n\n```python\nfrom PIL import Image\n\nfrom aim.v2.utils import load_pretrained\nfrom aim.v1.torch.data import val_transforms\n\nimg = Image.open(...)\nmodel = load_pretrained(\"aimv2-large-patch14-336\", backend=\"torch\")\ntransform = val_transforms(img_size=336)\n\ninp = transform(img).unsqueeze(0)\nfeatures = model(inp)\n```\n\n### Using MLX\n\u003cdetails\u003e\n\n```python\nfrom PIL import Image\nimport mlx.core as mx\n\nfrom aim.v2.utils import load_pretrained\nfrom aim.v1.torch.data import val_transforms\n\nimg = Image.open(...)\nmodel = load_pretrained(\"aimv2-large-patch14-336\", backend=\"mlx\")\ntransform = val_transforms(img_size=336)\n\ninp = transform(img).unsqueeze(0)\ninp = mx.array(inp.numpy())\nfeatures = model(inp)\n```\n\u003c/details\u003e\n\n### Using JAX\n\n\u003cdetails\u003e\n\n```python\nfrom PIL import Image\nimport jax.numpy as jnp\n\nfrom aim.v2.utils import load_pretrained\nfrom aim.v1.torch.data import val_transforms\n\nimg = Image.open(...)\nmodel, params = load_pretrained(\"aimv2-large-patch14-336\", backend=\"jax\")\ntransform = val_transforms(img_size=336)\n\ninp = transform(img).unsqueeze(0)\ninp = jnp.array(inp)\nfeatures = model.apply({\"params\": params}, inp)\n```\n\u003c/details\u003e\n\n## Pre-trained Checkpoints\nThe pre-trained models can be accessed via [HuggingFace Hub](https://huggingface.co/collections/apple/aimv2-6720fe1558d94c7805f7688c) as:\n```python\nfrom PIL import Image\nfrom transformers import AutoImageProcessor, AutoModel\n\nimage = Image.open(...)\nprocessor = AutoImageProcessor.from_pretrained(\"apple/aimv2-large-patch14-336\")\nmodel = AutoModel.from_pretrained(\"apple/aimv2-large-patch14-336\", trust_remote_code=True)\n\ninputs = processor(images=image, return_tensors=\"pt\")\noutputs = model(**inputs)\n```\n\n### AIMv2 with 224px\n\u003ctable style=\"margin: 
auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003emodel_id\u003c/th\u003e\n      \u003cth\u003e#params\u003c/th\u003e\n      \u003cth\u003eIN-1k\u003c/th\u003e\n      \u003cth\u003eHF Link\u003c/th\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody align=\"center\"\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-large-patch14-224\u003c/td\u003e\n      \u003ctd\u003e0.3B\u003c/td\u003e\n      \u003ctd\u003e86.6\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-224\" target=\"_blank\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-224/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-huge-patch14-224\u003c/td\u003e\n      \u003ctd\u003e0.6B\u003c/td\u003e\n      \u003ctd\u003e87.5\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-huge-patch14-224\" target=\"_blank\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-huge-patch14-224/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-1B-patch14-224\u003c/td\u003e\n      \u003ctd\u003e1.2B\u003c/td\u003e\n      \u003ctd\u003e88.1\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-1B-patch14-224\" target=\"_blank\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-1B-patch14-224/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-3B-patch14-224\u003c/td\u003e\n      \u003ctd\u003e2.7B\u003c/td\u003e\n      \u003ctd\u003e88.5\u003c/td\u003e\n      
\u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-3B-patch14-224\" target=\"_blank\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-3B-patch14-224/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n### AIMv2 with 336px\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003emodel_id\u003c/th\u003e\n      \u003cth\u003e#params\u003c/th\u003e\n      \u003cth\u003eIN-1k\u003c/th\u003e\n      \u003cth\u003eHF Link\u003c/th\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody align=\"center\"\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-large-patch14-336\u003c/td\u003e\n      \u003ctd\u003e0.3B\u003c/td\u003e\n      \u003ctd\u003e87.6\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-336\" target=\"_blank\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-336/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-huge-patch14-336\u003c/td\u003e\n      \u003ctd\u003e0.6B\u003c/td\u003e\n      \u003ctd\u003e88.2\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-huge-patch14-336\" target=\"_blank\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-huge-patch14-336/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-1B-patch14-336\u003c/td\u003e\n      \u003ctd\u003e1.2B\u003c/td\u003e\n      \u003ctd\u003e88.7\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-1B-patch14-336\" 
target=\"_blank\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-1B-patch14-336/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-3B-patch14-336\u003c/td\u003e\n      \u003ctd\u003e2.7B\u003c/td\u003e\n      \u003ctd\u003e89.2\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-3B-patch14-336\" target=\"_blank\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-3B-patch14-336/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n### AIMv2 with 448px\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003emodel_id\u003c/th\u003e\n      \u003cth\u003e#params\u003c/th\u003e\n      \u003cth\u003eIN-1k\u003c/th\u003e\n      \u003cth\u003eHF Link\u003c/th\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody align=\"center\"\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-large-patch14-448\u003c/td\u003e\n      \u003ctd\u003e0.3B\u003c/td\u003e\n      \u003ctd\u003e87.9\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-448\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-448/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-huge-patch14-448\u003c/td\u003e\n      \u003ctd\u003e0.6B\u003c/td\u003e\n      \u003ctd\u003e88.6\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-huge-patch14-448\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca 
href=\"https://huggingface.co/apple/aimv2-huge-patch14-448/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-1B-patch14-448\u003c/td\u003e\n      \u003ctd\u003e1.2B\u003c/td\u003e\n      \u003ctd\u003e89.0\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-1B-patch14-448\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-1B-patch14-448/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-3B-patch14-448\u003c/td\u003e\n      \u003ctd\u003e2.7B\u003c/td\u003e\n      \u003ctd\u003e89.5\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-3B-patch14-448\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-3B-patch14-448/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n### AIMv2 with Native Resolution\nWe additionally provide an AIMv2-L checkpoint that is finetuned to process a wide range of image resolutions and\naspect ratios. 
Regardless of the aspect ratio, the image is patchified (patch_size=14) and\n*a 2D sinusoidal positional embedding* is added to the linearly projected input patches.\n*This checkpoint supports a number of patches in the range [112, 4096]*.\n\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003emodel_id\u003c/th\u003e\n      \u003cth\u003e#params\u003c/th\u003e\n      \u003cth\u003eIN-1k\u003c/th\u003e\n      \u003cth\u003eHF Link\u003c/th\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody align=\"center\"\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-large-patch14-native\u003c/td\u003e\n      \u003ctd\u003e0.3B\u003c/td\u003e\n      \u003ctd\u003e87.3\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-native\" target=\"_blank\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-native/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n### AIMv2 distilled ViT-Large\nWe provide an AIMv2-L checkpoint distilled from AIMv2-3B that achieves remarkable performance on multimodal\nunderstanding benchmarks.\n\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003eModel\u003c/th\u003e\n      \u003cth\u003eVQAv2\u003c/th\u003e\n      \u003cth\u003eGQA\u003c/th\u003e\n      \u003cth\u003eOKVQA\u003c/th\u003e\n      \u003cth\u003eTextVQA\u003c/th\u003e\n      \u003cth\u003eDocVQA\u003c/th\u003e\n      \u003cth\u003eInfoVQA\u003c/th\u003e\n      \u003cth\u003eChartQA\u003c/th\u003e\n      \u003cth\u003eSciQA\u003c/th\u003e\n      \u003cth\u003eMMEp\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody align=\"center\"\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eAIMv2-L\u003c/td\u003e\n  
    \u003ctd\u003e80.2\u003c/td\u003e\n      \u003ctd\u003e72.6\u003c/td\u003e\n      \u003ctd\u003e60.9\u003c/td\u003e\n      \u003ctd\u003e53.9\u003c/td\u003e\n      \u003ctd\u003e26.8\u003c/td\u003e\n      \u003ctd\u003e22.4\u003c/td\u003e\n      \u003ctd\u003e20.3\u003c/td\u003e\n      \u003ctd\u003e74.5\u003c/td\u003e\n      \u003ctd\u003e1457\u003c/td\u003e\n     \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eAIMv2-L-distilled\u003c/td\u003e\n      \u003ctd\u003e81.1\u003c/td\u003e\n      \u003ctd\u003e73.0\u003c/td\u003e\n      \u003ctd\u003e61.4\u003c/td\u003e\n      \u003ctd\u003e53.5\u003c/td\u003e\n      \u003ctd\u003e29.2\u003c/td\u003e\n      \u003ctd\u003e23.3\u003c/td\u003e\n      \u003ctd\u003e24.0\u003c/td\u003e\n      \u003ctd\u003e76.3\u003c/td\u003e\n      \u003ctd\u003e1627\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003emodel_id\u003c/th\u003e\n      \u003cth\u003e#params\u003c/th\u003e\n      \u003cth\u003eRes.\u003c/th\u003e\n      \u003cth\u003eHF Link\u003c/th\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody align=\"center\"\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-large-patch14-224-distilled\u003c/td\u003e\n      \u003ctd\u003e0.3B\u003c/td\u003e\n      \u003ctd\u003e224px\u003c/td\u003e\n      \u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-224-distilled\" target=\"_blank\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-224-distilled/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eaimv2-large-patch14-336-distilled\u003c/td\u003e\n      \u003ctd\u003e0.3B\u003c/td\u003e\n      \u003ctd\u003e336px\u003c/td\u003e\n      
\u003ctd\u003e🤗\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-336-distilled\" target=\"_blank\"\u003elink\u003c/a\u003e\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-336-distilled/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n### Zero-shot Adapted AIMv2\nWe provide the AIMv2-L vision and text encoders after LiT tuning to enable zero-shot recognition.\n\n\u003ctable style=\"margin: auto\"\u003e\n  \u003cthead\u003e\n    \u003ctr\u003e\n      \u003cth\u003emodel\u003c/th\u003e\n      \u003cth\u003e#params\u003c/th\u003e\n      \u003cth\u003ezero-shot IN1-k\u003c/th\u003e\n      \u003cth\u003eBackbone\u003c/th\u003e\n    \u003c/tr\u003e\n  \u003c/thead\u003e\n  \u003ctbody align=\"center\"\u003e\n    \u003ctr\u003e\n      \u003ctd\u003eAIMv2-L\u003c/td\u003e\n      \u003ctd\u003e0.3B\u003c/td\u003e\n      \u003ctd\u003e77.0\u003c/td\u003e\n      \u003ctd\u003e\u003ca href=\"https://huggingface.co/apple/aimv2-large-patch14-224-lit/resolve/main/model.safetensors\"\u003elink\u003c/a\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n\u003c/table\u003e\n\n## Citation\nIf you find our work useful, please consider citing us as:\n\n### AIMv2 bibtex\n\n```bibtex\n@misc{fini2024multimodal,\n    title={Multimodal Autoregressive Pre-training of Large Vision Encoders},\n    author={Enrico Fini and Mustafa Shukor and Xiujun Li and Philipp Dufter and Michal Klein and David Haldimann and Sai Aitharaju and Victor Guilherme Turrisi da Costa and Louis Béthune and Zhe Gan and Alexander T Toshev and Marcin Eichner and Moin Nabi and Yinfei Yang and Joshua M. 
Susskind and Alaaeldin El-Nouby},\n    year={2024},\n    eprint={2411.14402},\n    archivePrefix={arXiv},\n    primaryClass={cs.CV}\n}\n```\n\n### AIMv1 bibtex\n\n```bibtex\n@InProceedings{pmlr-v235-el-nouby24a,\n  title     = {Scalable Pre-training of Large Autoregressive Image Models},\n  author    = {El-Nouby, Alaaeldin and Klein, Michal and Zhai, Shuangfei and Bautista, Miguel \\'{A}ngel and Shankar, Vaishaal and Toshev, Alexander T and Susskind, Joshua M. and Joulin, Armand},\n  booktitle = {Proceedings of the 41st International Conference on Machine Learning},\n  pages     = {12371--12384},\n  year      = {2024},\n}\n```\n\n## License\nPlease check out the repository [LICENSE](LICENSE) before using the provided code and models.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-aim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapple%2Fml-aim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-aim/lists"}