{"id":31750501,"url":"https://github.com/emla2805/vision-transformer","last_synced_at":"2025-10-09T15:52:36.372Z","repository":{"id":47262387,"uuid":"302133243","full_name":"emla2805/vision-transformer","owner":"emla2805","description":"Tensorflow implementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)","archived":false,"fork":false,"pushed_at":"2020-10-18T21:07:01.000Z","size":336,"stargazers_count":196,"open_issues_count":3,"forks_count":63,"subscribers_count":3,"default_branch":"master","last_synced_at":"2023-11-07T18:12:47.820Z","etag":null,"topics":["computer-vision","tensorflow","transformer","vision-transformer"],"latest_commit_sha":null,"homepage":"https://openreview.net/pdf?id=YicbFdNTTy","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/emla2805.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-10-07T18:58:50.000Z","updated_at":"2023-09-18T12:47:42.000Z","dependencies_parsed_at":"2022-09-15T15:50:40.829Z","dependency_job_id":null,"html_url":"https://github.com/emla2805/vision-transformer","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/emla2805/vision-transformer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emla2805%2Fvision-transformer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emla2805%2Fvision-transformer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emla2805%2Fvision-transformer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emla2805
%2Fvision-transformer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/emla2805","download_url":"https://codeload.github.com/emla2805/vision-transformer/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/emla2805%2Fvision-transformer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279001644,"owners_count":26083147,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","tensorflow","transformer","vision-transformer"],"created_at":"2025-10-09T15:52:33.609Z","updated_at":"2025-10-09T15:52:36.361Z","avatar_url":"https://github.com/emla2805.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Vision Transformer (ViT)\n\nTensorflow implementation of the Vision Transformer (ViT) presented in \n[An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://openreview.net/pdf?id=YicbFdNTTy),\nwhere the authors show that Transformers applied directly to image patches and pre-trained on large datasets work really well on image classification.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"vit.png\" height=\"440px\"\u003e\n\u003c/p\u003e\n\n## Install dependencies\n\nCreate a Python 3 virtual environment and 
activate it:\n\n```bash\nvirtualenv -p python3 venv\nsource ./venv/bin/activate\n```\n\nNext, install the required dependencies:\n\n```bash\npip install -r requirements.txt\n```\n\n## Train model\n\nStart the model training by running:\n\n```bash\npython train.py --logdir path/to/log/dir\n```\n\nTo track metrics, start `TensorBoard`:\n\n```bash\ntensorboard --logdir path/to/log/dir\n```\n\nand then go to [localhost:6006](http://localhost:6006).\n\n## Citation\n\n```bibtex\n@inproceedings{\n    anonymous2021an,\n    title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},\n    author={Anonymous},\n    booktitle={Submitted to International Conference on Learning Representations},\n    year={2021},\n    url={https://openreview.net/forum?id=YicbFdNTTy},\n    note={under review}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femla2805%2Fvision-transformer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Femla2805%2Fvision-transformer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Femla2805%2Fvision-transformer/lists"}