{"id":20161902,"url":"https://github.com/ailab-cvc/vl-gpt","last_synced_at":"2026-01-28T05:19:39.288Z","repository":{"id":212536978,"uuid":"728946471","full_name":"AILab-CVC/VL-GPT","owner":"AILab-CVC","description":"VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation","archived":false,"fork":false,"pushed_at":"2024-09-12T05:25:46.000Z","size":5230,"stargazers_count":85,"open_issues_count":2,"forks_count":2,"subscribers_count":19,"default_branch":"main","last_synced_at":"2025-03-03T02:43:51.788Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AILab-CVC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-08T03:37:34.000Z","updated_at":"2025-01-29T18:17:22.000Z","dependencies_parsed_at":"2023-12-14T20:42:14.063Z","dependency_job_id":"fa6d3fae-4146-4c5e-989c-1e019f22fcea","html_url":"https://github.com/AILab-CVC/VL-GPT","commit_stats":null,"previous_names":["ailab-cvc/vl-gpt"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AILab-CVC/VL-GPT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FVL-GPT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FVL-GPT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FVL-GPT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FVL-GPT/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/AILab-CVC","download_url":"https://codeload.github.com/AILab-CVC/VL-GPT/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AILab-CVC%2FVL-GPT/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28840088,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-28T02:10:51.810Z","status":"ssl_error","status_checked_at":"2026-01-28T02:10:50.806Z","response_time":57,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T00:21:41.049Z","updated_at":"2026-01-28T05:19:39.273Z","avatar_url":"https://github.com/AILab-CVC.png","language":null,"funding_links":[],"categories":["Fundamental MIM Methods"],"sub_categories":["MIM for Multi-Modality"],"readme":"# VL-GPT\n\nVL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation\n\n## Project Termination \nWe regret to inform you that the project (VL-GPT) has been terminated. Unfortunately, the authors Jinguo and Xiaohan left the company and did not manage to refactor the codebase before their departure. 
As a result, the source code and weights for this work cannot be released.\n\nHowever, the main contribution of this work, an image tokenizer with continuous embeddings and its application in a Large Multimodal Model, has also been adopted in another project within our team called [SEED-X](https://github.com/AILab-CVC/SEED-X), which has already been open-sourced. We recommend referring to the [SEED-X](https://github.com/AILab-CVC/SEED-X) project for insights and implementation details.\n\nWe sincerely apologize for not being able to release this work as an open-source project. Thank you for your understanding.\n\n## Introduction\n\n\u003cdiv align=\"center\"\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca href=\"https://scholar.google.com/citations?user=YfHg5lQAAAAJ\u0026hl=en\" target=\"_blank\"\u003eJinguo Zhu\u003c/a\u003e\u003csup\u003e1*\u003c/sup\u003e,\n    \u003c/span\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca href=\"https://dingxiaohan.xyz/\" target=\"_blank\"\u003eXiaohan Ding\u003c/a\u003e\u003csup\u003e2*\u003c/sup\u003e,\n    \u003c/span\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003c/span\u003e\n    \u003ca href=\"https://geyixiao.com/\" target=\"_blank\"\u003eYixiao Ge\u003c/a\u003e\u003csup\u003e2\u003c/sup\u003e,\n    \u003c/span\u003e\n     \u003cspan class=\"author-block\"\u003e\n    \u003c/span\u003e\n    \u003ca href=\"https://geyuying.github.io/\" target=\"_blank\"\u003eYuying Ge\u003c/a\u003e\u003csup\u003e2\u003c/sup\u003e,\n    \u003c/span\u003e\n    \u003c/br\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca target=\"_blank\"\u003eSijie Zhao\u003c/a\u003e\u003csup\u003e2\u003c/sup\u003e,\n    \u003c/span\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca href=\"https://hszhao.github.io/\" target=\"_blank\"\u003eHengshuang Zhao\u003c/a\u003e\u003csup\u003e3\u003c/sup\u003e,\n    \u003c/span\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca 
href=\"https://gr.xjtu.edu.cn/web/xhw\" target=\"_blank\"\u003eXiaohua Wang\u003c/a\u003e\u003csup\u003e1\u003c/sup\u003e,\n    \u003c/span\u003e\n    \u003cspan class=\"author-block\"\u003e\n    \u003ca href=\"https://scholar.google.com/citations?user=4oXBp9UAAAAJ\u0026hl=en\u0026oi=ao\" target=\"_blank\"\u003eYing Shan\u003c/a\u003e\u003csup\u003e2\u003c/sup\u003e\n    \u003c/span\u003e\n\n\u003c/div\u003e\n\n\n\n\u003cdiv align=\"center\"\u003e\n    \u003csup\u003e1\u003c/sup\u003e \u003ca  target='_blank'\u003eXi'an Jiaotong University\u003c/a\u003e\n    \u003csup\u003e2\u003c/sup\u003e \u003ca href='https://ai.tencent.com/' target='_blank'\u003eTencent AI Lab\u003c/a\u003e\n    \u003csup\u003e3\u003c/sup\u003e\n    \u003ca  target='_blank'\u003eThe University of Hong Kong\u003c/a\u003e\u0026emsp;\n    \u003c/br\u003e\n    \u003csup\u003e*\u003c/sup\u003e Equal Contribution\u0026emsp;\n\u003c/div\u003e\n\n\u003ca href=\"https://arxiv.org/abs/2312.09251\"\u003e\u003cimg src=\"https://img.shields.io/badge/Paper-PDF-orange\"\u003e\u003c/a\u003e \n\u003ca href=\"#LICENSE--citation\"\u003e\n  \u003cimg alt=\"License: Apache2.0\" src=\"https://img.shields.io/badge/LICENSE-Apache%202.0-blue.svg\"/\u003e\n\u003c/a\u003e\n\n\n\u003cp align=\"center\" width=\"100%\"\u003e\n\u003cimg src=\"assets/overview.png\"  width=\"100%\" height=\"60%\"\u003e\n\u003c/p\u003e\n\n\n\n* VL-GPT is a generative pre-trained transformer model for vision and language understanding and generation tasks, which can perceive and generate visual and linguistic data concurrently. By employing a straightforward auto-regressive objective, VL-GPT achieves unified pre-training for both image and text modalities.\n\n* We also propose an image tokenizer-detokenizer framework for the conversion between raw images and continuous visual embeddings, analogous to the role of BPE tokenization in language models.\n\n\n\n\n\n## License\nThis project is released under the Apache 2.0 license. 
Please see the [LICENSE](LICENSE) file for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Fvl-gpt","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Failab-cvc%2Fvl-gpt","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Failab-cvc%2Fvl-gpt/lists"}