{"id":20046624,"url":"https://github.com/ncsoft/cap2qa","last_synced_at":"2026-02-12T12:39:14.854Z","repository":{"id":217228860,"uuid":"741762731","full_name":"ncsoft/cap2qa","owner":"ncsoft","description":"Official implementation of \"Visually Dehallucinative Instruction Generation\" (ICASSP 2024)","archived":false,"fork":false,"pushed_at":"2024-03-19T10:42:15.000Z","size":26460,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-02T07:49:41.502Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ncsoft.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-11T03:58:34.000Z","updated_at":"2024-04-18T03:16:42.000Z","dependencies_parsed_at":"2024-03-19T11:57:37.429Z","dependency_job_id":null,"html_url":"https://github.com/ncsoft/cap2qa","commit_stats":null,"previous_names":["ncsoft/cap2qa"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ncsoft/cap2qa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2Fcap2qa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2Fcap2qa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2Fcap2qa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2Fcap2qa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ncsoft","download_url":"https://codeload
.github.com/ncsoft/cap2qa/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ncsoft%2Fcap2qa/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260571715,"owners_count":23029934,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-13T11:25:18.254Z","updated_at":"2026-02-12T12:39:09.816Z","avatar_url":"https://github.com/ncsoft.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Visually Dehallucinative Instruction Generation\n\n**(CAP2QA) Visually Dehallucinative Instruction Generation**  [[paper](https://arxiv.org/abs/2402.08348)] \u003cbr\u003e\n[Sungguk Cha](https://sunggukcha.github.io), Jusung Lee, Younghyun Lee and Cheoljong Yang\n\nSee also, \n**(IDK) Visually Dehallucinative Instruction Generation: Know What You Don't Know** [[paper](https://arxiv.org/abs/2402.09717)] [[github](https://github.com/ncsoft/idk)] \u003cbr\u003e\n\n## CAP2QA\n### Image-aligned Sentence Level VQA Data\n\u003cimg src=\"images/fig1.png\"\u003e \u003cbr\u003e\n\u003cimg src=\"images/examples.png\"\u003e\n### Details\n| Dataset             | Avg. 
\\#word Question/Answer | \\#Image | \\#Question | Scalable | ImageAligned | Recognition | Description | Reasoning |\n|---------------------|-----------------------------|---------|------------|----------|--------------|-------------|--------------------------|-----------------------|\n| DAQUAR              | 11.5/1.1 (word)             | 1,449   | 12,468     | $\\times$ | $\\checkmark$ | $\\checkmark$| $\\times$                 | $\\times$              |\n| VQAv2               | 6.1/1.2 (word)              | 200k    | 1.1M       | $\\times$ | $\\checkmark$ | $\\checkmark$| $\\times$                 | $\\times$              |\n| OKVQA               | 8.1/1.3 (word)              | 14,031  | 14,055     | $\\times$ | $\\times$     | $\\checkmark$| $\\times$                 | $\\checkmark$          |\n| LLaVA               | 10.7/60.7 (sentence)        | 80,000  | 221,333    | $\\checkmark$| $\\times$   | $\\checkmark$| $\\checkmark$            | $\\checkmark$          |\n| **CAP2QA** (Ours)   | 7.2/5.4 (sentence)          | 122,906 | 873,631    | $\\checkmark$| $\\checkmark$ | $\\checkmark$| $\\checkmark$          | $\\checkmark$          |\n\nPrepare MSCOCO 2017 images. 
\nTrain/Val splits are preserved.\n\n## Citation\nIf you find CAP2QA useful for your research and applications, please cite using this BibTeX:\n```\n@inproceedings{cha2024visually,\n      title={Visually Dehallucinative Instruction Generation}, \n      author={Cha, Sungguk and Lee, Jusung and Lee, Younghyun and Yang, Cheoljong},\n      booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},\n      year={2024},\n}\n```\n\n## Licenses\nThis work, [instructions](/instructions), used the COCO-Caption dataset (CC BY-NC-ND license) as the caption source and ChatGPT (refer to the OpenAI policies, https://openai.com/policies).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncsoft%2Fcap2qa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fncsoft%2Fcap2qa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fncsoft%2Fcap2qa/lists"}