{"id":31211534,"url":"https://github.com/cyberagentailab/adtec","last_synced_at":"2026-02-13T07:32:16.147Z","repository":{"id":252750996,"uuid":"811623263","full_name":"CyberAgentAILab/AdTEC","owner":"CyberAgentAILab","description":"The AdTEC dataset is designed to evaluate the quality of ad texts from multiple aspects, considering practical advertising operations.","archived":false,"fork":false,"pushed_at":"2025-06-26T17:17:21.000Z","size":11807,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-10T07:42:48.959Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://cyberagentailab.github.io/AdTEC/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CyberAgentAILab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-06-07T01:09:32.000Z","updated_at":"2025-06-26T17:17:24.000Z","dependencies_parsed_at":"2024-08-12T09:33:21.810Z","dependency_job_id":"21009756-28a8-4bb9-8f3f-abcccdf494f1","html_url":"https://github.com/CyberAgentAILab/AdTEC","commit_stats":null,"previous_names":["cyberagentailab/adtec"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CyberAgentAILab/AdTEC","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2FAdTEC","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2FAdTEC/tags","releases_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2FAdTEC/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2FAdTEC/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CyberAgentAILab","download_url":"https://codeload.github.com/CyberAgentAILab/AdTEC/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CyberAgentAILab%2FAdTEC/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276195627,"owners_count":25601152,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-21T02:00:07.055Z","response_time":72,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-21T05:30:48.376Z","updated_at":"2025-09-21T05:30:50.662Z","avatar_url":"https://github.com/CyberAgentAILab.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# AdTEC: A Unified Benchmark for Evaluating Text Quality in Search Engine Advertising\n\n[![arXiv](https://img.shields.io/badge/-2408.05906-grey?style=flat\u0026logo=arxiv)](https://arxiv.org/abs/2408.05906)\n[![Dataset](https://img.shields.io/badge/-Dataset-grey?style=flat\u0026logo=huggingface\u0026logoColor=white)](https://huggingface.co/datasets/cyberagent/AdTEC)\n[![Project 
Page](https://img.shields.io/badge/-Project_Page-grey?style=flat\u0026logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0IiBmaWxsPSJub25lIiBzdHJva2U9IiNmYWZhZmEiIHN0cm9rZS13aWR0aD0iMiIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIiBzdHJva2UtbGluZWpvaW49InJvdW5kIiBjbGFzcz0ibHVjaWRlIGx1Y2lkZS1nbG9iZS1pY29uIGx1Y2lkZS1nbG9iZSI+PGNpcmNsZSBjeD0iMTIiIGN5PSIxMiIgcj0iMTAiLz48cGF0aCBkPSJNMTIgMmExNC41IDE0LjUgMCAwIDAgMCAyMCAxNC41IDE0LjUgMCAwIDAgMC0yMCIvPjxwYXRoIGQ9Ik0yIDEyaDIwIi8+PC9zdmc+)](https://cyberagentailab.github.io/AdTEC/)\n[![Poster](https://img.shields.io/badge/-Poster-grey?style=flat\u0026logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0IiBmaWxsPSJub25lIiBzdHJva2U9IiNmYWZhZmEiIHN0cm9rZS13aWR0aD0iMiIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIiBzdHJva2UtbGluZWpvaW49InJvdW5kIiBjbGFzcz0ibHVjaWRlIGx1Y2lkZS1maWxlLWljb24gbHVjaWRlLWZpbGUiPjxwYXRoIGQ9Ik0xNSAySDZhMiAyIDAgMCAwLTIgMnYxNmEyIDIgMCAwIDAgMiAyaDEyYTIgMiAwIDAgMCAyLTJWN1oiLz48cGF0aCBkPSJNMTQgMnY0YTIgMiAwIDAgMCAyIDJoNCIvPjwvc3ZnPg==)](https://github.com/CyberAgentAILab/AdTEC/blob/main/materials/NAACL2025-poster.pdf)\n[![Slides](https://img.shields.io/badge/-Slides-grey?style=flat\u0026logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIyNCIgaGVpZ2h0PSIyNCIgdmlld0JveD0iMCAwIDI0IDI0IiBmaWxsPSJub25lIiBzdHJva2U9IiNmYWZhZmEiIHN0cm9rZS13aWR0aD0iMiIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIiBzdHJva2UtbGluZWpvaW49InJvdW5kIiBjbGFzcz0ibHVjaWRlIGx1Y2lkZS1pbWFnZXMtaWNvbiBsdWNpZGUtaW1hZ2VzIj48cGF0aCBkPSJNMTggMjJINGEyIDIgMCAwIDEtMi0yVjYiLz48cGF0aCBkPSJtMjIgMTMtMS4yOTYtMS4yOTZhMi40MSAyLjQxIDAgMCAwLTMuNDA4IDBMMTEgMTgiLz48Y2lyY2xlIGN4PSIxMiIgY3k9IjgiIHI9IjIiLz48cmVjdCB3aWR0aD0iMTYiIGhlaWdodD0iMTYiIHg9IjYiIHk9IjIiIHJ4PSIyIi8+PC9zdmc+)](https://github.com/CyberAgentAILab/AdTEC/blob/main/materials/NAACL2025-slides.pdf)\n[![Video](https://img.shields.io/badge/-Vid
eo-grey?style=flat\u0026logo=youtube)](https://youtu.be/3QmKidnlkiI)\n\n\n\nThe AdTEC dataset is designed to evaluate the quality of ad texts from multiple aspects, considering practical advertising operations.\n\n\u003e [!NOTE]\n\u003e We are excited to announce that our paper has been accepted to the main track of [NAACL 2025](https://aclanthology.org/2025.naacl-long.391/)! We are making the data from this research publicly available in this repository.\n\n## Experiments and Tasks Considered in the Paper\n\nThis dataset includes five tasks:\n- **Ad Acceptability**: Given a text, predict whether its overall quality is acceptable, using binary labels: `acceptable`/`unacceptable`.\n- **Ad Consistency**: Given an ad text and a landing page (LP) text, predict whether the two are consistent, using binary labels: `consistent`/`inconsistent`.\n- **Ad Performance Estimation**: Given ad texts, keywords, and industry type, predict the overall performance as a score ranging from 0 to 100.\n- **A3 Recognition**: Given a text, predict all applicable aspects of advertising appeals (A3). 
The full list of A3 labels can be found in [[Murakami et al., 2022](https://aclanthology.org/2022.naacl-industry.9/)].\n- **Ad Similarity**: Given a pair of ad texts, predict their similarity as a score ranging from 1 to 5.\n\n## Languages\nThe dataset contains Japanese text only.\n\n## Domain\nOnline advertising (search engine advertising).\n\n## Dataset Curators\nThe dataset was created by Peinan Zhang, Yusuke Sakai, Masato Mita, Hiroki Ouchi, and Taro Watanabe.\n\n## Dataset Structure\nEach task is provided in TSV format and split into train, dev, and test sets.\nThe number of instances for each task in each split is shown in the table below.\n\n| Task | Train | Dev | Test |\n|------|------:|----:|-----:|\n| Ad Acceptability | 13,265 | 970 | 980 |\n| Ad Consistency   | 10,635 | 945 | 970 |\n| Ad Performance Estimation | 125,087 | 965 | 965 |\n| A3 Recognition | 1,856 | 465 | 410 |\n| Ad Similarity | 4,980 | 623 | 629 |\n\nDetailed examples of each task are provided below.\n\n### Ad Acceptability\n\n```tsv\nlabel\ttitle\nacceptable\t最新のビデオキャプチャー年間ランキングをご紹介・会員ランクに応じてポイント獲得がお得に。\nunacceptable\t茨城県アパート以上\nacceptable\t気になる保障をカンタン見積り\n...\n```\n\n### Ad Consistency\n\n```tsv\nlabel\tlp_text\ttitle\nconsistent\t介護の転職サポート登録のページ。介護の求人・転職なら介護ワーカー。介護福祉士、ホームヘルパー、ケアマネ、社会福祉士などの求人情報がどこよりも詳しくわかる！あなたの理想の職場がきっと見つかる！\t介護職の求人なら【名古屋市】\ninconsistent\t手ぶらで体験レッスンが0円！上下ウェア・バスタオル・フェイスタオル・ヨガマット無料レンタルだから、手ぶらでOK！さらにお水（550ml×2本）をプレゼント！\tヨガ大分スタジオ「ロイブ」\nconsistent\tホーム コース ゲーム総合科 ゲーム総合科\tプロの講師陣があなたのなりゲームクリエイターになるにはを全力サポート！資料請求はこちらから\n...\n```\n\n### Ad Performance Estimation\n\n```tsv\nindustry_type\tkeyword\ttitle_1\ttitle_2\ttitle_3\tdescription_1\tdescription_2\tscore\nEC\thunter\t【公式】[MASK_2]（[MASK_8]）\t《MAX50% OFF》AUTUMN SALE\t人気＆注目アイテムはこちらから\t人気商品がMAX50%OFF！機能性抜群の[MASK_8]アイテムで秋を楽しく過ごそう\t機能的でスタイリッシュなデザイン。創業当初から愛される「機能美」をあなたへ。\t55.90851442\n金融\tかしつけ\t急ぎでお金の貸付受けたい方必見\t免許証だけで最短1時間貸付可能\t\t≪最短1h融資可能≫ローン貸付ランキング24hスマホ完結申込。融資可能か1秒診断\t\t79.01220606\n...\n```\n\n### A3 
Recognition\n\n```tsv\ntitle\tlabels\nTHE THOUSAND KYOTO（ザ・サウザンド京都）の最寄駅は京都駅。THE THOUSAND KYOTO。ブライダルフェア情報、おトクな割引特典、フォトギャラリー、口コミ、見積もり例が充実！結婚式場探しはハナユメ！\t商品/サービス特徴|オファー|アクセス\n鳥取ガス enetopia\t\nジャニーズ屈指のダンス力を誇る7人組グループ「Travis Japan」を全4週にわたって特集！\t商品/サービス特徴\n...\n```\n\n### Ad Similarity\n\n```tsv\ntext1\ttext2\tscore\n３月３１日まで実施中\t山口トヨタ／お客様大感謝祭\t2.67\n春得キャンペーン第1弾\t春得キャンペーン\t4.33\n...\n```\n\n## License\nThe AdTEC dataset is released under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license](./LICENSE).\n\n## Citation\n\n```latex\n@inproceedings{zhang2025adtec,\n  title={{AdTEC}: A Unified Benchmark for Evaluating Text Quality in Search Engine Advertising},\n  author={Peinan Zhang and Yusuke Sakai and Masato Mita and Hiroki Ouchi and Taro Watanabe},\n  booktitle={Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL)},\n  year={2025},\n  publisher={Association for Computational Linguistics},\n  eprint={2408.05906},\n  primaryClass={cs.CL},\n  url={https://arxiv.org/abs/2408.05906},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberagentailab%2Fadtec","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyberagentailab%2Fadtec","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyberagentailab%2Fadtec/lists"}