{"id":15036851,"url":"https://github.com/apple/ml-interspeech2022-phi_rtn","last_synced_at":"2025-08-02T00:07:55.170Z","repository":{"id":65978502,"uuid":"520601600","full_name":"apple/ml-interspeech2022-phi_rtn","owner":"apple","description":"Repository accompanying the Interspeech 2022 publication titled \"Space-Efficient Representation of Entity-centric Query Language Models\" by Van Gysel et al.","archived":false,"fork":false,"pushed_at":"2022-09-08T13:45:30.000Z","size":34721,"stargazers_count":13,"open_issues_count":0,"forks_count":2,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-01-30T07:33:10.909Z","etag":null,"topics":["language-modeling","machine-learning","speech-recognition","virtual-assistants"],"latest_commit_sha":null,"homepage":"https://machinelearning.apple.com/research/space-efficient-representation","language":null,"has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apple.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-08-02T18:06:00.000Z","updated_at":"2024-10-29T10:43:56.000Z","dependencies_parsed_at":"2023-02-19T19:30:59.590Z","dependency_job_id":null,"html_url":"https://github.com/apple/ml-interspeech2022-phi_rtn","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-interspeech2022-phi_rtn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-interspeech2022-phi_rtn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-interspeech2022-phi_rtn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-interspeech2022-phi_rtn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apple","download_url":"https://codeload.github.com/apple/ml-interspeech2022-phi_rtn/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237224861,"owners_count":19275098,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["language-modeling","machine-learning","speech-recognition","virtual-assistants"],"created_at":"2024-09-24T20:32:31.462Z","updated_at":"2025-02-05T01:31:36.678Z","avatar_url":"https://github.com/apple.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Space-Efficient Representation of Entity-centric Query Language Models\n\nThis repository hosts the data released as part of the Interspeech 2022 publication titled \"*Space-Efficient Representation of Entity-centric Query Language Models*\" by Van Gysel et al. The data consists of a two-level grammar of weighted templates with entity slots, and a weighted list of entities that fill that slot.\n\nThe [templates](data/templates.csv.gz) represent use-case queries where a user instructs a virtual assistant to navigate, or interact with, a catalog of audible content (e.g., songs, artists, playlists, podcasts, etc.) with a weight that represents use-case importance. The [entities](data/entities.csv.gz) are a list of media entities extracted from a large-scale media catalog around December 2021 with a weight that represents the entity's popularity according to a popular online streaming service.\n\n## Getting started\n\nThe templates and entities are stored as CSV files, compressed using GZIP. The following Python snippet allows you to parse the data:\n\n\timport gzip\n\timport csv\n\timport os\n\t\n\tdata_dir = os.path.join(os.path.realpath(os.path.dirname(__file__)), 'data')\n\tassert os.path.isdir(data_dir)\n\t\n\ttemplates_path = os.path.join(data_dir, 'templates.csv.gz')\n\tassert os.path.isfile(templates_path)\n\t\n\tentities_path = os.path.join(data_dir, 'entities.csv.gz')\n\tassert os.path.isfile(entities_path)\n\t\n\t\n\tdef load_entries(path):\n\t    with gzip.open(path, 'rt') as f:\n\t        reader = csv.DictReader(f)\n\t\n\t        yield from reader\n\t\n\t\n\t# Print the first 10 templates.\n\tprint('Top-10 templates:')\n\tfor _, entry in zip(range(10), load_entries(templates_path)):\n\t    print(entry)\n\t\n\tprint()\n\t\n\t# Print the first 10 entities.\n\tprint('Top-10 entities:')\n\tfor _, entry in zip(range(10), load_entries(entities_path)):\n\t    print(entry)\n\t\n\tprint()\n\nWhen executed, this prints the following output:\n\n\tTop-10 templates:\n\t{'unnormalized_prior': '57637551.0', 'text': 'hey Siri play \u003cENTITY\u003e'}\n\t{'unnormalized_prior': '39276474.0', 'text': 'play \u003cENTITY\u003e'}\n\t{'unnormalized_prior': '3447328.0', 'text': 'Siri play \u003cENTITY\u003e'}\n\t{'unnormalized_prior': '1938007.0', 'text': 'hey Siri play the song \u003cENTITY\u003e'}\n\t{'unnormalized_prior': '1880284.0', 'text': 'hey Siri \u003cENTITY\u003e'}\n\t{'unnormalized_prior': '1866158.0', 'text': '\u003cENTITY\u003e'}\n\t{'unnormalized_prior': '1446139.0', 'text': 'play the song \u003cENTITY\u003e'}\n\t{'unnormalized_prior': '1413427.0', 'text': 'hey Siri play \u003cENTITY\u003e music'}\n\t{'unnormalized_prior': '1265844.0', 'text': 'hey Siri who sings \u003cENTITY\u003e'}\n\t{'unnormalized_prior': '998550.0', 'text': 'hey Siri play some \u003cENTITY\u003e'}\n\t\n\tTop-10 entities:\n\t{'unnormalized_prior': '667.5385373950351', 'text': 'hip hop rap'}\n\t{'unnormalized_prior': '489.17729342016963', 'text': 'Pop'}\n\t{'unnormalized_prior': '379.43119497564226', 'text': 'R\u0026B'}\n\t{'unnormalized_prior': '362.88422257786624', 'text': 'alternative'}\n\t{'unnormalized_prior': '334.26272050577774', 'text': 'rock'}\n\t{'unnormalized_prior': '326.4236190113619', 'text': 'country'}\n\t{'unnormalized_prior': '234.22980267171658', 'text': 'Holiday music'}\n\t{'unnormalized_prior': '214.52684952882836', 'text': 'soundtracks'}\n\t{'unnormalized_prior': '194.4645101369402', 'text': 'hard rock'}\n\t{'unnormalized_prior': '180.5184652443179', 'text': 'dance'}\n\n## Citation\n\nIf you use the data hosted in this repository within your scientific publication, please refer to our [Interspeech 2022](https://machinelearning.apple.com/research/space-efficient-representation) paper:\n\n```\n@inproceedings{VanGysel2022phirtn,\n  title={Space-Efficient Representation of Entity-centric Query Language Models},\n  author={Van Gysel, Christophe and Hannemann, Mirko and Pusateri, Ernie and Oualil, Youssef and Oparin, Ilya},\n  booktitle={Interspeech},\n  year={2022},\n}\n```\n\n## License\n\nThe content of this repository is licensed under the [Apple Sample Code License](LICENSE).","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-interspeech2022-phi_rtn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapple%2Fml-interspeech2022-phi_rtn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-interspeech2022-phi_rtn/lists"}