{"id":28676601,"url":"https://github.com/zjunlp/speech","last_synced_at":"2026-03-04T15:01:55.682Z","repository":{"id":169201132,"uuid":"634910442","full_name":"zjunlp/SPEECH","owner":"zjunlp","description":"[ACL 2023] SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres","archived":false,"fork":false,"pushed_at":"2023-12-22T12:04:59.000Z","size":46779,"stargazers_count":13,"open_issues_count":0,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-06-13T23:05:15.706Z","etag":null,"topics":["acl2023","energy-model","event-extraction","ie","information-extraction","machine-learning","natural-language-processing","nlp","physics","pytorch","speech","structure-prediction"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjunlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-05-01T14:23:56.000Z","updated_at":"2024-06-03T13:57:24.000Z","dependencies_parsed_at":"2023-12-25T13:21:51.259Z","dependency_job_id":null,"html_url":"https://github.com/zjunlp/SPEECH","commit_stats":{"total_commits":26,"total_committers":2,"mean_commits":13.0,"dds":0.07692307692307687,"last_synced_commit":"6ff21120cf1d1a75f9a2a3efd45a25f7c94c7404"},"previous_names":["zjunlp/speech"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zjunlp/SPEECH","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FSPEECH","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FSPEECH/tags","releases_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/zjunlp%2FSPEECH/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FSPEECH/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjunlp","download_url":"https://codeload.github.com/zjunlp/SPEECH/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FSPEECH/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30084685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-04T13:22:36.021Z","status":"ssl_error","status_checked_at":"2026-03-04T13:20:45.750Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acl2023","energy-model","event-extraction","ie","information-extraction","machine-learning","natural-language-processing","nlp","physics","pytorch","speech","structure-prediction"],"created_at":"2025-06-13T23:05:15.747Z","updated_at":"2026-03-04T15:01:55.662Z","avatar_url":"https://github.com/zjunlp.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SPEECH 🚀\n\n\u003cp align=\"center\"\u003e\n    \u003cfont size=6\u003e\u003cstrong\u003e💬SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres\u003c/strong\u003e\u003c/font\u003e\n\u003c/p\u003e\n\n\n🍎 The project is an official implementation for 
the [**SPEECH**](https://github.com/zjunlp/SPEECH) model and the repository for the [**OntoEvent-Doc**](https://github.com/zjunlp/SPEECH/tree/main/Datasets/OntoEvent-Doc.zip) dataset, both first proposed in the paper [💬SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres](https://aclanthology.org/2023.acl-long.21/), accepted to the ACL 2023 main conference. \n\n🖥️ We also release the [poster](https://github.com/zjunlp/SPEECH/tree/main/ACL2023@Poster_Speech.pdf) and [slides](https://github.com/zjunlp/SPEECH/tree/main/ACL2023@Slides_Speech.pdf) to help readers understand the paper. \n\n🤗 The implementations are based on [Hugging Face's Transformers](https://github.com/huggingface/transformers) and also draw on [OntoED](https://github.com/231sm/Reasoning_In_EE) \u0026 [DeepKE](https://github.com/zjunlp/DeepKE). \n\n🤗 The baselines are reproduced either with code adapted from [MAVEN's baselines](https://github.com/THU-KEG/MAVEN-dataset/) or with their official implementations. \n\n\n## Brief Introduction 📣\nSPEECH is proposed to address event-centric structured prediction with energy-based hyperspheres.  
\nSPEECH models complex dependencies among structured event components with energy-based modeling and represents event classes with simple but effective hyperspheres.\n\n\n## Project Structure 🔍\nThe data and code are organized as follows: \n\n```shell\nSPEECH\n├── README.md\n├── ACL2023@Poster_Speech.pdf\n├── ACL2023@Slides_Speech.pdf\n├── requirements.txt      # package requirements\n├── data_utils.py         # data processing\n├── speech.py             # main model (BERT backbone)\n├── speech_distilbert.py  # main model (DistilBERT backbone)\n├── speech_roberta.py     # toy model (RoBERTa backbone; not used in the paper, for reference only)\n├── run_speech.py         # entry point for running the model\n├── run_speech.sh         # bash script for running the model\n└── Datasets              # data\n    ├── MAVEN_ERE\n    │   ├── train.jsonl     # for training\n    │   ├── test.jsonl      # for testing\n    │   └── valid.jsonl     # for validation\n    ├── OntoEvent-Doc\n    │   ├── event_dict_label_data.json    # all event type labels\n    │   ├── event_dict_on_doc_train.json  # for training\n    │   ├── event_dict_on_doc_test.json   # for testing\n    │   └── event_dict_on_doc_valid.json  # for validation\n    └── README.md\n```\n\n## Requirements 📦\n\n- python==3.9.12\n\n- torch==1.13.0 \n\n- transformers==4.25.1\n\n- scikit-learn==1.2.2\n\n- torchmetrics==0.9.3\n\n- sentencepiece==0.1.97\n\n\n## Usage 🛠️\n\n**1. Project Preparation**:\n\nDownload this project, either as an archive or by running ```git clone https://github.com/zjunlp/SPEECH.git``` in your terminal. \n\n```\ncd [LOCAL_PROJECT_PATH]\n\ngit clone https://github.com/zjunlp/SPEECH.git\ncd SPEECH\n```\n\n\n**2. 
Data Preparation**: \n\nUnzip the [**MAVEN_ERE**](https://github.com/zjunlp/SPEECH/tree/main/Datasets/MAVEN_ERE.zip) and [**OntoEvent-Doc**](https://github.com/zjunlp/SPEECH/tree/main/Datasets/OntoEvent-Doc.zip) datasets stored in ```./Datasets```. \n\n```\ncd Datasets/\nunzip MAVEN_ERE.zip\nunzip OntoEvent-Doc.zip\ncd ..\n```\n\n\n**3. Running Preparation**:\n\nInstall all required packages, then adjust the parameters in the [```run_speech.sh```](https://github.com/zjunlp/SPEECH/tree/main/run_speech.sh) bash script. \n\n```\npip install -r requirements.txt\nvim run_speech.sh\n# set the parameters, then save and quit\n```\n**Hint**:  \n- Please refer to the ```main()``` function in [```run_speech.py```](https://github.com/zjunlp/SPEECH/tree/main/run_speech.py) for the detailed meaning of each parameter.\n- Pay attention to the candidate values of the ```--ere_task_type``` parameter:  \n    - \"doc_all\" is for the \"All Joint\" experiments in the paper. \n    - \"doc_joint\" is for the \"+joint\" experiments on each ERE subtask in the paper.\n    - \"doc_temporal\"/\"doc_causal\"/\"doc_sub\" are for the experiments on each individual ERE subtask only. \n- Note that the loss ratios λ1, λ2, and λ3 (for trigger classification, event classification, and event-relation extraction, respectively) depend on the task, so please make sure they are set correctly; see lines 56-61 in [```speech.py```](https://github.com/zjunlp/SPEECH/tree/main/speech.py) and [```speech_distilbert.py```](https://github.com/zjunlp/SPEECH/tree/main/speech_distilbert.py) for details. The loss ratio settings are also listed in Appendix B of our paper.  \n\n\n\n**4. Running Model**:\n\nRun [```./run_speech.sh```](https://github.com/zjunlp/SPEECH/tree/main/run_speech.sh) for *training*, *validation*, and *testing*.  \n\n```\n./run_speech.sh\n\n# Or run run_speech.py directly, entering the parameters manually in the terminal.\n\npython run_speech.py --para... 
\n```\n**Hint**:  \n- A folder of model checkpoints will be saved at the path you specify with ```--output_dir```, either in the bash script [```run_speech.sh```](https://github.com/zjunlp/SPEECH/tree/main/run_speech.sh) or on the command line. \n- We also release the [checkpoints](https://drive.google.com/drive/folders/18gFW_m02pgiGV2piktS308w41iBRZeN2?usp=sharing) for direct testing (omit ```--do_train``` from the parameters).\n\n\n## About the Datasets 🗃️\nWe briefly introduce the datasets in Section 4.1 and Appendix A of our paper. \n\n[**MAVEN_ERE**](https://github.com/zjunlp/SPEECH/tree/main/Datasets/MAVEN_ERE.zip) was proposed in a [paper](https://aclanthology.org/2022.emnlp-main.60) and released on [GitHub](https://github.com/THU-KEG/MAVEN-ERE).\n\n[**OntoEvent-Doc**](https://github.com/zjunlp/SPEECH/tree/main/Datasets/OntoEvent-Doc.zip), formatted at the document level, is derived from [OntoEvent](https://github.com/231sm/Reasoning_In_EE/tree/main/OntoEvent), which is formatted at the sentence level. \n\n### Statistics\nThe statistics of ***MAVEN-ERE*** and ***OntoEvent-Doc*** are shown below; the detailed data schema is described in ```./Datasets/README.md```. \n\n| Dataset | #Document | #Mention | #Temporal | #Causal | #Subevent |\n| :----------------- | ---------------- | ---------------- | ---------------- | ---------------- | ---------------- |\n| MAVEN-ERE | 4,480 | 112,276 | 1,216,217 | 57,992 | 15,841 |\n| OntoEvent-Doc | 4,115 | 60,546 | 5,914 | 14,155 | / |\n\n### Data Format\nFor the data schema of MAVEN-ERE, please refer to their [GitHub repository](https://github.com/THU-KEG/MAVEN-ERE). 
\nExperiments on MAVEN-ERE in our paper involve:  \n- 6 temporal relations: BEFORE, OVERLAP, CONTAINS, SIMULTANEOUS, BEGINS-ON, ENDS-ON\n- 2 causal relations: CAUSE, PRECONDITION \n- 1 subevent relation: subevent\\_relations\n\nExperiments on OntoEvent-Doc in our paper involve:  \n- 3 temporal relations: BEFORE, AFTER, EQUAL \n- 2 causal relations: CAUSE, CAUSEDBY\n\nFor both datasets, we also add an NA relation to indicate that there is no relation between an event mention pair. \n\n🍒 The OntoEvent-Doc dataset is stored in JSON format. Each *document* (identified by a *doc_id*, e.g., 95dd35ce7dd6d377c963447eef47c66c) contains a list of \"events\" and a dictionary of \"relations\", in the following format:\n\n```\n[a doc_id]:\n{\n    \"events\": [\n    {\n        'doc_id': '...', \n        'doc_title': 'XXX', \n        'sent_id': , \n        'event_mention': '......', \n        'event_mention_tokens': ['.', '.', '.', '.', '.', '.'], \n        'trigger': '...', \n        'trigger_pos': [, ], \n        'event_type': ''\n    },\n    {\n        'doc_id': '...', \n        'doc_title': 'XXX', \n        'sent_id': , \n        'event_mention': '......', \n        'event_mention_tokens': ['.', '.', '.', '.', '.', '.'], \n        'trigger': '...', \n        'trigger_pos': [, ], \n        'event_type': ''\n    },\n    ... \n    ],\n    \"relations\": { // each event-relation contains a list of 'sent_id' pairs.  \n        \"COSUPER\": [[,], [,], [,]], \n        \"SUBSUPER\": [], \n        \"SUPERSUB\": [], \n        \"CAUSE\": [[,], [,]], \n        \"BEFORE\": [[,], [,]], \n        \"AFTER\": [[,], [,]], \n        \"CAUSEDBY\": [[,], [,]], \n        \"EQUAL\": [[,], [,]]\n    }\n} \n```\n\n\n## How to Cite 📝\n📋 Thank you very much for your interest in our work. 
If you use or extend our work, please cite the following paper:\n\n```bibtex\n@inproceedings{ACL2023_SPEECH,\n  author      = {Shumin Deng and\n                 Shengyu Mao and\n                 Ningyu Zhang and\n                 Bryan Hooi},\n  title       = {SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres},\n  booktitle   = {{ACL} {(1)}},\n  publisher   = {Association for Computational Linguistics},\n  pages       = {351--363},\n  year        = {2023},\n  url         = {https://aclanthology.org/2023.acl-long.21/}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fspeech","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjunlp%2Fspeech","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fspeech/lists"}