{"id":16269254,"url":"https://github.com/thinkwee/unikeyphrase","last_synced_at":"2025-10-27T21:20:02.864Z","repository":{"id":79907460,"uuid":"375190209","full_name":"thinkwee/UniKeyphrase","owner":"thinkwee","description":"code for the paper \"UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction\"","archived":false,"fork":false,"pushed_at":"2024-10-03T01:26:17.000Z","size":26804,"stargazers_count":23,"open_issues_count":0,"forks_count":7,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-12-29T13:24:09.797Z","etag":null,"topics":["acl2021","keyphrase-generation","natural-language-processing","tencent","unilm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thinkwee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-06-09T01:26:57.000Z","updated_at":"2024-11-29T02:10:06.000Z","dependencies_parsed_at":"2024-10-27T21:53:35.880Z","dependency_job_id":null,"html_url":"https://github.com/thinkwee/UniKeyphrase","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinkwee%2FUniKeyphrase","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinkwee%2FUniKeyphrase/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinkwee%2FUniKeyphrase/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thinkwee%2FUniKeyphrase/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thinkwee","download_url":"https://codeload.github.com/thinkwee/UniKeyphrase/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":232577671,"owners_count":18544845,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["acl2021","keyphrase-generation","natural-language-processing","tencent","unilm"],"created_at":"2024-10-10T18:07:44.155Z","updated_at":"2025-10-27T21:19:57.823Z","avatar_url":"https://github.com/thinkwee.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# UniKeyphrase\n-   code for the ACL 2021 findings paper \"UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction\"\n-   preprint paper: [arxiv](https://arxiv.org/pdf/2106.04847.pdf)\n-   [video presentation](https://aclanthology.org/2021.findings-acl.73.mp4)\n\n# Updates\n-   Update dataset preprocess and evaluate scripts. The datasets can be found in [https://github.com/memray/OpenNMT-kpg-release](https://github.com/memray/OpenNMT-kpg-release), we also give three test sets in dataset.zip\n-   Update v2 version of the paper, no model code changed. 
See [v2](https://arxiv.org/abs/2106.04847) for the new version, [v1](https://arxiv.org/abs/2106.04847v1) for the older version\n    -   fix an improper tokenization of the datasets, which may lead to high results on present F1@M and low results on absent F1@5 \u0026 F1@M\n    -   update some results of the baseline (SEG-NET) from the arXiv version to the newest ACL version\n    -   following previous work, pad the result when calculating F1@5\n    -   provide a more detailed ablation study (layer and module)\n    -   update the table of \"average numbers of predict keyphrases\". UniKeyphrase now predicts more accurately after fixing the tokenization problem\n    -   update the case study\n-   Update train and test scripts, see the scripts/ folder.\n\n# Environment:\n-   prepare for APEX\n```\n    . .bashrc\n    apt-get update\n    apt-get install -y vim wget ssh\n\n    PWD_DIR=$(pwd)\n    cd $(mktemp -d)\n    git clone -q https://github.com/NVIDIA/apex.git\n    cd apex\n    git reset --hard 1603407bf49c7fc3da74fceb6a6c7b47fece2ef8\n    python setup.py install --user --cuda_ext --cpp_ext\n    cd $PWD_DIR\n```\n-   other packages\n```\n    pip install --user tensorboardX six numpy tqdm path.py pandas scikit-learn lmdb pyarrow py-lz4framed methodtools py-rouge pyrouge nltk\n    python -c \"import nltk; nltk.download('punkt')\"\n    pip install -e git+https://github.com/Maluuba/nlg-eval.git#egg=nlg-eval\n```\n-   get the pretrained model from: https://unilm.blob.core.windows.net/ckpt/unilm1-base-cased.bin\n\n# Run\n-   see scripts\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthinkwee%2Funikeyphrase","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthinkwee%2Funikeyphrase","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthinkwee%2Funikeyphrase/lists"}