https://github.com/supercoderhawk/deep-keyphrase

A set of seq2seq-based keyphrase generation models, including CopyRNN, CopyCNN and CopyTransformer

copynet keyphrase-extraction keyphrase-generation keyword-extraction pytorch seq2seq


Host: GitHub
URL: https://github.com/supercoderhawk/deep-keyphrase
Owner: supercoderhawk
Created: 2019-11-05T08:52:48.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2022-02-07T07:02:26.000Z (over 4 years ago)
Last Synced: 2025-08-01T01:34:51.409Z (12 months ago)
Topics: copynet, keyphrase-extraction, keyphrase-generation, keyword-extraction, pytorch, seq2seq
Language: Python
Homepage:
Size: 161 KB
Stars: 50
Watchers: 2
Forks: 8
Open Issues: 1
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.rst
- License: LICENSE.txt


==============
deep-keyphrase
==============

Implementations of several keyphrase generation algorithms.

.. image:: https://img.shields.io/github/workflow/status/supercoderhawk/deep-keyphrase/ci.svg

.. image:: https://img.shields.io/pypi/v/deep-keyphrase.svg
    :target: https://pypi.org/project/deep-keyphrase

.. image:: https://img.shields.io/pypi/dm/deep-keyphrase.svg
    :target: https://pypi.org/project/deep-keyphrase

Description
===========

Implemented Paper
>>>>>>>>>>>>>>>>>>>>>

CopyRNN

`Deep Keyphrase Generation (Meng et al., 2017)`__

.. __: https://arxiv.org/abs/1704.06879

ToDo List
>>>>>>>>>>>>>>>

CopyCNN

CopyTransformer

Usage
============

Required files (4 files in total)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

1. vocab_file: one word per line, without an index column::

    this
    paper
    proposes

2. training, validation and test files (one file each)
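
A vocabulary file in this layout can be produced from tokenized documents. The following is a minimal sketch, not the project's own preprocessing: the function names ``build_vocab`` and ``write_vocab``, the frequency ordering and the ``max_size`` cut-off are all assumptions made for illustration, and any special tokens the models may expect are not handled here.

.. code-block:: python

    from collections import Counter
    from typing import Iterable, List


    def build_vocab(token_lists: Iterable[List[str]], max_size: int = 50000) -> List[str]:
        """Return the most frequent words, most frequent first (hypothetical helper)."""
        counts = Counter(token for tokens in token_lists for token in tokens)
        return [word for word, _ in counts.most_common(max_size)]


    def write_vocab(words: Iterable[str], path: str) -> None:
        """Write one word per line, with no index column."""
        with open(path, "w", encoding="utf-8") as handle:
            for word in words:
                handle.write(word + "\n")

For example, ``write_vocab(build_vocab(docs), "vocab.txt")`` writes the file, where ``docs`` is any iterable of token lists and the output path is a placeholder.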

Data format for training, validation and test
""""""""""""""""""""""""""""""""""""""""""""""""""

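JSON Lines format: every line is one dict with ``tokens`` and ``keyphrases`` keys. The example below is wrapped for readability; in the file each record occupies a single line::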

    {'tokens': ['this', 'paper', 'proposes', 'using', 'virtual', 'reality', 'to', 'enhance', 'the', 'perception', 'of', 'actions', 'by', 'distant', 'users', 'on', 'a', 'shared', 'application', '.', 'here', ',', 'distance', 'may', 'refer', 'either', 'to', 'space', '(', 'e.g.', 'in', 'a', 'remote', 'synchronous', 'collaboration', ')', 'or', 'time', '(', 'e.g.', 'during', 'playback', 'of', 'recorded', 'actions', ')', '.', 'our', 'approach', 'consists', 'in', 'immersing', 'the', 'application', 'in', 'a', 'virtual', 'inhabited', '3d', 'space', 'and', 'mimicking', 'user', 'actions', 'by', 'animating', 'avatars', '.', 'we', 'illustrate', 'this', 'approach', 'with', 'two', 'applications', ',', 'the', 'one', 'for', 'remote', 'collaboration', 'on', 'a', 'shared', 'application', 'and', 'the', 'other', 'to', 'playback', 'recorded', 'sequences', 'of', 'user', 'actions', '.', 'we', 'suggest', 'this', 'could', 'be', 'a', 'low', 'cost', 'enhancement', 'for', 'telepresence', '.'] ,

    'keyphrases': [['telepresence'], ['animation'], ['avatars'], ['application', 'sharing'], ['collaborative', 'virtual', 'environments']]}
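
Records in this layout can be written and read back with the standard ``json`` module. This is a sketch under assumptions rather than the project's loader: the helper names ``check_record``, ``write_jsonl`` and ``read_jsonl`` are hypothetical, and it assumes each line is parsed as standard JSON, which the README does not state explicitly. Note that ``json.dumps`` emits double-quoted strings, whereas the example above is shown in Python dict notation.

.. code-block:: python

    import json
    from typing import Dict, Iterable, Iterator


    def check_record(record: Dict) -> None:
        """Raise ValueError unless the record has the two fields shown above."""
        tokens = record.get("tokens")
        keyphrases = record.get("keyphrases")
        if not isinstance(tokens, list) or not all(isinstance(t, str) for t in tokens):
            raise ValueError("'tokens' must be a list of strings")
        if not isinstance(keyphrases, list) or not all(
            isinstance(kp, list) and all(isinstance(t, str) for t in kp)
            for kp in keyphrases
        ):
            raise ValueError("'keyphrases' must be a list of token lists")


    def write_jsonl(records: Iterable[Dict], path: str) -> None:
        """Validate each record and write it as one JSON object per line."""
        with open(path, "w", encoding="utf-8") as handle:
            for record in records:
                check_record(record)
                handle.write(json.dumps(record, ensure_ascii=False) + "\n")


    def read_jsonl(path: str) -> Iterator[Dict]:
        """Yield one record per non-empty line."""
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield json.loads(line)

Validating before writing catches a common mistake: storing keyphrases as plain strings instead of token lists.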

Training
>>>>>>>>>>>>>>>

Download the kp20k_ dataset.

.. _kp20k: https://drive.google.com/uc?id=1ZTQEGZSq06kzlPlOv4yGjbUpoDrNxebR&export=download

::

    mkdir -p data/raw/kp20k_new
    # !! unzip the kp20k data and put the files into the folder above manually
    python -m nltk.downloader punkt
    bash scripts/prepare_kp20k.sh
    bash scripts/train_copyrnn_kp20k.sh

    # enter the experiment result dir; the suffix is the time the experiment started
    cd data/kp20k/copyrnn_kp20k_basic-20191212-080000
    # start the tensorboard service
    tensorboard --bind_all --logdir logs --port 6006
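
Because each run gets its own directory with a start-time suffix, the most recent run can be located programmatically. The sketch below assumes the suffix always follows the ``YYYYMMDD-HHMMSS`` pattern seen in the example directory name; the helpers ``run_start_time`` and ``latest_run`` and the default prefix are illustrative, not part of the project.

.. code-block:: python

    from datetime import datetime
    from pathlib import Path
    from typing import Optional


    def run_start_time(dirname: str) -> datetime:
        """Parse the start time from a name such as 'copyrnn_kp20k_basic-20191212-080000'."""
        # The last two dash-separated fields hold the date and the time.
        stamp = "-".join(dirname.rsplit("-", 2)[-2:])
        return datetime.strptime(stamp, "%Y%m%d-%H%M%S")


    def latest_run(root: str, prefix: str = "copyrnn_kp20k_basic") -> Optional[Path]:
        """Return the most recently started run directory under root, or None."""
        runs = []
        for path in Path(root).glob(prefix + "-*"):
            if not path.is_dir():
                continue
            try:
                runs.append((run_start_time(path.name), path))
            except ValueError:
                continue  # the suffix is not a timestamp; skip this entry
        return max(runs, key=lambda item: item[0])[1] if runs else None

Calling ``latest_run("data/kp20k")`` returns the directory whose ``logs`` subfolder can then be passed to ``tensorboard --logdir``.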

Notes
=============================

1. Compared with the original :code:`seq2seq-keyphrase-pytorch`:

   1. fixes implementation errors:

      1. the copy mechanism
      2. training and inference were inconsistent (training had no input feeding, while inference did)

   2. easier data preparation
   3. tensorboard support
   4. **faster beam search (6x faster on CPU and more than 10x faster on GPU)**
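
For readers unfamiliar with the copy mechanism mentioned above, the following pure-Python sketch shows the general CopyNet-style idea: generation scores over the vocabulary and copy scores over the source positions share one softmax, and the probabilities of identical words are summed. It is a simplified illustration, not this project's PyTorch implementation; the function name ``copy_distribution`` and the toy scores are assumptions.

.. code-block:: python

    import math
    from typing import Dict, List


    def copy_distribution(
        gen_scores: Dict[str, float],
        copy_scores: List[float],
        source_tokens: List[str],
    ) -> Dict[str, float]:
        """Merge generate-mode and copy-mode scores into one word distribution."""
        if len(copy_scores) != len(source_tokens):
            raise ValueError("one copy score is required per source token")
        all_scores = list(gen_scores.values()) + list(copy_scores)
        shift = max(all_scores)  # subtract the maximum for numerical stability
        norm = sum(math.exp(score - shift) for score in all_scores)
        # Generate mode: probability mass for every vocabulary word.
        probs = {word: math.exp(score - shift) / norm for word, score in gen_scores.items()}
        # Copy mode: add the mass of each source position to its word,
        # which lets out-of-vocabulary source words receive probability.
        for token, score in zip(source_tokens, copy_scores):
            probs[token] = probs.get(token, 0.0) + math.exp(score - shift) / norm
        return probs

With a vocabulary of ``the`` and ``<unk>`` and the source tokens ``the`` and ``telepresence``, the out-of-vocabulary word ``telepresence`` still receives non-zero probability through the copy scores.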
