Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/talmago/spacy_ke
Keyword extraction with spaCy
- Host: GitHub
- URL: https://github.com/talmago/spacy_ke
- Owner: talmago
- License: gpl-3.0
- Created: 2020-09-15T12:07:28.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2021-11-08T23:18:19.000Z (about 3 years ago)
- Last Synced: 2024-10-01T05:41:28.145Z (about 1 month ago)
- Topics: keyword-extraction, keyword-extractor, positionrank, spacy, spacy-extension, spacy-nlp, spacy-pipeline, textrank, topicrank, yake
- Language: Python
- Homepage: https://pypi.org/project/spacy-ke/
- Size: 270 KB
- Stars: 31
- Watchers: 2
- Forks: 6
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# spacy_ke: Keyword Extraction with spaCy.
## ⏳ Installation
```bash
pip install spacy_ke
```

## 🚀 Quickstart
### Usage as a spaCy pipeline component
```python
import spacy
import spacy_ke

# load spacy model
nlp = spacy.load("en_core_web_sm")

# spacy v3.0.x factory.
# if you're using spacy v2.x.x switch to `nlp.add_pipe(spacy_ke.Yake(nlp))`
nlp.add_pipe("yake")

doc = nlp(
    "Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence "
    "concerned with the interactions between computers and human language, in particular how to program computers "
    "to process and analyze large amounts of natural language data. "
)

for keyword, score in doc._.extract_keywords(n=3):
    print(keyword, "-", score)
```

### Configure the pipeline component
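On spaCy v3, factory components are normally configured through `add_pipe`'s `config` argument rather than by instantiating the class. The sketch below mirrors the parameters documented in this README (`window`, `lemmatize`, `candidate_selection`), but the exact config schema exposed by the `yake` factory is an assumption here, not confirmed by this README:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# assumed config keys, mirroring the parameters shown below
nlp.add_pipe(
    "yake",
    config={
        "window": 2,                      # assumed key
        "lemmatize": False,               # assumed key
        "candidate_selection": "chunk",   # assumed key; "ngram" is the default
    },
)
```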
Normally you'd want to configure the keyword extraction pipeline according to its implementation.
```python
from spacy_ke import Yake  # needed for the v2-style call below

window: int = 2  # default
lemmatize: bool = False  # default
candidate_selection: str = "ngram"  # default, use "chunk" for noun phrase selection

nlp.add_pipe(
    Yake(
        nlp,
        window=window,
        lemmatize=lemmatize,
        candidate_selection=candidate_selection,
    )
)
```

And if you want to define a custom candidate selection, use the example below.
```python
from typing import Iterable
from spacy.tokens import Doc
from spacy_ke.util import registry, Candidate


@registry.candidate_selection.register("custom")
def custom_selection(doc: Doc, n=3) -> Iterable[Candidate]:
    ...

nlp.add_pipe(
    Yake(
        nlp,
        candidate_selection="custom"
    )
)
```
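To make the `candidate_selection` idea concrete, here is a minimal, dependency-free sketch of what an "ngram" selector conceptually does: slide a window over the tokens and yield every contiguous n-gram up to length `n`, in document order. The real `Candidate` type lives in `spacy_ke.util` and carries more information; plain tuples stand in for it here.

```python
from typing import Iterable, List, Tuple


def ngram_candidates(tokens: List[str], n: int = 3) -> Iterable[Tuple[str, ...]]:
    """Yield every contiguous n-gram of length 1..n, in document order."""
    for size in range(1, n + 1):
        for start in range(len(tokens) - size + 1):
            yield tuple(tokens[start:start + size])


tokens = ["natural", "language", "processing"]
print(list(ngram_candidates(tokens, n=2)))
# unigrams and bigrams: 3 + 2 = 5 candidates
```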
## Development
Set up virtualenv
```sh
$ python -m venv .venv
$ source .venv/bin/activate
```

Install dependencies
```sh
$ pip install -U pip
$ pip install -r requirements-dev.txt
```

Run unit tests
```sh
$ pytest
```

Run black (code formatter)
```sh
$ black spacy_ke/ --config=pyproject.toml
```

Release package (via `twine`)
```sh
$ python setup.py upload
```

## References
[1] A Review of Keyphrase Extraction
```
@article{DBLP:journals/corr/abs-1905-05044,
  author        = {Eirini Papagiannopoulou and Grigorios Tsoumakas},
  title         = {A Review of Keyphrase Extraction},
  journal       = {CoRR},
  volume        = {abs/1905.05044},
  year          = {2019},
  url           = {http://arxiv.org/abs/1905.05044},
  archivePrefix = {arXiv},
  eprint        = {1905.05044},
  timestamp     = {Tue, 28 May 2019 12:48:08 +0200},
  biburl        = {https://dblp.org/rec/journals/corr/abs-1905-05044.bib},
  bibsource     = {dblp computer science bibliography, https://dblp.org}
}
```

[2] [pke](https://github.com/boudinfl/pke): an open source python-based keyphrase extraction toolkit.
```
@InProceedings{boudin:2016:COLINGDEMO,
  author    = {Boudin, Florian},
  title     = {pke: an open source python-based keyphrase extraction toolkit},
  booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},
  month     = {December},
  year      = {2016},
  address   = {Osaka, Japan},
  pages     = {69--73},
  url       = {http://aclweb.org/anthology/C16-2015}
}
```