https://github.com/ainsleyclark/nlp
NLP (Natural Language Processing) API via the pke (Python Keyphrase Extraction) engine.
- Host: GitHub
- URL: https://github.com/ainsleyclark/nlp
- Owner: ainsleyclark
- License: mit
- Created: 2022-05-03T11:38:27.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2022-05-04T08:32:20.000Z (over 2 years ago)
- Last Synced: 2024-11-15T22:45:14.545Z (2 months ago)
- Topics: computational-linguistics, information-retrieval, keyphrase, keyphrase-extraction, keyphrase-extractor, keyword-analysis, keyword-extraction, keywords, natural-language-processing, nlp, pke, python, python-keyword, seo
- Language: Python
- Homepage:
- Size: 48.8 KB
- Stars: 1
- Watchers: 4
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# nlp
NLP (Natural Language Processing) API via the pke (Python Keyphrase Extraction) engine to extract keywords and analyse
topics from text. This library ships with supervised models trained on the [SemEval-2010 dataset](http://aclweb.org/anthology/S10-1004).
[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)
## Authentication & URL
- The main production API URL is `https://nlp-vqyb5tu4fq-ew.a.run.app`. **Note**: this will likely change.
- The base path of the application is `/api/v1` and must be prepended to every request.
- A token must be passed with requests to this application, set as a header with the key `X-Auth-Token`.

## Endpoints
Below is a list of endpoints the API serves.
### Ping
➡️ GET `/api/v1/`
Heartbeat endpoint that doesn't require any authorisation.
**Example response:**
```json
{
"status": 200,
"message": "PONG",
"error": false,
"data": null
}
```
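
A minimal sketch of a heartbeat check, using only the Python standard library. The URL is the production address quoted in this README, which is expected to change; the helper names `is_pong` and `ping` are hypothetical and not part of this project.

```python
import json
import urllib.request

# Production URL quoted in this README; it is expected to change.
PING_URL = "https://nlp-vqyb5tu4fq-ew.a.run.app/api/v1/"


def is_pong(raw_body):
    """Return True when a response body matches the heartbeat shape shown above."""
    payload = json.loads(raw_body)
    return (
        payload.get("status") == 200
        and payload.get("message") == "PONG"
        and not payload.get("error")
    )


def ping(url=PING_URL, timeout=10):
    """Call the heartbeat endpoint (no token needed) and check for a PONG reply."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return is_pong(response.read())
```

Keeping the response check in `is_pong` separate from the network call makes it easy to verify against a saved response body.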
### Extraction
➡️ POST `/api/v1/`
This endpoint extracts keywords from a given piece of text. The JSON body for the endpoint is described below. On
success, an array of objects is returned, each giving a keyword and its salience score.

| Key | Example Value | Default Value | Required | Notes |
|-----------|:----------------|:--------------|:---------|:------------------------------------------|
| language | `"en"` | en | ✅ | See below for available language keys |
| limit | `10` | 30 | ✅ | The number of keywords to extract. |
| text | `"My keywords"` | N/A | ❌ | The content to extract the keywords from |
| stopwords | `["exclude"]` | N/A | ❌ | Specific words to exclude |
| dirty | `["exclude"]` | N/A | ❌ | Words that contain a substring to exclude |

**Example response:**
```json
{
"status": 200,
"message": "Successfully obtained keywords.",
"error": false,
"data": [
{
"term": "seo",
"salience": 165.13790907034348
},
{
"term": "reddico",
"salience": 100.51872726020909
},
{
"term": "serp",
"salience": 28.719636360059738
},
{
"term": "brands",
"salience": 25.899545450074672
},
{
"term": "insights",
"salience": 23.97899999016428
},
{
"term": "unique technology",
"salience": 21.539727270044803
},
{
"term": "blackrock",
"salience": 21.539727270044803
},
{
"term": "reddico digital",
"salience": 21.539727270044803
},
{
"term": "technology",
"salience": 21.251797342284842
},
{
"term": "learn",
"salience": 20.368295910502656
},
{
"term": "agency",
"salience": 20.04992044286311
},
{
"term": "optimised",
"salience": 19.43192398051029
},
{
"term": "talent",
"salience": 18.539727270044803
},
{
"term": "company",
"salience": 18.16462612505645
},
{
"term": "team",
"salience": 16.725550003417045
}
]
}
```

## Languages
The available languages and their keys are listed below.
```json
{
"da": "danish",
"du": "dutch",
"en": "english",
"fi": "finnish",
"fr": "french",
"ge": "german",
"it": "italian",
"no": "norwegian",
"po": "portuguese",
"ro": "romanian",
"ru": "russian",
"sp": "spanish",
"sw": "swedish"
}
```

## Excluding Words
To exclude words from the extraction, pass either `stopwords` or `dirty` in the JSON body of the request; the
difference is explained below. If you notice a word occurring regularly, please add it
to `./exclude/stopwords.json` or `./exclude/dirty.json` and make a pull request.

### Stopwords
Stopwords are specific words to exclude from the analysis.
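
To make the contrast with dirty words (described in the next section) concrete, here is an illustrative sketch of the two exclusion modes. It is not the service's implementation: the function `filter_terms` is hypothetical, and it assumes that stopwords match a whole term exactly, that dirty words match as substrings, and that both comparisons ignore case.

```python
def filter_terms(terms, stopwords=(), dirty=()):
    """Drop terms that equal a stopword or contain a dirty substring.

    Assumptions (not confirmed by the project): whole-term matching for
    stopwords, substring matching for dirty words, case-insensitive.
    """
    stop = {word.lower() for word in stopwords}
    # Ignore empty strings, which would otherwise match every term.
    fragments = [word.lower() for word in dirty if word]
    kept = []
    for term in terms:
        lowered = term.lower()
        if lowered in stop:
            continue  # exact match: excluded as a stopword
        if any(fragment in lowered for fragment in fragments):
            continue  # substring match: excluded as dirty
        kept.append(term)
    return kept
```

Under these assumptions, the stopword `technology` removes only the term `technology`, while the dirty word `tech` also removes `unique technology`.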
### Dirty
Dirty words are matched as substrings: if a keyword contains a dirty word, it is excluded from the analysis.

## Implemented Models
This library currently implements the following keyphrase extraction models:
* Unsupervised models
  * Statistical models
    * FirstPhrases
    * TfIdf
    * KPMiner [(El-Beltagy and Rafea, 2010)](http://www.aclweb.org/anthology/S10-1041.pdf)
    * YAKE [(Campos et al., 2020)](https://doi.org/10.1016/j.ins.2019.09.013)
  * Graph-based models
    * TextRank [(Mihalcea and Tarau, 2004)](http://www.aclweb.org/anthology/W04-3252.pdf)
    * SingleRank [(Wan and Xiao, 2008)](http://www.aclweb.org/anthology/C08-1122.pdf)
    * TopicRank [(Bougouin et al., 2013)](http://aclweb.org/anthology/I13-1062.pdf)
    * TopicalPageRank [(Sterckx et al., 2015)](http://users.intec.ugent.be/cdvelder/papers/2015/sterckx2015wwwb.pdf)
    * PositionRank [(Florescu and Caragea, 2017)](http://www.aclweb.org/anthology/P17-1102.pdf)
    * MultipartiteRank [(Boudin, 2018)](https://arxiv.org/abs/1803.08721)
* Supervised models
  * Feature-based models
    * Kea [(Witten et al., 2005)](https://www.cs.waikato.ac.nz/ml/publications/2005/chap_Witten-et-al_Windows.pdf)

## Development
To get started with local development, follow the steps below.
### Setup
This library relies on `spacy` (>= 3.2.3) for text processing and
requires [models](https://spacy.io/usage/models) to be installed. To set up the dependencies of the project, run the
following setup script.

```bash
sudo chmod -R 777 ./bin
./bin/start.sh
```

### Token
Export the environment variable `NLP_TOKEN` to set the authorisation token used by the API. Subsequent requests
should send `X-Auth-Token` with the value of the exported token.

```bash
export NLP_TOKEN=mytoken
```

### Docker
A Dockerfile is included in this project, so you can run the API locally.
```bash
docker build -t nlp .
docker run -it -p 8080:8080 nlp
```
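
With the container running, a request to the extraction endpoint might look like the sketch below. It uses only the Python standard library and rests on several assumptions: the API is reachable at `localhost:8080` as in the `docker run` command above, the token `mytoken` from the Token section has been made available to the server, and the helper names `build_extraction_request` and `extract_keywords` are hypothetical.

```python
import json
import urllib.request

# Assumed local address, based on the port mapping in `docker run -p 8080:8080`.
BASE_URL = "http://localhost:8080/api/v1/"
# Assumed to match the value exported as NLP_TOKEN on the server side.
TOKEN = "mytoken"


def build_extraction_request(text, language="en", limit=10, stopwords=None, dirty=None):
    """Build (but do not send) a POST request for the extraction endpoint."""
    body = {"language": language, "limit": limit, "text": text}
    if stopwords:
        body["stopwords"] = stopwords
    if dirty:
        body["dirty"] = dirty
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": TOKEN},
        method="POST",
    )


def extract_keywords(text, **options):
    """Send the request and return the `data` list of term/salience objects."""
    request = build_extraction_request(text, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["data"]


# Example usage (requires the API to be running):
# for item in extract_keywords("Some text to analyse", limit=5):
#     print(item["term"], item["salience"])
```

Separating request construction from sending keeps the body and headers inspectable without a running server.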