https://github.com/deven96/nlp-tasks
Compilation of NLP tasks
- Host: GitHub
- URL: https://github.com/deven96/nlp-tasks
- Owner: deven96
- Created: 2020-02-29T00:08:10.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T03:42:26.000Z (almost 2 years ago)
- Last Synced: 2024-10-02T07:55:03.415Z (about 2 months ago)
- Language: Python
- Size: 42 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
README
# NLP Tasks
This is a compilation of natural language processing tasks such as sentence similarity and named entity recognition.
## Installation
```bash
pip install -r requirements.txt
```

## Data Sources
### Sentence Similarity
Download the [data](https://www.kaggle.com/quora/question-pairs-dataset), extract it, and store it as `data/sentsim/questions.csv`.
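Once the file is in place, reading it could look like the sketch below. This is not code from the repository: `QuestionPair` and `load_pairs` are hypothetical names, and the column names `question1`, `question2` and `is_duplicate` are assumed from the published Kaggle dataset, so check them against your copy of the CSV.

```python
import csv
from typing import Iterator, NamedTuple


class QuestionPair(NamedTuple):
    """One labelled row of the question-pairs CSV (hypothetical helper type)."""
    question1: str
    question2: str
    is_duplicate: bool


def load_pairs(path: str = "data/sentsim/questions.csv") -> Iterator[QuestionPair]:
    """Yield labelled question pairs; the column names are an assumption."""
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield QuestionPair(
                question1=row["question1"],
                question2=row["question2"],
                # The label is assumed to be stored as the string "0" or "1".
                is_duplicate=row["is_duplicate"] == "1",
            )
```

Because `load_pairs` is a generator, the file is streamed row by row rather than loaded into memory at once.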
- `sentsim/wnn.py` uses a wide neural network to produce a sigmoid score for sentence similarity.
The architecture is shown below.
![Architecture of WNN](assets/widenn.png)
To use it, pass both sentences to the script (NOTE: the current threshold is 0.15):
```bash
python sentsim/wnn.py "I am very happy" "I am ecstatic"
First sentence: I am very happy
Second sentence: I am ecstatic
Similar? : True
```

- `sentsim/doc2vec.py` uses [doc2vec](https://www.quora.com/What-is-doc2vec?share=1), an extension of word2vec for paragraph and document similarity.
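The threshold noted for `sentsim/wnn.py` turns a sigmoid score into a yes/no answer. The sketch below illustrates that step only. It is not the script's actual code: `is_similar` and `report` are hypothetical names, and treating scores at or above 0.15 as similar is an assumption about how the threshold is applied.

```python
THRESHOLD = 0.15  # value quoted in the README note


def is_similar(score: float, threshold: float = THRESHOLD) -> bool:
    """Assumed rule: a sigmoid score at or above the threshold counts as similar."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a sigmoid output in [0, 1]")
    return score >= threshold


def report(first: str, second: str, score: float) -> str:
    """Format a result in the same layout as the sample output shown earlier."""
    return (
        f"First sentence: {first}\n"
        f"Second sentence: {second}\n"
        f"Similar? : {is_similar(score)}"
    )
```

A low threshold such as 0.15 favours recall: more pairs are reported as similar, at the cost of more false positives.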
### Named Entity Recognition
Download the [data](https://gmb.let.rug.nl/releases/gmb-2.2.0.zip), extract it, and store the resulting folder as `gmb-2.2.0/` under `data/ner/`.
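Before running any script, it may help to confirm that both datasets sit where this README expects them. The sketch below is a hypothetical helper, not part of the repository. It only checks the two paths named in the sections above and assumes it is run from the repository root.

```python
from pathlib import Path
from typing import List

# Paths taken from the Data Sources sections of this README.
EXPECTED = {
    "sentence similarity": Path("data/sentsim/questions.csv"),
    "named entity recognition": Path("data/ner/gmb-2.2.0"),
}


def missing_data(root: Path = Path(".")) -> List[str]:
    """Return the tasks whose data is not at the documented location."""
    return [task for task, rel in EXPECTED.items() if not (root / rel).exists()]


if __name__ == "__main__":
    for task in missing_data():
        print(f"Missing data for {task}: expected {EXPECTED[task]}")
```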