https://github.com/deven96/nlp-tasks

Compilation of NLP tasks

Host: GitHub
URL: https://github.com/deven96/nlp-tasks
Owner: deven96
Created: 2020-02-29T00:08:10.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2022-12-08T03:42:26.000Z (over 2 years ago)
Last Synced: 2025-03-24T04:02:07.747Z (4 months ago)
Language: Python
Size: 42 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 7
Metadata Files:
- Readme: README.md

# NLP Tasks

This is a compilation of natural language processing tasks such as sentence similarity and named entity recognition.

## Installation

```bash
pip install -r requirements.txt
```

## Data Sources

### Sentence Similarity

Download the [data](https://www.kaggle.com/quora/question-pairs-dataset), extract it, and store the CSV at `data/sentsim/questions.csv`.
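Once the CSV is in place, reading the question pairs could look like the minimal sketch below. It assumes the column names `question1`, `question2`, and `is_duplicate` from the Kaggle dataset; the helper `load_question_pairs` is hypothetical and is not part of this repository. The in-memory sample row is made up for illustration only.

```python
import csv
import io


def load_question_pairs(handle):
    """Yield (question1, question2, is_duplicate) from an open CSV handle.

    Hypothetical helper: the column names are assumed from the Kaggle
    question-pairs dataset, not taken from this repository's code.
    """
    reader = csv.DictReader(handle)
    for row in reader:
        # is_duplicate is stored as the text "0" or "1"; convert it to a bool.
        yield row["question1"], row["question2"], row["is_duplicate"] == "1"


# Illustrative in-memory sample standing in for data/sentsim/questions.csv.
sample = io.StringIO(
    "id,qid1,qid2,question1,question2,is_duplicate\n"
    "0,1,2,How do I learn Python?,What is the best way to learn Python?,1\n"
)
pairs = list(load_question_pairs(sample))
```

For the real file, the same function can be given `open("data/sentsim/questions.csv", newline="", encoding="utf-8")` instead of the in-memory sample.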

- `sentsim/wnn.py` uses a wide neural network to output a sigmoid score for sentence similarity.

The architecture is shown below:

![Architecture of WNN](assets/widenn.png)

To use it, pass both sentences to the script (NOTE: the current threshold is 0.15):

```bash
python sentsim/wnn.py "I am very happy" "I am ecstatic"

First sentence: I am very happy

Second sentence: I am ecstatic

Similar? : True
```
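The thresholding step mentioned in the note above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `sentsim/wnn.py`: the function names are hypothetical, and whether the script compares with `>=` or `>` is not documented here.

```python
import math

THRESHOLD = 0.15  # value quoted in the note above


def sigmoid(x):
    """Map a raw model output (logit) to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def is_similar(score, threshold=THRESHOLD):
    """Return True when the sigmoid score reaches the threshold.

    Assumption: the comparison is inclusive (>=); the actual script may differ.
    """
    return score >= threshold
```

With a threshold this low, any pair scoring at least 0.15 is reported as similar, so raising the value makes the check stricter.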

- `sentsim/doc2vec.py` uses [doc2vec](https://www.quora.com/What-is-doc2vec?share=1), an extension of word2vec for paragraph and document similarity.

### Named Entity Recognition

Download the [data](https://gmb.let.rug.nl/releases/gmb-2.2.0.zip), extract it, and store it as a folder `gmb-2.2.0/` under `data/ner/`.
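To check that the corpus landed in the right place, a short sketch like the one below can list the annotation files under the folder. It assumes the GMB release stores per-document annotations in files named `en.tags`; both that file name and the helper `find_tag_files` are assumptions for illustration, not something this repository documents.

```python
from pathlib import Path


def find_tag_files(root):
    """Return all files named en.tags below root, in sorted order.

    Hypothetical helper; the en.tags file name is an assumption about
    the layout of the GMB 2.2.0 release.
    """
    return sorted(Path(root).rglob("en.tags"))


# Example call against the location described above.
tag_files = find_tag_files("data/ner/gmb-2.2.0")
```

An empty result suggests the archive was extracted to a different location or has a different layout than assumed.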
