https://github.com/crispengari/ahd-detector

♻ Automatic Humour Detector (AHD) is an artificial intelligence project based on a Natural Language Processing (NLP) classification task: predicting whether or not a given text contains humour.

### Automatic Humour Detection (AHD)

This repository presents an Application Programming Interface (API), available over both `REST` and `GraphQL`, for a Natural Language Processing (NLP) task: Automatic Humour Detection (AHD) on short text.

---

Project: `Automatic Humour Detection (AHD)`

Programmer: `@crispengari`

Date: `2022-04-26`

Abstract: _`Automatic Humour Detection (AHD) is a very useful topic in modern technologies. In this project we are going to create Artificial Neural Network model(s) using Deep Learning to detect humour in short texts. AHD is very useful in modern technologies such as virtual assistants and chatbots. It helps virtual assistants and bots detect whether to take a conversation seriously or not`._

Research Paper: [`2004.12765`](https://arxiv.org/abs/2004.12765)

Keywords: `pytorch`, `embedding`, `torchtext`, `fast-text`, `LSTM`, `RNN`, `CNN`, `tensorflow`, `keras`, `flask`, `graphql`, `rest`, `bi-directional`,
`api`

Programming Language: `python`

Dataset: [`kaggle`](https://www.kaggle.com/datasets/deepcontractor/200k-short-texts-for-humor-detection)

---

### Approach

I'm going to create two artificial neural network models in `python`, one using `tensorflow` and one using `pytorch`. Both models perform binary classification on text.

The `tensorflow` model will be built using bidirectional Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) layers, together with other layers such as `Flatten`, `Dropout` and `Dense`. This model will be built using the functional API; refer to [this notebook](/notebooks/02_AHD_Classification.ipynb).

The `pytorch` model will be built using a `CNN`. This model has the following code behind the scenes:

```py
import torch
import torch.nn as nn
import torch.nn.functional as F


class AHDCNN(nn.Module):
    def __init__(self,
                 vocab_size,
                 embedding_size,
                 n_filters,
                 filter_sizes,
                 output_size,
                 dropout,
                 pad_idx):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size,
                                      embedding_size,
                                      padding_idx = pad_idx
                                      )
        # one convolution per filter size, each spanning the full embedding width
        self.convs = nn.ModuleList([
            nn.Conv2d(in_channels = 1,
                      out_channels = n_filters,
                      kernel_size = (fs, embedding_size))
            for fs in filter_sizes
        ])
        self.fc = nn.Linear(len(filter_sizes) * n_filters, output_size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, text):
        # text: [batch size, sentence length]
        embedded = self.embedding(text)
        # add a channel dimension: [batch size, 1, sentence length, embedding size]
        embedded = embedded.unsqueeze(1)
        conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs]
        # max-pool over the whole sequence, leaving one value per filter
        pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved]
        cat = self.dropout(torch.cat(pooled, dim = 1))
        return self.fc(cat)
```
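To see why the final linear layer takes `len(filter_sizes) * n_filters` inputs, the tensor shapes inside `forward` can be traced by hand. The sketch below does that arithmetic in plain Python, without `torch`. The helper name `trace_shapes` and the example hyperparameter values are illustrative assumptions, not values taken from the notebooks, and it assumes `text` is batch-first.

```python
def trace_shapes(batch_size, seq_len, embedding_size,
                 n_filters, filter_sizes, output_size):
    """Return the tensor shape after each step of AHDCNN.forward.

    Assumes `text` is batch-first, i.e. shaped [batch_size, seq_len].
    """
    if seq_len < max(filter_sizes):
        raise ValueError("seq_len must be >= the largest filter size")
    return {
        "text": (batch_size, seq_len),
        # embedding lookup followed by unsqueeze(1) adds a channel dimension
        "embedded": (batch_size, 1, seq_len, embedding_size),
        # a kernel of height fs fits in seq_len - fs + 1 positions
        "conved": [(batch_size, n_filters, seq_len - fs + 1)
                   for fs in filter_sizes],
        # max-pooling over the whole length keeps one value per filter
        "pooled": [(batch_size, n_filters) for _ in filter_sizes],
        # concatenating along dim 1 stacks the filters of every size
        "cat": (batch_size, n_filters * len(filter_sizes)),
        "output": (batch_size, output_size),
    }


# Hypothetical hyperparameters, chosen only for illustration.
for name, shape in trace_shapes(32, 20, 100, 100, [3, 4, 5], 1).items():
    print(name, shape)
```

Because pooling collapses the sequence dimension, the model accepts sentences of any length that is at least as long as the largest filter size.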

### Folder structure of the server

The following is the folder structure of our `server`:

```
├───app
│   └───__pycache__
├───blueprints
│   └───__pycache__
├───exceptions
│   └───__pycache__
├───models
│   ├───pytorch
│   │   ├───static
│   │   └───__pycache__
│   ├───tensorflow
│   │   ├───static
│   │   └───__pycache__
│   └───__pycache__
└───schema
    └───__pycache__
```

### Getting started

In this section we show how you can use the `AHD` server locally to detect humour in text.

First, you need `python` installed on your computer, specifically python version `3`.

Next, clone this repository by running the following command:

```shell
git clone https://github.com/CrispenGari/ahd-detector.git
```

Then navigate to the `server` folder of this repository by running the following command:

```shell
cd ahd-detector/server
```

Next, create a virtual environment named `venv` by running the following command:

```shell
virtualenv venv
```

Then activate the virtual environment. On Windows, run the following command (on Linux or macOS, run `source venv/bin/activate` instead):

```shell
.\venv\Scripts\activate.bat
```

After activating the virtual environment you need to install the required packages by running the following command:

```shell
pip install -r requirements.txt
```

You are now ready to start the server by running the following command:

```shell
cd api && python app.py
```

The above command starts the local server on the default port `3001`, after which you can make requests to the server.

> **_Note: The tensorflow static file for the model is not available in this GitHub repository. You may need to run [this notebook](./notebooks/02_AHD_Classification.ipynb) and save the `.h5` model in the `./server/api/models/tensorflow/static` folder before attempting to make any request to the server._**
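Before starting the server, it may help to confirm that a `.h5` file is actually present in that folder. The following is a minimal sketch of such a check. The helper name `find_h5_models` is hypothetical, and the path is the one quoted in the note above, taken relative to the repository root.

```python
from pathlib import Path


def find_h5_models(static_dir):
    """Return the sorted `.h5` files found directly inside `static_dir`."""
    folder = Path(static_dir)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir()
                  if p.is_file() and p.suffix == ".h5")


# Path from the note above; run this from the repository root.
models = find_h5_models("./server/api/models/tensorflow/static")
if models:
    print("Found tensorflow model file(s):", [p.name for p in models])
else:
    print("No .h5 model found; run the tensorflow notebook first.")
```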

### Making GraphQL Request to the server

The GraphQL endpoint is served at the following URLs:

1. http://127.0.0.1:3001/graphql
2. http://localhost:3001/graphql

This endpoint serves a single query, which detects humour using either the `tensorflow` or the `pytorch` model. If you visit either of the URLs above you will be presented with the `GraphiQL` interface, where you can run the query as follows:

```
fragment PredictionFragment on PredictionType {
  label
  probability
  class_
  text
}

fragment ErrorFragment on ErrorType {
  field
  message
}

fragment HumourDetectionResponseFragment on PredictionResponse {
  ok
  error {
    ...ErrorFragment
  }
  prediction {
    ...PredictionFragment
  }
}

{
  predictHumour(input: {modelType: "tf", text: "What do you get if king kong sits on your piano? a flat note."}) {
    ...HumourDetectionResponseFragment
  }
}
```

If the query succeeds you will get a response in the following format:

```json
{
  "data": {
    "predictHumour": {
      "ok": true,
      "error": null,
      "prediction": {
        "label": 0,
        "probability": 1,
        "class_": "HUMOUR",
        "text": "what do you get if king kong sits on your piano? a flat note."
      }
    }
  }
}
```
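A client usually needs to check `ok` before reading `prediction`. The sketch below shows one way to interpret a response that has already been parsed from JSON. It assumes that when `ok` is false the `error` object carries the `field` and `message` values from `ErrorFragment` and that `prediction` is null; that behaviour is an assumption, not something confirmed here, and `describe_prediction` is a hypothetical helper name.

```python
def describe_prediction(response):
    """Summarise a parsed `predictHumour` GraphQL response in one line."""
    result = response["data"]["predictHumour"]
    if not result["ok"]:
        # Assumed error shape: {"field": ..., "message": ...}
        error = result.get("error") or {}
        return f"error in {error.get('field')}: {error.get('message')}"
    prediction = result["prediction"]
    return (f"{prediction['class_']} "
            f"(label={prediction['label']}, "
            f"probability={prediction['probability']:.2f})")


# The sample response shown above.
sample = {
    "data": {
        "predictHumour": {
            "ok": True,
            "error": None,
            "prediction": {
                "label": 0,
                "probability": 1,
                "class_": "HUMOUR",
                "text": "what do you get if king kong sits on your piano? a flat note.",
            },
        }
    }
}
print(describe_prediction(sample))
```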

### `input`

The input to `predictHumour` takes two arguments:

1. `modelType` - a GraphQL string that can be either `tf` or `pt`; it is not case sensitive.

2. `text` - a GraphQL string containing the text in which you want to detect humour.
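The two arguments can also be supplied programmatically. The following is a minimal sketch that builds the query in Python using only the standard library. It assumes the endpoint accepts the common GraphQL-over-HTTP convention of a `POST` with a JSON body of the form `{"query": ...}`; the helper names `build_payload` and `predict_humour` are hypothetical. `json.dumps` is used to quote the strings because JSON string escaping is also valid inside a GraphQL string literal.

```python
import json
import urllib.request

GRAPHQL_URL = "http://127.0.0.1:3001/graphql"


def build_payload(model_type, text):
    """Build the JSON body for a `predictHumour` query."""
    model_type = model_type.lower()  # the server treats it as case insensitive
    if model_type not in ("tf", "pt"):
        raise ValueError("model_type must be 'tf' or 'pt'")
    query = (
        "{ predictHumour(input: {modelType: %s, text: %s}) "
        "{ ok error { field message } "
        "prediction { label probability class_ text } } }"
        % (json.dumps(model_type), json.dumps(text))
    )
    return {"query": query}


def predict_humour(model_type, text, url=GRAPHQL_URL):
    """Send the query and return the parsed JSON response."""
    body = json.dumps(build_payload(model_type, text)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires the local server to be running):
# print(predict_humour("pt", "What do you get if king kong sits on your piano? a flat note."))
```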

### Making `REST` Request to the server for Humour Detection.

You can detect humour in text using the `rest` approach with different clients. I'm going to show a few examples of how to detect humour using the following clients and APIs.

### Using `Postman`

Using Postman, send a `POST` request to `http://127.0.0.1:3001/api/detect-humour?model=tf` with the following JSON request body:

```json
{
  "text": "If the opposite of pro is con, then what is the opposite of progress"
}
```

You will get the following JSON response:

```json
{
  "class_": "HUMOUR",
  "label": 0,
  "probability": 1.0,
  "text": "if the opposite of pro is con, then what is the opposite of progress"
}
```

> **_Note that in the request URL the `model` query-string parameter is required; it is either `tf` for the tensorflow model or `pt` for the pytorch model._**

### Using `cURL`

To detect humour in text using `curl`, the request may look as follows:

```shell
curl -X POST "http://127.0.0.1:3001/api/detect-humour?model=tf" -H "Content-Type: application/json" -d "{\"text\": \"If the opposite of pro is con, then what is the opposite of progress\"}"
```

The response will be as follows:

```json
{
  "class_": "HUMOUR",
  "label": 0,
  "probability": 1.0,
  "text": "if the opposite of pro is con, then what is the opposite of progress"
}
```

### Using `javascript-fetch` API

You can use the `js` fetch API to make the request to the local server using the following code snippet:

```js
fetch("http://127.0.0.1:3001/api/detect-humour?model=tf", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "If the opposite of pro is con, then what is the opposite of progress",
  }),
})
  .then((res) => res.json())
  .then((data) => console.log(data));
```

The response will be as follows:

```json
{
  "class_": "HUMOUR",
  "label": 0,
  "probability": 1.0,
  "text": "if the opposite of pro is con, then what is the opposite of progress"
}
```
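
### Using `python`

The same `POST` request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_request` and `detect_humour` are hypothetical and not part of this repository. The URL, the `model` query-string parameter and the JSON body are the ones shown in the examples above.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:3001/api/detect-humour"


def build_request(text, model="tf"):
    """Build the POST request for the REST humour detection endpoint."""
    if model not in ("tf", "pt"):
        raise ValueError("model must be 'tf' or 'pt'")
    # The `model` query-string parameter is required by the server.
    url = f"{BASE_URL}?{urllib.parse.urlencode({'model': model})}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_humour(text, model="tf"):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_request(text, model)) as response:
        return json.load(response)


# Usage (requires the local server to be running):
# print(detect_humour("If the opposite of pro is con, then what is the opposite of progress"))
```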

### Data

The data used to train the models in these notebooks was found on [`kaggle`](https://www.kaggle.com/datasets/deepcontractor/200k-short-texts-for-humor-detection), and you can also find the `csv` in the `data` folder.

### Notebooks

All the notebooks for training and data preparation are found in this repository.

1. [data preparation](/notebooks/00_AHD_Data_Prep.ipynb)
2. [pytorch model with glove word vectors](/notebooks/01_AHD_Classification.ipynb)
3. [pytorch model without glove word vectors](/notebooks/03_AHD_Classification_No_EmbeddingVectors.ipynb)
4. [tensorflow model](/notebooks/02_AHD_Classification.ipynb)

> **_Note that the pytorch `model` that is being served by the server was trained in the 3rd notebook without glove word vectors._**