
# Conversational Chatbot using LSTM

## Table of Contents
1. [Introduction](#introduction)
2. [Project Objective](#project-objective)
3. [Why LSTM for Chatbots?](#why-lstm-for-chatbots)
4. [Dataset and Preprocessing](#dataset-and-preprocessing)
5. [Project Architecture](#project-architecture)
6. [Detailed Steps and Algorithms Used](#detailed-steps-and-algorithms-used)
7. [Step-by-Step Workflow](#step-by-step-workflow)
8. [Data Insights](#data-insights)
9. [How to Run the Project](#how-to-run-the-project)
10. [Results](#results)
11. [Future Work and Improvements](#future-work-and-improvements)
12. [Contributions](#contributions)

## Introduction

This project demonstrates the creation of a conversational chatbot using **Sequence-to-Sequence (Seq2Seq) Long Short-Term Memory (LSTM)** models. A chatbot built on deep learning offers human-like responses, making it ideal for applications like customer support, personal assistants, and more.

## Project Objective

The goal of this project is to:
- Build a chatbot capable of generating meaningful and contextually relevant responses.
- Train it with a **Seq2Seq LSTM** architecture, the same family of models used for tasks like **machine translation, speech recognition, and conversational AI**.
- Provide insights into the dataset used for training, model workflow, and evaluation.

## Why LSTM for Chatbots?

LSTM networks are a type of **Recurrent Neural Network (RNN)** that excels at processing and predicting sequence data by remembering information over long periods. For chatbots:
- **Memory cells** allow LSTMs to retain context while generating responses.
- They address the **vanishing gradient problem** of traditional RNNs.
- LSTMs can handle the sequential nature of conversational data effectively.
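
As a minimal illustration of those memory cells (not code from this repository; the vocabulary size and layer width are placeholder values), a Keras LSTM layer can be asked to return its hidden state `h` and cell state `c`, the vectors that carry context forward and later initialise the decoder:

```python
# Minimal sketch: an LSTM "encoder" exposing its memory (hidden and cell states).
# Placeholder sizes; not the project's exact configuration.
import tensorflow as tf

token_ids = tf.keras.Input(shape=(None,))                     # variable-length token sequence
embedded = tf.keras.layers.Embedding(10000, 200)(token_ids)   # 10k-word vocab, 200-dim vectors
_, state_h, state_c = tf.keras.layers.LSTM(200, return_state=True)(embedded)

encoder = tf.keras.Model(token_ids, [state_h, state_c])
encoder.summary()                                             # two 200-dim state vectors per input
```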

## Dataset and Preprocessing

**Dataset:**
The chatbot uses the **Chatterbot Kaggle English Dataset**. It contains question-answer pairs from topics such as history, AI, and food.

### Key Preprocessing Steps:
1. **Parsing:** Extract questions and answers from `.yml` files.
2. **Cleaning:**
   - Remove punctuation and special characters.
   - Normalize case by converting text to lowercase.
   - Handle multi-line answers by concatenating sentences.
3. **Tokenizer Creation:** Map unique words to integer tokens.
4. **Data Arrays Preparation:**
   - **Encoder Input Data:** Tokenized questions padded to the maximum sequence length.
   - **Decoder Input Data:** Tokenized answers padded similarly.
   - **Decoder Output Data:** Shifted versions of `Decoder Input Data`.
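
Below is a condensed sketch of these steps, assuming Chatterbot-style `.yml` files with a `conversations` list of question-answer pairs; the file path, marker tokens, and helper names are illustrative rather than the project's exact code.

```python
# Illustrative preprocessing sketch (not the project's exact code).
import glob
import re

import numpy as np
import yaml
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

def clean(text: str) -> str:
    """Lowercase and strip punctuation/special characters."""
    return re.sub(r"[^a-z0-9 ]", "", str(text).lower())

questions, answers = [], []
for path in glob.glob("data/*.yml"):                           # Chatterbot-style YAML files
    with open(path) as f:
        for pair in yaml.safe_load(f)["conversations"]:
            questions.append(clean(pair[0]))
            answer = " ".join(str(line) for line in pair[1:])  # concatenate multi-line answers
            answers.append("<start> " + clean(answer) + " <end>")

tokenizer = Tokenizer(filters="")                              # keep <start>/<end> markers intact
tokenizer.fit_on_texts(questions + answers)

def to_padded(texts, maxlen):
    return pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=maxlen, padding="post")

max_q = max(len(q.split()) for q in questions)
max_a = max(len(a.split()) for a in answers)
encoder_input_data = to_padded(questions, max_q)               # tokenized, padded questions
decoder_input_data = to_padded(answers, max_a)                 # tokenized, padded answers
decoder_output_data = np.roll(decoder_input_data, -1, axis=1)  # answers shifted left by one token
decoder_output_data[:, -1] = 0
```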

## Project Architecture

```plaintext
Data Collection
        |
        v
Kaggle Chatterbot Dataset
        |
        v
Data Preprocessing
   - Question-Answer Parsing: extract pairs from .yml files,
     concatenate multi-line answers
   - Text Cleaning: remove punctuation, lowercase conversion,
     tokenize and pad sequences
        |
        v
Feature Engineering
   - Create Tokenizer for vocabulary mapping
   - Build Encoder Input, Decoder Input, and Decoder Output arrays
        |
        v
Seq2Seq Model Definition
   - Encoder Model: Embedding layer, LSTM layer, Dense layer for state vectors
   - Decoder Model: Embedding layer, LSTM layer, Dense layer for predictions
        |
        v
Model Training
   - Loss calculation: categorical cross-entropy loss
   - RMSprop optimizer: update model weights
        |
        v
Model Evaluation
        |
        v
Inference Models
   - Encoder inference model: generates state vectors
   - Decoder inference model: generates response sequences
        |
        v
Chatbot Interaction
   - Input questions
   - Predict responses
   - Generate conversational answers
```

## Detailed Steps and Algorithms Used

### 1. Data Extraction and Preprocessing:
- **Algorithm:** Python string operations, TensorFlow’s tokenizer API.
- **Goal:** Clean, tokenize, and pad the dataset for the Seq2Seq model.

### 2. Model Definition:
- **Encoder-Decoder Architecture:**
  - The encoder compresses the input sequence into a context vector (the LSTM `h` and `c` states).
  - The decoder generates the output sequence using that context and its previous outputs.
- **Algorithm:** LSTM layers with Keras Functional API.
- **Reason:** Seq2Seq models excel at sequence-based tasks.
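
A compact sketch of this encoder-decoder definition with the Keras Functional API is shown below; the vocabulary size, embedding dimension, and LSTM width are placeholders, not the project's exact values.

```python
# Illustrative Seq2Seq training model (placeholder sizes, not the repo's exact code).
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, UNITS = 15000, 200, 200

# Encoder: compresses the question into the state vectors (h, c).
enc_inputs = tf.keras.Input(shape=(None,))
enc_embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)
enc_lstm = tf.keras.layers.LSTM(UNITS, return_state=True)
_, state_h, state_c = enc_lstm(enc_embedding(enc_inputs))

# Decoder: generates the answer, initialised with the encoder's states.
dec_inputs = tf.keras.Input(shape=(None,))
dec_embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)
dec_lstm = tf.keras.layers.LSTM(UNITS, return_sequences=True, return_state=True)
dec_dense = tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax")

dec_outputs, _, _ = dec_lstm(dec_embedding(dec_inputs), initial_state=[state_h, state_c])
model = tf.keras.Model([enc_inputs, dec_inputs], dec_dense(dec_outputs))
model.summary()
```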

### 3. Training:
- **Algorithm:** RMSprop optimizer + categorical cross-entropy loss.
- **Details:** Trained for 150 epochs with a validation split; training accuracy reaches roughly 96%.
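
Compiling and fitting the sketched model might look as follows; the batch size and validation split are illustrative, and the sparse variant of categorical cross-entropy is used here only because the sketch keeps targets as integer tokens rather than one-hot vectors.

```python
# Illustrative training configuration (hyperparameters are placeholders).
# Sparse categorical cross-entropy is the integer-target variant of the
# categorical cross-entropy loss named above.
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_output_data,
    batch_size=64,
    epochs=150,
    validation_split=0.1,
)
```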

### 4. Inference Models:
- **Encoder Inference Model:** Generates the context vector.
- **Decoder Inference Model:** Produces response sequences token by token.
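
One possible wiring of the two inference models plus the token-by-token decoding loop, reusing the placeholder layer handles (`enc_inputs`, `dec_inputs`, `dec_lstm`, `dec_embedding`, `dec_dense`) and the `<start>`/`<end>` markers from the earlier sketches:

```python
# Illustrative inference wiring (reuses the placeholder layers defined above).
import numpy as np

# Encoder inference model: question tokens -> state vectors (h, c).
encoder_model = tf.keras.Model(enc_inputs, [state_h, state_c])

# Decoder inference model: one token + previous states -> next-token
# probabilities + updated states.
dec_state_h = tf.keras.Input(shape=(UNITS,))
dec_state_c = tf.keras.Input(shape=(UNITS,))
dec_step_out, h, c = dec_lstm(dec_embedding(dec_inputs),
                              initial_state=[dec_state_h, dec_state_c])
decoder_model = tf.keras.Model([dec_inputs, dec_state_h, dec_state_c],
                               [dec_dense(dec_step_out), h, c])

def reply(question_tokens, tokenizer, max_len=20):
    """Greedy token-by-token decoding: feed each predicted token back in."""
    states = encoder_model.predict(question_tokens, verbose=0)
    target = np.array([[tokenizer.word_index["<start>"]]])
    words = []
    for _ in range(max_len):
        probs, h, c = decoder_model.predict([target] + list(states), verbose=0)
        token = int(np.argmax(probs[0, -1, :]))
        word = tokenizer.index_word.get(token, "")
        if word == "<end>" or token == 0:
            break
        words.append(word)
        target = np.array([[token]])
        states = [h, c]
    return " ".join(words)
```

With the preprocessing sketch in scope, a call such as `reply(to_padded(["what is ai"], max_q), tokenizer)` would return the generated answer string.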

## Step-by-Step Workflow
```
Data Collection
        |
        v
Kaggle Chatterbot English Dataset
   - Downloaded .yml files with QA pairs
   - Topics: food, history, AI, etc.
        |
        v
Data Preprocessing
   - Text cleaning: remove unwanted characters (punctuation, special chars),
     convert all text to lowercase, optionally remove stop words / short words,
     optionally apply lemmatization or stemming
   - Tokenization: Tokenizer class from Keras builds the vocabulary
     (words -> integer tokens)
   - Data formatting: map questions/answers to integer tokens and prepare
     them for padding to fixed input/output lengths
        |
        v
Padding Sequences
   - Pad input sequences to a uniform length (max_length)
   - Encoder inputs (questions)
   - Decoder inputs (answers)
   - Decoder outputs (shifted answers)
        |
        v
Seq2Seq Model Definition
   - Encoder model: input encoder_input_data, Embedding layer,
     LSTM layer (state vectors), Dense layer; output: context vectors
   - Decoder model: input decoder_input_data, Embedding layer,
     LSTM layer (with states), Dense layer; output: predicted sequences
        |
        v
Context Vectors Passed to Decoder
   - Decoder uses the encoder's LSTM states (h, c) to generate the next token
   - The process is iterative: prediction is made token by token and each
     output is fed back as input to the next step
        |
        v
Model Training
   - Loss calculation: categorical cross-entropy; training accuracy is tracked
   - Optimizer (RMSprop): update weights from gradients, reducing loss iteratively
        |
        v
Epoch-wise Training
   - Train for 150 epochs (or tune as required)
   - Model weights are updated using backpropagation
   - Training loss and accuracy are logged
        |
        v
Model Evaluation
   - Evaluate the model on validation data to check accuracy
        |
        v
Inference Models
   - Encoder inference model: input question tokens, output state vectors (h, c)
   - Decoder inference model: input context vectors and decoder input token,
     output predicted tokens
        |
        v
Generate Response Iteratively
   - Convert the question to tokens
   - Predict the encoder output (state vectors)
   - Feed the state vectors to the decoder model for the next-token prediction
   - Repeat until the end-of-answer token or the maximum length is reached
        |
        v
Display Final Response
   - Return the final chatbot response

```

## Data Insights

1. **Vocabulary Size:**
   - Questions: ~12,000 unique tokens.
   - Answers: ~15,000 unique tokens.

2. **Token Distribution:**
   - Short questions/answers dominate (~5–10 tokens).
   - Answers tend to include polite phrases like "thank you," "please," etc.

3. **Real-Life Observations:**
   - Repetitive questions often have slight variations in answers.
   - Dataset bias is visible toward popular subjects like food and AI.
   - Certain questions have multiple valid responses (e.g., greetings).
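
Figures like these can be reproduced with a few lines over the cleaned `questions`/`answers` lists and the fitted `tokenizer` from the earlier preprocessing sketch (names are illustrative):

```python
# Illustrative sketch for reproducing the statistics above, using the
# `questions` and `answers` lists from the preprocessing sketch.
from collections import Counter

q_vocab = {word for q in questions for word in q.split()}
a_vocab = {word for a in answers for word in a.split()}
print("unique question tokens:", len(q_vocab))
print("unique answer tokens:  ", len(a_vocab))

length_counts = Counter(len(q.split()) for q in questions)
for length, count in sorted(length_counts.items()):
    print(f"{count:5d} questions with {length} tokens")
```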

## How to Run the Project

### Prerequisites:
- Python 3.8 or above.
- Install the required libraries (`pickle` ships with Python's standard library and does not need to be installed separately):
```bash
pip install numpy tensorflow keras
```

### Steps to Run:
1. Clone the repository:
```bash
git clone https://github.com/BhawnaMehbubani/Conversational-Chatbot-using-LSTM.git
cd Conversational-Chatbot-using-LSTM
```
2. Launch Jupyter and open the project's chatbot notebook:
```bash
jupyter notebook
```
3. Interact with the chatbot using the **Talking with Chatbot** cell.

## Results

- **Training Accuracy:** 96% after 150 epochs.
- **Generated Responses:** Contextually relevant and coherent replies.
- **Examples:**
  - **Input:** "What is AI?"
    **Output:** "AI stands for Artificial Intelligence."
  - **Input:** "How are you?"
    **Output:** "I am fine, thank you."

## Future Work and Improvements

1. **Improve Response Variety:** Use Beam Search or Transformer-based models (e.g., BERT).
2. **Add Context Awareness:** Include mechanisms to track conversation history.
3. **Expand Dataset:** Add more domains and diversify the training data.

## Contributions

Contributions, suggestions, and feature requests are welcome! Feel free to open an issue or submit a pull request.