An open API service indexing awesome lists of open source software.

https://github.com/yassin522/english-grammar-error-correction

The project focuses on leveraging state-of-the-art natural language processing techniques, including the T5 model and a custom Encoder-Decoder architecture, to automatically detect and correct grammatical errors in written English text.
https://github.com/yassin522/english-grammar-error-correction

encoder-decoder grammar grammar-error-correction jupyter-notebook python t5-model

Last synced: about 2 months ago
JSON representation

The project focuses on leveraging state-of-the-art natural language processing techniques, including the T5 model and a custom Encoder-Decoder architecture, to automatically detect and correct grammatical errors in written English text.

Awesome Lists containing this project

README

          

# English Grammar Error Correction Project

## Overview
This project focuses on the development of an English grammar error correction system using the T5 model and implementing an Encoder-Decoder architecture from scratch. The goal is to create a robust and efficient tool that can automatically detect and correct grammatical errors in written English text.

## Features
- T5 Model Integration: The project leverages the Transformer-based T5 (Text-to-Text Transfer Transformer) model, known for its ability to handle a wide range of natural language processing tasks. The T5 model is fine-tuned specifically for English grammar error correction.

- Encoder-Decoder Architecture: In addition to the pre-trained T5 model, we have implemented an Encoder-Decoder architecture from scratch. This architecture enhances the model's understanding of contextual information and aids in generating accurate corrections for grammatical errors.

- User-Friendly Interface: The system is designed with a user-friendly interface, allowing users to input text and receive corrected output seamlessly. The interface provides a simple yet effective way to interact with the correction system.

## Model Training
- Fine-Tuning T5 Model:
The T5 model is fine-tuned on a dataset containing annotated examples of grammatical errors. This ensures that the model is tailored to the specific task of English grammar correction.

- Encoder-Decoder Training:
The Encoder-Decoder architecture is trained on a parallel corpus of correct and incorrect sentences. The training process involves optimizing the model's parameters to minimize the difference between the predicted corrected sentence and the ground truth.

- Embedding Model:
We utilized the wiki-news-300d-1M.vec pre-trained embedding model to enhance the representation of words in the input text.
```
https://www.kaggle.com/datasets/pablomarino/wikinews300d1msubwordvec
```

- Encoder-Decoder Training Results:
```
311/311 [==============================] - 8213s 26s/step - loss: 0.1656 - f_beta_score: 0.6820 - val_loss: 0.1498 - val_f_beta_score: 0.6787
```
- T5 Training Results:

| step | Training Loss | Validation Loss | Gleu |
|------|---------------|-----------------|---------|
| 250 | No log | 0.732383 | 10.8529 |
| 500 | 0.841700 | 0.699691 | 12.1853 |
| 1000 | 0.742300 | 0.676036 | 13.4657 |
| 1250 | 0.742300 | 0.670769 | 13.6931 |
| 1500 | 0.729500 | 0.668988 | 13.7441 |

## Evaluation
The performance of the grammar correction system is evaluated using metrics such as precision, recall, and F1 score, Gleu.

## Dataset
The training dataset can be found here:
```
https://www.kaggle.com/datasets/studentramya/lang-8?select=lang8.train.auto.bea19.m2
```
It includes annotated examples of grammatical errors for optimizing the model's performance in English grammar correction

![Screenshot (2212)](https://github.com/Yassin522/English-Grammar-Error-Correction/assets/88105077/a0b498e5-5f8e-4fb3-b53d-56f5f15543db)