https://github.com/yassin522/english-grammar-error-correction
The project focuses on leveraging state-of-the-art natural language processing techniques, including the T5 model and a custom Encoder-Decoder architecture, to automatically detect and correct grammatical errors in written English text.
- Host: GitHub
- URL: https://github.com/yassin522/english-grammar-error-correction
- Owner: Yassin522
- Created: 2024-02-02T06:48:07.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-02T07:48:48.000Z (over 2 years ago)
- Last Synced: 2025-04-07T10:51:41.251Z (about 1 year ago)
- Topics: encoder-decoder, grammar, grammar-error-correction, jupyter-notebook, python, t5-model
- Language: Jupyter Notebook
- Homepage: https://english-grammar-error-correction.vercel.app
- Size: 2.42 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# English Grammar Error Correction Project
## Overview
This project develops an English grammar error correction system using two approaches: fine-tuning the pre-trained T5 model, and an Encoder-Decoder architecture implemented from scratch. The goal is a robust, efficient tool that automatically detects and corrects grammatical errors in written English text.
## Features
- T5 Model Integration: The project uses the Transformer-based T5 (Text-to-Text Transfer Transformer) model, which frames every natural language processing task as mapping an input string to an output string. Here it is fine-tuned specifically for English grammar error correction.
- Encoder-Decoder Architecture: In addition to fine-tuning the pre-trained T5 model, we implemented an Encoder-Decoder architecture from scratch as a second approach. The encoder reads the entire input sentence into a contextual representation, and the decoder generates the corrected sentence from it token by token.
- User-Friendly Interface: The system provides a simple interface where users enter text and receive the corrected output.
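Because T5 treats every task as text-to-text, fine-tuning it for grammar correction amounts to training it to map an erroneous sentence to its corrected version. The minimal sketch below shows how such training pairs could be formatted; the `"grammar: "` task prefix and the helper names `to_t5_example` and `build_examples` are hypothetical and not taken from the project's notebook.

```python
from __future__ import annotations

from collections.abc import Iterable

# Hypothetical task prefix: T5 inputs usually start with a short task tag,
# but the prefix (if any) used in this project's notebook is not documented.
TASK_PREFIX = "grammar: "


def to_t5_example(source: str, target: str) -> dict[str, str]:
    """Format one (erroneous, corrected) sentence pair as a text-to-text record."""
    return {"input_text": TASK_PREFIX + source.strip(), "target_text": target.strip()}


def build_examples(pairs: Iterable[tuple[str, str]]) -> list[dict[str, str]]:
    """Convert parallel sentence pairs into T5-style input/target records."""
    return [to_t5_example(src, tgt) for src, tgt in pairs]


if __name__ == "__main__":
    pairs = [("She go to school every day .", "She goes to school every day .")]
    for ex in build_examples(pairs):
        print(ex["input_text"], "->", ex["target_text"])
    # grammar: She go to school every day . -> She goes to school every day .
```

The resulting `input_text` and `target_text` strings would then be tokenized (for example with a T5 tokenizer) into model inputs and labels.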
## Model Training
- Fine-Tuning T5 Model:
The T5 model is fine-tuned on a dataset of annotated grammatical errors, adapting it to the specific task of English grammar correction.
- Encoder-Decoder Training:
The Encoder-Decoder architecture is trained on a parallel corpus of incorrect sentences paired with their corrected versions. Training optimizes the model's parameters to minimize the loss between the predicted corrected sentence and the ground-truth correction.
- Embedding Model:
We used the pre-trained fastText word vectors wiki-news-300d-1M.vec (300-dimensional vectors for 1 million words) to represent the words of the input text. The vectors are available on Kaggle:
```
https://www.kaggle.com/datasets/pablomarino/wikinews300d1msubwordvec
```
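A `.vec` file is plain text: a header line with the word count and dimension, then one word per line followed by its vector. The sketch below shows one way to load it and build an embedding matrix for a model's vocabulary. The helper names, the `word_index` mapping with indices starting at 1, and the zero rows for padding and out-of-vocabulary words are assumptions for illustration, not the notebook's code.

```python
from __future__ import annotations

import io

import numpy as np


def load_vec(stream, vocab: set[str] | None = None) -> tuple[dict[str, np.ndarray], int]:
    """Read word vectors in fastText .vec text format.

    The first line is "<num_words> <dim>"; each following line is
    "<word> <v1> ... <v_dim>". If `vocab` is given, only those words are kept,
    which saves a lot of memory for a 1M-word file.
    """
    _, dim = map(int, stream.readline().split())
    vectors: dict[str, np.ndarray] = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed lines
        if vocab is None or word in vocab:
            vectors[word] = np.array([float(v) for v in values], dtype=np.float32)
    return vectors, dim


def build_embedding_matrix(
    word_index: dict[str, int], vectors: dict[str, np.ndarray], dim: int
) -> np.ndarray:
    """Stack vectors into a (len(word_index) + 1, dim) matrix.

    Assumes indices in `word_index` run from 1 to len(word_index), with row 0
    reserved for padding; words missing from `vectors` keep an all-zero row.
    """
    matrix = np.zeros((len(word_index) + 1, dim), dtype=np.float32)
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix


if __name__ == "__main__":
    # Tiny in-memory stand-in for wiki-news-300d-1M.vec
    sample = io.StringIO("2 3\nthe 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
    word_index = {"the": 1, "dog": 2}  # hypothetical tokenizer vocabulary
    vectors, dim = load_vec(sample, vocab=set(word_index))
    print(build_embedding_matrix(word_index, vectors, dim).shape)  # (3, 3)
```

With the real file, `open("wiki-news-300d-1M.vec", encoding="utf-8")` would replace the `StringIO`, and the resulting matrix could initialize the model's embedding layer.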
- Encoder-Decoder Training Results:
```
311/311 [==============================] - 8213s 26s/step - loss: 0.1656 - f_beta_score: 0.6820 - val_loss: 0.1498 - val_f_beta_score: 0.6787
```
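The log above reports a `loss` without naming it. For sequence-to-sequence models trained on parallel sentences, a standard objective is token-level cross-entropy with teacher forcing: the decoder receives the gold target shifted right by one position and is scored on predicting each next token, with padding positions masked out. The NumPy sketch below illustrates that objective. It is an assumption about, not a copy of, the notebook's loss, and the token IDs `PAD`, `START`, and `END` are hypothetical.

```python
import numpy as np

# Hypothetical special-token IDs; the notebook's vocabulary is not documented.
PAD, START, END = 0, 1, 2


def teacher_forcing_inputs(target_ids: np.ndarray) -> np.ndarray:
    """Build decoder inputs by shifting the targets right and prepending START.

    At step t the decoder sees gold token t-1 (not its own previous prediction)
    and is trained to predict gold token t.
    """
    start = np.full((target_ids.shape[0], 1), START, dtype=target_ids.dtype)
    return np.concatenate([start, target_ids[:, :-1]], axis=1)


def masked_cross_entropy(logits: np.ndarray, target_ids: np.ndarray) -> float:
    """Mean negative log-likelihood of the gold tokens, ignoring PAD positions.

    logits: (batch, time, vocab) unnormalized decoder scores
    target_ids: (batch, time) gold token IDs
    """
    # Numerically stable log-softmax over the vocabulary axis
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability the model assigns to each gold token
    gold_lp = np.take_along_axis(log_probs, target_ids[..., None], axis=-1)[..., 0]
    mask = (target_ids != PAD).astype(log_probs.dtype)
    return float(-(gold_lp * mask).sum() / mask.sum())


if __name__ == "__main__":
    targets = np.array([[5, 6, END, PAD]])  # e.g. "goes home <end> <pad>"
    print(teacher_forcing_inputs(targets))  # [[1 5 6 2]]
    uniform = np.zeros((1, 4, 10))  # equal scores for 10 vocabulary items
    print(masked_cross_entropy(uniform, targets))  # log(10), about 2.3026
```

With uniform scores the loss equals log of the vocabulary size, which is a handy sanity check before training.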
- T5 Training Results:
| Step | Training Loss | Validation Loss | GLEU    |
|------|---------------|-----------------|---------|
| 250  | No log        | 0.732383        | 10.8529 |
| 500  | 0.841700      | 0.699691        | 12.1853 |
| 1000 | 0.742300      | 0.676036        | 13.4657 |
| 1250 | 0.742300      | 0.670769        | 13.6931 |
| 1500 | 0.729500      | 0.668988        | 13.7441 |
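The table reports GLEU but does not say which variant. One common definition is Google's sentence-level GLEU (the one implemented by NLTK's `nltk.translate.gleu_score.sentence_gleu`): collect all n-grams of length 1 to 4 and return the minimum of n-gram precision and n-gram recall against the reference. The sketch below implements that variant for a single reference. If the notebook instead uses the GEC-specific GLEU of Napoles et al., which also penalizes n-grams wrongly copied from the source, its numbers will differ.

```python
from __future__ import annotations

from collections import Counter


def ngram_counts(tokens: list[str], n_min: int = 1, n_max: int = 4) -> Counter:
    """Count every n-gram of length n_min..n_max in a token list."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu(reference: list[str], hypothesis: list[str], n_min: int = 1, n_max: int = 4) -> float:
    """Google-style GLEU: min(n-gram precision, n-gram recall) vs one reference."""
    ref = ngram_counts(reference, n_min, n_max)
    hyp = ngram_counts(hypothesis, n_min, n_max)
    matches = sum((ref & hyp).values())  # clipped n-gram overlap
    total_hyp, total_ref = sum(hyp.values()), sum(ref.values())
    if total_hyp == 0 or total_ref == 0:
        return 0.0
    return min(matches / total_hyp, matches / total_ref)


if __name__ == "__main__":
    ref = "She goes to school .".split()
    hyp = "She go to school .".split()
    print(100 * sentence_gleu(ref, hyp))  # 50.0 (7 of 14 n-grams match, scaled by 100)
```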
## Evaluation
The performance of the grammar correction system is evaluated with precision, recall, F-scores (F1, and the F-beta score logged during Encoder-Decoder training), and GLEU (logged during T5 fine-tuning).
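In grammar error correction, precision, recall, and F-scores are usually computed over edits: a proposed correction is a true positive when it matches a gold edit. The sketch below assumes edits are hashable tuples such as `(start, end, replacement)` (a hypothetical representation) and computes F-beta, where beta = 1 gives F1 and beta = 0.5 weights precision more heavily. The exact definition and beta behind the `f_beta_score` in the training log are not documented, so this is an illustration of the metric, not the project's implementation.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable


def precision_recall_fbeta(
    gold_edits: Iterable[Hashable], predicted_edits: Iterable[Hashable], beta: float = 1.0
) -> tuple[float, float, float]:
    """Edit-level precision, recall and F-beta.

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    """
    gold, pred = set(gold_edits), set(predicted_edits)
    tp = len(gold & pred)  # proposed edits that match a gold edit
    fp = len(pred - gold)  # proposed edits with no gold counterpart
    fn = len(gold - pred)  # gold edits the system missed
    # Convention: with nothing proposed (or nothing to find), the ratio is 1.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    b2 = beta ** 2
    denom = b2 * precision + recall
    f_beta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_beta


if __name__ == "__main__":
    gold = {(1, 2, "goes"), (3, 4, "")}  # replace "go" -> "goes", delete token 3
    pred = {(1, 2, "goes")}  # the system found only the first edit
    print(precision_recall_fbeta(gold, pred))  # P=1.0, R=0.5, F1 about 0.667
    print(precision_recall_fbeta(gold, pred, beta=0.5))  # F0.5 about 0.833
```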
## Dataset
The training dataset can be found here:
```
https://www.kaggle.com/datasets/studentramya/lang-8?select=lang8.train.auto.bea19.m2
```
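It contains learner sentences from the Lang-8 corpus in M2 format (the release prepared for the BEA-2019 shared task), annotated with grammatical error corrections. These annotated examples are used to train the models for English grammar correction.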
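In M2 format, each sentence starts with an `S` line holding the tokenized original text, followed by `A` lines that each describe one edit as `start end|||type|||correction|||...|||annotator`. Training a sequence-to-sequence model needs (incorrect, corrected) sentence pairs, so the edits must be applied first. The sketch below applies one annotator's edits left to right while tracking how earlier edits shift token offsets; the function name `m2_to_pairs` and the default of annotator 0 are assumptions for illustration.

```python
from __future__ import annotations

# Edit types with no usable correction (noop = sentence has no edits;
# UNK/Um = an error was flagged but not corrected).
SKIP_TYPES = {"noop", "UNK", "Um"}


def m2_to_pairs(m2_text: str, annotator: int = 0) -> list[tuple[str, str]]:
    """Turn M2-annotated text into (original, corrected) sentence pairs.

    Each block is an "S <tokens>" line followed by "A <start> <end>|||<type>|||
    <correction>|||...|||<annotator>" lines; blocks are separated by blank lines.
    """
    pairs = []
    for block in m2_text.strip().split("\n\n"):
        lines = block.strip().split("\n")
        if not lines[0].startswith("S "):
            continue
        source = lines[0].split()[1:]
        corrected = list(source)
        offset = 0  # net change in length caused by edits applied so far
        for line in lines[1:]:
            if not line.startswith("A "):
                continue
            fields = line.split("|||")
            if fields[1] in SKIP_TYPES or int(fields[-1]) != annotator:
                continue
            start, end = map(int, fields[0].split()[1:3])
            replacement = fields[2].split()  # empty correction means deletion
            corrected[start + offset:end + offset] = replacement
            offset += len(replacement) - (end - start)
        pairs.append((" ".join(source), " ".join(corrected)))
    return pairs


if __name__ == "__main__":
    sample = (
        "S She go to the school .\n"
        "A 1 2|||R:VERB:SVA|||goes|||REQUIRED|||-NONE-|||0\n"
        "A 3 4|||U:DET||||||REQUIRED|||-NONE-|||0\n"
        "\n"
        "S This is fine .\n"
        "A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0\n"
    )
    for src, tgt in m2_to_pairs(sample):
        print(src, "=>", tgt)
    # She go to the school . => She goes to school .
    # This is fine . => This is fine .
```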
