https://github.com/yahiazakaria445/arabic-answer-grading-using-sequence-models
An NLP project on Arabic data using an LSTM model
matplotlib nltk numpy pandas scikit-learn tensorflow
- Host: GitHub
- URL: https://github.com/yahiazakaria445/arabic-answer-grading-using-sequence-models
- Owner: yahiazakaria445
- Created: 2025-03-10T13:33:13.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-03-10T14:17:51.000Z (11 months ago)
- Last Synced: 2025-04-03T23:18:11.772Z (10 months ago)
- Topics: matplotlib, nltk, numpy, pandas, scikit-learn, tensorflow
- Language: Jupyter Notebook
- Homepage:
- Size: 136 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
This project focuses on automatically grading short Arabic answers using advanced Natural Language Processing (NLP) techniques and sequence models such as LSTM.
# Dataset
This project uses the AR-ASAG Dataset (ARabic Dataset for Automatic Short Answer Grading Evaluation).
The dataset is intended to support research in the field of Automatic Short Answer Grading (ASAG).
[Dataset Link](https://github.com/leilaouahrani/AR-ASAG-Dataset)
# Libraries Used
- numpy
- pandas
- matplotlib
- seaborn
- nltk
- scikit-learn
- keras
- tensorflow
# Text Preprocessing
Arabic text is cleaned, normalized, and tokenized to handle linguistic variation such as diacritics and punctuation; semantic variation such as synonymy is handled later by the word embeddings. Using `Tokenizer`, the text is converted into sequences of integer tokens, and `pad_sequences` is then applied so that all input sequences have a uniform length, which is required for feeding data into the model.
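A minimal sketch of this preprocessing step, assuming TensorFlow/Keras utilities; the function name `normalize_arabic`, the regex rules, the placeholder answers, and `MAX_LEN` are illustrative assumptions, not the repository's exact code:

```python
import re
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

ARABIC_DIACRITICS = re.compile(r'[\u0617-\u061A\u064B-\u0652]')   # tashkeel marks

def normalize_arabic(text: str) -> str:
    """Strip diacritics and punctuation and unify common letter variants."""
    text = ARABIC_DIACRITICS.sub('', text)             # remove diacritics
    text = re.sub(r'[إأآا]', 'ا', text)                 # unify alef variants
    text = re.sub(r'ى', 'ي', text)                      # alef maqsura -> yaa
    text = re.sub(r'[^\u0621-\u064A\s]', ' ', text)     # keep Arabic letters and whitespace only
    return re.sub(r'\s+', ' ', text).strip()

answers = ["هذا مثال لإجابة قصيرة", "إجابة أخرى للتوضيح"]   # placeholder answers
cleaned = [normalize_arabic(a) for a in answers]

MAX_LEN = 50   # assumed maximum sequence length
tokenizer = Tokenizer()
tokenizer.fit_on_texts(cleaned)
sequences = tokenizer.texts_to_sequences(cleaned)
padded = pad_sequences(sequences, maxlen=MAX_LEN, padding='post')
```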
# texts_to_sequences & Word Embedding
After preprocessing, the `Tokenizer` converts words into numerical sequences via the `texts_to_sequences` method, and the sequences are padded to a consistent length. The padded sequences are fed through a word embedding layer whose vectors come from a `Word2Vec` model, so each word is represented in a dense vector space that captures semantic and contextual relationships. This encoding forms the input to the LSTM sequence model.
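A hedged sketch of combining a `Word2Vec` model with a Keras `Embedding` layer, continuing from the preprocessing sketch above (`cleaned` and `tokenizer`); `EMBED_DIM`, the gensim training call, and freezing the embedding weights are assumptions for illustration:

```python
import numpy as np
from gensim.models import Word2Vec
from tensorflow.keras.layers import Embedding
from tensorflow.keras.initializers import Constant

EMBED_DIM = 100   # assumed embedding size

# Train (or load) a Word2Vec model on the tokenized answers.
tokenized_answers = [text.split() for text in cleaned]   # `cleaned` from the preprocessing sketch
w2v = Word2Vec(sentences=tokenized_answers, vector_size=EMBED_DIM, window=5, min_count=1)

# Build an embedding matrix aligned with the Tokenizer's word index.
vocab_size = len(tokenizer.word_index) + 1
embedding_matrix = np.zeros((vocab_size, EMBED_DIM))
for word, idx in tokenizer.word_index.items():
    if word in w2v.wv:
        embedding_matrix[idx] = w2v.wv[word]

embedding_layer = Embedding(
    input_dim=vocab_size,
    output_dim=EMBED_DIM,
    embeddings_initializer=Constant(embedding_matrix),
    trainable=False,   # keep pretrained vectors fixed (a design choice, not confirmed by the repo)
)
```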
# Model Architecture
| Model | Optimizer | Metrics  | Loss Function             | Accuracy |
| ----- | --------- | -------- | ------------------------- | -------- |
| LSTM  | Adam      | accuracy | categorical cross-entropy | 85%      |
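A minimal sketch of an LSTM classifier matching the table above (Adam optimizer, categorical cross-entropy loss, accuracy metric); the layer sizes and number of grade classes are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

NUM_CLASSES = 4   # assumed number of grade levels

model = Sequential([
    embedding_layer,                      # Word2Vec-initialized embedding from the previous sketch
    LSTM(128),                            # assumed hidden size; reads the padded answer sequence
    Dropout(0.3),                         # assumed regularization
    Dense(NUM_CLASSES, activation='softmax'),
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
# model.fit(padded, one_hot_labels, epochs=..., validation_split=...)  # labels must be one-hot encoded
```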
# Testing Sample
| Answer | Rate |
| ------ | ---- |
| 1      | B    |
| 2      | A    |
| 3      | D    |
| a      | B    |
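A hedged sketch of how a new answer might be graded with the trained model, reusing the earlier sketches (`normalize_arabic`, `tokenizer`, `MAX_LEN`, `model`); the grade-letter ordering is an assumption, since the README does not show the label encoding:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

GRADE_LABELS = ['A', 'B', 'C', 'D']   # assumed class order, not confirmed by the repo

def grade_answer(raw_answer: str) -> str:
    """Preprocess one answer and return the predicted grade letter."""
    text = normalize_arabic(raw_answer)                     # preprocessing sketch above
    seq = tokenizer.texts_to_sequences([text])
    padded_seq = pad_sequences(seq, maxlen=MAX_LEN, padding='post')
    probs = model.predict(padded_seq)                       # softmax over grade classes
    return GRADE_LABELS[int(np.argmax(probs, axis=-1)[0])]

print(grade_answer("إجابة الطالب هنا"))   # placeholder answer text
```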