https://github.com/nabilaagha/word2vec-for-arabic-bert-corpus

This project involves training a Word2Vec model on an Arabic BERT corpus to generate high-quality word embeddings, enhancing natural language processing (NLP) applications in Arabic text analysis.

arabic-dataset arabic-nlp jupyter-notebook nlp nlp-machine-learning python word2vec-model


# 🔠 **Word2Vec for Arabic BERT Corpus**

## 📌 Project Overview
This project trains a **Word2Vec model** on the **Arabic BERT Corpus** to produce **word embeddings** for Arabic **natural language processing (NLP)**. The model is trained with **Gensim** and captures semantic relationships between words, which makes the embeddings useful as input features for tasks such as **text classification, sentiment analysis, and machine translation**.
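
To make "semantic relationships between words" concrete: with word embeddings, relatedness is usually measured as the cosine similarity between two word vectors. The snippet below is a minimal sketch of that measure in plain Python. The three-dimensional vectors and the helper names (`cosine_similarity`, `most_similar`, `toy_vectors`) are invented for illustration and do not come from this repository; a trained model would supply vectors with far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


# Hypothetical toy vocabulary; the numbers are made up, not learned.
KING = "ملك"
QUEEN = "ملكة"
APPLE = "تفاح"

toy_vectors = {
    KING: [0.9, 0.8, 0.1],
    QUEEN: [0.85, 0.75, 0.2],
    APPLE: [0.1, 0.2, 0.9],
}


def most_similar(word, vectors):
    """Return the (neighbor, score) pair for the closest other word."""
    target = vectors[word]
    candidates = (
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    )
    return max(candidates, key=lambda pair: pair[1])


neighbor, score = most_similar(KING, toy_vectors)
print(neighbor, round(score, 3))
```

In this toy setup the "king" vector points in nearly the same direction as the "queen" vector and away from the "apple" vector, which is the kind of structure a trained Word2Vec model is expected to learn from co-occurrence patterns.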

## 📂 **Dataset**
The dataset used for training is the **[Arabic BERT Corpus](https://www.kaggle.com/datasets/abedkhooli/arabic-bert-corpus)**, a large collection of **preprocessed Arabic text** designed for deep learning models.

## 🚀 **Key Features**
- ✔️ **Preprocess Arabic Text** – Tokenization, normalization, and stopword removal.
- ✔️ **Train Word2Vec Model** – Learn vector representations for words.
- ✔️ **Visualize Word Embeddings** – Explore relationships between words.
- ✔️ **Optimize for NLP Tasks** – Improve Arabic text-based machine learning models.
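
The preprocessing steps named above (normalization, tokenization, stopword removal) could be sketched roughly as follows. This is an illustrative standard-library version, not the code from this repository: the normalization rules (stripping diacritics and tatweel, unifying alef variants), the regex tokenizer, and the tiny `STOPWORDS` set are all assumptions made for the example. A real pipeline might use the NLTK Arabic stopword list or a dedicated Arabic tokenizer instead.

```python
import re

# Short-vowel and related marks (U+064B..U+0652) plus superscript alef (U+0670).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # elongation character used for justification
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza forms
ARABIC_WORD = re.compile(r"[\u0621-\u064A]+")  # runs of Arabic letters

# Hypothetical, deliberately tiny stopword set, stored in normalized form.
STOPWORDS = frozenset({"في", "من", "على", "الى", "عن"})


def normalize(text):
    """Strip diacritics and tatweel, and map alef variants to bare alef."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return ALEF_VARIANTS.sub("\u0627", text)


def tokenize(text):
    """Return runs of Arabic letters; digits, Latin text and punctuation are dropped."""
    return ARABIC_WORD.findall(text)


def preprocess(text, stopwords=STOPWORDS):
    """Normalize, tokenize, then filter out stopwords."""
    return [tok for tok in tokenize(normalize(text)) if tok not in stopwords]


print(preprocess("ذهب الولد إلى المدرسة في الصباح."))
```

Normalization runs before stopword filtering, so the stopword set has to contain normalized spellings (for example the bare-alef form of "إلى"). How aggressively to normalize is a design choice: merging more letter variants shrinks the vocabulary but can conflate distinct words.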

## 🛠️ **Technologies Used**
- Python 🐍
- Gensim 🧠
- NLP Libraries (NLTK, spaCy) 📖
- Pandas & NumPy 📊
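
As a rough illustration of how Gensim fits into this stack, training could look like the sketch below. It assumes the corpus has already been preprocessed into a plain-text file with one sentence per line and tokens separated by spaces; that file layout, the helper names (`iter_sentences`, `SentenceCorpus`, `train_word2vec`), and every hyperparameter value are assumptions for the example, not settings taken from this repository. The `Word2Vec` arguments follow the Gensim 4.x API.

```python
def iter_sentences(path):
    """Yield one list of tokens per non-empty line of a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            tokens = line.split()
            if tokens:
                yield tokens


class SentenceCorpus:
    """Restartable iterable over a corpus file.

    Word2Vec passes over the data several times (once to build the
    vocabulary, then once per epoch), so a one-shot generator is not enough.
    """

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        return iter_sentences(self.path)


def train_word2vec(path, vector_size=100, window=5, min_count=5, epochs=5):
    """Train a skip-gram Word2Vec model on the file at `path`."""
    # Imported here so the helpers above work without Gensim installed.
    from gensim.models import Word2Vec

    return Word2Vec(
        sentences=SentenceCorpus(path),
        vector_size=vector_size,  # dimensionality of each word vector
        window=window,            # context words considered on each side
        min_count=min_count,      # ignore words rarer than this
        sg=1,                     # 1 = skip-gram, 0 = CBOW
        workers=4,
        epochs=epochs,
    )


# Example usage (the paths are placeholders, not files from this repository):
# model = train_word2vec("corpus.txt")
# print(model.wv.most_similar("كتاب", topn=5))
# model.save("arabic_word2vec.model")
```

Streaming the file line by line keeps memory use roughly constant regardless of corpus size, which matters for a large corpus. Skip-gram (`sg=1`) is chosen here only as an example; CBOW trains faster and may be a reasonable alternative.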

## 📥 **Getting Started**
Clone this repository and install the required dependencies to start training your own **Word2Vec model** on Arabic text!

![dataset-cover](https://github.com/user-attachments/assets/2ca3c459-ebb7-4980-bbec-29ce0190f0fd)