https://github.com/zeuscoderbe/next_word_predicting
NLP basics
deep-learning generative-ai machine-learning nlp-machine-learning rnn-tensorflow text-generator
Last synced: 9 months ago
JSON representation
NLP basics
- Host: GitHub
- URL: https://github.com/zeuscoderbe/next_word_predicting
- Owner: ZeusCoderBE
- Created: 2024-04-14T03:26:43.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-06-04T14:43:17.000Z (over 1 year ago)
- Last Synced: 2025-02-05T08:51:28.638Z (11 months ago)
- Topics: deep-learning, generative-ai, machine-learning, nlp-machine-learning, rnn-tensorflow, text-generator
- Language: Jupyter Notebook
- Homepage:
- Size: 6.09 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Applying Artificial Neural Networks to Build Text Generation Models as Part of the Generative AI Problem
## Workflow

### Visualization after word embedding

### Environment setup
1. **Install Python libraries:** `numpy`, `tensorflow`, `scikit-learn`, `regex`.
2. **Dataset:**
- The text data comes from the Vietnamese "Land Law" (Luật Đất đai 2024); see the loading sketch below.
- Link: https://thuvienphapluat.vn/van-ban/Bat-dong-san/Luat-Dat-dai-2024-31-2024-QH15-523642.aspx
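A minimal loading sketch, assuming the law text has been downloaded from the link above and saved locally as `land_law_2024.txt` (a hypothetical filename, not part of the repository):

```python
# Minimal sketch: load the raw law text and normalize whitespace.
# "land_law_2024.txt" is a hypothetical local copy of the document linked above.
import regex as re

with open("land_law_2024.txt", encoding="utf-8") as f:
    raw_text = f.read()

text = re.sub(r"\s+", " ", raw_text).strip()   # collapse runs of whitespace
print(text[:200])                              # quick sanity check
```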
### Steps to build a model in the project
1. **Preprocessing:**
- Perform data preprocessing steps, including sentence extraction, segmentation into meaningful words, whitespace removal, punctuation removal, word-dictionary generation, and input-sequence generation using the n-gram method (see the first sketch after this list).
2. **Word Embedding:**
- I use a word embedding layer to reduce the representation size of words and to improve computation and learning.
- I represent words in a multidimensional vector space to capture semantic relationships.
3. **Recurrent Neural Network (RNN):**
- I built a deep learning architecture combining an embedding layer and a SimpleRNN layer to train the model (see the model sketch after this list).
- I used the TensorFlow and Keras libraries to develop and evaluate the model.
4. **Performance evaluation:**
- I used metrics such as accuracy, precision, recall, and F1 score to evaluate the RNN model on the test set (see the evaluation sketch after this list).
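The following is a minimal sketch of the n-gram input-sequence idea using the Keras `Tokenizer`; the notebook's actual cleaning rules may differ, and the two example sentences are placeholders:

```python
# Minimal sketch: build n-gram training pairs (context -> next word).
# `sentences` stands in for the preprocessed sentences extracted from the law text.
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = ["người sử dụng đất có quyền", "nhà nước giao đất cho hộ gia đình"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(sentences)              # builds the word dictionary
vocab_size = len(tokenizer.word_index) + 1

# Every prefix of length >= 2 becomes one example:
# the first n-1 tokens are the input, the last token is the word to predict.
sequences = []
for line in sentences:
    ids = tokenizer.texts_to_sequences([line])[0]
    for i in range(2, len(ids) + 1):
        sequences.append(ids[:i])

max_len = max(len(s) for s in sequences)
sequences = pad_sequences(sequences, maxlen=max_len, padding="pre")

X = sequences[:, :-1]                                              # input context
y = tf.keras.utils.to_categorical(sequences[:, -1], num_classes=vocab_size)
```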
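A minimal sketch of the Embedding + SimpleRNN architecture, continuing from the variables above; the layer sizes and epoch count are illustrative assumptions, not the notebook's settings:

```python
# Minimal sketch: Embedding + SimpleRNN next-word model (sizes are assumptions).
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense

model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=64),    # dense word vectors
    SimpleRNN(128),                                     # recurrence over the context
    Dense(vocab_size, activation="softmax"),            # next-word probabilities
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, y, epochs=50, verbose=0)
```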
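And a minimal evaluation sketch with scikit-learn, assuming `X_test` / `y_test` come from a held-out split of the sequences; macro averaging is an assumption, not necessarily the notebook's choice:

```python
# Minimal sketch: evaluate the trained model on a held-out split.
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred = np.argmax(model.predict(X_test), axis=-1)   # predicted next-word ids
y_true = np.argmax(y_test, axis=-1)                  # true next-word ids

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("f1       :", f1_score(y_true, y_pred, average="macro", zero_division=0))
```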
### Libraries and Technology
- **Programming language:** Python
- **Main libraries:** numpy, scikit-learn, TensorFlow, regex
- **Model:** RNN
### Future development
- I will use a larger dataset to build the model.
- I will use other models such as LSTM and GRU to overcome the RNN's limitation of exploding or vanishing gradients during backpropagation (see the sketch below).
- I will use contextual word segmentation for better performance.
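A minimal sketch of swapping `SimpleRNN` for `LSTM` (a `GRU` layer can be substituted the same way); it reuses `vocab_size` from the earlier sketch, and the sizes remain illustrative:

```python
# Minimal sketch: same architecture with a gated recurrent layer (LSTM).
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

lstm_model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=64),
    LSTM(128),                                   # gating helps keep gradients stable
    Dense(vocab_size, activation="softmax"),
])
lstm_model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```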
### Conclusion
- This project provides an overview of different NLP techniques and how to implement them for text processing and analysis. The models were trained and evaluated on real-world text data to ensure their effectiveness in handling natural language tasks.