https://github.com/thangtran3112/machine-learning
NLP, neural networks, PyTorch, TensorFlow, AWS SageMaker fine-tuning
- Host: GitHub
- URL: https://github.com/thangtran3112/machine-learning
- Owner: thangtran3112
- License: MIT
- Created: 2024-11-22T03:49:56.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-16T20:10:49.000Z (11 months ago)
- Last Synced: 2025-03-16T21:26:01.061Z (11 months ago)
- Topics: artificial-neural-networks, aws-bedrock, aws-sagemaker, gensim, gru-neural-networks, keras, lemmatization, lstm-neural-networks, nltk, numpy, one-hot-encoding, pandas, python, recurrent-neural-network, scikit-learn, tensorflow, tfidf-vectorizer, word2vec
- Language: Jupyter Notebook
- Homepage: https://movie-review-sentiment-rnn.streamlit.app
- Size: 195 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# machine-learning
## Initializations
```bash
conda create -p venv python=3.12
conda activate venv/
```
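To confirm the environment was created and is active, these standard Conda commands can help (exact paths and versions will differ per machine):
```bash
conda env list    # the venv/ environment should be listed (active one marked with *)
python --version  # should print Python 3.12.x once the environment is activated
```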
## Tokenization
- Paragraph to sentences (sentence tokenization)
- Sentence to words/vocabulary (word tokenization); see the NLTK example below
## NLTK (Tokenization library)
- Alternative: `spacy`
```bash
pip install nltk
pip install numpy
```
- Or install these libraries only into the Conda environment:
```bash
conda install -p venv/ nltk
conda install -p venv/ numpy
```
- [Install scikit-learn](https://scikit-learn.org/stable/install.html). This may also pull in its dependencies `numpy` and `scipy`:
```bash
conda uninstall numpy # just in case
conda install -c conda-forge scikit-learn
```
- Or install with `pip` inside the Conda environment. If a stack trace error appears (often a NumPy binary incompatibility), uninstall `numpy` first:
```bash
pip uninstall numpy
pip install --force-reinstall scikit-learn
```
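To verify the scikit-learn install, a minimal `TfidfVectorizer` sketch (the sample corpus below is illustrative):
```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["the cat sat on the mat", "the dog sat on the log"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix of TF-IDF weights

print(vectorizer.get_feature_names_out())  # learned vocabulary
print(X.shape)                             # (2 documents, vocabulary size)
```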
- One-time action to download the tokenizer models (`punkt`, or `punkt_tab` on newer NLTK releases), in Python code:
```python
import nltk
nltk.download('punkt')  # tokenizer models; newer NLTK releases may require nltk.download('punkt_tab')
from nltk.tokenize import word_tokenize, sent_tokenize
text = "Hello, world! This is NLTK's tokenizer."
words = word_tokenize(text) # Tokenizes into words
sentences = sent_tokenize(text) # Tokenizes into sentences
```
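For the `spacy` alternative mentioned above, a roughly equivalent sketch (assumes the small English model has been installed via `python -m spacy download en_core_web_sm`):
```python
import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline
doc = nlp("Hello, world! This is spaCy's tokenizer.")

words = [token.text for token in doc]          # tokenize into words
sentences = [sent.text for sent in doc.sents]  # tokenize into sentences
```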
### Lemmatization
- Requires a one-time download of the WordNet corpus:
```python
import nltk

nltk.download('wordnet')  # one-time download of the WordNet corpus
```
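A minimal usage sketch with NLTK's `WordNetLemmatizer` (the example words are illustrative):
```python
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))  # -> "run"
print(lemmatizer.lemmatize("geese"))             # -> "goose" (default pos is noun)
```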