Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/andreped/atg-nlp
Code relevant for training and evaluating multi-task NLPs from free text using TensorFlow
Code relevant for training and evaluating multi-task NLPs from free text using TensorFlow
- Host: GitHub
- URL: https://github.com/andreped/atg-nlp
- Owner: andreped
- License: mit
- Created: 2021-03-31T14:13:28.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-03-31T17:52:47.000Z (almost 4 years ago)
- Last Synced: 2024-12-19T22:09:39.981Z (14 days ago)
- Topics: bert-model, cnn, deep-learning, google-colab-notebook, natural-language-processing, nlp, tensorflow2
- Language: Jupyter Notebook
- Homepage:
- Size: 123 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# ATG-NLP
Code relevant for training and evaluating NLP models from free text using Auxiliary Task Guiding (ATG). The code is inspired by [this](https://keras.io/examples/nlp/text_extraction_with_bert/) example from the Keras.io page. In this project we use TensorFlow 2.4 to train neural networks. The datasets are taken from the [Huggingface/datasets Hub](https://huggingface.co/datasets), and we use a pretrained [BERT](https://arxiv.org/abs/1810.04805) model from the [transformers](https://pypi.org/project/transformers/) library for feature extraction.
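As a rough sketch of that pipeline (not the actual code in this repository), the snippet below pulls a benchmark dataset from the Hugging Face Hub and uses a pretrained BERT from transformers as a frozen feature extractor in TensorFlow 2. The dataset name, model checkpoint, and sequence length are placeholder assumptions:

```python
import tensorflow as tf
from datasets import load_dataset
from transformers import BertTokenizerFast, TFBertModel

# Placeholder benchmark dataset; the repository does not pin a specific one.
dataset = load_dataset("imdb", split="train[:1%]")

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = TFBertModel.from_pretrained("bert-base-uncased")
bert.trainable = False  # use BERT purely for feature extraction

# Tokenize the raw free text into fixed-length input IDs and attention masks.
encodings = tokenizer(
    dataset["text"],
    truncation=True,
    padding="max_length",
    max_length=128,
    return_tensors="tf",
)

# The pooled [CLS] representation serves as the extracted feature vector.
features = bert(encodings).pooler_output  # shape: (num_examples, 768)
print(features.shape)
```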
------
### How to use?
1. Add the project structure to a Google Drive
2. Open Google Colab using a Jupyter notebook from the notebooks/ folder
3. Synchronize the Google Drive with the current Colab project (see the sketch below)

Then you are all set. Simply train an NLP model by executing the Jupyter notebook.
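For step 3, synchronizing usually just means mounting your Drive inside the Colab runtime. A minimal sketch, assuming the project folder is named ATG-NLP at the root of your Drive:

```python
from google.colab import drive

# Mount Google Drive into the Colab runtime (you will be asked to authorize).
drive.mount("/content/drive")

# Move into the project folder; adjust the path to wherever you placed it.
%cd /content/drive/MyDrive/ATG-NLP
```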
------
### Project structure
```
+-- {ATG-NLP}/
| +-- python/
| | +-- create_data.py
| | +-- train.py
| | +-- [...]
| +-- data/
| | +-- folder_containing_the_dataset/
| | | +-- fold_name0/
| | | +-- fold_name1/
| | | +-- [...]
| +-- output/
| | +-- history/
| | | +-- history_some_run_name1.txt
| | | +-- history_some_run_name2.txt
| | | +-- [...]
| | +-- models/
| | | +-- model_some_run_name1.h5
| | | +-- model_some_run_name2.h5
| | | +-- [...]
```
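To make the output/ layout concrete, a trained run might be persisted like this. This is a sketch only; the helper function and run naming are assumptions following the tree above, not code from this repository:

```python
import tensorflow as tf

def save_run(model: tf.keras.Model, history: tf.keras.callbacks.History, run_name: str):
    # HDF5 model file, matching output/models/model_<run_name>.h5 above.
    model.save(f"output/models/model_{run_name}.h5")
    # Plain-text training history, matching output/history/history_<run_name>.txt.
    with open(f"output/history/history_{run_name}.txt", "w") as f:
        f.write(str(history.history))
```
------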
### TODOs (most important from top to bottom):
- [x] Get benchmark datasets using the huggingface/datasets repository
- [x] Setup the project structure
- [x] Make jupyter notebook and code deployable on Google Colab
- [x] Use pretrained BERT as tokenizer
- [x] Introduce simple neural network that performs some task in an end-to-end NLP pipeline
- [ ] Compare BERT with other relevant tokenizers
- [ ] Introduce more benchmark datasets, ideally ones suited to a specific use case (currently not defined)
- [ ] Implement MTL designs to test the hypothesis (an illustrative sketch follows below)
- [ ] Write report
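Purely as an illustration of the MTL item above (none of these layer names or task sizes come from the repository), a hard parameter sharing design on top of frozen BERT features could look like this:

```python
import tensorflow as tf
from transformers import TFBertModel

# Hypothetical task sizes; the actual tasks are not yet defined in this project.
NUM_MAIN_CLASSES = 2
NUM_AUX_CLASSES = 5

bert = TFBertModel.from_pretrained("bert-base-uncased")
bert.trainable = False  # shared, frozen encoder

input_ids = tf.keras.Input(shape=(128,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(128,), dtype=tf.int32, name="attention_mask")

# Hard parameter sharing: both task heads read the same pooled BERT features.
shared = bert(input_ids, attention_mask=attention_mask).pooler_output
main_out = tf.keras.layers.Dense(NUM_MAIN_CLASSES, activation="softmax", name="main_task")(shared)
aux_out = tf.keras.layers.Dense(NUM_AUX_CLASSES, activation="softmax", name="aux_task")(shared)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=[main_out, aux_out])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    # Down-weight the auxiliary task so it guides rather than dominates training.
    loss_weights={"main_task": 1.0, "aux_task": 0.3},
)
model.summary()
```
------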
Made with :heart: and Python