https://github.com/andreped/atg-nlp

Code relevant for training and evaluating multi-task NLPs from free text using TensorFlow
bert-model cnn deep-learning google-colab-notebook natural-language-processing nlp tensorflow2


# ATG-NLP
Code for training and evaluating NLP models on free text using Auxiliary Task Guiding (ATG).

The code is inspired by the [Text Extraction with BERT](https://keras.io/examples/nlp/text_extraction_with_bert/) example on Keras.io. The neural networks in this project are trained with TensorFlow 2.4. The datasets come from the [Huggingface/datasets Hub](https://huggingface.co/datasets). A pretrained [BERT](https://arxiv.org/abs/1810.04805) model from the [transformers](https://pypi.org/project/transformers/) library is used for feature extraction.

------

### How to use?

1. Add the project structure to your Google Drive
2. Open one of the Jupyter notebooks from the notebooks/ folder in Google Colab
3. Synchronize the Google Drive with the current Colab project

Then you are all set. Train an NLP model by running the Jupyter notebook.
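Step 3 can be sketched roughly as below. This is a minimal sketch, not the code from the notebooks: the helper name `resolve_project_root` and the assumption that the project folder sits at the top level of the Drive under the name `ATG-NLP` are both hypothetical. Outside Colab the helper falls back to a local folder, so the same notebook cell also runs on a local machine.

```python
from pathlib import Path


def resolve_project_root(project_dir: str = "ATG-NLP") -> Path:
    """Return the project root, mounting Google Drive when running in Colab.

    `project_dir` is an assumed folder name; adjust it to match your Drive.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import drive
    except ImportError:
        # Not in Colab: fall back to a local folder with the same name.
        return Path.cwd() / project_dir

    # Mount the Drive; Colab asks for authorization the first time.
    drive.mount("/content/drive")
    # "MyDrive" is where Colab exposes the top level of the user's Drive.
    return Path("/content/drive/MyDrive") / project_dir


project_root = resolve_project_root()
print(project_root)
```

In a notebook, `project_root / "python"` could then be added to `sys.path` so that the scripts in the python/ folder can be imported.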

------

### Project structure

```
+-- {ATG-NLP}/
| +-- python/
| | +-- create_data.py
| | +-- train.py
| | +-- [...]
| +-- data/
| | +-- folder_containing_the_dataset/
| | | +-- fold_name0/
| | | +-- fold_name1/
| | | +-- [...]
| +-- output/
| | +-- history/
| | | +-- history_some_run_name1.txt
| | | +-- history_some_run_name2.txt
| | | +-- [...]
| | +-- models/
| | | +-- model_some_run_name1.h5
| | | +-- model_some_run_name2.h5
| | | +-- [...]
```
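The output part of this layout can be set up programmatically. The snippet below is a minimal sketch using only the standard library; the helper names `make_output_dirs` and `run_paths` are hypothetical and not taken from the repository. Only the folder names and the `history_<run_name>.txt` / `model_<run_name>.h5` naming pattern come from the tree above.

```python
from pathlib import Path


def make_output_dirs(root: Path) -> dict:
    """Create output/history and output/models under root and return both paths."""
    dirs = {
        "history": root / "output" / "history",
        "models": root / "output" / "models",
    }
    for path in dirs.values():
        # parents=True also creates output/; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
    return dirs


def run_paths(root: Path, run_name: str) -> dict:
    """Build the history and model file paths for one training run."""
    dirs = make_output_dirs(root)
    return {
        "history": dirs["history"] / f"history_{run_name}.txt",
        "model": dirs["models"] / f"model_{run_name}.h5",
    }
```

For example, `run_paths(Path("ATG-NLP"), "some_run_name1")` yields the two paths a training run would write its history and model to, creating the folders if they are missing.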

------

### TODOs (most important from top to bottom):

- [x] Get benchmark datasets using the huggingface/datasets repository
- [x] Set up the project structure
- [x] Make Jupyter notebook and code deployable on Google Colab
- [x] Use pretrained BERT as tokenizer
- [x] Introduce simple neural network that performs some task in an end-to-end NLP pipeline
- [ ] Compare BERT with other relevant tokenizers
- [ ] Introduce more benchmark datasets, ideally that are suitable for a specific use case (currently not defined)
- [ ] Implement multi-task learning (MTL) designs to test the hypothesis
- [ ] Write report

------

Made with :heart: and Python