https://github.com/andreped/atg-nlp

Code relevant for training and evaluating multi-task NLPs from free text using TensorFlow
bert-model cnn deep-learning google-colab-notebook natural-language-processing nlp tensorflow2

Host: GitHub
URL: https://github.com/andreped/atg-nlp
Owner: andreped
License: mit
Created: 2021-03-31T14:13:28.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2021-03-31T17:52:47.000Z (over 4 years ago)
Last Synced: 2025-06-23T18:48:26.332Z (9 days ago)
Topics: bert-model, cnn, deep-learning, google-colab-notebook, natural-language-processing, nlp, tensorflow2
Language: Jupyter Notebook
Homepage:
Size: 123 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# ATG-NLP

Code for training and evaluating NLP models from free text using Auxiliary Task Guiding (ATG).

The code is inspired by [this example](https://keras.io/examples/nlp/text_extraction_with_bert/) from Keras.io. The neural networks are trained with TensorFlow 2.4. The datasets come from the [Huggingface/datasets Hub](https://huggingface.co/datasets). A pretrained [BERT](https://arxiv.org/abs/1810.04805) model from the [transformers](https://pypi.org/project/transformers/) library is used for feature extraction.
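
As a rough illustration of the feature-extraction step described above, the sketch below tokenizes a batch of sentences and passes them through a pretrained BERT encoder. It is not taken from this repository: the function name `extract_bert_features`, the checkpoint `bert-base-uncased`, and the `max_length` default are assumptions, and it presumes a `transformers` release that still ships the TensorFlow model classes.

```python
from typing import List


def extract_bert_features(texts: List[str],
                          model_name: str = "bert-base-uncased",  # assumed checkpoint
                          max_length: int = 128):                 # assumed sequence cap
    """Return per-token BERT features of shape (batch, seq_len, hidden_size)."""
    # Imported lazily so this module loads even without the heavy dependencies.
    from transformers import BertTokenizer, TFBertModel

    tokenizer = BertTokenizer.from_pretrained(model_name)
    encoder = TFBertModel.from_pretrained(model_name)

    # Pad/truncate to a common length and return TensorFlow tensors.
    encoded = tokenizer(texts, padding=True, truncation=True,
                        max_length=max_length, return_tensors="tf")
    outputs = encoder(encoded)
    return outputs.last_hidden_state
```

Calling `extract_bert_features(["a short sentence"])` downloads the checkpoint on first use; the returned tensor can then be fed to task-specific layers.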

------

### How to use?

1. Add the project structure to Google Drive.

2. Open a Jupyter notebook from the `notebooks/` folder in Google Colab.

3. Synchronize Google Drive with the current Colab project.

Then you are all set: train an NLP model by executing the Jupyter notebook.
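
The steps above could look roughly like the following first notebook cell. This is a minimal sketch, not the repository's own setup code: the Drive location in `PROJECT_DIR` and the helper names `in_colab` and `setup_project` are assumptions, so adjust them to wherever the project was uploaded.

```python
import os
import sys

# Hypothetical location of the project inside the mounted Google Drive.
PROJECT_DIR = "/content/drive/MyDrive/ATG-NLP"


def in_colab() -> bool:
    """Return True when running inside a Google Colab runtime."""
    try:
        import google.colab  # noqa: F401
        return True
    except ImportError:
        return False


def setup_project(project_dir: str = PROJECT_DIR) -> str:
    """Mount Drive (on Colab only) and make the python/ folder importable."""
    if in_colab():
        from google.colab import drive
        drive.mount("/content/drive")  # prompts for authorization
        os.chdir(project_dir)
    python_dir = os.path.join(project_dir, "python")
    if python_dir not in sys.path:
        sys.path.append(python_dir)
    return python_dir
```

Outside Colab the mount step is skipped, so the same cell can also be run locally against a checkout of the project.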

------

### Project structure

```
+-- {ATG-NLP}/
|   +-- python/
|   |   +-- create_data.py
|   |   +-- train.py
|   |   +-- [...]
|   +-- data/
|   |   +-- folder_containing_the_dataset/
|   |   |   +-- fold_name0/
|   |   |   +-- fold_name1/
|   |   |   +-- [...]
|   +-- output/
|   |   +-- history/
|   |   |   +-- history_some_run_name1.txt
|   |   |   +-- history_some_run_name2.txt
|   |   |   +-- [...]
|   |   +-- models/
|   |   |   +-- model_some_run_name1.h5
|   |   |   +-- model_some_run_name2.h5
|   |   |   +-- [...]
```
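
To make the layout above concrete, here is a small sketch that creates the empty folders and derives the output file names for a run. The helper names `create_project_structure` and `output_paths` are hypothetical and not part of the repository; only the folder names and the `history_<run>.txt` / `model_<run>.h5` naming pattern come from the tree.

```python
from pathlib import Path

# Sub-directories taken from the tree above, relative to the project root.
PROJECT_DIRS = ("python", "data", "output/history", "output/models")


def create_project_structure(root):
    """Create the expected folders under root; existing folders are kept."""
    root = Path(root)
    for sub in PROJECT_DIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


def output_paths(root, run_name):
    """Return the history and model file paths for a given run name."""
    output = Path(root) / "output"
    return {
        "history": output / "history" / f"history_{run_name}.txt",
        "model": output / "models" / f"model_{run_name}.h5",
    }
```

For example, `output_paths("ATG-NLP", "some_run_name1")["model"]` points at `ATG-NLP/output/models/model_some_run_name1.h5`.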

------

### TODOs (most important from top to bottom):

- [x] Get benchmark datasets using the huggingface/datasets repository

- [x] Set up the project structure

- [x] Make Jupyter notebook and code deployable on Google Colab

- [x] Use pretrained BERT as tokenizer

- [x] Introduce simple neural network that performs some task in an end-to-end NLP pipeline

- [ ] Compare BERT with other relevant tokenizers

- [ ] Introduce more benchmark datasets, ideally that are suitable for a specific use case (currently not defined)

- [ ] Implement MTL designs to test hypothesis

- [ ] Write report

------

Made with :heart: and python
