{"id":18785306,"url":"https://github.com/andreped/atg-nlp","last_synced_at":"2025-06-26T11:34:09.008Z","repository":{"id":109858290,"uuid":"353380080","full_name":"andreped/ATG-NLP","owner":"andreped","description":"Code relevant for training and evaluating multi-task NLPs from free text using TensorFlow","archived":false,"fork":false,"pushed_at":"2021-03-31T17:52:47.000Z","size":126,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-23T18:48:26.332Z","etag":null,"topics":["bert-model","cnn","deep-learning","google-colab-notebook","natural-language-processing","nlp","tensorflow2"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andreped.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-31T14:13:28.000Z","updated_at":"2021-04-13T21:34:45.000Z","dependencies_parsed_at":"2023-06-12T07:45:25.522Z","dependency_job_id":null,"html_url":"https://github.com/andreped/ATG-NLP","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/andreped/ATG-NLP","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreped%2FATG-NLP","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreped%2FATG-NLP/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreped%2FATG-NLP/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreped%2FATG-
NLP/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andreped","download_url":"https://codeload.github.com/andreped/ATG-NLP/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andreped%2FATG-NLP/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262056490,"owners_count":23251681,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert-model","cnn","deep-learning","google-colab-notebook","natural-language-processing","nlp","tensorflow2"],"created_at":"2024-11-07T20:46:15.384Z","updated_at":"2025-06-26T11:34:08.989Z","avatar_url":"https://github.com/andreped.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ATG-NLP\nCode relevant for training and evaluating NLP models from free text using Auxiliary Task Guiding (ATG).\n\nThe code is inspired by [this](https://keras.io/examples/nlp/text_extraction_with_bert/) example from the Keras.io page. This project uses TensorFlow 2.4 to train neural networks. The datasets used are from the [Huggingface/datasets Hub](https://huggingface.co/datasets). We have used a pretrained [BERT](https://arxiv.org/abs/1810.04805) model from the [transformers](https://pypi.org/project/transformers/) library for feature extraction.\n\n------\n\n### How to use?\n\n1. Add the project structure to your Google Drive\n2. Open a Jupyter notebook from the notebooks/ folder in Google Colab\n3. Synchronize the Google Drive with the current Colab project\n\nThen you are all set. 
Simply train an NLP model by executing the Jupyter notebook.\n\n------\n\n### Project structure\n\n```\n+-- {ATG-NLP}/\n|   +-- python/\n|   |   +-- create_data.py\n|   |   +-- train.py\n|   |   +-- [...]\n|   +-- data/\n|   |   +-- folder_containing_the_dataset/\n|   |   |   +-- fold_name0/\n|   |   |   +-- fold_name1/\n|   |   |   +-- [...]\n|   +-- output/\n|   |   +-- history/\n|   |   |   +--- history_some_run_name1.txt\n|   |   |   +--- history_some_run_name2.txt\n|   |   |   +--- [...]\n|   |   +-- models/\n|   |   |   +--- model_some_run_name1.h5\n|   |   |   +--- model_some_run_name2.h5\n|   |   |   +--- [...]\n```\n\n------\n\n### TODOs (most important from top to bottom):\n\n- [x] Get benchmark datasets using the huggingface/datasets repository\n- [x] Set up the project structure\n- [x] Make the Jupyter notebook and code deployable on Google Colab\n- [x] Use pretrained BERT as tokenizer\n- [x] Introduce a simple neural network that performs some task in an end-to-end NLP pipeline\n- [ ] Compare BERT with other relevant tokenizers\n- [ ] Introduce more benchmark datasets, ideally ones that are suitable for a specific use case (currently not defined)\n- [ ] Implement MTL designs to test the hypothesis\n- [ ] Write report\n\n\n------\n\nMade with :heart: and Python\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreped%2Fatg-nlp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandreped%2Fatg-nlp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandreped%2Fatg-nlp/lists"}