{"id":18892973,"url":"https://github.com/dongjunlee/transformer-tensorflow","last_synced_at":"2025-04-07T19:15:10.378Z","repository":{"id":236588807,"uuid":"113988061","full_name":"DongjunLee/transformer-tensorflow","owner":"DongjunLee","description":"TensorFlow implementation of 'Attention Is All You Need (2017. 6)'","archived":false,"fork":false,"pushed_at":"2018-04-30T11:40:02.000Z","size":842,"stargazers_count":348,"open_issues_count":5,"forks_count":109,"subscribers_count":13,"default_branch":"master","last_synced_at":"2025-03-31T16:17:05.020Z","etag":null,"topics":["attention","deep-learning","experiments","hb-experiment","nlp","tensorflow","transformer","translation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DongjunLee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-12-12T12:33:32.000Z","updated_at":"2025-01-21T04:08:49.000Z","dependencies_parsed_at":"2024-05-02T12:32:15.047Z","dependency_job_id":null,"html_url":"https://github.com/DongjunLee/transformer-tensorflow","commit_stats":null,"previous_names":["dongjunlee/transformer-tensorflow"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DongjunLee%2Ftransformer-tensorflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DongjunLee%2Ftransformer-tensorflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DongjunLee%2Ftransformer-tensorflow/releases","manifests_url":"https:
//repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DongjunLee%2Ftransformer-tensorflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DongjunLee","download_url":"https://codeload.github.com/DongjunLee/transformer-tensorflow/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247713258,"owners_count":20983683,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","deep-learning","experiments","hb-experiment","nlp","tensorflow","transformer","translation"],"created_at":"2024-11-08T08:06:54.064Z","updated_at":"2025-04-07T19:15:10.327Z","avatar_url":"https://github.com/DongjunLee.png","language":"Python","readme":"# transformer [![hb-research](https://img.shields.io/badge/hb--research-experiment-green.svg?style=flat\u0026colorA=448C57\u0026colorB=555555)](https://github.com/hb-research)\n\nTensorFlow implementation of [Attention Is All You Need](https://arxiv.org/abs/1706.03762). (2017. 
6)\n\n![images](images/transformer-architecture.png)\n\n## Requirements\n\n- Python 3.6\n- TensorFlow 1.8\n- [hb-config](https://github.com/hb-research/hb-config) (Singleton Config)\n- nltk (tokenizer and BLEU score)\n- tqdm (progress bar)\n- [Slack Incoming Webhook URL](https://my.slack.com/services/new/incoming-webhook/)\n\n\n## Project Structure\n\nProject initialized with [hb-base](https://github.com/hb-research/hb-base)\n\n    .\n    ├── config                  # Config files (.yml, .json) used with hb-config\n    ├── data                    # dataset path\n    ├── notebooks               # Prototyping with numpy or tf.InteractiveSession\n    ├── transformer             # transformer architecture graphs (from input to logits)\n        ├── __init__.py             # Graph logic\n        ├── attention.py            # Attention (multi-head, scaled dot-product, etc.)\n        ├── encoder.py              # Encoder logic\n        ├── decoder.py              # Decoder logic\n        └── layer.py                # Layers (FFN)\n    ├── data_loader.py          # raw_data -\u003e processed_data -\u003e generate_batch (using Dataset)\n    ├── hook.py                 # training or test hooks (e.g. 
print_variables)\n    ├── main.py                 # define experiment_fn\n    └── model.py                # define EstimatorSpec\n\nReference: [hb-config](https://github.com/hb-research/hb-config), [Dataset](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_generator), [experiment_fn](https://www.tensorflow.org/api_docs/python/tf/contrib/learn/Experiment), [EstimatorSpec](https://www.tensorflow.org/api_docs/python/tf/estimator/EstimatorSpec)\n\n## Todo\n\n- Train and evaluate with the 'WMT German-English (2016)' dataset\n\n## Config\n\nThe entire **experimental environment** can be controlled via the config files.\n\nexample: check-tiny.yml\n\n```yml\ndata:\n  base_path: 'data/'\n  raw_data_path: 'tiny_kor_eng'\n  processed_path: 'tiny_processed_data'\n  word_threshold: 1\n\n  PAD_ID: 0\n  UNK_ID: 1\n  START_ID: 2\n  EOS_ID: 3\n\nmodel:\n  batch_size: 4\n  num_layers: 2\n  model_dim: 32\n  num_heads: 4\n  linear_key_dim: 20\n  linear_value_dim: 24\n  ffn_dim: 30\n  dropout: 0.2\n\ntrain:\n  learning_rate: 0.0001\n  optimizer: 'Adam'  # one of 'Adagrad', 'Adam', 'Ftrl', 'Momentum', 'RMSProp', 'SGD'\n  \n  train_steps: 15000\n  model_dir: 'logs/check_tiny'\n  \n  save_checkpoints_steps: 1000\n  check_hook_n_iter: 100\n  min_eval_frequency: 100\n  \n  print_verbose: True\n  debug: False\n  \nslack:\n  webhook_url: \"\"  # notify you via Slack webhook after training\n```\n\n* debug mode: uses [tfdbg](https://www.tensorflow.org/programmers_guide/debugger)\n* `check-tiny` is a dataset of about **30 sentences** translated from Korean into English. 
(recommended reading :) )\n\n## Usage\n\nInstall requirements.\n\n```pip install -r requirements.txt```\n\nThen, pre-process the raw data.\n\n```python data_loader.py --config check-tiny```\n\nFinally, train and evaluate the model.\n\n```python main.py --config check-tiny --mode train_and_evaluate```\n\n\nAlternatively, you can use the [IWSLT'15 English-Vietnamese](https://nlp.stanford.edu/projects/nmt/) dataset.\n\n```\nsh prepare-iwslt15.en-vi.sh                                        # download dataset\npython data_loader.py --config iwslt15-en-vi                       # preprocessing\npython main.py --config iwslt15-en-vi --mode train_and_evaluate    # start training\n```\n\n### Predict\n\nAfter training, you can test the model.\n\n- command\n\n```bash\npython predict.py --config {config} --src {src_sentence}\n```\n\n- example\n\n```bash\n$ python predict.py --config check-tiny --src \"안녕하세요. 반갑습니다.\"\n\n------------------------------------\nSource: 안녕하세요. 반갑습니다.\n \u003e Result: Hello . I'm glad to see you . \u003c\\s\u003e vectors . \u003c\\s\u003e Hello locations . \u003c\\s\u003e will . \u003c\\s\u003e . \u003c\\s\u003e you . 
\u003c\\s\u003e\n```\n\n\n### Experiment modes\n\n:white_check_mark: : Working  \n:white_medium_small_square: : Not tested yet.\n\n\n- :white_check_mark: `evaluate` : Evaluates on the evaluation data.\n- :white_medium_small_square: `extend_train_hooks` : Extends the hooks for training.\n- :white_medium_small_square: `reset_export_strategies` : Resets the export strategies with the new_export_strategies.\n- :white_medium_small_square: `run_std_server` : Starts a TensorFlow server and joins the serving thread.\n- :white_medium_small_square: `test` : Tests training, evaluating and exporting the estimator for a single step.\n- :white_check_mark: `train` : Fits the estimator using the training data.\n- :white_check_mark: `train_and_evaluate` : Interleaves training and evaluation.\n\n---\n\n### TensorBoard\n\n```tensorboard --logdir logs```\n\n- check-tiny example\n\n![images](images/check_tiny_tensorboard.png)\n\n\n## Reference\n\n- [hb-research/notes - Attention Is All You Need](https://github.com/hb-research/notes/blob/master/notes/transformer.md)\n- [Paper - Attention Is All You Need](https://arxiv.org/abs/1706.03762) (2017. 6) by A. Vaswani et al. (Google Brain team)\n- [tensor2tensor](https://github.com/tensorflow/tensor2tensor) - A library for generalized sequence-to-sequence models (official code)\n\n## Author\n\nDongjun Lee (humanbrain.djlee@gmail.com)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdongjunlee%2Ftransformer-tensorflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdongjunlee%2Ftransformer-tensorflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdongjunlee%2Ftransformer-tensorflow/lists"}