{"id":15625822,"url":"https://github.com/graykode/xlnet-pytorch","last_synced_at":"2025-04-05T15:08:51.280Z","repository":{"id":40545810,"uuid":"193910364","full_name":"graykode/xlnet-Pytorch","owner":"graykode","description":"Simple XLNet implementation with Pytorch Wrapper","archived":false,"fork":false,"pushed_at":"2019-07-03T16:09:50.000Z","size":573,"stargazers_count":582,"open_issues_count":15,"forks_count":107,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-03-29T14:11:19.021Z","etag":null,"topics":["bert","natural-language-processing","nlp","pytorch","xlnet","xlnet-pytorch"],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/1906.08237.pdf","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/graykode.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-26T13:31:19.000Z","updated_at":"2025-03-22T02:24:36.000Z","dependencies_parsed_at":"2022-08-09T22:50:28.096Z","dependency_job_id":null,"html_url":"https://github.com/graykode/xlnet-Pytorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fxlnet-Pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fxlnet-Pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fxlnet-Pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/graykode%2Fxlnet-Pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/graykode","download_url":"http
s://codeload.github.com/graykode/xlnet-Pytorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247353746,"owners_count":20925329,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","natural-language-processing","nlp","pytorch","xlnet","xlnet-pytorch"],"created_at":"2024-10-03T10:05:48.527Z","updated_at":"2025-04-05T15:08:51.241Z","avatar_url":"https://github.com/graykode.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"## XLNet-Pytorch [arxiv:1906.08237](https://arxiv.org/pdf/1906.08237.pdf)\n\n**Simple XLNet implementation with Pytorch Wrapper!**\n\n#### You can see how the XLNet architecture works in pre-training with a small batch size (=1) example.\n\n#### Usage\n\n```shell\n$ git clone https://github.com/graykode/xlnet-Pytorch \u0026\u0026 cd xlnet-Pytorch\n\n# Install the pretrained BERT tokenizer (used as the subword tokenizer)\n$ pip install pytorch_pretrained_bert\n\n$ python main.py --data ./data.txt --tokenizer bert-base-uncased \\\n   --seq_len 512 --reuse_len 256 --perm_size 256 \\\n   --bi_data True --mask_alpha 6 --mask_beta 1 \\\n   --num_predict 85 --mem_len 384 --num_epoch 100\n```\n\nYou can also run the code easily in [Google Colab](https://colab.research.google.com/github/graykode/xlnet-Pytorch/blob/master/XLNet.ipynb).\n\n- Hyperparameters for pretraining in the paper.\n\n\u003cp align=\"center\"\u003e\u003cimg width=\"300\" src=\"images/hyperparameters.png\" /\u003e \u003c/p\u003e\n\n#### Options\n\n- `--data`(String) : `.txt` file to train on. 
Multiline text is fine. Also, one file will be one batch tensor. Default : `data.txt`\n- `--tokenizer`(String) : I just used [huggingface/pytorch-pretrained-BERT's Tokenizer](https://github.com/huggingface/pytorch-pretrained-BERT) as the subword tokenizer (I'll switch it to SentencePiece soon). You can choose from `bert-base-uncased`, `bert-large-uncased`, `bert-base-cased`, `bert-large-cased`. Default : `bert-base-uncased`\n- `--seq_len`(Integer) : Sequence length. Default : `512`\n- `--reuse_len`(Integer) : Number of tokens that can be reused as memory. Could be half of `seq_len`. Default : `256`\n- `--perm_size`(Integer) : Length of the longest permutation. Could be set to `reuse_len`. Default : `256`\n- `--bi_data`(Boolean) : Whether to create bidirectional data. If `bi_data` is `True`, `biz` (batch size) should be an even number. Default : `False`\n- `--mask_alpha`(Integer) : How many tokens form a group. Default : `6`\n- `--mask_beta`(Integer) : How many tokens to mask within each group. Default : `1`\n- `--num_predict`(Integer) : Number of tokens to predict. In the paper, this is called Partial Prediction. Default : `85`\n- `--mem_len`(Integer) : Number of steps to cache in the Transformer-XL architecture. Default : `384`\n- `--num_epoch`(Integer) : Number of epochs. Default : `100`\n\n\n\n## What is XLNet?\n\n**XLNet** is a new unsupervised language representation learning method based on a novel generalized permutation language modeling objective. 
Additionally, XLNet employs [Transformer-XL](https://arxiv.org/abs/1901.02860) as the backbone model, exhibiting excellent performance for language tasks involving long context.\n\n- [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237)\n- [Paper Author's XLNet Github](https://github.com/zihangdai/xlnet)\n\n| Model | MNLI     | QNLI     | QQP      | RTE      | SST-2    | MRPC     | CoLA     | STS-B    |\n| ----- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |\n| BERT  | 86.6     | 92.3     | 91.3     | 70.4     | 93.2     | 88.0     | 60.6     | 90.0     |\n| XLNet | **89.8** | **93.9** | **91.8** | **83.8** | **95.6** | **89.2** | **63.6** | **91.8** |\n\n\n\n### Keywords in XLNet\n\n1. How did XLNet benefit from Auto-Regression and Auto-Encoding models?\n\n   - Auto-Regression Model\n     ![](images/ARmodel.png)\n   - Auto-Encoding Model\n     ![](images/AEmodel.png)\n\n2. Permutation Language Modeling with Partial Prediction\n   - Permutation Language Modeling\n    ![](images/PLM.png)\n   \n   - Partial Prediction\n    ![](images/ParPrediction.png)\n  \n3. Two-Stream Self-Attention with Target-Aware Representation\n\n   - Two-Stream Self-Attention\n\n     ![](images/twoattn.png)\n\n   - Target-Aware Representation\n\n     ![](images/target-aware.png)\n\n\n\n## Author\n\n- Because the original repository is under the **Apache 2.0 license**, this repository is under the same license.\n- Tae Hwan Jung(Jeff Jung) @graykode, Kyung Hee Univ CE(Undergraduate).\n- Author Email : [nlkey2022@gmail.com](mailto:nlkey2022@gmail.com)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraykode%2Fxlnet-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgraykode%2Fxlnet-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgraykode%2Fxlnet-pytorch/lists"}