{"id":13405712,"url":"https://github.com/asyml/texar","last_synced_at":"2025-05-14T19:05:46.261Z","repository":{"id":39176308,"uuid":"98052177","full_name":"asyml/texar","owner":"asyml","description":"Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow.  This is part of the CASL project: http://casl-project.ai/","archived":false,"fork":false,"pushed_at":"2021-08-26T09:49:50.000Z","size":14241,"stargazers_count":2388,"open_issues_count":40,"forks_count":372,"subscribers_count":77,"default_branch":"master","last_synced_at":"2025-04-13T13:19:12.494Z","etag":null,"topics":["bert","casl-project","data-processing","deep-learning","dialog-systems","gpt-2","machine-learning","machine-translation","natural-language-processing","python","tensorflow","texar","text-data","text-generation","xlnet"],"latest_commit_sha":null,"homepage":"https://asyml.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/asyml.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-22T19:02:05.000Z","updated_at":"2025-03-28T05:12:58.000Z","dependencies_parsed_at":"2022-09-26T20:21:41.283Z","dependency_job_id":null,"html_url":"https://github.com/asyml/texar","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asyml%2Ftexar","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asyml%2Ftexar/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asyml%2Ftexar/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/repositories/asyml%2Ftexar/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/asyml","download_url":"https://codeload.github.com/asyml/texar/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248717246,"owners_count":21150390,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","casl-project","data-processing","deep-learning","dialog-systems","gpt-2","machine-learning","machine-translation","natural-language-processing","python","tensorflow","texar","text-data","text-generation","xlnet"],"created_at":"2024-07-30T19:02:09.274Z","updated_at":"2025-04-13T13:19:25.627Z","avatar_url":"https://github.com/asyml.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n   \u003cimg src=\"./docs/_static/img/logo_h_035.png\"\u003e\u003cbr\u003e\u003cbr\u003e\n\u003c/div\u003e\n \n-----------------\n\n\n[![pypi](https://img.shields.io/pypi/v/texar.svg)](https://pypi.python.org/pypi/texar)\n[![Build Status](https://travis-ci.org/asyml/texar.svg?branch=master)](https://travis-ci.org/asyml/texar)\n[![codecov](https://codecov.io/gh/asyml/texar/branch/master/graph/badge.svg)](https://codecov.io/gh/asyml/texar)\n[![Documentation Status](https://readthedocs.org/projects/texar/badge/?version=latest)](https://texar.readthedocs.io/en/latest/?badge=latest)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/asyml/texar/blob/master/LICENSE)\n \n\n**Texar** is a toolkit aiming to support a broad set of machine learning, especially natural 
language processing and text generation tasks. Texar provides a library of easy-to-use ML modules and functionalities for composing a wide variety of models and algorithms. The tool is designed for both researchers and practitioners for fast prototyping and experimentation.\n\nTexar was originally developed and is \nactively contributed to by [Petuum](https://petuum.com/) and [CMU](https://www.cmu.edu/) in collaboration with other institutes.\nA mirror of this repository is maintained by [Petuum Open Source](https://github.com/petuum).\n\n### Key Features\n* **Two Versions, (Mostly) Same Interfaces**. Texar-TensorFlow (this repo) and **[Texar-PyTorch](https://github.com/asyml/texar-pytorch)** have mostly the same interfaces. Both combine the best designs of TF and PyTorch:\n  - Interfaces and variable sharing in the *PyTorch convention*\n  - Excellent factorization and rich functionalities in the *TF convention*.\n* **Rich Pre-trained Models, Rich Usage with Uniform Interfaces**. BERT, GPT2, XLNet, etc., for encoding, classification, generation, and composing complex models with other Texar components!\n* **Fully Customizable** at multiple abstraction levels -- both novice-friendly and expert-friendly. \n  - Free to plug in arbitrary external modules, since Texar is fully compatible with the native TF/PyTorch APIs. \n* **Versatile**, supporting a broad range of tasks, models, algorithms, data processing, evaluation, etc. \n   - encoder(s) to decoder(s), sequential- and self-attention, memory, hierarchical models, classifiers... \n   - maximum likelihood learning, reinforcement learning, adversarial learning, probabilistic modeling, ... \n* **Modularized** for maximal re-use and clean APIs, based on a principled decomposition of the *Learning-Inference-Model Architecture*. 
\n* **Distributed** model training with multiple GPUs.\n* Clean, detailed [documentation](https://texar.readthedocs.io) and rich [examples](./examples).\n\n\n\u003cdiv align=\"center\"\u003e\n   \u003cimg src=\"./docs/_static/img/texar_stack.png\"\u003e\u003cbr\u003e\u003cbr\u003e\n\u003c/div\u003e \n\n### Library API Example\nBuild an encoder-decoder model with maximum likelihood learning:\n```python\nimport tensorflow as tf\nimport texar.tf as tx\n\n# Data\ndata = tx.data.PairedTextData(hparams=hparams_data) # a dict of hyperparameters\niterator = tx.data.DataIterator(data)\nbatch = iterator.get_next()                         # get a data mini-batch\n\n# Model architecture\nembedder = tx.modules.WordEmbedder(data.target_vocab.size, hparams=hparams_emb)\nencoder = tx.modules.TransformerEncoder(hparams=hparams_enc)\noutputs_enc = encoder(inputs=embedder(batch['source_text_ids']),  # call as a function\n                      sequence_length=batch['source_length'])\n\ndecoder = tx.modules.TransformerDecoder(\n    output_layer=tf.transpose(embedder.embedding),  # tie input embedding w/ output layer\n    hparams=hparams_decoder)\noutputs, _, _ = decoder(memory=outputs_enc,\n                        memory_sequence_length=batch['source_length'],\n                        inputs=embedder(batch['target_text_ids']),\n                        sequence_length=batch['target_length']-1,\n                        decoding_strategy='train_greedy')    # teacher-forcing decoding\n\n# Loss for maximum likelihood learning\nloss = tx.losses.sequence_sparse_softmax_cross_entropy(\n    labels=batch['target_text_ids'][:, 1:],\n    logits=outputs.logits,\n    sequence_length=batch['target_length']-1)  # automatic sequence masks\n\n# Beam search decoding\noutputs_bs, _, _ = tx.modules.beam_search_decode(\n    decoder,\n    embedding=embedder,\n    start_tokens=[data.target_vocab.bos_token_id]*num_samples,\n    end_token=data.target_vocab.eos_token_id)\n```\nThe same 
model, but with adversarial learning:\n```python\nhelper = tx.modules.GumbelSoftmaxTrainingHelper( # Gumbel-softmax decoding\n    start_tokens=[BOS]*batch_size, end_token=EOS, embedding=embedder)\noutputs, _ = decoder(helper=helper)            # automatic re-use of the decoder variables\n\ndiscriminator = tx.modules.BertClassifier(hparams=hparams_bert)        # pre-trained model\n\nG_loss, D_loss = tx.losses.binary_adversarial_losses(\n    real_data=batch['target_text_ids'][:, 1:],\n    fake_data=outputs.sample_id,\n    discriminator_fn=discriminator)\n```\nThe same model, but with RL policy gradient learning:\n```python\nagent = tx.agents.SeqPGAgent(samples=outputs.sample_id,\n                             logits=outputs.logits,\n                             sequence_length=batch['target_length']-1,\n                             hparams=config_model.agent)\n```\nMany more examples are available [here](./examples).\n  \n### Installation\n\n**(Note: Texar\u003e0.2.3 requires Python 3.6 or 3.7. To use with older Python versions, please use Texar\u003c=0.2.3.)**\n\nTexar requires:\n\n* `tensorflow \u003e= 1.10.0 (but \u003c 2.0.0)`. Follow the [tensorflow official instructions](https://www.tensorflow.org/install) to install the appropriate version.\n* `tensorflow_probability \u003e= 0.3.0 (but \u003c 0.8.0)`. 
Follow the [tensorflow_probability official instructions](https://www.tensorflow.org/probability/install) to install.\n\nAfter `tensorflow` and `tensorflow_probability` are installed, install Texar from PyPI:\n```bash\npip install texar\n```\n\nTo use cutting-edge features or develop locally, install from source:\n```bash\ngit clone https://github.com/asyml/texar.git\ncd texar\npip install .\n```\n\n### Getting Started\n* [Examples](./examples)\n* [Documentation](https://texar.readthedocs.io)\n\n### Reference\nIf you use Texar, please cite the [tech report](https://arxiv.org/abs/1809.00794) with the following BibTeX entry:\n```\nTexar: A Modularized, Versatile, and Extensible Toolkit for Text Generation\nZhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He, Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, Wanrong Zhu, Devendra Sachan and Eric Xing\nACL 2019\n\n@inproceedings{hu2019texar,\n  title={Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation},\n  author={Hu, Zhiting and Shi, Haoran and Tan, Bowen and Wang, Wentao and Yang, Zichao and Zhao, Tiancheng and He, Junxian and Qin, Lianhui and Wang, Di and others},\n  booktitle={ACL 2019, System Demonstrations},\n  year={2019}\n}\n```\n\n### License\n[Apache License 2.0](./LICENSE)\n\n### Companies and Universities Supporting Texar\n\u003cp float=\"left\"\u003e\n   \u003cimg src=\"https://github.com/asyml/texar/blob/master/docs/_static/img/Petuum.png\" width=\"200\" align=\"top\"\u003e\n   \u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\u0026nbsp;\n   \u003cimg src=\"https://asyml.io/assets/institutions/cmu.png\" width=\"200\" align=\"top\"\u003e\n\u003c/p\u003e\n","funding_links":[],"categories":["Toolbox","Python","BERT Text Generation Task:","Tasks","Text Data and NLP","Neural Natural Language 
Generation"],"sub_categories":["Libraries","Text Generation"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasyml%2Ftexar","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fasyml%2Ftexar","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasyml%2Ftexar/lists"}