{"id":13546882,"url":"https://github.com/textflint/textflint","last_synced_at":"2025-04-02T19:32:04.458Z","repository":{"id":43179010,"uuid":"345074406","full_name":"textflint/textflint","owner":"textflint","description":"Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing","archived":false,"fork":false,"pushed_at":"2022-09-27T17:09:16.000Z","size":12195,"stargazers_count":643,"open_issues_count":6,"forks_count":95,"subscribers_count":18,"default_branch":"master","last_synced_at":"2025-03-05T15:51:59.833Z","etag":null,"topics":["adversarial-samples","attack","data-augmentation","model-robustness","robustness-analysis","subpopulation","text-augmentation","text-transformations","transformation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/textflint.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-06T11:15:52.000Z","updated_at":"2025-03-03T03:41:39.000Z","dependencies_parsed_at":"2022-09-10T02:21:48.400Z","dependency_job_id":null,"html_url":"https://github.com/textflint/textflint","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/textflint%2Ftextflint","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/textflint%2Ftextflint/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/textflint%2Ftextflint/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/textflint%2Ftextflint/manifests","owner_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/owners/textflint","download_url":"https://codeload.github.com/textflint/textflint/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246880093,"owners_count":20848813,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adversarial-samples","attack","data-augmentation","model-robustness","robustness-analysis","subpopulation","text-augmentation","text-transformations","transformation"],"created_at":"2024-08-01T12:00:47.058Z","updated_at":"2025-04-02T19:31:59.448Z","avatar_url":"https://github.com/textflint.png","language":"Python","funding_links":[],"categories":["Python","🏹️ Adversarial Attack"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\u003cimg src=\"images/logo.png\" alt=\"Textflint Logo\" height=\"100\"\u003e\u003c/p\u003e\n\n\u003ch3 align=\"center\"\u003eUnified Multilingual Robustness Evaluation Toolkit \n  for Natural Language Processing\u003c/h3\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca\u003e\n    \u003cimg src=\"https://github.com/textflint/textflint/actions/workflows/python-package.yml/badge.svg\" alt=\"GitHub Actions Build Status\"\u003e\n  \u003c/a\u003e\n\n  \u003ca href=\"https://www.textflint.io/textflint\"\u003e\n  \t\u003cimg alt=\"Website\" src=\"https://img.shields.io/website?up_message=online\u0026url=https%3A%2F%2Fwww.textflint.io%2F\"\u003e\n  \u003c/a\u003e\n\n  \u003ca\u003e\n  \t\u003cimg alt=\"License\" src=\"https://img.shields.io/badge/license-GPL%20v3-brightgreen\"\u003e\n  \u003c/a\u003e\n\n  \u003ca href=\"https://badge.fury.io/py/textflint\"\u003e\n  \t\u003cimg alt=\"GitHub release (latest by date)\" \tsrc=\"https://img.shields.io/github/v/release/textflint/textflint?label=release\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\nTextFlint is a multilingual robustness evaluation platform for natural language processing that unifies text **transformation**, **subpopulation**, **adversarial attack**, and their combinations to provide a comprehensive robustness analysis. TextFlint currently supports 13 NLP tasks.\n\n\u003e If you're looking for robustness evaluation results of SOTA models, see the [TextFlint IO](https://www.textflint.io/textflint) page.\n\n## Features\n\n- **Full coverage of transformation types**, including 20 general transformations, 8 subpopulations and 60 task-specific transformations, as well as thousands of their combinations.\n- **Subpopulation** identifies the specific parts of a dataset on which the target model performs poorly.\n- **Adversarial attack** aims to find perturbations of an input text that fool the given model.\n- **Complete analytical report** that pinpoints your model's shortcomings, such as problems with lexical or syntactic rules.\n\n## Online Demo\n\nYou can try most transformations directly in our [online demo](https://www.textflint.io/tutorials).\n\n## Table of Contents\n\n- [Setup](#setup)\n- [Usage](#usage)\n- [Architecture](#architecture)\n- [Learn More](#learn-more)\n- [Contributing](#contributing)\n- [Citation](#citation)\n\n## Setup\n\nTextFlint requires **Python \u003e= 3.7**; we recommend installing it with `pip`.\n\n```shell\npip install textflint\n```\n\nOnce TextFlint is installed, you can run it from the command line (`textflint ...`) or integrate it into another NLP project.\n\n## Usage\n\n### Workflow\n\n\u003cimg src=\"images/workflow.png\" style=\"zoom:50%;\" /\u003e\n\nThe general workflow of TextFlint is shown above. Evaluating a target model can be divided into three steps:\n\n1. For input preparation, the original test dataset, which is loaded by `Dataset`, must first be formatted as a series of `JSON` objects. You can use the built-in `Dataset` by following these [instructions](docs/user/components/4_Sample_Dataset.ipynb). The TextFlint configuration is specified by `Config`, and the target model is loaded as a `FlintModel`.\n2. During adversarial sample generation, multi-perspective transformations (i.e., [80+ Transformations](docs/user/components/transformation.md), [Subpopulation](docs/user/components/subpopulation.md) and [AttackRecipe](https://github.com/QData/TextAttack)) are applied to `Dataset` to generate transformed samples. In addition, to ensure the semantic and grammatical correctness of transformed samples, [Validator](docs/user/components/validator.md) computes a confidence score for each sample and filters out unacceptable ones.\n3. Lastly, `Analyzer` collects evaluation results and `ReportGenerator` automatically generates a comprehensive report of model robustness.\n\nFor example, the chart below shows the performance of `XLNET` on the `IMDB` dataset for the Sentiment Analysis (SA) task under different types of `Transformation`/`Subpopulation`/`AttackRecipe`.\n\n\u003cimg src=\"images/report.png\" alt=\"\" style=\"zoom:100%\" /\u003e\n\nWe provide tutorials that walk through the whole TextFlint pipeline on various tasks, including:\n\n* [Machine Reading Comprehension](docs/user/tutorials/9_MRC.ipynb)\n* [Part-of-speech Tagging](docs/user/tutorials/7_BERT%20for%20POS%20tagging.ipynb)\n* [Named Entity Recognition](docs/user/tutorials/11_NER.ipynb)\n* [Chinese Word Segmentation](docs/user/tutorials/10_CWS.ipynb)\n\n### Quick Start\n\nUsing TextFlint to verify the robustness of a specific model is as simple as running the following command:\n\n```shell\n$ textflint --dataset input_file --config config.json\n```\n\nwhere *input\\_file* is an input file in CSV or JSON format and *config.json* is a configuration file containing generation and target-model options. Transformed datasets are saved to the output directory specified in *config.json*.\n\nBecause sample generation is decoupled from model verification, **TextFlint** can be used inside another NLP project with just a few lines of code.\n\n```python\nfrom textflint import Engine\n\ndata_path = 'input.json'\nconfig = 'config.json'\nengine = Engine()\nengine.run(data_path, config)\n```\n\nFor more details on TextFlint's input and output formats, please refer to the [IO format document](docs/user/components/IOFormat.md).\n\n## Architecture\n\n\u003cimg src=\"images/architecture.png\" style=\"zoom:50%;\" /\u003e\n\n***Input layer:*** receives textual datasets and models as input, represented as `Dataset` and `FlintModel` respectively.\n\n- **`Dataset`**: a container that provides efficient and convenient operation interfaces for `Sample`. `Dataset` supports loading, verifying, and saving data in JSON or CSV format for various NLP tasks.\n- **`FlintModel`**: a target model used in an adversarial attack.\n\n***Generation layer:*** consists of four main parts:\n\n- **`Subpopulation`**: generates a subset of a `Dataset`.\n- **`Transformation`**: transforms each sample of `Dataset` if it can be transformed.\n- **`AttackRecipe`**: attacks the `FlintModel` and generates a `Dataset` of adversarial examples.\n- **`Validator`**: verifies the quality of samples generated by `Transformation` and `AttackRecipe`.\n\n\u003e TextFlint provides an interface for integrating the easy-to-use adversarial attack recipes implemented in `textattack`. Refer to [TextAttack](https://github.com/QData/TextAttack) for more information about the supported `AttackRecipe`s.\n\n***Report layer:*** analyzes model testing results and provides a robustness report for users.\n\n## Learn More\n\n| Section | Description |\n| --- | --- |\n| [Documentation](https://textflint.readthedocs.io/) | Full API documentation and tutorials |\n| [Tutorial](https://github.com/textflint/textflint/tree/master/docs/user) | Tutorials on TextFlint components and the pipeline |\n| [Website](https://www.textflint.io/textflint) | Evaluation results of SOTA models and downloads of transformed data |\n| [Online Demo](https://www.textflint.io/tutorials) | Interactive demo to try single text transformations |\n| [Paper](https://aclanthology.org/2021.acl-demo.41.pdf) | Our system paper, accepted at ACL 2021 |\n\n## Contributing\n\nWe welcome community contributions to TextFlint in the form of bug fixes 🛠️ and new features 💡! If you want to contribute, please first read [our contribution guidelines](CONTRIBUTING.md).\n\n## Citation\n\nIf you use TextFlint in your work, please cite our [ACL 2021 TextFlint demo paper](https://aclanthology.org/2021.acl-demo.41.pdf):\n\n```bibtex\n@inproceedings{wang-etal-2021-textflint,\n    title = {TextFlint: Unified Multilingual Robustness Evaluation Toolkit for Natural Language Processing},\n    author = {Wang, Xiao  and Liu, Qin  and Gui, Tao  and Zhang, Qi and others},\n    booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: System Demonstrations},\n    month = {aug},\n    year = {2021},\n    address = {Online},\n    publisher = {Association for Computational Linguistics},\n    url = {https://aclanthology.org/2021.acl-demo.41},\n    doi = {10.18653/v1/2021.acl-demo.41},\n    pages = {347--355}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftextflint%2Ftextflint","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftextflint%2Ftextflint","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftextflint%2Ftextflint/lists"}