{"id":13676589,"url":"https://github.com/marcotcr/checklist","last_synced_at":"2025-05-14T17:05:44.762Z","repository":{"id":37383696,"uuid":"246097498","full_name":"marcotcr/checklist","owner":"marcotcr","description":"Beyond Accuracy: Behavioral Testing of NLP models with CheckList","archived":false,"fork":false,"pushed_at":"2024-01-09T01:46:07.000Z","size":131068,"stargazers_count":2031,"open_issues_count":11,"forks_count":208,"subscribers_count":26,"default_branch":"master","last_synced_at":"2025-04-13T01:59:22.842Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/marcotcr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-09T17:18:49.000Z","updated_at":"2025-04-09T05:27:29.000Z","dependencies_parsed_at":"2023-02-14T18:31:49.886Z","dependency_job_id":"6d4be6ca-9900-40f0-a3b0-c4772ea5c70c","html_url":"https://github.com/marcotcr/checklist","commit_stats":{"total_commits":201,"total_committers":13,"mean_commits":"15.461538461538462","dds":0.4626865671641791,"last_synced_commit":"3edd07c9a84e6c6657333450d4d0e70ecb0c00d9"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcotcr%2Fchecklist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcotcr%2Fchecklist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/marcotcr%2Fchecklist/releases","manifests_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/marcotcr%2Fchecklist/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/marcotcr","download_url":"https://codeload.github.com/marcotcr/checklist/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254190396,"owners_count":22029632,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T13:00:30.054Z","updated_at":"2025-05-14T17:05:44.738Z","avatar_url":"https://github.com/marcotcr.png","language":"Jupyter Notebook","readme":"# CheckList\r\nThis repository contains code for testing NLP Models as described in the following paper:  \r\n\u003e[Beyond Accuracy: Behavioral Testing of NLP models with CheckList](http://homes.cs.washington.edu/~marcotcr/acl20_checklist.pdf)  \r\n\u003e Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, Sameer Singh\r\n\u003e Association for Computational Linguistics (ACL), 2020\r\n\r\nBibtex for citations:\r\n```bibtex\r\n @inproceedings{checklist:acl20,  \r\n author = {Marco Tulio Ribeiro and Tongshuang Wu and Carlos Guestrin and Sameer Singh},  \r\n title = {Beyond Accuracy: Behavioral Testing of NLP models with CheckList},  \r\n booktitle = {Association for Computational Linguistics (ACL)},  \r\n year = {2020}\r\n }\r\n```\r\n\r\n\r\nTable of Contents\r\n=================\r\n\r\n   * [CheckList](#checklist)\r\n      * [Table of Contents](#table-of-contents)\r\n      * [Installation](#installation)\r\n      * [Tutorials](#tutorials)\r\n      * [Paper tests](#paper-tests)\r\n         * [Notebooks: 
how we created the tests in the paper](#notebooks-how-we-created-the-tests-in-the-paper)\r\n         * [Replicating paper tests, or running them with new models](#replicating-paper-tests-or-running-them-with-new-models)\r\n            * [Sentiment Analysis](#sentiment-analysis)\r\n            * [QQP](#qqp)\r\n            * [SQuAD](#squad)\r\n            * [Testing huggingface transformer pipelines](#testing-huggingface-transformer-pipelines)\r\n      * [Code snippets](#code-snippets)\r\n         * [Templates](#templates)\r\n         * [RoBERTa suggestions](#roberta-suggestions)\r\n            * [Multilingual suggestions](#multilingual-suggestions)\r\n         * [Lexicons (somewhat multilingual)](#lexicons-somewhat-multilingual)\r\n         * [Perturbing data for INVs and DIRs](#perturbing-data-for-invs-and-dirs)\r\n         * [Creating and running tests](#creating-and-running-tests)\r\n         * [Custom expectation functions](#custom-expectation-functions)\r\n         * [Test Suites](#test-suites)\r\n      * [API reference](#api-reference)\r\n      * [Code of Conduct](#code-of-conduct)\r\n\r\n## Installation\r\nFrom pypi:  \r\n```bash\r\npip install checklist\r\njupyter nbextension install --py --sys-prefix checklist.viewer\r\njupyter nbextension enable --py --sys-prefix checklist.viewer\r\n```\r\nNote:  `--sys-prefix` to install into python’s sys.prefix, which is useful for instance in virtual environments, such as with conda or virtualenv. 
If you are not in such environments, please switch to `--user` to install into the user’s home jupyter directories.\r\n\r\nFrom source:\r\n```bash\r\ngit clone git@github.com:marcotcr/checklist.git\r\ncd checklist\r\npip install -e .\r\n```\r\nEither way, you need to install `pytorch` or `tensorflow` if you want to use masked language model suggestions:\r\n```bash\r\npip install torch\r\n```\r\nFor most tutorials, you also need to download a spacy model:\r\n```bash\r\npython -m spacy download en_core_web_sm\r\n```\r\n## Tutorials\r\nPlease note that the visualizations are implemented as ipywidgets, and don't work on colab or JupyterLab (use jupyter notebook). Everything else should work on these though.\r\n\r\n1. [Generating data](notebooks/tutorials/1.%20Generating%20data.ipynb)\r\n2. [Perturbing data](notebooks/tutorials/2.%20Perturbing%20data.ipynb)\r\n3. [Test types, expectation functions, running tests](notebooks/tutorials/3.%20Test%20types,%20expectation%20functions,%20running%20tests.ipynb)\r\n4. [The CheckList process](notebooks/tutorials/4.%20The%20CheckList%20process.ipynb)\r\n\r\n## Paper tests\r\n### Notebooks: how we created the tests in the paper\r\n1. [Sentiment analysis](notebooks/Sentiment.ipynb)\r\n2. [QQP](notebooks/QQP.ipynb)\r\n3. 
[SQuAD](notebooks/SQuAD.ipynb)\r\n\r\n### Replicating paper tests, or running them with new models\r\nFor all of these, you need to unpack the release data (in the main repo folder after cloning):\r\n```bash\r\ntar xvzf release_data.tar.gz\r\n```\r\n#### Sentiment Analysis\r\nLoading the suite:\r\n```python\r\nimport checklist\r\nfrom checklist.test_suite import TestSuite\r\nsuite_path = 'release_data/sentiment/sentiment_suite.pkl'\r\nsuite = TestSuite.from_file(suite_path)\r\n```\r\nRunning tests with precomputed `bert` predictions (replace `bert` on `pred_path` with `amazon`, `google`, `microsoft`, or `roberta` for others):\r\n```python\r\npred_path = 'release_data/sentiment/predictions/bert'\r\nsuite.run_from_file(pred_path, overwrite=True)\r\nsuite.summary() # or suite.visual_summary_table()\r\n```\r\nTo test your own model, get predictions for the texts in `release_data/sentiment/tests_n500` and save them in a file where each line has 4 numbers: the prediction (0 for negative, 1 for neutral, 2 for positive) and the prediction probabilities for (negative, neutral, positive).  
\r\nThen, update `pred_path` with this file and run the lines above.\r\n\r\n\r\n#### QQP\r\n```python\r\nimport checklist\r\nfrom checklist.test_suite import TestSuite\r\nsuite_path = 'release_data/qqp/qqp_suite.pkl'\r\nsuite = TestSuite.from_file(suite_path)\r\n```\r\nRunning tests with precomputed `bert` predictions (replace `bert` on `pred_path` with `roberta` if you want):\r\n```python\r\npred_path = 'release_data/qqp/predictions/bert'\r\nsuite.run_from_file(pred_path, overwrite=True, file_format='binary_conf')\r\nsuite.visual_summary_table()\r\n```\r\nTo test your own model, get predictions for pairs in `release_data/qqp/tests_n500` (format: tsv) and output them in a file where each line has a single number: the probability that the pair is a duplicate.\r\n\r\n#### SQuAD\r\n```python\r\nimport checklist\r\nfrom checklist.test_suite import TestSuite\r\nsuite_path = 'release_data/squad/squad_suite.pkl'\r\nsuite = TestSuite.from_file(suite_path)\r\n```\r\nRunning tests with precomputed `bert` predictions:\r\n```python\r\npred_path = 'release_data/squad/predictions/bert'\r\nsuite.run_from_file(pred_path, overwrite=True, file_format='pred_only')\r\nsuite.visual_summary_table()\r\n```\r\nTo test your own model, get predictions for pairs in `release_data/squad/squad.jsonl` (format: jsonl) or `release_data/squad/squad.json` (format: json, like SQuAD dev) and output them in a file where each line has a single string: the prediction span.\r\n\r\n\r\n#### Testing huggingface transformer pipelines\r\nSee [this notebook](notebooks/tutorials/5.%20Testing%20transformer%20pipelines.ipynb).\r\n\r\n##  Code snippets\r\n### Templates\r\nSee [1. 
Generating data](notebooks/tutorials/1.%20Generating%20data.ipynb) for more details.\r\n\r\n```python\r\nimport checklist\r\nfrom checklist.editor import Editor\r\nimport numpy as np\r\neditor = Editor()\r\nret = editor.template('{first_name} is {a:profession} from {country}.',\r\n                       profession=['lawyer', 'doctor', 'accountant'])\r\nnp.random.choice(ret.data, 3)\r\n```\r\n\u003e ['Mary is a doctor from Afghanistan.',  \r\n       'Jordan is an accountant from Indonesia.',  \r\n       'Kayla is a lawyer from Sierra Leone.']\r\n\r\n### RoBERTa suggestions\r\nSee [1. Generating data](notebooks/tutorials/1.%20Generating%20data.ipynb) for more details.  \r\nIn template:\r\n```python\r\nret = editor.template('This is {a:adj} {mask}.',  \r\n                      adj=['good', 'bad', 'great', 'terrible'])\r\nret.data[:3]\r\n```\r\n\r\n\u003e ['This is a good idea.',  \r\n 'This is a good sign.',  \r\n 'This is a good thing.']\r\n\r\nMultiple masks:\r\n```python\r\nret = editor.template('This is {a:adj} {mask} {mask}.',\r\n                      adj=['good', 'bad', 'great', 'terrible'])\r\nret.data[:3]\r\n```\r\n\u003e ['This is a good history lesson.',  \r\n 'This is a good chess move.',  \r\n 'This is a good news story.']\r\n\r\nGetting suggestions rather than filling out templates:\r\n```python\r\neditor.suggest('This is {a:adj} {mask}.',\r\n               adj=['good', 'bad', 'great', 'terrible'])[:5]\r\n```\r\n\u003e ['idea', 'sign', 'thing', 'example', 'start']\r\n\r\nGetting suggestions for replacements (only a single text allowed, no templates):\r\n```python\r\neditor.suggest_replace('This is a good movie.', 'good')[:5]\r\n```\r\n\u003e ['great', 'horror', 'bad', 'terrible', 'cult']\r\n\r\nGetting suggestions through jupyter visualization:  \r\n```python\r\neditor.visual_suggest('This is {a:mask} movie.')\r\n```\r\n![visual suggest](notebooks/tutorials/visual_suggest.gif )\r\n\r\n#### Multilingual suggestions\r\nJust initialize the editor with the 
`language` argument (should work with language names and iso 639-1 codes):\r\n\r\n```python\r\nimport checklist\r\nfrom checklist.editor import Editor\r\nimport numpy as np\r\n# in Portuguese\r\neditor = Editor(language='portuguese')\r\nret = editor.template('O João é um {mask}.',)\r\nret.data[:3]\r\n```\r\n\u003e ['O João é um português.',  \r\n'O João é um poeta.',  \r\n'O João é um brasileiro.']\r\n\r\n```python\r\n# in Chinese\r\neditor = Editor(language='chinese')\r\nret = editor.template('西游记的故事很{mask}。',)\r\nret.data[:3]\r\n```\r\n\u003e ['西游记的故事很精彩。',  \r\n'西游记的故事很真实。',  \r\n'西游记的故事很经典。']\r\n\r\n\r\nWe're using [FlauBERT](https://arxiv.org/abs/1912.05372) for french, [German BERT](https://deepset.ai/german-bert) for german, and [XLM-RoBERTa](https://github.com/pytorch/fairseq/tree/master/examples/xlmr) for everything else (click the link for a list of supported languages). We can't vouch for the quality of the suggestions in other languages, but it seems to work reasonably well for the languages we speak (although not as well as English).\r\n\r\n### Lexicons (somewhat multilingual)\r\n`editor.lexicons` is a dictionary, which can be used in templates. For example:\r\n```python\r\nimport checklist\r\nfrom checklist.editor import Editor\r\nimport numpy as np\r\n# Default: English\r\neditor = Editor()\r\nret = editor.template('{male1} went to see {male2} in {city}.', remove_duplicates=True)\r\nlist(np.random.choice(ret.data, 3))\r\n```\r\n\u003e ['Dan went to see Hugh in Riverside.',  \r\n 'Stephen went to see Eric in Omaha.',  \r\n 'Patrick went to see Nick in Kansas City.']\r\n\r\nPerson names and location (country, city) names are multilingual, depending on the `editor` language. 
We [got the data](notebooks/other/Acquiring%20multilingual%20lexicons%20from%20wikidata.ipynb) from [wikidata](https://www.wikidata.org), so there is a bias towards names on wikipedia.\r\n```python\r\neditor = Editor(language='german')\r\nret = editor.template('{male1} went to see {male2} in {city}.', remove_duplicates=True)\r\nlist(np.random.choice(ret.data, 3))\r\n```\r\n\u003e ['Rolf went to see Klaus in Leipzig.',  \r\n  'Richard went to see Jörg in Marl.',  \r\n  'Gerd went to see Fritz in Schwerin.']\r\n\r\nList of available lexicons:\r\n\r\n```python\r\neditor.lexicons.keys()\r\n```\r\n\u003e dict_keys(['male', 'female', 'first_name', 'first_pronoun', 'last_name', 'country', 'nationality', 'city', 'religion', 'religion_adj', 'sexual_adj', 'country_city', 'male_from', 'female_from', 'last_from'])\r\n\r\nSome of these cannot be used directly in templates because they are themselves dictionaries. For example, `male_from`, `female_from`, `last_from` and `country_city` are dictionaries from country to male names, female names, last names and most populous cities.  \r\nYou can call `editor.lexicons.male_from.keys()` for a list of country names. 
Example usage:\r\n```python\r\nimport numpy as np\r\ncountries = ['France', 'Germany', 'Brazil']\r\nfor country in countries:\r\n    ts = editor.template('{male} {last} is from {city}',\r\n                male=editor.lexicons.male_from[country],\r\n                last=editor.lexicons.last_from[country],\r\n                city=editor.lexicons.country_city[country],\r\n               )\r\n    print('Country: %s' % country)\r\n    print('\\n'.join(np.random.choice(ts.data, 3)))\r\n    print()\r\n```\r\n\u003e Country: France  \r\nJean-Jacques Brun is from Avignon  \r\nBruno Deschamps is from Vitry-sur-Seine  \r\nErnest Picard is from Chambéry\r\n\u003e\r\n\u003e Country: Germany  \r\nRainer Braun is from Schwerin  \r\nMarkus Brandt is from Gera  \r\nReinhard Busch is from Erlangen  \r\n\u003e\r\n\u003e Country: Brazil  \r\nGilberto Martins is from Anápolis  \r\nAlfredo Guimarães is from Indaiatuba  \r\nJorge Barreto is from Fortaleza  \r\n\r\n\r\n### Perturbing data for INVs and DIRs\r\nSee [2.Perturbing data](notebooks/tutorials/2.%20Perturbing%20data.ipynb) for more details.  
\r\nCustom perturbation function:\r\n```python\r\nimport re\r\nimport checklist\r\nfrom checklist.perturb import Perturb\r\ndef replace_john_with_others(x, *args, **kwargs):\r\n    # Returns empty (if John is not present) or list of strings with John replaced by Luke and Mark\r\n    if not re.search(r'\\bJohn\\b', x):\r\n        return None\r\n    return [re.sub(r'\\bJohn\\b', n, x) for n in ['Luke', 'Mark']]\r\n\r\ndataset = ['John is a man', 'Mary is a woman', 'John is an apostle']\r\nret = Perturb.perturb(dataset, replace_john_with_others)\r\nret.data\r\n```\r\n\u003e [['John is a man', 'Luke is a man', 'Mark is a man'],  \r\n ['John is an apostle', 'Luke is an apostle', 'Mark is an apostle']]\r\n\r\nGeneral purpose perturbations (see tutorial for more):\r\n```python\r\nimport spacy\r\nnlp = spacy.load('en_core_web_sm')\r\npdataset = list(nlp.pipe(dataset))\r\nret = Perturb.perturb(pdataset, Perturb.change_names, n=2)\r\nret.data\r\n```\r\n\u003e [['John is a man', 'Ian is a man', 'Robert is a man'],  \r\n ['Mary is a woman', 'Katherine is a woman', 'Alexandra is a woman'],  \r\n ['John is an apostle', 'Paul is an apostle', 'Gabriel is an apostle']]\r\n\r\n```python\r\nret = Perturb.perturb(pdataset, Perturb.add_negation)\r\nret.data\r\n```\r\n\u003e [['John is a man', 'John is not a man'],  \r\n ['Mary is a woman', 'Mary is not a woman'],  \r\n ['John is an apostle', 'John is not an apostle']]\r\n\r\n### Creating and running tests\r\nSee [3. 
Test types, expectation functions, running tests](notebooks/tutorials/3.%20Test%20types,%20expectation%20functions,%20running%20tests.ipynb) for more details.\r\n\r\nMFT:\r\n```python\r\nimport checklist\r\nfrom checklist.editor import Editor\r\nfrom checklist.perturb import Perturb\r\nfrom checklist.test_types import MFT, INV, DIR\r\neditor = Editor()\r\n\r\nt = editor.template('This is {a:adj} {mask}.',  \r\n                      adj=['good', 'great', 'excellent', 'awesome'])\r\ntest1 = MFT(t.data, labels=1, name='Simple positives',\r\n           capability='Vocabulary', description='')\r\n```\r\nINV:\r\n```python\r\ndataset = ['This was a very nice movie directed by John Smith.',\r\n           'Mary Keen was brilliant.',\r\n          'I hated everything about this.',\r\n          'This movie was very bad.',\r\n          'I really liked this movie.',\r\n          'just bad.',\r\n          'amazing.',\r\n          ]\r\nt = Perturb.perturb(dataset, Perturb.add_typos)\r\ntest2 = INV(**t)\r\n```\r\nDIR:\r\n```python\r\nfrom checklist.expect import Expect\r\ndef add_negative(x):\r\n    phrases = ['Anyway, I thought it was bad.', 'Having said this, I hated it', 'The director should be fired.']\r\n    return ['%s %s' % (x, p) for p in phrases]\r\n\r\nt = Perturb.perturb(dataset, add_negative)\r\nmonotonic_decreasing = Expect.monotonic(label=1, increasing=False, tolerance=0.1)\r\ntest3 = DIR(**t, expect=monotonic_decreasing)\r\n```\r\nRunning tests directly:\r\n```python\r\nfrom checklist.pred_wrapper import PredictorWrapper\r\n# wrapped_pp returns a tuple with (predictions, softmax confidences)\r\nwrapped_pp = PredictorWrapper.wrap_softmax(model.predict_proba)\r\ntest.run(wrapped_pp)\r\n```\r\nRunning from a file:\r\n```python\r\n# One line per example\r\ntest.to_raw_file('/tmp/raw_file.txt')\r\n# each line has prediction probabilities (softmax)\r\ntest.run_from_file('/tmp/softmax_preds.txt', file_format='softmax', overwrite=True)\r\n```\r\nSummary of 
results:\r\n```python\r\ntest.summary(n=1)\r\n```\r\n\u003e Test cases:      400  \r\n\u003e Fails (rate):    200 (50.0%)  \r\n\u003e\r\n\u003e Example fails:  \r\n\u003e 0.2 This is a good idea\r\n\r\nVisual summary:\r\n```python\r\ntest.visual_summary()\r\n```\r\n\r\n![visual summary](notebooks/tutorials/visual_summary.gif )\r\n\r\nSaving and loading individual tests:\r\n```python\r\n# save\r\ntest.save(path)\r\n# load\r\ntest = MFT.from_file(path)\r\n```\r\n\r\n### Custom expectation functions\r\nSee [3. Test types, expectation functions, running tests](notebooks/tutorials/3.%20Test%20types,%20expectation%20functions,%20running%20tests.ipynb) for more details.\r\n\r\nIf you are writing a custom expectation functions, it must return a float or bool for each example such that:\r\n- `\u003e 0` (or True) means passed,\r\n- `\u003c= 0` or False means fail, and (optionally) the magnitude of the failure, indicated by distance from 0, e.g. -10 is worse than -1\r\n- `None` means the test does not apply, and this should not be counted\r\n\r\nExpectation on a single example:\r\n```python\r\ndef high_confidence(x, pred, conf, label=None, meta=None):\r\n    return conf.max() \u003e 0.95\r\nexpect_fn = Expect.single(high_confidence)\r\n```\r\n\r\nExpectation on pairs of `(orig, new)` examples (for `INV` and `DIR`):\r\n```python\r\ndef changed_pred(orig_pred, pred, orig_conf, conf, labels=None, meta=None):\r\n    return pred != orig_pred\r\nexpect_fn = Expect.pairwise(changed_pred)\r\n```\r\nThere's also `Expect.testcase` and `Expect.test`, amongst many others.  \r\nCheck out [expect.py](checklist/expect.py) for more details.\r\n\r\n\r\n### Test Suites\r\nSee [4. 
The CheckList process](notebooks/tutorials/4.%20The%20CheckList%20process.ipynb) for more details.\r\n\r\nAdding tests:\r\n```python\r\nfrom checklist.test_suite import TestSuite\r\n# assuming test exists:\r\nsuite.add(test)\r\n```\r\n\r\nRunning a suite is the same as running an individual test, either directly or through a file:\r\n\r\n```python\r\nfrom checklist.pred_wrapper import PredictorWrapper\r\n# wrapped_pp returns a tuple with (predictions, softmax confidences)\r\nwrapped_pp = PredictorWrapper.wrap_softmax(model.predict_proba)\r\nsuite.run(wrapped_pp)\r\n# or suite.run_from_file, see examples above\r\n```\r\n\r\nTo visualize results, you can call `suite.summary()` (same as `test.summary`), or `suite.visual_summary_table()`. This is what the latter looks like for BERT on sentiment analysis:\r\n```python\r\nsuite.visual_summary_table()\r\n```\r\n![visual summary table](notebooks/tutorials/visual_sentiment_summary.gif )\r\n\r\nFinally, it's easy to save, load, and share a suite:\r\n```python\r\n# save\r\nsuite.save(path)\r\n# load\r\nsuite = TestSuite.from_file(path)\r\n```\r\n\r\n## API reference\r\nOn [readthedocs](https://checklist-nlp.readthedocs.io/en/latest/)\r\n\r\n## Code of Conduct\r\n[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct)\r\n","funding_links":[],"categories":["Jupyter Notebook","Natural Language Processing","模型的可解释性","Testing Frameworks","Technical Resources","The List of AI Testing Tools"],"sub_categories":["Others","Language-Specific Tools","Open Source/Access Responsible AI Software Packages","8. CheckList (for NLP)"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarcotcr%2Fchecklist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmarcotcr%2Fchecklist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmarcotcr%2Fchecklist/lists"}
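As a footnote on the expectation-function contract (`> 0` or `True` passes, `<= 0` or `False` fails, `None` does not apply): the aggregation it implies can be sketched in plain Python, with no CheckList installation required. The helper below is hypothetical and not part of the library; it only illustrates how a fail rate like the one in `test.summary()` follows from that contract, relying on the fact that Python bools compare as ints:

```python
def fail_rate(results):
    """Fraction of applicable expectation results that failed.

    Hypothetical helper (not part of CheckList) illustrating the
    expectation contract: > 0 (or True) passes, <= 0 (or False) fails,
    and None means the test does not apply. Since True > 0 is True and
    False > 0 is False, `r > 0` covers both floats and bools.
    """
    applicable = [r for r in results if r is not None]
    if not applicable:
        return 0.0
    fails = sum(1 for r in applicable if not r > 0)
    return fails / len(applicable)

# 5 applicable results (None is skipped); -1.0 and False fail
print(fail_rate([True, -1.0, 0.5, None, False, 2]))  # -> 0.4
```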