{"id":26938020,"url":"https://github.com/adaamko/potato","last_synced_at":"2025-07-27T16:34:04.386Z","repository":{"id":37931030,"uuid":"348635539","full_name":"adaamko/POTATO","owner":"adaamko","description":"XAI based human-in-the-loop framework for automatic rule-learning.","archived":false,"fork":false,"pushed_at":"2024-07-07T22:34:54.000Z","size":6361,"stargazers_count":48,"open_issues_count":13,"forks_count":8,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-18T19:51:30.865Z","etag":null,"topics":["classification","explainability","explainable-ai","explainable-ml","information-extraction","interpretable-ai","interpretable-machine-learning","nlp","nlp-machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adaamko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-03-17T08:36:20.000Z","updated_at":"2025-02-08T22:17:49.000Z","dependencies_parsed_at":"2023-02-09T11:31:32.584Z","dependency_job_id":null,"html_url":"https://github.com/adaamko/POTATO","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adaamko%2FPOTATO","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adaamko%2FPOTATO/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adaamko%2FPOTATO/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adaamko%2FPOTATO/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/adaamko","download_url":"https://codeload.github.com/adaamko/POTATO/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246819778,"owners_count":20839095,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","explainability","explainable-ai","explainable-ml","information-extraction","interpretable-ai","interpretable-machine-learning","nlp","nlp-machine-learning"],"created_at":"2025-04-02T13:16:48.243Z","updated_at":"2025-04-02T13:16:48.920Z","avatar_url":"https://github.com/adaamko.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🥔 POTATO\nPOTATO is a human-in-the-loop XAI framework for extracting and evaluating interpretable graph features for any classification problem in Natural Language Processing.\n\n## Built systems\n\nTo get started with rule-systems we provide rule-based features prebuilt with POTATO on different datasets (e.g. our paper _Offensive text detection on English Twitter with deep learning models and rule-based systems_ for the HASOC2021 shared task). If you are interested in that, you can go under _features/_ for more info!\n\n## Install and Quick Start\nCheck out our quick demonstration (~2 min) video about the tool:\nhttps://youtu.be/PkQ71wUSeNU\n\nThere is a longer version with a detailed method description and presented background research (~1 hour): https://youtu.be/6R_V1WfIjsU\n\n### Setup\nThe tool is heavily dependent upon the [tuw-nlp](https://github.com/recski/tuw-nlp) repository. 
You can install tuw-nlp with pip:\n\n```\npip install tuw-nlp\n```\nThen follow the [instructions](https://github.com/recski/tuw-nlp) to set up the package.\n\n\nThen install POTATO from pip:\n\n```\npip install xpotato\n```\n\nOr you can install it from source:\n\n```\npip install -e .\n```\n\n### Usage\n\n- POTATO is an IE tool that works on graphs. Currently we support three types of graphs: AMR, UD and [Fourlang](https://github.com/kornai/4lang). \n\n- In the README we provide examples with fourlang semantic graphs. Make sure to follow the instructions in the [tuw_nlp](https://github.com/recski/tuw-nlp) repo to be able to build fourlang graphs. \n\n- If you are interested in AMR graphs, you can go to the [hasoc](https://github.com/adaamko/POTATO/tree/main/features/hasoc) folder to get started with rule systems prebuilt with POTATO on the HASOC dataset (we also presented a paper named _Offensive text detection on English Twitter with deep learning models and rule-based systems_ for the HASOC2021 shared task). 
\n\n- We also provide experiments on the [CrowdTruth](https://github.com/CrowdTruth/Medical-Relation-Extraction) medical relation extraction datasets with UD graphs, go to the [crowdtruth](https://github.com/adaamko/POTATO/tree/main/features/crowdtruth) folder for more info!\n\n- POTATO can also handle unlabeled or partially labeled data; see [advanced mode](#advanced-mode) to learn more.\n\n__For complete working examples, see the _notebooks/_ folder, which contains experiments on HASOC and on the Semeval relation extraction dataset.__\n\nFirst, import the packages from xpotato:\n```python\nfrom xpotato.dataset.dataset import Dataset\nfrom xpotato.models.trainer import GraphTrainer\n```\n\nWe first demonstrate POTATO's capabilities with a few sentences manually picked from the dataset.\n\n__Note that we replaced the two entities in question with _XXX_ and _YYY_.__\n\n```python\nsentences = [(\"Governments and industries in nations around the world are pouring XXX into YYY.\", \"Entity-Destination(e1,e2)\"),\n            (\"The scientists poured XXX into pint YYY.\", \"Entity-Destination(e1,e2)\"),\n            (\"The suspect pushed the XXX into a deep YYY.\", \"Entity-Destination(e1,e2)\"),\n            (\"The Nepalese government sets up a XXX to inquire into the alleged YYY of diplomatic passports.\", \"Other\"),\n            (\"The entity1 to buy papers is pushed into the next entity2.\", \"Entity-Destination(e1,e2)\"),\n            (\"An unnamed XXX was pushed into the YYY.\", \"Entity-Destination(e1,e2)\"),\n            (\"Since then, numerous independent feature XXX have journeyed into YYY.\", \"Other\"),\n            (\"For some reason, the XXX was blinded from his own YYY about the incommensurability of time.\", \"Other\"),\n            (\"Sparky Anderson is making progress in his XXX from YYY and could return to managing the Detroit Tigers within a week.\", \"Other\"),\n            (\"Olympics have already poured one XXX into the YYY.\", 
\"Entity-Destination(e1,e2)\"),\n            (\"After wrapping him in a light blanket, they placed the XXX in the YYY his father had carved for him.\", \"Entity-Destination(e1,e2)\"),\n            (\"I placed the XXX in a natural YYY, at the base of a part of the fallen arch.\", \"Entity-Destination(e1,e2)\"),\n            (\"The XXX was delivered from the YYY of Lincoln Memorial on August 28, 1963 as part of his famous March on Washington.\", \"Other\"),\n            (\"The XXX leaked from every conceivable YYY.\", \"Other\"),\n            (\"The scientists placed the XXX in a tiny YYY which gets channelled into cancer cells, and is then unpacked with a laser impulse.\", \"Entity-Destination(e1,e2)\"),\n            (\"The level surface closest to the MSS, known as the XXX, departs from an YYY by about 100 m in each direction.\", \"Other\"),\n            (\"Gaza XXX recover from three YYY of war.\", \"Other\"),\n            (\"This latest XXX from the animation YYY at Pixar is beautiful, masterly, inspired - and delivers a powerful ecological message.\", \"Other\")]\n```\n\nInitialize the dataset and also provide a label encoding. Then parse the sentences into graphs. Currently we provide three types of graphs: _ud_, _fourlang_, _amr_. Also provide the language you want to parse, currently we support English (en) and German (de).\n\n```python\ndataset = Dataset(sentences, label_vocab={\"Other\":0, \"Entity-Destination(e1,e2)\": 1}, lang=\"en\")\ndataset.set_graphs(dataset.parse_graphs(graph_format=\"ud\"))\n```\n\nCheck the dataset:\n```python\ndf = dataset.to_dataframe()\n```\n\nWe can also check any of the graphs:\n### Check any of the graphs parsed\n\n```python\nfrom xpotato.models.utils import to_dot\nfrom graphviz import Source\n\nSource(to_dot(df.iloc[0].graph))\n```\n![graph](https://raw.githubusercontent.com/adaamko/POTATO/main/files/re_example.svg)\n\n### Rules\n\nIf the dataset is prepared and the graphs are parsed, we can write rules to match labels. 
We can write rules either manually or extract\nthem automatically (POTATO also provides a frontend that tries to do both).\n\nThe simplest rule would be just a node in the graph:\n```python\n# The syntax of the rules is List[List[rules that we want to match], List[rules that shouldn't be in the matched graphs], Label of the rule]\nrule_to_match = [[[\"(u_1 / into)\"], [], \"Entity-Destination(e1,e2)\"]]\n```\n\nInit the rule matcher:\n```python\nfrom xpotato.graph_extractor.extract import FeatureEvaluator\nevaluator = FeatureEvaluator()\n```\n\nMatch the rules in the dataset:\n```python\n#match single feature\ndf = dataset.to_dataframe()\nevaluator.match_features(df, rule_to_match)\n```\n\n|    | Sentence                                                                                                                        | Predicted label           | Matched rule                                        |\n|---:|:--------------------------------------------------------------------------------------------------------------------------------|:--------------------------|:----------------------------------------------------|\n|  0 | Governments and industries in nations around the world are pouring XXX into YYY.                                                | Entity-Destination(e1,e2) | [['(u_1 / into)'], [], 'Entity-Destination(e1,e2)'] |\n|  1 | The scientists poured XXX into pint YYY.                                                                                        | Entity-Destination(e1,e2) | [['(u_1 / into)'], [], 'Entity-Destination(e1,e2)'] |\n|  2 | The suspect pushed the XXX into a deep YYY.                                                                                     | Entity-Destination(e1,e2) | [['(u_1 / into)'], [], 'Entity-Destination(e1,e2)'] |\n|  3 | The Nepalese government sets up a XXX to inquire into the alleged YYY of diplomatic passports.                                  
| Entity-Destination(e1,e2) | [['(u_1 / into)'], [], 'Entity-Destination(e1,e2)'] |\n|  4 | The entity1 to buy papers is pushed into the next entity2.                                                                      | Entity-Destination(e1,e2) | [['(u_1 / into)'], [], 'Entity-Destination(e1,e2)'] |\n|  5 | An unnamed XXX was pushed into the YYY.                                                                                         | Entity-Destination(e1,e2) | [['(u_1 / into)'], [], 'Entity-Destination(e1,e2)'] |\n|  6 | Since then, numerous independent feature XXX have journeyed into YYY.                                                           | Entity-Destination(e1,e2) | [['(u_1 / into)'], [], 'Entity-Destination(e1,e2)'] |\n|  7 | For some reason, the XXX was blinded from his own YYY about the incommensurability of time.                                     |                           |                                                     |\n|  8 | Sparky Anderson is making progress in his XXX from YYY and could return to managing the Detroit Tigers within a week.           |                           |                                                     |\n|  9 | Olympics have already poured one XXX into the YYY.                                                                              | Entity-Destination(e1,e2) | [['(u_1 / into)'], [], 'Entity-Destination(e1,e2)'] |\n| 10 | After wrapping him in a light blanket, they placed the XXX in the YYY his father had carved for him.                            |                           |                                                     |\n| 11 | I placed the XXX in a natural YYY, at the base of a part of the fallen arch.                                                    |                           |                                                     |\n| 12 | The XXX was delivered from the YYY of Lincoln Memorial on August 28, 1963 as part of his famous March on Washington.            
|                           |                                                     |\n| 13 | The XXX leaked from every conceivable YYY.                                                                                      |                           |                                                     |\n| 14 | The scientists placed the XXX in a tiny YYY which gets channelled into cancer cells, and is then unpacked with a laser impulse. | Entity-Destination(e1,e2) | [['(u_1 / into)'], [], 'Entity-Destination(e1,e2)'] |\n| 15 | The level surface closest to the MSS, known as the XXX, departs from an YYY by about 100 m in each direction.                   |                           |                                                     |\n| 16 | Gaza XXX recover from three YYY of war.                                                                                         |                           |                                                     |\n| 17 | This latest XXX from the animation YYY at Pixar is beautiful, masterly, inspired - and delivers a powerful ecological message.  |                           |                                                     |\n\n\n\nYou can see in the dataset that the rules only matched the instances where the \"into\" node was present.\n\nOne of the core features of our tool is that we are also able to match subgraphs. To describe a graph, we use the [PENMAN](https://github.com/goodmami/penman) notation. \n\nE.g. 
the string _(u_1 / into :1 (u_3 / pour))_ would describe a graph with two nodes (\"into\" and \"pour\") and a single directed edge with the label \"1\" between them.\n```python\n#match a simple graph feature\nevaluator.match_features(df, [[[\"(u_1 / into :1 (u_2 / pour) :2 (u_3 / YYY))\"], [], \"Entity-Destination(e1,e2)\"]])\n```\n\nDescribing a subgraph with the string \"(u_1 / into :1 (u_2 / pour) :2 (u_3 / YYY))\" returns only three examples, instead of the nine matched when a single node was the only feature.\n\n|    | Sentence                                                                                                                        | Predicted label           | Matched rule                                                                       |\n|---:|:--------------------------------------------------------------------------------------------------------------------------------|:--------------------------|:-----------------------------------------------------------------------------------|\n|  0 | Governments and industries in nations around the world are pouring XXX into YYY.                                                | Entity-Destination(e1,e2) | [['(u_1 / into :1 (u_2 / pour) :2 (u_3 / YYY))'], [], 'Entity-Destination(e1,e2)'] |\n|  1 | The scientists poured XXX into pint YYY.                                                                                        | Entity-Destination(e1,e2) | [['(u_1 / into :1 (u_2 / pour) :2 (u_3 / YYY))'], [], 'Entity-Destination(e1,e2)'] |\n|  2 | The suspect pushed the XXX into a deep YYY.                                                                                     |                           |                                                                                    |\n|  3 | The Nepalese government sets up a XXX to inquire into the alleged YYY of diplomatic passports.                                  
|                           |                                                                                    |\n|  4 | The entity1 to buy papers is pushed into the next entity2.                                                                      |                           |                                                                                    |\n|  5 | An unnamed XXX was pushed into the YYY.                                                                                         |                           |                                                                                    |\n|  6 | Since then, numerous independent feature XXX have journeyed into YYY.                                                           |                           |                                                                                    |\n|  7 | For some reason, the XXX was blinded from his own YYY about the incommensurability of time.                                     |                           |                                                                                    |\n|  8 | Sparky Anderson is making progress in his XXX from YYY and could return to managing the Detroit Tigers within a week.           |                           |                                                                                    |\n|  9 | Olympics have already poured one XXX into the YYY.                                                                              | Entity-Destination(e1,e2) | [['(u_1 / into :1 (u_2 / pour) :2 (u_3 / YYY))'], [], 'Entity-Destination(e1,e2)'] |\n| 10 | After wrapping him in a light blanket, they placed the XXX in the YYY his father had carved for him.                            |                           |                                                                                    |\n| 11 | I placed the XXX in a natural YYY, at the base of a part of the fallen arch.                                            
        |                           |                                                                                    |\n| 12 | The XXX was delivered from the YYY of Lincoln Memorial on August 28, 1963 as part of his famous March on Washington.            |                           |                                                                                    |\n| 13 | The XXX leaked from every conceivable YYY.                                                                                      |                           |                                                                                    |\n| 14 | The scientists placed the XXX in a tiny YYY which gets channelled into cancer cells, and is then unpacked with a laser impulse. |                           |                                                                                    |\n| 15 | The level surface closest to the MSS, known as the XXX, departs from an YYY by about 100 m in each direction.                   |                           |                                                                                    |\n| 16 | Gaza XXX recover from three YYY of war.                                                                                         |                           |                                                                                    |\n| 17 | This latest XXX from the animation YYY at Pixar is beautiful, masterly, inspired - and delivers a powerful ecological message.  |                           |                                                                                    |\n\n\nWe can also add negated features that we don't want to match (e.g. 
this won't match the rows where 'pour' is present, such as the first row):\n```python\n#match a graph feature, excluding graphs that contain the negated feature ('pour')\nevaluator.match_features(df, [[[\"(u_1 / into :2 (u_3 / YYY))\"], [\"(u_2 / pour)\"], \"Entity-Destination(e1,e2)\"]])\n```\n\n|    | Sentence                                                                                                                        | Predicted label           | Matched rule                                                                     |\n|---:|:--------------------------------------------------------------------------------------------------------------------------------|:--------------------------|:---------------------------------------------------------------------------------|\n|  0 | Governments and industries in nations around the world are pouring XXX into YYY.                                                |                           |                                                                                  |\n|  1 | The scientists poured XXX into pint YYY.                                                                                        |                           |                                                                                  |\n|  2 | The suspect pushed the XXX into a deep YYY.                                                                                     | Entity-Destination(e1,e2) | [['(u_1 / into :2 (u_3 / YYY))'], ['(u_2 / pour)'], 'Entity-Destination(e1,e2)'] |\n|  3 | The Nepalese government sets up a XXX to inquire into the alleged YYY of diplomatic passports.                                  | Entity-Destination(e1,e2) | [['(u_1 / into :2 (u_3 / YYY))'], ['(u_2 / pour)'], 'Entity-Destination(e1,e2)'] |\n|  4 | The entity1 to buy papers is pushed into the next entity2.                                                                      
|                           |                                                                                  |\n|  5 | An unnamed XXX was pushed into the YYY.                                                                                         | Entity-Destination(e1,e2) | [['(u_1 / into :2 (u_3 / YYY))'], ['(u_2 / pour)'], 'Entity-Destination(e1,e2)'] |\n|  6 | Since then, numerous independent feature XXX have journeyed into YYY.                                                           | Entity-Destination(e1,e2) | [['(u_1 / into :2 (u_3 / YYY))'], ['(u_2 / pour)'], 'Entity-Destination(e1,e2)'] |\n|  7 | For some reason, the XXX was blinded from his own YYY about the incommensurability of time.                                     |                           |                                                                                  |\n|  8 | Sparky Anderson is making progress in his XXX from YYY and could return to managing the Detroit Tigers within a week.           |                           |                                                                                  |\n|  9 | Olympics have already poured one XXX into the YYY.                                                                              |                           |                                                                                  |\n| 10 | After wrapping him in a light blanket, they placed the XXX in the YYY his father had carved for him.                            |                           |                                                                                  |\n| 11 | I placed the XXX in a natural YYY, at the base of a part of the fallen arch.                                                    |                           |                                                                                  |\n| 12 | The XXX was delivered from the YYY of Lincoln Memorial on August 28, 1963 as part of his famous March on Washington.            
|                           |                                                                                  |\n| 13 | The XXX leaked from every conceivable YYY.                                                                                      |                           |                                                                                  |\n| 14 | The scientists placed the XXX in a tiny YYY which gets channelled into cancer cells, and is then unpacked with a laser impulse. |                           |                                                                                  |\n| 15 | The level surface closest to the MSS, known as the XXX, departs from an YYY by about 100 m in each direction.                   |                           |                                                                                  |\n| 16 | Gaza XXX recover from three YYY of war.                                                                                         |                           |                                                                                  |\n| 17 | This latest XXX from the animation YYY at Pixar is beautiful, masterly, inspired - and delivers a powerful ecological message.  
|                           |                                                                                  |\n\nIf we don't want to specify nodes, regex can also be used in place of the node and edge-names:\n\n```python\n#regex can be used to match any node (this will match instances where 'into' is connected to any node with '1' edge)\nevaluator.match_features(df, [[[\"(u_1 / into :1 (u_2 / .*) :2 (u_3 / YYY))\"], [], \"Entity-Destination(e1,e2)\"]])\n```\n\n|    | Sentence                                                                                                                        | Predicted label           | Matched rule                                                                     |\n|---:|:--------------------------------------------------------------------------------------------------------------------------------|:--------------------------|:---------------------------------------------------------------------------------|\n|  0 | Governments and industries in nations around the world are pouring XXX into YYY.                                                | Entity-Destination(e1,e2) | [['(u_1 / into :1 (u_2 / .*) :2 (u_3 / YYY))'], [], 'Entity-Destination(e1,e2)'] |\n|  1 | The scientists poured XXX into pint YYY.                                                                                        | Entity-Destination(e1,e2) | [['(u_1 / into :1 (u_2 / .*) :2 (u_3 / YYY))'], [], 'Entity-Destination(e1,e2)'] |\n|  2 | The suspect pushed the XXX into a deep YYY.                                                                                     | Entity-Destination(e1,e2) | [['(u_1 / into :1 (u_2 / .*) :2 (u_3 / YYY))'], [], 'Entity-Destination(e1,e2)'] |\n|  3 | The Nepalese government sets up a XXX to inquire into the alleged YYY of diplomatic passports.                                  
| Entity-Destination(e1,e2) | [['(u_1 / into :1 (u_2 / .*) :2 (u_3 / YYY))'], [], 'Entity-Destination(e1,e2)'] |\n|  4 | The entity1 to buy papers is pushed into the next entity2.                                                                      |                           |                                                                                  |\n|  5 | An unnamed XXX was pushed into the YYY.                                                                                         | Entity-Destination(e1,e2) | [['(u_1 / into :1 (u_2 / .*) :2 (u_3 / YYY))'], [], 'Entity-Destination(e1,e2)'] |\n|  6 | Since then, numerous independent feature XXX have journeyed into YYY.                                                           | Entity-Destination(e1,e2) | [['(u_1 / into :1 (u_2 / .*) :2 (u_3 / YYY))'], [], 'Entity-Destination(e1,e2)'] |\n|  7 | For some reason, the XXX was blinded from his own YYY about the incommensurability of time.                                     |                           |                                                                                  |\n|  8 | Sparky Anderson is making progress in his XXX from YYY and could return to managing the Detroit Tigers within a week.           |                           |                                                                                  |\n|  9 | Olympics have already poured one XXX into the YYY.                                                                              | Entity-Destination(e1,e2) | [['(u_1 / into :1 (u_2 / .*) :2 (u_3 / YYY))'], [], 'Entity-Destination(e1,e2)'] |\n| 10 | After wrapping him in a light blanket, they placed the XXX in the YYY his father had carved for him.                            |                           |                                                                                  |\n| 11 | I placed the XXX in a natural YYY, at the base of a part of the fallen arch.                                                    
|                           |                                                                                  |\n| 12 | The XXX was delivered from the YYY of Lincoln Memorial on August 28, 1963 as part of his famous March on Washington.            |                           |                                                                                  |\n| 13 | The XXX leaked from every conceivable YYY.                                                                                      |                           |                                                                                  |\n| 14 | The scientists placed the XXX in a tiny YYY which gets channelled into cancer cells, and is then unpacked with a laser impulse. |                           |                                                                                  |\n| 15 | The level surface closest to the MSS, known as the XXX, departs from an YYY by about 100 m in each direction.                   |                           |                                                                                  |\n| 16 | Gaza XXX recover from three YYY of war.                                                                                         |                           |                                                                                  |\n| 17 | This latest XXX from the animation YYY at Pixar is beautiful, masterly, inspired - and delivers a powerful ecological message.  
|                           |                                                                                  |\n\nWe can also train regex rules from training data. This automatically replaces the regex '.*' with nodes that are \nstatistically 'good enough' based on the provided dataframe.\n\n```python\nevaluator.train_feature(\"Entity-Destination(e1,e2)\", \"(u_1 / into :1 (u_2 / .*) :2 (u_3 / YYY))\", df)\n```\n\nThis returns '(u_1 / into :1 (u_2 / push|pour) :2 (u_3 / YYY))' (the '.*' was replaced with _push_ and _pour_).\n\n### Learning rules\n\nTo extract rules automatically, train on the dataset with graph features and rank the features by relevance:\n\n```python\ndf = dataset.to_dataframe()\ntrainer = GraphTrainer(df)\n#extract features\nfeatures = trainer.prepare_and_train()\n\nfrom xpotato.dataset.utils import save_dataframe\nfrom sklearn.model_selection import train_test_split\n\ntrain, val = train_test_split(df, test_size=0.2, random_state=1234)\n\n#save train and validation, this is important for the frontend to work\nsave_dataframe(train, 'train.tsv')\nsave_dataframe(val, 'val.tsv')\n\nimport json\n\n#also save the ranked features\nwith open(\"features.json\", \"w+\") as f:\n    json.dump(features, f)\n\n```\n\nYou can also save the parsed graphs for evaluation or for caching:\n\n```python\nimport pickle\nwith open(\"graphs.pickle\", \"wb\") as f:\n    pickle.dump(val.graph, f)\n```\n\n## Frontend\n\nOnce the DataFrame with the parsed graphs is ready, the UI can be started to inspect and modify the extracted rules. 
The frontend is a Streamlit app. The simplest way to start it is the following (both the training and the validation datasets must be provided):\n\n```\nstreamlit run frontend/app.py -- -t notebooks/train.tsv -v notebooks/val.tsv -g ud\n```\n\nIt can also be started with the extracted features:\n\n```\nstreamlit run frontend/app.py -- -t notebooks/train.tsv -v notebooks/val.tsv -g ud -sr notebooks/features.json\n```\n\nIf you have already used the UI to extract features manually and want to load them, you can run:\n```\nstreamlit run frontend/app.py -- -t notebooks/train.tsv -v notebooks/val.tsv -g ud -sr notebooks/features.json -hr notebooks/manual_features.json\n```\n\n### Advanced mode\n\nIf labels are missing or only partially provided, the frontend can also be started in _advanced_ mode, where the user _annotates_ a few examples at the start and the system then gradually offers rules based on the provided examples. \n\n\nA dataset without labels can be initialized with:\n```python\nsentences = [(\"Governments and industries in nations around the world are pouring XXX into YYY.\", \"\"),\n            (\"The scientists poured XXX into pint YYY.\", \"\"),\n            (\"The suspect pushed the XXX into a deep YYY.\", \"\"),\n            (\"The Nepalese government sets up a XXX to inquire into the alleged YYY of diplomatic passports.\", \"\"),\n            (\"The entity1 to buy papers is pushed into the next entity2.\", \"\"),\n            (\"An unnamed XXX was pushed into the YYY.\", \"\"),\n            (\"Since then, numerous independent feature XXX have journeyed into YYY.\", \"\"),\n            (\"For some reason, the XXX was blinded from his own YYY about the incommensurability of time.\", \"\"),\n            (\"Sparky Anderson is making progress in his XXX from YYY and could return to managing the Detroit Tigers within a week.\", \"\"),\n            (\"Olympics have already poured one XXX into the YYY.\", \"\"),\n            (\"After wrapping him in a light blanket, they 
placed the XXX in the YYY his father had carved for him.\", \"\"),\n            (\"I placed the XXX in a natural YYY, at the base of a part of the fallen arch.\", \"\"),\n            (\"The XXX was delivered from the YYY of Lincoln Memorial on August 28, 1963 as part of his famous March on Washington.\", \"\"),\n            (\"The XXX leaked from every conceivable YYY.\", \"\"),\n            (\"The scientists placed the XXX in a tiny YYY which gets channelled into cancer cells, and is then unpacked with a laser impulse.\", \"\"),\n            (\"The level surface closest to the MSS, known as the XXX, departs from an YYY by about 100 m in each direction.\", \"\"),\n            (\"Gaza XXX recover from three YYY of war.\", \"\"),\n            (\"This latest XXX from the animation YYY at Pixar is beautiful, masterly, inspired - and delivers a powerful ecological message.\", \"\")]\n```\n\n\nThen, the frontend can be started:\n```\nstreamlit run frontend/app.py -- -t notebooks/unsupervised_dataset.tsv -g ud -m advanced\n```\n\nOnce the frontend starts up and you define the labels, you are presented with the annotation interface. You can search for elements by clicking on the appropriate column name and applying the desired filter. You can annotate instances by checking the checkbox at the beginning of the line. You can check multiple checkboxes at a time. Once you've selected the utterances you want to annotate, click on the _Annotate_ button. The annotated samples will appear in the lower table. You can clear the annotation of certain elements by selecting them in the second table and clicking _Clear annotation_.\n\nOnce you have some annotated data, you can train rules by clicking the _Train!_ button. If you have only a few samples, it is recommended to set _Rank features based on accuracy_ to True. You will get an interface similar to the one in supervised mode: you can generate rule suggestions and write your own rules as usual. 
Once you are satisfied with the rules, select each of them and click _annotate based on selected_. This process might take a while if you are working with large data. All rule matches should then be marked in the first and the second tables. You can sort the tables by any column to make checking easier. You will have to manually accept the annotations generated this way for them to appear in the second table.\n\n- You can read about the use of the advanced mode in the [docs](https://github.com/adaamko/POTATO/tree/main/docs/README_advanced_mode.md)\n\n\n## Evaluate\nIf you have the features ready and you want to evaluate them on a test set, you can run:\n\n```bash\npython scripts/evaluate.py -t ud -f notebooks/features.json -d notebooks/val.tsv\n```\n\nThe result will be a _csv_ file with the labels and the matched rules.\n\n## Service\nIf you have the extracted features ready and want to use our package in production for inference (generating predictions for sentences), we also provide a REST API built on POTATO (based on [fastapi](https://github.com/tiangolo/fastapi)).\n\nFirst install FastAPI and [Uvicorn](https://www.uvicorn.org/):\n```bash\npip install fastapi\npip install \"uvicorn[standard]\"\n```\n\nTo start the service, you should set the _language_, the _graph\\_type_ and the _features_ for the service. 
This can be done through environment variables.\n\nExample:\n```bash\nexport FEATURE_PATH=/home/adaamko/projects/POTATO/features/semeval/test_features.json\nexport GRAPH_FORMAT=ud\nexport LANG=en\n```\n\nThen, start the REST API:\n```bash\npython services/main.py\n```\n\nIt will start a service running on _localhost_ on port _8000_ (it will also initialize the correct models).\n\nThen you can use any client to make POST requests:\n```bash\ncurl -X POST localhost:8000 -H 'Content-Type: application/json' -d '{\"text\":\"The suspect pushed the XXX into a deep YYY.\\nSparky Anderson is making progress in his XXX from YYY and could return to managing the Detroit Tigers within a week.\"}'\n```\n\nThe answer will be a list with the predicted labels (if none of the rules match, it will return \"NONE\"):\n```json\n[\"Entity-Destination(e1,e2)\",\"NONE\"]\n```\n\nThe streamlit frontend also has an inference mode, where the implemented rule system can be used for inference. It can be started with:\n\n```bash\nstreamlit run frontend/app.py -- -hr features/semeval/test_features.json -m inference\n```\n\n## Contributing\n\nWe welcome all contributions! Please fork this repository and create a branch for your modifications. 
We suggest getting in touch with us first, by opening an issue or by writing an email to Adam Kovacs or Gabor Recski at firstname.lastname@tuwien.ac.at\n\n## Citing\n\nIf you use the library, please cite our [paper](https://dl.acm.org/doi/abs/10.1145/3511808.3557196) published in CIKM 2022:\n\n```bib\n@inproceedings{Kovacs:2022,\nauthor = {Kov\\'{a}cs, \\'{A}d\\'{a}m and G\\'{e}mes, Kinga and Ikl\\'{o}di, Eszter and Recski, G\\'{a}bor},\ntitle = {POTATO: ExPlainable InfOrmation ExTrAcTion FramewOrk},\nyear = {2022},\nisbn = {9781450392365},\npublisher = {Association for Computing Machinery},\naddress = {New York, NY, USA},\nurl = {https://doi.org/10.1145/3511808.3557196},\ndoi = {10.1145/3511808.3557196},\nbooktitle = {Proceedings of the 31st ACM International Conference on Information \u0026 Knowledge Management},\npages = {4897–4901},\nnumpages = {5},\nkeywords = {explainability, explainable, hitl},\nlocation = {Atlanta, GA, USA},\nseries = {CIKM '22}\n}\n```\n\n## License \n\nMIT license\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadaamko%2Fpotato","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadaamko%2Fpotato","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadaamko%2Fpotato/lists"}