{"id":16685293,"url":"https://github.com/fractalego/tree_parser","last_synced_at":"2025-06-28T23:40:20.408Z","repository":{"id":40967503,"uuid":"249246037","full_name":"fractalego/tree_parser","owner":"fractalego","description":"A simple dependency parser in PyTorch","archived":false,"fork":false,"pushed_at":"2023-07-06T21:40:03.000Z","size":199,"stargazers_count":3,"open_issues_count":3,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-12T06:14:36.584Z","etag":null,"topics":["dependency-parser","dependency-tree","parsing","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fractalego.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"license.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-22T18:24:22.000Z","updated_at":"2024-12-15T14:16:45.000Z","dependencies_parsed_at":"2022-09-09T04:50:34.771Z","dependency_job_id":null,"html_url":"https://github.com/fractalego/tree_parser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fractalego%2Ftree_parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fractalego%2Ftree_parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fractalego%2Ftree_parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fractalego%2Ftree_parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fractalego","download_url":"https://codeload.github.com/fractalego/tree_parser/tar.gz/refs/heads/master","host"
:{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248525138,"owners_count":21118619,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dependency-parser","dependency-tree","parsing","pytorch"],"created_at":"2024-10-12T14:46:49.090Z","updated_at":"2025-04-12T06:14:41.967Z","avatar_url":"https://github.com/fractalego.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Simple Dependency Parser\n\nThis repo contains simple code for a BERT-based dependency parser. \nI wanted to build one from scratch and was surprised by how easy \nit has become at the current level of NLP.\n\nThe main inspirations are [Dozat and Manning](https://nlp.stanford.edu/pubs/dozat2017deep.pdf) and [Kondratyuk and Straka](https://www.aclweb.org/anthology/D19-1279/).\n\n## Installation\nPlease install [git-lfs](https://git-lfs.github.com/) before installing.\n\n```bash\ngit clone https://github.com/fractalego/tree_parser.git\ncd tree_parser\nvirtualenv .env --python=/usr/bin/python3\n# Activate the environment so that pip installs into it\nsource .env/bin/activate\npip install .\n```\n\n# Example\n\nAn example is shown below (from the file [predict.py](tree_parser/predict.py)):\n\n```python\nimport os\n\nfrom networkx.drawing.nx_agraph import write_dot\n\nfrom tree_parser.parser import DependencyParser\n\n_path = os.path.dirname(__file__)\n\n_save_filename = os.path.join(_path, '../data/tree_parser.model')\n\n_text = \"\"\"\nIn nuclear physics, the island of stability is a predicted set of isotopes of superheavy elements \nthat may have considerably longer half-lives than known isotopes of these 
elements. \n\"\"\"\n\nif __name__ == '__main__':\n    parser = DependencyParser(_save_filename)\n    g = parser.parse(_text)\n\n    write_dot(g, 'test.dot')\n\n```\n\nThis parses the sentence \n\n``\nIn nuclear physics, the island of stability is a predicted set of isotopes of superheavy elements \nthat may have considerably longer half-lives than known isotopes of these elements. \n``\n\nand outputs a '.dot' file.\n\nThe file can be converted to a PNG using Graphviz:\n\n```bash\ndot -Tpng test.dot -o foo.png\n```\n![parse_tree](images/parse_tree.png)\n\n## Features\n\nThe output of `DependencyParser().parse()` is a NetworkX graph.\nEach node has two attributes:\n \n1) 'pos', the POS tag \n2) 'token', the parsed word.\n\nEach edge has a single attribute, 'label'. \nThey can be accessed through the \n\nIn addition, the ID associated with each node is of the form `\u003cINDEX\u003e_TOKEN`\n\nIn the previous example, the code\n\n```python\n    import networkx as nx\n\n    print('Node words:')\n    print(nx.get_node_attributes(g, 'token'))\n    print('Node POS tags:')\n    print(nx.get_node_attributes(g, 'pos'))\n    print('edge labels:')\n    print(nx.get_edge_attributes(g, 'label'))\n```\n\nwould output\n\n```python\nNode words:\n{'0_in': 'in', '2_physics': 'physics', '1_nuclear': 'nuclear', '5_island': 'island', '3_,': ',', '4_the': 'the', '11_set': 'set', '6_of': 'of', '7_stability': 'stability', '8_is': 'is', '9_a': 'a', '10_predicted': 'predicted', '12_of': 'of', '13_isotopes': 'isotopes', '15_of': 'of', '19_elements': 'elements', '16_superheavy': 'superheavy', '20_that': 'that', '22_have': 'have', '21_may': 'may', '23_considerably': 'considerably', '24_longer': 'longer', '25_half': 'half', '28_than': 'than', '30_isotopes': 'isotopes', '29_known': 'known', '32_of': 'of', '34_elements': 'elements', '33_these': 'these', '35_.': '.'}\nNode POS tags:\n{'0_in': 'IN', '2_physics': 'NN', '1_nuclear': 'JJ', '5_island': 'NNP', '3_,': ',', '4_the': 'DT', '11_set': 'NN', 
'6_of': 'IN', '7_stability': 'NN', '8_is': 'VBZ', '9_a': 'DT', '10_predicted': 'VBN', '12_of': 'IN', '13_isotopes': 'NNS', '15_of': 'IN', '19_elements': 'NNS', '16_superheavy': 'JJ', '20_that': 'WDT', '22_have': 'VB', '21_may': 'MD', '23_considerably': 'RB', '24_longer': 'JJR', '25_half': 'NNS', '28_than': 'IN', '30_isotopes': 'NNS', '29_known': 'VBN', '32_of': 'IN', '34_elements': 'NNS', '33_these': 'DT', '35_.': '.'}\nedge labels:\n{('0_in', '2_physics'): 'case', ('2_physics', '5_island'): 'nmod', ('1_nuclear', '2_physics'): 'amod', ('5_island', '11_set'): 'nsubj', ('3_,', '2_physics'): 'punct', ('4_the', '5_island'): 'det', ('6_of', '7_stability'): 'case', ('7_stability', '5_island'): 'nmod', ('8_is', '11_set'): 'cop', ('9_a', '11_set'): 'det', ('10_predicted', '11_set'): 'amod', ('12_of', '13_isotopes'): 'case', ('13_isotopes', '11_set'): 'nmod', ('15_of', '19_elements'): 'case', ('19_elements', '13_isotopes'): 'nmod', ('16_superheavy', '19_elements'): 'amod', ('20_that', '22_have'): 'nsubj', ('22_have', '19_elements'): 'acl:relcl', ('21_may', '22_have'): 'aux', ('23_considerably', '24_longer'): 'advmod', ('24_longer', '25_half'): 'amod', ('25_half', '22_have'): 'obj', ('28_than', '30_isotopes'): 'case', ('30_isotopes', '22_have'): 'nmod', ('29_known', '30_isotopes'): 'amod', ('32_of', '34_elements'): 'case', ('34_elements', '30_isotopes'): 'nmod', ('33_these', '34_elements'): 'det', ('35_.', '11_set'): 'punct'}\n\n```\n\n## Architecture and training\n\nThe stylized architecture is depicted below.\n\n![Architecture](images/parser_diagram.png)\n\nThe full code is in the file [model.py](tree_parser/model.py).\n\nThe system is trained with an Adam optimizer using a step size (learning rate) of 5e-5. 
\n\nThe two transformer layers seem to add stability to training: without them the system ends up stuck in a local minimum after a few epochs.\n\n\n## Dataset and results\n\nThe system is trained on the UD (Universal Dependencies) dataset [en_gum-ud](https://universaldependencies.org/treebanks/en_gum/index.html).\n\nThe results of the system are quite good. A comparison with a non-BERT-based method can be found in the excellent paper [Nguyen and Verspoor](https://www.aclweb.org/anthology/K18-2008/):\nBERT seems to add 6 percentage points in UAS and 7 in LAS. \nCuriously, the POS score is increased by only 1 percentage point.   \n\n| Set | UAS | LAS | POS |\n|:---:|:---:|:---:|:---:|\n| Dev | 0.91| 0.88| 0.96|\n| Test| 0.9 | 0.87| 0.95|\n\n\n# Comments \n\nA pretrained BERT language model seems to improve upon LSTM-based training, and by quite a lot!  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffractalego%2Ftree_parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffractalego%2Ftree_parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffractalego%2Ftree_parser/lists"}