treehopper
==================================

treehopper is a Tree-LSTM-based dependency tree sentiment labeler, implemented in [PyTorch](https://github.com/pytorch/pytorch) and optimized for morphologically rich languages with relatively loose word order (such as Polish).

treehopper was originally developed as a submission to [PolEval 2017](http://poleval.pl/), a SemEval-inspired NLP evaluation contest for Polish. It achieves 0.80 accuracy on the PolEval task 2 evaluation dataset.
For more details, see the paper accompanying this submission: [Fine-tuning Tree-LSTM for phrase-level sentiment classification on a Polish dependency treebank](https://arxiv.org/abs/1711.01985).

## What the heck are Tree-LSTMs and dependency tree sentiment labeling?

A dependency tree is a linguistic formalism used for describing the structure of sentences. Dependency trees are parse trees, just like constituency trees, but they tend to be more useful for languages with complex inflectional structure and relatively loose word order, such as Czech, Turkish, or Polish.

Tree sentiment labeling is the task of labeling each phrase (subtree) of a parse tree with its sentiment. The [Stanford Sentiment Treebank](https://nlp.stanford.edu/sentiment) is a well-known dataset for this task, although it uses constituency trees rather than dependency trees as its underlying formalism.

Tree-LSTMs ([Tai et al., 2015](https://arxiv.org/abs/1503.00075)) generalize LSTMs from chain-like to tree-like structures and achieved state-of-the-art results in tree sentiment labeling at the time of publication. treehopper implements a variant known as the Child-Sum Tree-LSTM, which sums the hidden states of a node's children. As a result, each node can have an unbounded number of children and no order is imposed on them. This makes the variant particularly well suited to dependency trees, where a head word can have any number of dependents.

## How to use

First things first:

```bash
git clone https://github.com/tomekkorbak/treehopper.git
cd treehopper
```

### Dependencies

Make sure to use Python >= 3.5, PyTorch >= 0.2, and a Unix-like operating system (sorry, Windows users).

We recommend managing your dependencies using [virtualenv](https://virtualenv.pypa.io/en/stable/) and pip. For instructions on installing an appropriate PyTorch version, please refer to [its website](http://pytorch.org/).
All other dependencies can be installed by running `pip install -r requirements.txt`.

### Inference using a pre-trained model

We provide a pre-trained model, trained on the full PolEval training dataset (excluding the evaluation dataset) with default hyperparameters (i.e. those described in the paper).

The prediction script expects data that is already tokenized and parsed. Specifically, `--input_sentences` must point to a file of tokenized sentences, one per line, and `--input_parents` to the corresponding dependency trees in PolEval format (i.e. each token is assigned the index of its parent).

```bash
cd treehopper/
curl -o model.pth <<URL WILL BE ADDED HERE>>
python predict.py --model_path model.pth \
                  --input_parents test/polevaltest_parents.txt \
                  --input_sentences test/polevaltest_sentence.txt \
                  --output output.txt
```

### Evaluating a pre-trained model

```bash
./fetch_data.sh
cd treehopper/
python evaluate.py --model_path model.pth
```

By default, the model is evaluated on the PolEval evaluation dataset.

### Training from scratch

```bash
./fetch_data.sh
cd treehopper/
python train.py
```

By default, a trained model is saved after every epoch in `/models/saved_models/`.

### Documentation

For complete documentation of the command-line options, run `predict.py`, `train.py`, or `evaluate.py` with the `--help` flag.

All flags default to the hyperparameters described in the paper.

## Authors

Tomasz Korbak (tomasz.korbak@gmail.com)  
Paulina Żak (paulina.zak1@gmail.com)

## How to cite

```
@article{korbakzak2017,
  author    = {Tomasz Korbak and
               Paulina \.Zak},
  title     = {Fine-tuning Tree-LSTM for phrase-level sentiment classification on
               a Polish dependency treebank.
               Submission to PolEval task 2},
  journal   = {Proceedings of the 8th Language \& Technology Conference (LTC 2017)},
  year      = {2017},
  url       = {http://arxiv.org/abs/1711.01985}
}
```

## Acknowledgements

treehopper's core code is loosely based on [TreeLSTMSentiment](https://github.com/ttpro1995/TreeLSTMSentiment), which in turn is based on the [original Lua implementation](https://github.com/stanfordnlp/treelstm) of Tree-LSTMs by [Tai et al., 2015](https://arxiv.org/abs/1503.00075).
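## Sketch: a Child-Sum Tree-LSTM over a dependency tree

To make the Child-Sum Tree-LSTM and the parent-index input format described above more concrete, here is a minimal, framework-free sketch in plain Python. It is not treehopper's PyTorch implementation, and it uses small random weights instead of a trained model. It assumes that a parents line holds 1-based parent indices with `0` marking the root; `parse_parents`, `ChildSumTreeLSTM`, the stand-in embeddings, and the three-class output layer are hypothetical, chosen only for illustration. The cell follows the Child-Sum equations of Tai et al. (2015): the input, output, and update transforms read the sum of the children's hidden states, while each child gets its own forget gate.

```python
import math
import random


def parse_parents(line):
    """Turn a parents line such as "2 0 2" into (root, children lists).

    Assumption (hypothetical helper): indices are 1-based and 0 marks the root.
    """
    parents = [int(tok) for tok in line.split()]
    children = [[] for _ in parents]
    root = None
    for child, parent in enumerate(parents):
        if parent == 0:
            root = child
        else:
            children[parent - 1].append(child)
    return root, children


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vecs):
    # Element-wise sum of equally sized vectors.
    return [sum(parts) for parts in zip(*vecs)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class ChildSumTreeLSTM:
    """Child-Sum Tree-LSTM cell (Tai et al., 2015) with small random weights."""

    def __init__(self, in_dim, mem_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # W (input), U (hidden) and b (bias) for the i, f, o and u transforms.
        self.W = {g: mat(mem_dim, in_dim) for g in "ifou"}
        self.U = {g: mat(mem_dim, mem_dim) for g in "ifou"}
        self.b = {g: [0.0] * mem_dim for g in "ifou"}
        self.mem_dim = mem_dim

    def node(self, x, child_states):
        """Compute (c, h) for one node from its input x and its children's (c, h)."""

        def gate(g, h, act):
            return [act(z) for z in add(matvec(self.W[g], x), matvec(self.U[g], h), self.b[g])]

        # Children are unordered: their hidden states are simply summed.
        h_sum = add([0.0] * self.mem_dim, *[h for _, h in child_states])
        i = gate("i", h_sum, sigmoid)
        o = gate("o", h_sum, sigmoid)
        u = gate("u", h_sum, math.tanh)
        c = [ij * uj for ij, uj in zip(i, u)]
        # One forget gate per child k, conditioned on that child's own h_k.
        for c_k, h_k in child_states:
            f = gate("f", h_k, sigmoid)
            c = add(c, [fj * ckj for fj, ckj in zip(f, c_k)])
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return c, h

    def forward(self, root, children, embeddings):
        """Return {token index: (c, h)} for every node, computed leaves first."""
        states = {}

        def visit(j):
            for k in children[j]:
                visit(k)
            states[j] = self.node(embeddings[j], [states[k] for k in children[j]])

        visit(root)
        return states


if __name__ == "__main__":
    root, children = parse_parents("2 0 2")  # hypothetical 3-token sentence
    rng = random.Random(1)
    embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]  # stand-in vectors
    model = ChildSumTreeLSTM(in_dim=4, mem_dim=5)
    states = model.forward(root, children, embeddings)
    # Hypothetical 3-class output layer applied at every node, i.e. every phrase.
    W_s = [[rng.uniform(-0.1, 0.1) for _ in range(5)] for _ in range(3)]
    for j in sorted(states):
        print(j, [round(p, 3) for p in softmax(matvec(W_s, states[j][1]))])
```

Because the children's hidden states are summed, reordering a node's children changes its state only by floating-point rounding, which is why this variant needs no child order. Every node's `h` summarizes the phrase spanned by its subtree, so applying the output layer at each node yields one sentiment distribution per phrase.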