{"id":19031446,"url":"https://github.com/plandes/todo-task","last_synced_at":"2025-10-09T02:45:02.784Z","repository":{"id":80107545,"uuid":"138088594","full_name":"plandes/todo-task","owner":"plandes","description":"A Supervised Approach To The Interpretation Of Imperative To-Do Lists","archived":false,"fork":false,"pushed_at":"2018-06-29T01:38:36.000Z","size":735,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-23T16:53:25.261Z","etag":null,"topics":["arxiv","clojure","machine-learning","natural-language-processing","scholarly-articles"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/1806.07999","language":"Clojure","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/plandes.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-06-20T21:38:56.000Z","updated_at":"2023-12-22T13:42:41.000Z","dependencies_parsed_at":null,"dependency_job_id":"a5857db1-6176-4343-84f8-2bcee1ffaa8b","html_url":"https://github.com/plandes/todo-task","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/plandes/todo-task","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plandes%2Ftodo-task","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plandes%2Ftodo-task/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plandes%2Ftodo-task/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plan
des%2Ftodo-task/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/plandes","download_url":"https://codeload.github.com/plandes/todo-task/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/plandes%2Ftodo-task/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279000770,"owners_count":26082906,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arxiv","clojure","machine-learning","natural-language-processing","scholarly-articles"],"created_at":"2024-11-08T21:23:29.746Z","updated_at":"2025-10-09T02:45:02.757Z","avatar_url":"https://github.com/plandes.png","language":"Clojure","funding_links":[],"categories":["Stale"],"sub_categories":["ML \u0026 Training"],"readme":"# Supervised Approach Imperative To-Do List Categorization\n\nThis repository contains a corpus and code base to categorize natural language\ntodo list items as described in our paper [A Supervised Approach To The\nInterpretation Of Imperative To-Do Lists].\n\nThis repository contains:\n\n* A publicly available [corpus](#corpus).\n* A [code base](#code-base) similar to that given as published results in the\n  [arXiv paper].\n\n\n\u003c!-- markdown-toc start - Don't edit this section. 
Run M-x markdown-toc-refresh-toc --\u003e\n## Table of Contents\n\n- [Documents](#documents)\n- [Corpus](#corpus)\n- [Citation](#citation)\n- [Code Base](#code-base)\n    - [What's Included](#whats-included)\n    - [Third Party Libraries](#third-party-libraries)\n    - [Documentation](#documentation)\n- [Running the Tests](#running-the-tests)\n    - [Parsing](#parsing)\n    - [Test Evaluation](#test-evaluation)\n    - [Predictions](#predictions)\n    - [Off-line Tests](#off-line-tests)\n    - [Sample Results of Test](#sample-results-of-test)\n    - [Building](#building)\n    - [Advanced](#advanced)\n- [Changelog](#changelog)\n- [Special Thanks](#special-thanks)\n- [License](#license)\n\n\u003c!-- markdown-toc end --\u003e\n\n\n## Documents\n\n* [Corpus]\n* [Paper on arXiv] (please [cite](#citation) this paper)\n* [Paper](https://plandes.github.io/todo-task/SupervisedInterpretationImperativeToDos.pdf) (please do **not**\n  cite this paper).\n* [Slides](https://plandes.github.io/todo-task/SupervisedInterpretationImperativeToDosSlides.pdf)\n* [Evaluation](results/full-evaluation.xls) (generated using\n  the [evaluation functionality](#test-evaluation))\n* [Predictions](results/predictions.csv) (generated using\n  the [predictions functionality](#predictions))\n\n\n## Corpus\n\nThe publicly available corpus is available [here](resources/corpus.xlsx) in\nExcel format.  This corpus is referred to as *Corpus B* in the [arXiv\npaper]. The columns in the spreadsheet are:\n\n| Column Name   | Description                                     | Trello Artifact |\n|---------------|-------------------------------------------------|-----------------|\n| `utterance`   | The natural language todo list text.            | no              |\n| `class`       | The label if classified, otherwise left blank.  
| no              |\n| `board_name`  | The name of the board                           | yes             |\n| `board_id`    | The board ID                                    | yes             |\n| `short_url`   | The URL of the comment on Trello                | yes             |\n| `description` | Additional description information for the task | yes             |\n\n\n## Citation\n\nPlease use the following to cite the [arXiv paper].\n\n```bibtex\n@article{landesDiEugenio2018,\n  title = {A Supervised Approach To The Interpretation Of Imperative To-Do Lists},\n  url = {http://arxiv.org/abs/1806.07999},\n  note = {arXiv: 1806.07999},\n  journal = {arXiv:1806.07999 [cs]},\n  author = {Landes, Paul and Di Eugenio, Barbara},\n  year = {2018},\n  month = {Jun}\n}\n```\n\nIf you use this software in your research, please cite with the following\nBibTeX (note that the [third party libraries] also have citations):\n\n```bibtex\n@misc{plandesTodoTask2018,\n  author = {Paul Landes},\n  title = {Supervised Approach Imperative To-Do List Categorization},\n  year = {2018},\n  publisher = {GitHub},\n  journal = {GitHub repository},\n  howpublished = {\\url{https://github.com/plandes/todo-task}}\n}\n```\n\n\n## Code Base\n\nThe code base used in this repository is an updated version of the code used on\n*Corpus A* (see the [arXiv paper]).  It is written in [Clojure] and designed to\nbe used mostly via `make` commands.  However, it can be compiled into a\ncommand line app if you want to run the long-running cross-fold validation\ntasks.  See [Running the Tests](#running-the-tests) to compile and run it.\n\n\n### What's Included\n\nThe functionality included is *agent classification* as described in the [arXiv\npaper].  
The following is *not* included:\n\n* Argument classification\n* Extending the Named Entity Recognizer (section 4.1)\n* The first verb model (section 4.2)\n\nThis functionality is not included as the original code base is proprietary.\nThis code base was rewritten and [third party libraries] utilized where\npossible to speed up the development.\n\n\n### Third Party Libraries\n\nPrimary libraries used are listed below.  Their dependencies can be traced from\ntheir respective repo links:\n\n* [Natural Language Parsing and Feature Generation]\n* [Interface for Machine Learning Modeling]\n* [Generate, split into folds or train/test and cache a dataset]\n* [Natural Language Feature Creation]\n* [Word Vector Feature Creation]\n\n\n### Documentation\n\nAPI [documentation](https://plandes.github.io/todo-task/codox/index.html).\n\n\n## Running the Tests\n\nThis section explains how to run the model against the corpus to reproduce\nresults *similar* to those in the [arXiv paper].  These instructions assume either\na UNIX, Linux, macOS operating system or *maybe* Cygwin under Windows.\n\nBefore proceeding, please install all the tools given in\nthe [building](#building) section.\n\n\n### Parsing\n\nThis section describes how to parse and load the corpus.  Note that\nif you just want to run the tests you can **skip**\nto the [test evaluation](#test-evaluation) section.  This means you don't need\n[ElasticSearch], which is only necessary for parsing the corpus and creating\nthe file system train/test split.  This is already done\nand [in the repo](resources/todo-dataset.json).\n\nOn the other hand, if you **really** want to manually parse and create the\ntrain/test data sets you must first install [ElasticSearch] or [Docker].  
The\neasiest way to get this up and working is to use [Docker], which is easy enough\nto download, install and get running in a container with:\n\n```bash\nmake startes\n```\n\nwhich provides the configuration necessary to download and start an\n[ElasticSearch] container ready to store the generated features from the parsed\nnatural language text.\n\nNext, populate [ElasticSearch] with parsed features:\n\n```bash\nmake load\n```\n\nThis parses the corpus and adds a JSON parse representation of each utterance\nto the database.\n\nNext, create train and test datasets by randomly shuffling the corpus.  After\nthe train/test assignment for each data point, export the data set to the JSON\nfile:\n\n```bash\nmake dsprep\n```\n\n\n### Test Evaluation\n\nProduce the optimal results for the model by evaluating and\nprinting the results:\n\n```bash\nmake printbest\n```\n\nThis gives the best (0.76 F1) results.\n\n\nTo run all defined feature and classifier combinations run the following:\n\n```bash\nmake print\n```\n\n\nTo run all defined feature and classifier combinations and create a spreadsheet\nwith all performance metrics, features and classifiers used for those metrics run:\n\n```bash\nmake evaluate\n```\n\nThis will create an `evaluation.xls` file.  The file this process generates\nis [here](results/full-evaluation.xls).\n\n\n### Predictions\n\nIt is possible to generate a CSV file with predictions complete with the\nutterance, the correct label, and the predicted label.  The file\nalso includes all features used to create the prediction.  This process includes:\n\n1. For each feature set and classifier combination, train and test the model.\n2. The winning combination (by F1) of feature set and classifier is used to\n   train the model.\n3. Create predictions on the test set.\n4. Generate the spreadsheet with the results.\n\nTo invoke this functionality, use the following:\n\n```bash\nmake predict\n```\n\nThis will generate a `predictions.csv` file.  
The file this process generates\nis [here](results/predictions.csv).\n\n\n### Off-line Tests\n\nIf you have a slower computer and the tests take too long, they can run in an\noffline mode.\n\nTo run the long-running offline tests in the background, first download and link to the\nmodels (note the space between `ZMODEL=` and `models` is intentional):\n\n```bash\nmake ZMODEL= models\n```\n\nThen create the application as a standalone and\nexecute it in the background:\n\n```bash\nmake ZMODEL=`pwd`/model DIST_PREFIX=./inst disttodo\ncd ./inst/todotask\n./run.sh sanity\ntail -f log/test-res.log\n```\n\nType `CONTROL-C` to break out of `tail`, then open `results/test-res.xlsx`\nto confirm it contains a single line from a simple majority label classifier (it will\nhave terrible performance).\n\nIf everything works, now run the long-running tests:\n\n```bash\n./run.sh long\nls results\n```\n\nThe `results` directory will have the results from each test.\nThe [Sample Results of Test](#sample-results-of-test) section has a summary of each test.\n\n\n### Sample Results of Test\n\nA selection of results using this code base on [*Corpus B*] are given\nbelow:\n\n|  Classifier  | F1        | Precision | Recall |                                                                                                                                              Attributes                                                                                                                                              |\n|--------------|----------:|----------:|-------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| J48          |   0.763 | 7677 | 0.761 | similarity-top-label, pos-last-tag, word-count-contact, word-count-call, word-count-buy, word-count-calendar, 
word-count-pay-bill-online, pos-first-tag, word-count-plan-meal, word-count-email, word-count-postal, word-count-school-work, word-count-print                                         |\n| RandomForest |   0.695 | 7101 | 0.714 | *all word counts*, elected-verb-id, similarity-top-label, similarity-score, pos-tag-ratio-noun                                  |\n| RandomTree   |   0.656 | 6654 | 0.666 | *all word counts*, elected-verb-id, similarity-top-label, similarity-score, pos-tag-ratio-noun                                  |\n| LogitBoost   |   0.592 | 6171 | 0.619 | *all word counts*, elected-verb-id, similarity-top-label, similarity-score, pos-tag-ratio-noun                                  |\n| NaiveBayes   |   0.547 | 5779 | 0.571 | similarity-top-label, pos-last-tag, word-count-contact, word-count-call, word-count-buy, word-count-calendar, word-count-pay-bill-online, pos-first-tag, word-count-plan-meal, word-count-email, word-count-postal, word-count-school-work, word-count-print                                         |\n| SVM          |   0.273 | 2815 | 0.285 | elected-verb-id, token-average-length, pos-first-tag, pos-last-tag, similarity-top-label, similarity-score, pos-tag-ratio-noun                                                                                                                                                                       |\n| Baseline     |   0.091 | 2356 | 0.238 | similarity-top-label, pos-last-tag, word-count-contact, word-count-call, word-count-buy, word-count-calendar, word-count-pay-bill-online, pos-first-tag, word-count-plan-meal, word-count-email, word-count-postal, word-count-school-work, word-count-print                                         |\n\n\n### Building\n\nTo build from source, do the following:\n\n- Install [Leiningen](http://leiningen.org) (this is just a script)\n- Install [GNU make](https://www.gnu.org/software/make/)\n- Install [Git](https://git-scm.com)\n- Download the source: `git clone --recurse-submodules 
https://github.com/plandes/todo-task \u0026\u0026 cd todo-task`\n\n\n### Advanced\n\nAll the capabilities of the [Interface for Machine Learning Modeling] package,\nincluding creating a usable executable model, are possible.  The (not unit test\ncase) [Clojure] [experimental execution file](test/uic/nlp/todo/eval_test.clj)\ndemonstrates how to do other things with the model.  All you need to do is to\nstart a [REPL](https://clojure.org/guides/repl/introduction) and call the\n`main` function.\n\n\n## Changelog\n\nAn extensive changelog is available [here](CHANGELOG.md).\n\n\n## Special Thanks\n\nThanks to those that volunteered their To-do tasks that, in part, made\nup this publicly available corpus.\n\n\n## License\n\nThis license applies to the code base and the corpus.\n\nCopyright (c) 2018 Paul Landes\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of\nthis software and associated documentation files (the \"Software\"), to deal in\nthe Software without restriction, including without limitation the rights to\nuse, copy, modify, merge, publish, distribute, sublicense, and/or sell copies\nof the Software, and to permit persons to whom the Software is furnished to do\nso, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n\n\n\u003c!-- links --\u003e\n[A Supervised Approach To The Interpretation Of Imperative To-Do Lists]: https://arxiv.org/pdf/1806.07999\n[arXiv paper]: https://arxiv.org/pdf/1806.07999\n[Paper on arXiv]: https://arxiv.org/pdf/1806.07999\n[*Corpus B*]: resources/corpus.xlsx\n[Corpus]: resources/corpus.xlsx\n\n[ElasticSearch]: https://www.elastic.co\n[Docker]: https://www.docker.com\n[Clojure]: https://clojure.org\n\n[Natural Language Parsing and Feature Generation]: https://github.com/plandes/clj-nlp-parse\n[Interface for Machine Learning Modeling]: https://github.com/plandes/clj-ml-model\n[Generate, split into folds or train/test and cache a dataset]: https://github.com/plandes/clj-ml-dataset\n[Natural Language Feature Creation]: https://github.com/plandes/clj-nlp-feature\n[Word Vector Feature Creation]: https://github.com/plandes/clj-nlp-wordvec\n[third party libraries]: #third-party-libraries\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplandes%2Ftodo-task","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fplandes%2Ftodo-task","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplandes%2Ftodo-task/lists"}