# TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration
This repo contains the source code and the user evaluation data for TagRuler, a data programming by demonstration system for span-level annotation.
Check out our [demo video](https://youtu.be/MRc2elPaZKs) to see TagRuler in action!


<h3 align="center">
TagRuler synthesizes labeling functions from your annotations, so you can quickly generate large amounts of training data for span annotation without writing any code.
<br/>
 <a href="https://youtu.be/MRc2elPaZKs"><img width=800px src=tagruler-teaser.gif></a>
</h3>


# <a name='About'></a>What is TagRuler?

In 2020, we introduced [Ruler](https://github.com/megagonlabs/ruler), a novel data programming by demonstration (DPBD) system that lets domain experts use data programming without writing code. Ruler generates document classification rules, but a bigger challenge remained: span-level annotation. Span labeling is one of the more time-consuming annotation tasks, and building a DPBD system for it is hard because of the sheer size of the space of labeling functions over spans.

We see TagRuler as a critical extension of the DPBD paradigm, and by open-sourcing it we hope to help with all kinds of labeling needs.

# <a name='Use'></a>How to use the source code in this repo

Follow these instructions to run the system yourself. You can plug in your own data and save the resulting labels, models, and annotations.

## 1. Server

### 1-1. Install Dependencies :wrench:

```shell
cd server
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

### 1-2. (Optional) Download Data Files

- **BC5CDR** ([Download Preprocessed Data](https://drive.google.com/file/d/1kKeINUOjtCVGr1_L3aC3qDo3-O-jr5hR/view?usp=sharing)): PubMed articles for chemical-disease annotation.
Li, Jiao & Sun, Yueping & Johnson, Robin & Sciaky, Daniela & Wei, Chih-Hsuan & Leaman, Robert & Davis, Allan Peter & Mattingly, Carolyn & Wiegers, Thomas & Lu, Zhiyong. (2016). Original database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/

- **Your Own Data**: see the instructions in [server/datasets](server/datasets).

### 1-3. Run :runner:

From the `server` directory, start the API server:

```shell
python api/server.py
```

## 2. User Interface

### 2-1. Install Node.js

[You can download Node.js here.](https://nodejs.org/en/)

To confirm that Node.js is installed, run `node -v`.

### 2-2. Run

From the repository root, run:

```shell
cd ui
npm install
npm start
```

By default, the UI sends its requests to `localhost:5000`, so the server must already be running on your machine (see the [server instructions above](#1-server)).

Once both the server and the UI are running, open `localhost:3000` in your browser.


# Issues?

...or other inquiries, contact <sara@megagon.ai> and/or <jin.choi@gatech.edu>.
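Before reaching out about a failed setup, it can help to confirm that the tools used in the installation steps are on your `PATH`. The following is a minimal sketch for a POSIX shell. The `have_cmd` and `check_prereqs` helpers are hypothetical and not part of this repo. The default tool list (`python`, `pip`, `node`, `npm`) is an assumption based on the commands in the setup steps.

```shell
# Hypothetical helper (not part of the TagRuler repo): succeeds (status 0)
# if the named command is found on PATH, fails (non-zero) otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: prints "found:" or "missing:" for each tool and
# returns the number of missing tools (0 means everything was found).
# With no arguments, it checks the tools assumed by the setup steps.
check_prereqs() {
  if [ "$#" -eq 0 ]; then
    set -- python pip node npm
  fi
  missing=0
  for cmd in "$@"; do
    if have_cmd "$cmd"; then
      echo "found:   $cmd"
    else
      echo "missing: $cmd" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

Paste the functions into your shell (or source them from a file) and run `check_prereqs`. On systems where Python 3 is installed as `python3`, pass the names explicitly, for example `check_prereqs python3 pip3 node npm`. A non-zero exit status is the count of missing tools.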