https://github.com/megagonlabs/tagruler

Data programming by demonstration for information extraction and span annotation

data-labeling data-programming data-programming-by-demonstration machine-learning weak-supervision

Host: GitHub
URL: https://github.com/megagonlabs/tagruler
Owner: megagonlabs
License: apache-2.0
Created: 2021-02-25T22:44:49.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2021-09-09T19:45:20.000Z (about 4 years ago)
Last Synced: 2025-04-02T19:47:01.098Z (6 months ago)
Language: JavaScript
Size: 82.6 MB
Stars: 35
Watchers: 4
Forks: 6
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE.txt

# TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration
This repo contains the source code and the user evaluation data for TagRuler, a data programming by demonstration system for span-level annotation.
Check out our [demo video](https://youtu.be/MRc2elPaZKs) to see TagRuler in action!

TagRuler synthesizes labeling functions based on your annotations, allowing you to quickly and easily generate large amounts of training data for span annotation, without the need to program.
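To make "labeling function" concrete for span annotation: a span-level labeling function takes a tokenized document and returns the spans it votes for, each with a label. The sketch below is a hypothetical, hand-written rule of the kind you would otherwise have to code yourself (tag the token right after "treated with" as a chemical). It is not TagRuler's internal representation; the names `Span` and `lf_treated_with` are illustrative only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """A labeled token span [start, end) in a tokenized document (illustrative type)."""
    start: int
    end: int
    label: str


def lf_treated_with(tokens: List[str]) -> List[Span]:
    """Hypothetical labeling function: tag the token that follows
    'treated with' (case-insensitive) as a CHEMICAL span."""
    spans = []
    lowered = [t.lower() for t in tokens]
    # Stop early enough that index i + 2 always exists.
    for i in range(len(tokens) - 2):
        if lowered[i] == "treated" and lowered[i + 1] == "with":
            spans.append(Span(i + 2, i + 3, "CHEMICAL"))
    return spans


tokens = "Patients treated with lidocaine developed arrhythmia .".split()
for span in lf_treated_with(tokens):
    print(span.label, tokens[span.start:span.end])
```

The idea behind TagRuler is that rules of this kind are synthesized from the spans you highlight, rather than written by hand.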

# What is TagRuler?

In 2020, we introduced [Ruler](https://github.com/megagonlabs/ruler), a novel data programming by demonstration (DPBD) system that allows domain experts to leverage data programming without writing code. Ruler generates document classification rules, but we knew a bigger challenge remained: span-level annotation. This is one of the more time-consuming labeling tasks, and building a DPBD system for it proved difficult because the space of possible labeling functions over spans is enormous.
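One way to see why the span-level space is so large: before any rule is chosen, a document of n tokens already contains n(n+1)/2 contiguous candidate spans, whereas document classification makes one decision per document. The helper below is an illustrative back-of-the-envelope count; the optional `max_len` cap is our assumption for illustration, not a TagRuler parameter.

```python
from typing import Optional


def count_candidate_spans(n_tokens: int, max_len: Optional[int] = None) -> int:
    """Number of contiguous token spans in a document of n_tokens tokens,
    optionally restricted to spans of at most max_len tokens."""
    if n_tokens <= 0:
        return 0
    limit = n_tokens if max_len is None else min(max_len, n_tokens)
    # A span of length k can start at n_tokens - k + 1 positions.
    return sum(n_tokens - k + 1 for k in range(1, limit + 1))


print(count_candidate_spans(7))               # 28
print(count_candidate_spans(250))             # 31375
print(count_candidate_spans(250, max_len=5))  # 1240
```

This only counts candidate spans in a single document; the space of rules that pick among them (by keywords, surrounding context, and so on) across a whole corpus is larger still.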

We see this as a critical extension of the DPBD paradigm, and by open-sourcing it, we hope to support a much wider range of labeling needs.

# How to use the source code in this repo

Follow these instructions to run the system yourself. You can plug in your own data and save the resulting labels, models, and annotations.

## 1. Server

### 1-1. Install Dependencies :wrench:

```shell
cd server
pip install -r requirements.txt
python -m spacy download en_core_web_sm
```

### 1-2. (Optional) Download Data Files

- **BC5CDR** ([Download Preprocessed Data](https://drive.google.com/file/d/1kKeINUOjtCVGr1_L3aC3qDo3-O-jr5hR/view?usp=sharing)): PubMed articles for Chemical-Disease annotation
Li, Jiao & Sun, Yueping & Johnson, Robin & Sciaky, Daniela & Wei, Chih-Hsuan & Leaman, Robert & Davis, Allan Peter & Mattingly, Carolyn & Wiegers, Thomas & Lu, Zhiyong. (2016). Original database URL: http://www.biocreative.org/tasks/biocreative-v/track-3-cdr/

- **Your Own Data**: See the instructions in [server/datasets](server/datasets).

### 1-3. Run :runner:

```shell
python api/server.py
```

## 2. User Interface

### 2-1. Install Node.js

[You can download Node.js here.](https://nodejs.org/en/)

To confirm that Node.js is installed, run `node -v`.

### 2-2. Run

```shell
cd ui
npm install
npm start
```

By default, the UI sends requests to `localhost:5000`, so the server must be running on your machine (see the [server instructions above](#1-server)).

Once you have both of these running, navigate to `localhost:3000`.
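If the UI loads but cannot reach the server, a quick first check is whether anything is listening on the two default ports named above (5000 for the server, 3000 for the UI). This is a minimal sketch using only Python's standard library; it only tests that a TCP connection succeeds, not that the process behind the port is actually TagRuler. The function name `is_port_open` is our own.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


if __name__ == "__main__":
    # Default ports from the instructions above: server on 5000, UI on 3000.
    for name, port in [("server", 5000), ("ui", 3000)]:
        status = "listening" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {status}")
```

Save it under any file name and run it with `python` while both processes are up; a "not reachable" line points at the process to restart.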

# Issues?

...or other inquiries, contact and/or .
