https://github.com/uwdata/errudite
An Interactive Tool for Scalable and Reproducible Error Analysis.
- Host: GitHub
- URL: https://github.com/uwdata/errudite
- Owner: uwdata
- License: gpl-2.0
- Created: 2019-05-27T21:51:18.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-07-22T06:35:41.000Z (almost 4 years ago)
- Last Synced: 2025-03-23T04:11:23.890Z (3 months ago)
- Language: Python
- Homepage: https://errudite.readthedocs.io/en/latest/index.html
- Size: 6.91 MB
- Stars: 106
- Watchers: 9
- Forks: 11
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
# Errudite
Errudite is an interactive tool for scalable, reproducible, and counterfactual error analysis.
Errudite provides an expressive domain-specific language for extracting relevant features of
linguistic data, which allows users to visualize data attributes, group relevant instances,
and perform counterfactual analysis across all available validation data.

## Getting Started
1. Read [our blog post](https://medium.com/@uwdata/errudite-55d5fbf3232e) which explains the core idea of Errudite.
2. Watch [this video demo](https://youtu.be/Dil5i0AYyu8) that highlights Errudite's functions and use cases
3. Get [set up](#installation) quickly
4. Try [Errudite's user interface](#gui-server) on machine comprehension
5. Try the [tutorials on JupyterLab notebooks](#jupyterLab-tutorial)
6. Read the [documentation](https://errudite.readthedocs.io/en/latest/)

## Citation
If you are interested in this work, please see our
[ACL 2019 research paper](https://homes.cs.washington.edu/~wtshuang/static/papers/2019-acl-errudite.pdf)
and consider citing our work:
```
@inproceedings{2019-errudite,
title = {Errudite: Scalable, Reproducible, and Testable Error Analysis},
author = {Wu, Tongshuang and Ribeiro, Marco Tulio and Heer, Jeffrey and Weld, Daniel S.},
booktitle={the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)},
year = {2019},
url = {https://www.aclweb.org/anthology/P19-1073.pdf},
}
```

## Quick Start
### Installation
#### PIP
Errudite requires Python 3.6.x. The package is available through `pip`; just install it in your Python environment and you're good to go!

```sh
# create the virtual environment (isolated from the global site-packages by default)
virtualenv -p python3.6 venv
# activate venv
source venv/bin/activate
# install errudite
pip install errudite
```

#### Install from source
You can also install Errudite by cloning our git repository:
```sh
git clone https://github.com/uwdata/errudite
```

Create a Python 3.6 virtual environment and, from the root of the cloned repository, install Errudite in editable mode:
```sh
pip install --editable .
```

This will make `errudite` available on your system, but it will use the sources from the local clone
you made of the source repository.

#### Troubleshooting
1. `mysql_config not found` when installing `Pattern`: see similar solutions [here](https://github.com/PyMySQL/mysqlclient-python#prerequisites).

### GUI Server
Errudite ships with a UI for the Machine Comprehension and Visual Question Answering tasks.
The interface integrates all the key analysis functions (e.g., inspecting instance attributes,
grouping similar instances, rewriting instances). It also provides exploration
support such as visualizing data distributions, suggesting potential queries, and presenting the
grouping and rewriting results. While not strictly necessary, the UI makes applying these functions much
more straightforward.

Note that the GUI is released as-is: _we do not expect it to be extended to other tasks._
As such, the frontend code is not as well documented as the backend code.
**If you are interested in using Errudite for your own task, please consider using the [Errudite package in JupyterLab](#jupyterLab-tutorial)**.
It wraps almost all the Errudite functions (except for query auto-complete and programming-by-demonstration),
and lets you customize them for your own task.

To get a taste of the GUI for the machine comprehension task, first download a cache folder
of preprocessed [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/) instances, which lets you
skip running the preprocessing yourself. Say we want to use the preprocessed SQuAD dataset
and download the preprocessed data folder to `~/caches/`:

```sh
python -m errudite.download --cache_folder_name squad-10570 --cache_path ~/caches/
```

Arguments:

- `cache_folder_name`: a folder name. Currently, the following can be downloaded: `squad-100`, `squad-10570`.
- `cache_path`: a local path where the cache folder will be saved.

Then, we need to start the server:
```sh
# activate the virtual environment first
source venv/bin/activate
# the model relies on AllenNLP, so make sure you install that first.
# If you run into issues installing it, please refer to AllenNLP's official page: https://github.com/allenai/allennlp
pip install allennlp==0.9.0
# start the server
python -m errudite.server --config_file config.yml
```

Arguments:

- `config_file`: the path to a YAML config file.
The config file looks like the following (see also [config.yml](config.yml)):

```yml
task: qa # the task, either "qa" or "vqa".
cache_path: ~/caches/squad-10570 # the cached folder: {cache_path}/{cache_folder_name}/
model_metas: # a list of models.
- name: bidaf
model_class: bidaf # an implemented model class
model_path: # a local model file path
# an online path to an Allennlp model
model_online_path: https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz
description: Pretrained model from Allennlp, for the BiDAF model (QA)
  attr_file_name: null # if set, loads a previously saved analysis.
group_file_name: null
rewrite_file_name: null
```

Then visit `http://localhost:5000/` in your web browser.
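If you edit `config.yml` by hand, a quick check before launching can catch typos. The helper below is hypothetical and not part of Errudite: `check_config` takes the config as a Python dict (for example, the result of PyYAML's `yaml.safe_load`), checks that `task` is `qa` or `vqa` and that the cache folder exists, and assumes every entry in `model_metas` needs a `name`, a `model_class`, and at least one of `model_path` or `model_online_path`. These rules are assumptions read off the sample config above, not Errudite's own validation.

```python
import os
from typing import Any, Dict, List

VALID_TASKS = {"qa", "vqa"}  # per the config comment: task is either "qa" or "vqa"


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a server config dict (empty if none).

    Hypothetical helper: the required-field rules below are assumptions,
    not Errudite's own validation logic.
    """
    problems: List[str] = []
    if config.get("task") not in VALID_TASKS:
        problems.append(f"task must be one of {sorted(VALID_TASKS)}, got {config.get('task')!r}")

    cache_path = config.get("cache_path")
    if not cache_path:
        problems.append("cache_path is missing")
    elif not os.path.isdir(os.path.expanduser(cache_path)):
        # "~" is expanded here only for the existence check.
        problems.append(f"cache folder not found: {cache_path}")

    metas = config.get("model_metas") or []
    if not metas:
        problems.append("model_metas must list at least one model")
    for i, meta in enumerate(metas):
        for key in ("name", "model_class"):
            if not meta.get(key):
                problems.append(f"model_metas[{i}] is missing {key}")
        # Assumption: each model needs either a local or an online path.
        if not (meta.get("model_path") or meta.get("model_online_path")):
            problems.append(f"model_metas[{i}] needs model_path or model_online_path")
    return problems


if __name__ == "__main__":
    example = {
        "task": "qa",
        "cache_path": "~/caches/squad-10570",
        "model_metas": [{
            "name": "bidaf",
            "model_class": "bidaf",
            "model_path": None,
            "model_online_path": "https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz",
        }],
    }
    for problem in check_config(example):
        print(problem)
```

With the sample values, the only problem such a check would report is a missing cache folder, which usually means the download step has not been run yet.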
### JupyterLab Tutorial (and task extension)
Besides powering the GUI, Errudite also serves as a general Python package. The tutorial goes
through:
1. Preprocessing the data, and extending Errudite to different tasks and predictors.
2. Creating data attributes and data groups with a domain-specific language (or your customized functions).
3. Creating rewrite rules with the domain-specific language (or your customized functions).

To go through the tutorial, do the following steps:
```sh
# clone the repo
git clone https://github.com/uwdata/errudite
# initial folder: errudite/
cd errudite
# create the virtual environment
virtualenv -p python3.6 venv
# activate venv
source venv/bin/activate
# run the default setup script
pip install --editable .
# go to the tutorial folder, and start!
cd tutorials
pip install -r requirements_tutorial.txt
jupyter lab
```
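To make the tutorial's three ideas concrete before opening the notebooks, here is a toy sketch in plain Python. It does not use Errudite's DSL or API; every name in it (`QAInstance`, `toy_predict`, `question_type`, `build_group`, `rephrase_question`, `add_distractor`, `accuracy`, `flip_rate`) is hypothetical. An attribute is a function of an instance, a group is the set of instances matching a predicate, and a rewrite rule produces a counterfactual copy of each instance; comparing predictions before and after a rewrite across a whole group is the counterfactual analysis.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class QAInstance:
    """A toy machine-comprehension instance (hypothetical structure)."""
    qid: str
    question: str
    context: str
    answer: str


def toy_predict(inst: QAInstance) -> str:
    """Stand-in model: returns the context word right after the first 'is'."""
    words = [w.strip(".,?") for w in inst.context.split()]
    if "is" not in words:
        return ""
    i = words.index("is")
    return words[i + 1] if i + 1 < len(words) else ""


def question_type(inst: QAInstance) -> str:
    """Attribute: the lowercased first word of the question."""
    return inst.question.split()[0].lower()


def build_group(data: List[QAInstance], pred: Callable[[QAInstance], bool]) -> List[QAInstance]:
    """Group: all instances for which the predicate holds."""
    return [inst for inst in data if pred(inst)]


def rephrase_question(inst: QAInstance) -> QAInstance:
    """Rewrite rule: 'What' -> 'Which' in the question; the context is untouched."""
    return replace(inst, question=inst.question.replace("What", "Which", 1))


def add_distractor(inst: QAInstance) -> QAInstance:
    """Rewrite rule: prepend an irrelevant sentence to the context."""
    return replace(inst, context="The ocean is deep. " + inst.context)


def accuracy(group: List[QAInstance], predict: Callable[[QAInstance], str]) -> float:
    """Fraction of instances whose prediction matches the gold answer."""
    return sum(predict(inst) == inst.answer for inst in group) / len(group) if group else 0.0


def flip_rate(group: List[QAInstance],
              rewrite: Callable[[QAInstance], QAInstance],
              predict: Callable[[QAInstance], str]) -> float:
    """Fraction of instances whose prediction changes after applying the rewrite."""
    if not group:
        return 0.0
    return sum(predict(inst) != predict(rewrite(inst)) for inst in group) / len(group)


data = [
    QAInstance("q1", "What is the capital of France?", "The capital of France is Paris.", "Paris"),
    QAInstance("q2", "Who wrote Hamlet?", "The author of Hamlet is Shakespeare.", "Shakespeare"),
    QAInstance("q3", "What color is the sky?", "On clear days the sky is blue.", "blue"),
]

what_group = build_group(data, lambda inst: question_type(inst) == "what")
print(len(what_group))                                        # 2
print(accuracy(what_group, toy_predict))                      # 1.0
print(flip_rate(what_group, rephrase_question, toy_predict))  # 0.0
print(flip_rate(what_group, add_distractor, toy_predict))     # 1.0
```

The question rephrasing never changes a prediction while the distractor flips every one, which exposes that the toy model ignores the question entirely. The tutorials cover how to express attributes, groups, and rewrites like these with Errudite itself.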