https://github.com/bkj/tagless

Interface for building image classifiers, via transfer learning, active search and uncertainty sampling

active-learning active-search human-computer-interaction labeling-tool python rest-api


## tagless

Tagging interface w/ transfer learning, linearized active search and uncertainty sampling:

| Component | Repository |
| ----------------- | ---------------------------- |
| Transfer learning | https://github.com/bkj/tdesc |
| Linearized active search (LAS) | https://github.com/bkj/simple_las |
| Uncertainty sampling | https://github.com/bkj/libact |

Under active development -- some things are broken or don't have sensible APIs exposed.

### Usage

```

cd $TARGET_DIR
mkdir -p ./{data,results}
# Expects the set of images to be in the `imgs` directory

# Featurize images
find ./imgs/ -type f | python -m tdesc --model vgg16 --crow > .crow

# Prep + reformat images
python $TAGLESS_ROOT/tagless/prep.py --inpath .crow ./data/crow

# Run server
python -m tagless --outpath ./results/my-labels --crow ./data/crow

# Connect to localhost:5000 + start tagging
```

### Notes

Uncertainty sampling computes the score for each unlabeled image at each iteration. ATM we're using a linear SVM, so the runtime of this step increases linearly w/ the size of the corpus. On my machine, predicting on ~350K images takes ~2.5s, which is unacceptably slow for interactive use. Thus, for big corpora, we may want to fall back to some kind of approximate matrix-vector product. That'll take a little bit of thought. For now I recommend running on a subset of the data.
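To make the cost concrete, here is a minimal sketch of uncertainty sampling with a linear model, assuming the score is the absolute margin `|X·w + b|`. The names `uncertainty_scores`, `most_uncertain`, `weights` and `bias` are hypothetical and are not taken from `tagless` or `libact`; the weights would come from whatever linear SVM has been fit on the labels so far.

```python
import numpy as np

def uncertainty_scores(features, weights, bias=0.0):
    """Absolute margin of each row from the linear decision boundary.

    Smaller values mean the classifier is less certain about that image.
    """
    # One matrix-vector product: cost grows linearly with the number of rows.
    return np.abs(features @ weights + bias)

def most_uncertain(features, weights, bias=0.0, k=10, labeled=None):
    """Indices of the k unlabeled rows closest to the decision boundary."""
    scores = uncertainty_scores(features, weights, bias)
    if labeled is not None:
        # Push already-labeled rows to the end so they are not proposed again.
        scores[np.asarray(labeled, dtype=int)] = np.inf
    k = min(k, scores.shape[0])
    # argpartition selects the k smallest scores in O(n); only those k are sorted.
    candidates = np.argpartition(scores, k - 1)[:k]
    return candidates[np.argsort(scores[candidates])]

# Toy corpus: 5 images with 2-d unit-norm features and made-up SVM weights.
X = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.6, 0.8],
    [0.28, 0.96],
    [0.7071, 0.7071],
])
w = np.array([1.0, -1.0])

print(most_uncertain(X, w, k=2))
print(most_uncertain(X, w, k=2, labeled=[4]))
```

The full pass over `features` is what scales with corpus size; running on a subset just means passing fewer rows.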

__Idea__: Feature vectors are normalized ReLUs, so they have norm 1 and all non-negative entries. Could maybe do uncertainty sampling via `faiss` by querying with a vector orthogonal to the SVM weight vector and taking the largest/smallest entries. Have to check my work on that one though.
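As a quick illustration of the feature property mentioned above, the sketch below applies a ReLU followed by L2 normalization to some made-up activations. This is an assumption about what "normalized relus" means, not code from `tdesc`, and `normalized_relu` is a hypothetical helper. It shows that the resulting rows have unit norm and non-negative entries, so inner products between them fall in [0, 1].

```python
import numpy as np

def normalized_relu(activations, eps=1e-12):
    """ReLU followed by row-wise L2 normalization."""
    positive = np.maximum(activations, 0.0)
    norms = np.linalg.norm(positive, axis=1, keepdims=True)
    # eps guards against division by zero for rows that are entirely non-positive.
    return positive / np.maximum(norms, eps)

# Made-up raw activations for two images.
raw = np.array([
    [3.0, -1.0, 4.0],
    [-2.0, 5.0, 12.0],
])
feats = normalized_relu(raw)

print(feats)            # rows are non-negative with unit L2 norm
print(feats @ feats.T)  # pairwise inner products (cosine similarities)
```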

### Dependencies

This has been tested on Ubuntu 16.04 w/ Python 2.7 (via Anaconda).