https://github.com/bkj/tagless
Interface for building image classifiers, via transfer learning, active search and uncertainty sampling
https://github.com/bkj/tagless
active-learning active-search human-computer-interaction labeling-tool python rest-api
Last synced: 8 months ago
JSON representation
Interface for building image classifiers, via transfer learning, active search and uncertainty sampling
- Host: GitHub
- URL: https://github.com/bkj/tagless
- Owner: bkj
- License: mit
- Created: 2017-08-01T21:59:33.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2023-03-08T20:10:18.000Z (over 3 years ago)
- Last Synced: 2023-08-01T10:38:07.180Z (almost 3 years ago)
- Topics: active-learning, active-search, human-computer-interaction, labeling-tool, python, rest-api
- Language: Python
- Homepage:
- Size: 2.06 MB
- Stars: 2
- Watchers: 3
- Forks: 1
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
## tagless
Tagging interface w/ transfer learning, linearized active search and uncertainty sampling:
| | |
| ----------------- | ---------------------------- |
| Transfer learning | https://github.com/bkj/tdesc |
| Linearized active search (LAS) | https://github.com/bkj/simple_las |
| Uncertainty sampling | https://github.com/bkj/libact |
Under active development -- some things are broken or don't have sensible APIs exposed.
### Usage
```
cd $TARGET_DIR
mkdir -p ./{data,results}
# expect set of images to be in `imgs` directory
# Featurize images
find ./imgs/ -type f | python -m tdesc --model vgg16 --crow > .crow
# Prep + reformat images
python $TAGLESS_ROOT/tagless/prep.py --inpath .crow ./data/crow
# Run server
python -m tagless --outpath ./results/my-labels --crow ./data/crow
# Connect to localhost:5000 + start tagging
```
### Notes
Uncertainty sampling computes the score for each unlabeled image at each iteration. ATM we're using a linear SVM, so the runtime of this step increases linearly w/ the size of the corpus. On my machine, predicting on ~350K images takes ~2.5s, which is unacceptably slow. Thus, for big corpora, we may want to fall back to some kind of approximate matrix-vector product. That'll take a little bit of thought thought. For now I'll recommend running on a subset of the data.
__Idea__: Feature vectors are normalized relus -- so norm=1 and all positive entries. Could maybe do uncertainty sampling via `faiss` by using vector orthogonal to SVM feature vector and take the largest/smallest entries. Have to check my work on that one though.
### Dependencies
This has been tested on Ubuntu 16.04 w/ Python 2.7 (via Anaconda)