https://github.com/bkj/tagless

Interface for building image classifiers, via transfer learning, active search and uncertainty sampling
active-learning active-search human-computer-interaction labeling-tool python rest-api

Host: GitHub
URL: https://github.com/bkj/tagless
Owner: bkj
License: mit
Created: 2017-08-01T21:59:33.000Z (almost 9 years ago)
Default Branch: master
Last Pushed: 2023-03-08T20:10:18.000Z (over 3 years ago)
Last Synced: 2025-06-15T06:07:04.786Z (about 1 year ago)
Topics: active-learning, active-search, human-computer-interaction, labeling-tool, python, rest-api
Language: Python
Homepage:
Size: 2.06 MB
Stars: 2
Watchers: 2
Forks: 1
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

README

## tagless

Tagging interface w/ transfer learning, linearized active search and uncertainty sampling:

| Component | Repository |
| ----------------- | ---------------------------- |
| Transfer learning | https://github.com/bkj/tdesc |
| Linearized active search (LAS) | https://github.com/bkj/simple_las |
| Uncertainty sampling | https://github.com/bkj/libact |

Under active development -- some things are broken or don't have sensible APIs exposed.

### Usage

```
cd $TARGET_DIR
mkdir -p ./{data,results}
# Expects the set of images to be in the `imgs` directory

# Featurize images
find ./imgs/ -type f | python -m tdesc --model vgg16 --crow > .crow

# Prep + reformat images
python $TAGLESS_ROOT/tagless/prep.py --inpath .crow ./data/crow

# Run server
python -m tagless --outpath ./results/my-labels --crow ./data/crow

# Connect to localhost:5000 + start tagging
```

### Notes

Uncertainty sampling computes a score for each unlabeled image at each iteration. ATM we're using a linear SVM, so the runtime of this step increases linearly w/ the size of the corpus. On my machine, predicting on ~350K images takes ~2.5s, which is unacceptably slow. Thus, for big corpora, we may want to fall back to some kind of approximate matrix-vector product. That'll take a little bit of thought. For now, I recommend running on a subset of the data.
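To make the cost concrete, here is a minimal sketch of margin-based uncertainty sampling with a linear model, plus the "run on a subset" workaround. It is not the code `tagless` actually runs: `most_uncertain`, the random data, and the subset size are all hypothetical, and `w` and `b` stand in for the weights and bias of an already-fitted linear SVM.

```python
import numpy as np

def most_uncertain(X, w, b, k):
    """Return indices of the k rows of X closest to the hyperplane w.x + b = 0.

    Hypothetical helper for illustration only, not part of tagless.
    """
    scores = X.dot(w) + b            # one matrix-vector product: O(n_rows * n_dims)
    margins = np.abs(scores)         # small |score| means the model is unsure
    k = min(k, len(margins))
    idx = np.argpartition(margins, k - 1)[:k]   # k smallest, without a full sort
    return idx[np.argsort(margins[idx])]        # order them, most uncertain first

rng = np.random.RandomState(0)
X = rng.rand(1000, 16)               # stand-in for the image feature matrix
w = rng.randn(16)                    # stand-in for fitted SVM weights
b = 0.0                              # stand-in for fitted SVM bias

# Subset workaround: only score a random sample of the corpus each iteration.
subset = rng.choice(len(X), size=200, replace=False)
picked = subset[most_uncertain(X[subset], w, b, k=5)]
print(picked)
```

The scoring step is a single dense matrix-vector product, which is why its cost grows linearly with the number of images. Sampling a subset caps that cost, at the price of possibly missing the most uncertain images in the full corpus.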

__Idea__: Feature vectors are normalized relus -- so norm=1 and all non-negative entries. Could maybe do uncertainty sampling via `faiss` by using a vector orthogonal to the SVM's weight vector and taking the largest/smallest entries. Have to check my work on that one though.
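As a sanity check on the premise only (not on the `faiss` idea itself, which remains unverified), the sketch below builds normalized-ReLU features from random data and confirms the two properties mentioned above. `normalized_relu` is a hypothetical helper, not a function from `tdesc` or `tagless`.

```python
import numpy as np

def normalized_relu(F, eps=1e-12):
    """Apply ReLU, then L2-normalize each row (hypothetical helper)."""
    F = np.maximum(F, 0.0)                              # ReLU: clip negatives to zero
    norms = np.linalg.norm(F, axis=1, keepdims=True)    # per-row L2 norm
    return F / np.maximum(norms, eps)                   # eps guards all-zero rows

rng = np.random.RandomState(0)
feats = normalized_relu(rng.randn(100, 32))   # random stand-in for image features

row_norms = np.linalg.norm(feats, axis=1)
sims = feats.dot(feats.T)                     # pairwise inner products

print(feats.min() >= 0.0)                     # entries are non-negative
print(np.allclose(row_norms, 1.0))            # rows have unit norm
print(sims.min() >= 0.0, sims.max() <= 1.0 + 1e-9)
```

One consequence worth keeping in mind for any inner-product search: with unit-norm, non-negative vectors, every pairwise inner product lies in [0, 1].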

### Dependencies

This has been tested on Ubuntu 16.04 w/ Python 2.7 (via Anaconda).
