https://github.com/stsievert/salmon
A tool to collect triplet queries
- Host: GitHub
- URL: https://github.com/stsievert/salmon
- Owner: stsievert
- License: bsd-3-clause
- Created: 2019-11-12T20:32:16.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-06-07T03:48:32.000Z (5 months ago)
- Last Synced: 2024-10-07T18:08:54.970Z (about 1 month ago)
- Topics: active-learning, crowdsourcing, embedding, machine-learning, triplet-loss, triplets
- Language: Python
- Homepage: https://docs.stsievert.com/salmon/
- Size: 119 MB
- Stars: 8
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE.txt
- Citation: CITATION.cff
README
Salmon is a tool for efficiently generating ordinal embeddings. It relies on
"active" machine learning algorithms to choose the most informative queries for
humans to answer.

### Documentation
This documentation is available at these locations:
- **Primary source**: https://docs.stsievert.com/salmon/
- Secondary source: as [a raw PDF][pdf] (and as a [slower loading PDF][blobpdf]).
- Secondary source: as [zipped HTML directory][ziphtml], which requires unzipping the directory
then opening up `index.html`.

[pdf]:https://github.com/stsievert/salmon/raw/gh-pages/salmon.pdf
[blobpdf]:https://github.com/stsievert/salmon/blob/gh-pages/salmon.pdf
[ziphtml]:https://github.com/stsievert/salmon/archive/refs/heads/gh-pages.zip

Please [file an issue][issue] if you cannot access the documentation.

[issue]:https://github.com/stsievert/salmon/issues/new
### Running Salmon offline
Visit the documentation at https://docs.stsievert.com/salmon/offline.html.
Briefly, this should work:

``` shell
$ cd path/to/salmon
$ conda env create -f salmon.lock.yml
$ conda activate salmon
(salmon) $ pip install -e .
```

The documentation online mentions more about how to generate an embedding
offline: https://docs.stsievert.com/salmon/offline.html#generate-embeddings

With this, it's also possible to create a script that uses and imports Salmon:
``` python
from salmon.triplets.samplers import TSTE
import numpy as np

n, d = 85, 2
sampler = TSTE(n=n, d=d)

# User-specified starting embedding with shape (n, d)
em_init = np.array([[i, -i] for i in range(n)])
sampler.opt.initialize(embedding=em_init)

# Generate queries and score them against the embedding above
queries, scores, meta = sampler.get_queries(num=10_000)
```

This script allows the data scientist to score queries for an embedding they
specify.

[semver]:https://semver.org
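
Once queries have been scored, a natural next step is to look at the highest-scoring ones. The snippet below is a minimal sketch of that step, not part of Salmon itself: `top_queries` is a hypothetical helper, and it assumes that each entry of `queries` is a triple of item indices, that `scores` holds one score per query, and that a higher score means a more informative query. Check the Salmon documentation for the actual return format before relying on these assumptions. Stand-in data is used so the example runs without Salmon installed.

``` python
from typing import List, Sequence, Tuple


def top_queries(
    queries: Sequence[Sequence[int]],
    scores: Sequence[float],
    k: int = 5,
) -> List[Tuple[Tuple[int, ...], float]]:
    """Return the k highest-scoring queries, best first.

    ASSUMPTION: queries[i] is a triple of item indices, scores[i] is its
    score, and higher scores mean more informative queries.
    """
    if len(queries) != len(scores):
        raise ValueError("queries and scores must have the same length")
    # Sort indices by score, descending; ties keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [
        (tuple(int(x) for x in queries[i]), float(scores[i]))
        for i in order[:k]
    ]


# Stand-in data (hypothetical); in practice these would be the `queries`
# and `scores` returned by `sampler.get_queries(...)` in the script above.
example_queries = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (1, 5, 9)]
example_scores = [0.10, 0.75, 0.40, 0.75]

best = top_queries(example_queries, example_scores, k=2)
print(best)
```

The helper only uses `len` and integer indexing, so it should accept plain lists as well as NumPy arrays.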