https://github.com/rmitsch/tapas
Assisted hyperparameter optimization for t-SNE in a word embedding context.
dimensionality-reduction hyperparameter-optimization t-sne visualization word-embeddings
- Host: GitHub
- URL: https://github.com/rmitsch/tapas
- Owner: rmitsch
- License: gpl-3.0
- Created: 2017-10-27T12:59:42.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-07-13T12:16:11.000Z (over 6 years ago)
- Last Synced: 2024-12-29T10:44:31.535Z (5 days ago)
- Topics: dimensionality-reduction, hyperparameter-optimization, t-sne, visualization, word-embeddings
- Language: Python
- Homepage:
- Size: 4.01 MB
- Stars: 1
- Watchers: 4
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# tAPAS
### Assisted Parameter optimization by Approximating neighbourhood Similarity

Bayesian optimization of hyperparameters for t-SNE in the context of word embeddings. The input is a word embedding (labels plus coordinates in high-dimensional space). The optimization procedure samples the parameter space to generate low-dimensional approximations of the original word embedding data using t-SNE. The quality/truthfulness of the resulting models is evaluated with several metrics:
* Trustworthiness: Measures the proportion of points that are too close together<sup>1</sup> in the low-dimensional space.
* Continuity: Measures the proportion of points that are too far apart<sup>1</sup> in the low-dimensional space.
* Generalization: Generalization error of a 1-nearest-neighbour classifier (e.g. the word embedding is clustered in both the high-dimensional and the low-dimensional space; the higher the similarity between the cluster labels, the lower the generalization error).
* Relative word embedding quality: QVEC [1] is used to evaluate the intrinsic quality of the original word embedding and of its dimensionality-reduced projection. The ratio of the two scores is referred to as 'relative word embedding quality'.

The first three measures were chosen following [2].
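
To make the rank-based measures concrete, here is a minimal pure-Python sketch of trustworthiness, continuity, and a leave-one-out 1-nearest-neighbour error. It uses the standard rank-based definitions from the dimensionality-reduction literature; whether tAPAS computes them in exactly this way is an assumption. The function names (`neighbour_ranks`, `trustworthiness`, `continuity`, `one_nn_error`) and the toy data are hypothetical and not taken from the repository.

```python
from math import dist


def neighbour_ranks(points):
    """Return ranks[i][j]: rank of point j among point i's neighbours (1 = nearest)."""
    n = len(points)
    ranks = [[0] * n for _ in range(n)]
    for i in range(n):
        # Sort all other points by distance to point i; ties are broken by index.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: (dist(points[i], points[j]), j))
        for rank, j in enumerate(order, start=1):
            ranks[i][j] = rank
    return ranks


def _rank_score(ranks_ref, ranks_test, k):
    """Penalise points that are k-nearest neighbours in `ranks_test` but not in
    `ranks_ref`, weighted by how far their reference rank exceeds k.
    The normalisation assumes k < n / 2."""
    n = len(ranks_ref)
    penalty = 0
    for i in range(n):
        for j in range(n):
            if j != i and ranks_test[i][j] <= k and ranks_ref[i][j] > k:
                penalty += ranks_ref[i][j] - k
    return 1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1))


def trustworthiness(high, low, k):
    """Drops when points are neighbours in the projection but not in the original."""
    return _rank_score(neighbour_ranks(high), neighbour_ranks(low), k)


def continuity(high, low, k):
    """Drops when points are neighbours in the original but not in the projection."""
    return _rank_score(neighbour_ranks(low), neighbour_ranks(high), k)


def one_nn_error(points, labels):
    """Leave-one-out error of a 1-nearest-neighbour classifier (assumed setup)."""
    ranks = neighbour_ranks(points)
    wrong = sum(labels[row.index(1)] != labels[i] for i, row in enumerate(ranks))
    return wrong / len(points)


# Toy data (made up for illustration): two well-separated clusters in 3-D.
high = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10), (11, 10, 10), (10, 11, 10)]
labels = ["a", "a", "a", "b", "b", "b"]
# A projection that keeps every neighbourhood intact ...
low_good = [(x, y) for x, y, _ in high]
# ... and one that moves a point from each cluster into the other.
low_bad = [(0, 0), (10, 10), (0, 1), (1, 0), (11, 10), (10, 11)]

print(trustworthiness(high, low_good, k=2), continuity(high, low_good, k=2))
print(trustworthiness(high, low_bad, k=2), continuity(high, low_bad, k=2))
print(one_nn_error(low_good, labels), one_nn_error(low_bad, labels))
```

Because both scores work on neighbourhood ranks, uniformly rescaling the projection leaves them unchanged; only changes in the ordering of neighbours are penalised.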
![Main View](https://raw.githubusercontent.com/rmitsch/tapas/master/doc/main.png)
![Generation of New Runs](https://raw.githubusercontent.com/rmitsch/tapas/master/doc/run_generation.png)
<sup>1</sup> In terms of neighbourhood ranks, not absolute distances.
_____
[1] Y. Tsvetkov, M. Faruqui, W. Ling, G. Lample, and C. Dyer, “Evaluation of Word Vector Representations by Subspace Alignment,” 2015, pp. 2049–2054.
[2] L. J. P. van der Maaten, E. O. Postma, and H. J. van den Herik, Dimensionality Reduction: A Comparative Review. 2008.