https://github.com/krzjoa/salto
Playing with embedding vectors
nlp python visulaization word-embeddings
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/krzjoa/salto
- Owner: krzjoa
- License: mit
- Created: 2022-05-15T19:57:00.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-21T21:11:55.000Z (about 1 year ago)
- Last Synced: 2024-10-29T01:29:49.999Z (about 2 months ago)
- Topics: nlp, python, visulaization, word-embeddings
- Language: Python
- Homepage: https://salto.readthedocs.io
- Size: 3.44 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# salto
> Playing with embedding vectors
## Installation
You can install this library from *PyPI*:

```bash
pip install salto
```

or from the GitHub repo:

```bash
pip install git+https://github.com/krzjoa/salto.git
```

## Motivation
The goal of the **salto** package is to explore embeddings and check
how the distance between two points (vectors) can be interpreted.
We take two arbitrarily selected points, such as the embedding vectors for **ice** and **fire**, and
draw a straight line passing through both of them. Then we treat the
newly created line as a new axis by projecting the rest of the points onto it.
Drawn using: https://www.geogebra.org/m/JMMKv7cx
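
The projection described above can be sketched in plain NumPy. This is only an illustration of the idea, not necessarily how **salto** implements it: the helper name `project_onto_axis` is hypothetical, and the convention that the first point maps to 0 and the second to 1 is an assumption made for this sketch.

```python
import numpy as np


def project_onto_axis(point, start, end):
    """Return the coordinate of `point` along the line through `start` and `end`.

    Assumed convention for this sketch: `start` maps to 0.0 and `end` to 1.0.
    """
    point, start, end = (np.asarray(v, dtype=float) for v in (point, start, end))
    direction = end - start
    # Scalar projection of (point - start) onto the direction vector,
    # normalised by the squared length of the direction vector.
    return float(np.dot(point - start, direction) / np.dot(direction, direction))


# Toy 2-D example: the axis runs from (0, 0) to (2, 0)
start = np.array([0.0, 0.0])
end = np.array([2.0, 0.0])
print(project_onto_axis([1.0, 5.0], start, end))  # 0.5
```

With this convention, values below 0 lie beyond the first pole and values above 1 lie beyond the second; the component perpendicular to the axis is discarded.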
I named the package **salto**, which means *somersault* in many languages or simply *jump* in Romance languages like Italian, where this word originally comes from.
That's because, to me, the operation of changing the space resembles a kind of acrobatics 😉.

## Usage
```python
import numpy as np
import spacy
import salto

nlp = spacy.load('en_core_web_md')

# Embedding vectors for the two poles of the new axis
fire = nlp('fire')
ice = nlp('ice')
ice_fire_axis = salto.axis(ice.vector, fire.vector)

cold = ['ice cream', 'polar', 'snow', 'winter', 'fridge', 'Antarctica']
warm = ['boiling water', 'tropical', 'sun', 'summer', 'oven', 'Africa']

cold_vecs = [nlp(w).vector for w in cold]
warm_vecs = [nlp(w).vector for w in warm]

# Project each vector onto the ice-fire axis
cold_values = [ice_fire_axis(p) for p in cold_vecs]
warm_values = [ice_fire_axis(p) for p in warm_vecs]

ice_fire_axis.plot(
    {'values': cold_values, 'labels': cold, 'color': 'tab:blue'},
    {'values': warm_values, 'labels': warm, 'color': 'tab:red'},
    poles={'negative': {'label': 'ice', 'color': 'blue'},
           'positive': {'label': 'fire', 'color': 'red'}}
)
```

## See also
[scikit-spatial](https://github.com/ajhynes7/scikit-spatial) - Spatial objects and computations based on NumPy arrays.