https://github.com/deezer/vmf-glove
- Host: GitHub
- URL: https://github.com/deezer/vmf-glove
- Owner: deezer
- Created: 2025-04-01T12:08:03.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-04-01T12:10:09.000Z (10 months ago)
- Last Synced: 2025-07-11T07:23:11.550Z (6 months ago)
- Language: Python
- Size: 271 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
# von Mises-Fisher Sampling of GloVe Vectors
Repository for the paper "von Mises-Fisher Sampling of GloVe Vectors" by W. Bendada, G. Salha-Galvan, R. Hennequin, T. Bontempelli, T. Bouabça and T. Cazenave, presented at the FPI workshop of ICLR 2025.
## Introduction
We recently introduced, in [1], von Mises-Fisher exploration (vMF-exp), a scalable sampling method for exploring large action sets in reinforcement learning problems where hyperspherical embedding vectors represent these actions.
In this prior work, we demonstrated that vMF-exp scales to millions of actions
and exhibits several desirable properties. However, we did not test vMF-exp on publicly available
real-world data, which is essential for reproducibility and deeper understanding of the method.
We address this limitation by experimentally validating the main properties of vMF-exp on
a large-scale, publicly available real-world dataset of GloVe word embedding vectors. The purpose of this paper is to provide a
fresh perspective on our initial work, with reinforced validation.
## Download data
The GloVe word embedding dataset used in our experiments is publicly available for
download at: [https://nlp.stanford.edu/projects/glove/](https://nlp.stanford.edu/projects/glove/)
Experiments were run using the 25-dimensional embedding vectors (GloVe-25).
After downloading the correct file, unzip it and place it in a folder named `dataset`.
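As an illustration of the file format, here is a minimal sketch of how the downloaded vectors could be read. GloVe text files store one token per line followed by its space-separated coordinates. The function name `load_glove` and the unit-norm rescaling are assumptions made for this example (the rescaling reflects the hyperspherical setting of vMF-exp) and are not taken from the repository's code.

```python
import math


def load_glove(lines, dim=25, normalize=True):
    """Parse GloVe text lines ("token v1 ... v_dim") into (words, vectors).

    Hypothetical helper for illustration; not part of the repository.
    """
    words, vectors = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split from the right so that a token containing spaces stays intact.
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values, got {len(parts) - 1}")
        vec = [float(x) for x in parts[1:]]
        if normalize:
            # Assumption: vectors are rescaled to unit L2 norm (hypersphere).
            norm = math.sqrt(sum(x * x for x in vec))
            if norm == 0.0:
                continue  # a zero vector has no direction on the sphere
            vec = [x / norm for x in vec]
        words.append(parts[0])
        vectors.append(vec)
    return words, vectors


# Tiny in-memory example with 3 dimensions instead of 25.
sample = ["cat 3.0 0.0 4.0", "dog 0.0 2.0 0.0"]
words, vectors = load_glove(sample, dim=3)
print(words)
print(vectors)
```

With the real data, the same function could be called on an open file handle, for example `load_glove(open("dataset/glove.twitter.27B.25d.txt", encoding="utf-8"))`, where the file name is an assumption about which archive was downloaded.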
## Compute probabilities for a given set of parameters
The script `compute_probas.py` will run Monte Carlo simulations estimating the probability for von Mises-Fisher exploration and Boltzmann exploration to sample an action with known similarity given a state vector.
All vectors are sampled from the GloVe-25 dataset previously downloaded. The result can then be plotted using `plot_probas.py`.
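To make the two exploration schemes concrete, the following is a minimal, self-contained sketch of this kind of Monte Carlo estimate. It is not the code of `compute_probas.py`. It assumes the usual definitions: Boltzmann exploration draws action i with probability proportional to exp(κ⟨V, X_i⟩), while vMF-exp first draws a vector from a von Mises-Fisher distribution centred on the state V and then selects its nearest neighbour among the actions. Random toy vectors stand in for GloVe-25, the vMF draw uses Wood's (1994) rejection sampler, and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_vmf(mu, kappa, rng):
    """Draw one unit vector from vMF(mu, kappa) (Wood's rejection sampler)."""
    d = len(mu)
    b = (-2.0 * kappa + math.sqrt(4.0 * kappa ** 2 + (d - 1) ** 2)) / (d - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (d - 1) * math.log(1.0 - x0 ** 2)
    while True:
        z = rng.betavariate((d - 1) / 2.0, (d - 1) / 2.0)
        w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
        u = 1.0 - rng.random()  # lies in (0, 1], so log(u) is defined
        if kappa * w + (d - 1) * math.log(1.0 - x0 * w) - c >= math.log(u):
            break
    # Uniform direction in the subspace orthogonal to mu.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    proj = dot(g, mu)
    t = normalize([gi - proj * mi for gi, mi in zip(g, mu)])
    s = math.sqrt(max(0.0, 1.0 - w * w))
    return [w * mi + s * ti for mi, ti in zip(mu, t)]


def boltzmann_probs(state, actions, kappa):
    """Exact Boltzmann (softmax) probabilities over the action set."""
    logits = [kappa * dot(state, a) for a in actions]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [x / total for x in weights]


def estimate_vmf_exp_probs(state, actions, kappa, n_samples, rng):
    """Monte Carlo estimate: sample from vMF, count each nearest neighbour."""
    counts = Counter()
    for _ in range(n_samples):
        v = sample_vmf(state, kappa, rng)
        nearest = max(range(len(actions)), key=lambda i: dot(v, actions[i]))
        counts[nearest] += 1
    return [counts[i] / n_samples for i in range(len(actions))]


# Toy setup: 20 random unit vectors in 25 dimensions (not the GloVe data).
rng = random.Random(0)
d, n_actions, kappa = 25, 20, 1.0
actions = [normalize([rng.gauss(0, 1) for _ in range(d)]) for _ in range(n_actions)]
state = normalize([rng.gauss(0, 1) for _ in range(d)])

p_boltzmann = boltzmann_probs(state, actions, kappa)
p_vmf = estimate_vmf_exp_probs(state, actions, kappa, n_samples=2000, rng=rng)
best = max(range(n_actions), key=lambda i: dot(state, actions[i]))
print("most similar action:", best)
print("Boltzmann probability:", p_boltzmann[best])
print("vMF-exp estimate:", p_vmf[best])
```

The sketch is written with the standard library only so that it runs anywhere; an actual large-scale experiment would vectorise these loops and use far more samples.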
For instance, to reproduce **Figure 2.a**, one can run the following command:
```
python -m src.compute_probas -k 1 -a 0.0 -n glove.25 -bs 3000 -nt 10000
```
which will run the corresponding Monte Carlo simulations, followed by the command:
```
python -m src.plot_probas --path results/glove.25/k\=1.0_a\=0.00_samples\=30000000/
```
which will create a plot and save it in a sub-folder of `results/` named according to the chosen parameters.
## Compare Boltzmann and von Mises-Fisher Explorations for a range of values
The script `compare_boltzmann_vs_vmf.py` will reproduce **Figure 1.a** and **Figure 1.b** for a specified range of values of a. The corresponding probabilities must first be computed using `compute_probas.py`, once for each value of a (see above).
For instance, running `compute_probas.py` several times with values of a in [0.9,0.3,0.0,-0.3,-0.9] and then running:
```
python -m src.compare_boltzmann_vs_vmf --values 0.9,0.3,0.0,-0.3,-0.9
```
will create two plots, corresponding to **Figure 1.a** and **Figure 1.b**.
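
The repeated runs described in this section can be scripted. The sketch below only assembles the command lines shown in this README and varies `-a`; keeping the other flags at the values of the Figure 2.a example is an assumption, and `build_commands` and `run_all` are hypothetical helper names, not part of the repository.

```python
import subprocess


def build_commands(values, k=1, name="glove.25", bs=3000, nt=10000):
    """Return one compute_probas command per value of a, then the comparison."""
    cmds = [
        ["python", "-m", "src.compute_probas", "-k", str(k), "-a", str(a),
         "-n", name, "-bs", str(bs), "-nt", str(nt)]
        for a in values
    ]
    cmds.append(["python", "-m", "src.compare_boltzmann_vs_vmf",
                 "--values", ",".join(str(a) for a in values)])
    return cmds


def run_all(values):
    """Run every command in order, stopping at the first failure."""
    for cmd in build_commands(values):
        subprocess.run(cmd, check=True)


values = [0.9, 0.3, 0.0, -0.3, -0.9]
for cmd in build_commands(values):
    print(" ".join(cmd))
```

Calling `run_all(values)` from the repository root would then execute the simulations one after another before producing the comparison plots.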
## References
[1] Bendada et al., vMF-exp: von Mises-Fisher Exploration of Large Action Sets with Hyperspherical Embeddings, 2024