https://github.com/pliang279/sparse_discrete
[ICLR 2021] Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
- Host: GitHub
- URL: https://github.com/pliang279/sparse_discrete
- Owner: pliang279
- Created: 2020-05-02T00:31:04.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2021-05-04T07:20:41.000Z (about 5 years ago)
- Last Synced: 2025-08-11T02:36:26.506Z (11 months ago)
- Topics: deep-learning, efficient-neural-networks, machine-learning, natural-language-processing, recommender-system, sparse-representations
- Language: Python
- Homepage:
- Size: 5.83 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
# Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
> PyTorch implementation of Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
Correspondence to:
- Paul Liang (pliang@cs.cmu.edu)
- Manzil Zaheer (manzilzaheer@google.com)
## Paper
[**Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies**](https://arxiv.org/abs/2003.08197)
[Paul Pu Liang](http://www.cs.cmu.edu/~pliang/), [Manzil Zaheer](http://www.manzil.ml/), [Yuan Wang](https://ai.google/research/people/YuanWang), [Amr Ahmed](https://ai.google/research/people/AmrAhmed)
ICLR 2021
If you find this repository useful, please cite our paper:
```
@inproceedings{liang2021anchor,
author = {Paul Pu Liang and
Manzil Zaheer and
Yuan Wang and
Amr Ahmed},
title = {Anchor \& Transform: Learning Sparse Embeddings for Large Vocabularies},
booktitle = {9th International Conference on Learning Representations, {ICLR} 2021},
publisher = {OpenReview.net},
year = {2021},
url = {https://openreview.net/forum?id=Vd7lCMvtLqg}
}
```
## Installation
First, check that the following requirements are satisfied:
- Python 3.6
- torch 1.2.0
- numpy 1.18.1
- matplotlib 3.1.2
- tqdm 4.45.0

Next, clone the repository:
```bash
git clone https://github.com/pliang279/sparse_discrete.git
```
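A quick way to compare an environment against the versions listed above is a small script like the following. It is only a sketch: the helper name `check_requirements` is not part of this repository, and it assumes each package exposes a `__version__` attribute.

```python
import importlib

# Versions copied from the requirements list in this README.
REQUIRED = {
    "torch": "1.2.0",
    "numpy": "1.18.1",
    "matplotlib": "3.1.2",
    "tqdm": "4.45.0",
}


def check_requirements(required=REQUIRED):
    """Map each package name to (wanted_version, found_version).

    found_version is None when the package cannot be imported and
    "unknown" when it has no __version__ attribute.
    """
    report = {}
    for name, wanted in required.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = (wanted, None)
            continue
        report[name] = (wanted, getattr(module, "__version__", "unknown"))
    return report
```

Calling `check_requirements()` and printing the pairs whose two entries differ shows which packages deviate from the listed versions. Newer versions may still work, but that is not stated here.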
## Data
### Movielens data
- Download the MovieLens 25M data from http://files.grouplens.org/datasets/movielens/ml-25m.zip and unzip it into a folder ml-25m/.
- Download the MovieLens 1M data from http://files.grouplens.org/datasets/movielens/ml-1m.zip and unzip it into a folder ml-1m/.
- Run ```python3 movielens_data.py```, which extracts the .dat files in ml-1m/ and generates ml-1m/ml1m_ratings.csv.
- At this point, make sure you have the files ```ml-25m/ratings.csv``` and ```ml-1m/ml1m_ratings.csv```.
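The sketch below illustrates the kind of conversion the MovieLens 1M step involves, along with a check that both expected files exist. It is not the code in `movielens_data.py`: the helper names are hypothetical, and the CSV header is an assumption borrowed from `ml-25m/ratings.csv`. The `::` delimiter is the one used by the `ratings.dat` file in the ml-1m archive.

```python
import csv
import os

# Files that should exist once the steps above are complete.
EXPECTED = ["ml-25m/ratings.csv", "ml-1m/ml1m_ratings.csv"]


def missing_files(root=".", expected=EXPECTED):
    """Return the expected data files that are not present under root."""
    return [p for p in expected if not os.path.isfile(os.path.join(root, p))]


def dat_to_csv(dat_path, csv_path,
               header=("userId", "movieId", "rating", "timestamp")):
    """Convert a '::'-delimited ratings file into a CSV file.

    The header is an assumption; the repository's own script may name
    the columns differently. Returns the number of data rows written.
    """
    rows = 0
    with open(dat_path, encoding="latin-1") as src, \
            open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            writer.writerow(line.split("::"))
            rows += 1
    return rows
```

Running `missing_files()` from the repository root before training gives an early warning if either download or the conversion step was skipped.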
### Amazon review data
- Download the Amazon review data from http://deepyeti.ucsd.edu/jianmo/amazon/categoryFilesSmall/all_csv_files.csv into a folder called amazon_data/.
- Run ```python3 movielens_data.py```, which parses the .csv files in amazon_data/ and generates the file ```amazon_data/saved_amazon_data_filtered5.h5```.
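This README does not explain the `filtered5` suffix in the output file name. The sketch below assumes it refers to keeping only users and items that have at least five ratings, a common preprocessing step for recommendation data. Both that interpretation and the (item, user, rating) column order are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def filter_min_count(ratings, min_count=5):
    """Keep rows whose item and user both occur at least min_count times.

    ratings is an iterable of tuples whose first two fields are assumed
    to be (item, user). Filtering is repeated until no more rows are
    dropped, because removing a rare item can push a user below the
    threshold and vice versa.
    """
    rows = list(ratings)
    while True:
        item_counts = Counter(r[0] for r in rows)
        user_counts = Counter(r[1] for r in rows)
        kept = [
            r for r in rows
            if item_counts[r[0]] >= min_count and user_counts[r[1]] >= min_count
        ]
        if len(kept) == len(rows):
            return kept
        rows = kept
```

Whether the repository applies the threshold once or iterates to a fixed point as done here is not stated, so treat the loop as one possible reading.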
## Instructions
### Movielens data
- Matrix factorization (MF) baseline: ```python3 movielens.py --model_path MF --latent_dim 16 --dataset 25m```
- Mixed-dimension (MixDim) embeddings: ```python3 movielens.py --model_path mdMF --base_dim 16 --temperature 0.4 --k 8 --dataset 25m```
- Anchor & Transform (ANT): ```python3 movielens.py --model_path sparseMF --latent_dim 16 --user_anchors 10 --item_anchors 15 --lda2s 2e-6 --lda2e 2e-6 --dataset 25m```
- Nonparametric Bayesian ANT (NBANT): ```python3 movielens.py --model_path sparseMF --latent_dim 16 --lda1 0.1 --lda2s 2e-6 --lda2e 2e-6 --dataset 25m --dynamic```
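For readers new to the method, the idea behind the ANT runs, as described in the paper, is that each embedding is a sparse, non-negative combination of a small set of anchor vectors. The pure-Python sketch below illustrates that lookup and the resulting parameter count. It is not the code in `movielens.py`. The function names are made up for illustration, and the thresholding step is a standard proximal update for an l1 penalty under a non-negativity constraint, shown as one common way to obtain sparsity rather than as the repository's exact training rule.

```python
def soft_threshold_nonneg(value, step):
    """Shrink a weight toward zero by step and clip it at zero."""
    return max(value - step, 0.0)


def embed(sparse_row, anchors):
    """Compute one embedding as a weighted sum of anchor vectors.

    sparse_row maps anchor index -> non-negative weight (one row of the
    sparse transformation); anchors is a list of equal-length vectors.
    """
    dim = len(anchors[0])
    out = [0.0] * dim
    for idx, weight in sparse_row.items():
        for j in range(dim):
            out[j] += weight * anchors[idx][j]
    return out


def param_counts(vocab_size, dim, num_anchors, nonzeros):
    """Return (dense, anchored) parameter counts.

    dense is a full vocab_size x dim table; anchored stores the anchor
    vectors plus only the non-zero transformation weights.
    """
    dense = vocab_size * dim
    anchored = num_anchors * dim + nonzeros
    return dense, anchored
```

With hypothetical sizes, `param_counts(1000, 16, 10, 3000)` compares a dense table for 1,000 objects against 10 anchors plus 3,000 non-zero weights. The actual savings depend on how sparse the learned transformation turns out to be.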
### Amazon review data
- Matrix factorization (MF) baseline: ```python3 amazon.py --model_path MF --latent_dim 16 --dataset amazon```
- Anchor & Transform (ANT): ```python3 amazon.py --model_path sparseMF --latent_dim 16 --user_anchors 8 --item_anchors 8 --lda2s 1e-7 --lda2e 1e-7 --dataset amazon```
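When running several of these configurations back to back, it can help to assemble the command lines programmatically. The helper below is a hypothetical convenience, not part of the repository. It uses only the flags and values that appear in the commands above.

```python
def build_command(script, model_path, dataset, **flags):
    """Assemble the argument list for one run.

    Keyword arguments become --name value pairs in the order given;
    a value of True becomes a bare switch such as --dynamic.
    """
    cmd = ["python3", script, "--model_path", model_path]
    for name, value in flags.items():
        if value is True:
            cmd.append("--" + name)
        else:
            cmd.extend(["--" + name, str(value)])
    cmd.extend(["--dataset", dataset])
    return cmd


# The two Amazon runs listed above, expressed as argument lists.
RUNS = [
    build_command("amazon.py", "MF", "amazon", latent_dim=16),
    build_command("amazon.py", "sparseMF", "amazon", latent_dim=16,
                  user_anchors=8, item_anchors=8,
                  lda2s="1e-7", lda2e="1e-7"),
]
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the repository root. Passing the regularization values as strings keeps them formatted exactly as in the commands above.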