https://github.com/jaanli/rankfromsets
RankFromSets - SDSS submission code for reproducibility.
- Host: GitHub
- URL: https://github.com/jaanli/rankfromsets
- Owner: jaanli
- License: mit
- Created: 2018-12-19T19:44:33.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2020-01-28T23:50:56.000Z (over 5 years ago)
- Last Synced: 2024-12-27T20:29:25.270Z (6 months ago)
- Language: HTML
- Homepage:
- Size: 9.09 MB
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# RankFromSets
This code accompanies the RankFromSets SDSS submission.
To view the t-SNE visualization of arXiv user embeddings in a browser, please download [this HTML file](https://github.com/altosaar/rankfromsets/raw/master/(Download%20and%20open%20in%20a%20browser!)%20rankfromsets-65k-arXiv-user-embeddings-t-SNE.html).
## Environment
These experiments were conducted on a Red Hat Linux cluster with NVIDIA P100 GPUs.

Create the Python environment with the Anaconda package manager:
```
conda env create -f environment.yml
```

## Data format
The `{train, valid, test}.tsv` files contain observations of user-item interactions.

The `item_attributes_csr.npz` file is a [compressed sparse row](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) format matrix of shape `(n_items, n_attributes)`. For example, if the data consists of documents in a bag-of-words format, each row is a document and the attributes are the words.
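
To make the compressed sparse row layout concrete, here is a minimal sketch in plain Python. It is not code from this repository: the helper names `to_csr` and `row_attributes` and the three-document toy corpus are hypothetical and chosen only for illustration.

```python
def to_csr(rows, n_attributes):
    """Convert a list of {attribute_index: count} dicts into CSR arrays.

    Returns (data, indices, indptr), the three arrays that define a
    compressed sparse row matrix of shape (len(rows), n_attributes).
    """
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col in sorted(row):
            if not 0 <= col < n_attributes:
                raise ValueError(f"attribute index {col} out of range")
            data.append(row[col])      # nonzero value (e.g. word count)
            indices.append(col)        # column (attribute) of that value
        indptr.append(len(data))       # marks where the next row starts
    return data, indices, indptr


def row_attributes(csr, i):
    """Return the nonzero attributes of item i as {attribute_index: count}."""
    data, indices, indptr = csr
    start, end = indptr[i], indptr[i + 1]
    return dict(zip(indices[start:end], data[start:end]))


# Hypothetical toy corpus: 3 items (documents), 4 attributes (words).
docs = [{0: 2, 3: 1}, {}, {1: 1, 2: 4}]
csr = to_csr(docs, n_attributes=4)
print(csr)                     # data, indices, indptr
print(row_attributes(csr, 2))  # attributes of the third item
```

If SciPy is installed, these three arrays can be passed to `scipy.sparse.csr_matrix((data, indices, indptr), shape=(3, 4))` and written to disk with `scipy.sparse.save_npz`. Whether `item_attributes_csr.npz` was produced this way is an assumption.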
## Synthetic Example
We omit the raw arXiv data and food tracking data because they contain private user data.
This example follows the square kernel example in the reproducibility supplement.
Generate data to `/tmp/dat/simulation_%d`, where `%d` is the replication number from 1 to 30:
```
export DAT=/tmp
```

```
python build_simulation_dataset.py
```

Launch the best-performing parameter settings with the SLURM manager for the inner product, deep, and residual regression functions:
```
PYTHONPATH=. python experiment/arxiv/grid.py
```

## Hyperparameters
We find that large batch sizes significantly improve performance. See `config.yml` for the best-performing hyperparameters.