https://github.com/jaanli/rankfromsets

RankFromSets - SDSS submission code for reproducibility.
https://github.com/jaanli/rankfromsets

Last synced: 4 months ago
JSON representation

RankFromSets - SDSS submission code for reproducibility.

Host: GitHub
URL: https://github.com/jaanli/rankfromsets
Owner: jaanli
License: mit
Created: 2018-12-19T19:44:33.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2020-01-28T23:50:56.000Z (over 5 years ago)
Last Synced: 2024-12-27T20:29:25.270Z (6 months ago)
Language: HTML
Homepage:
Size: 9.09 MB
Stars: 4
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        ![alt text](https://github.com/kdd-anonymous/rankfromsets/raw/master/arxiv_user_embeddings_tsne.png)

# RankFromSets

This code accompanies the RankFromSets SDSS submission. 

To view the above visualization in a browser, please download [this HTML file](https://github.com/altosaar/rankfromsets/raw/master/(Download%20and%20open%20in%20a%20browser!)%20rankfromsets-65k-arXiv-user-embeddings-t-SNE.html).

## Environment

These experiments were conducted on a red hat linux cluster with Nvidia P100

GPUs.

Python environment, using the Anaconda python package manager:

```

conda env create -f environment.yml

```

## Data format

`{train, valid, test}.tsv` files are observations of user, item interactions.

The `item_attributes_csr.npz` is a [compressed sparse row](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) format matrix of shape `(n_items, n_attributes)`. For example, if the data is documents in a bag of words format, each row is a document and the attributes are the words.

## Synthetic Example

We omit raw arXiv data and food tracking data as it is private user data.

This follows the reproducibility supplement example of a square kernel.

Generate data to `/tmp/dat/simulation_%d` where %d is a number from 1 to 30 replications:

```

export DAT=/tmp

```

```

python build_simulation_dataset.py

```

Launch the best-performing parameter settings with the SLURM manager for the inner product, deep, and residual regression functions:

```

PYTHONPATH=. python experiment/arxiv/grid.py

```

## Hyperparameters

We find that large batch sizes significantly improve performance. See `config.yml` for the best-performing hyperparameters.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/jaanli/rankfromsets

Awesome Lists containing this project

README