https://github.com/lenskit/lk-demo-experiment
Example project for running LensKit experiments
- Host: GitHub
- URL: https://github.com/lenskit/lk-demo-experiment
- Owner: lenskit
- Created: 2019-05-08T15:43:17.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2023-11-10T19:09:46.000Z (about 1 year ago)
- Last Synced: 2024-08-04T01:13:08.175Z (6 months ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 811 KB
- Stars: 13
- Watchers: 5
- Forks: 5
- Open Issues: 0
# LensKit Demo Experiment

This repository contains a demo project for running LensKit experiments on
public data sets, following current best practices for moderately sized
experiments.

## Layout

This experiment uses [DVC](https://dvc.org) to script the experiment, and is
laid out in several subcomponents:

- `lkdemo` is a Python package containing support code (e.g. log configurations)
  and algorithm definitions. Two files are of particular interest:
    - `lkdemo/algorithms.py` defines the different algorithms we can train with
      sensible default configurations.
    - `lkdemo/datasets.py` defines the different data sets, so that any
      supported data set can be loaded into the format [LensKit expects][lkdata]
      in a uniform fashion.
- `data` contains data files and controls.
- `data-split` contains cross-validation splits, produced by `split-data.py`.
  These splits only contain the test files, to save disk space; the train files
  can be obtained with `lkdemo.datasets.ds_diff`, as seen in `run-algo.py`.
- `runs` contains the results of running LensKit train/test runs.
- Various Python scripts to run individual pieces of the analysis. They use
  `docopt` for parsing their arguments and thus have comprehensive usage docs
  in their docstrings.
- Jupyter notebooks to analyze results. These are parameterized and run with
  [Papermill][] to analyze different data sets with the same notebook.

[Papermill]: https://papermill.readthedocs.io/en/latest/
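
Because `data-split` stores only the test partitions, the training data for a partition has to be rebuilt by removing the test rows from the full rating data. The following is a minimal sketch of that idea in plain Python. It is not the actual `lkdemo.datasets.ds_diff` implementation, whose signature is not shown here; the `train_from_test` helper, the `(user, item)` key, and the toy ratings are all assumptions made for illustration.

```python
# Hypothetical illustration: rebuild a training set from the full ratings and
# a stored test partition. This is NOT the real lkdemo.datasets.ds_diff.

def train_from_test(ratings, test, key=("user", "item")):
    """Return the rows of `ratings` whose key does not appear in `test`."""
    # Collect the identifying key of every held-out (test) row.
    held_out = {tuple(row[k] for k in key) for row in test}
    # Keep only the rows that were not held out for testing.
    return [row for row in ratings if tuple(row[k] for k in key) not in held_out]


# Toy data, invented for this example.
ratings = [
    {"user": 1, "item": 10, "rating": 4.0},
    {"user": 1, "item": 11, "rating": 3.5},
    {"user": 2, "item": 10, "rating": 5.0},
    {"user": 2, "item": 12, "rating": 2.0},
]
test = [
    {"user": 1, "item": 11, "rating": 3.5},
    {"user": 2, "item": 12, "rating": 2.0},
]

train = train_from_test(ratings, test)
print(train)
```

Keeping only the test rows on disk trades a little compute (one set difference per run) for smaller split files.
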
[lkdata]: https://lkpy.lenskit.org/en/stable/datasets.html

## Setup

This experiment comes with an Anaconda environment file that defines the
necessary dependencies. To set up and activate:

    conda env create -f environment.yml
    conda activate lk-demo

After creating the environment once, later sessions only need
`conda activate lk-demo`; to pick up dependency changes, update the
environment with `conda env update -f environment.yml`.

## Running

The `dvc` program controls runs of individual steps, including downloading data.
For example, to download the ML-20M data set and recommend with ALS, run:

    dvc repro runs/dvc.yaml:ml20m@ALS

To re-run the whole experiment:

    dvc repro

To reproduce results on one data set:

    dvc repro eval-report-ml100k

## Extending

The various `dvc.yaml` files control the run. Look at them to modify and extend!
You will probably want to consult the [DVC user guide][dvc].

[dvc]: https://dvc.org/doc/user-guide/