{"id":22449656,"url":"https://github.com/naszilla/reczilla","last_synced_at":"2025-08-01T23:31:09.266Z","repository":{"id":39413184,"uuid":"437106890","full_name":"naszilla/reczilla","owner":"naszilla","description":"RecZilla: Metalearning for algorithm selection on Recommender Systems","archived":false,"fork":false,"pushed_at":"2023-10-21T21:32:46.000Z","size":106954,"stargazers_count":21,"open_issues_count":5,"forks_count":1,"subscribers_count":5,"default_branch":"main","last_synced_at":"2023-10-21T22:28:50.669Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/naszilla.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-10T20:34:17.000Z","updated_at":"2023-08-29T22:01:20.000Z","dependencies_parsed_at":"2023-02-06T06:47:18.102Z","dependency_job_id":null,"html_url":"https://github.com/naszilla/reczilla","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naszilla%2Freczilla","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naszilla%2Freczilla/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naszilla%2Freczilla/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/naszilla%2Freczilla/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/naszilla","download_url":"https://codeload.github.com/naszilla/reczilla/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github
","repositories_count":228414002,"owners_count":17915921,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-06T05:10:39.449Z","updated_at":"2024-12-06T05:10:40.124Z","avatar_url":"https://github.com/naszilla.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cbr/\u003e\n\u003cp align=\"center\"\u003e\u003cimg src=\"img/logo3.png\" width=700 /\u003e\u003c/p\u003e\n\n----\n![Crates.io](https://img.shields.io/crates/l/Ap?color=orange)\n\n`RecZilla` is a framework which provides the functionality to perform metalearning for algorithm selection on recommender systems datasets. It uses a meta-learner model to predict the best algorithm and hyperparameters for new, unseen datasets. \n\nSee our NeurIPS 2022 paper at [https://arxiv.org/abs/2206.11886](https://arxiv.org/abs/2206.11886).\n\n# Overview\nThe figure below shows the overview of the end-to-end `RecZilla` framework pipeline.\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"img/reczilla_overview.png\" width=700 /\u003e\u003c/p\u003e\n\nThis repository is based on the public repository [RecSys2019_DeepLearning_Evaluation](https://github.com/MaurizioFD/RecSys2019_DeepLearning_Evaluation). We use several core functions of this codebase---for training and evaluating algorithms, and for reading and splitting datasets. 
This repo extends the original in several ways:\n- [Data_manager/](RecSys2019_DeepLearning_Evaluation/Data_manager): added several datasets, and added a global timestamp splitting function\n- [Experiment_handler/](RecSys2019_DeepLearning_Evaluation/Experiment_handler): added classes and scripts for training and evaluating recsys algorithms on datasets\n- [Metafeatures/](RecSys2019_DeepLearning_Evaluation/Metafeatures): added classes and scripts for calculating metafeatures of recsys datasets\n- [ParameterTuning/](RecSys2019_DeepLearning_Evaluation/ParameterTuning): added a `ParameterSpace` class and `RandomSearch` class for random hyperparameter search\n- [ReczillaClassifier/](RecSys2019_DeepLearning_Evaluation/ReczillaClassifier): added classes and scripts for training and using a recsys meta-model\n- [algorithm_handler.py](RecSys2019_DeepLearning_Evaluation/algorithm_handler.py): added a function for accessing all implemented algorithms and their hyperparameter spaces\n- [dataset_handler.py](RecSys2019_DeepLearning_Evaluation/dataset_handler.py): added a function for accessing all implemented datasets\n- removed several large dataset files from the repo\n- made several small changes and bug fixes to support our experiments\n\n**NOTE: unless specified otherwise, all code should be run from the directory `reczilla/RecSys2019_DeepLearning_Evaluation/`**\n\n# Table of contents\n1. [Installation](#Installation)\n2. [Datasets](#Datasets)\n    1. [Loading An Implemented Dataset](#LoadingAnImplementedDataset)\n    2. [Loading All Implemented Datasets](#LoadingAllImplementedDatasets)\n    3. [Loading New Datasets](#LoadingNewDatasets)\n3. [Evaluating Recsys Algorithms](#EvaluatingRecsysAlgorithms)\n4. [Meta-Learning](#Meta-Learning)\n   1. [Main script overview](#MainScriptOverview)\n   2. [Training a new meta-model](#TrainingANewMetaModel)\n   3. 
[Using a Trained Meta-Model for Inference](#UsingATrainedMetaModelForInference)\n# Installation \u003ca name=\"Installation\"\u003e\u003c/a\u003e\n\nYou need Python 3.6 to use this repository.\n\nStart by creating a new environment using `conda` or your preferred method.\n\n```\n# using conda\nconda create -n DLevaluation python=3.6 anaconda\nconda activate DLevaluation\n```\n\nNext, install the dependencies listed in the `requirements.txt` file:\n```\npip install -r requirements.txt\n```\n\nNext, compile the Cython algorithms. This requires `gcc` and `python3-dev`, which you can install on Linux with:\n```\nsudo apt install gcc \nsudo apt-get install python3-dev\n```\n\nOnce these are installed, compile all the Cython algorithms by running the command below in the `RecSys2019_DeepLearning_Evaluation` directory:\n```\npython run_compile_all_cython.py\n```\nAnd you're all set!\n\n# Datasets \u003ca name=\"Datasets\"\u003e\u003c/a\u003e\n\nEach recsys dataset is managed using an instance of class `DataReader` in [`DataReader.py`](RecSys2019_DeepLearning_Evaluation/Data_manager/DataReader.py). All datasets in our paper are implemented as custom subclasses of `DataReader`---this object handles downloading, splitting, and i/o. In the current implementation **datasets must be read using a `DataReader` object.**\n\nBefore using any recsys dataset for training, testing, or meta-learning tasks, you need to **load the dataset by calling the `load_data()` function of its `DataReader` object.** This function writes a version of the dataset locally.\n\n## Loading An Implemented Dataset \u003ca name=\"LoadingAnImplementedDataset\"\u003e\u003c/a\u003e\n\nEach dataset used in our experiment has a custom `DataReader` class; a list of these classes can be found in `Data_manager.dataset_handler.DATASET_READER_LIST`. 
For example, the following Python code downloads the `Movielens100K` dataset to a local folder and loads it:\n\n```python\nfrom Data_manager.Movielens.Movielens100KReader import Movielens100KReader\n\n# Folder where dataset will be loaded from. The dataset will be downloaded if it's not found here.\ndata_folder = \"/home/datasets\"\n\n# load the dataset\ndata_reader = Movielens100KReader(folder=data_folder)\nloaded_dataset = data_reader.load_data()\n```\n\u003cdetails\u003e\n  \u003csummary\u003eexpected output\u003c/summary\u003e\n\n```commandline\nMovielens100K: reload_from_original_data is 'as-needed', will only reload original data if it cannot be found.\nMovielens100K: Preloaded data not found, reading from original files...\nMovielens100K: Loading original data\nMovielens100K: Unable to fild data zip file. Downloading...\nDownloading: http://files.grouplens.org/datasets/movielens/ml-100k.zip\nIn folder: /code/reczilla/RecSys2019_DeepLearning_Evaluation/Data_manager/../Data_manager_split_datasets/Movielens100K/ml-100k.zip\nDataReader: Downloaded 100.00%, 4.70 MB, 922 KB/s, 5 seconds passed\nMovielens100K: cleaning temporary files\nMovielens100K: loading complete\nMovielens100K: Verifying data consistency...\nMovielens100K: Verifying data consistency... Passed!\nMovielens100K: Found already existing folder '/home/datasets'\nMovielens100K: Saving complete!\n```\n\u003c/details\u003e\n\nNow, the dataset `Movielens100K` has been downloaded to folder `/home/datasets`. 
The following python code creates a global timestamp split for this dataset:\n\n```python\nfrom Data_manager.DataSplitter_global_timestamp import DataSplitter_global_timestamp\n\n# Folder where dataset splits will be written\nsplit_folder = \"/home/splits/MovieLens100K\"\n\n# split the dataset, and write it to file\ndata_splitter = DataSplitter_global_timestamp(data_reader)\ndata_splitter.load_data(save_folder_path=split_folder)\n```\n\u003cdetails\u003e\n  \u003csummary\u003eexpected output\u003c/summary\u003e\n\n```commandline\nDataSplitter_global_timestamp: Cold users not allowed\nDataSplitter_global_timestamp: Preloaded data not found, reading from original files...\nMovielens100K: Verifying data consistency...\nMovielens100K: Verifying data consistency... Passed!\nsplit_data_on_global_timestamp: 192 cold users of total 943 users skipped\nDataSplitter_global_timestamp: Split complete\nDataSplitter_global_timestamp: Verifying data consistency...\nDataSplitter_global_timestamp: Verifying data consistency... Passed!\nDataSplitter_global_timestamp: Preloaded data not found, reading from original files... 
Done\nDataSplitter_global_timestamp: DataReader: Movielens100K\n\tNum items: 1682\n\tNum users: 751\n\tTrain \t\tinteractions 79999, \tdensity 6.33E-02\n\tValidation \tinteractions 1535, \tdensity 1.22E-03\n\tTest \t\tinteractions 1418, \tdensity 1.12E-03\nDataSplitter_global_timestamp: \nDataSplitter_global_timestamp: Done.\n```\n\u003c/details\u003e\n\nNow, the global timestamp split of `Movielens100K` has been written to `/home/splits/MovieLens100K`.\n\n## Loading All Implemented Datasets \u003ca name=\"LoadingAllImplementedDatasets\"\u003e\u003c/a\u003e\nThe script `Data_manager.create_all_data_splits` runs the above procedure on all datasets used in our experiments:\n\n```commandline\nusage: create_all_data_splits.py [-h] --data-dir DATA_DIR --splits-dir\n                                 SPLITS_DIR\n\narguments:\n  --data-dir DATA_DIR   Directory where the downloaded dataseta have been\n                        stored. If a dataset is not downloaded, it will be\n                        downloaded.\n  --splits-dir SPLITS_DIR\n                        Directory where the splits will be saved.\n```\n\n## Loading New Datasets \u003ca name=\"LoadingNewDatasets\"\u003e\u003c/a\u003e\n\nTo load a recsys dataset that is not currently implemented, you need to create a subclass of `Data_manager.DataReader`, which specifies the loading procedure for the dataset. Once you create a `DataReader` for your dataset, you can use the same splitting and loading process from above.\n\nIf the dataset is in CSV format with columns `user_id, item_id, rating, timestamp`, then it is simple to create a class based on the example class `ExampleCSVDatasetReader`, which loads a dataset from a sample CSV included in this repository. 
\n\nThis class reads a CSV from a fixed path, and loads it using shared functions:\n```python\n#### from Data_manager/ExampleCSVDataset/ExampleCSVDatasetReader.py\n...\n\nURM_path = \"../examples/random_rating_list.csv\"\n\n(\n    URM_all,\n    URM_timestamp,\n    item_original_ID_to_index,\n    user_original_ID_to_index,\n) = load_CSV_into_SparseBuilder(\n    URM_path, separator=\",\", header=True, remove_duplicates=True, timestamp=True\n)\n\nloaded_URM_dict = {\"URM_all\": URM_all, \"URM_timestamp\": URM_timestamp}\n\nloaded_dataset = Dataset(\n    dataset_name=self._get_dataset_name(),\n    URM_dictionary=loaded_URM_dict,\n    ICM_dictionary=None,\n    ICM_feature_mapper_dictionary=None,\n    UCM_dictionary=None,\n    UCM_feature_mapper_dictionary=None,\n    user_original_ID_to_index=user_original_ID_to_index,\n    item_original_ID_to_index=item_original_ID_to_index,\n    is_implicit=self.IS_IMPLICIT,\n)\n\n...\n```\n---\n\n\n# Evaluating Recsys Algorithms \u003ca name=\"EvaluatingRecsysAlgorithms\"\u003e\u003c/a\u003e\n\nThe main results from our paper are based on a \"meta-dataset\", which consists of performance metrics for a large number of parameterized recsys algorithms on all recsys datasets implemented in this codebase.\n\nTo generate results for each algorithm-dataset pair, we use the script `Experiment_handler.run_experiment`, which takes several positional arguments: \n\n```\nusage: run_experiment.py [-h]\n                         time_limit dataset_name split_type alg_name split_dir\n                         alg_seed param_seed num_samples result_dir\n                         experiment_name original_split_path\n\npositional arguments:\n  time_limit           time limit in seconds\n  dataset_name         name of dataset. we use this to find the dataset and\n                       split.\n  split_type           name of datasplitter to use. 
we use this to find the\n                       split directory.\n  alg_name             name of the algorithm to use.\n  split_dir            directory containing split data files.\n  alg_seed             random seed passed to the recommender algorithm. only\n                       for random algorithms.\n  param_seed           random seed for generating random hyperparameters.\n  num_samples          number of hyperparameter samples.\n  result_dir           directory where result dir structure will be written.\n                       this directory should exist.\n  experiment_name      name of the result directory that will be created.\n  original_split_path  full path to the split data. only used for bookkeeping.\n```\n\nFor example, the following call trains and evaluates 5 hyperparameter samples for algorithm `P3alphaRecommender`, using the split created in the previous section. The results of this experiment will be written to `./example-results`.\n\n```commandline\n\n# first, create a directory to write results in\nmkdir ./example-results\n\npython -m Experiment_handler.run_experiment \\\n    7200 \\\n    Movielens100K \\\n    DataSplitter_global_timestamp \\\n    P3alphaRecommender \\\n    /home/splits/MovieLens100K \\\n    0 \\\n    0 \\\n    5 \\\n    ./example-results \\\n    example-experiment \\\n    original-split-path\n```\n\n\u003cdetails\u003e\n  \u003csummary\u003eoutput\u003c/summary\u003e\n\n```commandline\n[2022-06-21 12:06:57,142] [Experiment.py:__init__] : initializing Experiment: base_directory=/code/reczilla/RecSys2019_DeepLearning_Evaluation/example-results, result_directory=/code/reczilla/RecSys2019_DeepLearning_Evaluation/example-results/example-experiment, data_directory=None\n[2022-06-21 12:06:57,143] [Experiment.py:__init__] : found result directory: /code/reczilla/RecSys2019_DeepLearning_Evaluation/example-results/example-experiment\n[2022-06-21 12:06:57,143] [Experiment.py:prepare_dataset] : initialized dataset in 
Movielens100K\n[2022-06-21 12:06:57,254] [Experiment.py:prepare_split] : found a split in directory /home/splits/MovieLens100K_splits\n[2022-06-21 12:06:57,254] [Experiment.py:prepare_split] : initialized split Movielens100K/DataSplitter_global_timestamp\n[2022-06-21 12:06:57,254] [Experiment.py:run_experiment] : WARNING: URM_validation not found in URM_dict for split Movielens100K/DataSplitter_global_timestamp\nEvaluatorHoldout: Ignoring 81 (89.2%) Users that have less than 1 test interactions\nEvaluatorHoldout: Ignoring 69 (90.8%) Users that have less than 1 test interactions\n[2022-06-21 12:06:57,257] [Experiment.py:run_experiment] : starting experiment, writing results to example-results\n[2022-06-21 12:06:57,292] [RandomSearch.py:_log_info] : RandomSearch: Starting parameter set\n\nP3alphaRecommender: URM Detected 66 (3.92 %) cold items.\nEvaluatorHoldout: Processed 81 (100.0%) in 0.34 sec. Users per second: 240\nEvaluatorHoldout: Processed 69 (100.0%) in 0.32 sec. Users per second: 213\nDataIO: Json dumps supports only 'str' as dictionary keys. Transforming keys to string, note that this will alter the mapper content.\n[2022-06-21 12:06:58,182] [RandomSearch.py:_log_info] : RandomSearch: Starting parameter set 1 of 5\n\nP3alphaRecommender: URM Detected 66 (3.92 %) cold items.\nEvaluatorHoldout: Processed 81 (100.0%) in 0.33 sec. Users per second: 243\nEvaluatorHoldout: Processed 69 (100.0%) in 0.30 sec. Users per second: 227\n[2022-06-21 12:07:00,094] [RandomSearch.py:_log_info] : RandomSearch: Starting parameter set 2 of 5\n\nP3alphaRecommender: URM Detected 66 (3.92 %) cold items.\nEvaluatorHoldout: Processed 81 (100.0%) in 0.32 sec. Users per second: 250\nEvaluatorHoldout: Processed 69 (100.0%) in 0.31 sec. Users per second: 221\n[2022-06-21 12:07:01,058] [RandomSearch.py:_log_info] : RandomSearch: Starting parameter set 3 of 5\n\nP3alphaRecommender: URM Detected 66 (3.92 %) cold items.\nEvaluatorHoldout: Processed 81 (100.0%) in 0.38 sec. 
Users per second: 215\nEvaluatorHoldout: Processed 69 (100.0%) in 0.31 sec. Users per second: 220\n[2022-06-21 12:07:02,465] [RandomSearch.py:_log_info] : RandomSearch: Starting parameter set 4 of 5\n\nP3alphaRecommender: URM Detected 66 (3.92 %) cold items.\nEvaluatorHoldout: Processed 81 (100.0%) in 0.33 sec. Users per second: 248\nEvaluatorHoldout: Processed 69 (100.0%) in 0.27 sec. Users per second: 257\n[2022-06-21 12:07:04,678] [RandomSearch.py:_log_info] : RandomSearch: Search complete. Output written to: example-results/\n\n[2022-06-21 12:07:04,684] [Experiment.py:run_experiment] : results written to file: example-results/result_20220621_120657_metadata.zip\ninitial result file: example-results/result_20220621_120657_metadata.zip\nrenaming to: example-results/result.zip\n```\n\u003c/details\u003e\n\nThere are two files of interest created by this experiment script, both written to the results folder provided (`example-results`):\n- a log file with naming convention `result_yyyymmdd_hhmmss_RandomSearch.txt`\n- the hyperparameters and evaluation metrics, stored in a zip archive named `result.zip`\n\n\n---\n# Meta-Learning \u003ca name=\"Meta-Learning\"\u003e\u003c/a\u003e\n\n## Main script overview \u003ca name=\"MainScriptOverview\"\u003e\u003c/a\u003e\n\n\nThe main script for meta-learning is `run_reczilla.py`, which must be run from the folder `RecSys2019_DeepLearning_Evaluation`. 
This script has two functions: (1) to train a new meta-model, and (2) to use a pre-trained meta-model to train a new recommender on a dataset; both of these tasks can be performed in the same call.\n\nThe script takes in these arguments:\n\n```\n\u003e python -m ReczillaClassifier.run_reczilla -h\nusage: run_reczilla.py [-h] [--train_meta] --metamodel_filepath\n                       METAMODEL_FILEPATH\n                       [--dataset_split_path DATASET_SPLIT_PATH]\n                       [--rec_model_save_path REC_MODEL_SAVE_PATH]\n                       [--metadataset_name METADATASET_NAME]\n                       [--metamodel_name {xgboost,knn,linear,svm-poly}]\n                       [--target_metric TARGET_METRIC]\n                       [--num_algorithms NUM_ALGORITHMS]\n                       [--num_metafeatures NUM_METAFEATURES]\n\nRun Reczilla on a new dataset.\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --train_meta          Use to train a new metalearner Reczilla model (instead\n                        of loading).\n  --metamodel_filepath METAMODEL_FILEPATH\n                        Filepath of Reczilla model (to save or load).\n  --dataset_split_path DATASET_SPLIT_PATH\n                        Path of dataset split to perform inference on. 
Only\n                        required if performing inference\n  --rec_model_save_path REC_MODEL_SAVE_PATH\n                        Destination path for recommender model trained on\n                        dataset on dataset_split_path.\n  --metadataset_name METADATASET_NAME\n                        Name of metadataset (required if training metamodel).\n  --metamodel_name {xgboost,knn,linear,svm-poly}\n                        Name of metalearner to use (required if training\n                        metamodel).\n  --target_metric TARGET_METRIC\n                        Target metric to optimize.\n  --num_algorithms NUM_ALGORITHMS\n                        Number of algorithms to use in Reczilla (required if\n                        training metamodel).\n  --num_metafeatures NUM_METAFEATURES\n                        Number of metafeatures to select for metalearner.\n```\n\n## Training a new meta-model \u003ca name=\"TrainingANewMetaModel\"\u003e\u003c/a\u003e\nThe following files are required for training a new metamodel. Both of these files can be downloaded from a public Google Drive folder, [here](https://drive.google.com/drive/folders/1-WqiHpQOJa04zPSS45xGp7CKcFU14qVi?usp=sharing):\n\n1. `Metafeatures.csv`: The dataset metafeatures. **Note:** This file must be placed in the local directory `reczilla/RecSys2019_DeepLearning_Evaluation/Metafeatures/`\n2. `metadata-v2.pkl`: The performance metadataset, containing performance metrics of algorithms on each recsys dataset. **Note:** This file must be placed in the local directory `reczilla/RecSys2019_DeepLearning_Evaluation/metadatasets/`\n\nThe script `train_reczilla_models.sh` shows samples for training metalearners for different metrics. The script does the following:\n1. Creates a directory `ReczillaModels` to save new meta-models\n2. Trains a metamodel for precision @ 10 and saves it to `ReczillaModels/prec_10.pickle`\n3. Trains a metamodel for training time and saves it to `ReczillaModels/time_on_train.pickle`\n4. 
Trains a metamodel for MRR @ 10 and saves it to `ReczillaModels/mrr_10.pickle`\n5. Trains a metamodel for item hit coverage @ 10 and saves it to `ReczillaModels/item_hit_cov.pickle`\n\nFor this script, the expected output should be similar to the following:\n```commandline\npython -m ReczillaClassifier.run_reczilla     --train_meta     --metamodel_filepath=\"../ReczillaModels/prec_10.pickle\"     --target_metric=\"PRECISION_cut_10\"     --num_algorithms=10     --num_metafeatures=10\nselecting algs and features..\ndone selecting algs in :  0:00:21.533609\nComputing correlations...\ndone selecting features in :  0:00:02.300044\nMetamodel saved to ../ReczillaModels/prec_10.pickle\n\npython -m ReczillaClassifier.run_reczilla     --train_meta     --metamodel_filepath=\"../ReczillaModels/time_on_train.pickle\"     --target_metric=\"time_on_train\"     --num_algorithms=10     --num_metafeatures=10\nselecting algs and features..\ndone selecting algs in :  0:00:25.295785\nComputing correlations...\ndone selecting features in :  0:00:02.587050\nMetamodel saved to ../ReczillaModels/time_on_train.pickle\n\npython -m ReczillaClassifier.run_reczilla      --train_meta    --metamodel_filepath=\"../ReczillaModels/mrr_10.pickle\"  --target_metric=\"MRR_cut_10\"   --num_algorithms=10      --num_metafeatures=10\nselecting algs and features..\ndone selecting algs in :  0:00:15.817387\nComputing correlations...\ndone selecting features in :  0:00:01.631595\nMetamodel saved to ../ReczillaModels/mrr_10.pickle\n\npython -m ReczillaClassifier.run_reczilla     --train_meta     --metamodel_filepath=\"../ReczillaModels/item_hit_cov.pickle\"     --target_metric=\"COVERAGE_ITEM_HIT_cut_10\"     --num_algorithms=10     --num_metafeatures=10\nselecting algs and features..\ndone selecting algs in :  0:00:20.211772\nComputing correlations...\ndone selecting features in :  0:00:03.447911\nMetamodel saved to ../ReczillaModels/item_hit_cov.pickle\n```\n\n\n## Using a Trained Meta-Model for Inference 
\u003ca name=\"UsingATrainedMetaModelForInference\"\u003e\u003c/a\u003e\nA sample script to perform inference on a new dataset is provided in `run_reczilla_inference.sh`. It uses pre-trained Reczilla models (located in the folder `ReczillaModels`) to select and train a recommender on a dataset specified on a path. This script can be modified to run inference on new datasets.\n\nThe only files required for execution are a pre-trained metamodel and a dataset to perform inference on. In the case of `run_reczilla_inference.sh`, these correspond to:\n1. `ReczillaModels/prec_10.pickle` (metamodel)\n2. `ReczillaModels/time_on_train.pickle` (metamodel)\n3. `all_data/splits-v5/AmazonGiftCards/DataSplitter_leave_k_out_last` (folder with dataset split to perform inference on)\n\nThe script does the following:\n1. Uses the pre-trained precision @ 10 meta-model to select an algorithm to train on the dataset under `all_data/splits-v5/AmazonGiftCards/DataSplitter_leave_k_out_last`, and saves the recommender to a zip file with the prefix `prec_10_`.\n2. 
Uses the pre-trained time on train meta-model to select an algorithm to train on the dataset under `all_data/splits-v5/AmazonGiftCards/DataSplitter_leave_k_out_last`, and saves the recommender to a zip file with the prefix `train_time_`.\n\nFor example, the following command:\n- reads the `Movielens100K` dataset split created earlier in this README\n- reads the meta-model `ReczillaModels/prec_10.pickle` created in the example above\n- estimates the best parameterized recsys algorithm for the `Movielens100K` training split, using the `prec_10.pickle` metamodel\n- trains this parameterized recsys algorithm on the `Movielens100K` training split, and saves the trained model to file `../prec_10_{model name}.zip`.\n\n```commandline\npython -m ReczillaClassifier.run_reczilla \\\n        --dataset_split_path=\"/home/splits/MovieLens100K\" \\\n        --metamodel_filepath=\"../ReczillaModels/prec_10.pickle\" \\\n        --rec_model_save_path=\"../prec_10_\"\n\n```\n\n\u003cdetails\u003e\n  \u003csummary\u003eexpected output\u003c/summary\u003e\n\n```commandline\nLoading metamodel from ../ReczillaModels/prec_10.pickle\nDataSplitter_global_timestamp: Cold users not allowed\nDataSplitter_global_timestamp: Verifying data consistency...\nDataSplitter_global_timestamp: Verifying data consistency... Passed!\nDataSplitter_global_timestamp: DataReader: Movielens100K\n        Num items: 1682\n        Num users: 751\n        Train           interactions 79999,     density 6.33E-02\n        Validation      interactions 1535,      density 1.22E-03\n        Test            interactions 1418,      density 1.12E-03\n\nDataSplitter_global_timestamp: \n\nDataSplitter_global_timestamp: Done.\nEvaluatorHoldout: Processed 100 (100.0%) in 0.05 sec. Users per second: 1966\nSimilarity column 100 (100.0%), 68255.56 column/sec. Elapsed time 0.00 sec\nEvaluatorHoldout: Processed 100 (100.0%) in 0.05 sec. 
Users per second: 2009\nDataSplitter_global_timestamp: Cold users not allowed\nDataSplitter_global_timestamp: Verifying data consistency...\nDataSplitter_global_timestamp: Verifying data consistency... Passed!\nDataSplitter_global_timestamp: DataReader: Movielens100K\n        Num items: 1682\n        Num users: 751\n        Train           interactions 79999,     density 6.33E-02\n        Validation      interactions 1535,      density 1.22E-03\n        Test            interactions 1418,      density 1.12E-03\n\nDataSplitter_global_timestamp: \n\nDataSplitter_global_timestamp: Done.\nChose IALSRecommender:random_25 for PRECISION_cut_10 with predicted value 0.015277831815183163\nIALSRecommender: URM Detected 66 (3.92 %) cold items.\nIALSRecommender: Epoch 1 of 300. Elapsed time 0.28 sec\nIALSRecommender: Epoch 2 of 300. Elapsed time 0.51 sec\nIALSRecommender: Epoch 3 of 300. Elapsed time 0.75 sec\n...\nIALSRecommender: Epoch 299 of 300. Elapsed time 1.33 min\nIALSRecommender: Epoch 300 of 300. Elapsed time 1.34 min\nIALSRecommender: Terminating at epoch 300. Elapsed time 1.34 min\nEvaluatorHoldout: Ignoring 69 (90.8%) Users that have less than 1 test interactions\nEvaluatorHoldout: Processed 69 (100.0%) in 0.04 sec. Users per second: 1653\n\n**************************************************\nDone training recommender. Summary:\nMetric to optimize: PRECISION_cut_10\nChosen algorithm: IALSRecommender:random_25\nPredicted performance: 0.015277831815183163\nActual performance: 0.013043478260869566\n**************************************************\n\nIALSRecommender: Saving model in file '../prec_10_IALSRecommender'\nIALSRecommender: Saving complete\n```\n\u003c/details\u003e\n\n## Citation \nPlease cite our work if you use code from this repo:\n```bibtex\n@inproceedings{reczilla-2022,\n  title={On the Generalizability and Predictability of Recommender Systems}, \n  author={McElfresh, Duncan and Khandagale, Sujay and Valverde, Jonathan and Dickerson, John P. 
and White, Colin}, \n  booktitle={Advances in Neural Information Processing Systems},\n  year={2022}, \n} \n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnaszilla%2Freczilla","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnaszilla%2Freczilla","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnaszilla%2Freczilla/lists"}