{"id":20515409,"url":"https://github.com/snap-research/improving-inductive-oov-recsys","last_synced_at":"2026-04-20T01:31:42.185Z","repository":{"id":243308831,"uuid":"765918177","full_name":"snap-research/improving-inductive-oov-recsys","owner":"snap-research","description":"Improving Out-of-Vocabulary Handling in Recommendation Systems","archived":false,"fork":false,"pushed_at":"2024-06-18T18:21:50.000Z","size":12759,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-01-16T09:55:30.914Z","etag":null,"topics":["embedding","hashing","inductive-learning","ranking","recommendation-system","recommender-system","retrieval"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/snap-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-01T21:58:37.000Z","updated_at":"2024-10-24T23:35:27.000Z","dependencies_parsed_at":"2024-06-07T23:44:43.536Z","dependency_job_id":"a17e0a04-24fc-4935-9535-8aacf2ce1afb","html_url":"https://github.com/snap-research/improving-inductive-oov-recsys","commit_stats":null,"previous_names":["snap-research/improving-inductive-oov-recsys"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2Fimproving-inductive-oov-recsys","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2Fimproving-inductive-oov-recsys/tags","releases_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/repositories/snap-research%2Fimproving-inductive-oov-recsys/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/snap-research%2Fimproving-inductive-oov-recsys/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/snap-research","download_url":"https://codeload.github.com/snap-research/improving-inductive-oov-recsys/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242118503,"owners_count":20074588,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embedding","hashing","inductive-learning","ranking","recommendation-system","recommender-system","retrieval"],"created_at":"2024-11-15T21:21:33.753Z","updated_at":"2026-04-20T01:31:37.162Z","avatar_url":"https://github.com/snap-research.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Improving Out-of-Vocabulary Handling in Recommendation Systems\n\nThis repository contains the code for the paper [\"Improving Out-of-Vocabulary Handling in Recommendation Systems\"](https://arxiv.org/abs/2403.18280).\n\n## Setup\n\nFirst, install our local copy of RecBole. You will need to uninstall the original RecBole first (if already installed). (This framework is a major extension to RecBole https://github.com/RUCAIBox/RecBole. RecBole provides the basic framework to evaluate recommendation system models on standard datasets. 
This framework enables inductive training and evaluation, as well as datasets.)\n\n```bash\npip uninstall recbole\ncd RecBole\npip install -e .\n```\n\nThen, you can run the code:\n\n```bash\ncd ../src\npython run_recbole.py --dataset [DATASET]\n```\n\nThe `run_recbole.py` script accepts the following general parameters:\n\n- `--dataset`: Mandatory. The dataset name to train and evaluate on.\n- `--model`: Mandatory. The model type. The currently supported types are `BPR`, `DirectAU`, `DCNV2`, `WideDeep`, and `xDeepFM`.\n- `--checkpoint_dir`: Directory to store the model weights.\n- `--embedding_size`: The embedding size to use.\n- `--train_batch_size`: The training batch size.\n- `--eval_batch_size`: The evaluation batch size.\n- `--weight_decay`: Weight decay to use during training (if any).\n- `--gcs_bucket_name`: The name of the GCS bucket you would like to write model weights to, if applicable. Note that you need write permissions on the bucket and should already be authenticated.\n- `--learning_rate`: The model learning rate. Defaults to \n- `--log_wandb`: Whether or not to log results to Weights and Biases.\n- `--model_eval_type`: The model evaluation metrics type - either `retrieval` or `ranking`.\n\nThere are also inductive-specific parameters that can be passed in:\n- `--add_oov_buckets`: Whether or not to add OOV buckets to the model. This should be true if using any trainable OOV methods.\n- `--inductive_embedder`: The inductive embedder to use. The currently supported types are `lsh`, `slsh`, `knn`, `dnn`, `dhe`, `fdhe`, `zero`, and `mean`.\n- `--inductive_mapper`: The inductive mapper to use. 
The only currently supported type is `random`.\n- `--inductive_eval`: Whether or not to perform inductive evaluation (versus training only).\n- `--user_oov_buckets`: The number of user OOV buckets to use.\n- `--item_oov_buckets`: The number of item OOV buckets to use.\n- `--oov_feature_mask_rate`: The rate at which to mask OOV features during training.\n- `--oov_freeze_embedding`: Whether or not to freeze IV embeddings during OOV training.\n- `--oov_freeze_skip_optim`: Whether or not to also freeze the optimizer parameters.\n- `--train_oov`: Whether or not to train OOV embeddings at all. Should be true for all embedders except for `zero` and `mean`.\n- `--oov_only_epoch`: Whether or not to split OOV samples out into their own epoch.\n- `--oov_train_ratio`: Ratio of IV samples used for OOV training at every epoch.\n- `--oov_normalization_type`: Feature normalization type. Can be one of three options: per-feature, global, none. Not implemented for all OOV embedders.\n\nNote that any empty parameters will use the default parameters found in `RecBole/recbole/properties/overall.yaml` unless overridden by model-specific or dataset-specific configuration files. 
You can also pass in any model-specific hyperparameters here, which can be found in the `RecBole/recbole/properties/model/MODEL_NAME.yaml` files.\n\n## Citation\n\nIf you use this code in your research, please cite the following paper:\n\n```tex\n@article{shiao2024improving,\n  title={Improving Out-of-Vocabulary Handling in Recommendation Systems},\n  author={Shiao, William and Ju, Mingxuan and Guo, Zhichun and Chen, Xin and Papalexakis, Evangelos and Zhao, Tong and Shah, Neil and Liu, Yozen},\n  journal={arXiv preprint arXiv:2403.18280},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnap-research%2Fimproving-inductive-oov-recsys","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsnap-research%2Fimproving-inductive-oov-recsys","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnap-research%2Fimproving-inductive-oov-recsys/lists"}