{"id":19028806,"url":"https://github.com/nvidia-merlin/distributed-embeddings","last_synced_at":"2025-04-23T15:44:21.820Z","repository":{"id":41370526,"uuid":"468521878","full_name":"NVIDIA-Merlin/distributed-embeddings","owner":"NVIDIA-Merlin","description":"distributed-embeddings is a library for building large embedding based models in Tensorflow 2.","archived":false,"fork":false,"pushed_at":"2023-10-17T12:53:02.000Z","size":1874,"stargazers_count":44,"open_issues_count":8,"forks_count":12,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-04-18T00:57:59.320Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVIDIA-Merlin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-03-10T21:53:27.000Z","updated_at":"2025-04-12T14:40:35.000Z","dependencies_parsed_at":"2023-10-18T04:47:04.704Z","dependency_job_id":null,"html_url":"https://github.com/NVIDIA-Merlin/distributed-embeddings","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-Merlin%2Fdistributed-embeddings","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-Merlin%2Fdistributed-embeddings/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-Merlin%2Fdistributed-embeddings/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-Merlin
%2Fdistributed-embeddings/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVIDIA-Merlin","download_url":"https://codeload.github.com/NVIDIA-Merlin/distributed-embeddings/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250463121,"owners_count":21434725,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T21:12:21.066Z","updated_at":"2025-04-23T15:44:21.795Z","avatar_url":"https://github.com/NVIDIA-Merlin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [Distributed Embeddings](https://github.com/NVIDIA-Merlin/distributed-embeddings)\n\n[![Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/distributed-embeddings/Introduction.html)\n[![LICENSE](https://img.shields.io/github/license/NVIDIA-Merlin/distributed-embeddings)](https://github.com/NVIDIA-Merlin/distributed-embeddings/blob/main/LICENSE)\n\ndistributed-embeddings is a library for building large embedding-based (e.g., recommender) models in TensorFlow 2. 
It provides a scalable model-parallel wrapper that automatically distributes embedding tables across multiple GPUs, as well as efficient embedding operations that cover and extend TensorFlow's embedding functionality.\n\nRefer to the [NVIDIA Developer blog](https://developer.nvidia.com/blog/fast-terabyte-scale-recommender-training-made-easy-with-nvidia-merlin-distributed-embeddings/) post on terabyte-scale recommender training for more details.\n\n## Features\n\n### Distributed Model Parallel Wrappers\n`dist_model_parallel` contains tools that enable model-parallel training by changing only three lines of your script. It can also be combined with data parallelism to form hybrid-parallel training. Users can experiment with large-scale embeddings that exceed a single GPU's memory capacity without writing complex code to handle cross-worker communication.\n\nTo enable model parallelism, wrap a list of Keras Embedding layers with `dist_model_parallel.DistributedEmbedding`.\n\n### Embedding Layers\n\n`distributed_embeddings.Embedding` combines the functionality of `tf.keras.layers.Embedding` and `tf.nn.embedding_lookup_sparse` under a unified Keras layer API. The backend is designed for high GPU efficiency.\n\n### Input Key Mapping with IntegerLookup Layers\n\n`distributed_embeddings.IntegerLookup` extends `tf.keras.layers.IntegerLookup` with on-the-fly vocabulary building. This allows users to start training directly from raw input keys without offline preprocessing. 
A highly optimized GPU backend is provided, along with CPU support.\n\n**See more details in the [User Guide](https://nvidia-merlin.github.io/distributed-embeddings/userguide.html).**\n\n## Installation\n### Requirements\nPython 3, CUDA 11 or newer, TensorFlow 2\n### Containers\nYou can build inside the 22.03 or later NGC TF2 [image](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow):\n\nNote: Building v0.3+ requires Horovod v0.27 and TensorFlow 2.10, or alternatively the NGC 23.03 container.\n```bash\ndocker pull nvcr.io/nvidia/tensorflow:23.06-tf2-py3\n```\n### Build from source\n\nAfter cloning this repository, run:\n```bash\ngit submodule update --init --recursive\nmake pip_pkg \u0026\u0026 pip install artifacts/*.whl\n```\nTest the installation with:\n```bash\npython -c \"import distributed_embeddings\"\n```\nYou can also run the [Synthetic](https://github.com/NVIDIA-Merlin/distributed-embeddings/tree/main/examples/benchmarks/synthetic_models) and [DLRM](https://github.com/NVIDIA-Merlin/distributed-embeddings/blob/main/examples/dlrm/main.py) examples.\n\n## Feedback and Support\n\nIf you'd like to contribute to the library directly, see [CONTRIBUTING.md](https://github.com/NVIDIA-Merlin/distributed-embeddings/blob/main/CONTRIBUTING.md). We're particularly interested in contributions or feature requests for our feature engineering and preprocessing operations. 
To further advance our Merlin Roadmap, we encourage you to share the details of your recommender system pipeline in this [survey](https://developer.nvidia.com/merlin-devzone-survey).\n\nIf you're interested in learning more about how distributed-embeddings works, see the [documentation](https://nvidia-merlin.github.io/distributed-embeddings/Introduction.html).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia-merlin%2Fdistributed-embeddings","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnvidia-merlin%2Fdistributed-embeddings","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia-merlin%2Fdistributed-embeddings/lists"}