{"id":14081266,"url":"https://github.com/NVIDIA-Merlin/Merlin","last_synced_at":"2025-07-30T19:32:36.310Z","repository":{"id":37076884,"uuid":"353169072","full_name":"NVIDIA-Merlin/Merlin","owner":"NVIDIA-Merlin","description":"NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.","archived":false,"fork":false,"pushed_at":"2024-07-28T01:04:48.000Z","size":40539,"stargazers_count":780,"open_issues_count":213,"forks_count":118,"subscribers_count":34,"default_branch":"main","last_synced_at":"2024-11-29T23:15:41.757Z","etag":null,"topics":["deep-learning","end-to-end","gpu-acceleration","machine-learning","recommendation-system","recommender-system"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVIDIA-Merlin.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-30T23:35:26.000Z","updated_at":"2024-11-29T10:26:02.000Z","dependencies_parsed_at":"2023-10-01T02:23:21.899Z","dependency_job_id":"9e408e46-f96a-4191-946d-cf5ee1a42051","html_url":"https://github.com/NVIDIA-Merlin/Merlin","commit_stats":null,"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-Merlin%2FMerlin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-Merlin%2FMerlin/tags","releases_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-Merlin%2FMerlin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA-Merlin%2FMerlin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVIDIA-Merlin","download_url":"https://codeload.github.com/NVIDIA-Merlin/Merlin/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228178909,"owners_count":17881107,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","end-to-end","gpu-acceleration","machine-learning","recommendation-system","recommender-system"],"created_at":"2024-08-13T13:00:36.405Z","updated_at":"2024-12-04T19:32:05.605Z","avatar_url":"https://github.com/NVIDIA-Merlin.png","language":"Python","funding_links":[],"categories":["Industry Strength Recommender System"],"sub_categories":[],"readme":"# [NVIDIA Merlin](https://github.com/NVIDIA-Merlin)\n\n![GitHub tag (latest SemVer)](https://img.shields.io/github/v/tag/NVIDIA-Merlin/Merlin?sort=semver)\n![GitHub License](https://img.shields.io/github/license/NVIDIA-Merlin/Merlin)\n[![Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/Merlin/stable/README.html)\n\nNVIDIA Merlin is an open source library that accelerates recommender systems on\nNVIDIA GPUs. The library enables data scientists, machine learning engineers,\nand researchers to build high-performing recommenders at scale. 
Merlin includes\ntools to address common feature engineering, training, and inference challenges.\nEach stage of the Merlin pipeline is optimized to support hundreds of terabytes\nof data, which is all accessible through easy-to-use APIs. For more information,\nsee [NVIDIA Merlin](https://developer.nvidia.com/nvidia-merlin) on the NVIDIA\ndeveloper website.\n\n## Benefits\n\nNVIDIA Merlin is a scalable and GPU-accelerated solution, making it easy to\nbuild recommender systems from end to end. With NVIDIA Merlin, you can:\n\n- Transform data (ETL) for preprocessing and engineering features.\n- Accelerate your existing training pipelines in TensorFlow, PyTorch, or FastAI\n  by leveraging optimized, custom-built data loaders.\n- Scale large deep learning recommender models by distributing large embedding\n  tables that exceed available GPU and CPU memory.\n- Deploy data transformations and trained models to production with only a few\n  lines of code.\n\n## Components of NVIDIA Merlin\n\nNVIDIA Merlin consists of the following open source libraries:\n\n**[NVTabular](https://github.com/NVIDIA-Merlin/NVTabular)**\n[![PyPI version shields.io](https://img.shields.io/pypi/v/nvtabular.svg)](https://pypi.org/project/nvtabular/)\n[![ Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/NVTabular/stable/Introduction.html)\n\u003cbr\u003e NVTabular is a feature engineering and preprocessing library for tabular\ndata. The library can quickly and easily manipulate terabyte-size datasets that\nare used to train deep learning based recommender systems. The library offers a\nhigh-level API that can define complex data transformation workflows.\n
With\nNVTabular, you can:\n\n- Prepare datasets quickly and easily for experimentation so that you can train\n  more models.\n- Process datasets that exceed GPU and CPU memory without having to worry about\n  scale.\n- Focus on what to do with the data and not how to do it by using abstraction at\n  the operation level.\n\n**[HugeCTR](https://github.com/NVIDIA-Merlin/HugeCTR)**\n[![ Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/HugeCTR/stable/hugectr_user_guide.html)\u003cbr\u003e\nHugeCTR is a GPU-accelerated training framework that can scale large deep learning\nrecommendation models by distributing training across multiple GPUs and nodes.\nHugeCTR contains optimized data loaders with GPU acceleration and provides\nstrategies for scaling large embedding tables beyond available memory. With\nHugeCTR, you can:\n\n- Scale embedding tables over multiple GPUs or nodes.\n- Load a subset of an embedding table into a GPU in a coarse-grained, on-demand\n  manner during the training stage.\n\n**[Merlin Models](https://github.com/NVIDIA-Merlin/models)**\n[![PyPI version shields.io](https://img.shields.io/pypi/v/merlin-models.svg)](https://pypi.org/project/merlin-models/)\n[![ Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/models/stable/README.html)\u003cbr\u003e\nThe Merlin Models library provides standard models for recommender systems with\nan aim for high-quality implementations that range from classic machine learning\nmodels to highly advanced deep learning models. With Merlin Models, you can:\n\n- Accelerate your ranking model training by up to 10x by using performant data\n  loaders for TensorFlow, PyTorch, and HugeCTR.\n- Iterate rapidly on feature engineering and model exploration by mapping\n  datasets created with NVTabular into a model input layer automatically.\n
The\n  model input layer enables you to change either without impacting the other.\n- Assemble connectable building blocks for common RecSys architectures so that\n  you can create new models quickly and easily.\n\n**[Transformers4Rec](https://github.com/NVIDIA-Merlin/Transformers4Rec)**\n[![PyPI version shields.io](https://img.shields.io/pypi/v/Transformers4Rec.svg)](https://pypi.org/project/Transformers4Rec/)\n[![ Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/Transformers4Rec/stable/README.html)\u003cbr\u003e\nThe Transformers4Rec library provides sequential and session-based recommendation.\nThe library offers modular building blocks that are compatible with standard PyTorch modules.\nYou can use the building blocks to design custom architectures such as multiple towers, multiple heads and tasks, and losses.\nWith Transformers4Rec, you can:\n\n- Build sequential and session-based recommenders from any sequential tabular data.\n- Take advantage of the integration with NVTabular for seamless data preprocessing and feature engineering.\n- Perform next-item prediction as well as classic binary classification or regression tasks.\n\n**[Merlin Systems](https://github.com/NVIDIA-Merlin/systems)**\n[![PyPI version shields.io](https://img.shields.io/pypi/v/merlin-systems.svg)](https://pypi.org/project/merlin-systems/)\n[![ Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/systems/stable/README.html)\u003cbr\u003e\nMerlin Systems provides tools for combining recommendation models with other\nelements of production recommender systems like feature stores, nearest neighbor\nsearch, and exploration strategies into end-to-end recommendation pipelines that\ncan be served with Triton Inference Server.\n
With Merlin Systems, you can:\n\n- Start with an integrated platform for serving recommendations built on Triton\n  Inference Server.\n- Create graphs that define the end-to-end process of generating\n  recommendations.\n- Benefit from existing integrations with popular tools that are commonly found\n  in recommender system pipelines.\n\n**[Merlin Core](https://github.com/NVIDIA-Merlin/core)**\n[![PyPI version shields.io](https://img.shields.io/pypi/v/merlin-core.svg)](https://pypi.org/project/merlin-core/)\n[![ Documentation](https://img.shields.io/badge/documentation-blue.svg)](https://nvidia-merlin.github.io/core/stable/README.html)\u003cbr\u003e\nMerlin Core provides functionality that is used throughout the Merlin ecosystem.\nWith Merlin Core, you can:\n\n- Use a standard dataset abstraction for processing large datasets across\n  multiple GPUs and nodes.\n- Benefit from a common schema that identifies key dataset features and enables\n  Merlin to automate routine modeling and serving tasks.\n- Simplify your code by using a shared API for constructing graphs of data\n  transformation operators.\n\n## Installation\n\nThe simplest way to use Merlin is to run a Docker container. NVIDIA GPU Cloud (NGC) provides containers that include all the Merlin component libraries and dependencies and that undergo unit and integration testing. For more information, see the [Containers](https://nvidia-merlin.github.io/Merlin/stable/containers.html) page.\n\nTo develop and contribute to Merlin, review the installation documentation for each component library.\n
The development environment for each Merlin component is easily set up with `conda` or `pip`:\n\n| Component        | Installation Steps                                                                   |\n| ---------------- | ------------------------------------------------------------------------------------ |\n| HugeCTR          | https://nvidia-merlin.github.io/HugeCTR/master/hugectr_contributor_guide.html        |\n| Merlin Core      | https://github.com/NVIDIA-Merlin/core/blob/stable/README.md#installation             |\n| Merlin Models    | https://github.com/NVIDIA-Merlin/models/blob/stable/README.md#installation           |\n| Merlin Systems   | https://github.com/NVIDIA-Merlin/systems/blob/stable/README.md#installation          |\n| NVTabular        | https://github.com/NVIDIA-Merlin/NVTabular/blob/stable/README.md#installation        |\n| Transformers4Rec | https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/README.md#installation |\n\n## Example Notebooks and Tutorials\n\nA collection of [end-to-end examples](./examples/) is available in the form of Jupyter notebooks.\nThe example notebooks demonstrate how to:\n\n- Download and prepare a dataset.\n- Preprocess data and engineer features.\n- Train deep-learning recommendation models with TensorFlow, PyTorch, FastAI, HugeCTR, or Merlin Models.\n- Deploy the models to production with Triton Inference Server.\n\nThese examples are based on different datasets and provide a wide range of\nreal-world use cases.\n\n## Merlin Is Built On\n\n**[RAPIDS cuDF](https://github.com/rapidsai/cudf)**\u003cbr\u003e Merlin relies on cuDF for\nGPU-accelerated DataFrame operations used in feature engineering.\n\n**[Dask](https://www.dask.org/)**\u003cbr\u003e Merlin relies on Dask to distribute and scale\nfeature engineering and preprocessing within NVTabular and to accelerate\ndata loading in Merlin Models and HugeCTR.\n\n**[Triton Inference\n
Server](https://github.com/triton-inference-server/server)**\u003cbr\u003e\nMerlin leverages Triton Inference Server to provide GPU-accelerated serving for\nrecommender system pipelines.\n\n## Feedback and Support\n\nTo report bugs or get help, please\n[open an issue](https://github.com/NVIDIA-Merlin/Merlin/issues/new/choose).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVIDIA-Merlin%2FMerlin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNVIDIA-Merlin%2FMerlin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNVIDIA-Merlin%2FMerlin/lists"}