PROJECT NOT UNDER ACTIVE MANAGEMENT

This project will no longer be maintained by Intel.

Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.

Intel no longer accepts patches to this project.

If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.
Contact: webadmin@linux.intel.com

# Product Recommendation

## Introduction

In this reference kit, we demonstrate one way to use Artificial Intelligence (AI) to design a Product Recommendation System for an e-commerce business.

Check out more workflow examples in the [Developer Catalog](https://developer.intel.com/aireferenceimplementations).

## Solution Technical Overview

When a new customer without any previous purchase history visits the e-commerce website for the first time, and the business has no user-item purchase history, a product recommendation system can recommend products based on textual clustering analysis of the product descriptions. Once the customer makes a purchase, the product recommendation system updates and recommends other products based on the purchase history and ratings provided by other users on the website. Considering the journey of a new customer, from the first time they land on the e-commerce website to when they make repeat purchases, this reference kit can help e-commerce businesses bring targeted products to customers using textual clustering analysis of the product descriptions.

This reference kit also demonstrates the advantages of using the Intel® oneAPI AI Analytics Toolkit for the task of building a product recommendation system from product descriptions via cluster analysis. The savings gained from using Intel® technologies can help an analyst more efficiently explore and understand customer archetypes, leading to better and more precisely targeted solutions.

Learn to use Intel's XPU hardware and Intel-optimized software for a clustering algorithm with Scikit-learn, Intel® Extension for Scikit-learn and Intel® Distribution for Python*.

Intel® Extension for Scikit-learn uses the Intel® oneAPI Data Analytics Library (oneDAL) to achieve its acceleration.
This library enables all the latest vector instructions, such as Intel® Advanced Vector Extensions 512 (Intel® AVX-512). It also uses cache-friendly data blocking, fast BLAS operations with the Intel® oneAPI Math Kernel Library (oneMKL), and scalable multithreading with the Intel® oneAPI Threading Building Blocks (oneTBB).

The experiment aims to build a Product Recommendation System for customers, in the scenario of a business without any user-item purchase history, using an unsupervised learning algorithm. The goal is to train a clustering model via textual clustering analysis of the product descriptions. The algorithm used for clustering is k-means, which groups the products into clusters and provides product recommendations from the predicted cluster. We also focus on these critical factors:

- Faster model development
- Performance-efficient model inference and deployment

The recommendation system recommends products based on textual clustering analysis of the text given in the product description.
k-means clustering is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters. k-means aptly fits the Product Recommendation System in this specific case, where we have no prior user history and the only data available is the product description.
For the unsupervised clustering model, the text-based product description dataset is converted to a sparse matrix using a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer.
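To make the TF-IDF idea concrete, here is a minimal plain-Python sketch of the weighting (the toy descriptions are invented for illustration; the reference kit itself applies Scikit-learn's `TfidfVectorizer` to the real product descriptions):

```python
import math
from collections import Counter

# Hypothetical toy "product descriptions" standing in for the real dataset.
docs = [
    "brass faucet valve easy install",
    "cotton diwan cover sheet",
    "brass shower handle easy clean",
]

def tfidf(corpus):
    """Return one {term: weight} dict per document (smoothed idf)."""
    n = len(corpus)
    tokenized = [doc.split() for doc in corpus]
    # document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

vectors = tfidf(docs)
# Terms shared across documents ("brass", "easy") get a lower idf than
# terms unique to one document ("cotton"), so rarer terms weigh more.
```

This is only the concept; the real vectorizer additionally normalizes rows and stores the result as a sparse matrix.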
In this stage, the text features are converted to a numerical representation for further analysis and prediction.

The following Intel® packages are used for this project:

- ***Intel® Distribution for Python****

    The [Intel® Distribution for Python*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html#gs.52te4z) provides:
  - Scalable performance using all available CPU cores on laptops, desktops, and powerful servers
  - Support for the latest CPU instructions
  - Near-native performance through acceleration of core numerical and machine learning packages with libraries like the Intel® oneAPI Math Kernel Library (oneMKL) and Intel® oneAPI Data Analytics Library
  - Productivity tools for compiling Python code into optimized instructions
  - Essential Python bindings for easing integration of Intel® native tools with your Python* project

- ***Intel® Extension for Scikit-learn****

   With [Intel® Extension for Scikit-learn](https://www.intel.com/content/www/us/en/developer/tools/oneapi/scikit-learn.html) you can accelerate your Scikit-learn applications and still have full conformance with all Scikit-learn APIs and algorithms. This free software AI accelerator delivers 10-100X acceleration across a variety of applications, and you do not even need to change your existing code.

## Solution Technical Details

The reference kit implementation is a reference solution to the described use case that includes:

  1. A reference End-to-End (E2E) architecture to arrive at an AI solution with k-means from Scikit-learn
  2. An optimized reference E2E architecture enabled with Intel® Extension for Scikit-learn*, available as part of the Intel® oneAPI AI toolkit optimizations

## Validated Hardware Details

There are workflow-specific hardware and software setup requirements depending on how the workflow is run.
Bare metal development systems and Jupyter notebooks have the same system requirements.

| Recommended Hardware
| ----------------------------
| CPU: 2nd Gen Intel® Xeon® Platinum 8280 CPU @ 2.70GHz or higher
| RAM: 187 GB
| Recommended Free Disk Space: 20 GB or more

- Operating system: Ubuntu\* 22.04 LTS

## How it Works

The following diagram describes the E2E workflow:
![Use_case_flow](assets/e2e_flow_optimized.drawio.png)

1. A product description dataset is provided as input.
2. A clustering model is trained.
3. Hyperparameters are tuned.
4. Optimized inference is run to measure quality.
5. A product recommendation is delivered as output.

In a realistic pipeline, this training process would follow the above `Use Case E2E flow` diagram, adding a human in the loop to determine the quality of the clustering solution from each of the saved models/predictions in the `saved_models` directory, or better, while tuning the model. The quality of a clustering solution is highly dependent on the human analyst, who can not only tune hyperparameters but also modify the features being used to find better solutions.

As mentioned above, this Product Recommendation System uses k-means from the Scikit-learn library to train an AI model and generate cluster labels for the passed-in data. This process is captured within the `run_benchmarks.py` script.
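The cluster-label step and the "Top terms per cluster" lists that appear later in the output can be illustrated with a small plain-Python k-means sketch. The vocabulary and TF-IDF rows below are hypothetical; the kit itself runs Scikit-learn's `KMeans` on the real TF-IDF matrix:

```python
def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Minimal Lloyd's algorithm; naive init (first k points) for brevity."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        labels = [min(range(k), key=lambda c, p=p: squared_dist(p, centers[c]))
                  for p in points]
        # update step: each center becomes the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(col) for col in zip(*members)]
    return centers, labels

# Hypothetical vocabulary and TF-IDF rows for six products.
vocab = ["brass", "faucet", "valve", "cotton", "sheet", "cover"]
rows = [
    [0.9, 0.7, 0.5, 0.0, 0.0, 0.0],
    [0.8, 0.6, 0.6, 0.0, 0.1, 0.0],
    [0.7, 0.8, 0.4, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.9, 0.8, 0.6],
    [0.1, 0.0, 0.0, 0.8, 0.7, 0.7],
    [0.0, 0.1, 0.0, 0.7, 0.9, 0.5],
]

centers, labels = kmeans(rows, k=2)

def top_terms(center, n=3):
    """A center lives in TF-IDF space: its largest coordinates name the cluster."""
    order = sorted(range(len(center)), key=lambda i: center[i], reverse=True)
    return [vocab[i] for i in order[:n]]
```

Recommending products for a query then amounts to predicting its cluster and returning that cluster's top terms or members, which is the shape of the output shown later.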
This script *reads and preprocesses the data* and *performs training, predictions, and hyperparameter tuning analysis on k-means*, while also reporting the execution time for each of these steps (we will use this information later when optimizing the implementation for Intel® architecture). Furthermore, this script can also save each of the intermediate models/cluster labels for an in-depth analysis of the quality of fit.

Expected Input-Output:

| **Input** | **Output** |
| :---: | :---: |
| Product Name | List of product recommendations that fall under the predicted cluster |

| **Example Input** | **Example Output** |
| :---: | :---: |
| water | shower,water,faucet,valve,handle,easy,brass,drain,pressure,design |

Hyperparameter tuning is optional and can be enabled (detailed information is provided later).

## Get Started

The following variables can be adapted by the user and will be used during the E2E workflow.

[//]: # (capture: baremetal)

```bash
export WORKSPACE=$PWD/product-recommendations
```

Define `DATA_DIR` and `OUTPUT_DIR` as follows:

[//]: # (capture: baremetal)

```bash
export DATA_DIR=$WORKSPACE/data
export OUTPUT_DIR=$WORKSPACE/output
```

### Download the Workflow Repository

Clone the Product Recommendation repository:

[//]: # (capture: baremetal)

```bash
mkdir -p $WORKSPACE && cd $WORKSPACE
```

```bash
git clone https://github.com/oneapi-src/product-recommendations.git $WORKSPACE
```

### Set Up Conda

1. Download the appropriate Miniconda installer for Linux.

    ```bash
    wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    ```

2. In your terminal window, run:

    ```bash
    bash Miniconda3-latest-Linux-x86_64.sh
    ```
3. Delete the downloaded file.

    ```bash
    rm Miniconda3-latest-Linux-x86_64.sh
    ```

To learn more about conda installation, see the [Conda Linux installation instructions](https://docs.conda.io/projects/conda/en/stable/user-guide/install/linux.html).

### Set Up Environment

The `$WORKSPACE/env/intel_env.yml` file contains all dependencies needed to create the Intel environment for running the workflow.

Execute the following commands to create and activate the `product_recommendation_intel` conda environment.

```bash
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
conda env create -f env/intel_env.yml -y
conda activate product_recommendation_intel
```

Environment setup is required only once. This step does not clean up an existing environment with the same name, so make sure no conda environment with the same name already exists.
During this setup, the `product_recommendation_intel` conda environment will be created with the dependencies listed in the YAML configuration.

| **YAML file** | **Environment Name** | **Configuration** |
| :---: | :---: | :---: |
| `env/intel_env.yml` | `product_recommendation_intel` | Python=3.10.x with Intel® Extension for Scikit-learn* |

### Download the Datasets

A Kaggle* account is necessary to use the Kaggle* CLI. Instructions can be found at the [Kaggle* API website](https://github.com/Kaggle/kaggle-api).

Within this process, an `API Token File` will be created and, as a consequence, a JSON file named `kaggle.json` will be downloaded. That JSON file should be stored in a `.kaggle` folder created by the user (usually in the home folder).

If you are behind a proxy, the `kaggle.json` file can be modified to add it. An example is shown as follows:

```json
{"username":"your_user","key":"your_key","proxy":"your_proxy"}
```

...where `your_user` and `your_key` were previously generated by Kaggle*.
You should replace `your_proxy` with your proxy IP address.

To set up the data for benchmarking under these requirements, run the following set of commands:

> Please see this data set's applicable license for terms and conditions. Intel Corporation does not own the rights to this data set and does not confer any rights to it.

```bash
mkdir -p $DATA_DIR
cd $DATA_DIR
kaggle datasets download -d PromptCloudHQ/flipkart-products
unzip flipkart-products.zip -d flipkart-products-ecommerce
```

The train-test split is 70:30.

## Supported Runtime Environment

You can execute the reference pipelines using the following environments:

- [Bare Metal](#run-using-bare-metal)
- [Jupyter Notebook](#run-using-jupyter-notebook)

---

### Run Using Bare Metal

> Follow these instructions to set up and run this workflow on your own development system.

With the recommended hardware, it should take about 5 minutes from downloading the data to getting the final recommendations.

#### Set Up System Software

> Our examples use the ``conda`` package and environment on your local computer.
If you don't already have ``conda`` installed, go to [Set Up Conda](#set-up-conda) or see the [Conda Linux installation instructions](https://docs.conda.io/projects/conda/en/stable/user-guide/install/linux.html).

#### Run Workflow

Create a folder called `saved_models` inside `OUTPUT_DIR` to save the trained models before the training script is run:

[//]: # (capture: baremetal)

```bash
mkdir -p $OUTPUT_DIR/saved_models
```

The script `run_benchmarks.py` takes the following arguments:

```bash
usage: run_benchmarks.py [-h] [-d DATASETSIZE] [-l LOGFILE] [-t TUNING] [-mp MODELPATH]

optional arguments:
  -h, --help            show this help message and exit
  -d DATASETSIZE, --dataset DATASETSIZE
                        Size of the dataset
  -t TUNING, --tunning TUNING
                        Hyperparameter tuning (0/1)
  -l LOGFILE, --logfile LOGFILE
                        Log file to output benchmarking results to
  -mp MODELPATH, --modelpath MODELPATH
                        Model path for inference
```

As an example, we can run the following command to train and save `k-means` models.

[//]: # (capture: baremetal)

```sh
python $WORKSPACE/src/run_benchmarks.py -d 1000
```

We are training with a 1k data size here.
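The `Intel(R) Extension for Scikit-learn* enabled` line seen in the output comes from patching Scikit-learn before any estimator is used. Here is a hedged sketch of that pattern (the exact code inside `run_benchmarks.py` may differ, and the fallback branch is our addition for machines without the extension):

```python
# Sketch of the Intel® Extension for Scikit-learn patching pattern;
# the ImportError fallback is illustrative, not part of the kit.
try:
    from sklearnex import patch_sklearn
    patch_sklearn()          # re-routes supported estimators (incl. KMeans) to oneDAL
    USING_SKLEARNEX = True
except ImportError:
    USING_SKLEARNEX = False  # stock Scikit-learn is used unchanged

# Any Scikit-learn import that follows now picks up the accelerated backend
# when the extension is present, with no other code changes required.
```

Because the extension conforms to the Scikit-learn API, the rest of the training and inference code is identical in both cases.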
Similarly, one can try with 5k, 10k, 15k, and 20k.

The output should be similar to this:

```terminal
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
DEBUG:root:(100000, 2)
DEBUG:root:(100000, 10)
INFO:root:Data preparation time:9.813132762908936
Top terms per cluster:
Cluster 0:
 cabinet
 vanity
 finish
 storage
 design
 easy
 faucet
 hardware
 wood
 sink
INFO:root:Kmeans_training_time_without_Hyperparametertunning:0.16348862648010254
Saving model..........
```

Running Cluster Analysis/Predictions:
To run batch and real-time inference, run the following command:

[//]: # (capture: baremetal)

```bash
python $WORKSPACE/src/run_benchmarks.py -d 1000 -mp $OUTPUT_DIR/saved_models/prod_rec.joblib
```

Here we run inference with the trained model for a batch size of 1k. Similarly, one can try other sizes, such as 1.5k and 2k.

Inference output:

```terminal
Recommendations for :  cutting tool
Cluster 0:
 cm
 diwan
 cotton
 inch
 cover
 sheet
 details
 diamond
 features
 40
INFO:root:time taken for realtime recommendation:0.00015091896057128906
```

See more information at [Expected Output](#expected-output).

Hyperparameter tuning:
Loop-based hyperparameter tuning repeatedly fits the model with different parameter values and keeps the configuration with the best Silhouette score, yielding a better-performing model.

Parameters considered:

| **Parameter** | **Description** | **Values** |
| :-- | :-- | :-- |
| `n_clusters` | Number of clusters | 5, 10, 15, 20 |
| `max_iter` | Max iteration value | 400, 450, 500, 550 |

To run hyperparameter tuning with Intel® Distribution for Python* and Intel® technologies, run (after creating the appropriate environment as above):

[//]: # (capture: baremetal)

```bash
python $WORKSPACE/src/run_benchmarks.py -d 1000 -t 1
```

We are training with a 1k data size here.
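For intuition about the score driving the tuning loop, here is a plain-Python Silhouette computation on a toy 1-D dataset (the points and labelings are invented for illustration; the kit scores its Scikit-learn `KMeans` models on the real TF-IDF matrix):

```python
def silhouette(points, labels):
    """Mean silhouette coefficient for 1-D points (illustrative sketch)."""
    n = len(points)
    scores = []
    for i in range(n):
        same = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:            # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        # a: mean distance to own cluster; b: mean distance to nearest other cluster
        a = sum(abs(points[i] - points[j]) for j in same) / len(same)
        b = min(
            sum(abs(points[i] - points[j]) for j in other) / len(other)
            for other in (
                [j for j in range(n) if labels[j] == lab]
                for lab in set(labels) if lab != labels[i]
            )
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

# Two well-separated blobs: the 2-cluster labeling scores near 1, while a
# shuffled labeling scores poorly, so the tuning loop would keep the former.
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
good = silhouette(points, [0, 0, 0, 1, 1, 1])
bad = silhouette(points, [0, 1, 0, 1, 0, 1])
```

The tuning loop in the kit does exactly this comparison across the `n_clusters`/`max_iter` grid, saving the model whenever a new best score appears.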
Similarly, one can try with 5k, 10k, 15k, and 20k.

#### Clean Up Bare Metal

Follow these steps to restore your $WORKSPACE directory to its initial state. Please note that all downloaded dataset files, the conda environment, and logs created by the workflow will be deleted. Before executing the next steps, back up your important files.

```bash
rm -rf $OUTPUT_DIR
conda deactivate
conda remove --name product_recommendation_intel --all -y
```

If you want to remove the entire repository, execute the following command:

```bash
rm -rf $WORKSPACE
```

---

### Run Using Jupyter Notebook

You can directly access the Jupyter Notebook shared in this repo [here](./product_recommendation.ipynb).

First, follow the instructions described in [Get Started](#get-started) to set the required environment variables. Then, to launch Jupyter Notebook, execute the following steps:

1. Execute the [Set Up Conda](#set-up-conda) and [Set Up Environment](#set-up-environment) steps.

2. Activate the Intel environment.

    ```bash
    conda activate product_recommendation_intel
    ```

3. Install the IPython kernel package.

    ```bash
    conda install -c conda-forge ipykernel -y
    ```

4. Create a dedicated conda environment and install Jupyter Notebook.

    ```bash
    conda create -n jupyter_server -c intel nb_conda_kernels notebook -y
    ```

5. Activate the Jupyter server environment.

    ```bash
    conda activate jupyter_server
    ```

6. Change to the working directory.

    ```bash
    cd $WORKSPACE
    ```
7. Execute the Jupyter command.

    ```bash
    jupyter notebook
    ```

#### Connect to Jupyter Notebook Server

The above command prints some information about the notebook server in your terminal, including the URL of the web application (by default, http://localhost:8888), for example:

```terminal
To access the notebook, open this file in a browser: 
file:///path/to/jupyter/notebook/server/open.html
Or copy and paste one of these URLs: 
http://localhost:8888/?token=***************************************** 
or 
http://127.0.0.1:8888/?token=*****************************************
```

Copy and paste one of the URLs into a web browser to open the Jupyter Notebook Dashboard.

Once in Jupyter, click on **product_recommendation.ipynb** to get an interactive demo of the workflow.

#### Clean Up Jupyter Notebook

Clean up the Bare Metal and Jupyter environments by executing the following commands:

```bash
conda deactivate
conda remove --name jupyter_server --all -y
conda remove --name product_recommendation_intel --all -y
rm -rf $OUTPUT_DIR
```

If you want to remove the entire repository, execute the following command:

```bash
rm -rf $WORKSPACE
```

---

## Expected Output

A successful execution of `python $WORKSPACE/src/run_benchmarks.py -d 1000` should return results similar to those shown below:

```terminal
import the intel sklearnex
DEBUG:root:Loading intel libraries..
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
20000
1000
DEBUG:root:(1000, 15)
DEBUG:root:(419, 15)
DEBUG:root:(419, 10)
INFO:root:Data preparation time:0.3751637935638428
Top terms per cluster:
Cluster 0:
 jewellery
 nishtaa
 zirconia
 cubic
 ring
 silver
 kiara
 rhodium
 sterling
 clutch
Cluster 1:
 cm
 diwan
 sheet
 cover
 inch
 cotton
 40
 cushion
 embroidered
 length
Cluster 2:
 cm
 details
 cotton
 diwan
 inch
 women
 fabric
 cover
 printed
 material
Cluster 3:
 mug
 ceramic
 akup
 mugs
 coffee
 mm
 300
 ml
 quality
 safe
Cluster 4:
 shorts
 gym
 cycling
 solid
 details
 swim
 mynte
 women
 fabric
 dry
Cluster 5:
 kurta
 details
 straight
 women
 neck
 sleeve
 printed
 fabric
 round
 pattern
Cluster 6:
 ring
 diamond
 gold
 18
 free
 cash
 shipping
 com
 genuine
 flipkart
Cluster 7:
 kiara
 rhodium
 zirconia
 cubic
 silver
 sterling
 jewellery
 ring
 guarantee
 cash
Cluster 8:
 pieces
 wearyourshine
 expert
 expressive
 pc
 newest
 keepsakes
 curation
 jeweller
 today
Cluster 9:
 clutch
 synthetic
 dressberry
 gold
 nishtaa
 black
 code
 chain
 strap
 secured
Cluster 10:
 diamond
 ring
 like
 solitaire
 solitana
 connoisseur
 marvel
 flaunt
 piece
 designer
Cluster 11:
 usb
 warranty
 cable
 charger
 furst
 battery
 adapter
 covered
 white
 service
INFO:root:Kmeans_training_time_without_Hyperparametertunning:0.07413744926452637
Saving model..........
```

A successful execution of `python $WORKSPACE/src/run_benchmarks.py -d 1000 -mp $OUTPUT_DIR/saved_models/prod_rec.joblib` should return results similar to those shown below:

```terminal
import the intel sklearnex
DEBUG:root:Loading intel libraries..
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
20000
1000
DEBUG:root:(1000, 15)
DEBUG:root:(419, 15)
DEBUG:root:(419, 10)
INFO:root:Data preparation time:0.3825080394744873
warm up in progress........
Time Analysis for Batch Inference
dataset size (419, 10)
INFO:root:Time of Batch time recomendation:0.0003077983856201172
INFO:root:Time of Batch time recomendation:0.0001919269561767578
INFO:root:Time of Batch time recomendation:0.00016689300537109375
INFO:root:Time of Batch time recomendation:0.0001590251922607422
INFO:root:Time of Batch time recomendation:0.00015783309936523438
INFO:root:Time of Batch time recomendation:0.00018978118896484375
INFO:root:Time of Batch time recomendation:0.0001747608184814453
INFO:root:Time of Batch time recomendation:0.0001678466796875
INFO:root:Time of Batch time recomendation:0.0001628398895263672
INFO:root:Time of Batch time recomendation:0.00015783309936523438
INFO:root:Average Time of Batch time recomendation:0.00018365383148193358
INFO:root:time taken for realtime recommendation:0.00016880035400390625
Recommendations for :  cutting tool
Cluster 2:
 cm
 details
 cotton
 diwan
 inch
 women
 cover
 fabric
 printed
 sheet
INFO:root:time taken for realtime recommendation:0.0001862049102783203
Recommendations for :  spray paint
Cluster 2:
 cm
 details
 cotton
 diwan
 inch
 women
 cover
 fabric
 printed
 sheet
INFO:root:time taken for realtime recommendation:0.0001609325408935547
Recommendations for :  steel drill
Cluster 2:
 cm
 details
 cotton
 diwan
 inch
 women
 cover
 fabric
 printed
 sheet
INFO:root:time taken for realtime recommendation:0.00016260147094726562
Recommendations for :  water
Cluster 2:
 cm
 details
 cotton
 diwan
 inch
 women
 cover
 fabric
 printed
 sheet
INFO:root:time taken for realtime recommendation:0.0001647472381591797
Recommendations for :  powder
Cluster 2:
 cm
 details
 cotton
 diwan
 inch
 women
 cover
 fabric
 printed
 sheet
INFO:root:Average Time of Real time recomendation:0.0001686573028564453
```

A successful execution of `python $WORKSPACE/src/run_benchmarks.py -d 1000 -t 1` should return results similar to those shown below:

```terminal
import the intel sklearnex
DEBUG:root:Loading intel libraries..
Intel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)
20000
1000
DEBUG:root:(1000, 15)
DEBUG:root:(419, 15)
DEBUG:root:(419, 10)
INFO:root:Data preparation time:0.3815338611602783
No.cluster 5 
Max Iter 400
silhoutte score is : 0.3822014176971847
Saving model!!! Best score is ---> 0.3822014176971847
No.cluster 5 
Max Iter 450
silhoutte score is : 0.3822014176971847
No.cluster 5 
Max Iter 500
silhoutte score is : 0.3822014176971847
No.cluster 5 
Max Iter 550
silhoutte score is : 0.3822014176971847
No.cluster 10 
Max Iter 400
silhoutte score is : 0.5637014192791263
Saving model!!! Best score is ---> 0.5637014192791263
No.cluster 10 
Max Iter 450
silhoutte score is : 0.5637014192791263
No.cluster 10 
Max Iter 500
silhoutte score is : 0.5637014192791263
No.cluster 10 
Max Iter 550
silhoutte score is : 0.5637014192791263
No.cluster 15 
Max Iter 400
silhoutte score is : 0.5072029961921509
No.cluster 15 
Max Iter 450
silhoutte score is : 0.5072029961921509
No.cluster 15 
Max Iter 500
silhoutte score is : 0.5072029961921509
No.cluster 15 
Max Iter 550
silhoutte score is : 0.5072029961921509
No.cluster 20 
Max Iter 400
silhoutte score is : 0.5356860601413224
No.cluster 20 
Max Iter 450
silhoutte score is : 0.5356860601413224
No.cluster 20 
Max Iter 500
silhoutte score is : 0.5356860601413224
No.cluster 20 
Max Iter 550
silhoutte score is : 0.5356860601413224
INFO:root:Total fit and predict time taken during Hyperparameter Tuning in sec: 0.736302375793457
Hyperparameter Tuning has been executed successfully!!
Best parameters=====> n_clusters: 10    max_iter : 400
INFO:root:Kmeans_training_time_with the best params:0.0380251407623291
```

## Summary and Next Steps

Congratulations! You have successfully completed this workflow.

As clustering analysis is an exploratory task, an analyst will often run it on datasets of different sizes, deriving different insights for decision-making, all from the same raw dataset.

To build a Product Recommendation System, data scientists need to train models on substantial datasets and run inference frequently.
The ability to accelerate training allows them to train more frequently and achieve better accuracy. Besides training, faster inference allows them to provide product recommendations in real-time scenarios, and more frequently. This reference kit implementation provides a performance-optimized guide for Product Recommendation System use cases that can be easily scaled across similar use cases.

## Learn More

For more information about this workflow or to read about other relevant workflow examples, see these guides and software resources:

- [Intel® AI Analytics Toolkit (AI Kit)](https://www.intel.com/content/www/us/en/developer/tools/oneapi/ai-analytics-toolkit.html)
- [Intel® Distribution for Python*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-for-python.html)
- [Intel® Extension for Scikit-learn*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/scikit-learn.html)

## Support

If you have questions or issues about this use case, want help with troubleshooting, or want to report a bug or submit an enhancement request, please submit a GitHub issue.

## Appendix

Please see this data set's applicable license for terms and conditions. Intel® Corporation does not own the rights to this data set and does not confer any rights to it.

\*Other names and brands that may be claimed as the property of others. [Trademarks](https://www.intel.com/content/www/us/en/legal/trademarks.html).

To the extent that any public or non-Intel datasets or models are referenced by or accessed using tools or code on this site, those datasets or models are provided by the third party indicated as the content source. Intel does not create the content and does not warrant its accuracy or quality.
By accessing the public content, or using materials trained on or with such content, you agree to the terms associated with that content and that your use complies with the applicable license.

Intel expressly disclaims the accuracy, adequacy, or completeness of any such public content, and is not liable for any errors, omissions, or defects in the content, or for any reliance on the content. Intel is not liable for any liability or damages relating to your use of public content.

Performance varies by use, configuration, and other factors. Learn more on the [Performance Index site](https://edc.intel.com/content/www/us/en/products/performance/benchmarks/overview/).

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.