{"id":13574331,"url":"https://github.com/oneapi-src/digital-twin","last_synced_at":"2025-04-04T14:32:24.978Z","repository":{"id":66145922,"uuid":"536270336","full_name":"oneapi-src/digital-twin","owner":"oneapi-src","description":"AI Starter Kit to build a MOSFET Digital Twin for Design Exploration using Intel® optimized version of XGBoost","archived":true,"fork":false,"pushed_at":"2024-02-01T23:53:39.000Z","size":286,"stargazers_count":0,"open_issues_count":2,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-11-05T09:44:15.240Z","etag":null,"topics":["machine-learning","xgboost"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oneapi-src.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-13T19:00:13.000Z","updated_at":"2024-04-08T18:22:08.000Z","dependencies_parsed_at":"2024-02-13T00:49:54.815Z","dependency_job_id":null,"html_url":"https://github.com/oneapi-src/digital-twin","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneapi-src%2Fdigital-twin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneapi-src%2Fdigital-twin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneapi-src%2Fdigital-twin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oneapi-src%2Fdigital-twin/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/oneapi-src","download_url":"https://codeload.github.com/oneapi-src/digital-twin/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247194342,"owners_count":20899466,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","xgboost"],"created_at":"2024-08-01T15:00:50.538Z","updated_at":"2025-04-04T14:32:24.564Z","avatar_url":"https://github.com/oneapi-src.png","language":"Python","readme":"PROJECT NOT UNDER ACTIVE MANAGEMENT\n\nThis project will no longer be maintained by Intel.\n\nIntel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project.  \n\nIntel no longer accepts patches to this project.\n\nIf you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the open source software community, please create your own fork of this project.  
\n\nContact: webadmin@linux.intel.com\n# **Digital Twin**\r\n\r\n## Introduction\r\n\r\nThis use case uses the Intel® optimized version of XGBoost* to achieve fast training and inference times, converts a gradient boosting model to the daal4py version included in Intel® Extension for Scikit-Learn*, and enables inference performance acceleration.\r\nWith this use case you will learn to use Intel® tools to build a Digital Twin model that reflects the response (leakage current) of Metal-Oxide-Semiconductor Field-Effect Transistors (MOSFETs) based on the received gate voltage, for design exploration purposes, helping save costs compared with normal physical experimentation. Visit the [Developer Catalog](https://developer.intel.com/aireferenceimplementations) for more workflow examples. \r\n\r\n\r\n## **Contents**\r\n\r\n- [**Building a MOSFET Digital Twin for Design Exploration: Modeling Sub-threshold Voltage Leakage Current using XGBoostRegressor**](#building-a-mosfet-digital-twin-for-design-exploration-modeling-sub-threshold-voltage-leakage-current-using-xgboostregressor)\r\n  - [Introduction](#introduction)\r\n  - [**Contents**](#contents)\r\n  - [Solution Technical Overview](#solution-technical-overview)\r\n  - [Solution Technical Details](#solution-technical-details)\r\n    - [Task 1: Generate Synthetic Data](#task-1-generate-synthetic-data)\r\n    - [Task 2: Training](#task-2-training)\r\n    - [Task 3: Tuning](#task-3-tuning)\r\n    - [Task 4: Semi-supervised Learning](#task-4-semi-supervised-learning)\r\n    - [Task 5: Prediction](#task-5-prediction)\r\n  - [Validated Hardware Details](#validated-hardware-details)\r\n  - [How it works](#how-it-works)\r\n  - [Get Started](#get-started)\r\n    - [Environment variables](#environment-variables)\r\n    - [Download the Workflow Repository](#download-the-workflow-repository)\r\n    - [Set Up Conda](#set-up-conda)\r\n    - [Set Up Environment](#set-up-environment)\r\n  - [Supported Runtime 
Environment](#supported-runtime-environment)\r\n  - [Run using Bare Metal](#run-using-bare-metal)\r\n    - [Clean Up Bare Metal](#clean-up-bare-metal)\r\n  - [Expected output](#expected-output)\r\n  - [Summary and next steps](#summary-and-next-steps)\r\n    - [How to customize this use case](#how-to-customize-this-use-case)\r\n    - [Adapt to your dataset](#adapt-to-your-dataset)\r\n  - [Learn More](#learn-more)\r\n  - [Support](#support)\r\n  - [Appendix](#appendix)\r\n    - [About This Use Case](#about-this-use-case)\r\n    - [References](#references)\r\n\r\n\r\n## Solution Technical Overview\r\n\r\nA Digital Twin ([1],[2]) is a virtual model designed to accurately reflect a physical object's behavior throughout its lifecycle; it can be updated with real-time data, machine learning, and simulation. To create a Digital Twin, the object in question is outfitted with various sensors located in vital areas of functionality; these areas are defined according to the impact their information has on the desired output of the studied object. \r\nExamples of data produced by the sensors are temperature, humidity, pressure, distance, voltage, current, resistance, etc. \r\nOnce the data is studied and analyzed, it can feed a virtual model to run simulations, study critical behaviors, experiment with optimizations, and provide valuable insights to be applied to the original physical object, based on its response to the input variables or conditions.\r\nThis definition of a Digital Twin has impact in many areas of study across different types of industries, due to the low cost compared with having a real physical twin object to perform tests, which may cause the object to stop working or even cause catastrophic reactions. Digital Twins can also predict the lifespan of the object under certain conditions with predictive analytics, support its maintenance methods, and manage complex connections within systems of systems. 
\r\n\r\n\r\nFor this reference kit we have chosen to model the behavior of Metal-Oxide-Semiconductor Field-Effect Transistors (MOSFETs), which are commonly used in consumer electronics and power devices. For MOSFETs, the \"leakage current\" is a key indicator of performance. Hence, understanding how the leakage current varies as a function of the input conditions is critical. \r\n\r\nThe device includes three components (source, drain and gate). The source-drain current is a function of the operating gate voltage, $v_{gs}$, and the threshold voltage, $v_{th}$. The ideal switching characteristic of a MOSFET is such that if the gate-source voltage exceeds the specified threshold voltage, the MOSFET is in the ON state. Otherwise, the device is in the OFF state, and the source-drain current should be zero. However, in real applications there is always a leakage current because of several factors. The leakage current can be estimated through analytical equations, which do not take statistical noise into account, or through testing, which is often expensive. \r\n\r\nA Machine Learning (ML) solution, or an ML-powered MOSFET Digital Twin, can be a valuable substitute that predicts leakage current from input values which include $v_{th}$ and $v_{gs}$. Initial $v_{gs}$, $v_{th}$ and **leakage_current** data can be collected on millions of MOSFETs. An ML model can be built using this data and can be continuously updated as more data is populated. Essentially, this \"model\" can serve as a digital twin and substitute expensive testing/experimentation. Calculating the sub-threshold leakage of multiple MOSFETs for several voltage levels can help optimize manufacturing as well as monitor performance in the field.\r\n\r\nIn addition, this use case uses Intel® tools to speed up the whole pipeline; the tools are briefly described below. If you want to go directly to the links for each one of the Intel® tools described, go to the [Learn More](#learn-more) section. 
\r\n  \r\nScikit-learn* (often referred to as sklearn) is a Python* module for machine learning. Intel® Extension for Scikit-learn* seamlessly speeds up your scikit-learn applications for Intel® CPUs and GPUs across single- and multi-node configurations. This extension package dynamically patches scikit-learn estimators while improving performance for your machine learning algorithms.\r\n\r\nThe extension is part of the Intel® AI Analytics Toolkit (AI Kit), which provides flexibility to use machine learning tools with your existing AI packages.\r\n\r\nXGBoost* is an open source gradient boosting machine learning library. It performs well across a variety of data and problem types, so it often pushes the limits of compute resources.\r\nUsing XGBoost* on Intel® CPUs takes advantage of software accelerations powered by oneAPI, without requiring any code changes. Software optimizations deliver the maximum performance for your existing hardware. This enables faster iterations during development and training, and lower latency during inference. **Please keep in mind** that to train an XGBoost* model using Intel® optimizations, the 'tree_method' parameter should be set to 'hist'. \r\n\r\nModin* is a drop-in replacement for pandas, enabling data scientists to scale to distributed DataFrame processing without having to change API code. Intel® Distribution of Modin* adds optimizations to further accelerate processing on Intel® hardware.\r\n\r\ndaal4py, included in Intel® oneAPI Data Analytics Library (oneDAL) as part of Intel® Extension for Scikit-learn*, is an easy-to-use Python* API that provides superior performance for your machine learning algorithms and frameworks. Designed for data scientists, it provides a simple way to utilize powerful Intel® DAAL machine learning algorithms in a flexible and customizable manner. 
For scaling capabilities, daal4py also provides the option to process and analyze data via batch, streaming, or distributed processing modes, allowing you to choose the option that best fits your system's needs.\r\n\r\nFor more details, visit the [Building a MOSFET Digital Twin for Design Exploration: Modeling Sub-threshold Voltage Leakage Current using XGBoostRegressor](https://github.com/oneapi-src/digital-twin) GitHub repository.\r\n\r\n## Solution Technical Details\r\n\r\nA schematic of the proposed reference architecture is shown in the following figure. The portion of the diagram enclosed in the red dashed line is the section of the workload for generating the synthetic data. The dashed green line section corresponds to the XGBoost* optimization process.\r\n\r\ndaal4py is best known as a way to accelerate machine learning algorithms from Scikit-Learn*; however, this guide provides you with the information to use the daal4py algorithms directly.\r\n\r\n![e2e-flow](assets/e2e-flow-diagram.png)\r\n\r\n### Task 1: Generate Synthetic Data\r\nThe main data generator script is located in `src/utils/synthetic_datagen.py`.\r\nThe following figure describes how the leakage current is calculated from voltage values and other parameters.\r\n\r\n![data_gen](assets/data-gen-flow-diagram.png)\r\n\r\n### Task 2: Training\r\n\r\nThe proposed reference solution is built primarily using an XGBoost* Regressor. 
However, separate experiments were conducted using a Linear Regressor to serve as a reference point for Mean Squared Error (MSE) values and confirm that XGBoost* outperforms a simple Linear Regressor.\r\n\r\n### Task 3: Tuning\r\nHyperparameter tuning happens during the training phase by getting the best parameters and best estimator from the [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) function from [Intel® Extension for Scikit-learn*](https://www.intel.com/content/www/us/en/developer/tools/oneapi/scikit-learn.html); the best estimator is the one that returned the highest score (or smallest loss, if specified) on the held-out data, and the best parameters are those that gave the best results on the held-out data.\r\n\r\n### Task 4: Semi-supervised Learning\r\nIn addition to the standard training/hyperparameter tuning/prediction sections of an ML pipeline, we will also build a semi-supervised learning component to enable continuous learning. Here we will start by training a conventional ML model and then use it to create pseudo-response values for non-training, synthetic data. Both the original and synthetic pseudo-response data will be combined and used to train a \"semi-supervised\" model.\r\n\r\nThis process can continue iteratively to _simulate_ self-learning - similar to a digital twin - from an influx of \"fresh\" data from devices. The model.pkl file is the XGBRegressor model which will be stored to be later used for inference. 
\r\n\r\n### Task 5: Prediction\r\nOnce the development exercise is complete, the final model can be deployed into production and used as a digital replica of a MOSFET device for simulating the leakage behavior of a real device, OR it can be used as one of the components to build a more complex Digital Twin system.\r\n\r\n\r\n\r\n## Validated Hardware Details\r\n\r\n| Recommended Hardware | Precision |\r\n| -------------- | -----------------|\r\n| Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz with 187 GB of RAM | FP32 |\r\n| RAM: 187 GB | |\r\n| Recommended Free Disk Space: 22 GB or more | |\r\n\r\nCode was tested on Ubuntu* 22.04 LTS.\r\n\r\n## How it works\r\nThe workflow below represents the end-to-end process for this use case within the scripts that will be used in the [get started section](#get-started).\r\n\r\n![main-workflow](assets/how-it-works.png)\r\n\r\nThis workflow gives a first-approach model of MOSFET behavior using a synthetic dataset. If you want to use this use case pipeline in a production environment, it is critical to replace the synthetic dataset with a pre-analyzed dataset containing data gathered from your production environment. \r\n\r\n\r\n## Get Started\r\nDefine the environment variables that will store the paths to your desired workspace folders; these variables will be referenced in the next steps for an easier walkthrough experience. \r\n### Environment variables\r\n\r\n**WORKSPACE:**\r\nPath where the current repository will be cloned in the next steps. \\\r\n**DATA_DIR:**\r\nPath where the dataset must be placed. \\\r\n**OUTPUT_DIR:**\r\nPath where the pipeline logs will be saved. \r\n\r\n[//]: # (capture: baremetal)\r\n``` bash\r\nexport WORKSPACE=$PWD/digital-twin\r\nexport DATA_DIR=$WORKSPACE/data\r\nexport OUTPUT_DIR=$WORKSPACE/logs\r\n```\r\n\r\n### Download the Workflow Repository\r\nCreate the workspace directory and clone the [Workflow Repository](https://github.com/oneapi-src/digital-twin) into the ```WORKSPACE``` path. 
\r\n\r\n[//]: # (capture: baremetal)\r\n``` bash\r\nmkdir -p $WORKSPACE \u0026\u0026 cd $WORKSPACE\r\n```\r\n```bash\r\ngit clone https://github.com/oneapi-src/digital-twin.git $WORKSPACE\r\n```\r\n[//]: # (capture: baremetal)\r\n```bash\r\nmkdir -p $DATA_DIR/models\r\nmkdir -p $OUTPUT_DIR\r\n```\r\n### Set Up Conda\r\nRefer to [Conda Installing on Linux](https://docs.anaconda.com/free/anaconda/install/linux/) for more details. \r\n``` bash\r\nwget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\r\nbash Miniconda3-latest-Linux-x86_64.sh\r\n```\r\n### Set Up Environment\r\nThis reference kit uses the libmamba solver for fast environment creation. The dependencies file is located in [$WORKSPACE/env/intel_env.yml](env/intel_env.yml). \r\n| Packages | Version | \r\n| -------- | ------- |\r\n| python | 3.10 |\r\n| intelpython3_full | 2024.0.0 |\r\n| modin-all | 0.24.1 | \r\n\r\nSuggested libmamba setup\r\n``` bash\r\nconda install -n base conda-libmamba-solver\r\nconda config --set solver libmamba\r\n```\r\nEnvironment creation\r\n``` bash\r\nconda env create -f $WORKSPACE/env/intel_env.yml\r\nconda activate digital_twin_intel\r\n```\r\n***Note:***\r\n\r\nThe environment needs to be set up only once and must include the dependencies listed above. To list your environments, use ```conda env list``` or ```conda info -e```.\r\n## Supported Runtime Environment\r\nThe execution of the reference kit is compatible with the following environments: \r\n- Bare Metal\r\n\r\n\r\n## Run using Bare Metal\r\nBefore running the following steps, make sure your environment is complete according to the [Get Started Section](#get-started). 
\\\r\nRequirements: \r\n- [Conda installation](#set-up-conda)\r\n- [Use case environment ready](#set-up-environment)\r\n\r\n**Run Workflow**\r\n\r\n\r\nGo to the WORKSPACE directory \r\n\r\n[//]: # (capture: baremetal)\r\n``` bash\r\ncd $WORKSPACE\r\n```\r\nFor the pipeline to run, make sure to have the use case environment activated: \r\n\r\n\r\n``` bash\r\nconda activate digital_twin_intel\r\n```\r\nOnce we create and activate the virtual environment, we can run the benchmarks for evaluating performance gain. The training and inference portion of benchmarking can be run using the Python script `MOSFET_main.py`. [How does this script work?](#how-it-works)\r\n\r\n\r\nThe benchmark script takes the following arguments:\r\n\r\n```bash\r\nusage: MOSFET_main.py [-h] [-l LOGFILE] -m MODEL [-mf MODELFILE] [-n N_DATA_LEN] [-d DATA_PATH] [-x [X_COLS ...]]\r\n                      [-y Y_COL]\r\n\r\noptional arguments:\r\n  -h, --help            show this help message and exit\r\n  -l LOGFILE, --logfile LOGFILE\r\n                        log file to output benchmarking results to\r\n  -m MODEL, --model MODEL\r\n                        type of model lr:linreg, xgb:xgboost, xgbh: xgb with hyperparameter tuning, xgbfull:\r\n  -mf MODELFILE, --modelfile MODELFILE\r\n                        name for the built model please add extension if desired\r\n  -n N_DATA_LEN, --n_data_len N_DATA_LEN\r\n                        option for data length. 
Provide 1 2 or 3, default 1\r\n  -d DATA_PATH, --data_path DATA_PATH\r\n                        path to the customized csv dataset, optional\r\n  -x [X_COLS ...], --x_cols [X_COLS ...]\r\n                        provide the independent columns of customized dataset space separated\r\n  -y Y_COL, --y_col Y_COL\r\n                        provide the dependent column of customized dataset\r\n```\r\n**Training Types** \\\r\nThe pipeline has four different types of training to perform for reference: \r\n- Linear Regression ( *lr* )\r\n- XGBoost ( *xgb* )\r\n- XGBoost Hyperparameter ( *xgbh* ) \r\n- XGBFull ( *xgbfull* )\r\n\r\n**Data Length** \\\r\nAs mentioned previously, the dataset used for this use case is synthetic; the main data generator script is located in `src/utils/synthetic_datagen.py`.\r\nValid values for the data length factor are: \r\n- 1: 120,000 rows x 10 columns data shape.\r\n- 2: 960,000 rows x 10 columns data shape.\r\n- 3: 3,240,000 rows x 10 columns data shape. \r\n\r\n**Note:** The training script takes only 2.5M rows; if the data length is larger, only 2.5M rows will be taken.\r\n\r\n\r\nTo run the pipeline with default values: \r\n``` bash\r\npython $WORKSPACE/src/MOSFET_main.py -m \u003ctraining-type\u003e\r\n```\r\nTo run the pipeline specifying model names and log paths:\r\n```bash\r\npython $WORKSPACE/src/MOSFET_main.py -m \u003ctraining-type\u003e -mf \u003cmodel-name\u003e.pkl -l $OUTPUT_DIR/\u003clog-name\u003e.log -n 1\r\n```\r\n***Note:*** *After a successful run you can find the model in the $WORKSPACE path with your given name; in addition, the logs can be found inside the $OUTPUT_DIR folder (if your setup is the same as the previous example), otherwise you can find your logs in your given path.*\r\n\r\n**Example 1**: \\\r\nTo run a simple XGBoost* training, with default values: \r\n```bash\r\npython $WORKSPACE/src/MOSFET_main.py -m xgb\r\n```\r\nTo run a simple XGBoost* training, with model name \"xgb_model.pkl\", with logs saved in 
\"$OUTPUT_DIR/xgb_log.log\" and data length 2:\r\n\r\n[//]: # (capture: baremetal)\r\n```bash\r\npython $WORKSPACE/src/MOSFET_main.py -m xgb -mf $DATA_DIR/models/xgb_model.pkl -l $OUTPUT_DIR/xgb_log.log -n 2\r\n```\r\n\r\n**Example 2**: \\\r\nTo run a  XGBoost* training with hyperparameters, with default values: \r\n```bash\r\npython $WORKSPACE/src/MOSFET_main.py -m xgbh\r\n```\r\nTo run a  XGBoost* training with hyperparameters, with model name \"xgbh_model.pkl\" , with logs saved in \"$OUTPUT_DIR/xgbh_log.log\" and data length 1:\r\n\r\n[//]: # (capture: baremetal)\r\n```bash\r\npython $WORKSPACE/src/MOSFET_main.py -m xgbh -mf $DATA_DIR/models/xgbh_model.pkl -l $OUTPUT_DIR/xgbh_log.log -n 1\r\n```\r\n\r\n*Note: The same name and path for logfile for every call appends to the existing log file with equals name and path.*\r\n\r\n### Clean Up Bare Metal\r\nBefore proceeding to the cleaning process, it is strongly recommended to make a backup of the data that the user wants to keep. To clean the previously downloaded and generated data, run the following commands:\r\n```bash\r\nconda deactivate #Run line if digital_twin_intel is active\r\nconda env remove -n digital_twin_intel\r\n```\r\n\r\n\r\n```bash\r\nrm $OUTPUT_DIR $DATA_DIR -rf\r\n```\r\n\r\n## Expected output\r\nReffering to the examples mentioned in [this section](#run-using-bare-metal) the following outputs represent successfull runs. 
\r\n\r\n**Example 1** : \r\n```bash\r\nIntel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)\r\nIntel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)\r\n===== Running benchmarks for oneAPI tech =====\r\n===== Generating Synthetic Data =====\r\n--------- Synthetic Dataset Overview ---------\r\n     w_l   vgs       vth       eta        temp       sub-vth w_l_bins vgs_bins vth_bins  log-leakage\r\n0  0.001  0.01  1.050000  1.217129  330.082475  9.035183e-17        1        1        1    16.044063\r\n1  0.001  0.01  1.062342  1.201704  282.045813  2.233716e-19        1        1        1    18.650972\r\n2  0.001  0.01  1.074684  1.200153  281.472996  1.292874e-19        1        1        1    18.888444\r\n3  0.001  0.01  1.087025  1.175888  284.751179  6.035123e-20        1        1        1    19.219314\r\n4  0.001  0.01  1.099367  1.211889  356.945319  2.028265e-16        1        1        1    15.692875\r\nDone ✓\r\nData saved in:  //frameworks.ai.platform.sample-apps.digital-twin//data/synthetic_data.csv\r\nSynthetic data shape 960000 11\r\nSynthetic dataset 'X' columns: ['w_l', 'vgs', 'vth', 'eta','temp', 'w_l_bins', 'vgs_bins', 'vth_bins']\r\nSynthetic dataset 'Y' target column: 'log-leakage'\r\nINFO:sklearnex: sklearn.model_selection.train_test_split: running accelerated version on CPU\r\nsklearn.model_selection.train_test_split: running accelerated version on CPU\r\nINFO:sklearnex: sklearn.model_selection.train_test_split: running accelerated version on CPU\r\nsklearn.model_selection.train_test_split: running accelerated version on CPU\r\n===== Running Benchmarks for XGB Regression =====\r\nTraining time = 3.133789300918579\r\nPrediction time = 0.042740821838378906\r\ndaal4py Prediction time = 0.01962566375732422\r\nMean SQ Error: 0.017\r\ndaal4py Mean SQ Error: 0.017\r\n```\r\n\r\n**Example 2** : \r\n```bash\r\nIntel(R) Extension for Scikit-learn* enabled 
(https://github.com/intel/scikit-learn-intelex)\r\nIntel(R) Extension for Scikit-learn* enabled (https://github.com/intel/scikit-learn-intelex)\r\n===== Running benchmarks for oneAPI tech =====\r\n===== Generating Synthetic Data =====\r\n--------- Synthetic Dataset Overview ---------\r\n     w_l   vgs    vth       eta        temp       sub-vth w_l_bins vgs_bins vth_bins  log-leakage\r\n0  0.001  0.01  1.050  1.208536  293.649819  1.701777e-18        1        1        1    17.769097\r\n1  0.001  0.01  1.075  1.204200  320.383640  1.223601e-17        1        1        1    16.912360\r\n2  0.001  0.01  1.100  1.221108  312.065480  3.785925e-18        1        1        1    17.421828\r\n3  0.001  0.01  1.125  1.217852  279.375132  3.034652e-20        1        1        1    19.517891\r\n4  0.001  0.01  1.150  1.204661  240.956293  1.615252e-23        1        1        1    22.791760\r\nDone ✓\r\nData saved in:  //frameworks.ai.platform.sample-apps.digital-twin//data/synthetic_data.csv\r\nSynthetic data shape 120000 11\r\nSynthetic dataset 'X' columns: ['w_l', 'vgs', 'vth', 'eta','temp', 'w_l_bins', 'vgs_bins', 'vth_bins']\r\nSynthetic dataset 'Y' target column: 'log-leakage'\r\nINFO:sklearnex: sklearn.model_selection.train_test_split: running accelerated version on CPU\r\nsklearn.model_selection.train_test_split: running accelerated version on CPU\r\nINFO:sklearnex: sklearn.model_selection.train_test_split: running accelerated version on CPU\r\nsklearn.model_selection.train_test_split: running accelerated version on CPU\r\n===== Running Benchmarks for XGB Hyperparameter Training =====\r\nFitting 4 folds for each of 8 candidates, totalling 32 fits\r\nTraining time = 14.580235242843628\r\nPrediction time = 0.46219873428344727\r\ndaal4py Prediction time = 0.004792928695678711\r\nMean SQ Error: 0.015\r\ndaal4py Mean SQ Error: 0.015\r\n```\r\n\r\n\r\n## Summary and next steps\r\n\r\n### How to customize this use case\r\nBecause MOSFET devices are so common, any performance 
gain in model development will be amplified significantly in a deployed model. This offers a significant advantage in model solution scalability. Because leakage current is a key indicator of performance, a digital twin which can predict the leakage current of MOSFET devices _at scale_ will be extremely valuable. To deploy this solution, the model.pkl file which is created as a result of training/hyperparameter tuning can be used to create the end-user applications (APIs to handle client requests) through standard Python packages such as Flask or FastAPI.\r\n\r\n### Adapt to your dataset\r\nTo use this use case with your own dataset, please take note of the names of your dataset's columns (independent columns and target column), since some scripts point directly to the columns used with the synthetic dataset. The dataset must be in **csv** format to guarantee the functionality of this section. Follow these steps to use your own dataset. \\\r\n**Step 1. Make sure these steps are completed successfully:**\r\n- [Get Started](#get-started)\r\n\r\n**Step 2. Place your customized dataset in the dataset dir**\r\n```bash\r\nmv /path/to/your/customized/dataset.csv $DATA_DIR/\r\n```\r\n\r\n**Step 3. 
Run your workflow with your data**\r\nThe script MOSFET_main.py can receive the independent columns of your dataset, which correspond to the 'X' part of the dataframes, and the target 'Y' column of your data, which corresponds to the variable you want to predict, as you can see in the help message below: \r\n\r\n```bash\r\nusage: MOSFET_main.py [-h] [-l LOGFILE] -m MODEL [-mf MODELFILE] [-n N_DATA_LEN] [-d DATA_PATH] [-x [X_COLS ...]]\r\n                      [-y Y_COL]\r\n\r\noptional arguments:\r\n  -h, --help            show this help message and exit\r\n  -l LOGFILE, --logfile LOGFILE\r\n                        log file to output benchmarking results to\r\n  -m MODEL, --model MODEL\r\n                        type of model lr:linreg, xgb:xgboost, xgbh: xgb with hyperparameter tuning, xgbfull:\r\n  -mf MODELFILE, --modelfile MODELFILE\r\n                        name for the built model please add extension if desired\r\n  -n N_DATA_LEN, --n_data_len N_DATA_LEN\r\n                        option for data length. 
Provide 1 2 or 3, default 1\r\n  -d DATA_PATH, --data_path DATA_PATH\r\n                        path to the customized csv dataset, optional\r\n  -x [X_COLS ...], --x_cols [X_COLS ...]\r\n                        provide the independent columns of customized dataset space separated\r\n  -y Y_COL, --y_col Y_COL\r\n                        provide the dependent column of customized dataset\r\n```\r\n***Example 1:***\r\nLet's say your customized dataset follows this structure: \r\n```bash\r\n     w_l   vgs    vth       eta  temperature       sub-vth    w_l_b    vgs_b    vth_b  curr-log-leakage\r\n0  0.001  0.01  1.050  1.197438   316.039904  1.401728e-17        1        1        1         16.853336\r\n1  0.001  0.01  1.075  1.205384   343.338401  1.069032e-16        1        1        1         15.971009\r\n2  0.001  0.01  1.100  1.191140   296.339600  2.713053e-19        1        1        1         18.566542\r\n3  0.001  0.01  1.125  1.201279   311.210269  9.215524e-19        1        1        1         18.035480\r\n4  0.001  0.01  1.150  1.216623   309.201781  5.407484e-19        1        1        1         18.267005\r\n```\r\nThen your target variable is named **curr-log-leakage**. 
\\\r\nThen your independent column names are: **'w_l'   'vgs'    'vth'       'eta'  'temperature'       'sub-vth'    'w_l_b' 'vgs_b'** and **'vth_b'**.\r\n\r\nSo, the form for the X and Y arguments should be: \r\n```bash\r\n-x w_l vgs vth eta temperature sub-vth w_l_b vgs_b vth_b\r\n-y curr-log-leakage\r\n```\r\nThen the command you will run takes the form: \r\n```bash\r\npython $WORKSPACE/src/MOSFET_main.py -m \u003ctraining-type\u003e -mf \u003cmodel-name\u003e.pkl -l $OUTPUT_DIR/\u003clog-name\u003e.log -d $DATA_DIR/\u003ccustomized-dataset-name\u003e.csv -x w_l vgs vth eta temperature sub-vth w_l_b vgs_b vth_b -y curr-log-leakage\r\n```\r\n\r\nNow, let's put it all together with examples: \r\n\r\n***Example 1.1:*** \\\r\nTo run a simple XGBoost* training, with model name \"xgb_model.pkl\", with logs saved in \"$OUTPUT_DIR/xgb_log.log\", **with your own dataset** named \"modified.csv\":\r\n```bash\r\npython $WORKSPACE/src/MOSFET_main.py -m xgb -mf xgb_model.pkl -l $OUTPUT_DIR/xgb_log.log -d $DATA_DIR/modified.csv -x w_l vgs vth eta temperature sub-vth w_l_b vgs_b vth_b -y curr-log-leakage\r\n```\r\n***Example 1.2:*** \\\r\nTo run an XGBoost* training with hyperparameter tuning, with model name \"xgbh_model.pkl\", with logs saved in \"$OUTPUT_DIR/xgbh_log.log\", **with your own dataset** named \"modified.csv\":\r\n```bash\r\npython $WORKSPACE/src/MOSFET_main.py -m xgbh -mf xgbh_model.pkl -l $OUTPUT_DIR/xgbh_log.log -d $DATA_DIR/modified.csv -x w_l vgs vth eta temperature sub-vth w_l_b vgs_b vth_b -y curr-log-leakage\r\n``` \r\nIf you have questions related to the rest of the parameters used, please refer to [this section](#run-using-bare-metal).\r\n\r\n**_Note:_ Customized data pipelines only work with the XGB, LR, and XGBH training types.**\r\n\r\n## Learn More\r\n\r\nVisit [Intel® Extension for Scikit-learn](https://www.intel.com/content/www/us/en/developer/tools/oneapi/scikit-learn.html) for more.\r\n \r\nVisit [Intel® Optimization for 
XGBoost](https://www.intel.com/content/www/us/en/developer/tools/oneapi/optimization-for-xgboost.html) for more.\r\n\r\nVisit [Intel® Distribution of Modin](https://www.intel.com/content/www/us/en/developer/tools/oneapi/distribution-of-modin.html) for more.\r\n\r\nVisit [Python* API (daal4py) for Intel® oneAPI Data Analytics Library (oneDAL)](https://www.intel.com/content/www/us/en/developer/articles/guide/a-daal4py-introduction-and-getting-started-guide.html) for more.\r\n\r\n## Support\r\nThe End-to-end Digital Twin team tracks both bugs and enhancement requests using [GitHub issues](https://github.com/oneapi-src/digital-twin/issues). Before submitting a suggestion or bug report, search the [existing GitHub issues](https://github.com/oneapi-src/digital-twin/issues) to see if your issue has already been reported.\r\n\r\n\r\n\r\n## Appendix\r\n\r\n\r\n### About This Use Case\r\nIntel® has released XGBoost* optimizations as part of the general XGBoost* packages. Please keep in mind that the performance benefit is a result of both Intel® optimizations and version updates. No code changes are needed to realize these performance gains apart from updating the XGBoost* version, except for explicitly setting tree_method to hist (as all training optimizations from Intel® are limited to the hist tree method). However, the daal4py optimizations are still relevant to this use case, as they can further improve the performance of end-user applications.\r\n \r\n### References\r\nThe base code was sourced from the following GitHub repository:\r\nhttps://github.com/tirthajyoti/Digital-Twin/blob/main/MOSFET-1.ipynb\r\n\r\n[1]: IBM. (2022). What is a Digital Twin. Www.ibm.com. https://www.ibm.com/topics/what-is-a-digital-twin \\\r\n[2]: Cheat sheet: What is Digital Twin? Internet of Things blog. (2020, December 4). IBM Blog. https://www.ibm.com/blog/iot-cheat-sheet-digital-twin/\r\n\r\n\r\n **The dataset used here is synthetic.  
Intel® Corporation does not own the rights to this data set and does not confer any rights to it.**\r\n \r\nTo the extent that any public or non-Intel datasets or models are referenced by or accessed using tools or code on this site those datasets or models are provided by the third party indicated as the content source. Intel® does not create the content and does not warrant its accuracy or quality. By accessing the public content, or using materials trained on or with such content, you agree to the terms associated with that content and that your use complies with the applicable license.\r\nIntel® expressly disclaims the accuracy, adequacy, or completeness of any such public content, and is not liable for any errors, omissions, or defects in the content, or for any reliance on the content. Intel® is not liable for any liability or damages relating to your use of public content.\r\n","funding_links":[],"categories":["Table of Contents"],"sub_categories":["AI - Frameworks and Toolkits"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foneapi-src%2Fdigital-twin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foneapi-src%2Fdigital-twin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foneapi-src%2Fdigital-twin/lists"}