{"id":19944252,"url":"https://github.com/lazauk/slm-phi-3-mlflow","last_synced_at":"2025-11-25T20:06:54.872Z","repository":{"id":244996432,"uuid":"816723987","full_name":"LazaUK/SLM-Phi-3-MLFlow","owner":"LazaUK","description":"Converting Phi-3 model into MLFlow format, to enable targeted inference.","archived":false,"fork":false,"pushed_at":"2024-06-19T22:43:28.000Z","size":410,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-01T13:42:02.608Z","etag":null,"topics":["ai","azure","gen-ai","machine-learning","mlflow","mlops","phi-3","pipeline","slm","transformer","wrapper"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LazaUK.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-18T09:31:50.000Z","updated_at":"2024-06-19T22:43:31.000Z","dependencies_parsed_at":"2024-11-13T00:29:49.308Z","dependency_job_id":null,"html_url":"https://github.com/LazaUK/SLM-Phi-3-MLFlow","commit_stats":null,"previous_names":["lazauk/slm-phi-3-mlflow"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/LazaUK/SLM-Phi-3-MLFlow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LazaUK%2FSLM-Phi-3-MLFlow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LazaUK%2FSLM-Phi-3-MLFlow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LazaUK%2FSLM-Phi-3-MLFlow/releases","manifests_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/LazaUK%2FSLM-Phi-3-MLFlow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LazaUK","download_url":"https://codeload.github.com/LazaUK/SLM-Phi-3-MLFlow/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LazaUK%2FSLM-Phi-3-MLFlow/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286079811,"owners_count":27282121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-25T02:00:05.816Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","azure","gen-ai","machine-learning","mlflow","mlops","phi-3","pipeline","slm","transformer","wrapper"],"created_at":"2024-11-13T00:19:40.712Z","updated_at":"2025-11-25T20:06:54.856Z","avatar_url":"https://github.com/LazaUK.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Building a wrapper and using Phi-3 as an MLFlow model\n\nMLflow is an open-source platform designed to streamline the entire machine learning (ML) lifecycle. 
It helps data scientists track experiments, manage their ML models and deploy them into production, ensuring reproducibility and efficient collaboration.\n\nIn this repo, I’ll demonstrate two different approaches to building a wrapper around the Phi-3 small language model (SLM) and then running it as an MLFlow model either locally or in the cloud, e.g., in an Azure Machine Learning workspace. You can use the attached Jupyter notebooks to jump-start your development process.\n\n\u003e _Note_: this code has now been contributed to Microsoft's Phi-3 Cookbook [here](https://github.com/microsoft/Phi-3CookBook/blob/main/md/06.E2ESamples/E2E_Phi-3-MLflow.md).\n\n## Table of contents:\n- [Option 1: Transformer pipeline](https://github.com/LazaUK/SLM-Phi-3-MLFlow/tree/main#option-1-transformer-pipeline)\n- [Option 2: Custom Python wrapper](https://github.com/LazaUK/SLM-Phi-3-MLFlow/tree/main#option-2-custom-python-wrapper)\n- [Signatures of generated MLFlow models](https://github.com/LazaUK/SLM-Phi-3-MLFlow/tree/main#signatures-of-generated-mlflow-models)\n- [Inference of Phi-3 with MLFlow runtime](https://github.com/LazaUK/SLM-Phi-3-MLFlow/tree/main#inference-of-phi-3-with-mlflow-runtime)\n\n## Option 1: Transformer pipeline\nThis is the easiest option to build a wrapper if you want to use a HuggingFace model with MLFlow’s _experimental_ **transformers** flavour.\n1. You need the relevant Python packages from MLFlow and HuggingFace:\n``` Python\nimport mlflow\nimport transformers\n```\n2. Next, you should initialise a transformer pipeline by referring to the target Phi-3 model in the HuggingFace registry. As can be seen from the _Phi-3-mini-4k-instruct_’s model card, its task type is “Text Generation”:\n``` Python\npipeline = transformers.pipeline(\n    task = \"text-generation\",\n    model = \"microsoft/Phi-3-mini-4k-instruct\"\n)\n```\n3. 
You can now save your Phi-3 model’s transformer pipeline into MLFlow format and provide additional details such as the target artifacts path, specific model configuration settings and inference API type:\n``` Python\nmodel_info = mlflow.transformers.log_model(\n    transformers_model = pipeline,\n    artifact_path = \"phi3-mlflow-model\",\n    # model_config: a dict of inference settings, assumed to be defined earlier\n    model_config = model_config,\n    task = \"llm/v1/chat\"\n)\n```\n\n## Option 2: Custom Python wrapper\nAt the time of writing, the transformer pipeline did not support MLFlow wrapper generation for HuggingFace models in ONNX format, even with the experimental _optimum_ Python package. In cases like this, you can build your own custom Python wrapper for the MLFlow model.\n1. Here I use Microsoft's [ONNX Runtime generate() API](https://github.com/microsoft/onnxruntime-genai) for the ONNX model's inference and token encoding / decoding. You have to choose the _onnxruntime_genai_ package that matches your target compute; the example below targets CPU:\n``` Python\nimport mlflow\nfrom mlflow.models import infer_signature\nimport onnxruntime_genai as og\n```\n2. 
Our custom class implements two methods: _load_context()_ to initialise the **ONNX model** of Phi-3 Mini 4K Instruct, **generator parameters** and **tokenizer**; and _predict()_ to generate output tokens for the provided prompt:\n``` Python\nclass Phi3Model(mlflow.pyfunc.PythonModel):\n    def load_context(self, context):\n        # Retrieving model from the artifacts\n        model_path = context.artifacts[\"phi3-mini-onnx\"]\n        model_options = {\n            \"max_length\": 300,\n            \"temperature\": 0.2,\n        }\n\n        # Defining the model\n        self.phi3_model = og.Model(model_path)\n        self.params = og.GeneratorParams(self.phi3_model)\n        self.params.set_search_options(**model_options)\n\n        # Defining the tokenizer\n        self.tokenizer = og.Tokenizer(self.phi3_model)\n\n    def predict(self, context, model_input):\n        # Retrieving prompt from the input\n        prompt = model_input[\"prompt\"][0]\n        self.params.input_ids = self.tokenizer.encode(prompt)\n\n        # Generating the model's response\n        response = self.phi3_model.generate(self.params)\n\n        return self.tokenizer.decode(response[0][len(self.params.input_ids):])\n```\n3. You can now use the _mlflow.pyfunc.log_model()_ function to generate a custom Python wrapper (in pickle format) for the Phi-3 model, along with the original ONNX model and required dependencies:\n``` Python\nmodel_info = mlflow.pyfunc.log_model(\n    artifact_path = artifact_path,\n    python_model = Phi3Model(),\n    artifacts = {\n        \"phi3-mini-onnx\": \"cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4\",\n    },\n    input_example = input_example,\n    signature = infer_signature(input_example, [\"Run\"]),\n    extra_pip_requirements = [\"torch\", \"onnxruntime_genai\", \"numpy\"],\n)\n```\n\n## Signatures of generated MLFlow models\n1. In Step 3 of Option 1 above, we set the MLFlow model’s task to “_llm/v1/chat_”. 
This setting generates a model API wrapper that is compatible with OpenAI’s Chat API, as shown below:\n``` Python\n{inputs: \n  ['messages': Array({content: string (required), name: string (optional), role: string (required)}) (required), 'temperature': double (optional), 'max_tokens': long (optional), 'stop': Array(string) (optional), 'n': long (optional), 'stream': boolean (optional)],\noutputs: \n  ['id': string (required), 'object': string (required), 'created': long (required), 'model': string (required), 'choices': Array({finish_reason: string (required), index: long (required), message: {content: string (required), name: string (optional), role: string (required)} (required)}) (required), 'usage': {completion_tokens: long (required), prompt_tokens: long (required), total_tokens: long (required)} (required)],\nparams: \n  None}\n```\n2. As a result, you can submit your prompt in the following format:\n``` Python\nmessages = [{\"role\": \"user\", \"content\": \"What is the capital of Spain?\"}]\n```\n3. Then, use OpenAI API-compatible post-processing, e.g., _response[0]['choices'][0]['message']['content']_, to format your output into something like this:\n``` JSON\nQuestion: What is the capital of Spain?\n\nAnswer: The capital of Spain is Madrid. It is the largest city in Spain and serves as the political, economic, and cultural center of the country. Madrid is located in the center of the Iberian Peninsula and is known for its rich history, art, and architecture, including the Royal Palace, the Prado Museum, and the Plaza Mayor.\n\nUsage: {'prompt_tokens': 11, 'completion_tokens': 73, 'total_tokens': 84}\n```\n4. In Step 3 of Option 2 above, we allow the MLFlow package to generate the model’s signature from a given input example. Our MLFlow wrapper's signature will look like this:\n``` Python\n{inputs: \n  ['prompt': string (required)],\noutputs: \n  [string (required)],\nparams: \n  None}\n```\n5. 
So, our prompt needs to contain the \"prompt\" dictionary key, similar to this:\n``` Python\n{\"prompt\": \"\u003c|system|\u003eYou are a stand-up comedian.\u003c|end|\u003e\u003c|user|\u003eTell me a joke about atom\u003c|end|\u003e\u003c|assistant|\u003e\",}\n```\n6. The model's output will then be provided in string format:\n``` JSON\nAlright, here's a little atom-related joke for you!\n\nWhy don't electrons ever play hide and seek with protons?\n\nBecause good luck finding them when they're always \"sharing\" their electrons!\n\nRemember, this is all in good fun, and we're just having a little atomic-level humor!\n```\n\n## Inference of Phi-3 with MLFlow runtime\n1. To run the generated MLFlow model locally, load it with _mlflow.pyfunc.load_model()_ from the model’s directory, as follows, and then call its _predict()_ method:\n``` Python\nloaded_model = mlflow.pyfunc.load_model(\n    model_uri = model_info.model_uri\n)\n```\n2. To run in a cloud environment like an Azure Machine Learning workspace, you can register your MLFlow model with a custom Python wrapper in the workspace's model registry:\n![phi3_mlflow_registration](/images/phi3_aml_registry.png)\n3. Then, deploy it to a managed real-time endpoint:\n![phi3_mlflow_deploy](/images/phi3_aml_deploy.png)\n4. Once the deployment succeeds, you can immediately start using it with code samples provided in **JavaScript**, **Python**, **C#** or **R**:\n![phi3_mlflow_endpoint](/images/phi3_aml_endpoint.png)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flazauk%2Fslm-phi-3-mlflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flazauk%2Fslm-phi-3-mlflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flazauk%2Fslm-phi-3-mlflow/lists"}