{"id":22923672,"url":"https://github.com/mithril-security/blindllama-v2","last_synced_at":"2025-05-12T23:14:09.448Z","repository":{"id":223985627,"uuid":"733594586","full_name":"mithril-security/blindllama-v2","owner":"mithril-security","description":"Confidential inference in enclave for OpenAI grant. Uses k3s and Triton","archived":false,"fork":false,"pushed_at":"2025-03-20T13:30:53.000Z","size":223,"stargazers_count":12,"open_issues_count":1,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-01T04:32:47.017Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mithril-security.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-12-19T17:19:41.000Z","updated_at":"2025-03-20T13:30:56.000Z","dependencies_parsed_at":"2024-02-23T05:21:21.800Z","dependency_job_id":"744f8c22-af26-40f8-af1c-9b32b9be7b94","html_url":"https://github.com/mithril-security/blindllama-v2","commit_stats":null,"previous_names":["mithril-security/blindllama-v2"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mithril-security%2Fblindllama-v2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mithril-security%2Fblindllama-v2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mithril-security%2Fblindllama-v2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mithril-security%2Fblindllama-v2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mithril-security","download_url":"https://codeload.github.com/mithril-security/blindllama-v2/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253837469,"owners_count":21971984,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-14T08:16:18.128Z","updated_at":"2025-05-12T23:14:09.425Z","avatar_url":"https://github.com/mithril-security.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Quick tour for BlindLlama-v2 OpenAI\n\n\n\u003e :warning: **This is a prototype**\n\n## Introduction\n\nBlindLlama-v2 is a framework for serving Kubernetes-based applications in verifiable, isolated environments called enclaves, deployed on cloud VMs equipped with GPUs and vTPMs.\n\nBy deploying models like Llama 2 with BlindLlama-v2, end users can consume AI models with the guarantee that the administrators of the AI infrastructure cannot see their data: users can verify that their data is only processed in verifiable, isolated environments (hypervisor isolation) and that it cannot leave them (network isolation).\n\nFor developers who want to deploy their applications with BlindLlama-v2, the process has four steps:\n- Prepare the image\n    - Model\n    - OS\n    - Network configuration\n- Generate measurements\n- Deploy on Azure\n- Integrate the secure client-side SDK\n\nUsers can then consume the confidential AI service with the guarantee that their data is protected end to end and is not visible to the AI
service's admins.\n\nMore information about the security properties, the architecture, and the workflow can be found in our whitepaper.\n\nIn this quick tour, we show how to package a Kubernetes application that serves either Llama 2 7B or GPT-2 with TensorRT, prepare measurements that prove the model is served in an enclave, deploy it on Azure VMs with A100 GPUs and vTPMs, and finally consume the AI model confidentially.\n\n## Core concepts\n\nTrust in our enclaves is derived from three core principles:\n- **Source transparency**: the code is open source.\n- **Build transparency**: the build is done using SLSA.\n- **Runtime transparency**: the client verifies the identity of the server software by checking its measurements against those of the previously built, open-source code.\n\n## Prerequisites\nTo run this example, you need a VM with a GPU, such as Standard_NC24ads_A100_v4. To run a model larger than Llama 2 7B, you may need larger machines with more memory and more GPUs, such as Standard_NC48ads_A100_v4 or Standard_NC96ads_A100_v4.\n\nThe code requires Python 3.11 or later.\nYou also need to install Git LFS (and pesign) and fetch the repository's submodules:\n```console\napt-get update \u0026\u0026 apt-get install git-lfs pesign -y --no-install-recommends\ngit lfs install\n\ngit submodule update --init --recursive\n```\n\n## 1 - Preparing the image\nBlindLlama-v2 serves Kubernetes-based images inside enclaves and therefore requires developers to package their applications accordingly.\n\nFor this example, several components have to be packaged:\n- The model weights, which have to be prepared for use by TensorRT.\n- The Mithril OS, a minimal OS designed to be easily verifiable and to provide measurements, which has to be integrated into the final image.\n- The application disk, a data disk containing the required container images, such as the attestation generator, the Triton server, and the attestation server.
The application disk can be measured by computing a root hash that attests to every file on the disk. Any change to the disk alters the root hash and is therefore detected.\n\n### A - Model weights\n\nServing with Triton and TensorRT requires building a model engine in which the weights are embedded. The following script generates a model engine for Llama 2 7B:\n```console\n./launch_container_create_model_engine.sh \"Llama-2-7b-hf\"\n```\nTo create a model engine for GPT2-medium, use:\n```console\n./launch_container_create_model_engine.sh \"gpt2-medium\"\n```\nNote: Model engines are specific to the GPU they are generated on. If you create the model engine on an A100 GPU, you must run the BlindLlama-v2 VM on a machine with an A100 GPU.\n\n**By default, the generated engine targets a single GPU (`--world_size 1`). To build the model engine for a different configuration, edit the `tritonRT/create_engine.sh` script before creating the model engine.**\n\n```console\npython /tensorrtllm_backend/tensorrt_llm/examples/llama/build.py --model_dir /$1/ \\\n                --dtype bfloat16 \\\n                --use_gpt_attention_plugin bfloat16 \\\n                --use_inflight_batching \\\n                --paged_kv_cache \\\n                --remove_input_padding \\\n                --use_gemm_plugin bfloat16 \\\n                --output_dir /engines/1-gpu/ \\\n                --world_size 1\n```\n\n### B - Production mode\nThe following command creates the Mithril OS image in production mode, which provides no means of accessing the image. The only point of access is the ingress controller and the endpoints it serves.
There is no shell, SSH, or other interactive access.\n\n```console\nearthly -i -P +mithril-os --OS_CONFIG='config.yaml'\n```\n\n### C - Application disk\nThe following command creates an application disk that includes the Llama 2 7B model engine generated earlier:\n\n```console\nearthly -i -P +blindllamav2-appdisk --MODEL=\"Llama-2-7b-hf\"\n```\nTo create an application disk with GPT2-medium instead, use:\n```console\nearthly -i -P +blindllamav2-appdisk --MODEL=\"gpt2-medium\"\n```\n\n### D - Network policy\n\nWhile the network policy is part of the disk, it deserves a closer look because it is central to security and privacy.\n\nThe network policy is included in the final measurement of the application disk. The policy used here allows data to be loaded into the enclave, but lets nothing leave it except the output of the AI model, which is sent back to the requester.\n\nThe full network policy can be found in the annex.\n\n## 2 - Generating measurements\nOnce the disks are created, we can generate their measurements, which the client uses to verify the server.\n\nHere is how to generate the measurements of the OS disk:\n```console\n./scripts/generate_expected_measurements_files.py\n```\nThe measurement file contains the expected PCR values of the OS.
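These PCR values cannot be written directly. A TPM only extends a PCR, replacing its value with the SHA-256 hash of the old value concatenated with a new measurement, so the final value commits to every measured component and to the order in which they were measured, and a PCR that is never extended keeps its reset value of all zeros. The sketch below illustrates this for a SHA-256 bank; the measured components are hypothetical placeholders, and the sketch does not replay the real boot event log, so its output will not match the sample values.

```python
import hashlib

DIGEST_SIZE = 32  # a SHA-256 PCR bank holds 32-byte values


def extend(pcr: bytes, measurement: bytes) -> bytes:
    # The TPM never overwrites a PCR: new value = SHA-256(old value || measurement).
    return hashlib.sha256(pcr + measurement).digest()


def replay(measurements: list[bytes]) -> str:
    # Static PCRs reset to all zeros; each measurement is folded in, in order.
    pcr = bytes(DIGEST_SIZE)
    for measurement in measurements:
        pcr = extend(pcr, measurement)
    return pcr.hex()


# Hypothetical measurements: digests of the components of a boot chain.
events = [hashlib.sha256(name).digest() for name in (b'boot stub', b'kernel', b'initrd')]

print(replay(events))                          # 64 hex characters
print(replay(events) != replay(events[::-1]))  # True: the order of measurements matters
print(replay([]) == '0' * 64)                  # True: an unused PCR stays all zeros
```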
A sample measurement file is as follows:\n```json\n{\n    \"measurements\": {\n        \"0\": \"f3a7e99a5f819a034386bce753a48a73cfdaa0bea0ecfc124bedbf5a8c4799be\",\n        \"1\": \"3d458cfe55cc03ea1f443f1562beec8df51c75e14a9fcf9a7234a13f198e7969\",\n        \"2\": \"3d458cfe55cc03ea1f443f1562beec8df51c75e14a9fcf9a7234a13f198e7969\",\n        \"3\": \"3d458cfe55cc03ea1f443f1562beec8df51c75e14a9fcf9a7234a13f198e7969\",\n        \"4\": \"dd2ccfebe24db4c43ed6913d3cbd7f700395a88679d3bb3519ab6bace1d064c0\",\n        \"12\": \"0000000000000000000000000000000000000000000000000000000000000000\",\n        \"13\": \"0000000000000000000000000000000000000000000000000000000000000000\"\n    }\n}\n```\n\nEach PCR measures a different part of the stack:\n- PCRs 0, 1, 2, and 3 hold firmware-related measurements.\n- PCR 4 measures the UKI (initrd, kernel image, and boot stub).\n- PCRs 12 and 13 measure the kernel command line and system extensions. We do not want either of them to be used, so we require both PCRs to be all zeros.\n\nHere is how to generate the root hash of the application disk:\n```console\n./scripts/generate_security_config.py\n```\nSince the application disk is simply a data disk, the only measurement we need is one that covers everything stored on it. The root hash is calculated using dm-verity.
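As a rough illustration, dm-verity derives this root hash from a Merkle tree: every data block of the disk is hashed, the resulting digests are packed into hash blocks that are hashed in turn, and so on until a single block remains, whose hash is the root. Changing any byte of the disk therefore changes the root hash. The sketch below is a simplified model that assumes SHA-256, 4096-byte blocks, and a salt prepended to each block; it omits details of the real on-disk format (such as the superblock), so it will not reproduce the `application_disk_roothash` produced by the project's scripts.

```python
import hashlib

BLOCK_SIZE = 4096                      # data and hash block size
DIGEST_SIZE = 32                       # SHA-256 digest length
PER_BLOCK = BLOCK_SIZE // DIGEST_SIZE  # 128 digests fit in one hash block


def salted_hash(salt: bytes, block: bytes) -> bytes:
    # The salt is prepended to every block before hashing.
    return hashlib.sha256(salt + block).digest()


def pad(block: bytes) -> bytes:
    # Zero-fill a short block up to BLOCK_SIZE bytes.
    return block + bytes(BLOCK_SIZE - len(block))


def pack(digests: list[bytes]) -> list[bytes]:
    # Group consecutive digests into zero-padded hash blocks.
    return [pad(b''.join(digests[i:i + PER_BLOCK])) for i in range(0, len(digests), PER_BLOCK)]


def root_hash(data: bytes, salt: bytes) -> str:
    # Leaf level: hash every data block of the disk image.
    chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)] or [b'']
    hash_blocks = pack([salted_hash(salt, pad(c)) for c in chunks])
    # Hash each level of hash blocks until a single top-level block remains.
    while len(hash_blocks) > 1:
        hash_blocks = pack([salted_hash(salt, hb) for hb in hash_blocks])
    # The root hash is the salted hash of that top-level block.
    return salted_hash(salt, hash_blocks[0]).hex()


salt = bytes(32)                # hypothetical salt; real images use their own
disk = bytes(200 * BLOCK_SIZE)  # a tiny all-zero image that needs two hash levels
tampered = bytearray(disk)
tampered[123_456] ^= 1          # flip a single bit somewhere in the image

print(root_hash(disk, salt))
print(root_hash(disk, salt) != root_hash(bytes(tampered), salt))  # True: any change is detected
```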
This root hash is independent of the OS disk and of the OS disk measurement.\n\nA sample root hash is as follows:\n```json\n{\n    \"application_disk_roothash\": \"89ca5b62df40df834b8f7a17ce2cce72247cebbb87b80d220845ec583470605f\"\n}\n```\n\nTogether with the OS measurements, this root hash covers the full stack we expect to measure, from the Mithril OS to the Triton app and the loaded weights.\n\n## 3 - Deploying it on Azure\nWe can now deploy the image on the appropriate Azure VM.\n\nThe deployment requires the following tools:\n```console\n# qemu-utils to resize the disks to conform to Azure disk specifications\nsudo apt-get install qemu-utils\n# Azure CLI\ncurl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash\n# azcopy to copy the disks to Azure, available from:\n# https://aka.ms/downloadazcopy-v10-linux\n```\n\nEdit `upload_config.sh` to set your Azure resource group and the region in which the disks should be created:\n```sh\n### Replace with your values\nAZ_RESOURCE_GROUP=\"my-resource-group\"\nAZ_REGION=\"myregion\"\n### End\n```\nRun the following script to upload the disks and create a VM:\n```console\n./upload.sh\n```\nThis script uploads the disks, creates a VM, adds DNS entries to the local machine's /etc/hosts file, and creates a network rule in the Azure firewall that allows HTTPS requests to reach the VM. This Azure rule is an ordinary firewall rule; the network isolation that prevents data from leaving the enclave is enforced by the OS and k3s policies.\n\nThe model is now up and running and can be queried with the guarantee that it is deployed in a VM with no admin access and with network isolation.\n\n## 4 - Confidential consumption with attested TLS\nTo consume the model securely, we provide a Python client SDK.
This SDK performs attestation by verifying the enclave's measurements: it checks that they come from a genuine vTPM and that they match the expected measurements of our code.\n\nThe expected measurements are written into the client when the [generate_security_config.py](./scripts/generate_security_config.py) and [generate_expected_measurements_files.py](./scripts/generate_expected_measurements_files.py) scripts are run. They are stored in the client directory at `client/client/blindllamav2/security_config`.\n\nOnce attestation succeeds, the client establishes a TLS channel that terminates inside the enclave. This whole process is known as attested TLS.\n\n### A - Installation\nThe client requires `tpm2-tools` and can be installed with pip:\n```console\napt install tpm2-tools\ncd client/client\npip install .\n```\n### B - Querying the model\n\nWe can now consume the previously deployed model.\nThe SDK provides an OpenAI-like interface, but instead of regular TLS, it uses attested TLS:\n```python\nimport blindllamav2\n\nresponse = blindllamav2.completion.create(\n    fetch_attestation_insecure=True,  # fetch the attestation document over an insecure connection\n    model=\"meta-llama/Llama-2-7b-hf\",\n    text_input=\"What is machine learning?\",\n    bad_words=\"\",\n    stop_words=\"\",\n    max_tokens=20,\n)\n\nprint(response)\n\n## What happens under the hood\n\nfrom blindllamav2._config import CONFIG\nfrom blindllamav2.attestation.attested_session import AttestedSession\nfrom blindllamav2.attestation_validator import AttestationValidator\nfrom blindllamav2.security_config import (\n    APPLICATION_DISK_ROOTHASH,\n    EXPECTED_OS_MEASUREMENTS,\n)\nfrom blindllamav2.attestation.verifier import PlatformKind\nfrom blindllamav2.completion import PromptRequest\n\nprint(f\"\\n Expected OS measurements: {EXPECTED_OS_MEASUREMENTS[CONFIG.target]} \\n\")\nprint(f\"Expected application disk roothash: {APPLICATION_DISK_ROOTHASH} \\n\")\n\n# A validator that checks that the attestation report matches the expected measurements\nattestation_validator = AttestationValidator(\n    platform_kind=PlatformKind.AZURE_TRUSTED_LAUNCH,\n    expected_application_disk_roothash=APPLICATION_DISK_ROOTHASH,\n    expected_os_measurements=EXPECTED_OS_MEASUREMENTS[CONFIG.target],\n)\n\n# After validating the server state, create a session that uses the TLS certificate obtained from the server\nsession = AttestedSession(\n    api_url=CONFIG.api_url,\n    attestation_endpoint_base_url=CONFIG.attestation_endpoint_base_url,\n    attestation_validator=attestation_validator,\n    fetch_attestation_document_over_insecure_connection=True,  # CONFIG.feature_flags.fetch_attestation_document_over_insecure_connection\n)\n\nprompt = PromptRequest(text_input=\"What is machine learning?\", max_tokens=20, bad_words=\"\", stop_words=\"\")\n\nreq = session.post(f\"{CONFIG.api_url}/v2/models/ensemble/generate\", json=prompt.dict())\nreq.raise_for_status()\nprint(req.text)\n```\n## Annex - Network policy and isolation\nThe network policy is implemented separately for the host and for the k3s pods.\nThe host network is controlled by iptables rules.
The exact rules are:\n```\n*filter\n# Allow localhost connections to permit communication between k3s components\n-A INPUT -p tcp -s localhost -d localhost -j ACCEPT\n-A OUTPUT -p tcp -s localhost -d localhost -j ACCEPT\n# Allow connection to Azure IMDS to get the VM Instance userdata\n-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT\n-A OUTPUT -p tcp -d 169.254.169.254 --dport 80 -j ACCEPT\n# DNS over UDP\n-A INPUT -p udp --sport 53 -j ACCEPT\n-A INPUT -p udp --dport 53 -j ACCEPT\n-A OUTPUT -p udp --sport 53 -j ACCEPT\n-A OUTPUT -p udp --dport 53 -j ACCEPT\n# DNS over TCP\n-A INPUT -p tcp --sport 53 -j ACCEPT\n-A INPUT -p tcp --dport 53 -j ACCEPT\n-A OUTPUT -p tcp --sport 53 -j ACCEPT\n-A OUTPUT -p tcp --dport 53 -j ACCEPT\n# Drop all other traffic\n-A OUTPUT -j DROP\n-A INPUT -j DROP\nCOMMIT\n```\nIn the repository, they can be found in [rules.v4](mithril-os/mkosi/rootfs/mkosi.extra/etc/iptables/rules.v4) and [rules.v6](mithril-os/mkosi/rootfs/mkosi.extra/etc/iptables/rules.v6).\n\nThese rules block all incoming and outgoing traffic except DNS queries, localhost connections, outgoing HTTP requests (TCP port 80) to the Azure Instance Metadata Service (IMDS) at 169.254.169.254, and incoming packets that belong to already established or related connections. The rules are applied at boot by the iptables-persistent package; you can confirm that the package is installed by looking at the [mkosi.conf](mithril-os/mkosi/rootfs/mkosi.conf.j2) file.\n\nSimilarly, for k3s, network policies allow incoming traffic only to the ingress controller, which acts as a reverse proxy. Outgoing traffic is also restricted to the reverse proxy.
All traffic that is neither destined for nor leaving from the reverse proxy is blocked. Additional rules allow traffic from the reverse proxy to the appropriate container (either blindllamav2 or the attestation server).\n\nThese rules are as follows:\n```yaml\napiVersion: networking.k8s.io/v1\nkind: NetworkPolicy\nmetadata:\n  name: deny-all-caddy-ns\n  namespace: caddy-system\nspec:\n  podSelector: {}\n  policyTypes:\n  - Ingress\n  - Egress\n---\napiVersion: networking.k8s.io/v1\nkind: NetworkPolicy\nmetadata:\n  name: default-deny-all\nspec:\n  podSelector: {}\n  policyTypes:\n  - Ingress\n  - Egress\n---\napiVersion: networking.k8s.io/v1\nkind: NetworkPolicy\nmetadata:\n  name: caddy-ingress\n  namespace: caddy-system\nspec:\n  podSelector:\n    matchLabels:\n      app.kubernetes.io/name: caddy-ingress-controller\n  ingress:\n  - from:\n    - ipBlock:\n        cidr: 0.0.0.0/0\n---\napiVersion: networking.k8s.io/v1\nkind: NetworkPolicy\nmetadata:\n  name: caddy-egress\n  namespace: caddy-system\nspec:\n  podSelector:\n    matchLabels:\n      app.kubernetes.io/name: caddy-ingress-controller\n  policyTypes:\n  - Egress\n  egress:\n  - {}\n---\napiVersion: networking.k8s.io/v1\nkind: NetworkPolicy\nmetadata:\n  name: ingress-to-attestation-server\n  namespace: caddy-system\nspec:\n  podSelector:\n    matchLabels:\n      app: attestation-server\n  policyTypes:\n  - Ingress\n  ingress:\n  - from:\n    - namespaceSelector:\n        matchLabels:\n          kubernetes.io/metadata.name: \"caddy-system\"\n      podSelector:\n        matchLabels:\n          app.kubernetes.io/name: caddy-ingress-controller\n---\napiVersion: networking.k8s.io/v1\nkind: NetworkPolicy\nmetadata:\n  name: ingress-to-blindllamav2\nspec:\n  podSelector:\n    matchLabels:\n      app: blindllamav2\n  ingress:\n  - from:\n    - namespaceSelector:\n        matchLabels:\n          kubernetes.io/metadata.name: \"caddy-system\"\n      podSelector:\n        matchLabels:\n          app.kubernetes.io/name: caddy-ingress-controller\n```\nWhen the client connects to the server, it first retrieves the attestation report, which is a quote from the TPM. The client then validates this quote against the measurements stored in `security_config`.\n\nAny change to the host networking rules is reflected in the OS measurement (PCR 4), so attestation fails and the connection is terminated. Any change to the k3s network policy is reflected in the application disk root hash measurement (PCR 15), so attestation fails and the connection is terminated.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmithril-security%2Fblindllama-v2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmithril-security%2Fblindllama-v2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmithril-security%2Fblindllama-v2/lists"}