{"id":18554432,"url":"https://github.com/oracle/sd4j","last_synced_at":"2025-04-04T23:07:50.458Z","repository":{"id":212347944,"uuid":"730701037","full_name":"oracle/sd4j","owner":"oracle","description":"Stable diffusion pipeline in Java using ONNX Runtime","archived":false,"fork":false,"pushed_at":"2025-03-04T13:40:15.000Z","size":2681,"stargazers_count":161,"open_issues_count":1,"forks_count":15,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-03-28T22:12:24.197Z","etag":null,"topics":["java","onnxruntime","stable-diffusion"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"upl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oracle.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-12T13:43:36.000Z","updated_at":"2025-03-05T16:56:09.000Z","dependencies_parsed_at":"2025-01-12T02:17:06.775Z","dependency_job_id":"f155536d-306f-48b7-87db-f21832d66bb9","html_url":"https://github.com/oracle/sd4j","commit_stats":{"total_commits":13,"total_committers":3,"mean_commits":4.333333333333333,"dds":"0.15384615384615385","last_synced_commit":"c82731a66dc778f8f909161c110638df37631516"},"previous_names":["oracle-samples/sd4j","oracle/sd4j"],"tags_count":1,"template":false,"template_full_name":"oracle/template-repo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Fsd4j","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Fsd4j/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Fsd4j/release
s","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Fsd4j/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oracle","download_url":"https://codeload.github.com/oracle/sd4j/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247261603,"owners_count":20910108,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java","onnxruntime","stable-diffusion"],"created_at":"2024-11-06T21:21:56.592Z","updated_at":"2025-04-04T23:07:50.438Z","avatar_url":"https://github.com/oracle.png","language":"Java","funding_links":[],"categories":["人工智能"],"sub_categories":[],"readme":"# SD4J (Stable Diffusion in Java)\n\nThis repo contains an implementation of Stable Diffusion inference running on top of ONNX Runtime,\nwritten in Java. It's a modified port of the [C# implementation](https://github.com/cassiebreviu/StableDiffusion/),\nwith a GUI for repeated generations and support for negative text inputs. It is intended to be a\ndemonstration of how to use ONNX Runtime from Java, and best practices for ONNX Runtime to get good performance.\nWe will keep it current with the latest releases of ONNX Runtime, with appropriate updates as new performance\nrelated ONNX Runtime features become available through the ONNX Runtime Java API. All the code is subject to change as\nthis is a code sample, any APIs in it should not be considered stable.\n\nThis repo targets ONNX Runtime 1.14. 
The version number is in two parts `\u003csd4j-version\u003e-\u003connxruntime-version\u003e`, and the\ninitial release of sd4j is v1.0-1.14.0. We'll bump the sd4j version number if it gains new features and the ONNX Runtime\nversion number as we depend on newer versions of ONNX Runtime.\n\nThe project supports txt2img generation; it doesn't currently implement img2img, upscaling or inpainting.\n\nBy default it uses a fp32 model, and running on a 6 core 2019 16\" Intel MacBook Pro each diffusion step takes around 5s. \nRunning on better hardware, or with a CUDA GPU, will greatly reduce the time taken to generate an image, as will using an\nSD-Turbo model. There is experimental support for the CoreML (for macOS) and DirectML (for Windows) backends, but proper \nutilisation of these may require model changes such as quantization, which is not yet implemented.\n\n## Example images\n\nThese are a few example images generated by this code along with their generation parameters:\n\n![Generated image from the prompt \"Wildlife photograph of an astronaut riding a horse in the desert\"](images/astronaut-horse.png \"Wildlife photograph of an astronaut riding a horse in the desert\")\n\nText: \"Wildlife photograph of an astronaut riding a horse in the desert\", Negative Text: \"\", Seed: 42, Guidance Scale: 10, Inference Steps: 40, Scheduler: Euler Ancestral, Image Size: 512x512.\n\n![Generated image from the prompt \"Press photo of an America's Cup catamaran sailing through the sands of Mars, high resolution, high quality\"](images/boat-mars.png \"Press photo of an America's Cup catamaran sailing through the sands of Mars\")\n\nText: \"Press photo of an America's Cup catamaran sailing through the sands of Mars, high resolution, high quality\", Negative Text: \"water, sea, ocean, lake\", Seed: 42, Guidance Scale: 10, Inference Steps: 40, Scheduler: Euler Ancestral, Image Size: 512x512.\n\n![Generated image from the prompt \"Professional photograph of the Apollo 11 lunar lander in 
a field, high quality, 4k\"](images/lunar-lander.png \"Professional photograph of the Apollo 11 lunar lander in a field, high quality, 4k\")\n\nText: \"Professional photograph of the Apollo 11 lunar lander in a field, high quality, 4k\", Negative Text: \"\", Seed: 42, Guidance Scale: 10, Inference Steps: 50, Scheduler: Euler Ancestral, Image Size: 512x512.\n\n![Generated image from the prompt \"Professional photograph of George Washington in his garden grilling steaks, detailed face, high quality, 4k\"](images/washington-steak.png \"Professional photograph of George Washington in his garden grilling steaks, detailed face, high quality, 4k\")\n\nText: \"Professional photograph of George Washington in his garden grilling steaks, detailed face, high quality, 4k\", Negative Text: \"painting, drawing, art\", Seed: 42, Guidance Scale: 10, Inference Steps: 60, Scheduler: Euler Ancestral, Image Size: 512x512.\n\n## Model support\n\nThe SD4J project supports SD v1.5, SD v2 and SDXL style models. For models which do not support classifier-free guidance\nor negative prompts, such as SD-Turbo or SDXL-Turbo, the guidance scale should be set to a value less than 1.0 which\ndisables that guidance. Models like SD-Turbo can generate acceptable images in as few as two diffusion steps. The \ndifference between SDv1 and SDv2 models is autodetected, but SDXL must be supplied as the model type for SDXL models\notherwise it will throw an exception on generation. 
In some cases the autodetection of v1 and v2 may fail in which case\nsupplying the `--model-type {SD1.5, SD2, SDXL}` argument with the appropriate parameter will fix the model type.\n\n## Installation\n\nThis project requires [Apache Maven](https://maven.apache.org), [Java 17 or newer](https://www.oracle.com/java/technologies/downloads/),\na compiled ONNX Runtime extensions binary, and a Stable Diffusion model checkpoint.\nThe other dependencies (ONNX Runtime and Apache Commons Math) are downloaded by Maven automatically.\n\n### Prepare model checkpoint\n\nThere are many compatible models on [Hugging Face's website](https://huggingface.co). We have tested the\nStable Diffusion v1.5 checkpoint, which has pre-built ONNX models. This can be downloaded via \nthe following `git` commands (skip the first one if you have already configured `git-lfs`):\n```bash\ngit lfs install\ngit clone https://huggingface.co/runwayml/stable-diffusion-v1-5 -b onnx\n```\nThe Stable Diffusion v1.5 checkpoint is available under the [OpenRAIL-M license](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE).\nFor other SD models there is a one or two stage process to generate the ONNX format models. 
If the model is already in \nHugging Face Diffusers format then you can run the `convert_stable_diffusion_checkpoint_to_onnx.py` file from the \n[diffusers](https://github.com/huggingface/diffusers) project as follows:\n```bash\npython scripts/convert_stable_diffusion_checkpoint_to_onnx.py --model_path \u003cpath-on-disk-or-model-hub-name\u003e --output_path \u003cpath-to-onnx-model-folder\u003e\n```\nIf the model is an original stable diffusion checkpoint then you first need to run:\n```bash\npython scripts/convert_original_stable_diffusion_to_diffusers.py --checkpoint_path \u003cpath-on-disk-to-checkpoint\u003e --scheduler_type lms --dump_path \u003cpath-on-disk-to-diffusers-output\u003e\n```\nIf the model is an SDXL model then it needs to be exported from the Hugging Face Hub \nusing [optimum](https://github.com/huggingface/optimum):\n```bash\noptimum-cli export onnx --model \u003cmodel-hub-name\u003e \u003cpath-to-onnx-model-folder\u003e\n```\nThe scripts require a suitable Python 3 virtual environment with `diffusers`, `onnxruntime`, `optimum` and `onnx` \ninstalled.\n\n### Setup ORT extensions\nYou will also need to check out and compile onnxruntime-extensions for your platform. The repo is [https://github.com/microsoft/onnxruntime-extensions](https://github.com/microsoft/onnxruntime-extensions),\nand it can be compiled with `./build_lib.sh --config Release --update --build --parallel` which generates the required library (`libortextensions.[dylib,so]` or `ortextensions.dll`) in the\n`build/\u003cOS-name\u003e/Release/lib/` folder. That library should be copied into the root of this directory.\n\n## Running the GUI\n\nThe GUI can be executed with `mvn package exec:exec -DmodelPath=\u003cpath-to-stable-diffusion-model\u003e`. It constructs a\nwindow where you can specify the parameters of the image you'd like to generate, and each image creates its own window\nwhere it can save the image as a png file. 
Saved png files contain a metadata field with the generation parameters.\n\n### Use in other programs\n\nThe `com.oracle.labs.mlrg.sd4j.SD4J` class provides a full image generation pipeline which can be used without the GUI\ndirectly from other code.\n\n### Using a CUDA GPU\n\nTo use the GPU you need to modify the pom file to depend on `onnxruntime_gpu` and swap `\u003cexecutionProvider\u003eCPU\u003c/executionProvider\u003e` to\n`\u003cexecutionProvider\u003eCUDA\u003c/executionProvider\u003e` in the Maven `\u003cproperties\u003e` block.\nYou can also specify `-DexecutionProvider=CUDA` when executing the GUI with `mvn package exec:exec`.\n\n### Using OpenVINO\n\nTo enable OpenVINO acceleration, you need to ensure the ONNX Runtime library referenced in the pom.xml has been built with OpenVINO explicitly enabled, so that the [OpenVINO Execution Provider](https://onnxruntime.ai/docs/build/eps.html) is included in the Java library pulled by the Maven reactor.\nThen you can swap `\u003cexecutionProvider\u003eCPU\u003c/executionProvider\u003e` to\n`\u003cexecutionProvider\u003eOPENVINO\u003c/executionProvider\u003e` in the Maven `\u003cproperties\u003e` block; you can also specify `-DexecutionProvider=OPENVINO` when executing the GUI with `mvn package exec:exec`.\n\n## Implementation details\n\nThis code provides a thin `Tensor` wrapper object which is a tuple of a direct `ByteBuffer` instance and a long shape\narray, which is used to provide easy access in and out of ORT's `OnnxTensor` objects. There's a `Scheduler` interface \nwhich the two available schedulers (LMS and Euler Ancestral) implement. The `SD4J` pipeline object is a suitable entry\npoint for use without the GUI, and there is an example of such usage in the `CLIApp` class.\n\n## Contributing\n\nThis project welcomes contributions from the community. 
Before submitting a pull request, please [review our contribution guide](./CONTRIBUTING.md).\n\n## Security\n\nPlease consult the [security guide](./SECURITY.md) for our responsible security vulnerability disclosure process.\n\n## License\n\nThe code is available under the [Universal Permissive License (UPL)](https://oss.oracle.com/licenses/upl/). It requires\na Stable Diffusion model architecture checkpoint to work, and any Stable Diffusion models should be used under their\nlicenses. There are 1000+ compatible models available\non [Hugging Face](https://huggingface.co/models?other=stable-diffusion), each of which is licensed separately, though many use a variant of\nthe [OpenRAIL-M license](https://github.com/CompVis/stable-diffusion/blob/main/LICENSE).\n\nThe [tokenizer onnx model](text_tokenizer/custom_op_cliptok.onnx) is taken from the \n[C# implementation](https://github.com/cassiebreviu/StableDiffusion/), and is available under the MIT license. More \ndetails on the tokenizer are available in its [README file](text_tokenizer/README.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foracle%2Fsd4j","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foracle%2Fsd4j","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foracle%2Fsd4j/lists"}