{"id":19984508,"url":"https://github.com/intellabs/mmpano","last_synced_at":"2025-04-12T14:13:57.025Z","repository":{"id":242614343,"uuid":"781056154","full_name":"IntelLabs/MMPano","owner":"IntelLabs","description":"Official implementation of L-MAGIC","archived":false,"fork":false,"pushed_at":"2024-08-08T22:23:58.000Z","size":12174,"stargazers_count":127,"open_issues_count":2,"forks_count":6,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-26T08:51:32.866Z","etag":null,"topics":["diffusion","llms"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IntelLabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-02T17:04:16.000Z","updated_at":"2025-03-14T07:35:19.000Z","dependencies_parsed_at":"2024-06-04T02:38:42.336Z","dependency_job_id":"a61daf43-8cc8-480d-bddc-39bff9cc0da3","html_url":"https://github.com/IntelLabs/MMPano","commit_stats":null,"previous_names":["intellabs/mmpano"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FMMPano","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FMMPano/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FMMPano/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IntelLabs%2FMMPano/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IntelLabs","download_url":"https://codeload.gi
thub.com/IntelLabs/MMPano/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248578859,"owners_count":21127713,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["diffusion","llms"],"created_at":"2024-11-13T04:19:16.066Z","updated_at":"2025-04-12T14:13:57.000Z","avatar_url":"https://github.com/IntelLabs.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [CVPR 2024] Official implementation of the paper: \"L-MAGIC: Language Model Assisted Generation of Images with Coherence\"\nWe present a novel method that can generate 360 degree panorama from different types of zero-shot inputs (e.g., a single image, text description, hand-drawing etc.). Our Huggingface space is now available. 
Feel free to try it out!\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"800\" src=\"media/pipeline.png\"\u003e\n\u003c/div\u003e\n\n- [Paper](https://arxiv.org/abs/2406.01843)\n- [Project Page](https://zhipengcai.github.io/MMPano/)\n- [YouTube Video](https://youtu.be/XDMNEzH4-Ec)\n- [Huggingface demo (now available!)](https://huggingface.co/spaces/MMPano/MMPano)\n\n## Industrial Impact\n\n- Our work has been selected as **one of the 5 Intel featured live demos** at [ISC HPC 2024](https://www.intel.com/content/www/us/en/events/supercomputing.html).\n- Our work has been featured by the [Intel Community Blog](https://community.intel.com/t5/Blogs/Tech-Innovation/Artificial-Intelligence-AI/Advancing-Gen-AI-on-Intel-Gaudi-AI-Accelerators-with-Multi-Modal/post/1603746)!\n- Our work has been featured by [Intel Labs LinkedIn](https://www.linkedin.com/feed/update/urn:li:activity:7203797143831076864/)!\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"800\" src=\"media/ISC.png\"\u003e\n\u003c/div\u003e\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"800\" src=\"media/blog.png\"\u003e\n\u003c/div\u003e\n\n## 📌 Reference\n\n```bibtex\n@inproceedings{\nzhipeng2024lmagic,\ntitle={L-MAGIC: Language Model Assisted Generation of Images with Coherence},\nauthor={Zhipeng Cai and Matthias Müller and Reiner Birkl and Diana Wofk and Shao-Yen Tseng and JunDa Cheng and Gabriela Ben-Melech Stan and Vasudev Lal and Michael Paulitsch},\nbooktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition},\nyear={2024}\n}\n```\n\n## ⭐️ Show Your Support\n\nIf you find this project helpful or interesting, please consider giving it a star! Your support is greatly appreciated and helps others discover the project.\n\n## Environment\n\nThis code has been tested on Linux with Python 3.9. 
It should also be compatible with other Python versions.\n\n\n## Run on Intel Gaudi\n\nThis codebase has been developed and deployed on Intel Gaudi on Intel Developer Cloud.\n\n- [Intel Gaudi](https://habana.ai/)\n- [Intel Developer Cloud](https://www.intel.com/content/www/us/en/developer/tools/devcloud/overview.html)\n\n\n#### Setup Docker environment\n```bash\n# Build docker image\n./docker_build.sh\n\n# Start the container. Following the instructions in the script, you may modify\n# `HABANA_VISIBLE_DEVICES` and `HABANA_VISIBLE_MODULES` to run on a different Gaudi device.\n./docker_run-hpu.sh\n```\n\n\n## Run on other devices\n\nYou can also run it on an NVIDIA GPU. After setting up a proper NVIDIA environment with PyTorch installed (e.g., via `conda`, `venv`, or `docker`), install the necessary packages by running the following command:\n\n```bash\npip install -r requirements.txt\n```\n\n\n## Run the code\n#### Note\n- If you are running on Gaudi, you will encounter slower performance initially because Gaudi requires at least 2 warmup cycles. If you want to build your own application using this codebase, please warm up the Gaudi at least 2 times.\n  \n- The best performance is enabled by using ChatGPT as the LLM controller, which requires you to apply for an [OpenAI API key](https://platform.openai.com/docs/overview).\n\n- If you are in an area that cannot access the ChatGPT API, we also provide a way to use a free, open-source LLM controller (e.g., Llama3). Please see below for instructions on how to enable it. You may need to set `HF_TOKEN` or pass a Hugging Face token. Feel free to also contribute to the code and enable other LLMs.\n\n#### (Optional) Start a TGI LLM server\n\nIf you want to use TGI for LLM serving, the code provides a script to pull the Docker image and start a TGI LLM server on Gaudi. 
Once TGI is running, please make sure to pass `--llm_model_name tgi` when running the MM Pano command line in the next step.\n\nWe've only validated the listed LLM models (\"meta-llama/Meta-Llama-3-8B-Instruct\", \"mistralai/Mistral-7B-Instruct-v0.2\"). We encourage users to try out new models and add them to the supported list.\n\n```bash\n# Modify the model name and pass a Hugging Face token if needed. You can also change `num_shard` if you like.\nvi mm_pano/tgi_gaudi/run_tgi_gaudi.sh\n\n# Pull and start the TGI-Gaudi in the container\n(cd mm_pano/tgi_gaudi \u0026\u0026 ./run_tgi_gaudi.sh)\n```\n\nIf you want to run TGI on other devices, please make sure the default TGI url:port is set to `http://127.0.0.1:8080`.\n\n\n#### Command\nThere are different options when running the code. Here is a simple example using:\n\n- the image-to-panorama task\n- ChatGPT as the LLM (GPT-4)\n- a Gaudi accelerator as the hardware\n\n```bash\n# --save_pano_img saves the generated panorama picture.\n# --gen_video generates and saves the video.\n# (Comments are kept above the command: a comment after a trailing backslash breaks the line continuation.)\npython3 mm_pano/mmpano.py \\\n  --init_image exp/example/0.png \\\n  --output_folder exp/outputs \\\n  --dtype bfloat16 --device hpu \\\n  --llm_model_name gpt-4 \\\n  --api_key \u003cyour ChatGPT API key\u003e \\\n  --save_pano_img \\\n  --gen_video\n```\n\nTo change the setup, for example:\n- perform \"text-to-panorama\", change `--init_image exp/example/0.png` to `--init_prompt 'maple autumn forest'`. The `--init_prompt` option can also be used together with `--init_image` to provide a user-specified scene description.\n- use other LLMs, change `--llm_model_name gpt-4` to `--llm_model_name [other LLM names]`. Currently the available choices are `\"gpt-4\", \"gpt-3.5-turbo\", \"meta-llama/Meta-Llama-3-8B-Instruct\", \"mistralai/Mistral-7B-Instruct-v0.2\", \"tgi\"`,\n  where `tgi` can be a [TGI Gaudi](https://github.com/huggingface/tgi-gaudi) or [TGI](https://github.com/huggingface/text-generation-inference) server to run bigger models like Llama3-70B. 
Note that `--api_key` is only used for GPT models.\n- use CUDA, change `--device hpu` to `--device cuda`\n- specify camera intrinsics for the input image, add `--intrinsitc float, float, float, float`\n\n## Results (see more on our project page and paper)\n\nAfter running the code, you will find in the output folder (`exp/outputs`) a panoramic image \"pano.png\" (see below for examples) and an immersive video \"video.mp4\".\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"800\" src=\"media/pano.png\"\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"800\" src=\"media/snow.jpg\"\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"800\" src=\"media/underwater.jpeg\"\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"800\" src=\"media/livingRoom.jpg\"\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg width=\"800\" src=\"media/library.jpg\"\u003e\n\u003c/div\u003e\n\n\n## Contact\n\nFeel free to send an email to Zhipeng (czptc2h@gmail.com) or Joey (Tien Pei) Chou (joey.t.p.chou@gmail.com) if you have any questions or comments. \n\n## 📈 Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=IntelLabs/MMPano\u0026type=Date)](https://star-history.com/#IntelLabs/MMPano)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintellabs%2Fmmpano","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fintellabs%2Fmmpano","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fintellabs%2Fmmpano/lists"}