Spatial Knowledge Graph-Guided Synthesis for Multimodal LLMs
- Host: GitHub
- URL: https://github.com/zjunlp/knowledge2data
- Owner: zjunlp
- License: MIT
- Created: 2024-12-11
- Default Branch: main
- Last Pushed: 2025-06-03
- Last Synced: 2025-06-13
- Topics: generation, kg2data, knowledge-graph, large-language-models, multimodal-large-language-models, natural-language-processing, spatial-knowledge-graph, synthetic-data, visual-question-answering
- Language: Python
- Size: 1.51 MB
- Stars: 2
- Watchers: 4
- Forks: 0
- Open Issues: 0

Metadata Files:
- Readme: README.md
- License: LICENSE
# Knowledge2Data

Spatial Knowledge Graph-Guided Multimodal Synthesis
[Code](https://github.com/zjunlp/Knowledge2Data) • [License: MIT](https://opensource.org/licenses/MIT)

Project • Paper • HuggingFace • Overview • Quickstart • Citation
## Table of Contents
- What's New
- Overview
- Quickstart
- Citation
## News
- **2025-11-01**: Our paper was accepted as a regular paper in IEEE TASLP (Transactions on Audio, Speech and Language Processing).
- **2025-02-28**: We released the paper.
---
## Overview
## Quickstart
### Data
Get the training and test data from HuggingFace: https://huggingface.co/datasets/zjunlp/Knowledge2Data
### Installation
```shell
git clone https://github.com/zjunlp/Knowledge2Data
cd Knowledge2Data
conda create -n skg python=3.9
conda activate skg
pip install -r requirements.txt
```
### Download the models
#### Download the following models from HuggingFace
| Model Name | HuggingFace |
|-------------------------------|---------------------------------------------------------------------------|
| Diffusers-generation-text-box | [gligen/diffusers-generation-text-box](https://huggingface.co/gligen/diffusers-generation-text-box) |
| Sam-vit-base | [facebook/sam-vit-base](https://huggingface.co/facebook/sam-vit-base) |
| Stable-diffusion-xl-refiner | [stabilityai/stable-diffusion-xl-refiner-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) |
### Export the environment variables
```shell
cd src
export OPENAI_API_KEY="YOUR_API_KEY"
export SKG_HF_MODELS="LOCAL_HUGGINGFACE_MODELS_DIR"
```
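A minimal sketch of how downstream code can read these variables (the variable names come from this README; the fallback directory is an illustrative assumption, not the project's actual default):

```python
import os

# Read the variables exported above. The "./hf_models" fallback is purely
# illustrative -- the project's scripts may require SKG_HF_MODELS to be set.
api_key = os.environ.get("OPENAI_API_KEY", "")
models_dir = os.environ.get("SKG_HF_MODELS", "./hf_models")

if not api_key:
    print("Warning: OPENAI_API_KEY is not set; Spatial KG generation will fail.")
print(f"HuggingFace models directory: {models_dir}")
```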
### Generate the Spatial KG and multimodal synthetic data
#### Execute the script to generate the Spatial KG
```shell
sh run_skg.sh
```
You can also customize objects and their spatial relationships to form your own Spatial KG. Save it as a JSON file in the same format as `src/data/skg_demo.json`.
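As a sketch, a custom Spatial KG file might be assembled like this; the keys and relation labels below are assumptions for illustration, so match them against `src/data/skg_demo.json` before use:

```python
import json

# Hypothetical Spatial KG entry: the schema (keys, relation vocabulary) is an
# illustrative guess -- consult src/data/skg_demo.json for the real format.
skg = [
    {
        "objects": ["a red mug", "a silver laptop"],
        "relations": [
            {"subject": "a red mug", "relation": "left of", "object": "a silver laptop"}
        ],
    }
]

with open("my_skg.json", "w") as f:
    json.dump(skg, f, indent=2)
```

The resulting file can then be passed to the data-generation step via `--input_file`.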
#### Execute the script to generate multimodal synthetic data
```shell
sh run_data.sh
```
For custom data, only the `--input_file` parameter needs to be modified.
By default, generated data is saved to `src/data` and images to `src/img_generations`.
To generate more data, adjust the `--num_scenes` parameter ([generate_scenes.py](src/generate_scenes.py)) and the `--repeats` parameter ([generate_images.py](src/generate_images.py)).
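A hypothetical mirror of those flags, to show how a customized run is parameterized; only the flag names (`--input_file`, `--num_scenes`, `--repeats`) come from this README, and the defaults are assumptions:

```python
import argparse

# Sketch of the flags named above; the defaults are illustrative assumptions,
# not the scripts' real defaults.
parser = argparse.ArgumentParser(description="flags used by the generation scripts")
parser.add_argument("--input_file", default="data/skg_demo.json")
parser.add_argument("--num_scenes", type=int, default=10)
parser.add_argument("--repeats", type=int, default=1)

# Example: point the pipeline at a custom KG and request more scenes.
args = parser.parse_args(["--input_file", "data/my_skg.json", "--num_scenes", "50"])
print(args.input_file, args.num_scenes, args.repeats)
```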
## Acknowledgement
This project is based on open-source projects including [LLM-groundedDiffusion](https://github.com/TonyLianLong/LLM-groundedDiffusion). Thanks for their great contributions!
## Citation
Please cite the following paper if you use this project in your work.
```bibtex
@misc{xue2025spatialknowledgegraphguidedmultimodal,
title={Spatial Knowledge Graph-Guided Multimodal Synthesis},
author={Yida Xue and Zhen Bi and Jinnan Yang and Jungang Lou and Huajun Chen and Ningyu Zhang},
year={2025},
eprint={2505.22633},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.22633},
}
```
---