https://github.com/cestmerneil/logicvision
This project combines a segmentation model with a logic tensor network to reason about object relationships in images, improving image content analysis through first-order logic formulas and a multi-layer perceptron network.
- Host: GitHub
- URL: https://github.com/cestmerneil/logicvision
- Owner: CestMerNeil
- License: gpl-3.0
- Created: 2025-01-22T19:07:58.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-03-22T21:58:26.000Z (3 months ago)
- Last Synced: 2025-03-22T22:36:28.251Z (3 months ago)
- Topics: ltntorch, oneformer, visualgenome-dataset, yolo
- Language: Jupyter Notebook
- Homepage:
- Size: 47.6 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Understanding visual scenes using logic tensor networks 🚀🤖
This project combines a segmentation model with a logic tensor network to reason about object relationships in images, improving image content analysis through first-order logic formulas and a multi-layer perceptron network. ✨
---
## Overall architecture and module division
1. **✨ Image segmentation and feature extraction**: The YOLO-Seg model from [Ultralytics](https://docs.ultralytics.com) or the OneFormer model from [SHI-Labs](https://www.shi-labs.com) segments the input image and extracts features for each detected object.
2. **✨ Object relation detection**: Using a logic tensor network from [LTNtorch](https://github.com/tommasocarraro/LTNtorch), each detected object is converted into a logical predicate, which the logic tensor network then reasons over (see the sketch after this list).
3. **✨ Logical relationship training**: The logic tensor networks are trained on relationship data from the [Visual Genome](https://homes.cs.washington.edu/~ranjay/visualgenome/index.html) dataset.
4. **✨ Output of reasoning results**: Reads the relation queried by the user as a (subject, predicate, object) triple and outputs the reasoning result.
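To make steps 2 and 3 concrete, below is a minimal, self-contained sketch of how a relation predicate can be grounded as an MLP and trained against logical axioms using LTNtorch's public API (`ltn.Predicate`, `ltn.Variable`, `ltn.Quantifier`, `ltn.fuzzy_ops`). The feature encoding (two concatenated normalized bounding boxes), the network sizes, and the `Near` predicate are illustrative assumptions, not this project's actual `Trainer` implementation.

```python
import torch
import ltn

class RelationMLP(torch.nn.Module):
    """Scores whether a relation holds for a (subject, object) feature pair."""
    def __init__(self, n_features=8):  # assumption: two concatenated (x, y, w, h) boxes
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_features, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 1),
            torch.nn.Sigmoid(),
        )

    def forward(self, pair):
        return self.net(pair).squeeze(-1)

# Ground the logical predicate Near(pair) with the MLP.
Near = ltn.Predicate(RelationMLP())
Not = ltn.Connective(ltn.fuzzy_ops.NotStandard())
Forall = ltn.Quantifier(ltn.fuzzy_ops.AggregPMeanError(p=2), quantifier="f")
SatAgg = ltn.fuzzy_ops.SatAgg()

# Toy data: feature pairs labeled as positive/negative examples of "near".
pos = ltn.Variable("pos", torch.rand(32, 8))
neg = ltn.Variable("neg", torch.rand(32, 8))

optimizer = torch.optim.Adam(Near.parameters(), lr=1e-4)
for epoch in range(50):
    optimizer.zero_grad()
    # Axioms: Near holds for all positive pairs and for no negative pairs.
    sat = SatAgg(
        Forall(pos, Near(pos)),
        Forall(neg, Not(Near(neg))),
    )
    loss = 1.0 - sat  # maximize satisfaction of the knowledge base
    loss.backward()
    optimizer.step()
```

A trained predicate returns a truth degree in [0, 1] for a new pair, which is presumably what the confidence value in the inference examples below reflects.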
## Installation Guide

### Training environment (Ubuntu 22.04)
```bash
pip install -r requirements.train.txt
```

### Inference environment (macOS 15.3)
```bash
pip install -r requirements.inference.txt
```

Pre-trained models for YOLO and OneFormer are automatically downloaded when the program is run.
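For reference, a pretrained segmentation checkpoint can be exercised through the Ultralytics API as follows; the checkpoint name `yolov8n-seg.pt` and the image path are placeholders, and this is a generic Ultralytics usage sketch rather than this project's own wrapper code.

```python
from ultralytics import YOLO

# First use downloads the checkpoint automatically, as noted above.
model = YOLO("yolov8n-seg.pt")  # placeholder: any *-seg.pt weight works

results = model("demo.jpg")  # placeholder image path
for r in results:
    print(len(r.boxes), "objects detected")
    print(r.boxes.cls, r.boxes.conf)   # per-object class ids and confidences
    if r.masks is not None:
        print(r.masks.data.shape)      # per-object segmentation masks
```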
## Guidelines for use
### Example of training
```python
from utils.Trainer import trainer

predicates = ["in", "on", "next to"]

for pred in predicates:
    print(f"🚂 Training {pred} ...")
    trainer(
        pos_predicate=pred,
        neg_predicates=[p for p in predicates if p != pred],
        epoches=50,
        batch_size=32,
        lr=1e-4,
    )
```
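Note that each predicate is trained one-vs-rest: the predicate currently being trained supplies the positive examples, while the remaining predicates in the list are passed to `trainer` as negatives.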
### Examples of inference

```python
from utils.Inferencer import Inferencer

# Initialize the inferencer
analyzer = Inferencer(
    subj_class="person",
    obj_class="bicycle",
    predicate="near"
)

# Perform inference on a single image
result = analyzer.inference_single("demo.jpg")
print(f"🔎 Result: {result['relation']} (confidence: {result['confidence']:.2f})")

# Perform inference on a folder of images
analyzer.process_folder("input_images/")
```
# Dataset

The relationships and image metadata files from the [Visual Genome](https://homes.cs.washington.edu/~ranjay/visualgenome/index.html) dataset are used to extract image information and object-pair information. The project extracts object pairs and their bounding-box locations from the relationship data, and uses the image metadata (image width and height) to normalize those locations, as sketched below.
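Here is a minimal sketch of that extraction and normalization step, assuming the field layout of the Visual Genome v1.4 JSON dumps (`relationships.json` and `image_data.json`); the project's own preprocessing may differ in file names and structure.

```python
import json

# Assumed file names and fields from the Visual Genome v1.4 dumps;
# both files are large, so a streaming parser may be preferable in practice.
with open("image_data.json") as f:
    image_meta = {img["image_id"]: img for img in json.load(f)}

def norm_box(region, width, height):
    """Scale an (x, y, w, h) box into [0, 1] image coordinates."""
    return (region["x"] / width, region["y"] / height,
            region["w"] / width, region["h"] / height)

pairs = []
with open("relationships.json") as f:
    for entry in json.load(f):
        meta = image_meta[entry["image_id"]]
        w, h = meta["width"], meta["height"]
        for rel in entry["relationships"]:
            pairs.append({
                "predicate": rel["predicate"].lower(),
                "subject": norm_box(rel["subject"], w, h),
                "object": norm_box(rel["object"], w, h),
            })
```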
# Code Style and Documentation
This project uses `black` and `isort` to automatically enforce a consistent code style. All code comments and documentation follow the [Google Python Style Guide](https://google.github.io/styleguide/) to maintain clarity and consistency. Use the following command to format the code before committing:
```bash
black . && isort .
```
# Acknowledgements
This project is based on the [LTNtorch](https://github.com/tommasocarraro/LTNtorch) project and uses the [Visual Genome](https://homes.cs.washington.edu/~ranjay/visualgenome/api_beginners_tutorial.html) dataset for data extraction. The project uses the [YOLO](https://docs.ultralytics.com) and [OneFormer](https://www.shi-labs.com) models for object detection and segmentation.

# License
This project is licensed under the GNU General Public License v3.0 - see the [LICENSE](/LICENSE) file for details.
---