{"id":30292518,"url":"https://github.com/linusling/rebalance-hca","last_synced_at":"2025-08-17T00:34:26.088Z","repository":{"id":305796997,"uuid":"1004660379","full_name":"LinusLing/ReBalance-HCA","owner":"LinusLing","description":"The PyTorch implementation for the paper Enhancing scene graph generation via Hybrid Co-Attention and Predicate Reweighting for long-tail robustness.","archived":false,"fork":false,"pushed_at":"2025-07-22T04:00:27.000Z","size":1333,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-22T04:28:26.479Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LinusLing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-19T01:40:12.000Z","updated_at":"2025-07-22T04:00:30.000Z","dependencies_parsed_at":"2025-07-22T04:38:59.258Z","dependency_job_id":null,"html_url":"https://github.com/LinusLing/ReBalance-HCA","commit_stats":null,"previous_names":["linusling/rebalance-hca"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/LinusLing/ReBalance-HCA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LinusLing%2FReBalance-HCA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LinusLing%2FReBalance-HCA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LinusLing%2FReBalance-HCA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/LinusLing%2FReBalance-HCA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LinusLing","download_url":"https://codeload.github.com/LinusLing/ReBalance-HCA/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LinusLing%2FReBalance-HCA/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":270791280,"owners_count":24645781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-16T02:00:11.002Z","response_time":91,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-08-17T00:34:24.996Z","updated_at":"2025-08-17T00:34:26.077Z","avatar_url":"https://github.com/LinusLing.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Enhancing scene graph generation via Hybrid Co-Attention and Predicate Reweighting for long-tail robustness (ReBalance-HCA)\n\n## Description\n\nThis is the PyTorch implementation of **ReBalance-HCA**, proposed in the paper **Enhancing scene graph generation via Hybrid Co-Attention and Predicate Reweighting for long-tail robustness**. The scene graph generation task aims to generate a set of \u003csubject, predicate, object\u003e triplets for an image and assemble them into a graph structure. 
The system supports multiple evaluation modes, including Predicate Classification (PredCls), Scene Graph Classification (SGCls), and Scene Graph Detection (SGDet).\n\n## Architecture\n\n![framework](framework.png)\n\n## Dataset Information\nThe system supports the **Visual Genome** dataset for scene graph generation:\n* Used as the primary dataset for training and evaluation\n* Requires GloVe embeddings for semantic processing\n* Dataset path is configured through `GLOVE_DIR ./datasets/vg/`\n\nThe following is adapted from [Danfei Xu](https://github.com/danfeiX/scene-graph-TF-release/blob/master/data_tools/README.md) and [neural-motifs](https://github.com/rowanz/neural-motifs).\n\nNote that our codebase is intended to support the attribute head as well, so our ```VG-SGG.h5``` and ```VG-SGG-dicts.json``` differ from the original versions in [Danfei Xu](https://github.com/danfeiX/scene-graph-TF-release/blob/master/data_tools/README.md) and [neural-motifs](https://github.com/rowanz/neural-motifs). We add attribute information and rename them to ```VG-SGG-with-attri.h5``` and ```VG-SGG-dicts-with-attri.json```. The code we use to generate them is located at ```datasets/vg/generate_attribute_labels.py```. Consistent with the conventional approach described in 'Unbiased Scene Graph Generation from Biased Training', we disable the attribute head during detector pretraining and relationship prediction to ensure a fair comparison, mirroring this codebase's default configuration.\n\n### Download:\n1. Download the VG images [part1](https://cs.stanford.edu/people/rak248/VG_100K_2/images.zip) [part2](https://cs.stanford.edu/people/rak248/VG_100K_2/images2.zip). Extract these images to the directory `datasets/vg/VG_100K`. If you want to use another directory, set it in `DATASETS['VG_stanford_filtered']['img_dir']` of `maskrcnn_benchmark/config/paths_catelog.py`. \n2. 
Download the [scene graphs](https://onedrive.live.com/embed?cid=22376FFAD72C4B64\u0026resid=22376FFAD72C4B64%21779871\u0026authkey=AA33n7BRpB1xa3I) and extract them to `datasets/vg/VG-SGG-with-attri.h5`, or edit the path in `DATASETS['VG_stanford_filtered_with_attribute']['roidb_file']` of `maskrcnn_benchmark/config/paths_catelog.py`.\n\n### Dataset Setup \nFollow the [instructions](https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch) to install and use the code. See these [scripts](https://github.com/ZhuGeKongKong/SSG-G2S/tree/main/scripts) for training and testing examples.\nThe training process is divided into two stages: **Training the common SGG model** and **Finetuning the informative SGG model**.\n\n## Code Information\nThe codebase is built on the MaskRCNN-Benchmark framework and includes:\n\n### Core Components:\n1. Transformer Predictors: Multiple transformer-based relation prediction models\n2. Relation Head: Complete relation detection pipeline\n3. Data Loaders: Custom dataset handling with resampling support\n\n### Key Models:\n* TransformerTransferPredictor: Main relation prediction model\n* Clean Classifier: Bias-aware classification component\n* Transfer Classifier: Domain adaptation capabilities\n\n## Usage Instructions\n### Training\nUse the provided training scripts with different configurations:\n\n#### For Predicate Classification:\n```bash\nbash scripts/train.sh 0  # PredCls mode\n```\n#### For Scene Graph Classification:\n```bash\nbash scripts/train.sh 1  # SGCls mode\n```\n#### For Scene Graph Detection:\n```bash\nbash scripts/train.sh 2  # SGDet mode\n```\n\n### Testing\nExecute evaluation using the test scripts:\n\n```bash\nbash scripts/test.sh 0  # Test PredCls  \nbash scripts/test.sh 1  # Test SGCls    \nbash scripts/test.sh 2  # Test SGDet\n```\n\n## Requirements\n### Dependencies:\n* Python 3.8\n* PyTorch 2.4.0 with CUDA support\n* APEX for mixed-precision training\n* COCO API for dataset handling\n\n### Environment 
Setup:\nThe system requires proper PYTHONPATH configuration including APEX and COCO API paths.\n\n## Materials \u0026 Methods\n### Computing Infrastructure\n* **Operating System**: Linux-based systems\n* **Hardware Requirements**: NVIDIA GPUs with CUDA support\n* **Distributed Training**: Multi-GPU training support via DistributedDataParallel (optional)\n\n### Data Preprocessing Steps\n#### Image Preprocessing Pipeline:\nThe system applies different preprocessing transforms for training and evaluation phases:\n\n#### Training Transforms:\n* Color jittering (brightness, contrast, saturation, hue adjustments)\n* Random horizontal flipping (50% probability)\n* Random vertical flipping (configurable probability)\n* Image resizing with aspect ratio preservation\n* Tensor conversion and normalization [build.py:36-45](https://github.com/LinusLing/ReBalance-HCA/tree/main/maskrcnn_benchmark/data/transforms/build.py#L36-L45)\n\n#### Evaluation Transforms:\n* No augmentation applied (color jitter and flipping disabled)\n* Only resizing, tensor conversion, and normalization applied [build.py:15-23](https://github.com/LinusLing/ReBalance-HCA/tree/main/maskrcnn_benchmark/data/transforms/build.py#L15-L23)\n\n#### Normalization Parameters:\n* Pixel mean: [102.9801, 115.9465, 122.7717]\n* Pixel standard deviation: [1.0, 1.0, 1.0]\n* BGR255 format conversion for compatibility [defaults.py:57-61](https://github.com/LinusLing/ReBalance-HCA/tree/main/maskrcnn_benchmark/config/defaults.py#L57-L61)\n\n#### Dataset Format Requirements:\n* PASCAL VOC dataset supported in original format\n* Other datasets require conversion to COCO JSON format\n* Symlink-based dataset organization for efficient access\n\n### Evaluation Method\nThe system supports comprehensive evaluation across three main tasks:\n\n1. **Predicate Classification (PredCls)**: Uses ground truth bounding boxes and object labels\n2. 
**Scene Graph Classification (SGCls)**: Uses ground truth bounding boxes, predicts object labels\n3. **Scene Graph Detection (SGDet)**: End-to-end detection and relation prediction (see `relation_train_net.py:314-323`)\n\n### Training Configuration:\n* Mixed precision training support\n* Warmup learning rate scheduling\n* Gradient clipping for training stability\n* Validation-based early stopping\n\n## Conclusions\n### Limitations\n\n* **Computational Resources Trade-off**  \n   While the Hybrid Co-Attention delivers enhanced feature fusion, its iterative refinement process moderately increases training time. We recommend using multi-GPU configurations where feasible to improve throughput.\n\n* **Novel Predicate Handling**  \n   The current framework focuses primarily on rebalancing existing predicate distributions. Accommodating entirely new predicates with minimal examples remains an active research direction. Users exploring few-shot scenarios may consider supplementary augmentation techniques as an interim solution.\n\n* **Domain Adaptation Scope**  \n   For scenarios with significant domain shifts between datasets, note that our current fine-tuning protocol adjusts only the final model layer. This lightweight adaptation strategy favors efficiency, though exceptionally divergent domains might warrant additional network adjustments.\n\n## License \u0026 Contribution Guidelines\nThis implementation is based on [Facebook's MaskRCNN-Benchmark framework](https://github.com/facebookresearch/maskrcnn-benchmark). Users should refer to the original licensing terms and follow standard open-source contribution practices. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinusling%2Frebalance-hca","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinusling%2Frebalance-hca","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinusling%2Frebalance-hca/lists"}