{"id":23450759,"url":"https://github.com/basedrhys/text-od-robustness","last_synced_at":"2025-04-10T09:59:21.435Z","repository":{"id":109627009,"uuid":"489786298","full_name":"basedrhys/text-od-robustness","owner":"basedrhys","description":"Evaluating the robustness of text-conditioned OD models such as MDETR","archived":false,"fork":false,"pushed_at":"2022-05-15T19:14:32.000Z","size":21306,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-15T19:48:42.762Z","etag":null,"topics":["captioning","deep-learning","image-captioning","machine-learning","mdetr","model","object-detection","transformers"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/basedrhys.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-05-07T21:34:57.000Z","updated_at":"2022-05-13T05:11:26.000Z","dependencies_parsed_at":"2023-06-11T19:15:11.934Z","dependency_job_id":null,"html_url":"https://github.com/basedrhys/text-od-robustness","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basedrhys%2Ftext-od-robustness","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basedrhys%2Ftext-od-robustness/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/basedrhys%2Ftext-od-robustness/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/basedrhys%2Ftext-od-robustness/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/basedrhys","download_url":"https://codeload.github.com/basedrhys/text-od-robustness/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248198865,"owners_count":21063626,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["captioning","deep-learning","image-captioning","machine-learning","mdetr","model","object-detection","transformers"],"created_at":"2024-12-24T00:14:58.848Z","updated_at":"2025-04-10T09:59:21.413Z","avatar_url":"https://github.com/basedrhys.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Evaluating the Robustness of Text-Conditioned OD Models to False Captions\n\n## Summary\n\nThis repository is the official implementation of **Evaluating the Robustness of Text-Conditioned Object Detection Models to False Captions**.\n\n* **Problem Statement**: Empirically, text-conditioned OD models are vulnerable to negative captions (captions whose referenced object is not in the image); currently, the means to evaluate this phenomenon systematically are limited.\n\n* **Approach**: Use a large pretrained masked language model (MLM), such as RoBERTa, to create a negative-caption dataset. Perform a systematic evaluation of state-of-the-art (SOTA) MDETR models. 
Evaluate the robustness and runtime performance of different model variants.\n\n* **Benefit**: This work shows the strengths and limitations of current text-conditioned OD methods.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"700\" src=\"./img/solution.png\"\u003e\n\u003c/p\u003e\n\n## Contents\n\n- `data/`: Flickr30k annotation data and output of negative-caption generation\n- `sbatch/`: Slurm scripts used to run MDETR evaluation\n- `eval_flickr.*`: Inference with MDETR models on our negative-caption dataset\n- `eval_MLM.ipynb`: Analysis of the MLM (`roberta-large`) outputs\n- `eval_runtime.ipynb`: Analysis of the MDETR inference runtime\n- `pre_MLM.ipynb`: Generation and preprocessing of the negative-caption dataset using `roberta-large`\n\n## Setup\n\n### Python Dependencies\n\nYou'll need Anaconda or Miniconda installed on your machine. You can duplicate our environment with the following command:\n\n```setup\nconda env create -f environment.yaml\n```\n\nThis project relies on code from the official **MDETR** repo, so you'll need to clone it to your machine:\n\n```setup\ngit clone https://github.com/ashkamath/mdetr.git\n```\n\n### Flickr30k Dataset\n\nDownload the **flickr30k** dataset: [link](https://shannon.cs.illinois.edu/DenotationGraph/)\n\n### Script Paths\n\nFinally, update the paths in `eval_flickr.sh` to match your environment:\n\n- `OUTPUT_DIR`: Folder where evaluation results are saved\n- `IMG_DIR`: **flickr30k** image folder\n- `MDETR_GIT_DIR`: Path to the **MDETR** repo cloned above\n\n## Evaluation\n\nThe core model evaluation is run via `eval_flickr.sh`:\n\n```\n./eval_flickr.sh \u003cbatch size\u003e \u003cpretrained model\u003e \u003cgpu type\u003e\n```\n\nThe currently supported pretrained models are:\n- `mdetr_efficientnetB5`\n- `mdetr_efficientnetB3`\n- `mdetr_resnet101`\n\nFor example, to evaluate `mdetr_resnet101` with a batch size of 8 on an RTX8000:\n\n```eval\n./eval_flickr.sh 8 mdetr_resnet101 
rtx8000\n```\n\nAll of the MDETR evaluation was run as Slurm jobs; refer to `sbatch/` for the exact scripts used.\n\n## Results\n\n### MLM Results\n\n![MLM Results](img/mlm-results.png)\n\nIn general, RoBERTa's predictions have low confidence, but more predictions are accepted from lower k. Higher-confidence predictions are more likely to be synonyms and are therefore not accepted.\n\n### MDETR System Evaluation\n\n![MDETR System Evaluation](img/mdetr-system1.png)\n\nMoving from a batch size of 1 to any larger batch size significantly reduced total runtime; however, increasing the batch size further gave minimal additional improvement. Inference time increased roughly linearly with batch size. Comparing the V100 to the RTX8000, the two GPUs performed comparably: there is a slight difference in mean inference time, but the standard deviations overlap too much to support any strong claims.\n\n![MDETR System Evaluation](img/mdetr-system2.png)\n\nDuring MDETR inference we recorded GPU utilisation statistics; the diagram above shows this data stratified by batch size, GPU type, and MDETR variant. As expected, a larger batch size results in better GPU utilisation. Comparing GPUs, the RTX8000 achieves better utilisation overall. Among the MDETR variants, EfficientNetB3 has the worst utilisation, while EfficientNetB5 and ResNet101 are comparable.\n\n### MDETR Model Performance\n\n![MDETR Model Performance](img/mdetr-performance1.png)\n\nAs expected, accuracy on the negative detection task (i.e., MDETR makes no prediction when the object is not in the image) increases linearly with the confidence-score threshold for bounding-box predictions.\n\n![MDETR Model Performance](img/mdetr-performance2.png)\n\nOur primary goal is to test the negative detection task, but we still want the models to identify objects that *are* in the image. We count a prediction as correct if the IoU between a predicted box and the target box is \u003e 0.5. 
As we increase the score cutoff for bounding-box predictions, recall on objects that are present in the image drops quickly. A confidence threshold of ~0.45 appears to balance recall and accuracy on the negative detection task, with both values at ~55%.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasedrhys%2Ftext-od-robustness","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbasedrhys%2Ftext-od-robustness","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbasedrhys%2Ftext-od-robustness/lists"}