{"id":26707180,"url":"https://github.com/cyrilvallez/image-manipulation-detection","last_synced_at":"2025-04-13T15:35:20.036Z","repository":{"id":69954151,"uuid":"459900860","full_name":"Cyrilvallez/Image-manipulation-detection","owner":"Cyrilvallez","description":"Benchmarking library for image manipulation detection.","archived":false,"fork":false,"pushed_at":"2023-03-29T18:02:04.000Z","size":142029,"stargazers_count":10,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-27T06:32:10.037Z","etag":null,"topics":["benchmarking","computer-vision","deep-learning","image-fingerprint","perceptual-similarity"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Cyrilvallez.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-16T07:42:45.000Z","updated_at":"2023-08-25T13:10:50.000Z","dependencies_parsed_at":"2025-03-27T06:38:08.031Z","dependency_job_id":null,"html_url":"https://github.com/Cyrilvallez/Image-manipulation-detection","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cyrilvallez%2FImage-manipulation-detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cyrilvallez%2FImage-manipulation-detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Cyrilvallez%2FImage-manipulation-detection/releases","manifests_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/Cyrilvallez%2FImage-manipulation-detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Cyrilvallez","download_url":"https://codeload.github.com/Cyrilvallez/Image-manipulation-detection/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248737005,"owners_count":21153689,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmarking","computer-vision","deep-learning","image-fingerprint","perceptual-similarity"],"created_at":"2025-03-27T06:27:59.060Z","updated_at":"2025-04-13T15:35:20.028Z","avatar_url":"https://github.com/Cyrilvallez.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# News\n\nHere is the link to our [paper](https://arxiv.org/abs/2206.00282) reviewing image manipulation detection and image reuse detection techniques. If you use this repo, please consider citing us!\n\n# Image manipulation detection\n\nA library for benchmarking image manipulation detection. This supports 3 classes of algorithms :\n\n- Perceptual hashing methods (fast and simple methods designed for image forensics). The following algorithms are implemented in `hashing/imagehash.py` (taken and modified from [here](https://github.com/JohannesBuchner/imagehash)):\n    - Average Hash\n    - Perceptual hash\n    - Difference hash\n    - Wavelet hash\n    - Crop resistant hash\n    - Color hash\n    - Histogram hash\n\n\n- Feature extractors and descriptors (designed for object/scene retrieval). 
The following algorithms are supported in `hashing/featurehash.py` :\n    - SIFT\n    - ORB\n    - FAST + LATCH\n    - FAST + DAISY\n\n\n- Neural networks (deep CNNs) whose last-layer features have been shown to provide high-level descriptors of the image (regardless of the specific task the network was designed for, e.g. classification). The following architectures are supported in `hashing/neuralhash.py` (note that each network was pretrained on ImageNet, either for classification or by contrastive self-supervised learning):\n    - Inception v3 (classification)\n    - EfficientNet B7 (classification)\n    - ResNets with different depth and width multipliers (classification)\n    - SimCLR ResNets (contrastive learning). Link to [paper](https://arxiv.org/abs/2002.05709) and [github](https://github.com/google-research/simclr).\n\nThe specific goal here is to detect crude near-duplicate image manipulations rather than to perform object or scene retrieval.\n\n# Pre-trained SimCLR models \n\nThe pre-trained SimCLR models are not available in this repository due to their large size. To download them, please navigate to the path where you cloned the repo and run the following commands from your terminal :\n\n```sh\ncd path_to_repo/hashing/SimCLRv1\npython3 download.py \n```\n\nThis will download the files containing the model definitions to the current folder and convert them to PyTorch. The folder `tf_checkpoints` contains the TensorFlow definition of the models (directly downloaded from the [github of the authors](https://github.com/google-research/simclr)), and can be safely erased if you wish to save some disk space. \n\nThe exact same procedure will download the models for SimCLRv2 : \n\n```sh\ncd path_to_repo/hashing/SimCLRv2\npython3 download.py \n```\n\nBy default, this will only download one model. To download the others, please have a look at the `--model` argument. 
If unsure what is accepted, please have a look at the help message :\n\n```sh\npython3 download.py -h\n```\n\n# Usage\n\nThe basic usage for performing an experiment is \n\n```sh\npython3 main.py result_folder\n```\n\nThe digest from the experiment will then be saved into `Results/result_folder`. Details are given below.\n\nThis library was created to benchmark all the different methods presented above. The easiest way to do this is to choose a dataset, randomly split it into 2 parts (experimental and control groups), and sample a given number of images in both groups on which you can perform artificial attacks defined in `generator/generate_attacks.py`. The scripts `create_groups.py` and `create_attacks.py` perform those tasks, and save the images with the correct name format for later matching.\n\nThen, given a database of images, an experimental group of images that are manipulations of some images in the database (all attacks on the images sampled from the experimental group) and a control group containing images not present in the database (all attacks on the images sampled from the control group), datasets can be declared in the following way (here with the BSDS500 dataset as an example) :\n\n```python\nimport hashing \nfrom helpers import utils\n\npath_database = 'Datasets/BSDS500/Experimental/'\npath_experimental = 'Datasets/BSDS500/Experimental_attacks/'\npath_control = 'Datasets/BSDS500/Control_attacks/'\n\npositive_dataset = hashing.create_dataset(path_experimental, existing_attacks=True)\nnegative_dataset = hashing.create_dataset(path_control, existing_attacks=True)\n```\n\nAdditionally, if one wants to perform attacks at experiment time, without having to save them to disk (the experiment will take more time but this will save storage space), it can be done as\n\n```python\npath_dataset = 'Datasets/...'\n\ndataset = hashing.create_dataset(path_dataset, fraction=0.3, existing_attacks=False)\n```\n\nwhere `fraction` is the fraction of the dataset on which attacks will be 
performed (give 1 to attack every image in the dataset).\n\nThen declare the methods and algorithms you wish to use, along with thresholds for the matching logic, e.g. :\n\n```python\nimport numpy as np\n\nalgos = [\n        hashing.ClassicalAlgorithm('Phash', hash_size=8),\n        hashing.FeatureAlgorithm('ORB', n_features=30),\n        hashing.NeuralAlgorithm('SimCLR v1 ResNet50 2x', device='cuda', distance='Jensen-Shannon')\n        ]\n\nthresholds = [\n    np.linspace(0, 0.4, 20),\n    np.linspace(0, 0.3, 20),\n    np.linspace(0.3, 0.8, 20),\n    ]\n```\n\nA list of all valid algorithm names can be found in `hashing/general_hash.py`, or equivalently by accessing the variable `ADMISSIBLE_ALGORITHMS` : \n\n```python\nimport hashing\nprint(hashing.ADMISSIBLE_ALGORITHMS)\n```\n\nValid arguments for an algorithm can be found by looking at the docstrings of each of the three classes `hashing.ClassicalAlgorithm` (corresponding to perceptual hashing methods), `hashing.FeatureAlgorithm` (keypoint-related or *feature* extractor methods), and `hashing.NeuralAlgorithm` (neural network based methods).\n\nFinally, perform the benchmark and save the results :\n\n```python\nsave_folder = utils.parse_input()\n\ndigest = hashing.total_hashing(algos, thresholds, path_database, positive_dataset, negative_dataset, general_batch_size=64)\n\nutils.save_digest(digest, save_folder)\n```\n\nAll this is contained in `main.py`. \n\nThe final digest is composed of 6 files : `general.json` with general metrics for the whole experiment, `attacks.json` containing the metrics for each type of attack, `images_pos.json` and `images_neg.json` containing, respectively, the number of correct and incorrect detections for each image in the database, and `match_time.json` and `db_time.json` containing, respectively, the time (in seconds) for the matching phase and the database creation phase.\n\n# Figure generation\n\nTo process and create figures from the digest, one can look into `process.py`. 
Figure generation is contained in `helpers/create_plot.py`. Note that by default this will require a LaTeX installation on the machine running the process. This can be disabled in `helpers/configs_plot.py`.\n\n# Datasets\n\nWe personally used 3 datasets that can be found online, and for which we performed the splitting. They are the [BSDS500 dataset](https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html), [ImageNet validation set (ILSVRC2012)](https://www.image-net.org/) and the [Kaggle memes dataset](https://www.kaggle.com/datasets/gmorinan/most-viewed-memes-templates-of-2018). For the Kaggle memes dataset, one then needs to run `data_retrieval/kaggle_splitter.py` to extract templates and correctly annotate the memes.\n\n# Computational setup\n\nFor neural methods, use of a GPU is almost essential for computational efficiency. Other classes of methods do not rely on it, and their computations are performed exclusively on CPU.\n\n# Results\n\nThe repository contains the folder `Results` containing digests from our own experiments. Each experiment contains a file `Experiment.yml` briefly summarizing the parameters for the experiment. You are free to look at it and perform the same benchmarks if you wish to verify the results.\n\nAdditionally, people may use the data from the digests of our experiments and just recreate figures, using the experiment name along with the functions in `helpers/create_plot.py`. Examples are provided in `process.py`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyrilvallez%2Fimage-manipulation-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcyrilvallez%2Fimage-manipulation-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcyrilvallez%2Fimage-manipulation-detection/lists"}