{"id":13703886,"url":"https://github.com/Senzing/mapper-icij","last_synced_at":"2025-05-05T09:32:03.284Z","repository":{"id":38333330,"uuid":"198481047","full_name":"Senzing/mapper-icij","owner":"Senzing","description":"Map ICIJ format to Senzing format.","archived":false,"fork":false,"pushed_at":"2025-04-16T17:46:56.000Z","size":331,"stargazers_count":4,"open_issues_count":1,"forks_count":0,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-17T02:28:24.988Z","etag":null,"topics":["mapper","senzing-g2-python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Senzing.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-07-23T17:47:25.000Z","updated_at":"2025-04-16T17:46:59.000Z","dependencies_parsed_at":"2024-11-13T11:32:16.541Z","dependency_job_id":"5b35a62f-66e1-4bf6-95f4-f0c6b19e95ad","html_url":"https://github.com/Senzing/mapper-icij","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Senzing%2Fmapper-icij","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Senzing%2Fmapper-icij/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Senzing%2Fmapper-icij/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Senzing%2Fmapper-icij/manifests","owner_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/owners/Senzing","download_url":"https://codeload.github.com/Senzing/mapper-icij/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252471602,"owners_count":21753216,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mapper","senzing-g2-python"],"created_at":"2024-08-02T21:01:01.300Z","updated_at":"2025-05-05T09:32:02.922Z","avatar_url":"https://github.com/Senzing.png","language":"Python","funding_links":[],"categories":["Mapper"],"sub_categories":[],"readme":"# mapper-icij\n\n## Overview\n\nThe [icij_mapper.py](icij_mapper.py) python script converts the ICIJ Offshore Leaks database to json files ready to load into Senzing. \n\nThis includes the ...\n- Panama Papers\n- Paradise Papers\n- Bahamas Leaks\n- Offshore Leaks\n- Pandora Papers (added in 2020)\n\n*In May 2022, ICIJ added additional records to their database and updated their format again.   This mapper will only work\nwith files dated 05/03/2022 which can be downloaded [here](https://offshoreleaks-data.icij.org/offshoreleaks/csv/full-oldb.20220503.zip)*\n\n***Since the ICIJ data set is static, we have already run this mapper and made the mapped json file available.  You can\ndownload it by clicking here:\n[icij_2022.json.zip](https://public-read-access.s3.amazonaws.com/mapped-data-sets/icij-offshore-leaks/icij_2022.json.zip).\nYou can then unzip it and load it right into Senzing!  
But don't forget to add the configuration first as documented below!***\n\nUsage:\n\n```console\npython icij_mapper.py --help\nusage: icij_mapper.py [-h] [-i INPUT_PATH] [-o OUTPUT_FILE] [-l LOG_FILE] [-a]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -i INPUT_PATH, --input_path INPUT_PATH\n                        path to the downloaded ICIJ csv files\n  -o OUTPUT_FILE, --output_file OUTPUT_FILE\n                        path and file name for the json output\n  -l LOG_FILE, --log_file LOG_FILE\n                        optional statistics filename (json format)\n  -a, --include_address_nodes\n                        include address nodes\n```\n\n## Contents\n\n1. [Prerequisites](#prerequisites)\n2. [Installation](#installation)\n3. [Configuring Senzing](#configuring-senzing)\n4. [Running the mapper](#running-the-mapper)\n5. [Loading into Senzing](#loading-into-senzing)\n\n### Prerequisites\n\n- python 3.6 or higher\n- Senzing API version 2.1 or higher\n- pandas (pip3 install pandas)\n- [Senzing/mapper-base](https://github.com/Senzing/mapper-base)\n\n### Installation\n\nPlace the following files in a directory of your choice ...\n\n- [icij_mapper.py](icij_mapper.py)\n- [icij_config_updates.g2c](icij_config_updates.g2c)\n\n*Note: Since the mapper-base project referenced above is required by this mapper, it is necessary to place them in a common directory structure like so ...*\n\n```Console\n/senzing/mappers/mapper-base\n/senzing/mappers/mapper-icij         \u003c--\n```\n\nYou will also need to set the PYTHONPATH to where the base mapper is as follows ... (assuming the directory structure above)\n\n```Console\nexport PYTHONPATH=$PYTHONPATH:/senzing/mappers/mapper-base\n```\n\n### Configuring Senzing\n\n*Note:* This only needs to be performed one time! 
In fact you may want to add these configuration updates to a master configuration file for all your data sources.\n\nLoading ICIJ data into Senzing only requires registering the data source.  No additional features or attributes are\nrequired.  This configuration is contained in the [icij_config_updates.g2c](icij_config_updates.g2c) file.\nTo apply it, from your Senzing project's python directory type ...\n\n```console\npython3 G2ConfigTool.py \u003cpath-to-file\u003e/icij_config_updates.g2c\n```\n\nThis will step you through the process of adding any data sources, features, attributes and other settings needed to load this data into Senzing.\nAfter each command you will see a status message saying \"success\" or \"already exists\".\nFor instance, if you run the script twice, the second time through they will all say \"already exists\" which is OK.\n\n### Running the mapper\n\nDownload the raw files from: [https://offshoreleaks.icij.org/pages/database](https://offshoreleaks.icij.org/pages/database)\n\n![download page](images/download_page.jpg)\n\nWith the addition of the Pandora Papers in November 2020 and again in May 2022, there is now only 1 zip file\n *currently* named **full-oldb-20220503.zip** containing the files listed below:\n\n- nodes-entities.csv\n- nodes-intermediaries.csv\n- nodes-officers.csv\n- nodes-addresses.csv\n- nodes-others.csv\n- relationships.csv\n\nUnzip the files to a directory of your choice. *(in the example below the csv files were unzipped to /senzing/mappers/mapper-icij/input)*\n\nThe mapper will read all the files and create one output file.  Example usage:\n\n```console\npython3 icij_mapper.py -i /senzing/mappers/mapper-icij/input -o /senzing/mappers/mapper-icij/output/icij_2022.json\n```\n- Add the -l --log_file argument to generate a mapping statistics file\n- Add the -a --include_address_nodes argument to generate the address nodes as well. 
*Please note that addresses from these nodes\nare mapped to their entities regardless of this setting.*\n\n\n### Loading into Senzing\n\nIf you use the G2Loader program to load your data, from the /opt/senzing/g2/python directory ...\n\n```console\npython3 G2Loader.py -f /senzing/mappers/mapper-icij/output/icij_2022.json\n```\n\nThis data set currently contains about 1.9 million records and may take an hour or more to load depending on your hardware.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSenzing%2Fmapper-icij","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSenzing%2Fmapper-icij","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSenzing%2Fmapper-icij/lists"}