{"id":24024758,"url":"https://github.com/cognitedata/cognite-replicator","last_synced_at":"2025-10-11T14:06:22.013Z","repository":{"id":36062066,"uuid":"198810388","full_name":"cognitedata/cognite-replicator","owner":"cognitedata","description":"A package of scripts for replicating data between CDF tenants","archived":false,"fork":false,"pushed_at":"2025-08-23T19:54:20.000Z","size":738,"stargazers_count":7,"open_issues_count":25,"forks_count":5,"subscribers_count":76,"default_branch":"master","last_synced_at":"2025-08-24T07:45:27.534Z","etag":null,"topics":["cdf","python","replicating-data","sa"],"latest_commit_sha":null,"homepage":"https://cognite-cognite-replicator.readthedocs-hosted.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cognitedata.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-07-25T10:29:07.000Z","updated_at":"2024-10-19T08:15:02.000Z","dependencies_parsed_at":"2025-08-15T01:11:06.959Z","dependency_job_id":"67063098-c671-46a7-83e2-32ffae9c79c2","html_url":"https://github.com/cognitedata/cognite-replicator","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/cognitedata/cognite-replicator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognitedata%2Fcognite-replicator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognitedata%2Fcognite-replicator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/cognitedata%2Fcognite-replicator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognitedata%2Fcognite-replicator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cognitedata","download_url":"https://codeload.github.com/cognitedata/cognite-replicator/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cognitedata%2Fcognite-replicator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279007479,"owners_count":26084313,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdf","python","replicating-data","sa"],"created_at":"2025-01-08T15:34:39.992Z","updated_at":"2025-10-11T14:06:21.997Z","avatar_url":"https://github.com/cognitedata.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ca href=\"https://cognite.com/\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/cognitedata/cognite-python-docs/master/img/cognite_logo.png\" alt=\"Cognite logo\" title=\"Cognite\" align=\"right\" height=\"80\" /\u003e\n\u003c/a\u003e\n\n# Cognite Python 
Replicator\n[![build](https://webhooks.dev.cognite.ai/build/buildStatus/icon?job=github-builds/cognite-replicator/master)](https://jenkins.cognite.ai/job/github-builds/job/cognite-replicator/job/master/)\n[![codecov](https://codecov.io/gh/cognitedata/cognite-replicator/branch/master/graph/badge.svg)](https://codecov.io/gh/cognitedata/cognite-replicator)\n[![Documentation Status](https://readthedocs.com/projects/cognite-cognite-replicator/badge/?version=latest)](https://cognite-cognite-replicator.readthedocs-hosted.com/en/latest/)\n[![PyPI version](https://badge.fury.io/py/cognite-replicator.svg)](https://pypi.org/project/cognite-replicator/)\n[![tox](https://img.shields.io/badge/tox-3.6%2B-blue.svg)](https://www.python.org/downloads/release/python-366/)\n![PyPI - Python Version](https://img.shields.io/pypi/pyversions/cognite-replicator)\n[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)\n\nCognite Replicator is a Python package for replicating data across Cognite Data Fusion (CDF) projects. This package is built on top of the Cognite Python SDK.\nThis component is Community content and not officially supported by Cognite. Bugs and changes will be fixed on a best effort basis. Feel free to open issues and pull requests, we will review them as soon as we can. 
\n\nCopyright 2023 Cognite AS\n\n## Prerequisites\nIn order to start using the Replicator, you need:\n- Python3 (\u003e= 3.6)\n- Credentials for both the source and destination projects: \n    - CLIENT_ID (\"Client ID from Azure\")\n    - CLIENT_SECRET (\"Client secret from Azure\", only if using authentication via secret)\n    - CLUSTER (\"Name of CDF cluster\")\n    - TENANT_ID (\"Tenant ID from Azure\")\n    - PROJECT (\"Name of CDF project\")\n\nThis is how you set the client secrets as environment variables on macOS and Linux:\n```bash\n$ export SOURCE_CLIENT_SECRET=\u003cyour source client secret\u003e\n$ export DEST_CLIENT_SECRET=\u003cyour destination client secret\u003e\n```\n\n## Installation\nThe replicator is available on [PyPI](https://pypi.org/project/cognite-replicator/), and can also be run as a Docker container.\n\nTo install it from PyPI, run:\n```bash\npip install cognite-replicator\n```\n\nAlternatively, build and run it as a Docker container. The image is available on [Docker Hub](https://hub.docker.com/r/cognite/cognite-replicator):\n```bash\ndocker build -t cognite-replicator .\n```\n\n## Usage\n\n### 1. Run with a configuration file as a standalone script\n\nCreate a configuration file based on config/default.yml and update the values to match your environment.\nIf no file is specified, the replicator uses config/default.yml.\n\nvia Python\n\n```bash\npython -m cognite.replicator config/filepath.yml\n```\n\nor alternatively via Docker.\nIf you do not have access to a browser, use client secret authentication.\n\n```bash\ndocker run -e SOURCE_CLIENT_SECRET -e DEST_CLIENT_SECRET -v /absolute/path/to/config/config.yml:/config.yml cognite-replicator /config.yml\n```\n\n### 2. 
Set up as a Python library\n#### 2.1 Without configuration file and with interactive login\nThis copies the resource types you select from source to destination and runs with your own credentials, so you need permission to read from the source project and write to the destination project.\n\n```python\nfrom cognite.client.credentials import OAuthInteractive\nfrom cognite.client import CogniteClient, ClientConfig\nfrom cognite.replicator import assets, events, files, time_series, datapoints, sequences, sequence_rows\n\n# SOURCE\nSOURCE_TENANT_ID = \"48d5043c-cf70-4c49-881c-c638f5796997\"\nSOURCE_CLIENT_ID = \"1b90ede3-271e-401b-81a0-a4d52bea3273\"\nSOURCE_PROJECT = \"publicdata\"\nSOURCE_CLUSTER = \"api\"\n\n# DESTINATION\nDEST_TENANT_ID = \"d4febcbc-db24-4823-bffd-92fd05b9c6bc\"\nDEST_CLIENT_ID = \"189e8b95-f1ce-47d2-aa66-4c2fe3567f91\"\nDEST_PROJECT = \"sa-team\"\nDEST_CLUSTER = \"bluefield\"\n\n### Autogenerated variables\nSOURCE_SCOPES = [f\"https://{SOURCE_CLUSTER}.cognitedata.com/.default\"]\nSOURCE_BASE_URL = f\"https://{SOURCE_CLUSTER}.cognitedata.com\"\nSOURCE_AUTHORITY_URL = f\"https://login.microsoftonline.com/{SOURCE_TENANT_ID}\"\nDEST_SCOPES = [f\"https://{DEST_CLUSTER}.cognitedata.com/.default\"]\nDEST_BASE_URL = f\"https://{DEST_CLUSTER}.cognitedata.com\"\nDEST_AUTHORITY_URL = f\"https://login.microsoftonline.com/{DEST_TENANT_ID}\"\n\n# Config\nBATCH_SIZE = 10000  # this is the max size of a batch to be posted\nNUM_THREADS = 10  # this is the max number of threads to be used\nTIMEOUT = 90\nPORT = 53000\n\nSOURCE_CLIENT = CogniteClient(\n    ClientConfig(\n        credentials=OAuthInteractive(\n            authority_url=SOURCE_AUTHORITY_URL,\n            client_id=SOURCE_CLIENT_ID,\n            scopes=SOURCE_SCOPES,\n        ),\n        project=SOURCE_PROJECT,\n        base_url=SOURCE_BASE_URL,\n        client_name=\"cognite-replicator-source\",\n    )\n)\nDEST_CLIENT = CogniteClient(\n    ClientConfig(\n        
credentials=OAuthInteractive(\n            authority_url=DEST_AUTHORITY_URL,\n            client_id=DEST_CLIENT_ID,\n            scopes=DEST_SCOPES,\n        ),\n        project=DEST_PROJECT,\n        base_url=DEST_BASE_URL,\n        client_name=\"cognite-replicator-destination\",\n    )\n)\n\nif __name__ == \"__main__\":  # this is necessary because threading\n\n    #### Uncomment the resources you would like to copy\n    assets.replicate(SOURCE_CLIENT, DEST_CLIENT)\n    #events.replicate(SOURCE_CLIENT, DEST_CLIENT, BATCH_SIZE, NUM_THREADS)\n    #files.replicate(SOURCE_CLIENT, DEST_CLIENT, BATCH_SIZE, NUM_THREADS)\n    #time_series.replicate(SOURCE_CLIENT, DEST_CLIENT, BATCH_SIZE, NUM_THREADS)\n    #datapoints.replicate(SOURCE_CLIENT, DEST_CLIENT)\n    #sequences.replicate(SOURCE_CLIENT, DEST_CLIENT, BATCH_SIZE, NUM_THREADS)\n    #sequence_rows.replicate(SOURCE_CLIENT, DEST_CLIENT, BATCH_SIZE, NUM_THREADS)\n```\n\n#### 2.2 Without configuration file and with client credentials authentication\nThis copies the resource types you select from source to destination and authenticates with a client ID and client secret, so the clients you use need permission to read from the source project and write to the destination project.\n(In the example below, the secrets are read from environment variables.)\n\n```python\nimport os\nfrom cognite.client.credentials import OAuthClientCredentials\nfrom cognite.client import CogniteClient, ClientConfig\nfrom cognite.replicator import assets, events, files, time_series, datapoints, sequences, sequence_rows\n\n# SOURCE\nSOURCE_TENANT_ID = \"48d5043c-cf70-4c49-881c-c638f5796997\"\nSOURCE_CLIENT_ID = \"1b90ede3-271e-401b-81a0-a4d52bea3273\"\nSOURCE_CLIENT_SECRET = os.environ.get(\"SOURCE_CLIENT_SECRET\")\nSOURCE_PROJECT = \"publicdata\"\nSOURCE_CLUSTER = \"api\"\n\n# DESTINATION\nDEST_TENANT_ID = \"d4febcbc-db24-4823-bffd-92fd05b9c6bc\"\nDEST_CLIENT_ID = \"189e8b95-f1ce-47d2-aa66-4c2fe3567f91\"\nDEST_CLIENT_SECRET = 
os.environ.get(\"DEST_CLIENT_SECRET\")\nDEST_PROJECT = \"sa-team\"\nDEST_CLUSTER = \"bluefield\"\n### Autogenerated variables\nSOURCE_SCOPES = [f\"https://{SOURCE_CLUSTER}.cognitedata.com/.default\"]\nSOURCE_BASE_URL = f\"https://{SOURCE_CLUSTER}.cognitedata.com\"\nSOURCE_TOKEN_URL = f\"https://login.microsoftonline.com/{SOURCE_TENANT_ID}/oauth2/v2.0/token\"\nDEST_SCOPES = [f\"https://{DEST_CLUSTER}.cognitedata.com/.default\"]\nDEST_BASE_URL = f\"https://{DEST_CLUSTER}.cognitedata.com\"\nDEST_TOKEN_URL = f\"https://login.microsoftonline.com/{DEST_TENANT_ID}/oauth2/v2.0/token\"\nCOGNITE_CONFIG_FILE = \"config/config.yml\"\n# Config\nBATCH_SIZE = 10000  # this is the max size of a batch to be posted\nNUM_THREADS = 10  # this is the max number of threads to be used\nTIMEOUT = 90\nPORT = 53000\n\nSOURCE_CLIENT = CogniteClient(\n    ClientConfig(\n        credentials=OAuthClientCredentials(\n            token_url=SOURCE_TOKEN_URL,\n            client_id=SOURCE_CLIENT_ID,\n            scopes=SOURCE_SCOPES,\n            client_secret=SOURCE_CLIENT_SECRET,\n        ),\n        project=SOURCE_PROJECT,\n        base_url=SOURCE_BASE_URL,\n        client_name=\"cognite-replicator-source\",\n    )\n)\n\nDEST_CLIENT = CogniteClient(\n    ClientConfig(\n        credentials=OAuthClientCredentials(\n            token_url=DEST_TOKEN_URL,\n            client_id=DEST_CLIENT_ID,\n            scopes=DEST_SCOPES,\n            client_secret=DEST_CLIENT_SECRET,\n        ),\n        project=DEST_PROJECT,\n        base_url=DEST_BASE_URL,\n        client_name=\"cognite-replicator-destination\",\n    )\n)\n\nif __name__ == \"__main__\":  # this is necessary because threading\n\n    #### Uncomment the resources you would like to copy\n    assets.replicate(SOURCE_CLIENT, DEST_CLIENT)\n    #events.replicate(SOURCE_CLIENT, DEST_CLIENT, BATCH_SIZE, NUM_THREADS)\n    #files.replicate(SOURCE_CLIENT, DEST_CLIENT, BATCH_SIZE, NUM_THREADS)\n    #time_series.replicate(SOURCE_CLIENT, DEST_CLIENT, 
BATCH_SIZE, NUM_THREADS)\n    #datapoints.replicate(SOURCE_CLIENT, DEST_CLIENT)\n    #sequences.replicate(SOURCE_CLIENT, DEST_CLIENT, BATCH_SIZE, NUM_THREADS)\n    #sequence_rows.replicate(SOURCE_CLIENT, DEST_CLIENT, BATCH_SIZE, NUM_THREADS)\n```\n\n#### 2.3 Alternatively, pass some elements of the configuration file as variables\n\nRefer to the [default configuration file](config/default.yml) or the [example configuration file](config/example.yml) for all keys in the configuration file.\nStart with the client creation from either step 2.1 or 2.2.\n\n```python\n\nif __name__ == \"__main__\":  # this is necessary because threading\n    config = {\n        \"timeseries_external_ids\": [\"pi:160670\", \"pi:160623\"],\n        \"datapoints_start\": \"100d-ago\",\n        \"datapoints_end\": \"now\",\n    }\n    time_series.replicate(\n        client_src=SOURCE_CLIENT,\n        client_dst=DEST_CLIENT,\n        batch_size=BATCH_SIZE,\n        num_threads=NUM_THREADS,\n        config=config,\n    )\n    datapoints.replicate(\n        client_src=SOURCE_CLIENT,\n        client_dst=DEST_CLIENT,\n        external_ids=config.get(\"timeseries_external_ids\"),\n        start=config.get(\"datapoints_start\"),\n        end=config.get(\"datapoints_end\"),\n    )\n```\n\n### 3. With configuration file\nThe configuration file determines what is copied.\nThere is no need to create the clients yourself; they are created from the configuration file.\n\n```python\nimport os\nfrom cognite.replicator.__main__ import main\n\nif __name__ == \"__main__\":  # this is necessary because threading\n    COGNITE_CONFIG_FILE = \"config/config.yml\"  # path to the configuration file\n    os.environ[\"COGNITE_CONFIG_FILE\"] = COGNITE_CONFIG_FILE\n    main()\n```\n\n### 4. 
Local testing\nThis runs a local version of the replicator. The configuration file determines what is copied.\nThere is no need to create the clients yourself; they are created from the configuration file.\n\n```python\nimport os\nimport sys\nsys.path.append(\"cognite-replicator\") ### Path of the local version of the replicator. Importing from outside of the current working directory requires sys.path, which is a list of all directories Python searches through.\nfrom cognite.replicator.__main__ import main  # imported after the local path has been added\n\nif __name__ == \"__main__\":  # this is necessary because threading\n    COGNITE_CONFIG_FILE = \"config/config.yml\"  # path to the configuration file\n    os.environ[\"COGNITE_CONFIG_FILE\"] = COGNITE_CONFIG_FILE\n    main()\n    sys.path.remove(\"cognite-replicator\")  ## Python will also search these paths for future projects unless they are removed. Removes unwanted search paths\n```\n\n## Development\n\nChange the version in the following files:\n- [_version.py](cognite/replicator/_version.py#L1 \"Version in code\")\n- [cd.yml](.github/workflows/cd.yml#L30 \"Continuous deployment yaml file\")\n- [pyproject.toml](pyproject.toml#L3 \"Poetry configuration\")\n\n\n## Changelog\nWondering about upcoming or previous changes? Take a look at the [CHANGELOG](https://github.com/cognitedata/cognite-replicator/blob/master/CHANGELOG.md).\n\n## Contributing\nWant to contribute? Check out [CONTRIBUTING](https://github.com/cognitedata/cognite-replicator/blob/master/CONTRIBUTING.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcognitedata%2Fcognite-replicator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcognitedata%2Fcognite-replicator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcognitedata%2Fcognite-replicator/lists"}