{"id":19725983,"url":"https://github.com/mesmacosta/datacatalog-util","last_synced_at":"2025-04-30T00:32:04.809Z","repository":{"id":46083197,"uuid":"254129931","full_name":"mesmacosta/datacatalog-util","owner":"mesmacosta","description":"A Python package to centralize some Google Cloud Data Catalog scripts, this repo contains commands like bulk CSV operations that help leverage Data Catalog features.","archived":false,"fork":false,"pushed_at":"2022-12-26T21:01:42.000Z","size":398,"stargazers_count":21,"open_issues_count":3,"forks_count":7,"subscribers_count":5,"default_branch":"master","last_synced_at":"2024-10-06T09:11:18.540Z","etag":null,"topics":["bigdata","bulk","cloud","csv","csv-export","csv-import","csv-importer","data-catalog-tagtemplate","data-governance","data-management","datacatalog","datacatalog-util","fileset-entries","filesets","gcp","gcp-storage","google-datacatalog","python"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mesmacosta.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-08T15:40:11.000Z","updated_at":"2024-07-05T18:58:11.000Z","dependencies_parsed_at":"2023-01-31T01:45:49.939Z","dependency_job_id":null,"html_url":"https://github.com/mesmacosta/datacatalog-util","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mesmacosta%2Fdatacatalog-util","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mesmacosta%2Fdatacatalog-util/tags","releases_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mesmacosta%2Fdatacatalog-util/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mesmacosta%2Fdatacatalog-util/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mesmacosta","download_url":"https://codeload.github.com/mesmacosta/datacatalog-util/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224192127,"owners_count":17271187,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bigdata","bulk","cloud","csv","csv-export","csv-import","csv-importer","data-catalog-tagtemplate","data-governance","data-management","datacatalog","datacatalog-util","fileset-entries","filesets","gcp","gcp-storage","google-datacatalog","python"],"created_at":"2024-11-11T23:33:59.880Z","updated_at":"2024-11-11T23:33:59.976Z","avatar_url":"https://github.com/mesmacosta.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Datacatalog Util [![Tweet](https://img.shields.io/twitter/url/http/shields.io.svg?style=social)](https://twitter.com/intent/tweet?text=Data%20Catalog%20util%20scripts%20\u0026url=https://github.com/mesmacosta/datacatalog-util\u0026via=github\u0026hashtags=datacatalog,python,bulk,bigdata)\n\n\n[![CircleCI][1]][2] [![PyPi][7]][8] [![License][9]][9] [![Issues][10]][11]\n\nA Python package to manage Google Cloud Data Catalog helper commands and scripts.\n\n**Disclaimer: This is not an officially supported Google product.**\n\n## Commands List\n\n| Group            | Command    
                    | Description                                             | Documentation Link | Code Repo |\n| ---              | ---                            | ---                                                     | ---                | ---       |\n| `tags`           |**create**                      | Load Tags from CSV file.                                | [GO][12]           | [GO][18]  |\n| `tags`           |**delete**                      | Delete Tags from CSV file.                              | [GO][31]           | [GO][26]  |\n| `tags`           |**export**                      | Export Tags to CSV file.                                | [GO][13]           | [GO][26]  |\n| `tag-templates`  |**create**                      | Load Templates from CSV file.                           | [GO][14]           | [GO][24]  |\n| `tag-templates`  |**delete**                      | Delete Templates from CSV file.                         | [GO][15]           | [GO][24]  |\n| `tag-templates`  |**export**                      | Export Templates to CSV file.                           | [GO][16]           | [GO][25]  |\n| `filesets`       |**create**                      | Create GCS filesets from CSV file.                      | [GO][29]           | [GO][28]  |\n| `filesets`       |**enrich**                      | Enrich GCS filesets with Tags.                          | [GO][20]           | [GO][19]  |\n| `filesets`       |**clean-up-templates-and-tags** | Cleans up the Fileset Template and their Tags.          | [GO][21]           | [GO][19]  |\n| `filesets`       |**delete**                      | Delete GCS filesets from CSV file.                      | [GO][30]           | [GO][28]  |\n| `filesets`       |**export**                      | Export Filesets to CSV file.                            | [GO][34]           | [GO][33]  |\n| `object-storage` |**create-entries**              | Create Entries for each Object Storage File.            
| [GO][36]           | [GO][35]  |\n| `object-storage` |**delete-entries**              | Delete Entries that belong to the Object Storage Files. | [GO][37]           | [GO][35]  |\n\n\n-----\n\n## Execute Tutorial in Cloud Shell\n[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/mesmacosta/datacatalog-util\u0026tutorial=tutorials/TUTORIAL.md)\n\n\n\u003c!--\n  ⚠️ DO NOT UPDATE THE TABLE OF CONTENTS MANUALLY ️️⚠️\n  run `npx markdown-toc -i README.md`.\n\n  Please stick to 80-character line wraps as much as you can.\n--\u003e\n\n## Table of Contents\n\n\u003c!-- toc --\u003e\n\n- [0. Executing in Cloud Shell from PyPi](#0-executing-in-cloud-shell-from-pypi)\n- [1. Environment setup for local build](#1-environment-setup-for-local-build)\n  * [1.1. Python + virtualenv](#11-python--virtualenv)\n    + [1.1.1. Install Python 3.6+](#111-install-python-36)\n    + [1.1.2. Get the source code](#112-get-the-source-code)\n    + [1.1.3. Create and activate an isolated Python environment](#113-create-and-activate-an-isolated-python-environment)\n    + [1.1.4. Install the package](#114-install-the-package)\n  * [1.2. Docker](#12-docker)\n  * [1.3. Auth credentials](#13-auth-credentials)\n    + [1.3.1. Create a service account and grant it below roles](#131-create-a-service-account-and-grant-it-below-roles)\n    + [1.3.2. Download a JSON key and save it as](#132-download-a-json-key-and-save-it-as)\n    + [1.3.3. Set the environment variables](#133-set-the-environment-variables)\n- [2. Load Tags from CSV file](#2-load-tags-from-csv-file)\n  * [2.1. Create a CSV file representing the Tags to be created](#21-create-a-csv-file-representing-the-tags-to-be-created)\n    + [2.1.1 Execute Tutorial in Cloud Shell](#211-execute-tutorial-in-cloud-shell)\n  * [2.2. Run the datacatalog-util script - Create the Tags](#22-run-the-datacatalog-util-script---create-the-tags)\n  * [2.3. 
Run the datacatalog-util script - Delete the Tags](#23-run-the-datacatalog-util-script---delete-the-tags)\n- [3. Export Tags to CSV file](#3-export-tags-to-csv-file)\n  * [3.1. A list of CSV files, each representing one Template will be created.](#31-a-list-of-csv-files-each-representing-one-template-will-be-created)\n    + [3.1.1 Execute Tutorial in Cloud Shell](#311-execute-tutorial-in-cloud-shell)\n  * [3.2. Run tags export](#32-run-tags-export)\n  * [3.3 Run tags export filtering Tag Templates](#33-run-tags-export-filtering-tag-templates)\n- [4. Load Templates from CSV file](#4-load-templates-from-csv-file)\n  * [4.1. Create a CSV file representing the Templates to be created](#41-create-a-csv-file-representing-the-templates-to-be-created)\n    + [4.1.1 Execute Tutorial in Cloud Shell](#411-execute-tutorial-in-cloud-shell)\n  * [4.2. Run the datacatalog-util script - Create the Tag Templates](#42-run-the-datacatalog-util-script---create-the-tag-templates)\n  * [4.3. Run the datacatalog-util script - Delete the Tag Templates](#43-run-the-datacatalog-util-script---delete-the-tag-templates)\n- [5. Export Templates to CSV file](#5-export-templates-to-csv-file)\n  * [5.1. A CSV file representing the Templates will be created](#51-a-csv-file-representing-the-templates-will-be-created)\n    + [5.1.1 Execute Tutorial in Cloud Shell](#511-execute-tutorial-in-cloud-shell)\n  * [5.2. Run the datacatalog-util script](#52-run-the-datacatalog-util-script)\n- [6. Filesets Commands](#6-filesets-commands)\n  * [6.1. Create a CSV file representing the Entry Groups and Entries to be created](#61-create-a-csv-file-representing-the-entry-groups-and-entries-to-be-created)\n    + [6.1.1 Execute Tutorial in Cloud Shell](#611-execute-tutorial-in-cloud-shell)\n  * [6.2. 
Create the Filesets Entry Groups and Entries](#62-create-the-filesets-entry-groups-and-entries)\n    + [6.2.1 Create the Filesets Entry Groups and Entries - with DataFlow SQL types validation](#621-create-the-filesets-entry-groups-and-entries---with-dataflow-sql-types-validation)\n  * [6.3. Enrich GCS Filesets with Tags](#63-enrich-gcs-filesets-with-tags)\n    + [6.3.1 Enrich all fileset entries using Tag Template from a different Project (Good way to reuse the same Template)](#631-enrich-all-fileset-entries-using-tag-template-from-a-different-project-good-way-to-reuse-the-same-template)\n    + [6.3.2 Execute Fileset Enricher Tutorial in Cloud Shell](#632-execute-fileset-enricher-tutorial-in-cloud-shell)\n  * [6.4. clean up template and tags](#64-clean-up-template-and-tags)\n  * [6.5. Delete the Filesets Entry Groups and Entries](#65-delete-the-filesets-entry-groups-and-entries)\n- [7. Export Filesets to CSV file](#7-export-filesets-to-csv-file)\n  * [7.1. A CSV file representing the Filesets will be created](#71-a-csv-file-representing-the-filesets-will-be-created)\n    + [7.1.1 Execute Tutorial in Cloud Shell](#711-execute-tutorial-in-cloud-shell)\n  * [7.2. Run the datacatalog-util script](#72-run-the-datacatalog-util-script)\n- [8. DataCatalog Object Storage commands](#8-datacatalog-object-storage-commands)\n  * [8.1 Execute Tutorial in Cloud Shell](#81-execute-tutorial-in-cloud-shell)\n  * [8.2. Create DataCatalog entries based on object storage files](#82-create-datacatalog-entries-based-on-object-storage-files)\n  * [8.3. Delete object storage entries on entry group](#83-delete-object-storage-entries-on-entry-group)\n- [9. Data Catalog Templates Examples](#9-data-catalog-templates-examples)\n\n\u003c!-- tocstop --\u003e\n\n-----\n\n## 0. Executing in Cloud Shell from PyPi\nIf you want to execute this script directly in cloud shell, download it from PyPi:\n\n````bash\n# Set your SERVICE ACCOUNT, for instructions go to 1.3. 
Auth credentials\n# This name is just a suggestion, feel free to name it following your naming conventions\nexport GOOGLE_APPLICATION_CREDENTIALS=~/credentials/datacatalog-util-sa.json\n\n# Install datacatalog-util\npip3 install --upgrade datacatalog-util --user\n\n# Add to your PATH\nexport PATH=~/.local/bin:$PATH\n\n# Look for available commands\ndatacatalog-util --help\n````\n\n## 1. Environment setup for local build\n\n### 1.1. Python + virtualenv\n\nUsing [virtualenv][3] is optional, but strongly recommended unless you use [Docker](#12-docker).\n\n#### 1.1.1. Install Python 3.6+\n\n#### 1.1.2. Get the source code\n```bash\ngit clone https://github.com/mesmacosta/datacatalog-util\ncd ./datacatalog-util\n```\n\n_All paths starting with `./` in the next steps are relative to the `datacatalog-util`\nfolder._\n\n#### 1.1.3. Create and activate an isolated Python environment\n\n```bash\npip install --upgrade virtualenv\npython3 -m virtualenv --python python3 env\nsource ./env/bin/activate\n```\n\n#### 1.1.4. Install the package\n\n```bash\npip install --upgrade .\n```\n\n### 1.2. Docker\n\nDocker may be used as an alternative to run the script. In this case, please disregard the\n[Virtualenv](#11-python--virtualenv) setup instructions.\n\n### 1.3. Auth credentials\n\n#### 1.3.1. Create a service account and grant it below roles\n\n- Data Catalog Admin\n- Storage Admin\n\n#### 1.3.2. Download a JSON key and save it as\nThis name is just a suggestion, feel free to name it following your naming conventions\n- `./credentials/datacatalog-util-sa.json`\n\n#### 1.3.3. Set the environment variables\n\n_This step may be skipped if you're using [Docker](#12-docker)._\n\n```bash\nexport GOOGLE_APPLICATION_CREDENTIALS=~/credentials/datacatalog-util-sa.json\n```\n\n## 2. Load Tags from CSV file\n\n### 2.1. Create a CSV file representing the Tags to be created\n\nTags are composed of as many lines as required to represent all of their fields. 
The columns are\ndescribed as follows:\n\n| Column              | Description                                            | Mandatory |\n| ---                 | ---                                                    | ---       |\n| **linked_resource** | Full name of the asset the Entry refers to.            | Y         |\n| **template_name**   | Resource name of the Tag Template for the Tag.         | Y         |\n| **column**          | Attach Tags to a column belonging to the Entry schema. | N         |\n| **field_id**        | Id of the Tag field.                                   | Y         |\n| **field_value**     | Value of the Tag field.                                | Y         |\n\n*TIPS* \n- [sample-input/create-tags][4] for reference;\n- [Data Catalog Sample Tags][5] (Google Sheets) may help to create/export the CSV.\n\n#### 2.1.1 Execute Tutorial in Cloud Shell\n\n[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/mesmacosta/datacatalog-util\u0026tutorial=tutorials/tags/TUTORIAL.LOAD.md)\n\n### 2.2. Run the datacatalog-util script - Create the Tags\n\n- Python + virtualenv\n\n```bash\ndatacatalog-util tags create --csv-file CSV_FILE_PATH\n```\n\n- Docker\n\n```bash\ndocker build --rm --tag datacatalog-util .\ndocker run --rm --tty \\\n  --volume CREDENTIALS_FILE_FOLDER:/credentials --volume CSV_FILE_FOLDER:/data \\\n  datacatalog-util create-tags --csv-file /data/CSV_FILE_NAME\n```\n\n### 2.3. Run the datacatalog-util script - Delete the Tags\n\n- Python + virtualenv\n\n```bash\ndatacatalog-util tags delete --csv-file CSV_FILE_PATH\n```\n\n## 3. Export Tags to CSV file\n\n### 3.1. 
A list of CSV files, each representing one Template will be created.\nA summary file with statistics about each template will also be created in the same directory.\n\nThe columns for the summary file are described as follows:\n\n| Column                         | Description                                              | \n| ---                            | ---                                                      | \n| **template_name**              | Resource name of the Tag Template for the Tag.           | \n| **tags_count**                 | Number of tags found from the template.                  | \n| **tagged_entries_count**       | Number of tagged entries with the template.              | \n| **tagged_columns_count**       | Number of tagged columns with the template.              | \n| **tag_string_fields_count**    | Number of used String fields on tags of the template.    | \n| **tag_bool_fields_count**      | Number of used Bool fields on tags of the template.      | \n| **tag_double_fields_count**    | Number of used Double fields on tags of the template.    | \n| **tag_timestamp_fields_count** | Number of used Timestamp fields on tags of the template. | \n| **tag_enum_fields_count**      | Number of used Enum fields on tags of the template.      | \n\nThe columns for each template file are described as follows:\n\n| Column                     | Description                                            | \n| ---                        | ---                                                    |\n| **relative_resource_name** | Full resource name of the asset the Entry refers to.   |\n| **linked_resource**        | Full name of the asset the Entry refers to.            |\n| **template_name**          | Resource name of the Tag Template for the Tag.         | \n| **tag_name**               | Resource name of the Tag.                              |\n| **column**                 | Attach Tags to a column belonging to the Entry schema. 
|\n| **field_id**               | Id of the Tag field.                                   |\n| **field_type**             | Type of the Tag field.                                 | \n| **field_value**            | Value of the Tag field.                                | \n\n#### 3.1.1 Execute Tutorial in Cloud Shell\n\n[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/mesmacosta/datacatalog-util\u0026tutorial=tutorials/tags/TUTORIAL.EXPORT.md)\n\n### 3.2. Run tags export\n\n- Python + virtualenv\n\n```bash\ndatacatalog-util tags export --project-ids my-project --dir-path DIR_PATH\n```\n\n### 3.3 Run tags export filtering Tag Templates\n\n- Python + virtualenv\n\n```bash\ndatacatalog-util tags export --project-ids my-project \\\n--dir-path DIR_PATH \\\n--tag-templates-names projects/my-project/locations/us-central1/tagTemplates/my-template,\\\nprojects/my-project/locations/us-central1/tagTemplates/my-template-2 \n\n```\n\n## 4. Load Templates from CSV file\n\n### 4.1. Create a CSV file representing the Templates to be created\n\nTemplates are composed of as many lines as required to represent all of their fields. The columns are\ndescribed as follows:\n\n| Column                 | Description                                    | Mandatory |\n| ---                    | ---                                            | ---       |\n| **template_name**      | Resource name of the Tag Template for the Tag. | Y         |\n| **display_name**       | Display name of the Tag Template.              | Y         |\n| **field_id**           | Id of the Tag Template field.                  | Y         |\n| **field_display_name** | Display name of the Tag Template field.        | Y         |\n| **field_type**         | Type of the Tag Template field.                | Y         |\n| **enum_values**        | Values for the Enum field.                     
| N         |\n\n#### 4.1.1 Execute Tutorial in Cloud Shell\n\n[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/mesmacosta/datacatalog-util\u0026tutorial=tutorials/tag-templates/TUTORIAL.LOAD.md)\n\n\n### 4.2. Run the datacatalog-util script - Create the Tag Templates\n\n- Python + virtualenv\n\n```bash\ndatacatalog-util tag-templates create --csv-file CSV_FILE_PATH\n```\n\n### 4.3. Run the datacatalog-util script - Delete the Tag Templates\n\n- Python + virtualenv\n\n```bash\ndatacatalog-util tag-templates delete --csv-file CSV_FILE_PATH\n```\n\n*TIPS* \n- [sample-input/create-tag-templates][6] for reference;\n\n## 5. Export Templates to CSV file\n\n### 5.1. A CSV file representing the Templates will be created\n\nTemplates are composed of as many lines as required to represent all of their fields. The columns are\ndescribed as follows:\n\n| Column                 | Description                                    | \n| ---                    | ---                                            | \n| **template_name**      | Resource name of the Tag Template for the Tag. | \n| **display_name**       | Display name of the Tag Template.              | \n| **field_id**           | Id of the Tag Template field.                  | \n| **field_display_name** | Display name of the Tag Template field.        | \n| **field_type**         | Type of the Tag Template field.                | \n| **enum_values**        | Values for the Enum field.                     | \n\n#### 5.1.1 Execute Tutorial in Cloud Shell\n\n[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/mesmacosta/datacatalog-util\u0026tutorial=tutorials/tag-templates/TUTORIAL.EXPORT.md)\n\n\n### 5.2. 
Run the datacatalog-util script\n\n- Python + virtualenv\n\n```bash\ndatacatalog-util tag-templates export --project-ids my-project --file-path CSV_FILE_PATH\n```\n\n## 6. Filesets Commands\n\n### 6.1. Create a CSV file representing the Entry Groups and Entries to be created\n\nFilesets are composed of as many lines as required to represent all of their fields. The columns are\ndescribed as follows:\n\n| Column                        | Description               | Mandatory |\n| ---                           | ---                       | ---       |\n| **entry_group_name**          | Entry Group Name.         | Y         |\n| **entry_group_display_name**  | Entry Group Display Name. | N         |\n| **entry_group_description**   | Entry Group Description.  | N         |\n| **entry_id**                  | Entry ID.                 | Y         |\n| **entry_display_name**        | Entry Display Name.       | Y         |\n| **entry_description**         | Entry Description.        | N         |\n| **entry_file_patterns**       | Entry File Patterns.      | Y         |\n| **schema_column_name**        | Schema column name.       | N         |\n| **schema_column_type**        | Schema column type.       | N         |\n| **schema_column_description** | Schema column description.| N         |\n| **schema_column_mode**        | Schema column mode.       | N         |\n\nPlease note that `schema_column_type` is an open string field and accepts any value. If you want \nto use your fileset with Dataflow SQL, follow the data types in the [official docs][38].\n\n#### 6.1.1 Execute Tutorial in Cloud Shell\n\n[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/mesmacosta/datacatalog-util\u0026tutorial=tutorials/filesets/TUTORIAL.LOAD.md)\n\n\n### 6.2. 
Create the Filesets Entry Groups and Entries\n\n- Python + virtualenv\n\n```bash\ndatacatalog-util filesets create --csv-file CSV_FILE_PATH\n```\n\n*TIPS* \n- [sample-input/create-filesets][32] for reference;\n\n- If you want to create filesets without schema:\n[sample-input/create-filesets/fileset-entry-opt-1-all-metadata-no-schema.csv][32] for reference;\n\n#### 6.2.1 Create the Filesets Entry Groups and Entries - with DataFlow SQL types validation\n\n- Python + virtualenv\n\n```bash\ndatacatalog-util filesets create --csv-file CSV_FILE_PATH --validate-dataflow-sql-types\n```\n\n### 6.3. Enrich GCS Filesets with Tags\nUsers are able to choose the Tag fields from the list provided at [Tags][23]\n\n```bash\ndatacatalog-util filesets enrich --project-id my-project \n```\n\n#### 6.3.1 Enrich all fileset entries using Tag Template from a different Project (Good way to reuse the same Template)\n\nIf you are using a different Project, make sure the Service Account has the following permissions on that Project or that Template:\n* Data Catalog TagTemplate Creator\n* Data Catalog TagTemplate User\n\n```bash\ndatacatalog-util filesets \\\n  --project-id my_project \\\n  enrich --tag-template-name projects/my_different_project/locations/us-central1/tagTemplates/fileset_enricher_findings\n```\n\n#### 6.3.2 Execute Fileset Enricher Tutorial in Cloud Shell\n\n[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/mesmacosta/datacatalog-util\u0026tutorial=tutorials/filesets/TUTORIAL.ENRICH.md)\n\n\n### 6.4. clean up template and tags\nCleans up the Template and Tags from the Fileset Entries, running the main command will recreate those.\n\n```bash\ndatacatalog-util filesets clean-up-templates-and-tags --project-id my-project \n```\n\n### 6.5. 
Delete the Filesets Entry Groups and Entries\n\n- Python + virtualenv\n\n```bash\ndatacatalog-util filesets delete --csv-file CSV_FILE_PATH\n```\n\n## 7. Export Filesets to CSV file\n\n### 7.1. A CSV file representing the Filesets will be created\n\nFilesets are composed of as many lines as required to represent all of their fields. The columns are\ndescribed as follows:\n\n| Column                        | Description               | Mandatory |\n| ---                           | ---                       | ---       |\n| **entry_group_name**          | Entry Group Name.         | Y         |\n| **entry_group_display_name**  | Entry Group Display Name. | Y         |\n| **entry_group_description**   | Entry Group Description.  | Y         |\n| **entry_id**                  | Entry ID.                 | Y         |\n| **entry_display_name**        | Entry Display Name.       | Y         |\n| **entry_description**         | Entry Description.        | Y         |\n| **entry_file_patterns**       | Entry File Patterns.      | Y         |\n| **schema_column_name**        | Schema column name.       | N         |\n| **schema_column_type**        | Schema column type.       | N         |\n| **schema_column_description** | Schema column description.| N         |\n| **schema_column_mode**        | Schema column mode.       | N         |\n\n#### 7.1.1 Execute Tutorial in Cloud Shell\n\n[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/mesmacosta/datacatalog-util\u0026tutorial=tutorials/filesets/TUTORIAL.EXPORT.md)\n\n\n### 7.2. Run the datacatalog-util script\n\n- Python + virtualenv\n\n```bash\ndatacatalog-util filesets export --project-ids my-project --file-path CSV_FILE_PATH\n```\n\n## 8. 
DataCatalog Object Storage commands\n\n### 8.1 Execute Tutorial in Cloud Shell\n\n[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/mesmacosta/datacatalog-util\u0026tutorial=tutorials/object-storage/TUTORIAL.LOAD.md)\n\n\n### 8.2. Create DataCatalog entries based on object storage files\n\n```bash\ndatacatalog-util \\\n  object-storage sync-entries --type cloud_storage \\\n  --project-id my_project \\\n  --entry-group-name projects/my_project/locations/us-central1/entryGroups/my_entry_group \\\n  --bucket-prefix my_bucket\n```\n\n### 8.3. Delete object storage entries on entry group\n\n```bash\ndatacatalog-util \\\n  object-storage delete-entries --type cloud_storage \\\n  --project-id my_project \\\n  --entry-group-name projects/my_project/locations/us-central1/entryGroups/my_entry_group\n```\n\n## 9. Data Catalog Templates Examples\n\n[templates_examples.md](docs/templates_examples.md)\n\n[1]: https://circleci.com/gh/mesmacosta/datacatalog-util.svg?style=svg\n[2]: https://circleci.com/gh/mesmacosta/datacatalog-util\n[3]: https://virtualenv.pypa.io/en/latest/\n[4]: https://github.com/mesmacosta/datacatalog-util/tree/master/sample-input/create-tags\n[5]: https://docs.google.com/spreadsheets/d/1bqeAXjLHUq0bydRZj9YBhdlDtuu863nwirx8t4EP_CQ\n[6]: https://github.com/mesmacosta/datacatalog-util/tree/master/sample-input/create-tag-templates\n[7]: https://img.shields.io/pypi/v/datacatalog-util.svg?force_cache=true\n[8]: https://pypi.org/project/datacatalog-util/\n[9]: https://img.shields.io/github/license/mesmacosta/datacatalog-util.svg\n[10]: https://img.shields.io/github/issues/mesmacosta/datacatalog-util.svg\n[11]: https://github.com/mesmacosta/datacatalog-util/issues\n[12]: https://github.com/mesmacosta/datacatalog-util#2-load-tags-from-csv-file\n[13]: https://github.com/mesmacosta/datacatalog-util#3-export-tags-to-csv-file\n[14]: 
https://github.com/mesmacosta/datacatalog-util#4-load-templates-from-csv-file\n[15]: https://github.com/mesmacosta/datacatalog-util#43-run-the-datacatalog-util-script---delete-the-tag-templates\n[16]: https://github.com/mesmacosta/datacatalog-util#5-export-templates-to-csv-file\n[17]: https://github.com/mesmacosta/datacatalog-util\n[18]: https://github.com/ricardolsmendes/datacatalog-tag-manager\n[19]: https://github.com/mesmacosta/datacatalog-fileset-enricher\n[20]: https://github.com/mesmacosta/datacatalog-util#63-enrich-gcs-filesets-with-tags\n[21]: https://github.com/mesmacosta/datacatalog-util#64-clean-up-template-and-tags\n[23]: https://github.com/mesmacosta/datacatalog-fileset-enricher#1-created-tags\n[24]: https://github.com/mesmacosta/datacatalog-tag-template-processor\n[25]: https://github.com/mesmacosta/datacatalog-tag-template-exporter\n[26]: https://github.com/mesmacosta/datacatalog-tag-exporter\n[27]: https://github.com/mesmacosta/datacatalog-util#62-create-fileset-enricher-tag-template-in-a-different-project\n[28]: https://github.com/mesmacosta/datacatalog-fileset-processor\n[29]: https://github.com/mesmacosta/datacatalog-util#61-create-a-csv-file-representing-the-entry-groups-and-entries-to-be-created\n[30]: https://github.com/mesmacosta/datacatalog-util#65-delete-the-filesets-entry-groups-and-entries\n[31]: https://github.com/mesmacosta/datacatalog-util#23-run-the-datacatalog-util-script---delete-the-tags\n[32]: https://github.com/mesmacosta/datacatalog-util/tree/master/sample-input/create-filesets\n[33]: https://github.com/mesmacosta/datacatalog-fileset-exporter\n[34]: https://github.com/mesmacosta/datacatalog-util#7-export-filesets-to-csv-file\n[35]: https://github.com/mesmacosta/datacatalog-object-storage-processor\n[36]: https://github.com/mesmacosta/datacatalog-util#82-create-datacatalog-entries-based-on-object-storage-files\n[37]: https://github.com/mesmacosta/datacatalog-util#83-delete-object-storage-entries-on-entry-group\n[38]: 
https://cloud.google.com/dataflow/docs/reference/sql/data-types\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmesmacosta%2Fdatacatalog-util","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmesmacosta%2Fdatacatalog-util","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmesmacosta%2Fdatacatalog-util/lists"}