Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mesmacosta/datacatalog-fileset-exporter
A Python package to manage Google Cloud Data Catalog Fileset export scripts.
https://github.com/mesmacosta/datacatalog-fileset-exporter
bulk csv datacatalog docker export-filesets metadata-management python
Last synced: 27 days ago
JSON representation
A Python package to manage Google Cloud Data Catalog Fileset export scripts.
- Host: GitHub
- URL: https://github.com/mesmacosta/datacatalog-fileset-exporter
- Owner: mesmacosta
- License: mit
- Created: 2020-04-29T18:00:09.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2022-12-26T21:01:38.000Z (about 2 years ago)
- Last Synced: 2024-09-28T09:19:39.965Z (4 months ago)
- Topics: bulk, csv, datacatalog, docker, export-filesets, metadata-management, python
- Language: Python
- Homepage:
- Size: 73.2 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- Changelog: HISTORY.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# Datacatalog Fileset Exporter
[![CircleCI][1]][2] [![PyPi][7]][8] [![License][9]][9] [![Issues][10]][11]
A Python package to manage Google Cloud Data Catalog Fileset export scripts.
**Disclaimer: This is not an officially supported Google product.**
## Table of Contents
- [Executing in Cloud Shell](#executing-in-cloud-shell)
- [1. Environment setup](#1-environment-setup)
* [1.1. Python + virtualenv](#11-python--virtualenv)
+ [1.1.1. Install Python 3.6+](#111-install-python-36)
+ [1.1.2. Get the source code](#112-get-the-source-code)
+ [1.1.3. Create and activate an isolated Python environment](#113-create-and-activate-an-isolated-python-environment)
+ [1.1.4. Install the package](#114-install-the-package)
* [1.2. Docker](#12-docker)
* [1.3. Auth credentials](#13-auth-credentials)
+ [1.3.1. Create a service account and grant it below roles](#131-create-a-service-account-and-grant-it-below-roles)
+ [1.3.2. Download a JSON key and save it as](#132-download-a-json-key-and-save-it-as)
+ [1.3.3. Set the environment variables](#133-set-the-environment-variables)
- [2. Export Filesets to CSV file](#2-export-filesets-to-csv-file)
* [2.1. A CSV file representing the Filesets will be created](#21-a-csv-file-representing-the-filesets-will-be-created)
* [2.2. Run the datacatalog-fileset-exporter script](#22-run-the-datacatalog-fileset-exporter-script)-----
## Executing in Cloud Shell
````bash
# Set your SERVICE ACCOUNT, for instructions go to 1.3. Auth credentials
# This name is just a suggestion, feel free to name it following your naming conventions
export GOOGLE_APPLICATION_CREDENTIALS=~/datacatalog-fileset-exporter-sa.json# Install datacatalog-fileset-exporter
pip3 install datacatalog-fileset-exporter --user# Add to your PATH
export PATH=~/.local/bin:$PATH# Look for available commands
datacatalog-fileset-exporter --help
````[![Open in Cloud Shell](http://gstatic.com/cloudssh/images/open-btn.svg)](https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/mesmacosta/datacatalog-fileset-exporter&tutorial=TUTORIAL.md)
## 1. Environment setup
### 1.1. Python + virtualenv
Using [virtualenv][3] is optional, but strongly recommended unless you use [Docker](#12-docker).
#### 1.1.1. Install Python 3.6+
#### 1.1.2. Get the source code
```bash
git clone https://github.com/mesmacosta/datacatalog-fileset-exporter
cd ./datacatalog-fileset-exporter
```_All paths starting with `./` in the next steps are relative to the `datacatalog-fileset-exporter`
folder._#### 1.1.3. Create and activate an isolated Python environment
```bash
pip install --upgrade virtualenv
python3 -m virtualenv --python python3 env
source ./env/bin/activate
```#### 1.1.4. Install the package
```bash
pip install --upgrade .
```### 1.2. Docker
Docker may be used as an alternative to run the script. In this case, please disregard the
[Virtualenv](#11-python--virtualenv) setup instructions.### 1.3. Auth credentials
#### 1.3.1. Create a service account and grant it below roles
- Data Catalog Admin
#### 1.3.2. Download a JSON key and save it as
This name is just a suggestion, feel free to name it following your naming conventions
- `./credentials/datacatalog-fileset-exporter-sa.json`#### 1.3.3. Set the environment variables
_This step may be skipped if you're using [Docker](#12-docker)._
```bash
export GOOGLE_APPLICATION_CREDENTIALS=~/credentials/datacatalog-fileset-exporter-sa.json
```## 2. Export Filesets to CSV file
### 2.1. A CSV file representing the Filesets will be created
Filesets are composed of as many lines as required to represent all of their fields. The columns are
described as follows:| Column | Description | Mandatory |
| --- | --- | --- |
| **entry_group_name** | Entry Group Name. | Y |
| **entry_group_display_name** | Entry Group Display Name. | Y |
| **entry_group_description** | Entry Group Description. | Y |
| **entry_id** | Entry ID. | Y |
| **entry_display_name** | Entry Display Name. | Y |
| **entry_description** | Entry Description. | Y |
| **entry_file_patterns** | Entry File Patterns. | Y |
| **schema_column_name** | Schema column name. | N |
| **schema_column_type** | Schema column type. | N |
| **schema_column_description** | Schema column description.| N |
| **schema_column_mode** | Schema column mode. | N |### 2.2. Run the datacatalog-fileset-exporter script
- Python + virtualenv
```bash
datacatalog-fileset-exporter filesets export --project-ids my-project --file-path CSV_FILE_PATH
```[1]: https://circleci.com/gh/mesmacosta/datacatalog-fileset-exporter.svg?style=svg
[2]: https://circleci.com/gh/mesmacosta/datacatalog-fileset-exporter
[3]: https://virtualenv.pypa.io/en/latest/
[7]: https://img.shields.io/pypi/v/datacatalog-fileset-exporter.svg?force_cache=true
[8]: https://pypi.org/project/datacatalog-fileset-exporter/
[9]: https://img.shields.io/github/license/mesmacosta/datacatalog-fileset-exporter.svg
[10]: https://img.shields.io/github/issues/mesmacosta/datacatalog-fileset-exporter.svg
[11]: https://github.com/mesmacosta/datacatalog-fileset-exporter/issues