{"id":19195908,"url":"https://github.com/mercari/dataflowtemplate","last_synced_at":"2025-04-06T14:12:15.693Z","repository":{"id":38325749,"uuid":"318452458","full_name":"mercari/DataflowTemplate","owner":"mercari","description":"Mercari Dataflow Template","archived":false,"fork":false,"pushed_at":"2025-03-23T02:20:49.000Z","size":3158,"stargazers_count":72,"open_issues_count":5,"forks_count":22,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-05T20:49:47.042Z","etag":null,"topics":["apache-beam","cloud-dataflow","google-cloud"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mercari.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-12-04T08:30:37.000Z","updated_at":"2025-03-11T07:27:49.000Z","dependencies_parsed_at":"2024-01-16T02:47:45.299Z","dependency_job_id":"c9192e13-5907-4613-ab1a-cd0567b826fd","html_url":"https://github.com/mercari/DataflowTemplate","commit_stats":{"total_commits":366,"total_committers":8,"mean_commits":45.75,"dds":0.03825136612021862,"last_synced_commit":"00498768f4ada259f22baa698118a4287bdff62b"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mercari%2FDataflowTemplate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mercari%2FDataflowTemplate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mercari%2FDataflowTemplate/releases","manifests_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mercari%2FDataflowTemplate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mercari","download_url":"https://codeload.github.com/mercari/DataflowTemplate/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247492557,"owners_count":20947545,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-beam","cloud-dataflow","google-cloud"],"created_at":"2024-11-09T12:11:55.939Z","updated_at":"2025-04-06T14:12:15.673Z","avatar_url":"https://github.com/mercari.png","language":"Java","readme":"# Mercari Dataflow Template\n\nThe Mercari Dataflow Template enables you to run various pipelines without writing programs by simply defining a configuration file.\n\nMercari Dataflow Template is implemented as a [FlexTemplate](https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates) for [Cloud Dataflow](https://cloud.google.com/dataflow). 
Pipelines are assembled from the defined configuration file and can be executed as Cloud Dataflow jobs.\n\nSee the [documentation](docs/README.md) for usage details.\n\n## Usage Example\n\nWrite the following JSON file and upload it to GCS (this example assumes it is uploaded to gs://example/config.json).\n\nThis configuration writes the results of a BigQuery query to the specified Spanner table.\n\n```json\n{\n  \"sources\": [\n    {\n      \"name\": \"bigquery\",\n      \"module\": \"bigquery\",\n      \"parameters\": {\n        \"query\": \"SELECT * FROM `myproject.mydataset.mytable`\"\n      }\n    }\n  ],\n  \"sinks\": [\n    {\n      \"name\": \"spanner\",\n      \"module\": \"spanner\",\n      \"input\": \"bigquery\",\n      \"parameters\": {\n        \"projectId\": \"myproject\",\n        \"instanceId\": \"myinstance\",\n        \"databaseId\": \"mydatabase\",\n        \"table\": \"mytable\"\n      }\n    }\n  ]\n}\n```\n\nAssuming you have deployed the Mercari Dataflow Template to gs://example/template, run the following command.\n\n```sh\ngcloud dataflow flex-template run bigquery-to-spanner \\\n  --template-file-gcs-location=gs://example/template \\\n  --parameters=config=gs://example/config.json\n```\n\nThis starts the Dataflow job, and you can check its execution status in the Google Cloud console.\n\n\u003cimg src=\"https://raw.githubusercontent.com/mercari/DataflowTemplate/master/docs/images/bigquery-to-spanner.png\"\u003e\n\n\n## Deploy Template\n\nThe Mercari Dataflow Template is used as a FlexTemplate, so it should be deployed by following the FlexTemplate creation steps.\n\n### Requirements\n\n* Java 17\n* [Maven 3](https://maven.apache.org/index.html)\n* [gcloud command-line tool](https://cloud.google.com/sdk/gcloud)\n\n### Push Template Container Image to Artifact Registry\n\nThe first step is to build the source code and register it as a container image in the [Cloud Artifact 
Registry](https://cloud.google.com/artifact-registry).\n\nTo push container images to Artifact Registry with Docker commands, you first need to configure Docker authentication for the registry host of each repository region you use, for example:\n\n```sh\ngcloud auth configure-docker us-central1-docker.pkg.dev,asia-northeast1-docker.pkg.dev\n```\n\nThe following command builds a FlexTemplate container image from the source code and pushes it to Artifact Registry.\n\n```sh\nmvn clean package -DskipTests -Dimage={region}-docker.pkg.dev/{deploy_project}/{template_repo_name}/cloud:latest\n```\n\n### Upload template file\n\nThe next step is to generate a template file that starts a Dataflow job from the container image and upload it to GCS.\n\nThe following command generates the template file and uploads it to the specified GCS path.\n\n```sh\ngcloud dataflow flex-template build gs://{path/to/template_file} \\\n  --image \"{region}-docker.pkg.dev/{deploy_project}/{template_repo_name}/cloud:latest\" \\\n  --sdk-language \"JAVA\"\n```\n\n## Run Dataflow job from template file\n\nYou can run a Dataflow job from the template file using either the gcloud command or the REST API.\n\n* gcloud command\n\nUpload the config file to GCS, then run the template, specifying the GCS path of the uploaded config file.\n\n```sh\ngsutil cp config.json gs://{path/to/config.json}\n\ngcloud dataflow flex-template run {job_name} \\\n  --template-file-gcs-location=gs://{path/to/template_file} \\\n  --parameters=config=gs://{path/to/config.json}\n```\n\n* REST API\n\nYou can also run the template via the [REST API](https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.locations.flexTemplates/launch).\n\n```sh\nPROJECT_ID=[PROJECT_ID]\nREGION=[REGION]\nCONFIG=\"$(cat examples/xxx.json)\"\n\ncurl -X POST -H \"Content-Type: application/json\"  -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \"https://dataflow.googleapis.com/v1b3/projects/${PROJECT_ID}/locations/${REGION}/flexTemplates:launch\" -d \"{\n  'launchParameter': {\n    'jobName': 'myJobName',\n    
'containerSpecGcsPath': 'gs://{path/to/template_file}',\n    'parameters': {\n      'config': '$(echo \"$CONFIG\")',\n      'stagingLocation': 'gs://{path/to/staging}'\n    },\n    'environment': {\n      'tempLocation': 'gs://{path/to/temp}'\n    }\n  }\n}\"\n```\n\n(The `tempLocation` and `stagingLocation` options are optional. If they are not specified, a bucket named `dataflow-staging-{region}-{project_no}` is created automatically and used.)\n\n### Run Template in streaming mode\n\nTo run the template in streaming mode, specify `streaming=true` in the parameters.\n\n```sh\ngcloud dataflow flex-template run {job_name} \\\n  --template-file-gcs-location=gs://{path/to/template_file} \\\n  --parameters=config=gs://{path/to/config.json} \\\n  --parameters=streaming=true\n```\n\n## Deploy Docker image for local pipeline\n\nYou can also run pipelines locally, which is useful for quickly processing small amounts of data.\n\nFirst, build the container image for local execution (and optionally push it to Artifact Registry).\n\n```sh\n# Generate MDT jar file.\nmvn clean package -DskipTests -Dimage=\"{region}-docker.pkg.dev/{deploy_project}/{template_repo_name}/cloud\"\n\n# Create Docker image for local run\ndocker build --tag=\"{region}-docker.pkg.dev/{deploy_project}/{template_repo_name}/local\" .\n\n# To push the image to Artifact Registry (GAR),\n# configure Docker authentication for the registry host, then push\ngcloud auth configure-docker {region}-docker.pkg.dev\ndocker push {region}-docker.pkg.dev/{deploy_project}/{template_repo_name}/local\n```\n\n## Run Pipeline locally\n\nFor local execution, run the following command to set up Application Default Credentials so that the pipeline can authenticate to Google Cloud.\n\n```sh\ngcloud auth application-default login\n```\n\nThe following is an example command for local execution.\nThe gcloud credentials directory and the config file directory are mounted so that the container can access them.\nThe other arguments (such as `project` and `config`) are the same as for normal execution.\n\nTo run in streaming mode, specify `streaming=true` as an argument, just as in normal 
execution.\n\n### macOS\n\n```sh\ndocker run \\\n  -v ~/.config/gcloud:/mnt/gcloud:ro \\\n  -v /{your_work_dir}:/mnt/config:ro \\\n  --rm {region}-docker.pkg.dev/{deploy_project}/{template_repo_name}/local \\\n  --project={project} \\\n  --config=/mnt/config/{my_config}.json\n```\n\n### Windows\n\n```bat\ndocker run ^\n  -v C:\\Users\\{YourUserName}\\AppData\\Roaming\\gcloud:/mnt/gcloud:ro ^\n  -v C:\\Users\\{YourWorkingDirPath}\\:/mnt/config:ro ^\n  --rm {region}-docker.pkg.dev/{deploy_project}/{template_repo_name}/local ^\n  --project={project} ^\n  --config=/mnt/config/{MyConfig}.json\n```\n\n* Note:\n  * If you use the BigQuery module locally, you must specify the `tempLocation` argument.\n  * If the pipeline accesses an emulator running on the local machine (such as the Cloud Spanner emulator), the `--net=host` option is required.\n\n## Committers\n\n * Yoichi Nagai ([@orfeon](https://github.com/orfeon))\n\n## Contribution\n\nPlease read the CLA carefully before submitting your contribution to Mercari.\nUnder any circumstances, by submitting your contribution, you are deemed to accept and agree to be bound by the terms and conditions of the CLA.\n\nhttps://www.mercari.com/cla/\n\n## License\n\nCopyright 2024 Mercari, Inc.\n\nLicensed under the MIT License.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmercari%2Fdataflowtemplate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmercari%2Fdataflowtemplate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmercari%2Fdataflowtemplate/lists"}