https://github.com/spotify/gcs-tools
GCS support for avro-tools, parquet-tools and protobuf
- Host: GitHub
- URL: https://github.com/spotify/gcs-tools
- Owner: spotify
- License: apache-2.0
- Created: 2016-09-18T22:21:46.000Z (about 9 years ago)
- Default Branch: main
- Last Pushed: 2025-01-30T15:39:59.000Z (8 months ago)
- Last Synced: 2025-04-05T17:13:40.575Z (6 months ago)
- Topics: avro, gcp, gcs, gcs-connector, google-storage, parquet, protobuf
- Language: Scala
- Homepage:
- Size: 192 KB
- Stars: 74
- Watchers: 16
- Forks: 15
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
README
GCS Tools
=========

## Raison d'être:
A lightweight wrapper that adds [Google Cloud Storage](https://cloud.google.com/storage/) (GCS) support to common Hadoop tools, including [avro-tools](https://mvnrepository.com/artifact/org.apache.avro/avro-tools), [parquet-cli](https://mvnrepository.com/artifact/org.apache.parquet/parquet-cli), proto-tools for [Scio](https://github.com/spotify/scio)'s Protobuf-in-Avro files, and magnolify-tools for [Magnolify](https://github.com/spotify/magnolify) code generation, so that they can be used from regular workstations or laptops, outside of a [Google Compute Engine](https://cloud.google.com/compute/) (GCE) instance.
It uses your existing OAuth2 credentials and allows authentication via a browser.
## Usage:
You can install the tools via our [Homebrew tap](https://github.com/spotify/homebrew-public) on Mac.
```
brew tap spotify/public
brew install gcs-avro-tools gcs-parquet-cli gcs-proto-tools gcs-magnolify-tools
avro-tools tojson
parquet-cli cat
proto-tools tojson
magnolify-tools
```

Or build them yourself.
```
sbt assembly
java -jar avro-tools/target/scala-2.13/avro-tools-*.jar tojson
java -jar parquet-cli/target/scala-2.13/parquet-cli-*.jar cat
java -jar proto-tools/target/scala-2.13/proto-tools-*.jar cat
java -jar magnolify-tools/target/scala-2.13/magnolify-tools-*.jar
```

## How it works:
To make avro-tools and parquet-cli work with GCS we need:
- [GCS connector](https://github.com/GoogleCloudPlatform/bigdata-interop) and its dependencies
- [GCS connector configuration](//github.com/spotify/gcs-tools/blob/master/shared/src/main/resources/core-site.xml)

The GCS connector won't pick up your local gcloud configuration; it expects its settings in [core-site.xml](https://github.com/spotify/gcs-tools/blob/master/shared/src/main/resources/core-site.xml) instead.
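
For orientation, here is a minimal sketch of what a `core-site.xml` for the GCS connector can look like. It is an illustration, not a copy of the file in this repository, which may contain different or additional settings. The two `impl` properties are the connector's documented way of registering the `gs://` scheme with Hadoop. The project ID value is a hypothetical placeholder, and authentication settings are left out on purpose because their property names differ between connector versions.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Register the GCS connector as the handler for gs:// paths -->
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  </property>
  <!-- Hypothetical placeholder: replace with your own GCP project ID -->
  <property>
    <name>fs.gs.project.id</name>
    <value>my-example-project</value>
  </property>
  <!-- Authentication properties are omitted here; they depend on the
       connector version, so check the linked core-site.xml for the real ones -->
</configuration>
```

Hadoop loads `core-site.xml` from the classpath, so a file like this presumably takes effect once it is bundled into the assembled jars.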