https://github.com/findify/flink-protobuf
Protobuf serialization support for Apache Flink
- Host: GitHub
- URL: https://github.com/findify/flink-protobuf
- Owner: findify
- License: apache-2.0
- Created: 2021-05-31T15:17:17.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-06-01T15:21:02.000Z (about 5 years ago)
- Last Synced: 2024-05-02T23:38:22.687Z (about 2 years ago)
- Language: Scala
- Size: 28.3 KB
- Stars: 19
- Watchers: 5
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Protobuf serialization support for Apache Flink
This project is an adapter that connects [Google Protobuf](https://developers.google.com/protocol-buffers) to Flink's
own `TypeInformation`-based [serialization framework](https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html).
This project can be useful if you have:
* [oneof-encoded](https://developers.google.com/protocol-buffers/docs/proto#oneof) protobuf messages,
which Flink's serialization cannot encode efficiently without falling back to Kryo;
* POJO classes whose schema evolution requirements go beyond what Flink supports
[for POJOs and Scala case classes](https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/schema_evolution/);
* Scala case classes that need schema evolution, which Flink does not support out of the box.
## Usage
`flink-protobuf` is published to Maven Central. For SBT, add this snippet to `build.sbt`:
```scala
libraryDependencies += "io.findify" %% "flink-protobuf" % "0.2"
```
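The Scala example expects `Foo` to be a class generated by ScalaPB. What follows is a minimal sketch of one common setup. It assumes the `sbt-protoc` plugin, as described in ScalaPB's installation guide. The version numbers are illustrative assumptions and are not taken from `flink-protobuf`, so align the ScalaPB version with the `scalapb-runtime` that `flink-protobuf` depends on:

```scala
// project/plugins.sbt (versions are illustrative assumptions)
addSbtPlugin("com.thesamet" % "sbt-protoc" % "1.0.6")
libraryDependencies += "com.thesamet.scalapb" %% "compilerplugin" % "0.11.11"

// build.sbt: generate Scala case classes from the .proto files in src/main/protobuf
Compile / PB.targets := Seq(
  scalapb.gen() -> (Compile / sourceManaged).value / "scalapb"
)
```

With this setup, `.proto` files placed under `src/main/protobuf` are compiled into Scala case classes and companion objects during `sbt compile`.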
Then, given the following message format (`required` is a proto2-only keyword):
```proto
syntax = "proto2";
message Foo {
required int32 value = 1;
}
```
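You can build a `TypeInformation` for ScalaPB-generated classes like this:
```scala
import io.findify.flinkpb.FlinkProtobuf
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
implicit val ti: TypeInformation[Foo] = FlinkProtobuf.generateScala(Foo)
val result = env.fromCollection(List(Foo(1), Foo(2), Foo(3)))
```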
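For Java, it looks a bit different:
```java
import io.findify.flinkpb.FlinkProtobuf;
import io.findify.flinkprotobuf.java.Tests;
import java.util.List;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Tests.Foo is the protobuf-java class generated from the message above
TypeInformation<Tests.Foo> ti = FlinkProtobuf.generateJava(Tests.Foo.class, Tests.Foo.getDefaultInstance());
List<Tests.Foo> result = env.fromCollection(List.of(Tests.Foo.newBuilder().setValue(1).build()), ti).executeAndCollect(100);
```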
## Schema evolution
Compared to Flink's schema evolution for POJO classes, `flink-protobuf` lets you do much more:
* rename fields (protobuf identifies fields on the wire by their numeric tag, not by name)
* change field types in wire-compatible ways (for example, an optional field can be made repeated, or an `int32` can be widened to `int64`)
For Scala case classes, Flink has no schema evolution support at all, so with this project you can:
* add, rename, and remove fields
* change field types (within protobuf's compatibility rules)
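The sketch below makes the bullets above concrete with a minimal, stdlib-only model of the protobuf wire format. It is not `flink-protobuf`'s code and not a full protobuf implementation. It assumes a hypothetical `User` message that evolves from V1 to V2 as follows:

* field 1 is renamed and widened from `int32` to `int64`;
* field 2 is removed;
* field 3 is added.

The bytes carry only field numbers and wire types, so a V2 reader can still decode V1 data:

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.collection.mutable.ArrayBuffer

object SchemaEvolutionSketch {
  // Hypothetical message. V1 was:
  //   message User { int32 value = 1; string name = 2; }
  // V2 renames `value` to `amount`, widens it to int64, drops `name`, and adds `score`:
  //   message User { int64 amount = 1; int64 score = 3; }
  final case class UserV2(amount: Long = 0L, score: Long = 0L)

  // Base-128 varint: 7 payload bits per byte, high bit set on every byte but the last.
  def writeVarint(value: Long, out: ArrayBuffer[Byte]): Unit = {
    var v = value
    while ((v & ~0x7FL) != 0L) {
      out += ((v & 0x7F) | 0x80).toByte
      v >>>= 7
    }
    out += v.toByte
  }

  // Returns the decoded value and the position just after it.
  def readVarint(bytes: Array[Byte], start: Int): (Long, Int) = {
    var result = 0L
    var shift = 0
    var pos = start
    var more = true
    while (more) {
      val b = bytes(pos)
      result |= (b & 0x7FL) << shift
      more = (b & 0x80) != 0
      shift += 7
      pos += 1
    }
    (result, pos)
  }

  // Each field is prefixed by key = (fieldNumber << 3) | wireType; the field name is never written.
  def encodeUserV1(value: Int, name: String): Array[Byte] = {
    val out = ArrayBuffer.empty[Byte]
    writeVarint((1L << 3) | 0, out) // field 1, wire type 0 (varint)
    writeVarint(value.toLong, out)  // an int32 is sign-extended to 64 bits on the wire
    val utf8 = name.getBytes(UTF_8)
    writeVarint((2L << 3) | 2, out) // field 2, wire type 2 (length-delimited)
    writeVarint(utf8.length.toLong, out)
    out ++= utf8
    out.toArray
  }

  // Decode with the V2 schema: known field numbers are read, unknown ones are skipped
  // using only their wire type, and absent fields keep their default values.
  def decodeUserV2(bytes: Array[Byte]): UserV2 = {
    var user = UserV2()
    var pos = 0
    while (pos < bytes.length) {
      val (key, afterKey) = readVarint(bytes, pos)
      val fieldNumber = (key >>> 3).toInt
      val wireType = (key & 7).toInt
      (fieldNumber, wireType) match {
        case (1, 0) =>
          val (v, next) = readVarint(bytes, afterKey)
          user = user.copy(amount = v)
          pos = next
        case (3, 0) =>
          val (v, next) = readVarint(bytes, afterKey)
          user = user.copy(score = v)
          pos = next
        case (_, 0) => // unknown varint field: skip its value
          pos = readVarint(bytes, afterKey)._2
        case (_, 2) => // unknown length-delimited field (e.g. the removed `name`): skip its payload
          val (len, next) = readVarint(bytes, afterKey)
          pos = next + len.toInt
        case (_, other) =>
          throw new IllegalArgumentException(s"wire type $other is not handled in this sketch")
      }
    }
    user
  }
}
```

Decoding `encodeUserV1(-5, "alice")` with `decodeUserV2` yields `UserV2(-5, 0)`:

* the renamed field is matched by its number;
* the sign-extended `int32` reads back as the same `int64`;
* the removed string field is skipped;
* the new field falls back to its default.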
## Compatibility
The library is built against Flink 1.13 for Scala 2.12, but should be binary compatible with older Flink versions.
A Scala 2.11 build is not planned, as ScalaPB has already dropped support for it.
## License
Apache License, Version 2.0, January 2004 (http://www.apache.org/licenses/)