https://github.com/banditopazzo/avro-akka-stream
avro akka stream source and sink
- Host: GitHub
- URL: https://github.com/banditopazzo/avro-akka-stream
- Owner: banditopazzo
- License: MIT
- Created: 2019-06-30T13:53:10.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2019-07-02T14:46:12.000Z (over 6 years ago)
- Last Synced: 2025-01-28T01:34:16.753Z (9 months ago)
- Topics: akka-streams, avro, avro-schema, hdfs, java-io, streams
- Language: Scala
- Size: 7.81 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# avro-akka-stream
A simple Akka Streams connector for handling Avro files.
Schema/class generation and serialization/deserialization are done through [avro4s](https://github.com/sksamuel/avro4s).
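For example, avro4s can derive the Avro schema for a case class at compile time; a minimal sketch, assuming avro4s 3.x+ and its automatic derivation:

```scala
import com.sksamuel.avro4s.AvroSchema

case class Person(name: String, age: Int)

// Derives the Avro record schema from the case class definition
val schema: org.apache.avro.Schema = AvroSchema[Person]
println(schema.toString(true)) // pretty-printed JSON schema
```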
With the current implementation, it's possible to read from and write to:
* local filesystem
* HDFS
* InputStream/OutputStream

## Akka Streams connectors
### Source
```scala
import akka.NotUsed
import akka.stream.scaladsl.Source
import org.apache.hadoop.fs.FileSystem
import io.github.banditopazzo.akka.avro.AvroSource

// Define a case class
case class Person(name: String, age: Int)

val fs: FileSystem = ??? // Get the Hadoop file system

// Read only the top-level files in the folder that have the ".avro" extension
val source: Source[Person, NotUsed] =
  AvroSource.fromHDFS[Person](
    fs,
    "/path/to/folder",
    filter = _.getName.endsWith(".avro"),
    recursive = false
  )
```
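To actually run the source, attach any downstream stage and materialize the stream. A minimal sketch, assuming Akka 2.6+, where the implicit `ActorSystem` also provides the `Materializer` (on Akka 2.5 you would pass an `ActorMaterializer` explicitly):

```scala
import akka.Done
import akka.actor.ActorSystem

import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("avro-example")

// Print every Person emitted by the source; the Future completes
// when the stream terminates
val done: Future[Done] = source.runForeach(println)
```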
### Sink
```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.Materializer
import akka.stream.scaladsl.Source
import org.apache.hadoop.fs.FileSystem
import io.github.banditopazzo.akka.avro.AvroSink

import scala.concurrent.duration._

// Define a case class
case class Person(name: String, age: Int)

implicit val system: ActorSystem = ??? // Get your actor system
implicit val materializer: Materializer = ???
val fs: FileSystem = ??? // Get the Hadoop file system
val source: Source[Person, NotUsed] = ??? // Get the example source

// Aggregate elements and write them to an HDFS folder
source
  .groupedWithin(3000, 3.minutes)
  .runWith(AvroSink.writeManyToHDFS(fs, "/path/to/folder"))
```
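Note that `groupedWithin(3000, 3.minutes)` emits a batch as soon as 3000 elements have accumulated or 3 minutes have elapsed, whichever comes first, so the sink receives `Seq[Person]` batches rather than single elements; presumably each batch ends up in its own Avro file, given the `writeMany` naming.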
## Basic API
The Akka Streams connectors are built on top of basic functions. These are also simple to use, and there are a few input/output options. Essentially, you can extend them to any source/destination that supports InputStream/OutputStream (see the avro4s sketch at the end of this README).

See the example of reading from and writing to a local file:
### Read
```scala
import io.github.banditopazzo.avro.AvroReader

// Define a case class
case class Person(name: String, age: Int)

// Read data
val data: Iterator[Person] = AvroReader.readFromFileLocal[Person]("/path/to/file")
```
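If needed, the plain iterator can feed back into Akka Streams via the standard `Source.fromIterator` constructor; a small sketch continuing from the snippet above:

```scala
import akka.NotUsed
import akka.stream.scaladsl.Source

// Wrap the iterator in an Akka Streams source; the function is invoked on
// each materialization, so a second run would need a fresh iterator
val personSource: Source[Person, NotUsed] =
  Source.fromIterator(() => data)
```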
### Write
```scala
import io.github.banditopazzo.avro.AvroWriter

// Define a case class
case class Person(name: String, age: Int)

// Create some elements
val p1 = Person("Bob", 20)
val p2 = Person("Alice", 20)
val list = List(p1, p2)

// Write a single element
AvroWriter.writeOneToLocal("/path/to/file", p1)

// Write multiple elements (assuming a writeManyToLocal counterpart to writeManyToHDFS)
AvroWriter.writeManyToLocal("/path/to/file", list)
```
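For arbitrary InputStream/OutputStream sources and destinations, the same round trip can be sketched directly with avro4s, which this library uses for serialization. A minimal sketch, assuming avro4s 4.x and its data-format stream builders (these are avro4s APIs, not part of this repo):

```scala
import com.sksamuel.avro4s.{AvroInputStream, AvroOutputStream, AvroSchema}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}

case class Person(name: String, age: Int)

val schema = AvroSchema[Person]

// Write records to any OutputStream (here an in-memory buffer)
val bytesOut = new ByteArrayOutputStream()
val output = AvroOutputStream.data[Person].to(bytesOut).build()
output.write(List(Person("Bob", 20), Person("Alice", 20)))
output.close()

// Read them back from any InputStream
val bytesIn = new ByteArrayInputStream(bytesOut.toByteArray)
val input = AvroInputStream.data[Person].from(bytesIn).build(schema)
val people: List[Person] = input.iterator.toList
input.close()
```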