https://github.com/castle/avro-filter
Tool for selecting records from Avro files using a filter
- Host: GitHub
- URL: https://github.com/castle/avro-filter
- Owner: castle
- License: mit
- Created: 2016-09-02T06:13:51.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2016-09-07T16:31:26.000Z (almost 10 years ago)
- Last Synced: 2025-06-23T16:40:11.420Z (12 months ago)
- Language: Scala
- Homepage:
- Size: 4.88 KB
- Stars: 4
- Watchers: 12
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## avro-filter
Reads Avro files and writes all records that match a filter expression to a new Avro file.
Filters are specified as comma-separated `key=value` pairs using the `-f` parameter, e.g.:
```bash
java -jar avro-filter.jar -o out.avro -f user_id=1,status=failed transactions.avro
```
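To make the shape of a filter expression concrete, here is a minimal sketch of how a `k1=v1,k2=v2` string could be parsed and applied. It is not the tool's actual implementation: `FilterSketch`, `parseFilter`, and `matches` are hypothetical names, records are modeled as plain `Map[String, Any]` values rather than Avro records, and two behaviors are assumptions (all pairs must match, and values are compared by their string form).

```scala
// Hypothetical sketch: illustrates the filter expression format only,
// not how avro-filter is actually implemented.
object FilterSketch {

  // Parse "k1=v1,k2=v2" into key/value pairs.
  // A term without "=" or with an empty key is rejected.
  def parseFilter(expr: String): Map[String, String] =
    expr.split(",").map(_.trim).filter(_.nonEmpty).map { term =>
      term.split("=", 2) match {
        case Array(k, v) if k.nonEmpty => k -> v
        case _ =>
          throw new IllegalArgumentException(s"Invalid filter term: $term")
      }
    }.toMap

  // Assumption: a record matches when every filter key is present and
  // the string form of its value equals the expected value (AND semantics).
  def matches(record: Map[String, Any], filter: Map[String, String]): Boolean =
    filter.forall { case (k, v) => record.get(k).exists(_.toString == v) }

  def main(args: Array[String]): Unit = {
    val filter = parseFilter("user_id=1,status=failed")
    // Stand-ins for records read from an Avro file.
    val records = Seq(
      Map[String, Any]("user_id" -> 1, "status" -> "failed"),
      Map[String, Any]("user_id" -> 1, "status" -> "ok"),
      Map[String, Any]("user_id" -> 2, "status" -> "failed")
    )
    // Only the first record satisfies both conditions.
    records.filter(matches(_, filter)).foreach(println)
  }
}
```

Comparing by string form keeps the sketch independent of field types, since the command line can only supply strings. A real implementation would more likely consult the Avro schema to decide how each value should be compared.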
```bash
$ java -jar avro-filter.jar --help
avro-filter 0.1
Usage: avro-filter [options] ...
  -f, --filter k1=v1,k2=v2...
                           filter expression, eg. user_id=1
  -o, --out                output file
  -s, --schema             optional schema to use when reading
  --help                   prints this usage text
  ...                      input file(s)
```
## TODO
- [ ] Handle multiple input files
- [ ] Split output file in configurable chunks (max size)
- [ ] Configurable compression options
## Development
Run with:
```bash
sbt "run -o out.avro -f user_id=1 -s schema.avro input.avro"
```
## Build
Build a JAR containing all dependencies:
```bash
sbt assembly
```