Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mariussoutier/avro-example
Example code to use Avro with Scala in your BigData stack
https://github.com/mariussoutier/avro-example
Last synced: about 6 hours ago
JSON representation
Example code to use Avro with Scala in your BigData stack
- Host: GitHub
- URL: https://github.com/mariussoutier/avro-example
- Owner: mariussoutier
- Created: 2014-04-23T19:18:25.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2014-05-02T11:50:40.000Z (over 10 years ago)
- Last Synced: 2024-04-17T10:44:58.236Z (7 months ago)
- Language: Scala
- Size: 129 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Scalding with Avro
This project shows how to use Avro to model your data and later process it using Scalding.
It also illustrates an error that is occurring as soon as nested Avro structures hit the disk.## Tweets
Of course this example, too, has to model Twitter tweets. In contrast to most tutorials, we are
modeling Tweets close to the original, as
[documented by Twitter](https://dev.twitter.com/docs/platform-objects/tweets).
This means not only a simple flat model, but nested and nullable properties of various types.## Jobs
CountTweetsJob - count all tweets per day.
TrendingTagsJob - takes all hashtags, counts them per day, and keeps the top 20 ones.
## Running the Project
Note: This project currently uses Maven to reproduce my client's original environment as close as possible.
Run `mvn clean compile scala:cc` for interactive development (or just import into IntelliJ).
Run `mvn clean compile scala:cctest` to run tests while developing.To run the job on Hadoop, package everything via `mvn clean package`. The fat JAR will be placed into the
target folder and can be submitted to Hadoop.~~~bash
hadoop jar target/avro-example-1.0.0-SNAPSHOT.jar com.mariussoutier.avroexample.jobs. --input ... --output ... --hdfs
~~~In time, the project will be moved to sbt.
## Test Data
The project also contains [ScalaCheck](scalacheck.org) generators. Easily create 100 tweets in
`/tmp/tweets` by running `mvn scala:run -DmainClass="com.mariussoutier.avroexample.test.MakeTweets"`.