Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/vsouza/spark-kinesis-redshift
Example project for consuming an AWS Kinesis stream and saving data to Amazon Redshift using Apache Spark
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/vsouza/spark-kinesis-redshift
- Owner: vsouza
- Created: 2016-08-25T16:02:40.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-05-22T18:45:02.000Z (over 6 years ago)
- Last Synced: 2024-04-13T21:19:46.637Z (9 months ago)
- Topics: aws, aws-kinesis, aws-kinesis-stream, aws-redshift, etl, etl-pipeline, python, shell, spark, spark-streaming
- Language: Python
- Homepage:
- Size: 89.8 KB
- Stars: 11
- Watchers: 3
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Apache Spark Kinesis Consumer
> Example project for consuming an AWS Kinesis stream and saving data to Amazon Redshift using Apache Spark
Code from: [Processing IoT realtime data - Medium](https://medium.com/@iamvsouza/processing-grandparents-realtime-data-d6b8c99e0b43)
## Usage example
You need to set your Amazon credentials in your environment.
```shell
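# Both naming conventions are exported below: AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY
# are the standard names, while AWS_ACCESS_KEY / AWS_SECRET_KEY are the alternate
# names also recognized by the AWS SDK for Java.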
export AWS_ACCESS_KEY_ID=""
export AWS_ACCESS_KEY=""
export AWS_SECRET_ACCESS_KEY=""
export AWS_SECRET_KEY=""
```

## Dependencies
The following package must be included via the `--packages` flag:
`org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.1`
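Below is a minimal consumer sketch against Spark 1.6's Python API (the project's actual consumer code may differ; the app name, stream name, region, and batch interval are placeholders). It would be launched with `spark-submit --packages org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.1 consumer.py`:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

sc = SparkContext(appName="kinesis-consumer")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

# kinesisAppName is also the name of the DynamoDB checkpoint table (see Setup below).
stream = KinesisUtils.createStream(
    ssc,
    kinesisAppName="kinesis-consumer",
    streamName="test-stream",
    endpointUrl="https://kinesis.us-east-1.amazonaws.com",
    regionName="us-east-1",
    initialPositionInStream=InitialPositionInStream.LATEST,
    checkpointInterval=10,
)

stream.pprint()  # replace with your transform / Redshift write logic
ssc.start()
ssc.awaitTermination()
```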
## Setup
__How do I run Kinesis locally?__
A few months ago I created a Docker image with [Kinesalite](https://github.com/mhart/kinesalite) (an amazing project that simulates Amazon Kinesis). You can use this image, or run Kinesalite directly.

`docker run -d -p 4567:4567 vsouza/kinesis-local -p 4567 --createStreamMs 5`

Check the [project](https://github.com/vsouza/docker-Kinesis-local).
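To smoke-test the local endpoint, here is a sketch using boto3 (an assumption, not part of this project; the stream name and dummy credentials are placeholders):

```python
import boto3

# Point the client at the local Kinesalite endpoint (port 4567, as above).
kinesis = boto3.client(
    "kinesis",
    endpoint_url="http://localhost:4567",
    region_name="us-east-1",        # any region works; Kinesalite ignores it
    aws_access_key_id="fake",       # Kinesalite does not validate credentials
    aws_secret_access_key="fake",
)

kinesis.create_stream(StreamName="test-stream", ShardCount=1)
desc = kinesis.describe_stream(StreamName="test-stream")
print(desc["StreamDescription"]["StreamStatus"])  # should become ACTIVE quickly
```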
__Do I need DynamoDB too?__

Yes, :cry: . The AWS SDK's Kinesis module checkpoints your position in the Kinesis stream and stores those checkpoints in DynamoDB. You don't need to create any tables yourself; the SDK creates them for you.

*Remember to configure your throughput values in DynamoDB correctly*
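For example, to adjust the provisioned throughput on the checkpoint table (a hypothetical sketch using boto3; the table is named after your application name, and the capacity values are placeholders to tune for your workload):

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# The checkpoint table carries the same name as the Kinesis application.
dynamodb.update_table(
    TableName="kinesis-consumer",
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)
```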
## License
[MIT License](http://vsouza.mit-license.org/) © Vinicius Souza