# Apache Spark Kinesis Consumer

> Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark

Code from: [Processing IoT realtime data - Medium](https://medium.com/@iamvsouza/processing-grandparents-realtime-data-d6b8c99e0b43)



## Usage example

You need to set your AWS credentials in your environment.

```shell
# Standard AWS SDK credential variables
export AWS_ACCESS_KEY_ID=""
export AWS_SECRET_ACCESS_KEY=""
# Legacy aliases still read by older SDK versions
export AWS_ACCESS_KEY=""
export AWS_SECRET_KEY=""
```
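To check that the exported credentials resolve to the account you expect, you can query your identity with the AWS CLI (assuming it is installed; this is not part of the original project):

```shell
# Prints the account ID and ARN the exported credentials belong to
aws sts get-caller-identity
```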

## Dependencies

Must be included via the `--packages` flag of `spark-submit`.

`org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.1`
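For example, a submit command might look like the following (`consumer.py` is a hypothetical script name, not from this project):

```shell
# Pulls the Kinesis ASL package from Maven and runs the streaming job
spark-submit \
  --packages org.apache.spark:spark-streaming-kinesis-asl_2.10:1.6.1 \
  consumer.py
```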

## Setup

__How do I run Kinesis locally?__

A few months ago I created a Docker image with Kinesalite (an amazing project that simulates Amazon Kinesis). You can use
this image, or run [Kinesalite](https://github.com/mhart/kinesalite) directly.

`docker run -d -p 4567:4567 vsouza/kinesis-local -p 4567 --createStreamMs 5`

Check the [project](https://github.com/vsouza/docker-Kinesis-local) for details.
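Once the container is up, you can point the AWS CLI at the local endpoint instead of the real AWS one. A sketch, assuming the AWS CLI is installed (`test-stream` is a placeholder name):

```shell
# Create a test stream on the local Kinesalite endpoint, then list streams
aws kinesis create-stream --stream-name test-stream --shard-count 1 \
  --endpoint-url http://localhost:4567
aws kinesis list-streams --endpoint-url http://localhost:4567
```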

__Do I need DynamoDB too?__

Yes, :cry:. The AWS SDK's Kinesis module checkpoints your position in the stream and stores the checkpoints in DynamoDB. You don't
need to create the table yourself; the SDK creates it for you.

*Remember to configure the throughput values of the DynamoDB checkpoint table correctly.*
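A sketch of how you might raise the provisioned throughput with the AWS CLI; the table name here is hypothetical (the checkpoint table is created under your Kinesis application name):

```shell
# Adjust read/write capacity on the checkpoint table
# "my-spark-kinesis-app" is a placeholder for your application name
aws dynamodb update-table \
  --table-name my-spark-kinesis-app \
  --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=10
```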

## License

[MIT License](http://vsouza.mit-license.org/) © Vinicius Souza