Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/vincentclaes/serverless_data_pipeline_example

Build and Deploy A Serverless Data Pipeline on AWS
https://github.com/vincentclaes/serverless_data_pipeline_example

aws aws-glue aws-lambda aws-s3 python serverless-framework

Last synced: 8 days ago
JSON representation

Build and Deploy A Serverless Data Pipeline on AWS

Awesome Lists containing this project

README

        

# SERVERLESS-DATA-PIPELINE

Using AWS Cloud Services Lambda, S3, Glue and Athena we are going to build a data pipeline written in python and deploy it using the Serverless Framework.

You can read the article here:

https://medium.com/@vincentclaes_43752/build-a-serverless-data-pipeline-on-aws-7c7d498d9707

# Deploy and run the data pipeline

make sure you have correct user and roles defined:

### Create a user to access AWS

create an admin user using the AWS console and set the credentials under a [serverless] section in the credentials file located in 

~/.aws/credentials

a step by step guide can be found here: https://medium.com/@vincentclaes_43752/create-a-user-for-the-serverless-framework-8e5c336d47c7

### Create the necessary roles
for our project we need two roles; 
* one for lambda 
* one for glue.

create a role with administrator access for both these roles and keep the ARN somewhere accessible.
A step by step guide to creating a role can be found here: https://medium.com/@vincentclaes_43752/create-a-role-on-aws-for-the-serverless-framework-for-any-resource-c49712a5eee0

Replace the ARN of the lambda role and the ARN of the Glue role with the ones defined in the serverless.yml file.
There is a comment above both roles that indicates that you should replace the ARN.

### configure your local environment

git clone https://github.com/vincentclaes/serverless_data_pipeline_example.git
cd serverless_data_pipeline_example

under the root of this project execute:

export AWS_PROFILE="serverless"
pip install awscli
sudo npm install -g serverless
npm install serverless-s3-remover

to deploy the data pipeline execute:

# replace the `unique-identifier` with something unique
sls deploy --stage unique-identifier

to remove the data pipeline execute:

# replace the `unique-identifier` with something unique
sls remove --stage unique-identifier

to get more context please refer to the article: https://medium.com/@vincentclaes_43752/build-a-serverless-data-pipeline-on-aws-7c7d498d9707