https://github.com/aphexlog/userdata-ingestion-pipeline


# Serverless Framework AWS Python User Data Ingestion Pipeline

This project shows how to set up a scalable data ingestion pipeline on AWS Lambda with the Serverless Framework. The pipeline fetches user data from [RandomUser.me](https://randomuser.me), streams it through a Kinesis Data Stream, and stores it in an S3 bucket.

To explore other configurations, see the official serverless [examples repo](https://github.com/serverless/examples/), which includes integrations with services such as SQS and DynamoDB as well as event-triggered functions. Event configuration is covered in more detail in the serverless [documentation](https://www.serverless.com/framework/docs/providers/aws/events/). For a manual walkthrough of these steps, refer to the [`EXERCISE.md`](./EXERCISE.md) file.
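The producer side of this flow can be sketched roughly as follows. This is a minimal illustration, not the repository's actual handler: the stream name `userdata-stream`, the helper names, and the use of `login.uuid` as the partition key are all assumptions. The Kinesis client is passed in as a parameter so the sketch can be exercised without AWS credentials.

```python
import json
import urllib.request

# Public RandomUser endpoint; the `results` query parameter sets the batch size.
RANDOMUSER_URL = "https://randomuser.me/api/"
# Hypothetical stream name; the real one would be defined in serverless.yml.
STREAM_NAME = "userdata-stream"


def fetch_users(count=5):
    """Fetch `count` random users and return the list under "results"."""
    with urllib.request.urlopen(f"{RANDOMUSER_URL}?results={count}") as resp:
        return json.load(resp)["results"]


def to_kinesis_record(user):
    """Serialize one user; the partition key choice is an assumption."""
    return {
        "Data": json.dumps(user).encode("utf-8"),
        "PartitionKey": user["login"]["uuid"],
    }


def handler(event, context, kinesis=None, fetch=fetch_users):
    """Lambda-style entry point with injectable dependencies for testing."""
    if kinesis is None:
        import boto3  # imported lazily so the sketch loads without boto3

        kinesis = boto3.client("kinesis")
    records = [to_kinesis_record(user) for user in fetch()]
    kinesis.put_records(StreamName=STREAM_NAME, Records=records)
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "User data sent to Kinesis stream"}),
    }
```

Batching the users into a single `put_records` call keeps the number of Kinesis requests low; a production handler would also want to inspect `FailedRecordCount` in the response and retry failed records.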

## Important Information

### Key Details

- **Organization**: aphexlog
- **Service**: userdata-ingestion-pipeline
- **Provider**: AWS
- **Runtime**: Python 3.12
- **Stage**: `${opt:stage, 'dev'}`
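
The stage value is a Serverless Framework variable with a fallback: it takes the `--stage` CLI option when one is supplied and uses `dev` otherwise. The logic is roughly equivalent to the Python sketch below. `resolve_stage` and `stack_name` are hypothetical helpers for illustration only, and the `<service>-<stage>` stack naming is inferred from the sample deploy output in this README.

```python
def resolve_stage(cli_options):
    """Mimic `${opt:stage, 'dev'}`: use --stage if given, else 'dev'."""
    return cli_options.get("stage") or "dev"


def stack_name(service, stage):
    """Stack name as it appears in the deploy output: <service>-<stage>."""
    return f"{service}-{stage}"


# No --stage option: falls back to the default stage.
print(stack_name("userdata-ingestion-pipeline", resolve_stage({})))
# → userdata-ingestion-pipeline-dev

# Equivalent of passing --stage prod on the command line.
print(stack_name("userdata-ingestion-pipeline", resolve_stage({"stage": "prod"})))
# → userdata-ingestion-pipeline-prod
```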

### Plugins Used

- `serverless-python-requirements`
- `serverless-iam-roles-per-function`

## Usage

### Deployment

Deploy the pipeline with a single command:

```
serverless deploy
```

After the deploy command finishes, you should see output similar to:

```
Deploying "userdata-ingestion-pipeline" to stage "dev" (us-east-1)

✔ Service deployed to stack userdata-ingestion-pipeline-dev (90s)

functions:
  producer: userdata-ingestion-pipeline-producer (2.1 kB)
  consumer: userdata-ingestion-pipeline-consumer (2.1 kB)
```

### Invocation

After a successful deployment, you can test the data ingestion functions by invoking them:

```
serverless invoke --function producer
serverless invoke --function consumer
```

You should receive responses similar to:

```json
{
  "statusCode": 200,
  "body": "{\"message\": \"User data sent to Kinesis stream\"}"
}
```

```json
{
  "statusCode": 200,
  "body": "{\"message\": \"Data processed and stored successfully\"}"
}
```
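
Note that `body` in these responses is a JSON-encoded string rather than a nested object, so a caller has to decode it a second time. A minimal sketch, using the producer response shown above as input (`extract_message` is a hypothetical helper name):

```python
import json


def extract_message(response):
    """Return the `message` field from a Lambda proxy-style response dict."""
    if response.get("statusCode") != 200:
        raise RuntimeError(f"unexpected status: {response.get('statusCode')}")
    body = json.loads(response["body"])  # second decode: body is a JSON string
    return body["message"]


# Raw text in the shape printed by `serverless invoke --function producer`.
raw = '{"statusCode": 200, "body": "{\\"message\\": \\"User data sent to Kinesis stream\\"}"}'
print(extract_message(json.loads(raw)))
# → User data sent to Kinesis stream
```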

### Monitoring and Logs

To view the recent logs of your functions, use the following commands (add the `--tail` flag to stream new log entries in real time):

```
serverless logs --function producer
serverless logs --function consumer
```

These commands show the logs your functions write to Amazon CloudWatch Logs, which are essential for debugging and for monitoring the health of your application.
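
Anything a Python handler writes through the standard `logging` module ends up in CloudWatch, so emitting one JSON object per line makes the output of `serverless logs` easier to search. A minimal sketch, assuming hypothetical field names (`event`, `record_count`) and a hypothetical response body that are not taken from the repository:

```python
import json
import logging

logger = logging.getLogger("ingestion")
logger.setLevel(logging.INFO)


def log_line(event_name, **fields):
    """Build one JSON log line; keys are sorted for stable output."""
    return json.dumps({"event": event_name, **fields}, sort_keys=True)


def handler(event, context):
    """Log the size of the incoming batch as a structured line."""
    records = event.get("Records", [])
    logger.info(log_line("batch_received", record_count=len(records)))
    return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}
```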

### Local Development

For local testing and development, you can invoke the functions on your own machine:

```
serverless invoke local --function producer
serverless invoke local --function consumer
```

These commands should yield responses similar to:

```json
{
  "statusCode": 200,
  "body": "{\"message\": \"Producer function executed successfully!\"}"
}
```

```json
{
  "statusCode": 200,
  "body": "{\"message\": \"Consumer function executed successfully!\"}"
}
```
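
When the consumer is invoked locally it does not receive records from a real stream, so it helps to hand it a payload shaped like a Kinesis event, in which each record's data is base64-encoded. The sketch below builds such an event and decodes it in a consumer-style handler. It is an illustration under assumptions: the bucket name `userdata-bucket`, the key layout, and the helper names are hypothetical, and the S3 client is injected so the code runs without AWS access.

```python
import base64
import json

BUCKET_NAME = "userdata-bucket"  # hypothetical; the real name lives in serverless.yml


def make_kinesis_event(users):
    """Build a minimal Kinesis-style event with base64-encoded payloads."""
    return {
        "Records": [
            {
                "eventSource": "aws:kinesis",
                "kinesis": {
                    "partitionKey": user["login"]["uuid"],
                    "data": base64.b64encode(
                        json.dumps(user).encode("utf-8")
                    ).decode("ascii"),
                },
            }
            for user in users
        ]
    }


def handler(event, context, s3=None):
    """Decode each record and store it as one JSON object in S3."""
    if s3 is None:
        import boto3  # lazy import keeps the sketch loadable without boto3

        s3 = boto3.client("s3")
    for record in event["Records"]:
        user = json.loads(base64.b64decode(record["kinesis"]["data"]))
        key = f"users/{user['login']['uuid']}.json"  # assumed key layout
        s3.put_object(
            Bucket=BUCKET_NAME, Key=key, Body=json.dumps(user).encode("utf-8")
        )
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "Data processed and stored successfully"}),
    }
```

Saving the output of `make_kinesis_event` as JSON and passing the file with the `--path` option of `serverless invoke local` would exercise the handler with realistic input.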

### Bundling Dependencies

To include third-party dependencies, use the `serverless-python-requirements` plugin. Install it with the following command:

```
serverless plugin install -n serverless-python-requirements
```

This command adds `serverless-python-requirements` to the `plugins` section of your `serverless.yml` file and registers it as a `devDependency` in `package.json`, generating that file if it does not exist. You can then list your dependencies in `requirements.txt` (`Pipfile` and `pyproject.toml` are supported with additional configuration), and they will be included in the Lambda package during the build process. More details about plugin configuration can be found in the [official documentation](https://github.com/UnitedIncome/serverless-python-requirements). For detailed steps on performing these actions manually, consult the [`EXERCISE.md`](./EXERCISE.md) file.
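
Before deploying, it can be useful to confirm that every package named in `requirements.txt` is actually installed in the local environment. The sketch below is one hedged way to do that with the standard library. `missing_requirements` is a hypothetical helper, and its name parsing only handles simple lines such as `name==version`, not the full requirement syntax.

```python
import re
from importlib import metadata


def missing_requirements(lines):
    """Return names from requirements-style lines that are not installed."""
    missing = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Example input; the package name is deliberately made up.
print(missing_requirements(["# deps", "", "surely-not-installed-pkg-xyz==1.0"]))
```

To check a real file, pass an open file object instead of a list, for example `missing_requirements(open("requirements.txt"))`.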