https://github.com/aphexlog/userdata-ingestion-pipeline
- Host: GitHub
- URL: https://github.com/aphexlog/userdata-ingestion-pipeline
- Owner: aphexlog
- Created: 2024-07-29T20:37:50.000Z
- Default Branch: main
- Last Pushed: 2024-08-28T19:07:50.000Z
- Last Synced: 2024-08-28T20:50:39.874Z
- Language: Python
- Size: 92.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Serverless Framework AWS Python User Data Ingestion Pipeline
This project shows a simple way to set up a scalable data ingestion pipeline on AWS Lambda using the Serverless Framework. A producer function fetches user data from [RandomUser.me](https://randomuser.me) and sends it to an Amazon Kinesis Data Streams stream. A consumer function then reads the records from the stream, processes them, and stores them in an S3 bucket.

To explore other configurations, see the official Serverless [examples repo](https://github.com/serverless/examples/), which includes integrations with services such as SQS and DynamoDB, as well as event-triggered functions. For more detailed event configuration, see the Serverless [documentation](https://www.serverless.com/framework/docs/providers/aws/events/). To carry out these steps manually, follow the [`EXERCISE.md`](./EXERCISE.md) file.
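The producer step can be sketched in Python as follows. This is a hedged illustration, not the repository's actual handler. The handler name `producer`, the `STREAM_NAME` environment variable, its `userdata-stream` default, and the choice of each user's `login.uuid` as the partition key are all hypothetical. The Kinesis client and fetch function are injectable so the logic can be exercised without AWS or network access.

```python
import json
import os
import urllib.request

# RandomUser.me public API; the `results` query parameter sets how many users come back.
RANDOMUSER_URL = "https://randomuser.me/api/?results={n}"


def fetch_users(n=10):
    """Fetch `n` random users; the API wraps them in a top-level "results" list."""
    with urllib.request.urlopen(RANDOMUSER_URL.format(n=n), timeout=10) as resp:
        return json.load(resp)["results"]


def send_users(users, kinesis, stream_name):
    """Write each user as one Kinesis record, partitioned by the user's UUID (an assumption)."""
    for user in users:
        kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps(user).encode("utf-8"),
            PartitionKey=user["login"]["uuid"],
        )
    return len(users)


def producer(event, context, kinesis=None, fetch=fetch_users):
    """Hypothetical Lambda entry point; `kinesis` and `fetch` can be swapped out in tests."""
    if kinesis is None:
        import boto3  # included in the AWS Lambda Python runtime
        kinesis = boto3.client("kinesis")
    # STREAM_NAME is a hypothetical environment variable, e.g. set in serverless.yml.
    stream_name = os.environ.get("STREAM_NAME", "userdata-stream")
    sent = send_users(fetch(), kinesis, stream_name)
    print(f"Sent {sent} records to {stream_name}")  # print output ends up in CloudWatch Logs
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "User data sent to Kinesis stream"}),
    }
```

Partitioning by a per-user identifier spreads records across shards. A fixed key would send every record to the same shard.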
## Important Information
### Key Details
- **Organization**: aphexlog
- **Service**: userdata-ingestion-pipeline
- **Provider**: AWS
- **Runtime**: Python 3.12
- **Stage**: `${opt:stage, 'dev'}` (the value of the `--stage` CLI option, defaulting to `dev`)
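The stage setting uses Serverless variable syntax with a fallback. The small Python model below shows how it resolves. It assumes the framework's default `<service>-<stage>` CloudFormation stack naming, which is consistent with the `userdata-ingestion-pipeline-dev` stack name in the deploy output. The helper names are illustrative only.

```python
SERVICE = "userdata-ingestion-pipeline"


def resolve_stage(cli_options):
    """Model of ${opt:stage, 'dev'}: use the --stage option when present, else 'dev'."""
    return cli_options.get("stage", "dev")


def stack_name(service, stage):
    """Default stack naming assumed here: <service>-<stage>."""
    return f"{service}-{stage}"
```

For example, `serverless deploy` with no `--stage` corresponds to `stack_name(SERVICE, resolve_stage({}))`. By contrast, `serverless deploy --stage prod` corresponds to `resolve_stage({"stage": "prod"})`.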
### Plugins Used
- `serverless-python-requirements`
- `serverless-iam-roles-per-function`
## Usage
### Deployment
To deploy the pipeline, run:
```
serverless deploy
```
After the deploy command finishes, you should see output similar to:
```
Deploying "userdata-ingestion-pipeline" to stage "dev" (us-east-1)
✔ Service deployed to stack userdata-ingestion-pipeline-dev (90s)
functions:
  producer: userdata-ingestion-pipeline-producer (2.1 kB)
  consumer: userdata-ingestion-pipeline-consumer (2.1 kB)
```
### Invocation
After a successful deployment, you can test the producer and consumer functions on AWS with:
```
serverless invoke --function producer
serverless invoke --function consumer
```
You should receive responses similar to the following (producer first, then consumer):
```json
{
  "statusCode": 200,
  "body": "{\"message\": \"User data sent to Kinesis stream\"}"
}
```
```json
{
  "statusCode": 200,
  "body": "{\"message\": \"Data processed and stored successfully\"}"
}
```
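The consumer's "processed and stored" step might look roughly like this. It is a sketch under stated assumptions, not the repository's code. Lambda delivers Kinesis records in `event["Records"]` with each payload base64-encoded under `kinesis.data`. The handler name `consumer`, the `BUCKET_NAME` environment variable and its default, and the `users/<uuid>.json` key layout are hypothetical. The S3 client is injectable for testing.

```python
import base64
import json
import os


def decode_records(event):
    """Decode the base64 JSON payloads from a Kinesis-triggered Lambda event."""
    return [
        json.loads(base64.b64decode(record["kinesis"]["data"]))
        for record in event.get("Records", [])  # a direct invoke may send no Records
    ]


def consumer(event, context, s3=None):
    """Hypothetical Lambda entry point: store each decoded user as one JSON object in S3."""
    if s3 is None:
        import boto3  # included in the AWS Lambda Python runtime
        s3 = boto3.client("s3")
    # BUCKET_NAME is a hypothetical environment variable, e.g. set in serverless.yml.
    bucket = os.environ.get("BUCKET_NAME", "userdata-bucket")
    for user in decode_records(event):
        s3.put_object(
            Bucket=bucket,
            Key=f"users/{user['login']['uuid']}.json",  # assumed key layout
            Body=json.dumps(user).encode("utf-8"),
            ContentType="application/json",
        )
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "Data processed and stored successfully"}),
    }
```

Using `event.get("Records", [])` lets the handler return cleanly when it is invoked without a Kinesis payload, as with a plain `serverless invoke --function consumer`.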
### Monitoring and Logs
To view the logs of your deployed functions, use the following commands (add `--tail` to stream new log events as they arrive):
```
serverless logs --function producer
serverless logs --function consumer
```
These commands fetch the logs your functions write to Amazon CloudWatch Logs. These logs are essential for debugging and for monitoring the health of the pipeline.
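In the Lambda Python runtime, output from `print` and the `logging` module is sent to CloudWatch Logs. Emitting one JSON object per line typically makes the fields easier to filter, for example with CloudWatch Logs Insights. The helper below is a hypothetical sketch, not part of the project. It assumes the runtime's preconfigured root logger, so it only raises the level to `INFO`.

```python
import json
import logging

# Lambda's Python runtime already attaches a handler to the root logger;
# we only raise the level so INFO lines are emitted. Locally, call
# logging.basicConfig() first if you want to see the output on the console.
logger = logging.getLogger()
logger.setLevel(logging.INFO)


def log_event(message, **fields):
    """Log one JSON line combining a message with structured fields; returns the line."""
    line = json.dumps({"message": message, **fields}, sort_keys=True)
    logger.info(line)
    return line
```

Inside a handler, a call such as `log_event("records sent", count=10)` produces a single line that can be filtered by `count` rather than parsed out of free text.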
### Local development
For local testing and development, you can invoke your functions with:
```
serverless invoke local --function producer
serverless invoke local --function consumer
```
These commands should yield responses similar to:
```json
{
  "statusCode": 200,
  "body": "{\"message\": \"Producer function executed successfully!\"}"
}
```
```json
{
  "statusCode": 200,
  "body": "{\"message\": \"Consumer function executed successfully!\"}"
}
```
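For quicker iteration, a plain-Python harness can call a handler directly with a realistic Kinesis-shaped event. This is a hedged sketch that only mimics the basic idea of a local invoke: import the module named in a `module.function` handler string, then call the function with `(event, context)`. It does not reproduce everything `serverless invoke local` does. The event is a trimmed-down Kinesis record with placeholder partition keys and sequence numbers. Handler strings such as `handler.consumer` are hypothetical; use the `handler:` values from your `serverless.yml`.

```python
import base64
import importlib
import json


def kinesis_test_event(payloads):
    """Build a trimmed Kinesis event; Lambda delivers each record's data base64-encoded."""
    return {
        "Records": [
            {
                "eventSource": "aws:kinesis",
                "eventName": "aws:kinesis:record",
                "kinesis": {
                    "partitionKey": str(i),     # placeholder
                    "sequenceNumber": str(i),   # placeholder; real values are long digit strings
                    "data": base64.b64encode(json.dumps(p).encode("utf-8")).decode("ascii"),
                },
            }
            for i, p in enumerate(payloads)
        ]
    }


def invoke_handler(handler_path, event=None, context=None):
    """Import "module.function" (e.g. "handler.consumer") and call it with (event, context)."""
    module_name, _, func_name = handler_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {handler_path!r}")
    func = getattr(importlib.import_module(module_name), func_name)
    return func(event if event is not None else {}, context)
```

From the project root, you would call `invoke_handler("handler.consumer", kinesis_test_event([{"login": {"uuid": "test"}}]))`. Alternatively, write the event to a file with `json.dump` and pass it to `serverless invoke local --function consumer --path event.json`.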
### Bundling dependencies
To include third-party dependencies, use the `serverless-python-requirements` plugin. Install it with the following command:
```
serverless plugin install -n serverless-python-requirements
```
This command adds `serverless-python-requirements` to the `plugins` section of your `serverless.yml` file and registers it as a `devDependency` in `package.json`, creating `package.json` if it does not exist. You can then list your dependencies in `requirements.txt`, and the plugin will include them in the Lambda package during the build. `Pipfile` and `pyproject.toml` are also supported with additional configuration. See the plugin's [official documentation](https://github.com/UnitedIncome/serverless-python-requirements) for configuration details, and the [`EXERCISE.md`](./EXERCISE.md) file for steps to perform these actions manually.
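To confirm that dependencies were actually bundled, you can inspect the deployment archive. The sketch below assumes two things. First, you built the package with `serverless package`, which typically writes `.serverless/<service>.zip`. Second, the plugin uses its default layout, with dependencies copied to the root of the archive. The helper names and the `requests` dependency in the usage note are hypothetical.

```python
import zipfile


def top_level_entries(zip_path):
    """Return the set of top-level file and directory names in a deployment zip."""
    with zipfile.ZipFile(zip_path) as zf:
        return {name.split("/", 1)[0] for name in zf.namelist()}


def missing_packages(zip_path, packages):
    """List the expected import names that do not appear at the top level of the zip."""
    entries = top_level_entries(zip_path)
    return sorted(p for p in packages if p not in entries)
```

For example, `missing_packages(".serverless/userdata-ingestion-pipeline.zip", ["requests"])` should return an empty list if `requests` was bundled. Check import names, not distribution names (for example `yaml` rather than `PyYAML`), because the archive contains the importable package directories.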