{"id":27037024,"url":"https://github.com/aphexlog/userdata-ingestion-pipeline","last_synced_at":"2025-04-05T01:15:55.551Z","repository":{"id":250780481,"uuid":"835432830","full_name":"aphexlog/UserData-Ingestion-Pipeline","owner":"aphexlog","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-28T19:07:50.000Z","size":95,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-08-28T20:50:39.874Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aphexlog.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-29T20:37:50.000Z","updated_at":"2024-08-28T19:07:53.000Z","dependencies_parsed_at":"2024-08-28T20:42:12.706Z","dependency_job_id":"faded1fa-5e0b-402f-a82a-37e6104dd9f4","html_url":"https://github.com/aphexlog/UserData-Ingestion-Pipeline","commit_stats":null,"previous_names":["aphexlog/userdata-ingestion-pipeline"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aphexlog%2FUserData-Ingestion-Pipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aphexlog%2FUserData-Ingestion-Pipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aphexlog%2FUserData-Ingestion-Pipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aphexlog%2FUserData-Ingestion-Pipeline/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aphexlog","download_url":"https://codeload.github.com/aphexlog/UserData-Ingestion-Pipeline/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247271527,"owners_count":20911587,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-05T01:15:54.941Z","updated_at":"2025-04-05T01:15:55.535Z","avatar_url":"https://github.com/aphexlog.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--\ntitle: 'AWS Python User Data Ingestion Pipeline'\ndescription: 'This template demonstrates how to deploy a Python-based user data ingestion pipeline running on AWS using the Serverless Framework. The data is sourced from https://randomuser.me.'\nlayout: Doc\nframework: v4\nplatform: AWS\nlanguage: python\npriority: 2\nauthorLink: 'https://github.com/Aphexlog'\nauthorName: 'Aaron West'\nauthorAvatar: ''\n--\u003e\n\n# Serverless Framework AWS Python User Data Ingestion Pipeline\n\nThis project shows you an easy way to set up a scalable data ingestion pipeline on AWS Lambda using the Serverless Framework. The pipeline grabs user data from [RandomUser.me](https://randomuser.me), processes it with Kinesis Data Stream, and stores it in an S3 bucket. If you want to explore different configurations, check out the official serverless [examples repo](https://github.com/serverless/examples/). It has additional integrations with services like SQS, DynamoDB, or event-triggered functions. 
For more detailed event configurations, take a look at the serverless [documentation](https://www.serverless.com/framework/docs/providers/aws/events/). For a manual exercise on how to carry out these steps, refer to the [`EXERCISE.md`](./EXERCISE.md) file.\n\n## Important Information\n\n### Key Details\n\n- **Organization**: aphexlog\n- **Service**: userdata-ingestion-pipeline\n- **Provider**: AWS\n- **Runtime**: Python 3.12\n- **Stage**: ${opt:stage, 'dev'}\n\n### Plugins Used\n\n- `serverless-python-requirements`\n- `serverless-iam-roles-per-function`\n\n## Usage\n\n### Deployment\n\nTo deploy the pipeline, run the following command:\n\n```\nserverless deploy\n```\n\nAfter running the deploy command, you should see output like:\n\n```\nDeploying \"userdata-ingestion-pipeline\" to stage \"dev\" (us-east-1)\n\n✔ Service deployed to stack userdata-ingestion-pipeline-dev (90s)\n\nfunctions:\n  producer: userdata-ingestion-pipeline-producer (2.1 kB)\n  consumer: userdata-ingestion-pipeline-consumer (2.1 kB)\n```\n\n### Invocation\n\nAfter a successful deployment, you can test the data ingestion functions with:\n\n```\nserverless invoke --function producer\nserverless invoke --function consumer\n```\n\nYou should receive responses similar to:\n\n```json\n{\n  \"statusCode\": 200,\n  \"body\": \"{\\\"message\\\": \\\"User data sent to Kinesis stream\\\"}\"\n}\n```\n\n```json\n{\n  \"statusCode\": 200,\n  \"body\": \"{\\\"message\\\": \\\"Data processed and stored successfully\\\"}\"\n}\n```\n\n### Monitoring and Logs\n\nTo view the logs for your functions, use the following commands (add `--tail` to stream them in real time):\n\n```\nserverless logs --function producer\nserverless logs --function consumer\n```\n\nThese commands show the logs your functions write to Amazon CloudWatch, which are essential for debugging and monitoring the health of your application.\n\n### Local development\n\nFor local testing and development, you can 
invoke your functions locally via:\n\n```\nserverless invoke local --function producer\nserverless invoke local --function consumer\n```\n\nThese commands should yield responses similar to:\n\n```json\n{\n  \"statusCode\": 200,\n  \"body\": \"{\\\"message\\\": \\\"Producer function executed successfully!\\\"}\"\n}\n```\n\n```json\n{\n  \"statusCode\": 200,\n  \"body\": \"{\\\"message\\\": \\\"Consumer function executed successfully!\\\"}\"\n}\n```\n\n### Bundling dependencies\n\nTo include third-party dependencies, use the `serverless-python-requirements` plugin. Install it with the following command:\n\n```\nserverless plugin install -n serverless-python-requirements\n```\n\nThis command adds `serverless-python-requirements` to the `plugins` section of your `serverless.yml` file and registers it as a `devDependency` in `package.json`. If `package.json` does not exist, it is generated for you. You can then list your dependencies in `requirements.txt` (support for `Pipfile` and `pyproject.toml` is available with additional configuration), and they are included in the Lambda package during the build. More details about plugin configuration can be found in the [official documentation](https://github.com/UnitedIncome/serverless-python-requirements). For detailed steps on performing these actions manually, consult the [`EXERCISE.md`](./EXERCISE.md) file.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faphexlog%2Fuserdata-ingestion-pipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faphexlog%2Fuserdata-ingestion-pipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faphexlog%2Fuserdata-ingestion-pipeline/lists"}