{"id":15563902,"url":"https://github.com/alexcasalboni/serverless-data-pipeline-sam","last_synced_at":"2025-08-20T23:11:56.632Z","repository":{"id":54751311,"uuid":"109898797","full_name":"alexcasalboni/serverless-data-pipeline-sam","owner":"alexcasalboni","description":"Serverless Data Pipeline powered by Kinesis Firehose, API Gateway, Lambda, S3, and Athena","archived":false,"fork":false,"pushed_at":"2018-10-22T23:27:40.000Z","size":15,"stargazers_count":87,"open_issues_count":1,"forks_count":29,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-08-12T21:59:43.185Z","etag":null,"topics":["amazon-web-services","aws","aws-lambda","aws-s3","aws-sam","cloudformation","data-pipeline"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alexcasalboni.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-07T22:33:29.000Z","updated_at":"2025-02-21T03:22:33.000Z","dependencies_parsed_at":"2022-08-14T01:40:42.941Z","dependency_job_id":null,"html_url":"https://github.com/alexcasalboni/serverless-data-pipeline-sam","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/alexcasalboni/serverless-data-pipeline-sam","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexcasalboni%2Fserverless-data-pipeline-sam","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexcasalboni%2Fserverless-data-pipeline-sam/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexcasalboni%2Fserverless-data-pipeline-sam/releases","manifest
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexcasalboni%2Fserverless-data-pipeline-sam/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alexcasalboni","download_url":"https://codeload.github.com/alexcasalboni/serverless-data-pipeline-sam/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexcasalboni%2Fserverless-data-pipeline-sam/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271400259,"owners_count":24752830,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-20T02:00:09.606Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amazon-web-services","aws","aws-lambda","aws-s3","aws-sam","cloudformation","data-pipeline"],"created_at":"2024-10-02T16:29:45.729Z","updated_at":"2025-08-20T23:11:56.576Z","avatar_url":"https://github.com/alexcasalboni.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Serverless Data Pipeline - Powered by AWS SAM\nServerless Data Pipeline built with Amazon API Gateway, AWS Lambda, Amazon Kinesis Firehose, Amazon S3, and Amazon Athena.\n\n## How to deploy the stack\n\nSee `scripts/deploy.sh` (customize your deployment bucket and stack name).\n\n## How to ingest new records via API\n\nSee `scripts/track.sh` (customize your stack name).\n\n## What kind of 
queries can I run on the dataset?\n\nIt depends on the data that you collect and on the virtual tables that you define on Athena and Glue.\n\nThe file `queries.sql` contains a few sample queries that you can run with the default schema (e.g. `{\"name\": \"John\", \"action\": \"charge\", \"value\": 100}`).\n\n## Resources list\n\nThis stack will create the following resources:\n\n* An **API Gateway endpoint** that you can use to `track` events by submitting any JSON data via the HTTP POST method\n* A **Kinesis Firehose Delivery Stream** that will buffer, optionally compress, and write each record into S3\n* A **Lambda Function** to process/manipulate/clean/skip records before they get written into S3\n* An **S3 Bucket** that will contain all the collected data\n* Three **Athena Named Queries** to get started quickly with serverless queries\n* An **IAM Role and Policy** for API Gateway\n* An **IAM Role and Policy** for Kinesis Firehose\n\n\n## Parameters\n\n* **ApiStageName**: The API Gateway Stage name (e.g. dev, prod, etc.)\n* **FirehoseS3Prefix**: The S3 Key prefix for Kinesis Firehose\n* **FirehoseCompressionFormat**: The compression format used by Kinesis Firehose\n* **FirehoseBufferingInterval**: How long Firehose will wait before writing a new batch into S3\n* **FirehoseBufferingSize**: The maximum batch size in MB\n* **LambdaTimeout**: Lambda's max execution time in seconds\n* **LambdaMemorySize**: Lambda's max memory configuration\n* **AthenaDatabaseName**: The Athena database name\n* **AthenaTableName**: The Athena table name\n\n## Outputs\n\n* **TrackURL**: The public URL to submit new records\n* **BucketName**: The bucket that will store your data\n* **FunctionName**: The Lambda Function that will process/validate records\n\n## Gotchas\n\n* The architecture is 100% serverless (no hourly costs, no servers to manage)\n* The API Gateway endpoint is publicly accessible (i.e. 
any browser or anonymous website user can potentially submit new records/events)\n* You can customize the template to enable encryption at rest for Kinesis Firehose\n* You can configure Kinesis Firehose's buffering (see Parameters above)\n* Athena's Named Queries cannot be updated (you need to create a new query with a different logical name)\n* Make sure the S3 bucket is empty when you delete the stack\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexcasalboni%2Fserverless-data-pipeline-sam","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexcasalboni%2Fserverless-data-pipeline-sam","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexcasalboni%2Fserverless-data-pipeline-sam/lists"}