{"id":19308728,"url":"https://github.com/mtulio/s3-stream","last_synced_at":"2025-07-06T13:07:14.499Z","repository":{"id":89055553,"uuid":"100818840","full_name":"mtulio/s3-stream","owner":"mtulio","description":"S3 Streaming data to output broker, logger or database","archived":false,"fork":false,"pushed_at":"2021-03-11T03:57:54.000Z","size":22,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-24T03:17:49.485Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mtulio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-19T20:11:40.000Z","updated_at":"2021-03-11T03:57:56.000Z","dependencies_parsed_at":"2023-06-13T17:54:15.110Z","dependency_job_id":null,"html_url":"https://github.com/mtulio/s3-stream","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mtulio/s3-stream","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtulio%2Fs3-stream","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtulio%2Fs3-stream/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtulio%2Fs3-stream/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtulio%2Fs3-stream/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mtulio","download_url":"https://codeload.github.com/mtulio/s3-stream/
tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mtulio%2Fs3-stream/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263905742,"owners_count":23527972,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T00:16:16.470Z","updated_at":"2025-07-06T13:07:14.477Z","avatar_url":"https://github.com/mtulio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# s3-stream\n\nS3stream streams S3 object text data, such as CloudFront logs, to a given\noutput. 
Supported outputs are:\n\n* Kafka\n\nPlanned outputs:\n\n* Syslog (in dev)\n* Elasticsearch\n* Raw logs\n\n## Overview\n\nThis project gets an S3 file referenced by an SQS notification (we assume\nthat you have already created it), optionally filters its content, and publishes\nit to Kafka or another output provider.\n\nThe simple architecture is:\n\n```\n  |_S3_| -\u003e |_SQS_|\u003c-----------.\n               |               |\n.--------------:---------------|--------------.\n:|_INIT_|      |               |-|_ONE_SHOOT_|:--\u003e sys.exit(0)\n:   |          |         |_SQS_DELETE_|       :\n:   '--\u003e|_SQS_POOLER_|\u003c--------|              :\n:              |               |              :\n:        |_PARSE_MSG_|         |              :\n:              |               |              :\n:           |_S3_GET_|         |              :\n:              |               |              :\n:        |_EXTRACTOR_|         |              :\n:              |               |              :\n:           |_FILTER_|         |              :\n:              |               |              :\n:          |_PUBLISH_|         |              :\n:              |             |_OK_| |_FAIL_|--:--\u003e sys.exit(1)\n:              |               |_______|      :\n'--------------:---------------|--------------'\n               |               |\n               |           |_RESULT_|\n     __________|               |\n    |                          |\n  |_KAFKA_|-------------------\u003e|\n  |_ELASTICSEARCH_|-----------\u003e|\n  |_GRAYLOG_GELF_|------------\u003e|\n  |_SYSLOG_|------------------\u003e|\n\n```\n\n* Limitations\n\nThe project assumes that the SNS topic is already created. You need to monitor both the queue and s3stream; otherwise, when the processor stops, the queue could grow drastically (mind the costs).\n\n## Use case\n\n1) Near-real-time CloudFront log processor: parse logs from gzip files on S3 and publish them to a Kafka topic, which Graylog can then 
read through an input plugin and index in Elasticsearch.\n\nThe parser and processor can consume a lot of IOPS. We have successfully tested the solution on an i3.large instance running both s3stream and Kafka on ephemeral storage (NVMe), processing millions of log messages per hour.\n\n\n## Goals\n\n* Filter messages before streaming to the output\n* Support running the SQS poller at an interval\n* Support streaming to Kafka\n* Support streaming to syslog\n* Support streaming to GELF (Graylog server)\n* Support streaming to Elasticsearch\n* Support dry-run\n* Support running as a daemon\n* Support retrieving information about the last executions\n* Support running the consumer when it's already running\n* Improve consumer metrics\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmtulio%2Fs3-stream","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmtulio%2Fs3-stream","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmtulio%2Fs3-stream/lists"}