{"id":34024845,"url":"https://github.com/obsrvbl-oss/flowlogs-reader","last_synced_at":"2026-04-06T07:02:32.409Z","repository":{"id":1102444,"uuid":"40683335","full_name":"obsrvbl-oss/flowlogs-reader","owner":"obsrvbl-oss","description":"Command line tool and Python library for working with AWS VPC Flow Logs","archived":false,"fork":false,"pushed_at":"2024-09-03T22:35:56.000Z","size":296,"stargazers_count":144,"open_issues_count":5,"forks_count":23,"subscribers_count":30,"default_branch":"main","last_synced_at":"2025-12-15T06:50:37.754Z","etag":null,"topics":["aws","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/obsrvbl-oss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2015-08-13T21:48:17.000Z","updated_at":"2025-08-25T16:31:44.000Z","dependencies_parsed_at":"2025-05-20T19:18:50.254Z","dependency_job_id":null,"html_url":"https://github.com/obsrvbl-oss/flowlogs-reader","commit_stats":null,"previous_names":["obsrvbl/flowlogs-reader"],"tags_count":22,"template":false,"template_full_name":null,"purl":"pkg:github/obsrvbl-oss/flowlogs-reader","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obsrvbl-oss%2Fflowlogs-reader","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obsrvbl-oss%2Fflowlogs-reader/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obsrvbl-oss%2Fflowlogs-reader/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obsrvb
l-oss%2Fflowlogs-reader/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/obsrvbl-oss","download_url":"https://codeload.github.com/obsrvbl-oss/flowlogs-reader/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/obsrvbl-oss%2Fflowlogs-reader/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31463015,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-05T21:22:52.476Z","status":"online","status_checked_at":"2026-04-06T02:00:07.287Z","response_time":112,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","python"],"created_at":"2025-12-13T16:42:08.462Z","updated_at":"2026-04-06T07:02:32.404Z","avatar_url":"https://github.com/obsrvbl-oss.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Introduction\n\n[![Build Status](https://github.com/obsrvbl-oss/flowlogs-reader/actions/workflows/main.yml/badge.svg)](https://github.com/obsrvbl-oss/flowlogs-reader/actions/workflows/main.yml)\n[![PyPI Version](https://img.shields.io/pypi/v/flowlogs_reader.svg)](https://pypi.python.org/pypi/flowlogs_reader)\n\nAmazon's VPC Flow Logs are analogous to NetFlow and IPFIX logs, and can be used for security and performance analysis.\n[Observable Networks](https://observable.net) uses VPC Flow logs as an input to endpoint modeling for security monitoring.\n\nThis project contains:\n* A utility 
for working with VPC Flow Logs on the command line\n* A Python library for retrieving and working with VPC Flow logs\n\nThe tools support reading Flow Logs from both [CloudWatch Logs](https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/flow-logs-cwl.html) and [S3](https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/flow-logs-s3.html).\nFor S3 destinations, [version 3](https://aws.amazon.com/blogs/aws/learn-from-your-vpc-flow-logs-with-additional-meta-data/) custom log formats are supported.\n\nThe library builds on [boto3](https://github.com/boto/boto3) and should work on the [supported versions](https://devguide.python.org/#status-of-python-branches) of Python 3.\n\nFor information on VPC Flow Logs and how to enable them see [this post](https://aws.amazon.com/blogs/aws/vpc-flow-logs-log-and-view-network-traffic-flows/) at the AWS blog.\nYou may use this library with the [kinesis-logs-reader](https://github.com/obsrvbl-oss/kinesis-logs-reader) library when retrieving VPC flow logs from Amazon Kinesis.\n\n\n## Installation\n\nYou can get `flowlogs_reader` by using `pip`:\n\n```\npip install flowlogs_reader\n```\n\nOr if you want to install from source and/or contribute you can clone from GitHub:\n\n```\ngit clone https://github.com/obsrvbl-oss/flowlogs-reader.git\ncd flowlogs-reader\npython setup.py develop\n```\n\n## CLI Usage\n\n`flowlogs-reader` provides a command line interface called `flowlogs_reader` that allows you to print VPC Flow Log records to your screen.\nIt assumes your AWS credentials are available through environment variables, a boto configuration file, or through IAM metadata.\nSome example uses are below.\n\n__Location types__\n\n`flowlogs_reader` has one required argument, `location`. By default that is interpreted as a CloudWatch Logs group.\n\nTo use an S3 location, specify `--location-type='s3'`:\n\n* `flowlogs_reader --location-type=\"s3\" \"bucket-name/optional-prefix\"`\n\n__Printing flows__\n\nThe default action is to `print` flows. 
You may also specify the `ipset`, `findip`, and `aggregate` actions:\n\n* `flowlogs_reader location` - print all flows in the past hour\n* `flowlogs_reader location print 10` - print the first 10 flows from the past hour\n* `flowlogs_reader location ipset` - print the unique IPs seen in the past hour\n* `flowlogs_reader location findip 198.51.100.2` - print all flows involving 198.51.100.2\n* `flowlogs_reader location aggregate` - aggregate the flows by 5-tuple, then print them as a tab-separated stream (with a header). This requires that each of the fields in the 5-tuple is present in the data format.\n\nYou may combine the output of `flowlogs_reader` with other command line utilities:\n\n* `flowlogs_reader location | grep REJECT` - print all `REJECT`ed Flow Log records\n* `flowlogs_reader location | awk '$6 == 443'` - print all traffic from port 443\n\n__Time windows__\n\nThe default time window is the last hour. You may also specify a `--start-time` and/or an `--end-time`. The `-s` and `-e` switches may also be used:\n\n* `flowlogs_reader --start-time='2015-08-13 00:00:00' location`\n* `flowlogs_reader --end-time='2015-08-14 00:00:00' location`\n* `flowlogs_reader --start-time='2015-08-13 01:00:00' --end-time='2015-08-14 02:00:00' location`\n\nUse the `--time-format` switch to control how start and end times are interpreted. The default is `'%Y-%m-%d %H:%M:%S'`. 
See the Python documentation for `strptime` for information on format strings.\n\n__Concurrent reads__\n\nGive `--thread-count` to read from multiple log groups or S3 keys at once:\n\n* `flowlogs_reader --thread-count=4 location`\n\n__AWS options__\n\nOther command line switches:\n\n* `flowlogs_reader --region='us-west-2' location` - connect to the given AWS region\n* `flowlogs_reader --profile='dev_profile' location` - use the profile from your [local AWS configuration file](http://docs.aws.amazon.com/cli/latest/topic/config-vars.html) to specify credentials and regions\n* `flowlogs_reader --role-arn='arn:aws:iam::12345678901:role/myrole' --external-id='0a1b2c3d' location` - use the given role and external ID to connect to a 3rd party's account using [`sts assume-role`](http://docs.aws.amazon.com/cli/latest/reference/sts/assume-role.html)\n\nFor CloudWatch Logs locations:\n\n* `flowlogs_reader --fields='${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} ${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} ${action} ${log-status}'` - use the given `fields` to prevent the module from querying EC2 for the log line format\n* `flowlogs_reader --filter-pattern='REJECT' location` - use the given [filter pattern](http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/FilterAndPatternSyntax.html) to have the server limit the output\n\nFor S3 locations:\n\n* `flowlogs_reader --location-type='s3' --include-accounts='12345678901,12345678902' bucket-name/optional-prefix` - return logs only for the given accounts\n* `flowlogs_reader --location-type='s3' --include-regions='us-east-1,us-east-2' bucket-name/optional-prefix` - return logs only for the given regions\n\n\n## Module Usage\n\n`FlowRecord` takes an `event` dictionary retrieved from a log stream. 
It parses the `message` in the event, which takes a record like this:\n\n```\n2 123456789010 eni-102010ab 198.51.100.1 192.0.2.1 443 49152 6 10 840 1439387263 1439387264 ACCEPT OK\n```\n\nAnd turns it into a Python object like this:\n\n```python\n\u003e\u003e\u003e flow_record.srcaddr\n'198.51.100.1'\n\u003e\u003e\u003e flow_record.dstaddr\n'192.0.2.1'\n\u003e\u003e\u003e flow_record.srcport\n443\n\u003e\u003e\u003e flow_record.to_dict()\n{'account_id': '123456789010',\n 'action': 'ACCEPT',\n 'bytes': 840,\n 'dstaddr': '192.0.2.1',\n 'dstport': 49152,\n 'end': datetime.datetime(2015, 8, 12, 13, 47, 44),\n 'interface_id': 'eni-102010ab',\n 'log_status': 'OK',\n 'packets': 10,\n 'protocol': 6,\n 'srcaddr': '198.51.100.1',\n 'srcport': 443,\n 'start': datetime.datetime(2015, 8, 12, 13, 47, 43),\n 'version': 2}\n```\n\n`FlowLogsReader` reads from CloudWatch Logs. It takes the name of a log group and can then yield all the Flow Log records from that group.\n\n```python\n\u003e\u003e\u003e from flowlogs_reader import FlowLogsReader\n... flow_log_reader = FlowLogsReader('flowlog_group')\n... records = list(flow_log_reader)\n... print(len(records))\n176\n```\n\n`S3FlowLogsReader` reads from S3. It takes a `bucket` name or a `bucket/prefix` identifier.\n\nBy default these classes will yield records from the last hour.\n\nYou can control what's retrieved with these parameters:\n* `start_time` and `end_time` are Python `datetime.datetime` objects\n* `region_name` is a string like `'us-east-1'`.\n* `boto_client` is a boto3 client object.\n\nWhen using `FlowLogsReader` with CloudWatch Logs:\n\n* The `fields` keyword is a tuple like `('version', 'account-id')`. If not supplied then the EC2 API will be queried to find out the log format.\n* The `filter_pattern` keyword is a string like `REJECT` or `443` used to filter the logs. 
See the examples below.\n\nWhen using `S3FlowLogsReader` with S3:\n\n* The `include_accounts` keyword is an iterable of account identifiers (as strings) used to filter the logs.\n* The `include_regions` keyword is an iterable of region names used to filter the logs.\n\n## Examples\n\nStart by importing `FlowLogsReader`:\n\n```python\nfrom flowlogs_reader import FlowLogsReader\n```\n\nFind all of the IP addresses communicating inside the VPC:\n\n```python\nip_set = set()\nfor record in FlowLogsReader('flowlog_group'):\n    ip_set.add(record.srcaddr)\n    ip_set.add(record.dstaddr)\n```\n\nSee all of the traffic for one IP address:\n\n```python\ntarget_ip = '192.0.2.1'\nrecords = []\nfor record in FlowLogsReader('flowlog_group'):\n    if (record.srcaddr == target_ip) or (record.dstaddr == target_ip):\n        records.append(record)\n```\n\nLoop through a few preconfigured profiles and collect all of the IP addresses:\n\n```python\nip_set = set()\nprofile_names = ['profile1', 'profile2']\nfor profile_name in profile_names:\n    for record in FlowLogsReader('flowlog_group', profile_name=profile_name):\n        ip_set.add(record.srcaddr)\n        ip_set.add(record.dstaddr)\n```\n\nApply a filter for UDP traffic that was logged normally (CloudWatch Logs only):\n\n```python\nFILTER_PATTERN = (\n    '[version=\"2\", account_id, interface_id, srcaddr, dstaddr, '\n    'srcport, dstport, protocol=\"17\", packets, bytes, '\n    'start, end, action, log_status=\"OK\"]'\n)\n\nflow_log_reader = FlowLogsReader('flowlog_group', filter_pattern=FILTER_PATTERN)\nrecords = list(flow_log_reader)\nprint(len(records))\n```\n\nRetrieve logs from a list of regions:\n\n```python\nfrom flowlogs_reader import S3FlowLogsReader\n\nreader = S3FlowLogsReader('example-bucket/optional-prefix', include_regions=['us-east-1', 'us-east-2'])\nrecords = list(reader)\nprint(len(records))\n```\n\nYou may aggregate records with the `aggregated_records` function.\nPass in a `FlowLogsReader` or 
`S3FlowLogsReader` object and optionally a `key_fields` tuple.\nPython `dict` objects will be yielded representing the aggregated flow records.\nBy default the typical `('srcaddr', 'dstaddr', 'srcport', 'dstport', 'protocol')` will be used.\nThe `start`, `end`, `packets`, and `bytes` items will be aggregated.\n\n```python\nflow_log_reader = FlowLogsReader('flowlog_group')\nkey_fields = ('srcaddr', 'dstaddr')\nrecords = list(aggregated_records(flow_log_reader, key_fields=key_fields))\n```\n\nThe number of bytes processed after iterating is available in the `bytes_processed` attribute.\nFor `S3FlowLogsReader` instances there is also a `compressed_bytes_processed` attribute.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fobsrvbl-oss%2Fflowlogs-reader","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fobsrvbl-oss%2Fflowlogs-reader","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fobsrvbl-oss%2Fflowlogs-reader/lists"}