{"id":20789077,"url":"https://github.com/timoguin/aws-data-tools-py","last_synced_at":"2025-05-05T18:45:31.142Z","repository":{"id":37185349,"uuid":"347561746","full_name":"timoguin/aws-data-tools-py","owner":"timoguin","description":"A Python library for querying and transforming data from AWS APIs","archived":false,"fork":false,"pushed_at":"2025-01-06T20:35:52.000Z","size":516,"stargazers_count":4,"open_issues_count":6,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-02T12:11:45.651Z","etag":null,"topics":["aws","aws-organizations","cli","etl","library"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/timoguin.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":".github/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":".github/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-14T06:28:47.000Z","updated_at":"2024-08-19T17:20:22.000Z","dependencies_parsed_at":"2024-02-10T14:29:10.681Z","dependency_job_id":"48b7136e-f3db-4078-99f7-db6a136cff9a","html_url":"https://github.com/timoguin/aws-data-tools-py","commit_stats":{"total_commits":23,"total_committers":2,"mean_commits":11.5,"dds":"0.17391304347826086","last_synced_commit":"b403ffad9a0aa1092551679231c56f13f73eaebf"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":"timoguin/repo-template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timoguin%2Faws-data-tools-py","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timoguin%2Faws-data-tools-py/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timoguin%2Faws-data-tools-py/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timoguin%2Faws-data-tools-py/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/timoguin","download_url":"https://codeload.github.com/timoguin/aws-data-tools-py/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252555670,"owners_count":21767207,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","aws-organizations","cli","etl","library"],"created_at":"2024-11-17T15:19:27.006Z","updated_at":"2025-05-05T18:45:31.111Z","avatar_url":"https://github.com/timoguin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AWS Data Tools\n\n\u003c!-- Badges --\u003e\n[![Actions CI Status][gh-actions-ci-badge]][gh-actions-ci-link]\n[![Actions CodeQL Status][gh-actions-codeql-badge]][gh-actions-codeql-link]\n[![PyPI][pypi-badge]][pypi-link]\n[![License][license-badge]][license-link]\n\nA set of opinioned (but flexible) Python libraries for querying and transforming data\nfrom various AWS APIs, as well as a CLI interface.\n\nThis is in early development.\n\n## Installation\n\nUsing pip should work on any system with at least Python 3.9:\n\n```\n$ pip install aws-data-tools\n```\n\nBy default, the CLI is not installed. To include it, you can specify it as an extra:\n\n```\n$ pip install aws-data-tools[cli]\n```\n\nGraphviz is also an optional dependency that is required for outputing an AWS\nOrganization as a DOT-formatted file that can be used to generate a digraph:\n\n```\n$ pip install aws-data-tools[graphviz]\n```\n\nTo install everything, you can specify \"all\" as an extra:\n\n```\n$ pip install aws-data-tools[all]\n```\n\n## Quickstart\n\nThe quickest entrypoints are using the data builders and the CLI.\n\nTo dump a data representation of an AWS Organization, you can do the following using\nthe builder:\n\n```python\nfrom aws_data_tools.models.organizations import OrganizationDataBuilder\n\nodb = OrganizationDataBuilder(init_all=True)\norganization = odb.as_json()\n```\n\nHere is how to do the same thing with the CLI:\n\n```\n$ awsdata organization dump-all\n```\n\n## Usage\n\nThere are currently 4 main components of the package: helpers for working with AWS\nsession and APIs, data models for API data types, builders to query AWS APIs and\nperform deserialization and ETL operations of raw data, and a CLI tool to further\nabstract some of these operations.\n\n### Builders\n\nWhile it is possible to directly utilize and interact with the data models, probably\nthe largest benefit are the builders. They abstract any API operations and data\ntransformations required to build data models. The models can then be serialized to\ndicts, as well as DOT, JSON, or YAML strings.\n\nA full model of an AWS Organization can be constructed using the\n`OrganizationDataBuilder` class. It handles recursing the organizational tree and\npopulating any relational data between the various nodes, e.g., parent-child\nrelationships between an OU and an account.\n\nThe simplest example pulls all supported organizational data and creates the related\ndata models:\n\n```python\nfrom aws_data_tools.models.organizations import OrganizationDataBuilder\n\nodb = OrganizationDataBuilder(init_all=True)\n```\n\nNote that this makes many API calls to get this data. For example, every OU, policy,\nand account requires an API call to pull any associated tags, so every node requires at\nleast `n+3` API calls. Parallel operations are not supported, so everything runs\nserially.\n\nTo get a sense of the number of API calls required to populate organization data, an\norganization with 50 OUs, 5 policies, 200 accounts, and with all policy types activated\nrequires 316 API calls! That's why this library was created.\n\nFor more control over the process, you can init each set of components as desired:\n\n```python\nfrom aws_data_tools.models.organizations import OrganizationDataBuilder\n\norg = OrganizationDataBuilder()\norg.init_connection()\norg.init_organization()\norg.init_root()\norg.init_policies()\norg.init_policy_tags()\norg.init_ous()\norg.init_ou_tags()\norg.init_accounts()\norg.init_account_tags()\norg.init_policy_targets()\norg.init_effective_policies()\n```\n\n### CLI\n\nAs noted above, the CLI is an optional component that can be installed using pip's\nbracket notation for extras:\n\n```\n$ pip install aws-data-tools[cli]\n```\n\nWith no arguments or flags, help content is displayed by default. You can also pass the\n`--help` flag for the help content of any commands or subcommands.\n\n```\n$ awsdata\nUsage: awsdata [OPTIONS] COMMAND [ARGS]...\n\n  A command-line tool to interact with data from AWS APIs\n\nOptions:\n  --version    Show the version and exit.\n  -d, --debug  Enable debug mode\n  -h, --help   Show this message and exit.\n\nCommands:\n  organization  Interact with data from AWS Organizations APIs\n```\n\nHere is how to dump a JSON representation of an AWS Organization to stdout:\n\nThe `organization` subcommand allows dumping organization data to a file or to stdout:\n\n```\n$ awsdata organization dump-json --format json\nUsage: awsdata organization dump-json [OPTIONS]\n\n  Dump a JSON representation of the organization\n\nOptions:\n  --no-accounts             Exclude account data from the model\n  --no-policies             Exclude policy data from the model\n  -f, --format [JSON|YAML]  The output format for the data\n  -o, --out-file TEXT       File path to write data instead of stdout\n  -h, --help                Show this message and exit.\n```\n\nIt also supports looking up details about individual accounts:\n\n```\n$ awsdata organization lookup-accounts --help\nUsage: awsdata organization lookup-accounts [OPTIONS]\n\n  Query for account details using a list of account IDs\n\nOptions:\n  -a, --accounts TEXT           A space-delimited list of account IDs\n                                [required]\n  --include-effective-policies  Include effective policies for the accounts\n  --include-parents             Include parent data for the accounts\n  --include-tags                Include tags applied to the accounts\n  --include-policies            Include policies attached to the accounts\n  -h, --help                    Show this message and exit.\n```\n\n### API Client\n\nThe [APIClient](aws_data_models/client.py) class wraps the initialization of a boto3\nsession and a low-level client for a named service. It contains a single `api()`\nfunction that takes the name of an API operation and any necessary request data as\nkwargs.\n\nIt supports automatic pagination of any API operations that support it. The pagination\nconfig is set to `{'MaxItems': 500}` by default, but a `pagination_config` dict can be\npassed for any desired customizations.\n\nWhen initializing the class, it will create a session and a client.\n\n```python\nfrom aws_data_tools.client import APIClient\n\nclient = APIClient(\"organizations\")\norg = client.api(\"describe_organization\").get(\"organization\")\nroots = client.api(\"list_roots\")\nous = client.api(\"list_organizational_units_for_parent\", parent_id=\"r-abcd\").get(\n    \"organizational_units\"\n)\n```\n\nNote that, generally, any list operations will return a list with no further filtering\nrequired, while describe calls will have the data keyed under the name of the object\nbeing described. For example, describing an organization returns the relavant data\nunder an `organization` key.\n\nFurthermore, you may notice above that API operations and their corresponding arguments\nsupport `snake_case` format. Arguments can also be passed in the standard `PascalCase`\nformat that the APIs utilize. Any returned data has any keys converted to `snake_case`.\n\nThe raw boto3 session is available as the `session` field, and the raw, low-level\nclient is available as the `client` field.\n\n### Data Models\n\nThe [models](aws_data_tools/models) package contains a collection of opinionated models\nimplemented as data classes. There is a package for each available service. Each one is\nnamed after the service that would be passed when creating a boto3 client using\n`boto3.client('service_name')`.\n\nMost data types used with the Organizations APIs are supported. The top-level\n`Organization` class is the most useful, as it also acts as a container for all other\nrelated data in the organization.\n\nThe following data types and operations are currently not supported:\n\n- Viewing organization handshakes (for creating and accepting account invitations)\n- Viewing the status of accounts creations\n- Viewing organization integrations with AWS services (for org-wide implementations of\n  things like CloudTrail, Config, etc.)\n- Viewing delegated accounts and services\n- Any operations that are not read-only\n\nAll other data types are supported.\n\n```python\nfrom aws_data_tools.client import APIClient\nfrom aws_data_tools.models.organizations import Organization\n\nclient = APIClient(\"organizations\")\ndata = client.api(\"describe_organization\").get(\"organization\")\norg = Organization(**data)\norg.as_json()\n```\n\nView the [package](aws_data_tools/models/organization/__init__.py) for the full list of\nmodels.\n\n## Roadmap\n\nThe goal of this package is to provide consistent, enriched schemas for data from both\nraw API calls and data from logged events. We should also be able to unwrap and parse\ndata from messaging and streaming services like SNS, Kinesis, and EventBridge.\n\nHere are some examples:\n\n- Query Organizations APIs to build consistent, denormalized models of organizations\n- Validate and enrich data from CloudTrail log events\n- Parse S3 and ELB access logs into JSON\n\nThis initial release only contains support for managing data from AWS Organizations\nAPIs.\n\nThe following table shows what kinds of things may be supported in the future:\n\n| Library Name  | Description                                                       | Data Type | Data Sources                                                  | Supported |\n|---------------|-------------------------------------------------------------------|-----------|---------------------------------------------------------------|-----------|\n| organizations | Organization and OU hierarchy, policies, and accounts             | API       | Organizations APIs                                            | ☑         |\n| cloudtrail    | Service API calls recorded by CloudTrail                          | Log       | S3 / SNS / SQS / CloudWatch Logs / Kinesis / Kinesis Firehose | ☐         |\n| s3            | Access logs for S3 buckets                                        | Log       | S3 / SNS / SQS                                                | ☐         |\n| elb           | Access logs from Classic, Application, and Network Load Balancers | Log       | S3 / SNS / SQS                                                | ☐         |\n| vpc_flow      | Traffic logs from VPCs                                            | Log       | S3 / CloudWatch Logs / Kinesis / Kinesis Firehose             | ☐         |\n| config        | Resource state change events from AWS Config                      | Log       | S3 / SNS / SQS                                                | ☐         |\n| firehose      | Audit logs for Firehose delivery streams                          | Log       | CloudWatch Logs / Kinesis / Kinesis Firehose                  | ☐         |\n| ecs           | Container state change events                                     | Log       | CloudWatch Events / EventBridge                               | ☐         |\n| ecr           | Repository events for stored images                               | Log       | CloudWatch Events / EventBridge                               | ☐         |\n\nReferences:\n\n- CloudWatch Logs: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/aws-services-sending-logs.html\n- CloudWatch Events: https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html\n\n## Contributing\n\nView the [Contributing Guide](.github/CONTRIBUTING.md) to learn about giving back.\n\n\n\u003c!-- Markown anchors --\u003e\n[gh-actions-ci-badge]: https://github.com/timoguin/aws-data-tools-py/actions/workflows/ci.yml/badge.svg\n[gh-actions-ci-link]: https://github.com/timoguin/aws-data-tools-py/actions/workflows/ci.yml\n[gh-actions-codeql-badge]: https://github.com/timoguin/aws-data-tools-py/actions/workflows/codeql-analysis.yml/badge.svg\n[gh-actions-codeql-link]: https://github.com/timoguin/aws-data-tools-py/actions/workflows/codeql-analysis.yml\n[license-badge]: https://img.shields.io/github/license/timoguin/aws-data-tools-py.svg\n[license-link]: https://github.com/timoguin/aws-data-tools-py/blob/main/LICENSE\n[pypi-badge]: https://badge.fury.io/py/aws-data-tools.svg\n[pypi-link]: https://pypi.python.org/pypi/aws-data-tools\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimoguin%2Faws-data-tools-py","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimoguin%2Faws-data-tools-py","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimoguin%2Faws-data-tools-py/lists"}