{"id":24012894,"url":"https://github.com/glassechidna/config2jsonlines","last_synced_at":"2025-10-28T12:43:01.250Z","repository":{"id":84300168,"uuid":"289415327","full_name":"glassechidna/config2jsonlines","owner":"glassechidna","description":"Transform AWS Config snapshots to a more AWS Athena-friendly format.","archived":false,"fork":false,"pushed_at":"2020-08-26T03:56:01.000Z","size":283,"stargazers_count":11,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-06-25T03:03:28.958Z","etag":null,"topics":["aws","aws-athena","aws-config","aws-sdk-go","golang","lambda"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/glassechidna.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-22T03:58:33.000Z","updated_at":"2025-05-02T07:00:40.000Z","dependencies_parsed_at":null,"dependency_job_id":"ef303fa6-5765-4098-a121-bcf1a3925e1a","html_url":"https://github.com/glassechidna/config2jsonlines","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/glassechidna/config2jsonlines","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glassechidna%2Fconfig2jsonlines","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glassechidna%2Fconfig2jsonlines/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glassechidna%2Fconfig2jsonlines/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/glassechidna%2Fconfig2jsonlines/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/glassechidna","download_url":"https://codeload.github.com/glassechidna/config2jsonlines/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/glassechidna%2Fconfig2jsonlines/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281441024,"owners_count":26501758,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-28T02:00:06.022Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","aws-athena","aws-config","aws-sdk-go","golang","lambda"],"created_at":"2025-01-08T06:22:51.774Z","updated_at":"2025-10-28T12:43:01.220Z","avatar_url":"https://github.com/glassechidna.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `config2jsonlines`\n\n[AWS Config](https://aws.amazon.com/config/) is a service that can record the\nconfiguration of all your resources in AWS. It can deliver configuration \"snapshots\"\non a regular schedule to an S3 bucket to allow you to do further analysis.\n\nThis naturally pairs well with [AWS Athena](https://aws.amazon.com/athena/), a\nservice that allows you to perform ad hoc SQL queries on files stored in S3. 
Athena\ncan query arbitrary JSON, so it _should_ have no problem with files generated\nby AWS Config, right? In fact, the AWS blog even has an article:\n[_How to query your AWS resource configuration states using AWS Config and Amazon Athena_.](https://aws.amazon.com/blogs/mt/how-to-query-your-aws-resource-configuration-states-using-aws-config-and-amazon-athena/)\n\nThe examples in the blog post work for small AWS accounts, but in large accounts the \nqueries will consistently fail. I don't know enough about Athena to be certain, but I suspect the \n`CROSS JOIN UNNEST(configurationitems)` part of the query is loading the entire\ndecompressed config snapshot into memory - which in my case is more than a gigabyte -\nand runs out of memory.\n\nTo work around this, I created an AWS Lambda-powered app that \"unnests\" the config\nsnapshot JSON in advance - rather than being a gigabyte-long JSON array on one line,\nit is instead represented in [JSON lines](http://jsonlines.org/). This format is\nsupported by AWS Athena and I've yet to write a query that fails in the same way \nthat the original files do. This is a visual depiction of the transformation:\n\n![](readme-picture.png)\n\nThere are two happy coincidental benefits to this as well:\n\n* The owner of the new S3 objects is the AWS account, so you have no problems\n  allowing cross-account Athena access to the files.\n* Querying the files no longer requires `CROSS JOIN UNNEST(configurationitems)`,\n  which I always found confusing.\n  \n## Deployment\n\nTODO.\n\n## Implementation\n\nSome people are into this sort of thing. \n\n![](implementation.png)\n\n1. AWS SAM doesn't support configuring S3 event notifications for a bucket defined\n   outside of the same template as the function. So this part currently needs to be\n   done manually. A CloudFormation custom resource would be nicer.\n\n2. 
It would be nice to be able to \"backfill\" historical Config snapshots that were uploaded\n   to the input bucket before this solution was deployed. (This is actually why I \n   went with the indirection of the SQS queue in the middle.)\n\n## FAQ\n\n**Why Go?**\n\nGiven that a ~40MB gzipped Config snapshot can decompress to 1GB+ and there will be\naccounts out there bigger than mine, I decided that this needed to be implemented\nin a \"streaming\" fashion, i.e. it can process a file without having to decompress\nor parse the entire JSON first. I didn't know how to do this in any other language.\n\n**Are you going to be embarrassed to find out that Athena works perfectly fine\nwith big Config snapshots and you just did something wrong?**\n\nWow, it's like you're in my head. But to answer your question: yes, it'll be \nembarrassing, but I'll have learned something new and I'll get to delete code,\nwhich is the only thing more pleasurable than writing code.\n\n**Should this instead have been implemented using AWS Glue?**\n\nI _think_ so? But I honestly can't wrap my head around Glue. I'd be delighted\nif someone else were to do so; I think I'd learn a lot from it.\n\n**Are those questions really _frequently_ asked?**\n\nNo, but I couldn't think of another format to relay that information. My apologies\nfor any deception.\n ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglassechidna%2Fconfig2jsonlines","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fglassechidna%2Fconfig2jsonlines","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglassechidna%2Fconfig2jsonlines/lists"}