https://github.com/cch0/price-transparency-data
Source code for processing insurance price transparency data
- Host: GitHub
- URL: https://github.com/cch0/price-transparency-data
- Owner: cch0
- License: apache-2.0
- Created: 2025-01-04T00:04:00.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-04T00:42:59.000Z (over 1 year ago)
- Last Synced: 2025-02-23T23:15:03.470Z (over 1 year ago)
- Topics: aws-glue, big-data, data-engineering, polars, pyspark
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Price Transparency Data
This repository contains the source code for the blog post series titled "A Practical Take On Processing Price Transparency Data":
- [Part A](https://medium.com/@CCH0/a-practical-take-on-processing-price-transparency-data-part-a-870619c8d48d)
- [Part B](https://medium.com/@CCH0/a-practical-take-on-processing-price-transparency-data-part-b-2d8707ab1522)
- [Part C](https://medium.com/@CCH0/a-practical-take-on-processing-price-transparency-data-part-c-a326e7d99704)
In Part A, we talk about downloading the source machine-readable data file using a Python script running on AWS Lambda.
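
As a rough illustration of the Part A idea, here is a minimal sketch of a Lambda handler that streams a remote file into S3. It is not the code from this repository: the event fields `url` and `bucket`, the `raw` key prefix, the helper `object_key_for`, and the choice of S3 as the destination are all assumptions made for the example.

```python
import posixpath
import urllib.request
from urllib.parse import urlparse


def object_key_for(url: str, prefix: str = "raw") -> str:
    """Derive an S3 object key from the file name in the URL path (hypothetical helper)."""
    filename = posixpath.basename(urlparse(url).path)
    if not filename:
        raise ValueError(f"URL has no file name: {url}")
    return f"{prefix}/{filename}"


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime; importing it here keeps
    # object_key_for usable on machines where boto3 is not installed.
    import boto3

    url = event["url"]        # assumed event field, not from the repository
    bucket = event["bucket"]  # assumed event field, not from the repository
    key = object_key_for(url)

    s3 = boto3.client("s3")
    # Pass the HTTP response (a file-like object) to upload_fileobj so the
    # body is read in chunks instead of being loaded into memory at once.
    with urllib.request.urlopen(url) as response:
        s3.upload_fileobj(response, bucket, key)

    return {"bucket": bucket, "key": key}
```

Streaming matters here because machine-readable files can be very large, while a Lambda function has limited memory and a maximum run time of 15 minutes.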
In Part B, we talk about using a Polars script to pre-process the source data and store it in Parquet format in S3.
In Part C, we talk about using both PySpark and Polars scripts to produce denormalized data partitioned by `billing_code` and store the final data in Parquet format in S3.
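
To make "denormalized data partitioned by `billing_code`" more concrete, the sketch below shows the idea in plain Python rather than PySpark or Polars. It is an illustration under assumptions, not the repository's code: the field names follow the CMS in-network rates layout as commonly published, and the sample records, the `output` root, and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample shaped like the "in_network" array of a rates file.
sample_in_network = [
    {
        "billing_code": "99213",
        "billing_code_type": "CPT",
        "negotiated_rates": [
            {
                "negotiated_prices": [
                    {"negotiated_type": "negotiated", "negotiated_rate": 85.0,
                     "billing_class": "professional"},
                    {"negotiated_type": "negotiated", "negotiated_rate": 120.0,
                     "billing_class": "institutional"},
                ]
            }
        ],
    },
    {
        "billing_code": "27447",
        "billing_code_type": "CPT",
        "negotiated_rates": [
            {
                "negotiated_prices": [
                    {"negotiated_type": "negotiated", "negotiated_rate": 15000.0,
                     "billing_class": "institutional"},
                ]
            }
        ],
    },
]


def denormalize(in_network):
    """Yield one flat row per (item, negotiated rate, negotiated price)."""
    for item in in_network:
        for rate in item.get("negotiated_rates", []):
            for price in rate.get("negotiated_prices", []):
                yield {
                    "billing_code": item["billing_code"],
                    "billing_code_type": item.get("billing_code_type"),
                    "negotiated_type": price.get("negotiated_type"),
                    "negotiated_rate": price.get("negotiated_rate"),
                    "billing_class": price.get("billing_class"),
                }


def partition_by_billing_code(rows, root="output"):
    """Group rows under Hive-style partition paths, one per billing_code."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[f"{root}/billing_code={row['billing_code']}"].append(row)
    return dict(partitions)


rows = list(denormalize(sample_in_network))
partitions = partition_by_billing_code(rows)
for path, part_rows in sorted(partitions.items()):
    print(path, len(part_rows))
```

The `billing_code=<value>` directory naming mirrors the layout that PySpark's `DataFrameWriter.partitionBy` produces when writing Parquet, which lets a query for a single billing code read only that partition's files.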