https://github.com/cch0/price-transparency-data

Source code for processing insurance price transparency data
aws-glue big-data data-engineering polars pyspark


# Price Transparency Data

This repository contains the source code for the blog post series *A Practical Take On Processing Price Transparency Data*:

- [Part A](https://medium.com/@CCH0/a-practical-take-on-processing-price-transparency-data-part-a-870619c8d48d)
- [Part B](https://medium.com/@CCH0/a-practical-take-on-processing-price-transparency-data-part-b-2d8707ab1522)
- [Part C](https://medium.com/@CCH0/a-practical-take-on-processing-price-transparency-data-part-c-a326e7d99704)


In Part A, we discuss downloading the source machine-readable data file using an AWS Lambda function that runs a Python script.
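
The following is a minimal, illustrative sketch of such a download step. It is not the repository's code. It assumes the Lambda event carries the file's URL as `event["url"]`, which is a hypothetical event shape. It also assumes the file fits in the function's `/tmp` ephemeral storage, which is 512 MB by default and configurable up to 10 GB. Lambda also caps each invocation at 15 minutes, so very large files may need a different approach. The helper name `download_mrf` is made up for this example, and the upload to S3 is left out.

```python
import shutil
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Copy in 8 MiB chunks so a multi-gigabyte MRF never has to fit in memory.
CHUNK_SIZE = 8 * 1024 * 1024


def download_mrf(url: str, dest_dir: str = "/tmp") -> Path:
    """Stream the file at `url` into `dest_dir`, keeping its original file name."""
    name = PurePosixPath(urlparse(url).path).name  # ignores any signed query string
    dest = Path(dest_dir) / name
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, CHUNK_SIZE)
    return dest


def handler(event, context):
    """Lambda entry point; the {"url": ...} event shape is a hypothetical assumption."""
    path = download_mrf(event["url"])
    # Uploading `path` to S3 (e.g. with boto3's S3 client `upload_file`) is omitted here.
    return {"local_path": str(path), "size_bytes": path.stat().st_size}
```

Streaming with `shutil.copyfileobj` keeps memory use flat regardless of file size. The disk footprint, however, is still bounded by `/tmp`.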

In Part B, we use a Polars script to pre-process the source data and store it in Parquet format in S3.

In Part C, we use both PySpark and Polars scripts to produce denormalized data partitioned by `billing_code` and store the final data in Parquet format in S3.