https://github.com/nitor-infotech-oss/large-datafile-process-spark
- Host: GitHub
- URL: https://github.com/nitor-infotech-oss/large-datafile-process-spark
- Owner: nitor-infotech-oss
- Created: 2023-06-16T04:36:38.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-14T09:55:23.000Z (about 2 years ago)
- Last Synced: 2025-03-27T04:16:40.522Z (7 months ago)
- Language: Python
- Size: 506 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
### Handle Multiple-Header Files with Azure Key Vault Integration
#### Overview
This script downloads a file from Azure Blob Storage, authenticating with credentials retrieved from Azure Key Vault, and splits the file's data into multiple CSV files based on their headers.
#### Prerequisites
- An Azure Blob Storage account
- An Azure Key Vault holding the storage credentials as secrets
#### Instructions
1. Set up the environment configuration for the Azure Blob Storage account and the Key Vault credentials.
2. Install the Azure client libraries for Python (Blob Storage, Key Vault secrets, and identity).
3. Use the Blob Storage client to download the file, authenticating with the credentials retrieved from Key Vault (see the sketch after this list).
4. Read the downloaded file and split its rows into groups, one per header.
5. Upload the separated CSV files back to Azure Blob Storage.
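The download step might look like the minimal sketch below, after installing the client libraries (`pip install azure-identity azure-keyvault-secrets azure-storage-blob`). It assumes the storage connection string is stored in Key Vault under a secret; the environment variable and secret names here are illustrative, not taken from the repository:

```python
import os

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient
from azure.storage.blob import BlobServiceClient

# Illustrative environment variables; the repository may use different names.
VAULT_URL = os.environ["KEY_VAULT_URL"]  # e.g. https://<vault-name>.vault.azure.net
SECRET_NAME = os.environ.get("SECRET_NAME", "storage-connection-string")
CONTAINER = os.environ["BLOB_CONTAINER"]
BLOB_NAME = os.environ["BLOB_NAME"]

# Fetch the storage connection string from Key Vault.
credential = DefaultAzureCredential()
secret_client = SecretClient(vault_url=VAULT_URL, credential=credential)
conn_str = secret_client.get_secret(SECRET_NAME).value

# Download the blob to a local file using the retrieved connection string.
blob_service = BlobServiceClient.from_connection_string(conn_str)
blob_client = blob_service.get_blob_client(container=CONTAINER, blob=BLOB_NAME)
with open("input.csv", "wb") as fh:
    fh.write(blob_client.download_blob().readall())
```

`DefaultAzureCredential` resolves credentials from the environment (service principal variables, managed identity, or a local Azure CLI login), so the same code can run locally and in Azure without changes.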
#### Usage
1. Set the environment configuration variables for Azure Blob Storage and the Key Vault credentials.
2. Run the script to download the file and separate its data into multiple CSV files (a splitting sketch follows this list).
3. Upload the separated CSV files back to Azure Blob Storage.
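The splitting and re-upload steps might look like the sketch below, which reuses the `blob_service` client and `CONTAINER` variable from the download sketch above. The header-detection heuristic (a row with no purely numeric fields starts a new section) is an assumption for illustration; the repository does not document its actual rule:

```python
import csv

def split_by_header(path):
    """Group rows under each header row. A header is assumed to be a
    non-empty row containing no purely numeric fields (illustrative heuristic)."""
    sections, current = [], None
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            if row and not any(cell.replace(".", "", 1).isdigit() for cell in row):
                current = {"header": row, "rows": []}
                sections.append(current)
            elif current is not None:
                current["rows"].append(row)
    return sections

# Write each section to its own CSV and upload it back to Blob Storage.
# `blob_service` and `CONTAINER` are reused from the download sketch above.
for i, section in enumerate(split_by_header("input.csv")):
    out_name = f"part_{i}.csv"
    with open(out_name, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(section["header"])
        writer.writerows(section["rows"])
    with open(out_name, "rb") as fh:
        blob_service.get_blob_client(container=CONTAINER, blob=out_name).upload_blob(
            fh, overwrite=True
        )
```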