https://github.com/epomatti/az-synapse
Creating and loading data to Azure Synapse
- Host: GitHub
- URL: https://github.com/epomatti/az-synapse
- Owner: epomatti
- License: mit
- Created: 2023-05-19T20:11:08.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-07T15:13:09.000Z (about 1 year ago)
- Last Synced: 2025-01-17T18:34:41.036Z (9 months ago)
- Topics: azure, azure-synapse-analytics, datawarehouse, pluralsight, terraform
- Language: HCL
- Homepage:
- Size: 31.3 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Azure Synapse
## 1 - Create the infrastructure
Run this code to create the infrastructure resources:
```sh
terraform -chdir="infra" init
terraform -chdir="infra" apply -auto-approve
```

After the deployment completes, open the Synapse workspace in the Azure portal and enable the firewall setting that allows Azure services and resources to access the workspace.
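If you prefer to script that manual step, the same setting can usually be applied with the Azure CLI; the special rule name `AllowAllWindowsAzureIps` with the `0.0.0.0` range is what the portal's "Allow Azure services" toggle creates. The workspace and resource group names below are placeholders:

```sh
# Placeholder names; substitute your own workspace and resource group.
az synapse workspace firewall-rule create \
  --name AllowAllWindowsAzureIps \
  --workspace-name <workspace-name> \
  --resource-group <resource-group> \
  --start-ip-address 0.0.0.0 \
  --end-ip-address 0.0.0.0
```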
## 2 - Load User
Create a dedicated data-loading account so that loads can run at maximum performance.
Create the login, plus a matching user for monitoring, in the `master` database:
```sql
CREATE LOGIN LoadUser WITH PASSWORD = 'This!s@StrongPW';
CREATE USER LoadUser FOR LOGIN LoadUser;
```

Create the user in the data warehouse database:
```sql
CREATE USER LoadUser FOR LOGIN LoadUser;
GRANT CONTROL ON DATABASE::[syndpdatamountain] to LoadUser;
EXEC sp_addrolemember 'staticrc20', 'LoadUser';
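-- Optional sanity check (assumes the LoadUser name above): list role
-- membership to confirm the resource-class assignment took effect.
SELECT dp_role.name AS role_name, dp_member.name AS member_name
FROM sys.database_role_members drm
JOIN sys.database_principals dp_role
    ON drm.role_principal_id = dp_role.principal_id
JOIN sys.database_principals dp_member
    ON drm.member_principal_id = dp_member.principal_id
WHERE dp_member.name = 'LoadUser';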
```

## 3 - Create the external objects
Connect to the DW database with the new user and create the objects:
```sql
CREATE MASTER KEY;

CREATE EXTERNAL DATA SOURCE NYTPublic
WITH
(
TYPE = Hadoop,
LOCATION = 'wasbs://2013@nytaxiblob.blob.core.windows.net/'
);

CREATE EXTERNAL FILE FORMAT uncompressedcsv
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '',
DATE_FORMAT = '',
USE_TYPE_DEFAULT = False
)
);
CREATE EXTERNAL FILE FORMAT compressedcsv
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS ( FIELD_TERMINATOR = '|',
STRING_DELIMITER = '',
DATE_FORMAT = '',
USE_TYPE_DEFAULT = False
),
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.GzipCodec'
);

CREATE SCHEMA ext;
```

Now we're ready to create the tables and load the data.
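Before moving on, you can verify that the external objects exist using the standard catalog views available in dedicated SQL pools:

```sql
-- List the external data sources and file formats created above.
SELECT name, location FROM sys.external_data_sources;
SELECT name, format_type FROM sys.external_file_formats;
```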
## 4 - Data Load
1. Execute the commands in the [`nyctaxy_schema.sql`](./sql/nyctaxy_schema.sql) file to create the external tables.
2. Execute the commands in the [`nyctaxy_load.sql`](./sql/nyctaxy_load.sql) file to load the data.
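For orientation, the external tables in `nyctaxy_schema.sql` follow this general shape; the column list below is a hypothetical sketch, and the real definitions live in the file:

```sql
-- Hypothetical sketch only; see sql/nyctaxy_schema.sql for the real columns.
CREATE EXTERNAL TABLE ext.[Date]
(
    [DateID] int NOT NULL,
    [Date] datetime NULL
)
WITH
(
    LOCATION = 'Date',              -- folder under the data source root
    DATA_SOURCE = NYTPublic,        -- external data source from step 3
    FILE_FORMAT = uncompressedcsv,  -- file format from step 3
    REJECT_TYPE = value,
    REJECT_VALUE = 0
);
```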
To monitor the data load:
```sql
SELECT
r.command,
s.request_id,
r.status,
count(distinct input_name) as nbr_files,
sum(s.bytes_processed)/1024/1024/1024.0 as gb_processed
FROM
sys.dm_pdw_exec_requests r
INNER JOIN sys.dm_pdw_dms_external_work s
ON r.request_id = s.request_id
WHERE
r.[label] = 'CTAS : Load [dbo].[Date]' OR
r.[label] = 'CTAS : Load [dbo].[Geography]' OR
r.[label] = 'CTAS : Load [dbo].[HackneyLicense]' OR
r.[label] = 'CTAS : Load [dbo].[Medallion]' OR
r.[label] = 'CTAS : Load [dbo].[Time]' OR
r.[label] = 'CTAS : Load [dbo].[Weather]' OR
r.[label] = 'CTAS : Load [dbo].[Trip]'
GROUP BY
r.command,
s.request_id,
r.status
ORDER BY
nbr_files desc,
gb_processed desc;
```

View the system queries:
```sql
SELECT * FROM sys.dm_pdw_exec_requests;
```
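The `label` filter in the monitoring query works because each CTAS statement in the load script tags itself with a query label via `OPTION (LABEL = ...)`. Assuming the load script follows that convention, a load statement looks roughly like this (the distribution and index choices here are illustrative):

```sql
-- Sketch of a labeled CTAS load; the label must match the monitoring query.
CREATE TABLE [dbo].[Date]
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM [ext].[Date]
OPTION (LABEL = 'CTAS : Load [dbo].[Date]');
```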