https://github.com/AWS-Big-Data-Projects/AWS-Data-Lake
AWS Lake Formation makes it easy to set up, secure, and manage your data lakes. This project also covers data discovery using the metadata search capabilities of the Lake Formation console, including metadata search results restricted by column permissions.
- Host: GitHub
- URL: https://github.com/AWS-Big-Data-Projects/AWS-Data-Lake
- Owner: AWS-Big-Data-Projects
- License: apache-2.0
- Created: 2020-08-02T08:43:02.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-09-11T05:21:53.000Z (over 4 years ago)
- Last Synced: 2024-08-09T02:19:35.947Z (8 months ago)
- Topics: aws-s3, datalake
- Size: 17.6 KB
- Stars: 16
- Watchers: 2
- Forks: 3
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# AWS-Data-Lake
AWS Lake Formation makes it easy to set up, secure, and manage your data lakes. This project also covers data discovery using the metadata search capabilities of the Lake Formation console, including metadata search results restricted by column permissions.



## Steps
### Create the data lake
In the AWS Lake Formation console, in the left navigation pane, choose Register and ingest, then Data lake locations. Register a single S3 bucket to house several independent data sources in your data lake.
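
The same registration can probably be scripted instead of clicked through. The following is a minimal sketch, assuming the boto3 `lakeformation` client and its `register_resource` call; the bucket name, the helper names, and the choice of the service-linked role are illustrative assumptions, not something this repository prescribes.

```python
from typing import Any, Dict, Optional


def build_register_request(bucket: str, prefix: str = "") -> Dict[str, Any]:
    """Build the arguments for Lake Formation's RegisterResource call."""
    clean_prefix = prefix.strip("/")
    path = f"{bucket}/{clean_prefix}" if clean_prefix else bucket
    return {
        "ResourceArn": f"arn:aws:s3:::{path}",
        # Assumption: the account relies on the Lake Formation service-linked role.
        "UseServiceLinkedRole": True,
    }


def register_data_lake_location(bucket: str, client: Optional[Any] = None) -> Dict[str, Any]:
    """Register the bucket as a data lake location and return the raw response."""
    if client is None:
        import boto3  # third-party; needs AWS credentials and a default region

        client = boto3.client("lakeformation")
    return client.register_resource(**build_register_request(bucket))
```

Calling `register_data_lake_location("my-data-lake-bucket")` (a hypothetical bucket name) would register the whole bucket; passing a `prefix` to `build_register_request` narrows the location to one folder.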
### Add data to your data lake
Now that you have an S3 bucket configured as a storage resource for Lake Formation, you must add data to your data lake. You can add data to the bucket using the AWS SDKs, the AWS CLI, the S3 console, or a Lake Formation blueprint.

With Lake Formation, you can discover your source data and set up its ingestion. When you add a workflow that loads or updates the data lake, you choose a blueprint, which is a template for the type of importer to add. Lake Formation provides several blueprints on its console for common source data types to simplify the creation of workflows. Workflows point to your data source and target and specify how often they run.
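
For the SDK route, a small upload helper might look like the sketch below. It assumes the boto3 `s3` client and its `upload_file` method; the one-prefix-per-dataset key layout, the local directory, and the helper names are hypothetical choices made for illustration.

```python
from pathlib import Path
from typing import Any, List, Optional, Tuple


def plan_uploads(local_dir: str, dataset_prefix: str) -> List[Tuple[str, str]]:
    """Return (local_path, s3_key) pairs for every file under local_dir."""
    root = Path(local_dir)
    prefix = dataset_prefix.strip("/")
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keep the local folder structure below the dataset prefix.
            key = f"{prefix}/{path.relative_to(root).as_posix()}"
            plan.append((str(path), key))
    return plan


def upload_dataset(local_dir: str, bucket: str, dataset_prefix: str,
                   client: Optional[Any] = None) -> int:
    """Upload every planned file and return how many were sent."""
    if client is None:
        import boto3  # third-party; needs AWS credentials

        client = boto3.client("s3")
    plan = plan_uploads(local_dir, dataset_prefix)
    for local_path, key in plan:
        client.upload_file(local_path, bucket, key)
    return len(plan)
```

Giving each data source its own prefix keeps several independent sources separable inside the single bucket, which is the layout the steps above rely on.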
### Sample datasets
- New York City Taxi and Limousine Commission (TLC) Trip Record Data
- Amazon Customer Reviews
### Add Amazon customer reviews to your data lake
### Add New York taxi ride history to your data lake
### Create catalog databases
Define three logical databases:
- amazon-reviews-prod
- amazon-reviews-test
- ny-taxi
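
The three databases listed above could also be created programmatically. This is a sketch only, assuming the boto3 `glue` client and its `create_database` call; the description text and helper names are made up for the example.

```python
from typing import Any, Dict, List, Optional

# Database names taken from the list above.
DATABASES = ["amazon-reviews-prod", "amazon-reviews-test", "ny-taxi"]


def build_database_requests(names: List[str]) -> List[Dict[str, Any]]:
    """Build one CreateDatabase request per logical database."""
    return [
        {"DatabaseInput": {"Name": name, "Description": f"Data lake database {name}"}}
        for name in names
    ]


def create_catalog_databases(names: List[str], client: Optional[Any] = None) -> List[str]:
    """Create each database in the Data Catalog and return the names created."""
    if client is None:
        import boto3  # third-party; needs AWS credentials and a default region

        client = boto3.client("glue")
    created = []
    for request in build_database_requests(names):
        client.create_database(**request)
        created.append(request["DatabaseInput"]["Name"])
    return created
```

Running `create_catalog_databases(DATABASES)` twice would fail on the second run because the databases already exist, so a real script would likely need to handle that case.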
### Add tables from S3 to your catalog databases
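
One common way to add tables from S3 is an AWS Glue crawler that scans a prefix and writes table definitions into a catalog database. The README does not say which mechanism it uses, so treat the sketch below as one possible approach: it assumes the boto3 `glue` client with `create_crawler` and `start_crawler`, and the role ARN, S3 path, and crawler naming scheme are hypothetical.

```python
from typing import Any, Dict, Optional


def build_crawler_request(database: str, s3_path: str, role_arn: str) -> Dict[str, Any]:
    """Build a CreateCrawler request that targets one S3 path."""
    return {
        "Name": f"{database}-crawler",  # hypothetical naming scheme
        "Role": role_arn,  # IAM role the crawler assumes to read the data
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def crawl_into_database(database: str, s3_path: str, role_arn: str,
                        client: Optional[Any] = None) -> str:
    """Create the crawler, start it, and return its name."""
    if client is None:
        import boto3  # third-party; needs AWS credentials and a default region

        client = boto3.client("glue")
    request = build_crawler_request(database, s3_path, role_arn)
    client.create_crawler(**request)
    client.start_crawler(Name=request["Name"])
    return request["Name"]
```

For example, `crawl_into_database("ny-taxi", "s3://my-data-lake-bucket/ny-taxi/", role_arn)` would populate the `ny-taxi` database once the crawler run finishes; the bucket name here is a placeholder.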
### Metadata search in the console
- Search by classification
- Search by keyword
- Search by tag: attribute
- Multiple filter searches
- Metadata search results restricted by column permissions

###### Reference material: https://aws.amazon.com/blogs/big-data/discovering-metadata-with-aws-lake-formation-part-1/
###### https://aws.amazon.com/blogs/big-data/discover-metadata-with-aws-lake-formation-part-2/
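
The console searches listed under "Metadata search in the console" can likely be approximated from code with the AWS Glue `SearchTables` API. The sketch below assumes the boto3 `glue` client and its `search_tables` call; the helper names are invented, and which filter keys the service accepts (for example a `classification` key mirroring the console) is an assumption that should be checked against the API documentation.

```python
from typing import Any, Dict, List, Optional


def build_search_request(text: Optional[str] = None,
                         filters: Optional[Dict[str, str]] = None,
                         max_results: int = 50) -> Dict[str, Any]:
    """Build a SearchTables request from a keyword and key/value filters."""
    request: Dict[str, Any] = {"MaxResults": max_results}
    if text:
        request["SearchText"] = text  # keyword search
    if filters:
        # Several filters in one request narrow the result set together.
        request["Filters"] = [{"Key": k, "Value": v} for k, v in filters.items()]
    return request


def search_table_names(text: Optional[str] = None,
                       filters: Optional[Dict[str, str]] = None,
                       client: Optional[Any] = None) -> List[str]:
    """Return matching tables as 'database.table' strings (first page only)."""
    if client is None:
        import boto3  # third-party; needs AWS credentials and a default region

        client = boto3.client("glue")
    response = client.search_tables(**build_search_request(text, filters))
    return [f'{t["DatabaseName"]}.{t["Name"]}' for t in response.get("TableList", [])]
```

Because results are restricted by Lake Formation permissions, two users running the same search may see different tables, and columns a user cannot access should not contribute to that user's matches.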