{"id":13942589,"url":"https://github.com/AWS-Big-Data-Projects/AWS-Data-Lake","last_synced_at":"2025-07-20T06:31:35.351Z","repository":{"id":46594019,"uuid":"284423968","full_name":"AWS-Big-Data-Projects/AWS-Data-Lake","owner":"AWS-Big-Data-Projects","description":"AWS Lake Formation makes it easy for you to set up, secure, and manage your data lakes also data discovery using the metadata search capabilities of Lake Formation in the console, and metadata search results restricted by column permissions.","archived":false,"fork":false,"pushed_at":"2020-09-11T05:21:53.000Z","size":18,"stargazers_count":17,"open_issues_count":5,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-19T11:28:40.152Z","etag":null,"topics":["aws-s3","datalake"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AWS-Big-Data-Projects.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-08-02T08:43:02.000Z","updated_at":"2025-05-18T21:23:43.000Z","dependencies_parsed_at":"2022-09-23T21:11:55.262Z","dependency_job_id":null,"html_url":"https://github.com/AWS-Big-Data-Projects/AWS-Data-Lake","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AWS-Big-Data-Projects/AWS-Data-Lake","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AWS-Big-Data-Projects%2FAWS-Data-Lake","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AWS-Big-Data-Projects%2FAWS-Data-Lake/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AWS-Big-Data-Pro
jects%2FAWS-Data-Lake/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AWS-Big-Data-Projects%2FAWS-Data-Lake/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AWS-Big-Data-Projects","download_url":"https://codeload.github.com/AWS-Big-Data-Projects/AWS-Data-Lake/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AWS-Big-Data-Projects%2FAWS-Data-Lake/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266076350,"owners_count":23872741,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws-s3","datalake"],"created_at":"2024-08-08T02:01:56.703Z","updated_at":"2025-07-20T06:31:35.108Z","avatar_url":"https://github.com/AWS-Big-Data-Projects.png","language":null,"funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# AWS-Data-Lake\n\n AWS Lake Formation makes it easy for you to set up, secure, and manage your data lakes also  data discovery using the metadata search capabilities of Lake Formation in the console, and metadata search results restricted by column permissions.\n\n\n![image](https://user-images.githubusercontent.com/48589838/77393674-8e335300-6dc3-11ea-9857-8c44eae11188.png)\n\n\n\n![image](https://user-images.githubusercontent.com/48589838/77393709-a60ad700-6dc3-11ea-80c3-aeca60ee7d45.png)\n\n\n\n![image](https://user-images.githubusercontent.com/48589838/77393859-f7b36180-6dc3-11ea-9542-b2b58b5515a7.png)\n\n\n\n## Steps\n\n### Create the data lake\n\nIn the AWS Lake Formation console, in the left 
navigation pane, choose Register and ingest, then Data lake locations. Select a single S3 bucket to house several independent data sources in your data lake.\n\n### Add data to your data lake\n\nNow that you have an S3 bucket configured as a storage resource for Lake Formation, you must add data to your data lake. You can add data to your data lake’s S3 bucket storage resource using AWS SDKs, the AWS CLI, the S3 console, or a Lake Formation blueprint.\n\nWith Lake Formation, you can discover and set up the ingestion of your source data. When you add a workflow that loads or updates the data lake, you can choose a blueprint, which is a template for the type of importer to add. Lake Formation provides several blueprints on the Lake Formation console for common source data types to simplify the creation of workflows. Workflows point to your data source and target and specify how often they run.\n\n### Sample datasets\n\n- New York City Taxi and Limousine Commission (TLC) Trip Record Data\n- Amazon Customer Reviews\n\n### Add Amazon customer reviews to your data lake\n\n### Add New York taxi ride history to your data lake\n\n### Create catalog databases\n\nDefine three logical databases:\n\n- amazon-reviews-prod\n- amazon-reviews-test\n- ny-taxi\n\n### Add tables from S3 to your catalog databases\n\n### Metadata search in the console\n\n- Search by classification\n- Search by keyword\n- Search by tag: attribute\n- Multiple filter searches\n- Metadata search results restricted by column permissions\n\n###### Reference material: https://aws.amazon.com/blogs/big-data/discovering-metadata-with-aws-lake-formation-part-1/\n###### 
https://aws.amazon.com/blogs/big-data/discover-metadata-with-aws-lake-formation-part-2/\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAWS-Big-Data-Projects%2FAWS-Data-Lake","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAWS-Big-Data-Projects%2FAWS-Data-Lake","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAWS-Big-Data-Projects%2FAWS-Data-Lake/lists"}