{"id":20716333,"url":"https://github.com/richclement/aws-data-lake-sdk","last_synced_at":"2025-05-10T22:31:58.639Z","repository":{"id":57188224,"uuid":"99370629","full_name":"richclement/aws-data-lake-sdk","owner":"richclement","description":"An sdk for the AWS data lake.","archived":true,"fork":false,"pushed_at":"2017-08-08T19:34:17.000Z","size":44,"stargazers_count":1,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-12T08:48:41.666Z","etag":null,"topics":["aws","datalake","sdk"],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/richclement.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-08-04T18:40:03.000Z","updated_at":"2024-10-25T16:24:20.000Z","dependencies_parsed_at":"2022-08-28T13:00:38.541Z","dependency_job_id":null,"html_url":"https://github.com/richclement/aws-data-lake-sdk","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richclement%2Faws-data-lake-sdk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richclement%2Faws-data-lake-sdk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richclement%2Faws-data-lake-sdk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richclement%2Faws-data-lake-sdk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/richclement","download_url":"https://codeload.github.com/richclement/aws-data-lake-sdk/tar.gz/refs/heads
/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253492529,"owners_count":21916959,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","datalake","sdk"],"created_at":"2024-11-17T03:05:30.548Z","updated_at":"2025-05-10T22:31:56.270Z","avatar_url":"https://github.com/richclement.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"AWS Data Lake SDK\n=================\n\nThis project provides an SDK for the Data Lake Solution provided by the AWS Solutions Builder group. An introduction to their solution is available [here](https://aws.amazon.com/answers/big-data/data-lake-solution/ \"AWS Data Lake introduction\"). 
Detailed documentation for their solution is available [here](http://docs.awssolutionsbuilder.com/data-lake/ \"AWS Data Lake documentation\").\n\nThis SDK follows the pattern of commands provided in the [Data Lake CLI](http://docs.awssolutionsbuilder.com/data-lake/cli/cli-getting-started/ \"AWS Data Lake CLI\") in order to make it easy to transition between the two tools when interacting with a data lake.\n\n## Usage\n\n### Install\n\n```\n$ npm install --save aws-data-lake-sdk\n```\n\n### Require\n``` javascript\nconst Datalake = require('aws-data-lake-sdk');\n```\n\n### Configure an API object\n``` javascript\n// The config to create a Datalake object requires the following properties\n// An API Access Key can be created in the Administration-\u003eUsers section\n// An API Secret Access Key can be created in the My Account-\u003eProfile section\n// The Data Lake API Endpoint URL can be found in the My Account-\u003eProfile section\nconst datalakeConfig = {\n  accessKey: 'my-access-key',\n  secretAccessKey: 'my-secret-access-key',\n  apiEndpointHost: 'my-api-endpoint'\n};\n\nconst package = new Datalake.Package(datalakeConfig);\nconst metadata = new Datalake.Metadata(datalakeConfig);\nconst cart = new Datalake.Cart(datalakeConfig);\n```\n\n**Use case 1: Search the data lake** Search the data lake for packages based on a keyword search of the package name, description, and metadata tags.\n``` javascript\nconst package = new Datalake.Package(datalakeConfig);\n\npackage.search({\n  terms: 'weather atlanta'\n}).then(searchResults =\u003e {\n  console.log('Search results: ');\n  console.log(JSON.stringify(searchResults));\n\n  // {\n  //   Items: [{\n  //     updated_at: \"2017-08-04T12:45:00Z\",\n  //     package_id: \"abcd123ty\",\n  //     created_at: \"2017-08-04T12:45:00Z\",\n  //     deleted: false,\n  //     owner: \"datalake_owner\",\n  //     description: \"Weather in Atlanta. 
Collected 2017-08-04T12:45:00Z.\",\n  //     name: \"Weather in Atlanta\",\n  //     metadata: [\n  //       { value: \"weather\", tag: \"data\" },\n  //       { value: \"atlanta\", tag: \"location\" }]\n  //   },\n  //   {\n  //     updated_at: \"2017-08-04T12:46:00Z\",\n  //     package_id: \"abcd456ur\",\n  //     deleted: false,\n  //     created_at: \"2017-08-04T12:46:00Z\",\n  //     owner: \"datalake_owner\",\n  //     description: \"Weather in Atlanta. Collected 2017-08-04T12:46:00Z\",\n  //     name: \"Weather in Atlanta\",\n  //     metadata: [\n  //       { value: \"weather\", tag: \"data\" },\n  //       { value: \"atlanta\", tag: \"location\" }]\n  //   }]\n  // }\n});\n```\n\n**Use case 2: Create a Package** Create a package in the data lake. A package must be created before adding data files to the data lake.\n``` javascript\n// The config to create a Datalake object requires the following properties\n// An API Access Key can be created in the Administration-\u003eUsers section\n// An API Secret Access Key can be created in the My Account-\u003eProfile section\n// The Data Lake API Endpoint URL can be found in the My Account-\u003eProfile section\nconst datalakeConfig = {\n  accessKey: 'my-access-key',\n  secretAccessKey: 'my-secret-access-key',\n  apiEndpointHost: 'my-api-endpoint'\n};\n\nconst package = new Datalake.Package(datalakeConfig);\n\n// Create a package\npackage.createPackage({\n  packageName: 'Sample Package',\n  packageDescription: 'Sample package created using package.createPackage(...)',\n  metadata: [\n    { tag: 'first-tag', value: 'first-value' }\n  ]\n}).then(response =\u003e {\n  console.log(response);\n\n  // {\n  //   package_id: \"abcd098ty\",\n  //   created_at: \"2017-08-07T18:30:00Z\",\n  //   updated_at: \"2017-08-07T18:30:00Z\",\n  //   owner: \"datalake_admin\",\n  //   name: \"Sample Package\",\n  //   description: \"Sample package created using package.createPackage(...)\",\n  //   deleted: false\n  // 
}\n});\n```\n\n**Use case 3: Add a file to a Package** Create a dataset containing a data file inside a package in the data lake. A package can contain 0 or more datasets, with each dataset containing a single file.\n``` javascript\nconst fs = require('fs');\nconst package = new Datalake.Package(datalakeConfig);\nvar fileName = 'new-data-file.zip';\n\nvar stats = fs.lstatSync(fileName);\nvar readableStreamFromFileSystem = fs.createReadStream(fileName);\n\npackage.uploadPackageDataset({\n  packageId: 'abcd098ty',\n  fileName: fileName,\n  fileSize: stats.size,\n  fileStream: readableStreamFromFileSystem,\n  contentType: 'application/zip'\n}).then(data =\u003e {\n  console.log('New dataset information: ');\n  console.log(JSON.stringify(data));\n\n  // {\n  //   Items: [{\n  //     updated_at: \"2017-08-07T18:30:00Z\",\n  //     package_id: \"abcd098ty\",\n  //     created_at: \"2017-08-07T18:30:00Z\",\n  //     s3_bucket: \"data-lake-us-east-1-012345678901\",\n  //     content_type: \"application/zip\",\n  //     created_by: \"datalake_admin\",\n  //     dataset_id: \"ABC123xyz\",\n  //     owner: \"datalake_admin\",\n  //     name: \"new-data-file.zip\",\n  //     s3_key: \"ABC123xyz/1504794600000/new-data-file.zip\",\n  //     type: \"dataset\"\n  //   }],\n  //   Count: 1,\n  //   ScannedCount: 1\n  // }\n});\n```\n\n**Use case 4: Describe a Package** Get the descriptive information about a specific package in the data lake.\n``` javascript\nconst package = new Datalake.Package(datalakeConfig);\n\npackage.describePackage({\n  packageId: 'abcd098ty'\n}).then(response =\u003e {\n  console.log('Description results: ');\n  console.log(JSON.stringify(response));\n\n  // {\n  //   Item: {\n  //     updated_at: \"2017-08-03T15:30:00Z\",\n  //     package_id: \"abcd098ty\",\n  //     deleted: false,\n  //     created_at: \"2017-08-03T15:30:00Z\",\n  //     owner: \"datalake_owner\",\n  //     description: \"Sample package created by unit test\",\n  //     name: \"Sample 
package\"\n  //   }\n  // }\n});\n```\n\n**Use case 5: Update a Package** Update the information describing a package in the data lake.\n``` javascript\nconst package = new Datalake.Package(datalakeConfig);\n\npackage.updatePackage({\n  packageId: 'abcd098ty',\n  packageName: 'Sample package v2',\n  packageDescription: 'A new description for the sample package.'\n}).then(response =\u003e {\n  console.log('Update results: ');\n  console.log(JSON.stringify(response));\n\n  // {\n  //   Item: {\n  //     updated_at: \"2017-08-03T15:35:00Z\",\n  //     package_id: \"abcd098ty\",\n  //     deleted: false,\n  //     created_at: \"2017-08-03T15:30:00Z\",\n  //     owner: \"datalake_owner\",\n  //     description: \"A new description for the sample package.\",\n  //     name: \"Sample package v2\"\n  //   }\n  // }\n});\n```\n\n**Use case 6: Delete a Package** Delete a package in the data lake based on the packageId.\n``` javascript\nconst package = new Datalake.Package(datalakeConfig);\n\npackage.deletePackage({\n  packageId: 'abcd098ty'\n}).then(deleteResponse =\u003e {\n  console.log('Delete results: ');\n  console.log(JSON.stringify(deleteResponse));\n\n  // { }\n});\n```\n\n**Use case 7: Get a list of all Datasets in a Package**\n``` javascript\nconst package = new Datalake.Package(datalakeConfig);\n\npackage.describePackageDatasets({\n  packageId: 'abcd098ty'\n}).then(response =\u003e {\n  console.log('Describe datasets results: ');\n  console.log(JSON.stringify(response));\n\n  // {\n  //   Items: [{\n  //     updated_at: \"2017-08-07T18:30:00Z\",\n  //     package_id: \"abcd098ty\",\n  //     created_at: \"2017-08-07T18:30:00Z\",\n  //     s3_bucket: \"data-lake-us-east-1-012345678901\",\n  //     content_type: \"application/zip\",\n  //     created_by: \"datalake_admin\",\n  //     dataset_id: \"ABC123xyz\",\n  //     owner: \"datalake_admin\",\n  //     name: \"new-data-file.zip\",\n  //     s3_key: \"ABC123xyz/1504794600000/new-data-file.zip\",\n  //     type: 
\"dataset\"\n  //   }, {\n  //     updated_at: \"2017-08-07T18:35:00Z\",\n  //     package_id: \"abcd098tu\",\n  //     created_at: \"2017-08-07T18:35:00Z\",\n  //     s3_bucket: \"data-lake-us-east-1-012345678901\",\n  //     content_type: \"application/zip\",\n  //     created_by: \"datalake_admin\",\n  //     dataset_id: \"ABC123xyz\",\n  //     owner: \"datalake_admin\",\n  //     name: \"other-data-file.zip\",\n  //     s3_key: \"ABC123wyz/1504794700000/other-data-file.zip\",\n  //     type: \"dataset\"\n  //   }],\n  //   Count: 2,\n  //   ScannedCount: 2\n  // }\n});\n```\n\n**Use case 8: Describe a Dataset in a Package**\n``` javascript\nconst package = new Datalake.Package(datalakeConfig);\n\npackage.describePackageDataset({\n  packageId: 'abcd098ty',\n  datasetId: 'ABC123xyz'\n}).then(response =\u003e {\n  console.log('Describe dataset results: ');\n  console.log(JSON.stringify(response));\n\n  // {\n  //   Items: [{\n  //     updated_at: \"2017-08-07T18:30:00Z\",\n  //     package_id: \"abcd098ty\",\n  //     created_at: \"2017-08-07T18:30:00Z\",\n  //     s3_bucket: \"data-lake-us-east-1-012345678901\",\n  //     content_type: \"application/zip\",\n  //     created_by: \"datalake_admin\",\n  //     dataset_id: \"ABC123xyz\",\n  //     owner: \"datalake_admin\",\n  //     name: \"new-data-file.zip\",\n  //     s3_key: \"ABC123xyz/1504794600000/new-data-file.zip\",\n  //     type: \"dataset\"\n  //   }],\n  //   Count: 1,\n  //   ScannedCount: 1\n  // }\n});\n```\n\n**Use case 9: Delete a Dataset** Delete a dataset and the file contained inside of it from a package and the data lake.\n``` javascript\nconst package = new Datalake.Package(datalakeConfig);\n\npackage.deletePackageDataset({\n  packageId: 'abcd098ty',\n  datasetId: 'xyz098qwe'\n}).then(deleteResponse =\u003e {\n  console.log('Delete results: ');\n  console.log(JSON.stringify(deleteResponse));\n\n  // { }\n});\n```\n\n**Use case 10: Get a list of all required Metadata fields** Get a list of all 
tags set up on the Governance tab of the Administration-\u003eSettings page of the data lake.\n``` javascript\nconst metadata = new Datalake.Metadata(datalakeConfig);\n\nmetadata.describeRequiredMetadata().then(response =\u003e {\n  console.log('Required metadata: ');\n  console.log(JSON.stringify(response));\n\n  // {\n  //   Items: [{\n  //     updated_at: \"2017-08-03T15:19:00Z\",\n  //     created_at: \"2017-08-03T15:19:00Z\",\n  //     setting_id: \"lkjh234so\",\n  //     type: \"governance\",\n  //     setting: {\n  //       tag: \"first-tag\",\n  //       governance: \"Required\"\n  //     }\n  //   }, {\n  //     updated_at: \"2017-08-03T15:20:00Z\",\n  //     created_at: \"2017-08-03T15:20:00Z\",\n  //     setting_id: \"qwe678mnb\",\n  //     type: \"governance\",\n  //     setting: {\n  //       tag: \"second-tag\",\n  //       governance: \"Optional\"\n  //     }\n  //   }]\n  // }\n});\n```\n\n**Use case 11: Get metadata tags on a Package** Get a list of metadata tags applied to a Package.\n``` javascript\nconst metadata = new Datalake.Metadata(datalakeConfig);\n\n// Get metadata for a package\nvar currentMetadata = null;\nmetadata.describeMetadata({ packageId: 'ABC123xyz' }).then(data =\u003e {\n  console.log('Current metadata is: ');\n  console.log(JSON.stringify(data));\n\n  // {\n  //   package_id: \"ABC123xyz\",\n  //   metadata_id: \"DEF456rst\",\n  //   created_at: \"2017-08-07T18:30:00Z\",\n  //   created_by: \"datalake_admin\",\n  //   metadata: [{\n  //     tag: \"format\",\n  //     value: \"zip\"\n  //   }]\n  // }\n});\n```\n\n**Use case 12: Create metadata tags on a Package** Add metadata tags to a Package in the data lake. To update metadata tags, first retrieve the current tags, update/delete/add to the tags array and then use createMetadata(...) 
to update the metadata tags associated with the Package.\n``` javascript\nconst metadata = new Datalake.Metadata(datalakeConfig);\n\n// Create metadata for a package\nvar newTags = [];\nnewTags.push({ tag: 'a-new-tag', value: 'new-value' });\nmetadata.createMetadata({\n  packageId: 'abcd098ty',\n  metadata: newTags\n}).then(response =\u003e {\n  console.log('Current metadata is: ');\n  console.log(JSON.stringify(response));\n\n  // {\n  //   package_id: \"abcd098ty\",\n  //   metadata_id: \"DEF456rsu\",\n  //   created_at: \"2017-08-07T18:30:00Z\",\n  //   created_by: \"datalake_admin\",\n  //   metadata: [{\n  //     tag: \"a-new-tag\",\n  //     value: \"new-value\"\n  //   }]\n  // }\n});\n```\n\n## AWS Data Lake Solution\n\nYou can find a guide to the AWS Data Lake solution released by the AWS Solutions Builder group at:\n\n[http://docs.awssolutionsbuilder.com/data-lake/](http://docs.awssolutionsbuilder.com/data-lake/ \"AWS Data Lake documentation\")\n\n## Implemented Actions\n\n### Package\n* Search\n* Create\n* Describe\n* Update\n* Delete\n* Dataset Upload\n* Dataset Delete\n* Dataset Describe\n* Datasets Describe\n\n### Metadata\n* Describe Required Metadata\n* Create Metadata\n* Describe Metadata\n\n### Cart (Incomplete)\n* Describe Cart\n* Add Item\n* Describe Item\n* Remove Item\n* Checkout\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frichclement%2Faws-data-lake-sdk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frichclement%2Faws-data-lake-sdk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frichclement%2Faws-data-lake-sdk/lists"}