{"id":13528312,"url":"https://github.com/Borewit/tokenizer-s3","last_synced_at":"2025-04-01T11:31:22.326Z","repository":{"id":38361589,"uuid":"231248473","full_name":"Borewit/tokenizer-s3","owner":"Borewit","description":"Amazon S3 tokenizer","archived":false,"fork":false,"pushed_at":"2024-10-28T15:48:01.000Z","size":2680,"stargazers_count":8,"open_issues_count":5,"forks_count":6,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-10-29T16:41:58.706Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Borewit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"Borewit","buy_me_a_coffee":"borewit"}},"created_at":"2020-01-01T18:11:40.000Z","updated_at":"2024-10-25T07:21:23.000Z","dependencies_parsed_at":"2024-09-09T20:59:13.910Z","dependency_job_id":"d351c6be-d52e-4b1b-9241-f4399ef3e3a4","html_url":"https://github.com/Borewit/tokenizer-s3","commit_stats":{"total_commits":666,"total_committers":8,"mean_commits":83.25,"dds":"0.19669669669669665","last_synced_commit":"0037e9d9e5659049c164e27d0edd5ae9811890f4"},"previous_names":[],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Borewit%2Ftokenizer-s3","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Borewit%2Ftokenizer-s3/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Borewit%2Ftokenizer-s3/releases","manifests_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/Borewit%2Ftokenizer-s3/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Borewit","download_url":"https://codeload.github.com/Borewit/tokenizer-s3/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222721797,"owners_count":17028600,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T06:02:25.257Z","updated_at":"2025-04-01T11:31:22.303Z","avatar_url":"https://github.com/Borewit.png","language":"TypeScript","funding_links":["https://github.com/sponsors/Borewit","https://buymeacoffee.com/borewit","https://www.buymeacoffee.com/borewit"],"categories":["TypeScript"],"sub_categories":[],"readme":"[![Node.js CI](https://github.com/Borewit/tokenizer-s3/actions/workflows/nodejs-ci.yml/badge.svg?branch=master)](https://github.com/Borewit/tokenizer-s3/actions/workflows/nodejs-ci.yml)\n[![CodeQL](https://github.com/Borewit/tokenizer-s3/actions/workflows/github-code-scanning/codeql/badge.svg?branch=master)](https://github.com/Borewit/tokenizer-s3/actions/workflows/github-code-scanning/codeql)\n[![NPM version](https://img.shields.io/npm/v/@tokenizer/s3.svg)](https://npmjs.org/package/@tokenizer/s3)\n[![npm downloads](https://img.shields.io/npm/dm/@tokenizer/s3.svg)](https://npmcharts.com/compare/@tokenizer/s3,@tokenizer/range,streaming-http-token-reader?start=300)\n[![Known 
Vulnerabilities](https://snyk.io/test/github/Borewit/tokenizer-s3/badge.svg?targetFile=package.json)](https://snyk.io/test/github/Borewit/tokenizer-s3?targetFile=package.json)\n\n# @tokenizer/s3\nThe tokenizer-s3 module enables seamless integration with [Amazon Web Services (AWS) S3](https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html), allowing you to read and tokenize data from S3 objects in a streaming fashion. This module extends the functionality of the strtok3 tokenizer by providing support for chunked S3 data access.\n\n## Features\n- Streaming Support: Efficiently read and tokenize data from Amazon S3 objects using streaming, which is ideal for handling large files without loading them entirely into memory.\n- Integration with [strtok3](https://github.com/Borewit/strtok3): Works seamlessly with the strtok3 tokenizer to process S3 data streams, making it easy to handle various tokenization tasks.\n- Flexible Access: Provides options to configure S3 access, allowing for customized tokenization workflows based on your specific needs.\n- Promise-Based API: Uses a promise-based API for easy integration into modern asynchronous workflows.\n\n## Installation\n\n```shell\nnpm install @tokenizer/s3\n```\n\n## Sponsor\nIf you appreciate my work and want to support the development of open-source projects like [music-metadata](https://github.com/Borewit/music-metadata), [file-type](https://github.com/sindresorhus/file-type), and [listFix()](https://github.com/Borewit/listFix), consider becoming a sponsor or making a small contribution.\nYour support helps sustain ongoing development and improvements.\n[Become a sponsor to Borewit](https://github.com/sponsors/Borewit)\n\nor\n\n\u003ca href=\"https://www.buymeacoffee.com/borewit\" target=\"_blank\"\u003e\u003cimg src=\"https://cdn.buymeacoffee.com/buttons/default-orange.png\" alt=\"Buy me A coffee\" height=\"41\" width=\"174\"\u003e\u003c/a\u003e\n\n
## API Documentation\n\n### `makeChunkedTokenizerFromS3`\n\nInitialize a tokenizer with support for random access from an Amazon S3 client, for use in extracting metadata from media files.\n\n#### Function Signature\n\n```ts\nfunction makeChunkedTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest, options?: IS3Options): Promise\u003cIRandomAccessTokenizer\u003e\n```\nReads the S3 object in chunks rather than as a single stream.\n\n#### Parameters\n\n- `s3` (`S3Client`):\n\n  The S3 client used to make requests to Amazon S3.\n  \u003e [!NOTE]\n  \u003e To configure AWS client authentication, see [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).\n\n- `objRequest` (`GetObjectRequest`):\n\n  The S3 object request containing details about the S3 object to fetch.\n  This includes properties like the bucket name and object key.\n\n- `options` (`IS3Options`, optional):\n\n  Optional settings to configure the S3 tokenizer.\n\n#### Returns\n\n- `Promise\u003cIRandomAccessTokenizer\u003e`:\n\n  A Promise that resolves to an instance of `IRandomAccessTokenizer`.\n  This tokenizer can be used to extract metadata from the specified media file in the S3 object.\n  It supports [random access](https://en.wikipedia.org/wiki/Random_access) reads.
\n\n### `makeStreamingTokenizerFromS3`\n\nInitialize a tokenizer from an Amazon S3 client for use in extracting metadata from media files.\n\n#### Function Signature\n\n```ts\nfunction makeStreamingTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest): Promise\u003cITokenizer\u003e\n```\nReads the S3 object as a single stream.\n\n#### Parameters\n\n- `s3` (`S3Client`):\n\n  The S3 client used to make requests to Amazon S3.\n  \u003e [!NOTE]\n  \u003e To configure AWS client authentication, see [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).\n\n- `objRequest` (`GetObjectRequest`):\n\n  The S3 object request containing details about the S3 object to fetch.\n  This includes properties like the bucket name and object key.\n\n#### Returns\n\n- `Promise\u003cITokenizer\u003e`:\n\n  A Promise that resolves to an instance of `ITokenizer`.\n  This tokenizer can be used to extract metadata from the specified media file in the S3 object.\n\n## Compatibility\n\nModule: version [0.3.0](https://github.com/Borewit/tokenizer-s3/releases/tag/v0.3.0) migrated from [CommonJS](https://en.wikipedia.org/wiki/CommonJS) to [pure ECMAScript Module (ESM)](https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c).\nThe distributed JavaScript codebase is compliant with the [ECMAScript 2020 (11th Edition)](https://en.wikipedia.org/wiki/ECMAScript_version_history#11th_Edition_%E2%80%93_ECMAScript_2020) standard.\n\nThis module requires a [Node.js ≥ 16](https://nodejs.org/en/about/previous-releases) engine.\nIt can also be used in a browser environment when bundled with a module bundler.\n\nFor TypeScript CommonJS backward compatibility, you can use [load-esm](https://github.com/Borewit/load-esm).\n\n## Examples\n\n### Determine S3 file type\n\nDetermine the file type (based on its content) of a file stored in Amazon S3:\n```js\nimport { fileTypeFromTokenizer } from 'file-type';\n
import { fromEnv } from '@aws-sdk/credential-providers';\nimport { S3Client } from '@aws-sdk/client-s3';\nimport { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';\n\n(async () =\u003e {\n\n  // Initialize S3 client\n  const s3 = new S3Client({\n    region: 'eu-west-2',\n    credentials: fromEnv(),\n  });\n\n  // Initialize S3 tokenizer\n  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, {\n    Bucket: 'affectlab',\n    Key: '1min_35sec.mp4'\n  });\n\n  // Figure out what kind of file it is\n  const fileType = await fileTypeFromTokenizer(s3Tokenizer);\n  console.log(fileType);\n})();\n```\n\nSee also [example at file-type](https://github.com/sindresorhus/file-type#filetypefromtokenizertokenizer).\n\n### Reading audio metadata from Amazon S3\n\nRetrieve audio metadata from an S3 object with [music-metadata](https://github.com/Borewit/music-metadata):\n```js\nimport { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';\nimport { S3Client } from '@aws-sdk/client-s3';\nimport { parseFromTokenizer } from 'music-metadata';\n\n/**\n * Retrieve metadata from an Amazon S3 object\n * @param s3 S3 client\n * @param objRequest S3 object request\n * @param options music-metadata parsing options\n * @return Metadata\n */\nasync function parseS3Object(s3, objRequest, options) {\n  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, objRequest);\n  return parseFromTokenizer(s3Tokenizer, options);\n}\n\n(async () =\u003e {\n  const s3 = new S3Client({});\n\n  const metadata = await parseS3Object(s3, {\n    Bucket: 'standing0media',\n    Key: '01 Where The Highway Takes Me.mp3'\n  });\n\n  console.log(metadata);\n})();\n```\n\nA module implementation of this example can be found in [@music-metadata/s3](https://github.com/Borewit/music-metadata-s3).\n\n## Dependency graph\n\n![dependency graph](doc/dependency.svg)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBorewit%2Ftokenizer-s3","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FBorewit%2Ftokenizer-s3","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBorewit%2Ftokenizer-s3/lists"}