{"id":19024164,"url":"https://github.com/raykitajima/swifttokenizer","last_synced_at":"2025-04-23T11:58:36.987Z","repository":{"id":95962091,"uuid":"607005108","full_name":"RayKitajima/SwiftTokenizer","owner":"RayKitajima","description":"Tokenizer for Swift","archived":false,"fork":false,"pushed_at":"2024-12-18T08:34:46.000Z","size":559,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-23T11:58:25.239Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RayKitajima.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-02-27T05:29:12.000Z","updated_at":"2024-12-18T08:32:16.000Z","dependencies_parsed_at":"2025-04-17T10:38:32.797Z","dependency_job_id":"c5e497e7-0b0c-4e5b-aada-d324d764ef78","html_url":"https://github.com/RayKitajima/SwiftTokenizer","commit_stats":{"total_commits":1,"total_committers":1,"mean_commits":1.0,"dds":0.0,"last_synced_commit":"cc0279dd3bc3d55a73611ff528d132e5e868d73e"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RayKitajima%2FSwiftTokenizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RayKitajima%2FSwiftTokenizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RayKitajima%2FSwiftTokenizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/RayKitajima%2FSwiftTokenizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RayKitajima","download_url":"https://codeload.github.com/RayKitajima/SwiftTokenizer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250430589,"owners_count":21429323,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T20:35:23.124Z","updated_at":"2025-04-23T11:58:36.932Z","avatar_url":"https://github.com/RayKitajima.png","language":"Swift","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# SwiftTokenizer\n\nTokenizer for Swift.\n\nAn extracted and slightly modified version of the GPT-2 tokenizer from [swift-coreml-transformers](https://github.com/huggingface/swift-coreml-transformers), packaged for use as a standalone package. 
(now especially for BART)\n\n\n## Swift Package Manager\n\nAdd the following to your Package.swift dependencies:\n\n```\n  :\ndependencies: [\n    .package(url: \"https://github.com/RayKitajima/SwiftTokenizer.git\", from: \"1.0.0\"),\n],\n  :\n```\n\n## Usage\n\n```swift\nimport SwiftTokenizer\n\nlet config = TokenizerConfig(\n    vocab: Bundle.module.url(forResource: \"vocab\", withExtension: \"json\")!,\n    merges: Bundle.module.url(forResource: \"merges\", withExtension: \"txt\")!\n)\nlet tokenizer = Tokenizer(config: config)\n\nlet tokens = tokenizer.encode(text: \"Hello, world!\")\nprint(tokens)\n// [31414, 6, 232, 328]\n\nlet decoded = tokenizer.decode(tokens: tokenizer.stripBOS(tokens: tokenizer.stripEOS(tokens: tokens)))\nprint(decoded)\n// Hello, world!\n```\n\nFor BART, you need to add the BOS and EOS tokens to the input_ids.\n\n```swift\nlet input_ids = tokenizer.appendEOS(tokens: tokenizer.appendBOS(tokens: tokenizer.encode(text: \"Hello, world!\")))\nprint(input_ids)\n// [0, 31414, 6, 232, 328, 2]\n\n// inverse\nlet decoded = tokenizer.decode(tokens: tokenizer.stripBOS(tokens: tokenizer.stripEOS(tokens: input_ids)))\nprint(decoded)\n// Hello, world!\n```\n\n## See also\n\nhttps://github.com/huggingface/swift-coreml-transformers\n\n\n## License\n\nApache License 2.0\n\nCopyright © 2019 Hugging Face. All rights reserved.\n\nModifications copyright (C) 2023 Rei Kitajima (rei.kitajima@gmail.com)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraykitajima%2Fswifttokenizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fraykitajima%2Fswifttokenizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fraykitajima%2Fswifttokenizer/lists"}