{"id":15357148,"url":"https://github.com/chocolateboy/string_splitter","last_synced_at":"2025-10-04T11:18:58.062Z","repository":{"id":59156664,"uuid":"138094422","full_name":"chocolateboy/string_splitter","owner":"chocolateboy","description":"String#split on steroids","archived":false,"fork":false,"pushed_at":"2020-08-24T21:01:06.000Z","size":81,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-07T07:36:30.199Z","etag":null,"topics":["delimiter","delimiters","reverse","reverse-split","rsplit","separator","separators","split","splitter","string","string-split","string-splitter","zero-dependency"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"artistic-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chocolateboy.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-06-20T22:58:11.000Z","updated_at":"2021-11-02T22:20:18.000Z","dependencies_parsed_at":"2022-09-13T20:11:34.315Z","dependency_job_id":null,"html_url":"https://github.com/chocolateboy/string_splitter","commit_stats":null,"previous_names":[],"tags_count":13,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chocolateboy%2Fstring_splitter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chocolateboy%2Fstring_splitter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chocolateboy%2Fstring_splitter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chocolateboy%2Fstring_splitter/manifests","owner_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/owners/chocolateboy","download_url":"https://codeload.github.com/chocolateboy/string_splitter/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246531986,"owners_count":20792735,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["delimiter","delimiters","reverse","reverse-split","rsplit","separator","separators","split","splitter","string","string-split","string-splitter","zero-dependency"],"created_at":"2024-10-01T12:33:31.225Z","updated_at":"2025-10-04T11:18:53.024Z","avatar_url":"https://github.com/chocolateboy.png","language":"Ruby","readme":"# StringSplitter\n\n[![Build Status](https://travis-ci.org/chocolateboy/string_splitter.svg)](https://travis-ci.org/chocolateboy/string_splitter)\n[![Gem Version](https://img.shields.io/gem/v/string_splitter.svg)](https://rubygems.org/gems/string_splitter)\n\n\u003c!-- toc --\u003e\n\n- [NAME](#name)\n- [INSTALLATION](#installation)\n- [SYNOPSIS](#synopsis)\n- [DESCRIPTION](#description)\n- [WHY?](#why)\n- [CAVEATS](#caveats)\n  - [Differences from String#split](#differences-from-stringsplit)\n- [COMPATIBILITY](#compatibility)\n- [VERSION](#version)\n- [SEE ALSO](#see-also)\n  - [Gems](#gems)\n  - [Articles](#articles)\n- [AUTHOR](#author)\n- [COPYRIGHT AND LICENSE](#copyright-and-license)\n\n\u003c!-- tocstop --\u003e\n\n# NAME\n\nStringSplitter - `String#split` on steroids\n\n# INSTALLATION\n\n```ruby\ngem \"string_splitter\"\n```\n\n# SYNOPSIS\n\n```ruby\nrequire \"string_splitter\"\n\nss = StringSplitter.new\n```\n\n**Same as 
`String#split`**\n\n```ruby\nss.split(\"foo bar baz\")\nss.split(\"foo bar baz\", \" \")\nss.split(\"foo bar baz\", /\\s+/)\n# =\u003e [\"foo\", \"bar\", \"baz\"]\n\nss.split(\"foo\", \"\")\nss.split(\"foo\", //)\n# =\u003e [\"f\", \"o\", \"o\"]\n\nss.split(\"\", \"...\")\nss.split(\"\", /.../)\n# =\u003e []\n```\n\n**Split at the first delimiter**\n\n```ruby\nss.split(\"foo:bar:baz:quux\", \":\", at: 1)\nss.split(\"foo:bar:baz:quux\", \":\", select: 1)\n# =\u003e [\"foo\", \"bar:baz:quux\"]\n```\n\n**Split at the last delimiter**\n\n```ruby\nss.split(\"foo:bar:baz:quux\", \":\", at: -1)\n# =\u003e [\"foo:bar:baz\", \"quux\"]\n```\n\n**Split at multiple delimiter positions**\n\n```ruby\nss.split(\"1:2:3:4:5:6:7:8:9\", \":\", at: [1..3, -1])\n# =\u003e [\"1\", \"2\", \"3\", \"4:5:6:7:8\", \"9\"]\n```\n\n**Split at all but the first and last delimiters**\n\n```ruby\nss.split(\"1:2:3:4:5:6\", \":\", except: [1, -1])\nss.split(\"1:2:3:4:5:6\", \":\", reject: [1, -1])\n# =\u003e [\"1:2\", \"3\", \"4\", \"5:6\"]\n```\n\n**Split from the right**\n\n```ruby\nss.rsplit(\"1:2:3:4:5:6:7:8:9\", \":\", at: [1..3, -1])\n# =\u003e [\"1\", \"2:3:4:5:6\", \"7\", \"8\", \"9\"]\n```\n\n**Split with negative, descending, and infinite ranges**\n\n```ruby\nss.split(\"1:2:3:4:5:6:7:8:9\", \":\", at: ..-3)\n# =\u003e [\"1\", \"2\", \"3\", \"4\", \"5\", \"6\", \"7:8:9\"]\n\nss.split(\"1:2:3:4:5:6:7:8:9\", \":\", at: 4...)\n# =\u003e [\"1:2:3:4\", \"5\", \"6\", \"7\", \"8:9\"]\n\nss.split(\"1:2:3:4:5:6:7:8:9\", \":\", at: [1, 5..3, -2..])\n# =\u003e [\"1\", \"2:3\", \"4\", \"5\", \"6:7\", \"8\", \"9\"]\n```\n\n**Full control via a block**\n\n```ruby\nresult = ss.split(\"1:2:3:4:5:6:7:8\", \":\") do |split|\n  split.pos % 2 == 0\nend\n# =\u003e [\"1:2\", \"3:4\", \"5:6\", \"7:8\"]\n```\n\n```ruby\nstring = \"banana\".chars.sort.join # \"aaabnn\"\n\nss.split(string, \"\") do |split|\n  split.rhs != split.lhs\nend\n# =\u003e [\"aaa\", \"b\", \"nn\"]\n```\n\n# DESCRIPTION\n\nMany languages 
have built-in `split` functions/methods for strings. They behave\nsimilarly (notwithstanding the occasional\n[surprise](https://chriszetter.com/blog/2017/10/29/splitting-strings/)), and\nhandle a few common cases, e.g.:\n\n* limiting the number of splits\n* including the separator(s) in the results\n* removing (some) empty fields\n\nBut, because the API is squeezed into two overloaded parameters (the delimiter\nand the limit), achieving the desired results can be tricky. For instance,\nwhile `String#split` removes empty trailing fields (by default), it provides no\nway to remove *all* empty fields. Likewise, the cramped API means there's no\nway to, e.g., combine a limit (positive integer) with the option to preserve\nempty fields (negative integer), or use backreferences in a delimiter pattern\nwithout including its captured subexpressions in the result.\n\nIf `split` were being written from scratch, without the baggage of its legacy\nAPI, it's possible that some of these options would be made explicit rather\nthan overloading the parameters. And, indeed, this is possible in some\nimplementations, e.g. in Crystal:\n\n```crystal\n\":foo:bar:baz:\".split(\":\", remove_empty: false)\n# =\u003e [\"\", \"foo\", \"bar\", \"baz\", \"\"]\n\n\":foo:bar:baz:\".split(\":\", remove_empty: true)\n# =\u003e [\"foo\", \"bar\", \"baz\"]\n```\n\nStringSplitter takes this one step further by moving the configuration out of\nthe method altogether and delegating the strategy — i.e. 
which splits should be\naccepted or rejected — to a block:\n\n```ruby\nss = StringSplitter.new\n\nss.split(\"foo:bar:baz\", \":\") { |split| split.index == 0 }\n# =\u003e [\"foo\", \"bar:baz\"]\n\nss.split(\"foo:bar:baz:quux\", \":\") do |split|\n  split.position == 1 || split.position == 3\nend\n# =\u003e [\"foo\", \"bar:baz\", \"quux\"]\n```\n\nAs a shortcut, the common case of splitting (or not splitting) at one or more\npositions is supported by dedicated options:\n\n```ruby\nss.split(\"foo:bar:baz:quux\", \":\", select: [1, -1])\n# =\u003e [\"foo\", \"bar:baz\", \"quux\"]\n\nss.split(\"foo:bar:baz:quux\", \":\", reject: [1, -1])\n# =\u003e [\"foo:bar\", \"baz:quux\"]\n```\n\n# WHY?\n\nI wanted to split semi-structured output into fields without having to resort\nto a regex or a full-blown parser.\n\nAs an example, the nominally unstructured output of many Unix commands is often\nformatted in a way that's tantalizingly close to being\n[machine-readable](https://en.wikipedia.org/wiki/Delimiter-separated_values),\napart from a few pesky exceptions, e.g.:\n\n```bash\n$ ls -l\n\n-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md\n-rw-r--r-- 1 user users  254 Jun 19 21:21 Gemfile\ndrwxr-xr-x 3 user users 4096 Jun 19 22:56 lib\n-rw-r--r-- 1 user users 8952 Jun 18 18:16 LICENSE.md\n-rw-r--r-- 1 user users 3134 Jun 19 22:59 README.md\n```\n\nThese lines can *almost* be parsed into an array of fields by splitting them on\nwhitespace. 
The exception is the date (columns 6-8), i.e.:\n\n```ruby\nline = \"-rw-r--r-- 1 user users   87 Jun 18 18:16 CHANGELOG.md\"\nline.split\n```\n\ngives:\n\n```ruby\n[\"-rw-r--r--\", \"1\", \"user\", \"users\", \"87\", \"Jun\", \"18\", \"18:16\", \"CHANGELOG.md\"]\n```\n\ninstead of:\n\n```ruby\n[\"-rw-r--r--\", \"1\", \"user\", \"users\", \"87\", \"Jun 18 18:16\", \"CHANGELOG.md\"]\n```\n\nOne way to work around this is to parse the whole line, e.g.:\n\n```ruby\nline.match(/^(\\S+) \\s+ (\\d+) \\s+ (\\S+) \\s+ (\\S+) \\s+ (\\d+) \\s+ (\\S+ \\s+ \\d+ \\s+ \\S+) \\s+ (.+)$/x)\n```\n\nBut that requires us to specify *everything*. What we really want is a version\nof `split` which allows us to veto splitting for the 6th and 7th delimiters\n(and to stop after the 8th delimiter), i.e. control over which splits are\naccepted, rather than being restricted to the single, baked-in strategy\nprovided by the `limit` parameter.\n\nBy providing a simple way to accept or reject each split, StringSplitter makes\ncases like this easy to handle, either via a block:\n\n```ruby\nss.split(line) do |split|\n  case split.position when 1..5, 8 then true end\nend\n# =\u003e [\"-rw-r--r--\", \"1\", \"user\", \"users\", \"87\", \"Jun 18 18:16\", \"CHANGELOG.md\"]\n```\n\nOr via its option shortcut:\n\n```ruby\nss.split(line, at: [1..5, 8])\n# =\u003e [\"-rw-r--r--\", \"1\", \"user\", \"users\", \"87\", \"Jun 18 18:16\", \"CHANGELOG.md\"]\n```\n\n# CAVEATS\n\n## Differences from String#split\n\nUnlike `String#split`, StringSplitter doesn't trim the string before splitting\nif the delimiter is omitted or a single space, e.g.:\n\n```ruby\n\" foo bar baz \".split          # =\u003e [\"foo\", \"bar\", \"baz\"]\n\" foo bar baz \".split(\" \")     # =\u003e [\"foo\", \"bar\", \"baz\"]\n\nss.split(\" foo bar baz \")      # =\u003e [\"\", \"foo\", \"bar\", \"baz\", \"\"]\nss.split(\" foo bar baz \", \" \") # =\u003e [\"\", \"foo\", \"bar\", \"baz\", \"\"]\n```\n\n`String#split` omits the `nil` values 
of unmatched optional captures:\n\n```ruby\n\"foo:bar:baz\".scan(/(:)|(-)/)  # =\u003e [[\":\", nil], [\":\", nil]]\n\"foo:bar:baz\".split(/(:)|(-)/) # =\u003e [\"foo\", \":\", \"bar\", \":\", \"baz\"]\n```\n\nStringSplitter preserves them when `include_captures` is true (the default),\nthough they can be omitted from spread captures by passing `:compact` as the\nvalue of the `spread_captures` option:\n\n```ruby\ns1 = StringSplitter.new(spread_captures: true)\ns2 = StringSplitter.new(spread_captures: false)\ns3 = StringSplitter.new(spread_captures: :compact)\n\ns1.split(\"foo:bar:baz\", /(:)|(-)/) # =\u003e [\"foo\", \":\", nil, \"bar\", \":\", nil, \"baz\"]\ns2.split(\"foo:bar:baz\", /(:)|(-)/) # =\u003e [\"foo\", [\":\", nil], \"bar\", [\":\", nil], \"baz\"]\ns3.split(\"foo:bar:baz\", /(:)|(-)/) # =\u003e [\"foo\", \":\", \"bar\", \":\", \"baz\"]\n```\n\n# COMPATIBILITY\n\nStringSplitter is tested and supported on all versions of Ruby [supported by\nthe ruby-core team](https://www.ruby-lang.org/en/downloads/branches/), i.e.,\ncurrently, Ruby 2.5 and above.\n\n# VERSION\n\n0.7.3\n\n# SEE ALSO\n\n## Gems\n\n- [rsplit](https://github.com/Tatzyr/rsplit) - a reverse-split implementation (only works with string delimiters)\n\n## Articles\n\n- [Splitting Strings](https://chriszetter.com/blog/2017/10/29/splitting-strings/)\n\n# AUTHOR\n\n[chocolateboy](mailto:chocolate@cpan.org)\n\n# COPYRIGHT AND LICENSE\n\nCopyright © 2018-2020 by chocolateboy.\n\nThis is free software; you can redistribute it and/or modify it under the\nterms of the [Artistic License 
2.0](https://www.opensource.org/licenses/artistic-license-2.0.php).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchocolateboy%2Fstring_splitter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchocolateboy%2Fstring_splitter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchocolateboy%2Fstring_splitter/lists"}