{"id":16179620,"url":"https://github.com/thomasheller/splitt0r","last_synced_at":"2025-07-27T17:06:12.392Z","repository":{"id":57557305,"uuid":"80344598","full_name":"thomasheller/splitt0r","owner":"thomasheller","description":"Split one file into multiple files based on delimiter :scissors:","archived":false,"fork":false,"pushed_at":"2017-03-18T14:53:40.000Z","size":17,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-13T14:39:37.471Z","etag":null,"topics":["delimiter","split","splitter","splitting","text"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thomasheller.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-01-29T12:23:12.000Z","updated_at":"2023-08-09T17:26:56.000Z","dependencies_parsed_at":"2022-09-03T03:23:25.193Z","dependency_job_id":null,"html_url":"https://github.com/thomasheller/splitt0r","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasheller%2Fsplitt0r","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasheller%2Fsplitt0r/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasheller%2Fsplitt0r/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasheller%2Fsplitt0r/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thomasheller","download_url":"https://codeload.github.com/thomasheller/splitt0r/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247648925,"owners_count":20972942,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["delimiter","split","splitter","splitting","text"],"created_at":"2024-10-10T05:43:38.609Z","updated_at":"2025-04-07T11:44:58.890Z","avatar_url":"https://github.com/thomasheller.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# splitt0r\n\n[![Build Status](https://travis-ci.org/thomasheller/splitt0r.svg?branch=master)](https://travis-ci.org/thomasheller/splitt0r)\n[![Go Report Card](https://goreportcard.com/badge/github.com/thomasheller/splitt0r)](https://goreportcard.com/report/github.com/thomasheller/splitt0r)\n[![Coverage Status](https://coveralls.io/repos/github/thomasheller/splitt0r/badge.svg?branch=master)](https://coveralls.io/github/thomasheller/splitt0r?branch=master)\n\nSplit one file into multiple files based on delimiter\n\nFor example, take this input file:\n\n```\n=====\naaa bbb ccc\nddd eee fff\n=====\nggg hhh iii\n=====\njjj kkk lll\n```\n\nsplitt0r will read this file and create three new files, `aaa.txt`, `ggg.txt` and `jjj.txt`, each containing the respective lines of text from the original file.\n\n## Install\n\nPrerequisites:\n  - Git\n  - Golang\n\nIf you haven't already, install Git and Golang on your system. On\nUbuntu/Debian this would be:\n\n```\nsudo apt-get install git golang\n```\n\nThen set up Go:\n  - Create a directory for your `$GOPATH`, for example `~/gocode`\n  - Set the `$GOPATH` environment variable accordingly: `export GOPATH=~/gocode`\n  - Add the `bin` directory to your `$PATH`, for example: ` export PATH=$PATH:~/gocode/bin`\n\nNow you can install splitt0r using `go get`:\n\n```\ngo get github.com/thomasheller/splitt0r\n```\n\n## Usage\n\nsplitt0r supports three modes of operation:\n  - `-stats` prints a few statistics about the input\n  - `-print` prints all titles found in the input (see below for what's a title)\n  - `-write` actually writes the split files into the output directory\n\nIf you don't pass any of the options, `-stats` is implied. You can supply multiple mode flags if you like.\n\n### Input\n\nYou can specify an input filename using `-file FILENAME`.\nIf you don't specify a filename, splitt0r will read from STDIN.\n\nIf you'd like splitt0r to recognize something different from `=====` as the delimiter,\nspecify the delimiter character using `-char CHAR`.\nNote that `CHAR` must be exactly one character.\n\nsplitt0r assumes that the delimiter character appears at least 5 times.\nIf the input line is shorter, it is considered part of the content.\nYou can set this to any positive number (integer) using `-len NUMBER`.\n\n### Output\n\nBy default, splitt0r will put all files in a subdirectory called `output`.\nYou can set this to something else using `-outdir DIRECTORY`.\nThe directory must be empty when splitt0r is started.\nIf the directory doesn't exist, splitt0r will create it for you.\n\nAll output filenames will be in the format `TITLE.txt` (regarding `TITLE`, see below).\nYou can change the filename extension using `-outext EXTENSION`.\nNote that `EXTENSION` must include the leading dot (unless you don't want a dot), for example `.foo`.\n\n### Titles (Filenames) and Duplicates\n\nsplitt0r will use the first word that appears after a delimiter line as the filename (\"title\") for the output (split) file.\n\nThere is a special mode called `-wiki` which parses the content according to MediaWiki markup rules.\nIn this case, the first word that is formatted either *italic* (`''italic''`), **bold** (`'''bold'''`) or ***bold-italic*** (`''''bold-italic''''`) is used as the filename (\"title\"), whichever comes first.\n\nIf you use Wiki mode, every first line of content after a delimiter must contain a title that is italic, bold or bold-italic. If it can't find a title as expected, splitt0r will refuse to split the input. You cannot mix the \"first word\" and \"Wiki\" approaches.\n\n#### Duplicates\n\nIf splitt0r finds the same title more than once, it will proceed as follows:\n  - The first file is put in the regular output directory, for example:  \n  `output/TITLE.txt`\n  - The second file is put in the `dupes` subdirectory and splitt0r appends an index, for example:  \n  `output/dupes/TITLE (2).txt`\n  - The third file would be:  \n  `output/dupes/TITLE (3).txt`\n  - and so forth...\n\nThis applies to `-write` mode. If you use `-print` to get a list of all titles, splitt0r will print the titles regardless of how often they appear in the input -- no indices will be appended. This is by design. In `-stats` mode, splitt0r will tell you how many duplicates appeared.\n\n### Details\n\n  - It doesn't matter how the input begins -- i.e., the first line doesn't need to be a delimiter line. splitt0r will ignore delimiter lines and empty lines until it finds the first line of actual content.\n  - If there are two or more delimiter lines with no lines or just empty lines in between them, they will be ignored. splitt0r won't produce empty output files.\n  - splitt0r will remove any empty lines that occur right before or right after the actual content. That means, you can have as many empty lines around your delimiter lines as you wish. If your content contains empty lines, they will be preserved.\n  - If there is any whitespace (spaces, tabs...) *after* the delimiter characters, splitt0r will still consider the line a delimiter line. For example, `=====\u003cSPACE\u003e\u003cSPACE\u003e` is a valid delimiter line. The opposite is not true: If there is any whitespace *before* the delimiter characters, the line is not recognized as a delimiter -- so `\u003cSPACE\u003e\u003cSPACE\u003e=====` will be considered part of the content.\n  - A line that consists solely of whitespace (spaces, tabs...) is considered empty. Nonetheless, splitt0r will preserve all whitespace characters in the output files: As noted above, it will remove empty lines that occur around your content and delimiter lines, but all empty lines inside your content will be copied verbatim, including potential whitespace characters.\n  - A whitespace character is any character that Golang's [`unicode.IsSpace`](https://golang.org/pkg/unicode/#IsSpace) recognizes as a whitespace character.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasheller%2Fsplitt0r","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthomasheller%2Fsplitt0r","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasheller%2Fsplitt0r/lists"}