{"id":15999587,"url":"https://github.com/danburzo/ltr","last_synced_at":"2025-08-10T07:19:58.132Z","repository":{"id":233518320,"uuid":"786967125","full_name":"danburzo/ltr","owner":"danburzo","description":"Split text into chars, words, or sentences from the command line.","archived":false,"fork":false,"pushed_at":"2024-04-16T19:19:44.000Z","size":16,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-03T19:15:53.862Z","etag":null,"topics":["text-segmentation"],"latest_commit_sha":null,"homepage":"https://danburzo.ro/projects/ltr/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/danburzo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-15T16:30:27.000Z","updated_at":"2024-04-17T10:28:09.000Z","dependencies_parsed_at":"2024-04-16T14:47:05.305Z","dependency_job_id":"3687c7f1-b042-42e8-83f6-9d56d9c7bd9d","html_url":"https://github.com/danburzo/ltr","commit_stats":{"total_commits":13,"total_committers":1,"mean_commits":13.0,"dds":0.0,"last_synced_commit":"107ff2e208e62f666a9537fca724a320a7576ffe"},"previous_names":["danburzo/ltr"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danburzo%2Fltr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danburzo%2Fltr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/danburzo%2Fltr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/danburzo%2Fltr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/danburzo","download_url":"https://codeload.github.com/danburzo/ltr/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240229972,"owners_count":19768597,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["text-segmentation"],"created_at":"2024-10-08T09:00:26.227Z","updated_at":"2025-02-22T20:14:40.515Z","avatar_url":"https://github.com/danburzo.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ltr\n\nA simple command-line text segmenter that uses the [`Intl.Segmenter`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter) \u003cabbr\u003eAPI\u003c/abbr\u003e to split text into characters, words and sentences.\n\nIt takes cues from standard Unix command-line tools such as [`wc`](https://en.wikipedia.org/wiki/Wc_(Unix)), [`uniq`](https://en.wikipedia.org/wiki/Uniq), and [`sort`](https://en.wikipedia.org/wiki/Sort_(Unix)).\n\n## Getting started\n\n`ltr` runs in Node.js and can be installed globally with npm:\n\n```bash\nnpm install -g ltr\n```\n\nYou can also run it without installing it first, using npx:\n\n```bash\nnpx ltr --help\n```\n\n## Usage\n\n```bash\nltr [command] [file1, [file2, …]]\n```\n\n`ltr` accepts one or more input files, or uses the standard input (`stdin`) when no files are provided. 
You can also concatenate `stdin` to other input files by using the `-` (dash) operand.\n\nGeneral options:\n* __`-h`__, __`--help`__.\n* __`-v`__, __`--version`__.\n\nAvailable commands:\n\n* `ltr chars` — extract graphemes;\n* `ltr words` — extract words;\n* `ltr sentences` — extract sentences.\n\nThe tool returns one value per line.\n\n## Options\n\n### `-l`, `--locale`\n\nBy default, `ltr` works with the current locale. An explicit locale can be specified.\n\n```bash\nltr sentences --locale=ro my-doc.txt\n```\n\n### `-u`, `--unique`\n\nReturn unique values, removing any duplicates.\n\n```bash\nltr words --unique my-doc.txt\n```\n\n### `-i`, `--ignore-case`\n\nIgnore case when performing operations. Causes values to be returned in lowercase.\n\n```bash\nltr words --ignore-case my-doc.txt\n```\n\n### `-I`, `--ignore-accents`\n\nIgnore diacritical marks when performing operations. Causes values to be returned without diacritical marks.\n\n```bash\nltr words --ignore-accents my-doc.txt\n```\n\n### `-c`, `--count`\n\nCount occurrences of each unique value.\n\n```bash\nltr words --count my-doc.txt\n```\n\n### `-t`, `--total`\n\nCount total occurrences. The option implies `--count`.\n\n```bash\nltr words --total my-doc.txt\n```\n\n### `-s`, `--sort`\n\nSort the values. \n\n```bash\nltr words --sort my-doc.txt\n```\n\nWhen `--count` is present, values are sorted by occurrences, from most frequent to least. Otherwise values are sorted alphabetically in ascending order.\n\n### `-r`, `--reverse`\n\nReverse the order of the values. It can be used to reverse the sorting order, but can also be used on its own to list values in the reverse order of occurrence. 
\n\n```bash\nltr words --sort --reverse my-doc.txt\n```\n\n## Working with HTML and Markdown\n\nAlthough you can feed HTML and Markdown to `ltr`, the list of returned values will include the added noise of markup constructs.\n\nYou can convert HTML or Markdown to plain text with [`trimd`](https://github.com/danburzo/trimd/) before calling `ltr`:\n\n```bash\n# Using Markdown:\ntrimd demarkdown my-post.md | ltr words --count --total\n\n# Using HTML:\ntrimd demarkup my-page.html | ltr words --count --total\n```\n\nFurthermore, when using HTML documents you may want to focus on the main part of the content to reduce the interference of ancillary page content. You can use [`hred`](https://github.com/danburzo/hred/) to extract the content of a single element:\n\n```bash\n# Using HTML, just the \u003cmain\u003e content:\ncat my-page.html | trimd demarkup | ltr words --count --total\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanburzo%2Fltr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdanburzo%2Fltr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdanburzo%2Fltr/lists"}