{"id":22837387,"url":"https://github.com/hubtou/prep","last_synced_at":"2026-05-04T08:31:31.256Z","repository":{"id":57453818,"uuid":"376159028","full_name":"HubTou/prep","owner":"HubTou","description":"Prepare text for statistical processing","archived":false,"fork":false,"pushed_at":"2021-09-26T16:06:19.000Z","size":60,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-28T21:48:48.205Z","etag":null,"topics":["command-line-tool","learning-python","learning-unix","pnu-project","python","shell","tools","unix","unix-command","utilities"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HubTou.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-06-11T22:55:38.000Z","updated_at":"2021-09-26T16:05:24.000Z","dependencies_parsed_at":"2022-08-29T11:12:01.947Z","dependency_job_id":null,"html_url":"https://github.com/HubTou/prep","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HubTou%2Fprep","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HubTou%2Fprep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HubTou%2Fprep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HubTou%2Fprep/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HubTou","download_url":"https://codeload.github.com/HubTou/prep/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"http
s://github.com","kind":"github","repositories_count":246412665,"owners_count":20773050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line-tool","learning-python","learning-unix","pnu-project","python","shell","tools","unix","unix-command","utilities"],"created_at":"2024-12-12T23:16:38.554Z","updated_at":"2026-05-04T08:31:31.222Z","avatar_url":"https://github.com/HubTou.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Installation\npip install [pnu-prep](https://pypi.org/project/pnu-prep/)\n\n# PREP(1)\n\n## NAME\nprep - prepare text for statistical processing\n\n## SYNOPSIS\n**prep**\n\\[-a|--ascii\\]\n\\[-d|--number\\]\n\\[-h|--hyphen\\]\n\\[-i|--ignore FILE\\]\n\\[-o|--only FILE\\]\n\\[-p|--ponctuate\\]\n\\[--debug\\]\n\\[--help|-?\\]\n\\[--version\\]\n\\[--\\]\n\\[file\\]\n\\[...\\]\n\n## DESCRIPTION\n**prep** reads each *file* in sequence and writes it on the standard output,\none lowercase `word' per line.\nA word is a string of alphabetic characters and embedded apostrophes, delimited by space or punctuation.\nHyphenated words are broken apart;\nhyphens at the end of lines are removed and the hyphenated parts are joined.\nStrings of digits are discarded.\n\nWhen no files are given as arguments, standard input is read (until a Control-D (Unix) or Control-Z (Windows) character is sent).\n\nThe following option letters may appear in any order:\n\n### OPTIONS\nOptions | Use\n------- | ---\n-a\\|--ascii|Try to convert Unicode letters to ASCII.\n-d\\|--number|Print the word number (in the input stream) with each 
word.\n-h\\|--hyphen|Don't break words on hyphens.\n-i\\|--ignore|Take the next *file* as an `ignore' file. These words will not appear in the output. (They will be counted, for purposes of the **-d** numbering.)\n-o\\|--only|Take the next *file* as an `only' file. Only these words will appear in the output. (All other words will also be counted for the **-d** numbering.)\n-p\\|--ponctuate|Include punctuation marks (single nonalphanumeric characters from the \"!(),.:;?\" set) as separate output lines. The punctuation marks are not counted for the **-d** numbering.\n--debug|Enable debug mode\n--help\\|-?|Print usage and a short help message and exit\n--version|Print version and exit\n--|Options processing terminator\n\n## FILES\nIgnore and only files contain words, one per line.\n\nThe file [/usr/local/etc/eign](https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/lib/eign) was originally provided in **/usr/lib** as an example or default ignore file.\n\n## EXIT STATUS\nThe **prep** utility exits 0 on success, and \u003e0 if an error occurs.\n\n## SEE ALSO\n[deroff(1)](http://man.cat-v.org/unix_7th/1/deroff)\n\n## STANDARDS\nThe **prep** utility is a deprecated [UNIX 7th edition](https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7) command (it also appeared in Unix V7M, Ultrix 3.1, 2.9BSD and 2.11BSD).\n\nOur implementation tries to follow the [PEP 8](https://www.python.org/dev/peps/pep-0008/) style guide for [Python](https://www.python.org/) code.\n\n## PORTABILITY\nTested OK under Windows.\n\n## HISTORY\nThis utility was made for the [PNU project](https://github.com/HubTou/PNU), out of historical curiosity and for fun, though it doesn't seem very useful...\n\nSome features were added compared to the original command:\n* [Unicode](https://en.wikipedia.org/wiki/Unicode) letters are now supported by default (the original command predated Unicode by 12 years).\n* It is now possible to use the **-i** and **-o** options at the same time.\n* The **-h** option was added to 
avoid breaking words on hyphens, which makes sense in French.\n* The **-a** option was added to try to convert Unicode accented letters to their ASCII equivalents.\n\nSeveral bugs from the original **prep** command were corrected:\n* A display bug on hyphenated words inside a line when used with the combined **-d** and **-p** options.\n* A bug with lines starting with an apostrophe.\n* A bug with the character following an apostrophe.\n\n## LICENSE\nThis utility is available under the [3-clause BSD license](https://opensource.org/licenses/BSD-3-Clause).\n\n## AUTHORS\n[Hubert Tournier](https://github.com/HubTou)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhubtou%2Fprep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhubtou%2Fprep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhubtou%2Fprep/lists"}