{"id":20425359,"url":"https://github.com/catseye/t-rext","last_synced_at":"2025-04-12T18:55:28.043Z","repository":{"id":45108946,"uuid":"44098331","full_name":"catseye/T-Rext","owner":"catseye","description":"MIRROR of https://codeberg.org/catseye/T-Rext : A command-line tool that attempts to rectify punctuation and spacing in (generated) text files","archived":false,"fork":false,"pushed_at":"2022-03-27T12:54:19.000Z","size":22,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-26T13:12:03.472Z","etag":null,"topics":["filtering","sanitization","text-processing","text-sanitization"],"latest_commit_sha":null,"homepage":"https://catseye.tc/node/T-Rext","language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"unlicense","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/catseye.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-10-12T09:43:35.000Z","updated_at":"2024-07-14T11:44:18.000Z","dependencies_parsed_at":"2022-09-02T23:11:01.558Z","dependency_job_id":null,"html_url":"https://github.com/catseye/T-Rext","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catseye%2FT-Rext","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catseye%2FT-Rext/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catseye%2FT-Rext/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/catseye%2FT-Rext/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/catseye","download_url":"https://codeload.github.com/catseye/T-Rext/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248618273,"owners_count":21134200,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["filtering","sanitization","text-processing","text-sanitization"],"created_at":"2024-11-15T07:13:02.247Z","updated_at":"2025-04-12T18:55:28.018Z","avatar_url":"https://github.com/catseye.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"T-Rext\n======\n\nT-Rext is a command-line filter that attempts to clean up spacing,\npunctuation, and capitalization in a text file.  The idea is that,\nwhen you are writing a text generator such as a Markov processor, you\nneed not worry too much about its output format; just pipe its output\nthrough T-Rext when you're done to make it more presentable.\n\nThe current version of T-Rext is 0.3, which runs under either Python 2.7\nor Python 3.x.  Docker images for each, based on the corresponding\nversion of CPython, are [available on Docker Hub][].\n\nUsage\n-----\n\n### Usage from the Command Line\n\n    bin/t-rext raw_output.txt \u003e cleaned_output.txt\n\nThis will take lines that look like this:\n\n    \" Well , \" said the king , , \" no . \"\n\nand reformat them to look like this:\n\n    “Well,” said the king, “no.”\n\nTo use T-Rext from any working directory, add the `bin` directory in this\nrepository to your `PATH`.
For example, you might add this line to your\n`.bashrc`:\n\n    export PATH=/path/to/this/repo/bin:$PATH\n\nAn easy way to accomplish the above is to install [shelf][], then\ndock T-Rext using\n\n    shelf_dockgh catseye/T-Rext\n\n### Usage from Python\n\nT-Rext is built on an over-engineered library of pipeline processors, which\nyou can use directly (note that its interface is not stable and is liable to change).\nTo use the T-Rext Python modules in other Python programs, make sure the\n`src` directory of this repository is on your `PYTHONPATH`.  For example,\nyou might add this line to your `.bashrc`:\n\n    export PYTHONPATH=/path/to/this/repo/src:$PYTHONPATH\n\nThen you can add imports like this to the top of your script:\n\n    from t_rext.processors import TrailingWhitespaceProcessor\n\nTests\n-----\n\nThis is a test suite, written in [Falderal][] format, for the `t-rext`\nutility.  It also serves as documentation for said utility.\n\n    -\u003e Tests for functionality \"Clean up punctuation and spaces\"\n\nSpaces before commas and periods are elided.\n\n    | Well , that is good .\n    = Well, that is good.\n\nMultiple commas are collapsed into a single comma.\n\n    | Well , , that is good .\n    = Well, that is good.\n\nMultiple periods are not collapsed into a single period.\n\n    | Well . . . that is good.\n    = Well... that is good.\n\nStraight quotes are converted to oriented (curly) quotes.\n\n    | \"Yes,\" he said.\n    = “Yes,” he said.\n\nSingle spaces after opening quotes and before closing quotes are elided.\n\n    | \" Yes , \" he said.\n    = “Yes,” he said.\n\nBut not the other way 'round.\n\n    | Muttering \"Yes,\" he turned around.\n    = Muttering “Yes,” he turned around.\n\nMultiple spaces after opening quotes and before closing quotes are elided.\n\n    | \"   Yes ,   \" he said.\n    = “Yes,” he said.\n\nBut not the other way 'round.\n\n    | Muttering   \"Yes,\"    he turned around.\n    = Muttering   “Yes,”    he turned around.\n\nQuotes do not match across paragraphs.\n\n    | Turbid \"Waters\" that \"leak.\n    | \n    | You \"don't\" have a clue.\n    = Turbid “Waters” that “leak.\n    = \n    = You “don't” have a clue.\n\nSingle spaces before apostrophes are elided in some situations.\n\n    | It wasn 't Arthur 's car.\n    = It wasn't Arthur's car.\n\nPunctuation at the beginning of a line is elided in some cases.\n\n    | , where he said so.\n    = Where he said so.\n\nCapitalization is applied at the beginning of a line and at the\nbeginning of a sentence.\n\n    | , where. he said so.\n    = Where. He said so.\n\n    | Really?    that was... so\n    = Really?    That was... so\n\nTwo full stops become an ellipsis.  A full stop followed by a comma\nbecomes just a comma.\n\n    | It was.. the nice., thing.\n    = It was... the nice, thing.\n\n[Falderal]:                https://catseye.tc/node/Falderal\n[shelf]:                   https://catseye.tc/node/shelf\n[available on Docker Hub]: https://hub.docker.com/r/catseye/t-rext\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatseye%2Ft-rext","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcatseye%2Ft-rext","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcatseye%2Ft-rext/lists"}