{"id":17614165,"url":"https://github.com/learnbyexample/tpyo_revealo","last_synced_at":"2025-05-06T14:33:01.411Z","repository":{"id":108058842,"uuid":"114744434","full_name":"learnbyexample/tpyo_revealo","owner":"learnbyexample","description":":see_no_evil: assistant for hunting down tpyos","archived":false,"fork":false,"pushed_at":"2017-12-20T10:45:03.000Z","size":23,"stargazers_count":13,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-09T14:08:16.781Z","etag":null,"topics":["docx","epub","python3","typo"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/learnbyexample.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-12-19T09:17:57.000Z","updated_at":"2023-07-07T03:36:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"96d905e6-2f13-4504-af45-e952bc8366c3","html_url":"https://github.com/learnbyexample/tpyo_revealo","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/learnbyexample%2Ftpyo_revealo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/learnbyexample%2Ftpyo_revealo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/learnbyexample%2Ftpyo_revealo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/learnbyexample%2Ftpyo_revealo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/owners/learnbyexample","download_url":"https://codeload.github.com/learnbyexample/tpyo_revealo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252703467,"owners_count":21790891,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docx","epub","python3","typo"],"created_at":"2024-10-22T18:26:35.518Z","updated_at":"2025-05-06T14:33:01.144Z","avatar_url":"https://github.com/learnbyexample.png","language":"Python","readme":"![tpyo gif](tpyo.gif)\n\n\u003cbr\u003e\n\n#### Why?\n\n* Saw a couple of typos while reading a fantasy book and wondered why they weren't caught\n* Felt like a good mini-project to improve my Python and programming skills\n\n\u003cbr\u003e\n\n#### Idea\n\n1. Compare a list of dictionary words with words extracted from an e-book using Python code\n    * as of now, working on docx/epub formats\n2. The generated output has to be checked manually to validate\n    * in-world terms like names, places, etc\n    * words not found in the reference dictionary\n    * hyphenated words\n3. These words can then be added to the reference list of words so that further runs reveal only typos\n4. 
Repeat steps 1-3 when input documents change\n\n\u003cbr\u003e\n\n#### Caveats\n\n* Use the program at your own risk\n    * files/directories are read/created programmatically, so a bug could corrupt your system\n    * I only have Linux, so I don't know how it'll behave when used with other operating systems\n* at best, the project could be said to be at alpha stage\n\n\u003cbr\u003e\n\n#### Instructions\n\nFor Linux and Unix-like systems\n\nFirst, clone the repo or download the [zip](https://github.com/learnbyexample/tpyo_revealo/archive/master.zip)\n\n```bash\n$ git clone https://github.com/learnbyexample/tpyo_revealo.git\n\n$ cd tpyo_revealo/\n$ mkdir ref_words input_doc\n$ # multiple documents and reference lists can be put in these directories\n$ cp samples/sample.docx input_doc/\n$ cp /usr/share/dict/words ref_words/words.txt\n\n$ # this will create a log directory using the current time as the directory name\n$ python3 tpyo_revealo.py\n\n$ cat 2017-12-20_15_38_07.341621/hyphenated_words.log\nen-IN: 1\nfull-fledged: 1\n$ cat 2017-12-20_15_38_07.341621/tpyo_words.log\nLibreOffice/5.2.0.4$Linux_X: 1\nLibreOffice_project/20m0$Build: 1\nrny: 1\nsamlpe: 1\nT15:37:31Z: 1\ntpyo: 1\nwordswithoutspace: 1\n\n$ # create ignore lists and run again\n$ cat \u003e ref_words/ignore.txt\nen-IN\nLibreOffice/5.2.0.4$Linux_X\nLibreOffice_project/20m0$Build\nT15:37:31Z\n$ echo 'full-fledged' \u003e ref_words/hyphenated_words.txt\n\n$ python3 tpyo_revealo.py\n$ cat 2017-12-20_15_40_45.505735/hyphenated_words.log\n$ cat 2017-12-20_15_40_45.505735/tpyo_words.log\nrny: 1\nsamlpe: 1\ntpyo: 1\nwordswithoutspace: 1\n```\n\n\u003cbr\u003e\n\n#### Where to get word lists\n\n* this [Stack Overflow Q\u0026A](https://stackoverflow.com/questions/4456446/dictionary-text-file) might help\n* [aspell](http://app.aspell.net/create) looked good (mentioned in the above link)\n    * American/British/Canadian/Australian spellings\n    * SCOWL size 95, Variants 3, Diacritic stripped gives 660+K words\n        * The script 
finished in less than 3 seconds for the Oathbringer book (450+K words) against 660+K reference words, so performance is not an issue\n    * Can be downloaded for both Windows/Unix\n    * See [scowl-readme](http://wordlist.aspell.net/scowl-readme/) for more details including usage and license\n\n\u003cbr\u003e\n\n#### Wishlist\n\n* Better parsing for XHTML files. As of now, XML extraction is used, so things like `T\u003cspan class=\"XXX\"\u003eHOSE words` mess things up\n* Code organization - need to break up into different functions, etc\n* Features - repeated words, adverbs repeated in short space, etc\n* Look into NLTK\n\n\u003cbr\u003e\n\n#### Contributing\n\n* Open an issue for suggestions, feature requests, bugs, etc\n\n\u003cbr\u003e\n\n#### License\n\nMIT, see [LICENSE](./LICENSE) file\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flearnbyexample%2Ftpyo_revealo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flearnbyexample%2Ftpyo_revealo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flearnbyexample%2Ftpyo_revealo/lists"}