{"id":21258842,"url":"https://github.com/jamesponddotco/wikiextract","last_synced_at":"2025-03-15T06:34:45.760Z","repository":{"id":232292905,"uuid":"628401988","full_name":"jamesponddotco/wikiextract","owner":"jamesponddotco","description":"[READ-ONLY] A word extractor for Wikipedia articles.","archived":false,"fork":false,"pushed_at":"2024-04-08T23:13:30.000Z","size":39,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"trunk","last_synced_at":"2025-03-07T00:52:09.477Z","etag":null,"topics":["crawler","crawling","diceware","go","wikipedia","wikipedia-crawler","word-extraction"],"latest_commit_sha":null,"homepage":"https://sr.ht/~jamesponddotco/wikiextract/","language":"Go","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jamesponddotco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-04-15T20:46:24.000Z","updated_at":"2024-04-08T17:17:31.000Z","dependencies_parsed_at":"2024-04-09T00:41:32.614Z","dependency_job_id":"21939d2c-6876-4619-a33a-2dac0f27cd01","html_url":"https://github.com/jamesponddotco/wikiextract","commit_stats":null,"previous_names":["jamesponddotco/wikiextract"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesponddotco%2Fwikiextract","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesponddotco%2Fwikiextract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jamesponddotco%2Fwikiextract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/jamesponddotco%2Fwikiextract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jamesponddotco","download_url":"https://codeload.github.com/jamesponddotco/wikiextract/tar.gz/refs/heads/trunk","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243695478,"owners_count":20332622,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","crawling","diceware","go","wikipedia","wikipedia-crawler","word-extraction"],"created_at":"2024-11-21T04:11:08.960Z","updated_at":"2025-03-15T06:34:45.725Z","avatar_url":"https://github.com/jamesponddotco.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# `wikiextract`\n\n[![builds.sr.ht status](https://builds.sr.ht/~jamesponddotco/wikiextract.svg)](https://builds.sr.ht/~jamesponddotco/wikiextract?)\n\n`wikiextract` is a word extractor for Wikipedia articles. 
It can extract\nwords longer than 4 characters from a given Wikipedia page or list of\npages and save them to a file you can later use as the source for\ngenerating [diceware passwords](https://en.wikipedia.org/wiki/Diceware).\n\n## Installation\n\n### From source\n\nFirst, install the dependencies:\n\n- Go 1.22 or above.\n- make.\n- [scdoc](https://git.sr.ht/~sircmpwn/scdoc).\n\nSwitch to the latest stable tag, `v1.0.0`, then compile and install:\n\n```bash\ngit checkout v1.0.0\nmake\nsudo make install\n```\n\n## Usage\n\n```bash\n$ wikiextract --help\nNAME:\n   wikiextract - a simple word extractor for Wikipedia articles\n\nUSAGE:\n   wikiextract [global options] \n\nVERSION:\n   1.0.0\n\nGLOBAL OPTIONS:\n   --input-url value, -u value [ --input-url value, -u value ]  the URL of the Wikipedia page\n   --input-file value, -f value                                 a file containing a list of URLs\n   --output value, -o value                                     the path to the output file\n   --help, -h                                                   show help\n   --version, -v                                                print the version\n\n$ wikiextract -u 'https://en.wikipedia.org/wiki/Wikipedia' -o 'output.txt'\n```\n\nSee _wikiextract(1)_ after installing for more information.\n\n## Contributing\n\nAnyone can help make `wikiextract` better. Send patches on the [mailing\nlist](https://lists.sr.ht/~jamesponddotco/wikiextract-devel) and report\nbugs on the [issue\ntracker](https://todo.sr.ht/~jamesponddotco/wikiextract).\n\nYou must sign off your work using `git commit --signoff`. 
See the\n[Linux kernel developer's certificate of\norigin](https://www.kernel.org/doc/html/latest/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin)\nfor more details.\n\nAll contributions are made under [the GPL-2.0 license](LICENSE.md).\n\n## Resources\n\nThe following resources are available:\n\n- [Support and general discussions](https://lists.sr.ht/~jamesponddotco/wikiextract-discuss).\n- [Patches and development-related questions](https://lists.sr.ht/~jamesponddotco/wikiextract-devel).\n- [Instructions on how to prepare patches](https://git-send-email.io/).\n- [Feature requests and bug reports](https://todo.sr.ht/~jamesponddotco/wikiextract).\n\n---\n\nReleased under the [GPL-2.0 license](LICENSE.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamesponddotco%2Fwikiextract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjamesponddotco%2Fwikiextract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjamesponddotco%2Fwikiextract/lists"}