{"id":17465424,"url":"https://github.com/suhlig/httpspell","last_synced_at":"2026-02-02T20:03:45.307Z","repository":{"id":38364697,"uuid":"135441968","full_name":"suhlig/httpspell","owner":"suhlig","description":"Spellchecker that recursively fetches HTML pages, converts them to plain text, and spellchecks them.","archived":false,"fork":false,"pushed_at":"2024-11-01T09:55:10.000Z","size":216,"stargazers_count":0,"open_issues_count":5,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-01-28T23:38:13.953Z","etag":null,"topics":["academic","http","spellcheck","spider"],"latest_commit_sha":null,"homepage":null,"language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/suhlig.png","metadata":{"files":{"readme":"README.markdown","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-05-30T12:49:35.000Z","updated_at":"2024-08-07T11:49:02.000Z","dependencies_parsed_at":"2024-07-22T14:02:04.714Z","dependency_job_id":"c6e7a85e-16fc-4fb6-ab27-a5af8399d8e7","html_url":"https://github.com/suhlig/httpspell","commit_stats":null,"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suhlig%2Fhttpspell","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suhlig%2Fhttpspell/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suhlig%2Fhttpspell/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/suhlig%2Fhttpspell/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
owners/suhlig","download_url":"https://codeload.github.com/suhlig/httpspell/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":236772464,"owners_count":19202281,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["academic","http","spellcheck","spider"],"created_at":"2024-10-18T11:42:41.021Z","updated_at":"2025-10-17T04:31:07.062Z","avatar_url":"https://github.com/suhlig.png","language":"Ruby","readme":"# `httpspell`\n\nThis is a spellchecker that recursively fetches HTML pages, converts them to plain text (using [pandoc](http://pandoc.org/)), and spellchecks them with [hunspell](https://hunspell.github.io/). Unknown words will be printed to `stdout`, which makes the tool a good candidate for CI pipelines where you might want to take action when a spelling error is found on a web page.\n\nWords that are not in the dictionary for the given language (inferred from the `lang` attribute of the HTML document's root element) can be added to a personal dictionary, which will mark the word as correctly spelled.\n\n# Usage\n\n* The following command will retrieve the HTML document at https://example.com, spellcheck it, and not print anything because there are no errors:\n\n  ```bash\n  $ httpspell https://example.com\n  ```\n\n  The exit code is `0`.\n\n* The following command will spellcheck the README of this project as rendered by GitHub, and print a list of unknown words. 
Note that we set the language to `en_US` because GitHub declares 'en' as the document language, but the installed dictionaries usually refer to a specific language variant like `en_US`:\n\n  ```bash\n  $ httpspell https://github.com/suhlig/httpspell/blob/master/README.markdown --language en_US\n  suhlig\n  Permalink\n  httpspell\n  sloc\n  pandoc\n  hunspell\n  ...\n  ```\n\n  The exit code is `1`.\n\n# What is *not* checked\n\n* When spidering a site, `httpspell` will skip all responses with a `content-type` header other than `text/html` (unless it is pointed at a file, in which case it accepts anything).\n* Before converting, `httpspell` removes the following nodes from the HTML DOM as they are not good targets for spellchecking:\n  - `code`\n  - `pre`\n  - Elements with `spellcheck='false'` (this is how HTML5 allows tagging elements as being a target for spellchecking or not)\n\n# Misc\n\nIf you produce content with kramdown (e.g. using Jekyll), an [Inline Attribute List](https://kramdown.gettalong.org/syntax.html#inline-attribute-lists) can be used to set `spellcheck='false'` for an element by adding this line *after* the element (e.g. heading):\n\n```\n{: spellcheck=\"false\"}\n```\n\n# Dictionaries\n\nHunspell uses the system dictionary paths; on the Mac this is `~/Library/Spelling/`. 
Get some dictionaries as explained in the [hunspell](https://github.com/hunspell/hunspell) project:\n\n```command\n$ wget -O ~/Library/Spelling/en_US.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.aff\n$ wget -O ~/Library/Spelling/en_US.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.dic\n```\n\nGerman:\n\n```command\n$ wget -O ~/Library/Spelling/de_DE.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/de/de_DE_frami.dic\n$ wget -O ~/Library/Spelling/de_DE.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/de/de_DE_frami.aff\n```\n\nItalian (for integration tests):\n\n```command\n$ wget -O ~/Library/Spelling/it_IT.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/it_IT/it_IT.dic\n$ wget -O ~/Library/Spelling/it_IT.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/it_IT/it_IT.aff\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsuhlig%2Fhttpspell","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsuhlig%2Fhttpspell","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsuhlig%2Fhttpspell/lists"}