{"id":18048523,"url":"https://github.com/bent10/nomark","last_synced_at":"2026-02-05T16:31:27.914Z","repository":{"id":226548818,"uuid":"769011186","full_name":"bent10/nomark","owner":"bent10","description":"Transform hypertext strings (e.g., HTML, Markdown) into plain text for natural language processing (NLP) normalization","archived":false,"fork":false,"pushed_at":"2026-01-03T01:46:43.000Z","size":100,"stargazers_count":2,"open_issues_count":5,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-01-05T01:39:16.599Z","etag":null,"topics":["html","markdown","nlp","normalize","normalizer","plaintext","text","token","tokenize","transform","transformer"],"latest_commit_sha":null,"homepage":"https://www.npmjs.com/package/nomark","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bent10.png","metadata":{"files":{"readme":"readme.md","changelog":"changelog.md","contributing":null,"funding":null,"license":"license","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-03-08T06:47:57.000Z","updated_at":"2025-08-17T03:40:30.000Z","dependencies_parsed_at":"2024-03-08T07:30:20.324Z","dependency_job_id":"9940791f-b064-4108-be2f-14c0f9e0c760","html_url":"https://github.com/bent10/nomark","commit_stats":null,"previous_names":["bent10/nomark"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/bent10/nomark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bent10%2Fnomark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bent10%2Fnomark/t
ags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bent10%2Fnomark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bent10%2Fnomark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bent10","download_url":"https://codeload.github.com/bent10/nomark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bent10%2Fnomark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29125823,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-05T14:05:12.718Z","status":"ssl_error","status_checked_at":"2026-02-05T14:03:53.078Z","response_time":65,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["html","markdown","nlp","normalize","normalizer","plaintext","text","token","tokenize","transform","transformer"],"created_at":"2024-10-30T20:13:15.531Z","updated_at":"2026-02-05T16:31:27.892Z","avatar_url":"https://github.com/bent10.png","language":"TypeScript","readme":"# nomark\n\nA utility to transform hypertext strings (e.g., HTML, Markdown) into plain text, which is useful for natural language processing (NLP) normalization.\n\n## Install\n\n```bash\nnpm install nomark\n```\n\nOr yarn:\n\n```bash\nyarn add nomark\n```\n\nAlternatively, you can also include this module directly in your HTML file from CDN:\n\n```yml\nUMD: 
https://cdn.jsdelivr.net/npm/nomark/dist/index.umd.js\nESM: https://cdn.jsdelivr.net/npm/nomark/+esm\nCJS: https://cdn.jsdelivr.net/npm/nomark/dist/index.cjs\n```\n\n## Usage\n\n````js\nimport nomark from 'nomark'\n\nconst hypertext =\n  '# Café \u003cem\u003edu\u003c/em\u003e Monde\\n\\nThis is some **bold**, _italic_, and ~~strikethrough~~ text.\\n\\n## Headers\\n\\n### This is an H3 header\\n\\n#### This is an H4 header\\n\\n##### This is an H5 header\\n\\n###### This is an H6 header\\n\\n## Lists\\n\\n### Unordered List\\n\\n- Item 1\\n- Item 2\\n  - Subitem A\\n  - Subitem B\\n    - Sub-subitem 1\\n    - Sub-subitem 2\\n\\n### Ordered List\\n\\n1. First item\\n2. Second item\\n   1. Nested item\\n   2. Another nested item\\n\\n## Links and Images\\n\\n[Example](https://example.com)\\n\\n![Example Logo](https://example.com/favicon.ico)\\n\\n## Blockquotes\\n\\n\u003e This is a blockquote.\\n\u003e\\n\u003e - John Doe\\n\\n## Code Blocks\\n\\n```javascript\\nfunction greet(name) {\\n  console.log(`Hello, ${name}!`)\\n}\\n\\ngreet(\\'World\\')\\n```\\n\\n## Tables\\n\\n| Name | Age | Gender |\\n| ---- | --- | ------ |\\n| John | 30  | Male   |\\n| Jane | 25  | Female |\\n\\n## Task Lists\\n\\n- [x] Task 1\\n- [ ] Task 2\\n- [x] Task 3\\n\\n## Emoji\\n\\n:smiley: :rocket: :book:\\n\\n## Strikethrough\\n\\n~~This text is strikethrough.~~\\n\\n## HTML tags\\n\\nThis is a \u003cspan style=\"color:red;\"\u003ered\u003c/span\u003e text.\\n\\n\u003cp\u003eThis is a paragraph.\u003c/p\u003e\\n\\n\u003cblockquote\u003eThis is a blockquote in HTML.\u003c/blockquote\u003e\\n\\n\u003cul\u003e\\n  \u003cli\u003eHTML List Item 1\u003c/li\u003e\\n  \u003cli\u003eHTML List Item 2\u003c/li\u003e\\n\u003c/ul\u003e\\n\\n\u003cimg src=\"https://example.com/image.jpg\" alt=\"Example Image\"\u003e\\n\\n## GitHub Flavored Markdown (GFM) Features\\n\\n### Code Blocks with Language Highlighting\\n\\n```typescript\\ninterface Person {\\n  name: string\\n  age: number\\n}\\n\\nconst 
person: Person = {\\n  name: \\'John Doe\\',\\n  age: 30\\n}\\n```\\n\\n### Task Lists in Tables\\n\\n| Task   | Status |\\n| ------ | ------ |\\n| Task 1 | [x]    |\\n| Task 2 | [ ]    |\\n| Task 3 | [x]    |\\n\\n### Mentioning Users\\n\\nHey @username, could you take a look at this?\\n\\n### URLs Automatically Linked\\n\\nhttps://example.com/foo/bar\\n\\n### Strikethrough in Tables\\n\\n| Item       | Price  |\\n| ---------- | ------ |\\n| Apple      | $2     |\\n| Banana     | $1     |\\n| ~~Orange~~ | ~~$3~~ |\\n\\n### Emoji in Headers\\n\\n## :sparkles: Features :sparkles:'\n\nconst plaintext = nomark(hypertext, {\n  stripMarkdown: true,\n  stripHtml: true\n})\n\nconsole.log(plaintext)\n````\n\n\u003cdetails\u003e\n\u003csummary\u003eSee the results:\u003c/summary\u003e\n\n```text\nCafé du Monde.\nThis is some bold, italic, and strikethrough text.\nHeaders.\nThis is an H3 header.\nThis is an H4 header.\nThis is an H5 header.\nThis is an H6 header.\nLists.\nUnordered List.\nItem 1.\nItem 2.\nSubitem A.\nSubitem B.\nSub-subitem 1.\nSub-subitem 2.\nOrdered List.\nFirst item.\nSecond item.\nNested item.\nAnother nested item.\nLinks and Images.\nExample.\nExample Logo.\nBlockquotes.\nThis is a blockquote.\nJohn Doe.\nCode Blocks.\nfunction greet(name) {\n  console.log(`Hello, ${name}!`)\n}\n\ngreet('World')\nTables.\nName, Age, Gender.\nJohn, 30, Male.\nJane, 25, Female.\nTask Lists.\nTask 1.\nTask 2.\nTask 3.\nEmoji.\n:smiley: :rocket: :book:\nStrikethrough.\nThis text is strikethrough.\nHTML tags.\nThis is a red text.\nThis is a paragraph.\nThis is a blockquote in HTML.\nHTML List Item 1\nHTML List Item 2\nGitHub Flavored Markdown (GFM) Features.\nCode Blocks with Language Highlighting.\ninterface Person {\n  name: string\n  age: number\n}\n\nconst person: Person = {\n  name: 'John Doe',\n  age: 30\n}\nTask Lists in Tables.\nTask, Status.\nTask 1, [x].\nTask 2, [ ].\nTask 3, [x].\nMentioning Users.\nHey @username, could you take a look at this?\nURLs 
Automatically Linked.\nhttps://example.com/foo/bar.\nStrikethrough in Tables.\nItem, Price.\nApple, $2.\nBanana, $1.\nOrange, $3.\nEmoji in Headers.\n:sparkles: Features :sparkles:\n```\n\n\u003c/details\u003e\n\n## API\n\n### `nomark(input: string, options?: NomarkOptions): string`\n\nThis function transforms hypertext strings into plain text by applying [Unicode normalization](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize?retiredLocale=id#form), stripping HTML tags, and removing Markdown syntax.\n\n- `input`: The hypertext string to transform.\n- `options` (optional): Options for transforming the input.\n  - `form` (optional): The [Unicode normalization](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize?retiredLocale=id#form) form to apply. Defaults to `'NFC'`.\n  - `stripHtml` (optional): Indicates whether to strip HTML tags from the text. Defaults to `false`.\n  - `stripMarkdown` (optional): Indicates whether to strip Markdown syntax from the text. Defaults to `false`.\n\n## Related\n\n- [boox](https://github.com/bent10/boox) – Performs full-text search across multiple documents by combining [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) score with [inverted index](https://en.wikipedia.org/wiki/Inverted_index) weight.\n- [stophtml](https://github.com/bent10/stophtml) – Extracts plain text from an HTML string.\n- [stopmarkdown](https://github.com/bent10/stopmarkdown) – Extracts plain text from a Markdown string.\n- [stopword](https://github.com/fergiemcdowall/stopword) – Allows you to strip stopwords from an input text (supports a ton of languages).\n\n## Contributing\n\nWe 💛\u0026nbsp; issues.\n\nWhen committing, please conform to [the semantic-release commit standards](https://www.conventionalcommits.org/). 
Please install `commitizen` and the adapter globally, if you have not already.\n\n```bash\nnpm i -g commitizen cz-conventional-changelog\n```\n\nNow you can use `git cz` or just `cz` instead of `git commit` when committing. You can also use `git-cz`, which is an alias for `cz`.\n\n```bash\ngit add . \u0026\u0026 git cz\n```\n\n## License\n\n![GitHub](https://img.shields.io/github/license/bent10/nomark)\n\nA project by [Stilearning](https://stilearning.com) \u0026copy; 2024.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbent10%2Fnomark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbent10%2Fnomark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbent10%2Fnomark/lists"}