{"id":16301653,"url":"https://github.com/bbkr/homoglypher","last_synced_at":"2025-04-06T12:31:46.907Z","repository":{"id":138947594,"uuid":"228688842","full_name":"bbkr/HomoGlypher","owner":"bbkr","description":"Homoglyph toolset for Raku language.","archived":false,"fork":false,"pushed_at":"2023-04-21T23:57:22.000Z","size":90,"stargazers_count":4,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-21T23:41:25.919Z","etag":null,"topics":["homoglyph","raku"],"latest_commit_sha":null,"homepage":"","language":"Raku","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"artistic-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bbkr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-17T19:40:46.000Z","updated_at":"2023-12-27T19:14:59.000Z","dependencies_parsed_at":null,"dependency_job_id":"c97b3194-fa80-4a1a-a8da-353a72e0d2a1","html_url":"https://github.com/bbkr/HomoGlypher","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbkr%2FHomoGlypher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbkr%2FHomoGlypher/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbkr%2FHomoGlypher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bbkr%2FHomoGlypher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bbkr","download_url":"https://codeload.github.com/bbkr/HomoGlypher/tar.gz/refs/heads/master","hos
t":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247484312,"owners_count":20946384,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["homoglyph","raku"],"created_at":"2024-10-10T20:55:16.723Z","updated_at":"2025-04-06T12:31:46.887Z","avatar_url":"https://github.com/bbkr.png","language":"Raku","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Homoglyph toolset for [Raku](https://www.raku.org) language\n\n[![.github/workflows/test.yml](https://github.com/bbkr/HomoGlypher/actions/workflows/test.yml/badge.svg)](https://github.com/bbkr/HomoGlypher/actions/workflows/test.yml)\n\n[Homoglyph](https://en.wikipedia.org/wiki/Homoglyph) is a set of one or more graphemes that looks identical or very similar to some other set of graphemes.\n\nFor example:\n\n * `6` (DIGIT SIX) and `б` (CYRILLIC SMALL LETTER BE)\n * `w` (LATIN SMALL LETTER W) and `ω` (GREEK SMALL LETTER OMEGA)\n * `oo` (2 x LATIN SMALL LETTER O) and `က` (MYANMAR LETTER KA)\n * `E` (LATIN CAPITAL LETTER E) and `Ε` (GREEK CAPITAL LETTER EPSILON) and `Е` (CYRILLIC CAPITAL LETTER IE)\n * `V` (LATIN CAPITAL LETTER V) and `\\/` (REVERSE SOLIDUS + SOLIDUS)\n\nHomoglyphs are:\n\n* Font dependent - two homoglyphs may be 100% identical in one font but have visual differences when rendered in another. Even cursive matters: for example, in some fonts a cursive `т` looks like `m`.\n* Subjective - similarity cannot be measured objectively, and there is no fixed point where two sets of graphemes stop being homoglyphs. Are `a` and `а` homoglyphs? Sure! How about `ź` and `ž`? Probably yes. 
And what about `R` and `Я`? Er... You see the point?\n* Funny - replace `;` (SEMICOLON) with `;` (GREEK QUESTION MARK) in someone's code and watch them try to debug code that looks perfectly fine :)\n* Dangerous - someone can register an [IDN domain](https://en.wikipedia.org/wiki/Internationalized_domain_name) that looks very similar to your business domain to swindle money out of your clients.\n\n# TABLE OF CONTENTS\n\n* [SYNOPSIS](#synopsis)\n* [HINT](#hint)\n* [METHODS](#methods)\n  * [add-mapping](#add-mapping)\n  * [unwind](#unwind)\n  * [collapse](#collapse)\n  * [tokenize](#tokenize)\n  * [randomize](#randomize)\n* [CONTACT](#contact) \n\n# SYNOPSIS\n\n```raku\nuse HomoGlypher;\n\nmy %cyrillic = (\n    '6' =\u003e [ 'б' ],\n    'a' =\u003e [ 'а' ],\n    'b' =\u003e [ 'б', 'ь' ],\n    'r' =\u003e [ 'г' ]\n);\n\nmy %greek = (\n    'a' =\u003e [ 'α' ],\n    'o' =\u003e [ 'ο' ]\n);\n\nmy %myanmar = (\n    'oo' =\u003e [ 'က' ]\n);\n\nmy $hg = HomoGlypher.new;\n\n$hg.add-mapping( %cyrillic );\n$hg.add-mapping( %greek );\n$hg.add-mapping( %myanmar );\n\nmy @unwinded = $hg.unwind( 'foo' );    # [ 'foο', 'fοo', 'fοο', 'fက' ]\n\nmy @collapsed = $hg.collapse( 'бαг' ); # [ 'bar', '6ar' ]\n\nmy $randomized = $hg.randomize( 'bar', level =\u003e 80 ); # for example 'bαr'\n\nmy \u0026tokenized = $hg.tokenize( );\nsay so 'bαг' ~~ / \u003c\u0026tokenized: 'bar'\u003e /; # True\n\n```\n\n# HINT\n\nWhen dealing with homoglyphs, the easiest way to debug them is the `uninames` (or `uniname`) method:\n\n```\n$ raku -e '.say for \"fοο\".uninames'\n\nLATIN SMALL LETTER F\nGREEK SMALL LETTER OMICRON\nGREEK SMALL LETTER OMICRON\n```\n\n# METHODS\n\n## add-mapping\n\nMerges the given mapping (a Hash of Arrays) with the existing mappings.\n\nKeys are typically composed of ASCII characters.\nDuplicates are filtered out automatically.\nMulti-character glyphs can be used in both keys and values:\n\n```raku\nmy %mapping = (\n    'IO' =\u003e [ 'Ю' ],\n    'P' =\u003e [ '|Ͻ']\n);\n```\n\nYou 
can inspect the merged mappings under `$hg.mappings`, but ***do not modify it directly***.\nIf you want to fine-tune it, fetch the merged result, tweak it, and add it to a new `HomoGlypher` object.\n\nA few ready-to-use mappings are provided in [HomoGlypher::Mappings](https://github.com/bbkr/HomoGlypher/blob/master/lib/HomoGlypher/Mappings.rakumod):\n\n* `@basic` - ASCII letters and digits that are faked by completely different characters: `ΤꜦꜪ QՍΙᴄк вᚱՕꓪɴ ꓝᏅХ` `jսոր𐑈 օ𐐷еᎱ tᏥе ιαzႸ Ժօց` `ОᛐշʒᏎƼỼ7ꝸᏭ`. Consists of:\n    * `%armenian`\n    * `%cherokee`\n    * `%cyrillic`\n    * `%deseret`\n    * `%greek`\n    * `%greek-mathematical-typeface`\n    * `%georgian`\n    * `%latin`\n    * `%lisu`\n    * `%myanmar`\n    * `%roman-numerals`\n    * `%runic`\n    * `%math-symbols`\n* `@typeface` - ASCII letters and digits that have typeface styles applied; base characters are not changed: `𝗧𝕳𝓔 𝒬𝕌𝕀𝙲𝔎 𝔹𝗥ＯＷ𝓝 𝘍𝕆𝗫` `𝒿𝓾𝗺𝚙𝕤 𝔬𝘃𝘦𝓇 𝔱𝘩𝘦 𝖑𝖆𝕫𝔂 𝗱𝓸𝔤` `𝟘𝟙２𝟹４𝟻𝟼𝟽𝟠𝟡`. Consists of:\n    * `%ballot`\n    * `%ballot-bold-script`\n    * `%ballot-script`\n    * `%bold`\n    * `%bold-fraktur`\n    * `%bold-italic`\n    * `%bold-script`\n    * `%doublestruck`\n    * `%doublestruck-italic`\n    * `%fraktur`\n    * `%fullwidth`\n    * `%heavy-ballot`\n    * `%italic`\n    * `%monospace`\n    * `%sansserif`\n    * `%sansserif-bold`\n    * `%sansserif-bold-italic`\n    * `%sansserif-italic`\n    * `%script`\n* `%accented` - ASCII letters that have accents applied; base characters are not changed: `ȚȞȆ ꝖṲÏÇꝂ ḂŔǾⱲṆ ḞṌẌ` `ĵữṁꝕṩ ǭⱱëȑ ʈẖḕ ļǟʐȳ ɗȫǵ`. Try to read it aloud... correctly :)\n* `%control` - ASCII printable representations of non-printable characters: `P␆ ␎ME ␖THE␏SE␞`. 
They have perfect similarity, but the letters are very crammed and such acronyms are unlikely to appear in regular language.\n* `%flipped` - ASCII letters, digits and symbols that are faked by completely different characters in various rotations and mirrorings: `ꓕH⧢ Ꝺ⋂I𐐣ꓘ ꓭꓤOW𐐥 ꓞOX` `jᴝᴟpƨ ᴑ⋏ǝɹ ʇɥɘ ꞁɐzʎ dᴑᵷ` `0ᛚ2Ƹ4567∞9`\n\n```raku\nuse HomoGlypher;\nuse HomoGlypher::Mappings;\n\nmy $hg = HomoGlypher.new;\n\n$hg.add-mapping( $_ ) for @HomoGlypher::Mappings::basic;    # load all basic mappings\n$hg.add-mapping( %HomoGlypher::Mappings::accented );        # load a single, specific mapping\n```\n\nI won't tell you where to get a perfect, complete, ultimate mapping, because homoglyphs are font-dependent and similarity is subjective. Good starting points for creating your own mappings are the [*_alphabet](https://en.wikipedia.org/wiki/List_of_writing_systems) and [*_numeral](https://en.wikipedia.org/wiki/List_of_numeral_systems) pages on Wikipedia. You can also borrow mappings from other projects, such as [Codebox homoglyphs](https://github.com/codebox/homoglyph), [IronGeek Homoglyph Attack Generator](https://www.irongeek.com/homoglyph-attack-generator.php), and many others.\n\n## unwind\n\nGenerates every possible mapping combination for your ASCII text.\nBeware, ***this works only for short inputs*** and ***the list grows really, really fast***.\n\n```raku\nuse HomoGlypher;\n\nmy %cyrillic = (\n    '6' =\u003e [ 'б' ],\n    'a' =\u003e [ 'а' ],\n    'b' =\u003e [ 'б', 'ь' ],\n    'e' =\u003e [ 'е', 'ё' ],\n    'm' =\u003e [ 'м' ],\n    'p' =\u003e [ 'р' ],\n    'r' =\u003e [ 'г' ],\n    'x' =\u003e [ 'х' ]\n);\n\nmy $hg = HomoGlypher.new;\n$hg.add-mapping( %cyrillic );\n\n.say for $hg.unwind( 'example' );\n```\n\n```\nexamplё\nexamрle\nexamрlе\nexamрlё\nexaмple\nexaмplе\nexaмplё\nexaмрle\n...\n```\n(143 combinations in total: 3 × 2 × 2 × 2 × 2 × 1 × 3 = 144 variants of `example`, minus the unmodified input)\n\nOutput list:\n\n* Is lazy - so you can iterate over it without worrying about memory consumption.\n* Preserves mapping order - so if you sort your mappings from most to least similar, 
your result will have the same characteristics.\n\nThe main purpose of homoglyph unwinding is to check whether someone is spoofing your domain.\nSee the ready-to-use [IDN Checker](https://github.com/bbkr/HomoGlypher/blob/master/example/IDN-checker.raku) script.\n\n## collapse\n\nThe opposite of [unwind](#unwind).\nIf you have suspicious homoglyphed text, you can check which ASCII texts it might be derived from.\nBeware, ***this works only for short inputs***.\n\n```raku\nuse HomoGlypher;\n\nmy %ascii-art = (\n    'O' =\u003e [ '()' ],\n    'V' =\u003e [ '\\/' ],\n    'W' =\u003e [ '\\/\\/' ]\n);\n\nmy $hg = HomoGlypher.new;\n$hg.add-mapping( %ascii-art );\n\n.say for $hg.collapse( '\\/()\\/\\/EL' );\n```\n\n```\nVOVVEL\nVOWEL\n```\n(as you can see, it may sometimes return more than one possible ASCII text)\n\nThe main purpose of homoglyph collapsing is to check whether someone is using your forums, hosting, or other services for phishing or false advertising.\nSee also the [tokenize](#tokenize) method.\n\nThe [Unicode::Security](https://github.com/JJ/perl6-unicode-security) module does a similar thing.\n\n## tokenize\n\nConstructs a token that can be used to match homoglyphed text in grammars.\n\n```raku\nuse HomoGlypher;\n\nmy %greek = (\n    'a' =\u003e [ 'α' ],\n    'r' =\u003e [ 'Γ' ],\n);\n\nmy $hg = HomoGlypher.new;\n$hg.add-mapping( %greek );\n\nmy \u0026homoglyphy = $hg.tokenize( );\n\n'foobαΓbaz' ~~ / $\u003cresult\u003e=\u003c\u0026homoglyphy: 'bar'\u003e /;\nsay $/{ 'result' };\n\n```\n\n```\n｢bαΓ｣\n```\n\nBeware, ***the token uses the mappings present at match time***.\nYou can create the token before any mappings are added, define a grammar that uses it, and then add mappings before text is actually matched against the grammar.\nIf you need tokens with different sets of mappings in one grammar, you can create and tokenize multiple `HomoGlypher` instances.\n\nThe [Regex::FuzzyToken](https://github.com/alabamenhu/RegexFuzzyToken) module can be used to catch misspelled phrases. 
Homoglypher and FuzzyToken can coexist in a single grammar:\n\n```raku\nsay 'Suspicious!' if $email-text ~~ / [ \u003cfuzzy: 'paypal'\u003e | \u003c\u0026homoglyphy: 'paypal'\u003e ] /;\n```\n\nThis will catch both `papyal` (misspelled) and `pαypαl` (homoglyphed). And yes, you can throw a nuke at phishers and catch misspellings and homoglyphs at the same time:\n\n```raku\nsay 'Suspicious!' if $email-text ~~ / \u003cfuzzy: $hg.unwind('paypal')\u003e /;\n```\n\nThis will catch such sneaky phrases as `pαpyαl`.\n\n## randomize\n\nReplaces characters in text with homoglyphs with the given probability.\n\n```raku\nuse HomoGlypher;\nuse HomoGlypher::Mappings;\n\nmy $hg = HomoGlypher.new;\n$hg.add-mapping( %HomoGlypher::Mappings::flipped );\n\nsay $hg.randomize( 'DIRECTIONS \u0026 CAKE ARE A LIE', level =\u003e 100 );\n```\n\n```\n⫏Iя∃C⟘IOИƧ ⅋ C∀K⧢ ∀Я∃ ∀ LI∃\n```\n\nThe level is given as a percentage from 1 to 100 (default 50). It decides whether a ***possible*** mapping should be used at a given position. Do not confuse it with the share of replaced characters. For example, suppose you have the mapping `'a' =\u003e [ 'α' ]` and the level set to 50%. Transforming `barrrr` will result in the unmodified `barrrr` with 50% probability (at the second position the transformation was possible but not used) and in the modified `bαrrrr` with 50% probability (at the second position the transformation was possible and used). Each position is rolled individually against the level. Each possible replacement glyph has an equal chance of being picked.\n\nThe [Text::Homoglyph](https://github.com/MattOates/Text--Homoglyph) module does a similar thing.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbbkr%2Fhomoglypher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbbkr%2Fhomoglypher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbbkr%2Fhomoglypher/lists"}