{"id":24901756,"url":"https://github.com/antonsjava/charmap","last_synced_at":"2025-03-27T18:19:53.492Z","repository":{"id":57732439,"uuid":"52556808","full_name":"antonsjava/charmap","owner":"antonsjava","description":"Simple Java library for char to char mapping in Strings","archived":false,"fork":false,"pushed_at":"2023-12-17T17:28:56.000Z","size":33,"stargazers_count":0,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-01T21:17:28.752Z","etag":null,"topics":["java","string-transformations"],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/antonsjava.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-02-25T21:06:47.000Z","updated_at":"2021-12-15T20:37:11.000Z","dependencies_parsed_at":"2022-08-28T07:51:59.467Z","dependency_job_id":null,"html_url":"https://github.com/antonsjava/charmap","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antonsjava%2Fcharmap","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antonsjava%2Fcharmap/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antonsjava%2Fcharmap/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/antonsjava%2Fcharmap/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/antonsjava","download_url":"https://codeload.github.com/antonsjava/charmap/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github
.com","kind":"github","repositories_count":245898314,"owners_count":20690466,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["java","string-transformations"],"created_at":"2025-02-01T21:17:35.664Z","updated_at":"2025-03-27T18:19:53.472Z","avatar_url":"https://github.com/antonsjava.png","language":"Java","readme":"# CharMap\n\n\nCharMap is a simple Java library with an API for transforming strings char by char.\nIt simplifies code where you need to replace some chars in a string with other ones.\n\n## Motivation\n\nThe real motivation for this API was the CE2Ascii class from this library. It enables you \nto replace special Slovak characters (like 'ô') with pure ASCII characters (like\n'o' in this case). \n\nI had trouble with such characters in cases where text is used in more \nthan one encoding. For example, you need to create a filename from user input, and the file is \nused on Solaris/Windows and by several programs. Believe me, you don't want to have \nthe letter 'ô' in such a name. On the other hand, you want the name to stay readable. \n\nThen I found it useful to provide the implementation of the CE2Ascii class as a set of \ngeneral mapper classes to simplify this kind of task. \n\n\n## CharMapper\n\nThe CharMapper class is the base of all implementing classes. It implements only one method\nfor string transformation. \n```java\n  String value = ...\n  CharMapper anMapper = ...\n  String newValue = anMapper.map(value);\n```\nThe method creates a new string from the input one. 
While copying, each char is checked to see\nwhether it must be removed (isToBeRemoved() method); if not, it is mapped to another one (map()).\n\nSubclasses implement the isToBeRemoved(char) and map(char) methods to change the behaviour of \nstring mapping. \n\n## SequenceCharMapper\n\nSequenceCharMapper implements char by char mapping using two strings of the same length. \nChars are mapped by position: the char at a given index in the first string maps to the char \nat the same index in the second. The set of chars to be removed is also defined \nby a string. \n```java\n  CharMapper anMapper = SequenceCharMapper.instance(\".\\\\\", \"-/\", \":;\\n\\r\");\n```\nThis mapper converts each '.' to '-' and each '\\\\' to '/'; the chars ':', ';', '\\n' and '\\r' are \nstripped out.\n\n## BTCharMapper\n\nSequenceCharMapper must iterate over the whole char sequence to find out whether a char must be mapped or not. \nBTCharMapper is a small modification of SequenceCharMapper. It requires the fromChars mapping \nsequence to be ordered in binary tree form, which makes it possible to search the sequence faster. \nSo if you have a long mapping sequence, it is better to use BTCharMapper. \n\nThe removeChars sequence is still iterated sequentially (normally there are only a few chars to be explicitly \nremoved).\n\nIt is recommended to use BTCharMapper for long charmap sequences and to convert \nsequences to BT form at compile time. \n\nYou can use the BTCharMapper.convertLinearToBT() method to order a mapping sequence in binary tree form.\n\nYou can write simple code that transforms chars stored in a file (first two lines as fromChars \nand toChars) into a new file with the chars in binary tree order. 
(For simplicity, it also uses utilities \nfrom the jaul project.)\n```java\nimport java.util.List;\nimport sk.antons.charmap.BTCharMapper;\nimport sk.antons.jaul.Is;\nimport sk.antons.jaul.Split;\nimport sk.antons.jaul.binary.Unicode;\nimport sk.antons.jaul.util.TextFile;\n\npublic class AlphabetFile {\n  \n  private static void simpleFileEscape(String filename) {\n    // first line of the file holds fromChars, second line holds toChars\n    List\u003cString\u003e lines = Split.file(filename, \"utf-8\").byLinesToList();\n    String fromLine = lines.get(0);\n    String toLine = lines.get(1);\n    // reorder both sequences into binary tree form\n    String[] newLines = BTCharMapper.convertLinearToBT(fromLine, toLine, (char)0);\n    // emit the reordered sequences as escaped Java string declarations\n    StringBuilder sb = new StringBuilder();\n    sb.append(\"    String fromChars = \\\"\").append(Unicode.escapeJava(newLines[0])).append(\"\\\";\");\n    sb.append('\\n');\n    sb.append(\"    String toChars = \\\"\").append(Unicode.escapeJava(newLines[1])).append(\"\\\";\");\n    TextFile.save(filename + \".escaped\", \"utf-8\", sb.toString());\n  }\n\n  public static void main(String[] params) {\n    simpleFileEscape(\"c:/tmp/_bordel/slovak.alphabet\");\n  }\n}\n```\nIf you want to create a BTCharMapper from plain sequences at runtime, you can use the instanceFromNoBT() \nfactory methods to create a BTCharMapper instance. \n\n\n## MultipleCharMapper\n\nIf you already have some CharMappers and want to apply them in sequence, you can combine them \nusing MultipleCharMapper. The class lets you combine the implemented mapping and removing \nfunctionality while the string is still converted only once. 
\n```java\n    CharMapper filenameMapper = MultipleCharMapper.instance(\n      CE2Ascii.charMapper()\n      , SequenceCharMapper.instance(\"\\\\/ \", \"___\", \";:\u0026\")\n      , new CharMapper() {\n          protected boolean isToBeRemoved(char c) { return (c \u003c 32) || (c \u003e 126); }\n          protected char map(char c) { return c; }\n      }\n    );\n```\nThis example combines \n - The CE2Ascii mapper.\n - A mapper that maps slash, backslash and space to an underscore and removes some special chars.\n - A mapper that keeps only printable ASCII chars.\n\n## CE2Ascii, EE2Ascii ...\n\nCE2Ascii was the main reason for this API. I needed to transform special \ncharacters from the Slovak alphabet into pure ASCII chars, so that text stays readable \nand third party libraries have no problems with such chars. \n\nThere are many mappings for that alphabet, and I also added some other characters \nto ensure clear text. So I decided to use BTCharMapper as the internal implementation \nof the mapping. \n\nSince I found that after 20 years I have completely forgotten azbuka (the Cyrillic alphabet), EE2Ascii is just an attempt. 
\n\nCE2Ascii mapping\n```\n# slovak\nfrom:ÁáÄäČčĎďÉéÍíĹĺĽľŇňÓóÔôŔŕŠšŤťÚúÝýŽž\n  to:AaAaCcDdEeIiLlLlNnOoOoRrSsTtUuYyZz\n\n# czech\nfrom:ÁáČčĎďÉéĚěÍíŇňŘřŠšŤťÚúŮůÝýŽž\n  to:AaCcDdEeEeIiNnRrSsTtUuUuYyZz\n\n# polish\nfrom:ĄąĆćĘęŁłŃńÓóŚśŹźŻż\n  to:AaCcEeLlNnOoSsZzZz\n\n# hungarian\nfrom:ÁáÉéÍíÓóÖöŐőÚúÜüŰű\n  to:AaEeIiOoOoOoUuUuUu\n\n# german\nfrom:ÄäÖöÜüß\n  to:AaOoUuS\n\n# swedish\nfrom:ÅåÄäÖö\n  to:AaAaOo\n\n# norwegian\nfrom:ÆæØøÅå\n  to:EeOoAa\n\n# romanian\nfrom:ĂăÂâÎîȘșȚț\n  to:AaAaIiSsTt\n\n# serbian\nfrom:ČčĆćĚěŁłŃńÓóŘřŔŕŠšŚśŽžŹź\n  to:CcCcEeLlNnOoRrRrSsSsZzZz\n\n# turkish\nfrom:ÇçĞğİıÖöŞşÜü\n  to:CcGgIiOoSsUu\n\n# ukrainian\nfrom:ĆćĎďŁłŃńŔŕŚśŤťŹźŻż\n  to:CcDdLlNnRrSsTtZzZz\n```\n\nEE2Ascii mapping\n```\n# ukraine\nfrom:АаЯяБбЦцЦцЧчХхДдДдЕеЄєЄєФфҐґГгІіЇїЙйКкЛлЛлМмНнНнОойоПпРрРрСсСсШшЩщТтТтУуЮюВвВвИиийЗзЗзЖж\n  to:AaJjBbCcCcCcHhDdDdEeEeJjFfGgGgIiJjJjKkLlLlMmNnNnOoioPpRrRrSsSsSsSsTtTtUuJjVvWwYyYyZzZzZz\n\n# russia\nfrom:АаБбВвГгДдЕеЁёЖжЗзИиЙйКкЛлМмНнОоПпРрСсТтУуФфХхЦцЧчШшЩщЭэЮюЯяІіѲѳѴѵЫы\n  to:AaBbVvGgDdEeEeZzZzIiJjKkLlMmNnOoPpRrSsTtUuFfHhCcCcSsSsEeJjJjIiFfIiYy\n\n# belarus\nfrom:АаБбЦцЦцЧчДдДдзжЭэФфҐґГгХхІіЙйКкЛлМмНнОоПпРрСсШшТтУуЎўВвЫыЗзЖж\n  to:AaBbCcCcCcDdDdzzEeFfGgHhHhIiJjKkLlMmNnOoPpRrSsSsTtUuUuVvYyZzZz\n\n```\n\nAny2Ascii mapping\n\nI collected many characters (from several sources over many years) and tried to map them to ASCII.\nThis mapping includes the CE2Ascii mapping but also maps some irregular characters which are not in regular alphabets. 
\nIt maps around 700 characters, so mapping is a little slower than with CE2Ascii.\n\n## Maven usage\n\n```\n   \u003cdependency\u003e\n      \u003cgroupId\u003eio.github.antonsjava\u003c/groupId\u003e\n      \u003cartifactId\u003echarmap\u003c/artifactId\u003e\n      \u003cversion\u003eLASTVERSION\u003c/version\u003e\n   \u003c/dependency\u003e\n```\n\n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantonsjava%2Fcharmap","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantonsjava%2Fcharmap","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantonsjava%2Fcharmap/lists"}