{"id":19039215,"url":"https://github.com/sualeh/what-a-character","last_synced_at":"2025-04-23T20:10:54.567Z","repository":{"id":96252039,"uuid":"55248142","full_name":"sualeh/What-a-Character","owner":"sualeh","description":"Code snippets for testing and understanding Java support for Unicode","archived":false,"fork":false,"pushed_at":"2024-05-15T00:46:41.000Z","size":2364,"stargazers_count":4,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-18T04:54:45.490Z","etag":null,"topics":["character","character-set","encoding","java-support","unicode","utf-8"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sualeh.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":".github/CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":"sualeh","custom":"https://www.paypal.me/sualeh"}},"created_at":"2016-04-01T16:54:54.000Z","updated_at":"2024-05-15T00:43:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"d3bf46e0-e1d8-4cc1-b47f-b03b97f4a7de","html_url":"https://github.com/sualeh/What-a-Character","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sualeh%2FWhat-a-Character","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sualeh%2FWhat-a-Character/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sualeh%2FWhat-a-Character/releases","ma
nifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sualeh%2FWhat-a-Character/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sualeh","download_url":"https://codeload.github.com/sualeh/What-a-Character/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250506140,"owners_count":21441723,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["character","character-set","encoding","java-support","unicode","utf-8"],"created_at":"2024-11-08T22:12:12.614Z","updated_at":"2025-04-23T20:10:54.434Z","avatar_url":"https://github.com/sualeh.png","language":"Jupyter Notebook","funding_links":["https://github.com/sponsors/sualeh","https://www.paypal.me/sualeh"],"categories":[],"sub_categories":[],"readme":"\u003e **:star: Star it :arrow_heading_up: if you love it!**\n\n# What a Character\n\nExplains Unicode and character encoding to software engineers, and the pitfalls of working with international characters in Java, JavaScript and Python.\n\n\n## Slides and Code\n\n- **Overview**\n  Goes over concepts of international character identification and Unicode.\n  - [Concepts](https://sualeh.github.io/What-a-Character/part1/what-a-character-concepts.pdf)\n  - [`char` Data Type](https://sualeh.github.io/What-a-Character/part1/what-a-character-char.pdf)\n- **Unicode Support in Programming Languages**\n  Goes over how to handle international text in many programming languages, using powerful regular expressions, converting case, and extracting numeric data from text.\n  - _Java_\n    - 
[Unicode Support in Java](https://sualeh.github.io/What-a-Character/part1/what-a-character-unicode-support-in-java.pdf)\n  - _Code Examples_\n    - [Code Examples](https://github.com/sualeh/What-a-Character/tree/main/Notebooks)\n- **Encoding**\n  Goes over how to encode international text when reading and writing files, and what programmers need to be careful about in order not to get garbled data.\n  - [Encoding Concepts](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding.pdf)\n  - [Encoding Details](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-details.pdf)\n  - _Java_\n    - [Character Encoding with Java](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-java.pdf)\n  - _JavaScript_\n    - [Character Encoding with JavaScript](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-javascript.pdf)\n  - _Python_\n    - [Character Encoding with Python](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-python.pdf)\n  - _Code Examples_\n    - [Code Examples](https://github.com/sualeh/What-a-Character/tree/main/Notebooks)\n- **Regular Expressions**\n  Goes over how to craft regular expressions to match Unicode characters in different languages.\n  - [Regular Expression Concepts](https://sualeh.github.io/What-a-Character/part3/what-a-character-regular-expressions.pdf)\n  - _Code Examples_\n    - [Code Examples](https://github.com/sualeh/What-a-Character/tree/main/Notebooks)\n\nFor additional code, please see [utf8db2](https://github.com/sualeh/utf8db2).\n\n## Video\n\nVideo presentation of the [What a Character](https://vimeo.com/743222944) content.\n\n\n## References\n\n### Unicode\n\n\u003e \"So I have an announcement to make: if you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. 
I swear I will.\"\n\u003e\n\u003e **Joel Spolsky**\n\u003e [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html)\n\n- [An Introduction to Writing Systems \u0026 Unicode](https://r12a.github.io/scripts/tutorial/)\n- [Encoding converter](https://r12a.github.io/app-encodings/)\n\n### Java Support\n- [Supplementary Characters in the Java Platform](http://www.oracle.com/us/technologies/java/supplementary-142654.html)\n- [Supported Encodings](https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html)\n\n### XKCD\n\n| ![RTL](http://imgs.xkcd.com/comics/rtl.png \"RTL\") | ![Encoding](http://imgs.xkcd.com/comics/encoding.png \"Encoding\") |\n\n![Unicode](http://imgs.xkcd.com/comics/unicode.png \"Unicode\")\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsualeh%2Fwhat-a-character","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsualeh%2Fwhat-a-character","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsualeh%2Fwhat-a-character/lists"}