Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/sualeh/what-a-character
Code snippets for testing and understanding Java support for Unicode
- Host: GitHub
- URL: https://github.com/sualeh/what-a-character
- Owner: sualeh
- License: other
- Created: 2016-04-01T16:54:54.000Z (almost 9 years ago)
- Default Branch: main
- Last Pushed: 2024-05-15T00:46:41.000Z (8 months ago)
- Last Synced: 2024-05-15T18:29:48.325Z (8 months ago)
- Topics: character, character-set, encoding, java-support, unicode, utf-8
- Language: Jupyter Notebook
- Homepage:
- Size: 2.25 MB
- Stars: 4
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: .github/CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: .github/CODE_OF_CONDUCT.md
> **:star: Star it :arrow_heading_up: if you love it!**
# What a Character
Explains Unicode and character encoding to software engineers, and the pitfalls of working with international characters in Java, JavaScript and Python.
## Slides and Code
- **Overview**
Goes over concepts of international character identification and Unicode.
- [Concepts](https://sualeh.github.io/What-a-Character/part1/what-a-character-concepts.pdf)
- [`char` Data Type](https://sualeh.github.io/What-a-Character/part1/what-a-character-char.pdf)
- **Unicode Support in Programming Languages**
Goes over how to handle international text in many programming languages, using powerful regular expressions, converting case, and extracting numeric data from text.
- _Java_
- [Unicode Support in Java](https://sualeh.github.io/What-a-Character/part1/what-a-character-unicode-support-in-java.pdf)
- _Code Examples_
- [Code Examples](https://github.com/sualeh/What-a-Character/tree/main/Notebooks)
- **Encoding**
Goes over how to encode international text when reading and writing files, and what programmers need to be careful about in order not to get garbled data.
- [Encoding Concepts](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding.pdf)
- [Encoding Details](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-details.pdf)
- _Java_
- [Character Encoding with Java](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-java.pdf)
- _JavaScript_
- [Character Encoding with JavaScript](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-javascript.pdf)
- _Python_
- [Character Encoding with Python](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-python.pdf)
- _Code Examples_
- [Code Examples](https://github.com/sualeh/What-a-Character/tree/main/Notebooks)
- **Regular Expressions**
Goes over how to craft regular expressions to match Unicode characters in different languages.
- [Regular Expression Concepts](https://sualeh.github.io/What-a-Character/part3/what-a-character-regular-expressions.pdf)
- _Code Examples_
- [Code Examples](https://github.com/sualeh/What-a-Character/tree/main/Notebooks)

For additional code, please see [utf8db2](https://github.com/sualeh/utf8db2).
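
As a small taste of the pitfalls the slides above cover, here is a minimal Java sketch. It is not taken from the project's notebooks: the class name `WhatACharacterSketch` and its helper methods are hypothetical, made up for this illustration. It shows three things, using only standard library calls: that `String.length()` counts UTF-16 code units rather than characters, that decoding UTF-8 bytes with the wrong charset garbles text, and that `\p{L}` in a regular expression matches letters in any script.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the What a Character repository.
final class WhatACharacterSketch {

    // Counts Unicode code points instead of UTF-16 code units.
    public static int codePointCount(String text) {
        return text.codePointCount(0, text.length());
    }

    // Encodes text as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1.
    // This mimics reading a UTF-8 file with the wrong encoding.
    public static String decodeUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Finds runs of letters in any script using the Unicode category L.
    public static List<String> words(String text) {
        Matcher matcher = Pattern.compile("\\p{L}+").matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane,
        // so Java stores it as a surrogate pair of two char values.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());          // code units
        System.out.println(codePointCount(emoji));   // code points

        // The single accented letter becomes two unrelated characters.
        System.out.println(decodeUtf8AsLatin1("caf\u00E9"));

        // Latin and Greek words are found; the digits are skipped.
        System.out.println(words("na\u00EFve \u03B1\u03B2\u03B3 123"));
    }
}
```

The non-ASCII strings are written as escapes so that the sketch does not depend on the encoding of the source file itself. What the console prints for them still depends on the terminal's encoding, which is one of the pitfalls the Encoding slides discuss.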
## Video
Video presentation of the [What a Character](https://vimeo.com/743222944) content.
## References
### Unicode
> "So I have an announcement to make: if you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will."
>
> **Joel Spolsky**
> [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html)

- [An Introduction to Writing Systems & Unicode](https://r12a.github.io/scripts/tutorial/)
- [Encoding converter](https://r12a.github.io/app-encodings/)

### Java Support
- [Supplementary Characters in the Java Platform](http://www.oracle.com/us/technologies/java/supplementary-142654.html)
- [Supported Encodings](https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html)

### XKCD
![RTL](http://imgs.xkcd.com/comics/rtl.png "RTL")
![Encoding](http://imgs.xkcd.com/comics/encoding.png "Encoding")
![Unicode](http://imgs.xkcd.com/comics/unicode.png "Unicode")