https://github.com/sualeh/what-a-character

Code snippets for testing and understanding Java support for Unicode

> **:star: Star it :arrow_heading_up: if you love it!**

# What a Character

Explains Unicode and character encoding to software engineers, and the pitfalls of working with international characters in Java, JavaScript and Python.

## Slides and Code

- **Overview**
  Covers the concepts behind identifying international characters, and introduces Unicode.
  - [Concepts](https://sualeh.github.io/What-a-Character/part1/what-a-character-concepts.pdf)
  - [`char` Data Type](https://sualeh.github.io/What-a-Character/part1/what-a-character-char.pdf)
- **Unicode Support in Programming Languages**
  Covers how to handle international text in several programming languages: matching with regular expressions, converting case, and extracting numeric data from text.
  - _Java_
    - [Unicode Support in Java](https://sualeh.github.io/What-a-Character/part1/what-a-character-unicode-support-in-java.pdf)
  - _Code Examples_
    - [Code Examples](https://github.com/sualeh/What-a-Character/tree/main/Notebooks)
- **Encoding**
  Covers how to encode international text when reading and writing files, and what programmers must watch for to avoid garbled data.
  - [Encoding Concepts](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding.pdf)
  - [Encoding Details](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-details.pdf)
  - _Java_
    - [Character Encoding with Java](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-java.pdf)
  - _JavaScript_
    - [Character Encoding with JavaScript](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-javascript.pdf)
  - _Python_
    - [Character Encoding with Python](https://sualeh.github.io/What-a-Character/part2/what-a-character-encoding-python.pdf)
  - _Code Examples_
    - [Code Examples](https://github.com/sualeh/What-a-Character/tree/main/Notebooks)
- **Regular Expressions**
  Covers how to craft regular expressions that match Unicode characters in different languages.
  - [Regular Expression Concepts](https://sualeh.github.io/What-a-Character/part3/what-a-character-regular-expressions.pdf)
  - _Code Examples_
    - [Code Examples](https://github.com/sualeh/What-a-Character/tree/main/Notebooks)
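
As a rough illustration of the garbled-data pitfall mentioned under **Encoding**, the minimal Java sketch below encodes a string as UTF-8 and then decodes the bytes with the wrong charset. The class and method names (`EncodingMismatch`, `misdecode`, `roundTrip`) are invented for this example and are not taken from the slides or notebooks.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of the What a Character code.
class EncodingMismatch {

    // Encode as UTF-8, then (incorrectly) decode the bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Encode and decode with the same charset, so the text survives intact.
    static String roundTrip(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the final letter (U+00E9)
        String original = "caf\u00e9";

        // The accented letter needs two bytes in UTF-8, so 4 characters become 5 bytes.
        System.out.println(original.getBytes(StandardCharsets.UTF_8).length); // 5

        System.out.println(roundTrip(original).equals(original)); // true
        System.out.println(misdecode(original).equals(original)); // false

        // Each UTF-8 byte was read as a separate ISO-8859-1 character.
        System.out.println(misdecode(original).length()); // 5
    }
}
```

Passing an explicit `Charset` on both the encoding and the decoding side, rather than relying on the platform default, is one common way to avoid this kind of mismatch.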

For additional code, please see [utf8db2](https://github.com/sualeh/utf8db2).
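
To give a flavor of the regular-expression, case-conversion, and numeric-extraction topics listed under Slides and Code, here is a small Java sketch using only standard library calls. It is an illustration under assumptions, not the author's code: the class `UnicodeTextSketch` and its methods are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not part of the What a Character code.
class UnicodeTextSketch {

    // \p{L} matches a letter in any script; \p{Nd} matches a decimal digit in any script.
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");
    private static final Pattern DIGITS = Pattern.compile("\\p{Nd}+");

    // Collect every run of letters, regardless of script.
    static List<String> words(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = LETTERS.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    // Convert the first run of decimal digits, from any script, to an int.
    static int firstNumber(String text) {
        Matcher matcher = DIGITS.matcher(text);
        if (!matcher.find()) {
            throw new IllegalArgumentException("no digits in: " + text);
        }
        int value = 0;
        // Iterate by code point, since some digits lie outside the Basic Multilingual Plane.
        for (int codePoint : matcher.group().codePoints().toArray()) {
            value = value * 10 + Character.digit(codePoint, 10);
        }
        return value;
    }

    // Locale-neutral upper-casing; the result can be longer than the input.
    static String upper(String text) {
        return text.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // An accented Latin word, three Greek letters, and ASCII digits
        String text = "na\u00efve \u03b1\u03b2\u03b3 42";
        System.out.println(words(text).size()); // 2

        // By default, \w in Java only covers ASCII, so the match stops at the accented letter.
        Matcher ascii = Pattern.compile("\\w+").matcher("na\u00efve");
        ascii.find();
        System.out.println(ascii.group()); // na

        // Arabic-Indic digits one, two, three
        System.out.println(firstNumber("\u0661\u0662\u0663")); // 123

        // German sharp s upper-cases to two letters
        System.out.println(upper("stra\u00dfe")); // STRASSE

        // Case rules depend on locale: Turkish lower-cases I to a dotless i.
        String turkish = "TITLE".toLowerCase(Locale.forLanguageTag("tr"));
        System.out.println(turkish.equals("t\u0131tle")); // true
    }
}
```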

## Video

Video presentation of the [What a Character](https://vimeo.com/743222944) content.

## References

### Unicode

> "So I have an announcement to make: if you are a programmer working in 2003 and you don't know the basics of characters, character sets, encodings, and Unicode, and I catch you, I'm going to punish you by making you peel onions for 6 months in a submarine. I swear I will."
>
> **Joel Spolsky**
> [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html)

- [An Introduction to Writing Systems & Unicode](https://r12a.github.io/scripts/tutorial/)
- [Encoding converter](https://r12a.github.io/app-encodings/)

### Java Support

- [Supplementary Characters in the Java Platform](http://www.oracle.com/us/technologies/java/supplementary-142654.html)
- [Supported Encodings](https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html)
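
The supplementary-characters reference above concerns characters beyond U+FFFF, which Java stores as two `char` values (a surrogate pair). The sketch below shows the difference between counting `char` values and counting code points. The class name `SupplementaryCharacters` and its helper methods are hypothetical and exist only for this example.

```java
// Hypothetical example class, not part of the What a Character code.
class SupplementaryCharacters {

    // Number of Unicode code points, as opposed to UTF-16 code units.
    static int codePointCount(String text) {
        return text.codePointCount(0, text.length());
    }

    // StringBuilder.reverse keeps surrogate pairs together while reversing.
    static String reverse(String text) {
        return new StringBuilder(text).reverse().toString();
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane, so it needs two char values.
        String emoji = new String(Character.toChars(0x1F600));
        String text = "a" + emoji + "b";

        System.out.println(text.length());        // 4 (UTF-16 code units)
        System.out.println(codePointCount(text)); // 3 (code points)

        // charAt(1) returns only the first half of the pair.
        System.out.println(Character.isHighSurrogate(text.charAt(1))); // true

        // codePointAt reads the whole pair as one code point.
        System.out.println(Integer.toHexString(text.codePointAt(1))); // 1f600

        System.out.println(reverse(text).equals("b" + emoji + "a")); // true
    }
}
```

Iterating with `String.codePoints()` rather than `charAt` is one way to avoid splitting a surrogate pair when processing text one character at a time.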

### XKCD

| RTL | Encoding |
| --- | --- |
| ![RTL](http://imgs.xkcd.com/comics/rtl.png "RTL") | ![Encoding](http://imgs.xkcd.com/comics/encoding.png "Encoding") |

![Unicode](http://imgs.xkcd.com/comics/unicode.png "Unicode")