https://github.com/begriffs/wchar-conformance
Test ISO 10646 conformance of wchar_t
- Host: GitHub
- URL: https://github.com/begriffs/wchar-conformance
- Owner: begriffs
- Created: 2020-04-19T00:38:24.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-04-19T16:12:27.000Z (almost 6 years ago)
- Last Synced: 2025-05-08T03:12:58.169Z (10 months ago)
- Language: C
- Size: 2.93 KB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
## wchar\_t ISO 10646 conformance test
The C99 spec, in section 6.10.8 "Predefined macro names", says that the macro
`__STDC_ISO_10646__` is defined by C implementations that support storing
Unicode code points (more accurately, UTF-32 code units) in `wchar_t`.
The standard describes the symbol like this:
> An integer constant of the form yyyymmL (for example, 199712L). If this
> symbol is defined, then every character in the Unicode required set, when
> stored in an object of type wchar\_t, has the same value as the short
> identifier of that character. The Unicode required set consists of all the
> characters that are defined by ISO/IEC 10646, along with all amendments and
> technical corrigenda, as of the specified year and month.
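As a rough illustration of what the macro offers at compile time, the sketch below reads its value when present. The helper names `iso10646_version` and `report_iso10646` are hypothetical and not taken from this repo's source.

```c
#include <stdio.h>

/* Hypothetical helper: returns the value of __STDC_ISO_10646__
   (a yyyymmL constant), or 0 if the implementation does not define it. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}

/* Hypothetical helper: prints a one-line summary of the macro's state. */
void report_iso10646(void)
{
    long v = iso10646_version();

    if (v != 0)
        printf("__STDC_ISO_10646__ = %ldL (year %ld, month %ld)\n",
               v, v / 100, v % 100);
    else
        puts("__STDC_ISO_10646__ is not defined");
}
```

Calling `report_iso10646()` from `main` shows what a given compiler environment claims. It does not by itself verify that the claim holds.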
This repo contains a program "w" to test the implementation. It works like this:
* Generate every codepoint
* (Skip the surrogate code points, U+D800 through U+DFFF, which are reserved for UTF-16 surrogate pairs)
* Convert the codepoint to a UTF-8 string
* Convert the UTF-8 to wchar\_t\* with mbstowcs()
* Ensure the wide character string has length one
* Ensure the first element numerically matches the original codepoint
The program reports any conversion errors, as well as whether the compiler
environment defines `__STDC_ISO_10646__`.
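The steps above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual program: it hand-rolls the UTF-8 encoding so that it depends only on the standard library (the build instructions mention ICU4C, so the real program may do this step differently), and the names `utf8_encode`, `roundtrip_ok`, and `count_mismatches` are hypothetical. It also assumes the caller passes the name of a UTF-8 locale that exists on the system, since `mbstowcs()` interprets its input according to the current locale.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Encode code point cp (<= 0x10FFFF, not a surrogate) as UTF-8 into out,
   NUL-terminate it, and return the number of bytes before the NUL. */
size_t utf8_encode(unsigned long cp, char out[5])
{
    size_t n;

    if (cp < 0x80) {
        out[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    }
    out[n] = '\0';
    return n;
}

/* Return 1 if cp converts to exactly one wchar_t whose value equals cp,
   0 otherwise (including when mbstowcs() reports an error). */
int roundtrip_ok(unsigned long cp)
{
    char utf8[5];
    wchar_t wide[4];
    size_t len;

    utf8_encode(cp, utf8);
    len = mbstowcs(wide, utf8, 4);
    return len == 1 && (unsigned long)wide[0] == cp;
}

/* Check every non-surrogate code point from U+0001 to U+10FFFF under the
   given locale. Returns the number of failures, or -1 if the locale could
   not be set. U+0000 is skipped because it encodes to an empty C string. */
long count_mismatches(const char *locale_name)
{
    unsigned long cp;
    long failures = 0;

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    for (cp = 1; cp <= 0x10FFFF; cp++) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue; /* surrogates are not encodable as UTF-8 */
        if (!roundtrip_ok(cp))
            failures++;
    }
    return failures;
}
```

A caller might invoke `count_mismatches("en_US.UTF-8")` or `count_mismatches("C.UTF-8")` from `main`; both locale names are assumptions and may not be installed everywhere. A platform with a 16-bit `wchar_t` would be expected to fail the length check for code points above U+FFFF, since those cannot fit in a single element.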
## Building
* Install the [ICU4C](http://site.icu-project.org/download/) library
* Build:
```sh
./configure
make
```
* Run:
```sh
./w
```