https://github.com/tech-gian/utf-8-validator

A university project to create a UTF-8 validator.

# UTF-8-Validator
Developed as an assignment for the Introduction to Programming class at the National and Kapodistrian University of Athens by Angelos Sfyrakis.

# What is UTF-8?

UTF-8 is a variable-width character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit. UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes.
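
To make the "one to four bytes" rule concrete, here is a minimal sketch that maps a code point to the number of bytes UTF-8 uses for it. The function name `utf8_encoded_length` is hypothetical and is not taken from this repository. The ranges follow the standard UTF-8 definition, and the surrogate range U+D800 to U+DFFF is excluded, which is why the count of valid code points is 1,112,064 rather than 1,114,112.

```c
#include <stdint.h>

/* Hypothetical helper, not part of this repository.
 * Returns how many bytes UTF-8 needs to encode code point cp (1 to 4),
 * or 0 if cp is not a valid code point for UTF-8. */
int utf8_encoded_length(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are excluded */
    if (cp > 0x10FFFF) return 0;                /* beyond the Unicode range */
    if (cp <= 0x7F) return 1;                   /* ASCII */
    if (cp <= 0x7FF) return 2;
    if (cp <= 0xFFFF) return 3;
    return 4;
}
```

For example, `utf8_encoded_length(0x41)` (the letter A) gives 1, while `utf8_encoded_length(0x20AC)` (the euro sign) gives 3.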

![](sQuKr.png)
# Prefix code

The first byte indicates the number of bytes in the sequence, so a reader can decode each sequence as soon as it has been fully received, without waiting for the first byte of the next sequence or for an end-of-stream indication. The length of a multi-byte sequence is easy to read off: it is simply the number of high-order 1 bits in the leading byte. If a stream ends mid-sequence, the incomplete character is not decoded.
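
The rule about counting high-order 1 bits can be sketched as follows. The function name `utf8_sequence_length` is hypothetical and is not taken from this repository. The sketch only classifies the leading byte by its prefix. It does not reject lead bytes that can only start overlong or out-of-range sequences (such as `0xC0` or `0xF5`), which a full validator would also have to catch.

```c
/* Hypothetical helper, not part of this repository.
 * Returns the total sequence length announced by a leading byte:
 *   0xxxxxxx -> 1, 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
 * Returns 0 for a continuation byte (10xxxxxx) or a prefix with
 * five or more leading 1 bits, neither of which can start a sequence. */
int utf8_sequence_length(unsigned char lead)
{
    int ones = 0;

    /* Count consecutive 1 bits starting from the most significant bit. */
    while (ones < 8 && (lead & (0x80 >> ones)))
        ones++;

    if (ones == 0)
        return 1;                /* single-byte (ASCII) character */
    if (ones >= 2 && ones <= 4)
        return ones;             /* multi-byte lead: length == number of 1s */
    return 0;                    /* not a valid leading byte */
}
```

For example, `utf8_sequence_length(0xE2)` gives 3, because `0xE2` is `11100010` in binary.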

# Compilation and Input

- To compile:
```bash
gcc utf8validate.c -o name
```
The program reads its input from standard input, so a text file must be redirected into it.

- To run:
```bash
./name < text_file_name
```
# Notes
For this assignment, students were instructed not to use arrays (strings included).
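
As an illustration of how validation is possible without arrays, here is a minimal sketch that processes the input one byte at a time and keeps only a few integers of state. This is not the code in `utf8validate.c`. The names `utf8_state`, `utf8_feed` and `utf8_validate_stream` are hypothetical, and the use of a struct is an assumption of this sketch rather than something the assignment is known to allow.

```c
#include <stdio.h>

/* Hypothetical validator state: a few integers, no arrays. */
struct utf8_state {
    int remaining;      /* continuation bytes still expected */
    unsigned long cp;   /* code point assembled so far */
    unsigned long min;  /* smallest code point allowed for this length */
};

/* Feeds one byte (0..255) to the validator.
 * Returns 1 if the input is still valid so far, 0 on the first error. */
int utf8_feed(struct utf8_state *s, int byte)
{
    if (s->remaining == 0) {
        /* Expecting a leading byte. */
        if (byte < 0x80)
            return 1;                                   /* ASCII */
        if ((byte & 0xE0) == 0xC0) {
            s->remaining = 1; s->cp = byte & 0x1F; s->min = 0x80;
        } else if ((byte & 0xF0) == 0xE0) {
            s->remaining = 2; s->cp = byte & 0x0F; s->min = 0x800;
        } else if ((byte & 0xF8) == 0xF0) {
            s->remaining = 3; s->cp = byte & 0x07; s->min = 0x10000;
        } else {
            return 0;          /* stray continuation byte or invalid prefix */
        }
        return 1;
    }

    /* Expecting a continuation byte of the form 10xxxxxx. */
    if ((byte & 0xC0) != 0x80)
        return 0;
    s->cp = (s->cp << 6) | (unsigned long)(byte & 0x3F);
    s->remaining--;
    if (s->remaining > 0)
        return 1;

    /* Sequence complete: reject overlong forms, surrogates, and
     * values beyond U+10FFFF. */
    if (s->cp < s->min) return 0;
    if (s->cp >= 0xD800 && s->cp <= 0xDFFF) return 0;
    if (s->cp > 0x10FFFF) return 0;
    return 1;
}

/* Reads bytes until EOF. Returns 1 if the whole stream is valid UTF-8,
 * 0 otherwise (including a stream that ends mid-sequence). */
int utf8_validate_stream(FILE *in)
{
    struct utf8_state s = {0, 0, 0};
    int c;

    while ((c = getc(in)) != EOF) {
        if (!utf8_feed(&s, c))
            return 0;
    }
    return s.remaining == 0;
}
```

Calling `utf8_validate_stream(stdin)` from `main` would fit the `./name < text_file_name` invocation shown above.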