{"id":21690608,"url":"https://github.com/reiver/go-utf8","last_synced_at":"2026-03-07T20:03:55.353Z","repository":{"id":46511896,"uuid":"139397169","full_name":"reiver/go-utf8","owner":"reiver","description":"Package utf8 implements encoding and decoding of UTF-8, for the Go programming language. This package is meant to be a replacement for Go's built-in \"unicode/utf8\" package.","archived":false,"fork":false,"pushed_at":"2024-08-25T11:24:57.000Z","size":69,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-18T10:52:01.170Z","etag":null,"topics":["golang","unicode","utf-8","utf8"],"latest_commit_sha":null,"homepage":"https://godoc.org/github.com/reiver/go-utf8s","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/reiver.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-07-02T05:58:01.000Z","updated_at":"2024-08-25T11:25:00.000Z","dependencies_parsed_at":"2024-11-25T17:35:19.559Z","dependency_job_id":"1cc961b0-8bfd-496e-b0c6-99d3227c85f6","html_url":"https://github.com/reiver/go-utf8","commit_stats":null,"previous_names":["reiver/go-utf8s"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/reiver/go-utf8","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reiver%2Fgo-utf8","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reiver%2Fgo-utf8/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reiver%2Fgo-utf8/releases","manifests_url":"h
ttps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reiver%2Fgo-utf8/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/reiver","download_url":"https://codeload.github.com/reiver/go-utf8/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/reiver%2Fgo-utf8/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30229589,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-07T19:01:10.287Z","status":"ssl_error","status_checked_at":"2026-03-07T18:59:58.103Z","response_time":53,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["golang","unicode","utf-8","utf8"],"created_at":"2024-11-25T17:32:24.757Z","updated_at":"2026-03-07T20:03:55.336Z","avatar_url":"https://github.com/reiver.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# go-utf8\n\nPackage **utf8** implements encoding and decoding of UTF-8, for the Go programming language.\n\nThis package is meant to be a replacement for Go's built-in `\"unicode/utf8\"` package.\n\n## Documentation\n\nOnline documentation, which includes examples, can be found at: http://godoc.org/github.com/reiver/go-utf8\n\n[![GoDoc](https://godoc.org/github.com/reiver/go-utf8?status.svg)](https://godoc.org/github.com/reiver/go-utf8)\n\n## Reading a Single UTF-8 Character\n\nThis is the simplest way of reading a single 
UTF-8 character.\n\n```go\nvar reader io.Reader\n\n// ...\n\nr, n, err := utf8.ReadRune(reader)\n```\n\n## Writing a Single UTF-8 Character\n\nThis is the simplest way of writing a single UTF-8 character.\n\n```go\nvar writer io.Writer\n\n// ...\n\nvar r rune\n\n// ...\n\nn, err := utf8.WriteRune(writer, r)\n```\n\n## io.RuneReader\n\nThis is how you can create an `io.RuneReader`:\n\n```go\nvar reader io.Reader\n\n// ...\n\nvar runeReader io.RuneReader = utf8.NewRuneReader(reader)\n\n// ...\n\n// r is the decoded rune; n is its size in bytes.\nr, n, err := runeReader.ReadRune()\n```\n\n## io.RuneScanner\n\nThis is how you can create an `io.RuneScanner`:\n\n```go\nvar reader io.Reader\n\n// ...\n\nvar runeScanner io.RuneScanner = utf8.NewRuneScanner(reader)\n\n// ...\n\nr, n, err := runeScanner.ReadRune()\n\n// ...\n\n// UnreadRune causes the next call to ReadRune to return the last rune read again.\nerr = runeScanner.UnreadRune()\n```\n\n## UTF-8\n\nUTF-8 is a variable-length encoding of Unicode.\nA single Unicode code point is encoded as 1 to 4 bytes.\n\nSome examples of Unicode code points and their UTF-8 encodings:\n\n\u003ctable\u003e\n\t\u003cthead\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd colspan=\"4\" align=\"center\"\u003eUTF-8 encoding\u003c/td\u003e\n\t\t\t\u003ctd rowspan=\"2\"\u003evalue\u003c/td\u003e\n\t\t\t\u003ctd rowspan=\"2\"\u003ecode point\u003c/td\u003e\n\t\t\t\u003ctd rowspan=\"2\"\u003edecimal\u003c/td\u003e\n\t\t\t\u003ctd rowspan=\"2\"\u003ebinary\u003c/td\u003e\n\t\t\t\u003ctd rowspan=\"2\"\u003ename\u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd\u003ebyte 1\u003c/td\u003e\n\t\t\t\u003ctd\u003ebyte 2\u003c/td\u003e\n\t\t\t\u003ctd\u003ebyte 3\u003c/td\u003e\n\t\t\t\u003ctd\u003ebyte 
4\u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\u003c/thead\u003e\n\t\u003ctbody\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b0,1000001\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003eA\u003c/td\u003e\n\t\t\t\u003ctd\u003eU+0041\u003c/td\u003e\n\t\t\t\u003ctd\u003e65\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b0000,0000,0100,0001\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003eLATIN CAPITAL LETTER A\u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b0,1110010\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003er\u003c/td\u003e\n\t\t\t\u003ctd\u003eU+0072\u003c/td\u003e\n\t\t\t\u003ctd\u003e114\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b0000,0000,0111,0010\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003eLATIN SMALL LETTER R\u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b110,00010\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,100001\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e¡\u003c/td\u003e\n\t\t\t\u003ctd\u003eU+00A1\u003c/td\u003e\n\t\t\t\u003ctd\u003e161\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b0000,0000,1010,0001\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003eINVERTED EXCLAMATION 
MARK\u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b110,11011\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,110101\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e۵\u003c/td\u003e\n\t\t\t\u003ctd\u003eU+06F5\u003c/td\u003e\n\t\t\t\u003ctd\u003e1781\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b0000,0110,1111,0101\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003eEXTENDED ARABIC-INDIC DIGIT FIVE\u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b1110,0010\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,000000\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,110001\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e‱\u003c/td\u003e\n\t\t\t\u003ctd\u003eU+2031\u003c/td\u003e\n\t\t\t\u003ctd\u003e8241\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b0010,0000,0011,0001\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003ePER TEN THOUSAND SIGN\u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b1110,0010\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,001001\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,100001\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e≡\u003c/td\u003e\n\t\t\t\u003ctd\u003eU+2261\u003c/td\u003e\n\t\t\t\u003ctd\u003e8801\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b0010,0010,0110,0001\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003eIDENTICAL 
TO\u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b11110,000\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,010000\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,001111\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,010101\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e𐏕\u003c/td\u003e\n\t\t\t\u003ctd\u003eU+103D5\u003c/td\u003e\n\t\t\t\u003ctd\u003e66517\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b0001,0000,0011,1101,0101\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003eOLD PERSIAN NUMBER HUNDRED\u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\t\u003ctr\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b11110,000\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,011111\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,011001\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b10,000010\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003e🙂\u003c/td\u003e\n\t\t\t\u003ctd\u003eU+1F642\u003c/td\u003e\n\t\t\t\u003ctd\u003e128578\u003c/td\u003e\n\t\t\t\u003ctd\u003e\u003ccode\u003e0b0001,1111,0110,0100,0010\u003c/code\u003e\u003c/td\u003e\n\t\t\t\u003ctd\u003eSLIGHTLY SMILING FACE\u003c/td\u003e\n\t\t\u003c/tr\u003e\n\t\u003c/tbody\u003e\n\u003c/table\u003e\n\n## UTF-8 Versus ASCII\n\nUTF-8 was designed, in part, to be backward compatible with 7-bit ASCII.\n\nThus, all 7-bit ASCII text is valid UTF-8: each ASCII character (U+0000 to U+007F) is encoded as the same single byte it has in ASCII.\n\n## UTF-8 Encoding\n\nSince 2003, UTF-8 has been restricted to the Unicode code point range U+0000 to U+10FFFF. Every code point in that range fits into 21 bits, so a single UTF-8 sequence carries at most 21 bits of information.\n\nThe following table shows how those bits are distributed across the encoded bytes; each `x` stands for one bit of the code point:\n\n| # of bytes | # bits for code point | 1st code point |  last code point |   byte 1   |   byte 2   |   byte 3   |   byte 4   |\n|------------|-----------------------|----------------|------------------|------------|------------|------------|------------|\n|     1      | 
           7          |    U+000000    |     U+00007F     | `0xxxxxxx` |            |            |            |\n|     2      |           11          |    U+000080    |     U+0007FF     | `110xxxxx` | `10xxxxxx` |            |            |\n|     3      |           16          |    U+000800    |     U+00FFFF     | `1110xxxx` | `10xxxxxx` | `10xxxxxx` |            |\n|     4      |           21          |    U+010000    |     U+10FFFF     | `11110xxx` | `10xxxxxx` | `10xxxxxx` | `10xxxxxx` |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freiver%2Fgo-utf8","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Freiver%2Fgo-utf8","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Freiver%2Fgo-utf8/lists"}