{"id":13413528,"url":"https://github.com/chmike/varint","last_synced_at":"2026-01-06T06:56:30.284Z","repository":{"id":45866341,"uuid":"433404963","full_name":"chmike/varint","owner":"chmike","description":"variable length integer encoding using prefix code","archived":false,"fork":false,"pushed_at":"2023-10-07T07:59:10.000Z","size":151,"stargazers_count":13,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-07-31T20:52:32.399Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chmike.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-11-30T11:29:34.000Z","updated_at":"2024-07-09T09:48:43.000Z","dependencies_parsed_at":"2024-01-08T15:34:45.501Z","dependency_job_id":null,"html_url":"https://github.com/chmike/varint","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chmike%2Fvarint","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chmike%2Fvarint/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chmike%2Fvarint/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chmike%2Fvarint/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chmike","download_url":"https://codeload.github.com/chmike/varint/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_c
ount":221498768,"owners_count":16833057,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-30T20:01:42.460Z","updated_at":"2026-01-06T06:56:30.279Z","avatar_url":"https://github.com/chmike.png","language":"Go","readme":"# Variable length unsigned integer encoding\n\nSmall unsigned integer values are more frequent than big values. A substantial\ncompression may then be achieved by dropping the insignificant 0 bits when\nserializing integers in a byte sequence. The byte length of the encoded \ninteger will then vary according to the number of significant bits. \n\nThe length of the encoded integer must be encoded with the integer bits so\nthat the original integer can be restored. There exist different ways to \nencode a varying length integer. Here we consider only the encoding of `uint64`\nvalues. \n\nThe most popular such encoding is \n[LEB128](https://en.wikipedia.org/wiki/LEB128). It can encode integers of\nany byte size. It is the encoding used by `Uvarint` in the \n`encoding/binary` package. The maximum byte length of a `uint64`\nencoded in LEB128 is 10 bytes. \n\nThis package implements the \n[prefix code](https://en.wikipedia.org/wiki/Prefix_code). The significant bytes\nare serialized in big endian order, and the most significant bits of the first \nbyte set to 1 encode the number of bytes that follow the first byte. This \nencoding requires less bit fiddling than the postfix code to encode or decode. 
\nThe maximum byte length of a prefix-encoded `uint64` is 9 bytes.\n\nAn alternate encoding is the postfix code. In this encoding the significant \nbytes are serialized in little endian order. The byte length is encoded as a \nnumber of 0 bits in the last byte, which appears first in the encoded byte \nsequence. It may seem more performant than the prefix code since words may be \nwritten in memory in one instruction on little endian computers, but the bit \nfiddling adds a non-negligible overhead. Benchmarks comparing prefix and postfix\ncode show indeed that prefix code is more performant when coded in Go.\n\nThere exist other encodings, such as the one used by SQLite, but we don't discuss them\nhere. \n\n## Usage\n\nTo use this package you must first make sure that you have a `go.mod` file in \nyour project. To create a `go.mod` file, issue the following command in\nyour terminal, replacing `myProgram` with the name of your program. \n\n```bash\n$ go mod init myProgram\n```\n\nThe second step is to import this package in the Go source files that will use \nits functions. \n\n```go\nimport \"github.com/chmike/varint\"\n```\n\n### Encoding\n\nTo encode a `uint64` value, use the following function. It will serialize the \nvalue in the byte slice and return the number of bytes written. If the slice \nis too small to hold the encoded integer, the function returns 0. The \nfunction never panics.\n\n```go\nfunc varint.Encode([]byte, uint64) int\n```\n\nIn your program, you would encode the value `1234` in the byte slice `b` \nlike this:\n\n```go\nl := varint.Encode(b, 1234)\nif l == 0 {\n  // b is too small to hold the encoded value\n}\nb = b[l:]\n```\n\n### Decoding\n\nTo decode an encoded `uint64` value, use the following function. It will deserialize\nthe value and return the number of bytes read. If the slice is too small to contain\na value (slice empty or value truncated), the function returns the value 0 and 0 bytes\nread. 
The function never panics.\n\n```go\nfunc varint.Decode([]byte) (uint64,int)\n```\n\nIn your program, you would decode an encoded value in the byte slice `b` like this:\n\n```go\nv, l := varint.Decode(b)\nif l == 0 {\n  // b is empty or the value is truncated\n}\nb = b[l:]\n```\n\n### Differences with the Uvarint functions\n\nThese functions have the same API as the `binary.PutUvarint()` and `binary.Uvarint()`\nfunctions and can conveniently replace them. There are a number of differences though. \n\n- the `Uvarint()` and `PutUvarint()` functions may panic\n- the function `PutUvarint()` will never return 0\n- `Uvarint()` may return a negative number of bytes read in some conditions.\n\n## Performance \n\nAn extensive performance study comparing different implementations of \npostfix and prefix encoding led to this code. Its performance is compared\nwith `binary.PutUvarint()` and `binary.Uvarint()` and presented in\nthe following graphic. The data is made available in `docs/data.txt`.\n\n![benchmarks](img/benchmarks.png)\n\nThe LEB128 encoding with `binary.PutUvarint()` for values up to 35 bits \nis as fast as the `varint.Encode()` function. Encoding values smaller than\n128 is inlined. `varint.Decode()` of such values is slower than `Uvarint()`\nbecause it is not inlined by the compiler. \n\nFor an unknown reason, the LEB128 encoding and decoding functions become\nsignificantly slower for values bigger than 35 bits on my computer. I \ndon't know if it's due to the compiler or the CPU. \n\nThe following graphic shows the time required to encode and decode a\n`uint64` value. This clearly shows that `varint` is faster. Since a value\nis usually written once and read many times, it shows that prefix encoding\nis a more efficient encoding than LEB128. \n\n![benchmark encode and decode](img/benchmarkEncodeAndDecode.png)\n\n### Note of caution on the benchmark results\n\nThe benchmark favors the `chmike/varint` code because of branch prediction\noptimisation. 
Encoding a slice of `uint64` values of different magnitudes would\nmitigate this effect. A Zipf-distributed magnitude might be more realistic. \n\nAny other suggestions for a better benchmarking method are welcome in the issue\nsection.\n","funding_links":[],"categories":["杂项","Miscellaneous","Uncategorized","Microsoft Office"],"sub_categories":["未分类的","Uncategorized"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchmike%2Fvarint","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchmike%2Fvarint","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchmike%2Fvarint/lists"}