# is_utf8

Most strings online are Unicode text in the UTF-8 encoding. Validating strings
quickly before accepting them is important.

## How to use is_utf8

This is a simple, single-source-file library that validates UTF-8 strings at
high speed using SIMD instructions. It works on all platforms (ARM, x64) and is
optimized for ARM NEON, x64 SSE, AVX2 and AVX-512.

Compile `is_utf8.cpp` and link it with your project. Code usage:

```C++
  #include "is_utf8.h"

  char * mystring = ...
  bool is_it_valid = is_utf8(mystring, thestringlength);
```

The second argument is the length of the input in bytes.

It should be able to validate strings in less than one cycle per input byte.

## Requirements

- A C++11-compatible compiler. We support LLVM clang, GCC and Visual Studio.
  (Our optional benchmark tool requires C++17.)
- For high speed, you should have a recent 64-bit system (e.g., ARM or x64).
- If you rely on CMake, you should use a recent CMake (at least 3.15).
- AVX-512 support requires a processor with AVX512-VBMI2 (Ice Lake or better)
  and a recent compiler (GCC 8 or better, Visual Studio 2019 or better, LLVM
  clang 6 or better). You also need a correspondingly recent assembler such as
  gas (2.30+) or nasm (2.14+); recent compilers usually come with recent
  assemblers. If you mix a recent compiler with an old assembler (e.g., a
  recent compiler on an old Linux distribution), you may get build errors
  because the compiler emits instructions that the assembler does not
  recognize. In that case, update your assembler to match your compiler (e.g.,
  upgrade binutils to version 2.30 or later under Linux) or use an older
  compiler that matches your assembler's capabilities.

## Build with CMake

```
cmake -B build
cmake --build build
cd build
ctest .
```

Visual Studio users must specify whether they want to build the Release or Debug
version (e.g., `cmake --build build --config Release`, then `ctest -C Release`).

To run the benchmarks, build the project and execute the `bench` program:

```
cmake -B build
cmake --build build
./build/benchmarks/bench
```

Instructions are similar for Visual Studio users.

## Real-world usage

This C++ library is part of the JavaScript package
[utf-8-validate](https://github.com/websockets/utf-8-validate). The
utf-8-validate package is routinely downloaded more than
[a million times per week](https://www.npmjs.com/package/utf-8-validate).

If you are using Node.js (version 19.4.0 or later), you already have access to
this function as
[`buffer.isUtf8(input)`](https://nodejs.org/api/buffer.html#bufferisutf8input).

## Reference

- John Keiser, Daniel Lemire,
  [Validating UTF-8 In Less Than One Instruction Per Byte](https://arxiv.org/abs/2010.03090),
  Software: Practice & Experience 51 (5), 2021

## Want more?

If you need a wider range of fast Unicode functions for production use, you can
rely on the simdutf library. Using it is as simple as the following:

```C++
#include "simdutf.cpp"
#include "simdutf.h"

#include <cstdlib>   // EXIT_FAILURE
#include <iostream>  // std::cout, std::cerr

int main(int argc, char *argv[]) {
  const char *source = "1234";
  // 4 == strlen(source)
  bool validutf8 = simdutf::validate_utf8(source, 4);
  if (validutf8) {
    std::cout << "valid UTF-8" << std::endl;
  } else {
    std::cerr << "invalid UTF-8" << std::endl;
    return EXIT_FAILURE;
  }
}
```

See https://github.com/simdutf/

## License

This library is distributed under the terms of any of the following licenses, at
your option:

- Apache License (Version 2.0) [LICENSE-APACHE](LICENSE-APACHE),
- Boost Software License [LICENSE-BOOST](LICENSE-BOOST), or
- MIT License [LICENSE-MIT](LICENSE-MIT).
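## What counts as valid UTF-8

Validation answers a yes/no question: is this byte sequence valid UTF-8? For
reference, the sketch below spells out the rules with a slow, byte-at-a-time
loop based on RFC 3629:

- An ASCII byte stands alone.
- A lead byte announces one to three continuation bytes of the form `10xxxxxx`.
- The decoded code point must not be an overlong encoding, a UTF-16 surrogate
  (U+D800 to U+DFFF), or a value above U+10FFFF.

This is a hypothetical illustration (the name `is_utf8_scalar` is not part of
the library). It is not how is_utf8 works internally; the library uses the SIMD
approach described in the reference above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference validator (not part of is_utf8).
// Checks the RFC 3629 rules one byte at a time; slow but easy to audit.
bool is_utf8_scalar(const char *src, std::size_t len) {
  const auto *s = reinterpret_cast<const unsigned char *>(src);
  std::size_t i = 0;
  while (i < len) {
    const unsigned char lead = s[i];
    if (lead < 0x80) { // ASCII: a single byte
      ++i;
      continue;
    }
    std::size_t extra;    // number of continuation bytes that must follow
    std::uint32_t cp;     // code point accumulated so far
    std::uint32_t min_cp; // smallest code point allowed at this length
    if ((lead & 0xE0) == 0xC0) {        // 110xxxxx: 2-byte sequence
      extra = 1; cp = lead & 0x1F; min_cp = 0x80;
    } else if ((lead & 0xF0) == 0xE0) { // 1110xxxx: 3-byte sequence
      extra = 2; cp = lead & 0x0F; min_cp = 0x800;
    } else if ((lead & 0xF8) == 0xF0) { // 11110xxx: 4-byte sequence
      extra = 3; cp = lead & 0x07; min_cp = 0x10000;
    } else {
      return false; // stray continuation byte (10xxxxxx) or 0xF8..0xFF
    }
    if (len - i <= extra) {
      return false; // sequence truncated by the end of the input
    }
    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char b = s[i + k];
      if ((b & 0xC0) != 0x80) {
        return false; // expected a continuation byte 10xxxxxx
      }
      cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min_cp) return false;                  // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
    if (cp > 0x10FFFF) return false;                // beyond Unicode range
    i += extra + 1;
  }
  return true;
}
```

A scalar check like this can serve as an oracle in differential tests. Run it
and `is_utf8` on the same random buffers and verify that they agree. It is
expected to be far slower, since it branches on every byte.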