https://github.com/rampaa/unicoderangetoutf16compliantregex

Non-Basic Multilingual Plane Regex Ranges to UTF-16 Compliant Regex

regex regex-pattern regexp supplementary-plane surrogate-pairs unicode utf-16

Some programming languages that use UTF-16 for strings have trouble matching characters outside the [Basic Multilingual Plane](https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane) (e.g., [CJK Unified Ideographs Extension B](https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_B)) with regular expressions, because each such character is stored as a surrogate pair of two UTF-16 code units rather than as a single unit (see: https://github.com/dotnet/runtime/issues/79865). This program converts unsupported Unicode range RegExes into UTF-16 compliant RegExes. For example, `[\U00020000-\U0002A6DF]` will be converted into `\uD840[\uDC00-\uDFFF]|[\uD841-\uD868][\uDC00-\uDFFF]|\uD869[\uDC00-\uDEDF]`.
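The conversion can be illustrated with a minimal Python sketch. This is not the repository's code: the function names (`to_surrogates`, `supplementary_range_to_utf16_regex`) are hypothetical, and the sketch assumes both endpoints lie in the supplementary planes (U+10000 to U+10FFFF). It splits each endpoint into a high and a low surrogate, then emits up to three alternatives: a partial first row, the full rows in between, and a partial last row.

```python
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def _u(unit: int) -> str:
    # Render one UTF-16 code unit as a \uXXXX regex escape.
    return f"\\u{unit:04X}"


def _cls(lo: int, hi: int) -> str:
    # Render a range of code units, collapsing a single-unit range to one escape.
    return _u(lo) if lo == hi else f"[{_u(lo)}-{_u(hi)}]"


def supplementary_range_to_utf16_regex(start: int, end: int) -> str:
    """Hypothetical helper: build a surrogate-pair regex for start..end (inclusive)."""
    if start > end:
        raise ValueError("start must not exceed end")
    hi_s, lo_s = to_surrogates(start)
    hi_e, lo_e = to_surrogates(end)

    # Both endpoints share a high surrogate: one alternative is enough.
    if hi_s == hi_e:
        return _u(hi_s) + _cls(lo_s, lo_e)

    # First high surrogate: low surrogates from lo_s up to the maximum.
    parts = [_u(hi_s) + _cls(lo_s, 0xDFFF)]
    # High surrogates strictly in between: every low surrogate is allowed.
    if hi_e - hi_s > 1:
        parts.append(_cls(hi_s + 1, hi_e - 1) + _cls(0xDC00, 0xDFFF))
    # Last high surrogate: low surrogates from the minimum up to lo_e.
    parts.append(_u(hi_e) + _cls(0xDC00, lo_e))
    return "|".join(parts)


print(supplementary_range_to_utf16_regex(0x20000, 0x2A6DF))
```

For the CJK Unified Ideographs Extension B range, the sketch prints the same pattern as the example above. It does not merge adjacent alternatives or handle ranges that start inside the Basic Multilingual Plane.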

The code is largely taken from https://stackoverflow.com/a/47627127, with some small modifications. This repo exists solely for convenience.