[//]: # (AUTO-GENERATED BY "PHP README Helper": base file -> docs/base.md)
[![SWUbanner](https://raw.githubusercontent.com/vshymanskyy/StandWithUkraine/main/banner2-direct.svg)](https://github.com/vshymanskyy/StandWithUkraine/blob/main/docs/README.md)

[![Build Status](https://github.com/voku/portable-utf8/actions/workflows/ci.yml/badge.svg?branch=master)](https://github.com/voku/portable-utf8/actions)
[![Build status](https://ci.appveyor.com/api/projects/status/gnejjnk7qplr7f5t/branch/master?svg=true)](https://ci.appveyor.com/project/voku/portable-utf8/branch/master)
[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fvoku%2Fportable-utf8.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Fvoku%2Fportable-utf8?ref=badge_shield)
[![codecov.io](https://codecov.io/github/voku/portable-utf8/coverage.svg?branch=master)](https://codecov.io/github/voku/portable-utf8?branch=master)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/997c9bb10d1c4791967bdf2e42013e8e)](https://www.codacy.com/app/voku/portable-utf8)
[![Latest Stable Version](https://poser.pugx.org/voku/portable-utf8/v/stable)](https://packagist.org/packages/voku/portable-utf8)
[![Total Downloads](https://poser.pugx.org/voku/portable-utf8/downloads)](https://packagist.org/packages/voku/portable-utf8)
[![License](https://poser.pugx.org/voku/portable-utf8/license)](https://packagist.org/packages/voku/portable-utf8)
[![Donate to this project using PayPal](https://img.shields.io/badge/paypal-donate-yellow.svg)](https://www.paypal.me/moelleken)
[![Donate to this project using Patreon](https://img.shields.io/badge/patreon-donate-yellow.svg)](https://www.patreon.com/voku)

# 🉑 Portable UTF-8

## Description

It is written in PHP (PHP 7+) and can work without "mbstring", "iconv" or any other extra encoding php-extension on your server.

The benefit of Portable UTF-8 is that it is easy to use, easy to bundle. This library will also
auto-detect your server environment and will use the installed php-extensions if they are available,
so you will have the best possible performance.

As a fallback we will use Symfony Polyfills, if needed. (https://github.com/symfony/polyfill)

The project is based on ...
+ Hamid Sarfraz's work - [portable-utf8](http://pageconfig.com/attachments/portable-utf8.php)
+ Nicolas Grekas's work - [tchwork/utf8](https://github.com/tchwork/utf8)
+ Behat's work - [Behat/Transliterator](https://github.com/Behat/Transliterator)
+ Sebastián Grignoli's work - [neitanod/forceutf8](https://github.com/neitanod/forceutf8)
+ Ivan Enderlin's work - [hoaproject/Ustring](https://github.com/hoaproject/Ustring)
+ and many cherry-picks from "GitHub"-gists and "Stack Overflow"-snippets ...

## Demo

Here you can test some basic functions from this library and you can compare some results with the native php function results.

+ [encoder.suckup.de](https://encoder.suckup.de/)

## Index

* [Alternative](#alternative)
* [Install](#install-portable-utf-8-via-composer-require)
* [Why Portable UTF-8?](#why-portable-utf-8)
* [Requirements and Recommendations](#requirements-and-recommendations)
* [Warning](#warning)
* [Usage](#usage)
* [Class methods](#class-methods)
* [Unit Test](#unit-test)
* [License and Copyright](#license-and-copyright)

## Alternative

If you prefer a more object-oriented way to edit strings, take a look at [voku/Stringy](https://github.com/voku/Stringy). It is a fork of "danielstjules/Stringy" that uses the "Portable UTF-8" class and adds some extra methods.

```php
// Standard library
strtoupper('fòôbàř'); // 'FòôBàř'
strlen('fòôbàř'); // 10

// mbstring
// WARNING: if you don't use a polyfill like "Portable UTF-8", you need to install the php-extension "mbstring" on your server
mb_strtoupper('fòôbàř'); // 'FÒÔBÀŘ'
mb_strlen('fòôbàř'); // 6

// Portable UTF-8
use voku\helper\UTF8;
UTF8::strtoupper('fòôbàř'); // 'FÒÔBÀŘ'
UTF8::strlen('fòôbàř'); // 6

// voku/Stringy
use Stringy\Stringy as S;
$stringy = S::create('fòôbàř');
$stringy->toUpperCase(); // 'FÒÔBÀŘ'
$stringy->length(); // 6
```

## Install "Portable UTF-8" via "composer require"
```shell
composer require voku/portable-utf8
```

If your project does not need some of the Symfony polyfills, please use the `replace` section of your `composer.json`.
This removes any overhead from these polyfills, as they are no longer part of your project, e.g.:
```json
{
    "replace": {
        "symfony/polyfill-php72": "1.99",
        "symfony/polyfill-iconv": "1.99",
        "symfony/polyfill-intl-grapheme": "1.99",
        "symfony/polyfill-intl-normalizer": "1.99",
        "symfony/polyfill-mbstring": "1.99"
    }
}
```

## Why Portable UTF-8?
PHP 5 and earlier versions have no native Unicode support. To bridge the gap, there exist several extensions like "mbstring", "iconv" and "intl".

The problem with "mbstring" and the others is that, most of the time, you cannot ensure that a specific one is present on a server. If you rely on one of them, your application is no longer portable. This problem is even more severe for open source applications that have to run on different servers with different configurations. Considering these, I decided to write this library.

## Requirements and Recommendations

* No extensions are required to run this library. Portable UTF-8 only needs the PCRE library, which is available by default since PHP 4.2.0 and cannot be disabled since PHP 5.3.0. Support for the "u" pattern modifier (UTF-8 mode) in PCRE is not a must.
* Versions of Portable UTF-8 before 4.0 require PHP 5.3 or later.
* PHP 7.0 is the minimum requirement since version 4.0 of Portable UTF-8; on older PHP versions Composer will install an older release.
* PHP 8.0 support is also available and will adapt the behaviours of the native functions.
* To speed up string handling, it is recommended that you have "mbstring" or "iconv" available on your server, as well as the latest version of PCRE library
* Although Portable UTF-8 is easy to use, moving from the native API to Portable UTF-8 may not be straightforward for everyone. It is highly recommended that you do not update your scripts to include Portable UTF-8, or replace or change anything, before you know the reason and the consequences. Most of the time, a native function may be all that you need.
* There is also a shim for "mbstring", "iconv" and "intl", so you can use it also on shared webspace.
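
The recommendations above boil down to "use a native extension when it is installed, otherwise fall back to pure PHP". The following is a minimal sketch of that idea using only core PHP functions. `sketch_strlen()` is a hypothetical helper written for this illustration; it is not part of Portable UTF-8 and does not claim to mirror its internals.

```php
// Hypothetical helper, for illustration only (not part of Portable UTF-8).
function sketch_strlen(string $str): int
{
    // Prefer the native extension when it is installed.
    if (\extension_loaded('mbstring')) {
        return \mb_strlen($str, 'UTF-8');
    }

    // Fallback: count code points with PCRE's "u" (UTF-8) modifier.
    $count = \preg_match_all('/./us', $str);

    // preg_match_all() returns false for invalid UTF-8; this sketch treats that as 0.
    return $count === false ? 0 : $count;
}

echo \strlen('fòôbàř'), "\n";       // 10 (bytes)
echo sketch_strlen('fòôbàř'), "\n"; // 6 (characters)
```

Both branches return the same result for valid UTF-8 input, so calling code does not need to know which extensions the server provides.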

## Usage

Example 1: UTF8::cleanup()
```php
echo UTF8::cleanup('�Düsseldorf�');

// will output:
// Düsseldorf
```

Example 2: UTF8::strlen()
```php
$string = 'string <strong>with utf-8 chars åèä</strong> - doo-bee doo-bee dooh';

echo strlen($string) . "\n";
echo UTF8::strlen($string) . "\n";

// will output:
// 70
// 67

$string_test1 = strip_tags($string);
$string_test2 = UTF8::strip_tags($string);

echo strlen($string_test1) . "\n";
echo UTF8::strlen($string_test2) . "\n";

// will output:
// 53
// 50
```

Example 3: UTF8::fix_utf8()
```php

echo UTF8::fix_utf8('Düsseldorf');
echo UTF8::fix_utf8('ä');

// will output:
// Düsseldorf
// ä
```

# Portable UTF-8 | API

The API from the "UTF8"-Class is written as small static methods that will match the default PHP-API.

## Class methods

access
add_bom_to_string
array_change_key_case
between
binary_to_str
bom
callback
char_at
chars
checkForSupport
chr
chr_map
chr_size_list
chr_to_decimal
chr_to_hex
chunk_split
clean
cleanup
codepoints
collapse_whitespace
count_chars
css_identifier
css_stripe_media_queries
ctype_loaded
decimal_to_chr
decode_mimeheader
emoji_decode
emoji_encode
emoji_from_country_code
encode
encode_mimeheader
extract_text
file_get_contents
file_has_bom
filter
filter_input
filter_input_array
filter_var
filter_var_array
finfo_loaded
first_char
fits_inside
fix_simple_utf8
fix_utf8
getCharDirection
getSupportInfo
getUrlParamFromArray
get_file_type
get_random_string
get_unique_string
has_lowercase
has_uppercase
has_whitespace
hex_to_chr
hex_to_int
html_encode
html_entity_decode
html_escape
html_stripe_empty_tags
htmlentities
htmlspecialchars
iconv_loaded
int_to_hex
intlChar_loaded
intl_loaded
is_alpha
is_alphanumeric
is_ascii
is_base64
is_binary
is_binary_file
is_blank
is_bom
is_empty
is_hexadecimal
is_html
is_json
is_lowercase
is_printable
is_punctuation
is_serialized
is_uppercase
is_url
is_utf8
is_utf16
is_utf32
json_decode
json_encode
json_loaded
lcfirst
lcwords
levenshtein
ltrim
max
max_chr_width
mbstring_loaded
min
normalize_encoding
normalize_line_ending
normalize_msword
normalize_whitespace
ord
parse_str
pcre_utf8_support
range
rawurldecode
regex_replace
remove_bom
remove_duplicates
remove_html
remove_html_breaks
remove_ileft
remove_invisible_characters
remove_iright
remove_left
remove_right
replace
replace_all
replace_diamond_question_mark
rtrim
showSupport
single_chr_html_encode
spaces_to_tabs
str_camelize
str_capitalize_name
str_contains
str_contains_all
str_contains_any
str_dasherize
str_delimit
str_detect_encoding
str_ends_with
str_ends_with_any
str_ensure_left
str_ensure_right
str_humanize
str_iends_with
str_iends_with_any
str_insert
str_ireplace
str_ireplace_beginning
str_ireplace_ending
str_istarts_with
str_istarts_with_any
str_isubstr_after_first_separator
str_isubstr_after_last_separator
str_isubstr_before_first_separator
str_isubstr_before_last_separator
str_isubstr_first
str_isubstr_last
str_last_char
str_limit
str_limit_after_word
str_longest_common_prefix
str_longest_common_substring
str_longest_common_suffix
str_matches_pattern
str_obfuscate
str_offset_exists
str_offset_get
str_pad
str_pad_both
str_pad_left
str_pad_right
str_repeat
str_replace_beginning
str_replace_ending
str_replace_first
str_replace_last
str_shuffle
str_slice
str_snakeize
str_sort
str_split
str_split_array
str_split_pattern
str_starts_with
str_starts_with_any
str_substr_after_first_separator
str_substr_after_last_separator
str_substr_before_first_separator
str_substr_before_last_separator
str_substr_first
str_substr_last
str_surround
str_titleize
str_titleize_for_humans
str_to_binary
str_to_lines
str_to_words
str_truncate
str_truncate_safe
str_underscored
str_upper_camelize
str_word_count
strcasecmp
strcmp
strcspn
string
string_has_bom
strip_tags
strip_whitespace
stripos
stripos_in_byte
stristr
strlen
strlen_in_byte
strnatcasecmp
strnatcmp
strncasecmp
strncmp
strpbrk
strpos
strpos_in_byte
strrchr
strrev
strrichr
strripos
strripos_in_byte
strrpos
strrpos_in_byte
strspn
strstr
strstr_in_byte
strtocasefold
strtolower
strtoupper
strtr
strwidth
substr
substr_compare
substr_count
substr_count_in_byte
substr_count_simple
substr_ileft
substr_in_byte
substr_iright
substr_left
substr_replace
substr_right
swapCase
symfony_polyfill_used
tabs_to_spaces
titlecase
to_ascii
to_boolean
to_filename
to_int
to_iso8859
to_string
to_utf8
to_utf8_string
trim
ucfirst
ucwords
urldecode
utf8_decode
utf8_encode
whitespace_table
words_limit
wordwrap
wordwrap_per_line
ws

## access(string $str, int $pos, string $encoding): string

Return the character at the specified position: $str[1] like functionality.

EXAMPLE: UTF8::access('fòô', 1); // 'ò'

**Parameters:**
- `string $str`: A UTF-8 string.
- `int $pos`: The position of character to return.
- `string $encoding [optional]`: Set the charset for e.g. "mb_" function

**Return:**
- `string`: Single multi-byte character.

--------

## add_bom_to_string(string $str): non-empty-string

Prepends UTF-8 BOM character to the string and returns the whole string.

INFO: If BOM already existed there, the Input string is returned.

EXAMPLE: UTF8::add_bom_to_string('fòô'); // "\xEF\xBB\xBF" . 'fòô'

**Parameters:**
- `string $str`: The input string.

**Return:**
- `non-empty-string`: The output string that contains BOM.
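
The "return the input unchanged if the BOM is already there" behaviour can be sketched in plain PHP as follows. `sketch_add_bom()` is a hypothetical name used only for this illustration, not the library's code.

```php
// Hypothetical helper, for illustration only.
function sketch_add_bom(string $str): string
{
    $bom = "\xEF\xBB\xBF"; // UTF-8 byte order mark

    // Leave the string untouched if it already starts with the BOM.
    if (\strncmp($str, $bom, 3) === 0) {
        return $str;
    }

    return $bom . $str;
}

$once  = sketch_add_bom('fòô');
$twice = sketch_add_bom($once);
var_dump($once === $twice); // bool(true): the BOM is never added twice
```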

--------

## array_change_key_case(array $array, int $case, string $encoding): string[]

Changes all keys in an array.

**Parameters:**
- `array $array`: The array to work on
- `int $case [optional]`: Either CASE_UPPER or CASE_LOWER (default)
- `string $encoding [optional]`: Set the charset for e.g. "mb_" function

**Return:**
- `string[]`: An array with its keys lower- or uppercased.

--------

## between(string $str, string $start, string $end, int $offset, string $encoding): string

Returns the substring between $start and $end, if found, or an empty
string. An optional offset may be supplied from which to begin the
search for the start string.

**Parameters:**
- `string $str`
- `string $start`: Delimiter marking the start of the substring.
- `string $end`: Delimiter marking the end of the substring.
- `int $offset [optional]`: Index from which to begin the search. Default: 0
- `string $encoding [optional]`: Set the charset for e.g. "mb_" function

**Return:**
- `string`

--------

## binary_to_str(string $bin): string

Convert binary into a string.

INFO: opposite to UTF8::str_to_binary()

EXAMPLE: UTF8::binary_to_str('11110000100111111001100010000011'); // '😃'

**Parameters:**
- `string $bin 1|0`

**Return:**
- `string`
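
As a rough illustration of what "binary to string" means here, the sketch below reads the input eight bits at a time and turns each group into one byte. `sketch_binary_to_str()` is a hypothetical helper that assumes the input length is a multiple of 8 and contains only `0` and `1`; the library may handle edge cases differently.

```php
// Hypothetical helper, for illustration only.
function sketch_binary_to_str(string $bin): string
{
    if ($bin === '') {
        return '';
    }

    $out = '';
    // Every group of 8 bits becomes one byte of the resulting string.
    foreach (\str_split($bin, 8) as $bits) {
        $out .= \chr((int) \bindec($bits));
    }

    return $out;
}

// The four bytes F0 9F 98 83 are the UTF-8 encoding of U+1F603.
echo sketch_binary_to_str('11110000100111111001100010000011'); // 😃
```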

--------

## bom(): non-empty-string

Returns the UTF-8 Byte Order Mark Character.

INFO: take a look at UTF8::$bom for e.g. UTF-16 and UTF-32 BOM values

EXAMPLE: UTF8::bom(); // "\xEF\xBB\xBF"

**Parameters:**
__nothing__

**Return:**
- `non-empty-string`: UTF-8 Byte Order Mark.

--------

## callback(callable(string): string $callback, string $str): string[]

**Parameters:**
- `callable(string): string $callback`
- `string $str`

**Return:**
- `string[]`

--------

## char_at(string $str, int $index, string $encoding): string

Returns the character at $index, with indexes starting at 0.

**Parameters:**
- `string $str`: The input string.
- `int<1, max> $index`: Position of the character.
- `string $encoding [optional]`: Default is UTF-8

**Return:**
- `string`: The character at $index.

--------

## chars(string $str): string[]

Returns an array consisting of the characters in the string.

**Parameters:**
- `T $str`: The input string.

**Return:**
- `string[]`: An array of chars.

--------

## checkForSupport(): true|null

This method will auto-detect your server environment for UTF-8 support.

**Parameters:**
__nothing__

**Return:**
- `true|null`

--------

## chr(int $code_point, string $encoding): string|null

Generates a UTF-8 encoded character from the given code point.

INFO: opposite to UTF8::ord()

EXAMPLE: UTF8::chr(0x2603); // '☃'

**Parameters:**
- `int $code_point`: The code point for which to generate a character.
- `string $encoding [optional]`: Default is UTF-8

**Return:**
- `string|null`: Multi-byte character, returns null on failure or empty input.
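
For readers curious how a code point becomes UTF-8 bytes, here is a minimal sketch of the standard UTF-8 bit layout in plain PHP. `sketch_chr()` is a hypothetical helper for illustration; it is not the library's implementation and ignores the `$encoding` parameter.

```php
// Hypothetical helper, for illustration only: encode one code point as UTF-8.
function sketch_chr(int $cp): ?string
{
    // Reject values outside Unicode and the UTF-16 surrogate range.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return null;
    }
    if ($cp <= 0x7F) {
        return \chr($cp);
    }
    if ($cp <= 0x7FF) {
        return \chr(0xC0 | ($cp >> 6))
            . \chr(0x80 | ($cp & 0x3F));
    }
    if ($cp <= 0xFFFF) {
        return \chr(0xE0 | ($cp >> 12))
            . \chr(0x80 | (($cp >> 6) & 0x3F))
            . \chr(0x80 | ($cp & 0x3F));
    }

    return \chr(0xF0 | ($cp >> 18))
        . \chr(0x80 | (($cp >> 12) & 0x3F))
        . \chr(0x80 | (($cp >> 6) & 0x3F))
        . \chr(0x80 | ($cp & 0x3F));
}

echo sketch_chr(0x2603); // ☃ (bytes E2 98 83)
```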

--------

## chr_map(callable(string): string $callback, string $str): string[]

Applies callback to all characters of a string.

EXAMPLE: UTF8::chr_map([UTF8::class, 'strtolower'], 'Κόσμε'); // ['κ','ό', 'σ', 'μ', 'ε']

**Parameters:**
- `callable(string): string $callback`
- `string $str`: UTF-8 string to run callback on.

**Return:**
- `string[]`: The outcome of the callback, as array.

--------

## chr_size_list(string $str): int[]

Generates an array of byte length of each character of a Unicode string.

- 1 byte => U+0000 - U+007F
- 2 byte => U+0080 - U+07FF
- 3 byte => U+0800 - U+FFFF
- 4 byte => U+10000 - U+10FFFF

EXAMPLE: UTF8::chr_size_list('中文空白-test'); // [3, 3, 3, 3, 1, 1, 1, 1, 1]

**Parameters:**
- `T $str`: The original unicode string.

**Return:**
- `int[]`: An array of byte lengths of each character.
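
The byte ranges listed for this method can be checked with a few lines of plain PHP. Both functions below are hypothetical helpers for illustration (not the library's code): one maps a code point to its UTF-8 size, the other splits a valid UTF-8 string into characters and measures each one in bytes.

```php
// Hypothetical helpers, for illustration only.

// Number of bytes UTF-8 needs for a given code point.
function sketch_utf8_size(int $codePoint): int
{
    if ($codePoint <= 0x7F) {
        return 1;
    }
    if ($codePoint <= 0x7FF) {
        return 2;
    }
    if ($codePoint <= 0xFFFF) {
        return 3;
    }

    return 4;
}

// Byte length of every character in a valid UTF-8 string.
function sketch_chr_size_list(string $str): array
{
    // Split into characters with PCRE's "u" modifier; false (invalid UTF-8) becomes [].
    $chars = \preg_split('//u', $str, -1, \PREG_SPLIT_NO_EMPTY) ?: [];

    return \array_map('strlen', $chars);
}

print_r(sketch_chr_size_list('中文空白-test')); // sizes 3, 3, 3, 3, 1, 1, 1, 1, 1
```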

--------

## chr_to_decimal(string $char): int

Get a decimal code representation of a specific character.

INFO: opposite to UTF8::decimal_to_chr()

EXAMPLE: UTF8::chr_to_decimal('§'); // 0xa7

**Parameters:**
- `string $char`: The input character.

**Return:**
- `int`

--------

## chr_to_hex(int|string $char, string $prefix): string

Get hexadecimal code point (U+xxxx) of a UTF-8 encoded character.

EXAMPLE: UTF8::chr_to_hex('§'); // U+00a7

**Parameters:**
- `int|string $char`: The input character
- `string $prefix [optional]`

**Return:**
- `string`: The code point encoded as U+xxxx.

--------

## chunk_split(string $str, int $chunk_length, string $end): string

Splits a string into smaller chunks and multiple lines, using the specified line ending character.

EXAMPLE: UTF8::chunk_split('ABC-ÖÄÜ-中文空白-κόσμε', 3); // "ABC\r\n-ÖÄ\r\nÜ-中\r\n文空白\r\n-κό\r\nσμε"

**Parameters:**
- `T $str`: The original string to be split.
- `int<1, max> $chunk_length [optional]`: The maximum character length of a chunk.
- `string $end [optional]`: The character(s) to be inserted at the end of each chunk.

**Return:**
- `string`: The chunked string.
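
The difference from the native `chunk_split()` is that chunks are counted in characters, not bytes. A minimal sketch of that idea in plain PHP is shown below; `sketch_chunk_split()` and its default values are assumptions made for this illustration, and the output format (separator between chunks only) follows the example above rather than the library's source.

```php
// Hypothetical helper, for illustration only.
function sketch_chunk_split(string $str, int $chunkLength = 76, string $end = "\r\n"): string
{
    // Split into characters first so multi-byte characters are never cut in half.
    $chars = \preg_split('//u', $str, -1, \PREG_SPLIT_NO_EMPTY) ?: [];

    // Group the characters and glue each group back into a string.
    $chunks = \array_map('implode', \array_chunk($chars, \max(1, $chunkLength)));

    return \implode($end, $chunks);
}

echo sketch_chunk_split('ABC-ÖÄÜ', 3); // "ABC\r\n-ÖÄ\r\nÜ"
```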

--------

## clean(string $str, bool $remove_bom, bool $normalize_whitespace, bool $normalize_msword, bool $keep_non_breaking_space, bool $replace_diamond_question_mark, bool $remove_invisible_characters, bool $remove_invisible_characters_url_encoded): string

Accepts a string and removes all non-UTF-8 characters from it + extras if needed.

EXAMPLE: UTF8::clean("\xEF\xBB\xBF„Abcdef\xc2\xa0\x20…” — 😃 - Düsseldorf", true, true); // '„Abcdef  …” — 😃 - Düsseldorf'

**Parameters:**
- `string $str`: The string to be sanitized.
- `bool $remove_bom [optional]`: Set to true, if you need to remove UTF-BOM.
- `bool $normalize_whitespace [optional]`: Set to true, if you need to normalize the whitespace.
- `bool $normalize_msword [optional]`: Set to true, if you need to normalize MS Word chars e.g.: "…" => "..."
- `bool $keep_non_breaking_space [optional]`: Set to true, to keep non-breaking-spaces, in combination with $normalize_whitespace
- `bool $replace_diamond_question_mark [optional]`: Set to true, if you need to remove diamond question mark e.g.: "�"
- `bool $remove_invisible_characters [optional]`: Set to false, if you do not want to remove invisible characters e.g.: "\0"
- `bool $remove_invisible_characters_url_encoded [optional]`: Set to true, if you want to remove invisible url encoded characters e.g.: "%0B". WARNING: maybe contains false-positives e.g. aa%0Baa -> aaaa.

**Return:**
- `string`: A clean UTF-8 encoded string.

--------

## cleanup(string $str): string

Clean-up a string and show only printable UTF-8 chars at the end + fix UTF-8 encoding.

EXAMPLE: UTF8::cleanup("\xEF\xBB\xBF„Abcdef\xc2\xa0\x20…” — 😃 - Düsseldorf"); // '„Abcdef  …” — 😃 - Düsseldorf'

**Parameters:**
- `string $str`: The input string.

**Return:**
- `string`

--------

## codepoints(string|string[] $arg, bool $use_u_style): int[]|string[]

Accepts a string or an array of chars and returns an array of Unicode code points.

INFO: opposite to UTF8::string()

EXAMPLE:
UTF8::codepoints('κöñ'); // array(954, 246, 241)
// ... OR ...
UTF8::codepoints('κöñ', true); // array('U+03ba', 'U+00f6', 'U+00f1')

**Parameters:**
- `T $arg`: A UTF-8 encoded string or an array of such chars.
- `bool $use_u_style`: If true, code points are returned in U+xxxx format; by default, code points are returned as integers.

**Return:**
- `int[]|string[]`: The array of code points: int[] for $use_u_style === false, string[] for $use_u_style === true
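
To make the two return formats concrete, here is a minimal sketch that decodes UTF-8 by hand and formats the result either as integers or as `U+xxxx` strings. `sketch_ord()` and `sketch_codepoints()` are hypothetical helpers for illustration; they assume valid UTF-8 input and only accept a string, not an array of chars.

```php
// Hypothetical helpers, for illustration only.

// Decode a single, valid UTF-8 character into its code point.
function sketch_ord(string $chr): int
{
    $b = \array_values(\unpack('C*', $chr));

    switch (\count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default:
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

function sketch_codepoints(string $str, bool $useUStyle = false): array
{
    $chars  = \preg_split('//u', $str, -1, \PREG_SPLIT_NO_EMPTY) ?: [];
    $points = \array_map('sketch_ord', $chars);

    if (!$useUStyle) {
        return $points;
    }

    // Format each code point as "U+" followed by at least four hex digits.
    return \array_map(static function (int $cp): string {
        return \sprintf('U+%04x', $cp);
    }, $points);
}

print_r(sketch_codepoints('κöñ'));       // 954, 246, 241
print_r(sketch_codepoints('κöñ', true)); // U+03ba, U+00f6, U+00f1
```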

--------

## collapse_whitespace(string $str): string

Trims the string and replaces consecutive whitespace characters with a
single space. This includes tabs and newline characters, as well as
multibyte whitespace such as the thin space and ideographic space.

**Parameters:**
- `string $str`: The input string.

**Return:**
- `string`: A string with trimmed $str and condensed whitespace.
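
One way to get this behaviour with core PHP only is a single regular expression that treats both ASCII whitespace and Unicode separators as whitespace. The sketch below uses the hypothetical helper `sketch_collapse_whitespace()`; the exact set of characters the library considers whitespace may differ.

```php
// Hypothetical helper, for illustration only.
function sketch_collapse_whitespace(string $str): string
{
    // \s covers tabs and newlines, \p{Z} covers Unicode separators such as
    // the thin space (U+2009) and the ideographic space (U+3000).
    $collapsed = \preg_replace('/[\s\p{Z}]+/u', ' ', $str);

    // preg_replace() returns null on invalid UTF-8; fall back to the input then.
    return \trim($collapsed ?? $str);
}

echo sketch_collapse_whitespace("  foo\t\n bar\u{3000}baz\u{2009} "); // "foo bar baz"
```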

--------

## count_chars(string $str, bool $clean_utf8, bool $try_to_use_mb_functions): int[]

Returns count of characters used in a string.

EXAMPLE: UTF8::count_chars('κaκbκc'); // array('κ' => 3, 'a' => 1, 'b' => 1, 'c' => 1)

**Parameters:**
- `T $str`: The input string.
- `bool $clean_utf8 [optional]`: Remove non UTF-8 chars from the string.
- `bool $try_to_use_mb_functions [optional]`: Set to false, if you don't want to use

**Return:**
- `int[]`: An associative array of Character as keys and their count as values.

--------

## css_identifier(string $str, string[] $filter, bool $strip_tags, bool $strtolower): string

Create a valid CSS identifier for e.g. "class"- or "id"-attributes.

EXAMPLE: UTF8::css_identifier('123foo/bar!!!'); // _23foo-bar

Copied from https://github.com/drupal/core/blob/8.8.x/lib/Drupal/Component/Utility/Html.php#L95

**Parameters:**
- `string $str`: INFO: if no identifier is given e.g. " " or "", we will create a unique string automatically
- `array $filter`
- `bool $strip_tags`
- `bool $strtolower`

**Return:**
- `string`

--------

## css_stripe_media_queries(string $str): string

Remove css media-queries.

**Parameters:**
- `string $str`

**Return:**
- `string`

--------

## ctype_loaded(): bool

Checks whether ctype is available on the server.

**Parameters:**
__nothing__

**Return:**
- `bool`: true if available, false otherwise

--------

## decimal_to_chr(int|string $int): string

Converts an int value into a UTF-8 character.

INFO: opposite to UTF8::chr_to_decimal()

EXAMPLE: UTF8::decimal_to_chr(931); // 'Σ'

**Parameters:**
- `int|string $int`

**Return:**
- `string`

--------

## decode_mimeheader(string $str, string $encoding): false|string

Decodes a MIME header field.

**Parameters:**
- `string $str`
- `string $encoding [optional]`: Set the charset for e.g. "mb_" function

**Return:**
- `false|string`: A decoded MIME field on success, or false if an error occurs during the decoding.

--------

## emoji_decode(string $str, bool $use_reversible_string_mappings): string

Decodes a string which was encoded by "UTF8::emoji_encode()".

INFO: opposite to UTF8::emoji_encode()

EXAMPLE:
UTF8::emoji_decode('foo CHARACTER_OGRE', false); // 'foo 👹'
//
UTF8::emoji_decode('foo _-_PORTABLE_UTF8_-_308095726_-_627590803_-_8FTU_ELBATROP_-_', true); // 'foo 👹'

**Parameters:**
- `string $str`: The input string.
- `bool $use_reversible_string_mappings [optional]`: When TRUE, we use a reversible string mapping between "emoji_encode" and "emoji_decode".

**Return:**
- `string`

--------

## emoji_encode(string $str, bool $use_reversible_string_mappings): string

Encode a string with emoji chars into a non-emoji string.

INFO: opposite to UTF8::emoji_decode()

EXAMPLE:
UTF8::emoji_encode('foo 👹', false); // 'foo CHARACTER_OGRE'
//
UTF8::emoji_encode('foo 👹', true); // 'foo _-_PORTABLE_UTF8_-_308095726_-_627590803_-_8FTU_ELBATROP_-_'

**Parameters:**
- `string $str`: The input string
- `bool $use_reversible_string_mappings [optional]`: When TRUE, we use a reversible string mapping between "emoji_encode" and "emoji_decode"

**Return:**
- `string`

--------

## emoji_from_country_code(string $country_code_iso_3166_1): string

Convert any two-letter country code (ISO 3166-1) to the corresponding Emoji.

**Parameters:**
- `string $country_code_iso_3166_1`: e.g. DE

**Return:**
- `string`: Emoji or empty string on error.
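
Flag emoji are built from two "regional indicator" code points (U+1F1E6 for A through U+1F1FF for Z), one per letter of the country code. The sketch below shows that mapping in plain PHP. `sketch_flag_emoji()` is a hypothetical helper for illustration; it only checks that the input is two ASCII letters, not that the code is an assigned ISO 3166-1 code, and it is not taken from the library's source.

```php
// Hypothetical helper, for illustration only.
function sketch_flag_emoji(string $countryCode): string
{
    $countryCode = \strtoupper($countryCode);

    if (\preg_match('/^[A-Z]{2}$/', $countryCode) !== 1) {
        return '';
    }

    $emoji = '';
    foreach (\str_split($countryCode) as $letter) {
        // All regional indicators share the UTF-8 prefix F0 9F 87;
        // the last byte runs from A6 (for "A") to BF (for "Z").
        $emoji .= "\xF0\x9F\x87" . \chr(0xA6 + \ord($letter) - \ord('A'));
    }

    return $emoji;
}

echo sketch_flag_emoji('DE'); // 🇩🇪
```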

--------

## encode(string $to_encoding, string $str, bool $auto_detect_the_from_encoding, string $from_encoding): string

Encode a string with a new charset-encoding.

INFO: This function will also try to fix broken / double encoding,
so you can call this function also on a UTF-8 string and you don't mess up the string.

EXAMPLE:
UTF8::encode('ISO-8859-1', '-ABC-中文空白-'); // '-ABC-????-'
//
UTF8::encode('UTF-8', '-ABC-中文空白-'); // '-ABC-中文空白-'
//
UTF8::encode('HTML', '-ABC-中文空白-'); // '-ABC-&#20013;&#25991;&#31354;&#30333;-'
//
UTF8::encode('BASE64', '-ABC-中文空白-'); // 'LUFCQy3kuK3mlofnqbrnmb0t'

**Parameters:**
- `string $to_encoding`: e.g. 'UTF-16', 'UTF-8', 'ISO-8859-1', etc.
- `string $str`: The input string
- `bool $auto_detect_the_from_encoding [optional]`: Force the new encoding (we try to fix broken / double encoding for UTF-8) otherwise we auto-detect the current string-encoding
- `string $from_encoding [optional]`: e.g. 'UTF-16', 'UTF-8', 'ISO-8859-1', etc. An empty string will trigger the autodetect anyway.

**Return:**
- `string`

--------

## encode_mimeheader(string $str, string $from_charset, string $to_charset, string $transfer_encoding, string $linefeed, int $indent): false|string

**Parameters:**
- `string $str`
- `string $from_charset [optional]`: Set the input charset.
- `string $to_charset [optional]`: Set the output charset.
- `string $transfer_encoding [optional]`: Set the transfer encoding.
- `string $linefeed [optional]`: Set the used linefeed.
- `int<1, max> $indent [optional]`: Set the max length indent.

**Return:**
- `false|string`: An encoded MIME field on success, or false if an error occurs during the encoding.

--------

## extract_text(string $str, string $search, int|null $length, string $replacer_for_skipped_text, string $encoding): string

Create an extract from a sentence, so if the search-string was found, it tries to center in the output.

**Parameters:**
- `string $str`: The input string.
- `string $search`: The searched string.
- `int|null $length [optional]`: Default: null === text->length / 2
- `string $replacer_for_skipped_text [optional]`: Default: …
- `string $encoding [optional]`: Set the charset for e.g. "mb_" function

**Return:**
- `string`

--------

## file_get_contents(string $filename, bool $use_include_path, resource|null $context, int|null $offset, int|null $max_length, int $timeout, bool $convert_to_utf8, string $from_encoding): false|string

Reads entire file into a string.

EXAMPLE: UTF8::file_get_contents('utf16le.txt'); // ...

WARNING: Do not use UTF-8 Option ($convert_to_utf8) for binary files (e.g.: images) !!!

**Parameters:**
- `string $filename`: Name of the file to read.
- `bool $use_include_path [optional]`: Prior to PHP 5, this parameter is called use_include_path and is a bool. As of PHP 5 the FILE_USE_INCLUDE_PATH can be used to trigger include path search.
- `resource|null $context [optional]`: A valid context resource created with stream_context_create. If you don't need to use a custom context, you can skip this parameter by null.
- `int|null $offset [optional]`: The offset where the reading starts.
- `int<0, max>|null $max_length [optional]`: Maximum length of data read. The default is to read until end of file is reached.
- `int $timeout`: The time in seconds for the timeout.
- `bool $convert_to_utf8`: WARNING!!! Maybe you can't use this option for some files, because they used non default utf-8 chars. Binary files like images or pdf will not be converted.
- `string $from_encoding [optional]`: e.g. 'UTF-16', 'UTF-8', 'ISO-8859-1', etc. An empty string will trigger the autodetect anyway.

**Return:**
- `false|string`: The function returns the read data as string or false on failure.

--------

## file_has_bom(string $file_path): bool

Checks if a file starts with BOM (Byte Order Mark) character.

EXAMPLE: UTF8::file_has_bom('utf8_with_bom.txt'); // true

**Parameters:**
- `string $file_path`: Path to a valid file.

**Return:**
- `bool`: true if the file has BOM at the start, false otherwise
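
Checking a file for a BOM only requires its first few bytes. The sketch below reads three bytes and compares them with the UTF-8 BOM. `sketch_file_has_utf8_bom()` is a hypothetical helper for illustration; it only looks for the UTF-8 BOM, whereas the library's own check may also cover other BOMs.

```php
// Hypothetical helper, for illustration only.
function sketch_file_has_utf8_bom(string $filePath): bool
{
    $handle = @\fopen($filePath, 'rb');
    if ($handle === false) {
        return false; // unreadable or missing file
    }

    // Only the first three bytes matter; the rest of the file is never read.
    $head = \fread($handle, 3);
    \fclose($handle);

    return $head === "\xEF\xBB\xBF";
}

$tmp = \tempnam(\sys_get_temp_dir(), 'bom');
\file_put_contents($tmp, "\xEF\xBB\xBF" . 'hello');
var_dump(sketch_file_has_utf8_bom($tmp)); // bool(true)
\unlink($tmp);
```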

--------

## filter(array|object|string $var, int $normalization_form, string $leading_combining): mixed

Normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

EXAMPLE: UTF8::filter(array("\xE9", 'à', 'a')); // array('é', 'à', 'a')

**Parameters:**
- `TFilter $var`
- `int $normalization_form`
- `string $leading_combining`

**Return:**
- `mixed`

--------

## filter_input(int $type, string $variable_name, int $filter, int|int[]|null $options): mixed

"filter_input()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

Gets a specific external variable by name and optionally filters it.

EXAMPLE:
// $_GET['foo'] = 'bar';
UTF8::filter_input(INPUT_GET, 'foo', FILTER_UNSAFE_RAW); // 'bar'

**Parameters:**
- `int $type`: One of INPUT_GET, INPUT_POST, INPUT_COOKIE, INPUT_SERVER, or INPUT_ENV.
- `string $variable_name`: Name of a variable to get.
- `int $filter [optional]`: The ID of the filter to apply. The manual page lists the available filters.
- `int|int[]|null $options [optional]`: Associative array of options or bitwise disjunction of flags. If filter accepts options, flags can be provided in "flags" field of array.

**Return:**
- `mixed`: Value of the requested variable on success, FALSE if the filter fails, or NULL if the variable_name variable is not set. If the flag FILTER_NULL_ON_FAILURE is used, it returns FALSE if the variable is not set and NULL if the filter fails.

--------

## filter_input_array(int $type, array|null $definition, bool $add_empty): array|false|null

"filter_input_array()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

Gets external variables and optionally filters them.

EXAMPLE:
// $_GET['foo'] = 'bar';
UTF8::filter_input_array(INPUT_GET, array('foo' => FILTER_UNSAFE_RAW)); // array('foo' => 'bar')

**Parameters:**
- `int $type`: One of INPUT_GET, INPUT_POST, INPUT_COOKIE, INPUT_SERVER, or INPUT_ENV.
- `array|null $definition [optional]`: An array defining the arguments. A valid key is a string containing a variable name and a valid value is either a filter type, or an array optionally specifying the filter, flags and options. If the value is an array, valid keys are filter which specifies the filter type, flags which specifies any flags that apply to the filter, and options which specifies any options that apply to the filter. See the example above for a better understanding. This parameter can be also an integer holding a filter constant. Then all values in the input array are filtered by this filter.
- `bool $add_empty [optional]`: Add missing keys as NULL to the return value.

**Return:**
- `array|false|null`: An array containing the values of the requested variables on success, or FALSE on failure. An array value will be FALSE if the filter fails, or NULL if the variable is not set. Or if the flag FILTER_NULL_ON_FAILURE is used, it returns FALSE if the variable is not set and NULL if the filter fails.

--------

## filter_var(float|int|string|null $variable, int $filter, int|int[] $options): mixed

"filter_var()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

Filters a variable with a specified filter.

EXAMPLE: UTF8::filter_var('-ABC-中文空白-', FILTER_VALIDATE_URL); // false

**Parameters:**
- `float|int|string|null $variable`: Value to filter.
- `int $filter [optional]`: The ID of the filter to apply. The manual page lists the available filters.
- `int|int[] $options [optional]`: Associative array of options or bitwise disjunction of flags. If filter accepts options, flags can be provided in "flags" field of array. For the "callback" filter, callable type should be passed. The callback must accept one argument, the value to be filtered, and return the value after filtering/sanitizing it.

      // for filters that accept options, use this format
      $options = array(
          'options' => array(
              'default' => 3, // value to return if the filter fails
              // other options here
              'min_range' => 0
          ),
          'flags' => FILTER_FLAG_ALLOW_OCTAL,
      );
      $var = filter_var('0755', FILTER_VALIDATE_INT, $options);
      // for filter that only accept flags, you can pass them directly
      $var = filter_var('oops', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
      // for filter that only accept flags, you can also pass as an array
      $var = filter_var('oops', FILTER_VALIDATE_BOOLEAN,
          array('flags' => FILTER_NULL_ON_FAILURE));
      // callback validate filter
      function foo($value)
      {
          // Expected format: Surname, GivenNames
          if (strpos($value, ", ") === false) return false;
          list($surname, $givennames) = explode(", ", $value, 2);
          $empty = (empty($surname) || empty($givennames));
          $notstrings = (!is_string($surname) || !is_string($givennames));
          if ($empty || $notstrings) {
              return false;
          } else {
              return $value;
          }
      }
      $var = filter_var('Doe, Jane Sue', FILTER_CALLBACK, array('options' => 'foo'));

**Return:**
- `mixed`: The filtered data, or FALSE if the filter fails.

--------

## filter_var_array(array $data, array|int $definition, bool $add_empty): array|false|null

"filter_var_array()"-wrapper that normalizes to UTF-8 NFC, converting from WINDOWS-1252 when needed.

Gets multiple variables and optionally filters them.

EXAMPLE:
$filters = [
'name' => ['filter' => FILTER_CALLBACK, 'options' => [UTF8::class, 'ucwords']],
'age' => ['filter' => FILTER_VALIDATE_INT, 'options' => ['min_range' => 1, 'max_range' => 120]],
'email' => FILTER_VALIDATE_EMAIL,
];

$data = [
'name' => 'κόσμε',
'age' => '18',
'email' => 'foo@bar.de'
];

UTF8::filter_var_array($data, $filters, true); // ['name' => 'Κόσμε', 'age' => 18, 'email' => 'foo@bar.de']

**Parameters:**
- `array $data


An array with string keys containing the data to filter.

`
- `array|int $definition [optional]


An array defining the arguments. A valid key is a string
containing a variable name and a valid value is either a
filter type, or an
array optionally specifying the filter, flags and options.
If the value is an array, valid keys are filter
which specifies the filter type,
flags which specifies any flags that apply to the
filter, and options which specifies any options that
apply to the filter. See the example below for a better understanding.



This parameter can be also an integer holding a filter constant. Then all values
in the input array are filtered by this filter.

`
- `bool $add_empty [optional]


Add missing keys as NULL to the return value.

`

**Return:**
- `array|false|null


An array containing the values of the requested variables on success, or FALSE on failure.
An array value will be FALSE if the filter fails, or NULL if the variable is not
set.

`

--------

## finfo_loaded(): bool

Checks whether finfo is available on the server.

**Parameters:**
__nothing__

**Return:**
- `bool

true if available, false otherwise

`

--------

## first_char(string $str, int $n, string $encoding): string

Returns the first $n characters of the string.

**Parameters:**
- `T $str

The input string.

`
- `int<1, max> $n

Number of characters to retrieve from the start.

`
- `string $encoding [optional]

Set the charset for e.g. "mb_" function

`

**Return:**
- `string`

--------

## fits_inside(string $str, int $box_size): bool

Check if the number of Unicode characters isn't greater than the specified integer.

EXAMPLE: UTF8::fits_inside('κόσμε', 6); // true

**Parameters:**
- `string $str the original string to be checked`
- `int $box_size the size in number of chars to be checked against string`

**Return:**
- `bool

TRUE if the length of the string is less than or equal to $box_size, FALSE otherwise.

`

--------

## fix_simple_utf8(string $str): string

Try to fix simple broken UTF-8 strings.

INFO: Take a look at "UTF8::fix_utf8()" if you need a more advanced fix for broken UTF-8 strings.

EXAMPLE: UTF8::fix_simple_utf8('DÃ¼sseldorf'); // 'Düsseldorf'

If you received a UTF-8 string that was converted from Windows-1252 as if it were ISO-8859-1
(ignoring Windows-1252 chars from 80 to 9F), use this function to fix it.
See: http://en.wikipedia.org/wiki/Windows-1252

**Parameters:**
- `string $str

The input string

`

**Return:**
- `string`
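
The kind of damage described above can be reproduced with plain PHP. The sketch below is not the library's implementation; it assumes only that the mbstring extension is available. It shows how UTF-8 bytes misread as Windows-1252 produce the broken form, and how reversing that single step restores the text.

```php
<?php
// Illustration only: built-in mbstring calls, not UTF8::fix_simple_utf8() itself.
$original = "D\u{00FC}sseldorf"; // 'Düsseldorf' encoded as UTF-8

// Misread the UTF-8 bytes as Windows-1252 and re-encode the result as UTF-8.
$broken = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

// Undo the mistake: mapping back to Windows-1252 yields the original UTF-8 bytes.
$repaired = mb_convert_encoding($broken, 'Windows-1252', 'UTF-8');

var_dump($broken === "D\u{00C3}\u{00BC}sseldorf"); // the 'DÃ¼sseldorf' form
var_dump($repaired === $original);
```

A real repair routine also has to cope with strings that are only partly broken, which this round trip does not attempt.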

--------

## fix_utf8(string|string[] $str): string|string[]

Fix a double (or multiple) encoded UTF8 string.

EXAMPLE: UTF8::fix_utf8('Fédération'); // 'Fédération'

**Parameters:**
- `TFixUtf8 $str you can use a string or an array of strings`

**Return:**
- `string|string[]

Will return the fixed input-"array" or
the fixed input-"string".

`

--------

## getCharDirection(string $char): string

Get the text direction of a specific character.

EXAMPLE: UTF8::getCharDirection('ا'); // 'RTL'

**Parameters:**
- `string $char`

**Return:**
- `string

'RTL' or 'LTR'.

`

--------

## getSupportInfo(string|null $key): mixed

Check for php-support.

**Parameters:**
- `string|null $key`

**Return:**
- `mixed Return the full support-"array", if $key === null

return bool-value, if $key is used and available

otherwise return null`

--------

## getUrlParamFromArray(string $param, array $data): mixed

Get data from an array via an array-like string.

EXAMPLE: $array['foo'][123] = 'lall'; UTF8::getUrlParamFromArray('foo[123]', $array); // 'lall'

**Parameters:**
- `string $param`
- `array $data`

**Return:**
- `mixed`
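
A lookup of this kind can be sketched in a few lines of plain PHP. The helper name `get_by_bracket_path()` is hypothetical and the code is an assumption about the general technique, not the library's own implementation: it splits the bracket notation into keys and walks the nested array.

```php
<?php
// Hypothetical helper, for illustration only.
function get_by_bracket_path(string $param, array $data)
{
    // 'foo[123][bar]' becomes ['foo', '123', 'bar'].
    $keys = preg_split('/\]?\[|\]/', $param, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $current = $data;
    foreach ($keys as $key) {
        if (!is_array($current) || !array_key_exists($key, $current)) {
            return null; // path does not exist
        }
        $current = $current[$key];
    }

    return $current;
}

$array = [];
$array['foo'][123] = 'lall';
var_dump(get_by_bracket_path('foo[123]', $array));
```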

--------

## get_file_type(string $str, array $fallback): array

Warning: this method only works for some file-types (png, jpg)
if you need more supported types, please use e.g. "finfo"

**Parameters:**
- `string $str`
- `array{ext: (null|string), mime: (null|string), type: (null|string)} $fallback`

**Return:**
- `array{ext: (null|string), mime: (null|string), type: (null|string)}`

--------

## get_random_string(int $length, string $possible_chars, string $encoding): string

**Parameters:**
- `int<1, max> $length

Length of the random string.

`
- `T $possible_chars [optional]

Characters string for the random selection.

`
- `string $encoding [optional]

Set the charset for e.g. "mb_" function

`

**Return:**
- `string`

--------

## get_unique_string(int|string $extra_entropy, bool $use_md5): non-empty-string

**Parameters:**
- `int|string $extra_entropy [optional]

Extra entropy via a string or int value.

`
- `bool $use_md5 [optional]

Return the unique identifier as md5-hash? Default: true

`

**Return:**
- `non-empty-string`

--------

## has_lowercase(string $str): bool

Returns true if the string contains a lower case char, false otherwise.

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not the string contains a lower case character.

`

--------

## has_uppercase(string $str): bool

Returns true if the string contains an upper case char, false otherwise.

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not the string contains an upper case character.

`

--------

## has_whitespace(string $str): bool

Returns true if the string contains whitespace, false otherwise.

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not the string contains whitespace.

`

--------

## hex_to_chr(string $hexdec): string

Converts a hexadecimal value into a UTF-8 character.

INFO: opposite to UTF8::chr_to_hex()

EXAMPLE: UTF8::hex_to_chr('U+00a7'); // '§'

**Parameters:**
- `string $hexdec

The hexadecimal value.

`

**Return:**
- `string

One single UTF-8 character.

`

--------

## hex_to_int(string $hexdec): false|int

Converts hexadecimal U+xxxx code point representation to integer.

INFO: opposite to UTF8::int_to_hex()

EXAMPLE: UTF8::hex_to_int('U+00f1'); // 241

**Parameters:**
- `string $hexdec

The hexadecimal code point representation.

`

**Return:**
- `false|int

The code point, or false on failure.

`
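
The relationship between the U+xxxx notation and the integer code point can be sketched with PHP built-ins. The function names `codepoint_from_hex()` and `codepoint_to_hex()` are hypothetical, and the accepted input format (a `U+` prefix followed by 4 to 6 hex digits) is an assumption made for this example, not a statement about what the library accepts.

```php
<?php
// Hypothetical helpers, for illustration only.
function codepoint_from_hex(string $hexdec)
{
    // Assumed format: 'U+' followed by 4 to 6 hexadecimal digits.
    if (preg_match('/^U\+([0-9a-fA-F]{4,6})$/', $hexdec, $match) !== 1) {
        return false;
    }

    return (int) hexdec($match[1]);
}

function codepoint_to_hex(int $codepoint, string $prefix = 'U+'): string
{
    // Zero-pad to at least four lowercase hex digits.
    return $prefix . sprintf('%04x', $codepoint);
}

var_dump(codepoint_from_hex('U+00f1'));
var_dump(codepoint_to_hex(241));
```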

--------

## html_encode(string $str, bool $keep_ascii_chars, string $encoding): string

Converts a UTF-8 string to a series of HTML numbered entities.

INFO: opposite to UTF8::html_decode()

EXAMPLE: UTF8::html_encode('中文空白'); // '&#20013;&#25991;&#31354;&#30333;'

**Parameters:**
- `T $str

The Unicode string to be encoded as numbered entities.

`
- `bool $keep_ascii_chars [optional]

Keep ASCII chars.

`
- `string $encoding [optional]

Set the charset for e.g. "mb_" function

`

**Return:**
- `string

HTML numbered entities.

`
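
What "numbered entities" means can be shown with a minimal sketch built on PHP's own functions. The helper `to_numeric_entities()` is hypothetical and always encodes every character; it does not model the `$keep_ascii_chars` option or the library's internals.

```php
<?php
// Hypothetical helper, for illustration only.
function to_numeric_entities(string $str): string
{
    $out = '';
    // Split into UTF-8 characters, then emit each code point as a decimal entity.
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [] as $char) {
        $out .= '&#' . mb_ord($char, 'UTF-8') . ';';
    }

    return $out;
}

$encoded = to_numeric_entities("\u{4E2D}\u{6587}\u{7A7A}\u{767D}"); // '中文空白'
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

var_dump($encoded);
var_dump($decoded);
```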

--------

## html_entity_decode(string $str, int|null $flags, string $encoding): string

UTF-8 version of html_entity_decode()

The reason we are not using html_entity_decode() by itself is because
while it is not technically correct to leave out the semicolon
at the end of an entity most browsers will still interpret the entity
correctly. html_entity_decode() does not convert entities without
semicolons, so we are left with our own little solution here. Bummer.

Convert all HTML entities to their applicable characters.

INFO: opposite to UTF8::html_encode()

EXAMPLE: UTF8::html_entity_decode('&#20013;&#25991;&#31354;&#30333;'); // '中文空白'

**Parameters:**
- `T $str


The input string.

`
- `int|null $flags [optional]


A bitmask of one or more of the following flags, which specify how to handle quotes
and which document type to use. The default is ENT_COMPAT | ENT_HTML401.


Available flags constants:

| Constant Name | Description |
| --- | --- |
| ENT_COMPAT | Will convert double-quotes and leave single-quotes alone. |
| ENT_QUOTES | Will convert both double and single quotes. |
| ENT_NOQUOTES | Will leave both double and single quotes unconverted. |
| ENT_HTML401 | Handle code as HTML 4.01. |
| ENT_XML1 | Handle code as XML 1. |
| ENT_XHTML | Handle code as XHTML. |
| ENT_HTML5 | Handle code as HTML 5. |

`
- `string $encoding [optional]

Set the charset for e.g. "mb_" function

`

**Return:**
- `string

The decoded string.

`

--------

## html_escape(string $str, string $encoding): string

Create an HTML-escaped version of the string via "UTF8::htmlspecialchars()".

**Parameters:**
- `string $str`
- `string $encoding [optional]

Set the charset for e.g. "mb_" function

`

**Return:**
- `string`

--------

## html_stripe_empty_tags(string $str): string

Remove empty HTML tags.

**Parameters:**
- `string $str`

**Return:**
- `string`

--------

## htmlentities(string $str, int $flags, string $encoding, bool $double_encode): string

Convert all applicable characters to HTML entities: UTF-8 version of htmlentities().

EXAMPLE: UTF8::htmlentities('<白-öäü>'); // '&lt;&#30333;-&ouml;&auml;&uuml;&gt;'

**Parameters:**
- `string $str


The input string.

`
- `int $flags [optional]


A bitmask of one or more of the following flags, which specify how to handle
quotes, invalid code unit sequences and the used document type. The default is
ENT_COMPAT | ENT_HTML401.


Available flags constants:

| Constant Name | Description |
| --- | --- |
| ENT_COMPAT | Will convert double-quotes and leave single-quotes alone. |
| ENT_QUOTES | Will convert both double and single quotes. |
| ENT_NOQUOTES | Will leave both double and single quotes unconverted. |
| ENT_IGNORE | Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it may have security implications. |
| ENT_SUBSTITUTE | Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or `&#xFFFD;` (otherwise) instead of returning an empty string. |
| ENT_DISALLOWED | Replace invalid code points for the given document type with a Unicode Replacement Character U+FFFD (UTF-8) or `&#xFFFD;` (otherwise) instead of leaving them as is. This may be useful, for instance, to ensure the well-formedness of XML documents with embedded external content. |
| ENT_HTML401 | Handle code as HTML 4.01. |
| ENT_XML1 | Handle code as XML 1. |
| ENT_XHTML | Handle code as XHTML. |
| ENT_HTML5 | Handle code as HTML 5. |

`
- `string $encoding [optional]


Like htmlspecialchars,
htmlentities takes an optional third argument
encoding which defines encoding used in
conversion.
Although this argument is technically optional, you are highly
encouraged to specify the correct value for your code.

`
- `bool $double_encode [optional]


When double_encode is turned off PHP will not
encode existing html entities. The default is to convert everything.

`

**Return:**
- `string


The encoded string.



If the input string contains an invalid code unit
sequence within the given encoding an empty string
will be returned, unless either the ENT_IGNORE or
ENT_SUBSTITUTE flags are set.

`

--------

## htmlspecialchars(string $str, int $flags, string $encoding, bool $double_encode): string

Convert only special characters to HTML entities: UTF-8 version of htmlspecialchars()

INFO: Take a look at "UTF8::htmlentities()"

EXAMPLE: UTF8::htmlspecialchars('<白-öäü>'); // '&lt;白-öäü&gt;'

**Parameters:**
- `T $str


The string being converted.

`
- `int $flags [optional]


A bitmask of one or more of the following flags, which specify how to handle
quotes, invalid code unit sequences and the used document type. The default is
ENT_COMPAT | ENT_HTML401.


Available flags constants:

| Constant Name | Description |
| --- | --- |
| ENT_COMPAT | Will convert double-quotes and leave single-quotes alone. |
| ENT_QUOTES | Will convert both double and single quotes. |
| ENT_NOQUOTES | Will leave both double and single quotes unconverted. |
| ENT_IGNORE | Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it may have security implications. |
| ENT_SUBSTITUTE | Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or `&#xFFFD;` (otherwise) instead of returning an empty string. |
| ENT_DISALLOWED | Replace invalid code points for the given document type with a Unicode Replacement Character U+FFFD (UTF-8) or `&#xFFFD;` (otherwise) instead of leaving them as is. This may be useful, for instance, to ensure the well-formedness of XML documents with embedded external content. |
| ENT_HTML401 | Handle code as HTML 4.01. |
| ENT_XML1 | Handle code as XML 1. |
| ENT_XHTML | Handle code as XHTML. |
| ENT_HTML5 | Handle code as HTML 5. |

`
- `string $encoding [optional]


Defines encoding used in conversion.



For the purposes of this function, the encodings
ISO-8859-1, ISO-8859-15,
UTF-8, cp866,
cp1251, cp1252, and
KOI8-R are effectively equivalent, provided the
string itself is valid for the encoding, as
the characters affected by htmlspecialchars occupy
the same positions in all of these encodings.

`
- `bool $double_encode [optional]


When double_encode is turned off, PHP will not
encode existing HTML entities. The default is to convert everything.

`

**Return:**
- `string

The converted string.



If the input string contains an invalid code unit
sequence within the given encoding an empty string
will be returned, unless either the ENT_IGNORE or
ENT_SUBSTITUTE flags are set.

`

--------

## iconv_loaded(): bool

Checks whether iconv is available on the server.

**Parameters:**
__nothing__

**Return:**
- `bool

true if available, false otherwise

`

--------

## int_to_hex(int $int, string $prefix): string

Converts Integer to hexadecimal U+xxxx code point representation.

INFO: opposite to UTF8::hex_to_int()

EXAMPLE: UTF8::int_to_hex(241); // 'U+00f1'

**Parameters:**
- `int $int

The integer to be converted to hexadecimal code point.

`
- `string $prefix [optional]`

**Return:**
- `string the code point, or empty string on failure`

--------

## intlChar_loaded(): bool

Checks whether intl-char is available on the server.

**Parameters:**
__nothing__

**Return:**
- `bool

true if available, false otherwise

`

--------

## intl_loaded(): bool

Checks whether intl is available on the server.

**Parameters:**
__nothing__

**Return:**
- `bool

true if available, false otherwise

`

--------

## is_alpha(string $str): bool

Returns true if the string contains only alphabetic chars, false otherwise.

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not $str contains only alphabetic chars.

`

--------

## is_alphanumeric(string $str): bool

Returns true if the string contains only alphabetic and numeric chars, false otherwise.

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not $str contains only alphanumeric chars.

`

--------

## is_ascii(string $str): bool

Checks if a string is 7 bit ASCII.

EXAMPLE: UTF8::is_ascii('白'); // false

**Parameters:**
- `string $str

The string to check.

`

**Return:**
- `bool


true if it is ASCII

false otherwise

`

--------

## is_base64(string|null $str, bool $empty_string_is_valid): bool

Returns true if the string is base64 encoded, false otherwise.

EXAMPLE: UTF8::is_base64('4KSu4KWL4KSo4KS/4KSa'); // true

**Parameters:**
- `string|null $str

The input string.

`
- `bool $empty_string_is_valid [optional]

Is an empty string valid base64 or not?

`

**Return:**
- `bool

Whether or not $str is base64 encoded.

`

--------

## is_binary(int|string $input, bool $strict): bool

Check if the input is binary (this check is heuristic and looks like a hack).

EXAMPLE: UTF8::is_binary(01); // true

**Parameters:**
- `int|string $input`
- `bool $strict`

**Return:**
- `bool`

--------

## is_binary_file(string $file): bool

Check if the file is binary.

EXAMPLE: UTF8::is_binary_file('./utf32.txt'); // true

**Parameters:**
- `string $file`

**Return:**
- `bool`

--------

## is_blank(string $str): bool

Returns true if the string contains only whitespace chars, false otherwise.

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not $str contains only whitespace characters.

`

--------

## is_bom(string $str): bool

Checks if the given string is equal to any "Byte Order Mark".

WARNING: Use "UTF8::string_has_bom()" if you want to check whether a string starts with a BOM.

EXAMPLE: UTF8::is_bom("\xef\xbb\xbf"); // true

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

true if $str is a Byte Order Mark, false otherwise.

`

--------

## is_empty(array|float|int|string $str): bool

Determine whether the string is considered to be empty.

A variable is considered empty if it does not exist or if its value equals FALSE.
empty() does not generate a warning if the variable does not exist.

**Parameters:**
- `array|float|int|string $str`

**Return:**
- `bool

Whether or not $str is empty().

`

--------

## is_hexadecimal(string $str): bool

Returns true if the string contains only hexadecimal chars, false otherwise.

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not $str contains only hexadecimal chars.

`

--------

## is_html(string $str): bool

Check if the string contains any HTML tags.

EXAMPLE: UTF8::is_html('<b>lall</b>'); // true

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not $str contains html elements.

`

--------

## is_json(string $str, bool $only_array_or_object_results_are_valid): bool

Try to check if "$str" is a JSON-string.

EXAMPLE: UTF8::is_json('{"array":[1,"¥","ä"]}'); // true

**Parameters:**
- `string $str

The input string.

`
- `bool $only_array_or_object_results_are_valid [optional]

Only array and objects are valid json
results.

`

**Return:**
- `bool

Whether or not the $str is in JSON format.

`

--------

## is_lowercase(string $str): bool

Returns true if the string contains only lower case chars, false otherwise.

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not $str contains only lowercase chars.

`

--------

## is_printable(string $str, bool $ignore_control_characters): bool

Returns true if the string contains only printable (non-invisible) chars, false otherwise.

**Parameters:**
- `string $str

The input string.

`
- `bool $ignore_control_characters [optional]

Ignore control characters like [LRM] or [LSEP].

`

**Return:**
- `bool

Whether or not $str contains only printable (non-invisible) chars.

`

--------

## is_punctuation(string $str): bool

Returns true if the string contains only punctuation chars, false otherwise.

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not $str contains only punctuation chars.

`

--------

## is_serialized(string $str): bool

Returns true if the string is serialized, false otherwise.

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not $str is serialized.

`

--------

## is_uppercase(string $str): bool

Returns true if the string contains only upper case chars, false
otherwise.

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `bool

Whether or not $str contains only upper case characters.

`

--------

## is_url(string $url, bool $disallow_localhost): bool

Check if $url is a correct URL.

**Parameters:**
- `string $url`
- `bool $disallow_localhost`

**Return:**
- `bool`

--------

## is_utf8(int|string|string[]|null $str, bool $strict): bool

Checks whether the passed input contains only byte sequences that appear valid UTF-8.

EXAMPLE:
UTF8::is_utf8(['Iñtërnâtiônàlizætiøn', 'foo']); // true
//
UTF8::is_utf8(["Iñtërnâtiônàlizætiøn\xA0\xA1", 'bar']); // false

**Parameters:**
- `int|string|string[]|null $str

The input to be checked.

`
- `bool $strict

Check also if the string is not UTF-16 or UTF-32.

`

**Return:**
- `bool`
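
For comparison, plain PHP offers two common validity checks for a single string. This sketch uses only built-ins (`mb_check_encoding()` and a PCRE match with the `u` modifier) and is not a description of how `UTF8::is_utf8()` works internally.

```php
<?php
// Illustration only: built-in validity checks, not the library's implementation.
$valid   = "I\u{00F1}t\u{00EB}rn\u{00E2}ti\u{00F4}n"; // 'Iñtërnâtiôn'
$invalid = $valid . "\xA0\xA1";                        // stray continuation bytes

// mbstring: returns true only for well-formed UTF-8.
$mbValid   = mb_check_encoding($valid, 'UTF-8');
$mbInvalid = mb_check_encoding($invalid, 'UTF-8');

// PCRE: an empty pattern with the 'u' modifier fails (returns false) on malformed UTF-8.
$pcreInvalid = preg_match('//u', $invalid);

var_dump($mbValid, $mbInvalid, $pcreInvalid);
```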

--------

## is_utf16(string $str, bool $check_if_string_is_binary): false|int

Check if the string is UTF-16.

EXAMPLE:
UTF8::is_utf16(file_get_contents('utf-16-le.txt')); // 1
//
UTF8::is_utf16(file_get_contents('utf-16-be.txt')); // 2
//
UTF8::is_utf16(file_get_contents('utf-8.txt')); // false

**Parameters:**
- `string $str

The input string.

`
- `bool $check_if_string_is_binary`

**Return:**
- `false|int false if it is not UTF-16,

1 for UTF-16LE,

2 for UTF-16BE`

--------

## is_utf32(string $str, bool $check_if_string_is_binary): false|int

Check if the string is UTF-32.

EXAMPLE:
UTF8::is_utf32(file_get_contents('utf-32-le.txt')); // 1
//
UTF8::is_utf32(file_get_contents('utf-32-be.txt')); // 2
//
UTF8::is_utf32(file_get_contents('utf-8.txt')); // false

**Parameters:**
- `string $str

The input string.

`
- `bool $check_if_string_is_binary`

**Return:**
- `false|int false if it is not UTF-32,

1 for UTF-32LE,

2 for UTF-32BE`

--------

## json_decode(string $json, bool $assoc, int $depth, int $options): mixed

(PHP 5 >= 5.2.0, PECL json >= 1.2.0)

Decodes a JSON string

EXAMPLE: UTF8::json_decode('[1,"\u00a5","\u00e4"]'); // array(1, '¥', 'ä')

**Parameters:**
- `string $json


The json string being decoded.



This function only works with UTF-8 encoded strings.


PHP implements a superset of
JSON - it will also encode and decode scalar types and NULL. The JSON standard
only supports these values when they are nested inside an array or an object.

`
- `bool $assoc [optional]


When TRUE, returned objects will be converted into
associative arrays.

`
- `int $depth [optional]


User specified recursion depth.

`
- `int $options [optional]


Bitmask of JSON decode options. Currently only
JSON_BIGINT_AS_STRING
is supported (default is to cast large integers as floats)

`

**Return:**
- `mixed

The value encoded in json in appropriate PHP type. Values true, false and
null (case-insensitive) are returned as TRUE, FALSE and NULL respectively.
NULL is returned if the json cannot be decoded or if the encoded data
is deeper than the recursion limit.

`

--------

## json_encode(mixed $value, int $options, int $depth): false|string

(PHP 5 >= 5.2.0, PECL json >= 1.2.0)

Returns the JSON representation of a value.

EXAMPLE: UTF8::json_encode(array(1, '¥', 'ä')); // '[1,"\u00a5","\u00e4"]'

**Parameters:**
- `mixed $value


The value being encoded. Can be any type except
a resource.



All string data must be UTF-8 encoded.


PHP implements a superset of
JSON - it will also encode and decode scalar types and NULL. The JSON standard
only supports these values when they are nested inside an array or an object.

`
- `int $options [optional]


Bitmask consisting of JSON_HEX_QUOT,
JSON_HEX_TAG,
JSON_HEX_AMP,
JSON_HEX_APOS,
JSON_NUMERIC_CHECK,
JSON_PRETTY_PRINT,
JSON_UNESCAPED_SLASHES,
JSON_FORCE_OBJECT,
JSON_UNESCAPED_UNICODE. The behaviour of these
constants is described on
the JSON constants page.

`
- `int $depth [optional]


Set the maximum depth. Must be greater than zero.

`

**Return:**
- `false|string

A JSON encoded string on success or

FALSE on failure.

`
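
The round trip between the two JSON functions can be illustrated with PHP's built-in `json_encode()` and `json_decode()`. That the wrappers accept the same arguments is an assumption based on the signatures listed here; the sketch itself calls only the native functions.

```php
<?php
// Illustration only: native JSON functions.
$value = [1, "\u{00A5}", "\u{00E4}"]; // [1, '¥', 'ä']

// Default behaviour: non-ASCII characters are escaped as \uXXXX.
$escaped = json_encode($value);

// JSON_UNESCAPED_UNICODE keeps the characters as literal UTF-8.
$readable = json_encode($value, JSON_UNESCAPED_UNICODE);

// Decoding the escaped form restores the original values.
$decoded = json_decode($escaped, true);

var_dump($escaped, $readable, $decoded === $value);
```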

--------

## json_loaded(): bool

Checks whether JSON is available on the server.

**Parameters:**
__nothing__

**Return:**
- `bool

true if available, false otherwise

`

--------

## lcfirst(string $str, string $encoding, bool $clean_utf8, string|null $lang, bool $try_to_keep_the_string_length): string

Makes string's first char lowercase.

EXAMPLE: UTF8::lcfirst('ÑTËRNÂTIÔNÀLIZÆTIØN'); // ñTËRNÂTIÔNÀLIZÆTIØN

**Parameters:**
- `string $str

The input string

`
- `string $encoding [optional]

Set the charset for e.g. "mb_" function

`
- `bool $clean_utf8 [optional]

Remove non UTF-8 chars from the string.

`
- `string|null $lang [optional]

Set the language for special cases: az, el, lt,
tr

`
- `bool $try_to_keep_the_string_length [optional]

true === try to keep the string length: e.g. ẞ
-> ß

`

**Return:**
- `string

The resulting string.

`

--------

## lcwords(string $str, string[] $exceptions, string $char_list, string $encoding, bool $clean_utf8, string|null $lang, bool $try_to_keep_the_string_length): string

Lowercases the first character of every word in the string.

**Parameters:**
- `string $str

The input string.

`
- `string[] $exceptions [optional]

Exclusion for some words.

`
- `string $char_list [optional]

Additional chars that belong to words and do
not start a new word.

`
- `string $encoding [optional]

Set the charset.

`
- `bool $clean_utf8 [optional]

Remove non UTF-8 chars from the string.

`
- `string|null $lang [optional]

Set the language for special cases: az, el, lt,
tr

`
- `bool $try_to_keep_the_string_length [optional]

true === try to keep the string length: e.g. ẞ
-> ß

`

**Return:**
- `string`

--------

## levenshtein(string $str1, string $str2, int $insertionCost, int $replacementCost, int $deletionCost): int

Calculate Levenshtein distance between two strings.

For better performance, in a real application with a single input string
matched against many strings from a database, you will probably want to
pre-encode the input only once and use \levenshtein().

Source: https://github.com/KEINOS/mb_levenshtein

**Parameters:**
- `string $str1

One of the strings being evaluated for Levenshtein distance.

`
- `string $str2

One of the strings being evaluated for Levenshtein distance.

`
- `int $insertionCost [optional]

Defines the cost of insertion.

`
- `int $replacementCost [optional]

Defines the cost of replacement.

`
- `int $deletionCost [optional]

Defines the cost of deletion.

`

**Return:**
- `int`
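
One way to read the pre-encoding advice above: map every multi-byte character to a unique single byte once, then let the native byte-based `\levenshtein()` do the comparisons. The helper `to_single_bytes()` is hypothetical and is a sketch of that general idea under the assumption that at most 128 distinct non-ASCII characters occur; it is not the library's code.

```php
<?php
// Hypothetical helper, for illustration only.
function to_single_bytes(string $str, array &$map): string
{
    $out = '';
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [] as $char) {
        if (strlen($char) === 1) {
            $out .= $char; // ASCII stays as it is
            continue;
        }
        if (!isset($map[$char])) {
            if (count($map) >= 128) {
                throw new RuntimeException('More than 128 distinct multi-byte characters.');
            }
            // Assign the next free byte in the range 0x80-0xFF.
            $map[$char] = chr(128 + count($map));
        }
        $out .= $map[$char];
    }

    return $out;
}

$map = [];                                  // shared so equal characters get equal bytes
$needle = to_single_bytes('κόσμε', $map);   // encoded once

$distances = [];
foreach (['κόσμος', 'κόσμε'] as $candidate) {
    $distances[$candidate] = levenshtein($needle, to_single_bytes($candidate, $map));
}

var_dump($distances);
```

The map has to be shared between all strings being compared; otherwise the same character could end up as different bytes.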

--------

## ltrim(string $str, string|null $chars): string

Strip whitespace or other characters from the beginning of a UTF-8 string.

EXAMPLE: UTF8::ltrim(' 中文空白  '); // '中文空白  '

**Parameters:**
- `string $str

The string to be trimmed

`
- `string|null $chars

Optional characters to be stripped

`

**Return:**
- `string the string with unwanted characters stripped from the left`

--------

## max(string|string[] $arg): string|null

Returns the UTF-8 character with the maximum code point in the given data.

EXAMPLE: UTF8::max('abc-äöü-中文空白'); // '空'

**Parameters:**
- `string|string[] $arg

A UTF-8 encoded string or an array of such strings.

`

**Return:**
- `string|null the character with the highest code point, or null on failure or empty input`
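
Picking a character by code point can be sketched with `mb_ord()` and `mb_chr()`. The helper `codepoints_of()` is hypothetical; the sketch handles a single string only and is an assumption about the general approach, not the library's implementation.

```php
<?php
// Hypothetical helper, for illustration only.
function codepoints_of(string $str): array
{
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    return array_map(
        static function (string $char): int {
            return mb_ord($char, 'UTF-8');
        },
        $chars
    );
}

$codepoints = codepoints_of('abc-äöü-中文空白');

// Highest and lowest code point, converted back to characters.
$max = $codepoints === [] ? null : mb_chr(max($codepoints), 'UTF-8'); // U+7A7A
$min = $codepoints === [] ? null : mb_chr(min($codepoints), 'UTF-8'); // U+002D

var_dump($max, $min);
```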

--------

## max_chr_width(string $str): int

Calculates and returns the maximum number of bytes taken by any
UTF-8 encoded character in the given string.

EXAMPLE: UTF8::max_chr_width('Intërnâtiônàlizætiøn'); // 2

**Parameters:**
- `string $str

The original Unicode string.

`

**Return:**
- `int

Max byte lengths of the given chars.

`

--------

## mbstring_loaded(): bool

Checks whether mbstring is available on the server.

**Parameters:**
__nothing__

**Return:**
- `bool

true if available, false otherwise

`

--------

## min(string|string[] $arg): string|null

Returns the UTF-8 character with the minimum code point in the given data.

EXAMPLE: UTF8::min('abc-äöü-中文空白'); // '-'

**Parameters:**
- `string|string[] $arg A UTF-8 encoded string or an array of such strings.`

**Return:**
- `string|null

The character with the lowest code point, or null on failure or empty input.

`

--------

## normalize_encoding(mixed $encoding, mixed $fallback): mixed|string

Normalize the encoding-"name" input.

EXAMPLE: UTF8::normalize_encoding('UTF8'); // 'UTF-8'

**Parameters:**
- `mixed $encoding

e.g.: ISO, UTF8, WINDOWS-1251 etc.

`
- `string|TNormalizeEncodingFallback $fallback

e.g.: UTF-8

`

**Return:**
- `mixed|string

e.g.: ISO-8859-1, UTF-8, WINDOWS-1251 etc.
Will return an empty string as fallback (by default)

`

--------

## normalize_line_ending(string $str, string|string[] $replacer): string

Standardize line ending to unix-like.

**Parameters:**
- `string $str

The input string.

`
- `string|string[] $replacer

The replacer char e.g. "\n" (Linux) or "\r\n" (Windows). You can also use \PHP_EOL
here.

`

**Return:**
- `string

A string with normalized line ending.

`
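
A single-pass replacement is one way to normalize line endings without touching the same break twice. The helper `normalize_eol()` below is hypothetical and only illustrates the technique with `preg_replace()`; it supports a string replacer only and says nothing about how the library does it.

```php
<?php
// Hypothetical helper, for illustration only.
function normalize_eol(string $str, string $replacer = "\n"): string
{
    // Match "\r\n" first so a Windows line break is replaced as one unit.
    return (string) preg_replace('/\r\n|\r|\n/', $replacer, $str);
}

$mixed = "a\r\nb\rc\nd";

var_dump(normalize_eol($mixed));          // Unix style
var_dump(normalize_eol($mixed, "\r\n"));  // Windows style
```

A chained `str_replace()` over `"\r\n"`, `"\r"` and `"\n"` would re-process its own output when the replacer is `"\r\n"`, which is why the sketch uses one regular expression instead.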

--------

## normalize_msword(string $str): string

Normalize some MS Word special characters.

EXAMPLE: UTF8::normalize_msword('„Abcdef…”'); // '"Abcdef..."'

**Parameters:**
- `string $str

The string to be normalized.

`

**Return:**
- `string

A string with normalized characters for commonly used chars in Word documents.

`

--------

## normalize_whitespace(string $str, bool $keep_non_breaking_space, bool $keep_bidi_unicode_controls, bool $normalize_control_characters): string

Normalize the whitespace.

EXAMPLE: UTF8::normalize_whitespace("abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC", true); // "abc-\xc2\xa0-öäü- -"

**Parameters:**
- `string $str

The string to be normalized.

`
- `bool $keep_non_breaking_space [optional]

Set to true, to keep non-breaking-spaces.

`
- `bool $keep_bidi_unicode_controls [optional]

Set to true, to keep non-printable (for the web)
bidirectional text chars.

`
- `bool $normalize_control_characters [optional]

Set to true, to convert e.g. LINE-, PARAGRAPH-SEPARATOR with "\n" and LINE TABULATION with "\t".

`

**Return:**
- `string

A string with normalized whitespace.

`

--------

## ord(string $chr, string $encoding): int

Calculates Unicode code point of the given UTF-8 encoded character.

INFO: opposite to UTF8::chr()

EXAMPLE: UTF8::ord('☃'); // 0x2603

**Parameters:**
- `string $chr

The character of which to calculate code point.

`
- `string $encoding [optional]

Set the charset for e.g. "mb_" function

`

**Return:**
- `int

Unicode code point of the given character,

0 on invalid UTF-8 byte sequence

`

--------

## parse_str(string $str, array $result, bool $clean_utf8): bool

Parses the string into an array (into the second parameter).

WARNING: Unlike "parse_str()", this method does not (re-)place variables in the current scope,
if the second parameter is not set!

EXAMPLE:
UTF8::parse_str('Iñtërnâtiônéàlizætiøn=測試&arr[]=foo+測試&arr[]=ການທົດສອບ', $array);
echo $array['Iñtërnâtiônéàlizætiøn']; // '測試'

**Parameters:**
- `string $str

The input string.

`
- `array $result

The result will be returned into this reference parameter.

`
- `bool $clean_utf8 [optional]

Remove non UTF-8 chars from the string.

`

**Return:**
- `bool

Will return false if php can't parse the string and we haven't any $result.

`

--------

## pcre_utf8_support(): bool

Checks if the /u modifier, which enables Unicode support in PCRE, is available.

**Parameters:**
__nothing__

**Return:**
- `bool


true if support is available,

false otherwise

`

--------

## range(int|string $var1, int|string $var2, bool $use_ctype, string $encoding, float|int $step): list

Create an array containing a range of UTF-8 characters.

EXAMPLE: UTF8::range('κ', 'ζ'); // array('κ', 'ι', 'θ', 'η', 'ζ',)

**Parameters:**
- `int|string $var1

Numeric or hexadecimal code points, or a UTF-8 character to start from.

`
- `int|string $var2

Numeric or hexadecimal code points, or a UTF-8 character to end at.

`
- `bool $use_ctype

use ctype to detect numeric and hexadecimal, otherwise we will use a simple
"is_numeric"

`
- `string $encoding [optional]

Set the charset for e.g. "mb_" function

`
- `float|int $step [optional]


If a step value is given, it will be used as the
increment between elements in the sequence. step
should be given as a positive number. If not specified,
step will default to 1.

`

**Return:**
- `list`
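
For two single characters, a range like the one in the example can be sketched by converting to code points, using PHP's native `range()`, and converting back. The helper `char_range()` is hypothetical and ignores the numeric and hexadecimal input forms as well as the `$step` parameter.

```php
<?php
// Hypothetical helper, for illustration only.
function char_range(string $from, string $to): array
{
    // The native range() also counts downwards when the start is larger than the end.
    $codepoints = range(mb_ord($from, 'UTF-8'), mb_ord($to, 'UTF-8'));

    return array_map(
        static function (int $codepoint): string {
            return mb_chr($codepoint, 'UTF-8');
        },
        $codepoints
    );
}

$greek = char_range('κ', 'ζ');
var_dump($greek);
```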

--------

## rawurldecode(string $str, bool $multi_decode): string

Multi decode HTML entity + fix urlencoded-win1252-chars.

EXAMPLE: UTF8::rawurldecode('tes%20öäü%20\u00edtest+test'); // 'tes öäü ítest+test'

e.g:
'test+test' => 'test+test'
'D&#252;sseldorf' => 'Düsseldorf'
'D%FCsseldorf' => 'Düsseldorf'
'D&#xFC;sseldorf' => 'Düsseldorf'
'D%26%23xFC%3Bsseldorf' => 'Düsseldorf'
'DÃ¼sseldorf' => 'Düsseldorf'
'D%C3%BCsseldorf' => 'Düsseldorf'
'D%C3%83%C2%BCsseldorf' => 'Düsseldorf'
'D%25C3%2583%25C2%25BCsseldorf' => 'Düsseldorf'

**Parameters:**
- `T $str

The input string.

`
- `bool $multi_decode

Decode as often as possible.

`

**Return:**
- `string

The decoded URL, as a string.

`

--------

## regex_replace(string $str, string $pattern, string $replacement, string $options, string $delimiter): string

Replaces all occurrences of $pattern in $str by $replacement.

**Parameters:**
- `string $str

The input string.

`
- `string $pattern

The regular expression pattern.

`
- `string $replacement

The string to replace with.

`
- `string $options [optional]

Matching conditions to be used.

`
- `string $delimiter [optional]

Delimiter for the regex. Default: '/'

`

**Return:**
- `string`

--------

## remove_bom(string $str): string

Remove the BOM from UTF-8 / UTF-16 / UTF-32 strings.

EXAMPLE: UTF8::remove_bom("\xEF\xBB\xBFΜπορώ να"); // 'Μπορώ να'

**Parameters:**
- `string $str

The input string.

`

**Return:**
- `string

A string without UTF-BOM.

`
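
Stripping a BOM comes down to comparing the first bytes against the known marks. The helper `strip_bom()` is hypothetical and only sketches that idea; the order of the list matters because the UTF-32LE mark begins with the same two bytes as the UTF-16LE mark.

```php
<?php
// Hypothetical helper, for illustration only.
function strip_bom(string $str): string
{
    // Longer marks first, so UTF-32LE is not mistaken for UTF-16LE.
    $boms = [
        "\x00\x00\xFE\xFF", // UTF-32BE
        "\xFF\xFE\x00\x00", // UTF-32LE
        "\xEF\xBB\xBF",     // UTF-8
        "\xFE\xFF",         // UTF-16BE
        "\xFF\xFE",         // UTF-16LE
    ];

    foreach ($boms as $bom) {
        if (strncmp($str, $bom, strlen($bom)) === 0) {
            return (string) substr($str, strlen($bom));
        }
    }

    return $str; // no BOM found
}

var_dump(strip_bom("\xEF\xBB\xBFabc"));
```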

--------

## remove_duplicates(string $str, string|string[] $what): string

Removes duplicate occurrences of a string in another string.

EXAMPLE: UTF8::remove_duplicates('öäü-κόσμεκόσμε-äöü', 'κόσμε'); // 'öäü-κόσμε-äöü'

**Parameters:**
- `string $str

The base string.

`
- `string|string[] $what

String to search for in the base string.

`

**Return:**
- `string

A string with removed duplicates.

`

--------

## remove_html(string $str, string $allowable_tags): string

Remove html via "strip_tags()" from the string.

**Parameters:**
- `string $str

The input string.

`
- `string $allowable_tags [optional]

You can use the optio