UTF8PROC_NULLTERM

Undocumented in source.

Values

Value / Meaning
UTF8PROC_NULLTERM = 1 << 0

The given UTF-8 input is null-terminated.

UTF8PROC_STABLE = 1 << 1

Unicode Versioning Stability has to be respected.

UTF8PROC_COMPAT = 1 << 2

Compatibility decomposition (i.e. formatting information is lost).

UTF8PROC_COMPOSE = 1 << 3

Return a result with composed characters.

UTF8PROC_DECOMPOSE = 1 << 4

Return a result with decomposed characters.
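Because each value listed here occupies its own bit, options can be combined with bitwise OR and tested with bitwise AND. The following is a minimal sketch of that pattern. The enum is a local copy of the first five values on this page so the block compiles without the library header, and `has_option` and `example_options` are hypothetical helper names, not part of utf8proc.

```c
#include <stdbool.h>

/* Local copy of the first five values listed above, so the sketch
   compiles on its own. With the real header available, use that instead. */
enum {
    UTF8PROC_NULLTERM  = 1 << 0,
    UTF8PROC_STABLE    = 1 << 1,
    UTF8PROC_COMPAT    = 1 << 2,
    UTF8PROC_COMPOSE   = 1 << 3,
    UTF8PROC_DECOMPOSE = 1 << 4
};

/* Hypothetical helper: true if every bit of `flag` is set in `options`. */
bool has_option(unsigned options, unsigned flag)
{
    return (options & flag) == flag;
}

/* Example option set: null-terminated input, versioning stability,
   compatibility decomposition, composed result. */
unsigned example_options(void)
{
    return UTF8PROC_NULLTERM | UTF8PROC_STABLE
         | UTF8PROC_COMPAT   | UTF8PROC_COMPOSE;
}
```

Comparing the masked value against the flag itself, rather than against zero, keeps the helper correct for multi-bit values such as UTF8PROC_NLF2LF.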

UTF8PROC_IGNORE = 1 << 5

Strip "default ignorable characters" such as SOFT-HYPHEN or ZERO-WIDTH-SPACE.

UTF8PROC_REJECTNA = 1 << 6

Return an error if the input contains unassigned codepoints.

UTF8PROC_NLF2LS = 1 << 7

Indicates that NLF-sequences (LF, CRLF, CR, NEL) represent a line break and should be converted to the codepoint for line separation (LS).

UTF8PROC_NLF2PS = 1 << 8

Indicates that NLF-sequences represent a paragraph break and should be converted to the codepoint for paragraph separation (PS).

UTF8PROC_NLF2LF = UTF8PROC_NLF2LS | UTF8PROC_NLF2PS

Indicates that the meaning of NLF-sequences is unknown.
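Since UTF8PROC_NLF2LF is the union of the NLF2LS and NLF2PS bits, testing a single bit cannot tell the three modes apart; the two-bit field has to be compared as a whole. A minimal sketch of that check follows. `nlf_target` is a hypothetical helper, the enum is a local copy of the values on this page, and the assumption that the combined mode yields LINE FEED (U+000A) is taken from the option's name.

```c
#include <stdint.h>

/* Local copy of the NLF-related values listed above. */
enum {
    UTF8PROC_NLF2LS = 1 << 7,
    UTF8PROC_NLF2PS = 1 << 8,
    UTF8PROC_NLF2LF = UTF8PROC_NLF2LS | UTF8PROC_NLF2PS
};

/* Hypothetical helper: codepoint an NLF-sequence would be converted to,
   or 0 if none of the NLF2* options is set. */
int32_t nlf_target(unsigned options)
{
    switch (options & UTF8PROC_NLF2LF) {   /* isolate the two-bit field */
    case UTF8PROC_NLF2LS: return 0x2028;   /* LINE SEPARATOR */
    case UTF8PROC_NLF2PS: return 0x2029;   /* PARAGRAPH SEPARATOR */
    case UTF8PROC_NLF2LF: return 0x000A;   /* LINE FEED (assumed) */
    default:              return 0;        /* no conversion requested */
    }
}
```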

UTF8PROC_STRIPCC = 1 << 9

Strips and/or converts control characters.

NLF-sequences are transformed into a space, unless one of the NLF2LS/PS/LF options is given. Horizontal tab (HT) and form feed (FF) are treated as NLF-sequences in this case. All other control characters are simply removed.
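The rule above can be sketched per codepoint as follows. This follows the wording of this page literally and is not the library's implementation: the handling of individual controls may differ in detail in the actual source, CRLF pairing is left out, and `stripcc_map` and `CP_DROPPED` are hypothetical names. The caller is assumed to have UTF8PROC_STRIPCC set.

```c
#include <stdint.h>

/* Local copy of the relevant values listed on this page. */
enum {
    UTF8PROC_NLF2LS  = 1 << 7,
    UTF8PROC_NLF2PS  = 1 << 8,
    UTF8PROC_NLF2LF  = UTF8PROC_NLF2LS | UTF8PROC_NLF2PS,
    UTF8PROC_STRIPCC = 1 << 9
};

#define CP_DROPPED (-1) /* hypothetical marker: the codepoint is removed */

/* Sketch of the STRIPCC rule as worded above, for a single codepoint.
   Assumes UTF8PROC_STRIPCC is set; CRLF pairing is not handled. */
int32_t stripcc_map(int32_t cp, unsigned options)
{
    int is_nlf = cp == 0x000A || cp == 0x000D || cp == 0x0085  /* LF, CR, NEL */
              || cp == 0x0009 || cp == 0x000C;                 /* HT, FF */
    if (is_nlf) {
        switch (options & UTF8PROC_NLF2LF) {
        case UTF8PROC_NLF2LS: return 0x2028; /* LINE SEPARATOR */
        case UTF8PROC_NLF2PS: return 0x2029; /* PARAGRAPH SEPARATOR */
        case UTF8PROC_NLF2LF: return 0x000A; /* LINE FEED */
        default:              return 0x0020; /* plain space */
        }
    }
    /* Remaining C0 controls, DEL and C1 controls are removed. */
    if (cp < 0x0020 || (cp >= 0x007F && cp < 0x00A0))
        return CP_DROPPED;
    return cp; /* everything else passes through unchanged */
}
```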

UTF8PROC_CASEFOLD = 1 << 10

Performs Unicode case folding, which allows case-insensitive string comparison.

UTF8PROC_CHARBOUND = 1 << 11

Inserts a 0xFF byte at the beginning of each sequence that represents a single grapheme cluster (see UAX #29).
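The byte 0xFF never occurs in well-formed UTF-8, so it can act as an unambiguous boundary marker. As a rough illustration of consuming output in that shape, the sketch below counts grapheme clusters by counting markers. `count_clusters` is a hypothetical helper, and the buffer layout is assumed to be exactly as described above.

```c
#include <stddef.h>

/* Counts 0xFF boundary markers in a buffer laid out as described above:
   one marker in front of each grapheme cluster. */
size_t count_clusters(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0xFF)
            n++;
    }
    return n;
}
```

For a hand-written buffer such as `{ 0xFF, 'a', 0xFF, 'e', 0xCC, 0x81 }` (the letter "a", then "e" followed by U+0301 COMBINING ACUTE ACCENT), the function returns 2.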

UTF8PROC_LUMP = 1 << 12

Lumps certain characters together.

For example, HYPHEN U+2010 and MINUS U+2212 are both mapped to ASCII "-". See lump.md for details.

If NLF2LF is set, this includes a transformation of paragraph and line separators to ASCII line-feed (LF).
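A minimal sketch of the lumping idea, covering only the mappings named on this page (the full set is in lump.md and is not reproduced here). `lump_sketch` is a hypothetical name, the enum is a local copy of the listed values, and "NLF2LF is set" is read as both of its bits being set.

```c
#include <stdint.h>

/* Local copy of the relevant values listed on this page. */
enum {
    UTF8PROC_NLF2LS = 1 << 7,
    UTF8PROC_NLF2PS = 1 << 8,
    UTF8PROC_NLF2LF = UTF8PROC_NLF2LS | UTF8PROC_NLF2PS,
    UTF8PROC_LUMP   = 1 << 12
};

/* Maps one codepoint according to the examples given above.
   Only the mappings mentioned on this page are modelled. */
int32_t lump_sketch(int32_t cp, unsigned options)
{
    if (!(options & UTF8PROC_LUMP))
        return cp;                          /* lumping not requested */
    if (cp == 0x2010 || cp == 0x2212)       /* HYPHEN, MINUS SIGN */
        return 0x002D;                      /* ASCII "-" */
    if ((options & UTF8PROC_NLF2LF) == UTF8PROC_NLF2LF &&
        (cp == 0x2028 || cp == 0x2029))     /* LS, PS */
        return 0x000A;                      /* ASCII line feed */
    return cp;
}
```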

UTF8PROC_STRIPMARK = 1 << 13

Strips all character markings.

This includes non-spacing, spacing and enclosing marks (i.e. accents).

Note: This option works only with UTF8PROC_COMPOSE or UTF8PROC_DECOMPOSE.
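The dependency stated in the note can be checked before any options are passed on. The following is a minimal sketch of such a check. `stripmark_usable` is a hypothetical helper and the enum is a local copy of the values on this page. How the library itself reacts to an invalid combination is not covered here.

```c
#include <stdbool.h>

/* Local copy of the relevant values listed on this page. */
enum {
    UTF8PROC_COMPOSE   = 1 << 3,
    UTF8PROC_DECOMPOSE = 1 << 4,
    UTF8PROC_STRIPMARK = 1 << 13
};

/* True if the option set satisfies the STRIPMARK requirement:
   STRIPMARK needs COMPOSE or DECOMPOSE alongside it. */
bool stripmark_usable(unsigned options)
{
    if (!(options & UTF8PROC_STRIPMARK))
        return true;  /* STRIPMARK not requested, nothing to check */
    return (options & (UTF8PROC_COMPOSE | UTF8PROC_DECOMPOSE)) != 0;
}
```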

UTF8PROC_STRIPNA = 1 << 14

Strip unassigned codepoints.

Meta