utf8proc

utf8proc is a free/open-source (MIT/expat licensed) C library providing Unicode normalization, case-folding, and other operations for strings in the UTF-8 encoding, supporting up-to-date Unicode versions. See the utf8proc home page (http://julialang.org/utf8proc/) for downloads and other information, or the source code on github (https://github.com/JuliaLang/utf8proc).

For the utf8proc API documentation, see utf8proc.h.

The features of utf8proc include:

- Transformation of strings (utf8proc_map) to:
  - decompose (UTF8PROC_DECOMPOSE) or compose (UTF8PROC_COMPOSE) Unicode combining characters (http://en.wikipedia.org/wiki/Combining_character)
  - canonicalize Unicode compatibility characters (UTF8PROC_COMPAT)
  - strip "ignorable" characters (UTF8PROC_IGNORE), control characters (UTF8PROC_STRIPCC), or combining characters such as accents (UTF8PROC_STRIPMARK)
  - case-fold (UTF8PROC_CASEFOLD)
- Unicode normalization: utf8proc_NFD, utf8proc_NFC, utf8proc_NFKD, utf8proc_NFKC
- Detection of grapheme boundaries (utf8proc_grapheme_break and UTF8PROC_CHARBOUND)
- Character-width computation: utf8proc_charwidth
- Classification of characters by Unicode category: utf8proc_category and utf8proc_category_string
- Encoding (utf8proc_encode_char) and decoding (utf8proc_iterate) of Unicode codepoints to/from UTF-8

Members

Aliases

String
alias String = dString!aumem
Undocumented in source.
utf8proc_bidi_class_t
alias utf8proc_bidi_class_t = int

Bidirectional character classes.

utf8proc_bool
alias utf8proc_bool = bool
Undocumented in source.
utf8proc_boundclass_t
alias utf8proc_boundclass_t = int

Boundclass property: the grapheme cluster break class defined in Unicode Standard Annex #29 (TR29).

utf8proc_category_t
alias utf8proc_category_t = int

Unicode categories.

utf8proc_custom_func
alias utf8proc_custom_func = int function(utf8proc_int32_t codepoint, void* data)

Function pointer type passed to utf8proc_map_custom and utf8proc_decompose_custom. It specifies a user-defined mapping of codepoints that is applied in conjunction with the other mappings.
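A minimal C sketch of a callback with this shape (a codepoint plus an opaque data pointer, returning the possibly replaced codepoint). Linking against the library is not assumed: the `replacement` struct, `replace_one` callback and `apply_custom` driver are hypothetical, and the driver merely stands in for the per-codepoint calls that utf8proc_map_custom or utf8proc_decompose_custom would make.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as utf8proc_custom_func: codepoint in, opaque user data,
   (possibly replaced) codepoint out. */
typedef int32_t (*custom_func_t)(int32_t codepoint, void *data);

/* Hypothetical user data: a single "from -> to" replacement. */
struct replacement {
    int32_t from;
    int32_t to;
};

/* Callback: replace one codepoint, pass every other codepoint through. */
static int32_t replace_one(int32_t codepoint, void *data)
{
    const struct replacement *r = data;
    return codepoint == r->from ? r->to : codepoint;
}

/* Hypothetical driver standing in for the library: applies the callback
   to each codepoint of an array, in place. */
static void apply_custom(int32_t *cps, size_t n, custom_func_t f, void *data)
{
    for (size_t i = 0; i < n; i++)
        cps[i] = f(cps[i], data);
}
```

Passing state through `void *data` lets one callback serve many mappings without globals; with the real library the same pointer would be given as the `custom_data` argument.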

utf8proc_decomp_type_t
alias utf8proc_decomp_type_t = int

Decomposition type.

utf8proc_int16_t
alias utf8proc_int16_t = short
Undocumented in source.
utf8proc_int32_t
alias utf8proc_int32_t = int
Undocumented in source.
utf8proc_int8_t
alias utf8proc_int8_t = byte

utf8proc_option_t
alias utf8proc_option_t = int

Option flags used by several functions in the library.

utf8proc_property_t
alias utf8proc_property_t = utf8proc_property_struct
Undocumented in source.
utf8proc_propval_t
alias utf8proc_propval_t = short

Holds the value of a property.

utf8proc_size_t
alias utf8proc_size_t = ulong
Undocumented in source.
utf8proc_ssize_t
alias utf8proc_ssize_t = long
Undocumented in source.
utf8proc_uint16_t
alias utf8proc_uint16_t = ushort
Undocumented in source.
utf8proc_uint32_t
alias utf8proc_uint32_t = uint
Undocumented in source.
utf8proc_uint8_t
alias utf8proc_uint8_t = ubyte
Undocumented in source.

Enums

UTF8PROC_BIDI_CLASS_L
anonymous enum UTF8PROC_BIDI_CLASS_L
Undocumented in source.
UTF8PROC_BOUNDCLASS_START
anonymous enum UTF8PROC_BOUNDCLASS_START
Undocumented in source.
UTF8PROC_CATEGORY_CN
anonymous enum UTF8PROC_CATEGORY_CN
Undocumented in source.
UTF8PROC_DECOMP_TYPE_FONT
anonymous enum UTF8PROC_DECOMP_TYPE_FONT
Undocumented in source.
UTF8PROC_NULLTERM
anonymous enum UTF8PROC_NULLTERM
Undocumented in source.

Functions

utf8proc_NFC
utf8proc_uint8_t* utf8proc_NFC(utf8proc_uint8_t* str)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_NFD
utf8proc_uint8_t* utf8proc_NFD(utf8proc_uint8_t* str)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_NFKC
utf8proc_uint8_t* utf8proc_NFKC(utf8proc_uint8_t* str)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_NFKC_Casefold
utf8proc_uint8_t* utf8proc_NFKC_Casefold(utf8proc_uint8_t* str)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_NFKD
utf8proc_uint8_t* utf8proc_NFKD(utf8proc_uint8_t* str)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_category
utf8proc_category_t utf8proc_category(utf8proc_int32_t c)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_category_string
const(char)* utf8proc_category_string(utf8proc_int32_t c)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_charwidth
int utf8proc_charwidth(utf8proc_int32_t c)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_codepoint_valid
utf8proc_bool utf8proc_codepoint_valid(utf8proc_int32_t uc)
Undocumented in source. Be warned that the author may not have intended to support it.
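This entry has no description. Assuming it follows the C library's notion of a valid codepoint, a Unicode scalar value (0 through 0x10FFFF, excluding the UTF-16 surrogates 0xD800 through 0xDFFF), the check can be sketched in C as follows. `sketch_codepoint_valid` is a hypothetical stand-in, not this module's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: true if uc is a Unicode scalar value, i.e. in
   0..0x10FFFF and outside the surrogate range 0xD800..0xDFFF. */
static bool sketch_codepoint_valid(int32_t uc)
{
    if (uc < 0 || uc > 0x10FFFF)
        return false;
    return uc < 0xD800 || uc > 0xDFFF;
}
```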
utf8proc_decompose
utf8proc_ssize_t utf8proc_decompose(utf8proc_uint8_t* str, utf8proc_ssize_t strlen, utf8proc_int32_t* buffer, utf8proc_ssize_t bufsize, utf8proc_option_t options)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_decompose_char
utf8proc_ssize_t utf8proc_decompose_char(utf8proc_int32_t uc, utf8proc_int32_t* dst, utf8proc_ssize_t bufsize, utf8proc_option_t options, int* last_boundclass)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_decompose_custom
utf8proc_ssize_t utf8proc_decompose_custom(utf8proc_uint8_t* str, utf8proc_ssize_t strlen, utf8proc_int32_t* buffer, utf8proc_ssize_t bufsize, utf8proc_option_t options, utf8proc_custom_func custom_func, void* custom_data)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_encode_char
utf8proc_ssize_t utf8proc_encode_char(utf8proc_int32_t uc, utf8proc_uint8_t* dst)
Undocumented in source. Be warned that the author may not have intended to support it.
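utf8proc_encode_char writes a codepoint's UTF-8 form into `dst` and returns a byte count. A minimal, self-contained C sketch of standard UTF-8 encoding with a similar shape follows; `sketch_encode_utf8` is hypothetical, and its edge-case behavior (returning 0 outside 0..0x10FFFF, not rejecting surrogates) is an assumption rather than a description of the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode uc as UTF-8 into dst, which needs room for 4 bytes.
   Returns the number of bytes written, or 0 if uc is outside 0..0x10FFFF.
   Surrogates are not rejected here; validate first if that matters. */
static ptrdiff_t sketch_encode_utf8(int32_t uc, uint8_t dst[4])
{
    if (uc < 0)
        return 0;
    if (uc < 0x80) {                     /* 0xxxxxxx */
        dst[0] = (uint8_t)uc;
        return 1;
    }
    if (uc < 0x800) {                    /* 110xxxxx 10xxxxxx */
        dst[0] = (uint8_t)(0xC0 | (uc >> 6));
        dst[1] = (uint8_t)(0x80 | (uc & 0x3F));
        return 2;
    }
    if (uc < 0x10000) {                  /* 1110xxxx 10xxxxxx 10xxxxxx */
        dst[0] = (uint8_t)(0xE0 | (uc >> 12));
        dst[1] = (uint8_t)(0x80 | ((uc >> 6) & 0x3F));
        dst[2] = (uint8_t)(0x80 | (uc & 0x3F));
        return 3;
    }
    if (uc < 0x110000) {                 /* 11110xxx + 3 continuation bytes */
        dst[0] = (uint8_t)(0xF0 | (uc >> 18));
        dst[1] = (uint8_t)(0x80 | ((uc >> 12) & 0x3F));
        dst[2] = (uint8_t)(0x80 | ((uc >> 6) & 0x3F));
        dst[3] = (uint8_t)(0x80 | (uc & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, U+20AC (the euro sign) encodes to the three bytes E2 82 AC.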
utf8proc_errmsg
String utf8proc_errmsg(utf8proc_ssize_t errcode)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_get_property
const(utf8proc_property_t)* utf8proc_get_property(utf8proc_int32_t uc)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_grapheme_break
utf8proc_bool utf8proc_grapheme_break(utf8proc_int32_t c1, utf8proc_int32_t c2)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_grapheme_break_stateful
utf8proc_bool utf8proc_grapheme_break_stateful(utf8proc_int32_t c1, utf8proc_int32_t c2, utf8proc_int32_t* state)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_iterate
utf8proc_ssize_t utf8proc_iterate(utf8proc_uint8_t* str, utf8proc_ssize_t strlen, utf8proc_int32_t* dst)
Undocumented in source. Be warned that the author may not have intended to support it.
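utf8proc_iterate decodes one codepoint from the start of `str` into `dst` and returns the number of bytes read. A minimal C sketch of that kind of validating decoder follows. `sketch_decode_utf8` and its `-1` error return are hypothetical (the library's actual error value is not shown on this page), and unlike the library, this sketch does not treat a negative length as "null-terminated".

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one codepoint from str (len bytes available).
   Returns bytes consumed (1..4) and stores the codepoint in *dst,
   or returns -1 and stores -1 in *dst if the input is not valid UTF-8. */
static ptrdiff_t sketch_decode_utf8(const uint8_t *str, ptrdiff_t len, int32_t *dst)
{
    int32_t cp, min;
    ptrdiff_t n;

    *dst = -1;
    if (len < 1)
        return -1;
    uint8_t b0 = str[0];
    if (b0 < 0x80) {                        /* ASCII fast path */
        *dst = b0;
        return 1;
    }
    if ((b0 & 0xE0) == 0xC0)      { n = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { n = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { n = 4; cp = b0 & 0x07; min = 0x10000; }
    else
        return -1;                          /* stray continuation or 0xF8..0xFF */
    if (len < n)
        return -1;                          /* truncated sequence */
    for (ptrdiff_t i = 1; i < n; i++) {
        if ((str[i] & 0xC0) != 0x80)
            return -1;                      /* expected 10xxxxxx */
        cp = (cp << 6) | (str[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;                          /* overlong, too large, or surrogate */
    *dst = cp;
    return n;
}
```

Calling it in a loop and advancing `str` by the return value walks a whole buffer, stopping at the first invalid sequence.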
utf8proc_map
utf8proc_ssize_t utf8proc_map(utf8proc_uint8_t* str, utf8proc_ssize_t strlen, utf8proc_uint8_t** dstptr, utf8proc_option_t options)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_map_custom
utf8proc_ssize_t utf8proc_map_custom(utf8proc_uint8_t* str, utf8proc_ssize_t strlen, utf8proc_uint8_t** dstptr, utf8proc_option_t options, utf8proc_custom_func custom_func, void* custom_data)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_normalize_utf32
utf8proc_ssize_t utf8proc_normalize_utf32(utf8proc_int32_t* buffer, utf8proc_ssize_t length, utf8proc_option_t options)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_reencode
utf8proc_ssize_t utf8proc_reencode(utf8proc_int32_t* buffer, utf8proc_ssize_t length, utf8proc_option_t options)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_tolower
utf8proc_int32_t utf8proc_tolower(utf8proc_int32_t c)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_totitle
utf8proc_int32_t utf8proc_totitle(utf8proc_int32_t c)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_toupper
utf8proc_int32_t utf8proc_toupper(utf8proc_int32_t c)
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_unicode_version
String utf8proc_unicode_version()
Undocumented in source. Be warned that the author may not have intended to support it.
utf8proc_version
String utf8proc_version()
Undocumented in source. Be warned that the author may not have intended to support it.

Manifest constants

SSIZE_MAX
enum SSIZE_MAX;
Undocumented in source.
UINT16_MAX
enum UINT16_MAX;
Undocumented in source.
UTF8PROC_ERROR_INVALIDOPTS
enum UTF8PROC_ERROR_INVALIDOPTS;

Invalid options have been used.

UTF8PROC_ERROR_INVALIDUTF8
enum UTF8PROC_ERROR_INVALIDUTF8;

The given string is not a legal UTF-8 string.

UTF8PROC_ERROR_NOMEM
enum UTF8PROC_ERROR_NOMEM;

Memory could not be allocated.

UTF8PROC_ERROR_NOTASSIGNED
enum UTF8PROC_ERROR_NOTASSIGNED;

The UTF8PROC_REJECTNA flag was set and an unassigned codepoint was found.

UTF8PROC_ERROR_OVERFLOW
enum UTF8PROC_ERROR_OVERFLOW;

The given string is too long to be processed.

UTF8PROC_HANGUL_LBASE
enum UTF8PROC_HANGUL_LBASE;
Undocumented in source.
UTF8PROC_HANGUL_LCOUNT
enum UTF8PROC_HANGUL_LCOUNT;
Undocumented in source.
UTF8PROC_HANGUL_L_END
enum UTF8PROC_HANGUL_L_END;
Undocumented in source.
UTF8PROC_HANGUL_L_FILLER
enum UTF8PROC_HANGUL_L_FILLER;
Undocumented in source.
UTF8PROC_HANGUL_L_START
enum UTF8PROC_HANGUL_L_START;
Undocumented in source.
UTF8PROC_HANGUL_NCOUNT
enum UTF8PROC_HANGUL_NCOUNT;
Undocumented in source.
UTF8PROC_HANGUL_SBASE
enum UTF8PROC_HANGUL_SBASE;
Undocumented in source.
UTF8PROC_HANGUL_SCOUNT
enum UTF8PROC_HANGUL_SCOUNT;
Undocumented in source.
UTF8PROC_HANGUL_S_END
enum UTF8PROC_HANGUL_S_END;
Undocumented in source.
UTF8PROC_HANGUL_S_START
enum UTF8PROC_HANGUL_S_START;
Undocumented in source.
UTF8PROC_HANGUL_TBASE
enum UTF8PROC_HANGUL_TBASE;
Undocumented in source.
UTF8PROC_HANGUL_TCOUNT
enum UTF8PROC_HANGUL_TCOUNT;
Undocumented in source.
UTF8PROC_HANGUL_T_END
enum UTF8PROC_HANGUL_T_END;
Undocumented in source.
UTF8PROC_HANGUL_T_START
enum UTF8PROC_HANGUL_T_START;
Undocumented in source.
UTF8PROC_HANGUL_VBASE
enum UTF8PROC_HANGUL_VBASE;
Undocumented in source.
UTF8PROC_HANGUL_VCOUNT
enum UTF8PROC_HANGUL_VCOUNT;
Undocumented in source.
UTF8PROC_HANGUL_V_END
enum UTF8PROC_HANGUL_V_END;
Undocumented in source.
UTF8PROC_HANGUL_V_START
enum UTF8PROC_HANGUL_V_START;
Undocumented in source.
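The UTF8PROC_HANGUL_* constants are listed without values. Assuming the *BASE and *COUNT constants hold the standard values from the Unicode Standard's Hangul syllable algorithm (section 3.12), composing and decomposing precomposed syllables is plain arithmetic, as in the C sketch below. The *_START/*_END constants are not modeled here, and `hangul_decompose`/`hangul_compose` are hypothetical helpers.

```c
#include <stdint.h>

/* Hangul syllable algorithm constants from the Unicode Standard, 3.12. */
enum {
    SBASE = 0xAC00, LBASE = 0x1100, VBASE = 0x1161, TBASE = 0x11A7,
    LCOUNT = 19, VCOUNT = 21, TCOUNT = 28,
    NCOUNT = VCOUNT * TCOUNT,           /* 588 */
    SCOUNT = LCOUNT * NCOUNT            /* 11172 */
};

/* Split a precomposed syllable into 2 or 3 conjoining jamo.
   Returns the number of jamo written to out, or 0 if s is not a syllable. */
static int hangul_decompose(int32_t s, int32_t out[3])
{
    int32_t sindex = s - SBASE;
    if (sindex < 0 || sindex >= SCOUNT)
        return 0;
    out[0] = LBASE + sindex / NCOUNT;               /* leading consonant */
    out[1] = VBASE + (sindex % NCOUNT) / TCOUNT;    /* vowel */
    int32_t tindex = sindex % TCOUNT;
    if (tindex == 0)
        return 2;                                   /* LV: no trailing consonant */
    out[2] = TBASE + tindex;                        /* trailing consonant */
    return 3;
}

/* Combine l, v and an optional trailing consonant t (0 for none).
   Returns the syllable, or -1 if any jamo is out of range. */
static int32_t hangul_compose(int32_t l, int32_t v, int32_t t)
{
    int32_t lindex = l - LBASE, vindex = v - VBASE;
    int32_t tindex = (t == 0) ? 0 : t - TBASE;
    if (lindex < 0 || lindex >= LCOUNT || vindex < 0 || vindex >= VCOUNT)
        return -1;
    if (tindex < 0 || tindex >= TCOUNT || (t != 0 && tindex == 0))
        return -1;                                  /* TBASE itself is not a jamo */
    return SBASE + (lindex * VCOUNT + vindex) * TCOUNT + tindex;
}
```

For example, U+D55C decomposes to U+1112 U+1161 U+11AB, and composing those three jamo yields U+D55C again.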
UTF8PROC_VERSION_MAJOR
enum UTF8PROC_VERSION_MAJOR;

The MAJOR version number (increased when backwards API compatibility is broken).

UTF8PROC_VERSION_MINOR
enum UTF8PROC_VERSION_MINOR;

The MINOR version number (increased when new functionality is added in a backwards-compatible manner).

UTF8PROC_VERSION_PATCH
enum UTF8PROC_VERSION_PATCH;

The PATCH version number (increased for fixes that do not change the API).
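The MAJOR/MINOR/PATCH scheme above implies a simple compatibility rule: code written against one version works with a library that has the same MAJOR number and an equal or newer MINOR number, while PATCH does not matter. A minimal C sketch; `version_compatible` is a hypothetical helper, not something this module exports.

```c
#include <stdbool.h>

/* Hypothetical check: can code that requires (req_major, req_minor) use a
   library at (major, minor)? A different MAJOR breaks compatibility; a
   newer MINOR only adds features; PATCH releases never change the API. */
static bool version_compatible(int major, int minor, int req_major, int req_minor)
{
    return major == req_major && minor >= req_minor;
}
```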

_false
enum _false;
Undocumented in source.
_true
enum _true;
Undocumented in source.

Static functions

charbound_encode_char
utf8proc_ssize_t charbound_encode_char(utf8proc_int32_t uc, utf8proc_uint8_t* dst)
Undocumented in source. Be warned that the author may not have intended to support it.
grapheme_break_extended
utf8proc_bool grapheme_break_extended(int lbc, int tbc, utf8proc_int32_t* state)
Undocumented in source. Be warned that the author may not have intended to support it.
grapheme_break_simple
utf8proc_bool grapheme_break_simple(int lbc, int tbc)
Undocumented in source. Be warned that the author may not have intended to support it.
seqindex_decode_entry
utf8proc_int32_t seqindex_decode_entry(utf8proc_uint16_t** entry)
Undocumented in source. Be warned that the author may not have intended to support it.
seqindex_decode_index
utf8proc_int32_t seqindex_decode_index(utf8proc_uint32_t seqindex)
Undocumented in source. Be warned that the author may not have intended to support it.
seqindex_write_char_decomposed
utf8proc_ssize_t seqindex_write_char_decomposed(utf8proc_uint16_t seqindex, utf8proc_int32_t* dst, utf8proc_ssize_t bufsize, utf8proc_option_t options, int* last_boundclass)
Undocumented in source. Be warned that the author may not have intended to support it.
unsafe_get_property
const(utf8proc_property_t)* unsafe_get_property(utf8proc_int32_t uc)
Undocumented in source. Be warned that the author may not have intended to support it.
utf_cont
bool utf_cont(T ch)
Undocumented in source. Be warned that the author may not have intended to support it.
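The body of the generic `utf_cont` is not shown. Assuming it tests for a UTF-8 continuation byte (bit pattern 10xxxxxx), a C sketch follows, together with a hypothetical `count_codepoints` showing a typical use: in well-formed UTF-8, every byte that is not a continuation byte starts a new codepoint.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if ch is a UTF-8 continuation byte (10xxxxxx). */
static bool is_cont(uint8_t ch)
{
    return (ch & 0xC0) == 0x80;
}

/* Count codepoints in a well-formed UTF-8 buffer by counting bytes that
   do not continue a previous sequence. Performs no validation. */
static size_t count_codepoints(const uint8_t *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (!is_cont(s[i]))
            n++;
    return n;
}
```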

Static variables

s__
const(char)[3][30] s__;
Undocumented in source.
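The shape of `s__` (30 entries of `char[3]`) matches a table of two-letter General_Category abbreviations plus a terminating NUL, which is presumably what utf8proc_category_string returns an entry of. The C sketch below assumes that reading and assumes the entry order follows utf8proc's C category enum, starting at UTF8PROC_CATEGORY_CN = 0; `category_string` is a hypothetical lookup taking a category index rather than a codepoint.

```c
#include <stddef.h>

/* Two-letter General_Category abbreviations, in the assumed enum order
   (Cn = 0 through Co = 29). Each fits char[3]: two letters plus NUL. */
static const char category_names[30][3] = {
    "Cn", "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd",
    "Nl", "No", "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sm",
    "Sc", "Sk", "So", "Zs", "Zl", "Zp", "Cc", "Cf", "Cs", "Co"
};

/* Hypothetical lookup by category index; NULL if out of range. */
static const char *category_string(int category)
{
    if (category < 0 || category >= 30)
        return NULL;
    return category_names[category];
}
```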

Structs

utf8proc_property_struct
struct utf8proc_property_struct

Struct containing information about a codepoint.

Variables

utf8proc_utf8class
utf8proc_int8_t[256] utf8proc_utf8class;
Undocumented in source.
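Judging by its name and type (256 small signed integers), `utf8proc_utf8class` is likely a lead-byte table giving the length of the UTF-8 sequence that starts with each byte value; that reading is an assumption. The C sketch below builds such a table by bit pattern. It classifies only the lead byte: 0xC0, 0xC1 and 0xF5 through 0xF7 still get nonzero classes even though they never begin a valid sequence, so a decoder must validate further.

```c
#include <stdint.h>

/* Hypothetical lead-byte class table: sequence length for a lead byte,
   or 0 if the byte cannot start a sequence. */
static int8_t utf8class[256];

static void init_utf8class(void)
{
    for (int b = 0; b < 256; b++) {
        if (b < 0x80)      utf8class[b] = 1;  /* ASCII */
        else if (b < 0xC0) utf8class[b] = 0;  /* continuation byte */
        else if (b < 0xE0) utf8class[b] = 2;  /* 110xxxxx */
        else if (b < 0xF0) utf8class[b] = 3;  /* 1110xxxx */
        else if (b < 0xF8) utf8class[b] = 4;  /* 11110xxx */
        else               utf8class[b] = 0;  /* 0xF8..0xFF: never valid */
    }
}
```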
