X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;f=utf8%2Fdoc%2Futf8tools.html;fp=utf8%2Fdoc%2Futf8tools.html;h=b7bd5d70b530cfec94aa317dbbde0cba375b56dd;hb=f4eb6ab0f45d0e81abdca3f121a8c91aaa11330d;hp=0000000000000000000000000000000000000000;hpb=ba80113991af72ec053d761276c6c7f4a4035880;p=gedcom-parse.git diff --git a/utf8/doc/utf8tools.html b/utf8/doc/utf8tools.html new file mode 100644 index 0000000..b7bd5d7 --- /dev/null +++ b/utf8/doc/utf8tools.html @@ -0,0 +1,175 @@ +
libutf8tools
is part of the GEDCOM parser library,
+but it can be used in unrelated programs too. It provides some help
+functions for handling UTF-8 encoding. It comes with the following
+installed:
+'libutf8tools.so', which should be linked into your program
+'utf8tools.h', which should be included in the source code of your program
+
+int is_utf8_string (char *input);
+int utf8_strlen (char *input);
+
+The first one returns 1 if the given input is a valid UTF-8 string, and
+0 otherwise; the second gives the number of UTF-8 characters in the given
+input. Note that the second function assumes that the input is valid
+UTF-8, and gives unpredictable results if it isn't.
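
To make the difference between validating and counting concrete, here is a minimal sketch of how such checks could be written. It is not the library's implementation: the names sketch_utf8_seq_len, sketch_is_utf8_string and sketch_utf8_strlen are hypothetical, and the validity rules are assumed to follow the usual UTF-8 definition (no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the library's code: returns the length (1-4) of
   the UTF-8 sequence starting at s, or 0 if the bytes are not valid. */
static size_t sketch_utf8_seq_len(const unsigned char *s)
{
    size_t len, i;
    unsigned long cp;

    if (s[0] < 0x80) return 1;                          /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else return 0;          /* stray continuation byte or invalid lead */

    for (i = 1; i < len; i++) {
        /* A terminating '\0' fails this test too, so we never read past it. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, surrogates and values above U+10FFFF. */
    if (len == 2 && cp < 0x80) return 0;
    if (len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) return 0;
    if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return 0;
    return len;
}

/* Returns 1 if input is valid UTF-8, 0 otherwise. */
int sketch_is_utf8_string(const char *input)
{
    const unsigned char *p = (const unsigned char *)input;
    assert(input != NULL);
    while (*p) {
        size_t n = sketch_utf8_seq_len(p);
        if (n == 0) return 0;
        p += n;
    }
    return 1;
}

/* Counts characters by counting every byte that is not a continuation byte.
   Like the documented function, this assumes the input is valid UTF-8. */
int sketch_utf8_strlen(const char *input)
{
    const unsigned char *p = (const unsigned char *)input;
    int count = 0;
    assert(input != NULL);
    for (; *p; p++)
        if ((*p & 0xC0) != 0x80) count++;
    return count;
}
```

The counting function does no validation at all, which is why a result on invalid input would be meaningless; checking the string first is the safe order.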
+convert_t initialize_utf8_conversion (const char *charset, int ext_outbuf);
+void cleanup_utf8_conversion (convert_t conv);
+
+The first function returns a conversion handle, which needs to be passed
+to all generic conversion functions. Through this handle, bidirectional
+conversion can take place between UTF-8 and the given character set
+'charset'.
+The implementation of this handle is not visible to the program that
+uses it. In case of an error, the returned value is NULL and errno
+gives the error that occurred. The parameter ext_outbuf should be non-zero
+if you want to control the output buffer yourself (see below). Under normal
+circumstances, you should pass 0 for this parameter.
+
+When the handle is no longer needed, release it with the
+cleanup_utf8_conversion function. Note that after using this function,
+any access to the handle will result in undefined behaviour.
+
+char* convert_from_utf8 (convert_t conv, const char* input, int* conv_fails, size_t* output_len);
+char* convert_to_utf8 (convert_t conv, const char* input, size_t input_len);
+char* convert_to_utf8_incremental (convert_t conv, const char* input, size_t input_len);
+
+All three functions take the conversion handle as first parameter, and the
+text to convert as second parameter. They return a pointer to an output
+buffer, which is overwritten at each call of the functions (unless you control
+your own output buffers, see below).
+
convert_to_utf8 converts only entire strings (i.e. it resets the conversion
+state each time), whereas convert_to_utf8_incremental takes previous
+conversions into account for the current conversion (leftover input
+characters from the previous conversion can then be combined with the
+current input characters). If you pass NULL as input to
+convert_to_utf8_incremental, the conversion restarts from a clean state.
+
+convert_from_utf8
has a third parameter, conv_fails, which can return the number of conversion
+failures in the input. Pass a pointer to an integer if you're interested, or
+pass NULL otherwise. Note that for conversion failures the string '?' will be
+put in the output instead of the character that could not be converted. This
+string can be changed using:
+
+int conversion_set_unknown (convert_t conv, const char *unknown);
+
+Some character sets use wide characters to encode text. But since the
+conversion functions above for simplicity all take and return normal char
+strings, it is necessary to know in some cases how long the strings are (if
+the string actually uses wide characters, it cannot be treated as
+a null-terminated string, so strlen cannot work on it). convert_from_utf8
+has a fourth parameter which can return the length of the output string
+(pass NULL if you know you don't need it), and the other functions have an
+input_len parameter, which should always be the length of the input
+string, even if it could also be retrieved via strlen.
+
+conv_buffer_t create_conv_buffer (int initial_size);
+void free_conv_buffer (conv_buffer_t buf);
+int conversion_set_output_buffer (convert_t conv, conv_buffer_t buf);
+
+The first function returns a handle to a new conversion buffer with the given
+initial size (the buffer is expanded dynamically when necessary). The
+second function frees the buffer: all further access to the buffer handle
+will result in undefined behaviour.
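
As an illustration of what "expanded dynamically when necessary" can mean for a buffer created with an initial size, the following is a minimal sketch of a growable byte buffer. It is an assumption about the general technique, not the library's conv_buffer_t: the type sketch_buffer and the functions sketch_create_buffer, sketch_buffer_append and sketch_free_buffer are hypothetical names, and doubling the capacity is just one common growth policy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an opaque conversion buffer. */
typedef struct sketch_buffer {
    char  *data;
    size_t size;   /* bytes allocated */
    size_t used;   /* bytes filled so far */
} sketch_buffer;

/* Allocates a buffer with the given initial capacity; NULL on failure. */
sketch_buffer *sketch_create_buffer(size_t initial_size)
{
    sketch_buffer *buf = malloc(sizeof *buf);
    if (buf == NULL) return NULL;
    if (initial_size == 0) initial_size = 1;
    buf->data = malloc(initial_size);
    if (buf->data == NULL) { free(buf); return NULL; }
    buf->size = initial_size;
    buf->used = 0;
    return buf;
}

/* Appends len bytes, doubling the capacity until they fit.
   Returns 0 on success, -1 if memory could not be obtained. */
int sketch_buffer_append(sketch_buffer *buf, const char *bytes, size_t len)
{
    assert(buf != NULL);
    if (len > buf->size - buf->used) {
        size_t new_size = buf->size;
        char *new_data;
        while (len > new_size - buf->used) new_size *= 2;
        new_data = realloc(buf->data, new_size);
        if (new_data == NULL) return -1;   /* old data is still intact */
        buf->data = new_data;
        buf->size = new_size;
    }
    memcpy(buf->data + buf->used, bytes, len);
    buf->used += len;
    return 0;
}

/* Releases the buffer; the handle must not be used afterwards. */
void sketch_free_buffer(sketch_buffer *buf)
{
    if (buf != NULL) { free(buf->data); free(buf); }
}
```

Because growing may move the data to a new address, any pointer obtained from such a buffer before an append should be treated as stale afterwards.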
+char *convert_utf8_to_locale (char *input, int *conv_failures);
+char *convert_locale_to_utf8 (char *input);
+
+Both functions return a pointer to a static buffer that is overwritten
+on each call. To function properly, the application must first set
+the locale using the setlocale function.
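
For the setlocale requirement, a minimal sketch of the usual start-up step follows. The helper name sketch_locale_charset is hypothetical; setlocale and nl_langinfo(CODESET) are standard C/POSIX calls, and whether the library itself queries the character set this way is an assumption.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: adopt the locale from the environment (LANG, LC_*),
   then report the character set of the locale now in effect.
   If locale_ok is not NULL, it receives 1 when setlocale succeeded and 0
   when it failed (in which case the previous locale stays active). */
const char *sketch_locale_charset(int *locale_ok)
{
    int ok = (setlocale(LC_ALL, "") != NULL);
    const char *charset = nl_langinfo(CODESET);

    if (locale_ok != NULL) *locale_ok = ok;
    assert(charset != NULL);   /* POSIX returns a string, possibly empty */
    return charset;
}
```

A program would typically call something like this once, at the top of main, before any conversion to or from the locale is attempted.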
+
+The second parameter of the first function can return the number of
+conversion failures; pass NULL if you are not interested
+(note that usually, the interesting information is just whether there
+were conversion failures or not, which is given by the integer
+being greater than zero or not). The second function doesn't need this,
+because any locale can be converted to UTF-8.
+
+void convert_set_unknown (const char *unknown);
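
The interplay between a failure count and a replacement string can be sketched with a toy converter from UTF-8 to ISO-8859-1. This is only an illustration under stated assumptions: sketch_utf8_to_latin1 is a hypothetical function, it expects valid UTF-8 input and a caller-supplied output array, and it says nothing about how the library performs its conversions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: converts valid UTF-8 to ISO-8859-1. Characters above
   U+00FF cannot be represented, so the string 'unknown' is written instead
   and the failure is counted. Returns the number of bytes written (without
   the terminating '\0'); stops early if 'out' is full. */
size_t sketch_utf8_to_latin1(const char *input, char *out, size_t out_size,
                             const char *unknown, int *conv_failures)
{
    const unsigned char *p = (const unsigned char *)input;
    size_t n = 0, ulen;
    int fails = 0;

    assert(input != NULL && out != NULL && unknown != NULL && out_size > 0);
    ulen = strlen(unknown);

    while (*p) {
        if (*p < 0x80) {                              /* ASCII: copy as is */
            if (n + 1 >= out_size) break;
            out[n++] = (char)*p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            /* Two-byte sequence for U+0080..U+00FF: fits in one byte. */
            if (n + 1 >= out_size) break;
            out[n++] = (char)(((*p & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            /* Not representable: emit the replacement, skip the sequence. */
            if (n + ulen >= out_size) break;
            memcpy(out + n, unknown, ulen);
            n += ulen;
            fails++;
            p++;
            while ((*p & 0xC0) == 0x80) p++;
        }
    }
    out[n] = '\0';
    if (conv_failures != NULL) *conv_failures = fails;   /* NULL: not wanted */
    return n;
}
```

With "?" as the replacement, a caller that only wants to know whether anything was lost can simply test the count for being greater than zero.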