From 8c784f82961c03c9e7d209a4249f2590cfaaf4c9 Mon Sep 17 00:00:00 2001
From: Peter Verthez
Date: Fri, 22 Nov 2002 21:32:34 +0000
Subject: [PATCH] Updates for string and UTF-8 functions.

---
 doc/gom.html | 16 ++++++++++------
 doc/usage.html | 33 +++++++++++++++++++++++----------
 2 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/doc/gom.html b/doc/gom.html
index f36dbd0..a0829bc 100644
--- a/doc/gom.html
+++ b/doc/gom.html
@@ -134,16 +134,16 @@ or locale-defined strings.

The following functions retrieve and set the string in UTF-8 encoding:
char* gom_get_string (char* data);
-char* gom_set_string (char** data, const char* utf8_str);

+char* gom_set_string (char** data, const char* str_in_utf8);
The first function is in fact superfluous, because it just returns the data, but it is there for symmetry with the functions given below for the locale-defined input and output.  

The second function returns the new value if successful, or NULL -if an error occurred (e.g. failure to allocate memory).  It makes a +if an error occurred (e.g. failure to allocate memory or the given string is not a valid UTF-8 string).  It makes a copy of the input string to store it in the object model.  It also takes care of deallocating the old value of the data if needed.  Note that the set function needs the address of the data variable, to be able to modify -it.
+it.  In the case of an error, the target data variable is not modified.
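The contract described above (copy the input, free the old value, return NULL and leave the target untouched on error) can be sketched as follows. This is only an illustration under stated assumptions, not the library's implementation: `sketch_set_string` and `sketch_looks_valid` are hypothetical names, and the validity check is a deliberately crude placeholder.

```c
#include <stdlib.h>
#include <string.h>

/* Crude placeholder check (assumption): rejects only bytes that can never
   occur in UTF-8.  A real validator inspects whole byte sequences. */
static int sketch_looks_valid(const char *s)
{
    const unsigned char *p;
    for (p = (const unsigned char *)s; *p; p++)
        if (*p == 0xC0 || *p == 0xC1 || *p >= 0xF5)
            return 0;
    return 1;
}

/* Hypothetical setter following the documented contract: returns the new
   value on success, or NULL on error without modifying *data. */
char *sketch_set_string(char **data, const char *str)
{
    char *copy;
    size_t len;

    if (data == NULL || str == NULL || !sketch_looks_valid(str))
        return NULL;            /* invalid input: *data is left untouched */

    len = strlen(str) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return NULL;            /* allocation failed: *data is left untouched */
    memcpy(copy, str, len);

    free(*data);                /* release the old value; free(NULL) is a no-op */
    *data = copy;
    return copy;
}
```

Because the copy is made before the old value is freed, a failed call never leaves the caller with a dangling pointer, which is the point of passing the address of the data variable.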

Examples of use of these strings would be, e.g. for retrieving and setting the system ID in the header:
struct header* head = gom_get_header();
@@ -157,16 +157,18 @@ char* newvalue = "My_Gedcom_Tool";

A second couple of functions retrieve and set the string in the format defined by the current locale:
char* gom_get_string_for_locale (char* data, int* conversion_failures);
-char* gom_set_string_for_locale (char** data, const char* locale_str);
+char* gom_set_string_for_locale (char** data, const char* str_in_locale);
The use of these functions is the same as the previous ones, but e.g. in the "en_US" locale the string will be returned by the first function in the -ISO-8859-1 encoding and the second function expects the locale_str to be in this encoding.  Conversion to and from UTF-8 for the object model is done on the fly.
+ISO-8859-1 encoding and the second function expects the str_in_locale to be in this encoding.  Conversion to and from UTF-8 for the object model is done on the fly.

Since the conversion from UTF-8 to the locale encoding is not always possible, the get function has a second parameter that can return the number of conversion failures for the result string.  Pass a pointer to an integer if you -want to know this.  You can pass NULL if you're not interested.
+want to know this.  You can pass NULL if you're not interested.  The function returns NULL +if an error occurred (e.g. if the given string is not a valid string for +the current locale); in that case the target data variable is not modified.

$Id$
$Name$

@@ -179,4 +181,6 @@ want to know this.  You can pass NULL if you're not interested


+
+
\ No newline at end of file
diff --git a/doc/usage.html b/doc/usage.html
index 00bddef..0a8c7e8 100644
--- a/doc/usage.html
+++ b/doc/usage.html
@@ -516,17 +516,29 @@ controls the gettext mechanism in the application.  

The source distribution of -gedcom-parse contains an example implementation (utf8-locale.c - and utf8-locale.h in the "t" subdirectory of the top directory).  - Feel free to use it in your source code (it is not part of the library, -and it isn't installed anywhere, so you need to take over the source and -header file in your application).  
+gedcom-parse contains a library implementing help functions for UTF-8 encoding (see +the "utf8" subdirectory of the top directory).   Feel free to use + it in your source code.  It isn't installed anywhere, so you need +to take over the source and header files in your application. Note that on +some systems it uses libcharset, which is also included in this subdirectory. +  

- Its interface is:
+ Its interface contains first of all the following two help functions:
-
char *convert_utf8_to_locale (char *input, int *conv_failures);
char *convert_locale_to_utf8 (char *input);
+
int   is_utf8_string (char *input);
int utf8_strlen (char *input);
The +first one returns 1 if the given input is a valid UTF-8 string and +0 otherwise; the second gives the number of UTF-8 characters in the given +input.  Note that the second function assumes that the input is valid +UTF-8, and gives unpredictable results if it isn't.
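To illustrate what such a validity check and character count involve, here is a minimal sketch. It is not the code from the "utf8" subdirectory: `sketch_is_utf8` and `sketch_utf8_strlen` are hypothetical stand-ins, written under the assumption that "valid" means well-formed UTF-8 with no overlong forms or surrogates.

```c
#include <stddef.h>

/* Hypothetical stand-in: returns 1 if s is well-formed UTF-8, 0 otherwise. */
int sketch_is_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned long cp;
        int extra, n;

        /* Decode the lead byte: payload bits and number of continuation bytes. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return 0;  /* stray continuation byte or invalid lead byte */
        p++;

        for (n = 0; n < extra; n++) {
            if ((*p & 0xC0) != 0x80)
                return 0;  /* also catches a string that ends mid-sequence */
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
            p++;
        }

        /* Reject overlong encodings, surrogates and out-of-range values. */
        if (extra == 1 && cp < 0x80) return 0;
        if (extra == 2 && cp < 0x800) return 0;
        if (extra == 3 && cp < 0x10000) return 0;
        if (cp > 0x10FFFF) return 0;
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    }
    return 1;
}

/* Hypothetical stand-in: counts characters by counting every byte that is
   not a continuation byte.  Like the documented function, it assumes the
   input is already valid UTF-8. */
int sketch_utf8_strlen(const char *s)
{
    const unsigned char *p;
    int count = 0;

    for (p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            count++;
    return count;
}
```

The counting function does no validation at all, which is why a caller would normally run the validity check first and only count characters when it returns 1.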
+
+For conversion, the following functions are available:
+
+
char *convert_utf8_to_locale (char *input, int *conv_failures);
char *convert_locale_to_utf8 (char *input);
+
+
+ Both functions return a pointer to a static buffer that is overwritten on each call.  To function properly, the application must first set the locale using the setlocale function (the second step detailed @@ -674,9 +686,9 @@ handle needs to be closed (when the program exits):
iconv_close(iconv_handle);
-
- The example implementation - mentioned above grows the output buffer dynamically and outputs "?" for characters + + The example implementation +mentioned above grows the output buffer dynamically and outputs "?" for characters that can't be converted.
@@ -730,4 +742,5 @@ There are three preprocessor symbols defined for version checks in the


+
\ No newline at end of file
--
2.30.2