From: Peter Verthez <Peter.Verthez@advalvas.be>
Date: Fri, 22 Nov 2002 21:32:34 +0000 (+0000)
Subject: Updates for string and UTF-8 functions.
X-Git-Url: https://git.dlugolecki.net.pl/?a=commitdiff_plain;h=8c784f82961c03c9e7d209a4249f2590cfaaf4c9;p=gedcom-parse.git

Updates for string and UTF-8 functions.
---

diff --git a/doc/gom.html b/doc/gom.html
index f36dbd0..a0829bc 100644
--- a/doc/gom.html
+++ b/doc/gom.html
@@ -134,16 +134,16 @@ or locale-defined strings.<br>
 <br>
 The following functions retrieve and set the string in UTF-8 encoding:<br>
 <blockquote><code>char* <b>gom_get_string</b> (char* data);<br>
-char* <b>gom_set_string</b> (char** data, const char* utf8_str);</code><br>
+char* <b>gom_set_string</b> (char** data, const char* str_in_utf8);</code><br>
 </blockquote>
 The first function is in fact superfluous, because it just returns the <code>data</code>, but it is there for symmetry with the functions given below for the locale-defined input and output. &nbsp;<br>
 <br>
 The second function returns the new value if successful, or <code>NULL</code>
-if an error occurred (e.g. failure to allocate memory). &nbsp;It makes a
+if an error occurred (e.g. failure to allocate memory, or the given string not being a valid UTF-8 string). &nbsp;It makes a
 copy of the input string to store it in the object model. &nbsp;It also takes
 care of deallocating the old value of the data if needed. &nbsp;Note that
 the set function needs the address of the data variable, to be able to modify
-it.<br>
+it. &nbsp;In the case of an error, the target data variable is not modified.<br>
 <br>
 Examples of use of these strings would be, e.g. for retrieving and setting the system ID in the header:<br>
 <blockquote><code>struct header* head = gom_get_header();</code><code></code><br>
@@ -157,16 +157,18 @@ char* newvalue = "My_Gedcom_Tool";<br>
 <br>
 A second couple of functions retrieve and set the string in the format defined by the current locale:<br>
 <blockquote><code>char* <b>gom_get_string_for_locale</b> (char* data, int* conversion_failures);<br>
-char* <b>gom_set_string_for_locale</b> (char** data, const char* locale_str)</code>;<br>
+char* <b>gom_set_string_for_locale</b> (char** data, const char* str_in_locale)</code>;<br>
 </blockquote>
 The use of these functions is the same as the previous ones, but e.g. in
 the "en_US" locale the string will be returned by the first function in the
-ISO-8859-1 encoding and the second function expects the <code>locale_str</code> to be in this encoding. &nbsp;Conversion to and from UTF-8 for the object model is done on the fly.<br>
+ISO-8859-1 encoding and the second function expects the <code>str_in_locale</code> to be in this encoding. &nbsp;Conversion to and from UTF-8 for the object model is done on the fly.<br>
 <br>
 Since the conversion from UTF-8 to the locale encoding is not always possible,
 the get function has a second parameter that can return the number of conversion
 failures for the result string. &nbsp;Pass a pointer to an integer if you
-want to know this. &nbsp;You can pass <code>NULL</code> if you're not interested.<br>
+want to know this. &nbsp;You can pass <code>NULL</code> if you're not interested. &nbsp;The set function returns <code>NULL</code>
+if an error occurred (e.g. if the given string is not a valid string for
+the current locale); in that case the target data variable is not modified.<br>
 <hr width="100%" size="2">
 <pre><font size="-1">$Id$<br>$Name$</font><br></pre>
 
@@ -179,4 +181,6 @@ want to know this. &nbsp;You can pass <code>NULL</code> if you're not interested
 <br>
 <br>
 <br>
+<br>
+<br>
 </body></html>
\ No newline at end of file
diff --git a/doc/usage.html b/doc/usage.html
index 00bddef..0a8c7e8 100644
--- a/doc/usage.html
+++ b/doc/usage.html
@@ -516,17 +516,29 @@ controls the <code>gettext</code>  mechanism in the application. &nbsp;<br>
                        <br>
                                                                         
                                         The source distribution of <code>
-gedcom-parse</code>   contains an example implementation (<code>utf8-locale.c</code>
- and <code>  utf8-locale.h</code>  in the "t" subdirectory of the top directory).&nbsp;
-&nbsp;Feel free to use  it in your source code (it is not part of the library,
-and it isn't installed  anywhere, so you need to take over the source and
-header file in your application).  &nbsp;<br>
+gedcom-parse</code>   contains a library implementing helper functions for UTF-8 encoding (<code></code>see
+the "utf8" subdirectory of the top directory).&nbsp; &nbsp;Feel free to use
+ it in your source code. &nbsp;It isn't installed  anywhere, so you need
+to copy the source and header files into your application. Note that on
+some systems it uses libcharset, which is also included in this subdirectory.
+ &nbsp;<br>
                        <br>
-    Its interface is:<br>
+    Its interface first of all contains the following two helper functions:<br>
                          
 <blockquote>      
-  <pre><code>char *<b>convert_utf8_to_locale</b> (char *input, int *conv_failures);<br>char *<b>convert_locale_to_utf8</b> (char *input);<br></code></pre>
+  <pre><code>int   <b>is_utf8_string</b> (char *input);<br>int   <b>utf8_strlen</b> (char *input);<br></code></pre></blockquote>The
+first one returns 1 if the given input is a valid UTF-8 string and 0
+otherwise; the second gives the number of UTF-8 characters in the given
+input. &nbsp;Note that the second function assumes that the input is valid
+UTF-8, and gives unpredictable results if it isn't.<br>
+<br>
+For conversion, the following functions are available:<br>
+<blockquote>
+  <pre><code></code><code>char *<b>convert_utf8_to_locale</b> (char *input, int *conv_failures);<br>char *<b>convert_locale_to_utf8</b> (char *input);<br></code></pre>
+</blockquote>
+<blockquote>
   </blockquote>
+
     Both functions return a pointer to a static buffer that is overwritten
  on each call. &nbsp;To function properly, the application must first set
 the locale using the <code>setlocale</code> function (the second step detailed
@@ -674,9 +686,9 @@ handle needs to be closed (when the program exits):<br>
   <blockquote>                                                     
     <pre><code>iconv_close(iconv_handle);<br></code></pre>
                                              </blockquote>
-                                             </blockquote>
-                                                  The example implementation 
- mentioned above grows the output buffer dynamically and outputs "?" for characters
+                                             </blockquote> 
+                                                  The example implementation
+mentioned above grows the output buffer dynamically and outputs "?" for characters 
  that can't be converted.<br>
                                                                         
                          
@@ -730,4 +742,5 @@ There are three preprocessor symbols defined for version checks in the
 <br>
 <br>
 <br>
+<br>
 </body></html>
\ No newline at end of file