<br>
The following functions retrieve and set the string in UTF-8 encoding:<br>
<blockquote><code>char* <b>gom_get_string</b> (char* data);<br>
-char* <b>gom_set_string</b> (char** data, const char* utf8_str);</code><br>
+char* <b>gom_set_string</b> (char** data, const char* str_in_utf8);</code><br>
</blockquote>
The first function is in fact superfluous, because it just returns the <code>data</code>, but it is there for symmetry with the functions given below for the locale-defined input and output. <br>
<br>
The second function returns the new value if successful, or <code>NULL</code>
-if an error occurred (e.g. failure to allocate memory). It makes a
+if an error occurred (e.g. failure to allocate memory, or the given string is not valid UTF-8). It makes a
copy of the input string to store it in the object model. It also takes
care of deallocating the old value of the data if needed. Note that
the set function needs the address of the data variable, to be able to modify
-it.<br>
+it. In the case of an error, the target data variable is not modified.<br>
<br>
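The contract described above (copy the input, release the old value, leave the target untouched on error) can be illustrated with a small self-contained sketch. The names <code>sketch_set_string</code> and <code>looks_like_utf8</code> are hypothetical and are not part of the gedcom-parse API. The UTF-8 test here is only structural (a lead byte followed by the expected number of continuation bytes), which is an assumption for illustration and not the library's own check.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structural UTF-8 check: every lead byte must be followed by
 * the expected number of continuation bytes (10xxxxxx). Overlong forms and
 * surrogates are NOT rejected by this simplified test. */
static int looks_like_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        int extra;
        if (*p < 0x80)                extra = 0;
        else if ((*p & 0xE0) == 0xC0) extra = 1;
        else if ((*p & 0xF0) == 0xE0) extra = 2;
        else if ((*p & 0xF8) == 0xF0) extra = 3;
        else return 0;                 /* stray continuation or invalid lead */
        p++;
        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)   /* also stops at the terminating NUL */
                return 0;
            p++;
        }
    }
    return 1;
}

/* Hypothetical setter following the documented contract: returns the new
 * value, or NULL on error, in which case *data is left unchanged.
 * This sketch treats a NULL input string as an error. */
char *sketch_set_string(char **data, const char *str_in_utf8)
{
    char *copy;
    if (str_in_utf8 == NULL || !looks_like_utf8(str_in_utf8))
        return NULL;                   /* invalid input: keep the old value */
    copy = malloc(strlen(str_in_utf8) + 1);
    if (copy == NULL)
        return NULL;                   /* allocation failure: keep the old value */
    strcpy(copy, str_in_utf8);
    free(*data);                       /* release the old value (free(NULL) is a no-op) */
    *data = copy;
    return copy;
}
```

The setter can only replace the stored pointer because it receives the address of the variable. After a failed call, the variable still holds its previous value.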
These functions can be used, for example, to retrieve and set the system ID in the header:<br>
<blockquote><code>struct header* head = gom_get_header();</code><br>
<br>
A second couple of functions retrieve and set the string in the format defined by the current locale:<br>
<blockquote><code>char* <b>gom_get_string_for_locale</b> (char* data, int* conversion_failures);<br>
-char* <b>gom_set_string_for_locale</b> (char** data, const char* locale_str)</code>;<br>
+char* <b>gom_set_string_for_locale</b> (char** data, const char* str_in_locale);</code><br>
</blockquote>
The use of these functions is the same as the previous ones, but e.g. in
the "en_US" locale the string will be returned by the first function in the
-ISO-8859-1 encoding and the second function expects the <code>locale_str</code> to be in this encoding. Conversion to and from UTF-8 for the object model is done on the fly.<br>
+ISO-8859-1 encoding and the second function expects the <code>str_in_locale</code> to be in this encoding. Conversion to and from UTF-8 for the object model is done on the fly.<br>
<br>
Since the conversion from UTF-8 to the locale encoding is not always possible,
the get function has a second parameter that can return the number of conversion
failures for the result string. Pass a pointer to an integer if you
-want to know this. You can pass <code>NULL</code> if you're not interested.<br>
+want to know this. You can pass <code>NULL</code> if you're not interested. The set function returns <code>NULL</code>
+if an error occurred (e.g. if the given string is not valid in the encoding of
+the current locale); in that case the target data variable is not modified.<br>
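The following sketch makes the conversion-failure count concrete. It converts UTF-8 to ISO-8859-1 only (the encoding of the "en_US" example above), writing "?" for every character above U+00FF and counting each one as a failure. <code>utf8_to_latin1</code> is a hypothetical helper. The library converts according to the current locale rather than hard-coding ISO-8859-1, and the sketch assumes its input is already valid UTF-8.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert valid UTF-8 to ISO-8859-1. Code points above
 * U+00FF cannot be represented and become '?'; each one is counted and the
 * total is stored in *conv_failures if that pointer is non-NULL.
 * The caller frees the result. Input is assumed to be valid UTF-8. */
char *utf8_to_latin1(const char *in, int *conv_failures)
{
    const unsigned char *p = (const unsigned char *)in;
    char *out = malloc(strlen(in) + 1);   /* output is never longer than input */
    char *o = out;
    int failures = 0;

    if (out == NULL)
        return NULL;
    while (*p) {
        unsigned long cp;
        int extra;
        /* Decode one UTF-8 sequence into a code point. */
        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else                          { cp = *p & 0x07; extra = 3; }
        p++;
        while (extra-- > 0)
            cp = (cp << 6) | (*p++ & 0x3F);
        if (cp <= 0xFF) {
            *o++ = (char)(unsigned char)cp;   /* Latin-1 byte == code point */
        } else {
            *o++ = '?';                       /* not representable: substitute */
            failures++;
        }
    }
    *o = '\0';
    if (conv_failures != NULL)
        *conv_failures = failures;
    return out;
}
```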
<hr width="100%" size="2">
<pre><font size="-1">$Id$<br>$Name$</font><br></pre>
<br>
<br>
<br>
+<br>
+<br>
</body></html>
\ No newline at end of file
<br>
The source distribution of <code>
-gedcom-parse</code> contains an example implementation (<code>utf8-locale.c</code>
- and <code> utf8-locale.h</code> in the "t" subdirectory of the top directory).
- Feel free to use it in your source code (it is not part of the library,
-and it isn't installed anywhere, so you need to take over the source and
-header file in your application). <br>
+gedcom-parse</code> contains a library implementing helper functions for UTF-8 encoding (see
+the "utf8" subdirectory of the top directory). Feel free to use
+ it in your source code. It isn't installed anywhere, so you need
+to copy the source and header files into your application. Note that on
+some systems it uses libcharset, which is also included in this subdirectory.
+ <br>
<br>
- Its interface is:<br>
+ Its interface contains, first of all, the following two helper functions:<br>
<blockquote>
- <pre><code>char *<b>convert_utf8_to_locale</b> (char *input, int *conv_failures);<br>char *<b>convert_locale_to_utf8</b> (char *input);<br></code></pre>
+ <pre><code>int <b>is_utf8_string</b> (char *input);<br>int <b>utf8_strlen</b> (char *input);<br></code></pre></blockquote>The
+first one returns 1 if the given input is a valid UTF-8 string and 0
+otherwise; the second returns the number of UTF-8 characters in the given
+input. Note that the second function assumes that the input is valid
+UTF-8, and gives unpredictable results if it isn't.<br>
+<br>
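The semantics of these two helper functions can be sketched as follows. This is not the code from the "utf8" subdirectory, and the names <code>sketch_is_utf8_string</code> and <code>sketch_utf8_strlen</code> are hypothetical. The validator follows the well-formedness rules of RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF), which may be stricter or looser than the library's own check. The length function shows why valid input is required: it simply counts every byte that is not a continuation byte.

```c
/* Returns 1 if s is well-formed UTF-8 per RFC 3629, 0 otherwise. */
int sketch_is_utf8_string(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        unsigned long cp;
        int extra, i;
        if (*p < 0x80)                { p++; continue; }   /* plain ASCII */
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return 0;                 /* continuation byte or 0xF8..0xFF as lead */
        for (i = 1; i <= extra; i++) {
            if ((p[i] & 0xC0) != 0x80) /* also stops at the terminating NUL */
                return 0;
            cp = (cp << 6) | (p[i] & 0x3F);
        }
        if ((extra == 1 && cp < 0x80) ||      /* overlong encodings */
            (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || /* UTF-16 surrogates */
            cp > 0x10FFFF)                    /* beyond the Unicode range */
            return 0;
        p += extra + 1;
    }
    return 1;
}

/* Counts characters; assumes s is valid UTF-8 (unpredictable otherwise). */
int sketch_utf8_strlen(const char *s)
{
    int n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)  /* skip continuation bytes */
            n++;
    return n;
}
```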
+For conversion, the following functions are available:<br>
+<blockquote>
+ <pre><code>char *<b>convert_utf8_to_locale</b> (char *input, int *conv_failures);<br>char *<b>convert_locale_to_utf8</b> (char *input);<br></code></pre>
+</blockquote>
+<blockquote>
</blockquote>
+
Both functions return a pointer to a static buffer that is overwritten
on each call. To function properly, the application must first set
the locale using the <code>setlocale</code> function (the second step detailed
<blockquote>
<pre><code>iconv_close(iconv_handle);<br></code></pre>
</blockquote>
- </blockquote>
- The example implementation
- mentioned above grows the output buffer dynamically and outputs "?" for characters
+ </blockquote>
+ The example implementation
+mentioned above grows the output buffer dynamically and outputs "?" for characters
that can't be converted.<br>
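The strategy described here (grow the output buffer on demand and emit "?" for characters that can't be converted) can be sketched with the POSIX <code>iconv</code> interface. <code>convert_with_iconv</code> is a hypothetical helper, not the code shipped with gedcom-parse, and it differs from the functions above in several ways:
<ul>
<li>it returns a freshly allocated string that the caller must free, not a static buffer;</li>
<li>it takes the target encoding as an argument instead of querying the locale;</li>
<li>it assumes an ASCII-compatible target encoding, since "?" is written as a single byte.</li>
</ul>

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Double the output buffer, keeping the write position and remaining space
 * consistent. Returns 0 on success, -1 if realloc fails (old buffer kept). */
static int grow(char **buf, size_t *size, char **outp, size_t *outleft)
{
    size_t used = (size_t)(*outp - *buf);
    char *nbuf = realloc(*buf, *size * 2);

    if (nbuf == NULL)
        return -1;
    *buf = nbuf;
    *size *= 2;
    *outp = nbuf + used;
    *outleft = *size - used;
    return 0;
}

/* Hypothetical helper: convert UTF-8 `input` to `tocode`, writing '?' for
 * each character iconv reports as unconvertible or malformed. Returns a
 * malloc'ed, NUL-terminated string (caller frees) or NULL on error; the
 * number of substitutions is stored in *conv_failures if non-NULL. */
char *convert_with_iconv(const char *input, const char *tocode,
                         int *conv_failures)
{
    iconv_t cd = iconv_open(tocode, "UTF-8");
    char *in = (char *)input;          /* iconv() takes char **, not const */
    size_t inleft = strlen(input);
    size_t size = inleft + 16;         /* initial guess; grown on E2BIG */
    size_t outleft;
    char *buf, *outp;
    int failures = 0;

    if (cd == (iconv_t)-1)
        return NULL;                   /* unsupported encoding name */
    buf = malloc(size);
    if (buf == NULL) {
        iconv_close(cd);
        return NULL;
    }
    outp = buf;
    outleft = size;

    while (inleft > 0) {
        if (iconv(cd, &in, &inleft, &outp, &outleft) != (size_t)-1)
            break;                     /* all remaining input converted */
        if (errno == E2BIG) {
            if (grow(&buf, &size, &outp, &outleft) != 0)
                goto fail;
        } else if (errno == EILSEQ || errno == EINVAL) {
            /* Emit '?' and skip the offending lead byte plus any UTF-8
             * continuation bytes (10xxxxxx) that follow it. */
            if (outleft == 0 && grow(&buf, &size, &outp, &outleft) != 0)
                goto fail;
            *outp++ = '?';
            outleft--;
            failures++;
            in++;
            inleft--;
            while (inleft > 0 && ((unsigned char)*in & 0xC0) == 0x80) {
                in++;
                inleft--;
            }
        } else {
            goto fail;
        }
    }

    /* Flush any pending shift state (a no-op for stateless encodings). */
    while (iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        if (errno != E2BIG || grow(&buf, &size, &outp, &outleft) != 0)
            goto fail;
    }
    if (outleft == 0 && grow(&buf, &size, &outp, &outleft) != 0)
        goto fail;
    *outp = '\0';                      /* room for the terminator is ensured */

    iconv_close(cd);
    if (conv_failures != NULL)
        *conv_failures = failures;
    return buf;

fail:
    free(buf);
    iconv_close(cd);
    return NULL;
}
```

To follow the current locale instead of a fixed encoding, the target name can be obtained with <code>nl_langinfo(CODESET)</code> after calling <code>setlocale(LC_ALL, "")</code>. Some iconv implementations substitute unconvertible characters themselves instead of failing with <code>EILSEQ</code>, in which case this sketch produces no "?".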
<br>
<br>
<br>
+<br>
</body></html>
\ No newline at end of file