+<h3><a name="Compatibility_mode"></a>Compatibility mode<br>
+ </h3>
+ Applications do not always follow the GEDCOM spec strictly (or they use a
+version other than 5.5). The library is intended to be
+resilient to this: it switches to a compatibility mode for files written by specific
+programs (detected via the HEAD.SOUR tag). This compatibility mode
+can be enabled and disabled via the following function:<br>
+
+
+<blockquote><code>void <b>gedcom_set_compat_handling</b> (int enable_compat)</code><br>
+ </blockquote>
+ The argument can be:<br>
+
+
+<ul>
+ <li>0: disable compatibility mode</li>
+ <li>1: allow compatibility mode (this is the
+default)<br>
+ </li>
+
+
+</ul>
+ Note that, currently, no actual compatibility code is present, but
+this is on the to-do list.<br>
+
+<hr width="100%" size="2">
+<h2><a name="Converting_character_sets"></a>Converting character sets</h2>
+ All strings passed by the GEDCOM parser to the application are in UTF-8
+ encoding. Typically, an application needs to convert them to another
+ character set before it can display them.<br>
+ <br>
+ The most common case is that the output character set is controlled by
+the <code>locale</code> mechanism (i.e. via the <code>LANG</code>, <code>
+ LC_ALL</code> or <code>LC_CTYPE</code> environment variables), which also
+controls the <code>gettext</code> mechanism in the application. <br>
+ <br>
+ <br>
+
+ The source distribution of <code>
+gedcom-parse</code> contains an example implementation (<code>utf8-locale.c</code>
+ and <code> utf8-locale.h</code> in the "t" subdirectory of the top directory).
+ Feel free to use it in your source code (it is not part of the library,
+and it isn't installed anywhere, so you need to copy the source and
+header file into your application). <br>
+ <br>
+ Its interface is:<br>
+
+<blockquote>
+ <pre><code>char *<b>convert_utf8_to_locale</b> (char *input, int *conv_failures);<br>char *<b>convert_locale_to_utf8</b> (char *input);<br></code></pre>
+ </blockquote>
+ Both functions return a pointer to a static buffer that is overwritten
+ on each call. For them to work properly, the application must first set
+the locale using the <code>setlocale</code> function (the second step detailed
+ below). All other steps given below, including setting up and closing
+ down the conversion handles, are transparently handled by the two functions.
+ <br>
+ <br>
+ If you pass a pointer to an integer as the second argument of the first
+function, it will be set to the number of conversion failures, i.e. characters
+that couldn't be converted; you can also just pass <code>NULL</code> if you are
+not interested (usually, the only interesting information is whether there <i>
+were</i> any conversion failures, i.e. whether the integer is greater than
+zero). The second function doesn't need this,
+because any locale can be converted to UTF-8.<br>
+ <br>
+ By default, a "?" is output for each character that can't be converted.
+ You can replace it with any string you want by calling the following
+ function before the conversion calls:<br>
+
+<blockquote>
+ <pre><code>void <b>convert_set_unknown</b> (const char *unknown);</code></pre>
+ </blockquote>
+ <br>
+ If you prefer to write your own conversion functions instead of using this
+example implementation, the application needs to take the following steps
+(more details can be found in the info file of the GNU libc library
+in the "Generic Charset Conversion" section under "Character Set Handling"
+or online <a
+ href="http://www.gnu.org/manual/glibc-2.2.3/html_chapter/libc_6.html#SEC99">
+ here</a>):<br>
+
+<ul>
+ <li>inclusion of some headers:</li>
+
+</ul>
+
+<blockquote>
+ <blockquote>
+ <pre><code>#include <locale.h> /* for setlocale */<br>#include <langinfo.h> /* for nl_langinfo */<br>#include <iconv.h> /* for iconv_* functions */<br></code></pre>
+ </blockquote>
+ </blockquote>
+
+<ul>
+ <li>set the program's current locale to what
+the user configured in the environment:</li>
+
+</ul>
+
+<blockquote>
+ <blockquote>
+ <pre><code>setlocale(LC_ALL, "");</code><br></pre>
+ </blockquote>
+ </blockquote>
+
+<ul>
+ <li>open a conversion handle for conversion
+ from UTF-8 to the character set of the current locale (once for the entire
+ program):</li>
+
+</ul>
+
+<blockquote>
+ <blockquote>
+ <pre><code>iconv_t iconv_handle;<br>...<br>iconv_handle = iconv_open(nl_langinfo(CODESET), "UTF-8");<br>if (iconv_handle == (iconv_t) -1)<br> /* signal an error */<br></code></pre>
+ </blockquote>
+ </blockquote>
+
+<ul>
+ <li>then, every string can be converted
+ using the following:</li>
+
+</ul>
+
+<blockquote>
+ <blockquote>
+ <pre><code>/* char* in_buf is the input buffer, size_t in_len is its length */<br>/* char* out_buf is the output buffer, size_t out_len is its length */<br><br>size_t nconv;<br>char *in_ptr = in_buf;<br>char *out_ptr = out_buf;<br>nconv = iconv(iconv_handle, &in_ptr, &in_len, &out_ptr, &out_len);</code></pre>
+ </blockquote>
+ </blockquote>
+
+<blockquote>If the output buffer is not big enough, <code>iconv</code> will
+ return -1 and set <code>errno</code> to <code>E2BIG</code>. Also,
+the <code>in_ptr</code> and <code>out_ptr</code> will point just after
+the last successfully converted character in the respective buffers, and
+the <code> in_len</code> and <code>out_len</code> will be updated to show
+the remaining lengths. There are two possible strategies here:<br>
+
+ <ul>
+ <li>Make sure from the beginning
+ that the output buffer is big enough. However, it's difficult to find
+ an absolute maximum length in advance, even given the length of the input
+ string.<br>
+ <br>
+ </li>
+ <li>Do the conversion in several
+ steps, growing the output buffer each time to make more space, and calling
+ <code>iconv</code> consecutively until the conversion is complete.
+ This is the preferred way (a function could be written to encapsulate
+ all this).</li>
+
+ </ul>
+ Another error case is when the conversion was unsuccessful (if one of
+the characters can't be represented in the target character set). The
+ <code> iconv</code> function will then also return -1 and set <code>errno</code>
+ to <code>EILSEQ</code>; the <code>in_ptr</code> will point to the character
+ that couldn't be converted. In that case, again two strategies are
+possible:<br>
+
+ <ul>
+ <li>Just fail the conversion,
+and show an error. This is not very user friendly, of course.<br>
+ <br>
+ </li>
+ <li>Skip over the character that
+ can't be converted and append a "?" to the output buffer, then call <code>
+ iconv</code> again. Skipping over a UTF-8 character is fairly simple,
+ as follows from the <a
+ href="http://www.cl.cam.ac.uk/%7Emgk25/unicode.html#utf-8">encoding rules</a>
+ :</li>
+
+ </ul>
+
+ <ol>
+
+ <ol>
+ <li>if the first byte is in
+binary 0xxxxxxx, then the character is only one byte long, just skip over
+that byte<br>
+ <br>
+ </li>
+ <li>if the first byte is in
+binary 11xxxxxx, then skip over that byte and all bytes 10xxxxxx that follow.<br>
+ </li>
+
+ </ol>
+
+ </ol>
+ </blockquote>
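+ The two skipping rules above can be sketched as a small C helper. The
+name <code>utf8_skip_length</code> is hypothetical (it is not part of
+gedcom-parse), and the sketch assumes a NUL-terminated input string:<br>

```c
#include <stddef.h>

/* Hypothetical helper: return the number of bytes to skip in order to
   move past the UTF-8 character that starts at s (0 at end of string). */
size_t utf8_skip_length(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;
    size_t n;

    if (*p == '\0')
        return 0;                       /* nothing left to skip */

    n = 1;                              /* the first byte is always skipped */
    if ((*p & 0xC0) == 0xC0) {          /* first byte is 11xxxxxx */
        while ((p[n] & 0xC0) == 0x80)   /* following bytes 10xxxxxx */
            n++;
    }
    return n;                           /* 0xxxxxxx: one byte only */
}
```

+ Advancing <code>in_ptr</code> by the returned value (and reducing
+<code>in_len</code> by the same amount) moves past the offending character
+before <code>iconv</code> is called again.<br>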
+
+<ul>
+ <li>finally, the conversion
+handle needs to be closed (when the program exits):<br>
+ </li>
+
+</ul>
+
+<blockquote>
+ <blockquote>
+ <pre><code>iconv_close(iconv_handle);<br></code></pre>
+ </blockquote>
+ </blockquote>
+ The example implementation
+ mentioned above grows the output buffer dynamically and outputs "?" for characters
+ that can't be converted.<br>
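+ As an illustration of these strategies, here is a minimal sketch of such
+an encapsulating function. It is not the code from <code>utf8-locale.c</code>;
+the name <code>utf8_to_charset</code>, the initial buffer size and the doubling
+policy are assumptions made for the example. It also assumes that "?" is a
+single byte in the target character set and that the target encoding is
+stateless (no final flush call is made):<br>

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of gedcom-parse: convert the NUL-terminated
   UTF-8 string `input` to `to_charset`.  Returns a malloc'ed, NUL-terminated
   string that the caller must free, or NULL on error.  If `failures` is not
   NULL, it receives the number of characters replaced by "?". */
char *utf8_to_charset(const char *to_charset, const char *input, int *failures)
{
    size_t in_len = strlen(input);
    size_t out_size = 16;            /* deliberately small initial size */
    size_t out_len = out_size - 1;   /* keep one byte for the final NUL */
    char *in_ptr = (char *) input;   /* iconv does not modify the input */
    char *out_buf, *out_ptr;
    int nfail = 0;
    iconv_t cd = iconv_open(to_charset, "UTF-8");

    if (cd == (iconv_t) -1)
        return NULL;
    out_buf = out_ptr = malloc(out_size);
    if (out_buf == NULL) {
        iconv_close(cd);
        return NULL;
    }

    while (in_len > 0) {
        int err;

        if (iconv(cd, &in_ptr, &in_len, &out_ptr, &out_len) != (size_t) -1)
            break;                   /* all input has been converted */
        err = errno;

        if (err == EILSEQ && out_len > 0) {
            /* Unconvertible character: emit "?" and skip it in the input. */
            unsigned char first = (unsigned char) *in_ptr;

            *out_ptr++ = '?';
            out_len--;
            nfail++;
            in_ptr++;
            in_len--;
            if ((first & 0xC0) == 0xC0) {
                while (in_len > 0 && ((unsigned char) *in_ptr & 0xC0) == 0x80) {
                    in_ptr++;
                    in_len--;
                }
            }
        } else if (err == E2BIG || err == EILSEQ) {
            /* Output buffer full: double it and continue where we stopped. */
            size_t used = (size_t) (out_ptr - out_buf);
            char *bigger = realloc(out_buf, out_size * 2);

            if (bigger == NULL) {
                free(out_buf);
                iconv_close(cd);
                return NULL;
            }
            out_buf = bigger;
            out_size *= 2;
            out_ptr = out_buf + used;
            out_len = out_size - 1 - used;
        } else {
            /* EINVAL (truncated input sequence) or an unexpected error. */
            free(out_buf);
            iconv_close(cd);
            return NULL;
        }
    }

    *out_ptr = '\0';
    iconv_close(cd);
    if (failures != NULL)
        *failures = nfail;
    return out_buf;
}
```

+ Unlike the example implementation, this sketch opens and closes the
+conversion handle on every call and returns freshly allocated memory instead
+of a static buffer; a real application would probably open the handle once,
+as described in the steps above.<br>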
+
+
+<hr width="100%" size="2">
+
+<h2><a name="Support_for_configure.in"></a>Support for configure.in</h2>
+ Programs using the GEDCOM parser library and using autoconf to configure
+ their sources can use the following statements in configure.in (the example
+ checks for gedcom-parse version 1.34 or later):<br>
+
+<blockquote><code>AC_CHECK_LIB(gedcom, gedcom_parse_file,,<br>
+ AC_MSG_ERROR(Cannot
+ find libgedcom: Please install gedcom-parse))<br>
+ AC_MSG_CHECKING(for libgedcom version)<br>
+ AC_TRY_RUN([<br>
+ #include <stdio.h><br>
+ #include <stdlib.h><br>
+ #include <gedcom.h><br>
+ int<br>
+ main()<br>
+ {<br>
+ if (GEDCOM_PARSE_VERSION >= 1034) exit(0);<br>
+ exit(1);<br>
+ }],<br>
+ ac_gedcom_version_ok='yes',<br>
+ ac_gedcom_version_ok='no',<br>
+ ac_gedcom_version_ok='no')<br>
+ if test "$ac_gedcom_version_ok" = 'yes' ; then<br>
+ AC_MSG_RESULT(ok)<br>
+ else<br>
+ AC_MSG_RESULT(not ok)<br>
+ AC_MSG_ERROR(You need at least version 1.34 of gedcom-parse)<br>
+ fi</code><br>
+ </blockquote>
+ There are three preprocessor symbols defined for version checks in the
+ header:<br>
+
+<ul>
+ <li><code>GEDCOM_PARSE_VERSION_MAJOR</code></li>
+ <li><code>GEDCOM_PARSE_VERSION_MINOR</code></li>
+ <li><code>GEDCOM_PARSE_VERSION</code><br>
+ </li>
+
+</ul>
+ The last one is equal to <code>(GEDCOM_PARSE_VERSION_MAJOR * 1000) + GEDCOM_PARSE_VERSION_MINOR</code>.<br>
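+ The arithmetic can be illustrated with a short sketch. The stand-in
+definitions only exist so that the fragment compiles without
+<code>gedcom.h</code>; <code>REQUIRED_VERSION</code> and
+<code>have_gedcom_version</code> are hypothetical names, not part of the
+library:<br>

```c
/* Stand-in values so that the sketch compiles on its own; a real program
   would get these three symbols from <gedcom.h> instead. */
#ifndef GEDCOM_PARSE_VERSION
#define GEDCOM_PARSE_VERSION_MAJOR 1
#define GEDCOM_PARSE_VERSION_MINOR 34
#define GEDCOM_PARSE_VERSION \
    ((GEDCOM_PARSE_VERSION_MAJOR * 1000) + GEDCOM_PARSE_VERSION_MINOR)
#endif

/* Hypothetical helper macro: combine major and minor in the same way. */
#define REQUIRED_VERSION(major, minor) (((major) * 1000) + (minor))

/* Compile-time check: refuse to build against an older header. */
#if GEDCOM_PARSE_VERSION < REQUIRED_VERSION(1, 34)
#error "gedcom-parse 1.34 or later is required"
#endif

/* Hypothetical run-time variant of the same comparison. */
int have_gedcom_version(int major, int minor)
{
    return GEDCOM_PARSE_VERSION >= REQUIRED_VERSION(major, minor);
}
```

+ With the stand-in values, version 1.34 yields 1 * 1000 + 34 = 1034, the
+constant used in the configure.in test above.<br>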
+
+<hr width="100%" size="2">
+
+<pre><font size="-1">$Id$<br>$Name$</font><br></pre>
+
+
+<pre> </pre>
+
+
+</body>
+</html>