+ </h3>
+ As described above, an application doesn't always implement the entire
+ GEDCOM spec, and application-specific tags may have been added by other applications.
+ To preserve this extra data anyway, a default callback can be registered
+ by the application, as in the following example:<br>
+
+ <blockquote><code>void <b>my_default_cb</b> (Gedcom_ctxt parent,
+ int level, char* tag, char* raw_value, int parsed_tag)<br>
+ {<br>
+ ...<br>
+ }<br>
+ <br>
+ ...<br>
+ <b>gedcom_set_default_callback</b>(my_default_cb);<br>
+ ...<br>
+ result = <b>gedcom_parse_file</b>("myfamily.ged");</code><br>
+ </blockquote>
+ This callback has a signature similar to the previous ones,
+ but it doesn't contain a parsed value. However, it does contain the
+ parent context, i.e. the context that the application returned for the most specific
+ containing tag that it supports.<br>
+ <br>
+ Suppose, for example, that this callback is called for some tags in the header
+that are specific to some other application. Our application could then make
+sure that the parent context contains the struct or object that represents
+ the header, and use the default callback to add the level, tag and
+raw_value as plain text to a member of that struct or object, thus preserving
+the information. The application can then write this text out again when the
+data is saved to a GEDCOM file. To make this more concrete, consider
+ the following example:<br>
+
+ <blockquote><code>struct header {<br>
+ char* source;<br>
+ ...<br>
+ char* extra_text;<br>
+ };<br>
+ <br>
+ Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char* tag,
+char *raw_value,<br>
+
+ int parsed_tag, Gedcom_val parsed_value)<br>
+ {<br>
+ struct header* head = my_make_header_struct();<br>
+ return (Gedcom_ctxt)head;<br>
+ }<br>
+ <br>
+ void my_default_cb(Gedcom_ctxt parent, int level, char* tag, char* raw_value,
+ int parsed_tag)<br>
+ {<br>
+ struct header* head = (struct header*)parent;<br>
+ my_header_add_to_extra_text(head, level, tag, raw_value);<br>
+ }<br>
+ <br>
+ gedcom_set_default_callback(my_default_cb);<br>
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, NULL);<br>
+ ...<br>
+ result = gedcom_parse_file(filename);</code><br>
+ </blockquote>
+ Note that the default callback will be called for any tag that the
+application hasn't specifically subscribed to, and can thus be called
+in various contexts. For simplicity, the example above doesn't take
+this into account (the <code>parent</code> could be of different
+types, depending on the context).<br>
+ <br>
+Note also that the default callback is not called when the parent context is <code>NULL</code>. This is the case, for example, if none of the "upper" tags has been subscribed to.<br>
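+ <br>
+One way to handle several parent types, sketched here under assumptions, is to let every context struct start with a common type field that the default callback inspects before casting. The names <code>ctxt_type</code>, <code>ctxt_base</code> and <code>append_line</code> are hypothetical and not part of libgedcom, and the <code>Gedcom_ctxt</code> typedef is only a stand-in so that the sketch compiles on its own (a real application would take it from the libgedcom header instead):<br>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for this sketch only; normally provided by the libgedcom header. */
typedef void *Gedcom_ctxt;

/* Hypothetical tag telling the default callback what the parent really is. */
enum ctxt_type { CTXT_HEADER, CTXT_FAMILY };

struct ctxt_base {
  enum ctxt_type type;
};

struct header {
  struct ctxt_base base;  /* must stay the first member */
  char *extra_text;       /* unsupported lines, kept as plain text */
};

/* Hypothetical helper: appends "level tag raw_value\n" to *text. */
static void append_line(char **text, int level, const char *tag,
                        const char *raw_value)
{
  const char *value = raw_value ? raw_value : "";
  size_t old_len = *text ? strlen(*text) : 0;
  int add = snprintf(NULL, 0, "%d %s %s\n", level, tag, value);
  char *grown;

  if (add < 0)
    return;
  grown = realloc(*text, old_len + (size_t) add + 1);
  if (grown == NULL)
    return;  /* keep the old text if memory runs out */
  snprintf(grown + old_len, (size_t) add + 1, "%d %s %s\n", level, tag, value);
  *text = grown;
}

void my_default_cb(Gedcom_ctxt parent, int level, char *tag, char *raw_value,
                   int parsed_tag)
{
  struct ctxt_base *base = (struct ctxt_base *) parent;

  (void) parsed_tag;  /* not needed in this sketch */
  switch (base->type) {
    case CTXT_HEADER:
      append_line(&((struct header *) parent)->extra_text,
                  level, tag, raw_value);
      break;
    default:
      /* other context types would get their own handling here */
      break;
  }
}
```

+In this sketch, lines arriving under a context type that the callback doesn't handle are silently dropped. An application that wants to preserve them would add a case for each context type it returns from its start callbacks.<br>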
+
+ <hr width="100%" size="2">
+
+ <h2><a name="Other_API_functions"></a>Other API functions<br>
+ </h2>
+ Although the above describes the basic interface of libgedcom, there
+are some other functions that allow the behaviour of the library to be customized.
+ These are explained in this section.<br>
+
+ <h3><a name="Debugging"></a>Debugging</h3>
+ The library can generate debugging output, both from libgedcom itself
+ and from the yacc parser. By default,
+ no debugging output is generated, but this can be changed using the
+following function:<br>
+
+ <blockquote><code>void <b>gedcom_set_debug_level</b> (int level,
+ FILE* trace_output)</code><br>
+ </blockquote>
+ The <code>level</code> can be one of the following values:<br>
+
+ <ul>
+ <li>0: no debugging information (this is the
+default)</li>
+ <li>1: only debugging information from libgedcom
+ itself</li>
+ <li>2: debugging information from libgedcom and
+ yacc</li>
+
+ </ul>
+ If the <code>trace_output</code> is <code>NULL</code>, debugging information
+ will be written to <code>stderr</code>, otherwise the given file handle
+is used (which must be open).<br>
+ <br>
+
+ <h3><a name="Error_treatment"></a>Error treatment</h3>
+ One of the previous sections already described the callback to be registered
+ to get error messages. The library also allows customizing what
+happens on an error, using the following function:<br>
+
+ <blockquote><code>void <b>gedcom_set_error_handling</b> (Gedcom_err_mech
+ mechanism)</code><br>
+ </blockquote>
+ The <code>mechanism</code> can be one of:<br>
+
+
+ <ul>
+ <li><code>IMMED_FAIL</code>: immediately fail the
+parsing on an error (this is the default)</li>
+ <li><code>DEFER_FAIL</code>: continue parsing after
+ an error, but return a failure code at the end</li>
+ <li><code>IGNORE_ERRORS</code>: continue parsing
+after an error, and always return success</li>
+
+
+ </ul>
+ This doesn't influence the generation of error or warning messages, only
+ the behaviour of the parser and its return code.<br>
+ <br>
+
+
+ <h3><a name="Compatibility_mode"></a>Compatibility mode<br>
+ </h3>
+ Applications do not necessarily adhere to the GEDCOM spec (or they use a different
+ version than 5.5). The intention is that the library is resilient
+to this, and switches to a compatibility mode for files written by specific programs
+ (detected via the HEAD.SOUR tag). This compatibility mode can be
+enabled and disabled via the following function:<br>
+
+
+ <blockquote><code>void <b>gedcom_set_compat_handling</b>
+ (int enable_compat)</code><br>
+ </blockquote>
+ The argument can be:<br>
+
+
+ <ul>
+ <li>0: disable compatibility mode</li>
+ <li>1: allow compatibility mode (this is the default)<br>
+ </li>
+
+
+ </ul>
+ Note that, currently, no actual compatibility code is present, but this
+ is on the to-do list.<br>
+ <hr width="100%" size="2">
+ <h2><a name="Converting_character_sets"></a>Converting character sets</h2>
+All strings passed by the GEDCOM parser to the application are in UTF-8 encoding.
+ Typically, an application needs to convert this to something else to
+be able to display it.<br>
+ <br>
+The most common case is that the output character set is controlled by the <code>locale</code> mechanism (i.e. via the <code>LANG</code>, <code>LC_ALL</code> or <code>LC_CTYPE</code> environment variables), which also controls the <code>gettext</code>
+ mechanism in the application. For this, the following steps need to
+be taken by the application (more detailed info can be found in the info
+file of the GNU libc library in the "Generic Charset Conversion" section
+under "Character Set Handling" or online <a href="http://www.gnu.org/manual/glibc-2.2.3/html_chapter/libc_6.html#SEC99">here</a>):<br>
+ <ul>
+ <li>inclusion of some headers:</li>
+ </ul>
+ <blockquote>
+ <blockquote>
+ <pre><code>#include &lt;locale.h&gt; /* for setlocale */<br>#include &lt;langinfo.h&gt; /* for nl_langinfo */<br>#include &lt;iconv.h&gt; /* for iconv_* functions */<br></code></pre>
+ </blockquote>
+ </blockquote>
+ <ul>
+ <li>set the program's current locale to what the user configured in the environment:</li>
+ </ul>
+ <blockquote>
+ <blockquote>
+ <pre><code>setlocale(LC_ALL, "");</code><br></pre>
+ </blockquote>
+ </blockquote>
+ <ul>
+ <li>open a conversion handle for conversion from UTF-8 to the character set of the current locale (once for the entire program):</li>
+ </ul>
+ <blockquote>
+ <blockquote>
+ <pre><code>iconv_t iconv_handle;<br>...<br>iconv_handle = iconv_open(nl_langinfo(CODESET), "UTF-8");<br>if (iconv_handle == (iconv_t) -1)<br> /* signal an error */<br></code></pre>
+ </blockquote>
+ </blockquote>
+ <ul>
+ <li>then, every string can be converted using the following:</li>
+ </ul>
+ <blockquote>
+ <blockquote>
+ <pre><code>/* char* in_buf is the input buffer, size_t in_len is its length */<br>/* char* out_buf is the output buffer, size_t out_len is its length */<br><br>size_t nconv;<br>char *in_ptr = in_buf;<br>char *out_ptr = out_buf;<br>nconv = iconv(iconv_handle, &amp;in_ptr, &amp;in_len, &amp;out_ptr, &amp;out_len);</code></pre>
+ </blockquote>
+ </blockquote>
+ <blockquote>If the output buffer is not big enough, <code>iconv</code> will return -1 and set <code>errno</code> to <code>E2BIG</code>. In that case, <code>in_ptr</code> and <code>out_ptr</code> will point just after the last successfully converted character in the respective buffers, and <code>in_len</code> and <code>out_len</code> will be updated to show the remaining lengths. There are two possible strategies here:<br>
+ <ul>
+ <li>Make sure from the beginning
+that the output buffer is big enough. However, it's difficult to find
+an absolute maximum length in advance, even given the length of the input
+string.<br>
+ <br>
+ </li>
+ <li>Do the conversion in several steps, growing the output buffer each time to make more space, and calling <code>iconv</code>
+ consecutively until the conversion is complete. This is the preferred
+way (a function could be written to encapsulate all this).</li>
+ </ul>
+Another error case occurs when one of the
+characters can't be represented in the target character set. The <code>iconv</code> function will then also return -1 and set <code>errno</code> to <code>EILSEQ</code>; the <code>in_ptr</code> will point to the character that couldn't be converted. In that case, again two strategies are possible:<br>
+ <ul>
+ <li>Just fail the conversion, and show an error. This is not very user friendly, of course.<br>
+ <br>
+ </li>
+ <li>Skip over the character that can't be converted and append a "?" to the output buffer, then call <code>iconv</code> again. Skipping over a UTF-8 character is fairly simple, as follows from the <a href="http://www.cl.cam.ac.uk/%7Emgk25/unicode.html#utf-8">encoding rules</a>:</li>
+ </ul>
+ <ol>
+ <ol>
+ <li>if the first byte is in binary 0xxxxxxx, then the character is only one byte long, just skip over that byte<br>
+ <br>
+ </li>
+ <li>if the first byte is in binary 11xxxxxx, then skip over that byte and all bytes 10xxxxxx that follow.<br>
+ </li>
+ </ol>
+ </ol>
+ </blockquote>
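+The two skipping rules above could be sketched as the following helper. The name <code>utf8_char_len</code> is hypothetical (it is not part of libgedcom or of <code>iconv</code>), and the choice to treat a stray 10xxxxxx byte as a one-byte character is an assumption made here so that the caller always makes progress:<br>

```c
#include <stddef.h>

/* Hypothetical helper: returns the number of bytes taken by the UTF-8
   character that starts at p, never looking beyond len bytes. */
size_t utf8_char_len(const char *p, size_t len)
{
  size_t n = 1;

  if (len == 0)
    return 0;
  /* Rule 2: a first byte 11xxxxxx is followed by bytes 10xxxxxx. */
  if (((unsigned char) p[0] & 0xC0) == 0xC0) {
    while (n < len && ((unsigned char) p[n] & 0xC0) == 0x80)
      n++;
  }
  /* Rule 1 (first byte 0xxxxxxx) and stray 10xxxxxx bytes: one byte. */
  return n;
}
```

+After an <code>EILSEQ</code> error, an application could advance <code>in_ptr</code> by this length, decrease <code>in_len</code> by the same amount, append a "?" to the output buffer and then call <code>iconv</code> again.<br>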
+ <ul>
+ <li>eventually, the conversion handle needs to be closed (when the program exits):<br>
+ </li>
+ </ul>
+ <blockquote>
+ <blockquote>
+ <pre><code>iconv_close(iconv_handle);<br></code></pre>
+ </blockquote>
+ </blockquote>
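+The preferred strategy of growing the output buffer could look roughly like the sketch below. It is not the implementation shipped with <code>gedcom-parse</code>: the function name <code>convert_growing</code> and its parameters are hypothetical, it opens and closes the conversion handle on every call only to stay self-contained, it simply fails on characters that can't be converted, and it doesn't flush shift state, so it assumes a stateless target character set:<br>

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: converts in_len bytes of UTF-8 to to_charset.
   Returns a malloc'ed buffer (the caller frees it) and stores the number
   of converted bytes in *out_used; returns NULL on any error. */
char *convert_growing(const char *to_charset, const char *in_buf,
                      size_t in_len, size_t *out_used)
{
  size_t out_size = 16;            /* deliberately small starting size */
  size_t out_left = out_size;
  char *in_ptr = (char *) in_buf;  /* iconv wants char**, input is not modified */
  char *out_buf, *out_ptr;
  iconv_t handle = iconv_open(to_charset, "UTF-8");

  if (handle == (iconv_t) -1)
    return NULL;
  out_buf = out_ptr = malloc(out_size);

  while (out_buf != NULL && in_len > 0) {
    if (iconv(handle, &in_ptr, &in_len, &out_ptr, &out_left) != (size_t) -1)
      continue;                    /* all input consumed, the loop ends */
    if (errno == E2BIG) {
      /* Output buffer full: double it and continue where iconv stopped. */
      size_t used = (size_t) (out_ptr - out_buf);
      char *grown = realloc(out_buf, out_size * 2);
      if (grown == NULL) {
        free(out_buf);
        out_buf = NULL;
      } else {
        out_size *= 2;
        out_buf = grown;
        out_ptr = grown + used;
        out_left = out_size - used;
      }
    } else {
      /* EILSEQ or EINVAL: this sketch just fails the conversion. */
      free(out_buf);
      out_buf = NULL;
    }
  }

  iconv_close(handle);
  if (out_buf != NULL)
    *out_used = (size_t) (out_ptr - out_buf);
  return out_buf;
}
```

+For the locale case described above, the application would call <code>setlocale(LC_ALL, "")</code> first and pass <code>nl_langinfo(CODESET)</code> as <code>to_charset</code>. The returned buffer is not NUL-terminated, so the caller has to use the length stored in <code>*out_used</code>.<br>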
+
+
+
+ The source distribution of <code>gedcom-parse</code> contains an example implementation (<code>utf8-locale.c</code> and <code>utf8-locale.h</code>
+ in the top directory) that grows the output buffer dynamically and outputs
+"?" for characters that can't be converted. Feel free to use it in
+your source code (it is not part of the library, and it isn't installed anywhere,
+so you need to copy the source and header file into your application).
+ <br>
+ <br>
+Its interface is:<br>
+ <blockquote>
+ <pre><code>char *<b>convert_utf8_to_locale</b> (char *input);<br>char *<b>convert_locale_to_utf8</b> (char *input);<br></code></pre>
+ </blockquote>
+Both functions return a pointer to a static buffer that is overwritten on
+each call. For them to work properly, the application must first set the
+locale using the <code>setlocale</code> function (the second step above).
+ All other steps, including setting up and closing down the conversion
+handles, are handled transparently by the two functions.<br>
+ <br>
+You can change the "?" that is output for characters that can't be converted
+to any string you want, using the following function before the conversion
+calls:<br>
+ <blockquote>
+ <pre><code>void <b>convert_set_unknown</b> (const char *unknown);</code></pre>
+ </blockquote>
+ <hr width="100%" size="2">
+
+ <pre><font size="-1">$Id$<br>$Name$</font><br></pre>
+
+ <pre> </pre>
+
+
+ </body></html>
\ No newline at end of file