+ A typical piece of code would be as follows (<code>gom_parse_file</code> would be called instead if the C object model is used):<br>
+
+<blockquote><code>void <b>my_message_handler</b> (Gedcom_msg_type type,
+ char *msg)<br>
+ {<br>
+ ...<br>
+ }<br>
+ ...<br>
+ <b>gedcom_set_message_handler</b>(my_message_handler);<br>
+ ...<br>
+ result = <b>gedcom_parse_file</b>("myfamily.ged");</code><br>
+ </blockquote>
+ In the above piece of code, <code>my_message_handler</code> is the
+ callback that will be called for errors (<code>type=ERROR</code>), warnings
+ (<code>type=WARNING</code>) and messages (<code>type=MESSAGE</code>). The
+ callback must have the signature shown in the example. For errors,
+the <code>msg</code> passed to the callback will have the format:<br>
+
+<blockquote><code>Error on line</code> <i>&lt;lineno&gt;</i>: <i>&lt;actual_message&gt;</i><br>
+ </blockquote>
+ Note that the entire string will be properly internationalized, and
+ encoded in UTF-8 (<a href="encoding.html">Why UTF-8?</a>). Also,
+no newline is appended, so that the application program can use it in any
+way it wants. Warnings are similar, but use "Warning" instead of "Error".
+ Messages are plain text, without any prefix.<br>
+ <br>
+ With this in place, the resulting code will already show errors and
+ warnings produced by the parser, e.g. on the terminal if a simple <code>
+ printf</code> is used in the message handler.<br>
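+ As a minimal sketch (not part of gedcom-parse), the handler below adds the newline that the library leaves out and keeps a tally of errors and warnings. So that it compiles without the library, it uses a hypothetical stand-in enum <code>msg_kind</code> in place of <code>Gedcom_msg_type</code>; a real handler would use the signature shown above and compare against <code>ERROR</code>, <code>WARNING</code> and <code>MESSAGE</code>.<br>

```c
#include <stdio.h>

/* Hypothetical stand-in for Gedcom_msg_type so the sketch compiles on its own.
   A real handler takes Gedcom_msg_type and compares against ERROR etc. */
enum msg_kind { KIND_ERROR, KIND_WARNING, KIND_MESSAGE };

/* Running totals the application can inspect after parsing. */
static int error_count = 0;
static int warning_count = 0;

void my_message_handler(enum msg_kind type, char *msg)
{
    if (type == KIND_ERROR)
        error_count++;
    else if (type == KIND_WARNING)
        warning_count++;
    /* The message arrives without a trailing newline, so add one here. */
    fprintf(stderr, "%s\n", msg);
}
```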
+
+<hr width="100%" size="2">
+<h2><a name="Data_callback_mechanism"></a>Data callback mechanism</h2>
+ The most important use of the parser is of course to get the data
+out of the GEDCOM file. This section focuses on the callback mechanism (see <a href="gom.html">here</a> for the C object model). In fact, the mechanism involves two levels.<br>
+ <br>
+ The primary level is that the application is notified of each section
+ in a GEDCOM file via a "start element" callback and an
+ "end element" callback (much like in a SAX interface for XML), i.e. when
+ a line containing a certain tag is parsed, the "start element" callback
+ is called for that tag, and when all its subordinate lines with their
+tags have been processed, the "end element" callback is called for the
+original tag. Since GEDCOM is hierarchical, this results in properly
+nested calls to appropriate "start element" and "end element" callbacks.<br>
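+ To make the nesting concrete, here is an illustrative sketch (not the library's implementation) that turns GEDCOM level numbers into start/end events using a stack. The names <code>emit_nested_events</code>, <code>log_start</code> and <code>log_end</code> are hypothetical, and each input line is assumed to have the simple form <code>level tag [value]</code> without cross-references.<br>

```c
#include <stdio.h>
#include <string.h>

/* Illustrative sketch only, NOT the gedcom-parse implementation: shows how
   level numbers turn into properly nested start/end events. */

#define MAX_DEPTH 64
#define MAX_TAG 32

typedef void (*event_fn)(int level, const char *tag, void *user);

/* Each line is assumed to look like "<level> <tag> [value]".
   Returns 0 on success, -1 on a malformed line or an invalid level jump. */
int emit_nested_events(const char *lines[], int n,
                       event_fn on_start, event_fn on_end, void *user)
{
    char open_tags[MAX_DEPTH][MAX_TAG]; /* tags still awaiting their end event */
    int depth = 0;                      /* number of currently open lines */

    for (int i = 0; i < n; i++) {
        int level;
        char tag[MAX_TAG];
        /* %31s leaves room for the terminating NUL in tag[MAX_TAG]. */
        if (sscanf(lines[i], "%d %31s", &level, tag) != 2
            || level < 0 || level > depth || level >= MAX_DEPTH)
            return -1;
        /* A line at level L closes every open line at level L or deeper. */
        while (depth > level) {
            depth--;
            on_end(depth, open_tags[depth], user);
        }
        on_start(level, tag, user);
        strcpy(open_tags[depth++], tag);
    }
    /* End of input: close the remaining lines, innermost first. */
    while (depth > 0) {
        depth--;
        on_end(depth, open_tags[depth], user);
    }
    return 0;
}

/* Example callbacks: append "+TAG " / "-TAG " to a caller-supplied buffer. */
void log_start(int level, const char *tag, void *user)
{
    (void)level;
    strcat((char *)user, "+");
    strcat((char *)user, tag);
    strcat((char *)user, " ");
}

void log_end(int level, const char *tag, void *user)
{
    (void)level;
    strcat((char *)user, "-");
    strcat((char *)user, tag);
    strcat((char *)user, " ");
}
```

+ For the lines <code>0 HEAD</code>, <code>1 SOUR PAF</code>, <code>2 VERS 5.0</code>, <code>1 CHAR ANSEL</code>, <code>0 TRLR</code>, the log reads <code>+HEAD +SOUR +VERS -VERS -SOUR +CHAR -CHAR -HEAD +TRLR -TRLR</code>: each end event arrives only after all subordinate lines have been handled.<br>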
+ <br>
+ However, a genealogy program will typically support only
+ a subset of the GEDCOM standard, especially while it is still under
+ development. Moreover, GEDCOM allows an application
+ to define its own tags, which will typically not be supported by
+another application. Still, in that case, data preservation is important;
+ it would hardly be acceptable for information that is not understood by
+ a certain program to simply be removed.<br>
+ <br>
+ Therefore, the second level of callbacks involves a "default callback".
+ An application needs to subscribe to callbacks for the tags it does support,
+ and needs to provide a "default callback" which will be called for tags
+it doesn't support. The application can then choose to just store
+the information that comes via the default callback in plain textual format.<br>
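+ The dispatch rule behind these two levels can be sketched as follows. This illustrates the idea only, with hypothetical names (<code>dispatch</code>, <code>struct subscription</code>, <code>count_known</code>, <code>count_unknown</code>); gedcom-parse itself uses the subscription and default-callback functions described below.<br>

```c
#include <stddef.h>
#include <string.h>

/* Illustration of the two-level idea with hypothetical names; gedcom-parse
   uses its own subscription functions instead. */
typedef void (*tag_handler)(const char *tag, const char *value, void *user);

struct subscription {
    const char *tag;        /* tag the application supports */
    tag_handler handler;    /* callback registered for that tag */
};

/* Call the subscribed handler for tag, or the default handler if none. */
void dispatch(const struct subscription *subs, size_t nsubs,
              tag_handler default_handler,
              const char *tag, const char *value, void *user)
{
    for (size_t i = 0; i < nsubs; i++) {
        if (strcmp(subs[i].tag, tag) == 0) {
            subs[i].handler(tag, value, user);
            return;
        }
    }
    /* Unsupported tag: let the default handler preserve it. */
    if (default_handler != NULL)
        default_handler(tag, value, user);
}

/* Example handlers that only count how each tag was routed. */
struct counts { int known; int unknown; };

void count_known(const char *tag, const char *value, void *user)
{
    (void)tag; (void)value;
    ((struct counts *)user)->known++;
}

void count_unknown(const char *tag, const char *value, void *user)
{
    (void)tag; (void)value;
    ((struct counts *)user)->unknown++;
}
```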
+ <br>
+ After this introduction, let's see what the API looks like...<br>
+ <br>
+
+<h3><a name="Start_and_end_callbacks"></a>Start and end callbacks</h3>
+
+<h4><i>Callbacks for records</i> <br>
+ </h4>
+ As a simple example, we will get some information from the header
+of a GEDCOM file. First, have a look at the following piece of code:<br>
+
+<blockquote><code>Gedcom_ctxt <b>my_header_start_cb</b> (Gedcom_rec rec,<br>
+ int level, <br>
+
+ Gedcom_val xref, <br>
+
+ char *tag, <br>
+
+ char *raw_value,<br>
+
+ int parsed_tag, <br>
+
+ Gedcom_val parsed_value)<br>
+ {<br>
+ printf("The header starts\n");<br>
+ return (Gedcom_ctxt)1;<br>
+ }<br>
+ <br>
+ void <b>my_header_end_cb</b> (Gedcom_rec rec, Gedcom_ctxt self)<br>
+ {<br>
+ printf("The header ends, context is %d\n", (int)(intptr_t)self);
+ /* context will print as "1"; intptr_t is declared in &lt;stdint.h&gt; */<br>
+ }<br>
+ <br>
+ ...<br>
+ <b>gedcom_subscribe_to_record</b>(REC_HEAD, my_header_start_cb,
+ my_header_end_cb);<br>
+ ...<br>
+ result = <b>gedcom_parse_file</b>("myfamily.ged");</code><br>
+ </blockquote>
+ Using the <code>gedcom_subscribe_to_record</code> function, the
+ application registers the specified callbacks as start and end
+callbacks. The end callback is optional: you can pass <code>NULL</code>
+ if you are not interested in the end callback. The identifiers
+to use as first argument to the function (here <code>REC_HEAD</code>)
+are described in the <a href="interface.html#Record_identifiers">interface
+details</a>. These are also passed as first argument in the callbacks (the <code>Gedcom_rec</code> argument).<br>
+ <br>
+ From the name of the function it becomes clear that this function
+is specific to complete records. For the separate elements in records
+ there is another function, which we'll see shortly. Again, the callbacks
+ need to have the signatures as shown in the example.<br>
+ <br>
+ The <code>Gedcom_ctxt</code> type that is used as a result of the
+start callback and as an argument to the end callback is vital for passing
+context necessary for the application. This type is meant to be opaque;
+in fact, it's a void pointer, so you can pass anything via it. The
+important thing to know is that the context that the application returns
+in the start callback will be passed in the end callback as an argument,
+and as we will see shortly, also to all the directly subordinate elements
+of the record.<br>
+ <br>
+ The <code>tag</code> is the GEDCOM tag in string format, the <code>parsed_tag</code>
+ is an integer, for which symbolic values are defined as <code>TAG_HEAD</code>,
+ <code>TAG_SOUR</code>, <code>TAG_DATA</code>, ... and <code>USERTAG</code>
+ for the application-specific tags. These values
+are defined in the header <code>gedcom-tags.h</code> that is installed,
+and included via <code>gedcom.h</code> (so no need to include <code>gedcom-tags.h</code>
+ yourself).<br>
+ <br>
+ The example passes a simple integer as context, but an application
+ could e.g. pass a <code>struct</code> (or an object in a C++ application)
+ that will contain the information for the header. In the end callback,
+ the application could then e.g. do some finalizing operations on the
+<code>struct</code> to put it in its database.<br>
+ <br>
+ (Note that the <code>Gedcom_val</code> type for the <code>xref</code>
+ and <code>parsed_value</code> arguments has not been discussed yet; see below
+ for this.)<br>
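+ The following self-contained sketch shows the context round trip with a heap-allocated struct. It uses plain <code>void*</code> (which is what <code>Gedcom_ctxt</code> is, as described above) and drops the other callback arguments, so the names <code>struct my_header</code>, <code>header_start</code>, <code>header_source</code> and <code>header_end</code> are hypothetical and their signatures are not the ones the library expects.<br>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical application object holding header data. */
struct my_header {
    char source[64];
    int complete;
};

/* Record start: allocate the object and return it as the opaque context. */
void *header_start(void)
{
    return calloc(1, sizeof(struct my_header));   /* zeroed, so strings are empty */
}

/* Sub-element start: the parent context comes back as void* and is cast. */
void *header_source(void *parent, const char *value)
{
    struct my_header *head = parent;
    strncpy(head->source, value, sizeof head->source - 1); /* last byte stays 0 */
    return parent;   /* reuse the parent's context, as in the SOUR example */
}

/* Record end: the same pointer arrives for finalization. */
void header_end(void *self)
{
    struct my_header *head = self;
    head->complete = 1;
}
```

+ In this sketch, freeing the object is left to the caller after <code>header_end</code>; a real application would typically hand it over to its own data store at that point.<br>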
+ <br>
+
+<h4><i>Callbacks for elements</i></h4>
+ We will now retrieve the SOUR field (the name of the program that
+wrote the file) from the header:<br>
+
+<blockquote><code>Gedcom_ctxt <b>my_header_source_start_cb</b>(Gedcom_elt elt,<br>
+
+ Gedcom_ctxt
+ parent,<br>
+
+ int
+ level,<br>
+
+ char*
+ tag,<br>
+
+ char*
+ raw_value,<br>
+
+ int
+ parsed_tag,<br>
+
+ Gedcom_val
+ parsed_value)<br>
+ {<br>
+ char *source = GEDCOM_STRING(parsed_value);<br>
+ printf("This file was written by %s\n", source);<br>
+ return parent;<br>
+ }<br>
+ <br>
+ void <b>my_header_source_end_cb</b>(Gedcom_elt elt,<br>
+ Gedcom_ctxt parent,<br>
+
+ Gedcom_ctxt self,<br>
+
+ Gedcom_val parsed_value)<br>
+ {<br>
+ printf("End of the source description\n");<br>
+ }<br>
+ <br>
+ ...<br>
+ <b>gedcom_subscribe_to_element</b>(ELT_HEAD_SOUR,<br>
+
+ my_header_source_start_cb,<br>
+
+ my_header_source_end_cb);<br>
+ ...<br>
+ result = <b>gedcom_parse_file</b>("myfamily.ged");</code><br>
+ </blockquote>
+ The subscription mechanism for elements is similar, only the signatures
+ of the callbacks differ. The signature for the start callback shows
+ that the context of the parent line (here e.g. the <code>struct</code>
+ that describes the header) is passed to this start callback. In this
+ example the callback simply returns the same context, but it could of course
+return a context object of its own. The end callback is called
+with both the context of the parent and its own context, which in this
+example will be the same. Again, the list of identifiers to use as
+a first argument for the subscription function is detailed in the <a href="interface.html#Element_identifiers">interface details</a>. As with records, these are passed as first argument in the callback (the <code>Gedcom_elt</code> argument).<br>
+ <br>
+ If we look at the other arguments of the start callback, we see the
+ level number (the initial number of the line in the GEDCOM file), the tag
+ (e.g. "SOUR"), and then a raw value, a parsed tag and a parsed value. The
+ raw value is just the raw string that occurs as value on the line next
+to the tag (in UTF-8 encoding). The parsed value is the meaningful
+value that is parsed from that raw string. The parsed tag is described
+in the section for record callbacks above.<br>
+ <br>
+ The <code>Gedcom_val</code> type is meant to be an opaque type. The
+ only thing that needs to be known about it is that it can contain specific
+ data types, which have to be retrieved from it using pre-defined macros.
+ These data types are described in the <a href="interface.html#Gedcom_val_types"> interface details</a>.
+ <br>
+ <br>
+ Some extra notes:<br>
+
+
+<ul>
+ <li>The <code>Gedcom_val</code> argument of the end callback
+ is currently not used. It is there for future enhancements.</li>
+ <li>There are also two <code>Gedcom_val</code> arguments
+ in the start callback for records. The first one (<code>xref</code>)
+ contains the <code>xref_value</code> corresponding to the cross-reference
+ (or <code>NULL</code> if there isn't one), the second one (<code>parsed_value</code>)
+ contains the value that is parsed from the <code>raw_value</code>. See
+ the <a href="interface.html#Record_identifiers">interface details</a>.</li>
+
+
+</ul>
+
+
+<h3><a name="Default_callbacks"></a>Default callbacks<br>
+ </h3>
+ As described above, an application doesn't always implement the entire
+ GEDCOM spec, and application-specific tags may have been added by other
+ applications. To preserve this extra data anyway, a default callback
+ can be registered by the application, as in the following example:<br>
+
+<blockquote><code>void <b>my_default_cb</b> (Gedcom_elt elt, Gedcom_ctxt parent, int level,
+ char* tag, char* raw_value, int parsed_tag)<br>
+ {<br>
+ ...<br>
+ }<br>
+ <br>
+ ...<br>
+ <b>gedcom_set_default_callback</b>(my_default_cb);<br>
+ ...<br>
+ result = <b>gedcom_parse_file</b>("myfamily.ged");</code><br>
+ </blockquote>
+ This callback has a similar signature to the previous ones,
+ but it doesn't contain a parsed value. However, it does contain the
+ parent context that was returned by the application for the most specific
+ containing tag that the application supported.<br>
+ <br>
+ Suppose e.g. that this callback is called for some tags in the header
+ that are specific to some other application. Our application could then
+ make sure that the parent context contains the struct or object that represents
+ the header, and use the default callback to add the level, tag and
+ raw_value as plain text in a member of that struct or object, thus preserving
+ the information. The application can then write this out when the
+data is saved again in a GEDCOM file. To make this more concrete, consider
+ the following example:<br>
+
+<blockquote><code>struct header {<br>
+ char* source;<br>
+ ...<br>
+ char* extra_text;<br>
+ };<br>
+ <br>
+ Gedcom_ctxt my_header_start_cb(Gedcom_rec rec, int level, Gedcom_val xref, char* tag,
+ char *raw_value,<br>
+
+ int parsed_tag, Gedcom_val parsed_value)<br>
+ {<br>
+ struct header* head = my_make_header_struct();<br>
+ return (Gedcom_ctxt)head;<br>
+ }<br>
+ <br>
+ void my_default_cb(Gedcom_elt elt, Gedcom_ctxt parent, int level, char* tag, char*
+raw_value, int parsed_tag)<br>
+ {<br>
+ struct header* head = (struct header*)parent;<br>
+ my_header_add_to_extra_text(head, level, tag, raw_value);<br>
+ }<br>
+ <br>
+ gedcom_set_default_callback(my_default_cb);<br>
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, NULL);<br>
+ ...<br>
+ result = gedcom_parse_file(filename);</code><br>
+ </blockquote>
+ Note that the default callback will be called for any tag that isn't
+ specifically subscribed to by the application, and can thus be called
+ in various contexts. For simplicity, the example above doesn't take
+ this into account (the <code>parent</code> could be of different
+ types, depending on the context).<br>
+ <br>
+ Note also that the default callback is not called when the parent context
+ is <code>NULL</code>. This is e.g. the case if none
+ of the "upper" tags has been subscribed to.<br>
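+ The example above leaves <code>my_header_add_to_extra_text</code> undefined. One possible implementation, assuming <code>extra_text</code> is a heap-allocated string (initially <code>NULL</code>) that grows by one <code>level tag value</code> line per call, is sketched below; the struct is reduced to the members named above, and the helper's exact signature and return value are our own choice.<br>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reduced version of the struct from the example above. */
struct header {
    char *source;
    char *extra_text;   /* unsupported lines, kept verbatim, or NULL */
};

/* Append one "level tag value" line to head->extra_text.
   Returns 0 on success, -1 if memory could not be allocated. */
int my_header_add_to_extra_text(struct header *head, int level,
                                const char *tag, const char *raw_value)
{
    size_t old_len = head->extra_text ? strlen(head->extra_text) : 0;
    const char *value = raw_value ? raw_value : "";
    const char *sep = raw_value ? " " : "";   /* no trailing space without value */

    /* Measure the new line first (snprintf with size 0 only counts). */
    int add_len = snprintf(NULL, 0, "%d %s%s%s\n", level, tag, sep, value);
    if (add_len < 0)
        return -1;

    /* realloc(NULL, n) behaves like malloc(n) for the first line. */
    char *buf = realloc(head->extra_text, old_len + (size_t)add_len + 1);
    if (buf == NULL)
        return -1;   /* the old text is still intact in head->extra_text */

    snprintf(buf + old_len, (size_t)add_len + 1, "%d %s%s%s\n",
             level, tag, sep, value);
    head->extra_text = buf;
    return 0;
}
```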
+
+
+<hr width="100%" size="2"><br>
+<h2><a name="Support_for_writing_GEDCOM_files"></a>Support for writing GEDCOM files</h2>
+The Gedcom parser library also contains functions for writing GEDCOM files.
+ As for parsing, there are two interfaces: a very basic interface,
+which requires you to call a function for each line in
+the GEDCOM file, and an interface which just dumps the Gedcom object model
+to a file in one shot (if you use the Gedcom object model).<br>
+<br>
+Again, this section focuses on the basic interface; the Gedcom object model interface is described <a href="gom.html#Writing_the_object_model_to_file">here</a>.<br>
+<br>
+<h3><a name="Opening_and_closing_files"></a>Opening and closing files</h3>
+The basic functions for opening and closing Gedcom files for writing are the following:<br>
+<blockquote><code>Gedcom_write_hndl <b>gedcom_write_open</b> (const char* filename);<br>
+int <b>gedcom_write_close</b> (Gedcom_write_hndl hndl, int* total_conv_fails);<br></code></blockquote>
+The function <code>gedcom_write_open</code> takes as parameter the name of
+the file to write, and returns a write handle, which needs to be used in
+subsequent functions. It returns <code>NULL</code> in case of errors.<br>
+<br>
+The function <code>gedcom_write_close</code> takes, besides the write handle,
+an integer pointer as parameter. If you pass a non-<code>NULL</code> pointer,
+the function will store the total number of conversion failures in it;
+you can pass <code>NULL</code> if you're not interested. The function returns 0 in case of success, non-zero in case of failure.<br>
+<br>
+<h3><a name="Controlling_some_settings"></a>Controlling some settings<br>
+</h3>
+Note that by default the file is written in ASCII encoding (and hence e.g.
+accented characters will cause a conversion failure). You can change
+this by calling the following function <i>before</i> calling <code>gedcom_write_open</code>; it affects all files that are opened after the call:<br>
+<blockquote><code>int <b>gedcom_write_set_encoding</b> (const char* charset, Encoding width, Enc_bom bom);<br></code></blockquote>
+The valid <code>charset</code> values are given in the first column in the file <code>gedcom.enc</code> in the data directory of gedcom-parse (<code>$PREFIX/share/gedcom-parse</code>).
+ The character sets UNICODE, ASCII and ANSEL are always supported (these
+are standard for GEDCOM), as well as ANSI (not standard), but there may be
+others.<br>
+<br>
+The <code>width</code> parameter takes one of the following values:<br>
+<ul>
+ <li><code><b>ONE_BYTE</b></code>: This should be used for all character sets except UNICODE.</li>
+ <li><code><b>TWO_BYTE_HILO</b></code>: High-low encoding for UNICODE (i.e. big-endian)</li>
+ <li><code><b>TWO_BYTE_LOHI</b></code>: Low-high encoding for UNICODE (i.e. little-endian)</li>
+</ul>
+The <code>bom</code> parameter determines whether a byte-order-mark should
+be written in the file in case of UNICODE encoding (usually preferred because
+it then clearly indicates the byte ordering). It takes one of the following
+values:<br>
+<ul>
+ <li><code><b>WITHOUT_BOM</b></code></li>
+ <li><code><b>WITH_BOM</b></code></li>
+</ul> For both these parameters you can pass 0 for non-UNICODE encodings,
+since that corresponds to the correct values (and is ignored anyway). The
+function returns 0 in case of success, non-zero in case of error. Note
+that you still need to pass the correct charset value for the HEAD.CHAR tag;
+otherwise you will get a warning, and the value will be forced to the correct
+value.<br>
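+ To illustrate what the <code>width</code> and <code>bom</code> choices mean at the byte level, the sketch below encodes a plain ASCII string as 16-bit Unicode in either byte order, optionally preceded by the byte-order mark U+FEFF. The enum <code>byte_order</code> and the function <code>encode_utf16</code> are hypothetical stand-ins, not part of gedcom-parse, which performs this conversion itself.<br>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the hi-lo / lo-hi choice; not the library's enum. */
enum byte_order { HI_LO, LO_HI };

/* Store one 16-bit code unit in the requested byte order. */
static void put_unit(unsigned char *p, unsigned int unit, enum byte_order order)
{
    unsigned char hi = (unsigned char)(unit >> 8);
    unsigned char lo = (unsigned char)(unit & 0xFF);
    p[0] = (order == HI_LO) ? hi : lo;
    p[1] = (order == HI_LO) ? lo : hi;
}

/* Encode a plain ASCII string as 16-bit Unicode, optionally preceded by the
   byte-order mark U+FEFF. Returns the number of bytes written, or 0 if out
   is too small. */
size_t encode_utf16(const char *ascii, enum byte_order order, int with_bom,
                    unsigned char *out, size_t out_size)
{
    size_t needed = (strlen(ascii) + (with_bom ? 1 : 0)) * 2;
    size_t pos = 0;
    if (needed > out_size)
        return 0;
    if (with_bom) {
        put_unit(out, 0xFEFF, order);
        pos = 2;
    }
    /* ASCII characters map directly onto code units U+0000..U+007F. */
    for (const char *c = ascii; *c != '\0'; c++, pos += 2)
        put_unit(out + pos, (unsigned char)*c, order);
    return pos;
}
```

+ With a BOM, hi-lo (big-endian) output starts with the bytes FE FF and lo-hi (little-endian) output with FF FE, which is how a reader can tell the two orderings apart.<br>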
+<br>
+Further, it is possible to control the kind of line terminator that is used, via the following function (also to be used before <code>gedcom_write_open</code>):<br>
+<blockquote><code>int <b>gedcom_write_set_line_terminator</b> (Enc_line_end end);<br></code></blockquote>
+The <code>end</code> parameter takes one of the following values:<br>
+<ul>
+ <li><b><code>END_CR</code></b>: only carriage return ("\r") (cf. Macintosh)</li>
+ <li><b><code>END_LF</code></b>: only line feed ("\n") (cf. Unix, Mac OS X)</li>
+ <li><b><code>END_CR_LF</code></b>: first carriage return, then line feed ("\r\n") (cf. DOS, Windows)</li>
+ <li><b><code>END_LF_CR</code></b>: first line feed, then carriage return ("\n\r")</li>
+</ul>
+By default, this is set to the appropriate line terminator on the current
+platform, so it only needs to be changed if there is some special reason
+for it.<br>
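+ For reference, the sketch below spells out the bytes each choice produces after a line. The enum constants are hypothetical mirrors prefixed with <code>MY_</code> so they cannot be confused with the library's <code>Enc_line_end</code> values, and <code>write_line</code> is likewise only an illustration.<br>

```c
#include <stdio.h>

/* Hypothetical mirror of the four line-terminator choices; the MY_ prefix
   keeps it distinct from the library's Enc_line_end values. */
enum my_line_end { MY_END_CR, MY_END_LF, MY_END_CR_LF, MY_END_LF_CR };

/* Bytes written after each GEDCOM line for a given choice. */
const char *line_end_bytes(enum my_line_end end)
{
    switch (end) {
    case MY_END_CR:    return "\r";
    case MY_END_LF:    return "\n";
    case MY_END_CR_LF: return "\r\n";
    case MY_END_LF_CR: return "\n\r";
    }
    return "\n";   /* not reached for valid enum values */
}

/* Write one line plus terminator; returns the byte count or a negative value. */
int write_line(FILE *f, const char *line, enum my_line_end end)
{
    return fprintf(f, "%s%s", line, line_end_bytes(end));
}
```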
+<h3><a name="Writing_data"></a>Writing data<br>
+</h3>
+For actually writing the data, the principle is that every line in the GEDCOM
+file to write corresponds to a call to one of the following functions, except
+that CONT/CONC lines can be automatically taken care of. Note that
+the resulting GEDCOM file should conform to the GEDCOM standard. Several
+checks are already built in to enforce this, and more will follow. There
+is (currently) no compatibility mode for writing GEDCOM files.<br>
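+ The library takes care of CONT/CONC lines for you. Purely as an illustration of what CONC splitting involves, the sketch below breaks a long value over a first line plus CONC lines one level deeper, never cutting a UTF-8 multi-byte character in half. The function <code>split_with_conc</code> and its <code>max_len</code> parameter are hypothetical; it does not handle embedded newlines (which would need CONT lines), and it makes no claim about where gedcom-parse itself chooses to split.<br>

```c
#include <stdio.h>
#include <string.h>

/* Illustrative sketch, not the library's algorithm: write "<level> <tag> chunk"
   followed by "<level+1> CONC chunk" lines so that no line carries more than
   max_len bytes of value. Splits never fall inside a UTF-8 multi-byte
   sequence. Embedded newlines (CONT) are not handled; value should be non-empty.
   Returns the number of lines written to out, or -1 if out is too small or
   max_len is too small to make progress. */
int split_with_conc(int level, const char *tag, const char *value,
                    size_t max_len, char *out, size_t out_size)
{
    size_t len = strlen(value), pos = 0, used = 0;
    int lines = 0;

    do {
        size_t chunk = len - pos;
        if (chunk > max_len) {
            chunk = max_len;
            /* Back off while the next chunk would start with a UTF-8
               continuation byte (bit pattern 10xxxxxx). */
            while (chunk > 0 && ((unsigned char)value[pos + chunk] & 0xC0) == 0x80)
                chunk--;
            if (chunk == 0)
                return -1;
        }
        int n = (lines == 0)
            ? snprintf(out + used, out_size - used, "%d %s %.*s\n",
                       level, tag, (int)chunk, value + pos)
            : snprintf(out + used, out_size - used, "%d CONC %.*s\n",
                       level + 1, (int)chunk, value + pos);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;                  /* output buffer too small */
        used += (size_t)n;
        pos += chunk;
        lines++;
    } while (pos < len);
    return lines;
}
```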
+<br>
+In general, each of the following functions expects its input in UTF-8 encoding (see also <a href="#Converting_character_sets">here</a>). If this is not the case, errors will be returned.<br>
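+ If the data may come from a non-UTF-8 source, an application can check it before calling the write functions. A minimal validator, written for this page and not part of gedcom-parse, could look like this; it rejects overlong forms, UTF-16 surrogates and code points above U+10FFFF.<br>

```c
#include <stddef.h>

/* Return 1 if s (NUL-terminated) is well-formed UTF-8, 0 otherwise.
   Rejects overlong forms, UTF-16 surrogates and code points above U+10FFFF. */
int is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    while (*p) {
        unsigned char c = *p;
        size_t n;            /* number of continuation bytes expected */
        unsigned long cp;    /* decoded code point */
        if (c < 0x80) { p++; continue; }
        else if (c >= 0xC2 && c <= 0xDF) { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return 0;       /* stray continuation byte or invalid lead byte */
        for (size_t i = 1; i <= n; i++) {
            if ((p[i] & 0xC0) != 0x80)
                return 0;    /* truncated sequence (this also stops at the NUL) */
            cp = (cp << 6) | (p[i] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000)
            || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;        /* overlong, surrogate or out of range */
        p += n + 1;
    }
    return 1;
}
```

+ Strings that fail this check need to be converted to UTF-8 first.<br>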
+<br>
+Note that for examples of using these functions you can look at the sources for the Gedcom object model (e.g. the function <code>write_header</code> in <code>gom/header.c</code>).<br>
+<h4>Records</h4>
+For writing lines corresponding to records (i.e. on level 0), the following function is available:
+<blockquote><code>int <b>gedcom_write_record_str</b> (Gedcom_write_hndl hndl, Gedcom_rec rec, char* xrefstr, char* value);<br></code></blockquote>
+The <code>hndl</code> parameter is the write handle that was returned by <code>gedcom_write_open</code>. The <code>rec</code> parameter is one of the identifiers given in the first column in <a href="interface.html#Record_identifiers">this table</a> (except <code>REC_USER</code>: see below). The <code>xrefstr</code> and <code>value</code> parameters are respectively the cross-reference key of the record (something like '<code>@FAM01@</code>'), and the value of the record line, which should be <code>NULL</code> for some record types, according to the same table.<br>
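+ A small hypothetical helper for producing cross-reference keys like the one above might look like this; the prefix-plus-two-digit-number scheme is only one possible convention, and <code>make_xref</code> is not part of gedcom-parse.<br>

```c
#include <stdio.h>

/* Hypothetical helper: build a cross-reference key such as "@FAM01@"
   from a prefix and a number. Returns 0 on success, -1 if buf is too small. */
int make_xref(char *buf, size_t size, const char *prefix, unsigned int number)
{
    int n = snprintf(buf, size, "@%s%02u@", prefix, number);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```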
+<h4>Elements</h4>
+For writing lines corresponding to elements (inside records, i.e. on a level
+greater than 0), the following functions are available, depending on the data
+type:
+<blockquote><code>int <b>gedcom_write_element_str</b> (Gedcom_write_hndl hndl, Gedcom_elt elt, int parsed_tag, <br>