+
+ The most important use of the parser is of course to get the data out of
+ the GEDCOM file. This section focuses on the callback mechanism (see
+ \ref gom "here" for the C object model). The mechanism involves two
+ levels.
+
+ At the primary level, each section of a GEDCOM file is reported to the
+ application code via a "start element" callback and an "end element"
+ callback (much like in a SAX interface for XML): when a line containing a
+ certain tag is parsed, the "start element" callback is called for that tag,
+ and when all its subordinate lines with their tags have been processed, the
+ "end element" callback is called for the original tag. Since GEDCOM is
+ hierarchical, this results in properly nested calls to the appropriate
+ "start element" and "end element" callbacks (note: see
+ \ref compat "compatibility handling").
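+
+ To make the nesting concrete, here is a minimal, self-contained sketch of
+ how start and end events can be derived from the level numbers of GEDCOM
+ lines. It does not use the library: \c struct \c line, replay_lines() and
+ the recording callbacks are hypothetical names chosen for illustration, and
+ the real parser works on the full grammar rather than on a plain list of
+ (level, tag) pairs.
+
```c
#include <string.h>

// Illustrative sketch only: struct line, replay_lines() and the callback
// types are hypothetical, not part of the gedcom library API.
struct line {
  int level;        // GEDCOM level number at the start of the line
  const char *tag;  // tag of the line, e.g. "HEAD" or "SOUR"
};

typedef void (*start_fn)(int level, const char *tag, void *user);
typedef void (*end_fn)(int level, const char *tag, void *user);

#define MAX_DEPTH 32

// Emit properly nested start/end events for a sequence of lines.
// A line at level n first closes every open line at level >= n.
// Returns 0 on success, -1 if a line is more than one level deeper than
// the enclosing line or the nesting exceeds MAX_DEPTH.
int replay_lines(const struct line *lines, size_t count,
                 start_fn on_start, end_fn on_end, void *user)
{
  const struct line *stack[MAX_DEPTH];  // stack[i]: open line at level i
  int depth = 0;
  for (size_t i = 0; i < count; i++) {
    int lvl = lines[i].level;
    if (lvl < 0 || lvl > depth || lvl >= MAX_DEPTH)
      return -1;
    while (depth > lvl) {  // finished siblings and their subordinates end
      depth--;
      on_end(stack[depth]->level, stack[depth]->tag, user);
    }
    on_start(lvl, lines[i].tag, user);
    stack[depth++] = &lines[i];
  }
  while (depth > 0) {  // end of input closes everything still open
    depth--;
    on_end(stack[depth]->level, stack[depth]->tag, user);
  }
  return 0;
}

// Example callbacks that append "[TAG " and "TAG] " to a char buffer.
void record_start(int level, const char *tag, void *user)
{
  (void)level;
  strcat((char *)user, "[");
  strcat((char *)user, tag);
  strcat((char *)user, " ");
}

void record_end(int level, const char *tag, void *user)
{
  (void)level;
  strcat((char *)user, tag);
  strcat((char *)user, "] ");
}
```
+
+ For the lines HEAD (level 0), SOUR (1), VERS (2), CHAR (1) and TRLR (0),
+ the recorded sequence is
+ <code>[HEAD [SOUR [VERS VERS] SOUR] [CHAR CHAR] HEAD] [TRLR TRLR]</code>:
+ each end event arrives only after all subordinate lines have been handled.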
+
+ However, a genealogy program will typically support only a subset of the
+ GEDCOM standard, especially while it is still under development. Moreover,
+ GEDCOM allows an application to define its own tags, which will typically
+ not be supported by other applications. Still, data preservation is
+ important in that case: users would hardly accept that information a
+ program does not understand is simply removed.
+
+ Therefore, the second level of callbacks involves a "default callback". An
+ application subscribes to callbacks for the tags it supports, and needs to
+ provide a "default callback", which will be called for the tags it doesn't
+ support. The application can then choose to just store the information
+ that comes via the default callback in plain textual format.
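+
+ The dispatch rule can be sketched in a few lines of plain C. This is an
+ illustration under simple assumptions (a linear table of subscribed tags,
+ matched by string comparison), not the library's implementation;
+ \c struct \c subscription, dispatch_tag() and the example handlers are
+ hypothetical. In the library itself, subscriptions are made with
+ gedcom_subscribe_to_record() and gedcom_subscribe_to_element(), and the
+ default callback is registered with gedcom_set_default_callback().
+
```c
#include <stddef.h>
#include <string.h>

// Illustrative sketch only: none of these names belong to the library.
typedef void (*tag_handler)(const char *tag, const char *raw_value, void *user);

// Hypothetical subscription table entry: which handler serves which tag.
struct subscription {
  const char *tag;
  tag_handler handler;
};

// Call the subscribed handler for `tag`, or the default handler if the tag
// is not in the table. Returns 1 if a specific handler was used, 0 otherwise.
int dispatch_tag(const struct subscription *subs, size_t nsubs,
                 tag_handler default_handler,
                 const char *tag, const char *raw_value, void *user)
{
  for (size_t i = 0; i < nsubs; i++) {
    if (strcmp(subs[i].tag, tag) == 0) {
      subs[i].handler(tag, raw_value, user);
      return 1;
    }
  }
  if (default_handler != NULL)
    default_handler(tag, raw_value, user);
  return 0;
}

// Example handlers: count calls in an int[2] passed through `user`.
void count_known(const char *tag, const char *raw_value, void *user)
{
  (void)tag;
  (void)raw_value;
  ((int *)user)[0]++;
}

void count_unknown(const char *tag, const char *raw_value, void *user)
{
  (void)tag;
  (void)raw_value;
  ((int *)user)[1]++;
}
```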
+*/
+
+/*! \defgroup start_end Start and end callbacks
+ \ingroup cb_mech
+
+ The following simple example gets some information from the header record
+ of a GEDCOM file.
+
+ \code
+ Gedcom_ctxt my_header_start_cb (Gedcom_rec rec,
+ int level,
+ Gedcom_val xref,
+ char *tag,
+ char *raw_value,
+ int parsed_tag,
+ Gedcom_val parsed_value)
+ {
+ printf("The header starts\n");
+ return (Gedcom_ctxt)(intptr_t)1; // intptr_t comes from <stdint.h>
+ }
+
+ void my_header_end_cb (Gedcom_rec rec, Gedcom_ctxt self)
+ {
+ printf("The header ends, context is %d\n", (int)(intptr_t)self);
+ }
+
+ ...
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, my_header_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+ \endcode
+
+ Using the gedcom_subscribe_to_record() function, the application registers
+ the specified functions as the start and end callbacks for the header
+ record (of types \ref Gedcom_rec_start_cb and \ref Gedcom_rec_end_cb).
+
+ The start callback can return a context value of type \ref Gedcom_ctxt.
+ This type is meant to be opaque; in fact, it's a void pointer, so you can
+ pass anything via it. This context value will be passed to the callbacks
+ of the direct child elements, and to the end callback.
+
+ The example passes a simple integer as context, but an application could,
+ for example, pass a \c struct (or an object in a C++ application) that will
+ contain the information for the record. In the end callback, the
+ application could then finalize the \c struct and put it in its database.
+
+ As its name indicates, gedcom_subscribe_to_record() is specific to complete
+ records. For the separate elements in records there is another function,
+ which we'll see shortly. Note that the callbacks need to have the
+ signatures shown in the example.
+
+ We will now retrieve the SOUR field (the name of the program that wrote the
+ file) from the header:
+ \code
+ Gedcom_ctxt my_header_source_start_cb(Gedcom_elt elt,
+ Gedcom_ctxt parent,
+ int level,
+ char* tag,
+ char* raw_value,
+ int parsed_tag,
+ Gedcom_val parsed_value)
+ {
+ char *source = GEDCOM_STRING(parsed_value);
+ printf("This file was written by %s\n", source);
+ return parent;
+ }
+
+ ...
+ gedcom_subscribe_to_element(ELT_HEAD_SOUR,
+ my_header_source_start_cb,
+ NULL);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+ \endcode
+
+ The subscription mechanism for elements is similar; only the signatures of
+ the callbacks differ. The signature of the start callback shows that the
+ context of the parent line (here, e.g., the \c struct that describes the
+ header) is passed to it.
+
+ In this example, the callback simply returns the parent's context, but it
+ could of course return a context object of its own. The end callback is
+ called with both the parent's context and its own context, which in this
+ example would be the same.
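+
+ As a sketch of the second option, the hypothetical callbacks below give a
+ NOTE-like child element its own context, and use the end callback, which
+ sees both contexts, to attach the child to its parent. The types and
+ signatures are simplified assumptions in plain C; with the library, the
+ callbacks would have the element callback signatures shown above.
+
```c
#include <stdlib.h>
#include <string.h>

// Illustrative sketch only: hypothetical types and simplified signatures.
typedef void *ctxt;  // stands in for Gedcom_ctxt

struct note {
  char text[128];
  struct note *next;
};

struct record {
  struct note *notes;  // singly linked list of attached notes
};

// Element start: create the element's own context instead of reusing parent.
ctxt note_start(ctxt parent, const char *raw_value)
{
  (void)parent;
  struct note *n = calloc(1, sizeof *n);  // zeroed, so text stays terminated
  if (n != NULL)
    strncpy(n->text, raw_value, sizeof n->text - 1);
  return n;
}

// Element end: both contexts are available, so link the child into the parent.
void note_end(ctxt parent, ctxt self)
{
  struct record *r = parent;
  struct note *n = self;
  if (n == NULL)
    return;
  if (r == NULL) {  // nowhere to attach it: discard
    free(n);
    return;
  }
  n->next = r->notes;
  r->notes = n;
}
```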
+*/
+
+/*! \defgroup defcb Default callbacks
+ \ingroup cb_mech
+
+ An application doesn't always implement the entire GEDCOM spec, and
+ application-specific tags may have been added by other applications. To
+ preserve this extra data anyway, a default callback can be registered by
+ the application, as in the following example:
+
+ \code
+ void my_default_cb (Gedcom_elt elt, Gedcom_ctxt parent, int level,
+ char* tag, char* raw_value, int parsed_tag)
+ {
+ ...
+ }
+
+ ...
+ gedcom_set_default_callback(my_default_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+ \endcode
+
+ This callback has a signature similar to the previous ones, but it doesn't
+ receive a parsed value. It does, however, receive the parent context, i.e.
+ the context that was returned by the application for the most specific
+ containing tag that the application supports.
+
+ Suppose, for example, that this callback is called for some tags in the
+ header that are specific to another application. Our application could
+ then make sure that the parent context contains the struct or object that
+ represents the header, and use the default callback to add the level, tag
+ and raw_value as plain text to a member of that struct or object, thus
+ preserving the information.
+
+ The application can then write this out again when the data is saved to a
+ GEDCOM file. To make this more concrete, consider the following example:
+
+ \code
+ struct header {
+ char* source;
+ ...
+ char* extra_text;
+ };
+
+ Gedcom_ctxt my_header_start_cb(Gedcom_rec rec, int level, Gedcom_val xref,
+ char* tag, char *raw_value,
+ int parsed_tag, Gedcom_val parsed_value)
+ {
+ struct header* head = my_make_header_struct();
+ return (Gedcom_ctxt)head;
+ }
+
+ void my_default_cb(Gedcom_elt elt, Gedcom_ctxt parent, int level,
+ char* tag, char* raw_value, int parsed_tag)
+ {
+ struct header* head = (struct header*)parent;
+ my_header_add_to_extra_text(head, level, tag, raw_value);
+ }
+
+ gedcom_set_default_callback(my_default_cb);
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, NULL);
+ ...
+ result = gedcom_parse_file(filename);
+ \endcode
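+
+ The example leaves my_header_add_to_extra_text() unspecified. One possible
+ implementation is sketched below, assuming that \c extra_text is a
+ heap-allocated, NUL-terminated string and that each preserved line is
+ stored in GEDCOM's own "level tag value" layout so it can be written back
+ later. The reduced \c struct \c header and the helper's exact signature are
+ assumptions for this sketch.
+
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Reduced version of the struct from the example; other members omitted.
// Illustrative sketch only: the helper's signature is an assumption.
struct header {
  char *extra_text;  // heap-allocated, NUL-terminated, or NULL when empty
};

// Append "level tag value\n" to head->extra_text; raw_value may be NULL
// for lines without a value. Returns 0 on success, -1 on failure.
int my_header_add_to_extra_text(struct header *head, int level,
                                const char *tag, const char *raw_value)
{
  size_t old_len = head->extra_text ? strlen(head->extra_text) : 0;
  int n = raw_value
    ? snprintf(NULL, 0, "%d %s %s\n", level, tag, raw_value)
    : snprintf(NULL, 0, "%d %s\n", level, tag);
  if (n < 0)
    return -1;
  char *buf = realloc(head->extra_text, old_len + (size_t)n + 1);
  if (buf == NULL)
    return -1;  // the old text is still valid and owned by head
  if (raw_value)
    snprintf(buf + old_len, (size_t)n + 1, "%d %s %s\n", level, tag, raw_value);
  else
    snprintf(buf + old_len, (size_t)n + 1, "%d %s\n", level, tag);
  head->extra_text = buf;
  return 0;
}
```
+
+ Storing the lines verbatim keeps the saving side trivial: the text can be
+ emitted as-is under the header when the file is written again.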
+
+ Note that the default callback will be called for any tag that the
+ application hasn't specifically subscribed to, and can thus be called in
+ various contexts. For simplicity, the example above doesn't take this into
+ account (the parent could be of different types, depending on the context).
+
+ Note also that the default callback is not called when the parent context is
+ \c NULL. This is, e.g., the case if none of the "upper" tags has been
+ subscribed to.
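+
+ The rule for choosing the parent context, including the \c NULL case, can
+ be sketched as follows. This assumes that the parser keeps one entry per
+ open line, recording whether its tag was subscribed to and which context
+ its start callback returned; \c struct \c open_level, default_cb_parent()
+ and maybe_call_default() are hypothetical names, not library internals.
+
```c
#include <stddef.h>

// Illustrative sketch only: hypothetical per-level parser state.
struct open_level {
  int subscribed;  // did the application subscribe to this line's tag?
  void *ctx;       // context returned by its start callback, if subscribed
};

// Return the context of the nearest enclosing subscribed line, searching
// outward from stack[depth - 1]; NULL if no enclosing line was subscribed.
void *default_cb_parent(const struct open_level *stack, int depth)
{
  for (int i = depth - 1; i >= 0; i--) {
    if (stack[i].subscribed)
      return stack[i].ctx;
  }
  return NULL;
}

// Invoke the default callback only for a non-NULL parent context.
// Returns 1 if the callback ran, 0 if it was skipped.
int maybe_call_default(const struct open_level *stack, int depth,
                       void (*def_cb)(void *parent, const char *tag),
                       const char *tag)
{
  void *parent = default_cb_parent(stack, depth);
  if (parent == NULL || def_cb == NULL)
    return 0;
  def_cb(parent, tag);
  return 1;
}

// Example default callback: counts invocations through the parent pointer.
void count_default(void *parent, const char *tag)
{
  (void)tag;
  (*(int *)parent)++;
}
```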
+*/
+
+/*! \defgroup parsed Parsed values
+ \ingroup callback
+
+ The \c Gedcom_val type is meant to be an opaque type. The only thing that
+ needs to be known about it is that it can contain specific data types, which
+ have to be retrieved from it using predefined macros.
+
+ Currently, the specific \c Gedcom_val types are (with \c val of type
+ \c Gedcom_val):
+
+ <table border="1" width="100%">
+ <tr>
+ <td><b>value type</b></td>
+ <td><b>type checker</b></td>
+ <td><b>cast macro</b></td>
+ </tr>
+ <tr>
+ <td>null value</td>
+ <td><code>GEDCOM_IS_NULL(val)</code></td>
+ <td>N/A</td>
+ </tr>
+ <tr>
+ <td>string</td>
+ <td><code>GEDCOM_IS_STRING(val)</code></td>
+ <td><code>char* str = GEDCOM_STRING(val);</code></td>
+ </tr>
+ <tr>
+ <td>date</td>
+ <td><code>GEDCOM_IS_DATE(val)</code></td>
+ <td><code>struct date_value dv = GEDCOM_DATE(val);</code></td>
+ </tr>
+ <tr>
+ <td>age</td>
+ <td><code>GEDCOM_IS_AGE(val)</code></td>
+ <td><code>struct age_value age = GEDCOM_AGE(val);</code></td>
+ </tr>
+ <tr>
+ <td>xref pointer</td>
+ <td><code>GEDCOM_IS_XREF_PTR(val)</code></td>
+ <td><code>struct xref_value *xr = GEDCOM_XREF_PTR(val);</code></td>
+ </tr>
+ </table>
+
+ The type checker returns true or false depending on the type of the value.
+ In principle, it is only needed in the rare cases where two types are
+ possible, or where a value is optional. In most cases, the type is fixed
+ for a specific tag.
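+
+ The checker/accessor pattern is the usual one for a tagged union. The
+ sketch below is not the library's definition of \c Gedcom_val (which stays
+ opaque); \c my_val and the MY_ macros are hypothetical stand-ins that
+ mirror the style of the macros in the table, including the case of an
+ optional value where the checker must be consulted before the accessor.
+
```c
#include <stdio.h>
#include <string.h>

// Illustrative sketch only: a hypothetical tagged union, not Gedcom_val.
enum my_val_type { MY_NULL, MY_STRING, MY_AGE };

struct my_age { int years; };

typedef struct {
  enum my_val_type type;
  union {
    const char *string_val;
    struct my_age age_val;
  } u;
} my_val;

// Type checkers, in the style of GEDCOM_IS_STRING(val) and friends.
#define MY_IS_NULL(v)   ((v).type == MY_NULL)
#define MY_IS_STRING(v) ((v).type == MY_STRING)
#define MY_IS_AGE(v)    ((v).type == MY_AGE)

// Accessors, in the style of GEDCOM_STRING(val); valid only after a check.
#define MY_STRING(v) ((v).u.string_val)
#define MY_AGE(v)    ((v).u.age_val)

// A tag with an optional value must branch on the checker before accessing.
// Writes a short description into buf and returns snprintf's result.
int describe(my_val v, char *buf, size_t size)
{
  if (MY_IS_NULL(v))
    return snprintf(buf, size, "(no value)");
  if (MY_IS_STRING(v))
    return snprintf(buf, size, "string: %s", MY_STRING(v));
  if (MY_IS_AGE(v))
    return snprintf(buf, size, "age: %d years", MY_AGE(v).years);
  return snprintf(buf, size, "(unknown type)");
}
```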
+
+ The exact type per tag can be found in the
+ <a href="interface.html">interface details</a>.
+
+ The null value is used when the GEDCOM spec doesn't allow a value, or when
+ an optional value is allowed but none is given.
+
+ The string value is currently the most commonly used type, covering all
+ values that don't have a more specific meaning. In essence, the value
+ returned by \c GEDCOM_STRING(val) is always the same as the \c raw_value
+ passed to the start callback, and is thus in fact redundant.
+
+ For the other data types, there is a specific section giving details.
+*/
+
+/*! \defgroup parsed_date Date values
+ \ingroup parsed
+
+ The Gedcom_val contains a struct date_value if it denotes a date. The
+ struct date is a part of the struct date_value.
+*/
+
+/*! \defgroup parsed_age Age values
+ \ingroup parsed
+
+ The Gedcom_val contains a struct age_value if it denotes an age.
+*/
+
+/*! \defgroup parsed_xref Cross-reference values
+ \ingroup parsed
+
+ The Gedcom_val contains a pointer to a struct xref_value if it denotes a
+ cross-reference (note: not the struct itself, but a pointer to it!)
+
+ The parser checks whether all cross-references that are used are defined
+ (if not, an error is produced) and whether all cross-references that are
+ defined are used (if not, a warning is produced). It also checks whether
+ the type of the cross-reference is the same on definition and use (if
+ not, an error is produced).
+
+ The first two checks are done at the end of parsing, because in GEDCOM a
+ cross-reference can be defined after it is used.
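+
+ A simplified sketch of these checks follows, assuming that the parser
+ records each key with its type and with flags for whether it was defined
+ and/or used. The table layout, \c enum \c xref_kind and the function names
+ are hypothetical. A type clash can be reported as soon as a key reappears,
+ whereas undefined and unused keys can only be judged in a final pass.
+
```c
#include <string.h>

// Illustrative sketch only: hypothetical types, a subset of xref kinds.
enum xref_kind { XK_INDI, XK_FAM, XK_SOUR };

struct xref_entry {
  const char *key;      // the cross-reference key
  enum xref_kind kind;  // type taken from the first occurrence
  int defined;
  int used;
};

struct xref_report {
  int errors;    // used but never defined
  int warnings;  // defined but never used
};

// Record a definition (is_def = 1) or a use (is_def = 0) of `key`.
// The caller must ensure the table has room for one more entry.
// Returns 1 on a type mismatch with an earlier occurrence, else 0.
int xref_note(struct xref_entry *table, int *n, const char *key,
              enum xref_kind kind, int is_def)
{
  for (int i = 0; i < *n; i++) {
    if (strcmp(table[i].key, key) == 0) {
      if (is_def)
        table[i].defined = 1;
      else
        table[i].used = 1;
      return table[i].kind != kind;
    }
  }
  table[*n] = (struct xref_entry){ key, kind, is_def, !is_def };
  (*n)++;
  return 0;
}

// End-of-parse pass: only now can undefined and unused keys be judged.
struct xref_report xref_finish(const struct xref_entry *table, int n)
{
  struct xref_report r = { 0, 0 };
  for (int i = 0; i < n; i++) {
    if (table[i].used && !table[i].defined)
      r.errors++;
    if (table[i].defined && !table[i].used)
      r.warnings++;
  }
  return r;
}
```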
+
+ A cross-reference key must be a string of at most 22 characters, with the
+ following format:
+
+ - an at sign ('@')
+ - followed by an alphanumeric character or an underscore (A-Z, a-z, 0-9, _)
+ - followed by zero or more characters, which can be any character
+ except an at sign
+ - terminated by an at sign ('@')
+
+ An example would thus be: <code>"@This is an xref_val@"</code>.
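+
+ The key format translates directly into a small validator. This is a
+ sketch under two assumptions: that the 22-character limit covers the whole
+ key including both at signs, as the wording above suggests, and that
+ "alphanumeric" means the ASCII characters listed. is_valid_xref_key() is a
+ hypothetical name, not a library function.
+
```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Illustrative sketch only: the length limit is assumed to include both
// at signs.
#define XREF_MAX_LEN 22

// Return 1 if `key` matches the cross-reference format described above.
int is_valid_xref_key(const char *key)
{
  size_t len = strlen(key);
  if (len < 3 || len > XREF_MAX_LEN)
    return 0;  // need an at sign, at least one character, and an at sign
  if (key[0] != '@' || key[len - 1] != '@')
    return 0;
  unsigned char first = (unsigned char)key[1];
  if (!(isalnum(first) || first == '_'))
    return 0;  // first character after the opening at sign
  for (size_t i = 2; i < len - 1; i++) {
    if (key[i] == '@')
      return 0;  // no at sign allowed between the delimiters
  }
  return 1;
}
```
+
+ With these rules the example key above is accepted, while a key with a
+ space right after the opening at sign, or one containing a third at sign,
+ is rejected.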
+*/
+
+/*! \defgroup compat Compatibility mode
+ \ingroup callback
+
+ Not all applications follow the GEDCOM spec faithfully, and some use a
+ version other than 5.5. The intention is that the library is resilient to
+ this, and switches to compatibility mode for files written by specific
+ programs (detected via the \c HEAD.SOUR tag).
+
+ Currently, there is (some) compatibility for:
+ - ftree
+ - Lifelines (3.0.2)
+ - Personal Ancestral File (PAF), versions 2, 4 and 5
+ - Family Origins
+ - EasyTree