X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;f=doc%2Fusage.html;h=cd050df414a0f2e707d0e8b9a71fd0999318046f;hb=f4e5ab78b88a194151651db2415dd210fb4ec494;hp=76a11bfab4748ab93db208346238bd4ba927dc72;hpb=f789da85454184a473145e9a1d1260b9e09afcd1;p=gedcom-parse.git
diff --git a/doc/usage.html b/doc/usage.html
index 76a11bf..cd050df 100644
--- a/doc/usage.html
+++ b/doc/usage.html
@@ -1,416 +1,455 @@
- Using the GEDCOM parser library
+Using the GEDCOM parser library

Using the GEDCOM parser library

-
- +
+

Index

- + - -
+ +

Overview
-

- The GEDCOM parser library is built as a callback-based parser (comparable - to the SAX interface of XML).  It comes with:
- + + The GEDCOM parser library is built as a callback-based parser (comparable + to the SAX interface of XML).  It comes with:
+ - Next to these, there is also a data directory in $PREFIX/share/gedcom-parse - that contains some additional stuff, but which is not immediately important - at first.  I'll leave the description of the data directory for later.
-
- The very simplest call of the gedcom parser is simply the following piece - of code (include of the gedcom header is assumed, as everywhere in this manual):
- + Next to these, there is also a data directory in $PREFIX/share/gedcom-parse + that contains some additional stuff, but which is not immediately +important at first.  I'll leave the description of the data directory +for later.
+
+ The simplest possible call of the GEDCOM parser is the following piece of code (inclusion of the gedcom header is assumed, as everywhere in this manual):
+
int result;
- ...
- result = gedcom_parse_file("myfamily.ged");
-
- Although this will not provide much information, one thing it does is -parse the entire file and return the result.  The function returns -0 on success and 1 on failure.  No other information is available using -this function only.
-
- The next sections will refine this to be able to have meaningful errors -and the actual data that is in the file.
- -
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+ Although this does not provide much information, it does parse the entire file and return the result.  The function returns 0 on success and 1 on failure.  No other information is available when using only this function.
+
+ The next sections refine this so that you get meaningful error messages and the actual data contained in the file.
+ +

Error handling

- Since this is a relatively simple topic, it is discussed before the actual - callback mechanism, although it also uses a callback...
-
- The library can be used in several different circumstances, both terminal-based - as GUI-based.  Therefore, it leaves the actual display of the error -message up to the application.  For this, the application needs to register -a callback before parsing the GEDCOM file, which will be called by the library - on errors, warnings and messages.
-
- A typical piece of code would be:
- -
void my_message_handler (Gedcom_msg_type type, - char *msg)
- {
-   ...
- }
- ...
- gedcom_set_message_handler(my_message_handler);
- ...
- result = gedcom_parse_file("myfamily.ged");

-
- In the above piece of code, my_message_handler is the callback - that will be called for errors (type=ERROR), warnings ( - type=WARNING) and messages (type=MESSAGE).  The -callback must have the signature as in the example.  For errors, the - msg passed to the callback will have the format:
- + Since this is a relatively simple topic, it is discussed before the +actual callback mechanism, although it also uses a callback...
+
+ The library can be used in several different circumstances, both terminal-based and GUI-based.  Therefore, it leaves the actual display of error messages up to the application.  To this end, the application needs to register a callback before parsing the GEDCOM file; the library calls it on errors, warnings and messages.
+
+ A typical piece of code would be:
+ +
void my_message_handler (Gedcom_msg_type type, + char *msg)
+ {
+   ...
+ }
+ ...
+ gedcom_set_message_handler(my_message_handler);
+ ...
+ result = gedcom_parse_file("myfamily.ged");

+
+ In the above piece of code, my_message_handler is the callback that will be called for errors (type=ERROR), warnings (type=WARNING) and messages (type=MESSAGE).  The callback must have the signature shown in the example.  For errors, the msg passed to the callback has the format:
+
Error on line <lineno>: <actual_message>
-
- Note that the entire string will be properly internationalized, and encoded - in UTF-8 (see "Why UTF-8?"  LINK TBD).  Also, no newline - is appended, so that the application program can use it in any way it wants. -  Warnings are similar, but use "Warning" instead of "Error".  Messages - are plain text, without any prefix.
-
- With this in place, the resulting code will already show errors and warnings - produced by the parser, e.g. on the terminal if a simple printf - is used in the message handler.
- -
+ Note that the entire string is properly internationalized and encoded in UTF-8 (see "Why UTF-8?"  LINK TBD).  Also, no newline is appended, so that the application program can use it in any way it wants.  Warnings are similar, but use "Warning" instead of "Error".  Messages are plain text, without any prefix.
+
+ With this in place, the resulting code will already show errors and +warnings produced by the parser, e.g. on the terminal if a simple +printf is used in the message handler.
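Since the library appends no newline and leaves the display of messages to the application, a GUI program might collect them in a buffer and show them in one dialog after parsing. The following is a minimal sketch under that assumption. struct msg_log, msg_log_append and msg_log_free are hypothetical helpers, not part of libgedcom. Because the message handler has no user-data argument, the sketch assumes a global log that the handler appends to.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer collecting parser messages for later display. */
struct msg_log {
  char   *text;  /* NUL-terminated, one message per line */
  size_t  len;   /* strlen(text) */
  size_t  cap;   /* allocated bytes */
};

/* Append msg plus a newline (the library appends none).
 * Returns 0 on success, -1 on allocation failure. */
int msg_log_append(struct msg_log *ml, const char *msg)
{
  size_t n = strlen(msg);
  size_t needed = ml->len + n + 2;   /* message + '\n' + '\0' */
  if (needed > ml->cap) {
    size_t cap = ml->cap ? ml->cap : 64;
    while (cap < needed)
      cap *= 2;
    char *p = realloc(ml->text, cap);
    if (p == NULL)
      return -1;
    ml->text = p;
    ml->cap = cap;
  }
  memcpy(ml->text + ml->len, msg, n);
  ml->len += n;
  ml->text[ml->len++] = '\n';
  ml->text[ml->len] = '\0';
  return 0;
}

void msg_log_free(struct msg_log *ml)
{
  free(ml->text);
  ml->text = NULL;
  ml->len = ml->cap = 0;
}

/* Wiring with libgedcom (not compiled here):
 *
 *   static struct msg_log g_log;
 *   void my_message_handler(Gedcom_msg_type type, char *msg)
 *   {
 *     msg_log_append(&g_log, msg);
 *   }
 */
```

After gedcom_parse_file returns, the application could show g_log.text in a single dialog and then call msg_log_free.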
+ +

Data callback mechanism

- The most important use of the parser is of course to get the data out of - the GEDCOM file.  As already mentioned, the parser uses a callback -mechanism for that.  In fact, the mechanism involves two levels.
-
- The primary level is that each of the sections in a GEDCOM file is notified - to the application code via a "start element" callback and an "end element" - callback (much like in a SAX interface for XML), i.e. when a line containing - a certain tag is parsed, the "start element" callback is called for that -tag, and when all its subordinate lines with their tags have been processed, -the "end element" callback is called for the original tag.  Since GEDCOM - is hierarchical, this results in properly nested calls to appropriate "start - element" and "end element" callbacks.
-
- However, it would be typical for a genealogy program to support only a -subset of the GEDCOM standard, certainly a program that is still under development. -  Moreover, under GEDCOM it is allowed for an application to define -its own tags, which will typically not  be supported by another application. -  Still, in that case, data preservation is important; it would hardly - be accepted that information that is not understood by a certain program -is just removed.
-
- Therefore, the second level of callbacks involves a "default callback". - An application needs to subscribe to callbacks for tags it does support, -and need to provide a "default callback" which will be called for tags it -doesn't support.  The application can then choose to just store the information -that comes via the default callback in plain textual format.
-
- After this introduction, let's see what the API looks like...
-
- + The most important use of the parser is of course to get the data out + of the GEDCOM file.  As already mentioned, the parser uses a callback + mechanism for that.  In fact, the mechanism involves two levels.
+
+ At the primary level, each of the sections in a GEDCOM file is reported to the application code via a "start element" callback and an "end element" callback (much like in a SAX interface for XML).  When a line containing a certain tag is parsed, the "start element" callback is called for that tag, and when all its subordinate lines with their tags have been processed, the "end element" callback is called for the original tag.  Since GEDCOM is hierarchical, this results in properly nested calls to the appropriate "start element" and "end element" callbacks.
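The level numbers alone are enough to produce this nesting. The following self-contained model shows the idea. It is only an illustration, not libgedcom's implementation, and struct line, emit_nested and the trace helpers are hypothetical. The model keeps a stack of open lines. Before each new line, it ends every open element whose level is greater than or equal to the new line's level, and at the end of input it ends whatever is still open.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 100   /* arbitrary bound on nesting depth for this sketch */

/* One already-tokenized GEDCOM line (hypothetical input format). */
struct line {
  int         level;
  const char *tag;
};

typedef void (*start_fn)(const struct line *l, void *user);
typedef void (*end_fn)(const struct line *l, void *user);

/* Calls on_start/on_end in properly nested order.
 * Returns 0 on success, -1 if a level jumps by more than one or nesting is too deep. */
int emit_nested(const struct line *lines, size_t n,
                start_fn on_start, end_fn on_end, void *user)
{
  const struct line *stack[MAX_DEPTH];
  size_t depth = 0;
  for (size_t i = 0; i < n; i++) {
    /* Close every open element at the same or a deeper level. */
    while (depth > 0 && stack[depth - 1]->level >= lines[i].level)
      on_end(stack[--depth], user);
    /* The new line must be a direct child of the innermost open element. */
    if (depth == MAX_DEPTH || lines[i].level != (int)depth)
      return -1;
    on_start(&lines[i], user);
    stack[depth++] = &lines[i];
  }
  while (depth > 0)   /* end of input closes everything still open */
    on_end(stack[--depth], user);
  return 0;
}

/* Example callbacks recording "+TAG " for starts and "-TAG " for ends. */
struct trace { char buf[256]; size_t len; };

static void trace_add(struct trace *t, char mark, const char *tag)
{
  size_t room = sizeof t->buf - t->len;
  int w = snprintf(t->buf + t->len, room, "%c%s ", mark, tag);
  if (w > 0 && (size_t)w < room)
    t->len += (size_t)w;
  else
    t->buf[t->len] = '\0';   /* entry did not fit: drop it */
}

static void trace_start(const struct line *l, void *user) { trace_add(user, '+', l->tag); }
static void trace_end(const struct line *l, void *user)   { trace_add(user, '-', l->tag); }
```

For the lines 0 HEAD, 1 SOUR, 2 VERS, 1 CHAR, 0 TRLR, the trace reads "+HEAD +SOUR +VERS -VERS -SOUR +CHAR -CHAR -HEAD +TRLR -TRLR ". Each end call follows all of its subordinate lines.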
+
+ However, a genealogy program will typically support only a subset of the GEDCOM standard, certainly while it is still under development.  Moreover, GEDCOM allows an application to define its own tags, which will typically not be supported by other applications.  Still, data preservation is important in that case; it would hardly be acceptable for information that a certain program does not understand to simply be removed.
+
+ Therefore, the second level of callbacks involves a "default callback".  An application subscribes to callbacks for the tags it does support, and needs to provide a "default callback" that will be called for the tags it doesn't support.  The application can then choose to simply store the information that comes via the default callback in plain textual format.
+
+ After this introduction, let's see what the API looks like...
+
+

Start and end callbacks

- +

Callbacks for records
-

- As a simple example, we will get some information from the header of a -GEDCOM file.  First, have a look at the following piece of code:
- -
Gedcom_ctxt my_header_start_cb (int level, - Gedcom_val xref, char *tag)
- {
-   printf("The header starts\n");
-   return (Gedcom_ctxt)1;
- }
-
- void my_header_end_cb (Gedcom_ctxt self)
- {
-   printf("The header ends, context is %d\n", self);   /* context - will print as "1" */
- }
-
- ...
- gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, -my_header_end_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");

-
- Using the gedcom_subscribe_to_record function, the application - requests to use the specified callbacks as start and end callback. The end - callback is optional: you can pass NULL if you are not interested - in the end callback.  The identifiers to use as first argument to the - function (here REC_HEAD) are described in the - interface details.
-
- From the name of the function it becomes clear that this function is specific - to complete records.  For the separate elements in records there is -another function, which we'll see shortly.  Again, the callbacks need -to have the signatures as shown in the example.
-
- The Gedcom_ctxt type that is used as a result of the start -callback and as an argument to the end callback is vital for passing context -necessary for the application.  This type is meant to be opaque; in fact, -it's a void pointer, so you can pass anything via it.  The important -thing to know is that the context that the application returns in the start -callback will be passed in the end callback as an argument, and as we will -see shortly, also to all the directly subordinate elements of the record.
-
- The example passes a simple integer as context, but an application could - e.g. pass a struct that will contain the information for the - header.  In the end callback, the application could then e.g. do some - finalizing operations on the struct to put it in its database.
-
- (Note that the Gedcom_val type for the xref argument - was not discussed, see further for this)
+ + As a simple example, we will get some information from the header of +a GEDCOM file.  First, have a look at the following piece of code:
+ +
Gedcom_ctxt my_header_start_cb (int level, +
+                       +          Gedcom_val xref,
+                       +          char *tag,
+                       +          char *raw_value,
+                       +          int parsed_tag,
+                       +          Gedcom_val parsed_value)
+ {
+   printf("The header starts\n");
+   return (Gedcom_ctxt)1;
+ }
+
+ void my_header_end_cb (Gedcom_ctxt self)
+ {
+   printf("The header ends, context is %d\n", (int)(intptr_t)self);   /* context will print as "1"; intptr_t comes from <stdint.h> */
+ }
+
+ ...
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, + my_header_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");

+
+ Using the gedcom_subscribe_to_record function, the application requests that the specified callbacks be used as start and end callbacks.  The end callback is optional: you can pass NULL if you are not interested in it.  The identifiers to use as first argument to the function (here REC_HEAD) are described in the interface details.
+
+ As the name suggests, this function is specific to complete records.  For the separate elements in records there is another function, which we'll see shortly.  Again, the callbacks need to have the signatures shown in the example.
+
+ The Gedcom_ctxt type, used as the return value of the start callback and as an argument to the end callback, is vital for passing the context that the application needs.  This type is meant to be opaque; in fact, it's a void pointer, so you can pass anything via it.  The important thing to know is that the context that the application returns in the start callback will be passed to the end callback as an argument and, as we will see shortly, also to all the directly subordinate elements of the record.

- + The tag is the GEDCOM tag in string format, the parsed_tag + is an integer, for which symbolic values are defined as TAG_HEAD, + TAG_SOUR, TAG_DATA, ... and USERTAG + for the application-specific tags.  These values are defined in the + header gedcom-tags.h that is installed, and included via + gedcom.h (so no need to include gedcom-tags.h yourself).
+
+ The example passes a simple integer as context, but an application could, e.g., pass a pointer to a struct (or to an object in a C++ application) that will contain the information for the header.  In the end callback, the application could then, e.g., do some finalizing operations on the struct to put it in its database.
+
+ (The Gedcom_val type of the xref and parsed_value arguments has not been discussed yet; see further below.)
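Passing a pointer to an application struct as context could look like the following sketch. To keep it self-contained, ctxt_t stands in for Gedcom_ctxt, which the text describes as a void pointer. struct header, header_start, header_end and stored_header are hypothetical. The comment at the end shows how the callbacks from the example above might forward to them. Unlike (Gedcom_ctxt)1, a real pointer lets the end callback work on the very object the start callback created.

```c
#include <stdlib.h>

/* Stand-in for Gedcom_ctxt, which is described as a void pointer. */
typedef void *ctxt_t;

/* Hypothetical application-side representation of the header record. */
struct header {
  char *source;     /* could be filled in by element callbacks */
  int   finished;   /* set once the end callback has run */
};

/* Hypothetical "database": where finished headers end up. */
static struct header *stored_header;

/* Work done in the record start callback: create the context object. */
ctxt_t header_start(void)
{
  return calloc(1, sizeof(struct header));   /* NULL if out of memory */
}

/* Work done in the record end callback: self is the pointer returned above. */
void header_end(ctxt_t self)
{
  struct header *head = self;
  if (head == NULL)
    return;
  head->finished = 1;      /* finalizing operations go here */
  stored_header = head;    /* ownership moves to the application's storage */
}

/* Wiring with libgedcom (not compiled here):
 *   Gedcom_ctxt my_header_start_cb(...)     { return header_start(); }
 *   void my_header_end_cb(Gedcom_ctxt self) { header_end(self); }
 */
```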
+
+

Callbacks for elements

- We will now retrieve the SOUR field (the name of the program that wrote -the file) from the header:
- -
Gedcom_ctxt my_header_source_start_cb(Gedcom_ctxt - parent,
+ We will now retrieve the SOUR field (the name of the program that wrote + the file) from the header:
+ +
Gedcom_ctxt my_header_source_start_cb(Gedcom_ctxt + parent,
+                     +                   int   +       level,
+                     +                   char*   +     tag,
+                     +                   char*   +     raw_value,
                                      int     -     level,
-                       -                 char*     -   tag,
-                       -                 char*     -   raw_value,
-                       -                 Gedcom_val  parsed_value)
- {
-   char *source = GEDCOM_STRING(parsed_value);
-   printf("This file was written by %s\n", source);
-   return parent;
- }
-
- void my_header_source_end_cb(Gedcom_ctxt parent,
-                       -        Gedcom_ctxt self,
-                       -        Gedcom_val  parsed_value)
- {
-   printf("End of the source description\n");
- }
-
- ...
- gedcom_subscribe_to_element(ELT_HEAD_SOUR,
-                       -       my_header_source_start_cb,
-                       -       my_header_source_end_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");

-
- The subscription mechanism for elements is similar, only the signatures -of the callbacks differ.  The signature for the start callback shows -that the context of the parent line (e.g. the struct that describes - the header) is passed to this start callback.  The callback itself -returns here the same context, but this can be its own context object of -course.  The end callback is called with both the context of the parent -and the context of itself, which will be the same in the example.  Again, -the list of identifiers to use as a first argument for the subscription function -are detailed in the interface -details .
-
- If we look at the other arguments of the start callback, we see the level - number (the initial number of the line in the GEDCOM file), the tag (e.g. - "SOUR"), and then a raw value and a parsed value.  The raw value is -just the raw string that occurs as value on the line next to the tag (in -UTF-8 encoding).  The parsed value is the meaningful value that is parsed -from that raw string.
-
- The Gedcom_val type is meant to be an opaque type.  The - only thing that needs to be known about it is that it can contain specific - data types, which have to be retrieved from it using pre-defined macros. - These data types are described in the - interface details.
-
- Some extra notes:
- +     parsed_tag,
+                     +                   Gedcom_val +  parsed_value)
+ {
+   char *source = GEDCOM_STRING(parsed_value);
+   printf("This file was written by %s\n", source);
+   return parent;
+ }
+
+ void my_header_source_end_cb(Gedcom_ctxt parent,
+                     +          Gedcom_ctxt self,
+                     +          Gedcom_val  parsed_value)
+ {
+   printf("End of the source description\n");
+ }
+
+ ...
+ gedcom_subscribe_to_element(ELT_HEAD_SOUR,
+                     +         my_header_source_start_cb,
+                     +         my_header_source_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");

+
+ The subscription mechanism for elements is similar; only the signatures of the callbacks differ.  The signature of the start callback shows that the context of the parent line (here, e.g., the struct that describes the header) is passed to this start callback.  In this example the callback simply returns the same context, but it could of course return its own context object.  The end callback is called with both the context of the parent and its own context, which in this example are the same.  Again, the list of identifiers to use as a first argument for the subscription function is detailed in the interface details.
+
+ Looking at the other arguments of the start callback, we see the level number (the initial number of the line in the GEDCOM file), the tag (e.g. "SOUR"), and then a raw value, a parsed tag and a parsed value.  The raw value is just the raw string that occurs as value on the line next to the tag (in UTF-8 encoding).  The parsed value is the meaningful value that is parsed from that raw string.  The parsed tag is described in the section on record callbacks above.
+
+ The Gedcom_val type is meant to be opaque.  The only thing that needs to be known about it is that it can contain specific data types, which have to be retrieved from it using predefined macros (such as GEDCOM_STRING in the example above).  These data types are described in the interface details.
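The following sketch shows a start callback that fills in the parent context by storing the source name in the header struct. It assumes that the string obtained via GEDCOM_STRING may not remain valid after the callback returns, so it keeps a private copy. Check the interface details for the actual lifetime. ctxt_t, struct header and header_source_start are hypothetical stand-ins, so the code compiles without the library.

```c
#include <stdlib.h>
#include <string.h>

typedef void *ctxt_t;   /* stand-in for Gedcom_ctxt */

/* Hypothetical header record, as passed in the parent context. */
struct header {
  char *source;
};

/* Body of a HEAD.SOUR start callback: store a private copy of the string.
 * Returns the parent context, as the example callback does. */
ctxt_t header_source_start(ctxt_t parent, const char *source)
{
  struct header *head = parent;
  char *copy;
  if (head == NULL || source == NULL)
    return parent;
  copy = malloc(strlen(source) + 1);
  if (copy == NULL)
    return parent;
  strcpy(copy, source);
  free(head->source);   /* replace any earlier value */
  head->source = copy;
  return parent;
}

/* Wiring with libgedcom (not compiled here), inside my_header_source_start_cb:
 *   return header_source_start(parent, GEDCOM_STRING(parsed_value));
 */
```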
+
+ Some extra notes:
+ - +

Default callbacks
-

- As described above, an application doesn't always implement the entire -GEDCOM spec, and application-specific tags may have been added by other applications. - To preserve this extra data anyway, a default callback can be registered -by the application, as in the following example:
- -
void my_default_cb (Gedcom_ctxt parent, -int level, char* tag, char* raw_value)
- {
-   ...
- }
-
- ...
- gedcom_set_default_callback(my_default_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");

-
- This callback has a similar signature as the previous ones, but -it doesn't contain a parsed value.  However, it does contain the parent -context, that was returned by the application for the most specific containing -tag that the application supported.
-
- Suppose e.g. that this callback is called for some tags in the header that -are specific to some other application, then our application could make sure -that the parent context contains the struct or object that represents the -header, and use the default callback here to add the level, tag and raw_value -as plain text in a member of that struct or object, thus preserving the information. - The application can then write this out when the data is saved again -in a GEDCOM file.  To make it more specific, consider the following example:
- + + As described above, an application doesn't always implement the entire + GEDCOM spec, and application-specific tags may have been added by other applications. +  To preserve this extra data anyway, a default callback can be registered + by the application, as in the following example:
+ +
void my_default_cb (Gedcom_ctxt parent, + int level, char* tag, char* raw_value, int parsed_tag)
+ {
+   ...
+ }
+
+ ...
+ gedcom_set_default_callback(my_default_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");

+
+ This callback has a signature similar to the previous ones, but it doesn't contain a parsed value.  However, it does contain the parent context: the context that the application returned for the most specific containing tag that it supports.
+
+ Suppose, e.g., that this callback is called for some tags in the header that are specific to some other application.  Our application could then make sure that the parent context contains the struct or object that represents the header, and use the default callback to add the level, tag and raw_value as plain text to a member of that struct or object, thus preserving the information.  The application can then write this out when the data is saved again in a GEDCOM file.  To make this more concrete, consider the following example:
+
struct header {
-   char* source;
-   ...
-   char* extra_text;
- };
-
- Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char* tag)
- {
-   struct header head = my_make_header_struct();
-   return (Gedcom_ctxt)head;
- }
-
- void my_default_cb(Gedcom_ctxt parent, int level, char* tag, char* raw_value)
- {
-   struct header head = (struct header)parent;
-   my_header_add_to_extra_text(head, level, tag, raw_value);
- }
-
- gedcom_set_default_callback(my_default_cb);
- gedcom_subscribe_to_record(REC_HEAD, my_header_start, NULL);
- ...
- result = gedcom_parse_file(filename);

-
- Note that the default callback will be called for any tag that isn't specifically -subscribed upon by the application, and can thus be called in various contexts. - For simplicity, the example above doesn't take this into account (the - parent could be of different types, depending on -the context).
- -
+   char* source;
+   ...
+   char* extra_text;
+ };
+
+ Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char* tag, +char *raw_value,
+                       +         int parsed_tag, Gedcom_val parsed_value)
+ {
+   struct header *head = my_make_header_struct();
+   return (Gedcom_ctxt)head;
+ }
+
+ void my_default_cb(Gedcom_ctxt parent, int level, char* tag, char* raw_value, + int parsed_tag)
+ {
+   struct header *head = (struct header *)parent;
+   my_header_add_to_extra_text(head, level, tag, raw_value);
+ }
+
+ gedcom_set_default_callback(my_default_cb);
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, NULL);
+ ...
+ result = gedcom_parse_file(filename);
+ Note that the default callback will be called for any tag that the application hasn't specifically subscribed to, and can thus be called in various contexts.  For simplicity, the example above doesn't take this into account (the parent could be of different types, depending on the context).
+
+Note also that the default callback is not called when the parent context is NULL.  This is, e.g., the case if none of the "upper" tags has been subscribed to.
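The example above leaves my_header_add_to_extra_text undefined. One possible (hypothetical) implementation rebuilds each preserved line in GEDCOM's "level tag value" form, so the text can be written back unchanged when the file is saved. It assumes raw_value may be NULL for lines without a value, and that assumption is marked in the code as well.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed version of the struct in the example above. */
struct header {
  char *source;
  char *extra_text;   /* preserved unknown lines, each ending in '\n' */
};

/* Hypothetical helper: append "<level> <tag> <raw_value>\n" to extra_text.
 * Returns 0 on success, -1 on failure. */
int my_header_add_to_extra_text(struct header *head, int level,
                                const char *tag, const char *raw_value)
{
  const char *value = raw_value ? raw_value : "";   /* assumption: NULL means "no value" */
  const char *sep   = value[0] ? " " : "";          /* no trailing space for empty values */
  size_t old_len = head->extra_text ? strlen(head->extra_text) : 0;

  /* First pass computes the length, second pass writes the line. */
  int n = snprintf(NULL, 0, "%d %s%s%s\n", level, tag, sep, value);
  if (n < 0)
    return -1;
  char *p = realloc(head->extra_text, old_len + (size_t)n + 1);
  if (p == NULL)
    return -1;
  snprintf(p + old_len, (size_t)n + 1, "%d %s%s%s\n", level, tag, sep, value);
  head->extra_text = p;
  return 0;
}
```

When saving, the application can emit extra_text verbatim inside the header record. The default callback does not receive an xref, so this sketch does not attempt to restore one.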
+ +
+

Other API functions
-

- Although the above describes the basic interface of libgedcom, there are -some other functions that allow to customize the behaviour of the library. - These will be explained in the current section.
- + + Although the above describes the basic interface of libgedcom, there +are some other functions that allow to customize the behaviour of the library. +  These will be explained in the current section.
+

Debugging

- The library can generate various debugging output, not only from itself, -but also the debugging output generated by the yacc parser.  By default, -no debugging output is generated, but this can be customized using the following -function:
- -
void gedcom_set_debug_level (int level, -FILE* trace_output)
-
- The level can be one of the following values:
- + The library can generate various debugging output, not only its own but also that of the yacc parser.  By default, no debugging output is generated, but this can be customized using the following function:
+ +
void gedcom_set_debug_level (int level, + FILE* trace_output)
+
+ The level can be one of the following values:
+ - If the trace_output is NULL, debugging information -will be written to stderr, otherwise the given file handle is -used (which must be open).
-
+ If trace_output is NULL, debugging information will be written to stderr; otherwise the given file handle (which must be open) is used.
+
+

Error treatment

- One of the previous sections already described the callback to be registered -to get error messages.  The library also allows to customize what happens -on an error, using the following function:
- -
void gedcom_set_error_handling (Gedcom_err_mech -mechanism)
-
- The mechanism can be one of:
- + One of the previous sections already described the callback to be registered to get error messages.  The library also allows you to customize what happens on an error, using the following function:
+ +
void gedcom_set_error_handling (Gedcom_err_mech + mechanism)
+
+ The mechanism can be one of:
+ + - This doesn't influence the generation of error or warning messages, only -the behaviour of the parser and its return code.
-
- + This doesn't influence the generation of error or warning messages, only + the behaviour of the parser and its return code.
+
+ +

Compatibility mode
-

- Applications are not necessarily true to the GEDCOM spec (or use a different -version than 5.5).  The intention is that the library is resilient to -this, and goes in compatibility mode for files written by specific programs -(detected via the HEAD.SOUR tag).  This compatibility mode can be enabled -and disabled via the following function:
- + + Applications are not necessarily true to the GEDCOM spec (or use a different + version than 5.5).  The intention is that the library is resilient +to this, and goes in compatibility mode for files written by specific programs + (detected via the HEAD.SOUR tag).  This compatibility mode can be +enabled and disabled via the following function:
+ +
void gedcom_set_compat_handling - (int enable_compat)
-
- The argument can be:
- + (int enable_compat)
+ + The argument can be:
+ + - Note that, currently, no actual compatibility code is present, but this -is on the to-do list.
- -
-
$Id$
$Name$
-
-                    
- - - + Note that, currently, no actual compatibility code is present, but this + is on the to-do list.
+ + +
+ +
$Id$
$Name$

+ +
                    
+ + + \ No newline at end of file