X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;f=doc%2Fusage.html;h=2843d7c528b8c8fa5765eab45601b3fc8407b5c7;hb=8c92a223c34fbd674f26520fb990c64a7b2f9147;hp=3d13ab0d8f93eedabbd68c4f3fdb7deb04af6e8c;hpb=a6f453d612fe285d2585b689e3e4d675de455510;p=gedcom-parse.git diff --git a/doc/usage.html b/doc/usage.html index 3d13ab0..2843d7c 100644 --- a/doc/usage.html +++ b/doc/usage.html @@ -2,398 +2,434 @@
libgedcom.so
), to be linked in the application
- programgedcom.h
), to be used in the sources
-of the application programlibgedcom.so
), to be linked in the application
+ programgedcom.h
), to be used in the sources
+ of the application programgedcom-tags.h
) that is also installed,
+but that is automatically included via gedcom.h
$PREFIX/share/gedcom-parse
- that contains some additional stuff, but which is not immediately important
- at first. I'll leave the description of the data directory for later.$PREFIX/share/gedcom-parse
+ that contains some additional stuff, but which is not immediately important
+ at first. I'll leave the description of the data directory for later.int result;
- ...
- result = gedcom_parse_file("myfamily.ged");
-
- Although this will not provide much information, one thing it does is parse
-the entire file and return the result. The function returns 0 on success
-and 1 on failure. No other information is available using this function
-only.- In the above piece of code,void my_message_handler (Gedcom_msg_type type, -char *msg)
- {
- ...
- }
- ...
- gedcom_set_message_handler(my_message_handler);
- ...
- result = gedcom_parse_file("myfamily.ged");
-
my_message_handler
is the callback
-that will be called for errors (type=ERROR
), warnings (
-type=WARNING
) and messages (type=MESSAGE
). The callback
-must have the signature as in the example. For errors, the
-msg
passed to the callback will have the format:- Note that the entire string will be properly internationalized, and encoded -in UTF-8 (see "Why UTF-8?" LINK TBD). Also, no newline -is appended, so that the application program can use it in any way it wants. - Warnings are similar, but use "Warning" instead of "Error". Messages -are plain text, without any prefix.Error on line
<lineno>: <actual_message>
+ char *msg)
+ {
+ ...
+ }
+ ...
+ gedcom_set_message_handler(my_message_handler);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
printf
- is used in the message handler.+ TheGedcom_ctxt my_header_start_cb (int level, -Gedcom_val xref, char *tag)
- {
- printf("The header starts\n");
- return (Gedcom_ctxt)1;
- }
-
- void my_header_end_cb (Gedcom_ctxt self)
- {
- printf("The header ends, context is %d\n", self); /* context -will print as "1" */
- }
-
- ...
- gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, -my_header_end_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");
+ In the above piece of code,my_message_handler
is the callback + that will be called for errors (type=ERROR
), warnings (+ type=WARNING
) and messages (type=MESSAGE
). The +callback must have the signature as in the example. For errors, the +msg
passed to the callback will have the format:
+ +- Using theError on line
<lineno>: <actual_message>
gedcom_subscribe_to_record
function, the application -requests to use the specified callbacks as start and end callback. The end -callback is optional: you can passNULL
if you are not interested -in the end callback. The identifiers to use as first argument to the -function (hereREC_HEAD
) are described in the -interface details.
+ Note that the entire string will be properly internationalized, and encoded + in UTF-8 (see "Why UTF-8?" LINK TBD). Also, no newline + is appended, so that the application program can use it in any way it wants. + Warnings are similar, but use "Warning" instead of "Error". Messages + are plain text, without any prefix.
- From the name of the function it becomes clear that this function is specific -to complete records. For the separate elements in records there is another -function, which we'll see shortly. Again, the callbacks need to have -the signatures as shown in the example.
+ With this in place, the resulting code will already show errors and warnings + produced by the parser, e.g. on the terminal if a simpleprintf
+ is used in the message handler.
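As an illustration of the above, here is a minimal sketch of such a message handler. It does not include gedcom.h: the `Msg_type` enum below is only a stand-in for `Gedcom_msg_type`, and the two counters are hypothetical additions for this example. Since the library appends no newline, the handler adds one itself.

```c
#include <stdio.h>

/* Stand-in for Gedcom_msg_type from gedcom.h (assumption: the real
   type provides ERROR, WARNING and MESSAGE, as described in the text). */
typedef enum { ERROR, WARNING, MESSAGE } Msg_type;

/* Hypothetical counters, so the application can report a summary later. */
int error_count = 0;
int warning_count = 0;

/* Same shape as the handler in the text: a type and a message string. */
void my_message_handler(Msg_type type, char *msg)
{
  if (type == ERROR)
    error_count++;
  else if (type == WARNING)
    warning_count++;

  /* The message arrives without a trailing newline, so add one here.
     Errors and warnings go to stderr, plain messages to stdout. */
  fprintf(type == MESSAGE ? stdout : stderr, "%s\n", msg);
}
```

Because the message text is internationalized, the sketch counts by `type` rather than trying to recognize the "Error" prefix in the string.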
+ +
+Data callback mechanism
+ The most important use of the parser is of course to get the data out +of the GEDCOM file. As already mentioned, the parser uses a callback +mechanism for that. In fact, the mechanism involves two levels.
- TheGedcom_ctxt
type that is used as a result of the start -callback and as an argument to the end callback is vital for passing context -necessary for the application. This type is meant to be opaque; in -fact, it's a void pointer, so you can pass anything via it. The important -thing to know is that the context that the application returns in the start -callback will be passed in the end callback as an argument, and as we will -see shortly, also to all the directly subordinate elements of the record.
+ The primary level is that each of the sections in a GEDCOM file is notified + to the application code via a "start element" callback and an "end element" + callback (much like in a SAX interface for XML), i.e. when a line containing + a certain tag is parsed, the "start element" callback is called for that +tag, and when all its subordinate lines with their tags have been processed, +the "end element" callback is called for the original tag. Since GEDCOM + is hierarchical, this results in properly nested calls to appropriate "start + element" and "end element" callbacks.
- The example passes a simple integer as context, but an application could -e.g. pass astruct
that will contain the information for the -header. In the end callback, the application could then e.g. do some -finalizing operations on thestruct
to put it in its database.
+ However, it would be typical for a genealogy program to support only a +subset of the GEDCOM standard, certainly a program that is still under development. + Moreover, under GEDCOM it is allowed for an application to define its + own tags, which will typically not be supported by another application. + Still, in that case, data preservation is important; it would hardly + be accepted that information that is not understood by a certain program +is just removed.
- (Note that theGedcom_val
type for thexref
argument -was not discussed, see further for this)
+ Therefore, the second level of callbacks involves a "default callback". + An application needs to subscribe to callbacks for tags it does support, + and needs to provide a "default callback" which will be called for tags it + doesn't support. The application can then choose to just store the +information that comes via the default callback in plain textual format.
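To make the two levels more concrete, the following is a toy model of the dispatching, not the library's implementation. Everything in it (`Ctxt`, `toy_parse`, `struct subscription`, the simplified callback signatures, the `trace` buffer) is made up for this sketch. It takes lines that are already split into level, tag and value, calls start and end callbacks for subscribed tags in properly nested order, and calls the default callback for all other tags.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 16
typedef void *Ctxt;  /* stand-in for Gedcom_ctxt */

struct line { int level; const char *tag; const char *value; };
typedef Ctxt (*start_cb)(Ctxt parent, int level, const char *tag, const char *value);
typedef void (*end_cb)(Ctxt parent, Ctxt self);
typedef void (*default_cb)(Ctxt parent, int level, const char *tag, const char *value);

/* start is assumed to be non-NULL; end may be NULL. */
struct subscription { const char *tag; start_cb start; end_cb end; };
struct frame { const struct subscription *sub; Ctxt parent; Ctxt self; };

char trace[256];  /* records the order of the callbacks */

/* Feed pre-split lines to the callbacks, keeping one frame per open level. */
void toy_parse(const struct line *lines, int nlines,
               const struct subscription *subs, int nsubs, default_cb dflt)
{
  struct frame stack[MAX_DEPTH];
  int depth = 0, i, s;

  for (i = 0; i <= nlines; i++) {
    /* A pseudo level of -1 after the last line closes everything. */
    int level = (i < nlines) ? lines[i].level : -1;
    const struct subscription *sub = NULL;
    Ctxt parent, self;

    if (i < nlines && (level < 0 || level > depth || level >= MAX_DEPTH))
      continue;  /* malformed nesting: simply skipped in this sketch */
    /* Close every open line that is not an ancestor of the new one. */
    while (depth > 0 && depth > level) {
      struct frame *f = &stack[--depth];
      if (f->sub != NULL && f->sub->end != NULL)
        f->sub->end(f->parent, f->self);
    }
    if (i == nlines)
      break;
    for (s = 0; s < nsubs; s++)
      if (strcmp(subs[s].tag, lines[i].tag) == 0)
        sub = &subs[s];

    parent = (depth > 0) ? stack[depth - 1].self : NULL;
    if (sub != NULL) {
      self = sub->start(parent, level, lines[i].tag, lines[i].value);
    } else {
      if (dflt != NULL)
        dflt(parent, level, lines[i].tag, lines[i].value);
      self = parent;  /* children still see the nearest supported ancestor */
    }
    stack[depth++] = (struct frame){ sub, parent, self };
  }
}

/* Append "<mark><tag>@<parent> " to the trace; "-" means no parent. */
static void note(char mark, const char *tag, Ctxt parent)
{
  size_t used = strlen(trace);
  snprintf(trace + used, sizeof trace - used, "%c%s@%s ", mark, tag,
           parent != NULL ? (const char *)parent : "-");
}

static Ctxt on_start(Ctxt parent, int level, const char *tag, const char *value)
{
  (void)level; (void)value;
  note('<', tag, parent);
  return (Ctxt)tag;  /* the context is simply the tag name here */
}

static void on_end(Ctxt parent, Ctxt self)
{
  note('>', (const char *)self, parent);
}

static void on_default(Ctxt parent, int level, const char *tag, const char *value)
{
  (void)level; (void)value;
  note('?', tag, parent);
}

/* Subscribe to HEAD and SOUR only; VERS, _XYZ and TRLR go to the default. */
const char *run_demo(void)
{
  static const struct line lines[] = {
    {0, "HEAD", ""}, {1, "SOUR", "MyProg"}, {2, "VERS", "1.0"},
    {1, "_XYZ", "custom"}, {0, "TRLR", ""}
  };
  static const struct subscription subs[] = {
    {"HEAD", on_start, on_end}, {"SOUR", on_start, on_end}
  };
  trace[0] = '\0';
  toy_parse(lines, 5, subs, 2, on_default);
  return trace;
}
```

In the trace, `<` marks a start callback, `>` an end callback and `?` a default callback, each followed by the tag and the parent context it received. Note how the unsupported `VERS` line receives the context of `SOUR`, its closest supported ancestor.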
- -Callbacks for elements
- We will now retrieve the SOUR field (the name of the program that wrote -the file) from the header:
- -- The subscription mechanism for elements is similar, only the signatures -of the callbacks differ. The signature for the start callback shows -that the context of the parent line (e.g. theGedcom_ctxt my_header_source_start_cb(Gedcom_ctxt -parent,
- - int - level,
- - char* - tag,
- - char* - raw_value,
- - Gedcom_val parsed_value)
- {
- char *source = GEDCOM_STRING(parsed_value);
- printf("This file was written by %s\n", source);
- return parent;
- }
+ After this introduction, let's see what the API looks like...
+
+ +Start and end callbacks
+ +Callbacks for records
+ As a simple example, we will get some information from the header of a +GEDCOM file. First, have a look at the following piece of code:
+
+ +Gedcom_ctxt my_header_start_cb (int level, + Gedcom_val xref, char *tag, int parsed_tag)
+ {
+ printf("The header starts\n");
+ return (Gedcom_ctxt)1;
+ }
- void my_header_source_end_cb(Gedcom_ctxt parent,
- - Gedcom_ctxt self,
- - Gedcom_val parsed_value)
- {
- printf("End of the source description\n");
- }
+ void my_header_end_cb (Gedcom_ctxt self)
+ {
+ printf("The header ends, context is %ld\n", (long)self); /* context + will print as "1" */
+ }
- ...
- gedcom_subscribe_to_element(ELT_HEAD_SOUR,
- - my_header_source_start_cb,
- - my_header_source_end_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");
+ ...
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, + my_header_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
struct
that describes -the header) is passed to this start callback. The callback itself returns -here the same context, but this can be its own context object of course. The -end callback is called with both the context of the parent and the context -of itself, which will be the same in the example. Again, the list of -identifiers to use as a first argument for the subscription function are -detailed in the interface details -.
+ Using thegedcom_subscribe_to_record
function, the application + requests to use the specified callbacks as start and end callback. The end + callback is optional: you can passNULL
if you are not interested + in the end callback. The identifiers to use as first argument to the + function (hereREC_HEAD
) are described in the + interface details.
- If we look at the other arguments of the start callback, we see the level -number (the initial number of the line in the GEDCOM file), the tag (e.g. -"SOUR"), and then a raw value and a parsed value. The raw value is just -the raw string that occurs as value on the line next to the tag (in UTF-8 -encoding). The parsed value is the meaningful value that is parsed from -that raw string.
+ From the name of the function it becomes clear that this function is specific + to complete records. For the separate elements in records there is +another function, which we'll see shortly. Again, the callbacks need +to have the signatures as shown in the example.
- TheGedcom_val
type is meant to be an opaque type. The -only thing that needs to be known about it is that it can contain specific -data types, which have to be retrieved from it using pre-defined macros. These -data types are described in the -interface details.
-
- Some extra notes:
- + TheGedcom_ctxt
type that is used as a result of the start + callback and as an argument to the end callback is vital for passing context + necessary for the application. This type is meant to be opaque; in +fact, it's a void pointer, so you can pass anything via it. The important + thing to know is that the context that the application returns in the start + callback will be passed in the end callback as an argument, and as we will + see shortly, also to all the directly subordinate elements of the record.
+
+Thetag
is the GEDCOM tag in string format, theparsed_tag
+ is an integer, for which symbolic values are defined asTAG_HEAD,
+TAG_SOUR,
TAG_DATA,
... andUSERTAG
+for the application-specific tags. These values are defined in the +header
gedcom-tags.h
that is installed, and included via+gedcom.h
(so no need to includegedcom-tags.h
yourself).
+
+ The example passes a simple integer as context, but an application could + e.g. pass astruct
that will contain the information for the + header. In the end callback, the application could then e.g. do some + finalizing operations on thestruct
to put it in its database.
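A possible shape for this, as a sketch only: the two typedefs are stand-ins so that the example compiles without gedcom.h, and the `struct header` layout with its `complete` flag is a hypothetical choice for this example. The callbacks use the signatures shown above.

```c
#include <stdlib.h>

/* Stand-ins so the sketch compiles on its own; real code would
   include gedcom.h instead (assumption: both behave like pointers). */
typedef void *Gedcom_ctxt;
typedef void *Gedcom_val;

/* Hypothetical application-side representation of the header. */
struct header {
  char source[64];
  int complete;     /* set once the whole header has been seen */
};

Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char *tag,
                               int parsed_tag)
{
  /* calloc zeroes the struct, so "complete" starts out as 0. */
  struct header *head = calloc(1, sizeof *head);
  (void)level; (void)xref; (void)tag; (void)parsed_tag;
  return (Gedcom_ctxt)head;  /* NULL if the allocation failed */
}

void my_header_end_cb(Gedcom_ctxt self)
{
  struct header *head = (struct header *)self;
  if (head != NULL)
    head->complete = 1;  /* a real program might store it in its database */
}
```

The struct stays owned by the application, which has to free it once it is no longer needed.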
+
+ (Note that theGedcom_val
type for thexref
+argument was not discussed, see further for this)
+
+ +Callbacks for elements
+ We will now retrieve the SOUR field (the name of the program that wrote + the file) from the header:
+ ++ The subscription mechanism for elements is similar, only the signatures + of the callbacks differ. The signature for the start callback shows + that the context of the parent line (e.g. theGedcom_ctxt my_header_source_start_cb(Gedcom_ctxt + parent,
+ + int + level,
+ + char* + tag,
+ + char* + raw_value,
+ + int + parsed_tag,
+ + Gedcom_val + parsed_value)
+ {
+ char *source = GEDCOM_STRING(parsed_value);
+ printf("This file was written by %s\n", source);
+ return parent;
+ }
+
+ void my_header_source_end_cb(Gedcom_ctxt parent,
+ + Gedcom_ctxt self,
+ + Gedcom_val parsed_value)
+ {
+ printf("End of the source description\n");
+ }
+
+ ...
+ gedcom_subscribe_to_element(ELT_HEAD_SOUR,
+ + my_header_source_start_cb,
+ + my_header_source_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+struct
that describes + the header) is passed to this start callback. The callback itself returns + here the same context, but this can be its own context object of course. + The end callback is called with both the context of the parent and +the context of itself, which will be the same in the example. Again, +the list of identifiers to use as a first argument for the subscription function +are detailed in the interface +details .
+
+ If we look at the other arguments of the start callback, we see the level + number (the initial number of the line in the GEDCOM file), the tag (e.g. + "SOUR"), and then a raw value, a parsed tag and a parsed value. The +raw value is just the raw string that occurs as value on the line next to +the tag (in UTF-8 encoding). The parsed value is the meaningful value +that is parsed from that raw string. The parsed tag is described in +the section for record callbacks.
+
+ TheGedcom_val
type is meant to be an opaque type. The + only thing that needs to be known about it is that it can contain specific + data types, which have to be retrieved from it using pre-defined macros. + These data types are described in the + interface details.
+
+ Some extra notes:
+-
- +- The
-Gedcom_val
argument of the end callback -is currently not used. It is there for future enhancements.- There is also a
- +Gedcom_val
argument in the -start callback for records. This argument is currently a string value -giving the pointer in string form.- The
+Gedcom_val
argument of the end callback + is currently not used. It is there for future enhancements.- There is also a
+Gedcom_val
argument in the + start callback for records. This argument is currently a string value + giving the pointer in string form.Default callbacks
- As described above, an application doesn't always implement the entire GEDCOM -spec, and application-specific tags may have been added by other applications. - To preserve this extra data anyway, a default callback can be registered -by the application, as in the following example:
-
+ + As described above, an application doesn't always implement the entire +GEDCOM spec, and application-specific tags may have been added by other applications. + To preserve this extra data anyway, a default callback can be registered + by the application, as in the following example:
+- This callback has a similar signature as the previous ones, but -it doesn't contain a parsed value. However, it does contain the parent -context, that was returned by the application for the most specific containing -tag that the application supported.void my_default_cb (Gedcom_ctxt parent, -int level, char* tag, char* raw_value)
-{
- ...
-}
-
-...
- gedcom_set_default_callback(my_default_cb);
-...
-result = gedcom_parse_file("myfamily.ged");
-
-
-Suppose e.g. that this callback is called for some tags in the header that -are specific to some other application, then our application could make sure -that the parent context contains the struct or object that represents the -header, and use the default callback here to add the level, tag and raw_value -as plain text in a member of that struct or object, thus preserving the information. - The application can then write this out when the data is saved again -in a GEDCOM file. To make it more specific, consider the following -example:
--Note that the default callback will be called for any tag that isn't specifically -subscribed upon by the application, and can thus be called in various contexts. - For simplicity, the example above doesn't take this into account (the -struct header {
- char* source;
- ...
- char* extra_text;
-};
-
-Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char* tag)
-{
- struct header head = my_make_header_struct();
- return (Gedcom_ctxt)head;
-}
-
-void my_default_cb(Gedcom_ctxt parent, int level, char* tag, char* raw_value)
-{
- struct header head = (struct header)parent;
- my_header_add_to_extra_text(head, level, tag, raw_value);
-}
+ int level, char* tag, char* raw_value, int parsed_tag)
+ {
+ ...
+ }
-gedcom_set_default_callback(my_default_cb);
-gedcom_subscribe_to_record(REC_HEAD, my_header_start, NULL);
-...
-result = gedcom_parse_file(filename);
+ ...
+ gedcom_set_default_callback(my_default_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
parent
could be of different types, depending + This callback has a similar signature as the previous ones, +but it doesn't contain a parsed value. However, it does contain the +parent context, that was returned by the application for the most specific +containing tag that the application supported.
+
+ Suppose e.g. that this callback is called for some tags in the header that + are specific to some other application, then our application could make +sure that the parent context contains the struct or object that represents +the header, and use the default callback here to add the level, tag and +raw_value as plain text in a member of that struct or object, thus preserving +the information. The application can then write this out when the +data is saved again in a GEDCOM file. To make it more specific, consider +the following example:
+ ++ Note that the default callback will be called for any tag that isn't specifically + subscribed upon by the application, and can thus be called in various contexts. + For simplicity, the example above doesn't take this into account (the +struct header {
+ char* source;
+ ...
+ char* extra_text;
+ };
+
+ Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char* tag, int +parsed_tag)
+ {
+ struct header* head = my_make_header_struct();
+ return (Gedcom_ctxt)head;
+ }
+
+ void my_default_cb(Gedcom_ctxt parent, int level, char* tag, char* raw_value, +int parsed_tag)
+ {
+ struct header* head = (struct header*)parent;
+ my_header_add_to_extra_text(head, level, tag, raw_value);
+ }
+
+ gedcom_set_default_callback(my_default_cb);
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, NULL);
+ ...
+ result = gedcom_parse_file(filename);
+parent
could be of different types, depending on the context).
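The helper `my_header_add_to_extra_text` used in the example is not part of the library. One way to write it is sketched below, assuming that it takes a pointer to the header struct, that unsupported lines are kept in GEDCOM line form ("level tag value"), and that a return value reports allocation failures. All of these are choices made for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct header {
  char *source;
  char *extra_text;  /* unsupported lines, kept as plain GEDCOM text */
};

/* Hypothetical helper: append "<level> <tag>[ <raw_value>]\n" to
   head->extra_text. Returns 0 on success, 1 on failure, in which case
   the existing text is left untouched. */
int my_header_add_to_extra_text(struct header *head, int level,
                                const char *tag, const char *raw_value)
{
  size_t old_len = head->extra_text ? strlen(head->extra_text) : 0;
  int has_value = raw_value != NULL && raw_value[0] != '\0';
  /* snprintf with a NULL buffer only computes the length needed. */
  int add_len = has_value
    ? snprintf(NULL, 0, "%d %s %s\n", level, tag, raw_value)
    : snprintf(NULL, 0, "%d %s\n", level, tag);
  char *grown;

  if (add_len < 0)
    return 1;
  grown = realloc(head->extra_text, old_len + (size_t)add_len + 1);
  if (grown == NULL)
    return 1;

  if (has_value)
    snprintf(grown + old_len, (size_t)add_len + 1, "%d %s %s\n",
             level, tag, raw_value);
  else
    snprintf(grown + old_len, (size_t)add_len + 1, "%d %s\n", level, tag);
  head->extra_text = grown;
  return 0;
}
```

Keeping the lines in GEDCOM form means the saved text can be written back unchanged when the header is written out again.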
-
+ +
Other API functions
-Although the above describes the basic interface of libgedcom, there are -some other functions that allow to customize the behaviour of the library. - These will be explained in the current section.
-
+ + Although the above describes the basic interface of libgedcom, there are + some other functions that allow you to customize the behaviour of the library. + These will be explained in the current section.
+Debugging
-The library can generate various debugging output, not only from itself, -but also the debugging output generated by the yacc parser. By default, -no debugging output is generated, but this can be customized using the following -function:
+ The library can generate various debugging output, not only from itself, + but also the debugging output generated by the yacc parser. By default, + no debugging output is generated, but this can be customized using the following + function:
+-Thevoid gedcom_set_debug_level (int level, -FILE* trace_output)
-level
can be one of the following values:
+ FILE* trace_output)
+
level
can be one of the following values:trace_output
is NULL
, debugging information
-will be written to stderr
, otherwise the given file handle is
-used (which must be open).trace_output
is NULL
, debugging information
+ will be written to stderr
, otherwise the given file handle
+is used (which must be open).-Thevoid gedcom_set_error_handling (Gedcom_err_mech -mechanism)
-
mechanism
can be one of:mechanism
can be one of:IMMED_FAIL
: immediately fail the parsing
-on an error (this is the default)DEFER_FAIL
: continue parsing after an
-error, but return a failure code eventuallyIGNORE_ERRORS
: continue parsing after
-an error, return success alwaysIMMED_FAIL
: immediately fail the parsing
+ on an error (this is the default)DEFER_FAIL
: continue parsing after
+an error, but return a failure code eventuallyIGNORE_ERRORS
: continue parsing after
+ an error, return success always-The argument can be:void gedcom_set_compat_handling - (int enable_compat)
-
$Id$+ +
$Name$
+