X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;ds=inline;f=doc%2Fdoxygen%2Findex.doc;fp=doc%2Fdoxygen%2Findex.doc;h=9e4dd5d81136a57ddd143f676671814b0ace6246;hb=4c2f5e310ee8e58010d068cf004ec4533418e1d3;hp=ebade0bb19568cefe31df1b1a712a8caf592fdf7;hpb=78db2a84062f4088b1fce98cb2886d444ea8cfac;p=gedcom-parse.git diff --git a/doc/doxygen/index.doc b/doc/doxygen/index.doc index ebade0b..9e4dd5d 100644 --- a/doc/doxygen/index.doc +++ b/doc/doxygen/index.doc @@ -34,6 +34,8 @@ - \ref callback - \ref gom + \section libraries_headers Libraries and headers + The Gedcom Parser Library provides two interfaces. On the one hand, it can be used as a callback-based parser (comparable to the SAX interface of XML); on the other hand, the parser can be used to convert the GEDCOM file into an @@ -56,13 +58,24 @@ program There is a separate script and an M4 macro (for autoconf) to help with - library and compilation flags, see the development support (TODO: REFERENCE!) + library and compilation flags, see the \ref devel "development support". + + \section utf8 Converting character sets + + All strings passed by the GEDCOM parser to the application are in UTF-8 + encoding. Typically, an application needs to convert this to something + else to be able to display it. + + The most common case is that the output character set is controlled by the + locale mechanism (i.e. via the LANG, LC_ALL or LC_CTYPE environment + variables), which also controls the gettext mechanism in the application. + + With gedcom-parse comes a library implementing help functions for UTF-8 + encoding (see the documentation for this library). */ /*! \defgroup callback Callback Interface */ -/*! \defgroup gom Gedcom Object Model in C */ - /*! \defgroup main Main functions of the parser \ingroup callback @@ -97,7 +110,8 @@ GEDCOM file, which will be called by the library on errors, warnings and messages. 
- A typical piece of code would be: + A typical piece of code would be (gom_parse_file() would be called in case + the C object model is used): \code void my_message_handler(Gedcom_msg_type type, char* msg) @@ -117,10 +131,360 @@ /*! \defgroup cb_mech Data callback mechanism \ingroup callback + + The most important use of the parser is of course to get the data out of + the GEDCOM file. This section focuses on the callback mechanism (see + \ref gom "here" for the C object model). In fact, the mechanism involves + two levels. + + The primary level is that each of the sections in a GEDCOM file is notified + to the application code via a "start element" callback and an "end element" + callback (much like in a SAX interface for XML), i.e. when a line containing + a certain tag is parsed, the "start element" callback is called for that tag + , and when all its subordinate lines with their tags have been processed, + the "end element" callback is called for the original tag. Since GEDCOM is + hierarchical, this results in properly nested calls to appropriate "start + element" and "end element" callbacks (note: see + \ref compat "compatibility handling"). + + However, it would be typical for a genealogy program to support only a + subset of the GEDCOM standard, certainly a program that is still under + development. Moreover, under GEDCOM it is allowed for an application to + define its own tags, which will typically not be supported by another + application. Still, in that case, data preservation is important; it would + hardly be accepted that information that is not understood by a certain + program is just removed. + + Therefore, the second level of callbacks involves a "default callback". An + application needs to subscribe to callbacks for tags it does support, and + need to provide a "default callback" which will be called for tags it + doesn't support. 
The application can then choose to just store the + information that comes via the default callback in plain textual format. +*/ + +/*! \defgroup start_end Start and end callbacks + \ingroup cb_mech + + The following simple example gets some information from the header record + of a GEDCOM file. + + \code + Gedcom_ctxt my_header_start_cb (Gedcom_rec rec, + int level, + Gedcom_val xref, + char *tag, + char *raw_value, + int parsed_tag, + Gedcom_val parsed_value) + { + printf("The header starts\n"); + return (Gedcom_ctxt)1; + } + + void my_header_end_cb (Gedcom_rec rec, Gedcom_ctxt self) + { + printf("The header ends, context is %d\n", (int)self); + } + + ... + gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, my_header_end_cb); + ... + result = gedcom_parse_file("myfamily.ged"); + \endcode + + Using the gedcom_subscribe_to_record() function, the application requests + to use the specified callbacks as start and end callback (type + \ref Gedcom_rec_start_cb and \ref Gedcom_rec_end_cb). + + Such a callback + can return a context value of type \ref Gedcom_ctxt. This type is meant to + be opaque; in fact, it's a void pointer, so you can pass anything via it. + This context value will be passed to the callbacks of the direct + child elements, and to the end callback. + + The example passes a simple integer as context, but an application could e.g. + pass a \c struct (or an object in a C++ application) that will contain the + information for the record. In the end callback, the application could then + e.g. do some finalizing operations on the \c struct to put it in its + database. + + From the name of the function it becomes clear that this function is + specific to complete records. For the separate elements in records there + is another function, which we'll see shortly. Note that the callbacks need + to have the signatures as shown in the example. 
+ + We will now retrieve the SOUR field (the name of the program that wrote the + file) from the header: + \code + Gedcom_ctxt my_header_source_start_cb(Gedcom_elt elt, + Gedcom_ctxt parent, + int level, + char* tag, + char* raw_value, + int parsed_tag, + Gedcom_val parsed_value) + { + char *source = GEDCOM_STRING(parsed_value); + printf("This file was written by %s\n", source); + return parent; + } + + ... + gedcom_subscribe_to_element(ELT_HEAD_SOUR, + my_header_source_start_cb, + NULL); + ... + result = gedcom_parse_file("myfamily.ged"); + \endcode + + The subscription mechanism for elements is similar, only the signatures of + the callbacks differ. The signature for the start callback shows that the + context of the parent line (here e.g. the \c struct that describes the + header) is passed to this start callback. + + The callback itself returns here in this example the same context, but this + can be its own context object of course. The end callback is called with + both the context of the parent and the context of itself, which in this + example will be the same. +*/ + +/*! \defgroup defcb Default callbacks + \ingroup cb_mech + + An application doesn't always implement the entire GEDCOM spec, and + application-specific tags may have been added by other applications. To + preserve this extra data anyway, a default callback can be registered by + the application, as in the following example: + + \code + void my_default_cb (Gedcom_elt elt, Gedcom_ctxt parent, int level, + char* tag, char* raw_value, int parsed_tag) + { + ... + } + + ... + gedcom_set_default_callback(my_default_cb); + ... + result = gedcom_parse_file("myfamily.ged"); + \endcode + + This callback has a similar signature as the previous ones, but it doesn't + contain a parsed value. However, it does contain the parent context, that + was returned by the application for the most specific containing tag that + the application supported. + + Suppose e.g. 
that this callback is called for some tags in the header that
+ are specific to some other application, then our application could make
+ sure that the parent context contains the struct or object that represents
+ the header, and use the default callback here to add the level, tag and
+ raw_value as plain text in a member of that struct or object, thus
+ preserving the information.
+
+ The application can then write this out when the data is saved again in a
+ GEDCOM file. To make it more specific, consider the following example:
+
+ \code
+ struct header {
+   char* source;
+   ...
+   char* extra_text;
+ };
+
+ Gedcom_ctxt my_header_start_cb(Gedcom_rec rec, int level, Gedcom_val xref,
+                                char* tag, char *raw_value,
+                                int parsed_tag, Gedcom_val parsed_value)
+ {
+   /* The context is a void pointer, so pass a pointer to the struct */
+   struct header *head = my_make_header_struct();
+   return (Gedcom_ctxt)head;
+ }
+
+ void my_default_cb(Gedcom_elt elt, Gedcom_ctxt parent, int level,
+                    char* tag, char* raw_value, int parsed_tag)
+ {
+   /* The parent context is the pointer returned by my_header_start_cb */
+   struct header *head = (struct header *)parent;
+   my_header_add_to_extra_text(head, level, tag, raw_value);
+ }
+
+ gedcom_set_default_callback(my_default_cb);
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, NULL);
+ ...
+ result = gedcom_parse_file(filename);
+ \endcode
+
+ Note that the default callback will be called for any tag that the
+ application hasn't specifically subscribed to, and can thus be called in
+ various contexts. For simplicity, the example above doesn't take this into
+ account (the parent could be of different types, depending on the context).
+
+ Note also that the default callback is not called when the parent context is
+ \c NULL. This is e.g. the case if none of the "upper" tags has been
+ subscribed to.
+*/
+
+/*! \defgroup parsed Parsed values
+ \ingroup callback
+
+ The \c Gedcom_val type is meant to be an opaque type. The only thing that
+ needs to be known about it is that it can contain specific data types, which
+ have to be retrieved from it using pre-defined macros.
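+
+ Currently, the specific \c Gedcom_val types are (with \c val of type
+ \c Gedcom_val):
+
+ <table>
+ <tr><th></th><th>type checker</th><th>cast function</th></tr>
+ <tr><td>null value</td>
+     <td><code>GEDCOM_IS_NULL(val)</code></td>
+     <td>N/A</td></tr>
+ <tr><td>string</td>
+     <td><code>GEDCOM_IS_STRING(val)</code></td>
+     <td><code>char* str = GEDCOM_STRING(val);</code></td></tr>
+ <tr><td>date</td>
+     <td><code>GEDCOM_IS_DATE(val)</code></td>
+     <td><code>struct date_value dv = GEDCOM_DATE(val);</code></td></tr>
+ <tr><td>age</td>
+     <td><code>GEDCOM_IS_AGE(val)</code></td>
+     <td><code>struct age_value age = GEDCOM_AGE(val);</code></td></tr>
+ <tr><td>xref pointer</td>
+     <td><code>GEDCOM_IS_XREF_PTR(val)</code></td>
+     <td><code>struct xref_value *xr = GEDCOM_XREF_PTR(val);</code></td></tr>
+ </table>
+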
"@This is an xref_val@"
.
+*/
+
+/*! \defgroup compat Compatibility mode
+ \ingroup callback
+
+ Applications are not necessarily true to the GEDCOM spec (or use a different
+ version than 5.5). The intention is that the library is resilient to this,
+ and switches to compatibility mode for files written by specific programs
+ (detected via the \c HEAD.SOUR tag).
+
+ Currently, there is (some) compatibility for:
+ - ftree
+ - Lifelines (3.0.2)
+ - Personal Ancestral File (PAF), versions 2, 4 and 5
+ - Family Origins
+ - EasyTree
*/
/*! \defgroup write Support for writing GEDCOM files
\ingroup callback
+
+ The Gedcom parser library also contains functions for writing GEDCOM files.
+ As with parsing, there are two interfaces: a very basic interface, which
+ requires you to call a function for each line in the GEDCOM file, and an
+ interface which dumps the Gedcom object model to a file in one shot (if you
+ use the Gedcom object model).
+
+ Again, this section focuses on the basic interface; the Gedcom object model
+ interface is described \ref gom "here".
+
+ Writing a GEDCOM file involves the following steps:
+
+ - first set the encoding options as you want them using
+ gedcom_write_set_encoding() and gedcom_write_set_line_terminator()\n\n
+ By default a file is written in the same encoding as the last read file
+ was in, and the terminator is set to the appropriate one on the current
+ platform.
+
+ - open the file using gedcom_write_open()
+
+ - write the data using gedcom_write_record_str(), ...\n\n
+ The principle is that every line in the GEDCOM file to write corresponds
+ to a call of one of these functions, except that \c CONT/CONC lines can
+ be automatically taken care of.\n\n
+ Note that the resulting GEDCOM file should conform to the GEDCOM standard.
+ Several checks are built in already, and more will follow, to enforce this.
+ There is no compatibility mode for writing GEDCOM files (and probably never
+ will be).\n\n
+ All these functions expect their input in UTF-8 encoding. If this is
+ not the case, errors will be returned. Note that for examples of using
+ these functions, you can look at the sources of the Gedcom object model
+ (e.g. the function \c write_header in \c gom/header.c).
+
+ - close the file using gedcom_write_close()
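+
+ To illustrate what "taking care of" \c CONT/CONC lines means, here is a
+ minimal, self-contained sketch. It is not the library's implementation and
+ does not call the library: \c write_value_lines and \c max_chunk are
+ hypothetical names. It assumes the usual GEDCOM convention that \c CONT
+ continues a value after a line break and \c CONC continues it without one,
+ one level below the original line. For simplicity it ignores the advice not
+ to split a value next to a space.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT part of the Gedcom parser library.
   Writes "level tag value" to out, continuing with CONT lines (after a
   newline in value) and CONC lines (after max_chunk bytes) at level + 1.
   Returns the number of lines written. */
int write_value_lines(FILE *out, int level, const char *tag,
                      const char *value, size_t max_chunk)
{
    const char *cur_tag = tag;
    int cur_level = level;
    int lines = 0;

    if (max_chunk < 4)
        max_chunk = 4;  /* room for one complete UTF-8 sequence */

    for (;;) {
        size_t seg = strcspn(value, "\n");  /* bytes up to the next newline */
        size_t n = seg < max_chunk ? seg : max_chunk;
        size_t k = n;

        /* Avoid cutting inside a UTF-8 sequence: back up over
           continuation bytes (10xxxxxx) at the cut position. */
        while (k < seg && k > 0 && ((unsigned char)value[k] & 0xC0) == 0x80)
            k--;
        if (k > 0)
            n = k;

        if (n > 0)
            fprintf(out, "%d %s %.*s\n", cur_level, cur_tag, (int)n, value);
        else
            fprintf(out, "%d %s\n", cur_level, cur_tag);
        lines++;
        value += n;

        if (*value == '\0')
            break;
        if (*value == '\n') {
            value++;            /* explicit line break: continue with CONT */
            cur_tag = "CONT";
        } else {
            cur_tag = "CONC";   /* chunk limit reached: continue with CONC */
        }
        cur_level = level + 1;
    }
    return lines;
}
```

+ For example, calling this sketch with level 1, tag \c NOTE, the value
+ "abcdefgh\nxy" and a chunk size of 5 writes the three lines
+ "1 NOTE abcde", "2 CONC fgh" and "2 CONT xy".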
+*/
+
+/*! \defgroup debug Debugging
+ \ingroup callback
+
+ The library can generate various kinds of debugging output: not only its own,
+ but also the debugging output generated by the yacc parser. By default, no
+ debugging output is generated, but this can be changed.
*/
/*! \defgroup gommain Main functions of the object model
@@ -139,3 +503,78 @@
result = gom_parse_file("myfamily.ged");
\endcode
*/
+
+/*! \defgroup gom Gedcom Object Model in C */
+
+/*! \defgroup devel Development support
+ \section configure Macro for configure.in
+
+ There is a macro available for use in configure.in for applications that
+ are using autoconf to configure their sources. The following macro checks
+ whether the Gedcom parser library is available and whether its version is
+ high enough:
+ \code
+ AM_PATH_GEDCOM_PARSER([min_version,[action_if_found,[action_if_not_found,[modules]]]])
+ \endcode
+
+ All the arguments are optional and default to 0. E.g. to check for version
+ 1.34.2, you would put in configure.in the following statement:
+ \code
+ AM_PATH_GEDCOM_PARSER(1.34.2)
+ \endcode
+
+ Note that version numbers now contain three parts (since version 0.20.0,
+ which is also the first version in which this macro is available).
+
+ The macro also sets the variables GEDCOM_CFLAGS and GEDCOM_LIBS for use in
+ Makefiles. Typically, this would be done as follows in a Makefile.am:
+ \code
+ bin_PROGRAMS = myprg
+ myprg_SOURCES = myprg.c foo.c bar.c
+ INCLUDES = @GEDCOM_CFLAGS@
+ LDADD = @GEDCOM_LIBS@
+ \endcode
+
+ If your program uses some extra modules, they can be passed as the fourth
+ argument in the macro, so that the CFLAGS and LIBS are correctly filled in.
+ Currently, the only available module is gom (the Gedcom object model). For
+ example:
+ \code
+ AM_PATH_GEDCOM_PARSER(0.21.2, , ,gom)
+ \endcode
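+
+ The action arguments can be used to stop configuration when the library is
+ missing. The following fragment is only a sketch: the version number and the
+ message text are illustrative, and \c AC_MSG_ERROR is the standard autoconf
+ macro for aborting with an error.

```
AM_PATH_GEDCOM_PARSER(0.21.2, ,
  [AC_MSG_ERROR([Gedcom parser library 0.21.2 or newer (with gom) not found])],
  gom)
```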
+
+ To be able to use this macro in the sources of your application, you have
+ three options:
+
+ - Put the file \c m4/gedcom.m4 in your autoconf data directory (i.e. the
+ path given by 'aclocal --print-ac-dir', usually /usr/share/aclocal). You
+ can do this automatically by going into the m4 subdirectory and typing
+ 'make install-m4'.
+
+ - If you're using autoconf, but not automake, copy the contents of
+ \c m4/gedcom.m4 in the \c aclocal.m4 file in your sources.
+
+ - If you're using automake, copy the contents of \c m4/gedcom.m4 in the
+ \c acinclude.m4 file in your sources.
+
+ \section flags Compilation and linking flags
+
+ Like other libraries, the Gedcom parser library installs a script
+ \c gedcom-config to help with compilation and linking flags for programs
+ that don't use autoconf/automake.
+
+ To get compilation flags for your program, use (depending on whether you
+ only use the callback parser, or also the GEDCOM object model):
+ \code
+ gedcom-config --cflags
+ gedcom-config --cflags gom
+ \endcode
+
+ Similarly, to get linking flags, use one of the following:
+ \code
+ gedcom-config --libs
+ gedcom-config --libs gom
+ \endcode
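+
+ In a hand-written Makefile, these calls could be combined as in the following
+ sketch. It assumes GNU make (for the \c $(shell ...) function and the
+ built-in link rule) and that \c gedcom-config is on the PATH; the program and
+ object names are only illustrative.

```
# Flags for a program that also uses the Gedcom object model
CFLAGS += $(shell gedcom-config --cflags gom)
LDLIBS += $(shell gedcom-config --libs gom)

myprg: myprg.o foo.o bar.o
```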
+ */
+