/* Copyright (C) 2001,2002,2003 The Genes Development Team
   This file is part of the Gedcom parser library.
   Contributed by Peter Verthez <Peter.Verthez@advalvas.be>, 2003.

   The Gedcom parser library is free software; you can redistribute it
   and/or modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The Gedcom parser library is distributed in the hope that it will be
   useful, but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the Gedcom parser library; if not, write to the
   Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
*/
/*! \mainpage The Gedcom Parser Library

The Gedcom Parser Library is a C library that provides an API that lets
applications parse and process arbitrary genealogy files in the standard
GEDCOM format. It targets
<a href="http://www.gendex.com/gedcom55/55gctoc.htm">release 5.5</a> of the
GEDCOM standard.

The following links provide a manual to the Gedcom parser library:

\section libraries_headers Libraries and headers

The Gedcom Parser Library provides two interfaces. On the one hand, it can
be used as a callback-based parser (comparable to the SAX interface of XML);
on the other hand, the parser can be used to convert the GEDCOM file into an
object model (comparable to the DOM interface of XML). It comes with:

- a library (\c libgedcom.so), to be linked in the application program,
  which implements the callback parser
- a header file (\c gedcom.h), to be used in the sources of the application
  program
- a header file (\c gedcom-tags.h) that is also installed, but that is
  automatically included via \c gedcom.h

Additionally, if you want to use the C object model, the following should be
used (note that \c libgedcom.so is also needed in this case, because the
object model uses the callback parser internally):

- a library (\c libgedcom_gom.so), to be linked in the application program,
  which implements the C object model
- a header file (\c gom.h), to be used in the sources of the application
  program

There is a separate script and an M4 macro (for autoconf) to help with
library and compilation flags; see the \ref devel "development support".

\section utf8 Converting character sets

All strings passed by the GEDCOM parser to the application are in UTF-8
encoding. Typically, an application needs to convert this to something
else to be able to display it.

The most common case is that the output character set is controlled by the
locale mechanism (i.e. via the \c LANG, \c LC_ALL or \c LC_CTYPE environment
variables), which also controls the gettext mechanism in the application.

The gedcom-parse package comes with a library of helper functions for UTF-8
encoding (see the <a href="utf8tools.html">documentation</a> for this library).
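
As a rough illustration of what such a conversion involves, the sketch below
uses the POSIX \c iconv interface directly. It is not the API of the helper
library mentioned above, which has its own documentation. The function name
\c utf8_to_charset and its buffer-based interface are assumptions made for
this example only; an application would normally take the target character
set from the locale, for instance via \c nl_langinfo(CODESET).

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not part of the Gedcom parser library): converts the
// NUL-terminated UTF-8 string 'in' to the character set 'to_charset' and
// stores the NUL-terminated result in 'out' (capacity 'out_size' bytes).
// Returns 0 on success, -1 on failure (unknown character set, conversion
// error reported by iconv(), or output buffer too small).
int utf8_to_charset(const char *in, const char *to_charset,
                    char *out, size_t out_size)
{
  iconv_t cd;
  char *in_ptr = (char *)in;    // iconv() takes a non-const pointer
  char *out_ptr = out;
  size_t in_left = strlen(in);
  size_t out_left;
  size_t rc;

  if (out_size == 0)
    return -1;
  out_left = out_size - 1;      // keep room for the terminating NUL

  cd = iconv_open(to_charset, "UTF-8");
  if (cd == (iconv_t)-1)
    return -1;

  rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
  if (rc != (size_t)-1) {
    // Flush any pending shift sequence (needed for stateful encodings)
    rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);
  }
  iconv_close(cd);
  if (rc == (size_t)-1)
    return -1;

  *out_ptr = '\0';
  return 0;
}
```

A start callback could call such a function on \c raw_value before showing it
to the user. How characters without an equivalent in the target character set
are treated depends on the \c iconv implementation.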
*/

/*! \defgroup callback Callback Interface */

/*! \defgroup main Main functions of the parser

The simplest use of the Gedcom callback parser is the following piece of
code (inclusion of the \c gedcom.h header is assumed, as everywhere in this
manual):

\code
int result;
result = gedcom_parse_file("myfamily.ged");
\endcode

Although this does not provide much information, it does parse the entire
file and return the result.

The next sections refine this piece of code to obtain meaningful errors and
the actual data that is in the file.
*/

/*! \defgroup error Error handling

The library can be used in several different circumstances, both
terminal-based and GUI-based. Therefore, it leaves the actual display of the
error message up to the application.

For this, the application needs to register a callback before parsing the
GEDCOM file, which will be called by the library on errors, warnings and
messages.

A typical piece of code would be (gom_parse_file() would be called in case
the C object model is used):

\code
void my_message_handler(Gedcom_msg_type type, char* msg)
{
  ...
}

gedcom_set_message_handler(my_message_handler);
result = gedcom_parse_file("myfamily.ged");
\endcode

With this in place, the resulting code will show errors and warnings produced
by the parser, e.g. on the terminal if a simple \c printf is used in the
message handler.
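
One possible body for such a handler is sketched below. To keep the sketch
compilable on its own, it declares a stand-in for \c Gedcom_msg_type; the
enumerator names \c ERROR, \c WARNING and \c MESSAGE are an assumption and
should be checked against \c gedcom.h, which a real program includes instead.
The helper \c my_format_message is hypothetical.

```c
#include <stdio.h>

// Stand-in declaration so that the sketch compiles without the library.
// A real program includes gedcom.h; the enumerator names are an assumption.
typedef enum { ERROR, WARNING, MESSAGE } Gedcom_msg_type;

// Hypothetical helper: writes "<severity>: <msg>" into 'buf'.
// Returns the value of snprintf(), i.e. the length of the full text.
int my_format_message(char *buf, size_t size, Gedcom_msg_type type,
                      const char *msg)
{
  const char *prefix = "Message";

  if (type == ERROR)
    prefix = "Error";
  else if (type == WARNING)
    prefix = "Warning";
  return snprintf(buf, size, "%s: %s", prefix, msg);
}

// Handler with the signature shown above; a terminal program simply prints
// the formatted line, a GUI program could show it in a dialog instead.
void my_message_handler(Gedcom_msg_type type, char *msg)
{
  char line[1024];

  my_format_message(line, sizeof line, type, msg);
  fprintf(stderr, "%s\n", line);
}
```

Keeping the formatting separate from the output makes it easy to reuse the
same text in a terminal program and in a GUI.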
*/

/*! \defgroup cb_mech Data callback mechanism

The most important use of the parser is of course to get the data out of
the GEDCOM file. This section focuses on the callback mechanism (see
\ref gom "here" for the C object model). In fact, the mechanism involves
two levels.

The primary level is that each of the sections in a GEDCOM file is reported
to the application code via a "start element" callback and an "end element"
callback (much like in a SAX interface for XML): when a line containing
a certain tag is parsed, the "start element" callback is called for that tag,
and when all its subordinate lines with their tags have been processed,
the "end element" callback is called for the original tag. Since GEDCOM is
hierarchical, this results in properly nested calls to the appropriate "start
element" and "end element" callbacks (note: see
\ref compat "compatibility handling").
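
The nesting can be made concrete with a small stand-alone sketch. This is an
illustration only, not the library's parser: the function \c trace_nesting is
hypothetical, it only looks at the level number and the tag of each line, and
it ignores cross-reference keys on level 0 lines.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32

// Appends one event ("start:TAG " or "end:TAG ") to 'trace'.
static int append_event(char *trace, size_t size, size_t *used,
                        const char *kind, const char *tag)
{
  int n = snprintf(trace + *used, size - *used, "%s:%s ", kind, tag);

  if (n < 0 || (size_t)n >= size - *used)
    return -1;
  *used += (size_t)n;
  return 0;
}

// Walks lines of the form "<level> <tag> ..." and records, in 'trace', the
// order in which start and end events would be reported.
// Returns 0 on success, -1 on a malformed line or if 'trace' is too small.
int trace_nesting(const char *lines[], size_t count,
                  char *trace, size_t trace_size)
{
  char open_tags[MAX_DEPTH][32];  // tags of the currently open elements
  int depth = 0;                  // number of currently open elements
  size_t used = 0;
  size_t i;

  if (trace_size == 0)
    return -1;
  trace[0] = '\0';

  for (i = 0; i < count; i++) {
    int level;
    char tag[32];

    if (sscanf(lines[i], "%d %31s", &level, tag) != 2
        || level < 0 || level > depth || level >= MAX_DEPTH)
      return -1;
    // A new line ends all open elements at the same or a deeper level
    while (depth > level) {
      if (append_event(trace, trace_size, &used, "end", open_tags[--depth]))
        return -1;
    }
    if (append_event(trace, trace_size, &used, "start", tag))
      return -1;
    strcpy(open_tags[depth++], tag);
  }
  // End of input: close whatever is still open, innermost first
  while (depth > 0) {
    if (append_event(trace, trace_size, &used, "end", open_tags[--depth]))
      return -1;
  }
  return 0;
}
```

For the lines <code>0 HEAD</code>, <code>1 SOUR MyProgram</code>,
<code>2 VERS 1.0</code> and <code>0 TRLR</code>, the recorded order is
start:HEAD, start:SOUR, start:VERS, end:VERS, end:SOUR, end:HEAD, start:TRLR,
end:TRLR.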

However, it would be typical for a genealogy program to support only a
subset of the GEDCOM standard, certainly a program that is still under
development. Moreover, GEDCOM allows an application to define its own tags,
which will typically not be supported by another application. Still, in that
case, data preservation is important; it would hardly be acceptable that
information that is not understood by a certain program is just removed.

Therefore, the second level of callbacks involves a "default callback". An
application needs to subscribe to callbacks for the tags it does support, and
needs to provide a "default callback" which will be called for the tags it
doesn't support. The application can then choose to just store the
information that comes via the default callback in plain textual format.
*/

/*! \defgroup start_end Start and end callbacks

The following simple example gets some information from the header record
of a GEDCOM file:

\code
Gedcom_ctxt my_header_start_cb (Gedcom_rec rec, int level, Gedcom_val xref,
                                char *tag, char *raw_value,
                                int parsed_tag, Gedcom_val parsed_value)
{
  printf("The header starts\n");
  return (Gedcom_ctxt)1;
}

void my_header_end_cb (Gedcom_rec rec, Gedcom_ctxt self)
{
  // Gedcom_ctxt is a pointer: convert via intptr_t (<stdint.h>) to print it
  printf("The header ends, context is %d\n", (int)(intptr_t)self);
}

gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, my_header_end_cb);
result = gedcom_parse_file("myfamily.ged");
\endcode

Using the gedcom_subscribe_to_record() function, the application requests
to use the specified callbacks as start and end callback (type
\ref Gedcom_rec_start_cb and \ref Gedcom_rec_end_cb).

The start callback can return a context value of type \ref Gedcom_ctxt. This
type is meant to be opaque; in fact, it's a void pointer, so you can pass
anything via it. This context value will be passed to the callbacks of the
direct child elements, and to the end callback.

The example passes a simple integer as context, but an application could e.g.
pass a \c struct (or an object in a C++ application) that will contain the
information for the record. In the end callback, the application could then
e.g. do some finalizing operations on the \c struct to put it in its
final form.
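
A minimal sketch of this pattern follows. The types at the top are stand-ins
so that the sketch compiles on its own (only "Gedcom_ctxt is a void pointer"
is taken from the text; a real program gets all three from \c gedcom.h), and
<code>struct my_header</code> and \c my_finished_header are hypothetical
application names.

```c
#include <stdio.h>
#include <stdlib.h>

// Stand-in declarations; a real program includes gedcom.h instead.
typedef void *Gedcom_ctxt;
typedef int Gedcom_rec;     // placeholder for the real type
typedef void *Gedcom_val;   // placeholder for the real type

// Hypothetical application data collected for the header record.
struct my_header {
  int lines_seen;   // number of header lines handled so far
  int complete;     // set once the end callback has run
};

// Where the application keeps the finished header (hypothetical).
struct my_header *my_finished_header = NULL;

Gedcom_ctxt my_header_start_cb(Gedcom_rec rec, int level, Gedcom_val xref,
                               char *tag, char *raw_value,
                               int parsed_tag, Gedcom_val parsed_value)
{
  struct my_header *head = calloc(1, sizeof *head);

  if (head != NULL)
    head->lines_seen = 1;      // the HEAD line itself
  return (Gedcom_ctxt)head;    // NULL if the allocation failed
}

void my_header_end_cb(Gedcom_rec rec, Gedcom_ctxt self)
{
  struct my_header *head = (struct my_header *)self;

  if (head == NULL)
    return;
  head->complete = 1;          // finalizing step
  my_finished_header = head;   // hand the result over to the application
  printf("Header complete after %d line(s)\n", head->lines_seen);
}
```

Callbacks for child elements would receive the same pointer as their parent
context and could, for instance, increment \c lines_seen.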

From the name of the function it is clear that it is specific to complete
records. For the separate elements in records there is another function,
shown next. Note that the callbacks need to have the signatures shown in the
example.

We will now retrieve the SOUR field (the name of the program that wrote the
file) from the header:

\code
Gedcom_ctxt my_header_source_start_cb(Gedcom_elt elt, Gedcom_ctxt parent,
                                      int level, char* tag, char* raw_value,
                                      int parsed_tag,
                                      Gedcom_val parsed_value)
{
  char *source = GEDCOM_STRING(parsed_value);
  printf("This file was written by %s\n", source);
  return parent;
}

gedcom_subscribe_to_element(ELT_HEAD_SOUR,
                            my_header_source_start_cb,
                            my_header_source_end_cb);
result = gedcom_parse_file("myfamily.ged");
\endcode

The subscription mechanism for elements is similar; only the signatures of
the callbacks differ. The signature of the start callback shows that the
context of the parent line (here e.g. the \c struct that describes the
header) is passed to this start callback.

In this example the callback returns that same context, but it can of course
return its own context object. The end callback is called with both the
context of the parent and the context of the element itself, which in this
example are the same.
*/

/*! \defgroup defcb Default callbacks

An application doesn't always implement the entire GEDCOM spec, and
application-specific tags may have been added by other applications. To
preserve this extra data anyway, a default callback can be registered by
the application, as in the following example:

\code
void my_default_cb (Gedcom_elt elt, Gedcom_ctxt parent, int level,
                    char* tag, char* raw_value, int parsed_tag)
{
  ...
}

gedcom_set_default_callback(my_default_cb);
result = gedcom_parse_file("myfamily.ged");
\endcode

This callback has a signature similar to the previous ones, but it doesn't
contain a parsed value. However, it does contain the parent context that
was returned by the application for the most specific containing tag that
the application supported.

Suppose, for example, that this callback is called for some tags in the
header that are specific to some other application. Our application could
then make sure that the parent context contains the struct or object that
represents the header, and use the default callback to add the level, tag and
raw_value as plain text to a member of that struct or object, thus
preserving the information.

The application can then write this out when the data is saved again in a
GEDCOM file. To make it more specific, consider the following example:

\code
Gedcom_ctxt my_header_start_cb(Gedcom_rec rec, int level, Gedcom_val xref,
                               char* tag, char *raw_value,
                               int parsed_tag, Gedcom_val parsed_value)
{
  // struct header and the my_* helpers are defined by the application
  struct header* head = my_make_header_struct();
  return (Gedcom_ctxt)head;
}

void my_default_cb(Gedcom_elt elt, Gedcom_ctxt parent, int level,
                   char* tag, char* raw_value, int parsed_tag)
{
  struct header* head = (struct header*)parent;
  my_header_add_to_extra_text(head, level, tag, raw_value);
}

gedcom_set_default_callback(my_default_cb);
gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, NULL);
result = gedcom_parse_file(filename);
\endcode

Note that the default callback will be called for any tag that the
application has not specifically subscribed to, and can thus be called in
various contexts. For simplicity, the example above doesn't take this into
account (the parent could be of different types, depending on the context).

Note also that the default callback is not called when the parent context is
\c NULL. This is e.g. the case if none of the "upper" tags has been
subscribed to.
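
The helper \c my_header_add_to_extra_text used in the example is application
code, not a library function. One possible implementation is sketched below,
assuming that <code>struct header</code> has a \c char* member named
\c extra_text (a hypothetical name) and that the helper receives a pointer to
the struct. It stores each unsupported line as plain GEDCOM text.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical application struct; only the member used here is shown.
struct header {
  char *extra_text;   // unsupported lines, kept as plain GEDCOM text
};

// A line is stored as "<level> <tag>[ <raw_value>]\n".
#define MY_LINE_FORMAT "%d %s%s%s\n"

// Appends one line to head->extra_text, growing the buffer as needed.
// Returns 0 on success, -1 if the line could not be stored.
int my_header_add_to_extra_text(struct header *head, int level,
                                const char *tag, const char *raw_value)
{
  size_t old_len = head->extra_text ? strlen(head->extra_text) : 0;
  const char *sep;
  int add_len;
  char *grown;

  if (raw_value == NULL)
    raw_value = "";
  sep = (*raw_value != '\0') ? " " : "";

  // First pass: compute the length of the new line
  add_len = snprintf(NULL, 0, MY_LINE_FORMAT, level, tag, sep, raw_value);
  if (add_len < 0)
    return -1;

  grown = realloc(head->extra_text, old_len + (size_t)add_len + 1);
  if (grown == NULL)
    return -1;                 // the old buffer is still valid

  // Second pass: write the line after the existing text
  snprintf(grown + old_len, (size_t)add_len + 1, MY_LINE_FORMAT,
           level, tag, sep, raw_value);
  head->extra_text = grown;
  return 0;
}
```

When the data is saved again, the application can write \c extra_text out
line by line after the header fields it does support.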
*/

/*! \defgroup parsed Parsed values

The \c Gedcom_val type is meant to be an opaque type. The only thing that
needs to be known about it is that it can contain specific data types, which
have to be retrieved from it using pre-defined macros.

Currently, the specific \c Gedcom_val types are (with \c val of type
\c Gedcom_val):

<table border="1" width="100%">
  <tr>
    <td><b>type</b></td>
    <td><b>type checker</b></td>
    <td><b>cast function</b></td>
  </tr>
  <tr>
    <td>null value</td>
    <td><code>GEDCOM_IS_NULL(val)</code></td>
    <td>N/A</td>
  </tr>
  <tr>
    <td>string</td>
    <td><code>GEDCOM_IS_STRING(val)</code></td>
    <td><code>char* str = GEDCOM_STRING(val);</code></td>
  </tr>
  <tr>
    <td>date</td>
    <td><code>GEDCOM_IS_DATE(val)</code></td>
    <td><code>struct date_value dv = GEDCOM_DATE(val);</code></td>
  </tr>
  <tr>
    <td>age</td>
    <td><code>GEDCOM_IS_AGE(val)</code></td>
    <td><code>struct age_value age = GEDCOM_AGE(val);</code></td>
  </tr>
  <tr>
    <td>xref pointer</td>
    <td><code>GEDCOM_IS_XREF_PTR(val)</code></td>
    <td><code>struct xref_value *xr = GEDCOM_XREF_PTR(val);</code></td>
  </tr>
</table>

The type checker returns a true or a false value according to the type of
the value, but this is in principle only necessary in the rare circumstances
where two types are possible, or where an optional value can be provided.
In most cases, the type is fixed for a specific tag.

The exact type per tag can be found in the
<a href="interface.html">interface details</a>.

The null value is used when the GEDCOM spec doesn't allow a value, or
when an optional value is allowed but none is given.

The string value is currently the most generally used value, for all those
values that don't have a more specific meaning. In essence, the value that
is returned by \c GEDCOM_STRING(val) is always the same as the \c raw_value
passed to the start callback, and is thus in fact redundant.

For the other data types, there is a specific section giving details.
*/

/*! \defgroup parsed_date Date values

The Gedcom_val contains a struct date_value if it denotes a date. The
struct date is a part of the struct date_value.
*/

/*! \defgroup parsed_age Age values

The Gedcom_val contains a struct age_value if it denotes an age.
*/

/*! \defgroup parsed_xref Cross-reference values

The Gedcom_val contains a pointer to a struct xref_value if it denotes a
cross-reference (note: not the struct itself, but a pointer to it!).

The parser checks whether all cross-references that are used are defined
(if not, an error is produced) and whether all cross-references that are
defined are used (if not, a warning is produced). It also checks whether
the type of the cross-reference is the same on definition and use (if
not, an error is produced).

The first two checks are done at the end of the parsing, because
cross-references can be defined after their usage.

A cross-reference key must be a string of maximum 22 characters, of the
following format:

- an at sign ('@')
- followed by an alphanumeric character (A-Z, a-z, 0-9 or underscore)
- followed by zero or more characters, which can be any character
  except an at sign
- terminated by an at sign ('@')

An example would thus be: <code>"@This is an xref_val@"</code>.
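
The format can be expressed as a small check. The function below is a sketch
for application code (for example to validate keys before writing a file),
not the library's own implementation. It assumes that the limit of 22
characters includes the two at signs, and that the characters between the
first character and the closing at sign may not themselves be at signs. The
name \c my_is_valid_xref_key is hypothetical.

```c
#include <ctype.h>
#include <string.h>

// Returns 1 if 'key' is a well-formed cross-reference key, 0 otherwise.
int my_is_valid_xref_key(const char *key)
{
  size_t len = strlen(key);
  size_t i;

  // Shortest possible key: '@', one character, '@'
  if (len < 3 || len > 22)
    return 0;
  if (key[0] != '@' || key[len - 1] != '@')
    return 0;
  // First character after the opening '@': alphanumeric or underscore
  if (!isalnum((unsigned char)key[1]) && key[1] != '_')
    return 0;
  // Remaining characters: anything except '@'
  for (i = 2; i < len - 1; i++) {
    if (key[i] == '@')
      return 0;
  }
  return 1;
}
```

With this check, the example key above is accepted, while keys such as
<code>"@@"</code> or <code>"@a@b@"</code> are rejected.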
*/

/*! \defgroup compat Compatibility mode

Applications are not necessarily true to the GEDCOM spec (or use a different
version than 5.5). The intention is that the library is resilient to this,
and goes into compatibility mode for files written by specific programs
(detected via the \c HEAD.SOUR tag).

Currently, there is (some) compatibility for:

- Personal Ancestral File (PAF), versions 2, 4 and 5
*/

/*! \defgroup write Support for writing GEDCOM files

The Gedcom parser library also contains functions for writing GEDCOM files.
As with parsing, there are two interfaces: a very basic interface, which
requires you to call a function for each line in the GEDCOM file, and an
interface which just dumps the Gedcom object model to a file in one shot
(if you use the Gedcom object model).

Again, this section focuses on the basic interface; the Gedcom object model
interface is described \ref gom "here".

Writing a GEDCOM file involves the following steps:

- first set the encoding options as you want them using
  gedcom_write_set_encoding() and gedcom_write_set_line_terminator()\n\n
  By default a file is written in the same encoding as the last read file
  was in, and the terminator is set to the appropriate one on the current
  platform.

- open the file using gedcom_write_open()

- write the data using gedcom_write_record_str(), ...\n\n
  The principle is that every line in the GEDCOM file to write corresponds
  to a call of one of these functions, except that \c CONT/CONC lines can
  be automatically taken care of.\n\n
  Note that the resulting GEDCOM file should conform to the GEDCOM standard.
  Several checks are built in already, and more will follow, to enforce this.
  There is no compatibility mode for writing GEDCOM files (and probably never
  will be).\n\n
  All these functions expect their input in UTF-8 encoding. If this is
  not the case, errors will be returned. Note that for examples of using
  these functions, you can look at the sources of the Gedcom object model
  (e.g. the function \c write_header in \c gom/header.c).

- close the file using gedcom_write_close()
*/

/*! \defgroup debug Debugging

The library can generate various kinds of debugging output: its own, and
also the debugging output generated by the yacc parser. By default, no
debugging output is generated, but this can be changed.
*/

/*! \defgroup gommain Main functions of the object model

Programs using the Gedcom object model in C should use the following code
(inclusion of both the \c gedcom.h and \c gom.h headers is required; contrast
this with the example given for the \ref main "callback parser"):

\code
int result;
result = gom_parse_file("myfamily.ged");
\endcode
*/

/*! \defgroup gom Gedcom Object Model in C */

/*! \defgroup devel Development support
\section configure Macro for configure.in

There is a macro available for use in configure.in for applications that
are using autoconf to configure their sources. The following macro checks
whether the Gedcom parser library is available and whether its version is
high enough:

\code
AM_PATH_GEDCOM_PARSER([min_version,[action_if_found,[action_if_not_found,[modules]]]])
\endcode

All the arguments are optional and default to 0. E.g. to check for version
1.34.2, you would put the following statement in configure.in:

\code
AM_PATH_GEDCOM_PARSER(1.34.2)
\endcode

Note that version numbers now contain three parts (since version 0.20.0,
which is also the first version in which this macro is available).
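
To show why a three-part version has to be compared part by part rather than
as a string, here is a small stand-alone C sketch. It only illustrates the
comparison; the macro itself is written in M4 and shell, and the function
name \c my_version_at_least is hypothetical.

```c
#include <stdio.h>

// Returns 1 if 'version' is at least 'min_version', 0 if it is lower, and
// -1 if either string is not of the form "major.minor.patch".
int my_version_at_least(const char *version, const char *min_version)
{
  int have[3], want[3];
  char extra;
  int i;

  // The trailing %c only converts if there is unexpected text after the
  // third number, in which case sscanf() returns 4 instead of 3.
  if (sscanf(version, "%d.%d.%d%c",
             &have[0], &have[1], &have[2], &extra) != 3
      || sscanf(min_version, "%d.%d.%d%c",
                &want[0], &want[1], &want[2], &extra) != 3)
    return -1;

  // The first part that differs decides the outcome
  for (i = 0; i < 3; i++) {
    if (have[i] != want[i])
      return have[i] > want[i];
  }
  return 1;   // all three parts are equal
}
```

Version 0.9.10 is lower than 0.10.0 under this comparison, although a plain
string comparison would order them the other way round.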

The macro also sets the variables GEDCOM_CFLAGS and GEDCOM_LIBS for use in
Makefiles. Typically, this would be done as follows in a Makefile.am:

\code
myprg_SOURCES = myprg.c foo.c bar.c
INCLUDES = @GEDCOM_CFLAGS@
LDADD = @GEDCOM_LIBS@
\endcode

If your program uses some extra modules, they can be passed as the fourth
argument of the macro, so that the CFLAGS and LIBS are correctly filled in.
Currently, the only available module is gom (the Gedcom object model). For
example:

\code
AM_PATH_GEDCOM_PARSER(0.21.2, , ,gom)
\endcode

To be able to use this macro in the sources of your application, you have
three options:

- Put the file \c m4/gedcom.m4 in your autoconf data directory (i.e. the
  path given by <code>'aclocal --print-ac-dir'</code>, usually
  <code>/usr/share/aclocal</code>). You can
  do this automatically by going into the m4 subdirectory and typing
  <code>'make install-m4'</code>.

- If you're using autoconf, but not automake, copy the contents of
  \c m4/gedcom.m4 into the \c aclocal.m4 file in your sources.

- If you're using automake, copy the contents of \c m4/gedcom.m4 into the
  \c acinclude.m4 file in your sources.

\section flags Compilation and linking flags

Similar to other libraries, the Gedcom parser library installs a script
\c gedcom-config to help with compilation and linking flags for programs
that don't use autoconf/automake.

To get compilation flags for your program, use one of the following
(depending on whether you only use the callback parser, or also the GEDCOM
object model):

\code
gedcom-config --cflags
gedcom-config --cflags gom
\endcode

Similarly, to get linking flags, use one of the following:

\code
gedcom-config --libs
gedcom-config --libs gom
\endcode
*/