Added some initial documentation.

author Peter Verthez <Peter.Verthez@advalvas.be>

Sun, 30 Dec 2001 22:45:43 +0000 (22:45 +0000)

committer Peter Verthez <Peter.Verthez@advalvas.be>

Sun, 30 Dec 2001 22:45:43 +0000 (22:45 +0000)
diff --git a/AUTHORS b/AUTHORS

index 44ca7de0936813841b36f7a437a48c8a90bee73b..1f55b053932d0ae582be65f5688e2843c894d7f0 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -1,2 +1,17 @@
+Authors of gedcom-parse:
+
+Design and implementation:
+  Peter Verthez <Peter.Verthez@advalvas.be>
+
+Thanks for contributing ideas:
+  Geert Vantienen <Geert.Vantienen@advalvas.be>
+  Perry Rapp <prapp@erols.com>
+
+Integrated external code and data:
+  - Date calculation code: Scott E. Lee (http://www.genealogy.org/~scottlee/)
+  - Skeleton for gconv   : Ulrich Drepper <drepper@cygnus.com>
+  - Gedcom test files    : Heiner Eichmann
+                           (http://heiner-eichmann.de/gedcom/gedcom.htm)
+
  # $Id$
  # $Name$
diff --git a/ChangeLog b/ChangeLog

index 23b439fbaf958ad7710b60a3b86391a064bf361a..6ef55a6c564769f61bd3ada2d26c6882d9d597e4 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2001-12-30  Peter Verthez  <Peter.Verthez@advalvas.be>
+
+       * all: Added some initial documentation.
+
+       * gedcom/gedcom.y: Completed the calling of callbacks.
+
+2001-12-29  Peter Verthez  <Peter.Verthez@advalvas.be>
+
+       * gedcom/gedcom_date.y: Added graceful fallback for date parse errors:
+       put everything as a 'date phrase'.
+
  2001-12-28  Peter Verthez  <Peter.Verthez@advalvas.be>
  
         * gedcom_date.*, date.*: Parsing dates via a separate yacc parser.
diff --git a/Makefile.am b/Makefile.am

index c6a2ee6be713470403847baf13b9c406593b1b34..6b284254bbce80755aeb8a06fef4f49e63810b0e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -12,6 +12,15 @@ testgedcom_LDFLAGS = -L gedcom/.libs -lgedcom
  testgedcom_LDADD  = @INTLLIBS@
  
  EXTRA_DIST = $(pkgdata_DATA)
+VERSIONED_FILES = README
+
+dist-hook:
+       @cd $(distdir); \
+       for file in $(VERSIONED_FILES); do \
+          sed 's/\@VERSION\@/${VERSION}/' $$file > $$file.new; \
+         rm $$file; \
+         mv $$file.new $$file; \
+       done
  
  clean-local:
         rm -f testgedcom.out
diff --git a/NEWS b/NEWS

index 44ca7de0936813841b36f7a437a48c8a90bee73b..2e8f0ad4da6967c69fe90f54172e42c38f45315b 100644
--- a/NEWS
+++ b/NEWS
@@ -1,2 +1,19 @@
+NOTE: NO BACKWARD COMPATIBILITY IS GUARANTEED FOR 0.x RELEASES !!
+
+release 0.12 ():
+
+ - The calling of callbacks is now completed.
+
+ - The parsed value that is returned in callbacks can now be:
+     - a null value
+     - a string
+     - a date (struct date_value)
+   See the documentation for more info.  Parsing and checking of cross-
+   references will be added next.
+
+release 0.11 (15 December 2001):
+
+ - Initial release from Sourceforge.net (developers only !)
+  
  # $Id$
  # $Name$
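The NEWS entry above lists three kinds of parsed value (null, string, date). As a minimal sketch, a value like that could be modelled in C as a tagged union with type-checker macros. Every name below (`sketch_val`, `SKETCH_IS_STRING`, `sketch_get_string`, ...) is hypothetical and the layout is an assumption; the library's real `Gedcom_val` is opaque and its definition is not part of this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: none of these names come from gedcom.h. */
enum sketch_val_type { SKETCH_NULL, SKETCH_STRING, SKETCH_DATE };

/* Placeholder date; the real struct date_value is defined by the library. */
struct sketch_date {
    long serial_day;
};

struct sketch_val {
    enum sketch_val_type type;      /* tag: says which union member is valid */
    union {
        const char *string_val;
        struct sketch_date date_val;
    } u;
};

/* Type checkers: true when the value holds the named type. */
#define SKETCH_IS_NULL(v)   ((v)->type == SKETCH_NULL)
#define SKETCH_IS_STRING(v) ((v)->type == SKETCH_STRING)
#define SKETCH_IS_DATE(v)   ((v)->type == SKETCH_DATE)

/* Constructors for the three kinds of value. */
struct sketch_val sketch_make_null(void)
{
    struct sketch_val v = { SKETCH_NULL, { NULL } };
    return v;
}

struct sketch_val sketch_make_string(const char *s)
{
    struct sketch_val v = { SKETCH_STRING, { NULL } };
    v.u.string_val = s;
    return v;
}

struct sketch_val sketch_make_date(long serial_day)
{
    struct sketch_val v = { SKETCH_DATE, { NULL } };
    v.u.date_val.serial_day = serial_day;
    return v;
}

/* Accessors: the assert catches a caller that asks for the wrong type. */
const char *sketch_get_string(const struct sketch_val *v)
{
    assert(SKETCH_IS_STRING(v));
    return v->u.string_val;
}

struct sketch_date sketch_get_date(const struct sketch_val *v)
{
    assert(SKETCH_IS_DATE(v));
    return v->u.date_val;
}
```

A callback written against such a type would first test the tag (for example with `SKETCH_IS_DATE`) and only then call the matching accessor, which is the same check-then-cast pattern the usage documentation describes for the real macros.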
diff --git a/README b/README

index 44ca7de0936813841b36f7a437a48c8a90bee73b..aa4e4a27b79a33d87318c15fced164aac424c4fe 100644
--- a/README
+++ b/README
@@ -1,2 +1,79 @@
+The Gedcom parser library (release @VERSION@)
+-------------------------
+The Gedcom parser library is a C library that provides an API to applications
+to parse and process arbitrary genealogy files in the standard gedcom format.
+
+Its main features are:
+
+ - strict callback-based parser written in C (using lex/yacc)
+
+ - supports the Gedcom 5.5 standard fully
+
+ - supports the standard encoding formats (ASCII, ANSEL, UNICODE), and is
+   extensible (via a configuration file) to other encoding formats; ANSI is
+   also supported by default.
+
+ - all strings passed via callbacks to the application are in UTF-8 format
+
+ - internationalization of the error and warning messages
+
+ - specific parsing of date values to a calendar-neutral date system (Julian
+   days aka serial day numbers); the date parser can be called separately
+
+ - provisions for "compatibility-mode" parsing, to allow for not-exactly-
+   standard syntaxes used by other genealogy programs (only the hooks are
+   in at the moment, not the actual compatibility)
+
+NOTE:
+ - NO BACKWARD COMPATIBILITY is guaranteed for 0.x releases !
+
+To do list:
+ - specific parsing and checking of cross-references
+ - specific parsing of other special values
+ - C++ interface
+ - compatibility with other genealogy programs
+ - older/newer Gedcom standards ?
+ - ...
+
+For more information, refer to the documentation in the doc subdirectory,
+or to the SourceForge project web site and summary page:
+  http://gedcom-parse.sourceforge.net
+  http://sourceforge.net/projects/gedcom-parse
+
+Also, have a look at the 'Genes' program, from which this library is a
+spin-off, and which intends to use this library:
+  http://genes.sourceforge.net
+  http://sourceforge.net/projects/genes
+
+
+Requirements:
+------------
+ - glibc 2.2 or higher
+
+To build from sources, you'll also need:
+ - gcc
+ - autoconf
+ - automake
+ - flex
+ - bison (won't work with plain yacc)
+
+It is possible that the library also runs on platforms other than Linux (and
+that the glibc version requirement can be loosened). However, I can only
+support Linux because that is the only platform I have...
+
+
+Installation:
+------------
+This is simply:
+
+  ./configure
+  make
+  make install
+
+You can also run some tests via:
+  make check
+
+
+###############################################################################
  # $Id$
  # $Name$
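The README above says that dates are parsed to a calendar-neutral day count (Julian days). To illustrate what such a count looks like, here is a minimal sketch that converts a Gregorian date to a Julian day number with the well-known Fliegel and Van Flandern integer formula. This is not the library's code (AUTHORS credits Scott E. Lee for the date calculations); the function names are hypothetical, and the sketch assumes C99 integer division (truncation toward zero) and years after 4800 BC.

```c
#include <assert.h>

/* Hypothetical helper, not part of the gedcom-parse API:
   Gregorian calendar date -> Julian day number (an integer day count). */
long gregorian_to_jdn(int year, int month, int day)
{
    long y = year;
    long m = month;
    long d = day;
    long a;

    assert(month >= 1 && month <= 12);
    assert(day >= 1 && day <= 31);

    /* a is -1 for January and February, 0 for all other months;
       this relies on division truncating toward zero (C99). */
    a = (m - 14) / 12;

    return (1461 * (y + 4800 + a)) / 4
         + (367 * (m - 2 - 12 * a)) / 12
         - (3 * ((y + 4900 + a) / 100)) / 4
         + d - 32075;
}

/* With dates reduced to day numbers, date arithmetic is plain subtraction. */
long days_between(long jdn_from, long jdn_to)
{
    return jdn_to - jdn_from;
}
```

For example, `gregorian_to_jdn(2000, 1, 1)` yields 2451545, and the two ChangeLog dates 2001-12-28 and 2001-12-30 differ by exactly 2 when passed through `days_between`. Comparing or subtracting dates from different calendars becomes trivial once each calendar has its own conversion to the same day count.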
diff --git a/doc/Makefile.am b/doc/Makefile.am

index 4da9c1879df4527ba75d7dbbd31e440c7bfd6d2b..b328f8f693a78200080f7e1e97e87edc9d99382f 100644
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -2,9 +2,15 @@
  # $Id$
  # $Name$
  
-EXTRA_DIST=parser.html
+EXTRA_DIST = index.html usage.html parser.html
+VERSIONED_FILES = index.html
  
  dist-hook:
-       mkdir $(distdir)/images
-       cp -p $(srcdir)/images/schema.obj $(srcdir)/images/schema.png \
-             $(distdir)/images
+       mkdir $(distdir)/images
+       cp -p $(srcdir)/images/schema.obj $(srcdir)/images/schema.png \
+             $(distdir)/images
+       @cd $(distdir); \
+       for file in $(VERSIONED_FILES); do \
+          sed 's/\@VERSION\@/${VERSION}/' $$file > $$file.new; \
+         rm $$file; mv $$file.new $$file; \
+       done
diff --git a/doc/index.html b/doc/index.html

new file mode 100644

index 0000000..09858f3
--- /dev/null
+++ b/doc/index.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+  <title>The GEDCOM parser library</title>
+             
+  <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
+</head>
+  <body>
+   
+<h1 align="center">The GEDCOM parser library</h1>
+  This is the documentation for the GEDCOM parser library, release @VERSION@.<br>
+  <br>
+  The GEDCOM parser library is a C library that provides an API to applications
+ to parse and process arbitrary genealogy files in the standard gedcom format. 
+&nbsp;It supports <a href="http://www.gendex.com/gedcom55">release 5.5</a>
+  of the GEDCOM standard.<br>
+ <br>
+ The rest of the documentation is divided into three parts:<br>
+   
+<ul>
+    <li><a href="usage.html">Usage</a>: This is the main entry point for
+application developers who use the library</li>
+  <li><a href="development.html">Development</a>: This describes some internals
+of the library</li>
+  <li><a href="links.html">Links</a>: A collection of useful links, also
+referenced in the rest of the documentation<br>
+  </li>
+   
+</ul>
+   
+<hr width="100%" size="2">$Id$<br>
+     $Name$<br>
+  <br>
+   
+</body>
+</html>
diff --git a/doc/parser.html b/doc/parser.html

index 15d253b856fe9b32ef9adea9b2495b409d423687..5637cd9edd128fb14c0af8dbc38f03fff8f0c4af 100644
--- a/doc/parser.html
+++ b/doc/parser.html
@@ -1,129 +1,125 @@
  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  <html>
  <head>
-                   
+                       
    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
-  <title>Gedcom parser in Genes</title>
+  <title>The Gedcom parser library</title>
  </head>
    <body>
-       
-<div align="Center">    
-<h1>Gedcom parser in Genes</h1>
-       
-<div align="Left">The intention of this page is to provide some explanation 
-  of the gedcom parser, to aid development on and with it. &nbsp;Currently, 
-  the parser is in a state that it works, but some parts are still missing, 
-  notably the interface towards applications. &nbsp;First, some practical 
-issues  of testing with the parser will be explained.<br>
-    <br>
-       
+         
+<div align="center">     
+<h1>The Gedcom parser library</h1>
+         
+<div align="left">The intention of this page is to provide some explanation
+   of the gedcom parser, to aid development on and with it. &nbsp;First,
+some practical  issues  of testing with the parser will be explained.<br>
+     <br>
+         
  <h2>Basic testing<br>
-    </h2>
-     The parser is located in the "gedcom" subdirectory of the Genes source 
- code.  &nbsp;You should be able to perform a basic test using the commands:<br>
-       
-<blockquote><code>make clean<br>
-    make<br>
-     make test</code><br>
-      </blockquote>
-     If everything goes OK, you'll see that some gedcom files are parsed, 
-and   that each parse is successful. &nbsp;Note that the used gedcom files 
-are  made by <a href="http://heiner-eichmann.de/gedcom/gedcom.htm">Heiner 
-Eichmann</a>
-      and are an excellent way to test gedcom parsers thoroughly.<br>
-      <br>
-               
+     </h2>
+You should be able to perform a basic test using the commands:<br>
+         
+<blockquote><code>./configure<br>
+     make<br>
+      make check</code><br>
+       </blockquote>
+      If everything goes OK, you'll see that some gedcom files are parsed,
+ and   that each parse is successful. &nbsp;Note that the used gedcom files
+ are  made by <a href="http://heiner-eichmann.de/gedcom/gedcom.htm">Heiner
+ Eichmann</a>       and are an excellent way to test gedcom parsers thoroughly.<br>
+       <br>
+                   
    <h2>Preparing for further testing</h2>
-     The basic testing described above doesn't show anything else than "Parse 
-  succeeded", which is nice, but not very interesting. &nbsp;Some more detailed 
-  tests are possible, via the <code>gedcom-parse</code> program that is generated 
-  by <code>make test</code>. &nbsp;<br>
-      <br>
-     However, since the output that <code>gedcom-parse</code> generates is
- in  UTF-8 format (more on this later), some preparation is necessary to
-have  a full view on it. &nbsp;Basically, you need a terminal that understands 
-and can display UTF-8 encoded characters, and you need to proper fonts installed 
-  to display them. &nbsp;I'll give some advice on this here, based on the 
-Red  Hat 7.1 distribution that I use, with glibc 2.2 and XFree86 4.0.x. &nbsp;Any 
-  other distribution that has the same or newer versions for these components 
-  should give the same results.<br>
-      <br>
-     For the first issue, the UTF-8 capable terminal, the safest bet is to
- use    <code>xterm</code> in its unicode mode (which is supported by the
-  <code>   xterm</code> coming with XFree86 4.0.x). &nbsp;UTF-8 capabilities
- have only  recently been added to <code>gnome-terminal</code>, so probably
+      The basic testing described above doesn't show anything other than "Parse
+   succeeded", which is nice, but not very interesting. &nbsp;Some more detailed
+   tests are possible, via the <code>testgedcom</code> program that is generated
+   by <code>make check</code>. &nbsp;<br>
+       <br>
+      However, since the output that <code>testgedcom</code> generates is 
+ in  UTF-8 format (more on this later), some preparation is necessary to have
+ a full view on it. &nbsp;Basically, you need a terminal that understands
+ and can display UTF-8 encoded characters, and you need the proper fonts installed
+   to display them. &nbsp;I'll give some advice on this here, based on the
+ Red  Hat 7.1 distribution that I use, with glibc 2.2 and XFree86 4.0.x.
+&nbsp;Any    other distribution that has the same or newer versions for these
+components    should give the same results.<br>
+       <br>
+      For the first issue, the UTF-8 capable terminal, the safest bet is
+to  use    <code>xterm</code> in its unicode mode (which is supported by
+the   <code>   xterm</code> coming with XFree86 4.0.x). &nbsp;UTF-8 capabilities 
+ have only  recently been added to <code>gnome-terminal</code>, so probably 
   that is not  in your distribution yet (it certainly isn't in Red Hat 7.1).<br>
-      <br>
-     For the second issue, you'll need the ISO 10646-1 fonts. &nbsp;These 
-come   also with XFree86 4.0.x.<br>
-      <br>
-     The way to start <code>xterm</code> in unicode mode is then e.g. (put
+       <br>
+      For the second issue, you'll need the ISO 10646-1 fonts. &nbsp;These
+ also come with XFree86 4.0.x.<br>
+       <br>
+      The way to start <code>xterm</code> in unicode mode is then e.g. (put 
   everything  on 1 line !):<br>
-               
-  <blockquote><code>LANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm 
-  -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'</code><br>
-        </blockquote>
-          This first sets the <code>LANG</code> variable to a locale that 
-uses  UTF-8, and then starts <code>xterm</code> with a proper Unicode font. 
-&nbsp;Some  sample UTF-8 plain text files can be found <a href="http://www.cl.cam.ac.uk/%7Emgk25/ucs/examples">
-    here</a>
-    . &nbsp;Just <code>cat</code> them on the command line and see the result.<br>
-        <br>
-                       
+                   
+  <blockquote><code>LANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm
+   -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'</code><br>
+         </blockquote>
+           This first sets the <code>LANG</code> variable to a locale that
+ uses  UTF-8, and then starts <code>xterm</code> with a proper Unicode font.
+ &nbsp;Some  sample UTF-8 plain text files can be found <a href="http://www.cl.cam.ac.uk/%7Emgk25/ucs/examples">
+     here</a>     . &nbsp;Just <code>cat</code> them on the command line
+and see the result.<br>
+         <br>
+                             
      <h2>Testing the parser with debugging</h2>
-    Given the UTF-8 capable terminal, you can now let the <code>gedcom-parse</code>
-     program print the values that it parses. &nbsp;An example of a command 
- line is (in the <code>gedcom</code> directory):<br>
-                       
-    <blockquote><code>./gedcom_parse -dg t/ulhc.ged</code><br>
-          </blockquote>
-    The <code>-dg</code> option instructs the parser to show its own debug
- messages  &nbsp;(see <code>./gedcom_parse -h</code> for the full set of
-options).  &nbsp;If  everything is OK, you'll see the values from the gedcom
-file, containing  a lot of special characters.<br>
-          <br>
-    For the ANSEL test file (<code>t/ansel.ged</code>), you have to set the 
- environment variable <code>GCONV_PATH</code> to the <code>ansel</code> subdirectory 
- of the gedcom directory:<br>
-                               
-      <blockquote><code>export GCONV_PATH=./ansel<br>
-    ./gedcom_parse -dg t/ansel.ged<br>
-            </code></blockquote>
-    This is because for the ANSEL character set an extra module is needed 
-for  the iconv library (more on this later). &nbsp;But again, this should 
-show  a lot of special characters.<br>
-            <br>
+     Given the UTF-8 capable terminal, you can now let the <code>testgedcom</code>
+      program print the values that it parses. &nbsp;An example of a command
+  line is (in the <code>gedcom</code> directory):<br>
+                             
+    <blockquote><code>./testgedcom -dg t/ulhc.ged</code><br>
+           </blockquote>
+     The <code>-dg</code> option instructs the parser to show its own debug 
+ messages  &nbsp;(see <code>./testgedcom -h</code> for the full set of options).
+ &nbsp;If  everything is OK, you'll see the values from the gedcom file,
+containing  a lot of special characters.<br>
+           <br>
+     For the ANSEL test file (<code>t/ansel.ged</code>), you have to set
+the   environment variable <code>GCONV_PATH</code> to the <code>ansel</code>
+ subdirectory   of the gedcom directory:<br>
                                         
+      <blockquote><code>export GCONV_PATH=./ansel<br>
+     ./testgedcom -dg t/ansel.ged<br>
+             </code></blockquote>
+     This is because for the ANSEL character set an extra module is needed
+ for  the iconv library (more on this later). &nbsp;But again, this should
+ show  a lot of special characters.<br>
+             <br>
+                                                 
          <h2>Testing the lexers separately</h2>
-    The lexers themselves can be tested separately. &nbsp;For the 1-byte
-lexer   (i.e. supporting the encodings with 1 byte per characters, such as
+     The lexers themselves can be tested separately. &nbsp;For the 1-byte 
+lexer   (i.e. supporting the encodings with 1 byte per character, such as 
  ASCII,   ANSI and ANSEL), the sequence of commands would be:<br>
-                                       
+                                                 
          <blockquote><code>make clean<br>
-    make test_1byte<br>
-          </code></blockquote>
-This will show all tokens in the <code>t/allged.ged</code> test file. &nbsp;Similar
+     make test_1byte<br>
+           </code></blockquote>
+ This will show all tokens in the <code>t/allged.ged</code> test file. &nbsp;Similar 
  tests can be done using <code>make test_hilo</code> and <code>make test_lohi</code>
- (for the unicode lexers).<br>
-              <br>
-    This concludes the testing setup. &nbsp;Now for some explanations...<br>
-              <br>
-                                               
+  (for the unicode lexers).<br>
+               <br>
+     This concludes the testing setup. &nbsp;Now for some explanations...<br>
+               <br>
+                                                           
            <h2>Structure of the parser</h2>
-    I see the structure of a program using the gedcom parser as follows:<br>
-              <br>
-              <img src="images/schema.png" alt="Gedcom parsing scheme">
-              <br>
-              <br>
-              <br>
-    TO BE COMPLETED...<br>
-           
-          <hr width="100%" size="2">$Id: parser.html,v 1.2 2001/12/01 15:29:00
+     I see the structure of a program using the gedcom parser as follows:<br>
+               <br>
+               <img src="images/schema.png" alt="Gedcom parsing scheme">
+               <br>
+               <br>
+               <br>
+     TO BE COMPLETED...<br>
+                       
+          <hr width="100%" size="2">$Id: parser.html,v 1.2 2001/12/01 15:29:00 
  verthezp Exp $<br>
- $Name$<br>
-            <br>
-              </div>
-              </div>
-                                               
+  $Name$<br>
+             <br>
+               </div>
+               </div>
+                                                           
            </body>
            </html>
diff --git a/doc/usage.html b/doc/usage.html

new file mode 100644

index 0000000..f47684f
--- /dev/null
+++ b/doc/usage.html
@@ -0,0 +1,307 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+<head>
+  <title>Using the GEDCOM parser library</title>
+      
+  <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
+</head>
+<body>
+ 
+<h1 align="center">Using the GEDCOM parser library</h1>
+ <br>
+ 
+<h2>Index</h2>
+<ul>
+  <li><a href="#Overview">Overview</a></li>
+  <li><a href="#Error_handling">Error handling</a></li>
+  <li><a href="#Data_callback_mechanism">Data callback mechanism</a></li>
+  <ul>
+    <li><a href="#Start_and_end_callbacks">Start and end callbacks</a></li>
+    <li><a href="#Default_callbacks">Default callbacks</a><br>
+    </li>
+  </ul>
+</ul>
+<hr width="100%" size="2"> 
+<h2><a name="Overview"></a>Overview<br>
+ </h2>
+ The GEDCOM parser library is built as a callback-based parser (comparable 
+to the SAX interface of XML). &nbsp;It comes with:<br>
+ 
+<ul>
+   <li>a library (<code>libgedcom.so</code>), to be linked in the application 
+program</li>
+   <li>a header file (<code>gedcom.h</code>), to be used in the sources of 
+the application program</li>
+ 
+</ul>
+ Next to these, there is also a data directory in <code>$PREFIX/share/gedcom-parse</code>
+  that contains some additional files, but these are not important 
+at first. &nbsp;I'll leave the description of the data directory for later.<br>
+ <br>
+ The simplest call of the gedcom parser is the following piece 
+of code (the include of the gedcom header is assumed, as everywhere in this manual):<br>
+ 
+<blockquote><code>int result;<br>
+ ...<br>
+ result = <b>gedcom_parse_file</b>("myfamily.ged");<br>
+   </code>   </blockquote>
+ Although this will not provide much information, one thing it does is parse
+the entire file and return the result. &nbsp;The function returns 0 on success
+and 1 on failure. &nbsp;No other information is available using this function
+only.<br>
+  <br>
+The next sections refine this to obtain meaningful error messages and
+the actual data that is in the file.<br>
+  <hr width="100%" size="2">   
+  <h2><a name="Error_handling"></a>Error handling</h2>
+Since this is a relatively simple topic, it is discussed before the actual
+callback mechanism, although it also uses a callback...<br>
+  <br>
+The library can be used in several different circumstances, both terminal-based
+and GUI-based. &nbsp;Therefore, it leaves the actual display of the error
+message up to the application. &nbsp;For this, the application needs to register
+a callback before parsing the GEDCOM file, which will be called by the library
+on errors, warnings and messages.<br>
+  <br>
+A typical piece of code would be:<br>
+  <blockquote><code>void <b>my_message_handler</b> (Gedcom_msg_type type,
+char *msg)<br>
+{<br>
+&nbsp; ...<br>
+}<br>
+...<br>
+    <b>gedcom_set_message_handler</b>(my_message_handler);<br>
+...<br>
+result = <b>gedcom_parse_file</b>("myfamily.ged");</code><br>
+    </blockquote>
+In the above piece of code, <code>my_message_handler</code> is the callback
+that will be called for errors (<code>type=ERROR</code>), warnings (<code>
+type=WARNING</code>) and messages (<code>type=MESSAGE</code>). &nbsp;The
+callback must have the signature as in the example. &nbsp;For errors, the
+    <code>msg</code> passed to the callback will have the format:<br>
+    <blockquote><code>Error on line</code> <i>&lt;lineno&gt;</i>: <i>&lt;actual_message&gt;</i><br>
+      </blockquote>
+Note that the entire string will be properly internationalized, and encoded
+in UTF-8 (see "Why UTF-8?" &nbsp;<i>LINK TBD</i>). &nbsp;Also, no newline
+is appended, so that the application program can use it in any way it wants.
+&nbsp;Warnings are similar, but use "Warning" instead of "Error". &nbsp;Messages
+are plain text, without any prefix.<br>
+      <br>
+With this in place, the resulting code will already show errors and warnings
+produced by the parser, e.g. on the terminal if a simple <code>printf</code>
+ is used in the message handler.<br>
+      <hr width="100%" size="2">
+      <h2><a name="Data_callback_mechanism"></a>Data callback mechanism</h2>
+The most important use of the parser is of course to get the data out of
+the GEDCOM file. &nbsp;As already mentioned, the parser uses a callback mechanism
+for that. &nbsp;In fact, the mechanism involves two levels.<br>
+      <br>
+The primary level is that each of the sections in a GEDCOM file is notified
+to the application code via a "start element" callback and an "end element"
+callback (much like in a SAX interface for XML), i.e. when a line containing
+a certain tag is parsed, the "start element" callback is called for that
+tag, and when all its subordinate lines with their tags have been processed,
+the "end element" callback is called for the original tag. &nbsp;Since GEDCOM
+is hierarchical, this results in properly nested calls to appropriate "start
+element" and "end element" callbacks.<br>
+      <br>
+However, it would be typical for a genealogy program to support only a subset
+of the GEDCOM standard, certainly a program that is still under development.
+&nbsp;Moreover, under GEDCOM it is allowed for an application to define its
+own tags, which will typically not &nbsp;be supported by another application.
+&nbsp;Still, in that case, data preservation is important; it would hardly
+be acceptable that information that is not understood by a certain program
+is just removed.<br>
+      <br>
+Therefore, the second level of callbacks involves a "default callback". &nbsp;An
+application needs to subscribe to callbacks for tags it does support, and
+needs to provide a "default callback" which will be called for tags it doesn't
+support. &nbsp;The application can then choose to just store the information
+that comes via the default callback in plain textual format.<br>
+      <br>
+After this introduction, let's see what the API looks like...<br>
+      <br>
+      <h3><a name="Start_and_end_callbacks"></a>Start and end callbacks</h3>
+      <h4><i>Callbacks for records</i> <br>
+      </h4>
+As a simple example, we will get some information from the header of a GEDCOM
+file. &nbsp;First, have a look at the following piece of code:<br>
+      <blockquote><code>Gedcom_ctxt <b>my_header_start_cb</b> (int level,
+Gedcom_val xref, char *tag)<br>
+{<br>
+&nbsp; printf("The header starts\n");<br>
+&nbsp; return (Gedcom_ctxt)1;<br>
+}<br>
+        <br>
+void <b>my_header_end_cb</b> (Gedcom_ctxt self)<br>
+{<br>
+&nbsp; printf("The header ends, context is %ld\n", (long)self); &nbsp; /* context
+will print as "1" */<br>
+}<br>
+        <br>
+...<br>
+        <b>gedcom_subscribe_to_record</b>(REC_HEAD, my_header_start_cb, my_header_end_cb);<br>
+...<br>
+result = <b>gedcom_parse_file</b>("myfamily.ged");</code><br>
+        </blockquote>
+   Using the <code>gedcom_subscribe_to_record</code> function, the application
+requests to use the specified callbacks as start and end callback. The end
+callback is optional: you can pass <code>NULL</code> if you are not interested
+in the end callback. &nbsp;The identifiers to use as first argument to the
+function (here <code>REC_HEAD</code>) are described in <i>TBD (use the header
+file for now...)</i>.<br>
+        <br>
+From the name of the function it becomes clear that this function is specific
+to complete records. &nbsp;For the separate elements in records there is
+another function, which we'll see shortly. &nbsp;Again, the callbacks need
+to have the signatures as shown in the example.<br>
+        <br>
+The <code>Gedcom_ctxt</code> type that is used as a result of the start callback
+and as an argument to the end callback is vital for passing context necessary
+for the application. &nbsp;This type is meant to be opaque; in fact, it's
+a void pointer, so you can pass anything via it. &nbsp;The important thing
+to know is that the context that the application returns in the start callback
+will be passed in the end callback as an argument, and as we will see shortly,
+also to all the directly subordinate elements of the record.<br>
+        <br>
+The example passes a simple integer as context, but an application could
+e.g. pass a <code>struct</code> that will contain the information for the
+header. &nbsp;In the end callback, the application could then e.g. do some
+finalizing operations on the <code>struct</code> to put it in its database.<br>
+        <br>
+(Note that the <code>Gedcom_val</code> type for the <code>xref</code> argument
+has not been discussed yet; see below for this.)<br>
+        <br>
+        <h4><i>Callbacks for elements</i></h4>
+We will now retrieve the SOUR field (the name of the program that wrote the
+file) from the header:<br>
+        <blockquote><code>Gedcom_ctxt <b>my_header_source_start_cb</b>(Gedcom_ctxt
+parent,<br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; int &nbsp; &nbsp;
+&nbsp; &nbsp; level,<br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; char* &nbsp; &nbsp;
+&nbsp; tag,<br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; char* &nbsp; &nbsp;
+&nbsp; raw_value,<br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Gedcom_val &nbsp;parsed_value)<br>
+{<br>
+&nbsp; char *source = GEDCOM_STRING(parsed_value);<br>
+&nbsp; printf("This file was written by %s\n", source);<br>
+&nbsp; return parent;<br>
+}<br>
+          <br>
+void <b>my_header_source_end_cb</b>(Gedcom_ctxt parent,<br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; &nbsp;Gedcom_ctxt self,<br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; &nbsp;Gedcom_val &nbsp;parsed_value)<br>
+{<br>
+&nbsp; printf("End of the source description\n");<br>
+}<br>
+          <br>
+...<br>
+          <b>gedcom_subscribe_to_element</b>(ELT_HEAD_SOUR,<br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; my_header_source_start_cb,<br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; my_header_source_end_cb);<br>
+...<br>
+result = <b>gedcom_parse_file</b>("myfamily.ged");</code><br>
+          </blockquote>
+The subscription mechanism for elements is similar, only the signatures of
+the callbacks differ. &nbsp;The signature for the start callback shows that
+the context of the parent line (e.g. the <code>struct</code> that describes
+the header) is passed to this start callback. &nbsp;Here the callback itself
+returns the same context, but it can of course return its own context object.
+&nbsp;The end callback is called with both the context of the parent and
+the context of itself, which will be the same in the example.<br>
+          <br>
+If we look at the other arguments of the start callback, we see the level
+number (the initial number of the line in the GEDCOM file), the tag (e.g.
+"SOUR"), and then a raw value and a parsed value. &nbsp;The raw value is
+just the raw string that occurs as value on the line next to the tag (in
+UTF-8 encoding). &nbsp;The parsed value is the meaningful value that is parsed
+from that raw string.<br>
+          <br>
+The <code>Gedcom_val</code> type is meant to be an opaque type. &nbsp;The
+only thing that needs to be known about it is that it can contain specific
+data types, which have to be retrieved from it using pre-defined macros.
+&nbsp;Currently, the specific types are (with <code>val</code> of type <code>
+Gedcom_val</code>):<br>
+          <br>
+          <table cellpadding="2" cellspacing="2" border="1" width="100%">
+            <tbody>
+              <tr>
+                <td valign="top"><br>
+                </td>
+                <td valign="top"><b>type checker</b><br>
+                </td>
+                <td valign="top"><b>cast operator</b><br>
+                </td>
+              </tr>
+              <tr>
+                <td valign="top">null value<br>
+                </td>
+                <td valign="top"><code>GEDCOM_IS_NULL(val)</code><br>
+                </td>
+                <td valign="top">N/A<br>
+                </td>
+              </tr>
+              <tr>
+                <td valign="top">string<br>
+                </td>
+                <td valign="top"><code>GEDCOM_IS_STRING(val)</code><br>
+                </td>
+                <td valign="top"><code>char* str = GEDCOM_STRING(val);</code><br>
+                </td>
+              </tr>
+              <tr>
+                <td valign="top">date<br>
+                </td>
+                <td valign="top"><code>GEDCOM_IS_DATE(val)</code><br>
+                </td>
+                <td valign="top"><code>struct date_value dv = GEDCOM_DATE(val)
+;</code><br>
+                </td>
+              </tr>
+            </tbody>
+          </table>
+          <br>
+The null value is used when the GEDCOM spec doesn't allow a value, or
+when an optional value is allowed but none is given.<br>
+&nbsp; <br>
+The string value is currently the most generally used value, for all those
+values that don't have a more specific meaning. &nbsp;In essence, the value
+that is returned by GEDCOM_STRING is always the same as the raw_value passed
+to the start callback, and is thus in fact redundant.<br>
+          <br>
+The date value is used for all elements that return a date. &nbsp;(<i>Description
+of struct date_value TBD: look in the header file for the moment</i>).<br>
+          <br>
+The type checker returns a true or a false value according to the type of
+the value, but this is in principle only necessary in the rare circumstances
+that two types are possible, or where an optional value can be provided.
+&nbsp;In most cases, the type is fixed for a specific tag (<i>types per tag
+to be described</i>).<br>
+          <br>
+Some extra notes:<br>
+          <ul>
+            <li>The <code>Gedcom_val</code> argument of the end callback
+is currently not used. &nbsp;It is there for future enhancements.</li>
+            <li>There is also a <code>Gedcom_val</code> argument in the start
+callback for records. &nbsp;This argument is currently a string value giving
+the cross-reference pointer in string form.</li>
+          </ul>
+          <h3><a name="Default_callbacks"></a>Default callbacks<br>
+          </h3>
+TO BE COMPLETED<br>
+          <hr width="100%" size="2">$Id$<br>
+       $Name$<br>
+   <br>
+   
+          </body>
+          </html>
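The usage document above explains that the context returned by a record's start callback is handed to its element callbacks and to its end callback. The following is a minimal sketch of that flow, assuming a `struct` as context. It deliberately does not include `gedcom.h`: `Sketch_ctxt` stands in for `Gedcom_ctxt` (which the manual says is a void pointer), the `Gedcom_val` arguments are left out because their layout is not documented here, and `struct header_info` and all function names are hypothetical. The real callbacks must use the signatures shown in the manual.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for Gedcom_ctxt, so the sketch compiles without the library. */
typedef void *Sketch_ctxt;

/* Application-side record that collects what the callbacks see. */
struct header_info {
    char source[64];   /* value of the SOUR line in the header */
    int  complete;     /* set to 1 by the record end callback */
};

static struct header_info current_header;

/* Record start callback: create the context for this record and return it. */
Sketch_ctxt header_start(int level, const char *tag)
{
    (void)level;
    (void)tag;
    memset(&current_header, 0, sizeof current_header);
    return &current_header;
}

/* Element start callback: 'parent' is whatever the record start returned. */
Sketch_ctxt header_source_start(Sketch_ctxt parent, int level,
                                const char *tag, const char *raw_value)
{
    struct header_info *h = parent;

    (void)level;
    (void)tag;
    assert(h != NULL);
    /* snprintf truncates safely if the value is longer than the buffer. */
    snprintf(h->source, sizeof h->source, "%s", raw_value);
    return parent;     /* reuse the record's context for this element */
}

/* Record end callback: 'self' is the context from the record start. */
void header_end(Sketch_ctxt self)
{
    struct header_info *h = self;

    assert(h != NULL);
    h->complete = 1;   /* e.g. the point where the data would be stored */
}
```

A parser driving these callbacks would call `header_start`, then `header_source_start` with the returned context as `parent`, and finally `header_end` with that same context. After that sequence `current_header.source` holds the SOUR value and `complete` is 1. Using a single static record keeps the sketch short; an application handling many records would more likely allocate one context per record.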