Only try to delete address if present.

[gedcom-parse.git] / doc / usage.html
diff --git a/doc/usage.html b/doc/usage.html

index 6880dc8f65ab3f3025d679ab4d488f676d8d17cf..39a0cb3594f58e5ca6cfeb0a30d3c1e99c09f371 100644
--- a/doc/usage.html
+++ b/doc/usage.html
@@ -1,9 +1,7 @@
  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><title>Using the GEDCOM parser library</title>
    
                                                                
-  <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"></head>
-
-<body text="#000000" bgcolor="#ffffff" link="#000099" vlink="#990099" alink="#000099">
+  <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"></head><body text="#000000" bgcolor="#ffffff" link="#000099" vlink="#990099" alink="#000099">
                   
  <h1 align="center">Using the GEDCOM parser library</h1>
           <br>
@@ -19,8 +17,14 @@
              <li><a href="#Start_and_end_callbacks">Start and end callbacks</a></li>
              <li><a href="#Default_callbacks">Default callbacks</a></li>
                                 
+  </ul><li><a href="#Support_for_writing_GEDCOM_files">Support for writing GEDCOM files</a></li>
+  <ul>
+    <li><a href="#Opening_and_closing_files">Opening and closing files</a></li>
+    <li><a href="#Controlling_some_settings">Controlling some settings</a></li>
+    <li><a href="#Writing_data">Writing data</a><br>
+    </li>
    </ul>
-         <li><a href="#Other_API_functions">Other API functions</a></li>
+<li><a href="#Other_API_functions">Other API functions</a></li>
                             
    <ul>
             <li><a href="#Debugging">Debugging</a></li>
@@ -29,36 +33,46 @@
                             
    </ul>
      <li><a href="#Converting_character_sets">Converting character sets</a></li>
-    <li><a href="#Support_for_configure.in">Support for configure.in</a><br>
+    <li><a href="#Support_for_configure.in">Development support</a><br>
+<br>
       </li>
-           <li><a href="interface.html">Interface details</a><br>
+           <li><a href="interface.html">Interface details of the callback parser</a></li><li><a href="gom.html">C object model</a><br>
              </li>
+
                 
  </ul>
                 
  <hr width="100%" size="2">         
  <h2><a name="Overview"></a>Overview<br>
-         </h2>
-         The GEDCOM parser library is built as a callback-based parser (comparable 
-    to the SAX interface of XML). &nbsp;It comes with:<br>
+         </h2>          The GEDCOM
+parser library provides two interfaces. &nbsp;On the one hand, it can be
+used as a callback-based parser (comparable to the SAX interface of
+XML); on the other hand, the parser can be used to convert the GEDCOM file
+into an object model (comparable to the DOM interface of XML). &nbsp;It comes
+with:<br>
                   
  <ul>
             <li>a library (<code>libgedcom.so</code>), to be linked in the 
-application     program</li>
+application     program, which implements the callback parser</li>
             <li>a header file (<code>gedcom.h</code>), to be used in the sources 
     of  the application program</li>
         <li>a header file (<code>gedcom-tags.h</code>) that is also installed, 
-  but that is automatically included via <code>gedcom.h</code><br>
-       </li>
+  but that is automatically included via <code>gedcom.h</code></li></ul>Additionally, if you want to use the GEDCOM C object model, the following should be used (note that <code>libgedcom.so</code> is also needed in this case, because the object model uses the callback parser internally):<br>
+<ul>
+  <li>a library (<code>libgedcom_gom.so</code>), to be linked in the application program, which implements the C object model</li>
+  <li>a header file (<code>gom.h</code>), to be used in the sources of the application program<br>
+  </li>
+
                   
-</ul>
-         Next to these, there is also a data directory in <code>$PREFIX/share/gedcom-parse</code>
+</ul>There is a separate script to help with library and compilation flags; see the <a href="#Support_for_configure.in">development support</a> section.<br>
+<br>
+Next to these, there is also a data directory in <code>$PREFIX/share/gedcom-parse</code>
            that contains some additional stuff, but which is not immediately 
   important    at first. &nbsp;I'll leave the description of the data directory 
   for later.<br>
           <br>
-         The very simplest call of the gedcom parser is simply the following
-  piece   of code (include of the gedcom header is assumed, as everywhere
+         The very simplest call of the gedcom callback parser is simply the following
+  piece   of code (include of the <code>gedcom.h</code> header is assumed, as everywhere
  in  this manual):<br>
                   
  <blockquote><code>int result;<br>
@@ -71,32 +85,43 @@ in  this manual):<br>
   is  parse  the entire file and return the result. &nbsp;The function returns
    0 on success  and 1 on failure. &nbsp;No other information is available
  using   this function  only.<br>
-  <br>
-  The call to <code>gedcom_init</code>() should be one of the first calls 
+<br>
+Alternatively, programs using the C object model should use the following (in this case, the inclusion of both <code>gedcom.h</code> and <code>gom.h</code> is required):<br>
+  
+<blockquote><code>int result;<br>
+  ...<br>
+    <b>gedcom_init</b>();<br>
+         ...<br>
+         result = <b>gom_parse_file</b>("myfamily.ged");<br>
+           </code>   </blockquote>
+The call to <code>gom_parse_file</code> will build the C object model, which is then a complete representation of the GEDCOM file.<br>
+<br>
+No matter which of the interfaces you use, the call to <code>gedcom_init</code>() should be one of the first calls 
  in your program. &nbsp;The requirement is that it should come before the first
  call to <code>iconv_open</code> (part of the generic character set conversion
  feature) in the program, either by your program itself, or indirectly by
  the library calls it makes. &nbsp;Practically, it should e.g. come before
   any calls to any GTK functions, because GTK uses <code>iconv_open</code>
- in its initialization. &nbsp;For the same reason it is also advised to put
-the <code>-lgedcom</code> option on the linking of the program as the last
-option, so that its initialization code is run first.<br>
-          <br>
-        The next sections will refine this piece of code to be able to have
+ in its initialization.<br>
+&nbsp; <br>
+For the same reason it is also advised to put
+the <code>-lgedcom</code> option
+on the linking of the program as the last option, so that its initialization
+code is run first. &nbsp;In the case of using the C object model, the linking
+options should be: <code>-lgedcom_gom -lgedcom</code><br>
+          <br>The function <code>gedcom_init()</code> also initializes locale handling by calling <code>setlocale(LC_ALL, "")</code>, in case the application does not do this itself (it doesn't hurt for the application to do the same).<br>
+&nbsp;<br>
+The next sections will refine this piece of code to be able to have
   meaningful errors   and the actual data that is in the file.<br>
                             
  <hr width="100%" size="2">                       
-<h2><a name="Error_handling"></a>Error handling</h2>
-        Since this is a relatively simple topic, it is discussed before the 
- actual   callback mechanism, although it also uses a callback...<br>
-          <br>
-        The library can be used in several different circumstances, both
+<h2><a name="Error_handling"></a>Error handling</h2>The library can be used in several different circumstances, both
  terminal-based     as GUI-based. &nbsp;Therefore, it leaves the actual display
  of the error    message up to the application. &nbsp;For this, the application
  needs to  register  a callback before parsing the GEDCOM file, which will
  be called  by the library   on errors, warnings and messages.<br>
            <br>
-        A typical piece of code would be:<br>
+        A typical piece of code would be (<code>gom_parse_file</code> would be called in case the C object model is used):<br>
                             
  <blockquote><code>void <b>my_message_handler</b> (Gedcom_msg_type type,  
   char *msg)<br>
@@ -129,8 +154,7 @@ way it wants.   &nbsp;Warnings are similar, but use "Warning" instead of "Error"
  <hr width="100%" size="2">                                            
  <h2><a name="Data_callback_mechanism"></a>Data callback mechanism</h2>
          The most important use of the parser is of course to get the data 
-out   of  the GEDCOM file. &nbsp;As already mentioned, the parser uses a callback
-  mechanism  for that. &nbsp;In fact, the mechanism involves two levels.<br>
+out   of  the GEDCOM file. &nbsp;This section focuses on the callback mechanism (see <a href="gom.html">here</a> for the C object model). &nbsp;In fact, the mechanism involves two levels.<br>
                <br>
          The primary level is that each of the sections in a GEDCOM file is
   notified    to the application code via a "start element" callback and an
@@ -395,11 +419,142 @@ raw_value,   int parsed_tag)<br>
   of the "upper" tags has been subscribed upon.<br>
                                                                          
            
+<hr width="100%" size="2"><br>
+<h2><a name="Support_for_writing_GEDCOM_files"></a>Support for writing GEDCOM files</h2>
+The Gedcom parser library also contains functions for writing GEDCOM files.
+&nbsp;As with parsing, there are two interfaces: a very basic interface,
+which requires you to call a function for each line in
+the GEDCOM file, and an interface which just dumps the Gedcom object model
+to a file in one shot (if you use the Gedcom object model).<br>
+<br>
+Again, this section focuses on the basic interface; the Gedcom object model interface is described <a href="gom.html#Writing_the_object_model_to_file">here</a>.<br>
+<br>
+<h3><a name="Opening_and_closing_files"></a>Opening and closing files</h3>
+The basic functions for opening and closing Gedcom files for writing are the following:<br>
+<code></code>
+<blockquote><code>Gedcom_write_hndl <b>gedcom_write_open</b> (const char* filename);<br>
+int &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <b>gedcom_write_close</b> (Gedcom_write_hndl hndl, int* total_conv_fails);<br></code></blockquote>
+The function <code>gedcom_write_open</code> takes as parameter the name of
+the file to write, and returns a write handle, which needs to be used in
+subsequent functions. &nbsp;It returns <code>NULL</code> in case of errors.<br>
+<br>
+The function <code>gedcom_write_close</code> takes, next to the write handle,
+an integer pointer as parameter. &nbsp;If you pass an actual pointer for
+this, the function will write in it the total number of conversion failures;
+you can pass <code>NULL</code> if you're not interested. &nbsp;The function returns 0 in case of success, non-zero in case of failure.<br>
+<br>
+<h3><a name="Controlling_some_settings"></a>Controlling some settings<br>
+</h3>
+Note that by default the file is written in ASCII encoding (and hence e.g.
+accented characters will cause a conversion failure). &nbsp;You can change
+this by calling the following function <i>before</i> calling <code>gedcom_write_open</code>, i.e. it affects all files that are opened after it has been called:<br>
+<blockquote><code>int <b>gedcom_write_set_encoding</b> (const char* charset, Encoding width, Enc_bom bom);<br></code></blockquote>
+The valid <code>charset</code> values are given in the first column in the file <code>gedcom.enc</code> in the data directory of gedcom-parse (<code>$PREFIX/share/gedcom-parse</code>).
+&nbsp;The character sets UNICODE, ASCII and ANSEL are always supported (these
+are standard for GEDCOM), as well as ANSI (not standard), but there may be
+others.<br>
+<br>
+The <code>width</code> parameter takes one of the following values:<br>
+<ul>
+  <li><code><b>ONE_BYTE</b></code>: This should be used for all character sets except UNICODE.</li>
+  <li><code><b>TWO_BYTE_HILO</b></code>: High-low encoding for UNICODE (i.e. big-endian)</li>
+  <li><code><b>TWO_BYTE_LOHI</b></code>: Low-high encoding for UNICODE (i.e. little-endian)</li>
+</ul>
+The <code>bom</code> parameter determines whether a byte-order-mark should
+be written in the file in case of UNICODE encoding (usually preferred because
+it then clearly indicates the byte ordering). &nbsp;It takes one of the following
+values:<br>
+<ul>
+  <li><code><b>WITHOUT_BOM</b></code></li>
+  <li><code><b>WITH_BOM</b></code></li>
+</ul> For both these parameters you can pass 0 for non-UNICODE encodings,
+since that corresponds to the correct values (and is ignored anyway). &nbsp;The 
+function returns 0 in case of success, non-zero in case of error. &nbsp;Note
+that you still need to pass the correct charset value for the HEAD.CHAR tag,
+otherwise you will get a warning, and the value will be forced to the correct
+value.<br>
+<br>
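To illustrate what the <code>width</code> and <code>bom</code> settings mean for the bytes in the file, here is a minimal C sketch. It is not taken from the library: it assumes that UNICODE output consists of 16-bit code units, and the names <code>sketch_byte_order</code>, <code>SKETCH_HILO</code>, <code>SKETCH_LOHI</code>, <code>put_unit</code> and <code>encode_units</code> are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for this sketch only; not the library's Encoding type. */
enum sketch_byte_order { SKETCH_HILO, SKETCH_LOHI };

/* Store one 16-bit code unit in the requested byte order. */
static void put_unit(uint16_t unit, enum sketch_byte_order order,
                     unsigned char out[2])
{
  unsigned char hi = (unsigned char) (unit >> 8);
  unsigned char lo = (unsigned char) (unit & 0xFF);
  if (order == SKETCH_HILO) { out[0] = hi; out[1] = lo; }  /* big-endian */
  else                      { out[0] = lo; out[1] = hi; }  /* little-endian */
}

/* Encode count code units into out, optionally preceded by the byte-order
   mark U+FEFF.  out must have room for 2 * (count + 1) bytes.
   Returns the number of bytes written. */
static size_t encode_units(const uint16_t *units, size_t count,
                           enum sketch_byte_order order, int with_bom,
                           unsigned char *out)
{
  size_t n = 0;
  assert(units != NULL && out != NULL);
  if (with_bom) {
    put_unit(0xFEFF, order, out + n);
    n += 2;
  }
  for (size_t i = 0; i < count; i++) {
    put_unit(units[i], order, out + n);
    n += 2;
  }
  return n;
}
```

With these definitions, encoding U+00E9 in high-low order with a byte-order mark gives the bytes FE FF 00 E9; in low-high order it gives FF FE E9 00. A reader can therefore tell the byte order from the first two bytes, which is why writing the mark is usually preferred.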
+Further, it is possible to control the kind of line terminator that is used, via the following function (also to be used before <code>gedcom_write_open</code>):<br>
+<blockquote><code>int <b>gedcom_write_set_line_terminator</b> (Enc_line_end end);<br></code></blockquote>
+The <code>end</code> parameter takes one of the following values:<br>
+<ul>
+  <li><b><code>END_CR</code></b>: only carriage return ("\r") (cf. Macintosh)</li>
+  <li><b><code>END_LF</code></b>: only line feed ("\n") (cf. Unix, Mac OS X)</li>
+  <li><b><code>END_CR_LF</code></b>: first carriage return, then line feed ("\r\n") (cf. DOS, Windows)</li>
+  <li><b><code>END_LF_CR</code></b>: first line feed, then carriage return ("\n\r")</li>
+</ul>
+By default, this is set to the appropriate line terminator on the current
+platform, so it only needs to be changed if there is some special reason
+for it.<br>
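As an illustration of the four terminator choices, the following C sketch maps each one to the characters that end a line. It does not use the library: the enum <code>sketch_line_end</code> and the functions <code>sketch_terminator</code> and <code>sketch_terminate_line</code> are hypothetical stand-ins defined only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for this sketch; the library's own type is Enc_line_end. */
enum sketch_line_end { SKETCH_CR, SKETCH_LF, SKETCH_CR_LF, SKETCH_LF_CR };

/* Return the characters that terminate a line for the given choice. */
static const char *sketch_terminator(enum sketch_line_end end)
{
  switch (end) {
    case SKETCH_CR:    return "\r";    /* carriage return only */
    case SKETCH_LF:    return "\n";    /* line feed only */
    case SKETCH_CR_LF: return "\r\n";  /* carriage return, then line feed */
    case SKETCH_LF_CR: return "\n\r";  /* line feed, then carriage return */
  }
  return "\n";  /* not reached for valid input */
}

/* Append the terminator to the line held in buf (capacity cap bytes).
   Returns 0 on success, 1 if the terminator does not fit. */
static int sketch_terminate_line(char *buf, size_t cap,
                                 enum sketch_line_end end)
{
  const char *term = sketch_terminator(end);
  assert(buf != NULL);
  if (strlen(buf) + strlen(term) + 1 > cap)
    return 1;
  strcat(buf, term);
  return 0;
}
```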
+<h3><a name="Writing_data"></a>Writing data<br>
+</h3>
+For actually writing the data, the principle is that every line in the GEDCOM
+file to write corresponds to a call to one of the following functions, except
+that CONT/CONC lines can be automatically taken care of. &nbsp;Note that
+the resulting GEDCOM file should conform to the GEDCOM standard. &nbsp;Several
+checks are built in already, and more will follow, to enforce this. &nbsp;There
+is (currently) no compatibility mode for writing GEDCOM files.<br>
+<br>
+In general, each of the following functions expects its input in UTF-8 encoding (see also <a href="#Converting_character_sets">here</a>). &nbsp;If this is not the case, errors will be returned.<br>
+<br>
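Since the write functions reject input that is not UTF-8, an application may want to check its strings before passing them on. The following C sketch shows one way to do that. It is an independent illustration, not the check the library performs, and the function name <code>sketch_is_valid_utf8</code> is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the NUL-terminated string is well-formed UTF-8, 0 otherwise.
   Overlong forms, surrogates and values above U+10FFFF are rejected. */
static int sketch_is_valid_utf8(const char *str)
{
  const unsigned char *p = (const unsigned char *) str;
  assert(str != NULL);
  while (*p) {
    unsigned long cp;
    int extra;
    if (*p < 0x80)                { cp = *p;        extra = 0; }
    else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
    else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
    else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
    else return 0;  /* stray continuation byte or invalid lead byte */
    p++;
    for (int i = 0; i < extra; i++, p++) {
      if ((*p & 0xC0) != 0x80)
        return 0;   /* not a continuation byte (also catches an early NUL) */
      cp = (cp << 6) | (unsigned long) (*p & 0x3F);
    }
    /* Reject encodings that use more bytes than necessary. */
    if ((extra == 1 && cp < 0x80) ||
        (extra == 2 && cp < 0x800) ||
        (extra == 3 && cp < 0x10000))
      return 0;
    /* Reject surrogates and values outside the Unicode range. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      return 0;
  }
  return 1;
}
```

For example, the string <code>"Andr\xC3\xA9"</code> (the name with an e-acute encoded in UTF-8) passes this check, whereas <code>"Andr\xE9"</code> (the same name in ISO-8859-1) does not and would first have to be converted.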
+Note that for examples of using these functions you can look at the sources for the Gedcom object model (e.g. the function <code>write_header</code> in <code>gom/header.c</code>).<br>
+<h4>Records</h4>
+For writing lines corresponding to records (i.e. on level 0), the following function is available:
+<blockquote><code>int <b>gedcom_write_record_str</b> (Gedcom_write_hndl hndl, Gedcom_rec rec, char* xrefstr, char* value);<br></code></blockquote>
+The <code>hndl</code> parameter is the write handle that was returned by <code>gedcom_write_open</code>. &nbsp;The <code>rec</code> parameter is one of the identifiers given in the first column in <a href="interface.html#Record_identifiers">this table</a> (except <code>REC_USER</code>: see below). &nbsp;The <code>xrefstr</code> and <code>value</code> parameters are respectively the cross-reference key of the record (something like '<code>@FAM01@</code>'), and the value of the record line, which should be <code>NULL</code> for some record types, according to the same table.<br>
+<h4>Elements</h4>
+For writing lines corresponding to elements (inside records, i.e. on a level
+greater than 0), the following functions are available, depending on the data
+type:
+<blockquote><code>int <b>gedcom_write_element_str</b> &nbsp;(Gedcom_write_hndl hndl, Gedcom_elt elt, int parsed_tag, <br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;int parent_rec_or_elt, char* value);<br>
+int <b>gedcom_write_element_xref</b> (Gedcom_write_hndl hndl, Gedcom_elt elt, int parsed_tag, <br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;int parent_rec_or_elt, struct xref_value*
+value);</code><br>
+  <code>int <b>gedcom_write_element_date</b> (Gedcom_write_hndl hndl, Gedcom_elt elt, int parsed_tag, <br> 
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;int parent_rec_or_elt, struct date_value*
+value);</code><br>
+  <code>int <b>gedcom_write_element_age</b>&nbsp; (Gedcom_write_hndl hndl, Gedcom_elt elt, int parsed_tag, <br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;int parent_rec_or_elt, struct age_value*
+value);</code><br>
+</blockquote>
+<blockquote><code></code></blockquote>
+These functions only differ in the type of the last argument, which is the value of the element.<br>
+<br>
+The <code>hndl</code> parameter is again the write handle returned by <code>gedcom_write_open</code>. &nbsp;The <code>elt</code> parameter is one of the identifiers given in the first column in <a href="interface.html#Element_identifiers">this table</a> (except <code>ELT_USER</code>: see below). &nbsp;The <code>parent_rec_or_elt</code> is the corresponding <code>rec</code> or <code>elt</code>
+identifier of the logically enclosing statement: this will determine the
+level number written on the line, as the level number of the parent + 1.<br>
+<br>
+Some of the identifiers can actually stand for different tags. &nbsp;For this reason, the <code>parsed_tag</code> has to be passed for some of them. &nbsp;This parsed tag is the same as was returned by the callback functions defined <a href="#Start_and_end_callbacks">above</a>, and is an identifier of the form <code>TAG_<i>name</i></code>. &nbsp;This parameter is needed whenever the second column in <a href="interface.html#Element_identifiers">this table</a> shows several possible tags (this is e.g. the case for <code>ELT_SUB_FAM_EVT</code>).<br>
+<br>
+Note that for writing a date value, the given value should be valid, i.e.
+all its struct fields filled in properly and consistent. &nbsp;This can be
+done by calling <code>gedcom_normalize_date</code> (see <a href="interface.html#date">here</a>).<br>
+<h4>User-defined tags</h4>
+For user-defined tags (tags starting with an underscore), there are separate functions, again depending on the data type:<code></code>
+<blockquote><code>int <b>gedcom_write_user_str</b> &nbsp;(Gedcom_write_hndl hndl, int level, char* tag, char* xrefstr,<br>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; char* value);<br>
+int <b>gedcom_write_user_xref</b> (Gedcom_write_hndl hndl, int level, char* tag, char* xrefstr,</code><br>
+  <code>
+&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; struct xref_value* value);</code><br>
+  <code></code></blockquote>
+In the case of user-defined tags, the level and tag string are passed verbatim
+(not controlled by the library). &nbsp;This allows writing any extra data
+that doesn't use a standard tag, but it is only allowed for tags starting with
+an underscore.<br>
  <hr width="100%" size="2">                                              
                                
  <h2><a name="Other_API_functions"></a>Other API functions<br>
                       </h2>
-       Although the above describes the basic interface of libgedcom, there 
+
+       Although the above describes the basic interface of the gedcom parser, there 
   are   some other functions that allow to customize the behaviour of the library.
     &nbsp;These will be explained in the current section.<br>
                                                                          
@@ -494,207 +649,52 @@ default)<br>
  the <code>locale</code> mechanism (i.e. via the <code>LANG</code>, <code>
   LC_ALL</code>  or <code>LC_CTYPE</code> environment variables), which also 
  controls the <code>gettext</code>  mechanism in the application. &nbsp;<br>
-                       <br>
-                       <br>
-                                                                        
-                                        The source distribution of <code>
-gedcom-parse</code>   contains an example implementation (<code>utf8-locale.c</code>
- and <code>  utf8-locale.h</code>  in the "t" subdirectory of the top directory).&nbsp;
-&nbsp;Feel free to use  it in your source code (it is not part of the library,
-and it isn't installed  anywhere, so you need to take over the source and
-header file in your application).  &nbsp;<br>
-                       <br>
-    Its interface is:<br>
-                         
-<blockquote>      
-  <pre><code>char *<b>convert_utf8_to_locale</b> (char *input, int *conv_failures);<br>char *<b>convert_locale_to_utf8</b> (char *input);<br></code></pre>
-  </blockquote>
-    Both functions return a pointer to a static buffer that is overwritten
- on each call. &nbsp;To function properly, the application must first set
-the locale using the <code>setlocale</code> function (the second step detailed
- below). &nbsp;All other steps given below, including setting up and closing
- down the conversion handles, are transparantly handled by the two functions.
- &nbsp;<br>
-                         <br>
-   If you pass a pointer to an integer to the first function, it will be
-set  to the number of conversion failures, i.e. characters that couldn't
-be converted;  you can also just pass <code>NULL</code> if you are not interested
-(note  that usually, the interesting information is just whether there <i>
-were</i>    conversion failures or not, which is then given by the integer
-being bigger  than zero or not). &nbsp;The second function doesn't need this,
-because any  locale can be converted to UTF-8.<br>
-                         <br>
-    You can change the "?" that is output for characters that can't be converted 
- to any string you want, using the following function before the conversion 
- calls:<br>
-                           
-<blockquote>      
-  <pre><code>void <b>convert_set_unknown</b> (const char *unknown);</code></pre>
-  </blockquote>
-                           <br>
-   If you want to have your own functions for it instead of this example
-implementation,  the following steps need to be taken by the application
-(more detailed info  can be found in the info file of the GNU libc library
-in the "Generic Charset  Conversion" section under "Character Set Handling"
-or online <a href="http://www.gnu.org/manual/glibc-2.2.3/html_chapter/libc_6.html#SEC99">
-  here</a>):<br>
-                         
-<ul>
-                         <li>inclusion of some headers:</li>
-                         
-</ul>
-                         
-<blockquote>                             
-  <blockquote>                                   
-    <pre><code>#include &lt;locale.h&gt;    /* for setlocale */<br>#include &lt;langinfo.h&gt;  /* for nl_langinfo */<br>#include &lt;iconv.h&gt;     /* for iconv_* functions */<br></code></pre>
-                           </blockquote>
-                           </blockquote>
-                             
-<ul>
-                             <li>set the program's current locale to what 
-the user configured in the environment:</li>
-                             
-</ul>
-                             
-<blockquote>                                 
-  <blockquote>                                       
-    <pre><code>setlocale(LC_ALL, "");</code><br></pre>
-                               </blockquote>
-                               </blockquote>
-                                 
-<ul>
-                                 <li>open a conversion handle for conversion
- from UTF-8 to the character set of the current locale (once for the entire
- program):</li>
-                                 
-</ul>
-                                 
-<blockquote>                                     
-  <blockquote>                                           
-    <pre><code>iconv_t iconv_handle;<br>...<br>iconv_handle = iconv_open(nl_langinfo(CODESET), "UTF-8");</code><br>if (iconv_handle == (iconv_t) -1)<br>  /* signal an error */<br></pre>
-                                   </blockquote>
-                                   </blockquote>
-                                     
-<ul>
-                                     <li>then, every string can be converted
- using the following:</li>
-                                     
-</ul>
-                                     
-<blockquote>                                         
-  <blockquote>                                               
-    <pre><code>/* char* in_buf is the input buffer,    size_t in_len is its length */<br>/* char* out_buf is the output buffer,  size_t out_len is its length */<br><br>size_t nconv;<br>char *in_ptr = in_buf;<br>char *out_ptr = out_buf;<br>nconv = iconv(iconv_handle, &amp;in_ptr, &amp;in_len,&nbsp;&amp;out_ptr, &amp;out_len);</code></pre>
-                                       </blockquote>
-                                       </blockquote>
-                                         
-<blockquote>If the output buffer is not big enough, <code>iconv</code> will
- return -1 and set <code>errno</code> to <code>E2BIG</code>. &nbsp;Also,
-the    <code>in_ptr</code> and <code>out_ptr</code> will point just after
-the last successfully converted character in the respective buffers, and
-the   <code> in_len</code> and <code>out_len</code> will be updated to show
-the remaining lengths. &nbsp;There can be two strategies here:<br>
-                                               
-  <ul>
-                                           <li>Make sure from the beginning 
- that the output buffer is big enough. &nbsp;However, it's difficult to find 
- an absolute maximum length in advance, even given the length of the input 
- string.<br>
-                                             <br>
-                                           </li>
-                                           <li>Do the conversion in several
- steps, growing the output buffer each time to make more space, and calling
-       <code>iconv</code>  consecutively until the conversion is complete.
- &nbsp;This is the preferred way (a function could be written to encapsulate
- all this).</li>
-                                               
-  </ul>
-   Another error case is when the conversion was unsuccessful (if one of
-the  characters can't be represented in the target character set). &nbsp;The 
-  <code> iconv</code> function will then also return -1 and set <code>errno</code>
-   to <code>EILSEQ</code>; the <code>in_ptr</code> will point to the character
- that couldn't be converted. &nbsp;In that case, again two strategies are
-possible:<br>
-                                               
-  <ul>
-                                           <li>Just fail the conversion,
-and  show an error. &nbsp;This is not very user friendly, of course.<br>
-                                             <br>
-                                           </li>
-                                           <li>Skip over the character that
- can't be converted and append a "?" to the output buffer, then call <code>
-  iconv</code> again. &nbsp;Skipping over a UTF-8 character is fairly simple,
- as follows from the <a href="http://www.cl.cam.ac.uk/%7Emgk25/unicode.html#utf-8">encoding rules</a>
-  :</li>
-                                               
-  </ul>
-                                               
-  <ol>
-                                                     
-    <ol>
-                                             <li>if the first byte is in
-binary  0xxxxxxx, then the character is only one byte long, just skip over
-that byte<br>
-                                               <br>
-                                             </li>
-                                             <li>if the first byte is in
-binary  11xxxxxx, then skip over that byte and all bytes 10xxxxxx that follow.<br>
-                                             </li>
-                                                     
-    </ol>
-                                               
-  </ol>
-                                         </blockquote>
-                                           
-<ul>
-                                           <li>eventually, the conversion 
-handle needs to be closed (when the program exits):<br>
-                                           </li>
-                                           
-</ul>
-                                           
-<blockquote>                                               
-  <blockquote>                                                     
-    <pre><code>iconv_close(iconv_handle);<br></code></pre>
-                                             </blockquote>
-                                             </blockquote>
-                                                  The example implementation 
- mentioned above grows the output buffer dynamically and outputs "?" for characters
- that can't be converted.<br>
+                       <br>A library implementing helper functions for UTF-8 encoding comes with <code>gedcom-parse</code>
+(see the <a href="utf8tools.html">documentation</a> for this library).<br>
                                                                          
                           
  <hr width="100%" size="2">                                              
   
-<h2><a name="Support_for_configure.in"></a>Support for configure.in</h2>
-   Programs using the GEDCOM parser library and using autoconf to configure 
- their sources can use the following statements in configure.in (the example 
- is checking for gedcom-parse, version 1.34):<br>
-                                                   
-<blockquote><code>AC_CHECK_LIB(gedcom, gedcom_parse_file,,<br>
-   &nbsp;&nbsp;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;AC_MSG_ERROR(Cannot
- find libgedcom: Please install gedcom-parse))<br>
-   AC_MSG_CHECKING(for libgedcom version)<br>
-   AC_TRY_RUN([<br>
-   #include &lt;stdio.h&gt;<br>
-   #include &lt;stdlib.h&gt;<br>
-   #include &lt;gedcom.h&gt;<br>
-   int<br>
-   main()<br>
-   {<br>
-   if (GEDCOM_PARSE_VERSION &gt;= 1034) exit(0);<br>
-   exit(1);<br>
-   }],<br>
-   ac_gedcom_version_ok='yes',<br>
-   ac_gedcom_version_ok='no',<br>
-   ac_gedcom_version_ok='no')<br>
-   if test "$ac_gedcom_version_ok" = 'yes' ; then<br>
-   &nbsp; AC_MSG_RESULT(ok)<br>
-   else<br>
-   &nbsp; AC_MSG_RESULT(not ok)<br>
-   &nbsp; AC_MSG_ERROR(You need at least version 1.34 of gedcom-parse)<br>
-   fi</code><br>
-                                                   </blockquote>
-    There are three preprocessor symbols defined for version checks in the
- header:<br>
+<h2><a name="Support_for_configure.in"></a>Development support</h2>
+<h3>Macro for configure.in<br>
+</h3>
+There
+is a macro available for use in configure.in for applications that are using
+autoconf to configure their sources. &nbsp;The following macro checks whether
+the Gedcom parser library is available and whether its version is high enough:<br>
+<blockquote><code>AM_PATH_GEDCOM_PARSER([<i>min_version</i>,[<i>action_if_found</i>,[<i>action_if_not_found,</i>[<i>modules</i>]]]])</code><br>
+</blockquote>
+All the arguments are optional and default to 0. &nbsp;E.g. to check for
+version 1.34.2, you would put in configure.in the following statement:<br>
+<blockquote><code>AM_PATH_GEDCOM_PARSER(1.34.2)</code><br>
+</blockquote>Note that version numbers now contain three parts (since version
+0.20.0, which is also the first version in which this macro is available).<br>
+<br>
+The macro also sets the variables <code>GEDCOM_CFLAGS</code> and <code>GEDCOM_LIBS</code> for use in Makefiles. &nbsp;Typically, this would be done as follows in a Makefile.am:<br>
+<blockquote><code>bin_PROGRAMS &nbsp; = myprg</code><br>
+  <code>myprg_SOURCES &nbsp;= myprg.c foo.c bar.c<br>
+INCLUDES &nbsp; &nbsp; &nbsp; = @GEDCOM_CFLAGS@<br>
+LDADD &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;= @GEDCOM_LIBS@</code></blockquote>
+If your program uses some extra modules, they can be passed as the fourth argument
+of the macro, so that the CFLAGS and LIBS are correctly filled in. &nbsp;Currently,
+the only available module is <code>gom</code> (the Gedcom object model). &nbsp;For example:<br>
+<blockquote><code>AM_PATH_GEDCOM_PARSER(0.21.2, , ,gom)</code><br>
+</blockquote>
+To be able to use this macro in the sources of your application, you have three options:<br>
+<ul>
+  <li>Put the file <code>m4/gedcom.m4</code> in your autoconf data directory (i.e. the path given by '<code>aclocal --print-ac-dir</code>', usually <code>/usr/share/aclocal</code>). &nbsp;You can do this automatically by going into the m4 subdirectory and typing '<code>make install-m4</code>'.<br>
+    <br>
+  </li>
+  <li>If you're using autoconf, but not automake, copy the contents of <code>m4/gedcom.m4</code> into the <code>aclocal.m4</code> file in your sources.<br>
+    <br>
+  </li>
+  <li>If you're using automake, copy the contents of <code>m4/gedcom.m4</code> into the <code>acinclude.m4</code> file in your sources.<br>
+  </li>
+</ul>
+<br>
+There are three preprocessor symbols defined for version checks in the
+ header (but their direct use is deprecated: please use the macro above):<br>
                                                       
  <ul>
                                                       <li><code>GEDCOM_PARSE_VERSION_MAJOR</code></li>
@@ -703,7 +703,22 @@ handle needs to be closed (when the program exits):<br>
                                                       </li>
                                                       
  </ul>
-   The last one is equal to <code>(GEDCOM_PARSE_VERSION_MAJOR * 1000) + GEDCOM_PARSE_VERSION_MINOR.</code><br>
+   The last one is equal to <code>(GEDCOM_PARSE_VERSION_MAJOR * 1000) + GEDCOM_PARSE_VERSION_MINOR.</code> As you can see, this covers only the major and minor version, not the patch number, so it is obsolete.<br>
+<br>
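To see why a combined number built only from the major and minor version is not enough, consider the following C sketch. It does not include <code>gedcom.h</code>; the functions <code>sketch_combined_version</code> and <code>sketch_compare_versions</code> are hypothetical helpers that only mirror the formula given above and a plain three-part comparison.

```c
#include <assert.h>

/* Combined version as in the formula above: major * 1000 + minor.
   The patch level is not part of it. */
static int sketch_combined_version(int major, int minor)
{
  assert(major >= 0 && minor >= 0 && minor < 1000);
  return major * 1000 + minor;
}

/* Compare two versions given as { major, minor, patch }.
   Returns a negative value, zero or a positive value when a is
   lower than, equal to or higher than b. */
static int sketch_compare_versions(const int a[3], const int b[3])
{
  for (int i = 0; i < 3; i++) {
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}
```

Versions 1.34.1 and 1.34.2 both give the combined value 1034, so a check against that value cannot tell them apart, whereas the three-part comparison can.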
+<h3>Compilation and linking flags</h3>
+Similar to other libraries, the GEDCOM parser library installs a script <code>gedcom-config</code> to help with compilation and linking flags for programs that don't use autoconf/automake.<br>
+<br>
+To get compilation flags for your program, use (depending on whether you
+only use the callback parser, or also the GEDCOM object model):
+<blockquote><code>gedcom-config --cflags<br>
+gedcom-config --cflags gom</code><br>
+</blockquote>
+Similarly, to get linking flags, use one of the following:
+<blockquote><code>gedcom-config --libs<br>
+gedcom-config --libs gom</code><br>
+</blockquote>
+
+
       
  <hr width="100%" size="2">                                              
                                         
@@ -713,5 +728,14 @@ handle needs to be closed (when the program exits):<br>
  <pre>                    </pre>
                                                                          
                                                          
+<br>
+<br>
+<br>
+<br>
+<br>
+<br>
+<br>
+<br>
+<br>
  <br>
  </body></html>
 \ No newline at end of file