+ <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"><title>The Gedcom parser library internals</title></head><body text="#000000" bgcolor="#ffffff" link="#000099" vlink="#990099" alink="#000099">
+
+<div align="center">
+<h1>The Gedcom parser library internals</h1>
+
+<div align="left">The intention of this page is to provide some explanation
+ of the gedcom parser, to aid development on and with it. First,
+some practical issues of testing with the parser will be explained.<br>
+<br>
+<h2>Index</h2>
+<ul>
+ <li><a href="#Testing">Testing</a></li>
+ <ul>
+ <li><a href="#Basic_testing">Basic testing</a></li>
+ <li><a href="#Preparing_for_further_testing">Preparing for further testing</a></li>
+ <li><a href="#Testing_the_parser_with_debugging">Testing the parser with debugging</a></li>
+ <li><a href="#Testing_the_lexers_separately">Testing the lexers separately</a><br>
+ </li>
+ </ul>
+ <li><a href="#Structure_of_the_parser">Structure of the parser</a></li>
+ <ul>
+ <li><a href="#Character_encoding">Character encoding</a><br>
+ </li>
+ </ul>
+</ul>
+<br>
+<hr width="100%" size="2">
+<h2><a name="Testing"></a>Testing<br>
+</h2>
+
+
+<h3><a name="Basic_testing"></a>Basic testing<br>
+
+ </h3>
+
+ You should be able to perform a basic test using the commands:<br>
+
+<blockquote><code>./configure<br>
+ make<br>
+ make check</code><br>
+ </blockquote>
+ If everything goes OK, you'll see that some gedcom files are parsed,
+ and that each parse is successful. Note that some of the used gedcom files
+ are made by <a href="http://heiner-eichmann.de/gedcom/gedcom.htm">Heiner
+ Eichmann</a> and are an excellent way to test gedcom parsers thoroughly.<br>
+ <br>
+
+
+ <h3><a name="Preparing_for_further_testing"></a>Preparing for further testing</h3>
+Some
+more detailed tests are possible, via the <code>testgedcom</code> program
+that is generated by <code>make</code>. <br>
+ <br>
+ However, since the output that <code>testgedcom</code> generates
+is in UTF-8 format (more on this later), some preparation is necessary
+to have a full view on it. Basically, you need a terminal that understands
+ and can display UTF-8 encoded characters, and you need to proper fonts
+installed to display them. I'll give some advice on this here,
+based on the Red Hat 7.1 distribution that I use, with glibc 2.2 and XFree86
+4.0.x. Any other distribution that has the same or newer versions
+for these components should give the same results.<br>
+ <br>
+ For the first issue, the UTF-8 capable terminal, the safest bet is
+ to use <code>xterm</code> in its unicode mode (which is supported by
+the <code> xterm</code> coming with XFree86 4.0.x). UTF-8 capabilities
+ have only recently been added to <code>gnome-terminal</code>, so probably
+ that is not in your distribution yet (it certainly isn't in Red Hat 7.1).<br>
+ <br>
+ For the second issue, you'll need the ISO 10646-1 fonts. These
+ come also with XFree86 4.0.x.<br>
+ <br>
+ The way to start <code>xterm</code> in unicode mode is then e.g.
+(put everything on 1 line !):<br>
+
+ <blockquote><code>LANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm
+ -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'</code><br>
+ </blockquote>
+ This first sets the <code>LANG</code> variable to a locale that
+ uses UTF-8, and then starts <code>xterm</code> with a proper Unicode font.
+ Some sample UTF-8 plain text files can be found <a href="http://www.cl.cam.ac.uk/%7Emgk25/ucs/examples">
+ here</a> . Just <code>cat</code> them on the command line
+ and see the result.<br>
+ <br>
+
+
+ <h3><a name="Testing_the_parser_with_debugging"></a>Testing the parser with debugging</h3>
+
+ Given the UTF-8 capable terminal, you can now let the <code>testgedcom</code>
+ program print the values that it parses. An example of a command
+ line is (in the top <code></code>directory):<br>
+
+ <blockquote><code>./testgedcom -dg t/input/ulhc.ged</code><br>
+ </blockquote>
+ The <code>-dg</code> option instructs the parser to show its own debug
+ messages (see <code>./testgedcom -h</code> for the full set of options).
+ If everything is OK, you'll see the values from the gedcom file,
+containing a lot of special characters.<br>
+ <br>
+ For the ANSEL test file (<code>t/ansel.ged</code>), you have to set
+ the environment variable <code>GCONV_PATH</code> to the <code>ansel</code>
+ subdirectory of the top directory:<br>
+