<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
<title>The Gedcom parser library</title>
<h1>The Gedcom parser library</h1>
<div align="left">This page explains the gedcom parser, to aid development
on it and with it. First, some practical issues of testing the parser
are explained.<br>
You should be able to perform a basic test using the commands:<br>
<blockquote><code>./configure<br>
If everything goes OK, you'll see that some gedcom files are parsed,
and that each parse is successful. Note that the gedcom files used
were made by <a href="http://heiner-eichmann.de/gedcom/gedcom.htm">Heiner
Eichmann</a> and are an excellent way to test gedcom parsers thoroughly.<br>
<h2>Preparing for further testing</h2>
The basic testing described above doesn't show anything other than "Parse
succeeded", which is nice, but not very interesting. More detailed
tests are possible via the <code>testgedcom</code> program that is generated
by <code>make test</code>.<br>
However, since the output that <code>testgedcom</code> generates is
in UTF-8 format (more on this later), some preparation is necessary to
view it properly. Basically, you need a terminal that understands
and can display UTF-8 encoded characters, and you need the proper fonts installed
to display them. I'll give some advice on this here, based on the
Red Hat 7.1 distribution that I use, with glibc 2.2 and XFree86 4.0.x.
Any other distribution that has the same or newer versions of these
components should give the same results.<br>
For the first issue, the UTF-8 capable terminal, the safest bet is
to use <code>xterm</code> in its Unicode mode (which is supported by
the <code>xterm</code> that comes with XFree86 4.0.x). UTF-8 capabilities
have only recently been added to <code>gnome-terminal</code>, so that
is probably not in your distribution yet (it certainly isn't in Red Hat 7.1).<br>
For the second issue, you'll need the ISO 10646-1 fonts. These
also come with XFree86 4.0.x.<br>
The way to start <code>xterm</code> in Unicode mode is then, for example (put
everything on one line!):<br>
<blockquote><code>LANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm
-fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'</code><br>
This first sets the <code>LANG</code> variable to a locale that
uses UTF-8, and then starts <code>xterm</code> with a proper Unicode font.
Some sample UTF-8 plain text files can be found <a href="http://www.cl.cam.ac.uk/%7Emgk25/ucs/examples">
here</a>. Just <code>cat</code> them on the command line
and see the result.<br>
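A quick way to confirm that a shell started inside the new <code>xterm</code> really runs in a UTF-8 locale is to query the character map of the locale. The following is only a minimal sketch: it assumes a POSIX shell and the standard <code>locale</code> utility, and the function name <code>is_utf8_locale</code> is purely illustrative (it is not part of the gedcom sources).<br>

```shell
# Succeed when the active locale uses UTF-8 as its character map.
# is_utf8_locale is an illustrative helper name, not part of the library.
is_utf8_locale() {
    # `locale charmap` prints the codeset of the current locale, e.g. UTF-8.
    [ "$(locale charmap 2>/dev/null)" = "UTF-8" ]
}

if is_utf8_locale; then
    echo "UTF-8 locale active"
else
    echo "not a UTF-8 locale; check LANG and LC_ALL"
fi
```

This only checks the locale setting; whether the characters are actually drawn correctly still depends on the font that <code>xterm</code> was started with.<br>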
<h2>Testing the parser with debugging</h2>
Given the UTF-8 capable terminal, you can now let the <code>testgedcom</code>
program print the values that it parses. An example of a command
line is (in the <code>gedcom</code> directory):<br>
<blockquote><code>./testgedcom -dg t/ulhc.ged</code><br>
The <code>-dg</code> option instructs the parser to show its own debug
messages (see <code>./testgedcom -h</code> for the full set of options).
If everything is OK, you'll see the values from the gedcom file,
containing a lot of special characters.<br>
For the ANSEL test file (<code>t/ansel.ged</code>), you have to set
the environment variable <code>GCONV_PATH</code> to the <code>ansel</code>
subdirectory of the gedcom directory:<br>
<blockquote><code>export GCONV_PATH=./ansel<br>
./testgedcom -dg t/ansel.ged<br>
This is because the ANSEL character set needs an extra module
for the iconv library (more on this later). Again, this should
show a lot of special characters.<br>
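To run the same check over several test files in one go, a small shell loop can help. This is only a sketch under stated assumptions: it is meant to be run from the <code>gedcom</code> directory, it assumes that <code>testgedcom</code> returns a non-zero exit status when a parse fails (this page does not confirm that), and the <code>TESTGEDCOM</code> variable and the <code>run_all</code> function are illustrative names. <code>GCONV_PATH</code> is set for each single command only, so the rest of the shell session is left unchanged.<br>

```shell
# Path of the test program; override TESTGEDCOM to point somewhere else.
# (TESTGEDCOM and run_all are illustrative names, not part of the library.)
TESTGEDCOM=${TESTGEDCOM:-./testgedcom}

# Run the test program with parser debugging on each file given.
# ASSUMPTION: the program exits non-zero when a parse fails.
run_all() {
    failed=0
    for f in "$@"; do
        [ -e "$f" ] || continue      # skip a glob pattern that matched nothing
        echo "=== $f"
        # GCONV_PATH applies to this one command only, so the ANSEL
        # module in ./ansel can be found for files that need it.
        if GCONV_PATH=./ansel "$TESTGEDCOM" -dg "$f"; then
            echo "ok:     $f"
        else
            echo "FAILED: $f"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (from the gedcom directory):
#   run_all t/ulhc.ged t/ansel.ged
```

The function returns a non-zero status if any file failed, so it can also be used from a script or a makefile rule.<br>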
<h2>Testing the lexers separately</h2>
The lexers themselves can be tested separately. For the 1-byte
lexer (i.e. supporting the encodings with 1 byte per character, such as
ASCII, ANSI and ANSEL), the sequence of commands would be:<br>
<blockquote><code>make clean<br>
This will show all tokens in the <code>t/allged.ged</code> test file. Similar
tests can be done using <code>make test_hilo</code> and <code>make test_lohi</code>
(for the Unicode lexers).<br>
This concludes the testing setup. Now for some explanations...<br>
<h2>Structure of the parser</h2>
I see the structure of a program using the gedcom parser as follows:<br>
<img src="images/schema.png" alt="Gedcom parsing scheme">
TO BE COMPLETED...<br>
<hr width="100%" size="2">$Id: parser.html,v 1.2 2001/12/01 15:29:00