- ./testgedcom -dg t/ansel.ged<br>
- </code></blockquote>
- This is because for the ANSEL character set an extra module is needed
- for the iconv library (more on this later). But again, this should
- show a lot of special characters.<br>
- <br>
-
- <h2>Testing the lexers separately</h2>
- The lexers themselves can be tested separately. For the 1-byte
- lexer (i.e. supporting the encodings with 1 byte per characters, such
-as ASCII, ANSI and ANSEL), the sequence of commands would be:<br>
-
- <blockquote><code>make clean<br>
- make test_1byte<br>
- </code></blockquote>
- This will show all tokens in the <code>t/allged.ged</code> test file. Similar
- tests can be done using <code>make test_hilo</code> and <code>make test_lohi</code>
- (for the unicode lexers).<br>
- <br>
- This concludes the testing setup. Now for some explanations...<br>
- <br>
-
- <h2>Structure of the parser</h2>
- I see the structure of a program using the gedcom parser as follows:<br>
- <br>
- <img src="images/schema.png" alt="Gedcom parsing scheme">
- <br>
- <br>
- <br>
- TO BE COMPLETED...<br>
-
- <hr width="100%" size="2">
- <pre>$Id$<br>$Name$<br></pre>
+ ./testgedcom -dg t/input/ansel.ged<br>
+ </code></blockquote>
+ This is because for the ANSEL character set an extra module is needed
+ for the iconv library (more on this later). But again, this should
+ show a lot of special characters.<br>
+ <br>
+
+
+ <h3><a name="Testing_the_lexers_separately"></a>Testing the lexers separately</h3>
+
+ The lexers themselves can be tested separately. For the 1-byte
+ lexer (i.e. supporting the encodings with 1 byte per character, such as
+ ASCII, ANSI and ANSEL), the command would be (in the <code>gedcom</code> subdirectory):<br>
+
+ <blockquote><code>make lexer_1byte<br>
+ </code></blockquote>
+ This will generate a lexer program that can process e.g. the <code>t/input/allged.ged</code>
+ test file. Simply pipe the file into the lexer on standard input
+and you should get all the tokens in the file. Similar tests can be
+done using <code>make lexer_hilo</code> and <code>
+make lexer_lohi</code>
+ (for the Unicode lexers). In each case, you need to determine
+yourself which of the test files are appropriate for that lexer.<br>
+ <br>
+ This concludes the testing setup. Now for some explanations...<br>
+ <hr width="100%" size="2"><br>
+
+
+ <h2><a name="Structure_of_the_parser"></a>Structure of the parser</h2>
+ I see the structure of a program using the gedcom parser as follows:<br>
+ <br>
+ <img src="images/schema.png" alt="Gedcom parsing scheme">
+ <br>
+ <br>
+ <br>
+ The parser is based on <code>lex/yacc</code>, which means that a module generated by <code>lex</code>
+ takes the input file and determines the tokens in that file (i.e. the smallest
+units, such as numbers, line terminators, GEDCOM tags, characters in GEDCOM
+values...). These tokens are passed to the parser module, which is
+generated by <code>yacc</code> and checks the syntax of the file, i.e. whether the tokens
+appear in a valid sequence. <br>
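To make the idea of "tokens" concrete, here is a minimal hand-written sketch in C that splits a single GEDCOM line of the form <code>LEVEL [@XREF@] TAG [VALUE]</code> into tokens. It is not the lexer generated by <code>lex</code> for this library: the names <code>token_kind</code>, <code>struct token</code> and <code>tokenize_line</code> are hypothetical, and the sketch assumes a 1-byte encoding and treats the whole value as one token, which the real lexers may not do.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical token kinds, for illustration only. */
enum token_kind { TOK_LEVEL, TOK_XREF, TOK_TAG, TOK_VALUE, TOK_EOL };

struct token {
    enum token_kind kind;
    char text[256];
};

/* Store one token; returns the new token count, or -1 if it does not fit. */
static int emit(struct token *out, int n, int max, enum token_kind kind,
                const char *start, size_t len)
{
    if (n >= max || len >= sizeof out[n].text)
        return -1;
    out[n].kind = kind;
    memcpy(out[n].text, start, len);
    out[n].text[len] = '\0';
    return n + 1;
}

/* Split one line into tokens. Returns the number of tokens written,
   or -1 if the line does not match LEVEL [@XREF@] TAG [VALUE]. */
int tokenize_line(const char *line, struct token *out, int max)
{
    const char *p = line;
    const char *start;
    int n = 0;

    assert(line != NULL && out != NULL);

    /* Level number: one or more digits, followed by a space. */
    start = p;
    while (isdigit((unsigned char)*p))
        p++;
    if (p == start)
        return -1;
    if ((n = emit(out, n, max, TOK_LEVEL, start, (size_t)(p - start))) < 0)
        return -1;
    if (*p != ' ')
        return -1;
    p++;

    /* Optional cross-reference identifier such as @I1@. */
    if (*p == '@') {
        start = p++;
        while (*p && *p != '@' && *p != '\n' && *p != '\r')
            p++;
        if (*p != '@')
            return -1;
        p++;
        if ((n = emit(out, n, max, TOK_XREF, start, (size_t)(p - start))) < 0)
            return -1;
        if (*p != ' ')
            return -1;
        p++;
    }

    /* Tag: letters, digits or underscore. */
    start = p;
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;
    if (p == start)
        return -1;
    if ((n = emit(out, n, max, TOK_TAG, start, (size_t)(p - start))) < 0)
        return -1;

    /* Optional value: everything up to the line terminator. */
    if (*p == ' ') {
        start = ++p;
        while (*p && *p != '\n' && *p != '\r')
            p++;
        if (p > start &&
            (n = emit(out, n, max, TOK_VALUE, start, (size_t)(p - start))) < 0)
            return -1;
    }

    /* Anything left must be the line terminator (or end of string). */
    if (*p && *p != '\n' && *p != '\r')
        return -1;
    return emit(out, n, max, TOK_EOL, p, 0);
}
```

In this sketch, a line such as <code>0 @I1@ INDI</code> yields a level, a cross-reference, a tag and a line terminator token. Deciding whether that sequence is valid at that point in the file is the job of the parser, not of the lexer.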