
Gedcom parser in Genes

The intention of this page is to provide some explanation of the gedcom parser, to aid development on and with it. Currently, the parser works, but some parts are still missing, notably the interface towards applications. First, some practical issues of testing with the parser will be explained.

Basic testing

The parser is located in the "gedcom" subdirectory of the Genes source code. You should be able to perform a basic test using the commands:

make clean
make
make test

If everything goes OK, you'll see that some gedcom files are parsed, and that each parse is successful. Note that the gedcom files used here were made by Heiner Eichmann and are an excellent way to test gedcom parsers thoroughly.

Preparing for further testing

The basic testing described above doesn't show anything other than "Parse succeeded", which is nice, but not very interesting. More detailed tests are possible via the gedcom_parse program that is generated by make test.

However, since the output that gedcom_parse generates is in UTF-8 format (more on this later), some preparation is necessary to see it in full. Basically, you need a terminal that understands and can display UTF-8 encoded characters, and you need the proper fonts installed to display them. I'll give some advice on this here, based on the Red Hat 7.1 distribution that I use, with glibc 2.2 and XFree86 4.0.x. Any other distribution that has the same or newer versions of these components should give the same results.
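A plain terminal is not enough because of how UTF-8 stores characters. Only the ASCII characters take a single byte. Every other character is written as a sequence of two to four bytes, which a non-UTF-8 terminal displays as several unrelated characters. The C sketch below implements the standard UTF-8 encoding (RFC 3629) to make this concrete. The function name utf8_encode is a hypothetical illustration and is not part of the gedcom parser.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * `out` must have room for 4 bytes. Returns the number of bytes
 * written, or 0 if `cp` is not a valid Unicode scalar value. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 0xxxxxxx: plain ASCII, 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* UTF-16 surrogates: not characters */
        return 0;
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond U+10FFFF */
}
```

For example, utf8_encode(0xE9, buf) (the character é) writes the two bytes 0xC3 0xA9. A Latin-1 terminal would show those bytes as "Ã©" instead of one accented letter.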
For the first issue, the UTF-8 capable terminal, the safest bet is to use xterm in its Unicode mode (which is supported by the xterm that comes with XFree86 4.0.x). UTF-8 capabilities have only recently been added to gnome-terminal, so that is probably not in your distribution yet (it certainly isn't in Red Hat 7.1).

For the second issue, you'll need the ISO 10646-1 fonts. These also come with XFree86 4.0.x.

The way to start xterm in Unicode mode is then, for example (put everything on one line!):

LANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'

This first sets the LANG variable to a locale that uses UTF-8, and then starts xterm with a proper Unicode font. Some sample UTF-8 plain text files can be found here. Just cat them on the command line and see the result.
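You can also check from a program whether the locale selected through LANG really uses UTF-8. The following is a minimal sketch built on the POSIX setlocale and nl_langinfo(CODESET) calls. The name locale_is_utf8 is hypothetical. The sketch assumes the codeset is reported as "UTF-8", as glibc does. It only checks the locale; it cannot tell whether the terminal font can actually display the characters.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: return 1 if the locale selected by the
 * environment (LC_ALL, LC_CTYPE or LANG) uses the UTF-8 codeset. */
int locale_is_utf8(void)
{
    /* An empty string asks setlocale() to use the environment. */
    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;                       /* requested locale is not installed */

    /* CODESET names the character encoding of the current locale,
     * e.g. "UTF-8" for en_GB.UTF-8 under glibc. */
    const char *codeset = nl_langinfo(CODESET);
    return strcmp(codeset, "UTF-8") == 0;
}
```

Called from an xterm started as above, this should return 1. With LANG unset (the "C" locale), glibc reports the codeset as ANSI_X3.4-1968 and the function returns 0.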

Testing the parser with debugging

Given a UTF-8 capable terminal, you can now let the gedcom_parse program print the values that it parses. An example of a command line is (in the gedcom directory):

./gedcom_parse -dg t/ulhc.ged

The -dg option instructs the parser to show its own debug messages (see ./gedcom_parse -h for the full set of options). If everything is OK, you'll see the values from the gedcom file, containing a lot of special characters.

For the ANSEL test file (t/ansel.ged), you have to set the environment variable GCONV_PATH to the ansel subdirectory of the gedcom directory:

export GCONV_PATH=./ansel
./gedcom_parse -dg t/ansel.ged

This is because the ANSEL character set needs an extra module for the iconv library (more on this later). But again, this should show a lot of special characters.
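The iconv dependency presumably comes from converting the input encoding to the UTF-8 output. The C sketch below shows that kind of conversion with the POSIX iconv_open, iconv and iconv_close calls; it is not the parser's actual code. It converts from ISO-8859-1, which glibc supports without extra modules. This page does not say which charset name the extra ANSEL module registers, so the sketch does not use it. latin1_to_utf8 is a hypothetical helper name.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert `len` bytes of ISO-8859-1 text to UTF-8.
 * Returns the number of bytes written to `out`, or (size_t)-1 on error. */
size_t latin1_to_utf8(const char *in, size_t len, char *out, size_t outsize)
{
    /* Note the argument order: target encoding first, then source. */
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return (size_t)-1;              /* conversion not supported */

    char *inp = (char *)in;             /* iconv() advances these pointers */
    char *outp = out;
    size_t inleft = len;
    size_t outleft = outsize;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;              /* errno: E2BIG, EILSEQ or EINVAL */

    /* UTF-8 is stateless, so no final flush call is needed here. */
    return outsize - outleft;
}
```

If the output buffer is too small, iconv fails with errno set to E2BIG. A real converter would then grow the buffer and continue from where the input pointer stopped.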

Testing the lexers separately

The lexers themselves can be tested separately. For the 1-byte lexer (i.e. the one supporting encodings with 1 byte per character, such as ASCII, ANSI and ANSEL), the sequence of commands would be:

make clean
make test_1byte
cat t/allged.ged | ./test_1byte

This will show all tokens in the t/allged.ged test file. With the lexers you have to make sure that you use the proper lexer for each test file. The test_1byte test program is OK for allged.ged and ansel.ged (the latter again with the environment variable set); for the uhl*.ged files you need the test_hilo test program; for the ulh*.ged files you need the test_lohi program.
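The right lexer depends on how the bytes are laid out. A gedcom file should begin with the record 0 HEAD. In the 1-byte encodings its leading digit '0' is a single byte. In the 2-byte Unicode files the same digit is stored as 0x00 0x30 (high byte first, the uhl*.ged files) or 0x30 0x00 (low byte first, the ulh*.ged files). The hypothetical helper below is not part of the parser. It sketches how the first bytes of a file could pick between test_1byte, test_hilo and test_lohi, following the naming rule above, and it also accepts a byte order mark.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the gedcom parser): guess which test
 * lexer suits a gedcom file from its first bytes. A gedcom file should
 * start with the record "0 HEAD", so the first character is '0'. */
const char *pick_lexer(const unsigned char *buf, size_t n)
{
    if (n < 2)
        return "test_1byte";

    if (buf[0] == 0xFE && buf[1] == 0xFF)   /* byte order mark, high byte first */
        return "test_hilo";
    if (buf[0] == 0xFF && buf[1] == 0xFE)   /* byte order mark, low byte first */
        return "test_lohi";
    if (buf[0] == 0x00 && buf[1] == '0')    /* '0' stored as 0x00 0x30 */
        return "test_hilo";
    if (buf[0] == '0' && buf[1] == 0x00)    /* '0' stored as 0x30 0x00 */
        return "test_lohi";

    return "test_1byte";                    /* assume ASCII, ANSI or ANSEL */
}
```

For example, passing the first bytes of t/ulhc.ged to pick_lexer should yield "test_lohi", in line with the rule for the ulh*.ged files.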
This concludes the testing setup. Now for some explanations...

Structure of the parser

I see the structure of a program using the gedcom parser as follows:

[Figure: Gedcom parsing scheme]

TO BE COMPLETED...

$Id$
$Name$