X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;f=doc%2Fparser.html;h=15d253b856fe9b32ef9adea9b2495b409d423687;hb=32e04eb26dbc6c16a45bd00bb7cb741ad2363919;hp=cc1d718525a084029d9595196447def81569d806;hpb=6e1c05d011f3db288b5fc19cb52f494152392cd7;p=gedcom-parse.git
diff --git a/doc/parser.html b/doc/parser.html
index cc1d718..15d253b 100644
--- a/doc/parser.html
+++ b/doc/parser.html
@@ -1,129 +1,129 @@
- +- If everything goes OK, you'll see that some gedcom files are parsed, and - that each parse is successful. Note that the used gedcom files are -made by Heiner Eichmann - and are an excellent way to test gedcom parsers thoroughly.make clean
- make
- make test
-
gedcom-parse
program that is generated
- by make test
. gedcom-parse
generates is
-in UTF-8 format (more on this later), some preparation is necessary to have
-a full view on it. Basically, you need a terminal that understands and
-can display UTF-8 encoded characters, and you need to proper fonts installed
- to display them. I'll give some advice on this here, based on the Red
- Hat 7.1 distribution that I use, with glibc 2.2 and XFree86 4.0.x. Any
- other distribution that has the same or newer versions for these components
- should give the same results.xterm
in its unicode mode (which is supported by the
- xterm
coming with XFree86 4.0.x). UTF-8 capabilities
-have only recently been added to gnome-terminal
, so probably
-that is not in your distribution yet (it certainly isn't in Red Hat 7.1).xterm
in unicode mode is then e.g. (put
-everything on 1 line !):LANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm - -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'
+- This first sets themake clean
+ make
+ make test
LANG
variable to a locale that uses - UTF-8, and then startsxterm
with a proper Unicode font. Some - sample UTF-8 plain text files can be found - here - . Justcat
them on the command line and see the result.
+ If everything goes OK, you'll see that some gedcom files are parsed, +and that each parse is successful. Note that the used gedcom files +are made by Heiner +Eichmann + and are an excellent way to test gedcom parsers thoroughly.
- -Testing the parser with debugging
- Given the UTF-8 capable terminal, you can now let thegedcom-parse
- program print the values that it parses. An example of a command -line is (in thegedcom
directory):
- -+This will show all tokens in the./gedcom_parse -dg t/ulhc.ged
+ +Preparing for further testing
+ The basic testing described above doesn't show anything else than "Parse + succeeded", which is nice, but not very interesting. Some more detailed + tests are possible, via thegedcom-parse
program that is generated + bymake test
.
+
+ However, since the output thatgedcom-parse
generates is + in UTF-8 format (more on this later), some preparation is necessary to +have a full view on it. Basically, you need a terminal that understands +and can display UTF-8 encoded characters, and you need to proper fonts installed + to display them. I'll give some advice on this here, based on the +Red Hat 7.1 distribution that I use, with glibc 2.2 and XFree86 4.0.x. Any + other distribution that has the same or newer versions for these components + should give the same results.
+
+ For the first issue, the UTF-8 capable terminal, the safest bet is to + usexterm
in its unicode mode (which is supported by the +xterm
coming with XFree86 4.0.x). UTF-8 capabilities + have only recently been added tognome-terminal
, so probably + that is not in your distribution yet (it certainly isn't in Red Hat 7.1).
+
+ For the second issue, you'll need the ISO 10646-1 fonts. These +come also with XFree86 4.0.x.
+
+ The way to startxterm
in unicode mode is then e.g. (put + everything on 1 line !):
+ +- TheLANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm + -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'
-dg
option instructs the parser to show its own debug -messages (see./gedcom_parse -h
for the full set of options). - If everything is OK, you'll see the values from the gedcom file, containing -a lot of special characters.
+ This first sets theLANG
variable to a locale that +uses UTF-8, and then startsxterm
with a proper Unicode font. + Some sample UTF-8 plain text files can be found + here + . Justcat
them on the command line and see the result.
- For the ANSEL test file (t/ansel.ged
), you have to set the -environment variableGCONV_PATH
to theansel
subdirectory -of the gedcom directory:
- -- This is because for the ANSEL character set an extra module is needed for - the iconv library (more on this later). But again, this should show - a lot of special characters.export GCONV_PATH=./ansel
- ./gedcom_parse -dg t/ansel.ged
-
+ +Testing the parser with debugging
+ Given the UTF-8 capable terminal, you can now let thegedcom-parse
+ program print the values that it parses. An example of a command + line is (in thegedcom
directory):
+ ++ The./gedcom_parse -dg t/ulhc.ged
+-dg
option instructs the parser to show its own debug + messages (see./gedcom_parse -h
for the full set of +options). If everything is OK, you'll see the values from the gedcom +file, containing a lot of special characters.
- + For the ANSEL test file (t/ansel.ged
), you have to set the + environment variableGCONV_PATH
to theansel
subdirectory + of the gedcom directory:
+ ++ This is because for the ANSEL character set an extra module is needed +for the iconv library (more on this later). But again, this should +show a lot of special characters.export GCONV_PATH=./ansel
+ ./gedcom_parse -dg t/ansel.ged
+
+
+Testing the lexers separately
- The lexers themselves can be tested separately. For the 1-byte lexer - (i.e. supporting the encodings with 1 byte per characters, such as ASCII, - ANSI and ANSEL), the sequence of commands would be:
- + The lexers themselves can be tested separately. For the 1-byte +lexer (i.e. supporting the encodings with 1 byte per characters, such as +ASCII, ANSI and ANSEL), the sequence of commands would be:
+- This will show all tokens in themake clean
- make test_1byte
- cat t/allged.ged | ./test_1byte
-t/allged.ged
test file. With - the lexers you have to make sure that you use the proper lexer for each -test file. Thetest_1byte
test program is OK for-allged.ged
andansel.ged
(the last one again with the -environment variable set); for theuhl*.ged
files you need -thetest_hilo
test program; for theulh*.ged
-files you need thetest_lohi
program.
-
- This concludes the testing setup. Now for some explanations...
-
- + make test_1byte
+t/allged.ged
test file. Similar +tests can be done usingmake test_hilo
andmake test_lohi
+ (for the unicode lexers).
+
+ This concludes the testing setup. Now for some explanations...
+
+Structure of the parser
- I see the structure of a program using the gedcom parser as follows:
-
- -
-
+ I see the structure of a program using the gedcom parser as follows:
+
+ +
+
+
+ TO BE COMPLETED...
+ +
$Id: parser.html,v 1.2 2001/12/01 15:29:00 +verthezp Exp $
+ $Name$
- TO BE COMPLETED...
-
-