X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;f=doc%2Fparser.html;h=26586a36e6ec7ec4b6d70a0dca5a9064bc883a0d;hb=8b7ff0dff0815a94ff08a9825d22a0c44490317a;hp=cc1d718525a084029d9595196447def81569d806;hpb=6e1c05d011f3db288b5fc19cb52f494152392cd7;p=gedcom-parse.git

diff --git a/doc/parser.html b/doc/parser.html
index cc1d718..26586a3 100644
--- a/doc/parser.html
+++ b/doc/parser.html
@@ -1,129 +1,132 @@
-
+
- If everything goes OK, you'll see that some gedcom files are parsed, and
- that each parse is successful. Note that the used gedcom files are
-made by Heiner Eichmann
- and are an excellent way to test gedcom parsers thoroughly.
make clean
- make
- make test
-
gedcom-parse program that is generated
- by make test.
gedcom-parse generates is
-in UTF-8 format (more on this later), some preparation is necessary to have
-a full view on it. Basically, you need a terminal that understands and
-can display UTF-8 encoded characters, and you need to proper fonts installed
- to display them. I'll give some advice on this here, based on the Red
- Hat 7.1 distribution that I use, with glibc 2.2 and XFree86 4.0.x. Any
- other distribution that has the same or newer versions for these components
- should give the same results.
xterm in its unicode mode (which is supported by the
- xterm coming with XFree86 4.0.x). UTF-8 capabilities
-have only recently been added to gnome-terminal, so probably
+ The basic testing described above doesn't show anything else than "Parse
+ succeeded", which is nice, but not very interesting. Some more detailed
+ tests are possible, via the gedcom-parse program that is generated
+ by make test.
gedcom-parse generates is
+in UTF-8 format (more on this later), some preparation is necessary to have
+ a full view on it. Basically, you need a terminal that understands
+and can display UTF-8 encoded characters, and you need to proper fonts installed
+ to display them. I'll give some advice on this here, based on the
+Red Hat 7.1 distribution that I use, with glibc 2.2 and XFree86 4.0.x. Any
+ other distribution that has the same or newer versions for these components
+ should give the same results.
xterm in its unicode mode (which is supported by the
+ xterm coming with XFree86 4.0.x). UTF-8 capabilities
+have only recently been added to gnome-terminal, so probably
 that is not in your distribution yet (it certainly isn't in Red Hat 7.1).
xterm in unicode mode is then e.g. (put
+ xterm in unicode mode is then e.g. (put
 everything on 1 line !):
-LANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm
- -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'
-
- This first sets the LANG variable to a locale that uses
- UTF-8, and then starts xterm with a proper Unicode font. Some
- sample UTF-8 plain text files can be found
- here
- . Just cat them on the command line and see the result.
+LANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm
+ -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'
+
+ This first sets the LANG variable to a locale that
+uses UTF-8, and then starts xterm with a proper Unicode font.
+ Some sample UTF-8 plain text files can be found
+ here
+ . Just cat them on the command line and see the result.
gedcom-parse
- program print the values that it parses. An example of a command
-line is (in the gedcom directory):
gedcom-parse
+ program print the values that it parses. An example of a command
+ line is (in the gedcom directory):
./gedcom_parse -dg t/ulhc.ged
-
- The -dg option instructs the parser to show its own debug
-messages (see ./gedcom_parse -h for the full set of options).
- If everything is OK, you'll see the values from the gedcom file, containing
+
+ The -dg option instructs the parser to show its own debug
+messages (see ./gedcom_parse -h for the full set of options).
+ If everything is OK, you'll see the values from the gedcom file, containing
 a lot of special characters.
t/ansel.ged), you have to set the
-environment variable GCONV_PATH to the ansel subdirectory
-of the gedcom directory:
t/ansel.ged), you have to set the
+ environment variable GCONV_PATH to the ansel subdirectory
+ of the gedcom directory:
export GCONV_PATH=./ansel
- ./gedcom_parse -dg t/ansel.ged
-
- This is because for the ANSEL character set an extra module is needed for
- the iconv library (more on this later). But again, this should show
 a lot of special characters.
make clean
- make test_1byte
- cat t/allged.ged | ./test_1byte
-
- This will show all tokens in the t/allged.ged test file. With
- the lexers you have to make sure that you use the proper lexer for each
-test file. The test_1byte test program is OK for
-allged.ged and ansel.ged (the last one again with the
-environment variable set); for the uhl*.ged files you need
-the test_hilo test program; for the ulh*.ged
-files you need the test_lohi program.
t/allged.ged test file.
+ With the lexers you have to make sure that you use the proper lexer
+for each test file. The test_1byte test program is OK
+for allged.ged and ansel.ged (the last one again
+with the environment variable set); for the uhl*.ged files
+you need the test_hilo test program; for the ulh*.ged
+ files you need the test_lohi program.