X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;ds=inline;f=doc%2Fparser.html;h=a344644c334e745a4a1067c045f74226240dc373;hb=707b9f33892b2e6690dea7f8a6d0f6422374d696;hp=5637cd9edd128fb14c0af8dbc38f03fff8f0c4af;hpb=7a161f98fa3efba595c96577e3ae7eda15b3dec3;p=gedcom-parse.git

diff --git a/doc/parser.html b/doc/parser.html
index 5637cd9..a344644 100644
--- a/doc/parser.html
+++ b/doc/parser.html
@@ -1,125 +1,175 @@
-
-
-
-
-
-./configure
- make
- make check
-
- If everything goes OK, you'll see that some gedcom files are parsed,
- and that each parse is successful. Note that the used gedcom files
- are made by Heiner
- Eichmann and are an excellent way to test gedcom parsers thoroughly.
testgedcom
program that is generated
- by make test
. testgedcom
generates is
- in UTF-8 format (more on this later), some preparation is necessary to have
- a full view on it. Basically, you need a terminal that understands
- and can display UTF-8 encoded characters, and you need the proper fonts installed
- to display them. I'll give some advice on this here, based on the
- Red Hat 7.1 distribution that I use, with glibc 2.2 and XFree86 4.0.x.
- Any other distribution that has the same or newer versions for these
-components should give the same results.xterm
in its unicode mode (which is supported by
-the xterm
coming with XFree86 4.0.x). UTF-8 capabilities
- have only recently been added to gnome-terminal
, so probably
- that is not in your distribution yet (it certainly isn't in Red Hat 7.1).xterm
in unicode mode is then e.g. (put
- everything on 1 line !):- This first sets theLANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm - -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'
+ make
+ make check
LANG
variable to a locale that
- uses UTF-8, and then starts xterm
with a proper Unicode font.
- Some sample UTF-8 plain text files can be found
- here . Just cat
them on the command line
-and see the result.testgedcom
program
+that is generated by make
. testgedcom
- program print the values that it parses. An example of a command
- line is (in the gedcom directory):
-./testgedcom -dg t/ulhc.ged
+ However, since the output that testgedcom generates
+is in UTF-8 format (more on this later), some preparation is necessary
+to have a full view of it. Basically, you need a terminal that understands
+ and can display UTF-8 encoded characters, and you need the proper fonts
+installed to display them. I'll give some advice on this here,
+based on the Red Hat 7.1 distribution that I use, with glibc 2.2 and XFree86
+4.0.x. Any other distribution that has the same or newer versions
+for these components should give the same results.
+
+ For the first issue, the UTF-8 capable terminal, the safest bet is
+ to use xterm in its unicode mode (which is supported by
+the xterm coming with XFree86 4.0.x). UTF-8 capabilities
+ have only recently been added to gnome-terminal, so probably
+ that is not in your distribution yet (it certainly isn't in Red Hat 7.1).
+
+ For the second issue, you'll need the ISO 10646-1 fonts. These
+ also come with XFree86 4.0.x.
+
+ The way to start xterm in unicode mode is then e.g.
+(put everything on one line!):
+
+LANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm
+ -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'
- The -dg option instructs the parser to show its own debug
- messages (see ./testgedcom -h for the full set of options).
- If everything is OK, you'll see the values from the gedcom file,
-containing a lot of special characters.
+ This first sets the LANG variable to a locale that
+ uses UTF-8, and then starts xterm with a proper Unicode font.
+ Some sample UTF-8 plain text files can be found
+ here. Just cat them on the command line
+ and see the result.
- For the ANSEL test file (t/ansel.ged), you have to set
-the environment variable GCONV_PATH to the ansel
- subdirectory of the gedcom directory:
-export GCONV_PATH=./ansel
- ./testgedcom -dg t/ansel.ged
-
- This is because for the ANSEL character set an extra module is needed
- for the iconv library (more on this later). But again, this should
- show a lot of special characters.
+
+
+
+
+Testing the parser with debugging
+
+ Given the UTF-8 capable terminal, you can now let the testgedcom
+ program print the values that it parses. An example of a command
+ line is (in the top directory):
+
+./testgedcom -dg t/input/ulhc.ged
+
+ The -dg option instructs the parser to show its own debug
+ messages (see ./testgedcom -h for the full set of options).
+ If everything is OK, you'll see the values from the gedcom file,
+containing a lot of special characters.
+
-
-Testing the lexers separately
- The lexers themselves can be tested separately. For the 1-byte
-lexer (i.e. supporting the encodings with 1 byte per character, such as
-ASCII, ANSI and ANSEL), the sequence of commands would be:
-
-make clean
- make test_1byte
-
- This will show all tokens in the t/allged.ged test file. Similar
-tests can be done using make test_hilo and make test_lohi
- (for the unicode lexers).
-
- This concludes the testing setup. Now for some explanations...
-
- -Structure of the parser
- I see the structure of a program using the gedcom parser as follows:
-
- -
-
+ For the ANSEL test file (t/input/ansel.ged), you have to set
+ the environment variable GCONV_PATH to the ansel
+ subdirectory of the top directory:
+
+export GCONV_PATH=./ansel
+ ./testgedcom -dg t/input/ansel.ged
+
+ This is because for the ANSEL character set an extra module is needed
+ for the iconv library (more on this later). But again, this should
+ show a lot of special characters.
- TO BE COMPLETED...
- -
$Id: parser.html,v 1.2 2001/12/01 15:29:00 -verthezp Exp $
- $Name$
-
-
gedcom subdirectory):
+
+make lexer_1byte
+
+ This will generate a lexer program that can process e.g. the t/input/allged.ged
+ test file. Simply cat the file through the lexer on standard input
+and you should get all the tokens in the file. Similar tests can be
+done using make lexer_hilo and make lexer_lohi
+ (for the unicode lexers). In each case, you need to work out
+for yourself which of the test files are appropriate to pass through the lexer.
+
lex/yacc, which means that a module generated by lex
+ takes the input file and determines the tokens in that file (i.e. the smallest
+units, such as numbers, line terminators, GEDCOM tags, characters in GEDCOM
+values...). These tokens are passed to the parser module, which is
+generated by yacc, to check the syntax of the file, i.e. whether the tokens
+appear in a valid sequence.
+
+$Id$
+ $Name$