X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;ds=inline;f=doc%2Fparser.html;h=a344644c334e745a4a1067c045f74226240dc373;hb=989bc1d7262c6e767de09804fb530a5ebea55a5d;hp=15d253b856fe9b32ef9adea9b2495b409d423687;hpb=fdcc3ac19960afc4ee198a82cb397afa34ca6a67;p=gedcom-parse.git

diff --git a/doc/parser.html b/doc/parser.html
index 15d253b..a344644 100644
--- a/doc/parser.html
+++ b/doc/parser.html
@@ -1,129 +1,175 @@
- make clean
- make
- make test
-
- If everything goes OK, you'll see that some gedcom files are parsed,
-and that each parse is successful. Note that the gedcom files used
-are made by Heiner
-Eichmann
- and are an excellent way to test gedcom parsers thoroughly.
gedcom-parse program that is generated
- by make test. gedcom-parse generates is
- in UTF-8 format (more on this later), some preparation is necessary to
-view it properly. Basically, you need a terminal that understands
-and can display UTF-8 encoded characters, and you need the proper fonts installed
- to display them. I'll give some advice on this here, based on the
-Red Hat 7.1 distribution that I use, with glibc 2.2 and XFree86 4.0.x. Any
- other distribution that has the same or newer versions for these components
- should give the same results.
xterm in its Unicode mode (which is supported by the
- xterm coming with XFree86 4.0.x). UTF-8 capabilities
- have only recently been added to gnome-terminal, so they are probably
- not in your distribution yet (they certainly aren't in Red Hat 7.1).
xterm in Unicode mode is then e.g. (put
- everything on one line!):
- LANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm
- -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'
-
- This first sets the LANG variable to a locale that
-uses UTF-8, and then starts xterm with a proper Unicode font.
- Some sample UTF-8 plain text files can be found
- here. Just cat them on the command line and see the result.
gedcom-parse
- program print the values that it parses. An example of a command
- line is (in the gedcom directory):
- ./gedcom_parse -dg t/ulhc.ged
-
- The -dg option instructs the parser to show its own debug
- messages (see ./gedcom_parse -h for the full set of
-options). If everything is OK, you'll see the values from the gedcom
-file, containing a lot of special characters.
t/ansel.ged), you have to set the
- environment variable GCONV_PATH to the ansel subdirectory
- of the gedcom directory:
+ ./configure
+ make
+ make check
+
+ If everything goes OK, you'll see that some gedcom files are parsed,
+ and that each parse is successful. Note that some of the gedcom files used
+ are made by Heiner
+ Eichmann and are an excellent way to test gedcom parsers thoroughly.
testgedcom program
+that is generated by make. testgedcom generates
+is in UTF-8 format (more on this later), some preparation is necessary
+to view it properly. Basically, you need a terminal that understands
+ and can display UTF-8 encoded characters, and you need the proper fonts
+installed to display them. I'll give some advice on this here,
+based on the Red Hat 7.1 distribution that I use, with glibc 2.2 and XFree86
+4.0.x. Any other distribution that has the same or newer versions
+for these components should give the same results.
xterm in its Unicode mode (which is supported by
+the xterm coming with XFree86 4.0.x). UTF-8 capabilities
+ have only recently been added to gnome-terminal, so they are probably
+ not in your distribution yet (they certainly aren't in Red Hat 7.1).
xterm in Unicode mode is then e.g.
+(put everything on one line!):
+ LANG=en_GB.UTF-8 xterm -bg 'black' -fg 'DarkGrey' -cm
+ -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1'
+
+ This first sets the LANG variable to a locale that
+ uses UTF-8, and then starts xterm with a proper Unicode font.
+ Some sample UTF-8 plain text files can be found
+ here. Just cat them on the command line
+ and see the result.
testgedcom
+ program print the values that it parses. An example of a command
+ line is (in the top directory):
+ ./testgedcom -dg t/input/ulhc.ged
+
+ The -dg option instructs the parser to show its own debug
+ messages (see ./testgedcom -h for the full set of options).
+ If everything is OK, you'll see the values from the gedcom file,
+containing a lot of special characters.
t/ansel.ged), you have to set
+ the environment variable GCONV_PATH to the ansel
+ subdirectory of the top directory:
export GCONV_PATH=./ansel
- ./gedcom_parse -dg t/ansel.ged
-
- This is because for the ANSEL character set an extra module is needed
-for the iconv library (more on this later). But again, this should
-show a lot of special characters.
- make clean
- make test_1byte
-
-This will show all tokens in the t/allged.ged test file. Similar
-tests can be done using make test_hilo and make test_lohi
- (for the unicode lexers).
gedcom subdirectory):
+ make lexer_1byte
+
+ This will generate a lexer program that can process e.g. the t/input/allged.ged
+ test file. Simply cat the file through the lexer on standard input
+and you should get all the tokens in the file. Similar tests can be
+done using make lexer_hilo and
+make lexer_lohi
+ (for the unicode lexers). In each of the cases you need to know
+yourself which of the test files are appropriate to pass through the lexer.
lex/yacc, which means that a module generated by lex
+ takes the input file and determines the tokens in that file (i.e. the smallest
+units, such as numbers, line terminators, GEDCOM tags, characters in GEDCOM
+values...). These tokens are passed to the parser module, which is
+generated by yacc, to parse the syntax of the file, i.e. whether the tokens
+appear in a sequence that is valid.
$Id$
+$Name$