X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;f=doc%2Fparser.html;h=a344644c334e745a4a1067c045f74226240dc373;hb=f8f253aa29e3c2561d325cb47cc17a727f76266e;hp=893670cf3fb7361de6e2183bee074c6a3115ca90;hpb=37246a4b0ab22fb948d6fb5b9b91917441db26cf;p=gedcom-parse.git
diff --git a/doc/parser.html b/doc/parser.html
index 893670c..a344644 100644
--- a/doc/parser.html
+++ b/doc/parser.html
@@ -89,9 +89,9 @@ the xterm coming with XFree86 4.0.x).
UTF-8 capabilities Given the UTF-8 capable terminal, you can now let the testgedcom program print the values that it parses.  An example of a command
- line is (in the gedcom directory):
+ line is (in the top directory):
-
./testgedcom -dg t/ulhc.ged
+
./testgedcom -dg t/input/ulhc.ged
The -dg option instructs the parser to show its own debug messages  (see ./testgedcom -h for the full set of options).
@@ -100,10 +100,10 @@ containing a lot of special characters.

For the ANSEL test file (t/ansel.ged), you have to set the environment variable GCONV_PATH to the ansel
- subdirectory of the gedcom directory:
+ subdirectory of the top directory:
export GCONV_PATH=./ansel
- ./testgedcom -dg t/ansel.ged
+ ./testgedcom -dg t/input/ansel.ged
This is because for the ANSEL character set an extra module is needed for the iconv library (more on this later).  But again, this should
@@ -119,7 +119,7 @@ containing a lot of special characters.
make lexer_1byte
- This will generate a lexer program that can process e.g. the t/allged.ged
+ This will generate a lexer program that can process e.g. the t/input/allged.ged
test file.  Simply cat the file through the lexer on standard input and you should get all the tokens in the file.  Similar tests can be done using make lexer_hilo and
@@ -149,12 +149,20 @@ For each recognized statement in the GEDCOM file, the parser calls some callback which can be registered by the application to get the information out of the file.

-This basic description ignores the problem of character encoding.  The next section describes what this problem exactly is.
+This basic description ignores the problem of character encoding.

Character encoding

Refer to this page for some introduction on character encoding...
-


-TO BE COMPLETED
+
+GEDCOM defines three standard encodings:
+ These are all supported by the parser, and converted into UTF-8 format.
+ + +