From: Peter Verthez
Date: Wed, 2 Jan 2002 20:09:28 +0000 (+0000)
Subject: Pass the parsed tag value (integer format) together with the string value
X-Git-Url: https://git.dlugolecki.net.pl/?a=commitdiff_plain;h=8c92a223c34fbd674f26520fb990c64a7b2f9147;p=gedcom-parse.git

Pass the parsed tag value (integer format) together with the string value
in the callbacks.  In an installed system, symbolic names are defined via
gedcom-tags.h (included in gedcom.h).
---
diff --git a/Makefile.am b/Makefile.am
index 6b28425..fc28a10 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -1,7 +1,7 @@
 ## Process this file with automake to produce Makefile.in
 # $Id$
 # $Name$
-SUBDIRS = intl ansel gedcom . t doc include po
+SUBDIRS = intl ansel gedcom include . t doc po
 INCLUDES = -I $(srcdir)/include
 pkgdata_DATA = gedcom.enc
diff --git a/doc/usage.html b/doc/usage.html
index 76a11bf..2843d7c 100644
--- a/doc/usage.html
+++ b/doc/usage.html
@@ -2,415 +2,434 @@
 Using the GEDCOM parser library
-
+
-
+

Using the GEDCOM parser library

-
- +
+

Index

- + - -
-

Overview
-

- The GEDCOM parser library is built as a callback-based parser (comparable - to the SAX interface of XML).  It comes with:
+
+

Overview
+

+ The GEDCOM parser library is built as a callback-based parser (comparable + to the SAX interface of XML).  It comes with:
+ - Next to these, there is also a data directory in $PREFIX/share/gedcom-parse - that contains some additional stuff, but which is not immediately important - at first.  I'll leave the description of the data directory for later.
-
- The very simplest call of the gedcom parser is simply the following piece - of code (include of the gedcom header is assumed, as everywhere in this manual):
- -
int result;
- ...
- result = gedcom_parse_file("myfamily.ged");
-
- Although this will not provide much information, one thing it does is -parse the entire file and return the result.  The function returns -0 on success and 1 on failure.  No other information is available using -this function only.
+ Next to these, there is also a data directory in $PREFIX/share/gedcom-parse that contains some additional material, which is not important at first.  I'll leave the description of the data directory for later.

- The next sections will refine this to be able to have meaningful errors -and the actual data that is in the file.
+ The simplest call of the gedcom parser is the following piece of code (inclusion of the gedcom header is assumed, as everywhere in this manual):
-
+
int result;
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+
+ Although this will not provide much information, it does parse the entire file and return the result.  The function returns 0 on success and 1 on failure.  No other information is available from this function alone.
+
+ The next sections refine this to obtain meaningful error messages and the actual data in the file.
+ +

Error handling

- Since this is a relatively simple topic, it is discussed before the actual + Since this is a relatively simple topic, it is discussed before the actual callback mechanism, although it also uses a callback...
-
- The library can be used in several different circumstances, both terminal-based - as GUI-based.  Therefore, it leaves the actual display of the error -message up to the application.  For this, the application needs to register -a callback before parsing the GEDCOM file, which will be called by the library +
+ The library can be used in several different circumstances, both terminal-based and GUI-based.  Therefore, it leaves the actual display of error messages up to the application.  For this, the application needs to register a callback before parsing the GEDCOM file, which will be called by the library
-
- A typical piece of code would be:
- -
void my_message_handler (Gedcom_msg_type type, +
+ A typical piece of code would be:
+ +
void my_message_handler (Gedcom_msg_type type, char *msg)
- {
-   ...
- }
- ...
- gedcom_set_message_handler(my_message_handler);
- ...
- result = gedcom_parse_file("myfamily.ged");

-
- In the above piece of code, my_message_handler is the callback + {
+   ...
+ }
+ ...
+ gedcom_set_message_handler(my_message_handler);
+ ...
+ result = gedcom_parse_file("myfamily.ged");

+
+ In the above piece of code, my_message_handler is the callback that will be called for errors (type=ERROR), warnings ( - type=WARNING) and messages (type=MESSAGE).  The -callback must have the signature as in the example.  For errors, the + type=WARNING) and messages (type=MESSAGE).  The +callback must have the signature as in the example.  For errors, the msg passed to the callback will have the format:
- +
Error on line <lineno>: <actual_message>
-
- Note that the entire string will be properly internationalized, and encoded - in UTF-8 (see "Why UTF-8?"  LINK TBD).  Also, no newline - is appended, so that the application program can use it in any way it wants. -  Warnings are similar, but use "Warning" instead of "Error".  Messages + + Note that the entire string will be properly internationalized, and encoded + in UTF-8 (see "Why UTF-8?"  LINK TBD).  Also, no newline + is appended, so that the application program can use it in any way it wants. +  Warnings are similar, but use "Warning" instead of "Error".  Messages are plain text, without any prefix.
-
- With this in place, the resulting code will already show errors and warnings +
+ With this in place, the resulting code will already show errors and warnings produced by the parser, e.g. on the terminal if a simple printf - is used in the message handler.
- -
+ is used in the message handler.
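As an illustration, a minimal terminal-based handler might look as follows.  The enum at the top is only a stand-in for the library's Gedcom_msg_type, assumed to provide the ERROR, WARNING and MESSAGE values mentioned above, so that the fragment compiles without gedcom.h; the error counter is a hypothetical addition, not part of the library.

```c
#include <stdio.h>

/* Stand-in so this sketch compiles without the library; the real
   Gedcom_msg_type is defined in gedcom.h. */
typedef enum { ERROR, WARNING, MESSAGE } Gedcom_msg_type;

/* Hypothetical: lets the application report how many errors occurred. */
static int error_count = 0;

void my_message_handler(Gedcom_msg_type type, char *msg)
{
  if (type == ERROR)
    error_count++;
  /* The library does not append a newline, so the handler adds one.
     Errors and warnings go to stderr, plain messages to stdout. */
  fprintf(type == MESSAGE ? stdout : stderr, "%s\n", msg);
}
```

The handler would then be registered with gedcom_set_message_handler(my_message_handler) before calling gedcom_parse_file, as in the example above.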
+ +

Data callback mechanism

- The most important use of the parser is of course to get the data out of - the GEDCOM file.  As already mentioned, the parser uses a callback + The most important use of the parser is of course to get the data out +of the GEDCOM file.  As already mentioned, the parser uses a callback mechanism for that.  In fact, the mechanism involves two levels.
-
- The primary level is that each of the sections in a GEDCOM file is notified - to the application code via a "start element" callback and an "end element" - callback (much like in a SAX interface for XML), i.e. when a line containing - a certain tag is parsed, the "start element" callback is called for that -tag, and when all its subordinate lines with their tags have been processed, -the "end element" callback is called for the original tag.  Since GEDCOM - is hierarchical, this results in properly nested calls to appropriate "start +
+ The primary level is that each of the sections in a GEDCOM file is notified + to the application code via a "start element" callback and an "end element" + callback (much like in a SAX interface for XML), i.e. when a line containing + a certain tag is parsed, the "start element" callback is called for that +tag, and when all its subordinate lines with their tags have been processed, +the "end element" callback is called for the original tag.  Since GEDCOM + is hierarchical, this results in properly nested calls to appropriate "start element" and "end element" callbacks.
-
- However, it would be typical for a genealogy program to support only a -subset of the GEDCOM standard, certainly a program that is still under development. -  Moreover, under GEDCOM it is allowed for an application to define -its own tags, which will typically not  be supported by another application. -  Still, in that case, data preservation is important; it would hardly - be accepted that information that is not understood by a certain program +
+ However, a genealogy program will typically support only a subset of the GEDCOM standard, especially while it is still under development.  Moreover, GEDCOM allows an application to define its own tags, which will typically not be supported by other applications.  Still, data preservation is important in that case; users would hardly accept that information a program does not understand is simply discarded.
-
- Therefore, the second level of callbacks involves a "default callback". - An application needs to subscribe to callbacks for tags it does support, -and need to provide a "default callback" which will be called for tags it -doesn't support.  The application can then choose to just store the information -that comes via the default callback in plain textual format.
-
- After this introduction, let's see what the API looks like...
-
- +
+ Therefore, the second level of callbacks involves a "default callback".  An application needs to subscribe to callbacks for the tags it supports, and needs to provide a "default callback" which will be called for the tags it doesn't support.  The application can then choose to simply store the information that comes via the default callback in plain textual format.
+
+ After this introduction, let's see what the API looks like...
+
+

Start and end callbacks

- +

Callbacks for records
-

- As a simple example, we will get some information from the header of a + + As a simple example, we will get some information from the header of a GEDCOM file.  First, have a look at the following piece of code:
- -
Gedcom_ctxt my_header_start_cb (int level, - Gedcom_val xref, char *tag)
- {
-   printf("The header starts\n");
-   return (Gedcom_ctxt)1;
- }
-
- void my_header_end_cb (Gedcom_ctxt self)
- {
-   printf("The header ends, context is %d\n", self);   /* context + +
Gedcom_ctxt my_header_start_cb (int level, + Gedcom_val xref, char *tag, int parsed_tag)
+ {
+   printf("The header starts\n");
+   return (Gedcom_ctxt)1;
+ }
+
+ void my_header_end_cb (Gedcom_ctxt self)
+ {
+   printf("The header ends, context is %d\n", (int)(long)self);   /* context will print as "1" */
- }
-
- ...
- gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, -my_header_end_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");

-
- Using the gedcom_subscribe_to_record function, the application - requests to use the specified callbacks as start and end callback. The end - callback is optional: you can pass NULL if you are not interested - in the end callback.  The identifiers to use as first argument to the + }
+
+ ...
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, + my_header_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");

+
+ Using the gedcom_subscribe_to_record function, the application requests that the specified callbacks be used as start and end callbacks.  The end callback is optional: you can pass NULL if you are not interested in the end callback.  The identifiers to use as the first argument to the function (here REC_HEAD) are described in the - interface details.
-
- From the name of the function it becomes clear that this function is specific - to complete records.  For the separate elements in records there is -another function, which we'll see shortly.  Again, the callbacks need + interface details.
+
+ From the name of the function it becomes clear that this function is specific + to complete records.  For the separate elements in records there is +another function, which we'll see shortly.  Again, the callbacks need to have the signatures as shown in the example.
-
- The Gedcom_ctxt type that is used as a result of the start -callback and as an argument to the end callback is vital for passing context -necessary for the application.  This type is meant to be opaque; in fact, -it's a void pointer, so you can pass anything via it.  The important -thing to know is that the context that the application returns in the start -callback will be passed in the end callback as an argument, and as we will -see shortly, also to all the directly subordinate elements of the record.
-
- The example passes a simple integer as context, but an application could - e.g. pass a struct that will contain the information for the - header.  In the end callback, the application could then e.g. do some +
+ The Gedcom_ctxt type that is used as a result of the start + callback and as an argument to the end callback is vital for passing context + necessary for the application.  This type is meant to be opaque; in +fact, it's a void pointer, so you can pass anything via it.  The important + thing to know is that the context that the application returns in the start + callback will be passed in the end callback as an argument, and as we will + see shortly, also to all the directly subordinate elements of the record.
+
+The tag argument is the GEDCOM tag in string format; parsed_tag is the same tag as an integer, for which symbolic values are defined: TAG_HEAD, TAG_SOUR, TAG_DATA, ..., and USERTAG for application-specific tags.  These values are defined in the installed header gedcom-tags.h, which is included via gedcom.h (so there is no need to include gedcom-tags.h yourself).
+
+ The example passes a simple integer as context, but an application could, e.g., pass a pointer to a struct that will contain the information for the header.  In the end callback, the application could then, e.g., do some finalizing operations on the struct to put it in its database.
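To show how the context and the parsed_tag argument fit together, here is a minimal sketch of a header start/end pair that passes a pointer to a struct as context.  The typedefs and the TAG_HEAD value at the top are stand-ins (assumptions) so that the fragment compiles without gedcom.h; in a real program they come from the installed headers, and struct header is purely hypothetical application code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins so this sketch compiles without the library.  In a real
   program these come from gedcom.h (and gedcom-tags.h); the value of
   TAG_HEAD below is a placeholder, not the library's actual value. */
typedef void *Gedcom_ctxt;   /* described in the manual as a void pointer */
typedef void *Gedcom_val;    /* hypothetical: the real type is opaque */
enum { TAG_HEAD = 1 };

/* Hypothetical application-side record for the header. */
struct header {
  int level;
  int tag;                   /* parsed_tag as received in the start callback */
};

Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char *tag,
                               int parsed_tag)
{
  struct header *head = malloc(sizeof *head);
  (void)xref;                /* not used in this sketch */
  if (head == NULL)
    return NULL;
  head->level = level;
  head->tag = parsed_tag;
  /* The integer tag can be compared directly, no strcmp(tag, "HEAD") needed. */
  if (parsed_tag != TAG_HEAD)
    fprintf(stderr, "unexpected record tag %s\n", tag);
  return (Gedcom_ctxt)head;  /* handed back later as 'self' */
}

void my_header_end_cb(Gedcom_ctxt self)
{
  struct header *head = (struct header *)self;  /* same pointer as returned above */
  if (head == NULL)
    return;
  printf("The header at level %d ends\n", head->level);
  free(head);                /* a real program would store it first */
}
```

A real application would typically hand the struct over to its database in the end callback instead of freeing it right away.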
-
- (Note that the Gedcom_val type for the xref argument - was not discussed, see further for this)
-
- +
+ (The Gedcom_val type of the xref argument has not been discussed yet; see further below.)
+
+

Callbacks for elements

- We will now retrieve the SOUR field (the name of the program that wrote -the file) from the header:
- -
Gedcom_ctxt my_header_source_start_cb(Gedcom_ctxt + We will now retrieve the SOUR field (the name of the program that wrote + the file) from the header:
+ +
Gedcom_ctxt my_header_source_start_cb(Gedcom_ctxt parent,
-                       -                 int     -     level,
-                       -                 char*     -   tag,
-                       -                 char*     -   raw_value,
-                       -                 Gedcom_val  parsed_value)
- {
-   char *source = GEDCOM_STRING(parsed_value);
-   printf("This file was written by %s\n", source);
-   return parent;
- }
-
- void my_header_source_end_cb(Gedcom_ctxt parent,
-                       -        Gedcom_ctxt self,
-                       -        Gedcom_val  parsed_value)
- {
-   printf("End of the source description\n");
- }
-
- ...
- gedcom_subscribe_to_element(ELT_HEAD_SOUR,
-                       -       my_header_source_start_cb,
-                       -       my_header_source_end_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");

-
- The subscription mechanism for elements is similar, only the signatures -of the callbacks differ.  The signature for the start callback shows -that the context of the parent line (e.g. the struct that describes - the header) is passed to this start callback.  The callback itself -returns here the same context, but this can be its own context object of -course.  The end callback is called with both the context of the parent -and the context of itself, which will be the same in the example.  Again, -the list of identifiers to use as a first argument for the subscription function -are detailed in the interface +                     +                  int   +      level,
+                     +                  char*   +    tag,
+                     +                  char*   +    raw_value,
+                      +                int     +    parsed_tag,
+                     +                  Gedcom_val + parsed_value)
+ {
+   char *source = GEDCOM_STRING(parsed_value);
+   printf("This file was written by %s\n", source);
+   return parent;
+ }
+
+ void my_header_source_end_cb(Gedcom_ctxt parent,
+                     +         Gedcom_ctxt self,
+                     +         Gedcom_val  parsed_value)
+ {
+   printf("End of the source description\n");
+ }
+
+ ...
+ gedcom_subscribe_to_element(ELT_HEAD_SOUR,
+                     +        my_header_source_start_cb,
+                     +        my_header_source_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");

+
+ The subscription mechanism for elements is similar; only the signatures of the callbacks differ.  The signature of the start callback shows that the context of the parent line (e.g. the struct that describes the header) is passed to this start callback.  The callback itself returns the same context here, but this could of course be its own context object.  The end callback is called with both the context of the parent and its own context, which are the same in this example.  Again, the list of identifiers to use as the first argument for the subscription function is given in the interface details.
-
- If we look at the other arguments of the start callback, we see the level - number (the initial number of the line in the GEDCOM file), the tag (e.g. - "SOUR"), and then a raw value and a parsed value.  The raw value is -just the raw string that occurs as value on the line next to the tag (in -UTF-8 encoding).  The parsed value is the meaningful value that is parsed -from that raw string.
-
- The Gedcom_val type is meant to be an opaque type.  The - only thing that needs to be known about it is that it can contain specific - data types, which have to be retrieved from it using pre-defined macros. +
+ If we look at the other arguments of the start callback, we see the level number (the initial number of the line in the GEDCOM file), the tag (e.g. "SOUR"), and then a raw value, a parsed tag and a parsed value.  The raw value is just the raw string that occurs as the value on the line next to the tag (in UTF-8 encoding).  The parsed value is the meaningful value that is parsed from that raw string.  The parsed tag is described in the section on record callbacks.
+
+ The Gedcom_val type is meant to be an opaque type.  The + only thing that needs to be known about it is that it can contain specific + data types, which have to be retrieved from it using pre-defined macros.  These data types are described in the - interface details.
-
- Some extra notes:
- + interface details.
+
+ Some extra notes:
+ - +

Default callbacks
-

- As described above, an application doesn't always implement the entire -GEDCOM spec, and application-specific tags may have been added by other applications. - To preserve this extra data anyway, a default callback can be registered -by the application, as in the following example:
- -
void my_default_cb (Gedcom_ctxt parent, -int level, char* tag, char* raw_value)
- {
-   ...
- }
-
- ...
- gedcom_set_default_callback(my_default_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");

-
- This callback has a similar signature as the previous ones, but -it doesn't contain a parsed value.  However, it does contain the parent -context, that was returned by the application for the most specific containing -tag that the application supported.
-
- Suppose e.g. that this callback is called for some tags in the header that -are specific to some other application, then our application could make sure -that the parent context contains the struct or object that represents the -header, and use the default callback here to add the level, tag and raw_value -as plain text in a member of that struct or object, thus preserving the information. - The application can then write this out when the data is saved again -in a GEDCOM file.  To make it more specific, consider the following example:
- + + As described above, an application doesn't always implement the entire +GEDCOM spec, and application-specific tags may have been added by other applications. +  To preserve this extra data anyway, a default callback can be registered + by the application, as in the following example:
+ +
void my_default_cb (Gedcom_ctxt parent, + int level, char* tag, char* raw_value, int parsed_tag)
+ {
+   ...
+ }
+
+ ...
+ gedcom_set_default_callback(my_default_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");

+
+ This callback has a signature similar to the previous ones, but it doesn't contain a parsed value.  However, it does contain the parent context, which was returned by the application for the most specific containing tag that the application supported.
+
+ Suppose, e.g., that this callback is called for some tags in the header that are specific to some other application.  Our application could then make sure that the parent context contains the struct or object that represents the header, and use the default callback to add the level, tag and raw_value as plain text to a member of that struct or object, thus preserving the information.  The application can then write this out when the data is saved again in a GEDCOM file.  To make it more specific, consider the following example:
+
struct header {
-   char* source;
-   ...
-   char* extra_text;
- };
-
- Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char* tag)
- {
-   struct header head = my_make_header_struct();
-   return (Gedcom_ctxt)head;
- }
-
- void my_default_cb(Gedcom_ctxt parent, int level, char* tag, char* raw_value)
- {
-   struct header head = (struct header)parent;
-   my_header_add_to_extra_text(head, level, tag, raw_value);
- }
-
- gedcom_set_default_callback(my_default_cb);
- gedcom_subscribe_to_record(REC_HEAD, my_header_start, NULL);
- ...
- result = gedcom_parse_file(filename);

-
- Note that the default callback will be called for any tag that isn't specifically -subscribed upon by the application, and can thus be called in various contexts. - For simplicity, the example above doesn't take this into account (the - parent could be of different types, depending on -the context).
- -
+   char* source;
+   ...
+   char* extra_text;
+ };
+
+ Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char* tag, int parsed_tag)
+ {
+   struct header* head = my_make_header_struct();
+   return (Gedcom_ctxt)head;
+ }
+
+ void my_default_cb(Gedcom_ctxt parent, int level, char* tag, char* raw_value, int parsed_tag)
+ {
+   struct header* head = (struct header*)parent;
+   my_header_add_to_extra_text(head, level, tag, raw_value);
+ }
+
+ gedcom_set_default_callback(my_default_cb);
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, NULL);
+ ...
+ result = gedcom_parse_file(filename);
+ + Note that the default callback will be called for any tag that the application hasn't specifically subscribed to, and can thus be called in various contexts.  For simplicity, the example above doesn't take this into account (the parent could be of different types, depending on the context).
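The example above calls my_header_add_to_extra_text without defining it.  A minimal sketch of such a helper follows, assuming the application keeps unsupported lines as GEDCOM text in one growing, newline-separated string.  The struct is trimmed to the members the helper needs; both the struct and the helper are hypothetical application code, not part of libgedcom.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header record, trimmed down from the example above. */
struct header {
  char *source;
  char *extra_text;   /* unsupported lines, kept as GEDCOM text */
};

/* Hypothetical helper: append "level tag [value]\n" to head->extra_text
   so the line can be written out again later.  Returns 0 on success,
   -1 if the text could not be formatted or memory could not be allocated. */
int my_header_add_to_extra_text(struct header *head, int level,
                                const char *tag, const char *raw_value)
{
  size_t old_len = head->extra_text ? strlen(head->extra_text) : 0;
  /* Lines without a value get no trailing space. */
  const char *sep = (raw_value && raw_value[0]) ? " " : "";
  const char *value = raw_value ? raw_value : "";
  /* C99 snprintf with a NULL buffer returns the length that is needed. */
  int needed = snprintf(NULL, 0, "%d %s%s%s\n", level, tag, sep, value);
  char *buf;

  if (needed < 0)
    return -1;
  /* realloc(NULL, n) behaves like malloc(n) for the first line. */
  buf = realloc(head->extra_text, old_len + (size_t)needed + 1);
  if (buf == NULL)
    return -1;
  snprintf(buf + old_len, (size_t)needed + 1, "%d %s%s%s\n",
           level, tag, sep, value);
  head->extra_text = buf;
  return 0;
}
```

Keeping the lines in GEDCOM form means the application can write extra_text back unchanged when it saves the file again.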
+ +

Other API functions
-

- Although the above describes the basic interface of libgedcom, there are -some other functions that allow to customize the behaviour of the library. - These will be explained in the current section.
- + + Although the above describes the basic interface of libgedcom, there are + some other functions that allow to customize the behaviour of the library. +  These will be explained in the current section.
+

Debugging

- The library can generate various debugging output, not only from itself, -but also the debugging output generated by the yacc parser.  By default, -no debugging output is generated, but this can be customized using the following -function:
- -
void gedcom_set_debug_level (int level, -FILE* trace_output)
-
- The level can be one of the following values:
- + The library can generate various debugging output, not only from itself, + but also the debugging output generated by the yacc parser.  By default, + no debugging output is generated, but this can be customized using the following + function:
+ +
void gedcom_set_debug_level (int level, + FILE* trace_output)
+
+ The level can be one of the following values:
+ - If the trace_output is NULL, debugging information -will be written to stderr, otherwise the given file handle is -used (which must be open).
-
- + If the trace_output is NULL, debugging information + will be written to stderr, otherwise the given file handle +is used (which must be open).
+
+

Error treatment

- One of the previous sections already described the callback to be registered -to get error messages.  The library also allows to customize what happens -on an error, using the following function:
- -
void gedcom_set_error_handling (Gedcom_err_mech -mechanism)
-
- The mechanism can be one of:
- + One of the previous sections already described the callback to be registered + to get error messages.  The library also allows to customize what happens + on an error, using the following function:
+ +
void gedcom_set_error_handling (Gedcom_err_mech + mechanism)
+
+ The mechanism can be one of:
+ - This doesn't influence the generation of error or warning messages, only -the behaviour of the parser and its return code.
-
- + This doesn't influence the generation of error or warning messages, only + the behaviour of the parser and its return code.
+
+

Compatibility mode
-

- Applications are not necessarily true to the GEDCOM spec (or use a different -version than 5.5).  The intention is that the library is resilient to -this, and goes in compatibility mode for files written by specific programs -(detected via the HEAD.SOUR tag).  This compatibility mode can be enabled -and disabled via the following function:
- + + Applications are not necessarily true to the GEDCOM spec (or use a different + version than 5.5).  The intention is that the library is resilient +to this, and goes in compatibility mode for files written by specific programs + (detected via the HEAD.SOUR tag).  This compatibility mode can be enabled + and disabled via the following function:
+
void gedcom_set_compat_handling - (int enable_compat)
-
- The argument can be:
- + (int enable_compat)
+ + The argument can be:
+ - Note that, currently, no actual compatibility code is present, but this + Note that, currently, no actual compatibility code is present, but this is on the to-do list.
- -
+ +
$Id$
$Name$
-
-                    
- + +
                    
+ diff --git a/gedcom/gedcom.y b/gedcom/gedcom.y index e57b539..eff6267 100644 --- a/gedcom/gedcom.y +++ b/gedcom/gedcom.y @@ -1,5 +1,5 @@ /* Parser for Gedcom. - Copyright (C) 2001 The Genes Development Team + Copyright (C) 2001, 2002 The Genes Development Team This file is part of the Gedcom parser library. Contributed by Peter Verthez , 2001. @@ -243,6 +243,7 @@ int compat_mode(int flags); %union { int number; char *string; + struct tag_struct tag; Gedcom_ctxt ctxt; } @@ -256,139 +257,144 @@ int compat_mode(int flags); %token DELIM %token ANYCHAR %token POINTER -%token USERTAG -%token TAG_ABBR -%token TAG_ADDR -%token TAG_ADR1 -%token TAG_ADR2 -%token TAG_ADOP -%token TAG_AFN -%token TAG_AGE -%token TAG_AGNC -%token TAG_ALIA -%token TAG_ANCE -%token TAG_ANCI -%token TAG_ANUL -%token TAG_ASSO -%token TAG_AUTH -%token TAG_BAPL -%token TAG_BAPM -%token TAG_BARM -%token TAG_BASM -%token TAG_BIRT -%token TAG_BLES -%token TAG_BLOB -%token TAG_BURI -%token TAG_CALN -%token TAG_CAST -%token TAG_CAUS -%token TAG_CENS -%token TAG_CHAN -%token TAG_CHAR -%token TAG_CHIL -%token TAG_CHR -%token TAG_CHRA -%token TAG_CITY -%token TAG_CONC -%token TAG_CONF -%token TAG_CONL -%token TAG_CONT -%token TAG_COPR -%token TAG_CORP -%token TAG_CREM -%token TAG_CTRY -%token TAG_DATA -%token TAG_DATE -%token TAG_DEAT -%token TAG_DESC -%token TAG_DESI -%token TAG_DEST -%token TAG_DIV -%token TAG_DIVF -%token TAG_DSCR -%token TAG_EDUC -%token TAG_EMIG -%token TAG_ENDL -%token TAG_ENGA -%token TAG_EVEN -%token TAG_FAM -%token TAG_FAMC -%token TAG_FAMF -%token TAG_FAMS -%token TAG_FCOM -%token TAG_FILE -%token TAG_FORM -%token TAG_GEDC -%token TAG_GIVN -%token TAG_GRAD -%token TAG_HEAD -%token TAG_HUSB -%token TAG_IDNO -%token TAG_IMMI -%token TAG_INDI -%token TAG_LANG -%token TAG_LEGA -%token TAG_MARB -%token TAG_MARC -%token TAG_MARL -%token TAG_MARR -%token TAG_MARS -%token TAG_MEDI -%token TAG_NAME -%token TAG_NATI -%token TAG_NATU -%token TAG_NCHI -%token TAG_NICK -%token TAG_NMR -%token 
TAG_NOTE -%token TAG_NPFX -%token TAG_NSFX -%token TAG_OBJE -%token TAG_OCCU -%token TAG_ORDI -%token TAG_ORDN -%token TAG_PAGE -%token TAG_PEDI -%token TAG_PHON -%token TAG_PLAC -%token TAG_POST -%token TAG_PROB -%token TAG_PROP -%token TAG_PUBL -%token TAG_QUAY -%token TAG_REFN -%token TAG_RELA -%token TAG_RELI -%token TAG_REPO -%token TAG_RESI -%token TAG_RESN -%token TAG_RETI -%token TAG_RFN -%token TAG_RIN -%token TAG_ROLE -%token TAG_SEX -%token TAG_SLGC -%token TAG_SLGS -%token TAG_SOUR -%token TAG_SPFX -%token TAG_SSN -%token TAG_STAE -%token TAG_STAT -%token TAG_SUBM -%token TAG_SUBN -%token TAG_SURN -%token TAG_TEMP -%token TAG_TEXT -%token TAG_TIME -%token TAG_TITL -%token TAG_TRLR -%token TAG_TYPE -%token TAG_VERS -%token TAG_WIFE -%token TAG_WILL - -%type anystdtag -%type anytoptag +%token USERTAG +%token TAG_ABBR +%token TAG_ADDR +%token TAG_ADR1 +%token TAG_ADR2 +%token TAG_ADOP +%token TAG_AFN +%token TAG_AGE +%token TAG_AGNC +%token TAG_ALIA +%token TAG_ANCE +%token TAG_ANCI +%token TAG_ANUL +%token TAG_ASSO +%token TAG_AUTH +%token TAG_BAPL +%token TAG_BAPM +%token TAG_BARM +%token TAG_BASM +%token TAG_BIRT +%token TAG_BLES +%token TAG_BLOB +%token TAG_BURI +%token TAG_CALN +%token TAG_CAST +%token TAG_CAUS +%token TAG_CENS +%token TAG_CHAN +%token TAG_CHAR +%token TAG_CHIL +%token TAG_CHR +%token TAG_CHRA +%token TAG_CITY +%token TAG_CONC +%token TAG_CONF +%token TAG_CONL +%token TAG_CONT +%token TAG_COPR +%token TAG_CORP +%token TAG_CREM +%token TAG_CTRY +%token TAG_DATA +%token TAG_DATE +%token TAG_DEAT +%token TAG_DESC +%token TAG_DESI +%token TAG_DEST +%token TAG_DIV +%token TAG_DIVF +%token TAG_DSCR +%token TAG_EDUC +%token TAG_EMIG +%token TAG_ENDL +%token TAG_ENGA +%token TAG_EVEN +%token TAG_FAM +%token TAG_FAMC +%token TAG_FAMF +%token TAG_FAMS +%token TAG_FCOM +%token TAG_FILE +%token TAG_FORM +%token TAG_GEDC +%token TAG_GIVN +%token TAG_GRAD +%token TAG_HEAD +%token TAG_HUSB +%token TAG_IDNO +%token TAG_IMMI +%token TAG_INDI +%token 
TAG_LANG +%token TAG_LEGA +%token TAG_MARB +%token TAG_MARC +%token TAG_MARL +%token TAG_MARR +%token TAG_MARS +%token TAG_MEDI +%token TAG_NAME +%token TAG_NATI +%token TAG_NATU +%token TAG_NCHI +%token TAG_NICK +%token TAG_NMR +%token TAG_NOTE +%token TAG_NPFX +%token TAG_NSFX +%token TAG_OBJE +%token TAG_OCCU +%token TAG_ORDI +%token TAG_ORDN +%token TAG_PAGE +%token TAG_PEDI +%token TAG_PHON +%token TAG_PLAC +%token TAG_POST +%token TAG_PROB +%token TAG_PROP +%token TAG_PUBL +%token TAG_QUAY +%token TAG_REFN +%token TAG_RELA +%token TAG_RELI +%token TAG_REPO +%token TAG_RESI +%token TAG_RESN +%token TAG_RETI +%token TAG_RFN +%token TAG_RIN +%token TAG_ROLE +%token TAG_SEX +%token TAG_SLGC +%token TAG_SLGS +%token TAG_SOUR +%token TAG_SPFX +%token TAG_SSN +%token TAG_STAE +%token TAG_STAT +%token TAG_SUBM +%token TAG_SUBN +%token TAG_SURN +%token TAG_TEMP +%token TAG_TEXT +%token TAG_TIME +%token TAG_TITL +%token TAG_TRLR +%token TAG_TYPE +%token TAG_VERS +%token TAG_WIFE +%token TAG_WILL + +%type anystdtag +%type anytoptag +%type fam_event_tag +%type indiv_attr_tag +%type indiv_birt_tag +%type indiv_gen_tag +%type lio_bapl_tag %type line_item %type line_value %type mand_line_item @@ -398,11 +404,6 @@ int compat_mode(int flags); %type opt_xref %type opt_value %type opt_line_item -%type fam_event_tag -%type indiv_attr_tag -%type indiv_birt_tag -%type indiv_gen_tag -%type lio_bapl_tag %type head_sect %% @@ -514,7 +515,7 @@ head_sour_name_sect : OPEN DELIM TAG_NAME mand_line_item ; head_sour_corp_sect : OPEN DELIM TAG_CORP mand_line_item { $$ = start_element(ELT_HEAD_SOUR_CORP, PARENT, - $1, $3, $4, + $1, $3, $4, GEDCOM_MAKE_STRING($4)); START(CORP, $$) } @@ -536,7 +537,7 @@ head_sour_corp_sub : addr_struc_sub /* 0:1 */ head_sour_data_sect : OPEN DELIM TAG_DATA mand_line_item { $$ = start_element(ELT_HEAD_SOUR_DATA, PARENT, - $1, $3, $4, + $1, $3, $4, GEDCOM_MAKE_STRING($4)); START(DATA, $$) } @@ -560,7 +561,7 @@ head_sour_data_sub : head_sour_data_date_sect { 
OCCUR2(DATE, 0, 1) }
 head_sour_data_date_sect : OPEN DELIM TAG_DATE mand_line_item
          { struct date_value dv = gedcom_parse_date($4);
            $$ = start_element(ELT_HEAD_SOUR_DATA_DATE,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_DATE(dv));
            START(DATE, $$)
          }
@@ -573,7 +574,7 @@ head_sour_data_date_sect : OPEN DELIM TAG_DATE mand_line_item
 ;
 head_sour_data_copr_sect : OPEN DELIM TAG_COPR mand_line_item
          { $$ = start_element(ELT_HEAD_SOUR_DATA_COPR,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(COPR, $$)
          }
@@ -588,7 +589,7 @@ head_sour_data_copr_sect : OPEN DELIM TAG_COPR mand_line_item
 /* HEAD.DEST */
 head_dest_sect : OPEN DELIM TAG_DEST mand_line_item
          { $$ = start_element(ELT_HEAD_DEST,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(DEST, $$)
          }
@@ -604,7 +605,7 @@ head_dest_sect : OPEN DELIM TAG_DEST mand_line_item
 head_date_sect : OPEN DELIM TAG_DATE mand_line_item
          { struct date_value dv = gedcom_parse_date($4);
            $$ = start_element(ELT_HEAD_DATE,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_DATE(dv));
            START(DATE, $$)
          }
@@ -626,7 +627,7 @@ head_date_sub : head_date_time_sect { OCCUR2(TIME, 0, 1) }
 
 head_date_time_sect : OPEN DELIM TAG_TIME mand_line_item
          { $$ = start_element(ELT_HEAD_DATE_TIME,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(TIME, $$)
          }
@@ -641,7 +642,7 @@ head_date_time_sect : OPEN DELIM TAG_TIME mand_line_item
 /* HEAD.SUBM */
 head_subm_sect : OPEN DELIM TAG_SUBM mand_pointer
          { $$ = start_element(ELT_HEAD_SUBM,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(SUBM, $$)
          }
@@ -655,7 +656,7 @@ head_subm_sect : OPEN DELIM TAG_SUBM mand_pointer
 /* HEAD.SUBN */
 head_subn_sect : OPEN DELIM TAG_SUBN mand_pointer
          { $$ = start_element(ELT_HEAD_SUBN,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(SUBN, $$)
          }
@@ -669,7 +670,7 @@ head_subn_sect : OPEN DELIM TAG_SUBN mand_pointer
 /* HEAD.FILE */
 head_file_sect : OPEN DELIM TAG_FILE mand_line_item
          { $$ = start_element(ELT_HEAD_FILE,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(FILE, $$)
          }
@@ -682,7 +683,7 @@ head_file_sect : OPEN DELIM TAG_FILE mand_line_item
 /* HEAD.COPR */
 head_copr_sect : OPEN DELIM TAG_COPR mand_line_item
          { $$ = start_element(ELT_HEAD_COPR,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(COPR, $$)
          }
@@ -716,7 +717,7 @@ head_gedc_sub : head_gedc_vers_sect { OCCUR2(VERS, 1, 1) }
 ;
 head_gedc_vers_sect : OPEN DELIM TAG_VERS mand_line_item
          { $$ = start_element(ELT_HEAD_GEDC_VERS,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(VERS, $$)
          }
@@ -729,7 +730,7 @@ head_gedc_vers_sect : OPEN DELIM TAG_VERS mand_line_item
 ;
 head_gedc_form_sect : OPEN DELIM TAG_FORM mand_line_item
          { $$ = start_element(ELT_HEAD_GEDC_FORM,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(FORM, $$)
          }
@@ -745,7 +746,7 @@ head_gedc_form_sect : OPEN DELIM TAG_FORM mand_line_item
 head_char_sect : OPEN DELIM TAG_CHAR mand_line_item
          { if (open_conv_to_internal($4) == 0) YYERROR;
            $$ = start_element(ELT_HEAD_CHAR,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(CHAR, $$)
          }
@@ -765,7 +766,7 @@ head_char_sub : head_char_vers_sect { OCCUR2(VERS, 0, 1) }
 ;
 head_char_vers_sect : OPEN DELIM TAG_VERS mand_line_item
          { $$ = start_element(ELT_HEAD_CHAR_VERS,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(VERS, $$)
          }
@@ -780,7 +781,7 @@ head_char_vers_sect : OPEN DELIM TAG_VERS mand_line_item
 /* HEAD.LANG */
 head_lang_sect : OPEN DELIM TAG_LANG mand_line_item
          { $$ = start_element(ELT_HEAD_LANG,
-                              PARENT, $1, $3, $4,
+                              PARENT, $1, $3, $4,
                               GEDCOM_MAKE_STRING($4));
            START(LANG, $$)
          }
@@ -3280,7 +3281,7 @@ no_std_rec : user_rec /* 0:M */
 ;
 
 user_rec : OPEN DELIM opt_xref USERTAG
-         { if ($4[0] != '_') {
+         { if ($4.string[0] != '_') {
              gedcom_error(_("Undefined tag (and not a valid user tag): %s"),
                           $4);
              YYERROR;
@@ -3297,7 +3298,7 @@ user_rec : OPEN DELIM opt_xref USERTAG
          { end_record(REC_USER, $7); }
 ;
 user_sect : OPEN DELIM opt_xref USERTAG
-         { if ($4[0] != '_') {
+         { if ($4.string[0] != '_') {
              gedcom_error(_("Undefined tag (and not a valid user tag): %s"),
                           $4);
              YYERROR;
diff --git a/gedcom/gedcom_internal.h b/gedcom/gedcom_internal.h
index 29925c4..abd8091 100644
--- a/gedcom/gedcom_internal.h
+++ b/gedcom/gedcom_internal.h
@@ -1,5 +1,5 @@
 /* General header for the Gedcom parser.
-   Copyright (C) 2001 The Genes Development Team
+   Copyright (C) 2001, 2002 The Genes Development Team
 
    This file is part of the Gedcom parser library.
    Contributed by Peter Verthez , 2001.
@@ -47,6 +47,13 @@
 #define GEDCOMTAGOFFSET 257
 #define INTERNAL_ENCODING "UTF8"
 
+#define GEDCOM_INTERNAL 1
+
+struct tag_struct {
+  char *string;
+  int value;
+};
+
 int gedcom_error(char* s, ...);
 int gedcom_warning(char* s, ...);
 int gedcom_message(char* s, ...);
diff --git a/gedcom/gedcom_lex_common.c b/gedcom/gedcom_lex_common.c
index e1c17bb..8e83d04 100644
--- a/gedcom/gedcom_lex_common.c
+++ b/gedcom/gedcom_lex_common.c
@@ -1,5 +1,5 @@
 /* Common lexer code.
-   Copyright (C) 2001 The Genes Development Team
+   Copyright (C) 2001, 2002 The Genes Development Team
 
    This file is part of the Gedcom parser library.
    Contributed by Peter Verthez , 2001.
@@ -73,8 +73,8 @@ int test_loop(ENCODING enc, char* code)
       case DELIM: printf("DELIM "); break;
       case ANYCHAR: printf("%s ", gedcom_lval.string); break;
       case POINTER: printf("POINTER(%s) ", gedcom_lval.string); break;
-      case USERTAG: printf("USERTAG(%s) ", gedcom_lval.string); break;
-      default: printf("TAG(%s) ", gedcom_lval.string); break;
+      case USERTAG: printf("USERTAG(%s) ", gedcom_lval.tag.string); break;
+      default: printf("TAG(%s) ", gedcom_lval.tag.string); break;
     }
     tok = gedcom_lex();
   }
@@ -107,7 +107,8 @@ int test_loop(ENCODING enc, char* code)
 
 #define MKTAGACTION(THETAG) \
   { CHECK_LINE_LEN; \
-    gedcom_lval.string = TO_INTERNAL(yytext, tag_buf); \
+    gedcom_lval.tag.string = TO_INTERNAL(yytext, tag_buf); \
+    gedcom_lval.tag.value = TAG_##THETAG; \
     BEGIN(NORMAL); \
     return TAG_##THETAG; \
   }
@@ -211,7 +212,8 @@ int test_loop(ENCODING enc, char* code)
       return BADTOKEN; \
     } \
     CHECK_LINE_LEN; \
-    gedcom_lval.string = TO_INTERNAL(yytext, tag_buf); \
+    gedcom_lval.tag.string = TO_INTERNAL(yytext, tag_buf); \
+    gedcom_lval.tag.value = USERTAG; \
     BEGIN(NORMAL); \
     return USERTAG; \
   }
diff --git a/gedcom/interface.c b/gedcom/interface.c
index 5860444..8a69ffe 100644
--- a/gedcom/interface.c
+++ b/gedcom/interface.c
@@ -1,5 +1,5 @@
 /* Implementation of the interface to applications.
-   Copyright (C) 2001 The Genes Development Team
+   Copyright (C) 2001, 2002 The Genes Development Team
 
    This file is part of the Gedcom parser library.
    Contributed by Peter Verthez , 2001.
@@ -56,11 +56,11 @@ void gedcom_subscribe_to_element(Gedcom_elt elt,
 }
 
 Gedcom_ctxt start_record(Gedcom_rec rec,
-                         int level, Gedcom_val xref, char *tag)
+                         int level, Gedcom_val xref, struct tag_struct tag)
 {
   Gedcom_rec_start_cb cb = record_start_callback[rec];
   if (cb != NULL)
-    return (*cb)(level, xref, tag);
+    return (*cb)(level, xref, tag.string, tag.value);
   else
     return NULL;
 }
@@ -73,15 +73,16 @@ void end_record(Gedcom_rec rec, Gedcom_ctxt self)
 }
 
 Gedcom_ctxt start_element(Gedcom_elt elt, Gedcom_ctxt parent,
-                          int level, char *tag, char *raw_value,
+                          int level, struct tag_struct tag, char *raw_value,
                           Gedcom_val parsed_value)
 {
   Gedcom_elt_start_cb cb = element_start_callback[elt];
   Gedcom_ctxt ctxt = parent;
   if (cb != NULL)
-    ctxt = (*cb)(parent, level, tag, raw_value, parsed_value);
+    ctxt = (*cb)(parent, level, tag.string, raw_value,
+                 tag.value, parsed_value);
   else if (default_cb != NULL)
-    (*default_cb)(parent, level, tag, raw_value);
+    (*default_cb)(parent, level, tag.string, raw_value, tag.value);
   return ctxt;
 }
diff --git a/gedcom/interface.h b/gedcom/interface.h
index 2e716c4..6eb0ed6 100644
--- a/gedcom/interface.h
+++ b/gedcom/interface.h
@@ -1,5 +1,5 @@
 /* Header for interface.c
-   Copyright (C) 2001 The Genes Development Team
+   Copyright (C) 2001, 2002 The Genes Development Team
 
    This file is part of the Gedcom parser library.
    Contributed by Peter Verthez , 2001.
@@ -27,11 +27,11 @@
 #include "gedcom.h"
 
 Gedcom_ctxt start_record(Gedcom_rec rec,
-                         int level, Gedcom_val xref, char *tag);
+                         int level, Gedcom_val xref, struct tag_struct tag);
 void end_record(Gedcom_rec rec, Gedcom_ctxt self);
 
 Gedcom_ctxt start_element(Gedcom_elt elt, Gedcom_ctxt parent,
-                          int level, char *tag, char *raw_value,
+                          int level, struct tag_struct tag, char *raw_value,
                           Gedcom_val parsed_value);
 void end_element(Gedcom_elt elt, Gedcom_ctxt parent,
                  Gedcom_ctxt self, Gedcom_val parsed_value);
diff --git a/include/Makefile.am b/include/Makefile.am
index 180b31f..2ab50f6 100644
--- a/include/Makefile.am
+++ b/include/Makefile.am
@@ -1,4 +1,10 @@
 ## Process this file with automake to produce Makefile.in
 # $Id$
 # $Name$
-include_HEADERS = gedcom.h
+include_HEADERS = gedcom.h \
+	gedcom-tags.h
+BUILT_SOURCES = gedcom-tags.h
+
+gedcom-tags.h: $(srcdir)/../gedcom/gedcom.tab.h
+	cat $(srcdir)/../gedcom/gedcom.tab.h | grep "TAG_\|USERTAG" > gedcom-tags.h
+
diff --git a/include/gedcom.h b/include/gedcom.h
index 5e66476..a7a64c8 100644
--- a/include/gedcom.h
+++ b/include/gedcom.h
@@ -1,5 +1,5 @@
 /* External header for the Gedcom parser library.
-   Copyright (C) 2001 The Genes Development Team
+   Copyright (C) 2001,2002 The Genes Development Team
 
    This file is part of the Gedcom parser library.
    Contributed by Peter Verthez , 2001.
@@ -28,6 +28,10 @@
 
 __BEGIN_DECLS
 
+#ifndef GEDCOM_INTERNAL
+#include <gedcom-tags.h>
+#endif
+
 /**************************************************************************/
 /*** First the records and elements to subscribe upon ***/
 /**************************************************************************/
@@ -276,10 +280,14 @@ typedef enum _DATE_VAL_MOD {
   DV_PHRASE /* Only phrase is given */
 } Date_value_type;
 
+/* All Unicode characters between U+0000 and U+FFFF can be encoded in
+   UTF-8 with 3 or less bytes */
+#define UTF_FACTOR 3
+
 #define MAX_DAY_LEN 2
 #define MAX_MONTH_LEN 4
 #define MAX_YEAR_LEN 7
-#define MAX_PHRASE_LEN 35
+#define MAX_PHRASE_LEN 35 * UTF_FACTOR
 
 struct date {
   Calendar_type cal;
@@ -302,6 +310,9 @@ struct date_value {
   char phrase[MAX_PHRASE_LEN + 1];
 };
 
+/* Type for context handling, meant to be opaque */
+typedef void* Gedcom_ctxt;
+
 /**************************************************************************/
 /*** Things meant to be internal, susceptible to changes ***/
 /*** Use the GEDCOM_STRING/GEDCOM_DATE interface instead of relying ***/
@@ -327,6 +338,7 @@ typedef struct _Gedcom_val_struct {
 void gedcom_cast_error(char* file, int line,
                        Gedcom_val_type tried_type,
                        Gedcom_val_type real_type);
+
 extern struct date_value def_date_val;
 
 #define GV_CHECK_CAST(VAL, TYPE, MEMBER, DEFVAL) \
@@ -341,9 +353,6 @@ extern struct date_value def_date_val;
 /*** Function interface ***/
 /**************************************************************************/
 
-/* Type for context handling, meant to be opaque */
-typedef void* Gedcom_ctxt;
-
 /* Type for parsed values, meant to be opaque */
 typedef Gedcom_val_struct* Gedcom_val;
 
@@ -371,7 +380,7 @@ typedef void
 
 typedef Gedcom_ctxt
         (*Gedcom_rec_start_cb)
-        (int level, Gedcom_val xref, char *tag);
+        (int level, Gedcom_val xref, char *tag, int tag_value);
 typedef void
         (*Gedcom_rec_end_cb)
         (Gedcom_ctxt self);
@@ -379,14 +388,16 @@ typedef void
 typedef Gedcom_ctxt
         (*Gedcom_elt_start_cb)
         (Gedcom_ctxt parent,
-         int level, char *tag, char *raw_value, Gedcom_val parsed_value);
+         int level, char *tag, char *raw_value,
+         int tag_value, Gedcom_val parsed_value);
 typedef void
         (*Gedcom_elt_end_cb)
         (Gedcom_ctxt parent, Gedcom_ctxt self, Gedcom_val parsed_value);
 
 typedef void
         (*Gedcom_def_cb)
-        (Gedcom_ctxt parent, int level, char *tag, char *raw_value);
+        (Gedcom_ctxt parent, int level, char *tag, char *raw_value,
+         int tag_value);
 
 int gedcom_parse_file(char* file_name);
 void gedcom_set_debug_level(int level, FILE* trace_output);
diff --git a/standalone.c b/standalone.c
index 863da53..524dfac 100644
--- a/standalone.c
+++ b/standalone.c
@@ -1,5 +1,5 @@
 /* Test program for the Gedcom library.
-   Copyright (C) 2001 The Genes Development Team
+   Copyright (C) 2001, 2002 The Genes Development Team
 
    This file is part of the Gedcom parser library.
    Contributed by Peter Verthez , 2001.
@@ -59,7 +59,7 @@ void show_help ()
   printf(" -3 Run the test parse 3 times instead of once\n");
 }
 
-Gedcom_ctxt header_start(int level, Gedcom_val xref, char *tag)
+Gedcom_ctxt header_start(int level, Gedcom_val xref, char *tag, int tag_value)
 {
   output(1, "Header start\n");
   return (Gedcom_ctxt)0;
@@ -73,7 +73,7 @@ void header_end(Gedcom_ctxt self)
 char family_xreftags[100][255];
 int family_nr = 0;
 
-Gedcom_ctxt family_start(int level, Gedcom_val xref, char *tag)
+Gedcom_ctxt family_start(int level, Gedcom_val xref, char *tag, int tag_value)
 {
   output(1, "Family start, xref is %s\n", GEDCOM_STRING(xref));
   strcpy(family_xreftags[family_nr], GEDCOM_STRING(xref));
@@ -85,14 +85,15 @@ void family_end(Gedcom_ctxt self)
   output(1, "Family end, xref is %s\n", family_xreftags[(int)self]);
 }
 
-Gedcom_ctxt submit_start(int level, Gedcom_val xref, char *tag)
+Gedcom_ctxt submit_start(int level, Gedcom_val xref, char *tag, int tag_value)
 {
   output(1, "Submitter, xref is %s\n", GEDCOM_STRING(xref));
   return (Gedcom_ctxt)10000;
 }
 
 Gedcom_ctxt source_start(Gedcom_ctxt parent, int level, char *tag,
-                         char* raw_value, Gedcom_val parsed_value)
+                         char* raw_value,
+                         int tag_value, Gedcom_val parsed_value)
 {
   Gedcom_ctxt self = (Gedcom_ctxt)((int) parent + 1000);
   output(1, "Source is %s (ctxt is %d, parent is %d)\n",
@@ -106,7 +107,8 @@ void source_end(Gedcom_ctxt parent, Gedcom_ctxt self, Gedcom_val parsed_value)
 }
 
 Gedcom_ctxt source_date_start(Gedcom_ctxt parent, int level, char *tag,
-                              char* raw_value, Gedcom_val parsed_value)
+                              char* raw_value,
+                              int tag_value, Gedcom_val parsed_value)
 {
   struct date_value dv;
   Gedcom_ctxt self = (Gedcom_ctxt)((int) parent + 1000);
@@ -134,9 +136,11 @@ Gedcom_ctxt source_date_start(Gedcom_ctxt parent, int level, char *tag,
   return self;
 }
 
-void default_cb(Gedcom_ctxt ctxt, int level, char *tag, char *raw_value)
+void default_cb(Gedcom_ctxt ctxt, int level, char *tag, char *raw_value,
+                int tag_value)
 {
-  output(0, "== %d %s %s (ctxt is %d)\n", level, tag, raw_value, (int)ctxt);
+  output(0, "== %d %s (%d) %s (ctxt is %d)\n",
+         level, tag, tag_value, raw_value, (int)ctxt);
 }
 
 void subscribe_callbacks()
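The core of this patch is that a tag now travels as a `struct tag_struct` holding both the string and the token number. The lexer fills both fields, and `start_record()`/`start_element()` unpack them into separate `tag` and `tag_value` callback arguments. The C sketch below mimics that path in isolation to show what an application gains: it can `switch` on the integer instead of calling `strcmp()` on the string. It is illustrative only. The numeric values given to `TAG_DATE`, `TAG_FILE`, `TAG_LANG` and `USERTAG` are invented stand-ins (an installed system gets the real ones from the generated `gedcom-tags.h`), `dispatch()` and `count_tags_cb()` are hypothetical helpers rather than library functions, and the opaque `Gedcom_ctxt` is reduced to `void *`.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the symbols gedcom-tags.h provides; the real
   values are generated by bison and will differ. */
enum { TAG_DATE = 300, TAG_FILE, TAG_LANG, USERTAG = 400 };

/* Same layout as struct tag_struct in gedcom_internal.h. */
struct tag_struct {
  char *string;
  int value;
};

/* Mirrors the new Gedcom_def_cb shape; the opaque context is a plain
   void * in this sketch. */
typedef void (*def_cb)(void *parent, int level, char *tag,
                       char *raw_value, int tag_value);

/* Hypothetical interface-layer helper: like start_element() after this
   patch, it unpacks the struct into separate string and integer arguments
   and skips the call when no callback is registered. */
static void dispatch(def_cb cb, void *parent, int level,
                     struct tag_struct tag, char *raw_value)
{
  if (cb != NULL)
    (*cb)(parent, level, tag.string, raw_value, tag.value);
}

/* Counters updated by the example callback. */
static int dates_seen, user_tags_seen, others_seen;

/* Example application callback: branches on the integer tag value
   instead of comparing tag strings, then prints in the same format as
   the patched default_cb in standalone.c (minus the context). */
static void count_tags_cb(void *parent, int level, char *tag,
                          char *raw_value, int tag_value)
{
  (void)parent;
  switch (tag_value) {
  case TAG_DATE: dates_seen++;     break;
  case USERTAG:  user_tags_seen++; break;
  default:       others_seen++;    break;
  }
  printf("== %d %s (%d) %s\n", level, tag, tag_value, raw_value);
}
```

For example, dispatching a `struct tag_struct` of `{ "DATE", TAG_DATE }` with a raw value of `"1 JAN 2002"` increments `dates_seen`. Because the lexer sets `tag.value = USERTAG` for every user-defined tag, the integer alone cannot distinguish `_MYTAG` from any other underscore tag. That is one reason the callbacks keep receiving the string alongside the new value.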