X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;f=doc%2Fusage.html;h=409ebbf308878bf17f0b2726ade6faf57a8eedb3;hb=b109f95d373fa6baae6c1a43e5b3805fb7fd22fb;hp=fbdcf1db7d5416728fee344d717d5af047b294bf;hpb=0a3842e05ee5ad37ffacefb70f685bbae3ad7fe6;p=gedcom-parse.git

diff --git a/doc/usage.html b/doc/usage.html
index fbdcf1d..409ebbf 100644
--- a/doc/usage.html
+++ b/doc/usage.html
@@ -1,451 +1,615 @@
-
-
-
-libgedcom.so
), to be linked in the application
- programgedcom.h
), to be used in the sources
- of the application programgedcom-tags.h
) that is also installed,
-but that is automatically included via gedcom.h
libgedcom.so
), to be linked in the application
+ programgedcom.h
), to be used in the sources
+ of the application programgedcom-tags.h
) that is also installed,
+ but that is automatically included via gedcom.h
$PREFIX/share/gedcom-parse
- that contains some additional stuff, but which is not immediately important
- at first. I'll leave the description of the data directory for later.$PREFIX/share/gedcom-parse
+ that contains some additional files, which are not immediately
+important. I'll leave the description of the data directory
+for later.int result;
- ...
- result = gedcom_parse_file("myfamily.ged");
-
- Although this will not provide much information, one thing it does is
- parse the entire file and return the result. The function returns
-0 on success and 1 on failure. No other information is available using
+ ...+ In the above piece of code,void my_message_handler (Gedcom_msg_type type, + Since this is a relatively simple topic, it is discussed before the +actual callback mechanism, although it also uses a callback...
+
+ The library can be used in several different circumstances, both terminal-based
+ and GUI-based. Therefore, it leaves the actual display of the error
+ message up to the application. For this, the application needs to register
+ a callback before parsing the GEDCOM file, which will be called by the library
+ on errors, warnings and messages.
+
+ A typical piece of code would be:
+ +- In the above piece of code,void my_message_handler (Gedcom_msg_type type, char *msg)
- {
- ...
- }
- ...
- gedcom_set_message_handler(my_message_handler);
- ...
- result = gedcom_parse_file("myfamily.ged");
-my_message_handler
is the callback - that will be called for errors (type=ERROR
), warnings (- type=WARNING
) and messages (type=MESSAGE
). The - callback must have the signature as in the example. For errors, the + {
+ ...
+ }
+ ...
+ gedcom_set_message_handler(my_message_handler);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+
my_message_handler
is the callback
+ that will be called for errors (type=ERROR
), warnings (type=WARNING
) and messages (type=MESSAGE
). The
+ callback must have the signature as in the example. For errors, the
msg
passed to the callback will have the format:- Note that the entire string will be properly internationalized, and encoded - in UTF-8 (see "Why UTF-8?" LINK TBD). Also, no newline - is appended, so that the application program can use it in any way it wants. - Warnings are similar, but use "Warning" instead of "Error". Messages - are plain text, without any prefix.Error on line
<lineno>: <actual_message>
-
printf
- is used in the message handler.
+printf
is used in the message handler.+ Using theGedcom_ctxt my_header_start_cb (int level, + +
- Using theGedcom_ctxt my_header_start_cb (int level,
- + Gedcom_val xref,
- + char *tag,
- + char *raw_value,
- + int parsed_tag,
- + Gedcom_val parsed_value)
- {
- printf("The header starts\n");
- return (Gedcom_ctxt)1;
- }
-
- void my_header_end_cb (Gedcom_ctxt self)
- {
- printf("The header ends, context is %d\n", self); /* context + {
+ printf("The header starts\n");
+ return (Gedcom_ctxt)1;
+ }
+
+ void my_header_end_cb (Gedcom_ctxt self)
+ {
+ printf("The header ends, context is %d\n", (int)self); /* context will print as "1" */
- }
-
- ...
- gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, - my_header_end_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");
-gedcom_subscribe_to_record
function, the application - requests to use the specified callbacks as start and end callback. The -end callback is optional: you can passNULL
if you are not -interested in the end callback. The identifiers to use as first argument -to the function (hereREC_HEAD
) are described in the - interface details.
-
- From the name of the function it becomes clear that this function is -specific to complete records. For the separate elements in records -there is another function, which we'll see shortly. Again, the callbacks + }
+
+ ...
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, + my_header_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+
gedcom_subscribe_to_record
function, the application
+ requests to use the specified callbacks as start and end callback. The end
+ callback is optional: you can pass NULL
if you are not interested
+ in the end callback. The identifiers to use as first argument to
+the function (here REC_HEAD
) are described in the
+ interface details.Gedcom_ctxt
type that is used as a result of the start
- callback and as an argument to the end callback is vital for passing context
- necessary for the application. This type is meant to be opaque; in
-fact, it's a void pointer, so you can pass anything via it. The important
- thing to know is that the context that the application returns in the start
- callback will be passed in the end callback as an argument, and as we will
- see shortly, also to all the directly subordinate elements of the record.tag
is the GEDCOM tag in string format, the parsed_tag
- is an integer, for which symbolic values are defined as TAG_HEAD,
- TAG_SOUR,
TAG_DATA,
... and USERTAG
- for the application-specific tags. These values are defined in the
-header gedcom-tags.h
that is installed, and included via
- gedcom.h
(so no need to include gedcom-tags.h
yourself).struct
that will contain the information for the
- header. In the end callback, the application could then e.g. do some
+ Gedcom_ctxt
type that is used as a result of the start
+ callback and as an argument to the end callback is vital for passing context
+ necessary for the application. This type is meant to be opaque; in
+ fact, it's a void pointer, so you can pass anything via it. The important
+ thing to know is that the context that the application returns in the start
+ callback will be passed in the end callback as an argument, and as we will
+ see shortly, also to all the directly subordinate elements of the record.tag
is the GEDCOM tag in string format, the parsed_tag
+ is an integer, for which symbolic values are defined as TAG_HEAD,
+ TAG_SOUR,
TAG_DATA,
... and USERTAG
+ for the application-specific tags. These values are defined in the
+ header gedcom-tags.h
that is installed, and included via
+ gedcom.h
(so no need to include gedcom-tags.h
yourself).struct
(or an object in a C++ application) that will contain the information for the
+ header. In the end callback, the application could then e.g. do some
finalizing operations on the struct
to put it in its database.Gedcom_val
type for the xref
- and parsed_value
arguments was not discussed, see further
+ Gedcom_val
type for the xref
+ and parsed_value
arguments was not discussed, see further
for this)Gedcom_ctxt my_header_source_start_cb(Gedcom_ctxt
+ We will now retrieve the SOUR field (the name of the program that wrote
+ the file) from the header:
+
+ Gedcom_ctxt my_header_source_start_cb(Gedcom_ctxt
parent,
-
- int
- level,
-
- char*
- tag,
-
- char*
- raw_value,
-
- int
- parsed_tag,
-
- Gedcom_val
- parsed_value)
- {
- char *source = GEDCOM_STRING(parsed_value);
- printf("This file was written by %s\n", source);
- return parent;
- }
+
+ int
+ level,
+
+ char*
+ tag,
+
+ char*
+ raw_value,
+
+ int
+ parsed_tag,
+
+ Gedcom_val
+ parsed_value)
+ {
+ char *source = GEDCOM_STRING(parsed_value);
+ printf("This file was written by %s\n", source);
+ return parent;
+ }
+
+ void my_header_source_end_cb(Gedcom_ctxt parent,
+
+ Gedcom_ctxt self,
+
+ Gedcom_val parsed_value)
+ {
+ printf("End of the source description\n");
+ }
+
+ ...
+ gedcom_subscribe_to_element(ELT_HEAD_SOUR,
+
+ my_header_source_start_cb,
+
+ my_header_source_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+
+ The subscription mechanism for elements is similar, only the signatures
+ of the callbacks differ. The signature for the start callback shows
+ that the context of the parent line (here e.g. the struct
that
+describes the header) is passed to this start callback. In this example
+the callback returns that same context, but it could of course return its own
+context object. The end callback is called with both the parent's
+context and its own context, which are the same in this example.
+ The list of identifiers to use as the first argument of the
+subscription function is, again, detailed in the
+ interface details .
+
+ If we look at the other arguments of the start callback, we see the
+level number (the number at the start of the line in the GEDCOM file), the tag
+(e.g. "SOUR"), and then a raw value, a parsed tag and a parsed value. The
+ raw value is just the raw string that occurs as value on the line next to
+ the tag (in UTF-8 encoding). The parsed value is the meaningful value
+ that is parsed from that raw string. The parsed tag is described in
+ the section for record callbacks above.
+
+ The Gedcom_val
type is meant to be an opaque type. The
+ only thing that needs to be known about it is that it can contain specific
+ data types, which have to be retrieved from it using pre-defined macros.
+ These data types are described in the
+ interface details.
- void my_header_source_end_cb(Gedcom_ctxt parent,
-
- Gedcom_ctxt self,
-
- Gedcom_val parsed_value)
+ Some extra notes:
+
+
+ - The
Gedcom_val
argument of the end callback
+ is currently not used. It is there for future enhancements.
+ - There are also two
Gedcom_val
arguments in
+the start callback for records. The first one (xref
) contains the xref_value
corresponding to the cross-reference (or NULL
if there isn't one), the second one (parsed_value
) contains the value that is parsed from the raw_value
. See the interface details.
+
+
+
+ Default callbacks
+
+ As described above, an application doesn't always implement the entire
+ GEDCOM spec, and application-specific tags may have been added by other applications.
+ To preserve this extra data anyway, a default callback can be registered
+ by the application, as in the following example:
+
+ void my_default_cb (Gedcom_ctxt parent,
+ int level, char* tag, char* raw_value, int parsed_tag)
{
- printf("End of the source description\n");
+ ...
}
-
+
...
- gedcom_subscribe_to_element(ELT_HEAD_SOUR,
-
- my_header_source_start_cb,
-
- my_header_source_end_cb);
+ gedcom_set_default_callback(my_default_cb);
...
result = gedcom_parse_file("myfamily.ged");
-
- The subscription mechanism for elements is similar, only the signatures
- of the callbacks differ. The signature for the start callback shows
- that the context of the parent line (e.g. the struct
that describes
- the header) is passed to this start callback. The callback itself
-returns here the same context, but this can be its own context object of
-course. The end callback is called with both the context of the parent
-and the context of itself, which will be the same in the example. Again,
- the list of identifiers to use as a first argument for the subscription
-function are detailed in the
-interface details .
-
- If we look at the other arguments of the start callback, we see the level
- number (the initial number of the line in the GEDCOM file), the tag (e.g.
- "SOUR"), and then a raw value, a parsed tag and a parsed value. The
-raw value is just the raw string that occurs as value on the line next to
-the tag (in UTF-8 encoding). The parsed value is the meaningful value
-that is parsed from that raw string. The parsed tag is described in
-the section for record callbacks.
-
- The Gedcom_val
type is meant to be an opaque type. The
- only thing that needs to be known about it is that it can contain specific
- data types, which have to be retrieved from it using pre-defined macros.
- These data types are described in the
- interface details.
-
- Some extra notes:
-
-
- - The
Gedcom_val
argument of the end callback
- is currently not used. It is there for future enhancements.
- - There is also a
Gedcom_val
argument in the
- start callback for records. This argument is currently a string value
- giving the pointer in string form.
-
-
-
- Default callbacks
-
- As described above, an application doesn't always implement the entire
- GEDCOM spec, and application-specific tags may have been added by other
-applications. To preserve this extra data anyway, a default callback
-can be registered by the application, as in the following example:
-
- void my_default_cb (Gedcom_ctxt parent,
- int level, char* tag, char* raw_value, int parsed_tag)
- {
- ...
- }
-
- ...
- gedcom_set_default_callback(my_default_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");
-
- This callback has a similar signature as the previous ones,
-but it doesn't contain a parsed value. However, it does contain the
-parent context, that was returned by the application for the most specific
-containing tag that the application supported.
-
- Suppose e.g. that this callback is called for some tags in the header
-that are specific to some other application, then our application could
-make sure that the parent context contains the struct or object that represents
-the header, and use the default callback here to add the level, tag and raw_value
- as plain text in a member of that struct or object, thus preserving the
-information. The application can then write this out when the data
-is saved again in a GEDCOM file. To make it more specific, consider
-the following example:
-
+
+ This callback has a signature similar to the previous ones,
+ but it doesn't contain a parsed value. However, it does contain the
+ parent context, which was returned by the application for the most specific
+ containing tag that the application supported.- Note that the default callback will be called for any tag that isn't specifically - subscribed upon by the application, and can thus be called in various contexts. - For simplicity, the example above doesn't take this into account (the -struct header {
- char* source;
- ...
- char* extra_text;
- };
-
- Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char* tag, + char* source;
+ ...
+ char* extra_text;
+ };
+
+ Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char* tag, char *raw_value,
- + int parsed_tag, Gedcom_val parsed_value)
- {
- struct header head = my_make_header_struct();
- return (Gedcom_ctxt)head;
- }
-
- void my_default_cb(Gedcom_ctxt parent, int level, char* tag, char* raw_value, -int parsed_tag)
- {
- struct header head = (struct header)parent;
- my_header_add_to_extra_text(head, level, tag, raw_value);
- }
-
- gedcom_set_default_callback(my_default_cb);
- gedcom_subscribe_to_record(REC_HEAD, my_header_start, NULL);
- ...
- result = gedcom_parse_file(filename);
-
parent
could be of different types, depending
-on the context).parent
could be of different
+types, depending on the context).NULL
. This is e.g. the case if none of the "upper" tags has been subscribed upon.- Thevoid gedcom_set_debug_level (int level, - FILE* trace_output)
-
level
can be one of the following values:+ Thevoid gedcom_set_debug_level (int level, + FILE* trace_output)
+
level
can be one of the following values:trace_output
is NULL
, debugging information
- will be written to stderr
, otherwise the given file handle is
- used (which must be open).trace_output
is NULL
, debugging information
+ will be written to stderr
, otherwise the given file handle
+is used (which must be open).- Thevoid gedcom_set_error_handling (Gedcom_err_mech - mechanism)
-
mechanism
can be one of:+ Thevoid gedcom_set_error_handling (Gedcom_err_mech + mechanism)
+
mechanism
can be one of:IMMED_FAIL
: immediately fail the
+ IMMED_FAIL
: immediately fail the
parsing on an error (this is the default)DEFER_FAIL
: continue parsing after
+ DEFER_FAIL
: continue parsing after
an error, but return a failure code eventuallyIGNORE_ERRORS
: continue parsing after
- an error, return success alwaysIGNORE_ERRORS
: continue parsing
+after an error, return success always- The argument can be:void gedcom_set_compat_handling - (int enable_compat)
-
locale
mechanism (i.e. via the LANG
, LC_ALL
or LC_CTYPE
environment variables), which also controls the gettext
+ mechanism in the application. gedcom-parse
contains an example implementation (utf8-locale.c
and utf8-locale.h
+ in the top directory). Feel free to use it in
+your source code (it is not part of the library and isn't installed anywhere,
+so you need to copy the source and header file into your application).
+ + +Both functions return a pointer to a static buffer that is overwritten on +each call. To function properly, the application must first set the +locale using thechar *convert_utf8_to_locale (char *input, int *conv_failures);
char *convert_locale_to_utf8 (char *input);
setlocale
function (the second step detailed below).
+ All other steps given below, including setting up and closing down the conversion
+handles, are transparently handled by the two functions. NULL
if you are not interested (note that usually, the interesting information is just whether there were
+ conversion failures or not, which is then given by the integer being bigger
+than zero or not). The second function doesn't need this, because any
+locale can be converted to UTF-8.+void convert_set_unknown (const char *unknown);
+++++#include <locale.h> /* for setlocale */
#include <langinfo.h> /* for nl_langinfo */
#include <iconv.h> /* for iconv_* functions */
+++++setlocale(LC_ALL, "");
+++++iconv_t iconv_handle;
...
iconv_handle = iconv_open(nl_langinfo(CODESET), "UTF-8");
if (iconv_handle == (iconv_t) -1)
/* signal an error */
+++++/* char* in_buf is the input buffer, size_t in_len is its length */
/* char* out_buf is the output buffer, size_t out_len is its length */
size_t nconv;
char *in_ptr = in_buf;
char *out_ptr = out_buf;
nconv = iconv(iconv_handle, &in_ptr, &in_len, &out_ptr, &out_len);
If the output buffer is not big enough,+iconv
will return -1 and seterrno
toE2BIG
. Also, thein_ptr
andout_ptr
will point just after the last successfully converted character in the respective buffers, and thein_len
andout_len
will be updated to show the remaining lengths. There can be two strategies here:
++
+Another error case is when the conversion was unsuccessful (if one of the +characters can't be represented in the target character set). The- Make sure from the beginning +that the output buffer is big enough. However, it's difficult to find +an absolute maximum length in advance, even given the length of the input +string.
+
+
+- Do the conversion in several steps, growing the output buffer each time to make more space, and calling
+iconv
+ consecutively until the conversion is complete. This is the preferred +way (a function could be written to encapsulate all this).iconv
function will then also return -1 and seterrno
toEILSEQ
; thein_ptr
will point to the character that couldn't be converted. In that case, again two strategies are possible:
++
+- Just fail the conversion, and show an error. This is not very user friendly, of course.
+
+
+- Skip over the character that can't be converted and append a "?" to the output buffer, then call
+iconv
again. Skipping over a UTF-8 character is fairly simple, as follows from the encoding rules:+
++
+- if the first byte is in binary 0xxxxxxx, then the character is only one byte long, just skip over that byte
+
+
+- if the first byte is in binary 11xxxxxx, then skip over that byte and all bytes 10xxxxxx that follow.
+
+
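The two skipping rules above can be sketched as a small C helper. This is only an illustration: the name `utf8_char_length` is hypothetical and not part of gedcom-parse, and the length argument is an added assumption so that the helper never reads past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of gedcom-parse): returns the number of
   bytes to skip for the UTF-8 character that starts at p, looking at no
   more than len bytes. */
static size_t utf8_char_length(const char *p, size_t len)
{
    const unsigned char *s = (const unsigned char *)p;
    size_t n;

    assert(p != NULL);
    if (len == 0)
        return 0;

    /* The first byte is always skipped: 0xxxxxxx is a complete character. */
    n = 1;

    /* 11xxxxxx starts a multi-byte character: also skip every
       continuation byte 10xxxxxx that follows it. */
    if ((s[0] & 0xC0) == 0xC0) {
        while (n < len && (s[n] & 0xC0) == 0x80)
            n++;
    }
    return n;
}
```

After a failed `iconv` call, advancing `in_ptr` by the returned count and decreasing `in_len` by the same amount would move past the offending character.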
++ The example implementation +mentioned above grows the output buffer dynamically and outputs "?" for characters +that can't be converted.+++iconv_close(iconv_handle);
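The steps above (open, convert, grow on `E2BIG`, substitute on `EILSEQ`, close) could be wrapped in one function along the following lines. This is a minimal sketch and not the code from `utf8-locale.c`: the function name `utf8_to_codeset` is hypothetical, the target codeset is passed explicitly (a real application would typically pass `nl_langinfo(CODESET)` after calling `setlocale`), and the result is a malloc'ed string rather than a static buffer. It assumes an `iconv` implementation that reports unconvertible characters with `EILSEQ`, and it does not flush shift state for stateful encodings.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of gedcom-parse: converts the UTF-8 string
   `input` to `codeset`. Returns a malloc'ed NUL-terminated string, or NULL
   on failure. *failures counts the characters replaced by "?". */
char *utf8_to_codeset(const char *input, const char *codeset, int *failures)
{
    iconv_t cd;
    char *in_ptr = (char *)input;  /* iconv() wants char **, input is not modified */
    size_t in_len, size = 16, out_len;
    char *out_buf, *out_ptr;
    int ok = 1;

    assert(input != NULL && codeset != NULL && failures != NULL);
    *failures = 0;
    in_len = strlen(input);

    cd = iconv_open(codeset, "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;
    out_buf = malloc(size);
    if (out_buf == NULL) {
        iconv_close(cd);
        return NULL;
    }
    out_ptr = out_buf;
    out_len = size - 1;            /* keep one byte for the terminating NUL */

    while (ok && in_len > 0) {
        if (iconv(cd, &in_ptr, &in_len, &out_ptr, &out_len) != (size_t)-1)
            break;                 /* all input was converted */

        if (errno == E2BIG || (errno == EILSEQ && out_len == 0)) {
            /* Output buffer is full: double it and continue where we were. */
            size_t used = (size_t)(out_ptr - out_buf);
            char *bigger = realloc(out_buf, size * 2);
            if (bigger == NULL) {
                ok = 0;
            } else {
                size *= 2;
                out_buf = bigger;
                out_ptr = out_buf + used;
                out_len = size - 1 - used;
            }
        } else if (errno == EILSEQ) {
            /* Skip the offending character (first byte plus any 10xxxxxx
               continuation bytes) and emit "?" instead. */
            do {
                in_ptr++;
                in_len--;
            } while (in_len > 0 && ((unsigned char)*in_ptr & 0xC0) == 0x80);
            *out_ptr++ = '?';
            out_len--;
            (*failures)++;
        } else {
            ok = 0;                /* e.g. EINVAL: truncated sequence at the end */
        }
    }

    iconv_close(cd);
    if (!ok) {
        free(out_buf);
        return NULL;
    }
    *out_ptr = '\0';
    return out_buf;
}
```

The caller owns the returned string and must `free` it. Starting with a deliberately small buffer keeps the growth path exercised; a real implementation would probably start from the input length.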
+ +There are three preprocessor symbols defined for version checks in the header:AC_CHECK_LIB(gedcom, gedcom_parse_file,,
+ AC_MSG_ERROR(Cannot find libgedcom: Please install gedcom-parse))
+AC_MSG_CHECKING(for libgedcom version)
+AC_TRY_RUN([
+#include <stdio.h>
+#include <stdlib.h>
+#include <gedcom.h>
+int
+main()
+{
+if (GEDCOM_PARSE_VERSION >= 1034) exit(0);
+exit(1);
+}],
+ac_gedcom_version_ok='yes',
+ac_gedcom_version_ok='no',
+ac_gedcom_version_ok='no')
+if test "$ac_gedcom_version_ok" = 'yes' ; then
+ AC_MSG_RESULT(ok)
+else
+ AC_MSG_RESULT(not ok)
+ AC_MSG_ERROR(You need at least version 1.34 of gedcom-parse)
+fi
+
GEDCOM_PARSE_VERSION_MAJOR
GEDCOM_PARSE_VERSION_MINOR
GEDCOM_PARSE_VERSION
(GEDCOM_PARSE_VERSION_MAJOR * 1000) + GEDCOM_PARSE_VERSION_MINOR.
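As an illustration of how the combined symbol might be used in application code, here is a small sketch of a compile-time and a run-time check. The `#ifndef` block only provides stand-in values (1 and 34, chosen for illustration) so that the fragment compiles without the library; in a real program the three symbols come from `gedcom.h`. The helper `gedcom_version_at_least` is hypothetical and not part of the library.

```c
#include <assert.h>

/* Stand-in values for illustration only; normally provided by gedcom.h. */
#ifndef GEDCOM_PARSE_VERSION
#define GEDCOM_PARSE_VERSION_MAJOR 1
#define GEDCOM_PARSE_VERSION_MINOR 34
#define GEDCOM_PARSE_VERSION \
    ((GEDCOM_PARSE_VERSION_MAJOR * 1000) + GEDCOM_PARSE_VERSION_MINOR)
#endif

/* Compile-time check: refuse to build against a library that is too old. */
#if GEDCOM_PARSE_VERSION < 1034
#error "gedcom-parse 1.34 or later is required"
#endif

/* Hypothetical helper: non-zero if the headers are at least major.minor. */
static int gedcom_version_at_least(int major, int minor)
{
    /* The minor number occupies the last three decimal digits. */
    assert(minor >= 0 && minor < 1000);
    return GEDCOM_PARSE_VERSION >= major * 1000 + minor;
}
```

Because the combined value is a plain integer, a single comparison is enough; there is no need to compare major and minor numbers separately.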
$Id$-
$Name$
$Id$-
$Name$
- - - + + \ No newline at end of file