X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;ds=sidebyside;f=doc%2Fusage.html;h=a39d0ac13bb5b9d940995e902970e93221c5e8ac;hb=bd6f144d30a06cb896a0c9544e4af5f42848bae7;hp=f47684feb49bf606a63fc61fa86076dc345a2e44;hpb=7a161f98fa3efba595c96577e3ae7eda15b3dec3;p=gedcom-parse.git diff --git a/doc/usage.html b/doc/usage.html index f47684f..a39d0ac 100644 --- a/doc/usage.html +++ b/doc/usage.html @@ -2,306 +2,721 @@
libgedcom.so
), to be linked in the application
-programgedcom.h
), to be used in the sources of
-the application programlibgedcom.so
), to be linked in the
+application programgedcom.h
), to be used in the sources
+ of the application programgedcom-tags.h
) that is also installed,
+ but that is automatically included via gedcom.h
$PREFIX/share/gedcom-parse
- that contains some additional stuff, but which is not immediately important
-at first. I'll leave the description of the data directory for later.$PREFIX/share/gedcom-parse
+ that contains some additional stuff, but which is not immediately
+ important at first. I'll leave the description of the data directory
+ for later.int result;
- ...
- result = gedcom_parse_file("myfamily.ged");
-
- Although this will not provide much information, one thing it does is parse
-the entire file and return the result. The function returns 0 on success
-and 1 on failure. No other information is available using this function
-only.-In the above piece of code,void my_message_handler (Gedcom_msg_type type, -char *msg)
-{
- ...
-}
-...
- gedcom_set_message_handler(my_message_handler);
-...
-result = gedcom_parse_file("myfamily.ged");
-
my_message_handler
is the callback
-that will be called for errors (type=ERROR
), warnings (
-type=WARNING
) and messages (type=MESSAGE
). The
-callback must have the signature as in the example. For errors, the
- msg
passed to the callback will have the format:-Note that the entire string will be properly internationalized, and encoded -in UTF-8 (see "Why UTF-8?" LINK TBD). Also, no newline -is appended, so that the application program can use it in any way it wants. - Warnings are similar, but use "Warning" instead of "Error". Messages -are plain text, without any prefix.Error on line
<lineno>: <actual_message>
-
printf
- is used in the message handler.- Using theGedcom_ctxt my_header_start_cb (int level, -Gedcom_val xref, char *tag)
-{
- printf("The header starts\n");
- return (Gedcom_ctxt)1;
-}
-
-void my_header_end_cb (Gedcom_ctxt self)
-{
- printf("The header ends, context is %d\n", self); /* context -will print as "1" */
-}
-
-...
- gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, my_header_end_cb);
-...
-result = gedcom_parse_file("myfamily.ged");
-
gedcom_subscribe_to_record
function, the application
-requests to use the specified callbacks as start and end callback. The end
-callback is optional: you can pass NULL
if you are not interested
-in the end callback. The identifiers to use as first argument to the
-function (here REC_HEAD
) are described in TBD (use the header
-file for now...).Gedcom_ctxt
type that is used as a result of the start callback
-and as an argument to the end callback is vital for passing context necessary
-for the application. This type is meant to be opaque; in fact, it's
-a void pointer, so you can pass anything via it. The important thing
-to know is that the context that the application returns in the start callback
-will be passed in the end callback as an argument, and as we will see shortly,
-also to all the directly subordinate elements of the record.struct
that will contain the information for the
-header. In the end callback, the application could then e.g. do some
-finalizing operations on the struct
to put it in its database.Gedcom_val
type for the xref
argument
-was not discussed, see further for this)-The subscription mechanism for elements is similar, only the signatures of -the callbacks differ. The signature for the start callback shows that -the context of the parent line (e.g. theGedcom_ctxt my_header_source_start_cb(Gedcom_ctxt -parent,
- - int - level,
- - char* - tag,
- - char* - raw_value,
- - Gedcom_val parsed_value)
-{
- char *source = GEDCOM_STRING(parsed_value);
- printf("This file was written by %s\n", source);
- return parent;
-}
-
-void my_header_source_end_cb(Gedcom_ctxt parent,
- - Gedcom_ctxt self,
- - Gedcom_val parsed_value)
-{
- printf("End of the source description\n");
-}
+ The call togedcom_init
() should be one of the first calls +in your program. The requirement is that it should come before the first +call toiconv_open
(part of the generic character set conversion +feature) in the program, either by your program itself, or indirectly by +the library calls it makes. Practically, it should e.g. come before + any calls to any GTK functions, because GTK usesiconv_open
+ in its initialization. For the same reason it is also advised to put +the-lgedcom
option on the linking of the program as the last +option, so that it's initialization code is run first.
-...
- gedcom_subscribe_to_element(ELT_HEAD_SOUR,
- - my_header_source_start_cb,
- - my_header_source_end_cb);
-...
-result = gedcom_parse_file("myfamily.ged");
-
struct
that describes
-the header) is passed to this start callback. The callback itself returns
-here the same context, but this can be its own context object of course.
- The end callback is called with both the context of the parent and
-the context of itself, which will be the same in the example.Gedcom_val
type is meant to be an opaque type. The
-only thing that needs to be known about it is that it can contain specific
-data types, which have to be retrieved from it using pre-defined macros.
- Currently, the specific types are (with val
of type
-Gedcom_val
):- |
- type checker - |
- cast operator - |
-
null value - |
- GEDCOM_IS_NULL(val) - |
- N/A - |
-
string - |
- GEDCOM_IS_STRING(val) - |
- char* str = GEDCOM_STRING(val); - |
-
date - |
- GEDCOM_IS_DATE(val) - |
- struct date_value dv = GEDCOM_DATE(val)
-; - |
-
Gedcom_val
argument of the end callback
-is currently not used. It is there for future enhancements.Gedcom_val
argument in the start
-callback for records. This argument is currently a string value giving
-the pointer in string form.+ In the above piece of code,void my_message_handler (Gedcom_msg_type type, + char *msg)
+ {
+ ...
+ }
+ ...
+ gedcom_set_message_handler(my_message_handler);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+
my_message_handler
is the
+ callback that will be called for errors (type=ERROR
), warnings
+ (type=WARNING
) and messages (type=MESSAGE
). The
+ callback must have the signature as in the example. For errors,
+the msg
passed to the callback will have the format:+ Note that the entire string will be properly internationalized, and + encoded in UTF-8 (Why UTF-8?). Also, +no newline is appended, so that the application program can use it in any +way it wants. Warnings are similar, but use "Warning" instead of "Error". + Messages are plain text, without any prefix.Error on line
<lineno>: <actual_message>
+
+ printf
is used in the message handler.+ Using theGedcom_ctxt my_header_start_cb (int level,
+ + Gedcom_val xref,
+ + char *tag,
+ + char *raw_value,
+ + int parsed_tag,
+ + Gedcom_val parsed_value)
+ {
+ printf("The header starts\n");
+ return (Gedcom_ctxt)1;
+ }
+
+ void my_header_end_cb (Gedcom_ctxt self)
+ {
+   printf("The header ends, context is %ld\n", (long)self); /* context will print as "1" */
+ }
+
+ ...
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, + my_header_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+
gedcom_subscribe_to_record
function, the
+ application requests to use the specified callbacks as start and end
+callback. The end callback is optional: you can pass NULL
+ if you are not interested in the end callback. The identifiers
+to use as first argument to the function (here REC_HEAD
)
+are described in the interface
+details .Gedcom_ctxt
type that is used as a result of the
+start callback and as an argument to the end callback is vital for passing
+context necessary for the application. This type is meant to be opaque;
+in fact, it's a void pointer, so you can pass anything via it. The
+important thing to know is that the context that the application returns
+in the start callback will be passed in the end callback as an argument,
+and as we will see shortly, also to all the directly subordinate elements
+of the record.tag
is the GEDCOM tag in string format, the parsed_tag
+ is an integer, for which symbolic values are defined as TAG_HEAD,
+ TAG_SOUR,
TAG_DATA,
... and USERTAG
+
for the application-specific tags. These values
+are defined in the header gedcom-tags.h
that is installed,
+and included via gedcom.h
(so no need to include gedcom-tags.h
+ yourself).struct
(or an object in a C++ application)
+ that will contain the information for the header. In the end callback,
+ the application could then e.g. do some finalizing operations on the
+ struct
to put it in its database.Gedcom_val
type for the xref
+ and parsed_value
arguments was not discussed, see further
+ for this)+ The subscription mechanism for elements is similar, only the signatures + of the callbacks differ. The signature for the start callback shows + that the context of the parent line (here e.g. theGedcom_ctxt my_header_source_start_cb(Gedcom_ctxt + parent,
+ + int + level,
+ + char* + tag,
+ + char* + raw_value,
+ + int + parsed_tag,
+ + Gedcom_val + parsed_value)
+ {
+ char *source = GEDCOM_STRING(parsed_value);
+ printf("This file was written by %s\n", source);
+ return parent;
+ }
+
+ void my_header_source_end_cb(Gedcom_ctxt parent,
+ + Gedcom_ctxt self,
+ + Gedcom_val parsed_value)
+ {
+ printf("End of the source description\n");
+ }
+
+ ...
+ gedcom_subscribe_to_element(ELT_HEAD_SOUR,
+ + my_header_source_start_cb,
+ + my_header_source_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+
struct
+ that describes the header) is passed to this start callback. The
+ callback in this example returns the same context, but it could of
+course return its own context object.  The end callback is called
+with both the context of the parent and its own context, which in this
+example will be the same. Again, the list of identifiers to use as
+a first argument for the subscription function is detailed in the interface details.
Gedcom_val
type is meant to be an opaque type. The
+ only thing that needs to be known about it is that it can contain specific
+ data types, which have to be retrieved from it using pre-defined macros.
+ These data types are described in the interface details.
+ Gedcom_val
argument of the end callback
+ is currently not used. It is there for future enhancements.Gedcom_val
arguments
+ in the start callback for records. The first one (xref
+ ) contains the xref_value
corresponding to the cross-reference
+ (or NULL
if there isn't one), the second one (parsed_value
+ ) contains the value that is parsed from the raw_value
. See
+ the interface details
+ .+ This callback has a similar signature as the previous ones, + but it doesn't contain a parsed value. However, it does contain the + parent context, that was returned by the application for the most specific + containing tag that the application supported.void my_default_cb (Gedcom_ctxt parent, int level, + char* tag, char* raw_value, int parsed_tag)
+ {
+ ...
+ }
+
+ ...
+ gedcom_set_default_callback(my_default_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+
+ Note that the default callback will be called for any tag that isn't + specifically subscribed upon by the application, and can thus be called + in various contexts. For simplicity, the example above doesn't take + this into account (thestruct header {
+ char* source;
+ ...
+ char* extra_text;
+ };
+
+ Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char* tag, + char *raw_value,
+ + int parsed_tag, Gedcom_val parsed_value)
+ {
+   struct header* head = my_make_header_struct();
+   return (Gedcom_ctxt)head;
+ }
+
+ void my_default_cb(Gedcom_ctxt parent, int level, char* tag, char* raw_value, int parsed_tag)
+ {
+   struct header* head = (struct header*)parent;
+ my_header_add_to_extra_text(head, level, tag, raw_value);
+ }
+
+ gedcom_set_default_callback(my_default_cb);
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, NULL);
+ ...
+ result = gedcom_parse_file(filename);
+
parent
could be of different
+ types, depending on the context).NULL
. This is e.g. the case if none
+ of the "upper" tags has been subscribed upon.+ Thevoid gedcom_set_debug_level (int level, FILE* +trace_output)
+
level
can be one of the following values:trace_output
is NULL
, debugging information
+ will be written to stderr
, otherwise the given file handle
+ is used (which must be open).+ Thevoid gedcom_set_error_handling (Gedcom_err_mech + mechanism)
+
mechanism
can be one of:IMMED_FAIL
: immediately fail
+the parsing on an error (this is the default)DEFER_FAIL
: continue parsing
+after an error, but return a failure code eventuallyIGNORE_ERRORS
: continue parsing
+ after an error, return success always+ The argument can be:void gedcom_set_compat_handling (int enable_compat)
+
locale
mechanism (i.e. via the LANG
,
+ LC_ALL
or LC_CTYPE
environment variables), which also
+controls the gettext
mechanism in the application.
+gedcom-parse
contains an example implementation (utf8-locale.c
+ and utf8-locale.h
in the "t" subdirectory of the top directory).
+ Feel free to use it in your source code (it is not part of the library,
+and it isn't installed anywhere, so you need to take over the source and
+header file in your application). ++ Both functions return a pointer to a static buffer that is overwritten + on each call. To function properly, the application must first set +the locale using the+char *convert_utf8_to_locale (char *input, int *conv_failures);
char *convert_locale_to_utf8 (char *input);
setlocale
function (the second step detailed
+ below). All other steps given below, including setting up and closing
 down the conversion handles, are transparently handled by the two functions.
+ NULL
if you are not interested
+(note that usually, the interesting information is just whether there
+were conversion failures or not, which is then given by the integer
+being bigger than zero or not). The second function doesn't need this,
+because any locale can be converted to UTF-8.+++void convert_set_unknown (const char *unknown);
++ ++++#include <locale.h> /* for setlocale */
#include <langinfo.h> /* for nl_langinfo */
#include <iconv.h> /* for iconv_* functions */
++ ++++setlocale(LC_ALL, "");
++ ++++iconv_t iconv_handle;
...
iconv_handle = iconv_open(nl_langinfo(CODESET), "UTF-8");
if (iconv_handle == (iconv_t) -1)
/* signal an error */
++ ++++/* char* in_buf is the input buffer, size_t in_len is its length */
/* char* out_buf is the output buffer, size_t out_len is its length */
size_t nconv;
char *in_ptr = in_buf;
char *out_ptr = out_buf;
nconv = iconv(iconv_handle, &in_ptr, &in_len, &out_ptr, &out_len);
If the output buffer is not big enough,+ +iconv
will + return -1 and seterrno
toE2BIG
. Also, +thein_ptr
andout_ptr
will point just after +the last successfully converted character in the respective buffers, and +thein_len
andout_len
will be updated to show +the remaining lengths. There can be two strategies here:
+ ++
+ Another error case is when the conversion was unsuccessful (if one of +the characters can't be represented in the target character set). The +- Make sure from the beginning + that the output buffer is big enough. However, it's difficult to find + an absolute maximum length in advance, even given the length of the input + string.
+
+
+- Do the conversion in several + steps, growing the output buffer each time to make more space, and calling +
+ +iconv
consecutively until the conversion is complete. + This is the preferred way (a function could be written to encapsulate + all this).iconv
function will then also return -1 and seterrno
+ toEILSEQ
; thein_ptr
will point to the character + that couldn't be converted. In that case, again two strategies are +possible:
+ ++
+ +- Just fail the conversion, +and show an error. This is not very user friendly, of course.
+
+
+- Skip over the character that + can't be converted and append a "?" to the output buffer, then call
+ ++ iconv
again. Skipping over a UTF-8 character is fairly simple, + as follows from the encoding rules + :+ +
++
+ +- if the first byte is in +binary 0xxxxxxx, then the character is only one byte long, just skip over +that byte
+
+
+- if the first byte is in +binary 11xxxxxx, then skip over that byte and all bytes 10xxxxxx that follow.
+ +
+
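The two skipping rules above can be sketched as a small helper. This is only an illustration: the name utf8_char_length is my own and is not part of the library or of the example implementation, and it assumes a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of libgedcom): return the number of bytes
   to skip to get past the UTF-8 character that starts at p. */
size_t utf8_char_length(const char *p)
{
  const unsigned char *s = (const unsigned char *)p;
  size_t len = 1;

  assert(p != NULL && *p != '\0');

  /* 0xxxxxxx: the character is one byte long */
  if ((s[0] & 0x80) == 0x00)
    return 1;

  /* Otherwise skip the first byte and all 10xxxxxx bytes that follow.
     The loop stops at the terminating NUL, which is not 10xxxxxx. */
  while ((s[len] & 0xC0) == 0x80)
    len++;
  return len;
}
```

After an EILSEQ error, the application could advance in_ptr by this length, decrease in_len by the same amount, append a "?" to the output buffer and call iconv again.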
++ The example implementation + mentioned above grows the output buffer dynamically and outputs "?" for characters + that can't be converted.+++iconv_close(iconv_handle);
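The preferred strategy for E2BIG (growing the output buffer and calling iconv again) could look roughly like the sketch below. It is not the code from utf8-locale.c: the function name convert_from_utf8 and the initial buffer size are assumptions of mine, the target character set is passed explicitly instead of being taken from nl_langinfo(CODESET), and a character that can't be converted simply fails the conversion here.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of libgedcom): convert the UTF-8 string
   'input' to 'to_charset', growing the output buffer on E2BIG.
   Returns a malloc'ed NUL-terminated string that the caller must free,
   or NULL on any failure (including EILSEQ).
   Stateful target encodings are not handled in this sketch. */
char *convert_from_utf8(const char *to_charset, const char *input)
{
  iconv_t cd;
  size_t in_len, out_size = 16, out_len;
  char *in_ptr, *out_buf, *out_ptr;

  assert(to_charset != NULL && input != NULL);

  cd = iconv_open(to_charset, "UTF-8");
  if (cd == (iconv_t) -1)
    return NULL;

  in_ptr = (char *)input;     /* iconv wants char**, input is not modified */
  in_len = strlen(input);
  out_buf = malloc(out_size);
  if (out_buf == NULL) {
    iconv_close(cd);
    return NULL;
  }
  out_ptr = out_buf;
  out_len = out_size - 1;     /* keep one byte for the terminating NUL */

  while (in_len > 0) {
    if (iconv(cd, &in_ptr, &in_len, &out_ptr, &out_len) != (size_t) -1)
      break;                  /* everything converted */
    if (errno == E2BIG) {
      /* Output buffer full: double it and continue where iconv stopped */
      size_t used = (size_t)(out_ptr - out_buf);
      char *bigger = realloc(out_buf, out_size * 2);
      if (bigger == NULL)
        goto fail;
      out_buf = bigger;
      out_size *= 2;
      out_ptr = out_buf + used;
      out_len = out_size - 1 - used;
    } else {
      goto fail;              /* EILSEQ or EINVAL: just fail */
    }
  }
  *out_ptr = '\0';
  iconv_close(cd);
  return out_buf;

fail:
  free(out_buf);
  iconv_close(cd);
  return NULL;
}
```

A caller that follows the steps above would pass nl_langinfo(CODESET) as the first argument, after having called setlocale.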
+ There are three preprocessor symbols defined for version checks in the + header:AC_CHECK_LIB(gedcom, gedcom_parse_file,,
+ AC_MSG_ERROR(Cannot + find libgedcom: Please install gedcom-parse))
+ AC_MSG_CHECKING(for libgedcom version)
+ AC_TRY_RUN([
+ #include <stdio.h>
+ #include <stdlib.h>
+ #include <gedcom.h>
+ int
+ main()
+ {
+ if (GEDCOM_PARSE_VERSION >= 1034) exit(0);
+ exit(1);
+ }],
+ ac_gedcom_version_ok='yes',
+ ac_gedcom_version_ok='no',
+ ac_gedcom_version_ok='no')
+ if test "$ac_gedcom_version_ok" = 'yes' ; then
+ AC_MSG_RESULT(ok)
+ else
+ AC_MSG_RESULT(not ok)
+ AC_MSG_ERROR(You need at least version 1.34 of gedcom-parse)
+ fi
+
GEDCOM_PARSE_VERSION_MAJOR
GEDCOM_PARSE_VERSION_MINOR
GEDCOM_PARSE_VERSION
(GEDCOM_PARSE_VERSION_MAJOR * 1000) + GEDCOM_PARSE_VERSION_MINOR.
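As a small illustration of this formula, the sketch below computes the combined number from a major and minor version. The function name make_version_number is hypothetical and only mirrors the arithmetic; the real values come from the macros in the header.

```c
#include <assert.h>

/* Hypothetical helper (not part of libgedcom): combine a major and minor
   version in the same way as GEDCOM_PARSE_VERSION does. */
int make_version_number(int major, int minor)
{
  /* The formula leaves room for minor versions 0..999 only */
  assert(major >= 0 && minor >= 0 && minor < 1000);
  return (major * 1000) + minor;
}
```

With this encoding, version 1.34 gives 1034, which is the value tested in the configure check above.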
$Id$+ + +
$Name$
+ + + +