X-Git-Url: https://git.dlugolecki.net.pl/?a=blobdiff_plain;ds=inline;f=doc%2Fusage.html;h=0a8c7e85c5ece5d1352d099348de28d8366d81f0;hb=dbe61389396a4fb8ee50f6a5bd5fe4219ed43290;hp=76a11bfab4748ab93db208346238bd4ba927dc72;hpb=f789da85454184a473145e9a1d1260b9e09afcd1;p=gedcom-parse.git
diff --git a/doc/usage.html b/doc/usage.html
index 76a11bf..0a8c7e8 100644
--- a/doc/usage.html
+++ b/doc/usage.html
@@ -1,416 +1,746 @@
- - -
-libgedcom.so
), to be linked in the application
- programgedcom.h
), to be used in the sources
-of the application programlibgedcom.so
), to be linked in the
+application program, which implements the callback parsergedcom.h
), to be used in the sources
+ of the application programgedcom-tags.h
) that is also installed,
+ but that is automatically included via gedcom.h
libgedcom.so
is also needed in this case, because the object model uses the callback parser internally):libgedcom_gom.so
), to be linked in the application program, which implements the C object modelgom.h
), to be used in the sources of the application program$PREFIX/share/gedcom-parse
- that contains some additional stuff, but which is not immediately important
- at first. I'll leave the description of the data directory for later.$PREFIX/share/gedcom-parse
+ that contains additional data files that are not needed
+ at first. The description of the data directory is left
+ for later.gedcom.h
header is assumed, as everywhere
+in this manual):int result;
- ...
- result = gedcom_parse_file("myfamily.ged");
-
- Although this will not provide much information, one thing it does is
-parse the entire file and return the result. The function returns
-0 on success and 1 on failure. No other information is available using
-this function only.- In the above piece of code,void my_message_handler (Gedcom_msg_type type, - char *msg)
- {
- ...
- }
- ...
- gedcom_set_message_handler(my_message_handler);
...
- result = gedcom_parse_file("myfamily.ged");
-
my_message_handler
is the callback
- that will be called for errors (type=ERROR
), warnings (
- type=WARNING
) and messages (type=MESSAGE
). The
-callback must have the signature as in the example. For errors, the
- msg
passed to the callback will have the format:- Note that the entire string will be properly internationalized, and encoded - in UTF-8 (see "Why UTF-8?" LINK TBD). Also, no newline - is appended, so that the application program can use it in any way it wants. - Warnings are similar, but use "Warning" instead of "Error". Messages - are plain text, without any prefix.Error on line
<lineno>: <actual_message>
-
printf
- is used in the message handler.Gedcom_ctxt my_header_start_cb (int level,
- Gedcom_val xref, char *tag)
- {
- printf("The header starts\n");
- return (Gedcom_ctxt)1;
- }
-
- void my_header_end_cb (Gedcom_ctxt self)
- {
- printf("The header ends, context is %d\n", self); /* context
- will print as "1" */
- }
-
- ...
- gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb,
-my_header_end_cb);
+ gedcom_init();
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+
+ Although this will not provide much information, one thing it does
+ is parse the entire file and return the result. The function returns
+ 0 on success and 1 on failure. No other information is available
+using this function only.gedcom.h
and gom.h
is required):- Using theint result;
...
- result = gedcom_parse_file("myfamily.ged");
-
gedcom_subscribe_to_record
function, the application
- requests to use the specified callbacks as start and end callback. The end
- callback is optional: you can pass NULL
if you are not interested
- in the end callback. The identifiers to use as first argument to the
- function (here REC_HEAD
) are described in the
- interface details.Gedcom_ctxt
type that is used as a result of the start
-callback and as an argument to the end callback is vital for passing context
-necessary for the application. This type is meant to be opaque; in fact,
-it's a void pointer, so you can pass anything via it. The important
-thing to know is that the context that the application returns in the start
-callback will be passed in the end callback as an argument, and as we will
-see shortly, also to all the directly subordinate elements of the record.gom_parse_file
will build the C object model, which is then a complete representation of the GEDCOM file.gedcom_init
() should be one of the first calls
+in your program. The requirement is that it should come before the first
+call to iconv_open
(part of the generic character set conversion
+feature) in the program, either by your program itself, or indirectly by
+the library calls it makes. Practically, it should e.g. come before
+ any calls to any GTK functions, because GTK uses iconv_open
+ in its initialization.-lgedcom
option
+on the linking of the program as the last option, so that its initialization
+code is run first. In the case of using the C object model, the linking
+options should be: -lgedcom_gom -lgedcom
gedcom_init()
also initializes locale handling by calling setlocale(LC_ALL, "")
, in case the application would not do this (it doesn't hurt for the application to do the same).struct
that will contain the information for the
- header. In the end callback, the application could then e.g. do some
- finalizing operations on the struct
to put it in its database.Gedcom_val
type for the xref
argument
- was not discussed, see further for this)Gedcom_ctxt my_header_source_start_cb(Gedcom_ctxt - parent,
- - int - level,
- - char* - tag,
- - char* - raw_value,
- - Gedcom_val parsed_value)
- {
- char *source = GEDCOM_STRING(parsed_value);
- printf("This file was written by %s\n", source);
- return parent;
- }
-
- void my_header_source_end_cb(Gedcom_ctxt parent,
- - Gedcom_ctxt self,
- - Gedcom_val parsed_value)
- {
- printf("End of the source description\n");
- }
-
- ...
- gedcom_subscribe_to_element(ELT_HEAD_SOUR,
- - my_header_source_start_cb,
- - my_header_source_end_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");
+ A typical piece of code would be (gom_parse_file
would be called in case the C object model is used):
+ +- The subscription mechanism for elements is similar, only the signatures -of the callbacks differ. The signature for the start callback shows -that the context of the parent line (e.g. thevoid my_message_handler (Gedcom_msg_type type, + char *msg)
+ {
+ ...
+ }
+ ...
+ gedcom_set_message_handler(my_message_handler);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
struct
that describes - the header) is passed to this start callback. The callback itself -returns here the same context, but this can be its own context object of -course. The end callback is called with both the context of the parent -and the context of itself, which will be the same in the example. Again, -the list of identifiers to use as a first argument for the subscription function -are detailed in the interface -details .
-
- If we look at the other arguments of the start callback, we see the level - number (the initial number of the line in the GEDCOM file), the tag (e.g. - "SOUR"), and then a raw value and a parsed value. The raw value is -just the raw string that occurs as value on the line next to the tag (in -UTF-8 encoding). The parsed value is the meaningful value that is parsed -from that raw string.
-
- TheGedcom_val
type is meant to be an opaque type. The - only thing that needs to be known about it is that it can contain specific - data types, which have to be retrieved from it using pre-defined macros. - These data types are described in the - interface details.
-
- Some extra notes:
- --
- -- The
-Gedcom_val
argument of the end callback - is currently not used. It is there for future enhancements.- There is also a
- -Gedcom_val
argument in the -start callback for records. This argument is currently a string value -giving the pointer in string form.Default callbacks
- As described above, an application doesn't always implement the entire -GEDCOM spec, and application-specific tags may have been added by other applications. - To preserve this extra data anyway, a default callback can be registered -by the application, as in the following example:
-
- -- This callback has a similar signature as the previous ones, but -it doesn't contain a parsed value. However, it does contain the parent -context, that was returned by the application for the most specific containing -tag that the application supported.void my_default_cb (Gedcom_ctxt parent, -int level, char* tag, char* raw_value)
- {
- ...
- }
+ In the above piece of code,my_message_handler
is the + callback that will be called for errors (type=ERROR
), warnings + (type=WARNING
) and messages (type=MESSAGE
). The + callback must have the signature as in the example. For errors, +themsg
passed to the callback will have the format:
+ ++ Note that the entire string will be properly internationalized, and + encoded in UTF-8 (Why UTF-8?). Also, +no newline is appended, so that the application program can use it in any +way it wants. Warnings are similar, but use "Warning" instead of "Error". + Messages are plain text, without any prefix.Error on line
<lineno>: <actual_message>
+
+
+ With this in place, the resulting code will already show errors and + warnings produced by the parser, e.g. on the terminal if a simple+ printf
is used in the message handler.
+ +
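As an illustration of the message callback described above, here is a minimal sketch of a handler that prints errors and warnings to `stderr` and plain messages to `stdout`. The `Gedcom_msg_type` typedef below is only a stand-in (an assumption, including the order of its values) so that the sketch compiles without the library; a real program includes `gedcom.h` instead. The `last_line` buffer is a hypothetical helper, not part of the library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the type declared in gedcom.h (assumption: the real
   definition may differ). Drop this typedef when using the library. */
typedef enum { ERROR, WARNING, MESSAGE } Gedcom_msg_type;

/* Hypothetical buffer holding the last line that was printed. */
static char last_line[512];

/* Handler with the signature shown in the manual. The parser appends no
   newline to msg, so the handler adds one itself. */
void my_message_handler(Gedcom_msg_type type, char *msg)
{
  /* Errors and warnings go to stderr, plain messages to stdout. */
  FILE *out = (type == MESSAGE) ? stdout : stderr;

  assert(msg != NULL);
  snprintf(last_line, sizeof last_line, "%s\n", msg);
  fputs(last_line, out);
}
```

In an application, the handler would be registered with `gedcom_set_message_handler(my_message_handler);` before the call to `gedcom_parse_file`, as in the example above.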
+Data callback mechanism
+ The most important use of the parser is of course to get the data +out of the GEDCOM file. This section focuses on the callback mechanism (see here for the C object model). In fact, the mechanism involves two levels.
+
+ The primary level is that each of the sections in a GEDCOM file is + notified to the application code via a "start element" callback and an + "end element" callback (much like in a SAX interface for XML), i.e. when + a line containing a certain tag is parsed, the "start element" callback + is called for that tag, and when all its subordinate lines with their +tags have been processed, the "end element" callback is called for the +original tag. Since GEDCOM is hierarchical, this results in properly +nested calls to appropriate "start element" and "end element" callbacks.
+
+ However, it would be typical for a genealogy program to support only + a subset of the GEDCOM standard, certainly a program that is still under + development. Moreover, under GEDCOM it is allowed for an application + to define its own tags, which will typically not be supported by +another application. Still, in that case, data preservation is important; + it would hardly be accepted that information that is not understood by + a certain program is just removed.
+
+ Therefore, the second level of callbacks involves a "default callback". + An application needs to subscribe to callbacks for tags it does support, + and needs to provide a "default callback" which will be called for tags +it doesn't support. The application can then choose to just store +the information that comes via the default callback in plain textual format.
+
+ After this introduction, let's see what the API looks like...
+
+ +Start and end callbacks
+ +Callbacks for records
+ As a simple example, we will get some information from the header +of a GEDCOM file. First, have a look at the following piece of code:
+
+ ++ Using theGedcom_ctxt my_header_start_cb (Gedcom_rec rec,
+ int level,
+ + Gedcom_val xref,
+ + char *tag,
+ + char *raw_value,
+ + int parsed_tag,
+ + Gedcom_val parsed_value)
+ {
+ printf("The header starts\n");
+ return (Gedcom_ctxt)1;
+ }
+
+ void my_header_end_cb (Gedcom_rec rec, Gedcom_ctxt self)
+ {
+ printf("The header ends, context is %d\n", (int)self); + /* context will print as "1" */
+ }
+
+ ...
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, + my_header_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+gedcom_subscribe_to_record
function, the + application requests to use the specified callbacks as start and end +callback. The end callback is optional: you can passNULL
+ if you are not interested in the end callback. The identifiers +to use as first argument to the function (hereREC_HEAD
) +are described in the interface +details . These are also passed as first argument in the callbacks (theGedcom_rec
argument).
+
+ From the name of the function it becomes clear that this function +is specific to complete records. For the separate elements in records + there is another function, which we'll see shortly. Again, the callbacks + need to have the signatures as shown in the example.
+
+ TheGedcom_ctxt
type that is used as a result of the +start callback and as an argument to the end callback is vital for passing +context necessary for the application. This type is meant to be opaque; +in fact, it's a void pointer, so you can pass anything via it. The +important thing to know is that the context that the application returns +in the start callback will be passed in the end callback as an argument, +and as we will see shortly, also to all the directly subordinate elements +of the record.
- ...
- gedcom_set_default_callback(my_default_cb);
- ...
- result = gedcom_parse_file("myfamily.ged");
-
-
- Suppose e.g. that this callback is called for some tags in the header that -are specific to some other application, then our application could make sure -that the parent context contains the struct or object that represents the -header, and use the default callback here to add the level, tag and raw_value -as plain text in a member of that struct or object, thus preserving the information. - The application can then write this out when the data is saved again -in a GEDCOM file. To make it more specific, consider the following example:
- -- Note that the default callback will be called for any tag that isn't specifically -subscribed upon by the application, and can thus be called in various contexts. - For simplicity, the example above doesn't take this into account (the -struct header {
- char* source;
- ...
- char* extra_text;
- };
-
- Gedcom_ctxt my_header_start_cb(int level, Gedcom_val xref, char* tag)
- {
- struct header head = my_make_header_struct();
- return (Gedcom_ctxt)head;
- }
-
- void my_default_cb(Gedcom_ctxt parent, int level, char* tag, char* raw_value)
- {
- struct header head = (struct header)parent;
- my_header_add_to_extra_text(head, level, tag, raw_value);
- }
-
- gedcom_set_default_callback(my_default_cb);
- gedcom_subscribe_to_record(REC_HEAD, my_header_start, NULL);
- ...
- result = gedcom_parse_file(filename);
-parent
could be of different types, depending on -the context).
- -
-Other API functions
- Although the above describes the basic interface of libgedcom, there are -some other functions that allow to customize the behaviour of the library. - These will be explained in the current section.
-
- -Debugging
- The library can generate various debugging output, not only from itself, -but also the debugging output generated by the yacc parser. By default, -no debugging output is generated, but this can be customized using the following -function:
- -- Thevoid gedcom_set_debug_level (int level, -FILE* trace_output)
-level
can be one of the following values:
- --
- If the- 0: no debugging information (this is the default)
-- 1: only debugging information from libgedcom -itself
-- 2: debugging information from libgedcom and -yacc
- -trace_output
isNULL
, debugging information -will be written tostderr
, otherwise the given file handle is -used (which must be open).
+ Thetag
is the GEDCOM tag in string format, theparsed_tag
+ is an integer, for which symbolic values are defined asTAG_HEAD,
+TAG_SOUR,
TAG_DATA,
... andUSERTAG +
for the application-specific tags. These values +are defined in the header
gedcom-tags.h
that is installed, +and included viagedcom.h
(so no need to includegedcom-tags.h
+ yourself).
+
+ The example passes a simple integer as context, but an application + could e.g. pass astruct
(or an object in a C++ application) + that will contain the information for the header. In the end callback, + the application could then e.g. do some finalizing operations on the +struct
to put it in its database.
+
+ (Note that theGedcom_val
type for thexref
+ andparsed_value
arguments was not discussed, see further + for this)
+
+ +Callbacks for elements
+ We will now retrieve the SOUR field (the name of the program that +wrote the file) from the header:
+ ++ The subscription mechanism for elements is similar, only the signatures + of the callbacks differ. The signature for the start callback shows + that the context of the parent line (here e.g. theGedcom_ctxt my_header_source_start_cb(Gedcom_elt elt,
+ + Gedcom_ctxt + parent,
+ + int + level,
+ + char* + tag,
+ + char* + raw_value,
+ + int + parsed_tag,
+ + Gedcom_val + parsed_value)
+ {
+ char *source = GEDCOM_STRING(parsed_value);
+ printf("This file was written by %s\n", source);
+ return parent;
+ }
+
+ void my_header_source_end_cb(Gedcom_elt elt,
+ Gedcom_ctxt parent,
+ + Gedcom_ctxt self,
+ + Gedcom_val parsed_value)
+ {
+ printf("End of the source description\n");
+ }
+
+ ...
+ gedcom_subscribe_to_element(ELT_HEAD_SOUR,
+ + my_header_source_start_cb,
+ + my_header_source_end_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
+struct
+ that describes the header) is passed to this start callback. The + callback itself returns here in this example the same context, but this +can be its own context object of course. The end callback is called +with both the context of the parent and the context of itself, which in this +example will be the same. Again, the list of identifiers to use as +a first argument for the subscription function are detailed in the interface details . Again, these are passed as first argument in the callback (theGedcom_elt
argument).
+
+ If we look at the other arguments of the start callback, we see the + level number (the initial number of the line in the GEDCOM file), the tag + (e.g. "SOUR"), and then a raw value, a parsed tag and a parsed value. The + raw value is just the raw string that occurs as value on the line next +to the tag (in UTF-8 encoding). The parsed value is the meaningful +value that is parsed from that raw string. The parsed tag is described +in the section for record callbacks above.
+
+ TheGedcom_val
type is meant to be an opaque type. The + only thing that needs to be known about it is that it can contain specific + data types, which have to be retrieved from it using pre-defined macros. + These data types are described in the interface details. +
- -Error treatment
- One of the previous sections already described the callback to be registered -to get error messages. The library also allows to customize what happens -on an error, using the following function:
- -void gedcom_set_error_handling (Gedcom_err_mech -mechanism)
+ Some extra notes:
+ + ++
+ + +- The
+Gedcom_val
argument of the end callback + is currently not used. It is there for future enhancements.- There are also two
+ + +Gedcom_val
arguments + in the start callback for records. The first one (xref
+ ) contains thexref_value
corresponding to the cross-reference + (orNULL
if there isn't one), the second one (parsed_value
+ ) contains the value that is parsed from theraw_value
. See + the interface details + .Default callbacks
+ As described above, an application doesn't always implement the entire + GEDCOM spec, and application-specific tags may have been added by other + applications. To preserve this extra data anyway, a default callback + can be registered by the application, as in the following example:
+
+ +- Thevoid my_default_cb (Gedcom_elt elt, Gedcom_ctxt parent, int level, + char* tag, char* raw_value, int parsed_tag)
+ {
+ ...
+ }
+
+ ...
+ gedcom_set_default_callback(my_default_cb);
+ ...
+ result = gedcom_parse_file("myfamily.ged");
mechanism
can be one of:
- --
- This doesn't influence the generation of error or warning messages, only -the behaviour of the parser and its return code.- -
IMMED_FAIL
: immediately fail the parsing -on an error (this is the default)- -
DEFER_FAIL
: continue parsing after -an error, but return a failure code eventually- - -
IGNORE_ERRORS
: continue parsing after -an error, return success always
+ This callback has a similar signature as the previous ones, + but it doesn't contain a parsed value. However, it does contain the + parent context, that was returned by the application for the most specific + containing tag that the application supported.
- -Compatibility mode
- Applications are not necessarily true to the GEDCOM spec (or use a different -version than 5.5). The intention is that the library is resilient to -this, and goes in compatibility mode for files written by specific programs -(detected via the HEAD.SOUR tag). This compatibility mode can be enabled -and disabled via the following function:
-
- -void gedcom_set_compat_handling - (int enable_compat)
+ Suppose e.g. that this callback is called for some tags in the header + that are specific to some other application, then our application could + make sure that the parent context contains the struct or object that represents + the header, and use the default callback here to add the level, tag and + raw_value as plain text in a member of that struct or object, thus preserving + the information. The application can then write this out when the +data is saved again in a GEDCOM file. To make it more specific, consider + the following example:
+ +- The argument can be:struct header {
+ char* source;
+ ...
+ char* extra_text;
+ };
+
+ Gedcom_ctxt my_header_start_cb(Gedcom_rec rec, int level, Gedcom_val xref, char* tag, + char *raw_value,
+ + int parsed_tag, Gedcom_val parsed_value)
+ {
+ struct header *head = my_make_header_struct();
+ return (Gedcom_ctxt)head;
+ }
+
+ void my_default_cb(Gedcom_elt elt, Gedcom_ctxt parent, int level, char* tag, char* +raw_value, int parsed_tag)
+ {
+ struct header *head = (struct header *)parent;
+ my_header_add_to_extra_text(head, level, tag, raw_value);
+ }
+
+ gedcom_set_default_callback(my_default_cb);
+ gedcom_subscribe_to_record(REC_HEAD, my_header_start_cb, NULL);
+ ...
+ result = gedcom_parse_file(filename);
- --
- Note that, currently, no actual compatibility code is present, but this -is on the to-do list.- 0: disable compatibility mode
-- 1: allow compatibility mode (this is the default)
- -
-
- -
-$Id$-
$Name$-+ Note that the default callback will be called for any tag that isn't + specifically subscribed upon by the application, and can thus be called + in various contexts. For simplicity, the example above doesn't take + this into account (theparent
could be of different + types, depending on the context).
+
+ Note also that the default callback is not called when the parent context + isNULL
. This is e.g. the case if none + of the "upper" tags has been subscribed to.
+ + +
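Since the default callback can be called with parent contexts of different types, one possible approach is to start every context object with a small discriminator. The following is only a sketch under stated assumptions: the `Gedcom_ctxt` typedef follows the manual's statement that it is a void pointer, the `int` used for `Gedcom_elt` is a stand-in so the code compiles without `gedcom.h`, and `enum ctxt_kind` plus the two helper functions are hypothetical application code, not part of the library.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins so the sketch compiles without gedcom.h (assumptions). */
typedef void *Gedcom_ctxt;
typedef int Gedcom_elt;

/* Hypothetical discriminator, stored first in every context object. */
enum ctxt_kind { CTXT_HEADER, CTXT_OTHER };

struct header {
  enum ctxt_kind kind;  /* always CTXT_HEADER for this struct */
  char *source;
  char *extra_text;     /* unsupported lines, kept as plain text */
};

/* Allocate an empty header context; returns NULL when out of memory. */
struct header *my_make_header_struct(void)
{
  struct header *head = calloc(1, sizeof *head);
  if (head != NULL)
    head->kind = CTXT_HEADER;
  return head;
}

/* Append "<level> <tag> <raw_value>\n" to head->extra_text. */
void my_header_add_to_extra_text(struct header *head, int level,
                                 const char *tag, const char *raw_value)
{
  size_t old_len = head->extra_text ? strlen(head->extra_text) : 0;
  int add_len = snprintf(NULL, 0, "%d %s %s\n", level, tag, raw_value);
  char *grown;

  if (add_len < 0)
    return;
  grown = realloc(head->extra_text, old_len + (size_t)add_len + 1);
  if (grown == NULL)
    return;             /* keep what was collected so far */
  snprintf(grown + old_len, (size_t)add_len + 1, "%d %s %s\n",
           level, tag, raw_value);
  head->extra_text = grown;
}

void my_default_cb(Gedcom_elt elt, Gedcom_ctxt parent, int level,
                   char *tag, char *raw_value, int parsed_tag)
{
  (void)elt;
  (void)parsed_tag;
  assert(tag != NULL);
  /* Reading the discriminator through the context pointer is only valid
     because every context struct starts with an enum ctxt_kind member. */
  if (parent != NULL && *(enum ctxt_kind *)parent == CTXT_HEADER)
    my_header_add_to_extra_text((struct header *)parent, level, tag,
                                raw_value ? raw_value : "");
}
```

The `NULL` check on `parent` is purely defensive here. Contexts of other kinds are silently ignored in this sketch; a real application would dispatch on `kind` to the matching struct type.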
+ +Other API functions
+ + Although the above describes the basic interface of the gedcom parser, there + are some other functions that allow customizing the behaviour of the library. + These are explained in this section.
+
+ + +Debugging
+ The library can generate various debugging output, not only from itself, + but also the debugging output generated by the yacc parser. By default, + no debugging output is generated, but this can be customized using the +following function:
+ + ++ Thevoid gedcom_set_debug_level (int level, FILE* +trace_output)
+level
can be one of the following values:
+ + ++
+ If the- 0: no debugging information (this is the + default)
+- 1: only debugging information from libgedcom + itself
+- 2: debugging information from libgedcom + and yacc
+ + +trace_output
isNULL
, debugging information + will be written tostderr
, otherwise the given file handle + is used (which must be open).
+
+ + +Error treatment
+ One of the previous sections already described the callback to be +registered to get error messages. The library also allows customizing +what happens on an error, using the following function:
+ + ++ Thevoid gedcom_set_error_handling (Gedcom_err_mech + mechanism)
+mechanism
can be one of:
+ + ++
+ This doesn't influence the generation of error or warning messages, + only the behaviour of the parser and its return code.- +
IMMED_FAIL
: immediately fail +the parsing on an error (this is the default)- +
DEFER_FAIL
: continue parsing +after an error, but return a failure code eventually- + + +
IGNORE_ERRORS
: continue parsing + after an error, return success always
+
+ + +Compatibility mode
+ Applications are not necessarily true to the GEDCOM spec (or use a +different version than 5.5). The intention is that the library is +resilient to this, and goes into compatibility mode for files written by specific +programs (detected via the HEAD.SOUR tag). This compatibility mode +can be enabled and disabled via the following function:
+
+ + ++ The argument can be:void gedcom_set_compat_handling (int enable_compat)
+
+ + ++
Currently, there is initial compatibility support for ftree and Lifelines (3.0.2).- 0: disable compatibility mode
+- 1: allow compatibility mode (this is the +default)
+ + +
+
+ +
+Converting character sets
+ All strings passed by the GEDCOM parser to the application are in UTF-8 + encoding. Typically, an application needs to convert this to something + else to be able to display it.
+
+ The most common case is that the output character set is controlled by +thelocale
mechanism (i.e. via theLANG
,+ LC_ALL
orLC_CTYPE
environment variables), which also +controls thegettext
mechanism in the application.
+
+
+ + The source distribution of+gedcom-parse
contains a library implementing helper functions for UTF-8 encoding (see +the "utf8" subdirectory of the top directory). Feel free to use + it in your source code. It isn't installed anywhere, so you need +to copy the source and header files into your application. Note that on +some systems it uses libcharset, which is also included in this subdirectory. +
+
+ Its interface contains first of all the following two help functions:
+ ++The +first one returns 1 if the given input is a valid UTF-8 string, it returns +0 otherwise, the second gives the number of UTF-8 characters in the given +input. Note that the second function assumes that the input is valid +UTF-8, and gives unpredictable results if it isn't.int is_utf8_string (char *input);
int utf8_strlen (char *input);
+
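To make the notion of "number of UTF-8 characters" concrete, the following sketch counts characters by skipping continuation bytes. It is not the library's implementation of `utf8_strlen`; the name `sketch_utf8_strlen` is hypothetical and, like the function described above, the sketch assumes that the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Count the characters in a valid UTF-8 string: every byte that is not a
   continuation byte (binary 10xxxxxx) starts a new character. */
int sketch_utf8_strlen(const char *input)
{
  int count = 0;

  assert(input != NULL);
  for (; *input != '\0'; input++) {
    if (((unsigned char)*input & 0xC0) != 0x80)
      count++;
  }
  return count;
}
```

For an input such as "é" (two bytes in UTF-8), this returns 1, whereas `strlen` would return 2.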
+For conversion, the following functions are available:
++++char *convert_utf8_to_locale (char *input, int *conv_failures);
char *convert_locale_to_utf8 (char *input);++ + Both functions return a pointer to a static buffer that is overwritten + on each call. To function properly, the application must first set +the locale using thesetlocale
function (the second step detailed + below). All other steps given below, including setting up and closing + down the conversion handles, are transparently handled by the two functions. +
+
+ If you pass a pointer to an integer to the first function, it will be +set to the number of conversion failures, i.e. characters that couldn't +be converted; you can also just passNULL
if you are not interested +(note that usually, the interesting information is just whether there +were conversion failures or not, which is then given by the integer +being bigger than zero or not). The second function doesn't need this, +because any locale can be converted to UTF-8.
+
+ You can change the "?" that is output for characters that can't be converted + to any string you want, using the following function before the conversion + calls:
+ ++++void convert_set_unknown (const char *unknown);
+ If you want to have your own functions for it instead of this example +implementation, the following steps need to be taken by the application +(more detailed info can be found in the info file of the GNU libc library +in the "Generic Charset Conversion" section under "Character Set Handling" +or online + here):
+ ++
+ +- inclusion of some headers:
+ +++ ++++#include <locale.h> /* for setlocale */
#include <langinfo.h> /* for nl_langinfo */
#include <iconv.h> /* for iconv_* functions */+
+ +- set the program's current locale to what +the user configured in the environment:
+ +++ ++++setlocale(LC_ALL, "");
+
+ +- open a conversion handle for conversion + from UTF-8 to the character set of the current locale (once for the entire + program):
+ +++ ++++iconv_t iconv_handle;
...
iconv_handle = iconv_open(nl_langinfo(CODESET), "UTF-8");
if (iconv_handle == (iconv_t) -1)
/* signal an error */+
+ +- then, every string can be converted + using the following:
- - +++ ++++/* char* in_buf is the input buffer, size_t in_len is its length */
/* char* out_buf is the output buffer, size_t out_len is its length */
size_t nconv;
char *in_ptr = in_buf;
char *out_ptr = out_buf;
nconv = iconv(iconv_handle, &in_ptr, &in_len, &out_ptr, &out_len);If the output buffer is not big enough,+ +iconv
will + return -1 and seterrno
toE2BIG
. Also, +thein_ptr
andout_ptr
will point just after +the last successfully converted character in the respective buffers, and +thein_len
andout_len
will be updated to show +the remaining lengths. There can be two strategies here:
+ ++
+ Another error case is when the conversion was unsuccessful (if one of +the characters can't be represented in the target character set). The +- Make sure from the beginning + that the output buffer is big enough. However, it's difficult to find + an absolute maximum length in advance, even given the length of the input + string.
+
+
+- Do the conversion in several + steps, growing the output buffer each time to make more space, and calling +
+ +iconv
consecutively until the conversion is complete. + This is the preferred way (a function could be written to encapsulate + all this).iconv
function will then also return -1 and seterrno
+ toEILSEQ
; thein_ptr
will point to the character + that couldn't be converted. In that case, again two strategies are +possible:
+ ++
+ +- Just fail the conversion, +and show an error. This is not very user friendly, of course.
+
+
+- Skip over the character that + can't be converted and append a "?" to the output buffer, then call
+ ++ iconv
again. Skipping over a UTF-8 character is fairly simple, + as follows from the encoding rules + :+ +
++
+ +- if the first byte is in +binary 0xxxxxxx, then the character is only one byte long, just skip over +that byte
+
+
+- if the first byte is in +binary 11xxxxxx, then skip over that byte and all bytes 10xxxxxx that follow.
+ +
++
+ +- eventually, the conversion +handle needs to be closed (when the program exits):
+ +
+++ The example implementation +mentioned above grows the output buffer dynamically and outputs "?" for characters + that can't be converted.+++iconv_close(iconv_handle);
+ + +
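The steps above can be combined into one function. The following is a minimal sketch, not the example implementation from the "utf8" subdirectory: it opens a conversion handle per call (for simplicity), grows the output buffer on `E2BIG`, and on `EILSEQ` or `EINVAL` skips one UTF-8 character and appends a "?". The names `sketch_utf8_to_charset`, `skip_utf8_char` and `grow` are hypothetical. It assumes a stateless, ASCII-compatible target character set (so that a single '?' byte and a single terminating zero byte are valid output).

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Skip one UTF-8 character at p, following the two rules given above;
   returns the number of bytes to skip (at most len). */
static size_t skip_utf8_char(const char *p, size_t len)
{
  size_t n = 1;                       /* the first byte is always skipped */

  if (len == 0)
    return 0;
  if (((unsigned char)p[0] & 0xC0) == 0xC0)   /* lead byte 11xxxxxx */
    while (n < len && ((unsigned char)p[n] & 0xC0) == 0x80)
      n++;                            /* continuation bytes 10xxxxxx */
  return n;
}

/* Double the output buffer, keeping write position and free space in sync. */
static int grow(char **buf, char **ptr, size_t *left, size_t *size)
{
  size_t used = (size_t)(*ptr - *buf);
  char *bigger = realloc(*buf, *size * 2);

  if (bigger == NULL)
    return -1;
  *buf = bigger;
  *ptr = bigger + used;
  *left += *size;                     /* the buffer gained *size bytes */
  *size *= 2;
  return 0;
}

/* Convert UTF-8 input to the given character set. Returns a malloc'ed,
   zero-terminated string (caller frees), or NULL on failure. */
char *sketch_utf8_to_charset(const char *input, const char *charset,
                             int *conv_failures)
{
  char *in_ptr = (char *)input;       /* iconv() wants char **, input is only read */
  char *out_buf, *out_ptr;
  size_t in_len, size = 8, out_len;
  int failures = 0, ok = 1;
  iconv_t cd;

  assert(input != NULL && charset != NULL);
  cd = iconv_open(charset, "UTF-8");
  if (cd == (iconv_t)-1)
    return NULL;
  out_buf = malloc(size);
  if (out_buf == NULL) {
    iconv_close(cd);
    return NULL;
  }
  in_len = strlen(input);
  out_ptr = out_buf;
  out_len = size - 1;                 /* reserve one byte for the terminator */

  while (ok && in_len > 0) {
    if (iconv(cd, &in_ptr, &in_len, &out_ptr, &out_len) != (size_t)-1)
      break;                          /* all input was converted */
    if (errno == E2BIG || out_len == 0) {
      ok = (grow(&out_buf, &out_ptr, &out_len, &size) == 0);
    } else if (errno == EILSEQ || errno == EINVAL) {
      size_t skipped = skip_utf8_char(in_ptr, in_len);
      in_ptr += skipped;
      in_len -= skipped;
      *out_ptr++ = '?';               /* room is guaranteed: out_len > 0 here */
      out_len--;
      failures++;
    } else {
      ok = 0;                         /* unexpected error */
    }
  }
  iconv_close(cd);
  if (!ok) {
    free(out_buf);
    return NULL;
  }
  *out_ptr = '\0';
  if (conv_failures != NULL)
    *conv_failures = failures;
  return out_buf;
}
```

To convert to the character set of the current locale, the application would call `setlocale(LC_ALL, "")` first and pass `nl_langinfo(CODESET)` as the `charset` argument. Note that some `iconv` implementations substitute a replacement character themselves instead of reporting `EILSEQ`; in that case the failure count stays 0.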
+ +Support for configure.in
There +is a macro available for use in configure.in for applications that are using +autoconf to configure their sources. The following macro checks whether +the Gedcom parser library is available and whether its version is high enough:
++All the arguments are optional and default to 0. E.g. to check for +version 1.34, you would put in configure.in the following statement:AM_LIB_GEDCOM_PARSER([major,[minor,[patch]]])
+
++To be able to use this macro in the sources of your application, you have three options:AM_LIB_GEDCOM_PARSER(1,34)
+
++
+- Put the file
+m4/gedcom.m4
in your autoconf data directory (i.e. the path given by 'aclocal --print-ac-dir
', usually/usr/share/aclocal
). You can do this automatically by going into the m4 subdirectory and typing 'make install-m4
'.
+
+- If you're using autoconf, but not automake, copy the contents of
+m4/gedcom.m4
into the aclocal.m4
file in your sources.
+
+- If you're using automake, copy the contents of
+m4/gedcom.m4
into the acinclude.m4
file in your sources.
+
+There are three preprocessor symbols defined for version checks in the + header (but their direct use is deprecated: please use the macro above):
+ ++
+ The last one is equal to- +
GEDCOM_PARSE_VERSION_MAJOR
- +
GEDCOM_PARSE_VERSION_MINOR
- + +
GEDCOM_PARSE_VERSION
+(GEDCOM_PARSE_VERSION_MAJOR * 1000) + GEDCOM_PARSE_VERSION_MINOR.
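As a worked example of the formula above: version 1.34 gives 1 * 1000 + 34 = 1034. The sketch below shows a comparison based on that number. The `#define` values are made-up stand-ins so that the code compiles without the header (a real program gets these symbols from `gedcom.h`), and `my_version_at_least` is a hypothetical helper. Keep in mind that direct use of these symbols is deprecated in favour of the autoconf macro.

```c
#include <assert.h>

/* Stand-in values for illustration only (assumption: version 1.34);
   with the library installed, these come from gedcom.h. */
#define GEDCOM_PARSE_VERSION_MAJOR 1
#define GEDCOM_PARSE_VERSION_MINOR 34
#define GEDCOM_PARSE_VERSION \
  ((GEDCOM_PARSE_VERSION_MAJOR * 1000) + GEDCOM_PARSE_VERSION_MINOR)

/* Returns 1 when the version compiled against is at least major.minor. */
int my_version_at_least(int major, int minor)
{
  assert(minor >= 0 && minor < 1000);
  return GEDCOM_PARSE_VERSION >= (major * 1000) + minor;
}
```

The same comparison can be done at compile time with `#if GEDCOM_PARSE_VERSION >= 1034`.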
+ + +
+ +$Id$+ + +
$Name$+ + +
+
+
+
+
+
+ \ No newline at end of file