tconv_ext - tconv extended API
    #include <tconv.h>

    tconv_t  tconv_open_ext(const char *tocodes, const char *fromcodes, tconv_option_t *tconvOptionp);
    void     tconv_trace_on(tconv_t tconvp);
    void     tconv_trace_off(tconv_t tconvp);
    void     tconv_trace(tconv_t tconvp, const char *fmts, ...);
    char    *tconv_error_set(tconv_t tconvp, const char *msgs);
    char    *tconv_error(tconv_t tconvp);
    char    *tconv_fromcode(tconv_t tconvp);
    char    *tconv_tocode(tconv_t tconvp);
    short    tconv_helper(tconv_t tconvp,
                          void *contextp,
                          short (*producerp)(void *contextp, char **bufpp, size_t *countlp, short *eofbp),
                          short (*consumerp)(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp));
The tconv extended API provides additional entry points to query or control how tconv behaves. Because tconv is a generic layer on top of iconv(), ICU, etc., additional semantics are needed.
tconv_t tconv_open_ext(const char *tocodes, const char *fromcodes, tconv_option_t *tconvOptionp);

    typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);
    typedef struct tconv_option {
      tconv_charset_t      *charsetp;
      tconv_convert_t      *convertp;
      tconvTraceCallback_t  traceCallbackp;
      void                 *traceUserDatavp;
      const char           *fallbacks;
    } tconv_option_t;
tconv supports two engine types: one for charset detection, one for character conversion. Each engine has its own option structure:
charsetp
    Describes the charset engine options.
convertp
    Describes the conversion engine options.
Logging is provided through the genericLogger package, and the developer may provide a function pointer with an associated context:
traceCallbackp
    A function pointer.
traceUserDatavp
    The function pointer's opaque context.
fallbacks
    Fallback charset used when the user gave none and the guess failed.
If tconvOptionp is NULL, defaults apply. Otherwise, if charsetp is NULL the charset defaults apply, if convertp is NULL the conversion defaults apply, and if traceCallbackp is NULL no logging is possible.
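As an illustration of the option structure, here is a minimal sketch that keeps the default engines and only enables logging. The typedefs are copied from the text above so that the sketch compiles on its own; real code would include tconv.h instead. The names my_trace_sink_t, my_trace_callback and my_options are hypothetical, and the fallback value "ISO-8859-1" is an arbitrary example.

```c
#include <string.h>

/* Stand-in declarations copied from the documentation above;
 * real code would only #include <tconv.h>. */
typedef struct tconv_charset tconv_charset_t;   /* left opaque in this sketch */
typedef struct tconv_convert tconv_convert_t;   /* left opaque in this sketch */
typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);
typedef struct tconv_option {
  tconv_charset_t      *charsetp;
  tconv_convert_t      *convertp;
  tconvTraceCallback_t  traceCallbackp;
  void                 *traceUserDatavp;
  const char           *fallbacks;
} tconv_option_t;

/* Hypothetical user context: remembers the last trace message. */
typedef struct my_trace_sink {
  char   lasts[256];
  size_t countl;
} my_trace_sink_t;

/* Trace callback: copies the message into the sink and counts the calls. */
void my_trace_callback(void *userDatavp, const char *msgs)
{
  my_trace_sink_t *sinkp = (my_trace_sink_t *) userDatavp;

  if (sinkp == NULL || msgs == NULL) {
    return;
  }
  strncpy(sinkp->lasts, msgs, sizeof(sinkp->lasts) - 1);
  sinkp->lasts[sizeof(sinkp->lasts) - 1] = '\0';
  sinkp->countl++;
}

/* Builds options with default engines (NULL pointers) and logging enabled. */
tconv_option_t my_options(my_trace_sink_t *sinkp)
{
  tconv_option_t option;

  option.charsetp        = NULL;               /* charset defaults apply    */
  option.convertp        = NULL;               /* conversion defaults apply */
  option.traceCallbackp  = my_trace_callback;
  option.traceUserDatavp = sinkp;
  option.fallbacks       = "ISO-8859-1";       /* arbitrary example value   */

  return option;
}
```

The address of the resulting structure would then be passed as the third argument of tconv_open_ext().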
A charset engine may support three entry points:
    typedef void *(*tconv_charset_new_t) (tconv_t tconvp, void *optionp);
    typedef char *(*tconv_charset_run_t) (tconv_t tconvp, void *contextp, char *bytep, size_t bytel);
    typedef void  (*tconv_charset_free_t)(tconv_t tconvp, void *contextp);
All entry points start with a tconvp pointer, which they can use to trigger logging and to set errors.
The new entry point is optional. It receives a pointer to a data area that is opaque from tconv's point of view, and returns a charset-specific opaque context. If new is not NULL, then free must not be NULL either, and it will be called with the charset-specific context pointer returned by new. When new is NULL, the charset-specific context is NULL.
The only required entry point is run, with a pointer to bytes, and the number of bytes.
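To make the three entry points concrete, the following is a minimal sketch of a charset engine that only recognizes a UTF-8 byte order mark. The tconv_t typedef is a stand-in so that the sketch compiles without tconv.h. The my_* names are hypothetical, and two points are assumptions rather than documented behavior: that run returns NULL when it has no guess, and that the returned string may point to static storage.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the opaque handle; real code gets it from <tconv.h>. */
typedef struct tconv *tconv_t;

/* Hypothetical engine context: counts how many buffers were inspected. */
typedef struct my_charset_context {
  size_t callsl;
} my_charset_context_t;

/* "new": allocates the engine context; optionp is unused in this sketch. */
void *my_charset_new(tconv_t tconvp, void *optionp)
{
  (void) tconvp;
  (void) optionp;
  return calloc(1, sizeof(my_charset_context_t));
}

/* "run": returns "UTF-8" when the bytes start with a UTF-8 byte order mark,
 * NULL otherwise (NULL meaning "no guess" is an assumption). */
char *my_charset_run(tconv_t tconvp, void *contextp, char *bytep, size_t bytel)
{
  static char utf8s[] = "UTF-8";
  static const unsigned char boms[3] = { 0xEF, 0xBB, 0xBF };
  my_charset_context_t *myp = (my_charset_context_t *) contextp;

  (void) tconvp;
  if (myp != NULL) {
    myp->callsl++;
  }
  if (bytep != NULL && bytel >= sizeof(boms) && memcmp(bytep, boms, sizeof(boms)) == 0) {
    return utf8s;
  }
  return NULL;
}

/* "free": releases what "new" allocated. */
void my_charset_free(tconv_t tconvp, void *contextp)
{
  (void) tconvp;
  free(contextp);
}
```

Because new is provided here, free is provided as well, as required above.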
charsetp must point to a structure defined as:
    typedef struct tconv_charset {
      enum {
        TCONV_CHARSET_EXTERNAL = 0,
        TCONV_CHARSET_PLUGIN,
        TCONV_CHARSET_ICU,
        TCONV_CHARSET_CCHARDET,
      } charseti;
      union {
        tconv_charset_external_t         external;
        tconv_charset_plugin_t           plugin;
        tconv_charset_ICU_option_t      *ICUOptionp;
        tconv_charset_cchardet_option_t *cchardetOptionp;
      } u;
    } tconv_charset_t;
i.e. a charset engine can be of four types:
An external charset engine type is a structure that explicitly gives the three entry points described at the beginning of this section, and a pointer to an opaque charset-specific option area. It is defined as:
    typedef struct tconv_charset_external {
      void                 *optionp;
      tconv_charset_new_t   tconv_charset_newp;
      tconv_charset_run_t   tconv_charset_runp;
      tconv_charset_free_t  tconv_charset_freep;
    } tconv_charset_external_t;
The charset engine is dynamically loaded. A plugin definition is:
    typedef struct tconv_charset_plugin {
      void *optionp;
      char *news;
      char *runs;
      char *frees;
      char *filenames;
    } tconv_charset_plugin_t;
i.e. tconv will use filenames as the path of a shared library and will try to load it. optionp is a pointer to a charset-specific option area. tconv will look for the three entry points named by news, runs and frees:
If news is NULL, the environment variable TCONV_ENV_CHARSET_NEW is consulted; otherwise the name tconv_charset_newp is looked up.
If runs is NULL, the environment variable TCONV_ENV_CHARSET_RUN is consulted; otherwise the name tconv_charset_runp is looked up.
If frees is NULL, the environment variable TCONV_ENV_CHARSET_FREE is consulted; otherwise the name tconv_charset_freep is looked up.
Please note that dynamic loading is not always thread-safe, and tconv will not try to adapt to this situation. It is therefore up to the caller to make sure that tconv_open_ext() is called in a context that cannot be affected by a possibly non-thread-safe workflow (typically within a critical section, or at program startup).
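The entry point name lookup described for news, runs and frees can be sketched as a simple precedence rule: explicit name first, then the environment, then the built-in default name. This is one reading of the text, not tconv's actual code. The function names are hypothetical, and treating TCONV_ENV_CHARSET_NEW as the literal name of the environment variable is an assumption.

```c
#include <stdlib.h>

/* Picks the symbol name to look up in the shared library:
 * explicit name first, then the environment value, then the default. */
const char *my_entry_point_name(const char *explicits, const char *envvalues, const char *defaults)
{
  if (explicits != NULL) {
    return explicits;
  }
  if (envvalues != NULL) {
    return envvalues;
  }
  return defaults;
}

/* Convenience wrapper for the charset "new" entry point. Using
 * "TCONV_ENV_CHARSET_NEW" as the variable name is an assumption. */
const char *my_charset_new_name(const char *news)
{
  return my_entry_point_name(news, getenv("TCONV_ENV_CHARSET_NEW"), "tconv_charset_newp");
}
```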
ICU built-in, available when tconv has been compiled with ICU. If tconv has not been compiled with such support, TCONV_CHARSET_ICU remains available, but using it will fail.
If ICUOptionp is not NULL, it must be a pointer to a structure defined as:
    typedef struct tconv_charset_ICU_option {
      int confidencei;
    } tconv_charset_ICU_option_t;
where confidencei is the minimum accepted confidence level. If ICUOptionp is NULL, a default of 10 is used, unless the environment variable TCONV_ENV_CHARSET_ICU_CONFIDENCE is set.
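The precedence described above (option structure, then environment, then the default of 10) can be sketched as follows. The function name my_icu_confidence is hypothetical, the environment value is passed in as text so that the sketch stays free of side effects, and parsing it with atoi() is an assumption about how the value is read.

```c
#include <stdlib.h>

/* Copied from the documentation above; real code would #include <tconv.h>. */
typedef struct tconv_charset_ICU_option {
  int confidencei;
} tconv_charset_ICU_option_t;

/* Resolves the minimum confidence: option structure first, then the
 * environment value (NULL when the variable is unset), then 10. */
int my_icu_confidence(const tconv_charset_ICU_option_t *ICUOptionp, const char *envvalues)
{
  if (ICUOptionp != NULL) {
    return ICUOptionp->confidencei;
  }
  if (envvalues != NULL) {
    return atoi(envvalues);
  }
  return 10;
}
```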
cchardet built-in, always available.
If cchardetOptionp is not NULL, it must be a pointer to a structure defined as:
    typedef struct tconv_charset_cchardet_option {
      float confidencef;
    } tconv_charset_cchardet_option_t;
where confidencef is the minimum accepted confidence level. If cchardetOptionp is NULL, a default of 0.4f is used. This can also be set via the environment variable TCONV_ENV_CHARSET_CCHARDET_CONFIDENCE.
A convert engine may support three entry points:
    typedef void   *(*tconv_convert_new_t) (tconv_t tconvp, const char *tocodes, const char *fromcodes, void *optionp);
    typedef size_t  (*tconv_convert_run_t) (tconv_t tconvp, void *contextp, char **inbufsp, size_t *inbytesleftlp, char **outbufsp, size_t *outbytesleftlp);
    typedef int     (*tconv_convert_free_t)(tconv_t tconvp, void *contextp);
All entry points start with a tconvp pointer.
The new entry point is optional. It receives a pointer to a data area that is opaque from tconv's point of view, and returns a convert-specific opaque context. If new is not NULL, then free must not be NULL either, and it will be called with the convert-specific context pointer returned by new. When new is NULL, the convert-specific context is NULL.
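The only required entry point is run, with additional parameters that follow the iconv() semantics: pointers to the input buffer (inbufsp), the number of input bytes left (inbytesleftlp), the output buffer (outbufsp) and the number of output bytes left (outbytesleftlp).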
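As an illustration of the iconv() calling convention for run, here is a minimal sketch of an identity converter that copies bytes unchanged. It advances both buffer pointers, decrements both counters, and reports a full output buffer with E2BIG, as iconv() does. The tconv_t typedef is a stand-in and my_convert_run is a hypothetical name. How tconv itself reacts to each errno value is not stated in this document.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the opaque handle; real code gets it from <tconv.h>. */
typedef struct tconv *tconv_t;

/* Identity "run" entry point following iconv() semantics. */
size_t my_convert_run(tconv_t tconvp, void *contextp,
                      char **inbufsp, size_t *inbytesleftlp,
                      char **outbufsp, size_t *outbytesleftlp)
{
  size_t copyl;

  (void) tconvp;
  (void) contextp;

  /* iconv() convention: a NULL input is a flush/reset request.
   * This stateless sketch has nothing to flush. */
  if (inbufsp == NULL || *inbufsp == NULL) {
    return 0;
  }
  if (inbytesleftlp == NULL || outbufsp == NULL || *outbufsp == NULL || outbytesleftlp == NULL) {
    errno = EINVAL;
    return (size_t) -1;
  }

  /* Copy as much as fits, then advance pointers and counters. */
  copyl = (*inbytesleftlp < *outbytesleftlp) ? *inbytesleftlp : *outbytesleftlp;
  memcpy(*outbufsp, *inbufsp, copyl);
  *inbufsp        += copyl;
  *inbytesleftlp  -= copyl;
  *outbufsp       += copyl;
  *outbytesleftlp -= copyl;

  /* Input remains: the output buffer was too small. */
  if (*inbytesleftlp > 0) {
    errno = E2BIG;
    return (size_t) -1;
  }
  return 0; /* no non-reversible conversion performed */
}
```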
convertp must point to a structure defined as:
    typedef struct tconv_convert {
      enum {
        TCONV_CONVERT_EXTERNAL = 0,
        TCONV_CONVERT_PLUGIN,
        TCONV_CONVERT_ICU,
        TCONV_CONVERT_ICONV
      } converti;
      union {
        tconv_convert_external_t      external;
        tconv_convert_plugin_t        plugin;
        tconv_convert_ICU_option_t   *ICUOptionp;
        tconv_convert_iconv_option_t *iconvOptionp;
      } u;
    } tconv_convert_t;
i.e. a convert engine can be of four types:
An external convert engine type is a structure that explicitly gives the three entry points described above, and a pointer to an opaque convert-specific option area. It is defined as:
    typedef struct tconv_convert_external {
      void                 *optionp;
      tconv_convert_new_t   tconv_convert_newp;
      tconv_convert_run_t   tconv_convert_runp;
      tconv_convert_free_t  tconv_convert_freep;
    } tconv_convert_external_t;
The convert engine is dynamically loaded. A plugin definition is:
    typedef struct tconv_convert_plugin {
      void *optionp;
      char *news;
      char *runs;
      char *frees;
      char *filenames;
    } tconv_convert_plugin_t;
i.e. tconv will use filenames as the path of a shared library and will try to load it. optionp is a pointer to a convert-specific option area. tconv will look for the three entry points named by news, runs and frees:
If news is NULL, the environment variable TCONV_ENV_CONVERT_NEW is consulted; otherwise the name tconv_convert_newp is looked up.
If runs is NULL, the environment variable TCONV_ENV_CONVERT_RUN is consulted; otherwise the name tconv_convert_runp is looked up.
If frees is NULL, the environment variable TCONV_ENV_CONVERT_FREE is consulted; otherwise the name tconv_convert_freep is looked up.
Same remark about thread-safety as for the charset engine.
ICU built-in, available when tconv has been compiled with ICU. If tconv has not been compiled with such support, TCONV_CONVERT_ICU remains available, but using it will fail.
    typedef struct tconv_convert_ICU_option {
      size_t uCharCapacityl;
      short  fallbackb;
      int    signaturei;
    } tconv_convert_ICU_option_t;
containing:
ICU conversion always goes through a UTF-16 internal buffer by design. uCharCapacityl is the number of bytes of this internal intermediary buffer. The default is 4096, unless the environment variable TCONV_ENV_CONVERT_ICU_UCHARCAPACITY is set.
ICU conversion has an optional fallback mechanism for unknown characters, controlled by fallbackb. The default is a false value, unless the environment variable TCONV_ENV_CONVERT_ICU_FALLBACK is set.
A signature may be added or removed on demand. If signaturei is lower than zero, signature is removed. If signaturei is higher than zero, signature is added. Else ICU default will apply. Default is 0, unless TCONV_ENV_CONVERT_ICU_SIGNATURE is set.
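The documented defaults and the sign convention of signaturei can be summarized in a short sketch. The structure is copied from the text above. The enum and the two functions are hypothetical helpers for illustration only, and the environment variable overrides are deliberately left out.

```c
#include <stddef.h>

/* Copied from the documentation above; real code would #include <tconv.h>. */
typedef struct tconv_convert_ICU_option {
  size_t uCharCapacityl;
  short  fallbackb;
  int    signaturei;
} tconv_convert_ICU_option_t;

/* Hypothetical names for the three signature behaviors. */
typedef enum my_signature_action {
  MY_SIGNATURE_REMOVE,
  MY_SIGNATURE_ADD,
  MY_SIGNATURE_ICU_DEFAULT
} my_signature_action_t;

/* Documented defaults, ignoring any environment variable override. */
tconv_convert_ICU_option_t my_icu_convert_defaults(void)
{
  tconv_convert_ICU_option_t option;

  option.uCharCapacityl = 4096; /* size of the intermediary UTF-16 buffer */
  option.fallbackb      = 0;    /* fallback mechanism disabled            */
  option.signaturei     = 0;    /* let ICU decide about the signature     */

  return option;
}

/* Maps signaturei to the behavior described above. */
my_signature_action_t my_signature_action(int signaturei)
{
  if (signaturei < 0) {
    return MY_SIGNATURE_REMOVE;
  }
  if (signaturei > 0) {
    return MY_SIGNATURE_ADD;
  }
  return MY_SIGNATURE_ICU_DEFAULT;
}
```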
iconv built-in, always available. No special option.
void tconv_trace_on(tconv_t tconvp);
Enables tracing. Any subsequent call to tconv_trace() will then trigger a call to the traceCallbackp given in tconv_open_ext()'s option structure.
void tconv_trace_off(tconv_t tconvp);
Disables tracing.
void tconv_trace(tconv_t tconvp, const char *fmts, ...);
Formats a message string and calls traceCallbackp if tracing is on.
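The behavior described for the three tracing functions can be modeled with a small stand-alone sketch: a flag gates the call, and the message is formatted printf-style before being handed to the callback. This is not tconv's implementation. The my_tracer_t structure, the function names and the 512-byte buffer size are all assumptions made for illustration.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Copied from the documentation above. */
typedef void (*tconvTraceCallback_t)(void *userDatavp, const char *msgs);

/* Hypothetical stand-in for the tracing state a tconv_t handle carries. */
typedef struct my_tracer {
  short                 traceb;           /* non-zero when tracing is on */
  tconvTraceCallback_t  traceCallbackp;
  void                 *traceUserDatavp;
} my_tracer_t;

/* Formats the message and forwards it to the callback if tracing is on.
 * Messages longer than the local buffer are truncated by vsnprintf(). */
void my_trace(my_tracer_t *tracerp, const char *fmts, ...)
{
  char    msgs[512];
  va_list ap;

  if (tracerp == NULL || !tracerp->traceb || tracerp->traceCallbackp == NULL || fmts == NULL) {
    return;
  }
  va_start(ap, fmts);
  vsnprintf(msgs, sizeof(msgs), fmts, ap);
  va_end(ap);
  tracerp->traceCallbackp(tracerp->traceUserDatavp, msgs);
}

/* Example callback: stores the last message in a caller-supplied
 * buffer that must hold at least 64 bytes. */
void my_store_message(void *userDatavp, const char *msgs)
{
  char *lasts = (char *) userDatavp;

  strncpy(lasts, msgs, 63);
  lasts[63] = '\0';
}
```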
char *tconv_error_set(tconv_t tconvp, const char *msgs);
Sets a string that should contain a more accurate description of the last error. Any engine should use this when a specific description exists. The default is to use the system's errno description.
char *tconv_error(tconv_t tconvp);
Get the latest value of specific error string.
char *tconv_fromcode(tconv_t tconvp);
Get the source codeset.
char *tconv_tocode(tconv_t tconvp);
Get the destination codeset.
short tconv_helper(tconv_t tconvp, void *contextp, short (*producerp)(void *contextp, char **bufpp, size_t *countlp, short *eofbp), short (*consumerp)(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp) );
From an end-user point of view, the only important things are to produce the bytes that must be converted and to consume the converted result. The tconv_helper method hides all the subtleties of the iconv API, leaving only the two callbacks that are meaningful for the vast majority of applications. Its parameters are tconvp, contextp, producerp and consumerp, as declared in the prototype above.
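A producer and consumer pair matching the prototypes above might look like the following minimal sketch. The context structure and the my_* names are hypothetical. Two points are assumptions not stated in this document: that a non-zero return value signals success, and that resultlp receives the number of bytes the consumer accepted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical context shared by both callbacks. */
typedef struct my_helper_context {
  char   *inputs;       /* bytes still to produce                     */
  size_t  inputl;       /* number of bytes still to produce           */
  size_t  chunkl;       /* maximum bytes handed out per producer call */
  char    outputs[64];  /* where the consumer stores what it receives */
  size_t  outputl;      /* number of bytes stored so far              */
} my_helper_context_t;

/* Producer: exposes the next chunk and raises *eofbp on the last one.
 * Returning non-zero for success is an assumption. */
short my_producer(void *contextp, char **bufpp, size_t *countlp, short *eofbp)
{
  my_helper_context_t *myp = (my_helper_context_t *) contextp;
  size_t countl = (myp->inputl < myp->chunkl) ? myp->inputl : myp->chunkl;

  *bufpp       = myp->inputs;
  *countlp     = countl;
  myp->inputs += countl;
  myp->inputl -= countl;
  *eofbp       = (myp->inputl == 0) ? 1 : 0;

  return 1;
}

/* Consumer: appends as many bytes as fit and reports that count
 * in *resultlp (the meaning of resultlp is an assumption). */
short my_consumer(void *contextp, char *bufp, size_t countl, short eofb, size_t *resultlp)
{
  my_helper_context_t *myp = (my_helper_context_t *) contextp;
  size_t rooml = sizeof(myp->outputs) - myp->outputl;
  size_t takel = (countl < rooml) ? countl : rooml;

  (void) eofb;
  memcpy(myp->outputs + myp->outputl, bufp, takel);
  myp->outputl += takel;
  *resultlp     = takel;

  return 1;
}
```

With the real library, these callbacks would be passed as tconv_helper(tconvp, &context, my_producer, my_consumer), where context is a my_helper_context_t.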
tconv can trace itself, unless it has been compiled with -DTCONV_NDEBUG, which is the default. When compiled without -DTCONV_NDEBUG, the default tracing level is 0, unless the environment variable TCONV_ENV_TRACE is set and its value is a true value.
tconv internally limits the length of such strings to 1024 bytes (including NUL).
A charset name contains only characters in the set [a-z0-9+.:].
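The character rule above can be checked with a few lines of C. This sketch applies the rule literally, so a name such as "UTF-8" (uppercase letters and a hyphen) is rejected. Whether tconv normalizes names before applying the rule is not stated here, and rejecting empty names is an assumption of the sketch. The function name is hypothetical.

```c
#include <string.h>

/* Returns 1 when every character of names is in [a-z0-9+.:], else 0.
 * A NULL or empty name is rejected (an assumption of this sketch). */
short my_charset_name_is_valid(const char *names)
{
  static const char alloweds[] = "abcdefghijklmnopqrstuvwxyz0123456789+.:";

  if (names == NULL || names[0] == '\0') {
    return 0;
  }
  /* strspn() gives the length of the leading run of allowed characters. */
  return (strspn(names, alloweds) == strlen(names)) ? 1 : 0;
}
```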
tconv(3), genericLogger(3)