NAME

Win32::MultiLanguage - Interface to IMultiLanguage I18N routines

SYNOPSIS

  use Win32::MultiLanguage;
  # @@

DESCRIPTION

Win32::MultiLanguage is an experimental wrapper module for the Windows IMultiLanguage interfaces that ship with Internet Explorer 4 and later. These interfaces are implemented in mlang.dll and provide routines for dealing with character encodings, code pages, and locales.

ROUTINES

DetectInputCodepage($octets [, $flags [, $codepage]])

Detects the code page of the string $octets. The optional $flags parameter is a combination of the MLDETECTCP constants described under CONSTANTS below; if it is omitted, MLDETECTCP_NONE is used. The optional $codepage parameter narrows the result: if it is zero (the default), all possible encodings are returned; otherwise, only encodings related to $codepage are listed.

Returns a reference to an array of hash references, each representing a DetectEncodingInfo structure with the following keys:

  LangID     => ..., # primary language identifier
  CodePage   => ..., # detected Win32-defined code page
  DocPercent => ..., # percentage of the input in the detected language
  Confidence => ..., # degree to which the detected data is correct

See http://msdn.microsoft.com/workshop/misc/mlang/reference/structures/detectencodinginfo.asp for details.
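Because the result is a list of candidates rather than a single answer, a caller usually has to pick one. The following is a minimal sketch that assumes only the array-of-hashes shape described above. The helper name best_codepage and the sample values are hypothetical; they stand in for a real DetectInputCodepage call, which requires Windows with MLang installed.

```perl
use strict;
use warnings;

# Hypothetical helper: return the candidate hash reference with the
# highest Confidence (ties broken by DocPercent), or undef if the
# candidate list is empty.
sub best_codepage {
    my ($candidates) = @_;
    return unless $candidates && @$candidates;
    my ($best) = sort {
        $b->{Confidence} <=> $a->{Confidence}
            or $b->{DocPercent} <=> $a->{DocPercent}
    } @$candidates;
    return $best;
}

# Hypothetical sample data in the documented shape. On Windows it would
# instead come from a call such as
#   my $result = Win32::MultiLanguage::DetectInputCodepage($octets);
my $result = [
    { LangID => 0x0409, CodePage => 1252,  DocPercent => 100, Confidence => 80 },
    { LangID => 0x0409, CodePage => 28591, DocPercent => 100, Confidence => 60 },
];

my $best = best_codepage($result);
printf "code page %d (confidence %d)\n", $best->{CodePage}, $best->{Confidence};
```

Sorting the whole list is more work than a single linear scan, but for the handful of candidates MLang typically reports it keeps the tie-breaking rule easy to read.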

GetCodePageInfo($codepage, $langid)

...

GetCodePageDescription($codepage, $locale)

...

GetRfc1766FromLcid($locale)

...

DetectOutboundCodePage($utf8 [, $flags [, \@cp ]])

...

GetCharsetInfo($charset)

...

IsConvertible($src, $dst)

...

GetRfc1766Info($locale, $langid)

...

GetLcidFromRfc1766($rfc1766)

...

GetFamilyCodePage($codepage)

...

GetNumberOfCodePageInfo()

...

GetNumberOfScripts()

...

CONSTANTS

These constants are currently neither exported nor exportable.

MLDETECTCP

MLDETECTCP_NONE

Default setting will be used.

MLDETECTCP_7BIT

Input stream consists of 7-bit data.

MLDETECTCP_8BIT

Input stream consists of 8-bit data.

MLDETECTCP_DBCS

Input stream consists of double-byte data.

MLDETECTCP_HTML

Input stream is an HTML page.
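Since the MLDETECTCP constants are not exported, a caller has to supply their numeric values directly. A minimal sketch, assuming the values from the Windows SDK header mlang.h (verify them against your SDK); the constant names below are defined locally and are not provided by Win32::MultiLanguage.

```perl
use strict;
use warnings;

# Local copies of the MLDETECTCP values as defined in the Windows SDK
# header mlang.h (an assumption; check against your SDK). They are
# bit flags, so they can be combined with bitwise OR.
use constant {
    MLDETECTCP_NONE => 0,
    MLDETECTCP_7BIT => 1,
    MLDETECTCP_8BIT => 2,
    MLDETECTCP_DBCS => 4,
    MLDETECTCP_HTML => 8,
};

# Describe input that is 8-bit data forming an HTML page.
my $flags = MLDETECTCP_8BIT | MLDETECTCP_HTML;
printf "flags = %d\n", $flags;    # prints "flags = 10"

# On Windows the value would then be passed as the second argument:
#   Win32::MultiLanguage::DetectInputCodepage($octets, $flags);
```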

MIMECONTF

MIMECONTF_MAILNEWS

Code page is meant to display on mail and news clients.

MIMECONTF_BROWSER

Code page is meant to display on browser clients.

MIMECONTF_MINIMAL

Code page is meant to display in minimal view. This value is generally not used.

MIMECONTF_IMPORT

Value that indicates that all of the import code pages should be enumerated.

MIMECONTF_SAVABLE_MAILNEWS

Code page includes encodings for mail and news clients to save a document in.

MIMECONTF_SAVABLE_BROWSER

Code page includes encodings for browser clients to save a document in.

MIMECONTF_EXPORT

Value that indicates that all of the export code pages should be enumerated.

MIMECONTF_PRIVCONVERTER

Value that indicates the encoding requires (or has) a private conversion engine. A client of IEnumCodePage doesn't use this value.

MIMECONTF_VALID

Value that indicates the corresponding encoding is supported on the system.

MIMECONTF_VALID_NLS

Value that indicates that only the language support file should be validated. Normally, both the language support file and the supporting font are checked.

MIMECONTF_MIME_IE4

Value that indicates the Microsoft® Internet Explorer 4.0 MIME data from MLang's internal data should be used.

MIMECONTF_MIME_LATEST

Value that indicates that the latest MIME data from MLang's internal data should be used.

MIMECONTF_MIME_REGISTRY

Value that indicates that the MIME data stored in the registry should be used.
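The MIMECONTF values are bit flags, so one flags value (for example the dwFlags member of the MIMECPINFO structure used by the code page information routines in MLang) can carry several of them at once. The sketch below decodes such a value into names. The numeric values are assumed from the Windows SDK header mlang.h, the helper mimecontf_names is hypothetical, and the sketch does not assume any particular key in the value returned by this module's GetCodePageInfo.

```perl
use strict;
use warnings;

# MIMECONTF values as defined in the Windows SDK header mlang.h
# (an assumption; check against your SDK).
my %MIMECONTF = (
    MIMECONTF_MAILNEWS         => 0x00000001,
    MIMECONTF_BROWSER          => 0x00000002,
    MIMECONTF_MINIMAL          => 0x00000004,
    MIMECONTF_IMPORT           => 0x00000008,
    MIMECONTF_SAVABLE_MAILNEWS => 0x00000100,
    MIMECONTF_SAVABLE_BROWSER  => 0x00000200,
    MIMECONTF_EXPORT           => 0x00000400,
    MIMECONTF_PRIVCONVERTER    => 0x00010000,
    MIMECONTF_VALID            => 0x00020000,
    MIMECONTF_VALID_NLS        => 0x00040000,
    MIMECONTF_MIME_IE4         => 0x10000000,
    MIMECONTF_MIME_LATEST      => 0x20000000,
    MIMECONTF_MIME_REGISTRY    => 0x40000000,
);

# Hypothetical helper: list the names of all MIMECONTF bits set in
# $flags, in ascending order of their numeric value.
sub mimecontf_names {
    my ($flags) = @_;
    return grep { $flags & $MIMECONTF{$_} }
           sort { $MIMECONTF{$a} <=> $MIMECONTF{$b} } keys %MIMECONTF;
}

# Example: a code page meant for browsers and supported on the system.
print join(', ', mimecontf_names(0x00020002)), "\n";
# prints "MIMECONTF_BROWSER, MIMECONTF_VALID"
```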

MLDETECTF

MLDETECTF_MAILNEWS

Not currently supported.

MLDETECTF_BROWSER

Not currently supported.

MLDETECTF_VALID

Detection result must be valid for conversion and text rendering.

MLDETECTF_VALID_NLS

Detection result must be valid for conversion.

MLDETECTF_PRESERVE_ORDER

Preserve preferred code page order. This is meaningful only if you have passed preferred code pages (the puiPreferredCodePages parameter of the underlying IMultiLanguage3 method) to DetectOutboundCodePage.

MLDETECTF_PREFERRED_ONLY

Only return one of the preferred code pages as the detection result. This is meaningful only if you have passed preferred code pages (the puiPreferredCodePages parameter of the underlying IMultiLanguage3 method) to DetectOutboundCodePage.

MLDETECTF_FILTER_SPECIALCHAR

Filter out graphical symbols and punctuation.
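MLDETECTF_PRESERVE_ORDER and MLDETECTF_PREFERRED_ONLY only have an effect together with a list of preferred code pages. Judging from its signature, that list corresponds to the optional \@cp argument of DetectOutboundCodePage($utf8 [, $flags [, \@cp ]]); this mapping is an assumption. The sketch below assembles such an argument list and rejects the order-related flags when no preferred code pages are given. The numeric values are assumed from the Windows SDK header mlang.h, and the helper outbound_args is hypothetical.

```perl
use strict;
use warnings;
use Carp qw(croak);

# MLDETECTF values as defined in the Windows SDK header mlang.h
# (an assumption; check against your SDK).
use constant {
    MLDETECTF_VALID              => 0x0004,
    MLDETECTF_VALID_NLS          => 0x0008,
    MLDETECTF_PRESERVE_ORDER     => 0x0010,
    MLDETECTF_PREFERRED_ONLY     => 0x0020,
    MLDETECTF_FILTER_SPECIALCHAR => 0x0040,
};

# Hypothetical helper: build the argument list ($utf8, $flags [, \@cp]),
# dying if an order-related flag is set without preferred code pages.
sub outbound_args {
    my ($utf8, $flags, @preferred) = @_;
    my $needs_cp = MLDETECTF_PRESERVE_ORDER | MLDETECTF_PREFERRED_ONLY;
    croak "PRESERVE_ORDER/PREFERRED_ONLY require preferred code pages"
        if ($flags & $needs_cp) && !@preferred;
    return @preferred ? ($utf8, $flags, [@preferred]) : ($utf8, $flags);
}

# Prefer ISO-8859-1 (28591) over Windows-1252 (1252), keeping that order.
my @args = outbound_args('Hello',
    MLDETECTF_VALID | MLDETECTF_PRESERVE_ORDER, 28591, 1252);

# On Windows the list would then be passed on, e.g.
#   Win32::MultiLanguage::DetectOutboundCodePage(@args);
```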

IMPLEMENTATION STATUS

  Legend:

    + means implemented
    ? means might get implemented
    - means unlikely to get implemented

  IMultiLanguage
    + GetCharsetInfo
    + GetRfc1766FromLcid
    + IsConvertible
    + GetRfc1766Info
    + GetLcidFromRfc1766
    + GetFamilyCodePage
    + GetNumberOfCodePageInfo
    
    ? ConvertString

    ? EnumCodePages
    ? EnumRfc1766
    
    - ConvertStringToUnicode
    - ConvertStringFromUnicode
    - ConvertStringReset
    - CreateConvertCharset
    
  IMultiLanguage2
    + GetCodePageInfo
    + DetectInputCodepage
    + GetCodePageDescription
    + GetNumberOfScripts

    ? EnumScripts
    
    - ValidateCodePage
    - ValidateCodePageEx
    - IsCodePageInstallable
    - ConvertStringInIStream
    - ConvertStringToUnicodeEx
    - ConvertStringFromUnicodeEx
    - DetectCodepageInIStream
    - SetMimeDBSource
    
  IMultiLanguage3
    + DetectOutboundCodePage
    
    - DetectOutboundCodePageInIStream

KNOWN ISSUES AND TODO

  • needs a test suite

  • needs more checks on input parameters

  • could benefit from some typemap entries

  • creating a new IMultiLanguage instance for each call is suboptimal

  • what happens if IE4+ is not installed?

  • needs more documentation

  • no access to the wcSpecialChar argument of DetectOutboundCodePage

  • GetLcidFromRfc1766 could check wantarray

  • export constants and/or methods

  • add pointers to MSDN for each constant and method

  • use IMultiLanguage rather than IMultiLanguage2 for the IMultiLanguage methods

  • add a proper synopsis

SUPPORT

...

SEE ALSO

WARNING

This is pre-alpha software.

AUTHOR AND COPYRIGHT

  Copyright (c) 2004 Bjoern Hoehrmann <bjoern@hoehrmann.de>.
  This module is licensed under the same terms as Perl itself.
