NAME
Encode::Detect::CJK - A charset detector optimized for East Asian charsets and web page content
SYNOPSIS
    # Simple use
    my $charset = CharsetDetector::detect($octets);

    # Use with advanced options
    $charset = CharsetDetector::detect($octets, $max_len, $is_consider_html_head_charset);

    # Returns the charset of the binary string $octets.
    #
    # $max_len: a large $octets makes detection slow, so you may need to
    #   specify $max_len to cap the length used for detection.
    #   undef (the DEFAULT) means unlimited.
    #
    # $is_consider_html_head_charset: by DEFAULT, the detector treats the
    #   charset declared in the HTML header (e.g. <meta http-equiv="Content-Type"
    #   content="text/html; charset=utf-8" />) as one factor in detection.
    #   If you don't want the detector to consider the HTML header,
    #   set $is_consider_html_head_charset to "" or 0.
Basic Function
detect - detect the charset of a binary string
    $charset = CharsetDetector::detect($octets, $max_len, $is_consider_html_head_charset);

    $charset = CharsetDetector::detect($octets, $max_len);
    # same as CharsetDetector::detect($octets, $max_len, 1);

    $charset = CharsetDetector::detect($octets);
    # same as CharsetDetector::detect($octets, undef);
Param $octets - input binary string
The binary (octet) string whose charset is to be detected.
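The difference between a binary (octet) string and a decoded Perl character string matters for this parameter. A minimal core-Perl sketch: the same two Chinese characters are 2 characters long as a Perl string but 6 bytes once UTF-8 encoded, and only the byte form carries encoding information. UTF-8 is chosen purely for illustration; the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A Perl *character* string: U+4E2D U+6587 ("中文").
my $text = "\x{4E2D}\x{6587}";

# Encoding it yields a *binary* (octet) string, the kind of input
# that $octets is documented to be. UTF-8 is used only as an example.
my $octets = encode('UTF-8', $text);

printf "characters: %d, octets: %d, utf8 flag on octets: %s\n",
    length($text), length($octets),
    (utf8::is_utf8($octets) ? 'on' : 'off');
```

When the data comes from a file, opening it with the :raw layer (open my $fh, '<:raw', $path) likewise keeps it as octets.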
Param $max_len - max length for charset detector
A large $octets makes detection slow, so you may need to specify $max_len to cap the length used for detection. undef means the DEFAULT, which is unlimited.
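For large files it may be cheaper to read only a leading sample from disk instead of loading everything and relying on $max_len. The sketch below assumes, as the parameter description suggests but does not spell out, that a prefix of the input is enough for detection; read_prefix is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read at most $max_len bytes from $path in raw mode.
# An undef $max_len reads the whole file, mirroring the documented
# DEFAULT of an unlimited length.
sub read_prefix {
    my ($path, $max_len) = @_;
    open my $fh, '<:raw', $path or die "cannot open $path: $!";
    my $buf = '';
    if (defined $max_len) {
        my $got = read($fh, $buf, $max_len);
        die "read failed on $path: $!" unless defined $got;
    } else {
        local $/;               # slurp mode
        $buf = <$fh> // '';
    }
    close $fh;
    return $buf;
}

# Demo: a 10-byte file, capped at 4 bytes.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} '0123456789';
close $tmp_fh;

my $head = read_prefix($tmp_path, 4);   # '0123'
my $all  = read_prefix($tmp_path);      # '0123456789'
```

The result could then be passed as $octets, e.g. CharsetDetector::detect(read_prefix($path, 65536)), where 65536 is an arbitrary illustrative size. Cutting at a fixed byte count may split a multibyte character at the end of the sample.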
Param $is_consider_html_head_charset
By DEFAULT, the detector treats the charset declared in the HTML header (e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />) as one factor in detecting the charset. If you don't want the detector to consider the HTML header, set $is_consider_html_head_charset to "" or 0.
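To illustrate what "the charset declared in the HTML header" refers to, here is a hypothetical helper that pulls that declaration out of raw octets. It is only a sketch of the kind of signal involved, not the module's own logic, and it handles just the two common <meta> forms.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's implementation: return the charset
# declared in an HTML <meta> tag, either the http-equiv form or the
# <meta charset="..."> form, lowercased; undef if none is found.
sub html_meta_charset {
    my ($octets) = @_;
    return undef unless defined $octets;
    # Only ASCII is matched, so this works on undecoded bytes.
    if ($octets =~ /<meta\b[^>]*?\bcharset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)/i) {
        return lc $1;
    }
    return undef;
}

my $page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
print html_meta_charset($page), "\n";   # utf-8
```

A declared charset can be wrong, which is presumably why the detector treats it as one factor rather than the answer, and why the option to ignore it exists.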
Return Value $charset
If $octets is undef, returns ''; if $octets is the empty string '', returns 'iso-8859-1'; otherwise returns the detected charset name.
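A typical next step is decoding the octets with the returned name. A minimal sketch using the core Encode module; decode_detected is a hypothetical helper that treats the '' result (undef input) and names unknown to Encode as failures.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode $octets with a charset name such as the one
# returned by the detector. Returns undef for '' or an unknown name.
sub decode_detected {
    my ($octets, $charset) = @_;
    return undef if !defined $octets || !defined $charset || $charset eq '';
    my $enc = Encode::find_encoding($charset) or return undef;
    my $copy = $octets;          # decode on a copy; leave the caller's data intact
    return $enc->decode($copy);
}

# "\xd6\xd0" is U+4E2D in GBK, which the cp936 encoding covers.
my $char = decode_detected("\xd6\xd0", 'cp936');
```

Usage would look like decode_detected($octets, CharsetDetector::detect($octets)).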
Supported Charset List
    Return value   Aliases
    ------------   ---------------------------------------
    ascii          ascii
    iso-8859-1     iso-8859-1
    utf8           utf8, utf-8-strict
    utf16          utf16
    cp936          euc-cn (gb2312), cp936 (gbk), gb18030
    big5-eten      big5-eten
    euc-jp         euc-jp
    shiftjis       shiftjis
    iso-2022-jp    iso-2022-jp
    euc-kr         euc-kr
    iso-2022-kr    iso-2022-kr
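Because detect reports one name for a whole family of charsets (for example cp936 for GB2312, GBK and GB18030 input), comparing its result with a charset declared elsewhere needs the alias table as data. A sketch that transcribes the table verbatim into a hash; detector_name_for is a hypothetical helper, and names the table does not list (such as 'utf-8') are deliberately left unmapped.

```perl
use strict;
use warnings;

# The supported-charset table as a hash: each alias maps to the value
# detect() is documented to return. "gb2312" and "gbk" come from the
# table's parenthetical notes.
my %RETURN_VALUE_FOR = (
    'ascii'        => 'ascii',
    'iso-8859-1'   => 'iso-8859-1',
    'utf8'         => 'utf8',
    'utf-8-strict' => 'utf8',
    'utf16'        => 'utf16',
    'euc-cn'       => 'cp936',
    'gb2312'       => 'cp936',
    'cp936'        => 'cp936',
    'gbk'          => 'cp936',
    'gb18030'      => 'cp936',
    'big5-eten'    => 'big5-eten',
    'euc-jp'       => 'euc-jp',
    'shiftjis'     => 'shiftjis',
    'iso-2022-jp'  => 'iso-2022-jp',
    'euc-kr'       => 'euc-kr',
    'iso-2022-kr'  => 'iso-2022-kr',
);

# Hypothetical helper: map a charset name (case-insensitively) to the
# value the detector would report, or undef if the table does not list it.
sub detector_name_for {
    my ($name) = @_;
    return undef unless defined $name;
    return $RETURN_VALUE_FOR{ lc $name };
}
```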
COPYRIGHT
The CharsetDetector module is Copyright (c) 2003-2008 QIAN YU. All rights reserved.
You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.