钱宇/Qian Yu

NAME

CharsetDetector - a charset detector optimized for East Asian charsets and web page content

SYNOPSIS

        use CharsetDetector;
        use CharsetDetector qw(detect detect1);
        
        # simple usage
        $charset = CharsetDetector::detect($octets);
        
        # with a length limit
        $charset = CharsetDetector::detect($octets,$max_len);
        
        # ignore the charset declared in the HTML head when detecting
        $charset = CharsetDetector::detect1($octets);
        $charset = CharsetDetector::detect1($octets,$max_len);

Basic Functions

detect - detect charset

        $charset = CharsetDetector::detect($octets);
        $charset = CharsetDetector::detect($octets,$max_len);

detect1 - detect from the raw bytes only

By default, the detector treats the charset declared in the HTML head (e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />) as one factor in detection. If you do not want the declared charset to be considered, use detect1 instead of detect.

        $charset = CharsetDetector::detect1($octets);
        $charset = CharsetDetector::detect1($octets,$max_len);
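The difference between the two functions is easiest to see on a page whose declared charset disagrees with its bytes. The following is a minimal sketch, not taken from the module's own documentation: it builds a GBK-encoded page that claims to be utf-8 and runs both functions on it if CharsetDetector is installed. The sample text and variable names are illustrative assumptions, and no particular output is claimed.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Build a test page: the bytes are GBK (cp936), but the <meta> tag claims utf-8.
my $body = "\x{4E2D}\x{6587}\x{7F51}\x{9875}" x 20;   # a short Chinese phrase, repeated
my $html = '<html><head><meta http-equiv="Content-Type" '
         . 'content="text/html; charset=utf-8" /></head><body>'
         . $body . '</body></html>';
my $octets = encode('cp936', $html);

# Run both functions only if the module is available.
if (eval { require CharsetDetector; 1 }) {
    printf "detect : %s\n", CharsetDetector::detect($octets);    # HTML head is a factor
    printf "detect1: %s\n", CharsetDetector::detect1($octets);   # bytes only
}
else {
    print "CharsetDetector is not installed; skipping comparison\n";
}
```

If the two results differ on your own data, detect1 is the one that reflects the bytes alone.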

Return Value

If $octets is null (undefined), the return value is ''. If $octets is the empty string, the return value is 'iso-8859-1'. Otherwise the return value is the name of the detected charset.
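Because an empty string still yields 'iso-8859-1', a caller that wants to tell a real detection from "nothing to detect" has to check the input itself. Below is a minimal sketch of such a wrapper. The name detect_charset and the optional $detect argument (a code reference, so a stand-in can be supplied) are hypothetical and not part of CharsetDetector.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns undef when there is nothing to detect,
# otherwise the charset name reported by the detector.
sub detect_charset {
    my ($octets, $detect) = @_;

    # Missing or empty input: skip detection instead of accepting the
    # '' / 'iso-8859-1' placeholder results described above.
    return unless defined $octets && length $octets;

    # Default to CharsetDetector::detect, loaded on first use.
    $detect ||= do { require CharsetDetector; \&CharsetDetector::detect };

    my $charset = $detect->($octets);
    return length $charset ? $charset : undef;
}
```

With the module installed, `detect_charset($octets)` uses detect, and `detect_charset($octets, \&CharsetDetector::detect1)` ignores the HTML head.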

Supported Charset List

        return value: alias
        
        ascii       : ascii
        iso-8859-1  : iso-8859-1
        utf8        : utf8 utf-8-strict
        utf16       : utf16
        cp936       : euc-cn(gb2312) cp936(gbk) gb18030
        big5-eten   : big5-eten
        euc-jp      : euc-jp
        shiftjis    : shiftjis
        iso-2022-jp : iso-2022-jp
        euc-kr      : euc-kr
        iso-2022-kr : iso-2022-kr
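The return values in this list look like encoding names understood by Perl's core Encode module, so one plausible use is to pass the result straight to Encode. The sketch below rests on that assumption, which this document does not state explicitly. The helper name decode_detected and its optional $detect argument are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical helper: detect the charset of $octets and decode to a Perl string.
# Assumes the detector's return value is a name Encode can resolve.
sub decode_detected {
    my ($octets, $detect) = @_;

    # Default to CharsetDetector::detect, loaded on first use.
    $detect ||= do { require CharsetDetector; \&CharsetDetector::detect };

    my $charset = $detect->($octets);
    return unless defined $charset && length $charset;   # '' means no input

    # Fail loudly if Encode does not recognise the reported name.
    my $enc = find_encoding($charset)
        or die "Encode does not know charset '$charset'\n";

    return decode($enc, $octets);
}
```

Typical use would be `my $text = decode_detected($octets);`, after which $text holds characters instead of bytes.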

COPYRIGHT

The CharsetDetector module is Copyright (c) 2003-2006 QIAN YU. All rights reserved.

You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.