NAME

Twitter::Text - Perl implementation of the twitter-text parsing library

SYNOPSIS

    use Twitter::Text;

    my $result = parse_tweet('Hello world こんにちは世界');
    print $result->{valid} ? 'valid tweet' : 'invalid tweet';

DESCRIPTION

Twitter::Text is a Perl implementation of the twitter-text parsing library.

WARNING

This library does not implement auto-linking and hit highlighting.

Please refer to Implementation status for the latest status.

FUNCTIONS

All functions below are exported by default.

Extraction

extract_hashtags

    my $hashtags = extract_hashtags($text);    # array reference

extract_hashtags_with_indices

    # \%options is optional; the result is an array reference
    my $hashtags_with_indices = extract_hashtags_with_indices($text, \%options);
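
To illustrate what "with indices" means, here is a minimal, self-contained sketch that does not use Twitter::Text at all. The function name extract_hashtags_with_indices_sketch, the simplified regular expression, and the { hashtag, indices } result shape are assumptions made for illustration (the shape is modeled on the Ruby twitter-text library). The real matching rules are considerably stricter.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the Twitter::Text implementation.
# Returns an array reference of { hashtag => ..., indices => [start, end] },
# where start is the offset of '#' and end is the offset just past the match.
sub extract_hashtags_with_indices_sketch {
    my ($text) = @_;
    my @found;

    # Simplified rule: '#' not preceded by a word character or '&',
    # followed by word characters containing at least one non-digit.
    while ($text =~ /(?<![\w&])#(\w*[^\W\d]\w*)/g) {
        push @found, {
            hashtag => $1,
            indices => [ $-[0], $+[0] ],    # offsets of the whole match
        };
    }
    return \@found;
}

my $tags = extract_hashtags_with_indices_sketch('Learning #perl and #cpan today');
printf "%s [%d, %d]\n", $_->{hashtag}, @{ $_->{indices} } for @$tags;
# perl [9, 14]
# cpan [19, 24]
```

The sketch reports offsets via Perl's @- and @+ match arrays, so substr($text, $start, $end - $start) recovers the matched hashtag including the leading '#'.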

extract_mentioned_screen_names

    my $screen_names = extract_mentioned_screen_names($text);    # array reference

extract_mentioned_screen_names_with_indices

    # the result is an array reference
    my $screen_names_with_indices = extract_mentioned_screen_names_with_indices($text);

extract_mentions_or_lists_with_indices

    # the result is an array reference
    my $mentions_or_lists_with_indices = extract_mentions_or_lists_with_indices($text);

extract_urls

    my $urls = extract_urls($text);    # array reference

extract_urls_with_indices

    # \%options is optional; the result is an array reference
    my $urls_with_indices = extract_urls_with_indices($text, \%options);

Validation

parse_tweet

    # \%options is optional; the result is a hash reference
    my $parse_result = parse_tweet($text, \%options);

The parse_tweet function takes a $text string and an optional \%options parameter, and returns a hash reference with the following values:

weighted_length

The overall length of the tweet with code points weighted per the ranges defined in the configuration file.
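
As a rough illustration of code point weighting, the following self-contained sketch sums a weight per code point and divides by a scale. The helper name weighted_length_sketch is hypothetical, and the scale, default weight, and ranges are assumptions modeled on twitter-text configuration version 3. The sketch ignores URLs, emoji sequences, and normalization, so it is not a substitute for parse_tweet.

```perl
use strict;
use warnings;

# Assumed values, modeled on twitter-text configuration version 3;
# check the configuration used by your installed version.
my $SCALE          = 100;
my $DEFAULT_WEIGHT = 200;
my @RANGES = (
    { start => 0,    end => 4351, weight => 100 },
    { start => 8192, end => 8205, weight => 100 },
    { start => 8208, end => 8223, weight => 100 },
    { start => 8242, end => 8247, weight => 100 },
);

# Hypothetical helper: sums per-code-point weights, then divides by the scale.
sub weighted_length_sketch {
    my ($text) = @_;
    my $total = 0;
    for my $char (split //, $text) {
        my $cp     = ord $char;
        my $weight = $DEFAULT_WEIGHT;    # used when no range matches
        for my $range (@RANGES) {
            if ($cp >= $range->{start} && $cp <= $range->{end}) {
                $weight = $range->{weight};
                last;
            }
        }
        $total += $weight;
    }
    return int($total / $SCALE);
}

# "Perl " is 5 code points at weight 100; the 3 kanji weigh 200 each.
print weighted_length_sketch("Perl \x{65E5}\x{672C}\x{8A9E}"), "\n";    # 11
```

Under these assumed ranges, Latin text counts one unit per code point, while most CJK code points fall outside every range and count two.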

permillage

Indicates the proportion (per thousand) of the weighted length in comparison to the max weighted length. A value > 1000 indicates input text that is longer than the allowable maximum.
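
The arithmetic can be sketched as follows. The helper name permillage_sketch is hypothetical, the maximum of 280 is an assumption modeled on twitter-text configuration version 3, and rounding down is an assumption that happens to agree with the EXAMPLES output (weighted_length 33, permillage 117).

```perl
use strict;
use warnings;

# Assumed maximum, modeled on twitter-text configuration version 3.
my $MAX_WEIGHTED_LENGTH = 280;

# Hypothetical helper: weighted length expressed as parts per thousand
# of the maximum, rounded down.
sub permillage_sketch {
    my ($weighted_length) = @_;
    return int($weighted_length * 1000 / $MAX_WEIGHTED_LENGTH);
}

print permillage_sketch(33),  "\n";    # 117
print permillage_sketch(280), "\n";    # 1000
print permillage_sketch(300), "\n";    # 1071, longer than the maximum
```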

valid

Indicates if input text length corresponds to a valid result.

display_range_start, display_range_end

Unicode code point indices identifying the inclusive start and exclusive end of the displayable content of the Tweet.

valid_range_start, valid_range_end

Unicode code point indices identifying the inclusive start and exclusive end of the valid content of the Tweet.

EXAMPLES

    use Data::Dumper;
    use Twitter::Text;

    my $result = parse_tweet('Hello world こんにちは世界');
    print Dumper($result);
    # $VAR1 = {
    #       'weighted_length' => 33,
    #       'permillage' => 117,
    #       'valid' => 1,
    #       'display_range_start' => 0,
    #       'display_range_end' => 32,
    #       'valid_range_start' => 0,
    #       'valid_range_end' => 32,
    #     };

is_valid_hashtag

    my $valid = is_valid_hashtag($hashtag);

is_valid_list

    my $valid = is_valid_list($username_list);

is_valid_url

    my $valid = is_valid_url($url, [unicode_domains => 1, require_protocol => 1]);

is_valid_username

    my $valid = is_valid_username($username);

SEE ALSO

twitter-text. The implementation of Twitter::Text (this library) is heavily based on the Ruby implementation of twitter-text.

https://developer.twitter.com/en/docs/counting-characters

COPYRIGHT & LICENSE

Copyright (C) Twitter, Inc and other contributors

Copyright (C) utgwkk.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

utgwkk <utagawakiki@gmail.com>