Regexp::Common::microsyntax - a collection of regular expressions for use with microblogging-style text (tweets, dents, microposts, etc.)


Version 0.02


    use Regexp::Common qw(microsyntax);

    # Available patterns: user, hashtag, grouptag, slashtag

    # Get all users/hashtags/groups/slashtags mentioned in $post
    @users     = $post =~ m/$RE{microsyntax}{user}/og;
    @hashtags  = $post =~ m/$RE{microsyntax}{hashtag}/og;
    @groups    = $post =~ m/$RE{microsyntax}{grouptag}/og;
    @slashtags = $post =~ m/$RE{microsyntax}{slashtag}/og;

    # Capture/extract individual elements (see Regexp::Common '-keep')
    my @usernames;
    while ($post =~ m/$RE{microsyntax}{user}{-keep}/go) {
      push @usernames, $3;
    }

    # Substitute/markup individual elements ($1 is only set with '-keep')
    $post =~ s|$RE{microsyntax}{user}{-keep}|<span class="user">$1</span>|go;


Please consult the manual of Regexp::Common for a general description of how this interface works.

Do not use this module directly, but load it via Regexp::Common.

This module provides regular expressions for matching microblogging-style text (tweets, dents, microposts, etc.). It is based on the Ruby twitter-text Regex class, with extensions for features that Twitter itself does not support (such as !group tags and slashtags).


$RE{microsyntax}{user} returns a pattern that matches @username handles. For this pattern and the hashtag and grouptag patterns below, using '-keep' (see Regexp::Common) gives access to the following individual components:

$1 captures the entire match
$2 captures the sigil used ('@' for usernames, '#' or the full-width '＃' for hashtags, '!' for group tags)
$3 captures the text after the sigil, i.e. the bare username, hashtag, etc.
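The capture layout above can be illustrated without the module installed. The sketch below uses a hypothetical, deliberately simplified stand-in regex ($user_re) that mimics only the documented $1/$2/$3 layout; it is not the pattern the module actually generates, which follows twitter-text much more closely.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for $RE{microsyntax}{user}{-keep}.
# It mimics only the documented capture layout:
#   $1 = entire match, $2 = sigil, $3 = bare username.
# The negative lookbehind skips an '@' glued to a word (e.g. in email addresses).
my $user_re = qr/(?<![A-Za-z0-9_])((\@)([A-Za-z0-9_]+))/;

my $post = 'Thanks @alice and @bob_99 (not me@example.com)';

my @usernames;
while ($post =~ m/$user_re/g) {
    push @usernames, $3;    # username without its '@' sigil
}

print join(',', @usernames), "\n";    # prints "alice,bob_99"
```

With the real module, the same loop would use $RE{microsyntax}{user}{-keep} in place of $user_re.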


$RE{microsyntax}{hashtag} returns a pattern that matches #hashtags, including Unicode hashtags. Hashtags consisting only of digits (e.g. #2011) are specifically excluded.
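The two hashtag rules described above (Unicode support and skipping all-numeric tags) can be sketched with Perl's Unicode property classes and a lookahead. This is a hypothetical, simplified pattern ($hashtag_re), not the module's actual regex; in particular, the "must contain a letter" rule and the accepted sigils ('#' and the full-width '＃') are assumptions made for illustration.

```perl
use strict;
use warnings;
use utf8;                              # the source contains non-ASCII literals
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical, simplified hashtag pattern with the same capture layout
# as the module's '-keep' groups.
my $hashtag_re = qr/
    (?<![\p{L}\p{M}\p{N}_])            # not glued to a preceding word
    (                                  # group 1: entire match
      ([\#＃])                         # group 2: sigil, ASCII or full-width hash
      (                                # group 3: bare tag text
        (?=[\p{N}_]*\p{L})             # require a letter, so all-numeric tags are skipped
        [\p{L}\p{M}\p{N}_]+
      )
    )
/x;

my $post = 'Loving #perl and #café, but #2011 is not a tag; ＃日本 is.';

my @tags;
while ($post =~ m/$hashtag_re/g) {
    push @tags, $3;                    # bare tag text without the sigil
}

print join(',', @tags), "\n";          # prints "perl,café,日本"
```

The lookahead is zero-width, so it only vetoes a candidate tag; the character class that follows still consumes the whole tag, digits included.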


$RE{microsyntax}{grouptag} returns a pattern that matches identi.ca-style !group tags.
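A group tag follows the same sigil-plus-name shape as a username, only with '!' as the sigil. The sketch below uses a hypothetical, simplified pattern ($group_re) with the same $1/$2/$3 layout; the alphanumeric-only group name is an assumption, not the module's actual rule.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for $RE{microsyntax}{grouptag}{-keep}:
#   $1 = entire match, $2 = '!' sigil, $3 = bare group name.
# The lookbehind stops ordinary exclamation marks ("news!") from matching.
my $group_re = qr/(?<![A-Za-z0-9_])((!)([A-Za-z0-9]+))/;

my $post = 'New release out !perl !cpan; great news!';

my @groups;
while ($post =~ m/$group_re/g) {
    push @groups, $3;    # group name without the '!' sigil
}

print join(',', @groups), "\n";    # prints "perl,cpan"
```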


$RE{microsyntax}{slashtag} returns a pattern that matches slashtags. These normally occur at the end of a post, with the first (but typically not the others) introduced by a slash, e.g.:

  Sample post /via @person1 by @person2 cc @person3 @person4

The following slashtags are recognised: cc, for, tip, hat tip, ht, and via.

For this pattern, using '-keep' allows access to the following individual components:

$1 captures the entire match
$2 captures the verbatim slashtag (e.g. '/via', 'cc', 'by')
$3 captures the (potentially multiple) @user handles with this slashtag
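Because $3 may hold several handles, a second match is needed to split it into individual usernames. The sketch below builds a hypothetical, simplified pattern ($slashtag_re) from the recognised slashtags listed above and mimics the documented capture layout; it is not the module's actual regex, and the handle syntax is an assumption.

```perl
use strict;
use warnings;

# Hypothetical, simplified slashtag pattern built from the recognised list.
# It is NOT the module's actual regex; it only mimics the capture layout.
my @slashtags = ('hat tip', 'via', 'tip', 'for', 'cc', 'ht');
my $alts = join '|', map { quotemeta($_) } @slashtags;

my $slashtag_re = qr{
    (?<![A-Za-z0-9_])                  # tag must not be glued to a preceding word
    (                                  # group 1: entire match
      (/?(?:$alts))                    # group 2: verbatim slashtag, slash optional
      ((?:\s+\@[A-Za-z0-9_]+)+)        # group 3: one or more user handles
    )
}x;

my $post = 'Great talk /via @person1 cc @person3 @person4';

my @found;
while ($post =~ m/$slashtag_re/g) {
    my ($tag, $handles) = ($2, $3);
    # Split the handle list from group 3 into bare usernames
    my @users = $handles =~ m/\@([A-Za-z0-9_]+)/g;
    push @found, "$tag=" . join('+', @users);
}

print join(' ', @found), "\n";    # prints "/via=person1 cc=person3+person4"
```

Requiring at least one handle after the tag keeps everyday words such as "for" from matching on their own.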


Gavin Carr


Please report any bugs or feature requests to bug-regexp-common-microsyntax at, or through the web interface at


The Ruby twitter-text-rb library.




Copyright 2011 Gavin Carr.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See for more information.