NAME

Regexp::Common::Markdown - Markdown Common Regular Expressions

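SYNOPSIS

    use Regexp::Common qw( Markdown );

    # Read the whole input at once: \G and pos() only make sense on a single string
    my $text = do { local $/; <> };
    while( ( pos( $text ) // 0 ) < length( $text ) )
    {
        my $pos = pos( $text ) // 0;
        if(    $text =~ /\G$RE{Markdown}{Header}/gmc ) { print "Found a header at pos $pos\n" }
        elsif( $text =~ /\G$RE{Markdown}{Bold}/gmc )   { print "Found bold text at pos $pos\n" }
        # Nothing matched at this position: skip one character and try again
        else { $text =~ /\G./gcs }
    }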

VERSION

    v0.1.5

DESCRIPTION

This module provides Markdown regular expressions as set out by Markdown's original author, John Gruber.

There are two types of patterns: vanilla and extended. To get the extended regular expressions, use the -extended switch.

You can use each regular expression by its name: Bold, Blockquote, CodeBlock, CodeLine, CodeSpan, Em, HtmlOpen, HtmlClose, HtmlEmpty, Header, HeaderLine, Image, ImageRef, Line, Link, LinkAuto, LinkDefinition, LinkRef, List.

Almost all of the regular expressions use named capture. See "%+" in perlvar for more information on named capture.

For example:

    use Regexp::Common qw( Markdown );
    use URI;

    if( $text =~ /$RE{Markdown}{LinkAuto}/ )
    {
        print( "Found https url \"$+{link_https}\"\n" ) if( $+{link_https} );
        print( "Found file url \"$+{link_file}\"\n" ) if( $+{link_file} );
        print( "Found ftp url \"$+{link_ftp}\"\n" ) if( $+{link_ftp} );
        print( "Found e-mail address \"$+{link_mailto}\"\n" ) if( $+{link_mailto} );
        print( "Found phone number \"$+{link_tel}\"\n" ) if( $+{link_tel} );
        # Only build a URI object if an https link was actually captured
        if( $+{link_https} )
        {
            my $url = URI->new( $+{link_https} );
        }
    }

As a general rule, Markdown requires that the text being parsed be de-tabbed, i.e. with its tabs converted into 4 spaces. These regular expressions reflect this principle.
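Because the patterns expect de-tabbed text, tabs should be expanded before matching. Below is a minimal sketch modelled on the _Detab routine of John Gruber's Markdown.pl, which expands each tab to the next 4-column tab stop rather than blindly replacing it with 4 spaces. The detab name is illustrative only and is not exported by this module.

```perl
use strict;
use warnings;

# Illustrative helper (not part of Regexp::Common::Markdown):
# expand each tab to the next tab stop, assuming a tab width of 4 by default.
sub detab {
    my( $text, $width ) = @_;
    $width //= 4;
    # (.*?) cannot cross a new line, so columns are counted per line
    $text =~ s{(.*?)\t}{ $1 . ( ' ' x ( $width - length( $1 ) % $width ) ) }ge;
    return( $text );
}

# Usage sketch:
# my $clean = detab( $markdown );
# if( $clean =~ /$RE{Markdown}{CodeLine}/ ) { ... }
```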

STANDARD MARKDOWN

`$RE{Markdown}`

This returns a pattern that recognises any of the supported vanilla Markdown formatting. If you pass the -extended parameter, some patterns are added and some are replaced by their extended versions, such as ExtAbbr, ExtCodeBlock, ExtLink and ExtAttributes.

Blockquote

    $RE{Markdown}{Blockquote}

For example:

    > foo
    >
    > > bar
    >
    > foo

You can see example of this regular expression along with test units here: https://regex101.com/r/TdKq0K/1/tests

The capture names are:

bquote_all: The entire capture of the blockquote.
bquote_other: The inner content of the blockquote.

You can see also Markdown::Parser::Blockquote
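To process nested blockquotes, a common approach is to strip one level of > markers from the captured block and run the patterns again on the result. Here is a minimal sketch of that step, using the same substitution as Gruber's Markdown.pl; the unquote helper is hypothetical and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one level of '>' quoting from a captured
# blockquote, e.g. $+{bquote_all}, so the inner text can be parsed again.
sub unquote {
    my( $block ) = @_;
    $block =~ s/^[ \t]*>[ \t]?//gm;
    return( $block );
}

# Usage sketch (requires Regexp::Common::Markdown):
# if( $text =~ /$RE{Markdown}{Blockquote}/ ) { my $inner = unquote( $+{bquote_all} ); }
```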

Bold

    $RE{Markdown}{Bold}

For example:

    **This is a text in bold.**
    __And so is this.__

You can see example of this regular expression along with test units here: https://regex101.com/r/Jp2Kos/3

The capture names are:

bold_all: The entire capture of the text in bold including the enclosing marker, which can be either ** or __
bold_text: The text within the markers.
bold_type: The marker type used to highlight the text. This can be either ** or __

You can see also Markdown::Parser::Bold
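Since bold_text excludes the markers, rendering bold text can be a single substitution. A minimal sketch, assuming the pattern exposes bold_text as documented above; bold_to_html is a hypothetical helper, and the text is not html-escaped here.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every bold span matched by $re with <strong>.
# $re is expected to expose the bold_text named capture, like $RE{Markdown}{Bold}.
sub bold_to_html {
    my( $text, $re ) = @_;
    $text =~ s{$re}{<strong>$+{bold_text}</strong>}g;
    return( $text );
}

# Usage sketch:
# print bold_to_html( $markdown, $RE{Markdown}{Bold} );
```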

Code Block

    $RE{Markdown}{CodeBlock}

For example:

    ```
    Some text
        Indented code block sample code
    ```

You can see example of this regular expression along with test units here: https://regex101.com/r/M6W99K/7

The capture names are:

code_all: The entire capture of the code block, including the enclosing markers, such as ```
code_content: The content of the code enclosed within the 2 markers.
code_start: The enclosing marker used to mark the code. Typically ```.
code_trailing_new_line: The possible trailing new lines. This is used to detect whether any were captured, in order to put them back in the parsed text for the next markdown, since the trailing new lines of one Markdown element are also the leading new lines of the next one, and new lines are used to delimit Markdown elements.

You can see also Markdown::Parser::Code

Code Line

    $RE{Markdown}{CodeLine}

For example:

    the lines in this block  
    all contain trailing spaces

You can see example of this regular expression along with test units here: https://regex101.com/r/toEboU/3

The capture names are:

code_after: This contains the data that follows the code block.
code_all: The entire capture of the code lines.
code_content: The content of the code.
code_prefix: This contains the leading spaces used to mark the code as code.

You can see also Markdown::Parser::Code

Code Span

    $RE{Markdown}{CodeSpan}

For example:

    This is some `inline code`

You can see example of this regular expression along with test units here: https://regex101.com/r/C2Vl9M/1/tests

The capture names are:

code_all: The entire capture of the code span.
code_start: Contains the marker that delimits the inline code. The delimiter is a backtick (`).
code_content: The content of the code.

You can see also Markdown::Parser::Code
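Code span content usually needs html escaping, so that markup inside backticks is displayed literally. A minimal sketch, assuming the code_content capture documented above; html_escape and code_spans_to_html are hypothetical helpers, and only &, < and > are escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: minimal html escaping
sub html_escape {
    my( $s ) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return( $s );
}

# Replace each code span matched by $re (e.g. $RE{Markdown}{CodeSpan}) with <code>,
# escaping code_content so that html inside the span is shown literally.
sub code_spans_to_html {
    my( $text, $re ) = @_;
    $text =~ s{$re}{ '<code>' . html_escape( $+{code_content} ) . '</code>' }ge;
    return( $text );
}
```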

Emphasis

    $RE{Markdown}{Em}

For example:

    This routine parameter is _test_

You can see example of this regular expression along with test units here: https://regex101.com/r/eDb6RN/5

You can see also Markdown::Parser::Emphasis

Header

    $RE{Markdown}{Header}

For example:

    ### This is a H3 Header
    ### And so is this one ###

You can see example of this regular expression along with test units here: https://regex101.com/r/9uQwBk/4

The capture names are:

header_all

The entire capture of the header.

header_content

The text that is enclosed in the header marker.

header_level

This contains all the hash characters (#) that precede the text. The number of hashes indicates the level of the header. Thus, you could do something like this:

    my $level = length( $+{header_level} );

You can see also Markdown::Parser::Header
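A minimal sketch that lists the headers of a document with their levels, assuming the header_level and header_content captures documented above; list_headers is a hypothetical helper, not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: return [ level, text ] for every header matched by $re,
# e.g. $RE{Markdown}{Header}.
sub list_headers {
    my( $text, $re ) = @_;
    my @headers;
    while( $text =~ /$re/g )
    {
        # The level is the number of # characters captured in header_level
        push( @headers, [ length( $+{header_level} ), $+{header_content} ] );
    }
    return( @headers );
}

# Usage sketch:
# printf( "%d: %s\n", @$_ ) for( list_headers( $markdown, $RE{Markdown}{Header} ) );
```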

Header Line

    $RE{Markdown}{HeaderLine}

For example:

    This is an H1 header
    ====================
    And this is a H2
    -----------

You can see example of this regular expression along with test units here: https://regex101.com/r/sQLEqz/3

The capture names are:

header_all

The entire capture of the header.

header_content

The text that is enclosed in the header marker.

header_type

This contains the marker line used to mark the line above as header.

A line using = is a header of level 1, while a line using - is a header of level 2.

You can see also Markdown::Parser::Header
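For header lines, the level comes from the marker line rather than from a count of characters. A minimal sketch, assuming the header_type and header_content captures documented above; setext_headers is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: return [ level, text ] for every header matched by $re
# (e.g. $RE{Markdown}{HeaderLine}). A marker line of = is level 1, - is level 2.
sub setext_headers {
    my( $text, $re ) = @_;
    my @headers;
    while( $text =~ /$re/g )
    {
        # substr() rather than another match, so %+ still refers to the header match
        my $level = substr( $+{header_type}, 0, 1 ) eq '=' ? 1 : 2;
        push( @headers, [ $level, $+{header_content} ] );
    }
    return( @headers );
}
```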

HTML

    $RE{Markdown}{Html}

For example:

    <div>
        foo
    </div>

You can see example of this regular expression along with test units here: https://regex101.com/r/SH8ki3/4

The capture names are:

html_all

The entire capture of the html block.

html_comment

If this html block is a comment, this will contain the data within the comment.

html_content

The inner content between the opening and closing tags. This could be more html blocks or some text.

Obviously, this capture is not available for html tags that are "empty" by nature, such as <hr />

tag_attributes

The attributes of the opening tag, if any. For example:

    <div title="Start" class="center large" id="extra_stuff">
        <span title="Brand name">MyWorld</span>
    </div>

Here, the attributes will be:

    title="Start" class="center large" id="extra_stuff"

tag_close

The closing tag, including enclosing brackets.

tag_name

This contains the name of the first html tag encountered, i.e. the one that starts the html block. For example:

    <div>
        <span title="Brand name">MyWorld</span>
    </div>

Here the tag name will be div.

You can see also Markdown::Parser::HTML
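The tag_attributes capture is a raw string. A minimal sketch that splits it into a hash, assuming only quoted name="value" or name='value' pairs; parse_tag_attributes is a hypothetical helper, and unquoted or valueless attributes are ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: split a tag_attributes string into a hash reference.
# Simplified: only quoted name="value" and name='value' pairs are handled.
sub parse_tag_attributes {
    my( $attrs ) = @_;
    my %attr;
    while( $attrs =~ /([\w:-]+)\s*=\s*(["'])(.*?)\2/g )
    {
        $attr{ $1 } = $3;
    }
    return( \%attr );
}

# Usage sketch:
# my $attr = parse_tag_attributes( $+{tag_attributes} // '' );
```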

Image

    $RE{Markdown}{Image}

For example:

    ![Alt text](/path/to/img.jpg)

or, with an optional title:

    ![Alt text](/path/to/img.jpg "Optional title")

or, with reference:

    ![alt text][foo]

You can see example of this regular expression along with test units here: https://regex101.com/r/z0yH2F/10

The capture names are:

img_all

The entire capture of the markdown, such as:

    ![Alt text](/path/to/img.jpg)

img_alt

The alternative text to be displayed for this image. This is mandatory as per Markdown, so it is guaranteed to be available.

img_id

If the image is an image reference, this will contain the reference id. When an image id is provided, there is no url and no title, because the image reference provides that information.

img_title

This is the title of the image, which may be absent, since it is optional in Markdown. The title is surrounded by single or double quotes, which are captured in img_title_container.

img_url

This is the url of the image.

You can see also Markdown::Parser::Image
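A minimal sketch that builds <img> tags from the captures documented above. Reference images are skipped, since their url and title come from a link definition elsewhere; inline_image_tags is a hypothetical helper, and the values are not html-escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: build <img> tags for the inline images matched by $re,
# e.g. $RE{Markdown}{Image}. Reference images (img_id set) are skipped.
sub inline_image_tags {
    my( $text, $re ) = @_;
    my @tags;
    while( $text =~ /$re/g )
    {
        next if( defined( $+{img_id} ) && length( $+{img_id} ) );
        my $tag = qq{<img src="$+{img_url}" alt="$+{img_alt}"};
        # The title is optional in Markdown
        $tag .= qq{ title="$+{img_title}"} if( defined( $+{img_title} ) );
        push( @tags, "$tag />" );
    }
    return( @tags );
}
```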

Line

    $RE{Markdown}{Line}

For example:

    ---

    - - -

    ***

    * * *

    ___

    _ _ _

It can be used in a substitution like this:

    $text =~ s{$RE{Markdown}{Line}}
    {
        # processing: here, each horizontal line is replaced with an html rule
        "<hr />";
    }gexm;

Note that this regular expression uses the multiline /m switch and not the single line /s switch, since a Markdown horizontal line does not span multiple lines.

You can see example of this regular expression along with test units here: https://regex101.com/r/Vlew4X/2

The capture names are:

line_all: The entire capture of the horizontal line.
line_type: This contains the marker used to set the line. Valid markers are *, -, or _

You can see also Markdown::Parser::Line

Line Break

    $RE{Markdown}{LineBreak}

For example:

    Mignonne, allons voir si la rose  
    Qui ce matin avait déclose  
    Sa robe de pourpre au soleil,  
    A point perdu cette vesprée,  
    Les plis de sa robe pourprée,  
    Et son teint au vôtre pareil.

To force a line break, each line ends with 2 spaces followed by a new line. This should become:

    Mignonne, allons voir si la rose<br />
    Qui ce matin avait déclose<br />
    Sa robe de pourpre au soleil,<br />
    A point perdu cette vesprée,<br />
    Les plis de sa robe pourprée,<br />
    Et son teint au vôtre pareil.

P.S.: If you're wondering, this is an extract from Ronsard.

You can see example of this regular expression along with test units here: https://regex101.com/r/6VG46H/1/

There is only one capture name: br_all. This is typically used like this:

    if( $text =~ /\G$RE{Markdown}{LineBreak}/ )
    {
        print( "Found a line break\n" );
    }

or, to convert each line break into html:

    $text =~ s/$RE{Markdown}{LineBreak}/<br \/>\n/gs;

You can see also Markdown::Parser::NewLine

The capture name is:

br_all: The entire capture of the line break.

Link

    $RE{Markdown}{Link}

For example:

    [Inline link](https://www.example.com "title")

or, with a relative path:

    [Inline link](/some/path "title")

or, without title:

    [Inline link](/some/path)

or, with a reference id:

    [reference link][refid]
    [refid]: /path/to/something (Title)

or, using the link text as the id for the reference:

    [My Example][]
    [My Example]: https://example.com (Great Example)

You can see example of this regular expression along with test units here: https://regex101.com/r/sGsOIv/10

The capture names are:

link_all

The entire capture of the link.

link_title_container

If there is a link title, this contains the single or double quote enclosing it.

link_id

The link reference id. For example, here 1 is the id:

    [Reference link 1 with parens][1]

link_name

The link text

link_title

The link title, if any.

link_url

The link url, if any

You can see also Markdown::Parser::Link and Regexp::Common::URI

Link Auto

    $RE{Markdown}{LinkAuto}

Supports http, https, ftp, newsgroup, local file, e-mail address and phone number links.

For example:

    <https://www.example.com>

would become:

    <a href="https://www.example.com">https://www.example.com</a>

An e-mail such as:

    <!#$%&'*+-/=?^_`.{|}~@example.com>

would become:

    <a href="mailto:!#$%&'*+-/=?^_`.{|}~@example.com">!#$%&'*+-/=?^_`.{|}~@example.com</a>

Other possible and valid e-mail addresses:

    <"abc@def"@example.com>
    <jsmith@[192.0.2.1]>

A file link:

    <file:///Volume/User/john/Document/form.rtf>

A newsgroup link:

    <news:alt.fr.perl>

An ftp uri:

    <ftp://ftp.example.com/plop/>

Phone numbers:

    <+81-90-1234-5678>
    <tel:+81-90-1234-5678>

You can see example of this regular expression along with test units here: https://regex101.com/r/bAUu1E/3/tests

The capture names are:

link_all: The entire capture of the link.
link_file: A local file url, such as: file:///Volume/User/john/Document/form.rtf
link_ftp: Contains an ftp url
link_http: Contains an http url
link_https: Contains an https url
link_mailto: An e-mail address with or without the mailto: prefix.
link_news: A newsgroup link url, such as news:alt.fr.perl
link_tel: Contains a telephone url according to RFC 3966
link_url: Contains the link uri, which contains one of link_file, link_ftp, link_http, link_https, link_mailto, link_news or link_tel

You can see also Markdown::Parser::Link
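Since only one of the per-kind link_* captures is set for a given match, the kind of link can be found by checking which one is defined. A minimal sketch; autolink_kind is a hypothetical helper, not part of this module.

```perl
use strict;
use warnings;

# Names of the LinkAuto captures, one per kind of link
my @AUTOLINK_KINDS = qw( link_https link_http link_ftp link_file link_mailto link_news link_tel );

# Hypothetical helper: return the name of the capture that is set when $text
# matches $re (e.g. $RE{Markdown}{LinkAuto}), or undef if nothing matched.
sub autolink_kind {
    my( $text, $re ) = @_;
    return unless( $text =~ $re );
    my( $kind ) = grep { defined( $+{ $_ } ) } @AUTOLINK_KINDS;
    return( $kind );
}
```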

Link Definition

    $RE{Markdown}{LinkDefinition}

For example:

    [1]: /url/  "Title"
    [refid]: /path/to/something (Title)

Extra care has been taken so that link definitions are not confused with footnotes, such as:

    [^block]:
            Paragraph.

You can see example of this regular expression along with test units here: https://regex101.com/r/edg2F7/3

The capture names are:

link_all: The entire capture of the link.
link_id: The link id
link_title: The link title
link_title_container: The character used to enclose the title, if any. This is either " or '
link_url: The link url

You can see also Markdown::Parser::LinkDefinition
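Reference links and images need the link definitions of the whole document, so these are typically collected in a first pass. A minimal sketch, assuming the captures documented above; link_definitions is a hypothetical helper. It lower-cases ids because Markdown reference ids are not case sensitive.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the link definitions matched by $re
# (e.g. $RE{Markdown}{LinkDefinition}) into a hash reference keyed by id.
sub link_definitions {
    my( $text, $re ) = @_;
    my %defs;
    while( $text =~ /$re/g )
    {
        # Markdown reference ids are not case sensitive, hence lc()
        $defs{ lc( $+{link_id} ) } = { url => $+{link_url}, title => $+{link_title} };
    }
    return( \%defs );
}
```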

Link Reference

    $RE{Markdown}{LinkRef}

Example:

    Foo [bar] [1].
    Foo [bar][1].
    Foo [bar]
    [1].
    [Foo][]
    [1]: /url/  "Title"
    [Foo]: https://www.example.com

You can see example of this regular expression along with test units here: https://regex101.com/r/QmyfnH/1/tests

The capture names are:

link_all

The entire capture of the link.

link_id

The link reference id. For example, here 1 is the id:

    [Reference link 1 with parens][1]

link_name

The link text

You can see also Markdown::Parser::Link
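Given a hash of link definitions keyed by lower-cased id, each with a url entry (for example, collected from $RE{Markdown}{LinkDefinition} matches), reference links can be resolved as below. A minimal sketch; resolve_link_refs is a hypothetical helper, and references without a matching definition are simply skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: build <a> tags for the reference links matched by $re
# (e.g. $RE{Markdown}{LinkRef}), looking up urls in $defs.
sub resolve_link_refs {
    my( $text, $re, $defs ) = @_;
    my @links;
    while( $text =~ /$re/g )
    {
        my $name = $+{link_name};
        # An empty id, as in [Foo][], means the link text is also the id
        my $id = ( defined( $+{link_id} ) && length( $+{link_id} ) ) ? $+{link_id} : $name;
        # Skip references that have no matching definition
        my $def = $defs->{ lc( $id ) } or next;
        push( @links, qq{<a href="$def->{url}">$name</a>} );
    }
    return( @links );
}
```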

List

    $RE{Markdown}{List}

For example, an unordered list:

    *       asterisk 1
    *       asterisk 2
    *       asterisk 3

or, an ordered list:

    1. One item
    1. Second item
    1. Third item

You can see example of this regular expression along with test units here: https://regex101.com/r/RfhRVg/5

The capture names are:

list_after

The data that follows the list.

list_all

The entire capture of the markdown.

list_content

The content of the list.

list_prefix

Contains the first list marker, possibly preceded by some spaces. A list marker is *, +, - or a number followed by a dot, such as 1.

list_type_any

Contains the list marker, such as *, +, - or a number followed by a dot, such as 1.

This is included in the list_prefix named capture.

list_type_any2

Same as list_type_any, but matches the following item, if any. If there is no following item, then an end of string is expected.

list_type_ordered

Contains a digit followed by a dot if the list is an ordered one.

list_type_ordered2

Same as list_type_ordered, but for the following list item, if any.

list_type_unordered_minus

Contains the minus sign (-) if the list marker is a minus sign.

list_type_unordered_minus2

Same as list_type_unordered_minus, but for the following list item, if any.

list_type_unordered_plus

Contains the plus sign (+) if the list marker is a plus sign.

list_type_unordered_plus2

Same as list_type_unordered_plus, but for the following list item, if any.

list_type_unordered_star

Contains the star (*) if the list marker is a star.

list_type_unordered_star2

Same as list_type_unordered_star, but for the following list item, if any.

You can see also Markdown::Parser::List
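A minimal sketch showing how the list type captures can decide between an ordered and an unordered html list; list_tag is a hypothetical helper, and it assumes list_type_ordered is undefined for unordered lists.

```perl
use strict;
use warnings;

# Hypothetical helper: return 'ol' or 'ul' for the first list matched by $re
# (e.g. $RE{Markdown}{List}), depending on whether list_type_ordered was captured.
sub list_tag {
    my( $text, $re ) = @_;
    return unless( $text =~ $re );
    return( defined( $+{list_type_ordered} ) ? 'ol' : 'ul' );
}
```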

List First Level

    $RE{Markdown}{ListFirstLevel}

This regular expression is used for top level lists, as opposed to the nth level pattern, which is used for sub lists. Both will match lists within lists, but Markdown processes a top level list differently from a sub list.

You can see also Markdown::Parser::List

List Nth Level

    $RE{Markdown}{ListNthLevel}

Regular expression to process lists within lists.

You can see also Markdown::Parser::List

List Item

    $RE{Markdown}{ListItem}

You can see example of this regular expression along with test units here: https://regex101.com/r/bulBCP/1/tests

The capture names are:

li_all: The entire capture of the markdown.
li_content: Contains the data contained in this list item
li_lead_line: The optional leading line breaks.
li_lead_space: The optional leading spaces or tabs. This is used to check that the following items belong to the same list level.
list_type_any: This contains the list type marker, which can be *, +, - or a number followed by a dot, such as 1.
list_type_any2: Same as list_type_any, but matches the following item, if any. If there is no following item, then an end of string is expected.
list_type_ordered: This contains a true value if the list marker contains a digit followed by a dot, such as 1.
list_type_ordered2: Same as list_type_ordered, but for the following list item, if any.
list_type_unordered_minus: This contains a true value if the list marker is a minus sign, i.e. -
list_type_unordered_minus2: Same as list_type_unordered_minus, but for the following list item, if any.
list_type_unordered_plus: This contains a true value if the list marker is a plus sign, i.e. +
list_type_unordered_plus2: Same as list_type_unordered_plus, but for the following list item, if any.
list_type_unordered_star: This contains a true value if the list marker is a star, i.e. *
list_type_unordered_star2: Same as list_type_unordered_star, but for the following list item, if any.

You can see also Markdown::Parser::ListItem

Paragraph

    $RE{Markdown}{Paragraph}

For example:

    The quick brown fox
    jumps over the lazy dog
    Lorem Ipsum
    > Why am I matching?
    1. Nonononono!
    * Aaaagh!
    # Stahhhp!

This regular expression would capture the block up to and including "Lorem Ipsum", but is careful not to catch the other Markdown elements that follow. Thus, the lines after "Lorem Ipsum" are not captured, because they start a blockquote, an ordered list, an unordered list and a header.

You can see example of this regular expression along with test units here: https://regex101.com/r/0B3gR4/5

The capture names are:

para_all: The entire capture of the paragraph.
para_content: Content of the paragraph
para_prefix: Any leading space (up to 3)

You can see also Markdown::Parser::Paragraph

EXTENDED MARKDOWN

Abbreviation

    $RE{Markdown}{ExtAbbr}

For example:

    Some discussion about HTML, SGML and HTML4.
    *[HTML4]: Hyper Text Markup Language version 4
    *[HTML]: Hyper Text Markup Language
    *[SGML]: Standard Generalized Markup Language

You can see example of this regular expression along with test units here: https://regex101.com/r/ztM2Pw/2/tests

The capture names are:

abbr_all: The entire capture of the abbreviation.
abbr_name: Contains the abbreviation. For example HTML
abbr_value: Contains the abbreviation value. For example Hyper Text Markup Language

You can see also Markdown::Parser::Abbr
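A minimal sketch of applying abbreviations: collect the definitions, remove them from the text, then wrap each occurrence in an <abbr> element. apply_abbreviations is a hypothetical helper, and it assumes abbreviation names consist of word characters, since it relies on \b anchors.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the abbreviations defined with $re
# (e.g. $RE{Markdown}{ExtAbbr}) to the rest of the text.
sub apply_abbreviations {
    my( $text, $re ) = @_;
    my %abbr;
    while( $text =~ /$re/g )
    {
        $abbr{ $+{abbr_name} } = $+{abbr_value};
    }
    # The definitions themselves are not part of the rendered text
    $text =~ s/$re//g;
    return( $text ) unless( %abbr );
    my $alt = join( '|', map { quotemeta( $_ ) } sort keys %abbr );
    # \b anchors ensure that HTML does not match inside HTML4
    $text =~ s{\b($alt)\b}{<abbr title="$abbr{$1}">$1</abbr>}g;
    return( $text );
}
```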

Attributes

    $RE{Markdown}{ExtAttributes}

For example, a header with the attributes .cl.class#id7:

    ### Header  {.cl.class#id7 }
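A minimal sketch that splits an attribute set, such as the one above, into its classes and id, assuming the usual convention that . introduces a class and # an id; parse_attr_set is a hypothetical helper, not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: split an attribute set such as "{.cl.class#id7 }"
# into its classes (.name) and its id (#name).
sub parse_attr_set {
    my( $set ) = @_;
    my( @classes, $id );
    while( $set =~ /([.#])([\w-]+)/g )
    {
        if( $1 eq '.' ) { push( @classes, $2 ) } else { $id = $2 }
    }
    return( { class => \@classes, id => $id } );
}
```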

Checkbox

    $RE{Markdown}{ExtCheckbox}

Introduced by GitHub, this markdown extension captures checkboxes, whether checked or unchecked.

For example:

    - [ ] foo
    - [x] bar

Those checkboxes can be placed anywhere, not just in a list.

You can see example of this regular expression along with test units here: https://regex101.com/r/ezMwsv/2/

The capture names are:

check_all: The entire capture of the checkbox.
check_content: The value inside the square brackets, which is either a blank, or the letter X in either lower or upper case.

You can see also Markdown::Parser::Checkbox
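A minimal sketch rendering checkboxes as html based on check_content. The exact html produced here is an assumption, since the expected output is not specified above; checkboxes_to_html is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: render each checkbox matched by $re (e.g. $RE{Markdown}{ExtCheckbox})
# as a disabled html checkbox. check_content is x or X when checked, a blank otherwise.
sub checkboxes_to_html {
    my( $text, $re ) = @_;
    $text =~ s{$re}{
        my $checked = lc( $+{check_content} ) eq 'x' ? ' checked="checked"' : '';
        qq{<input type="checkbox" disabled="disabled"$checked />};
    }ge;
    return( $text );
}
```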

Code Block

    $RE{Markdown}{ExtCodeBlock}

This is the same as conventional code blocks with backticks, except that the extended version also accepts tilde characters as delimiters, as well as an optional class and attributes.

For example:

    ~~~
    Some code
    ~~~

You can see example of this regular expression along with test units here: https://regex101.com/r/Y9lPAz/9

The capture names are:

code_all

The entire capture of the code.

code_attr

The class and/or id attributes for this code. This is something like:

    `````` .html {#codeid}
    </div>
    ``````

Here, code_attr would contain #codeid

code_class

The class of code. For example:

    ``````html {#codeid}
    </div>
    ``````

Here the code class would be html.

code_content

The code data enclosed within the code markers (backticks or tilde)

code_start

Contains the code delimiter, which is either a series of backticks (`) or tildes (~).

You can see also Markdown::Parser::Code
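A minimal sketch rendering extended code blocks as html, using code_class, when present, as the class of the <code> element. The helper names are hypothetical, and only &, < and > are escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: minimal html escaping
sub html_escape {
    my( $s ) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return( $s );
}

# Render the code blocks matched by $re (e.g. $RE{Markdown}{ExtCodeBlock}),
# adding code_class, when present, as the class of the <code> element.
sub code_blocks_to_html {
    my( $text, $re ) = @_;
    $text =~ s{$re}{
        my $class = defined( $+{code_class} ) ? qq{ class="$+{code_class}"} : '';
        "<pre><code$class>" . html_escape( $+{code_content} ) . "</code></pre>";
    }ge;
    return( $text );
}
```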

Footnotes

    $RE{Markdown}{ExtFootnote}

This looks like this:

    [^1]: Content for fifth footnote.
    [^2]: Content for sixth footnote spanning
        three lines, with some span-level markup like
        _emphasis_, a [link][].

A reference to those footnotes could be:

    Some paragraph with a footnote[^1], and another[^2].

The footnote_id reference can be anything as long as it is unique.

You can see also Markdown::Parser::Footnote

Inline Footnotes

For consistency with links, footnotes can be added inline, like this:

    I met Jack [^jack](Co-founder of Angels, Inc) at the meet-up.

Inline notes will work even without the identifier. For example:

    I met Jack [^](Co-founder of Angels, Inc) at the meet-up.

Alternatively, in compliance with the pandoc footnote style, inline footnotes can also be added like this:

    Here is an inline note.^[Inline notes are easier to write, since
    you don't have to pick an identifier and move down to type the
    note.]

You can see example of this regular expression along with test units here: https://regex101.com/r/WuB1FR/2/

The capture names are:

footnote_all: The entire capture of the footnote.
footnote_id: The footnote id which must be unique and will be referenced in text.
footnote_text: The footnote text

You can see also Markdown::Parser::Footnote

Footnote Reference

    $RE{Markdown}{ExtFootnoteReference}

This regular expression matches 3 types of footnote references:

1 Conventional

An id is specified, referring to a footnote that provides details.

    Here's a simple footnote,[^1]
    [^1]: This is the first footnote.

2 Inline

    I met Jack [^jack](Co-founder of Angels, Inc) at the meet-up.

Inline footnotes without any id, i.e. with an auto-generated id, are also matched. For example:

    I met Jack [^](Co-founder of Angels, Inc) at the meet-up.

3 Inline auto-generated, pandoc style

    Here is an inline note.^[Inline notes are easier to write, since
    you don't have to pick an identifier and move down to type the
    note.]

See the pandoc manual for more information.

You can see example of this regular expression along with test units here: https://regex101.com/r/3eO7rJ/1/

The capture names are:

footnote_all

The entire capture of the footnote reference.

footnote_id

The footnote id, which must be unique and must match a footnote declared anywhere in the document, not necessarily before. For example:

    Here's a simple footnote,[^1]
    [^1]: This is the first footnote.

Here, 1 is the id of the footnote.

If it is not provided, then an id will be auto-generated, but a footnote text is then required.

footnote_text

The footnote text is optional if an id is provided. If an id is not provided, the footnote text is guaranteed to have some value.

You can see also Markdown::Parser::FootnoteReference
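Since a footnote may be declared after it is referenced, checking references is easier in two passes: collect the footnote ids first, then scan the references. A minimal sketch; undefined_footnote_refs is a hypothetical helper, and inline footnotes, which carry their own text, are not flagged.

```perl
use strict;
use warnings;

# Hypothetical helper: return the ids of footnote references that have no
# matching footnote definition anywhere in the document.
sub undefined_footnote_refs {
    my( $text, $def_re, $ref_re ) = @_;
    my %defined;
    # First pass: every declared footnote id
    while( $text =~ /$def_re/g )
    {
        $defined{ $+{footnote_id} } = 1;
    }
    my @missing;
    # Second pass: references whose id was never declared
    while( $text =~ /$ref_re/g )
    {
        my( $id, $note ) = ( $+{footnote_id}, $+{footnote_text} );
        # Inline footnotes carry their own text and need no separate definition
        next if( defined( $note ) && length( $note ) );
        push( @missing, $id ) if( defined( $id ) && length( $id ) && !$defined{ $id } );
    }
    return( @missing );
}

# Usage sketch:
# my @missing = undefined_footnote_refs( $text, $RE{Markdown}{ExtFootnote}, $RE{Markdown}{ExtFootnoteReference} );
```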

Header

    $RE{Markdown}{ExtHeader}

This extends the regular header with attributes.

For example:

    ### Header  {.cl.class#id7 }

You can see example of this regular expression along with test units here: https://regex101.com/r/GyzbR2/3

The capture names are:

header_all

The entire capture of the header.

header_attr

Contains the extended attribute set. For example:

    {.class#id}

header_content

The text that is enclosed in the header marker.

header_level

This contains all the hash characters (#) that precede the text. The number of hashes indicates the level of the header. Thus, you could do something like this:

    my $level = length( $+{header_level} );

You can see also Markdown::Parser::Header

Header Line

    $RE{Markdown}{ExtHeaderLine}

Same as header line, but with attributes.

For example:

    Header  {#id5.cl.class}
    ======

You can see example of this regular expression along with test units here: https://regex101.com/r/berfAR/3

The capture names are:

header_all

The entire capture of the header.

header_attr

Contains the extended attribute set. For example:

    {.class#id}

header_content

The text that is enclosed in the header marker.

header_type

This contains the marker line used to mark the line above as header.

A line using = is a header of level 1, while a line using - is a header of level 2.

You can see also Markdown::Parser::Header

HTML Markdown

    $RE{Markdown}{ExtHtmlMarkdown}

This is Markdown embedded in html, enabled with the html tag attribute markdown="1".

For example:

    <div>
        <div markdown="1">
        This is a code block however:
            </div>
        Funny isn't it? Here is a code span: `</div>`.
        </div>
    </div>

This would capture the following as markdown data:

    This is a code block however:
        </div>
    Funny isn't it? Here is a code span: `</div>`.

Since the first </div> is indented, it is treated as a line of code rather than html. The second </div> is not treated as html either, since it is surrounded by backticks, making it a code span.

You can see example of this regular expression along with test units here: https://regex101.com/r/M6KCjp/3

The capture names are:

content

Contains the markdown data enclosed.

div_close

Contains the closing tag.

div_open

Contains the entire opening tag.

For example, in:

    <table>
    <tr><td markdown="1">test _emphasis_ (span)</td></tr>
    </table>

this would match:

    <td markdown="1">

leading_space

Contains any leading space before the start of the tag containing the markdown data.

html_markdown_all

Contains the entire block of data captured

mark_pat1

This contains the data captured in pattern type 1, which matches one-line html and multiline html.

For example:

    <abbr markdown="1" title="`second backtick!">SB</abbr>

or:

    <div>
        <div markdown="1">
        This is a code block however:
            </div>
        Funny isn't it? Here is a code span: `</div>`.
        </div>
    </div>

mark_pat2

This contains the data captured in pattern type 2, which matches html markdown.

For example:

    <table>
    <tr><td markdown="1">test _emphasis_ (span)</td></tr>
    </table>

quote

Contains the type of quote used in:

    <table>
    <tr><td markdown="1">test _emphasis_ (span)</td></tr>
    </table>

Here, this would be "

tag_name

This contains the tag name that contains the markdown data.

Image

    $RE{Markdown}{ExtImage}

Same as regular image, but with attributes.

For example:

    This is an ![inline image](/img "title"){.class #inline-img}.

You can see example of this regular expression along with test units here: https://regex101.com/r/xetHV1/4

The capture names are:

img_all

The entire capture of the markdown, such as:

    ![Alt text](/path/to/img.jpg)

img_alt

The alternative text to be displayed for this image. This is mandatory as per Markdown, so it is guaranteed to be available.

img_attr

Contains the extended attribute set. For example:

    {.class#id}

img_id

If the image is an image reference, this will contain the reference id. When an image id is provided, there is no url and no title, because the image reference provides that information.

img_title

This is the title of the image, which may be absent, since it is optional in Markdown. The title is surrounded by single or double quotes, which are captured in img_title_container.

img_url

This is the url of the image.

You can see also Markdown::Parser::Image

Insertion

    $RE{Markdown}{ExtInsertion}

This is an extension to the original Markdown.

For example:

    Tickets for the event are ~~€5~~ ++€10++

This would become:

    Tickets for the event are <del>€5</del> <ins>€10</ins>

Here, €5 is struck through and €10 is highlighted as an addition. The actual rendering depends on the web browser, of course.

You can see example of this regular expression along with test units here: https://regex101.com/r/IZw4YU/1/

The capture names are:

ins_all: The entire capture of the insertion.
ins_content: The content of the text being inserted. In the example above, this would be €10
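The following sketch shows how ins_content could be used to produce the HTML shown above. The two patterns and the render_ins_del() helper are hypothetical, simplified stand-ins, not the module's own expressions; strike_content is the capture name documented further down for the strikethrough pattern.

```perl
use strict;
use warnings;
use utf8;                               # this source contains the € sign
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical, simplified stand-ins (not the module's own patterns) for
# ++inserted++ and ~~deleted~~ text.
my $INSERTION     = qr/(?<ins_all>\+\+(?<ins_content>(?:(?!\+\+).)+?)\+\+)/s;
my $STRIKETHROUGH = qr/(?<strike_all>~~(?<strike_content>(?:(?!~~).)+?)~~)/s;

# Hypothetical helper: wrap the captured content in <ins> and <del> tags.
sub render_ins_del {
    my ($text) = @_;
    $text =~ s/$INSERTION/<ins>$+{ins_content}<\/ins>/g;
    $text =~ s/$STRIKETHROUGH/<del>$+{strike_content}<\/del>/g;
    return $text;
}

print render_ins_del('Tickets for the event are ~~€5~~ ++€10++'), "\n";
```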

Katex Math Expression

    $RE{Markdown}{ExtKatex}

This is used to capture KaTeX math expressions.

It supports the following delimiters:

open delimiter: $$, close delimiter: $$
open delimiter: \(, close delimiter: \)
open delimiter: \[, close delimiter: \]
open delimiter: $, close delimiter: $

For example:

    $$
    \Gamma(z) = \int_0^\infty t^{z-1}e^{-t}dt\,.
    $$

    Other node \[ displaymath \frac{1}{2} \]

It does not matter whether the expression is in its own block (first example) or inline (second example).

You can see a demo here.

By default, it supports all four delimiters listed above, but your document may contain text that conflicts with them, such as:

    LD_PRELOAD=libusb-driver.so $0.bin $*

In that case, you can choose which delimiters to activate by calling the regular expression like this:

    $RE{Markdown}{ExtKatex}{-delimiter => '$$,$$,\[,\],\(,\)'}

That is, you pass the -delimiter argument with a comma-separated series of opening and closing delimiters. In the above example:

    $$,$$ # open, close
    \[,\] # open, close
    \(,\) # open, close

I would gladly allow for an array reference to be provided, but the Regexp::Common API does not make that possible.

Since KaTeX only recognises those delimiters, you can only choose among them.

Also, the above example uses single quotes because the string contains dollar signs. If you prefer double quotes, you need to escape the dollar signs, and the backslashes as well.

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/43OuNT/3/

The capture names are:

katex_all: The entire capture of the math expression, including its delimiters, typically $$.
katex_close: Contains the closing delimiter, such as $$, $, \] or \)
katex_content: The content of the math expression, i.e. without the surrounding delimiters
katex_open: Contains the opening delimiter, such as $$, $, \[ or \(
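The delimiter selection described above can be sketched in plain Perl. The build_katex_re() helper below is hypothetical and is not how the module builds its pattern: it turns a -delimiter style string into a regular expression with the same capture names, uses quotemeta so the delimiters match literally, and wraps the alternatives in a branch reset group so each one fills the same captures. It also tries longer opening delimiters first, so that $$ is not read as a single $.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Common::Markdown): build a regex
# from a '-delimiter' style string of comma-separated open,close pairs, using
# the documented katex_* capture names.
sub build_katex_re {
    my ($spec) = @_;
    my @parts = split /,/, $spec;
    die "delimiters must come in open,close pairs\n" if @parts % 2;
    my @pairs;
    push @pairs, [ splice(@parts, 0, 2) ] while @parts;
    # Try longer opening delimiters first so that '$$' is not read as '$'.
    @pairs = sort { length($b->[0]) <=> length($a->[0]) } @pairs;
    my $alt = join '|', map {
        my ($open, $close) = map { quotemeta($_) } @$_;
        "(?<katex_open>$open)(?<katex_content>.+?)(?<katex_close>$close)"
    } @pairs;
    # (?|...) is a branch reset: every alternative fills the same groups.
    return qr/(?<katex_all>(?|$alt))/s;
}

my $all_four         = build_katex_re('$$,$$,\(,\),\[,\],$,$');
my $no_single_dollar = build_katex_re('$$,$$,\[,\],\(,\)');

for my $text ('Other node \[ displaymath \frac{1}{2} \]',
              'LD_PRELOAD=libusb-driver.so $0.bin $*') {
    for my $re ($all_four, $no_single_dollar) {
        my $result = $text =~ $re ? "matched '$+{katex_content}'" : 'no match';
        print "$text => $result\n";
    }
}
```

With all four delimiters enabled, the shell command is misread as math between its two $ signs; leaving out the $,$ pair avoids that, which is the situation the -delimiter argument addresses.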

Link

    $RE{Markdown}{ExtLink}

Same as regular links, but with extended attributes.

For example:

    This is an [inline link](/url "title"){.class #inline-link}.

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/7mLssJ/7

The capture names are:

link_all

The entire capture of the link.

link_attr

Contains the extended attribute set. For example:

    {.class#id}

link_attr would contain .class#id

link_title_container

If there is a link title, this contains the single or double quote enclosing it.

link_id

The link reference id. For example, here the id is 1:

    [Reference link 1 with parens][1]

link_name

The link text

link_title

The link title, if any.

link_url

The link url, if any
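A minimal sketch of how the link captures, and link_attr in particular, could be consumed. $LINK is a hypothetical, simplified stand-in for inline links only (no reference links, so no link_id), and split_attr() is a hypothetical helper that pulls the class names and the id out of an attribute set; neither is part of Regexp::Common::Markdown.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for an inline link with an optional
# {attributes} block; not the module's own pattern (no reference links).
my $LINK = qr/
    (?<link_all>
        \[(?<link_name>[^\]]+)\]
        \(
            [ \t]*<?(?<link_url>[^\s)>]+)>?
            (?:[ \t]+(?<link_title_container>["'])
                (?<link_title>.*?)
                \k<link_title_container>)?
            [ \t]*
        \)
        (?:\{[ \t]*(?<link_attr>[^}]*?)[ \t]*\})?
    )
/x;

# Hypothetical helper: extract the class names and the id from an
# attribute set such as ".class #inline-link" or ".class#id".
sub split_attr {
    my ($attr) = @_;
    my @classes = $attr =~ /\.([\w-]+)/g;
    my ($id)    = $attr =~ /#([\w-]+)/;
    return (\@classes, $id);
}

my $text = 'This is an [inline link](/url "title"){.class #inline-link}.';
if ($text =~ $LINK) {
    my %m = %+;    # copy before split_attr() runs other matches
    my ($classes, $id) = split_attr($m{link_attr} // '');
    print "name=$m{link_name} url=$m{link_url} title=$m{link_title}\n";
    print 'classes=', join(',', @$classes), ' id=', $id // '(none)', "\n";
}
```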

See also Markdown::Parser::Link.

Link Definition

    $RE{Markdown}{ExtLinkDefinition}

Same as the regular link definition, but with extended attributes.

For example:

    [refid]: /path/to/something (Title) { .class #ref data-key=val }

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/hVfXCe/3

The capture names are:

link_all

The entire capture of the link.

link_attr

Contains the extended attribute set. For example:

    {.class#id}

link_id

The link id

link_title

The link title

link_title_container

The character used to enclose the title, if any. This is either " or '

link_url

The link url
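The sketch below applies a hypothetical, simplified stand-in pattern to the link definition example above and parses its attribute set into classes, an id and key=value pairs. Both $LINK_DEF and parse_attr() are illustrative assumptions, not the module's code. Unlike the description of link_title_container above, this sketch also accepts a title in parentheses, as in the example, and does not capture the enclosing character.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for a link definition with an optional
# title ("...", '...' or (...)) and an optional {attributes} block.
my $LINK_DEF = qr/
    ^[ ]{0,3}\[(?<link_id>[^\]]+)\]:[ \t]*
    <?(?<link_url>[^\s>]+)>?
    (?:[ \t]+
        (?| "(?<link_title>[^"]*)"
          | '(?<link_title>[^']*)'
          | \((?<link_title>[^)]*)\) ))?
    (?:[ \t]*\{[ \t]*(?<link_attr>[^}]*?)[ \t]*\})?
    [ \t]*$
/xm;

# Hypothetical helper: turn ".class #ref data-key=val" into a hash of
# classes, id and key=value pairs.
sub parse_attr {
    my ($attr) = @_;
    my %out = (class => [], attr => {});
    while ($attr =~ /\G\s*(?:\.([\w-]+)|#([\w-]+)|([\w-]+)=(\S+))/gc) {
        if    (defined $1) { push @{ $out{class} }, $1 }
        elsif (defined $2) { $out{id} = $2 }
        else               { $out{attr}{$3} = $4 }
    }
    return \%out;
}

my $def = '[refid]: /path/to/something (Title) { .class #ref data-key=val }';
if ($def =~ $LINK_DEF) {
    my %m = %+;    # copy before parse_attr() runs other matches
    my $attr = parse_attr($m{link_attr} // '');
    print "id=$m{link_id} url=$m{link_url} title=$m{link_title}\n";
    print 'class=', join(',', @{ $attr->{class} }), ' id=', $attr->{id}, "\n";
}
```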

See also Markdown::Parser::LinkDefinition.

Strikethrough

    $RE{Markdown}{ExtStrikeThrough}

This is an extension introduced by GitHub Flavored Markdown.

For example:

    ~~Hi~~ Hello, world!

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/4Z3h4F/1/

The capture names are:

strike_all: The entire capture of the strikethrough.
strike_content: The content of the text being struck through. In the example above, this would be Hi
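As a quick illustration, this hypothetical stand-in pattern requires two tildes on each side, so it matches the example above but not the single-tilde subscript syntax described in the next section. It is a simplified sketch, not the module's own expression.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in (not the module's own pattern): two
# tildes, some text that does not contain "~~", then two closing tildes.
my $STRIKE = qr/(?<strike_all>~~(?<strike_content>(?:(?!~~).)+?)~~)/s;

for my $text ('~~Hi~~ Hello, world!', 'log~10~100 is 2.') {
    if ($text =~ $STRIKE) {
        print "$text => struck through: $+{strike_content}\n";
    }
    else {
        print "$text => no strikethrough\n";
    }
}
```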

Subscript

    $RE{Markdown}{ExtSubscript}

For example, in:

    log~10~100 is 2.

software using this regular expression would render 10 as a subscript.

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/gF6wVe/2

The capture names are:

sub_all: The entire capture of the subscript.
sub_text: Contains the text of the subscript
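A minimal sketch of turning sub_text into HTML, using a hypothetical stand-in pattern rather than $RE{Markdown}{ExtSubscript}. The lookarounds require exactly one tilde on each side, so ~~strikethrough~~ is left alone.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in (not the module's own pattern): exactly
# one tilde on each side, so "~~strikethrough~~" is not picked up.
my $SUB = qr/(?<sub_all>(?<!~)~(?!~)(?<sub_text>[^~\s]+)~(?!~))/;

(my $html = 'log~10~100 is 2.') =~ s/$SUB/<sub>$+{sub_text}<\/sub>/g;
print "$html\n";    # log<sub>10</sub>100 is 2.
```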

Superscript

    $RE{Markdown}{ExtSuperscript}

For example, in:

    2^10^ is 1024.

software using this regular expression would render 10 as a superscript.

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/yAcNcX/1

The capture names are:

sup_all: The entire capture of the superscript.
sup_text: Contains the text of the superscript
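The same approach works for superscript, again with a hypothetical, simplified stand-in pattern instead of $RE{Markdown}{ExtSuperscript}:

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in (not the module's own pattern): text
# without spaces or carets between two carets.
my $SUP = qr/(?<sup_all>\^(?<sup_text>[^^\s]+)\^)/;

(my $html = '2^10^ is 1024.') =~ s/$SUP/<sup>$+{sup_text}<\/sup>/g;
print "$html\n";    # 2<sup>10</sup> is 1024.
```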

Table

    $RE{Markdown}{ExtTable}

This is an extensive regular expression to capture all kinds of tables, including tables with a caption at the top or bottom.

You can see an example of this regular expression, along with unit tests, here: https://regex101.com/r/01XCqB/12

The capture names are:

table

The entire capture of the table.

table_after

Contains the data that follows the table.

table_caption

Contains the table caption, if set. In markdown, a table caption can be positioned before or after the table.

If you use %- (see "%-" in perlvar), then $-{table_caption}->[0] gives you the table caption if it was set at the top of the table, and $-{table_caption}->[1] gives you the table caption if it was set at the bottom of the table.

table_headers

Contains all the header rows.

table_header1

Contains the first row of the header. This is contained within the capture name table_headers

table_header2

Contains the second row, if any, of the header. This is contained within the capture name table_headers

A second row is optional, and there can be at most two header rows, as per the table format this regular expression follows.

table_header_sep

Contains the separator line between the table header and the table body.

table_rows

Contains the table body rows
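The following sketch illustrates the %- technique described for table_caption. The pattern is a hypothetical stand-in, far simpler than the module's: it assumes a [Caption] line directly above or below a block of |...| rows, and reuses the name table_caption for both positions so that $-{table_caption} holds two entries, top first.

```perl
use strict;
use warnings;

# Hypothetical stand-in, much simpler than the module's table pattern. It
# assumes a "[Caption]" line directly above or below the |...| rows and uses
# the name table_caption for both positions.
my $TABLE = qr/
    (?:^\[(?<table_caption>[^\]\n]+)\]\n)?      # caption above the table
    (?<table_rows>(?:^\|.*\|[ \t]*\n)+)          # one or more |...| lines
    (?:^\[(?<table_caption>[^\]\n]+)\]\n?)?     # caption below the table
/xm;

my $md = "| a | b |\n| 1 | 2 |\n[Bottom caption]\n";
if ($md =~ $TABLE) {
    # %- maps each name to ALL groups of that name, in pattern order.
    my ($top, $bottom) = @{ $-{table_caption} };
    printf "top: %s, bottom: %s\n", $top // '(none)', $bottom // '(none)';
}
```

By contrast, $+{table_caption} only returns the leftmost group of that name that actually matched, so it cannot tell you where the caption was.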

The table format is taken from David E. Wheeler's RFC.

See also Markdown::Parser::Table.

CHANGES & CONTRIBUTIONS

Feel free to reach out to the author for possible corrections, improvements, or suggestions.

AUTHOR

Jacques Deguest <jack@deguest.jp>

CREDITS

Credits to Michel Fortin and John Gruber for their unit tests.

Credits to Firas Dib for his online regular expression test tool.

COPYRIGHT & LICENSE

You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.

To install Regexp::Common::Markdown, copy and paste the appropriate command into your terminal.

cpanm

    cpanm Regexp::Common::Markdown

CPAN shell

    perl -MCPAN -e shell
    install Regexp::Common::Markdown

For more information on module installation, please visit the detailed CPAN module installation guide.
