The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

Name

SPVM::Document::Language::Tokenization - Lexical Tokenization in The SPVM Language

Description

This document describes lexical tokenization in the SPVM language.

Tokenization

The tokenizing the source codes of SPVM language is explained.

Character Encoding of Source Code

The character encoding of SPVM source codes is UTF-8.

If a character is ASCII, it must be ASCII printable characters or ASCII space characters except for ASCII CR.

Compilation Errors:

The charactor encoding of SPVM source codes must be UTF-8. Otherwise a compilation error occurs.

If a character in an SPVM source code is ASCII, it must be ASCII printable or space.

The new line of SPVM source codes must be LF. The source code cannot contains CR and CRLF.

Line Terminators

The line terminators are 0x2A LF of ASCII.

When a line terminator appears, the current line number is incremented by 1.

Space Character

Space characters are SP, HT, FF of ASCII and the line terminators.

Word Character

The word characters are alphabet(a-zA-Z), number(0-9), and underscore(_) of ASCII.

Symbol Name

A symbol name is the characters that are composed of word characters and ::.

A symbol name cannnot contains __, and cannnot begin with a number 0-9.

A symbol name cannnot begin with ::, and cannnot end with ::.

A symbol name cannnot contains ::::, and cannnot begin with a number 0-9.

  # Symbol names
  foo
  foo_bar2
  Foo::Bar
  
  # Invalid symbol names
  2foo
  foo__bar
  ::Foo
  Foo::
  Foo::::Bar

Class Name

A class name is a symbol name.

The part names of a class name must begin uppercase letter. If the class name is Foo:Bar::Baz, part names are Foo, Bar, and Baz.

A class name must be the name that the relative class file path's all / are replaced with :: and the trailing .spvm is removed. For example, If the relative class file path is Foo/Bar/Baz.spvm, the class name must be Foo::Bar::Baz.

  # Valid class name in the class file "Foo/Bar/Baz.spvm"
  class Foo::Bar::Baz {
    
  }

  # Invalid class name in the class file "Foo/Bar/Baz.spvm"
  class Foo::Bar::Hello {
    
  }

Compilation Errors:

If class names are invalid, a compilation error occurs.

Examples:

  # Class names
  Foo
  Foo::Bar
  Foo::Bar::Baz3
  Foo::bar
  Foo_Bar::Baz_Baz

  # Invalid class names
  Foo
  Foo::::Bar
  Foo::Bar::
  Foo__Bar
  Foo::bar

Method Name

A method name is a symbol name that doesn't contains ::.

0-length method name is valid. This is used in the anon method.

Compilation Errors:

If method names are invalid, a compilation error occurs.

Examples:

  # Valid method names
  FOO
  FOO_BAR3
  foo
  foo_bar
  _foo
  _foo_bar_

  # Invalid method names
  foo__bar
  3foo

A method name that is the same as a "Keyword" in keyword is allowed.

  # "if" is a valid method name
  static method if : void () {
    
  }

Field Name

A field name is a symbol name that doesn't contains ::.

Compilation Errors:

If field names are invalid, a compilation error occurs.

Examples:

  # Field names
  FOO
  FOO_BAR3
  foo
  foo_bar
  _foo
  _foo_bar_

  # Invalid field names
  foo__bar
  3foo
  Foo::Bar

The field name that is the same as a "Keyword" in keyword is allowed.

  # "if" is a valid field name
  has if : int;

Variable Name

A variable name begins with $ and is followed by a symbol name.

Compilation Errors:

The symbol name can be wrapped by { and }. If a opening { exists and the closing } doesn't exists, a compilation error occurs.

Examples:

  # Variable names
  $name
  $my_name
  ${name}
  $Foo::name
  $Foo::Bar::name
  ${Foo::name}

  # Invalid variable names
  $::name
  $name::
  $Foo::::name
  $my__name
  ${name

Class Variable Name

A class variable name is a variable name.

Compilation Errors:

If class variable names are invalid, a compilation error occurs.

Examples:

  # Class variable names
  $NAME
  $MY_NAME
  ${NAME}
  $FOO::NAME
  $FOO::BAR::NAME
  ${FOO::NAME_BRACE}
  $FOO::name
  
  # Invalid class variable names
  $::NAME
  $NAME::
  $FOO::::NAME
  $MY__NAME
  $3FOO
  ${NAME

Local Variable Name

A local variable name is a variable name that doesn't contain ::.

Examples:

  # Local variable names
  $name
  $my_name
  ${name_brace}
  $_name
  $NAME

  # Invalid local variable names
  $::name
  $name::
  $Foo::name
  $Foo::::name
  $my__name
  ${name
  $3foo

Current Class

& before method name means the current class. & is replaced with CURRENT_CLASS_NAME->.

Examples:

  class Foo {
    
    static method test : void () {
      # This means Foo->sum(1, 2)
      my $ret = &sum(1, 2);
    }
  
    static method sum : int ($num1 : int, $num2 : int) {
      return $num1 + $num2;
    }
    
  }

Keyword

The list of keywords:

  alias
  allow
  as
  basic_type_id
  break
  byte
  can
  case
  cmp
  class
  compile_type_name
  copy
  default
  die
  div_uint
  div_ulong
  double
  dump
  elsif
  else
  enum
  eq
  eval
  eval_error_id
  extends
  for
  float
  false
  gt
  ge
  has
  if
  interface
  int
  interface_t
  isa
  isa_error
  isweak
  is_compile_type
  is_type
  is_error
  is_read_only
  args_width
  last
  length
  lt
  le
  long
  make_read_only
  my
  mulnum_t
  method
  mod_uint
  mod_ulong
  mutable
  native
  ne
  next
  new
  new_string_len
  of
  our
  object
  print
  private
  protected
  public
  precompile
  pointer
  return
  require
  required
  rw
  ro
  say
  static
  switch
  string
  short
  scalar
  true
  type_name
  undef
  unless
  unweaken
  use
  version
  void
  warn
  while
  weaken
  wo
  INIT
  __END__
  __PACKAGE__
  __FILE__
  __LINE__

Operator for Tokenization

The list of the operators for tokenization:

  !
  !=
  $
  %
  &
  &&
  &=
  =
  ==
  ^
  ^=
  |
  ||
  |=
  -
  --
  -=
  ~
  @
  +
  ++
  +=
  *
  *=
  <
  <=
  >
  >=
  <=>
  %
  %=
  <<
  <<=
  >>=
  >>
  >>>
  >>>=
  .
  .=
  /
  /=
  \
  (
  )
  {
  }
  [
  ]
  ;
  :
  ,
  ->
  =>

Note that the operators for tokenization are different from the operators that are explained in operators. The operators for tokenization are only for tokenization.

Comment

A comment begins with # and ends with a line terminator.

  # Comment

Comments have no meaning in source codes.

Line directives take precedence over comments.

A File directive take precedence over comments.

Line Directive

A line directive begins from the beggining of the line.

A line directive begins with #line and positive 32bit integer

  #line 39

And ends with a line terminator.

The line number in a line directive is set to the current line of the source code.

Line directives take precedence over comments.

Compilation Errors:

A line directive must begin from the beggining of the line. Otherwise an compilation error occurs.

A line directive must end with "\n". Otherwise an compilation error occurs.

A line directive must have a line number. Otherwise an compilation error occurs.

The line number given to a line directive must be a positive 32bit integer. Otherwise an compilation error occurs.

File Directive

A file directive begins from the beggining of the source code.

A file directive begins with #file " and is followed by a file path, and is closed with "

  #file "/Foo/Bar.spvm"

And ends with a line terminator.

The file path is set to the current file path of the source code.

A file directive take precedence over comments.

Compilation Errors:

A file directive must begin from the beggining of the source code. Otherwise an compilation error occurs.

A file directive must end with "\n". Otherwise an compilation error occurs.

A file directive must have a file path. Otherwise an compilation error occurs.

A file directive must end with ". Otherwise an compilation error occurs.

POD

POD(Plain Old Document) is a syntax to write documents in source codes.

The biginning of POD begins with =, and is followed by any string that is composed of ASCII printable characters, and end with a line terminator.

The previous line of the biginning of POD must need a line terminator

The lator line of the biginning of POD must need a line terminator

  =pod
  
  =head1
  
  =item * foo
  

The end of POD begins with =, and is followed by cut, and ends with a line terminator.

The previous line of the end of POD must need a line terminator

The lator line of the end of POD must need a line terminator

  =cut
  

Examples:

  =pod
  
  Multi-Line
  Comment
  
  =cut
  
  =head1
  
  Multi-Line
  Comment
  
  =cut
  

POD has no meaning in source codes.

Literal

A literal is the way to write a constant value in source codes.

Literals are numeric literals, the floating point literal, the character literal, the string literal and the bool literal.

Numeric Literal

A numeric literal is the way to write a constant value that type is a numeric type in source codes.

Numeric literals are the integer literal and the floating point literal.

Integer Literal

A interger literal is a "Numeric Literal" in numeric literal to write a constant value that type is an integer type in source codes.

Integer Literal Decimal Notation

The interger literal decimal notation is the way to write an integer literal using decimal numbers 0-9.

A minus - can be at the beginning, and is followed by one or more of 0-9.

_ can be used as a separator at the any positions after the first 0-9. _ has no meaning.

The suffix L or l can be at the end.

If the suffix L or l exists, the return type is the long type. Otherwise the return type is the int type.

Compilation Errors:

If the return type is the int type and the value is greater than the max value of int type or less than the minimal value of int type, a compilation error occurs.

If the return type is the long type and the value is greater than the max value of long type or less than the minimal value of long type, a compilation error occurs.

Examples:

  123
  -123
  123L
  123l
  123_456_789
  -123_456_789L

Integer Literal Hexadecimal Notation

The interger literal hexadecimal notation is the way to write an integer literal using hexadecimal numbers 0-9a-zA-Z.

A minus - can be at the beginning, and is followed by 0x or 0X, and is followed by one or more 0-9a-zA-Z.

_ can be used as a separator at the any positions after 0x or 0X. _ has no meaning.

The suffix L or l can be at the end.

If the suffix L or l exists, the return type is the long type. Otherwise the return type is the int type.

If the return type is the int type, the value that is except for - is interpreted as unsigned 32 bit integer uint32_t type in the C language, and the following conversion is performed.

  uint32_t value_uint32_t;
  int32_t value_int32_t = (int32_t)value_uint32_t;

And if - exists, the following conversion is performed.

  value_int32_t = -value_int32_t;

For example, 0xFFFFFFFF is the same as -1, -0xFFFFFFFF is the same as 1.

If the return type is the long type, the value that is except for - is interpreted as unsigned 64 bit integer uint64_t type in the C language, and the following conversion is performed.

  uint64_t value_uint64_t;
  value_int64_t = (int64_t)value_uint64_t;

And if - exists, the following conversion is performed.

  value_int64_t = -value_int64_t;

For example, 0xFFFFFFFFFFFFFFFFL is the same as -1L, -0xFFFFFFFFFFFFFFFFL is the same as 1L.

Compilation Errors:

If the return type is the int type and the value that is except for - is greater than hexadecimal FFFFFFFF, a compilation error occurs.

If the return type is the long type and the value that is except for - is greater than hexadecimal FFFFFFFFFFFFFFFF, a compilation error occurs.

Examples:

  0x3b4f
  0X3b4f
  -0x3F1A
  0xDeL
  0xFFFFFFFF
  0xFF_FF_FF_FF
  0xFFFFFFFFFFFFFFFFL

Integer Literal Octal Notation

The interger literal octal notation is the way to write an integer literal using octal numbers 0-7.

A minus - can be at the beginning, and is followed by 0, and is followed by one or more 0-7.

_ can be used as a separator at the any positions after 0. _ has no meaning.

The suffix L or l can be at the end.

If the suffix L or l exists, the return type is the long type. Otherwise the return type is the int type.

If the return type is the int type, the value that is except for - is interpreted as unsigned 32 bit integer uint32_t type in the C language, and the following conversion is performed.

  uint32_t value_uint32_t;
  int32_t value_int32_t = (int32_t)value_uint32_t;

And if - exists, the following conversion is performed.

  value_int32_t = -value_int32_t;

For example, 037777777777 is the same as -1, -037777777777 is the same as 1.

If the return type is the long type, the value that is except for - is interpreted as unsigned 64 bit integer uint64_t type in the C language, and the following conversion is performed.

  uint64_t value_uint64_t;
  value_int64_t = (int64_t)value_uint64_t;

And if - exists, the following conversion is performed.

  value_int64_t = -value_int64_t;

For example, 01777777777777777777777L is the same as -1L, -01777777777777777777777L is the same as 1L.

Compilation Errors:

If the return type is the int type and the value that is except for - is greater than octal 37777777777, a compilation error occurs.

If the return type is the long type and the value that is except for - is greater than octal 1777777777777777777777, a compilation error occurs.

Examples:

  0755
  -0644
  0666L
  0655_755

Integer Literal Binary Notation

The interger literal binary notation is the way to write an integer literal using binary numbers 0 and 1.

A minus - can be at the beginning, and is followed by 0b or 0B, and is followed by one or more 0 and 1.

_ can be used as a separator at the any positions after 0b or 0B. _ has no meaning.

The suffix L or l can be at the end.

If the suffix L or l exists, the return type is the long type. Otherwise the return type is the int type.

If the return type is the int type, the value that is except for - is interpreted as unsigned 32 bit integer uint32_t type in the C language, and the following conversion is performed.

  uint32_t value_uint32_t;
  int32_t value_int32_t = (int32_t)value_uint32_t;

And if - exists, the following conversion is performed.

  value_int32_t = -value_int32_t;

For example, 0b11111111111111111111111111111111 is the same as -1, -0b11111111111111111111111111111111 is the same as 1.

If the return type is the long type, the value that is except for - is interpreted as unsigned 64 bit integer uint64_t type in the C language, and the following conversion is performed.

  uint64_t value_uint64_t;
  value_int64_t = (int64_t)value_uint64_t;

And if - exists, the following conversion is performed.

  value_int64_t = -value_int64_t;

For example, 0b1111111111111111111111111111111111111111111111111111111111111111L is the same as -1L, -0b1111111111111111111111111111111111111111111111111111111111111111L is the same as 1L.

Compilation Errors:

If the return type is the int type and the value that is except for - is greater than binary 11111111111111111111111111111111, a compilation error occurs.

If the return type is the long type and the value that is except for - is greater than binary 1111111111111111111111111111111111111111111111111111111111111111, a compilation error occurs.

Examples:

  0b0101
  -0b1010
  0b110000L
  0b10101010_10101010

Floating Point Literal

The floating point litral is a "Numeric Literal" in numeric literal to write a constant value that type is a floating point type in source codes.

Floating Point Literal Decimal Notation

The floating point litral decimal notation is the way to write a floating point literal using decimal numbers 0-9 in source codes.

A minus - can be at the beginning, and is followed by one or more 0-9

_ can be used as a separator at the any positions after the first 0-9.

And can be followed by a floating point part.

A floating point part is . and is followed by one or more 0-9.

And can be followed by an exponent part.

An exponent part is e or E and is followed by +, -, or "", and followed by one or more 0-9.

And can be followed by a suffix is f, F, d, or D.

one of a floating point part, an exponent part, or a suffix must exist.

If the suffix f or F exists, the return type is the float type. Otherwise the return type is the double type.

Compilation Errors:

If the return type is the float type, the floating point literal is parsed by the strtof function of the C language. If the parsing fails, a compilation error occurs.

If the return type is the double type, the floating point literal is parsed by the strtod function of the C language. If the parsing fails, a compilation error occurs.

Examples:

  1.32
  -1.32
  1.32f
  1.32F
  1.32d
  1.32D
  1.32e3
  1.32e-3
  1.32E+3
  1.32E-3
  12e7

Floating Point Literal Hexadecimal Notation

The floating point litral hexadecimal notation is the way to write a floating point literal using hexadecimal numbers 0-9a-zA-Z in source codes.

A minus - can be at the beginning, and is followed by 0x or 0X, and is followed by one or more 0-9a-zA-Z.

_ can be used as a separator at the any positions after 0x or 0X.

And can be followed by a floating point part.

A floating point part is . and is followed by one or more 0-9a-zA-Z.

And can be followed by an exponent part.

An exponent part is p or P and is followed by +, -, or "", and followed by one or more decimal numbers 0-9.

And can be followed by a suffix f, F, d, or D if an exponent part exist.

one of a floating point part or an exponent part must exist.

If the suffix f or F exists, the return type is the float type. Otherwise the return type is the double type.

Compilation Errors:

If the return type is the float type, the floating point literal is parsed by the strtof function of the C language. If the parsing fails, a compilation error occurs.

If the return type is the double type, the floating point literal is parsed by the strtod function of the C language. If the parsing fails, a compilation error occurs.

Examples:

  0x3d3d.edp0
  0x3d3d.edp3
  0x3d3d.edP3
  0x3d3d.edP+3
  0x3d3d.edP-3f
  0x3d3d.edP-3F
  0x3d3d.edP-3d
  0x3d3d.edP-3D
  0x3d3dP+3

Character Literal

A character literal is a literal to write a constant value that type is the byte type in source codes.

A character literal represents an ASCII character.

A character literal begins with '.

And is followed by a printable ASCII character 0x20-0x7e or an character literal escape character.

And ends with '.

The return type is the byte type.

Compilation Errors:

If the format of the character literal is invalid, a compilation error occurs.

Character Literal Escape Characters

The list of character literal escape characters.

Character literal escape characters ASCII characters
\0 0x00 NUL
\a 0x07 BEL
\t 0x09 HT
\n 0x0A LF
\f 0x0C FF
\r 0x0D CR
\" 0x22 "
\' 0x27 '
\\ 0x5C \
Octal Escape Character An ASCII character
Hexadecimal Escape Character An ASCII character

Examples:

  # Charater literals
  'a'
  'x'
  '\a'
  '\t'
  '\n'
  '\f'
  '\r'
  '\"'
  '\''
  '\\'
  '\0'
  ' '
  '\xab'
  '\xAB'
  '\x0D'
  '\x0A'
  '\xD'
  '\xA'
  '\xFF'
  '\x{A}'

String Literal

A string literal is a literal to write a constant value that type is the string type in source codes.

The return type is the string type.

A character literal begins with ".

And is followed by zero or more than zero UTF-8 character, or string literal escape characters, or variable expansions.

And ends with ".

Compilation Errors:

If the format of the string literal is invalid, a compilation error occurs.

Examples:

  # String literals
  "abc";
  "あいう"
  "hello\tworld\n"
  "hello\x0D\x0A"
  "hello\xA"
  "hello\x{0A}"
  "AAA $foo BBB"
  "AAA $FOO BBB"
  "AAA $$foo BBB"
  "AAA $foo->{x} BBB"
  "AAA $foo->[3] BBB"
  "AAA $foo->{x}[3] BBB"
  "AAA $@ BBB"
  "\N{U+3042}\N{U+3044}\N{U+3046}"

String Literal Escape Characters

String literal escape characters Descriptions
\0 ASCII 0x00 NUL
\a ASCII 0x07 BEL
\t ASCII 0x09 HT
\n ASCII 0x0A LF
\f ASCII 0x0C FF
\r ASCII 0x0D CR
\" ASCII 0x22 "
\$ ASCII 0x24 $
\' ASCII 0x27 '
\\ ASCII 0x5C \
Octal Escape Character An ASCII character
Hexadecimal Escape Character An ASCII character
Unicode escape character An UTF-8 character
Raw escape character The value of raw escape character

Unicode Escape Character

The Unicode escape character is the way to write an UTF-8 character using an Unicode code point that is written by hexadecimal numbers 0-9a-fA-F.

The Unicode escape character can be used as an escape character of the string literal.

The Unicode escape character begins with N{U+.

And is followed by one or more 0-9a-fA-F.

And ends with }.

Compilation Errors:

If the Unicode code point is not a Unicode scalar value, a compilation error occurs.

Examples:

  # あいう
  "\N{U+3042}\N{U+3044}\N{U+3046}"
  
  # くぎが
  "\N{U+304F}\N{U+304E}\N{U+304c}"

Raw Escape Character

The raw escape character is the escapa character that <\> has no effect and \ is interpreted as ASCII \.

For example, \s is ASCII chracters \s, \d is ASCII chracters <\d>.

The raw escape character can be used as an escape character of the string literal.

The raw escape character is designed to be used by regular expression classes such as Regex.

The list of raw escape characters.

  # Raw excape literals
  \! \# \% \& \( \) \* \+ \, \- \. \/
  \: \; \< \= \> \? \@
  \A \B \D \G \H \K \N \P \R \S \V \W \X \Z
  \[ \] \^ \_ \`
  \b \d \g \h \k \p \s \v \w \z
  \{ \| \} \~

Octal Escape Character

The octal escape character is the way to write an ASCII code using octal numbers 0-7.

The octal escape character can be used as an escape character of the string literal and the character literal.

The octal escape character begins with \o{, and it must be followed by one to three 0-7, and ends with }.

Or the octal escape character begins with \0, \1, \2, \3, \4, \5, \6, \7, and it must be followed by one or two 0-7.

  # Octal escape ch1racters in ch1racter literals
  '\0'
  '\012'
  '\003'
  '\001'
  '\03'
  '\01'
  '\077'
  '\377'

  # Octal escape ch1racters in ch1racter literals
  '\o{0}'
  '\o{12}'
  '\o{03}'
  '\o{01}'
  '\o{3}'
  '\o{1}'
  '\o{77}'
  '\o{377}'

  # Octal escape ch1racters in string literals
  "Foo \0 Bar"
  "Foo \012 Bar"
  "Foo \003 Bar"
  "Foo \001 Bar"
  "Foo \03  Bar"
  "Foo \01  Bar"
  "Foo \077 Bar"
  "Foo \377 Bar"

  # Octal escape ch1racters in string literals
  "Foo \o{12} Bar"
  "Foo \o{12} Bar"
  "Foo \o{03} Bar"
  "Foo \o{01} Bar"
  "Foo \o{3}  Bar"
  "Foo \o{1}  Bar"
  "Foo \o{77} Bar"
  "Foo \o{377} Bar"

Hexadecimal Escape Character

The hexadecimal escape character is the way to write an ASCII code using hexadecimal numbers 0-9a-fA-F.

The hexadecimal escape character can be used as an escape character of the string literal and the character literal.

The hexadecimal escape character begins with \x.

And is followed by one or two 0-9a-fA-F.

The hexadecimal numbers can be sorrounded by { and }.

  # Hexadecimal escape characters in character literals
  '\xab'
  '\xAB'
  '\x0D'
  '\x0A'
  '\xD'
  '\xA'
  '\xFF'
  '\x{A}'

  # Hexadecimal escape characters in string literals
  "Foo \xab  Bar"
  "Foo \xAB  Bar"
  "Foo \x0D  Bar"
  "Foo \x0A  Bar"
  "Foo \xD   Bar"
  "Foo \xA   Bar"
  "Foo \xFF  Bar"
  "Foo \x{A} Bar"

Single-Quoted String Literal

A single-quoted string literal represents a constant string value in source codes.

The return type is the string type.

A character literal begins with q'.

And is followed by zero or more than zero UTF-8 character, or escape characters.

And ends with '.

Compilation Errors:

A single-quoted string literal must be end with '. Otherwise a compilation error occurs.

If the escape character in a single-quoted string literal is invalid, a compilation error occurs.

Examples:

  # Single-quoted string literals
  q'abc';
  q'abc\'\\';

Single-Quoted String Literal Escape Characters

Single-quoted string literal escape characters Descriptions
\\ ASCII 0x5C \
\' ASCII 0x27 '

Bool Literal

The bool literal is a literal to represent a bool value in source codes.

true

true is the alias for the TRUE method of Bool.

  true

Examples:

  # true
  my $is_valid = true;

false

false is the alias for FALSE method of Bool.

  false

Examples:

  # false
  my $is_valid = false;

Variable Expansion

The variable expasion is the feature to embed getting local variable, getting class variables, dereference, "Getting Field" in getting field, getting array element, "Getting Exception Variable" in getting exception variable into the string literal.

  "AAA $foo BBB"
  "AAA $FOO BBB"
  "AAA $$foo BBB"
  "AAA $foo->{x} BBB"
  "AAA $foo->[3] BBB"
  "AAA $foo->{x}[3] BBB"
  "AAA $foo->{x}->[3] BBB"
  "AAA $@ BBB"
  "AAA ${foo}BBB"

The above codes are convarted to the following codes.

  "AAA " . $foo . " BBB"
  "AAA " . $FOO . " BBB"
  "AAA " . $$foo . " BBB"
  "AAA " . $foo->{x} . " BBB"
  "AAA " . $foo->[3] . " BBB"
  "AAA " . $foo->{x}[3] . " BBB"
  "AAA " . $foo->{x}->[3] . " BBB"
  "AAA " . $@ . "BBB"
  "AAA " . ${foo} . "BBB"

The getting field doesn't contain space characters between { and }.

The index of getting array element must be a constant value. The getting array doesn't contain space characters between [ and ].

The end $ is not interpreted as a variable expansion.

  "AAA$"

Fat Comma

The fat comma => is a separator.

  =>

The fat comma is an alias for Comma ,.

  # Comma
  ["a", "b", "c", "d"]
  
  # Fat Comma
  ["a" => "b", "c" => "d"]

If the characters of LEFT_OPERAND of the fat camma is not wrapped by " and the characters are a symbol name that does'nt contain ::, the characters are treated as a string literal.

  # foo_bar2 is treated as "foo_bar2"
  [foo_bar2 => "Mark"]

  ["foo_bar2" => "Mark"]

Here Document

Here document is syntax to write a string literal in multiple lines without escapes and variable expansions.

  <<'HERE_DOCUMENT_NAME';
  line1
  line2
  line...
  HERE_DOCUMENT_NAME

Here document syntax begins with <<'HERE_DOCUMENT_NAME'; + a line terminator. HERE_DOCUMENT_NAME is a here document name.

A string begins from the next line.

Here document syntax ends with the line that begins HERE_DOCUMENT_NAME + a line terminator.

Compilation Errors:

<<'HERE_DOCUMENT_NAME' cannot contains spaces. If so, a compilation error occurs.

Examples:

  # Here document
  my $string = <<'EOS';
  Hello
  World
  EOS
  
  # No escapes and variable expaneions are performed.
  my $string = <<'EOS';
  $foo
  \t
  \
  EOS

Here Document Name

Here document name is composed of a-z, A-Z, _, 0-9.

Compilaition Errors:

The length of a here document name must be greater than or equal to 0. Otherwise a compilation error occurs.

A here document name cannot start with a number. If so, a compilation error occurs.

A here document name cannot contain __. If so, a compilation error occurs.

See Also

Copyright & License

Copyright (c) 2023 Yuki Kimoto

MIT License

1 POD Error

The following errors were encountered while parsing the POD:

Around line 964:

Non-ASCII character seen before =encoding in '"あいう"'. Assuming UTF-8