TITLE

Synopsis 2: Bits and Pieces

AUTHOR

Larry Wall <larry@wall.org>

VERSION

  Maintainer: Larry Wall <larry@wall.org>
  Date: 10 Aug 2004
  Last Modified: 27 Oct 2005
  Number: 2
  Version: 9

This document summarizes Apocalypse 2, which covers small-scale lexical items and typological issues. (These Synopses also contain updates to reflect the evolving design of Perl 6 over time, unlike the Apocalypses, which are frozen in time as "historical documents". These updates are not marked--if a Synopsis disagrees with its Apocalypse, assume the Synopsis is correct.)

Atoms

  • In the abstract, Perl is written in Unicode, and has consistent Unicode semantics regardless of the underlying text representations.

  • Perl can count Unicode line and paragraph separators as line markers, but that behavior had better be configurable so that Perl's idea of line numbers matches what your editor thinks about Unicode lines.

  • Unicode horizontal whitespace is counted as whitespace, but it's better not to use thin spaces where they will make adjoining tokens look like a single token. On the other hand, Perl doesn't use indentation as syntax, so you are free to use any whitespace anywhere that whitespace makes sense.

Molecules

  • In general, whitespace is optional in Perl 6 except where it is needed to separate constructs that would be misconstrued as a single token or other syntactic unit. (In other words, Perl 6 follows the standard "longest-token" principle, or in the cases of large constructs, a "prefer shifting to reducing" principle.)

    This is an unchanging deep rule, but the surface ramifications of it change as various operators and macros are added to or removed from the language, which we expect to happen because Perl 6 is designed to be a mutable language. In particular, there is a natural conflict between postfix operators and infix operators, either of which may occur after a term. If a given token may be interpreted as either a postfix operator or an infix operator, the infix operator requires space before it, and the postfix operator requires a lack of space before it, unless it begins with a dot. (Infix operators may not start with a dot.) For instance, if you were to add your own infix:<++> operator, then it must have space before it, and the normal autoincrementing postfix:<++> operator may not have space before it, or must be written as .++ instead. In standard Perl 6, however, it doesn't matter if you put a space in front of postfix:<++>. To be future proof, though, you should omit the space or use dot.

  • Single-line comments work as in Perl 5, starting with a # character and ending with the subsequent newline. They count as whitespace for purposes of separation. Certain quoting tokens may make use of # characters as delimiters without starting a comment.

  • Multiline comments will be provided by extending the syntax of POD to nest =begin COMMENT/=end COMMENT correctly without the need for =cut. (Doesn't have to be "COMMENT"--any unrecognized POD stream will do to make it a comment. Bare =begin and =end probably aren't good enough though, unless you want all your comments to end up in the manpage...)

    We have single paragraph comments with =for COMMENT as well. That lets =for keep its meaning as the equivalent of a =begin and =end combined. As with =begin and =end, a comment started in code reverts to code afterwards.

  • Intra-line comments will not be supported in standard Perl (but it would be trivial to declare them as a macro).

Built-In Data Types

  • In support of OO encapsulation, there is a new fundamental datatype: "opaque". External access to opaque objects is always through method calls, even for attributes.

  • Perl 6 will have an optional type system that helps you write safer code that performs better. The compiler is free to infer what type information it can from the types you supply, but will not complain about missing type information unless you ask it to.

  • Perl 6 will support the notion of "properties" on various kinds of objects. Properties are like object attributes, except that they're managed by the individual object rather than by the object's class. According to A12, properties are actually implemented by a kind of mixin mechanism, and such mixins are accomplished by the generation of an individual anonymous class for the object (unless an identical anonymous class already exists and can safely be shared).

  • Properties applied to compile-time objects such as variables and classes are also called "traits". Traits are not expected to change at run time. Changing run-time properties should be done via mixin instead, so that the compiler can optimize based on declared traits.

  • Perl 6 is an OO engine, but you're not generally required to think in OO when that's inconvenient. However, some built-in concepts such as filehandles will be more object-oriented in a user-visible way than in Perl 5.

  • A variable's type is an interface contract indicating what sorts of values the variable may contain. More precisely, it's a promise that the object or objects contained in the variable are capable of responding to the methods of the indicated "role". See A12 for more about roles. A variable object may itself be bound to a container type that specifies how the container works without necessarily specifying what kinds of things it contains.

  • You'll be able to ask for the length of an array, but it won't be called that, because "length" does not specify units. So .elems is the number of array elements. (You can also ask for the length of an array in bytes or codepoints or graphemes. Same for strings. There is no .length on strings either.)

  • my Dog $spot by itself does not automatically call a Dog constructor. The actual constructor syntax turns out to be my Dog $spot.=new;, making use of the .= mutator method-call syntax.

  • If you say

        my int @array is MyArray;

    you are declaring that the elements of @array are integers, but that the array itself is implemented by the MyArray class. Untyped arrays and hashes are still perfectly acceptable, but have the same performance issues they have in Perl 5.

  • Built-in object types start with an uppercase letter: Int, Num, Complex, Str, Bit, Ref, Scalar, Array, Hash, Rule and Code. Non-object (value) types are lowercase: int, num, complex, str, bit, and ref. Value types are primarily intended for declaring compact array storage. However, Perl will try to make those look like their corresponding uppercase types if you treat them that way. (In other words, it does autoboxing. Note, however, that sometimes repeated autoboxing can slow your program more than the native type can speed it up.)

  • All Object types support the "undefined" role, and may contain an alternate set of attributes when undefined, such as the unthrown exception explaining why the value is undefined. Non-object types are not required to support undefinedness, but it is an error to assign an undefined value to such a location.

  • Regardless of whether they are defined, all objects support a .meta method that returns the class instance managing the current kind of object. Any object (whether defined, undefined, or somewhere between) can be used as a "kind" when stored in a type variable such as ^MyClass.

  • Perl 6 will intrinsically support big integers and rationals through its system of type declarations. Int automatically supports promotion to arbitrary precision. Rat supports arbitrary precision rational arithmetic. Value types like int and num imply the natural machine representation for integers and floating-point numbers, respectively, and do not promote to arbitrary precision. Untyped numeric scalars use Int and Num semantics rather than int and num.
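
    The difference between promoting and non-promoting numeric types can be modelled outside Perl 6. The following is a rough Python sketch, not Perl 6 code: Python's built-in int stands in for Int, fractions.Fraction stands in for Rat, and the hypothetical helper wrap_int64 emulates a native int. Both the 64-bit width and the wrap-around behaviour are assumptions made for illustration; the text above says only that value types use the natural machine representation and do not promote.

```python
from fractions import Fraction

def wrap_int64(n):
    """Emulate a two's-complement 64-bit machine integer (assumed width)."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n

big = 2**63 - 1                  # largest value a signed 64-bit integer holds

promoted = big + 1               # Int-like: promotes to arbitrary precision
wrapped = wrap_int64(big + 1)    # int-like: stays in 64 bits and wraps

third = Fraction(1, 3)           # Rat-like: exact rational arithmetic
total = third + third + third    # exactly 1, with no rounding error
```

    Wrapping is only one possible outcome for a native type; an implementation could equally raise an exception on overflow.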

  • Perl 6 should by default make standard IEEE floating point concepts visible, such as Inf (infinity) and NaN (not a number). It should also be at least pragmatically possible to throw exceptions on overflow.

  • A Str is a Unicode string object of some sort. A str is a stringish view of an array of integers, and has no Unicode or character properties without explicit conversion to some kind of Str. Typically it's an array of bytes serving as a buffer.

Names and Variables

  • The $pkg'var syntax is dead. Use $pkg::var instead.

  • Perl 6 includes a system of "sigils" to mark the fundamental structural type of a variable:

        $   scalar
        @   ordered array
        %   unordered hash (associative array)
        &   code
        ^   class/type/kind variable

    In addition, package literals may be introduced with :: when they would otherwise be misinterpreted:

        ::  package/module/class/role/type literal

    Within a declaration, the & sigil also declares the visibility of the subroutine name without the sigil within the scope of the declaration.

    Within a declaration, the ^ sigil also declares the visibility of the type name without the sigil within the scope of the declaration. The first such declaration within a scope is assumed to be an unbound type, and takes the actual type of its associated argument. With subsequent declarations the use of the sigil is optional, since the bare type name is also declared.

  • Unlike in Perl 5, you may no longer put whitespace between a sigil and its following name or construct.

  • Ordinary sigils indicate normally scoped variables, either lexical or package scoped. Oddly scoped variables include a secondary sigil (a "twigil") that indicates what kind of strange scoping the variable is subject to:

        $foo        ordinary scoping
        $.foo       object attribute accessor
        $^foo       self-declared formal parameter
        $*foo       global variable
        $+foo       environmental variable
        $?foo       compiler hint variable
        $=foo       pod variable
        $<foo>      match variable, short for $/{'foo'}
        $:foo       :foo($foo) shortcut (allowed only on ordinary variables)

    Most variables with twigils are implicitly declared or assumed to be declared in some other scope, and don't need a "my" or "our". Attribute variables are declared with has, though, and environmental variables are declared somewhere in the dynamic scope with env.

  • Sigils are now invariant. $ always means a scalar variable, @ an array variable, and % a hash variable, even when subscripting. Array and hash variable names in scalar context automatically produce references.

  • In string contexts container references automatically dereference to appropriate (white-space separated) string values. In numeric contexts, the number of elements in the container is returned. In boolean contexts, a true value is returned if and only if there are any elements in the container.
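
    As a loose illustration of these three contexts, here is a Python sketch (not Perl 6) in which a hypothetical Container class maps string, numeric, and boolean context onto Python's str(), int(), and bool() conversions. The class and its name are invented for this example.

```python
class Container:
    """Toy stand-in for an array reference evaluated in different contexts."""

    def __init__(self, *items):
        self.items = list(items)

    def __str__(self):      # string context: whitespace-separated values
        return " ".join(str(item) for item in self.items)

    def __int__(self):      # numeric context: number of elements
        return len(self.items)

    def __bool__(self):     # boolean context: true iff there are elements
        return len(self.items) > 0

full = Container("a", "b", "c")
empty = Container()
```

    With these definitions, str(full) gives the space-separated values, int(full) gives the element count, and bool(empty) is false.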

  • To get a Perlish representation of any data value, use the .perl method. This will put quotes around strings, square brackets around list values, curlies around hash values, constructors around objects, etc., such that standard Perl could reparse the result.

  • To get a formatted representation of any scalar data value, use the .as('%03d') method to do an implicit sprintf on the value. To format an array value separated by commas, supply a second argument: .as('%03d', ', '). To format a hash value or list of pairs, include formats for both key and value in the first string: .as('%s: %s', "\n").
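
    The three calling patterns can be approximated in Python, whose % operator follows the same printf conventions. This is only a model: as_format is a hypothetical helper, not part of any Perl 6 implementation, and Python dicts keep insertion order whereas a Perl hash is unordered.

```python
def as_format(value, fmt, sep=""):
    """Hypothetical model of .as(fmt, sep) using printf-style formatting."""
    if isinstance(value, dict):
        # Each key/value pair fills both slots of the format string.
        return sep.join(fmt % (key, val) for key, val in value.items())
    if isinstance(value, (list, tuple)):
        # Each element is formatted separately, then joined by the separator.
        return sep.join(fmt % (item,) for item in value)
    return fmt % (value,)

scalar_text = as_format(7, "%03d")
array_text = as_format([1, 22, 333], "%03d", ", ")
hash_text = as_format({"a": 1, "b": 2}, "%s: %s", "\n")
```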

  • Subscripts now consistently dereference the reference produced by whatever was to their left. Whitespace is not allowed between a variable name and its subscript. However, there is a corresponding "dot" form of each subscript (@foo.[1] and %bar.{'a'}) which allows optional whitespace before the dot (except when interpolating). Constant string subscripts may be placed in angles, so %bar.{'a'} may also be written as %bar<a> or %bar.<a>.

  • Slicing is specified by the nature of the subscript, not by the sigil.

  • The context in which a subscript is evaluated is no longer controlled by the sigil either. Subscripts are always evaluated in list context on the assumption that slicing behavior is desired. If you need to force inner context to scalar, we now have convenient single-character context specifiers such as + for numbers and ~ for strings.

  • There is a need to distinguish list assignment from list binding. List assignment works exactly as it does in Perl 5, copying the values. There's a new := binding operator that lets you bind names to array and hash references without copying, just as function arguments are bound to formal parameters. See A6.
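
    The copy-versus-alias distinction is the same one found in many languages. A minimal Python sketch (an analogy only, not Perl 6 syntax): copying a list plays the role of list assignment, and giving the same list a second name plays the role of := binding.

```python
source = [1, 2, 3]

copied = list(source)   # like list assignment: the values are copied
bound = source          # like := binding: a second name for the same array

source.append(4)        # a later change to the original is visible through
                        # the bound name but not through the copy
```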

  • An argument list object (List) may be created with backslashed parens:

        $args = \(1,2,3,:mice<blind>)

    A List's values are parsed as ordinary expressions. By default a List is lazy. If all of a List's arguments are fully evaluated (such as when all the arguments are constants), the List is promoted to being an immutable Tuple. (You can use a Tuple anywhere you can use a List, so in general it doesn't matter.)

  • A signature object may be created with coloned parens:

        my ^MySig = :(Int,Num,Complex, Status :mice)

    A signature's values are parsed as declarations rather than ordinary expressions. You may not put arbitrary expressions there, but you may, for instance, stack multiple types that must all match:

        :(Any Num Dog|Cat $numdog)

    Such a signature may be used within another signature to apply additional type constraints. When applied to a tuple argument, the signature allows you to specify the types of parameters that would otherwise be untyped:

        :(Any Num Dog|Cat $numdog, MySig *$a ($i,$j,$k,$mousestatus))

  • Unlike in Perl 5, the notation &foo merely creates a reference to function "foo" without calling it. Any function reference may be dereferenced and called using parens (which may, of course, contain arguments). Whitespace is not allowed before the parens, but there is a corresponding .() operator, which allows you to insert optional whitespace before the dot.

  • With multis, &foo may not be sufficient to uniquely name a specific function. In that case, the type may be refined by using a signature literal as a postfix operator:

        &foo:(Int,Num)

    It still just returns a function reference. A call may also be partially applied by using a tuple literal as a postfix operator:

        &foo\(1,2,3,:mice<blind>)

    This is really just a shorthand for

        &foo.assuming(1,2,3,:mice<blind>)
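
    Partial application of this kind exists in other languages too. As an analogy only, Python's functools.partial fixes some arguments of a function and returns a new callable; the function foo and its parameters below are hypothetical stand-ins for the example above.

```python
from functools import partial

def foo(a, b, c, d=0, mice=None):
    """Hypothetical function standing in for the document's foo."""
    return (a + b + c + d, mice)

# Analogue of &foo.assuming(1,2,3,:mice<blind>): fix some arguments now.
foo_assumed = partial(foo, 1, 2, 3, mice="blind")

result_default = foo_assumed()    # remaining arguments take their defaults
result_extra = foo_assumed(4)     # or can still be supplied at call time
```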

  • Slicing syntax is covered in S9. Multidimensional slices will be done with semicolons between individual slice subscripts. Each such slice is evaluated lazily.

  • Slicing hashes to return pairs rather than values should probably be done with an optional selection argument to .pairs() or .kv().

  • A hash reference in numeric context returns the number of pairs contained in the hash. A hash reference in a boolean context returns true if there are any pairs in the hash. In either case, any intrinsic iterator would be reset. (If hashes do carry an intrinsic iterator (as they do in Perl 5), there will be a .reset method on the hash object to reset the iterator explicitly.)

  • Sorting a list of pairs should sort on their keys by default. For more on sort see S29. (If there is no S29 yet, write one.)

  • Many of the special variables of Perl 5 are going away. Those that apply to some object such as a filehandle will instead be attributes of the appropriate object. Those that are truly global will have global alphabetic names, such as $*PID or @*ARGS. Certain of these global values may retain punctuational shortcuts, such as $! for $*ERROR.

  • Any remaining special variables will be lexically scoped. This includes $_ and @_, as well as the new $/, which is the return value of the last regex match. $0, $1, $2, etc., are aliases into the $/ object.

  • The $#foo notation is dead. Use @foo.end or [-1] instead. (Or @foo.shape[$dimension] for multidimensional arrays.)

  • A2 proposes $(...) and @(...) to interpolate arbitrary expressions, but these have been replaced with interpolation of curlies (closures).

Names

  • Ordinary package-qualified names look as they do in Perl 5:

        $Foo::Bar::baz      # the $baz variable in package Foo::Bar

    Sometimes it's clearer to keep the sigil with the variable name, so an alternate way to write this is:

        Foo::Bar::<$baz>

  • The following pseudo-package names are reserved in the first position:

        MY
        OUR
        GLOBAL
        OUTER
        CALLER
        ENV
        SUPER
        COMPILING

    Other all-caps names are semi-reserved. We may add more of them in the future, so you can protect yourself from future collisions by using mixed case on your top-level packages. (We promise not to break any existing top-level CPAN package, of course.)

  • You may interpolate a string into a package or variable name using ::($expr) where you'd ordinarily put a package or variable name. The string is allowed to contain additional instances of ::, which will be interpreted as package nesting. You may only interpolate entire names, since the construct starts with ::, and either ends immediately or is continued with another :: outside the parentheses. Most symbolic references are done with this notation:

        $foo = "Foo";
        $foobar = "Foo::Bar";
        $::($foo)           # package-scoped $Foo
        $::("MY::$foo")     # lexically-scoped $Foo
        $::("*::$foo")      # global $Foo
        $::($foobar)        # $Foo::Bar
        $::($foobar)::baz   # $Foo::Bar::baz
        $::($foo)::Bar::baz # $Foo::Bar::baz
        $::($foobar)baz     # ILLEGAL at compile time (no operator baz)

    Note that unlike in Perl 5, initial :: doesn't imply global. Package names are searched for from inner lexical scopes to outer, then from inner packages to outer. Variable names are searched for from inner lexical scopes to outer, but unlike package names are looked for in only the current package and the global package. The global namespace is the last place it looks in either case. You must use the * (or GLOBAL) package on the front of the string argument to force the search to start in the global namespace. Use the MY pseudopackage to limit the scopes to lexical, and OUR to limit the scopes to package.

  • To do direct lookup in a package's symbol table without scanning, treat the package name as a hash:

        Foo::Bar::{'&baz'}  # same as &Foo::Bar::baz
        GLOBAL::<$IN>       # same as $*IN
        Foo::<::Bar><::Baz> # same as Foo::Bar::Baz

    Unlike ::() symbolic references, this does not parse the argument for ::, nor does it initiate a namespace scan from that initial point.
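
    The contrast between the two lookup styles can be sketched with nested dictionaries. This Python model is an illustration under simplifying assumptions: packages are plain dicts, the names Foo, Bar, and $baz are made up, and the scope scanning that ::() performs before it starts descending is left out entirely.

```python
# Toy symbol tables: a package is a dict; nested packages are nested dicts.
root = {"Foo": {"Bar": {"$baz": 42}}}

def symbolic_lookup(table, name):
    """Like ::($expr): every '::' in the string descends one package level."""
    for part in name.split("::"):
        table = table[part]
    return table

def direct_lookup(table, key):
    """Like Pkg::{'key'}: the key is used as-is, with no '::' parsing."""
    return table[key]

via_symbolic = symbolic_lookup(root, "Foo::Bar")["$baz"]
via_direct = direct_lookup(direct_lookup(root, "Foo"), "Bar")["$baz"]

try:
    direct_lookup(root, "Foo::Bar")   # no entry is literally named "Foo::Bar"
    direct_failed = False
except KeyError:
    direct_failed = True
```

    The chained direct lookups mirror the one-subscript-per-level style of the last example above, where each level is named separately.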

  • The current lexical symbol table may now be referenced through the pseudo-package MY. The current package symbol table is visible as pseudo-package OUR. The OUTER name refers to the MY symbol table immediately surrounding the current MY, and OUTER::OUTER is the one surrounding that one.

        our $foo = 41;
        say $::foo;         # prints 41, :: is no-op
        {
            my $foo = 42;
            say MY<$foo>;           # prints "42"
            say $MY::foo;           # same thing
            say $::foo;             # same thing, :: is no-op here
    
            say OUR<$foo>;          # prints "41"
            say $OUR::foo;          # same thing
    
            say OUTER<$foo>;        # prints "41" (our $foo is also lexical)
            say $OUTER::foo;        # same thing
        }

    You may not use any lexically scoped symbol table, either by name or by reference, to add symbols to a lexical scope that is done compiling. (We reserve the right to relax this if it turns out to be useful though.)

  • The CALLER package refers to the lexical scope of the (dynamically scoped) caller. The caller's lexical scope is allowed to hide any variable except $_ from you. In fact, that's the default, and a lexical variable must be declared using "env" rather than my to be visible via CALLER. ($_ is always environmental. [Conjectural: so are $! and $/.]) If the variable is not visible in the caller, it returns failure.

    An explicit env declaration is implicitly readonly. You may add is rw to allow subroutines to modify your value. $_ is rw by default. In any event, your lexical scope can access the variable as if it were an ordinary my; the restriction on writing applies only to subroutines.

  • The ENV pseudo-package is just like CALLER except that it scans outward through all dynamic scopes until it finds an environmental variable of that name in that caller's lexical scope. (Use of $+FOO is equivalent to ENV::<$FOO> or $ENV::FOO.) If after scanning all the lexical scopes of each dynamic scope, there is no variable of that name, it looks in the * package. If there is no variable in the * package, it looks in %*ENV for the name, that is, in the environment variables passed to the program. If the value is not found there, it returns failure. Note that $+_ is always the same as CALLER::<$_> since all callers have a $_ that is automatically considered environmental. Note also that ENV and $+ always skip the current scope, since you can always name the variable directly without the ENV or + if it's been declared env in the current lexical scope.

    Subprocesses are passed only the global %*ENV values. They do not see any lexical variables or their values. The ENV package is only for internal overriding of environmental parameters. Change %*ENV to change what subprocesses see. [Conjecture: This might be suboptimal in the abstract, but it would be difficult to track the current set of environment variable names unless we actually passed around a list. The alternative seems to be to walk the entire dynamic scope and reconstruct %*ENV for each subprogram call, and then we only slow down subprogram calls.]
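
    The ENV search order described in this item can be written out as a short procedure. The Python sketch below is a model, not an implementation: env_lookup and its parameters are hypothetical, sigils are dropped from the names, each caller is reduced to a dict of the variables it declared with env, and None stands in for failure.

```python
def env_lookup(name, dynamic_scopes, global_package, process_env):
    """Model the ENV search order.

    dynamic_scopes lists one dict per caller, innermost caller first; the
    current scope is deliberately not included, since ENV always skips it.
    """
    for scope in dynamic_scopes:        # 1. scan outward through the callers
        if name in scope:
            return scope[name]
    if name in global_package:          # 2. then the * (global) package
        return global_package[name]
    if name in process_env:             # 3. then %*ENV
        return process_env[name]
    return None                         # 4. otherwise: failure

callers = [{"DEPTH": 2}, {"DEPTH": 1, "MODE": "test"}]
global_package = {"MODE": "prod", "LANG": "C"}
process_env = {"LANG": "en_US", "HOME": "/home/user"}

depth = env_lookup("DEPTH", callers, global_package, process_env)
mode = env_lookup("MODE", callers, global_package, process_env)
lang = env_lookup("LANG", callers, global_package, process_env)
home = env_lookup("HOME", callers, global_package, process_env)
missing = env_lookup("MISSING", callers, global_package, process_env)
```

    The nearest caller wins for DEPTH, an outer caller shadows the global MODE, the global LANG shadows the process environment, and HOME is found only in the process environment.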

  • There is no longer any special package hash such as %Foo::. Just subscript the package object itself as a hash object, the key of which is the variable name, including any sigil. The package object can be derived from a type name by use of the :: postfix operator:

        MyType .:: .{'$foo'}
        MyType::<$foo>              # same thing

    (Directly subscripting the type with either square brackets or curlies is reserved for various generic type-theoretic operations. In most other matters type names and package names are interchangeable.)

    Typeglobs are gone. Use binding (:= or ::=) to do aliasing. Individual variable objects are still accessible through the hash representing each symbol table, but you have to include the sigil in the variable name now: MyPackage::{'$foo'} (or also MyPackage::<$foo> these days).

  • Truly global variables live in the * package: $*UID, %*ENV. (The * may generally be omitted if there is no inner declaration hiding the global name.) $*foo is short for $*::foo, suggesting that the variable is "wild carded" into every package.

  • Standard input is $*IN, standard output is $*OUT, and standard error is $*ERR. The magic command-line input handle is $*ARGS.

  • Magical file-scoped values live in variables with a = secondary sigil. $=DATA is the name of your DATA filehandle, for instance. All pod structures are available through %=POD (or some such). As with *, the = may also be used as a package name: $=::DATA.

  • Magical lexically scoped values live in variables with a ? secondary sigil. These are all values that are known to the compiler, and may in fact be dynamically scoped within the compiler itself, and only appear to be lexically scoped because dynamic scopes of the compiler resolve to lexical scopes of the program. All $? variables are considered constants, and may not be modified after being compiled in, except insofar as the compiler arranges in advance for such variables to be rebound (as is the case with $?SELF).

    $?FILE and $?LINE are your current file and line number, for instance. ? is not a shortcut for a package name like * is. Instead of $?OUTER::SUB you probably want to write OUTER::<$?SUB>.

    Here are some possibilities:

        $?OS        Which os am I compiled for?
        $?OSVER     Which os version am I compiled for?
        $?PERLVER   Which Perl version am I compiled for?
        $?FILE      Which file am I in?
        $?LINE      Which line am I at?
        $?PACKAGE   Which package am I in?
        @?PACKAGE   Which packages am I in?
        $?MODULE    Which module am I in?
        @?MODULE    Which modules am I in?
        ::?CLASS    Which class am I in? (as package name)
        $?CLASS     Which class am I in? (as variable)
        @?CLASS     Which classes am I in?
        ::?ROLE     Which role am I in? (as package name)
        $?ROLE      Which role am I in? (as variable)
        @?ROLE      Which roles am I in?
        $?GRAMMAR   Which grammar am I in?
        @?GRAMMAR   Which grammars am I in?
        $?PARSER    Which Perl grammar was used to parse this statement?
        &?SUB       Which sub am I in?
        @?SUB       Which subs am I in?
        $?SUBNAME   Which sub name am I in?
        @?SUBNAME   Which sub names am I in?
        &?BLOCK     Which block am I in?
        @?BLOCK     Which blocks am I in?
        $?LABEL     Which block label am I in?
        @?LABEL     Which block labels am I in?

    Note that some of these things have parallels in the * space at run time:

        $*OS        Which OS I'm running under
        $*OSVER     Which OS version I'm running under
        $*PERLVER   Which Perl version I'm running under

    You should not assume that these will have the same value as their compile-time cousins.

  • While $? variables are constant to the run time, the compiler has to have a way of changing these values at compile time without getting confused about its own $? variables (which were frozen in when the compile-time code was itself compiled). The compiler can talk about these compiler-dynamic values using the COMPILING pseudopackage.

    References to COMPILING variables are automatically hoisted into the context currently being compiled. Setting or temporizing a COMPILING variable sets or temporizes the incipient $? variable in the surrounding lexical context that is being compiled. If nothing in the context is being compiled, an exception is thrown.

        $?FOO // say "undefined";   # probably says undefined
        BEGIN { COMPILING::<$?FOO> = 42 }
        say $?FOO;                  # prints 42
        {
            say $?FOO;              # prints 42
            BEGIN { temp COMPILING::<$?FOO> = 43 } # temporizes to *compiling* block
            say $?FOO;              # prints 43
            BEGIN { COMPILING::<$?FOO> = 44 }
            say $?FOO;              # prints 44
            BEGIN { say COMPILING::<$?FOO> }        # prints 44, but $?FOO probably undefined
        }
        say $?FOO;                  # prints 42 (left scope of temp above)
        $?FOO = 45;                 # always an error
        COMPILING::<$?FOO> = 45;    # an error unless we are compiling something

    Note that CALLER::<$?FOO> might discover the same variable as COMPILING::<$?FOO>, but only if the compiling context is the immediate caller. Likewise OUTER::<$?FOO> might or might not get you to the right place. In the abstract, COMPILING::<$?FOO> goes outwards dynamically until it finds a compiling scope, and so is guaranteed to find the "right" $?FOO. (In practice, the compiler hopefully keeps track of its current compiling scope anyway, so no scan is needed.)

    Perceptive readers will note that this subsumes various "compiler hints" proposals. Crazy readers will wonder whether this means you could set an initial value for other lexicals in the compiling scope. The answer is yes. In fact, this mechanism is probably used by the exporter to bind names into the importer's namespace.

  • The currently compiling Perl parser is switched by modifying COMPILING::<$?PARSER>. Lexically scoped parser changes should temporize the modification. Changes from here to end-of-compilation unit can just assign or bind it. In general, most parser changes involve deriving a new grammar and then pointing COMPILING::<$?PARSER> at that new grammar. Alternately, the tables driving the current parser can be modified without derivation, but at least one level of anonymous derivation must intervene from the standard Perl grammar, or you might be messing up someone else's grammar. Basically, the current grammar has to belong only to the current compiling scope. It may not be shared, at least not without explicit consent of all parties. No magical syntax at a distance. Consent of the governed, and all that.

Literals

  • Underscores are allowed between any two digits in a literal number, where the definition of digit depends on the radix. Underscores are not allowed anywhere else in any numeric literal, including next to the radix point or exponentiator.

  • Initial 0 no longer indicates octal numbers by itself. You must use an explicit radix marker for that. Pre-defined radix prefixes include:

        0b          base 2, digits 0..1
        0o          base 8, digits 0..7
        0d          base 10, digits 0..9
        0x          base 16, digits 0..9,a..f (case insensitive)

  • The general radix form of a number involves prefixing with the radix followed by a colon:

        10:42               same as 0d42 or 42
        16:dead_beef        same as 0xdeadbeef
        8:177777            same as 0o177777 (65535)
        2:1.1               same as 0b1.1 (0d1.5)

    Extra digits are assumed to be represented by 'a'..'z', so you can go up to base 36. (Use 'a' and 'b' for base twelve, not 't' and 'e'.)

    Any radix may include a fractional part. A dot will always be interpreted as a radix point if possible by the longest-token rule, so to call a method on a literal with a base greater than 10, the safe thing is to put a space before the dot:

        16:dead_beef.face   # fraction
        16:dead_beef .face  # method call

  • Only base 10 (in any form) allows an additional exponentiator starting with 'e' or 'E'. All other radixes must rely on the constant folding properties of ordinary multiplication and exponentiation.

        16:dead_beef * 16**8

    It's true that only radixes that define 'e' as a digit are ambiguous that way, but with any radix it's not clear whether the exponentiator should be 10 or the radix, and this makes it explicit:

        0b1.1e10            illegal, could mean any of:
    
        2:1.1 * 2**10       1536
        2:1.1 * 10**10      15,000,000,000
        2:1.1 * 2:10**2:10  6

    The generic string-to-number converter will recognize all of these forms (including the * form, since constant folding is not available to the run time). Also allowed in strings are leading plus or minus, and maybe a trailing Units type for an implied scaling. Note also that leading 0 by itself never implies octal in Perl 6. Also, the hextonum converter function will interpret leading 0b or 0d as hex digits, not radix switchers.
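
    The radix rules in the preceding items (underscores only between two digits, digits 'a'..'z' up to base 36, an optional fractional part) can be sketched as a small parser. This is a Python model under stated assumptions, not the converter the text refers to: parse_radix is a hypothetical name, it accepts only the radix:digits form, it treats letters case-insensitively in every base, and it returns an exact Fraction rather than a Perl number.

```python
from fractions import Fraction
import re

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def parse_radix(literal):
    """Parse 'radix:digits[.digits]' exactly, returning a Fraction."""
    radix_text, _, body = literal.partition(":")
    radix = int(radix_text)
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36")
    digit = "[" + DIGITS[:radix] + "]"
    # Underscores are allowed only between two digits.
    run = f"{digit}+(?:_{digit}+)*"
    match = re.fullmatch(f"({run})(?:\\.({run}))?", body.lower())
    if match is None:
        raise ValueError(f"malformed literal: {literal!r}")
    whole, frac = match.group(1), match.group(2) or ""
    value = Fraction(0)
    for ch in whole.replace("_", ""):
        value = value * radix + DIGITS.index(ch)
    scale = Fraction(1, radix)
    for ch in frac.replace("_", ""):
        value += DIGITS.index(ch) * scale
        scale /= radix
    return value

# The three readings of the illegal 0b1.1e10, written out explicitly.
base = parse_radix("2:1.1")
reading_a = base * 2**10
reading_b = base * 10**10
reading_c = base * parse_radix("2:10") ** parse_radix("2:10")
```

    The three readings evaluate to 1536, 15,000,000,000, and 6, matching the figures in the table above.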

  • The qw/foo bar/ quote operator now has a bracketed form: <foo bar>. When used as a subscript it performs a slice equivalent to {'foo','bar'}. Much like the relationship between single quotes and double quotes, single angles do not interpolate while double angles do. The double angles may be written either with French quotes, «$foo @bar[]», or with "Texas" quotes, <<$foo @bar[]>>, as the ASCII workaround. The implicit split is done after interpolation, but respects quotes in a shell-like fashion, so that «'$foo' "@bar[]"» is guaranteed to produce a list of two "words" equivalent to ('$foo', "@bar[]"). Pair notation is also recognized inside «...» and such "words" are returned as Pair objects.
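
    The effect of a quote-respecting split is easy to see next to a plain whitespace split. The Python sketch below is an analogy only: str.split stands in for splitting with no quote protection, and shlex.split, which follows POSIX shell rules, stands in for the shell-like splitting described above. Perl 6's exact quoting rules may differ from the shell's, and interpolation is not modelled.

```python
import shlex

text = "alpha 'two words' \"three more words\" omega"

plain_words = text.split()          # no quote protection: split on whitespace
quoted_words = shlex.split(text)    # quote protection: quotes group words
```

    The plain split yields seven fragments with the quote characters still attached, while the quote-respecting split yields four words.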

  • Generalized quotes may now take adverbs:

        Short       Long            Meaning
        =====       ====            =======
        :x          :exec           Execute as command and return results
        :w          :words          Split result on words (no quote protection)
        :ww         :quotewords     Split result on words (with quote protection)
        :t          :to             Interpret result as heredoc terminator
        :0          :raw            No escapes at all (unless otherwise adverbed)
        :1          :single         Interpolate \\, \q and \' (or whatever)
        :2          :double         Interpolate all the following
        :s          :scalar         Interpolate $ vars
        :a          :array          Interpolate @ vars
        :h          :hash           Interpolate % vars
        :f          :function       Interpolate & calls
        :c          :closure        Interpolate {...} expressions
        :b          :backslash      Interpolate \n, \t, etc. (implies :m)

    Any of these may omit the colon after an initial "q", so we automatically get the forms:

        Form        Same as
        ====        =======
        qx//        q:x//
        qw//        q:w//
        qww//       q:ww//
        qt//        q:t//
        q0//        q:0//
        q1//        q:1//   (same as q//)
        q2//        q:2//   (same as qq//)
        qs//        q:s//
        qa//        q:a//
        qh//        q:h//
        qf//        q:f//
        qc//        q:c//
        qb//        q:b//

    If this is all too much of a hardship, you can define your own quote adverbs and operators. All the uppercase adverbs are reserved for user-defined quotes. All of Unicode above Latin-1 is reserved for user-defined quotes.
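
    The difference between :w and :ww can be illustrated by analogy in Python. This is a rough sketch, not the Perl 6 algorithm: plain str.split stands in for :w, and the standard shlex module stands in for the shell-like quote protection of :ww.

```python
import shlex

text = "'two words' \"@bar[] baz\" plain"

# Like :w, quote characters are ordinary characters and every run of
# whitespace separates two words.
plain_words = text.split()

# Like :ww, quotes group their contents into one word and are then removed.
quoted_words = shlex.split(text)

print(plain_words)
print(quoted_words)
```

    Note that shlex performs no interpolation at all, so this shows only the splitting step.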

  • A consequence of the previous item is that we can now say:

        %hash = qw:c/a b c d {@array} {%hash}/;

    or

        %hash = qq:w/a b c d {@array} {%hash}/;

    to interpolate items into a qw. Conveniently, arrays and hashes interpolate with only whitespace separators by default, so the subsequent split on whitespace still works out. (But the built-in «...» quoter automatically does interpolation equivalent to qq:ww/.../. The built-in <...> is equivalent to q:w/.../.)
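
    The interpolate-then-split order described above can be modelled in Python. This is a sketch under stated assumptions: interpolate is a hypothetical helper, arrays are joined with spaces, and hashes use the tab and newline defaults described later in this document.

```python
def interpolate(value):
    """Render a value the way it would appear inside the quoted text."""
    if isinstance(value, dict):
        # assumption: key<TAB>value<NEWLINE> for every pair
        return "".join(f"{key}\t{val}\n" for key, val in value.items())
    if isinstance(value, (list, tuple)):
        return " ".join(str(item) for item in value)
    return str(value)

array = ["e", "f"]
extra = {"g": "h"}

# Step 1: interpolate. Step 2: split on whitespace. Step 3: pair up the words.
text = f"a b c d {interpolate(array)} {interpolate(extra)}"
words = text.split()
table = dict(zip(words[0::2], words[1::2]))

print(words)
print(table)
```

    Because every separator involved is whitespace, the split recovers the individual elements regardless of which container they came from.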

  • Whitespace is allowed between the "q" and its adverb: q :w /.../.

  • For these "q" forms the choice of delimiters has no influence on the semantics. That is, '', "", <>, «», ``, (), [], and {} have no special significance when used in place of // as delimiters. There may be whitespace or a colon before the opening delimiter. (This is mandatory for parens, because q() is a subroutine call and q:w(0) is an adverb with arguments.) Other brackets may also require a colon or space when they would be understood as an argument to an adverb in something like q:z<foo>//. A colon may never be used as the delimiter since it will always be taken to mean something else regardless of what's in front of it.

  • New quoting constructs may be declared as macros:

        macro quote:<qX> (*%adverbs) {...}

    Note: macro adverbs are automatically evaluated at macro call time if the adverbs are included in the parse. If the adverbs are to affect the parsing of the quoted text of the macro, then the text must be parsed by the body of the macro rather than by an is parsed rule.

  • You may interpolate double-quotish text into a single-quoted string using the \qq[...] construct. Other "q" forms also work, including user-defined ones, as long as they start with "q". Otherwise you'll just have to embed your construct inside a \qq[...].

  • Bare scalar variables always interpolate in double-quotish strings. Bare array, hash, and subroutine variables may never be interpolated. However, any sigiled variable may start an interpolation if it is followed by a sequence of one or more bracketed dereferencers: that is, any of 1) an array subscript, 2) a hash subscript, 3) a set of parentheses indicating a function call, 4) any of 1 through 3 in their "dot" form, 5) a dot-form method call that includes argument parentheses, or 6) a sequence of one or more unparenthesized method calls if followed by any of 1 through 5. In other words, this is legal:

        "Val = $a.ord.as('%x')\n"

    and is equivalent to

        "Val = { $a.ord.as('%x') }\n"

  • In order to interpolate an entire array, it's necessary now to subscript with empty brackets:

        print "The answers are @foo[]\n"

    Note that this fixes the spurious "@" problem in double-quoted email addresses.

    As with Perl 5 array interpolation, the elements are separated by a space. (Except that a space is not added if the element already ends in some kind of whitespace. In particular, a list of pairs will interpolate with a tab between the key and value, and a newline after the pair.)

  • In order to interpolate an entire hash, it's necessary to subscript with empty braces or angles:

        print "The associations are:\n%bar{}"
        print "The associations are:\n%bar<>"

    Note that this avoids the spurious "%" problem in double-quoted printf formats.

    By default, keys and values are separated by tab characters, and pairs are terminated by newlines. (This is almost never what you want, but if you want something polished, you can be more specific.)
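
    The default separators for interpolated arrays and hashes can be sketched in Python. The helper names join_elements and pair_text are hypothetical, and the sketch assumes no separator is added after the final element.

```python
def pair_text(key, value):
    """Default rendering of one pair: tab between key and value, newline after."""
    return f"{key}\t{value}\n"

def join_elements(elements):
    """Join elements with a space, unless an element already ends in whitespace."""
    pieces = []
    for index, element in enumerate(elements):
        text = str(element)
        pieces.append(text)
        is_last = index == len(elements) - 1
        if not is_last and not text[-1:].isspace():
            pieces.append(" ")
    return "".join(pieces)

print(repr(join_elements([1, 2, 3])))
print(repr(join_elements([pair_text("a", 1), pair_text("b", 2)])))
```

    A list of pairs therefore needs no extra spaces, since each pair already ends in a newline.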

  • In order to interpolate the result of a sub call, it's necessary to include both the sigil and parentheses:

        print "The results are &baz().\n"

    The function is called in scalar context. (If it returns a list, that list is interpolated as if it were an array.)

  • In order to interpolate the result of a method call without arguments, it's necessary to include parentheses or extend the call with something ending in brackets:

        print "The attribute is $obj.attr().\n"
        print "The attribute is $obj.attr<Jan>.\n"

    The method is called in scalar context. (If it returns a list, that list is interpolated as if it were an array.)

    It is allowed to have a cascade of argumentless methods as long as the last one ends with parens:

        print "The attribute is %obj.keys.sort.reverse().\n"

    (The cascade is basically counted as a single method call for the end-bracket rule.)

  • A class method can be called as a method if you use the type sigil:

        print "The dog bark is ^Dog.bark().\n"

    Again, the parens are required.

  • Multiple dereferencers may be stacked as long as each one ends in some kind of bracket:

        print "The attribute is @baz[3](1,2,3){$xyz}<blurfl>.attr().\n"

    Note that the final period above is not taken as part of the expression since it doesn't introduce a bracketed dereferencer. Spaces are not allowed between the dereferencers even when you use the dotted forms.
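
    The end-bracket rule can be approximated with a short Python scanner. This is a simplified sketch, not the real Perl 6 parser: it treats every bracket type alike, tracks nesting only for the bracket type it is currently inside, and ignores twigils. The names bracket_end and interpolation_end are hypothetical.

```python
import re

CLOSERS = {"[": "]", "{": "}", "(": ")", "<": ">"}
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")

def bracket_end(text, pos):
    """Return the index just past a balanced bracket group at pos, else None."""
    if pos >= len(text) or text[pos] not in CLOSERS:
        return None
    opener, closer = text[pos], CLOSERS[text[pos]]
    depth = 0
    for index in range(pos, len(text)):
        if text[index] == opener:
            depth += 1
        elif text[index] == closer:
            depth -= 1
            if depth == 0:
                return index + 1
    return None

def interpolation_end(text, start):
    """Return where the interpolation starting at the sigil text[start] stops.

    A result equal to start means that nothing interpolates.
    """
    name = IDENTIFIER.match(text, start + 1)
    if name is None:
        return start
    pos = name.end()
    end = pos if text[start] == "$" else start   # only bare scalars interpolate
    while True:
        probe = pos
        method = None
        if text.startswith(".", probe):
            method = IDENTIFIER.match(text, probe + 1)
            probe = method.end() if method else probe + 1
        after = bracket_end(text, probe)
        if after is not None:
            pos = end = after    # a closing bracket commits everything so far
        elif method is not None:
            pos = probe          # bare method name: tentative until a bracket follows
        else:
            return end

sample = "The attribute is @baz[3](1,2,3){$xyz}<blurfl>.attr().\n"
begin = sample.index("@")
print(sample[begin:interpolation_end(sample, begin)])
```

    The trailing period is left out because nothing bracketed follows it, and an email address such as user@example.com is left alone for the same reason.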

  • A bare closure also interpolates in double-quotish context. It may not be followed by any dereferencers, since you can always put them inside the closure. The expression inside is evaluated in scalar (string) context. You can force list context on the expression using either the * or list operator if necessary.

    The following means the same as the previous example.

        print "The attribute is { @baz[3](1,2,3){$xyz}<blurfl>.attr }.\n"

    The final parens are unnecessary since we're providing "real" code in the curlies. If you need to have double quotes that don't interpolate curlies, you can explicitly remove the capability:

        qq:c(0) "Here are { $two uninterpolated } curlies";

    Alternately, you can build up capabilities from single quote to tell it exactly what you do want to interpolate:

        q:s 'Here are { $two uninterpolated } curlies';

  • Secondary sigils (twigils) have no influence over whether the primary sigil interpolates. That is, if $a interpolates, so do $^a, $*a, $=a, $?a, $.a, etc. It only depends on the $.

  • No other expressions interpolate. Use curlies.

  • The old disambiguation syntax:

        ${foo[$bar]}
        ${foo}[$bar]

    is dead. Use closure curlies instead:

        {$foo[$bar]}
        {$foo}[$bar]

    (You may be detecting a trend here...)

  • To interpolate an unparenthesized class method, use curlies: "{Dog.bark}".

  • To interpolate a topical method, use curlies: "{.bark}".

  • To interpolate a function call without a sigil, use curlies: "{abs $var}".

  • And so on.

  • Backslash sequences still interpolate, but there's no longer any \v to mean "vertical tab", whatever that is...

  • There's also no longer any \L, \U, \l, \u, or \Q. Use curlies with the appropriate function instead: "{ucfirst $word}".

  • There are no barewords in Perl 6. An undeclared bare identifier will always be taken to mean a subroutine or method name. (Class names (and other type names) are predeclared, or prefixed with the :: package literal marker, or the ^ type sigil.) A consequence of this is that there's no longer any "use strict 'subs'".

  • There's also no "use strict 'refs'" because symbolic dereferences are now syntactically distinguished from hard dereferences. @{$arrayref} must now be a hard reference, while @::($string) is explicitly a symbolic reference. (Yes, this may give fits to the P5-to-P6 translator, but I think it's worth it to separate the concepts. Perhaps the symbolic ref form will admit hard refs in a pinch.)

  • There is no hash subscript autoquoting in Perl 6. Use %x<foo> for constant hash subscripts, or the old standby %x{'foo'}. (It also works to say %x«foo» as long as you realize it's subject to interpolation.)

    But => still autoquotes any bare identifier to its immediate left (horizontal whitespace allowed but not comments). The identifier is not subject to keyword or even macro interpretation. If you say

        $x = do {
            call_something();
            if => 1;
        }

    then $x ends up containing the pair ("if" => 1). Always. (Unlike in Perl 5, where version numbers didn't autoquote.)

    You can also use the :key($value) form to quote the keys of option pairs. To align values of option pairs, you may not use the dot postfix forms:

        :longkey  .($value)
        :shortkey .<string>
        :fookey   .{ $^a <=> $^b }

    These will be interpreted as

        :longkey(1)  .($value)
        :shortkey(1) .<string>
        :fookey(1)   .{ $^a <=> $^b }

    You just have to put spaces inside the parenthesis form to align things.

  • The double-underscore forms are going away:

        Old                 New
        ---                 ---
        __LINE__            $?LINE
        __FILE__            $?FILE
        __PACKAGE__         $?PACKAGE
        __END__             =begin END
        __DATA__            =begin DATA

    The =begin END pod stream is special in that it assumes there's no corresponding =end END before end of file. The DATA stream is no longer special--any POD stream in the current file can be accessed via a filehandle, named as %=POD{'DATA'} and such. Alternately, you can treat a pod stream as a scalar via $=DATA or as an array via @=DATA. Presumably a module could read all its COMMENT blocks from @=COMMENT, for instance. Each chunk of pod comes as a separate array element. You have to split it into lines yourself. Each chunk has a .linenum property that indicates its starting line within the source file.

    There is also a new $?SUBNAME variable containing the name of the current lexical sub. The lexical sub itself is &?SUB. The current block is &?BLOCK. If the block has a label, that shows up in $?BLOCKLABEL.

  • Heredocs are no longer written with <<, but with an adverb on any other quote construct:

        print qq:to/END/
            Give $amount to the man behind curtain number $curtain.
            END

    Other adverbs are also allowed:

        print q:c:to/END/
            Give $100 to the man behind curtain number {$curtain}.
            END

  • Heredocs allow optional whitespace both before and after the terminating delimiter. Leading whitespace equivalent to the indentation of the delimiter will be removed from all preceding lines. If a line is deemed to have less whitespace than the terminator, only whitespace is removed, and a warning may be issued. (Hard tabs will be assumed to be 8 spaces, but as long as tabs and spaces are used consistently that doesn't matter.) A null terminating delimiter terminates on the next line consisting only of whitespace, but such a terminator will be assumed to have no indentation. (That is, it's assumed to match at the beginning of any whitespace.)
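
    The indentation rule can be sketched in Python. The function strip_heredoc_indent is a hypothetical helper, and one detail is an assumption of the sketch rather than something stated here: any indentation left over after removal is re-emitted as spaces.

```python
def strip_heredoc_indent(body_lines, terminator_line, tab_width=8):
    """Remove the terminator's indentation from every body line."""
    def split_indent(line):
        rest = line.lstrip(" \t")
        return line[: len(line) - len(rest)], rest

    terminator_indent, _ = split_indent(terminator_line)
    target = len(terminator_indent.expandtabs(tab_width))

    result = []
    for line in body_lines:
        indent, rest = split_indent(line)
        width = len(indent.expandtabs(tab_width))   # hard tabs count as 8 columns
        if width >= target:
            result.append(" " * (width - target) + rest)
        else:
            # Under-indented line: only its whitespace is removed.
            # A real implementation might also issue a warning here.
            result.append(rest)
    return result

body = ["    Give $amount", "      to the man", "  behind", "\tthe curtain"]
print(strip_heredoc_indent(body, "    END"))
```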

Context

  • Perl still has the three main contexts: void, scalar, and list.

  • In addition to undifferentiated scalars, we also have these scalar contexts:

        Context     Type    OOtype  Operator
        -------     ----    ------  --------
        boolean     bit     Bit     ?
        integer     int     Int     int
        numeric     num     Num     +
        string      str     Str     ~

    There are also various reference contexts that require particular kinds of container references.

  • Unlike in Perl 5, references are no longer always considered true. It depends on the state of their .bit property. Classes get to decide which of their values are true and which are false. Individual objects can override the class definition:

        return 0 but true;
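
    A loose Python analogy, offered only as an illustration: the __bool__ method plays roughly the role of the .bit property, a class decides which of its values are false, and a subclass can override that decision for a particular value. Celsius and ButTrue are hypothetical names.

```python
class Celsius:
    """The class decides which of its values are false (here: exactly zero)."""
    def __init__(self, degrees):
        self.degrees = degrees

    def __bool__(self):
        return self.degrees != 0

class ButTrue(int):
    """A per-value override: numerically unchanged, but always true."""
    def __bool__(self):
        return True

zero_but_true = ButTrue(0)

print(bool(Celsius(0)), bool(Celsius(21)))
print(zero_but_true == 0, bool(zero_but_true))
```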

Lists

  • List context in Perl 6 is by default lazy. This means a list can contain infinite generators without blowing up. No flattening happens to a lazy list until it is bound to the signature of a function or method at call time (and maybe not even then). We say that such an argument list is "lazily flattened", meaning that we promise to flatten the list on demand, but not before.

  • There is a "list" operator which imposes a list context on its arguments even if list itself occurs in a scalar context. In list context, it flattens lazily. In a scalar context, it returns a reference to the resulting list. (So the list operator really does exactly the same thing as putting a list in parentheses. But it's more readable in some situations.)

  • The * unary operator may be used to force list context on its argument and also defeat any scalar argument checking imposed by subroutine signature declarations. This list flattens lazily. When applied to a scalar value containing an iterator, * causes the iterator's return values to be interpolated into the list lazily. Note that * is destructive when applied to a scalar iterator, but non-destructive when applied to an array, even if that array represents an iterator.

    There is an argumentless form of * which may be used within a multi-dimensional array or hash subscript to indicate all of the current set of subscripts available for this dimension. It actually returns a type value of Any, so it can be used in any selector where you would use Any.

  • To force non-lazy list flattening, use the ** unary operator. Don't use it on an infinite generator unless you have a machine with infinite memory, and are willing to wait a long time. It may also be applied to a scalar iterator to force immediate iteration to completion.

    Argumentless ** in a multi-dimensional subscript indicates 0 or more dimensions of * where the number of dimensions isn't necessarily known: @foo[1;**;5]. It has a value of List of Any, or something like that. The argumentless * and ** forms are probably only useful in "dimensional" list contexts.
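
    Python iterators give a rough feel for the lazy and eager behaviors described above. This is an analogy rather than a model of Perl 6 lists: itertools.chain stands in for lazy flattening, and list() stands in for the eager ** operator.

```python
from itertools import chain, count, islice

# A list that contains an infinite generator does not blow up,
# as long as elements are only produced on demand.
lazy = chain([10, 20], count(0))
first_five = list(islice(lazy, 5))
print(first_five)

# Forcing an iterator to completion consumes it (destructive) ...
iterator = iter(range(3))
eager = list(iterator)
leftover = list(iterator)
print(eager, leftover)

# ... while flattening an array leaves the array intact (non-destructive).
array = [1, 2, 3]
once, twice = list(array), list(array)
print(once, twice)
```

    Calling list() on count(0) directly would never finish, which is the Python counterpart of applying ** to an infinite generator.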

  • Signatures on non-multi subs can be checked at compile time, whereas multi sub and method call signatures can only be checked at run time (in the absence of special instructions to the optimizer). This is not a problem for arguments that are arrays or hashes, since they don't have to care about their context, but just return a reference in any event, which may or may not be lazily flattened. However, function calls in the argument list can't know their eventual context because the method hasn't been dispatched yet, so we don't know which signature to check against. As in Perl 5, list context is assumed unless you explicitly qualify the argument with a scalar context operator.

  • The => operator now constructs Pair objects rather than merely functioning as a comma. Both sides are in scalar context.

  • The .. operator now constructs Range objects rather than merely functioning as an operator. Both sides are in scalar context.

  • There is no such thing as a hash list context. Assignment to a hash produces an ordinary list context. You may assign alternating keys and values just as in Perl 5. You may also assign lists of Pair objects, in which case each pair provides a key and a value. You may, in fact, mix the two forms, as long as the pairs come when a key is expected. If you wish to supply a Pair as a key, you must compose an outer Pair in which the key is the inner Pair:

        %hash = (($keykey => $keyval) => $value);
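
    The mixed assignment rule can be sketched in Python. Pair and assign_hash are hypothetical stand-ins, and the treatment of a missing final value (it becomes None) is an assumption of the sketch.

```python
from collections import namedtuple

Pair = namedtuple("Pair", ["key", "value"])

def assign_hash(items):
    """Build a table from alternating keys and values, pairs, or a mix of both."""
    table = {}
    stream = iter(items)
    for item in stream:
        if isinstance(item, Pair):
            # A pair seen where a key is expected supplies both key and value.
            table[item.key] = item.value
        else:
            # Otherwise the item is a key and the next item is its value.
            table[item] = next(stream, None)
    return table

print(assign_hash(["a", 1, Pair("b", 2), "c", 3]))

# To use a pair as the key itself, wrap it in an outer pair.
inner = Pair("keykey", "keyval")
print(assign_hash([Pair(inner, "value")]))
```

    A pair that arrives in value position is simply stored as a value, which is why pairs are only unpacked when a key is expected.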

  • The anonymous enum function takes a list of keys or pairs, and adds values to any keys that are not already part of a pair. The value added is one more than the previous key or pair's value. This works nicely with the new qq:ww form:

        %hash = enum <<:Mon(1) Tue Wed Thu Fri Sat Sun>>;
        %hash = enum « :Mon(1) Tue Wed Thu Fri Sat Sun »;

    are the same as:

        %hash = ();
        %hash<Mon Tue Wed Thu Fri Sat Sun> = 1..7;
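
    The numbering rule can be written out as a small Python function. This is a sketch: tuples stand in for pairs, and starting at 0 when the first key has no explicit value is an assumption, since the text above does not say.

```python
def enum(items):
    """Assign each bare key one more than the previous key's value."""
    table = {}
    previous = -1   # assumption: a leading bare key would get 0
    for item in items:
        if isinstance(item, tuple):
            key, value = item              # explicit pair
        else:
            key, value = item, previous + 1
        table[key] = value
        previous = value
    return table

days = enum([("Mon", 1), "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])
print(days)
```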

  • In contrast to assignment, binding to a hash requires a Hash (or Pair) reference. Binding to a "splat" hash requires a list of pairs or hashes, and stops processing the argument list when it runs out of pairs or hashes. See S6 for much more about parameter binding.

Files

  • Filename globs are no longer done with angle brackets. Use the glob function.

  • Input from a filehandle is no longer done with angle brackets. Instead of

        while (<HANDLE>) {...}

    you now write

        for =$handle {...}

    As a unary prefix operator, you may also apply adverbs to =:

        for =$handle :prompt('$ ') { say $_ + 1 }

    or

        for =($handle):prompt('$ ') { say $_ + 1 }

    or you may even write it in its functional form, passing the adverbs as ordinary named arguments.

        for prefix:<=>($handle, :prompt('$ ')) { say $_ + 1 }
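
    For comparison, the functional form can be imitated with a Python generator that takes the prompt as a named argument. This is only an analogy: the helper name lines is hypothetical, and both the removal of the trailing newline and the exact moment the prompt is written are assumptions.

```python
import io
import sys

def lines(handle, prompt=None, out=sys.stdout):
    """Yield lines from handle, writing the prompt before each read."""
    while True:
        if prompt is not None:
            out.write(prompt)
        line = handle.readline()
        if not line:
            return               # end of input
        yield line.rstrip("\n")

handle = io.StringIO("1\n2\n3\n")
for line in lines(handle, prompt="$ "):
    print(int(line) + 1)
```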

Properties

  • Properties work as detailed in A12. They're actually object attributes provided by role mixins. Compile-time properties applied to containers and such still use the is keyword, but are now called "traits". On the other hand, run-time properties are attached to individual objects using the but keyword instead, but are still called "properties".

  • Properties are accessed just like attributes because they are in fact attributes of some class or other, even if it's an anonymous singleton class generated on the fly for that purpose. Since "rw" attributes behave in all respects as variables, properties may therefore also be temporized with temp, or hypotheticalized with let.
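
    The idea of an anonymous singleton class generated on the fly can be imitated in Python. This is a sketch by analogy only: the function name but is borrowed from the keyword above, the class naming is arbitrary, and the sketch works for values whose type can be constructed from an existing value, such as int or str.

```python
def but(value, **properties):
    """Return a copy of value whose anonymous subclass carries extra attributes."""
    base = type(value)
    anonymous = type(f"{base.__name__}_but", (base,), {})   # class made on the fly
    mixed = anonymous(value)
    for name, setting in properties.items():
        setattr(mixed, name, setting)     # the property is an ordinary attribute
    return mixed

answer = but(42, unit="km")
print(answer + 1, answer.unit, type(answer).__name__)

answer.unit = "miles"                     # properties behave like "rw" variables
print(answer.unit)
```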
