NAME

docs/pdds/pdd04_datatypes.pod - Parrot's internal data types

ABSTRACT

This PDD describes Parrot's internal data types.

{{ NOTE: this is a good overview, but we need more complete specifications of the behavior of the datatypes. }}

DESCRIPTION

This PDD details the basic datatypes that the Parrot core knows how to deal with. Three of these (the integer, floating point and string datatypes) have no additional semantics. The fourth datatype, the Parrot Magic Cookie (PMC), acts as the basis for all high-level languages running on top of Parrot; only the most basic aspects are described here.

Note that PMC and string internals are volatile and may be changed in the future (although this will become increasingly unlikely as we near v1.0). Access from external code to the internals of particular datatypes should be via the extension mechanism (see docs/pdds/pdd11_extending.pod), which has more explicit guarantees of stability.

IMPLEMENTATION

Integer data types

Integer data types are generically referred to as INTs. These are whatever size native integer was chosen at Parrot configuration time. The C-level typedefs INTVAL and UINTVAL get you a platform-native signed and unsigned integer respectively.

Floating point data types

Floating point data types are generically referred to as NUMs. These are whatever size float was chosen when Parrot was configured. The C-level typedef FLOATVAL will get you one of these.
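As a rough illustration of what these typedefs might look like after configuration, the sketch below assumes that long and double were selected. The actual types depend on the options chosen when Parrot is configured, and print_native_sizes is a hypothetical helper, not part of Parrot.

```c
#include <stdio.h>

/* Assumption: configuration selected long and double. The real choice is
   made when Parrot is configured and may differ on your platform. */
typedef long          INTVAL;
typedef unsigned long UINTVAL;
typedef double        FLOATVAL;

/* Hypothetical helper: report the sizes these typedefs ended up with. */
void print_native_sizes(void)
{
    printf("INTVAL:   %zu bytes\n", sizeof(INTVAL));
    printf("UINTVAL:  %zu bytes\n", sizeof(UINTVAL));
    printf("FLOATVAL: %zu bytes\n", sizeof(FLOATVAL));
}
```

Code that sticks to INTVAL, UINTVAL and FLOATVAL rather than naming long or double directly should keep working if a different size is chosen at configuration time.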

String data types

Parrot has a single internal string form:

    struct parrot_string_t {
        pobj_t obj;
        UINTVAL bufused;
        void *strstart;
        UINTVAL strlen;
        const ENCODING *encoding;
        const CHARTYPE *type;
        INTVAL language;
    };

The fields are:

obj

A Parrot object (pobj_t), Parrot's most general internal data type, embedded at the start of the structure. In this case, it holds the buffer for the string data, the size of the buffer in bytes, and any applicable flags.

bufused

The amount of the buffer currently in use, in bytes.

strstart

A pointer to the beginning of the actual string (which may not be positioned at the start of the buffer).

strlen

The length of the string, in characters.

encoding

How the data is encoded (e.g. fixed 8-bit characters, UTF-8, or UTF-32). Note that this specifies encoding only -- it's valid to encode EBCDIC characters with the UTF-8 algorithm. Silly, but valid.

The ENCODING structure specifies the encoding (by index number and by name, for ease of lookup), the maximum number of bytes that a single character will occupy in that encoding, as well as functions for manipulating strings with that encoding.

type

What sort of string data is in the buffer, for example ASCII, EBCDIC, or Unicode.

The CHARTYPE structure specifies the character type (by index number and by name) and provides functions for transcoding to and from that character type.

language

This specifies the language corresponding to the string, which allows locale-based data to be attached to strings. For example, strings in German may not sort in the same way as strings in French, even when both use the Latin-1 charset and are encoded in UTF-8.

Note that language-agnostic utilities are at liberty to ignore this entry.
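The difference between bufused (bytes) and strlen (characters) can be sketched as follows. The demo_string structure is a simplified, hypothetical stand-in for parrot_string_t, the helper names are invented for this illustration, and the data is assumed to be valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for parrot_string_t. The fixed-size
   buffer replaces the buffer that obj holds in the real structure. */
typedef struct demo_string {
    char        buffer[64];
    size_t      bufused;       /* bytes of the buffer in use */
    const char *strstart;      /* first byte of the string data */
    size_t      strlen_chars;  /* length in characters (strlen in the PDD) */
} demo_string;

/* Count UTF-8 characters: every byte that is not a continuation byte
   (bit pattern 10xxxxxx) starts a new character. Assumes valid UTF-8. */
static size_t utf8_char_count(const char *bytes, size_t nbytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (((unsigned char)bytes[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Store UTF-8 data in the demo string. Returns 0 on success, or -1 if
   the data does not fit in the buffer. */
int demo_string_set(demo_string *s, const char *utf8)
{
    size_t nbytes = strlen(utf8);
    if (nbytes > sizeof s->buffer)
        return -1;
    memcpy(s->buffer, utf8, nbytes);
    s->bufused      = nbytes;
    s->strstart     = s->buffer;
    s->strlen_chars = utf8_char_count(s->strstart, nbytes);
    return 0;
}
```

For the UTF-8 encoding of "naïve", which occupies six bytes, this sketch records a bufused of 6 and a character length of 5.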

Parrot Magic Cookies (PMCs)

Parrot Magic Cookies, or PMCs, are the last of Parrot's basic datatypes, but are also potentially the most important. All PMCs have the form:

    struct PMC {
        pobj_t obj;
        VTABLE *vtable;
 #if ! PMC_DATA_IN_EXT
        DPOINTER *data;
 #endif
        struct PMC_EXT *pmc_ext;
    };

where obj is a pobj_t structure:

    typedef struct pobj_t {
        UnionVal u;
        Parrot_UInt flags;
 #if ! DISABLE_GC_DEBUG
        UINTVAL _pobj_version;
 #endif
    } pobj_t;

and where:

    typedef union UnionVal {
        struct {
            void * _bufstart;
            size_t _buflen;
        } _b;
        struct {
            DPOINTER* _struct_val;
            PMC* _pmc_val;
        } _ptrs;
        INTVAL _int_val;
        FLOATVAL _num_val;
        struct parrot_string_t * _string_val;
    } UnionVal;

u holds data associated with the PMC. This can be in the form of an integer value, a floating point value, a string value, or a pointer to other data. u may be empty, since the PMC structure also provides a more general data pointer, but it is useful for PMCs which hold only a single piece of data (e.g. PerlInts).
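A minimal sketch of a PMC that keeps its only piece of data in u might look like this. The IntDemo names are hypothetical, the union is reduced to three members, and the typedefs assume long and double were chosen at configuration time.

```c
#include <stddef.h>

typedef long   INTVAL;    /* assumption: chosen at configuration time */
typedef double FLOATVAL;  /* assumption: chosen at configuration time */

/* Reduced, hypothetical version of UnionVal with three of its members. */
typedef union IntDemoUnionVal {
    INTVAL   _int_val;
    FLOATVAL _num_val;
    void    *_struct_val;
} IntDemoUnionVal;

/* Reduced, hypothetical PMC: only u and the general data pointer. */
typedef struct IntDemoPMC {
    IntDemoUnionVal u;
    void           *data;
} IntDemoPMC;

/* An integer-like PMC keeps its whole value in u and leaves data unused. */
void int_demo_set_integer(IntDemoPMC *pmc, INTVAL value)
{
    pmc->u._int_val = value;
    pmc->data       = NULL;
}

/* Read the value back from the same union member it was stored in. */
INTVAL int_demo_get_integer(const IntDemoPMC *pmc)
{
    return pmc->u._int_val;
}
```

Because the members of a union share storage, a PMC type has to agree with itself about which member is currently meaningful; this sketch only ever uses _int_val.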

flags holds a set of flags associated with the PMC; these are documented in include/parrot/pobj.h, and are generally only used within the Parrot internals.

_pobj_version is only used for debugging Parrot's garbage collector. It is documented elsewhere (well, it will be once we get around to doing that...).

vtable holds a pointer to the vtable associated with the PMC. This points to a set of functions, with interfaces described in docs/pdds/pdd02_vtables.pod, that implement the basic behaviour of the PMC (i.e. how it behaves under addition, subtraction, cloning, etc.).
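The idea of dispatching behaviour through the vtable pointer can be sketched with a two-entry table. Everything named VtDemo or vt_demo here is hypothetical, and the function signatures are simplified for illustration; the real interfaces are the ones specified in docs/pdds/pdd02_vtables.pod.

```c
typedef long INTVAL;  /* assumption: chosen at configuration time */

struct VtDemoPMC;

/* Hypothetical two-entry vtable with simplified signatures. */
typedef struct VtDemoVTABLE {
    INTVAL (*get_integer)(const struct VtDemoPMC *self);
    void   (*add_int)(const struct VtDemoPMC *self, INTVAL value,
                      struct VtDemoPMC *dest);
} VtDemoVTABLE;

typedef struct VtDemoPMC {
    const VtDemoVTABLE *vtable;
    INTVAL              int_val;
} VtDemoPMC;

/* Behaviour of a hypothetical integer PMC type. */
static INTVAL int_get_integer(const VtDemoPMC *self)
{
    return self->int_val;
}

static void int_add_int(const VtDemoPMC *self, INTVAL value, VtDemoPMC *dest)
{
    dest->vtable  = self->vtable;
    dest->int_val = self->int_val + value;
}

static const VtDemoVTABLE int_vtable = { int_get_integer, int_add_int };

/* Attach the integer behaviour to a PMC and give it a starting value. */
void vt_demo_init_int(VtDemoPMC *pmc, INTVAL value)
{
    pmc->vtable  = &int_vtable;
    pmc->int_val = value;
}

/* Generic code never touches int_val; it only calls through the vtable. */
INTVAL vt_demo_add_and_fetch(const VtDemoPMC *pmc, INTVAL value)
{
    VtDemoPMC result;
    pmc->vtable->add_int(pmc, value, &result);
    return result.vtable->get_integer(&result);
}
```

A different PMC type would supply its own table of functions, and vt_demo_add_and_fetch would work on it unchanged.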

data (if present) holds a pointer to any additional data associated with the PMC. This may be NULL.

pmc_ext points to an extended PMC structure. This has the form:

    struct PMC_EXT {
 #if PMC_DATA_IN_EXT
        DPOINTER *data;
 #endif
        PMC *_metadata;
        struct _Sync *_synchronize;
        PMC *_next_for_GC;
    };

data is a generic data pointer, as described above.

_metadata holds internal PMC metadata. The specification for this has not yet been finalized.

_synchronize is for access synchronization between shared PMCs.

_next_for_GC determines the next PMC in the 'used' list during dead object detection in the GC.

PMCs are not required to have a PMC_EXT structure (i.e. pmc_ext can be null).
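Since the data pointer may live either in the PMC or in its extension, and the extension itself is optional, code that wants the pointer has to allow for both. The accessor below is a hypothetical sketch on reduced structures; it assumes PMC_DATA_IN_EXT is set to 1, and the ExtDemo names are not part of Parrot.

```c
#include <stddef.h>

/* Assumption for this sketch: the data pointer lives in the extension
   structure. Set this to 0 to keep it in the PMC itself. */
#define PMC_DATA_IN_EXT 1

/* Reduced, hypothetical version of PMC_EXT. */
typedef struct ExtDemoPMC_EXT {
#if PMC_DATA_IN_EXT
    void *data;
#endif
    void *_metadata;
} ExtDemoPMC_EXT;

/* Reduced, hypothetical version of PMC. */
typedef struct ExtDemoPMC {
#if !PMC_DATA_IN_EXT
    void *data;
#endif
    ExtDemoPMC_EXT *pmc_ext;  /* may be NULL */
} ExtDemoPMC;

/* Hypothetical accessor that hides where data is stored. With the data
   pointer in the extension, a PMC without an extension has no data. */
void *ext_demo_pmc_data(const ExtDemoPMC *pmc)
{
#if PMC_DATA_IN_EXT
    return pmc->pmc_ext ? pmc->pmc_ext->data : NULL;
#else
    return pmc->data;
#endif
}
```

Funnelling access through one function keeps the null check on pmc_ext in a single place, whichever layout is compiled in.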

PMCs are used to implement the basic data types of the high level languages running on top of Parrot. For instance, a Perl 5 SV will map onto one (or more) types of PMC, while particular Python datatypes will map onto different types of PMC.

Vtable Overloading

PMCs may declare vtable methods. The following list details the raw method names:

init

Called when an object is first created.

init_pmc

Alternative entry point called when an object is first created. Accepts a PMC parameter used to initialize the given object. Interpretation of the PMC is PMC-specific.

NOTE: It is strongly suggested that init_pmc(PMCNULL) be equivalent to init(), though there will of necessity be exceptions.

morph
mark

Called when dead object detection (DOD) is tracing live PMCs. If this method is called, the code must mark all strings and PMCs that it contains as live; otherwise they may be collected.

This method is only called if the PMC is flagged as having a special mark routine, and is not necessary for normal objects.

destroy

Called when the PMC is destroyed. This method is only called if the PMC is marked as having an active finalizer.

clone

Clone a PMC.

getprop
setprop
delprop
getprops
type
type_keyed
type_keyed_int
type_keyed_str
subtype
name
find_method
get_integer

Return the integer value of the object.

get_integer_keyed
get_integer_keyed_int
get_integer_keyed_str
get_number

Return the floating-point value of the object.

get_number_keyed
get_number_keyed_int
get_number_keyed_str
get_bignum

Return the extended precision numeric value of the PMC.

get_string

Return the string value of the PMC.

get_string_keyed
get_string_keyed_int
get_string_keyed_str
get_bool

Return the true/false value of the PMC.

get_bool_keyed
get_bool_keyed_int
get_bool_keyed_str
get_pmc

Return the PMC for this PMC.

get_pmc_keyed
get_pmc_keyed_int
get_pmc_keyed_str
get_pointer
get_pointer_keyed
get_pointer_keyed_int
get_pointer_keyed_str
set_integer_native

Set the integer value of this PMC.

set_integer_same
set_integer_keyed
set_integer_keyed_int
set_integer_keyed_str
set_number_native

Set the floating-point value of this PMC.

set_number_same
set_number_keyed
set_number_keyed_int
set_number_keyed_str
set_bignum_int

Set the extended-precision value of this PMC.

set_string_native

Set the string value of this PMC.

set_string_same
set_string_keyed
set_string_keyed_int
set_string_keyed_str
set_bool

Set the true/false value of this PMC.

assign_pmc

Set the value of this PMC to the value of the PMC passed in.

set_pmc

Make the PMC refer to the PMC passed in.

set_pmc_keyed
set_pmc_keyed_int
set_pmc_keyed_str
set_pointer
set_pointer_keyed
set_pointer_keyed_int
set_pointer_keyed_str
elements

Return the number of elements in the PMC, if the PMC is treated as an aggregate.

pop_integer
pop_float
pop_string
pop_pmc
push_integer
push_float
push_string
push_pmc
shift_integer
shift_float
shift_string
shift_pmc
unshift_integer
unshift_float
unshift_string
unshift_pmc
splice
add
add_int
add_float
subtract
subtract_int
subtract_float
multiply
multiply_int
multiply_float
divide
divide_int
divide_float
modulus
modulus_int
modulus_float
cmodulus
cmodulus_int
cmodulus_float
neg
bitwise_or
bitwise_or_int
bitwise_and
bitwise_and_int
bitwise_xor
bitwise_xor_int
bitwise_ors
bitwise_ors_str
bitwise_ands
bitwise_ands_str
bitwise_xors
bitwise_xors_str
bitwise_not
bitwise_shl
bitwise_shl_int
bitwise_shr
bitwise_shr_int
concatenate
concatenate_native
is_equal
is_same
cmp
cmp_num
cmp_string
logical_or
logical_and
logical_xor
logical_not
repeat
repeat_int
increment
decrement
exists_keyed
exists_keyed_int
exists_keyed_str
defined
defined_keyed
defined_keyed_int
defined_keyed_str
delete_keyed_str
nextkey_keyed
nextkey_keyed_itr_str
invoke
can
does
isa
fsh
visit
share
add_method
add_attribute
add_parent
add_role
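Two of the notes above lend themselves to a short sketch: the suggestion that init_pmc(PMCNULL) behave like init(), and the requirement that mark flag contained PMCs as live. The PairDemo type, its live field and DEMO_PMCNULL are all hypothetical; in particular, a plain null pointer is assumed as a stand-in for PMCNULL, and the signatures are simplified rather than taken from the vtable specification.

```c
#include <stddef.h>

typedef long INTVAL;  /* assumption: chosen at configuration time */

/* Hypothetical PMC that holds an integer and refers to one other PMC. */
typedef struct PairDemoPMC {
    INTVAL              value;
    struct PairDemoPMC *child;  /* may be NULL */
    int                 live;   /* stands in for a GC liveness flag */
} PairDemoPMC;

/* Stand-in for PMCNULL. Assumption: a null pointer is enough here. */
#define DEMO_PMCNULL ((PairDemoPMC *)NULL)

/* init: give a new object its default state. */
void pair_demo_init(PairDemoPMC *self)
{
    self->value = 0;
    self->child = NULL;
    self->live  = 0;
}

/* init_pmc: initialize from another PMC. Passing DEMO_PMCNULL leaves
   the object in the same state as pair_demo_init. */
void pair_demo_init_pmc(PairDemoPMC *self, PairDemoPMC *arg)
{
    pair_demo_init(self);
    if (arg != DEMO_PMCNULL) {
        self->value = arg->value;
        self->child = arg;
    }
}

/* mark: flag the PMC this one contains as live so that it is not
   collected. Tracing further from the child is left to the collector. */
void pair_demo_mark(PairDemoPMC *self)
{
    if (self->child != NULL)
        self->child->live = 1;
}
```

Routing init_pmc through init keeps the two entry points from drifting apart, which is one way to honour the suggested equivalence.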

Interaction between PMCs and high-level objects

{{ Address the problem of high-level objects inheriting from low-level PMCs, and any structural changes to low-level PMCs that might require. }}

ATTACHMENTS

None.

REFERENCES

The Perl modules Math::BigInt and Math::BigFloat. Alex Gough's suggestions for bigint/bignum implementation, outlined in docs/pdds/pdd14_bignum.pod. The Unicode standard at http://www.unicode.org.

GLOSSARY

Type

Type refers to a basic Parrot data type. There are four such types: integers, floating point numbers (often just numbers), strings and Parrot Magic Cookies (PMCs).

VERSION

1.5

CURRENT

     Maintainer: Dan Sugalski <dan@sidhe.org>
     Class: Internals
     PDD Number: 4
     Version: 1.5
     Status: Developing
     Last Modified: 11 June 2005
     PDD Format: 1
     Language: English

HISTORY

Version 1.5, 11 June 2005
Version 1.4, 20 February 2004
Version 1.3, 2 July 2001
Version 1.2, 2 July 2001
Version 1.1, 2 March 2001
Version 1, 1 March 2001

CHANGES

Version 1.5

Removed BigInt and BigNum from the definition of I* and N* registers -- according to Leo they are now always PMCs, never register types of their own.

Version 1.4

Document basic PMC internals. Make clear the fact that the bigint/bignum description is still provisional. Other minor fixups to make the documentation match reality.

Version 1.3

Fixed some silly typos and dropped phrases.

Took all the underscores out of the field names.

Version 1.2

The string header format has changed some to allow for type tagging. The flags information for strings has changed as well.

Version 1.1

INT and NUM are now concepts rather than data structures, as making them data structures was a Bad Idea.

Version 1

None. First version.