Bob Mathews

NAME

Disassemble::X86 - Disassemble Intel x86 binary code

SYNOPSIS

  use Disassemble::X86;
  $d = Disassemble::X86->new(text => $text_seg);
  while (defined( $op = $d->disasm() )) {
    printf "%04x  %s\n", $d->op_start(), $op;
  }

DESCRIPTION

This module disassembles binary-coded Intel x86 machine instructions. Output can be produced as plain text, or as a tree structure suitable for further processing.

METHODS

new

  $d = Disassemble::X86->new(
      text      => $text_seg,
      start     => $text_load_addr,
      pos       => $initial_eip,
      addr_size => 32,
      data_size => 32,
      size      => 32,
      format    => "Text",
  );

Creates a new disassembler object. The following named parameters are accepted, all of which are optional.

text

The so-called text segment, which consists of the binary data to be disassembled. It can be given either as a string or as a Disassemble::X86::MemRegion object.

start

The address at which the text segment would be loaded to execute the program. This parameter is ignored if text is a MemRegion object, and defaults to 0 otherwise.

pos

The address at which disassembly is to begin, unless changed by $d->pos(). Default value is the start of the text segment.

addr_size

Gives the address size (16 or 32 bit) which will be used when disassembling the code. Default is 32 bits. See below.

data_size

Gives the data operand size, similar to addr_size.

size

Sets both addr_size and data_size.

format

Gives the name of an output-formatting module, which will be used to process the disassembled instructions. Currently, valid values are Text and Tree. See Disassemble::X86::FormatText, Disassemble::X86::FormatTree.

disasm

  $op = $d->disasm();

Disassembles a single machine instruction from the current position. Advances the current position to the next instruction. If no valid instruction is found at the current position, returns undef and leaves the current position unchanged. In that case, you can check $d->error() for more information.

addr_size

  $d->addr_size(16);

Sets the address size for disassembled code. Valid values are 16 (or its synonym "word") and 32 (or its synonyms "dword" and "long"). With no argument, returns the current address size as 16 or 32.

data_size

  $d->data_size("long");

Similar to addr_size above, but sets the data operand size.
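
As an illustration of the getter and setter forms, the sketch below decodes one instruction with both sizes forced to 16 bits and then restores the previous settings. The helper name disasm_as_16bit is hypothetical, and the sketch assumes that data_size() with no argument returns the current size in the same way addr_size() does.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one instruction as 16-bit code, then restore
# whatever sizes the disassembler object $d had before the call.
sub disasm_as_16bit {
    my ($d) = @_;

    # With no argument, the accessors return the current size (16 or 32).
    my $old_addr = $d->addr_size();
    my $old_data = $d->data_size();

    $d->addr_size(16);
    $d->data_size(16);
    my $op = $d->disasm();    # undef if no valid instruction was found

    # Put the original sizes back, whether or not decoding succeeded.
    $d->addr_size($old_addr);
    $d->data_size($old_data);
    return $op;
}
```

It would be called as disasm_as_16bit($d) on an existing Disassemble::X86 object.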

pos

  $d->pos($new_pos);

Sets the current disassembly position. With no argument, returns the current position.

text

  $text = $d->text();

Returns the text segment as a Disassemble::X86::MemRegion object.

at_end

  until ( $d->at_end() ) {
    ...
  }

Returns true if the current disassembly position has reached the end of the text segment.

contains

  if ( $d->contains($addr) ) {
    ...
  }

Returns true if $addr is within the memory region being disassembled.
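
When several disassembler objects cover separate regions, contains() can be used to pick the one that holds a given address. The following is a minimal sketch; the helper name seek_to is hypothetical, and it relies only on contains() and pos() as described in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: find the first disassembler whose region contains
# $addr, move its current position there, and return it.
sub seek_to {
    my ( $addr, @disassemblers ) = @_;
    for my $d (@disassemblers) {
        next unless $d->contains($addr);
        $d->pos($addr);
        return $d;
    }
    return;    # no region covers this address
}
```

A caller might use it to follow a branch target: seek_to($target, $d1, $d2) returns the object to continue disassembling with, or undef in scalar context if the target lies outside every region.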

next_byte

  $byte = $d->next_byte();

Returns the next byte from the current disassembly position as an integer value, and advances the current position by one. This can be used to skip over invalid instructions that are encountered during disassembly. If the current position is not valid, returns 0, but still advances the current position. Attempting to read beyond the 15-byte opcode size limit will cause an error.
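
The skipping technique described above can be sketched as follows. The helper name disasm_with_skip and the ".byte" output line are illustrative choices, not part of the module, and the sketch assumes the default Text format so that each instruction is returned as a string.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the whole text segment, emitting a ".byte" line
# for every byte that does not start a valid instruction.
sub disasm_with_skip {
    my ($d) = @_;
    my @lines;
    until ( $d->at_end() ) {
        my $addr = $d->pos();       # remember where this attempt starts
        my $op   = $d->disasm();
        if ( defined $op ) {
            push @lines, sprintf( "%04x  %s", $d->op_start(), $op );
        }
        else {
            # disasm() left the position unchanged, so step over one byte
            # and try again from the following address.
            my $byte = $d->next_byte();
            push @lines, sprintf( "%04x  .byte 0x%02x", $addr, $byte );
        }
    }
    return @lines;
}
```

Because disasm() is attempted again after every skipped byte, the helper never reads a long run of bytes with next_byte() alone. When a skip occurs, $d->error() can be consulted before calling next_byte() to report why decoding failed.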

op

This and the following methods return information about the most recently disassembled machine instruction. $d->op() returns the instruction itself, in tree-structure format.

op_start

Returns the starting address of the instruction.

op_len

Returns the length of the instruction, in bytes.
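
Taken together, op_start and op_len give the extent of each instruction. The sketch below records, for every decoded instruction, where it starts and where the next one begins. The helper name instruction_extents and the hash keys are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: decode until disasm() returns undef and record the
# address range occupied by each instruction.
sub instruction_extents {
    my ($d) = @_;
    my @extents;
    while ( defined( my $op = $d->disasm() ) ) {
        my $start = $d->op_start();
        my $len   = $d->op_len();
        push @extents, {
            start => $start,
            len   => $len,
            next  => $start + $len,    # address of the following instruction
            op    => $op,
        };
    }
    return @extents;
}
```

Each element can then be printed, for example with printf "%04x-%04x  %s\n", $e->{start}, $e->{next} - 1, $e->{op} when the Text format is in use.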

op_proc

Returns the minimum processor model required. For instructions present in the original 8086 processor, the value 86 is returned. For instructions supported by the 8087 math coprocessor, the value is 87. Instructions initially introduced with the Pentium return 586, and so on. Note that setting the address or operand size to 32 bits requires at least a 386. Other possible return values are "mmx", "sse", "sse2", "3dnow", and "3dnow-e" (for extended 3DNow! instructions).

This information should be used carefully, because there may be subtle differences between different steppings of the same processor. In some cases, you must check the CPUID instruction to see exactly what your processor supports. When in doubt, consult the Intel Architecture Software Developer's Manual.
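
One way to summarize this information for a whole text segment is to count how many instructions fall under each op_proc value. The following is a minimal sketch; the helper name proc_usage is hypothetical. Since the return values mix numbers such as 86 with names such as "mmx", they are used only as hash keys and never compared numerically.

```perl
use strict;
use warnings;

# Hypothetical helper: decode until disasm() returns undef and count the
# instructions seen for each op_proc() value.
sub proc_usage {
    my ($d) = @_;
    my %count;
    while ( defined( $d->disasm() ) ) {
        $count{ $d->op_proc() }++;
    }
    return \%count;
}
```

Note that the loop stops at the first position where no valid instruction is found, so the counts cover only the code decoded up to that point.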

error

Returns the error message produced when an attempt to disassemble an instruction fails.

LIMITATIONS

Multiple discontiguous text segments are not supported by a single object. Create a separate Disassemble::X86 object for each segment if you need them.

In some cases, this module will disassemble an opcode that would actually cause the processor to raise an illegal opcode exception. This may also be construed as a feature.

Some of the more exotic instructions, such as cache control and MMX extensions, have not been thoroughly tested. Please let me know if you find something that is broken.

SEE ALSO

Disassemble::X86::FormatText

Disassemble::X86::FormatTree

Disassemble::X86::MemRegion

AUTHOR

Bob Mathews <bobmathews@alumni.calpoly.edu>

COPYRIGHT

Copyright (c) 2002 Bob Mathews. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.