NAME

File::FormatIdentification::RandomSampling - methods to identify the content of device or media files using random sampling

VERSION

version 0.006

SYNOPSIS

This module gives a good estimate of the content of media (or files). It samples sectors at random and derives heuristics about the content types from those samples.

To check the base type of a given binary string:

use File::FormatIdentification::RandomSampling;

my $ff = File::FormatIdentification::RandomSampling->new(); # basic instantiation
my $type = $ff->calc_type($buffer); # determine the type of the given binary string
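
The synopsis does not show the sampling step itself. The following is a minimal sketch of how randomly chosen sectors might be read from a file or image. It is not the module's own implementation: the helper name sample_sectors, the default sector size of 512 bytes and the path in the usage comment are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_END);

# Hypothetical helper (not part of the module): read $count randomly chosen
# sectors of $sector_size bytes each from the file or image at $path.
sub sample_sectors {
    my ( $path, $count, $sector_size ) = @_;
    $sector_size //= 512;    # assumed default sector size

    open my $fh, '<:raw', $path or die "cannot open $path: $!";

    # Determine the size by seeking to the end of the handle.
    seek $fh, 0, SEEK_END or die "cannot seek in $path: $!";
    my $sectors = int( tell($fh) / $sector_size );    # whole sectors only
    die "$path is smaller than one sector\n" if $sectors < 1;

    my @buffers;
    for ( 1 .. $count ) {
        # Pick a random sector index and jump to its first byte.
        my $offset = int( rand($sectors) ) * $sector_size;
        seek $fh, $offset, SEEK_SET or die "cannot seek in $path: $!";
        my $got = read $fh, my $buffer, $sector_size;
        die "cannot read from $path: $!" unless defined $got;
        push @buffers, $buffer;
    }
    close $fh;
    return @buffers;
}

# Usage sketch (needs the module installed; 'disk.img' is an illustrative path):
# my $ff = File::FormatIdentification::RandomSampling->new();
# print $ff->calc_type($_), "\n" for sample_sectors( 'disk.img', 100 );
```

Sampling with replacement keeps the sketch short; the same sector may be drawn more than once, which matters little when the medium is much larger than the sample.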

TOOLS

The following tools are supplied with this module and are presented below:

crazy_fast_image_scan.pl

This script scans devices or images very quickly using random sampling and reports what kind of content was found.

For detailed documentation, see the POD included in the script.

cfi_create_training_data.pl

This script scans a set of files, calculates the most frequent unigrams and bigrams, and stores them in a CSV file.
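
To illustrate what counting one- and bigrams means at the byte level, here is a rough sketch. It is not taken from cfi_create_training_data.pl; the helper names count_bytegrams and most_frequent, and the encoding of a bigram as first byte * 256 + second byte, are assumptions chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: count 1-byte and 2-byte sequences in $buffer.
# A bigram is keyed as first_byte * 256 + second_byte (assumed encoding).
sub count_bytegrams {
    my ($buffer) = @_;
    my ( %onegram, %bigram );
    my @bytes = unpack 'C*', $buffer;
    for my $i ( 0 .. $#bytes ) {
        $onegram{ $bytes[$i] }++;
        $bigram{ $bytes[$i] * 256 + $bytes[ $i + 1 ] }++ if $i < $#bytes;
    }
    return ( \%onegram, \%bigram );
}

# Hypothetical helper: return the $n most frequent keys of a count hash.
# Ties are broken by the numeric key so the result is deterministic.
sub most_frequent {
    my ( $counts, $n ) = @_;
    my @sorted =
      sort { $counts->{$b} <=> $counts->{$a} or $a <=> $b } keys %$counts;
    $n = @sorted if $n > @sorted;
    return @sorted[ 0 .. $n - 1 ];
}
```

The most frequent keys of each file could then be joined with commas to form one CSV row; the actual column layout used by the script is documented in its own POD.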

cfi_learn_model.pl

This script reads the CSV file and uses AI::DecisionTree to print a new model module in the style of File::FormatIdentification::RandomSampling::Model.

SOURCE

The current development version is available at https://art1pirat.spdns.org/art1/crazy-fast-image-scan

METHODS

init_bytegrams

Resets the internal bytegram state. Also called when the object is instantiated.

update_bytegram

Takes $buffer and updates the internal bytegram state with its content.

calc_histogram

Uses the 8 most significant bytegram entries to form a histogram, which is returned as a hash reference.

is_uniform

Returns true if the 1-byte bytegrams are uniformly distributed.

is_empty

Returns true if the 1-byte bytegrams indicate an empty buffer.
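
The exact criteria behind is_uniform and is_empty are not stated here. As a rough illustration of how such checks can be derived from 1-byte counts, the sketch below assumes that "empty" means a buffer filled with a single byte value, and that "uniform" means the byte counts pass a chi-squared test against a flat distribution. The helper names, the cutoff of 330 and the minimum buffer size are all assumptions, not the module's values.

```perl
use strict;
use warnings;

# Count how often each byte value (0..255) occurs in $buffer.
sub byte_counts {
    my ($buffer) = @_;
    my @counts = (0) x 256;
    $counts[$_]++ for unpack 'C*', $buffer;
    return \@counts;
}

# Assumed criterion: "empty" means every byte has the same value,
# for example a sector filled with 0x00 or 0xFF.
sub looks_empty {
    my ($buffer) = @_;
    return 0 unless length $buffer;
    my $used = grep { $_ > 0 } @{ byte_counts($buffer) };
    return $used == 1 ? 1 : 0;
}

# Assumed criterion: "uniform" means the chi-squared statistic of the byte
# counts against a flat distribution stays below $cutoff (illustrative value).
sub looks_uniform {
    my ( $buffer, $cutoff ) = @_;
    $cutoff //= 330;
    my $expected = length($buffer) / 256;
    return 0 if $expected < 5;    # too few bytes for a meaningful test
    my $chi2 = 0;
    $chi2 += ( $_ - $expected )**2 / $expected for @{ byte_counts($buffer) };
    return $chi2 < $cutoff ? 1 : 0;
}
```

A near-flat byte distribution is typical of compressed or encrypted data, which is why a uniformity check is useful as an early filter.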

is_text

Returns true if the 1-byte bytegrams are typical for text.
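
What counts as "typical for text" is not specified here. One simple heuristic, shown below purely as an assumption and not as the module's method, is to measure the share of printable ASCII and common whitespace bytes in the buffer. The helper name looks_like_text and the threshold of 0.9 are hypothetical.

```perl
use strict;
use warnings;

# Assumed criterion: a buffer looks like text when at least $min_share of its
# bytes are printable ASCII (0x20..0x7E), tab, line feed or carriage return.
sub looks_like_text {
    my ( $buffer, $min_share ) = @_;
    $min_share //= 0.9;    # illustrative threshold
    return 0 unless length $buffer;

    # tr with an empty replacement list only counts the matching bytes.
    my $printable = ( $buffer =~ tr/\x09\x0A\x0D\x20-\x7E// );
    return $printable / length($buffer) >= $min_share ? 1 : 0;
}
```

This sketch treats multi-byte encodings such as UTF-8 as non-printable, so text with many non-ASCII characters would need a lower threshold or a different test.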

is_video

Returns true if the 1-byte bytegrams are typical for MPEG/QuickTime videos.

calc_type

Returns a string indicating the type of the given buffer.

AUTHOR

Andreas Romeyke <pause@andreas-romeyke.de>

COPYRIGHT AND LICENSE

This is free software, licensed under:

The GNU General Public License, Version 3, June 2007

INSTALLATION

To install File::FormatIdentification::RandomSampling, copy and paste the appropriate command into your terminal.

cpanm

cpanm File::FormatIdentification::RandomSampling

CPAN shell

perl -MCPAN -e shell
install File::FormatIdentification::RandomSampling

For more information on module installation, please visit the detailed CPAN module installation guide.

