NAME

File::FormatIdentification::Pronom

VERSION

version 0.06

SYNOPSIS

  use File::FormatIdentification::Pronom;
  my $pronomfile = "Droid-Signature.xml";
  my ( $signatures, $internals ) = parse_signaturefile($pronomfile);

DESCRIPTION

This module handles Droid signatures. Droid is a utility that uses the PRONOM database to identify file formats.

See https://www.nationalarchives.gov.uk/PRONOM/ for details.

With this module you can:

convert Droid signatures to Perl regular expressions
analyze files and show which Droid signature patterns match, and where, via tag files for wxHexEditor
calculate statistics about Droid signatures
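
To give a rough idea of what converting a signature to a regular expression involves, here is a minimal sketch. It is not the module's implementation: sequence_to_regex is a hypothetical helper, and the token set it accepts (hex pairs, ??, *, {n}, {n-m}) is an assumed, simplified subset of the byte-sequence notation.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of File::FormatIdentification::Pronom.
# Translates a simplified byte-sequence string into a Perl regex string.
# Assumed token subset: hex pairs, '??', '*', '{n}', '{n-m}'.
sub sequence_to_regex {
    my ($sequence) = @_;
    my $regex = q{};
    while ( length $sequence ) {
        if ( $sequence =~ s/^([0-9A-Fa-f]{2})// ) {
            $regex .= "\\x{$1}";       # one literal byte
        }
        elsif ( $sequence =~ s/^\?\?// ) {
            $regex .= '.';             # any single byte
        }
        elsif ( $sequence =~ s/^\*// ) {
            $regex .= '.*?';           # any number of bytes, non-greedy
        }
        elsif ( $sequence =~ s/^\{(\d+)-(\d+)\}// ) {
            $regex .= ".{$1,$2}";      # between n and m bytes
        }
        elsif ( $sequence =~ s/^\{(\d+)\}// ) {
            $regex .= ".{$1}";         # exactly n bytes
        }
        else {
            die "unsupported token at '$sequence'\n";
        }
    }
    return $regex;
}

# "%PDF", then one arbitrary byte, then two more arbitrary bytes
my $regex = sequence_to_regex('25504446??{2}');
my $data  = "%PDF-1.4\n";
print "pattern $regex matches\n" if $data =~ m/\A$regex/s;
```

The /s flag lets . match every byte, including newlines, which matters for binary data.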

The module is in early alpha state and should not be used in production.

Examples

Colorize wxHexEditor fields

See example file bin/pronom2wxhexeditor.pl. It colorizes the hex view of a given file so that PRONOM pattern matches can be checked.

Identify file

There are better tools for the job, but it works well enough as a proof of concept: identifying the format of a file.

  use feature 'say';
  use List::Util qw(all);

  my $pronom = File::FormatIdentification::Pronom->new(
    "droid_signature_filename" => $pronomfile
  );
  # $binaryfile is the name of the file to identify,
  # $filestream is a scalar holding its content
  foreach my $internalid ( $pronom->get_all_internal_ids() ) {
    my $sig = $pronom->get_signature_id_by_internal_id($internalid);
    next unless defined $sig;
    my @regexes = $pronom->get_regular_expressions_by_internal_id($internalid);
    # a signature matches only if every one of its expressions matches
    if ( all { $filestream =~ m/$_/saa } @regexes ) {
      my $puid = $pronom->get_puid_by_signature_id($sig);
      my $name = $pronom->get_name_by_signature_id($sig);
      my $quality = $pronom->get_qualities_by_internal_id($internalid);
      say "$binaryfile identified as $name with PUID $puid (regex quality $quality)";
    }
  }

See example file bin/pronomidentify.pl for a full working script.
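
The snippet above expects the file content in a scalar. One way to obtain it is sketched below; slurp_binary is a hypothetical helper, not something the module provides, and the temporary file only serves to make the example runnable on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file as raw bytes into one scalar.
sub slurp_binary {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "cannot open $path: $!\n";
    local $/ = undef;    # slurp mode: read up to end of file
    my $content = <$fh>;
    close $fh or die "cannot close $path: $!\n";
    return $content;
}

# Usage with a throwaway file standing in for the file to identify
my ( $tmp, $binaryfile ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} "%PDF-1.4\n\x00\xff";
close $tmp or die "cannot close $binaryfile: $!\n";

my $filestream = slurp_binary($binaryfile);
printf "%s: %d bytes read\n", $binaryfile, length $filestream;
```

Opening with the :raw layer avoids newline translation and character decoding, so the regular expressions see the bytes exactly as stored.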

Get PRONOM Statistics

To get a feeling for which signatures in PRONOM need to be revised, or why certain file formats are difficult to recognize, you can generate detailed statistics for given signature files.

The statistics report is presented in more detail in the blog entry at https://kulturreste.blogspot.com/2018/10/heres-tool-make-it-work.html.

EXPORT

None by default.

NAME

File::FormatIdentification::Pronom - Perl extension for parsing PRONOM-Signatures using DROID-Signature file

SEE ALSO

File::FormatIdentification::Regex

AUTHOR

Andreas Romeyke pause@andreas-romeyke.de

COPYRIGHT AND LICENSE

Copyright (C) 2018/19/20 by Andreas Romeyke

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

The droid-signature file in t/ is from https://www.nationalarchives.gov.uk/PRONOM/Default.aspx and comes without guarantee; it does not appear to be legally protected. If there are any legal claims, please let me know so that I can remove it from the distribution.

BUGS

Some Droid recipes result in PCREs which are greedy, and therefore the running time can grow exponentially with the size of the binary file.
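
To find out which expressions are slow on a particular file, each one can be timed separately. The following is only a sketch: time_patterns is a hypothetical helper, and the three patterns are hand-written stand-ins, not expressions generated by the module.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: match each pattern against $data and return
# [ pattern, seconds, matched ] triples, slowest first.
sub time_patterns {
    my ( $data, @patterns ) = @_;
    my @timings;
    foreach my $pattern (@patterns) {
        my $start   = time();
        my $matched = ( $data =~ m/$pattern/s ) ? 1 : 0;
        push @timings, [ $pattern, time() - $start, $matched ];
    }
    my @sorted = sort { $b->[1] <=> $a->[1] } @timings;
    return @sorted;
}

# Stand-in data: a run of zero bytes followed by "PK"
my $data = ( "\x00" x 10_000 ) . 'PK';
foreach my $timing ( time_patterns( $data, '\APK', 'PK\z', '\x00.*PK' ) ) {
    printf "%-10s %.6fs matched=%d\n", @{$timing};
}
```

This only measures the cost; it does not bound it. A pattern that backtracks heavily will still run to completion.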

CONTRIBUTING

Please feel free to send comments and patches to my email address. You can clone the module from https://art1pirat.spdns.org/art1/File-FormatIdentification-Pronom and send me merge requests.

COPYRIGHT AND LICENSE

This software is copyright (c) 2018 by Andreas Romeyke.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.