NAME
File::FormatIdentification::Pronom
VERSION
version 0.07
SYNOPSIS
use File::FormatIdentification::Pronom;
my $pronomfile = "Droid-Signature.xml";
my ( $signatures, $internals ) = parse_signaturefile($pronomfile);
DESCRIPTION
This module handles DROID signatures. DROID is a utility that uses the PRONOM database to identify file formats.
See https://www.nationalarchives.gov.uk/PRONOM/ for details.
With this module you can:
- convert DROID signatures to Perl regular expressions
- analyze files and show which patterns of a DROID signature match, and where, via tag files for wxHexEditor
- calculate statistics about DROID signatures
The module is in early alpha state and should not be used in production.
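To give an idea of what converting a signature to a Perl regular expression means, here is a minimal sketch. It is not the conversion used by this module: hex_sequence_to_regex is a hypothetical helper that handles only plain hex byte pairs and the "??" single-byte wildcard, and it assumes the sequence is anchored at the beginning of the file. Real DROID byte sequences have more syntax (gaps, ranges, alternatives), which the sketch rejects.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of File::FormatIdentification::Pronom.
# Supports only hex byte pairs and the "??" single-byte wildcard.
sub hex_sequence_to_regex {
    my ($sequence) = @_;
    my $pattern = '';

    # Consume the sequence token by token, starting where the last match ended.
    while ( $sequence =~ m/\G(\?\?|[0-9A-Fa-f]{2})/gc ) {
        my $token = $1;
        $pattern .= $token eq '??' ? '.' : sprintf( '\\x{%s}', uc $token );
    }

    # Anything left over is syntax this sketch does not understand.
    my $end = pos($sequence) // 0;
    die "unsupported token at offset $end\n" if $end < length $sequence;

    # \A anchors the pattern at the beginning of the data,
    # /s lets "." match any byte, including newline.
    return qr/\A$pattern/s;
}

my $regex = hex_sequence_to_regex('25504446');    # the bytes of "%PDF"
print "looks like a PDF header\n" if "%PDF-1.4\n" =~ $regex;
```

The resulting regex can be applied directly to a scalar that holds the content of a file.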
Examples
Colorize wxHexEditor fields
See the example file bin/pronom2wxhexeditor.pl. It colorizes the hex dump so that you can check which PRONOM patterns match for a given file.
Identify file
There are better tools for identifying the type of a file, but as a proof of concept this works reasonably well:
use feature 'say';
use List::Util qw(all);

my $pronom = File::FormatIdentification::Pronom->new(
    "droid_signature_filename" => $pronomfile
);
# $binaryfile is the name of the file to identify,
# $filestream is a scalar holding its content
foreach my $internalid ( $pronom->get_all_internal_ids() ) {
    my $sig = $pronom->get_signature_id_by_internal_id($internalid);
    next unless defined $sig;
    my @regexes = $pronom->get_regular_expressions_by_internal_id($internalid);
    # the signature matches only if all of its regexes match
    if ( all { $filestream =~ m/$_/saa } @regexes ) {
        my $puid    = $pronom->get_puid_by_signature_id($sig);
        my $name    = $pronom->get_name_by_signature_id($sig);
        my $quality = $pronom->get_qualities_by_internal_id($internalid);
        say "$binaryfile identified as $name with PUID $puid (regex quality $quality)";
    }
}
See example file bin/pronomidentify.pl for a full working script.
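The snippet above expects the file content in a scalar. One possible way to get it, sketched here with a hypothetical helper slurp_binary that is not part of this module, is to read the file in raw mode so that no encoding or line-ending translation touches the bytes:

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into a scalar of raw bytes.
sub slurp_binary {
    my ($path) = @_;
    open( my $fh, '<:raw', $path ) or die "cannot open $path: $!\n";
    local $/;    # no input record separator: read everything at once
    my $content = <$fh>;
    close $fh or die "cannot close $path: $!\n";
    return $content // '';    # an empty file yields an empty string
}

# Usage: pass the file to identify as the first command line argument.
my $binaryfile = shift @ARGV;
if ( defined $binaryfile ) {
    my $filestream = slurp_binary($binaryfile);
    printf "%s: %d bytes read\n", $binaryfile, length $filestream;
}
```

Reading the whole file keeps the example simple, but it needs as much memory as the file is large.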
Get PRONOM Statistics
To get a feeling for which signatures in PRONOM need to be revised, or why certain file formats are difficult to recognize, you can get detailed statistics for given signature files.
The blog entry at https://kulturreste.blogspot.com/2018/10/heres-tool-make-it-work.html presents the statistics report in more detail.
EXPORT
None by default.
NAME
File::FormatIdentification::Pronom - Perl extension for parsing PRONOM signatures using a DROID signature file
SEE ALSO
File::FormatIdentification::Regex
AUTHOR
Andreas Romeyke pause@andreas-romeyke.de
COPYRIGHT AND LICENSE
Copyright (C) 2018/19/20 by Andreas Romeyke
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
The DROID signature file in t/ comes from https://www.nationalarchives.gov.uk/PRONOM/Default.aspx and is provided without guarantee. It does not appear to be legally protected. If there are any legal claims, please let me know so that I can remove it from the distribution.
BUGS
- Some DROID recipes result in PCREs that are greedy, so the running time could grow exponentially with the size of the binary file.
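One possible workaround, sketched here and not a feature of this module, is to bound the amount of data the regexes can see. The hypothetical helper matches_within_prefix tests all regexes against only the first $limit bytes. This assumes the relevant patterns sit near the beginning of the file; patterns anchored at the end of the file or at large offsets will give wrong results on the truncated data.

```perl
use strict;
use warnings;
use List::Util qw(all);

# Hypothetical helper: true if every regex matches within the first
# $limit bytes of $data. An empty regex list counts as a match.
sub matches_within_prefix {
    my ( $data, $limit, @regexes ) = @_;
    my $prefix = substr( $data, 0, $limit );
    return all { $prefix =~ m/$_/saa } @regexes;
}

# Usage with hand-written regexes; only the first 1024 bytes are scanned.
my $filestream = "%PDF-1.4\n" . ( "x" x 1_000_000 );
my @regexes    = ( qr/\A\x{25}PDF/, qr/\d\.\d/ );
print "matches within prefix\n"
    if matches_within_prefix( $filestream, 1024, @regexes );
```

The limit trades completeness for a predictable upper bound on the input size, so it should be chosen with the signatures in mind.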
CONTRIBUTING
Please feel free to send comments and patches to my email address. You can clone the module from https://art1pirat.spdns.org/art1/File-FormatIdentification-Pronom and send me merge requests.