Todd Harris


Audio::DB - Tools for generating relational databases of MP3s


      use Audio::DB;
      my $mp3 = Audio::DB->new(-user    =>'user',
                               -pass    =>'password',
                               -host    =>'db_host',
                               -dsn     =>'music_db',
                               -adaptor => 'mysql');


      $mp3->load_database(-dirs =>['/path/to/MP3s/'],
                          -tmp  =>'/tmp');


Audio::DB is a module for creating relational databases of MP3 files directly from data stored in ID3 tags or from flat files of track information. Once the database is created, Audio::DB provides various methods for creating reports and web pages of your collection. Although it's nutritious and delicious on its own, Audio::DB was created for use with Apache::Audio::DB, a subclass of Apache::MP3. That module makes it easy to make your collection web-accessible, complete with browsing, searching, streaming, multiple users, playlists, ratings, and more!


MP3::Info for reading ID3 tags; LWP::MediaTypes for distinguishing types of readable files.


No methods are exported.


Metrics for assigning songs to albums: Since Audio::DB processes file-by-file, it uses a number of parameters to assign tracks to albums. The quality of the results of Audio::DB will depend directly on the quality and integrity of the ID3 tags of your files.

Single tracks (those not belonging to a specific album) are distinguished by either undef or the label "single" in the album tag. In this way, all the single tracks for a given artist can be easily grouped together and fetched as a sort of pseudo-album. Of course, since you've ripped all of your MP3z from albums that you own, this shouldn't be a problem ;).

If two or more albums have the same name ("Greatest Hits"), Audio::DB checks whether the year they were released and the total number of tracks are the same. If so, it treats them as the same album, and all tracks are grouped together. This works most of the time, but obviously will fail sometimes. If you haven't assigned either of these tags, you'll have one less metric for distinguishing albums. If you have a better metric for distinguishing them, please let me know!
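The same-name check described above can be pictured as building a composite key from the album name, year, and track count. This is only a sketch of the idea, not the module's actual code; the album_key helper and the sample tags are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: two albums count as the same only when name, year,
# and total track count all match. Missing tags collapse to an empty
# string, which is why untagged files are harder to tell apart.
sub album_key {
    my (%tag) = @_;
    return join "\t", map { defined $tag{$_} ? lc $tag{$_} : '' }
        qw(album year total_tracks);
}

my %album_ids;
my $next_id = 0;
for my $tag (
    { album => 'Greatest Hits', year => 1981, total_tracks => 17 },
    { album => 'Greatest Hits', year => 1981, total_tracks => 17 },
    { album => 'Greatest Hits', year => 1994, total_tracks => 12 },
) {
    my $key = album_key(%$tag);
    $album_ids{$key} //= ++$next_id;    # reuse the ID if the key was seen
    print "$tag->{album} ($tag->{year}) => album id $album_ids{$key}\n";
}
```

The first two entries share an ID and the third gets a new one. Two genuinely different albums that happen to share all three values would still be merged, which is the failure case mentioned above.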



 Title   : initialize
 Usage   : $mp3->initialize(-erase=>$erase);
 Function: initialize a new database
 Returns : true if initialization successful
 Args    : a set of named parameters
 Status  : Public

This method can be used to initialize an empty database. It takes the following named arguments:

  -erase     A boolean value.  If true the database will be wiped clean if it
             already contains data.

A single true argument ($mp3->initialize(1)) is the same as initialize(-erase=>1). Future versions may support additional options for initialization and database construction (i.e., custom schemas).
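The equivalence between the positional and named forms can be illustrated with a small argument normalizer. This is a sketch of the calling convention only; normalize_init_args is a hypothetical name and is not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical normalizer: accepts either named parameters (-erase => 1)
# or a single positional boolean, and returns a plain hash.
sub normalize_init_args {
    my @args = @_;
    return ( erase => $args[0] ? 1 : 0 ) if @args == 1;
    my %named = @args;
    return ( erase => $named{-erase} ? 1 : 0 );
}

my %positional = normalize_init_args(1);
my %named      = normalize_init_args( -erase => 1 );
print "both forms request an erase\n"
    if $positional{erase} && $named{erase};
```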


 Title   : load_database
 Usage   :

       Creating a database by reading the tags from MP3 files:
       $stats = $mp3->load_database(-dirs    => ['/path/to/MP3s/'],
                                    -tmp     => '/tmp',
                                    -verbose => 100);

       Creating a database from a flat file of file information:
       $stats = $mp3->load_database(-files   => ['/path/to/files/'],
                                    -columns  => '[columns in file]',
                                    -tmp      => '/tmp',
                                    -verbose  => 100);

       Creating a database from the iTunes Music Library.xml file
       $stats = $mp3->load_database(-library  => '/path/to/iTunes\ Music\ Library.xml',
                                    -verbose  =>  100);

 Function: Parses mp3s and loads database
 Returns : Hash reference containing number of artists, albums, songs,
           and genres processed.
 Args    : array of top-level paths to mp3s; path to tmp directory, 
           verbose flag
 Status  : Public

load_database is a broad wrapper method that provides simplified access to many of the less-public methods of Audio::DB. It expects an anonymous array of top-level paths to directories containing MP3s to load. The second required parameter is the path to a suitable /tmp directory. Audio::DB::Build will write temporary files to this directory prior to doing bulk loads into the database.

The optional -verbose flag causes a variety of messages to be displayed on STDERR during processing. The value of -verbose controls how frequently a message is displayed during song processing.

Instead of reading the tags directly, Audio::DB can read a flat file, or several files, containing the ID3 tag information. This is particularly useful for offline files that have been cataloged with utilities like MP3Rage. Furthermore, I've found that the MP3::Info module that Audio::DB::Build relies on isn't as robust at reading tags as some other applications. The paths to individual files, or to directories containing batches of these files, should be passed in as an anonymous array. A second parameter, -columns, should also be passed to show the order of the fields in the file. At a minimum, the file should contain album, artist, and title. Use the following column names:

       title        => song title
       artist       => performing artist
       album        => containing album
       track        => song track number
       total_tracks => total tracks on album
       duration     => [optional] formatted string of song duration
       seconds      => [optional] song duration in seconds
       bitrate      => [optional] integer. The bitrate of the song
       samplerate   => [optional] sample rate of encoding
       comment      => [optional] song comment
       filename     => [optional] duh.
       filesize     => [optional] file size in kb
       filepath     => [optional] absolute file path
       tagtypes     => [optional] ID3 tag types present
       fileformat   => [optional] file format
       channels     => [optional] number of channels
       year         => [optional] year of the album
       rating       => [optional] user rating
       playcount    => [optional] song play count
       playdate     => [optional] date song last played
       dateadded    => [optional] date song added to collection
       datemodified => [optional] date song information last modified
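To make the role of the column list concrete, here is a rough sketch of mapping one tab-delimited line onto the column names above and rejecting lines that lack the minimal fields. The parse_line helper, the chosen column order, and the sample line are all assumptions for illustration; the module's own parser may work differently.

```perl
use strict;
use warnings;

# Assumed column order for this example; it must match the actual file.
my @columns  = qw(artist album title track total_tracks year);
my @required = qw(album artist title);

# Hypothetical helper: turn one tab-delimited line into a hash reference
# keyed by column name, or return nothing if a required field is missing.
sub parse_line {
    my ( $line, @cols ) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    my %song;
    @song{@cols} = @fields;
    for my $name (@required) {
        return unless defined $song{$name} && length $song{$name};
    }
    return \%song;
}

my $song = parse_line(
    "Example Artist\tExample Album\tFirst Song\t1\t10\t2001\n", @columns );
print "$song->{artist} / $song->{album} / $song->{title}\n" if $song;
```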


 Title   : update_database
 Usage   : $mp3->update_database(-dirs    =>['/path/to/MP3s/'],
                                 -tmp     =>'/tmp',
                                 -verbose => 100);

           $mp3->update_database(-files    =>['/path/to/files'],
                                 -columns  =>'[columns in file]',
                                 -tmp     =>'/tmp',
                                 -verbose  => 100);

 Function: Parses new mp3s and adds them to a pre-existing database
 Returns : true if successful
 Args    : array of top-level paths to new mp3s; path to tmp directory
 Status  : Public

update_database accepts the same parameters as load_database and is similar in function, except that it takes a path to new MP3s and adds them to a preexisting database. The artist and album of these new files are checked against those already in the database to prevent the addition of duplicates. Duplicate songs, however, will be added. This is a feature, since you may want multiple copies of some tracks. It's up to you to remove duplicates in advance if you don't want them listed in your database. See the section below "Appending To A Preexisting Database" for more information on using this method.

The optional -verbose flag causes a variety of messages to be displayed on STDERR during processing. The value of -verbose controls how frequently a message is displayed during song processing.

Like load_database, update_database can read information directly from flat files instead of the MP3s themselves. See load_database for more information.

Additional Public Methods

Audio::DB::Build contains several additional public methods that you are welcome to use if you'd like greater control over file parsing and database loading. In the normal course of things, you probably will not need to use these methods directly, but they are described here for completeness.


 Title   : cache_song
 Usage   : $mp3->cache_song(-full_path=>$full_path,-file=>$file);
 Function: Parses new mp3s and adds them to a pre-existing database
 Returns : true if successful
 Args    : a pre-processed data hash arising from one of the Parse modules
 Status  : Public

cache_song accepts the filename and full path to a file to be processed. It makes separate calls to MP3::Info to extract ID3 tag info. Once extracted, song information is checked against the database to determine whether the artist or album has been seen before, adding the song to that artist or album or inserting new artists / albums into the internal temporary data structure as required. Finally, the song is added to this structure.

Alternatively, cache_song can be passed a single tab-delimited line of data that holds the relevant information. See load_database for more information on using this interface.


 Title   : get_couldnt_read
 Usage   : $mp3->get_couldnt_read()
 Function: Fetch a list of files that could not be read
 Returns : Array reference of files whose tags could not be read
 Args    : none
 Status  : Public


 Title   : get_stats
 Usage   : $mp3->get_stats;
 Function: Get some info on files loaded
 Returns : Hash reference containing the number of artists,
           albums, genres, and songs loaded into the database.
 Args    : none
 Status  : Public

Private Methods

There are a number of private methods, described here for my own sanity. These methods are not part of the public interface.


 Title   : _establish_counters
 Usage   : $mp3->_establish_counters
 Function: Used to determine the highest values for keys before adding
           new data to the database.
 Returns : Hash reference containing the number of artists,
           albums, genres, and songs loaded into the database.
 Args    : none
 Status  : Private


 Title   : get_tags
 Usage   : $mp3->get_tags(@args);
 Function: Fetch and processes raw ID3 tags from files
 Returns : Hash reference of parsed tag data
 Status  : Private

_check_*_mem, _check_*_db

  _check_artist_mem _check_album_mem 
  _check_genre_mem _check_artist_db
  _check_album_db _check_genre_db

 Title   : _check_*_mem or _check_*_db
 Usage   : $mp3->_check_album_mem($artist);
 Function: Checks for the existence of the current tag
 Returns : ID of the appropriate album, artist, genre, if it already exists
 Args    : artist, album, or genre, as appropriate
 Status  : Private

The _check_* methods check for the pre-existence of the current artist, album, or genre for the file currently being examined. The two variations, *_mem and *_db, control whether this look up is done against the internal data structure in memory or against a pre-existing database.
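As an illustration of the in-memory variant, the following sketch looks a name up in a lookup table and either returns the known ID or allocates the next one from a counter. The check_mem and id_for subroutines are hypothetical stand-ins, and the hash layout is assumed to resemble the lookups and counters described under "Internal Data Structure"; the real private methods may differ.

```perl
use strict;
use warnings;

# Simplified stand-in for the object: lookup tables plus running counters.
my $self = {
    lookups  => { artists => {}, genres => {} },
    counters => { artists => 0,  genres => 0 },
};

# Hypothetical in-memory check: return the existing ID, or undef if unseen.
sub check_mem {
    my ( $self, $type, $name ) = @_;
    return $self->{lookups}{$type}{$name};
}

# Hypothetical companion: reuse the known ID or allocate the next one.
sub id_for {
    my ( $self, $type, $name ) = @_;
    my $id = check_mem( $self, $type, $name );
    return $id if defined $id;
    $id = ++$self->{counters}{$type};
    $self->{lookups}{$type}{$name} = $id;
    return $id;
}

print id_for( $self, 'artists', 'Example Artist' ), "\n";   # first time seen
print id_for( $self, 'artists', 'Example Artist' ), "\n";   # same ID reused
print id_for( $self, 'artists', 'Another Artist' ), "\n";   # new ID
```

A *_db variant would presumably run the same kind of check as a query against the existing tables rather than against the hash.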

_check_album_* is necessarily more complex. It attempts to assign songs to albums based on both the year and total number of tracks. See "Caveats" above for more information.


 Title   : _dump_data_structures
 Usage   : _dump_data_structures
 Function: Wrapper around all the _dump_* subroutines
 Returns : true if successful
 Args    : none
 Status  : Private


  _dump_artists  _dump_albums
  _dump_songs    _dump_genres

 Title   : _dump_*
 Usage   : _dump_artists()
 Function: Create temp files for loading into the database
 Returns : true if successful
 Args    : none
 Status  : Private

These methods dump out the appropriate data from the internal data structure into the temporary directory path. Some dump multiple tables:

  _dump_artists : artists and artist_genres tables 
  _dump_albums  : album and album_artists tables
  _dump_songs   : songs table
  _dump_genres  : genres table
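A minimal sketch of what such a dump step could look like is shown below: each table's rows are written as tab-delimited lines into a file in the temporary directory, ready for a bulk load. The dump_table name, the one-file-per-table ".txt" naming, and the row layout are assumptions, not details taken from the module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: write one tab-delimited line per row and return
# the path of the file that was written.
sub dump_table {
    my ( $dir, $table, $rows ) = @_;
    my $path = File::Spec->catfile( $dir, "$table.txt" );
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} join( "\t", @$_ ), "\n" for @$rows;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $tmp  = tempdir( CLEANUP => 1 );    # removed when the program exits
my $file = dump_table( $tmp, 'genres', [ [ 1, 'Jazz' ], [ 2, 'Folk' ] ] );
print "wrote $file\n";
```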


 Title   : _load_db
 Usage   : _load_db()
 Function: Loads data from temporary tables into the database
 Returns : true if successful
 Args    : none
 Status  : Private


 Title   : _stuff_album
 Usage   : _stuff_album()
 Function: Stuffs the current album into the internal data structure
 Returns : true if successful
 Args    : none
 Status  : Private

Internal Data Structure

Audio::DB::Build builds a large internal data structure as it reads each file. The data structure is:

 Lookups - For quick lookups to see if an artist, album or genre has been encountered
 $self->{lookups}->{artists}->{$artist} = $artist_id;
 $self->{lookups}->{albums}->{$album}   = $album_id;
 $self->{lookups}->{songs}->{$song}     = $song_id;
 $self->{lookups}->{genres}->{$genre}   = $genre_id;

 Counters - for tracking the number of artists, albums, songs, and genres
 $self->{counters}->{artists}= $total;
 $self->{counters}->{albums} = $total;
 $self->{counters}->{songs}  = $total;
 $self->{counters}->{genres} = $total;

 $self->{couldnt_read} = [ files that could not be read ];

The main data structure of artists, albums, songs, and genres follows. I know, I know, it's partially denormalized.

 $self->{artists}->{$artist_id} = { artist => artist name,
                                    genres => { $genre_ids => total },
                                    albums => { $album => $album_id } }

 $self->{albums}->{$album_id} = { album     => $album,
          # For tracking multiple genres per album
                                  genres    => { $genre_ids => ++ },
          # For tracking multiple artists per album (compilation CDs)
                                  contributing_artists  => { $artist_id => ++ },
          # Internal measure for distinguishing same-named albums
                                  total_tracks => total number of tracks,
                                  year         => year released }

 $self->{songs}->{$song_id} = { title        => song title,
                                artist_id    => artist_id,
                                album_id     => album_id,
                                genre_id     => genre_id,
                                track        => track number,
                                total_tracks => total tracks on album,
                                duration     => formatted duration,
                                seconds      => raw seconds,
                                bitrate      => song bitrate,
                                samplerate   => sample rate,
                                comment      => id3 comment,
                                filename     => filename,
                                filesize     => filesize,
                                filepath     => filepath,
                                tagtypes     => types of ID3 tags found,
                                format       => MPEG layer,
                                channels     => stereo / mono / joint,
                                song_year    => year (also with album),
                                rating       => user rating,
                                playcount    => play count }

 $self->{genres}->{$genre_id} = { genre => $genre }
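For orientation, here is the same layout filled in with made-up values for one artist, one album, and two songs, followed by a loop that resolves each song's ID references back to names. The data is purely illustrative and only a subset of the song fields is shown.

```perl
use strict;
use warnings;

# Made-up sample data laid out in the shape described above.
my $self = {
    artists => {
        1 => { artist => 'Example Artist',
               genres => { 1 => 2 },
               albums => { 'Example Album' => 1 } },
    },
    albums => {
        1 => { album                => 'Example Album',
               genres               => { 1 => 2 },
               contributing_artists => { 1 => 2 },
               total_tracks         => 2,
               year                 => 2001 },
    },
    songs => {
        1 => { title => 'First Song',  artist_id => 1, album_id => 1,
               genre_id => 1, track => 1, total_tracks => 2 },
        2 => { title => 'Second Song', artist_id => 1, album_id => 1,
               genre_id => 1, track => 2, total_tracks => 2 },
    },
    genres => { 1 => { genre => 'Rock' } },
};

# Follow each song's ID references back to readable names.
for my $song_id ( sort { $a <=> $b } keys %{ $self->{songs} } ) {
    my $song = $self->{songs}{$song_id};
    printf "%s - %s - %d/%d %s [%s]\n",
        $self->{artists}{ $song->{artist_id} }{artist},
        $self->{albums}{ $song->{album_id} }{album},
        $song->{track}, $song->{total_tracks}, $song->{title},
        $self->{genres}{ $song->{genre_id} }{genre};
}
```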


This module implements a fairly complex internal data structure, which in itself rests upon lots of things going right, like reading ID3 tags, tag naming conventions, etc. On top of that, I wrote this in a Starbucks full of screaming children.


Need a reasonable way of dealing with tags that can't be read.

Lots of error checking needs to be added. Support for custom data schemas, including new data types like more extensive artist info, paths to images, etc.

Keep track of stats for updates. Fix update - needs to use mysql (these are the _check_artist_db routines that all need to be implemented)

Robusticize new for different adaptor types

Add in full MP4 support. Make the data dumps rely on the schema in the module. Put the schema into its own module.


Copyright 2002-2004, Todd W. Harris.

This module is distributed under the same terms as Perl itself. Feel free to use, modify and redistribute it as long as you retain the correct attribution.


Chris Nandor wrote MP3::Info, the module responsible for reading MP3 tags. Without it, this module would be a best-selling pulp romance novel behind the gum at the grocery store checkout. Chris has been really helpful with issues that arose with various MP3 tags from different taggers. Kudos, dude!

Lincoln (Dr. Leichtenstein) Stein wrote much of the original adaptor code as part of the Bio::DB::GFF module. Much of that code is incorporated here, albeit in a pared-down form. The code for reading ID3 tags only from files with appropriate MIME types is borrowed from his Apache::MP3 module. This was much more elegant than my lame solution of checking for .mp3! Lincoln tolerates having me in his lab, too, even though I use a Mac.


Audio::DB::Adaptor::dbi::mysql, Audio::DB::Util::Reports, Apache::MP3, Apache::Audio::DB, MP3::Info